JP6797344B1

JP6797344B1 - Learning device, utilization device, program, learning method and utilization method

Info

Publication number: JP6797344B1
Application number: JP2020552066A
Authority: JP
Inventors: 康平栗原; 大祐鈴木
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2020-07-10
Filing date: 2020-07-10
Publication date: 2020-12-09
Anticipated expiration: 2040-07-10
Also published as: WO2022009419A1; JPWO2022009419A1

Abstract

A learning device includes a learning-side data acquisition unit (112) and a model generation unit (113). The learning-side data acquisition unit (112) acquires training data that includes a thermal image, which images the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image, which images the subject using visible light reflected from the subject. The model generation unit (113) uses the training data to learn inference from a thermal image to a visible image, and thereby generates a trained model for inferring a visible image from a thermal image.

Description

The present disclosure relates to a learning device, a utilization device, a program, a learning method, and a utilization method.

A general thermal infrared solid-state image sensor (hereinafter, a thermal image sensor) visualizes incident infrared rays emitted by a subject: differences in the temperature rise caused by absorbing the infrared rays appear as the shading of the image. The infrared rays emitted by the subject are focused by a lens and form an image on the image sensor.

A thermal image sensor can acquire thermal information that a visible camera cannot. However, an inexpensive small sensor, for example, produces images with low resolution, contrast, contour sharpness, or SN ratio. Conversely, a thermal image sensor built around a large sensor is expensive.

Meanwhile, in fields such as home monitoring, smart buildings, and crime prevention, there are services that identify human behavior or posture and detect abnormal behavior. Human postures include standing, sitting, and lying down. From the viewpoint of privacy protection, thermal image sensors have the advantage of a lower barrier to introduction than visible cameras.

There is a known technique that generates a trained model from visible images or distance images together with posture information (correct answers) for the subject, and then uses the trained model to estimate posture information from a visible image or a distance image (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-97577 (page 5)

Patent Document 1 describes a posture estimation device that estimates a posture from a visible image or a distance image. With this device, however, thermal images pose a problem: their image quality, such as resolution or SN ratio, is lower than that of visible or distance images, so posture estimation from them is not easy.

Accordingly, an object of one or more aspects of the present disclosure is to enable highly accurate estimation of the posture of a subject in a thermal image.

A learning device according to one aspect of the present disclosure includes: a data acquisition unit that acquires training data including a thermal image, which images the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image, which images the subject using visible light reflected from the subject; a contour extraction unit that extracts, from the thermal image, a contour image showing the contour of the subject; and a model generation unit that generates a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning inference from that combination to the visible image. The trained model is formed by the layers of a decoder portion and the layers of an encoder portion, and the decoder portion has two parallel paths: a path for decoding the thermal image and a path for decoding the contour image.
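This passage does not say how the contour extraction unit derives the contour image from the thermal image. As an illustrative assumption only, the sketch below marks pixels where a simple central-difference temperature gradient exceeds a threshold; the function name `extract_contour` and the threshold value are hypothetical, not taken from the disclosure.

```python
from typing import List

Image = List[List[float]]


def extract_contour(thermal: Image, threshold: float) -> List[List[int]]:
    """Return a binary contour image: 1 where the local temperature gradient is large.

    Hypothetical sketch: central differences in the interior; the 1-pixel
    border is left at 0 because it has no neighbours on both sides.
    """
    h, w = len(thermal), len(thermal[0])
    contour = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (thermal[r][c + 1] - thermal[r][c - 1]) / 2.0  # horizontal gradient
            gy = (thermal[r + 1][c] - thermal[r - 1][c]) / 2.0  # vertical gradient
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                contour[r][c] = 1
    return contour


# A 20 degree background meeting a 30 degree subject at column 2.
frame = [[20.0, 20.0, 30.0, 30.0, 30.0] for _ in range(5)]
print(extract_contour(frame, threshold=1.0)[2])
```

Here the pixels on either side of the temperature step are flagged as contour. A real system would more likely use an established edge detector; the point is only that the contour image is derived from the thermal image alone.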

A utilization device according to one aspect of the present disclosure includes: a storage unit that stores a trained model for inferring a visible image from the combination of a thermal image and a contour image, the trained model having been generated by learning inference from that combination to the visible image using training data and contour image data, where the training data includes a thermal image, which images the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image, which images the subject using visible light reflected from the subject, and the contour image data indicates a contour image, extracted from the thermal image, that shows the contour of the subject; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; a contour extraction unit that extracts, from the target thermal image, a target contour image showing the contour of the target subject; an inference unit that uses the trained model to infer a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image; and a posture estimation unit that estimates the posture of the target subject from the target visible image. The trained model is formed by the layers of a decoder portion and the layers of an encoder portion, and the decoder portion has two parallel paths: a path for decoding the thermal image and a path for decoding the contour image.

A program according to one aspect of the present disclosure causes a computer to function as: a data acquisition unit that acquires training data including a thermal image, which images the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image, which images the subject using visible light reflected from the subject; a contour extraction unit that extracts, from the thermal image, a contour image showing the contour of the subject; and a model generation unit that generates a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning inference from that combination to the visible image. The trained model is formed by the layers of a decoder portion and the layers of an encoder portion, and the decoder portion has two parallel paths: a path for decoding the thermal image and a path for decoding the contour image.

A program according to one aspect of the present disclosure causes a computer to function as: a storage unit that stores a trained model for inferring a visible image from the combination of a thermal image and a contour image, the trained model having been generated by learning inference from that combination to the visible image using training data and contour image data, where the training data includes a thermal image, which images the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image, which images the subject using visible light reflected from the subject, and the contour image data indicates a contour image, extracted from the thermal image, that shows the contour of the subject; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; a contour extraction unit that extracts, from the target thermal image, a target contour image showing the contour of the target subject; an inference unit that uses the trained model to infer a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image; and a posture estimation unit that estimates the posture of the target subject from the target visible image. The trained model is formed by the layers of a decoder portion and the layers of an encoder portion, and the decoder portion has two parallel paths: a path for decoding the thermal image and a path for decoding the contour image.

A learning method according to one aspect of the present disclosure acquires training data including a thermal image, which images the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image, which images the subject using visible light reflected from the subject; extracts, from the thermal image, a contour image showing the contour of the subject; and generates a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning inference from that combination to the visible image. The trained model is formed by the layers of a decoder portion and the layers of an encoder portion, and the decoder portion has two parallel paths: a path for decoding the thermal image and a path for decoding the contour image.

A utilization method according to one aspect of the present disclosure acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; extracts, from the target thermal image, a target contour image showing the contour of the target subject; uses a trained model to infer a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image; and estimates the posture of the target subject from the target visible image. The trained model infers a visible image from the combination of a thermal image and a contour image, and was generated by learning inference from that combination to the visible image using training data and contour image data, where the training data includes a thermal image, which images the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image, which images the subject using visible light reflected from the subject, and the contour image data indicates a contour image, extracted from the thermal image, that shows the contour of the subject. The trained model is formed by the layers of a decoder portion and the layers of an encoder portion, and the decoder portion has two parallel paths: a path for decoding the thermal image and a path for decoding the contour image.

According to one or more aspects of the present disclosure, the posture of a subject in a thermal image can be estimated with high accuracy.

FIG. 1 is a block diagram schematically showing the configuration of the posture estimation system according to Embodiments 1 and 2.
FIG. 2 is a block diagram schematically showing the configuration of the learning device in Embodiments 1 and 2.
FIG. 3 is a schematic diagram showing an example of a three-layer neural network.
FIG. 4 is a schematic diagram showing an example of the structure of the trained model for the image conversion process that converts a thermal image into a visible image in Embodiment 1.
FIG. 5 is a block diagram schematically showing the configuration of a computer.
FIG. 6 is a flowchart showing the learning process performed by the learning device.
FIG. 7 is a block diagram schematically showing the configuration of the posture estimation device in Embodiment 1.
FIG. 8 is a flowchart showing the process in which the posture estimation device infers a visible image corresponding to a thermal image and estimates the posture from that visible image.
FIG. 9 is a block diagram schematically showing the configuration of the posture estimation device in Embodiment 2.
FIG. 10 is a block diagram schematically showing the configuration of the learning device in Embodiment 3.
FIG. 11 is a schematic diagram showing an example of the structure of the trained model for the image conversion process that converts a thermal image and a contour image into a visible image in Embodiment 3.
FIG. 12 is a block diagram schematically showing the configuration of the posture estimation device in Embodiment 3.

Embodiment 1.
FIG. 1 is a block diagram schematically showing the configuration of a posture estimation system 100 according to Embodiment 1.
The posture estimation system 100 includes a learning device 110 that functions as a model generation device and a posture estimation device 130 that functions as a utilization device. The processing method performed by the posture estimation device 130 constitutes the utilization method.
In the posture estimation system 100, the posture estimation device 130 estimates posture using the trained model learned by the learning device 110.

FIG. 2 is a block diagram schematically showing the configuration of the learning device 110.
The learning device 110 includes a learning-side input unit 111, a learning-side data acquisition unit 112, a model generation unit 113, a learning-side trained model storage unit 114, and a learning-side communication unit 115.

The learning-side input unit 111 is an input unit that receives training data. The input training data is given to the learning-side data acquisition unit 112.
Here, the training data is teacher data indicating combinations of a thermal image and a visible image, the visible image serving as the correct answer to be inferred from the thermal image.

A thermal image is acquired by imaging the temperature distribution of a subject using infrared rays emitted from the subject.
A visible image is acquired by imaging the subject using visible light reflected from the subject; it captures the subject's appearance.

The learning-side data acquisition unit 112 is a data acquisition unit that acquires training data via the learning-side input unit 111. The acquired training data is given to the model generation unit 113.

The model generation unit 113 learns the visible image corresponding to a thermal image based on the training data given by the learning-side data acquisition unit 112. In other words, by learning the combinations of thermal and visible images in the training data, the model generation unit 113 generates a trained model for inferring the optimal visible image corresponding to a thermal image. Specifically, the model generation unit 113 uses the training data to learn inference from a thermal image to a visible image, thereby generating a trained model for inferring a visible image from a thermal image.
The model generation unit 113 then stores the generated trained model in the learning-side trained model storage unit 114 as the learning-side trained model.

The model generation unit 113 can use a known learning algorithm such as supervised learning, unsupervised learning, or reinforcement learning. As an example, the case of applying a neural network is described here.

In supervised learning, the thermal image and the visible image in the training data must be a pair capturing the same subject. In unsupervised learning, the thermal image and the visible image need not capture the same subject.

The model generation unit 113 learns the visible image corresponding to a thermal image by, for example, so-called supervised learning in accordance with a neural network model.
Supervised learning is a method in which sets of input and result (label) data are given to a learning device as training data, so that the device learns the features of that training data and infers results from inputs.

A neural network consists of an input layer, an intermediate (hidden) layer, and an output layer, each made up of a plurality of neurons. There may be one intermediate layer, or two or more.

FIG. 3 is a schematic diagram showing an example of a three-layer neural network.
As shown in FIG. 3, in a three-layer neural network, when a plurality of input values are input to input layers X1 to X3, the input values are multiplied by first weights w11 to w16 (hereinafter also collectively referred to as the first weight W1). The resulting calculated values are input to intermediate layers Y1 and Y2. The calculated values are then multiplied by second weights w21 to w26 (hereinafter also collectively referred to as the second weight W2), and the resulting output values are output from output layers Z1 to Z3. These output values vary with the values of the first weight W1 and the second weight W2.
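The computation of FIG. 3 can be sketched as two matrix products. This is a minimal sketch under stated assumptions: the six first weights are arranged as a 3x2 matrix `W1[i][j]` (input Xi+1 to intermediate Yj+1) and the six second weights as a 2x3 matrix `W2[j][k]`; the exact mapping of w11 to w16 and w21 to w26 onto matrix positions is assumed, and, like the description, the sketch applies no activation function or bias.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def forward(x: List[float], W1: Matrix, W2: Matrix) -> Tuple[List[float], List[float]]:
    """Forward pass of the 3-2-3 network in FIG. 3 (linear: no activation, no bias)."""
    n_in, n_hidden = len(W1), len(W1[0])
    n_out = len(W2[0])
    # Intermediate layer: Y_j = sum_i X_i * W1[i][j]
    y = [sum(x[i] * W1[i][j] for i in range(n_in)) for j in range(n_hidden)]
    # Output layer: Z_k = sum_j Y_j * W2[j][k]
    z = [sum(y[j] * W2[j][k] for j in range(n_hidden)) for k in range(n_out)]
    return y, z


y, z = forward([1, 2, 3], W1=[[1, 0], [0, 1], [1, 1]], W2=[[1, 0, 2], [0, 1, 1]])
print(y, z)  # [4, 5] [4, 5, 13]
```

Changing any entry of `W1` or `W2` changes `z`, which is the sense in which the output "varies with" the two weight sets.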

In the present embodiment, the neural network learns, by so-called supervised learning, a trained model for inferring the optimal visible image corresponding to a thermal image, using training data created from the combinations of thermal and visible images acquired by the learning-side data acquisition unit 112.

That is, the neural network learns the trained model by inputting a thermal image to the input layer and adjusting the first weight W1 and the second weight W2 so that the result output from the output layer approaches the visible image given as the correct answer.
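The weight adjustment described above can be illustrated with plain gradient descent on the FIG. 3 network. This sketch assumes a squared-error loss and a fixed learning rate `lr`; the disclosure names neither, and the toy input and target values are made up. A real image-to-image model would be trained on image tensors with a deep-learning framework.

```python
from typing import List

Matrix = List[List[float]]


def forward(x: List[float], W1: Matrix, W2: Matrix):
    y = [sum(x[i] * W1[i][j] for i in range(len(W1))) for j in range(len(W1[0]))]
    z = [sum(y[j] * W2[j][k] for j in range(len(W2))) for k in range(len(W2[0]))]
    return y, z


def loss(z: List[float], t: List[float]) -> float:
    # Squared error between the network output and the correct answer t
    return 0.5 * sum((zk - tk) ** 2 for zk, tk in zip(z, t))


def train_step(x: List[float], t: List[float], W1: Matrix, W2: Matrix, lr: float) -> float:
    """Update W1 and W2 in place to move the output toward t; return the pre-update loss."""
    y, z = forward(x, W1, W2)
    err = [zk - tk for zk, tk in zip(z, t)]  # dLoss/dz_k
    # Gradient w.r.t. the intermediate values, computed with the *old* W2
    dy = [sum(W2[j][k] * err[k] for k in range(len(err))) for j in range(len(y))]
    for j in range(len(y)):
        for k in range(len(err)):
            W2[j][k] -= lr * y[j] * err[k]  # dLoss/dW2[j][k] = y_j * err_k
    for i in range(len(x)):
        for j in range(len(y)):
            W1[i][j] -= lr * x[i] * dy[j]  # dLoss/dW1[i][j] = x_i * dy_j
    return loss(z, t)


x, t = [1.0, 0.5, -0.5], [0.2, 0.4, 0.6]  # toy "thermal" input and "visible" target
W1 = [[0.1, -0.1], [0.2, 0.1], [-0.1, 0.3]]
W2 = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]
for _ in range(500):
    train_step(x, t, W1, W2, lr=0.1)
print(loss(forward(x, W1, W2)[1], t))
```

The gradient for `W1` is computed before `W2` is modified; updating `W2` first would mix old and new weights in a single step.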

FIG. 4 is a schematic diagram showing an example of the structure of a trained model for the image conversion process that converts a thermal image into a visible image.
The trained model shown in FIG. 4 has a U-Net structure: the layers of the decoder portion and the layers of the encoder portion are symmetrical and are connected by skip connections.
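FIG. 4 itself is not reproduced here, so the following is only a structural sketch of a generic symmetric U-Net. It traces feature-map shapes through encoder levels, a bottleneck, and mirrored decoder levels that concatenate the same-sized encoder output over a skip connection. The depth, channel counts, 2x2 down/upsampling, same-padding convolutions, single-channel thermal input, and three-channel (RGB) visible output are all assumptions, and `unet_shapes` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def unet_shapes(h: int, w: int, in_ch: int = 1, base_ch: int = 16,
                depth: int = 3, out_ch: int = 3) -> Dict[str, object]:
    """Trace feature-map shapes through a symmetric U-Net (structure only, no weights)."""
    if h % (2 ** depth) or w % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")
    encoder: List[Shape] = []
    ch, cur_h, cur_w = base_ch, h, w
    for _ in range(depth):
        encoder.append((ch, cur_h, cur_w))  # conv block output (same padding assumed)
        cur_h, cur_w, ch = cur_h // 2, cur_w // 2, ch * 2  # 2x2 downsample, double channels
    bottleneck: Shape = (ch, cur_h, cur_w)
    decoder: List[Tuple[int, Shape]] = []
    for skip_ch, skip_h, skip_w in reversed(encoder):  # mirror the encoder levels
        cur_h, cur_w = cur_h * 2, cur_w * 2  # 2x upsample back to the mirrored level
        # Up-convolution reduces channels to skip_ch; concatenating the skip connection
        # doubles them, and the decoder conv block maps 2 * skip_ch back to skip_ch.
        decoder.append((2 * skip_ch, (skip_ch, cur_h, cur_w)))
    return {
        "input": (in_ch, h, w),
        "encoder": encoder,
        "bottleneck": bottleneck,
        "decoder": decoder,  # (channels after concatenation, block output shape)
        "output": (out_ch, h, w),  # 1x1 conv to the visible-image channels
    }


for key, value in unet_shapes(64, 64).items():
    print(key, value)
```

Each decoder level matches the spatial size of the encoder level it mirrors. That symmetry is what allows the skip connections to pass fine spatial detail from the thermal input directly to the decoder.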

Returning to FIG. 2, the learning-side trained model storage unit 114 stores the learning-side trained model, which is the trained model given by the model generation unit 113.

The learning-side communication unit 115 sends the learning-side trained model stored in the learning-side trained model storage unit 114 to the posture estimation device 130.

The learning device 110 described above can be realized by a computer 160 such as the one shown in FIG. 5.
FIG. 5 is a block diagram schematically showing the configuration of the computer 160.
The computer 160 includes a communication device 161, an auxiliary storage device 162, a memory 163, and a processor 164.

The communication device 161 communicates data via, for example, a network.
The auxiliary storage device 162 stores the data and programs required for processing on the computer 160.
The memory 163 temporarily stores programs and data and provides a work area for the processor 164.
The processor 164 reads a program stored in the auxiliary storage device 162 into the memory 163 and executes it, thereby carrying out the processing of the computer 160.

The learning-side input unit 111 and the learning-side communication unit 115 described above can be realized by the communication device 161.
The learning-side trained model storage unit 114 can be realized by the auxiliary storage device 162.

The learning-side data acquisition unit 112 and the model generation unit 113 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided over a network or recorded on a recording medium; that is, it may be provided as, for example, a program product.

FIG. 6 is a flowchart showing the learning process performed by the learning device 110.
First, the learning-side data acquisition unit 112 acquires training data via the learning-side input unit 111 (S10). Here, the thermal image data (image data of a thermal image) and the visible image data (image data of a visible image) used as training data are assumed to be acquired at the same time, but Embodiment 1 is not limited to this example. As long as the thermal image data can be associated with the visible image data used as its correct answer, the two may be acquired at different timings. The acquired training data is given to the model generation unit 113.
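When thermal and visible frames are captured at different timings, they must still be associated before training. The disclosure does not specify how. One plausible approach, shown here purely as an assumption, is to pair each thermal frame with the visible frame nearest in capture time and drop pairs whose time gap exceeds a tolerance. The function name `pair_by_timestamp` and the tolerance value are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def pair_by_timestamp(thermal_ts: List[float], visible_ts: List[float],
                      tolerance: float) -> List[Tuple[int, int]]:
    """Associate each thermal frame with the visible frame closest in time.

    thermal_ts, visible_ts: capture times in seconds (visible_ts must be sorted).
    Returns (thermal_index, visible_index) pairs whose time gap is <= tolerance;
    thermal frames without a close-enough visible frame are dropped.
    """
    pairs = []
    for ti, t in enumerate(thermal_ts):
        pos = bisect_left(visible_ts, t)
        # Candidates: the visible frames just before and just after time t
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(visible_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(visible_ts[j] - t))
        if abs(visible_ts[best] - t) <= tolerance:
            pairs.append((ti, best))
    return pairs


print(pair_by_timestamp([0.0, 1.0, 2.0, 5.0], [0.02, 0.98, 2.3], tolerance=0.05))
# [(0, 0), (1, 1)]
```

A tight tolerance keeps the pairs consistent with the same-subject requirement for supervised learning, at the cost of discarding some frames.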

Next, based on the combinations of thermal and visible images in the training data, the model generation unit 113 learns, by so-called supervised learning, the visible image that is the output corresponding to a thermal image, and generates a trained model (S11).

Next, the learning-side trained model storage unit 114 stores the generated trained model (S12). The learning-side communication unit 115 then transmits the trained model to the posture estimation device 130.
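The S10 to S12 flow can be summarized as a small orchestration sketch. The class `LearningDevice` and its injected callables are hypothetical stand-ins for units 112, 113, and 115. An in-memory list stands in for storage unit 114, and no real data source, training routine, or network transport is implied.

```python
from typing import Any, Callable, List


class LearningDevice:
    """Hypothetical mirror of learning device 110, wired in the order of FIG. 6."""

    def __init__(self, acquire: Callable[[], Any], generate: Callable[[Any], Any],
                 send: Callable[[Any], None]) -> None:
        self.acquire = acquire  # learning-side data acquisition unit 112
        self.generate = generate  # model generation unit 113
        self.send = send  # learning-side communication unit 115
        self.stored_models: List[Any] = []  # learning-side trained model storage unit 114

    def learn(self) -> Any:
        training_data = self.acquire()  # S10: acquire training data
        model = self.generate(training_data)  # S11: generate the trained model
        self.stored_models.append(model)  # S12: store the trained model
        self.send(model)  # transmit to posture estimation device 130
        return model


sent: List[Any] = []
device = LearningDevice(acquire=lambda: [("thermal", "visible")],
                        generate=lambda data: {"num_pairs": len(data)},
                        send=sent.append)
device.learn()
```

Injecting the three steps as callables keeps the flow independent of the training algorithm, which matches the description's statement that any known learning algorithm may be used.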

FIG. 7 is a block diagram schematically showing the configuration of the posture estimation device 130.
The posture estimation device 130 includes an inference device 140 and a posture estimation execution device 150 that functions as a posture estimation unit.

The inference device 140 infers a visible image from a thermal image using the trained model given by the learning device 110 as the inference-side trained model.
The inference device 140 includes an inference-side communication unit 141, an inference-side trained model storage unit 142, an inference-side input unit 143, an inference-side data acquisition unit 144, and an inference unit 145.

The inference-side communication unit 141 receives the trained model from the learning device 110 and stores it in the inference-side trained model storage unit 142 as the inference-side trained model.
The inference-side trained model storage unit 142 is a storage unit that stores the inference-side trained model.

The inference-side input unit 143 is an input unit that receives thermal image data indicating a thermal image of a subject. The thermal image data input here is also referred to as target thermal image data. The thermal image indicated by the target thermal image data is also referred to as the target thermal image, and the subject in the target thermal image whose posture is to be estimated is also referred to as the target subject.
The inference-side data acquisition unit 144 is a data acquisition unit that acquires the target thermal image data via the inference-side input unit 143. The acquired target thermal image data is given to the inference unit 145.

The inference unit 145 uses the inference-side trained model stored in the inference-side trained model storage unit 142 to infer a visible image of the target subject from the thermal image indicated by the target thermal image data. In other words, by inputting that thermal image into the inference-side trained model, the inference unit 145 obtains the visible image that the model infers as corresponding to the thermal image. The inference unit 145 then generates visible image data indicating the inferred visible image and gives it to the posture estimation execution device 150. The visible image data generated here is also referred to as target visible image data, and the visible image it indicates (that is, the inferred visible image) is also referred to as the target visible image.

The posture estimation execution device 150 estimates the posture of the subject present in the visible image indicated by the target visible image data. One way to estimate the posture is to learn in advance, from a large amount of data, the correspondence between visible images and human postures (for example, the positional relationship of body parts); when a visible image is input, the posture of the person in it is then determined based on the learning result.

The posture estimation device 130 described above can also be realized by a computer 160 such as the one shown in FIG. 5.
For example, the inference side communication unit 141 and the inference side input unit 143 can be realized by the communication device 161.
The inference side learned model storage unit 142 can be realized by the auxiliary storage device 162.

The inference side data acquisition unit 144 and the inference unit 145 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided over a network, or may be recorded on and provided via a recording medium. That is, such a program may be provided as, for example, a program product.

FIG. 8 is a flowchart showing the process in which the posture estimation device 130 infers a visible image corresponding to a thermal image and estimates the posture from that visible image.
First, the inference side data acquisition unit 144 acquires, via the inference side input unit 143, target thermal image data indicating a thermal image (S20). The acquired target thermal image data is given to the inference unit 145.

Next, the inference unit 145 inputs the thermal image indicated by the target thermal image data into the inference side trained model stored in the inference side learned model storage unit 142, and obtains a visible image corresponding to the thermal image (S21).

Next, the inference unit 145 generates target visible image data indicating the visible image that the inference side trained model produced for the thermal image, and gives the target visible image data to the posture estimation execution device 150 (S22).

Next, the posture estimation execution device 150 estimates the posture of the subject in the visible image indicated by the target visible image data (S23). Based on the posture estimated in this way, it is possible, for example, to detect abnormal behavior of the subject captured in the thermal image.
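The S20 to S23 flow can be sketched as a small Python function. This is a hedged illustration, not the patent's implementation: the trained model and the visible-image posture estimator are passed in as plain callables, and the toy stand-ins (a min-max rescaling for the model and a brightest-pixel "keypoint" for the estimator) are hypothetical placeholders chosen only so the example runs.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical representations: images are 2-D float arrays and a posture is a
# K x 2 array of (x, y) keypoints. The text does not fix either format.
Image = np.ndarray
Posture = np.ndarray


def estimate_posture(
    thermal: Image,
    thermal_to_visible: Callable[[Image], Image],        # stands in for the trained model used by unit 145
    visible_pose_estimator: Callable[[Image], Posture],  # stands in for device 150
) -> Tuple[Image, Posture]:
    # S20: target thermal image data has been acquired (here it is simply passed in).
    if thermal.ndim != 2:
        raise ValueError("expected a single-channel 2-D thermal image")
    # S21: input the thermal image into the trained model to obtain a visible image.
    visible = thermal_to_visible(thermal)
    # S22: the inferred (target) visible image is handed to the posture estimator.
    # S23: an existing visible-image posture estimator returns the posture.
    posture = visible_pose_estimator(visible)
    return visible, posture


def toy_model(t: Image) -> Image:
    # Placeholder "model": rescale temperatures to [0, 1].
    span = t.max() - t.min()
    return (t - t.min()) / span if span > 0 else np.zeros_like(t, dtype=float)


def toy_pose(v: Image) -> Posture:
    # Placeholder "estimator": a single keypoint at the brightest pixel.
    y, x = np.unravel_index(np.argmax(v), v.shape)
    return np.array([[x, y]], dtype=float)


if __name__ == "__main__":
    frame = np.zeros((4, 4))
    frame[1, 2] = 36.5  # one warm pixel
    _, pose = estimate_posture(frame, toy_model, toy_pose)
    print(pose.tolist())  # [[2.0, 1.0]]
```

Injecting the estimator as a callable mirrors the point of the first embodiment: the visible-image posture estimator itself does not need to change.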

As described above, the posture estimation system 100 according to the first embodiment converts a thermal image output from a thermal image sensor or the like into a visible image, and can then estimate the posture of the subject in the thermal image using the posture estimation execution device 150, a posture estimator already trained for visible images. Existing trained posture estimators for visible images can therefore be reused for posture estimation.

Further, using a posture estimator designed for thermal images would require it to learn the relationship between thermal images and postures, which in turn requires annotating postures on thermal images. Such manual annotation cannot be performed with sufficient accuracy because thermal images lack resolution. The first embodiment does not need a posture estimator for thermal images and therefore avoids these problems.

In the first embodiment, the case where supervised learning is applied as the learning algorithm used by the model generation unit 113 has been described, but the first embodiment is not limited to this example. For example, reinforcement learning, unsupervised learning, semi-supervised learning, or the like can be used as the learning algorithm instead of supervised learning.

The model generation unit 113 may also learn the visible image corresponding to a thermal image from learning data created for a plurality of posture estimation devices, including the posture estimation device 130. The model generation unit 113 may acquire the learning data from a plurality of posture estimation devices used in the same area, or may learn the visible image corresponding to a thermal image using learning data collected from a plurality of posture estimation devices that operate independently in different areas.

The model generation unit 113 can also add posture estimation devices to, or remove them from, the set of devices from which learning data is collected, partway through operation.
Further, the model generation unit 113 may apply a trained model that has learned the visible images corresponding to thermal images for one posture estimation device to a different posture estimation device, and update the trained model by relearning the visible images corresponding to thermal images for that different device.
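Pooling learning data across devices and relearning for another device can be illustrated with a deliberately tiny model. In this hedged sketch, a two-parameter affine map fitted by gradient descent is a swapped-in stand-in for the trained model (it is not the patent's learner). The device names and the `pooled_pairs` helper are hypothetical.

```python
from typing import Dict, List, Set, Tuple

import numpy as np

Pair = Tuple[np.ndarray, np.ndarray]  # (thermal image, visible image)


class AffineThermalToVisible:
    """Toy stand-in for the trained model: visible ~= a * thermal + b.

    Hypothetical simplification; the document's model is a learned
    image-to-image network, not a two-parameter map.
    """

    def __init__(self, a: float = 1.0, b: float = 0.0) -> None:
        self.a, self.b = a, b

    def predict(self, thermal: np.ndarray) -> np.ndarray:
        return self.a * thermal + self.b

    def fit(self, pairs: List[Pair], steps: int = 2000, lr: float = 0.1) -> "AffineThermalToVisible":
        # Gradient descent on mean squared error. Calling fit() again on an
        # already-trained instance continues from the current parameters,
        # which plays the role of "relearn and update" for another device.
        t = np.concatenate([th.ravel() for th, _ in pairs])
        v = np.concatenate([vis.ravel() for _, vis in pairs])
        for _ in range(steps):
            err = self.a * t + self.b - v
            self.a -= lr * 2.0 * float(np.mean(err * t))
            self.b -= lr * 2.0 * float(np.mean(err))
        return self


def pooled_pairs(datasets: Dict[str, List[Pair]], active: Set[str]) -> List[Pair]:
    # Gather training pairs only from the devices currently selected; devices
    # can be added to or removed from `active` between training runs.
    return [pair for dev in sorted(active) for pair in datasets[dev]]


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 16).reshape(4, 4)
    data = {"device_a": [(t, 2.0 * t + 1.0)], "device_b": [(t, 2.0 * t + 1.5)]}
    model = AffineThermalToVisible().fit(pooled_pairs(data, {"device_a"}))
    model.fit(data["device_b"])  # warm-start on another device's data
    print(round(model.a, 3), round(model.b, 3))
```

Warm-starting from the first device's parameters is the design choice being illustrated: only the part of the mapping that differs on the second device has to be relearned.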

Deep learning, which learns to extract the features themselves, can also be used as the learning algorithm of the model generation unit 113. The model generation unit 113 may also execute machine learning according to other known methods, such as genetic programming, functional logic programming, or a support vector machine.

The learning device 110 and the inference device 140 are used to learn the visible image corresponding to a thermal image in the posture estimation system 100; they may, for example, be connected to the posture estimation execution device 150 via a network.
Further, the learning device 110, the inference device 140, or the posture estimation execution device 150 may reside on a cloud server.

In the posture estimation system 100 according to the first embodiment described above, the learning device 110 and the posture estimation device 130 are separate devices; however, the learning device 110 may, for example, be provided inside the posture estimation device 130. In that case, the learning side communication unit 115 and the inference side communication unit 141 become unnecessary, and the learning side learned model storage unit 114 and the inference side learned model storage unit 142 can be integrated into a single learned model storage unit.

In the posture estimation system 100 according to the first embodiment, the posture estimation device 130 infers a visible image corresponding to a thermal image using the trained model generated by the learning device 110, but the embodiment is not limited to this example. For example, the posture estimation device 130 may acquire a trained model from an external source such as another system and infer the visible image corresponding to a thermal image based on that trained model.

Embodiment 2.
As shown in FIG. 1, the posture estimation system 200 according to the second embodiment includes a learning device 210 and a posture estimation device 230.

As shown in FIG. 2, the learning device 210 according to the second embodiment includes the learning side input unit 111, a learning side data acquisition unit 212, a model generation unit 213, the learning side learned model storage unit 114, and the learning side communication unit 115.
The learning side input unit 111, the learning side learned model storage unit 114, and the learning side communication unit 115 of the learning device 210 in the second embodiment are the same as those of the learning device 110 in the first embodiment.

The learning side data acquisition unit 212 acquires learning data via the learning side input unit 111. The learning data acquired in the second embodiment includes thermal image data indicating a thermal image, visible image data indicating a visible image that is the correct answer for that thermal image, and posture information indicating the posture of the subject, which is the correct answer for that visible image. The acquired learning data is given to the model generation unit 213.

Based on the learning data given from the learning side data acquisition unit 212, the model generation unit 213 learns the visible image corresponding to a thermal image and the posture corresponding to that visible image. In other words, by learning the combinations of thermal images with visible images, and of visible images with postures, shown in the learning data, the model generation unit 213 generates a trained model for inferring the optimum posture corresponding to a thermal image. Specifically, the model generation unit 213 uses the learning data to learn inference from thermal images to visible images and from visible images to postures, thereby generating a trained model for inferring the posture from a thermal image.
The model generation unit 213 then stores the generated trained model in the learning side learned model storage unit 114 as the learning side trained model.
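One plausible training objective for such a model, sketched under assumptions the text does not state: the posture is represented as K keypoints, both stages use squared-error losses, and the two terms are combined with a hypothetical `pose_weight`. The visible image acts as an intermediate correct answer, and the posture label belongs to the visible image rather than being annotated on the thermal image.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Sample:
    thermal: np.ndarray    # H x W thermal image (model input)
    visible: np.ndarray    # H x W visible image: correct answer for the thermal image
    keypoints: np.ndarray  # K x 2 (x, y) keypoints: correct answer for the visible image


def two_stage_loss(pred_visible: np.ndarray, pred_keypoints: np.ndarray,
                   sample: Sample, pose_weight: float = 1.0) -> float:
    # Thermal -> visible term: pixel-wise squared error against the paired visible image.
    image_term = float(np.mean((pred_visible - sample.visible) ** 2))
    # Visible -> posture term: mean squared keypoint distance against the posture label.
    pose_term = float(np.mean(np.sum((pred_keypoints - sample.keypoints) ** 2, axis=1)))
    return image_term + pose_weight * pose_term


if __name__ == "__main__":
    s = Sample(np.zeros((2, 2)), np.zeros((2, 2)), np.array([[0.0, 0.0]]))
    print(two_stage_loss(np.ones((2, 2)), np.array([[3.0, 4.0]]), s, pose_weight=0.1))
```

At inference time only the posture output would be used, which is consistent with the second embodiment not outputting a visible image.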

FIG. 9 is a block diagram schematically showing the configuration of the posture estimation device 230 according to the second embodiment.
The posture estimation device 230 includes the inference side communication unit 141, the inference side learned model storage unit 142, the inference side input unit 143, the inference side data acquisition unit 144, and an inference unit 245.

The inference side communication unit 141, the inference side learned model storage unit 142, the inference side input unit 143, and the inference side data acquisition unit 144 of the posture estimation device 230 according to the second embodiment are the same as those of the posture estimation device 130 according to the first embodiment.

The inference unit 245 uses the inference side trained model stored in the inference side learned model storage unit 142 to infer a visible image from the thermal image indicated by the target thermal image data, and to infer the posture from that visible image. In other words, by inputting the thermal image indicated by the target thermal image data into the inference side trained model, the inference unit 245 estimates the posture of the subject present in the thermal image.

The posture estimation device 230 described above can also be realized by a computer 160 such as the one shown in FIG. 5.
For example, the inference side communication unit 141 and the inference side input unit 143 can be realized by the communication device 161.
The inference side learned model storage unit 142 can be realized by the auxiliary storage device 162.

The inference side data acquisition unit 144 and the inference unit 245 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided over a network, or may be recorded on and provided via a recording medium. That is, such a program may be provided as, for example, a program product.

As described above, the posture estimation system 200 according to the second embodiment can estimate the posture of a subject from a thermal image output by a thermal image sensor or the like. Because visible images and postures are supplied as teacher data during learning, annotating posture information on thermal images can be avoided.

Furthermore, unlike the first embodiment, no visible image is generated or output in the utilization phase, so the scale of the network can be kept small and the amount of computation reduced.

Embodiment 3.
As shown in FIG. 1, the posture estimation system 300 according to the third embodiment includes a learning device 310 and a posture estimation device 330.

FIG. 10 is a block diagram schematically showing the configuration of the learning device 310.
The learning device 310 includes the learning side input unit 111, a learning side data acquisition unit 312, a model generation unit 313, the learning side learned model storage unit 114, the learning side communication unit 115, and a learning side contour extraction unit 316.

The learning side input unit 111, the learning side learned model storage unit 114, and the learning side communication unit 115 of the learning device 310 in the third embodiment are the same as those of the learning device 110 in the first embodiment.

The learning side data acquisition unit 312 acquires learning data via the learning side input unit 111. The acquired learning data is given to the model generation unit 313.
The learning side data acquisition unit 312 also gives the thermal image data indicating the thermal image included in the acquired learning data to the learning side contour extraction unit 316 as learning side thermal image data.

The learning side contour extraction unit 316 is a contour extraction unit that extracts, from the thermal image indicated by the learning side thermal image data, a contour image showing the contour of the subject. Extraction methods include edge detection processing such as the Canny method or the Sobel method, and a combination of binarization processing and edge detection. Edge detection processing detects the edges of the subject. When binarization and edge detection are combined, the thermal image is first binarized and edge detection is then performed on the result. The learning side contour extraction unit 316 then gives contour image data indicating the extracted contour image to the model generation unit 313 as learning side contour image data.
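Both extraction variants can be sketched in NumPy with a Sobel operator. This is a hedged illustration, not the patent's extractor: the relative `edge_threshold` and the optional `binarize_threshold` are hypothetical parameters. The Canny variant is not reimplemented here; with OpenCV it would typically be a call such as `cv2.Canny(image, threshold1, threshold2)` on an 8-bit image.

```python
from typing import Optional

import numpy as np


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude from the 3x3 Sobel kernels, with edge-replicated borders."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # x kernel: right column minus left column; y kernel: bottom row minus top row;
    # the centre row/column is weighted by 2 in both.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def extract_contour(thermal: np.ndarray,
                    binarize_threshold: Optional[float] = None,
                    edge_threshold: float = 0.5) -> np.ndarray:
    img = thermal.astype(float)
    if binarize_threshold is not None:
        # Variant 2: binarize first (e.g. warm subject vs. cooler background), then detect edges.
        img = (img >= binarize_threshold).astype(float)
    mag = sobel_magnitude(img)
    peak = mag.max()
    if peak == 0:
        # Uniform image: no edges at all.
        return np.zeros(img.shape, dtype=np.uint8)
    # Normalize the magnitude and keep pixels above the relative threshold as contour.
    return (mag / peak >= edge_threshold).astype(np.uint8)


if __name__ == "__main__":
    frame = np.zeros((8, 8))
    frame[2:6, 2:6] = 36.0  # a warm square on a cold background
    print(extract_contour(frame))
    print(extract_contour(frame, binarize_threshold=30.0))
```

Binarizing first makes the result independent of how much warmer the subject is than the background, at the cost of discarding internal temperature edges.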

The model generation unit 313 learns the visible image corresponding to a thermal image based on the learning data given by the learning side data acquisition unit 312 and the learning side contour image data given by the learning side contour extraction unit 316. In other words, by learning the combination of the thermal image shown in the learning data and the contour image shown in the learning side contour image data with the visible image shown in the learning data, the model generation unit 313 generates a trained model for inferring the optimum visible image corresponding to the thermal image and its contour image. Specifically, the model generation unit 313 learns inference from the combination of a thermal image and a contour image to a visible image, thereby generating a trained model for inferring a visible image from that combination.
The model generation unit 313 then stores the generated trained model in the learning side learned model storage unit 114 as the learning side trained model.
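Preparing the learning data for the model generation unit 313 might look like the following hedged sketch: each (thermal, visible) pair is extended with a contour computed from the thermal image. `contour_fn` is a hypothetical parameter, and the shape check encodes an assumption (spatially registered image pairs) that the text does not state. Passing the same function on the inference side keeps learning and inference inputs consistent.

```python
from typing import Callable, List, Tuple

import numpy as np

Triple = Tuple[np.ndarray, np.ndarray, np.ndarray]  # (thermal, contour, visible)


def build_training_triples(
    pairs: List[Tuple[np.ndarray, np.ndarray]],
    contour_fn: Callable[[np.ndarray], np.ndarray],
) -> List[Triple]:
    triples = []
    for thermal, visible in pairs:
        # Assumption: the thermal and visible images are spatially registered.
        if thermal.shape[:2] != visible.shape[:2]:
            raise ValueError("thermal and visible images must have the same height and width")
        # The contour is derived from the thermal image only; the visible image stays the target.
        triples.append((thermal, contour_fn(thermal), visible))
    return triples


if __name__ == "__main__":
    pairs = [(np.ones((2, 2)), np.zeros((2, 2, 3)))]
    triples = build_training_triples(pairs, lambda t: (t > 0).astype(np.uint8))
    print(len(triples), triples[0][1].tolist())
```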

FIG. 11 is a schematic diagram showing an example of the structure of the trained model for the image conversion process that converts a thermal image and a contour image into a visible image in the third embodiment.
The trained model shown in FIG. 11 has a U-Net structure in which the layers of the decoder portion and the layers of the encoder portion are arranged symmetrically and are connected by skip connections. The decoder portion has two parallel paths: one for decoding the thermal image and one for decoding the contour image.

Thus, in the trained model shown in FIG. 11, the decoder portion consists of two parallel paths, one decoding the thermal image and the other decoding the contour image. In the central layer of the model, the two pieces of vector information produced by these paths are concatenated, and the concatenated information is input to the encoder portion.
With this structure, the visible image converted from the thermal image in the third embodiment contains more edge components, which can improve the accuracy of posture estimation.
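The following NumPy sketch shows the data flow of such a dual-path U-Net with random, untrained weights. It rests on several assumptions not stated in the text: one downsampling level, average pooling, nearest-neighbor upsampling, a 3-channel output, and skip connections taken from both input paths. Note the naming: the text calls the two parallel input-side paths the "decoder portion" and the output side the "encoder portion", whereas U-Net literature usually calls the contracting side the encoder and the expanding side the decoder; the code follows the text's structure with neutral names.

```python
from typing import Dict, Tuple

import numpy as np


def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Same' 3x3 convolution (cross-correlation): x is H x W x Cin, w is 3 x 3 x Cin x Cout."""
    h, wd, _ = x.shape
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + wd, :] @ w[dy, dx]
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def avgpool2(x: np.ndarray) -> np.ndarray:
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample2(x: np.ndarray) -> np.ndarray:
    return x.repeat(2, axis=0).repeat(2, axis=1)


def init_params(c: int = 8, out_ch: int = 3, seed: int = 0) -> Dict[str, np.ndarray]:
    rng = np.random.default_rng(seed)

    def w(cin: int, cout: int, k: int = 3) -> np.ndarray:
        return rng.normal(0.0, np.sqrt(2.0 / (k * k * cin)), size=(k, k, cin, cout))

    return {
        "th1": w(1, c), "th2": w(c, c),  # thermal path, full and half resolution
        "co1": w(1, c), "co2": w(c, c),  # contour path, full and half resolution
        "up1": w(4 * c, c),              # upsampled 2c centre features + c + c skip features
        "out": w(c, out_ch, k=1),        # 1x1 projection to the visible-image channels
    }


def input_path(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    s1 = relu(conv3x3(x, w1))             # full-resolution features, reused as a skip connection
    s2 = relu(conv3x3(avgpool2(s1), w2))  # half-resolution features sent to the centre
    return s1, s2


def forward(thermal: np.ndarray, contour: np.ndarray, P: Dict[str, np.ndarray]) -> np.ndarray:
    if thermal.shape != contour.shape or thermal.shape[0] % 2 or thermal.shape[1] % 2:
        raise ValueError("inputs must share an even-sized H x W shape")
    t1, t2 = input_path(thermal[..., None].astype(float), P["th1"], P["th2"])
    c1, c2 = input_path(contour[..., None].astype(float), P["co1"], P["co2"])
    centre = np.concatenate([t2, c2], axis=-1)                     # concatenate the two paths at the centre
    merged = np.concatenate([upsample2(centre), t1, c1], axis=-1)  # skip connections from both paths
    h = relu(conv3x3(merged, P["up1"]))
    return h @ P["out"][0, 0]                                       # H x W x out_ch visible-image estimate


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    th = rng.random((16, 16))
    print(forward(th, (th > 0.5).astype(float), init_params()).shape)
```

A practical model would use a deep learning framework, more resolution levels, and trained weights; the sketch only shows where the two paths are concatenated and where the skip connections enter.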

The learning device 310 described above can also be realized by a computer 160 such as the one shown in FIG. 5.
For example, the learning side data acquisition unit 312, the model generation unit 313, and the learning side contour extraction unit 316 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided over a network, or may be recorded on and provided via a recording medium. That is, such a program may be provided as, for example, a program product.

FIG. 12 is a block diagram schematically showing the configuration of the posture estimation device 330.
The posture estimation device 330 includes an inference device 340 and the posture estimation execution device 150.
The posture estimation execution device 150 of the posture estimation device 330 in the third embodiment is the same as the posture estimation execution device 150 in the first embodiment.

The inference device 340 includes the inference side communication unit 141, the inference side learned model storage unit 142, the inference side input unit 143, an inference side data acquisition unit 344, an inference unit 345, and an inference side contour extraction unit 346.
The inference side communication unit 141, the inference side learned model storage unit 142, and the inference side input unit 143 of the inference device 340 in the third embodiment are the same as those of the inference device 140 in the first embodiment.

The inference side data acquisition unit 344 acquires the target thermal image data via the inference side input unit 143, and gives the acquired target thermal image data to the inference unit 345 and the inference side contour extraction unit 346.

The inference side contour extraction unit 346 is a contour extraction unit that extracts a contour image from the thermal image indicated by the target thermal image data, using the same extraction method as the learning side contour extraction unit 316. The inference side contour extraction unit 346 gives contour image data indicating the extracted contour image to the inference unit 345 as inference side contour image data. The contour image extracted here is also referred to as the target contour image, and the inference side contour image data is also referred to as target contour image data.

The inference unit 345 uses the inference side trained model stored in the inference side learned model storage unit 142 to infer a visible image from the combination of the thermal image indicated by the target thermal image data and the contour image indicated by the inference side contour image data. In other words, by inputting both the thermal image and the contour image into the inference side trained model, the inference unit 345 can acquire the visible image inferred to correspond to the thermal image. The inference unit 345 then generates visible image data indicating the inferred visible image and gives it to the posture estimation execution device 150. The visible image data generated here is also referred to as target visible image data, and the visible image it indicates is also referred to as the target visible image.
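The inference flow of the third embodiment differs from the first only in that a contour image is computed and passed to the trained model alongside the thermal image. A hedged sketch in which every callable is a hypothetical placeholder:

```python
from typing import Any, Callable, Optional, Tuple

import numpy as np


def infer_visible_with_contour(
    thermal: np.ndarray,
    contour_fn: Callable[[np.ndarray], np.ndarray],                # unit 346: same method as unit 316
    model: Callable[[np.ndarray, np.ndarray], np.ndarray],         # trained model used by unit 345
    pose_estimator: Optional[Callable[[np.ndarray], Any]] = None,  # device 150
) -> Tuple[np.ndarray, Any]:
    contour = contour_fn(thermal)      # target contour image
    visible = model(thermal, contour)  # target visible image inferred from both inputs
    posture = pose_estimator(visible) if pose_estimator is not None else None
    return visible, posture


if __name__ == "__main__":
    frame = np.array([[0.0, 2.0]])
    vis, pose = infer_visible_with_contour(
        frame,
        lambda t: (t > 1).astype(float),  # toy contour extractor
        lambda t, c: t + c,               # toy "model"
        lambda v: int(np.argmax(v)),      # toy posture estimator
    )
    print(vis.tolist(), pose)
```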

The posture estimation device 330 described above can also be realized by a computer 160 such as the one shown in FIG. 5.
For example, the inference side data acquisition unit 344, the inference unit 345, and the inference side contour extraction unit 346 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided over a network, or may be recorded on and provided via a recording medium. That is, such a program may be provided as, for example, a program product.

In general, a thermal image has ambiguous contour information, so a visible image generated from it with a trained model also has ambiguous contours. Since contour information is important for posture estimation, ambiguous contours reduce the accuracy of posture estimation.
In contrast, the posture estimation system 300 according to the third embodiment inputs the thermal image and the contour image into the trained model together, and can therefore generate a visible image with well-defined contours. This improves the accuracy of posture estimation from the generated visible image compared with inputting the thermal image alone into the trained model.

100, 200, 300 posture estimation system; 110, 210, 310 learning device; 111 learning side input unit; 112, 212, 312 learning side data acquisition unit; 113, 213, 313 model generation unit; 114 learning side learned model storage unit; 115 learning side communication unit; 316 learning side contour extraction unit; 130, 230, 330 posture estimation device; 140, 340 inference device; 141 inference side communication unit; 142 inference side learned model storage unit; 143 inference side input unit; 144, 344 inference side data acquisition unit; 145, 245, 345 inference unit; 346 inference side contour extraction unit; 150 posture estimation execution device.

Claims

A learning device comprising:
a data acquisition unit that acquires training data including a thermal image that images the temperature distribution of a subject by using infrared rays radiated from the subject, and a visible image that images the subject by using visible light reflected from the subject;
a contour extraction unit that extracts a contour image showing the contour of the subject from the thermal image; and
a model generation unit that generates a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning inference from the combination of the thermal image and the contour image to the visible image,
wherein the trained model is formed by layers of a decoder portion and layers of an encoder portion, the decoder portion has two paths in parallel, and the two paths are a path for decoding the thermal image and a path for decoding the contour image.

The learning device according to claim 1, wherein the trained model has a U-Net structure in which the layers of the decoder portion and the layers of the encoder portion are symmetrical and are connected by skip connections.

The learning device according to claim 2, wherein, in the central layer of the trained model, the vector information in which the thermal image is decoded and the vector information in which the contour image is decoded are concatenated, and the concatenated information is input to the encoder portion.

The learning device according to any one of claims 1 to 3, wherein the contour extraction unit extracts the contour image from the thermal image by an edge detection process for detecting the edge of the subject.

The learning device according to any one of claims 1 to 3, wherein the contour extraction unit extracts the contour image from the thermal image by performing binarization processing on the thermal image and then performing edge detection processing.

A utilization device comprising:
a storage unit that stores a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model having been generated by learning inference from the combination of the thermal image and the contour image to the visible image using training data and contour image data, where the training data include a thermal image that images the temperature distribution of a subject by using infrared rays emitted from the subject and a visible image that images the subject by using visible light reflected from the subject, and the contour image data indicate a contour image that shows the contour of the subject and is extracted from the thermal image;
a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject, that is, the subject of interest;
a contour extraction unit that extracts a target contour image, which is a contour image showing the contour of the target subject, from the target thermal image;
an inference unit that uses the trained model to infer a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image; and
a posture estimation unit that estimates the posture of the target subject from the target visible image,
wherein the trained model is formed of layers of a decoder part and layers of an encoder part, and the decoder part has two paths in parallel, and
the two paths are a path for decoding the thermal image and a path for decoding the contour image.
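The order of operations in the utilization device can be summarized as a small pipeline. In the sketch below, `UtilizationPipeline` and the three stand-in functions are hypothetical names; the real contour extractor, trained model and posture estimator are not specified at this level of the claims, so the stand-ins only make the data flow runnable.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class UtilizationPipeline:
    """Hypothetical wiring of the claimed units; every callable is a placeholder."""
    extract_contour: Callable[[np.ndarray], np.ndarray]        # contour extraction unit
    model: Callable[[np.ndarray, np.ndarray], np.ndarray]      # stored trained model
    estimate_posture: Callable[[np.ndarray], dict]             # posture estimation unit

    def run(self, target_thermal: np.ndarray) -> dict:
        # 1. Extract the target contour image from the acquired target thermal image.
        target_contour = self.extract_contour(target_thermal)
        # 2. Infer the target visible image from the (thermal, contour) combination.
        target_visible = self.model(target_thermal, target_contour)
        # 3. Estimate posture from the inferred visible image only.
        return self.estimate_posture(target_visible)

# Stand-in components, for illustration only.
def dummy_contour(t):
    return (t > t.mean()).astype(float)  # crude silhouette, not a real contour extractor

def dummy_model(t, c):
    return np.stack([t / t.max(), c, c])  # pretend 3-channel "visible image"

def dummy_posture(v):
    # Placeholder "posture": centroid of the foreground in channel 1.
    return {"centroid": tuple(np.argwhere(v[1] > 0).mean(axis=0))}

target = np.full((8, 8), 20.0)
target[2:6, 2:6] = 36.0
pipeline = UtilizationPipeline(dummy_contour, dummy_model, dummy_posture)
result = pipeline.run(target)
print(result)
```

Keeping each unit as an injected callable mirrors the claim's separation into units and lets a real model or posture estimator be swapped in without changing the flow.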

The utilization device according to claim 6, wherein the trained model has a U-Net structure in which the layers of the decoder part and the layers of the encoder part form a symmetrical structure and are connected by skip connections.

The utilization device according to claim 7, wherein, in the central layer of the trained model, a vector obtained by decoding the thermal image and vector information obtained by decoding the contour image are concatenated, and the concatenated information is input to the encoder part.

The utilization device according to any one of claims 6 to 8, wherein the contour extraction unit extracts the contour image from the thermal image by an edge detection process.

The utilization device according to any one of claims 6 to 8, wherein the contour extraction unit extracts the contour image from the thermal image by binarizing the thermal image and then performing an edge detection process.

A program that causes a computer to function as:
a data acquisition unit that acquires training data including a thermal image that images the temperature distribution of a subject by using infrared rays radiated from the subject and a visible image that images the subject by using visible light reflected from the subject;
a contour extraction unit that extracts a contour image showing the contour of the subject from the thermal image; and
a model generation unit that generates a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning inference from the combination of the thermal image and the contour image to the visible image,
wherein the trained model is formed of layers of a decoder part and layers of an encoder part, and the decoder part has two paths in parallel, and
the two paths are a path for decoding the thermal image and a path for decoding the contour image.

A program that causes a computer to function as:
a storage unit that stores a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model having been generated by learning inference from the combination of the thermal image and the contour image to the visible image using training data and contour image data, where the training data include a thermal image that images the temperature distribution of a subject by using infrared rays emitted from the subject and a visible image that images the subject by using visible light reflected from the subject, and the contour image data indicate a contour image that shows the contour of the subject and is extracted from the thermal image;
a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject, that is, the subject of interest;
a contour extraction unit that extracts a target contour image, which is a contour image showing the contour of the target subject, from the target thermal image;
an inference unit that uses the trained model to infer a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image; and
a posture estimation unit that estimates the posture of the target subject from the target visible image,
wherein the trained model is formed of layers of a decoder part and layers of an encoder part, and the decoder part has two paths in parallel, and
the two paths are a path for decoding the thermal image and a path for decoding the contour image.

A learning method comprising:
acquiring training data including a thermal image that images the temperature distribution of a subject by using infrared rays radiated from the subject and a visible image that images the subject by using visible light reflected from the subject;
extracting a contour image showing the contour of the subject from the thermal image; and
generating a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning inference from the combination of the thermal image and the contour image to the visible image,
wherein the trained model is formed of layers of a decoder part and layers of an encoder part, and the decoder part has two paths in parallel, and the two paths are a path for decoding the thermal image and a path for decoding the contour image.
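The claims do not fix a loss function or optimizer. To show only the supervised structure of the learning method (paired thermal and visible images, plus a contour image extracted beforehand), the sketch below fits a per-pixel linear map from (thermal, contour) to RGB by gradient descent on mean squared error, using synthetic data. This is a stand-in for, not an implementation of, the claimed two-path network; `features`, `train`, `true_w` and the learning rate are hypothetical.

```python
import numpy as np

def features(thermal, contour):
    """Stack thermal, contour and a bias term per pixel -> (H*W, 3)."""
    return np.stack([thermal.ravel(), contour.ravel(), np.ones(thermal.size)], axis=1)

def train(pairs, lr=0.1, epochs=200):
    """Fit a per-pixel linear map (thermal, contour) -> RGB by gradient descent on MSE."""
    w = np.zeros((3, 3))  # rows: input features, columns: RGB channels
    history = []
    for _ in range(epochs):
        grad, loss = np.zeros_like(w), 0.0
        for thermal, contour, visible in pairs:
            x = features(thermal, contour)
            y = visible.reshape(3, -1).T          # (H*W, 3) target pixels
            err = x @ w - y
            loss += np.mean(err ** 2)
            grad += 2 * x.T @ err / err.size      # gradient of the mean squared error
        w -= lr * grad / len(pairs)
        history.append(loss / len(pairs))
    return w, history

# Synthetic training pairs generated from a hypothetical ground-truth map.
rng = np.random.default_rng(0)
true_w = np.array([[0.8, 0.5, 0.2], [0.1, 0.3, 0.6], [0.05, 0.05, 0.05]])
pairs = []
for _ in range(4):
    t = rng.random((8, 8))
    c = (rng.random((8, 8)) > 0.8).astype(float)
    v = (features(t, c) @ true_w).T.reshape(3, 8, 8)
    pairs.append((t, c, v))

w, history = train(pairs)
print(history[0], history[-1])
```

A real implementation would replace the linear map with the two-path network and backpropagate through it, but the loop shape (forward pass on the thermal and contour inputs, loss against the paired visible image, parameter update) is the same.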

A utilization method comprising:
acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject, that is, the subject of interest;
extracting a target contour image, which is a contour image showing the contour of the target subject, from the target thermal image;
inferring a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image by using a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model having been generated by learning inference from the combination of the thermal image and the contour image to the visible image using training data and contour image data, where the training data include a thermal image that images the temperature distribution of a subject by using infrared rays emitted from the subject and a visible image that images the subject by using visible light reflected from the subject, and the contour image data indicate a contour image that shows the contour of the subject and is extracted from the thermal image; and
estimating the posture of the target subject from the target visible image,
wherein the trained model is formed of layers of a decoder part and layers of an encoder part, and the decoder part has two paths in parallel, and
the two paths are a path for decoding the thermal image and a path for decoding the contour image.