JP2021144359A

JP2021144359A - Learning apparatus, estimation apparatus, learning method, and program

Info

Publication number: JP2021144359A
Application number: JP2020041377A
Authority: JP
Inventors: Ryosuke Shibasaki (芝崎涼介); Ryutaro Tanimura (谷村竜太郎)
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2020-03-10
Filing date: 2020-03-10
Publication date: 2021-09-24

Abstract

PROBLEM TO BE SOLVED: To provide a technology for generating an identification model that estimates the depth of a hidden part of a human body.
SOLUTION: A learning apparatus 10 includes: a three-dimensional model generation unit 1 that generates a three-dimensional model of a human body; a depth image generation unit 2 that generates a depth image of the three-dimensional model; a neck region selection unit 3 that selects, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body; a depth acquisition unit 4 that acquires the depth of a specific part of the human body from the depth image of the three-dimensional model; and an identification model generation unit 5 that generates an identification model for identifying the depth of the specific part of a human body in a depth image captured by an imaging apparatus, on the basis of the difference between the depth of the neck region, obtained from the selected neck-region depth image, and the acquired depth of the specific part.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, an estimation device, a learning method, and a program.

Research has been conducted on recognition processing that estimates the human skeleton, such as the head, shoulders, neck, and elbows, from depth images. In this recognition processing, a three-dimensional model of the human body is generated, a machine-learning model is trained on the depth distribution that each part (head, shoulders, etc.) has in the three-dimensional model, and the position of each skeletal joint of the human body is then estimated. If only the visible parts of the three-dimensional model are learned, a joint can become impossible to estimate once it is hidden: for example, when an arm held in front of the body occludes the shoulder, the shoulder joint position may not be estimable.

Patent Document 1 discloses a technique that uses a depth image to recognize not only the visible parts of an analysis target but also hidden parts that cannot be seen because they are occluded by other parts. In Patent Document 1, a classification tree determines whether the analysis target is a visible part or a hidden part; based on the result, the depth value of the analysis target is restored and each part is estimated. The classification tree is trained to improve recognition performance.

Non-Patent Document 1 discloses a technique for estimating the coordinates of human joint positions from a depth image. In Non-Patent Document 1, joint positions are estimated by regression from the positions of similar body parts that a classification tree for visible body parts assigns to each leaf node, so that hidden parts can also be estimated.

Patent Document 1: JP-A-2017-208126

Non-Patent Document 1: Jamie Shotton, Ross Girshick, Andrew Fitzgibbon, Toby Sharp, Mat Cook, Mark Finocchio, Richard Moore, Pushmeet Kohli, Antonio Criminisi, Alex Kipman, Andrew Blake, "Efficient Human Pose Estimation from Single Depth Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2013.

However, in Patent Document 1, the classification tree takes both visible and hidden parts as input and determines whether the analysis target is visible or hidden, so the recognition process requires a large amount of data, which can complicate the processing or lengthen the processing time. Likewise, Non-Patent Document 1 performs multi-class classification over all body parts of the whole body, so it too may suffer from complicated processing or long processing times.

An example object of the present invention is to provide a learning device, a learning method, and a program that generate an identification model capable of estimating the depth of a hidden part of the human body, and an estimation device that uses the identification model.

To achieve the above object, a learning device according to one aspect of the present invention includes:
a three-dimensional model generation unit that generates a three-dimensional model of a human body;
a depth image generation unit that generates a depth image of the three-dimensional model;
a neck region selection unit that selects, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a depth acquisition unit that acquires, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
an identification model generation unit that generates an identification model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, based on the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

Further, to achieve the above object, an estimation device according to one aspect of the present invention includes:
a depth image acquisition unit that acquires a depth image from an imaging device;
a human body detection unit that estimates, in the acquired depth image, a region centered on the neck of a human body and detects the human body from the depth image based on the estimated region; and
a depth estimation unit that, when the human body is detected, estimates the depth of a specific part of the human body based on the depth distribution in the region,
wherein the depth estimation unit estimates the depth of the specific part of the human body using an identification model generated from a depth image of a region including the neck of a human body, that depth image being selected from a depth image of a generated three-dimensional model of the human body.

Further, to achieve the above object, a learning method according to one aspect of the present invention includes:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a step of acquiring, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
a step of generating an identification model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, based on the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

Furthermore, to achieve the above object, a program according to one aspect of the present invention includes instructions that cause a computer to execute:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a step of acquiring, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
a step of generating an identification model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, based on the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

As described above, the present invention makes it possible to estimate the depth of a specific part of the human body even when that part is hidden.

FIG. 1 is a configuration diagram of the learning device.
FIG. 2 is a diagram for explaining a method of generating an identification model from a depth image of a three-dimensional model.
FIG. 3 is a block diagram showing the configuration of the estimation device.
FIG. 4 is a flow chart showing the operation of the learning device.
FIG. 5 is a flow chart showing the operation of the estimation device.
FIG. 6 is a block diagram showing an example of a computer that realizes the learning device.

Hereinafter, a learning device and a learning method according to an embodiment of the present invention will be described with reference to FIGS. 1 to 6.

[Device configuration]
FIG. 1 is a configuration diagram of the learning device 10.

The learning device 10 trains an identification model for estimating the depth of a specific part of the human body from a depth image captured by an imaging device (hereinafter referred to as a captured image). The "specific part" is the head, a shoulder, or the neck of the human body. By training the identification model with generated three-dimensional models of the human body, the learning device 10 improves the accuracy with which the depth of a specific part is estimated from a captured image.

The learning device 10 includes a three-dimensional model generation unit 1, a depth image generation unit 2, a neck region selection unit 3, a depth acquisition unit 4, and an identification model generation unit 5.

The three-dimensional model generation unit 1 generates a three-dimensional model of the human body.

The depth image generation unit 2 generates a depth image of the generated three-dimensional model.

The neck region selection unit 3 selects, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body.

The depth acquisition unit 4 acquires the depth of a specific part of the human body from the depth image of the three-dimensional model.

The identification model generation unit 5 generates an identification model for identifying the depth of a specific part of the human body from a depth image captured by an imaging device, based on the depth difference between the depth of the neck region, obtained from the selected neck-region depth image, and the acquired depth of the specific part.

With the learning device 10 configured as above, the generated three-dimensional model is used to build an identification model that estimates the depth difference between the neck region and a specific part. When a person is imaged by an imaging device and the depth of a specific part is to be estimated from the captured image, this identification model allows that depth to be estimated from the depth difference relative to the depth of the person's neck in the captured image.

Next, the configuration of the learning device 10 is described in more detail.

The three-dimensional model generation unit 1 generates a three-dimensional model of the human body. The model may be generated using motion capture or from images obtained by imaging a human body; the generation method is not particularly limited.

The depth image generation unit 2 generates a depth image of the three-dimensional model generated by the three-dimensional model generation unit 1. A depth image is an image whose pixels carry distance information indicating the distance from the camera position to each part of the human body.
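The depth image generation step can be sketched as a simple point-splatting renderer with a z-buffer. This is a minimal sketch under assumptions the source does not state: the model is given as 3D surface points in camera coordinates (metres, +z forward), the camera is an ideal pinhole with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and the function name is illustrative. A production renderer would rasterize the model's mesh instead of splatting points.

```python
import math


def render_depth_image(points, width, height, fx, fy, cx, cy):
    """Splat 3D points (camera coordinates, metres, +z forward) into a depth image.

    Each pixel keeps the smallest z that lands on it (a z-buffer); pixels that no
    point reaches stay at math.inf, meaning "no measurement".
    """
    depth = [[math.inf] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))  # pinhole projection to column
        v = int(round(fy * y / z + cy))  # pinhole projection to row
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
    return depth


# Two points on the optical axis: only the nearer one (z = 2.0) survives.
image = render_depth_image([(0.0, 0.0, 2.0), (0.0, 0.0, 3.0), (0.02, 0.0, 2.0)],
                           width=5, height=5, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(image[2][2], image[2][3])  # prints 2.0 2.0
```

Keeping the nearest depth per pixel is what makes occluded body parts disappear from the rendered image, which is exactly the situation the identification model is meant to handle.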

FIG. 2 is a diagram for explaining a method of generating an identification model from a depth image of a three-dimensional model. FIG. 2 shows the case where the specific part is the left shoulder.

The neck region selection unit 3 selects a depth image of a neck region that contains the position of the neck when the three-dimensional model of the human body is viewed from one direction. Because the three-dimensional model is generated, the position of every part of the body is known. The neck region selection unit 3 identifies the position of the neck in the depth image of the three-dimensional model, generates a neck region 31 containing that position, and selects the depth image of the neck region 31.
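Because the model's joint positions are known, selecting the neck region can be sketched as projecting the neck joint into the image and cropping a window around it. The pinhole intrinsics, the square window shape, and `half_size` are hypothetical choices; the source does not specify the region's shape or size.

```python
def project_to_pixel(joint, fx, fy, cx, cy):
    """Pinhole projection of a 3D joint (camera coordinates) to integer pixel (u, v)."""
    x, y, z = joint
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))


def crop_region(depth, center, half_size):
    """Square window of side 2*half_size+1 around center (u, v), clipped at the borders."""
    u0, v0 = center
    height, width = len(depth), len(depth[0])
    rows = range(max(0, v0 - half_size), min(height, v0 + half_size + 1))
    cols = range(max(0, u0 - half_size), min(width, u0 + half_size + 1))
    return [[depth[v][u] for u in cols] for v in rows]


depth = [[10 * v + u for u in range(5)] for v in range(5)]  # toy 5x5 "depth image"
neck_pixel = project_to_pixel((0.0, 0.0, 2.0), fx=100.0, fy=100.0, cx=2.0, cy=2.0)
neck_region = crop_region(depth, neck_pixel, half_size=1)
print(neck_pixel, neck_region)
```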

The depth acquisition unit 4 identifies the position of the shoulder 32, the specific part of the human body, in the depth image of the three-dimensional model, and acquires the depth of the specific part region 32 that contains the shoulder 32.

From the depth image of the neck region 31 and the depth of the specific part region 32, the identification model generation unit 5 generates an identification model that estimates the depth difference between the depth obtained at each position in the neck region and the depth of the specific part. Specifically, taking the depth distribution centered on each pixel of the neck region 31 as the feature, it generates an identification model that estimates the depth difference between the depth of that pixel and the depth of the specific part region 32. One feature is created for each pixel in the neck region 31. Adding the depth difference estimated by the model at a pixel of the neck region 31 to that pixel's depth gives one estimate of the depth of the specific part 32 per pixel. The final estimate for the specific part 32 is calculated by applying a mode search or averaging to these per-pixel depth estimates.
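The per-pixel scheme described above can be sketched as follows. The source does not name the learning algorithm, the window size, or how missing measurements are handled, so this sketch assumes a k-nearest-neighbour regressor (a swapped-in, hypothetical choice), a square window of hypothetical `radius`, and zero-filling of missing neighbours; all function and class names are illustrative.

```python
import math


def pixel_feature(depth, u, v, radius):
    """Depth distribution around (u, v), expressed relative to the depth at (u, v).

    Neighbours outside the image or without a measurement (inf) are encoded as 0.0,
    which is a hypothetical choice for this sketch.
    """
    centre = depth[v][u]
    feature = []
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            y, x = v + dv, u + du
            inside = 0 <= y < len(depth) and 0 <= x < len(depth[0])
            if inside and math.isfinite(depth[y][x]):
                feature.append(depth[y][x] - centre)
            else:
                feature.append(0.0)
    return feature


def build_samples(depth, neck_pixels, part_depth, radius):
    """One (feature, target) pair per neck-region pixel; target = part depth - pixel depth."""
    return [(pixel_feature(depth, u, v, radius), part_depth - depth[v][u])
            for u, v in neck_pixels]


class KNNDepthDifference:
    """k-nearest-neighbour regressor, a stand-in for the learner the source leaves open."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []

    def fit(self, samples):
        self.samples = list(samples)
        return self

    def predict(self, feature):
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], feature))[: self.k]
        return sum(target for _, target in nearest) / len(nearest)


def estimate_part_depth(model, depth, neck_pixels, radius):
    """Each neck pixel votes (its depth + predicted difference); votes are averaged here."""
    votes = [depth[v][u] + model.predict(pixel_feature(depth, u, v, radius))
             for u, v in neck_pixels]
    return sum(votes) / len(votes)


# Toy example: a flat surface at 2.0 m, with the part 0.1 m farther than the neck.
image = [[2.0] * 5 for _ in range(5)]
neck = [(u, v) for v in range(1, 4) for u in range(1, 4)]
model = KNNDepthDifference(k=3).fit(build_samples(image, neck, part_depth=2.1, radius=1))
print(round(estimate_part_depth(model, image, neck, radius=1), 3))  # prints 2.1
```

In practice, samples from many generated poses (including poses where the part is occluded) would be pooled before `fit`; any regressor that maps the local depth feature to a scalar difference could take the place of the k-NN stand-in.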

When the specific part is the neck, for example, the identification model generation unit 5 generates, as the identification model, the difference (depth difference) between the depth of the center point of the neck region 31 and the mode depth of the specific part region 32.
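The neck case reduces to simple arithmetic: the depth at the centre of the neck region and the most frequent (mode) depth of the part region. In this sketch the mode is found by quantising depths into bins of a hypothetical `bin_size`, and the sign convention (part mode minus centre, so that centre depth plus label recovers the part depth) is an assumption, since the text only says "the difference between" the two.

```python
import math
from collections import Counter


def mode_depth(depths, bin_size=0.01):
    """Most frequent depth after quantising to bin_size (metres); non-finite values are skipped."""
    bins = Counter(round(d / bin_size) for d in depths if math.isfinite(d))
    most_common_bin, _ = bins.most_common(1)[0]
    return most_common_bin * bin_size


def neck_label(neck_region, part_region, bin_size=0.01):
    """Neck-case label: mode depth of the part region minus the neck region's centre depth.

    The sign is chosen so that centre depth + label recovers the part depth (an assumption).
    """
    centre = neck_region[len(neck_region) // 2][len(neck_region[0]) // 2]
    return mode_depth([d for row in part_region for d in row], bin_size) - centre


neck_region = [[2.00, 2.00, 2.00], [2.00, 1.95, 2.00], [2.00, 2.00, 2.00]]
part_region = [[2.00, 2.00], [2.00, math.inf]]  # one pixel without a measurement
print(round(neck_label(neck_region, part_region), 3))  # prints 0.05
```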

The identification model generation unit 5 stores the generated identification model in a storage device included in the learning device 10 or in a storage device external to the learning device 10.

In this way, the learning device 10 trains the identification model using three-dimensional models. The identification model is used in an estimation device that estimates the depth of a specific part of the human body from a captured image. The estimation device is described below.

FIG. 3 is a block diagram showing the configuration of the estimation device 20.

The estimation device 20 acquires captured images from the imaging device 30. The imaging device 30 is arranged to image people in a space; it may image the human body from the front, from the side, or from above. The estimation device 20 detects a person in the acquired captured image, estimates the two-dimensional position of a specific part of the detected body (the head, a shoulder, or the neck), and estimates the depth of that specific part. Estimating the position and depth of the head, shoulder, or neck yields its position in three-dimensional coordinates, from which the joint positions around the head, shoulders, or neck can be estimated. Estimating joint positions in turn makes it possible to recognize gestures and to estimate a person's posture.

The estimation device 20 and the imaging device 30 may be independent devices connected so that they can exchange data, or the estimation device 20 may include the imaging device 30. When the storage device that stores the identification model is provided outside the learning device 10, the estimation device 20 may be connected for data communication only to that storage device.

The estimation device 20 includes a depth image acquisition unit 21, a person detection unit 22, a position estimation unit 23, a depth estimation unit 24, and a joint estimation unit 25.

The depth image acquisition unit 21 acquires a captured image from the imaging device 30.

The person detection unit 22 estimates a region containing the neck of a human body in the acquired captured image and detects a person in the captured image based on the estimated region. First, the person detection unit 22 generates a reduced image of the captured image according to the distance information contained in the captured image. A person appears at a different size in the captured image depending on whether they are near to or far from the imaging device 30. The person detection unit 22 therefore generates, from the captured image, a reduced image with a larger reduction ratio (shrunk more) when the person is near, and a reduced image with a smaller reduction ratio (shrunk less) when the person is far.
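The distance-dependent reduction can be sketched with one observation: apparent size falls off roughly as 1/distance, so scaling by distance divided by a reference distance brings a person to about the size they would have at that reference. The `reference_distance` parameter, using a single representative distance per image, and nearest-neighbour resampling are all hypothetical choices for this sketch.

```python
def reduction_scale(distance, reference_distance):
    """Scale factor (<= 1): a person at `distance` is shrunk to roughly the size they
    would have at `reference_distance`. Nearer people are shrunk more; people at or
    beyond the reference distance are left as-is."""
    return min(1.0, distance / reference_distance)


def downsample(depth, scale):
    """Nearest-neighbour resize of a depth image by `scale` (0 < scale <= 1)."""
    height, width = len(depth), len(depth[0])
    new_h, new_w = max(1, int(height * scale)), max(1, int(width * scale))
    return [[depth[min(height - 1, int(r / scale))][min(width - 1, int(c / scale))]
             for c in range(new_w)]
            for r in range(new_h)]


image = [[10 * r + c for c in range(4)] for r in range(4)]
scale = reduction_scale(distance=1.5, reference_distance=3.0)  # near person -> 0.5
small = downsample(image, scale)
print(scale, small)
```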

The person detection unit 22 estimates the position of the neck of the human body in the generated reduced image and generates a region containing the estimated position, for example by processing that uses NMS (Non-Maximum Suppression). Because the region is generated on a reduced image scaled according to the distance information in the captured image, substantially the same region is obtained regardless of the size of the person in the captured image.
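The NMS step mentioned above is a standard technique; a minimal greedy version is sketched below. It assumes neck candidates are scored boxes `(x1, y1, x2, y2)` with a neck-likelihood score; how the candidates and scores are produced, and the `iou_threshold` value, are assumptions not stated in the source.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it does not
    overlap any already-kept box by more than iou_threshold. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two overlapping neck candidates and one separate candidate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_maximum_suppression(boxes, scores))  # prints [0, 2]
```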

The person detection unit 22 uses an identification model for person detection to determine whether a person appears in the captured image. The person-detection identification model is generated, for example, by the learning device 10. Since the position of each part is known in the three-dimensional model generated by the learning device 10, the learning device 10 identifies the position of the "neck" in the three-dimensional model, generates a region centered on that position, and selects the depth image of that region. It then generates an identification model that detects a person using, as the feature, the depth distribution of the depth information in the selected region's depth image.

The person detection unit 22 compares the depth distribution of the region estimated in the captured image (specifically, in the reduced image) with the person-detection identification model to determine whether the estimated region contains a neck. If it determines that the region contains a neck, it concludes that a person appears in the captured image. In this way, the person detection unit 22 detects a person in the captured image.
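Training and applying the person-detection model can be sketched as follows, under assumptions the source leaves open: the "depth distribution" feature is taken to be a normalised histogram of depth offsets relative to the window centre (with hypothetical bin edges), and the model is a nearest-centroid classifier fitted on neck-centred (positive) and non-neck (negative) windows. Both the feature and the classifier are hypothetical stand-ins.

```python
import math


def relative_depth_histogram(window, bin_edges):
    """Normalised histogram of (depth - centre depth) over a neck-centred window."""
    centre = window[len(window) // 2][len(window[0]) // 2]
    offsets = [d - centre for row in window for d in row if math.isfinite(d)]
    counts = [0] * (len(bin_edges) - 1)
    for x in offsets:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = max(1, len(offsets))
    return [c / total for c in counts]


def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class NearestCentroidDetector:
    """Hypothetical person detector: one mean feature per class, nearest mean wins."""

    def fit(self, positive_features, negative_features):
        self.positive = centroid(positive_features)
        self.negative = centroid(negative_features)
        return self

    def is_person(self, feature):
        return math.dist(feature, self.positive) < math.dist(feature, self.negative)


BINS = [-5.0, -0.05, 0.05, 5.0]  # hypothetical offset bins in metres
# Neck, head, shoulders and chest at 2.0 m; background in the corners.
neck_like = [[3.0, 2.0, 3.0], [2.0, 2.0, 2.0], [3.0, 2.0, 3.0]]
flat_wall = [[2.0] * 3 for _ in range(3)]
detector = NearestCentroidDetector().fit([relative_depth_histogram(neck_like, BINS)],
                                         [relative_depth_histogram(flat_wall, BINS)])
query = [[3.5, 2.0, 3.5], [2.0, 2.0, 2.0], [3.5, 2.0, 3.5]]
print(detector.is_person(relative_depth_histogram(query, BINS)))  # prints True
```

Because the feature is relative to the centre depth, the same model can be applied to people at different distances, which complements the distance-dependent image reduction.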

When a person is detected in the captured image, the position estimation unit 23 uses an identification model for position estimation to estimate, based on the depth distribution in the region estimated by the person detection unit 22, the position of the head, a shoulder, or the neck, that is, a specific part of the human body around the neck. The position-estimation identification model is generated by the learning device 10.

In the learning device 10, the position of a specific part in the three-dimensional model is identified, a region centered on that position is generated, and the depth image of that region is selected. For this purpose, the three-dimensional model is moved into poses in which the specific part is hidden by another part (poses with occlusion) and poses in which it is not hidden (poses without occlusion). For each pose, the position of the specific part is identified, a region centered on that position is generated, and the depth image of that region is selected. The depth distributions of these regions' depth images are then used to generate the identification model for position estimation.

By using the position-estimation identification model, the position estimation unit 23 can estimate the position of a specific part regardless of whether occlusion occurs in the captured image. For example, even when the person in the captured image holds a pose in which the left shoulder is hidden by the left hand, the depth distribution of the region estimated by the person detection unit 22 resembles that of the position-estimation model generated from a three-dimensional model in the same pose, so the position estimation unit 23 can still estimate the position of the person's left shoulder in the captured image.
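One way to picture why training on both occluded and non-occluded poses helps is a nearest-neighbour lookup, sketched below. It is a hypothetical stand-in for the position-estimation model: each stored sample pairs a neck-window depth feature with the pixel offset of the specific part from the neck, and the feature vectors, offsets, and `k` are illustrative values, not data from the source.

```python
import math


class PartOffsetEstimator:
    """Hypothetical nearest-neighbour stand-in for the position-estimation model.

    Each stored sample pairs a neck-window depth feature with the pixel offset (du, dv)
    of the specific part from the neck. Samples from occluded and non-occluded poses
    are stored side by side; the occlusion flag is kept only for bookkeeping.
    """

    def __init__(self, k=1):
        self.k = k
        self.samples = []  # (feature, (du, dv), occluded)

    def add(self, feature, offset, occluded):
        self.samples.append((feature, offset, occluded))

    def estimate(self, feature, neck_pixel):
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], feature))[: self.k]
        du = sum(s[1][0] for s in nearest) / len(nearest)
        dv = sum(s[1][1] for s in nearest) / len(nearest)
        return round(neck_pixel[0] + du), round(neck_pixel[1] + dv)


estimator = PartOffsetEstimator(k=1)
estimator.add([0.0, 0.5], offset=(-12, 3), occluded=False)  # left shoulder visible
estimator.add([0.4, 0.1], offset=(-10, 4), occluded=True)   # left shoulder behind left hand
print(estimator.estimate([0.38, 0.12], neck_pixel=(50, 40)))  # prints (40, 44)
```

A query whose depth distribution matches an occluded training pose still returns a shoulder position, because the offset was recorded from the model, where the hidden shoulder's location is known.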

The position estimation unit 23 assigns a circular label at the estimated position in the captured image. For example, it assigns a label indicating the left shoulder at the estimated left-shoulder position. In this way, the position estimation unit 23 estimates the position of the specific part in two-dimensional coordinates.
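Assigning a circular label can be sketched as stamping a label value into every pixel of a label map within a radius of the estimated position. The label map representation, the `radius`, and the label string `"LEFT_SHOULDER"` are hypothetical; the source only says the label is circular.

```python
def stamp_circular_label(label_map, center, radius, label):
    """Write `label` into every pixel of label_map within `radius` of center (u, v)."""
    u0, v0 = center
    for v in range(max(0, v0 - radius), min(len(label_map), v0 + radius + 1)):
        for u in range(max(0, u0 - radius), min(len(label_map[0]), u0 + radius + 1)):
            if (u - u0) ** 2 + (v - v0) ** 2 <= radius ** 2:
                label_map[v][u] = label
    return label_map


labels = stamp_circular_label([[None] * 5 for _ in range(5)], (2, 2), 1, "LEFT_SHOULDER")
print(sum(cell == "LEFT_SHOULDER" for row in labels for cell in row))  # prints 5
```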

The depth estimation unit 24 estimates the depth of a specific part of the human body based on the depth distribution in the region generated by the person detection unit 22. It estimates the depth of the specific part from the depth of each pixel in the neck region estimated by the person detection unit 22 and the depth difference to the specific part estimated with the identification model generated by the identification model generation unit 5 of the learning device 10.
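At estimation time, each neck-region pixel contributes one vote for the part depth (its own depth plus the model's predicted difference), and the votes are combined by averaging or by a mode search. The sketch below takes the predicted differences as given inputs; the binning in the mode search and the `bin_size` value are assumptions.

```python
from collections import Counter


def part_depth_votes(pixel_depths, predicted_differences):
    """One vote per neck-region pixel: its measured depth plus the predicted difference."""
    return [d + diff for d, diff in zip(pixel_depths, predicted_differences)]


def aggregate_mean(votes):
    return sum(votes) / len(votes)


def aggregate_mode(votes, bin_size=0.01):
    """Mode search over votes quantised to bin_size; the winning bin's votes are averaged."""
    bins = Counter(round(v / bin_size) for v in votes)
    best_bin, _ = bins.most_common(1)[0]
    members = [v for v in votes if round(v / bin_size) == best_bin]
    return sum(members) / len(members)


# Four consistent votes and one outlier (e.g. a pixel that actually lies on the arm).
votes = part_depth_votes([2.0, 2.0, 2.0, 2.0, 2.5], [0.1, 0.1, 0.1, 0.1, -0.9])
print(round(aggregate_mean(votes), 3), round(aggregate_mode(votes), 3))  # prints 2.0 2.1
```

The example shows the design trade-off: the mean is pulled toward the outlier vote, while the mode search ignores it, which matters when some neck-region pixels actually belong to an occluding limb.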

The joint estimation unit 25 estimates the position of the specific part in three-dimensional coordinates from its two-dimensional position, estimated by the position estimation unit 23, and its depth, estimated by the depth estimation unit 24, and then estimates the three-dimensional positions of the joints based on that result.
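Combining a 2D position with a depth into a 3D position can be sketched as pinhole back-projection. This assumes a calibrated pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`; the optional `scale` argument (also an assumption) maps a position found in a reduced image back to full resolution first. How joint positions are then derived from the part positions is not specified in the source and is not sketched.

```python
def back_project(u, v, depth, fx, fy, cx, cy, scale=1.0):
    """Pinhole back-projection of pixel (u, v) at the given depth into camera coordinates.

    If (u, v) was estimated in a reduced image, `scale` is that image's reduction
    factor, and the pixel is first mapped back to full resolution.
    """
    u_full, v_full = u / scale, v / scale
    x = (u_full - cx) * depth / fx
    y = (v_full - cy) * depth / fy
    return x, y, depth


# Hypothetical intrinsics for a 640x480 depth camera.
print(back_project(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0))  # prints (0.4, 0.0, 2.0)
```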

[Device operation]
Next, the operation of the learning device 10 in this embodiment is described with reference to FIG. 4. FIG. 4 is a flow chart showing the operation of the learning device 10. FIGS. 1 to 3 are referred to as appropriate in the following description. In this embodiment, the learning method is carried out by operating the learning device 10; the description of the learning method is therefore replaced by the following description of the operation of the learning device 10.

The three-dimensional model generation unit 1 generates a three-dimensional model (S1). The depth image generation unit 2 generates a depth image of the generated three-dimensional model (S2). The neck region selection unit 3 identifies the position of the neck in the depth image of the three-dimensional model, generates a neck region 31 containing that position, and selects the depth image of the neck region 31 (S3). Next, the depth acquisition unit 4 identifies the position of the shoulder 32, the specific part of the human body, in the depth image of the three-dimensional model and acquires the depth of the shoulder 32 (S4).

From the depth image of the neck region 31 and the depth of the specific part region 32, the identification model generation unit 5 generates an identification model that estimates the depth difference between the depth of each pixel in the neck region and the depth of the specific part region 32 (S5). The identification model generation unit 5 then stores the generated identification model in the storage device (S6).

Next, the operation of the estimation device 20 is described. FIG. 5 is a flow chart showing the operation of the estimation device 20.

The depth image acquisition unit 21 acquires a captured image from the imaging device 30 (S11). The person detection unit 22 generates a reduced image of the acquired captured image according to the distance information contained in it (S12): a reduced image with a larger reduction ratio (shrunk more) when the person is near, and one with a smaller reduction ratio (shrunk less) when the person is far.

The person detection unit 22 detects a person in the generated reduced image (S13). It estimates a region containing the neck of a human body in the reduced image and compares the depth distribution of that region's depth image with the person-detection identification model. From this comparison it determines whether the estimated region contains a neck; if so, it concludes that a person appears in the captured image.

When a person is detected in the captured image, the position estimation unit 23 estimates the positions of the head, shoulders, or neck (the specific parts of the human body around the neck). It bases this estimate on the depth distribution of the region estimated in S13 and uses the discriminative model for position estimation (S14). The position estimation unit 23 then assigns a circular label to each estimated position (S15).
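
Assigning a circular label (S15) amounts to marking every pixel within a fixed radius of the estimated position. The sketch below assumes a label map stored as a list of rows with one integer label per body part. The radius is an illustrative parameter, not a value taken from the document.

```python
from typing import List, Tuple

LabelMap = List[List[int]]


def paint_circular_label(
    label_map: LabelMap, centre: Tuple[int, int], radius: int, label: int
) -> LabelMap:
    """Write `label` into every pixel within `radius` of `centre` (clipped to the map)."""
    r0, c0 = centre
    rows, cols = len(label_map), len(label_map[0])
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            if (r - r0) ** 2 + (c - c0) ** 2 <= radius ** 2:
                label_map[r][c] = label
    return label_map
```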

The depth estimation unit 24 estimates the depth of a specific part of the human body based on the depth distribution within the region obtained by the person detection unit 22 (S16). The depth estimation unit 24 takes the depth of the region estimated by the person detection unit 22 as a reference. It then derives the depth of the specific part from the depth difference given by the discriminative model generated in S5 of FIG. 4. Next, the joint estimation unit 25 estimates the three-dimensional position of the specific part. It combines the two-dimensional position estimated by the position estimation unit 23 with the depth estimated by the depth estimation unit 24. Based on this result, it estimates the three-dimensional positions of the joints (S17).
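
Steps S16 and S17 combine a reference depth with the learned difference and then lift the two-dimensional position to three dimensions. The sketch below assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, and `cy`. The document does not state how the three-dimensional coordinates are computed, and the sketch omits the subsequent joint estimation from part positions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Intrinsics:
    """Hypothetical pinhole-camera parameters (pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def part_depth(neck_reference_depth: float, predicted_difference: float) -> float:
    """S16: reference depth of the neck region plus the learned depth difference."""
    return neck_reference_depth + predicted_difference


def back_project(u: float, v: float, depth: float, k: Intrinsics) -> Tuple[float, float, float]:
    """S17 (first half): lift a 2D part position (u, v) with its depth to camera coordinates."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)
```

The returned coordinates are in the same unit as the depth values (for example, millimetres).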

As described above, the learning device 10 generates a discriminative model centered on the neck. As a result, the estimation device 20 can estimate the depth of a specific part of a person in a captured image regardless of whether that part is occluded.

[Program]
The program of this embodiment may be any program that causes a computer to execute the steps shown in FIG. 4. Installing and executing this program on a computer realizes the learning device and the learning method of the present embodiment. In this case, the processor of the computer functions as the three-dimensional model generation unit 1, the depth image generation unit 2, the neck region selection unit 3, the specific part region selection unit 4, and the discriminative model generation unit 5, and performs the processing.

The program of this embodiment may also be executed by a computer system made up of a plurality of computers. In this case, for example, each computer may function as any one of the three-dimensional model generation unit 1, the depth image generation unit 2, the neck region selection unit 3, the specific part region selection unit 4, and the discriminative model generation unit 5.

Similarly, the estimation device of the present embodiment can be realized by installing and executing on a computer a program that causes the computer to execute the steps shown in FIG. 5. In this case, the processor of the computer functions as the depth image acquisition unit 21, the person detection unit 22, the position estimation unit 23, the depth estimation unit 24, and the joint estimation unit 25, and performs the processing. This program may also be executed by a computer system made up of a plurality of computers. In that case, for example, each computer may function as any one of the depth image acquisition unit 21, the person detection unit 22, the position estimation unit 23, the depth estimation unit 24, and the joint estimation unit 25.

Examples of the computer include general-purpose PCs, smartphones, and tablet terminals.

[Physical configuration]
A computer that realizes the learning device 10 by executing the program of the present embodiment will now be described with reference to FIG. 6. FIG. 6 is a block diagram showing an example of a computer that realizes the learning device 10.

As shown in FIG. 6, the computer 110 includes a CPU (Central Processing Unit) 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to one another via a bus 121 so that they can exchange data.

The computer 110 may include a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array) in addition to, or in place of, the CPU 111. In that case, the GPU or FPGA can execute the program of the embodiment.

The CPU 111 loads the program of the embodiment, which consists of a group of codes stored in the storage device 113, into the main memory 112. It then executes each code in a predetermined order to perform various operations. The main memory 112 is typically a volatile storage device such as DRAM (Dynamic Random Access Memory).

The program of the embodiment is provided stored on a computer-readable recording medium 120. The program may also be distributed over the Internet, to which the computer is connected via the communication interface 117.

Specific examples of the storage device 113 include a hard disk drive and semiconductor storage devices such as flash memory. The input interface 114 mediates data transmission between the CPU 111 and an input device 118 such as a keyboard and a mouse. The display controller 115 is connected to a display device 119 and controls what the display device 119 shows.

The data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120. It reads the program from the recording medium 120 and writes the processing results of the computer 110 to the recording medium 120. The communication interface 117 mediates data transmission between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (CompactFlash (registered trademark)) and SD (Secure Digital), magnetic recording media such as a flexible disk, and optical recording media such as a CD-ROM (Compact Disc Read-Only Memory).

The learning device 10 of the present embodiment can also be realized with hardware corresponding to each unit instead of a computer with the program installed. Furthermore, the learning device 10 may be realized partly by the program and partly by hardware.

The computer that realizes the estimation device 20 is the same as the one shown in FIG. 6, so its description is omitted.

Part or all of the embodiment described above can be expressed as in (Appendix 1) to (Appendix 10) below, but is not limited to that description.

(Appendix 1)
A learning device comprising:
a three-dimensional model generation unit that generates a three-dimensional model of a human body;
a depth image generation unit that generates a depth image of the three-dimensional model;
a neck region selection unit that selects, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a depth acquisition unit that acquires, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
a discriminative model generation unit that generates a discriminative model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, on the basis of the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

(Appendix 2)
The learning device according to Appendix 1, wherein
the discriminative model generation unit learns the depth distribution of the depth image of the neck region as a feature quantity and generates the discriminative model, which estimates the depth difference between the depth obtained from the depth image of the neck region and the depth of the selected specific part.

(Appendix 3)
The learning device according to Appendix 1 or Appendix 2, wherein
the specific part is the neck, a shoulder, or the head of the human body.

(Appendix 4)
An estimation device comprising:
a depth image acquisition unit that acquires a depth image from an imaging device;
a human body detection unit that estimates, in the acquired depth image, a region centered on the neck of a human body and detects the human body from the depth image on the basis of the estimated region; and
a depth estimation unit that, when the human body is detected, estimates the depth of a specific part of the human body on the basis of the depth distribution within the region,
wherein the depth estimation unit estimates the depth of the specific part of the human body by using a discriminative model generated from a depth image of a region including the neck of a human body, that depth image being selected from a depth image of a generated three-dimensional model of the human body.

(Appendix 5)
A learning method comprising:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a step of acquiring, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
a step of generating a discriminative model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, on the basis of the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

(Appendix 6)
The learning method according to Appendix 5, wherein
in the step of generating the discriminative model, the depth distribution of the depth image of the neck region is learned as a feature quantity to generate the discriminative model, which estimates the depth difference between the depth obtained from the depth image of the neck region and the depth of the selected specific part.

(Appendix 7)
The learning method according to Appendix 5 or Appendix 6, wherein
the specific part is the neck, a shoulder, or the head of the human body.

(Appendix 8)
A program including instructions that cause a computer to execute:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a step of acquiring, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
a step of generating a discriminative model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, on the basis of the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

(Appendix 9)
The program according to Appendix 8, wherein
in the step of generating the discriminative model, the depth distribution of the depth image of the neck region is learned as a feature quantity to generate the discriminative model, which estimates the depth difference between the depth obtained from the depth image of the neck region and the depth of the selected specific part.

(Appendix 10)
The program according to Appendix 8 or Appendix 9, wherein
the specific part is the neck, a shoulder, or the head of the human body.

1 Three-dimensional model generation unit
2 Depth image generation unit
3 Neck region selection unit
4 Depth acquisition unit
5 Discriminative model generation unit
10 Learning device
20 Estimation device
21 Depth image acquisition unit
22 Person detection unit
23 Position estimation unit
24 Depth estimation unit
25 Joint estimation unit
30 Imaging device
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

1. A learning device comprising:
a three-dimensional model generation unit that generates a three-dimensional model of a human body;
a depth image generation unit that generates a depth image of the three-dimensional model;
a neck region selection unit that selects, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a depth acquisition unit that acquires, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
a discriminative model generation unit that generates a discriminative model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, on the basis of the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

2. The learning device according to claim 1, wherein
the discriminative model generation unit learns the depth distribution of the depth image of the neck region as a feature quantity and generates the discriminative model, which estimates the depth difference between the depth obtained from the depth image of the neck region and the depth of the selected specific part.

3. The learning device according to claim 1 or 2, wherein
the specific part is the neck, a shoulder, or the head of the human body.

4. An estimation device comprising:
a depth image acquisition unit that acquires a depth image from an imaging device;
a human body detection unit that estimates, in the acquired depth image, a region centered on the neck of a human body and detects the human body from the depth image on the basis of the estimated region; and
a depth estimation unit that, when the human body is detected, estimates the depth of a specific part of the human body on the basis of the depth distribution within the region,
wherein the depth estimation unit estimates the depth of the specific part of the human body by using a discriminative model generated from a depth image of a region including the neck of a human body, that depth image being selected from a depth image of a generated three-dimensional model of the human body.

5. A learning method comprising:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a step of acquiring, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
a step of generating a discriminative model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, on the basis of the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

6. The learning method according to claim 5, wherein
in the step of generating the discriminative model, the depth distribution of the depth image of the neck region is learned as a feature quantity to generate the discriminative model, which estimates the depth difference between the depth obtained from the depth image of the neck region and the depth of the selected specific part.

7. The learning method according to claim 5 or 6, wherein
the specific part is the neck, a shoulder, or the head of the human body.

8. A program including instructions that cause a computer to execute:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body;
a step of acquiring, from the depth image of the three-dimensional model, the depth of a specific part of the human body; and
a step of generating a discriminative model for identifying the depth of a specific part of a human body from a depth image captured by an imaging device, on the basis of the depth difference between the depth of the neck region obtainable from the selected depth image of the neck region and the acquired depth of the specific part.

9. The program according to claim 8, wherein
in the step of generating the discriminative model, the depth distribution of the depth image of the neck region is learned as a feature quantity to generate the discriminative model, which estimates the depth difference between the depth obtained from the depth image of the neck region and the depth of the selected specific part.

10. The program according to claim 8 or 9, wherein
the specific part is the neck, a shoulder, or the head of the human body.