JP2021144358A

JP2021144358A - Learning apparatus, estimation apparatus, learning method, and program

Info

Publication number: JP2021144358A
Application number: JP2020041376A
Authority: JP
Inventors: Ryosuke Shibasaki; Ryutaro Tanimura
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2020-03-10
Filing date: 2020-03-10
Publication date: 2021-09-24

Abstract

PROBLEM TO BE SOLVED: To provide a technology of generating an identification model for estimating a position of a hidden section of a human body. SOLUTION: A learning apparatus 10 includes: a three-dimensional model generation unit 1 which generates a three-dimensional model of a human body; a depth image generation unit 2 which generates a depth image of the three-dimensional model; a first selection unit 3 which selects a depth image of a first specific section area including a specific section from the depth image of the three-dimensional model of the human body in a posture with the specific section hidden by another section; a second selection unit 4 which selects a depth image of a second specific section area including the specific section from a depth image of the three-dimensional model of the human body in a posture with the specific section not hidden by the other section; and a first generation unit 5 which generates a first identification model for identifying the specific section of the human body from a depth image captured by an imaging apparatus on the basis of the depth image of the first specific section area and the depth image of the second specific section area. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, an estimation device, a learning method, and a program.

Research has been conducted on recognition processing that estimates the human skeleton, such as the positions of the head, shoulders, neck, and elbows, from depth images. In this recognition processing, a three-dimensional model of the human body is generated, machine learning is used to learn what depth distribution each part (head, shoulder, etc.) of the model has, and the skeletal positions of the human body are estimated. If only the visible parts of the three-dimensional model are learned, the joint position of a shoulder, for example, may become impossible to estimate when the shoulder is hidden by an arm held in front of it.

Patent Document 1 discloses a technique that uses a depth image to recognize not only the visible portions of an analysis target but also hidden portions that cannot be seen because they are hidden by other parts. In Patent Document 1, a classification tree is used to recognize whether the analysis target is a visible portion or a hidden portion. Based on the result, the depth value of the analyzed portion is restored and each part is estimated. The classification tree is trained to improve recognition performance.

Non-Patent Document 1 discloses a technique for estimating the coordinates of the joint positions of the human body from a depth image. In Non-Patent Document 1, joint positions are estimated by regression analysis from the positions of similar parts classified into each leaf node of a classification tree that classifies the visible parts of the human body, so that hidden parts can also be estimated.

Patent Document 1: JP-A-2017-208126

Non-Patent Document 1: Jamie Shotton, Ross Girshick, Andrew Fitzgibbon, Toby Sharp, Mat Cook, Mark Finocchio, Richard Moore, Pushmeet Kohli, Antonio Criminisi, Alex Kipman, Andrew Blake, “Efficient Human Pose Estimation from Single Depth Images”, Article in IEEE Transactions on Software Engineering, December 2013

However, Patent Document 1 takes both visible and hidden portions as input and uses the classification tree to recognize whether the analysis target is a visible portion or a hidden portion. The recognition processing therefore requires a large amount of data, and the processing may become complicated or the processing time may become long. Non-Patent Document 1 performs multi-class classification of the parts of the whole human body, so it too may suffer from complicated processing or long processing times.

An example of an object of the present invention is to provide a learning device, a learning method, and a program for generating an identification model capable of estimating the position of a hidden part of the human body, and an estimation device that uses the identification model.

In order to achieve the above object, a learning device according to one aspect of the present invention includes:
a three-dimensional model generation unit that generates a three-dimensional model of a human body;
a depth image generation unit that generates a depth image of the three-dimensional model;
a first selection unit that selects a depth image of a first specific part region including a specific part of the human body from a depth image of the three-dimensional model in a posture in which the specific part is hidden by another part;
a second selection unit that selects a depth image of a second specific part region including the specific part from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part; and
a first generation unit that generates, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of the human body from a depth image captured by an imaging device.

In order to achieve the above object, an estimation device according to one aspect of the present invention includes:
a depth image acquisition unit that acquires a depth image from an imaging device;
a person detection unit that estimates, in the acquired depth image, a region including the neck of a human body and detects a person from the depth image based on the estimated region; and
a position estimation unit that, when the person is detected, estimates the position of a specific part of the human body around the neck using a first identification model, based on the depth distribution in the region,
wherein the first identification model is generated based on a depth image of a first specific part region and a depth image of a second specific part region,
the depth image of the first specific part region, which includes the specific part, being selected from a depth image of a three-dimensional model of the human body in a posture in which the specific part is hidden by another part, and
the depth image of the second specific part region, which includes the specific part, being selected from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part.

Further, in order to achieve the above object, a learning method according to one aspect of the present invention includes:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting a depth image of a first specific part region including a specific part of the human body from a depth image of the three-dimensional model in a posture in which the specific part is hidden by another part;
a step of selecting a depth image of a second specific part region including the specific part from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part; and
a step of generating, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of the human body from a depth image captured by an imaging device.

Furthermore, in order to achieve the above object, a program according to one aspect of the present invention includes instructions that cause a computer to execute:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting a depth image of a first specific part region including a specific part of the human body from a depth image of the three-dimensional model in a posture in which the specific part is hidden by another part;
a step of selecting a depth image of a second specific part region including the specific part from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part; and
a step of generating, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of the human body from a depth image captured by an imaging device.

As described above, according to the present invention, the position of a specific part of the human body can be estimated from a depth image captured by an imaging device even when that specific part is hidden by another part.

FIG. 1 is a configuration diagram of the learning device.
FIG. 2 is a block diagram showing a specific configuration of the learning device.
FIG. 3 is a diagram showing a three-dimensional model in a posture in which occlusion occurs at a specific part.
FIG. 4 is a diagram showing a three-dimensional model in a posture in which occlusion does not occur at a specific part.
FIG. 5 is a diagram for explaining a depth image of the neck region selected by the third selection unit.
FIG. 6 is a diagram for explaining a depth image of the neck region selected by the third selection unit.
FIG. 7 is a block diagram showing the configuration of the estimation device.
FIG. 8 is a flow chart showing the operation of the learning device.
FIG. 9 is a flow chart showing the operation of the learning device.
FIG. 10 is a flow chart showing the operation of the estimation device.
FIG. 11 is a block diagram showing an example of a computer that realizes the learning device.

Hereinafter, a learning device and a learning method according to an embodiment of the present invention will be described with reference to FIGS. 1 to 11.

[Device configuration]
FIG. 1 is a configuration diagram of the learning device 10.

The learning device 10 is a device for training an identification model for estimating the position of a specific part of the human body from a depth image captured by an imaging device (hereinafter referred to as a captured image). By training the identification model using a generated three-dimensional model of the human body, the learning device 10 makes it possible to improve the accuracy of estimating the position of the specific part of the human body from the captured image.

The learning device 10 includes a three-dimensional model generation unit 1, a depth image generation unit 2, a first selection unit 3, a second selection unit 4, and a first generation unit 5.

The three-dimensional model generation unit 1 generates a three-dimensional model of the human body.

The depth image generation unit 2 generates a depth image of the generated three-dimensional model.

The first selection unit 3 selects a depth image of a first specific part region including the specific part from a depth image of the three-dimensional model in a posture in which the specific part of the human body is hidden by another part. The "specific part" is the head, a shoulder, or the neck of the human body. The "other part" is a part that can be positioned in front of the specific part when the three-dimensional model of the human body is viewed from one direction. For example, when the "specific part" is a "shoulder", the "other part" is an "arm", a "hand", an "elbow", or the like. A posture in which the specific part of the human body is hidden by another part is a state in which the other part is positioned in front of the specific part when the three-dimensional model is viewed from one direction.

In the following description, a posture in which the specific part of the human body is hidden by another part may also be referred to as a posture in which occlusion occurs at the specific part.

The second selection unit 4 selects a depth image of a second specific part region including the specific part from a depth image of the three-dimensional model in a posture in which the specific part of the human body is not hidden by another part. A posture in which the specific part is not hidden by another part is a state in which no other part is positioned in front of the specific part when the three-dimensional model of the human body is viewed from one direction.

In the following description, a posture in which the specific part of the human body is not hidden by another part may also be referred to as a posture in which occlusion does not occur at the specific part.
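The distinction between the two postures can be illustrated with a simple depth test. The following is a minimal sketch only, not the method of this disclosure: it assumes that the pixel position and the distance of the specific part are known from the three-dimensional model, and the function name `is_occluded` and the `tolerance` parameter are hypothetical.

```python
from typing import List, Tuple

Depth = List[List[float]]  # depth[row][col]: distance from the camera to the nearest surface


def is_occluded(depth: Depth, part_px: Tuple[int, int], part_distance: float,
                tolerance: float = 0.05) -> bool:
    """Return True when a surface clearly nearer than the part covers its pixel.

    part_px is (col, row) of the specific part in the depth image and
    part_distance is its distance from the camera, both taken from the
    three-dimensional model. tolerance (hypothetical, same unit as the
    depth values) absorbs the gap between a joint centre and the body surface.
    """
    col, row = part_px
    return depth[row][col] < part_distance - tolerance


# An arm at distance 1.5 in front of a shoulder at distance 2.0.
hidden = [[2.0, 2.0, 2.0], [2.0, 1.5, 2.0], [2.0, 2.0, 2.0]]
visible = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
print(is_occluded(hidden, (1, 1), 2.0), is_occluded(visible, (1, 1), 2.0))
```

Under these assumptions, a posture would be routed to the first selection unit 3 when the test returns True and to the second selection unit 4 otherwise.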

The first generation unit 5 generates a first identification model for identifying the specific part of the human body from a captured image, based on the depth image of the first specific part region and the depth image of the second specific part region.

The learning device 10 configured in this way generates the first identification model from both a three-dimensional model in a posture in which the specific part of the human body is hidden by another part and a three-dimensional model in a posture in which the specific part is not hidden. Consequently, when a person is imaged by an imaging device and the position of the specific part of the human body is estimated from the captured image, the position can be estimated using the first identification model regardless of whether the person in the captured image is in a posture in which the specific part is hidden by another part or in a posture in which it is not.

Next, the configuration of the learning device 10 will be described in detail with reference to FIGS. 2 to 6.

FIG. 2 is a block diagram showing a specific configuration of the learning device 10. In addition to the three-dimensional model generation unit 1, the depth image generation unit 2, the first selection unit 3, the second selection unit 4, and the first generation unit 5, the learning device 10 includes a third selection unit 7, a second generation unit 8, a first identification model storage unit 6, and a second identification model storage unit 9.

The three-dimensional model generation unit 1 generates a three-dimensional model of the human body. The three-dimensional model may be generated using motion capture or from images obtained by imaging a human body; the generation method is not particularly limited.

The depth image generation unit 2 generates a depth image of the three-dimensional model generated by the three-dimensional model generation unit 1. A depth image is an image that holds distance information indicating the distance from the camera position to each part of the human body.
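One way to picture how a depth image is obtained from a three-dimensional model is a z-buffer that keeps, for each pixel, the distance of the nearest surface point. The sketch below is illustrative only and makes simplifying assumptions that are not stated in this description: the model is given as a list of surface points, the projection is orthographic, and x and y are already expressed in pixel units. The name `render_depth` is hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def render_depth(points: Iterable[Tuple[float, float, float]],
                 width: int, height: int,
                 background: float = math.inf) -> List[List[float]]:
    """Orthographic z-buffer: per pixel, keep the smallest distance to the camera.

    Each point is (x, y, z) with x, y in pixel units and z the distance from
    the camera. Pixels that no point falls on keep the background value.
    """
    depth = [[background] * width for _ in range(height)]
    for x, y, z in points:
        col, row = int(x), int(y)
        if 0 <= col < width and 0 <= row < height and z < depth[row][col]:
            depth[row][col] = z
    return depth


# A shoulder point at distance 2.0 and an arm point at distance 1.5 on the same pixel:
image = render_depth([(1, 1, 2.0), (1, 1, 1.5), (2, 0, 2.2)], width=3, height=3)
print(image[1][1])  # only the nearer arm surface is recorded
```

The example shows why an occluded part leaves no trace of its own in the depth image: only the nearest surface at each pixel is recorded.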

When the generated three-dimensional model is in a posture in which occlusion occurs at the specific part, the first selection unit 3 selects a depth image of the first specific part region from the depth image of that three-dimensional model.

FIG. 3 is a diagram showing a three-dimensional model in a posture in which occlusion occurs at a specific part.

The generated three-dimensional model is moved either manually, for example through operation by an operator of the learning device 10, or automatically. By being moved, the three-dimensional model takes a posture in which the specific part is hidden by another part. FIG. 3 shows a posture in which the left shoulder is hidden by the left arm. Because the three-dimensional model is a generated one, the position of each part of the human body can be identified. When the specific part is the left shoulder, the first selection unit 3 identifies the position of the left shoulder. The first selection unit 3 then generates a rectangular first specific part region 31 centered on that position and selects the depth image of the first specific part region 31.
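Selecting the depth image of a rectangular region centered on a known part position can be sketched as a fixed-size crop. This is a minimal illustration under assumptions of our own: the function name `crop_region`, the half-width and half-height parameters, and the `fill` value used for pixels outside the image are all hypothetical.

```python
from typing import List, Tuple


def crop_region(depth: List[List[float]], center: Tuple[int, int],
                half_w: int, half_h: int, fill: float = 0.0) -> List[List[float]]:
    """Cut a (2*half_h + 1) x (2*half_w + 1) patch centered on center = (col, row).

    Pixels that fall outside the depth image are set to fill so that every
    patch has the same size, whatever the position of the part.
    """
    col0, row0 = center
    rows, cols = len(depth), len(depth[0])
    patch = []
    for r in range(row0 - half_h, row0 + half_h + 1):
        line = []
        for c in range(col0 - half_w, col0 + half_w + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            line.append(depth[r][c] if inside else fill)
        patch.append(line)
    return patch


depth = [[float(10 * r + c) for c in range(4)] for r in range(4)]
print(crop_region(depth, (1, 1), 1, 1))
```

Because the patch size depends only on `half_w` and `half_h`, the same call could serve for regions taken from occluded and non-occluded postures alike.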

Note that in the three-dimensional model of FIG. 3 the left shoulder is hidden by the arm, so the depth image of the first specific part region 31 selected by the first selection unit 3 does not contain a depth image of the "left shoulder" itself.

Similarly, for specific parts other than the left shoulder, the first selection unit 3 identifies the position of the right shoulder, neck, or head that has become hidden by another part as a result of the three-dimensional model being moved, generates a rectangular first specific part region 31 centered on the identified position, and selects the depth image of that first specific part region 31. The first selection unit 3 likewise selects depth images of first specific part regions 31 including the specific part not only for the three-dimensional model viewed from the front as shown in FIG. 3 but also for the three-dimensional model viewed from the side, from above, and so on. In this way, the first selection unit 3 generates first specific part regions 31 in depth images of the three-dimensional model moved in various ways and viewed from multiple directions, and selects the depth images of those first specific part regions 31 as needed.

When the generated three-dimensional model is in a posture in which occlusion does not occur at the specific part, the second selection unit 4 selects a depth image of the second specific part region from the depth image of that three-dimensional model.

FIG. 4 is a diagram showing a three-dimensional model in a posture in which occlusion does not occur at a specific part.

When the three-dimensional model is in a posture in which the specific part is not hidden by another part and the specific part is the left shoulder, the second selection unit 4 identifies the position of the left shoulder, generates a rectangular second specific part region 32 centered on that position, and selects the depth image of the second specific part region 32. The size of the second specific part region 32 is preferably the same as the size of the first specific part region 31.

Similarly, for specific parts other than the left shoulder, the second selection unit 4 identifies the position of the right shoulder, neck, or head in the three-dimensional model in a posture in which occlusion does not occur, generates a rectangular second specific part region 32 centered on the identified position, and selects the depth image of that second specific part region 32. The second selection unit 4 likewise selects depth images of second specific part regions 32 including the specific part not only for the three-dimensional model viewed from the front as shown in FIG. 4 but also for the three-dimensional model viewed from the side, from above, and so on. In this way, the second selection unit 4 generates second specific part regions 32 in depth images of the three-dimensional model moved in various ways and viewed from multiple directions, and selects the depth images of those second specific part regions 32 as needed.

The first generation unit 5 learns, as feature quantities, the depth distributions of the depth images of the first specific part regions 31 selected by the first selection unit 3 and of the depth images of the second specific part regions 32 selected by the second selection unit 4, and thereby generates the first identification model.
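This description does not fix how the depth distribution is encoded or which learning algorithm is used. As one hedged illustration, the sketch below encodes a patch as depths relative to its center pixel and pools patches from occluded and non-occluded postures under the same part label, so that a learner of any kind sees both cases. The names `depth_feature` and `build_training_set`, and the relative-depth encoding itself, are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Patch = Sequence[Sequence[float]]


def depth_feature(patch: Patch) -> List[float]:
    """Flatten a patch into depths relative to its center pixel.

    Subtracting the center depth makes the feature independent of how far
    the person stands from the camera (an assumed, illustrative encoding).
    """
    center = patch[len(patch) // 2][len(patch[0]) // 2]
    return [value - center for row in patch for value in row]


def build_training_set(occluded: Sequence[Patch], visible: Sequence[Patch],
                       part: str) -> List[Tuple[List[float], str]]:
    """Label patches from both posture types with the same part name."""
    return [(depth_feature(p), part) for p in list(occluded) + list(visible)]


occluded_patch = [[2.0, 2.0, 2.0], [2.0, 1.5, 2.0], [2.0, 2.0, 2.0]]
visible_patch = [[3.0, 2.0, 3.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
samples = build_training_set([occluded_patch], [visible_patch], "left_shoulder")
print(len(samples), samples[0][1])
```

The resulting list of (feature, label) pairs could then be passed to whichever classifier or regressor is chosen for the first identification model.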

The first generation unit 5 stores the generated first identification model in the first identification model storage unit 6. The first identification model storage unit 6 is a storage device that stores the first identification model. The first identification model storage unit 6 may be included in the learning device 10 or may be provided outside the learning device 10.

The third selection unit 7 selects a depth image of a neck region including the neck of the human body from the depth image of the generated three-dimensional model.

FIGS. 5 and 6 are diagrams for explaining the depth image of the neck region 33 selected by the third selection unit 7. As described above, because the three-dimensional model is a generated one, the position of each part of the human body can be identified. The third selection unit 7 identifies the position of the "neck" in the three-dimensional model, generates a rectangular neck region 33 centered on that position, and selects the depth image of the neck region 33.

In addition to the view from the front shown in FIG. 5, the third selection unit 7 selects depth images of the neck region including the neck of the human body with the three-dimensional model viewed from every direction. FIG. 6 shows the three-dimensional model viewed from above. In the case of FIG. 6, the head and shoulders of the three-dimensional model are visible, and the model is in a posture in which occlusion occurs because the neck is hidden by the head. Even in this state, the third selection unit 7 identifies the position of the neck, generates a rectangular neck region 33 centered on the identified position, and selects the depth image of the neck region 33.

The second generation unit 8 generates a second identification model based on the depth information of the selected depth images of the neck region 33. Specifically, the second generation unit 8 learns the depth distribution of the depth images of the neck region 33 as a feature quantity and thereby generates the second identification model.

The second generation unit 8 stores the generated second identification model in the second identification model storage unit 9. The second identification model storage unit 9 is a storage device that stores the second identification model. The second identification model storage unit 9 may be included in the learning device 10 or may be provided outside the learning device 10.

In this way, the learning device 10 trains the first identification model and the second identification model using the three-dimensional model. The first identification model and the second identification model are used by an estimation device that estimates the position of a specific part of the human body from a captured image. The first identification model is used to estimate, from the captured image, the position of the head, a shoulder, or the neck, which is the specific part of the human body. The second identification model is used for person detection, that is, for determining whether a person appears in the captured image. The estimation device is described below.

FIG. 7 is a block diagram showing the configuration of the estimation device 20.

The estimation device 20 acquires a captured image from the imaging device 30. The imaging device 30 is arranged so as to image a person in a space. The imaging device 30 may be arranged to image the human body from the front, or from the side or above. The estimation device 20 detects a person from the acquired captured image and estimates the position, in two-dimensional coordinates, of the head, a shoulder, or the neck, which is the specific part of the detected human body. Estimating the position of the head, shoulder, or neck makes it possible to estimate the joint positions around it. Estimating the joint positions in turn makes it possible to recognize gestures such as body and hand movements and to estimate a person's posture.

The estimation device 20 and the imaging device 30 may be independent devices connected so as to be capable of data communication, or the estimation device 20 may include the imaging device 30. When the first identification model storage unit 6 and the second identification model storage unit 9 are provided outside the learning device 10, the estimation device 20 may be connected, so as to be capable of data communication, only to the first identification model storage unit 6 and the second identification model storage unit 9.

The estimation device 20 includes a depth image acquisition unit 21, a person detection unit 22, a position estimation unit 23, and a joint position estimation unit 24.

The depth image acquisition unit 21 acquires a captured image from the imaging device 30.

The person detection unit 22 estimates a region including the neck of the human body in the acquired captured image and detects a person from the captured image based on the estimated region. First, the person detection unit 22 generates a reduced image of the captured image according to the distance information contained in the acquired captured image. For example, the size of a person in the captured image differs depending on whether the person is close to or far from the imaging device 30. The person detection unit 22 therefore generates a reduced image with a large reduction ratio when the person is close, and a reduced image with a small reduction ratio when the person is far away.

The person detection unit 22 estimates the position of the neck of the human body in the generated reduced image and generates a region that includes the estimated position. Here, the person detection unit 22 generates the region by, for example, processing that uses NMS (Non-Maximum Suppression).
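As an illustration of the NMS step mentioned above, the following is a minimal sketch of generic greedy non-maximum suppression over scored candidate rectangles. The source does not state how candidate regions are produced or scored, so the box format `(x1, y1, x2, y2)`, the function names, and the `iou_threshold` default are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of the boxes kept, highest score first.

    A box is dropped when it overlaps an already kept, higher-scoring box
    by more than iou_threshold (hypothetical default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Two heavily overlapping neck candidates collapse to the higher-scoring one, while a distant candidate survives, which is the behavior one would want before generating a single region per person.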

As described above, the second identification model is the depth distribution of the depth image of the neck region 33, which includes the neck, generated from the three-dimensional model. The person detection unit 22 compares the depth distribution of the region estimated in the captured image (specifically, the reduced image) with the second identification model to determine whether the estimated region includes the neck. If it determines that the region includes the neck, it concludes that a person appears in the captured image. In this way, the person detection unit 22 detects a person in the captured image.
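The comparison of a candidate region's depth distribution with a learned distribution could be sketched as below. The source does not specify the similarity measure or how the distribution is represented; this example assumes normalized histograms and uses histogram intersection with a hypothetical threshold purely for illustration.

```python
from typing import Sequence


def histogram_intersection(p: Sequence[float], q: Sequence[float]) -> float:
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(p, q))


def is_neck_region(
    candidate_hist: Sequence[float],
    model_hist: Sequence[float],
    threshold: float = 0.8,  # hypothetical decision threshold
) -> bool:
    """Decide whether the candidate's depth distribution matches the model."""
    return histogram_intersection(candidate_hist, model_hist) >= threshold
```

A learned classifier could replace the fixed threshold; the sketch only shows the shape of the decision.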

Suppose that the horizontal length (cm) of the rectangle used as the person detection range and the horizontal size (in pixels) of the neck region 33 are fixed, and that the angle of view and resolution of the imaging device are also fixed. In this case, the number of horizontal pixels in the captured image that corresponds to the horizontal length of the person detection range at the distance of interest can be calculated. If the captured image is reduced so that this number of horizontal pixels matches the horizontal pixel size of the feature, a person can be detected in the reduced image with the size of the neck region 33 held fixed.
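The calculation described above can be sketched as follows, assuming an ideal pinhole camera characterized by its horizontal angle of view. The function and parameter names are hypothetical, and the source gives no concrete formula, so this is one plausible reading rather than the method of the embodiment.

```python
import math


def pixels_for_width(
    width_cm: float, distance_cm: float, hfov_deg: float, image_width_px: int
) -> float:
    """Horizontal pixels occupied by a physical width at a given distance."""
    # Width of the scene covered by the whole image at this distance
    # (pinhole model, assumption).
    scene_width_cm = 2.0 * distance_cm * math.tan(math.radians(hfov_deg) / 2.0)
    return width_cm * image_width_px / scene_width_cm


def reduction_scale(
    width_cm: float,
    distance_cm: float,
    hfov_deg: float,
    image_width_px: int,
    region_width_px: int,
) -> float:
    """Scale factor that maps the detection width onto the fixed region width."""
    occupied_px = pixels_for_width(width_cm, distance_cm, hfov_deg, image_width_px)
    return region_width_px / occupied_px
```

Under these assumptions the scale factor is smaller for a nearby person than for a distant one, which agrees with the statement that the reduction is stronger at short distances.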

When a person is detected in the captured image, the position estimation unit 23 uses the first identification model to estimate the position of the head, shoulder, or neck, which are specific parts of the human body around the neck, based on the depth distribution in the region estimated by the person detection unit 22. The first identification model is generated from three-dimensional models in postures with occlusion and in postures without occlusion. By using the first identification model, the position estimation unit 23 can estimate the position of a specific part regardless of whether occlusion occurs in the captured image. For example, even when the person in the captured image is in a posture in which the left shoulder is hidden by the left hand, as shown in FIG. 3, the position estimation unit 23 can estimate the position of the person's left shoulder in the captured image, because the depth distribution of the region estimated by the person detection unit 22 is similar to the depth distribution learned in the first identification model generated from the three-dimensional model of FIG. 3.

The position estimation unit 23 assigns a circular label to each estimated position in the captured image. For example, it assigns a label indicating the left shoulder to the estimated position of the left shoulder.

The joint position estimation unit 24 estimates joint positions from the labels assigned by the position estimation unit 23. For example, the joint position estimation unit 24 estimates the region between the position labeled as the left shoulder and the position labeled as the neck as the position of the left shoulder joint.
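A minimal sketch of deriving a joint position from two labeled positions is shown below. The source speaks of a region between two labels; reducing that region to the midpoint of the two label centers is a simplifying assumption, and the dictionary keys and function names are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) center of a circular label, assumed


def midpoint(a: Point, b: Point) -> Point:
    """Point halfway between two label centers."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def estimate_joint(labels: Dict[str, Point], part_a: str, part_b: str) -> Point:
    """Estimate a joint as the midpoint between two labeled parts."""
    return midpoint(labels[part_a], labels[part_b])
```

For example, with a neck label at (100, 80) and a shoulder label at (140, 90), the function returns the point (120.0, 85.0).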

When occlusion between specific parts occurs in the captured image, for example in a posture in which the neck is hidden by a shoulder, the position estimation unit 23 labels the overlapping area as an overlap region of the two parts. The joint position estimation unit 24 then estimates the joint position by combining the area in which the specific parts do not overlap with the area in which they overlap.

[Device operation]
Next, the operation of the learning device 10 in the present embodiment will be described with reference to FIGS. 8 and 9. FIGS. 8 and 9 are flow diagrams showing the operation of the learning device 10. In the following description, FIGS. 1 to 7 are referred to as appropriate. In the present embodiment, the learning method is carried out by operating the learning device 10. The following description of the operation of the learning device 10 therefore also serves as the description of the learning method in the present embodiment.

FIG. 8 is a flow diagram for generating the first identification model. The three-dimensional model generation unit 1 generates a three-dimensional model (S1). The depth image generation unit 2 generates a depth image of the generated three-dimensional model (S2). The three-dimensional model is moved into postures in which occlusion occurs and postures in which it does not. The first selection unit 3 selects the depth image of the first specific part region 31 from the depth image of the three-dimensional model in which occlusion occurs (S3). Specifically, the first selection unit 3 identifies the position of a specific part of the human body in the depth image of the three-dimensional model, generates a rectangular first specific part region 31 centered on that position, and selects the depth image of the first specific part region 31. The second selection unit 4 selects the depth image of the second specific part region 32 from the depth image of the three-dimensional model in which no occlusion occurs (S4). The first generation unit 5 learns, as features, the depth distributions of the selected depth image of the first specific part region 31 and the selected depth image of the second specific part region 32, and generates the first identification model (S5). The first generation unit 5 stores the generated first identification model in the first identification model storage unit 6 (S6).
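The region selection in S3 and S4 and the depth distribution used as a feature in S5 might look like the sketch below. It assumes that a depth image is a list of rows of non-negative depth values and that the depth distribution is a normalized histogram; neither representation is stated in the source, and all names are hypothetical.

```python
from typing import List

DepthImage = List[List[float]]  # rows of non-negative depth values, assumed


def crop_region(
    depth: DepthImage, center_row: int, center_col: int, height: int, width: int
) -> DepthImage:
    """Rectangular patch centered on a part position, clipped to the image."""
    top = max(0, center_row - height // 2)
    left = max(0, center_col - width // 2)
    bottom = min(len(depth), top + height)
    right = min(len(depth[0]), left + width)
    return [row[left:right] for row in depth[top:bottom]]


def depth_histogram(patch: DepthImage, bins: int, max_depth: float) -> List[float]:
    """Normalized histogram of depth values in the patch."""
    counts = [0] * bins
    total = 0
    for row in patch:
        for d in row:
            # Values at or beyond max_depth fall into the last bin.
            index = min(bins - 1, int(d / max_depth * bins))
            counts[index] += 1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]
```

Patches cropped from occluded and non-occluded postures would both be converted to such features and passed to whatever learner builds the first identification model; the learner itself is outside this sketch.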

FIG. 9 is a flow diagram for generating the second identification model. The third selection unit 7 selects the depth image of the neck region 33, which includes the neck of the human body, from the depth image of the three-dimensional model generated in S1 of FIG. 8 (S8). The second generation unit 8 learns the depth distribution of the depth image of the neck region 33 as a feature, generates the second identification model (S9), and stores it in the second identification model storage unit 9 (S10).

The learning device 10 repeatedly executes the processing of S3 to S6 in FIG. 8 and of S8 to S10 in FIG. 9 while moving the three-dimensional model. The learning device 10 also performs each process on the three-dimensional model as viewed from every direction.

Next, the operation of the estimation device 20 will be described. FIG. 10 is a flow diagram showing the operation of the estimation device 20.

The depth image acquisition unit 21 acquires a captured image from the imaging device 30 (S21). The person detection unit 22 generates a reduced image of the acquired captured image (S22). In doing so, the person detection unit 22 generates the reduced image according to the distance information contained in the captured image: it generates a reduced image with a large reduction ratio when the person is at a short distance, and a reduced image with a small reduction ratio when the person is at a long distance.

The person detection unit 22 detects a person in the generated reduced image (S23). The person detection unit 22 estimates a region of the reduced image that includes the neck of a human body and compares the depth distribution of the depth image of that region with the second identification model. Through this comparison, the person detection unit 22 determines whether the estimated region includes the neck, and if it does, concludes that a person appears in the captured image.

When a person is detected in the captured image, the position estimation unit 23 estimates the position of the head, shoulder, or neck, which are specific parts of the human body around the neck, using the first identification model and the second identification model, based on the depth distribution of the region estimated in S23 (S24). The position estimation unit 23 then assigns a circular label to each estimated position (S25). The joint position estimation unit 24 estimates the joint positions based on the assigned labels (S26).
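The control flow of S21 to S26 can be summarized in a short sketch. Each stage is passed in as a callable because the source does not fix how any stage is implemented; the function name, parameter names, and the convention that a failed detection is signaled by `None` are assumptions.

```python
from typing import Any, Callable, Optional


def estimate_joints(
    capture: Callable[[], Any],
    shrink: Callable[[Any], Any],
    detect_person: Callable[[Any], Optional[Any]],
    estimate_parts: Callable[[Any, Any], Any],
    estimate_joint_positions: Callable[[Any], Any],
) -> Optional[Any]:
    """Run one pass of the estimation flow; return None if nobody is detected."""
    image = capture()                          # S21: acquire the captured image
    reduced = shrink(image)                    # S22: distance-dependent reduction
    region = detect_person(reduced)            # S23: neck-based person detection
    if region is None:
        return None                            # no person, nothing to estimate
    labels = estimate_parts(reduced, region)   # S24 and S25: positions and labels
    return estimate_joint_positions(labels)    # S26: joints from the labels
```

Passing the stages in as functions keeps the sketch independent of any particular model format and makes each stage easy to replace with a stub during testing.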

As described above, because the learning device 10 generates the first identification model and the second identification model, the estimation device 20 can estimate the position of a specific part of a person in the captured image regardless of whether occlusion occurs. In addition, because the learning device 10 generates the second identification model and the estimation device 20 uses it to detect a person centered on the neck, a person can be detected in the captured image independently of the imaging angle of the imaging device 30. Furthermore, because the learning device 10 generates the first identification model and the second identification model from the three-dimensional model as viewed from every direction, the position of a specific part of the human body in the captured image can be estimated independently of the imaging angle of the imaging device 30.

[Program]
The program in the present embodiment may be any program that causes a computer to execute the steps shown in FIGS. 8 and 9. By installing this program on a computer and executing it, the learning device and the learning method in the present embodiment can be realized. In this case, the processor of the computer functions as the three-dimensional model generation unit 1, the depth image generation unit 2, the first selection unit 3, the second selection unit 4, the first generation unit 5, the third selection unit 7, and the second generation unit 8, and performs the processing.

The program in the present embodiment may also be executed by a computer system made up of a plurality of computers. In this case, for example, each computer may function as any one of the three-dimensional model generation unit 1, the depth image generation unit 2, the first selection unit 3, the second selection unit 4, the first generation unit 5, the third selection unit 7, and the second generation unit 8.

Furthermore, in the present embodiment, the first identification model storage unit 6 and the second identification model storage unit 9 may be realized by a storage device such as a hard disk provided in the computer, or by a storage device of another computer.

The estimation device in the present embodiment can likewise be realized by installing on a computer, and executing, a program that causes the computer to execute the steps shown in FIG. 10. In this case, the processor of the computer functions as the depth image acquisition unit 21, the person detection unit 22, the position estimation unit 23, and the joint position estimation unit 24, and performs the processing. This program may be executed by a computer system made up of a plurality of computers. In this case, for example, each computer may function as any one of the depth image acquisition unit 21, the person detection unit 22, the position estimation unit 23, and the joint position estimation unit 24.

Examples of the computer include smartphones and tablet terminals in addition to general-purpose PCs.

[Physical configuration]
A computer that realizes the learning device 10 by executing the program in the present embodiment will now be described with reference to FIG. 11. FIG. 11 is a block diagram showing an example of a computer that realizes the learning device 10.

As shown in FIG. 11, the computer 110 includes a CPU (Central Processing Unit) 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to one another via a bus 121 so that they can exchange data.

The computer 110 may include a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array) in addition to, or in place of, the CPU 111. In this case, the GPU or the FPGA can execute the program in the embodiment.

The CPU 111 loads the program of the embodiment, which consists of a group of codes stored in the storage device 113, into the main memory 112 and executes the codes in a predetermined order to perform various operations. The main memory 112 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory).

The program in the embodiment is provided stored on a computer-readable recording medium 120. The program in the present embodiment may also be distributed over the Internet, to which the computer is connected via the communication interface 117.

Specific examples of the storage device 113 include a hard disk drive and a semiconductor storage device such as a flash memory. The input interface 114 mediates data transmission between the CPU 111 and input devices 118 such as a keyboard and a mouse. The display controller 115 is connected to a display device 119 and controls the display on the display device 119.

The data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120, reads the program from the recording medium 120, and writes the results of processing in the computer 110 to the recording medium 120. The communication interface 117 mediates data transmission between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic recording media such as a flexible disk, and optical recording media such as a CD-ROM (Compact Disk Read Only Memory).

The learning device 10 in the present embodiment can also be realized by using hardware corresponding to each unit rather than a computer on which the program is installed. Furthermore, part of the learning device 10 may be realized by the program and the remainder by hardware.

The computer that realizes the estimation device 20 is the same as that shown in FIG. 11, and its description is therefore omitted.

Part or all of the embodiment described above can be expressed by (Appendix 1) to (Appendix 10) below, but is not limited to the following description.

(Appendix 1)
A learning device comprising:
a three-dimensional model generation unit that generates a three-dimensional model of a human body;
a depth image generation unit that generates a depth image of the three-dimensional model;
a first selection unit that selects, from a depth image of the three-dimensional model in a posture in which a specific part of the human body is hidden by another part, a depth image of a first specific part region including the specific part;
a second selection unit that selects, from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part, a depth image of a second specific part region including the specific part; and
a first generation unit that generates, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of a human body in a depth image captured by an imaging device.

(Appendix 2)
The learning device according to Appendix 1, wherein
the specific part is the head, a shoulder, or the neck of the human body, and
the first generation unit learns the depth distribution of a depth image as a feature to generate the first identification model.

(Appendix 3)
The learning device according to Appendix 1 or Appendix 2, further comprising:
a third selection unit that selects, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body; and
a second generation unit that generates, based on depth information of the selected depth image of the neck region, a second identification model for identifying the presence or absence of a human body in a depth image captured by an imaging device.

(Appendix 4)
The learning device according to Appendix 3, wherein
the second generation unit learns the depth distribution of the depth image of the neck region as a feature to generate the second identification model.

(Appendix 5)
An estimation device comprising:
a depth image acquisition unit that acquires a depth image from an imaging device;
a person detection unit that estimates a region including the neck of a human body in the acquired depth image and detects a person in the depth image based on the estimated region; and
a position estimation unit that, when the person is detected, uses an identification model to estimate the position of a specific part of the human body around the neck based on the depth distribution in the region,
wherein the identification model is generated based on a depth image of a first specific part region including the specific part, selected from a depth image of a three-dimensional model of the human body in a posture in which the specific part is hidden by another part, and a depth image of a second specific part region including the specific part, selected from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part.

(Appendix 6)
The estimation device according to Appendix 5, wherein
the person detection unit detects the person using a second identification model generated from a depth image of a region including the neck of the human body, the depth image being selected from the depth image of the three-dimensional model.

(Appendix 7)
A learning method comprising:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting, from a depth image of the three-dimensional model in a posture in which a specific part of the human body is hidden by another part, a depth image of a first specific part region including the specific part;
a step of selecting, from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part, a depth image of a second specific part region including the specific part; and
a step of generating, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of a human body in a depth image captured by an imaging device.

(Appendix 8)
The learning method according to Appendix 7, wherein
the specific part is the head, a shoulder, or the neck of the human body, and
in the step of generating the first identification model, the depth distribution of a depth image is learned as a feature to generate the first identification model.

(Appendix 9)
A program including instructions that cause a computer to execute:
a step of generating a three-dimensional model of a human body;
a step of generating a depth image of the three-dimensional model;
a step of selecting, from a depth image of the three-dimensional model in a posture in which a specific part of the human body is hidden by another part, a depth image of a first specific part region including the specific part;
a step of selecting, from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part, a depth image of a second specific part region including the specific part; and
a step of generating, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of a human body in a depth image captured by an imaging device.

(Appendix 10)
The program according to Appendix 9, wherein
the specific part is the head, a shoulder, or the neck of the human body, and
in the step of generating the first identification model, the depth distribution of a depth image is learned as a feature to generate the first identification model.

1 Three-dimensional model generation unit
2 Depth image generation unit
3 First selection unit
4 Second selection unit
5 First generation unit
6 First identification model storage unit
7 Third selection unit
8 Second generation unit
9 Second identification model storage unit
10 Learning device
20 Estimation device
21 Depth image acquisition unit
22 Person detection unit
23 Position estimation unit
24 Joint position estimation unit
30 Imaging device
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

A learning device comprising:
a three-dimensional model generation unit that generates a three-dimensional model of a human body;
a depth image generation unit that generates a depth image of the three-dimensional model;
a first selection unit that selects, from a depth image of the three-dimensional model in a posture in which a specific part of the human body is hidden by another part, a depth image of a first specific part region including the specific part;
a second selection unit that selects, from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part, a depth image of a second specific part region including the specific part; and
a first generation unit that generates, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of a human body in a depth image captured by an imaging device.

The learning device according to claim 1, wherein
the specific part is the head, a shoulder, or the neck of the human body, and
the first generation unit learns the depth distribution of a depth image as a feature to generate the first identification model.

The learning device according to claim 1 or 2, further comprising:
a third selection unit that selects, from the depth image of the three-dimensional model, a depth image of a neck region including the neck of the human body; and
a second generation unit that generates, based on depth information of the selected depth image of the neck region, a second identification model for identifying the presence or absence of a human body in a depth image captured by an imaging device.

The learning device according to claim 3, wherein
the second generation unit learns the depth distribution of the depth image of the neck region as a feature to generate the second identification model.

An estimation device comprising:
a depth image acquisition unit that acquires a depth image from an imaging device;
a person detection unit that estimates a region including the neck of a human body in the acquired depth image and detects a person in the depth image based on the estimated region; and
a position estimation unit that, when the person is detected, uses an identification model to estimate the position of a specific part of the human body around the neck based on the depth distribution in the region,
wherein the identification model is generated based on a depth image of a first specific part region including the specific part, selected from a depth image of a three-dimensional model of the human body in a posture in which the specific part is hidden by another part, and a depth image of a second specific part region including the specific part, selected from a depth image of the three-dimensional model in a posture in which the specific part is not hidden by another part.

The estimation device according to claim 5, wherein
the person detection unit detects the person using a second identification model generated from a depth image of a region including the neck of the human body, the depth image being selected from the depth image of the three-dimensional model.

A learning method including:
A step of generating a three-dimensional model of a human body;
A step of generating a depth image of the three-dimensional model;
A step of selecting a depth image of a first specific part region including a specific part of the human body from the depth image of the three-dimensional model in a posture in which the specific part is hidden by another part;
A step of selecting a depth image of a second specific part region including the specific part from the depth image of the three-dimensional model in a posture in which the specific part is not hidden by the other part; and
A step of generating, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of the human body from a depth image captured by an imaging device.
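
The two selection steps of this method can be sketched in Python as shown below. This is a minimal sketch under stated assumptions: the helper names, the square crop, and the occlusion test (the part counts as hidden when the rendered surface at its pixel is closer to the camera than the part itself by more than a tolerance) are illustrative choices, not details taken from the claims.

```python
import numpy as np

def crop_part_region(depth_image, center, half_size):
    """Crop a square region around the part's pixel, clipped at the top-left edge."""
    row, col = center
    top, left = max(row - half_size, 0), max(col - half_size, 0)
    return depth_image[top:row + half_size + 1, left:col + half_size + 1]

def is_part_hidden(depth_image, center, part_depth, tolerance=0.05):
    """Assumed test: hidden if the rendered surface is in front of the part."""
    row, col = center
    return depth_image[row, col] < part_depth - tolerance

def select_training_regions(samples, half_size=1):
    """Split part regions into hidden-posture and visible-posture groups."""
    hidden, visible = [], []
    for depth_image, center, part_depth in samples:
        region = crop_part_region(depth_image, center, half_size)
        if is_part_hidden(depth_image, center, part_depth):
            hidden.append(region)   # first specific part region
        else:
            visible.append(region)  # second specific part region
    return hidden, visible

# Toy renders: the shoulder sits at pixel (2, 2), 2.0 m from the camera.
visible_pose = np.full((5, 5), 2.0)
occluded_pose = visible_pose.copy()
occluded_pose[2, 2] = 1.2  # an arm in front of the shoulder
hidden, visible = select_training_regions(
    [(occluded_pose, (2, 2), 2.0), (visible_pose, (2, 2), 2.0)]
)
```

Both groups would then be given to whichever learner generates the first identification model, so that the model sees the specific part in hidden and in visible postures.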

The learning method according to claim 7, wherein
The specific part is the head, shoulders, or neck of the human body, and
In the step of generating the first identification model, the depth distribution of the depth image is learned as a feature amount to generate the first identification model.

A program containing instructions that cause a computer to execute:
A step of generating a three-dimensional model of a human body;
A step of generating a depth image of the three-dimensional model;
A step of selecting a depth image of a first specific part region including a specific part of the human body from the depth image of the three-dimensional model in a posture in which the specific part is hidden by another part;
A step of selecting a depth image of a second specific part region including the specific part from the depth image of the three-dimensional model in a posture in which the specific part is not hidden by the other part; and
A step of generating, based on the depth image of the first specific part region and the depth image of the second specific part region, a first identification model for identifying the specific part of the human body from a depth image captured by an imaging device.

The program according to claim 9, wherein
The specific part is the head, shoulders, or neck of the human body, and
In the step of generating the first identification model, the depth distribution of the depth image is learned as a feature amount to generate the first identification model.