JP2015028702A

JP2015028702A - Information processor, information processing method, and program

Info

Publication number: JP2015028702A
Application number: JP2013157783A
Authority: JP
Inventors: 敦史野上; Atsushi Nogami; 優和真継; Masakazu Matsugi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-07-30
Filing date: 2013-07-30
Publication date: 2015-02-12

Abstract

PROBLEM TO BE SOLVED: To accurately estimate the posture of an object using a limited number of pieces of known posture information. SOLUTION: An object region extraction section 102 extracts person region information from an input distance image. A posture selection section 103 selects, from the known posture information stored in a storage section 106, a plurality of pieces of posture information to be compared with the extracted person region information. A partial posture selection section 104 evaluates the consistency between the point group of the extracted person region and a cylindrical model of each part in the selected posture information, and selects the partial postures whose consistency is equal to or greater than a prescribed value. A posture integration section 105 integrates the partial posture information and outputs the result as person posture information.

Description

The present invention relates in particular to an information processing apparatus, an information processing method, and a program suitable for estimating the posture of an object.

As a method for estimating the posture of a person in an image, the method described in Non-Patent Document 1, for example, is known. That method estimates the posture of the person in the image by selecting, from a set of known postures, the posture closest to that of the person in the input image. More specifically, the silhouette of the person region in the input image is classified by a decision tree, and the known posture recorded at the leaf of the decision tree is output. Because such a method selects its output from existing human postures, it has the advantage that the output is always a valid human posture.

Non-Patent Document 1: Okada, Ryuzo, Stenger, Bjorn, "A Single Camera Motion Capture System for Human-Computer Interaction", IEICE Transactions on Information and Systems, Volume E91.D, Issue 7, pp. 1855-1862 (2010).
Non-Patent Document 2: Mao Ye, Xianwang Wang, Ruigang Yang, Liu Ren, and Marc Pollefeys, "Accurate 3D pose estimation from a single depth image," IEEE International Conference on Computer Vision, 2011, pp. 731-738.

However, a method that outputs a known posture as the estimation result, such as the method described in Non-Patent Document 1, has the problem that the postures it can output are limited. Such a method cannot output a posture that differs from the known postures; it outputs the closest of the known postures instead. As a result, the output posture may include parts (for example, an arm or a leg) that clearly lie outside the person region.

A simple way to address this problem would be to increase the number of known postures prepared in advance. However, preparing every possible posture as a known posture makes the memory used by the dictionary enormous. Moreover, the work of preparing information on every possible human posture is itself very difficult.

In view of the above problems, an object of the present invention is to make it possible to estimate the posture of an object more accurately using a limited number of pieces of known posture information.

An information processing apparatus according to the present invention comprises: extraction means for extracting region information of an object from an input distance image; storage means for storing a plurality of pieces of posture information about the object; selection means for selecting, from the posture information stored in the storage means, a plurality of pieces of posture information to be compared with the region information of the object extracted by the extraction means; evaluation means for evaluating, for the posture information selected by the selection means, the consistency with the region information of the object for each part constituting the region of the object, and for acquiring, from the selected posture information, partial posture information relating to the posture of each part whose consistency is equal to or greater than a predetermined value; and integration means for integrating the partial posture information acquired by the evaluation means to estimate the posture of the object.

According to the present invention, the posture of an object can be estimated more accurately without increasing the number of kinds of known posture information.

FIG. 1 is a block diagram illustrating a configuration example of the information processing apparatus according to the first embodiment.
FIG. 2 is a flowchart illustrating an example of the basic processing procedure of the information processing apparatus according to the first embodiment.
FIG. 3 is a diagram for explaining the process of selecting posture information corresponding to the extracted human body region.
FIG. 4 is a diagram illustrating an example of a three-dimensional model of a person created from the joint positions of known posture information.
FIG. 5 is a diagram illustrating an example in which a three-dimensional model of a person created from posture information is fitted to the three-dimensional point group representing the person region of a distance image.
FIG. 6 is a diagram illustrating an example in which the joint positions of posture information or a projected person model are superimposed on the person region of a distance image.
FIG. 7 is a flowchart illustrating an example of the detailed processing procedure of the partial posture selection unit in the first embodiment.
FIG. 8 is a block diagram illustrating a configuration example of the information processing apparatus according to the second and third embodiments.
FIG. 9 is a flowchart illustrating an example of the basic processing procedure of the information processing apparatus according to the second embodiment.
FIG. 10 is a diagram for explaining a method of selecting reference part information.
FIG. 11 is a diagram illustrating an example in which reference part information and partial posture information are integrated.

(First embodiment)
In the following embodiments, the target object is described as a person, and the input image is described as a distance image. However, the target object and the image of the present invention are not limited to these.

In the following description, the posture of a person, the parts of a person, and a partial posture are defined as follows. First, the posture of a person is expressed by the position of each joint in three-dimensional space. This joint position information is called posture information. Next, a part of a person is a portion of the person that can be regarded as locally rigid, for example, the head, torso, upper arm, forearm, thigh, or lower leg. The portion of the posture information that represents the position and orientation of a given part is called part information. A partial posture is a partial portion of the person's posture, for example, the posture of an arm consisting of the upper arm and the forearm. A partial posture may also be the position and orientation of a single part (for example, the forearm alone). The joint position coordinates that express the position and orientation of a partial posture are called partial posture information.

The representation of posture information is not limited to joint position coordinates. For example, the other joint positions may be expressed by the three-dimensional position coordinates of a reference joint and angles from that reference joint. Information representing the position and orientation of a rigid-body model of each part (for example, center coordinates and rotation information) may also be used. Any representation may be used as long as the processing of the present embodiment can be executed.

<Overall configuration>
FIG. 1 is a block diagram illustrating a configuration example of the information processing apparatus 100 according to the present embodiment. The information processing apparatus 100 can be realized by executing software (a program), acquired via a network or various recording media, on a computer comprising a CPU, memory, a storage device, input/output devices, a bus, a display device, and the like. The computer used for control may be a general-purpose computer or hardware designed specifically for the software of the present invention.

FIG. 2 is a flowchart illustrating an example of the basic processing procedure performed by the information processing apparatus 100 according to the present embodiment. The basic processing of the present embodiment is described below with reference to FIGS. 1 and 2. The details of each process are described later.

The image input unit 101 illustrated in FIG. 1 receives a distance image showing the person whose posture is to be estimated, and executes, for example, step S201 in FIG. 2. The object region extraction unit 102 extracts person region information from the input distance image, and executes, for example, step S202 in FIG. 2. The recording unit 106 stores a plurality of pieces of known posture information. The posture selection unit 103 selects, from the known posture information stored in the recording unit 106, a plurality of postures estimated to be close to the posture of the person in the person region, and executes, for example, step S203 in FIG. 2.

The partial posture selection unit 104 compares the selected pieces of known posture information with the person region information and selects, as partial posture information, the portions in which the known posture information matches the person region information; it executes, for example, step S204 in FIG. 2. The posture integration unit 105 integrates the partial posture information selected by the partial posture selection unit 104 to generate person posture information and outputs it as the posture estimation result; it executes, for example, step S205 in FIG. 2.

The processing performed by each unit is described in detail below.

<Image input unit 101>
The distance image input to the image input unit 101 is an image in which each pixel carries distance information in the depth direction. A distance image can be acquired, for example, by a method that calculates distance from the arrival time of emitted light, or by a method that projects patterned light and calculates distance from the deformation of the pattern. The image input unit 101 acquires the distance image from a camera that captures distance images by such a method. Alternatively, previously captured distance images stored in a recording device (not shown) may be input to the image input unit 101 sequentially.

<Object region extraction unit 102>
Next, the object region extraction unit 102 extracts the image region of the person from the input distance image and uses information such as the coordinates representing this image region as the person region information. The person region information can be extracted from the distance image as follows.

First, background subtraction or moving-object region extraction is applied to the distance image to extract only the pixels that are candidates for the person. Next, the distance values of these foreground candidate pixels are converted into a point group in the camera coordinate system (a three-dimensional coordinate system). The center position of the cluster of three-dimensional points is obtained, and, among the points around that center, those lying within a range corresponding to the size of a person are taken as the person region. Projecting the points labeled as the person region back onto the image plane yields a distance image in which the person region has been extracted. In the following embodiments, this distance image of the extracted person region serves as the person region information. The method of extracting the person region information is not limited to this, and any known method may be used. For example, floor-surface estimation may be added to the above so that the person region is separated more accurately.
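The extraction steps in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the pinhole intrinsics fx, fy, cx, cy, the threshold min_diff, the box-shaped person-size test half_extent, and all function names are hypothetical, and the final reprojection onto the image plane is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def foreground_mask(depth: Sequence[Sequence[float]],
                    background: Sequence[Sequence[float]],
                    min_diff: float) -> List[List[bool]]:
    """Background subtraction: keep valid pixels that differ from the background depth."""
    return [
        [d > 0 and abs(d - b) > min_diff for d, b in zip(drow, brow)]
        for drow, brow in zip(depth, background)
    ]


def backproject(depth: Sequence[Sequence[float]],
                mask: Sequence[Sequence[bool]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Convert masked depth pixels to camera-coordinate points (assumed pinhole model)."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if mask[v][u]:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def person_points(points: Sequence[Point],
                  half_extent: Tuple[float, float, float]) -> List[Point]:
    """Keep the points inside a person-sized box centered on the cluster centroid."""
    if not points:
        return []
    n = len(points)
    center = [sum(p[i] for p in points) / n for i in range(3)]
    return [
        p for p in points
        if all(abs(p[i] - center[i]) <= half_extent[i] for i in range(3))
    ]
```

The centroid is used here as the "center position of the cluster"; a more robust center estimate could be substituted without changing the rest of the sketch.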

<Recording unit 106>
Next, the posture information recorded in the recording unit 106 is described. As illustrated in FIG. 3, the recording unit 106 stores m pieces of posture information 301 to 30m. The posture information in the present embodiment may be held in any form as long as it can represent the positional relationship between parts of the person such as the limbs and the head. One form is the coordinates of the principal locations of the person, namely each joint position and the end locations such as the head, hands, and feet. These are collectively referred to below as joint positions.

The posture information 301 to 30m shown in FIG. 3 is information on joint positions such as the head, neck, shoulders, chest, waist, elbows, wrists, hip joints, knees, and ankles. The joint position information recorded in the recording unit 106 is not limited to these and may further include the fingertips, toes, and so on. In the present embodiment, the above joint position information is assumed to be recorded as known posture information in the form of three-dimensional information P_H relative to the waist position, as shown in equation (1) below.

Here, (ξ_xj, ξ_yj, ξ_zj) (j = 0, 1, ..., J) denotes the three-dimensional coordinates of the j-th joint. For example, if (ξ_x0, ξ_y0, ξ_z0) is the waist position, then ξ_x0 = 0, ξ_y0 = 0, and ξ_z0 = 0, and the other joint positions are expressed in a coordinate system whose origin is the waist position. The joint used as the origin is not limited to the waist and may be another joint, but the following description assumes that three-dimensional information with the waist position as the origin is recorded in the recording unit 106.
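The image for equation (1) is not reproduced in this text. A form consistent with the definitions given here would be the following; it is a reconstruction offered for readability, and the notation of the original equation may differ.

```latex
\[
P_H = \bigl\{\, (\xi_{xj},\ \xi_{yj},\ \xi_{zj}) \;\bigm|\; j = 0, 1, \ldots, J \,\bigr\}
\tag{1}
\]
```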

Ideally, the recording unit 106 would hold known posture information for every possible posture, but in practice the set of postures is discrete and limited. It is nevertheless desirable that the recorded posture information be able to express many postures through combinations of its partial postures. For example, several kinds of known posture information should be recorded so that a given posture can be expressed by combining the upper body of one known posture with the lower body of another.

<Posture selection unit 103>
The posture selection unit 103 selects, from the recording unit 106, a plurality of pieces of known posture information close to the person region information extracted by the object region extraction unit 102. As illustrated in FIG. 3, for the person region 310, the posture selection unit 103 selects n (n < m) pieces of posture information 321 to 32n from the recording unit 106.

First, the recording unit 106 is assumed to record, together with each piece of known posture information 301 to 30m, the person region information of a distance image that matches that posture. The posture selection unit 103 compares the input person region information with the person region information associated with each piece of known posture information and sorts the known posture information in ascending order of error. The person region information may be converted into feature values for this comparison. As the feature, a feature designed for grayscale images, such as the HOG (Histograms of Oriented Gradients) feature, can be adapted to distance images. When feature values are compared, the information to be recorded in the recording unit 106 is the image feature of the person region for each known posture.

The posture selection unit 103 then selects a predetermined number of pieces of known posture information, starting from the smallest error. Instead of a fixed number, all pieces of posture information whose error is at or below a predetermined value may be selected and output. If the nearest-neighbor posture information, which has the smallest error, can be judged to match the input person region information (for example, when its error is far smaller than that of the other posture information), the processing described below may be omitted and that posture information may be used as the output of the information processing apparatus 100.
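The selection rules described here (a fixed number of best candidates, an error threshold, and the shortcut for a clearly dominant nearest neighbor) can be sketched as follows. The function names, the dictionary of per-posture errors, and in particular the ratio test used to decide that the best candidate is "far smaller" are assumptions made for illustration; the text does not specify how that judgment is made.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def rank_by_error(errors: Dict[str, float]) -> Ranked:
    """Sort known postures in ascending order of error against the person region."""
    return sorted(errors.items(), key=lambda item: item[1])


def select_top_k(ranked: Ranked, k: int) -> List[str]:
    """Select a predetermined number of postures, smallest error first."""
    return [pose_id for pose_id, _ in ranked[:k]]


def select_below(ranked: Ranked, max_error: float) -> List[str]:
    """Select every posture whose error is at or below a predetermined value."""
    return [pose_id for pose_id, err in ranked if err <= max_error]


def nearest_is_decisive(ranked: Ranked, ratio: float = 0.1) -> bool:
    """Hypothetical test: the best error is at most `ratio` times the second best."""
    if len(ranked) < 2:
        return len(ranked) == 1
    return ranked[0][1] <= ratio * ranked[1][1]


errors = {"301": 0.8, "302": 0.2, "303": 0.5}
ranked = rank_by_error(errors)
print(select_top_k(ranked, 2), select_below(ranked, 0.5), nearest_is_decisive(ranked))
```

When nearest_is_decisive returns True, the nearest posture could be output directly and the partial posture processing skipped, as the paragraph above permits.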

The method of selecting posture information similar to the person region information from the recording unit 106 is not limited to the above. For example, a decision tree may be used, as in the method described in Non-Patent Document 1. A decision tree can search efficiently for similarity between the person region information and a large number of pieces of known posture information, so it is effective in terms of processing speed. When decision trees are used, a plurality of pieces of posture information can be selected by, for example, storing the probabilities of several pieces of known posture information at each leaf node, or by searching for similar known posture information with several decision trees.

Other approximate nearest-neighbor search methods, not only decision trees, can also be used to search the known posture information efficiently; one example is a hash-based method. Yet another approach, described in Non-Patent Document 2, compares the point group of the person region with a pre-stored point group for each posture. Any method may be used as long as it can select a plurality of pieces of known posture information that are likely to match the person in the input image.

In the present embodiment, a plurality of pieces of known posture information is selected, for example, as follows. When an error or distance between each piece of posture information and the person region information is available, the pieces of posture information whose error or distance is at or below a predetermined value are selected. When the relationship between each piece of posture information and the person region information is given as a probability, the pieces of posture information whose probability is at or above a predetermined value are selected. Alternatively, a fixed number of the top-ranked pieces of known posture information most likely to match the posture shown by the person region information may always be selected. The posture selection unit 103 outputs a plurality of pieces of known posture information that are likely to match the posture of the person region.

To compute the weighted average of partial posture information described later, a value indicating the similarity between each selected piece of known posture information and the person region information must be output along with it. In the examples above, the error, distance, or probability can be used to compute this similarity. These values are collectively referred to below as the score of the known posture information; the higher the score, the more similar the known posture information is to the person region information.
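Because an error or distance decreases as similarity increases while a score must increase, some monotonically decreasing mapping is needed when the score is derived from an error. The text does not state which mapping is used; the sketch below assumes 1 / (1 + error) purely for illustration, and the function names are hypothetical. A probability can be used as a score unchanged.

```python
from typing import Dict


def score_from_error(error: float) -> float:
    """Map a non-negative error or distance to a score in (0, 1]; smaller error, higher score.

    The mapping 1 / (1 + error) is an assumed choice, not one given in the text.
    """
    if error < 0:
        raise ValueError("error must be non-negative")
    return 1.0 / (1.0 + error)


def scores_from_errors(errors: Dict[str, float]) -> Dict[str, float]:
    """Attach a score to every selected known posture."""
    return {pose_id: score_from_error(err) for pose_id, err in errors.items()}
```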

In the present embodiment, the known posture information recorded in the recording unit 106 is joint position information with the waist position as the origin. The joint position coordinates of the posture information therefore need to be translated or rotated as necessary to match the position of the person in the input distance image. For example, when the joint coordinates of the selected known posture information are translated into the camera coordinate system, whose origin is the camera position, the joint position coordinates P_C are obtained by equation (2) below.

Here, (ξ_xc, ξ_yc, ξ_zc) are the coordinates of the waist position in the camera coordinate system, and they are added to each joint for j = 0, 1, ..., J. This yields the joint position coordinates P_C of the posture information in the camera coordinate system. The waist position in the camera coordinate system can be obtained in various ways, such as from the centroid of the person region information, or by voting for likely waist positions at the same time as the posture is selected. Depending on the method, several waist positions in the camera coordinate system may be obtained for one piece of posture information, for example when voting produces several waist-position peaks. In that case, several sets of joint position coordinates P_C in the camera coordinate system are obtained from the three-dimensional information P_H of one piece of posture information. These sets of joint position coordinates P_C may also be treated as variations among the plurality of pieces of posture information output by the posture selection unit 103. The following description assumes that the known posture information output by the posture selection unit 103 is expressed as joint position coordinates P_C in the camera coordinate system.
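The image for equation (2) is likewise not reproduced in this text. Given that the waist position (ξ_xc, ξ_yc, ξ_zc) is added to each waist-relative joint coordinate (ξ_xj, ξ_yj, ξ_zj), a form consistent with the description would be the following; it is a reconstruction, and the notation of the original equation may differ.

```latex
\[
P_C = \bigl\{\, (\xi_{xj} + \xi_{xc},\ \xi_{yj} + \xi_{yc},\ \xi_{zj} + \xi_{zc}) \;\bigm|\; j = 0, 1, \ldots, J \,\bigr\}
\tag{2}
\]
```

Since the waist joint (j = 0) is the origin of the recorded coordinates, this form places it exactly at (ξ_xc, ξ_yc, ξ_zc) in the camera coordinate system.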

<Partial posture selection unit 104>
The partial posture selection unit 104 compares the pieces of posture information selected by the posture selection unit 103 with the person region information and selects the partial posture information that is consistent with the person region information. It is difficult to prepare in the recording unit 106 known posture information for every possible posture. Consequently, although the posture selection unit 103 selects posture information estimated to be similar to the person region information, the selected posture information often does not match it completely. Even so, it often matches the posture of the person in the input distance image in part.

For example, in the nearest known posture information, the one most similar to the person region information, the upper body including the arms may be consistent with the person's posture while the lower body differs. Even in such a case, other known posture information can be expected to include a lower-body posture that is consistent with the person region. The partial posture selection unit 104 collects these pieces of posture information that are partially consistent with the person region.

An example of selecting partial posture information is described below with reference to the drawings.
FIG. 4 is a diagram illustrating an example of a three-dimensional model 402 of a person created from the joint positions of known posture information 401. The three-dimensional model 402 shown in FIG. 4 approximates each part by a cylinder, and the cylindrical model of each part is placed according to the joint coordinates. A method of checking the correspondence between the person region information and the posture information in three-dimensional space using this model is described below with reference to FIG. 5.

FIG. 5(a) shows the distance values of the person region 310 as a three-dimensional point group (point cloud) 510. The point group 510 is displayed here from a direction different from the camera's line of sight. FIG. 5(b) shows the joint positions of the posture information 321, selected by the posture selection unit 103 for the person region 310, as a three-dimensional model 520. The posture selection unit 103 outputs a plurality of pieces of known posture information; the example in FIG. 5(b) is one of them.

図５（ｃ）は、図５（ａ）に示す点群５１０と図５（ｂ）に示す３次元モデル５２０とを重ね合わせた例を示す図である。この状態において、人物の各部位と点群との対応状況を調べることにより、人物領域情報と整合性の高い部分姿勢情報を選択する。 FIG. 5C is a diagram illustrating an example in which the point group 510 illustrated in FIG. 5A and the three-dimensional model 520 illustrated in FIG. 5B are superimposed. In this state, partial posture information that is highly consistent with the person region information is selected by examining how each part of the person corresponds to the point group.

例えば、図５（ｃ）に示す左上腕５３１では、円筒モデル付近に十分な点群が存在するため、人物領域情報と良く一致していると判断できる。一方、左上腿部５３２では、円筒モデル付近に点群が存在しないため、人物領域情報と一致していないと判断できる。このように各部位と点群との対応状況を判断する。図５（ｃ）に示す例では、頭部、胴体、左右の上腕および前腕の部分は、人物領域情報に一致した部分姿勢情報である判断できる。 For example, for the left upper arm 531 shown in FIG. 5C, a sufficient number of points exist near the cylindrical model, so it can be determined that the part matches the person area information well. On the other hand, for the left upper thigh 532, no points exist near the cylindrical model, so it can be determined that the part does not match the person area information. The correspondence between each part and the point group is determined in this way. In the example shown in FIG. 5C, the head, torso, left and right upper arms, and forearms can be determined to be partial posture information that matches the person area information.

このように人物の各部位の整合性を評価するより具体的な処理について説明する。まず、各部位の円筒モデルに点群が対応しているか否かを評価するために、円筒モデルの表面から所定範囲内に存在する点を、その円筒モデルに対応づける。ある部位の円筒モデルに、所定数以上の点が対応付けされている場合には、その部位と人物領域（点群）との整合性が高いと評価する。円筒モデルと点群の整合性を判断する基準となる対応点の数は、完全に円筒モデルと点群が一致することを想定した数としてもよく、円筒モデルの一部でも点群と一致すればよいとする数としてもよい。このように対応点の数が所定値以上である場合に整合性が高いと評価する。 A more specific process for evaluating the consistency of each part of the person is described next. First, to evaluate whether the point group corresponds to the cylindrical model of each part, points that lie within a predetermined range of the surface of a cylindrical model are associated with that cylindrical model. When a predetermined number of points or more are associated with the cylindrical model of a part, the consistency between that part and the person area (point group) is evaluated as high. The number of corresponding points used as the criterion may be a number that assumes the cylindrical model and the point group match completely, or a number that is satisfied when only a portion of the cylindrical model matches the point group. In this way, the consistency is evaluated as high when the number of corresponding points is equal to or greater than a predetermined value.
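
The point-to-cylinder association described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the representation of a part as an axis segment from `a` to `b` with a `radius`, and the use of the lateral surface only (end caps are ignored) are all hypothetical choices made for the example.

```python
import math

def distance_to_cylinder_surface(p, a, b, radius):
    """Distance from point p to the lateral surface of a cylinder whose axis
    runs from a to b (assumes a != b; end caps are ignored)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    length_sq = sum(c * c for c in ab)
    # Project p onto the axis and clamp the projection to the segment a-b.
    t = sum(ap[i] * ab[i] for i in range(3)) / length_sq
    t = max(0.0, min(1.0, t))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return abs(math.dist(p, closest) - radius)

def part_is_consistent(points, a, b, radius, max_dist, min_points):
    """Associate points lying within max_dist of the surface with the part and
    report whether at least min_points were associated."""
    matched = sum(
        1 for p in points
        if distance_to_cylinder_surface(p, a, b, radius) <= max_dist
    )
    return matched >= min_points, matched
```

Here `max_dist` plays the role of the predetermined range and `min_points` the predetermined number; `min_points` could be set per part, since parts facing away from the camera yield fewer visible points.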

また、図５（ａ）に示す点群５１０は、距離画像のカメラ方向から見える部分のみに基づいて生成されるため、既知の姿勢情報から作成される円筒モデルと最も良い整合性を示す部位であっても、対応する点群数が少ない場合がある。そこで、部位によって、円筒モデルと点群との整合性を判断する基準（対応付けされる点の数の閾値）を変更するようにしてもよい。なお、図４および図５に示した例では、３次元モデルとして円筒モデルを用いたが、人物モデルの表現方法はこれに限定されない。３次元空間で人物領域情報との対応を確認することができれば、どのような形状のモデルを用いてもよい。 Further, since the point group 510 shown in FIG. 5A is generated only from the portion of the distance image that is visible from the camera direction, the number of corresponding points may be small even for a part that shows the best consistency with the cylindrical model created from the known posture information. Therefore, the criterion for determining the consistency between the cylindrical model and the point group (the threshold on the number of associated points) may be changed depending on the part. In the examples shown in FIGS. 4 and 5, a cylindrical model is used as the three-dimensional model, but the method of representing the person model is not limited to this. A model of any shape may be used as long as the correspondence with the person area information can be confirmed in three-dimensional space.

次に、部分姿勢情報を選択する別の方法について、図６を参照しながら説明する。図５に示した例では、３次元空間で人物の各部位と人物領域との整合性を評価したが、簡易的に距離画像の平面上（２次元上）で評価するようにしてもよい。図６（ａ）は、距離画像の人物領域３１０と既知の姿勢情報３２１の関節位置とを重畳して表示した例を示す図である。この方法では、各関節位置または関節間の部位が人物領域の座標上に存在するか否かを確認することにより、各部位と人物領域との整合性を判断する。 Next, another method for selecting partial posture information will be described with reference to FIG. In the example shown in FIG. 5, the consistency between each part of the person and the person region is evaluated in the three-dimensional space, but the evaluation may be simply performed on the plane (two-dimensional) of the distance image. FIG. 6A is a diagram illustrating an example in which the person region 310 of the distance image and the joint position of the known posture information 321 are superimposed and displayed. In this method, the consistency between each part and the person area is determined by checking whether or not each joint position or a part between the joints exists on the coordinates of the person area.

図６（ａ）に示す例では、頭部、胴体、左右の上腕および前腕に関する関節位置が、人物領域上に存在するため、これらの部位の座標は人物領域情報に整合した部分姿勢情報であると判断する。なお、人物領域上に関節位置の座標が存在するか否かによって各部位の整合性を評価したが、人物領域の近傍付近に関節位置が存在する場合も、整合性が高いと評価するようにしてもよい。これにより、多少の姿勢のズレを許容して部分姿勢情報を選択することができるようになる。 In the example shown in FIG. 6A, the joint positions of the head, torso, left and right upper arms, and forearms lie on the person area, so the coordinates of these parts are determined to be partial posture information that matches the person area information. Although the consistency of each part is evaluated here according to whether the coordinates of the joint position lie on the person area, the consistency may also be evaluated as high when the joint position lies in the vicinity of the person area. This makes it possible to select partial posture information while tolerating a small deviation in posture.

また、図６（ｂ）は、図４に示した円筒モデルを距離画像の平面上に投影した投影人物モデル３２２を人物領域３１０に重ねた例を示す図である。この投影人物モデル３２２と人物領域３１０とで各部位の整合性を判断し、選択する部分姿勢情報を決定するようにしてもよい。この場合、投影人物モデル３２２の各部位と、人物領域３１０との重複度合いを算出することにより、整合性を評価することができる。例えば、投影人物モデル３２２におけるある部位の面積と、その投影人物モデル３２２内の人物領域の面積（人物領域の画素数）との比をとり、所定の割合以上に人物領域が含まれている場合は、その部位の座標と人物領域情報とは整合性が高いと評価する。 FIG. 6B is a diagram showing an example in which a projected person model 322, obtained by projecting the cylindrical model shown in FIG. 4 onto the plane of the distance image, is superimposed on the person region 310. The consistency of each part may be determined from the projected person model 322 and the person region 310, and the partial posture information to be selected may be determined accordingly. In this case, the consistency can be evaluated by calculating the degree of overlap between each part of the projected person model 322 and the person region 310. For example, the ratio between the area of a part in the projected person model 322 and the area of the person region inside that part of the projected person model 322 (the number of person-region pixels) is taken, and when the person region is contained at a predetermined ratio or more, the coordinates of that part are evaluated as highly consistent with the person area information.
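
The two-dimensional overlap test described above can be illustrated with a short sketch. It assumes, purely for the example, that the projected part and the person region are each given as a set of `(row, column)` pixel coordinates; the function names and the `min_ratio` parameter are hypothetical.

```python
def overlap_ratio(part_pixels, person_pixels):
    """Fraction of the projected part's pixels that fall inside the person region."""
    if not part_pixels:
        return 0.0
    return len(part_pixels & person_pixels) / len(part_pixels)

def part_matches_region(part_pixels, person_pixels, min_ratio):
    """True when the person region covers at least min_ratio of the projected part."""
    return overlap_ratio(part_pixels, person_pixels) >= min_ratio
```

With image masks stored as arrays, the same ratio would be the count of pixels set in both masks divided by the count of pixels set in the part mask.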

以上のような方法により、姿勢選択部１０３で得た複数の姿勢情報に対して判断することにより、各姿勢情報から部分姿勢情報を得ることができる。ただし、人物領域情報と全く一致していない姿勢情報が存在する場合には、その姿勢情報から部分姿勢情報が選択できない場合もある。部分姿勢選択部１０４は、以上のようにして選択した部分姿勢の姿勢情報（本実施形態では、部分姿勢の関節位置の情報）を部分姿勢情報として出力する。図５及び図６に示した例では、頭部、首、肩、胸、腰、肘、手首、および股関節に関する関節位置の座標を部分姿勢情報として出力する。 By determining the plurality of posture information obtained by the posture selection unit 103 by the method as described above, partial posture information can be obtained from each piece of posture information. However, if there is posture information that does not match the person area information at all, partial posture information may not be selected from the posture information. The partial posture selection unit 104 outputs the posture information of the partial posture selected as described above (in this embodiment, information on the joint position of the partial posture) as partial posture information. In the example shown in FIGS. 5 and 6, the coordinates of the joint positions related to the head, neck, shoulder, chest, waist, elbow, wrist, and hip joint are output as partial posture information.

＜姿勢統合部１０５＞ <Posture integration unit 105>
姿勢統合部１０５は、以上の処理により得られた部分姿勢情報を統合して、情報処理装置１００が出力する人物姿勢情報を生成する。部分姿勢情報の統合処理では、各部分姿勢情報における各関節位置の座標を平均化して人物姿勢情報を生成する。例えば、頭部座標については、頭部を含む部分姿勢情報から頭部座標を参照し、全ての頭部座標を平均化する。また、平均化する際に、各姿勢情報に付随したスコアに基づく重み付けを行ってもよい。前述したように、姿勢選択部１０３は、姿勢情報とともに、人物領域情報との類似度を示すスコアを出力しており、これを元に各部分姿勢が示す関節位置の座標に重みを付けた加重平均により人物姿勢情報を求めてもよい。 The posture integration unit 105 integrates the partial posture information obtained by the above processing and generates the person posture information output by the information processing apparatus 100. In the integration process, the coordinates of each joint position across the pieces of partial posture information are averaged to generate the person posture information. For example, for the head coordinates, the head coordinates are taken from every piece of partial posture information that includes the head, and all of them are averaged. The averaging may also be weighted based on the score attached to each piece of posture information. As described above, the posture selection unit 103 outputs, together with the posture information, a score indicating the similarity to the person area information; based on this score, the person posture information may be obtained as a weighted average in which the joint position coordinates indicated by each partial posture are weighted.
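
The score-weighted averaging of joint coordinates can be sketched as below. The data layout is an assumption made for illustration: each partial posture is a pair of a dictionary mapping a joint name to `(x, y, z)` coordinates and a positive similarity score. Passing a score of 1.0 for every partial posture gives the plain average.

```python
def integrate_partial_postures(partial_postures):
    """Weighted average of joint coordinates over all partial postures.

    partial_postures: list of (joints, score), where joints maps a joint name
    to (x, y, z) and score is a positive weight (hypothetical layout).
    A joint is averaged only over the partial postures that contain it.
    """
    sums = {}
    weights = {}
    for joints, score in partial_postures:
        for name, (x, y, z) in joints.items():
            sx, sy, sz = sums.get(name, (0.0, 0.0, 0.0))
            sums[name] = (sx + score * x, sy + score * y, sz + score * z)
            weights[name] = weights.get(name, 0.0) + score
    return {
        name: tuple(v / weights[name] for v in total)
        for name, total in sums.items()
    }
```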

以上の処理により、情報処理装置１００から人物姿勢情報を出力することができる。なお、情報処理装置１００から出力される人物姿勢情報を初期値として、人物領域情報に対して詳細な部位の位置合わせを行ってもよい。部位の位置合わせは、各部位の円筒モデルと、人物領域の点群とのＩＣＰ（Iterative Closest Point）など、公知の技術により実施することができる。 Through the above processing, the person posture information can be output from the information processing apparatus 100. The person posture information output from the information processing apparatus 100 may also be used as an initial value for a detailed alignment of the parts against the person area information. This alignment can be performed by a known technique such as ICP (Iterative Closest Point) between the cylindrical model of each part and the point group of the person region.

図７は、図２のステップＳ２０４に相当する部分姿勢選択部１０４による処理手順の一例を示すフローチャートである。 FIG. 7 is a flowchart illustrating an example of a processing procedure performed by the partial posture selection unit 104, corresponding to step S204 of FIG. 2.
まず、ステップＳ７０１において、部分姿勢選択部１０４は、姿勢選択部１０３で選択された複数の既知の姿勢情報から、一つの姿勢情報を選択する。そして、ステップＳ７０２において、選択した姿勢情報と人物領域情報との整合性を評価する。例えば、人物領域の３次元の点群と姿勢情報の関節位置情報を元に作成した人物の３次元モデルとを比較することによって整合性を評価する。 First, in step S701, the partial posture selection unit 104 selects one piece of posture information from the plurality of pieces of known posture information selected by the posture selection unit 103. In step S702, the consistency between the selected posture information and the person area information is evaluated. For example, the consistency is evaluated by comparing the three-dimensional point group of the person region with a three-dimensional model of the person created from the joint position information of the posture information.

続いてステップＳ７０３において、整合性の高い部位がある場合に、その部位を部分姿勢として選択し、部分姿勢情報を姿勢統合部１０５に出力する。そして、ステップＳ７０４において、まだ整合性を評価していない姿勢情報が存在するか否かを判定する。この判定の結果、整合性を評価していない姿勢情報が存在する場合はステップＳ７０１へと戻る。一方、全ての姿勢情報において整合性を評価して部分姿勢情報を取得した場合は、処理を終了する。 In step S703, if there is a highly consistent part, that part is selected as a partial posture, and the partial posture information is output to the posture integration unit 105. In step S704, it is determined whether there is posture information whose consistency has not yet been evaluated. If such posture information exists, the process returns to step S701. If the consistency has been evaluated and partial posture information acquired for all of the posture information, the process ends.
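
The loop of steps S701 to S704 can be summarized in a few lines. In this sketch `evaluate_parts` is a hypothetical callback standing in for the consistency evaluation of step S702; it is assumed to return the parts of a posture that are highly consistent with the person region (possibly none).

```python
def select_partial_postures(posture_candidates, evaluate_parts):
    """Collect the highly consistent parts of every candidate posture."""
    selected = []
    for posture in posture_candidates:              # S701, repeated via S704
        consistent_parts = evaluate_parts(posture)  # S702
        if consistent_parts:                        # S703: keep matching parts only
            selected.append((posture, consistent_parts))
    return selected
```

Candidates with no consistent part contribute nothing, which corresponds to the case where no partial posture information can be selected from a piece of posture information.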

以上のように本実施形態によれば、入力された距離画像の人物領域情報に対して類似すると推定した複数の姿勢情報を元に、人物領域情報と部分的に整合する部分姿勢情報を選択して統合することにより出力すべき人物姿勢情報を生成するようにした。この結果、予め記録部１０６に記録されている姿勢情報では対応できない未知の姿勢に対しても、精度良く姿勢を出力することができる。 As described above, according to the present embodiment, based on a plurality of posture information estimated to be similar to the human region information of the input distance image, partial posture information that partially matches the human region information is selected. The person posture information that should be output is generated by integrating them. As a result, it is possible to output the posture with high accuracy even for an unknown posture that cannot be handled by the posture information recorded in the recording unit 106 in advance.

（第２の実施形態） (Second Embodiment)
第１の実施形態では、既知の姿勢情報から人物領域情報と部分的に整合性の高い部分領域情報を選択し、平均化することにより人物姿勢情報を生成する方法について説明した。第１の実施形態では、部分姿勢情報を平均化することにより人物らしい姿勢が多少崩れた人物姿勢情報を生成してしまう可能性がある。そこで本実施形態では、さらに人物らしい人物姿勢情報を生成する例について説明する。 The first embodiment described a method of generating person posture information by selecting, from known posture information, partial region information that is partially highly consistent with the person area information and averaging it. In the first embodiment, averaging the partial posture information may produce person posture information in which the human-like posture is somewhat distorted. This embodiment therefore describes an example of generating person posture information that is more human-like.

本実施形態では、人物領域情報に最も一致する姿勢情報を基準姿勢情報として、さらにその一部を基準部位情報とする。そして、基準部位情報と他の姿勢情報から取得した部分姿勢情報とを組み合わせて人物姿勢情報を作成する。以下、具体的な実施形態について説明する。 In this embodiment, the posture information that most closely matches the person area information is set as reference posture information, and a part thereof is set as reference portion information. Then, the human posture information is created by combining the reference part information and the partial posture information acquired from the other posture information. Hereinafter, specific embodiments will be described.

図８は、本実施形態に係る情報処理装置８００の構成例を示すブロック図である。また、図９は、本実施形態に係る情報処理装置８００による基本的な処理手順の一例を示すフローチャートである。図１に示した構成と比較して、本実施形態では、さらに基準部位選択部１０７を備えている。基準部位選択部１０７は、姿勢選択部１０３で選択された複数の姿勢情報から、最も人物領域情報に一致する姿勢情報を基準姿勢情報として選択し、その一部を基準部位情報として選択する。 FIG. 8 is a block diagram illustrating a configuration example of the information processing apparatus 800 according to the present embodiment. FIG. 9 is a flowchart illustrating an example of a basic processing procedure performed by the information processing apparatus 800 according to the present embodiment. Compared to the configuration shown in FIG. 1, the present embodiment further includes a reference site selection unit 107. The reference part selection unit 107 selects, as reference posture information, the posture information that most closely matches the person area information from the plurality of posture information selected by the posture selection unit 103, and selects a part thereof as reference part information.

本実施形態の部分姿勢選択部１０４は、基準部位以外の部分姿勢（部位）を姿勢情報から選択する。例えば、基準部位として上半身の部位が選択されている場合には、最適な下半身の部分姿勢情報を複数の姿勢情報から探索する。姿勢統合部１０５では、基準部位情報と部分姿勢情報とを組み合わせて人物姿勢情報を生成する。 The partial posture selection unit 104 of the present embodiment selects a partial posture (part) other than the reference part from the posture information. For example, when the upper body part is selected as the reference part, the optimum lower body partial posture information is searched from a plurality of posture information. The posture integration unit 105 combines the reference part information and the partial posture information to generate person posture information.

また、図９のステップＳ９０１〜Ｓ９０３の処理は、それぞれ図２のステップＳ２０１〜Ｓ２０３の処理と同様であるため、説明は省略する。以下、ステップＳ９０４以降の処理について詳細に説明する。 The processing in steps S901 to S903 in FIG. 9 is the same as the processing in steps S201 to S203 in FIG. 2, respectively, so a description thereof is omitted. Hereinafter, the processing from step S904 onward will be described in detail.

＜記録部１０６＞ <Recording unit 106>
第１の実施形態では、記録部１０６には、様々な姿勢の関節位置情報が既知の姿勢情報として記録されていた。本実施形態では、各姿勢情報について、その姿勢に関する情報が合わせて保存されているものとする。姿勢に関する情報としては、例えば、歩く、走る、しゃがむなどの動作カテゴリや、身長・体格などの人物のサイズ、カメラに対する人物向き・角度、などの情報である。以下、これらをまとめて姿勢パラメータと呼ぶ。姿勢パラメータは、関節位置座標の情報（または関節角度情報）に加えて姿勢の状態や属性を表す情報であって、上記以外の情報を含んでもよい。記録部１０６には、各姿勢情報に、この姿勢パラメータが対応づけられて保存されている。 In the first embodiment, joint position information for various postures is recorded in the recording unit 106 as known posture information. In the present embodiment, information about the posture is stored together with each piece of posture information. This information includes, for example, an action category such as walking, running, or squatting, the size of the person such as height and physique, and the orientation and angle of the person relative to the camera. Hereinafter, these are collectively referred to as posture parameters. The posture parameters are information that represents the state and attributes of the posture in addition to the joint position coordinates (or joint angle information), and may include information other than the above. In the recording unit 106, each piece of posture information is stored in association with its posture parameters.

＜基準部位選択部１０７＞ <Reference part selection unit 107>
本実施形態の基準部位選択部１０７は、姿勢選択部１０３によって選択された複数の姿勢情報のうち、最も人物領域情報に整合した姿勢情報を基準姿勢情報として選択する。さらに基準姿勢情報のうち、人物領域情報との整合性の高い部分を基準部位として選択する。すなわち、図９のステップＳ９０４の処理を行う。 The reference part selection unit 107 according to the present embodiment selects, as reference posture information, the posture information most consistent with the person area information from among the plurality of pieces of posture information selected by the posture selection unit 103. Further, the portion of the reference posture information that is highly consistent with the person area information is selected as the reference part. That is, the process of step S904 in FIG. 9 is performed.

まず、基準姿勢情報の選択方法について説明する。基準姿勢情報は、人物領域の姿勢を最も良く示していると推定される姿勢情報を選択することにより得ることができる。したがって、姿勢選択部１０３の算出結果をそのまま反映する場合には、スコアが最大となる姿勢情報を基準姿勢情報とする。あるいは、姿勢情報と人物領域情報との整合度を算出して、最も整合度の高い姿勢情報を基準姿勢情報としてもよい。整合度は、例えば、以下のように算出する。 First, a method for selecting reference posture information will be described. The reference posture information can be obtained by selecting posture information presumed to best indicate the posture of the person area. Therefore, when the calculation result of the posture selection unit 103 is reflected as it is, the posture information having the maximum score is set as the reference posture information. Alternatively, the degree of matching between the posture information and the person area information may be calculated, and the posture information with the highest degree of matching may be used as the reference posture information. The degree of matching is calculated as follows, for example.

まず、図５（ｃ）に示すように、既知の姿勢情報から作成した人体の３次元モデルと人物領域の点群とを重ね合わせる。第１の実施形態と同様に、各部位と点群との対応付けを行い、以下の式（３）により整合度Ｃを求める。 First, as shown in FIG. 5C, a three-dimensional model of a human body created from known posture information and a point group of a person region are superimposed. As in the first embodiment, each part is associated with a point group, and the degree of matching C is obtained by the following equation (3).

ここで、Ωは人物領域の全点数を表し、ωは人物領域の点群のうち、部位と対応がついた点の数（円筒モデルの所定近傍範囲に存在する点の数）を表している。また、Ｔは部位の数（図４及び図５に示す例では１０）を表し、τは点群との対応がついた（整合性が高いと判定した）部位の数を表している。また、図６（ｂ）に示すように、２次元の投影モデルを用いて人物領域と対応付けて整合度Ｃを求めてもよい。この場合、式（３）におけるΩは、距離画像の人物領域の画素数となり、ωは投影人物モデル３２２の部位内に含まれる人物領域の画素数となる。 Here, Ω represents the total number of points in the person area, and ω represents the number of points corresponding to the part (number of points existing in a predetermined neighborhood range of the cylindrical model) in the point group of the person area. . T represents the number of sites (10 in the examples shown in FIGS. 4 and 5), and τ represents the number of sites that correspond to the point group (determined to have high consistency). Further, as shown in FIG. 6B, the matching degree C may be obtained in association with a person region using a two-dimensional projection model. In this case, Ω in Equation (3) is the number of pixels in the person area of the distance image, and ω is the number of pixels in the person area included in the portion of the projected person model 322.

以上のような方法により、姿勢情報と人物領域情報との整合度を算出する。基準部位選択部１０７は、姿勢選択部１０３によって選択された複数の姿勢情報に対して、整合度を順次算出し、最大の整合度を示す姿勢情報を基準姿勢情報とする。また、姿勢のスコアと整合度との和や積の情報を元に、最も人物領域の姿勢を表す姿勢情報を選択するようにしてもよい。また、基準姿勢情報を選択する他の方法として、特定の部位と人物領域との整合性に基づいて、基準姿勢情報を選択するようにしてもよい。例えば、人物の大まかな姿勢を決める胴体、または胴体および頭部が人物領域と最も整合性のある姿勢情報を基準姿勢情報としてもよい。 The degree of matching between the posture information and the person area information is calculated by the method as described above. The reference part selection unit 107 sequentially calculates the degree of matching for a plurality of pieces of posture information selected by the posture selection unit 103, and uses the posture information indicating the maximum degree of matching as reference posture information. Alternatively, posture information that most represents the posture of the person region may be selected based on information on the sum or product of the posture score and the degree of matching. Further, as another method for selecting the reference posture information, the reference posture information may be selected based on the consistency between the specific part and the person region. For example, the body that determines the general posture of the person, or posture information in which the body and the head are most consistent with the person region may be used as the reference posture information.

次に、基準部位選択部１０７は、基準姿勢情報から基準部位を選択する。基準部位情報は、基準姿勢情報のうち、人物領域に一致している部分を選択することによって得られる。これは、第１の実施形態の部分姿勢情報を選択する処理と同様の処理で実施することができる。 Next, the reference part selection unit 107 selects a reference part from the reference posture information. The reference part information is obtained by selecting a portion that matches the person area from the reference posture information. This can be performed by a process similar to the process of selecting partial posture information in the first embodiment.

図１０は、基準部位情報１００１を選択する方法を説明するための図である。図１０に示す例では、人物領域３１０に対して最も整合性の高い既知の姿勢情報として基準姿勢情報１０００が選択されている。次に、基準姿勢情報の中の部位のうち、人物領域３１０と整合する部位として、頭部、胴体、左右の腕が基準部位として選択されている。この結果、残りの部位、すなわち左右の脚に該当する部分１００２を他の姿勢情報１００３から探索することになる。以下、この基準部位に含まれておらず、他の姿勢情報から探索する部位を探索部位と呼ぶ。 FIG. 10 is a diagram for explaining a method of selecting the reference part information 1001. In the example illustrated in FIG. 10, the reference posture information 1000 is selected as known posture information having the highest consistency with the person region 310. Next, the head, the torso, and the left and right arms are selected as reference parts as parts that match the person region 310 among the parts in the reference posture information. As a result, the remaining part, that is, the part 1002 corresponding to the left and right legs is searched from the other posture information 1003. Hereinafter, a part that is not included in the reference part and is searched from other posture information is referred to as a search part.

上記の例では、基準部位を人物領域情報との整合性を元に決定した。このため、例えば、前腕や下腿部などの末端部位のみが探索部位となる場合もある。例えば、腕（上腕および前腕）、および脚（上腿部および下腿部）の関係性を既知の姿勢情報のまま維持したい場合、すなわち人物として正しい姿勢であることを、より強く保証したい場合には、不都合となることがある。このような場合には、探索部位が必ず腕や脚の部分姿勢となるように、基準姿勢情報から基準部位を選択する時に、肩関節及び股関節のみで基準部位を切り離すようにしてもよい。 In the above example, the reference part is determined based on the consistency with the person area information. For this reason, for example, only a terminal part such as the forearm or the lower leg may be a search part. For example, if you want to maintain the relationship between the arms (upper arm and forearm) and legs (upper leg and lower leg) with known posture information, that is, if you want to more strongly guarantee that you are in the correct posture as a person May be inconvenient. In such a case, when the reference part is selected from the reference posture information so that the search part is always the partial posture of the arm or leg, the reference part may be separated only by the shoulder joint and the hip joint.

基準部位を選択する別の方法として、基準部位とする部位を常に特定の部位としてもよい。具体的には、胴体、または胴体および頭部を必ず基準部位としても良い。この場合、常に四肢が探索部位となり、基準姿勢情報により胴体の姿勢を決定した後に、残りの姿勢情報から最適な四肢の姿勢を探索することになる。また、この場合においては、後述する部分姿勢の探索対象に、基準姿勢情報自体を含むようにしてもよい。この結果、基準姿勢情報の四肢の部位が、最終的に出力される人物姿勢情報の四肢として選択される場合もあり得る。 As another method for selecting the reference site, the site used as the reference site may always be a specific site. Specifically, the trunk or the trunk and the head may be necessarily used as the reference part. In this case, the extremity is always a search part, and after determining the body posture based on the reference posture information, the optimum posture of the limb is searched from the remaining posture information. In this case, the reference posture information itself may be included in a partial posture search target described later. As a result, the limb part of the reference posture information may be selected as the limb of the human posture information that is finally output.

基準部位選択部１０７は、以上のようにして決定した基準部位の関節位置情報を既知の姿勢情報から取り出し、基準部位情報として出力する。なお、基準姿勢情報が全ての部位（関節座標）を含む場合、または、十分に人物領域と一致していると判断できる場合には、以下の処理を実行しないで、基準姿勢情報をそのまま人物姿勢情報としてもよい。 The reference part selection unit 107 extracts the joint position information of the reference part determined as described above from the known posture information and outputs it as reference part information. If the reference posture information includes all parts (joint coordinates), or if it can be determined that the reference posture information sufficiently matches the human region, the reference posture information is directly used as the human posture without executing the following processing. It may be information.

＜部分姿勢選択部１０４＞ <Partial posture selection unit 104>
本実施形態における部分姿勢選択部１０４は、人物領域に対して最適な探索部位に係る部分姿勢情報を既知の姿勢情報から取得する。すなわち、図９のステップＳ９０５の処理を行う。図１０に示す例では、左右の脚に該当する部分１００２が探索する対象であり、姿勢選択部１０３で選択した複数の姿勢情報１００３から、人物領域情報と整合性の高い左右の脚を探索して、左右の脚の部分姿勢情報１００４を取得する。 The partial posture selection unit 104 according to the present embodiment acquires, from the known posture information, the partial posture information for the search parts that best fits the person region. That is, the process of step S905 in FIG. 9 is performed. In the example shown in FIG. 10, the portion 1002 corresponding to the left and right legs is the search target. Left and right legs that are highly consistent with the person area information are searched for in the plurality of pieces of posture information 1003 selected by the posture selection unit 103, and the partial posture information 1004 of the left and right legs is acquired.

部分姿勢選択部１０４の基本的な処理としては、姿勢情報１００３の全ての探索部位と人物領域との整合性を評価し、最も整合性が高い部分姿勢情報を選択する。整合性の評価では、第１の実施形態と同様の手順によって部分姿勢と人物領域との整合性を評価することができる。図１０に示す例では、姿勢情報１００３の全ての探索部位（左右の脚）の整合性を評価した結果、姿勢情報３２ｎの探索部位が最も整合性が高いものとして選択されている。 As basic processing of the partial posture selection unit 104, the consistency between all search parts of the posture information 1003 and the person region is evaluated, and the partial posture information having the highest consistency is selected. In the evaluation of consistency, the consistency between the partial posture and the person area can be evaluated by the same procedure as in the first embodiment. In the example shown in FIG. 10, as a result of evaluating the consistency of all search parts (left and right legs) of the posture information 1003, the search part of the posture information 32n is selected as having the highest consistency.

また、図１０に示す例では、左右の脚をまとめて選択しているが、それぞれの部位ごとに姿勢情報から部分姿勢情報を選択してもよい。例えば、人物領域と整合性の高い上腿部および下腿部をそれぞれ独立して選択してもよい。あるいは、上腿部および下腿部が連なった右脚および左脚の姿勢ごとに、異なる姿勢情報から部分姿勢情報を選択するようにしてもよい。 In the example shown in FIG. 10, the left and right legs are selected together, but partial posture information may be selected from the posture information for each part individually. For example, the upper thigh and the lower thigh that are highly consistent with the person region may each be selected independently. Alternatively, partial posture information may be selected from different posture information for the right leg and for the left leg, each treated as a connected upper thigh and lower thigh.

以上のような方法で、探索部位に相当する部分姿勢情報を選択してもよいが、さらに、基準部位と部分姿勢（探索部位）との親和性を評価して部分姿勢情報を選択するようにしてもよい。上記の方法では、姿勢情報に含まれる姿勢のバリエーションによっては、基準部位と部分姿勢とが好ましくない組み合わせで選択される場合がある。 The partial posture information corresponding to the search part may be selected by the method described above, but the partial posture information is selected by evaluating the affinity between the reference part and the partial posture (search part). May be. In the above method, the reference part and the partial posture may be selected in an unfavorable combination depending on the posture variation included in the posture information.

例えば、複数の姿勢情報に異なる身長の人物の姿勢情報が含まれている場合には、基準部位として選択された人物の身長と部分姿勢として選択された人物の身長とが大きく異なる可能性がある。この基準部位と部分姿勢とを組み合わせると、人物としてあり得ない体格の人物姿勢情報が生成されてしまう恐れがある。記録部１０６に記録されている姿勢情報の種類が限定的である場合や、それほど出力する人物姿勢情報に精度を求めない場合には上記の方法でもよいが、好ましくは、以下のように基準部位と部分姿勢との親和性を評価する。基準部位と部分姿勢との親和性は、基準部位と部分姿勢とを組み合わせて、あり得る姿勢となるか否かによって評価され、例えば、以下のように実施される。 For example, when the plurality of pieces of posture information includes postures of persons of different heights, the height of the person selected for the reference part may differ greatly from the height of the person selected for the partial posture. Combining such a reference part and partial posture may generate person posture information with a physique that is impossible for a person. The above method may suffice when the types of posture information recorded in the recording unit 106 are limited or when high accuracy is not required of the output person posture information, but preferably the affinity between the reference part and the partial posture is evaluated as follows. The affinity between the reference part and the partial posture is evaluated according to whether combining them yields a plausible posture, and is carried out, for example, as follows.

親和性を評価する方法の一つに、姿勢パラメータを用いる方法がある。前述したように、各姿勢情報は前述の姿勢パラメータが対応付けされて記録されている。基準部位と部分姿勢との親和性を評価する際には、基準部位の元の姿勢情報の姿勢パラメータと、部分姿勢の元の姿勢情報の姿勢パラメータとの親和性を評価する。 One method for evaluating affinity is a method using posture parameters. As described above, each posture information is recorded in association with the above-described posture parameters. When evaluating the affinity between the reference portion and the partial posture, the affinity between the posture parameter of the original posture information of the reference portion and the posture parameter of the original posture information of the partial posture is evaluated.

例えば、基準姿勢情報として選択された姿勢情報の姿勢パラメータと、部分姿勢情報を取得しようとしている姿勢情報の姿勢パラメータとが一致していない場合には、その姿勢情報から部分姿勢情報を選択しないようにする。このように、完全に姿勢パラメータが一致する場合のみを許容するようにしてもよく、身長差、カメラ角度差などが微妙に異なっている程度の場合は、姿勢パラメータの不一致を許容するようにしてもよい。 For example, when the posture parameters of the posture information selected as the reference posture information do not match the posture parameters of the posture information from which partial posture information is about to be acquired, partial posture information is not selected from that posture information. Only complete agreement of the posture parameters may be accepted in this way, or a mismatch may be tolerated when the height difference, camera angle difference, and the like are only slight.
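
A posture-parameter check with optional tolerances might look like the sketch below. The dictionary keys `action`, `height_cm`, and `camera_angle_deg` and the tolerance parameters are hypothetical names chosen for the example; the embodiment only states that such parameters are stored with each posture. With the default tolerances of zero, only an exact match is accepted.

```python
def parameters_compatible(ref, cand, max_height_diff_cm=0.0, max_angle_diff_deg=0.0):
    """Return True when the candidate's posture parameters are close enough
    to those of the reference posture."""
    if ref["action"] != cand["action"]:
        return False
    if abs(ref["height_cm"] - cand["height_cm"]) > max_height_diff_cm:
        return False
    # Compare camera angles on a circle so that 350 and 5 degrees are 15 apart.
    diff = abs(ref["camera_angle_deg"] - cand["camera_angle_deg"]) % 360
    diff = min(diff, 360 - diff)
    return diff <= max_angle_diff_deg
```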

親和性を評価する他の方法としては、基準部位と部分姿勢との配置を元に親和性を評価するようにしてもよい。ここで、基準部位と部分姿勢とをつなぐ関節を接続点と呼ぶ。例えば、図１０に示す例では、接続点は股関節となる。基準部位情報および部分姿勢情報は、それぞれ３次元空間中での座標を示しており、各接続点の座標を得ることができる。接続点間の距離があまりに離れている場合には、そのときの部分姿勢は基準部位との親和性が低いと判断することができる。したがって、基準部位情報における接続点の座標と部分姿勢情報における接続点の座標との間の距離が所定値以下となるような部分姿勢情報のみを部分姿勢選択部１０４が選択するようにしてもよい。 As another method for evaluating the affinity, the affinity may be evaluated based on the arrangement of the reference portion and the partial posture. Here, the joint connecting the reference part and the partial posture is called a connection point. For example, in the example shown in FIG. 10, the connection point is a hip joint. The reference part information and the partial posture information each indicate coordinates in a three-dimensional space, and the coordinates of each connection point can be obtained. When the distance between the connection points is too far, it can be determined that the partial posture at that time has low affinity with the reference portion. Therefore, the partial posture selection unit 104 may select only partial posture information such that the distance between the connection point coordinates in the reference part information and the connection point coordinates in the partial posture information is a predetermined value or less. .
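
The connection-point test can be sketched as a simple distance filter. The candidate layout, a list of `(identifier, connection_point)` pairs with three-dimensional coordinates, and the function name are assumptions for illustration; `max_distance` corresponds to the predetermined value.

```python
import math

def filter_by_connection_distance(reference_point, candidates, max_distance):
    """Keep the candidates whose connection point (for example the hip joint)
    lies within max_distance of the reference part's connection point."""
    return [
        identifier
        for identifier, point in candidates
        if math.dist(reference_point, point) <= max_distance
    ]
```

The partial posture with the highest consistency with the person region would then be chosen from the identifiers that remain.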

Note that affinity may be evaluated by combining multiple items, such as the plurality of posture parameters described above and the distance between connection points. In the present embodiment, affinity is evaluated as described above, and among the partial postures judged to have affinity with the reference part, the partial posture with the highest consistency with the person region is selected. Alternatively, the affinity between the reference posture information and the other posture information may be evaluated in advance, and the evaluation of consistency between low-affinity posture information and the person area information may be omitted.

<Posture integration unit 105>
The posture integration unit 105 generates person posture information by combining the reference part information and the partial posture information obtained by the above procedure. That is, it performs the process of step S906 in FIG. 9. FIG. 11(a) shows an example in which the reference part information 1001 and the selected partial posture information 1004 are placed in the same coordinate plane. As shown in FIG. 11(a), the position of the joint connecting the reference part information 1001 and the partial posture information 1004 (the hip joint in the example of FIG. 11) may not coincide. In that case, the person posture information may be generated after correcting the position of the connected joint.

For example, as shown in FIG. 11(b), the person posture information may be generated by translating the coordinates of the partial posture information 1004 so that its hip joint coincides with the hip joint of the reference part information 1001. As another method, the hip joint may be placed midway between its position in the reference part information and its position in the partial posture information, with all other joints keeping their original coordinates from the reference part information and the partial posture information.
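The two joint-correction options above can be sketched as follows. This is an illustrative example only; the joint names, the dictionary layout, and both function names are hypothetical.

```python
def translate_to_reference(partial_joints, connection, reference_connection):
    """Shift every joint of the partial posture so that its connection joint
    coincides with the reference part's connection joint."""
    offset = tuple(
        r - p for r, p in zip(reference_connection, partial_joints[connection])
    )
    return {
        name: tuple(x + o for x, o in zip(pos, offset))
        for name, pos in partial_joints.items()
    }


def use_midpoint(partial_joints, connection, reference_connection):
    """Place only the connection joint midway between the two estimates;
    all other joints keep their original coordinates."""
    merged = dict(partial_joints)
    merged[connection] = tuple(
        (r + p) / 2 for r, p in zip(reference_connection, partial_joints[connection])
    )
    return merged


# Hypothetical coordinates: the hip joint of the reference part and a leg
# partial posture whose hip joint is offset by 0.5 along x.
reference_hip = (0.0, 1.0, 2.0)
legs = {"hip": (0.5, 1.0, 2.0), "knee": (0.5, 0.5, 2.0)}

aligned = translate_to_reference(legs, "hip", reference_hip)
merged = use_midpoint(legs, "hip", reference_hip)
```

Translation preserves the shape of the known partial posture exactly, whereas the midpoint variant leaves every joint except the connection joint in place at the cost of slightly changing the length of the connected segments.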

As described above, according to the present embodiment, the person posture information is generated by connecting a reference part and a partial posture, each extracted as a portion of the selected posture information. The output therefore preserves the shape of the known posture information as far as possible, which makes it possible to output a posture that is plausible for a person. In addition, setting the reference posture information and the reference part information reduces the number of combinations that must be searched.

(Third embodiment)
In the second embodiment, the posture information that best matches the person region was used as the reference posture information, and a portion of it was used as the reference part information. The present embodiment describes an example that uses a plurality of reference parts, and in which the best combination is searched for by calculating a combination cost, described later. The configuration of the information processing apparatus according to the present embodiment is the same as in FIG. 8, so its description is omitted.

In the present embodiment, the reference posture information is either all of the posture information selected by the posture selection unit 103, or the posture information whose degree of matching C with the person area information is higher than a predetermined value. Then, as in the second embodiment, a part of the reference posture information that is highly consistent with the person region, or a specific part (for example, the torso), is set as the reference part. Once a reference part has been determined for each of the plurality of pieces of reference posture information, the parts other than the reference part become the search parts in each case.

Next, the partial posture selection unit 104 of the present embodiment will be described. In the second embodiment, the partial posture selection unit 104 selected, as the partial posture information, the single search part that best matched the person region. In the present embodiment, by contrast, a plurality of pieces of partial posture information are selected. To do so, consistency with the person area information is evaluated for each of the plurality of pieces of posture information, and among the partial posture information obtained from each of them, the several top-ranked pieces with the highest consistency are selected.
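The top-ranked selection described above can be sketched as follows. The candidate records, the `consistency` field, and the value of `k` are assumptions made for illustration.

```python
import heapq


def select_top_partial_postures(candidates, k):
    """Return the k candidates with the highest consistency, best first."""
    return heapq.nlargest(k, candidates, key=lambda c: c["consistency"])


# Hypothetical leg partial postures taken from three pieces of posture
# information, each with a consistency score against the person region.
candidates = [
    {"source": "posture_A", "consistency": 0.62},
    {"source": "posture_B", "consistency": 0.91},
    {"source": "posture_C", "consistency": 0.78},
]

top_two = select_top_partial_postures(candidates, k=2)
```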

As a result, a plurality of pieces of reference part information and a plurality of pieces of partial posture information are selected. At this stage, there are as many candidates for the person posture information to be output as there are combinations of these. For each of these combinations, the posture integration unit 105 calculates a combination cost as follows and selects the lowest-cost combination as the person posture information to be output. The combination cost P of a reference part i and a partial posture j is calculated, for example, by the following equation (4).

Here, S_i represents the score of the reference posture information from which the reference part is derived, and S_j represents the score of the reference posture information from which the partial posture is derived. C_i represents the degree of matching of the reference posture information obtained by equation (3) above, and C_j represents the degree of matching between the partial posture information and the person area information. H_ij is an evaluation value calculated from the affinity between the reference part and the partial posture, and w_1 to w_5 are the coefficients applied to the respective inputs.

The degree of matching C_j between the partial posture information and the person area information can be calculated from, for example, the number of correspondences between the three-dimensional human body model of the partial region and the point group. The evaluation value H_ij can be calculated from the relationship between the posture parameters of the reference part and of the partial posture, or from the distance between the positions of the joints to be connected.

When the evaluation value H_ij is calculated from numerical posture parameters such as height or orientation with respect to the camera, it should be made higher the more similar the posture parameters are, for example by basing it on the reciprocal of the squared error between the posture parameters of the reference part and those of the partial posture. When, on the other hand, H_ij is calculated from posture parameters recorded as labels or categories, such as action categories (walking, running) or build (thin, heavy), a binary evaluation value may be calculated according to whether the labels match, or distances between labels may be defined in advance and H_ij calculated from the distance corresponding to the combination of labels.
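The two ways of deriving H_ij described above can be sketched as follows. The small constant `eps` (added to avoid division by zero when the parameters are identical) and the mapping `1 / (1 + distance)` from a label distance to a score are assumptions of this sketch, not taken from the text; the function names and the label-distance table are likewise hypothetical.

```python
def numeric_affinity(params_i, params_j, eps=1e-6):
    """Reciprocal of the squared error between two numeric parameter vectors.
    eps is an assumed safeguard against division by zero."""
    squared_error = sum((a - b) ** 2 for a, b in zip(params_i, params_j))
    return 1.0 / (squared_error + eps)


def label_affinity(label_i, label_j, label_distance=None):
    """Binary match score, or a score derived from a predefined label distance."""
    if label_distance is None:
        return 1.0 if label_i == label_j else 0.0
    if label_i == label_j:
        distance = 0.0
    else:
        distance = label_distance[frozenset((label_i, label_j))]
    # Assumed mapping: a smaller distance gives a higher evaluation value.
    return 1.0 / (1.0 + distance)


# Hypothetical parameters: (height in cm, camera angle in degrees).
same = numeric_affinity((170.0, 30.0), (170.0, 30.0))
different = numeric_affinity((170.0, 30.0), (180.0, 30.0))

# Hypothetical predefined distance between two action labels.
distances = {frozenset(("walk", "run")): 1.0}
binary = label_affinity("walk", "run")
graded = label_affinity("walk", "run", distances)
```

The binary variant treats every mismatch alike, while the distance table lets related categories such as walking and running score higher than unrelated ones.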

In the present embodiment, the combination of reference part i and partial posture j that minimizes the combination cost P calculated as described above is determined. As in the second embodiment, the posture integration unit 105 outputs the result of combining the reference part i and the partial posture j as the person posture information.
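The search for the minimum-cost combination can be sketched as follows. Equation (4) is not reproduced in this text, so the cost used here is an assumption: a weighted linear sum of the five inputs S_i, S_j, C_i, C_j and H_ij with coefficients w_1 to w_5, where negative weights make higher scores lower the cost. The record fields, the affinity function, and all numeric values are hypothetical.

```python
from itertools import product


def combination_cost(s_i, s_j, c_i, c_j, h_ij, w):
    """Assumed linear form of the combination cost P (not the patent's equation)."""
    return w[0] * s_i + w[1] * s_j + w[2] * c_i + w[3] * c_j + w[4] * h_ij


def best_combination(reference_parts, partial_postures, affinity, w):
    """Exhaustively search all (reference part, partial posture) pairs and
    return the pair with the lowest combination cost."""
    def cost(pair):
        ref, part = pair
        return combination_cost(
            ref["score"], part["score"],
            ref["consistency"], part["consistency"],
            affinity(ref, part), w,
        )
    return min(product(reference_parts, partial_postures), key=cost)


# Hypothetical candidates; "source" names the posture information each came from.
reference_parts = [
    {"name": "r1", "source": "A", "score": 0.9, "consistency": 0.8},
    {"name": "r2", "source": "B", "score": 0.7, "consistency": 0.6},
]
partial_postures = [
    {"name": "p1", "source": "A", "score": 0.9, "consistency": 0.5},
    {"name": "p2", "source": "B", "score": 0.7, "consistency": 0.95},
]


def affinity(ref, part):
    # Hypothetical H_ij: parts from the same posture information fit perfectly.
    return 1.0 if ref["source"] == part["source"] else 0.9


weights = (-1.0, -1.0, -1.0, -1.0, -1.0)  # assumed: higher scores reduce the cost
best_ref, best_part = best_combination(
    reference_parts, partial_postures, affinity, weights
)
```

In this example the reference part from posture A is paired with the partial posture from posture B, because the latter's high consistency with the person region outweighs the slightly lower affinity.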

As described above, in the present embodiment a plurality of reference parts are used, and the combination of reference part and partial posture that minimizes the combination cost P is taken as the person posture information to be output. As a result, the optimum combination of partial postures can be selected from the plurality of pieces of posture information output by the posture selection unit 103.

(Other embodiments)
The present invention can also be realized by executing the following processing. That is, software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or CPU, MPU, or the like) of the system or apparatus reads and executes the program.

DESCRIPTION OF SYMBOLS
101 Image input unit
102 Object region extraction unit
103 Posture selection unit
104 Partial posture selection unit
105 Posture integration unit
106 Recording unit

Claims

An information processing apparatus comprising:
extraction means for extracting region information of an object from an input distance image;
storage means for storing a plurality of pieces of posture information about the object;
selection means for selecting, from the posture information stored in the storage means, a plurality of pieces of posture information to be compared with the region information of the object extracted by the extraction means;
evaluation means for evaluating, for the posture information selected by the selection means, consistency with the region information of the object for each part constituting the region of the object, and acquiring, from the selected posture information, partial posture information related to the posture of a part whose consistency is equal to or greater than a predetermined value; and
integration means for estimating the posture of the object by integrating the partial posture information acquired by the evaluation means.

The information processing apparatus according to claim 1, wherein the evaluation means evaluates consistency by associating each part of a model of the object based on the posture information with a three-dimensional point group based on the region information of the object in the distance image.

The information processing apparatus according to claim 1, wherein the evaluation means evaluates consistency by associating a projection model, obtained by projecting a model based on the posture information onto a plane, with the region information of the object.

An information processing apparatus comprising:
extraction means for extracting region information of an object from an input distance image;
storage means for storing a plurality of pieces of posture information about the object;
selection means for selecting, from the posture information stored in the storage means, a plurality of pieces of posture information to be compared with the region information of the object extracted by the extraction means;
setting means for setting reference posture information from the posture information selected by the selection means, and setting a reference part from among the parts constituting the region of the object in the reference posture information;
evaluation means for acquiring reference part information related to the posture of the reference part set by the setting means, evaluating, for the parts other than the reference part, consistency between the posture information selected by the selection means and the region information of the object, and acquiring partial posture information related to the posture of a part whose consistency is equal to or greater than a predetermined value; and
integration means for estimating the posture of the object by integrating the reference part information and the partial posture information acquired by the evaluation means.

The information processing apparatus according to claim 4, wherein the evaluation means acquires partial posture information whose affinity with the reference part is equal to or greater than a predetermined value.

The information processing apparatus according to claim 5, wherein the storage means stores posture parameters associated with the plurality of pieces of posture information, and the evaluation means evaluates the affinity based on the posture parameter related to the reference part information and the posture parameter related to the partial posture information.

The information processing apparatus according to claim 6, wherein the posture parameter includes information on the object size of the posture information, the category of the posture information, or the orientation of the posture information with respect to the camera.

The information processing apparatus according to claim 5, wherein the evaluation means evaluates the affinity based on a connection point between the reference part and the part related to the partial posture information.

The information processing apparatus according to any one of claims 4 to 8, wherein the setting means sets a specific part of the object as the reference part.

The information processing apparatus according to claim 4, wherein the setting means sets a plurality of pieces of reference posture information from the posture information selected by the selection means, and the evaluation means evaluates combinations of the reference part information and the partial posture information to be integrated by the integration means and determines the combination with the highest evaluation value.

An information processing method for an information processing apparatus that stores a plurality of pieces of posture information about an object, the method comprising:
an extraction step of extracting region information of the object from an input distance image;
a selection step of selecting, from the posture information stored in the information processing apparatus, a plurality of pieces of posture information to be compared with the region information of the object extracted in the extraction step;
an evaluation step of evaluating, for the posture information selected in the selection step, consistency with the region information of the object for each part constituting the region of the object, and acquiring, from the selected posture information, partial posture information related to the posture of a part whose consistency is equal to or greater than a predetermined value; and
an integration step of estimating the posture of the object by integrating the partial posture information acquired in the evaluation step.

An information processing method for an information processing apparatus that stores a plurality of pieces of posture information about an object, the method comprising:
an extraction step of extracting region information of the object from an input distance image;
a selection step of selecting, from the posture information stored in the information processing apparatus, a plurality of pieces of posture information to be compared with the region information of the object extracted in the extraction step;
a setting step of setting reference posture information from the posture information selected in the selection step, and setting a reference part from among the parts constituting the region of the object in the reference posture information;
an evaluation step of acquiring reference part information related to the posture of the reference part set in the setting step, evaluating, for the parts other than the reference part, consistency between the posture information selected in the selection step and the region information of the object, and acquiring partial posture information related to the posture of a part whose consistency is equal to or greater than a predetermined value; and
an integration step of estimating the posture of the object by integrating the reference part information and the partial posture information acquired in the evaluation step.

A program for controlling an information processing apparatus that stores a plurality of pieces of posture information about an object, the program causing a computer to execute:
an extraction step of extracting region information of the object from an input distance image;
a selection step of selecting, from the posture information stored in the information processing apparatus, a plurality of pieces of posture information to be compared with the region information of the object extracted in the extraction step;
an evaluation step of evaluating, for the posture information selected in the selection step, consistency with the region information of the object for each part constituting the region of the object, and acquiring, from the selected posture information, partial posture information related to the posture of a part whose consistency is equal to or greater than a predetermined value; and
an integration step of estimating the posture of the object by integrating the partial posture information acquired in the evaluation step.

A program for controlling an information processing apparatus that stores a plurality of pieces of posture information about an object, the program causing a computer to execute:
an extraction step of extracting region information of the object from an input distance image;
a selection step of selecting, from the posture information stored in the information processing apparatus, a plurality of pieces of posture information to be compared with the region information of the object extracted in the extraction step;
a setting step of setting reference posture information from the posture information selected in the selection step, and setting a reference part from among the parts constituting the region of the object in the reference posture information;
an evaluation step of acquiring reference part information related to the posture of the reference part set in the setting step, evaluating, for the parts other than the reference part, consistency between the posture information selected in the selection step and the region information of the object, and acquiring partial posture information related to the posture of a part whose consistency is equal to or greater than a predetermined value; and
an integration step of estimating the posture of the object by integrating the reference part information and the partial posture information acquired in the evaluation step.