JP2017097578A

JP2017097578A - Information processing apparatus and method

Info

Publication number: JP2017097578A
Application number: JP2015228313A
Authority: JP
Inventors: Atsushi Nogami
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-11-24
Filing date: 2015-11-24
Publication date: 2017-06-01

Abstract

PROBLEM TO BE SOLVED: To calculate the position and posture of an object accurately. SOLUTION: An image input unit 101 inputs an image including an object whose posture is to be estimated. A partial region identification unit 102 identifies partial regions of the object. A posture estimation unit 103 integrates the partial region identification results and classifies them into known posture classes to estimate the posture of the object. From the partial region identification results for the input image, a partial region selection unit 104 selects the partial regions related to the estimated posture. For each partial region, the positional relationship between the representative position of the object and the position of the partial region identification result is recorded in advance. A representative position calculation unit 105 uses the information indicating the representative positions of the selected partial regions to calculate the representative position of the object in the input image. A position and posture determination unit 106 aligns the estimated posture with the calculated representative position to calculate the position and posture of the object. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus that calculates the posture and position of an object in an image.

One known method of estimating the posture of a human body in an image learns image information of partial regions for each posture in advance. At estimation time, partial regions are identified within the human body region of the input image, and the results are integrated to estimate the posture of the whole body.

In Patent Document 1, learning images showing human bodies in various postures are first prepared, and partial features (shape contexts) are learned for the human body region of each learning image. At estimation time, partial features are computed for the human body region of the input image, and the feature set is used to determine which learning-image posture it matches best. Joint position information is recorded in advance for each learning-image posture, so the joint position relationship (the relative positions of the joints) of the estimated posture is output. Through this processing, Patent Document 1 obtains the joint position relationship of the human body in the image, but it does not determine where the estimated joints are actually located, either in the image or in real space (an absolute or world coordinate system).

Patent Document 1 therefore performs, as post-processing after the posture estimation described above, a fitting process that matches the estimated joint position relationship to the input image. This fitting is done on the two-dimensional image: a morphing method based on the positions where features were acquired is used to obtain a transformation matrix that fits the joint coordinates to the input image. Specifically, a transformation matrix between the feature positions in the learning image of the estimated posture and the feature positions in the input image is computed and applied to the joint coordinates. This post-processing determines the joint positions in the input image and also corrects small discrepancies between the estimated posture and the human body in the image. These small discrepancies arise because the output posture is selected from a set of known, discretized postures (the learned postures) and therefore does not exactly match the actual posture in the input image.

Patent Document 1: JP 2010-176380 A

The fitting process of Patent Document 1 presupposes that features are acquired at the correct positions in the input image. In practice, however, it is difficult to obtain the correct positions of all features in the input image. In particular, in the embodiment of Patent Document 1 the feature points are sparse discrete points on the contour of the human body region, so a large deviation in even one feature acquisition position strongly affects the fitting result.

Another form of fitting moves and deforms a known model (for a human body, an articulated model) so that it fits the human body region of the input image. In particular, when the human body posture is estimated from a distance image, there are techniques that optimally place a known articulated model in the three-dimensional point cloud generated from the distance image (for example, methods that apply ICP, Iterative Closest Point, to an articulated model). These methods, however, require a separately supplied initial position of the model. For a human body, the initial model position is expressed by the body posture (the joint position relationship) together with the location of that posture in three-dimensional space. Moreover, because fitting performance depends strongly on the initial model position, it is important to calculate that initial position accurately.

Patent Document 1, however, does not disclose a method for obtaining a suitable initial position after posture estimation.

Accordingly, an object of the present invention is to accurately determine where the estimated posture is located, in posture estimation processing that estimates the posture of an object by integrating partial region identification results. More specifically, the object is to calculate the position of the object by making effective use of both the partial region identification results used for posture estimation and the estimated posture itself, and thereby to obtain the position and posture with high accuracy.

To achieve the above object, an information processing apparatus according to the present invention comprises: a partial region identification unit that identifies mutually different partial regions of an object; a posture estimation unit that integrates the partial region identification results to estimate the posture of the object; a partial region selection unit that selects partial region identification results based on the estimated posture; a representative position calculation unit that calculates a representative position of the object from information indicating the representative positions associated with the selected partial regions; and a position and posture determination unit that calculates the position and posture of the object from the representative position and the estimated posture.

The information processing apparatus according to the present invention makes it possible to estimate the position and posture of an object with high accuracy.

FIG. 1 is a diagram showing the configuration of the information processing apparatus of the present invention.
FIG. 2 is a diagram showing the processing flow of the information processing apparatus of the present invention.
FIG. 3 is a diagram illustrating the information (learning results) recorded in advance in the recording unit in Example 1.
FIG. 4 is a diagram illustrating the processing of the partial region identification unit and the posture estimation unit in Example 1.
FIG. 5 is a diagram illustrating the processing of the partial region selection unit and the representative position calculation unit.
FIG. 6 is a diagram showing the processing flow of the partial region selection unit in Example 1.
FIG. 7 is a diagram illustrating the processing of the position and posture determination unit.
FIG. 8 is a diagram illustrating the information (learning results) recorded in advance in the recording unit in Example 2.
FIG. 9 is a diagram illustrating the processing of the partial region identification unit and the posture estimation unit in Example 2.
FIG. 10 is a diagram showing the processing flow of the partial region selection unit in Example 2.

Embodiments of the present invention are described below with reference to the drawings.

The following examples describe an embodiment in which the object is a human body and the posture and position of the human body in an image are estimated. In these examples, the posture of the human body includes not only the angles of its joints (the relative positional relationship of its parts) but also the direction and angle of the body relative to the camera. Thus, if human bodies with the same joint position relationship are photographed from different directions, the bodies in the respective images are treated as having different postures.

<Overall configuration>
FIG. 1 is a diagram illustrating the configuration of an information processing apparatus 100 according to the present invention, and FIG. 2 is a diagram illustrating its basic processing flow.

The information processing apparatus 100 according to the present invention can be realized by executing software (a program), obtained via a network or any of various recording media, on a computer comprising a CPU, memory, a storage device, input/output devices, a bus, a display device, and so on. The computer (not shown) may be a general-purpose computer or hardware designed specifically for the software of the present invention.

First, the basic processing of the present invention is described with reference to FIGS. 1 and 2. Each process is described in detail later.

The image input unit 101 in FIG. 1 inputs the image to be processed and executes steps S201 and S202 in FIG. 2. In this embodiment, the input image is a distance image, that is, an image in which each pixel records the distance in the depth direction with a predetermined scaling. A distance image can be acquired by, for example, a method that computes distance from the arrival time of emitted light, or a method that projects patterned light and computes distance from the deformation of the pattern. The image input unit 101 acquires distance images from a camera device that captures them by such methods. Alternatively, distance images to be processed may be input sequentially to the image input unit 101 from a recording device that stores previously captured distance images. To distinguish them from the learning images described later, a distance image input to the information processing apparatus 100 for estimating the posture and position of a human body is hereinafter called an input image. The image input unit 101 also extracts the human body region from the input image.

The partial region identification unit 102 uses partial region classifiers, trained in advance on partial images of human bodies, to identify which of the learned partial regions each part of the human body region in the input image corresponds to (step S203).

The posture estimation unit 103 integrates the partial region identification results and determines which of the known postures the human body in the input image is closest to (step S204). From the partial region identification results produced by the partial region identification unit 102 (the results of step S203), the partial region selection unit 104 selects, based on the posture determined by the posture estimation unit 103, the identification results to be used for calculating the representative position (step S205). The representative position calculation unit 105 calculates the representative position of the human body from the information associated with the selected partial regions (step S206). The position and posture determination unit 106 aligns the human body posture information (the joint position relationship) with the calculated representative position by translation and rotation (step S207) and outputs the result as the position and posture estimate of the human body (step S208). The recording unit 107 stores information learned in advance, which is read out as needed.

In this embodiment, the representative position of the human body computed by the representative position calculation unit 105 is the center of the body, specifically the hip joint position. The representative position is not limited to this; for example, the head position may be used instead.
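As a rough illustration of steps S206 and S207, the sketch below assumes that each selected partial region votes for the representative (hip) position by adding its learned representative position vector v_s to its own 3D center, and that the votes are simply averaged. The averaging rule and the names `representative_position` and `align_posture` are illustrative assumptions, not the method disclosed here. The alignment shown is translation only; the rotation mentioned for step S207 is omitted.

```python
from statistics import mean

Vec3 = tuple[float, float, float]


def representative_position(centers: list[Vec3], vectors: list[Vec3]) -> Vec3:
    """Average the votes centre + v_s cast by the selected partial regions (assumed rule)."""
    if not centers or len(centers) != len(vectors):
        raise ValueError("need one representative position vector per selected region")
    # Each selected region votes for where the hip joint should be.
    votes = [tuple(c[i] + v[i] for i in range(3)) for c, v in zip(centers, vectors)]
    return tuple(mean(vote[i] for vote in votes) for i in range(3))


def align_posture(joints: dict[str, Vec3], rep: Vec3, root: str = "hip") -> dict[str, Vec3]:
    """Translate stored joint coordinates so the root joint lands on rep (no rotation)."""
    offset = tuple(rep[i] - joints[root][i] for i in range(3))
    return {name: tuple(p[i] + offset[i] for i in range(3)) for name, p in joints.items()}


# Two regions whose votes agree on the same hip position.
rep = representative_position([(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
                              [(0.5, 1.0, 0.0), (-0.5, 1.0, 0.0)])
print(rep)  # (0.5, 1.0, 2.0)
print(align_posture({"hip": (0.0, 0.0, 0.0), "head": (0.0, 0.8, 0.0)}, rep))
```

Averaging is the simplest aggregation; a robust alternative such as a median or a mode search would tolerate outlier votes better.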

<Preparation>
The outline of the processing for estimating the position and posture of a human body has been described above. To perform this processing, the necessary information must be learned in advance from a large number of human body distance images and stored in the recording unit 107. The learning in Example 1 and the information stored in the recording unit 107 are described below with reference to FIG. 3.

Images 301 to 30m in FIG. 3 are m distance images, each showing a human body. These distance images used for learning are hereinafter called learning images. To make learning easier, the human body in each learning image is assumed to appear at the center of the image at a predetermined size. Each learning image is also annotated with coordinates indicating the main parts of the human body in that posture. In FIG. 3, the main parts and the lines connecting them are superimposed on learning images 301 to 30m, but in practice the distance image and the main-part coordinates are stored separately; for example, the distance image is stored in an image format and the main-part coordinates are stored as text data or the like. The main-part coordinates describe body parts that reflect changes in posture, such as the positions of the major joints. In Example 1, the main parts are position information including the joint positions of the limbs and the head position, shown as circles on images 301 to 30m in FIG. 3; they are hereinafter collectively called joint positions.

In Example 1, the human body representative position is the hip joint position, so the position and posture determination unit 106 described later needs the hip joint position of each posture. The joint positions therefore include the hip joint. The joint coordinates are assumed to be expressed in a three-dimensional coordinate system whose origin is the position of the camera that captured the distance image (the camera coordinate system).

The recording unit 107 stores partial region information 311 and posture information 321 as the information learned from the learning images. The learning of each is described below.

First, the process of learning the partial region information 311 from the learning images is described. For each of the partial regions s1 to sn, the partial region information 311 consists of a partial region classifier 312 and a vector from the partial region to the human body representative position (hereinafter, a representative position vector) 313. In FIG. 3, the representative position vectors 313 are labeled v_s1, v_s2, ..., v_sn, corresponding to the respective partial region classifiers. Each partial region classifier is prepared by collecting, across all learning images, groups of images with similar partial postures and learning each such group. This learning can be carried out with reference to, for example, Non-Patent Document 1.

[Non-Patent Document 1]
Lubomir Bourdev and Jitendra Malik, "Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations," IEEE International Conference on Computer Vision, 2009.

In Non-Patent Document 1, coordinate information indicating the joint positions is attached to two-dimensional human body images, and partial human body images with similar joint position relationships are collected. HOG features of the collected partial images are learned with an SVM to create partial region classifiers for various postures. The partial region classifiers of the present invention may be created in the same way. Specifically, based on the joint position information attached to each learning image, distance image patches with partially similar postures are cropped and collected, and each collected group of partial distance images is learned with suitable image features and a classifier. Following Non-Patent Document 1, HOG features computed from distance images and an SVM can be used, for example. The features and classifier are not limited to these, and other known learning and identification methods may be used. As a result of learning, the partial region classifiers 312 shown in FIG. 3 are obtained for the partial regions s1 to sn. For illustration, FIG. 3 depicts each partial region classifier as a partial image of a human body, but in practice each consists of image features and a classifier.

The partial region classifier may take any other form, as long as it can identify the posture of the part. Because partial region identification is a multi-class problem, methods suited to multi-class identification, such as decision trees, are preferable. Approximate nearest-neighbor search (for example, a hashing-based method) is another suitable way to perform multi-class partial region identification.
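To illustrate the hashing-based approximate nearest-neighbor option, the following sketch uses random-hyperplane locality-sensitive hashing over feature vectors (for instance, HOG descriptors of partial distance images). The class `HyperplaneLSH`, its parameters, and the fallback to exhaustive search when a bucket is empty are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH index from feature vectors to partial region class labels."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the hash key.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)
        self.items = []

    def _key(self, x):
        # Bit k records on which side of hyperplane k the vector lies.
        return tuple(int(sum(p * v for p, v in zip(plane, x)) >= 0.0) for plane in self.planes)

    def add(self, feature, label):
        self.buckets[self._key(feature)].append((feature, label))
        self.items.append((feature, label))

    def query(self, feature):
        # Search only the matching bucket; fall back to all items if it is empty.
        candidates = self.buckets.get(self._key(feature)) or self.items
        nearest = min(candidates, key=lambda item: math.dist(item[0], feature))
        return nearest[1]


index = HyperplaneLSH(dim=3, n_bits=4, seed=1)
for feature, label in [([1.0, 0.0, 0.0], "s1"), ([0.0, 1.0, 0.0], "s2"), ([0.0, 0.0, 1.0], "s3")]:
    index.add(feature, label)
print(index.query([2.0, 0.0, 0.0]))  # s1 (scaling a vector does not change its hash key)
```

Because only one bucket is searched, a query near a hyperplane boundary can miss its true nearest neighbor; using several hash tables is a common way to trade memory for recall.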

When the partial regions are learned, the representative position vectors 313 are also learned from each partial region. As described above, the human body representative position in this embodiment is the hip joint position, and the joint position information attached to each learning image includes the hip joint position. FIG. 3 shows how, for each of the partial regions s1 to sn, the vector v_s1 to v_sn from the position where the partial region appears to the hip joint position is learned together with it. The start point of a vector v (black circle) indicates the center of the partial region, and its end point (white circle) indicates the hip joint position. Each vector v is three-dimensional: its start point is the position in three-dimensional space computed from the distance value at the center of the partial region, and its end point is the three-dimensional hip joint position.

The representative position vector v associated with each partial region is obtained from the group of distance images with similar partial postures that was used to train the corresponding partial region classifier, as follows. First, for the group of partial region images used to learn a given partial region, the center of each partial region image is computed as a three-dimensional position in the camera coordinate system. Next, for each partial region image, the vector from this center (taken as the origin) to the hip joint position is computed. The average of these vectors is taken as the representative position vector of that partial region. The representative position vector is not limited to the average; it may instead be computed as the median, or as a mode found by mean shift. If the distribution of vectors has multiple peaks, multiple representative position vectors may be associated with a single partial region classifier.
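The procedure for obtaining a representative position vector can be sketched as follows, assuming that the 3D patch centers and hip joint positions of one partial region's training group are already available in camera coordinates. The function name is hypothetical, the median variant is taken per axis (one of several possible definitions), and the mean-shift and multi-peak variants are not shown.

```python
from statistics import mean, median


def representative_vector(centers, hips, method="mean"):
    """Aggregate patch-centre-to-hip vectors (camera coordinates) for one partial region."""
    if not centers or len(centers) != len(hips):
        raise ValueError("need one hip position per training patch")
    # Vector from each training patch centre (origin) to that image's hip joint.
    offsets = [tuple(h[i] - c[i] for i in range(3)) for c, h in zip(centers, hips)]
    aggregate = {"mean": mean, "median": median}[method]  # median is taken per axis
    return tuple(aggregate([o[i] for o in offsets]) for i in range(3))


centers = [(0.0, 0.0, 2.0)] * 3
hips = [(1.0, 0.5, 2.0), (2.0, 0.5, 2.0), (9.0, 0.5, 2.0)]
print(representative_vector(centers, hips))            # (4.0, 0.5, 0.0)
print(representative_vector(centers, hips, "median"))  # (2.0, 0.5, 0.0)
```

The example shows why the text allows alternatives to the average: one outlying sample pulls the mean to 4.0 along x, while the median stays at 2.0.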

In this way, for the partial regions (partial postures) s1 to sn, the partial region classifiers 312 and the representative position information (representative position vectors) v_s1 to v_sn are stored together in the recording unit 107 as the partial region information 311.

Next, the posture information 321, another kind of information stored in advance in the recording unit 107, is described.

FIG. 3 shows the joint coordinates (joint position relationship) 322 stored for each of the human body postures p1 to pm. In this embodiment, the number of postures recorded in the recording unit 107, that is, the number of postures to be estimated, equals the number m of learning-image postures. The joint position relationship 322 of each posture is therefore obtained by saving the joint position relationship attached to each of the learning images 301 to 30m. As noted above, these joint coordinates are stored in the camera coordinate system. Although the joint position relationships 322 are depicted graphically in the drawings, what is actually stored is the coordinates of each joint in the camera coordinate system. The number m of postures to be estimated (the number stored in the recording unit 107) may be reduced, for example by removing similar postures from among the learning images; alternatively, the learning images themselves may be prepared with images of similar postures removed. The posture estimation unit estimates which of the postures p1 to pm the human body in the input image most resembles. Hereinafter, particularly in the description of posture estimation, the postures p1 to pm may be called posture classes.
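The text leaves open how similar postures are removed. One possible rule, shown here purely as an assumption, is greedy de-duplication by mean per-joint distance: a posture is kept only if it differs from every posture already kept by more than a threshold. Joints are assumed to be listed in the same order for every posture and expressed in a common frame; the function names and the threshold are hypothetical.

```python
import math


def mean_joint_distance(a, b):
    """Average Euclidean distance between corresponding joints of two postures."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def prune_similar_postures(postures, threshold):
    """Greedily keep a posture only if it differs from every kept posture by more than threshold."""
    kept = []
    for joints in postures:
        if all(mean_joint_distance(joints, k) > threshold for k in kept):
            kept.append(joints)
    return kept


p1 = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # e.g. hip and head
p2 = [(0.0, 0.0, 0.0), (0.0, 1.01, 0.0)]  # nearly identical to p1
p3 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(len(prune_similar_postures([p1, p2, p3], threshold=0.05)))  # 2
```

The greedy result depends on input order; a clustering step (for example, choosing one representative per cluster) would be an order-independent alternative.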

Next, the partial region histograms 323 included in the posture information 321 are described. In this embodiment, the partial region histograms 323 are used both for obtaining the posture from the partial region identification results of the input image (the processing of the posture estimation unit 103) and for selecting partial regions according to the estimated posture (the processing of the partial region selection unit 104). Although this embodiment uses the partial region histograms for both processes, depending on how the partial region selection unit 104 is implemented, other information may be derived from the partial region histograms and stored in the recording unit 107. Details are given later together with the description of the estimation-time processing.

As shown in FIG. 3, a partial region histogram 323 (h_p1 to h_pm) is prepared for each of the postures p1 to pm. The horizontal axis of a partial region histogram is the partial region class number s1 to sn (the numbers of the partial region classifiers described above), and the vertical axis is the probability that each partial region appears in that posture. As in posture estimation at run time (described later), the partial region histograms are created by applying all partial region classifiers to each learning image and computing, for each posture class (that is, each learning image), the probability of obtaining each classifier's result. At run time, the partial region classifiers are applied to the input image to create its partial region histogram, which is then compared with the partial region histograms h_p1 to h_pm of the postures to determine the most similar posture.
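The comparison measure between the input histogram and the stored histograms h_p1 to h_pm is not specified here. The sketch below assumes histogram intersection, one common similarity for normalized histograms, and returns the posture class with the highest score; the function names and the example histograms are illustrative.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: sum of bin-wise minima (1.0 when identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def estimate_posture(h_input, posture_histograms):
    """Return the posture class whose stored histogram is most similar to h_input."""
    return max(posture_histograms,
               key=lambda p: histogram_intersection(h_input, posture_histograms[p]))


# Bins correspond to partial region classes s1, s2, s3.
stored = {"p1": [0.7, 0.3, 0.0], "p2": [0.1, 0.1, 0.8]}
print(estimate_posture([0.75, 0.25, 0.0], stored))  # p1
```

Other measures, such as the chi-squared distance or the Bhattacharyya coefficient, could be substituted in `histogram_intersection` without changing the rest of the sketch.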

The joint position relationship 322 and the partial region histogram 323 of each posture are stored in the recording unit 107 as the posture information 321.

This completes the description, with reference to FIG. 3, of the information stored in advance in the recording unit 107.

<Preprocessing>
Each process performed at run time for posture estimation is described in detail below.

As described above, the image input unit 101 reads the input image on which posture estimation is performed. As preprocessing, it also extracts the human body region from the input image. In this extraction, background subtraction is first applied to the distance image to extract only foreground candidate pixels. The distance values of the foreground candidate pixels are then converted into a point cloud in the camera coordinate system (a three-dimensional coordinate system). The center of the three-dimensional point cluster is computed, and among the points around this center, those lying within a range that fits the size of a human body are taken as the human body region. Projecting the points labeled as the human body region back onto the image plane yields a distance image from which the human body region has been extracted. The extraction method is not limited to this, and any known method may be used.
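The extraction steps above can be sketched in plain Python as follows. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the background-difference threshold, the body radius, and the use of a single centroid as the cluster center are assumptions for illustration. Because each 3D point comes from exactly one pixel, marking that pixel is equivalent to projecting the point back onto the image plane.

```python
import math


def extract_body_mask(depth, background, fx, fy, cx, cy, diff_thresh=0.1, body_radius=1.0):
    """Return a boolean mask of the human body region in a depth image (lists of rows)."""
    h, w = len(depth), len(depth[0])
    mask = [[False] * w for _ in range(h)]
    # 1. Background subtraction: keep valid pixels whose depth differs from the background.
    candidates = [(u, v) for v in range(h) for u in range(w)
                  if depth[v][u] > 0 and abs(depth[v][u] - background[v][u]) > diff_thresh]
    if not candidates:
        return mask
    # 2. Back-project each candidate pixel to camera coordinates (pinhole model).
    points = {}
    for u, v in candidates:
        z = depth[v][u]
        points[(u, v)] = ((u - cx) * z / fx, (v - cy) * z / fy, z)
    # 3. Centre of the 3D point cluster.
    centre = tuple(sum(p[i] for p in points.values()) / len(points) for i in range(3))
    # 4. Points within a human-sized radius form the body; mark their source pixels.
    for (u, v), p in points.items():
        if math.dist(p, centre) <= body_radius:
            mask[v][u] = True
    return mask


depth = [[5.0, 2.0, 2.0, 2.0]]
background = [[5.0, 5.0, 5.0, 5.0]]
print(extract_body_mask(depth, background, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
# [[False, False, True, False]]
```

In this toy example the three foreground points lie 2 units apart along x, so a 1-unit radius keeps only the central one; a realistic radius would be on the order of a human body's extent.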

<Partial region identification unit>
FIG. 4 illustrates the processing of the partial region identification unit 102 and the posture estimation unit 103.

The partial region identification unit 102 identifies partial regions in the human body region of the input image. In FIG. 4, 401 is the human body region extracted from the input image, and 402 is a partial region identification result. The partial region identification unit 102 performs the identification using the partial region classifiers of the partial region information 311 stored in the recording unit 107. It also creates a partial region histogram 405 (h_input) of the input image.

The specific processing of the partial region identification unit 102 is as follows. First, a partial region of a predetermined size is cut out around each pixel of the human body region, and identification is performed with each of the partial region classifiers s1 to sn stored in the recording unit 107. When the identification result for a pixel corresponds to a partial region class sx, a count is added to the sx bin of the partial region histogram h_input. In FIG. 4, the partial region 402 is identified as partial region class s4, so a count is added to the s4 bin of h_input. After the entire human body region has been searched, h_input is normalized so that its bins give the appearance probabilities of the partial region classes.

Depending on how the partial region classifier is implemented, several candidate partial region classes may be obtained for each pixel (each partial region). For example, when partial region classes are identified with multiple decision trees, different trees may return different classes for the same partial region. In that case, counts may be added to the bins of all of those classes for that single identification result. Some classifiers can also output a reliability together with the identified class. For example, when an SVM is used as the classifier, the distance from the decision boundary (hyperplane) may serve as the reliability. When a reliability is available, each contribution to the histogram may be weighted by the reliability of that identification result.

The description above runs partial region identification around every pixel of the human body region. To reduce processing time, the number of partial regions processed may be reduced instead. For example, center pixels may be placed at fixed step intervals or chosen at random from the human body region.
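The accumulation of h_input, including the optional variants just described, can be sketched as follows. This is an illustrative sketch, not the patent's code. It assumes each pixel's result is given as a list of `(class_index, weight)` candidates: one candidate per decision tree, or one candidate weighted by a reliability such as an SVM margin. It also assumes the weights are non-negative. `step` implements the fixed-step subsampling of center pixels. The function name and input format are hypothetical.

```python
def input_histogram(region_results, n_classes, step=1):
    """Accumulate and normalize h_input (illustrative sketch).

    region_results: one entry per center pixel, in scan order; each entry is
        a list of (class_index, weight) candidates (hypothetical format).
        Weights are assumed non-negative, e.g. 1.0 for a plain count.
    step: only every `step`-th center pixel is evaluated.
    """
    hist = [0.0] * n_classes
    for candidates in region_results[::step]:
        # One identification result may contribute to several bins.
        for cls, weight in candidates:
            hist[cls] += weight
    total = sum(hist)
    # Normalize only after the whole body region has been processed.
    return [b / total for b in hist] if total > 0 else hist


results = [[(3, 1.0)], [(3, 0.5), (1, 0.5)], [(0, 2.0)]]
h_input = input_histogram(results, n_classes=4)
```

Normalizing at the end keeps h_input comparable with the stored h_p histograms, whatever the size of the body region or the subsampling step.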

<Posture estimation unit>
Next, the processing of the posture estimation unit 103 is described. The posture estimation unit 103 takes the partial region histogram h_input of the input image, output by the partial region identification unit 102. It compares h_input with the partial region histograms h_p1 to h_pm of the postures in the posture information 321 recorded in the recording unit 107, and determines the closest posture class (whole-body posture). To do this, it computes the similarity between h_input and each of h_p1 to h_pm and selects the posture with the highest similarity. KL divergence, the Bhattacharyya distance, the Earth Mover's Distance, and the like can be used for this. These are dissimilarity measures, so the most similar posture is the one with the smallest value. The comparison is not limited to these measures and may be performed by any other nearest neighbor search. The posture estimation unit 103 then retrieves the joint positional relationship of the nearest posture from the recording unit 107 and outputs it to the position and orientation determination unit 106. As shown by 404 in FIG. 4, this information consists of the coordinates of the joints J1 to J16 (when there are 16 joints) in the camera coordinate system. The joint position information in the recording unit 107 is output as is.
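The nearest-posture search can be sketched with two of the measures named above, the Bhattacharyya distance and KL divergence. Both are implemented here in their textbook forms. The Earth Mover's Distance is not shown. The `eps` guard against empty bins and the function names are assumptions of this sketch.

```python
import math


def bhattacharyya_distance(h1, h2):
    """-ln(sum_k sqrt(h1[k] * h2[k])) for two normalized histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    return math.inf if bc == 0 else -math.log(bc)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps avoids division by zero on empty bins of q."""
    return sum(a * math.log(a / (b + eps)) for a, b in zip(p, q) if a > 0)


def nearest_posture(h_input, posture_hists, dist=bhattacharyya_distance):
    """Index of the posture class whose histogram is closest to h_input."""
    return min(range(len(posture_hists)),
               key=lambda i: dist(h_input, posture_hists[i]))


h_p = [[0.5, 0.25, 0.0, 0.25], [0.0, 0.0, 0.75, 0.25]]
best = nearest_posture([0.5, 0.25, 0.0, 0.25], h_p)
```

Because both functions return a distance, `min` picks the most similar posture. The joint positional relationship stored for index `best` would then be output.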

<Partial region selection unit and representative position calculation unit>
The processing above estimates the posture of the human body in the input image, but it does not yet determine where that posture is located. Because the image input unit 101 has extracted the human body region from the input distance image, the rough position of the human body is known. However, it is not known where the joint coordinates 404 obtained by the posture estimation unit 103 lie, either in the image or in the camera coordinate system of the input image. The joints could be placed by computing the center of the extracted human body region and aligning the center of the joint positional relationship with it. However, noise in the distance image and similar factors make it difficult to obtain that center accurately every time. The present invention therefore calculates a representative position from information indicating the representative position (a representative position vector) stored in advance for each partial region. The joint positional relationship is then aligned with that representative position, which locates the human body posture more accurately. This processing is performed by the partial region selection unit 104 and the representative position calculation unit 105, described below.

The partial region selection unit 104 and the representative position calculation unit 105 calculate the representative position of the human body in the input image. From it, they obtain the coordinates at which the joint positional relationship (posture) from the posture estimation unit 103 is located in the target space. FIG. 5 gives an overview of this processing.

FIG. 5(A) shows partial region identification results and the hip joint positions indicated by the representative position vectors of the identified partial regions. For convenience, FIG. 5(A) shows only four identification results. In practice, there is one result centered on every pixel (or on every pixel at a predetermined step) of the human body region 501.

The recording unit 107 stores a representative position vector v for each partial region. When a partial region centered on a pixel of the human body region is identified as a certain partial region class, a representative position vector (a vector pointing to the hip joint position) starting from that pixel position is therefore obtained at the same time. For example, the partial region identification result 502 yields a direction vector 505 pointing to the hip joint position. More specifically, the direction vector 505 is a three-dimensional vector that starts at the three-dimensional spatial coordinates given by the distance value of that pixel and points to the hip joint position.
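The way one identification result produces a vote for the hip position can be sketched as follows. The sketch back-projects the center pixel with an assumed pinhole model (`fx, fy, cx, cy`) and adds the stored three-dimensional representative position vector. The function name and the camera model are assumptions.

```python
def vote_for_representative_position(u, v, z, vector, fx, fy, cx, cy):
    """3-D point indicated by one partial region identification result.

    (u, v): center pixel of the partial region; z: its distance value.
    vector: representative position vector stored for the identified class.
    fx, fy, cx, cy: assumed pinhole intrinsics.
    """
    # Body-surface point at the center of the partial region.
    surface = ((u - cx) * z / fx, (v - cy) * z / fy, z)
    # End point of the representative position vector = candidate hip position.
    return tuple(s + d for s, d in zip(surface, vector))


hip_vote = vote_for_representative_position(2, 1, 2.0, (0.0, 0.5, 0.25),
                                            1.0, 1.0, 0.0, 0.0)
```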

However, partial region identification results are not always correct. The result 503 in FIG. 5(A) is an incorrect identification, so its representative position vector 504 also points in the wrong direction. If the representative position is computed from vectors tied to such erroneous results, the representative position (hip joint position) cannot be calculated correctly. The present invention removes erroneous identification results on the basis of the posture obtained by the posture estimation unit 103. Only vectors that correctly indicate the representative position of the human body are kept, and the representative position is obtained accurately from them. In other words, the gist of the present invention is to select the identification results that are consistent with the estimated posture and to obtain the representative position from the selected partial regions. Using only the partial regions that the posture should actually contain allows the representative position to be calculated accurately.

In this embodiment, the partial regions consistent with the estimated posture are selected using the partial region histograms h_p of the postures stored in the recording unit 107. The partial region selection unit 104 reads from the recording unit 107 the partial region histogram of the posture class estimated by the posture estimation unit 103. This histogram describes the partial regions contained in that posture class, and each bin holds the appearance probability of one partial region (323 in FIG. 3). The partial regions whose bins have a probability at or above a predetermined value are taken as the regions to select, which selects partial regions on the basis of the estimated posture. The partial region selection unit 104 applies this histogram-based selection criterion to the identification results in the human body region.

The description above uses the partial region histogram directly as the partial region selection criterion. Alternatively, the partial regions regarded as contained in each posture may be recorded in the recording unit 107 in advance as the selection criterion. To reproduce the histogram-based method, a correspondence table between each posture class and the partial region classes it contains is prepared from the partial region histograms. This table serves as the selection criterion and is stored in the recording unit 107 for each posture class. The partial region selection unit then only needs to look up the partial regions to select in the correspondence table, using the posture estimation result.

FIG. 6 shows the processing flow of the partial region selection unit 104.

First, in step S601, the partial region selection criterion is determined from the posture estimated by the posture estimation unit 103. When the criterion is derived from the partial region histogram, the histogram of the estimated posture is first read from the recording unit 107. The partial region classes with at least a predetermined probability are then determined and used as the criterion. When the criterion is already stored in the recording unit 107, the criterion for the estimated posture is simply read from the recording unit 107.

Next, in step S602, one partial region identification result in the human body region of the input image is taken, and its partial region class is compared with the classes contained in the selection criterion. If the class is contained in the criterion, step S603 judges that the criterion is satisfied, and the process proceeds to step S604, where the partial region is selected. If the class is not contained in the criterion, step S603 judges that the criterion is not satisfied, and the process proceeds to step S605, where the partial region is discarded.

In step S606, it is checked whether all partial region identification results in the human body region have been examined. If not, steps S602 to S605 are repeated for the next result to decide whether to adopt or discard that partial region.
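Steps S601 to S606 can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's code. It assumes the criterion is the set of classes whose bin in the estimated posture's histogram is at least `min_prob`, and that each identification result is a `(class_index, payload)` pair, where the payload might be that result's hip vote. The names and the threshold are hypothetical. The precomputed correspondence table variant is shown in the usage note.

```python
def selection_criterion(posture_hist, min_prob):
    """S601: partial region classes with probability >= min_prob."""
    return {k for k, p in enumerate(posture_hist) if p >= min_prob}


def select_results(results, criterion):
    """S602-S606: keep results whose class is in the criterion.

    results: list of (class_index, payload) pairs (hypothetical format).
    """
    selected = []
    for cls, payload in results:          # repeat until all are checked (S606)
        if cls in criterion:              # S602/S603: compare with criterion
            selected.append((cls, payload))  # S604: select
        # otherwise S605: discard
    return selected


crit = selection_criterion([0.5, 0.25, 0.0, 0.25], min_prob=0.2)
kept = select_results([(0, "a"), (2, "b"), (3, "c")], crit)
```

For the correspondence-table variant, a table such as `{p: selection_criterion(h, 0.2) for p, h in enumerate(posture_hists)}` could be built once offline. S601 then becomes a dictionary lookup.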

FIG. 5(B) shows the processing result of the partial region selection unit 104, in which only the identification results related to the estimated posture have been selected. Compared with FIG. 5(A), the partial region identification result 503, which was not contained in the selection criterion of the estimated posture, has been excluded.

The representative position calculation unit 105 calculates the representative position from the representative position vectors associated with the identification results selected by the partial region selection unit 104. FIG. 5(C) shows the representative position 510, obtained by aggregating the vectors pointing to the hip center position from the partial region results in FIG. 5(B). The representative position 510 can be obtained by calculating the mean, the median, the peak of the distribution, or a similar statistic of the coordinates indicated by the end points of the representative position vectors in FIG. 5(B). In this embodiment, each representative position vector is a three-dimensional vector whose origin is the human body surface (the distance value of the distance image) at the center of the partial region. The representative position 510 is therefore obtained as three-dimensional coordinates.
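Two of the aggregation options mentioned above, the mean and the median, can be sketched with Python's `statistics` module. Applying them axis by axis is an assumption of this sketch. A peak-of-distribution estimate (for example, mean shift) is not shown.

```python
from statistics import mean, median


def representative_position(votes, method="median"):
    """Combine the 3-D end points of the selected representative position
    vectors into one representative position (per-axis aggregation)."""
    agg = median if method == "median" else mean
    return tuple(agg([p[i] for p in votes]) for i in range(3))


votes = [(0.0, 1.0, 2.0), (1.0, 1.0, 2.0), (10.0, 1.0, 2.0)]
pos_median = representative_position(votes)
pos_mean = representative_position(votes, method="mean")
```

In this example, the outlying vote at x = 10 pulls the mean to about 3.67, while the median stays at 1.0. This illustrates why a robust statistic can help if a few erroneous votes survive the selection step.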

<Position and orientation determination unit>
Through the processing above, the posture estimation unit 103 has obtained the human body posture (the relative positional relationship of the joint positions), and the representative position calculation unit 105 has obtained the representative position (the coordinates of the hip joint position). The position and orientation determination unit 106 integrates this information to generate the final output. Specifically, it keeps the joint positional relationship of the estimated posture unchanged and moves the posture's hip joint position to the hip joint position obtained by the representative position calculation unit 105. That target position is expressed in the camera coordinate system at posture estimation time.

FIG. 7 illustrates the processing of the position and orientation determination unit 106. FIG. 7(A) shows the representative position calculated by the representative position calculation unit 105. The representative position 710 is expressed as three-dimensional coordinates in the camera coordinate system at posture estimation time. FIG. 7(B) shows the human body posture output by the posture estimation unit 103.

The human body posture is expressed as the joint position coordinates 712 of the human body in the learning camera coordinate system, and the joint position coordinates 712 include the hip joint position coordinates 711. The position and orientation determination unit 106 applies a translation and rotation that moves the hip joint position 711 to the representative position 710 in the posture estimation camera coordinate system. The positional relationship among the joint position coordinates 712 is preserved. The amounts of translation and rotation are calculated from the difference between the hip joint position 711 and the representative position 710 (a translation and a rotation angle about the camera origin). The result, shown in FIG. 7(C), is a set of joint position coordinates in which the hip joint lies at the representative position 710. The position and orientation determination unit 106 outputs these joint coordinates, in the camera coordinate system at posture estimation time, as the position and posture estimation result of the human body.
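The translation part of this alignment can be sketched as follows. All joints are shifted by the same offset, so the hip joint lands on the representative position and the relative joint layout is unchanged. The rotation about the camera origin mentioned in the text is deliberately not modelled here. The function name and the joint list format are hypothetical.

```python
def place_posture(joints, hip_index, representative):
    """Rigidly translate the joints so that joints[hip_index] lands on
    `representative`. (Rotation is omitted in this sketch.)"""
    hip = joints[hip_index]
    offset = tuple(r - h for r, h in zip(representative, hip))
    return [tuple(c + o for c, o in zip(j, offset)) for j in joints]


joints = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, -0.5, 0.0)]  # hip first
placed = place_posture(joints, hip_index=0, representative=(1.0, 2.0, 3.0))
```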

In this embodiment, the human body posture output by the position and orientation determination unit 106 is chosen from the known postures stored in the recording unit 107. Even when the rough posture matches, extremities such as the hands and feet may therefore fall outside the human body region of the input distance image. Applications that only need the rough posture may use the output of the position and orientation determination unit 106 as is. When higher accuracy is required, for example for the joint positions, that output may not be accurate enough.

To address this, a fitting process may additionally be performed with the output of the position and orientation determination unit 106 as the initial value. For example, a human body model consisting of multiple parts can be fitted to the input distance image by a method such as ICP (Iterative Closest Point). Any existing technique may be used for the fitting. Its details are outside the gist of the present invention and are omitted here. Even when fitting is performed as post-processing, the accuracy of the initial position and posture matters greatly. Calculating the position and posture with the method of the present invention can therefore be expected to improve overall performance.

The second embodiment differs from the first embodiment in the relationship between partial region classes and posture classes and in the representative position vectors. Description of the parts of the configuration that are common to the first embodiment is omitted.

The first embodiment basically associated one representative position vector with each partial region. It also briefly mentioned the case in which several representative position vectors are associated with a partial region. In that case, selecting a partial region identification result meant using all the representative position vectors associated with that partial region to calculate the representative position. In the second embodiment, several representative position vectors are associated with each partial region, and a posture class is associated with each of these vectors. Based on the estimated posture, the partial regions related to that posture are selected, and the representative position vectors related to that posture are selected at the same time.

FIG. 8 illustrates learning and the information stored in the recording unit 107 in the second embodiment.

As in the first embodiment, each of the learning images 801 to 80m in FIG. 8 is associated with joint position information. The partial region information 811 and the posture information 821 are learned from these images.

To learn the partial region information 811, the partial region classifiers 812 are learned first. As in the first embodiment, the partial region classifiers 812 are created by learning the image features of the partial regions and an identification function.

Next, for each partial region, a histogram 813 indicating its degree of association with each posture class is learned. This histogram 813 is hereinafter called a posture histogram. It indicates how likely the overall posture is to belong to each posture class when a given partial region is observed. The posture histogram 813 is created, for example, as follows.

Here the learning images 801 to 80m are assumed to correspond one-to-one to the posture classes p1 to pm; that is, the posture of every learning image is itself a posture class to be estimated. To create the posture histograms 813, the partial region classifiers are first executed on each learning image. This gives the appearance frequency of each partial region in each learning image (posture class), and tallying these frequencies for each partial region produces the posture histograms. Alternatively, the posture histograms may be created as follows. As described in the first embodiment, partial images with similar local postures are collected from the learning images when the partial region classifiers are created. For every training partial image of a classifier, the posture (learning image) it came from is known. A histogram of the posture classes of each classifier's training partial images may therefore be created and used as that classifier's posture histogram.
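The first construction, counting appearances and tallying them per partial region, can be sketched as follows. The sketch assumes one learning image per posture class, as the text does, and takes the classifier outputs for each image as a list of class indices. Normalizing each region's counts across postures is one reasonable reading of "tallying", and the input format and names are hypothetical.

```python
def posture_histograms(labels_per_image, n_classes):
    """Posture histogram h_s for every partial region class s.

    labels_per_image[p]: partial region classes detected in the learning
    image of posture class p (hypothetical format). Row s of the result is
    the distribution over posture classes given that region s was observed.
    """
    n_postures = len(labels_per_image)
    counts = [[0] * n_postures for _ in range(n_classes)]
    for p, labels in enumerate(labels_per_image):
        for s in labels:
            counts[s][p] += 1
    hists = []
    for row in counts:
        total = sum(row)
        hists.append([c / total for c in row] if total else [0.0] * n_postures)
    return hists


h_s = posture_histograms([[0, 0, 1], [0, 2, 2]], n_classes=3)
```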

Next, learning of the representative position vectors in the second embodiment is described. As in the first embodiment, a representative position vector indicates the position of the hip joint of the human body relative to the position of a partial region identification result. In the second embodiment, several representative position vectors are associated with each partial region, and a posture class is associated with each of them. A concrete example follows.

FIG. 8 shows that, for each of the partial regions s1 to sn, representative position vectors 814 are stored for each of the postures p1 to pm (v_s1p1, v_s1p2, ..., v_snp4 in FIG. 8). For example, the representative position vector v_s1p2 indicates the representative position for posture class p2 when partial region s1 is obtained.

In the second embodiment, the representative position vectors are calculated by aggregating, for each posture (each learning image), the relative positional relationship (representative vector) between the positions where partial region identification results appear in the learning image and the representative position (hip joint position). For example, one partial region classifier is first run on one learning image. The vectors from its identification result positions to the representative position (hip joint position) of that learning image are averaged, giving the representative position vector of that partial region in that learning image. This is repeated for every partial region classifier on the learning image, and then for every learning image (posture class). The result is a set of representative position vectors for each partial region, each associated with a posture class.

The representative position vectors can also be created from the partial images used to train the partial region classifiers. The first embodiment described how, when the classifiers are trained, similar partial region images are collected from all the original learning images (clustering, for example by collecting partial region images with similar joint positional relationships). For each of these partial region images, the source learning image (posture class) and the position from which it was cut out are known. Because the images were clustered by similar posture, the partial region class each belongs to is also known. For each partial region class, the vectors to the hip joint position are therefore aggregated from the source learning image (posture class) and cut-out position of each training partial image. This yields the representative position vector of that partial region for each posture class.

No representative position vector is created for a posture in which the partial region does not appear. For example, the posture histogram h_sn in FIG. 8 shows that partial region sn does not appear in postures p1 and p3. In this case, no representative position vectors for postures p1 and p3 are created for partial region sn.
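The first construction described above averages, per partial region and per posture, the vectors from detection positions to the hip joint. It can be sketched as follows. The sketch assumes three-dimensional detection positions and one learning image per posture, and the input format and names are hypothetical. A pair (s, p) with no detections simply gets no entry, which matches the rule that no vector is created for a posture in which the region does not appear.

```python
def representative_vectors(detections, hips):
    """v_{s,p}: mean vector from detections of region s to the hip joint
    in the learning image of posture p.

    detections[p]: list of (s, (x, y, z)) detection positions in the
    learning image of posture p; hips[p]: that image's hip joint position.
    """
    sums, counts = {}, {}
    for p, dets in enumerate(detections):
        hip = hips[p]
        for s, pos in dets:
            d = tuple(h - c for h, c in zip(hip, pos))
            acc = sums.get((s, p), (0.0, 0.0, 0.0))
            sums[(s, p)] = tuple(a + b for a, b in zip(acc, d))
            counts[(s, p)] = counts.get((s, p), 0) + 1
    # Only (s, p) pairs that were actually observed receive a vector.
    return {k: tuple(c / counts[k] for c in v) for k, v in sums.items()}


dets = [[(0, (0.0, 0.0, 0.0)), (0, (2.0, 0.0, 0.0))], [(1, (1.0, 1.0, 1.0))]]
vecs = representative_vectors(dets, hips=[(1.0, 1.0, 0.0), (1.0, 2.0, 1.0)])
```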

The posture information 821 stores the joint position relationships 822 of the learning images in association with the posture class numbers p1 to pm. As in the first embodiment, these joint position relationships are the human body postures output by the information processing apparatus 100.

Storing the above information in the recording unit 107 completes the learning of the second embodiment.

FIG. 9 is a diagram explaining the processing of the partial region identification unit 102 and the posture estimation unit 103 in the second embodiment, that is, the processing for estimating the posture of the human body in the input image using the information stored in the recording unit 107.

First, as in the first embodiment, partial regions are identified within the human body region 901 of the input image using the partial region information 811 of the recording unit 107, yielding the partial region identification results 902. In the second embodiment, each partial region stores, as a posture histogram, the frequency with which each posture class appeared when that partial region was identified. Summing the posture histograms of the partial region identification results in the human body region 901 produces the histogram 905 (h_input) for the entire human body region. Because the histogram 905 can be regarded as the probability distribution of the posture shown by the human body region, the posture of the human body in the input image can be estimated by selecting the bin of the most likely posture class. In FIG. 9, the bin of posture class p1 has the highest value, so the posture of the human body in the input image is determined to be p1. The posture estimation unit then looks up the joint position relationship of posture p1 in the posture information 821 recorded in the recording unit 107 and outputs it as the estimated posture result 903. As in the first embodiment, the output posture information is the coordinate information 904 of each joint position in the learning camera coordinate system of each posture.
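The estimation step amounts to summing the posture histograms of all identified partial regions and taking the largest bin. A minimal sketch, assuming each partial region's learned posture histogram is a NumPy array indexed by posture class; the function name `estimate_posture` and the normalization step are illustrative additions, not taken from the original:

```python
import numpy as np


def estimate_posture(detected_regions, posture_histograms):
    """Sum the posture histograms of all detections and return the best bin.

    detected_regions: region labels identified in the human body region.
    posture_histograms: {region label: 1-D array of posture-class frequencies}.
    Returns (index of the most likely posture class, normalized h_input).
    """
    first = next(iter(posture_histograms.values()))
    h_input = np.zeros_like(first, dtype=float)
    for region in detected_regions:
        h_input += posture_histograms[region]
    if h_input.sum() > 0:
        h_input /= h_input.sum()  # treat the sum as a probability distribution
    return int(np.argmax(h_input)), h_input
```

With histograms s1 = [2, 1, 1, 0] and s2 = [1, 0, 2, 0] and detections s1, s2, s1, the summed histogram is [5, 2, 4, 0], so index 0 (posture class p1) is chosen.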

Next, the processing of the partial region selection unit 104 in the second embodiment will be described with reference to the flowchart of FIG. 10. First, in step S1001, one partial region identification result of the human body region is selected and checked as to whether it is related to the posture estimated by the posture estimation unit 103. This check is based on the posture class information associated with the partial region and is performed, for example, as follows.

As described with reference to FIG. 8, the recording unit 107 stores, as partial region information, histograms h_s1 to h_sn indicating which posture classes each partial region is associated with when it is identified. In the check of step S1001, the posture class histogram of the selected partial region is referenced to determine whether the selected partial region is related to the estimated posture. For example, if the bin of the estimated posture in the posture histogram satisfies a predetermined appearance frequency, the estimated posture and the selected partial region are regarded as related. In the next step S1002, the process branches to step S1003 or step S1004 depending on whether the partial region identification result is related to the estimated posture. For example, if the partial region identification result indicates partial region class s1, the posture class histogram h_s1 of partial region s1 in FIG. 8 is referenced. If the estimated posture class is p1, p2, or p3, the estimated posture and the partial region identification result are regarded as related, and the process proceeds to step S1003. If, on the other hand, the estimated posture class is p4, the learning images have shown that partial region s1 does not appear in posture p4, so the estimated posture and the partial region identification result are regarded as unrelated, and the process proceeds to step S1004.

If the partial region identification result is determined not to be related to the estimated posture, it is discarded in step S1004. If, on the other hand, the estimated posture and the partial region identification result are determined to be related, the result is selected in step S1003 as a partial region to be used for calculating the representative position, and the process proceeds to step S1005. In step S1005, of the representative position vectors associated with the partial region, the one relating to the estimated posture is stored. For example, for partial region s1 in FIG. 8, if the posture estimated by the posture estimation unit is p1, the representative position vector v_s1p1 for posture p1 is stored.

In the next step S1006, it is checked whether all partial region identification results of the human body region have been examined. If not, steps S1001 to S1005 are performed on the next partial region identification result to decide whether to adopt or discard it.
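Steps S1001 to S1006 can be written as a single loop. The sketch below assumes, as illustrative choices not taken from the original, that a detection is a (region label, position) pair, that the posture histograms are dictionaries keyed by posture class, and that "related" means the estimated posture's bin is at least `min_freq`, a hypothetical threshold standing in for the "predetermined appearance frequency". The function name is also hypothetical.

```python
def select_partial_regions(detections, estimated_posture, histograms,
                           rep_vectors, min_freq=1):
    """Keep detections related to the estimated posture (steps S1001-S1006).

    detections: list of (region, position) pairs from the human body region.
    histograms: {region: {posture: frequency}} learned posture histograms.
    rep_vectors: {(region, posture): representative position vector}.
    Returns (position, vector) pairs for the kept detections.
    """
    kept = []
    for region, position in detections:  # S1001, repeated until S1006 is done
        freq = histograms.get(region, {}).get(estimated_posture, 0)
        if freq < min_freq:               # S1002 -> S1004: not related
            continue                      # discard this detection
        vector = rep_vectors.get((region, estimated_posture))
        if vector is not None:            # S1003 -> S1005: store the vector
            kept.append((position, vector))
    return kept
```

With the FIG. 8 example in mind, an s1 detection is kept when the estimated posture is p1 and discarded when it is p4, because the p4 bin of h_s1 is zero.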

In the above description of the partial region selection unit, the processing flow was described step by step in order to explain the gist of the present invention; in practice, the processing may be simplified. For example, the relationship between the estimated posture and a partial region identification result was checked above using the posture histogram. However, it is known in advance for which postures each partial region holds representative position vectors. The representative position vector related to the estimated posture may therefore be looked up in the partial region and stored directly, without using the posture histogram.
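In the simplified form, the presence of a (region, posture) entry in the representative vector table is itself the relatedness test. A minimal sketch, assuming the same hypothetical `{(region, posture): vector}` table and (region, position) detection pairs as in a histogram-based loop; the function name is illustrative:

```python
def select_by_vector_lookup(detections, estimated_posture, rep_vectors):
    """Keep a detection only if its region has a vector for the estimated posture.

    No posture histogram is consulted: a missing (region, posture) key means the
    region never appeared in that posture during learning.
    """
    return [(position, rep_vectors[(region, estimated_posture)])
            for region, position in detections
            if (region, estimated_posture) in rep_vectors]
```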

This completes the collection of representative position vectors from the partial region identification results. The subsequent processing is the same as that of the representative position calculation unit 105 and the position and orientation determination unit 106 in the first embodiment, so its description is omitted.

In the first embodiment, the partial regions used to obtain the representative position were selected on the basis of the single posture estimated by the posture estimation unit. The third embodiment describes selecting partial regions on the basis of multiple postures estimated by the posture estimation unit.

Selecting partial regions according to the posture can leave only a small number of partial regions, for example when there are few partial region identification results to begin with. If the number of partial regions is extremely small, fewer representative position vectors are available for obtaining the representative position, and the representative position may not be obtained correctly. In such a case, selecting partial regions on the basis of multiple postures increases the number of representative vectors used to obtain the representative position.

In the first embodiment, the posture estimation unit calculated the similarity between the partial region histogram of the input image and the partial region histograms in the recording unit, and selected the single posture with the highest similarity. In the third embodiment, several postures with high similarity are selected instead. One way to do this is to select a predetermined number of postures in descending order of similarity. Alternatively, all postures whose similarity is equal to or greater than a predetermined threshold may be selected. It is also possible to define in advance the minimum number of partial regions needed to obtain the representative position with a given accuracy, and to select just enough postures to satisfy that number.
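The three selection criteria can be sketched as follows. The sketch assumes, hypothetically, that the histogram similarities are already computed as one score per posture class, and, for the third criterion, that a caller-supplied function `count_regions(postures)` (hypothetical) returns how many partial regions would survive selection for a given set of postures. All function names are illustrative.

```python
import numpy as np


def top_k_postures(similarity, k):
    """Indices of the k postures with the highest histogram similarity."""
    return [int(i) for i in np.argsort(similarity)[::-1][:k]]


def postures_above(similarity, threshold):
    """Indices of postures whose similarity is at least `threshold`, best first."""
    order = np.argsort(similarity)[::-1]
    return [int(i) for i in order if similarity[i] >= threshold]


def postures_for_min_regions(similarity, count_regions, min_regions):
    """Add postures in order of similarity until enough partial regions remain."""
    chosen = []
    for i in np.argsort(similarity)[::-1]:
        chosen.append(int(i))
        if count_regions(chosen) >= min_regions:
            break
    return chosen
```

For similarities [0.2, 0.9, 0.5, 0.7], the top two postures are indices 1 and 3, and a threshold of 0.5 selects indices 1, 3, and 2. The third criterion stops as soon as the running region count reaches the required minimum.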

If the selected postures differ greatly from one another, the partial regions they contain and the representative position vectors associated with those partial regions also differ greatly, which lowers the accuracy of the representative position calculation. The selected postures should therefore preferably be similar to one another.

For this reason, the postures may be selected with the similarity among the selected postures also taken into account. Hereinafter, to avoid confusion with the histogram similarity above, the similarity between postures is called the inter-posture distance. The inter-posture distance can be taken into account, for example, as follows. First, as described above, several postures are selected based on histogram similarity. The posture finally output is the one whose partial region histogram in the recording unit has the highest similarity to the partial region histogram of the input image; this posture is hereinafter called the output posture. Next, for each of the other postures selected based on similarity, the inter-posture distance to the output posture is calculated. The inter-posture distance can be calculated from the joint position relationships stored in association with each posture, for example as follows. First, each joint position relationship is transformed into a coordinate system whose origin is its own waist joint position. The inter-posture distance is then calculated from the sum of the squared distances between corresponding joints (for example, right wrist to right wrist and left ankle to left ankle).
The inter-posture distance between the output posture and each of the other postures is calculated in this way. Postures whose inter-posture distance is equal to or less than a predetermined threshold are judged to be similar to the output posture and are used, together with the output posture, for partial region selection.
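The inter-posture distance and the filtering around the output posture can be sketched as below. The sketch assumes each posture's joint position relationship is stored as an (n_joints, dims) array in a fixed joint order, and that the waist joint's row index is known; `WAIST`, the threshold value, and the function names are illustrative assumptions, not taken from the original. The distance is the plain sum of squared joint differences after moving each waist joint to the origin.

```python
import numpy as np

WAIST = 0  # hypothetical row index of the waist joint in each joint array


def inter_posture_distance(joints_a, joints_b, waist=WAIST):
    """Sum of squared distances between corresponding joints after moving
    each posture's waist joint to the origin."""
    a = np.asarray(joints_a, dtype=float)
    b = np.asarray(joints_b, dtype=float)
    a = a - a[waist]
    b = b - b[waist]
    return float(np.sum((a - b) ** 2))


def postures_near_output(candidates, output, joint_table, threshold):
    """Return the output posture plus candidates within `threshold` of it.

    candidates: posture labels chosen by histogram similarity.
    output: label of the most similar posture (the output posture).
    joint_table: {posture label: (n_joints, dims) joint position array}.
    """
    ref = joint_table[output]
    selected = [output]
    for p in candidates:
        if p != output and inter_posture_distance(joint_table[p], ref) <= threshold:
            selected.append(p)
    return selected
```

Centering on the waist joint makes the distance insensitive to where each learning image placed the body, so only the relative joint layout is compared.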

The above processing determines the multiple postures on which partial region selection is based. The processing from partial region selection onward may be performed in the same manner as in the first embodiment, except that partial regions are selected with reference to multiple postures instead of one.

As a modification of the third embodiment, multiple output postures may be provided. In that case, the partial region selection unit and the representative position calculation unit calculate a representative position for each output posture, and the position and orientation determination unit outputs multiple positions and orientations. This configuration is suitable when the information processing apparatus of the present invention does not need to settle on a single position and orientation and may instead output several candidates. For example, when fitting processing is performed based on the output of the information processing apparatus, the fitting may be run with each of the positions and orientations as an initial value, and the most reliable result may be taken as the final result.

<Effects of the Embodiments>
Examples of embodiments of the present invention have been described above as the first to third embodiments.

In the first embodiment, the partial region identification results are selected based on the estimated posture, and the representative position is obtained using the information indicating the representative position (the representative position vectors) associated with the selected partial regions. This allows the representative position to be obtained accurately and, as a result, the position and orientation of the object in the final output to be obtained accurately.

In the second embodiment, the representative position can be calculated even more accurately than in the first embodiment by using representative position vectors that are specific to each posture.

In the third embodiment, the posture estimation unit outputs multiple estimated postures and partial regions are selected based on all of them, which increases the amount of information available for obtaining the representative position.

<Definition>
Although the embodiments have described the case in which the object is a human body, the object of the present invention is not limited to a human body. Also, the combination of the joint angle state of the human body and its orientation relative to the camera has been referred to as the posture; when the object is a rigid body without joints, only its orientation relative to the camera needs to be treated as the posture.

The partial region identification unit of the present invention may take any form as long as it includes partial region classifiers that identify parts of the object in various postures and associates postures with representative positions. The way this information is held is also not particularly limited. As one example, the first embodiment described a configuration in which each partial region classifier is associated with a representative position vector indicating the direction of the representative position, and in which the partial region classifiers are associated with postures by holding a partial region histogram for each posture. Based on these relationships between partial regions and postures, the partial region selection unit selects the partial region identification results within the object region according to the posture estimated by the posture estimation unit.

The representative position of the object may be any position that is uniquely determined regardless of the posture of the object. Because the representative position is obtained from multiple representative position vectors, a position near the center of the object is preferable; in the embodiments, the waist joint position of the human body was used.

100 information processing apparatus, 102 partial region identification unit, 103 posture estimation unit,
104 partial region selection unit, 105 representative position calculation unit, 106 position and orientation determination unit,
107 recording unit, 301 to 30m, 801 to 80m learning images,
312, 812 partial region classifiers, 322, 822 postures (relative positional relationship of joints),
502, 503 partial region identification results,
504 representative position vector (information indicating the representative position),
510, 710 representative position (waist joint position)

Claims

An information processing apparatus comprising:
a partial region identification unit that identifies different partial regions of an object;
a posture estimation unit that integrates the identification results of the partial regions and estimates the posture of the object;
a partial region selection unit that selects identification results of the partial regions based on the estimated posture;
a representative position calculation unit that calculates a representative position of the object from information indicating the representative position associated with the selected partial regions; and
a position and orientation determination unit that calculates the position and orientation of the object from the representative position and the estimated posture.

The information processing apparatus according to claim 1, wherein the posture estimation unit estimates the posture based on the appearance frequency of the partial region identification results within the region of the object.

The information processing apparatus according to claim 1, wherein the partial region selection unit determines a partial region selection criterion based on the posture estimated by the posture estimation unit and selects the identification results of the partial regions based on the partial region selection criterion.

The information processing apparatus according to claim 1, wherein each partial region of the partial region identification unit is associated with the probability of each posture of the object when that partial region is identified and with information indicating the representative position for each posture of the object, and wherein the identification results of the partial regions and the information indicating the representative position are selected based on the posture estimated by the posture estimation unit.

The information processing apparatus according to claim 1, wherein the posture estimation unit outputs multiple highly likely postures as the posture estimation result, and the partial region selection unit selects the partial region identification results based on the multiple postures.

The information processing apparatus according to claim 5, wherein the multiple postures output by the posture estimation unit are postures determined to be similar to one another based on a predetermined criterion indicating the similarity between postures.

The information processing apparatus according to claim 1, wherein the object is a human body, the posture is described by the relative positional relationship of the joint positions of the human body, and the position and orientation determination unit matches the joint positions to the coordinates of the representative position by translating and rotating them while maintaining their relative positional relationship.

An information processing method comprising:
a partial region identification step of identifying different partial regions of an object;
a posture estimation step of integrating the identification results of the partial regions and estimating the posture of the object;
a partial region selection step of selecting identification results of the partial regions based on the estimated posture;
a representative position calculation step of calculating a representative position of the object from information indicating the representative position associated with the selected partial regions; and
a position and orientation determination step of calculating the position and orientation of the object from the representative position and the estimated posture.