JP5767078B2

JP5767078B2 - Posture estimation apparatus, posture estimation method, and posture estimation program

Info

Publication number: JP5767078B2
Application number: JP2011233986A
Authority: JP
Inventors: 鮎美松本; 小軍ウ; 春美川村; 明小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-10-25
Filing date: 2011-10-25
Publication date: 2015-08-19
Anticipated expiration: 2031-10-25
Also published as: JP2013092876A

Description

The present invention relates to a posture estimation apparatus, a posture estimation method, and a posture estimation program that estimate the posture of a person from image data captured by a camera.

In recent years, research on estimating the posture and analyzing the motion of the human body from image data, without using a motion capture system or range images, has been widely conducted (see, for example, Non-Patent Document 1). This research is expected to find applications in a wide range of fields, including video surveillance, human interaction, robot motion control, CG animation production, and medicine. If the three-dimensional posture can be estimated, the range of applications widens further.

However, when the posture of a human body in a three-dimensional virtual space is estimated from image data, it is difficult to estimate the three-dimensional rotation angles of the body's joints at an arbitrary time. In particular, when a three-dimensional posture is estimated from images captured by a monocular camera, the image represents the body in only two dimensions, so the information needed for a three-dimensional representation is missing. As a result, such methods cannot resolve posture ambiguity and cannot handle occlusion by the limbs or by obstacles.

To address these problems, techniques that use prior knowledge about human body motion during estimation have attracted attention. This prior knowledge supplements the information that two-dimensional image data lacks for a three-dimensional representation. The two-dimensional input consists of joint position coordinates on the two-dimensional image, from which the three-dimensional posture is estimated; these coordinates are obtained by tracking the image features of each target location across frames. As prior knowledge, one known approach (for example, Non-Patent Document 2) uses a motion model obtained by extracting low-dimensional features from highly accurate three-dimensional motion data measured with an optical motion capture system. With this motion model as prior knowledge, a three-dimensional posture can be estimated from joint positions on a two-dimensional image.

Non-Patent Document 1: N. Shimada, D. Arita, and T. Tamaki, "Model fitting of articulated objects" (関節物体のモデルフィッティング), IPSJ SIG Technical Report CVIM, Vol. 154, pp. 375-392, 2006.
Non-Patent Document 2: R. Urtasun, D. J. Fleet, and P. Fua, "3D people tracking with Gaussian process dynamical models," CVPR, 2006.

However, when a motion model is used to complement a three-dimensional posture from two-dimensional joint positions, the uncertainty of the three-dimensional posture grows with the uncertainty of the two-dimensional joint positions. Because two-dimensional joint positions are generally estimated from color and edge information available in the image, stable estimation is difficult. This problem is not limited to human bodies and joints; it arises generally for any object whose posture changes (for example, a human body or a robot) and for the tracking targets that object has (for example, joints or movable parts).

The present invention has been made in view of these circumstances, and its object is to provide a posture estimation apparatus, a posture estimation method, and a posture estimation program that stably estimate two-dimensional tracking targets and thereby improve the accuracy of three-dimensional posture estimation.

The present invention comprises: image input means for inputting an image in which a posture estimation target is captured; template storage means storing, as template data, feature values of the tracking targets of the posture estimation target; three-dimensional motion model storage means storing three-dimensional motion model data of the tracking targets; position candidate initial setting means for setting a plurality of initial position candidates for each tracking target with reference to the three-dimensional motion model data; position estimation means for calculating, from the image input by the image input means, the feature value at each position candidate of the tracking target, estimating the position of the tracking target on the basis of similarity weights obtained by comparing those feature values with the template data stored in the template storage means, and outputting estimated information on the position of the tracking target; position candidate setting means for resetting the position candidates of the tracking target on the basis of the similarity weights and the three-dimensional motion model data; tracking processing means for tracking the position of the tracking target by repeating, a plurality of times, the position estimation by the position estimation means and the setting of position candidates by the position candidate setting means; and posture estimation means for estimating the three-dimensional posture of the posture estimation target with reference to the estimated information on the position of the tracking target and the three-dimensional motion model data.

The present invention is characterized in that the posture estimation target is a human body and the tracking targets are joints.

The present invention also provides a posture estimation method in a posture estimation apparatus comprising image input means for inputting an image in which a posture estimation target is captured, template storage means storing, as template data, feature values of the tracking targets of the posture estimation target, and three-dimensional motion model storage means storing three-dimensional motion model data of the tracking targets. The method comprises: a position candidate initial setting step of setting a plurality of initial position candidates for each tracking target with reference to the three-dimensional motion model data; a position estimation step of calculating, from the image input by the image input means, the feature value at each position candidate of the tracking target, estimating the position of the tracking target on the basis of similarity weights obtained by comparing those feature values with the template data stored in the template storage means, and outputting estimated information on the position of the tracking target; a position candidate setting step of resetting the position candidates of the tracking target on the basis of the similarity weights and the three-dimensional motion model data; a tracking processing step of tracking the position of the tracking target by repeating, a plurality of times, the position estimation of the position estimation step and the setting of position candidates of the position candidate setting step; and a posture estimation step of estimating the three-dimensional posture of the posture estimation target with reference to the estimated information on the position of the tracking target and the three-dimensional motion model data.

The present invention is further characterized by causing a computer to execute the above posture estimation method.

According to the present invention, the tracking targets (joint positions) can be estimated stably, and the three-dimensional posture of the posture estimation target can be estimated at the same time as the positions of the tracking targets, which improves the accuracy of three-dimensional posture estimation.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the two-dimensional joint tracking unit 103 shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of a modification of the embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the two-dimensional joint tracking unit 109 shown in FIG. 3.
FIG. 5 is an explanatory diagram showing the general concept of an object tracking method.

A posture estimation apparatus according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the embodiment. In this figure, reference numeral 101 denotes an analysis video storage unit that stores the video data whose postures are to be analyzed. The video data consists, for example, of video captured by one or more uncalibrated cameras. Reference numeral 102 denotes an image input unit that inputs the video data stored in the analysis video storage unit 101 and converts it into a format suitable for the estimation processing, for example by hue conversion or frame interpolation. Reference numeral 103 denotes a two-dimensional joint tracking unit that extracts the coordinates of the joint positions of the human body from the video data input by the image input unit 102, using image features and the like, and performs tracking over all frames to be analyzed. Reference numeral 104 denotes a three-dimensional posture estimation unit that estimates a three-dimensional posture from the two-dimensional joint position coordinates of the human body tracked by the two-dimensional joint tracking unit 103.

Reference numeral 105 denotes a template storage unit that stores, as templates, information such as the features and positions of the regions to be tracked. The template format depends on the method used by the two-dimensional joint tracking unit 103, which performs tracking using the image information of each frame of the video data input by the image input unit 102 and the templates stored in the template storage unit 105. Reference numeral 106 denotes a posture information storage unit that holds the three-dimensional posture information estimated by the three-dimensional posture estimation unit 104. Reference numeral 107 denotes a prior three-dimensional motion model storage unit that stores the prior three-dimensional motion model information used by the three-dimensional posture estimation unit 104 to estimate a three-dimensional posture. A motion model that represents three-dimensional motion by low-dimensional features, such as that of Non-Patent Document 2, can be used. Reference numeral 108 denotes an estimated information storage unit that stores the estimated state information of the tracking targets estimated by the two-dimensional joint tracking unit 103. The three-dimensional posture estimation unit 104 estimates the three-dimensional posture, using a known method, with reference to the two-dimensional joint position information in the estimated information storage unit 108 and the motion model information in the prior three-dimensional motion model storage unit 107.

Here, the process of tracking a two-dimensional joint position with a particle filter is described with reference to FIG. 5. As shown in FIG. 5, a particle filter generates a large number (N) of candidate positions of the tracking target as hypotheses. First, the hypothesis positions are initialized (STEP 0). Then, at each time step, the feature value at each hypothesis position is compared with the template feature value of the tracking target, and the similarity is calculated as a weight (STEP 1). Next, the most plausible of the N hypotheses is selected as the estimated position (state) at that time (STEP 2). In preparation for the next time step, hypotheses with large weights are duplicated and those with small weights are eliminated (STEP 3). The hypotheses are then moved according to the state transition model of the tracking target (STEP 4), and the process returns to STEP 1 and repeats. The state transition model should ideally be defined from a motion model of the object; when such a model is difficult to formulate, constant-velocity linear motion or a random walk is assumed.
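The STEP 0 to STEP 4 loop can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the tracked point is assumed to be observed directly as a noisy 2-D position, so the template comparison of STEP 1 is replaced by a Gaussian likelihood of hypothetical width `obs_sd`; STEP 2 takes the weighted mean; and STEP 4 uses a random walk of hypothetical step size `walk_sd`. The function name `track` and all parameter defaults are illustrative.

```python
import math
import random


def track(observations, init_xy, n=300, init_sd=1.0, walk_sd=0.5, obs_sd=0.5, seed=0):
    """Bootstrap particle filter for one 2-D point (STEP 0 to STEP 4)."""
    rng = random.Random(seed)
    x0, y0 = init_xy
    # STEP 0: scatter N hypotheses around the initial position.
    parts = [(rng.gauss(x0, init_sd), rng.gauss(y0, init_sd)) for _ in range(n)]
    estimates = []
    for zx, zy in observations:
        # STEP 1: weight each hypothesis by its similarity to the observation.
        w = [math.exp(-((px - zx) ** 2 + (py - zy) ** 2) / (2 * obs_sd ** 2))
             for px, py in parts]
        total = sum(w)
        w = [wi / total for wi in w] if total > 0 else [1.0 / n] * n
        # STEP 2: take the weighted mean as the state estimate at this time.
        ex = sum(wi * px for wi, (px, _) in zip(w, parts))
        ey = sum(wi * py for wi, (_, py) in zip(w, parts))
        estimates.append((ex, ey))
        # STEP 3: duplicate heavy hypotheses and drop light ones (resampling).
        parts = rng.choices(parts, weights=w, k=n)
        # STEP 4: move hypotheses with a random-walk state transition model.
        parts = [(px + rng.gauss(0.0, walk_sd), py + rng.gauss(0.0, walk_sd))
                 for px, py in parts]
    return estimates
```

For example, `track([(3.0, -2.0)] * 20, init_xy=(0.0, 0.0))` starts the hypotheses several pixels away from a stationary target; the weighted-mean estimate should move toward (3.0, -2.0) within a few frames as resampling concentrates the hypotheses there.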

Next, the operation of the two-dimensional joint tracking unit 103 shown in FIG. 1 is described with reference to FIG. 2, a flowchart of that operation. First, the two-dimensional joint tracking unit 103 sets time t to 0 (step S1), extracts the frame at time t = 0 from the video data to be analyzed stored in the analysis video storage unit 101, and designates the initial position [x_0, y_0] of the joint to be tracked (step S2). The designation can be made manually or by image processing such as skin color detection or edge detection.
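As one possible automatic alternative to manual designation in step S2, the sketch below finds skin-colored pixels with a commonly used fixed-threshold RGB rule and returns their centroid as [x_0, y_0]. The patent names skin color detection but not a specific rule, so the thresholds, the helper names `is_skin` and `initial_position`, and the image layout (rows of (r, g, b) tuples with values 0 to 255) are all assumptions.

```python
def is_skin(r, g, b):
    """Fixed-threshold RGB skin test (an assumed rule, not from the patent)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def initial_position(image):
    """Centroid (x_0, y_0) of skin-colored pixels, or None if there are none."""
    pts = [(x, y) for y, row in enumerate(image)
           for x, (r, g, b) in enumerate(row) if is_skin(r, g, b)]
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
```

In practice the search would be restricted to a region of interest (for example around a hand), since the face and other limbs are also skin-colored.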

Next, the two-dimensional joint tracking unit 103 calculates the feature value I_temp at the joint position designated in step S2 and stores it as a template in the template storage unit 105 (step S3). Any feature, such as a color histogram or a HOG feature, can be used. The two-dimensional joint tracking unit 103 then defines the position of the tracking target as the state Φ(t) = [x(t), y(t)], generates N hypotheses, and initializes them so that they are distributed around the initial position [x_0, y_0] designated in step S2 (step S5), for example according to a normal distribution with mean [x_0, y_0].
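Steps S3 and S5 might look as follows, assuming a joint RGB color histogram over a small square patch as the feature (one of the options the text names) and an isotropic normal distribution for initialization. The patch half-width `half`, the bin count `bins`, the spread `sd`, and the function names are hypothetical choices for illustration.

```python
import random


def color_histogram(image, cx, cy, half=2, bins=4):
    """Normalized joint RGB histogram of the square patch centred on (cx, cy).

    image: list of rows, each a list of (r, g, b) tuples with values 0-255.
    Pixels outside the image are ignored.
    """
    h, w = len(image), len(image[0])
    hist = [0.0] * (bins ** 3)
    count = 0
    for y in range(max(0, int(cy) - half), min(h, int(cy) + half + 1)):
        for x in range(max(0, int(cx) - half), min(w, int(cx) + half + 1)):
            r, g, b = image[y][x]
            # quantize each channel into `bins` levels and combine into one index
            idx = ((r * bins // 256) * bins + g * bins // 256) * bins + b * bins // 256
            hist[idx] += 1.0
            count += 1
    return [v / count for v in hist] if count else hist


def init_hypotheses(x0, y0, n, sd, rng):
    """Step S5: N hypotheses drawn from a normal distribution with mean [x0, y0]."""
    return [(rng.gauss(x0, sd), rng.gauss(y0, sd)) for _ in range(n)]
```

The template would be `I_temp = color_histogram(frame0, x0, y0)`, and the same function, evaluated at each hypothesis position, gives the features I^i(t) of step S10.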

Next, the two-dimensional joint tracking unit 103 determines whether time t is less than or equal to the number of analysis frames T (step S6); if t exceeds T, the process ends. Otherwise, it extracts the frame at time t from the video data to be analyzed stored in the analysis video storage unit 101 (step S7). For the extracted frame, it then calculates the feature value I^i(t) at each hypothesis Φ^i(t), i = 1, 2, ..., N (step S10).

Next, the two-dimensional joint tracking unit 103 compares each I^i(t) with the template feature value I_temp and calculates the similarity weight w^i(t) (step S11). The weight can be derived, for example, from the distance between I^i(t) and I_temp. The unit then estimates the state Φ^*(t) at time t from the weights w^i(t) (step S12), for example by selecting the hypothesis with the largest weight or by taking the weighted average, and stores the estimated state information in the estimated information storage unit 108.
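The text leaves open how a distance becomes a weight. The sketch below assumes the Bhattacharyya distance between normalized histograms and a Gaussian mapping exp(−d²/(2σ²)), so that closer features receive larger weights; `sigma` is a hypothetical tuning parameter. `estimate_state` shows both estimators mentioned for step S12: the highest-weight hypothesis and the weighted mean.

```python
import math


def bhattacharyya_distance(p, q):
    """Distance between two normalized histograms (0 = identical, 1 = disjoint)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))


def similarity_weights(features, template, sigma=0.2):
    """Turn feature-to-template distances into normalized weights w^i(t)."""
    raw = [math.exp(-bhattacharyya_distance(f, template) ** 2 / (2 * sigma ** 2))
           for f in features]
    total = sum(raw)
    return [r / total for r in raw]


def estimate_state(hypotheses, weights, mode="mean"):
    """Phi^*(t): the highest-weight hypothesis, or the weighted mean of all of them."""
    if mode == "max":
        best = max(range(len(weights)), key=lambda i: weights[i])
        return hypotheses[best]
    dims = len(hypotheses[0])
    return tuple(sum(w * h[d] for w, h in zip(weights, hypotheses)) for d in range(dims))
```

The weighted mean is smoother but can fall between two separated clusters of hypotheses; the maximum-weight choice avoids that at the cost of more frame-to-frame jitter.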

Next, the two-dimensional joint tracking unit 103 resamples according to the magnitude of the weights w^i(t): hypotheses with large weights are duplicated and those with small weights are deleted, so that the hypotheses gather around the estimated state (step S13). In preparation for the next time step, it then samples the hypotheses on the basis of the state transition model of the tracking target (step S14), for example a constant-velocity linear motion model or a random-walk model. Finally, it increments time t by 1 (step S15) and returns to step S6 to repeat the process.
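Steps S13 and S14 could be implemented as below. Systematic resampling is an assumption here (the text only requires that heavy hypotheses be duplicated and light ones removed), and `propagate` covers both transition models mentioned: a pure random walk when `velocity` is zero, and constant-velocity drift plus noise otherwise. The velocity would have to be estimated separately, for example from the last two state estimates. Function and parameter names are illustrative.

```python
import random


def systematic_resample(hypotheses, weights, rng):
    """Step S13: resample N hypotheses in proportion to their normalized weights."""
    n = len(hypotheses)
    start = rng.random() / n
    out, i, cumulative = [], 0, weights[0]
    for k in range(n):
        u = start + k / n
        # advance to the hypothesis whose cumulative weight interval contains u
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(hypotheses[i])
    return out


def propagate(hypotheses, sd, rng, velocity=(0.0, 0.0)):
    """Step S14: random walk, or constant-velocity drift if velocity is nonzero."""
    vx, vy = velocity
    return [(x + vx + rng.gauss(0.0, sd), y + vy + rng.gauss(0.0, sd))
            for x, y in hypotheses]
```

Systematic resampling uses a single random number per step, which tends to add less sampling noise than drawing N independent indices.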

Next, a modification of the posture estimation apparatus shown in FIG. 1 is described with reference to FIG. 3, a block diagram showing its configuration. In FIG. 3, parts identical to those of the apparatus shown in FIG. 1 are given the same reference numerals, and their description is omitted. The apparatus shown in FIG. 3 differs from that shown in FIG. 1 in that a two-dimensional joint tracking unit 109 is provided instead of the two-dimensional joint tracking unit 103. The two-dimensional joint tracking unit 109 performs tracking with reference to the video data input by the image input unit 102, the template information stored in the template storage unit 105, and the motion model information stored in the prior three-dimensional motion model storage unit 107.

Next, the operation of the two-dimensional joint tracking unit 109 shown in FIG. 3 is described with reference to FIG. 4, a flowchart of that operation. In FIG. 4, steps identical to those in FIG. 2 are given the same reference numerals, and their description is omitted. The operation shown in FIG. 4 differs from that shown in FIG. 2 in that steps S4, S51, S8, S9, S121, and S141 are provided.

First, the two-dimensional joint tracking unit 109 sets time t to 0 (step S1). It extracts the frame at time t = 0 from the video data to be analyzed stored in the analysis video storage unit 101 and designates the initial position [x_0, y_0] of the joint to be tracked (step S2).

Next, the two-dimensional joint tracking unit 109 calculates the feature value I_temp at the joint position designated in step S2 and stores it as a template in the template storage unit 105 (step S3). It then specifies the initial viewpoint V_0 of the video to be analyzed. The initial viewpoint V_0 is used when projecting the three-dimensional posture onto the two-dimensional image.

Next, the two-dimensional joint tracking unit 109 defines the state as Φ(t) = [Y(t), V(t)], consisting of the three-dimensional posture Y(t) of the tracking target and the viewpoint V(t) used for projection onto the two-dimensional image. It generates N hypotheses and initializes them so that V(t) is distributed around the previously specified initial viewpoint V_0 and Y(t) is distributed around the motion information stored in the prior three-dimensional motion model storage unit 107 (step S51).
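Step S51 can be sketched as follows for a state Φ = [Y, V]. The patent says only that Y is distributed around the stored motion information; this sketch assumes that means perturbing a pose drawn at random from the stored motion data, with hypothetical noise levels `pose_sd` and `view_sd`. Representing poses as lists of (x, y, z) joint coordinates and viewpoints as (θ, φ) pairs is also an assumption.

```python
import random


def init_pose_view_hypotheses(motion_db, v0, n, pose_sd, view_sd, rng):
    """Step S51: initialize N hypotheses Phi = [Y, V].

    motion_db: list of stored 3-D poses, each a list of (x, y, z) joints.
    v0: initial viewpoint (theta, phi) specified in advance.
    """
    hypotheses = []
    for _ in range(n):
        # Y: perturb a pose drawn from the prior motion data
        base = rng.choice(motion_db)
        pose = [tuple(c + rng.gauss(0.0, pose_sd) for c in joint) for joint in base]
        # V: scatter the viewpoint around V_0
        view = (rng.gauss(v0[0], view_sd), rng.gauss(v0[1], view_sd))
        hypotheses.append((pose, view))
    return hypotheses
```

Usage: `init_pose_view_hypotheses(db, (0.0, 0.1), 300, 0.02, 0.05, random.Random(0))` returns 300 (pose, viewpoint) pairs.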

Next, the two-dimensional joint tracking unit 109 determines whether time t is less than or equal to the number of analysis frames T (step S6); if t exceeds T, the process ends. Otherwise, it extracts the frame at time t from the video data to be analyzed stored in the analysis video storage unit 101 (step S7) and calculates the position of the center of gravity of the human body from that frame (step S8). The center of gravity can be represented, for example, by the waist position and computed, for example, as the centroid of the silhouette area of the human body region. The two-dimensional joint tracking unit 109 then calculates, for the frame used in step S8, the two-dimensional joint position [x^i(t), y^i(t)] of each hypothesis Φ^i(t), i = 1, 2, ..., N, from its three-dimensional posture Y^i(t) and projection viewpoint V^i(t) (step S9).
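Steps S8 and S9 might be sketched as below. The sketch assumes a binary silhouette mask for the body region, an orthographic camera in which θ is an azimuth about the vertical axis and φ an elevation, joint 0 of each pose being the waist, and a hypothetical pixel `scale`; the projected skeleton is anchored so that the waist lands on the silhouette centroid. None of these conventions is specified in the text.

```python
import math


def silhouette_centroid(mask):
    """Step S8: centroid (x, y) of the nonzero pixels of a binary silhouette mask."""
    xs = ys = count = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                xs += x
                ys += y
                count += 1
    if count == 0:
        raise ValueError("empty silhouette")
    return (xs / count, ys / count)


def project_pose(pose, theta, phi, root_xy, scale=1.0):
    """Step S9: orthographic projection of 3-D joints seen from (theta, phi).

    Joint 0 is treated as the waist and is placed at root_xy in the image.
    """
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    rx, ry, rz = pose[0]
    points = []
    for x, y, z in pose:
        x, y, z = x - rx, y - ry, z - rz
        # azimuth: rotate about the vertical (y) axis by theta
        x1 = ct * x + st * z
        z1 = -st * x + ct * z
        # elevation: rotate about the horizontal (x) axis by phi
        y2 = cp * y - sp * z1
        # image rows grow downward, so subtract the vertical component
        points.append((root_xy[0] + scale * x1, root_xy[1] - scale * y2))
    return points
```

With `root_xy = silhouette_centroid(mask)`, `project_pose(Y_i, theta_i, phi_i, root_xy)` gives the candidate joint positions [x^i(t), y^i(t)] at which the features of step S10 would then be evaluated.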

Next, the two-dimensional joint tracking unit 109 calculates, for the extracted frame, the feature value I^i(t) for each hypothesis Φ^i(t), i = 1, 2, ..., N (step S10), and compares each I^i(t) with the template feature value I_temp to calculate the similarity weight w^i(t) (step S11). From the weights w^i(t), it then estimates the state Φ^*(t) = [Y^*(t), V^*(t)] and the two-dimensional joint position [x^*(t), y^*(t)] at time t (step S121), and stores the estimated state information and two-dimensional joint position information in the estimated information storage unit 108.

Next, the two-dimensional joint tracking unit 109 resamples according to the magnitude of the weights w^i(t): hypotheses with large weights are duplicated and those with small weights are deleted, so that the hypotheses gather around the estimated state (step S13). In preparation for the next time step, it then samples the hypotheses on the basis of the state transition model of the tracking target (step S141). Here, the state transition model is defined from the motion information held in the prior three-dimensional motion model storage unit 107. For example, when the motion model of Non-Patent Document 2 is used, the state transition of the three-dimensional posture Y^i is given by Equations (1) and (2), with the quantities they use defined by Equations (3) and (4).

Here, X(n) is a low-dimensional representation of the three-dimensional posture Y(n). k_X(X^*) and k_Y(X^*) are vectors whose i-th elements are k_X(X^*, X_i) and k_Y(X^*, X_i), respectively, and are defined by suitable functions such as Equations (5) and (6), in which α and β are parameters obtained by learning.

The state transition of the projection viewpoint V^i is defined, for example, from the changes in translation and rotation of the waist position in the motion information learned in advance. More specifically, when the projection viewpoint is expressed as V = [θ, φ], the viewpoint changes dθ(n) and dφ(n) are obtained as follows: the motion data closest to the three-dimensional motion data Y at the same time is selected from the prior three-dimensional motion model storage unit 107, and from its translations t(n), t(n−1) and rotations r(n), r(n−1) the viewpoint change components dθ_t(n) = t(n) − t(n−1) and dθ_r(n) = r(n) − r(n−1) are calculated. The overall viewpoint change is defined as dθ(n) = dθ_t(n) − dθ_r(n). dφ(n) is obtained in the same way.
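The viewpoint transition can be sketched as below for the θ component (φ is analogous). It assumes the stored motion data is a sequence of poses with a per-frame scalar waist translation term `db_trans` and rotation term `db_rot` already expressed as angles, and it uses the squared joint distance to find the closest stored frame; these representations and the helper names are hypothetical. The sign convention dθ = dθ_t − dθ_r follows the text.

```python
def pose_distance(a, b):
    """Sum of squared joint-coordinate differences between two poses."""
    return sum((p - q) ** 2 for ja, jb in zip(a, b) for p, q in zip(ja, jb))


def viewpoint_change(pose, db_poses, db_trans, db_rot):
    """d_theta(n) = d_theta_t(n) - d_theta_r(n) from the closest stored frame n.

    The search starts at index 1 so that frame n - 1 exists.
    """
    n = min(range(1, len(db_poses)), key=lambda k: pose_distance(pose, db_poses[k]))
    d_t = db_trans[n] - db_trans[n - 1]  # change in waist translation
    d_r = db_rot[n] - db_rot[n - 1]      # change in waist rotation
    return d_t - d_r
```

Each hypothesis's θ would then be advanced by this change (plus some sampling noise) in step S141.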

Next, the two-dimensional joint tracking unit 109 increments time t by 1 (step S15) and returns to step S6 to repeat the process.

As described above, when two-dimensional joint positions are estimated, tracking uses not only the observed image information but also three-dimensional motion information learned in advance. This makes it possible to estimate the three-dimensional motion, which conventionally was estimated afterwards from the two-dimensional joint tracking result, at the same time as the two-dimensional joint positions are tracked, thereby improving the accuracy of three-dimensional posture estimation.

The posture estimation processing may also be performed by recording a program that implements the functions of the processing units in FIGS. 1 and 3 on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. Here, the "computer system" includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems. The "computer-readable recording medium" further includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system acting as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system, either via a transmission medium or by transmission waves in a transmission medium. Here, the “transmission medium” that carries the program refers to a medium capable of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line. The program may implement only some of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded on the computer system.

Although embodiments of the present invention have been described above with reference to the drawings, these embodiments are merely illustrative, and it is clear that the present invention is not limited to them. Components may therefore be added, omitted, replaced, or otherwise modified without departing from the spirit and scope of the present invention.

The present invention is applicable to any use in which the posture of a person must be estimated from image data captured by a camera.

DESCRIPTION OF SYMBOLS
101: Analysis video storage unit
102: Image input unit
103: Two-dimensional joint tracking unit
104: Three-dimensional posture estimation unit
105: Template storage unit
106: Posture information storage unit
107: Prior three-dimensional motion model storage unit
108: Estimation information storage unit

Claims

A posture estimation apparatus comprising:
image input means for inputting an image in which a posture estimation target is photographed;
template storage means storing, as template data, a feature quantity of a tracking target belonging to the posture estimation target;
three-dimensional motion model storage means storing three-dimensional motion model data of the tracking target;
position candidate initial setting means for setting a plurality of initial position candidates for the tracking target with reference to the three-dimensional motion model data;
position estimating means for calculating, from the image input by the image input means, a feature quantity at each position candidate of the tracking target, comparing the feature quantity with the template data stored in the template storage means, estimating the position of the tracking target based on the resulting similarity weights, and outputting estimation information of the position of the tracking target;
position candidate setting means for resetting the position candidates of the tracking target based on the similarity weights and the three-dimensional motion model data, and moving the reset position candidates according to a state transition model defined based on the three-dimensional motion model data;
tracking processing means for tracking the position of the tracking target by repeating, a plurality of times, the estimation of the position of the tracking target by the position estimating means and the setting of the position candidates by the position candidate setting means; and
posture estimation means for estimating a three-dimensional posture of the posture estimation target with reference to the estimation information of the position of the tracking target and the three-dimensional motion model data.
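The loop formed by the position estimating, position candidate setting, and tracking processing means (weight a set of candidates, estimate, reset candidates by weight, move them with a state transition model, repeat) has the structure of a particle filter. The sketch below follows that reading for a single joint in 2D; it is not the patent's implementation. The scalar feature, the Gaussian similarity weight, the weighted-mean position estimate, and the constant-velocity transition (a stand-in for a transition model derived from 3D motion model data) are all assumptions, and `feature_at` is a hypothetical caller-supplied function.

```python
import math
import random


def similarity(feature, template, sigma):
    """Gaussian similarity weight between a candidate's feature and the template (assumed form)."""
    return math.exp(-((feature - template) ** 2) / (2.0 * sigma ** 2))


def track(frames, feature_at, template, init_candidates, velocity,
          sigma=5.0, noise=1.5, seed=0):
    """Particle-filter-style tracking of one joint in 2D.

    frames:          sequence of images (any type accepted by feature_at)
    feature_at:      hypothetical function (image, (x, y)) -> scalar feature
    template:        template feature value for the joint
    init_candidates: initial (x, y) candidates, assumed to come from the 3D motion model
    velocity:        per-frame (dx, dy); a stand-in for the state transition model
    Returns one estimated (x, y) position per frame.
    """
    rng = random.Random(seed)
    cands = list(init_candidates)
    estimates = []
    for image in frames:
        # Position estimation: weight each candidate by its similarity to the template.
        weights = [similarity(feature_at(image, c), template, sigma) for c in cands]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed: fall back to uniform weights
            weights = [1.0] * len(cands)
            total = float(len(cands))
        ex = sum(w * x for w, (x, _) in zip(weights, cands)) / total
        ey = sum(w * y for w, (_, y) in zip(weights, cands)) / total
        estimates.append((ex, ey))
        # Position candidate setting: resample candidates in proportion to their weights ...
        cands = rng.choices(cands, weights=weights, k=len(cands))
        # ... then move each one with the assumed transition (drift plus Gaussian noise).
        cands = [(x + velocity[0] + rng.gauss(0.0, noise),
                  y + velocity[1] + rng.gauss(0.0, noise)) for x, y in cands]
    return estimates
```

For a quick check, one can pass the true joint location as each "image", use the candidate's distance to it as the feature, and set `template=0.0`; the estimates should then follow the moving joint. A real system would instead compute image features (for example color or gradient descriptors) at each candidate.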

The posture estimation apparatus according to claim 1, wherein the posture estimation target is a human body and the tracking target is a joint.

A posture estimation method in a posture estimation apparatus comprising image input means for inputting an image in which a posture estimation target is photographed, template storage means storing, as template data, a feature quantity of a tracking target belonging to the posture estimation target, and three-dimensional motion model storage means storing three-dimensional motion model data of the tracking target, the method comprising:
a position candidate initial setting step of setting a plurality of initial position candidates for the tracking target with reference to the three-dimensional motion model data;
a position estimating step of calculating, from the image input by the image input means, a feature quantity at each position candidate of the tracking target, comparing the feature quantity with the template data stored in the template storage means, estimating the position of the tracking target based on the resulting similarity weights, and outputting estimation information of the position of the tracking target;
a position candidate setting step of resetting the position candidates of the tracking target based on the similarity weights and the three-dimensional motion model data, and moving the reset position candidates according to a state transition model defined based on the three-dimensional motion model data;
a tracking processing step of tracking the position of the tracking target by repeating, a plurality of times, the estimation of the position of the tracking target in the position estimating step and the setting of the position candidates in the position candidate setting step; and
a posture estimation step of estimating a three-dimensional posture of the posture estimation target with reference to the estimation information of the position of the tracking target and the three-dimensional motion model data.
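The final posture estimation step refers to both the tracked joint positions and the 3D motion model data. One simple way to realize it, assumed here purely for illustration, is a nearest-neighbor lookup. Each 3D pose stored in the motion model is projected with an assumed pinhole camera, and the function returns the pose whose projected joints best match the tracked 2D joints, measured as the sum of squared pixel errors. The joint names, pose format, and camera parameters are hypothetical.

```python
def project(point3d, focal=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a camera-frame 3D point to pixels (assumed camera; image y points down)."""
    x, y, z = point3d
    return (cx + focal * x / z, cy - focal * y / z)


def estimate_3d_posture(tracked_2d, motion_model):
    """Pick the 3D pose from the motion model that best explains the tracked joints.

    tracked_2d:   {joint_name: (u, v)} pixel positions produced by tracking
    motion_model: list of 3D poses, each {joint_name: (x, y, z)} (hypothetical format)
    Returns the pose with the smallest sum of squared reprojection errors.
    """
    def reprojection_error(pose):
        err = 0.0
        for joint, (u, v) in tracked_2d.items():
            pu, pv = project(pose[joint])
            err += (pu - u) ** 2 + (pv - v) ** 2
        return err

    return min(motion_model, key=reprojection_error)
```

This lookup treats each frame independently. The claims do not specify how the motion model is searched, so this sketch should be read as one possible interpretation rather than the claimed procedure.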

A posture estimation program for causing a computer to execute the posture estimation method according to claim 3.