JP2001005973A

JP2001005973A - Method and device for estimating three-dimensional posture of person by color image

Info

Publication number: JP2001005973A
Application number: JP2000034483A
Authority: JP
Inventors: Kazuhiko Takahashi; 和彦高橋; Tatsumi Sakaguchi; 竜己坂口; Atsushi Otani; 淳大谷
Original assignee: ATR Media Integration and Communication Research Laboratories
Current assignee: ATR Media Integration and Communication Research Laboratories
Priority date: 1999-04-20
Filing date: 2000-02-14
Publication date: 2001-01-12

Abstract

PROBLEM TO BE SOLVED: To make it possible to detect the feature point of a body part even when that part overlaps another part of the body as seen from a camera. SOLUTION: Binary images are prepared from color images captured by a plurality of cameras, each from a different direction, and feature points are detected on the basis of the binary images (S13). When a feature point can be neither detected nor predicted from the binary images, two-dimensional flesh-color regions are extracted from the color images on the basis of a flesh-color index. The extracted flesh-color regions are restored to three-dimensional volumes (S23), the volume with the maximum size is defined as the neighborhood region, and its center of gravity is defined as the three-dimensional coordinates of the feature point (S31 and S43).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a method and an apparatus for estimating the three-dimensional posture of a person, and more particularly to a method and an apparatus that estimate the three-dimensional posture of a person on the basis of, for example, color images input from a plurality of color cameras.

[0002]

[Prior Art] An example of this kind of prior art is disclosed in Japanese Patent Application Laid-Open No. Hei 10-258044 [A61B 5/10], laid open on September 29, 1998. In this prior art, a silhouette image of the human body is extracted by thresholding a thermal image, and feature points of the body are detected from the contour shape of the silhouette image to estimate the three-dimensional posture of the person.

[0003]

[Problems to be Solved by the Invention] The prior art can estimate the three-dimensional posture without contact, that is, without attaching any special device to the subject's body, and therefore has the advantage of placing no physical or psychological burden on the subject. However, because it uses thermal images, the only information available for posture estimation is the silhouette image of the human body and its contour shape. Consequently, when one part of the body, for example a hand, overlaps another part, for example the torso, as seen from the infrared camera, that part cannot be detected, and this limits how far the estimation accuracy can be improved.

[0004] Therefore, a main object of the present invention is to provide a novel three-dimensional posture estimation method and apparatus that can improve estimation accuracy.

[0005] Another object of the present invention is to provide a three-dimensional posture estimation method and apparatus capable of detecting feature points of the body even when one part of the human body overlaps another part.

[0006]

[Means for Solving the Problems] The estimation method according to the present invention detects feature points of the body on the basis of a plurality of color images captured from a plurality of viewpoints with a plurality of color cameras, and estimates the three-dimensional posture of the person from those feature points. The method includes the steps of (a) binarizing the color images to obtain binary images, (b) detecting feature points from the binary images, (c) judging the validity of the feature points, and (d) when a feature point is not valid, processing the color images to detect the feature point.

[0007] The estimation apparatus according to the present invention includes a plurality of color cameras that photograph a person from respectively different directions and output color images, binarizing means that binarizes the color images and outputs binary images, first detecting means that detects feature points of the body from the binary images, validity judging means that judges the validity of the feature points, and second detecting means that processes the color images to detect a feature point when the feature point is not valid.

[0008]

[Operation] Color images of a person captured by a plurality of cameras from different directions are binarized, for example with the person region set to "1" and the background region set to "0", and feature points of the body are detected from the contour shape of the binary images. Specifically, the center of gravity, the main axis, and the contour, that is, the boundary line of the person in the binary image, are detected from each binary image.

[0009] Preferably, the plurality of color cameras are arranged so as to photograph the person from the front, the side, the elevation (top), and at least one opposing direction, or from the front, the side, the elevation (top), and at least one intermediate position. Doing so eliminates blind spots.

[0010] To detect feature points of the body from the center of gravity, the main axis, and the contour, the contour point that lies above the center of gravity of the upper body and at the shortest distance from the upper body main axis is taken as the temporary head vertex, and feature points such as the toe points and hand points are then detected. If a feature point is not detected at this time, its position is predicted by a prediction filter.

[0011] When detecting a feature point, the most suitable color image, that is, the most suitable shooting direction or viewpoint, must be determined. To that end, feature point candidates are detected in each of the plurality of color images, and the viewpoint is determined by evaluating those candidates. Selecting the viewpoint in this way allows the feature point to be detected or estimated more accurately.

[0012] Then the validity of each detected or predicted feature point is judged. Specifically, the three-dimensional straight-line distance between feature points is used as the criterion, and when that distance is physically impossible, the feature point concerned is judged invalid. Joint angles, for example of the shoulder, elbow, hip, or knee, may also be taken into account as a criterion. The directions and angles in which these joints bend are fixed, so a feature point may be judged invalid when a joint deviates from them.

[0013] When a feature point is judged invalid, the color images are processed and the feature point is detected from the flesh-color regions extracted from them. Specifically, three-dimensional volumes are restored from the groups of two-dimensional coordinates of the flesh-color regions in the respective color images, the three-dimensional volume with the largest size is selected as the flesh-color region, and a feature point, for example a hand point, is determined from that region.

[0014] The three-dimensional posture of the subject can be estimated from the feature points judged valid and/or the feature points obtained by color processing.

[0015] In another embodiment of the present invention, a method of extracting feature points from silhouette images is combined with a method of extracting feature points from time difference images, and each feature point is processed by a Kalman filter. Specifically, the method of obtaining feature points from time difference images includes the steps of (a) forming a time difference image from the color image data, (b) forming an image in which the flesh-color regions are extracted from the color image data, (c) extracting feature points of the person on the basis of the difference image data and the flesh-color image data, and (d) applying Kalman filtering and future-position prediction to the feature points.

[0016] According to this embodiment, the feature points, and hence the three-dimensional posture, can be estimated accurately even when self-occlusion occurs.

[0017]

[Effects of the Invention] According to the present invention, when a feature point cannot be detected or predicted from the binary images, it is detected by color image processing. Therefore, even when a part of the human body such as a hand overlaps another part, that part can be detected accurately. The accuracy of the estimated three-dimensional posture can thus be improved.

[0018] The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of the embodiments, given with reference to the drawings.

[0019]

[Embodiment] The virtual transformation device 10 of this embodiment, shown in FIG. 1, includes five color CCD cameras (hereinafter simply "cameras") 12a to 12e, each of which photographs the target person from a different direction. For example, camera 12a photographs the person from the front, camera 12b from the side, camera 12c from above, camera 12d from diagonally front-left, and camera 12e from diagonally rear-right. Cameras 12d and 12e need not be placed at such intermediate viewpoints. They may instead be placed at viewpoints opposing the front or the side, that is, at the back or at the right or left side, and the number of cameras and the shooting arrangement may be changed as appropriate. However, using at least four cameras is desirable for eliminating blind spots. For example, with only the three directions of front, side, and elevation (top), the three cameras capture only three quarters of the body region, and the remaining quarter is a complete blind spot. Installing more cameras is effective in eliminating this optical blind spot.

[0020] Color images are output from the cameras 12a to 12e, and each color image is input to the image processing device 14a to 14e individually assigned to the corresponding camera. Each of the image processing devices 14a to 14e is a computer with roughly the processing power of a personal computer, but the individual computers can be replaced by one or more higher-performance computers shared through time-division processing. As described in detail later, the image processing devices 14a to 14e process the input color images and detect the two-dimensional coordinates of the feature points of the body in each image.

[0021] The two-dimensional coordinate information of the feature points obtained by the image processing devices 14a to 14e is sent over a communication line to the three-dimensional coordinate calculation device 16. The three-dimensional coordinate calculation device 16 is also a single computer. It restores the two-dimensional coordinates of the feature points to three dimensions by the principle of triangulation, and verifies whether the restored data are valid on the basis of, for example, body data of the person set at initialization.

[0022] The three-dimensional coordinate information of the feature points obtained by the three-dimensional coordinate calculation device 16 is then sent to the body motion synthesizing device 18. On the basis of the three-dimensional coordinates of the feature points, the body motion synthesizing device 18 uses computer graphics techniques to reproduce a figure of the subject's alter ego, called an "avatar", in a virtual three-dimensional environment.

[0023] The virtual transformation device 10 shown in FIG. 1 can be used as the "Shall We Dance?" system previously presented by the inventors and others. The present invention, however, does not concern the body motion synthesizing device 18, and is therefore applicable not only to such a system but to any system that needs to estimate the three-dimensional posture of a subject, for example a training system, a monitoring system, or a non-contact manipulation system.

[0024] Referring to FIG. 2, in the virtual transformation device 10 of the FIG. 1 embodiment, first, in step S1, color images of a person (subject) captured by the cameras 12a to 12e are input to the corresponding image processing devices 14a to 14e.

[0025] Next, in step S3, the image processing devices 14a to 14e extract the person image from the input color image by the background difference method. The background difference method extracts only the person image by subtracting an image without the person (the background image) from an image containing the person (the input image). The background image is acquired in advance by the preprocessing described later.

[0026] Then, in step S5, the input image is binarized with the person region set to "1" and the background region set to "0", and a binary image that forms the silhouette image of the person is extracted, as shown in FIG. 3.
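Steps S3 and S5 can be pictured with the following minimal sketch. It assumes images are nested lists of (r, g, b) tuples and uses a summed absolute channel difference with a hypothetical threshold of 30; the text does not state how the difference is measured or thresholded, so both choices are illustrative only.

```python
def binarize_by_background_subtraction(frame, background, threshold=30):
    """Return a 0/1 silhouette mask from two same-sized RGB images.

    frame and background are lists of rows, each row a list of (r, g, b)
    tuples. The difference measure and the threshold are assumptions,
    not values taken from the patent text.
    """
    mask = []
    for frame_row, background_row in zip(frame, background):
        mask_row = []
        for (r, g, b), (br, bg, bb) in zip(frame_row, background_row):
            # Summed absolute difference over the three color channels.
            diff = abs(r - br) + abs(g - bg) + abs(b - bb)
            # Person pixels become 1, background pixels become 0.
            mask_row.append(1 if diff > threshold else 0)
        mask.append(mask_row)
    return mask
```

A pixel that differs strongly from the stored background image is labeled "1" (person), and every other pixel is labeled "0" (background).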

[0027] In FIGS. 3 to 7, (a) shows the front image, (b) the side image, and (c) the elevation image. The intermediate-position (or opposing-position) images from cameras 12d and 12e are omitted. Note, however, that the same results are obtained for those images by the same processing as described below.

[0028] In step S7, the center of gravity of the body (labeled イ in FIG. 4) is detected. If the body center of gravity were computed directly from the silhouette image of FIG. 3, which is a binary image, its position would shift markedly with the posture of the arms and legs and would be unstable. The distance conversion image dij, obtained by applying a distance transform to the silhouette, is therefore used to suppress the influence of limb posture. The body center of gravity (IC, JC) in the distance conversion image dij is obtained from the following equation.

[0029]

(Equation 1)
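The equation itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes that the center of gravity is the mean pixel position weighted by the distance values, i.e. IC = Σ i·dij / Σ dij and JC = Σ j·dij / Σ dij, and it uses a simple two-pass city-block distance transform. Both the weighting and the choice of distance metric are assumptions, and the function names are hypothetical.

```python
def distance_transform(mask):
    """City-block distance from each foreground pixel to the nearest
    background pixel. Pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    inf = h + w
    d = [[inf if mask[i][j] else 0 for j in range(w)] for i in range(h)]
    # Forward pass: propagate distances from the top and left neighbors.
    for i in range(h):
        for j in range(w):
            if d[i][j]:
                up = d[i - 1][j] if i > 0 else 0
                left = d[i][j - 1] if j > 0 else 0
                d[i][j] = min(d[i][j], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right neighbors.
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if d[i][j]:
                down = d[i + 1][j] if i < h - 1 else 0
                right = d[i][j + 1] if j < w - 1 else 0
                d[i][j] = min(d[i][j], down + 1, right + 1)
    return d


def weighted_centroid(d):
    """Mean (row, column) position weighted by the distance values."""
    total = sum(sum(row) for row in d)
    if total == 0:
        raise ValueError("the silhouette is empty")
    ic = sum(i * v for i, row in enumerate(d) for v in row) / total
    jc = sum(j * v for row in d for j, v in enumerate(row)) / total
    return ic, jc
```

Because thin regions such as arms and legs receive small distance values, they contribute little to the weighted mean, which is the stabilizing effect the paragraph describes.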

[0030] In step S9, the main axis of the upper body (labeled ロ in FIG. 5) is detected. An upper body distance image gij is derived from the distance conversion image dij, and the inclination and angle of the upper body are detected from the angle θ of the principal axis of inertia passing through the center of gravity of the upper body distance image gij.

[0031] The upper body distance image gij is obtained from the part of the distance conversion image dij that lies above the body center of gravity. To remove the arm regions, each distance value is multiplied by a Gaussian distribution whose axis of symmetry is the upper body main axis of the previous frame. The arm regions are removed in order to eliminate the error they would otherwise cause in detecting the upper body main axis.

[0032] The angle θ of the principal axis of inertia passing through the center of gravity of the upper body distance image gij is obtained from the following equation.

[0033]

(Equation 2)
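The equation is not reproduced in this text. The sketch below uses the standard image-moment formula for the principal axis of inertia, θ = ½·atan2(2·μ11, μ20 − μ02), as a stand-in. Whether the patent's equation takes exactly this form, and which axis the angle is measured from, are assumptions.

```python
import math


def principal_axis_angle(g):
    """Angle in radians of the principal axis of inertia of a weighted
    image g, measured from the column (j) axis toward the row (i) axis."""
    total = sum(sum(row) for row in g)
    if total == 0:
        raise ValueError("the image is empty")
    ic = sum(i * v for i, row in enumerate(g) for v in row) / total
    jc = sum(j * v for row in g for j, v in enumerate(row)) / total
    mu20 = mu02 = mu11 = 0.0
    for i, row in enumerate(g):
        for j, v in enumerate(row):
            mu20 += v * (j - jc) ** 2          # spread along the columns
            mu02 += v * (i - ic) ** 2          # spread along the rows
            mu11 += v * (i - ic) * (j - jc)    # mixed central moment
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
```

For an upright torso the weight is spread mostly along the rows, so the returned angle has magnitude close to π/2, and a lean shows up as a deviation from that value.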

[0034] When, for example, the arms are extended horizontally, the main axis might not settle along the direction of the spine, but this problem is solved by the arm region removal processing.

[0035] In step S11, the contour line of the person shown in FIG. 6 is extracted from the silhouette image of FIG. 3. The contour line is obtained by raster-scanning the silhouette image from the body center of gravity, taking the first contour (boundary) pixel found as the starting point, tracing the boundary counterclockwise, and extracting a binary image that is "1" on the boundary and "0" elsewhere. Here, the contour point at the shortest distance from the upper body main axis is taken as the temporary head vertex.

[0036] Then, in step S13, the two-dimensional coordinate positions of the feature points of the human body (labeled ハ) in the front, side, and elevation images are detected, as indicated by the small filled rectangles in FIG. 7. The feature point detection method is described with reference to the flowchart shown in FIG. 8.

[0037] Referring to FIG. 8, first, in step S51, the toe points of the human body are detected. A skeleton image is created by finding the local maxima of the distance conversion image dij used to calculate the body center of gravity. Among the skeleton end points that lie in the lower half of the body (below the vertical position of the body center of gravity) and in the left (or right) half of the body (to the left or right of the horizontal position of the body center of gravity), the end point with the greatest distance from the body center of gravity is selected as the toe point.

[0038] Next, in step S53, the hand points are detected. As shown in FIG. 9, the contour pixels between the temporary head vertex obtained in step S11 and the toe point are divided in the ratio lh : mh : nh, and the contour points in the middle part are taken as the candidate section for the hand point. The ratio lh : mh : nh is an empirically determined constant, set for example to 2 : 4 : 4.
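The division of the contour can be sketched as follows, assuming the contour between the temporary head vertex and the toe point is available as an ordered list of points. The function name and the integer rounding of the section boundaries are illustrative choices, not details given in the text.

```python
def candidate_section(contour, ratio=(2, 4, 4)):
    """Return the middle part of an ordered contour split in the ratio
    l : m : n. The default 2 : 4 : 4 is the example value from the text."""
    l, m, n = ratio
    total = l + m + n
    count = len(contour)
    start = count * l // total         # end of the head-side part
    end = count * (l + m) // total     # start of the toe-side part
    return contour[start:end]
```

With a contour of 10 points and the ratio 2 : 4 : 4, the candidate section consists of the points at indices 2 to 5.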

[0039] As shown in FIG. 9, within the candidate section the point with the highest vertical position is hand candidate point A, the point with the lowest vertical position is hand candidate point B, the point with the longest horizontal distance is hand candidate point C, and the end point of the section on the toe side is hand candidate point D. The candidates are then evaluated in order under the following conditions. The evaluation of each hand candidate point is described with reference to the flowchart shown in FIG. 10.

[0040] Referring to FIG. 10, first, in step S61, if the vertical position of hand candidate point A is greater than the vertical position of the temporary head vertex minus T1, point A is the most likely candidate and is given a score of 3.

[0041] Next, in step S63, if the vertical position of hand candidate point B is smaller than the vertical position of the body center of gravity plus T1, point B is a likely candidate and is given a score of 2.

[0042] In step S65, if the horizontal position of hand candidate point C is smaller than the horizontal position of the body center of gravity plus T2, point C is the least likely candidate and is given a score of 1.

[0043] Then, in step S67, if the hand candidate points A, B, and C satisfy none of the above conditions, each of them is given a score of 0.

[0044] T1 and T2 are constants determined by the height of the circumscribed rectangle of each silhouette image.

[0045] In this way, the hand candidate points A, B, and C are given scores from 0 to 3 according to how plausible they are, and the two candidates with the highest scores are selected as hand candidates. In other words, this evaluation selects the cameras, that is, the shooting directions or viewpoints. Then, as described later with reference to step S21 of FIG. 2, three-dimensional restoration is performed from the images containing the two selected hand candidate points.

[0046] If the scores given to the hand candidate points A, B, and C are all 0, the hand candidate point D is taken as the hand point.
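The scoring in steps S61 to S67 and the fallback to candidate D can be sketched as below. The sketch assumes points are (x, y) tuples with y increasing upward, reads each condition independently, and breaks ties by the order A, B, C; none of these conventions is stated in the text, and the function name is hypothetical.

```python
def score_hand_candidates(a, b, c, head_top_y, centroid, t1, t2):
    """Score hand candidates A, B, C and return (scores, selected).

    selected holds the names of the two best-scoring candidates, or only
    "D" when A, B and C all score 0. y is assumed to increase upward.
    """
    cx, cy = centroid
    scores = {"A": 0, "B": 0, "C": 0}    # S67: score 0 unless a test passes
    if a[1] > head_top_y - t1:           # S61: A is near or above head height
        scores["A"] = 3
    if b[1] < cy + t1:                   # S63: B is near or below the centroid
        scores["B"] = 2
    if c[0] < cx + t2:                   # S65: condition as worded in the text
        scores["C"] = 1
    if all(value == 0 for value in scores.values()):
        return scores, ["D"]             # fall back to candidate D
    ranked = sorted(scores, key=scores.get, reverse=True)
    return scores, ranked[:2]
```

In a multi-camera setting the same scoring would be run per image, and the images holding the two best candidates would determine which two cameras are used for the three-dimensional restoration.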

[0047] Returning to FIG. 8, in step S55 the head vertex is detected. The contour pixels between the temporary head vertex and each hand point are divided in the ratio lP : mP : nP shown in FIG. 11, and the contour points in the middle part are taken as candidates for the head vertex. The point with the shortest distance to the upper body main axis is taken as the neck position, and the bisecting point of the contour between the left and right neck positions is taken as the head vertex. As in the detection of the hand points, the ratio lP : mP : nP is an empirically determined constant.

[0048] In step S57, the elbow position is estimated. As shown in FIG. 12, a straight line HT is drawn connecting the head vertex HP and the hand point TP, and the group of contour points between the head vertex HP and the hand point TP is denoted CP. The point in the contour point group CP that is farthest from the straight line HT is taken as the provisional elbow point TE. The distance from the hand point to the elbow is set to HL on the basis of general experience.

[0049] If the difference between the distance from the hand point TP to the provisional elbow point TE and the empirically known hand-to-elbow distance HL is smaller than a predetermined threshold MD, the provisional elbow point TE is taken as the elbow position.

[0050] Conversely, if the difference is larger than the predetermined threshold MD, the midpoint of the curve formed by the contour point group CP from the head vertex HP to the hand point TP is taken as the elbow position.
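The elbow estimation of step S57 can be sketched as follows. The sketch assumes two-dimensional (x, y) points, takes the absolute value of the difference before comparing it with MD, and approximates the midpoint of the contour curve by the middle element of the ordered point list. These are simplifying assumptions rather than details confirmed by the text.

```python
import math


def estimate_elbow(hp, tp, cp, hl, md):
    """Estimate the elbow position from head vertex hp, hand point tp,
    and the ordered contour points cp between them."""
    (x1, y1), (x2, y2) = hp, tp
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("hp and tp must be distinct points")

    def distance_to_line(p):
        # Perpendicular distance from p to the straight line HT.
        return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / length

    te = max(cp, key=distance_to_line)                   # provisional elbow TE
    hand_to_te = math.hypot(te[0] - tp[0], te[1] - tp[1])
    if abs(hand_to_te - hl) < md:
        return te                                        # TE is accepted
    return cp[len(cp) // 2]                              # middle of the contour
```

The first branch keeps the provisional elbow point when its distance from the hand agrees with the expected forearm length HL, and the second branch is the fallback described for the case where it does not.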

[0051] Returning to FIG. 2, in step S15 it is determined whether the feature points could be detected. For example, when a hand overlaps the torso, no candidate section for the hand point can be set in the contour image of the person in FIG. 6, so the hand point cannot be detected. Likewise, when the toe points lie on only one side of the center of gravity, they cannot be detected, because the present technique assumes that the feet lie to the left and right of the center of gravity.

[0052] If it is determined that the feature points can be detected, the process proceeds to step S17, and the two-dimensional coordinate information of the feature points detected in step S13 is input to the three-dimensional coordinate calculation device 16. Conversely, if it is determined that a feature point cannot be detected, the process proceeds to step S19.

[0053] In step S19, the position of the feature point is predicted by a prediction filter. Here, the position of the feature point in the current frame is predicted from the detection results of the past several frames by a Kalman filter or a linear prediction filter. The two-dimensional coordinate information of the predicted feature point is input to the three-dimensional coordinate calculation device 16.
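As a hedged illustration of the simplest kind of linear prediction, the sketch below extrapolates the last two positions at constant velocity. It is not a Kalman filter, and the text does not specify the order or the coefficients of the linear prediction filter actually used.

```python
def predict_next(history):
    """Predict the next (x, y) position of a feature point from its past
    positions by constant-velocity extrapolation of the last two frames."""
    if not history:
        raise ValueError("at least one past position is required")
    if len(history) == 1:
        return history[-1]               # no velocity estimate yet
    (x0, y0), (x1, y1) = history[-2], history[-1]
    # The displacement over the last frame is assumed to repeat.
    return (2 * x1 - x0, 2 * y1 - y0)
```

A Kalman filter would additionally weigh the prediction against measurement noise, which matters when the fed-back positions are themselves uncertain.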

[0054] The past detection results here are not the detection results of the image processing devices 14a to 14e. They are the three-dimensional coordinate values calculated by the three-dimensional coordinate calculation device 16 and mapped back to two dimensions in steps S27 and S41 described later, and they are fed back to the image processing devices 14a to 14e each time one frame is detected.

[0055] Next, in step S21, the images containing the two hand candidate points selected in step S67 (FIG. 10) are selected as the camera viewpoints. That is, the two cameras that captured those images are selected.

[0056] In step S23, the two-dimensional coordinates of the feature points are restored to three dimensions. Once the relative positions or angles of the two cameras selected in step S21 are known, the three-dimensional coordinate values can be obtained from the two-dimensional coordinate information of the feature points input in step S17 by the principle of triangulation.
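One common way to realize two-camera triangulation is to cast a viewing ray from each camera through its detected image point and take the midpoint of the shortest segment between the two rays. The sketch below assumes each ray is already expressed in a shared world frame as a camera center and a direction vector. The text says only that triangulation is used, so this particular formulation is an assumption.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point midway between the closest points of two rays.

    c1, c2 are camera centers and d1, d2 ray directions, all 3-tuples in
    a common world coordinate frame.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w0 = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("the rays are parallel")
    s = (b * e - c * d) / denom          # parameter along ray 1
    t = (a * e - b * d) / denom          # parameter along ray 2
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((u + v) / 2 for u, v in zip(p1, p2))
```

With noisy detections the two rays rarely intersect exactly, which is why the midpoint of their closest approach is used instead of an exact intersection.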

[0057] In step S25, the validity of the three-dimensional coordinate values of the feature points is judged. Validity is judged using, for example, the straight-line distance in three-dimensional space from one hand point to the other hand point, or from the head vertex to a hand point, as a criterion. That is, the plausibility of those distances is judged against the initial values of the limb lengths. Not only distances but also the angles of joints (for example, the shoulder, elbow, hip, and knee joints) serve as criteria, because the directions and angles in which human joints can bend are fixed. Validity can therefore also be judged from the direction or angle in which a joint is bent.
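The distance-based part of this check can be sketched as below. The limits arm_span and head_to_hand_max stand for body data measured at initialization, and the 10 % tolerance is a hypothetical margin. The text does not give concrete limits, and the joint-angle criterion is not covered here.

```python
import math


def within_reach(p, q, max_distance, tolerance=0.1):
    """True if the straight-line distance between 3D points p and q does
    not exceed max_distance by more than the given relative tolerance."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return distance <= max_distance * (1 + tolerance)


def feature_points_valid(head, left_hand, right_hand, arm_span, head_to_hand_max):
    """Judge validity from hand-to-hand and head-to-hand distances."""
    return (within_reach(left_hand, right_hand, arm_span)
            and within_reach(head, left_hand, head_to_hand_max)
            and within_reach(head, right_hand, head_to_hand_max))
```

A hand point reconstructed several meters from the head fails the check and would be sent on to the color-image processing path.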

【００５８】[0058]

If the values are judged valid, the process proceeds to step S27; if they are judged invalid, the process proceeds to step S29, where the feature points are detected again.

【００５９】[0059]

In step S27, the three-dimensional coordinate values of the feature points calculated by the three-dimensional coordinate calculation device 16 are mapped back to two dimensions. That is, the three-dimensional coordinates are converted back into two-dimensional coordinates in each of the cameras 12a to 12e. The resulting two-dimensional feature point coordinates are sent to the image processing device 14 and are used from the next pass onward by the prediction filter in step S19 to predict the motion of the person, that is, the positions to which the feature points will move. This prevents the prediction accuracy from deteriorating.
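The mapping from three dimensions back to each camera image in step S27 can be sketched with a simple pinhole model. The camera parameters used here (`rotation`, `translation`, `focal`, `center`) are assumptions for illustration; the patent does not specify the camera model.

```python
def project_point(point, rotation, translation, focal, center):
    """Project a 3D world point into pixel coordinates with a pinhole camera.

    rotation is a 3x3 matrix (list of rows) and translation a 3-vector, so that
    the point in camera coordinates is R * X + t (all hypothetical parameters).
    """
    cam = [sum(r * p for r, p in zip(row, point)) + t
           for row, t in zip(rotation, translation)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    u = focal * cam[0] / cam[2] + center[0]
    v = focal * cam[1] / cam[2] + center[1]
    return (u, v)
```

Calling this once per camera would give the two-dimensional coordinates that are fed back to the prediction filter of step S19.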

【００６０】[0060]

Then, in step S31, the three-dimensional coordinate values of the feature points restored in step S23 are sent to the body movement synthesizing device 18.

【００６１】[0061]

If the values are judged invalid in step S25, the color image is processed in step S29 and the feature points are detected again (second detection means). In this embodiment, preprocessing is performed before the normal image processing in order to obtain the skin color index used in the color image processing. The preprocessing method is described with reference to the flowchart shown in FIG. 13.

【００６２】[0062]

Referring to FIG. 13, first, in step S71, a background image containing no person is acquired. Next, in step S73, an image in which the person has entered that background is captured, and in step S75 only the person image is extracted by background subtraction, as in step S3.

【００６３】[0063]

In step S77, all skin-colored portions of the person image, including the hands and face, are extracted with reference to the generally known "skin color space" in RGB space (a space in which each of R, G, and B is expressed in, for example, 256 gradations).

【００６４】[0064]

Then, in step S79, a skin color index specific to that person (the subject) and to the lighting is created from the skin-colored portions. The skin color index is used in the color image processing of step S29, described below, to extract the person's skin color regions from the color image.
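The form of the skin color index is not specified in the text. As one possible illustration, the sketch below stores a per-channel RGB range taken from sampled skin pixels and tests other pixels against it; the range representation, the `margin` parameter, and both function names are assumptions.

```python
def build_skin_index(skin_pixels, margin=10):
    """Build a per-channel (low, high) range from sampled (R, G, B) skin pixels.

    The range-based index and the margin are illustrative assumptions.
    """
    index = []
    for ch in range(3):
        values = [p[ch] for p in skin_pixels]
        index.append((max(0, min(values) - margin),
                      min(255, max(values) + margin)))
    return index


def matches_skin(pixel, index):
    """Return True when every channel of the pixel lies inside the index range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(pixel, index))
```

Because the index is built from the subject's own pixels under the actual lighting, it adapts to both, which is the purpose stated for step S79.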

【００６５】[0065]

Next, the color image processing of step S29 is described with reference to the flowchart shown in FIG. 14, taking detection of a hand point as an example.

【００６６】[0066]

Referring to FIG. 14, first, in step S81, color images of the person captured by the cameras 12a to 12e are input to the respective image processing devices 14a to 14e.

【００６７】[0067]

Next, in step S83, the range in which the hand is expected to be detectable in the current frame is predicted from the detection result of the previous frame, using the same technique as the prediction filter in step S19. Narrowing the detection area removes as much of the background image as possible and makes the processing faster and more accurate.

【００６８】[0068]

In step S85, hand candidates are detected. Within the hand range predicted in step S83, regions that match the person-specific skin color index created in preprocessing step S79 are extracted and taken as hand candidate regions.

【００６９】[0069]

In step S87, two-dimensional noise in the hand candidate regions is removed with a low-pass filter or the like, and the skin color regions of the human body are extracted. That is, when several skin color regions are detected, only those with a large area are kept and those with a small area are removed.
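Keeping only the larger skin color regions, as step S87 describes, can be sketched with connected-component labelling on a binary mask. The 4-neighbour connectivity and the `min_area` threshold are assumptions; the text mentions only a low-pass filter "or the like".

```python
from collections import deque


def keep_large_regions(mask, min_area):
    """Return a copy of a binary mask with 4-connected regions smaller than
    min_area pixels removed (min_area is a hypothetical tuning parameter)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:
                for ry, rx in region:
                    out[ry][rx] = 1
    return out
```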

【００７０】[0070]

Then, in step S89, information on the skin color regions extracted in step S87 is sent to the three-dimensional coordinate calculation device 16.

【００７１】[0071]

Returning to FIG. 2, the skin color regions extracted in step S87 are restored as volumes in step S33. That is, the two-dimensional skin color regions extracted from all of the cameras 12 are combined and restored as regions having three-dimensional volume (volumes).
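The text does not say how the volume is restored. One common way to combine two-dimensional regions from several views is voxel carving, sketched below under that assumption: a voxel is kept only if it projects inside the skin color mask of every camera. The `cameras` structure and the per-camera `project` callables are hypothetical.

```python
def carve_volume(voxels, cameras):
    """Keep the voxels whose projection lies inside every camera's skin mask.

    cameras is a list of (project, mask) pairs, where project maps an
    (x, y, z) voxel to a (row, col) pixel and mask is a binary 2D list.
    Voxel carving is an assumed technique, not confirmed by the source.
    """
    kept = []
    for voxel in voxels:
        inside_all = True
        for project, mask in cameras:
            r, c = project(voxel)
            if not (0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c]):
                inside_all = False
                break
        if inside_all:
            kept.append(voxel)
    return kept
```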

【００７２】[0072]

Next, in step S35, three-dimensional noise in the restored skin color volumes is removed by the same technique as the two-dimensional noise removal in step S87. That is, volumes with a small volume are removed.

【００７３】[0073]

Then, in step S37, the single volume with the largest volume is selected from the restored three-dimensional skin color volumes and is determined to be the hand region.

【００７４】[0074]

In step S39, the center of gravity of the volume selected in step S37, that is, the hand region, is detected. The center of gravity is taken as the representative three-dimensional coordinate of that volume and is used as the hand point.
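Steps S37 and S39 together reduce to picking the largest volume and averaging its coordinates. The sketch below assumes each volume is given as a list of (x, y, z) voxel coordinates and measures volume by voxel count; both choices are illustrative.

```python
def hand_point(volumes):
    """Return the centroid of the largest volume.

    volumes is a list of voxel lists; volume size is approximated by the
    number of voxels (an assumption for this sketch).
    """
    largest = max(volumes, key=len)
    n = len(largest)
    return tuple(sum(v[i] for v in largest) / n for i in range(3))
```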

【００７５】[0075]

In step S41, as in step S27, the three-dimensional coordinate values of the feature point calculated by the three-dimensional coordinate calculation device 16 are mapped back to two dimensions. The resulting two-dimensional feature point coordinates are sent to the image processing device 14 and are used from the next pass onward in the prediction filter processing of step S19.

【００７６】[0076]

Then, in step S43, the three-dimensional coordinate values of the center of gravity detected in step S39, that is, of the hand point, are sent to the body movement synthesizing device 18.

【００７７】[0077]

Thus, in this embodiment, even when a feature point of the human body cannot be detected from the silhouette image obtained by binarizing the input color image, the feature point can be detected from the skin color regions by processing the color image again. Consequently, even when one part of the body, for example a hand, overlaps another part, for example the torso, the feature point can be accurately detected or estimated, and improved posture estimation accuracy can be expected.

【００７８】[0078]

In the embodiment described above, when a feature point cannot be detected because of self-occlusion (for example, a posture in which the torso and a hand overlap), the feature point is estimated by a Kalman filter or the like in step S19 of FIG. 2. The feature points are the top of the head, the left and right hand tips, the left and right elbow joints, and the left and right toe positions.

【００７９】[0079]

However, a method that estimates feature points from a so-called silhouette image cannot fundamentally solve the self-occlusion problem. A new technique that allows the Kalman filter estimation of feature points in step S19 to be carried out reliably is therefore described below.

【００８０】[0080]

FIG. 15 shows part of the operation of another embodiment of the invention. In this embodiment, feature point detection following the procedure of steps S3 to S13 in FIG. 2 (step S2) is executed in parallel with feature point detection that uses inter-frame differences. When the reliability of the feature point detection in step S2 is high, specifically when the score in FIG. 10 is high, the coordinates obtained by that processing are adopted as the observation of each feature point, and this observation is used in step S101 to update the Kalman filter, filter the observation data, and predict the position of each feature point in future frames. Conversely, when the reliability of the feature points in step S2, that is, the score, is low, the coordinates detected from the difference image are used as the observation of the feature points, and these are used to update the Kalman filter, filter the observation data, and predict the position of each feature point in future frames.

【００８１】[0081]

In more detail, step S91 in FIG. 15 computes the difference between temporally adjacent frame images. Computing the difference over the entire image would take an enormous amount of time, so a window region for the difference calculation is set as shown in FIG. 16. The window is set using the prediction obtained with the Kalman filter in step S101 of the preceding frame. That is, when the predicted position shown in FIG. 16 was estimated as the feature point in step S101 of the k-th frame, a difference calculation region (window) of a predetermined size centered on this predicted position is set in the (k+1)-th frame, which makes the calculation more efficient.

【００８２】[0082]

Then, in step S91, the difference between the image inside the window region of the k-th frame and the image inside the window region of the (k+1)-th frame is computed. In the actual calculation, the window region is enlarged as shown in FIG. 17, and the difference is computed for the enlarged window region. After the inter-frame difference has been computed, the difference image is binarized to obtain the binarized inter-frame difference image shown in FIG. 17.
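The windowed inter-frame difference and its binarization can be sketched as follows. The sketch assumes grayscale frames stored as 2D lists and omits the window enlargement; `half_size` and `threshold` are hypothetical parameters, since the text only speaks of a window of predetermined size.

```python
def window_bounds(center, half_size, width, height):
    """Clip a square window around the predicted position to the image area."""
    cx, cy = center
    x0, x1 = max(0, cx - half_size), min(width, cx + half_size + 1)
    y0, y1 = max(0, cy - half_size), min(height, cy + half_size + 1)
    return x0, y0, x1, y1


def binary_frame_difference(prev, curr, center, half_size, threshold):
    """Binarized absolute difference of two grayscale frames inside the window."""
    h, w = len(curr), len(curr[0])
    x0, y0, x1, y1 = window_bounds(center, half_size, w, h)
    return [[1 if abs(curr[y][x] - prev[y][x]) > threshold else 0
             for x in range(x0, x1)]
            for y in range(y0, y1)]
```

Restricting the loops to the window is what yields the efficiency gain: the cost depends on the window size rather than on the full image size.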

【００８３】[0083]

Thereafter, in step S95, the binarized difference image is approximated by a quadrangle. However, particularly when a hand is in front of the torso, causing self-occlusion, and the hand and torso are moving at the same time, it is difficult to extract a difference image of the hand alone even with the difference calculation, so a skin color mask is applied in step S93. Specifically, as shown in FIG. 18, skin color regions are extracted on the basis of skin color information acquired in advance from the color image, and the skin color mask is applied to the difference image obtained in step S91 to extract the hand image.
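Applying the skin color mask of step S93 amounts to a pixelwise AND of two binary images. A minimal sketch, assuming both images are 2D lists of 0/1 values with the same size:

```python
def apply_skin_mask(diff, skin):
    """Keep only the moving pixels that are also skin colored (pixelwise AND)."""
    return [[d & s for d, s in zip(diff_row, skin_row)]
            for diff_row, skin_row in zip(diff, skin)]
```

Pixels of a moving torso survive the differencing but fail the skin test, so only the hand remains.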

【００８４】[0084]

Next, in step S95, the difference image binarized or extracted as described above is approximated by a quadrangle, for example as shown in FIG. 17. Specifically, the points at which the vertical and horizontal coordinates are respectively maximum or minimum are found, and the quadrangular region is determined from those points.
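The quadrangle approximation of step S95, together with the nearest-vertex selection of steps S97 and S99, can be sketched as below. Treating the quadrangle as an axis-aligned bounding rectangle is an interpretation of "maximum or minimum vertical and horizontal coordinates"; the function names are illustrative.

```python
import math


def bounding_rectangle(mask):
    """Return the four corners (x, y) of the axis-aligned rectangle enclosing
    all nonzero pixels of a binary mask, or None if the mask is empty."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]


def nearest_vertex(vertices, previous_point):
    """Pick the rectangle corner closest to the feature point of frame k."""
    return min(vertices,
               key=lambda v: math.hypot(v[0] - previous_point[0],
                                        v[1] - previous_point[1]))
```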

【００８５】[0085]

Then, in step S97, the distance from each vertex of the quadrangle approximated in step S95 to the feature point identified in the k-th frame is calculated, and in step S99 the vertex with the smallest distance is detected as the feature point in the (k+1)-th frame. The process then moves to the Kalman filter processing of step S101. Feature point estimation with a Kalman filter is described in detail in, for example, Suguru Arimoto, "Kalman Filter", Sangyo Tosho, 1977; the method is described below only to the extent needed for this invention.

1. Modeling of the feature point time history with an AR model

Assume that the time history of the position coordinates [xPs(k), yPs(k)] of a feature point Ps (s = top of head, right hand tip, left hand tip, right toe, left toe, right elbow joint, and left elbow joint) is given by the AR model of Equation 3.

【００８６】[0086]

【数３】 (Equation 3)

【００８７】[0087]

Here, the variable ζ(k) in Equation 3 is xPs(k) when the x coordinate is considered and yPs(k) when the y coordinate is considered. ε(k) is Gaussian white noise distributed as N(0, R), and ai are unknown coefficients. Taking the unknown coefficients as the state vector, the state equation can be expressed by Equation 4.

【００８８】[0088]

【数４】 (Equation 4)

【００８９】[0089]

From Equation 3, the observation equation is given by Equation 5.

【００９０】[0090]

【数５】 (Equation 5)

【００９１】[0091]

2. Estimation of the state vector with a Kalman filter

A Kalman filter is applied to the system of Equations 4 and 5 to estimate the unknown coefficients sequentially. Because c(k) is time-varying, the Kalman filter takes the form of Equation 6.

【００９２】[0092]

【数６】 (Equation 6)
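The equation images referenced as Equations 3 to 6 are not reproduced in this text. As a reading aid only, one textbook form that is consistent with the symbols and the computation order given in the surrounding paragraphs is sketched below. The AR order n, the definition of the regressor c(k), and the exact arrangement of expressions (4) to (7) are assumptions, not the patent's verified equations.

```latex
% Equation 3 (assumed n-th order AR model)
\zeta(k) = \sum_{i=1}^{n} a_i\,\zeta(k-i) + \varepsilon(k),
\qquad \varepsilon(k) \sim N(0, R)

% Equation 4 (state equation: the unknown coefficients are constant)
\xi(k+1) = \xi(k), \qquad \xi(k) = [\,a_1 \;\; a_2 \;\; \cdots \;\; a_n\,]^{\mathsf T}

% Equation 5 (observation equation)
\zeta(k) = c(k)^{\mathsf T}\,\xi(k) + \varepsilon(k),
\qquad c(k) = [\,\zeta(k-1) \;\; \cdots \;\; \zeta(k-n)\,]^{\mathsf T}

% Equation 6 (Kalman filter with time-varying c(k), expressions (4)-(7))
\begin{align}
\hat{\xi}(k) &= \hat{\xi}(k-1) + K(k)\,\bigl[\psi(k) - c(k)^{\mathsf T}\hat{\xi}(k-1)\bigr] \tag{4}\\
K(k) &= P(k)\,c(k)\,R^{-1} \tag{5}\\
P(k) &= M(k) - M(k)\,c(k)\,\bigl[c(k)^{\mathsf T} M(k)\,c(k) + R\bigr]^{-1} c(k)^{\mathsf T} M(k) \tag{6}\\
M(k) &= P(k-1) \tag{7}
\end{align}
```

In this form the gain of expression (5) is algebraically equal to M(k)c(k)[c(k)^T M(k)c(k) + R]^{-1}, which is why M(k), P(k), K(k), and the state estimate can be computed in that order.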

【００９３】[0093]

The calculation procedure of the Kalman filter is as follows.

【００９４】[0094]

(1) First, the initial values of Equation 7 are set.

【００９５】[0095]

【数７】 (Equation 7)

【００９６】[0096]

(2) Next, the observed value xPs(k) (or yPs(k), zPs(k)) is read in and used as ψ on the right-hand side of expression (4) in Equation 6.

【００９７】[0097]

(3) Then the Kalman filter is computed. That is, M(k) (expression (7) in Equation 6), the estimation error covariance P(k) (expression (6)), the Kalman gain K(k) (expression (5)), and the state vector estimate ξ̂(k) (expression (4)) are calculated in that order.

3. Filtering and future prediction

The coefficients ai estimated at time k are substituted into Equation 3 to filter the observation ζ(k). The filtered results are then substituted into Equation 3 with the time shifted one step at a time to predict the coordinate position of the feature point d steps into the future (d = 1, 2, ...).
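The sequential coefficient estimation and the d-step prediction can be illustrated with a minimal pure-Python sketch. It assumes a static state (constant AR coefficients), a regressor made of the most recent observations, and the standard gain K = Mc / (c^T M c + R); these are textbook choices and not necessarily the exact expressions of Equation 6.

```python
def kalman_ar_update(xi, P, c, z, R):
    """One Kalman update of the AR coefficient estimate xi.

    c is the regressor [zeta(k-1), ..., zeta(k-n)], z the new observation,
    R the observation noise variance. M(k) = P(k-1) is assumed (no process
    noise) and P is assumed symmetric.
    """
    n = len(xi)
    Mc = [sum(P[i][j] * c[j] for j in range(n)) for i in range(n)]
    s = sum(c[i] * Mc[i] for i in range(n)) + R        # innovation variance
    K = [m / s for m in Mc]                            # Kalman gain
    innovation = z - sum(c[i] * xi[i] for i in range(n))
    xi_new = [xi[i] + K[i] * innovation for i in range(n)]
    P_new = [[P[i][j] - K[i] * Mc[j] for j in range(n)] for i in range(n)]
    return xi_new, P_new


def predict_ahead(xi, recent, d):
    """Predict d future values with the AR model; `recent` is newest first."""
    window = list(recent)
    predictions = []
    for _ in range(d):
        nxt = sum(a * v for a, v in zip(xi, window))
        predictions.append(nxt)
        window = [nxt] + window[:-1]   # shift the time window by one step
    return predictions
```

For a coordinate that moves at constant velocity, a second-order model converges toward the coefficients (2, -1), so `predict_ahead` extrapolates the motion linearly.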

【００９８】[0098]

In this way, the Kalman filter processing is executed in step S101 and the feature points are finally estimated. Therefore, even when self-occlusion occurs, each feature point of the person image can be obtained accurately, so the posture of the person can be grasped accurately and without contact.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a virtual transformation device according to one embodiment of the invention.

FIG. 2 is a flowchart for explaining the overall operation of this embodiment.

FIG. 3 is an illustrative view showing a binary image of a body.

FIG. 4 is an illustrative view showing the center of gravity of a body.

FIG. 5 is an illustrative view showing the principal axis of the upper body.

FIG. 6 is an illustrative view showing the contour line of a body.

FIG. 7 is an illustrative view showing the feature points of a body.

FIG. 8 is a flowchart for explaining a method of detecting the feature points of a body.

FIG. 9 is an illustrative view for explaining a method of detecting a hand point.

FIG. 10 is a flowchart for explaining a method of evaluating hand candidate points.

FIG. 11 is an illustrative view for explaining a method of detecting the head vertex.

FIG. 12 is an illustrative view for explaining a method of estimating the elbow position.

FIG. 13 is a flowchart for explaining the preprocessing method.

FIG. 14 is a flowchart for explaining the color image processing method.

FIG. 15 is a flowchart showing the operation of another embodiment of the invention.

FIG. 16 is an illustrative view showing the window region used when calculating an inter-frame difference image.

FIG. 17 is an illustrative view showing the procedure up to identifying a feature point from an inter-frame difference image.

FIG. 18 is an illustrative view showing the skin color mask processing.

[Explanation of symbols]

10 ... Virtual transformation device
12a to 12e ... Color cameras
14a to 14e ... Image processing devices
16 ... Three-dimensional coordinate calculation device
18 ... Body movement synthesizing device

Continuation of front page

(72) Inventor: Tatsumi Sakaguchi, c/o ATR Media Integration and Communication Research Laboratories, 5 Koaza Sanpeidani, Oaza Inuidani, Seika-cho, Soraku-gun, Kyoto

(72) Inventor: Atsushi Otani, c/o ATR Media Integration and Communication Research Laboratories, 5 Koaza Sanpeidani, Oaza Inuidani, Seika-cho, Soraku-gun, Kyoto

F-terms (reference):
2F065 AA17 AA37 CC16 FF04 JJ03 JJ05 JJ26 QQ04 QQ13 QQ31 QQ32 QQ33 QQ45
4C038 VA04 VA20 VB01 VC05
5B057 BA02 BA19 CA01 CA08 CA12 CA16 CE06 CE12 CE16 DA07 DB02 DB03 DB06 DB09 DC06 DC08 DC16 DC25 DC32
5L096 AA02 CA05 EA43 FA06 FA15 FA60 FA63 FA66 FA67 GA08 GA38 GA55 HA01
9A001 GG03 HH20 HH25 HH27 HH28 HH29 HH31 KK37

Claims

[Claims]

1. A method for estimating a three-dimensional posture of a person by detecting feature points of the body from a plurality of color images captured from a plurality of viewpoints with a plurality of color cameras and estimating the three-dimensional posture of the person from the feature points, the method comprising the steps of: (a) binarizing the color images to obtain binary images; (b) detecting the feature points from the binary images; (c) judging the validity of the feature points; and (d) if the feature points are not valid, processing the color images to detect the feature points.

2. The method according to claim 1, wherein said step (b) includes a step (b1) of predicting a feature point by a prediction filter.

3. The method according to claim 1 or 2, wherein the step (b) includes a step of detecting feature point candidates for each of the plurality of color images and a step of selecting viewpoints by evaluating each feature point candidate.

4. The method according to claim 1, wherein the step (c) includes a step (c1) of determining validity based on at least a distance between the feature points.

5. The method according to claim 4, wherein in said step (c1), said validity is determined by further considering a joint angle.

6. The method according to any of the preceding claims, wherein the step (d) includes a step (d1) of extracting a skin color area from the color image and a step (d2) of detecting a feature point from the skin color area.

7. The method according to claim 6, wherein in the step (d1), a two-dimensional coordinate group is restored to a three-dimensional volume, and a three-dimensional volume having a maximum volume is selected as a skin color area.

8. A recording medium storing a program for causing a computer to execute the steps of: (a) binarizing a plurality of color images captured from a plurality of viewpoints with a plurality of color cameras to obtain binary images; (b) detecting feature points from the binary images; (c) judging the validity of the feature points; (d) if the feature points are not valid, processing the color images to detect the feature points; and (e) estimating a three-dimensional posture according to the detected feature points.

9. An apparatus for estimating a three-dimensional posture of a person, comprising: a plurality of color cameras that photograph a person from different directions and output color images; binarizing means for binarizing the color images and outputting binary images; first detection means for detecting feature points; validity determination means for judging the validity of the feature points; and second detection means for processing the color images and detecting the feature points when the feature points are not valid.

10. The apparatus of claim 9, wherein the plurality of color cameras are arranged to photograph a person from the front, side, and elevation of the person and at least one facing direction.

11. The apparatus of claim 9, wherein the plurality of color cameras are arranged to photograph the person from a front, side, elevation, and at least one intermediate position of the person.

12. The apparatus according to any of claims 9 to 11, wherein the second detection means includes skin color area extracting means for extracting a skin color area from the color image and detects the feature point on the basis of the skin color area.

13. A three-dimensional posture estimation method for extracting feature points from color image data captured by a color camera and estimating a three-dimensional posture of a person according to the feature points, comprising the steps of: (a) forming a temporal difference image from the color image data; (b) forming an image in which a skin color region is extracted from the color image data; (c) extracting feature points of the person on the basis of the difference image data and the image data in which the skin color region is extracted; and (d) applying filtering with a Kalman filter and future position prediction to the feature points.

14. The three-dimensional posture estimation method according to claim 13, further comprising the step of (e) forming a silhouette image of the person from the color image data to extract feature points of the person, wherein the step (d) processes the feature points obtained in the step (c) or the step (e).

15. A recording medium storing a program for executing the steps of: (a) forming a temporal difference image from color image data; (b) forming an image in which a skin color region is extracted from the color image data; (c) extracting feature points of a person on the basis of the difference image data and the image data in which the skin color region is extracted; (d) applying filtering with a Kalman filter and future position prediction to the feature points; and (e) estimating a three-dimensional posture of the person from the feature points.