JP3144400B2

JP3144400B2 - Gesture recognition device and method

Info

Publication number: JP3144400B2
Application number: JP32303998A
Authority: JP
Inventors: 裕一仁野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-11-13
Filing date: 1998-11-13
Publication date: 2001-03-12
Anticipated expiration: 2018-11-13
Also published as: JP2000149025A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a gesture recognition device, and more particularly to a gesture recognition device that recognizes the gestures of a user whose position and posture change freely, and that is suitable for uses such as manipulating CG (computer graphics) objects in a virtual space.

[0002]

[Prior Art] Several techniques for gesture recognition have been proposed and developed. Typical examples are as follows. Reference (1) (Japanese Patent Laid-Open No. 6-12493) proposes a computer user interface that detects a gesture from the time-series movement of a mouse or pen using DP (dynamic programming) matching and executes the command corresponding to that gesture. Reference (2) (Japanese Patent Laid-Open No. 10-162151) and others propose a gesture recognition method that acquires a time-difference image from captured images of a subject's gesture, binarizes it, and uses continuous DP to obtain feature vectors for feature patterns from the feature quantities of the difference image. Reference (3) (Journal of the Institute of Electronics, Information and Communication Engineers, August 1993, pp. 1805-1812) describes a method by Ishii et al. that extracts three-dimensional hand movement from stereo images and detects gestures by matching against a motion model.

[0003]

[Problems to Be Solved by the Invention] However, each of the conventional methods described above has the following problems.

[0004] The methods described in References (1) and (2) estimate gestures from the two-dimensional movement of the hand. Gesture recognition therefore constrains the positional relationship between the user and the camera that captures the user, so the user cannot freely change position or posture.

[0005] The method described in Reference (3) does allow the user to change position and posture freely, but its model is too complex for high-speed processing.

[0006] The present invention has been made in view of these problems, and its object is to provide a gesture recognition device and method capable of high-speed recognition even when the user changes position and posture.

[0007]

[Means for Solving the Problems] A gesture recognition device according to the present invention that achieves the above object includes: feature point position measuring means for measuring, from the output of a sensor worn by a user or from image information, the three-dimensional coordinates in space of points that characterize the movement of the user's limbs; person position and posture estimating means for estimating the user's three-dimensional position and posture from the three-dimensional coordinates of those measured feature points other than the user's hands; person motion analysis means for constructing a user coordinate system whose reference point is the estimated three-dimensional position of the user and whose coordinate axes are uniquely determined by the user's posture, converting the feature points of the user's limbs into that user coordinate system to extract feature quantities that do not depend on the user's position and posture, and analyzing the time-series change of those feature quantities; and gesture estimating means for matching the time-series change against gesture models stored in a gesture model storage unit, which stores gesture models created in advance, and thereby estimating the gesture performed by the user. In the present invention, the gesture estimating means may be configured to provide two thresholds of equal absolute value and opposite polarity and, for the time-series change of one of the feature quantities, to estimate that a repetitive gesture has been made when frames in which the value exceeds the positive threshold and frames in which it falls below the negative threshold occur alternately and such frames account for more than a predetermined proportion of the frames within a predetermined time.

[0008]

[Embodiments of the Invention] Embodiments of the present invention are described below. In a preferred embodiment, the gesture recognition device of the present invention comprises feature point position measuring means (1 in FIG. 1), person position and posture estimating means (2 in FIG. 1), person motion analysis means (3 in FIG. 1), gesture estimating means (4 in FIG. 1), and gesture model storage means (5 in FIG. 1). Of these, the feature point position measuring means (1 in FIG. 1) measures the positions, in the world coordinate system, of feature points such as the user's hands and feet.

[0009] The person position and posture estimating means (2 in FIG. 1) estimates the user's three-dimensional position and posture from the three-dimensional coordinates of certain feature points other than the user's hands.

[0010] The person motion analysis means (3 in FIG. 1) constructs a user coordinate system whose reference point is one point of the estimated three-dimensional position of the user and whose coordinate axes are uniquely determined by the user's posture. By converting the feature points of the user's limbs into this user coordinate system, it extracts feature quantities that do not depend on the user's position or posture, and it then analyzes their time-series change.

[0011] The gesture estimating means (4 in FIG. 1) matches the time-series change of the feature points against gesture models created in advance and stored in the gesture model storage means (5 in FIG. 1), and estimates the gesture performed by the user.

[0012] In a preferred embodiment of the gesture recognition device of the present invention, the gesture estimating means examines the time-series change of one of the feature quantities and estimates that a repetitive gesture has been made when frames in which the value becomes very large and frames in which it becomes very small occur alternately and many such frames occur within a short time.

[0013] The gesture recognition method of the present invention comprises the following steps.

[0014] Step 1: From the output of a sensor worn by the user, or from image information obtained by imaging the user, measure the three-dimensional coordinates in space of points that characterize the movement of the user's limbs.

[0015] Step 2: Estimate the user's three-dimensional position and posture from the three-dimensional coordinates of those measured feature points other than the user's hands.

[0016] Step 3: Using one of the estimated three-dimensional positions of the user as the reference point, construct a user coordinate system whose coordinate axes are uniquely determined by the user's posture; convert the feature points of the user's limbs into the user coordinate system to extract feature quantities that do not depend on the user's position and posture; and analyze the time-series change of those feature quantities.

[0017] Step 4: Match the time-series change against gesture models stored in a gesture model storage unit, which stores gesture models created in advance, and estimate the gesture performed by the user.

[0018] Each of Steps 1 to 4 above may have its processing function implemented by program control executed on a data processing device. The present invention can be carried out by having the data processing device read and execute, from a storage medium, a program that implements the functions and processing of these means. The invention is described in detail below with reference to examples.

[0019]

[Examples] FIG. 1 is a block diagram showing the configuration of one embodiment of the gesture recognition device of the present invention. Referring to FIG. 1, this embodiment comprises a feature point position measurement unit 1, a person position and posture estimation unit 2, a person motion analysis unit 3, a gesture estimation unit 4, and a gesture model storage unit 5. The feature point position measurement unit 1 uses magnetic sensors attached to some or all of the feature points on the person's hands, feet, and torso to obtain, for each of these feature points, three-dimensional coordinates in a world coordinate system determined, for example, by the measuring equipment.

[0020] In this case, the measured feature point positions are expressed in a world coordinate system whose origin is the reference position of the measuring equipment. Even when the user makes the same gesture, the measured positions therefore take different values if the user's position or orientation differs. In this embodiment, the person position and posture estimation unit 2 accordingly estimates the user's position and posture, and the person motion analysis unit 3 constructs feature quantities that are not affected by the user's position or posture.

[0021] The operation of the person position and posture estimation unit 2 is described first. As a premise, estimating the user's position and posture requires that the three-dimensional positions in the world coordinate system of at least two feature points other than the user's hands be known. This is explained with reference to FIG. 2. In FIG. 2, it is assumed that the vertically upward direction coincides with one axis of the world coordinate system, and that the posture is expressed solely as a rotation about the vertically upward axis (the y-axis).

[0022] In the example shown in FIG. 2, the position of the user (person) is taken to be the position of the right knee, and the posture is obtained as the direction perpendicular to both the vector from the right knee to the left knee (the x-axis direction in the figure) and the vertically upward vector (the y-axis direction in the figure).

[0023] The person motion analysis unit 3 converts the positions of the feature points of the user's limbs into values in a user coordinate system that does not depend on the user's orientation or position. As shown in FIG. 2, for example, this coordinate system is a left-handed system whose reference point is the three-dimensional position of the user's right knee, whose positive y-axis points vertically upward, and whose positive z-axis points opposite to the direction the user is facing (from front to back). A right-handed system could be used instead, but a left-handed system is used here for convenience of explanation.

[0024] Described in this user coordinate system, an up/down/left/right pointing motion, for example, can be estimated from the x and y coordinates alone, which is convenient for gesture estimation. The time-series change of the values converted in this way is also obtained.
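The conversion into the user coordinate system described in paragraphs [0021] to [0024] can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names `user_frame` and `to_user` are hypothetical, the knee-to-knee vector is projected onto the horizontal plane before use, and the world coordinate system is assumed to have the same handedness as the user coordinate system (so that z = x × y).

```python
import math

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(p / n for p in a)

def user_frame(right_knee, left_knee, up=(0.0, 1.0, 0.0)):
    """Build (origin, x, y, z) of the user coordinate system (hypothetical helper)."""
    y_axis = normalize(up)
    knee = sub(left_knee, right_knee)
    # Assumption: drop any vertical component so that the x-axis is horizontal.
    horiz = sub(knee, tuple(dot(knee, y_axis) * c for c in y_axis))
    x_axis = normalize(horiz)
    # Assumption: the world frame has the same handedness as the user frame.
    z_axis = cross(x_axis, y_axis)
    return right_knee, x_axis, y_axis, z_axis

def to_user(point, frame):
    """Express a world-coordinate point in the user coordinate system."""
    origin, x_axis, y_axis, z_axis = frame
    d = sub(point, origin)
    return (dot(d, x_axis), dot(d, y_axis), dot(d, z_axis))
```

With this sketch, the same hand pose measured for a user who has moved and turned about the vertical axis yields the same user-frame coordinates, which is the property the text relies on.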

[0025] The gesture estimation unit 4 then estimates the gesture while referring to the gesture model values in the gesture model storage unit 5.

[0026] For example, up/down/left/right pointing motions are estimated using a model prepared in advance, as shown in FIG. 3.

[0027] In FIG. 3, PU (XU, YU), PD (XD, YD), PL (XL, YL), and PR (XR, YR) are the reference points for the up, down, left, and right pointing positions, respectively, and the four dotted circles centered on PU, PD, PL, and PR are the regions within which each pointing motion is inferred.

[0028] rU, rD, rL, and rR are the radii of these circles.

[0029] In each frame, the Euclidean distance between the right-hand position estimated by the person motion analysis unit 3 and each of these points is calculated.

[0030] If this distance places the hand inside a region used for inferring a pointing motion, the corresponding pointing motion is presumed to have been made. For example, if d is the Euclidean distance between the user's right hand in FIG. 2 and PU in FIG. 3, an upward pointing motion is presumed when d < rU.

[0031] As it stands, however, the user would be misrecognized as pointing whenever the hand happened to pass through one of these regions while making some other motion. To avoid this, a pointing gesture is recognized only when the right-hand position stays within a region for at least a predetermined time Tth.
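The region test of paragraphs [0029] and [0030], combined with the dwell-time condition of paragraph [0031], might be sketched as below. The class name `PointingDetector` is hypothetical, and Tth is assumed here to be expressed as a number of consecutive frames rather than in seconds.

```python
import math

class PointingDetector:
    """Illustrative sketch: circular target regions plus a dwell-time check."""

    def __init__(self, targets, dwell_frames):
        # targets maps a label to ((X, Y), r), e.g. {"up": ((XU, YU), rU)}.
        self.targets = targets
        # Assumption: Tth is given as a count of consecutive frames.
        self.dwell_frames = dwell_frames
        self.current = None
        self.count = 0

    def update(self, hand_xy):
        """Feed the hand's (x, y) in user coordinates; return a label or None."""
        hit = None
        for name, (center, radius) in self.targets.items():
            if math.dist(hand_xy, center) < radius:  # d < r
                hit = name
                break
        if hit is not None and hit == self.current:
            self.count += 1
        else:
            self.current = hit
            self.count = 1 if hit is not None else 0
        if hit is not None and self.count >= self.dwell_frames:
            return hit
        return None
```

A hand that merely passes through a region for one or two frames never reaches the dwell count, so no pointing gesture is reported for it.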

[0032] Because XU, YU, XD, YD, XL, YL, XR, YR, rU, rD, rL, rR, and Tth in FIG. 3 depend on the length of the user's arm and on the user's own sense of position, values suited to the user whose gestures are to be estimated are set and stored in the gesture model storage unit 5 in advance, and the user's gesture is estimated by referring to those stored values. Although the example described here uses the right hand, gestures made with the left hand can be recognized by the same processing.

[0033] Next, a second embodiment of the present invention is described. In the second embodiment, the feature point position measurement unit 1 of the first embodiment performs stereo image processing. In this case, as shown in FIG. 4, the user wears markers of special fluorescent colors at four locations on the hands and feet. Fluorescent colors are used to keep color detection accuracy from dropping with the lighting conditions or with the distance between marker and camera. It is effective to use marker colors that are far apart in color space, such as red, blue, yellow, and green. As a method that does not rely on color, markers that emit special light, as proposed for example in Reference (4) (Japanese Patent Laid-Open No. 9-243325), may be used at these positions.

[0034] Furthermore, stereo image processing requires that the relative position and orientation of the left and right cameras that image the subject be determined in advance. Here they are determined in advance by the method described, for example, in Reference (5) (Liu, "Determination of Camera Location from 2-D to 3-D Line and Point Correspondences", IEEE Transactions on PAMI (Pattern Analysis and Machine Intelligence), 1990, No. 12, pp. 28-37), using the positions at which five or more points with known relative positions in space appear in each image. What this method yields is the rotation matrix R and the translation vector $\vec{h}$ for converting from a left-handed system whose origin is the lens center of the left camera and whose Z-axis is the optical axis direction (see FIG. 5(b); hereinafter the "left camera coordinate system") to a left-handed system whose origin is the lens center of the right camera and whose z-axis is the optical axis direction (see FIG. 5(c); hereinafter the "right camera coordinate system").

[0035] Here, R and $\vec{h}$ can be expressed as follows.

[0036]

[0037] For convenience, this embodiment takes the left camera coordinate system as the world coordinate system, but the right camera coordinate system could be used as the world coordinate system instead.

[0038] The specific processing is described below. First, stereo images in which the user has been captured are input. The four marker regions are then extracted on the basis of color information.

[0039] Next, the centroid of each marker region is obtained in each of the stereo images, and the three-dimensional position of each marker in the world coordinate system is measured from those centroid positions.
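As a small illustration of the centroid step in paragraph [0039], the sketch below computes the centroid of one marker region given as a binary mask. The mask representation (a list of rows of 0/1 values, assumed to be the result of the color-based extraction) and the function name `marker_centroid` are assumptions made for this example only.

```python
def marker_centroid(mask):
    """Return the (x, y) centroid of the non-zero pixels in a binary mask.

    mask is a list of rows of 0/1 values; returns None if the marker is absent.
    """
    total = 0
    sum_x = 0
    sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        return None  # marker not visible in this image
    return (sum_x / total, sum_y / total)
```

The function would be applied once per marker color and per camera image, giving the image positions of each marker in the left and right images.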

[0040] Let the centroid positions of a marker in the left and right images be $P_L(x_l, y_l)$ and $P_R(x_r, y_r)$, respectively. The position vector of $P_L$ in the left camera coordinate system is then $\vec{x}_L = (x_l, y_l, f_l)^T$, and the position vector of $P_R$ in the right camera coordinate system is $\vec{x}_R = (x_r, y_r, f_r)^T$, where the superscript T denotes the transpose.

[0041] One point to note here is that part of a marker may be hidden by the body, so the region in which the marker appears differs between the left and right images. Consequently, $P_L$ and $P_R$, which indicate the centroids of the marker regions, do not usually satisfy the epipolar constraint, and the lines $C_L P_L$ and $C_R P_R$ do not usually intersect at a single point M.

[0042] Therefore, as shown in the enlarged view in FIG. 5(a), M is defined as the position that is equidistant from the lines $C_L P_L$ and $C_R P_R$ and at which that distance is minimized.

[0043] Here, let $M_L$ and $M_R$ be the feet of the perpendiculars dropped from M onto the lines $C_L P_L$ and $C_R P_R$, and let $\overrightarrow{C_L M_L} = t\,\vec{x}_L$ and $\overrightarrow{C_R M_R} = s\,\vec{x}_R$. Then,

[0044]

[0045] solving the simultaneous equations (2) and (3) above gives

[0046]

[0047] Once t and s have been obtained from equation (4) above, the position vector $\vec{x}_M$ of the point M in the world coordinate system is obtained as the following equation (5).

[0048]

[0049] The above is the operation of the feature point position measurement unit 1, which performs the per-frame image processing.

[0050] The operation from the person position and posture estimation unit 2 onward is the same as in the first embodiment.
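The marker position estimate of paragraphs [0041] to [0047], in which M is taken midway along the shortest segment joining the two viewing rays, can be sketched as follows. This is not a reproduction of the patent's equations (1) to (5). It is a generic least-squares midpoint computation under an assumed convention, namely that a point with left-camera coordinates p has right-camera coordinates R p + h, so that the right lens center is at -Rᵀh in left-camera (world) coordinates. The function names are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def transpose_mul(R, v):
    """Return R^T v for a 3x3 matrix R given as a list of rows."""
    return tuple(sum(R[i][j] * v[i] for i in range(3)) for j in range(3))

def triangulate_midpoint(x_l, x_r, R, h):
    """Midpoint of the common perpendicular of the two viewing rays.

    x_l: ray direction (x_l, y_l, f_l) in left-camera coordinates.
    x_r: ray direction (x_r, y_r, f_r) in right-camera coordinates.
    Assumption: right = R * left + h.
    """
    a = tuple(x_l)                                   # left ray direction, origin C_L = 0
    b = transpose_mul(R, x_r)                        # right ray direction in left coords
    c = tuple(-v for v in transpose_mul(R, h))       # right lens center C_R in left coords
    aa, bb, ab = dot(a, a), dot(b, b), dot(a, b)
    ac, bc = dot(a, c), dot(b, c)
    det = ab * ab - aa * bb
    if abs(det) < 1e-12:
        raise ValueError("viewing rays are parallel")
    # t and s minimize |t*a - (c + s*b)|^2.
    t = (ab * bc - bb * ac) / det
    s = (aa * bc - ab * ac) / det
    m_l = tuple(t * v for v in a)                    # foot of perpendicular on left ray
    m_r = tuple(ci + s * bi for ci, bi in zip(c, b)) # foot of perpendicular on right ray
    return tuple((p + q) / 2 for p, q in zip(m_l, m_r))
```

When the two rays happen to intersect, both feet coincide and the function returns the intersection point itself.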

[0051] Next, a third embodiment of the present invention is described. In the third embodiment, the gesture estimation unit 4 of the first embodiment additionally estimates repetitive motions such as "beckoning" and "bye-bye" (waving). This exploits the fact that, after the coordinate conversion by the person motion analysis unit 3, the coordinates of the user's hand feature point change along the x-axis when the user makes a motion such as "bye-bye", and along the z-axis when the user makes a motion such as "beckoning".

[0052] FIG. 6 models the characteristics of a repetitive motion. It shows the relationship between the frame number and the change in the feature point's coordinate in the user coordinate system when the user makes a "bye-bye" motion. The description below deals with "bye-bye", but the same applies to "beckoning".

[0053] First, for recognition, arrays Pi[j] (where i = 0, 1 and j = 0, ..., n_max - 1) are prepared to hold information about the past several frames.

[0054] P0[j] and P1[j] store the information needed to recognize "beckoning" and "bye-bye", respectively, and a separate array is prepared for each gesture.

[0055] In FIG. 6, the frame number on the horizontal axis is the number of image frames since capture began. Pi[j] is initialized to 0 for all indices i and j. Taking j to be the remainder of the frame number divided by n_max, information is recorded from the current frame back to n_max frames earlier.

[0056] Next, in the per-frame processing, the absolute value of the coordinate change along each axis after the coordinate conversion is first obtained.

[0057] If, among these coordinate changes, the change along the x-axis exceeds a predetermined threshold TH0, or falls below the threshold -TH0 (the negative value obtained by inverting the polarity of TH0), part of a "bye-bye" motion is regarded as being performed. When the change exceeds TH0, P1[j0] = 1 is set (j0 being the value of j for that frame).

[0058] When the change along the x-axis falls below the threshold -TH0, P1[j0] = -1 is set (j0 being the value of j for that frame).

[0059] In all other cases, P1[j] = 0 is set.

[0060] Similarly, when the change along the z-axis exceeds a predetermined threshold TH1 or falls below the threshold -TH1, part of a "beckoning" motion is regarded as being performed, and P0[j] = 1 or P0[j] = -1 is set, respectively (where j is the index derived from the frame number). Otherwise, P0[j] = 0 is set.

[0061] After Pi[j] has been updated in this way, the number of elements from Pi[0] to Pi[n_max - 1] whose value is 1 or -1 is counted. This count is d1.

[0062] In addition, the sum of Pi[0] through Pi[n_max - 1] is computed. This sum is d2.

[0063] If d1 exceeds a fixed value m_i (i = 0, 1) and d2 is 0, 1, or -1, the corresponding gesture is presumed to have been made.

[0064] d1 exceeding m_i means that large movements along the particular axis occurred frequently, and d2 being 0, 1, or -1 means that those large movements occurred about equally often in the positive and negative directions.
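The ring-buffer procedure of paragraphs [0053] to [0064] could be sketched as follows for a single axis and a single gesture. The class name `RepetitionDetector` is hypothetical, and the sketch assumes that the signed per-frame coordinate change is what gets compared against the thresholds, since the comparison against the negative threshold needs the sign.

```python
class RepetitionDetector:
    """Illustrative sketch of the d1/d2 test for one axis (e.g. x for "bye-bye")."""

    def __init__(self, threshold, n_max, m):
        self.threshold = threshold   # TH0 or TH1
        self.n_max = n_max           # number of past frames kept
        self.m = m                   # m_i: minimum count of large movements
        self.p = [0] * n_max         # Pi[j], all zero initially

    def update(self, frame_number, delta):
        """Record this frame's signed coordinate change; return True if detected."""
        j = frame_number % self.n_max
        if delta > self.threshold:
            self.p[j] = 1
        elif delta < -self.threshold:
            self.p[j] = -1
        else:
            self.p[j] = 0
        d1 = sum(1 for v in self.p if v != 0)  # how many large movements
        d2 = sum(self.p)                       # balance of positive vs negative
        return d1 > self.m and d2 in (-1, 0, 1)
```

A hand that keeps moving in one direction produces a large d1 but also a large d2, so it is rejected; only movements that alternate in sign about equally often pass both conditions.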

[0065] Because TH0, TH1, n_max, and m_i depend on the processing speed of the system and on how fast the user moves the hand, values suited to both are set and stored in advance in the gesture model storage unit 5 and are referred to in each frame.

[0066]

[Effects of the Invention] As described above, the present invention provides the following effects.

[0067] The first effect of the present invention is that gestures can be estimated independently of the user's position and posture.

[0068] This is because the present invention converts the user's feature points from the world coordinate system into a user coordinate system that is uniquely determined by the user's posture and is referenced to one of the user's feature points.

[0069] The second effect of the present invention is that repetitive motions performed by the user can be estimated at high speed.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the present invention.

FIG. 2 is a diagram for explaining one embodiment of the present invention, showing the user coordinate system.

FIG. 3 is a diagram for explaining one embodiment of the present invention, modeling the characteristics of up, down, left, and right pointing motions.

FIG. 4 is a diagram for explaining one embodiment of the present invention, showing an example of how the user wears the markers.

FIG. 5 is a diagram for explaining another embodiment of the present invention, in which (a) illustrates marker position estimation, (b) the left camera coordinate system, and (c) the right camera coordinate system.

FIG. 6 is a diagram for explaining another embodiment of the present invention, modeling the characteristics of a repetitive motion.

[Explanation of symbols]

1 Feature point position measurement unit
2 Person position/posture estimation unit
3 Person motion analysis unit
4 Gesture estimation unit
5 Gesture model storage unit

Continuation of front page (58) Fields investigated (Int. Cl.7, DB name): G06T 1/00; G06T 7/00 - 7/60; G01B 11/00

Claims

(57) [Claims]

1. A gesture recognition device comprising: feature point position measuring means for measuring the three-dimensional coordinates in space of points indicating features of a user's limb movement, from output information of a sensor worn by the user or from image information obtained by imaging the user; person position/posture estimating means for estimating the three-dimensional position and posture of the user from the three-dimensional coordinates of those measured feature points other than the user's hands; person motion analysis means for constructing a user coordinate system in which each coordinate axis is uniquely determined according to the posture of the user, with one of the estimated three-dimensional positions of the user as a reference point, converting the feature points of the user's limbs into the user coordinate system so as to extract feature amounts that do not depend on the position and posture of the user, and analyzing the time-series change of those feature amounts; gesture model storage means for storing gesture models created in advance; and gesture estimating means for matching the time-series change of the feature amounts against the gesture models stored in the gesture model storage means and estimating the gesture performed by the user.

2. The gesture recognition device according to claim 1, wherein the gesture estimating means is provided with two thresholds having the same absolute value and opposite signs, and, for the time-series change of one feature amount, estimates that the gesture is a repetitive motion when frames whose value is larger than the positive threshold and frames whose value is smaller than the negative threshold occur alternately and such frames account for at least a predetermined ratio within a predetermined time.
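As an illustration only, the two-threshold test of claim 2 might be sketched as follows. The function name, the parameters, the use of a fixed window of frames as the "predetermined time", and the minimum number of sign changes are all assumptions; the claim does not fix how alternation or the ratio is computed.

```python
def is_repetitive(values, threshold, min_ratio, min_alternations=2):
    """Hypothetical repetitive-motion test over one window of frames.

    values           -- one feature amount per frame (for example a hand
                        velocity component in user coordinates; assumed)
    threshold        -- positive threshold T; the negative one is -T
    min_ratio        -- minimum share of frames lying beyond either threshold
    min_alternations -- minimum number of sign changes between successive
                        beyond-threshold frames (assumed parameter)
    """
    if not values:
        return False
    # Keep only the sign of frames that lie beyond one of the two thresholds.
    signs = []
    for v in values:
        if v > threshold:
            signs.append(1)
        elif v < -threshold:
            signs.append(-1)
    ratio = len(signs) / len(values)
    alternations = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return ratio >= min_ratio and alternations >= min_alternations
```

Because the test only counts threshold crossings in a short window, it needs no alignment against a stored template, which is one plausible reading of why repetitive motions can be estimated quickly.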

3. The gesture recognition device according to claim 1, wherein the gesture model storage means stores, as a gesture model, the range of a region for estimating each pointing action, the region containing the reference point of that pointing action in the user coordinate system, and the gesture estimating means estimates that the pointing action corresponding to a region has been performed when the feature point estimated by the person motion analysis means stays within that region for a predetermined time or longer.
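The dwell-time test of claim 3 can be sketched as below. Representing each region as an axis-aligned box in user coordinates, measuring the dwell time in consecutive frames, and the names `detect_pointing`, `regions`, and `min_frames` are assumptions for illustration, not details taken from the specification.

```python
def detect_pointing(track, regions, min_frames):
    """Return the label of the first region the feature point stays in
    for at least min_frames consecutive frames, or None.

    track   -- sequence of (x, y, z) points in user coordinates
    regions -- dict mapping a label to (low_corner, high_corner) of an
               axis-aligned box (assumed region shape)
    """
    current, run = None, 0
    for point in track:
        # Find the region, if any, that contains the point in this frame.
        label = None
        for name, (low, high) in regions.items():
            if all(l <= c <= h for l, c, h in zip(low, point, high)):
                label = name
                break
        if label is not None and label == current:
            run += 1
        else:
            current, run = label, (1 if label is not None else 0)
        if label is not None and run >= min_frames:
            return label
    return None
```

For example, with a "right" box placed to the user's right and `min_frames=5`, five consecutive frames inside that box return `"right"`, while a hand that merely passes through returns `None`.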

4. A pointing action recognition device comprising: means for measuring, as feature points, the three-dimensional coordinates of points indicating features of the movement of one or more predetermined parts of an object whose pointing action is to be estimated, from the output of a sensor attached to the object or from image information obtained by imaging the object; means for estimating the three-dimensional position and posture of the object from the three-dimensional coordinates of the feature points of a predetermined part of the object among the measured feature points; means for constructing a coordinate system in which each coordinate axis is uniquely determined according to the posture of the object, with one of the estimated three-dimensional positions of the feature points of the object as a reference point, converting the feature points of the one or more predetermined parts of the object into that coordinate system so as to extract feature amounts that do not depend on the position and posture of the object, and analyzing the time-series change of those feature amounts; and means for matching the time-series change of the feature amounts against an action model stored in advance in storage means and estimating the pointing action of the object.

5. The pointing action recognition device according to claim 4, wherein the means for estimating the pointing action of the object estimates that the action is a repetitive action when, for the time-series change of one feature amount, the frequency of appearance of frames in which the feature amount alternately goes beyond predetermined first and second thresholds of mutually different polarity is equal to or higher than a predetermined value.

6. A gesture recognition method comprising the steps of: (a) measuring the three-dimensional coordinates in space of points indicating features of a user's limb movement, from output information of a sensor worn by the user or from image information obtained by imaging the user; (b) estimating the three-dimensional position and posture of the user from the three-dimensional coordinates of those extracted feature points other than the user's hands; (c) constructing a user coordinate system in which each coordinate axis is uniquely determined according to the posture of the user, with one of the estimated three-dimensional positions of the user as a reference point, converting the feature points of the user's limbs into the user coordinate system so as to extract feature amounts that do not depend on the position and posture of the user, and analyzing the time-series change of those feature amounts; and (d) matching the time-series change of the feature amounts against the gesture models stored in gesture model storage means, which stores gesture models created in advance, and estimating the gesture performed by the user.

7. The gesture recognition method according to claim 3, wherein, in the step (d), two thresholds having the same absolute value and opposite signs are provided, and, for the time-series change of one feature amount, the gesture is estimated to be a repetitive motion when frames whose value is larger than the positive threshold and frames whose value is smaller than the negative threshold occur alternately and such frames account for at least a predetermined ratio within a predetermined time.

8. The gesture recognition device according to claim 1, wherein a magnetic sensor is used as said sensor.

9. The gesture recognition device according to claim 1, wherein the feature point position measuring means measures the three-dimensional coordinates in space of the points indicating features of the user's limb movement by performing stereo image processing on image information captured by two cameras.
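For orientation, one common form of stereo image processing is triangulation from a rectified camera pair, sketched below. This is a generic textbook formulation and not necessarily the procedure of the embodiment; the rectified, parallel-axis camera geometry and every parameter name are assumptions.

```python
def triangulate_rectified(xl, yl, xr, focal_px, baseline_m, cx=0.0, cy=0.0):
    """Recover a 3D point in the left camera frame from a rectified pair.

    xl, yl     -- pixel position of the marker in the left image
    xr         -- pixel column of the same marker in the right image
    focal_px   -- focal length in pixels (assumed equal for both cameras)
    baseline_m -- distance between the two camera centers in meters
    cx, cy     -- principal point in pixels (assumed shared by both cameras)
    """
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("non-positive disparity: point cannot be triangulated")
    z = focal_px * baseline_m / disparity   # depth along the optical axis
    x = (xl - cx) * z / focal_px            # lateral offset
    y = (yl - cy) * z / focal_px            # vertical offset (image convention)
    return (x, y, z)
```

The resulting camera-frame coordinates would still have to be mapped into the world coordinate system before the user coordinate system is constructed.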

10. A recording medium on which is recorded a program for causing a computer to execute the following processes (a) to (d): (a) a process of measuring the three-dimensional coordinates in space of points indicating features of a user's limb movement, from output information of a sensor worn by the user or from image information obtained by imaging the user; (b) a process of estimating the three-dimensional position and posture of the user from the three-dimensional coordinates of those extracted feature points other than the user's hands; (c) a process of constructing a user coordinate system in which each coordinate axis is uniquely determined according to the posture of the user, with one of the estimated three-dimensional positions of the user as a reference point, converting the feature points of the user's limbs into the user coordinate system so as to extract feature amounts that do not depend on the position and posture of the user, and analyzing the time-series change of those feature amounts; and (d) a process of matching the time-series change of the feature amounts against the gesture models stored in gesture model storage means, which stores gesture models created in advance, and estimating the gesture performed by the user.