JP2000331170A

JP2000331170A - Hand motion recognizing device

Info

Publication number: JP2000331170A
Application number: JP11141483A
Authority: JP
Inventors: Akira Utsumi (内海 章)
Original assignee: ATR Media Integration and Communication Research Laboratories
Current assignee: ATR Media Integration and Communication Research Laboratories
Priority date: 1999-05-21
Filing date: 1999-05-21
Publication date: 2000-11-30

Abstract

PROBLEM TO BE SOLVED: To provide a hand motion recognizing device which can stably recognize the motions of a plurality of hands in real time. SOLUTION: This hand motion recognizing device captures images of a plurality of hands from different directions by means of a plurality of cameras. A viewpoint selecting section 4 selects the viewpoints (cameras) best suited to processing from among the plurality of viewpoints. A feature extracting section 6 selects, based on the feature points of the images, images in which no occlusion occurs. A tracking processing section 8 predicts the positions of the hands by means of a Kalman filter. The viewpoint selecting section 4 operates based on the predicted results. The tracking processing section 8 selects the images used to recognize the shapes of the hands based on the rotation angles of the hands, and recognizes the shapes of the hands based on P-type Fourier descriptors. The hand motion recognizing device recognizes hand motion by integrating the shapes of the hands.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a hand gesture recognition device, and more particularly to a hand gesture recognition device that recognizes the movements of a plurality of hands in real time.

[0002]

[Description of the Related Art] Many systems that use hand gestures have been proposed as intuitive, easy-to-use interfaces. Most of these systems, as typified by the DataGlove (registered trademark), require the user to wear special gloves fitted with sensors. This places a heavy burden on the user, such as the trouble of putting the gloves on and taking them off, and such systems have not come into wide use. To realize an equivalent system without contact, methods that detect hand gestures by image processing have been proposed.

[0003]

[Problems to Be Solved by the Invention] However, in conventional systems based on image processing, deformation of the captured image caused by changes in the posture of the hand relative to the optical axis of the camera is a major problem. Examples are occlusion between hands and self-occlusion (for instance, a hand with the index finger extended is hard to distinguish from a hand with the index and middle fingers extended when the hand is observed from the side).

[0004] In a system based on image processing, reconstructing and tracking three-dimensional information requires matching detected feature points against the images. However, when an object such as a hand moves, the occlusion and shape changes described above make it difficult to detect feature points continuously and to match them.

[0005] To solve this problem, the present inventor has proposed a method for estimating the position and posture of a hand using multiple cameras (Japanese Patent Laying-Open No. 10-63864; Akira Utsumi, Tsutomu Miyasato, Fumio Kishino, Jun Ohya, Ryohei Nakatsu: Hand posture estimation from multi-viewpoint images using distance transformation, Journal of the Institute of Image Information and Television Engineers, Vol. 51, No. 12, pp. 2116-2125 (1997)). With this method, robust features obtained from a plurality of viewpoints make it possible to obtain the three-dimensional position and posture stably regardless of changes in hand shape. However, this method cannot process a plurality of hands simultaneously, and it cannot recognize the shape of the hand.

[0006] Separately, research has been conducted on tracking a plurality of hands with stereo cameras using a prediction filter such as a Kalman filter (Ali Azarbayejani and Alex Pentland. Real-time self-calibrating stereo person tracking using 3-D shape estimation from blob features. In 13th International Conference on Pattern Recognition, pp. 627-632, 1996). However, although this technique can track the positions of both hands, it cannot recognize posture or shape.

[0007] Research aimed at shape recovery using a three-dimensional model of the fingers has also been conducted (Rehg, J.M. and Kanade, T.: Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking, Computer Vision-ECCV '94, LNCS vol. 801, pp. 35-46 (1994); Masayuki Nakajima, Shiba (柴 広有): Finger motion detection method for constructing a virtual reality world, Graphics and CAD 67-6, pp. 41-46 (1994); Yoshio Iwai, Yasushi Yagi, Masahiko Yachida: Estimation of three-dimensional hand motion and position from monocular image sequences, IEICE Transactions (D-II), Vol. J80-D-II, No. 1, pp. 44-55 (1997)). While these techniques can be expected to yield precise posture information, the finger joints have many degrees of freedom, so the computational cost becomes enormous. Moreover, avoiding occlusion has not been considered, and estimation again becomes difficult when the image features on which the processing depends cannot be detected because of occlusion.

[0008] An object of the present invention is therefore to provide a hand gesture recognition device capable of stably recognizing the movements of a plurality of hands.

[0009]

[Means for Solving the Problems] The hand gesture recognition device according to claim 1 recognizes the movements of a plurality of hands and comprises: a plurality of imaging means that photograph the plurality of hands from mutually different directions to obtain a plurality of images; first selection means that selectively outputs, from among the plurality of images, images for recognizing the movements of the plurality of hands; feature extraction means that extracts feature points from the images selected by the first selection means and, based on the feature points, selects the images to be tracked; and a plurality of tracking processing means, one provided for each of the plurality of hands, each tracking the movement of its corresponding hand. Each of the tracking processing means includes: prediction means that predicts the state of the hand using the feature points of the images to be tracked; second selection means that calculates the normal direction of the hand based on the images to be tracked and selects the image obtained by photographing the hand from the direction closest to the normal direction; and recognition means that recognizes the shape of the hand based on the image selected by the second selection means. The first selection means selects, from among the plurality of images, images in which occlusion is judged unlikely to occur given the state predicted by the prediction means.

[0010] The hand gesture recognition device according to claim 2 is the hand gesture recognition device according to claim 1, wherein the feature extraction means detects the occurrence of occlusion from the ratio between the position of the extracted feature point and the width of the hand, and deselects any image in which occlusion is detected.

[0011] The hand gesture recognition device according to claim 3 is the hand gesture recognition device according to claim 2, wherein the recognition means includes: contour extraction means that extracts the contour of the hand in the image selected by the second selection means; P-type Fourier description means that describes the hand contour extracted by the contour extraction means with a P-type Fourier descriptor; and shape identification means that identifies the shape of the hand based on the P-type Fourier descriptor from the P-type Fourier description means.

[0012]

[Embodiments of the Invention] [Embodiment 1] A hand gesture recognition device according to an embodiment of the present invention will be described with reference to the drawings. In the drawings, the same or corresponding parts are denoted by the same reference characters, and their description is not repeated. The hand gesture recognition device according to the embodiment selects, from among many viewpoints that photograph the hands from different directions, a plurality of viewpoints best suited to processing, and thereby recognizes the movements of a plurality of hands (including position, posture, and shape) stably and in real time. A hand gesture recognition device 1000 that tracks the right hand and the left hand is described below as an example.

[0013] FIG. 1 is a block diagram showing the configuration of the main part of the hand gesture recognition device 1000 according to Embodiment 1 of the present invention. The hand gesture recognition device 1000 tracks the movements of the right and left hands using consecutive images obtained from a plurality of viewpoints. The hand gesture recognition device 1000 shown in FIG. 1 includes cameras 2#1 to 2#n, a viewpoint selection unit 4, a feature extraction unit 6, a tracking processing unit 8, and a gesture recognition unit 14.

[0014] Cameras 2#1 to 2#n photograph the hands from mutually different directions to obtain hand images. The viewpoint selection unit 4 selects, from the plurality of images obtained by the plurality of cameras, the images best suited to processing. The feature extraction unit 6 calculates distance transform values (skeleton values) using the images received from the selected cameras and extracts center-of-gravity points (feature points). At this stage, images in which occlusion between hands has occurred are excluded from shape recognition.

[0015] The tracking processing unit 8 includes a tracking unit 10 for the left hand and a tracking unit 12 for the right hand. Each of the tracking units 10 and 12 predicts the movement of its corresponding hand with a Kalman filter, as described later. The viewpoint selection unit 4 selects the viewpoints (cameras) for the next frame based on the position prediction results. This keeps the probability of occlusion low.

[0016] The tracking units 10 and 12 further select, based on the rotation angle of the hand, one viewpoint to be processed and recognize the shape of the corresponding hand. The gesture recognition unit 14 recognizes a gesture based on the results from the tracking units 10 and 12. A command corresponding to the recognition result is issued to the application program.

[0017] The configuration of the feature extraction unit 6 is described with reference to FIG. 2. FIG. 2 is a block diagram outlining the configuration of the feature extraction unit 6 shown in FIG. 1. Referring to FIG. 2, the feature extraction unit 6 includes area dividing units 20#1 to 20#n, main axis detection units 22#1 to 22#n, rotation conversion units 24#1 to 24#n, and feature point calculation units 26#1 to 26#n. The area dividing units 20#1 to 20#n, main axis detection units 22#1 to 22#n, rotation conversion units 24#1 to 24#n, and feature point calculation units 26#1 to 26#n are each provided in correspondence with the cameras 2#1 to 2#n.

[0018] In the following, the area dividing units 20#1 to 20#n are referred to generically as the area dividing unit 20, the main axis detection units 22#1 to 22#n as the main axis detection unit 22, the rotation conversion units 24#1 to 24#n as the rotation conversion unit 24, and the feature point calculation units 26#1 to 26#n as the feature point calculation unit 26.

[0019] The area dividing unit 20 segments the skin-color region in the input image obtained by the corresponding camera using color information and luminance information. The main axis detection unit 22 obtains the average edge direction from the output of a Sobel filter and takes it as the two-dimensional direction of the hand (main axis).

[0020] The rotation conversion unit 24 rotates the image, based on the hand direction obtained by the main axis detection unit 22, so that the fingertips point upward. The feature point calculation unit 26 receives the output of the rotation conversion unit 24 and calculates, for each pixel making up the hand, a distance transform value indicating the shortest distance from that pixel to the contour of the hand image.

[0021] For example, the area dividing unit 20 outputs an input binary image (silhouette image) such as that shown in FIG. 3(a). The main axis detection unit 22 applies a Sobel filter to the silhouette image to obtain the average edge direction of the hand region. This average edge direction is regarded as the fingertip direction of the hand (symbol Φ in the figure), as shown in FIG. 3(b) (Koichi Ishibuchi, Keisuke Iwasaki, Haruo Takemura, Fumio Kishino: Real-time hand gesture estimation using image processing and its application to human interfaces, IEICE Transactions (D-II), Vol. J79-D-II, No. 7, pp. 1218-1229 (1996)).
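The main axis estimate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say how edge directions are averaged, so the sketch uses doubled-angle averaging weighted by squared gradient magnitude, and the function name `mean_edge_direction` is hypothetical. It returns an orientation in [0, pi), so it cannot by itself tell fingertip-up from fingertip-down.

```python
import math

# Standard 3x3 Sobel kernels (horizontal and vertical derivatives).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def mean_edge_direction(silhouette):
    """Return the mean edge orientation of a binary image, in radians in [0, pi).

    silhouette is a list of rows containing 0 (background) or 1 (hand).
    Border pixels are skipped because the 3x3 kernel does not fit there.
    """
    height, width = len(silhouette), len(silhouette[0])
    sum_cos2 = 0.0  # accumulates |g|^2 * cos(2 * gradient angle)
    sum_sin2 = 0.0  # accumulates |g|^2 * sin(2 * gradient angle)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = silhouette[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            # Doubling the angle makes opposite gradients reinforce
            # each other instead of cancelling (assumed averaging rule).
            sum_cos2 += gx * gx - gy * gy
            sum_sin2 += 2 * gx * gy
    gradient_angle = 0.5 * math.atan2(sum_sin2, sum_cos2)
    # Edges run perpendicular to the gradient.
    return (gradient_angle + math.pi / 2) % math.pi
```

For a tall, narrow silhouette the function returns an angle of about pi/2, that is, a vertical main axis in image coordinates.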

[0022] The rotation conversion unit 24 rotates the image so that the average edge direction (fingertip direction) Φ obtained in FIG. 3(b) points upward in the image. For example, the image is rotated by the rotation angle γi. This yields the image shown in FIG. 3(c).

[0023] The feature point calculation unit 26 receives the image shown in FIG. 3(c) and calculates the distance transform image shown in FIG. 3(d). In FIG. 3(d), pixels with larger distance transform values are drawn darker and pixels with smaller values are drawn lighter, so the pixels darken gradually with distance from the hand contour. The point with the largest distance transform value (skeleton value), marked COG in the figure, is then extracted. This point is the center of gravity (feature point).
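The distance transform and the extraction of the point with the largest skeleton value can be sketched as follows. The source does not name a distance metric, so this sketch assumes the city-block (4-neighbour) metric and treats everything outside the image as background; the function names are hypothetical.

```python
from collections import deque


def distance_transform(silhouette):
    """City-block distance from every hand pixel to the nearest background pixel.

    silhouette is a list of rows containing 0 (background) or 1 (hand).
    Background pixels get 0. The area outside the image counts as background.
    """
    height, width = len(silhouette), len(silhouette[0])
    # Pad with a background border so hand pixels on the image edge get distance 1.
    padded = ([[0] * (width + 2)]
              + [[0] + list(row) + [0] for row in silhouette]
              + [[0] * (width + 2)])
    dist = [[0 if value == 0 else None for value in row] for row in padded]
    # Multi-source breadth-first search starting from all background pixels.
    queue = deque((y, x)
                  for y in range(height + 2)
                  for x in range(width + 2)
                  if padded[y][x] == 0)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height + 2 and 0 <= nx < width + 2 and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    # Strip the padding again.
    return [row[1:-1] for row in dist[1:-1]]


def center_of_gravity(silhouette):
    """Return (y, x, skeleton_value) of the pixel with the largest distance value.

    Ties are resolved in favour of the first pixel in row-major order.
    """
    best = (0, 0, 0)
    for y, row in enumerate(distance_transform(silhouette)):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (y, x, value)
    return best
```

A Euclidean distance transform would give smoother skeleton values; the city-block version is used here only because it fits in a short, dependency-free example.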

[0024] Let s be the distance between center-of-gravity points. If the ratio (w/s) of the vertical width w of the hand silhouette to the distance s exceeds a certain threshold, occlusion between hands is judged to have occurred. Images in which no occlusion has occurred are passed to the tracking processing unit 8 for tracking, and images in which occlusion has occurred are excluded from tracking.
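The ratio test above amounts to a one-line predicate plus a filter over cameras. The sketch below assumes per-camera measurements are already available as `(w, s)` pairs; the function names and the threshold value used in the usage note are hypothetical, since the source gives no numeric threshold.

```python
def is_hand_occluded(silhouette_width, s, ratio_threshold):
    """Return True when the ratio w / s exceeds the threshold."""
    if s <= 0:
        raise ValueError("s must be positive")
    return silhouette_width / s > ratio_threshold


def images_for_tracking(measurements, ratio_threshold):
    """Keep only the cameras whose image passes the occlusion test.

    measurements maps a camera id to a (w, s) pair for that camera's image.
    """
    return [camera
            for camera, (w, s) in measurements.items()
            if not is_hand_occluded(w, s, ratio_threshold)]
```

With an illustrative threshold of 4.0, `images_for_tracking({1: (30, 10), 2: (80, 10)}, 4.0)` keeps camera 1 (ratio 3.0) and drops camera 2 (ratio 8.0).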

[0025] Next, the tracking processing unit 8 is described with reference to FIG. 4. FIG. 4 is a block diagram outlining the configuration of the tracking processing unit 8 shown in FIG. 1. Referring to FIG. 4, the tracking processing unit 8 includes three-dimensional position/direction detection units 30#1 and 30#2, rotation angle detection units 31#1 and 31#2, viewpoint selection units 32#1 and 32#2, and hand shape recognition units 33#1 and 33#2.

[0026] The three-dimensional position/direction detection unit 30#1, rotation angle detection unit 31#1, viewpoint selection unit 32#1, and hand shape recognition unit 33#1 are included in the tracking unit 10 for the left hand. The three-dimensional position/direction detection unit 30#2, rotation angle detection unit 31#2, viewpoint selection unit 32#2, and hand shape recognition unit 33#2 are included in the tracking unit 12 for the right hand.

[0027] In the following, the three-dimensional position/direction detection units 30#1 and 30#2 are referred to generically as the three-dimensional position/direction detection unit 30, the rotation angle detection units 31#1 and 31#2 as the rotation angle detection unit 31, the viewpoint selection units 32#1 and 32#2 as the viewpoint selection unit 32, and the hand shape recognition units 33#1 and 33#2 as the hand shape recognition unit 33.

[0028] The three-dimensional position/direction detection unit 30 predicts the movement of the corresponding hand with a Kalman filter, based on the feature points obtained from the images subject to tracking. The rotation angle detection unit 31 detects the rotation angle of the corresponding hand. The viewpoint selection unit 32 selects, based on the detection result of the rotation angle detection unit 31, the image (camera) used to recognize the shape of the corresponding hand from among the images (cameras) subject to tracking. The hand shape recognition unit 33 recognizes the shape of the corresponding hand based on the selected camera and the rotation angle of the corresponding hand.

[0029] The three-dimensional position/direction detection unit 30 is described with reference to FIG. 5. FIG. 5 illustrates the observation model used in the Kalman filter. In FIG. 5, the symbols X, Y, and Z denote the three axes of the world coordinate system, the subscript i denotes the camera number, and the subscript j denotes the left or right hand. Image 1 is obtained by observation with camera 2#i.

[0030] Suppose that a hand (left or right) at position (Xhj, Yhj, Zhj) is observed by camera 2#i (i being any of 1 to n) at position (Xci, Yci). This observation is assumed to contain a Gaussian error whose covariance is denoted Σ.

[0031] Here, Lhj,i denotes the distance between hand hj and camera 2#i, and li denotes the focal length of camera Ci. whj,i denotes the angle between the epipolar line and the Y-Z plane, and vhj,i denotes the angle between the Z axis and the projection of the epipolar line onto the Y-Z plane. Rwhj,i and Rvhj,i denote the rotation matrices that rotate the corresponding epipolar line to make it parallel to the Z axis.

[0032] Assuming that the hand moves at a constant velocity, the position Xhj,t of the point is expressed by equations (1) to (4). Equations (2) to (4) express the velocity of hand hj along the X, Y, and Z axes, respectively. The symbol "'" attached to a matrix denotes transposition.

[0033]

[Equation 1]

[0034] Suppose that hand hj is observed by N cameras. Let ∧Xhj,t−1 be the estimate of the position of hand hj at time (t−1), and let ∧Shj,t−1 be the covariance matrix of the estimate ∧Xhj,t−1. The state at time t is expressed by equations (5) to (7). In the equations, the marks "∧" and "/", shown here as prefixes, are written above the corresponding variable.

[0035]

[Equation 2]
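Because the equation images are not reproduced in this text, the prediction step can only be illustrated in its conventional textbook form. The sketch below assumes a six-element state ordered as position then velocity, `[X, Y, Z, dX, dY, dZ]`, a constant-velocity transition matrix F, and the standard Kalman prediction (predicted state = F times previous estimate, predicted covariance = F S F' + Q). The state layout and all function names are assumptions and may differ from equations (1) to (7).

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*a)]


def constant_velocity_transition(dt):
    """6x6 transition matrix: each position advances by dt times its velocity."""
    f = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
    for axis in range(3):
        f[axis][axis + 3] = dt
    return f


def predict(x_hat, s_hat, f, q):
    """Standard Kalman prediction step.

    x_hat, s_hat: previous state estimate and its covariance.
    f: transition matrix, q: covariance of the transition error.
    Returns the predicted state and predicted covariance.
    """
    x_bar = [sum(f[i][k] * x_hat[k] for k in range(len(x_hat)))
             for i in range(len(f))]
    fsf = mat_mul(mat_mul(f, s_hat), transpose(f))
    s_bar = [[fsf[i][j] + q[i][j] for j in range(len(q))]
             for i in range(len(q))]
    return x_bar, s_bar
```

The predicted covariance grows along the position axes even when Q is zero, because velocity uncertainty feeds into position uncertainty over the interval dt.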

[0036] Matrix F is the transition matrix, and matrix Q is the covariance matrix of the error arising in the transition. If the state Ci of camera 2#i is expressed as in equation (8), the observation equations (9) and (10) are obtained from the camera state and the epipolar axes whj,i and vhj,i. The term e in equation (9) is the observation error (with mean [0 0]' and covariance matrix Σhj,i). The covariance matrix Σhj,i is assumed to grow as the distance from the observing camera increases, and is expressed as in equation (11). Because the actual distance Lhj,i between camera 2#i and hand hj is unknown, the distance /Lhj,i,t between camera 2#i and the predicted position /Hhj,t is used as an approximation.

[0037]

[Equation 3]

[0038] In observation equation (9), the left side represents the observed information and the right side represents the projection of the hand position onto the image. From these observations and the estimates (equations (5) and (6)), the state of hand hj is updated (predicted) as shown in equations (12) and (13). The Σi in equations (12) and (13) denotes summation, for one hand, over all trackable images (images in which no occlusion occurs).

[0039]

[Equation 4]

[0040] In this way, the three-dimensional position and velocity of the observation model are updated. The direction can be predicted in the same manner.

[0041] The Kalman filter gives its prediction as a Gaussian distribution. The viewpoint selection unit 4 shown in FIG. 1 associates predicted positions with feature points. More specifically, the Gaussian distribution N(/Xhj,t, /Shj,t) representing each predicted position is projected onto each image under weak perspective projection. The closeness between the projected distribution and each feature point is then computed using the Mahalanobis distance, and the filter is updated with the feature point closest to the projected distribution.
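The association step can be illustrated with a two-dimensional Mahalanobis distance. The sketch assumes the predicted Gaussian has already been projected into the image, so that a 2D mean and a 2x2 covariance are available; the weak perspective projection itself is not shown, and the function names are hypothetical.

```python
import math


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2D point from a Gaussian with the given mean and covariance."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Quadratic form [dx dy] * inverse(cov) * [dx dy]', with the 2x2 inverse written out.
    quad = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(quad)


def nearest_feature(points, mean, cov):
    """Return the feature point closest to the projected distribution."""
    return min(points, key=lambda p: mahalanobis_2d(p, mean, cov))
```

With mean (0, 0) and covariance [[4, 0], [0, 1]], the point (2, 0) lies at Mahalanobis distance 1.0 while (0, 1.5) lies at 1.5, so (2, 0) is chosen even though (0, 1.5) is nearer in the Euclidean sense. This is the reason for using the projected covariance rather than plain pixel distance.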

[0042] To demonstrate the effectiveness of the tracking processing unit 8 and the viewpoint selection unit 4, the experiment shown in FIGS. 7 and 8 was carried out. FIGS. 7 and 8 show the results of tracking the right and left hands over a continuous sequence. Five cameras were arranged to capture a scene in which the right hand and the left hand crossed at a small distance in the X-axis direction. FIG. 7(a) shows the tracking result in the X-axis direction, FIG. 7(b) the tracking result in the Y-axis direction, and FIG. 7(c) the tracking result in the Z-axis direction. FIG. 7(a) shows that the hand gesture recognition device 1000 tracks the right hand (symbol "+") and the left hand (symbol "◇") as they come close to each other.

[0043] FIG. 8 shows the processing result of the viewpoint selection unit 4 for the sequence of FIG. 7. The vertical axis of FIG. 8 indicates the camera and the horizontal axis indicates the frame. Each symbol in FIG. 8 marks a camera selected in the corresponding frame. In FIG. 8, three viewpoints are selected in each frame. The symbols "×" and "□" in the figure mark the viewpoints used in later processing for shape recognition of the right hand and the left hand, respectively. The figure shows that the two hands are tracked properly by switching cameras according to the predicted hand positions.

[0044] Next, the rotation angle detection unit 31 described with reference to FIG. 4 is explained. Because the position of the hand and the direction of its main axis have already been obtained, the rotation angle detection unit 31 estimates the remaining quantity, the rotation angle of the hand about its main axis, from the skeleton value at the center of gravity using a predetermined palm model. FIG. 9 illustrates detection of the rotation angle using the palm model in the rotation angle detection unit 31. Referring to FIG. 9, an ellipsoid is used as the palm model. The skeleton value at the center of gravity is taken as the width s of the ellipsoid in the image captured by camera 2#i. Assuming a weak perspective transformation, the relationship between the skeleton value s at the center of gravity and the angle θ formed by the optical axis and the long axis of the rotation axis of the palm model follows equation (14). In equation (14), L is the distance between the hand position and camera 2, and a and b are constants. Here θ corresponds to the rotation angle of the palm. The constants a and b are determined in advance from sample data. Assuming that the observations contain Gaussian errors, the value of θ that maximizes the probability P(s1, ..., sm | θ) for the center-of-gravity skeleton values s1, ..., sm observed by m cameras is the rotation angle about the main axis of the hand (see equations (15) and (16)).

[0045]

[Equation 5]
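The maximum-likelihood estimate described above can be sketched as a grid search. Equation (14) is not reproduced in this text, so the width model is passed in as a caller-supplied function rather than hard-coded; the example model in the usage note is a purely hypothetical placeholder and should not be read as equation (14). The sketch also assumes independent Gaussian errors with equal variance across cameras, in which case maximizing the likelihood is the same as minimizing the sum of squared residuals.

```python
import math


def estimate_rotation_angle(observations, predict_width, steps=1800):
    """Grid-search estimate of the rotation angle over [0, pi).

    observations: list of (camera_id, observed skeleton value s_i).
    predict_width(camera_id, theta): expected skeleton value for that camera
        at rotation angle theta (supplied by the caller).
    Returns the grid angle with the smallest sum of squared residuals.
    """
    best_theta = 0.0
    best_cost = math.inf
    for k in range(steps):
        theta = math.pi * k / steps
        cost = sum((s - predict_width(camera, theta)) ** 2
                   for camera, s in observations)
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta
```

As a usage illustration only, a placeholder model such as `lambda cam, th: (40.0 * abs(math.cos(th - azimuth[cam])) + 10.0) / 1.0`, with a hypothetical per-camera azimuth table, lets the search recover a known angle from noise-free synthetic widths. The search range is limited to [0, pi) because an ellipsoid looks the same after a half turn.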

[0046] This technique eliminates the need for stereo matching in posture estimation, which makes the estimation more stable. The skeleton value at the center of gravity is little affected by shape changes such as finger bending. The rotation angle about the main axis of the hand calculated by the rotation angle detection unit 31 is therefore stable against shape changes.

[0047] FIG. 10 shows experimental results that demonstrate the effectiveness of the rotation angle detection unit 31. Rotation angles were detected for the five different hand shapes shown in the black-and-white images of FIG. 10(b). In the graph of FIG. 10(a), each dot indicates an angle detected by the rotation angle detection unit 31, and the dotted line indicates, for reference, the rotation angle measured with a magnetic sensor. As FIG. 10(a) shows, the rotation angle detection unit 31 can detect the rotation angle without being affected by the shape of the hand.

【００４８】次に、図４で説明した視点選択部３２につ
いて説明する。視点選択部３２は、回転角検出部３１か
ら出力される回転角に基づき、オクルージョンが発生し
ていない視点（カメラ）のなかから、手のひらに対して
最も垂直に近い光軸を持つカメラを選択する。より具体
的には、手の指先方向（主軸）と手の主軸周りの回転角
とを定めることにより、図１１に示すような手のひらの
法線ベクトルＮを定めることができる。図１１に示され
るようにカメラ２♯ｉの光軸ベクトルをＣｉとしたと
き、光軸ベクトルＣｉが法線ベクトルＮとなす角度θｉ
が最小になるカメラｉが選択される。手指間のオクルー
ジョンは、手のひらを正面から撮影する場合に最も起こ
りにくいため、このような視点の選択によりオクルージ
ョンを回避することができる。Next, the viewpoint selection unit 32 described with reference to FIG. 4 will be described. Based on the rotation angle output from the rotation angle detection unit 31, the viewpoint selection unit 32 selects, from among the viewpoints (cameras) in which no occlusion has occurred, the camera whose optical axis is closest to perpendicular to the palm. More specifically, the normal vector N of the palm shown in FIG. 11 can be determined from the fingertip direction (main axis) of the hand and the rotation angle around the main axis of the hand. As shown in FIG. 11, when the optical axis vector of the camera 2#i is Ci, the camera i for which the angle θi between the optical axis vector Ci and the normal vector N is smallest is selected. Since occlusion between fingers is least likely to occur when the palm is photographed from the front, occlusion can be avoided by selecting the viewpoint in this way.
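The selection rule can be illustrated with a minimal sketch, assuming the palm normal N and the optical axis vectors Ci are available as 3-D vectors in a common coordinate system. The function name and the `occluded` flag array are illustrative and do not come from the patent.

```python
import numpy as np

def select_camera(normal, optical_axes, occluded=None):
    """Return the index i of the camera whose optical axis C_i makes the
    smallest angle theta_i with the palm normal N, skipping occluded views."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    axes = np.asarray(optical_axes, dtype=float)
    axes = axes / np.linalg.norm(axes, axis=1, keepdims=True)
    # theta_i for every camera; clip guards against rounding outside [-1, 1]
    angles = np.arccos(np.clip(axes @ n, -1.0, 1.0))
    if occluded is not None:
        angles = np.where(np.asarray(occluded, dtype=bool), np.inf, angles)
    if not np.isfinite(angles).any():
        raise ValueError("every camera is marked as occluded")
    return int(np.argmin(angles))
```

Whether N and Ci are oriented so that a frontal view gives a small angle depends on the sign conventions of FIG. 11, which this sketch takes as given.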

【００４９】図１２は、視点選択部３２の有効性を説明
するための実験結果を示す図である。図１２に示す実験
では、３つのカメラを等間隔に配置し、中央のカメラ
（図中記号カメラ１）を中心として手の姿勢を変化させ
た。図１２（ａ）は、各フレームにおいて検出した回転
角を、図１２（ｂ）は、図１２（ａ）に対応して選択さ
れたカメラを示している。この結果から、手のひらの回
転角の変化に伴い、逐次、３つのカメラから最も適切な
１つのカメラが選択されることがわかる。FIG. 12 is a diagram showing experimental results for explaining the effectiveness of the viewpoint selection unit 32. In the experiment shown in FIG. 12, three cameras were arranged at equal intervals, and the posture of the hand was varied about the center camera (labeled camera 1 in the figure). FIG. 12A shows the rotation angle detected in each frame, and FIG. 12B shows the camera selected for each frame of FIG. 12A. This result shows that, as the rotation angle of the palm changes, the most appropriate of the three cameras is selected frame by frame.

【００５０】次に、図４で説明した手形状認識部３３に
ついて説明する。手形状認識部３３では、対応する視点
選択部３２で選択された画像を用いて、Ｐ型フーリエ記
述素子により対応する手の形状を判別する。Next, the hand shape recognition unit 33 described with reference to FIG. 4 will be described. The hand shape recognition unit 33 uses the image selected by the corresponding viewpoint selection unit 32 to determine the shape of the corresponding hand by means of a P-type Fourier descriptor.

【００５１】図１３は、手形状認識部３３の構成を示す
ブロック図である。図１３に示すように、手形状認識部
３３は、視点選択部３２により選択されたカメラからの
手の画像に基づいてその画像中の手の輪郭線を抽出する
輪郭線抽出部８１と、輪郭線抽出部８１により抽出され
た手の輪郭線を、回転角検出部３１により算出された法
線ベクトルに一致する方向から手を撮影したならば得ら
れたであろう手の輪郭線に補正する輪郭線補正部８２
と、輪郭線抽出部８１により抽出されかつ輪郭線補正部
８２により補正された手の輪郭線をＰ型フーリエ記述子
で記述するＰ型フーリエ記述部８３と、Ｐ型フーリエ記
述部８３からのＰ型フーリエ記述子に含まれる複数のフ
ーリエ係数のうち所定次数よりも低い次数のフーリエ係
数をそれぞれベクトル成分とする特徴ベクトルを算出す
る特徴ベクトル算出部８４と、手の形状と特徴ベクトル
との関係を学習するために複数の既知形状を有する手を
撮影して得た多数のサンプル画像に基づいてもし撮影さ
れた手の形状がある既知形状であったならば得られたで
あろう確率を定義する確率定義部８５と、確率定義部８
５により定義された確率に基づいて上記複数の既知形状
のうち特徴ベクトル算出部８４により算出された特徴ベ
クトルが得られる確率が最も高い既知形状を手の形状と
して選択する形状選択部８６とを含む。すなわち、特徴
ベクトル算出部８４、確率定義部８５および形状選択部
８６は、Ｐ型フーリエ記述部８３からのＰ型フーリエ記
述子に基づいて手の形状を特定し、その形状を認識結果
として出力する。FIG. 13 is a block diagram showing the configuration of the hand shape recognition unit 33. As shown in FIG. 13, the hand shape recognition unit 33 includes: a contour extraction unit 81 that extracts the contour of the hand from the image of the hand supplied by the camera selected by the viewpoint selection unit 32; a contour correction unit 82 that corrects the contour extracted by the contour extraction unit 81 to the contour that would have been obtained if the hand had been photographed from the direction of the normal vector calculated by the rotation angle detection unit 31; a P-type Fourier description unit 83 that describes the contour extracted by the contour extraction unit 81 and corrected by the contour correction unit 82 with a P-type Fourier descriptor; a feature vector calculation unit 84 that calculates a feature vector whose components are those Fourier coefficients, among the plurality of Fourier coefficients included in the P-type Fourier descriptor from the P-type Fourier description unit 83, whose order is lower than a predetermined order; a probability definition unit 85 that, in order to learn the relationship between hand shape and feature vector, defines, from a large number of sample images obtained by photographing hands having a plurality of known shapes, the probability with which a feature vector would be obtained if the photographed hand had a given known shape; and a shape selection unit 86 that, based on the probabilities defined by the probability definition unit 85, selects as the hand shape the known shape with the highest probability of yielding the feature vector calculated by the feature vector calculation unit 84. In other words, the feature vector calculation unit 84, the probability definition unit 85, and the shape selection unit 86 identify the hand shape based on the P-type Fourier descriptor from the P-type Fourier description unit 83 and output that shape as the recognition result.

【００５２】次に、このように構成された手形状認識部
の動作について説明する。輪郭線抽出部８１では、選択
されたカメラ２♯ｉの画像から手の輪郭線が抽出され
る。ここでは、特に手形状を特徴付ける手指部分の輪郭
線のみが抽出される。上述したように、入力２値画像は
重心点（特徴点）検出の前処理として指先が上方を向く
よう角度γｉの回転変換を受けているため、図１４に示
されるように、手の領域内で重心点よりも上方部分のみ
の輪郭線を抽出することで手指部分の形状が得られる。
より具体的には、図１４に示されるようなシルエット画
像上で重心点よりも上方に位置する白色画素のうち黒色
画素に隣接する画素が輪郭線として抽出される。ここ
で、輪郭線の抽出は右回りとして、抽出した各画素の座
標を（ａ（ｔ），ｂ（ｔ））とする（ｔ＝０，…，ｍ−
１，ｍは輪郭線を構成する全画素数）。Next, the operation of the hand shape recognition unit configured in this way will be described. The contour extraction unit 81 extracts the contour of the hand from the image of the selected camera 2#i. Here, only the contour of the finger portion, which particularly characterizes the hand shape, is extracted. As described above, the input binary image has already been rotated by the angle γi so that the fingertips point upward, as preprocessing for the detection of the center of gravity (feature point). Therefore, as shown in FIG. 14, the shape of the finger portion is obtained by extracting the contour of only the part of the hand region above the center of gravity. More specifically, among the white pixels located above the center of gravity in a silhouette image such as that shown in FIG. 14, those adjacent to a black pixel are extracted as the contour. The contour is traced clockwise, and the coordinates of each extracted pixel are denoted (a(t), b(t)) (t = 0, ..., m−1, where m is the total number of pixels constituting the contour).
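The pixel-level rule (white pixels above the center of gravity that touch a black pixel) can be sketched as below. This is an illustration under assumptions: the silhouette is a boolean array with white as True, "above" means a smaller row index, adjacency is taken in the 4-neighbourhood, and pixels outside the image count as black. The clockwise ordering into (a(t), b(t)) is not implemented here.

```python
import numpy as np

def finger_contour_pixels(silhouette, centroid_row):
    """Return the (row, col) coordinates of white pixels above the centroid
    row that have at least one black pixel among their four neighbours."""
    img = np.asarray(silhouette, dtype=bool)
    padded = np.pad(img, 1, constant_values=False)  # outside the image = black
    # A pixel is interior when its up, down, left and right neighbours are white.
    all_white = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                 padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = img & ~all_white
    # Discard the centroid row and everything below it.
    boundary[int(centroid_row):, :] = False
    return np.argwhere(boundary)
```

A contour-following pass would still be needed to turn this unordered pixel set into the clockwise sequence the text describes.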

【００５３】このようにして抽出された輪郭線は、手の
ひらの法線ベクトルＮとカメラ２♯ｉの光軸ベクトルＣ
ｉのなす角θｉの変化に応じて、投影による変形を受け
ている。そこで、輪郭線補正部８２では、抽出された手
の輪郭線が、手のひらを正面から撮影したならば得られ
たはずの輪郭線に補正される。より具体的には、カメラ
選択の結果、θｉ＜＜９０°であると考え、弱透視変換
を仮定すると、観測される画像は、図１１に示されるよ
うに手の法線ベクトルＮをカメラｉの撮像面に投影した
Ｎｉ′の方向（画像内の水平軸ｘ_Ciとなす角をφｉとす
る。）にｃｏｓθｉ倍の縮小を受けていることになる。
そこで、抽出された輪郭線の各画素の座標（ａ（ｔ），
ｂ（ｔ））は、次の式（１７）で補正され、（ａ′
（ｔ），ｂ′（ｔ））となる（ｔ＝０，…，ｍ−１）。
ここで、Ｒ（α）は角度αの回転変換行列である。The contour extracted in this way has been deformed by projection according to the angle θi between the normal vector N of the palm and the optical axis vector Ci of the camera 2#i. The contour correction unit 82 therefore corrects the extracted contour to the contour that would have been obtained if the palm had been photographed from the front. More specifically, taking θi << 90° as a result of the camera selection and assuming a weak perspective transformation, the observed image has been shrunk by a factor of cos θi in the direction of Ni′, the projection of the normal vector N of the hand onto the imaging plane of the camera i as shown in FIG. 11 (the angle between this direction and the horizontal image axis x_Ci is denoted φi). The coordinates (a(t), b(t)) of each pixel of the extracted contour are therefore corrected by the following equation (17) to give (a′(t), b′(t)) (t = 0, ..., m−1). Here, R(α) is the rotation matrix for the angle α.

【００５４】[0054]

【数６】 (Equation 6)
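Equation (17) itself is not reproduced in this text. One reading consistent with the description (a shrink by cos θi along the direction at angle φi, undone with the rotation matrix R) is to rotate that direction onto the horizontal axis, stretch by 1/cos θi, and rotate back. The sketch below implements that reading; the exact form of the patent's equation is an assumption.

```python
import numpy as np

def rotation(alpha):
    """2-D rotation matrix R(alpha)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s], [s, c]])

def correct_contour(points, theta, phi):
    """Undo a shrink by cos(theta) along the image direction that makes
    angle phi with the horizontal axis. points has shape (m, 2)."""
    stretch = np.diag([1.0 / np.cos(theta), 1.0])
    # Assumed form of the correction: R(phi) * diag(1/cos(theta), 1) * R(-phi)
    m = rotation(phi) @ stretch @ rotation(-phi)
    return np.asarray(points, dtype=float) @ m.T
```

For example, with θ = 60° and φ = 90°, a point at (0, 0.5) is mapped back to (0, 1), while a point on the horizontal axis is left unchanged.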

【００５５】Ｐ型フーリエ記述部８３では、抽出された
輪郭線が折れ線で近似され、Ｐ型フーリエ記述子で記述
される。Ｐ型フーリエ記述子は、上坂によって提案され
たもので（上坂吉則：開曲線にも適用できる新しいフー
リエ記述子，信学論（Ａ），Vol.J67-A, No.3, PP.166-
173 （1984）. ）、種々のパターンを少ないパラメータ
で記述できる、平行移動・拡大縮小について不変であ
る、開曲線に適用できるといった形状認識に優れた特徴
を持ち、これまでにも部品形状の認識（伊藤豪俊，平田
達也，石井直宏：フーリエ記述子を用いた部品の形状認
識と処理，信学論（Ｄ），Vol.J71-D, No.6, pp.1065-1
073 （1988）. ）、文字認識（大友照彦，原健一：Ｐ型
フーリエ記述子を用いたオンライン手書き漢字認識，情
処学論，Vol.34, No.2, pp.281-288（1993）. ）、人間
の横顔認識（相原恒博，大上健二，松岡靖：人間の横顔
認識におけるＰ型フーリエ記述子の有効成分の個数につ
いて，信学論（D-II）, Vol.J74-DII, No.10, pp.1486-
1487（1991）. ）などに広く利用されてきたものであ
る。In the P-type Fourier description unit 83, the extracted contour is approximated by a polygonal line and described by a P-type Fourier descriptor. The P-type Fourier descriptor was proposed by Uesaka (Yoshinori Uesaka: A new Fourier descriptor applicable to open curves, Trans. IEICE (A), Vol.J67-A, No.3, pp.166-173 (1984)). It has properties well suited to shape recognition: it can describe a variety of patterns with few parameters, it is invariant to translation and scaling, and it can be applied to open curves. It has been widely used for recognition of part shapes (Toshiya Ito, Tatsuya Hirata, Naohiro Ishii: Shape recognition and processing of parts using Fourier descriptors, Trans. IEICE (D), Vol.J71-D, No.6, pp.1065-1073 (1988)), character recognition (Tomohiko Otomo, Kenichi Hara: Online handwritten kanji recognition using P-type Fourier descriptors, Trans. IPSJ, Vol.34, No.2, pp.281-288 (1993)), and recognition of human profiles (Tsunehiro Aihara, Kenji Ohgami, Yasushi Matsuoka: On the number of effective components of the P-type Fourier descriptor in human profile recognition, Trans. IEICE (D-II), Vol.J74-DII, No.10, pp.1486-1487 (1991)).

【００５６】Ｐ型フーリエ記述子では、２次元の曲線を
複素平面上の点列と考え、長さの等しい線分からなる折
れ線で近似する。折れ線の各頂点をｚ（ｊ）＝ｘ（ｊ）
＋ｉｙ（ｊ）とするとき（｜ｚ（ｊ＋１）−ｚ（ｊ）｜
＝δ；ｊ＝０，…，ｎ−１）、各折れ線をその長さδで
正規化して、式（１８）で表わされるような折れ線のＰ
表現ｗ（ｊ）を得る。そして、ｗ（ｊ）の離散フーリエ
展開により、フーリエ係数ｃ（ｋ）が式（１９）により
求まる。In the P-type Fourier descriptor, a two-dimensional curve is regarded as a sequence of points on the complex plane and is approximated by a polygonal line made up of line segments of equal length. With the vertices of the polygonal line written as z(j) = x(j) + iy(j) (|z(j+1) − z(j)| = δ; j = 0, ..., n−1), each segment is normalized by its length δ to obtain the P-representation w(j) of the polygonal line given by equation (18). The Fourier coefficients c(k) are then obtained from equation (19) by the discrete Fourier expansion of w(j).

【００５７】[0057]

【数７】 (Equation 7)
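Equations (18) and (19) are not reproduced in this text. The sketch below uses the commonly cited form of the P-type descriptor, w(j) = (z(j+1) − z(j))/δ and c(k) = (1/n) Σ w(j) exp(−2πi·jk/n), and should be read as an assumption about what the missing equations contain. It also assumes the polyline has already been resampled to equal-length segments, so dividing each step by its own length is the same as dividing by δ.

```python
import numpy as np

def p_type_descriptor(vertices, order):
    """Return (ks, c): orders k = -order..order and the Fourier coefficients
    c(k) of the P-representation of an equal-step polyline."""
    z = np.asarray(vertices, dtype=complex)  # vertices as points x + iy
    steps = np.diff(z)
    w = steps / np.abs(steps)                # P-representation, |w(j)| = 1
    n = len(w)
    idx = np.arange(n)
    ks = np.arange(-order, order + 1)
    # Discrete Fourier expansion of w(j), restricted to the low orders.
    c = np.array([np.mean(w * np.exp(-2j * np.pi * k * idx / n)) for k in ks])
    return ks, c
```

Because w(j) depends only on the differences between consecutive vertices, normalized by their length, shifting or uniformly scaling the contour leaves c(k) unchanged, which matches the invariance the text attributes to the descriptor. The magnitudes `np.abs(c)` give the values |c(k)| used to describe the contour.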

【００５８】ここで、係数の集合｛ｃ（ｋ）；ｋ＝−
Ｎ，…，０，…，Ｎ｝がＮ時のＰ型フーリエ記述子であ
る。この実施の形態では、ｃ（ｋ）の大きさ｜ｃ（ｋ）
｜を輪郭線の記述に利用する。Here, the set of coefficients {c(k); k = −N, ..., 0, ..., N} is the P-type Fourier descriptor of order N. In this embodiment, the magnitude |c(k)| of c(k) is used to describe the contour.

【００５９】特徴ベクトル算出部８４では、Ｐ型フーリ
エ記述子｛ｃ（ｋ）｝に含まれるフーリエ係数のうち所
定次数よりも低い次数の（２Ｎ＋１）個（たとえばＮ＝
４または５）のフーリエ係数を用いて式（２０）で定義
される特徴ベクトルＶが算出される。ここで、Ａ^tはベ
クトルＡの転置ベクトルを表わす。The feature vector calculation unit 84 calculates the feature vector V defined by equation (20) from the (2N+1) Fourier coefficients (for example, N = 4 or 5) whose order is lower than a predetermined order among the Fourier coefficients included in the P-type Fourier descriptor {c(k)}. Here, A^t denotes the transpose of the vector A.

【００６０】[0060]

【数８】 (Equation 8)

【００６１】確率定義部８５では、手形状の認識に先立
ち、手のいくつかの既知形状と特徴ベクトルの対応関係
が学習され、各形状ｓごとに特徴ベクトルＶが観測され
る確率Ｐ（Ｖ｜ｓ）が式（２１）で表わされる正規分布
により定義される。Before hand shapes are recognized, the probability definition unit 85 learns the correspondence between several known hand shapes and feature vectors, and for each shape s the probability P(V|s) that the feature vector V is observed is defined by the normal distribution given by equation (21).

【００６２】[0062]

【数９】 (Equation 9)

【００６３】ここで、Ｖｍｓは、ある既知形状の手を撮
影して得られた多数のサンプル画像に基づいて算出され
たその既知形状ｓの特徴ベクトルＶの平均ベクトル（た
とえば多数の特徴ベクトルのベクトル成分の平均値をベ
クトル成分とするもの）である。また、Σｓは行分散行
列である。このような確率関数が各既知形状ごとに予め
用意される。Here, Vms is the mean vector of the feature vectors V of a known shape s, calculated from a large number of sample images obtained by photographing a hand of that known shape (for example, the vector whose components are the averages of the corresponding components of the many feature vectors). Σs is the covariance matrix. Such a probability function is prepared in advance for each known shape.

【００６４】形状選択部８６では、算出された特徴ベク
トルＶが得られる確率が最も高い既知形状が手の形状と
して選択される。すなわち、算出された特徴ベクトルＶ
が観測される確率Ｐ（Ｖ｜ｓ）がすべての既知形状ｓに
ついて求められ、その求められた確率Ｐ（Ｖ｜ｓ）が最
大になる既知形状ｓがこの場合の手の形状として認識さ
れる。The shape selection unit 86 selects, as the hand shape, the known shape with the highest probability of yielding the calculated feature vector V. That is, the probability P(V|s) that the calculated feature vector V is observed is obtained for every known shape s, and the known shape s that maximizes P(V|s) is recognized as the hand shape.
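The learning and selection steps amount to fitting one multivariate normal distribution per known shape and picking the shape with the largest likelihood. The following is a minimal sketch under that reading, with the standard multivariate normal density standing in for equation (21), which is not reproduced here. Function names and the dictionary layout are illustrative.

```python
import numpy as np

def fit_shape_models(samples_by_shape):
    """samples_by_shape maps shape id -> array (num_samples, dim) of feature
    vectors. Returns shape id -> (mean vector, covariance matrix)."""
    return {s: (np.mean(x, axis=0), np.cov(np.asarray(x), rowvar=False))
            for s, x in samples_by_shape.items()}

def log_likelihood(v, mean, cov):
    """Log of the multivariate normal density N(v; mean, cov)."""
    d = np.asarray(v, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + len(d) * np.log(2.0 * np.pi))

def select_shape(v, models):
    """Return the shape id s that maximizes P(V | s)."""
    return max(models, key=lambda s: log_likelihood(v, *models[s]))
```

Working with log-likelihoods avoids numerical underflow and does not change which shape attains the maximum. With few samples the covariance estimate can be singular, in which case some regularization would be needed; the text does not say how the original system handled this.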

【００６５】以上のように、手の姿勢に基づいて形状認
識に用いる最適なカメラを選択しているため、オクルー
ジョンを回避し、手の形状を安定して認識することがで
きる。また、画像から抽出した手の輪郭を記述するため
に、形状の画像内の平行移動、拡大・縮小などに対して
不変なＰ型フーリエ記述子を用いているため、上記カメ
ラ選択と組合せることにより手の姿勢に関係なく手形状
を安定して認識することができる。As described above, since the optimal camera for shape recognition is selected based on the posture of the hand, occlusion can be avoided and the shape of the hand can be recognized stably. In addition, the contour of the hand extracted from the image is described with a P-type Fourier descriptor, which is invariant to translation and enlargement or reduction of the shape within the image. Combined with the camera selection described above, this allows the hand shape to be recognized stably regardless of the posture of the hand.

【００６６】また、抽出した手の輪郭線を手のひらの垂
直方向に一致する方向から手を撮影したならば得られた
であろう手の輪郭線に補正しているため、選択されたカ
メラが手のひらを真正面から撮影していない場合であっ
ても正確に手の形状を認識することができる。Furthermore, since the extracted contour of the hand is corrected to the contour that would have been obtained if the hand had been photographed from the direction perpendicular to the palm, the hand shape can be recognized accurately even when the selected camera does not photograph the palm directly from the front.

【００６７】また、相対的に低い次数のフーリエ係数を
用いて特徴ベクトルＶを算出しているため、抽出された
手の輪郭線に含まれる高周波ノイズ（高次フーリエ係数
として現れる）を除去し、真の手の輪郭線（低次フーリ
エ係数として現れる）のみを正確に特定することができ
る。その結果、手形状をより安定して認識することがで
きる。Since the feature vector V is calculated using the Fourier coefficients of a relatively low order, high-frequency noise (appearing as higher-order Fourier coefficients) contained in the extracted hand contour is removed. Only true hand contours (appearing as low-order Fourier coefficients) can be accurately identified. As a result, the hand shape can be more stably recognized.

【００６８】なお、上記式では、カメラ選択において手
のひらの表裏の違いが考慮されておらず、そのため、同
じ手形状について表裏２種類の画像入力があり得る。こ
のような輪郭線の左右の判定に対し、得られるＰ型フー
リエ記述子の式（２２）で表わされる。ここで、ｃ′
（ｋ）は反転後の係数を示し、ｃ（−ｋ）の上線はその
共役複素数を示す。そこで、手の表裏の判定を考慮し、
式（２３）で表わされるように特徴ベクトルＶの成分を
並べ変えたＶ^*を定義する。Note that, in the scheme described above, the camera selection does not take into account the difference between the front and back of the palm, so two kinds of input image, front and back, are possible for the same hand shape. For such a left-right reversal of the contour, the resulting P-type Fourier descriptor is given by equation (22), where c′(k) denotes the coefficient after reversal and the overline on c(−k) denotes the complex conjugate. To handle the front and back of the hand, V^* is therefore defined by rearranging the components of the feature vector V as given by equation (23).

【００６９】[0069]

【数１０】 (Equation 10)

【００７０】手形状の認識においては、Ｐ（Ｖ｜ｓ）お
よびＰ（Ｖ^*｜ｓ）の両方を評価することで、入力画像
の表裏の違いに対応する。すなわち、選択画像から抽出
された輪郭線からＶ，Ｖ^*を計算し、すべての形状ｓに
対するＰ（Ｖ｜ｓ）、Ｐ（Ｖ ^*｜ｓ）のうち最大値をと
る分布を選択し、対応する形状ｓを認識結果とする。In hand shape recognition, both P(V|s) and P(V^*|s) are evaluated to handle the difference between front and back in the input image. That is, V and V^* are calculated from the contour extracted from the selected image, the distribution giving the largest value among P(V|s) and P(V^*|s) over all shapes s is selected, and the corresponding shape s is taken as the recognition result.
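The front/back handling can be sketched as evaluating every shape model on both V and V^* and keeping the best score. Equations (20), (22) and (23) are not reproduced in this text. The sketch assumes V lists |c(k)| in order from k = −N to k = N, so that reversing the contour (which swaps the roles of c(k) and c(−k) in magnitude) corresponds to reversing the component order. The callables in `log_prob_by_shape` are hypothetical stand-ins for log P(·|s).

```python
import numpy as np

def flipped(v):
    # Assumed form of V*: the components of V in reverse order,
    # i.e. |c(N)|, ..., |c(0)|, ..., |c(-N)|.
    return np.asarray(v, dtype=float)[::-1]

def recognize_both_sides(v, log_prob_by_shape):
    """log_prob_by_shape maps shape id -> callable returning log P(x | s).
    Returns the shape whose model scores highest on either V or V*."""
    v = np.asarray(v, dtype=float)
    scored = [(f(x), s)
              for s, f in log_prob_by_shape.items()
              for x in (v, flipped(v))]
    return max(scored, key=lambda item: item[0])[1]
```

Taking the maximum over both orientations means the recognizer needs only one model per shape rather than separate front and back models.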

【００７１】[0071]

【実施例】以下の実験のために、次のように画像の撮影
を行なった。実験では、図６に示すように、同心円上に
中央に向けて所定の間隔で５台のカメラを配置し、手を
撮影する。図１５に示す７種類の手形状shape １〜７そ
れぞれについて約６００フレーム分の画像を撮影した。
撮影画像は、手のひらの回転と移動を含む。５台のカメ
ラで得られた画像は５台の追記型ビデオディスクに記録
した後に、コマ送りにより１フレームごとに処理した。
全フレームのうち３００フレームを学習用に、残りを認
識実験用に用いた。EXAMPLES: For the following experiments, images were captured as follows. As shown in FIG. 6, five cameras were arranged at predetermined intervals on a circle, facing its center, to photograph the hand. About 600 frames were captured for each of the seven hand shapes, shape 1 to shape 7, shown in FIG. 15. The captured images include rotation and movement of the palm. The images from the five cameras were recorded on five write-once video discs and then processed one frame at a time by frame advance. Of all the frames, 300 were used for learning and the rest for the recognition experiments.

【００７２】パラメータ学習時には、各形状について処
理フレームの画像から上記実施の形態で述べた方法によ
り特徴ベクトルＶ₀を抽出した。２フレーム以降につい
ては、特徴ベクトルＶ，Ｖ^*のうち、ユークリッド距離
が特徴ベクトルＶ₀に近いものを選んで記録した。形状
ごとに得られた、３００サンプルの特徴ベクトルをもと
に上記実施の形態で述べた確率分布を決定した。For parameter learning, a feature vector V₀ was first extracted for each shape from the image of the first frame processed, by the method described in the above embodiment. For the second and subsequent frames, whichever of the feature vectors V and V^* was closer to V₀ in Euclidean distance was selected and recorded. The probability distribution described in the above embodiment was then determined from the 300 sample feature vectors obtained for each shape.
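The alignment of training samples to the reference vector V₀ can be sketched as follows. As an assumption, V^* is taken to be V with its components in reverse order, since the defining equation is not reproduced in this text. The function name is illustrative.

```python
import numpy as np

def align_to_reference(features, v0):
    """For each feature vector V, keep V or its reversed form V*,
    whichever is closer to the reference V0 in Euclidean distance."""
    v0 = np.asarray(v0, dtype=float)
    aligned = []
    for v in np.asarray(features, dtype=float):
        v_star = v[::-1]  # assumed form of V*
        keep_v = np.linalg.norm(v - v0) <= np.linalg.norm(v_star - v0)
        aligned.append(v if keep_v else v_star)
    return np.array(aligned)
```

Aligning the samples this way keeps front and back views of the same shape from being mixed into a single distribution during learning.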

【００７３】認識実験用の各形状約３００フレームの画
像に対する形状認識結果を図１６に示す。図１６に示さ
れるように、いずれの形状についても８７％以上、特に
形状shape １，２，６，７について９９％以上という安
定した認識結果が得られた。これにより、この発明によ
る手振り認識装置の有効性が示されたといえる。なお、
形状shape ３，４，５については自己オクルージョンに
よる誤判別を起こしていると考えられるが、このような
オクルージョンによる誤判別は、視点数の増加により視
点選択の機会を増やすことにより回避することができ
る。FIG. 16 shows the shape recognition results for the recognition-experiment images, about 300 frames for each shape. As shown in FIG. 16, stable recognition rates were obtained: 87% or more for every shape, and 99% or more for shapes 1, 2, 6, and 7 in particular. This demonstrates the effectiveness of the hand gesture recognition device according to the present invention. Shapes 3, 4, and 5 are thought to be misclassified because of self-occlusion. Such misclassification can be avoided by increasing the number of viewpoints and thereby the opportunities for viewpoint selection.

【００７４】上述した手振り認識装置を利用し、対話的
に仮想空間を操作できるシステムを構築した。利用者
は、手振りによってコマンドを発することで、基本的な
形状を有する３次元の仮想物体を作成し、仮想物体の配
置・削除・拡大縮小などの操作を行なうことができる。Using the above-described hand gesture recognition device, a system capable of interactively operating a virtual space was constructed. By issuing a command by hand gesture, a user can create a three-dimensional virtual object having a basic shape and perform operations such as arranging, deleting, and scaling the virtual object.

【００７５】実験では、図１７に示すコマンドを、図１
６に示す７種類の手形状に対応させた。図１７における
“手の位置”は、利用者の手が、仮想物体の内側にある
か外側にあるかを示している。また、図１７における
“形状遷移”は、手の形状の変化を意味する。形状shap
e ７の提示後、他の形状shape １〜６を提示することで
コマンドが実行あるいは開始される。In the experiment, the commands shown in FIG. 17 were associated with the seven hand shapes shown in FIG. 16. “Hand position” in FIG. 17 indicates whether the user's hand is inside or outside the virtual object. “Shape transition” in FIG. 17 means a change in the shape of the hand. After shape 7 is presented, a command is executed or started by presenting one of the other shapes, shape 1 to shape 6.
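The shape-transition trigger can be illustrated with a small sketch that scans a per-frame sequence of recognized shapes and reports a command whenever shape 7 is followed by another shape. The table in FIG. 17 is not reproduced in this text, so the shape-to-command mapping is passed in by the caller and any mapping used with it is hypothetical. The hand-position condition (inside or outside the virtual object) is omitted.

```python
def detect_commands(shape_sequence, command_for_shape, neutral=7):
    """Return a list of (frame_index, command) for every frame where the
    recognized shape changes from the neutral shape to a mapped shape."""
    events = []
    previous = None
    for frame, shape in enumerate(shape_sequence):
        starts_command = (previous == neutral and shape != neutral
                          and shape in command_for_shape)
        if starts_command:
            events.append((frame, command_for_shape[shape]))
        previous = shape
    return events
```

For instance, with a hypothetical mapping `{1: "create"}`, the sequence `[7, 7, 1, 1, 7]` yields a single "create" event at frame 2. A practical system would probably also require a shape to persist for a few frames before accepting a transition, to tolerate occasional misrecognition.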

【００７６】コンピュータグラフィックスで表示される
仮想空間には、図１８〜図１９に示すように、利用者の
手の３次元位置を表わす指示ポインタが描画されてお
り、利用者は自分の手を動かすことにより操作対象とな
る仮想物体の内部にポインタを移動させることで、操作
対象を直接指定することができる。In the virtual space displayed by computer graphics, a pointer representing the three-dimensional position of the user's hand is drawn, as shown in FIGS. 18 and 19. By moving the hand so that the pointer enters the virtual object to be operated on, the user can designate the operation target directly.

【００７７】「作成」（手の軌道に沿って仮想物体の形
状を特定する）コマンドでは、手を閉じる（shape７→
１）。そして、手を動かすことで軌道を描く。閉じた手
を開いて「作成」コマンドを終了させると、仮想物体が
作成される。For the “create” command (which specifies the shape of a virtual object along the trajectory of the hand), the hand is closed (shape 7 → 1) and then moved to draw a trajectory. When the closed hand is opened to end the “create” command, the virtual object is created.

【００７８】「作成」コマンドを除くすべてのコマンド
では、手を３次元空間において仮想物体の内側に移動さ
せる。In all commands except the “create” command, the hand is moved inside the virtual object in the three-dimensional space.

【００７９】「削除」コマンド（仮想物体を消去す
る）、「結合」コマンド（近接する複数の仮想物体を結
合する）、「分割」（仮想物体の結合を解除する）の３
コマンドについては、それぞれ操作対象内にポインタを
移動させた後、各コマンドに対応する形状遷移を提示す
ることで、当該コマンドが実行される。For the three commands “delete” (erases a virtual object), “combine” (joins a plurality of adjacent virtual objects), and “split” (releases the joining of virtual objects), the command is executed by moving the pointer into the operation target and then presenting the shape transition corresponding to that command.

【００８０】「把持・移動」コマンド（仮想物体を掴ん
で移動させる）、「拡大・縮小」コマンド（仮想物体の
大きさを変更する）、「色・テクスチャ変更」コマンド
（仮想物体表面の色・テクスチャを変化させる）につい
ては、ポインタによる指示、対応する形状遷移に続い
て、さらに手を移動させることで、変更対象となる物体
パラメータ（位置、大きさ、色など）が移動量に応じて
変化する。パラメータの変更は、手形状を形状shape ７
に再び遷移させることで終了する。For the “grasp/move” command (grasps and moves a virtual object), the “enlarge/reduce” command (changes the size of a virtual object), and the “color/texture change” command (changes the color or texture of the surface of a virtual object), after the target is indicated with the pointer and the corresponding shape transition is presented, moving the hand further changes the relevant object parameter (position, size, color, and so on) according to the amount of movement. The parameter change ends when the hand shape returns to shape 7.

【００８１】「伸張」（仮想物体の長さを変える）コマ
ンドを除くすべてのコマンドは、右手・左手をそれぞれ
独立して発行することができる。「伸張」コマンドで
は、右手と左手とで仮想物体を把持し、相対的に手を移
動させる。All commands except the “stretch” command (which changes the length of a virtual object) can be issued independently by the right hand and the left hand. For the “stretch” command, the virtual object is grasped with both the right and left hands, and the hands are moved relative to each other.

【００８２】以上の次第で、多数カメラを用いた手振り
認識装置とそれを利用した仮想シーン生成システムにつ
いて述べた。本システムは、多数視点の画像から抽出さ
れた最適な少数の画像から、手の動き（３次元位置、手
振り）を再構築する。より具体的には、追跡結果に従い
オクルージョンを生じない画像のうちの最適な画像が、
それぞれの手に対して選択される。この視点選択のメカ
ニズムにより、精緻な３次元モデルを用いることなく自
己オクルージョンや手同士のオクルージョンの発生を効
果的に減少させることができる。The above has described a hand gesture recognition device using multiple cameras and a virtual scene generation system that uses it. The system reconstructs hand motion (three-dimensional position and gesture) from a small number of optimal images extracted from the images of many viewpoints. More specifically, for each hand, the optimal image is selected from among the images in which no occlusion occurs, according to the tracking result. This viewpoint selection mechanism effectively reduces the occurrence of self-occlusion and of occlusion between hands without using an elaborate three-dimensional model.

【００８３】またこれにより、モデルの生成・モデルの
再構成のためのコンピュータ上のコストを削減すること
が可能となる。This also makes it possible to reduce the computational cost of generating and reconstructing models.

【００８４】今回開示された実施の形態はすべての点で
例示であって制限的なものではないと考えられるべきで
ある。本発明の範囲は上記した説明ではなくて特許請求
の範囲によって示され、特許請求の範囲と均等の意味お
よび範囲内でのすべての変更が含まれることが意図され
る。The embodiments disclosed this time are to be considered in all respects as illustrative and not restrictive. The scope of the present invention is defined by the terms of the claims, rather than the description above, and is intended to include any modifications within the scope and meaning equivalent to the terms of the claims.

【００８５】[0085]

【発明の効果】以上のようにこの発明によれば、カルマ
ンフィルタによる予測に基づいて複数のカメラのうち最
適なカメラの画像を選択して手の動きを追跡する。この
ため、オクルージョンを回避して、安定して、しかも実
時間で複数の手に対する動きを追跡・認識することが可
能となる。As described above, according to the present invention, an optimum camera image is selected from a plurality of cameras based on the prediction by the Kalman filter, and the hand movement is tracked. For this reason, it becomes possible to avoid occlusion and to track and recognize the movements of a plurality of hands stably and in real time.

【００８６】また、選択された画像の特徴点に基づき、
手同士のオクルージョンを回避するようにさらに処理対
象とする画像の絞込みをおこなう。これにより、安定し
て、しかも実時間で複数の手に対する動きを追跡・認識
することが可能となる。Also, based on the feature points of the selected image,
The image to be processed is further narrowed down so as to avoid occlusion between hands. This makes it possible to track and recognize the movements of a plurality of hands stably and in real time.

【００８７】さらに、手の形状認識にあたり、形状の画
像内の平行移動、拡大・縮小などに対して不変なＰ型フ
ーリエ記述子を用いるため、低計算コストで安定して手
の形状を認識することができる。Further, since hand shapes are recognized with a P-type Fourier descriptor, which is invariant to translation and enlargement or reduction of the shape within the image, the shape of the hand can be recognized stably at low computational cost.

[Brief description of the drawings]

【図１】本発明の実施の形態１における手振り認識装置
１０００の要部の構成を示すブロック図である。FIG. 1 is a block diagram illustrating a configuration of a main part of a hand gesture recognition device 1000 according to Embodiment 1 of the present invention.

【図２】図１に示す特徴抽出部６の構成の概要を示すブ
ロック図である。FIG. 2 is a block diagram showing an outline of a configuration of a feature extracting unit 6 shown in FIG.

【図３】特徴抽出部６の動作を説明するための図であ
る。FIG. 3 is a diagram for explaining the operation of a feature extraction unit 6;

【図４】図１に示す追跡処理部８の構成の概要を示すブ
ロック図である。FIG. 4 is a block diagram illustrating an outline of a configuration of a tracking processing unit 8 illustrated in FIG. 1;

【図５】カルマンフィルタにおける観測モデルについて
説明するための図である。FIG. 5 is a diagram for describing an observation model in a Kalman filter.

【図６】追跡処理部８および視点選択部４の有効性を説
明するための実験環境を示す図である。FIG. 6 is a diagram showing an experimental environment for explaining the effectiveness of the tracking processing unit 8 and the viewpoint selecting unit 4;

【図７】連続するシーケンスにおいて右手と左手とを追
跡した実験結果を示す図である。FIG. 7 is a diagram showing an experimental result of tracking a right hand and a left hand in a continuous sequence.

【図８】連続するシーケンスにおいて右手と左手とを追
跡した実験結果を示す図である。FIG. 8 is a diagram showing an experimental result of tracking a right hand and a left hand in a continuous sequence.

【図９】回転角検出部３１における手のひらモデルを用
いた回転角の検出について説明するための図である。FIG. 9 is a diagram for explaining detection of a rotation angle using a palm model in a rotation angle detection unit 31;

【図１０】回転角検出部３１の有効性を説明するための
実験結果を示す図である。FIG. 10 is a diagram showing experimental results for explaining the effectiveness of the rotation angle detection unit 31.

【図１１】視点選択部３２の動作について説明するため
の図である。FIG. 11 is a diagram for describing an operation of a viewpoint selection unit 32.

【図１２】視点選択部３２の有効性を説明するための実
験結果を示す図である。FIG. 12 is a diagram showing experimental results for explaining the effectiveness of the viewpoint selecting unit 32.

【図１３】手形状認識部３３の構成を示すブロック図で
ある。FIG. 13 is a block diagram illustrating a configuration of a hand shape recognition unit 33.

【図１４】輪郭線抽出部８１の動作について説明するた
めの図である。FIG. 14 is a diagram for explaining the operation of a contour line extraction unit 81.

【図１５】認識実験に使用する手形状を説明するための
図である。FIG. 15 is a diagram for explaining a hand shape used in a recognition experiment.

【図１６】図１５に示す手形状に対する認識結果を示す
図である。FIG. 16 is a diagram illustrating a recognition result for the hand shape illustrated in FIG. 15.

【図１７】手振り認識装置に基づく対話型システムにお
けるコマンドと手との関係を示す図である。FIG. 17 is a diagram illustrating a relationship between a command and a hand in an interactive system based on a hand gesture recognition device.

【図１８】手振り認識装置に基づく対話型システムにお
ける指示ポインタの状況を表わした図である。FIG. 18 is a diagram illustrating a state of an instruction pointer in an interactive system based on a hand gesture recognition device.

【図１９】手振り認識装置に基づく対話型システムにお
ける指示ポインタの状況を表わした図である。FIG. 19 is a diagram illustrating a state of an instruction pointer in an interactive system based on a hand gesture recognition device.

[Explanation of symbols]

２、２♯１〜２♯ｎカメラ４、３３視点選択部６特徴抽出部８追跡処理部１０、１２追跡部１４ジェスチャ認識部２０♯１〜２０♯ｎ領域分割部２２♯１〜２２♯ｎ主軸検出部２４♯１〜２４♯ｎ回転変換部２６♯１〜２６♯ｎ特徴点算出部３０♯１〜３０♯２３次元位置・方向検出部３１♯１〜３１♯２回転角検出部３２♯１〜３２♯２視点選択部３３♯１〜３３♯２手形状認識部８１輪郭線抽出部８２輪郭線補正部８３Ｐ型フーリエ記述部８４特徴ベクトル算出部８５確率定義部８６形状選択部１０００手振り認識装置
2, 2#1 to 2#n: camera
4, 33: viewpoint selection unit
6: feature extraction unit
8: tracking processing unit
10, 12: tracking unit
14: gesture recognition unit
20#1 to 20#n: area division unit
22#1 to 22#n: main axis detection unit
24#1 to 24#n: rotation conversion unit
26#1 to 26#n: feature point calculation unit
30#1 to 30#2: three-dimensional position/direction detection unit
31#1 to 31#2: rotation angle detection unit
32#1 to 32#2: viewpoint selection unit
33#1 to 33#2: hand shape recognition unit
81: contour extraction unit
82: contour correction unit
83: P-type Fourier description unit
84: feature vector calculation unit
85: probability definition unit
86: shape selection unit
1000: hand gesture recognition device

Continued from the front page
(51) Int.Cl.⁷, identification symbol, FI, theme code (reference): G06F 15/70 350H 460B
F-terms (reference): 2F065 AA04 AA17 AA37 AA39 AA51 CC16 FF01 FF04 JJ03 JJ05 JJ19 JJ26 QQ16 QQ17 QQ27 QQ31 QQ42; 5B057 AA20 BA02 CA01 CA12 CD03 DA08 DB02 DB05 DB08 DC05 DC09 DC17 DC34; 5L096 AA07 BA08 CA05 EA16 FA06 FA60 FA66 FA67 HA05 HA09 JA11

Claims

[Claims]

1. A hand gesture recognition device for recognizing the movements of a plurality of hands, comprising: a plurality of image pickup means for photographing the plurality of hands from different directions to obtain a plurality of images; first selecting means for selectively outputting, from the plurality of images, images for recognizing the movements of the plurality of hands; feature extraction means for extracting feature points of the images selected by the first selecting means and selecting images to be tracked based on the feature points; and a plurality of tracking processing means, provided one for each of the plurality of hands, each tracking the movement of the corresponding hand, wherein each of the plurality of tracking processing means includes: prediction means for predicting the state of the hand using the feature points of the images to be tracked; second selecting means for calculating the normal direction of the hand based on the images to be tracked and selecting the image obtained by photographing the hand from the direction closest to the normal direction; and recognizing means for recognizing the shape of the hand based on the image selected by the second selecting means, and wherein the first selecting means selects images in which the feature points are located near the predicted state.

2. The hand gesture recognition device according to claim 1, wherein the feature extraction means detects the occurrence of occlusion based on the ratio between the position of the extracted feature point and the width of the hand, and deselects an image in which occlusion is detected.

3. The hand gesture recognition device according to claim 2, wherein the recognizing means includes: contour extraction means for extracting the contour of the hand in the image selected by the second selecting means; and P-type Fourier description means for describing the contour of the hand extracted by the contour extraction means with a P-type Fourier descriptor.