JP2006244272A - Hand position tracking method, device and program

Info

Publication number: JP2006244272A
Application number: JP2005060952A
Authority: JP
Inventors: Miki Kitabata (北端美紀); Eiichi Hosoya (細谷英一); Hidenori Sato (佐藤秀則); Ikuo Harada (原田育生); Akira Onozawa (小野澤晃); Shizue Hattori (服部静枝)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-03-04
Filing date: 2005-03-04
Publication date: 2006-09-14

Abstract

PROBLEM TO BE SOLVED: To reduce erroneous recognition of similar objects in a method for determining the two-dimensional position of a hand from an image captured by a camera. SOLUTION: An image is input from an imaging unit 11, a limited region determination unit 12A determines the tracking range, and a limited region image creation unit 13 creates a camera image restricted to the information inside that range (a limited region image). A skin color tracking unit 14 then tracks the position of the hand in the limited region image according to skin color information. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an interface method and apparatus for operating computers, home appliances, and the like by the movement and position of a human hand.

Interfaces that recognize a bare hand and operate computers, home appliances, and the like by the movement and position of the hand have been studied. The expected usage scene is an ordinary room with an unconstrained background, in which several users stand some distance from the screen. It is desirable that users can operate the interface without wearing anything on their bodies. In addition, the processing must run in real time, fast enough to follow the movement of the hand.

Recognizing the position of a hand under these conditions is in fact difficult.

One approach detects features of the hand shape using shape reconstruction techniques such as stereo vision, but it is difficult to recognize the hand shape with sufficient accuracy unless the hand is captured directly in close-up. When imaging without a close-up, only the rough shape of the whole person (distance information from the camera) is available, and because a hand is hard to recognize from a rough shape alone, misrecognition is likely. In systems that perform three-dimensional pointing, the pointed position is also misrecognized because the measurement error of the fingertip position grows in proportion to the distance to the object being pointed at. Methods that detect and track the hand position mainly from skin color information likewise cannot avoid misrecognition, because objects other than the hand, such as the face and surrounding objects, also satisfy the color condition.

Here, the problems of representative prior art are described. The technique of Non-Patent Document 1 tracks an object in real time using an algorithm called Continuously Adaptive Mean Shift (CAMSHIFT). Because it tracks the designated color distribution by searching the neighborhood of the position where the designated color was detected in the previous image of the time series, it may misrecognize the target when an object of the same color is in the background. The accuracy of the color information also drops, in particular when lighting casts shadows on the object. The technique of Non-Patent Document 2 recognizes a hand using distance (the nearest object), background subtraction (objects placed between the camera and the hand are ignored), the condition that the hand is near the face, and skin color information. With face detection it is difficult to identify the operator against complicated backgrounds, for example when another person is looking over the operator's shoulder, and the method is strongly affected by face misrecognition. The technique of Non-Patent Document 3 tracks a designated color region in each image of the time series and requires a designated color marker. It relies on color information that is unlikely to appear in the background, so position recognition fails when the designated color is present in the background.
Non-Patent Document 1: Gary R. Bradski, "Real Time Face and Object Tracking as a Component of a Perceptual User Interface", 1998 IEEE, pp. 214-219
Non-Patent Document 2: Yasuaki Kaneji et al., "Identification Method of Cursor Position on Pointer Pointer", IEICE ITS Study Group, 2002.1
Non-Patent Document 3: Eiichi Hosoya et al., "Real World Interaction between Remote Locations Using Mirror Interface", FIT (Information Science and Technology Forum) 2003, pp. 591-592

An object of the present invention is to provide a hand position tracking method, apparatus, and program that reduce misrecognition of similar objects.

According to a first aspect of the present invention, a hand position tracking method comprises:
a limited region determination step of taking an image captured by imaging means as input and determining a limited region within the image that restricts the search range for the hand;
a limited region image creation step of creating a limited region image, which is image information obtained by deleting from the captured image the information outside the limited region; and
a skin color tracking step of using the limited region image as the search range and tracking the region of the image that matches skin color information specifying the color of the skin color region.

In this aspect, the range in which the hand position is tracked is determined, a camera image restricted to the information inside that range (a limited region image) is created, and the hand position is tracked in the limited region image using skin color information. Limiting the tracking range reduces the information to be processed, which lowers both the likelihood of misrecognition and the computational cost.

According to a second aspect of the present invention, the limited region determination step includes a distance information creation step of creating distance information for each pixel by stereo vision from a plurality of images obtained from first and second imaging means, and a limited region creation step of finding, from the distance information, the region of the image that matches target distance information specifying the search range and outputting it as the limited region.

In this aspect, distance information is created by stereo vision using two imaging means, and the limited region is calculated based on target distance information obtained from a target distance setting unit.

According to a third aspect of the present invention, the limited region determination step includes a motion information creation step of extracting, for a sequence of images that are consecutive in time series obtained from the imaging means, the points on the image at which the sum of the absolute differences between the pixel values of each pair of adjacent images is at or above a fixed value, and a limited region creation step of outputting as the limited region the set of connected regions, defined by the adjacency of the extracted points on the image, whose area is at or above a fixed value.

In this aspect, the image from a single imaging means is taken as input, motion information is created from the differences between images that are consecutive in time series, and the range in which there is motion is used as the limited region.

According to a fourth aspect of the present invention, a hand position tracking method comprises:
a first limited region determination step of creating distance information for each pixel by stereo vision from a plurality of images obtained from first and second imaging means, finding from the distance information the region of the image that matches target distance information specifying the search range, and outputting it as a first limited region;
a second limited region determination step of extracting, for a sequence of images that are consecutive in time series obtained from the first imaging means, the points on the image at which the sum of the absolute differences between the pixel values of each pair of adjacent images is at or above a fixed value, and outputting as a second limited region the set of connected regions, defined by the adjacency of the extracted points on the image, whose area is at or above a fixed value;
a limited region image creation step of creating a limited region image, which is image information obtained by deleting from the captured image the information outside the limited region formed by the intersection of the first and second limited regions; and
a skin color tracking step of using the limited region image as the search range and tracking the region of the image that matches skin color information specifying the color of the skin color region.

In this aspect, motion information and distance information are used together, so that the search range is restricted to regions that are in motion and within a given distance range.

By limiting the tracking range, the present invention reduces the misrecognition of similarly colored objects, which is a weakness of color-based methods, and also lowers the computational cost.

Next, embodiments of the present invention are described with reference to the drawings.

[First Embodiment]
Referring to FIG. 1, the hand position tracking device according to the first embodiment of the present invention comprises an imaging unit 11, a limited region determination unit 12A, a limited region image creation unit 13, a skin color tracking unit 14, and a display unit 15.

In this embodiment, an image from the imaging unit 11 is taken as input, the user determines the range to be tracked, a camera image restricted to the information inside that range (a limited region image) is created, and the hand position is tracked in the limited region image using skin color information. Rather than determining the limited region automatically from motion or distance information, this embodiment specifies the region manually (by file input or through the application's user interface). It is useful, for example, when only areas containing menu icons are to be recognized.

This embodiment is intended in particular for a mirror interface, and is one way of touching menu icons on the screen (objects that issue a command when touched). The menu is made the recognition target because finding a hand elsewhere is pointless when the application accepts commands only from the menu. The embodiment can also be used apart from menu icon locations: when the system is installed where a frequently moving door is in the background, people are known to pass by, and there is no need to recognize a hand at that location, designating everything except the door's position as the recognition target lets the opening and closing of the door and the people in the background be ignored as noise.

Next, the function of each unit is described with reference also to the flowchart of FIG. 2.

The limited region determination unit 12A lets the user designate specific regions of the screen as limited regions in advance (steps 101 and 102). A rectangular limited region can be specified by giving the coordinates of the two end points of a diagonal, or by giving the coordinates of one point together with the rectangle's width and height. A non-rectangular limited region can be specified by giving its coordinates, size, and so on. When the hand position tracking device is used as a user interface, a range containing the command areas on the screen that the user can select (icons and the like) can be set as the limited region in advance. In the output of the limited region determination unit 12A, pixels inside the limited region have the value "1" and all other pixels have the value "0". In FIG. 1, three square areas are the limited regions.
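The two ways of specifying a rectangular limited region, and the resulting 1/0 output, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `rect_from_corners` and `make_mask`, the nested-list image layout, and the choice to treat both diagonal corners as inclusive are all assumptions made for the example.

```python
def rect_from_corners(p1, p2):
    """Convert two diagonal corners (assumed inclusive) to (x, y, width, height)."""
    (x1, y1), (x2, y2) = p1, p2
    x, y = min(x1, x2), min(y1, y2)
    return x, y, abs(x2 - x1) + 1, abs(y2 - y1) + 1


def make_mask(width, height, rects):
    """Build a binary mask: 1 inside any rectangle, 0 everywhere else."""
    mask = [[0] * width for _ in range(height)]
    for x, y, w, h in rects:
        # Clip each rectangle to the image bounds before filling it.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                mask[row][col] = 1
    return mask


# Two hypothetical icon regions on an 8x4 screen: one given by its
# diagonal corners, the other by one point plus width and height.
icons = [rect_from_corners((0, 0), (1, 1)), (5, 1, 2, 2)]
mask = make_mask(8, 4, icons)
```

Converting both input styles to one `(x, y, width, height)` form keeps the mask-building step independent of how the user entered the region.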

The imaging unit 11 is a camera that captures an image of the human body, including the hand (step 103).

The limited region image creation unit 13 takes the camera image (reference image) from the imaging unit 11 (camera) as input and, using the limited region created by the limited region determination unit 12A, creates a limited region image that carries information only inside the limited region (step 104). In the limited region image, each pixel inside the limited region holds the color pixel value at the same position in the camera image (reference image), and each pixel outside the limited region has the color pixel value "0". In FIG. 1, part of the hand appears in the rightmost of the three square limited regions.
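The masking step described here can be sketched in a few lines. The function name `limited_region_image` and the representation of a color image as nested lists of `(R, G, B)` tuples are assumptions made for this example only.

```python
def limited_region_image(image, mask):
    """Keep the colour pixel where mask is 1; set every other pixel to (0, 0, 0)."""
    return [
        [pixel if keep else (0, 0, 0) for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# A hypothetical 2x2 camera image and a mask selecting the diagonal pixels.
image = [[(200, 150, 120), (10, 20, 30)],
         [(90, 80, 70), (255, 255, 255)]]
mask = [[1, 0],
        [0, 1]]
limited = limited_region_image(image, mask)
```

Because the output has the same size as the camera image, a tracker written for full frames can consume it unchanged.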

The skin color tracking unit 14 tracks the hand position by skin color in the limited region image created by the limited region image creation unit 13 (step 105), and the result is displayed on the display unit 15 (step 106). The skin color tracking unit 14 can be implemented with an existing method such as CAMSHIFT.
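To give a feel for how a color tracker of this family moves its search window, the sketch below runs a plain mean-shift iteration over a map of skin-color weights. It is a simplification and not the full CAMSHIFT algorithm (which also adapts the window size and works on a hue histogram back-projection). The name `mean_shift`, the weight map, and the fixed window size are assumptions for illustration.

```python
import math


def mean_shift(weights, window, max_iter=20):
    """Move an (x, y, w, h) window onto the weighted centroid until it stops moving."""
    height, width = len(weights), len(weights[0])
    x, y, w, h = window
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                p = weights[row][col]
                total += p
                sum_x += p * col
                sum_y += p * row
        if total == 0:
            break  # no skin-coloured pixels inside the window
        # Re-centre the window on the centroid, rounding half up.
        new_x = math.floor(sum_x / total - (w - 1) / 2 + 0.5)
        new_y = math.floor(sum_y / total - (h - 1) / 2 + 0.5)
        new_x = min(max(new_x, 0), width - w)
        new_y = min(max(new_y, 0), height - h)
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return x, y, w, h


# Hypothetical 7x7 weight map with a 3x3 skin-coloured blob at rows 2-4, cols 3-5.
weights = [[1.0 if 2 <= r <= 4 and 3 <= c <= 5 else 0.0 for c in range(7)]
           for r in range(7)]
tracked = mean_shift(weights, (1, 1, 3, 3))
```

In a limited region image, pixels outside the limited region are zero and therefore contribute no weight, which is how restricting the region keeps the window from drifting onto similarly colored background objects.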

Instead of the user designating the limited region, the limited region determination unit 12A may read limited region designation information (coordinates and the like that specify the region) from a file in which it has been saved.

By restricting tracking to the regions needed for the user interface, this embodiment reduces computational cost, reduces misrecognition, and makes command selection reliable.

[Second Embodiment]
Referring to FIG. 3, the hand position tracking device according to the second embodiment of the present invention comprises imaging units 11₁ and 11₂, a limited region determination unit 12B, a limited region image creation unit 13, a skin color tracking unit 14, and a display unit 15. As shown in FIG. 4, the limited region determination unit 12B comprises a distance information creation unit 21, a target distance setting unit 22, and a limited region creation unit 23.

The limited region image creation unit 13, the skin color tracking unit 14, and the display unit 15 are the same as in the first embodiment.

In this embodiment, distance information is created by stereo vision using two imaging units, and the limited region is calculated based on target distance information obtained from the target distance setting unit 22.

Commercially available cameras can be used as the imaging units 11₁ and 11₂, for example the STC-R640 from Sensor Technology Co., Ltd. When distance information is created with two commercially available cameras, non-interlaced cameras that support synchronization control are used. A commercially available multi-lens camera with several cameras pre-installed can also be used, for example a camera calibrated in advance for stereo vision such as the Digiclops from Point Grey Research Inc.

The target distance setting unit 22 sets the target distance range in which the hand is recognized (the distance from the camera and its extent). The target distance range specifies which part of the camera-to-object distance information obtained by stereo vision is of interest, and consists of the range between two points, distance 1 (near) and distance 2 (far) from the camera (see FIG. 6). One setting interface is to give the two distances as numerical values (for example, between 1 m and 3 m). The "distance information", by contrast, is the depth map of stereo vision (depth information for the whole image seen by the camera). The distance information of each pixel is converted to luminance to form a grayscale image in which near points are bright (white) and far points are dark (black). This graded image is the distance image.

The configuration of the target distance setting unit 22 is not limited to the above. Other methods include presenting a pattern (object) at two points in front of the camera and using the distances measured by stereo vision as the two ends of the range, and creating a distance image from an arbitrarily captured image and determining the range by trial and error while viewing the distance image.

Another way to set the target distance is to use stereo vision dynamically: the target distance is updated for each image of the time series as a range of so many meters measured from the distance of the pixel nearest the imaging unit. The distance range can also be set dynamically by specifying a fixed portion, as a percentage, of the range between the nearest and farthest pixels.
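The two dynamic range-setting methods can be sketched as below. The function names, the offsets in meters, and the use of a per-pixel distance map in meters are assumptions for this example; the patent does not specify concrete values.

```python
def range_from_nearest(distances, start_offset, end_offset):
    """Target range measured from the nearest pixel, e.g. 0.0 m to 0.5 m behind it."""
    nearest = min(min(row) for row in distances)
    return nearest + start_offset, nearest + end_offset


def range_from_percent(distances, lo_pct, hi_pct):
    """Target range as a percentage band between the nearest and farthest pixels."""
    flat = [d for row in distances for d in row]
    nearest, farthest = min(flat), max(flat)
    span = farthest - nearest
    return nearest + span * lo_pct / 100, nearest + span * hi_pct / 100


# Hypothetical 2x2 distance map in metres, recomputed for every frame.
distances = [[1.0, 1.5],
             [3.0, 5.0]]
by_offset = range_from_nearest(distances, 0.0, 0.5)   # (1.0, 1.5)
by_percent = range_from_percent(distances, 0, 25)     # (1.0, 2.0)
```

Either function would be called once per frame, so the range follows a user who moves toward or away from the camera.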

The target distance Z can be converted to a disparity value d by d = f × B / Z, using the focal length f of the cameras and the baseline distance B between them, both used by the distance information creation unit 21 described below (see FIG. 7). Recording the range as disparity values reduces the processing load of the limited region creation unit 23.
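As a worked example of the conversion d = f × B / Z, the sketch below turns a distance range into a disparity range once, so that later per-pixel tests can compare disparities directly without a division per pixel. The function name and the numeric values (focal length in pixels, baseline in meters) are assumptions chosen for illustration.

```python
def distance_range_to_disparity(z_near, z_far, focal_px, baseline_m):
    """Convert a distance range [z_near, z_far] to a disparity range (d_min, d_max)."""
    # Disparity is inversely proportional to distance, so the order flips:
    # the near limit gives the largest disparity.
    d_max = focal_px * baseline_m / z_near
    d_min = focal_px * baseline_m / z_far
    return d_min, d_max


# Hypothetical camera: f = 640 px, B = 0.125 m, target range 1 m to 4 m.
d_min, d_max = distance_range_to_disparity(1.0, 4.0, focal_px=640, baseline_m=0.125)
# d_min == 20.0 px (at 4 m), d_max == 80.0 px (at 1 m)
```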

The distance information creation unit 21 performs stereo vision: it determines, from image similarity, the positions at which the same point appears in two or more images, and from the positional shift between the images (disparity) obtains the distance from the camera at each point of the reference image by the principle of triangulation (step 107 in FIG. 5).

The following describes the case of two cameras (imaging means) placed in parallel. The disparity between two camera images captured by camera 11₁ (right) and camera 11₂ (left), which are time-synchronized and placed horizontally in parallel a fixed distance apart, is used as the distance information from the camera to the imaged object.

Disparity can be calculated, for example, as follows. Taking camera 11₁ as the reference, block matching is performed between the camera image captured by the reference camera 11₁ (the reference image) and the camera image captured by camera 11₂: the similarity between small image regions (blocks) of a given size is computed to find corresponding pixels, which yields the disparity. When the disparity is d, the distance Z between the imaging means and the imaged object satisfies Z = f × B / d (reference: Saburo Tsuji and Gang Xu, "3D Vision", Kyoritsu Shuppan, pp. 95-97). Here d = u − u', B is the distance between camera 11₁ and camera 11₂, and f is the focal length of the cameras. FIG. 7 illustrates the relationship. The left image and right image are the images seen by the left and right cameras, respectively.
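Block matching along one image row can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: the similarity measure is the sum of absolute differences (SAD), the images are rectified so that matches lie on the same row, and a point at column u in the right (reference) image appears at column u + d in the left image. The function names and sample values are hypothetical.

```python
def match_disparity(ref_row, other_row, u, half_block, max_disparity):
    """Return the shift d (in pixels) whose block in other_row has the lowest SAD cost."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if u + d + half_block >= len(other_row):
            break  # the shifted block would leave the image
        cost = sum(
            abs(ref_row[u + k] - other_row[u + d + k])
            for k in range(-half_block, half_block + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Z = f * B / d, the relation given in the text; zero disparity maps to infinity."""
    return float("inf") if d == 0 else focal_px * baseline_m / d


right_row = [0, 0, 9, 5, 9, 0, 0, 0, 0, 0]   # reference camera (right)
left_row = [0, 0, 0, 0, 0, 9, 5, 9, 0, 0]    # other camera (left)
d = match_disparity(right_row, left_row, u=3, half_block=1, max_disparity=5)
z = depth_from_disparity(d, focal_px=640, baseline_m=0.125)
```

Here the pattern centered at column 3 of the reference row is found three pixels to the right in the other row, so d is 3 and Z is 80 / 3, roughly 26.7 in the units of the baseline.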

The distance image is generated by expressing the disparity between the right and left camera images captured at the same time as the pixel value of each pixel over the whole image. A larger disparity value means the person is closer to camera 11₁, and a smaller value means the person is farther from camera 11₁.

The configuration of the distance information creation unit 21 is not limited to the above. For example, when a commercially available stereo camera such as the Digiclops is used, the distance information can also be created by the stereo camera and its bundled software.

FIG. 6 illustrates the operation of the limited region creation unit 23. The limited region creation unit 23 takes as input the distance image created by the distance information creation unit 21 and the target distance range set by the target distance setting unit 22, and, using the distance information expressed as the pixel value of each pixel of the distance image, extracts only the pixels whose distance (disparity) lies within the range set by the target distance setting unit 22 (step 108 in FIG. 5).

For example, pixels within the range set by the target distance setting unit 22 are given the value "1", meaning they lie within the range where the hand is, and all other pixels are given the value "0", meaning they lie outside it. This produces a limited region restricted to the given distance range.
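The thresholding step can be sketched as a single pass over the distance image. The function name, the inclusive bounds, and the sample disparity values are assumptions for this example.

```python
def limited_region_from_disparity(disparity_image, d_min, d_max):
    """Return a mask with 1 where d_min <= disparity <= d_max and 0 elsewhere."""
    return [
        [1 if d_min <= value <= d_max else 0 for value in row]
        for row in disparity_image
    ]


# Hypothetical disparity image: large values are near the camera, small values far.
disparity_image = [[90, 60, 10],
                   [45, 20, 5]]
region = limited_region_from_disparity(disparity_image, d_min=20, d_max=80)
```

With the range 20 to 80, the pixel that is too close (90) and the two that are too far (10 and 5) are excluded.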

FIG. 8 shows how a limited region restricted to the target distance is obtained from camera image 1 (right) and camera image 2 (left), and how a limited region image carrying image information only inside the limited region is then obtained.

[Third Embodiment]
Referring to FIG. 9, the hand position tracking device according to the third embodiment of the present invention comprises an imaging unit 11, a limited region determination unit 12C, a limited region image creation unit 13, a skin color tracking unit 14, and a display unit 15. As shown in FIG. 10, the limited region determination unit 12C comprises a motion information creation unit 31 and a limited region creation unit 32.

In this embodiment, as shown in FIG. 11, the image (A_t) from a single imaging unit (camera) is taken as input, the limited region determination unit 12C creates motion information from the differences between images that are consecutive in time series (step 109), and the range in which there is motion is used as the limited region (step 110).

After the limited region is calculated, the generation of the limited region image by the limited region image creation unit 13 and the skin color tracking by the skin color tracking unit 14 are the same as in the first embodiment.

The motion information creation unit 31 and the limited region creation unit 32 of the limited region determination unit 12C are described in detail below with reference to FIG. 12.

The motion information creation unit 31 takes as input a plurality (k + 1) of captured images that are consecutive in time series ($A_{t-k}, A_{t-k+1}, \ldots, A_t$), computes the absolute difference of each pair of images adjacent (consecutive) in the time series ($|A_{t-k+1} - A_{t-k}|, \ldots, |A_t - A_{t-1}|$), and obtains the image consisting of their pixel-wise sum,

$$S_t = \sum_{i=0}^{k-1} \left| A_{t-i} - A_{t-i-1} \right|$$

It then creates a bitmap $B_t$ in which each pixel whose value in $S_t$ exceeds a predetermined value is set to 1 and every other pixel is set to 0.
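The computation of the motion bitmap can be sketched as follows: the absolute differences of consecutive frames are accumulated pixel by pixel, and the accumulated image is thresholded. The function name `motion_bitmap`, the grayscale nested-list frames, and the threshold value are assumptions for illustration.

```python
def motion_bitmap(frames, threshold):
    """frames: k+1 grayscale images, oldest first. Returns the 1/0 motion bitmap."""
    height, width = len(frames[0]), len(frames[0][0])
    accumulated = [[0] * width for _ in range(height)]
    # Sum |A_j - A_(j-1)| over every pair of consecutive frames.
    for prev, cur in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                accumulated[y][x] += abs(cur[y][x] - prev[y][x])
    # A pixel is marked as moving only if its accumulated difference exceeds the threshold.
    return [[1 if value > threshold else 0 for value in row] for row in accumulated]


# Three hypothetical 2x2 frames (k = 2): one pixel changes strongly, one only slightly.
frames = [
    [[10, 10], [10, 10]],
    [[10, 40], [10, 12]],
    [[10, 10], [10, 13]],
]
bitmap = motion_bitmap(frames, threshold=20)
```

The strongly changing pixel accumulates 30 + 30 = 60 and is marked as moving, while the slightly changing one accumulates only 2 + 1 = 3 and is not.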

The configuration of the motion information creation unit 31 is not limited to the above. For example, whereas the above detects moving regions from the sum of absolute image differences $S_t$, they can also be detected from the sum of squared differences

$$\sum_{i=0}^{k-1} \left( A_{t-i} - A_{t-i-1} \right)^2$$

Motion can also be detected by background subtraction, the difference between a background image captured in advance and the camera input image, or by computing optical flow to find the moving pixels.

The limited region creation unit 32 partitions the pixels set in the bitmap B_t created by the motion information creation unit 31 into sets of mutually adjacent pixels (connected sets). Any connected set whose pixel count is at or below a predetermined threshold is regarded as noise and treated the same as a region without motion (for example, by setting its values to "0"), and the remaining sets form the limited region. In other words, the bitmap is labeled, the area of each region delimited as a single object is measured, and regions whose area is too small are ignored.
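The labeling and small-area removal can be sketched with a breadth-first search over the bitmap. This is one possible way to do it, not necessarily the one used in the patent: 4-connectivity, the function name `remove_small_regions`, and the threshold value are assumptions for the example.

```python
from collections import deque


def remove_small_regions(bitmap, noise_threshold):
    """Keep only connected regions with more than noise_threshold pixels."""
    height, width = len(bitmap), len(bitmap[0])
    seen = [[False] * width for _ in range(height)]
    result = [[0] * width for _ in range(height)]
    for sy in range(height):
        for sx in range(width):
            if bitmap[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Collect one connected region by breadth-first search (4-connectivity).
            component, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions at or below the threshold are treated as noise and left at 0.
            if len(component) > noise_threshold:
                for y, x in component:
                    result[y][x] = 1
    return result


bitmap = [[1, 1, 0, 0],
          [1, 0, 0, 1],
          [0, 0, 0, 0]]
cleaned = remove_small_regions(bitmap, noise_threshold=1)
```

The three-pixel region in the top-left corner survives, while the isolated pixel on the right is discarded as noise.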

FIG. 13 shows how a limited region restricted to the range in which there is motion is obtained from camera image A_t and camera image A_t-1, and how a limited region image carrying image information only inside that moving region is then obtained.

According to this embodiment, determining the limited region from motion makes it possible to ignore an object whose hue is close to skin color even when it lies in the same distance range from the camera as the hand, provided the object is stationary or moves only slightly, so false detection of that skin-colored object is suppressed.

[Fourth Embodiment]
Referring to FIG. 14, the hand position tracking device of the fourth embodiment of the present invention comprises imaging units 11₁ and 11₂, the limited area determination units 12B and 12C of the second and third embodiments respectively, an AND circuit 16, a limited area image creation unit 13, a skin color tracking unit 14, and a display unit 15.

In this embodiment, in addition to the content of the first embodiment, the limited area determination unit 13 described in the first embodiment uses motion information and distance information together, so that the search range is limited to regions that are in motion and lie within a fixed distance range (steps 102B, 102C, 111).

That is, the logical product (AND) of the limited areas of the second and third embodiments, output by the limited area determination units 12B and 12C respectively, is output as the limited area.
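As a minimal sketch of this logical product, assuming that both limited areas are binary masks of the same size stored as lists of rows (the function and parameter names are illustrative only):

```python
def and_masks(distance_mask, motion_mask):
    """Pixel-wise logical product of two binary masks of identical shape."""
    if len(distance_mask) != len(motion_mask):
        raise ValueError("masks must have the same number of rows")
    result = []
    for d_row, m_row in zip(distance_mask, motion_mask):
        if len(d_row) != len(m_row):
            raise ValueError("mask rows must have the same length")
        # A pixel stays in the limited area only if both masks select it.
        result.append([1 if d and m else 0 for d, m in zip(d_row, m_row)])
    return result
```

A pixel therefore remains in the search range only when it is both within the target distance range and in motion.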

FIG. 16 shows that a distance-based limited area is created from camera image 1 (right) and camera image 2 (left), a motion-based limited area is created from camera image A_t and camera image A_t-1, the product of the two limited areas is taken, and the limited area image creation unit 13 obtains a limited area image that holds image information only in regions that are in motion and within a fixed distance range.
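Once the limited area image is available, the skin color search runs only over the pixels that survived. The sketch below is one possible reading and not the tracking method of the embodiment: it assumes (R, G, B) pixels in the range 0 to 255, treats a hue interval in degrees as the skin color information, and reports the centroid of the matching pixels as the hand position. The hue interval, the saturation floor, and the use of a centroid are all assumptions chosen for illustration.

```python
import colorsys


def track_skin_color(limited_image, hue_range=(0.0, 50.0), min_saturation=0.2):
    """Return the (x, y) centroid of pixels whose hue lies in `hue_range`
    (degrees), or None when no pixel matches."""
    lo, hi = hue_range
    sum_x = sum_y = count = 0
    for y, row in enumerate(limited_image):
        for x, (r, g, b) in enumerate(row):
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # The saturation floor rejects the black pixels left where
            # information outside the limited area was deleted.
            if s >= min_saturation and lo <= h * 360.0 <= hi:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)
```

Because deleted pixels never match, a skin-colored object outside the limited area cannot pull the estimated position away from the hand.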

A program for realizing the functions of the hand position tracking device described above may be recorded on a computer-readable recording medium, and the program recorded on that medium may be read into a computer system and executed. A computer-readable recording medium here means a recording medium such as a flexible disk, a magneto-optical disk, or a CD-ROM, or a storage device such as a hard disk drive built into a computer system. The term also covers a medium that holds the program dynamically for a short time (a transmission medium or transmission wave), as when the program is transmitted over the Internet, and a medium that holds the program for a certain period, such as the volatile memory inside the computer system that serves as the server in that case.

FIG. 1 is a block diagram of the hand position tracking device of the first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing flow of the hand position tracking device of the first embodiment.
FIG. 3 is a block diagram of the hand position tracking device of the second embodiment of the present invention.
FIG. 4 is a block diagram of the limited area determination unit in FIG. 3.
FIG. 5 is a flowchart showing the processing flow of the hand position tracking device of the second embodiment.
FIG. 6 is an explanatory diagram of limited area creation in the second embodiment.
FIG. 7 is an explanatory diagram of the parallax calculation method.
FIG. 8 is an explanatory diagram of the process up to limited area image creation in the second embodiment.
FIG. 9 is a block diagram of the hand position tracking device of the third embodiment of the present invention.
FIG. 10 is a block diagram of the limited area determination unit in FIG. 9.
FIG. 11 is a flowchart showing the processing flow of the hand position tracking device of the third embodiment.
FIG. 12 is an explanatory diagram of limited area creation in the third embodiment.
FIG. 13 is an explanatory diagram of the process up to limited area image creation in the third embodiment.
FIG. 14 is a block diagram of the hand position tracking device of the fourth embodiment of the present invention.
FIG. 15 is a flowchart showing the processing flow of the hand position tracking device of the fourth embodiment.
FIG. 16 is an explanatory diagram of the process up to limited area image creation in the fourth embodiment.

Explanation of symbols

11, 11₁, 11₂ Imaging unit
12A, 12B, 12C Limited area determination unit
13 Limited area image creation unit
14 Skin color tracking unit
15 Display unit
16 AND circuit
21 Distance information creation unit
22 Target distance setting unit
23 Limited area creation unit
31 Motion information creation unit
32 Limited area creation unit
101 to 111 Steps

Claims

A hand position tracking method for tracking the position of a hand by tracking a skin color region in a captured moving image obtained by imaging an imaging target with an imaging means, the method comprising:
a limited area determination step of taking the image captured by the imaging means as input and determining a limited area in the image for limiting the search range for the hand;
a limited area image creation step of creating a limited area image, which is image information obtained by deleting from the captured image the information outside the limited area; and
a skin color tracking step of tracking, with the limited area image as the search range, a region of the image that matches skin color information specifying the color of the skin color region.

The hand position tracking method according to claim 1, wherein the limited area determination step includes a distance information creation step of creating distance information for each pixel of the image, by a stereo vision technique, from a plurality of images obtained from first and second imaging means, and a limited area creation step of obtaining, from the distance information, the region of the image that matches target distance information specifying the search range and outputting that region as the limited area.

The hand position tracking method according to claim 1, wherein the limited area determination step includes a motion information creation step of extracting, for a plurality of images obtained consecutively in time series from the imaging means, the points on the image at which the sum of absolute differences of pixel values between every two adjacent images is equal to or greater than a certain value, and a limited area creation step of outputting as the limited area the set of connected regions, defined by the adjacency of the extracted points on the image, whose area is equal to or greater than a certain value.

A hand position tracking method for tracking the position of a hand by tracking a skin color region in a captured moving image obtained by imaging an imaging target with an imaging means, the method comprising:
a first limited area determination step of creating distance information for each pixel of the image, by a stereo vision technique, from a plurality of images obtained from first and second imaging means, obtaining from the distance information the region of the image that matches target distance information specifying the search range, and outputting that region as a first limited area;
a second limited area determination step of extracting, for a plurality of images obtained consecutively in time series from the first imaging means, the points on the image at which the sum of absolute differences of pixel values between every two adjacent images is equal to or greater than a certain value, and outputting as a second limited area the set of connected regions, defined by the adjacency of the extracted points on the image, whose area is equal to or greater than a certain value;
a limited area image creation step of creating a limited area image, which is image information obtained by deleting from the captured image the information outside the limited area that is the intersection of the first and second limited areas; and
a skin color tracking step of tracking, with the limited area image as the search range, a region of the image that matches skin color information specifying the color of the skin color region.

A hand position tracking device for tracking the position of a hand by tracking a skin color region in a captured moving image obtained by imaging an imaging target with an imaging means, the device comprising:
limited area determination means for taking the image captured by the imaging means as input and determining a limited area in the image for limiting the search range for the hand;
limited area image creation means for creating a limited area image, which is image information obtained by deleting from the captured image the information outside the limited area;
skin color information setting means for holding the skin color information to be searched for; and
skin color tracking means for tracking, with the limited area image as the search range, a region of the image that matches the skin color information specifying the color of the skin color region.

The hand position tracking device according to claim 5, wherein the limited area determination means includes distance information creation means for creating distance information for each pixel of the image, by a stereo vision technique, from a plurality of images obtained from first and second imaging means, target distance setting means for holding target distance information specifying the search range, and limited area creation means for obtaining, from the distance information, the region of the image that matches the target distance information and outputting that region as the limited area.

The hand position tracking device according to claim 5, wherein the limited area determination means includes motion information creation means for extracting, for a plurality of images obtained consecutively in time series from the imaging means, the points on the image at which the sum of absolute differences of pixel values between every two adjacent images is equal to or greater than a certain value, and limited area creation means for outputting as the limited area the set of connected regions, defined by the adjacency of the extracted points on the image, whose area is equal to or greater than a certain value.

A hand position tracking device for tracking the position of a hand by tracking a skin color region in a captured moving image obtained by imaging an imaging target with an imaging means, the device comprising:
target distance creation means for creating distance information for each pixel of the image, by a stereo vision technique, from a plurality of images obtained from first and second imaging means;
first limited area determination means including limited area creation means for obtaining, from the distance information, the region of the image that matches target distance information specifying the search range and outputting that region as a first limited area;
second limited area determination means including limited area creation means for extracting, for a plurality of images obtained consecutively in time series from the first imaging means, the points on the image at which the sum of absolute differences of pixel values between every two adjacent images is equal to or greater than a certain value, and outputting as a second limited area the set of connected regions, defined by the adjacency of the extracted points on the image, whose area is equal to or greater than a certain value;
limited area image creation means for creating a limited area image, which is image information obtained by deleting from the captured image the information outside the limited area that is the intersection of the first and second limited areas;
skin color information setting means for holding the skin color information to be searched for; and
skin color tracking means for tracking, with the limited area image as the search range, a region of the image that matches the skin color information specifying the color of the skin color region.

A hand position tracking program for causing a computer to execute the hand position tracking method according to claim 1.