JP2005321966A

JP2005321966A - Interface method, device and program

Info

Publication number: JP2005321966A
Application number: JP2004138756A
Authority: JP
Inventors: Hidekazu Hosoya; 英一細谷; Hidenori Sato; 秀則佐藤; Yoshinori Kitahashi; 美紀北端; Ikuo Harada; 育生原田; Akira Onozawa; 晃小野澤; Hisao Nojima; 久雄野島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-05-07
Filing date: 2004-05-07
Publication date: 2005-11-17
Anticipated expiration: 2024-05-07
Also published as: JP4221330B2

Abstract

PROBLEM TO BE SOLVED: To provide an interface method and device that realize three-dimensional pointing with excellent usability.
SOLUTION: A three-dimensional pointing direction information calculation unit 2 determines, from two or more input images, three-dimensional information on the direction indicated by the user 6. A spatial information registration unit 4 registers, in a spatial information database 5, information on objects in real space that the user 6 may point to. A voting processing unit 31 works on a ballot box data memory 33 that holds one ballot box for each voxel of a voxel space obtained by dividing the real space. Based on the three-dimensional pointing direction information, it votes into the ballot boxes of the voxels crossed by the extension of the direction indicated by the user. A detection processing unit 32 detects the voxel at the three-dimensional position indicated by the user 6 from the vote values recorded in the ballot boxes and the registered object information.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an interface method and apparatus that take images captured by a plurality of cameras as input. They detect the position or object in real space that an operator (user) indicates with a part of the body (finger, arm, etc.) without wearing any device.

Several conventional methods and apparatuses already provide a human-computer interface based on three-dimensional human movements. They are described below.

As a first conventional technique, a sensor that can measure movement is attached to the body, and the apparatus detects movement from the sensor data. For example, Ubi-Finger by Tsukada et al. (Non-Patent Document 1) recognizes gestures through a glove fitted with an acceleration sensor or similar device and worn on the hand.

As a second conventional technique, there is the non-contact hand reader of Fukumoto et al. (Non-Patent Document 2). This method connects a coordinate position computed inside the body (the "virtual projection center") with the detected fingertip position. The point where the extended pointing axis intersects the screen becomes the cursor (pointed) position. The virtual projection center is obtained as the intersection of the straight lines extended from the four corners of the screen through the fingertip. The method is therefore non-contact and requires nothing to be worn, but it can only indicate positions on the screen. The fingertip is located by detecting the object closest to the screen, using two cameras, one of which is mounted to capture images from above.

As a third conventional technique, there is the finger-pointing pointer of Kinji et al. (Non-Patent Document 3). This method uses a distance image generated with a multi-lens stereo camera to find the fingertip, which it takes to be the object closest to the screen (the display showing the input image). It then detects the position between the eyebrows (the eyes) from color and distance information. The point where the line through these two positions intersects the screen becomes the cursor (pointed) position. The method is non-contact and requires nothing to be worn, but again it can only indicate positions on the screen.
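The two screen-based methods share one geometric step. They extend the line from a reference point on the body (the virtual projection center, or the eyes) through the fingertip until it meets the screen plane. Below is a minimal Python sketch of that ray-plane intersection. It assumes the screen is modeled as an infinite plane given by a point and a normal vector. The function `screen_hit` and the example coordinates are illustrative and do not come from the cited papers. The sketch does not check whether the hit point lies inside the physical screen rectangle.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def screen_hit(start: Vec3, tip: Vec3, plane_point: Vec3, plane_normal: Vec3,
               eps: float = 1e-9) -> Optional[Vec3]:
    """Intersect the ray start -> tip (extended beyond tip) with the screen plane.

    Returns None if the ray is parallel to the plane or points away from it.
    """
    d = sub(tip, start)                     # pointing direction
    denom = dot(plane_normal, d)
    if abs(denom) < eps:
        return None                         # ray parallel to the screen
    t = dot(plane_normal, sub(plane_point, start)) / denom
    if t < 0:
        return None                         # screen lies behind the start point
    return (start[0] + t * d[0], start[1] + t * d[1], start[2] + t * d[2])
```

As an example, put the eyes at (0, 1.6, 2.0), the fingertip at (0.1, 1.5, 1.5), and the screen in the plane z = 0. The line then meets the screen at approximately (0.4, 1.2, 0).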

As a fourth conventional technique, there is the arm-pointing gesture interface of Yamamoto et al. (Non-Patent Document 4). This method recognizes the direction in which the user points an arm in real space, using a group of stereo cameras and a server computer that integrates their outputs. It is non-contact, requires nothing to be worn, and can point to positions in real space.

Non-Patent Document 1: Tsukada et al., "Ubi-Finger: A Study of a Mobile-Oriented Gesture Input Device," Transactions of Information Processing Society of Japan, Dec. 2002.
Non-Patent Document 2: Fukumoto et al., "Non-Contact Hand Reader Using Moving Image Processing," Proceedings of the 7th Human Interface Symposium, pp. 427-432, 1991.
Non-Patent Document 3: Kinji et al., IEICE Technical Committee on Image Engineering, Jan. 2002.
Non-Patent Document 4: Yamamoto et al., "Arm-Pointing Gesture Interface Using Ubiquitous Stereo Vision," 9th Image Sensing Symposium, June 2003.

However, the conventional methods described above have the following problems.

1) The first conventional method is a pointing technique that recognizes hand or finger movements and can point to a three-dimensional position in real space. However, some device must always be worn on part of the body, so it lacks the convenience required of a practical interface device.

2) The second and third conventional methods are pointing techniques that let the user indicate a position on the screen through three-dimensional movements, without wearing anything and without contact. However, they can indicate positions only on the screen, so a three-dimensional position or object in real space cannot be indicated directly.

3) The fourth conventional method is a pointing technique that lets the user point to a three-dimensional position in real space through three-dimensional movements, without wearing anything and without contact. It uses voting to determine the pointed position. However, it does not use accuracy-improving techniques such as voting into the region around the pointing direction or varying the weights of the vote values.

An object of the present invention is to provide an interface method, apparatus, and program that solve three problems of the conventional methods described above:
- wearable methods lack convenience;
- screen-based methods can point on a screen but cannot point three-dimensionally into real space;
- the voting-based method can point three-dimensionally into real space, but it does not improve performance by voting into the region around the pointing direction.

To achieve the above object, the interface method of the present invention comprises:
a spatial information registration step of registering three-dimensional position information covering part or all of each object in real space, together with additional information about the object;
a three-dimensional pointing direction information calculation step that works from a plurality of input images captured by a plurality of cameras. It calculates the three-dimensional coordinates of a start point and an end point on a part of the operator's body, which together define the direction the operator indicates, and thereby obtains three-dimensional pointing direction information in real space; and
a three-dimensional pointed position detection step that uses the obtained three-dimensional pointing direction information and the registered object information. It detects information on where the extension of the direction indicated by the operator intersects a registered object (three-dimensional pointed position information). This step includes two sub-steps:
- a voting step that works on a ballot box data memory containing one ballot box for each voxel of a voxel space obtained by dividing the real space. Based on the three-dimensional pointing direction information, it votes into the ballot boxes of the voxels crossed by the extension of the indicated direction;
- a detection step of detecting the voxel at the three-dimensional position indicated by the operator, using the vote values recorded in the ballot boxes and the registered object information.

Because nothing needs to be worn, user convenience is improved. The user's pointing direction is detected and the pointed position is derived from it, so three-dimensional pointing is achieved. The target of three-dimensional pointing can be a position in real space, not only on a screen, which broadens the range of applications. Voting also allows the position the user intends to point to to be recognized accurately. Problems 1) and 2) above are therefore solved.

According to an embodiment of the present invention, the voting step also votes into voxels adjacent to the intersecting voxels. The vote values for these voxels are preset, with a different weight for each voxel. The detection step detects the voxel at the operator's three-dimensional pointed position as the voxel whose accumulated vote value exceeds a predetermined threshold, where votes accumulate over temporally consecutive images.

Voting into the region around the pointing direction and varying the weights of the vote values improve accuracy, which solves problem 3) above.

According to an embodiment of the present invention, the weighting gives higher scores to voxels closer to the intersecting voxel.
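One way to realize weighted voting into the region around the pointing direction is sketched below. The patent states only that neighboring voxels receive preset vote values, and that these are higher the closer a voxel is to the intersecting voxel. The following choices are all illustrative assumptions:
- the weight table `WEIGHTS` (3 for the crossed voxel, 1 for its 26 neighbors);
- the use of Chebyshev distance;
- the rule that, in a given frame, a voxel near several crossed voxels keeps only its largest weight.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Voxel = Tuple[int, int, int]

# Hypothetical weight table keyed by Chebyshev distance from a crossed voxel:
# the crossed voxel itself scores highest, its immediate neighbors less.
WEIGHTS = {0: 3.0, 1: 1.0}


def vote_with_neighbors(boxes: Dict[Voxel, float], crossed: Iterable[Voxel],
                        grid: Tuple[int, int, int]) -> None:
    """Add weighted votes to each crossed voxel and its neighbors (one frame)."""
    radius = max(WEIGHTS)
    targets: Dict[Voxel, float] = {}
    for cx, cy, cz in crossed:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                for dz in range(-radius, radius + 1):
                    x, y, z = cx + dx, cy + dy, cz + dz
                    if not (0 <= x < grid[0] and 0 <= y < grid[1] and 0 <= z < grid[2]):
                        continue  # outside the voxel space
                    w = WEIGHTS[max(abs(dx), abs(dy), abs(dz))]
                    # A voxel close to several crossed voxels keeps its best weight
                    targets[(x, y, z)] = max(targets.get((x, y, z), 0.0), w)
    for v, w in targets.items():
        boxes[v] += w
```

Taking the maximum per frame means the voxels along the line do not collect extra votes just because they are also neighbors of one another.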

According to an embodiment of the present invention, the three-dimensional pointed position detection step includes a step of subtracting from the vote values over temporally consecutive images.

Accumulated vote values therefore decrease as time passes. This reduces misrecognition, which solves problem 3) above.
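The patent does not say how much is subtracted or when. The following is a minimal sketch that assumes a constant amount is subtracted from every ballot box once per frame, before the new votes are cast. `decay`, `frame_update`, and the default values (vote 1.0, decay 0.5) are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Voxel = Tuple[int, int, int]


def decay(boxes: Dict[Voxel, float], amount: float) -> None:
    """Subtract `amount` from every ballot box; boxes that reach 0 are dropped."""
    for v in list(boxes):  # copy the keys because we delete while iterating
        boxes[v] -= amount
        if boxes[v] <= 0.0:
            del boxes[v]


def frame_update(boxes: Dict[Voxel, float], crossed: Iterable[Voxel],
                 vote: float = 1.0, amount: float = 0.5) -> None:
    """One frame: age the existing votes, then vote for this frame's crossed voxels."""
    decay(boxes, amount)
    for v in crossed:
        boxes[v] = boxes.get(v, 0.0) + vote
```

With these values, a voxel pointed at in every frame gains a net 0.5 per frame. A voxel crossed only once, for instance as the arm sweeps past, disappears two frames later.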

The present invention has the following effects.
1) According to the inventions of claims 1, 5, and 6:
- nothing needs to be worn, so user convenience is improved;
- the user's pointing direction is detected and the pointed position is derived from it, so three-dimensional pointing is achieved;
- positions in real space, not only on a screen, can be pointed to, which broadens the range of applications;
- voting allows the position the user intends to point to to be recognized accurately.
2) According to the invention of claim 2, voting into the region around the pointing direction and varying the weights of the vote values improve accuracy.
3) According to the invention of claim 4, accumulated vote values decrease as time passes, so misrecognition is reduced and accuracy is improved.

Next, embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]
FIG. 1 is a configuration diagram of the interface apparatus according to the first embodiment of the present invention, and FIG. 2 is a flowchart of its processing.

The interface apparatus consists of image input units 1a and 1b, a three-dimensional pointing direction information calculation unit 2, a three-dimensional pointed position detection unit 3, a spatial information registration unit 4, and a spatial information memory 5.

As shown in FIG. 1, two (or three or more) cameras serve as the image input units 1a and 1b. Ordinary video cameras or CCD cameras are sufficient, either monochrome or color. Color cameras are required, however, when the color-based method described later is used. With two cameras, they are placed far enough apart for stereo image processing, with their viewing directions (optical axes) parallel or nearly parallel in three-dimensional space. The same applies with three or more cameras: each pair used must satisfy the same conditions.

The three-dimensional pointing direction information calculation unit 2 obtains three-dimensional information on the direction indicated by the user 6 (the three-dimensional pointing direction). For example, unit 2 can consist of a distance image generation unit 21, a start/end point two-dimensional coordinate calculation unit 22, and a start/end point three-dimensional coordinate calculation unit 23. It obtains the three-dimensional information of the pointing direction as described below.

The distance image generation unit 21 generates a distance image from two or more input images by stereo image processing (step 101). A distance image visualizes the distance from the camera to each object. For example, near objects appear bright (large values) and far objects appear dark (small values). The stereo method, also called binocular stereo vision, measures the three-dimensional position of an object by viewing it from two different viewpoints, as human eyes do.
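With the parallel-axis camera arrangement described above, stereo depth follows the standard rectified-stereo relation Z = f * B / d. Here f is the focal length in pixels, B is the baseline between the cameras, and d is the disparity of a point between the two images. The sketch below converts disparities to depths and then renders them as brightness, with near objects bright, as the description specifies. It does not show how the disparities themselves are computed (stereo matching). The function names and the clamping range are assumptions.

```python
from typing import List, Optional


def disparity_to_depth(disparity: float, focal_px: float,
                       baseline_m: float) -> Optional[float]:
    """Depth Z = f * B / d for a rectified stereo pair with parallel optical axes."""
    if disparity <= 0:
        return None  # no valid match (or a point at infinity)
    return focal_px * baseline_m / disparity


def depth_to_brightness(depths: List[Optional[float]], z_near: float,
                        z_far: float) -> List[int]:
    """Map depths to 0-255 so that near is bright and far is dark; invalid -> 0."""
    out = []
    for z in depths:
        if z is None:
            out.append(0)
            continue
        z = min(max(z, z_near), z_far)  # clamp to the displayed range
        out.append(round(255 * (z_far - z) / (z_far - z_near)))
    return out
```

Because depth is inversely proportional to disparity, a fixed one-pixel matching error produces a larger depth error for distant points than for near ones.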

One concrete way to generate a distance image is to use a commercial product, such as Point Grey Research's Digiclops (three-camera type) or Bumblebee (two-camera type). Each is an image input device with two or three built-in cameras that outputs a distance image. Stereo methods are also common in the image processing field, with many published techniques, so a distance image can equally be produced with any two or more cameras.

The start/end point two-dimensional coordinate calculation unit 22 uses at least one of the input images used by the distance image generation unit 21. From it, unit 22 calculates the two-dimensional image coordinates of two predetermined parts of the user's body, the start point and the end point (step 102). For example, the start point may be the user's shoulder and the end point the centroid of the hand or arm.
In this case, the direction from the shoulder toward the centroid of the hand or arm of the outstretched arm becomes the three-dimensional pointing direction described later. A concrete method for detecting the shoulder and the hand (or arm) in the input image is as follows.

To detect the hand or arm by image processing when the input image is a color image, skin-color components are extracted from the color information (RGB, for example) and then labeled. The skin color is specified as a range of color values with an arbitrary tolerance. The target hand or arm region is then selected from the resulting skin-color regions using constraints on the size and position of the hand or arm. For example, the system can specify the range of skin-color area that a hand or arm can plausibly occupy, or exclude image regions where a hand or arm is unlikely to appear, such as near the ceiling or floor.

As a concrete selection rule, assume the user is wearing clothes. The likely skin-color candidates are then both hands (or arms) and the face, and the largest region is probably the face. The regions with the second and third largest areas are therefore chosen as the hand or arm candidates. When both hands (or arms) are used as user-specified positions, the centroids of these two candidates become the user-specified positions of the left and right hands or arms. The candidate on the left-hand side is taken as the left hand (arm) and the one on the right-hand side as the right hand (arm).
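The hand-candidate selection described above can be sketched in pure Python: threshold skin color, label connected regions, then rank the regions by area. The RGB rule in `is_skin` is a fixed threshold chosen only for illustration, since the patent leaves the color range open. `min_area` stands in for the size constraints. The position constraints (excluding areas near the ceiling or floor) are omitted. "Left" here simply means the smaller image x coordinate. Whether that is the user's left hand depends on where the camera is placed.

```python
from collections import deque
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B)
Point2 = Tuple[float, float]      # (x, y) in image coordinates


def is_skin(p: Pixel) -> bool:
    # Illustrative fixed RGB threshold; the patent only says "a range of color values".
    r, g, b = p
    return r > 95 and g > 40 and b > 20 and r > g and r > b and r - min(g, b) > 15


def label_regions(mask: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """4-connected component labeling by BFS; returns the pixels of each region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append(pixels)
    return regions


def centroid(pixels: List[Tuple[int, int]]) -> Point2:
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def face_and_hands(image: List[List[Pixel]],
                   min_area: int = 1) -> Optional[Tuple[Point2, Point2, Point2]]:
    """Largest skin region = face; 2nd and 3rd largest = hands (left, right by x)."""
    mask = [[is_skin(p) for p in row] for row in image]
    regions = [r for r in label_regions(mask) if len(r) >= min_area]
    regions.sort(key=len, reverse=True)
    if len(regions) < 3:
        return None
    face = centroid(regions[0])
    a, b = centroid(regions[1]), centroid(regions[2])
    left, right = (a, b) if a[0] < b[0] else (b, a)
    return face, left, right
```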
When only one user-specified position (one hand or arm) is used, one of the two candidates must be chosen. Suppose, for example, that the right hand (or arm) has been designated in advance as the pointing hand. The candidate to the right of the body is then most likely the right hand (or arm), so it is selected as the right-hand (or right-arm) skin-color region, and its centroid becomes the user-specified position. The left hand (or arm) is handled the same way.

The shoulder position can be detected, for example, by first extracting the face and then computing the shoulders from it. In the skin-color extraction result, the region with the largest area is most likely the face, so it is taken as the face and its centroid is computed. In a normal posture, the shoulders can be assumed to lie a certain distance below the face centroid and a certain distance to its left and right. These offsets are fixed in advance. Since they vary between individuals, they may be adjusted for each user 6. The left and right shoulder positions are then computed from the face centroid.

Unit 22 may also compute several candidate values for the two-dimensional coordinates of the start and end points. It passes all of them to the start/end point three-dimensional coordinate calculation unit 23, which uses them when determining the three-dimensional coordinates. In this way, the two-dimensional image coordinates of the start and end points are obtained.
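The next sketch covers two steps: estimating the shoulders from the face centroid, and choosing the pointing hand when only one hand is used. The offsets `down` and `across` are the per-user constants the text mentions; their values in the example are invented. The `sign` convention (+1 meaning larger image x) is an assumption, because which image direction corresponds to the user's right depends on camera placement. All function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point2 = Tuple[float, float]  # image coordinates (x to the right, y downward)


def shoulders_from_face(face: Point2, down: float,
                        across: float) -> Tuple[Point2, Point2]:
    """Shoulders assumed `down` pixels below and `across` pixels beside the face."""
    fx, fy = face
    return (fx - across, fy + down), (fx + across, fy + down)


def pick_pointing_hand(face: Point2, candidates: Sequence[Point2],
                       sign: int = 1) -> Optional[Point2]:
    """Return the first hand candidate on the designated side of the face.

    sign=+1 selects candidates with larger x than the face, sign=-1 smaller x.
    """
    for c in candidates:
        if (c[0] - face[0]) * sign > 0:
            return c
    return None
```

The start point is then the shoulder on the same side as the chosen hand, and the end point is the hand centroid.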
The shoulder is used as the start point here, but the face position (centroid) obtained above may be used directly as the start point instead. In that case, the extension of the line connecting the face and the hand (or arm) is the direction indicated by the user 6.

The start/end point three-dimensional coordinate calculation unit 23 obtains the three-dimensional coordinates of the start and end points from the generated distance image and their two-dimensional coordinates (step 103). For example, unit 23 reads the distance image at the start point's two-dimensional image coordinates and uses that value as the start point's distance value. The end point is handled the same way. The transformation between the two-dimensional coordinate system of the input image and the three-dimensional coordinate system of real space can generally be computed in advance without difficulty. Unit 23 applies it to each point's two-dimensional image coordinates and distance value to obtain the point's three-dimensional coordinates in real space. The three-dimensional straight line through these two points then gives the user's pointing direction.
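The 2D-to-3D conversion and the resulting pointing line can be sketched as below. The sketch assumes a calibrated pinhole camera whose distance image stores depth along the optical axis (Z). The intrinsic parameters (fx, fy, cx, cy) and the function names are assumptions. The result is in the camera frame. A further rigid transform obtained from calibration (not shown) would map it into room coordinates.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)


def back_project(u: float, v: float, z: float,
                 fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pixel (u, v) with depth z -> camera-frame 3D point (pinhole model)."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def pointing_ray(start_px: Tuple[int, int], end_px: Tuple[int, int],
                 depth: List[List[float]], intr: Intrinsics) -> Tuple[Vec3, Vec3]:
    """Return (origin, unit direction) of the line from start point to end point.

    `depth` is indexed as depth[v][u]; the distance value is read at each pixel.
    """
    (su, sv), (eu, ev) = start_px, end_px
    s = back_project(su, sv, depth[sv][su], *intr)
    e = back_project(eu, ev, depth[ev][eu], *intr)
    d = (e[0] - s[0], e[1] - s[1], e[2] - s[2])
    n = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    if n == 0:
        raise ValueError("start and end points coincide")
    return s, (d[0] / n, d[1] / n, d[2] / n)
```

In practice the distance value is usually sampled from a small neighborhood rather than a single pixel, to tolerate holes in the distance image.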
Unit 22 may supply several candidate two-dimensional coordinates for the start and end points (as described above for unit 22). In that case, unit 23 can also select among them using the distance image. For example, if the user 6 is known to stay within a predetermined region, candidates that would indicate a location outside that region can be discarded. Unit 23 can thus narrow down the two-dimensional start/end point candidates as well as compute their three-dimensional coordinates. Because false detections by unit 22 are rejected using independent information (the distance image), accuracy can be expected to improve.
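The candidate pruning can be as simple as rejecting any start/end pair whose 3D points fall outside the region where the user is known to be. In this sketch the region is a hypothetical axis-aligned box `lo`..`hi` in room coordinates. The box should include arm reach, since the end point is the hand. The function names are illustrative.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def within(p: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """True if point p lies inside the axis-aligned box [lo, hi]."""
    return all(l <= c <= h for c, l, h in zip(p, lo, hi))


def filter_candidates(pairs: List[Tuple[Vec3, Vec3]], lo: Vec3,
                      hi: Vec3) -> List[Tuple[Vec3, Vec3]]:
    """Keep (start, end) candidate pairs whose 3D points both lie in the box."""
    return [(s, e) for s, e in pairs if within(s, lo, hi) and within(e, lo, hi)]
```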

The spatial information registration unit 4 registers, in the spatial information memory 5, information on objects in real space that the user 6 may point to (step 104). If the user 6 is in a room, for example, the targets can be anything. They include objects in the room, such as home appliances and furnishings (television, air conditioner, computer, clock, window, shelf, chair, desk, drawer, documents, audio equipment, lighting, etc.), and the room's own walls, floor, ceiling, and windows. Information on these objects (three-dimensional position coordinates and other object-related information) is registered and stored in advance in the spatial information data memory 5, as shown in Table 1.

Object positions need not be registered as fixed three-dimensional coordinates in advance. Instead, a position-sensing device (such as a commercially available magnetic sensor, ultrasonic sensor, infrared tag, or wireless tag) can be attached to each target object, so that its position is recognized in real time. The object information is then generated from the three-dimensional position information these sensors provide, and the object's three-dimensional coordinates and related information can be updated continuously. The three-dimensional position information then stays current even when an object is moved.
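Table 1 itself is not reproduced here. The sketch below shows one plausible shape for the registered records and for a sensor-driven update. It assumes each object is stored as an axis-aligned bounding box plus free-form additional information, and that the sensor reports the object's center. `SpatialObject`, `SpatialRegistry`, and `update_from_sensor` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SpatialObject:
    name: str
    bbox_min: Vec3  # axis-aligned bounding box in room coordinates (m)
    bbox_max: Vec3
    attributes: Dict[str, str] = field(default_factory=dict)  # additional information


class SpatialRegistry:
    """Hypothetical stand-in for the spatial information memory 5."""

    def __init__(self) -> None:
        self.objects: Dict[str, SpatialObject] = {}

    def register(self, obj: SpatialObject) -> None:
        self.objects[obj.name] = obj

    def update_from_sensor(self, name: str, center: Vec3) -> None:
        """Re-center an object's box on a position reported by its attached sensor."""
        o = self.objects[name]
        half = tuple((hi - lo) / 2 for lo, hi in zip(o.bbox_min, o.bbox_max))
        o.bbox_min = tuple(c - h for c, h in zip(center, half))
        o.bbox_max = tuple(c + h for c, h in zip(center, half))
```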

The three-dimensional pointed position detection unit 3 detects the three-dimensional position in real space pointed to by the user 6. For example, it can consist of a voting processing unit 31, a detection processing unit 32, and a ballot box data memory 33.

A three-dimensional voxel space is set up in advance by dividing the three-dimensional real space (FIG. 3). For example, a space 5 m wide, 5 m deep, and 5 m high can be divided into 10 equal parts along each axis (that is, into 10 × 10 × 10 voxels). Each voxel then corresponds to a 50 cm × 50 cm × 50 cm region. The voxels to which each object belongs (that is, where each object lies in space) are determined in advance from the object coordinates registered by the spatial information registration unit 4. As a result, corresponding voxel coordinates are known for every object registered in the spatial information data memory 5.

Each voxel is given one ballot box. Let the numbers of divisions of the real space be (Xdiv, Ydiv, Zdiv), and denote the voxel at coordinates (X, Y, Z) by V(X, Y, Z). A ballot box is a memory cell that can store a number (integer or real). Xdiv × Ydiv × Zdiv such cells suffice, and the memory that holds them is called the ballot box data memory. The values stored in all ballot boxes (vote values) are initialized once at the start, for example to 0.
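The voxel bookkeeping above can be sketched as follows. The sketch assumes the room origin is at one corner and clamps voxel indices to the grid. In a 5 m cube divided 10 ways per axis, each voxel edge is 5 / 10 = 0.5 m, so a point at x = 0.74 m falls in voxel index 1. `VoxelGrid` and its methods are hypothetical. `boxes` is the flat ballot box data memory of Xdiv × Ydiv × Zdiv cells, all initialized to 0.

```python
from typing import List, Set, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


class VoxelGrid:
    def __init__(self, size: Vec3, div: Tuple[int, int, int]) -> None:
        self.size, self.div = size, div
        self.cell = tuple(s / d for s, d in zip(size, div))  # voxel edge lengths
        # Ballot box data memory: one counter per voxel, initialized to 0.
        self.boxes: List[float] = [0.0] * (div[0] * div[1] * div[2])

    def voxel_of(self, p: Vec3) -> Voxel:
        """Room point -> voxel index (X, Y, Z), clamped to the grid."""
        return tuple(min(max(int(c // w), 0), d - 1)
                     for c, w, d in zip(p, self.cell, self.div))

    def flat(self, v: Voxel) -> int:
        """Index of V(X, Y, Z) in the flat ballot box array."""
        x, y, z = v
        return (z * self.div[1] + y) * self.div[0] + x

    def voxels_of_box(self, lo: Vec3, hi: Vec3) -> Set[Voxel]:
        """All voxels overlapped by an object's axis-aligned bounding box."""
        a, b = self.voxel_of(lo), self.voxel_of(hi)
        return {(x, y, z)
                for x in range(a[0], b[0] + 1)
                for y in range(a[1], b[1] + 1)
                for z in range(a[2], b[2] + 1)}
```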

The voting processing unit 31 first extends the three-dimensional straight line connecting the start point and the end point, obtained by the start/end point three-dimensional coordinate calculation unit 23 of the three-dimensional pointing direction information calculation unit 2, beyond the hand or arm (the end point), and calculates the voxels that this extension line intersects (intersecting voxels) (step 105, FIG. 4). Because the extension line passes through several voxels as it traverses the voxel space, all of these intersecting voxels are found. Voting is then performed on the ballot box of each intersecting voxel (step 105). Voting means adding a predetermined vote value to a ballot box: with an initial value of 0 and a vote value v, the ballot box of an intersected voxel holds 0 + v = v after one vote. Votes may be cast for all intersecting voxels or, using the advance association with the spatial information data described above, only for those intersecting voxels with which some object is associated (FIG. 5). In the latter case, voxels of empty space (that is, their ballot boxes) receive no votes, which reduces misrecognition.
The three-dimensional pointing direction information calculation (steps 101 to 103) and the three-dimensional position detection (steps 105 and 106) are performed for every image in the sequence of images input continuously over time. Because voting is therefore also performed for each image, the vote value of each voxel in the ballot box data memory 33 accumulates over time.
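
The intersecting-voxel computation and voting of step 105 can be sketched as follows. The text does not say how the intersecting voxels are found; this sketch substitutes the standard Amanatides-Woo grid traversal. It assumes the end point (hand) lies inside the voxel space, counts the end point's own voxel as intersected, and stores ballot boxes in a sparse dictionary where a missing key means 0. The functions `intersecting_voxels` and `vote` are hypothetical names.

```python
import math

SPACE_SIZE = (5.0, 5.0, 5.0)   # metres, as in the 10 x 10 x 10 example
DIVS = (10, 10, 10)


def intersecting_voxels(start, end, space=SPACE_SIZE, divs=DIVS):
    """Voxels crossed by the line start->end, extended beyond the end point.

    Amanatides-Woo traversal beginning in the voxel that contains `end`.
    Returns [] if `end` is outside the voxel space or start == end.
    """
    size = [s / d for s, d in zip(space, divs)]
    direction = [e - s for s, e in zip(start, end)]
    cell = [int(p // h) for p, h in zip(end, size)]
    if not any(direction) or not all(0 <= c < d for c, d in zip(cell, divs)):
        return []

    step, t_max, t_delta = [], [], []
    for axis in range(3):
        dv = direction[axis]
        if dv > 0:
            step.append(1)
            t_max.append(((cell[axis] + 1) * size[axis] - end[axis]) / dv)
            t_delta.append(size[axis] / dv)
        elif dv < 0:
            step.append(-1)
            t_max.append((cell[axis] * size[axis] - end[axis]) / dv)
            t_delta.append(-size[axis] / dv)
        else:  # line parallel to this axis: never crosses its boundaries
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)

    crossed = []
    while all(0 <= c < d for c, d in zip(cell, divs)):
        crossed.append(tuple(cell))
        axis = t_max.index(min(t_max))   # nearest voxel boundary along the line
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return crossed


def vote(ballot_boxes, crossed, v=1.0, object_voxels=None):
    """Add vote value v to the ballot box of each crossed voxel.

    If object_voxels is given, only voxels with a registered object receive
    votes (the FIG. 5 variant).
    """
    for voxel in crossed:
        if object_voxels is None or voxel in object_voxels:
            ballot_boxes[voxel] = ballot_boxes.get(voxel, 0.0) + v
```

With `start=(0.25, 2.25, 2.25)` and `end=(1.25, 2.25, 2.25)` the arm points along +X, so the extension line crosses voxels `(2, 4, 4)` through `(9, 4, 4)`. Passing `object_voxels` restricts the votes to voxels that hold a registered object.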

The detection processing unit 32 searches the vote values accumulated over time in the voxels of the ballot box data memory 33, through voting by the voting processing unit 31 on each temporally consecutive image, for a ballot box (i.e., a voxel) whose vote value exceeds a predetermined threshold (step 106). The threshold is the value used to decide that a voxel is the three-dimensional position pointed to by the user 6: when the vote value accumulated by voting on temporally consecutive images exceeds it, the voxel holding that vote value is judged to be the pointed position. It may be set freely, for example vote value v = 1 and threshold th = 5. When the threshold is set higher than a single vote, as in this example, the same ballot box (voxel) must receive votes from several images (here, five) before it is detected, so the target voxel is detected only after the user 6 has pointed at it for a certain length of time (which may be continuous or intermittent). Non-target voxels are therefore far less likely to be falsely detected, for example while the user 6 is moving the arm, and detection performance improves.
One search method examines all voxels in the voxel space (that is, their ballot boxes) for a ballot box whose vote value exceeds the threshold. Another searches only among the intersecting voxels, which reduces the number of voxels to examine and hence the amount of processing. When a ballot box (voxel) exceeding the threshold is detected, all ballot boxes may be initialized (set to 0).
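
A minimal sketch of the threshold search and reset of step 106, assuming a sparse dictionary of ballot boxes (missing key = 0). The text says "exceeds", yet its example (v = 1, th = 5) counts five images, so this sketch treats reaching `th` as a detection; use `>` for the strict reading. Choosing the highest-voted voxel when several pass the threshold is an added assumption, and `detect` is a hypothetical name.

```python
def detect(ballot_boxes, th, candidates=None):
    """Return the voxel whose accumulated vote value reaches th, or None.

    candidates restricts the search (e.g. to this frame's intersecting voxels);
    None searches every ballot box. After a detection all boxes are reset.
    """
    keys = list(ballot_boxes) if candidates is None else list(candidates)
    hits = [vox for vox in keys if ballot_boxes.get(vox, 0.0) >= th]
    if not hits:
        return None
    best = max(hits, key=lambda vox: ballot_boxes.get(vox, 0.0))  # tie-break: highest vote
    ballot_boxes.clear()  # sparse dict: empty means every box is back to 0
    return best


boxes = {}
target, passing = (5, 4, 4), (3, 4, 4)   # hypothetical voxels
boxes[passing] = 1.0                     # crossed once while the arm was moving
results = []
for frame in range(5):                   # the user holds the pointing gesture
    boxes[target] = boxes.get(target, 0.0) + 1.0   # v = 1 per frame
    results.append(detect(boxes, th=5, candidates=[target, passing]))
print(results)  # [None, None, None, None, (5, 4, 4)]
```

The voxel crossed once during arm movement never reaches the threshold, while the held target is detected on the fifth frame, after which all boxes are cleared.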

As described above, according to this embodiment, when the user 6 extends an arm and points directly, in real space, at an object or the like in the three-dimensional space for a certain length of time, the pointed three-dimensional position (object) can be detected.

[Second Embodiment]
FIG. 6 is a block diagram of the interface device according to the second embodiment of the present invention, and FIG. 7 is a flowchart showing its processing flow.

This interface device comprises image input units 1a and 1b, a three-dimensional pointing direction information calculation unit 2, a three-dimensional pointing position detection unit 3, a spatial information registration unit 4, and a spatial information data memory 5. The three-dimensional pointing position detection unit 3 comprises a voting processing unit 31, a detection processing unit 32, and a ballot box data memory 33, and the voting processing unit 31 comprises an intersecting voxel voting processing unit 31a and an adjacent voxel voting processing unit 31b. The image input units 1a and 1b, the three-dimensional pointing direction information calculation unit 2, the spatial information registration unit 4, and the spatial information data memory 5 are configured as in the first embodiment.

As in the first embodiment, the intersecting voxel voting processing unit 31a first extends the three-dimensional straight line connecting the start point and the end point, obtained by the start/end point three-dimensional coordinate calculation unit 23 of the three-dimensional pointing direction information calculation unit 2, beyond the hand or arm (the end point), and calculates the voxels that this extension line intersects (intersecting voxels). Because the extension line passes through several voxels as it traverses the voxel space, all of them are found. Voting is then performed on the ballot box of each intersecting voxel (step 107); that is, a predetermined vote value is added to each of those ballot boxes.

For each voxel voted for by the intersecting voxel voting processing unit 31a, the adjacent voxel voting processing unit 31b also votes for the surrounding (adjacent) voxels (step 108). The voxels voted for may be, for example, all voxels in the 26-neighborhood of that voxel in the three-dimensional voxel space, its 6-neighborhood or 18-neighborhood, or its 4-neighborhood or 8-neighborhood in a two-dimensional plane (FIG. 8). The value cast (vote value) may be the same for all neighbors or may differ among them. For example, when voting over the 26-neighborhood, let a be the vote value for the central intersecting voxel, b the value for the 6-neighborhood voxels, c the value for the 18-neighborhood voxels excluding the 6-neighborhood, and d the value for the 26-neighborhood voxels excluding the 18-neighborhood. The values may all be equal, a = b = c = d = 1 (or some value other than 1), or voxels closer to the center (the intersecting voxel) may be given larger values, for example a = 10, b = 5, c = 2, d = 1 (FIG. 9).
Setting b = c = d = 0 reduces this to voting for the intersecting voxels only; c = d = 0 is equivalent to voting over the 6-neighborhood, and d = 0 to voting over the 18-neighborhood. Giving larger values to voxels closer to the center improves the accuracy of the recognition result obtained by voting. Even when errors in the three-dimensional pointing direction information obtained by the three-dimensional pointing direction information calculation unit 2 shift the computed direction slightly away from the direction the user actually intends, the voxels lying in the correct direction still receive votes, so improved accuracy can be expected. Detection can also be expected to become faster, because as vote values accumulate over time the voxels in the correct direction exceed the threshold sooner.
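
The weighted neighborhood voting of step 108 can be sketched as below, assuming the example weights a = 10, b = 5, c = 2, d = 1. The rings are distinguished by how many of the three coordinate offsets are non-zero: one for the 6-neighborhood, two for the remaining 12 voxels of the 18-neighborhood, and three for the remaining 8 of the 26-neighborhood. The names `WEIGHTS`, `neighbour_votes`, and `vote_with_neighbours` are hypothetical.

```python
from itertools import product

# Example weights from the text (FIG. 9), keyed by how many of the three
# offsets are non-zero: 0 -> centre (a), 1 -> 6-neighbours (b),
# 2 -> 18-neighbours minus 6-neighbours (c), 3 -> 26 minus 18 (d).
WEIGHTS = {0: 10.0, 1: 5.0, 2: 2.0, 3: 1.0}
DIVS = (10, 10, 10)


def neighbour_votes(voxel, weights=WEIGHTS, divs=DIVS):
    """Vote values cast for an intersecting voxel and its 26-neighbourhood."""
    votes = {}
    for offset in product((-1, 0, 1), repeat=3):
        n = tuple(c + o for c, o in zip(voxel, offset))
        if not all(0 <= c < d for c, d in zip(n, divs)):
            continue  # neighbour lies outside the voxel space
        w = weights[sum(1 for o in offset if o != 0)]
        if w:  # a zero weight means that ring is not voted for
            votes[n] = w
    return votes


def vote_with_neighbours(ballot_boxes, crossed, weights=WEIGHTS, divs=DIVS):
    """Steps 107 and 108: vote for each crossed voxel and for its neighbours."""
    for voxel in crossed:
        for n, w in neighbour_votes(voxel, weights, divs).items():
            ballot_boxes[n] = ballot_boxes.get(n, 0.0) + w
```

For an interior voxel one frame adds 10 + 6 × 5 + 12 × 2 + 8 × 1 = 72 in total. Passing weights `{0: a, 1: 0, 2: 0, 3: 0}` reproduces plain intersecting-voxel voting, matching the b = c = d = 0 case in the text.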

For each image in the sequence of images input continuously over time, the three-dimensional pointing direction information calculation process 100, the voting process (intersecting voxel voting process 107 and adjacent voxel voting process 108), and the detection process 109 are performed. Because voting is therefore performed for each image, the vote value of each voxel in the ballot box data memory 33 accumulates over time.

As in the first embodiment, the detection processing unit 32 searches the vote values of the voxels in the ballot box data memory 33 for a ballot box (i.e., a voxel) whose vote value exceeds a predetermined threshold (step 106). The threshold is the value used to decide that a voxel is the three-dimensional position pointed to by the user 6: when the value accumulated by voting on temporally consecutive images exceeds it, the voxel holding that value is judged to be the pointed position. It may be set freely, for example vote values a = 10, b = 5, c = 2, d = 1 and threshold th = 20. As in the first embodiment, the search may cover all voxels in the voxel space (their ballot boxes) or only the intersecting voxels. It may also cover only the intersecting voxels and neighboring voxels that received votes; this likewise reduces the number of voxels to examine and the amount of processing. When a ballot box (voxel) exceeding the threshold is detected, all ballot boxes may be initialized (set to 0).

As described above, according to this embodiment, when the user 6 extends an arm and points directly, in real space, at an object or the like in the three-dimensional space for a certain length of time, the pointed three-dimensional position (object) can be detected, and objects can be detected more accurately and faster than in the first embodiment.

[Third Embodiment]
FIG. 10 is a block diagram of the interface device according to the third embodiment of the present invention, and FIG. 11 is a flowchart showing its processing flow.

This interface device comprises image input units 1a and 1b, a three-dimensional pointing direction information calculation unit 2, a three-dimensional pointing position detection unit 3, a spatial information registration unit 4, and a spatial information data memory 5. The three-dimensional pointing position detection unit 3 comprises a voting processing unit 31, a detection processing unit 32, a ballot box data memory 33, and a vote value subtraction processing unit 34, and the voting processing unit 31 comprises an intersecting voxel voting processing unit 31a and an adjacent voxel voting processing unit 31b. The image input units 1a and 1b, the three-dimensional pointing direction information calculation unit 2, the intersecting voxel voting processing unit 31a, the adjacent voxel voting processing unit 31b, the detection processing unit 32, the ballot box data memory 33, the spatial information registration unit 4, and the spatial information data memory 5 are configured as in the second embodiment.

For each image in the sequence of images input continuously over time, after the three-dimensional pointing direction information calculation (step 100), voting (steps 107 and 108), and detection (step 106) have been performed, the vote value subtraction processing unit 34 reduces the vote values accumulated in the ballot box data memory 33 as time passes (each time processing advances to the next image).

The subtraction can be carried out, for example, as follows. Let t(i) be the time at which an image is input and t(i+1) the time at which the next image in the sequence is input. Let v_i(x, y, z) be the vote value accumulated in an arbitrary voxel (x, y, z) of the ballot box data memory 33 after voting in the processing at time t(i), and let s be the subtraction value. The subtraction is then applied to the ballot boxes of all voxels as

$$v_{i+1}(x, y, z) = v_i(x, y, z) - s$$

If the value of a ballot box is 0, s need not be subtracted from it.
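
A minimal sketch of the subtraction step, assuming a sparse dictionary of ballot boxes. Skipping boxes already at 0 follows the text; clamping at 0 so that a box never goes negative is an added assumption. The stale and target voxels, v = 1, and s = 0.5 are hypothetical values chosen only to show the effect.

```python
def decay_subtract(ballot_boxes, s):
    """Apply v_{i+1}(x, y, z) = v_i(x, y, z) - s to every ballot box.

    Boxes already at 0 are skipped, as the text allows; clamping the result
    at 0 is an added assumption.
    """
    for voxel, value in ballot_boxes.items():
        if value > 0:
            ballot_boxes[voxel] = max(value - s, 0.0)


boxes = {(2, 0, 9): 3.0}   # hypothetical stale votes from an earlier gesture
target = (5, 4, 4)         # hypothetical voxel the user keeps pointing at
for frame in range(6):
    boxes[target] = boxes.get(target, 0.0) + 1.0   # voting with v = 1
    # detection (step 106) would run here, before the subtraction
    decay_subtract(boxes, s=0.5)
print(boxes)  # {(2, 0, 9): 0.0, (5, 4, 4): 3.0}
```

The stale box loses 0.5 per frame and reaches 0 after six frames, while the box the user keeps pointing at still grows by a net v - s = 0.5 per frame.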

In the first and second embodiments, the accumulated value in each voxel's ballot box keeps increasing over time until some ballot box is detected as exceeding the threshold, so values accumulated during that period remain even at locations the user 6 did not intend to vote for. With the vote value subtraction of this embodiment, the accumulated vote values of voxels in directions the user 6 has not pointed at for a while decrease over time and eventually settle at 0. This has the advantage of reducing false detections of voxels the user 6 did not intend.

Although the above description reduces the values by subtracting s, they can also be reduced by multiplication instead of subtraction. For example, with a multiplier m that is a real number between 0 and 1, applying

$$v_{i+1}(x, y, z) = v_i(x, y, z) \times m$$

likewise reduces the values over time. The result is in general a real number, but it may be converted to an integer, for example by truncating the fractional part, to simplify computation. With subtraction (-s), the value decreases by the same amount s at every step; with multiplication (×m), the amount of each decrease becomes smaller as time passes. Either may be used as needed.
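
The difference between the two decay rules can be sketched as follows. The values s = 4 and m = 0.5 and the starting vote of 20 are hypothetical; `decay_multiply` and `trajectory` are illustrative names, and the optional truncation mirrors the integer conversion mentioned above.

```python
def decay_multiply(ballot_boxes, m, to_int=False):
    """Apply v_{i+1}(x, y, z) = v_i(x, y, z) * m, with 0 < m < 1, to every box.

    With to_int=True the fractional part is truncated, as the text permits.
    """
    for voxel, value in ballot_boxes.items():
        new_value = value * m
        ballot_boxes[voxel] = int(new_value) if to_int else new_value


def trajectory(v0, update, steps):
    """Vote value of a single box over `steps` frames with no new votes."""
    values = [v0]
    for _ in range(steps):
        values.append(update(values[-1]))
    return values


subtractive = trajectory(20.0, lambda v: max(v - 4.0, 0.0), 4)
multiplicative = trajectory(20.0, lambda v: v * 0.5, 4)
print(subtractive)     # [20.0, 16.0, 12.0, 8.0, 4.0]  (drops by 4 every frame)
print(multiplicative)  # [20.0, 10.0, 5.0, 2.5, 1.25]  (drops by 10, 5, 2.5, 1.25)
```

Multiplicative decay alone never reaches exactly 0; with `to_int=True`, however, a box holding 1 drops to 0 on the next step because int(0.5) is 0.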

As described above, according to this embodiment, when the user 6 extends an arm and points directly, in real space, at an object or the like in the three-dimensional space for a certain length of time, the pointed three-dimensional position (object) can be detected. Objects can also be detected more accurately and faster than in the first embodiment, and the possibility of falsely detecting voxel positions the user 6 did not intend is reduced, which further improves accuracy.

Besides being implemented in dedicated hardware, the functions of the interface device described above may be implemented by recording a program that realizes them on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. A computer-readable recording medium means a recording medium such as a floppy disk, a magneto-optical disk, or a CD-ROM, or a storage device such as a hard disk built into the computer system. The term further includes media that hold the program dynamically for a short time (transmission media or carrier waves), as when the program is transmitted over the Internet, and media that hold the program for a certain period, such as the volatile memory inside the computer system serving as the server in that case.

FIG. 1 is a block diagram of the interface device according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing flow of the interface device of the first embodiment.
FIG. 3 is a diagram showing an example of a voxel space.
FIG. 4 is an explanatory diagram of an example in which votes are cast for all intersecting voxels.
FIG. 5 is an explanatory diagram of an example in which votes are cast only for intersecting voxels with which an object is registered.
FIG. 6 is a block diagram of the interface device according to the second embodiment of the present invention.
FIG. 7 is a flowchart showing the processing flow of the interface device of the second embodiment.
FIG. 8 is a diagram showing examples of the voxels voted for.
FIG. 9 is a diagram showing weighted vote values ((a)) and an example of weighted vote values ((b)).
FIG. 10 is a diagram showing the configuration of the interface device according to the third embodiment of the present invention.
FIG. 11 is a flowchart showing the processing flow of the interface device of the third embodiment.

Explanation of symbols

1a, 1b  Image input unit
2  Three-dimensional pointing direction information calculation unit
3  Three-dimensional pointing position detection unit
4  Spatial information registration unit
5  Spatial information data memory
6  User
21  Distance image generation unit
22  Start/end point two-dimensional coordinate calculation unit
23  Start/end point three-dimensional coordinate calculation unit
31  Voting processing unit
31a  Intersecting voxel voting processing unit
31b  Adjacent voxel voting processing unit
32  Detection processing unit
33  Ballot box data memory
34  Vote value subtraction processing unit
100 to 109  Steps

Claims

An interface method for inputting images taken by a plurality of cameras and recognizing a position or an object in a real space indicated by an operator using a body part,
A spatial information registration step for registering part or all of the information of the object in the real space and the additional information thereof;
A three-dimensional pointing direction detecting step of calculating, from a plurality of input images captured by the plurality of cameras, the three-dimensional coordinates of a start point and an end point of the operator's body part that relate to the direction pointed to by the operator, and obtaining three-dimensional pointing direction information in the real space pointed to by the operator; and
A three-dimensional pointing position detecting step of detecting, from the obtained three-dimensional pointing direction information and the registered object information, the three-dimensional pointed position information indicated by the operator, namely information on the intersection between the extension line of the direction pointed to by the operator and a registered object, the three-dimensional pointing position detecting step including: a voting step of voting, based on the three-dimensional pointing direction information, into those ballot boxes of a ballot box data memory, which holds one ballot box for each voxel of a voxel space obtained by dividing the real space, that correspond to voxels intersected by the extension line of the direction pointed to by the operator; and a detecting step of detecting the voxel at the three-dimensional position pointed to by the operator from the information on the vote values recorded in the ballot boxes of the ballot box data memory and the registered object information.

The interface method according to claim 1, wherein the voting step further includes a step of voting for voxels adjacent to the intersecting voxels using weights preset to differ from voxel to voxel, and the detecting step detects, as the voxel at the three-dimensional position pointed to by the operator, a voxel whose vote value accumulated by voting on temporally consecutive images exceeds a predetermined threshold.

The interface method according to claim 2, wherein the weights are set so that voxels closer to the intersecting voxel receive higher scores.

The interface method according to claim 1, wherein the three-dimensional indication position detection step further includes a step of subtracting the vote value for temporally continuous images.

An interface device for inputting images taken by a plurality of cameras and recognizing a position or an object in a real space indicated by an operator using a body part,
Spatial information registration means for registering part or all of the information of the object in the real space and the additional information thereof;
Three-dimensional pointing direction information calculating means for calculating, from a plurality of input images captured by the plurality of cameras, the three-dimensional coordinates of a start point and an end point of the operator's body part that relate to the direction pointed to by the operator, and obtaining three-dimensional pointing direction information in the real space pointed to by the operator; and
Three-dimensional pointing position detecting means for detecting, from the obtained three-dimensional pointing direction information and the registered object information, the three-dimensional pointed position information indicated by the operator, namely information on the intersection between the extension line of the direction pointed to by the operator and a registered object, the three-dimensional pointing position detecting means including: a ballot box data memory holding one ballot box for each voxel of a voxel space obtained by dividing the real space; voting processing means for voting, based on the three-dimensional pointing direction information, into the ballot boxes corresponding to the voxels through which the extension line of the direction pointed to by the operator passes; and detection processing means for detecting the voxel at the three-dimensional position pointed to by the operator from the information on the vote values recorded in the ballot boxes of the ballot box data memory and the registered object information.

An interface program for causing a computer to execute the interface method according to claim 1.