JP2017049662A

JP2017049662A - Information processing apparatus, control method thereof, program, and storage medium

Info

Publication number: JP2017049662A
Application number: JP2015170602A
Authority: JP
Inventors: Hikari Ito (伊藤　光)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-08-31
Filing date: 2015-08-31
Publication date: 2017-03-09
Anticipated expiration: 2035-08-31
Also published as: JP6618301B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of processing for acquiring position information of an end of a predetermined operating body from three-dimensional position information obtained for the operating body.
SOLUTION: An information processing apparatus acquires three-dimensional position information on an operating body present in a space above a predetermined operation surface; detects a first point corresponding to an end of the operating body on the basis of the acquired three-dimensional position information; detects a second point corresponding to the end of the operating body on the basis of the shape of the operating body captured in a region of interest that includes the position of the first point, in a two-dimensional image obtained by capturing the space; and recognizes either the two-dimensional position information of the first point parallel to the operation surface or the two-dimensional position information of the second point parallel to the operation surface as the position on the operation surface pointed to by the operating body.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for recognizing a touch operation based on the proximity state between a recognition target and a touch target surface.

Device operation based on gesture recognition, in which a UI (user interface) is operated according to the movement and position of a person's hand detected with various cameras and sensors, is becoming widespread. Tabletop interfaces, in which an image or UI is projected onto a table surface and operated by touching it with a hand, a pen, or the like, have also begun to use methods that detect, without a touch panel, the state in which the end of a predetermined operating body such as a fingertip or pen tip touches the table.

In Patent Document 1, a group of pixels having a skin-color component is extracted from two-dimensional data acquired by a camera as a hand region, that is, a region in which a hand appears as the subject. When this extraction fails, the shape of the hand is detected and complemented using three-dimensional image data acquired by a three-dimensional scanner.

JP 2007-241833 A

When three-dimensional image data is used to obtain three-dimensional position information of a subject, as in Patent Document 1, each pixel value of the image data is treated as corresponding to position information in the depth direction and is converted into coordinates. When recognizing a subject that has some extent in the depth direction, or multiple overlapping subjects, the contour of the region showing the nearer subject can be extracted from the image data by, for example, applying a threshold to the coordinate value in the depth direction. However, when the resolution in the depth direction is low, or when factors such as the environment introduce errors into the conversion from pixel information to position information, it is difficult to obtain the contour of the subject accurately. When three-dimensional image data is used to judge, from the degree of proximity to the table surface, whether the end of an operating body such as a fingertip or pen tip can be regarded as touching the table surface, a contour that differs from the actual one easily causes misrecognition. For example, the position specified by a touch operation deviates from the actual fingertip position.

The present invention has been made in view of the above, and its object is to improve the accuracy of processing that obtains, from three-dimensional position information obtained for a predetermined operating body, the position information of the end recognized as the operation position.

To solve the above problem, an information processing apparatus according to the present invention comprises: operating body acquisition means for acquiring three-dimensional position information on an operating body present in a space above a predetermined operation surface; image acquisition means for acquiring a two-dimensional image obtained by capturing the space; first detection means for detecting a first point corresponding to an end of the operating body on the basis of the three-dimensional position information acquired by the operating body acquisition means; setting means for setting, in the two-dimensional image, a region of interest that includes the position of the first point detected by the first detection means; second detection means for detecting a second point corresponding to the end of the operating body on the basis of the shape of the operating body captured in the region of interest set by the setting means in the two-dimensional image; and recognition means for recognizing either the two-dimensional position information, parallel to the operation surface, of the first point detected by the first detection means or the two-dimensional position information, parallel to the operation surface, of the second point detected by the second detection means as the position on the operation surface indicated by the operating body.

According to the present invention, the accuracy of processing that obtains, from three-dimensional position information obtained for a predetermined operating body, the position information of the end recognized as the operation position is improved.

A diagram showing an example of a tabletop interface system using the information processing apparatus 100
A diagram showing an example of the hardware configuration and functional configuration of the information processing apparatus 100
A flowchart showing an example of the flow of the main processing executed by the information processing apparatus 100 in the first embodiment
A flowchart showing an example of the flow of the operation position determination processing in the first embodiment
A diagram showing a specific example of the state of the user's hand and the corresponding hand region detected from the distance image
A diagram showing a specific example of the state of the user's hand and the corresponding hand region detected from the distance image
A diagram showing an example of the operation position determination processing
A diagram showing an example of the operation position determination processing
A diagram showing an example of the operation position determination processing
A diagram showing an example of the operation position determination processing
A flowchart showing an example of the flow of the operation position determination processing in Modification 2
A diagram showing an example of the operation position determination processing
A flowchart showing an example of the flow of the main processing of the information processing apparatus in the second embodiment
A flowchart showing an example of the flow of the hand region determination processing
A diagram showing an example of the hand region determination processing

Hereinafter, information processing according to embodiments of the present invention will be described in detail with reference to the drawings. The configurations described in the embodiments are examples and are not intended to limit the scope of the present invention to those configurations.

<First Embodiment>
First, as a first embodiment, an example of processing for recognizing a touch operation performed by an operator on an item projected on the table surface of a tabletop interface system will be described.

FIG. 1(a) shows an example of the appearance of a tabletop interface system in which the information processing apparatus 100 according to the present embodiment is installed. The operation surface 101 is the table portion of the tabletop interface, and the operator can input a touch operation by touching the operation surface 101. In the present embodiment, however, the operation surface 101 has no touch sensor. The information processing apparatus 100 therefore recognizes touch operations by detecting, as the state in which a touch is being input (hereinafter, the touch state), not whether the operation surface 101 and an operating body such as the user's finger or a pen are actually in contact but whether they are close enough to be regarded as in contact.

In the present embodiment, the distance image sensor 102 is installed above the operation surface 101 so as to look down on it. A distance image is an image in which each pixel value reflects information corresponding to the distance from a reference position of the imaging means that captures the image (for example, the lens center) to the subject surface captured at that pixel. In the present embodiment, the pixel values of the distance image captured by the distance image sensor 102 reflect the distance from the distance image sensor 102 to the operation surface 101 or to the surface of an object above it. The captured distance image is input to the information processing apparatus 100. The information processing apparatus 100 acquires the three-dimensional position of the operator's hand 106 by analyzing the distance image and recognizes the input operation. The operator can therefore input a spatial gesture operation by moving a predetermined object such as a hand within the range that the distance image sensor 102 can capture in the space above the operation surface (the space between the operation surface 101 and the distance image sensor 102). The present embodiment uses a sensor that acquires distance information from the reflection pattern (or reflection time) of infrared light. However, a distance image can also be obtained with, for example, a stereo camera system, or by installing an infrared light emitting element and an infrared light receiving element. Moreover, the means is not limited to capturing a distance image: the present embodiment can be implemented with any means that yields three-dimensional position information, including the height direction, of the operating body in the space including the operation surface 101, for example an electrostatic sensor or a temperature sensor.

In the present embodiment, the visible light camera 103 is also installed so as to look down on the operation surface 101 from above. By controlling the visible light camera 103, the information processing apparatus 100 can function as a document camera that captures an object placed on the operation surface 101 and obtains a scanned image of it. Based on the visible light image obtained by the visible light camera 103 and the distance image obtained by the distance image sensor 102, the information processing apparatus 100 detects and further identifies objects present in the space above the operation surface 101. Such objects include, for example, the operator's hand, documents such as paper media and books, and other three-dimensional objects. In the system illustrated in FIG. 1(a), however, the angles of view of the distance image sensor 102 and the visible light camera 103 do not include the head of an operator positioned around the table. Consequently, in the obtained distance image, the image edge intersects some part of the user's arm (the part from the shoulder onward).

The projector 104 projects an image onto the upper surface of the operation surface 101. In this system, the operator operates on an item 105 included in the projected image by touch or spatial gesture. As described above, the present embodiment uses the distance image acquired by the distance image sensor 102 to detect the hand 106 and recognize operations. Using the distance image has the advantage that detection is not easily affected even if the color of the operator's hand changes under the projection light of the projector 104. As the display device of this system, the operation surface 101 may be a liquid crystal display instead of using the projector 104. In that case, the hand can be detected without being affected by projection light even with a method that detects a human hand from an image, for example by detecting a skin-color region in a visible light image.

As long as an image of the operation surface 101 viewed from above can be obtained, the distance image sensor 102 and the visible light camera 103 themselves need not be installed above it; for example, they may be configured to capture light reflected by a mirror. Similarly, in the example of FIG. 1(a) the projector 104 projects onto the operation surface 101 looking down from diagonally above, but projection light emitted in a different direction may be reflected onto the operation surface 101 using a mirror or the like. The same applies when the operation surface 101 is installed vertically.

In the present embodiment, the x, y, and z axes shown in FIG. 1 are defined in the three-dimensional space above the operation surface 101, and position information is handled in these coordinates. In the example of FIG. 1(a), the point 107 is the origin of the coordinate axes. Here, as an example, the two dimensions parallel to the upper surface of the table form the xy plane, and the direction orthogonal to the table's upper surface and extending upward is the positive z direction. In the present embodiment, the z-axis direction corresponds to the height direction in the world coordinate system. However, the present embodiment is also applicable to systems whose operation surface 101 is not horizontal, such as a whiteboard or wall, and to cases where the operation surface is uneven or is a virtual surface generated using MR.

FIG. 1(b) shows the relationship between the distance image captured by the distance image sensor 102 and the system. In the present embodiment, the distance image sensor 102 has a function of acquiring both a distance image reflecting three-dimensional position information of a predetermined space within its angle of view and a two-dimensional infrared image distinct from the distance image. The distance image sensor 102 emits infrared light toward the operation surface 101, and the light reflected by the subject surface is received by the sensor's light receiving element.

The distance image used in the present embodiment is obtained by measuring the phase delay of the received reflected light, so that each pixel value reflects the distance to the subject surface, which corresponds to the time the light took to be reflected back at that pixel. Because the distance image sensor 102 captures the distance image looking down on the operation surface 101, the distance information represented by the pixel values is the distance in the depth direction along which the sensor looks down on the operation surface 101. In other words, each pixel value of the distance image contains information for obtaining the position in the height direction from the operation surface 101 (the z coordinate). In the present embodiment, however, the distance image sensor 102 is installed at an oblique angle to the z axis. The pixel values of the distance image are therefore not used directly as z coordinates; instead, as described later, position information defined in the image is converted to xyz world coordinates by a coordinate transformation using parameters that depend on the sensor and the installation environment. The distance calculation method used by the distance image sensor 102 may instead be an infrared pattern projection method or a parallax method.
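As a rough illustration of how a phase delay maps to a distance, the following Python sketch assumes a continuous-wave time-of-flight model in which the emitted infrared light is amplitude-modulated at a known frequency. The modulation frequency and the function names are assumptions made for this example; the source does not specify the sensor's internals.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def phase_delay_to_distance(phase_delay_rad: float, modulation_hz: float) -> float:
    """Distance in metres implied by a phase delay of amplitude-modulated light.

    Assumed model: the light travels to the subject and back, so the round
    trip is 2 * d, and the round-trip time is phase / (2 * pi * f).
    Hence d = c * phase / (4 * pi * f).
    """
    return SPEED_OF_LIGHT_M_PER_S * phase_delay_rad / (4.0 * math.pi * modulation_hz)


def unambiguous_range(modulation_hz: float) -> float:
    """Largest distance measurable before the phase wraps past 2 * pi."""
    return SPEED_OF_LIGHT_M_PER_S / (2.0 * modulation_hz)


if __name__ == "__main__":
    f_mod = 20e6  # hypothetical 20 MHz modulation frequency
    print(phase_delay_to_distance(math.pi / 4, f_mod))
    print(unambiguous_range(f_mod))
```

Under this model the depth resolution depends on how finely the phase can be measured, which is one reason a fingertip close to the table is hard to separate from the table itself.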

The two-dimensional infrared image used in the present embodiment has, as its pixel values, the intensity of the reflected light received by the sensor's light receiving element. A similar infrared image can also be obtained without emitting infrared light from the distance image sensor 102, by receiving infrared light contained in ambient light after it is reflected by the subject surface. The two-dimensional image need not be an infrared image; a color image, a grayscale image, or the like captured by the visible light camera 103 may be acquired instead.

In the present embodiment, the two-dimensional infrared image and the distance image obtained from the distance image sensor 102 have the same size (for example, 640 [dot] × 480 [dot]), and all pixels correspond one to one. Pixels with the same in-plane coordinates reflect the reflection time (distance information) or the intensity of infrared light reflected from the same position on the same subject. As long as a distance image and a two-dimensional infrared image with this relationship are obtained, the two imaging means need not be integrated as in the distance image sensor 102 of the present embodiment. That is, a sensor that obtains three-dimensional position information including distance and an image sensor that captures a two-dimensional image may be installed separately.

In FIG. 1(b), the image 108 represents an example of the content of a distance image captured by the distance image sensor 102. Here, the distance information reflected in the pixel values is omitted and only the edges of the subject are shown. The operation surface 101 and the origin 107 correspond to those in FIG. 1(a). The range 111 corresponds to the angle of view of the distance image sensor 102 and is hereinafter referred to as the image edge 111.

As shown in FIG. 1(b), a two-dimensional coordinate system with a u axis and a v axis is set in the distance image. In the example of FIG. 1(b), the resolution of the distance image is 640 [dot] × 480 [dot]. Suppose that the position of the operation position 110 in the distance image is (u, v) and that the pixel value corresponding to the distance from the distance image sensor 102 to the operation position is d. In the present embodiment, position information defined in the distance image in this way undergoes a coordinate transformation based on, for example, the lens characteristics of the distance image sensor 102 and its position relative to the operation surface 101. This maps the coordinates of each pixel to the real-world coordinate system defined on the table, so that the three-dimensional position (x, y, z) of the operation position 110 in real space can be obtained. The transformation matrix used for the coordinate transformation is obtained by an adjustment (calibration) performed in advance when the distance image sensor 102 is installed. The distance d used to calculate the three-dimensional coordinates need not be a single pixel value; it may be determined after applying noise removal or averaging to several pixels within the hand region near the operation position 110.
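The conversion from (u, v, d) to world coordinates (x, y, z) could look like the following Python sketch. It assumes a pinhole lens model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, treats d as depth along the optical axis, and uses a hypothetical 3×4 matrix `[R | t]` standing in for the calibrated transformation; none of these specifics come from the source, which only states that a pre-calibrated transformation matrix is used.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_camera(u: float, v: float, d: float,
                    fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) with depth d into camera coordinates.

    Assumption: pinhole model, d measured along the optical axis.
    """
    return ((u - cx) * d / fx, (v - cy) * d / fy, d)


def camera_to_world(p: Vec3, m: Sequence[Sequence[float]]) -> Vec3:
    """Apply a 3x4 rigid transform [R | t] (from calibration) to point p."""
    x, y, z = (row[0] * p[0] + row[1] * p[1] + row[2] * p[2] + row[3] for row in m)
    return (x, y, z)


def averaged_depth(depth: List[List[float]], hand_mask: List[List[bool]],
                   u: int, v: int, radius: int = 1) -> float:
    """Mean depth of hand-region pixels in a small window around (u, v)."""
    samples = [
        depth[vv][uu]
        for vv in range(max(0, v - radius), min(len(depth), v + radius + 1))
        for uu in range(max(0, u - radius), min(len(depth[0]), u + radius + 1))
        if hand_mask[vv][uu]
    ]
    if not samples:
        raise ValueError("no hand-region pixels near the requested position")
    return sum(samples) / len(samples)


# Hypothetical setup: sensor 1000 mm above the origin, looking straight down.
# Camera z points down, world z points up, so z_world = 1000 - z_camera.
STRAIGHT_DOWN = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 1000.0],
]

if __name__ == "__main__":
    p_cam = pixel_to_camera(370, 240, 950.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(camera_to_world(p_cam, STRAIGHT_DOWN))
```

For the obliquely mounted sensor of the embodiment, the rotation part of the matrix would include the tilt; the straight-down matrix here is only the simplest case for checking the arithmetic.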

The hatched region 109 is the image of the operator's hand 106 in the distance image 108 (hereinafter simply the hand region 109). In the present embodiment, threshold processing on the z coordinate is used to extract, as the hand region, the region showing a subject located higher than the operation surface 101, that is, the table surface. One operation position is detected for each detected hand. The operation position is the coordinates of the position that the operator is estimated to be pointing at with a finger. The present embodiment presupposes that, when specifying a point for a touch operation, the operator takes a "pointing pose" with only one finger extended, because for many people extending one finger is a natural posture for pointing at a single point. Accordingly, the position estimated to be the end of the hand region 109 is identified as the operation position. In the pointing pose, the end corresponds to the fingertip. Specifically, the hand region 109 is extracted from the distance image, and the coordinates of the pixel in the hand region 109 located farthest from the image edge 111 are regarded as the single point corresponding to the fingertip. For the hand region 109 shown in FIG. 1(b), the point farthest from the image edge 111 is identified as the operation position 110. The method of identifying the operation position is not limited to this; for example, the five fingers of the hand may be detected in the hand region and the end of a predetermined finger identified.
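The extraction and fingertip search described above can be sketched in Python as follows. The sketch assumes the distance image has already been converted to a per-pixel height map in millimetres, and it interprets "farthest from the image edge" as farthest from a reference pixel on the edge where the hand enters; both the data layout and that interpretation are assumptions of this example.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]  # (u, v) in image coordinates


def extract_hand_mask(height_mm: List[List[float]],
                      threshold_mm: float) -> List[List[bool]]:
    """Mark pixels whose height above the table exceeds the threshold."""
    return [[h > threshold_mm for h in row] for row in height_mm]


def farthest_hand_pixel(mask: List[List[bool]], reference: Pixel) -> Optional[Pixel]:
    """Return the hand pixel with the largest distance from `reference`.

    Returns None when the mask contains no hand pixel.
    """
    best: Optional[Pixel] = None
    best_d2 = -1
    for v, row in enumerate(mask):
        for u, is_hand in enumerate(row):
            if not is_hand:
                continue
            d2 = (u - reference[0]) ** 2 + (v - reference[1]) ** 2
            if d2 > best_d2:
                best, best_d2 = (u, v), d2
    return best


if __name__ == "__main__":
    # Toy 5x5 height map: an arm enters at the bottom edge in column 2
    # and extends upward to row 1.
    heights = [
        [0, 0, 0, 0, 0],
        [0, 0, 30, 0, 0],
        [0, 0, 30, 0, 0],
        [0, 0, 30, 0, 0],
        [0, 0, 30, 0, 0],
    ]
    hand = extract_hand_mask(heights, threshold_mm=10)
    print(farthest_hand_pixel(hand, reference=(2, 4)))
```

A real implementation would also need noise reduction and connected-component labeling to separate multiple hands, which this sketch omits.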

In the present embodiment, in each frame of the distance image, which is captured repeatedly according to the frame rate, the center of the portion where the hand region 109 intersects the image edge 111 is defined as the entry position of the hand. For the hand region 109 in FIG. 1(b), the entry position 112 is identified. The entry position 112 is likewise obtained as three-dimensional position information in real space by converting the position information defined in the distance image. When multiple hand regions are detected in the distance image, the operation position and the entry position are identified for each of them, and a touch operation is recognized for each hand region. However, when the fingertip is at a height close to the operation surface 101, the difference between the pixel values of the table and the fingertip is extremely small. Depending on the resolution and the detection error, it therefore becomes difficult to distinguish the finger from the table accurately by the threshold processing described above. For example, if the threshold is set higher than the actual table surface and the tip of the finger is judged to lie below the threshold, the identified fingertip position may in fact be not the tip of the finger but its middle or base. If this detection result is used as the operation position as is, the detected operation position deviates from the operation position the user intends to indicate. This can degrade operability, for example by selecting an item different from the one the user wants to select.
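The fingertip shift caused by the height threshold can be reproduced with a one-dimensional toy example. The Python sketch below uses an invented height profile along a finger that slopes down toward the table; the numbers are illustrative only and are not measurements from the source.

```python
from typing import List, Optional


def detected_tip_index(height_profile_mm: List[float],
                       threshold_mm: float) -> Optional[int]:
    """Index of the last pixel, walking from the base of the finger, whose
    measured height still exceeds the threshold.

    Pixels at or below the threshold are treated as table surface, so the
    walk stops at the first such pixel.
    """
    tip: Optional[int] = None
    for u, height in enumerate(height_profile_mm):
        if height <= threshold_mm:
            break
        tip = u
    return tip


if __name__ == "__main__":
    # Hypothetical heights (mm) along the finger, index 0 = base.
    # The true fingertip is at index 7; indices 8 and 9 are bare table.
    profile = [40, 35, 30, 25, 20, 15, 10, 5, 0, 0]
    for threshold in (2, 12):
        print(threshold, detected_tip_index(profile, threshold))
```

With a threshold of 2 mm the detected tip coincides with the true tip at index 7, while a threshold of 12 mm stops at index 5, two pixels short. This is the kind of deviation that motivates refining the position with the two-dimensional image.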

In the present embodiment, therefore, the position of the operation position in the direction of the operation surface is detected using not only the three-dimensional position information obtained from the distance image but also a separately acquired two-dimensional image. More specifically, to identify the component of the three-dimensional position of the operation position that is parallel to the operation surface, a point regarded as the end of the operating body and its position information are detected from both the distance image and the two-dimensional infrared image. Which of the two positions is recognized as the position indicated by the operation is then selected according to the situation.

In this specification, it is assumed below that the operator's hand 106 and its fingers are used as an example of the operating body with which the operator inputs touch operations and of its end. However, the present embodiment is also applicable when an instrument such as a stylus or a robot arm, rather than a hand, is used as the operating body. The end of the operating body refers to the part used to indicate the operation position for a touch operation, but it need not be a protruding end as long as it is a part of the operating body with which a touch operation can be input.

FIG. 2(a) is a hardware configuration diagram of the tabletop interface including the information processing apparatus 100 according to the present embodiment. The central processing unit (CPU) 200 uses the RAM 202 as work memory to execute the OS and programs stored in the ROM 201 and the storage device 203, performs the calculations and logical decisions of various processes, and controls each component connected to the system bus 204. The storage device 203 is, for example, a hard disk drive or an external storage device connected through one of various interfaces, and stores the programs and data for the operation recognition processing of the embodiment. Under the control of the CPU 200, the distance image sensor 102 captures a distance image of the space above the operation surface 101, which includes the table on which items are displayed and the hand of the operator operating the items, and outputs the captured distance image to the system bus 204. The present embodiment is described on the basis of the reflection time method (time-of-flight method), which is little affected by ambient light or by what is displayed on the table surface, as the method of acquiring the distance image; depending on the application, a parallax method, an infrared pattern method, or the like can also be used. Under the control of the CPU 200, the projector 104 projects and displays the image items to be operated on the table.

In the system described above, the visible light camera 103, the distance image sensor 102, and the projector 104 are external devices, each connected to the information processing apparatus 100 via an input/output interface, and together with the information processing apparatus 100 they constitute an information processing system. These devices may, however, be integrated into the information processing apparatus 100.

FIG. 2B is an example of a block diagram showing the software configuration of the information processing apparatus 100. Each of these functional units is realized by the CPU 200 loading a program stored in the ROM 201 into the RAM 202 and executing processing according to the flowcharts described later. The result of each process is held in the RAM 202. If hardware is used instead of software processing by the CPU 200, arithmetic units and circuits corresponding to the processing of each functional unit described here may be configured.

The three-dimensional information acquisition unit 210 acquires the distance image captured by the distance image sensor 102 at fixed intervals determined by the frame rate, and holds it in the RAM 202 as needed. What the three-dimensional information acquisition unit 210 acquires and exchanges with the other functional units is in fact a signal corresponding to image data, but this specification simply describes it as "acquiring a distance image".

The operating body acquisition unit 211 applies threshold determination and noise reduction to each pixel of the distance image acquired by the three-dimensional information acquisition unit 210, and extracts the hand region from the distance image. The hand region is the group of pixels in the input distance image in which the hand that the operator uses as the operating body appears as the subject. The first detection unit 212 identifies, based on contour information of the hand region extracted by the operating body acquisition unit 211, one point corresponding to the end of the operating body as the first point, acquires its coordinate values, and holds them in the RAM 202. Three-dimensional coordinate information is obtained for the first point from the distance image. In the present embodiment, the coordinates of the pixel on the contour of the hand region that lies farthest from the image edge 111 are detected as the first point. In doing so, the first detection unit 212 of the present embodiment uses the center of the part where the hand region intersects the image edge 111 as the intrusion position of the hand region.

The two-dimensional image acquisition unit 213 acquires the two-dimensional infrared image captured by the distance image sensor at fixed intervals determined by the frame rate, and holds it in the RAM 202 as the input image as needed. The processing is in fact performed on a signal corresponding to image data, but this specification simply describes it as "acquiring a two-dimensional infrared image". In the present embodiment, the two-dimensional infrared image used as the input image is captured by the same imaging means as the distance image, but the two are different images. The region setting unit 214 sets, in the input image, a region of interest that includes the pixel corresponding to the first point detected by the first detection unit 212. In the present embodiment, a vector corresponding to the fingertip direction is generated, and the region of interest is set so as to include the area toward which the vector extends. The second detection unit 215 identifies, based on the shape of the subject appearing in the region of interest set by the region setting unit 214, one point corresponding to the end of the operating body as the second point, acquires its coordinate values, and holds them in the RAM 202. The coordinates obtained from the two-dimensional infrared image are, however, two-dimensional. In the present embodiment, when a touch input state is detected from the height (z coordinate) of the first point, the indicated position is recognized using the two-dimensional coordinates (x, y) of the second point. Height information is therefore not necessarily required for the second point. If position information in the three-dimensional direction is needed, the height coordinate (z) may be calculated, for example by correcting the pixel value of the pixel corresponding to the second point in the distance image, as described with reference to FIG. 1B.

The recognition unit 216 recognizes the operation input with the end of the operating body. In the present embodiment, the position of the end of the operating body is called the operation position. The operation position is determined from the position information of the first point or the second point; in the present embodiment, the first point is preferentially taken as the operation position. Then, according to how close the operation position detected from the distance image is to the operation surface 101, the recognition unit 216 determines whether the information processing apparatus 100 is in a state of touch input by the operating body. For example, when the distance between the first point and the operation surface 101 (the z coordinate, corresponding to height) is smaller than a predetermined threshold, touch input is determined to be in progress. While touch input is in progress, the recognition unit 216 tracks the position on the operation surface 101 indicated by the operating body, generates touch events such as touch, release, move, flick, pinch, and rotate, and notifies the display control unit 217 of them. As the position on the operation surface 101 indicated by the operating body, the two-dimensional position information parallel to the operation surface (xy coordinates) of either the first point or the second point is selected and used. The recognition unit 216 further determines the shape of the operating body region, for example the number of fingers, the direction of the fingers, and the direction of the hand. The display control unit 217 generates drawing data reflecting the results of the processes executed in response to the user operation notified by the recognition unit 216, and outputs it to the projector 104, thereby controlling what is displayed on the operation surface. A user operation is defined by the operation position, the movement of the operation position, the shape (pose) of the operating body region, and so on, and allows some instruction to be input to the information processing apparatus 100.
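The way touch events follow from the per-frame operation position can be illustrated with a small sketch. It is not taken from the specification: the function name `touch_events`, the frame format, and the restriction to touch, move, and release events (flick, pinch, and rotate are omitted) are assumptions made for illustration only.

```python
def touch_events(frames, touch_threshold):
    """Derive touch/move/release events from per-frame operation positions.

    frames: iterable of (x, y, z) operation positions, one per captured frame.
    touch_threshold: height below which the fingertip counts as touching
                     (hypothetical units, same as z).
    """
    events = []
    touching = False
    last_xy = None
    for x, y, z in frames:
        now_touching = z < touch_threshold
        if now_touching and not touching:
            events.append(("touch", (x, y)))
        elif now_touching and touching and (x, y) != last_xy:
            events.append(("move", (x, y)))
        elif not now_touching and touching:
            # Report the release at the last position seen while touching.
            events.append(("release", last_xy))
        touching = now_touching
        if now_touching:
            last_xy = (x, y)
    return events
```

Events are emitted only on state changes or position changes, so a fingertip resting at one position produces a single touch event until it moves or lifts.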

FIG. 3 is a flowchart showing the flow of the main processing of the information processing apparatus in the present embodiment. In the present embodiment, when the information processing apparatus 100 is powered on, first, in step S301 of FIG. 3, the display control unit 217 generates a screen on which UI components are arranged based on data acquired from the storage device 203, and outputs it to the projector 104. The projector 104 then projects the screen onto the upper surface of the operation surface 101.

Next, in step S302, the three-dimensional information acquisition unit 210 acquires a distance image from the distance image sensor 102, and the two-dimensional image acquisition unit 213 acquires a two-dimensional infrared image. While the information processing apparatus is powered on, the distance image sensor 102 captures a two-dimensional infrared image and a distance image of the operation area above the operation surface 101 at predetermined time intervals.

In step S303, the operating body acquisition unit 211 detects a predetermined operating body (the user's hand or arm). If the operating body is detected (YES in S303), the process proceeds to step S304. If it is not detected (NO in S303), the process returns to step S302, and steps S302 and S303 are repeated until the operating body is detected.

In step S304, the operation position determination process is executed. The details of step S304 are described later with reference to the flowchart of FIG. 4.

In step S305, the recognition unit 216 recognizes the operation input to the information processing apparatus 100 based on the operation position determined in step S304. For example, when the height (z coordinate) representing how close the operation position is to the operation surface falls below a predetermined threshold, a touch operation is recognized. Information representing the recognition result is then sent to the display control unit 217. The position on the operation surface (xy coordinates) that the operating body indicates by touching is the xy coordinates of the operation position determined in step S304. In step S306, the display control unit 217 updates the display of the objects (images, data, UI, and so on) in the screen projected by the projector 104 according to the recognized operation. In step S307, it is determined whether the information processing apparatus has been powered off. If it has been powered off (YES in S307), the information processing apparatus 100 ends all processing. If it has not (NO in S307), the process returns to step S302.

FIG. 4 is a flowchart showing the operation position determination process (S304) in the present embodiment. First, in step S401, the first detection unit 212 detects the first point as the operation position based on the hand region acquired by the operating body acquisition unit 211. In the present embodiment, the center of the intersection of the hand region and the image edge 111 is taken as the intrusion position, and the pixel in the hand region farthest from the intrusion position is detected as the first point, which is regarded as the fingertip position in the distance image. The position information of the first point detected here is three-dimensional position information (x, y, z coordinates) obtained from the distance image. In step S401, therefore, three-dimensional position information (x, y, z coordinates) is obtained as the position information of the operation position.
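The detection of the first point in step S401 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the hand region is assumed to be given as a binary mask, the intrusion position is approximated by the centroid of the hand pixels lying on the image border (which matches the "center of the intersection" only when that intersection is a single contiguous run), and the name `find_first_point` is hypothetical. The sketch returns pixel coordinates only; the z coordinate would be read separately from the distance image.

```python
from math import hypot


def find_first_point(hand_mask):
    """Return (intrusion_position, first_point) in pixel coordinates.

    hand_mask: list of rows, each a list of 0/1 values, where 1 marks a
               pixel of the hand region extracted from the distance image.
    Returns None when the hand region does not touch the image border.
    """
    height = len(hand_mask)
    width = len(hand_mask[0])
    hand = [(x, y) for y in range(height) for x in range(width) if hand_mask[y][x]]
    # Hand pixels lying on the image border approximate the intersection
    # of the hand region with the image edge.
    border = [(x, y) for x, y in hand
              if x in (0, width - 1) or y in (0, height - 1)]
    if not border:
        return None
    entry_x = sum(x for x, _ in border) / len(border)
    entry_y = sum(y for _, y in border) / len(border)
    # The hand pixel farthest from the intrusion position is the first point.
    tip = max(hand, key=lambda p: hypot(p[0] - entry_x, p[1] - entry_y))
    return (entry_x, entry_y), tip
```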

In step S402, the recognition unit 216 compares the height (z coordinate) of the operation position acquired in step S401 with a threshold T1 representing a predetermined height, and determines whether the height is equal to or less than T1. The threshold T1 is used to judge whether the first point can be trusted as an accurate detection of the fingertip position. As described above, when the fingertip is close to the operation surface 101, that is, when its height above the operation surface 101 is small, the contour of the hand region is relatively likely to be inaccurate because of the sensor's resolution and detection error. Step S402 therefore determines whether processing that improves the accuracy of the fingertip position using a means other than the distance image is needed. In the present embodiment, such processing is considered necessary when the height of the operation position is equal to or less than T1. If the height of the operation position is equal to or less than T1 (YES in S402), the processing from step S403 onward uses the two-dimensional infrared image to determine the position information of the operation position more accurately. If the height of the operation position is greater than T1 (NO in S402), the process of FIG. 4 ends and the flow proceeds to S305.

Next, in step S403, the second detection unit 215 acquires information representing the fingertip direction based on the distance image. The fingertip direction is the direction from the center position of the hand (the start point) to the operation position (the end point). The center position of the hand is the center point of the widest part of the hand region. In the present embodiment, the fingertip direction is obtained as a vector. In step S404, the region setting unit 214 determines, as the region of interest, a region in the two-dimensional infrared image that starts at the operation position and contains the fingertip direction vector. In step S405, the second detection unit 215 detects the second point within the region of interest of the two-dimensional infrared image. Specifically, edges are extracted within the region of interest, and the region that is enclosed by the edges and the boundary of the region of interest and that contains the operation position is detected as the fingertip region. The end of the detected fingertip region is taken as the second point, which can be regarded as corresponding to the fingertip. The position information detected here is two-dimensional position information (xy coordinates) parallel to the operation surface 101. Because edges may be broken, it is preferable to perform interpolation that connects the broken edges. The fingertip region may instead be detected by background subtraction using color information or by moving-region detection.

In step S406, the recognition unit 216 determines whether the second point satisfies a predetermined condition. The predetermined condition is used to decide whether the two-dimensional coordinate information parallel to the operation surface 101, within the three-dimensional position information of the operation position defined from the first point in step S401, is valid as it is or should be corrected with the position information of the second point. For example, if the contour of the part of the hand region close to the operation surface 101 was not obtained accurately when the first point was detected from the distance image, the xy coordinates obtained for the first point may deviate from the actual fingertip. The xy coordinates of the second point, by contrast, are obtained from the two-dimensional infrared image, which is separate from the distance image, and are unlikely to suffer from this problem. Step S406 thus determines which of the first point and the second point should be adopted as the position indicated on the operation surface by the operating body. In the present embodiment, the predetermined condition is satisfied when the distance of the second point from the fingertip direction vector is smaller than a reference value. Satisfying the condition means that it is appropriate to correct the fingertip position with the second point. Failing it means that the second point deviates so much from the first point that the second point cannot be said to be the more appropriate of the two. If the second point is close to the fingertip direction vector and the condition is satisfied (YES in S406), the process proceeds to S407. If the second point is far from the fingertip direction vector and the condition is not satisfied (NO in S406), the process of the flowchart of FIG. 4 ends.
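The condition of step S406 amounts to a point-to-vector distance test in the xy plane. The sketch below is one possible reading, not the method fixed by the specification: it treats the fingertip direction vector as a finite line segment and measures the distance to the nearest point on that segment. The names `distance_to_vector` and `second_point_is_acceptable` and the parameter `reference` are hypothetical.

```python
from math import hypot


def distance_to_vector(point, start, end):
    """Distance in the xy plane from point to the segment start -> end."""
    px, py = point
    sx, sy = start
    vx, vy = end[0] - sx, end[1] - sy
    length_sq = vx * vx + vy * vy
    if length_sq == 0:
        # Degenerate vector: fall back to the distance to its start point.
        return hypot(px - sx, py - sy)
    # Parameter of the orthogonal projection, clamped onto the segment.
    t = ((px - sx) * vx + (py - sy) * vy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (sx + t * vx), py - (sy + t * vy))


def second_point_is_acceptable(point, start, end, reference):
    """S406: accept the second point when it lies near the fingertip vector."""
    return distance_to_vector(point, start, end) < reference
```

Clamping to the segment also rejects candidates that lie on the extended line but far beyond the end of the vector; measuring against the infinite line instead would be an equally plausible reading of the text.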

In step S407, the first detection unit 212 corrects the xy coordinates in the three-dimensional position information of the operation position with the xy coordinates of the second point detected in S405. The process of the flowchart of FIG. 4 then ends. Thus, in the present embodiment, when step S407 has been executed, the xy coordinates of the second point are recognized as the position on the operation surface 101 indicated by the fingertip when the operation recognition process of step S305 is executed. When step S407 has not been executed, the xy coordinates of the first point are recognized as that position.
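The overall flow of FIG. 4 (S401 to S407) can be summarized in a short sketch. The two callables `find_second_point` and `is_acceptable` are hypothetical stand-ins for the infrared-image search (S403 to S405) and the validity check (S406); handling a failed search by returning `None` is an assumption that the specification does not discuss.

```python
def determine_operation_position(first_point, t1, find_second_point, is_acceptable):
    """Return the (x, y, z) operation position following the flow of FIG. 4.

    first_point: (x, y, z) detected from the distance image (S401).
    t1: height threshold at or below which the infrared image is consulted.
    find_second_point: callable taking first_point and returning (x, y) or None.
    is_acceptable: callable taking (x, y) and returning True or False.
    """
    z1 = first_point[2]
    if z1 > t1:
        # S402 NO: the fingertip is high enough, so the first point is kept.
        return first_point
    second = find_second_point(first_point)  # S403 to S405
    if second is None or not is_acceptable(second):
        # S406 NO (the None case is an added assumption): keep the first point.
        return first_point
    # S407: replace the xy coordinates, keep the z coordinate of the first point.
    return (second[0], second[1], z1)
```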

If the threshold T1 described above is set to a value smaller than the threshold T2 used to determine whether the fingertip is regarded as touching the operation surface 101, the z coordinate detected for the first point may be kept as the z coordinate of the operation position even after step S407. This is because step S407 is executed only when the z coordinate of the first point is equal to or less than T1, so the z coordinate is necessarily below T2. However, if a z coordinate is also needed for the second point, whether for comparison with the touch threshold T2 or for another purpose, it may be calculated from the pixel value of the pixel in the distance image that corresponds to the xy coordinates of the second point. In that case, either the three-dimensional position information of the first point or that of the second point is selectively recognized as the operation position.

Next, the operation of the information processing apparatus 100 of the present embodiment is described with reference to FIGS. 5 to 8, in association with specific examples of the state of the user's hand. FIGS. 5 to 8 show, stage by stage, how the user reaches out over the operation surface 101 and then brings a fingertip into contact with it to input a touch operation. The number of the step in which the characteristic processing of each stage is performed is given in parentheses.

First, the state in which the user's finger is away from the operation surface 101 is described with reference to FIGS. 5A and 5B. FIG. 5A shows part of the distance image at time t1, and FIG. 5B shows the space above the operation surface 101 at time t1 viewed from the side (y direction). First, the distance image and the two-dimensional infrared image at time t1 are acquired from the distance image sensor 102 (S302). The hand region detected from the acquired distance image is shown as the gray region in FIG. 5A. The operating body at time t1 is called operating body A. Operating body A is the user's hand and arm. The hand is the entire part of the arm beyond the wrist (on the finger side), and includes the five fingers, the palm, and the back of the hand.

In the present embodiment, the first point is first detected as the operation position from the hand region detected in the distance image (S401). As shown in FIG. 5A, the intrusion position p1 is detected, and the pixel in the hand region farthest from p1 is detected as the first point p2. The three-dimensional position (x2, y2, z2) of p2 in real space is then acquired from its position in the distance image and its pixel value. Next, it is determined whether the height z2 of p2 above the operation surface is equal to or less than the threshold T1 (S402). In the case of FIG. 5A it is not, so the three-dimensional position (x2, y2, z2) obtained from the first point continues to be used as the operation position for recognizing the operation. In the present embodiment, to recognize a touch operation, the height (z coordinate) of the operation position, which represents how close the fingertip is to the operation surface 101, is compared with the threshold T2. When the z coordinate is below T2, the fingertip is judged to be close enough to the operation surface 101 to be regarded as touching. In the state of FIG. 5A, operating body A is held well above the operation surface 101 and its height z2 is greater than T2, so it is not regarded as touching.

Next, an example of the operation of the information processing apparatus 100 when the user's fingertip is in contact with the operation surface 101 is described with reference to FIGS. 6A and 6B. The case described here is specifically one in which, because the fingertip is extremely close to the operation surface 101, the contour of the hand region obtained from the distance image is less accurate around the fingertip. FIG. 6A shows part of the distance image acquired at time t2. FIG. 6B shows the space above the operation surface 101 at time t2 viewed from the y direction. Time t2 is later than time t1.

The distance image and the two-dimensional infrared image at time t2 are acquired from the distance image sensor 102 (S302). The hand region obtained from the acquired distance image is shown as the gray region in FIG. 6A. The operating body at time t2 is called operating body B. In this hand region, the part of operating body B corresponding to part of the fingertip is missing. This is because the resolution or detection accuracy of the distance image sensor 102 is insufficient, which makes it difficult to extract, from distance information, the boundary between the user's finger and the operation surface 101 it overlaps. In this case too, the present embodiment detects the intrusion position p3 based on the hand region, and detects the pixel in the hand region farthest from p3 as the first point p4 (S304). The three-dimensional position (x4, y4, z4) of the detected first point p4 in real space is then acquired as the three-dimensional position information of the operation position.

Suppose that, among the coordinates (x4, y4, z4) of the detected first point p4, the z coordinate, which corresponds to the height above the operation surface 101, is smaller than the threshold T1. In the present embodiment, the height of the operation position above the operation surface is then determined to be equal to or less than T1 (YES in S402), so the two-dimensional position of the operation position parallel to the operation surface is also detected from the two-dimensional infrared image. Specifically, a fingertip direction vector A of operating body B is first created in the distance image (S403). FIG. 7A outlines how the fingertip direction is determined in the distance image, and FIG. 7B outlines how the region of interest is determined in the two-dimensional infrared image. As shown in FIG. 7A, in the present embodiment the center of the widest part of operating body B is acquired as the hand center position p5. A fingertip direction vector A is then created with the hand center position p5 as its start point and the first point p4 as its end point. Next, as shown in FIG. 7B, part of the two-dimensional infrared image is set as the region of interest (S404). In the present embodiment, the start point of the fingertip direction vector A calculated in the distance image is moved to the first point p4 detected in the distance image, creating a fingertip direction vector B. Moving the start point, rather than simply extending vector A, makes it possible to narrow the search range for the fingertip to a realistic range of finger lengths. However, because the distance between the first point and the hand center position varies with how the finger is bent and other factors, vector B may in some cases be set by extending vector A to a predetermined length without moving the start point.
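The construction of vector B and of a region of interest around it can be sketched as follows. Several details are assumptions made for illustration: vector B is given a fixed `length` in pixels, the region of interest is taken to be the axis-aligned bounding rectangle of vector B enlarged by a `margin`, and the function names are hypothetical. The specification only requires that the region contain vector B.

```python
from math import ceil, floor, hypot


def fingertip_vector_b(hand_center, first_point, length):
    """Return (start, end) of vector B: the direction of A, anchored at the first point."""
    ax = first_point[0] - hand_center[0]
    ay = first_point[1] - hand_center[1]
    norm = hypot(ax, ay)
    if norm == 0:
        raise ValueError("hand center and first point coincide")
    end = (first_point[0] + ax / norm * length,
           first_point[1] + ay / norm * length)
    return first_point, end


def region_of_interest(start, end, margin, width, height):
    """Bounding rectangle (left, top, right, bottom) of vector B plus a margin,
    clamped to an image of the given width and height."""
    left = max(0, floor(min(start[0], end[0]) - margin))
    top = max(0, floor(min(start[1], end[1]) - margin))
    right = min(width - 1, ceil(max(start[0], end[0]) + margin))
    bottom = min(height - 1, ceil(max(start[1], end[1]) + margin))
    return left, top, right, bottom
```

Because the pixels of the distance image and the two-dimensional infrared image correspond one to one in the embodiment, the rectangle computed in distance-image coordinates can be applied to the infrared image without conversion.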

In this embodiment, the partial region of the two-dimensional infrared image that contains the fingertip direction vector B is determined as the region of interest. Because the pixels of the distance image and the two-dimensional infrared image correspond one to one in this embodiment, the fingertip direction vector B defined in the distance image can be copied directly onto the two-dimensional infrared image. FIG. 8A is an enlarged view of the region of interest determined in the two-dimensional infrared image. In this embodiment, the second point is detected based on the contour shape of the subject appearing in the region of interest (S405). Specifically, edge extraction is performed within the region of interest, and the region that is enclosed by the extracted edges and the boundary of the region of interest and that contains the first point p4 detected in the distance image is taken as the fingertip region. The pixel at the end of the detected fingertip region is identified as the second point p6, and its two-dimensional position information (x6, y6) parallel to the operation surface 101 is acquired. If necessary, the three-dimensional position information (x6, y6, z6) of p6, including the z coordinate, is acquired by referring to p6 in the distance image.
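One possible way to obtain the fingertip region and its end pixel is sketched below. It assumes that edge extraction has already produced a binary edge map, that the region of interest lies inside the image, and that the "end" of the region is the pixel lying farthest along the fingertip direction. That last criterion, along with all function names, is an assumption; the description does not specify how the end pixel is chosen.

```python
from collections import deque
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]  # (x, y)


def fingertip_region(edges: List[List[int]],
                     roi: Tuple[int, int, int, int],
                     seed: Pixel) -> Set[Pixel]:
    """Flood-fill non-edge pixels inside the ROI, starting from the first point.

    edges[y][x] is 1 on an edge pixel and 0 elsewhere.
    roi is (x_min, y_min, x_max, y_max), inclusive, and must lie inside the image.
    The result is the region bounded by edges and the ROI border that contains seed.
    """
    x_min, y_min, x_max, y_max = roi
    sx, sy = seed
    if not (x_min <= sx <= x_max and y_min <= sy <= y_max) or edges[sy][sx]:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = x_min <= nx <= x_max and y_min <= ny <= y_max
            if inside and (nx, ny) not in region and not edges[ny][nx]:
                region.add((nx, ny))
                queue.append((nx, ny))
    return region


def second_point(region: Set[Pixel], direction: Tuple[float, float]) -> Optional[Pixel]:
    """Pick the region pixel farthest along the fingertip direction (assumed criterion)."""
    if not region:
        return None
    dx, dy = direction
    return max(region, key=lambda p: p[0] * dx + p[1] * dy)
```

Seeding the fill at the first point p4 is what ties the two-dimensional search to the result from the distance image: only the region that actually contains p4 is considered.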

Based on the coordinates of the detected second point p6, it is determined whether the coordinate information of the operation position, which at this point is defined by the coordinates of the first point, should be corrected (S406). In this embodiment, the predetermined condition is that the distance d1 in the xy plane between the second point p6 and the fingertip direction vector B is equal to or less than a threshold T3, and the correction is executed when this condition is satisfied. In the case of FIG. 8A, the distance d1 is equal to or less than the threshold T3, so (x6, y6) of the second point p6 is regarded as valid xy coordinates for the operation position (YES in step S406). The operation position is then corrected from the first point p4 to the second point p6 (step S407). If necessary, the z coordinate z6 of the second point may be obtained from the distance image.
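The validity check in S406 and the correction in S407 can be illustrated with a short sketch. It assumes that the distance d1 is measured from p6 to the nearest point on the line segment of vector B. The description does not say whether the segment or the infinite line is meant, so this is one possible reading, and the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance in the xy plane from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    closest = (a[0] + t * dx, a[1] + t * dy)
    return math.hypot(p[0] - closest[0], p[1] - closest[1])


def corrected_operation_position(p4_xy: Point, p6_xy: Point,
                                 b_start: Point, b_end: Point,
                                 t3: float) -> Point:
    """Return p6 when d1 <= T3 (S406 YES, S407), otherwise keep p4."""
    d1 = distance_to_segment(p6_xy, b_start, b_end)
    return p6_xy if d1 <= t3 else p4_xy
```

A second point that lies far from the expected finger direction is treated as unreliable, and the position from the distance image is kept unchanged.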

FIG. 8B shows the corrected operation position and its three-dimensional coordinates. Note that FIG. 8B shows the case in which the z coordinate z6 of the corrected operation position is obtained from the pixel value of the distance image at the xy coordinates of the second point. When the height z4 of the first point p4 is below the threshold T1 in this way, both z4 and the height z6 of the second point are below the touch threshold T2. Therefore, regardless of whether z6 has been obtained, in step S305 the xy coordinates of the second point are recognized as the position on the operation surface 101 indicated by the user's fingertip. For example, among the UI buttons projected on the operation surface 101, the button containing the indicated position (x6, y6) enters the selected state, and the projected content is updated accordingly (step S306).

As described above, in this embodiment the position information of the end of the operating body detected from the three-dimensional position information is supplemented, depending on the situation, with the position information of the end of the operating body detected from the two-dimensional image. The range of the two-dimensional image that is searched for the end of the operating body is limited to the inside of the region of interest, which is determined by the position of the end detected from the three-dimensional position information. This reduces the overall amount of computation and also improves the detection accuracy of the first point.

<Modification 1>
As Modification 1 of the first embodiment, an example in which the region of interest is set by a different method will now be described with reference to FIGS. 9 and 10. In FIGS. 9 and 10, elements that also appear in FIGS. 7 and 8 are given the same reference numbers.

First, as shown in FIG. 9A, a region of a predetermined size centered on the first point p4 detected from the distance image is set as a region of interest. This region of interest is made large enough to contain the fingertip wherever the detected first point actually lies between the tip and the base of the finger. A predetermined value is selected for the size of the region of interest according to the height (z4) of the first point p4. For example, the size may take into account the distance between the distance image sensor 102 and the hand (the first point or the hand center position), or the size of the hand region. However, the extent to which the arm appears in the image varies greatly from frame to frame. Therefore, when the size of the hand region is taken into account, a region is defined that represents the size of the hand proper (from the wrist onward), the part of the arm whose size is unlikely to differ between frames, and that size is used. For example, the width of the hand region near the hand center position is used as the size of the hand region. The size of the region of interest is then selected accordingly: when the hand region is relatively large (which corresponds to the hand being close to the sensor), the region of interest is set large, and when the hand region is small (which corresponds to the hand being far from the sensor), the region of interest is set small.
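As a rough illustration of sizing the region from the apparent hand width, the sketch below scales a square region with the width measured near the hand center. The linear scaling, the default `scale`, `min_side`, and `max_side` values, and the function names are all hypothetical; the description only states that a larger hand region leads to a larger region of interest.

```python
from typing import Tuple


def roi_side_length(hand_width_px: float, scale: float = 1.5,
                    min_side: int = 20, max_side: int = 200) -> int:
    """Side length of a square ROI that grows with the apparent hand width.

    scale, min_side and max_side are illustrative values, not from the source.
    """
    side = int(round(hand_width_px * scale))
    return max(min_side, min(max_side, side))


def roi_centered_at(point: Tuple[int, int], side: int) -> Tuple[int, int, int, int]:
    """Square ROI (x_min, y_min, x_max, y_max) centered on the first point p4."""
    half = side // 2
    x, y = point
    return (x - half, y - half, x + half, y + half)
```

Using the hand width rather than the whole arm region keeps the size stable between frames, which is the concern raised above about how much of the arm is captured.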

Next, FIG. 9B is an enlarged view of the region of interest. The centroid position p7 (x7, y7, z7) of the operating body B within the region of interest is calculated. A fingertip direction vector C is created with the calculated centroid position p7 as its start point and the first point p4 as its end point. Then, as shown in FIG. 10A, the start point of the fingertip direction vector C is moved to the first point p4 to create a fingertip direction vector D. The partial region of the two-dimensional infrared image that contains the fingertip direction vector D is set as the region of interest.
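The centroid-based variant can be sketched as follows. The sketch works only in the xy plane (the description also gives p7 a z coordinate), represents the operating body as a binary mask, and uses a hypothetical function name; these are simplifying assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def centroid_direction_vector(mask: List[List[int]],
                              roi: Tuple[int, int, int, int],
                              first_point: Point
                              ) -> Optional[Tuple[Point, Tuple[Point, Point]]]:
    """Return (p7, (start, end) of vector D), or None if the ROI holds no hand pixels.

    mask[y][x] is 1 where the operating body appears and 0 elsewhere.
    roi is (x_min, y_min, x_max, y_max), inclusive; it is clipped to the image.
    """
    x_min, y_min, x_max, y_max = roi
    sum_x = sum_y = count = 0
    for y in range(max(y_min, 0), min(y_max, len(mask) - 1) + 1):
        for x in range(max(x_min, 0), min(x_max, len(mask[0]) - 1) + 1):
            if mask[y][x]:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    p7 = (sum_x / count, sum_y / count)
    # Vector C runs from the centroid p7 to the first point p4.
    cx, cy = first_point[0] - p7[0], first_point[1] - p7[1]
    # Vector D is vector C with its start point moved to p4.
    start = first_point
    end = (first_point[0] + cx, first_point[1] + cy)
    return p7, (start, end)
```

Because the centroid is taken only over pixels near the first point, the resulting direction follows the finger itself rather than the line from the palm center.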

FIG. 10B is an enlarged view of the region of interest in the two-dimensional infrared image set according to Modification 1. In the first embodiment and this modification, even if the second point correctly captures the actual fingertip position, it is not used as the operation position when it is far from the fingertip direction vector. It is therefore desirable that the fingertip direction vector represent the direction of the finger as accurately as possible. However, because the fingertip direction vector B is based on the hand center position, it can differ considerably from the actual direction of the finger depending on conditions such as which finger is being used and the shape of the hand during the operation. For example, a comparison of FIG. 8A and FIG. 10B shows that the fingertip direction vector D of Modification 1 is closer to the actual direction of the fingertip than the fingertip direction vector B of the first embodiment. On the other hand, generating the fingertip direction vector B requires less computation than generating the fingertip direction vector D. The method of setting the region of interest may be chosen selectively, for example according to whether the information processing apparatus 100 or the application should prioritize processing speed.

<Modification 2>
As Modification 2, a method of determining the region of interest without using a fingertip direction vector will be described. FIG. 11 is a flowchart of the operation position determination process (S304) in Modification 2. Processing steps shared with the flowchart of FIG. 4 are given the same numbers, and their detailed description is omitted.

In Modification 2, when the first point has been detected from the distance image in step S401 and its height is equal to or less than the threshold T1, the process proceeds to step S1100. In step S1100, the region setting unit 214 determines a region of interest of a predetermined size in the two-dimensional infrared image, using as a reference the current operation position specified by the xy coordinates of the first point. FIG. 12A shows a region of interest C of a predetermined size determined in the two-dimensional infrared image and centered on the position of the first point p4 detected in the distance image. As with the region of interest in Modification 1, the region of interest is set large when the hand region is relatively large (which corresponds to the hand being close to the sensor) and small when the hand region is small (which corresponds to the hand being far from the sensor).

In step S1101, the second detection unit 215 detects, from the part of the two-dimensional infrared image inside the region of interest C, a hand region B, which is the group of pixels in which the operating body appears as a subject. Specifically, edge detection is performed within the region of interest C, and the region that is enclosed by the detected edges and the boundary of the region of interest C and that contains the first point p4 is acquired as the hand region B.

In step S1102, the second detection unit 215 likewise sets a region of interest C of the predetermined size centered on the first point in the distance image, and takes the part of the hand region detected from the distance image that lies within the region of interest C as a hand region A (FIG. 12A). It then determines whether the difference in shape between the hand region A and the hand region B acquired in step S1101 can be regarded as small. Specifically, the area of the parts of the hand region A and the hand region B that do not overlap is calculated as an area D, and the determination is made by comparing the area D with a predetermined threshold D1. When the area D is equal to or less than the threshold D1, the difference in shape between the hand region A and the hand region B is regarded as small (YES in S1102), and the process proceeds to step S1103. When the area D is larger than the threshold D1, the difference in shape between the hand region A and the hand region B is regarded as large (NO in S1102), and the process returns to the flowchart of FIG. 3 without correcting the operation position.
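The comparison in S1102 amounts to measuring the symmetric difference of two pixel sets. A minimal sketch follows, assuming each hand region is represented as a set of (x, y) pixel coordinates and that area is counted in pixels; the function names are hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def non_overlap_area(region_a: Set[Pixel], region_b: Set[Pixel]) -> int:
    """Area D: number of pixels that belong to exactly one of the two hand regions."""
    return len(region_a ^ region_b)


def shapes_agree(region_a: Set[Pixel], region_b: Set[Pixel], d1: int) -> bool:
    """S1102: the shape difference is regarded as small when area D <= threshold D1."""
    return non_overlap_area(region_a, region_b) <= d1
```

When the two regions disagree strongly, the contour from the infrared image is not trusted and the operation position from the distance image is left uncorrected.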

In step S1103, the second detection unit 215 detects, as the second point p6, the position regarded as the end of the operating body, based on the contour shape of the hand region B within the region of interest C of the two-dimensional infrared image. In step S1104, the recognition unit 216 corrects the xy coordinates of the operation position with the xy coordinates of the second point p6. The process then returns to the flowchart of FIG. 3. According to Modification 2, the first point can be corrected based on the shape of the hand region without detecting a fingertip direction vector.

<Second Embodiment>
The information processing apparatus of the first embodiment identifies an operation position and recognizes an operation based on the position designated on the operation surface. In the second embodiment, an operation corresponding to the posture of the user's hand, which is the operating body, is recognized based on the shape of the operating body (hand region). For example, by distinguishing the pointing pose described above from other postures, such as one in which three fingers of one hand are extended and two are bent or one in which all fingers are extended, each posture is recognized as the instruction associated with it in advance. As another example, various instructions are recognized based on the shape of the movement trajectory of the operating body while each posture is maintained.

The second embodiment can be carried out by an information processing apparatus 100 having the same configuration as in the first embodiment. However, the main process follows the flowchart of FIG. 13 instead of the flowchart of FIG. 3. Only the differences from FIG. 3 are described here.

In the flowchart of FIG. 13, a process for correcting the hand region detected in step S303 is performed in step S1301. FIG. 14 is a flowchart of the hand region determination process (S1301) in the second embodiment. First, in step S1401, the second distance acquisition unit 206 acquires the height z8 of the lowest point p8 (the first point) of the hand region C, and when the height z8 is determined to be equal to or less than the threshold T1 (YES in S1401), the process proceeds to S1402.

In step S1402, the region setting unit 214 determines a region of interest D with reference to the lowest point p8 of the hand region C. As shown in FIG. 15A, a region of interest D of an arbitrary size is determined, centered on the lowest point p8 detected in the distance image. The size of the region of interest preferably takes the distance to the distance image sensor into account, being larger the shorter the distance and smaller the longer the distance.

In step S1403, the second operating body detection unit 310 detects a hand region D in the part of the two-dimensional infrared image inside the determined region of interest D. Specifically, edge detection is performed within the region of interest D, and the region that is enclosed by the detected edges and the boundary of the region of interest D and that contains the lowest point p8 is detected as the hand region D. In step S1404, the second operating body detection unit 310 adopts the hand region D within the region of interest D of the two-dimensional infrared image as the hand region.

Returning to the flowchart of FIG. 13, in step S1302 the recognition unit 216 determines, from the shape of the hand region D, which of the postures registered in a dictionary the posture of the operating body corresponds to. The instruction associated with the determined posture is then recognized. For example, when the shape of the hand region D is determined to be a posture with three fingers extended, as in FIG. 15B, it is recognized as an instruction to display a list of operation menus. In that case, in step S306 the display control unit 217 executes a display update that projects the operation menu onto the operation surface 101. According to this embodiment, not only operations that indicate the coordinates of a point as an operation position but also operations input through the posture of the operating body can be recognized accurately.

<Other Embodiments>
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

210 Three-dimensional information acquisition unit
211 Operating body acquisition unit
212 First detection unit
213 Two-dimensional image acquisition unit
214 Region setting unit
215 Second detection unit
216 Recognition unit
217 Display control unit

Claims

An information processing apparatus comprising:
operating body acquisition means for acquiring three-dimensional position information relating to an operating body existing in a space above a predetermined operation surface;
image acquisition means for acquiring a two-dimensional image obtained by imaging the space;
first detection means for detecting a first point corresponding to an end of the operating body based on the three-dimensional position information acquired by the operating body acquisition means;
setting means for setting, in the two-dimensional image, a region of interest including the position of the first point detected by the first detection means;
second detection means for detecting a second point corresponding to the end of the operating body based on the shape of the operating body imaged in the region of interest set by the setting means in the two-dimensional image; and
recognition means for recognizing either the two-dimensional position information, parallel to the operation surface, of the first point detected by the first detection means or the two-dimensional position information, parallel to the operation surface, of the second point detected by the second detection means as a position on the operation surface indicated by the operating body.

The information processing apparatus according to claim 1, wherein the operating body acquisition means acquires, from a distance image in which distance information in the space is reflected in each pixel value, a pixel group constituting an area in which the operating body appears as a subject, and the distance corresponds to a degree of proximity to the operation surface included in the space.

The information processing apparatus according to claim 1, wherein, when the position of the first point detected by the first detection means is considered to be closer than a predetermined distance to the operation surface existing in the space, the recognition means recognizes either the two-dimensional position information, parallel to the operation surface, of the first point detected by the first detection means or the two-dimensional position information, parallel to the operation surface, of the second point detected by the second detection means as a position on the operation surface indicated by the operating body.

The information processing apparatus, wherein, when the position of the first point detected by the first detection means is not considered to be closer than a predetermined distance to the operation surface existing in the space, the recognition means recognizes a position on the operation surface indicated by the operating body based on the position information of the first point, and, when the position of the first point is considered to be closer than the predetermined distance to the operation surface existing in the space, the recognition means performs a process of determining which of the two-dimensional position information of the first point parallel to the operation surface and the two-dimensional position information of the second point parallel to the operation surface is to be recognized as the position on the operation surface indicated by the operating body.

The information processing apparatus according to any one of claims 1 to 4, wherein the recognition means recognizes a state in which the end of the operating body is closer to the operation surface than a predetermined distance as a state in which the end of the operating body is touching the operation surface.

The information processing apparatus according to claim 5, wherein the recognition means recognizes a touch operation input to the information processing apparatus based on the position recognized while the end of the operating body is in the state of touching the operation surface.

The information processing apparatus according to claim 1, wherein the operating body is an arm of a user who inputs an operation to the information processing apparatus, and the first point corresponding to the end is a point corresponding to a fingertip of the user's arm.

The information processing apparatus according to claim 1, wherein the recognition means recognizes either the three-dimensional position information of the first point detected by the first detection means or the three-dimensional position information of the second point detected by the second detection means as three-dimensional position information of an operation position corresponding to the end of the predetermined operating body.

The information processing apparatus, wherein the recognition means compares the contour shape of the portion, corresponding to the region of interest set in the two-dimensional image, of the area in which the operating body acquired from the distance image by the operating body acquisition means appears as a subject with the contour shape of the operating body imaged in the region of interest in the two-dimensional image, and, when the difference between them is larger than a predetermined reference, recognizes the two-dimensional position information of the first point parallel to the operation surface as the position on the operation surface indicated by the operating body, and, when the difference is smaller than the predetermined reference, performs a process of determining which of the two-dimensional position information of the first point parallel to the operation surface and the two-dimensional position information of the second point parallel to the operation surface is to be recognized as the position on the operation surface indicated by the operating body.

The information processing apparatus according to any one of claims 1 to 9, wherein the two-dimensional image is an image obtained by imaging infrared light, and the recognition means acquires the shape of the contour of the operating body by detecting edges inside the region of interest in the two-dimensional image.

A method for controlling an information processing apparatus, comprising:
an operating body acquisition step of acquiring, by operating body acquisition means, three-dimensional position information relating to an operating body existing in a space above a predetermined operation surface;
an image acquisition step of acquiring, by image acquisition means, a two-dimensional image obtained by imaging the space;
a first detection step of detecting, by first detection means, a first point corresponding to an end of the operating body based on the three-dimensional position information acquired in the operating body acquisition step;
a setting step of setting, by setting means, in the two-dimensional image, a region of interest including the position of the first point detected in the first detection step;
a second detection step of detecting, by second detection means, a second point corresponding to the end of the operating body based on the shape of the operating body imaged in the region of interest set in the setting step in the two-dimensional image; and
a recognition step of recognizing, by recognition means, either the two-dimensional position information, parallel to the operation surface, of the first point detected in the first detection step or the two-dimensional position information, parallel to the operation surface, of the second point detected in the second detection step as a position on the operation surface indicated by the operating body.

A program for causing a computer to function as an information processing apparatus comprising:
operating body acquisition means for acquiring three-dimensional position information relating to an operating body existing in a space above a predetermined operation surface;
image acquisition means for acquiring a two-dimensional image obtained by imaging the space;
first detection means for detecting a first point corresponding to an end of the operating body based on the three-dimensional position information acquired by the operating body acquisition means;
setting means for setting, in the two-dimensional image, a region of interest including the position of the first point detected by the first detection means;
second detection means for detecting a second point corresponding to the end of the operating body based on the shape of the operating body imaged in the region of interest set by the setting means in the two-dimensional image; and
recognition means for recognizing either the two-dimensional position information, parallel to the operation surface, of the first point detected by the first detection means or the two-dimensional position information, parallel to the operation surface, of the second point detected by the second detection means as a position on the operation surface indicated by the operating body.

A computer-readable storage medium storing the program according to claim 12.

An information processing apparatus comprising:
first image acquisition means for acquiring a distance image reflecting three-dimensional position information defined in a space above a predetermined operation surface on which a person performs an operation using a hand;
second image acquisition means for acquiring a two-dimensional image obtained by imaging the space;
first detection means for detecting, from the distance image, a hand region in which the person's hand appears as a subject, based on the three-dimensional position information reflected in the distance image;
setting means for setting, in the two-dimensional image, a region of interest so as to include a reference position within the hand region acquired by the first detection means;
second detection means for detecting a hand region in which the person's hand appears as a subject from the region of interest set by the setting means in the two-dimensional image; and
recognition means for recognizing the state of the person's hand based on either the shape of the hand region acquired by the first detection means or the shape of the hand region acquired by the second detection means.

A method for controlling an information processing apparatus, comprising:
a first image acquisition step of acquiring, by first image acquisition means, a distance image reflecting three-dimensional position information defined in a space above a predetermined operation surface on which a person performs an operation using a hand;
a second image acquisition step of acquiring, by second image acquisition means, a two-dimensional image obtained by imaging the space;
a first detection step of detecting, by first detection means, from the distance image, a hand region in which the person's hand appears as a subject, based on the three-dimensional position information reflected in the distance image;
a setting step of setting, by setting means, in the two-dimensional image, a region of interest so as to include a reference position within the hand region acquired in the first detection step;
a second detection step of detecting, by second detection means, a hand region in which the person's hand appears as a subject from the region of interest set in the setting step; and
a recognition step of recognizing the state of the person's hand based on either the shape of the hand region acquired in the first detection step or the shape of the hand region acquired in the second detection step.

A program for causing a computer to function as an information processing apparatus comprising:
first image acquisition means for acquiring a distance image reflecting three-dimensional position information defined in a space above a predetermined operation surface on which a person performs an operation using a hand;
second image acquisition means for acquiring a two-dimensional image obtained by imaging the space;
first detection means for detecting, from the distance image, a hand region in which the person's hand appears as a subject, based on the three-dimensional position information reflected in the distance image;
setting means for setting, in the two-dimensional image, a region of interest so as to include a reference position within the hand region acquired by the first detection means;
second detection means for detecting a hand region in which the person's hand appears as a subject from the region of interest set by the setting means in the two-dimensional image; and
recognition means for recognizing the state of the person's hand based on either the shape of the hand region acquired by the first detection means or the shape of the hand region acquired by the second detection means.

A computer-readable storage medium storing the program according to claim 16.