JP2016139396A - User interface device, method and program

Info

Publication number: JP2016139396A (other versions: JP2016139396A5)
Application number: JP2015147083A
Authority: JP (Japan)
Inventor: Soshi Oshima (宗士 大志万)
Original Assignee: Canon Inc. (キヤノン株式会社)
Legal status: Pending
Priority: JP2014170886, JP2015010680
Priority claimed from: US14/818,770 (US10310675B2)
Prior art keywords: fingertip, position, operation surface, user interface device
Other languages: Japanese (ja)

Abstract

PROBLEM TO BE SOLVED: To determine a touch position as intended by the user when a finger is read from above with a distance image sensor or the like and touch detection with respect to a plane is performed.
SOLUTION: A user interface device is configured to: acquire a three-dimensional image of an operation surface and the region above the operation surface; extract a hand region from the three-dimensional image; specify the position of a fingertip; detect a touch on the operation surface on the basis of the operation surface included in the three-dimensional image and the specified position of the fingertip; specify, when the touch is detected, the direction of the fingertip; and determine, as the touch position, a position obtained by shifting the position of the fingertip by a prescribed amount in the direction opposite to the specified direction of the fingertip.
SELECTED DRAWING: Figure 8

Description

  The present invention relates to a user interface device, a method, and a program for detecting the position of a hand or a fingertip from a distance and operating a display component displayed on a specific surface.

  In a user interface that uses a projector, a camera, and a distance sensor, the interface is projected by the projector and can therefore be displayed superimposed on a real object such as paper. The user can thus handle a real object as an interface to electronic data. In the user interface system disclosed in Patent Document 1, a computer screen is projected onto a desk by a projector, and the computer screen is operated with a fingertip. An infrared camera is used to detect the touch of a fingertip on the plane. In Patent Document 1, touch instructions with a finger or a pen are given using an object such as a table or a sheet of paper as the user interface. In such a case, when an operation of selecting a character about 5 mm square with a finger or drawing a line under the character is performed, the touch position must be determined accurately.

JP 2013-34168 A

T. Lee and T. Hollerer, "Handy AR: Markerless Inspection of Augmented Reality Objects Using Fingertip Tracking," In Proc. IEEE International Symposium on Wearable Computers (ISWC), Boston, MA, Oct. 2007

  However, Patent Document 1 does not take into account the angle formed between the finger and the plane when detecting a touch of the finger on the plane. If the angle of the fingertip is not taken into account, the positions of the plane and the fingertip cannot be acquired correctly, and the contact position between the finger and the operation surface cannot be recognized accurately. In that case, it becomes difficult to select a fine character or to draw a line under a character as described above.

  The present invention has been made in view of the above problems, and aims to provide a user interface device and method that, in a technique for performing touch detection by image analysis, improve the detection accuracy of the contact position and thereby improve user operability.

  In order to achieve the above object, according to one aspect, the present invention has the following configuration.

A user interface device that identifies an operation performed on an operation surface, comprising:
acquisition means for acquiring a three-dimensional image of a region of three-dimensional space having the operation surface as its bottom surface;
extraction means for extracting a hand region from the three-dimensional image;
first specifying means for specifying the position of a fingertip from the hand region;
detection means for detecting a touch on the operation surface based on the operation surface included in the three-dimensional image and the specified position of the fingertip;
second specifying means for specifying the orientation of the fingertip from the hand region when a touch on the operation surface is detected; and
determination means for determining, as the touch position, a position obtained by shifting the position of the fingertip on the operation surface by a predetermined amount in the direction opposite to the orientation of the fingertip.

  According to another aspect, it has the following configuration.

A user interface device that identifies operations performed on the operation surface,
Obtaining means for obtaining a three-dimensional image of a region in a three-dimensional space having the operation surface as a bottom surface;
Estimating means for estimating the position of the belly of the finger from the three-dimensional image.

  According to the present invention, when performing touch detection on the operation surface based on an image, it is possible to improve the detection accuracy of the touch position and improve user operability.

FIG. 1 is a diagram illustrating an example of a network configuration including the camera scanner 101.
FIG. 2 is a diagram illustrating an example of the appearance of the camera scanner 101.
FIG. 3 is a diagram illustrating an example of a hardware configuration of the controller unit 201.
FIG. 4 is a diagram illustrating an example of a functional configuration of a control program for the camera scanner 101.
FIG. 5 is a flowchart and explanatory diagrams of the processing executed by the distance image acquisition unit 408.
FIG. 6A is a flowchart of processing executed by the gesture recognition unit 409 in Embodiment 1.
FIG. 6B is an explanatory diagram of processing executed by the gesture recognition unit 409 in Embodiment 1.
FIG. 7 is a diagram schematically showing a method of estimating the fingertip position in Embodiment 1.
FIG. 8 is a diagram schematically showing a method of estimating a touch position from the fingertip position in Embodiment 1.
FIG. 9 is a flowchart of processing executed by the gesture recognition unit 409 in Embodiment 2.
FIG. 10 is a diagram schematically showing a method of estimating a touch position from the angle of the finger with respect to the plane in Embodiment 2.
Further drawings include a flowchart of processing executed by the gesture recognition unit 409 in Embodiment 3, a diagram schematically showing a method of estimating a touch position from RGB image information and plane angle information, a diagram schematically showing a method of estimating a touch position in Embodiment 4, and a flowchart executed by the gesture recognition unit 409 in Embodiment 4.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings.
[Embodiment 1]
FIG. 1 is a diagram illustrating a network configuration including a camera scanner 101 according to an embodiment. As shown in FIG. 1, a camera scanner 101 is connected to a host computer 102 and a printer 103 via a network 104 such as Ethernet (registered trademark). In the network configuration of FIG. 1, a scan function for reading an image from the camera scanner 101 and a print function for outputting scan data by the printer 103 can be executed by an instruction from the host computer 102. Further, it is possible to execute a scan function and a print function by direct instructions to the camera scanner 101 without using the host computer 102.

<Configuration of camera scanner>
FIG. 2 is a diagram illustrating a configuration example of the camera scanner 101 according to the embodiment. As shown in FIG. 2A, the camera scanner 101 includes a controller unit 201, a camera unit 202, an arm unit 203, a projector 207, and a distance image sensor unit 208. The controller unit 201, which is the main body of the camera scanner, the camera unit 202 for imaging, the projector 207, and the distance image sensor unit 208 are connected by the arm unit 203, which can be bent and extended at a joint. FIG. 2A also shows the document table 204 on which the camera scanner 101 is installed. The lenses of the camera unit 202 and the distance image sensor unit 208 are directed toward the document table 204 and can read an image within the reading region 205 surrounded by a broken line. In the example of FIG. 2A, a document 206 is placed in the reading region 205 and can be read by the camera scanner 101. The camera unit 202 may capture images at a single resolution, but it is preferable that it can capture both high-resolution and low-resolution images. A turntable 209 may be provided on the document table 204. The turntable 209 can be rotated by an instruction from the controller unit 201, which changes the angle between an object placed on the turntable 209 and the camera unit 202. Although not shown in FIG. 2, the camera scanner 101 may further include an LCD touch panel 330 and a speaker 340, as well as various sensor devices for collecting surrounding environmental information, such as a human presence sensor, an illuminance sensor, and an acceleration sensor. A distance image is image data in which a distance from the distance image sensor unit 208 is associated with each pixel.

  FIG. 2B shows the coordinate systems of the camera scanner 101. In the camera scanner 101, a coordinate system is defined for each hardware device: a camera coordinate system, a distance image coordinate system, and a projector coordinate system. In each of these, the image plane captured by the camera unit 202 or the distance image sensor unit 208, or the image plane projected by the projector 207, is defined as the XY plane, and the direction orthogonal to that image plane is defined as the Z direction. Further, so that the three-dimensional data of these independent coordinate systems can be handled in a unified manner, an orthogonal coordinate system is defined in which the plane including the document table 204 is the XY plane and the upward direction perpendicular to the XY plane is the Z axis. The XY plane can therefore also be called the bottom surface.

As an example of coordinate transformation, FIG. 2C shows the relationship among the orthogonal coordinate system, a space expressed in the camera coordinate system centered on the camera unit 202, and the image plane captured by the camera unit 202. A three-dimensional point P[X, Y, Z] in the orthogonal coordinate system can be converted to the three-dimensional point Pc[Xc, Yc, Zc] in the camera coordinate system by equation (1):
[X_c, Y_c, Z_c]^T = [R_c | t_c] [X, Y, Z, 1]^T   ... (1)
Here, R_c and t_c are the external parameters given by the orientation (rotation) and position (translation) of the camera with respect to the orthogonal coordinate system; R_c is a 3 × 3 rotation matrix and t_c is a translation vector. Conversely, a three-dimensional point defined in the camera coordinate system can be converted to the orthogonal coordinate system using equation (2):
[X, Y, Z]^T = [R_c^{-1} | -R_c^{-1} t_c] [X_c, Y_c, Z_c, 1]^T   ... (2)
The two-dimensional camera image plane captured by the camera unit 202 is the result of the camera unit 202 converting three-dimensional information in the three-dimensional space into two-dimensional information. That is, a three-dimensional point Pc[Xc, Yc, Zc] in the camera coordinate system can be converted by perspective projection into the two-dimensional coordinates pc[xp, yp] on the camera image plane according to equation (3):
λ [x_p, y_p, 1]^T = A [X_c, Y_c, Z_c]^T   ... (3)
Here, A is a 3 × 3 matrix called the internal parameters of the camera, expressed by the focal length and the image center.

  As described above, by using equations (1) and (3), a three-dimensional point group expressed in the orthogonal coordinate system can be converted into point-group coordinates in the camera coordinate system or into coordinates on the camera image plane. It is assumed that the internal parameters of each hardware device and its position and orientation (external parameters) with respect to the orthogonal coordinate system have been calibrated in advance by a known calibration method. Hereinafter, unless otherwise noted, the term "three-dimensional point group" denotes three-dimensional data in the orthogonal coordinate system.
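  The transformations of equations (1) to (3) can be written as a few lines of matrix arithmetic. The following sketch is a minimal illustration in Python/NumPy; the rotation, translation, and intrinsic values are placeholder numbers standing in for the calibrated parameters assumed above, not values taken from the embodiment.

```python
import numpy as np

# Placeholder extrinsics: rotation R_c (3x3) and translation t_c (3x1) of the
# camera with respect to the orthogonal coordinate system (example values only).
R_c = np.eye(3)
t_c = np.array([[0.0], [0.0], [500.0]])   # e.g. camera 500 mm above the origin

# Placeholder intrinsics A: focal lengths and image center in pixels.
A = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])

def to_camera(P):
    """Equation (1): orthogonal coordinates -> camera coordinates."""
    P = np.asarray(P, dtype=float).reshape(3, 1)
    return R_c @ P + t_c

def to_orthogonal(Pc):
    """Equation (2): camera coordinates -> orthogonal coordinates."""
    Pc = np.asarray(Pc, dtype=float).reshape(3, 1)
    return R_c.T @ (Pc - t_c)          # R_c^-1 = R_c^T for a rotation matrix

def project(Pc):
    """Equation (3): camera coordinates -> pixel coordinates on the image plane."""
    p = A @ np.asarray(Pc, dtype=float).reshape(3, 1)
    return (p[:2] / p[2]).ravel()      # divide by lambda (the third component)

P = [100.0, 50.0, 0.0]                 # a point on the document table (Z = 0)
print(project(to_camera(P)))
```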

<Hardware configuration of camera scanner controller>
FIG. 3 is a diagram illustrating a hardware configuration example of the controller unit 201, which is the main body of the camera scanner 101. As shown in FIG. 3, the controller unit 201 includes a CPU 302, a RAM 303, a ROM 304, an HDD 305, a network I/F 306, an image processor 307, a camera I/F 308, a display controller 309, a serial I/F 310, an audio controller 311, and a USB controller 312, all connected to a system bus 301.

  The CPU 302 is a central processing unit that controls the operation of the entire controller unit 201. The RAM 303 is a volatile memory. A ROM 304 is a non-volatile memory, and stores a startup program for the CPU 302. The HDD 305 is a hard disk drive (HDD) having a larger capacity than the RAM 303. The HDD 305 stores a control program for the camera scanner 101 executed by the controller unit 201.

  The CPU 302 executes the startup program stored in the ROM 304 when the power is turned on. The startup program reads the control program stored in the HDD 305 and loads it into the RAM 303. After executing the startup program, the CPU 302 executes the control program loaded into the RAM 303 to perform control. The CPU 302 also keeps data used by the control program in the RAM 303 and reads and writes it there. In addition, various settings required for operation of the control program and image data generated from camera input can be stored on the HDD 305 and are read and written by the CPU 302. The CPU 302 communicates with other devices on the network 104 via the network I/F 306.

  The image processor 307 reads and processes the image data stored in the RAM 303 and writes it back to the RAM 303. Note that image processing executed by the image processor 307 includes rotation, scaling, color conversion, and the like.

  The camera I/F 308 is connected to the camera unit 202 and the distance image sensor unit 208; in accordance with instructions from the CPU 302, it acquires image data from the camera unit 202 and distance image data from the distance image sensor unit 208 and writes them into the RAM 303. It also transmits control commands from the CPU 302 to the camera unit 202 and the distance image sensor unit 208 to configure them. The distance image sensor unit 208 includes an infrared pattern projection unit 361, an infrared camera 362, and an RGB camera 363; these will be described later.

  The controller unit 201 can further include at least one of a display controller 309, a serial I/F 310, an audio controller 311, and a USB controller 312.

  A display controller 309 controls display of image data on the display in accordance with an instruction from the CPU 302. Here, the display controller 309 is connected to the short focus projector 207 and the LCD touch panel 330.

  The serial I/F 310 inputs and outputs serial signals. Here, the serial I/F 310 is connected to the turntable 209 and transmits instructions from the CPU 302 for starting and ending rotation, as well as the rotation angle, to the turntable 209. The serial I/F 310 is also connected to the LCD touch panel 330; when the LCD touch panel 330 is pressed, the CPU 302 acquires the pressed coordinates via the serial I/F 310.

  The audio controller 311 is connected to the speaker 340, converts audio data into an analog audio signal in accordance with an instruction from the CPU 302, and outputs audio through the speaker 340.

  The USB controller 312 controls an external USB device in accordance with an instruction from the CPU 302. Here, the USB controller 312 is connected to an external memory 350 such as a USB memory or an SD card, and reads / writes data from / to the external memory 350.

<Functional structure of camera scanner control program>
FIG. 4 is a diagram illustrating the functional configuration 401 of the control program for the camera scanner 101 executed by the CPU 302. As described above, the control program for the camera scanner 101 is stored in the HDD 305, and at startup the CPU 302 loads it into the RAM 303 and executes it. The main control unit 402 is the center of control and controls the other modules in the functional configuration 401. The image acquisition unit 416 is a module that performs image input processing and includes a camera image acquisition unit 407 and a distance image acquisition unit 408. The camera image acquisition unit 407 acquires image data output from the camera unit 202 via the camera I/F 308 and stores it in the RAM 303. The distance image acquisition unit 408 acquires distance image data output from the distance image sensor unit 208 via the camera I/F 308 and stores it in the RAM 303. Details of the processing of the distance image acquisition unit 408 will be described later with reference to FIG. 5.

  The gesture recognition unit 409 continuously acquires the image on the document table 204 from the image acquisition unit 416 and notifies the main control unit 402 when it detects a gesture such as a touch. Details of this processing will be described later using the flowchart of FIG. 6A. The image processing unit 411 uses the image processor 307 to analyze the images acquired from the camera unit 202 and the distance image sensor unit 208. The gesture recognition unit 409 described above also runs using the functions of the image processing unit 411.

  The user interface unit 403 receives a request from the main control unit 402 and generates GUI components such as messages and buttons, then requests the display unit 406 to display the generated GUI components. The display unit 406 displays the requested GUI components on the projector 207 or the LCD touch panel 330 via the display controller 309. Since the projector 207 is installed facing the document table 204, it can project GUI components onto the document table 204. In addition, the user interface unit 403 receives gesture operations such as touches recognized by the gesture recognition unit 409, input operations from the LCD touch panel 330 via the serial I/F 310, and their coordinates. The user interface unit 403 determines the operation content (such as which button was pressed) by associating the content of the operation screen being drawn with the operation coordinates, and notifies the main control unit 402 of this operation content, whereby the operator's operation is accepted.

  The network communication unit 404 communicates with other devices on the network 104 by TCP/IP via the network I/F 306. The data management unit 405 stores and manages, in a predetermined area of the HDD 305, various data such as work data generated during execution of the control program 401, for example scan data generated by a flat document image photographing unit 411, a book image photographing unit 412, and a three-dimensional shape measuring unit 413.

<Description of Distance Image Sensor and Distance Image Acquisition Unit>
FIG. 5 shows the configuration of the distance image sensor 208. The distance image sensor 208 is an infrared pattern projection type distance image sensor. The infrared pattern projection unit 361 projects a three-dimensional measurement pattern onto an object using infrared rays that are invisible to human eyes. The infrared camera 362 is a camera that reads a three-dimensional measurement pattern projected on an object. The RGB camera 363 is a camera that captures visible light visible to the human eye using RGB signals.

  The processing of the distance image acquisition unit 408 will be described with reference to the flowchart of FIG. 5A. FIGS. 5B to 5D are diagrams for explaining the principle of distance image measurement by the pattern projection method. When the distance image acquisition unit 408 starts processing, in step S501 it uses the infrared pattern projection unit 361 to project a three-dimensional shape measurement pattern 522 of infrared rays onto the object 521, as shown in FIG. 5B. In step S502, an RGB camera image 523 obtained by photographing the object with the RGB camera 363 and an infrared camera image 524 obtained by photographing, with the infrared camera 362, the three-dimensional measurement pattern 522 projected in step S501 are acquired. Since the infrared camera 362 and the RGB camera 363 are installed at different positions, the shooting areas of the two captured images, the RGB camera image 523 and the infrared camera image 524, differ as shown in FIG. 5C. Therefore, in step S503, the infrared camera image 524 is matched to the coordinate system of the RGB camera image 523 using a coordinate system conversion from the coordinate system of the infrared camera 362 to that of the RGB camera 363. It is assumed that the relative positions of the infrared camera 362 and the RGB camera 363 and their respective internal parameters are known from a prior calibration process.

  In step S504, as shown in FIG. 5D, corresponding points between the three-dimensional measurement pattern 522 and the infrared camera image 524 that was coordinate-transformed in step S503 are extracted. For example, a point on the infrared camera image 524 is searched for in the three-dimensional shape measurement pattern 522 and associated when the same point is detected. Alternatively, a pattern around a pixel of the infrared camera image 524 may be searched for in the three-dimensional shape measurement pattern 522 and associated with the portion having the highest similarity. In step S505, the distance from the infrared camera 362 is calculated by triangulation, using the straight line connecting the infrared pattern projection unit 361 and the infrared camera 362 as the base line 525. For each pixel that could be associated in step S504, the distance from the infrared camera 362 is calculated and stored as the pixel value; for each pixel that could not be associated, an invalid value is stored to indicate that the distance could not be measured. Performing this for all pixels of the infrared camera image 524 that was coordinate-transformed in step S503 generates a distance image in which each pixel has a distance value. In step S506, the RGB values of the RGB camera image 523, that is, the color information, are stored in each pixel of the distance image, thereby generating a distance image having four values, R, G, B, and distance, for each pixel. The distance image acquired here is based on the distance image sensor coordinate system defined by the RGB camera 363 of the distance image sensor 208. Therefore, in step S507, as described above with reference to FIG. 2B, the distance data obtained in the distance image sensor coordinate system is converted into a three-dimensional point group in the orthogonal coordinate system. (As noted above, unless otherwise specified, "three-dimensional point group" refers to a three-dimensional point group in the orthogonal coordinate system.) In this way, a three-dimensional point group indicating the shape of the measured object can be obtained.
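  As a concrete illustration of step S507, the following sketch back-projects a distance image into the sensor coordinate system by inverting the perspective projection of equation (3) and then applies equation (2) to obtain a three-dimensional point group in the orthogonal coordinate system. The intrinsic matrix and extrinsics used here are placeholders for the calibrated values, and the function name is illustrative.

```python
import numpy as np

def depth_to_point_cloud(depth, A, R, t):
    """Back-project a depth image into sensor coordinates, then transform to the
    orthogonal coordinate system (equation (2)). depth[v, u] is the distance (Z)
    in millimetres; invalid pixels are 0."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid].astype(float)
    # Inverse of equation (3): X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    x = (u[valid] - A[0, 2]) * z / A[0, 0]
    y = (v[valid] - A[1, 2]) * z / A[1, 1]
    pts_sensor = np.stack([x, y, z], axis=1)             # N x 3, sensor coordinates
    # Equation (2): orthogonal = R^-1 (Pc - t); for row vectors this is (p - t^T) R
    return (pts_sensor - t.ravel()) @ R

A = np.array([[570.0, 0.0, 320.0], [0.0, 570.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros((3, 1))                        # placeholder extrinsics
cloud = depth_to_point_cloud(np.full((480, 640), 800.0), A, R, t)
```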

  In this embodiment, as described above, the infrared pattern projection method is adopted as the distance image sensor 208, but a distance image sensor of another method may be used. For example, other measuring means such as a stereo system that performs stereo stereoscopic vision with two RGB cameras and a TOF (Time of Flight) system that measures distance by detecting the time of flight of laser light may be used.

<Description of gesture recognition unit>
Details of the processing of the gesture recognition unit 409 will be described with reference to the flowchart of FIG. 6A. In FIG. 6A, when the gesture recognition unit 409 starts processing, it performs initialization in step S601. In the initialization process, the gesture recognition unit 409 acquires one frame of the distance image from the distance image acquisition unit 408. Since no object is placed on the document table 204 when the gesture recognition unit starts, the plane of the document table 204 is recognized as the initial state. That is, the widest plane is extracted from the acquired distance image, and its position and normal vector (hereinafter referred to as the plane parameters of the document table 204) are calculated and stored in the RAM 303.
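  The plane parameters (a point on the plane and its unit normal) can be estimated from the initial frame by a least-squares fit to the points of the dominant plane. The sketch below is a simplified stand-in for the widest-plane extraction described above: it assumes the points belonging to the document-table plane have already been selected (for example by RANSAC) and only performs the fit.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit. points: N x 3 array of 3D points assumed to lie
    on the document table. Returns (centroid, unit normal) as plane parameters."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:                 # orient the normal to point upward (+Z)
        normal = -normal
    return centroid, normal

def height_above_plane(points, centroid, normal):
    """Signed distance of each point from the fitted plane."""
    return (points - centroid) @ normal
```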

  In step S602, the three-dimensional point group of the objects on the document table 204 is acquired as shown in steps S621 to S622. In step S621, one frame of the distance image and the corresponding three-dimensional point group are acquired from the distance image acquisition unit 408. In step S622, using the plane parameters of the document table 204, the points on the plane including the document table 204 are removed from the acquired three-dimensional point group.

  In step S603, the shape of the user's hand and the fingertip are detected from the acquired three-dimensional point group, as shown in steps S631 to S634. This is explained here using FIGS. 6B (b) to (e), which schematically illustrate the fingertip detection process. In step S631, a three-dimensional point group of the hand is obtained by extracting, from the three-dimensional point group acquired in step S602, the points of skin color (hand color) that are at a predetermined height (distance) or more above the plane including the document table 204. The three-dimensional point group 661 in FIG. 6B (b) represents the extracted three-dimensional point group of the hand, that is, the hand region. The skin color here does not denote one specific color but is a general term covering various skin colors. The skin color may be determined in advance or may be selected according to the operator.

  Alternatively, the hand region may be found by taking the background difference of the distance image, without using the skin color. The hand region found in this way can be converted into a three-dimensional point group by the method described above.
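  A minimal sketch of the hand-region extraction of step S631, under the assumption that the point cloud is stored per pixel and aligned with an RGB image of the same resolution: points more than a threshold above the document-table plane are kept and, optionally, intersected with a skin-color mask in HSV space. The skin-color bounds and the 5 mm threshold are illustrative values only.

```python
import numpy as np
import cv2

def extract_hand_points(cloud, rgb, centroid, normal, height_thresh=5.0):
    """cloud: H x W x 3 point cloud in the orthogonal coordinate system (mm),
    rgb: H x W x 3 BGR image aligned with the cloud.
    Returns an H x W boolean mask of the hand region."""
    height = (cloud - centroid) @ normal          # per-pixel height above the plane
    above_plane = height > height_thresh          # e.g. more than 5 mm above the table

    hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV)
    # Illustrative skin-color range; in practice it may be preset or chosen per operator.
    skin = cv2.inRange(hsv, (0, 30, 60), (25, 180, 255)) > 0

    return above_plane & skin
```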

  In step S632, a two-dimensional image obtained by projecting the extracted three-dimensional point group of the hand onto the plane of the document table 204 is generated, and the outer shape (contour) of the hand is detected. The point group 662 in FIG. 6B (b) represents the three-dimensional point group projected onto the plane of the document table 204. The projection may be performed by projecting the coordinates of the point group using the plane parameters of the document table 204. Further, as shown in FIG. 6B (c), if only the xy coordinate values are extracted from the projected three-dimensional point group, it can be handled as a two-dimensional image 663 viewed from the z-axis direction. At this time, the correspondence between each point of the three-dimensional point group of the hand and the coordinates of the two-dimensional image projected onto the plane of the document table 204 is stored.

  In step S633, the fingertip is detected. There are several ways to find the fingertip. First, a method using the curvature of the hand's outer shape (that is, its contour) will be described.

  For each point on the detected outer shape of the hand, the curvature of the outer shape at that point is calculated, and a point whose calculated curvature is greater than a predetermined value is detected as a fingertip. The calculation of the curvature is as follows. The contour point 664 in FIG. 6B (e) represents some of the points constituting the outer shape of the two-dimensional image 663 projected onto the plane of the document table 204. The curvature of the outer shape of the hand is calculated by fitting a circle, using the least squares method, to a finite number of adjacent contour points such as the contour point 664. This is performed for all contour points, and when the curvature is larger than a predetermined value and the center of the fitted circle is inside the contour of the hand, the middle point of the adjacent finite number of contour points is determined to be a fingertip. As described above, the RAM 303 stores the correspondence between the contour points of the outer shape of the hand and the three-dimensional point group, so the gesture recognition unit 409 can use the three-dimensional information of the fingertip point. Whether the center of the circle is inside or outside the outline of the hand can be determined, for example, by finding the contour points on a line that passes through the center of the circle and is parallel to a coordinate axis, and examining the positional relationship between those contour points and the circle center: counting the contour points and the circle center in order from the end of the line, if the circle center comes at an odd-numbered position it is outside the outer shape of the hand, and if it comes at an even-numbered position it is inside.

  Circles 669 and 670 in FIG. 6B (e) represent examples of fitted circles. The circle 669 is not detected as a fingertip because its curvature is smaller than the predetermined value and its center is outside the outer shape, whereas the circle 670, whose curvature is larger than the predetermined value and whose center is inside the outer shape, is detected as a fingertip.

  Here, the method of finding the fingertip by calculating the curvature with least-squares circle fitting has been described, but the fingertip may also be found as the point at which the radius of the circle enclosing a finite number of adjacent contour points is minimized. An example is described next.

  FIG. 6B (d) schematically shows a method of detecting a fingertip from a circle enclosing a finite number of contour points. As an example, consider drawing a circle so as to include five adjacent contour points; circles 665 and 667 are examples. Such a circle is drawn in order for all the contour points of the outer shape, and when its diameter (for example, 666 or 668) is smaller than a predetermined value, the middle (center) point of the five adjacent contour points is taken as the fingertip. Five adjacent points are used in this example, but the number is not limited to five. Although the above describes finding the fingertip by fitting a circle, the fingertip may also be found by ellipse fitting. A method of finding the fingertip by ellipse fitting is described in Non-Patent Document 1 and may be used.

  The circle fitting and the ellipse fitting described above can easily be realized using an open-source computer vision library such as OpenCV.
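  As a sketch of the contour-based fingertip search described above (OpenCV 4.x API assumed), the code below slides a window of adjacent contour points along the hand outline, computes the minimum enclosing circle of each window with cv2.minEnclosingCircle, and reports windows whose circle diameter falls below a threshold as fingertip candidates; cv2.fitEllipse could be used instead for the ellipse-fitting variant. The stride, window size, and diameter threshold are illustrative.

```python
import numpy as np
import cv2

def find_fingertips(hand_mask, stride=8, window=5, max_diameter=30.0):
    """hand_mask: 8-bit binary image of the hand region projected onto the plane.
    Returns a list of (x, y) fingertip candidates on the 2D image.
    stride subsamples the dense contour so that a window of adjacent points spans
    a meaningful arc; stride, window, and max_diameter are illustrative values."""
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return []
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2)[::stride]

    tips = []
    n = len(contour)
    for i in range(n):
        # Take 'window' adjacent (subsampled) contour points, wrapping around.
        idx = [(i + k) % n for k in range(-(window // 2), window // 2 + 1)]
        pts = contour[idx].astype(np.float32)
        (cx, cy), radius = cv2.minEnclosingCircle(pts)
        # At a fingertip the contour folds back on itself, so the points stay close
        # together and the enclosing circle is small; on a straight stretch it is large.
        if 2.0 * radius < max_diameter:
            tips.append((int(contour[i][0]), int(contour[i][1])))
    return tips
```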

  Alternatively, the point farthest from the arm may be found as the fingertip. FIG. 7B shows a state in which the arm 704 is in the reading area 205. This can be regarded as the projection of the three-dimensional point group of the hand region onto the plane of the document table 204 described above. The number of pixels of this projection image is the same as that of the distance image obtained by the distance image sensor 208. The region 703 is the region within a predetermined number of pixels from the outer frame of the projection image. The area 705 is obtained by ANDing the thin area between the reading area 205 and the region 703 with the region of the arm 704. From the area 705, the points 709 and 710 at which the arm 704 has entered the reading area 205 can be found. For this processing, the distance image acquired by the distance image sensor 208 may be processed directly. In that case, the region of the arm 704 is obtained by taking the difference between the background image of the distance image stored in the RAM 303 and the current distance image and binarizing it with a predetermined threshold value.

The line segment 706 in FIG. 7E is the line segment connecting the points 709 and 710. Its midpoint is taken as 711, and this point is set as the base point of the arm. The pixel in the outer shape of the arm farthest from the arm base point 711 is then set as the fingertip point 712. Here, the midpoint of the positions where the arm enters the reading area is used to obtain the base point of the arm; however, the base and the fingertip may also be obtained by thinning the arm 704 itself. Thinning can be performed using a common image-processing thinning algorithm. Of the thinned arm, the end that intersects the area 705 may be detected as the base of the arm and the opposite end as the fingertip.
In step S633, the fingertip can be detected by the above method.

  In step S634, the number of detected fingertips and the coordinates of each fingertip are calculated. As described above, the correspondence between each point of the two-dimensional image projected onto the document table 204 and each point of the three-dimensional point group of the hand is stored, so the three-dimensional coordinates of each fingertip can be obtained. Although a method of detecting the fingertip from an image projected from the three-dimensional point group onto a two-dimensional image has been described here, the image used for fingertip detection is not limited to this. For example, the hand region may be extracted from the background difference of the distance image or from the skin-color region of the RGB image, and the fingertip in that hand region may be detected by the same method as described above (such as the calculation of the curvature of the outer shape). In this case, the coordinates of the detected fingertip are coordinates on a two-dimensional image such as the RGB image or the distance image, so they need to be converted into three-dimensional coordinates in the orthogonal coordinate system using the distance information of the distance image at those coordinates.

  In step S606, touch gesture determination processing is performed. The gesture recognition unit 409 calculates the distance between the fingertip detected in the immediately preceding step and the plane including the document table 204, using the detected three-dimensional coordinates of the fingertip and the plane parameters of the document table 204 described above. When the distance is equal to or smaller than a predetermined small value, it determines "touch gesture present"; when the distance is larger than that value, it determines "touch gesture absent".

  Alternatively, a virtual threshold plane (not shown) may be provided at a predetermined height (in the Z direction) of the orthogonal coordinate system, and a touch may be detected when the Z value of the fingertip coordinates becomes smaller than the Z value of the threshold plane.
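  Touch determination in step S606 reduces to a point-to-plane distance test against the stored plane parameters. A minimal sketch, with the threshold values as illustrative assumptions rather than values prescribed by the embodiment:

```python
import numpy as np

def is_touching(fingertip_xyz, plane_point, plane_normal, touch_thresh=5.0):
    """Return True if the fingertip's distance to the document-table plane is at
    most touch_thresh (mm), i.e. 'touch gesture present'."""
    distance = abs((np.asarray(fingertip_xyz) - plane_point) @ plane_normal)
    return distance <= touch_thresh

# Alternative mentioned in the text: compare the fingertip's Z coordinate against
# a virtual threshold plane at a fixed height in the orthogonal coordinate system.
def is_touching_z(fingertip_xyz, z_threshold=5.0):
    return fingertip_xyz[2] < z_threshold
```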

  In step S607, if “touch gesture is present” in the previous step, the process proceeds to step S608, and if “touch gesture is not present”, the process returns to step S602.

  In step S608, fingertip direction identification processing is performed. The fingertip direction is the direction of the arrow 702 in FIG. 7A, that is, the direction in which the finger of the hand 701 points within the plane of the document table 204. To specify the fingertip direction, the finger portion is specified. For that purpose, the portion where the arm has entered the reading area 205 is identified first. As described above, the points 709 and 710 in FIG. 7B can be found as the points where the arm 704 has entered the reading area 205.

  Next, the finger portion is specified. The line segment 706 in FIG. 7C is the line segment connecting the points 709 and 710. In parallel with the line segment 706, a group of line segments 707 is drawn across the region of the arm 704 (also referred to as the arm region 704) at predetermined small intervals, and the portion where the length of these line segments is shorter than a predetermined threshold is identified as the finger. In FIG. 7C, the length falls below the predetermined threshold from the position of the line segment 708.

  Next, the fingertip direction is specified. A vector 709 is defined from the coordinates of the midpoint of the line segment 708 toward the fingertip coordinates on the xy plane found in step S633. The direction of the vector 709 is the direction of the fingertip, and its length represents the length of the finger. For example, the vector 709 can be specified as the vector whose start point is the midpoint of the line segment 708 and whose end point is the fingertip position specified in step S634. Alternatively, when the fingertip coordinates are obtained by the method described with reference to FIG. 7E, the vector 713 connecting the arm base point 711 to the fingertip point 712 may be used as the finger direction vector. In this case the vector 709 need not be obtained, but the finger length must still be obtained in some way. For example, among the line segment group 707 whose lengths are shorter than the predetermined threshold described above (that is, the upper limit of the finger width), the line segment closest to the arm base 711, or its extension, intersects the vector 713; that intersection point is found as the base of the finger, and the distance from that point to the fingertip point 712 can be determined as the length of the finger. Of course, the vector 709 can also be obtained by the method described above and the finger length determined based on it.

  Further, as shown in FIG. 7F, the vector connecting the center point 714 of the palm (or back of the hand) to the fingertip point 715 may be defined as the finger direction vector 716. In this case, the palm (back-of-hand) center point 714 can be obtained as the point inside the hand region whose distance to the pixels constituting the contour 717 of the hand region is maximal.
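  The palm (back-of-hand) center point described above, the interior point whose distance to the hand contour is maximal, can be computed with a distance transform; this is one standard way to realize it, not a method the embodiment prescribes. A sketch assuming the hand region is available as an 8-bit binary mask:

```python
import cv2

def palm_center(hand_mask):
    """hand_mask: 8-bit binary image (255 inside the hand region).
    Returns (x, y) of the interior point farthest from the hand contour."""
    # Distance of every interior pixel to the nearest zero (background) pixel.
    dist = cv2.distanceTransform(hand_mask, cv2.DIST_L2, 5)
    _, _, _, max_loc = cv2.minMaxLoc(dist)
    return max_loc          # the maximum of the distance transform = palm center
```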

  Furthermore, when ellipse fitting is performed on the fingertip, the direction connecting the two focal points of the ellipse may be used as the finger direction vector. In that case, the orientation of the vector may be determined so that it points away from the midpoint of the arm entry points obtained by the above method. In this case as well, the length of the finger needs to be obtained by the method described above.

  Although the above processing has been described using the pointing posture as an example, even when all five fingers are extended, performing the above processing for each of the line segments 708 obtained for each finger makes it possible to determine the direction and length of every finger.

  When step S608 ends, the process proceeds to step S609, in which touch position determination processing is performed. This is a process for estimating the position of the belly of the finger that the user actually feels is touching. The two-dimensional point group 801 in FIG. 8A represents the image of the hand region on the xy plane projected onto the document table 204; an enlarged view of the portion 802 is shown as 803. For the finger 804, the vector 805 is the fingertip direction vector 709 obtained in step S608. The xy coordinates of the point obtained by shifting the fingertip point 806 on the xy plane by a predetermined amount in the direction opposite to the vector 805 (that is, by the predetermined distance 807) are determined as the touch point 808 and stored in a predetermined area of the RAM 303. The predetermined shift distance is settable. The z coordinate of the touch point may be set to 0, or may be determined from the corresponding point of the three-dimensional point group. Note that the fingertip point 806 may be the fingertip position specified in step S634.
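  In its basic form, step S609 is a single vector operation: shift the fingertip point on the xy plane by a configurable distance opposite to the fingertip direction vector. A sketch (the 5 mm default is an illustrative value, not one specified by the embodiment):

```python
import numpy as np

def touch_position(fingertip_xy, finger_dir_xy, shift_mm=5.0):
    """fingertip_xy: fingertip point on the xy plane (document-table coordinates).
    finger_dir_xy: 2D fingertip direction vector (base of finger -> fingertip).
    Returns the estimated touch point, shifted opposite to the finger direction."""
    d = np.asarray(finger_dir_xy, dtype=float)
    unit = d / np.linalg.norm(d)
    return np.asarray(fingertip_xy, dtype=float) - shift_mm * unit
```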

  Further, the method of determining the touch position (the belly of the finger) is not limited to shifting by a predetermined distance as described above. For example, as shown in FIG. 8B, the center 810 of the circle 809 used for circle fitting when the fingertip was found may be determined as the touch position.

  Further, as shown in FIG. 8C, of the focal points (812, 813) of the ellipse 811 fitted to the fingertip, the point 812 on the fingertip side may be determined as the touch position. To determine which focal point is on the fingertip side, the one farther from the base of the arm may be adopted.

  Furthermore, the barycenter of the pixels constituting the outer shape of the fingertip may be determined as the touch position. FIG. 8D schematically shows the relationship between the pixels constituting the outer shape of the fingertip and the barycentric point. The pixel group 814 forming the outer shape of the fingertip represents a plurality of adjacent pixels among the contour-point pixels of the outer shape of the arm used in the fingertip discovery described above. Suppose the pixel group 814 consists of nine pixels when the fingertip is found and the middle pixel 806 is found as the fingertip. The barycenter of the pixel group 814 including the fingertip point 806 is denoted 815, and this barycentric point 815 may be determined as the touch position.

  Further, as shown in FIG. 8I, the center of gravity 826 of the finger pixels included in a predetermined peripheral region 825 of the fingertip point 806 may be determined as the touch position. The predetermined peripheral region is not limited to a circle such as the one shown in FIG. 8I. The vector connecting the center of gravity 826 to the fingertip point 806 may also be used as the fingertip direction vector.

  Further, polygon approximation may be performed on the pixels constituting the outer shape of the fingertip, and the center of gravity of the resulting polygon may be determined as the touch position. FIG. 8E schematically shows the polygon approximation applied to the outer shape of the fingertip. The pentagon 816 represents the polygon approximating the outer shape of the fingertip, and its center of gravity is the point 817, which may be determined as the touch position. Polygon approximation can be easily executed by using an open-source API such as OpenCV.
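  The polygon-approximation variant maps directly onto OpenCV calls, as the text notes: approximate the fingertip outline with cv2.approxPolyDP and take the centroid of the resulting polygon from its image moments. A sketch, assuming the fingertip outline is available as a contour fragment:

```python
import cv2

def fingertip_polygon_centroid(tip_contour, epsilon=2.0):
    """tip_contour: N x 1 x 2 array of contour points around the fingertip.
    Returns the centroid (x, y) of the approximating polygon as the touch position."""
    poly = cv2.approxPolyDP(tip_contour, epsilon, True)
    m = cv2.moments(poly)
    if m["m00"] == 0:                       # degenerate polygon: fall back to the mean
        pts = tip_contour.reshape(-1, 2)
        return (float(pts[:, 0].mean()), float(pts[:, 1].mean()))
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```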

  Furthermore, the touch position may be determined using the circle used for fitting at the time of fingertip discovery and the fingertip direction vector. FIG. 8F schematically shows a method for determining the touch position using the circle used for fitting at the time of fingertip discovery and the fingertip direction vector. A vector 818 represents a vector obtained by extending the fingertip direction vector. Of the intersection points of the vector 818 and the circle 809 fitted to the fingertip, a point 819 closer to the tip of the vector is obtained as a virtual fingertip. This virtual fingertip point is different from the fingertip point used when detecting a touch. A point obtained by shifting the virtual fingertip point 819 by a predetermined distance 807 in the direction opposite to the fingertip direction vector may be determined as the touch position 820.

  Similarly, the touch position may be determined using the ellipse fitted to the fingertip and the fingertip direction vector. FIG. 8G schematically shows a method for determining a touch position using an ellipse fitted to the fingertip and a fingertip direction vector. Of the intersections of the vector 818 obtained by extending the fingertip direction vector and the ellipse 811, a point 821 on the fingertip side is set as a virtual fingertip. A point 822 obtained by shifting the virtual fingertip 821 by a predetermined distance in the direction opposite to the fingertip direction vector may be determined as the fingertip point.

  The above processing can be performed using a two-dimensional image obtained by projecting a three-dimensional point group of a hand onto the plane of the document table 204 or a distance image acquired from the distance image sensor 208.

  In addition, the touch position may be determined using the RGB image. When the RGB image is used, the touch position may be determined by finding the nail. FIG. 8H is an enlarged view of the fingertip 805 and schematically shows how the touch position is determined from the nail region in the RGB image. The nail 823 represents the nail region found from the RGB image; the nail region can be found from the difference in luminance value with respect to the surrounding finger region. The center of gravity of the discovered nail region is then obtained and determined as the touch position. Since the RGB image and the distance image are aligned as described above, the center of gravity of the nail region can easily be converted to the corresponding position in the distance image or in the two-dimensional image obtained by projecting the three-dimensional point group of the hand onto the plane of the document table 204.

  If the method as described above is used, it is possible to estimate the touch position (the position of the belly of the finger) touching the plane.

  When step S609 ends, the process proceeds to step S605. In step S605, the determined touch gesture and the three-dimensional coordinates of the touch position are notified to the main control unit 402, and the process returns to step S602 to repeat the gesture recognition process.

  In the present embodiment, the gesture recognition with one finger has been described, but the present invention can also be applied to gesture recognition with a plurality of fingers or a plurality of hands. For example, if the touch position is periodically acquired by repeating the procedure of FIG. 6A, various gestures can be specified based on the presence or absence of a touch or a change in the touch position. The main control unit 402 is a part that executes an application. When the main control unit 402 receives the touch gesture, the main control unit 402 executes a corresponding process defined by the application.

  According to the present embodiment, the fingertip and the plane can be photographed from above by the distance image sensor, and the exact touch position on the plane can be specified using the distance image.

[Embodiment 2]
In the first embodiment, the basic part of the method of determining the touch position when the fingertip and the plane are photographed by a sensor from above was described: a fingertip is found from the distance image acquired by the distance image sensor, and the fingertip position is shifted by a predetermined distance in the direction opposite to the fingertip direction to determine the touch position coordinates. In the present embodiment, a method of correcting the touch position when the user wants to give a somewhat more precise touch instruction, and of specifying or estimating the corrected position as the touch position to improve operability, is described along the flowchart of FIG. 9 executed by the gesture recognition unit 409. FIG. 10A schematically illustrates a case in which correction of the touch position is necessary. The upper part of FIG. 10A is a side view of a state in which the finger 1001 is touching the plane 1003, which is a part of the document table 204. In this case, the three-dimensional point of the fingertip found in the same manner as in the first embodiment is the fingertip position 1005, and the touch position point determined by shifting the fingertip coordinates by the user-defined predetermined value 1007, by the method described in the first embodiment, is the touch position 1006. The lower part of FIG. 10A shows a case in which the angle of the finger 1002 with respect to the plane 1004 is larger than in the upper part. In this case, the touch position point obtained by the same method as in the first embodiment is the position 1008, whereas the point actually in contact with the plane is the position 1009. Thus, when the touch point is obtained simply by shifting by a predetermined fixed value, the obtained touch position point may deviate, depending on the angle of the fingertip with respect to the plane, from the point that is actually touching or that the user feels is touching. In the present embodiment, therefore, the angle of the fingertip is used to determine the amount by which the fingertip position is shifted to obtain the touch position point.

  Steps described as step S6xx in the flowchart of FIG. 9 have been described in the description of FIGS. 6A and 6B in the first embodiment. Here, the description will focus on the step described as step S9xx, which is the difference.

  After identifying the fingertip direction vector 709 in step S608, the gesture recognition unit 409 obtains, in step S901, the angle formed by the finger and the plane of the document table 204, using the fingertip direction vector 709 obtained in step S608. The fingertip direction vector 709 is a two-dimensional vector on the plane of the document table 204, that is, on the xy plane. Seen from the side, this vector is represented by the vectors 1010 and 1012 in FIG. 10B. The start and end points of the vectors 1010 and 1012 are associated with points in the three-dimensional point group of the hand described above; this association was already made when the three-dimensional point group was projected onto the plane in step S603. In the example of FIG. 10B, the start point of the vector 1010 can be associated with the three-dimensional point 1018 and its end point with the three-dimensional point 1005. For example, the intersection of the surface formed by the three-dimensional point group of the hand with a straight line that passes through each end point of the vector and is parallel to the z axis is taken as the corresponding end point of the three-dimensional vector. Since the three-dimensional point group of the hand forms the surface of the hand, there may be two intersection points for each straight line; either may be used as long as the intersection on the same side (that is, the one with the smaller or the larger z component) is adopted at both end points. In the example of FIG. 10, the intersection with the larger z component is used, but this is only an example. In this way, when the vector 1011 having the three-dimensional points 1018 and 1005 as its start and end points is obtained, it is the three-dimensional vector of the finger. The three-dimensional vector 1013 of the finger can be obtained in the same way. The angle 1020 formed by the vector 1010 and the vector 1011, and the angle 1022 formed by the vector 1012 and the vector 1013, are obtained as the angles formed by the finger and the plane.

  Next, in step S902, the amount by which the fingertip position is shifted to obtain the touch position is obtained. FIG. 10C schematically illustrates how the shift amount is determined using the angle of the finger with respect to the plane obtained in step S901. First, consider the upper part of FIG. 10C. The vector 1014 starts at the three-dimensional fingertip point 1005, points in the direction opposite to the three-dimensional vector of the finger (vector 1011), and has a predetermined length designated by the user. The point obtained by projecting the end point of the vector 1014 onto the xy plane 1003 along the z axis is the point 1016, which is the touch position to be obtained. The touch position 1017 in the lower part of FIG. 10C can be obtained in the same way. Thus, if a position shifted from the tip of the finger by a predetermined distance in the direction opposite to the three-dimensional fingertip direction vector is projected onto the xy plane (that is, the operation surface), the touch position moves back and forth according to the angle of the finger with respect to the plane, so a touch position that does not impair the user's feeling of touch can be provided.

The operation of obtaining the touch positions 1016 and 1017 takes place on the xy plane of the document table 204 and is nothing other than the operation of obtaining the vectors 1021 and 1023 starting from the fingertip points. As shown in FIG. 10D, let the vectors opposite in direction to the vectors 1010 and 1012 be the vector 1024 and the vector 1025, respectively. If the vectors 1014 and 1015 are denoted v, the vectors 1024 and 1025 are denoted w, and the vectors 1021 and 1023 to be obtained are denoted x, then x is the orthogonal projection of v onto w. With the angles 1020 and 1022 denoted θ, the vector v′ obtained by orthogonally projecting v onto w is expressed as
v′ = (|v||w|cos θ / |w|) · w/|w|   ... (4)
Since w/|w| in equation (4) is the unit vector in the same direction as w, the scalar |v||w|cos θ / |w| = |v|cos θ is the magnitude of the vector v′ to be obtained, that is, the shift amount by which the fingertip position is shifted toward the touch position in the xy plane. Because the vector w lies on the xy plane, the orthogonal projection v′ of v onto w can also be obtained simply by setting the z components of the start and end points of v to 0.

  In step S903, the gesture recognition unit 409 determines the end point of the vector v ′ obtained in step S902 starting from the fingertip position as the touch position. That is, the fingertip position is shifted by the shift amount obtained in step S902 along the two-dimensional vector in the fingertip direction in the xy plane, the coordinates are determined as the touch position, and stored in the RAM 303.
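  Steps S901 to S903 amount to a few vector operations: form the three-dimensional finger vector, take a vector of the user-specified length pointing back along it from the fingertip, and keep only its xy components, which is the orthogonal projection of equation (4). The sketch below illustrates this under the assumption that the fingertip and finger-base points are already available in the orthogonal coordinate system; the 10 mm length is a placeholder.

```python
import numpy as np

def touch_position_3d(fingertip_3d, finger_base_3d, shift_mm=10.0):
    """fingertip_3d, finger_base_3d: 3D points (orthogonal coordinate system) of the
    fingertip and the base of the finger. Returns the touch position on the xy plane,
    shifted along the finger direction by an amount that shrinks as the finger stands
    more upright (equation (4): the in-plane shift scales with cos(theta))."""
    v3 = np.asarray(fingertip_3d, float) - np.asarray(finger_base_3d, float)  # 3D finger vector
    v = -shift_mm * v3 / np.linalg.norm(v3)        # vector of length shift_mm, back along the finger
    shift_xy = v[:2]                               # orthogonal projection onto the xy plane
    touch = np.asarray(fingertip_3d, float).copy()
    touch[:2] += shift_xy                          # shift the fingertip within the xy plane
    touch[2] = 0.0                                 # project onto the operation surface
    return touch
```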

  By performing the above processing, the touch position can be changed according to the angle between the fingertip direction and the operation plane, and the touch position can be specified more accurately.

  As can also be seen from FIG. 10C, the correction amount 1023 when the finger 1002 stands more upright with respect to the plane (the lower part of FIG. 10C) is smaller than the correction amount 1021 when the finger 1001 lies closer to the plane (the upper part of FIG. 10C). On this assumption, the correction amount may be determined using the position touched by the user. When the user touches a position far from himself or herself, the finger tends to lie flatter than when a nearby position is touched. Therefore, the touch position may be determined by shifting from the fingertip by a large correction amount at a distant position and by a small correction amount at a nearby position. The distance from the user to the touch position can be measured by the distance from the base of the arm to the fingertip point described in the first embodiment.

  FIG. 10E schematically shows, as a graph, an example of the relationship between the distance from the user to the touch position and the correction amount. The horizontal axis represents the distance from the user, and the vertical axis represents the correction amount. Although a linear graph is drawn in FIG. 10E, the relationship is not limited to a linear one. With this processing as well, correction of the touch position corresponding to the angle between the finger and the plane can be performed easily.
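A minimal sketch of such a distance-dependent correction is given below; the near and far distances and the correction range are placeholder values chosen only to illustrate the linear mapping of FIG. 10E.

```python
def correction_amount(distance_mm, near_mm=150.0, far_mm=450.0,
                      min_corr_mm=3.0, max_corr_mm=10.0):
    """Linearly interpolate the correction amount between a near and a far touch distance."""
    t = (distance_mm - near_mm) / (far_mm - near_mm)
    t = min(max(t, 0.0), 1.0)                        # clamp outside the [near, far] range
    return min_corr_mm + t * (max_corr_mm - min_corr_mm)

print(correction_amount(150.0), correction_amount(300.0), correction_amount(450.0))
# 3.0 6.5 10.0 -- a larger shift is applied to touches farther from the user
```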

[Embodiment 3]
In the first and second embodiments, the basic method for determining the touch position when the fingertip and the plane are photographed by a sensor from above, and the method for determining the touch position according to the angle of the finger with respect to the plane, have been described. These methods hold when the distance image sensor 208 exhibits little noise.

  Here, the influence of noise of the distance image sensor 208 on the detection of a touch on the plane will be described. The upper diagram of FIG. 12A schematically shows, viewed from the side, a state in which the finger 1201 is touching the plane 1202, together with the plane distance information 1203 actually acquired by the distance image sensor. Since the positional relationship between the distance image sensor 208 and the document stage 204 is fixed, the plane distance information acquired by the distance image sensor 208 would ideally be constant. In practice, however, a certain amount of noise occurs, so the distance information of the plane of the document table 204 fluctuates in the time axis direction. When the distance information of the plane is acquired from the distance image sensor, it therefore contains noise and exhibits unevenness like the distance information 1203 in FIG. 12A. When the plane parameters described above are obtained, this unevenness is averaged. The unevenness changes for each frame of the distance image acquired by the distance image sensor 208 because of the fluctuation in the time axis direction. The plane of the document table 204, that is, the plane given by the plane parameters described above, is represented by the plane 1202 in FIG. 12A. In contrast, with current general-purpose distance image sensors, the distance information 1203 of the acquired distance image shows unevenness of about ±3 mm. Therefore, when a three-dimensional point group at or above a predetermined height is extracted as a fingertip in step S631 of FIG. 6A described above, the temporal fluctuation of the noise on the plane must not be erroneously detected as a fingertip. For this reason, a predetermined height 1205 of about 5 mm is required as a margin for absorbing the unevenness that appears in the distance image on what should be a flat surface. In FIG. 12A, the plane set at the predetermined height 1205 (about 5 mm) above the plane 1202 is represented by 1204. As described above, when the hand region is detected, the portion below the plane 1204 must be removed together with the plane; consequently, if the three-dimensional fingertip point 1206 is below the plane 1204, it is removed as well. In this case, the virtual fingertip point that can be detected as a fingertip among the remaining points is the point 1207 on the plane 1204. The lower diagram of FIG. 12A schematically shows the upper diagram viewed from above (the state on the xy plane). The fingertip point 1206 corresponds to the point 1212, and the virtual fingertip point 1207 corresponds to the point 1211. Of the finger 1209, the region to the left of the dotted line 1210 cannot be detected. In FIG. 12B, the portion surrounded by the dotted line 1213 is removed from the hand region, and only the part surrounded by the solid line is extracted as the hand region. In this case, the difference distance 1208 between the true three-dimensional fingertip point 1206 and the virtual fingertip point 1207 (1211) is about 5 mm to 10 mm.
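The effect of this noise margin can be reproduced in a few lines. In the sketch below, whose point coordinates and 5 mm margin are illustrative, the true fingertip point lying inside the margin is discarded together with the plane noise, leaving only a virtual fingertip higher up on the finger.

```python
import numpy as np

def points_above_margin(points, plane_z=0.0, margin_mm=5.0):
    """Keep only 3D points higher than the plane plus the noise margin."""
    points = np.asarray(points, dtype=float)
    return points[points[:, 2] > plane_z + margin_mm]

hand = np.array([[0.0, 0.0, 30.0],   # knuckle, well above the margin
                 [5.0, 0.0, 12.0],
                 [9.0, 0.0, 6.0],
                 [11.0, 0.0, 2.0]])  # true fingertip, inside the 5 mm margin
print(points_above_margin(hand))     # the 2 mm-high fingertip point is lost with the plane noise
```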

  In the methods of the first and second embodiments, the touch position is determined on the assumption that the fingertip position is acquired accurately. Therefore, when the distance image contains noise as described above, it is difficult to determine an accurate touch position. If the touch position is detected using the virtual fingertip point 1207, a deviation of about 5 mm to 10 mm from the actual touch position occurs, as described above. In the present embodiment, therefore, an accurate touch position is determined using an RGB image that is acquired at the same time and contains less noise than the distance image. This method will be described with reference to the flowchart of FIG. 11 executed by the gesture recognition unit 409. The portions written as step S6xx and step S9xx in FIG. 11 are the portions described with reference to FIG. 6A and FIG. 9.

  After specifying the fingertip direction vector 709 in step S608, in step S1101 the gesture recognition unit 409 acquires from the image acquisition unit 416 the color image, that is, the RGB image, captured by the RGB camera 363 of the distance image sensor 208.

  In step S1102, the gesture recognition unit 409 detects the fingertip from the acquired RGB image. To do so, the hand region must first be detected in the RGB image, as was done with the distance image. For this purpose, a difference image is obtained between a background image (an image of the document table 204 with nothing placed on it) saved in the RAM 303 in advance at activation and the RGB image acquired in step S1101. Alternatively, a skin color region is detected from the RGB image acquired in step S1101. Thereafter, by performing processing similar to steps S633 and S634 in FIG. 6A, the two-dimensional fingertip position on the xy plane can be found. FIG. 12C shows the finger of the RGB image superimposed on the finger 1209 obtained from the distance image on the xy plane. Here, the fingertip obtained using the distance image is represented by 1211. The portion 1214 surrounded by the dotted line is the difference area between the RGB image and the distance image of the finger. The point 1215 represents the fingertip found using the RGB image.
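A rough sketch of the background-difference approach is shown below. The threshold, the synthetic images, and the assumption that the hand enters from the left image edge are all illustrative; a skin-color mask could be substituted for the difference mask.

```python
import numpy as np

def hand_mask_from_rgb(rgb, background, diff_threshold=30):
    """Binary hand mask from the per-pixel difference between the current frame
    and a pre-stored image of the empty document table."""
    diff = np.abs(rgb.astype(np.int16) - background.astype(np.int16)).sum(axis=2)
    return diff > diff_threshold

def fingertip_2d(mask):
    """2D fingertip: the hand pixel farthest from the edge where the hand enters
    (assumed here to be the left edge, x = 0)."""
    ys, xs = np.nonzero(mask)
    i = np.argmax(xs)
    return int(xs[i]), int(ys[i])

background = np.zeros((4, 6, 3), dtype=np.uint8)
frame = background.copy()
frame[2, 0:5] = 200                        # a synthetic "finger" entering from the left
mask = hand_mask_from_rgb(frame, background)
print(fingertip_2d(mask))                  # (4, 2) -- the rightmost hand pixel
```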

  In step S1103, the gesture recognition unit 409 acquires the angle formed by the finger and the plane. This process is the same as the process of step S901 in FIG. 9. At this time, the fingertip point 1211 acquired using the distance image is used as the fingertip coordinates.

  In step S1104, the gesture recognition unit 409 estimates the true three-dimensional fingertip position, that is, the three-dimensional coordinates of the fingertip that was removed together with the noise as described above. The vector 1216 in FIG. 12D is the three-dimensional vector indicating the fingertip direction (also referred to as the finger vector) obtained in the immediately preceding step S1103; this three-dimensional finger vector has the virtual three-dimensional fingertip position 1207 as its tip. The dotted line 1219 is a side view of the plane 1219, which passes through the two-dimensional fingertip position obtained from the RGB image and is orthogonal to the orthogonal projection of the finger vector 1216 onto the plane 1202. The vector 1216 is extended on its end-point side, and the point 1220 at which it intersects the plane 1219 is estimated as the true three-dimensional fingertip position. Since the x and y components of the point 1220 coincide with those of the fingertip position obtained from the RGB image, the point 1220 can be specified by obtaining its z component from the z component of the point 1207 and the slope of the vector 1216. The vector 1218 represents this extension, and the vector obtained by adding the vector 1216 and the vector 1218 is used in the subsequent processing as the true three-dimensional finger vector. When step S1104 ends, the process proceeds to step S902. The processing from this point is the same as the processing described with reference to FIG. 9: a point moved back from the fingertip position 1220 by a predetermined distance in the direction opposite to the finger vector is projected onto the xy plane, and that point is determined as the touch position. At this time, the true three-dimensional finger vector is used as the three-dimensional vector of the finger.
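A simplified version of this estimation is sketched below with invented coordinates; the plane through the RGB fingertip is treated as the constraint that the extended finger vector must reach the x and y position of the RGB fingertip.

```python
import numpy as np

def true_fingertip(virtual_tip, finger_vec, rgb_tip_xy):
    """Extend the 3D finger vector past the virtual (noise-clipped) fingertip until
    its horizontal position reaches the 2D fingertip found in the RGB image."""
    virtual_tip = np.asarray(virtual_tip, dtype=float)
    d = np.asarray(finger_vec, dtype=float)
    d = d / np.linalg.norm(d[:2])            # scale so the xy part is a unit step
    t = np.dot(np.asarray(rgb_tip_xy, dtype=float) - virtual_tip[:2], d[:2])
    return virtual_tip + t * d               # the drop in z follows the finger's slope

print(true_fingertip([100.0, 50.0, 7.0], [0.9, 0.0, -0.3], [104.0, 50.0]))
# approximately [104, 50, 5.67] -- about 1.3 mm lower than the clipped virtual fingertip
```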

  By the above processing, even when the accuracy of the distance image sensor is not good, it is possible to estimate the three-dimensional fingertip position and determine the touch position.

[Embodiment 4]
In the third embodiment, a method was described for finding the three-dimensional fingertip position using an RGB image and determining the touch position when the distance image contains noise. In the present embodiment, a method is described in which the true three-dimensional fingertip position is found using only the distance image, without using an RGB image, and is used to determine the touch position.

  FIG. 14A schematically shows a state in which the finger 1401, just before touching the plane 1408, is lowered in the direction of the arrow 1404 and becomes the finger 1402 in the touched state. As described in the third embodiment, when the distance image contains noise, a plane threshold (flatness threshold) must be set at the predetermined height 1406. As a result, the tip portion 1405 of the finger 1402 in the touched state is removed together with the plane and the fingertip is missing, so the true three-dimensional fingertip position is difficult to find directly. However, since the finger 1401 just before touching is at a position higher than the predetermined height 1406, its fingertip is not lost. The finger length in this state is therefore stored and used to estimate the fingertip position after the touch.

  This method will be described in detail with reference to the flowchart of FIG. 13 executed by the gesture recognition unit 409. Of the steps in FIG. 13, those labeled S6xx, S9xx, and S11xx are the same as the steps described in the flowcharts of FIGS. 6A, 9, and 11, and detailed description of them is therefore omitted.

  After performing fingertip detection in step S603, in step S1301 the gesture recognition unit 409 checks whether a touch count, described later, is equal to or less than a predetermined value. Here, the touch count is a numerical value indicating how many times the plane has been touched since the processing of the gesture recognition unit 409 started; it is incremented and stored in the RAM 303 when a touch gesture is determined to have occurred in step S607. If the count is equal to or less than the predetermined value, the process proceeds to step S1302; otherwise, the process proceeds to step S606.

  In step S1302, the gesture recognition unit 409 checks whether the fingertip position is at or below a predetermined height, namely the height 1412 in FIG. 14. This height must be set larger than the height 1406 used to avoid noise. Since the height 1412 is used to determine that the finger has approached the plane 1407, it is set larger than the height 1406 but lower than the height of the finger during normal operation, for example to about twice the height 1406. When the height 1406 is set to about 5 mm, the height 1412 may be set to 10 to 20 mm. If the fingertip is at or below the predetermined height, the process proceeds to step S1303; if it is higher, the process proceeds to step S606.

  In step S1303, the gesture recognition unit 409 stores the finger length. Specifically, it obtains the three-dimensional finger vector 1411 by the same method as step S901 described in the second embodiment, and stores the length of this vector in a predetermined area of the RAM 303.

  The processes in steps S1301 to S1303 are executed until the touch count exceeds the predetermined number. Alternatively, the three-dimensional finger vector may be acquired that number of times and the average length used.

  When the direction of the fingertip touching the operation surface is specified in step S608, in step S1305 the gesture recognition unit 409 estimates the three-dimensional fingertip position from the angle formed by the finger and the plane. Specifically, the virtual three-dimensional finger vector 1414 obtained in step S1103 is extended, while keeping its start point fixed, to the finger length obtained in steps S1301 to S1303. The extended three-dimensional finger vector is the vector 1416, and its tip 1417 is the true three-dimensional fingertip point. Using this true three-dimensional fingertip point, the touch position can be determined in the subsequent steps as in the first and second embodiments.
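A sketch of this length-based extension follows; the base point, the clipped vector, and the stored 35 mm finger length are invented values.

```python
import numpy as np

def fingertip_from_stored_length(finger_base, clipped_vec, stored_length_mm):
    """Stretch the finger vector, shortened because its tip was removed by the noise
    margin, to the finger length stored before the touch; the new end point is the
    estimated true 3D fingertip."""
    base = np.asarray(finger_base, dtype=float)
    v = np.asarray(clipped_vec, dtype=float)
    return base + v * (stored_length_mm / np.linalg.norm(v))

print(fingertip_from_stored_length([80.0, 50.0, 24.0], [12.0, 0.0, -9.0], 35.0))
# [108. 50. 3.] -- the 15 mm clipped vector is stretched to the stored 35 mm length
```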

  The present embodiment assumes the case where the above-described plane threshold is constant over the plane and is equal to or greater than a predetermined value. Depending on the environment, however, the sensitivity of the sensor varies with the location on the plane, so the plane threshold (the height 1406 in FIG. 14) may be changed for each location. In that case, whether the true three-dimensional fingertip position is estimated can differ from location to location. For such a case, a threshold value for each location on the plane is stored in advance, where a location is specified by dividing the operation plane into areas. Then, as shown in step S1501 of the flowchart of FIG. 15, whether the threshold value of the plane at the touched position is equal to or smaller than a predetermined value may be determined, and the estimation performed accordingly. Similarly, when the true three-dimensional fingertip position is estimated from the RGB image, the processing may be switched according to the threshold value for each location on the plane.

  In the above processing, the finger length is stored when the fingertip falls below the predetermined height 1412. Alternatively, the finger length may be stored by having the user hold the fingertip over the distance image sensor at first activation.

  In the flowchart, the true three-dimensional fingertip position is estimated in step S1305 and the touch position is then determined in steps S902 and S609, but this order may be reversed. First, while the fingertip is not touching, the correction amount, that is, the position of the belly of the finger, is calculated using the same processing as in steps S902 and S609. In step S1303, in addition to the finger length, the length from the base of the finger to the belly of the finger is stored. After the touch gesture is detected, the same processing as in step S1305 may be performed using the stored length to estimate the accurate touch position.

  In the above processing, the method of estimating the accurate touch position using the angle and length of the finger has been described. Alternatively, the accurate touch position may be estimated by storing the trajectory of the fingertip. FIG. 14C is a diagram schematically showing how the fingertip position at the time of touching is estimated using the trajectory of the fingertip position. Positions 1421, 1422, and 1423 represent finger positions that are consecutive in time series immediately before the touch.

  The position 1424 represents the finger at the predicted touch position. Since the fingertip at this time is below the height 1406 for avoiding noise, the correct fingertip position cannot be found directly. The trajectory 1425 represents the trajectory of the fingertip, and the trajectory 1426 its predicted continuation. Here, a threshold is set at a predetermined position 1420 higher than the noise-avoidance height 1406. The trajectory of the fingertip is stored in the RAM 303 until the coordinate value in the height direction of FIG. 14C falls below the threshold 1420, and is used to predict the subsequent trajectory of the fingertip. The trajectory may be stored, for example, as a three-dimensional straight line connecting the current fingertip position and the previous fingertip position. In this case, taking the direction vector of the trajectory along that straight line, the point where the direction vector intersects the plane 1408 of the document table (or a virtual plane provided at a predetermined height above the plane of the document table) is the predicted fingertip point.
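The two-point straight-line prediction amounts to a simple line and plane intersection, as in the following sketch; the trajectory points and the plane height are assumed values.

```python
import numpy as np

def predict_fingertip_on_plane(prev_pt, curr_pt, plane_z=0.0):
    """Extrapolate the line through the previous and current fingertip points down to
    the plane z = plane_z and return the intersection as the predicted fingertip."""
    prev_pt = np.asarray(prev_pt, dtype=float)
    curr_pt = np.asarray(curr_pt, dtype=float)
    d = curr_pt - prev_pt                     # direction of the descending trajectory
    if d[2] == 0.0:
        raise ValueError("trajectory is parallel to the plane")
    t = (plane_z - curr_pt[2]) / d[2]
    return curr_pt + t * d

print(predict_fingertip_on_plane([90.0, 50.0, 22.0], [95.0, 50.0, 14.0]))
# [103.75  50.    0.  ] -- where the fingertip is expected to meet the document table
```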

  Alternatively, not only the current and previous two points but a predetermined number of the most recent fingertip positions may be stored in the RAM 303, and an approximate curve passing through them may be obtained three-dimensionally. In this case, the point at which the three-dimensional curve intersects the plane 1408 of the document table (or a virtual plane provided at a predetermined height above the plane of the document table) is the predicted fingertip point. The virtual plane provided at a predetermined height above the plane 1408 of the document table is not shown; it is a plane set above the actual document-table plane 1408 by the thickness of the finger, in consideration of that thickness. Once the fingertip position is estimated, the touch position (the position of the belly of the finger) can be obtained using the methods described so far.
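For the multi-point variant, one possible sketch fits low-degree polynomials to the stored trajectory and finds where the fitted curve reaches the plane; the sample trajectory and the quadratic degree are assumptions, not part of the embodiment.

```python
import numpy as np

def predict_from_curve(recent_pts, plane_z=0.0, degree=2):
    """Fit x(t), y(t), z(t) through the most recent fingertip positions (t = frame index)
    and return the point where the fitted curve first reaches z = plane_z after the
    last stored frame; returns None if the fitted curve never reaches the plane."""
    pts = np.asarray(recent_pts, dtype=float)
    t = np.arange(len(pts), dtype=float)
    cx, cy, cz = (np.polyfit(t, pts[:, i], degree) for i in range(3))
    cz = cz.copy()
    cz[-1] -= plane_z                          # roots of z(t) - plane_z = 0
    hits = sorted(r.real for r in np.roots(cz)
                  if abs(r.imag) < 1e-9 and r.real >= t[-1])
    if not hits:
        return None
    return np.array([np.polyval(cx, hits[0]), np.polyval(cy, hits[0]), plane_z])

trajectory = [[80.0, 50.0, 28.0], [85.0, 50.0, 20.2], [90.0, 50.0, 12.8], [95.0, 50.0, 5.8]]
print(predict_from_curve(trajectory))          # roughly [99.4, 50.0, 0.0]
```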

  In the method described above, the fingertip position at the time of touching is first estimated using the trajectory of the fingertip, and the position of the belly of the finger is then obtained; this order may also be reversed. That is, the position of the belly of the finger may be estimated every frame using the methods described so far, and the touch position may then be estimated from the trajectory of the finger-belly position.

  Further, the method of always storing the trajectory of the finger has been described above. However, to avoid consuming CPU performance, storage of the trajectory may be started only when the finger falls below a predetermined height. In FIG. 14C, for example, a threshold may be set at the height 1412, and storage of the finger trajectory may be started when the fingertip falls below that threshold.

Furthermore, as a simpler way of obtaining the trajectory of the finger, the fingertip position may be predicted from a straight line connecting two points at predetermined heights. For example, the coordinates of the fingertip at the moments it crosses, in order from the top, the threshold heights 1403 and 1420 in FIG. 14C are stored, and the straight line connecting these coordinates three-dimensionally is obtained. The point where this straight line intersects the plane 1408 of the document table may then be taken as the predicted fingertip point.
With the above processing, an accurate touch position can be estimated.

[Other Examples]
The present invention can also be realized by the following processing: software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or CPU, MPU, or the like) of the system or apparatus reads and executes the program.

DESCRIPTION OF SYMBOLS 101 Camera scanner, 201 Controller part, 202 Camera part, 204 Document stand, 207 Projector, 208 Distance image sensor part

Claims (24)

  1. A user interface device that identifies operations performed on the operation surface,
    Acquisition means for acquiring a three-dimensional image of a region of a three-dimensional space having the operation surface and the operation surface as a bottom surface;
    Extracting means for extracting a hand region from the three-dimensional image;
    First specifying means for specifying the position of a fingertip from the hand region;
    Detecting means for detecting a touch on the operation surface based on the position of the operation surface and the fingertip included in the three-dimensional image;
    A second specifying means for specifying an orientation of the fingertip from the hand region when a touch on the operation surface is detected;
    A user interface device, comprising: a determining unit configured to determine, as a touch position, a position obtained by shifting the position of the fingertip by a predetermined amount in a direction opposite to the direction of the fingertip on the operation surface.
  2. The second specifying means specifies the direction of the fingertip from the hand region projected on the operation surface,
    The determination means determines, as a touch position, a position obtained by shifting the position of the fingertip of the hand region projected onto the operation surface by a predetermined amount in a direction opposite to the direction of the fingertip on the operation surface. The user interface device according to 1.
  3. The second specifying means specifies the orientation of the fingertip in the three-dimensional image;
    The determination means projects, onto the operation surface, a position obtained by shifting the position of the fingertip by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position. The user interface device described above.
  4. Means for acquiring a color image of a region of a three-dimensional space having the operation surface and the operation surface as a bottom surface;
    A third specifying means for specifying a position of the fingertip on the operation surface based on a color of a hand area in the color image;
    The determining means sets, as a corrected fingertip position, the position obtained by extending the fingertip, in the direction of the fingertip of the three-dimensional image specified by the second specifying means, to the fingertip position on the operation surface specified by the third specifying means, projects onto the operation surface a position obtained by shifting the corrected fingertip position by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position. The user interface device according to claim 3.
  5.   The second specifying means determines the center of the palm or back of the hand, and specifies the direction of the fingertip as the direction from the center of the palm or back of the hand toward the fingertip. The user interface device according to any one of the above.
  6.   6. The user interface device according to claim 1, wherein the determination unit determines a center of gravity of a pixel of a finger included in a predetermined peripheral region of the position of the fingertip as a touch position. .
  7. A determination unit that determines that the fingertip is within a predetermined distance from the operation surface when a touch on the operation surface is not detected by the detection unit;
    Measuring means for measuring the length of the finger from the hand region when the fingertip is below the predetermined distance;
    The determining means sets, as a corrected fingertip position, the position of the fingertip shifted, in the direction of the fingertip of the three-dimensional image specified by the second specifying means, to the finger length measured by the measuring means, projects onto the operation surface a position obtained by shifting the corrected fingertip position by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position. The user interface device according to claim 3.
  8. The measuring means calculates a curvature of the outer shape from the outer shape of the hand region, specifies a point where the curvature is smaller than a predetermined value as a fingertip position,
    From the image obtained by projecting the three-dimensional image onto the operation surface, a location where the hand region has entered the operation surface is specified, and a position where the width of the hand region is smaller than a predetermined threshold is designated from the location. The user interface device according to claim 7, wherein the user interface device is specified as an original position of the finger, and a length from the original position of the finger to the position of the fingertip is measured as the length of the finger.
  9.   The user interface device according to any one of claims 4 to 8, wherein, if a flatness threshold value stored in advance for each region of the operation surface is greater than a predetermined value, the determination means determines the corrected fingertip position, projects onto the operation surface a position obtained by shifting the corrected fingertip position by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position, and otherwise projects onto the operation surface a position obtained by shifting the position of the fingertip specified by the first specifying means by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position.
  10.   The user interface device described above, wherein the acquisition means includes a distance sensor, and the three-dimensional image is acquired based on a distance image having, for each pixel, a distance measured by the distance sensor.
  11.   The user interface device according to claim 1, wherein the extraction unit extracts a region of a predetermined color from the three-dimensional image as the hand region.
  12.   The user interface device described above, wherein the first specifying means calculates a curvature from the outer shape of the hand region and specifies a point where the curvature is smaller than a predetermined value as the position of the fingertip.
  13.   The first specifying means calculates a curvature of the outer shape based on a circle or an ellipse that fits the outer shape of the hand region, and when the curvature is smaller than a predetermined value and is inside the hand region, The user interface device according to any one of claims 1 to 11, wherein a center point of a contour point that fits a circle or an ellipse is specified as the position of the fingertip.
  14.   When the radius of the smallest circle surrounding the adjacent finite number of contour points is smaller than a predetermined value among the contour points of the outline of the hand region, the first specifying unit is configured to determine the center of the finite number of contour points. The user interface device according to claim 1, wherein a point is specified as the position of the fingertip.
  15.   The first specifying means specifies a place where the hand area has entered the operation surface, and specifies the position of the hand area farthest from the place as the position of the fingertip. The user interface device according to any one of 1 to 11.
  16.   The first specifying means sets the touch position according to a distance from a position where the hand area has entered the operation surface to the position of the fingertip so that the predetermined amount increases as the distance increases. The user interface device according to claim 1, wherein the user interface device is specified.
  17.   The second specifying unit specifies a location where the hand region has entered the operation surface from an image obtained by projecting the three-dimensional image onto the operation surface, and the width of the hand region is a predetermined width from the location. The user interface device according to any one of claims 1 to 16, wherein a direction from a position smaller than a threshold value to a position of the fingertip is specified as a vector indicating a direction and length of the fingertip.
  18.   The second specifying means determines, from the hand region projected on the operation surface, a region where the width of the hand region is smaller than a predetermined value as a finger region, and from the end where the finger region is specified The user interface device according to any one of claims 1 to 16, wherein a direction toward the fingertip is specified as an orientation of the fingertip.
  19.   The second specifying means determines, from an image obtained by projecting the three-dimensional image on the operation surface, a location where the hand region has entered the operation surface as a root of the hand region, and the fingertip from the base The user interface device according to any one of claims 1 to 16, wherein a direction up to is specified as an orientation of the fingertip.
  20.   The user interface device according to any one of claims 1 to 19, wherein the determining means determines the touch position using, as the predetermined amount, the distance from the fingertip to the center of a circle fitted to the outer shape of the hand region, or to a point on the fingertip direction side of a focal point of an ellipse fitted to the outer shape of the hand region.
  21.   The determining means determines a touch position using a distance from the fingertip position to the center of gravity of a contour point surrounded by a minimum circle including the contour point specified as the fingertip position as a central point, as the predetermined amount. The user interface device according to claim 1, wherein the user interface device is a device.
  22. A user interface device that identifies operations performed on the operation surface,
    Obtaining means for obtaining a three-dimensional image of a region in a three-dimensional space having the operation surface as a bottom surface;
    Estimating means for estimating the position of the belly of the finger from the three-dimensional image;
    A user interface device comprising:
  23. A method of controlling a user interface device that identifies an operation performed on an operation surface,
    An acquisition step of acquiring a three-dimensional image of a region in a three-dimensional space having the operation surface as a bottom surface;
    And a estimating step of estimating a position of a finger belly from the three-dimensional image.
  24. Computer
    Obtaining means for obtaining a three-dimensional image of a region in a three-dimensional space having the operation surface as a bottom surface;
    A program for functioning as an estimation means for estimating the position of the belly of a finger from the three-dimensional image.
JP2015147083A 2014-08-25 2015-07-24 User interface device, method and program Pending JP2016139396A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2014170886 2014-08-25
JP2014170886 2014-08-25
JP2015010680 2015-01-22
JP2015010680 2015-01-22

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/818,770 US10310675B2 (en) 2014-08-25 2015-08-05 User interface apparatus and control method

Publications (2)

Publication Number Publication Date
JP2016139396A true JP2016139396A (en) 2016-08-04
JP2016139396A5 JP2016139396A5 (en) 2018-09-06

Family

ID=56560408

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2015147083A Pending JP2016139396A (en) 2014-08-25 2015-07-24 User interface device, method and program

Country Status (1)

Country Link
JP (1) JP2016139396A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017170027A1 (en) * 2016-03-30 2017-10-05 セイコーエプソン株式会社 Image recognition apparatus, image recognition method, and image recognition unit
JP2018055257A (en) * 2016-09-27 2018-04-05 キヤノン株式会社 Information processing device, control method thereof, and program

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0991079A (en) * 1995-09-28 1997-04-04 Toshiba Corp Information input device and image processing method
JP2009042796A (en) * 2005-11-25 2009-02-26 Panasonic Corp Gesture input device and method
US20080273755A1 (en) * 2007-05-04 2008-11-06 Gesturetek, Inc. Camera-based user input for compact devices
JP2010526391A (en) * 2007-05-04 2010-07-29 ジェスチャー テック,インコーポレイテッド Camera-based user input for compact devices
JP2009259117A (en) * 2008-04-18 2009-11-05 Panasonic Electric Works Co Ltd Mirror system
JP2010176565A (en) * 2009-01-30 2010-08-12 Denso Corp Operation device
US20130063336A1 (en) * 2011-09-08 2013-03-14 Honda Motor Co., Ltd. Vehicle user interface system
WO2014080829A1 (en) * 2012-11-22 2014-05-30 シャープ株式会社 Data input device
US20140204120A1 (en) * 2013-01-23 2014-07-24 Fujitsu Limited Image processing device and image processing method
JP2014143548A (en) * 2013-01-23 2014-08-07 Fujitsu Ltd Image processing apparatus, image processing method, and image processing program

Legal Events

A521 Written amendment (JAPANESE INTERMEDIATE CODE: A523), effective date: 20180724
A621 Written request for application examination (JAPANESE INTERMEDIATE CODE: A621), effective date: 20180724
A977 Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007), effective date: 20190318
A131 Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131), effective date: 20190322
A521 Written amendment (JAPANESE INTERMEDIATE CODE: A523), effective date: 20190521
A02 Decision of refusal (JAPANESE INTERMEDIATE CODE: A02), effective date: 20190802
A521 Written amendment (JAPANESE INTERMEDIATE CODE: A523), effective date: 20191101
A911 Transfer of reconsideration by examiner before appeal (zenchi) (JAPANESE INTERMEDIATE CODE: A911), effective date: 20191111