JP2016139396A - User interface device, method and program

Info

Publication number: JP2016139396A (other versions: JP2016139396A5)
Application number: JP2015147083A
Authority: JP (Japan)
Inventor: Soshi Oshima (宗士 大志万)
Original Assignee: Canon Inc. (キヤノン株式会社)
Legal status: Pending
Priority: JP2014170886, JP2015010680
Priority claimed from: US14/818,770 (US10310675B2)
Prior art keywords: fingertip, position, operation surface, user interface device
Other languages: Japanese (ja)

Abstract

PROBLEM TO BE SOLVED: To determine a touch position as intended by the user when a finger is read from above with a distance image sensor or the like and touch detection with respect to a plane is performed.
SOLUTION: A user interface device is configured to: acquire a three-dimensional image of an operation surface and the region above the operation surface; extract a hand region from the three-dimensional image; specify the position of a fingertip; detect a touch on the operation surface on the basis of the operation surface included in the three-dimensional image and the specified position of the fingertip; specify, when the touch is detected, the direction of the fingertip; and determine, as the touch position, a position obtained by shifting the position of the fingertip by a prescribed amount in the direction opposite to the specified direction of the fingertip.
SELECTED DRAWING: Figure 8

Description

  The present invention relates to a user interface device, a method, and a program for detecting the position of a hand or a fingertip from a distance and operating a display component displayed on a specific surface.

  In a user interface that uses a projector, a camera, and a distance sensor, the interface is projected by the projector and can therefore be displayed superimposed on a real object such as paper. The user can thus handle a real object as an interface to electronic data. In the user interface system disclosed in Patent Document 1, a computer screen is projected onto a desk by a projector, and the computer screen is operated with a fingertip. An infrared camera is used to detect the touch of a fingertip on the plane. In Patent Document 1, touch instructions with a finger or a pen are given using an object such as a table or a sheet of paper as the user interface. In such a case, when an operation of selecting a character about 5 mm square with a finger or drawing a line under the character is performed, the touch position must be determined accurately.

JP 2013-34168 A

T. Lee and T. Hollerer, "Handy AR: Markerless Inspection of Augmented Reality Objects Using Fingertip Tracking," In Proc. IEEE International Symposium on Wearable Computers (ISWC), Boston, MA, Oct. 2007

  However, Patent Document 1 does not take into account the angle formed between the finger and the plane when detecting a touch of the finger on the plane. If the angle of the fingertip is not taken into account, the positions of the plane and the fingertip cannot be acquired correctly, and the contact position between the finger and the operation surface cannot be recognized accurately. In that case, it becomes difficult to select a fine character or to draw a line under a character as described above.

  The present invention has been made in view of the above problems, and aims to provide a user interface device and method that, in a technique for performing touch detection by image analysis, improve the detection accuracy of the contact position and thereby improve user operability.

  In order to achieve the above object, according to one aspect, the present invention has the following configuration.

A user interface device that identifies an operation performed on an operation surface, comprising:
acquisition means for acquiring a three-dimensional image of a region of three-dimensional space having the operation surface as its bottom surface;
extraction means for extracting a hand region from the three-dimensional image;
first specifying means for specifying the position of a fingertip from the hand region;
detection means for detecting a touch on the operation surface based on the operation surface included in the three-dimensional image and the specified position of the fingertip;
second specifying means for specifying the orientation of the fingertip from the hand region when a touch on the operation surface is detected; and
determination means for determining, as the touch position, a position obtained by shifting the position of the fingertip on the operation surface by a predetermined amount in the direction opposite to the orientation of the fingertip.

  According to another aspect, it has the following configuration.

A user interface device that identifies operations performed on the operation surface,
Obtaining means for obtaining a three-dimensional image of a region in a three-dimensional space having the operation surface as a bottom surface;
Estimating means for estimating the position of the belly of the finger from the three-dimensional image.

  According to the present invention, when performing touch detection on the operation surface based on an image, it is possible to improve the detection accuracy of the touch position and improve user operability.

FIG. 1 is a diagram illustrating an example of a network configuration including the camera scanner 101.
FIG. 2 is a diagram illustrating an example of the appearance of the camera scanner 101.
FIG. 3 is a diagram illustrating an example of a hardware configuration of the controller unit 201.
FIG. 4 is a diagram illustrating an example of a functional configuration of a control program for the camera scanner 101.
FIG. 5 is a flowchart and explanatory diagrams of the processing executed by the distance image acquisition unit 408.
FIG. 6A is a flowchart of processing executed by the gesture recognition unit 409 in Embodiment 1.
FIG. 6B is an explanatory diagram of processing executed by the gesture recognition unit 409 in Embodiment 1.
FIG. 7 is a diagram schematically showing a method of estimating the fingertip position in Embodiment 1.
FIG. 8 is a diagram schematically showing a method of estimating a touch position from the fingertip position in Embodiment 1.
FIG. 9 is a flowchart of processing executed by the gesture recognition unit 409 in Embodiment 2.
FIG. 10 is a diagram schematically showing a method of estimating a touch position from the angle of the finger with respect to the plane in Embodiment 2.
Further drawings include a flowchart of processing executed by the gesture recognition unit 409 in Embodiment 3, a diagram schematically showing a method of estimating a touch position from RGB image information and plane angle information, a diagram schematically showing a method of estimating a touch position in Embodiment 4, and a flowchart executed by the gesture recognition unit 409 in Embodiment 4.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings.
[Embodiment 1]
FIG. 1 is a diagram illustrating a network configuration including a camera scanner 101 according to an embodiment. As shown in FIG. 1, a camera scanner 101 is connected to a host computer 102 and a printer 103 via a network 104 such as Ethernet (registered trademark). In the network configuration of FIG. 1, a scan function for reading an image from the camera scanner 101 and a print function for outputting scan data by the printer 103 can be executed by an instruction from the host computer 102. Further, it is possible to execute a scan function and a print function by direct instructions to the camera scanner 101 without using the host computer 102.

<Configuration of camera scanner>
FIG. 2 is a diagram illustrating a configuration example of the camera scanner 101 according to the embodiment. As shown in FIG. 2A, the camera scanner 101 includes a controller unit 201, a camera unit 202, an arm unit 203, a projector 207, and a distance image sensor unit 208. The controller unit 201, which is the main body of the camera scanner, the camera unit 202 for imaging, the projector 207, and the distance image sensor unit 208 are connected by the arm unit 203, which can be bent and extended at a joint. FIG. 2A also shows the document table 204 on which the camera scanner 101 is installed. The lenses of the camera unit 202 and the distance image sensor unit 208 are directed toward the document table 204 and can read an image within the reading region 205 surrounded by a broken line. In the example of FIG. 2A, a document 206 is placed in the reading region 205 and can be read by the camera scanner 101. The camera unit 202 may capture images at a single resolution, but it is preferable that it can capture both high-resolution and low-resolution images. A turntable 209 may be provided on the document table 204. The turntable 209 can be rotated by an instruction from the controller unit 201, which changes the angle between an object placed on the turntable 209 and the camera unit 202. Although not shown in FIG. 2, the camera scanner 101 may further include an LCD touch panel 330 and a speaker 340, as well as various sensor devices for collecting surrounding environmental information, such as a human presence sensor, an illuminance sensor, and an acceleration sensor. A distance image is image data in which a distance from the distance image sensor unit 208 is associated with each pixel.

  FIG. 2B shows the coordinate systems of the camera scanner 101. In the camera scanner 101, a coordinate system is defined for each hardware device: a camera coordinate system, a distance image coordinate system, and a projector coordinate system. In each of these, the image plane captured by the camera unit 202 or the distance image sensor unit 208, or the image plane projected by the projector 207, is defined as the XY plane, and the direction orthogonal to that image plane is defined as the Z direction. Further, so that the three-dimensional data of these independent coordinate systems can be handled in a unified manner, an orthogonal coordinate system is defined in which the plane including the document table 204 is the XY plane and the upward direction perpendicular to the XY plane is the Z axis. The XY plane can therefore also be called the bottom surface.

As an example of coordinate transformation, FIG. 2C shows the relationship among the orthogonal coordinate system, a space expressed in the camera coordinate system centered on the camera unit 202, and the image plane captured by the camera unit 202. A three-dimensional point P[X, Y, Z] in the orthogonal coordinate system can be converted to the three-dimensional point Pc[Xc, Yc, Zc] in the camera coordinate system by equation (1):
[X_c, Y_c, Z_c]^T = [R_c | t_c] [X, Y, Z, 1]^T   ... (1)
Here, R_c and t_c are the external parameters given by the orientation (rotation) and position (translation) of the camera with respect to the orthogonal coordinate system; R_c is a 3 × 3 rotation matrix and t_c is a translation vector. Conversely, a three-dimensional point defined in the camera coordinate system can be converted to the orthogonal coordinate system using equation (2):
[X, Y, Z]^T = [R_c^{-1} | -R_c^{-1} t_c] [X_c, Y_c, Z_c, 1]^T   ... (2)
The two-dimensional camera image plane captured by the camera unit 202 is the result of the camera unit 202 converting three-dimensional information in the three-dimensional space into two-dimensional information. That is, a three-dimensional point Pc[Xc, Yc, Zc] in the camera coordinate system can be converted by perspective projection into the two-dimensional coordinates pc[xp, yp] on the camera image plane according to equation (3):
λ [x_p, y_p, 1]^T = A [X_c, Y_c, Z_c]^T   ... (3)
Here, A is a 3 × 3 matrix called the internal parameters of the camera, expressed by the focal length and the image center.

  As described above, by using equations (1) and (3), a three-dimensional point group expressed in the orthogonal coordinate system can be converted into point-group coordinates in the camera coordinate system or into coordinates on the camera image plane. It is assumed that the internal parameters of each hardware device and its position and orientation (external parameters) with respect to the orthogonal coordinate system have been calibrated in advance by a known calibration method. Hereinafter, unless otherwise noted, the term "three-dimensional point group" denotes three-dimensional data in the orthogonal coordinate system.
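  The transformations of equations (1) to (3) can be written as a few lines of matrix arithmetic. The following sketch is a minimal illustration in Python/NumPy; the rotation, translation, and intrinsic values are placeholder numbers standing in for the calibrated parameters assumed above, not values taken from the embodiment.

```python
import numpy as np

# Placeholder extrinsics: rotation R_c (3x3) and translation t_c (3x1) of the
# camera with respect to the orthogonal coordinate system (example values only).
R_c = np.eye(3)
t_c = np.array([[0.0], [0.0], [500.0]])   # e.g. camera 500 mm above the origin

# Placeholder intrinsics A: focal lengths and image center in pixels.
A = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])

def to_camera(P):
    """Equation (1): orthogonal coordinates -> camera coordinates."""
    P = np.asarray(P, dtype=float).reshape(3, 1)
    return R_c @ P + t_c

def to_orthogonal(Pc):
    """Equation (2): camera coordinates -> orthogonal coordinates."""
    Pc = np.asarray(Pc, dtype=float).reshape(3, 1)
    return R_c.T @ (Pc - t_c)          # R_c^-1 = R_c^T for a rotation matrix

def project(Pc):
    """Equation (3): camera coordinates -> pixel coordinates on the image plane."""
    p = A @ np.asarray(Pc, dtype=float).reshape(3, 1)
    return (p[:2] / p[2]).ravel()      # divide by lambda (the third component)

P = [100.0, 50.0, 0.0]                 # a point on the document table (Z = 0)
print(project(to_camera(P)))
```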

<Hardware configuration of camera scanner controller>
FIG. 3 is a diagram illustrating a hardware configuration example of the controller unit 201, which is the main body of the camera scanner 101. As shown in FIG. 3, the controller unit 201 includes a CPU 302, a RAM 303, a ROM 304, an HDD 305, a network I/F 306, an image processor 307, a camera I/F 308, a display controller 309, a serial I/F 310, an audio controller 311, and a USB controller 312, all connected to a system bus 301.

  The CPU 302 is a central processing unit that controls the operation of the entire controller unit 201. The RAM 303 is a volatile memory. A ROM 304 is a non-volatile memory, and stores a startup program for the CPU 302. The HDD 305 is a hard disk drive (HDD) having a larger capacity than the RAM 303. The HDD 305 stores a control program for the camera scanner 101 executed by the controller unit 201.

  The CPU 302 executes the startup program stored in the ROM 304 when the power is turned on. The startup program reads the control program stored in the HDD 305 and loads it into the RAM 303. After executing the startup program, the CPU 302 executes the control program loaded into the RAM 303 to perform control. The CPU 302 also keeps data used by the control program in the RAM 303 and reads and writes it there. In addition, various settings required for operation of the control program and image data generated from camera input can be stored on the HDD 305 and are read and written by the CPU 302. The CPU 302 communicates with other devices on the network 104 via the network I/F 306.

  The image processor 307 reads and processes the image data stored in the RAM 303 and writes it back to the RAM 303. Note that image processing executed by the image processor 307 includes rotation, scaling, color conversion, and the like.

  The camera I/F 308 is connected to the camera unit 202 and the distance image sensor unit 208; in accordance with instructions from the CPU 302, it acquires image data from the camera unit 202 and distance image data from the distance image sensor unit 208 and writes them into the RAM 303. It also transmits control commands from the CPU 302 to the camera unit 202 and the distance image sensor unit 208 to configure them. The distance image sensor unit 208 includes an infrared pattern projection unit 361, an infrared camera 362, and an RGB camera 363; these will be described later.

  The controller unit 201 can further include at least one of a display controller 309, a serial I/F 310, an audio controller 311, and a USB controller 312.

  A display controller 309 controls display of image data on the display in accordance with an instruction from the CPU 302. Here, the display controller 309 is connected to the short focus projector 207 and the LCD touch panel 330.

  The serial I/F 310 inputs and outputs serial signals. Here, the serial I/F 310 is connected to the turntable 209 and transmits instructions from the CPU 302 for starting and ending rotation, as well as the rotation angle, to the turntable 209. The serial I/F 310 is also connected to the LCD touch panel 330; when the LCD touch panel 330 is pressed, the CPU 302 acquires the pressed coordinates via the serial I/F 310.

  The audio controller 311 is connected to the speaker 340, converts audio data into an analog audio signal in accordance with an instruction from the CPU 302, and outputs audio through the speaker 340.

  The USB controller 312 controls an external USB device in accordance with an instruction from the CPU 302. Here, the USB controller 312 is connected to an external memory 350 such as a USB memory or an SD card, and reads / writes data from / to the external memory 350.

<Functional structure of camera scanner control program>
FIG. 4 is a diagram illustrating the functional configuration 401 of the control program for the camera scanner 101 executed by the CPU 302. As described above, the control program for the camera scanner 101 is stored in the HDD 305, and at startup the CPU 302 loads it into the RAM 303 and executes it. The main control unit 402 is the center of control and controls the other modules in the functional configuration 401. The image acquisition unit 416 is a module that performs image input processing and includes a camera image acquisition unit 407 and a distance image acquisition unit 408. The camera image acquisition unit 407 acquires image data output from the camera unit 202 via the camera I/F 308 and stores it in the RAM 303. The distance image acquisition unit 408 acquires distance image data output from the distance image sensor unit 208 via the camera I/F 308 and stores it in the RAM 303. Details of the processing of the distance image acquisition unit 408 will be described later with reference to FIG. 5.

  The gesture recognition unit 409 continuously acquires the image on the document table 204 from the image acquisition unit 416 and notifies the main control unit 402 when it detects a gesture such as a touch. Details of this processing will be described later using the flowchart of FIG. 6A. The image processing unit 411 uses the image processor 307 to analyze the images acquired from the camera unit 202 and the distance image sensor unit 208. The gesture recognition unit 409 described above also runs using the functions of the image processing unit 411.

  The user interface unit 403 receives a request from the main control unit 402 and generates GUI components such as messages and buttons, then requests the display unit 406 to display the generated GUI components. The display unit 406 displays the requested GUI components on the projector 207 or the LCD touch panel 330 via the display controller 309. Since the projector 207 is installed facing the document table 204, it can project GUI components onto the document table 204. In addition, the user interface unit 403 receives gesture operations such as touches recognized by the gesture recognition unit 409, input operations from the LCD touch panel 330 via the serial I/F 310, and their coordinates. The user interface unit 403 determines the operation content (such as which button was pressed) by associating the content of the operation screen being drawn with the operation coordinates, and notifies the main control unit 402 of this operation content, whereby the operator's operation is accepted.

  The network communication unit 404 communicates with other devices on the network 104 by TCP/IP via the network I/F 306. The data management unit 405 stores and manages, in a predetermined area of the HDD 305, various data such as work data generated during execution of the control program 401, for example scan data generated by a flat document image photographing unit 411, a book image photographing unit 412, and a three-dimensional shape measuring unit 413.

<Description of Distance Image Sensor and Distance Image Acquisition Unit>
FIG. 5 shows the configuration of the distance image sensor 208. The distance image sensor 208 is an infrared pattern projection type distance image sensor. The infrared pattern projection unit 361 projects a three-dimensional measurement pattern onto an object using infrared rays that are invisible to human eyes. The infrared camera 362 is a camera that reads a three-dimensional measurement pattern projected on an object. The RGB camera 363 is a camera that captures visible light visible to the human eye using RGB signals.

  The processing of the distance image acquisition unit 408 will be described with reference to the flowchart of FIG. 5A. FIGS. 5B to 5D are diagrams for explaining the principle of distance image measurement by the pattern projection method. When the distance image acquisition unit 408 starts processing, in step S501 it uses the infrared pattern projection unit 361 to project a three-dimensional shape measurement pattern 522 of infrared rays onto the object 521, as shown in FIG. 5B. In step S502, an RGB camera image 523 obtained by photographing the object with the RGB camera 363 and an infrared camera image 524 obtained by photographing, with the infrared camera 362, the three-dimensional measurement pattern 522 projected in step S501 are acquired. Since the infrared camera 362 and the RGB camera 363 are installed at different positions, the shooting areas of the two captured images, the RGB camera image 523 and the infrared camera image 524, differ as shown in FIG. 5C. Therefore, in step S503, the infrared camera image 524 is matched to the coordinate system of the RGB camera image 523 using a coordinate system conversion from the coordinate system of the infrared camera 362 to that of the RGB camera 363. It is assumed that the relative positions of the infrared camera 362 and the RGB camera 363 and their respective internal parameters are known from a prior calibration process.

  In step S504, as shown in FIG. 5D, corresponding points between the three-dimensional measurement pattern 522 and the infrared camera image 524 that was coordinate-transformed in step S503 are extracted. For example, a point on the infrared camera image 524 is searched for in the three-dimensional shape measurement pattern 522 and associated when the same point is detected. Alternatively, a pattern around a pixel of the infrared camera image 524 may be searched for in the three-dimensional shape measurement pattern 522 and associated with the portion having the highest similarity. In step S505, the distance from the infrared camera 362 is calculated by triangulation, using the straight line connecting the infrared pattern projection unit 361 and the infrared camera 362 as the base line 525. For each pixel that could be associated in step S504, the distance from the infrared camera 362 is calculated and stored as the pixel value; for each pixel that could not be associated, an invalid value is stored to indicate that the distance could not be measured. Performing this for all pixels of the infrared camera image 524 that was coordinate-transformed in step S503 generates a distance image in which each pixel has a distance value. In step S506, the RGB values of the RGB camera image 523, that is, the color information, are stored in each pixel of the distance image, thereby generating a distance image having four values, R, G, B, and distance, for each pixel. The distance image acquired here is based on the distance image sensor coordinate system defined by the RGB camera 363 of the distance image sensor 208. Therefore, in step S507, as described above with reference to FIG. 2B, the distance data obtained in the distance image sensor coordinate system is converted into a three-dimensional point group in the orthogonal coordinate system. (As noted above, unless otherwise specified, "three-dimensional point group" refers to a three-dimensional point group in the orthogonal coordinate system.) In this way, a three-dimensional point group indicating the shape of the measured object can be obtained.
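  As a concrete illustration of step S507, the following sketch back-projects a distance image into the sensor coordinate system by inverting the perspective projection of equation (3) and then applies equation (2) to obtain a three-dimensional point group in the orthogonal coordinate system. The intrinsic matrix and extrinsics used here are placeholders for the calibrated values, and the function name is illustrative.

```python
import numpy as np

def depth_to_point_cloud(depth, A, R, t):
    """Back-project a depth image into sensor coordinates, then transform to the
    orthogonal coordinate system (equation (2)). depth[v, u] is the distance (Z)
    in millimetres; invalid pixels are 0."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid].astype(float)
    # Inverse of equation (3): X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    x = (u[valid] - A[0, 2]) * z / A[0, 0]
    y = (v[valid] - A[1, 2]) * z / A[1, 1]
    pts_sensor = np.stack([x, y, z], axis=1)             # N x 3, sensor coordinates
    # Equation (2): orthogonal = R^-1 (Pc - t); for row vectors this is (p - t^T) R
    return (pts_sensor - t.ravel()) @ R

A = np.array([[570.0, 0.0, 320.0], [0.0, 570.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros((3, 1))                        # placeholder extrinsics
cloud = depth_to_point_cloud(np.full((480, 640), 800.0), A, R, t)
```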

  In this embodiment, as described above, the infrared pattern projection method is adopted as the distance image sensor 208, but a distance image sensor of another method may be used. For example, other measuring means such as a stereo system that performs stereo stereoscopic vision with two RGB cameras and a TOF (Time of Flight) system that measures distance by detecting the time of flight of laser light may be used.

<Description of gesture recognition unit>
Details of the processing of the gesture recognition unit 409 will be described with reference to the flowchart of FIG. 6A. In FIG. 6A, when the gesture recognition unit 409 starts processing, it performs initialization in step S601. In the initialization process, the gesture recognition unit 409 acquires one frame of the distance image from the distance image acquisition unit 408. Since no object is placed on the document table 204 when the gesture recognition unit starts, the plane of the document table 204 is recognized as the initial state. That is, the widest plane is extracted from the acquired distance image, and its position and normal vector (hereinafter referred to as the plane parameters of the document table 204) are calculated and stored in the RAM 303.
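  The plane parameters (a point on the plane and its unit normal) can be estimated from the initial frame by a least-squares fit to the points of the dominant plane. The sketch below is a simplified stand-in for the widest-plane extraction described above: it assumes the points belonging to the document-table plane have already been selected (for example by RANSAC) and only performs the fit.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit. points: N x 3 array of 3D points assumed to lie
    on the document table. Returns (centroid, unit normal) as plane parameters."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:                 # orient the normal to point upward (+Z)
        normal = -normal
    return centroid, normal

def height_above_plane(points, centroid, normal):
    """Signed distance of each point from the fitted plane."""
    return (points - centroid) @ normal
```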

  In step S602, the three-dimensional point group of the objects on the document table 204 is acquired as shown in steps S621 to S622. In step S621, one frame of the distance image and the corresponding three-dimensional point group are acquired from the distance image acquisition unit 408. In step S622, using the plane parameters of the document table 204, the points on the plane including the document table 204 are removed from the acquired three-dimensional point group.

  In step S603, the shape of the user's hand and the fingertip are detected from the acquired three-dimensional point group, as shown in steps S631 to S634. This is explained here using FIGS. 6B (b) to (e), which schematically illustrate the fingertip detection process. In step S631, a three-dimensional point group of the hand is obtained by extracting, from the three-dimensional point group acquired in step S602, the points of skin color (hand color) that are at a predetermined height (distance) or more above the plane including the document table 204. The three-dimensional point group 661 in FIG. 6B (b) represents the extracted three-dimensional point group of the hand, that is, the hand region. The skin color here does not denote one specific color but is a general term covering various skin colors. The skin color may be determined in advance or may be selected according to the operator.

  Alternatively, the hand region may be found by taking the background difference of the distance image, without using the skin color. The hand region found in this way can be converted into a three-dimensional point group by the method described above.
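  A minimal sketch of the hand-region extraction of step S631, under the assumption that the point cloud is stored per pixel and aligned with an RGB image of the same resolution: points more than a threshold above the document-table plane are kept and, optionally, intersected with a skin-color mask in HSV space. The skin-color bounds and the 5 mm threshold are illustrative values only.

```python
import numpy as np
import cv2

def extract_hand_points(cloud, rgb, centroid, normal, height_thresh=5.0):
    """cloud: H x W x 3 point cloud in the orthogonal coordinate system (mm),
    rgb: H x W x 3 BGR image aligned with the cloud.
    Returns an H x W boolean mask of the hand region."""
    height = (cloud - centroid) @ normal          # per-pixel height above the plane
    above_plane = height > height_thresh          # e.g. more than 5 mm above the table

    hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV)
    # Illustrative skin-color range; in practice it may be preset or chosen per operator.
    skin = cv2.inRange(hsv, (0, 30, 60), (25, 180, 255)) > 0

    return above_plane & skin
```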

  In step S632, a two-dimensional image obtained by projecting the extracted three-dimensional point group of the hand onto the plane of the document table 204 is generated, and the outer shape (contour) of the hand is detected. The point group 662 in FIG. 6B (b) represents the three-dimensional point group projected onto the plane of the document table 204. The projection may be performed by projecting the coordinates of the point group using the plane parameters of the document table 204. Further, as shown in FIG. 6B (c), if only the xy coordinate values are extracted from the projected three-dimensional point group, it can be handled as a two-dimensional image 663 viewed from the z-axis direction. At this time, the correspondence between each point of the three-dimensional point group of the hand and the coordinates of the two-dimensional image projected onto the plane of the document table 204 is stored.

  In step S633, the fingertip is detected. There are several ways to find the fingertip. First, a method using the curvature of the hand's outer shape (that is, its contour) will be described.

  For each point on the detected outer shape of the hand, the curvature of the outer shape at that point is calculated, and a point whose calculated curvature is greater than a predetermined value is detected as a fingertip. The calculation of the curvature is as follows. The contour point 664 in FIG. 6B (e) represents some of the points constituting the outer shape of the two-dimensional image 663 projected onto the plane of the document table 204. The curvature of the outer shape of the hand is calculated by fitting a circle, using the least squares method, to a finite number of adjacent contour points such as the contour point 664. This is performed for all contour points, and when the curvature is larger than a predetermined value and the center of the fitted circle is inside the contour of the hand, the middle point of the adjacent finite number of contour points is determined to be a fingertip. As described above, the RAM 303 stores the correspondence between the contour points of the outer shape of the hand and the three-dimensional point group, so the gesture recognition unit 409 can use the three-dimensional information of the fingertip point. Whether the center of the circle is inside or outside the outline of the hand can be determined, for example, by finding the contour points on a line that passes through the center of the circle and is parallel to a coordinate axis, and examining the positional relationship between those contour points and the circle center: counting the contour points and the circle center in order from the end of the line, if the circle center comes at an odd-numbered position it is outside the outer shape of the hand, and if it comes at an even-numbered position it is inside.

  Circles 669 and 670 in FIG. 6B (e) represent examples of fitted circles. The circle 669 is not detected as a fingertip because its curvature is smaller than the predetermined value and its center is outside the outer shape, whereas the circle 670, whose curvature is larger than the predetermined value and whose center is inside the outer shape, is detected as a fingertip.

  Here, the method of finding the fingertip by calculating the curvature with least-squares circle fitting has been described, but the fingertip may also be found as the point at which the radius of the circle enclosing a finite number of adjacent contour points is minimized. An example is described next.

  FIG. 6B (d) schematically shows a method of detecting a fingertip from a circle enclosing a finite number of contour points. As an example, consider drawing a circle so as to include five adjacent contour points; circles 665 and 667 are examples. Such a circle is drawn in order for all the contour points of the outer shape, and when its diameter (for example, 666 or 668) is smaller than a predetermined value, the middle (center) point of the five adjacent contour points is taken as the fingertip. Five adjacent points are used in this example, but the number is not limited to five. Although the above describes finding the fingertip by fitting a circle, the fingertip may also be found by ellipse fitting. A method of finding the fingertip by ellipse fitting is described in Non-Patent Document 1 and may be used.

  The circle fitting and the ellipse fitting described above can easily be realized using an open-source computer vision library such as OpenCV.
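  As a sketch of the contour-based fingertip search described above (OpenCV 4.x API assumed), the code below slides a window of adjacent contour points along the hand outline, computes the minimum enclosing circle of each window with cv2.minEnclosingCircle, and reports windows whose circle diameter falls below a threshold as fingertip candidates; cv2.fitEllipse could be used instead for the ellipse-fitting variant. The stride, window size, and diameter threshold are illustrative.

```python
import numpy as np
import cv2

def find_fingertips(hand_mask, stride=8, window=5, max_diameter=30.0):
    """hand_mask: 8-bit binary image of the hand region projected onto the plane.
    Returns a list of (x, y) fingertip candidates on the 2D image.
    stride subsamples the dense contour so that a window of adjacent points spans
    a meaningful arc; stride, window, and max_diameter are illustrative values."""
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return []
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2)[::stride]

    tips = []
    n = len(contour)
    for i in range(n):
        # Take 'window' adjacent (subsampled) contour points, wrapping around.
        idx = [(i + k) % n for k in range(-(window // 2), window // 2 + 1)]
        pts = contour[idx].astype(np.float32)
        (cx, cy), radius = cv2.minEnclosingCircle(pts)
        # At a fingertip the contour folds back on itself, so the points stay close
        # together and the enclosing circle is small; on a straight stretch it is large.
        if 2.0 * radius < max_diameter:
            tips.append((int(contour[i][0]), int(contour[i][1])))
    return tips
```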

  Alternatively, the point farthest from the arm may be found as the fingertip. FIG. 7B shows a state in which the arm 704 is in the reading area 205. This can be regarded as the projection of the three-dimensional point group of the hand region onto the plane of the document table 204 described above. The number of pixels of this projection image is the same as that of the distance image obtained by the distance image sensor 208. The region 703 is the region within a predetermined number of pixels from the outer frame of the projection image. The area 705 is obtained by ANDing the thin area between the reading area 205 and the region 703 with the region of the arm 704. From the area 705, the points 709 and 710 at which the arm 704 has entered the reading area 205 can be found. For this processing, the distance image acquired by the distance image sensor 208 may be processed directly. In that case, the region of the arm 704 is obtained by taking the difference between the background image of the distance image stored in the RAM 303 and the current distance image and binarizing it with a predetermined threshold value.

The line segment 706 in FIG. 7E is the line segment connecting the points 709 and 710. Its midpoint is taken as 711, and this point is set as the base point of the arm. The pixel in the outer shape of the arm farthest from the arm base point 711 is then set as the fingertip point 712. Here, the midpoint of the positions where the arm enters the reading area is used to obtain the base point of the arm; however, the base and the fingertip may also be obtained by thinning the arm 704 itself. Thinning can be performed using a common image-processing thinning algorithm. Of the thinned arm, the end that intersects the area 705 may be detected as the base of the arm and the opposite end as the fingertip.
In step S633, the fingertip can be detected by the above method.

  In step S634, the number of detected fingertips and the coordinates of each fingertip are calculated. As described above, the correspondence between each point of the two-dimensional image projected onto the document table 204 and each point of the three-dimensional point group of the hand is stored, so the three-dimensional coordinates of each fingertip can be obtained. Although a method of detecting the fingertip from an image projected from the three-dimensional point group onto a two-dimensional image has been described here, the image used for fingertip detection is not limited to this. For example, the hand region may be extracted from the background difference of the distance image or from the skin-color region of the RGB image, and the fingertip in that hand region may be detected by the same method as described above (such as the calculation of the curvature of the outer shape). In this case, the coordinates of the detected fingertip are coordinates on a two-dimensional image such as the RGB image or the distance image, so they need to be converted into three-dimensional coordinates in the orthogonal coordinate system using the distance information of the distance image at those coordinates.

  In step S606, touch gesture determination processing is performed. The gesture recognition unit 409 calculates the distance between the fingertip detected in the immediately preceding step and the plane including the document table 204, using the detected three-dimensional coordinates of the fingertip and the plane parameters of the document table 204 described above. When the distance is equal to or smaller than a predetermined small value, it determines "touch gesture present"; when the distance is larger than that value, it determines "touch gesture absent".

  Alternatively, a virtual threshold plane (not shown) may be provided at a predetermined height (in the Z direction) of the orthogonal coordinate system, and a touch may be detected when the Z value of the fingertip coordinates becomes smaller than the Z value of the threshold plane.
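  Touch determination in step S606 reduces to a point-to-plane distance test against the stored plane parameters. A minimal sketch, with the threshold values as illustrative assumptions rather than values prescribed by the embodiment:

```python
import numpy as np

def is_touching(fingertip_xyz, plane_point, plane_normal, touch_thresh=5.0):
    """Return True if the fingertip's distance to the document-table plane is at
    most touch_thresh (mm), i.e. 'touch gesture present'."""
    distance = abs((np.asarray(fingertip_xyz) - plane_point) @ plane_normal)
    return distance <= touch_thresh

# Alternative mentioned in the text: compare the fingertip's Z coordinate against
# a virtual threshold plane at a fixed height in the orthogonal coordinate system.
def is_touching_z(fingertip_xyz, z_threshold=5.0):
    return fingertip_xyz[2] < z_threshold
```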

  In step S607, if “touch gesture is present” in the previous step, the process proceeds to step S608, and if “touch gesture is not present”, the process returns to step S602.

  In step S608, fingertip direction identification processing is performed. The fingertip direction is the direction of the arrow 702 in FIG. 7A, that is, the direction in which the finger of the hand 701 points within the plane of the document table 204. To specify the fingertip direction, the finger portion is specified. For that purpose, the portion where the arm has entered the reading area 205 is identified first. As described above, the points 709 and 710 in FIG. 7B can be found as the points where the arm 704 has entered the reading area 205.

  Next, the finger portion is specified. The line segment 706 in FIG. 7C is the line segment connecting the points 709 and 710. In parallel with the line segment 706, a group of line segments 707 is drawn across the region of the arm 704 (also referred to as the arm region 704) at predetermined small intervals, and the portion where the length of these line segments is shorter than a predetermined threshold is identified as the finger. In FIG. 7C, the length falls below the predetermined threshold from the position of the line segment 708.

  Next, the fingertip direction is specified. A vector 709 is defined from the coordinates of the midpoint of the line segment 708 toward the fingertip coordinates on the xy plane found in step S633. The direction of the vector 709 is the direction of the fingertip, and its length represents the length of the finger. For example, the vector 709 can be specified as the vector whose start point is the midpoint of the line segment 708 and whose end point is the fingertip position specified in step S634. Alternatively, when the fingertip coordinates are obtained by the method described with reference to FIG. 7E, the vector 713 connecting the arm base point 711 to the fingertip point 712 may be used as the finger direction vector. In this case the vector 709 need not be obtained, but the finger length must still be obtained in some way. For example, among the line segment group 707 whose lengths are shorter than the predetermined threshold described above (that is, the upper limit of the finger width), the line segment closest to the arm base 711, or its extension, intersects the vector 713; that intersection point is found as the base of the finger, and the distance from that point to the fingertip point 712 can be determined as the length of the finger. Of course, the vector 709 can also be obtained by the method described above and the finger length determined based on it.

  Further, as shown in FIG. 7F, the vector connecting the center point 714 of the palm (or back of the hand) to the fingertip point 715 may be defined as the finger direction vector 716. In this case, the palm (back-of-hand) center point 714 can be obtained as the point inside the hand region whose distance to the pixels constituting the contour 717 of the hand region is maximal.
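  The palm (back-of-hand) center point described above, the interior point whose distance to the hand contour is maximal, can be computed with a distance transform; this is one standard way to realize it, not a method the embodiment prescribes. A sketch assuming the hand region is available as an 8-bit binary mask:

```python
import cv2

def palm_center(hand_mask):
    """hand_mask: 8-bit binary image (255 inside the hand region).
    Returns (x, y) of the interior point farthest from the hand contour."""
    # Distance of every interior pixel to the nearest zero (background) pixel.
    dist = cv2.distanceTransform(hand_mask, cv2.DIST_L2, 5)
    _, _, _, max_loc = cv2.minMaxLoc(dist)
    return max_loc          # the maximum of the distance transform = palm center
```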

  Furthermore, when ellipse fitting is performed on the fingertip, the direction connecting the two focal points of the ellipse may be used as the finger direction vector. In that case, the orientation of the vector may be determined so that it points away from the midpoint of the arm entry points obtained by the above method. In this case as well, the length of the finger needs to be obtained by the method described above.

  Although the above processing has been described using the pointing posture as an example, even when all five fingers are extended, performing the above processing for each of the line segments 708 obtained for each finger makes it possible to determine the direction and length of every finger.

  When step S608 ends, the process proceeds to step S609, in which touch position determination processing is performed. This is a process for estimating the position of the belly of the finger that the user actually feels is touching. The two-dimensional point group 801 in FIG. 8A represents the image of the hand region on the xy plane projected onto the document table 204; an enlarged view of the portion 802 is shown as 803. For the finger 804, the vector 805 is the fingertip direction vector 709 obtained in step S608. The xy coordinates of the point obtained by shifting the fingertip point 806 on the xy plane by a predetermined amount in the direction opposite to the vector 805 (that is, by the predetermined distance 807) are determined as the touch point 808 and stored in a predetermined area of the RAM 303. The predetermined shift distance is settable. The z coordinate of the touch point may be set to 0, or may be determined from the corresponding point of the three-dimensional point group. Note that the fingertip point 806 may be the fingertip position specified in step S634.
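  In its basic form, step S609 is a single vector operation: shift the fingertip point on the xy plane by a configurable distance opposite to the fingertip direction vector. A sketch (the 5 mm default is an illustrative value, not one specified by the embodiment):

```python
import numpy as np

def touch_position(fingertip_xy, finger_dir_xy, shift_mm=5.0):
    """fingertip_xy: fingertip point on the xy plane (document-table coordinates).
    finger_dir_xy: 2D fingertip direction vector (base of finger -> fingertip).
    Returns the estimated touch point, shifted opposite to the finger direction."""
    d = np.asarray(finger_dir_xy, dtype=float)
    unit = d / np.linalg.norm(d)
    return np.asarray(fingertip_xy, dtype=float) - shift_mm * unit
```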

  Further, the method of determining the touch position (the belly of the finger) is not limited to shifting by a predetermined distance as described above. For example, as shown in FIG. 8B, the center 810 of the circle 809 used for circle fitting when the fingertip was found may be determined as the touch position.

  Further, as shown in FIG. 8C, of the focal points (812, 813) of the ellipse 811 fitted to the fingertip, the point 812 on the fingertip side may be determined as the touch position. To determine which focal point is on the fingertip side, the one farther from the base of the arm may be adopted.

  Furthermore, the barycenter of the pixels constituting the outer shape of the fingertip may be determined as the touch position. FIG. 8D schematically shows the relationship between the pixels constituting the outer shape of the fingertip and the barycentric point. The pixel group 814 forming the outer shape of the fingertip represents a plurality of adjacent pixels among the contour-point pixels of the outer shape of the arm used in the fingertip discovery described above. Suppose the pixel group 814 consists of nine pixels when the fingertip is found and the middle pixel 806 is found as the fingertip. The barycenter of the pixel group 814 including the fingertip point 806 is denoted 815, and this barycentric point 815 may be determined as the touch position.

  Further, as shown in FIG. 8I, the center of gravity 826 of the finger pixels included in a predetermined peripheral region 825 of the fingertip point 806 may be determined as the touch position. The predetermined peripheral region is not limited to a circle such as the one shown in FIG. 8I. The vector connecting the center of gravity 826 to the fingertip point 806 may also be used as the fingertip direction vector.

  Further, polygon approximation may be performed on the pixels constituting the outer shape of the fingertip, and the center of gravity of the resulting polygon may be determined as the touch position. FIG. 8E schematically shows the polygon approximation applied to the outer shape of the fingertip. The pentagon 816 represents the polygon approximating the outer shape of the fingertip, and its center of gravity is the point 817, which may be determined as the touch position. Polygon approximation can be easily executed by using an open-source API such as OpenCV.
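  The polygon-approximation variant maps directly onto OpenCV calls, as the text notes: approximate the fingertip outline with cv2.approxPolyDP and take the centroid of the resulting polygon from its image moments. A sketch, assuming the fingertip outline is available as a contour fragment:

```python
import cv2

def fingertip_polygon_centroid(tip_contour, epsilon=2.0):
    """tip_contour: N x 1 x 2 array of contour points around the fingertip.
    Returns the centroid (x, y) of the approximating polygon as the touch position."""
    poly = cv2.approxPolyDP(tip_contour, epsilon, True)
    m = cv2.moments(poly)
    if m["m00"] == 0:                       # degenerate polygon: fall back to the mean
        pts = tip_contour.reshape(-1, 2)
        return (float(pts[:, 0].mean()), float(pts[:, 1].mean()))
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```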

  Furthermore, the touch position may be determined using the circle used for fitting at the time of fingertip discovery and the fingertip direction vector. FIG. 8F schematically shows a method for determining the touch position using the circle used for fitting at the time of fingertip discovery and the fingertip direction vector. A vector 818 represents a vector obtained by extending the fingertip direction vector. Of the intersection points of the vector 818 and the circle 809 fitted to the fingertip, a point 819 closer to the tip of the vector is obtained as a virtual fingertip. This virtual fingertip point is different from the fingertip point used when detecting a touch. A point obtained by shifting the virtual fingertip point 819 by a predetermined distance 807 in the direction opposite to the fingertip direction vector may be determined as the touch position 820.

  Similarly, the touch position may be determined using the ellipse fitted to the fingertip and the fingertip direction vector. FIG. 8G schematically shows a method for determining a touch position using an ellipse fitted to the fingertip and a fingertip direction vector. Of the intersections of the vector 818 obtained by extending the fingertip direction vector and the ellipse 811, a point 821 on the fingertip side is set as a virtual fingertip. A point 822 obtained by shifting the virtual fingertip 821 by a predetermined distance in the direction opposite to the fingertip direction vector may be determined as the fingertip point.

  The above processing can be performed using a two-dimensional image obtained by projecting a three-dimensional point group of a hand onto the plane of the document table 204 or a distance image acquired from the distance image sensor 208.

  In addition, the touch position may be determined using the RGB image. When the RGB image is used, the touch position may be determined by finding the nail. FIG. 8H is an enlarged view of the fingertip 805 and schematically shows how the touch position is determined from the nail region in the RGB image. The nail 823 represents the nail region found from the RGB image; the nail region can be found from the difference in luminance value with respect to the surrounding finger region. The center of gravity of the discovered nail region is then obtained and determined as the touch position. Since the RGB image and the distance image are aligned as described above, the center of gravity of the nail region can easily be converted to the corresponding position in the distance image or in the two-dimensional image obtained by projecting the three-dimensional point group of the hand onto the plane of the document table 204.

  If the method as described above is used, it is possible to estimate the touch position (the position of the belly of the finger) touching the plane.

  When step S609 ends, the process proceeds to step S605. In step S605, the determined touch gesture and the three-dimensional coordinates of the touch position are notified to the main control unit 402, and the process returns to step S602 to repeat the gesture recognition process.

  In the present embodiment, the gesture recognition with one finger has been described, but the present invention can also be applied to gesture recognition with a plurality of fingers or a plurality of hands. For example, if the touch position is periodically acquired by repeating the procedure of FIG. 6A, various gestures can be specified based on the presence or absence of a touch or a change in the touch position. The main control unit 402 is a part that executes an application. When the main control unit 402 receives the touch gesture, the main control unit 402 executes a corresponding process defined by the application.

  According to the present embodiment, the fingertip and the plane can be photographed from above by the distance image sensor, and the exact touch position on the plane can be specified using the distance image.

[Embodiment 2]
In the first embodiment, the basic part of the method of determining the touch position when the fingertip and the plane are photographed by a sensor from above was described: a fingertip is found from the distance image acquired by the distance image sensor, and the fingertip position is shifted by a predetermined distance in the direction opposite to the fingertip direction to determine the touch position coordinates. In the present embodiment, a method of correcting the touch position when the user wants to give a somewhat more precise touch instruction, and of specifying or estimating the corrected position as the touch position to improve operability, is described along the flowchart of FIG. 9 executed by the gesture recognition unit 409. FIG. 10A schematically illustrates a case in which correction of the touch position is necessary. The upper part of FIG. 10A is a side view of a state in which the finger 1001 is touching the plane 1003, which is a part of the document table 204. In this case, the three-dimensional point of the fingertip found in the same manner as in the first embodiment is the fingertip position 1005, and the touch position point determined by shifting the fingertip coordinates by the user-defined predetermined value 1007, by the method described in the first embodiment, is the touch position 1006. The lower part of FIG. 10A shows a case in which the angle of the finger 1002 with respect to the plane 1004 is larger than in the upper part. In this case, the touch position point obtained by the same method as in the first embodiment is the position 1008, whereas the point actually in contact with the plane is the position 1009. Thus, when the touch point is obtained simply by shifting by a predetermined fixed value, the obtained touch position point may deviate, depending on the angle of the fingertip with respect to the plane, from the point that is actually touching or that the user feels is touching. In the present embodiment, therefore, the angle of the fingertip is used to determine the amount by which the fingertip position is shifted to obtain the touch position point.

  Steps described as step S6xx in the flowchart of FIG. 9 have been described in the description of FIGS. 6A and 6B in the first embodiment. Here, the description will focus on the step described as step S9xx, which is the difference.

  After identifying the fingertip direction vector 709 in step S608, the gesture recognition unit 409 obtains, in step S901, the angle formed by the finger and the plane of the document table 204, using the fingertip direction vector 709 obtained in step S608. The fingertip direction vector 709 is a two-dimensional vector on the plane of the document table 204, that is, on the xy plane. Seen from the side, this vector is represented by the vectors 1010 and 1012 in FIG. 10B. The start and end points of the vectors 1010 and 1012 are associated with points in the three-dimensional point group of the hand described above; this association was already made when the three-dimensional point group was projected onto the plane in step S603. In the example of FIG. 10B, the start point of the vector 1010 can be associated with the three-dimensional point 1018 and its end point with the three-dimensional point 1005. For example, the intersection of the surface formed by the three-dimensional point group of the hand with a straight line that passes through each end point of the vector and is parallel to the z axis is taken as the corresponding end point of the three-dimensional vector. Since the three-dimensional point group of the hand forms the surface of the hand, there may be two intersection points for each straight line; either may be used as long as the intersection on the same side (that is, the one with the smaller or the larger z component) is adopted at both end points. In the example of FIG. 10, the intersection with the larger z component is used, but this is only an example. In this way, when the vector 1011 having the three-dimensional points 1018 and 1005 as its start and end points is obtained, it is the three-dimensional vector of the finger. The three-dimensional vector 1013 of the finger can be obtained in the same way. The angle 1020 formed by the vector 1010 and the vector 1011, and the angle 1022 formed by the vector 1012 and the vector 1013, are obtained as the angles formed by the finger and the plane.

  Next, in step S902, the amount by which the fingertip position is shifted to obtain the touch position is obtained. FIG. 10C schematically illustrates how the shift amount is determined using the angle of the finger with respect to the plane obtained in step S901. First, consider the upper part of FIG. 10C. The vector 1014 starts at the three-dimensional fingertip point 1005, points in the direction opposite to the three-dimensional vector of the finger (vector 1011), and has a predetermined length designated by the user. The point obtained by projecting the end point of the vector 1014 onto the xy plane 1003 along the z axis is the point 1016, which is the touch position to be obtained. The touch position 1017 in the lower part of FIG. 10C can be obtained in the same way. Thus, if a position shifted from the tip of the finger by a predetermined distance in the direction opposite to the three-dimensional fingertip direction vector is projected onto the xy plane (that is, the operation surface), the touch position moves back and forth according to the angle of the finger with respect to the plane, so a touch position that does not impair the user's feeling of touch can be provided.

The operation of obtaining the touch positions 1016 and 1017 takes place on the xy plane of the document table 204 and is nothing other than the operation of obtaining the vectors 1021 and 1023 starting from the fingertip points. As shown in FIG. 10D, let the vectors opposite in direction to the vectors 1010 and 1012 be the vector 1024 and the vector 1025, respectively. If the vectors 1014 and 1015 are denoted v, the vectors 1024 and 1025 are denoted w, and the vectors 1021 and 1023 to be obtained are denoted x, then x is the orthogonal projection of v onto w. With the angles 1020 and 1022 denoted θ, the vector v′ obtained by orthogonally projecting v onto w is expressed as
v′ = (|v||w|cos θ / |w|) · w/|w|   ... (4)
Since w/|w| in equation (4) is the unit vector in the same direction as w, the scalar |v||w|cos θ / |w| = |v|cos θ is the magnitude of the vector v′ to be obtained, that is, the shift amount by which the fingertip position is shifted toward the touch position in the xy plane. Because the vector w lies on the xy plane, the orthogonal projection v′ of v onto w can also be obtained simply by setting the z components of the start and end points of v to 0.

  In step S903, the gesture recognition unit 409 determines the end point of the vector v ′ obtained in step S902 starting from the fingertip position as the touch position. That is, the fingertip position is shifted by the shift amount obtained in step S902 along the two-dimensional vector in the fingertip direction in the xy plane, the coordinates are determined as the touch position, and stored in the RAM 303.
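  Steps S901 to S903 amount to a few vector operations: form the three-dimensional finger vector, take a vector of the user-specified length pointing back along it from the fingertip, and keep only its xy components, which is the orthogonal projection of equation (4). The sketch below illustrates this under the assumption that the fingertip and finger-base points are already available in the orthogonal coordinate system; the 10 mm length is a placeholder.

```python
import numpy as np

def touch_position_3d(fingertip_3d, finger_base_3d, shift_mm=10.0):
    """fingertip_3d, finger_base_3d: 3D points (orthogonal coordinate system) of the
    fingertip and the base of the finger. Returns the touch position on the xy plane,
    shifted along the finger direction by an amount that shrinks as the finger stands
    more upright (equation (4): the in-plane shift scales with cos(theta))."""
    v3 = np.asarray(fingertip_3d, float) - np.asarray(finger_base_3d, float)  # 3D finger vector
    v = -shift_mm * v3 / np.linalg.norm(v3)        # vector of length shift_mm, back along the finger
    shift_xy = v[:2]                               # orthogonal projection onto the xy plane
    touch = np.asarray(fingertip_3d, float).copy()
    touch[:2] += shift_xy                          # shift the fingertip within the xy plane
    touch[2] = 0.0                                 # project onto the operation surface
    return touch
```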

  By performing the above processing, the touch position can be changed according to the angle between the fingertip direction and the operation plane, and the touch position can be specified more accurately.

  As can also be seen from FIG. 10C, the correction amount 1023 when the finger 1002 stands more upright with respect to the plane (the lower part of FIG. 10C) is smaller than the correction amount 1021 when the finger 1001 lies closer to the plane (the upper part of FIG. 10C). On this assumption, the correction amount may be determined using the position touched by the user. When the user touches a position far from himself or herself, the finger tends to lie flatter than when a nearby position is touched. Therefore, the touch position may be determined by shifting from the fingertip by a large correction amount at a distant position and by a small correction amount at a nearby position. The distance from the user to the touch position can be measured by the distance from the base of the arm to the fingertip point described in the first embodiment.

  FIG. 10E schematically shows, as a graph, an example of the relationship between the distance from the user to the touch position and the correction amount. The horizontal axis represents the distance from the user, and the vertical axis represents the correction amount. Although a linear graph is drawn in FIG. 10E, the relationship is not limited to a linear one. With this processing as well, correction of the touch position corresponding to the angle between the finger and the plane can be performed easily.
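A minimal sketch of such a distance-dependent correction is given below; the near and far distances and the correction range are placeholder values chosen only to illustrate the linear mapping of FIG. 10E.

```python
def correction_amount(distance_mm, near_mm=150.0, far_mm=450.0,
                      min_corr_mm=3.0, max_corr_mm=10.0):
    """Linearly interpolate the correction amount between a near and a far touch distance."""
    t = (distance_mm - near_mm) / (far_mm - near_mm)
    t = min(max(t, 0.0), 1.0)                        # clamp outside the [near, far] range
    return min_corr_mm + t * (max_corr_mm - min_corr_mm)

print(correction_amount(150.0), correction_amount(300.0), correction_amount(450.0))
# 3.0 6.5 10.0 -- a larger shift is applied to touches farther from the user
```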

[Embodiment 3]
In the first and second embodiments, the basic method for determining the touch position when the fingertip and the plane are photographed by a sensor from above, and the method for determining the touch position according to the angle of the finger with respect to the plane, have been described. These methods hold when the distance image sensor 208 exhibits little noise.

  Here, the influence of noise of the distance image sensor 208 on the detection of a touch on the plane will be described. The upper diagram of FIG. 12A schematically shows, viewed from the side, a state in which the finger 1201 is touching the plane 1202, together with the plane distance information 1203 actually acquired by the distance image sensor. Since the positional relationship between the distance image sensor 208 and the document stage 204 is fixed, the plane distance information acquired by the distance image sensor 208 would ideally be constant. In practice, however, a certain amount of noise occurs, so the distance information of the plane of the document table 204 fluctuates in the time axis direction. When the distance information of the plane is acquired from the distance image sensor, it therefore contains noise and exhibits unevenness like the distance information 1203 in FIG. 12A. When the plane parameters described above are obtained, this unevenness is averaged. The unevenness changes for each frame of the distance image acquired by the distance image sensor 208 because of the fluctuation in the time axis direction. The plane of the document table 204, that is, the plane given by the plane parameters described above, is represented by the plane 1202 in FIG. 12A. In contrast, with current general-purpose distance image sensors, the distance information 1203 of the acquired distance image shows unevenness of about ±3 mm. Therefore, when a three-dimensional point group at or above a predetermined height is extracted as a fingertip in step S631 of FIG. 6A described above, the temporal fluctuation of the noise on the plane must not be erroneously detected as a fingertip. For this reason, a predetermined height 1205 of about 5 mm is required as a margin for absorbing the unevenness that appears in the distance image on what should be a flat surface. In FIG. 12A, the plane set at the predetermined height 1205 (about 5 mm) above the plane 1202 is represented by 1204. As described above, when the hand region is detected, the portion below the plane 1204 must be removed together with the plane; consequently, if the three-dimensional fingertip point 1206 is below the plane 1204, it is removed as well. In this case, the virtual fingertip point that can be detected as a fingertip among the remaining points is the point 1207 on the plane 1204. The lower diagram of FIG. 12A schematically shows the upper diagram viewed from above (the state on the xy plane). The fingertip point 1206 corresponds to the point 1212, and the virtual fingertip point 1207 corresponds to the point 1211. Of the finger 1209, the region to the left of the dotted line 1210 cannot be detected. In FIG. 12B, the portion surrounded by the dotted line 1213 is removed from the hand region, and only the part surrounded by the solid line is extracted as the hand region. In this case, the difference distance 1208 between the true three-dimensional fingertip point 1206 and the virtual fingertip point 1207 (1211) is about 5 mm to 10 mm.
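The effect of this noise margin can be reproduced in a few lines. In the sketch below, whose point coordinates and 5 mm margin are illustrative, the true fingertip point lying inside the margin is discarded together with the plane noise, leaving only a virtual fingertip higher up on the finger.

```python
import numpy as np

def points_above_margin(points, plane_z=0.0, margin_mm=5.0):
    """Keep only 3D points higher than the plane plus the noise margin."""
    points = np.asarray(points, dtype=float)
    return points[points[:, 2] > plane_z + margin_mm]

hand = np.array([[0.0, 0.0, 30.0],   # knuckle, well above the margin
                 [5.0, 0.0, 12.0],
                 [9.0, 0.0, 6.0],
                 [11.0, 0.0, 2.0]])  # true fingertip, inside the 5 mm margin
print(points_above_margin(hand))     # the 2 mm-high fingertip point is lost with the plane noise
```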

  In the methods of the first and second embodiments, the touch position is determined on the assumption that the fingertip position is acquired accurately. Therefore, when the distance image contains noise as described above, it is difficult to determine an accurate touch position. If the touch position is detected using the virtual fingertip point 1207, a deviation of about 5 mm to 10 mm from the actual touch position occurs, as described above. In the present embodiment, therefore, an accurate touch position is determined using an RGB image that is acquired at the same time and contains less noise than the distance image. This method will be described with reference to the flowchart of FIG. 11 executed by the gesture recognition unit 409. The portions written as step S6xx and step S9xx in FIG. 11 are the portions described with reference to FIG. 6A and FIG. 9.

  After specifying the fingertip direction vector 709 in step S608, in step S1101 the gesture recognition unit 409 acquires from the image acquisition unit 416 the color image, that is, the RGB image, captured by the RGB camera 363 of the distance image sensor 208.

  In step S1102, the gesture recognition unit 409 detects the fingertip from the acquired RGB image. To do so, the hand region must first be detected in the RGB image, as was done with the distance image. For this purpose, a difference image is obtained between a background image (an image of the document table 204 with nothing placed on it) saved in the RAM 303 in advance at activation and the RGB image acquired in step S1101. Alternatively, a skin color region is detected from the RGB image acquired in step S1101. Thereafter, by performing processing similar to steps S633 and S634 in FIG. 6A, the two-dimensional fingertip position on the xy plane can be found. FIG. 12C shows the finger of the RGB image superimposed on the finger 1209 obtained from the distance image on the xy plane. Here, the fingertip obtained using the distance image is represented by 1211. The portion 1214 surrounded by the dotted line is the difference area between the RGB image and the distance image of the finger. The point 1215 represents the fingertip found using the RGB image.
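A rough sketch of the background-difference approach is shown below. The threshold, the synthetic images, and the assumption that the hand enters from the left image edge are all illustrative; a skin-color mask could be substituted for the difference mask.

```python
import numpy as np

def hand_mask_from_rgb(rgb, background, diff_threshold=30):
    """Binary hand mask from the per-pixel difference between the current frame
    and a pre-stored image of the empty document table."""
    diff = np.abs(rgb.astype(np.int16) - background.astype(np.int16)).sum(axis=2)
    return diff > diff_threshold

def fingertip_2d(mask):
    """2D fingertip: the hand pixel farthest from the edge where the hand enters
    (assumed here to be the left edge, x = 0)."""
    ys, xs = np.nonzero(mask)
    i = np.argmax(xs)
    return int(xs[i]), int(ys[i])

background = np.zeros((4, 6, 3), dtype=np.uint8)
frame = background.copy()
frame[2, 0:5] = 200                        # a synthetic "finger" entering from the left
mask = hand_mask_from_rgb(frame, background)
print(fingertip_2d(mask))                  # (4, 2) -- the rightmost hand pixel
```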

  In step S1103, the gesture recognition unit 409 acquires the angle formed by the finger and the plane. This process is the same as the process of step S901 in FIG. 9. At this time, the fingertip point 1211 acquired using the distance image is used as the fingertip coordinates.

  In step S1104, the gesture recognition unit 409 estimates the true three-dimensional fingertip position, that is, the three-dimensional coordinates of the fingertip that was removed together with the noise as described above. The vector 1216 in FIG. 12D is the three-dimensional vector indicating the fingertip direction (also referred to as the finger vector) obtained in the immediately preceding step S1103; this three-dimensional finger vector has the virtual three-dimensional fingertip position 1207 as its tip. The dotted line 1219 is a side view of the plane 1219, which passes through the two-dimensional fingertip position obtained from the RGB image and is orthogonal to the orthogonal projection of the finger vector 1216 onto the plane 1202. The vector 1216 is extended on its end-point side, and the point 1220 at which it intersects the plane 1219 is estimated as the true three-dimensional fingertip position. Since the x and y components of the point 1220 coincide with those of the fingertip position obtained from the RGB image, the point 1220 can be specified by obtaining its z component from the z component of the point 1207 and the slope of the vector 1216. The vector 1218 represents this extension, and the vector obtained by adding the vector 1216 and the vector 1218 is used in the subsequent processing as the true three-dimensional finger vector. When step S1104 ends, the process proceeds to step S902. The processing from this point is the same as the processing described with reference to FIG. 9: a point moved back from the fingertip position 1220 by a predetermined distance in the direction opposite to the finger vector is projected onto the xy plane, and that point is determined as the touch position. At this time, the true three-dimensional finger vector is used as the three-dimensional vector of the finger.
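A simplified version of this estimation is sketched below with invented coordinates; the plane through the RGB fingertip is treated as the constraint that the extended finger vector must reach the x and y position of the RGB fingertip.

```python
import numpy as np

def true_fingertip(virtual_tip, finger_vec, rgb_tip_xy):
    """Extend the 3D finger vector past the virtual (noise-clipped) fingertip until
    its horizontal position reaches the 2D fingertip found in the RGB image."""
    virtual_tip = np.asarray(virtual_tip, dtype=float)
    d = np.asarray(finger_vec, dtype=float)
    d = d / np.linalg.norm(d[:2])            # scale so the xy part is a unit step
    t = np.dot(np.asarray(rgb_tip_xy, dtype=float) - virtual_tip[:2], d[:2])
    return virtual_tip + t * d               # the drop in z follows the finger's slope

print(true_fingertip([100.0, 50.0, 7.0], [0.9, 0.0, -0.3], [104.0, 50.0]))
# approximately [104, 50, 5.67] -- about 1.3 mm lower than the clipped virtual fingertip
```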

  By the above processing, even when the accuracy of the distance image sensor is not good, it is possible to estimate the three-dimensional fingertip position and determine the touch position.

[Embodiment 4]
In the third embodiment, a method was described for finding the three-dimensional fingertip position using an RGB image and determining the touch position when the distance image contains noise. In the present embodiment, a method is described in which the true three-dimensional fingertip position is found using only the distance image, without using an RGB image, and is used to determine the touch position.

  FIG. 14A schematically shows a state in which the finger 1401, just before touching the plane 1408, is lowered in the direction of the arrow 1404 and becomes the finger 1402 in the touched state. As described in the third embodiment, when the distance image contains noise, a plane threshold (flatness threshold) must be set at the predetermined height 1406. As a result, the tip portion 1405 of the finger 1402 in the touched state is removed together with the plane and the fingertip is missing, so the true three-dimensional fingertip position is difficult to find directly. However, since the finger 1401 just before touching is at a position higher than the predetermined height 1406, its fingertip is not lost. The finger length in this state is therefore stored and used to estimate the fingertip position after the touch.

  This method will be described in detail with reference to the flowchart of FIG. 13 executed by the gesture recognition unit 409. Of the steps in FIG. 13, those labeled S6xx, S9xx, and S11xx are the same as the steps described in the flowcharts of FIGS. 6A, 9, and 11, and detailed description of them is therefore omitted.

  After performing fingertip detection in step S603, in step S1301 the gesture recognition unit 409 checks whether a touch count, described later, is equal to or less than a predetermined value. Here, the touch count is a numerical value indicating how many times the plane has been touched since the processing of the gesture recognition unit 409 started; it is incremented and stored in the RAM 303 when a touch gesture is determined to have occurred in step S607. If the count is equal to or less than the predetermined value, the process proceeds to step S1302; otherwise, the process proceeds to step S606.

  In step S1302, the gesture recognition unit 409 checks whether the fingertip position is at or below a predetermined height, namely the height 1412 in FIG. 14. This height must be set larger than the height 1406 used to avoid noise. Since the height 1412 is used to determine that the finger has approached the plane 1407, it is set larger than the height 1406 but lower than the height of the finger during normal operation, for example to about twice the height 1406. When the height 1406 is set to about 5 mm, the height 1412 may be set to 10 to 20 mm. If the fingertip is at or below the predetermined height, the process proceeds to step S1303; if it is higher, the process proceeds to step S606.

  In step S1303, the gesture recognition unit 409 stores the finger length. Specifically, it obtains the three-dimensional finger vector 1411 by the same method as step S901 described in the second embodiment, and stores the length of this vector in a predetermined area of the RAM 303.

  The processes in steps S1301 to S1303 are executed until the touch count exceeds the predetermined number. Alternatively, the three-dimensional finger vector may be acquired that number of times and the average length used.

  When the direction of the fingertip touching the operation surface is specified in step S608, in step S1305 the gesture recognition unit 409 estimates the three-dimensional fingertip position from the angle formed by the finger and the plane. Specifically, the virtual three-dimensional finger vector 1414 obtained in step S1103 is extended, while keeping its start point fixed, to the finger length obtained in steps S1301 to S1303. The extended three-dimensional finger vector is the vector 1416, and its tip 1417 is the true three-dimensional fingertip point. Using this true three-dimensional fingertip point, the touch position can be determined in the subsequent steps as in the first and second embodiments.
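A sketch of this length-based extension follows; the base point, the clipped vector, and the stored 35 mm finger length are invented values.

```python
import numpy as np

def fingertip_from_stored_length(finger_base, clipped_vec, stored_length_mm):
    """Stretch the finger vector, shortened because its tip was removed by the noise
    margin, to the finger length stored before the touch; the new end point is the
    estimated true 3D fingertip."""
    base = np.asarray(finger_base, dtype=float)
    v = np.asarray(clipped_vec, dtype=float)
    return base + v * (stored_length_mm / np.linalg.norm(v))

print(fingertip_from_stored_length([80.0, 50.0, 24.0], [12.0, 0.0, -9.0], 35.0))
# [108. 50. 3.] -- the 15 mm clipped vector is stretched to the stored 35 mm length
```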

  The present embodiment assumes the case where the above-described plane threshold is constant over the plane and is equal to or greater than a predetermined value. Depending on the environment, however, the sensitivity of the sensor varies with the location on the plane, so the plane threshold (the height 1406 in FIG. 14) may be changed for each location. In that case, whether the true three-dimensional fingertip position is estimated can differ from location to location. For such a case, a threshold value for each location on the plane is stored in advance, where a location is specified by dividing the operation plane into areas. Then, as shown in step S1501 of the flowchart of FIG. 15, whether the threshold value of the plane at the touched position is equal to or smaller than a predetermined value may be determined, and the estimation performed accordingly. Similarly, when the true three-dimensional fingertip position is estimated from the RGB image, the processing may be switched according to the threshold value for each location on the plane.

  In the above processing, the finger length is stored when the fingertip falls below the predetermined height 1412. Alternatively, the finger length may be stored by having the user hold the fingertip over the distance image sensor at first activation.

  In the flowchart, the true three-dimensional fingertip position is estimated in step S1305 and the touch position is then determined in steps S902 and S609, but this order may be reversed. First, while the fingertip is not touching, the correction amount, that is, the position of the belly of the finger, is calculated using the same processing as in steps S902 and S609. In step S1303, in addition to the finger length, the length from the base of the finger to the belly of the finger is stored. After the touch gesture is detected, the same processing as in step S1305 may be performed using the stored length to estimate the accurate touch position.

  In the above processing, the method of estimating the accurate touch position using the angle and length of the finger has been described. Alternatively, the accurate touch position may be estimated by storing the trajectory of the fingertip. FIG. 14C is a diagram schematically showing how the fingertip position at the time of touching is estimated using the trajectory of the fingertip position. Positions 1421, 1422, and 1423 represent finger positions that are consecutive in time series immediately before the touch.

  The position 1424 represents the finger at the predicted touch position. Since the fingertip at this time is below the height 1406 for avoiding noise, the correct fingertip position cannot be found directly. The trajectory 1425 represents the trajectory of the fingertip, and the trajectory 1426 its predicted continuation. Here, a threshold is set at a predetermined position 1420 higher than the noise-avoidance height 1406. The trajectory of the fingertip is stored in the RAM 303 until the coordinate value in the height direction of FIG. 14C falls below the threshold 1420, and is used to predict the subsequent trajectory of the fingertip. The trajectory may be stored, for example, as a three-dimensional straight line connecting the current fingertip position and the previous fingertip position. In this case, taking the direction vector of the trajectory along that straight line, the point where the direction vector intersects the plane 1408 of the document table (or a virtual plane provided at a predetermined height above the plane of the document table) is the predicted fingertip point.
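The two-point straight-line prediction amounts to a simple line and plane intersection, as in the following sketch; the trajectory points and the plane height are assumed values.

```python
import numpy as np

def predict_fingertip_on_plane(prev_pt, curr_pt, plane_z=0.0):
    """Extrapolate the line through the previous and current fingertip points down to
    the plane z = plane_z and return the intersection as the predicted fingertip."""
    prev_pt = np.asarray(prev_pt, dtype=float)
    curr_pt = np.asarray(curr_pt, dtype=float)
    d = curr_pt - prev_pt                     # direction of the descending trajectory
    if d[2] == 0.0:
        raise ValueError("trajectory is parallel to the plane")
    t = (plane_z - curr_pt[2]) / d[2]
    return curr_pt + t * d

print(predict_fingertip_on_plane([90.0, 50.0, 22.0], [95.0, 50.0, 14.0]))
# [103.75  50.    0.  ] -- where the fingertip is expected to meet the document table
```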

  Alternatively, not only the current and previous two points but a predetermined number of the most recent fingertip positions may be stored in the RAM 303, and an approximate curve passing through them may be obtained three-dimensionally. In this case, the point at which the three-dimensional curve intersects the plane 1408 of the document table (or a virtual plane provided at a predetermined height above the plane of the document table) is the predicted fingertip point. The virtual plane provided at a predetermined height above the plane 1408 of the document table is not shown; it is a plane set above the actual document-table plane 1408 by the thickness of the finger, in consideration of that thickness. Once the fingertip position is estimated, the touch position (the position of the belly of the finger) can be obtained using the methods described so far.
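For the multi-point variant, one possible sketch fits low-degree polynomials to the stored trajectory and finds where the fitted curve reaches the plane; the sample trajectory and the quadratic degree are assumptions, not part of the embodiment.

```python
import numpy as np

def predict_from_curve(recent_pts, plane_z=0.0, degree=2):
    """Fit x(t), y(t), z(t) through the most recent fingertip positions (t = frame index)
    and return the point where the fitted curve first reaches z = plane_z after the
    last stored frame; returns None if the fitted curve never reaches the plane."""
    pts = np.asarray(recent_pts, dtype=float)
    t = np.arange(len(pts), dtype=float)
    cx, cy, cz = (np.polyfit(t, pts[:, i], degree) for i in range(3))
    cz = cz.copy()
    cz[-1] -= plane_z                          # roots of z(t) - plane_z = 0
    hits = sorted(r.real for r in np.roots(cz)
                  if abs(r.imag) < 1e-9 and r.real >= t[-1])
    if not hits:
        return None
    return np.array([np.polyval(cx, hits[0]), np.polyval(cy, hits[0]), plane_z])

trajectory = [[80.0, 50.0, 28.0], [85.0, 50.0, 20.2], [90.0, 50.0, 12.8], [95.0, 50.0, 5.8]]
print(predict_from_curve(trajectory))          # roughly [99.4, 50.0, 0.0]
```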

  In the method described above, the fingertip position at the time of touching is first estimated using the trajectory of the fingertip, and the position of the belly of the finger is then obtained; this order may also be reversed. That is, the position of the belly of the finger may be estimated every frame using the methods described so far, and the touch position may then be estimated from the trajectory of the finger-belly position.

  Further, the method of always storing the trajectory of the finger has been described above. However, to avoid consuming CPU performance, storage of the trajectory may be started only when the finger falls below a predetermined height. In FIG. 14C, for example, a threshold may be set at the height 1412, and storage of the finger trajectory may be started when the fingertip falls below that threshold.

Furthermore, as a simpler way of obtaining the trajectory of the finger, the fingertip position may be predicted from a straight line connecting two points at predetermined heights. For example, the coordinates of the fingertip at the moments it crosses, in order from the top, the threshold heights 1403 and 1420 in FIG. 14C are stored, and the straight line connecting these coordinates three-dimensionally is obtained. The point where this straight line intersects the plane 1408 of the document table may then be taken as the predicted fingertip point.
With the above processing, an accurate touch position can be estimated.

[Other Examples]
The present invention can also be realized by the following processing: software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or CPU, MPU, or the like) of the system or apparatus reads and executes the program.

DESCRIPTION OF SYMBOLS 101 Camera scanner, 201 Controller part, 202 Camera part, 204 Document stand, 207 Projector, 208 Distance image sensor part

Claims (24)

  1. A user interface device that identifies operations performed on the operation surface,
    Acquisition means for acquiring a three-dimensional image of a region of a three-dimensional space having the operation surface and the operation surface as a bottom surface;
    Extracting means for extracting a hand region from the three-dimensional image;
    First specifying means for specifying the position of a fingertip from the hand region;
    Detecting means for detecting a touch on the operation surface based on the position of the operation surface and the fingertip included in the three-dimensional image;
    A second specifying means for specifying an orientation of the fingertip from the hand region when a touch on the operation surface is detected;
    A user interface device, comprising: a determining unit configured to determine, as a touch position, a position obtained by shifting the position of the fingertip by a predetermined amount in a direction opposite to the direction of the fingertip on the operation surface.
  2. The second specifying means specifies the direction of the fingertip from the hand region projected on the operation surface,
    The determination means determines, as a touch position, a position obtained by shifting the position of the fingertip of the hand region projected onto the operation surface by a predetermined amount in a direction opposite to the direction of the fingertip on the operation surface. The user interface device according to 1.
  3. The second specifying means specifies the orientation of the fingertip in the three-dimensional image;
    The determination means projects, onto the operation surface, a position obtained by shifting the position of the fingertip by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position. The user interface device described above.
  4. Means for acquiring a color image of a region of a three-dimensional space having the operation surface and the operation surface as a bottom surface;
    A third specifying means for specifying a position of the fingertip on the operation surface based on a color of a hand area in the color image;
    The determining means sets, as a corrected fingertip position, the position obtained by extending the fingertip, in the direction of the fingertip of the three-dimensional image specified by the second specifying means, to the fingertip position on the operation surface specified by the third specifying means, projects onto the operation surface a position obtained by shifting the corrected fingertip position by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position. The user interface device according to claim 3.
  5.   The second specifying means determines the center of the palm or back of the hand, and specifies the direction of the fingertip as the direction from the center of the palm or back of the hand toward the fingertip. The user interface device according to any one of the above.
  6.   6. The user interface device according to claim 1, wherein the determination unit determines a center of gravity of a pixel of a finger included in a predetermined peripheral region of the position of the fingertip as a touch position. .
  7. A determination unit that determines that the fingertip is within a predetermined distance from the operation surface when a touch on the operation surface is not detected by the detection unit;
    Measuring means for measuring the length of the finger from the hand region when the fingertip is below the predetermined distance;
    The determining means sets, as a corrected fingertip position, the position of the fingertip shifted, in the direction of the fingertip of the three-dimensional image specified by the second specifying means, to the finger length measured by the measuring means, projects onto the operation surface a position obtained by shifting the corrected fingertip position by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position. The user interface device according to claim 3.
  8. The measuring means calculates a curvature of the outer shape from the outer shape of the hand region, specifies a point where the curvature is smaller than a predetermined value as a fingertip position,
    From the image obtained by projecting the three-dimensional image onto the operation surface, a location where the hand region has entered the operation surface is specified, and a position where the width of the hand region is smaller than a predetermined threshold is designated from the location. The user interface device according to claim 7, wherein the user interface device is specified as an original position of the finger, and a length from the original position of the finger to the position of the fingertip is measured as the length of the finger.
  9.   The user interface device according to any one of claims 4 to 8, wherein, if a flatness threshold value stored in advance for each region of the operation surface is greater than a predetermined value, the determination means determines the corrected fingertip position, projects onto the operation surface a position obtained by shifting the corrected fingertip position by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position, and otherwise projects onto the operation surface a position obtained by shifting the position of the fingertip specified by the first specifying means by a predetermined amount in the direction opposite to the direction of the fingertip, and determines this position as the touch position.
  10.   The user interface device described above, wherein the acquisition means includes a distance sensor, and the three-dimensional image is acquired based on a distance image having, for each pixel, a distance measured by the distance sensor.
  11.   The user interface device according to claim 1, wherein the extraction unit extracts a region of a predetermined color from the three-dimensional image as the hand region.
  12.   The user interface device described above, wherein the first specifying means calculates a curvature from the outer shape of the hand region and specifies a point where the curvature is smaller than a predetermined value as the position of the fingertip.
  13.   The first specifying means calculates a curvature of the outer shape based on a circle or an ellipse that fits the outer shape of the hand region, and when the curvature is smaller than a predetermined value and is inside the hand region, The user interface device according to any one of claims 1 to 11, wherein a center point of a contour point that fits a circle or an ellipse is specified as the position of the fingertip.
  14.   When the radius of the smallest circle surrounding the adjacent finite number of contour points is smaller than a predetermined value among the contour points of the outline of the hand region, the first specifying unit is configured to determine the center of the finite number of contour points. The user interface device according to claim 1, wherein a point is specified as the position of the fingertip.
  15.   The first specifying means specifies a place where the hand area has entered the operation surface, and specifies the position of the hand area farthest from the place as the position of the fingertip. The user interface device according to any one of 1 to 11.
  16.   The first specifying means sets the touch position according to a distance from a position where the hand area has entered the operation surface to the position of the fingertip so that the predetermined amount increases as the distance increases. The user interface device according to claim 1, wherein the user interface device is specified.
  17.   The second specifying unit specifies a location where the hand region has entered the operation surface from an image obtained by projecting the three-dimensional image onto the operation surface, and the width of the hand region is a predetermined width from the location. The user interface device according to any one of claims 1 to 16, wherein a direction from a position smaller than a threshold value to a position of the fingertip is specified as a vector indicating a direction and length of the fingertip.
  18.   The second specifying means determines, from the hand region projected on the operation surface, a region where the width of the hand region is smaller than a predetermined value as a finger region, and from the end where the finger region is specified The user interface device according to any one of claims 1 to 16, wherein a direction toward the fingertip is specified as an orientation of the fingertip.
  19.   The second specifying means determines, from an image obtained by projecting the three-dimensional image on the operation surface, a location where the hand region has entered the operation surface as a root of the hand region, and the fingertip from the base The user interface device according to any one of claims 1 to 16, wherein a direction up to is specified as an orientation of the fingertip.
  20.   The user interface device according to any one of claims 1 to 19, wherein the determining means determines the touch position using, as the predetermined amount, the distance from the fingertip to the center of a circle fitted to the outer shape of the hand region, or to a point on the fingertip direction side of a focal point of an ellipse fitted to the outer shape of the hand region.
  21.   The determining means determines a touch position using a distance from the fingertip position to the center of gravity of a contour point surrounded by a minimum circle including the contour point specified as the fingertip position as a central point, as the predetermined amount. The user interface device according to claim 1, wherein the user interface device is a device.
  22. A user interface device that identifies operations performed on the operation surface,
    Obtaining means for obtaining a three-dimensional image of a region in a three-dimensional space having the operation surface as a bottom surface;
    Estimating means for estimating the position of the belly of the finger from the three-dimensional image;
    A user interface device comprising:
  23. A method of controlling a user interface device that identifies an operation performed on an operation surface,
    An acquisition step of acquiring a three-dimensional image of a region in a three-dimensional space having the operation surface as a bottom surface;
    And a estimating step of estimating a position of a finger belly from the three-dimensional image.
  24. Computer
    Obtaining means for obtaining a three-dimensional image of a region in a three-dimensional space having the operation surface as a bottom surface;
    A program for functioning as an estimation means for estimating the position of the belly of a finger from the three-dimensional image.
JP2015147083A 2014-08-25 2015-07-24 User interface device, method and program Pending JP2016139396A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2014170886 2014-08-25
JP2014170886 2014-08-25
JP2015010680 2015-01-22
JP2015010680 2015-01-22

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/818,770 US10310675B2 (en) 2014-08-25 2015-08-05 User interface apparatus and control method

Publications (2)

Publication Number Publication Date
JP2016139396A true JP2016139396A (en) 2016-08-04
JP2016139396A5 JP2016139396A5 (en) 2018-09-06

Family

ID=56560408

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2015147083A Pending JP2016139396A (en) 2014-08-25 2015-07-24 User interface device, method and program

Country Status (1)

Country Link
JP (1) JP2016139396A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017170027A1 (en) * 2016-03-30 2017-10-05 セイコーエプソン株式会社 Image recognition apparatus, image recognition method, and image recognition unit
JP2018055257A (en) * 2016-09-27 2018-04-05 キヤノン株式会社 Information processing device, control method thereof, and program

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0991079A (en) * 1995-09-28 1997-04-04 Toshiba Corp Information input device and image processing method
JP2009042796A (en) * 2005-11-25 2009-02-26 Panasonic Corp Gesture input device and method
US20080273755A1 (en) * 2007-05-04 2008-11-06 Gesturetek, Inc. Camera-based user input for compact devices
JP2010526391A (en) * 2007-05-04 2010-07-29 ジェスチャー テック,インコーポレイテッド Camera-based user input for compact devices
JP2009259117A (en) * 2008-04-18 2009-11-05 Panasonic Electric Works Co Ltd Mirror system
JP2010176565A (en) * 2009-01-30 2010-08-12 Denso Corp Operation device
US20130063336A1 (en) * 2011-09-08 2013-03-14 Honda Motor Co., Ltd. Vehicle user interface system
WO2014080829A1 (en) * 2012-11-22 2014-05-30 シャープ株式会社 Data input device
US20140204120A1 (en) * 2013-01-23 2014-07-24 Fujitsu Limited Image processing device and image processing method
JP2014143548A (en) * 2013-01-23 2014-08-07 Fujitsu Ltd Image processing apparatus, image processing method, and image processing program

Legal Events

A521 Written amendment (JAPANESE INTERMEDIATE CODE: A523), effective date: 20180724
A621 Written request for application examination (JAPANESE INTERMEDIATE CODE: A621), effective date: 20180724
A977 Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007), effective date: 20190318
A131 Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131), effective date: 20190322
A521 Written amendment (JAPANESE INTERMEDIATE CODE: A523), effective date: 20190521
A02 Decision of refusal (JAPANESE INTERMEDIATE CODE: A02), effective date: 20190802
A521 Written amendment (JAPANESE INTERMEDIATE CODE: A523), effective date: 20191101
A911 Transfer of reconsideration by examiner before appeal (zenchi) (JAPANESE INTERMEDIATE CODE: A911), effective date: 20191111