CN101807114B

CN101807114B - Natural interactive method based on three-dimensional gestures

Info

Publication number: CN101807114B
Application number: CN201010139526XA
Authority: CN
Inventors: 潘志庚; 郭康德; 邵兴旦; 李扬; 李光霞
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2010-04-02
Filing date: 2010-04-02
Publication date: 2011-12-07
Anticipated expiration: 2030-04-02
Also published as: CN101807114A

Abstract

The invention discloses a natural interactive method based on three-dimensional gestures, the method utilizes a computer vision technology to obtain local features of a hand by foreground segmentation and fingertip detection, and the local features include fingertip position, palm contour, palm center position and the like. By adopting the stereoscopic vision technology, the hand features such as the fingertip position, the palm center position and the like are reconstructed in the three-dimensional space. The finger tip position, the palm center position and the like in the three-dimensional space are parameterized, and a three-dimensional interactive model based on points, lines and planes is defined, thus realizing various three-dimensional gestures in the three-dimensional space, such as fingertip clicking, fingertip squeezing, palm overturning, fingertip directing and the like. The method needs only two ordinary network cameras to meet the demands of real-time man-machine interaction.

Description

A kind of natural interactive method based on three-dimension gesture

Technical field

The present invention relates to computer vision and human-computer interaction technology, relate in particular to a kind of man-machine interaction method based on three-dimension gesture.

Background technology

Traditional two-dimentional human-computer interaction technology (based on mouse/keyboard/game paddle/window interface), aspect its Software tool and interactive mode two, what developed is ripe relatively and perfect.But along with the continuous upgrading of man-machine interaction application demand, comprise the pursuit day by day that user's (being called the player in the recreation) experiences virtual scene, it is more and more obvious that the limitation of traditional two-dimentional man-machine interaction mode also embodies, and mainly shows:

From information representation capability, can not be with a kind of multi-dimensional relation of natural method representation complexity.When the expression multi-dimensional relation, have to use multi-windowed environment to represent application message, the division of multiwindow makes the user need bigger cognitive effort to set up the cognitive model of a complete sum unanimity.

On interactive mode, can't carry out nature, reasonably express with traditional interactive mode for the application in some fields (as the virtual world roaming etc.).When needs carry out three-dimension interaction with object, can only realize by the combination of different input modes.This method is very unnatural on the one hand, and has increased user's mutual difficulty, has increased the weight of the integration work of interactive task simultaneously yet.

Three-dimensional human-computer interaction technology is compared with the former limitation and to be had its born advantage, can satisfy the needs of three-dimension interaction naturally, mainly shows:

From information representation capability, can be enough a kind of multi-dimensional relation of natural method representation complexity.Three dimensional representation at certain application internal information is comparatively directly perceived, and is comparatively approaching with the object in the real world, and people are easier to this expression of perception and rise to a kind of rational knowledge.

On interactive mode, the interactive mode of people and object in the three-dimensional man-machine interaction mode simulating reality world, and allow real people or the direct and virtual three dimensional object of thing carry out alternately.This interactive mode is a nature and clearly for the expression of interaction semantics, and actual situation merges and can produce the mutual world simultaneously, can make the experience of man-machine interaction have more attractive force.

Summary of the invention

The objective of the invention is to overcome the deficiencies in the prior art, a kind of natural interactive method based on three-dimension gesture is provided.

The objective of the invention is to be achieved through the following technical solutions: a kind of natural interactive method based on three-dimension gesture comprises the steps:

(1) inputted video image from two cameras uses online training Face Detection algorithm to obtain the foreground image of hand to image.

(2) the hand foreground image to obtaining uses the finger tip detection method to detect fingertip location.

(3) reconstruction of three-dimensional fingertip location, and by three-dimensional fingertip location definition three-dimension gesture interaction semantics.

The invention has the beneficial effects as follows: the present invention is based on the natural interactive method utilization computer vision technique of three-dimension gesture, detect by foreground segmentation, finger tip and obtain the hand local feature, these local features comprise fingertip location, palm profile, position, the centre of the palm etc.The utilization stereovision technique is rebuild hand-characteristics such as fingertip location, position, the centre of the palm at three dimensions.Three-dimensional fingertip location, position, the centre of the palm etc. is carried out parametrization to be handled, definition one cover is based on the three-dimension interaction model of point, line, surface, by this set of model, realize multiple three-dimension gestures such as three-dimensional finger tip click, finger tip gripping, palm upset, finger tip sensing.The present invention only needs two common IP Camera, promptly can satisfy the needs of real time human-machine interaction.

Description of drawings

Fig. 1 is based on the system architecture diagram of three-dimension gesture natural interactive method;

Fig. 2 is a palm profile synoptic diagram;

Fig. 3 is the computing method synoptic diagram of K vector.

Embodiment

The natural interactive method that the present invention is based on three-dimension gesture carries out with dummy object naturally by three-dimension gesture in virtual reality or augmented reality environment alternately.Comprise the steps:

One, inputted video image from two cameras uses online training Face Detection algorithm to obtain the foreground image of hand to image.Online inspection training Face Detection algorithm utilizes the cluster characteristic of people's the colour of skin at the YCbCr color space, by judging that the particular color scope is partitioned into the hand foreground graphic in the video image.In the Face Detection process, used the online training method of the current colour of skin of real-time learning to reduce the influence of illumination variation to Face Detection.

Two, the hand foreground image to obtaining uses the finger tip detection method to detect fingertip location.The finger tip detection method has merged single finger tip detection and many finger tips detect.Under the situation of illumination variation, also can find the position of finger tip accurately, have very strong stability and environmental suitability.

Finger tip detection method step is as follows:

1, extracts finger foreground image profile.

2, seek the longest profile of length, this profile is the finger contours zone, under the bad situation of foreground segmentation, still can find the finger tip point like this.

Foreground segmentation is made mistakes if 3 profile length, are then thought finger less than a threshold value (distance dependent with hand and video camera is made as 100), carries out foreground segmentation again, otherwise carries out next step.

4, the profile that back is obtained carries out the polygonal approximation processing, can reduce some noise spots like this.The defective of calculating profile (part of the wide concave of finger wheel, quantity as shown in Figure 2) is if quantity is 0, changeing v continues to handle, otherwise demarcate all convex body defective locations, obtain their starting point, end point and depth point, starting point and end point are candidate's finger tip points.Then we judge the angle between starting point, end point, the depth point, if less than 120 degree, then starting point and end point are fingertip location.

5, calculate the K vector value of each point, and obtain the center of contour area.If ask finger tip to press K vector descending order for the first time, obtain the individual K vector of preceding N (being made as 10) extreme point, and relatively each extreme point is to the distance of center, the maximum point of distance is finger tip point, the position of writing down finger tip point simultaneously.Otherwise after obtaining top n K vector extreme point, comparing the distance of N extreme point and last registration finger tip point, if less than threshold value (being made as 5), then is the finger tip point, otherwise compares the distance of N extreme point to the center, and the maximum point of distance is the finger tip point.Write down fingertip location at last.

The K vector is meant: for each the pixel v on the profile, with this some position starting point, being the some position v1 of K according to the profile clockwise direction from its distance, is v2 apart from it for the point of K counterclockwise by profile, and then the K vector of v is:

\frac{(v_{1} - v)}{| | v_{1} - v | |} \times \frac{(v_{2} - v)}{| | v_{2} - v | |}

Fig. 3 has shown the computing method of K vector.

Three, reconstruction of three-dimensional fingertip location, and by three-dimensional fingertip location definition three-dimension gesture interaction semantics.

1, calibrating camera external parameter at first, thus the model view matrix of OpenGL obtained.

2, utilize model view matrix and the finger tip obtain to detect the fingertip location that obtains, utilize the three-dimensional reconstruction algorithm in the computer vision, rebuild finger tip point based on the three-dimensional position in the world coordinate system of mark.

3, according to the three-dimensional finger tip point of rebuilding, define the three-dimension interaction semanteme, thereby realize three-dimension interaction.We define the three-dimension interaction semanteme by fingertip location and palm profile.The three-dimension interaction semanteme of realizing comprises: finger tip clicks, two finger gripping, finger tip sensings, three-dimensional finger tip rate parameterization, palm overturn, the both hands finger tip is apart from controlling etc.According to these interaction semantics, set up the point, line, surface three-dimension gesture interaction models of a cover based on hand-characteristic.

4, by the three-dimension gesture interaction semantics of definition, we can realize multiple three-dimensional applications.Such as the virtual portrait in the direct game, thus the man-machine interaction of realization natural harmony.

Claims

1. the natural interactive method based on three-dimension gesture is characterized in that, comprises the steps:

(1) inputted video image from two video cameras uses online training Face Detection algorithm to obtain the foreground image of hand to image;

(2) the hand foreground image to obtaining uses the finger tip detection method to detect fingertip location;

(3) reconstruction of three-dimensional fingertip location, and by three-dimensional fingertip location definition three-dimension gesture interaction semantics;

Wherein, described step (2) is specific as follows:

(A) extract finger foreground image profile;

(B) seek the longest profile of length, this profile is the finger contours zone, and foreground segmentation is made mistakes if profile length, is then thought finger less than threshold value, carries out foreground segmentation again;

(C) profile that obtains is carried out polygonal approximation and handle, calculate the defects count of profile, if quantity is 0, changeing (D) continues to handle, otherwise demarcate all convex body defective locations, obtain starting point, end point and the depth point of convex body defect part, starting point and end point are candidate's finger tip points; Then judge the angle between starting point, end point, the depth point, if less than 120 degree, then starting point and end point are fingertip location;

(D) calculate the K vector value of each point, and obtain the center of contour area; If ask finger tip to press K vector descending order for the first time, obtain top n K vector extreme point, and relatively each extreme point is to the distance of center, the maximum point of distance is finger tip point, the position of writing down finger tip point simultaneously; Otherwise after obtaining top n K vector extreme point, the distance of N extreme point and last registration finger tip point relatively, if the distance of maximum less than threshold value, then the point of this distance maximum is the finger tip point, otherwise compare the distance of N extreme point to the center, the maximum point of distance is the finger tip point; Write down fingertip location at last;

The K vector is meant: for each the pixel v on the profile, being starting point with this point, is v1 from its distance for the point of K according to the profile clockwise direction, is that the point of K is v2 apart from it counterclockwise by profile, and then the K vector of v is:

\frac{(v_{1} - v)}{| | v_{1} - v | |} \times \frac{(v_{2} - v)}{| | v_{2} - v | |};

Described step (3) is specific as follows:

(a) calibrating camera external parameter at first, thus the model view matrix of OpenGL obtained;

(b) utilize model view matrix and the finger tip obtain to detect the fingertip location that obtains, utilize the three-dimensional reconstruction algorithm in the computer vision, rebuild finger tip point based on the three-dimensional position in the world coordinate system of mark;

(c) according to the three-dimensional finger tip point of rebuilding, define the three-dimension gesture interaction semantics, thereby realize three-dimension interaction;

(d) by the three-dimension gesture interaction semantics of definition, realize three-dimensional applications.