CN104656890A - Virtual realistic intelligent projection gesture interaction all-in-one machine - Google Patents


Info

Publication number
CN104656890A
CN104656890A (application CN201410756006.1A)
Authority
CN
China
Prior art keywords
dimensional
finger
hand
virtual
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410756006.1A
Other languages
Chinese (zh)
Inventor
费越
何安莉
陈柯臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ling Shou Science And Technology Ltd
Original Assignee
Hangzhou Ling Shou Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ling Shou Science And Technology Ltd filed Critical Hangzhou Ling Shou Science And Technology Ltd
Priority to CN201410756006.1A priority Critical patent/CN104656890A/en
Publication of CN104656890A publication Critical patent/CN104656890A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a virtual reality intelligent projection gesture interaction all-in-one machine comprising a microcomputer, a 3D object and human motion sensing device connected to the microcomputer, and a projection device. The 3D object and human motion sensing device senses objects, recognizes three-dimensional objects and gestures, and inputs the recognition signals to the microcomputer; the projection device displays the content output by the microcomputer. Mounted on a wall, the all-in-one machine senses the three-dimensional motion of a hand, a finger or a pen, the 2.5D motion of a finger or pen on the plane (the wall), and the three-dimensional motion of other objects. Combined with projection it fuses the virtual and the real, supporting interaction of the hand or pen with a displayed 2D interface as well as with 3D virtual objects, and it can run a variety of software on the terminal. It thereby overcomes the drawbacks of conventional teaching, in which television and screen displays harm the eyes, confine children to two-dimensional content and lack human-machine interaction, affecting the development of the brain.

Description

Virtual reality intelligent projection gesture interaction all-in-one machine and interaction realization method
Technical field
The invention discloses a virtual reality intelligent projection gesture interaction all-in-one machine and an interaction realization method.
Background art
In the contemporary family, material life is abundant. From newborns to children of about ten, homes are piled with toys. Children's education and entertainment come mainly from books and ordinary toys, which are dull, take up considerable space, and are troublesome to discard and replace. Scribbling comes naturally to children, but it damages walls and wastes paper. Ordinary toys and books are inflexible: they do little to develop a child's imagination and powers of association, cannot present a vivid environment, and are hard for a child to learn from independently. In better-off families children start to play with tablet computers, and tablets with educational game software offer a more modern form of edutainment, but tablets are inconvenient for children to use: they must be held in both hands or used lying on a bed, and prolonged close viewing of the display damages children's eyesight. The touch interaction of a children's tablet is a virtual two-dimensional interaction, very different from real three-dimensional interaction, which affects the development of the child's hands and brain. At present there is no electronic interactive system that lets children interact in more than two dimensions, and display-screen based systems (televisions, tablets, display screens with touch frames) also pose safety hazards: a child in close contact can shatter the screen and be injured, the radiation of a display screen is unhealthy, and the impact on vision has many parents and teachers looking for new forms of teaching. In addition, most current projectors are merely simple projection display devices rather than intelligent terminals: they have no operating system and no means of human-machine interaction. Smart projectors are beginning to appear, but they merely give the projector the functions of a television, playing video directly through the projector.
Summary of the invention
In view of the above technical deficiencies, the present invention proposes a virtual reality intelligent projection gesture interaction all-in-one machine and an interaction realization method. The device uses projection as its display, turning a wall into a lively world. Through an intelligent system combining three-dimensional gesture and object recognition, the user interacts with the computer in two, two-and-a-half and three dimensions. As a terminal it can run existing application software (for example existing Android software) as well as software developed specially for this system. The device of the present invention solves all of the problems described above and provides a brand-new, modern platform for entertainment and education.
To solve the above technical problems, the technical scheme of the present invention is as follows:
The virtual reality intelligent projection gesture interaction all-in-one machine comprises a microcomputer, a 3D object and human motion sensing device connected to the microcomputer, and a projection device. The 3D object and human motion sensing device senses objects, performs recognition of three-dimensional objects and gestures, and inputs the recognition signals to the microcomputer; the projection device displays the content the microcomputer needs to display. The all-in-one machine is mounted on a wall.
Further, the microcomputer selects interaction content from an interaction content library according to the recognition signals, executes it, and displays it by projection.
The virtual reality intelligent projection gesture interaction realization method comprises the following steps:
31) a user performs a standard action, and the 3D object and human motion sensing device performs three-dimensional gesture recognition and detects the three-dimensional information of the hand and fingers;
32) the 3D object and human motion sensing device collects the motion information of the hand and fingers over a period of time and stores it as a template;
33) the user performs an action, and the 3D object and human motion sensing device performs three-dimensional gesture recognition and detects the three-dimensional information of the hand and fingers;
34) the three-dimensional information of the hand and fingers detected in step 33) is compared with the template of step 32), and a preset threshold is used to judge whether the action in step 33) conforms to the standard action of step 31).
Further, when a large number of templates have been recorded for the same standard action, a motion model is trained by machine learning; the motion model is used to judge whether the user's gesture action conforms to the standard.
Further, the method also comprises establishing a three-dimensional rendering engine and a three-dimensional physics simulation engine so as to build a virtual three-dimensional world. The three-dimensional rendering engine converts the virtual three-dimensional world into images, which are displayed through the projection device under the control of the microcomputer; the rendering engine renders all objects in the virtual world, including the hand. The three-dimensional physics simulation engine computes the motion of the objects driven in the virtual three-dimensional world, simulating the forces on and motion of objects in reality. According to the three-dimensional configuration of the hand and fingers detected by the three-dimensional gesture recognition system, a three-dimensional hand model is built in the virtual three-dimensional world; the physics simulation engine computes the contact between the hand and a virtual object and changes the motion of the virtual three-dimensional object, realizing interaction between the fingers and the virtual object.
Further, the method also comprises a step of recognizing objects, using one or more of the following:
A) Two-dimensional images of each object are captured in advance from multiple directions; the information in each image is extracted and stored in a database, and every kind of object is likewise captured from multiple directions and stored in the database. A real-time video of the user's object is captured; the video is a stream of pictures, and each picture is converted into feature information by the same feature-extraction method. The database is searched for entries similar to the user's object to obtain the best-matching object x_t at time t; the best matches obtained over the previous n frames (x_{t-n}, ..., x_{t-1}) are compared with x_t to obtain a more stable matching result, object y. The feature information of the current picture is compared with the pictures of object y from each direction in the database to find the best-matching direction, which is compared with the directions obtained over the previous m frames to obtain a more stable direction.
The actual size of the object is obtained by comparing the size of the object in the picture with the size of the object in the database.
B) For each reference object a three-dimensional image is captured; the result is a depth map, which is converted into a three-dimensional point cloud and stored in the database. For the user's object a depth map is likewise captured and matched against the objects in the database to find the closest object and the closest rotation angle.
C) A two-dimensional code pattern is attached to the object and an image is captured with a two-dimensional camera. The kind of object is obtained by recognizing the serial number of the code; the three-dimensional position and orientation of the code in the real physical world are derived by analysing its position, size, rotation and distortion in the image, and from these the three-dimensional position and orientation of the object are obtained.
Further, the three-dimensional gesture recognition comprises an automatic wall and floor detection method, which comprises placing a physical marker in the sensing region of the 3D object and human motion sensing device; the imaging sensor of the sensing device captures an image of the physical marker and the marker is recognized, thereby obtaining calibration data for the touch interaction surface.
Further, the three-dimensional interaction method of the 3D object and human motion sensing device comprises using face recognition and tracking algorithms to identify the 3D position E(x, y, z) of the eyes in the coordinate system of the sensing device, and identifying the 3D position T(x, y, z) of the hand in the coordinate system of the sensing device together with the hand's actions. In a calibration phase the 3D object and human motion sensing device senses and records the 3D information of the projected screen in the coordinate system of the sensing device. The 3D position of the eyes is transformed from the coordinate system of the sensing device into the coordinate system used by the screen for presenting virtual 3D objects, giving Es(x, y, z), and this information is sent to the microcomputer and the 3D interactive application, which presents the virtual 3D object according to the eye position Es(x, y, z). In addition, the sensing device transforms the 3D position of the hand from its own coordinate system into the screen coordinate system used for presenting virtual 3D objects, giving Ts(x, y, z), and sends this information to the microcomputer and the 3D interactive application; the application uses the Ts(x, y, z) information to let the user interact with the virtual 3D object.
The beneficial effects of the present invention are as follows. The device senses the three-dimensional motion of the hand, fingers and a pen, the 2.5D motion of a finger or pen on the plane (the wall), and the three-dimensional motion of other objects such as toys, and fuses the virtual and the real with projection. It supports interaction of the hand or pen with the displayed 2D interface and with 3D virtual objects, and can run various software on the terminal, such as education and entertainment software; it can manage collections of such software, provide an education software library classified by age and an education software distribution platform, and support remote information acquisition, content updates and real-time interaction between teachers and students. The system solves the problems of conventional teaching, in which television and screen displays harm the eyes, confine children to two-dimensional content and offer no human-machine interaction, affecting children's manipulative ability and the development of the brain.
Description of the drawings
Fig. 1 is an installation and arrangement diagram of the present invention;
Fig. 2 is a structure and principle framework diagram of the present invention;
Fig. 3 is a structural connection diagram of the present invention;
Fig. 4 is an installation arrangement and detection region diagram of the present invention;
Fig. 5 is a detection flow chart of the present invention;
Fig. 6 is a flow chart of the recognition method based on depth maps;
Fig. 7 shows a hardware design of the multi-view three-dimensional imaging system;
Fig. 8 shows another hardware design of the multi-view three-dimensional imaging system;
Fig. 9 is a flow chart of extracting hand and finger information from a two-dimensional image;
Fig. 10 schematically shows, for each imaging sensor, the process of finding a foreground object and recognizing its two-dimensional structure according to an exemplary embodiment;
Fig. 11 is a high-level flow chart of the process of computing the three-dimensional information of a foreground object and its subdivisions according to an exemplary embodiment;
Fig. 12 shows the association between individual fingers according to an exemplary embodiment;
Fig. 13 shows an embodiment of associating two skeleton lines;
Fig. 14 shows the 3D skeleton obtained according to an exemplary embodiment;
Fig. 15 shows the computation of the 3D boundary of the palm based on the 2D boundaries of the palm in two 2D images captured by two different imaging sensors;
Fig. 16 is an exemplary output of the hand skeleton computation;
Fig. 17 shows a model-based framework;
Fig. 18 is a flow chart of the process of automatically detecting the touch interaction surface by recognizing markers, according to an exemplary embodiment;
Fig. 19 is a flow chart of the process of automatically detecting a display screen, according to an exemplary embodiment;
Fig. 20 is a flow chart of the process of defining a virtual contact surface, according to an exemplary embodiment;
Fig. 21 is a flow chart of the process of converting the 3D information of a foreground object into 2.5D information, according to an exemplary embodiment;
Fig. 22 is a flow chart of the process of determining the distance d between a foreground object and the touch interaction surface;
Fig. 23 is a flow chart of a process for obtaining z', according to an exemplary embodiment;
Fig. 24 is a flow chart of another process for obtaining z', according to an exemplary embodiment;
Fig. 25 is a flow chart of another process for obtaining z', according to an exemplary embodiment;
Fig. 26 shows a handwriting process using the touch interaction surface;
Fig. 27 shows a process of displaying the hovering of a foreground object;
Fig. 28 schematically shows a user interacting with 3D content presented by a 3D display screen;
Fig. 29 is a flow chart of the method of recording and learning standard-action templates/models;
Fig. 30 is a flow chart of the method of collecting the user's actions and correcting them;
Fig. 31 shows the hand interacting directly with a virtual three-dimensional object through physical simulation;
Fig. 32 is an exemplary diagram of an augmented reality use scene;
Fig. 33 is a flow chart of building the interactive system database;
Fig. 34 is a flow chart of the method of computing the kind, position, orientation and size of an object by comparison;
Fig. 35 is a flow chart of the augmented reality interaction method.
Detailed description of the embodiments
The present invention is described further below in conjunction with the drawings and specific embodiments.
As shown in Figs. 1 to 4, this scheme consists of several components combining software and hardware.
The hardware is an integrated all-in-one box, comprising:
a vertical projection device; a hand, finger and three-dimensional object sensing and motion input device; and a personal computer mainboard with audio output. The all-in-one machine hangs at the top of a wall and projects downward, imaging on the wall below and on the floor; or it is placed at the corner of the wall and floor and projects upward, imaging on the wall above.
The software has the following components:
an operating system (Android), three-dimensional gesture recognition software (the finger-sensing system), and edutainment application software.
As shown in Fig. 5, the system capabilities are:
– sensing the three-dimensional motion of the hand, fingers and a pen, and the 2.5D motion of a finger or pen on the plane (the wall);
– three-dimensional sensing of other objects, such as toys, and fusing the virtual and the real with projection;
– interaction of the hand or pen with the displayed 2D interface, and interaction with 3D virtual objects;
– running various software, such as education and entertainment software;
– managing collections of education and entertainment software, an education software library classified by age and the like, and an education software distribution platform;
– remote information acquisition by teachers and students, content updates, and real-time interaction.
The three-dimensional object and gesture recognition scheme comprises specific methods and image acquisition hardware.
The implementation methods fall into two classes:
1.1 One class, shown in Fig. 6, uses depth-map acquisition hardware, such as hardware based on structured light or on photon time of flight, and uses depth-map processing algorithms to recognize the hand and objects.
1.2 The other class uses multi-view imaging hardware (multi-view cameras, stereo cameras) and multi-view imaging algorithms to recognize the hand and objects.
1.2.1 Hardware design of the multi-view three-dimensional imaging system
The sensing device may comprise multiple imaging sensors, such as cameras. An imaging sensor may be a visible-light imaging sensor, more sensitive to visible light, or an infrared (IR) imaging sensor, more sensitive to infrared light. The sensing device may also comprise one or more light sources providing illumination at a wavelength matched to the type of imaging sensor. A light source may be, for example, a light-emitting diode (LED) or a laser fitted with a diffuser. In some embodiments the light sources may be omitted and the imaging sensors sense ambient light reflected by the object or light emitted by the object.
Several implementations follow.
Figs. 7A and 7B schematically show an exemplary sensing device 300 according to an embodiment of the disclosure. The sensing device 300 comprises a housing 302, multiple imaging sensors 304 and one or more light sources 306. The imaging sensors 304 and light sources 306 are all formed in, on or outside the housing 302. Such a design is also referred to in this disclosure as the integrated design.
The sensing device 300 shown in Fig. 7A has one light source 306, and the sensing device 300 shown in Fig. 7B has six light sources 306. In the example of Fig. 7A the light source 306 is disposed between the imaging sensors 304; in the example of Fig. 7B the light sources 306 are evenly distributed on the housing 302 to provide a better illumination effect, such as wider coverage or more even illumination: for example, two light sources 306 between the two imaging sensors 304, two on the left half of the housing 302 and two on the right half.
In the accompanying drawings of the disclosure the light sources are shown as LEDs; as discussed above, other light sources, such as a laser fitted with a diffuser, may also be used.
In some embodiments illumination in the infrared band is required, which the naked eye may not see. In these embodiments the light sources 306 may comprise, for example, LEDs emitting infrared light, or LEDs emitting light over a wider band including visible light; in the latter case each light source 306 may, for example, be fitted with an infrared-pass filter (not shown) in front of it.
In some embodiments the sensing device 300 may also comprise infrared-pass filters (not shown) placed in front of the imaging sensors 304 to filter out visible light, and lenses (not shown) placed in front of the imaging sensors 304 to focus the light; an infrared-pass filter may be placed in front of a lens or between the lens and the imaging sensor 304. According to embodiments of the disclosure the sensing device 300 may also comprise a control circuit (not shown). The control circuit may control the operating parameters of the imaging sensors 304, such as shutter duration or gain, and the synchronization between two or more imaging sensors 304. In addition, the control circuit may control the illumination brightness of the light sources 306, their on/off state or illumination duration, and the synchronization between the light sources 306 and the imaging sensors 304. The control circuit may also perform other functions, such as power management, image data acquisition and processing, output of data to other devices (such as the computer 104), and reception of commands from other devices (such as the computer 104).
In some embodiments the sensing device 300 may also comprise one or more buttons configured to switch the sensing device 300 on or off, reset it, or force an environment recalibration; for example, a button may be configured to let the user force a manual calibration process to calibrate the touch interaction surface.
In some embodiments the sensing device 300 may also comprise one or more indicator lights showing the state of the sensing device 300, for example whether it is on or off, whether it is performing environment calibration, or whether it is performing touch interaction surface calibration.
In some embodiments the sensing device 102 may have multiple separate units, each with its own imaging sensor; such a design is referred to below as the split design. Fig. 8 shows an exemplary sensing device 500 with a split design according to an embodiment of the disclosure. The sensing device 500 comprises two sensing units 502 and 504, each with an imaging sensor 304 and one or more light sources 306. In the embodiment shown in Fig. 8, sensing unit 502 has one light source 306 and sensing unit 504 has two light sources 306. Sensing units 502 and 504 may each have a control circuit to control the operation of the corresponding unit.
1.2.2 Algorithm design of the multi-view three-dimensional imaging system
1.2.2.1 Extracting hand and finger information from each two-dimensional image
Fig. 9 schematically shows, for each imaging sensor 304, the process of finding a foreground object and recognizing its 2D structure according to an embodiment of the disclosure. In the embodiment of Fig. 9 and the associated figures the foreground object discussed is the user's hand. After an input image is obtained, the following are performed: 1) find the foreground object (1606); 2) analyse the substructure of the foreground object (1608 and 1610); 3) analyse the detailed properties of the foreground object (1612 and 1614). The details of this process are described below. At 1606 the new input image from the imaging sensor 304 is compared with a background model to extract the foreground region. As shown in Fig. 10, at 1608, for each pixel location (x, y) in the foreground region the following are computed: the probability P_tip(x, y) that the pixel is part of a fingertip, the probability P_finger(x, y) that it is part of a finger trunk, and the probability P_palm(x, y) that it is part of a palm.
In some embodiments the probabilities P_tip(x, y), P_finger(x, y) and P_palm(x, y) can be computed by comparing the brightness distribution in the neighbourhood around pixel location (x, y) with a set of predefined templates (a fingertip template, a finger trunk template and a palm template). The probability that a pixel is part of a fingertip, finger trunk or palm, i.e. P_tip(x, y), P_finger(x, y) or P_palm(x, y), can be defined by how well the neighbourhood fits the respective template. In some embodiments these probabilities can be computed by applying a function/operator F over the neighbourhood of pixel location (x, y); the function/operator fits the brightness of the neighbourhood to a light-reflectance model of a finger or fingertip and returns a high value if the distribution is close to the reflection of a finger trunk (cylindrical reflection) or of a fingertip (half-dome reflection).
At 1610 the computed probabilities P_tip(x, y), P_finger(x, y) and P_palm(x, y) are used to segment the foreground object, for example to divide the user's hand into fingers and palm. Fig. 21 shows the result of the segmentation: the shaded regions are fingers and the white region is the palm. The probabilities and the segmentation result can be used to compute the structure of the hand, including finger skeleton information. As used in this disclosure, the finger skeleton is an abstraction of the finger structure; in some embodiments the finger skeleton information may comprise, for example, the centre line of the finger (also called the skeleton line), the position of the fingertip, and the boundary of the finger.
In some embodiments, after the user's hand has been divided into fingers and palm, the 2D boundary of each subdivision of the hand (such as a finger or the palm) can be obtained. At 1612 the centre line of a finger is computed by finding the centres of scan lines crossing the whole finger. As used here, a scan line is a line along which the centre is sought, for example a horizontal line. In some embodiments, for a scan line L(y) within the finger, the weighted mean of the position x of each pixel (x, y) on the line is computed using P_finger(x, y) as the weight factor; this weighted mean is the centre of the line, x_center = C(y). After all scan lines on the finger have been processed, a series of centres C(y) is obtained; connecting these centres gives the finger centre line, i.e. the centre line of the finger skeleton, illustrated schematically in Fig. 23. Also at 1612 the position (Tx, Ty) of the fingertip is computed. The fingertip position can be defined as the position of the finger-top region that matches the shape and shading of a fingertip. In some embodiments the fingertip position can be computed as the average position of all pixels in the fingertip, using P_tip(x, y) as the weight factor, for example
Ty = ( Σ_y Σ_x P_tip(x, y) · y ) / ( Σ_y Σ_x P_tip(x, y) ), and analogously for Tx with y replaced by x in the numerator.
In other embodiments the fingertip position can be computed by averaging the positions of the pixels in the finger-top region using the probabilities P_finger(x, y) as weight factors. In the resulting fingertip position (Tx, Ty), Tx and Ty are floating-point numbers with sub-pixel resolution.
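As a concrete illustration of the weighted-average fingertip estimate above, the following Python sketch (function name and array layout are our own, not part of the disclosure) computes sub-pixel Tx and Ty from a per-pixel fingertip probability map:

    import numpy as np

    def fingertip_position(p_tip):
        """Sub-pixel fingertip estimate as the P_tip-weighted mean of pixel
        coordinates, following the formula above. p_tip is a 2D array of
        per-pixel fingertip probabilities for one finger's region."""
        ys, xs = np.mgrid[0:p_tip.shape[0], 0:p_tip.shape[1]]
        total = p_tip.sum()
        if total == 0:
            raise ValueError("no fingertip evidence in region")
        tx = (p_tip * xs).sum() / total
        ty = (p_tip * ys).sum() / total
        return tx, ty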
1.2.2.2 Fusing the two-dimensional information from multiple views/cameras to obtain three-dimensional information
Fig. 11 shows a high-level flow chart of the process of computing the 3D information of the foreground object and its subdivisions according to an embodiment of the disclosure. The 2D substructure results (such as fingers or palms) from different imaging sensors 304 are compared, and associations are created between the subdivisions of each foreground object observed by the different imaging sensors 304; for example, the finger A observed by imaging sensor A may be associated with the finger C observed by imaging sensor B. In some embodiments the association can be made by minimizing the total fingertip distance over all fingers, as shown in Fig. 12, whose left and right halves show the 2D images of the hand (the foreground object) captured by two different imaging sensors 304. Referring again to Fig. 11, at 2504 the characteristics of the associated subdivisions (such as 2D fingertips, 2D skeleton lines and 2D boundary points) are further associated to obtain fingertip pairs, skeleton-line pairs and boundary-point pairs. Fig. 13 schematically shows an embodiment in which the 2D skeleton line of a first finger in a first 2D image (top-left) captured by a first imaging sensor 304 is associated with the 2D skeleton line of a second finger in a second 2D image (top-right) captured by a second imaging sensor 304; the result of the association is the skeleton-line pair image (bottom). Referring again to Fig. 11, at 2506, 2508 and 2510 the 3D skeleton lines, 3D fingertips and 3D boundary points (for example the 3D shape of the hand, fingers or palm) are computed, as detailed below. At 2506 a fingertip pair T1(Tx1, Ty1) and T2(Tx2, Ty1) is processed to obtain 3D information, namely the 3D position T(Tx, Ty, Tz) of the corresponding fingertip. In some embodiments a 3D reprojection function can be used to compute the 3D fingertip position T(Tx, Ty, Tz). The 3D reprojection function can use the 2D positions of the fingertip, (Tx1, Ty1) and (Tx2, Ty1), and information about the imaging sensors 304 and lenses, such as the focal length, the pixel pitch of the sensing device (e.g. pixels per millimetre) and the distance (baseline) between the two imaging sensors 304. In some embodiments the disparity d = Tx1 − Tx2 is computed and used as the input of the 3D reprojection function, whose output is the 3D position (Tx, Ty, Tz) of the fingertip; the 3D position can be expressed in physical units and thus also written as (fx, fy, fz).
In some embodiments the 3D reprojection function can be represented by the 4x4 perspective transformation matrix obtained during imaging sensor calibration; this matrix is the disparity-to-depth mapping matrix. At 2508 the skeleton-line pairs obtained as described above are used to compute the 3D skeleton line of the corresponding finger. In some embodiments, for a skeleton-line pair, the pixels on the two 2D skeleton lines are matched by their y coordinate to obtain pixel pairs; each pixel pair can be processed in the same way as a fingertip pair to obtain the 3D position of the corresponding point. After all pixel pairs have been processed, the resulting points are connected to obtain the 3D skeleton line, as shown in Fig. 14. Returning to Fig. 11, at 2510 the 3D positions of boundary points, for example of a finger or palm, are computed based on the 2D positions of the boundary points in the images captured by the two different imaging sensors 304; in some embodiments the computation is similar to that of the 3D fingertip position. After the 3D positions of the boundary points have been computed, the corresponding points in 3D space can be connected to obtain the 3D boundary.
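For the disparity-based reprojection at 2506, a minimal sketch under the usual rectified pinhole-stereo assumptions (focal length in pixels, baseline in millimetres, principal point (cx, cy); names are illustrative and not taken from the disclosure) might look like:

    import numpy as np

    def reproject_fingertip(tx1, ty1, tx2, focal_px, baseline_mm, cx, cy):
        """Minimal pinhole-stereo reprojection of a matched fingertip pair.
        Assumes rectified images: the two 2D fingertips share the same row,
        so the disparity is the horizontal offset Tx1 - Tx2."""
        disparity = tx1 - tx2
        if disparity <= 0:
            raise ValueError("non-positive disparity; match is invalid")
        z = focal_px * baseline_mm / disparity      # depth in mm
        x = (tx1 - cx) * z / focal_px               # lateral offset in mm
        y = (ty1 - cy) * z / focal_px               # vertical offset in mm
        return np.array([x, y, z])

In practice the calibrated 4x4 disparity-to-depth matrix mentioned above would replace these explicit formulas, but the two are equivalent for an ideal rectified pair.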
Fig. 15 shows the computation of the 3D boundary of the palm based on the 2D boundaries of the palm in two 2D images captured by two different imaging sensors 304.
The information obtained as described above can be combined to generate an output, for example the exemplary output shown in Fig. 16, which shows the 3D fingertips of the fingers (the circles), the 3D skeleton lines of the fingers (the lines) and the 3D shape of the hand.
For some applications, such as drawing and sculpting, the user may need to use a finger or a pen as a tool. In such cases the finger or pen may need to be abstracted as a cylinder, and its direction and length may need to be computed. Referring again to Fig. 11, at 2512 the direction and length of the finger are computed.
In some embodiments the finger is abstracted as a cylinder and its length is defined as the length of the cylinder, also called the finger trunk length. The finger trunk length can be defined as the distance between the top point of the finger skeleton line, or the fingertip position P0(x, y, z), and a stopping point P1(x, y, z). In some embodiments the stopping point P1 is the end of the skeleton line or the point at which the skeleton line departs from a straight line (for example, the position where the difference between the skeleton line and a straight line exceeds a threshold). Likewise, the direction of the finger can be defined as the direction of the line connecting the two points P1 and P0.
At 2514 the 3D position and orientation of the palm are computed. The 3D position of the palm, which can also be called the 3D centre of the palm, is obtained for example by averaging the boundary points (as shown in Fig. 14). Fig. 17 schematically shows the computed 3D centre of the palm.
The size and orientation of the palm can be obtained by comparing the 3D centre of the palm, the 3D positions of the palm boundary points, the 3D positions of the fingertips and the finger directions. According to embodiments of the disclosure, the model-based framework is applicable to any number of views, whether one view, two views or more; Fig. 16 shows the two-view case. The details of the model-based framework according to some embodiments are as follows. For each view, 2D hand structure analysis (described in the earlier framework) is performed, producing a 2D hand structure (also called the new 2D hand structure) that includes a 2D hand skeleton; similarly to the finger skeleton, the hand skeleton is an abstraction of the structure of the hand. Tracking is then carried out by combining the previous 2D hand structure (obtained at the last update) with the new 2D hand structure (obtained at the current update as described above). The tracking process comprises: 1) applying a filter to the previous result to "predict" a predicted 2D hand structure; 2) using an association method to combine the new 2D hand structure with the predicted one; 3) updating the filter with the newly combined result. This tracking process generates smooth skeleton positions, is not disturbed by a finger suddenly disappearing from a view, and provides consistent finger IDs. As used in this disclosure, a finger ID is the ID assigned to a sensed finger; once a finger has been assigned a finger ID it keeps the same ID even if it becomes invisible in a later update. For example, in one update the middle finger and the index finger are sensed; the middle finger is assigned finger ID "finger #1" and the index finger is assigned finger ID "finger #2". Throughout the process both keep their assigned finger IDs, even when one or both become invisible after an update.
In some embodiments, filtering is performed on the 3D hand model to produce a smooth 3D result, including a smooth 3D hand skeleton, which can then be reprojected onto each view to create a projected 2D hand skeleton.
Then, for each view, the new 2D hand skeleton is combined with the projected 2D skeleton to obtain the association between the finger IDs.
Then the 2D results of the two views are combined to compute the new 3D position of the hand and the new 3D finger skeleton. The final result is used as the new 3D hand model, which can be used for the next update.
Automatic wall and floor detection:
The three-dimensional position and orientation of the wall and of the floor can be detected automatically or semi-automatically. Detecting the three-dimensional position and orientation of the floor and wall is important for the subsequent projection display, making the projected image more accurate, and for the subsequent human-machine interaction: the positions of objects and of the user's head and hands relative to the virtual three-dimensional scene can be derived, and virtual touch operations can be performed.
If the imaging hardware used by the system is a three-dimensional imaging system (a structured-light system or a photon time-of-flight system that directly produces depth maps), the method of detecting the wall and floor is:
The depth map is converted into a three-dimensional point cloud (3D point cloud).
Using a plane-fitting algorithm with RANSAC (random sample consensus), the two planes with the strongest consensus are obtained; these two planes are the floor and the wall.
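A plain RANSAC plane fit of the kind referred to here can be sketched as follows (a simplified illustration under our own assumptions, not the patented implementation; the tolerance is in the same units as the point cloud):

    import numpy as np

    def ransac_plane(points, iters=500, tol=10.0, rng=None):
        """Fit the dominant plane in a 3D point cloud (N x 3, e.g. in mm)
        by RANSAC; returns (normal, offset, inlier_mask). Running it once,
        removing the inliers, and running it again yields the two strongest
        planes, i.e. the floor and the wall."""
        rng = rng or np.random.default_rng()
        best_mask, best_model = None, None
        for _ in range(iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            norm = np.linalg.norm(normal)
            if norm < 1e-9:
                continue                      # degenerate (collinear) sample
            normal /= norm
            offset = -normal.dot(sample[0])
            dist = np.abs(points @ normal + offset)
            mask = dist < tol
            if best_mask is None or mask.sum() > best_mask.sum():
                best_mask, best_model = mask, (normal, offset)
        return best_model[0], best_model[1], best_mask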
If the system is a multi-view imaging system, there are several ways to detect the wall and floor using marker patterns.
One method uses a physical marker, for example a checkerboard printed on paper, or a two-dimensional code. Fig. 18 shows a flow chart of the process of automatically detecting the touch interaction surface by recognizing markers, according to an embodiment of the disclosure; the markers are created with the method described above in the disclosure. As shown in Fig. 18, at 3902 the user places a piece of paper with markers in the environment, for example on a desk. At 3904 the interactive system 100 uses the imaging sensors 304 to capture images of the paper and recognize the markers; in some embodiments the interactive system 100 records the 3D positions of the markers in the images. At 3906 the interactive system 100 computes the 3D position, orientation and size of the paper from the 3D positions of the markers, and the result is saved as touch interaction surface calibration data. Another way is to use a display device, for example the projection device, to project a two-dimensional pattern, such as a checkerboard or a two-dimensional code, onto the floor and wall.
Fig. 19 shows a flow chart of the process of automatically detecting the screen of a display device, such as display 114, and using the display screen surface as the touch interaction surface, according to an embodiment of the disclosure. As shown in Fig. 19, at 4002 the interactive system 100 shows a 2D code on the display screen as a calibration board, as shown in Fig. 20. At 4004 the interactive system 100 captures images with the different imaging sensors 304. At 4006 the interactive system 100 recognizes the markers in the 2D code and records the 2D position of the markers in each image. At 4008 the interactive system 100 computes the 3D positions, orientation and size of the markers, and derives and records the size, 3D position and 3D orientation of the display screen. At 4010 the interactive system 100 displays the position, orientation and size of the surface. Afterwards the interactive system 100 can detect the user's touch interaction on the display screen. The user can also be allowed to assist the system in finding the information of the wall and floor:
Fig. 20 shows a flow chart of defining a virtual contact surface according to an embodiment of the disclosure. A virtual contact surface can be defined above the keyboard and between the user and the display screen, and the user can interact with the virtual contact surface in mid-air to control the computer 104. As shown in Fig. 20, at 4202 the interactive system 100 instructs the user to "touch" the four corner points that delimit the virtual contact surface, as schematically shown in Fig. 20. At 4204 the interactive system 100 detects the 3D position of the user's hand. At 4206 the interactive system 100 records the positions of the four corner points. In some embodiments, in order to record the position of a corner point, the interactive system 100 can instruct the user to input a command at a graphical user interface with an interactive device, such as a keyboard or mouse, while holding the 3D position of his or her hand; this command can, for example, be input by hitting a key on the keyboard or clicking a mouse button.
After the interactive system 100 has recorded the positions of the four corner points, at 4208 the interactive system 100 computes and records the size, 3D position and 3D orientation of the virtual contact surface. The interactive system 100 can then display the position, orientation and size of the virtual touch surface.
As those of ordinary skill in the art will recognize, three points are enough to define a plane. Therefore, if the virtual touch surface is planar, only three corner points are needed to define it; these three corner points can, however, be used together with the fourth corner point to define a quadrilateral as the interaction area. After the virtual touch surface and the interaction area have been defined, the interactive system 100 will detect and respond only to actions of objects inside or above this interaction area.
When the fourth corner point is defined manually, the user sometimes cannot easily "touch" a point in the plane defined by the other three corner points; in some embodiments the vertical projection of the user's touch point onto that plane can be used as the fourth corner point. In some embodiments the user's hand serves as the foreground object. The interactive system 100 uses the 3D tracking information of the hand (for example the 3D positions of the fingertips, and the 3D cylinder direction and length information of the fingers) and the environment calibration data to perform a 3D-to-2.5D conversion and obtain 2D information, such as the distance from the fingertip to the touch interaction surface and the direction of the finger perpendicular to the touch interaction surface, as defined by the method described above.
Fig. 21 shows a flow chart of converting the 3D information of a foreground object (such as a hand or finger) into 2.5D information according to an embodiment of the disclosure. At 4402 the 3D information of the touch interaction surface is computed from the position and orientation of the touch interaction surface; this 3D information can comprise, for example, the centre of the touch interaction surface and the direction perpendicular to it. At 4404 the 3D position (x, y, z) of the foreground object is projected onto the touch interaction surface, including for example the computation of the distance d from the foreground object to the touch interaction surface and of the 2D position of the projected point on the surface; this 2D position can be represented by the x' and y' coordinates defined in the 2D coordinate system of the touch interaction surface. At 4406 the 2D position (x', y') of the projected point and the size of the touch interaction surface are used to convert (x', y') into the 2D position (x'', y'') in the 2D coordinate system defined on the screen of display 114. Through the above process the 3D position (x, y, z) of the foreground object is converted into the 2D position (x'', y'') on the screen of display 114 and the distance d between the foreground object and the touch interaction surface. Fig. 22 shows a flow chart of determining the distance d between the foreground object and the touch interaction surface according to an embodiment of the disclosure. As mentioned above, in the environment calibration stage the environment calibration data are recorded, including the positions of the calibration points that define the touch interaction surface, i.e. P1(x1, y1, z1), P2(x2, y2, z2), and so on. At 4502 the environment calibration data and the 3D position (x, y, z) of the foreground object are used to find the point (x', y', z') on the touch interaction surface closest to the position of the foreground object; the distance d is then determined by comparing (x', y', z') with (x, y, z) (4504).
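Assuming the touch interaction surface is well approximated by a plane, the 3D-to-2.5D step at 4404 can be sketched as a simple orthogonal projection (the surface frame vectors and function name are illustrative assumptions, not part of the disclosure):

    import numpy as np

    def to_2p5d(point, surface_center, surface_normal, surface_u, surface_v):
        """Convert a 3D fingertip position to 2.5D relative to a planar touch
        interaction surface: (x', y') in the surface's own 2D frame plus the
        signed distance d to the surface. surface_u and surface_v are unit
        vectors spanning the plane (assumed orthonormal with surface_normal)."""
        rel = np.asarray(point) - np.asarray(surface_center)
        d = float(rel @ surface_normal)        # distance above the surface
        foot = rel - d * surface_normal        # orthogonal projection onto it
        return float(foot @ surface_u), float(foot @ surface_v), d

The mapping from (x', y') to the on-screen position (x'', y'') is then a simple scaling by the known surface and screen sizes.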
Fig. 23 is a flow chart of a process for finding z' according to an embodiment of the disclosure. In the embodiment of Fig. 23 the touch interaction surface can be estimated with a polynomial surface-fitting model equation:
a*x+b*y+c*z+d+e*x^2+f*y^2+…=0
At 4602 the positions of all calibration points are substituted into the following error function to obtain an error value:
err=sum[sqr(a*x+b*y+c*z+d+e*x^2+f*y^2+…)]
In some embodiments a regression method is used to find the optimal parameter values a, b, c, d, e, f, ... that minimize the error value "err". At 4604 the x and y coordinates of the foreground object (with 3D position (x, y, z)) are substituted into the polynomial surface-fitting model equation to compute z' for the given x and y.
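One explicit least-squares variant of this regression step is sketched below; it fits z directly as a quadratic function of x and y, a simplification of the implicit model in the equation above (coefficient layout and names are our own assumptions):

    import numpy as np

    def fit_surface(calib_pts):
        """Least-squares fit of the calibration points (N x 3) to an explicit
        quadratic surface z = a*x + b*y + c + e*x^2 + f*y^2 (c is the constant
        term here)."""
        x, y, z = calib_pts.T
        design = np.column_stack([x, y, np.ones_like(x), x**2, y**2])
        coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
        return coeffs

    def surface_z(coeffs, x, y):
        """Evaluate z' of the fitted surface under the fingertip's (x, y)."""
        return coeffs @ np.array([x, y, 1.0, x**2, y**2])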
Fig. 24 shows a flow chart of a process for obtaining z' according to an embodiment of the disclosure. In the embodiment of Fig. 24 the machine learning method of Gaussian process regression is used. As shown in Fig. 24, at 4702 a covariance matrix is computed from the 3D positions of all calibration points. At 4704 regression is used to project the query point (the foreground object) onto the touch interaction surface and obtain z'. The method of Fig. 24 is suited to imperfect touch interaction surfaces, that is, surfaces that are not planar or far from planar, or whose environment measurement data are not very uniform.
Fig. 25 shows a flow chart of a process for finding z' according to an embodiment of the disclosure. In the embodiment of Fig. 25 a surface point cloud method is used. At 4802 the 3D touch interaction surface is reconstructed from the point cloud according to the environment calibration data. At 4804 the surface z' value at position (x, y) is computed from the reconstructed surface. The 2.5D information obtained according to embodiments of the disclosure can, as mentioned above, be used in various applications.
For example, Fig. 26 shows a handwriting process using the touch interaction surface. At 4902 the 3D position of the fingertip is tracked. At 4904 the obtained 3D position (x, y, z) of the fingertip is converted into the 2.5D information x', y' and d. At 4906 it is determined whether d is less than a threshold distance; if so, a touch/drag event is recorded (4908), and if d is not less than the threshold distance, a release event is generated (4910).
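The threshold test at 4906-4910 amounts to a small state machine; a hedged sketch follows (the threshold value and event names are assumptions, not taken from the disclosure):

    TOUCH_THRESHOLD_MM = 10.0   # assumed contact distance; tune per setup

    def update_touch_state(x_prime, y_prime, d, touching):
        """One step of the handwriting loop in Fig. 26: emit touch/drag events
        while the fingertip stays within the threshold distance of the touch
        interaction surface, and a release event when it leaves."""
        if d < TOUCH_THRESHOLD_MM:
            event = "drag" if touching else "touch_down"
            return True, (event, x_prime, y_prime)
        return False, (("release", x_prime, y_prime) if touching else None)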
Fig. 27 shows a process in which the foreground object, for example the user's finger, hovers above a key of a keyboard. At 5002 the keys on the keyboard are recognized and the 3D position of each key is sensed and recorded. At 5004 the 3D position of the user's finger is compared with the positions of the keys to determine which key the finger is hovering over and the distance between the finger and that key. At 5006 a UI is presented on the screen of display 114 to indicate which key the finger is hovering over and how far the finger is from that key.
As mentioned above, the interactive system 100 can track the position of the user's hand or fingers. In some embodiments the interactive system 100 also tracks the position of the user's eyes and combines the positional information of the eyes with that of the hand or fingers for 3D/2D input.
Three-dimensional interaction algorithm:
1) The position of the three-dimensional hand is transformed into a position in the displayed three-dimensional scene. 2) (Optional) The system can also identify the three-dimensional position and orientation of the user's head; by combining the head information with the hand information, the position of the hand in the virtual three-dimensional scene can be computed more accurately.
3) Interaction with virtual three-dimensional objects is realized, for example selecting, moving, rotating, and assembling multiple objects.
Fig. 28 schematically shows a user interacting with 3D content (such as a virtual 3D object 5404) presented by a 3D display screen 5402. In some embodiments the interactive system 100 uses face recognition and tracking algorithms to identify the 3D position E(x, y, z) of the eyes in the coordinate system of the sensing device 102. A hand tracking method, such as one of the methods described above, is used to identify the 3D position T(x, y, z) of the hand 5104 in the coordinate system of the sensing device 102 and the actions of the hand 5104.
In the calibration phase the interactive system 100 senses and records the 3D information of the screen 5402 in the coordinate system of the sensing device 102; this information can comprise, for example, the 3D position and 3D orientation of the screen 5402 and its size (such as width and height). The interactive system 100 transforms the 3D position of the eyes 5204 from the coordinate system of the sensing device 102 into the coordinate system used by the screen 5402 for presenting the virtual 3D object 5404, giving Es(x, y, z), and sends this information to the operating system and the 3D interactive application.
The 3D interactive application presents the virtual 3D object 5404 according to the 3D position Es(x, y, z) of the user's eyes 5204. In addition, the interactive system 100 transforms the 3D position of the hand 5104 from the coordinate system of the sensing device 102 into the coordinate system used by the screen 5402 for presenting the virtual 3D object 5404, giving Ts(x, y, z), and sends this information to the operating system and the 3D interactive application. The 3D interactive application uses the Ts(x, y, z) information to let the user interact with the virtual 3D object 5404.
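If the calibration stage stores the screen pose as a rotation and translation in the sensing-device frame, the transformation of E(x, y, z) and T(x, y, z) into Es and Ts can be sketched as a single rigid-body change of coordinates (a simplified assumption; the disclosure does not fix the parameterization):

    import numpy as np

    def sensor_to_screen(p_sensor, r_screen, t_screen):
        """Map a 3D point from the sensing-device coordinate system into the
        screen coordinate system recorded at calibration time, where r_screen
        (3x3 rotation) and t_screen (3-vector) describe the screen pose in the
        sensor frame. Applied to E(x, y, z) and T(x, y, z) to obtain Es and Ts."""
        return r_screen.T @ (np.asarray(p_sensor) - np.asarray(t_screen))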
Comparing the detected three-dimensional state of the user's hand with a template:
For example, an application that teaches children to hold chopsticks compares the user's hand shape with the correct hand shape and corrects the user's grip.
Another example is teaching a user how to play the piano: first the teacher plays, and the three-dimensional gesture recognition system records the three-dimensional motion of the hand and fingers during playing. Then, when the student plays, the three-dimensional gesture recognition system identifies the three-dimensional motion of the student's hand and fingers; the difference between the student's and the teacher's hand motion is compared, and the student's way of playing is corrected, either with immediate feedback or by showing the differences over the whole passage afterwards.
The concrete realization is shown in Figs. 29 and 30:
The template of a standard action can be set manually by a user such as a teacher, by setting the time sequence of the hand and finger motion in three-dimensional space (at time t1, the hand position x_t1, y_t1, z_t1, the position of finger 1 x1_t1, y1_t1, z1_t1, the position of finger 2 x2_t1, y2_t1, z2_t1, ...; at time t2, the hand position x_t2, y_t2, z_t2, the position of finger 1 x1_t2, y1_t2, z1_t2, the position of finger 2 x2_t2, y2_t2, z2_t2, ...; at time t3, ...).
The template of a standard action can also be obtained by first letting a certain user, such as a teacher, perform the standard action while the three-dimensional gesture recognition system detects the standard action and records the sequence of three-dimensional hand and finger motion over a period of time; the detected motion sequence is stored as the template.
A single template may be collected for the same action, or multiple templates may be collected.
If a large number of templates have been recorded for the same standard action, the next step is to train a motion model by machine learning, which is used to judge whether the user's gesture action conforms to the standard.
While the user performs the action, the three-dimensional gesture recognition system identifies and records the three-dimensional motion of the user's hand and fingers.
The user's action is then compared with the standard action template, or with the standard action model, to judge whether the user's action conforms to the standard. If the difference between the user's action and the template action is less than a preset threshold, the user's action is judged to conform; otherwise it does not, and the user is prompted to correct it by video, pictures, text or sound.
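A minimal sketch of this comparison, assuming the template and the user's recording are stored as per-frame 3D joint positions and that a simple resample-and-average distance is acceptable (dynamic time warping would be a more forgiving alternative; the threshold and array shapes are illustrative):

    import numpy as np

    def action_matches_template(user_seq, template_seq, threshold=30.0):
        """Judge whether a recorded action conforms to the standard template.
        Both sequences have shape (frames, joints, 3), holding the 3D positions
        of the hand and fingers per frame; the user's sequence is resampled to
        the template length and compared frame by frame."""
        idx = np.linspace(0, len(user_seq) - 1, len(template_seq)).astype(int)
        resampled = np.asarray(user_seq)[idx]
        per_frame = np.linalg.norm(resampled - np.asarray(template_seq), axis=-1)
        return per_frame.mean() < threshold, per_frame.mean()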
As shown in Fig. 31, the model of the three-dimensional hand and the three-dimensional virtual objects are both placed into a physics simulation engine, realizing direct interaction between the hand and virtual objects. The advantage is an intuitive, direct three-dimensional interaction experience.
A three-dimensional rendering engine and a three-dimensional physics simulation engine are established to build the virtual three-dimensional world. The three-dimensional rendering engine converts the virtual three-dimensional world into images and displays them on the display device, for example by projecting them onto the wall. The three-dimensional physics simulation engine drives the motion of the objects in the virtual three-dimensional world, simulating the forces on and motion of objects in reality, such as the motion of an object under gravity or collisions between multiple objects.
According to the three-dimensional configuration of the hand and fingers detected by the three-dimensional gesture recognition system, a three-dimensional hand model is built in the virtual three-dimensional world.
The three-dimensional rendering engine renders all objects in the virtual world, including the hand.
The three-dimensional physics simulation engine computes the collision between the hand and a virtual object and, using the position of the contact point and the direction and magnitude of the impact force, changes the motion of the virtual three-dimensional object.
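In practice this step is delegated to a physics engine; purely as a toy illustration of the idea, the sketch below treats the fingertip and the virtual object as spheres and pushes the object along the contact normal on penetration (all names and constants are our own assumptions, not from the disclosure):

    import numpy as np

    def apply_finger_impact(obj_pos, obj_vel, tip_pos, tip_vel,
                            obj_radius=30.0, tip_radius=10.0, restitution=0.6):
        """Toy stand-in for the physics-engine step: when the tracked fingertip
        (modelled as a sphere) penetrates a virtual object, change the object's
        velocity along the contact normal."""
        normal = obj_pos - tip_pos
        dist = np.linalg.norm(normal)
        if dist >= obj_radius + tip_radius or dist == 0:
            return obj_vel                      # no contact this frame
        normal /= dist
        approach = max(0.0, (tip_vel - obj_vel) @ normal)
        return obj_vel + (1 + restitution) * approach * normal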
As shown in figure 32, augmented reality is mutual
1) The type (for example a toy car on the floor), size, three-dimensional position and orientation of an object are identified by three-dimensional detection.
Alternatively, the kind, code, size and orientation of the object are identified through a two-dimensional code attached to it (either an ordinary code visible to the human eye, or a code invisible to the human eye but visible in the infrared band).
2) According to the object type, a related scene is projected onto the wall and floor. For example, if the object is a toy racing car, a race track is projected onto the floor; if it is a fire engine, the streets of a residential area are projected onto the floor and a burning house is projected onto the wall. This increases the realism, fun and educational value of the toy.
3) According to the detected three-dimensional configuration of the object and the actions of the user, the program state is changed and the output picture is updated. For example, if the user pulls up the aerial ladder of the fire engine, a jet of water is projected and the fire is shown weakening. This increases the realism, fun and educational value of the toy.
4) Content is projected onto the object that the user places in the scene, changing the object's appearance. For example, a virtual firefighter is projected onto the aerial ladder of the fire truck so that the user sees the firefighter climbing the ladder; or, when training the user to play the piano, the standard hand shape is projected onto the user's hand. This increases the realism, fun and educational value of the toy.
Detecting the type, three-dimensional size, position and orientation of an object:
The system has a database of recognizable objects containing two-dimensional or three-dimensional image and structure information. In use, the system captures two-dimensional or three-dimensional pictures, compares them with the information in the database, finds the matching object, and recovers the object's three-dimensional position and orientation. Several implementations are possible:
Scheme 1: as shown in Figures 33-34, two-dimensional image acquisition. Two-dimensional images of each object are captured in advance from multiple directions, and computer-vision feature information, such as contours, boundary lines and SIFT features, is extracted from every image and stored in the database. Every kind of object is captured from multiple directions in the same way and saved to the database, so the database contains, for each of several objects, its images from different angles together with their feature information. In use, the system captures real-time video of the user's object. The video is a stream of pictures, and each picture is converted into feature information with the same feature-extraction method; suppose the frame currently being processed is the k-th frame of the video. The database is searched for entries similar to the user's object, yielding the best-matching object x_t at time t; processing of the previous n frames has already produced their best matches (x_t-n, x_t-n-1, ..., x_t-1). By comparing x_t with the previous best matches, a more stable matching result, object y, is obtained, for example as the object type that occurs most often. The feature information of the current picture is then compared with the database pictures of object y from different directions to find the best-matching direction, which is compared with the directions obtained from the previous m frames to obtain a more stable direction, for example using a Kalman filter. Finally, the actual size of the object is obtained by comparing the size of the object in the picture with the size of the object stored in the database.
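Scheme 1 can be prototyped with off-the-shelf feature matching. The sketch below uses ORB features as a stand-in for the SIFT features mentioned above, matches the current video frame against every stored view of every object, and stabilises the result with a majority vote over recent frames; the database layout, vote window and all names are assumptions for illustration.

```python
import cv2
from collections import Counter, deque

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def extract(image_gray):
    """Extract feature information (keypoints and descriptors) from one picture."""
    return orb.detectAndCompute(image_gray, None)

def build_database(views):
    """views: iterable of (object_name, direction_label, grayscale image) captured in advance."""
    db = []
    for name, direction, img in views:
        _, desc = extract(img)
        if desc is not None:
            db.append((name, direction, desc))
    return db

recent_matches = deque(maxlen=10)  # best matches from the last n frames

def match_frame(frame_gray, db):
    """Best-matching (object, direction) for one frame, smoothed by majority vote."""
    _, desc = extract(frame_gray)
    if desc is None:
        return None
    scored = []
    for name, direction, ref_desc in db:
        matches = matcher.match(desc, ref_desc)
        if matches:
            score = sum(m.distance for m in matches) / len(matches)
            scored.append((score, name, direction))
    if not scored:
        return None
    _, best_name, best_dir = min(scored)
    recent_matches.append(best_name)
    # Stable object y: the object type occurring most often in recent frames.
    stable_name = Counter(recent_matches).most_common(1)[0][0]
    return stable_name, best_dir
```

Direction smoothing with a Kalman filter and the size comparison would sit on top of this matching step.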
Scheme 2: for a reference object, a three-dimensional image is captured (for example with a structured-light image acquisition system or a photon time-of-flight depth-map acquisition system), yielding a depth map. The depth map is converted into a three-dimensional point cloud (3D point cloud) and stored in the database.
For the user's object, a depth map is captured in the same way and matched against the objects in the database to find the closest object and the closest rotation angle; the displacement of the user's object is then computed.
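Converting the depth map of scheme 2 into a three-dimensional point cloud is a standard pinhole-camera back-projection. The sketch below assumes known camera intrinsics (focal lengths fx, fy and principal point cx, cy) and a depth image in metres; matching the resulting cloud against the database (for example with an ICP-style registration) is left out.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Convert a depth map (H x W, metres) into an N x 3 point cloud.

    Back-projects every valid pixel through the pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                      # drop pixels with no depth reading
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x[valid], y[valid], depth[valid]], axis=1)
```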
Scheme 3: a two-dimensional code pattern is attached to the object. The system uses a single two-dimensional camera to capture images. By recognizing the serial number of the two-dimensional code, the kind of object is obtained. By analyzing the position, size, rotation and distortion of the two-dimensional code in the image, the three-dimensional position and orientation of the code in the real physical world are obtained, and from these the three-dimensional position and orientation of the object.
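The pose recovery in scheme 3 amounts to a perspective-n-point problem: the four corners of a code of known physical size are observed in the image, and a calibrated camera model yields the code's 3D position and orientation. A minimal sketch with OpenCV, assuming the corner pixel coordinates have already been located by a code detector and the camera intrinsics are known:

```python
import cv2
import numpy as np

def code_pose(corner_pixels, code_size_m, camera_matrix, dist_coeffs):
    """Estimate the 3D position and orientation of a square 2D code.

    corner_pixels: 4x2 array of the code's corners in the image, ordered
                   top-left, top-right, bottom-right, bottom-left.
    code_size_m:   physical edge length of the printed code, in metres.
    Returns (3x3 rotation matrix, translation vector) of the code in the
    camera coordinate system, or None if the pose cannot be solved.
    """
    s = code_size_m / 2.0
    # Corner coordinates in the code's own frame (the code lies in the z = 0 plane).
    object_points = np.array([[-s,  s, 0],
                              [ s,  s, 0],
                              [ s, -s, 0],
                              [-s, -s, 0]], dtype=np.float32)
    image_points = np.asarray(corner_pixels, dtype=np.float32)

    ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                                  camera_matrix, dist_coeffs)
    if not ok:
        return None
    rotation, _ = cv2.Rodrigues(rvec)  # axis-angle vector -> 3x3 rotation matrix
    return rotation, tvec
```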
As shown in Figure 35, according to the type of object, a corresponding scene is projected onto the wall and floor:
1) For each kind of object, corresponding multimedia content is made in advance, such as images, video, three-dimensional scene models, or an executable program such as a game, and placed in a content library.
2) Step 1) above has identified the type, three-dimensional position and orientation of the object. According to the object type, the corresponding content is retrieved from the content library and displayed, for example projected onto the floor and wall. The content may be dynamic, for example a video or an executable program such as a game, which is set to its initial state and whose internal logic starts executing.
3) The three-dimensional position and orientation of the object, together with information such as the three-dimensional motion of the user's hand, are passed in real time to the logic/program in the content. The logic/program reacts, refreshes the program state, and updates the background content displayed on the floor and wall.
4) According to the user's position and the three-dimensional position of the object, the logic/program in the content can compute their two-dimensional shapes from the viewpoint of the projection equipment, and accordingly project information onto the user's body and onto the object.
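Step 4) needs to know where a 3D point on the user's body or on the object lands in the projector's image so that the projected content overlays it. Treating the projector as a calibrated pinhole device with a known pose, this is an ordinary perspective projection; the sketch below assumes a projector intrinsic matrix K and a world-to-projector rotation and translation (R, t) obtained during calibration.

```python
import numpy as np

def world_to_projector_pixel(point_world, K, R, t):
    """Map a 3D world point onto projector image coordinates (u, v).

    K: 3x3 projector intrinsic matrix; R (3x3), t (3,): world-to-projector pose.
    Returns None if the point lies behind the projector.
    """
    p_proj = R @ np.asarray(point_world, dtype=float) + t
    if p_proj[2] <= 0:
        return None
    uv = K @ (p_proj / p_proj[2])
    return float(uv[0]), float(uv[1])

# Illustration (hypothetical values): project a virtual firefighter sprite at the
# detected 3D position of the fire truck's ladder by drawing the sprite centred
# at the pixel returned for that position in the projector's output image.
```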
The above is only the preferred embodiment of the present invention. It should be pointed out that those skilled in the art can make further improvements and modifications without departing from the inventive concept, and such improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Claims (8)

1. A virtual-reality intelligent projection gesture interaction all-in-one machine, characterized in that it comprises a microcomputer, a 3D object and human body motion detection device connected to the microcomputer, and projection equipment; the 3D object and human body motion detection device is used to sense objects, recognize three-dimensional objects and gestures, and input the recognition signal to the microcomputer; the projection equipment is used to display the content that the microcomputer needs to display; and the all-in-one machine is mounted on a wall.
2. The virtual-reality intelligent projection gesture interaction all-in-one machine according to claim 1, characterized in that the microcomputer selects interaction content from an interaction content library according to the recognition signal, executes it, and displays it by projection.
3. A virtual-reality intelligent projection gesture interaction implementation method, characterized in that it comprises the following steps:
31) a user performs a standard action, and the 3D object and human body motion detection device performs three-dimensional gesture recognition and detects the three-dimensional information of the hand and fingers;
32) the 3D object and human body motion detection device collects the motion information of the hand and fingers over a period of time and stores it as a template;
33) the user performs an action, and the 3D object and human body motion detection device performs three-dimensional gesture recognition and detects the three-dimensional information of the hand and fingers;
34) the three-dimensional information of the hand and fingers detected in step 33) is compared with the template of step 32), and a set threshold is used to judge whether the action of step 33) conforms to the action of step 31).
4. The virtual-reality intelligent projection gesture interaction implementation method according to claim 3, characterized in that when a large number of templates have been recorded for the same standard action, a motion model is trained with a machine learning method, the motion model being used to judge whether the user's gesture motion conforms to the standard.
5. The virtual-reality intelligent projection gesture interaction implementation method according to claim 3 or 4, characterized in that it further comprises setting up a three-dimensional rendering engine and a three-dimensional physical simulation engine to build a virtual three-dimensional world; the three-dimensional rendering engine converts the virtual three-dimensional world into an image which, under the control of the microcomputer, is displayed by the projection equipment, and renders all objects in the virtual world, including the hand; the three-dimensional physical simulation engine computes the motion of objects in the virtual three-dimensional world, simulating how objects in reality are subjected to forces and move; according to the three-dimensional configuration of the hand and fingers detected by the three-dimensional gesture recognition system, a three-dimensional hand model is built in the virtual three-dimensional world, and the three-dimensional physical simulation engine computes the contact between the hand and a virtual object and changes the motion of the virtual three-dimensional object, thereby realizing interaction between the fingers and the virtual object.
6. The virtual-reality intelligent projection gesture interaction implementation method according to claim 5, characterized in that it further comprises a step of recognizing objects, using one or more of the following:
A) two-dimensional images of each object are captured in advance from multiple directions, the information in every image is extracted and stored in a database, and every kind of object is likewise captured from multiple directions and stored in the database; real-time video of the user's object is captured, the video being a stream of pictures each of which is converted into feature information with the same feature-extraction method; the database is searched for entries similar to the user's object to obtain the best-matching object x_t at time t, the previous n frames having already produced their best matches (x_t-n, x_t-n-1, ..., x_t-1); by comparing x_t with the previous best matches, a more stable matching result, object y, is obtained; the feature information of the current picture is compared with the database pictures of object y from different directions to find the best-matching direction, which is compared with the directions obtained from the previous m frames to obtain a more stable direction; the actual size of the object is obtained by comparing the size of the object in the picture with the size of the object in the database;
B) for a reference object, a three-dimensional image is captured to obtain a depth map, which is converted into a three-dimensional point cloud and stored in a database; for the user's object, a depth map is likewise captured and matched against the objects in the database to find the closest object and the closest rotation angle;
C) a two-dimensional code pattern is attached to the object, a two-dimensional camera is used to capture images, the kind of object is obtained by recognizing the serial number of the two-dimensional code, and the position, size, rotation and distortion of the two-dimensional code in the image are analyzed to obtain the three-dimensional position and orientation of the two-dimensional code in the real physical world, from which the three-dimensional position and orientation of the object are obtained.
7. The virtual-reality intelligent projection gesture interaction implementation method according to claim 5, characterized in that the three-dimensional gesture recognition comprises a method for automatically detecting the wall and floor, specifically comprising placing a physical marker in the recognition zone of the 3D object and human body motion detection device, the image sensor of the 3D object and human body motion detection device capturing an image of the physical marker and recognizing the marker, thereby obtaining calibration data for the touch interaction surface.
8. The virtual-reality intelligent projection gesture interaction implementation method according to claim 5, characterized in that the three-dimensional interaction method of the 3D object and human body motion detection device comprises using a face recognition and tracking algorithm to identify the 3D position E(x, y, z) of the eyes in the coordinate system of the sensing device, and identifying the 3D position T(x, y, z) of the hand in the coordinate system of the sensing device together with the action of the hand; in a calibration phase, the 3D object and human body motion detection device senses and records the 3D information of the projected screen in the coordinate system of the sensing device; the 3D position of the eyes is transformed from the coordinate system of the sensing device into the coordinate system used by the screen to present virtual 3D objects, giving Es(x, y, z), and this information is sent to the microcomputer and a 3D interactive application, the 3D interactive application presenting the virtual 3D object according to the 3D position Es(x, y, z) of the user's eyes; in addition, the 3D object and human body motion detection device transforms the 3D position of the hand from the coordinate system of the sensing device into the coordinate system used by the screen to present virtual 3D objects, giving Ts(x, y, z), and sends this information to the microcomputer and the 3D interactive application, and the 3D interactive application uses the Ts(x, y, z) information to allow the user to interact with the virtual 3D object.
CN201410756006.1A 2014-12-10 2014-12-10 Virtual realistic intelligent projection gesture interaction all-in-one machine Pending CN104656890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410756006.1A CN104656890A (en) 2014-12-10 2014-12-10 Virtual realistic intelligent projection gesture interaction all-in-one machine

Publications (1)

Publication Number Publication Date
CN104656890A true CN104656890A (en) 2015-05-27

Family

ID=53248125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410756006.1A Pending CN104656890A (en) 2014-12-10 2014-12-10 Virtual realistic intelligent projection gesture interaction all-in-one machine

Country Status (1)

Country Link
CN (1) CN104656890A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100128112A1 (en) * 2008-11-26 2010-05-27 Samsung Electronics Co., Ltd Immersive display system for interacting with three-dimensional content
CN102184008A (en) * 2011-05-03 2011-09-14 北京天盛世纪科技发展有限公司 Interactive projection system and method
CN203422735U (en) * 2013-08-21 2014-02-05 优比交互(北京)科技有限公司 Interactive fog screen

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108136257A (en) * 2015-08-17 2018-06-08 乐高公司 The method for creating virtual game environment and the interaction game system using this method
CN108027657A (en) * 2015-12-11 2018-05-11 谷歌有限责任公司 Context sensitive user interfaces activation in enhancing and/or reality environment
US11010972B2 (en) 2015-12-11 2021-05-18 Google Llc Context sensitive user interface activation in an augmented and/or virtual reality environment
CN105371784A (en) * 2015-12-24 2016-03-02 吉林大学 Machine vision based holographic man-machine interaction system for automotive inspection
CN107153446A (en) * 2016-03-02 2017-09-12 宏达国际电子股份有限公司 Virtual reality system and tracker device
CN105824426A (en) * 2016-03-31 2016-08-03 联想(北京)有限公司 Information processing method, electronic equipment and electronic device
CN107346218A (en) * 2016-05-06 2017-11-14 富士施乐株式会社 Information processor and information processing method
CN107346218B (en) * 2016-05-06 2022-07-26 富士胶片商业创新有限公司 Information processing apparatus, information processing method, and computer program
US10338460B2 (en) 2016-05-24 2019-07-02 Compal Electronics, Inc. Projection apparatus
CN107426554B (en) * 2016-05-24 2019-09-24 仁宝电脑工业股份有限公司 Projection arrangement
US10437140B2 (en) 2016-05-24 2019-10-08 Compal Electronics, Inc. Projection device with camera module
TWI641985B (en) * 2016-05-24 2018-11-21 仁寶電腦工業股份有限公司 Projection device
CN107426554A (en) * 2016-05-24 2017-12-01 仁宝电脑工业股份有限公司 Projection arrangement
CN106096572A (en) * 2016-06-23 2016-11-09 惠州Tcl移动通信有限公司 Living habit detecting and control method based on virtual reality device and virtual reality device
CN106293078A (en) * 2016-08-02 2017-01-04 福建数博讯信息科技有限公司 Virtual reality exchange method based on photographic head and device
CN107689082B (en) * 2016-08-03 2021-03-02 腾讯科技(深圳)有限公司 Data projection method and device
CN107689082A (en) * 2016-08-03 2018-02-13 腾讯科技(深圳)有限公司 A kind of data projection method and device
CN106774849A (en) * 2016-11-24 2017-05-31 北京小米移动软件有限公司 virtual reality device control method and device
CN106774849B (en) * 2016-11-24 2020-03-17 北京小米移动软件有限公司 Virtual reality equipment control method and device
CN106693361A (en) * 2016-12-23 2017-05-24 武汉市马里欧网络有限公司 Ultrasonic hand gesture recognition based AR (augmented reality) dress-up game projection method and ultrasonic hand gesture recognition based AR dress-up game projection system
WO2018149318A1 (en) * 2017-02-17 2018-08-23 阿里巴巴集团控股有限公司 Input method, device, apparatus, system, and computer storage medium
CN106896736A (en) * 2017-03-03 2017-06-27 京东方科技集团股份有限公司 Intelligent remote nurses method and device
CN107036596A (en) * 2017-04-12 2017-08-11 无锡研测技术有限公司 Industrial bracelet based on MEMS inertial sensor module
CN107122731A (en) * 2017-04-25 2017-09-01 上海唱风信息科技有限公司 Augmented reality device
CN107102736A (en) * 2017-04-25 2017-08-29 上海唱风信息科技有限公司 The method for realizing augmented reality
CN107197223A (en) * 2017-06-15 2017-09-22 北京有初科技有限公司 The gestural control method of micro-projection device and projector equipment
CN111566570A (en) * 2017-07-31 2020-08-21 德瑞森航空航天股份有限公司 Virtual control device and system
CN107423420A (en) * 2017-07-31 2017-12-01 努比亚技术有限公司 A kind of photographic method, mobile terminal and computer-readable recording medium
CN107688390A (en) * 2017-08-28 2018-02-13 武汉大学 A kind of gesture recognition controller based on body feeling interaction equipment
WO2019100547A1 (en) * 2017-11-21 2019-05-31 深圳光峰科技股份有限公司 Projection control method, apparatus, projection interaction system, and storage medium
CN109961473A (en) * 2017-12-25 2019-07-02 深圳超多维科技有限公司 Eyes localization method and device, electronic equipment and computer readable storage medium
CN107911686A (en) * 2017-12-29 2018-04-13 盎锐(上海)信息科技有限公司 Control method and camera shooting terminal
CN107911686B (en) * 2017-12-29 2019-07-05 盎锐(上海)信息科技有限公司 Control method and camera shooting terminal
US11917126B2 (en) 2018-02-15 2024-02-27 Tobii Ab Systems and methods for eye tracking in virtual reality and augmented reality applications
CN110162165A (en) * 2018-02-15 2019-08-23 托比股份公司 System and method for calibrating the imaging sensor in wearable device
CN108805035A (en) * 2018-05-22 2018-11-13 深圳市鹰硕技术有限公司 Interactive teaching and learning method based on gesture identification and device
CN108983979A (en) * 2018-07-25 2018-12-11 北京因时机器人科技有限公司 A kind of gesture tracking recognition methods, device and smart machine
CN108983979B (en) * 2018-07-25 2021-11-30 北京因时机器人科技有限公司 Gesture tracking recognition method and device and intelligent equipment
CN109064552A (en) * 2018-10-22 2018-12-21 湖南六新智能科技有限公司 A kind of virtual restorative procedure of historical relic and repair system
CN109064552B (en) * 2018-10-22 2019-03-29 湖南六新智能科技有限公司 A kind of virtual restorative procedure of historical relic and repair system
CN109598554A (en) * 2018-12-05 2019-04-09 安徽鑫巨源电子科技有限公司 A kind of intelligent interaction advertisement ordering method
CN110308787A (en) * 2019-01-16 2019-10-08 上海触派科技有限公司 The AR interaction systems and method of a kind of multidimensional interactive identification in conjunction with chess/card game
WO2020190305A1 (en) * 2019-03-21 2020-09-24 Hewlett-Packard Development Company, L.P. Scaling and rendering virtual hand
CN110210547A (en) * 2019-05-27 2019-09-06 南京航空航天大学 Piano playing gesture identification method based on inertia gloves
CN111124228A (en) * 2019-12-23 2020-05-08 联想(北京)有限公司 Information processing method and electronic equipment
CN111246265A (en) * 2020-01-21 2020-06-05 深圳精匠云创科技有限公司 Hybrid display system
CN111562753A (en) * 2020-05-09 2020-08-21 广州市鼎好家具有限公司 Middle island platform control method and device, computer equipment and storage medium
WO2022021631A1 (en) * 2020-07-27 2022-02-03 歌尔股份有限公司 Interaction control method, terminal device, and storage medium
CN112905008A (en) * 2021-01-29 2021-06-04 海信视像科技股份有限公司 Gesture adjustment image display method and display device
CN112905008B (en) * 2021-01-29 2023-01-20 海信视像科技股份有限公司 Gesture adjustment image display method and display device
CN113204306A (en) * 2021-05-12 2021-08-03 同济大学 Object interaction information prompting method and system based on augmented reality environment
CN113419636B (en) * 2021-08-23 2021-11-30 北京航空航天大学 Gesture recognition method and tool automatic matching method in virtual maintenance
CN113419636A (en) * 2021-08-23 2021-09-21 北京航空航天大学 Gesture recognition method and tool automatic matching method in virtual maintenance
CN113705536A (en) * 2021-09-18 2021-11-26 深圳市领存技术有限公司 Continuous action scoring method, device and storage medium
CN115376167A (en) * 2022-10-26 2022-11-22 山东圣点世纪科技有限公司 Palm detection method and system under complex background
CN115376167B (en) * 2022-10-26 2023-02-24 山东圣点世纪科技有限公司 Palm detection method and system under complex background

Similar Documents

Publication Publication Date Title
CN104656890A (en) Virtual realistic intelligent projection gesture interaction all-in-one machine
US9595108B2 (en) System and method for object extraction
KR101876419B1 (en) Apparatus for providing augmented reality based on projection mapping and method thereof
US9041775B2 (en) Apparatus and system for interfacing with computers and other electronic devices through gestures by using depth sensing and methods of use
CN102681661B (en) Using a three-dimensional environment model in gameplay
CN102331840B (en) User selection and navigation based on looped motions
CN102448562B (en) Systems and methods for tracking a model
CN102222347B (en) Creating range image through wave front coding
CN102549619B (en) Human tracking system
CN105073210B (en) Extracted using the user's body angle of depth image, curvature and average terminal position
CN105027190A (en) Extramissive spatial imaging digital eye glass for virtual or augmediated vision
CN205028239U (en) Interactive all -in -one of virtual reality intelligence projection gesture
CN103366610A (en) Augmented-reality-based three-dimensional interactive learning system and method
CN105190703A (en) Using photometric stereo for 3D environment modeling
CN105518584A (en) Recognizing interactions with hot zones
CN103501869A (en) Manual and camera-based game control
CN105917386A (en) Information processing device, information processing system, block system, and information processing method
CN103744518A (en) Stereoscopic interaction method, stereoscopic interaction display device and stereoscopic interaction system
CN107256082B (en) Throwing object trajectory measuring and calculating system based on network integration and binocular vision technology
CN104969145A (en) Target and press natural user input
CN109189217A (en) A kind of acceptance of work analogy method based on VR technology
CN113711587A (en) Lightweight cross-display device with passive depth extraction
CN107656611A (en) Somatic sensation television game implementation method and device, terminal device
JP6039594B2 (en) Information processing apparatus and information processing method
KR20190059068A (en) A puzzle assembling system for a grid map using augmented reality and the method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310000 room 7, building C, No. 4028, No. 101 South Ring Road, Zhejiang, Hangzhou, Binjiang District

Applicant after: HANGZHOU LINGGAN TECHNOLOGY CO., LTD.

Address before: 310052 Zhejiang Hangzhou Binjiang District century century science and Technology Park 7

Applicant before: Hangzhou Ling Shou Science and Technology Ltd.

CB02 Change of applicant information
RJ01 Rejection of invention patent application after publication

Application publication date: 20150527

RJ01 Rejection of invention patent application after publication