CN102902355A - Space interaction method of mobile equipment - Google Patents

Space interaction method of mobile equipment

Info

Publication number
CN102902355A
CN102902355A (application numbers CN2012103201662A, CN201210320166A)
Authority
CN
China
Prior art keywords
pixel
hand
image
interaction method
virtual scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012103201662A
Other languages
Chinese (zh)
Other versions
CN102902355B (en)
Inventor
黄向生 (Huang Xiangsheng)
徐波 (Xu Bo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201210320166.2A priority Critical patent/CN102902355B/en
Publication of CN102902355A publication Critical patent/CN102902355A/en
Application granted granted Critical
Publication of CN102902355B publication Critical patent/CN102902355B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a spatial interaction method for mobile equipment with human-computer interaction capability. The method includes the steps of: reconstructing a depth map from the two-dimensional image shot by the mobile equipment in real time; performing region segmentation on the depth map to obtain the human-body-part region in the depth map; mapping the body-part region in the depth map to a virtual scene; and detecting whether the body part in the virtual scene collides with other objects in the virtual scene. If a collision happens, a body language is determined according to the spatio-temporal change of the body part in the virtual scene, and the virtual scene responds according to that body language. The method improves both the accuracy and the real-time performance of spatial interaction.

Description

Spatial interaction method for a mobile device
Technical field
The present invention relates to fields such as image processing, three-dimensional image reconstruction, human-computer interaction and computer applications, and in particular to a spatial interaction method for a mobile device.
Background art
The spatial interaction technology of mobile devices uses a mobile device with interactive capability (such as a mobile phone or an iPad) to carry out spatial interaction, so that virtual objects on the mobile device can respond to a person's indications. Fig. 1 is a schematic diagram of an existing spatial interaction technique for mobile devices. As shown in Fig. 1, the mobile device displays an image of a cat; through spatial interaction, a user can make a mock grasping gesture with the hand in front of the device (where its camera can see), and by recognizing this gesture the device makes a virtual hand grasp the virtual cat it displays.
A spatial interaction method generally includes steps such as depth image reconstruction, hand region segmentation, coordinate normalization, collision detection, motion description and gesture recognition, and virtual scene response, among which depth image reconstruction is a key step.
In the field of computer vision, the depth map plays a very important role, and acquiring it quickly is extremely important. A depth map can not only be used to generate stereo images, but also enables three-dimensional model reconstruction and image-based rendering. In real-time three-dimensional model reconstruction in particular, both the accuracy and the real-time performance of depth map acquisition are essential.
Various methods exist for acquiring depth maps, such as laser imaging radar, laser rangefinders, structured light and computer stereo vision. Computer stereo vision is a traditional depth map acquisition approach, including monocular, binocular and multi-view stereo vision methods, and is currently the dominant approach in three-dimensional model reconstruction. However, it still has shortcomings awaiting improvement. Regarding feature extraction accuracy, because texture varies across images, accurate feature points are hard to extract in overly smooth regions. Regarding the real-time performance of feature extraction, different extraction methods differ greatly in speed: accuracy is often sacrificed to guarantee real-time performance, and real-time performance is sacrificed to guarantee accuracy. How to guarantee accuracy and real-time performance simultaneously is therefore an important topic.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is to propose a spatial interaction method for a mobile device, so as to solve the poor accuracy and real-time performance of existing spatial interaction techniques.
(2) Technical solution
The present invention proposes a spatial interaction method applied to a mobile device, where the mobile device is a portable information processing device with interactive capability and an image capture function. The interaction method comprises the following steps:
Step 1: capture a two-dimensional image with the mobile device and reconstruct the depth map of this two-dimensional image, the image containing the image of a human body part;
Step 2: perform region segmentation on the depth map to obtain the body-part image in the depth map;
Step 3: map the body-part image in the depth map into a virtual scene;
Step 4: detect whether the body part in the virtual scene collides with other objects in the scene; if a collision occurs, go to step 6; if not, go to step 7;
Step 5: determine a body language according to the change over time of the body part's spatial position in the virtual scene;
Step 6: the virtual scene responds according to the body language;
Step 7: judge whether the image of the body part captured by the mobile device has been updated; if so, return to step 1; if not, the method ends.
(3) Beneficial effects
Through a new design of the depth map reconstruction, region segmentation and collision detection algorithms, the spatial interaction method of the present invention effectively resolves the accuracy and real-time performance of depth image acquisition in the reconstruction stage; in region segmentation, it accurately extracts the hand region by combining connected-domain growing with skin-color segmentation; and in collision detection, it uses bounding spheres to guarantee real-time interaction.
Description of drawings
Fig. 1 is a schematic diagram of an existing spatial interaction technique;
Fig. 2 is the flow chart of the spatial interaction method for a mobile device proposed by the present invention;
Fig. 3 is the main flow chart of the depth image reconstruction step of the present invention;
Fig. 4 is the flow chart of descriptor generation in the depth image reconstruction step of the present invention;
Fig. 5 is the flow chart of the depth map generation step in the depth image reconstruction step of the present invention;
Fig. 6 is the main flow chart of the hand region segmentation step of the present invention;
Fig. 7 is the flow chart of the neighborhood judgment step in the hand region segmentation step of the present invention;
Fig. 8 is the flow chart of the skin-color judgment step in the hand region segmentation step of the present invention;
Fig. 9 is the flow chart of the collision detection step of the present invention;
Fig. 10 is the flow chart of the coordinate normalization step of the present invention;
Fig. 11 is the flow chart of the motion description and gesture recognition steps of the present invention;
Fig. 12 is a schematic diagram of the integer representation of the four two-dimensional directions in the motion description and gesture recognition steps of the present invention.
Embodiments
To make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
Fig. 2 is the flow chart of the spatial interaction method for a mobile device proposed by the present invention; the method specifically comprises the following steps:
Step 1: capture a two-dimensional image with the mobile device and reconstruct the depth map of this image, the image containing a human body part.
Several methods exist in the prior art for obtaining the depth map of a real-time image, for example shooting hand images with a mobile device capable of multi-view capture and then performing depth reconstruction on the resulting two-dimensional images. In one embodiment of the invention, the mobile device with multi-view capture capability is a portable human-computer interaction device with two cameras. This step performs real-time depth map reconstruction on images collected by a binocular mobile device, where "binocular" means two image sensors lying on the same plane and the same horizontal line at a suitable distance. The binocular device collects images in real time; after camera-calibration preprocessing, epipolar rectification is applied to the collected image pair so that the rows of the two images are aligned. A fast disparity map is then reconstructed from the rectified pair, followed by continuity checking and filtering, and finally the disparity map is converted into a depth map.
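As a rough, hedged sketch of that final conversion, the snippet below turns a disparity map into a depth map via the standard relation Z = f·B/d; the focal length and baseline are assumed example values for a phone-sized stereo rig, not figures from the patent:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.06):
    """Z = f * B / d; focal_px and baseline_m are assumed example values,
    the patent does not specify them."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0                 # invalid/zero disparities stay 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```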
Step 2: perform region segmentation on the depth map to obtain the body-part image in the depth map.
According to one embodiment of the present invention, this step can combine connected-domain growing with skin-color judgment.
Because the hand usually lies in the foreground of the image during spatial interaction, it produces the largest depth values. The pixel corresponding to the maximum depth is therefore found and taken as a hand pixel, and the region is grown in all directions from it: if the pixels in a pixel's neighborhood satisfy a given rule, they are regarded as belonging to the same region. The rule should ensure that all pixels of the hand region in the image are segmented out; for example, if the absolute gray-level difference between two adjacent pixels is below a prescribed threshold, the two pixels can be judged to lie in the same connected domain.
Skin-color judgment can then be applied to segment out the accurate hand region: the color information of the segmented hand pixels is obtained, and a pixel is classified as skin if it satisfies a given condition; otherwise it is non-skin and is rejected.
Step 3: map the body-part image in the depth map into a virtual scene.
This step maps the reconstructed depth image of the hand region segmented from real space into the virtual scene: the ratio between the virtual scene and the hand region in the depth image is determined, and the image coordinates of the hand region in the depth image are mapped into the virtual scene, i.e. the virtual hand is generated.
Step 4: detect whether the body part in the virtual scene collides with other objects in the virtual scene; if a collision occurs, go to step 6, otherwise go to step 7.
According to one embodiment of the present invention, the point set of the virtual hand obtained in step 3 can be converted into a triangle mesh, and the detection can be carried out with the bounding-sphere technique.
Step 5: determine the body language according to the change over time of the body part's spatial position in the virtual scene.
According to one embodiment of the present invention, the hand gesture is determined from rules such as the direction, speed and acceleration of the hand region's motion in successive frames of the virtual scene. In a specific embodiment, six gestures can be defined: move left, move right, move up, move down, push and pull, where push and pull denote gestures made perpendicular to the plane of the binocular mobile device, moving from far to near and from near to far respectively.
Step 6: the virtual scene responds according to the body language.
For example, when the hand makes a move-left gesture, the virtual hand in the virtual scene also makes a move-left gesture, and the virtual object responds to the virtual hand's move-left gesture.
Step 7: judge whether the image of the body part captured by the cameras has been updated; if so, return to step 1, otherwise the method ends.
Because spatial interaction is real-time, it must be judged whether the data have been updated: if so, return to step 1 and perform depth image reconstruction on the newly collected binocular images (module 101); if not, processing ends.
The key steps involved in the above method are described in detail below.
Step 1: capture a two-dimensional image with the mobile device and reconstruct the depth map of this image, the image containing a human body part.
The body part is typically the hand region. Before the hand region in real space is transformed into the virtual scene, the depth image must first be reconstructed to obtain the position of the hand region in real space. Fig. 3 is the flow chart of depth map reconstruction according to one embodiment of the present invention; in this embodiment, the two-dimensional images are the left and right hand-region images captured by a binocular mobile device. As shown in Fig. 3, the depth image reconstruction further comprises the following steps:
Step 101: perform epipolar rectification on the two two-dimensional images.
The binocular device must first undergo calibration preprocessing. Calibration yields the focal length, principal point and distortion coefficients of each image sensor, which determine the transformation between image coordinates and world coordinates. Stereo calibration is then performed to obtain the rotation matrix and translation vector describing the relative pose of the binocular device's cameras. The images acquired by the binocular device are rectified using the epipolar constraint so that corresponding epipolar lines of the two images lie on the same horizontal line; any point in one image then has its corresponding point on the same row in the other image, and matching reduces to a linear search along that row.
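A minimal sketch of this calibration-and-rectification preprocessing with OpenCV's stereo API; the camera matrices K1/K2, distortion vectors D1/D2 and the rotation R and translation T between the cameras are assumed to come from a prior stereo calibration run:

```python
import cv2
import numpy as np

def rectify_pair(img_l, img_r, K1, D1, K2, D2, R, T):
    """Row-align a binocular pair so matching reduces to a 1-D row search."""
    size = img_l.shape[1], img_l.shape[0]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    map1 = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    map2 = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map1[0], map1[1], cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map2[0], map2[1], cv2.INTER_LINEAR)
    return rect_l, rect_r, Q          # Q reprojects disparities to 3-D
```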
Step 102: extract a pixel descriptor for the pixels of the two two-dimensional images on every other row; the descriptor describes how the gray level changes around a pixel.
In this embodiment, the pixel descriptor is generated as follows. The gray pixel values of the current binocular frame are convolved in the plane with the horizontal and vertical Sobel operators, the horizontal kernel being $\begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}$ and the vertical kernel $\begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$. A 5 × 5 window is used as the extraction window of the feature point descriptor; with the horizontally and vertically Sobel-filtered images as objects, the Sobel convolution values of pixels are extracted on every other row: 12 Sobel convolution values are taken from the window of the horizontal response and 4 from the window of the vertical response. Fig. 4 shows the exact extraction positions: a cell labeled 1 contributes one Sobel convolution value and a cell labeled 2 contributes two, for a total of 16 convolution values forming the feature descriptor.
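A hedged sketch of the descriptor extraction; Fig. 4's exact sampling pattern is not recoverable from the text, so the offsets below are assumptions, and only the 12 + 4 = 16 structure comes from the patent:

```python
import cv2
import numpy as np

# Assumed sampling offsets inside the 5x5 window; the patent's Fig. 4 fixes
# the real pattern. 12 positions are read from the horizontal Sobel response
# and 4 from the vertical one.
H_OFFSETS = [(-2, 0), (-1, -1), (-1, 0), (-1, 1), (0, -2), (0, -1),
             (0, 1), (0, 2), (1, -1), (1, 0), (1, 1), (2, 0)]
V_OFFSETS = [(-1, 0), (0, -1), (0, 1), (1, 0)]

def descriptors(gray):
    """16-value descriptor per pixel from the Sobel responses (computed
    densely here; the patent samples descriptors on every other row)."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # horizontal kernel
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # vertical kernel
    h, w = gray.shape
    desc = np.zeros((h, w, 16), np.float32)
    for i, (dy, dx) in enumerate(H_OFFSETS):
        desc[2:h-2, 2:w-2, i] = gx[2+dy:h-2+dy, 2+dx:w-2+dx]
    for i, (dy, dx) in enumerate(V_OFFSETS):
        desc[2:h-2, 2:w-2, 12+i] = gy[2+dy:h-2+dy, 2+dx:w-2+dx]
    return desc
```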
Step 103: determine the significant stereo matching points in the two two-dimensional images and compute their disparities. A significant stereo matching point is a pixel located at a designated position whose texture value exceeds a prescribed threshold.
To increase computation speed, this embodiment computes disparities on every other row and performs one pixel disparity computation every fixed step length.
Specifically, a pixel disparity is computed every five steps. First it is judged whether the sum of the 16 convolution values of the current pixel's descriptor exceeds the texture threshold, which guarantees that the texture at the pixel's position is not too smooth; if it is below the set texture threshold, the pixel's disparity is set to invalid. A 5 × 5 window is then created centered on each significant pixel, and the descriptors of the pixels at the window's four corners are taken as the matching parameters of the central pixel. With the left image as the reference image, suppose the pixel whose disparity is to be computed in the left image corresponds, with disparity d, to a pixel in the right image; for this candidate pair, the four corner descriptors of the corresponding windows in the left and right views — the matching parameters — are subtracted and the absolute differences summed. Since the disparity d has an admissible range, the d that minimizes this absolute-difference sum is taken as the pixel's disparity value.
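A minimal sketch of this corner-descriptor matching, under assumed values for the disparity range and texture threshold (neither is specified in the patent):

```python
import numpy as np

def best_disparity(desc_l, desc_r, y, x, d_max=64, tex_thresh=100.0):
    """Disparity of a significant point by minimizing the summed absolute
    difference of the descriptors at the four corners of a 5x5 window.
    d_max and tex_thresh are assumed example values."""
    if np.abs(desc_l[y, x]).sum() < tex_thresh:     # texture too smooth
        return -1                                   # mark disparity invalid
    corners = ((-2, -2), (-2, 2), (2, -2), (2, 2))
    best_d, best_cost = -1, np.inf
    for d in range(0, min(d_max, x - 2) + 1):       # keep right window in bounds
        cost = sum(np.abs(desc_l[y + dy, x + dx]
                          - desc_r[y + dy, x + dx - d]).sum()
                   for dy, dx in corners)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```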
Step 104: diffuse from the significant stereo matching points to obtain the disparities of all pixels.
In this embodiment, the preliminarily chosen significant stereo matching points are diffused: Delaunay triangulation is applied to these sparse points, splitting the significant stereo matching points into a triangle mesh. The pixel coordinates of the three vertices of each triangle are obtained, from which the line equations of the three edges can be computed; these determine the coordinates of every pixel enclosed by the triangle, and disparity computation is finally carried out for all pixels contained in the triangle mesh.
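A hedged sketch of the triangulation and interior-pixel enumeration; the patent derives interior pixels from the edge line equations, while the sketch below uses an equivalent barycentric inside-test over each triangle's bounding box (degenerate triangles are assumed absent):

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate_support(points):
    """Split sparse significant matching points (N x 2 array of (u, v))
    into a triangle mesh; each returned row indexes one triangle's vertices."""
    return Delaunay(points).simplices

def pixels_in_triangle(p0, p1, p2):
    """Integer pixels enclosed by a (non-degenerate) triangle, found with a
    barycentric inside-test over the triangle's bounding box."""
    xs = np.arange(int(min(p0[0], p1[0], p2[0])), int(max(p0[0], p1[0], p2[0])) + 1)
    ys = np.arange(int(min(p0[1], p1[1], p2[1])), int(max(p0[1], p1[1], p2[1])) + 1)
    u, v = np.meshgrid(xs, ys)
    den = (p1[1] - p2[1]) * (p0[0] - p2[0]) + (p2[0] - p1[0]) * (p0[1] - p2[1])
    a = ((p1[1] - p2[1]) * (u - p2[0]) + (p2[0] - p1[0]) * (v - p2[1])) / den
    b = ((p2[1] - p0[1]) * (u - p2[0]) + (p0[0] - p2[0]) * (v - p2[1])) / den
    inside = (a >= 0) & (b >= 0) & (a + b <= 1)
    return u[inside], v[inside]
```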
In this embodiment, it is again judged whether the sum of the 16 convolution values of a pixel's descriptor exceeds the texture threshold, guaranteeing that the texture at the pixel's position is not too smooth; if it is below the set texture threshold, the pixel's disparity is set to invalid. Taking the computation of the left disparity map as an example, with the left view as the reference image, suppose a pixel's disparity is $d_{curr}$; the corresponding pixel of the right view is obtained, the descriptors of the pixel in the left and right views are subtracted, and the absolute differences are summed, denoted SSE. Since $d_{curr}$ has an admissible range, the disparity that minimizes $SSE + p$ is adopted as the pixel's disparity value, where

$$p = \begin{cases} \left(-\log\left(\gamma + e^{-\Delta d/(2\sigma^2)}\right) + \log\gamma\right)/\beta + |d_{curr} - d_{plane}| & \text{if } |d_{curr} - d_{plane}| < \sigma \\ 0 & \text{otherwise} \end{cases}$$

with $\Delta d = d_{curr} - d_{plane}$ and $d_{plane} = au + bv + c$, where $u$ and $v$ are the horizontal and vertical coordinates of the current pixel and $a$, $b$, $c$ are the weights of the current triangle, solved from the three vertex coordinates of the current triangle and their corresponding disparities, i.e. from the system

$$\begin{cases} au_1 + bv_1 + c = d_1 \\ au_2 + bv_2 + c = d_2 \\ au_3 + bv_3 + c = d_3 \end{cases}$$

where $(u_1, v_1)$, $(u_2, v_2)$ and $(u_3, v_3)$ are the vertices of the triangle and $d_1$, $d_2$ and $d_3$ their corresponding disparities. The right disparity map is diffused from its support points by the same method.
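A small sketch of the plane fit and the penalty term, reading $\Delta d$ as $d_{curr} - d_{plane}$ (the text leaves it implicit) and using assumed example values for $\gamma$, $\sigma$ and $\beta$:

```python
import numpy as np

def plane_coeffs(verts, disps):
    """Solve a*u + b*v + c = d for the triangle's three vertices.
    verts: 3x2 array of (u, v); disps: length-3 disparity vector."""
    A = np.column_stack([verts[:, 0], verts[:, 1], np.ones(3)])
    return np.linalg.solve(A, disps)             # (a, b, c)

def penalty(d_curr, u, v, coeffs, gamma=0.5, sigma=2.0, beta=1.0):
    """Penalty p added to the descriptor cost SSE; gamma, sigma and beta
    are assumed example values, the patent leaves them unspecified."""
    a, b, c = coeffs
    dd = abs(d_curr - (a * u + b * v + c))       # |d_curr - d_plane|
    if dd < sigma:
        return (-np.log(gamma + np.exp(-dd / (2 * sigma**2)))
                + np.log(gamma)) / beta + dd
    return 0.0
```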
Step 105: reconstruct the depth map from the per-pixel disparities.
In this embodiment, step 105 comprises left-right consistency checking, small-region removal, interpolation and filtering.
As shown in Fig. 5, the depth map generation of step 105 further comprises the following steps:
Step 1051: check the left-right consistency of the pixels of the two two-dimensional images. When the disparity difference between corresponding pixels of the left and right views lies within a given range, the pixels are regarded as consistent; otherwise the pixel's disparity is set to invalid.
Step 1052: remove small regions with disparity consistency. A small region is defined as follows: within a region of continuous disparity, if the region contains fewer pixels than a prescribed threshold, it is regarded as a small region and the disparities of all its pixels are set to invalid.
Step 1053: interpolate the disparity values of pixels whose disparities are invalid, divided into row-wise and column-wise interpolation. Taking row-wise interpolation as an example, whenever a row contains pixels with invalid disparities, those pixels are interpolated: take the pixels with valid disparities at the two ends of the invalid run; if the difference between the two end-point disparities is below a threshold, all invalid pixels between them are set to the mean of the two end-point disparities; if it is above the threshold, they are set to the minimum of the two.
Step 1054: filter the pixels of the two two-dimensional images. Median filtering is applied horizontally and vertically to each pixel of the disparity map: the mask is 7 pixels long, the 7 values are sorted in ascending order, and the median is taken as the disparity of the mask's central pixel.
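A hedged sketch of this post-processing chain (left-right check, row-wise interpolation, 7-tap median filtering); small-region removal is omitted for brevity, and the tolerance and threshold values are assumed examples:

```python
import numpy as np
from scipy.ndimage import median_filter

INVALID = -1

def lr_check(disp_l, disp_r, tol=1.0):
    """Invalidate left-map disparities inconsistent with the right map."""
    out = disp_l.copy()
    h, w = disp_l.shape
    for y in range(h):
        for x in range(w):
            d = disp_l[y, x]
            if d < 0 or x - int(d) < 0 or abs(disp_r[y, x - int(d)] - d) > tol:
                out[y, x] = INVALID
    return out

def interpolate_rows(disp, thresh=3.0):
    """Fill each row's invalid runs with the mean (similar end points) or
    the minimum (dissimilar end points) of the bounding valid disparities."""
    out = disp.astype(np.float32)
    for row in out:
        valid = np.flatnonzero(row != INVALID)
        for a, b in zip(valid[:-1], valid[1:]):
            if b - a > 1:
                row[a + 1:b] = ((row[a] + row[b]) / 2
                                if abs(row[a] - row[b]) < thresh
                                else min(row[a], row[b]))
    return out

def smooth(disp):
    """7-tap horizontal then vertical median filtering."""
    return median_filter(median_filter(disp, size=(1, 7)), size=(7, 1))
```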
Step 2: perform region segmentation on the depth map to obtain the body part in the depth map.
Since spatial interaction only requires the motion of the body part (for example the hand), the obtained depth map must be segmented to extract the hand region. Segmentation combines connected-domain growing with skin-color judgment. As shown in Fig. 6, the hand region segmentation step further comprises the following steps:
Step 201: find the pixel corresponding to the maximum depth in the depth map and preliminarily take it as the hand pixel.
In spatial interaction, the hand usually lies in the foreground of the spatial scene, so finding the pixel corresponding to the maximum depth in the depth map locates the approximate position of the hand region.
Step 202: judge whether the neighborhood pixels of each pixel preliminarily classified as hand region lie in the same region as that pixel.
In this embodiment, a pixel's neighborhood is the region of the four pixels surrounding it.
As shown in Fig. 7, the neighborhood judgment of step 202 further comprises the following steps:
Step 2021: determine the initial pixel. At the beginning, the initial pixel is the pixel corresponding to the maximum depth; in every later judgment, the initial pixels are the pixels of the connected domain determined in the previous judgment.
Step 2022: read the depth value of each pixel in the initial pixel's neighborhood.
Step 2023: when the difference between the depth values of the initial pixel and a pixel in its neighborhood is below a threshold, treat the neighborhood as part of the same connected domain, i.e. judge that the pixels in the initial pixel's neighborhood lie in the same region as the initial pixel; return to step 2021.
Step 203: judge whether the pixel map still contains other pixels belonging to the hand region.
This step judges whether growth points remain. In this embodiment, when the depth differences between all pixels on the edge of the connected domain and their neighborhood pixels are consistently greater than the prescribed threshold, no growth points remain.
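A minimal sketch of this connected-domain growing, seeded at the maximum-depth pixel and using an assumed depth-difference threshold:

```python
from collections import deque
import numpy as np

def grow_hand_region(depth, thresh=5.0):
    """Grow a connected domain from the maximum-depth pixel, adding a
    4-neighbor whenever its depth differs from the current pixel by less
    than thresh (an assumed example value)."""
    h, w = depth.shape
    seed = np.unravel_index(np.argmax(depth), depth.shape)
    mask = np.zeros((h, w), bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:                                  # ends when no growth points
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(depth[ny, nx] - depth[y, x]) < thresh):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask
```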
Step 204: apply skin-color judgment to the pixels classified as hand region, to remove pixels that obviously do not belong to the hand region.
After the hand region has been roughly segmented by connected-domain growing, skin-color judgment is needed to extract the hand region accurately. As shown in Fig. 8, the skin-color judgment of step 204 further comprises the following steps:
Step 2041: extract the color information of the segmented hand pixels. Because this step performs skin-color judgment, the information obtained must be color information.
Step 2042: judge whether the pixel's color information satisfies the given skin-color condition. Skin points are determined as follows: the segmented region is transformed from RGB space to HSV space according to the usual transformation rule, and a pixel is classified as a skin point if its H, S and V components satisfy

$$H \ge 0, \quad S \ge 15, \quad S \ge 0.75H + 0.3V - 30, \quad S \le -H - 0.1V + 110, \quad H \le -0.4V + 75, \quad S \le 0.08(100 - V)H + 0.6V,$$

and as a non-skin point otherwise. If a pixel is classified as a skin point, go to step 2044, otherwise go to step 2043.
Step 2043: set the pixel's gray value to (0, 0, 0).
Step 2044: keep the pixel's gray value information.
Step 2045: judge whether any hand-region pixels remain unjudged. If pixels in the hand region have not yet been judged, return to step 2041; once all hand-region pixels have been judged, end the processing.
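A hedged sketch of the skin test; the patent does not state the H/S/V scales, so treating H as degrees and S, V on a 0..100 scale is an assumption, as is the OpenCV-based conversion:

```python
import cv2
import numpy as np

def is_skin(h, s, v):
    """Inequality set from the patent, under the assumed scales above."""
    return (h >= 0 and s >= 15
            and s >= 0.75 * h + 0.3 * v - 30
            and s <= -h - 0.1 * v + 110
            and h <= -0.4 * v + 75
            and s <= 0.08 * (100 - v) * h + 0.6 * v)

def mask_non_skin(rgb, hand_mask):
    """Set hand pixels failing the skin test to (0, 0, 0)."""
    hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)
    out = rgb.copy()
    for y, x in zip(*np.nonzero(hand_mask)):
        h, s, v = hsv[y, x].astype(np.float32)
        # 8-bit OpenCV stores H/2 (0..179) and S, V in 0..255
        if not is_skin(h * 2.0, s * 100.0 / 255.0, v * 100.0 / 255.0):
            out[y, x] = 0
    return out
```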
Step 3: map the body-part image in the depth map into a virtual scene.
Since the coordinates of the reconstructed depth image and of the virtual scene are not unified, the two coordinate systems must be normalized, and the size of the normalized virtual hand must be adjusted to a ratio matching the size of the virtual scene. As shown in Fig. 9, the coordinate normalization step further comprises the following steps:
Step 301: set the virtual scene. The virtual scene can be taken from a real scene, consist of three-dimensional material, or combine both.
Step 302: determine the ratio between the virtual scene and the hand region in the depth image. A reference object is chosen in the virtual scene, and the ratio between the two is obtained from the coordinates of the reference object in the virtual scene on the display screen and the image coordinates of the hand region.
Step 303: map the image coordinates of the hand region in the depth image into the virtual scene, i.e. generate the virtual hand. From the ratio between the reference object and the hand region and the ratio between the reference object and the virtual hand to be projected, the projection scaling ratio of the hand region is determined, and the image coordinates of the hand region are mapped into the virtual scene by this scaling ratio.
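A minimal sketch of this mapping, with the scaling ratio derived from a reference object; the scene-origin offset is an assumed alignment parameter:

```python
import numpy as np

def map_hand_to_scene(hand_px, ref_scene_size, ref_image_size,
                      scene_origin=(0.0, 0.0)):
    """Map hand-region image coordinates (N x 2) into virtual-scene
    coordinates; the scaling ratio comes from the reference object, while
    scene_origin is an assumed alignment offset."""
    scale = ref_scene_size / ref_image_size        # projection scaling ratio
    pts = np.asarray(hand_px, np.float32)
    return pts * scale + np.asarray(scene_origin, np.float32)
```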
Step 4: detect whether the body part in the virtual scene collides with other objects in the virtual scene; if a collision occurs, go to step 6, otherwise go to step 7.
After the hand region of real space has been mapped into the virtual space to form the virtual hand, the collision relation between the virtual hand and the virtual scene must be detected so that the virtual scene can react accordingly. As shown in Fig. 10, the collision detection step comprises the following steps:
Step 401: enclose the virtual hand and the virtual scene with bounding spheres. The bounding sphere of a virtual object is defined as the smallest sphere containing the object; at least two points on the sphere's surface belong to the enclosed object. Only when the two smallest spheres enclosing the virtual hand and a virtual object collide can the virtual hand and the virtual object possibly collide. Describing a bounding sphere requires only two parameters: the center coordinates and the radius.
Step 402: judge whether the bounding spheres of the virtual hand region and the virtual scene collide.
The collision criterion compares the distance between the two sphere centers with the sum of the two radii: if the center distance is less than or equal to the radius sum, the two spheres have collided; if it is greater, they have not.
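A small sketch of the sphere test; the centroid-based bounding sphere below is an easy over-approximation rather than the patent's exact smallest enclosing sphere:

```python
import numpy as np

def bounding_sphere(points):
    """Centroid center, max-distance radius: a simple over-approximation of
    the smallest enclosing sphere."""
    pts = np.asarray(points, np.float32)
    center = pts.mean(axis=0)
    radius = np.linalg.norm(pts - center, axis=1).max()
    return center, radius

def spheres_collide(c1, r1, c2, r2):
    """Centers closer than the radius sum means the spheres collide."""
    return np.linalg.norm(np.asarray(c1) - np.asarray(c2)) <= r1 + r2
```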
Step 5: determine the body language according to the change over time of the body part in the virtual scene.
The hand motion state is determined from rules such as the direction, speed and acceleration of the hand's motion over several consecutive frames. The present invention prescribes six gestures, described in the following table:
[Table: definitions of the six gestures — move left, move right, move up, move down, push and pull.]
As shown in Fig. 11, the motion description and gesture recognition steps further comprise the following steps:
Step 501: record the trajectory of the virtual hand in each frame. A fixed point of the hand region is chosen; the present invention takes the middle fingertip as the fixed point, so it should be guaranteed during hand motion that the middle fingertip always lies at the top of the hand region in world coordinates, and the trajectory of this fixed point is recorded in each frame.
Step 502: fit a straight line to the trajectory. Since the present invention defines only the four motion directions up, down, left and right in the two-dimensional space, the trajectory recorded in step 501 is fitted with a straight line.
Step 503: judge whether the length between the two ends of the fitted line exceeds a threshold, the threshold being the minimum start-to-end distance that qualifies the four two-dimensional motion patterns. If the length exceeds the threshold, the virtual hand is judged to be moving in the two-dimensional space and the flow goes to step 504; otherwise it goes to step 505.
Step 504: represent the fitted line as an integer. Four motion directions are defined in the two-dimensional space, so the 360° of the plane are divided equally into four parts; as shown in Fig. 12, the four directions are represented by the four integers 0, 1, 2 and 3.
Step 505: obtain the change of the virtual hand region's area and determine the forward/backward motion direction of the virtual hand. The present invention requires that the hand shape not change during motion; if the distance between the start and end points of the motion is below the threshold, the hand is moving perpendicular to the two-dimensional space, i.e. in the third dimension, and comparing the initial hand area with the final hand area determines whether the hand is moving forward or backward.
Step 506: match the description of the virtual hand's gesture produced by steps 504 and 505 against the predefined gesture models to determine which gesture it is (a sketch of the whole classification follows below).
In this embodiment of the present invention, the gesture models are the six gesture models described above.
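A hedged sketch of the gesture classification; the length threshold and the mapping of the integers 0–3 to directions are assumed examples (Fig. 12 fixes the real layout), and the start-to-end vector stands in for the fitted line's direction:

```python
import numpy as np

# Assumed mapping of Fig. 12's four integer directions to gesture names.
GESTURES_2D = {0: "move right", 1: "move up", 2: "move left", 3: "move down"}

def classify_gesture(track, areas, min_len=40.0):
    """Classify one of the six gestures from a per-frame fingertip track
    (N x 2) and per-frame hand areas; min_len is an assumed example value."""
    track = np.asarray(track, np.float32)
    start, end = track[0], track[-1]
    if np.linalg.norm(end - start) > min_len:
        # 2-D motion: the patent fits a straight line to the track; here the
        # start-to-end vector stands in for the fitted line's direction.
        angle = np.degrees(np.arctan2(end[1] - start[1], end[0] - start[0]))
        sector = int(((angle + 45.0) % 360.0) // 90.0)   # quantize into 0..3
        return GESTURES_2D[sector]
    # Motion perpendicular to the image plane: a growing hand area reads as
    # the hand approaching the device (push), a shrinking one as pull.
    return "push" if areas[-1] > areas[0] else "pull"
```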
Step 6: the virtual scene responds according to the body language.
Step 7: judge whether the image of the body part captured by the cameras has been updated; if so, return to step 1, otherwise the method ends.
The specific embodiments above further describe the purpose, technical solution and beneficial effects of the present invention. It should be understood that they are only specific embodiments of the present invention and do not limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (14)

1. A spatial interaction method applied to a mobile device, the mobile device being a portable information processing device with interactive capability and an image capture function, characterized in that the method comprises the steps of:
Step 1: capture a two-dimensional image with the mobile device and reconstruct the depth map of this two-dimensional image, the image containing the image of a human body part;
Step 2: perform region segmentation on the depth map to obtain the body-part image in the depth map;
Step 3: map the body-part image in the depth map into a virtual scene;
Step 4: detect whether the body part in the virtual scene collides with other objects in the scene; if a collision occurs, go to step 6; if not, go to step 7;
Step 5: determine a body language according to the change over time of the body part's spatial position in the virtual scene;
Step 6: the virtual scene responds according to the body language;
Step 7: judge whether the image of the body part captured by the mobile device has been updated; if so, return to step 1; if not, the method ends.
2. The spatial interaction method of claim 1, characterized in that, in step 1, the body part is a hand and the body language is a gesture.
3. The spatial interaction method of claim 2, characterized in that, in step 1, the mobile device is a mobile device with two cameras, and the two-dimensional images are the left and right two-dimensional images of the hand captured by this mobile device.
4. The spatial interaction method of claim 3, characterized in that step 1 comprises the steps of:
Step 101: perform epipolar rectification on the two two-dimensional images;
Step 102: extract a pixel descriptor for the pixels of the two two-dimensional images on every other row, the descriptor describing how the gray level changes around a pixel;
Step 103: determine the significant stereo matching points in the two two-dimensional images and compute their disparities, a significant stereo matching point being a pixel located at a designated position whose texture value exceeds a prescribed threshold;
Step 104: diffuse from the significant stereo matching points to obtain the disparities of all pixels of the two two-dimensional images;
Step 105: reconstruct the depth map from the per-pixel disparities.
5. The spatial interaction method of claim 4, characterized in that step 102 comprises the steps of: convolving the gray pixel values of the current frame of the two two-dimensional images in the plane with the horizontal and vertical Sobel operators respectively, the horizontal kernel being $\begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}$ and the vertical kernel $\begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$; using a 5 × 5 window as the extraction window of the feature point descriptor; and, with the horizontally and vertically Sobel-filtered images as objects, extracting the Sobel convolution values of pixels on every other row, taking 12 Sobel convolution values from the window of the horizontal response and 4 from the window of the vertical response.
6. The spatial interaction method of claim 4, characterized in that step 103 adopts computation on every other row and performs one pixel disparity computation every fixed step length.
7. The spatial interaction method of claim 4, characterized in that step 104 comprises: applying Delaunay triangulation to the sparse points, splitting the significant stereo matching points into a triangle mesh; obtaining the pixel coordinates of the three vertices of each triangle; computing the line equations of the three edges; determining the coordinates of every pixel enclosed by the triangle; and finally carrying out disparity computation for all pixels contained in the triangle mesh.
8. The spatial interaction method of claim 4, characterized in that step 105 comprises the steps of:
Step 1051: check the left-right consistency of the pixels of the two two-dimensional images;
Step 1052: remove small regions with disparity consistency;
Step 1053: interpolate the disparity values of pixels whose disparities are invalid;
Step 1054: filter the pixels of the two two-dimensional images.
9. The spatial interaction method of claim 2, characterized in that step 2 comprises the steps of:
Step 201: find the pixel corresponding to the maximum depth in the depth map and preliminarily take it as the hand pixel;
Step 202: judge whether the neighborhood pixels of each pixel preliminarily classified as hand region lie in the same region as that pixel;
Step 203: judge whether the pixel map still contains other pixels belonging to the hand region;
Step 204: apply skin-color judgment to the pixels classified as hand region, to remove pixels that obviously do not belong to the hand region.
10. The spatial interaction method of claim 9, characterized in that step 202 comprises the steps of:
Step 2021: determine the initial pixel: at the beginning, the initial pixel is the pixel corresponding to the maximum depth, and in every later judgment the initial pixels are the pixels of the connected domain determined in the previous judgment;
Step 2022: read the depth value of each pixel in the initial pixel's neighborhood;
Step 2023: when the difference between the depth values of the initial pixel and the pixels in its neighborhood is below a threshold, treat the initial pixel's neighborhood as a connected domain, i.e. judge that the pixels in the initial pixel's neighborhood lie in the same region as the initial pixel, and return to step 2021.
11. The spatial interaction method of claim 9, characterized in that step 204 comprises the steps of:
Step 2041: extract the color information of the segmented hand pixels; because this step performs skin-color judgment, the information obtained must be color information;
Step 2042: judge from the pixel's color information whether the pixel satisfies the given skin-color condition; if a pixel is classified as a skin point, go to step 2044, otherwise go to step 2043;
Step 2043: set the pixel's gray value to (0, 0, 0);
Step 2044: keep the pixel's gray value information;
Step 2045: judge whether any hand-region pixels remain unjudged; if pixels in the hand region have not yet been judged, return to step 2041; once all hand-region pixels have been judged, end.
12. The spatial interaction method of claim 2, characterized in that step 3 comprises the steps of:
Step 301: calibrate the virtual scene;
Step 302: obtain the coordinates of the depth map;
Step 303: find the transformation matrix between the virtual scene coordinates and the depth map coordinates;
Step 304: unify the sizes of the virtual hand and the virtual scene.
13. The spatial interaction method of claim 2, characterized in that step 4 comprises the steps of:
Step 401: enclose the virtual hand and the virtual scene with bounding spheres respectively;
Step 402: judge whether the bounding spheres of the virtual hand region and the virtual scene collide.
14. The spatial interaction method of claim 2, characterized in that step 5 comprises the steps of:
Step 501: record the trajectory of the virtual hand in each frame;
Step 502: fit a straight line to the trajectory;
Step 503: judge whether the length between the two ends of the fitted line exceeds a threshold; if so, determine that the virtual hand is moving in the two-dimensional space and go to step 504, otherwise go to step 505;
Step 504: represent the fitted line as an integer;
Step 505: obtain the change of the virtual hand region's area and determine the forward/backward motion direction of the virtual hand;
Step 506: match the description of the virtual hand's gesture produced by steps 504 and 505 against the predefined gesture models to determine the virtual hand's gesture.
CN201210320166.2A 2012-08-31 2012-08-31 The space interaction method of mobile device Active CN102902355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210320166.2A CN102902355B (en) 2012-08-31 2012-08-31 The space interaction method of mobile device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210320166.2A CN102902355B (en) 2012-08-31 2012-08-31 The space interaction method of mobile device

Publications (2)

Publication Number Publication Date
CN102902355A true CN102902355A (en) 2013-01-30
CN102902355B CN102902355B (en) 2015-12-02

Family

ID=47574643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210320166.2A Active CN102902355B (en) 2012-08-31 2012-08-31 The space interaction method of mobile device

Country Status (1)

Country Link
CN (1) CN102902355B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0991023A2 (en) * 1998-10-02 2000-04-05 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. A method of creating 3-D facial models starting from face images
US20070127810A1 (en) * 2003-08-08 2007-06-07 Microsoft Corporation System and method for modeling three dimensional objects from a single image
CN101273368A (en) * 2005-08-29 2008-09-24 埃韦里克斯技术股份有限公司 Interactivity via mobile image recognition
CN101393497A (en) * 2008-10-30 2009-03-25 上海交通大学 Multi-point touch method based on binocular stereo vision
CN101893935A (en) * 2010-07-14 2010-11-24 北京航空航天大学 Cooperative construction method for enhancing realistic table-tennis system based on real rackets
CN102253713A (en) * 2011-06-23 2011-11-23 康佳集团股份有限公司 Display system orienting to three-dimensional images

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11003917B2 (en) 2013-10-17 2021-05-11 Drägerwerk AG & Co. KGaA Method for monitoring a patient within a medical monitoring area
CN103778632A (en) * 2014-01-18 2014-05-07 南京理工大学 Method for stereo matching based on FPGA
CN104407696A (en) * 2014-11-06 2015-03-11 北京京东尚科信息技术有限公司 Virtual ball simulation and control method of mobile device
CN104407696B (en) * 2014-11-06 2016-10-05 北京京东尚科信息技术有限公司 The virtual ball simulation of mobile device and the method for control
CN107430437A (en) * 2015-02-13 2017-12-01 厉动公司 The system and method that real crawl experience is created in virtual reality/augmented reality environment
US11237625B2 (en) 2015-02-13 2022-02-01 Ultrahaptics IP Two Limited Interaction engine for creating a realistic experience in virtual reality/augmented reality environments
US11392212B2 (en) 2015-02-13 2022-07-19 Ultrahaptics IP Two Limited Systems and methods of creating a realistic displacement of a virtual object in virtual reality/augmented reality environments
CN104915010A (en) * 2015-06-28 2015-09-16 合肥金诺数码科技股份有限公司 Gesture recognition based virtual book flipping system
CN105528082B (en) * 2016-01-08 2018-11-06 北京暴风魔镜科技有限公司 Three dimensions and gesture identification tracking exchange method, device and system
CN105912102A (en) * 2016-03-31 2016-08-31 联想(北京)有限公司 Information processing method and electronic equipment
CN105929947A (en) * 2016-04-15 2016-09-07 济南大学 Scene situation perception based man-machine interaction method
CN107680164B (en) * 2016-08-01 2023-01-10 中兴通讯股份有限公司 Virtual object size adjusting method and device
CN107680164A (en) * 2016-08-01 2018-02-09 中兴通讯股份有限公司 A kind of virtual objects scale adjusting method and device
CN108014492A (en) * 2016-11-01 2018-05-11 宏达国际电子股份有限公司 virtual reality interactive processing method and electronic device
CN108014492B (en) * 2016-11-01 2021-01-26 宏达国际电子股份有限公司 Virtual reality interaction processing method and electronic device
WO2018177337A1 (en) * 2017-03-29 2018-10-04 北京市商汤科技开发有限公司 Method and apparatus for determining three-dimensional hand data, and electronic device
US11120254B2 (en) 2017-03-29 2021-09-14 Beijing Sensetime Technology Development Co., Ltd. Methods and apparatuses for determining hand three-dimensional data
CN108255351A (en) * 2017-12-22 2018-07-06 潍坊歌尔电子有限公司 Determining method and device, projecting apparatus, the optical projection system of user's finger location information
CN111713090A (en) * 2018-02-15 2020-09-25 奇跃公司 Mixed reality musical instrument
US11978171B2 (en) 2018-02-15 2024-05-07 Magic Leap, Inc. Mixed reality musical instrument
CN111713090B (en) * 2018-02-15 2023-02-17 奇跃公司 Mixed reality musical instrument
US11657585B2 (en) 2018-02-15 2023-05-23 Magic Leap, Inc. Mixed reality musical instrument
US11403825B2 (en) 2018-02-15 2022-08-02 Magic Leap, Inc. Mixed reality musical instrument
CN109407825A (en) * 2018-08-30 2019-03-01 百度在线网络技术(北京)有限公司 Interactive approach and device based on virtual objects
CN109344718A (en) * 2018-09-03 2019-02-15 先临三维科技股份有限公司 Finger tip recognition methods, device, storage medium and processor
CN109407838A (en) * 2018-10-17 2019-03-01 福建星网视易信息系统有限公司 Interface interaction method and computer readable storage medium
CN109461199A (en) * 2018-11-15 2019-03-12 腾讯科技(深圳)有限公司 Picture rendering method and device, storage medium and electronic device
CN109710077B (en) * 2018-12-30 2022-05-06 郑州畅想高科股份有限公司 Virtual object collision judgment method and device based on VR and locomotive practical training system
CN109710077A (en) * 2018-12-30 2019-05-03 郑州畅想高科股份有限公司 Dummy object collision judgment method, device and locomotive experience system based on VR
CN109739358B (en) * 2019-01-03 2022-05-24 京东方科技集团股份有限公司 Gesture collision detection method and device based on naked eye 3D
CN109739358A (en) * 2019-01-03 2019-05-10 京东方科技集团股份有限公司 Gesture collision checking method and equipment based on naked eye 3D
CN110838138A (en) * 2019-09-26 2020-02-25 北京迈格威科技有限公司 Repetitive texture detection method, device, computer equipment and storage medium
US11337023B2 (en) 2019-12-20 2022-05-17 Magic Leap, Inc. Physics-based audio and haptic synthesis
US11632646B2 (en) 2019-12-20 2023-04-18 Magic Leap, Inc. Physics-based audio and haptic synthesis
CN111340874A (en) * 2020-02-14 2020-06-26 芜湖启迪睿视信息技术有限公司 Goods shelf anti-collision detection method based on image semantic segmentation and 3D reconstruction
CN111340874B (en) * 2020-02-14 2023-05-19 芜湖启迪睿视信息技术有限公司 Goods shelf anti-collision detection method for image semantic segmentation and 3D reconstruction
CN113744803A (en) * 2020-05-29 2021-12-03 鸿富锦精密电子(天津)有限公司 Gene sequencing progress management method and device, computer device and storage medium
US11900912B2 (en) 2020-05-29 2024-02-13 Magic Leap, Inc. Surface appropriate collisions
US11636843B2 (en) 2020-05-29 2023-04-25 Magic Leap, Inc. Surface appropriate collisions
CN117075730A (en) * 2023-08-18 2023-11-17 广东早安文化发展有限公司 3D virtual exhibition hall control system based on image recognition technology
CN117075730B (en) * 2023-08-18 2024-04-30 广东早安文化发展有限公司 3D virtual exhibition hall control system based on image recognition technology

Also Published As

Publication number Publication date
CN102902355B (en) 2015-12-02

Similar Documents

Publication Publication Date Title
CN102902355B (en) The space interaction method of mobile device
KR102647351B1 (en) Modeling method and modeling apparatus using 3d point cloud
CN111126272B (en) Posture acquisition method, and training method and device of key point coordinate positioning model
CN108898676B (en) Method and system for detecting collision and shielding between virtual and real objects
US9594950B2 (en) Depth mapping with enhanced resolution
CN102697508B (en) Method for performing gait recognition by adopting three-dimensional reconstruction of monocular vision
CN105719352B (en) Face three-dimensional point cloud super-resolution fusion method and apply its data processing equipment
CN108256504A (en) A kind of Three-Dimensional Dynamic gesture identification method based on deep learning
CN105654492A (en) Robust real-time three-dimensional (3D) reconstruction method based on consumer camera
CN104933718A (en) Physical coordinate positioning method based on binocular vision
CN112257605B (en) Three-dimensional target detection method, system and device based on self-labeling training sample
CN104317391A (en) Stereoscopic vision-based three-dimensional palm posture recognition interactive method and system
CN103839277A (en) Mobile augmented reality registration method of outdoor wide-range natural scene
US20150138193A1 (en) Method and device for panorama-based inter-viewpoint walkthrough, and machine readable medium
CN103247045A (en) Method of obtaining artificial scene main directions and image edges from multiple views
CN102799646B (en) A kind of semantic object segmentation method towards multi-view point video
CN104794737A (en) Depth-information-aided particle filter tracking method
CN101794459A (en) Seamless integration method of stereoscopic vision image and three-dimensional virtual object
CN111126116A (en) Unmanned ship river channel garbage identification method and system
KR102001636B1 (en) Apparatus and method of processing a depth image using a relative angle between an image sensor and a target object
CN112070782A (en) Method and device for identifying scene contour, computer readable medium and electronic equipment
CN106022266A (en) Target tracking method and target tracking apparatus
CN103077536B (en) Space-time mutative scale moving target detecting method
CN105138979A (en) Method for detecting the head of moving human body based on stereo visual sense
CN116630423A (en) ORB (object oriented analysis) feature-based multi-target binocular positioning method and system for micro robot

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant