CN107256083A - Multi-finger real-time tracking method based on KINECT - Google Patents

Multi-finger real-time tracking method based on KINECT

Info

Publication number
CN107256083A
CN107256083A (application CN201710355932.1A)
Authority
CN
China
Prior art keywords
finger tip
finger
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710355932.1A
Other languages
Chinese (zh)
Inventor
卢光宏
童晶
尹薇娜
顾晨婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Campus of Hohai University
Original Assignee
Changzhou Campus of Hohai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Campus of Hohai University filed Critical Changzhou Campus of Hohai University
Priority to CN201710355932.1A priority Critical patent/CN107256083A/en
Publication of CN107256083A publication Critical patent/CN107256083A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Abstract

The invention discloses a multi-finger real-time tracking method based on KINECT. Step 1: hand region segmentation; Step 2: fingertip detection based on pixel classification; Step 3: acquisition of the three-dimensional fingertip positions; Step 4: three-dimensional fingertip trajectory tracking. The multi-finger real-time tracking method based on KINECT provided by the invention can track fingers in real time, quickly and accurately; the system is highly stable, the user experience is good, and the response speed is fast.

Description

Multi-finger real-time tracking method based on KINECT
Technical field
The present invention relates to a multi-finger real-time tracking method based on KINECT, and belongs to the technical field of computer-based biometric recognition.
Background technology
In virtual reality and human-computer interaction systems, real-time, accurate and stable multi-finger tracking brings users a friendly interactive experience. Owing to its rich application prospects, three-dimensional multi-finger tracking has received extensive attention from researchers. Its algorithms are broadly divided into four steps: segmentation of the hand region, detection of the fingertips, acquisition of the three-dimensional fingertip positions, and tracking of the fingertip positions and their trajectories. Among these, the segmentation of the hand region is a very important step, because it directly determines the accuracy of the fingertip detection algorithm. However, commonly used segmentation methods are easily affected by external conditions such as lighting, and tend to fail under poor illumination. Some researchers therefore use infrared cameras to obtain a reliable hand segmentation region, but infrared cameras are expensive. Other researchers obtain good segmentation regions by imposing restrictive conditions, such as a fixed background or fixed illumination, but this limits the application scenarios of finger tracking. Fingertip detection algorithms fall mainly into three classes: contour-based, shape-based and template-based. The drawback of contour-based methods is that they require a fairly accurate hand contour, otherwise errors occur; template-matching methods share this drawback and are also slow. As for shape-based methods, it is difficult to determine a suitable operating window size.
With the release of Microsoft's KINECT device, this inexpensive piece of hardware can capture the colour and depth information of a scene in real time. Some researchers have attempted to study multi-finger tracking with KINECT, but they all design more robust two-dimensional fingertip detection algorithms on the basis of the depth map provided by KINECT, and do not, after obtaining the two-dimensional fingertip points, continue to use the depth values provided by KINECT to track the three-dimensional coordinates and trajectories of the fingertip points.
The algorithm implemented by this software segments the hand region using the KINECT depth map and, according to the characteristics of KINECT data, proposes an improved fingertip detection algorithm based on pixel classification. On this basis, the KINECT depth map is used again to obtain the three-dimensional coordinates of the fingertips, and stable tracking of the three-dimensional fingertip positions and trajectories is then achieved by applying a Kalman filter that exploits the continuity between frames. By comparison, the algorithm of the present invention can stably track the three-dimensional positions of the fingertips, and can realize not only 2D human-computer interaction but also effective 3D human-computer interaction. The algorithm is simpler and faster: on an ordinary PC platform it reaches 20+ FPS even while tracking all fingers of both hands, and 40+ FPS while tracking a single hand.
Content of the invention
Purpose: in order to overcome the deficiencies of the prior art, the present invention provides a multi-finger real-time tracking method based on KINECT.
Technical scheme: the present invention is realized through the following technical solution:
A multi-finger real-time tracking method based on KINECT, comprising the following steps:
Step 1: hand region segmentation;
Step 2: fingertip detection based on pixel classification;
Step 3: acquisition of the three-dimensional fingertip positions;
Step 4: three-dimensional fingertip trajectory tracking.
The hand region segmentation comprises the following steps:
1a: The center position of the hand is tracked using the NITE library; then, according to the depth value of the center coordinate, the hand region is extracted from the depth map:
handmask(x, y) = 255 if |Z(x, y) - Z_hand| < thresh, and handmask(x, y) = 0 otherwise
where Z_hand represents the depth value of the tracked hand center point, Z(x, y) represents the depth value at pixel (x, y), thresh represents the maximum depth range of the hand, and handmask(x, y) is the discriminant function of the hand region;
1b: The pixels retained in step 1a are further restricted to a bounding box containing the hand region: the hand region inside the bounding box keeps handmask(x, y) = 255, and the region outside the bounding box is set to handmask(x, y) = 0;
where W(Z) represents the bounding box centered on the center point of the hand and Z is the depth value of the corresponding coordinate; the relation between the bounding-box size and the distance of the hand from the KINECT is as follows:
Width(W(Z)) = Height(W(Z)) = 2*min{max{80 - 0.2*(Z - 640), 60}, 80}
where Width(W(Z)) and Height(W(Z)) represent the horizontal width and vertical height of the bounding box in pixels, respectively.
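As an illustration of steps 1a and 1b, the following sketch (not part of the patent; the function and parameter names are invented) thresholds the depth map around the tracked hand center and then clears everything outside the depth-dependent bounding box:

```python
import numpy as np

def segment_hand(depth, z_hand, cx, cy, thresh=100):
    """Return a binary hand mask (255 inside the hand, 0 elsewhere).

    depth  : HxW depth map in mm (e.g. from a KINECT)
    z_hand : depth value of the tracked hand center point, in mm
    cx, cy : pixel coordinates of the hand center (supplied by the NITE tracker)
    thresh : maximum depth range of the hand, in mm (assumed value)
    """
    # Step 1a: keep pixels whose depth lies within `thresh` of the hand center.
    mask = (np.abs(depth.astype(np.int32) - z_hand) < thresh).astype(np.uint8) * 255

    # Step 1b: square bounding box whose side shrinks linearly with distance,
    # i.e. Width = Height = 2*min{max{80 - 0.2*(Z - 640), 60}, 80} pixels.
    half = int(min(max(80 - 0.2 * (z_hand - 640), 60), 80))
    boxed = np.zeros_like(mask)
    y0, y1 = max(cy - half, 0), min(cy + half, mask.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, mask.shape[1])
    boxed[y0:y1, x0:x1] = mask[y0:y1, x0:x1]  # zero everything outside the box
    return boxed
```

All later steps (fingertip detection, 3D lookup) would operate only on the returned mask.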
The fingertip detection based on pixel classification comprises the following steps:
2a: The KINECT depth values are discretized into six intervals: [501mm,600mm], [601mm,700mm], [701mm,800mm], [801mm,900mm], [901mm,1000mm], [1001mm,1100mm]; let the gap1 and gap2 values of each interval be gap1i and gap2i respectively. The fingertip position is obtained from the finger contour provided by KINECT: a circle of radius 20px is drawn with the fingertip as center, and the distance between its intersection points with the contour gives the finger width FLi. For the six intervals, FLi takes 10px, 10px, 12px, 13px, 14px and 15px respectively. Letting delta = 3px, the two circle radii are obtained as:
gap1i = FLi/2 - delta
gap2i = FLi/2 + delta
2b: Pixel-level classification is applied to the binary image according to the result of 2a: a. if, taking pixel p as the center, the proportion of hand region within the circle of radius gap1 is not more than half of the circle's area, pixel p is judged to be outside the hand region; b. if, taking pixel p as the center, the proportion of hand region within the circle of radius gap1 is more than half of the circle's area, pixel p is judged to be inside the hand region; c. if, on the premise that b holds, the circle of radius gap2 centered on p has four intersection points with the hand region, pixel p is judged to be in a finger region; d. if, on the premise that b holds, the circle of radius gap2 centered on p has two intersection points with the hand region and the distance between the two intersection points is less than the width FLi of the corresponding finger, pixel p is judged to be in a fingertip region.
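The four rules of step 2b form a small decision tree (cf. Fig. 4). The sketch below is an illustrative rasterized stand-in, not the patented implementation: the area test samples a disc, the intersection test counts state changes along a sampled circle, and all names are invented:

```python
import numpy as np

def circle_points(cx, cy, r, n=72):
    """Sample n points on the circle of radius r around (cx, cy)."""
    a = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return (cx + r * np.cos(a)).astype(int), (cy + r * np.sin(a)).astype(int)

def classify_pixel(mask, cx, cy, gap1, gap2, fl):
    """Classify pixel p=(cx, cy) as 'outside', 'hand', 'finger' or 'fingertip'.

    mask : binary hand mask (255 = hand); gap1, gap2, fl per depth interval.
    """
    h, w = mask.shape
    # Rules a/b: fraction of the gap1-disc covered by the hand region.
    ys, xs = np.ogrid[-gap1:gap1 + 1, -gap1:gap1 + 1]
    disc = xs ** 2 + ys ** 2 <= gap1 ** 2
    yy = np.clip(np.arange(cy - gap1, cy + gap1 + 1), 0, h - 1)
    xx = np.clip(np.arange(cx - gap1, cx + gap1 + 1), 0, w - 1)
    patch = mask[np.ix_(yy, xx)] == 255
    if patch[disc].mean() <= 0.5:
        return "outside"                       # rule a
    # Rules c/d: boundary crossings of the gap2-circle approximate the
    # number of intersection points with the hand region.
    px, py = circle_points(cx, cy, gap2)
    on = mask[np.clip(py, 0, h - 1), np.clip(px, 0, w - 1)] == 255
    crossings = int(np.sum(on != np.roll(on, 1)))
    if crossings == 4:
        return "finger"                        # rule c
    if crossings == 2:
        idx = np.where(on != np.roll(on, 1))[0]
        d = float(np.hypot(px[idx[0]] - px[idx[1]], py[idx[0]] - py[idx[1]]))
        if d < fl:
            return "fingertip"                 # rule d
    return "hand"                              # rule b
```

With a synthetic vertical "finger" strip of width 11px and FLi = 12px (so gap1 = 3, gap2 = 9), a pixel beside the strip classifies as outside, a mid-strip pixel as finger, and a pixel near the strip's end as fingertip.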
To obtain the three-dimensional position of a fingertip, all points of the circle of radius 3px around pixel p in the xy-plane within the fingertip region are sampled; any sampled point whose depth value differs from that of the center point by more than 10px is cast out, and the mean of the remaining sampled depth values is taken as the depth value of the center point.
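A sketch of this outlier-robust depth sampling (illustrative only; the 16-point circle sampling and the function name are assumptions, while the 3px radius and the rejection threshold of 10 come from the text):

```python
import numpy as np

def fingertip_depth(depth, cx, cy, radius=3, reject=10):
    """Robust depth at fingertip (cx, cy): mean of on-circle samples,
    discarding samples whose depth differs from the center by more than `reject`."""
    z0 = float(depth[cy, cx])
    a = np.linspace(0, 2 * np.pi, 16, endpoint=False)
    xs = np.clip((cx + radius * np.cos(a)).astype(int), 0, depth.shape[1] - 1)
    ys = np.clip((cy + radius * np.sin(a)).astype(int), 0, depth.shape[0] - 1)
    z = depth[ys, xs].astype(float)
    keep = np.abs(z - z0) <= reject          # cast out far-off samples
    return float(z[keep].mean()) if keep.any() else z0
```

Together with the pixel coordinates of p this gives the fingertip's (x, y, z) used by the tracker.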
The three-dimensional fingertip trajectory tracking comprises the following steps:
4a: Construct the Kalman filter and predict the fingertip point positions:
First, in each frame, the position and velocity of each fingertip point in three-dimensional space are measured. The state vector of the Kalman filter is defined as
x_t = (x(t), y(t), z(t), v_x(t), v_y(t), v_z(t))
where x(t), y(t), z(t) denote the coordinate position of the fingertip point in three-dimensional space, and v_x(t), v_y(t), v_z(t) denote the velocity of the fingertip point in each frame. Similarly, the observation vector of the Kalman filter is defined as
y_t = (x(t), y(t), z(t))
where the observation vector represents the three-dimensional coordinates of the fingertip point in each frame;
The Kalman filter equations are:
x_{t+1} = F x_t + G w_t
y_t = H x_t + v_t
where F is the state-transition matrix, G is the driving matrix, H is the observation matrix, w_t is the system error of the state vector x_t, and v_t is the observation error, representing the error of the fingertip detection algorithm and of the KINECT depth measurement;
Next, assuming ΔT < 0.05s, the motion of a fingertip point between two consecutive frames is approximated as uniform linear motion; F, G and H are defined as follows:
F = [1 0 0 ΔT 0 0; 0 1 0 0 ΔT 0; 0 0 1 0 0 ΔT; 0 0 0 1 0 0; 0 0 0 0 1 0; 0 0 0 0 0 1]
G = [0 0 0 1 0 0; 0 0 0 0 1 0; 0 0 0 0 0 1]^T
H = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0]
Finally, using the Kalman filter of each finger, the position of the finger at frame t+1 is predicted from its position at frame t;
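A constant-velocity Kalman filter of this form (state: 3D position plus velocity; observation: 3D position; F, G, H as defined in the claims) can be sketched in plain numpy. The frame interval DT and the noise magnitudes q and r are illustrative assumptions, not values from the patent:

```python
import numpy as np

DT = 1.0 / 30.0  # assumed frame interval, satisfying the dT < 0.05s condition

# State x = (x, y, z, vx, vy, vz); constant-velocity transition.
F = np.eye(6)
F[0, 3] = F[1, 4] = F[2, 5] = DT
G = np.vstack([np.zeros((3, 3)), np.eye(3)])   # system noise drives the velocities
H = np.hstack([np.eye(3), np.zeros((3, 3))])   # only the position is observed

class FingertipKF:
    def __init__(self, pos, q=1e-2, r=1e-1):
        self.x = np.concatenate([pos, np.zeros(3)])  # initial state, zero velocity
        self.P = np.eye(6)                           # state covariance
        self.Q = G @ (q * np.eye(3)) @ G.T           # process noise (assumed q)
        self.R = r * np.eye(3)                       # measurement noise (assumed r)

    def predict(self):
        """Propagate one frame ahead; returns the predicted 3D position."""
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q
        return H @ self.x

    def update(self, z):
        """Correct the state with a matched detected fingertip position z."""
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ (np.asarray(z) - H @ self.x)
        self.P = (np.eye(6) - K @ H) @ self.P
```

One such filter is kept per tracked finger; `predict()` supplies the position that is matched against the detections of the current frame.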
4b: Matching and tracking between predicted fingertip points and detected fingertip points:
In the current frame, all fingertip points are detected using the pixel-classification fingertip detection algorithm; at the same time, the Kalman filter predicts, from the previous frame, the positions of the fingertip points in the current frame. The predicted fingertip positions and the detected fingertip positions are each sorted clockwise around the hand center point and then matched combination by combination; the combination with the minimum matching error is taken as the final result, where the matching error is measured by the Euclidean distance between the three-dimensional positions of predicted and detected fingertip points. Afterwards, the predicted positions of the Kalman filters in the minimum-error combination are used in place of the detected positions, and tracking continues in this manner;
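The combination matching of step 4b can be sketched as a brute-force search over order-preserving assignments (fine for up to ten fingers). The function name and the exact error definition (sum of pairwise Euclidean distances) are assumptions for illustration:

```python
import numpy as np
from itertools import combinations

def match_fingertips(predicted, detected):
    """Match predicted to detected 3D fingertip points.

    Both lists are assumed already sorted clockwise around the hand center,
    so order is preserved and only combinations, not permutations, are searched.
    Returns ((pred_indices, det_indices), total_error).
    """
    predicted, detected = np.asarray(predicted), np.asarray(detected)
    k = min(len(predicted), len(detected))
    best, best_err = None, np.inf
    # Try every order-preserving assignment of k predictions to k detections.
    for pi in combinations(range(len(predicted)), k):
        for di in combinations(range(len(detected)), k):
            err = np.linalg.norm(predicted[list(pi)] - detected[list(di)],
                                 axis=1).sum()
            if err < best_err:
                best, best_err = (pi, di), err
    return best, best_err
```

The minimum-error pairing decides which Kalman filter each detection updates; a spurious extra detection (large distance to every prediction) is simply left unmatched.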
4c: Handling changes in the number of fingertips during tracking:
In the first case, the operator really changes the number of fingertips: when the fingertip count increases, a new Kalman filter is opened to track the new fingertip; when the fingertip count decreases, the corresponding Kalman filter is terminated;
In the second case, due to error and motion blur, some fingertip points are not detected in certain frames, or extra fingertip points are falsely detected. First, a timer is set to measure the time from the moment the fingertip count starts to change; then, if the measured time exceeds a certain threshold, the change in fingertip count is judged to be of the first kind; otherwise it is judged to be of the second kind and the Kalman filters are left unchanged.
Preferably, the timer threshold is set to 0.2s.
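This timer logic can be sketched as a small state machine; the class and method names are invented, and only the 0.2 s threshold comes from the text:

```python
import time

class FingerCountMonitor:
    """Distinguish a real change in fingertip count (stable across frames)
    from a transient detection error, using the 0.2 s timer from the text."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold
        self.stable_count = None
        self.pending = None        # (count, start_time) of an observed change

    def observe(self, count, now=None):
        """Feed the detected fingertip count for the current frame.
        Returns the confirmed (stable) count."""
        now = time.monotonic() if now is None else now
        if self.stable_count is None:
            self.stable_count = count
        if count == self.stable_count:
            self.pending = None                    # change did not persist
        elif self.pending is None or self.pending[0] != count:
            self.pending = (count, now)            # start the timer
        elif now - self.pending[1] > self.threshold:
            self.stable_count = count              # case 1: real change
            self.pending = None
        return self.stable_count
```

Filters would be opened or terminated only when `stable_count` actually changes, so a one-frame flicker never disturbs the trackers.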
Beneficial effects: the multi-finger real-time tracking method based on KINECT provided by the present invention can track fingers in real time, quickly and accurately; the system is highly stable, the user experience is good, and the response speed is fast.
Brief description of the drawings
Fig. 1 is the system flow chart of the tracking method of the present invention;
Fig. 2 is a schematic diagram of hand region segmentation;
Fig. 3 is a schematic diagram of pixel classification;
Fig. 4 is a schematic diagram of the pixel-classification decision tree.
Embodiment
The present invention is further described below in conjunction with the accompanying drawings.
As shown in Fig. 1, a multi-finger real-time tracking method based on KINECT comprises the following steps:
Step 1: obtain the depth image of the person with KINECT;
Step 2: segment the hand region;
Step 3: detect the two-dimensional fingertip positions;
Step 4: obtain the three-dimensional fingertip positions;
Step 5: track the three-dimensional fingertip positions to obtain their motion trajectories.
As shown in Fig. 2, after KINECT obtains the depth image, the hand region is separated by means of a bounding box. The hand region segmentation comprises the following steps:
2a: The center position of the hand is tracked using the NITE library; then, according to the depth value of the center coordinate, the hand region is extracted from the depth map:
handmask(x, y) = 255 if |Z(x, y) - Z_hand| < thresh, and handmask(x, y) = 0 otherwise
where Z_hand represents the depth value of the tracked hand center point, Z(x, y) represents the depth value at pixel (x, y), thresh represents the maximum depth range of the hand, and handmask(x, y) is the discriminant function of the hand region;
2b: The pixels retained in step 2a are further restricted to a bounding box containing the hand region: the hand region inside the bounding box keeps handmask(x, y) = 255, and the region outside the bounding box is set to handmask(x, y) = 0;
where W(Z) represents the bounding box centered on the center point of the hand and Z is the depth value of the corresponding coordinate; the relation between the bounding-box size and the distance of the hand from the KINECT is as follows:
Width(W(Z)) = Height(W(Z)) = 2*min{max{80 - 0.2*(Z - 640), 60}, 80}
where Width(W(Z)) and Height(W(Z)) represent the horizontal width and vertical height of the bounding box respectively; all later calculations are performed within this region.
The detection of the two-dimensional fingertip positions comprises the following steps:
3a: The KINECT depth values are discretized into six intervals: [501mm,600mm], [601mm,700mm], [701mm,800mm], [801mm,900mm], [901mm,1000mm], [1001mm,1100mm]; let the gap1 and gap2 values of each interval be gap1i and gap2i respectively. The fingertip position is obtained from the finger contour provided by KINECT: a circle of radius 20px is drawn with the fingertip as center, and the distance between its intersection points with the contour gives the finger width FLi. For the six intervals, FLi takes 10px, 10px, 12px, 13px, 14px and 15px respectively. Letting delta = 3px, the two circle radii are obtained as:
gap1i = FLi/2 - delta
gap2i = FLi/2 + delta
3b: As shown in Figs. 3 and 4, pixel-level classification is applied to the binary image according to the result of 3a. As shown in Fig. 3(a), if, taking pixel p as the center, the proportion of hand region within the circle of radius gap1 is not more than half of the circle's area, pixel p is judged to be outside the hand region; as shown in Fig. 3(b), if that proportion is more than half of the circle's area, pixel p is judged to be inside the hand region; as shown in Fig. 3(c), if, on the premise that b holds, the circle of radius gap2 centered on p has four intersection points with the hand region, pixel p is judged to be in a finger region; as shown in Fig. 3(d), if, on the premise that b holds, the circle of radius gap2 centered on p has two intersection points with the hand region and the distance between the two intersection points is less than the width FLi of the corresponding finger, pixel p is judged to be in a fingertip region.
The acquisition of the three-dimensional fingertip positions comprises the following step:
All points of the circle of radius 3px around pixel p in the xy-plane within the fingertip region are sampled; any sampled point whose depth value differs from that of the center point by more than 10px is cast out, and the mean of the remaining sampled depth values is taken as the depth value of the center point.
The three-dimensional fingertip trajectory tracking comprises the following steps:
5a: Construct the Kalman filter and predict the fingertip point positions:
First, in each frame, the position and velocity of each fingertip point in three-dimensional space are measured. The state vector of the Kalman filter is defined as
x_t = (x(t), y(t), z(t), v_x(t), v_y(t), v_z(t))
where x(t), y(t), z(t) denote the coordinate position of the fingertip point in three-dimensional space, and v_x(t), v_y(t), v_z(t) denote the velocity of the fingertip point in each frame. Similarly, the observation vector of the Kalman filter is defined as
y_t = (x(t), y(t), z(t))
where the observation vector represents the three-dimensional coordinates of the fingertip point in each frame;
The Kalman filter equations are:
x_{t+1} = F x_t + G w_t
y_t = H x_t + v_t
where F is the state-transition matrix, G is the driving matrix, H is the observation matrix, w_t is the system error of the state vector x_t, and v_t is the observation error, representing the error of the fingertip detection algorithm and of the KINECT depth measurement;
Next, assuming ΔT < 0.05s, the motion of a fingertip point between two consecutive frames is approximated as uniform linear motion; F, G and H are defined as follows:
F = [1 0 0 ΔT 0 0; 0 1 0 0 ΔT 0; 0 0 1 0 0 ΔT; 0 0 0 1 0 0; 0 0 0 0 1 0; 0 0 0 0 0 1]
G = [0 0 0 1 0 0; 0 0 0 0 1 0; 0 0 0 0 0 1]^T
H = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0]
Finally, using the Kalman filter of each finger, the position of the finger at frame t+1 is predicted from its position at frame t;
5b: Matching and tracking between predicted fingertip points and detected fingertip points:
In the current frame, all fingertip points are detected using the pixel-classification fingertip detection algorithm; at the same time, the Kalman filter predicts, from the previous frame, the positions of the fingertip points in the current frame. In practice, however, the pixel-classification fingertip detection carries a certain random error, and the prediction also carries a certain empirical error. To reduce the amount of computation, and considering that an operator's fingers rarely cross, the present invention sorts the fingertip points clockwise around the hand center point; the order of the fingers is then fixed during matching, so the search space shrinks from the level of permutations to the level of combinations, which greatly reduces the computation. The predicted fingertip positions and the detected fingertip positions, each sorted clockwise around the hand center point, are matched combination by combination, and the combination with the minimum matching error is taken as the final result, where the matching error is measured by the Euclidean distance between the three-dimensional positions of predicted and detected fingertip points. Afterwards, the predicted positions of the Kalman filters in the minimum-error combination are used in place of the detected positions, and tracking continues in this manner.
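The clockwise ordering around the hand center point, which fixes the finger order and shrinks the matching search space from permutations to combinations, can be sketched as a simple angular sort. This helper is hypothetical; note that whether "clockwise" corresponds to ascending or descending atan2 angle depends on the image y-axis convention:

```python
import numpy as np

def sort_clockwise(points, center):
    """Sort 2D fingertip points clockwise around the hand center point.

    Returns the sorted points and the ordering indices. Fixing this order
    means matched fingers always appear in the same sequence, so the
    prediction/detection matching only searches combinations.
    """
    pts = np.asarray(points, dtype=float)
    c = np.asarray(center, dtype=float)
    # atan2 gives counter-clockwise angles in math convention; negate to
    # obtain a clockwise ordering (flip the sign if the y-axis points down).
    ang = np.arctan2(pts[:, 1] - c[1], pts[:, 0] - c[0])
    order = np.argsort(-ang)
    return pts[order], order
```

Applying the same sort to both the predicted and the detected fingertip sets before matching is what keeps the two sequences index-aligned.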
5c: Handling changes in the number of fingertips:
There are two cases of fingertip count change. In the first case, the operator really changes the number of fingertips. In the second case, due to error and motion blur, some fingertip points are not detected in certain frames, or extra fingertip points are falsely detected.
In the first case, when the fingertip count increases, a new Kalman filter is opened to track the new fingertip; when the fingertip count decreases, the corresponding Kalman filter is terminated;
The second case cannot be handled that way. To solve this problem, the present invention distinguishes the two kinds of fingertip count change. The first kind is a stable change, which persists across subsequent frames; the second kind is a temporary, unstable change, which does not persist across frames. Based on this difference, the present invention uses the following steps: first, a timer is set to measure the time from the moment the fingertip count starts to change; next, if the measured time exceeds a threshold (the timer threshold is set to 0.2s), the change is judged to be of the first kind; otherwise it is judged to be of the second kind and the Kalman filters are left unchanged, which greatly improves the stability of the tracking.
The above is only the preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention.

Claims (6)

1. A multi-finger real-time tracking method based on KINECT, characterized in that it comprises the following steps:
Step 1: hand region segmentation;
Step 2: fingertip detection based on pixel classification;
Step 3: acquisition of the three-dimensional fingertip positions;
Step 4: three-dimensional fingertip trajectory tracking.
2. The multi-finger real-time tracking method based on KINECT according to claim 1, characterized in that the hand region segmentation comprises the following steps:
1a: the center position of the hand is tracked using the NITE library; then, according to the depth value of the center coordinate, the hand region is extracted from the depth map:
handmask(x, y) = 255 if |Z(x, y) - Z_hand| < thresh, and handmask(x, y) = 0 otherwise
where Z_hand represents the depth value of the tracked hand center point, Z(x, y) represents the depth value at pixel (x, y), thresh represents the maximum depth range of the hand, and handmask(x, y) is the discriminant function of the hand region;
1b: the pixels retained in step 1a are further restricted to a bounding box containing the hand region: the hand region inside the bounding box keeps handmask(x, y) = 255, and the region outside the bounding box is set to handmask(x, y) = 0;
where W(Z) represents the bounding box centered on the center point of the hand and Z is the depth value of the corresponding coordinate; the relation between the bounding-box size and the distance of the hand from the KINECT is as follows:
Width(W(Z)) = Height(W(Z)) = 2*min{max{80 - 0.2*(Z - 640), 60}, 80}
where Width(W(Z)) and Height(W(Z)) represent the horizontal width and vertical height of the bounding box respectively.
3. The multi-finger real-time tracking method based on KINECT according to claim 1, characterized in that the fingertip detection based on pixel classification comprises the following steps:
2a: the KINECT depth values are discretized into six intervals: [501mm,600mm], [601mm,700mm], [701mm,800mm], [801mm,900mm], [901mm,1000mm], [1001mm,1100mm]; let the gap1 and gap2 values of each interval be gap1i and gap2i respectively; the fingertip position is obtained from the finger contour provided by KINECT: a circle of radius 20px is drawn with the fingertip as center, and the distance between its intersection points with the contour gives the finger width FLi; for the six intervals, FLi takes 10px, 10px, 12px, 13px, 14px and 15px respectively; letting delta = 3px, the two circle radii are obtained as:
gap1i = FLi/2 - delta
gap2i = FLi/2 + delta
2b: pixel-level classification is applied to the binary image according to the result of 2a: a. if, taking pixel p as the center, the proportion of hand region within the circle of radius gap1 is not more than half of the circle's area, pixel p is judged to be outside the hand region; b. if, taking pixel p as the center, the proportion of hand region within the circle of radius gap1 is more than half of the circle's area, pixel p is judged to be inside the hand region; c. if, on the premise that b holds, the circle of radius gap2 centered on p has four intersection points with the hand region, pixel p is judged to be in a finger region; d. if, on the premise that b holds, the circle of radius gap2 centered on p has two intersection points with the hand region and the distance between the two intersection points is less than the width FLi of the corresponding finger, pixel p is judged to be in a fingertip region.
4. The multi-finger real-time tracking method based on KINECT according to claim 1, characterized in that, to obtain the three-dimensional position of a fingertip, all points of the circle of radius 3px around pixel p in the xy-plane within the fingertip region are sampled; any sampled point whose depth value differs from that of the center point by more than 10px is cast out, and the mean of the remaining sampled depth values is taken as the depth value of the center point.
5. The multi-finger real-time tracking method based on KINECT according to claim 1, characterized in that the three-dimensional fingertip trajectory tracking comprises the following steps:
4a: construct the Kalman filter and predict the fingertip point positions:
First, in each frame, the position and velocity of each fingertip point in three-dimensional space are measured. The state vector of the Kalman filter is defined as
x_t = (x(t), y(t), z(t), v_x(t), v_y(t), v_z(t))
where x(t), y(t), z(t) denote the coordinate position of the fingertip point in three-dimensional space, and v_x(t), v_y(t), v_z(t) denote the velocity of the fingertip point in each frame. Similarly, the observation vector of the Kalman filter is defined as
y_t = (x(t), y(t), z(t))
where the observation vector represents the three-dimensional coordinates of the fingertip point in each frame;
The Kalman filter equations are:
x_{t+1} = F x_t + G w_t
y_t = H x_t + v_t
where F is the state-transition matrix, G is the driving matrix, H is the observation matrix, w_t is the system error of the state vector x_t, and v_t is the observation error, representing the error of the fingertip detection algorithm and of the KINECT depth measurement;
Next, assuming ΔT < 0.05s, the motion of a fingertip point between two consecutive frames is approximated as uniform linear motion; F, G and H are defined as follows:
F = [1 0 0 ΔT 0 0; 0 1 0 0 ΔT 0; 0 0 1 0 0 ΔT; 0 0 0 1 0 0; 0 0 0 0 1 0; 0 0 0 0 0 1]
G = [0 0 0 1 0 0; 0 0 0 0 1 0; 0 0 0 0 0 1]^T
H = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0]
Finally, using the Kalman filter of each finger, by finger when the position prediction finger of t frames is in t+1 frames Position;
4b:Predict the matched jamming between finger tip point and detection finger tip point:
In present frame, all finger tip points of present frame are detected using the Fingertip Detection based on pixel classifications;Meanwhile, utilize Kalman filter predicts the position where present frame finger tip point by previous frame;Will prediction finger tip point position and detection finger tip The minimum combination of matching error is found as final result in point position by combinations matches are done after hand central point arranged clockwise, Value with error is the difference of the Euclidean distance of finger tip point three-dimensional position;Afterwards, the card in the minimum combination of matching error is utilized The predicted position of Thalmann filter replaces actual position, and so tracking is gone down always;
4c: Handle tracking when the number of fingertips changes
In the first case, the operator really changes the number of fingertips: when the fingertip count increases, a new Kalman filter is started to track the new fingertip; when the fingertip count decreases, the corresponding Kalman filter is terminated;
In the second case, some fingertip points go undetected in some frames, or spurious fingertip points are falsely detected, owing to detection error and motion blur. First, a timer is started to measure how long the fingertip count has been changed; then, if the measured time exceeds a threshold, the change in fingertip count is judged to be the first case; otherwise it is judged to be the second case, and the Kalman filters are left unchanged.
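The timer logic of step 4c might be organized as below. The class name, return labels, and the use of a monotonic clock are illustrative assumptions; the 0.2 s threshold is the value stated in claim 6.

```python
import time

TIMER_THRESHOLD = 0.2  # seconds, per claim 6

class FingertipCountMonitor:
    """Decide whether a change in the detected fingertip count is real
    (case one: trackers must be added or removed) or transient detection
    noise (case two: the Kalman filters are left unchanged)."""

    def __init__(self, threshold=TIMER_THRESHOLD):
        self.threshold = threshold
        self.stable_count = None   # last confirmed fingertip count
        self.change_start = None   # timer start, None if no change pending

    def observe(self, count, now=None):
        now = time.monotonic() if now is None else now
        if self.stable_count is None:
            self.stable_count = count        # first observation
            return "init"
        if count == self.stable_count:
            self.change_start = None         # change vanished: case two
            return "unchanged"
        if self.change_start is None:
            self.change_start = now          # start the timer
            return "pending"
        if now - self.change_start > self.threshold:
            self.stable_count = count        # change persisted: case one
            self.change_start = None
            return "count_changed"
        return "pending"
```

A `"count_changed"` result is where the tracker would start a new Kalman filter (count increased) or terminate one (count decreased); `"pending"` and `"unchanged"` leave the filters untouched.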
6. The KINECT-based multi-finger real-time tracking method according to claim 5, characterized in that the threshold of the timer is set to 0.2 s.
CN201710355932.1A 2017-05-18 2017-05-18 Many finger method for real time tracking based on KINECT Pending CN107256083A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710355932.1A CN107256083A (en) 2017-05-18 2017-05-18 Many finger method for real time tracking based on KINECT


Publications (1)

Publication Number Publication Date
CN107256083A true CN107256083A (en) 2017-10-17

Family

ID=60028029



Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226387A (en) * 2013-04-07 2013-07-31 华南理工大学 Video fingertip positioning method based on Kinect
CN103941866A (en) * 2014-04-08 2014-07-23 河海大学常州校区 Three-dimensional gesture recognizing method based on Kinect depth image
US20140325455A1 (en) * 2013-04-26 2014-10-30 Ebay Inc. Visual 3d interactive interface
CN105739702A (en) * 2016-01-29 2016-07-06 电子科技大学 Multi-posture fingertip tracking method for natural man-machine interaction


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAN HAO: "Three-dimensional multi-finger tracking algorithm based on Kinect and its applications", China Master's Theses Full-text Database, Information Science and Technology *
YAN HAO et al.: "Real-time and stable three-dimensional multi-finger tracking algorithm based on Kinect", Journal of Computer-Aided Design & Computer Graphics *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491066A (en) * 2018-01-30 2018-09-04 歌尔科技有限公司 A kind of gesture interaction method and device
CN108919943A (en) * 2018-05-22 2018-11-30 南京邮电大学 A kind of real-time hand method for tracing based on depth transducer
CN108919943B (en) * 2018-05-22 2021-08-03 南京邮电大学 Real-time hand tracking method based on depth sensor
CN109272519A (en) * 2018-09-03 2019-01-25 先临三维科技股份有限公司 Determination method, apparatus, storage medium and the processor of nail outline
CN109272519B (en) * 2018-09-03 2021-06-01 先临三维科技股份有限公司 Nail contour determination method and device, storage medium and processor
CN109919983A (en) * 2019-03-16 2019-06-21 哈尔滨理工大学 A kind of Kalman filter towards the tracking of Kinect doctor visual angle
CN109919983B (en) * 2019-03-16 2021-05-14 哈尔滨理工大学 Kinect doctor visual angle tracking-oriented Kalman filter
CN110221687A (en) * 2019-04-30 2019-09-10 国网江苏省电力有限公司常州供电分公司 Fingertip motions tracking based on three-dimensional space mapping
CN110781761A (en) * 2019-09-29 2020-02-11 哈尔滨工程大学 Fingertip real-time tracking method with supervision link


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171017