CN110598510A - Vehicle-mounted gesture interaction technology - Google Patents

Vehicle-mounted gesture interaction technology

Info

Publication number
CN110598510A
CN110598510A
Authority
CN
China
Prior art keywords
point
points
value
depth
control method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810606708.XA
Other languages
Chinese (zh)
Other versions
CN110598510B (en)
Inventor
Zhou Qinna (周秦娜)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Point Cloud Intelligent Technology Co ltd
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201810606708.XA
Publication of CN110598510A
Application granted
Publication of CN110598510B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

A vehicle-mounted gesture interaction technology comprises the following steps: (1) identifying moving objects using an improved moving object detection algorithm; (2) judging whether the moving object identified in step (1) is a human palm by using a gesture recognition control method. The improved moving object detection algorithm comprises the following steps: 2.1 initialization; 2.2 detecting whether pixel points are motion points; 2.3 performing kmeans clustering on the motion points; 2.4 region growing; 2.5 region extraction and pixel-point updating. The gesture recognition control method comprises the following steps: 3.1 feature selection and model training; 3.2 judging whether the target image is a human palm. The feature selection and model training comprises the following steps: 3.1.1 collecting training data; 3.1.2 selecting sample points from the data to be trained; 3.1.3 calculating the optimal division value over all sample points; 3.1.4 establishing a random forest corresponding to the sample points based on the optimal division value calculation result.

Description

Vehicle-mounted gesture interaction technology
Technical Field
The invention relates to the field of image recognition and processing, in particular to a vehicle-mounted gesture interaction technology.
Background
With the progress of science and technology, the functions of automobiles increase day by day, their internal information systems grow ever more complex, and operation becomes correspondingly more complicated for users. Operating traditional automobile buttons and touch screens requires both eyes and hands, which affects driving safety. Voice interaction, although fast, is not accurate enough, because a running vehicle is noisy and the interference is large.
Inside the car, using gestures to interact with the vehicle is, compared with traditional buttons or voice interaction, fast, accurate, safe, and strongly resistant to interference.
The camera used in traditional vehicle-mounted gesture interaction is an rgb camera, which recognizes the hand by skin color. This approach has limitations: dark skin tones, dim or night-time conditions, and the color of the car seats all strongly interfere with gesture recognition based on an rgb camera. The invention adopts a depth camera and detects moving objects based on the basic principle of motion detection, drawing on the motion detection algorithms of traditional rgb cameras and improving on them, so as to detect moving objects more reliably.
The current common scheme for extracting the palm with a depth camera is based on a depth threshold: any object farther than a given distance value is discarded. In actual driving, however, the user's hands are usually at or below the level of the steering wheel, and the user cannot be required to raise the hand above the steering wheel when using the product; moreover, the user's own motion and other moving objects interfere with each other, making it difficult to determine whether a given moving object is a person's palm. In the invention, the camera shoots downward from the roof of the vehicle, so objects that move during driving, such as the steering wheel, the human body, the head and the shoulders, are all captured by the camera, and the person's hands may appear at any position in the camera's field of view.
Drawings
The accompanying drawings illustrate the main steps of the technical solution.
Fig. 1 gives an overview of the two parts of the present technical solution, a vehicle-mounted gesture interaction technology: identifying moving objects using an improved moving object detection algorithm, and then judging whether the recognized moving object is a human palm by using a gesture recognition control method.
Fig. 2 shows the main steps of the improved moving object detection algorithm used in the present solution to identify moving objects.
Fig. 3 shows a main step of the gesture recognition control method used in the present technical solution, which determines whether the recognized moving object is a human palm.
Fig. 4 shows the main steps of feature selection and model training.
Fig. 5 shows the main steps for determining whether the target image is a human palm.
Disclosure of Invention
The invention provides a vehicle-mounted gesture interaction technology, which comprises the following steps: (1) identifying moving objects using an improved moving object detection algorithm; (2) judging whether the moving object identified in step (1) is a human palm by using a gesture recognition control method.
Wherein the moving object detection algorithm comprises the following steps: initialization; detecting whether pixel points are motion points; performing kmeans clustering on the motion points; region growing; region extraction; and updating the pixel points.
The gesture recognition control method comprises the following steps: selecting characteristics and training a model; and judging whether the target image is a human palm.
Further, the gesture recognition control method forms part of the pixel-point updating step of the moving object detection algorithm: it judges whether the target image is a human hand; if so, the history information of the pixel points is updated so as to increase the variation of the depth information of the motion points, and if not, the information is kept unchanged, so that the motion pixel points are extracted more effectively the next time.
The invention combines a depth camera with gesture technology; the depth camera overcomes interference from illumination, skin color and ornaments inside the car. By drawing on the traditional rgb-camera motion detection algorithm and innovating on that basis, moving objects can be detected more reliably. In the invention, the depth camera shoots downward from the car roof, and the gesture recognition control method uses machine-learning random forest training with an innovation in the feature selection step of the decision tree, thereby judging whether the target image is a hand.
Detailed Description
To further explain the technical solution, the depth camera is combined with gesture technology: an improved moving object detection algorithm is used to detect a moving object, and a gesture recognition control method is then used to judge whether the recognized moving object is a human palm. The specific implementation is explained below with reference to the accompanying drawings.
Fig. 1 gives an overview of the two parts of the present technical solution, a vehicle-mounted gesture interaction technology: identifying moving objects using an improved moving object detection algorithm, and then judging whether the recognized moving object is a human palm by using a gesture recognition control method.
Fig. 2 shows the main steps of the improved moving object detection algorithm used in the present solution to identify moving objects.
Further, the improved moving object detection algorithm is characterized in that, in step 2.1, several dozen consecutive depth frames are obtained from the depth camera and a history library is created for each pixel point.
Further, the improved moving object detection algorithm is characterized in that, in step 2.2, for each pixel point in every frame acquired by the camera, it is determined whether the pixel point is a moving pixel point, specifically comprising the following steps: 2.2.1 setting a counter a to 0; 2.2.2 calculating the difference between the current depth value of the pixel point and a depth value in its history library, and adding 1 to the counter a if the difference is greater than a set threshold; 2.2.3 after performing step 2.2.2 for every record of the pixel point in the history library, setting the pixel point as a motion point if the value of the counter a is greater than a threshold.
Preferably, neither the set difference threshold nor the threshold against which the counter a is compared is uniquely fixed; both can be adjusted according to actual needs.
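As an illustration of step 2.2, the following minimal Python sketch flags a pixel as a motion point by counting how many of its history entries differ from the current depth by more than a threshold; the function name and the threshold values are illustrative choices, not values taken from the patent.

```python
def is_motion_point(current_depth, history, diff_threshold=30.0, count_threshold=5):
    """Step 2.2 sketch: compare the current depth of one pixel against its history
    library and flag it as a motion point when enough entries differ strongly.
    diff_threshold and count_threshold are illustrative, adjustable values."""
    a = 0                                    # step 2.2.1: counter a starts at 0
    for past_depth in history:               # step 2.2.2: one comparison per history record
        if abs(current_depth - past_depth) > diff_threshold:
            a += 1
    return a > count_threshold               # step 2.2.3: motion point if a exceeds its threshold
```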
Further, the improved moving object detection algorithm is characterized in that, after all the motion points are obtained, step 2.3 is executed to perform kmeans clustering on all the motion points, specifically comprising the following steps: 2.3.1 randomly selecting some of the pixel points as initial cluster centers; 2.3.2 assigning each of the remaining pixel points to the most similar cluster according to its similarity (distance) to the cluster centers from 2.3.1; 2.3.3 recalculating the center of each resulting new cluster, i.e., the mean of all objects in that cluster; 2.3.4 computing a criterion function: if it has converged, terminating the algorithm, otherwise repeating steps 2.3.2 to 2.3.4, thereby obtaining a set of categories; 2.3.5 for the categories obtained in step 2.3.4, each having a center point and its corresponding motion pixel points, setting a threshold on the number of elements per category, removing categories that do not reach the threshold, and then assigning a category serial number to each motion point.
Preferably, the threshold on the number of category elements is not uniquely fixed and can be adjusted according to actual needs.
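A minimal k-means sketch of step 2.3 in Python (NumPy assumed), with illustrative values for k, the minimum cluster size and the convergence tolerance; it clusters the (x, y) coordinates of the motion points and drops undersized categories.

```python
import numpy as np

def cluster_motion_points(points, k=5, min_cluster_size=50, iters=100, tol=1e-3):
    """Step 2.3 sketch: k-means over motion-point coordinates, then drop clusters
    whose element count is below a threshold. All parameter values are illustrative."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(0)
    centers = points[rng.choice(len(points), size=k, replace=False)]      # step 2.3.1

    for _ in range(iters):
        # step 2.3.2: assign every point to its nearest cluster center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 2.3.3: recompute each center as the mean of its members
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        # step 2.3.4: stop once the centers have converged
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers

    # final assignment with the converged centers
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # step 2.3.5: discard categories that are too small; the dict key acts as the serial number
    kept = [j for j in range(k) if np.sum(labels == j) >= min_cluster_size]
    return {j: points[labels == j] for j in kept}
```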
Further, the improved moving object detection algorithm is characterized in that, after the categories are obtained, step 2.4 is executed to perform region growing, specifically comprising the following steps: 2.4.1 comparing the depth value of a pixel already detected as a motion point with that of a nearby pixel to be examined; if the depth difference between the two is less than a set threshold, the new pixel is considered similar to the detected motion point and is therefore also set as a motion point; 2.4.2 according to step 2.4.1, if a new pixel point is judged to be a new motion point in two categories, the attributes of the two categories are similar, so the two categories are merged and given the same category serial number, until the detection of all motion points is completed.
Preferably, the threshold against which the depth difference is compared is not uniquely fixed and can be adjusted according to actual needs.
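The following sketch illustrates step 2.4 as a breadth-first region growing over a depth map with merging of touching categories; the 4-neighborhood, the depth threshold and the union-find bookkeeping are illustrative implementation choices rather than details taken from the patent.

```python
from collections import deque

def grow_regions(depth, labels, depth_threshold=20.0):
    """Step 2.4 sketch: `depth` is an HxW depth map, `labels` an HxW integer array
    with 0 for background and a positive category serial number for each motion point.
    Neighbours with a small depth difference become motion points; categories that
    meet through the same pixel are merged. The threshold is illustrative."""
    h, w = depth.shape
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y, x] > 0)
    merged = {}                                     # category -> representative category

    def find(c):                                    # follow merge links to the representative
        while merged.get(c, c) != c:
            c = merged[c]
        return c

    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            if abs(depth[ny, nx] - depth[y, x]) < depth_threshold:      # step 2.4.1
                if labels[ny, nx] == 0:
                    labels[ny, nx] = labels[y, x]   # the new pixel joins this category
                    queue.append((ny, nx))
                elif find(labels[ny, nx]) != find(labels[y, x]):
                    # step 2.4.2: the same pixel links two categories, so merge them
                    merged[find(labels[ny, nx])] = find(labels[y, x])
    return labels, merged
```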
Further, the improved moving object detection algorithm is characterized in that, after all the motion points are detected, it is judged whether the picture extracted in step 2.5 is a hand; if it is judged to be a hand, the history information of the pixel points is updated so as to increase the variation of the depth information of the motion points, and if not, the information is kept unchanged.
Fig. 3 shows the main steps of the gesture recognition control method used in the present technical solution, which determines whether the recognized moving object is a human palm. The method is characterized by comprising the following steps: 3.1 feature selection and model training; 3.2 judging whether the target image is a human palm.
Further, step 3.1 comprises the following steps: 3.1.2 selecting sample points from the data to be trained; 3.1.3 calculating the optimal division value over all sample points; 3.1.4 establishing a random forest corresponding to the sample points based on the optimal division value calculation result.
Further, step 3.1.1 precedes step 3.1.2 and is intended to collect training data: the images to be trained are obtained by at least two cameras, of which at least one is a depth camera and at least one is an rgb camera.
Preferably, the cameras are mounted at the roof and shoot downward. The recorded person wears blue gloves on both hands, sits in the middle of the car, and freely makes various gestures and actions, including driving actions such as turning the steering wheel and operating the hand brake. The positions of the blue pixels in the rgb camera are extracted to obtain the corresponding hand region in the depth camera, thereby annotating the palm data.
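A sketch of this annotation idea, assuming OpenCV and an rgb frame already registered to the depth frame: blue-glove pixels are thresholded in HSV and the mask marks the palm region in the depth map. The HSV range and the function name are illustrative assumptions.

```python
import cv2
import numpy as np

def label_palm_pixels(bgr_image, depth_image):
    """Step 3.1.1 annotation sketch: find blue-glove pixels in the color frame and
    carry the mask over to the aligned depth frame. OpenCV loads images as BGR."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    lower_blue = np.array([100, 80, 50])              # illustrative HSV range for blue
    upper_blue = np.array([130, 255, 255])
    mask = cv2.inRange(hsv, lower_blue, upper_blue)   # glove pixels -> positive samples
    positive_depths = depth_image[mask > 0]           # palm region in the depth map
    return mask, positive_depths
```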
Further, the gesture recognition control method is characterized in that, in step 3.1.2, the palm portion in the depth map is used as positive samples and the non-palm portion as negative samples, and the same number of pixel points is randomly selected from the positive and negative sample portions as the sample points to be trained.
Further, the gesture recognition control method is characterized in that the optimal division value calculation in step 3.1.3 comprises the following steps: 3.1.3.1 calculating the depth mean of each neighborhood of a sample point; 3.1.3.2 calculating the difference between the depth mean of each neighborhood and the depth value of the sample point; 3.1.3.3 calculating the information entropy; 3.1.3.4 obtaining the optimal division value.
Further, the gesture recognition control method is characterized in that the depth mean calculation in the step 3.1.3.1 includes the following steps: 3.1.3.1.1 randomly selecting a certain sample point P; 3.1.3.1.2 calculating the average of the depth values of a square neighborhood centered at P; 3.1.3.1.3 the average of the depth values of the neighborhood of all sample points is calculated based on the calculation method of step 3.1.3.1.2.
Further, the gesture recognition control method is characterized in that, in step 3.1.3.1.2, the average of the depth values of the square neighborhood centered at P is calculated as follows: the sizes of the neighborhoods of the point P are 3, 5, 7, 9, ..., 2n+1 in sequence, i.e., the number of pixels on one side of the square neighborhood, with P as the center point of the square neighborhood.
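A sketch of step 3.1.3.1, computing the mean depth of the square neighborhoods of sizes 3, 5, ..., 2n+1 around a sample point P; clipping at the image border and the default maximum n are illustrative choices.

```python
import numpy as np

def neighborhood_depth_means(depth, y, x, max_n=4):
    """Step 3.1.3.1 sketch: for sample point P = (y, x), return the mean depth of the
    (2n+1)x(2n+1) square neighborhoods for n = 1..max_n, keyed by side length."""
    h, w = depth.shape
    means = {}
    for n in range(1, max_n + 1):
        side = 2 * n + 1
        y0, y1 = max(0, y - n), min(h, y + n + 1)     # clip the window at the image border
        x0, x1 = max(0, x - n), min(w, x + n + 1)
        means[side] = float(depth[y0:y1, x0:x1].mean())
    return means
```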
Further, the gesture recognition control method is characterized in that the information entropy calculation in step 3.1.3.3 comprises: 3.1.3.3.1 dividing all positive and negative sample points into several equal parts, each containing positive and negative sample points in the same proportion; 3.1.3.3.2 for every sample point in one equal part, when the neighborhood is 3, there is a difference d between the depth mean of the 3-neighborhood and the depth of the point; 3.1.3.3.3 each difference d can divide the set of differences into two parts, one larger than d and one smaller than d; 3.1.3.3.4 the final information entropy s is obtained according to the information entropy formula; 3.1.3.3.5 based on the calculation method of steps 3.1.3.3.2 to 3.1.3.3.4, the corresponding information entropy is obtained when the neighborhood is 5, 7, 9, ..., 2n+1.
Preferably, the information entropy is defined as follows: what is considered is not the uncertainty of a single symbol of the source but the average uncertainty over all possible symbols. If the source symbol takes n values U1, ..., Ui, ..., Un with corresponding probabilities P1, ..., Pi, ..., Pn, and the symbols occur independently of each other, then the average uncertainty of the source is the statistical average E of the single-symbol uncertainty -log Pi, which is called the information entropy: H(U) = E[-log Pi] = -(P1 log P1 + P2 log P2 + ... + Pn log Pn). The logarithm is usually taken to base 2 and the unit is the bit; other logarithm bases (with their corresponding units) may be chosen and converted via the change-of-base formula.
Further, as an example, for 100 sample points in the 3-neighborhood, 100 difference values can be calculated and recorded as (d1, d2, d3, ..., d100). k values (0 < k < 100) are randomly selected from the 100 difference values, and for each selected difference taken as a division value, the entropy score of splitting the differences at that value is obtained from the definition of entropy and recorded as S. (For instance, if the split leaves 30 points on the left and 70 points on the right, the information entropy is S = -(0.3 log(0.3) + 0.7 log(0.7)).)
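A sketch of the entropy score described above: a candidate division value d splits the set of depth differences into two parts, and the entropy of the resulting proportions is computed (base-2 logarithm assumed, matching the 30/70 example).

```python
import math

def split_entropy(differences, d):
    """Step 3.1.3.3 sketch: entropy of the two-way split of the depth differences at
    a candidate division value d, e.g. a 30/70 split gives -(0.3*log2(0.3) + 0.7*log2(0.7))."""
    left = sum(1 for v in differences if v < d)
    right = len(differences) - left
    total = len(differences)
    s = 0.0
    for count in (left, right):
        if count:                       # an empty side contributes 0 by convention
            p = count / total
            s -= p * math.log2(p)
    return s
```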
Further, the step of obtaining the optimal division value in step 3.1.3.4 is: 3.1.3.4.1 for the 3-neighborhood, selecting the largest of the information entropy values corresponding to all sample points, recording it as S3, the score of the 3-neighborhood, and recording the division value d of this split as D3; 3.1.3.4.2 following step 3.1.3.4.1, obtaining score S5 and division value D5 when the neighborhood is 5, score S7 and value D7 when the neighborhood is 7, and score S(2n+1) and value D(2n+1) when the neighborhood is 2n+1; 3.1.3.4.3 selecting the S value with the maximum score, recorded as Sm; the corresponding neighborhood is the optimal neighborhood m and the corresponding d is the optimal division value Dm.
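A sketch of step 3.1.3.4 that reuses split_entropy from the sketch above: for each neighborhood size it scores k randomly chosen candidate division values and keeps the neighborhood m and value Dm with the highest score Sm. The number of candidates k is an illustrative assumption.

```python
import random

def optimal_split(diffs_by_neighborhood, k=10):
    """Step 3.1.3.4 sketch: diffs_by_neighborhood maps a neighborhood side length
    (3, 5, 7, ...) to the list of depth differences d of the sample points.
    Returns (m, Dm, Sm): the optimal neighborhood, division value and score."""
    best = (None, None, -1.0)
    for side, diffs in diffs_by_neighborhood.items():
        candidates = random.sample(list(diffs), min(k, len(diffs)))   # random candidate splits
        for d in candidates:
            s = split_entropy(diffs, d)          # reuse the entropy sketch above
            if s > best[2]:
                best = (side, d, s)
    return best
```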
Further, in step 3.1.4, a random forest corresponding to the sample point is established by: 3.1.4.1 constructing a decision tree based on the optimal partition neighborhood and the optimal partition value; 3.1.4.2 building the random forest based on the decision tree.
Preferably, each decision tree is a binary tree, a random forest is composed of a plurality of decision trees, and each decision tree can be trained with the pixel points already extracted or with different pixel points.
Further, in step 3.1.4.1, the step of constructing a decision tree comprises: 3.1.4.1.1 storing the optimal neighborhood m and the optimal division value Dm obtained in step 3.1.3.4.3 in the root node of a decision tree as the pair (m, Dm); 3.1.4.1.2 dividing one equal part of sample points into two parts based on the optimal division value stored at the root node, the left side being points whose d is greater than Dm and the right side points whose d is less than Dm; 3.1.4.1.3 recursively applying steps 3.1.3.3, 3.1.3.4 and 3.1.4.1 to the left and right parts until the left and right subtrees contain only positive or only negative samples, or the maximum depth of the tree is reached; 3.1.4.1.4 when the maximum depth is reached, the leaf nodes store the numbers of positive and negative sample points, thus completing one decision tree.
Further, in step 3.1.4.2, a decision tree can be constructed for each equal part of sample points, and a random forest is formed from these decision trees.
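A sketch of steps 3.1.4.1 and 3.1.4.2 that reuses neighborhood_depth_means and optimal_split from the sketches above. The Node class, the neighborhood sizes (3, 5, 7, 9) and the maximum depth are illustrative assumptions, and samples are (y, x, label) tuples with label 1 for palm pixels.

```python
class Node:
    """One decision-tree node: internal nodes store (m, Dm); leaves store the
    positive/negative sample counts, as described in step 3.1.4.1.4."""
    def __init__(self, m=None, dm=None, left=None, right=None, pos=0, neg=0):
        self.m, self.dm, self.left, self.right = m, dm, left, right
        self.pos, self.neg = pos, neg            # only meaningful on leaf nodes

def build_tree(samples, depth_map, depth_left=10):
    """Step 3.1.4.1 sketch: samples is a list of (y, x, label); depth_left is an
    illustrative maximum tree depth."""
    labels = [lab for _, _, lab in samples]
    if depth_left == 0 or len(set(labels)) <= 1:                       # stopping rules
        return Node(pos=sum(labels), neg=len(labels) - sum(labels))
    # per step 3.1.3: choose the optimal neighborhood m and division value Dm
    diffs_by_n = {side: [neighborhood_depth_means(depth_map, y, x)[side] - depth_map[y, x]
                         for y, x, _ in samples]
                  for side in (3, 5, 7, 9)}
    m, dm, _ = optimal_split(diffs_by_n)
    d = [neighborhood_depth_means(depth_map, y, x)[m] - depth_map[y, x] for y, x, _ in samples]
    left = [s for s, v in zip(samples, d) if v > dm]                   # step 3.1.4.1.2
    right = [s for s, v in zip(samples, d) if v <= dm]
    if not left or not right:
        return Node(pos=sum(labels), neg=len(labels) - sum(labels))
    return Node(m, dm, build_tree(left, depth_map, depth_left - 1),
                build_tree(right, depth_map, depth_left - 1))

def build_forest(sample_parts, depth_map):
    """Step 3.1.4.2 sketch: one decision tree per equal part of the training samples."""
    return [build_tree(part, depth_map) for part in sample_parts]
```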
Preferably, the feature selection and model training steps are completed offline; those skilled in the art will understand that once model training is complete it need not be repeated each time a target image is to be predicted, and the judgment is made on the basis of the trained model.
Further, in step 3.2, determining whether the target image is a human palm may comprise the following steps: 3.2.1 judging, based on the random forest obtained from model training, whether each point of the target image is a point of the human palm; 3.2.2 judging whether the target image is a human palm based on the result of step 3.2.1.
Further, step 3.2.1 can be divided into the following steps: 3.2.1.1 calculating the depth difference: for each pixel point, taking a decision tree, computing the depth mean of the optimal neighborhood m stored at its root node, and computing the difference between that mean and the depth value of the pixel point; 3.2.1.2 recursing down the decision tree: comparing the difference computed in step 3.2.1.1 with the optimal division value Dm stored at the node, recursing into the left branch if the difference is smaller than Dm and into the right branch if it is larger, and continuing until a leaf node, which stores the numbers of positive and negative samples, is reached; 3.2.1.3 accumulating the positive and negative sample counts over all trees of the random forest for this point and judging: if the total number of positive samples is larger than the number of negative samples, the pixel point belongs to a hand; if the total number of positive samples is smaller than the number of negative samples, the pixel point does not belong to a hand.
Further, the method for determining in step 3.2.2 whether the target image is a human hand is as follows: after step 3.2.1 has been performed for every point in the target image, the number of pixel points predicted to be a hand and the number predicted not to be a hand are counted; if the number predicted to be a hand is greater, the target image is a hand.
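A sketch of steps 3.2.1 and 3.2.2 that reuses the Node trees and neighborhood_depth_means from the sketches above. It follows the training sketch's convention that differences greater than Dm go to the left branch; since the prose states the left/right direction differently in training and prediction, treat the direction here as an assumption.

```python
def classify_pixel(forest, depth_map, y, x):
    """Step 3.2.1 sketch: run one pixel down every tree and tally the positive and
    negative counts stored at the leaves it reaches."""
    pos = neg = 0
    for root in forest:
        node = root
        while node.m is not None:               # internal node: descend left or right
            d = neighborhood_depth_means(depth_map, y, x)[node.m] - depth_map[y, x]
            node = node.left if d > node.dm else node.right
        pos += node.pos
        neg += node.neg
    return pos > neg                            # step 3.2.1.3: majority of stored samples

def classify_image(forest, depth_map, candidate_pixels):
    """Step 3.2.2 sketch: the target image is a palm when more of its pixels are
    predicted 'hand' than 'not hand'."""
    votes = [classify_pixel(forest, depth_map, y, x) for y, x in candidate_pixels]
    return sum(votes) > len(votes) - sum(votes)
```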
Preferably, for pixel points judged to be a hand, the history information in the history library is updated, and for pixel points judged not to be a hand, the records in the history library are kept unchanged, so that the variation of the depth information of the motion points is increased and the motion pixel points are extracted more effectively the next time.
The above is the specific implementation of the technical solution, which addresses the problem of interaction between a driver and the automobile during driving.

Claims (23)

1. A vehicle-mounted gesture interaction control method is characterized by comprising the following steps:
(1) identifying moving objects using a modified moving object detection algorithm;
(2) and (3) judging whether the moving object identified in the step (1) is a human palm by using a gesture identification control method.
2. Control method according to claim 1, characterized in that said step for identifying moving objects comprises the steps of:
1.1 initializing;
1.2 detecting whether the pixel points are motion points or not;
1.3 performing kmeans clustering on the motion points;
1.4 area growth;
1.5 extracting the region;
1.6 updating the pixel points.
3. The control method according to claim 2, wherein, in step 2.2, for each pixel point in every image acquired by the camera, it is detected whether the pixel point is a moving pixel point, specifically comprising the following steps:
2.2.1. setting a counter a to 0;
2.2.2. calculating the difference between the current depth value of the pixel point and a depth value in the history library, and adding 1 to the counter a if the difference is greater than a set threshold;
2.2.3. after step 2.2.2 has been carried out for each historical record of the pixel point in the history library, setting the pixel point as a motion point if the value of the counter a is greater than a threshold.
4. The control method according to any one of claims 2 to 5, wherein after all the motion points are obtained, step 2.3 is performed to perform kmeans clustering on all the motion points, and the method specifically comprises the following steps:
2.3.1 randomly selecting a part of pixel points from all the pixel points as an initial clustering center;
2.3.2 assigning each of the remaining pixel points to the most similar cluster according to its similarity (distance) to the cluster centers from 2.3.1;
2.3.3 recalculating the cluster center of each obtained new cluster, namely calculating the mean value of all objects in the new cluster;
2.3.4 calculating a criterion function; if it has converged, terminating the algorithm, otherwise repeating steps 2.3.2, 2.3.3 and 2.3.4, thereby obtaining a set of categories;
2.3.5 for the categories obtained in step 2.3.4, each having a pixel center point and its corresponding motion pixel points, setting a threshold on the number of category elements, removing categories that do not reach the threshold, and then assigning a category serial number to each motion point.
5. The control method according to any one of claims 2 to 6, wherein after obtaining the category, step 2.4 is performed to perform region growing, and specifically includes the following steps:
2.4.1 comparing the depth value of a pixel already detected as a motion point with that of a nearby pixel to be examined; if the depth difference between the two is less than a set threshold, the new pixel is similar to the detected motion point and is therefore also set as a motion point;
2.4.2 according to step 2.4.1, if a new pixel point is judged to be a new motion point in two categories, the attributes of the two categories are similar, so the two categories are merged and given the same category serial number, until the detection of all motion points is completed.
6. The method according to claim 1 or 2, wherein it is determined whether the picture extracted in step 2.5 is a human hand; if it is judged to be a hand, the history information of the pixel points is updated so as to increase the variation of the depth information of the motion points, and if not, the information is kept unchanged.
7. The control method according to any one of claims 1 to 6, characterized in that the step of identifying whether the moving object is a human palm comprises the steps of:
3.1 feature selection and model training;
3.2 judging whether the target image is the human palm.
8. Control method according to any of claims 1 to 7, characterized in that said step 3.1 comprises the steps of:
3.1.2 selecting sample points from the data to be trained;
3.1.3, calculating the optimal division value of all the sample points;
3.1.4 establishing a random forest corresponding to the sample point based on the optimal division value calculation result.
9. Control method according to any of claims 1 to 8, characterized in that it comprises, before said step 3.1.2, the following steps:
3.1.1 collecting training data, obtaining images to be trained by at least two cameras, wherein at least one camera is a depth camera and at least one camera is an rgb camera.
10. The control method according to any one of claims 1 to 8, characterized in that in step 3.1.2, the palm portion in the depth map is taken as a positive sample, the non-palm portion is taken as a negative sample, and the same number of pixel points are randomly selected from the positive sample portion and the negative sample portion as sample points to be trained.
11. Control method according to any of claims 1 to 8, characterized in that the optimal partition value calculation in step 3.1.3 comprises the steps of:
3.1.3.1, calculating the depth mean value of each neighborhood of the sample point;
3.1.3.2 calculating the difference between the depth mean value of each neighborhood of the sample point and the depth value of the sample point;
3.1.3.3 calculating information entropy;
3.1.3.4 obtain the optimal division value.
12. Control method according to claim 11, characterized in that the depth mean calculation in step 3.1.3.1 comprises the following steps:
3.1.3.1.1 randomly selecting a certain sample point P;
3.1.3.1.2 calculating the average of the depth values of a square neighborhood centered at P;
3.1.3.1.3 calculating the average of the depth values of the neighborhoods of all sample points based on the calculation method of step 3.1.3.1.2.
13. The method according to claim 12, wherein the average value of the depth values in the neighborhood of the square centered at P in step 3.1.3.1.2 is calculated by:
the sizes of the neighborhoods of the point P are 3, 5, 7, 9, ..., 2n+1 in sequence, i.e., the number of pixels on one side of the square neighborhood, with P as the center point of the square neighborhood.
14. The control method according to any one of claims 12 to 13, wherein the information entropy calculation step in step 3.1.3.3 is:
3.1.3.3.1 dividing all positive and negative sample points into several equal parts, each containing positive and negative sample points in the same proportion;
3.1.3.3.2 for every sample point in one equal part, when the neighborhood is 3, there is a difference d between the depth mean of the 3-neighborhood and the depth of the point;
3.1.3.3.3 each difference d can divide the set of differences into two parts, one larger than d and one smaller than d;
3.1.3.3.4 obtaining the final information entropy s according to the information entropy formula;
3.1.3.3.5 based on the calculation method of steps 3.1.3.3.2-3.1.3.3.4, obtaining the corresponding information entropy when the neighborhood is 5, 7, 9, ..., 2n+1.
15. The control method according to any one of claims 12 to 14, wherein the step of obtaining the optimal division value in step 3.1.3.4 is:
3.1.3.4.1 for the 3-neighborhood, selecting the largest of the information entropy values corresponding to all sample points, recording it as S3, the score of the 3-neighborhood, and recording the division value d of this split as D3;
3.1.3.4.2 following step 3.1.3.4.1, obtaining score S5 and division value D5 when the neighborhood is 5, score S7 and value D7 when the neighborhood is 7, and score S(2n+1) and value D(2n+1) when the neighborhood is 2n+1;
3.1.3.4.3 selecting the S value with the maximum score, recorded as Sm; the corresponding neighborhood is the optimal neighborhood m and the corresponding d is the optimal division value Dm.
16. A control method according to any one of claims 7 to 16, characterized in that in step 3.1.4 a random forest is established corresponding to the sample points by:
3.1.4.1 constructing a decision tree based on the optimal partition neighborhood and the optimal partition value;
3.1.4.2 building the random forest based on the decision tree.
17. A control method according to claim 16, wherein in said step 3.1.4.1, the step of constructing a decision tree comprises:
3.1.4.1.1 storing the optimal neighborhood m and the optimal division value Dm obtained in step 3.1.3.4.3 in the root node of a decision tree as the pair (m, Dm);
3.1.4.1.2 dividing one equal part of sample points into two parts based on the optimal division value stored at the root node, the left side being points whose d is greater than Dm and the right side points whose d is less than Dm;
3.1.4.1.3 recursively applying steps 3.1.3.3, 3.1.3.4 and 3.1.4.1 to the left and right parts until the left and right subtrees contain only positive or only negative samples, or the maximum depth of the tree is reached;
3.1.4.1.4 when the maximum depth is reached, the leaf nodes store the numbers of positive and negative sample points, thus completing one decision tree.
18. The control method according to claim 16, wherein in step 3.1.4.2 a decision tree is constructed for each equal part of sample points, a random forest is formed from these decision trees, and model training is completed.
19. The algorithm according to claim 7, characterized in that said step 3.2 comprises in particular the steps of:
3.2.1 judging whether each point of the target image is one point of the human palm based on the random forest obtained by the model training;
3.2.2 judging whether the target image is a human palm or not based on the judgment result in the step 3.2.1.
20. The control method according to claim 19, wherein in the step 3.2.1, the step of determining whether each point of the target image is a point of the human palm comprises:
3.2.1.1 calculate depth difference:
for each pixel point, taking a decision tree from the trained model, calculating the depth mean of the optimal neighborhood m stored at its root node, and calculating the difference between that mean and the depth value of the pixel point;
3.2.1.2 recurse the decision tree:
comparing the difference calculated in step 3.2.1.1 with the optimal division value Dm stored at the node; if the difference is smaller than Dm, recursing into the left branch, and if it is larger than Dm, recursing into the right branch, continuing until a leaf node is reached and obtaining the numbers of positive and negative samples stored at the leaf node;
3.2.1.3 counting the positive and negative sample numbers of all trees of the point in the random forest and judging:
for the pixel point, the positive and negative sample counts of all trees in the random forest are accumulated; if the total number of positive samples is larger than the number of negative samples, the pixel point belongs to a hand; if the total number of positive samples is smaller than the number of negative samples, the pixel point does not belong to a hand.
21. The control method according to any one of claims 7 to 20, wherein step 3.2.2 determines whether the target image is a human hand by, after step 3.2.1 has been performed for every point in the target image, counting the number of pixel points predicted to be a hand and the number predicted not to be a hand; if the number predicted to be a hand is larger, the target image is a hand.
22. The camera of claim 3 or 9 is a depth camera.
23. The algorithm according to any one of claims 1 to 3, characterized in that a depth camera is used to obtain continuous depth maps of tens of frames, and a historical record library is created for each pixel point.
CN201810606708.XA 2018-06-13 2018-06-13 Vehicle-mounted gesture interaction technology Active CN110598510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810606708.XA CN110598510B (en) 2018-06-13 2018-06-13 Vehicle-mounted gesture interaction technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810606708.XA CN110598510B (en) 2018-06-13 2018-06-13 Vehicle-mounted gesture interaction technology

Publications (2)

Publication Number Publication Date
CN110598510A true CN110598510A (en) 2019-12-20
CN110598510B CN110598510B (en) 2023-07-04

Family

ID=68849600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810606708.XA Active CN110598510B (en) 2018-06-13 2018-06-13 Vehicle-mounted gesture interaction technology

Country Status (1)

Country Link
CN (1) CN110598510B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140204013A1 (en) * 2013-01-18 2014-07-24 Microsoft Corporation Part and state detection for gesture recognition
CN103413145A (en) * 2013-08-23 2013-11-27 南京理工大学 Articulation point positioning method based on depth image
CN106778813A (en) * 2016-11-24 2017-05-31 金陵科技学院 The self-adaption cluster partitioning algorithm of depth image
CN106845513A (en) * 2016-12-05 2017-06-13 华中师范大学 Staff detector and method based on condition random forest
CN107688779A (en) * 2017-08-18 2018-02-13 北京航空航天大学 A kind of robot gesture interaction method and apparatus based on RGBD camera depth images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Hao, Lu Haiming: "A Survey of Gesture Recognition Based on Depth Images", Journal of Inner Mongolia University (Natural Science Edition) (《内蒙古大学学报(自然科学版)》) *

Also Published As

Publication number Publication date
CN110598510B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN108764185B (en) Image processing method and device
CN111931701B (en) Gesture recognition method and device based on artificial intelligence, terminal and storage medium
CN107203753B (en) Action recognition method based on fuzzy neural network and graph model reasoning
US8620024B2 (en) System and method for dynamic gesture recognition using geometric classification
CN106778796B (en) Human body action recognition method and system based on hybrid cooperative training
CN109671102B (en) Comprehensive target tracking method based on depth feature fusion convolutional neural network
EP1934941B1 (en) Bi-directional tracking using trajectory segment analysis
CN111563417B (en) Pyramid structure convolutional neural network-based facial expression recognition method
US7331671B2 (en) Eye tracking method based on correlation and detected eye movement
CN107194371B (en) User concentration degree identification method and system based on hierarchical convolutional neural network
KR100969298B1 (en) Method For Social Network Analysis Based On Face Recognition In An Image or Image Sequences
CN109685037B (en) Real-time action recognition method and device and electronic equipment
CN112734775A (en) Image annotation, image semantic segmentation and model training method and device
CN108805016B (en) Head and shoulder area detection method and device
CN109858553B (en) Method, device and storage medium for updating driving state monitoring model
CN107273866A (en) A kind of human body abnormal behaviour recognition methods based on monitoring system
CN107564035B (en) Video tracking method based on important area identification and matching
CN110188668B (en) Small sample video action classification method
CN111160095B (en) Unbiased face feature extraction and classification method and system based on depth self-encoder network
CN111062292A (en) Fatigue driving detection device and method
JP2017041206A (en) Learning device, search device, method, and program
CN110599463A (en) Tongue image detection and positioning algorithm based on lightweight cascade neural network
US11132577B2 (en) System and a method for efficient image recognition
CN108564067A (en) The Threshold and system of face alignment
CN114140696A (en) Commodity identification system optimization method, commodity identification system optimization device, commodity identification equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220125

Address after: 518063 2W, Zhongdian lighting building, Gaoxin South 12th Road, Nanshan District, Shenzhen, Guangdong

Applicant after: Shenzhen point cloud Intelligent Technology Co.,Ltd.

Address before: 518023 No. 3039 Baoan North Road, Luohu District, Shenzhen City, Guangdong Province

Applicant before: Zhou Qinna

GR01 Patent grant
GR01 Patent grant