CN114035687B

CN114035687B - Gesture recognition method and system based on virtual reality

Info

Publication number: CN114035687B
Application number: CN202111336108.4A
Authority: CN
Inventors: 王瑞娟; 王灏; 陈慧民
Original assignee: Zhengzhou University
Current assignee: Zhengzhou University
Priority date: 2021-11-12
Filing date: 2021-11-12
Publication date: 2023-07-25
Anticipated expiration: 2041-11-12
Also published as: CN114035687A

Abstract

The invention relates to the technical field of artificial intelligence and virtual reality equipment, in particular to a gesture recognition method and system based on virtual reality. The method comprises the following steps: evaluating the initial segmentation frame number and the initial forgetting coefficient according to the difference degree of the predicted gesture action category and the real gesture action category to obtain a comprehensive evaluation index; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient, and increasing and decreasing the probability of the correction direction; selecting an initial segmentation frame number or an initial forgetting coefficient to carry out increasing or decreasing correction according to the correction probability and the probability of the correction direction; and stopping correction when the comprehensive evaluation index tends to be stable, obtaining the optimal segmentation frame number and forgetting coefficient, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition. The gesture recognition method and the gesture recognition system improve the accuracy of gesture recognition and the generalization capability of the gesture recognition network, and reduce misjudgment of gesture actions caused by self factors of users.

Description

Gesture recognition method and system based on virtual reality

Technical Field

The invention relates to the technical field of artificial intelligence and virtual reality equipment, in particular to a gesture recognition method and system based on virtual reality.

Background

Along with the development of technology, related technologies such as virtual reality, man-machine interaction, image recognition and the like are continuously improved, and the demands of various industries for accurate gesture recognition are increasing. The method is mainly used for various aspects of intelligent home control, vehicle-mounted operation control, PC and mobile terminal control, industrial design and the like, and the commercial value of the method is also increased day by day.

The gesture recognition methods in the prior art are also various, and can be mainly divided into three technologies: the three methods are different from each other in terms of the image recognition technology based on the optical technology, the motion capture technology based on the inertial sensor and the hand form simulation technology based on the mechanical structure. In order to improve the accuracy of gesture recognition, the accuracy of subsequent gesture recognition is generally improved by improving the accuracy of the acquired hand information on the basis of the method, for example, the gesture tracking method of the VR headset and the VR headset of CN106648103A, and the accuracy of recognition is improved by fusing three-dimensional characteristic information of hands.

The problem in the prior art is that gesture behavior habits of different device users are different, but the gesture behavior habits of the device users are not considered in the prior art, and the gesture recognition network is difficult to generalize to a specific device user only by means of a sample of a training set, so that the gesture recognition accuracy is difficult to improve. In the gesture recognition process, the VR device cannot render the scene to which the instruction information obtained by the gesture recognition result is applied in advance, so that the use experience of the user is degraded.

Disclosure of Invention

In order to solve the technical problems, the invention aims to provide a gesture recognition method and system based on virtual reality, and the adopted technical scheme is as follows:

in a first aspect, an embodiment of the present invention provides a gesture recognition method based on virtual reality:

predicting the historical gesture track information to obtain predicted gesture track information at a predicted moment, and simultaneously obtaining real gesture track information corresponding to the predicted moment; the method comprises the steps of acquiring historical gesture track information and real gesture track information: superposing the hand key point thermodynamic diagrams of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information;

obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the change trend of the value on the confidence coefficient evaluation index time sequence at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted;

and continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable, stopping the correction, obtaining the optimal segmentation frame number and forgetting coefficient, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition.

Preferably, the confidence evaluation index is specifically:

wherein beta is _n N represents the nth predicted time in the predicted period for the confidence evaluation index; c represents the number of classification categories of the gesture action classification result;a confidence level indicating that the true gesture track information action category at the nth predicted time belongs to the c-th classification category; />A confidence level indicating that the motion category of the predicted gesture track information at the nth predicted time belongs to the c-th classification category; it should be noted that, the action category classification result of the gesture track information is a confidence coefficient sequence, which characterizes the probability that the gesture track information belongs to the corresponding classification category.

Preferably, the correction probability of the initial segmentation frame number is specifically: labeling the confidence evaluation indexes, wherein the label represents the time sequence of the confidence evaluation indexes; performing linear fitting on the marked confidence evaluation indexes, wherein the ordinate of a fitted straight line is the confidence evaluation index, and the abscissa is the mark of the confidence evaluation index, so as to obtain the slope of the fitted straight line; giving initial segmentation frame number correction probability, and obtaining the initial segmentation frame number correction probability by using the slope of the straight line:

wherein G is _M ' correction probability for initial segmentation frame number, k is slope of fitting straight line, G _M The probability is modified for a given initial segmentation frame number.

Preferably, the correction probability of the initial forgetting coefficient is specifically:

G _α ＝1-G _M ′

wherein G is _α Representing an initial forgetting coefficient correction probability; g _M ' denotes an initial segmentation frame number correction probability.

Preferably, the validity of the correction is judged according to a validity evaluation index, and the validity evaluation index is:

wherein the method comprises the steps ofIs an effectiveness evaluation index; beta is a comprehensive evaluation index when the segmentation frame number and the forgetting coefficient are not corrected, and beta' is a corrected comprehensive evaluation index; k is the slope of the fitted straight line when the segmentation frame number and the forgetting coefficient are not corrected, and k' is the slope of the corrected fitted straight line.

Preferably, updating the probability of the correction direction according to the validity of the correction includes: multiplying the validity evaluation index by the probability of the correction direction which is actually adjusted, normalizing the multiplication result with the probability of the correction direction which is not adjusted, and obtaining the probability of the correction direction after updating.

Preferably, the optimal segmentation frame number and forgetting coefficient acquisition includes: and constructing a coefficient prediction network, inputting a hand motion depth image of a continuous fixed frame, and outputting an optimal segmentation frame number and forgetting coefficient.

In a second aspect, another embodiment of the present invention provides a gesture recognition system based on virtual reality. The system comprises: the gesture track information acquisition module is used for predicting and obtaining predicted gesture track information at a predicted moment by utilizing the historical gesture track information, and simultaneously obtaining real gesture track information corresponding to the predicted moment; the method comprises the steps of acquiring historical gesture track information and real gesture track information: superposing the hand key point thermodynamic diagrams of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information;

the coefficient correction module is used for obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the rising or falling trend of the confidence coefficient evaluation index time sequence value at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted; continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable;

the gesture recognition module is used for obtaining the optimal segmentation frame number and forgetting coefficient when correction is stopped, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition.

Preferably, the coefficient correction module is further configured to obtain a confidence coefficient evaluation index, specifically:

wherein beta is _n N represents the nth predicted time in the predicted period for the confidence evaluation index; c represents a gesture action classification knotThe number of fruit classification categories;a confidence level indicating that the true gesture track information action category at the nth predicted time belongs to the c-th classification category; />A confidence level indicating that the motion category of the predicted gesture track information at the nth predicted time belongs to the c-th classification category; it should be noted that, the action category classification result of the gesture track information is a confidence coefficient sequence, which characterizes the probability that the gesture track information belongs to the corresponding classification category.

Preferably, the coefficient correction module is further configured to determine the validity of the correction according to a validity evaluation index, where the validity evaluation index is:

The embodiment of the invention has at least the following beneficial effects: predicting by combining the TCN network with the track information of the historical gesture to obtain predicted gesture track information; optimizing the initial segmentation frame number and the initial forgetting coefficient according to the difference degree of the real gesture classification category and the predicted gesture classification category, so that the initial segmentation frame number and the initial forgetting coefficient are more in line with the gesture habit of the current user, and predicted gesture track information which is more similar to the real gesture track information is obtained; and further improves the accuracy of gesture recognition and the generalization capability of a gesture recognition network, and reduces misjudgment of gesture actions caused by self factors of users.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to specific implementation, structure, features and effects of a gesture recognition method and system based on virtual reality according to the present invention, which are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of a gesture recognition method and system based on virtual reality provided by the invention with reference to the accompanying drawings.

Example 1

The main application scene of the invention is as follows: the virtual reality scene is provided with an RGB-D camera which can collect hand information of a user, wherein the hand information comprises RGB images and depth information, and the default camera pose is fixed without considering the influence of factors such as parallax, shielding and the like; identifying and classifying hand actions according to the collected hand action information, so as to obtain instruction information represented by gesture action categories; and realizing corresponding functions according to the instruction information.

Referring to fig. 1, a flowchart of a method according to an embodiment of the present invention is shown, the method includes the following steps:

firstly, predicting and obtaining predicted gesture track information at a predicted time by utilizing historical gesture track information, and simultaneously obtaining real gesture track information corresponding to the predicted time; the method comprises the steps of acquiring historical gesture track information and real gesture track information: and superposing the hand key point thermodynamic diagrams of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information.

And acquiring an RGB-D image of the hand of the user through an RGB-D camera, processing the RGB-D image of the hand of the user through a hand key point detection network, and outputting a thermodynamic diagram of the key point of the hand of the user. The hand key point detection network is constructed by adopting an encoder-decoder framework, network input is a user hand RGB-D image, network output is a multi-channel user hand key point thermodynamic diagram, the number of thermodynamic diagram channels is consistent with the number of hand key point categories, and the input and output image sizes are consistent; the training set adopts a plurality of RGB-D images which are acquired aiming at different hand postures, the labels are hot spots which are generated by Gaussian blur at the centers of key points of the hands, the label categories are 21 bone node categories which are commonly used, and the loss function adopts a mean square error loss function. And acquiring hand key point information through the RGB-D image of the hand of the user acquired by the RGB-D camera, so that the gesture recognition is conveniently carried out by a subsequent gesture recognition network.

Building a gesture recognition network: the gesture recognition network architecture is formed by coupling a TCN network and a gesture classification network. The TCN network specifically includes: in the TCN network of this embodiment, a causal convolution manner is adopted, and the sliding window length suggestion is set to be odd, and preferably, the sliding window length is set to be 3 in this embodiment; the causal convolution predicts subsequent information by processing historical information. The input of the TCN network, namely the input of the gesture recognition network, is gesture track information on a plurality of continuous time sequences, the single gesture track information is a single element of the input sequence, and the single element is obtained by carrying out thermal superposition on the continuous M-frame hand key point thermodynamic diagram split channels. The thermodynamic superposition mode specifically comprises the following steps: for single-channel continuous multi-frame hand key point thermodynamic diagrams, point-by-point accumulation is carried out by adopting forgetting coefficients, a specific formula is X' = (1-alpha) x+alpha X, wherein alpha is an initial forgetting coefficient, the ratio of thermal value retention of historical accumulation is controlled, correspondingly, (1-alpha) is the ratio of current thermal value retention, X is the current thermal value, namely the pixel value of each pixel position of the current thermodynamic diagram, and X is the thermal value of the historical accumulation, namely the pixel value of each pixel position in the result diagram of thermal superposition of all frames before the current frame.

Acquiring continuous multi-frame RGB-D images, acquiring continuous multi-frame hand key point thermodynamic diagrams through a hand key point detection network, segmenting by using an initial segmentation frame number M, overlapping hand key point thermodynamic diagrams acquired after segmentation, acquiring a plurality of gesture track information on a continuous time sequence, selecting a fixed number of track information as a single training sample through a sliding window, and taking k as a single training sample ₁ ，k ₂ ，k ₃ For example, the corresponding tag data is k ₄ Wherein k is ₁ ，k ₂ ，k ₃ Representing historical gesture track information, k ₄ For predicting gesture track information, namely, first predicted gesture track information after input historical gesture track information; and acquiring a plurality of groups of training samples and corresponding tag data by carrying out the processing process on different gestures of different users, wherein the loss function also adopts a mean square error loss function. After the TCN network training is completed, inputting continuous time sequence historical gesture track information, outputting predicted gesture track information, and simultaneously carrying out segmentation and superposition on continuous multi-frame hand key point thermodynamic diagrams according to the initial segmentation frame number M and the initial forgetting coefficient alpha to obtain real gesture track information corresponding to the predicted gesture track information at the predicted moment.

Then, obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the change trend of the value on the confidence coefficient evaluation index time sequence at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; and obtaining the comprehensive evaluation index after the adjustment of the coefficient to be corrected.

The gesture classification network specifically comprises: with the single gesture track information as a training sample, the tag data is set to be a real gesture action category corresponding to the gesture track information, and the category can be set by an implementer according to actual situations, for example, the gesture action category includes: click, move, rotate, zoom, return, etc.; the loss function adopts a cross entropy loss function; after the gesture classification network training is completed, the predicted gesture track information and the real gesture track information are respectively input, and the corresponding predicted gesture action category and real gesture action category are output.

The TCN network and the gesture classification network are pre-trained, so that the TCN network and the gesture classification network can be normally used; the gesture recognition network adopts the fixed frame number M to split and adopts the fixed forgetting coefficient alpha to overlap, so that the gesture recognition network can not adapt to the specific gesture action speed of a user, and therefore the gesture recognition network is optimized, the initial splitting frame number M and the initial forgetting coefficient alpha are mainly optimized, the fit degree of a network input sample and the user is improved, and the accuracy of the gesture recognition network is further improved.

For an RGB-D hand image which is actually acquired, a hand key point thermodynamic diagram is obtained through a hand key point detection network, segmentation is carried out according to an initial segmentation frame number M, superposition is carried out according to a forgetting coefficient alpha, historical gesture track information is obtained, prediction is started according to the historical gesture track information, and predicted gesture track information is obtained. When the prediction time period reaches the set length, the number of the prediction time points in the prediction time period is N, a confidence evaluation index is obtained by using the comparison result of the action types of the predicted gesture track information and the real gesture track information, the initial segmentation frame number and the initial forgetting coefficient are evaluated, and the initial segmentation frame number and the initial forgetting coefficient are corrected according to the evaluation value; the confidence evaluation index of each prediction time is as follows:

wherein beta is _n N represents the nth predicted time in the predicted period for the confidence evaluation index; c represents the number of classification categories of the gesture action classification result;a confidence level indicating that the true gesture track information action category at the nth predicted time belongs to the c-th classification category; />And the confidence that the predicted gesture track information action category of the nth predicted time belongs to the c-th classification category is represented. It should be noted that, the action category classification result of the gesture track information is a confidence coefficient sequence, which characterizes the probability that the gesture track information belongs to the corresponding classification category.

The comprehensive evaluation indexes of the segmentation frame number M and the forgetting coefficient alpha are as follows:

where N is the number of predicted times in the prediction period.

The larger the confidence evaluation index value is, the larger the confidence difference between the predicted gesture classification category and the real gesture classification category is, because the predicted network and the classification network are pre-trained, the default is that the predicted network and the classification network have no misjudgment problem when in use, and the reason for the larger confidence difference between the predicted gesture classification category and the real gesture classification category is mainly that the initial segmentation frame number M is unreasonable or the initial forgetting coefficient alpha is unreasonable, so that the predicted gesture track cannot be consistent with the real gesture track information, namely the confidence difference between the predicted gesture classification category and the real gesture classification category is larger; at this time, the initial segmentation frame number M or the initial forgetting coefficient α is optimized, specifically: the initial segmentation frame number M is traversed, the traversing range M is (+ -) Deltaepsilon, the traversing step length is 1, deltaepsilon is the segmentation frame number adjusting value, and is preferably set as in the embodimentThe optimizing range of the initial forgetting coefficient alpha is [0,1]The traversal step is 0.02.

The correction probability G of the initial segmentation frame number M needs to be given in the optimizing process _M Preferably G _M The initial value of (2) is 1/2. For confidence evaluation index beta _n The confidence evaluation index is marked with a sign indicating the order in time of the confidence evaluation index, and the confidence evaluation index beta is placed _n Performing linear fitting, wherein the abscissa of the fitted straight line is the label of the confidence evaluation index, the ordinate is the confidence evaluation index, and the slope k of the fitted straight line is obtained to obtain an adjustment function of the correction probability of the initial segmentation frame number:

the adjustment function tau is used for correcting the correction probability of the initial segmentation frame number, namely, the larger the slope is, the larger the difference between the predicted gesture track information and the real gesture track information is, and at the moment, the more likely the initial segmentation frame number M is unreasonably caused, so that tau is closer to 2; correction probability G for a given initial segmentation frame number _M Taking the product of the initial segmentation frame number and the adjustment function tau as the initial segmentation frame number correction probability, so that the probability of correcting the initial segmentation frame number is larger; when the slope is close to 0, whether the initial segmentation frame number or the initial forgetting coefficient is unreasonable is difficult to judge, so that the correction probability of the initial segmentation frame number and the initial forgetting coefficient is approximate; the smaller the slope, the smaller the difference between the predicted gesture track information and the real gesture track information, the more likely the initial forgetting coefficient is unreasonable to cause the prediction to appear difference, but with the change of the input history track information, the difference is gradually reduced, the closer τ is to 0, the given initial segmentation frame number G _M The product of the initial segmentation frame number M and the adjustment function tau is used as the correction probability of the initial segmentation frame number M, so that the probability of the initial segmentation frame number being adjusted is smaller. The correction probability of the initial segmentation frame number is:

accordingly, because the correction probability of the initial segmentation frame number and the initial forgetting coefficient is a complete probability distribution, the correction probability of the initial forgetting coefficient is:

G _α ＝1-G′ _M

wherein G' _M G represents the correction probability of the initial segmentation frame number _α G represents the correction probability of the initial forgetting coefficient _M To set the correction probability of the initial segmentation frame number, τ is an adjustment function. Correction probability G 'based on initial segmentation frame number' _M And randomly selecting the initial segmentation frame number or the initial forgetting coefficient as the coefficient to be corrected for correction according to the correction probability of the initial forgetting coefficient.

When the coefficient to be corrected is corrected, the random correction direction is adopted for traversing, namely the probability G that the increasing direction and the decreasing direction also exist _u And G _d Initial values are 1/2, and in order to improve the traversing speed, an effectiveness index for judging the effectiveness of correction when the initial segmentation frame number or the initial forgetting coefficient is selected for correction is setFor a one-time traversal result, acquiring a comprehensive evaluation index beta 'and a fitting straight line slope k' after the coefficient to be corrected is corrected, wherein the effectiveness evaluation index of the coefficient correction is as follows:

wherein, the liquid crystal display device comprises a liquid crystal display device,in order to influence the comprehensive evaluation index in the effectiveness evaluation index, when the corrected comprehensive evaluation index beta ' is smaller than the initial comprehensive evaluation index beta, beta ' -beta is negative, which indicates that the corrected comprehensive evaluation index beta ' is evaluated better and the correction is effective, and the correction is performed more slowly>Greater than 1, and the smaller the β' - β, +.>The closer to 2; />For the slope influence of the fitting straight line in the effectiveness evaluation index, (|k| -k' |) represents the degree of the straight line approaching the level, the larger the value is, the closer the fitting straight line approaches the level after coefficient correction is explained, and the correction is effective, the +|>The closer to 2.

Multiplying the validity evaluation index by the probability of the correction direction which is actually adjusted, normalizing the multiplication result with the probability of the correction direction which is not adjusted, and obtaining the probability of the correction direction after updating:

g' is the probability of the correction direction when the correction is performed again, and G is the probability of the correction direction when the correction is performed currently; g' and G may be probabilities of increasing the direction or decreasing the direction.

If the correction direction is the increasing direction, the probability value of the increasing direction after adjustment is G' _u The probability value of the other unadjusted traversing direction is still G _d For G' _u And G _d And normalizing to obtain the probability value of increasing or decreasing direction during the re-correction.

And finally, continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable, stopping the correction, obtaining the optimal segmentation frame number and forgetting coefficient, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition.

By adopting the optimization method, the number of the segmentation frames and the forgetting coefficient corresponding to the lowest comprehensive evaluation index are used as the optimal number of the segmentation frames and the forgetting coefficient until the comprehensive evaluation index converges or the comprehensive evaluation index cannot reach lower after a period of time.

In actual use, a coefficient prediction network is constructed, a continuous fixed frame number of hand action RGB-D image is used as coefficient prediction network input, a TCN-FC structure is adopted in a network architecture, and an optimal segmentation frame number regression value and an optimal forgetting coefficient regression value are output by the network; the network training specifically comprises the following steps: the method comprises the steps of processing acquired optimal segmentation frame numbers and optimal forgetting coefficients to serve as tag data by adopting continuous fixed frame number hand action RGB-D images acquired by a plurality of different users when VR equipment operation is carried out, and adopting mean square error as a loss function.

Collecting RGB-D images of a user with continuous fixed frames, sending the images into a parameter prediction network, and outputting an optimal segmentation frame number regression value and an optimal forgetting coefficient regression value; and performing segmentation and superposition processing on the RGB-D image based on the optimal segmentation frame number regression value and the optimal forgetting coefficient regression value to obtain a plurality of track information graphs, generating a prediction track by the track information graphs through a TCN network, and performing prediction and recognition of hand actions by combining a classification network.

Example 2

The present embodiment provides a system embodiment. A virtual reality-based gesture recognition system, the system comprising: the gesture track information acquisition module is used for predicting and obtaining predicted gesture track information at a predicted moment by utilizing the historical gesture track information, and simultaneously obtaining real gesture track information corresponding to the predicted moment; the method comprises the steps of acquiring historical gesture track information and real gesture track information: superposing the hand key point thermodynamic diagrams of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information;

Claims

1. The gesture recognition method based on virtual reality is characterized by comprising the following steps of:

obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all the prediction moments in the prediction period as a comprehensive evaluation index, and carrying out coefficient correction: obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the change trend of the value on the confidence coefficient evaluation index time sequence at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted;

continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable, stopping the correction, obtaining the optimal segmentation frame number and forgetting coefficient, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition;

the confidence evaluation index specifically comprises:

wherein beta is _n N represents the nth predicted time in the predicted period for the confidence evaluation index; c represents the number of classification categories of the gesture action classification result;a confidence level indicating that the true gesture track information action category at the nth predicted time belongs to the c-th classification category; />A confidence level indicating that the motion category of the predicted gesture track information at the nth predicted time belongs to the c-th classification category; it should be noted that, the action category classification result of the gesture track information is a confidence coefficient sequence, and the probability that the gesture track information belongs to the corresponding classification category is represented;

the correction probability of the initial segmentation frame number is specifically as follows: labeling the confidence evaluation indexes, wherein the label represents the time sequence of the confidence evaluation indexes; performing linear fitting on the marked confidence evaluation indexes, wherein the ordinate of a fitted straight line is the confidence evaluation index, and the abscissa is the mark of the confidence evaluation index, so as to obtain the slope of the fitted straight line; giving initial segmentation frame number correction probability, and obtaining the initial segmentation frame number correction probability by using the slope of the straight line:

wherein G is _M ' correct probability, k for initial segmentation frame numberTo fit the slope of a straight line, G _M Correcting the probability for a given initial segmentation frame number;

the correction probability of the initial forgetting coefficient is specifically as follows:

G _α ＝1-G _M ′

2. The gesture recognition method based on virtual reality according to claim 1, wherein the validity of the correction is judged according to a validity evaluation index, the validity evaluation index is:

wherein the method comprises the steps ofIs an effectiveness evaluation index; beta is the comprehensive evaluation index of the number of segmentation frames and the uncorrected forgetting coefficient, beta ^′ The modified comprehensive evaluation index is obtained; k is the slope of a fitted straight line when the segmentation frame number and the forgetting coefficient are not corrected, and k ^′ And the slope of the corrected fitting straight line is obtained.

3. The virtual reality-based gesture recognition method of claim 1, wherein the updating the probability of the correction direction according to the validity of the correction comprises: multiplying the validity evaluation index by the probability of the correction direction which is actually adjusted, normalizing the multiplication result with the probability of the correction direction which is not adjusted, and obtaining the probability of the correction direction after updating.

4. The gesture recognition method based on virtual reality according to claim 1, wherein the optimal segmentation frame number and forgetting coefficient acquisition includes: and constructing a coefficient prediction network, inputting a hand motion depth image of a continuous fixed frame, and outputting an optimal segmentation frame number and forgetting coefficient.

5. A virtual reality-based gesture recognition system, the system comprising: the gesture track information acquisition module is used for predicting and obtaining predicted gesture track information at a predicted moment by utilizing the historical gesture track information, and simultaneously obtaining real gesture track information corresponding to the predicted moment; the method comprises the steps of acquiring historical gesture track information and real gesture track information: superposing the hand key point thermodynamic diagrams of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information;

the coefficient correction module is used for obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the change trend of the value on the confidence coefficient evaluation index time sequence at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted; continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable;

the gesture recognition module is used for obtaining the optimal segmentation frame number and forgetting coefficient when correction is stopped, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition;

the coefficient correction module is also used for obtaining a confidence coefficient evaluation index, and specifically comprises the following steps:

wherein beta is _n For confidence levelAn evaluation index, n represents an nth predicted time within a predicted period; c represents the number of classification categories of the gesture action classification result;a confidence level indicating that the true gesture track information action category at the nth predicted time belongs to the c-th classification category; p is p ^c _n A confidence level indicating that the motion category of the predicted gesture track information at the nth predicted time belongs to the c-th classification category; it should be noted that, the action category classification result of the gesture track information is a confidence coefficient sequence, and the probability that the gesture track information belongs to the corresponding classification category is represented;

wherein G is _M ' correction probability for initial segmentation frame number, k is slope of fitting straight line, G _M Correcting the probability for a given initial segmentation frame number;

G _α ＝1-G _M ′

6. The gesture recognition system of claim 5, wherein the coefficient correction module is further configured to determine the validity of the correction according to a validity evaluation index, the validity evaluation index being: