CN114035687B - Gesture recognition method and system based on virtual reality - Google Patents

Gesture recognition method and system based on virtual reality

Info

Publication number
CN114035687B
Authority
CN
China
Prior art keywords
correction
coefficient
probability
frame number
evaluation index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111336108.4A
Other languages
Chinese (zh)
Other versions
CN114035687A (en)
Inventor
王瑞娟
王灏
陈慧民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202111336108.4A priority Critical patent/CN114035687B/en
Publication of CN114035687A publication Critical patent/CN114035687A/en
Application granted granted Critical
Publication of CN114035687B publication Critical patent/CN114035687B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical fields of artificial intelligence and virtual reality equipment, and in particular to a gesture recognition method and system based on virtual reality. The method comprises the following steps: evaluating the initial segmentation frame number and the initial forgetting coefficient according to the degree of difference between the predicted gesture action category and the real gesture action category, to obtain a comprehensive evaluation index; obtaining the correction probabilities of the initial segmentation frame number and the initial forgetting coefficient, as well as the probabilities of the increasing and decreasing correction directions; selecting the initial segmentation frame number or the initial forgetting coefficient for increasing or decreasing correction according to the correction probabilities and the probabilities of the correction directions; and stopping correction when the comprehensive evaluation index tends to be stable, obtaining the optimal segmentation frame number and forgetting coefficient, and using them for subsequent gesture recognition. The method and system improve the accuracy of gesture recognition and the generalization capability of the gesture recognition network, and reduce misjudgment of gesture actions caused by the user's own factors.

Description

Gesture recognition method and system based on virtual reality
Technical Field
The invention relates to the technical field of artificial intelligence and virtual reality equipment, in particular to a gesture recognition method and system based on virtual reality.
Background
With the development of technology, related technologies such as virtual reality, human-computer interaction and image recognition are continuously improving, and the demand of various industries for accurate gesture recognition is growing. Gesture recognition is mainly applied in intelligent home control, vehicle-mounted operation control, PC and mobile terminal control, industrial design and other areas, and its commercial value increases day by day.
Gesture recognition methods in the prior art are varied and can mainly be divided into three technologies: image recognition based on optical technology, motion capture based on inertial sensors, and hand form simulation based on mechanical structures. To improve the accuracy of gesture recognition, prior methods generally improve the accuracy of the acquired hand information on the basis of these technologies; for example, CN106648103A (a gesture tracking method for a VR headset, and the VR headset) improves recognition accuracy by fusing three-dimensional feature information of the hand.
The problem in the prior art is that gesture behavior habits differ between device users, yet the prior art does not take these habits into account, and it is difficult for a gesture recognition network trained only on training-set samples to generalize to a specific device user, so the gesture recognition accuracy is difficult to improve. Moreover, during gesture recognition, the VR device cannot render in advance the scene to which the instruction information obtained from the gesture recognition result applies, which degrades the user experience.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a gesture recognition method and system based on virtual reality, and the adopted technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a gesture recognition method based on virtual reality:
predicting the historical gesture track information to obtain predicted gesture track information at a predicted moment, and simultaneously obtaining real gesture track information corresponding to the predicted moment; the method comprises the steps of acquiring historical gesture track information and real gesture track information: superposing the hand key point thermodynamic diagrams of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information;
obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the change trend of the value on the confidence coefficient evaluation index time sequence at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted;
and continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable, stopping the correction, obtaining the optimal segmentation frame number and forgetting coefficient, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition.
Preferably, the confidence evaluation index is specifically:
β_n = Σ_{c=1}^{C} |P_{n,c}^{real} − P_{n,c}^{pred}|
wherein β_n is the confidence evaluation index and n denotes the nth predicted time in the prediction period; C denotes the number of classification categories of the gesture action classification result; P_{n,c}^{real} denotes the confidence that the action category of the real gesture track information at the nth predicted time belongs to the c-th classification category; and P_{n,c}^{pred} denotes the confidence that the action category of the predicted gesture track information at the nth predicted time belongs to the c-th classification category. It should be noted that the action category classification result of the gesture track information is a confidence sequence characterizing the probability that the gesture track information belongs to each classification category.
Preferably, the correction probability of the initial segmentation frame number is obtained as follows: labeling the confidence evaluation indexes, the labels denoting the time order of the confidence evaluation indexes; performing linear fitting on the labeled confidence evaluation indexes, the ordinate of the fitted straight line being the confidence evaluation index and the abscissa being its label, to obtain the slope of the fitted straight line; and, given an initial segmentation frame number correction probability, obtaining the corrected value from the slope of the fitted line:
G_M′ = τ · G_M, with τ = 2 / (1 + e^{−k})
wherein G_M′ is the correction probability of the initial segmentation frame number, k is the slope of the fitted straight line, and G_M is the given initial segmentation frame number correction probability.
Preferably, the correction probability of the initial forgetting coefficient is specifically:
G_α = 1 − G_M′
wherein G_α represents the initial forgetting coefficient correction probability, and G_M′ denotes the initial segmentation frame number correction probability.
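By way of illustration only (not part of the original disclosure), the correction probabilities above can be sketched as follows. The closed form τ(k) = 2/(1 + e^{−k}) is an assumption chosen to match the described limiting behavior (τ → 2 for a large positive slope, τ ≈ 1 near zero, τ → 0 for a negative slope); function names are hypothetical.

```python
import math
import numpy as np

def correction_probs(betas, g_m=0.5):
    """Fit a line to the time-labeled confidence evaluation indexes, map its
    slope k through the assumed adjustment function tau(k) = 2/(1 + e^-k),
    and return (G_M', G_alpha) with G_alpha = 1 - G_M'."""
    labels = np.arange(1, len(betas) + 1)      # labels encode the time order
    k = np.polyfit(labels, betas, 1)[0]        # slope of the fitted straight line
    tau = 2.0 / (1.0 + math.exp(-k))           # assumed form of the adjustment function
    g_m_prime = min(tau * g_m, 1.0)            # product of tau and the given probability
    return g_m_prime, 1.0 - g_m_prime
```

With rising indexes the segmentation frame number becomes the more probable correction target; with flat indexes both coefficients stay equally probable.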
Preferably, the validity of the correction is judged according to a validity evaluation index:
φ = (β / β′) · (|k| / |k′|)
wherein φ is the validity evaluation index; β is the comprehensive evaluation index before the segmentation frame number and the forgetting coefficient are corrected, and β′ is the comprehensive evaluation index after correction; k is the slope of the fitted straight line before the segmentation frame number and the forgetting coefficient are corrected, and k′ is the slope of the fitted straight line after correction.
Preferably, updating the probability of the correction direction according to the validity of the correction comprises: multiplying the validity evaluation index by the probability of the correction direction that was actually applied, then normalizing the product together with the probability of the unapplied correction direction, to obtain the updated probabilities of the correction directions.
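By way of illustration only (not part of the original disclosure), the multiply-and-normalize update of the direction probabilities can be sketched as follows; the dictionary representation and function name are assumptions.

```python
def update_direction_probs(probs, applied, validity):
    """probs: {'increase': p, 'decrease': p}; applied: the direction actually
    used; validity: the validity evaluation index. Multiply the validity index
    into the applied direction, then renormalize with the other direction."""
    scaled = dict(probs)
    scaled[applied] *= validity
    total = sum(scaled.values())
    return {d: p / total for d, p in scaled.items()}
```

A validity index above 1 thus makes the effective direction more likely to be chosen next time, and below 1 less likely.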
Preferably, the optimal segmentation frame number and forgetting coefficient acquisition includes: and constructing a coefficient prediction network, inputting a hand motion depth image of a continuous fixed frame, and outputting an optimal segmentation frame number and forgetting coefficient.
In a second aspect, another embodiment of the present invention provides a gesture recognition system based on virtual reality. The system comprises: the gesture track information acquisition module is used for predicting and obtaining predicted gesture track information at a predicted moment by utilizing the historical gesture track information, and simultaneously obtaining real gesture track information corresponding to the predicted moment; the method comprises the steps of acquiring historical gesture track information and real gesture track information: superposing the hand key point thermodynamic diagrams of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information;
the coefficient correction module is used for obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the rising or falling trend of the confidence coefficient evaluation index time sequence value at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted; continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable;
the gesture recognition module is used for obtaining the optimal segmentation frame number and forgetting coefficient when correction is stopped, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition.
Preferably, the coefficient correction module is further configured to obtain the confidence evaluation index, specifically:
β_n = Σ_{c=1}^{C} |P_{n,c}^{real} − P_{n,c}^{pred}|
wherein β_n is the confidence evaluation index and n denotes the nth predicted time in the prediction period; C denotes the number of classification categories of the gesture action classification result; P_{n,c}^{real} denotes the confidence that the action category of the real gesture track information at the nth predicted time belongs to the c-th classification category; and P_{n,c}^{pred} denotes the confidence that the action category of the predicted gesture track information at the nth predicted time belongs to the c-th classification category. It should be noted that the action category classification result of the gesture track information is a confidence sequence characterizing the probability that the gesture track information belongs to each classification category.
Preferably, the coefficient correction module is further configured to judge the validity of the correction according to a validity evaluation index:
φ = (β / β′) · (|k| / |k′|)
wherein φ is the validity evaluation index; β is the comprehensive evaluation index before the segmentation frame number and the forgetting coefficient are corrected, and β′ is the comprehensive evaluation index after correction; k is the slope of the fitted straight line before the segmentation frame number and the forgetting coefficient are corrected, and k′ is the slope of the fitted straight line after correction.
The embodiments of the invention have at least the following beneficial effects: a TCN network is used to predict gesture track information from historical gesture track information; the initial segmentation frame number and the initial forgetting coefficient are optimized according to the degree of difference between the real and predicted gesture classification categories, so that they better match the gesture habits of the current user and yield predicted gesture track information closer to the real gesture track information; this in turn improves the accuracy of gesture recognition and the generalization capability of the gesture recognition network, and reduces misjudgment of gesture actions caused by the user's own factors.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to specific implementation, structure, features and effects of a gesture recognition method and system based on virtual reality according to the present invention, which are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of a gesture recognition method and system based on virtual reality provided by the invention with reference to the accompanying drawings.
Example 1
The main application scene of the invention is as follows: the virtual reality scene is provided with an RGB-D camera which can collect hand information of a user, wherein the hand information comprises RGB images and depth information, and the default camera pose is fixed without considering the influence of factors such as parallax, shielding and the like; identifying and classifying hand actions according to the collected hand action information, so as to obtain instruction information represented by gesture action categories; and realizing corresponding functions according to the instruction information.
Referring to fig. 1, a flowchart of a method according to an embodiment of the present invention is shown, the method includes the following steps:
firstly, predicting and obtaining predicted gesture track information at a predicted time by utilizing historical gesture track information, and simultaneously obtaining real gesture track information corresponding to the predicted time; the method comprises the steps of acquiring historical gesture track information and real gesture track information: and superposing the hand key point thermodynamic diagrams of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information.
And acquiring an RGB-D image of the hand of the user through an RGB-D camera, processing the RGB-D image of the hand of the user through a hand key point detection network, and outputting a thermodynamic diagram of the key point of the hand of the user. The hand key point detection network is constructed by adopting an encoder-decoder framework, network input is a user hand RGB-D image, network output is a multi-channel user hand key point thermodynamic diagram, the number of thermodynamic diagram channels is consistent with the number of hand key point categories, and the input and output image sizes are consistent; the training set adopts a plurality of RGB-D images which are acquired aiming at different hand postures, the labels are hot spots which are generated by Gaussian blur at the centers of key points of the hands, the label categories are 21 bone node categories which are commonly used, and the loss function adopts a mean square error loss function. And acquiring hand key point information through the RGB-D image of the hand of the user acquired by the RGB-D camera, so that the gesture recognition is conveniently carried out by a subsequent gesture recognition network.
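By way of illustration only (not part of the original disclosure), the heatmap labels described above, i.e., hot spots generated by Gaussian blur at the key point centers with one channel per skeletal node category, can be sketched as follows; function names and the σ value are assumptions.

```python
import numpy as np

def keypoint_heatmap(h, w, cx, cy, sigma=2.0):
    """Gaussian 'hot spot' centered on a hand key point (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def make_label(h, w, keypoints, sigma=2.0):
    """Stack one heatmap channel per key point category (21 skeletal nodes
    in this embodiment), matching the network's multi-channel output."""
    return np.stack([keypoint_heatmap(h, w, x, y, sigma) for (x, y) in keypoints])
```

The mean square error between such labels and the decoder output then gives the training loss described above.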
Building the gesture recognition network: the gesture recognition network architecture is formed by coupling a TCN network with a gesture classification network. The TCN network specifically: in this embodiment the TCN adopts causal convolution, and the sliding window length is suggested to be odd; preferably, the sliding window length is set to 3 in this embodiment. Causal convolution predicts subsequent information by processing historical information. The input of the TCN network, i.e., the input of the gesture recognition network, is gesture track information over a number of consecutive time steps; a single piece of gesture track information is a single element of the input sequence, obtained by channel-wise thermal superposition of M consecutive frames of hand key point thermodynamic diagrams. The thermal superposition is specifically: for the single-channel thermodynamic diagrams of consecutive frames, point-by-point accumulation is performed with a forgetting coefficient, the specific formula being X′ = (1 − α)·x + α·X, where α is the initial forgetting coefficient, controlling the proportion of the historically accumulated thermal value that is retained; correspondingly, (1 − α) is the proportion of the current thermal value retained; x is the current thermal value, i.e., the pixel value at each pixel position of the current thermodynamic diagram; and X is the historically accumulated thermal value, i.e., the pixel value at each position in the superposition result of all frames before the current frame.
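The accumulation formula X′ = (1 − α)·x + α·X can be sketched as follows, a minimal illustration (the function name is an assumption) applied per channel to the M heatmaps of one segment:

```python
import numpy as np

def superpose(frames, alpha):
    """Accumulate M single-channel key point heatmaps (oldest first) with
    forgetting coefficient alpha: new = (1 - alpha) * current + alpha * history."""
    acc = np.zeros_like(frames[0], dtype=float)
    for x in frames:
        acc = (1.0 - alpha) * x + alpha * acc
    return acc
```

A larger α weights the older frames more heavily, so the resulting trace retains more of the earlier trajectory.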
Acquire consecutive multi-frame RGB-D images and obtain consecutive multi-frame hand key point thermodynamic diagrams through the hand key point detection network; segment them with the initial segmentation frame number M and superpose the thermodynamic diagrams within each segment, obtaining a sequence of gesture track information over consecutive time steps; then select a fixed number of track information elements as a single training sample through a sliding window. Taking k_1, k_2, k_3 as a single training sample for example, the corresponding label data is k_4, where k_1, k_2, k_3 represent historical gesture track information and k_4 is the predicted gesture track information, i.e., the first predicted gesture track information after the input historical gesture track information. Multiple groups of training samples and corresponding label data are obtained by applying this process to different gestures of different users; the loss function again adopts the mean square error loss. After TCN training is completed, input continuous historical gesture track information and output predicted gesture track information; meanwhile, segment and superpose the consecutive multi-frame hand key point thermodynamic diagrams according to the initial segmentation frame number M and the initial forgetting coefficient α, to obtain the real gesture track information corresponding to the predicted gesture track information at the predicted time.
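The sliding-window sample construction (k_1, k_2, k_3 → label k_4) can be sketched as follows; a minimal illustration with an assumed function name:

```python
def make_tcn_samples(track_infos, window=3):
    """Split a time-ordered sequence of gesture track information into
    (history window, next-step label) pairs for TCN training."""
    samples = []
    for i in range(len(track_infos) - window):
        samples.append((track_infos[i:i + window], track_infos[i + window]))
    return samples
```

Each pair trains the causal convolution to predict the element immediately following its history window.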
Then, obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the change trend of the value on the confidence coefficient evaluation index time sequence at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; and obtaining the comprehensive evaluation index after the adjustment of the coefficient to be corrected.
The gesture classification network specifically comprises: with the single gesture track information as a training sample, the tag data is set to be a real gesture action category corresponding to the gesture track information, and the category can be set by an implementer according to actual situations, for example, the gesture action category includes: click, move, rotate, zoom, return, etc.; the loss function adopts a cross entropy loss function; after the gesture classification network training is completed, the predicted gesture track information and the real gesture track information are respectively input, and the corresponding predicted gesture action category and real gesture action category are output.
The TCN network and the gesture classification network are pre-trained, so that the TCN network and the gesture classification network can be normally used; the gesture recognition network adopts the fixed frame number M to split and adopts the fixed forgetting coefficient alpha to overlap, so that the gesture recognition network can not adapt to the specific gesture action speed of a user, and therefore the gesture recognition network is optimized, the initial splitting frame number M and the initial forgetting coefficient alpha are mainly optimized, the fit degree of a network input sample and the user is improved, and the accuracy of the gesture recognition network is further improved.
For the actually acquired RGB-D hand images, obtain the hand key point thermodynamic diagrams through the hand key point detection network, segment according to the initial segmentation frame number M, superpose according to the forgetting coefficient α to obtain historical gesture track information, and start predicting from the historical gesture track information to obtain predicted gesture track information. When the prediction period reaches the set length, with N predicted times in the period, use the comparison result of the action categories of the predicted and real gesture track information to obtain the confidence evaluation index, evaluate the initial segmentation frame number and the initial forgetting coefficient, and correct them according to the evaluation value. The confidence evaluation index at each predicted time is:
β_n = Σ_{c=1}^{C} |P_{n,c}^{real} − P_{n,c}^{pred}|
wherein β_n is the confidence evaluation index and n denotes the nth predicted time in the prediction period; C denotes the number of classification categories of the gesture action classification result; P_{n,c}^{real} denotes the confidence that the action category of the real gesture track information at the nth predicted time belongs to the c-th classification category; and P_{n,c}^{pred} denotes the confidence that the action category of the predicted gesture track information at the nth predicted time belongs to the c-th classification category. It should be noted that the action category classification result of the gesture track information is a confidence sequence characterizing the probability that the gesture track information belongs to each classification category.
The comprehensive evaluation index of the segmentation frame number M and the forgetting coefficient α is:
β = (1/N) · Σ_{n=1}^{N} β_n
where N is the number of predicted times in the prediction period.
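A sketch of the two indexes (not part of the original disclosure), assuming the confidence evaluation index is the summed absolute confidence difference across the C categories, consistent with the surrounding description; function names are assumptions:

```python
import numpy as np

def confidence_index(p_real, p_pred):
    """beta_n: assumed here as the summed absolute difference between the
    real and predicted action-category confidence sequences."""
    return float(np.abs(np.asarray(p_real, float) - np.asarray(p_pred, float)).sum())

def comprehensive_index(real_seqs, pred_seqs):
    """beta: mean of the confidence evaluation indexes over the N predicted times."""
    return float(np.mean([confidence_index(r, p) for r, p in zip(real_seqs, pred_seqs)]))
```

Identical confidence sequences give β = 0; larger values indicate a larger gap between predicted and real classifications.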
The larger the confidence evaluation index value, the larger the confidence difference between the predicted and real gesture classification categories. Because the prediction network and the classification network are pre-trained, it is assumed by default that they introduce no misjudgment in use; a large confidence difference between the predicted and real gesture classification categories is therefore mainly caused by an unreasonable initial segmentation frame number M or an unreasonable initial forgetting coefficient α, which prevents the predicted gesture track from matching the real gesture track information. At this time, the initial segmentation frame number M or the initial forgetting coefficient α is optimized. Specifically: the initial segmentation frame number M is traversed over the range M ± Δε with a traversal step of 1, where Δε is the segmentation frame number adjustment value preset in this embodiment; the optimizing range of the initial forgetting coefficient α is [0, 1] with a traversal step of 0.02.
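The traversal ranges just described can be sketched as follows (an illustration only; the function name and the example Δε are assumptions):

```python
def candidate_values(m, delta_eps):
    """Traversal candidates: M within [M - delta_eps, M + delta_eps] with
    step 1, and alpha within [0, 1] with step 0.02."""
    ms = list(range(m - delta_eps, m + delta_eps + 1))
    alphas = [round(0.02 * i, 2) for i in range(51)]
    return ms, alphas
```

During optimization, one coefficient at a time is moved to a neighboring candidate in the chosen correction direction.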
The correction probability G_M of the initial segmentation frame number must be given during the optimizing process; the initial value of G_M is preferably 1/2. The confidence evaluation indexes β_n are labeled with indexes indicating their order in time, and a straight line is fitted to them, with the label of the confidence evaluation index on the abscissa and the confidence evaluation index on the ordinate. The slope k of the fitted line yields the adjustment function for the correction probability of the initial segmentation frame number:

τ = 2 / (1 + e^(−k))
the adjustment function tau is used for correcting the correction probability of the initial segmentation frame number, namely, the larger the slope is, the larger the difference between the predicted gesture track information and the real gesture track information is, and at the moment, the more likely the initial segmentation frame number M is unreasonably caused, so that tau is closer to 2; correction probability G for a given initial segmentation frame number M Taking the product of the initial segmentation frame number and the adjustment function tau as the initial segmentation frame number correction probability, so that the probability of correcting the initial segmentation frame number is larger; when the slope is close to 0, whether the initial segmentation frame number or the initial forgetting coefficient is unreasonable is difficult to judge, so that the correction probability of the initial segmentation frame number and the initial forgetting coefficient is approximate; the smaller the slope, the smaller the difference between the predicted gesture track information and the real gesture track information, the more likely the initial forgetting coefficient is unreasonable to cause the prediction to appear difference, but with the change of the input history track information, the difference is gradually reduced, the closer τ is to 0, the given initial segmentation frame number G M The product of the initial segmentation frame number M and the adjustment function tau is used as the correction probability of the initial segmentation frame number M, so that the probability of the initial segmentation frame number being adjusted is smaller. The correction probability of the initial segmentation frame number is:
accordingly, because the correction probability of the initial segmentation frame number and the initial forgetting coefficient is a complete probability distribution, the correction probability of the initial forgetting coefficient is:
G_α = 1 − G′_M

wherein G′_M denotes the correction probability of the initial segmentation frame number, G_α denotes the correction probability of the initial forgetting coefficient, G_M is the given correction probability of the initial segmentation frame number, and τ is the adjustment function. Based on the correction probability G′_M of the initial segmentation frame number and the correction probability of the initial forgetting coefficient, the initial segmentation frame number or the initial forgetting coefficient is randomly selected as the coefficient to be corrected.
When the coefficient to be corrected is corrected, the traversal uses a random correction direction; that is, the increasing direction and the decreasing direction carry probabilities G_u and G_d, both with initial value 1/2. To improve the traversal speed, an effectiveness evaluation index ψ is set to judge whether a correction is effective when the initial segmentation frame number or the initial forgetting coefficient is selected for correction. For one traversal result, the comprehensive evaluation index β′ and the fitted-line slope k′ after the coefficient to be corrected has been corrected are obtained; the effectiveness evaluation index of the coefficient correction is:

ψ = [2/(1 + e^(β′ − β))] · [2/(1 + e^(−(|k| − |k′|)))]
wherein 2/(1 + e^(β′ − β)) is the comprehensive-evaluation-index influence component of the effectiveness evaluation index: when the corrected comprehensive evaluation index β′ is smaller than the initial comprehensive evaluation index β, β′ − β is negative, indicating that the corrected comprehensive evaluation is better and the correction is effective; the component is then greater than 1, and the smaller β′ − β is, the closer the component is to 2. 2/(1 + e^(−(|k| − |k′|))) is the fitted-line slope influence component: (|k| − |k′|) measures the degree to which the fitted line approaches horizontal; the larger this value, the flatter the fitted line after the coefficient correction, meaning the correction is effective, and the closer the component is to 2.
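A minimal sketch of the effectiveness evaluation: the product of the two (0, 2)-valued factors is an assumption (the patent's formula image is not reproduced here), chosen so that an unchanged index and slope give the neutral value 1 and improvements push the index toward larger values.

```python
import math

def effectiveness_index(beta_before, beta_after, k_before, k_after):
    """Assumed product of two (0, 2)-valued factors: one rewarding a
    drop of the comprehensive evaluation index, one rewarding the
    fitted line of confidence indexes becoming flatter."""
    f_beta = 2.0 / (1.0 + math.exp(beta_after - beta_before))
    f_slope = 2.0 / (1.0 + math.exp(-(abs(k_before) - abs(k_after))))
    return f_beta * f_slope

# unchanged index and slope -> neutral effectiveness of 1
assert abs(effectiveness_index(0.4, 0.4, 0.1, 0.1) - 1.0) < 1e-12
```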
Multiply the effectiveness evaluation index by the probability of the correction direction that was actually used, and normalize the product together with the probability of the unused correction direction to obtain the updated probability of the correction direction:

G′ = ψ · G

where G′ is the probability of the correction direction at the next correction and G is its probability at the current correction; G′ and G may be the probability of either the increasing or the decreasing direction. If the correction direction is the increasing direction, the adjusted probability value of the increasing direction is G′_u, while the probability of the untouched traversal direction remains G_d; G′_u and G_d are then normalized to give the probabilities of the increasing and decreasing directions at the next correction.
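The scale-then-renormalize update above can be sketched directly; `update_direction_probs` is an illustrative name for this step.

```python
def update_direction_probs(g_used, g_other, psi):
    """Scale the probability of the direction actually used by the
    effectiveness index psi, then renormalize with the untouched
    direction so the two probabilities again sum to 1."""
    scaled = psi * g_used
    total = scaled + g_other
    return scaled / total, g_other / total

# an effective step (psi > 1) makes the used direction more likely
g_u, g_d = update_direction_probs(0.5, 0.5, 1.5)
assert abs(g_u - 0.6) < 1e-12 and abs(g_d - 0.4) < 1e-12
```

A neutral effectiveness of ψ = 1 leaves the two direction probabilities unchanged, so the random walk over correction directions only drifts when corrections demonstrably help or hurt.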
Finally, the above coefficient correction is carried out continuously until the comprehensive evaluation index becomes stable; correction then stops, and the optimal segmentation frame number and forgetting coefficient are obtained and used for subsequent gesture recognition.
With this optimization method, correction proceeds until the comprehensive evaluation index converges, or until it cannot be driven lower after a period of time; the segmentation frame number and forgetting coefficient corresponding to the lowest comprehensive evaluation index are then taken as the optimal segmentation frame number and forgetting coefficient.
In actual use, a coefficient prediction network is constructed. Continuous fixed-frame-number hand-action RGB-D images are used as its input, the network adopts a TCN-FC architecture, and it outputs a regression value of the optimal segmentation frame number and a regression value of the optimal forgetting coefficient. The network training is as follows: continuous fixed-frame-number hand-action RGB-D images collected from a number of different users operating VR equipment are used, the acquired optimal segmentation frame numbers and optimal forgetting coefficients are processed into label data, and the mean square error is adopted as the loss function.
Continuous fixed-frame RGB-D images of a user are collected and fed into the coefficient prediction network, which outputs the optimal segmentation frame number regression value and the optimal forgetting coefficient regression value; based on these regression values, the RGB-D images are segmented and superposed to obtain several track information maps, the track information maps are passed through a TCN network to generate a predicted track, and prediction and recognition of hand actions are performed in combination with the classification network.
Example 2
The present embodiment provides a system embodiment. A virtual reality-based gesture recognition system, the system comprising: a gesture track information acquisition module for predicting, from the historical gesture track information, the predicted gesture track information at a prediction time, while obtaining the real gesture track information corresponding to the prediction time; the historical gesture track information and the real gesture track information are acquired by superposing the hand key point heatmaps of the initial segmentation frame number according to the initial forgetting coefficient to obtain the corresponding gesture track information;
the coefficient correction module is used for obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the rising or falling trend of the confidence coefficient evaluation index time sequence value at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted; continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable;
the gesture recognition module is used for obtaining the optimal segmentation frame number and forgetting coefficient when correction is stopped, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition.
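The superposition of hand key point heatmaps under a forgetting coefficient, as performed by the gesture track information acquisition module, might look like the following sketch. The exponential-forgetting recurrence, and the choice to weight the newest frame most heavily, are assumptions about the patent's superposition rule.

```python
import numpy as np

def superpose_heatmaps(heatmaps, alpha):
    """Fold M per-frame hand key-point heatmaps into one trajectory
    information map with exponential forgetting: the newest frame
    keeps weight 1 and each step back in time is damped by alpha."""
    traj = np.zeros_like(heatmaps[0], dtype=float)
    for h in heatmaps:          # iterate oldest -> newest
        traj = alpha * traj + np.asarray(h, dtype=float)
    return traj

# three 3x3 "heatmaps" with growing peak intensity
frames = [np.eye(3) * w for w in (1.0, 2.0, 3.0)]
traj_map = superpose_heatmaps(frames, alpha=0.5)  # diagonal: 0.25 + 1.0 + 3.0
```

The forgetting coefficient α thus controls how quickly older frames fade from the trajectory map, which is exactly the quantity the coefficient correction module tunes.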

Claims (6)

1. The gesture recognition method based on virtual reality is characterized by comprising the following steps of:
predicting the historical gesture track information to obtain predicted gesture track information at a predicted moment, and simultaneously obtaining real gesture track information corresponding to the predicted moment; the method comprises the steps of acquiring historical gesture track information and real gesture track information: superposing the hand key point heatmaps of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information;
obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all the prediction moments in the prediction period as a comprehensive evaluation index, and carrying out coefficient correction: obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the change trend of the value on the confidence coefficient evaluation index time sequence at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted;
continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable, stopping the correction, obtaining the optimal segmentation frame number and forgetting coefficient, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition;
the confidence evaluation index specifically comprises:

β_n = Σ_{c=1}^{C} |p̂_c^n − p_c^n|

wherein β_n is the confidence evaluation index, and n denotes the nth prediction time in the prediction period; C denotes the number of classification categories of the gesture action classification result; p̂_c^n denotes the confidence that the action category of the real gesture track information at the nth prediction time belongs to the c-th classification category; p_c^n denotes the confidence that the action category of the predicted gesture track information at the nth prediction time belongs to the c-th classification category; it should be noted that the action category classification result of the gesture track information is a confidence sequence, which characterizes the probability that the gesture track information belongs to each classification category;
the correction probability of the initial segmentation frame number is obtained specifically as follows: labeling the confidence evaluation indexes, the label representing the time order of the confidence evaluation indexes; performing linear fitting on the labeled confidence evaluation indexes, with the confidence evaluation index on the ordinate and its label on the abscissa, to obtain the slope of the fitted line; giving an initial segmentation frame number correction probability, and obtaining the corrected probability of the initial segmentation frame number by using the slope of the line:

G′_M = G_M · 2/(1 + e^(−k))

wherein G′_M is the correction probability of the initial segmentation frame number, k is the slope of the fitted line, and G_M is the given initial segmentation frame number correction probability;
the correction probability of the initial forgetting coefficient is specifically:

G_α = 1 − G′_M

wherein G_α denotes the correction probability of the initial forgetting coefficient, and G′_M denotes the correction probability of the initial segmentation frame number.
2. The gesture recognition method based on virtual reality according to claim 1, wherein the validity of the correction is judged according to a validity evaluation index, the validity evaluation index being:

ψ = [2/(1 + e^(β′ − β))] · [2/(1 + e^(−(|k| − |k′|)))]

wherein ψ is the validity evaluation index; β is the comprehensive evaluation index before the segmentation frame number and the forgetting coefficient are corrected, and β′ is the corrected comprehensive evaluation index; k is the slope of the fitted line before the segmentation frame number and the forgetting coefficient are corrected, and k′ is the slope of the corrected fitted line.
3. The virtual reality-based gesture recognition method of claim 1, wherein the updating the probability of the correction direction according to the validity of the correction comprises: multiplying the validity evaluation index by the probability of the correction direction which is actually adjusted, normalizing the multiplication result with the probability of the correction direction which is not adjusted, and obtaining the probability of the correction direction after updating.
4. The gesture recognition method based on virtual reality according to claim 1, wherein the optimal segmentation frame number and forgetting coefficient acquisition includes: and constructing a coefficient prediction network, inputting a hand motion depth image of a continuous fixed frame, and outputting an optimal segmentation frame number and forgetting coefficient.
5. A virtual reality-based gesture recognition system, the system comprising: the gesture track information acquisition module is used for predicting and obtaining predicted gesture track information at a predicted moment by utilizing the historical gesture track information, and simultaneously obtaining real gesture track information corresponding to the predicted moment; the method comprises the steps of acquiring historical gesture track information and real gesture track information: superposing the hand key point heatmaps of the initial segmentation frame number according to the initial forgetting coefficient to obtain corresponding gesture track information;
the coefficient correction module is used for obtaining a confidence evaluation index according to a comparison result of the action category of the predicted gesture track information and the real gesture track information; taking the average value of confidence coefficient evaluation indexes at all prediction moments in a prediction period as a comprehensive evaluation index, and carrying out coefficient correction; obtaining the correction probability of the initial segmentation frame number and the initial forgetting coefficient according to the change trend of the value on the confidence coefficient evaluation index time sequence at the prediction moment in the prediction period; selecting an initial segmentation frame number or an initial forgetting coefficient as a coefficient to be corrected according to the correction probability, adjusting the coefficient to be corrected in a corresponding correction direction according to the probability of the correction direction, and updating the probability of the correction direction according to the correction effectiveness, wherein the correction direction comprises an increasing direction and a decreasing direction; acquiring a comprehensive evaluation index of which the coefficient to be corrected is adjusted; continuously carrying out the coefficient correction until the comprehensive evaluation index tends to be stable;
the gesture recognition module is used for obtaining the optimal segmentation frame number and forgetting coefficient when correction is stopped, and using the optimal segmentation frame number and forgetting coefficient for subsequent gesture recognition;
the coefficient correction module is further used for obtaining the confidence evaluation index, specifically:

β_n = Σ_{c=1}^{C} |p̂_c^n − p_c^n|

wherein β_n is the confidence evaluation index, and n denotes the nth prediction time in the prediction period; C denotes the number of classification categories of the gesture action classification result; p̂_c^n denotes the confidence that the action category of the real gesture track information at the nth prediction time belongs to the c-th classification category; p_c^n denotes the confidence that the action category of the predicted gesture track information at the nth prediction time belongs to the c-th classification category; it should be noted that the action category classification result of the gesture track information is a confidence sequence, which characterizes the probability that the gesture track information belongs to each classification category;
the correction probability of the initial segmentation frame number is obtained specifically as follows: labeling the confidence evaluation indexes, the label representing the time order of the confidence evaluation indexes; performing linear fitting on the labeled confidence evaluation indexes, with the confidence evaluation index on the ordinate and its label on the abscissa, to obtain the slope of the fitted line; giving an initial segmentation frame number correction probability, and obtaining the corrected probability of the initial segmentation frame number by using the slope of the line:

G′_M = G_M · 2/(1 + e^(−k))

wherein G′_M is the correction probability of the initial segmentation frame number, k is the slope of the fitted line, and G_M is the given initial segmentation frame number correction probability;
the correction probability of the initial forgetting coefficient is specifically:

G_α = 1 − G′_M

wherein G_α denotes the correction probability of the initial forgetting coefficient, and G′_M denotes the correction probability of the initial segmentation frame number.
6. The gesture recognition system of claim 5, wherein the coefficient correction module is further configured to judge the validity of the correction according to a validity evaluation index, the validity evaluation index being:

ψ = [2/(1 + e^(β′ − β))] · [2/(1 + e^(−(|k| − |k′|)))]

wherein ψ is the validity evaluation index; β is the comprehensive evaluation index before the segmentation frame number and the forgetting coefficient are corrected, and β′ is the corrected comprehensive evaluation index; k is the slope of the fitted line before the segmentation frame number and the forgetting coefficient are corrected, and k′ is the slope of the corrected fitted line.
CN202111336108.4A 2021-11-12 2021-11-12 Gesture recognition method and system based on virtual reality Active CN114035687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111336108.4A CN114035687B (en) 2021-11-12 2021-11-12 Gesture recognition method and system based on virtual reality

Publications (2)

Publication Number Publication Date
CN114035687A CN114035687A (en) 2022-02-11
CN114035687B true CN114035687B (en) 2023-07-25

Family

ID=80144149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111336108.4A Active CN114035687B (en) 2021-11-12 2021-11-12 Gesture recognition method and system based on virtual reality

Country Status (1)

Country Link
CN (1) CN114035687B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152931B (en) * 2023-04-23 2023-07-07 深圳未来立体教育科技有限公司 Gesture recognition method and VR system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107808143A (en) * 2017-11-10 2018-03-16 西安电子科技大学 Dynamic gesture identification method based on computer vision
CN110991319A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related device
CN111832468A (en) * 2020-07-09 2020-10-27 平安科技(深圳)有限公司 Gesture recognition method and device based on biological recognition, computer equipment and medium
CN113420848A (en) * 2021-08-24 2021-09-21 深圳市信润富联数字科技有限公司 Neural network model training method and device and gesture recognition method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR102298541B1 (en) * 2019-07-23 2021-09-07 엘지전자 주식회사 Artificial intelligence apparatus for recognizing user from image data and method for the same

Non-Patent Citations (1)

Title
Forged face video detection method fusing global temporal and local spatial features; Chen Peng; Liang Tao; Liu Jin; Dai Jiao; Han Jizhong; Journal of Cyber Security (Issue 02); full text *


Similar Documents

Publication Publication Date Title
US11205099B2 (en) Training neural networks using data augmentation policies
CN109800483A (en) A kind of prediction technique, device, electronic equipment and computer readable storage medium
CN113128558B (en) Target detection method based on shallow space feature fusion and adaptive channel screening
CN112884742B (en) Multi-target real-time detection, identification and tracking method based on multi-algorithm fusion
CN110765854A (en) Video motion recognition method
CN111597961B (en) Intelligent driving-oriented moving target track prediction method, system and device
CN110781262A (en) Semantic map construction method based on visual SLAM
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN114035687B (en) Gesture recognition method and system based on virtual reality
CN112818873A (en) Lane line detection method and system and electronic equipment
CN110705646B (en) Mobile equipment streaming data identification method based on model dynamic update
Fernandes et al. Long short-term memory networks for traffic flow forecasting: exploring input variables, time frames and multi-step approaches
CN113052039A (en) Method, system and server for detecting pedestrian density of traffic network
CN112700476A (en) Infrared ship video tracking method based on convolutional neural network
CN116824542A (en) Light-weight foggy-day vehicle detection method based on deep learning
CN114529890A (en) State detection method and device, electronic equipment and storage medium
CN113836241A (en) Time series data classification prediction method and device, terminal equipment and storage medium
CN110633597A (en) Driving region detection method and device
CN116797799A (en) Single-target tracking method and tracking system based on channel attention and space-time perception
CN116597430A (en) Article identification method, apparatus, electronic device, and computer-readable medium
CN113191984B (en) Deep learning-based motion blurred image joint restoration and classification method and system
CN114429602A (en) Semantic segmentation method and device, electronic equipment and storage medium
CN114372999A (en) Object detection method and device, electronic equipment and storage medium
CN116977969B (en) Driver two-point pre-aiming identification method based on convolutional neural network
CN115357725B (en) Knowledge graph generation method and device based on user behaviors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant