CN117496303A - Gesture recognition model generation method and device

Info

Publication number
CN117496303A
Authority
CN
China
Prior art keywords
recognition model
gesture recognition
information
gesture
training
Prior art date
Legal status
Pending
Application number
CN202311579134.9A
Other languages
Chinese (zh)
Inventor
谢良
张皓洋
高鲲
王心怡
张星昱
张亚坤
陈伟
闫野
印二威
Current Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN202311579134.9A priority Critical patent/CN117496303A/en
Publication of CN117496303A publication Critical patent/CN117496303A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7753 Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gesture recognition model generation method and device, wherein the method comprises the following steps: acquiring a first training set and a second training set, the first training set comprising a plurality of first gesture image samples; processing an initial gesture recognition model based on the first training set to determine a standby gesture recognition model, the initial gesture recognition model comprising an encoding unit and a decoding unit; and processing the initial gesture recognition model based on the second training set and the standby gesture recognition model to determine a gesture recognition model, the gesture recognition model being used for detecting and recognizing gestures in a gesture interaction system. The method and the device thus facilitate training the model with multi-modal gesture information, reduce the amount of data required for model training, improve the efficiency and accuracy of model generation, and further improve the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.

Description

Gesture recognition model generation method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for generating a gesture recognition model.
Background
In modern society, human-computer interaction has gradually become an integral part of our work and life. With the continuous development of artificial intelligence, especially deep learning, human-computer interaction has become more intelligent and efficient. In recent years, through collaboration and research among scholars in different fields, various human-computer interaction modes have been developed to facilitate our lives and work. As an important body language, gestures are an indispensable part of our lives; for example, traffic police direct traffic with gestures, and deaf-mute people communicate with others through sign language. Gesture interaction is receiving more and more attention from academia and industry, and has become one of the important interaction modes in the field of human-computer interaction.
Existing gesture recognition models are usually constructed with a single neural network. Because color images and IMU data carry different kinds of information, directly using a neural network to extract and fuse features does not fully account for the characteristics of the different modalities and therefore cannot achieve the best effect. In addition, current methods are developed based on deep neural networks whose performance depends on the scale of the labeled data, and labeling hand gesture data, especially acquiring three-dimensional gesture coordinates, often requires a great deal of time and effort. Therefore, a gesture recognition model generation method and device are provided to train the model with multi-modal gesture information, reduce the amount of data required for model training, improve the efficiency and accuracy of model generation, and further improve the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a gesture recognition model generation method and device that facilitate training the model with multi-modal gesture information, reduce the amount of data required for model training, improve the efficiency and accuracy of model generation, and further improve the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In order to solve the above technical problems, a first aspect of an embodiment of the present invention discloses a gesture recognition model generating method, which includes:
acquiring a first training set and a second training set; the first training set comprises a plurality of first gesture image samples;
processing the initial gesture recognition model based on the first training set, and determining a standby gesture recognition model; the initial gesture recognition model comprises an encoding unit and a decoding unit;
processing the initial gesture recognition model based on the second training set and the standby gesture recognition model to determine a gesture recognition model; the gesture recognition model is used for detecting and recognizing gestures in the gesture interaction system.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the processing of the initial gesture recognition model based on the first training set to determine a standby gesture recognition model includes:
randomly determining a target first gesture image sample from all the first gesture image samples in the first training set;
randomly generating a mask matrix block; the pixel value size of the mask matrix block is 255;
adding the mask matrix block to the target first gesture image sample to obtain a mask first gesture image sample; the mask matrix block covers at least 75% of the image area of the target first gesture image sample;
inputting the mask first gesture image sample into an initial gesture recognition model for training to obtain a first training gesture recognition model and output image result information;
calculating the output image result information and the target first gesture image sample by using a first loss function to obtain a first loss function value;
wherein the first loss function is
L_1 = ||y_12 − ŷ_12||²
wherein L_1 is the first loss function value; y_12 is the image data information corresponding to the target first gesture image sample; ŷ_12 is the image data information corresponding to the output image result information;
judging whether the first loss function value is smaller than or equal to a first loss threshold value or not, and obtaining a first loss judgment result;
when the first loss judgment result is negative, replacing and updating the initial gesture recognition model by using the first training gesture recognition model, and triggering and executing the random determination of a target first gesture image sample from all the first gesture image samples in the first training set;
and when the first loss judgment result is yes, determining the first training gesture recognition model as a standby gesture recognition model.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the processing of the initial gesture recognition model based on the second training set and the standby gesture recognition model to determine the gesture recognition model includes:
performing parameter migration and updating on the coding unit of the initial gesture recognition model by using the coding unit in the standby gesture recognition model to obtain a reconstructed gesture recognition model;
and performing semi-supervised training on the reconstructed gesture recognition model by using the second training set to obtain a gesture recognition model.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the performing semi-supervised training on the reconstructed gesture recognition model by using the second training set to obtain the gesture recognition model includes:
randomly determining a target second gesture image sample from all second gesture image samples in the second training set;
inputting the target second gesture image sample into a reconstructed gesture recognition model for training to obtain a second training gesture recognition model and image joint point information;
performing loss function calculation on the image joint point information to obtain a second loss function value;
judging whether the second loss function value is smaller than or equal to a second loss threshold value or not, and obtaining a second loss judgment result;
when the second loss judgment result is negative, replacing and updating the reconstructed gesture recognition model by using the second training gesture recognition model, and triggering and executing the random determination of a target second gesture image sample from all second gesture image samples in the second training set;
and when the second loss judgment result is yes, determining that the second training gesture recognition model is a gesture recognition model.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the performing loss function calculation on the image joint point information to obtain a second loss function value includes:
calculating an included angle of the image joint point information by using a bending angle calculation model to obtain joint bending angle information;
wherein the bending angle calculation model is
θ = arccos( (x_1x_2 + y_1y_2 + z_1z_2) / ( √(x_1² + y_1² + z_1²) × √(x_2² + y_2² + z_2²) ) )
wherein θ is the joint bending angle in the joint bending angle information; l_0→1 is the vector from joint point 0 to joint point 1; l_1→2 is the vector from joint point 1 to joint point 2; (x_1, y_1, z_1) are the coordinates of l_0→1; (x_2, y_2, z_2) are the coordinates of l_1→2;
judging whether the target second gesture image sample contains labeled joint point coordinate information, to obtain a labeling information judgment result;
when the labeling information judgment result is negative, calculating the joint bending angle information and the sample joint bending angle information in the target second gesture image sample by using a second loss function to obtain the second loss function value;
wherein the second loss function is
L_20 = ||x_21 − x̂_21||²
wherein L_20 is the second loss function value; x_21 is the sample joint bending angle information in the target second gesture image sample; x̂_21 is the joint bending angle information;
when the labeling information judgment result is yes, calculating the joint bending angle information, the sample joint bending angle information in the target second gesture image sample, the image joint point information and the labeled joint point coordinate information by using a third loss function to obtain the second loss function value;
wherein the third loss function is
L_21 = ||x_21 − x̂_21||² + ||z_21 − ẑ_21||²
wherein L_21 is the second loss function value; z_21 is the labeled joint point coordinate information; ẑ_21 is the image joint point information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the sample joint bending angle information is obtained based on the following steps:
collecting acceleration information and angular velocity information;
calculating the acceleration information and the angular velocity information to obtain acceleration attitude angle information and angular velocity attitude angle information;
fusing the acceleration attitude angle information and the angular velocity attitude angle information to obtain target attitude angle information;
and calculating the bending angle of the target attitude angle information to obtain the sample joint bending angle information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the calculating the bending angle of the target attitude angle information to obtain the sample joint bending angle information includes:
converting the target attitude angle information by using an angle conversion model to obtain angle quaternion information;
wherein the angle conversion model is
Q = [ cos(r/2)cos(p/2)cos(yaw/2) + sin(r/2)sin(p/2)sin(yaw/2),
      sin(r/2)cos(p/2)cos(yaw/2) − cos(r/2)sin(p/2)sin(yaw/2),
      cos(r/2)sin(p/2)cos(yaw/2) + sin(r/2)cos(p/2)sin(yaw/2),
      cos(r/2)cos(p/2)sin(yaw/2) − sin(r/2)sin(p/2)cos(yaw/2) ]
wherein Q is the angle quaternion in the angle quaternion information; r, p and yaw are the rotation angles around the X axis, the Y axis and the Z axis in the target attitude angle information, respectively;
normalizing the angle quaternion information to obtain normalized angle quaternion information;
calculating the normalized angle quaternion information by using a bending angle model to obtain the sample joint bending angle information;
wherein the bending angle model is
θ_0_1_2 = 180° + 2arccos(Q_0 · Q_1) × 180°/π − 2arccos(Q_1 · Q_2) × 180°/π
wherein θ_0_1_2 is the joint bending angle of joint point 1 relative to joint point 0 and joint point 2 in the sample joint bending angle information; Q_0, Q_1 and Q_2 are the normalized angle quaternions corresponding to joint point 0, joint point 1 and joint point 2, respectively.
The second aspect of the embodiment of the invention discloses a gesture recognition model generating device, which comprises:
the acquisition module is used for acquiring the first training set and the second training set; the first training set comprises a plurality of first gesture image samples;
the first determining module is used for processing the initial gesture recognition model based on the first training set and determining a standby gesture recognition model; the initial gesture recognition model comprises an encoding unit and a decoding unit;
the second determining module is used for processing the initial gesture recognition model based on the second training set and the standby gesture recognition model to determine a gesture recognition model; the gesture recognition model is used for detecting and recognizing gestures in the gesture interaction system.
In a third aspect, the present invention discloses another gesture recognition model generating device, which includes:
a memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform some or all of the steps in the gesture recognition model generation method disclosed in the first aspect of the embodiment of the present invention.
A fourth aspect of the present invention discloses a computer readable storage medium storing computer instructions for executing some or all of the steps in the gesture recognition model generation method disclosed in the first aspect of the embodiments of the present invention when the computer instructions are called.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, a first training set and a second training set are acquired, the first training set comprising a plurality of first gesture image samples; an initial gesture recognition model is processed based on the first training set to determine a standby gesture recognition model, the initial gesture recognition model comprising an encoding unit and a decoding unit; and the initial gesture recognition model is processed based on the second training set and the standby gesture recognition model to determine a gesture recognition model, the gesture recognition model being used for detecting and recognizing gestures in a gesture interaction system. This thus facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a gesture recognition model generation method disclosed in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a gesture recognition model generating device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another gesture recognition model generating apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a hand gesture key point according to an embodiment of the present invention.
Detailed Description
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms first, second and the like in the description and in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or device that comprises a list of steps or elements is not limited to the list of steps or elements but may, in the alternative, include other steps or elements not expressly listed or inherent to such process, method, article, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The invention discloses a gesture recognition model generation method and device, which facilitate training the model with multi-modal gesture information, reduce the amount of data required for model training, improve the efficiency and accuracy of model generation, and further improve the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios. These are described in detail below.
Example 1
Referring to fig. 1, fig. 1 is a flowchart of a gesture recognition model generating method according to an embodiment of the present invention. The gesture recognition model generation method described in fig. 1 is applied to a gesture interaction system, such as a local server or a cloud server for gesture recognition model generation management; this is not limited in the embodiment of the present invention. As shown in fig. 1, the gesture recognition model generation method may include the following operations:
101. a first training set and a second training set are obtained.
In the embodiment of the invention, the first training set comprises a plurality of first gesture image samples.
102. And processing the initial gesture recognition model based on the first training set to determine a standby gesture recognition model.
In an embodiment of the present invention, the initial gesture recognition model includes an encoding unit and a decoding unit.
103. And processing the initial gesture recognition model based on the second training set and the standby gesture recognition model to determine the gesture recognition model.
In the embodiment of the invention, the gesture recognition model is used for detecting and recognizing gestures in the gesture interaction system.
It should be noted that most current gesture recognition methods are built on single-modality image data, so their stability and robustness in occlusion scenes cannot be guaranteed. Although existing multi-modal methods can alleviate this problem, they extract features in a single way, using only a neural network to extract and fuse features without designing for the characteristics of each modality, which limits their data utilization and performance. In the present application, fully supervised training is performed on a small amount of labeled data; to exploit a large amount of unlabeled data, the hand joint bending angles derived from the acceleration and angular velocity signals acquired by the inertial sensor are used as weak supervision information for further training, thereby achieving better performance.
In this alternative embodiment, as an alternative implementation manner, the first gesture image sample is obtained based on the following steps:
acquiring a hand image;
extracting features of the hand image by using a feature extractor to obtain feature information;
carrying out target detection of different sizes on the characteristic information by using a prediction frame generator to obtain hand prediction frame information;
performing overlapping degree calculation and sequencing treatment on the hand prediction frame information to obtain a target hand prediction frame;
and cutting the image corresponding to the target hand prediction frame from the hand image to obtain a first gesture image sample.
It should be noted that the feature extractor is constructed from CBL components, namely a convolution layer (Conv), a batch normalization layer (BN) and a nonlinear activation layer (Leaky ReLU). Furthermore, the CBL-based feature extractor combines shallow features with deep features, so that accurate detection of targets of different sizes can be achieved through the multi-head prediction frame generator.
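For illustration, a minimal CBL block might look like the following PyTorch sketch (the kernel size, stride and negative slope are illustrative assumptions, not values stated in the patent):
```python
import torch.nn as nn

class CBL(nn.Module):
    """Conv + BatchNorm + Leaky ReLU, the basic unit of the feature extractor."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # padding keeps the spatial size when stride == 1
            nn.Conv2d(in_channels, out_channels, kernel_size,
                      stride=stride, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```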
It should be noted that the prediction frame generator is constructed based on a multi-head attention mechanism.
The above-mentioned overlap calculation and sorting of the hand prediction frame information to obtain the target hand prediction frame uses non-maximum suppression (NMS) to evaluate the outputs of the different prediction heads and select the final prediction result. The NMS mechanism sorts all bounding boxes predicted by the network by score, selects the first bounding box as the target frame, and computes the overlap between the target frame and all remaining predicted frames; if the overlap is greater than a preset threshold, the predicted frame is considered to cover the same object as the target frame and is deleted, otherwise it is kept. The operation is then repeated on the remaining prediction frames until all prediction frames have been screened.
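A minimal sketch of this NMS procedure is shown below (using intersection-over-union as the overlap measure and a 0.5 threshold, both common assumptions rather than values stated in the patent):
```python
import numpy as np

def iou(box, boxes):
    """Overlap (intersection over union) between one box and many; boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    """Sort by score, keep the best box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= threshold]  # keep only frames treated as different objects
    return keep
```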
Therefore, the gesture recognition model generation method described in the embodiment of the present invention facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In an alternative embodiment, the processing the initial gesture recognition model based on the first training set to determine the standby gesture recognition model includes:
randomly determining a target first gesture image sample from all first gesture image samples in the first training set;
randomly generating a mask matrix block; the pixel value size of the mask matrix block is 255;
adding the mask matrix block to the target first gesture image sample to obtain a mask first gesture image sample; the mask matrix block covers at least 75% of the image area of the target first gesture image sample;
inputting the mask first gesture image sample into an initial gesture recognition model for training to obtain a first training gesture recognition model and output image result information;
calculating the output image result information and the target first gesture image sample by using a first loss function to obtain a first loss function value;
wherein the first loss function is
L_1 = ||y_12 − ŷ_12||²
wherein L_1 is the first loss function value; y_12 is the image data information corresponding to the target first gesture image sample; ŷ_12 is the image data information corresponding to the output image result information;
judging whether the first loss function value is smaller than or equal to a first loss threshold value or not to obtain a first loss judgment result;
when the first loss judgment result is negative, replacing and updating the initial gesture recognition model by using the first training gesture recognition model, and triggering and executing to randomly determine a target first gesture image sample from all first gesture image samples in the first training set;
and when the first loss judgment result is yes, determining the first training gesture recognition model as a standby gesture recognition model.
It should be noted that the first loss threshold may be a fixed value or may be set according to actual needs; this is not limited in the embodiment of the present invention.
It should be noted that when the first training set is used to train the initial gesture recognition model, a random mask is added to the target first gesture image sample, which is then fed into the self-encoder for reconstruction; that is, the encoder first extracts features and the decoder then reconstructs the masked part, which helps improve the robustness of the model in occlusion scenes.
Further, the self-encoder may be a variational auto-encoder (VAE) or a U-Net network model, which is not limited in the embodiment of the present invention. The decoder may be a model based on a deep neural network or a model based on a self-attention mechanism, which is likewise not limited.
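For illustration, one pre-training step might be sketched as follows (the single rectangular mask, the optimizer handling and the squared-error form of the first loss are assumptions of the sketch):
```python
import math
import torch
import torch.nn.functional as F

def random_mask(image, coverage=0.75, value=255.0):
    """Cover at least `coverage` of the image area with a single block of pixel value 255."""
    masked = image.clone()
    _, h, w = image.shape  # (C, H, W)
    mh = math.ceil(h * math.sqrt(coverage))
    mw = math.ceil(w * math.sqrt(coverage))
    top = torch.randint(0, h - mh + 1, (1,)).item()
    left = torch.randint(0, w - mw + 1, (1,)).item()
    masked[:, top:top + mh, left:left + mw] = value
    return masked

def pretrain_step(encoder, decoder, optimizer, sample):
    """One pre-training step: encode the masked sample, decode, compare to the original."""
    masked = random_mask(sample)
    output = decoder(encoder(masked.unsqueeze(0)))
    loss = F.mse_loss(output, sample.unsqueeze(0))  # squared-error form of L_1 (assumed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```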
Therefore, the gesture recognition model generation method described in the embodiment of the present invention facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In another alternative embodiment, processing the initial gesture recognition model based on the second training set and the standby gesture recognition model, determining the gesture recognition model includes:
performing parameter migration and updating on the coding unit of the initial gesture recognition model by using the coding unit in the standby gesture recognition model to obtain a reconstructed gesture recognition model;
and performing semi-supervised training on the reconstructed gesture recognition model by using the second training set to obtain the gesture recognition model.
It should be noted that the above transfer of the model parameters of the coding unit in the standby gesture recognition model to the coding unit of the initial gesture recognition model transfers the prior knowledge of the hand structure learned in the pre-training stage to the semi-supervised learning model by parameter loading, which accelerates the model training process and improves the performance of the model.
It should be noted that the second training set used for semi-supervised training of the reconstructed gesture recognition model contains two kinds of supervision information: for the small amount of labeled data, the supervision information consists of the hand joint point coordinates obtained from the images and the joint bending angle information obtained from the inertial sensor data; for the large amount of unlabeled data, the supervision information is only the joint bending angle information from the inertial sensor data. Based on this semi-supervised learning mode, a good training effect can be achieved with only a small amount of labeled data.
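A minimal sketch of the parameter migration step, assuming PyTorch modules with matching encoder architectures (the attribute name `encoder` is an assumption of the sketch):
```python
def migrate_encoder(standby_model, initial_model):
    """Load the pre-trained encoder weights into the initial model's encoder
    to obtain the reconstructed gesture recognition model."""
    initial_model.encoder.load_state_dict(standby_model.encoder.state_dict())
    return initial_model  # now the reconstructed gesture recognition model
```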
Therefore, the gesture recognition model generation method described in the embodiment of the present invention facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In yet another alternative embodiment, training the reconstructed gesture recognition model using the second training set to obtain the gesture recognition model includes:
randomly determining a target second gesture image sample from all second gesture image samples in the second training set;
inputting a target second gesture image sample into the reconstructed gesture recognition model for training to obtain a second training gesture recognition model and image joint point information;
performing loss function calculation on the image joint point information to obtain a second loss function value;
judging whether the second loss function value is smaller than or equal to a second loss threshold value or not, and obtaining a second loss judgment result;
when the second loss judgment result is negative, replacing and updating the reconstructed gesture recognition model by using the second training gesture recognition model, and triggering and executing to randomly determine a target second gesture image sample from all second gesture image samples in the second training set;
and when the second loss judgment result is yes, determining that the second training gesture recognition model is a gesture recognition model.
It should be noted that the image joint point information is a two-dimensional array of size (21, 3): 21 is the number of key points, namely one key point at the wrist joint and four key points on each finger, and 3 represents the three-dimensional coordinates of each key point.
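As a concrete illustration of this data structure (the per-finger index ordering is an assumption of the sketch):
```python
import numpy as np

# 21 hand key points, each with three-dimensional coordinates:
# index 0 is the wrist joint; the remaining 20 are four key points per finger.
image_joint_points = np.zeros((21, 3), dtype=np.float32)
wrist = image_joint_points[0]         # (x, y, z) of the wrist key point
one_finger = image_joint_points[1:5]  # four key points of one finger (assumed ordering)
```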
It should be noted that training the reconstructed gesture recognition model with the second training set performs semi-supervised learning using the key point coordinates obtained from the images and the joint bending angles obtained from the inertial sensor data.
It should be noted that, the second loss threshold may be fixed, or may be set according to actual needs, which is not limited in the embodiment of the present invention.
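The stopping logic shared by both training stages might be sketched schematically as follows (the function signatures are illustrative assumptions):
```python
def train_until_threshold(model, train_step, sample_fn, threshold):
    """Repeat: draw a random sample, take a training step, and stop once the
    loss value is less than or equal to the loss threshold."""
    while True:
        loss = train_step(model, sample_fn())
        if loss <= threshold:
            return model
```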
Therefore, the gesture recognition model generation method described in the embodiment of the present invention facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In yet another alternative embodiment, as shown in fig. 4, the performing loss function calculation on the image joint point information to obtain a second loss function value includes:
calculating an included angle of the image joint point information by using a bending angle calculation model to obtain joint bending angle information;
wherein the bending angle calculation model is
θ = arccos( (x_1x_2 + y_1y_2 + z_1z_2) / ( √(x_1² + y_1² + z_1²) × √(x_2² + y_2² + z_2²) ) )
wherein θ is the joint bending angle in the joint bending angle information; l_0→1 is the vector from joint point 0 to joint point 1; l_1→2 is the vector from joint point 1 to joint point 2; (x_1, y_1, z_1) are the coordinates of l_0→1; (x_2, y_2, z_2) are the coordinates of l_1→2;
judging whether the target second gesture image sample contains labeled joint point coordinate information, to obtain a labeling information judgment result;
when the labeling information judgment result is negative, calculating the joint bending angle information and the sample joint bending angle information in the target second gesture image sample by using a second loss function to obtain the second loss function value;
wherein the second loss function is
L_20 = ||x_21 − x̂_21||²
wherein L_20 is the second loss function value; x_21 is the sample joint bending angle information in the target second gesture image sample; x̂_21 is the joint bending angle information;
when the labeling information judgment result is yes, calculating the joint bending angle information, the sample joint bending angle information in the target second gesture image sample, the image joint point information and the labeled joint point coordinate information by using a third loss function to obtain the second loss function value;
wherein the third loss function is
L_21 = ||x_21 − x̂_21||² + ||z_21 − ẑ_21||²
wherein L_21 is the second loss function value; z_21 is the labeled joint point coordinate information; ẑ_21 is the image joint point information.
It should be noted that joint point 0, joint point 1 and joint point 2 may be any key points ordered from bottom to top in fig. 4, such as key point 10, key point 11 and key point 12; this is not limited in the embodiment of the present invention.
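For illustration, the included-angle computation and the loss selection above might be sketched as follows (the squared-error forms follow the reconstructed loss functions, which are assumptions; function names are illustrative):
```python
import numpy as np

def joint_bend_angle(p0, p1, p2):
    """Included angle between the vectors l_{0->1} and l_{1->2}, in degrees."""
    l01 = np.asarray(p1, dtype=float) - np.asarray(p0, dtype=float)
    l12 = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    cos_theta = np.dot(l01, l12) / (np.linalg.norm(l01) * np.linalg.norm(l12))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def second_loss_value(pred_angles, sample_angles, pred_joints=None, labeled_joints=None):
    """Second loss function value: the angle term alone for unlabeled samples (L_20),
    plus a joint-coordinate term when labeled coordinates exist (L_21)."""
    loss = np.sum((np.asarray(sample_angles) - np.asarray(pred_angles)) ** 2)
    if labeled_joints is not None:
        loss += np.sum((np.asarray(labeled_joints) - np.asarray(pred_joints)) ** 2)
    return loss
```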
Therefore, the gesture recognition model generation method described in the embodiment of the present invention facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In an alternative embodiment, the sample joint bending angle information is obtained based on the following steps:
collecting acceleration information and angular velocity information;
calculating the acceleration information and the angular velocity information to obtain acceleration attitude angle information and angular velocity attitude angle information;
fusing the acceleration attitude angle information and the angular velocity attitude angle information to obtain target attitude angle information;
and calculating the bending angle of the target attitude angle information to obtain sample joint bending angle information.
The acceleration information and the angular velocity information are acquired by an inertial sensor and a gyroscope.
In this optional embodiment, as an optional implementation manner, the calculating of the acceleration information and the angular velocity information to obtain acceleration attitude angle information and angular velocity attitude angle information includes:
calculating the acceleration information by using an angle conversion model to obtain the acceleration attitude angle information;
wherein the angle conversion model is
r_acc = arctan(a_y / a_z)
p_acc = −arctan( a_x / √(a_y² + a_z²) )
wherein r_acc and p_acc are the acceleration attitude angles around the X axis and the Y axis, respectively; a_x, a_y and a_z are the accelerations along the X axis, the Y axis and the Z axis, respectively;
and integrating the angular velocity information to obtain angular velocity attitude angle information.
It should be noted that the attitude angle around the Z axis is obtained directly from the angular velocity measured by the inertial sensor.
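A sketch of both computations (the tilt formulas follow the reconstructed angle conversion model above, which is an assumption; `dt` is the assumed sampling period):
```python
import numpy as np

def acc_attitude(ax, ay, az):
    """Roll/pitch attitude angles (degrees) from the triaxial acceleration;
    arctan2 is used instead of arctan for quadrant safety."""
    r_acc = np.degrees(np.arctan2(ay, az))                 # rotation around X
    p_acc = np.degrees(-np.arctan2(ax, np.hypot(ay, az)))  # rotation around Y
    return r_acc, p_acc

def gyro_attitude(prev_angle, angular_velocity, dt):
    """Angular-velocity attitude angle by integrating the gyroscope reading."""
    return prev_angle + angular_velocity * dt
```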
In this optional embodiment, as an optional implementation manner, the fusing acceleration attitude angle information and angular velocity attitude angle information to obtain target attitude angle information includes:
constructing state quantity information and angular velocity prediction angle information based on the angular velocity attitude angle information;
wherein the state quantity information State is
State(k) = [Angle(k), Gyro_bias(k)]^T
and the angular velocity prediction angle information is
[Angle(k+1), Gyro_bias(k+1)]^T = [[1, −dt], [0, 1]] · [Angle(k), Gyro_bias(k)]^T + [dt, 0]^T · Gyro(k) + Q(k)
wherein Q(k) is the input noise matrix at time k; Angle(k) and Angle(k+1) are the angular velocity attitude angles in the angular velocity attitude angle information at time k and time k+1; Gyro_bias(k) and Gyro_bias(k+1) are the biases at time k and time k+1, respectively; Gyro(k) is the control input at time k; dt is the sampling period;
constructing acceleration prediction angle information based on the acceleration attitude angle information;
wherein the acceleration prediction angle information Angle_acc(k) is
Angle_acc(k) = Angle(k) + R(k)
wherein Angle_acc(k) is the acceleration attitude angle in the acceleration attitude angle information at time k; R(k) is the observation noise matrix at time k;
and carrying out parameter adjustment and state prediction updating iteration on the state quantity information, the angular velocity prediction angle information and the acceleration prediction angle information to obtain target attitude angle information.
It should be noted that, the above-mentioned parameter adjustment and state prediction update iteration for the state quantity information, the angular velocity prediction angle information and the acceleration prediction angle information are implemented based on a kalman filter algorithm.
It should be noted that the input noise matrix, the observation noise matrix and the bias may be set according to actual needs or may be preset default values; this is not limited in the embodiment of the present invention. Further, the control input is obtained directly from the actual model, as is known in the art.
It should be noted that computing the attitude angle from acceleration is only suitable for static processes: the accelerometer is easily affected by vibration, so the acceleration data contain high-frequency noise and the attitude angle computed from the triaxial acceleration is also quite noisy. The angular velocity data measured by the gyroscope have little noise, but the gyroscope suffers from zero drift, so errors accumulate when the attitude angle is obtained by integration. Fusing the acceleration attitude angle information and the angular velocity attitude angle information combines the attitude angles computed by the two methods, yielding angle data with neither accumulated error nor significant noise.
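For illustration, a minimal one-dimensional Kalman-style fusion step might look as follows (a sketch under the reconstructed state model above; the scalar covariance and the noise variances q and r are assumed simplifications and tuning parameters):
```python
def kalman_fuse(angle, bias, p, gyro_rate, angle_acc, dt, q=0.001, r=0.1):
    """One predict/update iteration fusing gyro integration with the accelerometer angle.
    Simplified to a scalar covariance p (and a fixed bias) for readability."""
    # predict: integrate the bias-corrected angular velocity
    angle = angle + (gyro_rate - bias) * dt
    p = p + q
    # update: correct the prediction with the acceleration attitude angle
    k = p / (p + r)  # Kalman gain
    angle = angle + k * (angle_acc - angle)
    p = (1 - k) * p
    return angle, bias, p
```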
Therefore, the gesture recognition model generation method described in the embodiment of the present invention facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In another optional embodiment, the calculating the bending angle of the target attitude angle information to obtain sample joint bending angle information includes:
converting the target attitude angle information by using an angle conversion model to obtain angle quaternion information;
wherein the angle conversion model is
Q = [ cos(r/2)cos(p/2)cos(yaw/2) + sin(r/2)sin(p/2)sin(yaw/2),
      sin(r/2)cos(p/2)cos(yaw/2) − cos(r/2)sin(p/2)sin(yaw/2),
      cos(r/2)sin(p/2)cos(yaw/2) + sin(r/2)cos(p/2)sin(yaw/2),
      cos(r/2)cos(p/2)sin(yaw/2) − sin(r/2)sin(p/2)cos(yaw/2) ]
wherein Q is the angle quaternion in the angle quaternion information; r, p and yaw are the rotation angles around the X axis, the Y axis and the Z axis in the target attitude angle information, respectively;
normalizing the angle quaternion information to obtain normalized angle quaternion information;
calculating the normalized angle quaternion information by using a bending angle model to obtain the sample joint bending angle information;
wherein the bending angle model is
θ_0_1_2 = 180° + 2arccos(Q_0 · Q_1) × 180°/π − 2arccos(Q_1 · Q_2) × 180°/π
wherein θ_0_1_2 is the joint bending angle of joint point 1 relative to joint point 0 and joint point 2 in the sample joint bending angle information; Q_0, Q_1 and Q_2 are the normalized angle quaternions corresponding to joint point 0, joint point 1 and joint point 2, respectively.
It should be noted that the angle quaternion may also be obtained by computing the cosines and sines of the rotation angles around the X axis and the Z axis; this is not limited in the embodiment of the present invention.
It should be noted that each element of the angle quaternion is a real number.
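A sketch of the quaternion-based bending-angle computation (the Euler-to-quaternion conversion follows the reconstructed angle conversion model above, which is an assumption; function names are illustrative):
```python
import numpy as np

def euler_to_quaternion(r, p, yaw):
    """Attitude angles (radians, rotations around X, Y, Z) to a normalized quaternion."""
    cr, sr = np.cos(r / 2), np.sin(r / 2)
    cp, sp = np.cos(p / 2), np.sin(p / 2)
    cy, sy = np.cos(yaw / 2), np.sin(yaw / 2)
    q = np.array([
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    ])
    return q / np.linalg.norm(q)  # normalized angle quaternion

def sample_bend_angle(q0, q1, q2):
    """theta_0_1_2 = 180 + 2*arccos(Q0.Q1)*180/pi - 2*arccos(Q1.Q2)*180/pi, in degrees."""
    a01 = np.degrees(np.arccos(np.clip(np.dot(q0, q1), -1.0, 1.0)))
    a12 = np.degrees(np.arccos(np.clip(np.dot(q1, q2), -1.0, 1.0)))
    return 180.0 + 2.0 * a01 - 2.0 * a12
```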
Therefore, the gesture recognition model generation method described in the embodiment of the present invention facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of a gesture recognition model generating device according to an embodiment of the present invention. The device described in fig. 2 can be applied to a gesture interaction system, such as a local server or a cloud server for gesture recognition model generation management; this is not limited in the embodiment of the present invention. As shown in fig. 2, the apparatus may include:
An acquisition module 201, configured to acquire a first training set and a second training set; the first training set comprises a plurality of first gesture image samples;
a first determining module 202, configured to process the initial gesture recognition model based on the first training set, and determine a standby gesture recognition model; the initial gesture recognition model comprises an encoding unit and a decoding unit;
a second determining module 203, configured to process the initial gesture recognition model based on the second training set and the standby gesture recognition model, and determine a gesture recognition model; the gesture recognition model is used for detecting and recognizing gestures in the gesture interaction system.
Therefore, implementing the gesture recognition model generating device described in fig. 2 facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In another alternative embodiment, as shown in fig. 2, the first determining module 202 processes the initial gesture recognition model based on the first training set, and determines the standby gesture recognition model, including:
Randomly determining a target first gesture image sample from all first gesture image samples in the first training set;
randomly generating a mask matrix block; the pixel value size of the mask matrix block is 255;
adding the mask matrix block to the target first gesture image sample to obtain a mask first gesture image sample; the mask matrix block covers at least 75% of the image area of the target first gesture image sample;
inputting the mask first gesture image sample into an initial gesture recognition model for training to obtain a first training gesture recognition model and output image result information;
calculating the output image result information and the target first gesture image sample by using a first loss function to obtain a first loss function value;
wherein the first loss function is
L_1 = ||y_12 − ŷ_12||²
wherein L_1 is the first loss function value; y_12 is the image data information corresponding to the target first gesture image sample; ŷ_12 is the image data information corresponding to the output image result information;
judging whether the first loss function value is smaller than or equal to a first loss threshold value or not to obtain a first loss judgment result;
when the first loss judgment result is negative, replacing and updating the initial gesture recognition model by using the first training gesture recognition model, and triggering and executing to randomly determine a target first gesture image sample from all first gesture image samples in the first training set;
and when the first loss judgment result is yes, determining the first training gesture recognition model as a standby gesture recognition model.
Therefore, implementing the gesture recognition model generating device described in fig. 2 facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In yet another alternative embodiment, as shown in fig. 2, the second determining module 203 processes the initial gesture recognition model based on the second training set and the standby gesture recognition model, and determines the gesture recognition model, including:
performing parameter migration and updating on the coding unit of the initial gesture recognition model by using the coding unit in the standby gesture recognition model to obtain a reconstructed gesture recognition model;
and performing semi-supervised training on the reconstructed gesture recognition model by using the second training set to obtain the gesture recognition model.
Therefore, implementing the gesture recognition model generating device described in fig. 2 facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In yet another alternative embodiment, as shown in fig. 2, the second determining module 203 trains the reconstructed gesture recognition model by using the second training set to obtain the gesture recognition model, including:
randomly determining a target second gesture image sample from all second gesture image samples in the second training set;
inputting a target second gesture image sample into the reconstructed gesture recognition model for training to obtain a second training gesture recognition model and image joint point information;
performing loss function calculation on the image joint point information to obtain a second loss function value;
judging whether the second loss function value is smaller than or equal to a second loss threshold value or not, and obtaining a second loss judgment result;
when the second loss judgment result is negative, replacing and updating the reconstructed gesture recognition model by using the second training gesture recognition model, and triggering and executing to randomly determine a target second gesture image sample from all second gesture image samples in the second training set;
and when the second loss judgment result is yes, determining that the second training gesture recognition model is a gesture recognition model.
Therefore, implementing the gesture recognition model generating device described in fig. 2 facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In yet another alternative embodiment, as shown in fig. 2, the second determining module 203 performs loss function calculation on the image joint point information to obtain a second loss function value, including:
calculating an included angle of the image joint point information by using a bending angle calculation model to obtain joint bending angle information;
wherein the bending angle calculation model is
θ = arccos( (x_1x_2 + y_1y_2 + z_1z_2) / ( √(x_1² + y_1² + z_1²) × √(x_2² + y_2² + z_2²) ) )
wherein θ is the joint bending angle in the joint bending angle information; l_0→1 is the vector from joint point 0 to joint point 1; l_1→2 is the vector from joint point 1 to joint point 2; (x_1, y_1, z_1) are the coordinates of l_0→1; (x_2, y_2, z_2) are the coordinates of l_1→2;
judging whether the target second gesture image sample contains labeled joint point coordinate information, to obtain a labeling information judgment result;
when the labeling information judgment result is negative, calculating the joint bending angle information and the sample joint bending angle information in the target second gesture image sample by using a second loss function to obtain the second loss function value;
wherein the second loss function is
L_20 = ||x_21 − x̂_21||²
wherein L_20 is the second loss function value; x_21 is the sample joint bending angle information in the target second gesture image sample; x̂_21 is the joint bending angle information;
when the labeling information judgment result is yes, calculating the joint bending angle information, the sample joint bending angle information in the target second gesture image sample, the image joint point information and the labeled joint point coordinate information by using a third loss function to obtain the second loss function value;
wherein the third loss function is
L_21 = ||x_21 − x̂_21||² + ||z_21 − ẑ_21||²
wherein L_21 is the second loss function value; z_21 is the labeled joint point coordinate information; ẑ_21 is the image joint point information.
Therefore, implementing the gesture recognition model generating device described in fig. 2 facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In yet another alternative embodiment, as shown in fig. 2, the sample joint bending angle information is obtained based on the acquisition module 201 performing the following steps:
collecting acceleration information and angular velocity information;
calculating the acceleration information and the angular velocity information to obtain acceleration attitude angle information and angular velocity attitude angle information;
fusing the acceleration attitude angle information and the angular velocity attitude angle information to obtain target attitude angle information;
and calculating the bending angle of the target attitude angle information to obtain sample joint bending angle information.
Therefore, implementing the gesture recognition model generating device described in fig. 2 facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
In yet another alternative embodiment, as shown in fig. 2, the obtaining module 201 performs bending angle calculation on the target attitude angle information to obtain sample joint bending angle information, including:
converting the target attitude angle information by using an angle conversion model to obtain angle quaternary vector information;
wherein the angle conversion model is as follows
In the formula, isThe angle quaternary vector is an angle quaternary vector in the angle quaternary vector information; r, p, and yaw are rotations about the X-axis, Y-axis, and Z-axis, respectively, in the target attitude angle informationAn angle;
normalizing the angle quaternion information to obtain normalized angle quaternion information;
calculating the normalized angle quaternion information by using a bending angle model to obtain sample joint bending angle information (see the sketch after the formula);
wherein the bending angle model is

$\theta_{0\_1\_2} = 180° + 2\arccos(Q_0 \cdot Q_1) \times 180°/\pi - 2\arccos(Q_1 \cdot Q_2) \times 180°/\pi$;

where $\theta_{0\_1\_2}$ is the joint bending angle, in the sample joint bending angle information, of joint point 1 relative to joint points 0 and 2; and $Q_0$, $Q_1$, and $Q_2$ are the normalized angle quaternions corresponding to joint point 0, joint point 1, and joint point 2, respectively (the factor $180°/\pi$ converts the arccos terms from radians to degrees).
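A minimal sketch of this pipeline, assuming the standard roll-pitch-yaw (ZYX) Euler-to-quaternion conversion for the angle conversion model (the patent's conversion formula image is not reproduced) with input angles in radians; all names are illustrative. Normalizing before the dot products keeps the arccos arguments within its domain.

```python
import numpy as np

def euler_to_quaternion(r, p, yaw):
    # Standard ZYX Euler-to-quaternion conversion (w, x, y, z); an assumed form,
    # since the patent's angle conversion formula is not reproduced here.
    cr, sr = np.cos(r / 2), np.sin(r / 2)
    cp, sp = np.cos(p / 2), np.sin(p / 2)
    cy, sy = np.cos(yaw / 2), np.sin(yaw / 2)
    return np.array([
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    ])

def bending_angle(q0, q1, q2):
    # Bending angle model applied to normalized quaternions:
    # theta = 180 + 2*arccos(Q0.Q1)*180/pi - 2*arccos(Q1.Q2)*180/pi.
    q0, q1, q2 = (np.asarray(q, float) / np.linalg.norm(q) for q in (q0, q1, q2))
    a01 = 2 * np.degrees(np.arccos(np.clip(np.dot(q0, q1), -1.0, 1.0)))
    a12 = 2 * np.degrees(np.arccos(np.clip(np.dot(q1, q2), -1.0, 1.0)))
    return 180.0 + a01 - a12
```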
Therefore, implementing the gesture recognition model generation apparatus described in fig. 2 facilitates training the model with multi-modal gesture information, reduces the amount of data required for model training, improves the efficiency and accuracy of model generation, and further improves the optimization capability of the gesture recognition model and its ability to adapt to different usage scenarios.
Example III
Referring to fig. 3, fig. 3 is a schematic structural diagram of another gesture recognition model generation apparatus according to an embodiment of the present invention. The apparatus described in fig. 3 can be applied to a gesture interaction system, such as a local server or a cloud server for gesture recognition model generation management, which is not limited in this embodiment of the invention. As shown in fig. 3, the apparatus may include:
a memory 301 storing executable program code;
a processor 302 coupled with the memory 301;
the processor 302 invokes executable program code stored in the memory 301 for performing the steps in the gesture recognition model generation method described in embodiment one.
Example IV
The embodiment of the invention discloses a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the steps in the gesture recognition model generation method described in the embodiment one.
Example five
The embodiment of the invention discloses a computer program product, which comprises a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute the steps in the gesture recognition model generation method described in embodiment one.
The apparatus embodiments described above are merely illustrative: the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or, of course, by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied, essentially or in part, in the form of a software product stored in a computer-readable storage medium, including a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disc memory, tape memory, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the gesture recognition model generation method and apparatus disclosed above are preferred embodiments of the invention, used only to illustrate its technical solutions rather than to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the various embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A method for generating a gesture recognition model, the method comprising:
acquiring a first training set and a second training set; the first training set comprises a plurality of first gesture image samples;
processing the initial gesture recognition model based on the first training set, and determining a standby gesture recognition model; the initial gesture recognition model comprises an encoding unit and a decoding unit;
processing the initial gesture recognition model based on the second training set and the standby gesture recognition model to determine a gesture recognition model; the gesture recognition model is used for detecting and recognizing gestures in the gesture interaction system.
2. The method of claim 1, wherein the processing the initial gesture recognition model based on the first training set to determine the standby gesture recognition model comprises:
randomly determining a target first gesture image sample from all the first gesture image samples in the first training set;
randomly generating a mask matrix block; the pixel value of the mask matrix block is 255;
adding the mask matrix block to the target first gesture image sample to obtain a mask first gesture image sample; the mask matrix block covers at least 75% of the image area of the target first gesture image sample;
inputting the mask first gesture image sample into an initial gesture recognition model for training to obtain a first training gesture recognition model and output image result information;
calculating the output image result information and the target first gesture image sample by using a first loss function to obtain a first loss function value;
wherein, in the first loss function, $L_1$ is the first loss function value; $y_{12}$ is the image data information corresponding to the target first gesture image sample; and $\hat{y}_{12}$ is the image data information corresponding to the output image result information;
judging whether the first loss function value is smaller than or equal to a first loss threshold value or not, and obtaining a first loss judgment result;
when the first loss judgment result is negative, replacing and updating the initial gesture recognition model by using the first training gesture recognition model, and triggering and executing the random determination of a target first gesture image sample from all the first gesture image samples in the first training set;
and when the first loss judgment result is yes, determining the first training gesture recognition model as a standby gesture recognition model.
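An illustrative sketch of the masking step and first-loss computation in claim 2, assuming NumPy image arrays, a single square mask block, and a mean-squared-error form for the first loss function (whose formula image is not reproduced in this text); all names are hypothetical.

```python
import numpy as np

def mask_first_gesture_sample(image, coverage=0.75, fill=255):
    # Randomly place one mask matrix block with pixel value 255 covering at
    # least 75% of the image area (one square block is an assumption).
    masked = image.copy()
    h, w = image.shape[:2]
    mh = int(np.ceil(h * np.sqrt(coverage)))
    mw = int(np.ceil(w * np.sqrt(coverage)))
    top = np.random.randint(0, h - mh + 1)
    left = np.random.randint(0, w - mw + 1)
    masked[top:top + mh, left:left + mw] = fill
    return masked

def first_loss_value(output_image, target_image):
    # First loss between the reconstructed output and the unmasked target;
    # mean squared error is assumed.
    diff = output_image.astype(float) - target_image.astype(float)
    return float(np.mean(diff ** 2))
```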
3. The method of generating a gesture recognition model according to claim 1, wherein the processing the initial gesture recognition model based on the second training set and the standby gesture recognition model to determine a gesture recognition model comprises:
performing parameter migration and updating on the coding unit of the initial gesture recognition model by using the coding unit in the standby gesture recognition model to obtain a reconstructed gesture recognition model;
and performing semi-supervised training on the reconstructed gesture recognition model by using the second training set to obtain a gesture recognition model.
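A sketch of the parameter migration in claim 3, under the assumptions that the models are PyTorch modules and that the encoding unit is exposed as an `.encoder` attribute (both assumptions; the patent does not name a framework):

```python
import copy
import torch.nn as nn

def migrate_encoder(initial_model: nn.Module, standby_model: nn.Module) -> nn.Module:
    # Copy the standby model's encoding-unit parameters into the initial
    # model's encoding unit, yielding the reconstructed gesture recognition model.
    reconstructed = copy.deepcopy(initial_model)
    reconstructed.encoder.load_state_dict(standby_model.encoder.state_dict())
    return reconstructed
```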
4. The method of generating a gesture recognition model according to claim 3, wherein the performing semi-supervised training on the reconstructed gesture recognition model by using the second training set to obtain a gesture recognition model comprises:
randomly determining a target second gesture image sample from all second gesture image samples in the second training set;
inputting the target second gesture image sample into a reconstructed gesture recognition model for training to obtain a second training gesture recognition model and image joint point information;
performing a loss function calculation on the image joint point information to obtain a second loss function value;
judging whether the second loss function value is smaller than or equal to a second loss threshold value or not, and obtaining a second loss judgment result;
when the second loss judgment result is negative, replacing and updating the reconstructed gesture recognition model by using the second training gesture recognition model, and triggering and executing the random determination of a target second gesture image sample from all second gesture image samples in the second training set;
and when the second loss judgment result is yes, determining that the second training gesture recognition model is a gesture recognition model.
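The iterate-until-threshold control flow shared by claims 2 and 4, as a sketch; `train_step(model, sample)` is a hypothetical callable returning the updated model and that sample's loss value.

```python
import random

def fit_until_threshold(model, training_set, train_step, loss_threshold):
    # Repeat: draw a random target sample, train on it, and stop once the
    # loss value is at or below the threshold; otherwise replace the model
    # with the newly trained one and iterate.
    while True:
        sample = random.choice(training_set)
        model, loss_value = train_step(model, sample)
        if loss_value <= loss_threshold:
            return model
```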
5. The method of claim 4, wherein the performing a loss function calculation on the image joint point information to obtain a second loss function value comprises:
calculating an included angle of the image joint point information by using a bending angle calculation model to obtain joint bending angle information;
wherein, in the bending angle calculation model, $\theta$ is the joint bending angle in the joint bending angle information; $l_{0\to 1}$ is the vector from joint point 0 to joint point 1; $l_{1\to 2}$ is the vector from joint point 1 to joint point 2; $(x_1, y_1, z_1)$ are the coordinates of said $l_{0\to 1}$; and $(x_2, y_2, z_2)$ are the coordinates of said $l_{1\to 2}$;
judging whether the target second gesture image sample contains labeled joint point coordinate information, so as to obtain a labeling information judgment result;
when the labeling information judgment result is negative, calculating the joint bending angle information and the sample joint bending angle information in the target second gesture image sample by using a second loss function, so as to obtain the second loss function value;
wherein, in the second loss function, $L_{20}$ is the second loss function value; $x_{21}$ is the sample joint bending angle information in the target second gesture image sample; and $\hat{x}_{21}$ is the joint bending angle information;
when the labeling information judgment result is affirmative, calculating the joint bending angle information, the sample joint bending angle information in the target second gesture image sample, the image joint point information, and the labeled joint point coordinate information by using a third loss function, so as to obtain the second loss function value;
wherein, in the third loss function, $L_{21}$ is the second loss function value; $z_{21}$ is the labeled joint point coordinate information; and $\hat{z}_{21}$ is the image joint point information.
6. The method of claim 5, wherein the sample joint bending angle information is obtained based on:
collecting acceleration information and angular velocity information;
calculating the acceleration information and the angular velocity information to obtain acceleration attitude angle information and angular velocity attitude angle information;
fusing the acceleration attitude angle information and the angular velocity attitude angle information to obtain target attitude angle information;
and calculating the bending angle of the target attitude angle information to obtain the sample joint bending angle information.
7. The method of generating a gesture recognition model according to claim 6, wherein the calculating the bending angle of the target attitude angle information to obtain the sample joint bending angle information includes:
converting the target attitude angle information by using an angle conversion model to obtain angle quaternion information;
wherein, in the angle conversion model, $Q$ is the angle quaternion in the angle quaternion information; and $r$, $p$, and $yaw$ are the rotation angles about the X-axis, Y-axis, and Z-axis, respectively, in the target attitude angle information;
normalizing the angle quaternion information to obtain normalized angle quaternion information;
calculating the normalized angle quaternion information by using a bending angle model to obtain the sample joint bending angle information;
wherein the bending angle model is $\theta_{0\_1\_2} = 180° + 2\arccos(Q_0 \cdot Q_1) \times 180°/\pi - 2\arccos(Q_1 \cdot Q_2) \times 180°/\pi$;
where $\theta_{0\_1\_2}$ is the sample joint bending angle, in the sample joint bending angle information, of joint point 1 relative to joint points 0 and 2; and $Q_0$, $Q_1$, and $Q_2$ are the normalized angle quaternions corresponding to joint point 0, joint point 1, and joint point 2, respectively.
8. A gesture recognition model generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring the first training set and the second training set; the first training set comprises a plurality of first gesture image samples;
the first determining module is used for processing the initial gesture recognition model based on the first training set and determining a standby gesture recognition model; the initial gesture recognition model comprises an encoding unit and a decoding unit;
the second determining module is used for processing the initial gesture recognition model based on the second training set and the standby gesture recognition model to determine a gesture recognition model; the gesture recognition model is used for detecting and recognizing gestures in the gesture interaction system.
9. A gesture recognition model generation apparatus, the apparatus comprising:
a memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform the gesture recognition model generation method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions which, when invoked, are adapted to perform the gesture recognition model generation method of any one of claims 1-7.
CN202311579134.9A 2023-11-23 2023-11-23 Gesture recognition model generation method and device Pending CN117496303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311579134.9A CN117496303A (en) 2023-11-23 2023-11-23 Gesture recognition model generation method and device

Publications (1)

Publication Number Publication Date
CN117496303A true CN117496303A (en) 2024-02-02

Family

ID=89670755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311579134.9A Pending CN117496303A (en) 2023-11-23 2023-11-23 Gesture recognition model generation method and device

Country Status (1)

Country Link
CN (1) CN117496303A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination