CN112464895B - Gesture recognition model training method and device, gesture recognition method and terminal equipment - Google Patents


Info

Publication number
CN112464895B
CN112464895B (application CN202011475627.4A)
Authority
CN
China
Prior art keywords
gesture
recognition model
gesture recognition
training
preset number
Prior art date
Legal status
Active
Application number
CN202011475627.4A
Other languages
Chinese (zh)
Other versions
CN112464895A (en)
Inventor
汤志超
程骏
庞建新
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date
Filing date
Publication date
Application filed by Ubtech Robotics Corp
Priority to CN202011475627.4A
Publication of CN112464895A
Application granted
Publication of CN112464895B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Abstract

The embodiment of the application discloses a gesture recognition model training method and device, a gesture recognition method, and a terminal device. The method comprises the following steps: identifying a preset number of predicted key points in an unlabeled gesture sample graph with a preliminarily trained gesture recognition model, and labeling the gesture sample graph with the preset number of predicted key points; calculating the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set; determining the maximum confidence among these confidences; and, when the maximum confidence is greater than a preset confidence threshold, continuing to train the gesture recognition model with the gesture sample graph marked with the preset number of predicted key points until the loss value of the gesture recognition model is smaller than a preset loss threshold. This technical scheme effectively reduces the time spent manually labeling training samples, reduces the waste of human resources, and shortens the training period of the gesture recognition model.

Description

Gesture recognition model training method and device, gesture recognition method and terminal equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to a gesture recognition model training method and device, a gesture recognition method and terminal equipment.
Background
At present, in the field of machine learning, models are mostly trained in a supervised manner, that is, with a large number of pre-labeled training samples, so that the model learns the target features as thoroughly as possible and ultimately reproduces them. However, pre-labeling training samples consumes considerable manpower and resources. Taking the field of computer vision as an example, an annotator using a professional labeling tool for eight hours a day may label only a few thousand images with bounding-box information. Labeling the key points of training samples for gesture recognition is even harder, and the accuracy of manual labeling is low, which affects the model training effect.
Disclosure of Invention
In view of the above problems, the application provides a gesture recognition model training method and device, a gesture recognition method, and a terminal device.
One embodiment of the application provides a training method for a gesture recognition model, which comprises the following steps:
performing preliminary training on the gesture recognition model by using a gesture sample training set marked with a preset number of marked key points;
identifying a preset number of predicted key points from each unlabeled gesture sample graph by using a gesture recognition model subjected to preliminary training, and labeling corresponding gesture sample graphs by using the preset number of predicted key points;
respectively calculating the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set;
determining the maximum confidence from the confidences corresponding to the gesture sample graph marked with the preset number of predicted key points;
when the maximum confidence is greater than a preset confidence threshold, adding the k-th gesture sample graph marked with the preset number of predicted key points to the gesture sample training set, wherein k is smaller than or equal to K, and K represents the total number of unlabeled gesture sample graphs;
and continuing training the gesture recognition model by using the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold value.
The gesture recognition model training method according to another embodiment of the present application further includes:
and after the preset number of predicted key points are identified, outputting RGB images for displaying the preset number of predicted key points.
The gesture recognition model training method according to still another embodiment of the present application determines the maximum confidence using the following formula:

$$T_{\max} = \max_{m}\; \frac{n}{\sum_{i=1}^{n} E\big(A_m(i),\, B(i)\big)}$$

where T_max represents the maximum confidence, A_m(i) represents the coordinates of the i-th marked key point of the m-th gesture sample in the gesture sample training set, B(i) represents the coordinates of the i-th predicted key point of the gesture sample graph marked with the preset number of predicted key points, n represents the preset number, and E(A_m(i), B(i)) represents the Euclidean distance between the i-th marked key point coordinates and the i-th predicted key point coordinates.
According to the above gesture recognition model training method, the gesture is a hand gesture.
In the above gesture recognition model training method, the preset number is one of 14, 16, and 21.
According to the gesture recognition model training method, the loss value is calculated by the following formula:

$$\mathrm{Loss}(x) = \begin{cases} w\ln\!\left(1 + |x|/\varepsilon\right), & |x| < w \\ |x| - C, & |x| \ge w \end{cases}$$

where Loss(x) represents the loss value, |x| represents the absolute value of the difference between a prediction vector and a target vector, the target vector is composed of the coordinates of the preset number of marked key points, the prediction vector is composed of the coordinates of the preset number of predicted key points output by the gesture recognition model and corresponds to the target vector coordinate by coordinate, w and ε are preset constants, and C = w - w ln(1 + w/ε).
Yet another embodiment of the present application provides a gesture recognition model training apparatus, including:
the model preliminary training module is used for carrying out preliminary training on the gesture recognition model by utilizing a gesture sample training set marked with a preset number of marked key points;
the key point recognition module is used for recognizing a preset number of predicted key points from each unlabeled gesture sample graph by utilizing a gesture recognition model which is subjected to preliminary training, and labeling corresponding gesture sample graphs by utilizing the preset number of predicted key points;
the confidence coefficient calculating module is used for calculating the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set;
the maximum confidence determining module is used for determining the maximum confidence from the confidence corresponding to the kth gesture sample graph marked with the preset number of predicted key points;
the gesture sample training set updating module is used for adding the k-th gesture sample graph marked with the preset number of predicted key points to the gesture sample training set when the maximum confidence is greater than a preset confidence threshold, wherein k is smaller than or equal to K, and K represents the total number of unlabeled gesture sample graphs;
and the model iteration training module is used for continuously training the gesture recognition model by using the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold value.
An embodiment of the application relates to a gesture recognition method, in which a gesture picture is input into a gesture recognition model obtained by the gesture recognition model training method disclosed in the embodiments of the application to perform gesture recognition.
An embodiment of the application relates to a terminal device comprising a memory and a processor, the memory storing a computer program which, when run on the processor, performs the gesture recognition method according to the embodiments of the application.
An embodiment of the application relates to a readable storage medium storing a computer program which, when run on a processor, performs the gesture recognition method according to the embodiments of the application.
The application discloses a gesture recognition model training method comprising the following steps: performing preliminary training on the gesture recognition model with a gesture sample training set marked with a preset number of marked key points; identifying a preset number of predicted key points in each unlabeled gesture sample graph with the preliminarily trained gesture recognition model and labeling the corresponding gesture sample graph with those predicted key points; calculating the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set; determining the maximum confidence among the confidences corresponding to that gesture sample graph; when the maximum confidence is greater than a preset confidence threshold, adding the k-th gesture sample graph marked with the preset number of predicted key points to the gesture sample training set, where k is smaller than or equal to K and K represents the total number of unlabeled gesture sample graphs; and continuing to train the gesture recognition model with the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold. In this technical scheme, the preliminarily trained gesture recognition model labels the unlabeled gesture sample graphs, continuously supplying the gesture sample training set with fresh samples, and the newly marked gesture samples are used to iteratively train the model. This effectively reduces the time spent manually labeling training samples, reduces the waste of human resources, and shortens the training period of the gesture recognition model.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are required for the embodiments will be briefly described, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of the present application. Like elements are numbered alike in the various figures.
FIG. 1 is a schematic flow chart of a training method for a gesture recognition model according to an embodiment of the present application;
FIG. 2 is a flow chart of another method for training a gesture recognition model according to an embodiment of the present application;
FIG. 3 is a flow chart of another hand gesture recognition model training method according to an embodiment of the present application;
FIG. 4 shows a schematic diagram of hand key points according to an embodiment of the present application;
FIG. 5 shows a schematic structural diagram of a gesture recognition model training apparatus according to an embodiment of the present application.
Description of main reference numerals:
1-a gesture recognition model training device; 10-a model preliminary training module; 20-a key point identification module; 30-a confidence calculation module; 40-a maximum confidence determination module; 50-a gesture sample training set updating module; 60-a model iterative training module.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments.
The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
The terms "comprises," "comprising," "including," or any other variation thereof, are intended to cover a specific feature, number, step, operation, element, component, or combination of the foregoing, which may be used in various embodiments of the present application, and are not intended to first exclude the presence of or increase the likelihood of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the application belong. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in connection with the various embodiments of the application.
To address the problems that key point data for hands, bodies, and the like are difficult to annotate and that annotation consumes a great deal of manpower and material resources, the application introduces semi-supervised learning to train the gesture recognition model. Trained in this way, the gesture recognition model can be continuously optimized with a small number of marked gesture samples and a large number of unlabeled gesture sample graphs, which strengthens its generalization capability.
Further, optimizing the gesture recognition model requires continually feeding it a large number of unlabeled gesture sample graphs. The model is first preliminarily trained with a small number of marked gesture samples; the preliminarily trained model can then make an initial pass over each unlabeled gesture sample graph and predict its key point information. After the preset number of predicted key points are identified in an unlabeled gesture sample graph, an RGB image displaying those predicted key points can be output, so that an observer can subjectively judge whether the gesture corresponding to the predicted key points is a gesture to be recognized; if so, the gesture sample graph can be labeled with the predicted key points, retained, and used as a training sample for the gesture recognition model. In addition, the confidence between the predicted key points of the unlabeled gesture sample graph and the preset number of marked key points of each marked gesture sample is calculated, and whether the maximum confidence exceeds a preset threshold is determined; if it does, the gesture sample graph is likewise labeled with the predicted key points, retained, and used as a training sample for the gesture recognition model.
Through this double judgment of subjective observation and objective confidence, gesture sample graphs that meet the standard are labeled and retained while those that do not are discarded. This not only saves the time consumed by manual labeling but also saves material cost.
Example 1
Referring to fig. 1, this embodiment shows a gesture recognition model training method, which includes the following steps:
S10: And performing preliminary training on the gesture recognition model by using a gesture sample training set marked with a preset number of marked key points.
Before training the gesture recognition model, a gesture sample training set needs to be obtained in advance; it comprises a number of gesture samples marked with the preset number of key points that can be used for training. It will be appreciated that when the gesture recognition model is used to recognize body gestures, the samples used for training should be body gesture samples marked with a preset number of body key points; these body key points may be the joints and key parts of the body, such as the eyes, nose, shoulders, wrists, ankles, and knees. When the gesture recognition model is used to recognize hand gestures, the samples used for training should be hand gesture samples marked with a preset number of hand key points; these hand key points may be the key parts and joints of the hand, such as the palm and the five fingers (thumb, index finger, middle finger, ring finger, and little finger).
It will be appreciated that a training gesture sample may be an image acquired with an image acquisition device (e.g., a camera, video camera, scanner, medical device, or lidar), an image stored locally in advance, or an image acquired from a network; embodiments of the application are not limited in this regard. The image may be a single picture or an image frame in a video, which is likewise not limited. For images acquired with an image acquisition device, a professional annotator must mark the key points of each image with a special labeling tool; the same applies to locally stored or network-acquired images that have not yet been labeled. If images stored locally or acquired from a network in advance are already marked with the preset number of marked key points, these marked samples can be used directly for the preliminary training of the gesture recognition model.
S20: and identifying a preset number of predicted key points from each unlabeled gesture sample graph by using a gesture recognition model subjected to preliminary training, and labeling a corresponding gesture sample graph by using the preset number of predicted key points.
It can be understood that when the gesture recognition model performs gesture recognition on a picture, it must first identify the key points of the picture and then determine the corresponding gesture from the key point coordinates. Therefore, after preliminary training on the gesture sample training set marked with the preset number of marked key points, the gesture recognition model has a certain recognition capability: it can identify the preset number of predicted key points in an unlabeled gesture sample graph, and the gesture sample graph can then be labeled with those predicted key points.
S30: and respectively calculating the confidence coefficient between each gesture sample graph marked with a preset number of predicted key points and each marked gesture sample in the gesture sample training set.
It can be understood that the gesture sample graph marked with the preset number of predicted key points needs to be compared with each gesture sample in the gesture sample training set to determine whether it is similar to one of them. If it is similar to some gesture sample in the training set, it corresponds to a gesture to be recognized; it is then effective for training the gesture recognition model and can be retained. If it is not similar to any gesture sample in the training set, it is invalid for training the gesture recognition model and can be discarded.
Further, the confidence level between the gesture sample graph and each marked gesture sample can be determined by performing one-to-one comparison between the preset number of predicted key points of the gesture sample graph and the preset number of marked key points of each marked gesture sample in the gesture sample training set. And judging whether the gesture sample graph is effective for training the gesture recognition model by using the confidence.
For example, if the gesture sample training set contains 20 marked gesture samples, each marked with the preset number of key points, then the preset number of predicted key points of the k-th gesture sample graph can be compared one by one with the preset number of marked key points of each marked gesture sample: the distance between each pair of corresponding key points is computed, and the confidence is derived from the average of the preset number of distances. Obviously, comparing one gesture sample graph with the 20 marked gesture samples yields 20 confidences.
S40: and determining the maximum confidence from the confidence corresponding to the gesture sample graph marked with the preset number of the prediction key points.
S50: and when the maximum confidence coefficient is larger than a preset confidence coefficient threshold value, adding the kth gesture sample graph marked with a preset number of prediction key points to the gesture sample training set, wherein K is smaller than or equal to K, and K represents the total number of unlabeled gesture sample graphs.
Further, continuing to train the gesture recognition model by using the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold value, including the following steps:
S61: And continuing to train the gesture recognition model by using the updated gesture sample training set.
S62: and judging whether the loss value of the gesture recognition model is smaller than a preset loss threshold value.
S63: and if the loss threshold value is smaller than the preset loss threshold value, training the gesture recognition model to reach the standard, and finishing the training.
And if the loss value is greater than or equal to the preset loss threshold value, repeating the steps S61 and S62 until the loss value of the gesture recognition model is smaller than the preset loss threshold value.
The gesture recognition model training method disclosed by this embodiment comprises the following steps: performing preliminary training on the gesture recognition model with a gesture sample training set marked with a preset number of marked key points; identifying a preset number of predicted key points in each unlabeled gesture sample graph with the preliminarily trained gesture recognition model and labeling the corresponding gesture sample graph with those predicted key points; calculating the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set; determining the maximum confidence among the corresponding confidences; when the maximum confidence is greater than a preset confidence threshold, adding the k-th gesture sample graph marked with the preset number of predicted key points to the gesture sample training set, where k is smaller than or equal to K and K represents the total number of unlabeled gesture sample graphs; and continuing to train the gesture recognition model with the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold. In this technical scheme, the preliminarily trained gesture recognition model labels the unlabeled gesture sample graphs, newly marked gesture samples are continuously added to the gesture sample training set, and the model is iteratively trained with them; this effectively reduces the time for manually labeling training samples, reduces the waste of human resources, and shortens the training period of the gesture recognition model.
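For illustration, the following is a minimal Python sketch of the self-training loop of this embodiment. The model interface (fit, predict, loss) and the reciprocal-of-mean-distance form of the confidence are assumptions made for the example, not the patent's prescribed implementation; the confidence reading is the one consistent with the 0.1-threshold / 10-pixel relationship discussed in Embodiment 3.

```python
import numpy as np

def confidence(pred_pts, labeled_pts):
    # Reciprocal of the mean Euclidean distance between corresponding
    # key points (assumed form, consistent with threshold 0.1 <-> 10 px).
    dists = np.linalg.norm(pred_pts - labeled_pts, axis=1)
    return 1.0 / dists.mean()

def self_train(model, labeled_set, unlabeled_graphs,
               conf_threshold=0.1, loss_threshold=1.0):
    """labeled_set: list of (image, (n, 2) key point array) pairs.
    `model` is a hypothetical key point regressor exposing fit(samples),
    predict(image) -> (n, 2) array, and a scalar `loss` attribute."""
    model.fit(labeled_set)                        # S10: preliminary training
    for graph in unlabeled_graphs:
        pred = model.predict(graph)               # S20: predict key points
        t_max = max(confidence(pred, kp)          # S30/S40: max confidence
                    for _, kp in labeled_set)
        if t_max > conf_threshold:                # S50: keep qualifying graph
            labeled_set.append((graph, pred))     # pseudo-labeled sample
            model.fit(labeled_set)                # S61: retrain on updated set
            if model.loss < loss_threshold:       # S62/S63: stop when on target
                break
    return model
```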
Example 2
Referring to fig. 2, another method for training a gesture recognition model is shown in this embodiment, including the following steps:
S100: And performing preliminary training on the gesture recognition model by using a gesture sample training set marked with a preset number of marked key points.
S200: and identifying a preset number of predicted key points from each unlabeled gesture sample graph by using a gesture recognition model subjected to preliminary training, and labeling a corresponding gesture sample graph by using the preset number of predicted key points.
S300: and after the preset number of predicted key points are identified, outputting RGB images for displaying the preset number of predicted key points.
S400: And respectively calculating the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set.
S500: and determining the maximum confidence from the confidence corresponding to the gesture sample graph marked with the preset number of the prediction key points.
S600: and when the maximum confidence coefficient is larger than a preset confidence coefficient threshold value, adding the kth gesture sample graph marked with a preset number of prediction key points to the gesture sample training set, wherein K is smaller than or equal to K, and K represents the total number of unlabeled gesture sample graphs.
Further, continuing to train the gesture recognition model by using the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold value, including the following steps:
S710: And continuing to train the gesture recognition model by using the updated gesture sample training set.
S720: and judging whether the loss value of the gesture recognition model is smaller than a preset loss threshold value.
S730: and if the loss threshold value is smaller than the preset loss threshold value, training the gesture recognition model to reach the standard, and finishing the training.
If the loss value is greater than or equal to the preset loss threshold, repeating steps S710 and S720 until the loss value of the gesture recognition model is less than the preset loss threshold.
This embodiment differs in the addition of step S300: after the preset number of predicted key points are identified, an RGB image displaying them is output. Once the current gesture recognition model has identified the preset number of predicted key points in an unlabeled gesture sample graph, outputting an RGB image displaying those key points makes it convenient for the model's trainer to observe and analyze, to subjectively assess the validity of the unlabeled gesture sample graph for the current model, and to subjectively judge the training progress of the current model.
Example 3
Referring to fig. 3, another hand gesture recognition model training method is shown in this embodiment, which includes the following steps:
S101: And performing preliminary training on the hand gesture recognition model by using a hand gesture sample training set marked with a preset number of marked key points.
As shown in FIG. 4, the preset number of hand key points is one of 14, 16, and 21. In this embodiment the preset number is preferably set to 21 to accommodate changes in hand gesture and fully capture its details, so that the hand gesture recognition model acquires better recognition capability. Illustratively, each hand gesture sample in the hand gesture sample training set has been pre-labeled with 21 marked key points (one illustrative layout is sketched below).
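For orientation, one common way to lay out 21 hand key points is one wrist point plus four joints per finger. The sketch below is only an assumed ordering: the patent fixes the count of 21 but not a specific index assignment.

```python
# Assumed 21-key-point hand layout (wrist + 4 joints per finger);
# the patent specifies only the count of 21, not this ordering.
HAND_KEYPOINTS = (
    ["wrist"]
    + [f"{finger}_{joint}"
       for finger in ("thumb", "index", "middle", "ring", "little")
       for joint in ("mcp", "pip", "dip", "tip")]
)
assert len(HAND_KEYPOINTS) == 21
```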
S201: and identifying a preset number of predicted key points from each unlabeled hand gesture sample graph by using the preliminarily trained hand gesture identification model, and labeling the corresponding hand gesture sample graph by using the preset number of predicted key points.
It can be understood that when the hand gesture recognition model performs hand gesture recognition on a picture, it must first identify the hand key points of the picture and then determine the corresponding hand gesture from their coordinates. Therefore, after preliminary training on the hand gesture sample training set marked with 21 marked key points, the hand gesture recognition model has a certain recognition capability: it can identify 21 predicted key points in an unlabeled hand gesture sample graph, which is then labeled with those 21 predicted key points.
S301: and after the preset number of predicted key points are identified, outputting the RGB images of the hand displaying the preset number of predicted key points.
After the current hand gesture recognition model identifies 21 predicted key points in an unlabeled hand gesture sample graph, an RGB image of the hand displaying the 21 predicted key points is output, making it convenient for the model's trainer to observe and analyze, to subjectively assess the validity of the unlabeled hand gesture sample graph for the current model, and to subjectively judge the training progress of the current model.
S401: and respectively calculating the confidence coefficient between each hand gesture sample graph marked with a preset number of prediction key points and each marked hand gesture sample in the hand gesture sample training set.
S501: and determining the maximum confidence from the confidence corresponding to the hand gesture sample graph marked with the preset number of the prediction key points.
Further, the maximum confidence may be determined using the following formula:

$$T_{\max} = \max_{m}\; \frac{n}{\sum_{i=1}^{n} E\big(A_m(i),\, B(i)\big)}$$

where T_max represents the maximum confidence, A_m(i) represents the coordinates of the i-th marked key point of the m-th hand gesture sample in the hand gesture sample training set, B(i) represents the coordinates of the i-th predicted key point of the k-th hand gesture sample graph marked with the preset number of predicted key points, n represents the preset number, and E(A_m(i), B(i)) represents the Euclidean distance between the i-th marked key point coordinates and the i-th predicted key point coordinates.
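Evaluated directly with the marked samples stacked into a single array, the formula reduces to a few NumPy lines; the batched layout below is an assumption made for illustration.

```python
import numpy as np

def t_max(pred, labeled):
    """pred: (n, 2) predicted key points B(i);
    labeled: (M, n, 2) stack of marked samples A_m(i)."""
    # E(A_m(i), B(i)) for every sample m and key point i -> shape (M, n)
    dists = np.linalg.norm(labeled - pred[None, :, :], axis=-1)
    # n / sum_i E(A_m(i), B(i)) per sample m, maximized over m
    return (pred.shape[0] / dists.sum(axis=1)).max()
```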
S601: and when the maximum confidence coefficient is larger than a preset confidence coefficient threshold value, adding the kth hand gesture sample graph marked with a preset number of prediction key points to the hand gesture sample training set, wherein K is smaller than or equal to K, and K represents the total number of unlabeled hand gesture sample graphs.
In this embodiment, the confidence threshold is preferably set to 0.1. A maximum confidence greater than 0.1 means that the average distance between the coordinates of the 21 predicted key points of the hand gesture sample graph and the coordinates of the 21 marked key points of the most similar marked hand gesture sample in the training set is smaller than 10 pixels (with 21 key points at an average distance of 10 pixels, T = 21/(21×10) = 0.1). It can be understood that the smaller this average distance, the more similar the hand gesture sample graph marked with the preset number of predicted key points is to one of the marked hand gesture samples in the training set, and the greater the confidence between them.
Further, training the hand gesture recognition model by using the updated hand gesture sample training set until the loss value of the hand gesture recognition model is smaller than a preset loss threshold.
Further, the loss value may be calculated using the following formula:

$$\mathrm{Loss}(x) = \begin{cases} w\ln\!\left(1 + |x|/\varepsilon\right), & |x| < w \\ |x| - C, & |x| \ge w \end{cases}$$

where Loss(x) represents the loss value, |x| represents the absolute value of the difference between a prediction vector and a target vector, the target vector is composed of the coordinates of the preset number of marked key points, the prediction vector is composed of the coordinates of the preset number of predicted key points output by the hand gesture recognition model and corresponds to the target vector coordinate by coordinate, w and ε are preset constants, and C = w - w ln(1 + w/ε).
The loss function thus behaves as an offset logarithmic function for small errors (|x| smaller than w) and as |x| - C for large errors (|x| greater than or equal to w). Further, in this embodiment, preferably w = 10 and ε = 2.
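A direct NumPy rendering of this piecewise loss with the embodiment's preferred constants follows. Vectorizing over the coordinate differences and summing the element-wise losses is an assumption about the reduction, which the patent does not specify.

```python
import numpy as np

def keypoint_loss(pred_vec, target_vec, w=10.0, eps=2.0):
    """pred_vec, target_vec: flat arrays of key point coordinates;
    w=10 and eps=2 are the embodiment's preferred constants."""
    x = np.abs(pred_vec - target_vec)      # |x|: element-wise absolute error
    C = w - w * np.log(1.0 + w / eps)      # offset so the pieces meet at |x| = w
    return np.where(x < w,
                    w * np.log(1.0 + x / eps),  # logarithmic regime, |x| < w
                    x - C).sum()                # linear regime, |x| >= w
```

With w = 10 and ε = 2, the two branches join continuously at |x| = w, since w - C = w ln(1 + w/ε).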
Further, training the hand gesture recognition model by using the updated hand gesture sample training set until the loss value of the hand gesture recognition model is smaller than a preset loss threshold, including the following steps:
S711: And continuing to train the hand gesture recognition model by using the updated hand gesture sample training set.
S721: and judging whether the loss value of the hand gesture recognition model is smaller than a preset loss threshold value.
S731: and if the hand gesture recognition model is smaller than the preset loss threshold value, training the hand gesture recognition model to reach the standard, and finishing the training.
And if the loss value is greater than or equal to the preset loss threshold value, repeating the step S711 and the step S721 until the loss value of the hand gesture recognition model is smaller than the preset loss threshold value.
According to the hand gesture recognition model training method disclosed by this embodiment, the preliminarily trained hand gesture recognition model labels the unlabeled hand gesture sample graphs, new hand gesture training samples are continuously supplied to the hand gesture sample training set, and the model is then iteratively trained with the newly marked samples; this effectively reduces the time for manually labeling hand gesture training samples, reduces the waste of manpower resources, and shortens the training period of the hand gesture recognition model.
Example 4
Referring to FIG. 5, the gesture recognition model training apparatus 1 according to this embodiment includes: the model preliminary training module 10, the key point recognition module 20, the confidence calculation module 30, the maximum confidence determination module 40, the gesture sample training set updating module 50, and the model iterative training module 60.
The model preliminary training module 10 is used for performing preliminary training on the gesture recognition model with a gesture sample training set marked with a preset number of marked key points; the key point recognition module 20 is configured to identify a preset number of predicted key points in each unlabeled gesture sample graph with the preliminarily trained gesture recognition model and to label the corresponding gesture sample graph with those predicted key points; the confidence calculation module 30 is configured to calculate the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set; the maximum confidence determination module 40 is configured to determine the maximum confidence among the confidences corresponding to the k-th gesture sample graph marked with the preset number of predicted key points; the gesture sample training set updating module 50 is configured to add the k-th gesture sample graph marked with the preset number of predicted key points to the gesture sample training set when the maximum confidence is greater than a preset confidence threshold, where k is smaller than or equal to K and K represents the total number of unlabeled gesture sample graphs; the model iterative training module 60 is configured to continue training the gesture recognition model with the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold.
Further, the gesture recognition model training device further includes:
and the RGB image output module is used for outputting and displaying RGB images of the preset number of predicted key points after the preset number of predicted key points are identified.
Further, the gesture recognition model training device determines the maximum confidence using the following formula:

$$T_{\max} = \max_{m}\; \frac{n}{\sum_{i=1}^{n} E\big(A_m(i),\, B(i)\big)}$$

where T_max represents the maximum confidence, A_m(i) represents the coordinates of the i-th marked key point of the m-th gesture sample in the gesture sample training set, B(i) represents the coordinates of the i-th predicted key point of the gesture sample graph marked with the preset number of predicted key points, n represents the preset number, and E(A_m(i), B(i)) represents the Euclidean distance between the i-th marked key point coordinates and the i-th predicted key point coordinates.
Further, the gesture recognition model training device is characterized in that the gesture is a hand gesture.
Further, the preset number includes one of 14, 16 and 21.
Further, the gesture recognition model training apparatus calculates the loss value using the following formula:

$$\mathrm{Loss}(x) = \begin{cases} w\ln\!\left(1 + |x|/\varepsilon\right), & |x| < w \\ |x| - C, & |x| \ge w \end{cases}$$

where Loss(x) represents the loss value, |x| represents the absolute value of the difference between a prediction vector and a target vector, the target vector is composed of the coordinates of the preset number of marked key points, the prediction vector is composed of the coordinates of the preset number of predicted key points output by the gesture recognition model and corresponds to the target vector coordinate by coordinate, w and ε are preset constants, and C = w - w ln(1 + w/ε).
Further, the model iterative training module 60 continues training the gesture recognition model with the gesture sample graphs labeled by the model itself and judges whether the loss value of the gesture recognition model is smaller than the preset loss threshold. If it is, the gesture recognition model has been trained to the standard and the training ends; if the loss value is greater than or equal to the preset loss threshold, the module keeps iterating the training until the loss value of the gesture recognition model is smaller than the preset loss threshold.
The gesture recognition model training apparatus 1 disclosed in this embodiment is configured to execute the gesture recognition model training method described in the foregoing embodiment through the cooperation of the model preliminary training module 10, the key point recognition module 20, the confidence coefficient calculation module 30, the maximum confidence coefficient determination module 40, the gesture sample training set updating module 50, and the model iterative training module 60, and the implementation and the beneficial effects related to the foregoing embodiment are also applicable in this embodiment, and are not described herein again.
It can be appreciated that the embodiment of the application relates to a gesture recognition model, which is obtained by training by using the gesture recognition model training method described in the embodiment of the application.
It can be understood that the embodiment of the application relates to a gesture recognition method, and gesture images are input into a gesture recognition model obtained by using the gesture recognition model training method disclosed by the embodiment of the application to perform gesture recognition.
It will be appreciated that the terminal device according to the embodiment of the present application includes a memory and a processor, where the memory is configured to store a computer program, and the computer program executes the gesture recognition method according to the embodiment of the present application when running on the processor.
It will be appreciated that an embodiment of the application relates to a readable storage medium storing a computer program which, when run on a processor, performs the gesture recognition method according to the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flow diagrams and block diagrams in the figures, which illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules or units in various embodiments of the application may be integrated together to form a single part, or the modules may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a smart phone, a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application.

Claims (9)

1. A training method of a gesture recognition model is characterized by comprising the following steps:
performing preliminary training on the gesture recognition model by using a gesture sample training set marked with a preset number of marked key points;
identifying a preset number of predicted key points from each unlabeled gesture sample graph by using a gesture recognition model subjected to preliminary training, and labeling corresponding gesture sample graphs by using the preset number of predicted key points;
respectively calculating the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set;
the maximum confidence is determined using the following formula:

$$T_{\max} = \max_{m}\; \frac{n}{\sum_{i=1}^{n} E\big(A_m(i),\, B(i)\big)}$$

wherein T_max represents the maximum confidence, A_m(i) represents the coordinates of the i-th marked key point of the m-th gesture sample in the gesture sample training set, B(i) represents the coordinates of the i-th predicted key point of the gesture sample graph marked with the preset number of predicted key points, n represents the preset number, and E(A_m(i), B(i)) represents the Euclidean distance between the i-th marked key point coordinates and the i-th predicted key point coordinates;
when the maximum confidence is greater than a preset confidence threshold, adding the k-th gesture sample graph marked with the preset number of predicted key points to the gesture sample training set, wherein k is smaller than or equal to K, and K represents the total number of unlabeled gesture sample graphs;
and continuing training the gesture recognition model by using the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold value.
2. The gesture recognition model training method of claim 1, further comprising:
and after the preset number of predicted key points are identified, outputting RGB images for displaying the preset number of predicted key points.
3. The method of claim 1, wherein the gesture is a hand gesture.
4. A gesture recognition model training method according to claim 3, characterized in that the preset number comprises one of 14, 16 and 21.
5. A method of training a gesture recognition model according to claim 3, wherein the loss value is calculated using the formula:
$$\mathrm{Loss}(x) = \begin{cases} w\ln\!\left(1 + |x|/\varepsilon\right), & |x| < w \\ |x| - C, & |x| \ge w \end{cases}$$

wherein Loss(x) represents the loss value, |x| represents the absolute value of the difference between a prediction vector and a target vector, the target vector is composed of the coordinates of the preset number of marked key points, the prediction vector is composed of the coordinates of the preset number of predicted key points output by the gesture recognition model and corresponds to the target vector coordinate by coordinate, w and ε are preset constants, and C = w - w ln(1 + w/ε).
6. a gesture recognition model training apparatus, the apparatus comprising:
the model preliminary training module is used for carrying out preliminary training on the gesture recognition model by utilizing a gesture sample training set marked with a preset number of marked key points;
the key point recognition module is used for recognizing a preset number of predicted key points from each unlabeled gesture sample graph by utilizing a gesture recognition model which is subjected to preliminary training, and labeling corresponding gesture sample graphs by utilizing the preset number of predicted key points;
the confidence coefficient calculating module is used for calculating the confidence between the gesture sample graph marked with the preset number of predicted key points and each marked gesture sample in the gesture sample training set;
the maximum confidence determining module is used for determining the maximum confidence by using the following formula:
$$T_{\max} = \max_{m}\; \frac{n}{\sum_{i=1}^{n} E\big(A_m(i),\, B(i)\big)}$$

wherein T_max represents the maximum confidence, A_m(i) represents the coordinates of the i-th marked key point of the m-th gesture sample in the gesture sample training set, B(i) represents the coordinates of the i-th predicted key point of the gesture sample graph marked with the preset number of predicted key points, n represents the preset number, and E(A_m(i), B(i)) represents the Euclidean distance between the i-th marked key point coordinates and the i-th predicted key point coordinates;
the gesture sample training set updating module is used for adding the k-th gesture sample graph marked with the preset number of predicted key points to the gesture sample training set when the maximum confidence is greater than a preset confidence threshold, wherein k is smaller than or equal to K, and K represents the total number of unlabeled gesture sample graphs;
and the model iteration training module is used for continuously training the gesture recognition model by using the updated gesture sample training set until the loss value of the gesture recognition model is smaller than a preset loss threshold value.
7. A gesture recognition method characterized in that a gesture picture is input into a gesture recognition model obtained by the gesture recognition model training method according to any one of claims 1 to 5 for gesture recognition.
8. A terminal device comprising a memory and a processor, the memory for storing a computer program that, when run on the processor, performs the gesture recognition model training method of any one of claims 1 to 5 or performs the gesture recognition method of claim 7.
9. A readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the gesture recognition model training method of any one of claims 1 to 5 or performs the gesture recognition method of claim 7.
CN202011475627.4A 2020-12-14 2020-12-14 Gesture recognition model training method and device, gesture recognition method and terminal equipment Active CN112464895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011475627.4A CN112464895B (en) 2020-12-14 2020-12-14 Gesture recognition model training method and device, gesture recognition method and terminal equipment


Publications (2)

Publication Number Publication Date
CN112464895A CN112464895A (en) 2021-03-09
CN112464895B true CN112464895B (en) 2023-09-01

Family

ID=74804296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011475627.4A Active CN112464895B (en) 2020-12-14 2020-12-14 Gesture recognition model training method and device, gesture recognition method and terminal equipment

Country Status (1)

Country Link
CN (1) CN112464895B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569627A (en) * 2021-06-11 2021-10-29 北京旷视科技有限公司 Human body posture prediction model training method, human body posture prediction method and device
CN114170677A (en) * 2021-11-12 2022-03-11 深圳先进技术研究院 Network model training method and equipment for detecting smoking behavior

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002783A (en) * 2018-07-02 2018-12-14 北京工业大学 Rescue the human testing in environment and gesture recognition method
CN109919245A (en) * 2019-03-18 2019-06-21 北京市商汤科技开发有限公司 Deep learning model training method and device, training equipment and storage medium
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model
CN110457675A (en) * 2019-06-26 2019-11-15 平安科技(深圳)有限公司 Prediction model training method, device, storage medium and computer equipment
CN111339903A (en) * 2020-02-21 2020-06-26 河北工业大学 Multi-person human body posture estimation method
WO2020199693A1 (en) * 2019-03-29 2020-10-08 中国科学院深圳先进技术研究院 Large-pose face recognition method and apparatus, and device
CN111860690A (en) * 2020-07-31 2020-10-30 河北交投智能交通技术有限责任公司 Dangerous goods vehicle detection and identification method based on deep learning
CN111862189A (en) * 2020-07-07 2020-10-30 北京海益同展信息科技有限公司 Body size information determination method, body size information determination device, electronic equipment and computer readable medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295567B (en) * 2016-08-10 2019-04-12 腾讯科技(深圳)有限公司 A kind of localization method and terminal of key point


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Motion gesture tracking algorithm based on ViBe and spatio-temporal context (基于Vibe和时空上下文的运动手势跟踪算法); Wang Min et al.; Chinese Journal of Liquid Crystals and Displays (《液晶与显示》), Vol. 33, No. 1, pp. 92-98 *

Also Published As

Publication number Publication date
CN112464895A (en) 2021-03-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant