CN111126339A - Gesture recognition method and device, computer equipment and storage medium

Gesture recognition method and device, computer equipment and storage medium

Info

Publication number
CN111126339A
CN111126339A (application CN201911413447.0A)
Authority
CN
China
Prior art keywords
gesture
trained
key point
gesture recognition
recognition model
Prior art date
Legal status
Pending
Application number
CN201911413447.0A
Other languages
Chinese (zh)
Inventor
赵突
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911413447.0A
Publication of CN111126339A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 - Static hand or arm
    • G06V40/113 - Recognition of static hand signs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application relates to a gesture recognition method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring an image containing at least one gesture feature; inputting the image into a trained key point detection model to obtain position information of the key points of each gesture feature; calculating the positional relationships among the key points according to the key point position information corresponding to each gesture feature; and inputting the key point position information and corresponding positional relationships of each gesture feature into a trained gesture recognition model, which outputs a recognition result for each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models. Because key point detection is performed by the trained key point detection model and gesture recognition by the trained gesture recognition model, the two tasks are executed separately, which shortens the model development cycle.

Description

Gesture recognition method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a gesture recognition method and apparatus, a computer device, and a storage medium.
Background
Gesture recognition is applied in many areas of daily life, for example in short videos on mobile terminals. Existing gesture recognition mainly adopts an SSD model, which regresses a rectangular bounding box around the gesture while classifying it. This approach depends heavily on dataset annotation: each category must be labeled independently, and whenever a new gesture is added, data must be collected and labeled again and the entire model retrained. As a result, the algorithm development cycle is too long and product iteration is slow.
Disclosure of Invention
In order to solve the above technical problem, the present application provides a gesture recognition method and apparatus, a computer device, and a storage medium.
In a first aspect, the present application provides a gesture recognition method, including:
acquiring an image containing at least one gesture feature;
inputting the image into a trained key point detection model to obtain position information of the key points of each gesture feature;
calculating the positional relationships among the key points according to the key point position information corresponding to each gesture feature;
and inputting the key point position information and corresponding positional relationships of each gesture feature into a trained gesture recognition model, and outputting a recognition result corresponding to each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
In a second aspect, the present application provides a gesture recognition apparatus, including:
the data acquisition module is used for acquiring an image containing at least one gesture feature;
the key point detection module is used for inputting the image into a trained key point detection model to obtain position information of the key points of each gesture feature;
the position relation calculation module is used for calculating the positional relationships among the key points according to the key point position information corresponding to each gesture feature;
and the gesture recognition module is used for inputting the key point position information and corresponding positional relationships of each gesture feature into the trained gesture recognition model and outputting the recognition result corresponding to each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring an image containing at least one gesture feature;
inputting the image into a trained key point detection model to obtain position information of the key points of each gesture feature;
calculating the positional relationships among the key points according to the key point position information corresponding to each gesture feature;
and inputting the key point position information and corresponding positional relationships of each gesture feature into a trained gesture recognition model, and outputting a recognition result corresponding to each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring an image containing at least one gesture feature;
inputting the image into a trained key point detection model to obtain position information of the key points of each gesture feature;
calculating the positional relationships among the key points according to the key point position information corresponding to each gesture feature;
and inputting the key point position information and corresponding positional relationships of each gesture feature into a trained gesture recognition model, and outputting a recognition result corresponding to each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
In the gesture recognition method and apparatus, computer device, and storage medium described above, an image containing at least one gesture feature is acquired; the image is input into a trained key point detection model to obtain position information of the key points of each gesture feature; the positional relationships among the key points are calculated according to the key point position information corresponding to each gesture feature; and the key point position information and corresponding positional relationships of each gesture feature are input into a trained gesture recognition model, which outputs a recognition result for each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models. Because key point detection and gesture recognition are executed separately by separately trained models, the model development cycle is shortened.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a diagram of an exemplary implementation of a gesture recognition method;
FIG. 2 is a flow diagram illustrating a method for gesture recognition in one embodiment;
FIG. 3 is a diagram illustrating a network architecture of a gesture recognition model in one embodiment;
FIG. 4 is a block diagram of a gesture recognition apparatus according to an embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
FIG. 1 is a diagram of an embodiment of a gesture recognition system. Referring to FIG. 1, the gesture recognition method is applied to a gesture recognition system that includes a terminal 110 and a server 120 connected through a network. The server 120 acquires an image uploaded by the terminal 110 that contains at least one gesture feature; inputs the image into a trained key point detection model to obtain position information of the key points of each gesture feature; calculates the positional relationships among the key points according to the key point position information corresponding to each gesture feature; and inputs the key point position information and corresponding positional relationships of each gesture feature into a trained gesture recognition model, which outputs a recognition result for each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
Alternatively, the image acquisition, key point detection, position calculation, and gesture recognition described above may all be performed on the terminal 110.
The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.
As shown in FIG. 2, in one embodiment, a gesture recognition method is provided. The embodiment is mainly illustrated by applying the method to the terminal 110 (or the server 120) in fig. 1. Referring to fig. 2, the gesture recognition method specifically includes the following steps:
step S201, an image including at least one gesture feature is acquired.
Specifically, gesture features are features that describe gestures; different gestures contain different gesture features. The image is an image captured by an image acquisition device and may contain the gesture features of one or more gestures, where the gestures include common gestures such as five fingers spread, the scissors (V) sign, a fist, a finger heart, and the like. A gesture may be one completed by a single hand, by two hands, or even by multiple people.
Step S202, inputting the image into the trained key point detection model to obtain the position information of the key points of each gesture feature.
Specifically, the trained key point detection model is a model obtained by training on a large number of images labeled with key points. The input of the model is an image, and the output is the position information of the key points, that is, the position of each key point within the image. The image carrying the gesture features is input into the trained key point detection model, which extracts and filters image features according to its learned parameters and outputs each key point together with its corresponding position information. The key points here are key points of the hand and may be defined from the joints, for example one key point per joint; alternatively, on top of the joint definition, the fingertips may also be defined as key points.
In one embodiment, when multiple gestures exist in the image, the correspondence of each gesture and keypoint is saved.
Step S203, calculating the position relation among the key points according to the key point position information corresponding to the gesture features.
Specifically, the positional relationship between key points may be described using one or more geometric quantities such as distance, angle, and direction. For example, the distance and direction together constitute a vector, so the distance vector between two key points can be obtained; an angle may be the included angle, in a particular coordinate system, between the two vectors formed by any two key points and the origin. The positional relationships between the key points of each gesture feature are then calculated, that is, the positional relationship between any key point of a gesture feature and the other key points of that gesture feature. For example, if each gesture feature contains 21 key points, then for any key point of the gesture, the positional relationships between that key point and the other 20 key points of the gesture feature are calculated. A sketch of such a pairwise computation is shown below.
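As a concrete illustration only (the patent does not prescribe an implementation; the function name, the use of numpy, and the distance-vector representation are assumptions), a minimal sketch of computing pairwise relations between key points might look like this:

```python
import numpy as np

def pairwise_relations(keypoints: np.ndarray) -> np.ndarray:
    """Compute positional relations (distance vectors) between all key point pairs.

    keypoints: array of shape (N, 2) holding (x, y) image coordinates,
    e.g. N = 21 for the hand key points described in this application.
    Returns an array of shape (N, N, 2) where entry [i, j] is the vector
    from key point i to key point j.
    """
    # Broadcasting: (1, N, 2) - (N, 1, 2) -> (N, N, 2)
    return keypoints[None, :, :] - keypoints[:, None, :]

# Example: relations between one key point and the other 20 of a 21-point hand
hand = np.random.rand(21, 2)   # stand-in for detected key points
rel = pairwise_relations(hand)
print(rel.shape)               # (21, 21, 2)
print(rel[0, 1:])              # vectors from key point 0 to the other 20
```

Angles between such vectors can then be derived as in the specific embodiment described later.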
Step S204, inputting the position information of the key points of each gesture feature and the corresponding position relation to the trained gesture recognition model, and outputting the recognition result corresponding to each gesture feature.
In this embodiment, the trained gesture recognition model and the trained keypoint detection model are respectively trained models.
Specifically, the trained gesture recognition model is obtained by training on the position information and positional relationships of the key points corresponding to each gesture category, together with the gesture category information they carry. The gesture recognition model extracts features from the input key point position information and corresponding positional relationships of each gesture feature to obtain the current gesture feature, calculates the similarity between the current gesture feature and the gesture features stored in the trained gesture recognition model, determines the gesture feature that matches the current gesture feature according to the similarity, and outputs the gesture category corresponding to the current gesture feature according to the category of the matching gesture feature, thereby obtaining the recognition result corresponding to each gesture feature.
In the gesture recognition method above, an image containing at least one gesture feature is acquired; the image is input into the trained key point detection model to obtain position information of the key points of each gesture feature; the positional relationships among the key points are calculated according to the key point position information corresponding to each gesture feature; and the key point position information and corresponding positional relationships of each gesture feature are input into the trained gesture recognition model, which outputs a recognition result for each gesture feature, wherein the two models are separately trained. Performing key point detection and gesture recognition with separately trained models splits gesture recognition into two parts that can be adjusted and trained independently, which makes the models easier to tune and shortens the development cycle while preserving recognition accuracy, thereby improving overall development efficiency. Training the key point detection model and training the gesture recognition model do not affect each other: once one model is trained, if the other needs adjustment, only that model is retrained and the already-trained one is left untouched. If all steps were implemented by a single model, any adjustment would require retraining the whole model, lengthening the algorithm development cycle and slowing product iteration; adjusting only part of the system shortens the development cycle and accelerates product iteration. A sketch of this two-stage structure follows.
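To make the two-stage structure concrete, here is a minimal Python sketch under the assumption that the two trained models are available as callables; all names (TwoStageGestureRecognizer, keypoint_model, gesture_model) are hypothetical and only illustrate the separation of the two stages, not the patent's implementation:

```python
import numpy as np

class TwoStageGestureRecognizer:
    """Illustrative pipeline: key point detection and gesture recognition are
    two separately trained models, so either can be retrained or replaced
    without touching the other."""

    def __init__(self, keypoint_model, gesture_model):
        self.keypoint_model = keypoint_model  # image -> iterable of (21, 2) arrays
        self.gesture_model = gesture_model    # feature vector -> gesture category

    def recognize(self, image: np.ndarray) -> list:
        results = []
        for keypoints in self.keypoint_model(image):  # one key point set per gesture
            coords = keypoints.flatten()              # key point position information
            # Pairwise positional relationships (see the sketch above)
            relations = keypoints[None, :, :] - keypoints[:, None, :]
            features = np.concatenate([coords, relations.flatten()])
            results.append(self.gesture_model(features))
        return results
```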
In one embodiment, the gesture recognition method further includes generating the trained key point detection model, which comprises:
in step S301, a plurality of first training images are acquired.
In this embodiment, the first training image carries annotation information, and the annotation information includes standard position information of the key point.
Step S302, inputting a first training image to the key point detection model, and outputting the predicted position information of each key point.
Step S303, calculating a loss value of the key point detection model according to the difference between the predicted position information of each key point and the corresponding standard position information to obtain a first loss value.
Step S304, when the first loss value is within the first preset loss value interval, obtaining a trained key point detection model.
Specifically, the first training images are the training images used to train the key point detection model, and each carries annotation information that includes the position of every key point in the image, that is, the standard position information of the key points. The annotation may be made manually or automatically by machine. The key point detection model here is a deep learning model that has not yet been trained. This model performs key point detection on the first training images to obtain each key point and its predicted position information. The difference between the standard position information and the corresponding predicted position information of each key point is then calculated; the difference may be expressed directly as a difference value, a ratio, a squared difference, or the like. The loss value of the model, the first loss value, is computed from this difference; the loss may be the difference itself, or an exponential or logarithmic function of it. The first preset loss value interval is the critical interval for judging whether the key point detection model has converged; it may be an empirical interval or an interval calculated precisely from the requirements. When the first loss value falls within the first preset loss value interval, the key point detection model has converged and the trained key point detection model is obtained. A training-loop sketch follows.
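As an illustration of steps S301 to S304 (a sketch assuming a PyTorch-style setup and an MSE loss, none of which the patent prescribes), the training loop with the preset loss-interval stopping criterion could look like this:

```python
import torch
import torch.nn as nn

def train_keypoint_model(model, loader, loss_interval=(0.0, 1e-3), max_epochs=100):
    """Train until the first loss value falls inside the first preset
    loss value interval. `loader` yields (images, standard_positions),
    where standard_positions are the annotated key point coordinates."""
    criterion = nn.MSELoss()  # one possible measure of the difference
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(max_epochs):
        for images, standard_positions in loader:
            predicted = model(images)                        # predicted position information
            loss = criterion(predicted, standard_positions)  # first loss value
            if loss_interval[0] <= loss.item() <= loss_interval[1]:
                return model                                 # converged: trained model
            optimizer.zero_grad()
            loss.backward()                                  # update parameters
            optimizer.step()
    return model
```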
In one embodiment, when the first loss value is not within the first preset loss value interval, updating parameters of the key point detection model according to the first loss value to obtain an intermediate key point detection model, and inputting the first training image to the intermediate key point detection model until the first loss value of the intermediate key point detection model is within the first preset loss value interval to obtain the trained key point detection model.
Specifically, if the first loss value is not within the first preset loss value interval, the key point detection model has not converged and its parameters must continue to be updated. The parameters of the key point detection model are updated according to the first loss value to obtain an intermediate key point detection model; the first training image is input into the intermediate model, which again performs key point detection to obtain the predicted position information of each key point; the difference between the predicted position information and the corresponding standard position information is calculated, and the first loss value of the intermediate model is computed from that difference. This continues until the first loss value of the intermediate key point detection model falls within the first preset loss value interval, at which point the trained key point detection model is obtained.
In one embodiment, the gesture recognition method further includes: generating the trained gesture recognition model, wherein generating the trained gesture recognition model comprises:
step S401, a plurality of first standard gestures and position information of corresponding key points are obtained.
Step S402, calculating the position relation of each first standard gesture according to the position information of the key point corresponding to each first standard gesture.
Step S403, inputting each first standard gesture and the position information and the position relationship of the corresponding key point to the first gesture recognition model, and outputting a predicted gesture corresponding to each first standard gesture.
Step S404, counting the recognition errors according to the first standard gestures and the corresponding predicted gestures, and calculating the loss value of the first gesture recognition model according to the recognition errors to obtain a second loss value.
And S405, when the second loss value is within a second preset loss value interval, obtaining the trained first gesture recognition model.
Step S406, determining a trained gesture recognition model according to the trained first gesture recognition model.
Specifically, the first standard gestures and the position information of their corresponding key points are the training data used to train the first gesture recognition model. The first standard gestures cover a plurality of gesture categories, and gestures of different categories carry different identification information. The positional relationships of the key points corresponding to each gesture identifier are calculated, and each gesture identifier, together with the position information and positional relationships of its key points, is input into the first gesture recognition model, which performs gesture recognition to obtain the predicted gesture corresponding to each identifier. The difference between the standard gesture corresponding to each identifier and the corresponding predicted gesture is calculated, the recognition probability is determined from this difference, and the loss value of the first gesture recognition model is calculated from the recognition probability to obtain the second loss value. When the second loss value falls within the second preset loss value interval, the first gesture recognition model has converged and the trained first gesture recognition model is obtained. The trained gesture recognition model is then determined from the trained first gesture recognition model, that is, the trained first gesture recognition model is used as the trained gesture recognition model. A sketch of this training loop follows.
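By analogy with the key point training sketch above (again assuming a PyTorch-style setup; the cross-entropy loss is named only later, in the specific embodiment), the training of steps S401 to S406 could be sketched as follows. Its identical shape to the key point loop underscores that the two models are trained independently:

```python
import torch
import torch.nn as nn

def train_gesture_model(model, loader, loss_interval=(0.0, 1e-2), max_epochs=100):
    """Train the gesture classifier until the second loss value falls in
    the second preset loss value interval. `loader` yields (features, labels),
    where features combine key point positions and positional relationships."""
    criterion = nn.CrossEntropyLoss()  # loss computed from recognition errors
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        for features, labels in loader:
            logits = model(features)          # predicted gestures
            loss = criterion(logits, labels)  # second loss value
            if loss_interval[0] <= loss.item() <= loss_interval[1]:
                return model                  # trained first gesture recognition model
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```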
In one embodiment, when the second loss value is not within the second preset loss value interval, the parameters of the first gesture recognition model are updated according to the second loss value to obtain an intermediate gesture recognition model, and the position information and positional relationships of each first standard gesture and the corresponding key points are input into the intermediate gesture recognition model, until the second loss value of the intermediate gesture recognition model is within the second preset loss value interval, to obtain the trained first gesture recognition model.
Specifically, when the second loss value is not within the second preset loss value interval, the first gesture recognition model has not converged, and its model parameters are updated to obtain an intermediate gesture recognition model; the parameters may be updated using the parameter update methods common to machine learning models. The position information and positional relationships of each first standard gesture and its corresponding key points are input into the intermediate gesture recognition model, which performs gesture recognition again to obtain the corresponding predicted gestures; the difference between the predicted gestures and the standard gestures is recalculated, and the loss value of the model, that is, the second loss value, is determined again from this difference. When the second loss value of the intermediate gesture recognition model falls within the second preset loss value interval, the trained first gesture recognition model is obtained.
In one embodiment, the trained first gesture recognition model includes an input layer, a hidden layer, and an output layer; the input layer is connected to the hidden layer, the hidden layer is connected to the output layer, and the output layer outputs a first number of categories. Generating the trained gesture recognition model further includes:
step S501, acquiring the number of the categories to be increased, and calculating the sum of the first number and the number of the categories to be increased to obtain a second number.
Step S502, the network structure of the trained first gesture recognition model is adjusted to obtain a second gesture recognition model.
In this embodiment, the number of categories output by the output layer of the second gesture recognition model is the second number.
Step S503, obtaining the classification training data corresponding to the gesture category to be added.
In this particular embodiment, the classification training data includes a plurality of second standard gestures and location information of corresponding keypoints.
Step S504, calculating the position relation of each second standard gesture according to the position information of the key point corresponding to each second standard gesture.
And step S505, taking a gesture set formed by the first standard gesture and the second standard gesture as a standard gesture set.
Step S506, inputting the position information and positional relationship of each gesture in the standard gesture set and its corresponding key points into the second gesture recognition model, until the loss value of the second gesture recognition model falls within a third preset loss interval, to obtain the trained second gesture recognition model.
Step S406 in this embodiment includes: and taking the trained first gesture recognition model or the trained second gesture recognition model as the trained gesture recognition model.
Specifically, the first number is the number of categories that the output layer of the trained first gesture recognition model can output. The number of categories to be added is the number of categories that are about to be added; the sum of the first number and the number of categories to be added gives the second number, which is the number of categories the output layer can produce after the adjustment. The network structure of the output layer of the trained first gesture recognition model is adjusted according to the first and second numbers to obtain the second gesture recognition model, whose output layer outputs the second number of categories. The classification training data corresponding to the gesture categories to be added are the second standard gestures and the position information of their corresponding key points; the positional relationship of each second standard gesture is calculated from the position information of its key points. The gestures formed by the second standard gestures and the first standard gestures are combined into a standard gesture set. Each gesture in the standard gesture set, with its corresponding position information and positional relationships, is input into the second gesture recognition model, which outputs predicted gestures; the loss value of the second gesture recognition model, a third loss value, is calculated from the difference between its predicted gestures and the standard gestures, and when the third loss value falls within the third preset loss interval, the trained second gesture recognition model is obtained. Thus, when new categories are added, only a small number of pictures need to be labeled and the recognition model retrained from the key point features and category labels; the key point detection model is not modified, which shortens the development cycle and the product iteration cycle. A sketch of the output-layer adjustment follows.
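A minimal PyTorch-style sketch of the output-layer adjustment of steps S501-S502 (the patent does not prescribe an implementation, so the weight-copying strategy and the function name below are assumptions):

```python
import torch
import torch.nn as nn

def expand_output_layer(old_head: nn.Linear, categories_to_add: int) -> nn.Linear:
    """Grow the output layer from the first number of categories to the
    second number (first number + categories to be added)."""
    first_number = old_head.out_features
    second_number = first_number + categories_to_add
    new_head = nn.Linear(old_head.in_features, second_number)
    with torch.no_grad():
        # Reuse the trained weights for the original categories; rows for
        # the new categories keep their fresh random initialization.
        new_head.weight[:first_number] = old_head.weight
        new_head.bias[:first_number] = old_head.bias
    return new_head
```

The expanded model is then retrained on the combined standard gesture set (first plus second standard gestures), as in steps S503-S506.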
In one embodiment, step S203 includes: and constructing a key point vector of each finger key point and wrist key point corresponding to each gesture feature to obtain a key point vector of each gesture feature, calculating a vector included angle of the key point vector of each gesture feature, and expressing the position relation of the key points of each gesture feature by adopting the vector included angle of the key point vector of each gesture feature.
Specifically, each gesture feature includes 21 key points: one wrist key point and 4 finger key points on each of the five fingers. A vector is constructed between each finger key point and the wrist key point, giving the key point vectors of the gesture; the included angle between the key point vectors of any two finger key points of the same gesture is then calculated, for example by using the law of cosines to obtain the cosine of the angle between the two vectors. The wrist key point serves as the reference point for all finger key points, and the positional relationship between the wrist and the fingers describes the gesture well. For example, when the five fingers are spread, the vectors formed by the wrist key point and the key points of the same finger essentially coincide.
In a particular embodiment, the method of generating the trained gesture recognition model comprises the following stages.
a gesture recognition model training stage:
and collecting a plurality of pictures of various gestures, and manually marking coordinates and corresponding categories of the gesture key points. The number of the gesture key points is 21, each finger has 4 points, and the wrist has 1 point.
After the labeling is completed, features are extracted according to the coordinates of the gesture key points.
The features are composed of two parts, one is the coordinates of the key points, and the other is the positional relationship between the key points.
The key point coordinates are the x and y coordinate values of the 21 key points, forming a 42-dimensional feature vector in total.
The positional relationships between the gesture key points are measured using cosine angles. With the wrist key point as the origin, any two of the 20 key points in the finger area are selected; these two key points and the wrist origin form two vectors, and the cosine of the included angle between the two vectors is calculated.
Assuming the wrist coordinates are (x0, y0) and the coordinates of the two finger key points are (x1, y1) and (x2, y2), the included angle is calculated by formula (1):

$$\cos\theta = \frac{(x_1 - x_0)(x_2 - x_0) + (y_1 - y_0)(y_2 - y_0)}{\sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}\,\sqrt{(x_2 - x_0)^2 + (y_2 - y_0)^2}} \tag{1}$$
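For concreteness (a sketch, not the patent's implementation), formula (1) in numpy, together with the 190-angle count used below:

```python
from itertools import combinations
import numpy as np

def cosine_angle(wrist, p1, p2) -> float:
    """Cosine of the angle between the vectors wrist->p1 and wrist->p2,
    i.e., formula (1) with the wrist key point as the origin."""
    v1 = np.asarray(p1, dtype=float) - np.asarray(wrist, dtype=float)
    v2 = np.asarray(p2, dtype=float) - np.asarray(wrist, dtype=float)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

wrist = (0.0, 0.0)
fingers = np.random.rand(20, 2)  # stand-in for the 20 detected finger key points
angles = [cosine_angle(wrist, fingers[i], fingers[j])
          for i, j in combinations(range(20), 2)]
print(len(angles))  # C(20, 2) = 190 cosine-angle features
```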
The positional relationships among the 20 key points other than the wrist origin yield 190 cosine angles, that is, a 190-dimensional feature; the coordinates of the 21 key points contribute 42 coordinate values, that is, a 42-dimensional feature. The combined feature set of 190 + 42 = 232 dimensions is input into a three-layer neural network (the gesture recognition model), whose network structure is shown in FIG. 3. The input of the network is the 232-dimensional feature and the output is the number of categories; the hidden layer in the middle uses 200 hidden nodes. During training, a cross-entropy loss function is used, yielding the trained gesture recognition model. A sketch of such a network follows.
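The three-layer network just described (232-dimensional input, 200 hidden nodes, one output per category) could be sketched in PyTorch as follows; the framework and the ReLU activation are assumptions, since the patent specifies only the layer sizes and the cross-entropy loss:

```python
import torch.nn as nn

class GestureRecognitionModel(nn.Module):
    """Three-layer network of FIG. 3: 232-d features -> 200 hidden -> categories."""

    def __init__(self, num_categories: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(232, 200),             # input layer -> 200 hidden nodes
            nn.ReLU(),                       # activation not specified in the patent
            nn.Linear(200, num_categories),  # hidden layer -> output categories
        )

    def forward(self, features):
        return self.net(features)

# Trained with nn.CrossEntropyLoss(), as stated in the training stage above.
```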
And (3) updating the model:
and when a new category needs to be added to the model, marking the picture of the new category, and retraining the gesture recognition model according to the key point characteristics and the category label.
The use stage is as follows:
and acquiring the coordinates of key points of the gesture, calculating the characteristics according to the coordinates, and inputting the characteristics into the three-layer network model acquired in the training stage to obtain classification information. Wherein the keypoint coordinates of the gesture can be derived from trained keypoint detection model inputs.
In summary, a deep learning approach is used throughout to classify gesture categories from gesture key points: key point detection is first performed on the gesture area to obtain the gesture key points, and the result is then classified by a classification deep learning network. This avoids relabeling every gesture and retraining the whole model when a new gesture is added, greatly simplifies gesture classification, and accelerates product development for gesture classification tasks.
FIG. 2 is a flowchart illustrating a gesture recognition method according to an embodiment. It should be understood that, although the steps in the flowchart of FIG. 2 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a gesture recognition apparatus 200, comprising:
a data obtaining module 201, configured to obtain an image including at least one gesture feature.
And the key point detection module 202 is configured to input an image to the trained key point detection model to obtain position information of the key points of each gesture feature.
And the position relation calculating module 203 is configured to calculate a position relation between the key points according to the key point position information corresponding to each gesture feature.
The gesture recognition module 204 is configured to input the position information and the corresponding position relationship of the key point of each gesture feature to the trained gesture recognition model, and output a recognition result corresponding to each gesture feature, where the trained gesture recognition model and the trained key point detection model are models obtained through respective training.
In one embodiment, the gesture recognition apparatus 200 further includes:
a detection model generation module for generating the trained key point detection model, wherein the detection model generation module comprises:
the first data acquisition unit is used for acquiring a plurality of first training images, wherein the first training images carry annotation information, and the annotation information comprises standard position information of key points.
And the detection unit is used for inputting the first training image to the key point detection model and outputting the predicted position information of each key point.
And the first loss value calculation unit is used for calculating the loss value of the key point detection model according to the difference between the predicted position information of each key point and the corresponding standard position information to obtain a first loss value.
And the detection model generation unit is used for obtaining the trained key point detection model when the first loss value is positioned in the first preset loss value interval.
In an embodiment, the detection model generation unit is further configured to, when the first loss value is not within the first preset loss value interval, update the parameters of the key point detection model according to the first loss value to obtain an intermediate key point detection model, and input the first training image into the intermediate key point detection model, until the first loss value of the intermediate key point detection model is within the first preset loss value interval, to obtain the trained key point detection model.
In one embodiment, the gesture recognition apparatus 200 further includes:
a recognition model generation module for generating a trained gesture recognition model, wherein the recognition model generation module comprises:
and the second data acquisition unit is used for acquiring a plurality of first standard gestures and the position information of the corresponding key points.
And the position calculating unit is used for calculating the position relation of each first standard gesture according to the position information of the key point corresponding to each first standard gesture.
And the gesture prediction unit is used for inputting each first standard gesture and the position information and the position relation of the corresponding key point to the first gesture recognition model and outputting the predicted gesture corresponding to each first standard gesture.
And the second loss value calculation unit is used for counting the recognition errors according to the first standard gestures and the corresponding predicted gestures, and calculating the loss value of the first gesture recognition model according to the recognition errors to obtain a second loss value.
And the gesture model determining unit is used for obtaining the trained first gesture recognition model when the second loss value is located in the second preset loss value interval.
And the trained gesture model determining unit is used for determining a trained gesture recognition model according to the trained first gesture recognition model.
In an embodiment, the gesture model determining unit is further configured to, when the second loss value is not within the second preset loss value interval, update the parameters of the first gesture recognition model according to the second loss value to obtain an intermediate gesture recognition model, and input the position information and positional relationships of each first standard gesture and the corresponding key points into the intermediate gesture recognition model, until the second loss value of the intermediate gesture recognition model is within the second preset loss value interval, to obtain the trained first gesture recognition model.
In an embodiment, the recognition model generation module further includes the following units, wherein the trained first gesture recognition model comprises an input layer, a hidden layer, and an output layer; the input layer is connected to the hidden layer, the hidden layer is connected to the output layer, and the output layer outputs a first number of categories:
And the structure adjusting unit is used for adjusting the network structure of the trained first gesture recognition model to obtain a second gesture recognition model, and the number of the categories output by the output layer of the second gesture recognition model is a second number.
The second data acquisition unit is further configured to acquire classification training data corresponding to the gesture categories to be added, where the classification training data includes a plurality of second standard gestures and the position information of their corresponding key points.
And the position relation calculation unit is used for calculating the position relation of each second standard gesture according to the position information of the key point corresponding to each second standard gesture.
And the data merging unit is used for taking a gesture set formed by the first standard gesture and the second standard gesture as a standard gesture set.
And the model training unit is used for inputting the position information and positional relationship of each gesture in the standard gesture set and its corresponding key points into the second gesture recognition model, until the loss value of the second gesture recognition model falls within the third preset loss interval, to obtain the trained second gesture recognition model.
The trained gesture model determining unit is further used for taking the trained first gesture recognition model or the trained second gesture recognition model as the trained gesture recognition model.
In an embodiment, the position relation calculation module 203 is specifically configured to construct a key point vector between each finger key point and the wrist key point of each gesture feature to obtain the key point vectors of each gesture feature, calculate the vector included angles between the key point vectors of each gesture feature, and use these vector included angles to represent the positional relationships of the key points of each gesture feature, where each gesture feature includes 21 key points: one wrist key point and 4 finger key points on each finger.
FIG. 5 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 (or the server 120) in fig. 1. As shown in fig. 5, the computer apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected via a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the gesture recognition method. The internal memory may also have a computer program stored therein, which when executed by the processor, causes the processor to perform a gesture recognition method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in FIG. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, the gesture recognition apparatus provided herein may be implemented in the form of a computer program that is executable on a computer device such as that shown in fig. 5. The memory of the computer device may store various program modules constituting the gesture recognition apparatus, such as the data acquisition module 201, the key point detection module 202, the position relation calculation module 203, and the gesture recognition module 204 shown in fig. 4. The program modules constitute computer programs that cause the processor to execute the steps in the gesture recognition methods of the embodiments of the present application described in the present specification.
For example, the computer device shown in FIG. 5 may acquire the image containing at least one gesture feature through the data acquisition module 201 of the gesture recognition apparatus shown in FIG. 4; input the image into the trained key point detection model through the key point detection module 202 to obtain the position information of the key points of each gesture feature; calculate the positional relationships among the key points according to the key point position information corresponding to each gesture feature through the position relation calculation module 203; and input the position information and corresponding positional relationships of the key points of each gesture feature into the trained gesture recognition model through the gesture recognition module 204, outputting the recognition result corresponding to each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: acquiring an image containing at least one gesture feature; inputting the image into a trained key point detection model to obtain position information of the key points of each gesture feature; calculating the positional relationships among the key points according to the key point position information corresponding to each gesture feature; and inputting the key point position information and corresponding positional relationships of each gesture feature into a trained gesture recognition model, and outputting a recognition result corresponding to each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
In one embodiment, the processor, when executing the computer program, further performs the steps of: generating the trained key point detection model, comprising: acquiring a plurality of first training images, wherein the first training images carry annotation information, and the annotation information comprises standard position information of the key points; inputting a first training image into the key point detection model, and outputting the predicted position information of each key point; calculating a loss value of the key point detection model according to the difference between the predicted position information of each key point and the corresponding standard position information to obtain a first loss value; and when the first loss value is within the first preset loss value interval, obtaining the trained key point detection model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and when the first loss value is not in the first preset loss value interval, updating parameters of the key point detection model according to the first loss value to obtain an intermediate key point detection model, and inputting a first training image to the intermediate key point detection model until the first loss value of the intermediate key point detection model is in the first preset loss value interval to obtain the trained key point detection model.
In one embodiment, generating a trained gesture recognition model includes: acquiring a plurality of first standard gestures and position information of corresponding key points; calculating the position relation of each first standard gesture according to the position information of the key point corresponding to each first standard gesture; inputting the position information and the position relation of each first standard gesture and the corresponding key point to a first gesture recognition model, and outputting a predicted gesture corresponding to each first standard gesture; counting recognition errors according to the first standard gestures and the corresponding predicted gestures, and calculating a loss value of the first gesture recognition model according to the recognition errors to obtain a second loss value; when the second loss value is located in a second preset loss value interval, obtaining a trained first gesture recognition model; and determining the trained gesture recognition model according to the trained first gesture recognition model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: when the second loss value is not within the second preset loss value interval, updating parameters of the first gesture recognition model according to the second loss value to obtain an intermediate gesture recognition model, and inputting the position information and positional relationships of each first standard gesture and the corresponding key points into the intermediate gesture recognition model, until the second loss value of the intermediate gesture recognition model is within the second preset loss value interval, to obtain the trained first gesture recognition model.
In one embodiment, the first gesture recognition model after training comprises an input layer, a hidden layer and an output layer, the input layer is connected with the hidden layer, the hidden layer is connected with the output layer, the output of the output layer has a first number of output categories, and the processor further implements the following steps when executing the computer program: acquiring the number of categories to be increased, and calculating the sum of the first number and the number of categories to be increased to obtain a second number; adjusting the network structure of the trained first gesture recognition model to obtain a second gesture recognition model, wherein the number of categories output by an output layer of the second gesture recognition model is a second number; acquiring classification training data corresponding to the gesture categories to be added, wherein the classification training data comprises a plurality of second standard gestures and position information of corresponding key points; calculating the position relation of each second standard gesture according to the position information of the key point corresponding to each second standard gesture; taking a gesture set formed by the first standard gesture and the second standard gesture as a standard gesture set; inputting each gesture in each standard gesture set and the position information and the position relation of the corresponding key point to the second gesture recognition model, and obtaining a trained second gesture recognition model when the loss value of the second gesture recognition model is located in a third preset loss interval; determining a trained gesture recognition model according to the trained first gesture recognition model, comprising: and taking the trained first gesture recognition model or the trained second gesture recognition model as the trained gesture recognition model.
In one embodiment, each gesture feature includes 21 key points, the 21 key points comprising one wrist key point and 4 finger key points on each finger, and calculating the positional relationships among the key points according to the key point position information corresponding to each gesture feature includes: constructing a key point vector between each finger key point and the wrist key point of each gesture feature to obtain the key point vectors of each gesture feature, calculating the vector included angles between the key point vectors of each gesture feature, and using these vector included angles to represent the positional relationships of the key points of each gesture feature.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, performs the steps of: acquiring an image containing at least one gesture feature; inputting the image into a trained key point detection model to obtain position information of the key points of each gesture feature; calculating the positional relationships among the key points according to the key point position information corresponding to each gesture feature; and inputting the key point position information and corresponding positional relationships of each gesture feature into a trained gesture recognition model, and outputting a recognition result corresponding to each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: generating the trained key point detection model, comprising: acquiring a plurality of first training images, wherein the first training images carry annotation information, and the annotation information comprises standard position information of the key points; inputting a first training image into the key point detection model, and outputting the predicted position information of each key point; calculating a loss value of the key point detection model according to the difference between the predicted position information of each key point and the corresponding standard position information to obtain a first loss value; and when the first loss value is within the first preset loss value interval, obtaining the trained key point detection model.
In one embodiment, the computer program when executed by the processor further performs the steps of: and when the first loss value is not in the first preset loss value interval, updating parameters of the key point detection model according to the first loss value to obtain an intermediate key point detection model, and inputting a first training image to the intermediate key point detection model until the first loss value of the intermediate key point detection model is in the first preset loss value interval to obtain the trained key point detection model.
In one embodiment, generating a trained gesture recognition model includes: acquiring a plurality of first standard gestures and position information of corresponding key points; calculating the position relation of each first standard gesture according to the position information of the key point corresponding to each first standard gesture; inputting the position information and the position relation of each first standard gesture and the corresponding key point to a first gesture recognition model, and outputting a predicted gesture corresponding to each first standard gesture; counting recognition errors according to the first standard gestures and the corresponding predicted gestures, and calculating a loss value of the first gesture recognition model according to the recognition errors to obtain a second loss value; when the second loss value is located in a second preset loss value interval, obtaining a trained first gesture recognition model; and determining the trained gesture recognition model according to the trained first gesture recognition model.
In one embodiment, the computer program when executed by the processor further performs the steps of: when the second loss value does not fall within the second preset loss value interval, updating the parameters of the first gesture recognition model according to the second loss value to obtain an intermediate gesture recognition model, and repeating the step of inputting each first standard gesture together with the position information and position relationship of its corresponding key points, now to the intermediate gesture recognition model, until the second loss value of the intermediate gesture recognition model falls within the second preset loss value interval, thereby obtaining the trained first gesture recognition model.
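For illustration, a sketch of the first gesture recognition model and one training step under the same scheme. The three-layer perceptron and the cross-entropy loss are assumptions; the disclosure specifies only an input layer, a hidden layer, an output layer, and a second loss value computed from the recognition-error statistics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_first_gesture_model(in_dim, hidden_dim, first_number):
    """Input layer -> hidden layer -> output layer with first_number categories."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, first_number),
    )

def gesture_train_step(model, optimizer, features, labels):
    """One update step; returns the second loss value and the error rate."""
    logits = model(features)
    second_loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    second_loss.backward()
    optimizer.step()
    # Recognition-error statistic over this batch of standard gestures.
    error_rate = (logits.argmax(dim=1) != labels).float().mean()
    return second_loss.item(), error_rate.item()
```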
In one embodiment, the trained first gesture recognition model comprises an input layer, a hidden layer and an output layer; the input layer is connected to the hidden layer, the hidden layer is connected to the output layer, and the output layer outputs a first number of categories. The computer program when executed by the processor further implements the steps of: acquiring the number of gesture categories to be added, and calculating the sum of the first number and the number of categories to be added to obtain a second number; adjusting the network structure of the trained first gesture recognition model to obtain a second gesture recognition model, wherein the number of categories output by the output layer of the second gesture recognition model is the second number; acquiring classification training data corresponding to the gesture categories to be added, the classification training data comprising a plurality of second standard gestures and the position information of the corresponding key points; calculating the position relationship of each second standard gesture according to the position information of its corresponding key points; taking the gesture set formed by the first standard gestures and the second standard gestures as a standard gesture set; and inputting each gesture in the standard gesture set, together with the position information and position relationship of its corresponding key points, to the second gesture recognition model, obtaining the trained second gesture recognition model when the loss value of the second gesture recognition model falls within a third preset loss value interval. Determining the trained gesture recognition model according to the trained first gesture recognition model then includes: taking the trained first gesture recognition model or the trained second gesture recognition model as the trained gesture recognition model.
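One way to realize the structural adjustment is to widen only the output layer from the first number to the second number of categories while reusing the trained weights of the existing classes. Copying the old rows into the new layer is an assumption for illustration; the disclosure requires only that the adjusted output layer emit the second number of categories.

```python
import torch
import torch.nn as nn

def expand_output_layer(trained_model: nn.Sequential, categories_to_add: int):
    """Widen the output layer from the first number to the second number."""
    old_out = trained_model[-1]                        # trained output layer
    first_number = old_out.out_features
    second_number = first_number + categories_to_add   # the sum described above
    new_out = nn.Linear(old_out.in_features, second_number)
    with torch.no_grad():
        # Reuse the learned weights for the original first_number categories.
        new_out.weight[:first_number].copy_(old_out.weight)
        new_out.bias[:first_number].copy_(old_out.bias)
    trained_model[-1] = new_out                        # second gesture recognition model
    return trained_model
```

Fine-tuning the widened model on the standard gesture set (first plus second standard gestures) then proceeds exactly as in the training step sketched earlier, until the loss falls within the third preset loss value interval.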
In one embodiment, each gesture feature likewise includes 21 key points: one wrist key point and 4 finger key points for each of the five fingers, and calculating the position relationship among the key points according to the key point position information corresponding to each gesture feature includes: constructing a key point vector between each finger key point and the wrist key point of the gesture feature to obtain the key point vectors of the gesture feature, calculating the included angles between the key point vectors, and expressing the position relationship of the key points of the gesture feature by these included angles.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises", "comprising" and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely a description of exemplary embodiments of the present invention, enabling those skilled in the art to understand or practice it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of gesture recognition, the method comprising:
acquiring an image containing at least one gesture feature;
inputting the image to a trained key point detection model to obtain the position information of the key points of each gesture feature;
calculating the position relationship among the key points according to the key point position information corresponding to each gesture feature; and
inputting the key point position information and the corresponding position relationship of each gesture feature to a trained gesture recognition model, and outputting a recognition result for each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
2. The method of claim 1, wherein generating the trained key point detection model comprises:
acquiring a plurality of first training images, wherein the first training images carry annotation information, and the annotation information comprises standard position information of key points;
inputting the first training images to a key point detection model, and outputting the predicted position information of each key point;
calculating a loss value of the key point detection model according to the difference between the predicted position information of each key point and the corresponding standard position information, to obtain a first loss value; and
when the first loss value falls within a first preset loss value interval, obtaining the trained key point detection model.
3. The method of claim 2, further comprising:
when the first loss value does not fall within the first preset loss value interval, updating the parameters of the key point detection model according to the first loss value to obtain an intermediate key point detection model, and repeating the step of inputting the first training images, now to the intermediate key point detection model, until the first loss value of the intermediate key point detection model falls within the first preset loss value interval, thereby obtaining the trained key point detection model.
4. The method of claim 1, wherein generating the trained gesture recognition model comprises:
acquiring a plurality of first standard gestures and the position information of the corresponding key points;
calculating the position relationship of each first standard gesture according to the position information of its corresponding key points;
inputting each first standard gesture, together with the position information and position relationship of its corresponding key points, to a first gesture recognition model, and outputting a predicted gesture corresponding to each first standard gesture;
computing recognition errors from the first standard gestures and the corresponding predicted gestures, and calculating a loss value of the first gesture recognition model from the recognition errors, to obtain a second loss value;
when the second loss value falls within a second preset loss value interval, obtaining the trained first gesture recognition model; and
determining the trained gesture recognition model according to the trained first gesture recognition model.
5. The method of claim 4, further comprising:
when the second loss value does not fall within the second preset loss value interval, updating the parameters of the first gesture recognition model according to the second loss value to obtain an intermediate gesture recognition model, and repeating the step of inputting each first standard gesture together with the position information and position relationship of its corresponding key points, now to the intermediate gesture recognition model, until the second loss value of the intermediate gesture recognition model falls within the second preset loss value interval, thereby obtaining the trained first gesture recognition model.
6. The method of claim 4 or 5, wherein the trained first gesture recognition model comprises an input layer, a hidden layer and an output layer, the input layer is connected to the hidden layer, the hidden layer is connected to the output layer, and the output layer outputs a first number of categories, the method further comprising:
acquiring the number of gesture categories to be added, and calculating the sum of the first number and the number of categories to be added to obtain a second number;
adjusting the network structure of the trained first gesture recognition model to obtain a second gesture recognition model, wherein the number of categories output by the output layer of the second gesture recognition model is the second number;
acquiring classification training data corresponding to the gesture categories to be added, wherein the classification training data comprises a plurality of second standard gestures and the position information of the corresponding key points;
calculating the position relationship of each second standard gesture according to the position information of its corresponding key points;
taking the gesture set formed by the first standard gestures and the second standard gestures as a standard gesture set;
inputting each gesture in the standard gesture set, together with the position information and position relationship of its corresponding key points, to the second gesture recognition model, and obtaining the trained second gesture recognition model when the loss value of the second gesture recognition model falls within a third preset loss value interval;
wherein determining the trained gesture recognition model according to the trained first gesture recognition model comprises: taking the trained first gesture recognition model or the trained second gesture recognition model as the trained gesture recognition model.
7. The method of claim 1, wherein each gesture feature comprises 21 key points, the 21 key points comprising one wrist key point and 4 finger key points for each finger, and
wherein calculating the position relationship among the key points according to the key point position information corresponding to each gesture feature comprises: constructing a key point vector between each finger key point and the wrist key point of the gesture feature to obtain the key point vectors of the gesture feature, calculating the included angles between the key point vectors, and expressing the position relationship of the key points of the gesture feature by these included angles.
8. A gesture recognition apparatus, the apparatus comprising:
the data acquisition module is used for acquiring an image containing at least one gesture feature;
the key point detection module is used for inputting the image to a trained key point detection model to obtain the position information of the key points of each gesture feature;
the position relationship calculation module is used for calculating the position relationship among the key points according to the key point position information corresponding to each gesture feature; and
the gesture recognition module is used for inputting the key point position information and the corresponding position relationship of each gesture feature to a trained gesture recognition model, and outputting a recognition result for each gesture feature, wherein the trained gesture recognition model and the trained key point detection model are separately trained models.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911413447.0A 2019-12-31 2019-12-31 Gesture recognition method and device, computer equipment and storage medium Pending CN111126339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911413447.0A 2019-12-31 2019-12-31 Gesture recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111126339A (en) 2020-05-08

Family

ID=70506541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911413447.0A 2019-12-31 2019-12-31 Gesture recognition method and device, computer equipment and storage medium (pending, published as CN111126339A)

Country Status (1)

Country Link
CN (1) CN111126339A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279045A1 (en) * 2016-12-16 2019-09-12 Beijing Sensetime Technology Development Co., Ltd Methods and apparatuses for identifying object category, and electronic devices
CN109446994A (en) * 2018-10-30 2019-03-08 北京达佳互联信息技术有限公司 Gesture critical point detection method, apparatus, electronic equipment and storage medium

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527113A (en) * 2020-12-09 2021-03-19 北京地平线信息技术有限公司 Method and apparatus for training gesture recognition and gesture recognition network, medium, and device
WO2022120669A1 (en) * 2020-12-10 2022-06-16 深圳市优必选科技股份有限公司 Gesture recognition method, computer device and storage medium
CN112699849A (en) * 2021-01-15 2021-04-23 维沃移动通信有限公司 Gesture recognition method and device, electronic equipment, readable storage medium and chip
WO2022152001A1 (en) * 2021-01-15 2022-07-21 维沃移动通信有限公司 Gesture recognition method and apparatus, electronic device, readable storage medium, and chip
CN112949437A (en) * 2021-02-21 2021-06-11 深圳市优必选科技股份有限公司 Gesture recognition method, gesture recognition device and intelligent equipment
CN113065458A (en) * 2021-03-29 2021-07-02 新疆爱华盈通信息技术有限公司 Voting method and system based on gesture recognition and electronic device
CN113282168A (en) * 2021-05-08 2021-08-20 青岛小鸟看看科技有限公司 Information input method and device of head-mounted display equipment and head-mounted display equipment
WO2022237481A1 (en) * 2021-05-12 2022-11-17 北京百度网讯科技有限公司 Hand-raising recognition method and apparatus, electronic device, and storage medium
CN113221745A (en) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Hand raising identification method and device, electronic equipment and storage medium
CN113221745B (en) * 2021-05-12 2023-09-01 北京百度网讯科技有限公司 Hand lifting identification method and device, electronic equipment and storage medium
CN113537234A (en) * 2021-06-10 2021-10-22 浙江大华技术股份有限公司 Quantity counting method and device, electronic device and computer equipment
CN116597473A (en) * 2023-05-16 2023-08-15 南京莱医特电子科技有限公司 Gesture recognition method, device, equipment and storage medium
CN116597473B (en) * 2023-05-16 2024-01-26 南京莱医特电子科技有限公司 Gesture recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination