CN111832468A - Gesture recognition method and device based on biological recognition, computer equipment and medium - Google Patents


Info

Publication number
CN111832468A
Authority
CN
China
Prior art keywords
dimensional
dimensional joint
joint points
hand
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010659074.1A
Other languages
Chinese (zh)
Other versions
CN111832468B (en)
Inventor
付佐毅
何敏聪
冯颖龙
周宸
陈远旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010659074.1A priority Critical patent/CN111832468B/en
Priority to PCT/CN2020/122833 priority patent/WO2021120834A1/en
Publication of CN111832468A publication Critical patent/CN111832468A/en
Application granted granted Critical
Publication of CN111832468B publication Critical patent/CN111832468B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 40/107: Static hand or arm (recognition of biometric, human-related or animal-related patterns in image or video data)
    • G06F 18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/08: Neural network learning methods
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/56: Extraction of image or video features relating to colour


Abstract

The embodiments of the application belong to the field of artificial intelligence and relate to a gesture recognition method based on biometric recognition, comprising the following steps: acquiring an image to be recognized; inputting the image to be recognized into a target detection network to obtain a hand feature image; determining two-dimensional joint points in the hand feature image to obtain a plurality of thermodynamic diagrams (heatmaps) marked with the two-dimensional joint points; correcting the two-dimensional joint points in the thermodynamic diagrams through a three-dimensional correction model to obtain a two-dimensional joint point topological graph; and performing graph convolution on the two-dimensional joint point topological graph to obtain the gesture category in the image to be recognized. The application also provides a gesture recognition apparatus based on biometric recognition, a computer device and a storage medium. In addition, the application relates to blockchain technology: the gesture categories may be stored in a blockchain. The method and apparatus improve the accuracy of gesture recognition.

Description

Gesture recognition method and device based on biological recognition, computer equipment and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a gesture recognition method and apparatus based on biometric identification, a computer device, and a storage medium.
Background
With the development of artificial intelligence, gesture recognition is finding increasingly wide application in fields such as home entertainment, intelligent driving and smart wearables. Gesture recognition involves liveness detection in biometric recognition: a gesture is recognized by acquiring an image of hand features, so that a subsequent operation can be performed according to the meaning of the gesture, for example, triggering a corresponding instruction.
Recognition accuracy is of great significance to gesture recognition. Traditional gesture recognition technology usually obtains a picture of a hand and performs recognition through a visual algorithm or a neural network. In practical applications, however, hand poses are complex and changeable; for example, the fingers may occlude or wrap around one another. Neither visual algorithms nor neural networks handle such variations effectively, so the accuracy of gesture recognition is low.
Disclosure of Invention
The embodiments of the application aim to provide a gesture recognition method and apparatus based on biometric recognition, a computer device and a storage medium, so as to solve the problem of low gesture recognition accuracy.
In order to solve the above technical problem, an embodiment of the present application provides a gesture recognition method based on biometric identification, which adopts the following technical solutions:
acquiring an image to be recognized;
inputting the image to be recognized into a target detection network to obtain a hand feature image;
determining two-dimensional joint points in the hand feature image to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points;
correcting the two-dimensional joint points in the thermodynamic diagrams through a three-dimensional correction model to obtain a two-dimensional joint point topological graph;
and performing graph convolution on the two-dimensional joint point topological graph to obtain the gesture category in the image to be recognized.
Further, before the step of inputting the image to be recognized into the target detection network to obtain the hand feature image, the method further includes:
acquiring a real hand data set and a virtual hand data set;
performing first training on an initial target detection network according to the virtual hand data set;
and performing second training on the initial target detection network after the first training according to the real hand data set, to obtain the target detection network.
Further, the step of determining two-dimensional joint points in the hand feature image to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points specifically includes:
inputting the hand feature image into a joint point extraction network to obtain a plurality of thermodynamic diagrams;
determining, in each of the plurality of thermodynamic diagrams, the pixel point with the maximum heat value;
and marking the determined pixel points as two-dimensional joint points to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points.
Further, before the step of inputting the hand feature image into a joint point extraction network to obtain a plurality of thermodynamic diagrams, the method further includes:
acquiring a joint point extraction data set;
inputting the hand image in the joint point extraction data set into an initial joint point extraction network to obtain a predictive thermodynamic diagram;
determining a prediction error from the predictive thermodynamic diagram and an annotated thermodynamic diagram in the joint extraction dataset;
and adjusting the initial joint point extraction network according to the prediction error until the prediction error meets the training stopping condition to obtain the joint point extraction network.
Further, the step of correcting the two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model to obtain the two-dimensional joint point topological graph specifically includes:
inputting the thermodynamic diagrams into a three-dimensional correction model, and correcting the two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model to obtain the spatial geometric parameters of the hand described by each two-dimensional joint point;
calculating the three-dimensional joint points corresponding to the two-dimensional joint points according to the spatial geometric parameters;
projecting the obtained three-dimensional joint points to obtain corrected two-dimensional joint points;
and generating a two-dimensional joint point topological graph corresponding to the corrected two-dimensional joint points.
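The projection step above can be sketched with a simple pinhole camera model. This is an illustrative assumption only: the patent does not specify which projection is used, and the `focal` and `center` parameters below are hypothetical.

```python
import numpy as np

def project_to_2d(joints_3d, focal=1.0, center=(0.0, 0.0)):
    """Project 3-D joint points onto the image plane with a pinhole
    model: (x, y, z) -> (f * x / z + cx, f * y / z + cy)."""
    joints_3d = np.asarray(joints_3d, dtype=float)
    x, y, z = joints_3d[:, 0], joints_3d[:, 1], joints_3d[:, 2]
    u = focal * x / z + center[0]
    v = focal * y / z + center[1]
    return np.stack([u, v], axis=1)

# Two corrected 3-D joints at depth 2: with f = 1, projection halves x and y.
pts = project_to_2d([[2.0, 4.0, 2.0], [1.0, 0.0, 2.0]])
print(pts)  # first point projects to (1.0, 2.0)
```

The corrected two-dimensional joint points produced this way would then be connected into the topological graph.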
Further, before the step of correcting the two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model to obtain the two-dimensional joint point topological graph, the method further includes:
acquiring a joint point correction data set;
extracting, from the joint point correction data set, two-dimensional joint points together with the spatial geometric parameters and label data corresponding to them;
and training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the space geometric parameters and the label data to obtain a three-dimensional correction model.
Further, the step of training the initial three-dimensional correction model according to the extracted two-dimensional joint points, the spatial geometric parameters and the label data to obtain the three-dimensional correction model specifically includes:
inputting the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain predicted spatial geometric parameters;
determining a prediction error according to the predicted spatial geometric parameters and the spatial geometric parameters;
determining, according to the label data, whether the hand described by the extracted two-dimensional joint points has an abnormal pose;
when the pose is abnormal, acquiring a correction factor;
and adjusting the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets the training stop condition, to obtain the three-dimensional correction model.
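The patent does not state how the correction factor is combined with the prediction error. One plausible reading, sketched below under that assumption, is that the factor scales the error for samples whose label data marks the pose as abnormal, so that such samples influence the model adjustment more strongly.

```python
import numpy as np

def weighted_correction_loss(pred_params, true_params, abnormal, factor=2.0):
    """Mean squared prediction error on the spatial geometric parameters,
    scaled by a correction factor when the pose is labelled abnormal.
    The weighting scheme and factor value are hypothetical."""
    err = np.mean((np.asarray(pred_params) - np.asarray(true_params)) ** 2)
    return factor * err if abnormal else err

loss_normal = weighted_correction_loss([1.0, 2.0], [1.5, 2.0], abnormal=False)
loss_abnormal = weighted_correction_loss([1.0, 2.0], [1.5, 2.0], abnormal=True)
print(loss_normal, loss_abnormal)  # the abnormal-pose sample weighs twice as much
```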
In order to solve the above technical problem, an embodiment of the present application further provides a gesture recognition apparatus based on biometric identification, which adopts the following technical solutions:
the image acquisition module is used for acquiring an image to be recognized;
the hand detection module is used for inputting the image to be recognized into a target detection network to obtain a hand feature image;
the joint marking module is used for determining two-dimensional joint points in the hand feature image to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points;
the joint correction module is used for correcting the two-dimensional joint points in the thermodynamic diagrams through a three-dimensional correction model to obtain a two-dimensional joint point topological graph;
and the joint convolution module is used for carrying out graph convolution on the two-dimensional joint point topological graph to obtain the gesture category in the image to be recognized.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the gesture recognition method based on biometric recognition when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the gesture recognition method based on biometric recognition described above.
Compared with the prior art, the embodiments of the application mainly have the following beneficial effects: after the image to be recognized is acquired, a hand feature image is first obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through thermodynamic diagrams and marked therein; the two-dimensional joint points are input into a three-dimensional correction model, which can apply three-dimensional constraint and correction to them, improving the accuracy of gesture recognition, and a two-dimensional joint point topological graph is obtained from the corrected two-dimensional joint points; when graph convolution is performed on the two-dimensional joint point topological graph, the topological relationships among the joint points are exploited, further ensuring the accuracy of gesture recognition.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a biometric-based gesture recognition method according to the present application;
FIG. 3 is a schematic diagram of a two-dimensional joint topology in one embodiment;
FIG. 4 is a flowchart of one embodiment of step S203 in FIG. 2;
FIG. 5 is a flowchart of one embodiment of step S204 of FIG. 2;
FIG. 6 is a schematic diagram of an embodiment of a biometric-based gesture-recognition apparatus according to the present application;
FIG. 7 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the gesture recognition method based on biometric recognition provided in the embodiments of the present application is generally executed by a server, and accordingly, the gesture recognition apparatus based on biometric recognition is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow diagram of one embodiment of a biometric-based gesture recognition method in accordance with the present application is shown. The gesture recognition method based on the biological recognition comprises the following steps:
step S201, an image to be recognized is acquired.
In this embodiment, an electronic device (for example, the server shown in fig. 1) on which the gesture recognition method based on biometric recognition runs may communicate with the terminal device through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
The image to be recognized may be an image for gesture recognition.
Specifically, the terminal collects an image to be recognized and sends it to the server. An application or page on the terminal supports the gesture recognition function, and the user operates the terminal as instructed to input the image to be recognized.
The server can also read a stored image from the database as the image to be recognized.
Step S202, inputting the image to be recognized into a target detection network to obtain a hand feature image.
The target detection network may be a network for detecting hand features in the image to be recognized.
Specifically, after acquiring the image to be recognized, the server detects whether hand features exist in it by inputting the image into the trained target detection network. The target detection network identifies the hand features in the image to be recognized and crops them out to obtain a hand feature image.
In one embodiment, the hand features may be human hand features, and may also be hand-like features having the same structure as human hand features, such as toy hands, doll hands, and the like.
In one embodiment, the target detection network is designed as a lightweight network, improved on the basis of an SSD network (Single Shot MultiBox Detector, a target detection algorithm). Specifically, the VGG network in the SSD network may be replaced with a MobileNet V2 network. The VGG network is a deep convolutional neural network that uses a large number of convolutional layers, each with many convolution kernels, so its computational load is large and its demand on computing resources is high. The MobileNet V2 network is a lightweight convolutional neural network that uses a large number of depthwise separable convolutions, so its computational load is smaller and its operation speed higher. Using the MobileNet V2 network in the target detection network can therefore improve the computation speed of the server.
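To illustrate why depthwise separable convolutions reduce computation relative to standard convolutions, the sketch below compares parameter and multiply-accumulate counts for a single layer. The layer sizes are made up for illustration and are not taken from the patent.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Parameters and multiply-accumulates of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = params * h * w  # the full kernel runs at every output position
    return params, macs

def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Cost of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    dw_params = k * k * c_in   # one k x k filter per input channel
    pw_params = c_in * c_out   # the 1 x 1 conv mixes channels
    params = dw_params + pw_params
    macs = params * h * w
    return params, macs

# A typical layer: 3 x 3 conv, 128 -> 128 channels, on a 56 x 56 feature map.
std_p, std_m = standard_conv_cost(128, 128, 3, 56, 56)
sep_p, sep_m = depthwise_separable_cost(128, 128, 3, 56, 56)
print(f"standard: {std_p} params, separable: {sep_p} params, "
      f"ratio ~ {std_p / sep_p:.1f}x")
```

For this layer the separable variant needs roughly an eighth of the parameters and multiply-accumulates, which is the kind of saving that makes the replacement backbone lightweight.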
Step S203, determining two-dimensional joint points in the hand feature image to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points.
The two-dimensional joint points can be joint points in the hand feature image and have two-dimensional coordinate information.
Specifically, the server inputs the hand feature image into a joint point extraction network, which can identify the joint points in the hand feature image. The joint point extraction network generates a plurality of thermodynamic diagrams (heatmaps); the colors of different pixel points in a thermodynamic diagram may differ, and a designated color marks the pixel points that form a two-dimensional joint point. A thermodynamic diagram carries out a probability calculation for each pixel point: the probability that each pixel point belongs to a certain two-dimensional joint point is computed, and pixel points are selected as two-dimensional joint points according to this probability. The joint point extraction network extracts a plurality of two-dimensional joint points and thus generates a plurality of thermodynamic diagrams, each corresponding to one two-dimensional joint point.
In an embodiment, the server divides the hand feature image into image regions of the same size, each composed of several pixel points (for example, 2×2 or 3×3). The probability that an image region belongs to a two-dimensional joint point is then calculated with the image region as the unit, and a plurality of thermodynamic diagrams marked with two-dimensional joint points are generated.
In one embodiment, the joint point extraction network may be a Stacked Hourglass Network, which is commonly used to identify key points in two-dimensional pose recognition.
In one embodiment, the joint extraction network may extract 21 two-dimensional joint points from the hand feature image.
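The maximum-heat-value selection described above can be sketched as follows. This is a minimal numpy illustration: the heatmap sizes and values are made up, and a real network would output 21 heatmaps rather than 2.

```python
import numpy as np

def joints_from_heatmaps(heatmaps):
    """For each heatmap, take the pixel with the maximum heat value
    as that joint's two-dimensional location (row, col)."""
    joints = []
    for hm in heatmaps:  # one heatmap per two-dimensional joint point
        r, c = np.unravel_index(np.argmax(hm), hm.shape)
        joints.append((int(r), int(c)))
    return joints

# Toy example: 2 heatmaps of size 4 x 4 with known peaks.
hms = np.zeros((2, 4, 4))
hms[0, 1, 2] = 0.9   # joint 0 peaks at row 1, col 2
hms[1, 3, 0] = 0.8   # joint 1 peaks at row 3, col 0
print(joints_from_heatmaps(hms))  # [(1, 2), (3, 0)]
```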
Step S204, correcting the two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model to obtain a two-dimensional joint point topological graph.
The three-dimensional correction model may be a model for correcting a two-dimensional joint point.
Specifically, after obtaining the plurality of thermodynamic diagrams marked with two-dimensional joint points, the server inputs them into the three-dimensional correction model. The three-dimensional correction model can apply three-dimensional constraint and correction to the two-dimensional joint points, so that the three-dimensional joint points corresponding to them are distributed more reasonably in space; the two-dimensional joint points are then re-derived from the three-dimensional joint points, completing the correction.
And the server adds the corrected two-dimensional joint points to the initial topological graph according to the positions of the corrected two-dimensional joint points and connects the corrected two-dimensional joint points to obtain a two-dimensional joint point topological graph.
Fig. 3 is a schematic diagram of a two-dimensional joint point topology in one embodiment. Referring to fig. 3, the joint point extraction network extracts 21 joint points from the hand feature image and labels them; to show the correspondence between the joint points and the hand, the hand feature image is included in fig. 3.
Step S205, performing graph convolution on the two-dimensional joint point topological graph to obtain the gesture category in the image to be recognized.
Specifically, the joint points of the hand have direct geometric relationships with one another. To improve the accuracy of gesture recognition, a graph convolution operation can be performed on the two-dimensional joint point topological graph, so that the topological relationships among the joint points are exploited.
In graph convolution, each two-dimensional joint point is treated as a vertex, and the calculation range of each vertex is the set consisting of the vertex itself and its adjacent vertices. For example, referring to fig. 3, when calculating vertex 17, the vertices involved are vertex 17, vertex 0, vertex 13 and vertex 18; when calculating vertex 14, the vertices involved are vertex 14, vertex 13 and vertex 15. The lines between vertices are the edges of the topological graph.
In one embodiment, the computational formula for graph convolution is:

    f_out(x_i) = (1 / n_i) · Σ_{x_j ∈ I(p(x_i))} w(e_ij) · f_in(x_j)

wherein V(i) is the edge set of a vertex (because the adjacent vertices differ, the edges involved in the calculation differ for each vertex); n_i is the number of vertices involved in the computation of the vertex; f_in(x_j) is the vertex value of each vertex, and the graph convolution network outputs the set of values f_out(x_i) based on the input; I(p(x_i)) is the set of vertices involved in the computation of the vertex; and the edge weights w(e_ij) are training parameters of the graph convolution network, created when the network is initialized and continuously updated during the calculation.
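A minimal numpy sketch of this graph convolution, in which each vertex averages the weighted values of itself and its adjacent vertices. The graph, features and unit edge weights below are made up for illustration; in the real network the edge weights are learned.

```python
import numpy as np

def graph_conv(features, adjacency, weights):
    """One graph-convolution step: each vertex averages the weighted
    values of itself and its adjacent vertices."""
    n = features.shape[0]
    a_hat = adjacency + np.eye(n)        # include the vertex itself
    out = np.zeros_like(features)
    for i in range(n):
        neigh = np.nonzero(a_hat[i])[0]  # vertex set involved for vertex i
        n_i = len(neigh)                 # number of vertices involved
        out[i] = (weights[i, neigh] @ features[neigh]) / n_i
    return out

# Chain 0 - 1 - 2 (like joints 13 - 14 - 15 in fig. 3), unit edge weights.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = np.array([3.0, 6.0, 9.0])
w = np.ones((3, 3))
print(graph_conv(feats, adj, w))  # vertex 1 -> (3 + 6 + 9) / 3 = 6.0
```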
The graph convolution network is connected to a softmax layer; the softmax layer calculates a plurality of probabilities, each corresponding to one gesture category, and the gesture category with the maximum probability is selected as the gesture category in the image to be recognized.
It is emphasized that, in order to further ensure the privacy and security of the recognized gesture categories, the gesture categories may also be stored in nodes of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each data block containing the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In this embodiment, after the image to be recognized is acquired, a hand feature image is first obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through thermodynamic diagrams and marked therein; the two-dimensional joint points are input into a three-dimensional correction model, which can apply three-dimensional constraint and correction to them, improving the accuracy of gesture recognition, and a two-dimensional joint point topological graph is obtained from the corrected two-dimensional joint points; when graph convolution is performed on the two-dimensional joint point topological graph, the topological relationships among the joint points are exploited, further ensuring the accuracy of gesture recognition.
In one embodiment, before step S202, the method may further include: acquiring a real hand data set and a virtual hand data set; performing first training on an initial target detection network according to the virtual hand data set; and performing second training on the initial target detection network after the first training according to the real hand data set, to obtain the target detection network.
Wherein the real hand dataset may be a dataset based on human hand acquisition; the virtual hand dataset may be a dataset based on virtual hand synthesis, e.g. a virtual hand dataset obtained by building hand features and acquiring relevant data by three-dimensional modeling.
The real hand dataset and the virtual hand dataset each include RGB images of the hand (images based on the RGB color model, in which various colors are obtained by varying and superimposing the Red, Green and Blue color channels), two-dimensional joint points, hand feature annotation data, and the like.
In one embodiment, the real hand dataset may be an HMKA dataset (Hands with Manual Keypoint Annotations) and the virtual hand dataset may be an HSD dataset (Hands from Synthetic Data).
The initial target detection network may be a model for which target detection training has not been completed.
Specifically, the server needs to obtain the target detection network through training, and so acquires a real hand dataset and a virtual hand dataset for target detection training. The server first performs a first training on the initial target detection network according to the virtual hand dataset; after the first training is completed, the initial target detection network has basic hand detection capability. It then performs a second training on the first-trained network according to the real hand dataset to improve the model's hand detection capability in real environments, and obtains the target detection network once the second training is completed. The first training may serve as pre-training for the second.
In the embodiment, the initial target detection network is trained according to the virtual hand data set to have the hand characteristic detection capability, and then the detection capability of the initial target detection network on the real environment is strengthened according to the real hand data set, so that the detection accuracy of the trained target detection network is ensured.
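The two-stage schedule above (pre-train on the virtual hand dataset, then fine-tune on the real hand dataset) can be sketched as follows. The `Detector` class, its `train_epoch` method and the epoch counts are hypothetical placeholders used only to show the ordering, not the patent's actual network or training code:

```python
class Detector:
    """Placeholder for the initial target detection network."""
    def __init__(self):
        self.history = []  # records which dataset each training epoch used

    def train_epoch(self, dataset_name):
        # A real implementation would run one optimization epoch here.
        self.history.append(dataset_name)

def two_stage_training(detector, virtual_epochs=2, real_epochs=2):
    # First training: the virtual (synthetic) hand data gives the network
    # its basic hand detection capability.
    for _ in range(virtual_epochs):
        detector.train_epoch("virtual")
    # Second training: the real hand data strengthens detection capability
    # in real environments.
    for _ in range(real_epochs):
        detector.train_epoch("real")
    return detector

det = two_stage_training(Detector())
print(det.history)  # ['virtual', 'virtual', 'real', 'real']
```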
Further, as shown in fig. 4, the step S203 may include:
step S2031, inputting the hand feature images into a joint point extraction network to obtain a plurality of thermodynamic diagrams.
Wherein the joint extraction network may be a network for identifying two-dimensional joints.
Specifically, the server inputs the hand feature image into the joint point extraction network. Rather than directly predicting the positions of the two-dimensional joint points, the network generates a thermodynamic diagram for each of them, which avoids training that is difficult to converge or yields low accuracy once completed. The server thus obtains a plurality of thermodynamic diagrams, each corresponding to one two-dimensional joint point.
Step S2032, determining pixel points with the maximum heat value in the thermodynamic diagrams respectively.
Specifically, each pixel point in a thermodynamic diagram has a heat value, and the magnitude of the heat value is proportional to the probability that the two-dimensional joint point is located at that pixel. The server traverses every pixel point in each thermodynamic diagram, compares their heat values, and determines the pixel point with the maximum heat value.
And step S2033, marking the determined pixel points as two-dimensional joint points to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points.
Specifically, for each thermodynamic diagram, the server marks the pixel point with the maximum heat value as the two-dimensional joint point in the thermodynamic diagram, so as to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint point.
In the embodiment, a thermodynamic diagram of the hand feature image is generated, the thermodynamic value of the pixel point in the thermodynamic diagram represents the probability that the pixel point belongs to the two-dimensional joint point, and the pixel point with the maximum thermodynamic value is selected from the thermodynamic diagram to serve as the two-dimensional joint point, so that the accuracy of joint point identification is guaranteed.
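Steps S2031-S2033 amount to an argmax over each heatmap. A minimal sketch, with illustrative heatmap sizes (the real network's output resolution is a design hyper-parameter):

```python
import numpy as np

def joints_from_heatmaps(heatmaps):
    """For each H x W thermodynamic diagram, return the (row, col) of the
    pixel with the maximum heat value -- the labeled two-dimensional joint
    point for that diagram."""
    joints = []
    for hm in heatmaps:
        # argmax gives a flat index; unravel_index converts it to (row, col).
        joints.append(np.unravel_index(np.argmax(hm), hm.shape))
    return joints

# Tiny example: two 3x3 heatmaps with known peaks.
hms = np.zeros((2, 3, 3))
hms[0, 1, 2] = 0.9
hms[1, 0, 0] = 0.7
print(joints_from_heatmaps(hms))  # [(1, 2), (0, 0)]
```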
Further, before step S2031, the method further comprises: acquiring a joint point extraction data set; inputting hand images in the joint point extraction data set into an initial joint point extraction network to obtain a prediction thermodynamic diagram; determining a prediction error according to the prediction thermodynamic diagram and a labeled thermodynamic diagram in the joint extraction data set; and adjusting the initial joint point extraction network according to the prediction error until the prediction error meets the training stopping condition to obtain the joint point extraction network.
Wherein the joint point extraction dataset may be a dataset used to train the initial joint point extraction network. It may include hand images and the annotated thermodynamic diagrams corresponding to them. In one embodiment, the joint point extraction dataset may be the RHD (Rendered Hand pose Dataset) dataset.
The initial joint extraction network may be a joint extraction network that has not been trained. The predictive thermodynamic diagram may be a thermodynamic diagram predicted by an initial joint extraction network on a two-dimensional joint. The annotated thermodynamic diagram may be a pre-annotated thermodynamic diagram.
Specifically, the joint point extraction network is obtained through training. The server acquires a joint point extraction dataset and inputs the hand images in it into the initial joint point extraction network, which identifies the two-dimensional joint points in the hand images to obtain prediction thermodynamic diagrams.
The server extracts a labeling thermodynamic diagram corresponding to the hand image from the joint extraction dataset, uses the labeling thermodynamic diagram as labeling data in training, and calculates a prediction error based on the labeling thermodynamic diagram and the prediction thermodynamic diagram.
And the server adjusts the parameters in the initial joint extraction network by taking the prediction error reduction as a target, continues training after the parameters are adjusted every time, and stops training when the prediction error meets the training stopping condition to obtain the joint extraction network. Wherein the training stop condition may be that the prediction error is smaller than a preset error threshold.
In one embodiment, the prediction error is calculated as:
L = Σ_{h=1}^{H} Σ_{w=1}^{W} ( Y_HM(h, w) − G(Y_2D)(h, w) )²
where Y_HM is the prediction thermodynamic diagram and G(Y_2D) is the annotation thermodynamic diagram. H and W are the thermodynamic diagram dimensions, i.e. the output-layer feature map size, a set of hyper-parameters fixed when the initial joint point extraction network is designed. When calculating the prediction error, the prediction thermodynamic diagram and the annotation thermodynamic diagram are compared point by point, i.e. the differences are summed over the H and W dimensions.
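A minimal sketch of this point-by-point heatmap loss, assuming a plain sum of squared differences over the H and W dimensions (the patent does not spell out the exact distance function, so the squared difference is an assumption):

```python
import numpy as np

def heatmap_loss(y_hm, g_y2d):
    """Point-by-point squared difference between the prediction thermodynamic
    diagram Y_HM and the annotation thermodynamic diagram G(Y_2D), summed
    over the H and W dimensions."""
    return float(np.sum((y_hm - g_y2d) ** 2))

pred = np.full((4, 4), 0.5)      # H = W = 4, illustrative values
label = np.zeros((4, 4))
print(heatmap_loss(pred, label))  # 16 pixels * 0.25 = 4.0
```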
In this embodiment, when the initial joint point extraction network is trained according to the joint point extraction data set, a prediction error is calculated according to the output prediction thermodynamic diagram and the output labeling thermodynamic diagram, and the initial joint point extraction network is adjusted according to the prediction error until the prediction error meets the training stop condition, so that the network can accurately identify the joint point when the training is finished.
Further, as shown in fig. 5, the step S204 may include:
step S2041, inputting the thermodynamic diagrams into a three-dimensional correction model, correcting two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model, and obtaining the space geometric parameters of the hand recorded by the two-dimensional joint points.
The spatial geometrical parameters may be parameters describing hand shape characteristics and hand surfaces, among others.
The hand recorded in the hand feature image is in a complex real environment and may exhibit pose abnormalities, such as self-occlusion of the hand, finger intertwining, finger eversion, or non-coplanar joint points within the palm; such complex gesture shapes can affect gesture recognition. The three-dimensional correction model takes the spatial geometric characteristics of the two-dimensional joint points into account during calculation and can correct for the influence of complex gesture shapes on gesture recognition.
Specifically, the server inputs the thermodynamic diagram labeled with the two-dimensional joint points into a three-dimensional correction model, the three-dimensional correction model calculates the space geometric characteristics of the hand recorded by the hand characteristic image according to the two-dimensional joint points, three-dimensional constraint is carried out on the two-dimensional joint points in the calculation process, correction of the two-dimensional joint points is achieved, and space geometric parameters are obtained after calculation is finished.
The space geometric parameters may be composed of a shape parameter and a pose parameter: the shape parameter describes hand shape characteristics, such as finger length, finger size and palm thickness; the pose parameter describes hand surface information, such as deformation of the hand surface.
In one embodiment, the three-dimensional correction model may be a MANO model, which can output the joint points of the hand based on shape and pose parameters. Before the MANO model is used, it needs to be trained so that it can output shape and pose parameters according to the thermodynamic diagrams.
In one embodiment, the thermodynamic diagrams output by the joint point extraction network are input into a two-dimensional-to-three-dimensional projection network, and the shape and pose parameters are then output by the MANO layer after correction and calculation by the MANO model.
In one embodiment, after the server identifies the two-dimensional joint points according to the thermodynamic diagrams, it may label them in the hand feature image and input the labeled hand feature image into the three-dimensional correction model; the hand feature image is then converted into a plurality of thermodynamic diagrams before being processed by the three-dimensional correction model.
Step S2042, calculating three-dimensional joint points corresponding to the two-dimensional joint points through the space geometric parameters.
Specifically, the three-dimensional correction model can map out three-dimensional joint points of the hand according to the space geometric parameters and build a three-dimensional hand mesh. Therefore, the three-dimensional correction model can respectively calculate each three-dimensional joint point according to the space geometric parameters, and the correspondence from the two-dimensional joint point to the three-dimensional joint point in the thermodynamic diagram is realized.
And step S2043, projecting the obtained three-dimensional joint points to obtain the corrected two-dimensional joint points.
Specifically, after the three-dimensional correction model obtains the three-dimensional joint points, the three-dimensional joint points are projected according to a three-dimensional to two-dimensional projection formula to obtain a new set of two-dimensional joint points, and the two-dimensional joint points obtained by projection are the corrected two-dimensional joint points.
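The patent does not fix the three-dimensional-to-two-dimensional projection formula. A common choice is a pinhole camera model, sketched below with assumed intrinsics (`fx`, `fy`, `cx`, `cy` are hypothetical values for illustration):

```python
import numpy as np

def project_to_2d(joints_3d, fx, fy, cx, cy):
    """Project three-dimensional joint points (N, 3) in camera coordinates
    onto the image plane with a pinhole model:
        u = fx * x / z + cx,   v = fy * y / z + cy
    yielding the corrected two-dimensional joint points."""
    x, y, z = joints_3d[:, 0], joints_3d[:, 1], joints_3d[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)

pts = np.array([[0.1, -0.05, 0.5]])  # one joint, 0.5 m from the camera
print(project_to_2d(pts, fx=500.0, fy=500.0, cx=320.0, cy=240.0))
# maps to approximately pixel (420.0, 190.0)
```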
Step S2044, a two-dimensional joint point topological graph corresponding to the corrected two-dimensional joint point is generated.
Specifically, the modified two-dimensional joint points are ordered and have a fixed connection relationship. And the server adds the corrected two-dimensional joint points to the initial topological graph and connects the two-dimensional joint points according to a preset fixed connection relation to obtain a two-dimensional joint point topological graph.
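A sketch of building the two-dimensional joint point topological graph from a fixed connection relation. The patent only states that the joints are ordered and have a preset fixed connection relation, so the particular 21-joint skeleton and edge list below are assumptions for illustration:

```python
import numpy as np

# A common 21-joint hand skeleton (wrist = 0, four joints per finger);
# this edge list is an assumed connection relation, not the patent's.
HAND_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4),        # thumb
              (0, 5), (5, 6), (6, 7), (7, 8),        # index
              (0, 9), (9, 10), (10, 11), (11, 12),   # middle
              (0, 13), (13, 14), (14, 15), (15, 16), # ring
              (0, 17), (17, 18), (18, 19), (19, 20)] # pinky

def build_adjacency(num_joints=21, edges=HAND_EDGES):
    """Build the symmetric adjacency matrix of the two-dimensional joint
    point topological graph, ready to be consumed by graph convolution."""
    adj = np.zeros((num_joints, num_joints))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

adj = build_adjacency()
print(int(adj.sum()))  # 40: each of the 20 edges counted in both directions
```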
In the embodiment, the space geometric parameters of the thermodynamic diagram are obtained through the three-dimensional correction model, and three-dimensional constraint and correction are performed on the two-dimensional joint points in the process, so that the influence of complex gestures in a real environment on recognition is reduced; and mapping the three-dimensional joint points corresponding to the two-dimensional joint points according to the spatial geometric parameters, and then projecting the three-dimensional joint points to obtain the corrected two-dimensional joint points, wherein the corrected two-dimensional joint points have more accurate position information, so that a more accurate two-dimensional joint point topological graph is generated, and the accuracy of gesture recognition through the two-dimensional joint point topological graph is ensured.
Further, before the step 204, the method further includes: acquiring a joint point correction data set; extracting two-dimensional joint points, and space geometric parameters and label data corresponding to the extracted two-dimensional joint points from the joint point correction data set; and training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the space geometric parameters and the label data to obtain a three-dimensional correction model.
Wherein the joint point correction dataset may be a dataset used to train the initial three-dimensional correction model; it may be the FreiHAND dataset, which records information such as two-dimensional joint points, three-dimensional joint points, space geometric parameters of hands, and label data. The label data may identify whether a pose abnormality exists for the hand, or may be a three-dimensional model of the hand.
Specifically, the server acquires a joint point correction data set, extracts a two-dimensional joint point, a space geometric parameter corresponding to the two-dimensional joint point and label data from the joint point correction data set, takes the two-dimensional joint point as the input of an initial three-dimensional correction model, takes the space geometric parameter as expected output, and trains the initial three-dimensional correction model according to the label data. The three-dimensional correction model obtained after training can calculate the space geometric parameters according to the two-dimensional joint points.
In this embodiment, the two-dimensional joint points in the joint point correction data set are used as input, the spatial geometric parameters are used as expected output, and the initial three-dimensional correction model is trained according to the label data, so that the three-dimensional correction model after training can accurately calculate the spatial geometric parameters of the hand according to the two-dimensional joint points.
Further, the step of training the initial three-dimensional correction model according to the extracted two-dimensional joint points, the space geometric parameters and the label data to obtain the three-dimensional correction model specifically comprises: inputting the extracted two-dimensional joint points into an initial three-dimensional correction model to obtain space geometric prediction parameters; determining a prediction error according to the space geometric prediction parameters and the space geometric parameters; determining whether the pose of the hand recorded by the extracted two-dimensional joint points is abnormal or not according to the label data; when the pose is abnormal, acquiring a correction factor; and adjusting the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets the training stopping condition to obtain the three-dimensional correction model.
The space geometric prediction parameters can be parameters which are obtained by predicting the initial three-dimensional correction model according to the two-dimensional joint points and mark the space geometric characteristics of the hand.
The pose abnormality may include phenomena such as self-occlusion of the hand, finger intertwining, finger eversion, and non-coplanar joint points within the palm, and may also include abnormal joint point positions, inconsistency between the hand contour and the label data, and the like.
Specifically, the server calculates the two-dimensional joint point through the initial three-dimensional correction model to obtain a space geometric prediction parameter, and determines a prediction error according to the space geometric prediction parameter and the space geometric parameter.
Meanwhile, whether the hand recorded by the two-dimensional joint points has a pose abnormality is determined from the label data. A pose abnormality can negatively affect gesture recognition, and the initial three-dimensional correction model's ability to predict the space geometric parameters in this situation needs to be strengthened; therefore a preset correction factor is applied to the model. The correction factor acts as an excitation, forcing the initial three-dimensional correction model to correct the two-dimensional joint points more reasonably and thereby calculate more reasonable space geometric prediction parameters.
And adjusting the model parameters in the initial three-dimensional correction model under the action of the correction factor and the prediction error until the prediction error meets the training stopping condition to obtain the three-dimensional correction model.
The trained three-dimensional correction model can accurately calculate the space geometric parameters from the two-dimensional joint points; the process of calculating the space geometric parameters is also the process of correcting the two-dimensional joint points. Through training, the initial three-dimensional correction model acquires the ability to predict the space geometric parameters, and the resulting three-dimensional correction model can overcome pose abnormalities, improving the accuracy of gesture recognition.
In one embodiment, there may be only one correction factor, or a plurality of different correction factors. When only one correction factor exists and a pose abnormality is present, that sole factor is directly obtained to correct the initial three-dimensional correction model. When multiple correction factors exist and a pose abnormality is determined, an abnormality evaluation value can be calculated from the space geometric parameters; this value represents the degree of disorder of the pose. Different abnormality evaluation values correspond to correction factors of different magnitudes, and the server selects the correction factor corresponding to the abnormality evaluation value to correct the training of the initial three-dimensional correction model.
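A minimal sketch of selecting a correction factor by abnormality evaluation value; the thresholds, factor magnitudes and table shape are all hypothetical, since the patent only describes the mapping qualitatively:

```python
def select_correction_factor(anomaly_value, factor_table):
    """Pick the correction factor whose abnormality-threshold bracket
    contains the abnormality evaluation value. factor_table holds
    (upper_threshold, factor) pairs; larger anomaly values fall into
    later brackets and receive stronger correction."""
    for threshold, factor in sorted(factor_table):
        if anomaly_value <= threshold:
            return factor
    # Beyond every threshold: fall back to the strongest factor.
    return sorted(factor_table)[-1][1]

# Hypothetical table: larger abnormality evaluation values -> larger factors.
table = [(0.3, 1.1), (0.6, 1.5), (1.0, 2.0)]
print(select_correction_factor(0.45, table))  # 1.5
```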
In one embodiment, after training is finished, when gesture recognition is performed by applying the three-dimensional correction model, after the three-dimensional correction model calculates the space geometric parameters, the abnormality evaluation value is calculated according to the space geometric parameters, and the abnormality evaluation value and the final gesture recognition result are stored together. The three-dimensional correction model can also compare the abnormal evaluation value with an abnormal threshold value, if the abnormal evaluation value is larger than a preset abnormal threshold value, the processing of gesture recognition is stopped, the image is displayed abnormally through the terminal, and the terminal is reminded to acquire the image to be recognized again so as to perform gesture recognition again.
In this embodiment, when a pose abnormality of the hand is determined during training, an additional correction factor is obtained to correct the initial three-dimensional correction model. The correction factor and the prediction error act on the initial three-dimensional correction model simultaneously, so that it can predict more reasonable space geometric prediction parameters while correcting the two-dimensional joint points more reasonably, improving gesture recognition accuracy.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, the processes of the embodiments of the methods described above can be included. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which the steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential: they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 6, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a gesture recognition apparatus based on biometric recognition, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 6, the gesture recognition apparatus 300 based on biometric recognition according to the present embodiment includes: an image acquisition module 301, a hand detection module 302, a joint labeling module 303, a joint modification module 304 and a joint convolution module 305. Wherein:
an image obtaining module 301, configured to obtain an image to be identified.
And the hand detection module 302 is configured to input the image to be recognized into the target detection network, so as to obtain a hand feature image.
And the joint labeling module 303 is configured to determine two-dimensional joint points in the hand feature image, and obtain a plurality of thermodynamic diagrams labeled with the two-dimensional joint points.
And the joint correction module 304 is used for correcting the two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model to obtain a two-dimensional joint point topological graph.
And the joint convolution module 305 is configured to perform graph convolution on the two-dimensional joint point topological graph to obtain a gesture category in the image to be recognized.
In this embodiment, after the image to be identified is obtained, a hand feature image is obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through thermodynamic diagrams and labeled in them; the two-dimensional joint points are then input into a three-dimensional correction model, which applies three-dimensional constraints and corrections to them, improving gesture recognition accuracy, and a two-dimensional joint point topological graph is built from the corrected joint points; when graph convolution is performed on this topological graph, the topological relation among the nodes is exploited, further ensuring the accuracy of gesture recognition.
In some optional implementations of the present embodiment, the gesture recognition apparatus 300 based on biometric recognition further includes: the device comprises a data set acquisition module, a first training module and a second training module. Wherein:
and the data set acquisition module is used for acquiring the real hand data set and the virtual hand data set.
And the first training module is used for carrying out first training on the initial target detection network according to the virtual hand data set.
And the second training module is used for performing a second training on the first-trained initial target detection network according to the real hand dataset to obtain the target detection network.
In the embodiment, the initial target detection network is trained according to the virtual hand data set to have the hand characteristic detection capability, and then the detection capability of the initial target detection network on the real environment is strengthened according to the real hand data set, so that the detection accuracy of the trained target detection network is ensured.
In some optional implementations of the present embodiment, the hand detection module 302 includes: the image input sub-module, the pixel determination sub-module and the pixel labeling sub-module. Wherein:
and the image input submodule inputs the hand characteristic images into the joint point extraction network to obtain a plurality of thermodynamic diagrams.
And the pixel determination submodule is used for respectively determining the pixel point with the maximum heat force value in the plurality of thermodynamic diagrams.
And the pixel labeling submodule is used for labeling the determined pixel points as two-dimensional joint points to obtain a plurality of thermodynamic diagrams labeled with the two-dimensional joint points.
In the embodiment, a thermodynamic diagram of the hand feature image is generated, the thermodynamic value of the pixel point in the thermodynamic diagram represents the probability that the pixel point belongs to the two-dimensional joint point, and the pixel point with the maximum thermodynamic value is selected from the thermodynamic diagram to serve as the two-dimensional joint point, so that the accuracy of joint point identification is guaranteed.
In some optional implementations of this embodiment, the hand detection module 302 further includes: the device comprises an acquisition submodule, an input submodule, a determination submodule and an adjustment submodule. Wherein:
and the acquisition submodule is used for acquiring the joint point extraction data set.
And the input submodule is used for inputting the hand images in the joint point extraction data set into the initial joint point extraction network to obtain the prediction thermodynamic diagram.
And the determining submodule is used for determining a prediction error according to the prediction thermodynamic diagram and the labeled thermodynamic diagram in the joint point extraction data set.
And the adjusting submodule is used for adjusting the initial joint point extraction network according to the prediction error until the prediction error meets the training stopping condition, so that the joint point extraction network is obtained.
In this embodiment, when the initial joint point extraction network is trained according to the joint point extraction data set, a prediction error is calculated according to the output prediction thermodynamic diagram and the output labeling thermodynamic diagram, and the initial joint point extraction network is adjusted according to the prediction error until the prediction error meets the training stop condition, so that the network can accurately identify the joint point when the training is finished.
In some optional implementations of the present embodiment, the joint modification module 304 further includes: the device comprises a parameter acquisition submodule, a three-dimensional determination submodule, a three-dimensional projection submodule and a topology generation submodule. Wherein:
and the parameter acquisition submodule is used for inputting the thermodynamic diagrams into the three-dimensional correction model so as to correct the two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model and obtain the space geometric parameters of the hand recorded by each two-dimensional joint point.
And the three-dimensional determining submodule is used for calculating the three-dimensional joint points corresponding to the two-dimensional joint points through the space geometric parameters.
And the three-dimensional projection submodule is used for projecting the obtained three-dimensional joint points to obtain the corrected two-dimensional joint points.
And the topology generation submodule is used for generating a two-dimensional joint point topological graph corresponding to the corrected two-dimensional joint point.
In the embodiment, the space geometric parameters of the thermodynamic diagram are obtained through the three-dimensional correction model, and three-dimensional constraint and correction are performed on the two-dimensional joint points in the process, so that the influence of complex gestures in a real environment on recognition is reduced; and mapping the three-dimensional joint points corresponding to the two-dimensional joint points according to the spatial geometric parameters, and then projecting the three-dimensional joint points to obtain the corrected two-dimensional joint points, wherein the corrected two-dimensional joint points have more accurate position information, so that a more accurate two-dimensional joint point topological graph is generated, and the accuracy of gesture recognition through the two-dimensional joint point topological graph is ensured.
In some optional implementations of the present embodiment, the above gesture recognition apparatus 300 based on biometric recognition further includes: the device comprises a correction acquisition module, a data set extraction module and a model training module. Wherein:
and the correction acquisition module is used for acquiring the joint point correction data set.
And the data set extraction module is used for extracting the two-dimensional joint points, the space geometric parameters corresponding to the extracted two-dimensional joint points and the label data from the joint point correction data set.
And the model training module is used for training the initial three-dimensional correction model according to the extracted two-dimensional joint points, the space geometric parameters and the label data to obtain the three-dimensional correction model.
In this embodiment, the two-dimensional joint points in the joint point correction data set are used as input, the spatial geometric parameters are used as expected output, and the initial three-dimensional correction model is trained according to the label data, so that the three-dimensional correction model after training can accurately calculate the spatial geometric parameters of the hand according to the two-dimensional joint points.
In some optional implementations of this embodiment, the model training module includes: the system comprises a two-dimensional input submodule, an error determination submodule, a pose determination submodule, a factor acquisition submodule and a model adjustment submodule. Wherein:
The two-dimensional input submodule is used to input the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain spatial geometric prediction parameters.
The error determination submodule is used to determine a prediction error from the spatial geometric prediction parameters and the spatial geometric parameters.
The pose determination submodule is used to determine, according to the label data, whether the pose of the hand recorded by the extracted two-dimensional joint points is abnormal.
The factor acquisition submodule is used to acquire a correction factor when the pose is abnormal.
The model adjustment submodule is used to adjust the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets the training stop condition, yielding the three-dimensional correction model.
In this embodiment, when the pose of the hand is determined to be abnormal during training, an additional correction factor is acquired to correct the initial three-dimensional correction model. Because the correction factor and the prediction error act on the initial three-dimensional correction model simultaneously, the model learns to predict more reasonable spatial geometric prediction parameters and to correct the two-dimensional joint points more reasonably, improving gesture recognition accuracy.
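One way to read this is as a loss term in which the correction factor scales the prediction error whenever the label data marks the pose as abnormal. The multiplicative form and the factor value of 2.0 below are assumptions for illustration, since the application does not specify how the two quantities are combined.

```python
def training_loss(pred_params, true_params, pose_abnormal, correction_factor=2.0):
    """Mean squared prediction error between spatial geometric prediction
    parameters and the labelled spatial geometric parameters, amplified by
    the correction factor when the recorded hand pose is abnormal."""
    mse = sum((p - t) ** 2 for p, t in zip(pred_params, true_params)) / len(pred_params)
    return mse * correction_factor if pose_abnormal else mse

normal_loss = training_loss([1.0, 2.0], [0.0, 2.0], pose_abnormal=False)
abnormal_loss = training_loss([1.0, 2.0], [0.0, 2.0], pose_abnormal=True)
```

With this weighting, abnormal-pose samples contribute a larger gradient, so the model is pushed harder to produce plausible parameters exactly where naive training would go wrong.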
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 7, fig. 7 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. Note that only a computer device 4 having components 41-43 is shown; not all of the illustrated components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a laptop, a palmtop computer, a cloud server or another computing device. The computer device can interact with a user through a keyboard, a mouse, a remote controller, a touch panel, a voice control device, or the like.
The memory 41 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used to store the operating system installed on the computer device 4 and various types of application software, such as the computer readable instructions of the gesture recognition method based on biometric recognition. The memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as executing computer readable instructions of the gesture recognition method based on biometric recognition.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment may perform the steps of the gesture recognition method based on biometric recognition described in the above embodiments.
In this embodiment, after the image to be identified is obtained, a hand feature image is obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through thermodynamic diagrams and marked therein; the two-dimensional joint points are then input into a three-dimensional correction model, which applies three-dimensional constraints and corrections to them, improving gesture recognition accuracy, and a two-dimensional joint point topological graph is obtained from the corrected two-dimensional joint points; finally, when graph convolution is performed on the two-dimensional joint point topological graph, the topological relations among the nodes are exploited, further ensuring the accuracy of gesture recognition.
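The graph-convolution step over the joint topology can be sketched as below. The 5-node chain is a toy finger-like skeleton rather than the full hand graph, and the layer weights are random stand-ins for learned parameters; the symmetric normalization is the common Kipf-and-Welling-style formulation, which this application does not itself prescribe.

```python
import numpy as np

# Toy joint topology: a 5-node chain standing in for one finger's joints.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.eye(n)                         # adjacency matrix with self-loops
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * d[:, None] * d[None, :]   # symmetric normalization D^-1/2 A D^-1/2

rng = np.random.default_rng(1)
X = rng.normal(size=(n, 2))           # corrected 2D joint coordinates as node features
W = rng.normal(size=(2, 4))           # layer weights (random stand-ins, not learned)
H = np.maximum(A_hat @ X @ W, 0.0)    # one ReLU graph-convolution layer
```

Because `A_hat` mixes each joint's features with those of its topological neighbours, the convolved features encode the relative arrangement of joints, which is what makes the topology useful for classifying the gesture.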
The present application further provides another embodiment: a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the biometric-based gesture recognition method described above.
In this embodiment, after the image to be identified is obtained, a hand feature image is obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through thermodynamic diagrams and marked therein; the two-dimensional joint points are then input into a three-dimensional correction model, which applies three-dimensional constraints and corrections to them, improving gesture recognition accuracy, and a two-dimensional joint point topological graph is obtained from the corrected two-dimensional joint points; finally, when graph convolution is performed on the two-dimensional joint point topological graph, the topological relations among the nodes are exploited, further ensuring the accuracy of gesture recognition.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk) that includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative and not restrictive of the invention, and that the appended drawings illustrate preferred embodiments without limiting its scope. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the foregoing embodiments may still be modified, or some of their features replaced by equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A gesture recognition method based on biological recognition is characterized by comprising the following steps:
acquiring an image to be identified;
inputting the image to be recognized into a target detection network to obtain a hand characteristic image;
determining two-dimensional joint points in the hand characteristic image to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points;
correcting the two-dimensional joint points in the thermodynamic diagrams through a three-dimensional correction model to obtain a two-dimensional joint point topological graph;
and carrying out graph convolution on the two-dimensional joint point topological graph to obtain the gesture category in the image to be recognized.
2. The gesture recognition method based on biometric recognition according to claim 1, wherein before the step of inputting the image to be recognized into a target detection network to obtain a hand feature image, the method further comprises:
acquiring a real hand data set and a virtual hand data set;
performing first training on an initial target detection network according to the virtual hand data set;
and performing second training on the initial target detection network after the first training according to the real hand data set to obtain the target detection network.
3. The method according to claim 1, wherein the step of determining two-dimensional joint points in the hand feature image to obtain a plurality of thermodynamic diagrams labeled with the two-dimensional joint points comprises:
inputting the hand feature images into a joint point extraction network to obtain a plurality of thermodynamic diagrams;
respectively determining a pixel point with the maximum heat force value in the plurality of thermodynamic diagrams;
and marking the determined pixel points as two-dimensional joint points to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points.
4. The method of claim 3, wherein before the step of inputting the hand feature image into a joint point extraction network to obtain a plurality of thermodynamic diagrams, the method further comprises:
acquiring a joint point extraction data set;
inputting the hand image in the joint point extraction data set into an initial joint point extraction network to obtain a predictive thermodynamic diagram;
determining a prediction error from the predictive thermodynamic diagram and an annotated thermodynamic diagram in the joint extraction dataset;
and adjusting the initial joint point extraction network according to the prediction error until the prediction error meets the training stopping condition to obtain the joint point extraction network.
5. The gesture recognition method based on biometric recognition according to claim 1, wherein the step of correcting the two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model to obtain the two-dimensional joint point topological graph specifically comprises:
inputting the thermodynamic diagrams into a three-dimensional correction model, correcting two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model, and obtaining space geometric parameters of the hand recorded by each two-dimensional joint point;
calculating three-dimensional joint points corresponding to the two-dimensional joint points according to the space geometric parameters;
projecting the obtained three-dimensional joint points to obtain corrected two-dimensional joint points;
and generating a two-dimensional joint point topological graph corresponding to the corrected two-dimensional joint point.
6. The method of claim 1, wherein before the step of correcting the two-dimensional joint points in the thermodynamic diagrams through the three-dimensional correction model to obtain the two-dimensional joint point topological graph, the method further comprises:
acquiring a joint point correction data set;
extracting two-dimensional joint points, and space geometric parameters and label data corresponding to the extracted two-dimensional joint points from the joint point correction data set;
and training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the space geometric parameters and the label data to obtain a three-dimensional correction model.
7. The gesture recognition method based on biometric recognition according to claim 6, wherein the step of training the initial three-dimensional correction model according to the extracted two-dimensional joint points, the spatial geometric parameters and the label data to obtain the three-dimensional correction model specifically comprises:
inputting the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain space geometric prediction parameters;
determining a prediction error according to the space geometric prediction parameters and the space geometric parameters;
determining whether the hand recorded by the extracted two-dimensional joint points has abnormal pose or not according to the label data;
when the pose is abnormal, acquiring a correction factor;
and adjusting the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets the training stopping condition to obtain the three-dimensional correction model.
8. A gesture recognition apparatus based on biometric recognition, comprising:
the image acquisition module is used for acquiring an image to be identified;
the hand detection module is used for inputting the image to be recognized into a target detection network to obtain a hand characteristic image;
the joint marking module is used for determining two-dimensional joint points in the hand characteristic image to obtain a plurality of thermodynamic diagrams marked with the two-dimensional joint points;
the joint correction module is used for correcting the two-dimensional joint points in the thermodynamic diagrams through a three-dimensional correction model to obtain a two-dimensional joint point topological graph;
and the joint convolution module is used for carrying out graph convolution on the two-dimensional joint point topological graph to obtain the gesture category in the image to be recognized.
9. A computer device comprising a memory having computer readable instructions stored therein and a processor which, when executing the computer readable instructions, implements the steps of the biometric-based gesture recognition method according to any one of claims 1 to 7.
10. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the biometric based gesture recognition method according to any one of claims 1 to 7.
CN202010659074.1A 2020-07-09 2020-07-09 Gesture recognition method and device based on biological recognition, computer equipment and medium Active CN111832468B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010659074.1A CN111832468B (en) 2020-07-09 2020-07-09 Gesture recognition method and device based on biological recognition, computer equipment and medium
PCT/CN2020/122833 WO2021120834A1 (en) 2020-07-09 2020-10-22 Biometrics-based gesture recognition method and apparatus, computer device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010659074.1A CN111832468B (en) 2020-07-09 2020-07-09 Gesture recognition method and device based on biological recognition, computer equipment and medium

Publications (2)

Publication Number Publication Date
CN111832468A true CN111832468A (en) 2020-10-27
CN111832468B CN111832468B (en) 2024-07-05

Family

ID=72899768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010659074.1A Active CN111832468B (en) 2020-07-09 2020-07-09 Gesture recognition method and device based on biological recognition, computer equipment and medium

Country Status (2)

Country Link
CN (1) CN111832468B (en)
WO (1) WO2021120834A1 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112558856A (en) * 2020-12-16 2021-03-26 深圳市大中华区块链科技有限公司 Block chain touch screen gesture recognition method and system
CN112668543A (en) * 2021-01-07 2021-04-16 中国科学技术大学 Isolated word sign language recognition method based on hand model perception
CN112668511A (en) * 2020-12-31 2021-04-16 深兰盛视科技(苏州)有限公司 Identity recognition method and device, electronic equipment and storage medium
CN112906512A (en) * 2021-02-03 2021-06-04 北京海迩西医疗科技有限公司 Method, device and storage medium for determining human body joint
CN113191421A (en) * 2021-04-25 2021-07-30 东北大学 Gesture recognition system and method based on Faster-RCNN
CN113239835A (en) * 2021-05-20 2021-08-10 中国科学技术大学 Model-aware gesture migration method
CN113255497A (en) * 2021-05-17 2021-08-13 南京甄视智能科技有限公司 Multi-scene in-vivo detection method, system, server and readable medium based on data synthesis
CN113326751A (en) * 2021-05-19 2021-08-31 中国科学院上海微系统与信息技术研究所 Hand 3D key point labeling method
CN114035687A (en) * 2021-11-12 2022-02-11 郑州大学 Gesture recognition method and system based on virtual reality
CN114067046A (en) * 2021-10-22 2022-02-18 苏州方正璞华信息技术有限公司 Method and system for reconstructing and displaying hand three-dimensional model by single picture
CN114817757A (en) * 2022-04-02 2022-07-29 广州大学 Cross-social network virtual identity association method based on graph convolution network
CN117593437A (en) * 2024-01-18 2024-02-23 华伦医疗用品(深圳)有限公司 Endoscope real-time image processing method and system based on GPU
CN118172801A (en) * 2024-05-15 2024-06-11 南昌虚拟现实研究院股份有限公司 Gesture detection method and device

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN114170671B (en) * 2021-09-16 2024-10-29 上海大学 Massage manipulation recognition method based on deep learning
CN115346345B (en) * 2022-07-28 2023-06-06 福建省杭氟电子材料有限公司 Intelligent toxic and harmful gas alarm system for preparing hexafluorobutadiene

Citations (6)

Publication number Priority date Publication date Assignee Title
CN109325995A (en) * 2018-09-13 2019-02-12 叠境数字科技(上海)有限公司 Low resolution multi-angle of view hand method for reconstructing based on manpower parameter model
CN110427877A (en) * 2019-08-01 2019-11-08 大连海事大学 A method of the human body three-dimensional posture estimation based on structural information
CN110837778A (en) * 2019-10-12 2020-02-25 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequence
CN110991319A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related device
CN111368668A (en) * 2020-02-25 2020-07-03 北京字节跳动网络技术有限公司 Three-dimensional hand recognition method and device, electronic equipment and storage medium
CN111382644A (en) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 Gesture recognition method and device, terminal equipment and computer readable storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN108229277B (en) * 2017-03-31 2020-05-01 北京市商汤科技开发有限公司 Gesture recognition method, gesture control method, multilayer neural network training method, device and electronic equipment
CN109117766A (en) * 2018-07-30 2019-01-01 上海斐讯数据通信技术有限公司 A kind of dynamic gesture identification method and system

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN109325995A (en) * 2018-09-13 2019-02-12 叠境数字科技(上海)有限公司 Low resolution multi-angle of view hand method for reconstructing based on manpower parameter model
CN111382644A (en) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 Gesture recognition method and device, terminal equipment and computer readable storage medium
CN110427877A (en) * 2019-08-01 2019-11-08 大连海事大学 A method of the human body three-dimensional posture estimation based on structural information
CN110837778A (en) * 2019-10-12 2020-02-25 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequence
CN110991319A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related device
CN111368668A (en) * 2020-02-25 2020-07-03 北京字节跳动网络技术有限公司 Three-dimensional hand recognition method and device, electronic equipment and storage medium

Cited By (20)

Publication number Priority date Publication date Assignee Title
CN112558856A (en) * 2020-12-16 2021-03-26 深圳市大中华区块链科技有限公司 Block chain touch screen gesture recognition method and system
CN112668511A (en) * 2020-12-31 2021-04-16 深兰盛视科技(苏州)有限公司 Identity recognition method and device, electronic equipment and storage medium
CN112668543B (en) * 2021-01-07 2022-07-15 中国科学技术大学 Isolated word sign language recognition method based on hand model perception
CN112668543A (en) * 2021-01-07 2021-04-16 中国科学技术大学 Isolated word sign language recognition method based on hand model perception
CN112906512A (en) * 2021-02-03 2021-06-04 北京海迩西医疗科技有限公司 Method, device and storage medium for determining human body joint
CN112906512B (en) * 2021-02-03 2024-06-11 北京海迩西医疗科技有限公司 Method, device and storage medium for determining joints of human body
CN113191421A (en) * 2021-04-25 2021-07-30 东北大学 Gesture recognition system and method based on Faster-RCNN
CN113255497A (en) * 2021-05-17 2021-08-13 南京甄视智能科技有限公司 Multi-scene in-vivo detection method, system, server and readable medium based on data synthesis
CN113255497B (en) * 2021-05-17 2022-08-16 南京甄视智能科技有限公司 Multi-scene in-vivo detection method, system, server and readable medium based on data synthesis
CN113326751A (en) * 2021-05-19 2021-08-31 中国科学院上海微系统与信息技术研究所 Hand 3D key point labeling method
CN113326751B (en) * 2021-05-19 2024-02-13 中国科学院上海微系统与信息技术研究所 Hand 3D key point labeling method
CN113239835A (en) * 2021-05-20 2021-08-10 中国科学技术大学 Model-aware gesture migration method
CN113239835B (en) * 2021-05-20 2022-07-15 中国科学技术大学 Model-aware gesture migration method
CN114067046A (en) * 2021-10-22 2022-02-18 苏州方正璞华信息技术有限公司 Method and system for reconstructing and displaying hand three-dimensional model by single picture
CN114035687B (en) * 2021-11-12 2023-07-25 郑州大学 Gesture recognition method and system based on virtual reality
CN114035687A (en) * 2021-11-12 2022-02-11 郑州大学 Gesture recognition method and system based on virtual reality
CN114817757A (en) * 2022-04-02 2022-07-29 广州大学 Cross-social network virtual identity association method based on graph convolution network
CN117593437A (en) * 2024-01-18 2024-02-23 华伦医疗用品(深圳)有限公司 Endoscope real-time image processing method and system based on GPU
CN117593437B (en) * 2024-01-18 2024-05-14 华伦医疗用品(深圳)有限公司 Endoscope real-time image processing method and system based on GPU
CN118172801A (en) * 2024-05-15 2024-06-11 南昌虚拟现实研究院股份有限公司 Gesture detection method and device

Also Published As

Publication number Publication date
CN111832468B (en) 2024-07-05
WO2021120834A1 (en) 2021-06-24

Similar Documents

Publication Publication Date Title
CN111832468B (en) Gesture recognition method and device based on biological recognition, computer equipment and medium
CN110020620B (en) Face recognition method, device and equipment under large posture
Shen et al. Exemplar-based human action pose correction and tagging
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN110443239A (en) The recognition methods of character image and its device
CN112116008B (en) Processing method of target detection model based on intelligent decision and related equipment thereof
CN109034095A (en) A kind of face alignment detection method, apparatus and storage medium
WO2021223738A1 (en) Method, apparatus and device for updating model parameter, and storage medium
CN114359974B (en) Human body posture detection method and device and storage medium
CN109919077A (en) Gesture recognition method, device, medium and calculating equipment
US20230290174A1 (en) Weakly supervised semantic parsing
CN114359582B (en) Small sample feature extraction method based on neural network and related equipment
CN115050064A (en) Face living body detection method, device, equipment and medium
CN113254491A (en) Information recommendation method and device, computer equipment and storage medium
CN113569627B (en) Human body posture prediction model training method, human body posture prediction method and device
CN113298152A (en) Model training method and device, terminal equipment and computer readable storage medium
CN115797606A (en) 3D virtual digital human interaction action generation method and system based on deep learning
CN116821373A (en) Map-based prompt recommendation method, device, equipment and medium
CN109784140A (en) Driver attributes' recognition methods and Related product
CN114241411B (en) Counting model processing method and device based on target detection and computer equipment
CN110414792A (en) Component centralized procurement management system and Related product based on BIM and big data
CN104252473A (en) Image recognition method
CN112381118B (en) College dance examination evaluation method and device
CN116863116A (en) Image recognition method, device, equipment and medium based on artificial intelligence
CN114707017A (en) Visual question answering method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant