WO2021120834A1 - Biometric-based gesture recognition method, apparatus, computer device, and medium (基于生物识别的手势识别方法、装置、计算机设备及介质)


Info

Publication number
WO2021120834A1
WO2021120834A1 (PCT/CN2020/122833, CN2020122833W)
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
dimensional joint
joint points
joint point
correction model
Prior art date
Application number
PCT/CN2020/122833
Other languages
English (en)
French (fr)
Inventor
付佐毅
何敏聪
冯颖龙
周宸
陈远旭
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021120834A1 publication Critical patent/WO2021120834A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for gesture recognition based on biometrics.
  • Gesture recognition involves living body detection in biometric recognition. Gestures are recognized by acquiring images of hand features, so as to perform the next operation according to the meaning of the gestures, such as triggering corresponding instructions based on the gesture.
  • The accuracy of recognition is of great significance to gesture recognition.
  • Traditional gesture recognition technology usually obtains a picture of the hand and recognizes it through a visual algorithm or a neural network.
  • The inventor realized that in practical applications gestures are complex and changeable; for example, fingers may self-occlude or become entangled.
  • Neither visual algorithms nor neural networks can effectively cope with such changeable gestures, so the accuracy of gesture recognition is low.
  • the purpose of the embodiments of the present application is to propose a biometric-based gesture recognition method, device, computer equipment, and storage medium to solve the problem of low accuracy of gesture recognition.
  • the embodiments of the present application provide a gesture recognition method based on biometrics, which adopts the following technical solutions:
  • embodiments of the present application also provide a gesture recognition device based on biometrics, which adopts the following technical solutions:
  • the image acquisition module is used to acquire the image to be recognized
  • the hand detection module is used to input the to-be-recognized image into the target detection network to obtain a feature image of the hand;
  • the joint marking module is used to determine the two-dimensional joint points in the hand feature image, and obtain several heat maps marked with the two-dimensional joint points;
  • the joint correction module is used to correct the two-dimensional joint points in the several heat maps through the three-dimensional correction model to obtain a two-dimensional joint point topology map;
  • the joint convolution module is used to perform graph convolution on the two-dimensional joint point topology map to obtain the gesture category in the image to be recognized.
  • an embodiment of the present application further provides a computer device, including a memory and a processor, the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
  • embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions implement the following steps when executed by a processor:
  • The embodiments of the present application mainly have the following beneficial effects: after the image to be recognized is acquired, the hand feature image is first obtained from it; the two-dimensional joint points in the hand feature image are initially obtained through heat maps and marked in the heat maps; the two-dimensional joint points are then input into the three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology map is obtained from the corrected two-dimensional joint points; when the two-dimensional joint point topology map is graph-convolved, the topological relationship between nodes is used, further ensuring the accuracy of gesture recognition. An end-to-end sketch follows.
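  • To make the flow concrete, here is a minimal end-to-end sketch of the five steps, assuming PyTorch-style callables; the parameter names are hypothetical stand-ins for the four networks described in the embodiments, not names from the application.

```python
import torch

def recognize_gesture(image, detector, heatmap_net, corrector_3d, joint_gcn):
    """Hypothetical end-to-end sketch of steps S201-S205."""
    hand_crop = detector(image)         # S202: detect and crop the hand feature image
    heatmaps = heatmap_net(hand_crop)   # S203: one heat map per two-dimensional joint point
    joints_2d = corrector_3d(heatmaps)  # S204: 3D-constrained, corrected 2D joint points
    logits = joint_gcn(joints_2d)       # S205: graph convolution over the joint topology
    probs = torch.softmax(logits, dim=-1)
    return probs.argmax(dim=-1)         # index of the most probable gesture category
```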
  • Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
  • Fig. 2 is a flowchart of an embodiment of a gesture recognition method based on biometrics according to the present application
  • Fig. 3 is a schematic diagram of a two-dimensional joint point topology diagram in an embodiment
  • FIG. 4 is a flowchart of a specific implementation of step S203 in FIG. 2;
  • FIG. 5 is a flowchart of a specific implementation of step S204 in FIG. 2;
  • Fig. 6 is a schematic structural diagram of an embodiment of a gesture recognition device based on biometrics according to the present application.
  • Fig. 7 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, may be installed on the terminal devices 101, 102, and 103.
  • The terminal devices 101, 102, 103 may be various electronic devices with display screens that support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.
  • the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
  • The biometric-based gesture recognition method provided by the embodiments of the present application is generally executed by a server, and accordingly, the biometric-based gesture recognition device is generally provided in the server.
  • It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there can be any number of terminal devices, networks, and servers according to implementation needs.
  • the biometric-based gesture recognition method includes the following steps:
  • Step S201 Obtain an image to be recognized.
  • the electronic device (such as the server shown in FIG. 1) on which the gesture recognition method based on biometrics runs can communicate with the terminal device through a wired connection or a wireless connection.
  • The above-mentioned wireless connection methods may include, but are not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra wideband) connections, and other wireless connection methods currently known or developed in the future.
  • the image to be recognized may be an image used for gesture recognition.
  • the terminal collects the image to be recognized, and sends the image to be recognized to the server.
  • the application or page in the terminal supports the gesture recognition function, and the user operates the terminal to enter the image to be recognized according to the instructions.
  • the server can also read the stored image from the database as the image to be recognized.
  • Step S202 Input the image to be recognized into the target detection network to obtain a feature image of the hand.
  • the target detection network may be a network for detecting hand features in the image to be recognized.
  • After the server obtains the image to be recognized, it first detects whether there are hand features in the image to be recognized.
  • the server can input the image to be recognized into the trained target detection network.
  • the target detection network is used to identify the hand features in the image to be recognized, and to intercept the hand features in the image to be recognized to obtain the hand feature image.
  • the hand feature may be a human hand feature, or a hand-like feature having the same structure as a human hand feature, such as a toy hand, a doll hand, and the like.
  • The target detection network is designed as a lightweight network, improved from the SSD network (SSD, short for Single Shot MultiBox Detector, a target detection algorithm).
  • the VGG network in the SSD network can be replaced with the MobileNet v2 network.
  • The VGG network is a deep convolutional neural network that uses many convolutional layers, each with a large number of convolution kernels, resulting in a large amount of computation and high demands on computing resources.
  • The MobileNet v2 network is a lightweight convolutional neural network that uses a large number of depthwise separable convolutions, with less computation and faster computation speed; using the MobileNet v2 network in the target detection network can increase the computing speed of the server, as sketched below.
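  • As a hedged illustration of this backbone swap, the sketch below builds a minimal SSD-style detection head on torchvision's MobileNetV2 feature extractor; the head shapes, anchor count, and single-scale head are illustrative simplifications (a full SSD also taps intermediate feature maps), not the patent's actual configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class LightweightHandDetector(nn.Module):
    """SSD-style detector with the VGG backbone replaced by MobileNetV2."""
    def __init__(self, num_anchors: int = 6, num_classes: int = 2):  # hand / background
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features  # depthwise separable convs
        out_ch = 1280  # channel width of MobileNetV2's final feature map
        self.cls_head = nn.Conv2d(out_ch, num_anchors * num_classes, 3, padding=1)
        self.box_head = nn.Conv2d(out_ch, num_anchors * 4, 3, padding=1)

    def forward(self, x):
        feats = self.backbone(x)                            # lightweight feature extraction
        return self.cls_head(feats), self.box_head(feats)  # class scores, box offsets

# Example: a 300x300 image yields per-anchor scores and boxes on a 10x10 grid.
scores, boxes = LightweightHandDetector()(torch.randn(1, 3, 300, 300))
```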
  • Step S203 Determine the two-dimensional joint points in the hand feature image, and obtain several heat maps marked with the two-dimensional joint points.
  • the two-dimensional joint point may be the joint point in the hand feature image, which has two-dimensional coordinate information.
  • the server inputs the hand feature image into the joint point extraction network, and the joint point extraction network can identify the joint points in the hand feature image.
  • The joint point extraction network generates several heat maps. The colors of different pixels in a heat map can differ, and the pixels that constitute the two-dimensional joint points are represented by a specified color.
  • The heat map is obtained by computing, for each pixel, the probability that it belongs to a certain two-dimensional joint point, and selecting pixels as two-dimensional joint points according to that probability.
  • The joint point extraction network generates as many heat maps as there are two-dimensional joint points to extract, with each heat map corresponding to one two-dimensional joint point.
  • In one embodiment, the server divides the hand feature image into image areas of the same size, each composed of several pixels (for example, 2×2 or 3×3 pixels), calculates the probability that each image area belongs to a two-dimensional joint point with the image area as the unit, and generates several heat maps marked with the two-dimensional joint points.
  • the joint point extraction network may be a stacked hourglass network (Stacked Hourglass Networks), and the stacked hourglass network is often used to identify key points in two-dimensional gesture recognition.
  • the joint point extraction network can extract 21 two-dimensional joint points from the hand feature image.
  • Step S204 Correct the two-dimensional joint points in the several heat maps through the three-dimensional correction model to obtain a two-dimensional joint point topology map.
  • the three-dimensional correction model may be a model that corrects two-dimensional joint points.
  • the server inputs the heat maps into the three-dimensional correction model.
  • The three-dimensional correction model can perform three-dimensional constraint and correction on the two-dimensional joint points, so that the three-dimensional joint points corresponding to the two-dimensional joint points are more reasonably distributed in space; the two-dimensional joint points are then obtained from the three-dimensional joint points, completing the correction of the two-dimensional joint points.
  • the server adds the corrected two-dimensional joint points to the initial topology map according to the positions of the corrected two-dimensional joint points, and connects the corrected two-dimensional joint points to obtain a two-dimensional joint point topology map.
  • Fig. 3 is a schematic diagram of a two-dimensional joint point topology map in an embodiment. Specifically, referring to Fig. 3, the joint point extraction network has extracted 21 joint points from the hand feature image and labeled them; in order to show the correspondence between the joint points and the hand, Figure 3 also overlays the topology map on the hand feature image.
  • Step S205 Perform graph convolution on the two-dimensional joint point topology map to obtain the gesture category in the image to be recognized.
  • each joint point of the hand has a direct geometric relationship.
  • a graph convolution operation can be performed on the two-dimensional joint point topology map, so as to utilize the topological relationship between the joint points.
  • each two-dimensional joint point is regarded as a vertex, and the calculation range of each vertex is the set of the vertex itself and adjacent vertices.
  • For example, when calculating vertex 17, the vertices participating in the calculation include vertex 17, vertex 0, vertex 13, and vertex 18; when calculating vertex 14, the vertices participating in the calculation include vertex 14, vertex 13, and vertex 15.
  • the lines between vertices and vertices are edges in the topological graph.
  • The calculation formula of the graph convolution is:

$$f_{\text{out}}(x_i)=\frac{1}{n_i}\sum_{x_j\in V(i)}w(x_j)\,f(x_j)$$

  • where $V(i)$ is the edge (neighbor) set of vertex $x_i$: each vertex has different neighboring vertices, so the edges involved in calculating each vertex also differ; $n_i$ is the number of vertices involved when calculating vertex $x_i$; $f(x_j)$ is the vertex value of each vertex in $V(i)$, the set of vertices involved in the calculation of that vertex, from which the graph convolutional network outputs a new set of vertex values; and $w(x_j)$ is the weight of the edge, a training parameter of the graph convolutional network that appears when the network is initialized and is continuously updated during calculation.
  • The graph convolutional network is connected to a softmax layer; the softmax layer calculates multiple probabilities, each corresponding to a gesture category, and the gesture category with the largest probability is selected as the gesture category in the image to be recognized. A minimal sketch of this classifier follows.
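  • The following is a minimal sketch of such a graph convolution classifier, assuming a normalized-adjacency formulation in which each vertex averages itself and its neighbors (matching the calculation range described above); the layer sizes, number of gesture classes, and use of learned feature weights in place of per-edge scalars are illustrative assumptions, not the patent's exact network.

```python
import torch
import torch.nn as nn

class JointGraphConv(nn.Module):
    """One graph convolution layer: each vertex aggregates the values of
    itself and its adjacent vertices, averaged over the n_i participants."""
    def __init__(self, in_dim: int, out_dim: int, adjacency: torch.Tensor):
        super().__init__()
        a = adjacency + torch.eye(adjacency.size(0))                 # vertex itself joins V(i)
        self.register_buffer("a_norm", a / a.sum(1, keepdim=True))   # divide by n_i
        self.weight = nn.Linear(in_dim, out_dim)                     # learnable weights w

    def forward(self, x):                                            # x: (batch, 21, in_dim)
        return torch.relu(self.weight(self.a_norm @ x))

class GestureClassifier(nn.Module):
    """Two graph convolutions over the 21-joint topology, then softmax."""
    def __init__(self, adjacency: torch.Tensor, num_classes: int):
        super().__init__()
        self.gc1 = JointGraphConv(2, 64, adjacency)                  # input: (x, y) per joint
        self.gc2 = JointGraphConv(64, 128, adjacency)
        self.fc = nn.Linear(128 * 21, num_classes)

    def forward(self, joints_2d):                                    # (batch, 21, 2)
        h = self.gc2(self.gc1(joints_2d))
        return torch.softmax(self.fc(h.flatten(1)), dim=-1)          # one probability per category

# Example with a placeholder adjacency (filled from the topology map in practice):
probs = GestureClassifier(torch.zeros(21, 21), num_classes=10)(torch.randn(4, 21, 2))
```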
  • the aforementioned gesture category may also be stored in a node of a blockchain.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database, a chain of data blocks associated with each other using cryptographic methods; each data block contains a batch of network transaction information, used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • After the image to be recognized is acquired, the hand feature image is first obtained from it; the two-dimensional joint points in the hand feature image are initially obtained through heat maps and marked in the heat maps; the two-dimensional joint points are then input into the three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology map is obtained from the corrected two-dimensional joint points; graph convolution over the two-dimensional joint point topology map uses the topological relationship between nodes, further ensuring the accuracy of gesture recognition.
  • In some optional implementations, after step S202 the method may further include: acquiring a real hand data set and a virtual hand data set; performing first training on the initial target detection network according to the virtual hand data set; and performing second training, according to the real hand data set, on the initial target detection network that has completed the first training, to obtain the target detection network.
  • The real hand data set may be a data set obtained based on human hands; the virtual hand data set may be a data set synthesized based on virtual hands, where hand features are built through three-dimensional modeling and the related data is collected to obtain the virtual hand data set.
  • Both the real hand data set and the virtual hand data set include RGB images of hands (images based on the RGB color mode, which obtains a variety of colors by varying and superimposing the red, green, and blue color channels), two-dimensional joint points, hand feature annotation data, and other information.
  • the real hand data set may be the HMKA data set (Hands with Manual Keypoint Annotations), and the virtual hand data set may be the HSD data set (Hands from Synthetic Data).
  • the initial target detection network may be a model that has not yet completed target detection training.
  • the server needs to obtain the target detection network through training.
  • the server obtains the real hand data set and the virtual hand data set used for target detection training.
  • The server first trains the initial target detection network on the virtual hand data set; after this first training is completed, the initial target detection network has basic hand detection ability. The server then performs second training, with the real hand data set, on the initial target detection network that has completed the first training, to improve the model's ability to detect hands in real environments; the target detection network is obtained after the second training.
  • the first training may be pre-training of the second training.
  • The initial target detection network is first trained on the virtual hand data set so that it has hand feature detection capability, and then the real hand data set is used to strengthen its ability to detect hands in the real environment, ensuring the detection accuracy of the trained target detection network; a training sketch follows.
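  • A compact sketch of this two-stage scheme: pre-train on the synthetic (virtual) set, then fine-tune on the real set. The loss function, data loaders, and epoch counts are placeholders; the patent does not specify the detector's training loss.

```python
import torch

def train_stage(model, loader, optimizer, loss_fn, epochs: int):
    """One training stage over one data set (first: virtual, second: real)."""
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()

# Usage (assuming a detector, loaders, and a detection loss already exist):
# optimizer = torch.optim.Adam(detector.parameters(), lr=1e-4)
# train_stage(detector, virtual_hand_loader, optimizer, detection_loss, epochs=50)  # first training
# train_stage(detector, real_hand_loader, optimizer, detection_loss, epochs=20)     # second training
```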
  • step S203 may include:
  • Step S2031 Input the hand feature image into the joint point extraction network to obtain several heat maps.
  • the joint point extraction network may be a network used to identify two-dimensional joint points.
  • the server inputs the hand feature image into the joint point extraction network.
  • The joint point extraction network does not directly predict the positions of the two-dimensional joint points, but instead generates heat maps of the two-dimensional joint points, avoiding training that is difficult to converge or that has low accuracy after completion.
  • the server can generate multiple heat maps, and each heat map corresponds to a two-dimensional joint point.
  • Step S2032 Determine, in each of the several heat maps, the pixel point with the largest heat value.
  • Each pixel in a heat map has a heat value, and the magnitude of the heat value is proportional to the probability that the two-dimensional joint point is at this pixel.
  • The server traverses each pixel in each heat map, compares the heat values, and determines the pixel with the largest heat value.
  • Step S2033 Mark the determined pixel points as two-dimensional joint points to obtain several heat maps marked with two-dimensional joint points.
  • the server marks the pixel with the largest heat value as a two-dimensional joint point in the heat map, thereby obtaining several heat maps marked with two-dimensional joint points.
  • In this way, heat maps of the hand feature image are generated, in which the heat value of a pixel represents the probability that the pixel is a two-dimensional joint point; the pixel with the largest heat value is selected from each heat map as the two-dimensional joint point, ensuring the accuracy of joint point recognition. A short sketch of this selection follows.
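  • This per-map selection can be written in a few lines; a sketch assuming the heat maps arrive as a (21, H, W) array of heat values.

```python
import numpy as np

def joints_from_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """For each heat map, mark as the two-dimensional joint point the pixel
    whose heat value (probability of being that joint) is the largest."""
    num_joints, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)  # (num_joints, 2) pixel coordinates

# Example: 21 random 64x64 heat maps -> 21 (x, y) joint locations.
joints = joints_from_heatmaps(np.random.rand(21, 64, 64))
```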
  • In some optional implementations, before step S2031 the method also includes: acquiring a joint point extraction data set; inputting the hand images in the joint point extraction data set into the initial joint point extraction network to obtain predicted heat maps; determining the prediction error according to the predicted heat maps and the annotated heat maps in the joint point extraction data set; and adjusting the initial joint point extraction network according to the prediction error until the prediction error meets the training stop condition, obtaining the joint point extraction network.
  • the joint point extraction data set may be a data set used to train the initial joint point extraction network.
  • The joint point extraction data set may include hand images and the annotated heat maps corresponding to the hand images.
  • the joint point extraction data set may be a RHD (Rendered Handpose Dataset) data set.
  • the initial joint point extraction network may be a joint point extraction network that has not yet completed training.
  • the predicted heat map may be a heat map obtained by predicting the two-dimensional joint points by the initial joint point extraction network.
  • The annotated heat map is a heat map labeled in advance.
  • the joint point extraction network is obtained through training.
  • the server obtains the joint point extraction data set, and inputs the hand image in the joint point extraction data set into the initial joint point extraction network.
  • the initial joint point extraction network recognizes the two-dimensional joint points in the hand image and obtains the predicted heat map.
  • the server extracts the annotation heat map corresponding to the hand image from the joint extraction data set, uses the annotation heat map as the annotation data in training, and calculates the prediction error based on the annotation heat map and the predicted heat map.
  • the server adjusts the parameters in the initial joint point extraction network with the goal of reducing the prediction error, and continues training after each adjustment of the parameters.
  • When the prediction error meets the training stop condition, training is stopped, and the joint point extraction network is obtained.
  • the training stop condition may be that the prediction error is less than a preset error threshold.
  • The calculation formula of the prediction error is:

$$\text{Loss}=\frac{1}{H\times W}\sum\big\|Y_{HM}-G(Y_{2D})\big\|^{2}$$

  • where $Y_{HM}$ is the predicted heat map, $G(Y_{2D})$ is the annotated heat map, and $H$ and $W$ are the size of the heat map, that is, the size of the feature map of the output layer, a set of hyperparameters determined when the initial joint point extraction network is designed.
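  • In code, the error above and the training stop condition could read as follows; the mean-squared form is an assumption consistent with the symbols defined in the formula.

```python
import torch

def heatmap_loss(pred_hm: torch.Tensor, gt_hm: torch.Tensor) -> torch.Tensor:
    """Prediction error between the predicted heat maps Y_HM and the annotated
    heat maps G(Y_2D), averaged over the H x W output feature map."""
    h, w = pred_hm.shape[-2:]
    return ((pred_hm - gt_hm) ** 2).sum(dim=(-2, -1)).mean() / (h * w)

# Training stop condition: halt once the error falls below a preset threshold.
# while heatmap_loss(model(images), target_hm).item() >= error_threshold: keep training.
```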
  • The prediction error is calculated from the output predicted heat map and the annotated heat map, and the initial joint point extraction network is adjusted according to the prediction error until the prediction error meets the training stop condition, enabling the network at the end of training to accurately identify the joint points.
  • step S204 may include:
  • Step S2041 Input a number of heat maps into a three-dimensional correction model to correct the two-dimensional joint points in the several heat maps through the three-dimensional correction model, and obtain the spatial geometric parameters of the hand recorded by each two-dimensional joint point.
  • the spatial geometric parameters can be parameters describing the shape characteristics of the hand and the surface of the hand.
  • The hand recorded in the hand feature image is in a complex real environment, and abnormal postures may be present, such as self-occlusion of the hand, entanglement of fingers, valgus, and joint points in the palm that are not coplanar; these complex gesture shapes will affect gesture recognition.
  • the three-dimensional correction model takes into account the spatial geometric characteristics of the two-dimensional joint points in the calculation, and can correct the influence of complex gesture shapes on gesture recognition.
  • the server inputs the heat map marked with two-dimensional joint points into the three-dimensional correction model.
  • The three-dimensional correction model calculates the spatial geometric characteristics of the hand recorded in the hand feature image based on the two-dimensional joint points, three-dimensionally constrains the joint points to realize the correction of the two-dimensional joint points, and obtains the spatial geometric parameters after the calculation is completed.
  • Spatial geometric parameters can be composed of two sets of parameters, shape and pose.
  • the shape parameter describes the shape characteristics of the hand, such as finger length, finger size, and palm thickness;
  • the pose parameter describes the surface information of the hand, such as the deformation of the hand surface.
  • the three-dimensional correction model may be a MANO model, and the MANO model may output the joint points of the hand according to the shape and pose parameters. Before using the MANO model, it needs to be trained first so that the MANO model can output shape and pose parameters based on the heat map.
  • The heat maps output by the joint point extraction network are first input into a two-dimensional-to-three-dimensional projection network, and after the correction and calculation of the MANO model, the shape and pose parameters are output by the MANO layer.
  • In some embodiments, after the server recognizes the two-dimensional joint points according to the heat maps, the two-dimensional joint points can be marked in the hand feature image, and the hand feature image marked with the two-dimensional joint points can be input into the three-dimensional correction model; in this case, the hand feature image is first converted into several heat maps, and the heat maps are then input into the three-dimensional correction model.
  • Step S2042 Calculate the three-dimensional joint points corresponding to each two-dimensional joint point according to the spatial geometric parameters.
  • The three-dimensional correction model can map out the three-dimensional joint points of the hand according to the spatial geometric parameters and build a three-dimensional hand mesh; therefore, it can calculate each three-dimensional joint point from the spatial geometric parameters, realizing the correspondence between the two-dimensional joint points in the heat maps and the three-dimensional joint points.
  • Step S2043 Project the obtained three-dimensional joint points to obtain the corrected two-dimensional joint points.
  • The three-dimensional joint points are projected according to the three-dimensional-to-two-dimensional projection formula to obtain a new set of two-dimensional joint points; the two-dimensional joint points obtained by the projection are the corrected two-dimensional joint points. A sketch of such a projection follows.
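  • A minimal sketch of the three-dimensional-to-two-dimensional projection, assuming a standard pinhole camera model with intrinsics (fx, fy, cx, cy); the patent does not reproduce its projection formula, so this conventional form is an assumption.

```python
import numpy as np

def project_to_2d(joints_3d: np.ndarray, fx: float, fy: float,
                  cx: float, cy: float) -> np.ndarray:
    """Pinhole projection: camera-space (X, Y, Z) joints -> (u, v) pixel coordinates."""
    x, y, z = joints_3d[:, 0], joints_3d[:, 1], joints_3d[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)  # the corrected two-dimensional joint points

# Example: project 21 corrected 3D joint points with illustrative intrinsics.
joints_2d = project_to_2d(np.random.rand(21, 3) + [0.0, 0.0, 0.5], 500.0, 500.0, 32.0, 32.0)
```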
  • Step S2044 Generate a two-dimensional joint point topology map corresponding to the corrected two-dimensional joint point.
  • the two-dimensional joint points after correction are ordered and have a fixed connection relationship.
  • The server adds the corrected two-dimensional joint points to the initial topology map and connects them according to the preset fixed connection relationship to obtain the two-dimensional joint point topology map; one possible edge list is sketched below.
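  • As an illustration, the fixed connection relationship can be stored as an edge list over the 21 joints and converted to an adjacency matrix for the graph convolution sketched earlier. The numbering (wrist = 0, four points per finger) is a common convention and an assumption here; note that per the neighbor example given above (vertex 17 adjacent to vertex 13), the patent's Figure 3 topology may also connect adjacent finger bases, which this sketch omits.

```python
import numpy as np

# Hypothetical 21-joint hand skeleton: wrist (0) plus four joints per finger.
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),         # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),         # index finger
    (0, 9), (9, 10), (10, 11), (11, 12),    # middle finger
    (0, 13), (13, 14), (14, 15), (15, 16),  # ring finger
    (0, 17), (17, 18), (18, 19), (19, 20),  # little finger
]

def adjacency_from_edges(edges, num_joints: int = 21) -> np.ndarray:
    """Build the symmetric adjacency matrix of the joint point topology map."""
    a = np.zeros((num_joints, num_joints))
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0  # undirected edge between connected joints
    return a

adjacency = adjacency_from_edges(HAND_EDGES)
```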
  • In this embodiment, the spatial geometric parameters are obtained from the heat maps through the three-dimensional correction model, and in this process the two-dimensional joint points are three-dimensionally constrained and corrected, reducing the impact of complex gestures in the real environment on recognition; the three-dimensional joint points corresponding to the two-dimensional joint points are mapped out according to the spatial geometric parameters, and the three-dimensional joint points are then projected to obtain the corrected two-dimensional joint points. The corrected two-dimensional joint points have more accurate position information, so a more accurate two-dimensional joint point topology map is generated, which in turn ensures the accuracy of gesture recognition.
  • In some optional implementations, before step S204 the method also includes: acquiring a joint point correction data set; extracting two-dimensional joint points, spatial geometric parameters, and label data corresponding to the extracted two-dimensional joint points from the joint point correction data set; and training the initial three-dimensional correction model according to the extracted two-dimensional joint points, spatial geometric parameters, and label data to obtain the three-dimensional correction model.
  • The joint point correction data set can be a data set used to train the initial three-dimensional correction model; for example, it can be the FreiHand data set, which records two-dimensional joint points, three-dimensional joint points, and other hand information.
  • The label data can identify whether the hand has an abnormal posture, and can also be a three-dimensional model of the hand.
  • The server obtains the joint point correction data set and extracts from it two-dimensional joint points, the spatial geometric parameters corresponding to the two-dimensional joint points, and label data; it uses the two-dimensional joint points as the input of the initial three-dimensional correction model, sets the spatial geometric parameters as the expected output, and trains the initial three-dimensional correction model based on the label data.
  • The three-dimensional correction model obtained after training can calculate the spatial geometric parameters from the two-dimensional joint points.
  • In this embodiment, the two-dimensional joint points in the joint point correction data set are used as input, the spatial geometric parameters are used as the expected output, and the initial three-dimensional correction model is trained based on the label data, ensuring that the trained three-dimensional correction model can accurately calculate the spatial geometric parameters of the hand from the two-dimensional joint points.
  • In some optional implementations, training the initial three-dimensional correction model according to the extracted two-dimensional joint points, spatial geometric parameters, and label data specifically includes: inputting the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain spatial geometric prediction parameters; determining the prediction error according to the spatial geometric prediction parameters and the spatial geometric parameters; determining, according to the label data, whether the hand recorded by the extracted two-dimensional joint points has an abnormal posture; obtaining a correction factor when an abnormal posture exists; and adjusting the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets the training stop condition, obtaining the three-dimensional correction model.
  • the spatial geometric prediction parameter may be a parameter that is obtained by the initial three-dimensional correction model according to the two-dimensional joint point prediction and identifies the spatial geometric characteristics of the hand.
  • Posture abnormalities can include phenomena such as hand self-occlusion, entanglement of fingers, finger valgus, and joint points in the palm that are not coplanar, as well as abnormal joint positions and inconsistencies between hand contours and label data.
  • the server calculates the two-dimensional joint points through the initial three-dimensional correction model to obtain the spatial geometric prediction parameters, and determines the prediction error according to the spatial geometric prediction parameters and the spatial geometric parameters.
  • the tag data is used to determine whether the hand recorded by the two-dimensional joint points has an abnormal posture.
  • An abnormal pose will negatively affect gesture recognition, so the initial three-dimensional correction model's ability to predict spatial geometric parameters in this case needs to be strengthened; therefore, a preset correction factor is applied to the initial three-dimensional correction model. The factor is equivalent to a kind of stimulus, forcing the initial three-dimensional correction model to correct the two-dimensional joint points more reasonably and to calculate more reasonable spatial geometric prediction parameters.
  • the initial three-dimensional correction model adjusts the internal model parameters of the initial three-dimensional correction model under the action of the correction factor and the prediction error until the prediction error meets the training stop condition, and the three-dimensional correction model is obtained.
  • the trained three-dimensional correction model can accurately calculate the spatial geometric parameters based on the two-dimensional joint points, and the process of calculating the spatial geometric parameters is also the process of correcting the two-dimensional joint points.
  • Training strengthens the initial three-dimensional correction model's ability to obtain predicted spatial geometric parameters, and also enables the three-dimensional correction model obtained after training to overcome abnormal poses and improve the accuracy of gesture recognition.
  • When there is only one correction factor, it is directly obtained to correct the initial three-dimensional correction model; when there are multiple correction factors and the pose is determined to be abnormal, an abnormal evaluation value can also be calculated according to the spatial geometric parameters. The abnormal evaluation value represents the degree of abnormality of the pose, and different abnormal evaluation values correspond to correction factors of different sizes.
  • the server selects the correction factor corresponding to the abnormal evaluation value to correct the training of the initial three-dimensional correction model.
  • In some embodiments, after the three-dimensional correction model calculates the spatial geometric parameters, the abnormal evaluation value is calculated from the spatial geometric parameters, and the abnormal evaluation value is stored together with the final gesture recognition result.
  • The three-dimensional correction model can also compare the abnormal evaluation value with an abnormality threshold: if the abnormal evaluation value is greater than the preset threshold, the gesture recognition processing is stopped, the abnormal picture is displayed on the terminal, and the terminal is reminded to reacquire the image to be recognized so as to restart gesture recognition.
  • When the posture is abnormal, an additional correction factor is obtained to correct the initial three-dimensional correction model; the correction factor and the prediction error act on the initial three-dimensional correction model simultaneously, so that it predicts more reasonable spatial geometric prediction parameters and at the same time corrects the two-dimensional joint points more reasonably, thereby improving the accuracy of gesture recognition. One way the factor could enter training is sketched below.
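  • One plausible reading of this mechanism is that the correction factor scales the prediction error for abnormal-pose samples, as in the sketch below; the scaling form and the factor value are illustrative assumptions, not the patent's specified update rule.

```python
import torch

def corrected_loss(pred_params: torch.Tensor, gt_params: torch.Tensor,
                   is_abnormal: bool, correction_factor: float = 2.0) -> torch.Tensor:
    """Prediction error between predicted and expected spatial geometric
    parameters, amplified by the correction factor when the label data marks
    the recorded hand pose as abnormal (self-occlusion, entangled fingers, ...)."""
    error = ((pred_params - gt_params) ** 2).mean()
    return error * correction_factor if is_abnormal else error

# Example: the amplified error pushes the model harder on abnormal-pose samples.
loss = corrected_loss(torch.randn(10), torch.randn(10), is_abnormal=True)
```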
  • the biometric-based gesture recognition method in this application involves neural networks, machine learning, and computer vision in the field of artificial intelligence; in addition, it may also involve smart homes and smart lives in the field of smart cities.
  • the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium.
  • the computer-readable instructions When executed, they may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
  • this application provides an embodiment of a gesture recognition device based on biometrics.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be specifically applied to various electronic devices.
  • The biometric-based gesture recognition device 300 in this embodiment includes: an image acquisition module 301, a hand detection module 302, a joint labeling module 303, a joint correction module 304, and a joint convolution module 305. Among them:
  • the image acquisition module 301 is used to acquire the image to be recognized.
  • the hand detection module 302 is used to input the to-be-recognized image into the target detection network to obtain the hand feature image.
  • the joint labeling module 303 is used to determine the two-dimensional joint points in the hand feature image, and obtain several heat maps marked with the two-dimensional joint points.
  • the joint correction module 304 is used for correcting the two-dimensional joint points in a number of heat maps through the three-dimensional correction model to obtain a two-dimensional joint point topology map.
  • the joint convolution module 305 is used to perform graph convolution on the two-dimensional joint point topology map to obtain the gesture category in the image to be recognized.
  • the hand feature image is first obtained according to the image to be recognized; the two-dimensional joint points in the hand feature image are initially obtained through the heat map, and the two-dimensional joint points are marked in the heat map ; Then input the two-dimensional joint points into the three-dimensional correction model, the three-dimensional correction model can constrain and correct the two-dimensional joint points in three dimensions, thereby improving the accuracy of gesture recognition, and obtain the two-dimensional joint point topology according to the corrected two-dimensional joint points Figure; the topological relationship between nodes is used when the two-dimensional joint point topology map is convolved to further ensure the accuracy of gesture recognition.
  • The biometric-based gesture recognition device 300 further includes: a data set acquisition module, a first training module, and a second training module. Among them:
  • the data set acquisition module is used to acquire the real hand data set and the virtual hand data set.
  • the first training module is used to perform first training on the initial target detection network according to the virtual hand data set.
  • The second training module is used to perform second training, according to the real hand data set, on the initial target detection network that has completed the first training, to obtain the target detection network.
  • the initial target detection network is first trained based on the virtual hand data set to enable it to have hand feature detection capabilities, and then based on the real hand data set, the initial target detection network’s ability to detect the real environment is enhanced to ensure the training results The detection accuracy of the target detection network.
  • The aforementioned hand detection module 302 includes: an image input sub-module, a pixel determination sub-module, and a pixel labeling sub-module. Among them:
  • The image input sub-module is used to input the hand feature image into the joint point extraction network to obtain several heat maps.
  • The pixel determination sub-module is used to determine, in each of the several heat maps, the pixel point with the largest heat value.
  • The pixel labeling sub-module is used to mark the determined pixels as two-dimensional joint points, obtaining several heat maps marked with two-dimensional joint points.
  • In this embodiment, heat maps of the hand feature image are generated, in which the heat value of a pixel represents the probability that the pixel is a two-dimensional joint point; the pixel with the largest heat value is selected from each heat map as the two-dimensional joint point, ensuring the accuracy of joint point recognition.
  • The aforementioned hand detection module 302 further includes: an acquisition sub-module, an input sub-module, a determination sub-module, and an adjustment sub-module. Among them:
  • the acquisition sub-module is used to acquire the joint point extraction data set.
  • the input sub-module is used to input the hand image in the joint point extraction data set into the initial joint point extraction network to obtain a predicted heat map.
  • the determination sub-module is used to determine the prediction error according to the prediction heat map and the annotation heat map in the joint point extraction data set.
  • the adjustment sub-module is used to adjust the initial joint point extraction network according to the prediction error until the prediction error satisfies the training stop condition, and the joint point extraction network is obtained.
  • In this embodiment, the prediction error is calculated from the output predicted heat map and the annotated heat map, and the initial joint point extraction network is adjusted according to the prediction error until the prediction error meets the training stop condition, enabling the network at the end of training to accurately identify the joint points.
  • The aforementioned joint correction module 304 further includes: a parameter acquisition sub-module, a three-dimensional determination sub-module, a three-dimensional projection sub-module, and a topology generation sub-module. Among them:
  • the parameter acquisition sub-module is used to input a number of heat maps into the three-dimensional correction model to correct the two-dimensional joint points in the several heat maps through the three-dimensional correction model, and obtain the spatial geometric parameters of the hand recorded by each two-dimensional joint point.
  • the three-dimensional determination sub-module is used to calculate the three-dimensional joint points corresponding to each two-dimensional joint point through the spatial geometric parameters.
  • the three-dimensional projection sub-module is used to project the obtained three-dimensional joint points to obtain the corrected two-dimensional joint points.
  • the topology generation sub-module is used to generate a two-dimensional joint point topology map corresponding to the corrected two-dimensional joint point.
  • In this embodiment, the spatial geometric parameters are obtained from the heat maps through the three-dimensional correction model, and in this process the two-dimensional joint points are three-dimensionally constrained and corrected, reducing the impact of complex gestures in the real environment on recognition; the three-dimensional joint points corresponding to the two-dimensional joint points are mapped out according to the spatial geometric parameters and then projected to obtain the corrected two-dimensional joint points, whose more accurate position information yields a more accurate two-dimensional joint point topology map and thereby ensures the accuracy of gesture recognition.
  • The above-mentioned biometric-based gesture recognition device 300 further includes: a correction acquisition module, a data set extraction module, and a model training module. Among them:
  • the correction acquisition module is used to acquire the joint point correction data set.
  • the data set extraction module is used to extract two-dimensional joint points, spatial geometric parameters and label data corresponding to the extracted two-dimensional joint points from the joint point correction data set.
  • the model training module is used to train the initial three-dimensional correction model according to the extracted two-dimensional joint points, spatial geometric parameters and label data to obtain a three-dimensional correction model.
  • In this embodiment, the two-dimensional joint points in the joint point correction data set are used as input, the spatial geometric parameters are used as the expected output, and the initial three-dimensional correction model is trained based on the label data, ensuring that the trained three-dimensional correction model can accurately calculate the spatial geometric parameters of the hand from the two-dimensional joint points.
  • The above-mentioned model training module includes: a two-dimensional input sub-module, an error determination sub-module, a pose determination sub-module, a factor acquisition sub-module, and a model adjustment sub-module. Among them:
  • the two-dimensional input sub-module is used to input the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain spatial geometric prediction parameters.
  • the error determination sub-module is used to determine the prediction error according to the spatial geometric prediction parameters and the spatial geometric parameters.
  • the posture determination sub-module is used to determine whether the hand recorded by the extracted two-dimensional joint points has an abnormal posture according to the tag data.
  • the factor acquisition sub-module is used to acquire the correction factor when the pose is abnormal.
  • the model adjustment sub-module is used to adjust the initial three-dimensional correction model according to the correction factor and the prediction error, until the prediction error meets the training stop condition, and the three-dimensional correction model is obtained.
  • When the posture is abnormal, an additional correction factor is obtained to correct the initial three-dimensional correction model; the correction factor and the prediction error act on the initial three-dimensional correction model simultaneously, so that it predicts more reasonable spatial geometric prediction parameters and at the same time corrects the two-dimensional joint points more reasonably, thereby improving the accuracy of gesture recognition.
  • FIG. 7 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are connected to each other in communication via a system bus. It should be pointed out that the figure only shows the computer device 4 with components 41-43, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and the like.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 41 includes at least one type of computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and so on.
  • the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4.
  • The memory 41 may also be an external storage device of the computer device 4, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 4.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as computer-readable instructions of a gesture recognition method based on biometrics.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • the processor 42 is generally used to control the overall operation of the computer device 4.
  • the processor 42 is configured to run computer-readable instructions or process data stored in the memory 41, for example, run the computer-readable instructions of the biometric-based gesture recognition method.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • the computer device provided in this embodiment can execute the steps of the above-mentioned biometric-based gesture recognition method.
  • The steps of the biometric-based gesture recognition method here may be the steps of the biometric-based gesture recognition method in each of the foregoing embodiments.
  • After the image to be recognized is acquired, the hand feature image is first obtained from it; the two-dimensional joint points in the hand feature image are initially obtained through heat maps and marked in the heat maps; the two-dimensional joint points are then input into the three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology map is obtained from the corrected two-dimensional joint points; graph convolution over the two-dimensional joint point topology map uses the topological relationship between nodes, further ensuring the accuracy of gesture recognition.
  • the present application also provides another implementation manner, that is, a computer-readable storage medium is provided with computer-readable instructions stored thereon, and the computer-readable instructions can be executed by at least one processor to The at least one processor is caused to execute the steps of the biometric-based gesture recognition method as described above.
  • After the image to be recognized is acquired, the hand feature image is first obtained from it; the two-dimensional joint points in the hand feature image are initially obtained through heat maps and marked in the heat maps; the two-dimensional joint points are then input into the three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology map is obtained from the corrected two-dimensional joint points; graph convolution over the two-dimensional joint point topology map uses the topological relationship between nodes, further ensuring the accuracy of gesture recognition.
  • The methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; of course, they can also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solution of this application, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A biometric-based gesture recognition method, apparatus, computer device, and storage medium. The method includes: acquiring an image to be recognized (S201); inputting the image to be recognized into a target detection network to obtain a hand feature image (S202); determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points (S203); correcting the two-dimensional joint points in the heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph (S204); and performing graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized (S205), where the gesture category may be stored in a blockchain. The method improves the accuracy of gesture recognition.

Description

Biometric-based gesture recognition method, apparatus, computer device, and medium
This application claims priority to Chinese patent application No. 202010659074.1, filed with the Chinese Patent Office on July 9, 2020 and entitled "Biometric-based gesture recognition method, apparatus, computer device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a biometric-based gesture recognition method, apparatus, computer device, and storage medium.
Background
With the development of artificial intelligence, gesture recognition is increasingly used in fields such as home entertainment, intelligent driving, and smart wearables. Gesture recognition involves liveness detection in biometrics: gestures are recognized from images of hand features so that the next operation can be performed according to the meaning of the gesture, for example triggering a corresponding instruction based on the gesture.
Recognition accuracy is of great significance to gesture recognition. Traditional gesture recognition techniques usually acquire a hand picture and recognize it through a vision algorithm or a neural network. The inventors realized that hand poses in practical applications are complex and variable; for example, fingers may occlude themselves or intertwine. However, neither vision algorithms nor neural networks can effectively cope with such varied hand poses, so the accuracy of gesture recognition is low.
Summary
The purpose of the embodiments of this application is to propose a biometric-based gesture recognition method, apparatus, computer device, and storage medium, so as to solve the problem of low gesture recognition accuracy.
To solve the above technical problem, an embodiment of this application provides a biometric-based gesture recognition method, adopting the following technical solution:
acquiring an image to be recognized;
inputting the image to be recognized into a target detection network to obtain a hand feature image;
determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points;
correcting the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph;
performing graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
To solve the above technical problem, an embodiment of this application further provides a biometric-based gesture recognition apparatus, adopting the following technical solution:
an image acquisition module, configured to acquire an image to be recognized;
a hand detection module, configured to input the image to be recognized into a target detection network to obtain a hand feature image;
a joint annotation module, configured to determine two-dimensional joint points in the hand feature image and obtain several heat maps annotated with the two-dimensional joint points;
a joint correction module, configured to correct the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph;
a joint convolution module, configured to perform graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
To solve the above technical problem, an embodiment of this application further provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions and the processor implements the following steps when executing the computer-readable instructions:
acquiring an image to be recognized;
inputting the image to be recognized into a target detection network to obtain a hand feature image;
determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points;
correcting the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph;
performing graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
To solve the above technical problem, an embodiment of this application further provides a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions, when executed by a processor, implement the following steps:
acquiring an image to be recognized;
inputting the image to be recognized into a target detection network to obtain a hand feature image;
determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points;
correcting the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph;
performing graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
Compared with the prior art, the embodiments of this application mainly have the following beneficial effects: after the image to be recognized is acquired, a hand feature image is first obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through heat maps and annotated in the heat maps; the two-dimensional joint points are then input into a three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology graph is obtained from the corrected joint points; graph convolution of the topology graph exploits the topological relationships between nodes, further ensuring the accuracy of gesture recognition.
Brief Description of the Drawings
To explain the solutions in this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of an exemplary system architecture to which this application can be applied;
FIG. 2 is a flowchart of an embodiment of the biometric-based gesture recognition method according to this application;
FIG. 3 is a schematic diagram of a two-dimensional joint point topology graph in an embodiment;
FIG. 4 is a flowchart of a specific implementation of step S203 in FIG. 2;
FIG. 5 is a flowchart of a specific implementation of step S204 in FIG. 2;
FIG. 6 is a schematic structural diagram of an embodiment of the biometric-based gesture recognition apparatus according to this application;
FIG. 7 is a schematic structural diagram of an embodiment of the computer device according to this application.
Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application; the terms used in the specification are only for describing specific embodiments and are not intended to limit this application; the terms "including" and "having" and any variations thereof in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, the claims, or the above drawings are used to distinguish different objects, not to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links or fiber-optic cables.
Users may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, and desktop computers.
The server 105 may be a server providing various services, for example a background server supporting the pages displayed on the terminal devices 101, 102, 103.
It should be noted that the biometric-based gesture recognition method provided in the embodiments of this application is generally executed by the server; correspondingly, the biometric-based gesture recognition apparatus is generally disposed in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there may be any number of terminal devices, networks, and servers according to implementation needs.
Continuing to refer to FIG. 2, a flowchart of an embodiment of the biometric-based gesture recognition method according to this application is shown. The biometric-based gesture recognition method includes the following steps:
Step S201: acquire an image to be recognized.
In this embodiment, the electronic device on which the biometric-based gesture recognition method runs (for example, the server shown in FIG. 1) may communicate with terminal devices through a wired or wireless connection. The wireless connection may include, but is not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra wideband) connections, and other wireless connection methods now known or developed in the future.
The image to be recognized may be an image used for gesture recognition.
Specifically, a terminal captures the image to be recognized and sends it to the server. An application or page in the terminal supports the gesture recognition function, and the user operates the terminal as instructed to input the image to be recognized.
The server may also read a stored image from a database as the image to be recognized.
Step S202: input the image to be recognized into a target detection network to obtain a hand feature image.
The target detection network may be a network for detecting hand features in the image to be recognized.
Specifically, after acquiring the image to be recognized, the server first detects whether hand features exist in it. The server may input the image into a trained target detection network, which identifies the hand features in the image and crops them out, obtaining the hand feature image.
In one embodiment, the hand features may be human hand features, or hand-like features with the same structure as a human hand, such as a toy hand or a doll hand.
In one embodiment, the target detection network is designed as a lightweight network improved from the SSD network (SSD, Single Shot MultiBox Detector, a target detection algorithm). Specifically, the VGG network in the SSD network may be replaced with a MobileNet v2 network. VGG is a deep convolutional neural network that uses a large number of convolutional layers, each with many convolution kernels, so it is computationally expensive and demands substantial computing resources. MobileNet v2, by contrast, is a lightweight convolutional neural network that uses a large number of depthwise separable convolutions, requires less computation, and runs faster. Using the MobileNet v2 network in the target detection network can increase the server's computing speed.
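The parameter saving behind this substitution can be seen directly by comparing a standard convolution with its depthwise separable counterpart. A minimal sketch in PyTorch (the channel sizes are illustrative, not taken from the patent):

```python
import torch
import torch.nn as nn

def param_count(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# A standard 3x3 convolution, as used heavily in VGG-style backbones.
standard = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1)

# The MobileNet-style replacement: a 3x3 depthwise convolution
# (one filter per input channel, groups=in_channels) followed by a
# 1x1 pointwise convolution that mixes channels.
depthwise_separable = nn.Sequential(
    nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=128),  # depthwise
    nn.Conv2d(128, 256, kernel_size=1),                         # pointwise
)

x = torch.randn(1, 128, 32, 32)
assert standard(x).shape == depthwise_separable(x).shape  # same output size

print("standard conv params:           ", param_count(standard))             # 295,168
print("depthwise separable conv params:", param_count(depthwise_separable))  # 34,304
```

The roughly 8.6x parameter reduction at identical output shape is what makes the MobileNet v2 backbone cheaper to run than VGG.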
Step S203: determine two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points.
A two-dimensional joint point may be a joint point in the hand feature image that carries two-dimensional coordinate information.
Specifically, the server inputs the hand feature image into a joint point extraction network, which can identify the joint points in the hand feature image. The joint point extraction network generates several heat maps; different pixels in a heat map may have different colors, with a designated color marking the pixels that constitute a two-dimensional joint point. A heat map assigns a probability to each pixel, namely the probability that the pixel belongs to a particular two-dimensional joint point, and pixels are selected as joint points according to these probabilities. The network generates as many heat maps as there are two-dimensional joint points to extract, one heat map per joint point.
In one embodiment, the server divides the hand feature image into image regions of equal size, each consisting of several pixels (for example 2x2 or 3x3 pixels, not detailed here), computes per region the probability that the region belongs to a two-dimensional joint point, and generates several heat maps annotated with the joint points.
In one embodiment, the joint point extraction network may be a Stacked Hourglass Network, which is commonly used to identify key points in two-dimensional pose recognition.
In one embodiment, the joint point extraction network may extract 21 two-dimensional joint points from the hand feature image.
Step S204: correct the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph.
The three-dimensional correction model may be a model that corrects two-dimensional joint points.
Specifically, after obtaining the several heat maps annotated with two-dimensional joint points, the server inputs them into the three-dimensional correction model. The model can constrain and correct the two-dimensional joint points in three dimensions so that the corresponding three-dimensional joint points are distributed more plausibly in space, and then derives two-dimensional joint points from the three-dimensional ones again, completing the correction of the two-dimensional joint points.
According to the positions of the corrected two-dimensional joint points, the server adds them to an initial topology graph and connects them, obtaining the two-dimensional joint point topology graph.
FIG. 3 is a schematic diagram of a two-dimensional joint point topology graph in an embodiment. Specifically, referring to FIG. 3, the joint point extraction network extracted and numbered 21 joint points from the hand feature image; to show the correspondence between the joint points and the hand, FIG. 3 also includes the hand feature image.
Step S205: perform graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
Specifically, there are direct geometric relationships between the joints of the hand. To improve the accuracy of gesture recognition, a graph convolution operation can be performed on the two-dimensional joint point topology graph so as to exploit the topological relationships between joint points.
In graph convolution, each two-dimensional joint point is treated as a vertex, and the computation range of each vertex is the set consisting of the vertex itself and its adjacent vertices. For example, referring to FIG. 3, when computing vertex 17, the vertices involved are vertices 17, 0, 13, and 18; when computing vertex 14, the vertices involved are vertices 14, 13, and 15. The lines between vertices are the edges of the topology graph.
In one embodiment, the graph convolution is computed as:

$f_{out}(x_i) = \sum_{x_j \in I(p(x_i))} \frac{1}{n_i} \, w_{ij} \, x_j$

where $V(i)$ is the edge set of vertex $i$; since each vertex has different adjacent vertices, the edges involved in the computation differ from vertex to vertex. $n_i$ is the number of vertices involved when computing vertex $i$. $x_i$ is the value of each vertex, and the graph convolutional network outputs a set of $f_{out}(x_i)$ according to its input. $I(p(x_i))$ is the set of vertices involved when computing a vertex. $w_{ij}$ are the edge weights: training parameters of the graph convolutional network that appear when the network is initialized and are continuously updated during computation.
The graph convolutional network is connected to a softmax layer, which computes multiple probabilities, each corresponding to a gesture category; the gesture category with the maximum probability is selected as the gesture category in the image to be recognized.
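A minimal sketch of such a graph convolution over the 21-joint hand topology, in PyTorch. The edge list follows the FIG. 3 examples where stated (vertex 17 adjacent to 0, 13, and 18; vertex 14 adjacent to 13 and 15) and is otherwise illustrative; the hidden size and the number of gesture categories are assumptions, and the per-edge weights $w_{ij}$ are folded into shared linear layers here for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative skeleton over 21 hand joints (wrist = 0, four joints per finger).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4),           # thumb
         (0, 5), (5, 6), (6, 7), (7, 8),           # index
         (0, 9), (9, 10), (10, 11), (11, 12),      # middle
         (0, 13), (13, 14), (14, 15), (15, 16),    # ring
         (0, 17), (13, 17), (17, 18), (18, 19), (19, 20)]  # little finger

def normalized_adjacency(num_nodes: int, edges) -> torch.Tensor:
    """A + I, row-normalized: each vertex averages over itself and its
    neighbors, which realizes the 1/n_i factor of the formula above."""
    A = torch.eye(num_nodes)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A / A.sum(dim=1, keepdim=True)

class HandGCN(nn.Module):
    def __init__(self, num_joints=21, in_dim=2, hidden=64, num_classes=10):
        super().__init__()
        self.register_buffer("A", normalized_adjacency(num_joints, EDGES))
        self.w1 = nn.Linear(in_dim, hidden)   # trainable weights, updated in training
        self.w2 = nn.Linear(hidden, hidden)
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, x):                 # x: (batch, 21, 2) joint coordinates
        x = F.relu(self.A @ self.w1(x))   # aggregate each vertex with its neighbors
        x = F.relu(self.A @ self.w2(x))
        return self.cls(x.mean(dim=1))    # pool over vertices, then classify

logits = HandGCN()(torch.randn(4, 21, 2))
probs = F.softmax(logits, dim=1)          # one probability per gesture category
pred = probs.argmax(dim=1)                # category with the maximum probability
```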
It should be emphasized that, to further ensure the privacy and security of the recognized gesture category, the gesture category may also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a string of data blocks generated in association with each other using cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
In this embodiment, after the image to be recognized is acquired, a hand feature image is first obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through heat maps and annotated in the heat maps; the two-dimensional joint points are then input into a three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology graph is obtained from the corrected joint points; graph convolution of the topology graph exploits the topological relationships between nodes, further ensuring the accuracy of gesture recognition.
In one embodiment, before step S202, the method may further include: acquiring a real hand dataset and a virtual hand dataset; performing first training on an initial target detection network according to the virtual hand dataset; and performing second training, according to the real hand dataset, on the initial target detection network that has completed the first training, to obtain the target detection network.
The real hand dataset may be a dataset captured from human hands; the virtual hand dataset may be a dataset synthesized from virtual hands, for example by building hand features through three-dimensional modeling and acquiring the relevant data.
Both the real and the virtual hand datasets include information such as RGB images of hands (images based on the RGB color mode, which produces a wide variety of colors by varying the red, green, and blue channels and superimposing them on each other), two-dimensional joint points, and hand feature annotation data.
In one embodiment, the real hand dataset may be the HMKA dataset (Hands with Manual Keypoint Annotations), and the virtual hand dataset may be the HSD dataset (Hands from Synthetic Data).
The initial target detection network may be a model that has not yet completed target detection training.
Specifically, the server obtains the target detection network through training. The server acquires the real and virtual hand datasets for target detection training. It first performs the first training of the initial target detection network on the virtual hand dataset; after the first training, the network has basic hand detection ability. It then performs the second training, on the real hand dataset, on the network that completed the first training, to strengthen the model's ability to detect hands in real environments; the target detection network is obtained when the second training ends. The first training may serve as pre-training for the second training.
In this embodiment, the initial target detection network is first trained on the virtual hand dataset so that it can detect hand features, and its detection ability in real environments is then strengthened with the real hand dataset, ensuring the detection accuracy of the resulting target detection network.
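A minimal sketch of this two-stage schedule. The stand-in model, loss, datasets, epoch counts, and learning rates below are illustrative assumptions, not taken from the patent; in practice they would be the SSD-style detector, its box and class losses, and the HSD and HMKA datasets:

```python
import torch
import torch.nn as nn

def train_stage(model, loader, epochs, lr):
    """One training stage: a plain supervised loop over a dataloader."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # stand-in loss; a real detector uses box/class losses
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss = loss_fn(model(images), targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Stand-in detector and synthetic "loaders" so the sketch runs end to end.
detector = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 4))  # predicts a box
make_loader = lambda n: [(torch.randn(8, 3, 64, 64), torch.randn(8, 4)) for _ in range(n)]

# First training (pre-training) on virtual hand data, then second training
# (fine-tuning) on real hand data, typically with a smaller learning rate
# so the pre-trained detection ability is preserved.
detector = train_stage(detector, make_loader(10), epochs=2, lr=1e-3)  # virtual
detector = train_stage(detector, make_loader(10), epochs=2, lr=1e-4)  # real
```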
Further, as shown in FIG. 4, step S203 may include:
Step S2031: input the hand feature image into the joint point extraction network to obtain several heat maps.
The joint point extraction network may be a network for identifying two-dimensional joint points.
Specifically, the server inputs the hand feature image into the joint point extraction network. Rather than predicting the positions of the two-dimensional joint points directly, the network generates heat maps of the joint points, which avoids training that fails to converge or yields low accuracy after training. The server may generate multiple heat maps, one per two-dimensional joint point.
Step S2032: determine, in each of the several heat maps, the pixel with the maximum heat value.
Specifically, each pixel in a heat map has a heat value whose magnitude is proportional to the probability that the two-dimensional joint point lies at that pixel. The server traverses the pixels of each heat map, compares their heat values, and determines the pixel with the maximum heat value.
Step S2033: annotate the determined pixels as two-dimensional joint points to obtain several heat maps annotated with the two-dimensional joint points.
Specifically, for each heat map, the server annotates the pixel with the maximum heat value as the two-dimensional joint point of that heat map, obtaining several heat maps annotated with two-dimensional joint points.
In this embodiment, heat maps of the hand feature image are generated; the heat value of a pixel characterizes the probability that the pixel belongs to a two-dimensional joint point, and the pixel with the maximum heat value is selected from each heat map as the joint point, ensuring the accuracy of joint point recognition.
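The per-map selection reduces to an argmax over the two spatial dimensions. A minimal sketch, assuming the heat maps arrive as a NumPy array (the 64x64 map size is illustrative):

```python
import numpy as np

def joints_from_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """heatmaps: (num_joints, H, W) array of heat values.
    Returns (num_joints, 2) integer (row, col) coordinates, one per map."""
    num_joints = heatmaps.shape[0]
    coords = np.zeros((num_joints, 2), dtype=np.int64)
    for k in range(num_joints):
        # index of the pixel with the maximum heat value in map k
        coords[k] = np.unravel_index(np.argmax(heatmaps[k]), heatmaps[k].shape)
    return coords

joints = joints_from_heatmaps(np.random.rand(21, 64, 64))  # 21 hand joints
```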
Further, before step S2031, the method further includes: acquiring a joint point extraction dataset; inputting the hand images in the dataset into an initial joint point extraction network to obtain predicted heat maps; determining a prediction error according to the predicted heat maps and the annotated heat maps in the dataset; and adjusting the initial joint point extraction network according to the prediction error until the prediction error meets a training stop condition, obtaining the joint point extraction network.
The joint point extraction dataset may be a dataset used to train the initial joint point extraction network, and may include hand images and annotated heat maps corresponding to the hand images. In one embodiment, it may be the RHD (Rendered Handpose Dataset) dataset.
The initial joint point extraction network may be a joint point extraction network whose training has not been completed. A predicted heat map may be a heat map obtained when the initial network predicts the two-dimensional joint points. An annotated heat map may be a heat map obtained from pre-annotation.
Specifically, the joint point extraction network is obtained through training. The server acquires the joint point extraction dataset and inputs its hand images into the initial joint point extraction network, which identifies the two-dimensional joint points in the hand images and produces predicted heat maps.
The server extracts from the dataset the annotated heat maps corresponding to the hand images, uses them as the annotation data in training, and computes the prediction error from the annotated and predicted heat maps.
The server adjusts the parameters of the initial network with the goal of reducing the prediction error, continuing training after each adjustment; when the prediction error meets the training stop condition, training stops and the joint point extraction network is obtained. The training stop condition may be that the prediction error is smaller than a preset error threshold.
In one embodiment, the prediction error is computed as:

$L = \sum_{h=1}^{H} \sum_{w=1}^{W} \left\| Y_{HM} - G(Y_{2D}) \right\|^{2}$

where $Y_{HM}$ is the predicted heat map and $G(Y_{2D})$ is the annotated heat map. $H$ and $W$ are the heat map size, that is, the output-layer feature map dimensions, a set of hyperparameters fixed when the initial joint point extraction network is designed. Computing the prediction error requires comparing the predicted and annotated heat maps point by point, that is, summing over the $H$ and $W$ dimensions.
In this embodiment, when the initial joint point extraction network is trained on the joint point extraction dataset, the prediction error is computed from the output predicted heat maps and the annotated heat maps, and the initial network is adjusted according to the prediction error until it meets the training stop condition, so that the network at the end of training can identify joint points accurately.
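A sketch of this prediction error, reading the norm as a per-pixel squared difference and rendering the annotated heat map $G(Y_{2D})$ as a Gaussian centered on the labeled joint; the Gaussian rendering and its sigma are common practice assumed here, not specified by the patent:

```python
import numpy as np

def gaussian_heatmap(h, w, center, sigma=2.0):
    """Render an annotated heat map G(Y_2D): a Gaussian peak at the joint."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))

def prediction_error(pred, target):
    """Sum of squared per-pixel differences over the H and W dimensions."""
    return float(np.sum((pred - target) ** 2))

target = gaussian_heatmap(64, 64, center=(20, 31))
pred = np.random.rand(64, 64)  # stand-in for the network's predicted heat map
print(prediction_error(pred, target))
```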
Further, as shown in FIG. 5, step S204 may include:
Step S2041: input the several heat maps into the three-dimensional correction model, so that the model corrects the two-dimensional joint points in the heat maps, and obtain the spatial geometric parameters of the hand recorded by the two-dimensional joint points.
The spatial geometric parameters may be parameters describing the hand's shape characteristics and the hand's surface.
The hand recorded in the hand feature image is in a complex real environment and may exhibit pose anomalies, such as self-occlusion of the hand, intertwined fingers, everted fingers, and non-coplanar joint points in the palm; these complex gesture shapes affect gesture recognition. The three-dimensional correction model takes the spatial geometric characteristics of the two-dimensional joint points into account during computation and can correct the influence of complex gesture shapes on recognition.
Specifically, the server inputs the heat maps annotated with two-dimensional joint points into the three-dimensional correction model, which computes, from the two-dimensional joint points, the spatial geometric characteristics of the hand recorded in the hand feature image; during this computation it imposes three-dimensional constraints on the two-dimensional joint points, thereby correcting them, and it obtains the spatial geometric parameters when the computation ends.
The spatial geometric parameters may consist of two groups of parameters, shape and pose: the shape parameters describe the hand's shape characteristics, such as finger length, finger size, and palm thickness; the pose parameters describe hand surface information, such as deformation of the hand surface.
In one embodiment, the three-dimensional correction model may be the MANO model, which can output the hand's joint points from the shape and pose parameters. Before the MANO model is used, it must be trained so that it can output shape and pose parameters from the heat maps.
In one embodiment, the heat maps output by the joint point extraction network are first input into a two-dimensional-to-three-dimensional projection network and then, after correction and computation by the MANO model, the MANO layer outputs the shape and pose parameters.
In one embodiment, after the server identifies the two-dimensional joint points from the heat maps, it may annotate them in the hand feature image and input the annotated hand feature image into the three-dimensional correction model, which first converts the hand feature image into several heat maps and then feeds the heat maps into the model.
Step S2042: compute, from the spatial geometric parameters, the three-dimensional joint point corresponding to each two-dimensional joint point.
Specifically, the three-dimensional correction model can map out the hand's three-dimensional joint points from the spatial geometric parameters and build a three-dimensional hand mesh. It can therefore compute each three-dimensional joint point from the spatial geometric parameters, establishing the correspondence from the two-dimensional joint points in the heat maps to the three-dimensional joint points.
Step S2043: project the obtained three-dimensional joint points to obtain corrected two-dimensional joint points.
Specifically, after obtaining the three-dimensional joint points, the three-dimensional correction model projects them according to a three-dimensional-to-two-dimensional projection formula to obtain a new set of two-dimensional joint points; the projected joint points are the corrected two-dimensional joint points.
Step S2044: generate the two-dimensional joint point topology graph corresponding to the corrected two-dimensional joint points.
Specifically, the corrected two-dimensional joint points are ordered and have fixed connection relationships. The server adds them to the initial topology graph and connects them according to the preset fixed connection relationships, obtaining the two-dimensional joint point topology graph.
In this embodiment, the spatial geometric parameters of the heat maps are obtained through the three-dimensional correction model, and the two-dimensional joint points are constrained and corrected in three dimensions in the process, reducing the influence of complex real-world gestures on recognition; the three-dimensional joint points corresponding to the two-dimensional ones are mapped out from the spatial geometric parameters and then projected to obtain the corrected two-dimensional joint points, which carry more accurate position information, so a more accurate two-dimensional joint point topology graph is generated, ensuring the accuracy of gesture recognition through the topology graph.
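A sketch of the three-dimensional-to-two-dimensional projection step, assuming a standard pinhole camera model; the patent does not fix the projection formula, and the intrinsic parameters below are illustrative:

```python
import numpy as np

def project_joints(joints_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Pinhole projection of (N, 3) camera-space joints to (N, 2) pixel coordinates."""
    uvw = joints_3d @ K.T            # (N, 3): homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth to get (u, v)

# Illustrative intrinsics: focal lengths fx, fy and principal point (cx, cy).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

joints_3d = np.random.rand(21, 3) + [0, 0, 5.0]  # 21 joints in front of the camera
corrected_2d = project_joints(joints_3d, K)      # the corrected 2D joint points
```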
Further, before step S204, the method further includes: acquiring a joint point correction dataset; extracting from it two-dimensional joint points, the spatial geometric parameters corresponding to the extracted joint points, and label data; and training an initial three-dimensional correction model according to the extracted two-dimensional joint points, spatial geometric parameters, and label data to obtain the three-dimensional correction model.
The joint point correction dataset may be a dataset used to train the initial three-dimensional correction model; it may be the FreiHand dataset, which records information such as the hand's two-dimensional joint points, three-dimensional joint points, spatial geometric parameters, and label data. The label data may indicate whether the hand has a pose anomaly, and may also be a three-dimensional model of the hand.
Specifically, the server acquires the joint point correction dataset and extracts from it the two-dimensional joint points, the corresponding spatial geometric parameters, and the label data; it uses the two-dimensional joint points as the input of the initial three-dimensional correction model and the spatial geometric parameters as the expected output, and trains the initial model according to the label data. The trained three-dimensional correction model can compute spatial geometric parameters from two-dimensional joint points.
In this embodiment, the two-dimensional joint points in the joint point correction dataset are used as input and the spatial geometric parameters as expected output, and the initial three-dimensional correction model is trained according to the label data, ensuring that the trained model can accurately compute the hand's spatial geometric parameters from two-dimensional joint points.
Further, the step of training the initial three-dimensional correction model according to the extracted two-dimensional joint points, spatial geometric parameters, and label data to obtain the three-dimensional correction model specifically includes: inputting the extracted two-dimensional joint points into the initial model to obtain spatial geometric prediction parameters; determining a prediction error from the spatial geometric prediction parameters and the spatial geometric parameters; determining, from the label data, whether the hand recorded by the extracted joint points has a pose anomaly; acquiring a correction factor when a pose anomaly exists; and adjusting the initial model according to the correction factor and the prediction error until the prediction error meets a training stop condition, obtaining the three-dimensional correction model.
The spatial geometric prediction parameters may be parameters, predicted by the initial three-dimensional correction model from the two-dimensional joint points, that characterize the spatial geometric properties of the hand.
Pose anomalies may include self-occlusion of the hand, intertwined fingers, everted fingers, and non-coplanar joint points in the palm, as well as abnormal joint point positions and inconsistency between the hand contour and the label data.
Specifically, the server computes on the two-dimensional joint points through the initial three-dimensional correction model to obtain the spatial geometric prediction parameters, and determines the prediction error from the prediction parameters and the spatial geometric parameters.
Meanwhile, the label data is used to determine whether the hand recorded by the two-dimensional joint points has a pose anomaly. When one exists, it negatively affects gesture recognition, and the initial model's ability to predict spatial geometric parameters in this situation needs to be strengthened, so a preset correction factor is applied to the initial model. The correction factor acts as an incentive, forcing the initial model to correct the two-dimensional joint points more reasonably and to compute more reasonable spatial geometric prediction parameters.
Under the action of the correction factor and the prediction error, the internal model parameters of the initial three-dimensional correction model are adjusted until the prediction error meets the training stop condition, yielding the three-dimensional correction model.
The trained three-dimensional correction model can accurately compute spatial geometric parameters from two-dimensional joint points, and computing the spatial geometric parameters is also the process of correcting the joint points; through training, the initial model gains the ability to predict spatial geometric parameters, and the resulting model can overcome pose anomalies, improving the accuracy of gesture recognition.
In one embodiment, there may be only one correction factor, or several different ones. When there is only one correction factor and a pose anomaly exists, the single correction factor is directly acquired to correct the initial model. When there are several correction factors and a pose anomaly is determined to exist, an anomaly evaluation value may further be computed from the spatial geometric parameters; the anomaly evaluation value characterizes the degree of pose disorder, and different values correspond to correction factors of different magnitudes. The server selects the correction factor corresponding to the anomaly evaluation value to correct the training of the initial model.
In one embodiment, after training, when the three-dimensional correction model is applied to gesture recognition, the model computes the spatial geometric parameters, computes the anomaly evaluation value from them, and stores the value together with the final gesture recognition result. The model may also compare the anomaly evaluation value with an anomaly threshold: if the value exceeds the preset threshold, gesture recognition is stopped, the terminal displays that the picture is abnormal, and the terminal is prompted to reacquire the image to be recognized so that gesture recognition can be performed anew.
In this embodiment, when a pose anomaly of the hand is determined during training, an additional correction factor is acquired to correct the initial three-dimensional correction model; the correction factor and the prediction error act on the initial model at the same time, enabling it to predict more reasonable spatial geometric prediction parameters and to correct the two-dimensional joint points more reasonably, thereby improving the accuracy of gesture recognition.
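One plausible reading of the correction factor acting together with the prediction error is a training loss whose error term is scaled up on anomalous samples, with the factor chosen from the anomaly evaluation value when several factors exist. The factor values and threshold below are illustrative assumptions, not the patent's specification:

```python
import torch

def training_loss(pred_params, true_params, has_anomaly, anomaly_score=None):
    """Prediction error, up-weighted by a correction factor on anomalous samples."""
    prediction_error = torch.mean((pred_params - true_params) ** 2)
    if not has_anomaly:
        return prediction_error
    if anomaly_score is None:
        factor = 2.0  # single preset correction factor
    else:
        factor = 2.0 if anomaly_score < 0.5 else 4.0  # factor grows with pose disorder
    return factor * prediction_error

loss = training_loss(torch.randn(10), torch.randn(10),
                     has_anomaly=True, anomaly_score=0.7)
```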
The biometric-based gesture recognition method in this application involves neural networks, machine learning, and computer vision in the field of artificial intelligence; it may also involve smart homes and smart living in the field of smart cities.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing relevant hardware; the computer-readable instructions may be stored in a computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), etc.
It should be understood that although the steps in the flowcharts of the drawings are displayed sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to FIG. 6, as an implementation of the method shown in FIG. 2, this application provides an embodiment of a biometric-based gesture recognition apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 6, the biometric-based gesture recognition apparatus 300 of this embodiment includes an image acquisition module 301, a hand detection module 302, a joint annotation module 303, a joint correction module 304, and a joint convolution module 305, where:
the image acquisition module 301 is configured to acquire an image to be recognized;
the hand detection module 302 is configured to input the image to be recognized into the target detection network to obtain a hand feature image;
the joint annotation module 303 is configured to determine two-dimensional joint points in the hand feature image and obtain several heat maps annotated with the two-dimensional joint points;
the joint correction module 304 is configured to correct the two-dimensional joint points in the several heat maps through the three-dimensional correction model to obtain a two-dimensional joint point topology graph;
the joint convolution module 305 is configured to perform graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
In this embodiment, after the image to be recognized is acquired, a hand feature image is first obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through heat maps and annotated in the heat maps; the two-dimensional joint points are then input into a three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology graph is obtained from the corrected joint points; graph convolution of the topology graph exploits the topological relationships between nodes, further ensuring the accuracy of gesture recognition.
In some optional implementations of this embodiment, the biometric-based gesture recognition apparatus 300 further includes a dataset acquisition module, a first training module, and a second training module, where:
the dataset acquisition module is configured to acquire a real hand dataset and a virtual hand dataset;
the first training module is configured to perform first training on the initial target detection network according to the virtual hand dataset;
the second training module is configured to perform second training, according to the real hand dataset, on the initial target detection network that has completed the first training, to obtain the target detection network.
In this embodiment, the initial target detection network is first trained on the virtual hand dataset so that it can detect hand features, and its detection ability in real environments is then strengthened with the real hand dataset, ensuring the detection accuracy of the resulting target detection network.
In some optional implementations of this embodiment, the hand detection module 302 includes an image input submodule, a pixel determination submodule, and a pixel annotation submodule, where:
the image input submodule is configured to input the hand feature image into the joint point extraction network to obtain several heat maps;
the pixel determination submodule is configured to determine, in each of the several heat maps, the pixel with the maximum heat value;
the pixel annotation submodule is configured to annotate the determined pixels as two-dimensional joint points to obtain several heat maps annotated with the two-dimensional joint points.
In this embodiment, heat maps of the hand feature image are generated; the heat value of a pixel characterizes the probability that the pixel belongs to a two-dimensional joint point, and the pixel with the maximum heat value is selected from each heat map as the joint point, ensuring the accuracy of joint point recognition.
In some optional implementations of this embodiment, the hand detection module 302 further includes an acquisition submodule, an input submodule, a determination submodule, and an adjustment submodule, where:
the acquisition submodule is configured to acquire a joint point extraction dataset;
the input submodule is configured to input the hand images in the joint point extraction dataset into the initial joint point extraction network to obtain predicted heat maps;
the determination submodule is configured to determine the prediction error according to the predicted heat maps and the annotated heat maps in the joint point extraction dataset;
the adjustment submodule is configured to adjust the initial joint point extraction network according to the prediction error until the prediction error meets the training stop condition, obtaining the joint point extraction network.
In this embodiment, when the initial joint point extraction network is trained on the joint point extraction dataset, the prediction error is computed from the output predicted heat maps and the annotated heat maps, and the initial network is adjusted according to the prediction error until it meets the training stop condition, so that the network at the end of training can identify joint points accurately.
In some optional implementations of this embodiment, the joint correction module 304 further includes a parameter acquisition submodule, a three-dimensional determination submodule, a three-dimensional projection submodule, and a topology generation submodule, where:
the parameter acquisition submodule is configured to input the several heat maps into the three-dimensional correction model, so that the model corrects the two-dimensional joint points in the heat maps, and to obtain the spatial geometric parameters of the hand recorded by each two-dimensional joint point;
the three-dimensional determination submodule is configured to compute, from the spatial geometric parameters, the three-dimensional joint point corresponding to each two-dimensional joint point;
the three-dimensional projection submodule is configured to project the obtained three-dimensional joint points to obtain the corrected two-dimensional joint points;
the topology generation submodule is configured to generate the two-dimensional joint point topology graph corresponding to the corrected two-dimensional joint points.
In this embodiment, the spatial geometric parameters of the heat maps are obtained through the three-dimensional correction model, and the two-dimensional joint points are constrained and corrected in three dimensions in the process, reducing the influence of complex real-world gestures on recognition; the three-dimensional joint points corresponding to the two-dimensional ones are mapped out from the spatial geometric parameters and then projected to obtain the corrected two-dimensional joint points, which carry more accurate position information, so a more accurate two-dimensional joint point topology graph is generated, ensuring the accuracy of gesture recognition through the topology graph.
In some optional implementations of this embodiment, the biometric-based gesture recognition apparatus 300 further includes a correction acquisition module, a dataset extraction module, and a model training module, where:
the correction acquisition module is configured to acquire a joint point correction dataset;
the dataset extraction module is configured to extract, from the joint point correction dataset, two-dimensional joint points, the spatial geometric parameters corresponding to the extracted joint points, and label data;
the model training module is configured to train the initial three-dimensional correction model according to the extracted two-dimensional joint points, spatial geometric parameters, and label data to obtain the three-dimensional correction model.
In this embodiment, the two-dimensional joint points in the joint point correction dataset are used as input and the spatial geometric parameters as expected output, and the initial three-dimensional correction model is trained according to the label data, ensuring that the trained model can accurately compute the hand's spatial geometric parameters from two-dimensional joint points.
In some optional implementations of this embodiment, the model training module includes a two-dimensional input submodule, an error determination submodule, a pose determination submodule, a factor acquisition submodule, and a model adjustment submodule, where:
the two-dimensional input submodule is configured to input the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain the spatial geometric prediction parameters;
the error determination submodule is configured to determine the prediction error according to the spatial geometric prediction parameters and the spatial geometric parameters;
the pose determination submodule is configured to determine, according to the label data, whether the hand recorded by the extracted two-dimensional joint points has a pose anomaly;
the factor acquisition submodule is configured to acquire the correction factor when a pose anomaly exists;
the model adjustment submodule is configured to adjust the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets the training stop condition, obtaining the three-dimensional correction model.
In this embodiment, when a pose anomaly of the hand is determined during training, an additional correction factor is acquired to correct the initial three-dimensional correction model; the correction factor and the prediction error act on the initial model at the same time, enabling it to predict more reasonable spatial geometric prediction parameters and to correct the two-dimensional joint points more reasonably, thereby improving the accuracy of gesture recognition.
To solve the above technical problem, an embodiment of this application further provides a computer device. Referring specifically to FIG. 7, FIG. 7 is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that communicate with each other through a system bus. It should be noted that the figure shows only a computer device 4 with components 41-43, but it should be understood that not all of the shown components are required to be implemented; more or fewer components may be implemented instead. Those skilled in the art understand that the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions; its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device may interact with the user through a keyboard, mouse, remote control, touchpad, or voice control device.
The memory 41 includes at least one type of computer-readable storage medium, which may be non-volatile or volatile and includes flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and so on. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as its hard disk or internal memory. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and various kinds of application software installed on the computer device 4, such as the computer-readable instructions of the biometric-based gesture recognition method; it may also be used to temporarily store various kinds of data that have been output or are to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is used to run the computer-readable instructions or process the data stored in the memory 41, for example to run the computer-readable instructions of the biometric-based gesture recognition method.
The network interface 43 may include a wireless network interface or a wired network interface and is generally used to establish communication connections between the computer device 4 and other electronic devices.
The computer device provided in this embodiment can execute the steps of the above biometric-based gesture recognition method; here, those steps may be the steps in the biometric-based gesture recognition method of each of the foregoing embodiments.
In this embodiment, after the image to be recognized is acquired, a hand feature image is first obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through heat maps and annotated in the heat maps; the two-dimensional joint points are then input into a three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology graph is obtained from the corrected joint points; graph convolution of the topology graph exploits the topological relationships between nodes, further ensuring the accuracy of gesture recognition.
This application further provides another implementation, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so as to cause the at least one processor to execute the steps of the biometric-based gesture recognition method described above.
Here too, after the image to be recognized is acquired, a hand feature image is first obtained from it; two-dimensional joint points in the hand feature image are preliminarily obtained through heat maps and annotated in the heat maps; the two-dimensional joint points are then input into a three-dimensional correction model, which constrains and corrects them in three dimensions, thereby improving the accuracy of gesture recognition, and a two-dimensional joint point topology graph is obtained from the corrected joint points; graph convolution of the topology graph exploits the topological relationships between nodes, further ensuring the accuracy of gesture recognition.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of this application, or the part that contributes to the existing technology, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
Obviously, the embodiments described above are only a part of the embodiments of this application, not all of them; the drawings show preferred embodiments of this application but do not limit its patent scope. This application may be implemented in many different forms; rather, these embodiments are provided so that the understanding of the disclosure of this application is more thorough and comprehensive. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions recorded in the foregoing specific implementations or make equivalent replacements of some of their technical features. Any equivalent structure made using the contents of the specification and drawings of this application and applied directly or indirectly in other related technical fields likewise falls within the scope of patent protection of this application.

Claims (20)

  1. A biometric-based gesture recognition method, comprising the following steps:
    acquiring an image to be recognized;
    inputting the image to be recognized into a target detection network to obtain a hand feature image;
    determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points;
    correcting the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph;
    performing graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
  2. The biometric-based gesture recognition method according to claim 1, wherein before the step of inputting the image to be recognized into a target detection network to obtain a hand feature image, the following steps are further included:
    acquiring a real hand dataset and a virtual hand dataset;
    performing first training on an initial target detection network according to the virtual hand dataset;
    performing second training, according to the real hand dataset, on the initial target detection network that has completed the first training, to obtain the target detection network.
  3. The biometric-based gesture recognition method according to claim 1, wherein the step of determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points specifically comprises:
    inputting the hand feature image into a joint point extraction network to obtain several heat maps;
    determining, in each of the several heat maps, the pixel with the maximum heat value;
    annotating the determined pixels as two-dimensional joint points to obtain several heat maps annotated with the two-dimensional joint points.
  4. The biometric-based gesture recognition method according to claim 3, wherein before the step of inputting the hand feature image into a joint point extraction network to obtain several heat maps, the following steps are further included:
    acquiring a joint point extraction dataset;
    inputting the hand images in the joint point extraction dataset into an initial joint point extraction network to obtain predicted heat maps;
    determining a prediction error according to the predicted heat maps and the annotated heat maps in the joint point extraction dataset;
    adjusting the initial joint point extraction network according to the prediction error until the prediction error meets a training stop condition, to obtain the joint point extraction network.
  5. The biometric-based gesture recognition method according to claim 1, wherein the step of correcting the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph specifically comprises:
    inputting the several heat maps into the three-dimensional correction model, so that the three-dimensional correction model corrects the two-dimensional joint points in the several heat maps, and obtaining spatial geometric parameters of the hand recorded by each two-dimensional joint point;
    computing, through the spatial geometric parameters, the three-dimensional joint points corresponding to the respective two-dimensional joint points;
    projecting the obtained three-dimensional joint points to obtain corrected two-dimensional joint points;
    generating a two-dimensional joint point topology graph corresponding to the corrected two-dimensional joint points.
  6. The biometric-based gesture recognition method according to claim 1, wherein before the step of correcting the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph, the following steps are further included:
    acquiring a joint point correction dataset;
    extracting, from the joint point correction dataset, two-dimensional joint points, spatial geometric parameters corresponding to the extracted two-dimensional joint points, and label data;
    training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the spatial geometric parameters, and the label data to obtain the three-dimensional correction model.
  7. The biometric-based gesture recognition method according to claim 6, wherein the step of training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the spatial geometric parameters, and the label data to obtain the three-dimensional correction model specifically comprises:
    inputting the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain spatial geometric prediction parameters;
    determining a prediction error according to the spatial geometric prediction parameters and the spatial geometric parameters;
    determining, according to the label data, whether the hand recorded by the extracted two-dimensional joint points has a pose anomaly;
    acquiring a correction factor when a pose anomaly exists;
    adjusting the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets a training stop condition, to obtain the three-dimensional correction model.
  8. A biometric-based gesture recognition apparatus, comprising:
    an image acquisition module, configured to acquire an image to be recognized;
    a hand detection module, configured to input the image to be recognized into a target detection network to obtain a hand feature image;
    a joint annotation module, configured to determine two-dimensional joint points in the hand feature image and obtain several heat maps annotated with the two-dimensional joint points;
    a joint correction module, configured to correct the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph;
    a joint convolution module, configured to perform graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring an image to be recognized;
    inputting the image to be recognized into a target detection network to obtain a hand feature image;
    determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points;
    correcting the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph;
    performing graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
  10. The computer device according to claim 9, wherein the step of determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points specifically comprises:
    inputting the hand feature image into a joint point extraction network to obtain several heat maps;
    determining, in each of the several heat maps, the pixel with the maximum heat value;
    annotating the determined pixels as two-dimensional joint points to obtain several heat maps annotated with the two-dimensional joint points.
  11. The computer device according to claim 10, wherein before the step of inputting the hand feature image into a joint point extraction network to obtain several heat maps, the following steps are further included:
    acquiring a joint point extraction dataset;
    inputting the hand images in the joint point extraction dataset into an initial joint point extraction network to obtain predicted heat maps;
    determining a prediction error according to the predicted heat maps and the annotated heat maps in the joint point extraction dataset;
    adjusting the initial joint point extraction network according to the prediction error until the prediction error meets a training stop condition, to obtain the joint point extraction network.
  12. The computer device according to claim 9, wherein the step of correcting the two-dimensional joint points in the several heat maps through the three-dimensional correction model to obtain a two-dimensional joint point topology graph specifically comprises:
    inputting the several heat maps into the three-dimensional correction model, so that the three-dimensional correction model corrects the two-dimensional joint points in the several heat maps, and obtaining spatial geometric parameters of the hand recorded by each two-dimensional joint point;
    computing, through the spatial geometric parameters, the three-dimensional joint points corresponding to the respective two-dimensional joint points;
    projecting the obtained three-dimensional joint points to obtain corrected two-dimensional joint points;
    generating a two-dimensional joint point topology graph corresponding to the corrected two-dimensional joint points.
  13. The computer device according to claim 9, wherein before the step of correcting the two-dimensional joint points in the several heat maps through the three-dimensional correction model to obtain a two-dimensional joint point topology graph, the following steps are further included:
    acquiring a joint point correction dataset;
    extracting, from the joint point correction dataset, two-dimensional joint points, spatial geometric parameters corresponding to the extracted two-dimensional joint points, and label data;
    training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the spatial geometric parameters, and the label data to obtain the three-dimensional correction model.
  14. The computer device according to claim 13, wherein the step of training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the spatial geometric parameters, and the label data to obtain the three-dimensional correction model specifically comprises:
    inputting the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain spatial geometric prediction parameters;
    determining a prediction error according to the spatial geometric prediction parameters and the spatial geometric parameters;
    determining, according to the label data, whether the hand recorded by the extracted two-dimensional joint points has a pose anomaly;
    acquiring a correction factor when a pose anomaly exists;
    adjusting the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets a training stop condition, to obtain the three-dimensional correction model.
  15. A computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    acquiring an image to be recognized;
    inputting the image to be recognized into a target detection network to obtain a hand feature image;
    determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points;
    correcting the two-dimensional joint points in the several heat maps through a three-dimensional correction model to obtain a two-dimensional joint point topology graph;
    performing graph convolution on the two-dimensional joint point topology graph to obtain the gesture category in the image to be recognized.
  16. The computer-readable storage medium according to claim 15, wherein the step of determining two-dimensional joint points in the hand feature image to obtain several heat maps annotated with the two-dimensional joint points specifically comprises:
    inputting the hand feature image into a joint point extraction network to obtain several heat maps;
    determining, in each of the several heat maps, the pixel with the maximum heat value;
    annotating the determined pixels as two-dimensional joint points to obtain several heat maps annotated with the two-dimensional joint points.
  17. The computer-readable storage medium according to claim 16, wherein before the step of inputting the hand feature image into a joint point extraction network to obtain several heat maps, the following steps are further included:
    acquiring a joint point extraction dataset;
    inputting the hand images in the joint point extraction dataset into an initial joint point extraction network to obtain predicted heat maps;
    determining a prediction error according to the predicted heat maps and the annotated heat maps in the joint point extraction dataset;
    adjusting the initial joint point extraction network according to the prediction error until the prediction error meets a training stop condition, to obtain the joint point extraction network.
  18. The computer-readable storage medium according to claim 15, wherein the step of correcting the two-dimensional joint points in the several heat maps through the three-dimensional correction model to obtain a two-dimensional joint point topology graph specifically comprises:
    inputting the several heat maps into the three-dimensional correction model, so that the three-dimensional correction model corrects the two-dimensional joint points in the several heat maps, and obtaining spatial geometric parameters of the hand recorded by each two-dimensional joint point;
    computing, through the spatial geometric parameters, the three-dimensional joint points corresponding to the respective two-dimensional joint points;
    projecting the obtained three-dimensional joint points to obtain corrected two-dimensional joint points;
    generating a two-dimensional joint point topology graph corresponding to the corrected two-dimensional joint points.
  19. The computer-readable storage medium according to claim 15, wherein before the step of correcting the two-dimensional joint points in the several heat maps through the three-dimensional correction model to obtain a two-dimensional joint point topology graph, the following steps are further included:
    acquiring a joint point correction dataset;
    extracting, from the joint point correction dataset, two-dimensional joint points, spatial geometric parameters corresponding to the extracted two-dimensional joint points, and label data;
    training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the spatial geometric parameters, and the label data to obtain the three-dimensional correction model.
  20. The computer-readable storage medium according to claim 19, wherein the step of training an initial three-dimensional correction model according to the extracted two-dimensional joint points, the spatial geometric parameters, and the label data to obtain the three-dimensional correction model specifically comprises:
    inputting the extracted two-dimensional joint points into the initial three-dimensional correction model to obtain spatial geometric prediction parameters;
    determining a prediction error according to the spatial geometric prediction parameters and the spatial geometric parameters;
    determining, according to the label data, whether the hand recorded by the extracted two-dimensional joint points has a pose anomaly;
    acquiring a correction factor when a pose anomaly exists;
    adjusting the initial three-dimensional correction model according to the correction factor and the prediction error until the prediction error meets a training stop condition, to obtain the three-dimensional correction model.
PCT/CN2020/122833 2020-07-09 2020-10-22 Biometric-based gesture recognition method and apparatus, computer device, and medium WO2021120834A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010659074.1 2020-07-09
CN202010659074.1A CN111832468A (zh) 2020-07-09 2020-07-09 Biometric-based gesture recognition method and apparatus, computer device, and medium

Publications (1)

Publication Number Publication Date
WO2021120834A1 true WO2021120834A1 (zh) 2021-06-24

Family

ID=72899768

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122833 WO2021120834A1 (zh) 2020-07-09 2020-10-22 基于生物识别的手势识别方法、装置、计算机设备及介质

Country Status (2)

Country Link
CN (1) CN111832468A (zh)
WO (1) WO2021120834A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817757A (zh) * 2022-04-02 2022-07-29 广州大学 Cross-social-network virtual identity association method based on graph convolutional networks
CN115346345A (zh) * 2022-07-28 2022-11-15 福建省杭氟电子材料有限公司 Intelligent toxic and harmful gas alarm system for hexafluorobutadiene preparation

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112558856A (zh) * 2020-12-16 2021-03-26 深圳市大中华区块链科技有限公司 Blockchain touch-screen gesture recognition method and system
CN112668543B (zh) * 2021-01-07 2022-07-15 中国科学技术大学 Hand-model-aware isolated sign language word recognition method
CN113191421A (zh) * 2021-04-25 2021-07-30 东北大学 Gesture recognition system and method based on Faster-RCNN
CN113255497B (zh) * 2021-05-17 2022-08-16 南京甄视智能科技有限公司 Multi-scene liveness detection method, system, server, and readable medium based on data synthesis
CN113326751B (zh) * 2021-05-19 2024-02-13 中国科学院上海微系统与信息技术研究所 Method for annotating 3D key points of a hand
CN113239835B (zh) * 2021-05-20 2022-07-15 中国科学技术大学 Model-aware gesture transfer method
CN114035687B (zh) * 2021-11-12 2023-07-25 郑州大学 Gesture recognition method and system based on virtual reality
CN117593437B (zh) * 2024-01-18 2024-05-14 华伦医疗用品(深圳)有限公司 GPU-based real-time endoscopic image processing method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117766A (zh) * 2018-07-30 2019-01-01 上海斐讯数据通信技术有限公司 Dynamic gesture recognition method and system
CN109325995A (zh) * 2018-09-13 2019-02-12 叠境数字科技(上海)有限公司 Low-resolution multi-view hand reconstruction method based on a parametric hand model
CN110427877A (zh) * 2019-08-01 2019-11-08 大连海事大学 Method for estimating three-dimensional human pose based on structural information
US20200026910A1 (en) * 2017-03-31 2020-01-23 Beijing Sensetime Technology Development Co., Ltd. Gesture identification, control, and neural network training methods and apparatuses, and electronic devices
CN110991319A (zh) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method, and related apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837778B (zh) * 2019-10-12 2023-08-18 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequences

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200026910A1 (en) * 2017-03-31 2020-01-23 Beijing Sensetime Technology Development Co., Ltd. Gesture identification, control, and neural network training methods and apparatuses, and electronic devices
CN109117766A (zh) * 2018-07-30 2019-01-01 上海斐讯数据通信技术有限公司 Dynamic gesture recognition method and system
CN109325995A (zh) * 2018-09-13 2019-02-12 叠境数字科技(上海)有限公司 Low-resolution multi-view hand reconstruction method based on a parametric hand model
CN110427877A (zh) * 2019-08-01 2019-11-08 大连海事大学 Method for estimating three-dimensional human pose based on structural information
CN110991319A (zh) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method, and related apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817757A (zh) * 2022-04-02 2022-07-29 广州大学 Cross-social-network virtual identity association method based on graph convolutional networks
CN115346345A (zh) * 2022-07-28 2022-11-15 福建省杭氟电子材料有限公司 Intelligent toxic and harmful gas alarm system for hexafluorobutadiene preparation
CN115346345B (zh) * 2022-07-28 2023-06-06 福建省杭氟电子材料有限公司 Intelligent toxic and harmful gas alarm system for hexafluorobutadiene preparation

Also Published As

Publication number Publication date
CN111832468A (zh) 2020-10-27

Similar Documents

Publication Publication Date Title
WO2021120834A1 (zh) Biometric-based gesture recognition method and apparatus, computer device, and medium
Wang et al. Mask-pose cascaded cnn for 2d hand pose estimation from single color image
US11468636B2 (en) 3D hand shape and pose estimation
WO2021103648A1 (zh) Hand key point detection method, gesture recognition method, and related apparatus
CN110021051B (zh) Text-guided person image generation method based on generative adversarial networks
US10572072B2 (en) Depth-based touch detection
Nai et al. Fast hand posture classification using depth features extracted from random line segments
US11435845B2 (en) Gesture recognition based on skeletal model vectors
WO2021143103A1 (zh) Video data processing method, apparatus, device, and computer-readable storage medium
WO2022105118A1 (zh) Image-based health state recognition method, apparatus, device, and storage medium
WO2021223738A1 (zh) Model parameter updating method, apparatus, device, and storage medium
CN109003224A (zh) Face-based deformed image generation method and apparatus
WO2023151237A1 (zh) Face pose estimation method and apparatus, electronic device, and storage medium
CN105096353A (zh) Image processing method and apparatus
CN113254491A (zh) Information recommendation method, apparatus, computer device, and storage medium
US20230290174A1 (en) Weakly supervised semantic parsing
CN112699857A (zh) Liveness verification method and apparatus based on face pose, and electronic device
Jin et al. Emotion information visualization through learning of 3D morphable face model
CN112949576B (zh) Pose estimation method, apparatus, device, and storage medium
Zhang et al. A posture detection method for augmented reality–aided assembly based on YOLO-6D
He et al. ContourPose: Monocular 6-D Pose Estimation Method for Reflective Textureless Metal Parts
Yang et al. 3D character recognition using binocular camera for medical assist
Li et al. Static hand gesture recognition based on hierarchical decision and classification of finger features
CN110414402B (zh) Gesture data annotation method and apparatus, electronic device, and storage medium
Ogiela et al. Natural user interfaces for exploring and modeling medical images and defining gesture description technology

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20900872

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20900872

Country of ref document: EP

Kind code of ref document: A1