CN111191622B - Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector - Google Patents


Info

Publication number
CN111191622B
CN111191622B (application CN202010006031.3A)
Authority
CN
China
Prior art keywords
key points
thermodynamic diagram
gesture recognition
offset
thermodynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010006031.3A
Other languages
Chinese (zh)
Other versions
CN111191622A (en)
Inventor
肖菁
李海超
屈光卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202010006031.3A priority Critical patent/CN111191622B/en
Publication of CN111191622A publication Critical patent/CN111191622A/en
Application granted granted Critical
Publication of CN111191622B publication Critical patent/CN111191622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The invention discloses a gesture recognition method, system and storage medium based on thermodynamic diagrams (heatmaps) and offset vectors. The gesture recognition method comprises the following steps: acquiring a target image to be identified; extracting features of the target image to be identified; predicting the positions of key points from the extracted features; correcting the predicted key points to determine their final positions; and determining the gesture information of the target from the key points. By extracting image features, predicting the key point positions, correcting the predicted results and finally recognizing the pose, the invention obtains more accurate pose information, and can be widely applied in the technical field of deep learning.

Description

Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector
Technical Field
The invention relates to the technical field of deep learning, in particular to a gesture recognition method, a gesture recognition system and a storage medium based on thermodynamic diagrams and offset vectors.
Background
Thermodynamic diagram (heatmap): a probability map in which the probability of a pixel approaches 1 the closer the pixel is to the center point, and approaches 0 the farther it is from the center point; the distribution can be modelled by a suitable function, such as a Gaussian.
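The truncated-Gaussian heatmap described above can be sketched as follows (a minimal NumPy illustration; the radius R = 3 and standard deviation σ = 1.5 are illustrative values, not taken from the patent):

```python
import numpy as np

def gaussian_heatmap(height, width, center, radius=3.0, sigma=1.5):
    """Build a heatmap whose values approach 1 near `center` and 0 far away.

    Pixels farther than `radius` from the center are set to exactly 0,
    mirroring the truncated-Gaussian target used for keypoint heatmaps.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    dist2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    heat = np.exp(-dist2 / (2.0 * sigma ** 2))
    heat[dist2 > radius ** 2] = 0.0
    return heat

hm = gaussian_heatmap(9, 9, center=(4, 4))
print(hm[4, 4])  # 1.0 at the center; neighbours fall off toward 0
```

In training, one such target map would be built per key point, centred at the labeled position.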
Offset vector: the displacement between a point and a reference point, giving both the distance and the direction from one to the other.
Posture estimation: the task of determining the pose of a person in an image (or stereo image, or image sequence) and reconstructing the person's joints and limbs.
People often record their daily lives by taking pictures. To better understand the people in a picture, we want to locate where they are and know what activity they are performing; achieving these goals is the central problem of human body posture estimation. Pose estimation, also called human body keypoint detection, mainly identifies the positions of the key parts of the human body, such as the nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle and right ankle. Despite many years of research, it remains a very challenging problem in computer vision, with the difficulties arising mainly from complex backgrounds, blurring, occlusion, changes in illumination, and clothing color in natural scenes. Moreover, limb interactions between people cause strong interference, such as overlapping limbs and shadows cast between limbs.
Because there is often more than one person in practical application scenes, current pose estimation algorithms are mainly multi-person algorithms. There are two main approaches to multi-person pose estimation: top-down and bottom-up. The top-down method first obtains the detection boxes of the people in the image using an object detection method such as Faster-RCNN (Faster Region-based Convolutional Neural Networks) or SSD (Single Shot MultiBox Detector), then crops them from the original image and feeds each crop to a subsequent pose estimation network, which predicts the human key points for each cropped image separately. The top-down approach thus converts the multi-person pose estimation problem into single-person pose estimation. The bottom-up method first detects the key points of all persons, then clusters the key points, connecting the key points belonging to the same person so that the clustering yields the separate individuals. Bottom-up methods therefore focus on the key-point clustering step, i.e. how to model the relations between different key points.
With the rapid development of deep learning in the field of computer vision, a large amount of research in recent years has applied deep learning to human keypoint detection. However, most existing work focuses on how to design the data-flow paths in the network so as to capture the rich spatial and detail information in the picture, for example the feature pyramid network (Feature Pyramid Networks for Object Detection), the cascaded pyramid network (Cascaded Pyramid Network for Multi-Person Pose Estimation) and the stacked hourglass network (Stacked Hourglass Networks for Human Pose Estimation). These methods naturally improve the accuracy of human keypoint detection, but they ignore the small offset that occurs when predicted points are mapped from low resolution back to high resolution, which causes a certain loss of precision.
Disclosure of Invention
In view of this, embodiments of the present invention provide a high-precision gesture recognition method, system and storage medium based on a thermodynamic diagram and an offset vector.
The first aspect of the invention provides a gesture recognition method based on thermodynamic diagrams and offset vectors, comprising the following steps:
acquiring a target image to be identified;
extracting the features of the target image to be identified;
predicting the positions of key points according to the extracted features;
correcting the predicted key points and determining their final positions; and
determining the gesture information of the target to be identified according to the key points.
Further, the step of extracting the features of the target image to be identified includes:
cutting the obtained target image to be identified;
inputting each image obtained by cutting into a residual network; and
performing encoding processing through the residual network to obtain a first feature map.
Further, the residual network includes five groups of convolutional layers;
in addition, the step of obtaining the feature map through the encoding processing of the residual network comprises:
performing dimension-changing processing on each channel of the feature map through convolution kernels, wherein the dimension-changing processing comprises dimension increasing and dimension decreasing;
normalizing each channel; and
carrying out nonlinear activation processing on the normalization result.
Further, the step of extracting the features of the target image to be identified further includes a decoding step, which comprises:
inputting the obtained first feature map into a deconvolution structure;
decoding the first feature map through the deconvolution structure; and
acquiring the characteristic response maps of all the channels.
Further, the step of predicting the key point positions according to the extracted features comprises:
obtaining thermodynamic diagrams from the output results of the channels;
taking the maximum value of each thermodynamic diagram to obtain the position of each key point on the thermodynamic diagram; and
mapping the key point positions onto the target image to be identified according to the size relation between the target image and the thermodynamic diagram.
Further, the step of correcting the predicted key points and determining their final positions comprises:
determining an offset vector for each key point according to the output result of each channel; and
adding the offset vector to the position of the thermodynamic diagram maximum to determine the final position of the key point.
Further, the method also comprises the following steps:
training the thermodynamic diagrams using a mean square error loss function; and
training the offset vectors using a smooth loss function to penalize the gap between the true offset and the predicted offset.
A second aspect of the present invention provides a thermodynamic diagram and offset vector based gesture recognition system comprising:
the acquisition module is used for acquiring the target image to be identified;
the feature extraction module is used for extracting features of the target image to be identified;
the key point predicting module is used for predicting the position of the key point according to the extracted characteristics;
the key point correction module is used for correcting the predicted key points and determining the final positions of the key points; and
the gesture determining module is used for determining gesture information of the target to be recognized according to the key points.
A third aspect of the present invention provides a thermodynamic diagram and offset vector based gesture recognition system comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method.
A fourth aspect of the invention provides a storage medium having stored therein processor-executable instructions which, when executed by a processor, are for performing the method.
One or more of the above technical solutions in the embodiments of the present invention have the following advantage: by extracting the features of the image, predicting the positions of the key points, correcting the predicted results and finally recognizing the pose, the embodiments of the invention obtain more accurate gesture information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating the overall steps of an embodiment of the present invention;
FIG. 2 is a first example flow chart of an embodiment of the present invention;
FIG. 3 is a second exemplary flow chart of an embodiment of the present invention;
FIG. 4 is a schematic diagram of coordinate position prediction by coordinate offset correction thermodynamic diagrams according to an embodiment of the present invention;
FIG. 5 is a graph showing the comparison of various algorithms on MSCOCO data sets according to embodiments of the present invention;
FIG. 6 is a graph showing the comparison of various algorithms on MPII datasets in accordance with an embodiment of the present invention;
FIG. 7 is a comparison of various algorithms of an embodiment of the present invention on the CrowdPose dataset;
FIG. 8 is a graph showing the detection of HOPE on MSCOCO data sets according to an embodiment of the invention;
FIG. 9 is a graph showing the detection of HOPE on an MPII dataset according to an embodiment of the invention;
fig. 10 shows the detection results of HOPE on the CrowdPose dataset according to an embodiment of the present invention.
Detailed Description
The invention is further explained and illustrated below with reference to the drawing and the specific embodiments of the present specification. The step numbers in the embodiments of the present invention are set for convenience of illustration, and the order of steps is not limited in any way, and the execution order of the steps in the embodiments can be adaptively adjusted according to the understanding of those skilled in the art.
Most prior-art methods based on thermodynamic diagrams innovate only on the network architecture and the loss function. However, thermodynamic-diagram-based methods include a coordinate mapping process, and ignoring the loss incurred when predicted points on the low-resolution thermodynamic diagram are mapped back to the original image limits the achievable accuracy.
Thus, the application provides a human body posture estimation method based on thermodynamic diagrams and coordinate offsets, which extracts features through a robust convolutional neural network, predicts the thermodynamic diagrams and offset vectors of the key points, predicts the key point coordinates from the thermodynamic diagrams, and corrects those coordinates with the offset vectors to obtain more accurate position information.
Referring to fig. 1, the specific implementation steps of the embodiment of the present application include:
s1: acquiring a target image to be identified;
s2: extracting the characteristics of the target image to be identified;
as shown in fig. 2 and fig. 3, feature extraction in the embodiment of the present application converts a picture into features, and the network structure of the model is divided into two parts, an encoding module and a decoding module. The encoding module adopts a 50-layer residual network with the last 1x1 convolution layer removed, and extracts features from the input image in a fully convolutional manner. Thanks to the residual design in particular, such networks perform very well on many computer vision tasks and have very strong feature expression capability.
The residual network in this embodiment is composed of five groups of convolutional layers, c1 through c5, each group containing N residual modules. A residual module consists of alternating convolution layers, BN layers and ReLUs. The 1x1 convolution kernels are mainly used to reduce or increase the channel dimension of the feature map; reducing the dimension with a 1x1 convolution before the 3x3 convolution effectively reduces the computation of the following kernel. The BN layer is a batch normalization layer; each channel has four corresponding parameters, a mean, a variance, a scaling coefficient and a bias, which are used to normalize the features input to the layer, preventing vanishing or exploding gradients caused by shifts in the data distribution of intermediate layers during training. The ReLU serves as the nonlinear activation function: on the one hand it improves the nonlinear expression capacity of the network, and on the other hand it avoids the slow parameter updates of the Sigmoid function in its saturation region. As shown in fig. 2, the encoding steps in the embodiments of the present application are: first, cutting the acquired image; second, inputting the image into the residual network; and third, acquiring the encoded feature map from the residual network.
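The compute saving from the 1x1 dimension reduction described above can be made concrete with a small multiply-accumulate count. The channel widths (256 in, 64 after reduction) are standard bottleneck values assumed for illustration; the patent does not state exact widths:

```python
def conv_flops(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a k x k convolution on an h x w feature map."""
    return h * w * c_in * c_out * k * k

H, W = 64, 48          # example feature-map size (as at the network output)
C = 256                # assumed input/output channels of the residual module
C_mid = 64             # assumed channels after the 1x1 reduction

# Direct 3x3 convolution at full channel width:
direct = conv_flops(H, W, C, C, 3)

# Bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand:
bottleneck = (conv_flops(H, W, C, C_mid, 1)
              + conv_flops(H, W, C_mid, C_mid, 3)
              + conv_flops(H, W, C_mid, C, 1))

print(direct // bottleneck)  # 8: the reduction cuts the cost several-fold
```

This is why placing a 1x1 reduction before the 3x3 kernel "effectively reduces the calculated amount of the next convolution kernel".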
In addition, the embodiment of the application further includes a decoding step, as shown in fig. 2: in the first step, the obtained feature map is input into a deconvolution structure; in the second step, the feature map is decoded by the deconvolution structure; in the third step, a characteristic response map of 3n channels is obtained from a 1x1 convolution. As shown in fig. 3, the network outputs a thermodynamic diagram of n channels, used to predict the positions of the n keypoints, and offset vectors of 2n channels, used to predict the offsets of the keypoints at each position in the x and y directions. The final feature map at the end of the network has a size of 64x48, a quarter of the input image in both width and height.
S3: predicting the positions of key points according to the extracted features;
specifically, the present embodiment denotes the position of the kth key point by l_k. If a position x_i on the thermodynamic diagram lies within radius R of the key point l_k, the probability of that position being the true key point is made to follow a Gaussian distribution, which is more favourable for network learning; that is, h_k(x_i) = G(x_i - l_k) if ||x_i - l_k|| <= R, and h_k(x_i) = 0 otherwise, where G is a Gaussian function. Clearly, the closer a position on the thermodynamic diagram is to the key point l_k, the more likely it is to be the key point. The implementation steps for predicting the key points are: first, obtaining the thermodynamic diagrams from the output channels of the network; second, for each key point l_k with corresponding thermodynamic diagram h_k, taking the maximum of the diagram to obtain the position of the key point on it; and third, mapping the coordinates from the thermodynamic diagram to the input image according to the size ratio between the input image and the thermodynamic diagram.
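The argmax-and-rescale decoding of step S3 can be sketched as follows (NumPy; the stride of 4 follows from the thermodynamic diagram being a quarter of the input resolution):

```python
import numpy as np

def decode_keypoints(heatmaps, stride=4):
    """heatmaps: (n, H, W) array, one channel per key point.

    Returns the (x, y) position of each key point on the input image,
    obtained by taking the per-channel maximum and scaling by the stride.
    """
    n, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat, (h, w))
    return np.stack([xs, ys], axis=1) * stride

hm = np.zeros((1, 64, 48))
hm[0, 10, 20] = 1.0             # peak at heatmap coordinate (x=20, y=10)
print(decode_keypoints(hm))     # [[80 40]] on the input image
```

Note that this decoding alone can only reach every fourth input-image pixel, which is exactly the precision loss the offset vectors of step S4 are meant to repair.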
S4: correcting the predicted key points, and determining the final positions of the key points;
specifically, in the embodiment of the present application, the key points suffer a loss of precision when mapped from a low-resolution image to a high-resolution image. As shown in fig. 4(b), each grid cell represents a pixel position, and the area framed by the rectangle in fig. 4(a) is the thermodynamic diagram predicting the position of the left wrist; when its predicted coordinates are mapped to the resolution of the input image, a larger precision loss occurs. As can be seen from fig. 4(b), one pixel in the thermodynamic diagram actually represents a 16-pixel region of the original image, because the width and height are both a quarter of the original; multiplying a coordinate on the thermodynamic diagram by 4 can only reach the first pixel of the corresponding region of the input image, i.e. the upper-left corner of the 16-cell grid in fig. 4(b). This is the source of the precision loss in the coordinate mapping process. Many works reduce this loss by manually shifting the predicted key point location by a quarter of a pixel on the thermodynamic diagram, i.e. by a distance of 1 pixel on the original input image; this does reduce the expected error between the mapped key points and the true key points and brings a slight accuracy improvement, but it does not fundamentally solve the precision loss problem.
Given this situation, the network of the present application predicts, in addition to the output thermodynamic diagrams, a two-dimensional offset vector o_k(x_i) for each location x_i relative to the input image, letting the neural network actively learn the offset between the mapped key points and the real key points. o_k(x_i) is the shift, relative to the kth key point on the input image, after a position x_i on the kth thermodynamic diagram is mapped, and it is aimed at correcting the predicted key point position. Because there are k key points, the network generates k such offset fields, solving a two-dimensional regression problem for each key point and its nearby locations.
Referring to fig. 2, the correction step is implemented as follows: in the first step, after the network generates the thermodynamic diagram and the offset vectors, the location of the thermodynamic diagram maximum is taken; in the second step, the offset vector is added to that maximum location to obtain the key point position finally mapped to the input image, i.e. keypoint position = heatmap position + offset vector.
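Under the assumption of one heatmap channel plus two offset channels per key point (as in fig. 3), the correction step can be sketched as:

```python
import numpy as np

def corrected_keypoint(heatmap, offset_x, offset_y, stride=4):
    """Combine one key point heatmap with its two offset channels:
    final position = heatmap argmax * stride + offset at the argmax."""
    y, x = np.unravel_index(heatmap.argmax(), heatmap.shape)
    return (float(x * stride + offset_x[y, x]),
            float(y * stride + offset_y[y, x]))

hm = np.zeros((64, 48))
hm[10, 20] = 1.0                  # heatmap peak at (x=20, y=10)
ox = np.full((64, 48), 1.5)       # illustrative predicted x-offsets
oy = np.full((64, 48), -0.5)      # illustrative predicted y-offsets
print(corrected_keypoint(hm, ox, oy))  # (81.5, 39.5)
```

The learned offset thus restores the sub-stride precision that pure argmax decoding discards.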
S5: and determining the gesture information of the target to be identified according to the key points.
In addition, the embodiment of the application also provides the steps of model training and testing, in particular:
the classical mean square error loss function is adopted to train the thermodynamic diagram, and it is worth noting that the probability value of the region within the distance R near the key point is only calculated and lost, namely only those points near the key point are trained, so that the convergence of the network is facilitated, and the loss function is shown.
L_h(θ) = Σ_k Σ_{||x_i - l_k|| ≤ R} || ĥ_k(x_i) - h_k(x_i) ||²  (1)
where ĥ_k(x_i) is the thermodynamic diagram predicted by the network and h_k(x_i) is the Gaussian target defined above.
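The radius-restricted mean square error of equation (1) can be sketched as follows (the mask construction is an assumption consistent with the text: only positions within R of the key point contribute):

```python
import numpy as np

def heatmap_loss(pred, target, center, radius=3.0):
    """MSE between predicted and target heatmaps, computed only at
    positions within distance `radius` of the key point `center`."""
    h, w = pred.shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= radius ** 2
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))

target = np.zeros((8, 8))
target[4, 4] = 1.0                 # ground-truth peak
pred = np.zeros((8, 8))            # an all-zero prediction is penalized
print(heatmap_loss(pred, target, center=(4, 4)) > 0)
```

Positions outside the mask receive no gradient, which is what eases convergence.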
For training the offset vectors, inspired by the regression of detection-box coordinates in the target detection domain, the present application employs a smooth loss function to penalize the gap between the true and predicted offsets, as shown in equation (2).
L_o(θ) = Σ_k Σ_{||x_i - l_k|| ≤ R} smooth_L1( ô_k(x_i) - o_k(x_i) )  (2)
where ô_k(x_i) is the predicted offset and o_k(x_i) the true offset.
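The smooth L1 penalty can be written explicitly as below; the transition point β = 1 is the conventional choice from the object-detection literature, not stated in the patent:

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Quadratic for small residuals, linear for large ones, so that
    outlier offsets do not dominate the gradient."""
    absx = np.abs(x)
    return np.where(absx < beta, 0.5 * absx ** 2 / beta, absx - 0.5 * beta)

print(smooth_l1(np.array([0.5, 2.0])))  # [0.125 1.5 ]
```

The linear branch caps the gradient magnitude at 1, which is the robustness-to-outliers property the text refers to.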
Such a loss function makes the loss more robust to abnormal outliers, thereby better controlling the back propagation of gradients in the network. Likewise, the present application only computes the loss at those locations no more than R from the key point. After fusing the two losses, the final loss function is shown in equation (3), where λ_h and λ_o are the weights of the two losses, set in the ratio 4:1, and the optimizer used to train the model is Adam.
L(θ) = λ_h·L_h(θ) + λ_o·L_o(θ)  (3)
In addition, the present application selects three test datasets disclosed in the attitude estimation field to perform experimental measurements to further illustrate the advantages of the present application over the prior art:
the operating environment of the embodiment of the application: 6 cores, intel Xeon E5-2620 processor, 64GB memory, titan X graphics card, ubuntu 16.04 operating system.
The three datasets are: (1) MSCOCO: the MSCOCO dataset can be applied to tasks such as target detection, semantic segmentation and key point detection. This patent mainly uses the 2017 COCO dataset, in which the training set contains 118,287 pictures and the test set contains 5,000 pictures, with pictures annotated with multiple people.
(2) MPII: the MPII human posture dataset is a state-of-the-art benchmark for evaluating articulated human pose estimation. The dataset comprises about 25K images containing more than 40K people with annotated human joints. The images were collected according to a taxonomy of everyday human activities; the entire dataset covers 410 human activities and each image carries an activity label. Each image was extracted from a YouTube video. The data includes approximately 25,000 pictures and more than 40,000 annotated human keypoint instances, of which 28,000 instances are used for training the network and the remaining 12,000 samples for testing.
(3) CrowdPose: we also evaluated our method on the CrowdPose dataset, which contains 20,000 pictures and 80,000 human instances. The CrowdPose dataset is designed to improve performance in crowded situations, so that models become suitable for different scenes.
In order to evaluate the effectiveness of the algorithm, the AP and PCK performance indexes are adopted in the experiments of this embodiment: AP is used as the evaluation index on the COCO and CrowdPose datasets, and PCK on the MPII dataset. Object keypoint similarity (OKS) is used to calculate the similarity between predicted and labeled key points, and is formulated as follows:
OKS = Σ_i exp( -D_i² / (2 s² k_i²) ) δ(v_i > 0) / Σ_i δ(v_i > 0)
where D_i is the Euclidean distance between the predicted and labeled key points, s is the scale of the object, k_i is a per-keypoint constant controlling the attenuation, and v_i indicates whether the key point is visible. After a threshold s for OKS is given, the average precision over the test set can be calculated by the following formula:
AP^s = Σ_p δ(OKS_p > s) / Σ_p 1
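The OKS computation above can be sketched as follows (the per-keypoint constants k_i used here are illustrative, not the official COCO values):

```python
import numpy as np

def oks(pred, gt, visible, s, k):
    """Object keypoint similarity between predicted and ground-truth
    key points, averaged over the visible ones."""
    d2 = np.sum((pred - gt) ** 2, axis=1)        # squared distances D_i^2
    sim = np.exp(-d2 / (2.0 * s ** 2 * k ** 2))  # per-keypoint similarity
    return float(np.sum(sim * visible) / np.sum(visible))

gt = np.array([[10.0, 10.0], [20.0, 20.0]])
pred = gt.copy()                       # a perfect prediction
vis = np.array([1.0, 1.0])
print(oks(pred, gt, vis, s=1.0, k=np.array([0.1, 0.1])))  # 1.0
```

AP at threshold s is then just the fraction of predicted poses whose OKS exceeds s.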
another important evaluation criterion for keypoints is PCK, which is an indicator of the proportion of all predicted keypoints that fall a normalized distance around the corresponding labeled keypoint. This normalized distance is often related to the longest distance of the human torso in the picture. Usually denoted pck @ sigma, where sigma is a fraction between intervals 0,1, and multiplying sigma by the longest trunk distance yields the normalized distance in the evaluation index, by the following method:
PCK_k@σ = (1/N) Σ_i δ( d_{ki} ≤ σ·T_i )
where N represents the total number of samples and k represents the kth human keypoint, so the overall PCK is:
PCK@σ = (1/K) Σ_k PCK_k@σ
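The PCK@σ computation can be sketched as follows, with the torso-length normalization as described (the sample values are illustrative):

```python
import numpy as np

def pck(pred, gt, torso_len, sigma=0.5):
    """Fraction of predicted key points whose distance to the ground
    truth is within sigma * torso_len (PCK@sigma)."""
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= sigma * torso_len))

gt = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
pred = np.array([[1.0, 0.0], [10.0, 0.0], [0.0, 30.0]])
print(pck(pred, gt, torso_len=10.0))  # 2 of 3 key points within 5 px
```

PCKh, used on MPII, follows the same formula with the head length in place of the torso length.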
the evaluation index used on the MPII dataset is PCKh, unlike PCK, which replaces the longest trunk distance used in normalizing distance with the longest head distance.
The present embodiments compare AP and PCK values with other algorithms on the MSCOCO, MPII and CrowdPose datasets respectively. These methods include a simple baseline for human pose estimation and tracking (Simple baselines for human pose estimation and tracking, SB), accurate multi-person pose estimation in natural environments (Towards accurate multi-person pose estimation in the wild, G-RMI), a cascaded pyramid network for multi-person pose estimation (Cascaded pyramid network for multi-person pose estimation, CPN), the stacked hourglass network for human pose estimation (Stacked Hourglass networks for human pose estimation, Hourglass), learning feature pyramids for human pose estimation (Learning feature pyramids for human pose estimation, FPN), quantized densely connected u-nets (Quantized densely connected u-nets for efficient landmark localization, Dense U-Net), and human pose estimation via convolutional part heatmap regression (Human pose estimation via convolutional part heatmap regression, vcph). The algorithm of the present application is referred to as HOPE.
FIG. 5 is a graph showing the results of the present application and the other algorithms on the MSCOCO dataset; FIG. 6 shows the results on the MPII dataset; FIG. 7 shows the results on the CrowdPose dataset.
As can be seen from fig. 5, 6 and 7, the AP and PCK values of the present application are superior to those of the other algorithms, in both sparse and crowded scenes. In addition, fig. 8 shows the detection results of HOPE on the MSCOCO dataset, fig. 9 the results on the MPII dataset, and fig. 10 the results on the CrowdPose dataset. As can be seen from figs. 8-10, the detected key points correspond closely to the true poses, which further illustrates that this patent achieves a good effect in key point detection.
The embodiment of the invention also provides a gesture recognition system based on the thermodynamic diagram and the offset vector, which comprises:
the acquisition module is used for acquiring the target image to be identified;
the feature extraction module is used for extracting features of the target image to be identified;
the key point predicting module is used for predicting the position of the key point according to the extracted characteristics;
the key point correction module is used for correcting the predicted key points and determining the final positions of the key points; and
the gesture determining module is used for determining gesture information of the target to be recognized according to the key points.
The embodiment of the invention also provides a gesture recognition system based on the thermodynamic diagram and the offset vector, which comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method.
Embodiments of the present invention also provide a storage medium having stored therein processor-executable instructions which, when executed by a processor, are for performing the method.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present invention, and these equivalent modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.

Claims (9)

1. A gesture recognition method based on thermodynamic diagrams and offset vectors, characterized by comprising the steps of:
acquiring a target image to be identified;
extracting the characteristics of the target image to be identified;
predicting the positions of key points according to the extracted features;
correcting the predicted key points, and determining the final positions of the key points; and
determining the gesture information of the target to be identified according to the key points;
the method further comprises the step of evaluating the validity of the gesture recognition method, the step comprising:
performing performance evaluation by adopting AP and PCK performance evaluation indexes, wherein the AP is used as an evaluation index in COCO and CROWPOSE data sets, and the PCK is used as an evaluation index in MPII data sets;
calculating the similarity between the predicted key points and the marked key points through the object key point similarity OKS, wherein the calculation formula of the similarity is as follows:
$$\mathrm{OKS} = \frac{\sum_i \exp\!\left(-d_i^2 / 2 s^2 k_i^2\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$
wherein $d_i$ represents the Euclidean distance between the predicted and the labeled keypoint, $s$ is the scale of the object, $k_i$ is a keypoint-specific constant for controlling attenuation, $v_i$ indicates whether the keypoint is visible, and $\delta(v_i > 0)$ restricts both sums to the visible keypoints;
given a threshold $s$ on OKS, the average precision AP@$s$ on the test set is calculated by the following formula:
$$\mathrm{AP}@s = \frac{\sum_p \delta(\mathrm{OKS}_p > s)}{\sum_p 1}$$
where $p$ indexes the predicted poses on the test set;
the PCK index represents the proportion of all predicted key points falling into a certain standardized distance around the corresponding marked key points; this normalized distance is related to the longest distance of the torso in the picture, denoted pck @ sigma, where sigma is a fraction between intervals 0,1, and the normalized distance in the evaluation index is obtained by multiplying sigma by the longest distance of the torso, as follows:
$$\mathrm{PCK}_\sigma^k = \frac{1}{N}\sum_{i=1}^{N}\delta\!\left(\frac{\left\|\hat{p}_i^k - p_i^k\right\|_2}{d_i} \le \sigma\right)$$
where $N$ represents the total number of samples, $k$ denotes the $k$-th human keypoint, $\hat{p}_i^k$ represents the predicted keypoint, $p_i^k$ represents the true keypoint, and $d_i$ represents the true torso diameter (the longest torso distance) of sample $i$; the overall PCK is:
$$\mathrm{PCK}_\sigma = \frac{1}{K}\sum_{k=1}^{K}\mathrm{PCK}_\sigma^k$$
the evaluation index used on the MPII dataset was PCKh, which differs from PCK,
PCKh replaces the longest trunk distance used when normalizing distance with the longest head distance.
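The evaluation indexes defined in claim 1 can be sketched in Python (NumPy). The array shapes and helper names below are illustrative, not from the patent:

```python
import numpy as np

def oks(pred, gt, vis, s, k):
    """Object Keypoint Similarity: exp(-d_i^2 / (2 s^2 k_i^2)) averaged
    over the visible keypoints, i.e. those with delta(v_i > 0)."""
    d2 = np.sum((pred - gt) ** 2, axis=1)        # squared distances d_i^2, shape (K,)
    sim = np.exp(-d2 / (2.0 * s ** 2 * k ** 2))  # per-keypoint similarity
    m = vis > 0                                  # visible-keypoint mask
    return sim[m].sum() / m.sum()

def ap_at(oks_values, thr):
    """AP@thr: share of test poses whose OKS exceeds the threshold."""
    return float(np.mean(np.asarray(oks_values) > thr))

def pck(pred, gt, norm_len, sigma):
    """PCK@sigma: share of predicted keypoints within sigma * norm_len of
    the labeled ones; pass the longest torso distance per sample for PCK,
    or the longest head distance for PCKh."""
    d = np.linalg.norm(pred - gt, axis=2)        # (N, K) keypoint distances
    ok = d <= sigma * norm_len[:, None]          # inside the normalized radius
    return float(ok.mean())                      # overall PCK
```

As a sanity check, a perfect prediction (`pred == gt`) makes both OKS and PCK evaluate to 1.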
2. The method for recognizing a gesture based on thermodynamic diagrams and offset vectors according to claim 1, wherein the step of extracting features of the target image to be recognized comprises:
cutting the obtained target image to be identified;
inputting each image obtained by cutting into a residual error network; and
and performing coding processing through the residual error network to obtain a first characteristic diagram.
3. The thermodynamic diagram and offset vector based gesture recognition method of claim 2, wherein the residual network comprises five sets of convolution layers;
in addition, the step of obtaining the feature map through the encoding processing of the residual network comprises the following steps:
performing dimension-changing processing on each channel of the feature map through a convolution kernel, wherein the dimension-changing processing comprises dimension increasing and dimension decreasing;
normalizing each channel; and
and carrying out nonlinear activation processing on the normalization processing result.
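One such step from claim 3 — a 1×1 convolution that changes the channel dimension, per-channel normalization, then a nonlinear (ReLU) activation — can be sketched in NumPy. The shapes and the batch-free normalization are illustrative simplifications, not the patent's implementation:

```python
import numpy as np

def conv1x1_bn_relu(x, w, eps=1e-5):
    """1x1 conv (channel dimension change) + per-channel normalization
    + nonlinear activation, on a single feature map.

    x : (C_in, H, W) feature map
    w : (C_out, C_in) 1x1 convolution weights
    """
    y = np.einsum('oc,chw->ohw', w, x)            # 1x1 conv = per-pixel matmul
    mean = y.mean(axis=(1, 2), keepdims=True)     # per-channel statistics
    var = y.var(axis=(1, 2), keepdims=True)
    y = (y - mean) / np.sqrt(var + eps)           # normalize each channel
    return np.maximum(y, 0.0)                     # nonlinear activation (ReLU)
```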
4. The gesture recognition method based on thermodynamic diagrams and offset vectors according to claim 2, wherein the step of feature extraction of the target image to be recognized further comprises a decoding step including:
inputting the obtained first characteristic diagram into a deconvolution structure;
decoding the first feature map through a deconvolution structure; and
and acquiring characteristic response graphs of all the channels.
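A deconvolution (transposed-convolution) layer as used in claim 4 doubles spatial resolution. A toy single-channel stand-in — zero insertion followed by a 3×3 convolution — illustrates the mechanism; it is a sketch, not the patent's decoder:

```python
import numpy as np

def deconv2x(x, k):
    """Stride-2 transposed convolution on one channel: insert zeros
    between pixels, then convolve with a 3x3 kernel to interpolate.

    x : (H, W) one channel of the first feature map
    k : (3, 3) kernel
    """
    H, W = x.shape
    up = np.zeros((2 * H, 2 * W))
    up[::2, ::2] = x                              # zero insertion (upsampling)
    pad = np.pad(up, 1)
    out = np.zeros_like(up)
    for i in range(2 * H):                        # plain 3x3 convolution
        for j in range(2 * W):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k)
    return out
```

With a bilinear-style kernel such as `[[.25,.5,.25],[.5,1,.5],[.25,.5,.25]]`, this reproduces bilinear upsampling; a learned kernel lets the network choose its own interpolation.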
5. The thermodynamic diagram and offset vector based gesture recognition method of claim 4, wherein predicting key point locations based on extracted features comprises:
obtaining thermodynamic diagrams from the output results of the channels;
calculating the maximum value of each thermodynamic diagram to obtain the position information of each key point on the thermodynamic diagram; and
and mapping the position information of the key points to the target image to be identified according to the size relation between the target image to be identified and the thermodynamic diagram.
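The decoding described in claim 5 — per-channel maxima of the thermodynamic diagrams mapped back by the size ratio between the target image and the thermodynamic diagram — can be sketched as follows (function name and shapes are illustrative):

```python
import numpy as np

def decode_heatmaps(heatmaps, img_h, img_w):
    """Locate each keypoint at its heatmap maximum and map it to image
    coordinates.

    heatmaps : (K, h, w), one heatmap per keypoint
    returns  : (K, 2) array of (x, y) positions in image space
    """
    K, h, w = heatmaps.shape
    flat = heatmaps.reshape(K, -1).argmax(axis=1)  # index of each maximum
    ys, xs = np.unravel_index(flat, (h, w))
    scale_x, scale_y = img_w / w, img_h / h        # image / heatmap size ratio
    return np.stack([xs * scale_x, ys * scale_y], axis=1)
```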
6. The method for recognizing a gesture based on thermodynamic diagrams and offset vectors according to claim 4, wherein the step of correcting the predicted keypoints to determine final positions of the keypoints comprises the steps of:
determining an offset vector of the key point according to the output result of each channel; and
and adding the offset vector to the position of the maximum value of the thermodynamic diagram to determine the final position of the key point.
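The correction in claim 6 reads the offset vector at the heatmap maximum and adds it to the coarse position. A minimal sketch, with offsets stored as separate x/y channels (an assumed layout, not specified by the patent):

```python
import numpy as np

def refine_keypoints(heatmaps, offset_x, offset_y):
    """For each keypoint channel, locate the heatmap maximum, read the
    offset predicted at that cell, and add it to the coarse position.

    heatmaps           : (K, h, w)
    offset_x, offset_y : (K, h, w) per-cell offset components
    """
    K, h, w = heatmaps.shape
    idx = heatmaps.reshape(K, -1).argmax(axis=1)
    ys, xs = np.unravel_index(idx, (h, w))
    ks = np.arange(K)
    fx = xs + offset_x[ks, ys, xs]                # coarse x + offset
    fy = ys + offset_y[ks, ys, xs]                # coarse y + offset
    return np.stack([fx, fy], axis=1)             # final sub-cell positions
```

The offsets recover the sub-cell precision that the discrete heatmap grid discards.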
7. The thermodynamic diagram and offset vector based gesture recognition method of claim 6, further comprising the steps of:
training a thermodynamic diagram by adopting a mean square error loss function; and
when training the offset vector, a smooth loss function is employed to handle the gap between the true offset and the predicted offset.
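The two training losses in claim 7 can be sketched in NumPy. The smooth loss is written here as the common smooth L1 (quadratic for small errors, linear for large ones) — an assumption, since the claim does not name the exact form:

```python
import numpy as np

def mse_loss(pred_heat, gt_heat):
    """Mean squared error used to train the thermodynamic diagrams."""
    return np.mean((pred_heat - gt_heat) ** 2)

def smooth_l1(pred_off, gt_off, beta=1.0):
    """Smooth loss on the gap between true and predicted offsets:
    0.5 d^2 / beta for |d| < beta, |d| - 0.5 beta otherwise."""
    diff = np.abs(pred_off - gt_off)
    return np.mean(np.where(diff < beta,
                            0.5 * diff ** 2 / beta,
                            diff - 0.5 * beta))
```

The linear tail keeps large offset errors from dominating the gradient the way a pure MSE would.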
8. A thermodynamic diagram and offset vector based gesture recognition system comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any of claims 1-7.
9. A storage medium having stored therein processor executable instructions which, when executed by a processor, are for performing the method of any of claims 1-7.
CN202010006031.3A 2020-01-03 2020-01-03 Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector Active CN111191622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010006031.3A CN111191622B (en) 2020-01-03 2020-01-03 Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010006031.3A CN111191622B (en) 2020-01-03 2020-01-03 Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector

Publications (2)

Publication Number Publication Date
CN111191622A CN111191622A (en) 2020-05-22
CN111191622B true CN111191622B (en) 2023-05-26

Family

ID=70708632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010006031.3A Active CN111191622B (en) 2020-01-03 2020-01-03 Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector

Country Status (1)

Country Link
CN (1) CN111191622B (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680623B (en) * 2020-06-05 2023-04-21 北京百度网讯科技有限公司 Gesture conversion method and device, electronic equipment and storage medium
CN111695519B (en) 2020-06-12 2023-08-08 北京百度网讯科技有限公司 Method, device, equipment and storage medium for positioning key point
CN111814593A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Traffic scene analysis method and device, and storage medium
CN111914639A (en) * 2020-06-30 2020-11-10 吴�荣 Driving action recognition method of lightweight convolution space-time simple cycle unit model
CN111860276B (en) * 2020-07-14 2023-04-11 咪咕文化科技有限公司 Human body key point detection method, device, network equipment and storage medium
CN111860300A (en) * 2020-07-17 2020-10-30 广州视源电子科技股份有限公司 Key point detection method and device, terminal equipment and storage medium
CN111985556A (en) * 2020-08-19 2020-11-24 南京地平线机器人技术有限公司 Key point identification model generation method and key point identification method
CN111967406A (en) * 2020-08-20 2020-11-20 高新兴科技集团股份有限公司 Method, system, equipment and storage medium for generating human body key point detection model
CN112132131B (en) * 2020-09-22 2024-05-03 深兰科技(上海)有限公司 Measuring cylinder liquid level identification method and device
CN112417972A (en) * 2020-10-23 2021-02-26 奥比中光科技集团股份有限公司 Heat map decoding method, human body joint point estimation method and system
CN112446302B (en) * 2020-11-05 2023-09-19 杭州易现先进科技有限公司 Human body posture detection method, system, electronic equipment and storage medium
CN112101490B (en) * 2020-11-20 2021-03-02 支付宝(杭州)信息技术有限公司 Thermodynamic diagram conversion model training method and device
CN112528858A (en) * 2020-12-10 2021-03-19 北京百度网讯科技有限公司 Training method, device, equipment, medium and product of human body posture estimation model
CN112651316B (en) * 2020-12-18 2022-07-15 上海交通大学 Two-dimensional and three-dimensional multi-person attitude estimation system and method
CN112597955B (en) * 2020-12-30 2023-06-02 华侨大学 Single-stage multi-person gesture estimation method based on feature pyramid network
CN112837336B (en) * 2021-02-23 2022-02-22 浙大宁波理工学院 Method and system for estimating and acquiring room layout based on heat map correction of key points
CN112926648B (en) * 2021-02-24 2021-11-16 北京优创新港科技股份有限公司 Method and device for detecting abnormality of tobacco leaf tip in tobacco leaf baking process
CN113076891B (en) * 2021-04-09 2023-08-22 华南理工大学 Human body posture prediction method and system based on improved high-resolution network
CN113128436B (en) * 2021-04-27 2022-04-01 北京百度网讯科技有限公司 Method and device for detecting key points
CN113159198A (en) * 2021-04-27 2021-07-23 上海芯物科技有限公司 Target detection method, device, equipment and storage medium
CN113011402B (en) * 2021-04-30 2023-04-25 中国科学院自动化研究所 Primate gesture estimation system and method based on convolutional neural network
CN113343762B (en) * 2021-05-07 2022-03-29 北京邮电大学 Human body posture estimation grouping model training method, posture estimation method and device
CN113537234A (en) * 2021-06-10 2021-10-22 浙江大华技术股份有限公司 Quantity counting method and device, electronic device and computer equipment
CN114463534A (en) * 2021-12-28 2022-05-10 佳都科技集团股份有限公司 Target key point detection method, device, equipment and storage medium
CN114359974B (en) * 2022-03-08 2022-06-07 广东履安实业有限公司 Human body posture detection method and device and storage medium
CN114863237B (en) * 2022-03-25 2023-07-14 中国人民解放军国防科技大学 Method and system for recognizing swimming gesture
CN115272992B (en) * 2022-09-30 2023-01-03 松立控股集团股份有限公司 Vehicle attitude estimation method
CN115331153B (en) * 2022-10-12 2022-12-23 山东省第二人民医院(山东省耳鼻喉医院、山东省耳鼻喉研究所) Posture monitoring method for assisting vestibule rehabilitation training
CN116645699B (en) * 2023-07-27 2023-09-29 杭州华橙软件技术有限公司 Key point detection method, device, terminal and computer readable storage medium
CN117437433B (en) * 2023-12-07 2024-03-19 苏州铸正机器人有限公司 Sub-pixel level key point detection method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033946A (en) * 2018-06-08 2018-12-18 东南大学 Merge the estimation method of human posture of directional diagram
CN109657631A (en) * 2018-12-25 2019-04-19 上海智臻智能网络科技股份有限公司 Human posture recognition method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033946A (en) * 2018-06-08 2018-12-18 东南大学 Merge the estimation method of human posture of directional diagram
CN109657631A (en) * 2018-12-25 2019-04-19 上海智臻智能网络科技股份有限公司 Human posture recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Tangbo; Yang Rui; Wang Wenwei; He Chu. Research on driver hand motion detection method based on pose estimation. Signal Processing. 2019, (12), pp. 136-143. *

Also Published As

Publication number Publication date
CN111191622A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111191622B (en) Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector
He et al. Epipolar transformers
Moreno-Noguer 3d human pose estimation from a single image via distance matrix regression
CN108052896B (en) Human body behavior identification method based on convolutional neural network and support vector machine
CN112597941B (en) Face recognition method and device and electronic equipment
JP6639123B2 (en) Image processing apparatus, image processing method, and program
CN105844669B (en) A kind of video object method for real time tracking based on local Hash feature
Zhang et al. Actively learning human gaze shifting paths for semantics-aware photo cropping
Zhu et al. Convolutional relation network for skeleton-based action recognition
US9158963B2 (en) Fitting contours to features
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
Holmquist et al. Diffpose: Multi-hypothesis human pose estimation using diffusion models
CN112883896B (en) Micro-expression detection method based on BERT network
US20140099031A1 (en) Adjusting a Contour by a Shape Model
Yan et al. Monocular depth estimation with guidance of surface normal map
Etezadifar et al. A new sample consensus based on sparse coding for improved matching of SIFT features on remote sensing images
Gouidis et al. Accurate hand keypoint localization on mobile devices
CN113229807A (en) Human body rehabilitation evaluation device, method, electronic device and storage medium
Nguyen et al. Combined YOLOv5 and HRNet for high accuracy 2D keypoint and human pose estimation
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
Liu et al. Mean shift fusion color histogram algorithm for nonrigid complex target tracking in sports video
Zhang et al. Lightweight network for small target fall detection based on feature fusion and dynamic convolution
Kaviani et al. Semi-Supervised 3D hand shape and pose estimation with label propagation
CN113343762B (en) Human body posture estimation grouping model training method, posture estimation method and device
Zhang et al. Animal Pose Estimation Algorithm Based on the Lightweight Stacked Hourglass Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant