CN116363750A - Human body posture prediction method, device, equipment and readable storage medium - Google Patents
- Publication number
- CN116363750A, CN202310251993.9A
- Authority
- CN
- China
- Prior art keywords
- human body
- key point
- thermodynamic diagram
- predicted
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a human body posture prediction method, device, equipment and readable storage medium. The method comprises: extracting a feature map of an input picture; obtaining offsets, a thermodynamic diagram (heatmap) and a central thermodynamic diagram of the human body key points based on the feature map; obtaining predicted human body key points according to the offsets and the central thermodynamic diagram; generating a local key point expansion window centered on each predicted human body key point, and converting the local key point expansion window into a key point attraction field; generating a global key point expansion window from the predicted human body key points and the thermodynamic diagram, and convolving the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram; obtaining the predicted two-dimensional coordinates of the human body joint points from the corrected key point thermodynamic diagram; and encoding and decoding the spatial and temporal information of the predicted two-dimensional coordinates, respectively, to obtain the predicted three-dimensional information of the human body joint points. The method and device improve the accuracy of human body posture prediction.
Description
Technical Field
The present invention relates to the field of computer vision, and in particular, to a human body posture prediction method, apparatus, device, and readable storage medium.
Background
Human skeleton key point detection is one of the important tasks in computer vision research. It enables a computer to perceive the positions of all skeleton key points of a human body, and thus provides a foundation for practical applications such as action recognition, abnormal action detection, intelligent monitoring and automatic driving.
The goal of human skeleton key point detection is to take a picture as input and output the coordinates of each skeleton key point of each human body in the picture and in the real world. Current deep-learning-based human body key point detection techniques fall into two main kinds: top-down and bottom-up. A top-down method first detects the target frames (bounding boxes) of the human bodies and then estimates a single human body posture for each target frame. This gives higher precision, but the computation is proportional to the number of target frames, so efficiency is low, and the result is limited by the precision of human body target detection. A bottom-up method generally involves two steps: first detecting human body key points and then grouping them. One representative of such methods predicts the key point positions from the key point centers and offsets, which avoids complex grouping of key points and is faster to compute. Its disadvantage is equally apparent: offset vectors are estimated inaccurately for key points far from the key point center, so the overall accuracy of the final model is not high.
Disclosure of Invention
Aiming at the defects of the existing technology for predicting the key point based on the key point center point and the offset, the invention provides a human body posture prediction method, a device, equipment and a readable storage medium.
In a first aspect, the present invention provides a human body posture prediction method, including:
extracting a feature map from an input picture through a convolutional neural network;
obtaining an offset of a human body key point, a human body key point thermodynamic diagram and a human body key point central thermodynamic diagram based on the extracted feature diagram, obtaining a predicted human body key point according to the offset of the human body key point and the human body key point central thermodynamic diagram, and generating a local key point expansion window by taking the predicted human body key point as a center;
converting the local key point expansion window into a key point attraction field;
generating a global key point expansion window by using the predicted human body key points and the human body key point thermodynamic diagram, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram;
obtaining predicted two-dimensional coordinates of the human body joint point from the corrected key point thermodynamic diagram;
and respectively encoding and decoding by utilizing the space and time sequence information of the two-dimensional coordinates of the predicted human body joint point to obtain the three-dimensional information of the predicted human body joint point.
Optionally, the step of obtaining the offset, the thermodynamic diagram of the human body key point and the thermodynamic diagram of the human body key point center based on the extracted feature map includes:
the feature map is subjected to dimension reduction through two parallel branches respectively, and convolution processing is then carried out by using a convolution layer with a convolution kernel size of 1x1;
wherein one branch outputs the human body key point thermodynamic diagram and the human body key point central thermodynamic diagram H, whose spatial size h × w depends on the sampling step size of the backbone network, and the obtained low-resolution thermodynamic diagram is up-sampled by bilinear interpolation to obtain a high-resolution thermodynamic diagram; the other branch outputs the offsets of the 17 human body key points.
Optionally, the step of obtaining the predicted human body key point according to the offset of the human body key point and the thermodynamic diagram of the center of the human body key point, and generating the local key point expansion window by taking the predicted human body key point as the center includes:
performing non-maximum suppression on the human body key point central thermodynamic diagram, and selecting the N highest-scoring points as candidate key point center points, wherein N is a positive integer;
obtaining N candidate human body postures through the candidate key point center points and the offset of the human body key points, and simultaneously removing the human body postures with the human body key point center thermodynamic diagram score smaller than a threshold value;
for N candidate human body postures, 17 key points of each candidate human body posture are taken as predicted human body key points;
generating a local key point expansion window centered on each predicted human body key point.
Optionally, the step of converting the local keypoint expansion window into a keypoint attraction field includes:
processing the feature map by using a convolution layer, batch normalization and a rectified linear unit to obtain a feature map of dimension C3;
using the local key point expansion windows to obtain a local key point expansion window feature map by bilinear interpolation from the feature map of dimension C3;
processing the local key point expansion window feature map by using three different convolution layers, batch normalization and rectified linear units to obtain the key point attraction field.
Optionally, the step of generating a global key point expansion window by using the predicted human body key points and the human body key point thermodynamic diagram, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram includes:
generating a global key point expansion window for each predicted human body key point;
using the generated global key point expansion window to obtain a global key point expansion window feature map by bilinear interpolation from the feature map of dimension C3;
weighting the global key point expansion window feature map with a Gaussian kernel to obtain a new global key point expansion window feature map;
and performing a convolution operation on the new global key point expansion window feature map with the key point attraction field as the convolution kernel, so as to fuse local and global context information and obtain the corrected key point thermodynamic diagram.
Optionally, the step of obtaining the two-dimensional coordinates of the predicted human body node from the corrected key point thermodynamic diagram includes:
selecting the first two positions with the highest scores in the corrected key point thermodynamic diagram for each predicted human body key point, then weighting the scores of the two positions, calculating a first product of the scores of the two weighted positions, multiplying the first product by the score of the corresponding candidate key point center point to obtain a second product, and taking the second product as the confidence score of the predicted human body key point;
and taking the position of the key point with the highest confidence score as the two-dimensional coordinate of the predicted human body joint point.
Optionally, the step of encoding and decoding by using the space and time sequence information of the two-dimensional coordinates of the predicted human body joint point to obtain the three-dimensional information of the predicted human body joint point includes:
performing spatial scale coding by taking the two-dimensional coordinates of the predicted human body joint points as key values and implicit vectors serving as indexes to obtain output characteristics;
encoding the output characteristics through a time scale to obtain an output array;
and obtaining the three-dimensional coordinates of the predicted human body joint point by regression from the output array.
In a second aspect, the present invention also provides a human body posture predicting apparatus, comprising:
the extraction module is used for extracting a feature map from the input picture through a convolutional neural network;
the first generation module is used for obtaining the offset of the human body key points, the thermodynamic diagram of the human body key points and the central thermodynamic diagram of the human body key points based on the extracted feature diagram, obtaining predicted human body key points according to the offset of the human body key points and the central thermodynamic diagram of the human body key points, and generating a local key point expansion window by taking the predicted human body key points as the center;
the conversion module is used for converting the local key point expansion window into a key point attraction field;
the second generation module is used for generating a global key point expansion window by using the predicted human body key points and the human body key point thermodynamic diagram, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram;
the third generation module is used for obtaining the two-dimensional coordinates of the predicted human body joint point from the corrected key point thermodynamic diagram;
and the encoding and decoding module is used for encoding and decoding by utilizing the space and time sequence information of the two-dimensional coordinates of the predicted human body joint point respectively to obtain the three-dimensional information of the predicted human body joint point.
In a third aspect, the present invention also provides a human body posture predicting device comprising a processor, a memory, and a human body posture predicting program stored on the memory and executable by the processor, wherein the human body posture predicting program, when executed by the processor, implements the steps of the human body posture predicting method as described above.
In a fourth aspect, the present invention also provides a readable storage medium having stored thereon a human body posture prediction program, wherein the human body posture prediction program, when executed by a processor, implements the steps of the human body posture prediction method as described above.
In the invention, a feature map is extracted from an input picture through a convolutional neural network; an offset of the human body key points, a human body key point thermodynamic diagram and a human body key point central thermodynamic diagram are obtained based on the extracted feature map; predicted human body key points are obtained according to the offset of the human body key points and the human body key point central thermodynamic diagram, and a local key point expansion window is generated by taking each predicted human body key point as the center; the local key point expansion window is converted into a key point attraction field; a global key point expansion window is generated by using the predicted human body key points and the human body key point thermodynamic diagram, and a convolution operation is performed on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram; the predicted two-dimensional coordinates of the human body joint points are obtained from the corrected key point thermodynamic diagram; and the spatial and temporal information of the predicted two-dimensional coordinates is encoded and decoded, respectively, to obtain the predicted three-dimensional information of the human body joint points. The invention makes effective use of the feature information by combining features from earlier and later stages with global and local information, so that richer feature information is output, the localization of human body key points is improved, and the prediction precision of the human body posture is improved in turn.
Drawings
FIG. 1 is a flow chart of an embodiment of a human body posture prediction method according to the present invention;
FIG. 2 is a schematic diagram of functional modules of an embodiment of a human body posture predicting device according to the present invention;
FIG. 3 is a schematic diagram of the hardware structure of a human body posture predicting device according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In a first aspect, an embodiment of the present invention provides a human body posture prediction method.
In an embodiment, referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a human body posture prediction method according to the present invention. As shown in fig. 1, the human body posture prediction method includes:
Step S10, extracting a feature map from an input picture through a convolutional neural network;
In this embodiment, for a given input picture, a low-dimensional feature map is extracted by a convolutional neural network. The feature extraction network used may be HRNet, and the spatial dimension of the final output feature map depends on the overall step size of the feature extraction network.
For example, assume that the original picture has a size of H×W, and that each pass through a pooling layer changes the height and width of the features by a factor of s relative to the input of that layer. Taking the original image as input, a feature map F ∈ R^{C×h×w} with feature dimension C is obtained through a series of convolution layers, rectified linear units and pooling layers. The spatial dimension h×w depends on the overall step size of the feature extraction module.
Step S20, obtaining an offset of a human body key point, a human body key point thermodynamic diagram and a human body key point central thermodynamic diagram based on the extracted feature diagram, obtaining a predicted human body key point according to the offset of the human body key point and the human body key point central thermodynamic diagram, and generating a local key point expansion window by taking the predicted human body key point as a center;
In this embodiment, after the feature map F ∈ R^{C×h×w} is obtained, a convolution layer with a convolution kernel size of 1x1, followed by batch normalization and a rectified linear unit, first reduces the dimension of the feature map, and a further convolution layer with a convolution kernel size of 1x1 then performs convolution processing to produce the outputs of two branches. Branch one outputs the thermodynamic diagrams of the key points and of the key point centers; the obtained low-resolution thermodynamic diagram is up-sampled by bilinear interpolation to obtain a high-resolution thermodynamic diagram.
A series of candidate human body postures is obtained based on the calculated key point central thermodynamic diagram and the offsets of the key point positions. Specifically, non-maximum suppression (window size 3×3) is first performed on the obtained central thermodynamic diagram, and the N highest-scoring points are selected as candidate key point center points. Then, N candidate human body posture estimates are obtained from the candidate key point center points and the key point offsets, and human body postures whose key point central thermodynamic diagram score is smaller than a given threshold (0.01 in this embodiment) are removed.
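By way of illustration only, the candidate selection described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the function names `center_peaks` and `decode_poses`, the nested-list data layout, the 5×5 map and the use of 2 key points instead of 17 are all hypothetical choices made to keep the example small.

```python
def center_peaks(center, top_n):
    """3x3 non-maximum suppression on a 2-D center map, then keep the top_n peaks."""
    h, w = len(center), len(center[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            window = [
                center[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            # A point survives only if nothing in its 3x3 window scores higher.
            if center[y][x] >= max(window):
                peaks.append((center[y][x], x, y))
    peaks.sort(reverse=True)
    return peaks[:top_n]


def decode_poses(peaks, offsets, threshold=0.01):
    """Key point k of a pose = center position + offset k read at that center."""
    poses = []
    for score, cx, cy in peaks:
        if score < threshold:  # remove poses whose center score is too low
            continue
        keypoints = [(cx + dx[cy][cx], cy + dy[cy][cx]) for dx, dy in offsets]
        poses.append({"score": score, "keypoints": keypoints})
    return poses


def constant_map(value, size=5):
    """Hypothetical helper: a size x size map filled with one value."""
    return [[value] * size for _ in range(size)]


center = constant_map(0.0)
center[1][1], center[1][2], center[3][3], center[4][0] = 0.9, 0.4, 0.6, 0.005
# One (dx, dy) pair of offset maps per key point; only 2 key points here.
offsets = [(constant_map(1.0), constant_map(-1.0)),
           (constant_map(0.5), constant_map(2.0))]
poses = decode_poses(center_peaks(center, top_n=3), offsets)
```

In this toy input the 0.4 response is suppressed by its 0.9 neighbour, and the isolated 0.005 peak is selected among the top three but then removed by the 0.01 threshold, so two candidate postures remain.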
A local key point expansion window is then generated. Specifically, for the N calculated candidate human body postures, each of the 17 key points of each human body posture generates a local grid (11×11 in this embodiment) centered on the position of that point, which compensates for the information lost when the human body posture is estimated from the center point offsets. This grid is defined as the local key point expansion window.
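As a hedged illustration of the local grid, the following Python sketch builds the sampling positions of one 11×11 window around a predicted key point. The function name `expansion_window` and the unit spacing between grid positions are assumptions; the text above fixes only the grid size and its center.

```python
def expansion_window(cx, cy, size=11):
    """Return a size x size grid of (x, y) sampling positions centered on (cx, cy).

    Unit spacing between neighbouring positions is an assumption.
    """
    half = (size - 1) / 2
    return [
        [(cx + i - half, cy + j - half) for i in range(size)]
        for j in range(size)
    ]


window = expansion_window(20.5, 7.0)
```

The positions are generally not integers, which is why the window features are later read from the feature map by bilinear interpolation.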
Further, in an embodiment, the step of obtaining the offset, the thermodynamic diagram of the human body key point and the thermodynamic diagram of the human body key point center based on the extracted feature map includes:
the feature map is reduced in dimension through two parallel branches respectively, and a convolution layer with a convolution kernel size of 1x1 is then used for convolution processing; wherein one branch outputs the human body key point thermodynamic diagram and the human body key point central thermodynamic diagram H, whose spatial size h × w depends on the sampling step size of the backbone network, and the obtained low-resolution thermodynamic diagram is up-sampled by bilinear interpolation to obtain a high-resolution thermodynamic diagram; the other branch outputs the offsets of the 17 human body key points.
In this embodiment, the specific flow is as follows:
where F ∈ R^{C×h×w} is the feature map, C is the dimension of the feature, and C1 and C2 are predefined parameters (C1 = 32 and C2 = 256 are commonly used in the algorithm). Conv is convolution processing, BN is batch normalization, and ReLU is the rectified linear unit.
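The channel reduction performed by a 1x1 convolution can be illustrated with the short Python sketch below. It is only a sketch under assumptions: the names `conv1x1` and `relu` and the toy weights are hypothetical, and batch normalization is omitted for brevity (at inference time it reduces to a per-channel scale and shift).

```python
def conv1x1(feature, weight, bias):
    """1x1 convolution: mix channels independently at every pixel.

    feature: C_in x h x w nested lists; weight: C_out x C_in; bias: C_out.
    """
    h, w = len(feature[0]), len(feature[0][0])
    return [
        [
            [
                bias[o] + sum(weight[o][c] * feature[c][y][x]
                              for c in range(len(feature)))
                for x in range(w)
            ]
            for y in range(h)
        ]
        for o in range(len(weight))
    ]


def relu(feature):
    """Rectified linear unit applied element-wise."""
    return [[[max(0.0, v) for v in row] for row in channel] for channel in feature]


feature = [[[1.0, 2.0]], [[3.0, -4.0]]]            # C_in = 2, h = 1, w = 2
weight = [[1.0, 1.0], [1.0, -1.0], [-1.0, 0.0]]    # C_out = 3
branch = relu(conv1x1(feature, weight, [0.0, 0.0, 0.0]))
```

Because the kernel covers a single pixel, the spatial size h × w is unchanged and only the channel count moves from C_in to C_out, which is how a branch can map C channels down to C1 or C2.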
Further, in an embodiment, the step of obtaining the predicted human body key point according to the offset of the human body key point and the thermodynamic diagram of the center of the human body key point, and generating the local key point expansion window by taking the predicted human body key point as the center includes:
performing non-maximum suppression on the human body key point central thermodynamic diagram, and selecting the N highest-scoring points as candidate key point center points, wherein N is a positive integer; obtaining N candidate human body postures from the candidate key point center points and the offsets of the human body key points, while removing human body postures whose human body key point central thermodynamic diagram score is smaller than a threshold; for the N candidate human body postures, taking the 17 key points of each candidate human body posture as predicted human body key points; and generating a local key point expansion window centered on each predicted human body key point.
In this embodiment, the specific flow is as follows:
step S30, converting the local key point expansion window into a key point attraction field;
In this embodiment, the feature map is first processed by a convolution layer, batch normalization and a rectified linear unit to obtain a feature map of dimension C3 (64 in this embodiment), and a local key point expansion window feature map is then obtained from the processed feature map by bilinear interpolation over the local geometric grid.
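The bilinear interpolation step can be sketched as follows for a single channel. This is an illustrative sketch only: the names `bilinear_sample` and `sample_window`, and the choice to clamp positions that fall outside the map to the border, are assumptions not stated in the text.

```python
import math


def bilinear_sample(channel, x, y):
    """Read a 2-D map at a fractional position (x, y) by bilinear interpolation."""
    h, w = len(channel), len(channel[0])
    # Assumption: positions outside the map are clamped to the border.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * channel[y0][x0] + ax * channel[y0][x1]
    bottom = (1 - ax) * channel[y1][x0] + ax * channel[y1][x1]
    return (1 - ay) * top + ay * bottom


def sample_window(channel, window):
    """Build a window feature map by sampling every (x, y) position of the grid."""
    return [[bilinear_sample(channel, x, y) for x, y in row] for row in window]


channel = [[0.0, 1.0], [2.0, 3.0]]
grid = [[(0.0, 0.0), (0.5, 0.5)], [(1.0, 1.0), (-3.0, 0.0)]]
patch = sample_window(channel, grid)
```

For a feature map with C3 channels the same sampling would be repeated per channel, giving one window feature map per predicted key point.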
The key point attraction field is generated by a convolutional information transfer module. Specifically, the local key point expansion window feature map obtained above is processed by three different convolution layers with normalization and rectified linear units to obtain the key point attraction field. Within the convolutional information transfer module, attention-based normalization is used instead of batch normalization, in order to emphasize the specificity of different posture instances.
Further, in an embodiment, step S30 includes:
processing the feature map by using a convolution layer, batch normalization and a rectified linear unit to obtain a feature map of dimension C3; using the local key point expansion windows to obtain a local key point expansion window feature map by bilinear interpolation from the feature map of dimension C3; and processing the local key point expansion window feature map by using three different convolution layers, batch normalization and rectified linear units to obtain the key point attraction field.
In this embodiment, the specific flow is as follows:
where Cin, Cout and C6 are all dimensions.
Step S40, generating a global key point expansion window by using the predicted human body key points and the human body key point thermodynamic diagram, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram;
In this embodiment, an expanded global grid (a×a) is generated for each predicted human body key point. This grid can be understood as a global key point expansion window. The generated global key point expansion window is then used to obtain a global key point window feature map by bilinear interpolation from the predicted global thermodynamic diagram. Finally, a Gaussian kernel is used for re-weighting to obtain the global key point expansion window feature map. The specific flow is as follows:
A convolution operation is then performed on the global key point expansion window feature map with the obtained key point attraction field as the convolution kernel, to obtain a corrected key point thermodynamic diagram. Specifically, convolving the global key point expansion window feature map with the learned key point attraction field fuses local and global context information, and finally yields the corrected thermodynamic diagrams of the 17 key points.
Further, in an embodiment, step S40 includes:
generating a global key point expansion window for each predicted human body key point; using the generated global key point expansion window to obtain a global key point expansion window feature map from the feature map of dimension C3 through bilinear interpolation; re-weighting the global key point expansion window feature map with a Gaussian kernel to obtain a new global key point expansion window feature map; and performing a convolution operation on the new global key point expansion window feature map with the key point attraction field as the convolution kernel, so as to fuse local and global context information and obtain the corrected key point thermodynamic diagram.
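The re-weighting and convolution in step S40 can be illustrated with a short NumPy sketch. The names `gaussian_weights`, `correlate_valid` and `refine_heatmap`, the single-channel window, the value of `sigma`, and the use of "valid" cross-correlation (the usual meaning of convolution in neural networks) are assumptions for illustration only; the source does not specify them.

```python
import numpy as np

def gaussian_weights(size, sigma):
    """2-D Gaussian weights with peak value 1.0 at the window center."""
    half = (size - 1) / 2.0
    coords = np.arange(size) - half
    g = np.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    return np.outer(g, g)

def correlate_valid(window, kernel):
    """Slide the kernel over the window without padding (cross-correlation)."""
    k = kernel.shape[0]
    # Every k x k patch of the window, shape (A-k+1, A-k+1, k, k).
    patches = np.lib.stride_tricks.sliding_window_view(window, (k, k))
    return np.einsum("ijkl,kl->ij", patches, kernel)

def refine_heatmap(global_window, attraction_field, sigma):
    """Re-weight the global window, then use the attraction field as the kernel."""
    weighted = global_window * gaussian_weights(global_window.shape[0], sigma)
    return correlate_valid(weighted, attraction_field)
```

With a global window of size A×A and an attraction field of size k×k, the corrected map in this sketch has size (A-k+1)×(A-k+1); a real network would apply this per key point and per pose candidate.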
Step S50, obtaining the two-dimensional coordinates of the predicted human body joint point from the corrected key point thermodynamic diagram;
Further, in an embodiment, step S50 includes:
selecting the first two positions with the highest scores in the corrected key point thermodynamic diagram for each predicted human body key point, then weighting the scores of the two positions, calculating a first product of the scores of the two weighted positions, multiplying the first product by the score of the corresponding candidate key point center point to obtain a second product, and taking the second product as the confidence score of the predicted human body key point; and taking the position of the key point with the highest confidence score as the two-dimensional coordinate of the predicted human body joint point.
In this embodiment, the specific flow is as follows:
is the predicted human body key point, N' is the number of pose instances that are ultimately predicted in one image I.
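The confidence score of step S50 can be sketched as below. The function name `keypoint_confidence` and the weight values (0.75, 0.25) are hypothetical, since the source does not give the weights, and returning the location of the single highest score as the two-dimensional coordinate is one possible reading of the text.

```python
import numpy as np

def keypoint_confidence(heatmap, center_score, weights=(0.75, 0.25)):
    """Score one key point from its corrected thermodynamic diagram (heatmap).

    Assumption: `weights` are illustrative values, not taken from the source.
    """
    flat = heatmap.ravel()
    top2 = np.argsort(flat)[-2:][::-1]          # best, then second best
    s1, s2 = flat[top2]
    # First product: product of the two weighted scores.
    first_product = (weights[0] * s1) * (weights[1] * s2)
    # Second product: multiply by the candidate center point score.
    confidence = first_product * center_score
    y, x = np.unravel_index(top2[0], heatmap.shape)
    return float(confidence), (int(x), int(y))
```

In this sketch the score of a key point drops when either the refined map or the pose center is uncertain, because all three factors are multiplied.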
And step S60, encoding and decoding are respectively carried out by utilizing the space and time sequence information of the two-dimensional coordinates of the predicted human body joint point, so as to obtain the three-dimensional information of the predicted human body joint point.
In this embodiment, spatial scale encoding is performed with the two-dimensional coordinates of the predicted human body joint points as key values and an implicit vector as the index, giving output features. Time scale encoding is then performed on the output features, with the implicit vector serving as both key value and index, giving an output array Y. Finally, the position of each joint point in the real three-dimensional world is regressed from Y through a weighted average and a multi-layer perceptron, giving the three-dimensional information of the predicted human body joint points.
Further, in one embodiment, step S60 includes:
performing spatial scale coding by taking the two-dimensional coordinates of the predicted human body joint points as key values and implicit vectors serving as indexes to obtain output characteristics; encoding the output characteristics through a time scale to obtain an output array; and obtaining the three-dimensional coordinates of the predicted human body joint point according to the output array regression.
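One way to read step S60 is as two attention stages followed by a regression head. The sketch below is only that reading, under several assumptions not stated in the source: scaled dot-product attention is used, the "index" is treated as the attention query, there is one implicit (latent) vector per joint, the weighted average runs over time, and all parameters (`embed`, `latent`, `time_logits`, `w1`, `w2`) are hypothetical random stand-ins for learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    """Scaled dot-product attention; the query plays the role of the index."""
    scores = query @ np.swapaxes(key, -1, -2) / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value

def init_params(num_joints, num_frames, dim, seed=0):
    """Hypothetical parameters; random values stand in for learned ones."""
    rng = np.random.default_rng(seed)
    return {
        "embed": rng.normal(size=(2, dim)) * 0.1,
        "latent": rng.normal(size=(num_joints, dim)) * 0.1,
        "time_logits": np.zeros(num_frames),
        "w1": rng.normal(size=(dim, dim)) * 0.1,
        "w2": rng.normal(size=(dim, 3)) * 0.1,
    }

def lift_to_3d(joints_2d, params):
    """Map a (T, J, 2) sequence of 2-D joints to (J, 3) coordinates."""
    tokens = joints_2d @ params["embed"]                    # (T, J, D)
    # Spatial encoding: latent vectors query the joints of each frame.
    spatial = attention(params["latent"], tokens, tokens)   # (T, J, D)
    # Temporal encoding: each joint slot attends over its own time axis.
    per_joint = np.swapaxes(spatial, 0, 1)                  # (J, T, D)
    y = attention(per_joint, per_joint, per_joint)          # (J, T, D)
    # Weighted average over time, then a two-layer perceptron.
    w = softmax(params["time_logits"])                      # (T,)
    pooled = np.einsum("t,jtd->jd", w, y)                   # (J, D)
    hidden = np.maximum(pooled @ params["w1"], 0.0)
    return hidden @ params["w2"]                            # (J, 3)
```

The output here is meaningless until the parameters are trained; the sketch only shows how the tensor shapes flow from two-dimensional coordinates to three-dimensional coordinates.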
Compared with the prior art, the embodiment has the following advantages:
1. This embodiment addresses the low accuracy of methods that predict human body posture from key point centers and offsets: it constructs a local expansion window around each predicted key point to further refine the key point position, and the whole key point detection and correction network is an end-to-end network.
2. This embodiment provides a novel local and global information adaptation module that yields structured human body posture estimates and fuses local and global structural information.
3. Through two-stage training, this embodiment can handle multi-person posture estimation under severe self-occlusion.
Further, the loss function of the full convolution network according to this embodiment includes the following parts:
Key point thermodynamic diagram loss function. The ground-truth thermodynamic diagrams of each key point and of the key point center are generated by modeling the mean and variance of a given data set with a Gaussian model. An element of the set with dimensions 10×h×w is denoted by p = (i, x). For the predicted key point and center thermodynamic diagrams, the loss function of the key point thermodynamic diagram is calculated against the ground truth as follows:
where w(x) represents the weight of foreground and background pixels: w(x) = 1 for foreground pixels and w(x) = 0.1 for background pixels.
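The foreground and background weighting can be illustrated with a small NumPy sketch. Only the weights 1 and 0.1 come from the text; the squared-error distance, the rule that a pixel is foreground when its ground-truth value exceeds `fg_threshold`, and the name `weighted_heatmap_loss` are assumptions.

```python
import numpy as np

def weighted_heatmap_loss(pred, target, fg_threshold=0.0, bg_weight=0.1):
    """Mean weighted squared error between predicted and ground-truth maps.

    Assumption: squared error is used and foreground means target > fg_threshold.
    """
    weights = np.where(target > fg_threshold, 1.0, bg_weight)
    return float(np.mean(weights * (pred - target) ** 2))
```

Down-weighting the background keeps the many near-zero pixels of a thermodynamic diagram from dominating the few pixels around each key point.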
Key point offset field loss function. The ground-truth offset is used as the target, and c_GT denotes the non-empty set of key point centers in the ground truth. For the predicted key point offset field, the loss function is expressed as follows:
where the area term represents the area of the human body centered on the pixel p, and β represents the cutoff threshold (e.g., 1/9).
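A cutoff threshold β of this kind is typical of a smooth L1 penalty, so the sketch below assumes that form. This is an assumption: the source formula is not recoverable here, and the division by the body area, the names `smooth_l1` and `offset_loss`, and the tensor layout are illustrative choices.

```python
import numpy as np

def smooth_l1(diff, beta=1.0 / 9.0):
    """Quadratic below the cutoff beta, linear above it."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a ** 2 / beta, a - 0.5 * beta)

def offset_loss(pred, target, area, beta=1.0 / 9.0):
    """Area-normalized offset loss over ground-truth center pixels.

    pred, target: (P, K, 2) offsets at P center pixels for K key points.
    area: (P,) body area associated with each center pixel (assumed divisor).
    """
    per_center = smooth_l1(pred - target, beta).sum(axis=(1, 2)) / area
    return float(per_center.mean())
```

Normalizing by area, under this assumption, prevents large bodies with naturally long offsets from contributing disproportionately to the loss.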
OKS loss function. When predicting each human body posture, a local key point attraction field must be learned as a convolution kernel to correct the global key point thermodynamic diagram. Specifically, for the N_GT annotated ground-truth human body postures given in a picture, a similarity score is calculated between each candidate key point in the local key point expansion window and the ground truth, giving a similarity score tensor. The tensor is then truncated with a threshold of 0.5. Next, the truncated similarity score tensor is averaged over its first three dimensions to obtain a matching score against each annotated ground-truth posture, and the ground-truth posture n* with the largest matching score is selected together with its matching score. Based on the selected ground-truth posture, a similarity score is calculated for each key point of the predicted human body posture, denoted s_k (k∈[1,17]). The loss function of the final key point attraction field is thus defined as:
Finally, the overall loss function of the whole network is defined in terms of the above loss terms, with λ used to balance the values of the different loss terms; for example, λ is taken as 0.01. The defined loss function measures the difference between the predicted human body posture and the ground truth; this difference serves as the error signal, the partial derivatives of all parameters in the convolution layers are computed by the back-propagation algorithm, and the parameters of the neural network are updated according to the result.
In a second aspect, the embodiment of the invention further provides a human body posture prediction device.
In an embodiment, referring to fig. 2, fig. 2 is a schematic functional block diagram of a human body posture predicting device according to an embodiment of the invention. As shown in fig. 2, the human body posture predicting apparatus includes:
the extraction module 10 is used for extracting an input picture through a convolutional neural network to obtain a feature map;
the first generating module 20 is configured to obtain an offset of a human body key point, a thermodynamic diagram of the human body key point, and a central thermodynamic diagram of the human body key point based on the extracted feature map, obtain a predicted human body key point according to the offset of the human body key point and the central thermodynamic diagram of the human body key point, and generate a local key point expansion window with the predicted human body key point as a center;
a conversion module 30 for converting the local keypoint expansion window into a keypoint attractive field;
a second generating module 40, configured to generate a global key point expansion window by using the predicted human body key points and the human body key point thermodynamic diagram, and to perform a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram;
a third generating module 50, configured to obtain two-dimensional coordinates of the predicted human body joint point from the corrected key point thermodynamic diagram;
the encoding and decoding module 60 is configured to encode and decode by using the space and time sequence information of the two-dimensional coordinates of the predicted human body joint point respectively, so as to obtain the three-dimensional information of the predicted human body joint point.
Further, in an embodiment, the first generating module 20 is configured to:
the feature map is subjected to dimension reduction through two parallel branches respectively, and then convolution processing is carried out by using a convolution layer with the convolution kernel size of 1x 1;
wherein one branch outputs the human body key point thermodynamic diagram and the human body key point central thermodynamic diagram, where H denotes a thermodynamic diagram and h and w depend on the sampling step size of the backbone network, and the obtained low-resolution thermodynamic diagram is up-sampled by bilinear interpolation to obtain a high-resolution thermodynamic diagram; the other branch outputs the offsets of the 17 human body key points.
Further, in an embodiment, the first generating module 20 is configured to:
performing non-maximum suppression on the human body key point central thermodynamic diagram, and selecting the N highest-scoring points as candidate key point center points, wherein N is a positive integer;
obtaining N candidate human body postures through the candidate key point center points and the offset of the human body key points, and simultaneously removing the human body postures with the human body key point center thermodynamic diagram score smaller than a threshold value;
for N candidate human body postures, 17 key points of each candidate human body posture are taken as predicted human body key points;
generating a local key point expansion window centered on each predicted human body key point.
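The candidate selection performed by the first generating module can be sketched as follows. The 3×3 neighbourhood used for non-maximum suppression, the offset layout (K, 2, H, W) holding (dx, dy) per key point, and the names `center_nms` and `candidate_poses` are assumptions for illustration.

```python
import numpy as np

def center_nms(center_heatmap):
    """Keep only pixels that are the maximum of their 3x3 neighbourhood."""
    padded = np.pad(center_heatmap, 1, mode="constant", constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    local_max = windows.max(axis=(2, 3))
    return np.where(center_heatmap == local_max, center_heatmap, 0.0)

def candidate_poses(center_heatmap, offsets, top_n, score_threshold):
    """Return (poses, scores): poses has shape (M, K, 2) with M <= top_n."""
    suppressed = center_nms(center_heatmap)
    order = np.argsort(suppressed.ravel())[::-1][:top_n]
    poses, scores = [], []
    for idx in order:
        y, x = np.unravel_index(idx, suppressed.shape)
        score = suppressed[y, x]
        if score < score_threshold:
            continue  # drop postures whose center score is below the threshold
        # Key point position = center position + predicted offset.
        poses.append(np.array([x, y], dtype=float) + offsets[:, :, y, x])
        scores.append(float(score))
    poses = np.array(poses).reshape(-1, offsets.shape[0], 2)
    return poses, np.array(scores)
```

Each row of `poses` holds the 17 predicted human body key points of one candidate posture, which would then serve as centers for the local key point expansion windows.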
Further, in an embodiment, the conversion module 30 is configured to:
processing the feature map with a convolution layer, batch normalization and a rectified linear unit (ReLU) to obtain a feature map of dimension C3;
using the local key point expansion window to obtain a local key point expansion window feature map from the feature map of dimension C3 through bilinear interpolation;
processing the local key point expansion window feature map with three different convolution layers, batch normalization and rectified linear units to obtain the key point attraction field.
Further, in an embodiment, the second generating module 40 is configured to:
generating a global key point expansion window for each predicted human key point;
using the generated global key point expansion window to obtain a global key point expansion window feature map from the feature map of dimension C3 through bilinear interpolation;
re-weighting the global key point expansion window feature map with a Gaussian kernel to obtain a new global key point expansion window feature map;
and performing a convolution operation on the new global key point expansion window feature map with the key point attraction field as the convolution kernel, so as to fuse local and global context information and obtain the corrected key point thermodynamic diagram.
Further, in an embodiment, the third generating module 50 is configured to:
selecting the first two positions with the highest scores in the corrected key point thermodynamic diagram for each predicted human body key point, then weighting the scores of the two positions, calculating a first product of the scores of the two weighted positions, multiplying the first product by the score of the corresponding candidate key point center point to obtain a second product, and taking the second product as the confidence score of the predicted human body key point;
and taking the position of the key point with the highest confidence score as the two-dimensional coordinate of the predicted human body joint point.
Further, in an embodiment, the codec module 60 is configured to:
performing spatial scale coding by taking the two-dimensional coordinates of the predicted human body joint points as key values and implicit vectors serving as indexes to obtain output characteristics;
encoding the output characteristics through a time scale to obtain an output array;
and obtaining the three-dimensional coordinates of the predicted human body joint point according to the output array regression.
The function implementation of each module in the human body posture prediction device corresponds to each step in the human body posture prediction method embodiment, and the function and implementation process of each module are not described in detail herein.
In a third aspect, embodiments of the present invention provide a human body posture prediction apparatus, which may be an apparatus having a data processing function, such as a personal computer (PC), a notebook computer, or a server.
Referring to fig. 3, fig. 3 is a schematic hardware configuration diagram of a human body posture prediction apparatus according to an embodiment of the present invention. In an embodiment of the present invention, the human body posture prediction apparatus may include a processor 1001 (e.g., a central processing unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components; the user interface 1003 may include a display screen and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface); the memory 1005 may be a high-speed random access memory (RAM) or a non-volatile memory such as a disk memory, and may alternatively be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration shown in fig. 3 does not limit the invention, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
With continued reference to fig. 3, an operating system, a network communication module, a user interface module, and a human body posture prediction program may be included in a memory 1005, which is one type of computer storage medium in fig. 3. The processor 1001 may call a human body posture prediction program stored in the memory 1005, and execute the human body posture prediction method provided by the embodiment of the present invention.
In a fourth aspect, embodiments of the present invention also provide a readable storage medium.
The human body posture predicting program is stored on the readable storage medium, and when the human body posture predicting program is executed by the processor, the steps of the human body posture predicting method are realized.
The method implemented when the human body posture prediction program is executed may refer to various embodiments of the human body posture prediction method of the present invention, and will not be described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising several instructions for causing a terminal device to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
Claims (10)
1. A human body posture prediction method, characterized in that the human body posture prediction method comprises:
extracting an input picture through a convolutional neural network to obtain a feature map;
obtaining an offset of a human body key point, a human body key point thermodynamic diagram and a human body key point central thermodynamic diagram based on the extracted feature diagram, obtaining a predicted human body key point according to the offset of the human body key point and the human body key point central thermodynamic diagram, and generating a local key point expansion window by taking the predicted human body key point as a center;
converting the local key point expansion window into a key point attraction field;
generating a global key point expansion window by using the predicted human body key points and the human body key point thermodynamic diagram, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram;
obtaining predicted two-dimensional coordinates of the human body joint point from the corrected key point thermodynamic diagram;
and respectively encoding and decoding by utilizing the space and time sequence information of the two-dimensional coordinates of the predicted human body joint point to obtain the three-dimensional information of the predicted human body joint point.
2. The human body posture prediction method of claim 1, wherein the step of obtaining the offset of the human body key point, the human body key point thermodynamic diagram and the human body key point center thermodynamic diagram based on the extracted feature map comprises:
the feature map is subjected to dimension reduction through two parallel branches respectively, and then convolution processing is carried out by using a convolution layer with the convolution kernel size of 1x 1;
wherein one branch outputs the human body key point thermodynamic diagram and the human body key point central thermodynamic diagram, where H denotes a thermodynamic diagram and h and w depend on the sampling step size of the backbone network, and the obtained low-resolution thermodynamic diagram is up-sampled by bilinear interpolation to obtain a high-resolution thermodynamic diagram; the other branch outputs the offsets of the 17 human body key points.
3. The human body posture prediction method of claim 2, wherein the step of obtaining predicted human body key points based on the deviation amount of the human body key points and the thermodynamic diagram of the centers of the human body key points, and generating the local key point expansion window centering on the predicted human body key points comprises:
performing non-maximum suppression on the human body key point central thermodynamic diagram, and selecting the N highest-scoring points as candidate key point center points, wherein N is a positive integer;
obtaining N candidate human body postures through the candidate key point center points and the offset of the human body key points, and simultaneously removing the human body postures with the human body key point center thermodynamic diagram score smaller than a threshold value;
for N candidate human body postures, 17 key points of each candidate human body posture are taken as predicted human body key points;
4. A method of predicting body poses as recited in claim 3 wherein said step of converting the local keypoint expansion window into a keypoint attractive field comprises:
processing the feature map with a convolution layer, batch normalization and a rectified linear unit (ReLU) to obtain a feature map of dimension C3;
using the local key point expansion window to obtain a local key point expansion window feature map from the feature map of dimension C3 through bilinear interpolation;
processing the local key point expansion window feature map with three different convolution layers, batch normalization and rectified linear units to obtain the key point attraction field.
5. The human body posture prediction method of claim 4, wherein the step of generating a global key point expansion window using the predicted human body key points and the human body key point thermodynamic diagram, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram comprises:
generating a global key point expansion window for each predicted human key point;
using the generated global key point expansion window to obtain a global key point expansion window feature map from the feature map of dimension C3 through bilinear interpolation;
re-weighting the global key point expansion window feature map with a Gaussian kernel to obtain a new global key point expansion window feature map;
and performing a convolution operation on the new global key point expansion window feature map with the key point attraction field as the convolution kernel, so as to fuse local and global context information and obtain the corrected key point thermodynamic diagram.
6. The method of claim 5, wherein the step of deriving the two-dimensional coordinates of the predicted human body node from the modified keypoint thermodynamic diagram comprises:
selecting the first two positions with the highest scores in the corrected key point thermodynamic diagram for each predicted human body key point, then weighting the scores of the two positions, calculating a first product of the scores of the two weighted positions, multiplying the first product by the score of the corresponding candidate key point center point to obtain a second product, and taking the second product as the confidence score of the predicted human body key point;
and taking the position of the key point with the highest confidence score as the two-dimensional coordinate of the predicted human body joint point.
7. The human body posture prediction method of claim 6, wherein the step of encoding and decoding using spatial and temporal information of two-dimensional coordinates of the predicted human body node respectively, to obtain three-dimensional information of the predicted human body node comprises:
performing spatial scale coding by taking the two-dimensional coordinates of the predicted human body joint points as key values and implicit vectors serving as indexes to obtain output characteristics;
encoding the output characteristics through a time scale to obtain an output array;
and obtaining the three-dimensional coordinates of the predicted human body joint point according to the output array regression.
8. A human body posture predicting device, characterized in that the human body posture predicting device comprises:
the extraction module is used for extracting the input picture through a convolutional neural network to obtain a feature map;
the first generation module is used for obtaining the offset of the human body key points, the thermodynamic diagram of the human body key points and the central thermodynamic diagram of the human body key points based on the extracted feature diagram, obtaining predicted human body key points according to the offset of the human body key points and the central thermodynamic diagram of the human body key points, and generating a local key point expansion window by taking the predicted human body key points as the center;
the conversion module is used for converting the local key point expansion window into a key point attraction field;
the second generation module is used for generating a global key point expansion window by using the predicted human body key points and the human body key point thermodynamic diagram, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram;
the third generation module is used for obtaining the two-dimensional coordinates of the predicted human body joint point from the corrected key point thermodynamic diagram;
and the encoding and decoding module is used for encoding and decoding by utilizing the space and time sequence information of the two-dimensional coordinates of the predicted human body joint point respectively to obtain the three-dimensional information of the predicted human body joint point.
9. A human body posture prediction device, characterized in that it comprises a processor, a memory, and a human body posture prediction program stored on the memory and executable by the processor, wherein the human body posture prediction program, when executed by the processor, implements the steps of the human body posture prediction method according to any one of claims 1 to 7.
10. A readable storage medium, characterized in that a human body posture prediction program is stored on the readable storage medium, wherein the human body posture prediction program, when executed by a processor, implements the steps of the human body posture prediction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310251993.9A CN116363750A (en) | 2023-03-13 | 2023-03-13 | Human body posture prediction method, device, equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310251993.9A CN116363750A (en) | 2023-03-13 | 2023-03-13 | Human body posture prediction method, device, equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116363750A true CN116363750A (en) | 2023-06-30 |
Family
ID=86915571
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310251993.9A Pending CN116363750A (en) | 2023-03-13 | 2023-03-13 | Human body posture prediction method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116363750A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116631010A (en) * | 2023-07-17 | 2023-08-22 | 粤港澳大湾区数字经济研究院(福田) | Interactive key point detection method and related device |
CN116645699A (en) * | 2023-07-27 | 2023-08-25 | 杭州华橙软件技术有限公司 | Key point detection method, device, terminal and computer readable storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116631010A (en) * | 2023-07-17 | 2023-08-22 | 粤港澳大湾区数字经济研究院(福田) | Interactive key point detection method and related device |
CN116631010B (en) * | 2023-07-17 | 2023-10-31 | 粤港澳大湾区数字经济研究院(福田) | Interactive key point detection method and related device |
CN116645699A (en) * | 2023-07-27 | 2023-08-25 | 杭州华橙软件技术有限公司 | Key point detection method, device, terminal and computer readable storage medium |
CN116645699B (en) * | 2023-07-27 | 2023-09-29 | 杭州华橙软件技术有限公司 | Key point detection method, device, terminal and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111563508B (en) | Semantic segmentation method based on spatial information fusion | |
CN112926396B (en) | Action identification method based on double-current convolution attention | |
CN111950453B (en) | Random shape text recognition method based on selective attention mechanism | |
CN116363750A (en) | Human body posture prediction method, device, equipment and readable storage medium | |
CN112232134B (en) | Human body posture estimation method based on hourglass network and attention mechanism | |
CN110942471A (en) | Long-term target tracking method based on space-time constraint | |
CN116612288B (en) | Multi-scale lightweight real-time semantic segmentation method and system | |
Shi et al. | Lightweight context-aware network using partial-channel transformation for real-time semantic segmentation | |
CN111507184B (en) | Human body posture detection method based on parallel cavity convolution and body structure constraint | |
CN117237858B (en) | Loop detection method | |
CN115471718A (en) | Construction and detection method of lightweight significance target detection model based on multi-scale learning | |
CN115205336A (en) | Feature fusion target perception tracking method based on multilayer perceptron | |
CN114529793A (en) | Depth image restoration system and method based on gating cycle feature fusion | |
CN117523645B (en) | Face key point detection method and device, electronic equipment and storage medium | |
CN114550014A (en) | Road segmentation method and computer device | |
CN113343762B (en) | Human body posture estimation grouping model training method, posture estimation method and device | |
CN113033263B (en) | Face image age characteristic recognition method | |
CN112906724B (en) | Image processing device, method, medium and system | |
CN114913541A (en) | Human body key point detection method, device and medium based on orthogonal matching pursuit | |
CN113947792A (en) | Target face image matching method and device, equipment, medium and product thereof | |
CN113610856A (en) | Method and device for training image segmentation model and image segmentation | |
CN110458092B (en) | Face recognition method based on L2 regularization gradient constraint sparse representation | |
CN116030347B (en) | High-resolution remote sensing image building extraction method based on attention network | |
CN116486101B (en) | Image feature matching method based on window attention | |
CN117392180B (en) | Interactive video character tracking method and system based on self-supervision optical flow learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||