CN116363750A - Human body posture prediction method, device, equipment and readable storage medium

Info

Publication number: CN116363750A
Application number: CN202310251993.9A
Authority: CN
Inventors: 魏格格; 薛楠; 吴田富; 夏桂松; 张良培
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2023-03-13
Filing date: 2023-03-13
Publication date: 2023-06-30

Abstract

The invention provides a human body posture prediction method, apparatus, device and readable storage medium. The method comprises the following steps: extracting a feature map from an input picture; obtaining, based on the feature map, offsets of human body key points, a key point heatmap and a key point center heatmap; obtaining predicted human body key points according to the offsets and the center heatmap; generating a local key point expansion window centered on each predicted human body key point, and converting the local key point expansion window into a key point attraction field; generating a global key point expansion window by using the predicted human body key points and the key point heatmap, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point heatmap; obtaining predicted two-dimensional coordinates of the human body joint points from the corrected key point heatmap; and encoding and decoding the spatial and temporal information of the predicted two-dimensional joint coordinates, respectively, to obtain three-dimensional information of the predicted human body joint points. The method and apparatus improve the accuracy of human body posture prediction.

Description

Human body posture prediction method, device, equipment and readable storage medium

Technical Field

The present invention relates to the field of computer vision, and in particular, to a human body posture prediction method, apparatus, device, and readable storage medium.

Background

Human skeleton key point detection is one of the important tasks in computer vision research. It enables a computer to perceive the positions of the skeleton key points of a human body, and thereby provides a foundation for practical applications such as action recognition, abnormal action detection, intelligent monitoring and automatic driving.

The goal of human skeleton key point detection is to take a picture as input and output the coordinates, in the picture and in the real world, of each skeleton key point of each human body. Current deep-learning-based human body key point detection techniques fall into two categories: top-down and bottom-up methods. A top-down method first detects the bounding box of each human body and then performs single-person posture estimation within each bounding box. This approach has higher accuracy, but the computation required for single-person posture estimation is proportional to the number of bounding boxes, so its computational efficiency is low, and its accuracy is limited by that of the human body detector. A bottom-up method generally involves two steps: first detecting human body key points and then grouping them. One representative bottom-up method predicts key point locations from key point centers and offsets, which avoids complex key point grouping and is faster to compute. Its drawback is equally clear: the offset vectors are estimated inaccurately for key points far from the key point center, so the overall accuracy of the final model is not high.

Disclosure of Invention

To address the shortcomings of existing techniques that predict key points from key point centers and offsets, the invention provides a human body posture prediction method, apparatus, device and readable storage medium.

In a first aspect, the present invention provides a human body posture prediction method, including:

extracting a feature map from an input picture through a convolutional neural network;

obtaining, based on the extracted feature map, offsets of human body key points, a human body key point heatmap and a human body key point center heatmap, obtaining predicted human body key points according to the offsets of the human body key points and the human body key point center heatmap, and generating a local key point expansion window centered on each predicted human body key point;

converting the local key point expansion window into a key point attraction field;

generating a global key point expansion window by using the predicted human body key points and the human body key point heatmap, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point heatmap;

obtaining two-dimensional coordinates of the predicted human body joint points from the corrected key point heatmap;

and encoding and decoding the spatial and temporal information of the two-dimensional coordinates of the predicted human body joint points, respectively, to obtain three-dimensional information of the predicted human body joint points.

Optionally, the step of obtaining the offsets of the human body key points, the human body key point heatmap and the human body key point center heatmap based on the extracted feature map includes:

reducing the dimension of the feature map through two parallel branches respectively, and then performing convolution processing with a convolution layer whose kernel size is 1×1;

wherein one branch outputs the human body key point heatmap and the human body key point center heatmap, denoted H, whose spatial size h×w depends on the sampling stride of the backbone network; the resulting low-resolution heatmap is upsampled by bilinear interpolation to obtain a high-resolution heatmap; and the other branch outputs the offsets of the 17 human body key points.

Optionally, the step of obtaining the predicted human body key points according to the offsets of the human body key points and the human body key point center heatmap, and generating a local key point expansion window centered on each predicted human body key point includes:

performing non-maximum suppression on the human body key point center heatmap, and selecting the N highest-scoring points as candidate key point centers, wherein N is a positive integer;

obtaining N candidate human body postures from the candidate key point centers and the offsets of the human body key points, and removing the human body postures whose human body key point center heatmap score is smaller than a threshold;

for the N candidate human body postures, taking the 17 key points of each candidate human body posture as predicted human body key points;

generating a local key point expansion window centered on each predicted human body key point.

Optionally, the step of converting the local key point expansion window into a key point attraction field includes:

processing the feature map with a convolution layer, batch normalization and a rectified linear unit to obtain a feature map of dimension C₃;

obtaining a local key point expansion window feature map from the feature map of dimension C₃ by bilinear interpolation using the local key point expansion window;

processing the local key point expansion window feature map with three different convolution layers, batch normalization and rectified linear units to obtain the key point attraction field.

Optionally, the step of generating a global key point expansion window by using the predicted human body key points and the human body key point heatmap, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point heatmap includes:

generating a global key point expansion window for each predicted human body key point;

obtaining a global key point expansion window feature map from the feature map of dimension C₃ by bilinear interpolation using the generated global key point expansion window;

weighting the global key point expansion window feature map with a Gaussian kernel to obtain a new global key point expansion window feature map;

and performing a convolution operation on the new global key point expansion window feature map with the key point attraction field as the convolution kernel, so as to fuse local and global context information and obtain the corrected key point heatmap.

Optionally, the step of obtaining the two-dimensional coordinates of the predicted human body joint points from the corrected key point heatmap includes:

for each predicted human body key point, selecting the two highest-scoring positions in the corrected key point heatmap, weighting the scores of the two positions, calculating a first product of the two weighted scores, multiplying the first product by the score of the corresponding candidate key point center to obtain a second product, and taking the second product as the confidence score of the predicted human body key point;

and taking the position of the key point with the highest confidence score as the two-dimensional coordinates of the predicted human body joint point.

Optionally, the step of encoding and decoding the spatial and temporal information of the two-dimensional coordinates of the predicted human body joint points, respectively, to obtain the three-dimensional information of the predicted human body joint points includes:

performing spatial-scale encoding with the two-dimensional coordinates of the predicted human body joint points as the key values and a latent vector as the index, to obtain output features;

performing temporal-scale encoding on the output features to obtain an output array;

and regressing the three-dimensional coordinates of the predicted human body joint points from the output array.

In a second aspect, the present invention also provides a human body posture predicting apparatus, comprising:

an extraction module, configured to extract a feature map from an input picture through a convolutional neural network;

a first generation module, configured to obtain, based on the extracted feature map, offsets of human body key points, a human body key point heatmap and a human body key point center heatmap, obtain predicted human body key points according to the offsets of the human body key points and the human body key point center heatmap, and generate a local key point expansion window centered on each predicted human body key point;

a conversion module, configured to convert the local key point expansion window into a key point attraction field;

a second generation module, configured to generate a global key point expansion window by using the predicted human body key points and the human body key point heatmap, and perform a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point heatmap;

a third generation module, configured to obtain the two-dimensional coordinates of the predicted human body joint points from the corrected key point heatmap;

and an encoding and decoding module, configured to encode and decode the spatial and temporal information of the two-dimensional coordinates of the predicted human body joint points, respectively, to obtain the three-dimensional information of the predicted human body joint points.

In a third aspect, the present invention also provides a human body posture predicting device comprising a processor, a memory, and a human body posture predicting program stored on the memory and executable by the processor, wherein the human body posture predicting program, when executed by the processor, implements the steps of the human body posture predicting method as described above.

In a fourth aspect, the present invention also provides a readable storage medium having stored thereon a human body posture prediction program, wherein the human body posture prediction program, when executed by a processor, implements the steps of the human body posture prediction method as described above.

In the invention, a feature map is extracted from an input picture through a convolutional neural network; offsets of human body key points, a human body key point heatmap and a human body key point center heatmap are obtained based on the extracted feature map; predicted human body key points are obtained according to the offsets and the key point center heatmap, and a local key point expansion window is generated centered on each predicted human body key point; the local key point expansion window is converted into a key point attraction field; a global key point expansion window is generated by using the predicted human body key points and the human body key point heatmap, and a convolution operation is performed on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point heatmap; the two-dimensional coordinates of the predicted human body joint points are obtained from the corrected key point heatmap; and the spatial and temporal information of these two-dimensional coordinates is encoded and decoded, respectively, to obtain the three-dimensional information of the predicted human body joint points. The invention makes effective use of feature information by combining the features of the earlier and later stages and by combining global and local information, so that richer feature information is output, the localization of human body key points is improved, and the accuracy of human body posture prediction is improved in turn.

Drawings

FIG. 1 is a flow chart of an embodiment of a human body posture prediction method according to the present invention;

FIG. 2 is a schematic diagram of functional modules of an embodiment of a human body posture predicting device according to the present invention;

FIG. 3 is a schematic diagram of the hardware structure of a human body posture predicting device according to an embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

In a first aspect, an embodiment of the present invention provides a human body posture prediction method.

In an embodiment, referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a human body posture prediction method according to the present invention. As shown in fig. 1, the human body posture prediction method includes:

Step S10, extracting a feature map from an input picture through a convolutional neural network;

In this embodiment, for a given input picture, a low-dimensional feature map is extracted by a convolutional neural network. The feature extraction network may be HRNet, and the spatial dimension of the final output feature map depends on the overall stride of the feature extraction network.

For example, for an original picture of a given height and width, each pooling layer reduces the length and width of the features to 1/s of their previous size. Taking the original image as input, a feature map F ∈ R^{C×h×w} with feature dimension C is obtained through a series of convolution layers, rectified linear units and pooling layers, where the spatial dimension h×w depends on the overall stride of the feature extraction module.
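
The relationship between the input size and the spatial size h×w of the feature map can be illustrated with a short calculation. This is a minimal sketch: the function name, the 512×512 input and the choice of two stride-2 stages are illustrative assumptions, not values stated in this embodiment.

```python
def feature_map_size(height, width, stride_per_stage, num_stages):
    """Spatial size of the feature map after repeated downsampling.

    Each stage divides height and width by stride_per_stage, so the
    overall stride is stride_per_stage ** num_stages.
    """
    overall_stride = stride_per_stage ** num_stages
    return height // overall_stride, width // overall_stride, overall_stride


# Hypothetical example: a 512x512 picture and two stride-2 stages.
h, w, s = feature_map_size(512, 512, stride_per_stage=2, num_stages=2)
print(h, w, s)
```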

Step S20, obtaining, based on the extracted feature map, offsets of human body key points, a human body key point heatmap and a human body key point center heatmap, obtaining predicted human body key points according to the offsets of the human body key points and the human body key point center heatmap, and generating a local key point expansion window centered on each predicted human body key point;

In this embodiment, after the feature map F ∈ R^{C×h×w} is obtained, the dimension of the feature map is first reduced with a convolution layer whose kernel size is 1×1, followed by batch normalization and a rectified linear unit, and a further convolution layer whose kernel size is 1×1 then produces the output. This is done in two branches. Branch one outputs the heatmaps of the key points and of the key point center. The resulting low-resolution heatmaps are upsampled by bilinear interpolation to obtain high-resolution heatmaps.

Branch two outputs the offsets of the positions of the 17 human body key points.

A series of candidate human body postures is obtained from the computed key point center heatmap and the key point position offsets. Specifically, non-maximum suppression (window size 3×3) is first applied to the center heatmap, and the N highest-scoring points are selected as candidate key point centers. N candidate human body posture estimates are then obtained from the candidate key point centers and the key point offsets, and human body postures whose key point center heatmap score is below a given threshold (0.01 in this embodiment) are removed.
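
The candidate selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: heatmaps are plain nested lists, the offsets are stored as `offsets[k][y][x] = (dx, dy)` for key point k relative to a center at (x, y), and the names `center_nms` and `decode_poses` are hypothetical. The example uses 2 key points instead of 17 to stay short.

```python
def center_nms(heat, window=3):
    """Keep only local maxima of a 2D heatmap within a window x window area."""
    h, w = len(heat), len(heat[0])
    r = window // 2
    kept = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbours = [heat[j][i]
                          for j in range(max(0, y - r), min(h, y + r + 1))
                          for i in range(max(0, x - r), min(w, x + r + 1))]
            if heat[y][x] >= max(neighbours):
                kept[y][x] = heat[y][x]
    return kept


def decode_poses(center_heat, offsets, top_n=2, threshold=0.01):
    """Pick the top-N centers above the threshold and add the offsets."""
    kept = center_nms(center_heat)
    peaks = sorted(((s, x, y) for y, row in enumerate(kept)
                    for x, s in enumerate(row) if s >= threshold),
                   reverse=True)
    poses = []
    for score, cx, cy in peaks[:top_n]:
        kps = [(cx + off[cy][cx][0], cy + off[cy][cx][1]) for off in offsets]
        poses.append({"score": score, "center": (cx, cy), "keypoints": kps})
    return poses


center_heat = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.0],
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.005],
]
# Constant offset fields for two hypothetical key points.
offsets = [[[(-1.0, 0.0)] * 4 for _ in range(4)],
           [[(1.0, 1.0)] * 4 for _ in range(4)]]
poses = decode_poses(center_heat, offsets, top_n=2)
```

Filtering by the threshold before taking the top N gives the same result as selecting the top N first and then discarding low scores, because the candidates are sorted by score.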

Local key point expansion windows are then generated. Specifically, for the N computed candidate human body postures, each of the 17 key points of each posture generates a local grid (11×11 in this embodiment) centered on the position of that key point, which compensates for the information lost when the human body posture is estimated from the center point offsets. This grid is defined as the local key point expansion window.
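
A local key point expansion window of this kind can be sketched as a grid of sampling positions. The unit spacing between grid points and the name `local_window` are assumptions made for illustration; the embodiment only fixes the 11×11 size and the centering.

```python
def local_window(cx, cy, size=11):
    """Return a size x size grid of (x, y) positions centered on (cx, cy).

    Grid points are spaced one pixel apart (an assumption of this sketch).
    """
    r = size // 2
    return [[(cx + dx, cy + dy) for dx in range(-r, r + 1)]
            for dy in range(-r, r + 1)]


# A key point predicted at a fractional position still gets a centered grid.
grid = local_window(20.5, 13.0)
```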

Further, in an embodiment, the step of obtaining the offsets of the human body key points, the human body key point heatmap and the human body key point center heatmap based on the extracted feature map includes:

reducing the dimension of the feature map through two parallel branches respectively, and then performing convolution processing with a convolution layer whose kernel size is 1×1; wherein one branch outputs the human body key point heatmap and the human body key point center heatmap.

In this embodiment, the specific flow is as follows:

where F ∈ R^{C×h×w} is the feature map, C is the feature dimension, and C1 and C2 are predefined parameters (C1 = 32 and C2 = 256 are commonly used in the algorithm). Conv denotes convolution processing, BN denotes batch normalization, and ReLU denotes the rectified linear unit.

Further, in an embodiment, the step of obtaining the predicted human body key points according to the offsets of the human body key points and the human body key point center heatmap, and generating a local key point expansion window centered on each predicted human body key point includes:

performing non-maximum suppression on the human body key point center heatmap, and selecting the N highest-scoring points as candidate key point centers, wherein N is a positive integer; obtaining N candidate human body postures from the candidate key point centers and the offsets of the human body key points, and removing the human body postures whose human body key point center heatmap score is smaller than a threshold; for the N candidate human body postures, taking the 17 key points of each candidate human body posture as predicted human body key points; and generating a local key point expansion window centered on each predicted human body key point.

In this embodiment, the specific flow is as follows:

Step S30, converting the local key point expansion window into a key point attraction field;

In this embodiment, the feature map is first processed with a convolution layer, batch normalization and a rectified linear unit to obtain a feature map of dimension C₃ (64 in this embodiment), and a local key point expansion window feature map is then obtained from the processed feature map by bilinear interpolation over the local grid.

A convolutional message passing module then generates the key point attraction field. Specifically, the local key point expansion window feature map obtained in the previous step is processed with three different convolution layers together with normalization and rectified linear units to obtain the key point attraction field. In the convolutional message passing module, attentive normalization (normalization with an attention mechanism) is used in place of batch normalization in order to emphasize the specificity of different posture instances.
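
The bilinear interpolation used to read a window feature map out of the processed feature map can be sketched for a single channel as follows. Clamping coordinates to the map border and the names `bilinear_sample` and `sample_window` are assumptions of this sketch; a real implementation would sample all C₃ channels at once.

```python
import math


def bilinear_sample(fmap, x, y):
    """Bilinearly interpolate a 2D map (list of rows) at fractional (x, y).

    Coordinates outside the map are clamped to the border (an assumption).
    """
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * fmap[y0][x0] + ax * fmap[y0][x1]
    bottom = (1 - ax) * fmap[y1][x0] + ax * fmap[y1][x1]
    return (1 - ay) * top + ay * bottom


def sample_window(fmap, grid):
    """Sample the map at every (x, y) position of an expansion window grid."""
    return [[bilinear_sample(fmap, x, y) for (x, y) in row] for row in grid]


fmap = [[0.0, 1.0],
        [2.0, 3.0]]
center_value = bilinear_sample(fmap, 0.5, 0.5)  # average of the four cells
```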

Further, in an embodiment, step S30 includes:

processing the feature map with a convolution layer, batch normalization and a rectified linear unit to obtain a feature map of dimension C₃; obtaining a local key point expansion window feature map from the feature map of dimension C₃ by bilinear interpolation using the local key point expansion window; and processing the local key point expansion window feature map with three different convolution layers, batch normalization and rectified linear units to obtain the key point attraction field.

In this embodiment, the specific flow is as follows:

where Cin, Cout and C6 are all dimensions.

Step S40, generating a global key point expansion window by using the predicted human body key points and the human body key point heatmap, and performing a convolution operation on the global key point expansion window with the key point attraction field as the convolution kernel to obtain a corrected key point heatmap;

In this embodiment, for each predicted human body key point, an expanded global grid (a×a) is generated.

This grid can be understood as the global key point expansion window. The generated global key point expansion window is then used to obtain a global key point window feature map from the predicted global heatmap by bilinear interpolation. Finally, the feature map is re-weighted with a Gaussian kernel to obtain the global key point expansion window feature map.

The specific flow is as follows:

where the weighting term is a Gaussian kernel.

The obtained key point attraction field is then used as the convolution kernel in a convolution operation on the global key point expansion window feature map, yielding the corrected key point heatmap. Specifically, using the learned key point attraction field as the convolution kernel fuses local and global context information, and finally produces the corrected heatmaps of the 17 key points.
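
The Gaussian re-weighting and the use of the attraction field as a convolution kernel can be sketched for a single channel as follows. The window sizes (5×5 global window, 3×3 kernel), the value of sigma, the absence of padding and all function names are illustrative assumptions; the embodiment uses an a×a global window and a learned attraction field.

```python
import math


def gaussian_kernel(size, sigma):
    """Unnormalized 2D Gaussian weights centered on a size x size window."""
    c = (size - 1) / 2.0
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def reweight(window, weights):
    """Element-wise product of a window feature map and its weights."""
    return [[v * g for v, g in zip(vr, gr)] for vr, gr in zip(window, weights)]


def correlate_valid(window, kernel):
    """Slide the kernel over the window without padding.

    This is cross-correlation, which is what convolution layers in neural
    networks compute.
    """
    a, k = len(window), len(kernel)
    out = a - k + 1
    return [[sum(kernel[j][i] * window[y + j][x + i]
                 for j in range(k) for i in range(k))
             for x in range(out)] for y in range(out)]


# Hypothetical 5x5 global window and a 3x3 stand-in for the attraction field.
window = [[1.0] * 5 for _ in range(5)]
weighted = reweight(window, gaussian_kernel(5, sigma=2.0))
attraction_field = [[0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0]]
refined = correlate_valid(weighted, attraction_field)
```

With this identity-like kernel the refined map simply reproduces the inner part of the weighted window; a learned attraction field would instead pull the response toward the corrected key point location.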

Further, in an embodiment, step S40 includes:

generating a global key point expansion window for each predicted human body key point; obtaining a global key point expansion window feature map from the feature map of dimension C₃ by bilinear interpolation using the generated global key point expansion window; weighting the global key point expansion window feature map with a Gaussian kernel to obtain a new global key point expansion window feature map; and performing a convolution operation on the new global key point expansion window feature map with the key point attraction field as the convolution kernel, so as to fuse local and global context information and obtain the corrected key point heatmap.

Step S50, obtaining the two-dimensional coordinates of the predicted human body joint points from the corrected key point heatmap;

Further, in an embodiment, step S50 includes:

for each predicted human body key point, selecting the two highest-scoring positions in the corrected key point heatmap, weighting the scores of the two positions, calculating a first product of the two weighted scores, multiplying the first product by the score of the corresponding candidate key point center to obtain a second product, and taking the second product as the confidence score of the predicted human body key point; and taking the position of the key point with the highest confidence score as the two-dimensional coordinates of the predicted human body joint point.

In this embodiment, the specific flow is as follows:

where the result denotes the predicted human body key points, and N' is the number of posture instances finally predicted in one image I.
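
The confidence scoring of step S50 can be sketched as follows. The text does not state the weights applied to the two highest scores, so the values (0.75, 0.25) are hypothetical placeholders, as is the function name; returning the highest-scoring position as the key point location is likewise an assumption of this sketch.

```python
def keypoint_confidence(refined, center_score, weights=(0.75, 0.25)):
    """Score one key point from its corrected heatmap (list of rows).

    weights are hypothetical; the source does not specify them.
    Returns (confidence, (x, y)) where (x, y) is the top-scoring position.
    """
    cells = sorted(((s, x, y) for y, row in enumerate(refined)
                    for x, s in enumerate(row)), reverse=True)
    (s1, x1, y1), (s2, _, _) = cells[0], cells[1]
    first_product = (weights[0] * s1) * (weights[1] * s2)
    second_product = first_product * center_score
    return second_product, (x1, y1)


refined = [[0.1, 0.8],
           [0.4, 0.2]]
confidence, position = keypoint_confidence(refined, center_score=0.5)
```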

Step S60, encoding and decoding the spatial and temporal information of the two-dimensional coordinates of the predicted human body joint points, respectively, to obtain the three-dimensional information of the predicted human body joint points.

In this embodiment, spatial-scale encoding is performed with the two-dimensional coordinates of the predicted human body joint points as the key values and a latent vector as the index, to obtain output features. Temporal-scale encoding is then performed on the output features, again with a latent vector serving as the index, to obtain an output array Y. Finally, the positions of the joint points in the real three-dimensional world are regressed from Y through a weighted average and a multi-layer perceptron, giving the predicted three-dimensional information of the human body joint points.

Further, in one embodiment, step S60 includes:

performing spatial-scale encoding with the two-dimensional coordinates of the predicted human body joint points as the key values and a latent vector as the index, to obtain output features; performing temporal-scale encoding on the output features to obtain an output array; and regressing the three-dimensional coordinates of the predicted human body joint points from the output array.

Compared with the prior art, the embodiment has the following advantages:

1. This embodiment addresses the low accuracy of methods that predict human body posture from key point centers and offsets by constructing a local expansion window around each predicted key point to further refine the key point location; the resulting key point detection and correction network is an end-to-end network.

2. This embodiment provides a novel local and global information adaptation module that yields structured human body posture estimates and fuses local and global structural information.

3. Through two-stage training, this embodiment can handle multi-person posture estimation under severe self-occlusion.

Further, the loss function of the full convolution network according to this embodiment includes the following parts:

Key point heatmap loss function. The ground-truth heatmaps of each key point and of the key point center are generated by modeling the mean and variance of a given dataset with a Gaussian model. Elements of the set with dimensions 10×h×w are denoted by p = (i, x). The loss between the predicted key point and center heatmaps and their ground truth is computed as the key point heatmap loss function, as follows:

where w(x) represents the weight of foreground and background pixels. For foreground pixels, w(x) = 1; for background pixels, w(x) = 0.1.
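
The foreground and background weighting can be illustrated with a short sketch. The use of a squared error between prediction and ground truth is an assumption of this example, since the formula itself is not reproduced in the text; only the weights 1 and 0.1 come from the description. The function name is hypothetical.

```python
def heatmap_loss(pred, truth, foreground, w_fg=1.0, w_bg=0.1):
    """Mean weighted squared error over a 2D heatmap.

    Squared error is an assumption of this sketch; foreground is a mask of
    booleans with the same shape as pred and truth.
    """
    total, count = 0.0, 0
    for pr, tr, fr in zip(pred, truth, foreground):
        for p, t, is_fg in zip(pr, tr, fr):
            weight = w_fg if is_fg else w_bg
            total += weight * (p - t) ** 2
            count += 1
    return total / count


loss = heatmap_loss(pred=[[1.0, 0.0]],
                    truth=[[0.0, 1.0]],
                    foreground=[[True, False]])
```

In this example both pixels have the same squared error, but the background pixel contributes only a tenth as much, which keeps the many background pixels from dominating the loss.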

Key point offset field loss function. The ground-truth offsets are given, and c^GT denotes the non-empty set of key point centers in the ground truth. For the predicted key point offset field, the loss function is expressed as follows:

where the area term represents the area of the human body centered on the pixel p, and β represents the cutoff threshold (e.g., 1/9).
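
One common way to combine a cutoff threshold β with an area normalization is a smooth L1 loss divided by the body area. The sketch below assumes that form; the source formula is not reproduced in the text, so this is an illustration of how β and the area term could interact, not the stated loss. Function names are hypothetical.

```python
def smooth_l1(x, beta=1.0 / 9.0):
    """Quadratic below the cutoff beta, linear above it."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def offset_loss(pred, truth, area, beta=1.0 / 9.0):
    """Area-normalized smooth L1 over the offsets at one center pixel.

    pred and truth are lists of (dx, dy), one entry per key point; area is
    the area of the human body centered on that pixel.
    """
    total = sum(smooth_l1(pdx - tdx, beta) + smooth_l1(pdy - tdy, beta)
                for (pdx, pdy), (tdx, tdy) in zip(pred, truth))
    return total / area


example = offset_loss(pred=[(1.0, 0.0)], truth=[(0.0, 0.0)], area=2.0)
```

Dividing by the area makes a one-pixel error count for less on a large person than on a small one.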

OKS loss function. When each human body posture is predicted, a local key point attraction field needs to be learned as a convolution kernel to correct the global key point heatmap. Specifically, given the N^GT annotated ground-truth human body postures in a picture, a similarity score is computed between each candidate key point in the local key point expansion window and the ground truth, yielding a similarity score tensor. The tensor is then truncated with a threshold of 0.5. The truncated similarity score tensor is averaged over its first three dimensions to obtain a matching score against the ground-truth annotation of each human body posture, and the ground-truth posture n^* with the largest matching score is selected. A similarity score s_k (k ∈ [1, 17]) is then computed for each key point of the predicted human body posture against the selected ground-truth posture. The loss function of the key point attraction field is thus defined as:

Finally, the overall loss function of the whole network combines the above loss terms, with a coefficient λ used to balance the values of the different loss terms; for example, λ is taken as 0.01. The loss function measures the difference between the predicted human body posture and the ground truth. This difference is used as the error signal, the partial derivatives with respect to all parameters in the convolution layers are computed by the back-propagation algorithm, and the parameters of the neural network are updated according to the result.
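
The key point similarity score used in the OKS loss above can be sketched with the COCO-style object keypoint similarity, exp(-d² / (2·area·k²)). Whether this embodiment uses exactly that form is an assumption, as are the per-key-point constants k, the reading of truncation as zeroing scores below 0.5, and the function names.

```python
import math


def keypoint_similarity(pred, truth, area, k):
    """COCO-style similarity of one key point: exp(-d^2 / (2 * area * k^2))."""
    d2 = (pred[0] - truth[0]) ** 2 + (pred[1] - truth[1]) ** 2
    return math.exp(-d2 / (2.0 * area * k * k))


def truncated_mean_similarity(preds, truths, area, ks, threshold=0.5):
    """Zero out similarities below the threshold, then average them.

    Treating truncation as zeroing is an assumption of this sketch.
    """
    scores = [keypoint_similarity(p, t, area, k)
              for p, t, k in zip(preds, truths, ks)]
    kept = [s if s >= threshold else 0.0 for s in scores]
    return sum(kept) / len(kept)


# Two hypothetical key points: one exact, one far from its ground truth.
match = truncated_mean_similarity(preds=[(0.0, 0.0), (10.0, 0.0)],
                                  truths=[(0.0, 0.0), (0.0, 0.0)],
                                  area=100.0, ks=[0.1, 0.1])
```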

In a second aspect, the embodiment of the invention further provides a human body posture prediction device.

In an embodiment, referring to fig. 2, fig. 2 is a schematic functional block diagram of a human body posture predicting device according to an embodiment of the invention. As shown in fig. 2, the human body posture predicting apparatus includes:

the extraction module 10 is used for extracting an input picture through a convolutional neural network to obtain a feature map;

the first generating module 20 is configured to obtain an offset of a human body key point, a thermodynamic diagram of the human body key point, and a central thermodynamic diagram of the human body key point based on the extracted feature map, obtain a predicted human body key point according to the offset of the human body key point and the central thermodynamic diagram of the human body key point, and generate a local key point expansion window with the predicted human body key point as a center;

a conversion module 30, configured to convert the local key point expansion window into a key point attraction field;

a second generating module 40, configured to generate a global key point expansion window by using the predicted human body key points and the human body key point thermodynamic diagram, and to perform a convolution operation on the global key point expansion window, using the key point attraction field as the convolution kernel, to obtain a corrected key point thermodynamic diagram;

a third generating module 50, configured to obtain the two-dimensional coordinates of the predicted human body joint points from the corrected key point thermodynamic diagram;

an encoding and decoding module 60, configured to encode and decode using the spatial information and the time sequence information, respectively, of the two-dimensional coordinates of the predicted human body joint points, so as to obtain the three-dimensional information of the predicted human body joint points.
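As an illustration of the work done by the second generating module 40 and the third generating module 50, the following minimal NumPy sketch crops a window from a single key point thermodynamic diagram around the predicted key point, applies a small kernel that stands in for the key point attraction field, and reads the refined coordinate from the peak of the corrected window. The window and kernel sizes, the zero padding, the sliding-window product (cross-correlation, as in common deep learning convolution layers), the arg-max read-out, and the function names `crop_window`, `correlate_valid` and `refine_keypoint` are all assumptions made for this sketch, not details confirmed by the embodiment.

```python
import numpy as np


def crop_window(heatmap, center, size):
    """Crop a size x size window centred at (row, col); size must be odd.

    Positions outside the heatmap are filled with zeros.
    """
    half = size // 2
    padded = np.pad(heatmap, half, mode="constant")
    r, c = center
    return padded[r:r + size, c:c + size]


def correlate_valid(window, kernel):
    """Slide the kernel over the window without padding ('valid' mode)."""
    kh, kw = kernel.shape
    oh = window.shape[0] - kh + 1
    ow = window.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (window[i:i + kh, j:j + kw] * kernel).sum()
    return out


def refine_keypoint(heatmap, pred_rc, kernel, out_size=5):
    """Correct the heatmap around pred_rc and return the refined (row, col).

    kernel plays the role of the key point attraction field (odd, square).
    out_size is the assumed side length of the corrected window (odd).
    """
    k = kernel.shape[0]
    window = crop_window(heatmap, pred_rc, out_size + k - 1)
    corrected = correlate_valid(window, kernel)          # out_size x out_size
    dr, dc = np.unravel_index(corrected.argmax(), corrected.shape)
    half = out_size // 2
    refined = (int(pred_rc[0] + dr - half), int(pred_rc[1] + dc - half))
    return refined, corrected
```

With an identity kernel (a single 1 at the centre) the corrected window simply reproduces the heatmap values, so the refined coordinate is the heatmap peak near the prediction; a learned attraction field would instead reweight the neighbourhood before the peak is taken.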

Further, in an embodiment, the first generating module 20 is configured to:

wherein one branch outputs the human body key point thermodynamic diagram and the human body key point center thermodynamic diagram.

Further, in an embodiment, the first generating module 20 is configured to:

Further, in an embodiment, the conversion module 30 is configured to:

obtaining, using the local key point expansion window, a local key point expansion window feature map from the feature map of dimension $C_3$ through bilinear interpolation;

Further, in an embodiment, the second generating module 40 is configured to:

Further, in an embodiment, the third generating module 50 is configured to:

Further, in an embodiment, the codec module 60 is configured to:

The function implementation of each module in the human body posture prediction device corresponds to each step in the human body posture prediction method embodiment, and the function and implementation process of each module are not described in detail herein.

In a third aspect, embodiments of the present invention provide a human body posture prediction apparatus, which may be an apparatus having a data processing function, such as a personal computer (PC), a notebook computer, or a server.

Refer to fig. 3, which is a schematic hardware configuration diagram of a human body posture prediction apparatus according to an embodiment of the present invention. In an embodiment of the present invention, the human body posture prediction apparatus may include a processor 1001 (e.g., a central processing unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components; the user interface 1003 may include a display screen and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface); the memory 1005 may be a high-speed random access memory (RAM) or a non-volatile memory, such as a disk memory, and the memory 1005 may alternatively be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration shown in fig. 3 does not limit the invention, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

With continued reference to fig. 3, an operating system, a network communication module, a user interface module, and a human body posture prediction program may be included in a memory 1005, which is one type of computer storage medium in fig. 3. The processor 1001 may call a human body posture prediction program stored in the memory 1005, and execute the human body posture prediction method provided by the embodiment of the present invention.

In a fourth aspect, embodiments of the present invention also provide a readable storage medium.

The human body posture predicting program is stored on the readable storage medium, and when the human body posture predicting program is executed by the processor, the steps of the human body posture predicting method are realized.

The method implemented when the human body posture prediction program is executed may refer to various embodiments of the human body posture prediction method of the present invention, and will not be described herein.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising several instructions for causing a terminal device to perform the method according to the embodiments of the present invention.

The foregoing describes only the preferred embodiments of the present invention and does not thereby limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A human body posture prediction method, characterized in that the human body posture prediction method comprises:

2. The human body posture prediction method of claim 1, wherein the step of obtaining the offset of the human body key point, the human body key point thermodynamic diagram and the human body key point center thermodynamic diagram based on the extracted feature map comprises:


3. The human body posture prediction method of claim 2, wherein the step of obtaining predicted human body key points according to the offset of the human body key points and the human body key point center thermodynamic diagram, and generating the local key point expansion window with the predicted human body key points as the center comprises:

4. The human body posture prediction method of claim 3, wherein the step of converting the local key point expansion window into a key point attraction field comprises:

processing the feature map with a convolutional layer, batch normalization and a rectified linear unit to obtain a feature map of dimension $C_3$;

obtaining, using the local key point expansion window, a local key point expansion window feature map from the feature map of dimension $C_3$ through bilinear interpolation;

5. The human body posture prediction method of claim 4, wherein the step of generating a global key point expansion window using the predicted human body key points and the human body key point thermodynamic diagram, and performing a convolution operation on the global key point expansion window using the key point attraction field as the convolution kernel to obtain a corrected key point thermodynamic diagram comprises:

6. The human body posture prediction method of claim 5, wherein the step of obtaining the two-dimensional coordinates of the predicted human body joint points from the corrected key point thermodynamic diagram comprises:

7. The human body posture prediction method of claim 6, wherein the step of encoding and decoding using the spatial information and the time sequence information, respectively, of the two-dimensional coordinates of the predicted human body joint points to obtain the three-dimensional information of the predicted human body joint points comprises:

8. A human body posture predicting device, characterized in that the human body posture predicting device comprises:

9. A human body posture prediction device, characterized in that it comprises a processor, a memory, and a human body posture prediction program stored on the memory and executable by the processor, wherein the human body posture prediction program, when executed by the processor, implements the steps of the human body posture prediction method according to any one of claims 1 to 7.

10. A readable storage medium, characterized in that a human body posture prediction program is stored on the readable storage medium, wherein the human body posture prediction program, when executed by a processor, implements the steps of the human body posture prediction method according to any one of claims 1 to 7.