CN113298922B

CN113298922B - Human body posture estimation method and device and terminal equipment

Info

Publication number: CN113298922B
Application number: CN202110655977.7A
Authority: CN
Inventors: 郭渺辰; 程骏; 汤志超; 邵池; 张惊涛; 胡淑萍; 庞建新
Original assignee: Ubtech Robotics Corp
Current assignee: Ubtech Robotics Corp
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2023-08-29
Anticipated expiration: 2041-06-11
Also published as: CN113298922A; WO2022257378A1

Abstract

The embodiment of the application provides a human body posture estimation method, a device and terminal equipment, wherein the method comprises the following steps: extracting first characteristic data of an image, inputting the first characteristic data into a two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model; inputting first characteristic data and second characteristic data into the three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model; and determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map. Therefore, the two-dimensional position and the three-dimensional position of the key point of the human body can be detected at the same time, and the time cost is reduced.

Description

Human body posture estimation method and device and terminal equipment

Technical Field

The present application relates to the field of man-machine interaction technologies, and in particular, to a human body posture estimation method, a device and a terminal device.

Background

With the gradual popularization of intelligent control and man-machine interaction technology, interaction between humans and machines has become frequent, and the need to use machines to analyze people's emotions and behaviors is increasingly urgent. Behavior recognition can include recognizing behaviors such as hand raising, drowsiness, standing up and inattention in educational settings, and behaviors such as fighting, drowning and calling for help in the security field.

Analyzing people's postures and movements is extremely important for recognizing their behavior. Human body detection alone cannot analyze the human body posture in detail, so it is necessary to acquire the motion state of the human skeleton. Current human body posture estimation can be divided into multi-person posture estimation and single-person posture estimation, each of which includes 2D human body posture estimation and 3D human body posture estimation.

In the prior art, single-person posture estimation first detects the position of each human body in an image with a human body detector and then locates the key points within the detected rectangular area of each human body.

In the prior art, 2D human body posture estimation can be roughly divided into two approaches. The first is the top-down scheme, typified by AlphaPose: all human bodies in an image are first detected by a human body detector, and single-person posture estimation is then performed on each human body. The top-down scheme places high demands on the accuracy of human body detection, which largely determines the accuracy of the key points, and the total time overhead grows with the number of people. The second is the bottom-up scheme, typified by OpenPose: the positions of all key points of all persons are first detected from the whole image, and the points are then assigned to each person. Its advantage is that the number of persons in the image does not affect the inference speed, but its accuracy is slightly lower than that of the top-down scheme. Multi-person 3D human body posture estimation either predicts coordinate positions in the camera coordinate system from the image, or takes a certain key point of the human body as the origin and calculates the spatial positions of the other key points relative to that origin. The schemes provided by the prior art can only realize 2D human body posture estimation or 3D human body posture estimation independently and cannot realize both at the same time, so human body posture estimation is relatively time-consuming.

Disclosure of Invention

In order to solve the technical problems described above, embodiments of the present application provide a human body posture estimation method, a training method of a human body posture estimation model, a device, a terminal device, and a computer readable storage medium.

In a first aspect, an embodiment of the present application provides a human body posture estimation method, including:

extracting first characteristic data of an image, inputting the first characteristic data into a two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model;

inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model;

and determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map.

In a second aspect, an embodiment of the present application provides a training method for a human body posture estimation model, where the human body posture estimation model includes a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained, and the method includes:

freezing the three-dimensional human body posture estimation model to be trained, and inputting two-dimensional human body key point data into the two-dimensional human body posture estimation model to be trained for training, to obtain a trained two-dimensional human body posture estimation model, wherein the two-dimensional human body key point data comprises image data marked with two-dimensional position information of human body key points;

and freezing the two-dimensional human body posture estimation model to be trained, and inputting three-dimensional human body key point data into the three-dimensional human body posture estimation model to be trained for training, to obtain a trained three-dimensional human body posture estimation model, wherein the three-dimensional human body key point data comprises image data marked with three-dimensional position information of human body key points.

In a third aspect, an embodiment of the present application provides a human body posture estimation apparatus, including:

the first processing module is used for extracting first characteristic data of an image, inputting the first characteristic data into the two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic image, a two-dimensional human body joint connection characteristic image and second characteristic data through the two-dimensional human body posture estimation model;

the second processing module is used for inputting the first characteristic data and the second characteristic data into the three-dimensional human body posture estimation model and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model;

the determining module is used for determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map.

In a fourth aspect, an embodiment of the present application provides a terminal device, including a memory and a processor, where the memory is configured to store a computer program, and the computer program, when run by the processor, performs the human body posture estimation method provided in the first aspect or the training method for a human body posture estimation model provided in the second aspect.

The human body posture estimation method provided by the application extracts the first characteristic data of the image, inputs the first characteristic data into the two-dimensional human body posture estimation model, and outputs a two-dimensional human body posture key point characteristic image, a two-dimensional human body joint connection characteristic image and second characteristic data through the two-dimensional human body posture estimation model; inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model; and determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map. Therefore, through the end-to-end human body posture estimation model, the two-dimensional position and the three-dimensional position of the key point of the human body can be detected simultaneously, and the time cost is reduced.

Drawings

In order to illustrate the technical solutions of the present application more clearly, the drawings required for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope. Like elements are numbered alike in the various figures.

Fig. 1 is a schematic flow chart of a human body posture estimation method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a human body posture estimation model according to an embodiment of the present application;

Fig. 3 is another schematic diagram of a human body posture estimation model according to an embodiment of the present application;

Fig. 4 is a schematic flow chart of step S103 of the human body posture estimation method according to an embodiment of the present application;

Fig. 5 is a schematic flow chart of step S1031 of the human body posture estimation method according to an embodiment of the present application;

Fig. 6 is a schematic illustration of a human body articulation provided in accordance with an embodiment of the present application;

Fig. 7 is a schematic structural diagram of a human body posture estimation apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application.

The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.

The terms "comprises," "comprising," "including," or any other variation thereof, as may be used in various embodiments of the present application, are intended to indicate the presence of a specific feature, number, step, operation, element, component, or combination of the foregoing, and are not intended to exclude the presence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.

Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the application belong. The terms (such as those defined in commonly used dictionaries) will be interpreted as having a meaning that is the same as the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments of the application.

Example 1

The embodiment of the disclosure provides a human body posture estimation method.

Specifically, referring to fig. 1, the human body posture estimating method includes:

Step S101, extracting first characteristic data of an image, inputting the first characteristic data into a two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model;

In this embodiment, an end-to-end human body posture estimation model is constructed. Referring to fig. 2, the human body posture estimation model includes a backbone network model 202, a two-dimensional human body posture estimation model 203, and a three-dimensional human body posture estimation model 205. The backbone network model 202 may be a lightweight or heavyweight deep neural network model, which is not limited herein. The backbone network model 202 is connected to both the two-dimensional human body posture estimation model 203 and the three-dimensional human body posture estimation model 205, and the two-dimensional human body posture estimation model 203 is also connected to the three-dimensional human body posture estimation model 205. The first output 204 includes the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map. Still referring to fig. 2, the first feature data is extracted from the image 201, which is captured by a camera and contains a plurality of human bodies. The human bodies shown in the image 201 are merely illustrative; an actually captured image may take other forms, which is not limited herein.

In this embodiment, the specific structure of the two-dimensional human body posture estimation model can be seen in the two-dimensional human body posture estimation model 301 in fig. 3. The two-dimensional human body posture estimation model 301 contains a plurality of interconnected nodes, including a plurality of activation (Relu) functions, a plurality of convolution (Conv) functions, and a plurality of addition (Add) functions, and each node is set with corresponding parameters. It should be noted that the two-dimensional human body posture estimation model 301 is merely illustrative; in specific cases, the node connection relationships and the parameter settings may differ, which is not limited herein. The two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map may be a two-dimensional human body posture key point heat map and a two-dimensional human body joint connection heat map, respectively. For example, in fig. 3, the output labeled hetmap of the two-dimensional human body posture estimation model 301 is the two-dimensional human body posture key point heat map, and the output labeled pafs is the two-dimensional human body joint connection heat map.

Step S102, inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model.

In this embodiment, the specific structure of the three-dimensional human body posture estimation model can be seen in the three-dimensional human body posture estimation model 302 in fig. 3. The three-dimensional human body posture estimation model 302 contains a plurality of interconnected nodes, including a plurality of activation (Relu) functions, a plurality of convolution (Conv) functions, and a plurality of addition (Add) functions, and each node is set with corresponding parameters. It should be noted that the three-dimensional human body posture estimation model 302 is merely illustrative and may further include other nodes; in specific cases, the number of nodes, the node types, the node connection relationships, and the parameters may be set according to the actual situation, which is not limited herein. Referring again to fig. 2, the second output 206 includes the three-dimensional human body posture key point feature map. In this embodiment, the three-dimensional human body posture key point feature map is a heat map of 3×19 channels, where 3×18 channels correspond to the 18 key points of the human body and 3×1 channels correspond to the background map.
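The 3×19 channel layout can be illustrated with a small index helper. This is only a sketch: it assumes that the three coordinate channels of each key point are stored consecutively and that the background occupies the last three channels. The description does not fix the channel ordering, so both points are assumptions, as is the helper name `channels_for`.

```python
NUM_KEYPOINTS = 18              # key points numbered 0..17
NUM_MAPS = NUM_KEYPOINTS + 1    # plus one background map
COORDS = 3                      # x, y, z

def channels_for(map_index):
    """Return the three channel indices holding (x, y, z) for one map.

    Assumes channels are grouped per key point: [3k, 3k + 1, 3k + 2].
    Index 18 is taken to be the background map (an assumption).
    """
    if not 0 <= map_index < NUM_MAPS:
        raise ValueError("map index out of range")
    start = map_index * COORDS
    return list(range(start, start + COORDS))

total_channels = COORDS * NUM_MAPS
```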

Step S103, determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map.

Therefore, through the end-to-end human body posture estimation model, the two-dimensional position and the three-dimensional position of the key point of the human body can be detected simultaneously, and the time cost is reduced.
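The data flow of steps S101 to S103 can be sketched as follows. This is a minimal illustration of the wiring only, not the implementation of the embodiment: the function names `backbone`, `pose2d_head` and `pose3d_head` are hypothetical, and labelled dictionaries stand in for real tensors.

```python
# Hypothetical stand-ins for the three sub-models; each returns a labelled
# placeholder instead of a real tensor so that only the wiring is shown.

def backbone(image):
    """Extract the first feature data from the image (step S101)."""
    return {"name": "first_features", "source": image}

def pose2d_head(first_features):
    """Return 2D key point maps, joint connection maps and second feature data."""
    source = [first_features["name"]]
    heatmaps = {"name": "2d_keypoint_maps", "from": source}
    pafs = {"name": "2d_joint_connection_maps", "from": source}
    second_features = {"name": "second_features", "from": source}
    return heatmaps, pafs, second_features

def pose3d_head(first_features, second_features):
    """Consume both feature sets and return the 3D key point map (step S102)."""
    return {
        "name": "3d_keypoint_maps",
        "from": [first_features["name"], second_features["name"]],
    }

def estimate_pose(image):
    first = backbone(image)
    heatmaps, pafs, second = pose2d_head(first)
    maps3d = pose3d_head(first, second)
    # Step S103 would decode 2D and 3D positions from these three outputs.
    return heatmaps, pafs, maps3d

heatmaps, pafs, maps3d = estimate_pose("image_201")
```

The point of the sketch is that the three-dimensional branch receives both the backbone features and the intermediate features of the two-dimensional branch, so a single forward pass yields all three outputs.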

Optionally, referring to fig. 4, step S103 includes:

Step S1031, determining two-dimensional positions of human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map.

In this embodiment, the image input to the end-to-end human body posture estimation model may contain a plurality of human bodies. To accurately separate the key points of each human body, the key points of each human body need to be matched by combining the two-dimensional human body posture key point feature map with the two-dimensional human body joint connection feature map.

Optionally, referring to fig. 5, step S1031 includes:

Step S10311, determining a plurality of human body key points according to the two-dimensional human body posture key point feature map;

Step S10312, determining a plurality of joint connection relations according to the two-dimensional human body joint connection feature map;

Step S10313, matching the plurality of human body key points and the plurality of joint connection relations, and determining two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map.

In this embodiment, the two-dimensional human body posture key point feature map is a 19-channel feature map, in which 18 channels correspond to the 18 key points and 1 channel corresponds to the background map. The peak positions in each key point channel correspond to human body key points.
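Locating the peaks of one key point channel can be sketched as below. The 4-neighbour local-maximum test and the `threshold` value are illustrative assumptions; the description does not specify how peaks are extracted. A nested list indexed as `heatmap[y][x]` stands in for one channel.

```python
def find_peaks(heatmap, threshold=0.1):
    """Return (x, y, score) for every strict local maximum above `threshold`.

    A cell counts as a peak when its value exceeds the threshold and is
    strictly greater than each of its in-bounds 4-neighbours.
    """
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value <= threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                if 0 <= nx < width and 0 <= ny < height
            ]
            if all(value > n for n in neighbours):
                peaks.append((x, y, value))
    return peaks

# One channel containing the same key point type for two different people.
channel = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.8, 0.2],
]
peaks = find_peaks(channel)
```

With several people in the image, a single channel yields several peaks, which is why the joint connection feature map is needed to decide which peak belongs to which person.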

Referring to fig. 6, the joint connection diagram of fig. 6 includes the 18 key points of the human body and the connection relations between adjacent key points. The 18 key points are numbered from 0 to 17. In fig. 6, two joint connections may be represented between two adjacent key points. For example, for key point 2 and key point 3 in fig. 6, the joint connection extending from key point 3 toward key point 2 is a different joint connection from the one extending from key point 2 toward key point 3. According to the joint connection relations in the two-dimensional human body joint connection feature map, the plurality of key points in the two-dimensional human body posture key point feature map can be matched so that all key points belonging to the same human body are grouped together, and the two-dimensional positions of the human body key points of each human body are determined based on all key points of the same human body.

Therefore, the human body key points in the two-dimensional human body posture key point feature map can be quickly partitioned, the key points belonging to the same human body can be identified, and the two-dimensional positions of the human body key points of each individual human body can be obtained.
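One way to match key points with a joint connection map is to score each candidate pair by how well the map's vectors align with the segment between the two points, in the style of part affinity fields. The sketch below is an assumption-laden illustration: it supposes that each joint connection is stored as two channels (`paf_x`, `paf_y`) of direction components, samples a fixed number of points along the segment, and assigns pairs greedily. None of these details is specified in the description.

```python
import math

def connection_score(paf_x, paf_y, start, end, samples=5):
    """Average alignment between the field and the segment start -> end."""
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return 0.0
    ux, uy = (x1 - x0) / length, (y1 - y0) / length
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        x = int(round(x0 + t * (x1 - x0)))
        y = int(round(y0 + t * (y1 - y0)))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / samples

def match_pairs(starts, ends, paf_x, paf_y, min_score=0.5):
    """Greedily pair start and end candidates, best-scoring pairs first."""
    scored = sorted(
        ((connection_score(paf_x, paf_y, s, e), i, j)
         for i, s in enumerate(starts) for j, e in enumerate(ends)),
        reverse=True,
    )
    used_starts, used_ends, pairs = set(), set(), []
    for score, i, j in scored:
        if score >= min_score and i not in used_starts and j not in used_ends:
            pairs.append((i, j))
            used_starts.add(i)
            used_ends.add(j)
    return pairs

# Two horizontal limbs pointing in +x, one on row 0 and one on row 2.
paf_x = [[1.0] * 4, [0.0] * 4, [1.0] * 4]
paf_y = [[0.0] * 4 for _ in range(3)]
starts = [(0, 0), (0, 2)]   # candidate positions of the first key point type
ends = [(3, 2), (3, 0)]     # candidate positions of the second key point type
pairs = match_pairs(starts, ends, paf_x, paf_y)
```

Because the score depends on the direction of travel, evaluating the same segment from end to start gives the opposite sign, which is consistent with treating the two directions between adjacent key points as different joint connections.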

Step S1032, matching the three-dimensional positions of the human body key points of each human body from the three-dimensional human body posture key point feature map according to the two-dimensional positions of the human body key points of each human body.

In this embodiment, the two-dimensional positions of the human body key points of each human body, and the human body to which each key point belongs, can be determined from the two-dimensional human body posture key point feature map. The three-dimensional position of each key point is then determined from the three-dimensional human body posture key point feature map based on the coordinate position of that key point in the two-dimensional human body posture key point feature map. For example, if a key point in the two-dimensional human body posture key point feature map is the left eye and its pixel coordinates are (3, 3), the three-channel data (x, y, z) at pixel coordinates (3, 3) of the three-dimensional human body posture key point feature map is acquired and taken as the three-dimensional position of that human body key point.

Optionally, step S1032 includes:

acquiring target positions which are the same as the two-dimensional positions of the human body key points of all the human bodies from the three-dimensional human body posture key point feature map;

and acquiring three-channel data corresponding to the target position in the three-dimensional human body posture key point feature map, and taking the three-channel data as the three-dimensional positions of the human body key points of all human bodies.

Referring to fig. 6 again, the key point 1 in the first marker region 601 may be used as the human body center point, or the midpoint of the left hip joint 8 and the right hip joint 11 in the second marker region 602 may be used as the human body center point. Taking key point 1 in fig. 6 as an example, if the pixel coordinates of key point 1 in the two-dimensional human body posture key point feature map are (x, y), the target position with the same pixel coordinates (x, y) is obtained in the three-dimensional human body posture key point feature map, and that position also corresponds to key point 1. Since the three-dimensional human body posture key point feature map is 3×19-channel data in which every 3 channels represent the three-dimensional coordinates of one key point, reading the three-channel data of key point 1 at pixel coordinates (x, y) in the three-dimensional human body posture key point feature map yields the three-dimensional coordinates of key point 1.
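The lookup described above can be sketched as follows. The feature map is represented as a nested list indexed as `maps3d[channel][y][x]`; the grouping of three consecutive channels per key point, the 4×4 map size and the sample coordinate values are assumptions made for illustration.

```python
def lookup_3d(maps3d, keypoint_index, position_2d):
    """Read the (x, y, z) triple of one key point at its 2D pixel position."""
    px, py = position_2d
    first_channel = keypoint_index * 3  # assumed per-key-point channel grouping
    return tuple(maps3d[first_channel + c][py][px] for c in range(3))

# Build a tiny 4x4 map with 3 * 19 channels, filled with zeros.
HEIGHT = WIDTH = 4
maps3d = [[[0.0] * WIDTH for _ in range(HEIGHT)] for _ in range(3 * 19)]

# Pretend the network wrote the 3D coordinates of key point 1 at pixel (2, 1).
for offset, value in enumerate((0.12, -0.30, 1.85)):
    maps3d[1 * 3 + offset][1][2] = value

# The 2D branch located key point 1 at pixel (2, 1); reuse that position.
keypoint_1_3d = lookup_3d(maps3d, keypoint_index=1, position_2d=(2, 1))
```

No separate 2D-to-3D matching step is needed in this scheme, because both feature maps share the same pixel grid.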

Therefore, the position of each human body key point in the three-dimensional human body posture key point feature map can be determined from its two-dimensional position in the two-dimensional human body posture key point feature map, and the three-channel data at that position can be read to determine the three-dimensional position of the key point. This solves the problem of matching the two-dimensional positions of the human body key points with their three-dimensional positions, and the two-dimensional and three-dimensional positions are obtained at the same time, which reduces the time overhead.

In addition, the human body posture estimation method further includes:

downsampling by a preset multiple through the two-dimensional human body posture estimation model to obtain the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map.

In this embodiment, the preset multiple is determined according to the required data accuracy and the amount of computation, so that the accuracy requirement is met while the computation time remains short. For example, the preset multiple may be 4: if the size of the input image is 512×512, the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map each have a size of 128×128.
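The size relationship can be checked with a few lines of arithmetic. The helper names are hypothetical, and mapping a feature-map cell back to the centre of its image patch is an illustrative choice rather than something stated in the description.

```python
def feature_map_size(image_size, stride=4):
    """Spatial size of the feature maps after downsampling by `stride`."""
    width, height = image_size
    return width // stride, height // stride

def to_image_coords(feature_xy, stride=4):
    """Map a feature-map pixel back to the centre of its patch in the image."""
    fx, fy = feature_xy
    return fx * stride + stride / 2, fy * stride + stride / 2

size = feature_map_size((512, 512))
image_xy = to_image_coords((3, 3))
```

A larger multiple shrinks the feature maps and the amount of computation, at the cost of coarser key point localisation.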

Further, the human body posture estimation method also includes:

when a two-dimensional position extraction instruction of the human body key points is received, determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map.

In this embodiment, to keep the end-to-end human body posture estimation model flexible, the three-dimensional human body posture estimation model may skip inference when only the two-dimensional positions of the human body key points are needed, which reduces the time overhead. A two-dimensional position extraction instruction for the human body key points may be sent to the output layer of the two-dimensional human body posture estimation model. The two-dimensional human body posture estimation model then performs inference to obtain the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map, from which the two-dimensional positions of the human body key points are obtained. The inference process of the three-dimensional human body posture estimation model is disabled, so the time that would be spent inferring the three-dimensional positions of the human body key points is saved.
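The two-dimensional-only path can be sketched as a flag that skips the three-dimensional branch. The `run_3d` flag and the stub functions are hypothetical; the description refers to an extraction instruction sent to the output layer of the two-dimensional model without specifying its form.

```python
calls = []  # records which sub-models ran, for illustration only

def backbone(image):
    calls.append("backbone")
    return "first_features"

def pose2d_head(first_features):
    calls.append("2d")
    return "keypoint_maps", "joint_connection_maps", "second_features"

def pose3d_head(first_features, second_features):
    calls.append("3d")
    return "maps3d"

def infer(image, run_3d=True):
    """Run the shared backbone and 2D branch; run the 3D branch only on request."""
    first = backbone(image)
    heatmaps, pafs, second = pose2d_head(first)
    maps3d = pose3d_head(first, second) if run_3d else None
    return heatmaps, pafs, maps3d

outputs_2d_only = infer("image", run_3d=False)
```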

In addition, in step S102, the inputting the first feature data and the second feature data into the three-dimensional human body posture estimation model includes:

combining the first characteristic data and the second characteristic data to obtain a combined result;

and inputting the combination result to the three-dimensional human body posture estimation model.

In this embodiment, the first feature data and the second feature data are combined by a concat function, and the combination result has an increased number of channels. For example, first feature data of 3×19 channels and second feature data of 6×19 channels are combined by the concat function into feature data of 9×19 channels.
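Channel-wise combination can be sketched with nested lists in which the outer list is the channel axis. The 2×2 spatial size and the fill values are arbitrary; only the channel counts follow the example in the text, and the helper names are hypothetical.

```python
def make_features(channels, height=2, width=2, fill=0.0):
    """Create a channels x height x width nested list."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]

def concat_channels(a, b):
    """Concatenate two feature tensors along the channel (outer) axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match")
    return a + b

first = make_features(3 * 19, fill=1.0)    # 57 channels
second = make_features(6 * 19, fill=2.0)   # 114 channels
combined = concat_channels(first, second)  # 171 = 9 * 19 channels
```

Concatenation leaves the spatial size unchanged and stacks the channels, so 57 + 114 = 171 channels, which equals 9×19.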

According to the human body posture estimation method provided by the embodiment, first characteristic data of an image are extracted, the first characteristic data are input into a two-dimensional human body posture estimation model, and a two-dimensional human body posture key point characteristic image, a two-dimensional human body joint connection characteristic image and second characteristic data are output through the two-dimensional human body posture estimation model; inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model; and determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map. Therefore, through the end-to-end human body posture estimation model, the two-dimensional position and the three-dimensional position of the key point of the human body can be detected simultaneously, and the time cost is reduced.

Example 2

The embodiment of the disclosure provides a training method of a human body posture estimation model.

Specifically, the human body posture estimation model comprises a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained, and the training method comprises the following steps:

In this embodiment, an end-to-end human body posture estimation model is constructed, wherein the human body posture estimation model comprises a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained. The two-dimensional human body posture estimation model to be trained may be provided with a plurality of interconnected nodes, the nodes comprise a plurality of activation (ReLU) functions, a plurality of convolution (Conv) functions and a plurality of addition (Add) functions, and each node is provided with corresponding parameters. In the process of training the two-dimensional human body posture estimation model to be trained, the parameters of each node are adjusted, and the two-dimensional human body posture estimation model is thereby optimized. It should be noted that the node connection relationships and parameter settings of the two-dimensional human body posture estimation model to be trained may be set according to actual conditions, and are not limited herein.

The three-dimensional human body posture estimation model to be trained is likewise provided with a plurality of interconnected nodes, the nodes comprise a plurality of activation (ReLU) functions, a plurality of convolution (Conv) functions and a plurality of addition (Add) functions, and each node is provided with corresponding parameters. In the process of training the three-dimensional human body posture estimation model to be trained, the parameters of each node are adjusted, and the three-dimensional human body posture estimation model is thereby optimized. It should be noted that the node connection relationships and parameter settings of the three-dimensional human body posture estimation model to be trained may be set according to actual conditions, and are not limited herein.

In this embodiment, when the two-dimensional human body posture estimation model to be trained is trained with the two-dimensional human body key point data, the network parameters of the three-dimensional human body posture estimation model to be trained are frozen, and the three-dimensional human body posture estimation model to be trained does not take part in learning; when the three-dimensional human body posture estimation model to be trained is trained with the three-dimensional human body key point data, the network parameters of the two-dimensional human body posture estimation model to be trained are frozen, and the two-dimensional human body posture estimation model to be trained does not take part in learning.
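The alternating freezing described above can be sketched as follows. This is a toy illustration only: the `SubModel` class, the `train_stage` function, the weight values, the gradients and the learning rate are all hypothetical, and a real implementation would rely on the freezing mechanism of its deep learning framework.

```python
class SubModel:
    """Toy stand-in for one sub-network; holds weights and a frozen flag."""

    def __init__(self, name, weights):
        self.name = name
        self.weights = list(weights)
        self.frozen = False

    def step(self, grads, lr=0.5):
        """Apply one gradient step unless the sub-model is frozen."""
        if self.frozen:
            return  # frozen parameters are left untouched
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]


def train_stage(trainable, frozen, grads):
    """Train one sub-model for a single step while the other is frozen."""
    frozen.frozen = True
    trainable.frozen = False
    trainable.step(grads)
    frozen.step(grads)  # no-op: shows the frozen weights do not change


model_2d = SubModel("2d", [1.0, 2.0])
model_3d = SubModel("3d", [3.0, 4.0])

# Stage 1: 2D key point data, the 3D sub-model is frozen.
train_stage(model_2d, model_3d, grads=[1.0, 1.0])
weights_after_stage_1 = (list(model_2d.weights), list(model_3d.weights))

# Stage 2: 3D key point data, the 2D sub-model is frozen.
train_stage(model_3d, model_2d, grads=[1.0, 1.0])
weights_after_stage_2 = (list(model_2d.weights), list(model_3d.weights))
```

After stage 1 only the two-dimensional weights have moved, and after stage 2 only the three-dimensional weights have moved, which is the property the freezing is meant to guarantee.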

According to the training method of the human body posture estimation model, the two-dimensional human body posture estimation model to be trained and the three-dimensional human body posture estimation model to be trained in the human body posture estimation model can be independently trained, so that an end-to-end human body posture estimation model is obtained, two-dimensional position and three-dimensional position detection of key points of a human body can be simultaneously achieved through the end-to-end human body posture estimation model, and time expenditure is reduced.

Example 3

Further, the embodiment of the present disclosure provides a human body posture estimating apparatus.

Specifically, as shown in fig. 7, the human body posture estimating apparatus 700 includes:

the first processing module 701 is configured to extract first feature data of an image, input the first feature data to a two-dimensional human body posture estimation model, and output a two-dimensional human body posture key point feature map, a two-dimensional human body joint connection feature map, and second feature data through the two-dimensional human body posture estimation model;

the second processing module 702 is configured to input the first feature data and the second feature data to a three-dimensional human body posture estimation model, and output a three-dimensional human body posture key point feature map through the three-dimensional human body posture estimation model;

the determining module 703 is configured to determine a two-dimensional position of a human body key point of each human body in the two-dimensional human body posture key point feature map and a three-dimensional position of a human body key point of each human body according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map.

Optionally, the determining module 703 is further configured to determine two-dimensional positions of human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map;

and matching the three-dimensional positions of the human body key points of each human body from the three-dimensional human body posture key point feature map according to the two-dimensional positions of the human body key points of each human body.
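One way to picture this matching step is a direct lookup: for each two-dimensional key point position, read the values stored at the same position in the three-dimensional feature map. The sketch below assumes a hypothetical layout `feature_map_3d[joint][axis][y][x]` with three axes per joint; the application does not specify this layout, and the function name `lookup_3d` is illustrative.

```python
def lookup_3d(keypoints_2d, feature_map_3d):
    """Read a 3D position for every 2D key point.

    keypoints_2d: {person_id: {joint_index: (x, y)}} in feature-map pixels.
    feature_map_3d: assumed layout [joint][axis][y][x], axis 0..2 = X, Y, Z.
    """
    positions_3d = {}
    for person_id, joints in keypoints_2d.items():
        positions_3d[person_id] = {}
        for joint_index, (x, y) in joints.items():
            positions_3d[person_id][joint_index] = tuple(
                feature_map_3d[joint_index][axis][y][x] for axis in range(3)
            )
    return positions_3d


# One joint on a 2x2 feature map, with made-up values.
feature_map_3d = [
    [
        [[0.0, 0.1], [0.2, 0.3]],  # X channel
        [[1.0, 1.1], [1.2, 1.3]],  # Y channel
        [[2.0, 2.1], [2.2, 2.3]],  # Z channel
    ]
]
keypoints_2d = {0: {0: (1, 0)}}  # person 0, joint 0 at x=1, y=0
positions = lookup_3d(keypoints_2d, feature_map_3d)
print(positions)  # {0: {0: (0.1, 1.1, 2.1)}}
```

Because the lookup reuses the positions already found in two dimensions, no separate person-grouping pass is needed for the three-dimensional output in this sketch.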

Optionally, the determining module 703 is further configured to determine a plurality of human body key points according to the two-dimensional human body posture key point feature map;

determining a plurality of joint connection relations according to the two-dimensional human joint connection feature diagram;

and matching the plurality of human body key points with the plurality of joint connection relations, and determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map.
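The matching of key points with joint connection relations can be sketched as a greedy pairing over connection scores. This is an assumed strategy chosen for illustration, not the procedure stated in the application; the function name `greedy_match`, the candidate coordinates and the scores are hypothetical.

```python
def greedy_match(connection_scores, min_score=0.0):
    """Pair candidates of two joint types by descending connection score.

    connection_scores: {(i, j): score}, where i indexes candidates of the
    first joint type and j indexes candidates of the second joint type.
    Each candidate is used at most once.
    """
    pairs = []
    used_first, used_second = set(), set()
    ranked = sorted(connection_scores.items(), key=lambda kv: kv[1], reverse=True)
    for (i, j), score in ranked:
        if score <= min_score or i in used_first or j in used_second:
            continue
        pairs.append((i, j))
        used_first.add(i)
        used_second.add(j)
    return pairs


# Two people in the image: candidate necks and shoulders in feature-map pixels.
necks = [(10, 12), (40, 11)]
shoulders = [(8, 15), (43, 14)]
# Made-up scores that a joint connection feature map might yield.
scores = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}

limbs = greedy_match(scores)
people = [{"neck": necks[i], "shoulder": shoulders[j]} for i, j in limbs]
print(limbs)  # [(0, 0), (1, 1)]
```

Repeating the pairing for every connected joint pair and chaining the results groups the key points by person, which yields the per-person two-dimensional positions.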

Optionally, the determining module 703 is further configured to obtain, from the three-dimensional human body posture key point feature map, a target position that is the same as the two-dimensional position of the human body key point of each human body;

Optionally, the first processing module 701 is further configured to obtain the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map by downsampling the two-dimensional human body posture estimation model according to a preset multiple.

Optionally, the determining module 703 is further configured to determine, when receiving a two-dimensional position extraction instruction of a human body keypoint, a two-dimensional position of a human body keypoint of each human body in the two-dimensional human body posture keypoint feature map according to the two-dimensional human body posture keypoint feature map and the two-dimensional human body joint connection feature map.

Optionally, the second processing module 702 is further configured to combine the first feature data and the second feature data to obtain a combination result;

The human body posture estimation device 700 provided in this embodiment can implement the human body posture estimation method provided in embodiment 1, and in order to avoid repetition, a detailed description is omitted here.

Example 4

In addition, the embodiment of the disclosure provides a training device of the human body posture estimation model.

Specifically, the human body posture estimation model includes a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained, and the device includes:

the first control module is used for controlling and freezing the three-dimensional human body posture estimation model to be trained, inputting two-dimensional human body key point data into the two-dimensional human body posture estimation model to be trained and training to obtain a trained two-dimensional human body posture estimation model, wherein the two-dimensional human body key point data comprise image data marked with two-dimensional position information of human body key points;

the second control module is used for controlling and freezing the two-dimensional human body posture estimation model to be trained, inputting three-dimensional human body key point data into the three-dimensional human body posture estimation model to be trained and training to obtain a trained three-dimensional human body posture estimation model, and the three-dimensional human body key point data comprise image data marked with three-dimensional position information of human body key points.

The training device for the human body posture estimation model provided in this embodiment may implement the training method for the human body posture estimation model provided in embodiment 2, and in order to avoid repetition, a description thereof will be omitted.

The training device for the human body posture estimation model provided by the embodiment can be used for independently training the two-dimensional human body posture estimation model to be trained and the three-dimensional human body posture estimation model to be trained in the human body posture estimation model, so that an end-to-end human body posture estimation model is obtained, and through the end-to-end human body posture estimation model, the two-dimensional position and the three-dimensional position of the human body key point can be detected simultaneously, and the time cost is reduced.

Example 5

Furthermore, an embodiment of the present disclosure provides a terminal device, including a memory and a processor, where the memory stores a computer program that, when executed on the processor, performs the human body posture estimation method provided in the above method embodiment 1 or the training method of the human body posture estimation model provided in embodiment 2.

The terminal device provided in this embodiment may implement the human body posture estimation method provided in embodiment 1 or the training method of the human body posture estimation model provided in embodiment 2, and in order to avoid repetition, a description thereof will be omitted.

Example 6

The present application also provides a computer-readable storage medium storing a computer program that, when run on a processor, performs the human body posture estimation method provided in embodiment 1 or the training method of the human body posture estimation model provided in embodiment 2.

The computer readable storage medium provided in this embodiment may implement the human body posture estimation method provided in embodiment 1 or the training method of the human body posture estimation model provided in embodiment 2, and in order to avoid repetition, a detailed description is omitted here.

In the present embodiment, the computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or the like.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal comprising the element.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, and may of course also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods according to the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Those having ordinary skill in the art may devise many other forms without departing from the spirit of the present application and the scope of the claims, and all such forms fall within the protection of the present application.

Claims

1. A method of human body pose estimation, the method comprising:

determining two-dimensional positions of human body key points of all human bodies in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map;

2. The method according to claim 1, wherein determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map comprises:

determining a plurality of human body key points according to the two-dimensional human body posture key point feature map;

3. The method according to claim 1, wherein the method further comprises:

4. The method according to claim 1, wherein the method further comprises:

5. The method of claim 1, wherein the inputting the first feature data and the second feature data to a three-dimensional human body pose estimation model comprises:

6. A method for training a human body posture estimation model, the human body posture estimation model comprising a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained, the method comprising:

controlling and freezing the two-dimensional human body posture estimation model to be trained, inputting three-dimensional human body key point data into the three-dimensional human body posture estimation model to be trained, and training to obtain a trained three-dimensional human body posture estimation model, wherein the three-dimensional human body key point data comprises image data marked with three-dimensional position information of human body key points;

extracting first characteristic data of an image, inputting the first characteristic data into the trained two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model;

inputting the first characteristic data and the second characteristic data into the trained three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model;

7. A human body posture estimation apparatus, characterized in that the apparatus comprises:

the determining module is used for determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map;

8. A terminal device comprising a memory and a processor, the memory storing a computer program that, when run by the processor, performs the human body posture estimation method of any one of claims 1 to 5 or the training method of the human body posture estimation model of claim 6.