CN113298922B - Human body posture estimation method and device and terminal equipment - Google Patents

Human body posture estimation method and device and terminal equipment

Info

Publication number
CN113298922B
Authority
CN
China
Prior art keywords
human body
dimensional
body posture
dimensional human
estimation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110655977.7A
Other languages
Chinese (zh)
Other versions
CN113298922A (en
Inventor
郭渺辰
程骏
汤志超
邵池
张惊涛
胡淑萍
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202110655977.7A priority Critical patent/CN113298922B/en
Publication of CN113298922A publication Critical patent/CN113298922A/en
Priority to PCT/CN2021/134498 priority patent/WO2022257378A1/en
Application granted granted Critical
Publication of CN113298922B publication Critical patent/CN113298922B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images

Abstract

The embodiment of the application provides a human body posture estimation method, a device and terminal equipment, wherein the method comprises the following steps: extracting first characteristic data of an image, inputting the first characteristic data into a two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model; inputting first characteristic data and second characteristic data into the three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model; and determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map. Therefore, the two-dimensional position and the three-dimensional position of the key point of the human body can be detected at the same time, and the time cost is reduced.

Description

Human body posture estimation method and device and terminal equipment
Technical Field
The present application relates to the field of man-machine interaction technologies, and in particular, to a human body posture estimation method, a device and a terminal device.
Background
With the gradual popularization and application of intelligent control and man-machine interaction technology, people interact with machines more and more frequently, and the need to use machines to analyze people's emotions, behaviors and the like is becoming increasingly urgent. Behavior recognition can include recognizing behaviors such as hand raising, dozing, standing and mind wandering in education scenarios, and behaviors such as fighting, drowning and calling for help in the security field.
In the process of recognizing human behavior, analyzing a person's posture and actions is extremely important. Simple human body detection cannot analyze the human body posture in detail, so it is necessary to acquire the motion state of the human skeleton. Current human body posture estimation can be divided into multi-person posture estimation and single-person posture estimation, each of which in turn includes 2D human body posture estimation and 3D human body posture estimation.
In the prior art, single-person human body posture estimation first detects the position of each human body in the image with a human body detector, and then locates the key points within the detection rectangle of each human body.
In the prior art, 2D human body posture estimation can be roughly divided into two approaches. The first is the top-down scheme, typified by AlphaPose: all human bodies in the image are first detected with a human body detector, and single-person posture estimation is then performed on each human body. The top-down scheme places high demands on the precision of human body detection, which largely determines the precision of the key points, and the total time overhead grows with the number of people. The other is the bottom-up scheme, typified by OpenPose: the positions of all key points of all persons are first detected from the whole image, and the points are then assigned to each person. Its advantage is that the number of persons in the image does not affect the inference speed, but its accuracy is slightly lower than that of the top-down scheme. Multi-person 3D human body posture estimation predicts coordinate positions in the camera coordinate system from the image, or takes a certain key point of the human body as the origin and calculates the spatial positions of the other key points relative to that origin. The schemes provided by the prior art can only realize 2D human body posture estimation or 3D human body posture estimation separately; they cannot realize both at the same time, so human body posture estimation is relatively time-consuming.
Disclosure of Invention
In order to solve the technical problems described above, embodiments of the present application provide a human body posture estimation method, a training method of a human body posture estimation model, a device, a terminal device, and a computer readable storage medium.
In a first aspect, an embodiment of the present application provides a human body posture estimation method, including:
extracting first characteristic data of an image, inputting the first characteristic data into a two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model;
inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model;
and determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map.
In a second aspect, an embodiment of the present application provides a training method for a human body posture estimation model, where the human body posture estimation model includes a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained, and the method includes:
freezing the three-dimensional human body posture estimation model to be trained, and inputting two-dimensional human body key point data into the two-dimensional human body posture estimation model to be trained for training, to obtain a trained two-dimensional human body posture estimation model, wherein the two-dimensional human body key point data comprises image data marked with two-dimensional position information of human body key points;
and freezing the two-dimensional human body posture estimation model to be trained, and inputting three-dimensional human body key point data into the three-dimensional human body posture estimation model to be trained for training, to obtain a trained three-dimensional human body posture estimation model, wherein the three-dimensional human body key point data comprises image data marked with three-dimensional position information of human body key points.
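By way of illustration only, the alternating freeze described in the second aspect can be sketched as follows. This is a toy example and not the training procedure of the application: the parameter groups `pose2d` and `pose3d`, the placeholder gradients, the learning rate and the helper `sgd_step` are all hypothetical, and a real implementation would obtain gradients from a loss computed on the annotated image data.

```python
import numpy as np

def sgd_step(params, grads, frozen, lr=0.1):
    """Update every parameter group except those named in `frozen`."""
    return {
        name: (value if name in frozen else value - lr * grads[name])
        for name, value in params.items()
    }

# Hypothetical parameter groups for the two sub-models.
params = {"pose2d": np.ones(3), "pose3d": np.ones(3)}
# Placeholder gradients; real gradients would come from a training loss.
grads = {"pose2d": np.full(3, 0.5), "pose3d": np.full(3, 0.5)}

# Stage 1: freeze the 3D model, train the 2D model on 2D-annotated images.
stage1 = sgd_step(params, grads, frozen={"pose3d"})
# Stage 2: freeze the 2D model, train the 3D model on 3D-annotated images.
stage2 = sgd_step(stage1, grads, frozen={"pose2d"})
```

In each stage only the unfrozen group changes, so the 2D model learned in stage 1 is left untouched while the 3D model is trained.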
In a third aspect, an embodiment of the present application provides a human body posture estimation apparatus, including:
the first processing module is used for extracting first characteristic data of an image, inputting the first characteristic data into the two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic image, a two-dimensional human body joint connection characteristic image and second characteristic data through the two-dimensional human body posture estimation model;
the second processing module is used for inputting the first characteristic data and the second characteristic data into the three-dimensional human body posture estimation model and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model;
the determining module is used for determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map.
In a fourth aspect, an embodiment of the present application provides a terminal device, including a memory and a processor, where the memory is configured to store a computer program, and the computer program, when run by the processor, performs the human body posture estimation method provided in the first aspect or the training method for a human body posture estimation model provided in the second aspect.
The human body posture estimation method provided by the application extracts the first characteristic data of the image, inputs the first characteristic data into the two-dimensional human body posture estimation model, and outputs a two-dimensional human body posture key point characteristic image, a two-dimensional human body joint connection characteristic image and second characteristic data through the two-dimensional human body posture estimation model; inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model; and determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map. Therefore, through the end-to-end human body posture estimation model, the two-dimensional position and the three-dimensional position of the key point of the human body can be detected simultaneously, and the time cost is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are required for the embodiments will be briefly described, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of the present application. Like elements are numbered alike in the various figures.
FIG. 1 is a schematic flow chart of a human body posture estimation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a human body posture estimation model according to an embodiment of the present application;
FIG. 3 is another schematic diagram of a human body posture estimation model according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of step S103 of the human body posture estimation method according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of step S1031 of the human body posture estimation method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of human body joint connections according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a human body posture estimation apparatus according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments.
The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
The terms "comprises," "comprising," "including," or any other variation thereof, as may be used in the various embodiments of the present application, are intended to indicate the presence of a specific feature, number, step, operation, element, component, or combination of the foregoing, and should not be understood as first excluding the presence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the application belong. The terms (such as those defined in commonly used dictionaries) will be interpreted as having a meaning that is the same as the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments of the application.
Example 1
The embodiment of the disclosure provides a human body posture estimation method.
Specifically, referring to fig. 1, the human body posture estimating method includes:
step S101, extracting first characteristic data of an image, inputting the first characteristic data into a two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model;
In this embodiment, an end-to-end human body posture estimation model is constructed. Referring to fig. 2, the human body posture estimation model includes a backbone network model 202, a two-dimensional human body posture estimation model 203, and a three-dimensional human body posture estimation model 205. The backbone network model 202 may be a lightweight or heavyweight deep neural network model, which is not limited herein. The backbone network model 202 is connected to both the two-dimensional human body posture estimation model 203 and the three-dimensional human body posture estimation model 205, and the two-dimensional human body posture estimation model 203 is also connected to the three-dimensional human body posture estimation model 205. The first output 204 includes a two-dimensional human body posture key point feature map and a two-dimensional human body joint connection feature map. Referring to fig. 2, the first feature data is extracted from the image 201, which is captured by a camera and contains a plurality of human bodies. The human bodies shown in the image 201 are only illustrative; an actually captured image may take other forms, which is not limited herein.
In this embodiment, the specific structure of the two-dimensional human body posture estimation model is shown as the two-dimensional human body posture estimation model 301 in fig. 3. The two-dimensional human body posture estimation model 301 contains a plurality of interconnected nodes, including a plurality of activation (ReLU) functions, a plurality of convolution (Conv) functions, and a plurality of addition (Add) functions, and corresponding parameters are set for each node. It should be noted that the two-dimensional human body posture estimation model 301 is merely illustrative; in specific cases the node connection relationships and the parameter settings may differ, which is not limited herein. The two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map may be a two-dimensional human body posture key point heat map and a two-dimensional human body joint connection heat map, respectively. For example, in fig. 3, the heatmap output by the two-dimensional human body posture estimation model 301 is the two-dimensional human body posture key point heat map, and pafs is the two-dimensional human body joint connection heat map.
Step S102, inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model.
In this embodiment, the specific structure of the three-dimensional human body posture estimation model is shown as the three-dimensional human body posture estimation model 302 in fig. 3. The three-dimensional human body posture estimation model 302 contains a plurality of interconnected nodes, including a plurality of activation (ReLU) functions, a plurality of convolution (Conv) functions, and a plurality of addition (Add) functions, and corresponding parameters are set for each node. It should be noted that the three-dimensional human body posture estimation model 302 is merely exemplary and may further include other nodes; in specific cases the number of nodes, the types of nodes, the connection relationships of the nodes, and the parameters may be set according to actual conditions, which is not limited herein. Referring again to fig. 2, the second output 206 includes a three-dimensional human body posture key point feature map. In this embodiment, the three-dimensional human body posture key point feature map is a heat map of 3×19 channels, where 3×18 channels correspond to the 18 key points of the human body and 3×1 channels correspond to the background map.
Step S103, determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map.
Therefore, through the end-to-end human body posture estimation model, the two-dimensional position and the three-dimensional position of the key point of the human body can be detected simultaneously, and the time cost is reduced.
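For illustration, the data flow of steps S101 and S102 can be sketched with stand-in functions that only produce arrays of plausible shapes. Everything here is a hypothetical placeholder rather than the network of the application: the function names, the 32 and 16 feature channel counts, the 38 joint connection channels (a figure borrowed from OpenPose-style models and not stated in this description) and the stride of 4 are assumptions; only the 19 and 3×19 channel counts come from the description.

```python
import numpy as np

NUM_CHANNELS = 19  # 18 key points + 1 background, per the description

def backbone(image, stride=4, channels=32):
    """Stand-in backbone: returns zero-valued 'first feature data'."""
    h, w = image.shape[:2]
    return np.zeros((channels, h // stride, w // stride), dtype=np.float32)

def pose2d_model(first):
    """Stand-in 2D model: key point map, joint connection map, second feature data."""
    _, h, w = first.shape
    heatmap = np.zeros((NUM_CHANNELS, h, w), dtype=np.float32)
    pafs = np.zeros((38, h, w), dtype=np.float32)     # channel count is an assumption
    second = np.zeros((16, h, w), dtype=np.float32)   # channel count is an assumption
    return heatmap, pafs, second

def pose3d_model(first, second):
    """Stand-in 3D model: consumes both feature tensors, returns a 3x19-channel map."""
    assert first.shape[1:] == second.shape[1:]  # same spatial size
    _, h, w = first.shape
    return np.zeros((3 * NUM_CHANNELS, h, w), dtype=np.float32)

def estimate(image):
    first = backbone(image)                      # step S101: first feature data
    heatmap, pafs, second = pose2d_model(first)  # step S101: 2D outputs
    heatmap3d = pose3d_model(first, second)      # step S102: 3D output
    return heatmap, pafs, heatmap3d              # inputs to step S103

heatmap, pafs, heatmap3d = estimate(np.zeros((512, 512, 3), dtype=np.uint8))
```

The point of the sketch is that a single forward pass yields all three feature maps needed by step S103.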
Optionally, referring to fig. 4, step S103 includes:
step S1031, determining two-dimensional positions of human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map.
In this embodiment, the image input to the end-to-end human body posture estimation model may include a plurality of human bodies. Therefore, to accurately separate the key points of each human body, the key points of each human body need to be matched by combining the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map.
Optionally, referring to fig. 5, step S1031 includes:
step S10311, determining a plurality of human body key points according to the two-dimensional human body posture key point feature map;
step S10312, determining a plurality of joint connection relations according to the two-dimensional human body joint connection feature map;
step S10313, matching the plurality of human body key points and the plurality of joint connection relations, and determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map.
In this embodiment, the two-dimensional human body posture key point feature map is a 19-channel feature map, in which 18 channels correspond to the 18 key points and 1 channel corresponds to the background map. The peak positions in each channel of the two-dimensional human body posture key point feature map correspond to human body key points.
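As a minimal sketch of how peak positions might be read from one channel, the following example marks a pixel as a peak when it exceeds a threshold and is not smaller than any of its eight neighbours. The helper name `find_peaks`, the threshold of 0.1 and the 8×8 map size are assumptions made for illustration; the description does not specify the peak detection rule.

```python
import numpy as np

def find_peaks(heatmap, threshold=0.1):
    """Return (x, y, score) for local maxima above `threshold` in one channel."""
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = center > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # View of the padded map shifted by (dy, dx), same size as `center`.
            neighbor = padded[1 + dy : padded.shape[0] - 1 + dy,
                              1 + dx : padded.shape[1] - 1 + dx]
            is_peak &= center >= neighbor
    ys, xs = np.nonzero(is_peak)
    return [(int(x), int(y), float(heatmap[y, x])) for y, x in zip(ys, xs)]

# 19 channels: 18 key point channels followed by 1 background channel.
heatmaps = np.zeros((19, 8, 8))
heatmaps[0, 2, 3] = 0.9  # key point 0 of one person at (x=3, y=2)
heatmaps[0, 6, 5] = 0.7  # key point 0 of another person at (x=5, y=6)

# Only the 18 key point channels are searched; the background channel is skipped.
peaks_per_keypoint = [find_peaks(heatmaps[k]) for k in range(18)]
```

Each channel may contain several peaks, one per person, which is why the joint connection feature map is needed to group them.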
Referring to fig. 6, the joint connection feature map of fig. 6 includes the 18 key points of the human body and the connection relations between adjacent key points. The 18 key points are numbered from 0 to 17. In fig. 6, two joint connections may be represented between two adjacent key points. For example, for the key point 2 and the key point 3 in fig. 6, the joint connection extending from the key point 3 toward the key point 2 is a different joint connection from the one extending from the key point 2 toward the key point 3. According to the joint connection relations in the joint connection feature map, the plurality of key points in the two-dimensional human body posture key point feature map can be matched so that all key points belonging to the same human body are grouped together, and the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map are determined based on all key points of the same human body.
Therefore, the human body key points in the two-dimensional human body posture key point feature map can be rapidly grouped, the human body key points belonging to the same human body are unambiguously identified, and the two-dimensional positions of the human body key points of each individual human body are then obtained.
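The description does not spell out the matching algorithm. One common possibility, shown here purely as an assumption, is the OpenPose-style approach: each joint connection is stored as a 2D vector field (hypothetically `paf_x` and `paf_y`), a candidate pair of key points is scored by how well the field aligns with the segment between them, and pairs are then accepted greedily. The function names, the sample count and the 0.5 score threshold are illustrative choices.

```python
import numpy as np

def connection_score(paf_x, paf_y, p_from, p_to, samples=10):
    """Mean alignment between the vector field and the segment p_from -> p_to."""
    p_from = np.asarray(p_from, dtype=float)
    p_to = np.asarray(p_to, dtype=float)
    direction = p_to - p_from
    norm = np.linalg.norm(direction)
    if norm == 0:
        return 0.0
    direction /= norm
    points = np.linspace(p_from, p_to, samples)  # (samples, 2) array of (x, y)
    xs = np.rint(points[:, 0]).astype(int)
    ys = np.rint(points[:, 1]).astype(int)
    return float(np.mean(paf_x[ys, xs] * direction[0] + paf_y[ys, xs] * direction[1]))

def greedy_match(paf_x, paf_y, starts, ends, min_score=0.5):
    """Pair each start key point with at most one end key point, best score first."""
    scored = [(connection_score(paf_x, paf_y, s, e), i, j)
              for i, s in enumerate(starts) for j, e in enumerate(ends)]
    used_i, used_j, pairs = set(), set(), []
    for score, i, j in sorted(scored, reverse=True):
        if score >= min_score and i not in used_i and j not in used_j:
            pairs.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return sorted(pairs)

# Two people, each with a horizontal connection pointing in the +x direction.
paf_x = np.zeros((16, 16))
paf_y = np.zeros((16, 16))
paf_x[3, 2:11] = 1.0
paf_x[12, 2:11] = 1.0

starts = [(2, 3), (2, 12)]   # (x, y) candidates for the first key point
ends = [(10, 12), (10, 3)]   # (x, y) candidates for the second key point
pairs = greedy_match(paf_x, paf_y, starts, ends)
forward = connection_score(paf_x, paf_y, (2, 3), (10, 3))
backward = connection_score(paf_x, paf_y, (10, 3), (2, 3))
```

The sign flip between `forward` and `backward` reflects the point made above that the connection from one key point to its neighbour is distinct from the connection in the opposite direction.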
Step S1032, matching the three-dimensional positions of the human body key points of each human body from the three-dimensional human body posture key point feature map according to the two-dimensional positions of the human body key points of each human body.
In this embodiment, the two-dimensional positions of the human body key points of each human body, and the human body to which each key point belongs, can be determined from the two-dimensional human body posture key point feature map. The three-dimensional position of each key point is then determined from the three-dimensional human body posture key point feature map based on the coordinate position of that key point in the two-dimensional human body posture key point feature map. For example, suppose a key point in the two-dimensional human body posture key point feature map is the left eye and its pixel coordinates in that feature map are (3, 3). The corresponding three-channel data (x, y, z) is read at pixel coordinates (3, 3) of the three-dimensional human body posture key point feature map and is taken as the three-dimensional position of that human body key point.
Optionally, step S1032 includes:
acquiring target positions which are the same as the two-dimensional positions of the human body key points of all the human bodies from the three-dimensional human body posture key point feature map;
and acquiring three-channel data corresponding to the target position in the three-dimensional human body posture key point feature map, and taking the three-channel data as the three-dimensional positions of the human body key points of all human bodies.
Referring to fig. 6 again, the key point 1 in the first marker region 601 may be used as the human body center point, or the midpoint between the left hip joint 8 and the right hip joint 11 in the second marker region 602 may be used as the human body center point. Taking the key point 1 in fig. 6 as an example, if the pixel coordinates of the key point 1 in the two-dimensional human body posture key point feature map are (x, y), the target position with the same pixel coordinates (x, y) is located in the three-dimensional human body posture key point feature map; the pixel coordinates (x, y) in the three-dimensional human body posture key point feature map also correspond to the key point 1. Since the three-dimensional human body posture key point feature map has 3×19 channels, with every 3 channels representing the three-dimensional coordinates of one key point, reading the three-channel data of the key point 1 at the pixel coordinates (x, y) in the three-dimensional human body posture key point feature map yields the three-dimensional coordinates of the key point 1.
Therefore, the position of each human body key point in the three-dimensional human body posture key point feature map can be determined from its two-dimensional position in the two-dimensional human body posture key point feature map, and the three-channel data at that position is read to determine the three-dimensional position of the human body key point. This solves the problem of matching the two-dimensional positions of the human body key points with their three-dimensional positions, and since the two-dimensional and three-dimensional positions are obtained at the same time, the time overhead is reduced.
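The lookup described here can be sketched as a simple array read. The channel ordering is an assumption: key point k is taken to occupy channels 3k, 3k+1 and 3k+2 of a (57, H, W) array, which is consistent with "every 3 channels represent one key point" but is not confirmed by the description. The helper name `lookup_3d`, the 128×128 map size and the stored values are hypothetical.

```python
import numpy as np

def lookup_3d(map3d, keypoint_index, x, y):
    """Read the three-channel (x, y, z) data of one key point at pixel (x, y)."""
    c = 3 * keypoint_index  # assumed layout: 3 consecutive channels per key point
    return tuple(float(v) for v in map3d[c:c + 3, y, x])

# 3 x 19 = 57 channels on a hypothetical 128 x 128 feature map.
map3d = np.zeros((57, 128, 128))
map3d[3:6, 40, 30] = (0.1, -0.2, 2.5)  # made-up values for key point 1 at (x=30, y=40)

# The 2D position (30, 40) found for key point 1 indexes directly into the 3D map.
position_3d = lookup_3d(map3d, keypoint_index=1, x=30, y=40)
```

Because both feature maps share the same pixel grid, no separate 2D-to-3D association step is required.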
Additionally, the human body posture estimation method further comprises:
downsampling by a preset multiple through the two-dimensional human body posture estimation model to obtain the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map.
In this embodiment, the preset multiple is determined according to the data accuracy and the amount of computation, so as to meet the accuracy requirement while keeping the computation time short. For example, the preset multiple may be 4: if the size of the input image is 512×512, the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map each have a size of 128×128.
In addition, the method further comprises:
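The size arithmetic in this example can be checked with a short sketch. The helper names are hypothetical, and the mapping from feature map coordinates back to image coordinates by simple multiplication is an assumption added for illustration; the description only states the downsampled size.

```python
def feature_map_size(height, width, multiple=4):
    """Feature map size after downsampling the input by `multiple`."""
    return height // multiple, width // multiple

def to_image_coords(x, y, multiple=4):
    """Assumed inverse mapping from feature map pixel to input image pixel."""
    return x * multiple, y * multiple

size = feature_map_size(512, 512)   # preset multiple of 4, as in the example
image_xy = to_image_coords(30, 40)
```

A larger multiple shrinks the feature maps and the computation, at the cost of coarser key point localization.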
when a two-dimensional position extraction instruction of the human body key points is received, determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map.
In this embodiment, to keep the end-to-end human body posture estimation model flexible, the three-dimensional human body posture estimation model need not run inference when only the two-dimensional positions of the human body key points are needed, which reduces time overhead. A two-dimensional position extraction instruction for the human body key points may be sent to the output layer of the two-dimensional human body posture estimation model. The two-dimensional human body posture estimation model then performs inference to obtain the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map, from which the two-dimensional positions of the human body key points are obtained. The inference process of the three-dimensional human body posture estimation model is disabled, which saves the inference time for the three-dimensional positions of the human body key points.
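One way to picture this behaviour is a flag that skips the three-dimensional branch. The class below is a toy sketch: it counts branch invocations instead of running real networks, and the class name, method names and the `only_2d` flag (standing in for the two-dimensional position extraction instruction) are all hypothetical.

```python
class TwoBranchEstimator:
    """Toy model that counts branch invocations instead of running networks."""

    def __init__(self):
        self.calls_2d = 0
        self.calls_3d = 0

    def _run_2d(self, first_features):
        self.calls_2d += 1
        outputs = {"heatmap": "2d-heatmap", "pafs": "2d-pafs"}
        return outputs, "second-features"

    def _run_3d(self, first_features, second_features):
        self.calls_3d += 1
        return "3d-heatmap"

    def infer(self, first_features, only_2d=False):
        outputs, second_features = self._run_2d(first_features)
        if not only_2d:
            # The 3D branch runs only when 3D positions are requested.
            outputs["heatmap_3d"] = self._run_3d(first_features, second_features)
        return outputs

model = TwoBranchEstimator()
out_2d = model.infer("first-features", only_2d=True)
calls_after_2d_only = model.calls_3d
out_full = model.infer("first-features")
```

With `only_2d=True` the three-dimensional branch is never invoked, so its inference time is not spent.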
In addition, in step S102, the inputting the first feature data and the second feature data into the three-dimensional human body posture estimation model includes:
combining the first characteristic data and the second characteristic data to obtain a combined result;
and inputting the combination result to the three-dimensional human body posture estimation model.
In this embodiment, combining the first feature data and the second feature data by the concat function increases the number of channels of the combined result. For example, first feature data of 3×19 channels and second feature data of 6×19 channels are combined by the concat function into feature data of 9×19 channels.
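The channel arithmetic of this combination can be illustrated with a small dependency-free sketch. The channel-first nested-list layout, the 4×4 spatial size, and the helper names `make_features` and `concat_channels` are assumptions chosen for illustration; only the channel counts 3×19, 6×19 and 9×19 come from the text.

```python
def make_features(channels, height, width, fill=0.0):
    """Build a channel-first feature tensor as nested lists [C][H][W]."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]


def concat_channels(a, b):
    """Concatenate two channel-first tensors along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match for channel concatenation")
    return a + b


first = make_features(3 * 19, 4, 4)    # first feature data, 57 channels
second = make_features(6 * 19, 4, 4)   # second feature data, 114 channels
combined = concat_channels(first, second)
print(len(first), len(second), len(combined))  # 57 114 171
```

The combined result has 171 = 9×19 channels, while the spatial size is unchanged.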
According to the human body posture estimation method provided by the embodiment, first characteristic data of an image are extracted, the first characteristic data are input into a two-dimensional human body posture estimation model, and a two-dimensional human body posture key point characteristic image, a two-dimensional human body joint connection characteristic image and second characteristic data are output through the two-dimensional human body posture estimation model; inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model; and determining the two-dimensional positions of the human body key points of all the human bodies in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of all the human bodies according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map. Therefore, through the end-to-end human body posture estimation model, the two-dimensional position and the three-dimensional position of the key point of the human body can be detected simultaneously, and the time cost is reduced.
Example 2
The embodiment of the disclosure provides a training method of a human body posture estimation model.
Specifically, the human body posture estimation model comprises a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained, and the training method comprises the following steps:
controlling and freezing the three-dimensional human body posture estimation model to be trained, and inputting two-dimensional human body key point data to the two-dimensional human body posture estimation model to be trained to train to obtain a trained two-dimensional human body posture estimation model, wherein the two-dimensional human body key point data comprises image data marked with two-dimensional position information of human body key points;
and controlling and freezing the two-dimensional human body posture estimation model to be trained, inputting three-dimensional human body key point data into the three-dimensional human body posture estimation model to be trained, and training to obtain the trained three-dimensional human body posture estimation model, wherein the three-dimensional human body key point data comprises image data marked with three-dimensional position information of human body key points.
In this embodiment, an end-to-end human body posture estimation model is constructed, which comprises a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained. The two-dimensional human body posture estimation model to be trained may be provided with a plurality of interconnected nodes, including a plurality of activation (ReLU) functions, a plurality of convolution (Conv) functions and a plurality of addition (Add) operations, and each node has corresponding parameters. During training of the two-dimensional human body posture estimation model to be trained, the parameters of each node are adjusted so as to optimize the two-dimensional human body posture estimation model. It should be noted that the node connection relationships and parameter settings of the two-dimensional human body posture estimation model to be trained may be set according to actual conditions and are not limited herein.
The three-dimensional human body posture estimation model to be trained is likewise provided with a plurality of interconnected nodes, including a plurality of activation (ReLU) functions, a plurality of convolution (Conv) functions and a plurality of addition (Add) operations, and each node has corresponding parameters. During training of the three-dimensional human body posture estimation model to be trained, the parameters of each node are adjusted so as to optimize the three-dimensional human body posture estimation model. It should be noted that the node connection relationships and parameter settings of the three-dimensional human body posture estimation model to be trained may be set according to actual conditions and are not limited herein.
In this embodiment, when the two-dimensional human body posture estimation model to be trained is trained with the two-dimensional human body key point data, the network parameters of the three-dimensional human body posture estimation model to be trained are frozen, and the three-dimensional model does not take part in learning; when the three-dimensional human body posture estimation model to be trained is trained with the three-dimensional human body key point data, the network parameters of the two-dimensional human body posture estimation model to be trained are frozen, and the two-dimensional model does not take part in learning.
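The alternating freezing scheme can be sketched with a toy example. Each sub-model is reduced to a single scalar weight, and the gradients are made-up numbers; the class `SubModel`, the function `train_stage`, and the learning rate are hypothetical and serve only to show that a frozen sub-model keeps its parameters while the other one is updated.

```python
class SubModel:
    """Toy stand-in for a sub-network: one scalar weight and a frozen flag."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
        self.frozen = False

    def apply_gradient(self, grad, lr):
        # A frozen sub-model ignores every update.
        if not self.frozen:
            self.weight -= lr * grad


def train_stage(trainable, frozen, grads, lr=0.5):
    """Freeze one sub-model and apply the given gradients to the other."""
    frozen.frozen = True
    trainable.frozen = False
    for grad in grads:
        trainable.apply_gradient(grad, lr)
        frozen.apply_gradient(grad, lr)  # no effect while frozen


model_2d = SubModel("2d", weight=1.0)
model_3d = SubModel("3d", weight=2.0)

# Stage 1: train the 2D model on 2D key point data; the 3D model is frozen.
train_stage(model_2d, model_3d, grads=[0.5, 0.5])
after_stage_1 = (model_2d.weight, model_3d.weight)

# Stage 2: train the 3D model on 3D key point data; the 2D model is frozen.
train_stage(model_3d, model_2d, grads=[1.0])
after_stage_2 = (model_2d.weight, model_3d.weight)

print(after_stage_1, after_stage_2)  # (0.5, 2.0) (0.5, 1.5)
```

In the first stage only the 2D weight changes, and in the second stage only the 3D weight changes, so each sub-model can be trained on its own annotated data set.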
According to the training method of the human body posture estimation model, the two-dimensional human body posture estimation model to be trained and the three-dimensional human body posture estimation model to be trained in the human body posture estimation model can be independently trained, so that an end-to-end human body posture estimation model is obtained, two-dimensional position and three-dimensional position detection of key points of a human body can be simultaneously achieved through the end-to-end human body posture estimation model, and time expenditure is reduced.
Example 3
Further, the embodiment of the present disclosure provides a human body posture estimating apparatus.
Specifically, as shown in fig. 7, the human body posture estimating apparatus 700 includes:
the first processing module 701 is configured to extract first feature data of an image, input the first feature data to a two-dimensional human body posture estimation model, and output a two-dimensional human body posture key point feature map, a two-dimensional human body joint connection feature map, and second feature data through the two-dimensional human body posture estimation model;
the second processing module 702 is configured to input the first feature data and the second feature data to a three-dimensional human body posture estimation model, and output a three-dimensional human body posture key point feature map through the three-dimensional human body posture estimation model;
the determining module 703 is configured to determine a two-dimensional position of a human body key point of each human body in the two-dimensional human body posture key point feature map and a three-dimensional position of a human body key point of each human body according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map.
Optionally, the determining module 703 is further configured to determine two-dimensional positions of human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map;
and matching the three-dimensional positions of the human body key points of the human bodies from the three-dimensional human body posture key point feature map according to the two-dimensional positions of the human body key points of the human bodies.
Optionally, the determining module 703 is further configured to determine a plurality of human body key points according to the two-dimensional human body posture key point feature map;
determining a plurality of joint connection relations according to the two-dimensional human joint connection feature diagram;
and matching the plurality of human body key points with the plurality of joint connection relations, and determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map.
Optionally, the determining module 703 is further configured to obtain, from the three-dimensional human body posture key point feature map, a target position that is the same as the two-dimensional position of the human body key point of each human body;
and acquiring three-channel data corresponding to the target position in the three-dimensional human body posture key point feature map, and taking the three-channel data as the three-dimensional positions of the human body key points of all human bodies.
Optionally, the first processing module 701 is further configured to obtain the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map by downsampling the two-dimensional human body posture estimation model according to a preset multiple.
Optionally, the determining module 703 is further configured to determine, when receiving a two-dimensional position extraction instruction of a human body keypoint, a two-dimensional position of a human body keypoint of each human body in the two-dimensional human body posture keypoint feature map according to the two-dimensional human body posture keypoint feature map and the two-dimensional human body joint connection feature map.
Optionally, the second processing module 702 is further configured to combine the first feature data and the second feature data to obtain a combined result;
and inputting the combination result to the three-dimensional human body posture estimation model.
The human body posture estimation device 700 provided in this embodiment can implement the human body posture estimation method provided in embodiment 1, and in order to avoid repetition, a detailed description is omitted here.
According to the human body posture estimation device provided by this embodiment, first feature data of an image are extracted, the first feature data are input into a two-dimensional human body posture estimation model, and a two-dimensional human body posture key point feature map, a two-dimensional human body joint connection feature map and second feature data are output through the two-dimensional human body posture estimation model; the first feature data and the second feature data are input into a three-dimensional human body posture estimation model, and a three-dimensional human body posture key point feature map is output through the three-dimensional human body posture estimation model; and the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map and the three-dimensional positions of the human body key points of each human body are determined according to the two-dimensional human body posture key point feature map, the two-dimensional human body joint connection feature map and the three-dimensional human body posture key point feature map. Therefore, through the end-to-end human body posture estimation model, the two-dimensional positions and the three-dimensional positions of the human body key points can be detected simultaneously, and the time cost is reduced.
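The step performed by the determining module 703, reading three-channel data at the target position that matches a two-dimensional key point position, can be sketched as follows. The channel layout is an assumption made for illustration: the map is stored channel-first as [C][H][W], and key point type k is assumed to occupy channels 3k, 3k+1 and 3k+2. The source only states that three-channel data at the same position are taken as the three-dimensional position.

```python
def lookup_3d(pose_3d_map, keypoint_index, x, y):
    """Read the three channels of one key point type at pixel (x, y).

    Assumed layout (hypothetical): channel-first nested lists [C][H][W],
    with channels 3k, 3k+1, 3k+2 holding the 3D data of key point type k.
    """
    base = 3 * keypoint_index
    return tuple(pose_3d_map[base + c][y][x] for c in range(3))


# Toy map with 2 key point types (6 channels) on a 2x3 grid. Each value
# encodes its own index as channel*100 + row*10 + column for easy checking.
height, width = 2, 3
toy_map = [
    [[c * 100 + row * 10 + col for col in range(width)] for row in range(height)]
    for c in range(6)
]

# 2D position of key point type 1 found at column x=2, row y=1.
position_3d = lookup_3d(toy_map, keypoint_index=1, x=2, y=1)
print(position_3d)  # (312, 412, 512)
```

Because the lookup is a direct index into the feature map, obtaining the three-dimensional positions adds almost no cost once the two-dimensional positions are known.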
Example 4
In addition, the embodiment of the disclosure provides a training device of the human body posture estimation model.
Specifically, the human body posture estimation model includes a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained, and the device includes:
the first control module is used for controlling and freezing the three-dimensional human body posture estimation model to be trained, inputting two-dimensional human body key point data into the two-dimensional human body posture estimation model to be trained and training to obtain a trained two-dimensional human body posture estimation model, wherein the two-dimensional human body key point data comprise image data marked with two-dimensional position information of human body key points;
the second control module is used for controlling and freezing the two-dimensional human body posture estimation model to be trained, inputting three-dimensional human body key point data into the three-dimensional human body posture estimation model to be trained and training to obtain a trained three-dimensional human body posture estimation model, and the three-dimensional human body key point data comprise image data marked with three-dimensional position information of human body key points.
The training device for the human body posture estimation model provided in this embodiment may implement the training method for the human body posture estimation model provided in embodiment 2, and in order to avoid repetition, a description thereof will be omitted.
The training device for the human body posture estimation model provided by the embodiment can be used for independently training the two-dimensional human body posture estimation model to be trained and the three-dimensional human body posture estimation model to be trained in the human body posture estimation model, so that an end-to-end human body posture estimation model is obtained, and through the end-to-end human body posture estimation model, the two-dimensional position and the three-dimensional position of the human body key point can be detected simultaneously, and the time cost is reduced.
Example 5
Furthermore, an embodiment of the present disclosure provides a terminal device, including a memory and a processor, where the memory stores a computer program that, when executed on the processor, performs the human body posture estimation method provided in the above method embodiment 1 or the training method of the human body posture estimation model provided in embodiment 2.
The terminal device provided in this embodiment may implement the human body posture estimation method provided in embodiment 1 or the training method of the human body posture estimation model provided in embodiment 2, and in order to avoid repetition, a description thereof will be omitted.
Example 6
The present application also provides a computer-readable storage medium storing a computer program that, when run on a processor, performs the human body posture estimation method provided in embodiment 1 or the training method of the human body posture estimation model provided in embodiment 2.
The computer readable storage medium provided in this embodiment may implement the human body posture estimation method provided in embodiment 1 or the training method of the human body posture estimation model provided in embodiment 2, and in order to avoid repetition, a detailed description is omitted here.
In the present embodiment, the computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal comprising the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (8)

1. A method of human body pose estimation, the method comprising:
extracting first characteristic data of an image, inputting the first characteristic data into a two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model;
inputting the first characteristic data and the second characteristic data into a three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model;
determining two-dimensional positions of human body key points of all human bodies in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map;
acquiring target positions which are the same as the two-dimensional positions of the human body key points of all the human bodies from the three-dimensional human body posture key point feature map;
and acquiring three-channel data corresponding to the target position in the three-dimensional human body posture key point feature map, and taking the three-channel data as the three-dimensional positions of the human body key points of all human bodies.
2. The method according to claim 1, wherein determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map comprises:
determining a plurality of human body key points according to the two-dimensional human body posture key point feature map;
determining a plurality of joint connection relations according to the two-dimensional human joint connection feature diagram;
and matching the plurality of human body key points with the plurality of joint connection relations, and determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map.
3. The method according to claim 1, wherein the method further comprises:
and obtaining the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map through downsampling the two-dimensional human body posture estimation model according to a preset multiple.
4. The method according to claim 1, wherein the method further comprises:
when a two-dimensional position extraction instruction of the human body key points is received, determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map.
5. The method according to claim 1, wherein the inputting the first feature data and the second feature data into the three-dimensional human body posture estimation model comprises:
combining the first characteristic data and the second characteristic data to obtain a combined result;
and inputting the combination result to the three-dimensional human body posture estimation model.
6. A method for training a human body posture estimation model, the human body posture estimation model comprising a two-dimensional human body posture estimation model to be trained and a three-dimensional human body posture estimation model to be trained, the method comprising:
controlling and freezing the three-dimensional human body posture estimation model to be trained, and inputting two-dimensional human body key point data to the two-dimensional human body posture estimation model to be trained to train to obtain a trained two-dimensional human body posture estimation model, wherein the two-dimensional human body key point data comprises image data marked with two-dimensional position information of human body key points;
controlling and freezing the two-dimensional human body posture estimation model to be trained, inputting three-dimensional human body key point data into the three-dimensional human body posture estimation model to be trained, and training to obtain a trained three-dimensional human body posture estimation model, wherein the three-dimensional human body key point data comprises image data marked with three-dimensional position information of human body key points;
extracting first characteristic data of an image, inputting the first characteristic data into the trained two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic diagram, a two-dimensional human body joint connection characteristic diagram and second characteristic data through the two-dimensional human body posture estimation model;
inputting the first characteristic data and the second characteristic data into the trained three-dimensional human body posture estimation model, and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model;
determining two-dimensional positions of human body key points of all human bodies in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map;
acquiring target positions which are the same as the two-dimensional positions of the human body key points of all the human bodies from the three-dimensional human body posture key point feature map;
and acquiring three-channel data corresponding to the target position in the three-dimensional human body posture key point feature map, and taking the three-channel data as the three-dimensional positions of the human body key points of all human bodies.
7. A human body posture estimation apparatus, characterized in that the apparatus comprises:
the first processing module is used for extracting first characteristic data of an image, inputting the first characteristic data into the two-dimensional human body posture estimation model, and outputting a two-dimensional human body posture key point characteristic image, a two-dimensional human body joint connection characteristic image and second characteristic data through the two-dimensional human body posture estimation model;
the second processing module is used for inputting the first characteristic data and the second characteristic data into the three-dimensional human body posture estimation model and outputting a three-dimensional human body posture key point characteristic diagram through the three-dimensional human body posture estimation model;
the determining module is used for determining the two-dimensional positions of the human body key points of each human body in the two-dimensional human body posture key point feature map according to the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map;
acquiring target positions which are the same as the two-dimensional positions of the human body key points of all the human bodies from the three-dimensional human body posture key point feature map;
and acquiring three-channel data corresponding to the target position in the three-dimensional human body posture key point feature map, and taking the three-channel data as the three-dimensional positions of the human body key points of all human bodies.
8. A terminal device comprising a memory and a processor, the memory storing a computer program that, when run by the processor, performs the human body posture estimation method of any one of claims 1 to 5 or the training method of the human body posture estimation model of claim 6.
CN202110655977.7A 2021-06-11 2021-06-11 Human body posture estimation method and device and terminal equipment Active CN113298922B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110655977.7A CN113298922B (en) 2021-06-11 2021-06-11 Human body posture estimation method and device and terminal equipment
PCT/CN2021/134498 WO2022257378A1 (en) 2021-06-11 2021-11-30 Human body posture estimation method and apparatus, and terminal device

Publications (2)

Publication Number Publication Date
CN113298922A CN113298922A (en) 2021-08-24
CN113298922B true CN113298922B (en) 2023-08-29

Family

ID=77328131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110655977.7A Active CN113298922B (en) 2021-06-11 2021-06-11 Human body posture estimation method and device and terminal equipment

Country Status (2)

Country Link
CN (1) CN113298922B (en)
WO (1) WO2022257378A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298922B (en) * 2021-06-11 2023-08-29 深圳市优必选科技股份有限公司 Human body posture estimation method and device and terminal equipment
CN114724177B (en) * 2022-03-08 2023-04-07 三峡大学 Human body drowning detection method combining Alphapos and YOLOv5s models

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model
CN112836618A (en) * 2021-01-28 2021-05-25 清华大学深圳国际研究生院 Three-dimensional human body posture estimation method and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9123144B2 (en) * 2011-11-11 2015-09-01 Microsoft Technology Licensing, Llc Computing 3D shape parameters for face animation
CN108460338B (en) * 2018-02-02 2020-12-11 北京市商汤科技开发有限公司 Human body posture estimation method and apparatus, electronic device, storage medium, and program
CN113298922B (en) * 2021-06-11 2023-08-29 深圳市优必选科技股份有限公司 Human body posture estimation method and device and terminal equipment

Also Published As

Publication number Publication date
CN113298922A (en) 2021-08-24
WO2022257378A1 (en) 2022-12-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant