WO2022257378A1 - Human body pose estimation method, apparatus, and terminal device - Google Patents

Human body pose estimation method, apparatus, and terminal device

Info

Publication number
WO2022257378A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
dimensional
feature map
key point
estimation model
Prior art date
Application number
PCT/CN2021/134498
Other languages
English (en)
French (fr)
Inventor
郭渺辰
程骏
汤志超
邵池
张惊涛
胡淑萍
庞建新
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司
Publication of WO2022257378A1 publication Critical patent/WO2022257378A1/zh

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06T  IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00  3D [Three Dimensional] image rendering
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06N  COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00  Computing arrangements based on biological models
    • G06N3/02  Neural networks
    • G06N3/08  Learning methods
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06T  IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00  Image analysis
    • G06T7/70  Determining position or orientation of objects or cameras
    • G06T7/73  Determining position or orientation of objects or cameras using feature-based methods
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06T  IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00  Indexing scheme for image analysis or image enhancement
    • G06T2207/10  Image acquisition modality
    • G06T2207/10004  Still image; Photographic image
    • G06T2207/10012  Stereo images

Definitions

  • the present application relates to the field of human-computer interaction technology, and in particular to a human body pose estimation method, apparatus, and terminal device.
  • Behavior recognition can include recognizing behaviors such as raising hands in class, drowsiness, standing, and mind-wandering in educational scenarios, and can also include recognizing fighting, drowning, and calls for help in the security field.
  • pose estimation of a single human body uses a human body detector to detect the position of each human body in an image, and then performs key point localization within the detected rectangular region of each human body.
  • 2D human body pose estimation can be roughly divided into two approaches.
  • One is the top-down scheme, typified by AlphaPose.
  • In the top-down scheme, a human body detector is used to detect all human bodies in an image, and then single-person pose estimation is performed on each detected human body.
  • the top-down scheme has high requirements for the accuracy of human body detection.
  • the accuracy of human body detection greatly affects the accuracy of key points, and the more people there are, the greater the overall time cost will be.
  • the other is a bottom-up scheme, typified by OpenPose, which first detects the positions of all key points of all people in the full image, and then assigns the key points to each person.
  • Multi-person 3D human body pose estimation predicts key point coordinates in the camera coordinate system from the image, or takes a certain key point as the origin and calculates the spatial positions of the other key points relative to that origin.
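  • As a minimal illustration (not part of the patent disclosure) of the root-relative formulation above, absolute 3D key point coordinates can be re-expressed relative to a chosen origin key point; the array contents and the choice of root index here are assumptions:

```python
import numpy as np

# 19 key points with absolute 3D coordinates in the camera system; dummy values
absolute = np.arange(57, dtype=np.float32).reshape(19, 3)

root_idx = 1  # e.g. a chosen key point used as the zero point
relative = absolute - absolute[root_idx]  # positions relative to the root

print(relative[root_idx])  # the root itself becomes (0, 0, 0)
```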
  • the solutions provided by the prior art can only realize 2D human body pose estimation or 3D human body pose estimation alone, but cannot realize both at the same time, and human body pose estimation is therefore relatively time-consuming.
  • embodiments of the present application provide a human body pose estimation method, a human body pose estimation model training method, a device, a terminal device, and a computer-readable storage medium.
  • the embodiment of the present application provides a method for estimating human body pose, the method comprising:
  • extracting first feature data of an image; inputting the first feature data into the two-dimensional human body pose estimation model, and outputting, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data;
  • inputting the first feature data and the second feature data into the three-dimensional human body pose estimation model, and outputting a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model;
  • determining, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map, and the three-dimensional positions of those key points.
  • an embodiment of the present application provides a training method for a human body pose estimation model, the human body pose estimation model includes a two-dimensional human body pose estimation model to be trained and a three-dimensional human body pose estimation model to be trained, and the method includes:
  • controlling and freezing the three-dimensional human body pose estimation model to be trained, and inputting two-dimensional human body key point data into the two-dimensional human body pose estimation model to be trained for training, to obtain a trained two-dimensional human body pose estimation model, wherein the two-dimensional human body key point data includes image data annotated with two-dimensional position information of human body key points;
  • controlling and freezing the two-dimensional human body pose estimation model to be trained, and inputting three-dimensional human body key point data into the three-dimensional human body pose estimation model to be trained for training, to obtain a trained three-dimensional human body pose estimation model, wherein the three-dimensional human body key point data includes image data annotated with three-dimensional position information of human body key points.
  • the embodiment of the present application provides a device for estimating human body posture
  • the device for estimating human body posture includes:
  • the first processing module is used to extract the first feature data of the image, input the first feature data into the two-dimensional human body pose estimation model, and output the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the second feature data through the two-dimensional human body pose estimation model;
  • the second processing module is used to input the first feature data and the second feature data to the three-dimensional human body posture estimation model, and output the three-dimensional human body posture key point feature map by the three-dimensional human body posture estimation model;
  • a determining module configured to determine, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map, and the three-dimensional positions of those key points.
  • an embodiment of the present application provides a terminal device, including a memory and a processor, wherein the memory is used to store a computer program, and when the computer program is run by the processor, it executes the human body pose estimation method provided in the first aspect, or the training method of the human body pose estimation model provided in the second aspect.
  • the human body pose estimation method extracts the first feature data of the image; inputs the first feature data into the two-dimensional human body pose estimation model, which outputs the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and second feature data; inputs the first feature data and the second feature data into the three-dimensional human body pose estimation model, which outputs the three-dimensional human body pose key point feature map; and determines, according to these three feature maps, the two-dimensional and three-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map. In this way, through the end-to-end human body pose estimation model, the two-dimensional and three-dimensional positions of human body key points can be detected at the same time, reducing time consumption.
  • FIG. 1 shows a schematic flow chart of the human body pose estimation method provided by an embodiment of the present application.
  • FIG. 2 shows a schematic structural diagram of the human body pose estimation model provided by an embodiment of the present application.
  • FIG. 3 shows another schematic structural diagram of the human body pose estimation model provided by an embodiment of the present application.
  • FIG. 4 shows a schematic flow chart of step S103 of the human body pose estimation method provided by an embodiment of the present application.
  • FIG. 5 shows a schematic flow chart of step S1031 of the human body pose estimation method provided by an embodiment of the present application.
  • FIG. 6 shows a schematic diagram of the human body joint connections provided by an embodiment of the present application.
  • FIG. 7 shows a schematic structural diagram of the human body pose estimation apparatus provided by an embodiment of the present application.
  • An embodiment of the present disclosure provides a human body pose estimation method.
  • the human body pose estimation method includes:
  • Step S101 extracting the first feature data of the image, inputting the first feature data into the 2D human body pose estimation model, and outputting the 2D human body pose key point feature map, the 2D human body joint connection feature map, and second feature data through the 2D human body pose estimation model;
  • the human body pose estimation model includes a backbone network model 202 , a 2D human body pose estimation model 203 and a 3D human body pose estimation model 205 .
  • the backbone network model 202, also called the backbone network, may be a lightweight or heavyweight deep neural network model, which is not limited here.
  • the backbone network model 202 is respectively connected with the 2D human body pose estimation model 203 and the 3D human body pose estimation model 205, and the 2D human body pose estimation model 203 and the 3D human body pose estimation model 205 are connected.
  • the first output result 204 includes a 2D human body pose key point feature map and a 2D human body joint connection feature map.
  • the first feature data of image 201 can be extracted.
  • Image 201 is an image captured by a camera.
  • the image includes multiple human body images.
  • the human body images in image 201 are only for illustration; the actually captured image may also take other forms, which is not limited here.
  • the specific structure of the two-dimensional human body pose estimation model may refer to the two-dimensional human body pose estimation model 301 in FIG. 3 .
  • the two-dimensional human pose estimation model 301 is provided with multiple nodes, and the nodes are connected to each other.
  • the nodes include multiple activation (Relu) functions, multiple convolution (Conv) functions, and multiple addition (Add) functions, and each node is set with corresponding parameters.
  • In FIG. 3: Relu denotes activation, Conv denotes convolution, and Add denotes addition.
  • the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map may be respectively a two-dimensional human body posture key point heat map and a two-dimensional human body joint connection heat map.
  • the heatmaps output by the 2D human pose estimation model 301 are the heat maps of the 2D human body pose key points, and the pafs (part affinity fields) are the heat maps of the 2D human body joint connections.
  • Step S102 inputting the first feature data and the second feature data into the 3D human body pose estimation model, and outputting a 3D human body pose key point feature map through the 3D human body pose estimation model.
  • the specific structure of the 3D human body pose estimation model may refer to the 3D human body pose estimation model 302 in FIG. 3 .
  • the 3D human pose estimation model 302 is provided with multiple nodes, and the nodes are connected to each other.
  • the nodes include multiple activation (Relu) functions, multiple convolution (Conv) functions, and multiple addition (Add) functions, and each node is set with corresponding parameters.
  • FIG. 3 shows only the convolution (Conv) nodes of the three-dimensional human pose estimation model 302 as an example; it may also include other node types, which can be set according to the actual situation and are not limited here.
  • the second output result 206 in FIG. 2 includes a 3D human body pose key point feature map.
  • the 3D human body pose key point feature map is a heat map of 3×19 channels, wherein 3×18 channels correspond to the 18 human body key points and the remaining 3 channels correspond to the background image.
  • Step S103 according to the 2D human body pose key point feature map, the 2D human body joint connection feature map, and the 3D human body pose key point feature map, determine each human body in the 2D human body pose key point feature map The two-dimensional position of the key point of the human body, and the three-dimensional position of the key point of the human body.
  • the two-dimensional position and three-dimensional position detection of the key points of the human body can be realized at the same time, reducing the time consumption.
  • step S103 includes:
  • Step S1031 according to the 2D human body pose key point feature map and the 2D human body joint connection feature map, determine the two-dimensional position of each human body key point in the 2D human body pose key point feature map.
  • the image of the input end-to-end human body pose estimation model may include multiple human body images.
  • the human body joint connection feature map matches the key points of each human body.
  • step S1031 includes:
  • Step S10311 determining a plurality of human body key points according to the two-dimensional human body posture key point feature map
  • Step S10312 determining a plurality of joint connection relationships according to the two-dimensional human joint connection feature map
  • Step S10313 matching the plurality of human body key points and the plurality of joint connection relationships, and determining the two-dimensional position of the human body key points of each human body in the two-dimensional human body posture key point feature map.
  • the two-dimensional human body pose key point feature map is a feature map of 19 channels, wherein the 19 channels include 18 key point channels and 1 background image channel.
  • the positions of the peaks of each channel in the two-dimensional human body posture key point feature map correspond to the human body key points.
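  • As an illustrative sketch (not part of the patent disclosure), extracting key point candidates as the per-channel peaks described above might look as follows; the array shapes follow the 19-channel layout, while the function name and threshold value are assumptions:

```python
import numpy as np

def extract_keypoints(heatmaps, threshold=0.1):
    """Find key point candidates as peaks in each heat map channel.

    heatmaps: array of shape (19, H, W); the first 18 channels are
    key point channels and the last is the background channel.
    Returns a list of 18 lists of (x, y, score) candidates.
    """
    candidates = []
    for c in range(heatmaps.shape[0] - 1):  # skip the background channel
        h = heatmaps[c]
        peaks = []
        for y in range(1, h.shape[0] - 1):
            for x in range(1, h.shape[1] - 1):
                v = h[y, x]
                # a peak is above threshold and >= its 4-neighbourhood
                if v > threshold and v >= h[y - 1, x] and v >= h[y + 1, x] \
                        and v >= h[y, x - 1] and v >= h[y, x + 1]:
                    peaks.append((x, y, float(v)))
        candidates.append(peaks)
    return candidates
```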
  • the joint connection feature map in FIG. 6 includes 18 key points of the human body and the connection relationship between adjacent key points.
  • the 18 key points are numbered from 0 to 17 respectively.
  • two adjacent key points can represent two directed joint connections; for example, for key point 2 and key point 3 in Figure 6, the joint connection extending from key point 3 to key point 2 is different from the joint connection extending from key point 2 to key point 3.
  • multiple key points in the two-dimensional human body pose key point feature map can thus be matched, with all key points belonging to the same human body grouped together; based on all key points of the same human body, the two-dimensional positions of the human body key points of each human body in the feature map are determined.
  • in this way, the key points in the two-dimensional human body pose key point feature map can be quickly grouped, the key points belonging to the same human body identified, and their two-dimensional positions obtained.
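  • A hedged sketch of this matching step (a simplified variant of the part-affinity-field matching used by OpenPose-style methods; the function names, sampling scheme, and score threshold are assumptions): each candidate connection is scored by sampling the joint connection feature map along the segment between two candidate key points, and candidates are then paired greedily by descending score:

```python
import numpy as np

def connection_score(pafs, src, dst, channel, num_samples=10):
    """Score a candidate connection from src=(x1, y1) to dst=(x2, y2) by
    averaging the dot product between the limb direction and the PAF
    vectors sampled along the segment. pafs has shape (2*C, H, W):
    channels 2k and 2k+1 hold the x/y components of connection k."""
    (x1, y1), (x2, y2) = src, dst
    vec = np.array([x2 - x1, y2 - y1], dtype=float)
    norm = np.linalg.norm(vec)
    if norm == 0:
        return 0.0
    vec /= norm
    total = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x = int(round(x1 + t * (x2 - x1)))
        y = int(round(y1 + t * (y2 - y1)))
        paf_vec = np.array([pafs[2 * channel, y, x], pafs[2 * channel + 1, y, x]])
        total += float(np.dot(vec, paf_vec))
    return total / num_samples

def match_connections(src_points, dst_points, pafs, channel, min_score=0.3):
    """Greedily pair source and destination candidates by descending score."""
    scored = [(connection_score(pafs, s, d, channel), i, j)
              for i, s in enumerate(src_points)
              for j, d in enumerate(dst_points)]
    scored.sort(reverse=True)
    used_s, used_d, pairs = set(), set(), []
    for score, i, j in scored:
        if score >= min_score and i not in used_s and j not in used_d:
            pairs.append((i, j, score))
            used_s.add(i)
            used_d.add(j)
    return pairs
```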
  • Step S1032 matching the three-dimensional positions of the key points of each human body from the three-dimensional key point feature map of human body poses according to the two-dimensional positions of the key points of the human bodies.
  • the two-dimensional position of the key points of each human body can be determined from the two-dimensional human body pose key point feature map, and according to that two-dimensional position, the three-dimensional position of the corresponding key point can be determined from the three-dimensional human body pose key point feature map.
  • for example, suppose a key point in the two-dimensional human body pose key point feature map is the left eye, and the pixel coordinates of the left eye in that feature map are (3, 3).
  • the corresponding three-channel data (x, y, z) at pixel coordinates (3, 3) of the three-dimensional human body pose key point feature map is then obtained, and this three-channel data (x, y, z) is used as the three-dimensional position of that human body key point.
  • step S1032 includes:
  • the three-channel data corresponding to the target position is acquired in the three-dimensional human body posture key point feature map, and the three-channel data is used as the three-dimensional position of the human body key point of each human body.
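  • A minimal sketch of this lookup, assuming the 3×19-channel layout described above (the function and array names are hypothetical):

```python
import numpy as np

def lookup_3d(pose3d_maps, keypoint_idx, x, y):
    """Read the 3D position of one key point from the 3D feature map.

    pose3d_maps: array of shape (3*19, H, W); every 3 consecutive
    channels hold the (x, y, z) coordinates of one key point.
    (x, y): the key point's 2D pixel coordinates, e.g. the peak found
    in the corresponding 2D heat map channel.
    """
    c = 3 * keypoint_idx
    return (float(pose3d_maps[c, y, x]),
            float(pose3d_maps[c + 1, y, x]),
            float(pose3d_maps[c + 2, y, x]))
```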
  • the key point 1 in the first marking area 601 can be used as the center point of the human body, and the midpoint of the left hip joint 8 and the right hip joint 11 in the second marking area 602 can also be used as the center point of the human body.
  • every 3 channels represent the three-dimensional coordinates of one key point; by reading the three-channel data at the pixel coordinates (x, y) of key point 1 in the three-dimensional human body pose key point feature map, the three-dimensional coordinates of key point 1 can be obtained.
  • in this way, the position of the corresponding human body key point in the three-dimensional human body pose key point feature map can be determined, and the three-channel data at that position can be read to obtain the key point's three-dimensional position.
  • this solves the matching problem between the two-dimensional and three-dimensional positions of human body key points, obtaining both at the same time and reducing time consumption.
  • the human body pose estimation method also includes:
  • the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map are obtained by the two-dimensional human body pose estimation model through downsampling by a preset multiple.
  • the preset multiple is determined according to the required data accuracy and the amount of computation, and should both meet the accuracy requirement and shorten the computation time.
  • the preset multiple can be 4 times. For example, if the size of the input image is 512 ⁇ 512, the size of the two-dimensional human body posture key point feature map and the two-dimensional human joint connection feature map is 128 ⁇ 128.
  • the method also includes:
  • when only two-dimensional positions are needed, the inference of the three-dimensional human body pose estimation model can be skipped to reduce inference time: a two-dimensional key point position extraction instruction is sent to the output layer of the two-dimensional human body pose estimation model, the two-dimensional model performs the inference process for the two-dimensional positions of the human body key points and obtains the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, the two-dimensional positions of the key points are obtained from these two feature maps, and the inference process of the three-dimensional human body pose estimation model is prohibited, reducing the inference time for the three-dimensional positions of the human body key points.
  • step S102 the inputting the first feature data and the second feature data to the 3D human pose estimation model includes:
  • the combined result is input to the three-dimensional human pose estimation model.
  • the obtained merged result has additional channel data; for example, after first feature data of 3×19 channels and second feature data of 6×19 channels are merged by the concat function, feature data of 9×19 channels is obtained.
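  • A minimal sketch of the merge, with NumPy arrays standing in for the network's feature tensors (the channel counts follow the example above; the spatial size of 128×128 is an assumption):

```python
import numpy as np

# Hypothetical shapes: first feature data with 57 (3x19) channels and
# second feature data with 114 (6x19) channels, both 128x128.
first_features = np.zeros((57, 128, 128), dtype=np.float32)
second_features = np.zeros((114, 128, 128), dtype=np.float32)

# concat along the channel axis, as with a concat layer in the network
merged = np.concatenate([first_features, second_features], axis=0)
print(merged.shape)  # (171, 128, 128), i.e. 9x19 channels
```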
  • in this way, through the end-to-end human body pose estimation model, the two-dimensional and three-dimensional positions of human body key points can be detected at the same time, reducing time consumption.
  • An embodiment of the present disclosure provides a training method for a human pose estimation model.
  • the human body pose estimation model includes a two-dimensional human body pose estimation model to be trained and a three-dimensional human body pose estimation model to be trained, and the training method includes:
  • controlling and freezing the three-dimensional human body pose estimation model to be trained, and inputting two-dimensional human body key point data into the two-dimensional human body pose estimation model to be trained for training, to obtain a trained two-dimensional human body pose estimation model, wherein the two-dimensional human body key point data includes image data annotated with two-dimensional position information of human body key points;
  • controlling and freezing the two-dimensional human body pose estimation model to be trained, and inputting three-dimensional human body key point data into the three-dimensional human body pose estimation model to be trained for training, to obtain a trained three-dimensional human body pose estimation model, wherein the three-dimensional human body key point data includes image data annotated with three-dimensional position information of human body key points.
  • an end-to-end human body pose estimation model is constructed.
  • the human body pose estimation model includes a two-dimensional human body pose estimation model to be trained and a three-dimensional human body pose estimation model to be trained; the two-dimensional human body pose estimation model to be trained can be set with multiple interconnected nodes, including multiple activation (Relu) functions, multiple convolution (Conv) functions, and multiple addition (Add) functions, with corresponding parameters set at each node.
  • the parameters set at each node are adjusted to optimize the 2D human body pose estimation model.
  • the node connection relationship and parameter settings of the 2D human pose estimation model to be trained can be set according to the actual situation, and there is no limitation here.
  • the three-dimensional human pose estimation model to be trained is set with multiple nodes, and the nodes are connected to each other.
  • the nodes include multiple activation (Relu) functions, multiple convolution (Conv) functions, and multiple addition (Add) functions.
  • each node is set with corresponding parameters.
  • the parameters set at each node are adjusted to optimize the three-dimensional human body pose estimation model. It should be noted that the node connections and parameter settings of the 3D human body pose estimation model to be trained can be set according to actual conditions, and are not limited here.
  • when training the two-dimensional model, the network parameters of the three-dimensional human body pose estimation model to be trained are frozen, and the three-dimensional model does not perform inference learning;
  • when training the three-dimensional model, the network parameters of the two-dimensional human body pose estimation model to be trained are frozen, and the two-dimensional model does not perform inference learning.
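  • A schematic illustration of this alternating freeze-and-train scheme (a toy stand-in with a single parameter and a made-up update rule, not the patent's actual training code):

```python
class SubModel:
    """Toy stand-in for one sub-model; holds a single parameter."""
    def __init__(self, value):
        self.param = value
        self.frozen = False

    def step(self, grad, lr=0.5):
        # a frozen sub-model skips the update: it performs no learning
        if not self.frozen:
            self.param -= lr * grad

model_2d = SubModel(1.0)   # 2D pose estimation model to be trained
model_3d = SubModel(1.0)   # 3D pose estimation model to be trained

# Stage 1: freeze the 3D model, train the 2D model on 2D key point data
model_3d.frozen = True
model_2d.step(grad=1.0)    # 2D parameter updates: 1.0 -> 0.5
model_3d.step(grad=1.0)    # 3D parameter stays unchanged

# Stage 2: freeze the 2D model, train the 3D model on 3D key point data
model_3d.frozen = False
model_2d.frozen = True
model_3d.step(grad=1.0)    # 3D parameter updates: 1.0 -> 0.5
model_2d.step(grad=1.0)    # 2D parameter stays unchanged
```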
  • the training method of the human body pose estimation model provided in this embodiment can independently train the two-dimensional human body pose estimation model to be trained and the three-dimensional human body pose estimation model to be trained, thereby obtaining an end-to-end human body pose estimation model; through this end-to-end model, the two-dimensional and three-dimensional positions of human body key points can be detected at the same time, reducing time consumption.
  • an embodiment of the present disclosure provides a human body pose estimation device.
  • the human body pose estimation device 700 includes:
  • the first processing module 701 is used to extract the first feature data of the image, input the first feature data into the two-dimensional human body pose estimation model, and output the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the second feature data through the two-dimensional human body pose estimation model;
  • the second processing module 702 is configured to input the first feature data and the second feature data to the 3D human body pose estimation model, and output a 3D human body pose key point feature map through the 3D human body pose estimation model;
  • determining module 703, configured to determine, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map, and the three-dimensional positions of those key points.
  • the determining module 703 is further configured to determine, according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map;
  • the determining module 703 is further configured to determine a plurality of human body key points according to the two-dimensional human body posture key point feature map;
  • the determining module 703 is further configured to acquire, from the three-dimensional human body pose key point feature map, the target position that is the same as the two-dimensional position of the human body key points of each human body;
  • the three-channel data corresponding to the target position is acquired in the three-dimensional human body posture key point feature map, and the three-channel data is used as the three-dimensional position of the human body key point of each human body.
  • the first processing module 701 is further configured to obtain the two-dimensional human body posture key point feature map and the two-dimensional human body joint connection feature map by downsampling the two-dimensional human body pose estimation model according to a preset multiple.
  • the determining module 703 is further configured to, when receiving an instruction to extract the two-dimensional positions of human body key points, determine the two-dimensional positions of the human body key points of each human body according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map.
  • the second processing module 702 is configured to combine the first feature data and the second feature data to obtain a combined result;
  • the combined result is input to the three-dimensional human pose estimation model.
  • the human body pose estimation apparatus 700 provided in this embodiment can implement the human body pose estimation method provided in Embodiment 1, and to avoid repetition, details are not repeated here.
  • in this way, the human body pose estimation apparatus, through the end-to-end human body pose estimation model, can detect the two-dimensional and three-dimensional positions of human body key points at the same time, reducing time consumption.
  • an embodiment of the present disclosure provides a training device for a human pose estimation model.
  • the human body pose estimation model includes a two-dimensional human body pose estimation model to be trained and a three-dimensional human body pose estimation model to be trained, and the device includes:
  • the first control module is used to control and freeze the three-dimensional human body pose estimation model to be trained, and input two-dimensional human body key point data into the two-dimensional human body pose estimation model to be trained for training to obtain a trained two-dimensional human body pose estimation model, wherein the two-dimensional human body key point data includes image data marked with two-dimensional position information of human body key points;
  • the second control module is used to control and freeze the two-dimensional human body pose estimation model to be trained, and input three-dimensional human body key point data into the three-dimensional human body pose estimation model to be trained for training to obtain a trained three-dimensional human body pose estimation model, wherein the three-dimensional human body key point data includes image data marked with three-dimensional position information of human body key points.
  • the human body pose estimation model training device provided in this embodiment can realize the human body pose estimation model training method provided in Embodiment 2, and to avoid repetition, details are not repeated here.
  • the human body pose estimation model training device provided in this embodiment can independently train the two-dimensional human body pose estimation model to be trained and the three-dimensional human body pose estimation model to be trained in the human body pose estimation model, thereby obtaining an end-to-end human body pose estimation model; through the end-to-end human body pose estimation model, detection of the two-dimensional and three-dimensional positions of human body key points can be realized at the same time, reducing time overhead.
  • an embodiment of the present disclosure provides a terminal device, including a memory and a processor, where the memory stores a computer program, and the computer program, when run on the processor, executes the human body pose estimation method provided in the above method Embodiment 1, or the training method of the human body pose estimation model provided in Embodiment 2.
  • the terminal device provided in this embodiment can implement the human body pose estimation method provided in Embodiment 1, or the human body pose estimation model training method provided in Embodiment 2. To avoid repetition, details are not repeated here.
  • the present application also provides a computer-readable storage medium, which stores a computer program, and the computer program, when run on a processor, executes the human body pose estimation method provided in Embodiment 1, or the training method of the human body pose estimation model provided in Embodiment 2.
  • the computer-readable storage medium provided in this embodiment can implement the human body pose estimation method provided in Embodiment 1, or the human body pose estimation model training method provided in Embodiment 2. To avoid repetition, details are not repeated here.
  • the computer-readable storage medium may be a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), a magnetic disk or an optical disk, and the like.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and contains several instructions to enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of the present application.


Abstract

Embodiments of the present application provide a human body pose estimation method, apparatus, and terminal device. The method includes: extracting first feature data of an image, inputting the first feature data into a two-dimensional human body pose estimation model, and outputting, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data; inputting the first feature data and the second feature data into a three-dimensional human body pose estimation model, and outputting a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and determining, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body. In this way, detection of the two-dimensional and three-dimensional positions of human body key points can be realized at the same time, reducing time overhead.

Description

Human body pose estimation method, apparatus, and terminal device
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 2021106559777, entitled "Human body pose estimation method, apparatus, and terminal device", filed with the Chinese Patent Office on June 11, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the technical field of human-computer interaction, and in particular to a human body pose estimation method, apparatus, and terminal device.
BACKGROUND
With the gradual popularization of intelligent control and human-computer interaction technologies, interaction between humans and machines has become increasingly frequent, and the need to use machines to analyze people's emotions, behaviors, and the like has become increasingly urgent. Behavior recognition may include recognizing classroom behaviors in educational scenarios, such as hand raising, dozing, standing, and inattention, and may also include recognizing behaviors in the security field, such as fighting, drowning, and calling for help.
In the process of recognizing human behavior, analysis of human poses and movements is of particular importance. Human body detection alone cannot provide a detailed analysis of human poses, so it is necessary to obtain the motion state of the human skeleton. Current human body pose estimation can be divided into multi-person pose estimation and single-person pose estimation, each of which in turn includes 2D human body pose estimation and 3D human body pose estimation.
In the prior art, single-person human body pose estimation uses a human body detector to detect the position of each human body in an image, and then locates key points within the detected rectangular region of each human body.
In the prior art, 2D human body pose estimation can roughly be divided into two approaches. One is the top-down approach, typified by AlphaPose: a human body detector first detects all human bodies in an image, and single-person pose estimation is then performed on each human body. The top-down approach places high demands on the accuracy of human body detection, which largely determines the accuracy of the key points, and the overall time overhead grows with the number of people. The other is the bottom-up approach, typified by Openpose: the positions of all key points of all people are first detected from the full image, and the points are then assigned to each person. The advantage of this approach is that the number of people in the image does not affect inference speed, but its accuracy is slightly lower than that of the top-down approach. Multi-person 3D human body pose estimation predicts coordinate positions in the camera coordinate system from the image, or takes one of the body's own key points as the origin and computes the spatial positions of the other key points relative to that origin. The solutions provided by the prior art can only implement 2D human body pose estimation or 3D human body pose estimation separately, cannot implement both at the same time, and therefore suffer from a relatively large time overhead for human body pose estimation.
SUMMARY
In order to solve the above technical problems, embodiments of the present application provide a human body pose estimation method, a training method for a human body pose estimation model, an apparatus, a terminal device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a human body pose estimation method, the method including:
extracting first feature data of an image, inputting the first feature data into a two-dimensional human body pose estimation model, and outputting, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data;
inputting the first feature data and the second feature data into a three-dimensional human body pose estimation model, and outputting a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and
determining, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body.
In a second aspect, an embodiment of the present application provides a training method for a human body pose estimation model, the human body pose estimation model including a two-dimensional human body pose estimation model to be trained and a three-dimensional human body pose estimation model to be trained, the method including:
freezing the three-dimensional human body pose estimation model to be trained, and inputting two-dimensional human body key point data into the two-dimensional human body pose estimation model to be trained for training to obtain a trained two-dimensional human body pose estimation model, where the two-dimensional human body key point data includes image data annotated with two-dimensional position information of human body key points; and
freezing the two-dimensional human body pose estimation model to be trained, and inputting three-dimensional human body key point data into the three-dimensional human body pose estimation model to be trained for training to obtain a trained three-dimensional human body pose estimation model, where the three-dimensional human body key point data includes image data annotated with three-dimensional position information of human body key points.
In a third aspect, an embodiment of the present application provides a human body pose estimation apparatus, the apparatus including:
a first processing module, configured to extract first feature data of an image, input the first feature data into a two-dimensional human body pose estimation model, and output, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data;
a second processing module, configured to input the first feature data and the second feature data into a three-dimensional human body pose estimation model, and output a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and
a determining module, configured to determine, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body.
In a fourth aspect, an embodiment of the present application provides a terminal device, including a memory and a processor, where the memory is configured to store a computer program, and the computer program, when run on the processor, executes the human body pose estimation method provided in the first aspect or the training method for a human body pose estimation model provided in the second aspect.
The human body pose estimation method provided by the present application extracts first feature data of an image, inputs the first feature data into a two-dimensional human body pose estimation model, and outputs, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data; inputs the first feature data and the second feature data into a three-dimensional human body pose estimation model, and outputs a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and determines, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body. In this way, through the end-to-end human body pose estimation model, detection of the two-dimensional and three-dimensional positions of human body key points can be realized at the same time, reducing time overhead.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions of the present application more clearly, the accompanying drawings required in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting its scope. In the drawings, similar components are denoted by similar reference numerals.
Fig. 1 is a schematic flowchart of a human body pose estimation method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a human body pose estimation model provided by an embodiment of the present application;
Fig. 3 is another schematic structural diagram of a human body pose estimation model provided by an embodiment of the present application;
Fig. 4 is a schematic flowchart of step S103 of the human body pose estimation method provided by an embodiment of the present application;
Fig. 5 is a schematic flowchart of step S1031 of the human body pose estimation method provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of human body joint connections provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a human body pose estimation apparatus provided by an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application.
The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Hereinafter, the terms "include", "have", and their cognates, as may be used in various embodiments of the present application, are intended only to denote particular features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be understood as first excluding the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
In addition, the terms "first", "second", "third", and the like are used only to distinguish descriptions and shall not be understood as indicating or implying relative importance.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meanings as commonly understood by those of ordinary skill in the art to which the various embodiments of the present application belong. These terms (such as those defined in commonly used dictionaries) shall be interpreted as having meanings consistent with their contextual meanings in the relevant technical field and shall not be interpreted as having idealized or overly formal meanings, unless clearly defined in the various embodiments of the present application.
Embodiment 1
An embodiment of the present disclosure provides a human body pose estimation method.
Specifically, referring to Fig. 1, the human body pose estimation method includes:
Step S101: extracting first feature data of an image, inputting the first feature data into a two-dimensional human body pose estimation model, and outputting, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data.
In this embodiment, an end-to-end human body pose estimation model is constructed. Referring to Fig. 2, the human body pose estimation model includes a backbone network model 202, a two-dimensional human body pose estimation model 203, and a three-dimensional human body pose estimation model 205. The backbone network model 202 may be a lightweight or heavyweight deep neural network model, which is not limited here. The backbone network model 202 is connected to the two-dimensional human body pose estimation model 203 and the three-dimensional human body pose estimation model 205, respectively, and the two-dimensional human body pose estimation model 203 is connected to the three-dimensional human body pose estimation model 205. A first output result 204 includes the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map. Referring to Fig. 2, first feature data of an image 201 may be extracted. The image 201, captured by a camera, contains multiple human bodies; the human body images in the image 201 are only illustrative, and images actually captured may take other forms, which is not limited here.
In this embodiment, for the specific structure of the two-dimensional human body pose estimation model, reference may be made to the two-dimensional human body pose estimation model 301 in Fig. 3. The two-dimensional human body pose estimation model 301 is provided with multiple interconnected nodes, including multiple activation (Relu) functions, multiple convolution (Conv) functions, and multiple addition (Add) functions, with corresponding parameters set for each node. It should be noted that the two-dimensional human body pose estimation model 301 is only illustrative; in specific cases, the node connection relationships and parameter settings may differ, which is not limited here. The two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map may respectively be a two-dimensional human body pose key point heatmap and a two-dimensional human body joint connection heatmap. For example, in Fig. 3, the heatmaps output by the two-dimensional human body pose estimation model 301 are the two-dimensional human body pose key point heatmaps, and the pafs are the two-dimensional human body joint connection heatmaps.
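The heatmap representation described above encodes each key point as a peak in its channel. The following is a minimal sketch, assuming a Gaussian rendering of the peak (the `sigma` value is an illustrative choice, not specified in this application):

```python
import numpy as np

def keypoint_heatmap(h, w, cx, cy, sigma=2.0):
    """Render one key point as a Gaussian peak; the channel's maximum
    marks the key point's pixel position (sigma is illustrative)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

hm = keypoint_heatmap(128, 128, cx=40, cy=25)
peak = np.unravel_index(np.argmax(hm), hm.shape)
print(peak)  # (25, 40) -> row y=25, column x=40
```

Recovering the key point then reduces to locating the per-channel maximum, as described for step S1031 below.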
Step S102: inputting the first feature data and the second feature data into a three-dimensional human body pose estimation model, and outputting a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model.
In this embodiment, for the specific structure of the three-dimensional human body pose estimation model, reference may be made to the three-dimensional human body pose estimation model 302 in Fig. 3. The three-dimensional human body pose estimation model 302 is provided with multiple interconnected nodes, including multiple activation (Relu) functions, multiple convolution (Conv) functions, and multiple addition (Add) functions, with corresponding parameters set for each node. It should be noted that the three-dimensional human body pose estimation model 302 only illustratively shows one convolution (Conv) function and may further include other nodes; in specific cases, the number of nodes, node types, node connection relationships, and parameters may be set according to actual conditions, which is not limited here. Referring again to Fig. 2, a second output result 206 includes the three-dimensional human body pose key point feature map. In this embodiment, the three-dimensional human body pose key point feature map is a heatmap with 3×19 channels, where 3×18 channels correspond to the 18 key points of the human body and 3×1 channels correspond to the background map.
Step S103: determining, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body.
In this way, through the end-to-end human body pose estimation model, detection of the two-dimensional and three-dimensional positions of human body key points can be realized at the same time, reducing time overhead.
Optionally, referring to Fig. 4, step S103 includes:
Step S1031: determining, according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map.
In this embodiment, the image input into the end-to-end human body pose estimation model may contain multiple human bodies. To correctly assign the key points of each human body, the key points need to be matched by combining the two-dimensional human body pose key point feature map with the two-dimensional human body joint connection feature map.
Optionally, referring to Fig. 5, step S1031 includes:
Step S10311: determining multiple human body key points according to the two-dimensional human body pose key point feature map;
Step S10312: determining multiple joint connection relationships according to the two-dimensional human body joint connection feature map;
Step S10313: matching the multiple human body key points with the multiple joint connection relationships to determine the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map.
In this embodiment, the two-dimensional human body pose key point feature map is a 19-channel feature map, in which 18 channels correspond to the key points and 1 channel corresponds to the background map. The position of the peak of each channel in the two-dimensional human body pose key point feature map corresponds to a human body key point.
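The per-channel peak extraction described above can be sketched as follows. The confidence threshold is an illustrative assumption, not a value from this application:

```python
import numpy as np

def extract_keypoints(feature_map, threshold=0.1):
    """feature_map: (19, H, W); channels 0..17 are key points, channel 18
    is the background map. Returns {channel_index: (x, y)} for each
    channel whose peak exceeds the (illustrative) confidence threshold."""
    points = {}
    for c in range(feature_map.shape[0] - 1):  # skip the background channel
        channel = feature_map[c]
        y, x = np.unravel_index(np.argmax(channel), channel.shape)
        if channel[y, x] > threshold:
            points[c] = (int(x), int(y))
    return points

fmap = np.zeros((19, 128, 128))
fmap[0, 25, 40] = 0.9  # key point 0 peaks at pixel (x=40, y=25)
print(extract_keypoints(fmap))  # {0: (40, 25)}
```

A multi-person image would produce several peaks per channel; a full implementation would apply non-maximum suppression per channel rather than a single argmax.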
Referring to Fig. 6, the joint connection feature map in Fig. 6 includes the 18 key points of the human body and the connection relationships between adjacent key points. The 18 key points are numbered from 0 to 17. In Fig. 6, each pair of adjacent key points can represent two joint connections; for example, for key point 2 and key point 3 in Fig. 6, the joint connection extending from key point 3 toward key point 2 and the joint connection extending from key point 2 toward key point 3 are different joint connections. According to the joint connection relationships in the joint connection feature map, the multiple key points in the two-dimensional human body pose key point feature map can be matched, so that all key points belonging to the same human body are matched together; based on all key points of the same human body, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map are determined.
In this way, the human body key points in the two-dimensional human body pose key point feature map can be quickly partitioned, the key points belonging to the same human body can be identified, and the two-dimensional positions of the key points of a single human body can then be obtained.
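The grouping step above can be sketched with a union-find pass over already-matched connections. This is a simplified illustration: a real part-affinity-field matcher also scores and selects candidate connections before grouping, which is omitted here:

```python
def group_keypoints(detections, connections):
    """detections: list of (keypoint_id, x, y) candidates, indexed 0..n-1.
    connections: list of (i, j) detection-index pairs already matched as
    joint connections. Union-find merges connected detections so that all
    key points of the same person end up in one group."""
    parent = list(range(len(detections)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for i, j in connections:
        parent[find(i)] = find(j)

    persons = {}
    for idx, det in enumerate(detections):
        persons.setdefault(find(idx), []).append(det)
    return list(persons.values())

# Two people, each with a nose (id 0) and a neck (id 1) detection.
dets = [(0, 10, 12), (1, 11, 30), (0, 80, 14), (1, 82, 33)]
links = [(0, 1), (2, 3)]  # nose-neck connection matched per person
print(len(group_keypoints(dets, links)))  # 2
```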
Step S1032: matching, according to the two-dimensional positions of the human body key points of each human body, the three-dimensional positions of the human body key points of each human body from the three-dimensional human body pose key point feature map.
In this embodiment, the two-dimensional positions of the human body key points of each human body, and the human body to which each key point belongs, can be determined from the two-dimensional human body pose key point feature map. Based on the coordinate position of each key point in the two-dimensional human body pose key point feature map, the three-dimensional position of the corresponding key point is determined from the three-dimensional human body pose key point feature map. For example, suppose a key point in the two-dimensional human body pose key point feature map is the left eye, and the pixel coordinates of the left eye in that feature map are (3, 3). The corresponding three-channel data (x, y, z) is obtained at pixel coordinates (3, 3) of the three-dimensional human body pose key point feature map, and the three-channel data (x, y, z) is taken as the three-dimensional position of the human body key point.
Optionally, step S1032 includes:
obtaining, from the three-dimensional human body pose key point feature map, a target position identical to the two-dimensional position of each human body key point; and
obtaining, in the three-dimensional human body pose key point feature map, the three-channel data corresponding to the target position, and taking the three-channel data as the three-dimensional position of the human body key point of each human body.
Referring again to Fig. 6, key point 1 in a first marked region 601 may be taken as the human body center point, or the midpoint between the left hip joint 8 and the right hip joint 11 in a second marked region 602 may be taken as the human body center point. In Fig. 6, taking key point 1 as an example, if the pixel coordinates of key point 1 in the two-dimensional human body pose key point feature map are (x, y), the target position with the same pixel coordinates (x, y) is obtained from the three-dimensional human body pose key point feature map; that is, the position at pixel coordinates (x, y) in the three-dimensional human body pose key point feature map also corresponds to key point 1. Since the three-dimensional human body pose key point feature map has 3×19 channels of data, with every 3 channels representing the three-dimensional coordinates of one key point, reading the three-channel data of key point 1 at pixel coordinates (x, y) in the three-dimensional human body pose key point feature map yields the three-dimensional coordinates of key point 1.
In this way, based on the two-dimensional position of a human body key point in the two-dimensional human body pose key point feature map, the position of the corresponding key point in the three-dimensional human body pose key point feature map can be determined, and the three-channel data at that position can be read to obtain the three-dimensional position of the key point. This solves the problem of matching the two-dimensional and three-dimensional positions of human body key points, obtains both at the same time, and reduces time overhead.
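The three-channel lookup above can be sketched as follows, assuming the 3×19-channel layout described in this embodiment (every 3 consecutive channels hold one key point's coordinates):

```python
import numpy as np

def keypoint_3d(map3d, kp_index, x, y):
    """map3d: (3*19, H, W) three-dimensional pose key point feature map.
    Reads the three-channel data at 2D pixel position (x, y) for the
    given key point index and returns it as an (x, y, z) triple."""
    c = 3 * kp_index
    return tuple(map3d[c:c + 3, y, x])

map3d = np.zeros((57, 128, 128))
map3d[3:6, 25, 40] = [0.2, -0.1, 1.5]  # key point 1 at pixel (40, 25)
print(keypoint_3d(map3d, 1, x=40, y=25))  # (0.2, -0.1, 1.5)
```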
It should be added that the human body pose estimation method further includes:
obtaining the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map by downsampling by a preset factor through the two-dimensional human body pose estimation model.
In this embodiment, the preset factor is determined according to data accuracy and computation cost, so as to meet the accuracy requirement while keeping computation time short. For example, the preset factor may be 4: if the size of the input image is 512×512, the sizes of the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map are 128×128.
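The 4× downsampling example above implies a fixed mapping between input-image pixels and feature-map cells. A minimal sketch follows; the simple multiply/divide mapping is an assumption, since exact alignment depends on the network's padding and stride choices:

```python
# Preset downsampling factor: 512x512 input -> 128x128 feature map.
STRIDE = 4

def to_feature(x, y):
    """Map an input-image pixel to its feature-map cell."""
    return x // STRIDE, y // STRIDE

def to_image(fx, fy):
    """Map a feature-map cell back to input-image pixels."""
    return fx * STRIDE, fy * STRIDE

assert to_feature(511, 511) == (127, 127)   # corner stays in bounds
assert to_image(*to_feature(200, 300)) == (200, 300)
```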
It should further be added that the method also includes:
upon receiving an instruction for extracting the two-dimensional positions of human body key points, determining, according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map.
In this embodiment, to ensure the flexibility of the end-to-end human body pose estimation model, when only the two-dimensional positions of human body key points are needed, the three-dimensional human body pose estimation model can skip inference, reducing time overhead. An instruction for extracting the two-dimensional positions of human body key points may be sent to the output layer of the two-dimensional human body pose estimation model; the two-dimensional human body pose estimation model then performs the inference process for the two-dimensional positions of the human body key points, obtains the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, and obtains the two-dimensional positions of the human body key points from these feature maps, while the inference process of the three-dimensional human body pose estimation model is disabled, saving the inference time for the three-dimensional positions of the human body key points.
It should further be added that, in step S102, inputting the first feature data and the second feature data into the three-dimensional human body pose estimation model includes:
combining the first feature data and the second feature data to obtain a combined result; and
inputting the combined result into the three-dimensional human body pose estimation model.
In this embodiment, after the first feature data and the second feature data are combined through a concat function, the resulting combined data has an increased number of channels. For example, first feature data with 3×19 channels and second feature data with 6×19 channels become feature data with 9×19 channels after being combined through the concat function.
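The channel merge above can be illustrated with NumPy's `concatenate` standing in for the concat function (the 128×128 spatial size is carried over from the earlier downsampling example):

```python
import numpy as np

# First feature data: 3x19 = 57 channels; second feature data: 6x19 = 114
# channels. Concatenating along the channel axis yields 9x19 = 171 channels.
first = np.zeros((3 * 19, 128, 128))
second = np.zeros((6 * 19, 128, 128))
merged = np.concatenate([first, second], axis=0)
print(merged.shape)  # (171, 128, 128)
```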
The human body pose estimation method provided in this embodiment extracts first feature data of an image, inputs the first feature data into a two-dimensional human body pose estimation model, and outputs, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data; inputs the first feature data and the second feature data into a three-dimensional human body pose estimation model, and outputs a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and determines, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body. In this way, through the end-to-end human body pose estimation model, detection of the two-dimensional and three-dimensional positions of human body key points can be realized at the same time, reducing time overhead.
Embodiment 2
An embodiment of the present disclosure provides a training method for a human body pose estimation model.
Specifically, the human body pose estimation model includes a two-dimensional human body pose estimation model to be trained and a three-dimensional human body pose estimation model to be trained, and the training method includes:
freezing the three-dimensional human body pose estimation model to be trained, and inputting two-dimensional human body key point data into the two-dimensional human body pose estimation model to be trained for training to obtain a trained two-dimensional human body pose estimation model, where the two-dimensional human body key point data includes image data annotated with two-dimensional position information of human body key points; and
freezing the two-dimensional human body pose estimation model to be trained, and inputting three-dimensional human body key point data into the three-dimensional human body pose estimation model to be trained for training to obtain a trained three-dimensional human body pose estimation model, where the three-dimensional human body key point data includes image data annotated with three-dimensional position information of human body key points.
In this embodiment, an end-to-end human body pose estimation model is constructed, including the two-dimensional human body pose estimation model to be trained and the three-dimensional human body pose estimation model to be trained. The two-dimensional human body pose estimation model to be trained may be provided with multiple interconnected nodes, including multiple activation (Relu) functions, multiple convolution (Conv) functions, and multiple addition (Add) functions, with corresponding parameters set for each node. During training of the two-dimensional human body pose estimation model to be trained, the parameters set at each node are adjusted to optimize the two-dimensional human body pose estimation model. It should be noted that the node connection relationships and parameter settings of the two-dimensional human body pose estimation model to be trained may be set according to actual conditions, which is not limited here.
The three-dimensional human body pose estimation model to be trained is likewise provided with multiple interconnected nodes, including multiple activation (Relu) functions, multiple convolution (Conv) functions, and multiple addition (Add) functions, with corresponding parameters set for each node. During training of the three-dimensional human body pose estimation model to be trained, the parameters set at each node are adjusted to optimize the three-dimensional human body pose estimation model. It should be noted that the node connection relationships and parameter settings of the three-dimensional human body pose estimation model to be trained may be set according to actual conditions, which is not limited here.
In this embodiment, when the two-dimensional human body pose estimation model to be trained is trained with the two-dimensional human body key point data, the network parameters of the three-dimensional human body pose estimation model to be trained are frozen, and that model does not perform inference learning; when the three-dimensional human body pose estimation model to be trained is trained with the three-dimensional human body key point data, the network parameters of the two-dimensional human body pose estimation model to be trained are frozen, and that model does not perform inference learning.
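The alternating freeze-and-train scheme above can be illustrated schematically. The toy update loop below only mimics the effect of freezing by skipping gradient updates for the frozen branch; a real implementation would mark the frozen branch's parameters as non-trainable in the training framework (e.g. `requires_grad=False` in PyTorch — an assumption about tooling, not stated in this application):

```python
import numpy as np

def sgd_step(params, grads, frozen, lr=0.1):
    """One gradient step that skips every frozen sub-model, mimicking
    freezing one branch while the other trains (schematic only)."""
    return {name: (p if name in frozen else p - lr * grads[name])
            for name, p in params.items()}

params = {"pose2d": np.ones(3), "pose3d": np.ones(3)}
grads = {"pose2d": np.full(3, 0.5), "pose3d": np.full(3, 0.5)}

# Phase 1: train the 2D branch with the 3D branch frozen.
params = sgd_step(params, grads, frozen={"pose3d"})
print(params["pose2d"])  # updated: [0.95 0.95 0.95]
print(params["pose3d"])  # unchanged: [1. 1. 1.]
```

Phase 2 would then freeze `"pose2d"` instead and update `"pose3d"` with 3D key point data.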
The training method for a human body pose estimation model provided in this embodiment can train the two-dimensional human body pose estimation model to be trained and the three-dimensional human body pose estimation model to be trained in the human body pose estimation model separately, thereby obtaining an end-to-end human body pose estimation model. Through the end-to-end human body pose estimation model, detection of the two-dimensional and three-dimensional positions of human body key points can be realized at the same time, reducing time overhead.
Embodiment 3
In addition, an embodiment of the present disclosure provides a human body pose estimation apparatus.
Specifically, as shown in Fig. 7, the human body pose estimation apparatus 700 includes:
a first processing module 701, configured to extract first feature data of an image, input the first feature data into a two-dimensional human body pose estimation model, and output, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data;
a second processing module 702, configured to input the first feature data and the second feature data into a three-dimensional human body pose estimation model, and output a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and
a determining module 703, configured to determine, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body.
Optionally, the determining module 703 is further configured to determine, according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map; and
match, according to the two-dimensional positions of the human body key points of each human body, the three-dimensional positions of the human body key points of each human body from the three-dimensional human body pose key point feature map.
Optionally, the determining module 703 is further configured to determine multiple human body key points according to the two-dimensional human body pose key point feature map;
determine multiple joint connection relationships according to the two-dimensional human body joint connection feature map; and
match the multiple human body key points with the multiple joint connection relationships to determine the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map.
Optionally, the determining module 703 is further configured to obtain, from the three-dimensional human body pose key point feature map, a target position identical to the two-dimensional position of each human body key point; and
obtain, in the three-dimensional human body pose key point feature map, the three-channel data corresponding to the target position, and take the three-channel data as the three-dimensional position of the human body key point of each human body.
Optionally, the first processing module 701 is further configured to obtain the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map by downsampling by a preset factor through the two-dimensional human body pose estimation model.
Optionally, the determining module 703 is further configured to determine, upon receiving an instruction for extracting the two-dimensional positions of human body key points, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map.
Optionally, the second processing module 702 is configured to combine the first feature data and the second feature data to obtain a combined result; and
input the combined result into the three-dimensional human body pose estimation model.
The human body pose estimation apparatus 700 provided in this embodiment can implement the human body pose estimation method provided in Embodiment 1; to avoid repetition, details are not repeated here.
The human body pose estimation method provided in this embodiment extracts first feature data of an image, inputs the first feature data into a two-dimensional human body pose estimation model, and outputs, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data; inputs the first feature data and the second feature data into a three-dimensional human body pose estimation model, and outputs a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and determines, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body. In this way, through the end-to-end human body pose estimation model, detection of the two-dimensional and three-dimensional positions of human body key points can be realized at the same time, reducing time overhead.
Embodiment 4
In addition, an embodiment of the present disclosure provides a training apparatus for a human body pose estimation model.
Specifically, the human body pose estimation model includes a two-dimensional human body pose estimation model to be trained and a three-dimensional human body pose estimation model to be trained, and the apparatus includes:
a first control module, configured to freeze the three-dimensional human body pose estimation model to be trained, and input two-dimensional human body key point data into the two-dimensional human body pose estimation model to be trained for training to obtain a trained two-dimensional human body pose estimation model, where the two-dimensional human body key point data includes image data annotated with two-dimensional position information of human body key points; and
a second control module, configured to freeze the two-dimensional human body pose estimation model to be trained, and input three-dimensional human body key point data into the three-dimensional human body pose estimation model to be trained for training to obtain a trained three-dimensional human body pose estimation model, where the three-dimensional human body key point data includes image data annotated with three-dimensional position information of human body key points.
The training apparatus for a human body pose estimation model provided in this embodiment can implement the training method for a human body pose estimation model provided in Embodiment 2; to avoid repetition, details are not repeated here.
The training apparatus for a human body pose estimation model provided in this embodiment can train the two-dimensional human body pose estimation model to be trained and the three-dimensional human body pose estimation model to be trained in the human body pose estimation model separately, thereby obtaining an end-to-end human body pose estimation model. Through the end-to-end human body pose estimation model, detection of the two-dimensional and three-dimensional positions of human body key points can be realized at the same time, reducing time overhead.
Embodiment 5
In addition, an embodiment of the present disclosure provides a terminal device, including a memory and a processor, where the memory stores a computer program, and the computer program, when run on the processor, executes the human body pose estimation method provided in the above method Embodiment 1 or the training method for a human body pose estimation model provided in Embodiment 2.
The terminal device provided in this embodiment can implement the human body pose estimation method provided in Embodiment 1 or the training method for a human body pose estimation model provided in Embodiment 2; to avoid repetition, details are not repeated here.
Embodiment 6
The present application further provides a computer-readable storage medium storing a computer program which, when run on a processor, executes the human body pose estimation method provided in Embodiment 1 or the training method for a human body pose estimation model provided in Embodiment 2.
The computer-readable storage medium provided in this embodiment can implement the human body pose estimation method provided in Embodiment 1 or the training method for a human body pose estimation model provided in Embodiment 2; to avoid repetition, details are not repeated here.
In this embodiment, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
It should be noted that, herein, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal including a series of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or terminal that includes the element.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above specific implementations, which are merely illustrative rather than restrictive. Under the inspiration of the present application, those of ordinary skill in the art can make many other forms without departing from the spirit of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (10)

  1. A human body pose estimation method, characterized in that the method comprises:
    extracting first feature data of an image, inputting the first feature data into a two-dimensional human body pose estimation model, and outputting, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data;
    inputting the first feature data and the second feature data into a three-dimensional human body pose estimation model, and outputting a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and
    determining, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body.
  2. The method according to claim 1, characterized in that the determining, according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, of the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body comprises:
    determining, according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map; and
    matching, according to the two-dimensional positions of the human body key points of each human body, the three-dimensional positions of the human body key points of each human body from the three-dimensional human body pose key point feature map.
  3. The method according to claim 2, characterized in that the determining, according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, of the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map comprises:
    determining multiple human body key points according to the two-dimensional human body pose key point feature map;
    determining multiple joint connection relationships according to the two-dimensional human body joint connection feature map; and
    matching the multiple human body key points with the multiple joint connection relationships to determine the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map.
  4. The method according to claim 2, characterized in that the matching, according to the two-dimensional positions of the human body key points of each human body, of the three-dimensional positions of the human body key points of each human body from the three-dimensional human body pose key point feature map further comprises:
    obtaining, from the three-dimensional human body pose key point feature map, a target position identical to the two-dimensional position of each human body key point; and
    obtaining, in the three-dimensional human body pose key point feature map, the three-channel data corresponding to the target position, and taking the three-channel data as the three-dimensional position of the human body key point of each human body.
  5. The method according to claim 1, characterized in that the method further comprises:
    obtaining the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map by downsampling by a preset factor through the two-dimensional human body pose estimation model.
  6. The method according to claim 1, characterized in that the method further comprises:
    upon receiving an instruction for extracting the two-dimensional positions of human body key points, determining, according to the two-dimensional human body pose key point feature map and the two-dimensional human body joint connection feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map.
  7. The method according to claim 1, characterized in that the inputting of the first feature data and the second feature data into the three-dimensional human body pose estimation model comprises:
    combining the first feature data and the second feature data to obtain a combined result; and
    inputting the combined result into the three-dimensional human body pose estimation model.
  8. A training method for a human body pose estimation model, characterized in that the human body pose estimation model comprises a two-dimensional human body pose estimation model to be trained and a three-dimensional human body pose estimation model to be trained, and the method comprises:
    freezing the three-dimensional human body pose estimation model to be trained, and inputting two-dimensional human body key point data into the two-dimensional human body pose estimation model to be trained for training to obtain a trained two-dimensional human body pose estimation model, wherein the two-dimensional human body key point data comprises image data annotated with two-dimensional position information of human body key points; and
    freezing the two-dimensional human body pose estimation model to be trained, and inputting three-dimensional human body key point data into the three-dimensional human body pose estimation model to be trained for training to obtain a trained three-dimensional human body pose estimation model, wherein the three-dimensional human body key point data comprises image data annotated with three-dimensional position information of human body key points.
  9. A human body pose estimation apparatus, characterized in that the apparatus comprises:
    a first processing module, configured to extract first feature data of an image, input the first feature data into a two-dimensional human body pose estimation model, and output, through the two-dimensional human body pose estimation model, a two-dimensional human body pose key point feature map, a two-dimensional human body joint connection feature map, and second feature data;
    a second processing module, configured to input the first feature data and the second feature data into a three-dimensional human body pose estimation model, and output a three-dimensional human body pose key point feature map through the three-dimensional human body pose estimation model; and
    a determining module, configured to determine, according to the two-dimensional human body pose key point feature map, the two-dimensional human body joint connection feature map, and the three-dimensional human body pose key point feature map, the two-dimensional positions of the human body key points of each human body in the two-dimensional human body pose key point feature map and the three-dimensional positions of the human body key points of each human body.
  10. A terminal device, characterized by comprising a memory and a processor, wherein the memory stores a computer program, and the computer program, when run on the processor, executes the human body pose estimation method according to any one of claims 1 to 7 or the training method for a human body pose estimation model according to claim 8.
PCT/CN2021/134498 2021-06-11 2021-11-30 Human body pose estimation method, apparatus, and terminal device WO2022257378A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110655977.7A CN113298922B (zh) 2021-06-11 2021-06-11 Human body pose estimation method, apparatus, and terminal device
CN202110655977.7 2021-06-11

Publications (1)

Publication Number Publication Date
WO2022257378A1 true WO2022257378A1 (zh) 2022-12-15

Family

ID=77328131

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134498 WO2022257378A1 (zh) 2021-06-11 2021-11-30 Human body pose estimation method, apparatus, and terminal device

Country Status (2)

Country Link
CN (1) CN113298922B (zh)
WO (1) WO2022257378A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298922B (zh) * 2021-06-11 2023-08-29 UBTECH Robotics Corp Ltd Human body pose estimation method, apparatus, and terminal device
CN114724177B (zh) * 2022-03-08 2023-04-07 China Three Gorges University Human drowning detection method combining the Alphapose and YOLOv5s models

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130121526A1 (en) * 2011-11-11 2013-05-16 Microsoft Corporation Computing 3d shape parameters for face animation
CN108460338A (zh) * 2018-02-02 2018-08-28 Beijing SenseTime Technology Development Co., Ltd. Human body pose estimation method and apparatus, electronic device, storage medium, and program
CN110020633A (zh) * 2019-04-12 2019-07-16 Tencent Technology (Shenzhen) Co., Ltd. Training method for a pose recognition model, image recognition method, and apparatus
CN113298922A (zh) * 2021-06-11 2021-08-24 UBTECH Robotics Corp Ltd Human body pose estimation method, apparatus, and terminal device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836618B (zh) * 2021-01-28 2023-10-20 Tsinghua Shenzhen International Graduate School Three-dimensional human body pose estimation method and computer-readable storage medium


Also Published As

Publication number Publication date
CN113298922B (zh) 2023-08-29
CN113298922A (zh) 2021-08-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21944874

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE