CN115830642B

CN115830642B - 2D whole body human body key point labeling method and 3D human body grid labeling method

Info

Publication number: CN115830642B
Application number: CN202310104173.7A
Authority: CN
Inventors: 林靖; 曾爱玲; 李昱; 张磊
Original assignee: International Digital Economy Academy IDEA
Current assignee: International Digital Economy Academy IDEA
Priority date: 2023-02-13
Filing date: 2023-02-13
Publication date: 2024-01-12
Anticipated expiration: 2043-02-13
Also published as: CN115830642A

Abstract

The application discloses a labeling method of 2D whole-body human body key points and a labeling method of 3D human body grids, which comprises the steps of obtaining whole-body human body key points, obtaining a face region graph according to human face key points in the whole-body human body key points, and adjusting the human face key points in the whole-body human body key points based on candidate human face key points determined by the face region graph; and acquiring a hand region graph in the input image according to the hand key points in the whole-body human key points, and adjusting the hand key points in the whole-body human key points based on the candidate hand key points determined by the hand region graph. According to the method and the device, the whole-body human body key points are determined according to the human body areas, the hand area diagram and the face area diagram which are determined based on the whole-body human body key points are used for adjusting the hand key points and the face key points, so that 2D whole-body human body key point label marking can be automatically carried out on any data set obtained, manual marking is not needed on the data set, and the labor cost required by 2D whole-body human body key point marking of the data set is reduced.

Description

2D whole body human body key point labeling method and 3D human body grid labeling method

Technical Field

The application relates to the technical field of machine vision, in particular to a 2D whole-body human body key point labeling method and a 3D human body grid labeling method.

Background

In recent years, thanks to parameterized models such as SMPL [1] and SMPL-X [2], human body mesh surface reconstruction has developed rapidly and has been widely applied to tasks such as virtual equipment, virtual outfit changing, and motion capture. In this field there are currently two main types of data sets. The first type is motion capture data sets, which are usually captured in a laboratory with multiple calibrated cameras and can therefore provide accurate 3D key point coordinates; however, because they are captured in a specific environment, the pictures look very similar to one another. The second type is outdoor data sets, which are photographed in everyday environments and therefore have diverse appearances; however, such data sets can only provide 2D key point coordinate labels through manual annotation, and cannot provide 3D key point labels because of factors such as depth and scale ambiguity.

The prior art therefore needs to be improved and developed.

Disclosure of Invention

The technical problem to be solved by the application is to provide a 2D whole body human body key point marking method and a 3D human body grid marking method aiming at the defects of the prior art.

In order to solve the above technical problems, a first aspect of the embodiments of the present application provides a method for labeling key points of a 2D whole body, where the labeling method includes:

acquiring a human body region diagram of an input image, and acquiring key points of a whole body human body based on the human body region diagram;

acquiring a face region diagram in the input image according to the face key points in the whole-body human key points, determining candidate face key points based on the face region diagram, and adjusting the face key points in the whole-body human key points by adopting the candidate face key points;

and acquiring a hand region graph in the input image according to the hand key points in the whole body human key points, determining candidate hand key points based on the hand region graph, and adjusting the hand key points in the whole body human key points by adopting the candidate hand key points.

The method for labeling the 2D whole-body human body key points, wherein the acquiring the human body region map of the input image and acquiring the whole-body human body key points based on the human body region map specifically comprises:

Inputting an input image into a whole-body human body detector, and determining a human body frame through the whole-body human body detector;

and acquiring a human body region map in the input image based on the human body frame.

The method for labeling the 2D whole-body human body key points, wherein the obtaining the whole-body human body key points based on the human body region map specifically comprises:

inputting the human body region map into a human body detector, and determining candidate human body key points through the human body detector;

and adjusting the human body key points among the whole-body human body key points by adopting the candidate human body key points.

The labeling method of the 2D whole body human body key points, wherein the obtaining the face region map in the input image according to the face key points in the whole body human body key points, determining candidate face key points based on the face region map, and adjusting the face key points in the whole body human body key points by adopting the candidate face key points includes:

determining a face frame according to the face key points in the whole body human key points, and intercepting a face area diagram in the input image according to the face frame;

inputting the face region diagram into a pre-trained Transformer network, and determining candidate face key points of the face region diagram through the Transformer network;

and adjusting the face key points in the whole-body human key points by adopting the candidate face key points.

The method for labeling 2D whole-body human body key points, wherein the selecting a hand region map in the input image according to hand key points in the whole-body human body key points specifically includes:

determining a first hand frame corresponding to the input image through a pre-trained hand detector, and determining a second hand frame based on hand key points in the whole-body human key points;

determining a target hand frame based on the first hand frame and the second hand frame;

and intercepting a hand area diagram in the input image based on the target hand frame.

The method for labeling 2D whole-body human body keypoints, wherein the determining candidate hand keypoints based on the hand region map specifically includes:

acquiring a left hand frame and a right hand frame corresponding to the hand region diagram, and calculating the overlapping area of the left hand frame and the right hand frame;

when the overlapping area is smaller than or equal to a preset area threshold value, determining candidate hand key points corresponding to the hand area map through a pre-trained hand detector;

and when the overlapping area is larger than a preset area threshold, taking the hand key points in the whole body human key points as candidate hand key points.
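The overlap test above can be sketched as follows. This is a minimal illustration, not the application's implementation: the box layout `(x_min, y_min, x_max, y_max)`, the function names, and the example threshold are all assumptions made for the sketch.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout
Point = Tuple[float, float]


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def choose_candidate_hand_keypoints(
    left_box: Box,
    right_box: Box,
    detector_keypoints: List[Point],
    whole_body_hand_keypoints: List[Point],
    area_threshold: float,
) -> List[Point]:
    """Pick the hand detector's output unless the two hand boxes overlap too much."""
    if overlap_area(left_box, right_box) <= area_threshold:
        return detector_keypoints
    # Overlap above the threshold: keep the whole-body hand key points.
    return whole_body_hand_keypoints


detector_kps = [(1.0, 1.0)]    # hypothetical hand-detector output
whole_body_kps = [(2.0, 2.0)]  # hypothetical whole-body hand key points
apart = choose_candidate_hand_keypoints(
    (0, 0, 10, 10), (20, 0, 30, 10), detector_kps, whole_body_kps, area_threshold=5.0
)
touching = choose_candidate_hand_keypoints(
    (0, 0, 10, 10), (5, 0, 15, 10), detector_kps, whole_body_kps, area_threshold=5.0
)
```

In the first call the boxes are disjoint, so the detector key points are selected; in the second the overlap is 50 square pixels, above the illustrative threshold, so the whole-body hand key points are kept.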

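The method for labeling the 2D whole-body human body key points, wherein the determining candidate hand key points based on the hand region map specifically includes:

determining reference hand key points corresponding to the hand region map through a pre-trained hand detector;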

calculating the distance between the central coordinates of the candidate hand key points and the wrist key point coordinates in the whole-body human body key points;

when the distance is larger than a preset distance threshold, the reference hand key points are used as candidate hand key points;

and when the distance is smaller than or equal to a preset distance threshold, taking the hand key points in the whole body human key points as candidate hand key points.

A second aspect of an embodiment of the present application provides a 3D human body mesh labeling method, where the method includes:

acquiring a target data set, wherein each target picture in the target data set carries a 2D whole-body human body key point label determined by the above labeling method for 2D whole-body human body key points;

determining an initial 3D pseudo tag corresponding to the target data set based on a pre-trained human body grid estimation network;

and supervising the human body grid estimation network based on the target data set, the 2D whole body human body key point label and the initial 3D pseudo label to obtain a 3D pseudo label corresponding to the target data set.

The 3D human body mesh labeling method, wherein in the process of supervising the human body mesh estimation network, the method further comprises:

after the human body grid estimation network has been supervised with a preset number of target pictures, determining candidate 3D pseudo tags based on the supervised human body grid estimation network and the target data set;

and updating the initial 3D pseudo tag by adopting the candidate 3D pseudo tag.
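The periodic refresh of the 3D pseudo tags can be sketched as a simple loop. This is only an illustration under stated assumptions: `predict` and `train_step` are hypothetical callables standing in for the human body grid estimation network's inference and supervision steps, and `refresh_every` stands in for the preset number of target pictures.

```python
from typing import Callable, List, Sequence


def refine_pseudo_labels(
    images: Sequence[object],
    labels_2d: Sequence[object],
    predict: Callable[[object], object],
    train_step: Callable[[object, object, object], None],
    refresh_every: int,
) -> List[object]:
    """One pass over the data set, regenerating 3D pseudo tags periodically."""
    if refresh_every <= 0:
        raise ValueError("refresh_every must be positive")
    # Initial 3D pseudo tags come from the pre-trained network.
    pseudo_3d = [predict(image) for image in images]
    for index, (image, label_2d) in enumerate(zip(images, labels_2d)):
        # Supervise with the 2D label and the current 3D pseudo tag.
        train_step(image, label_2d, pseudo_3d[index])
        if (index + 1) % refresh_every == 0:
            # Replace the pseudo tags with the supervised network's predictions.
            pseudo_3d = [predict(im) for im in images]
    return pseudo_3d


# Toy stand-ins: the "network" is a counter that grows with every training step.
state = {"weight": 0}
seen_targets = []


def toy_predict(image):
    return state["weight"]


def toy_train_step(image, label_2d, pseudo_3d):
    seen_targets.append(pseudo_3d)
    state["weight"] += 1


final_labels = refine_pseudo_labels(
    [0, 1, 2, 3], ["a", "b", "c", "d"], toy_predict, toy_train_step, refresh_every=2
)
```

With the toy stand-ins, the first two steps are supervised with the initial tags and the last two with the refreshed ones, which is the behavior that distinguishes this scheme from supervising with a fixed pseudo tag throughout.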

The 3D human body grid labeling method, wherein the following formula is used:

[formula missing from the source text]

wherein the symbols of the formula (also missing from the source text) represent, in order: a preset 3D human body network; an initial 3D pseudo tag; the 2D predicted key points obtained by projecting the human body grid surface corresponding to the preset 3D human body network; and the 2D whole-body human body key point labels.

A third aspect of the embodiments of the present application provides a labeling system for 2D whole-body human body key points, where the labeling system includes:

the first acquisition module is used for acquiring a human body region graph of an input image and acquiring key points of a whole body human body based on the human body region graph;

the first updating module is used for acquiring a face area diagram in the input image according to the face key points in the whole-body human key points, determining candidate face key points based on the face area diagram, and adjusting the face key points in the whole-body human key points by adopting the candidate face key points;

and the second updating module is used for acquiring a hand region diagram in the input image according to the hand key points in the whole-body human body key points, determining candidate hand key points based on the hand region diagram, and adjusting the hand key points in the whole-body human body key points by adopting the candidate hand key points.

A fourth aspect of the present application provides a computer readable storage medium, where the computer readable storage medium stores one or more programs, where the one or more programs are executable by one or more processors to implement steps in a method for labeling 2D whole body human body key points as described above, and/or to implement steps in a method for labeling 3D human body mesh as described above.

A fifth aspect of the embodiments of the present application provides a terminal device, including: a processor, a memory, and a communication bus, the memory having stored thereon a computer readable program executable by the processor;

the communication bus realizes connection communication between the processor and the memory;

the processor, when executing the computer readable program, implements steps in a method for labeling 2D whole-body human body key points as described above, and/or implements steps in a method for labeling 3D human body grids as described above.

Beneficial effects: compared with the prior art, the application provides a 2D whole-body human body key point labeling method and a 3D human body grid labeling method, wherein the method comprises the steps of acquiring a human body region graph of an input image and acquiring whole-body human body key points based on the human body region graph; acquiring a face region diagram in the input image according to the face key points in the whole-body human key points, determining candidate face key points based on the face region diagram, and adjusting the face key points in the whole-body human key points by adopting the candidate face key points; and acquiring a hand region graph in the input image according to the hand key points in the whole-body human key points, determining candidate hand key points based on the hand region graph, and adjusting the hand key points in the whole-body human key points by adopting the candidate hand key points. In the application, the whole-body human body key points are first determined from the human body region diagram of the input image, and the hand key points and face key points are then adjusted using the hand region diagram and the face region diagram derived from those whole-body key points. As a result, 2D whole-body human body key point labels can be generated automatically for any data set obtained, no manual labeling of the data set is needed, and the labor cost of 2D whole-body human body key point labeling is reduced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without creative effort for a person of ordinary skill in the art.

Fig. 1 is a flowchart of a method for labeling 2D whole-body human body key points provided in the present application.

Fig. 2 is a flowchart of a method for labeling a 3D human body mesh provided in the present application.

Fig. 3 is a flowchart illustrating an example of a method for labeling a 3D human body mesh provided in the present application.

Fig. 4 is a schematic structural diagram of a labeling system for key points of a 2D whole body provided in the present application.

Fig. 5 is a schematic structural diagram of a labeling system for a 3D human body mesh provided in the present application.

Fig. 6 is a schematic structural diagram of a terminal device provided in the present application.

Detailed Description

The application provides a method for labeling key points of a 2D whole body and a method for labeling 3D human body grids, and in order to make the purposes, technical schemes and effects of the application clearer and more definite, the application is further described in detail below by referring to the drawings and the embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. Further, "connected" as used herein may include wireless connections. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

It should be understood that the sequence number and the size of each step in this embodiment do not mean the sequence of execution, and the execution sequence of each process is determined by the function and the internal logic of each process, and should not constitute any limitation on the implementation process of the embodiment of the present application.

Research shows that, in recent years, thanks to parameterized models such as SMPL [1] and SMPL-X [2], human body mesh surface reconstruction has developed rapidly and has been widely applied to tasks such as virtual equipment, virtual outfit changing, and motion capture. Whole-body mesh surface reconstruction aims at estimating body pose, hand pose and facial expression from monocular images. In this field there are currently two main types of data sets. The first type is motion capture data sets, which are usually captured in a laboratory with multiple calibrated cameras and can therefore provide accurate 3D key point coordinates; however, because they are captured in a specific environment, the pictures look very similar to one another. The second type is outdoor data sets, which are photographed in everyday environments and therefore have diverse appearances; however, such data sets can only provide 2D key point coordinate labels through manual annotation, and cannot provide 3D key point labels because of factors such as depth and scale ambiguity.

To address the problem that outdoor data sets have only 2D key point labels and cannot provide 3D supervision, researchers have proposed fitting-based annotators, which fit the parameters of a parameterized model (SMPL/SMPL-X) to the 2D key point coordinates to generate 3D pseudo labels; the 3D pseudo labels and the 2D key point coordinates together form the supervision signal used to train a network. Existing annotators are either one-stage or two-stage. One-stage annotators, such as SMPLify-X [3], directly output 3D pseudo labels without any initial pseudo labels. However, SMPLify-X is optimization-based and fits each sample separately, so it is slow and prone to erroneous results.

Two-stage annotators initialize 3D pseudo labels with a one-stage annotator, then supervise the annotator with the initialized 3D pseudo labels and the 2D key point coordinates, and finally generate optimized 3D pseudo labels. In the two-stage method, the same 3D pseudo labels are used for supervision throughout the second stage, so the fitting result depends heavily on the quality of the first-stage annotator, and the 3D supervision cannot keep improving as the number of iterations increases, which limits the final fitting result.

In addition, current annotators all require manually annotated 2D key point coordinate labels, which takes a large amount of manual labor. They can therefore only be used on existing public data sets such as COCO, and cannot be applied to self-collected data sets that contain no 2D key point labels, which greatly limits their application scenarios.

In order to solve the above-mentioned problem, in the embodiment of the present application, a human body region map of an input image is acquired, and whole-body human body key points are obtained based on the human body region map; a face region diagram in the input image is acquired according to the face key points in the whole-body human key points, candidate face key points are determined based on the face region diagram, and the face key points in the whole-body human key points are adjusted by adopting the candidate face key points; and a hand region graph in the input image is acquired according to the hand key points in the whole-body human key points, candidate hand key points are determined based on the hand region graph, and the hand key points in the whole-body human key points are adjusted by adopting the candidate hand key points. In the application, the whole-body human body key points are first determined from the human body region diagram of the input image, and the hand key points and face key points are then adjusted using the hand region diagram and the face region diagram derived from those whole-body key points. As a result, 2D whole-body human body key point labels can be generated automatically for any data set obtained, no manual labeling of the data set is needed, and the labor cost of 2D whole-body human body key point labeling is reduced.

The application will be further described by the description of embodiments with reference to the accompanying drawings.

The embodiment provides a 2D whole-body human body key point labeling method, as shown in fig. 1, comprising the following steps:

S10, acquiring a human body region diagram of an input image, and acquiring key points of a whole body human body based on the human body region diagram.

Specifically, the input image may be an image photographed in an indoor scene or an image photographed in an outdoor scene. In the present embodiment, the input image is an image captured in an outdoor scene. The input image may be captured directly by a photographing device, sent by an external device, or obtained over a network (for example, via Baidu).

The input image is an image containing a person, the human body frame is a detection frame of the person contained in the input image, and the image content of the image area corresponding to the human body frame is the person contained in the input image. The human body frame can be obtained through a pre-trained whole body human body detector, namely, an input image is input into the pre-trained whole body human body detector, and the human body frame of the input image is determined through the whole body human body detector. Of course, in practical application, the human body frame may be obtained in other manners, for example, a conventional edge detection method is adopted.

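In one implementation manner, the acquiring a human body region map of the input image specifically includes:

inputting the input image into a whole-body human body detector, and determining a human body frame through the whole-body human body detector;

and acquiring a human body region map in the input image based on the human body frame.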

Specifically, the human body region map is an image region corresponding to a human body frame in the input image, that is, the human body region map is a part of the input image, and the image content corresponding to the human body region map is the same as the image content contained in the human body frame. For example, a human body frame is marked in the input image, and then an image area contained in the human body frame is cut out to obtain a human body area map, or a human body area is directly selected in the input image based on four vertex coordinates of the human body frame, and the selected human body area is cut out to obtain the human body area map.
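As a minimal sketch of the cropping step, the following assumes the image is a row-major grid of pixel values and the human body frame is given as `(x_min, y_min, x_max, y_max)` in pixel coordinates with exclusive maximum edges. Both representations and the name `crop_region` are illustrative assumptions rather than details taken from the application.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive


def crop_region(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Return the sub-image covered by `box`, clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    # Clamp the frame so that a box spilling past the border still works.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


image = [[10 * r + c for c in range(5)] for r in range(4)]  # toy 4x5 "image"
body_region = crop_region(image, (1, 1, 4, 3))
```

The same helper could be reused for the face and hand region maps, since those are also cut out of the input image from a frame.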

The whole-body human body key points comprise human body key points, hand key points and face key points, and each of the whole-body human body key points is a two-dimensional key point. The whole-body human body key points are detected by a pre-trained whole-body human body detector; that is, the human body region map is input into the pre-trained whole-body human body detector, which identifies the whole-body human body key points corresponding to the human body region. The whole-body human body detector may be a deep-learning-based neural network model, such as a ViTPose-WholeBody network.

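In one implementation manner, the obtaining the whole-body human body key points based on the human body region map specifically includes:

inputting the human body region map into a human body detector, and determining candidate human body key points through the human body detector;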

and adopting the candidate human body key points to adjust human body key points in the whole-body human body key points.

Specifically, the human body key points are the key points corresponding to the body regions of the person in the input image other than the face and the hands; that is, the human body key points do not include the face key points or the hand key points. The human body key points can be detected by a pre-trained human body detector, which may be a deep-learning-based neural network model such as a ViT Body-only network model. The human body region map is processed by the trained human body detector, and the human body key points among the whole-body human body key points are adjusted using the candidate human body key points so obtained, which improves the accuracy of the human body key points. Because the human body detector is trained with human body key points alone as ground truth, whereas the whole-body human body detector is trained with whole-body human body key points as ground truth, the human body key points detected by the human body detector are more accurate than those detected by the whole-body human body detector. Adjusting the human body key points among the whole-body human body key points with the key points detected by the human body detector therefore improves their accuracy.

In addition, the step of adjusting the human body key points in the whole body human body key points by using the candidate human body key points may be to replace the human body key points in the whole body human body key points by using the candidate human body key points, or may be to adjust the human body key points in the whole body human body key points by weighting the candidate human body key points and the human body key points in the whole body human body key points, etc. In this embodiment, the human body keypoints in the whole body human body keypoints are adjusted by replacing the human body keypoints in the whole body human body keypoints with candidate human body keypoints.
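Both adjustment options mentioned above, replacement and weighting, can be expressed with one blending function. The sketch below is illustrative: the function name, the point representation, and the single scalar `weight` are assumptions, and the application does not specify how a weighting would be parameterized.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def adjust_keypoints(
    current: List[Point], candidate: List[Point], weight: float = 1.0
) -> List[Point]:
    """Blend candidate key points into the current ones.

    weight=1.0 replaces the current key points outright (the choice made in
    this embodiment); 0 < weight < 1 gives a weighted combination.
    """
    if len(current) != len(candidate):
        raise ValueError("key point lists must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        (weight * cx + (1.0 - weight) * x, weight * cy + (1.0 - weight) * y)
        for (x, y), (cx, cy) in zip(current, candidate)
    ]


whole_body_body_kps = [(10.0, 20.0), (30.0, 40.0)]  # hypothetical coordinates
candidate_body_kps = [(12.0, 22.0), (34.0, 44.0)]
replaced = adjust_keypoints(whole_body_body_kps, candidate_body_kps)
blended = adjust_keypoints(whole_body_body_kps, candidate_body_kps, weight=0.5)
```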

S20, acquiring a face region diagram in the input image according to the face key points in the whole-body human key points, determining candidate face key points based on the face region diagram, and adjusting the face key points in the whole-body human key points by adopting the candidate face key points.

Specifically, the face region map is the image region corresponding to the face region in the input image, and the face key points are the key points corresponding to the face region. The whole-body human key points include face key points, and face key points can also be identified from the face region map. Once candidate face key points are obtained, the face key points among the whole-body human key points can therefore be adjusted using the candidate face key points, which improves their accuracy. The method for adjusting the face key points among the whole-body human key points using the candidate face key points may be the same as the method for adjusting the human body key points among the whole-body human key points using the candidate human body key points, and is not described again here.

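In one implementation manner, the obtaining the face region map in the input image according to the face keypoints in the whole-body human body keypoints, determining candidate face keypoints based on the face region map, and adjusting the face keypoints in the whole-body human body keypoints by using the candidate face keypoints specifically includes:

determining a face frame according to the face key points in the whole-body human key points, and intercepting a face region map in the input image according to the face frame;

inputting the face region map into a pre-trained Transformer network, and determining candidate face key points of the face region map through the Transformer network;

and adjusting the face key points in the whole-body human key points by adopting the candidate face key points.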

Specifically, the face frame is a detection frame corresponding to the face region, where the face frame is determined based on face key points in key points of a whole body of the human body, for example, the face frame is a detection frame including the face key points, or is a detection frame determined based on a minimum abscissa, a minimum ordinate, a maximum abscissa, and a maximum ordinate among coordinates of each face key point of the face key points.

In one implementation, the process of determining the face frame according to the face key points in the whole-body human key points may be: acquiring the face key points in the whole-body human key points, and acquiring the minimum abscissa, the minimum ordinate, the maximum abscissa and the maximum ordinate among the coordinates of the face key points; then determining a candidate detection frame whose vertices are given by the acquired minimum abscissa, minimum ordinate, maximum abscissa and maximum ordinate; and finally enlarging the candidate detection frame by a preset multiple to obtain the face frame. The preset multiple is set in advance, for example 1.1 or 1.2. In a typical implementation, the preset multiple is 1.2, i.e., the candidate detection frame is enlarged 1.2 times to obtain the face frame.
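The face frame construction can be sketched as below. The text does not say about which point the candidate frame is enlarged; this sketch assumes enlargement about the frame's center, and the function name and point representation are likewise illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_from_keypoints(points: List[Point], scale: float = 1.2) -> Box:
    """Tight box around the key points, enlarged by `scale` about its center."""
    if not points:
        raise ValueError("at least one key point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    center_x = (x_min + x_max) / 2.0
    center_y = (y_min + y_max) / 2.0
    # Assumption: width and height are both scaled, keeping the center fixed.
    half_w = (x_max - x_min) * scale / 2.0
    half_h = (y_max - y_min) * scale / 2.0
    return (center_x - half_w, center_y - half_h, center_x + half_w, center_y + half_h)


face_points = [(10.0, 20.0), (30.0, 20.0), (20.0, 40.0)]  # hypothetical face key points
face_box = box_from_keypoints(face_points, scale=1.2)
```

For the three example points the tight box spans 20 pixels in each direction, so the enlarged frame spans 24 pixels and is roughly `(8, 18, 32, 42)`. The same construction applies to the second hand frame, which is also derived from key point extremes.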

Further, the face region map is obtained by cropping with the face frame; that is, the face frame is marked in the input image, and the image region corresponding to the face frame is then cut out to obtain the face region map. The Transformer network is a pre-trained network model for identifying face key points. Candidate face key points in the face region map are identified through the Transformer network, and the face key points among the whole-body human key points are then adjusted using the candidate face key points. In this way, the coarse face key points among the whole-body human key points are adjusted using the more accurate face key points identified from the face region map, which improves the accuracy of the adjusted face key points and, in turn, the accuracy of the 2D whole-body human key points.

S30, acquiring a hand region diagram in the input image according to the hand key points in the whole body human body key points, determining candidate hand key points based on the hand region diagram, and adjusting the hand key points in the whole body human body key points by adopting the candidate hand key points.

Specifically, the hand region map is the image region corresponding to a hand region in the input image, and the hand key points are the key points corresponding to the hand region. The whole-body human body key points include hand key points, and hand key points can also be identified from the hand region map. Because the hand key points identified from the hand region map are considerably more accurate than the hand key points among the whole-body human body key points, the latter can be adjusted using the candidate hand key points determined from the hand region map, which improves their accuracy. The method for adjusting the hand key points among the whole-body human key points using the candidate hand key points identified from the hand region map may be the same as the method for adjusting the human body key points, and is not described again here.

In one implementation, when the input image is identified, the human body frame is detected and simultaneously the hand frame can be synchronously detected, and the hand frame can be determined based on hand key points in key points of the whole body human body. Thus, in order to further improve the accuracy of the hand region map, a target hand frame for determining the hand region map may be determined based on the hand frame determined by the input image and the hand frame determined based on the hand keypoints, and then the hand region map may be determined based on the target hand frame, so that the accuracy of the hand region map may be improved, and further the accuracy of candidate hand keypoints for adjusting hand keypoints among the whole-body human body keypoints may be improved.

Based on this, the selecting a hand region map from the input image according to the hand key points in the whole-body human key points specifically includes:

and intercepting a hand area diagram in the input image based on the target hand frame.

Specifically, the first hand frame is determined from the input image. It may be obtained by running a hand detector on the input image, or it may be obtained together with the human body frame; that is, a whole-body human detector detects the human body frame and the hand frame of the input image at the same time, and that hand frame is used as the first hand frame. The second hand frame is determined from the hand key points in the whole-body human body key points. It may be a detection frame containing the hand key points, or a detection frame determined by the minimum abscissa, minimum ordinate, maximum abscissa and maximum ordinate over the coordinates of all the hand key points, and the like. It is worth noting that the first hand frame and the second hand frame each include a left hand frame and a right hand frame, and the left hand frame and the right hand frame may be determined in the same way.
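As an illustration of the second option above, the following is a minimal sketch of deriving a frame from the extreme coordinates of the hand key points. The function name `box_from_keypoints`, the `(x_min, y_min, x_max, y_max)` frame layout and the optional `margin` parameter are assumptions made for this example and are not specified in the application.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def box_from_keypoints(keypoints: Sequence[Point], margin: float = 0.0) -> Box:
    """Return the frame spanned by the extreme x and y coordinates of the key points.

    `margin` (hypothetical, not from the application) enlarges the frame on every side.
    """
    if not keypoints:
        raise ValueError("at least one key point is required")
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


# Example: key points of one hand taken from the whole-body key points.
left_hand_keypoints = [(10.0, 20.0), (14.0, 18.0), (12.0, 26.0)]
second_left_hand_frame = box_from_keypoints(left_hand_keypoints)
```

The same function would be applied separately to the left hand and right hand key points, since both frames may be determined in the same way.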

Further, after the first hand frame and the second hand frame are obtained, region-of-interest matching is performed on them to obtain a target hand frame, and the hand region map is then cropped from the input image based on the target hand frame. The matching may proceed as follows: the overlapping areas of the first hand frame and the second hand frame are calculated and sorted to obtain the correspondence between the first hand frame and the second hand frame (that is, the correspondence between the left hand frame in the first hand frame and the left hand frame in the second hand frame, and between the right hand frame in the first hand frame and the right hand frame in the second hand frame), and the target hand frame is finally determined based on that correspondence. Of course, in practical applications, the first hand frame and the second hand frame may be matched in other ways, for example based on a spatial transformation relationship, where matching according to the spatial transformation relationship refers to matching feature points in the first hand frame with feature points in the second hand frame.
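The overlap-based matching can be sketched as follows. This is only one possible reading: the greedy one-to-one assignment and the rule that a matched detector frame becomes the target hand frame (with the key-point frame as a fallback) are assumptions, since the application only states that the overlapping areas are sorted and that the target hand frame is determined from the resulting correspondence. All function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned frames (0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def match_hand_frames(first: List[Box], second: List[Box]) -> Dict[int, int]:
    """Sort all frame pairs by overlap area and pair them greedily.

    Returns a mapping {index in `second`: index in `first`}.
    """
    pairs = sorted(
        ((overlap_area(f, s), i, j)
         for (i, f), (j, s) in product(enumerate(first), enumerate(second))),
        key=lambda item: item[0],
        reverse=True,
    )
    matched: Dict[int, int] = {}
    used_first = set()
    for area, i, j in pairs:
        if area <= 0.0:
            break  # remaining pairs do not overlap at all
        if i in used_first or j in matched:
            continue
        matched[j] = i
        used_first.add(i)
    return matched


def target_hand_frames(first: List[Box], second: List[Box]) -> List[Box]:
    """Assumed rule: use the matched detector frame, else keep the key-point frame."""
    matched = match_hand_frames(first, second)
    return [first[matched[j]] if j in matched else s for j, s in enumerate(second)]


# Detector frames arrive in arbitrary order; key-point frames are ordered (left, right).
first_frames = [(100, 100, 140, 150), (10, 10, 50, 60)]
second_frames = [(12, 8, 48, 58), (98, 105, 138, 148)]
targets = target_hand_frames(first_frames, second_frames)
```

In this sketch the ordering of `second_frames` fixes which target frame is the left hand and which is the right hand, so the detector does not need to report handedness itself.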

In practical applications, the left and right hands may overlap or a hand may be occluded in the input image, so that a complete left hand region map and/or right hand region map cannot be cropped. In that case, hand key points determined from the hand region map of the target hand frame may themselves be inaccurate, and using them to adjust the hand key points in the whole-body human body key points would degrade the result. Therefore, when the candidate hand key points are determined based on the hand region map, it may first be judged whether the hand region map meets a preset condition. When the preset condition is met, the hand key points determined based on the hand region map are taken as the candidate hand key points; when the preset condition is not met, the hand key points in the whole-body human body key points are taken as the candidate hand key points.

In one implementation, the determining candidate hand keypoints based on the hand region map specifically includes:

Specifically, the hand region map is determined from the target hand frame, which in turn is determined from the first hand frame and the second hand frame. The left hand frame and the right hand frame are those included in the target hand frame, and the overlapping area is the area in which the left hand frame and the right hand frame overlap. The preset area threshold is set in advance and is used to judge whether the hand key points determined based on the hand region map can serve as the candidate hand key points. When the overlapping area is less than or equal to the preset area threshold, the hand key points determined based on the hand region map are taken as the candidate hand key points, which improves the accuracy of the hand key points. Conversely, when the overlapping area is greater than the preset area threshold, the hand key points determined based on the hand region map are likely to be inaccurate, so the hand key points in the whole-body human body key points are taken as the candidate hand key points to preserve accuracy.
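The area-threshold decision can be sketched as below. The function and parameter names are hypothetical, the frame layout `(x_min, y_min, x_max, y_max)` is an assumption, and the threshold values in the example are arbitrary; the application does not give a concrete value.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned frames (0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def select_candidates_by_overlap(
    left_frame: Box,
    right_frame: Box,
    region_keypoints: List[Point],      # key points estimated from the hand region map
    whole_body_keypoints: List[Point],  # hand key points from the whole-body estimate
    area_threshold: float,
) -> List[Point]:
    """Use the region-map key points unless the two hand frames overlap too much."""
    if overlap_area(left_frame, right_frame) <= area_threshold:
        return region_keypoints
    return whole_body_keypoints


region_kpts = [(1.0, 1.0)]
body_kpts = [(2.0, 2.0)]
# Disjoint hands: the region-map key points are kept.
apart = select_candidates_by_overlap((0, 0, 10, 10), (20, 0, 30, 10), region_kpts, body_kpts, 5.0)
# Strongly overlapping hands: fall back to the whole-body key points.
crossed = select_candidates_by_overlap((0, 0, 10, 10), (5, 0, 15, 10), region_kpts, body_kpts, 5.0)
```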

Specifically, the center coordinates of the reference hand key points are the coordinates of the middle-most key point among the reference hand key points, and the distance may be the Euclidean distance between those center coordinates and the wrist key point coordinates. The preset distance threshold is set in advance and is used to judge whether the reference hand key points corresponding to the hand region map can serve as the candidate hand key points. The comparison is analogous to the comparison with the preset area threshold in the above implementation, so the details are not repeated here; reference may be made to the description of that comparison.
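A minimal sketch of the distance-based check follows. Two points are assumptions rather than statements of the application: the "middle-most" key point is taken here to be the key point closest to the centroid of the reference hand key points, and, by analogy with the area threshold, the reference key points are accepted when the distance is less than or equal to the threshold. All names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def middle_keypoint(keypoints: List[Point]) -> Point:
    """Assumed definition: the key point closest to the centroid of all key points."""
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    return min(keypoints, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))


def select_candidates_by_wrist_distance(
    reference_keypoints: List[Point],   # hand key points estimated from the hand region map
    wrist: Point,                       # wrist key point coordinates
    whole_body_keypoints: List[Point],  # hand key points from the whole-body estimate
    distance_threshold: float,
) -> List[Point]:
    """Accept the reference key points only if their center lies near the wrist."""
    center = middle_keypoint(reference_keypoints)
    distance = math.hypot(center[0] - wrist[0], center[1] - wrist[1])  # Euclidean distance
    if distance <= distance_threshold:
        return reference_keypoints
    return whole_body_keypoints


reference = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
fallback = [(9.0, 9.0)]
near = select_candidates_by_wrist_distance(reference, (1.0, 4.0), fallback, 5.0)
far = select_candidates_by_wrist_distance(reference, (1.0, 4.0), fallback, 2.0)
```

The intuition behind this reading is that a refined hand whose center lies far from the wrist has probably been estimated on the wrong hand or on an occluded region.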

In summary, the present embodiment provides a method for labeling 2D whole-body human body key points, which includes obtaining a human body region map of an input image, and obtaining whole-body human body key points based on the human body region map; acquiring a face region diagram in the input image according to the face key points in the whole-body human key points, determining candidate face key points based on the face region diagram, and adjusting the face key points in the whole-body human key points by adopting the candidate face key points; and acquiring a hand region graph in the input image according to the hand key points in the whole body human key points, determining candidate hand key points based on the hand region graph, and adjusting the hand key points in the whole body human key points by adopting the candidate hand key points. According to the method and the device, the whole-body human body key points are determined according to the human body frame, then the corresponding hand region diagram and the corresponding face region diagram are acquired based on the whole-body human body key points, and the hand key points and the face key points are updated, so that 2D whole-body human body key point label marking can be automatically carried out on a data set acquired in any form, manual marking is not needed on the data set, and the labor cost required by 2D whole-body human body key point marking of the data set is reduced.

Based on the above 2D whole body human body key point labeling method, this embodiment also provides a 3D human body grid labeling method, as shown in fig. 2 and 3, where the method includes:

H10, acquiring a target data set;

H20, determining an initial 3D pseudo tag corresponding to the target data set based on a pre-trained human body grid estimation network;

and H30, supervising the human body grid estimation network based on the target data set, the 2D whole body human body key point label and the initial 3D pseudo label so as to obtain a 3D pseudo label corresponding to the target data set.

Specifically, the target data set includes a plurality of target pictures, and a 2D whole-body human body key point label is determined for each target picture by using the labeling method of the 2D whole-body human body key points described in the above embodiment. In one implementation, the target data set may be determined as follows: a plurality of target pictures are shot in an outdoor environment, and each target picture is labeled with 2D whole-body human body key points by using the labeling method of the 2D whole-body human body key points.

After the target data set is acquired, each target picture in the target data set is input into the pre-trained human body grid estimation network to generate an initial 3D pseudo tag. Once the initial 3D pseudo tags are obtained, the pre-trained human body grid estimation network is supervised with the 2D whole-body human body key point labels and the initial 3D pseudo tags as ground truth, so that the network is fine-tuned and fitted to the target data set.

In one implementation, in the process of supervising the human mesh estimation network, the method further comprises:

and updating the initial 3D pseudo tag by adopting the candidate 3D pseudo tag.

Specifically, in the process of supervising the human body grid estimation network, the initial 3D pseudo tag may be dynamically updated in order to mitigate inaccuracy of the 3D pseudo tag caused by inaccurate 2D whole-body human body key point labels. Each time the human body grid estimation network has been supervised with a preset number of target pictures, every target picture in the target data set is input into the supervised network again to obtain candidate 3D pseudo tags, and the candidate 3D pseudo tags are used to update the initial 3D pseudo tags. The initial 3D pseudo tags are thus updated once every preset number of target pictures, which reduces the dependence on the quality of the 2D whole-body human body key point labels and improves the quality of the 3D pseudo tags. In this embodiment, labeling the target pictures with the 2D whole-body human body key point labeling method reduces the human resources required for 3D pseudo tag labeling, because no manual 2D key point labeling is needed; it also improves the accuracy of the 3D pseudo tags, because the 2D key point labels are more accurate; and dynamically updating the initial 3D pseudo tags improves that accuracy further.
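The dynamic update can be sketched as a training loop that regenerates the pseudo tags at a fixed interval. The sketch is framework-agnostic and purely illustrative: `predict` and `train_step` are hypothetical callables standing in for the human body grid estimation network and one supervision step, and `refresh_every` stands in for the preset number of target pictures.

```python
from typing import Any, Callable, List, Sequence


def supervise_with_pseudo_label_refresh(
    images: Sequence[Any],
    labels_2d: Sequence[Any],
    predict: Callable[[Any], Any],               # network forward pass -> 3D pseudo tag
    train_step: Callable[[Any, Any, Any], None],  # one supervised update of the network
    refresh_every: int,
    epochs: int = 1,
) -> List[Any]:
    """Supervise the network and refresh all 3D pseudo tags every `refresh_every` pictures."""
    if refresh_every <= 0:
        raise ValueError("refresh_every must be positive")
    # Initial 3D pseudo tags from the pre-trained network.
    pseudo_3d = [predict(image) for image in images]
    seen = 0
    for _ in range(epochs):
        for index, (image, label_2d) in enumerate(zip(images, labels_2d)):
            train_step(image, label_2d, pseudo_3d[index])
            seen += 1
            if seen % refresh_every == 0:
                # Candidate 3D pseudo tags from the partially supervised network
                # replace the previous pseudo tags for the whole data set.
                pseudo_3d = [predict(im) for im in images]
    return pseudo_3d


# Toy stand-in for a network: a single scalar weight that grows with each step.
state = {"w": 0.0}


def toy_predict(image: float) -> float:
    return image * state["w"]


def toy_train_step(image: float, label_2d: Any, pseudo_tag: float) -> None:
    state["w"] += 1.0


final_tags = supervise_with_pseudo_label_refresh(
    [1.0, 2.0, 3.0, 4.0], [None] * 4, toy_predict, toy_train_step, refresh_every=2
)
```

In a real setting `train_step` would compute the supervision loss against both the 2D key point label and the current pseudo tag and apply an optimizer update; the toy functions above only make the control flow observable.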

In one implementation, the human mesh estimation network employs a supervision loss function of:

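$$L = L_{smplx} + \lambda L_{kpt2D};$$

$$L_{smplx} = \lVert S_E - S_{GT} \rVert_1;$$

$$L_{kpt2D} = \lVert J_E - J_{GT} \rVert_1;$$

wherein $\lambda$ represents a weight coefficient, $L_{smplx}$ represents a three-dimensional loss term, $L_{kpt2D}$ represents a two-dimensional loss term, $S_E$ represents a preset 3D human body network, $S_{GT}$ represents an initial 3D pseudo tag, $J_E$ represents the 2D prediction key points obtained by projecting the human body grid surface corresponding to the preset 3D human body network, and $J_{GT}$ represents the 2D whole-body human body key point labels.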
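The supervision loss recited in claim 8, L = L_smplx + λ·L_kpt2D with both terms being L1 norms, can be evaluated as in the following sketch. Representing the 3D quantities and the 2D key points as flat lists of numbers is an assumption made to keep the example dependency-free; the function names are hypothetical.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 norm of the element-wise difference of two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def supervision_loss(
    s_e: Sequence[float],   # S_E: 3D output of the network (flattened)
    s_gt: Sequence[float],  # S_GT: current 3D pseudo tag (flattened)
    j_e: Sequence[float],   # J_E: projected 2D key points (flattened x, y pairs)
    j_gt: Sequence[float],  # J_GT: 2D whole-body key point labels (flattened)
    lam: float,             # weight coefficient lambda
) -> float:
    """L = L_smplx + lambda * L_kpt2D."""
    l_smplx = l1_distance(s_e, s_gt)
    l_kpt2d = l1_distance(j_e, j_gt)
    return l_smplx + lam * l_kpt2d


loss = supervision_loss([1.0, 2.0], [0.5, 2.5], [10.0, 20.0], [12.0, 19.0], lam=0.5)
```

The weight λ trades off how strongly the network is pulled toward the 2D key point labels versus the 3D pseudo tags; the value 0.5 above is arbitrary.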

Based on the above 2D whole body human body key point labeling method, this embodiment provides a labeling system for 2D whole body human body key points, as shown in fig. 4, where the labeling system includes:

a first obtaining module 110, configured to obtain a human body region map of an input image, and obtain key points of a whole body human body based on the human body region map;

a first updating module 120, configured to obtain a face region map in the input image according to face keypoints in the whole-body human body keypoints, determine candidate face keypoints based on the face region map, and adjust the face keypoints in the whole-body human body keypoints by using the candidate face keypoints;

the second updating module 130 is configured to obtain a hand region map in the input image according to the hand keypoints in the whole-body human body keypoints, determine candidate hand keypoints based on the hand region map, and adjust the hand keypoints in the whole-body human body keypoints by using the candidate hand keypoints.

Based on the 3D human body mesh labeling method, the embodiment provides a 3D human body mesh labeling system, as shown in fig. 5, which includes:

a second obtaining unit 210, configured to obtain a target data set, where a 2D whole-body human body key point label is determined for each target picture in the target data set by using the method for labeling 2D whole-body human body key points described above;

a second determining module 220, configured to determine an initial 3D pseudo tag corresponding to the target data set based on a pre-trained human mesh estimation network;

and a supervision module 230, configured to supervise the human body mesh estimation network based on the target data set, the 2D whole body human body key point label and the initial 3D pseudo label, so as to obtain a 3D pseudo label corresponding to the target data set.

Based on the above 2D whole body human body key point labeling method and/or the 3D human body grid labeling method, the present embodiment provides a computer readable storage medium, where one or more programs are stored in the computer readable storage medium, and the one or more programs may be executed by one or more processors, so as to implement the steps in the 2D whole body human body key point labeling method and/or the 3D human body grid labeling method described in the above embodiments.

Based on the above 2D whole body human body key point labeling method and/or the 3D human body mesh labeling method, the present application further provides a terminal device, as shown in fig. 6, which includes at least one processor 20, a display screen 21 and a memory 22, and which may also include a communication interface 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communication interface 23 may communicate with each other via the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may invoke logic instructions in the memory 22 to perform the methods of the embodiments described above.

Further, the logic instructions in the memory 22 described above may be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand alone product.

The memory 22, as a computer readable storage medium, may be configured to store a software program, a computer executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 performs functional applications and data processing, i.e. implements the methods of the embodiments described above, by running software programs, instructions or modules stored in the memory 22.

The memory 22 may include a storage program area and a storage data area. The storage program area may store an operating system and at least one application program required for a function; the storage data area may store data created according to the use of the terminal device, etc. In addition, the memory 22 may include high-speed random access memory, and may also include nonvolatile memory, for example any of a variety of media capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk; a transitory storage medium may also be used.

In addition, the specific processes by which the instructions in the storage medium and in the terminal device are loaded and executed by the processor are described in detail in the above method, and are not repeated here.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A method for labeling 2D whole-body human body key points, characterized by comprising the following steps:

acquiring a face region diagram in the input image according to the face key points in the whole-body human key points, determining candidate face key points based on the face region diagram, and adjusting the face key points in the whole-body human key points by adopting the candidate face key points, wherein the candidate face key points are determined based on a transducer network;

acquiring a hand region graph in the input image according to hand key points in the whole body human body key points, determining candidate hand key points based on the hand region graph, and adjusting the hand key points in the whole body human body key points by adopting the candidate hand key points, wherein the candidate hand key points are determined based on a hand detector;

the selecting a hand region map from the input image according to the hand key points in the whole body human key points specifically includes:

determining a target hand frame based on the first hand frame and the second hand frame, wherein the determining process of the target hand frame is to calculate the overlapping area of the first hand frame and the second hand frame, sort the calculated overlapping areas to obtain the corresponding relation between the first hand frame and the second hand frame, and determine the target hand frame based on the corresponding relation;

intercepting a hand area diagram in the input image based on the target hand frame;

the determining candidate hand keypoints based on the hand region map specifically includes:

judging whether the hand area diagram meets preset conditions or not;

when the preset condition is met, taking the hand key points determined based on the hand region diagram as candidate hand key points;

and when the preset condition is not met, taking the hand key points in the key points of the whole body human body as candidate hand key points.

2. The method for labeling 2D whole-body human body keypoints according to claim 1, wherein the acquiring a human body region map of an input image and acquiring whole-body human body keypoints based on the human body region map specifically comprises:

3. The method for labeling 2D whole-body human body keypoints according to claim 1, wherein the obtaining the whole-body human body keypoints based on the human body region map specifically comprises:

4. The method for labeling 2D whole-body human keypoints according to claim 1, wherein the obtaining a face region map in the input image according to the face keypoints in the whole-body human keypoints, determining candidate face keypoints based on the face region map, and adjusting the face keypoints in the whole-body human keypoints with the candidate face keypoints comprises:

5. The method for labeling 2D whole-body human keypoints according to claim 1, wherein the determining candidate hand keypoints based on the hand region map specifically comprises:

6. The method for labeling 2D whole-body human keypoints according to claim 1, wherein the determining candidate hand keypoints based on the hand region map specifically comprises:

7. A method for labeling a 3D human mesh, the method comprising:

acquiring a target data set, wherein each target picture in the target data set adopts the marking method of the 2D whole-body human body key points as set forth in any one of claims 1-6 to determine 2D whole-body human body key point labels;

supervising the human body grid estimation network based on the target data set, the 2D whole body human body key point label and the initial 3D pseudo label to obtain a 3D pseudo label corresponding to the target data set;

wherein, in the process of supervising the human body mesh estimation network, the method further comprises:

and updating the initial 3D pseudo tag by adopting the candidate 3D pseudo tag.

8. The 3D human mesh labeling method of claim 7, wherein the human mesh estimation network employs a supervision loss function of:

$$L = L_{smplx} + \lambda L_{kpt2D};$$

$$L_{smplx} = \lVert S_E - S_{GT} \rVert_1;$$

$$L_{kpt2D} = \lVert J_E - J_{GT} \rVert_1;$$

wherein $\lambda$ represents a weight coefficient, $L_{smplx}$ represents a three-dimensional loss term, $L_{kpt2D}$ represents a two-dimensional loss term, $S_E$ represents a preset 3D human body network, $S_{GT}$ represents an initial 3D pseudo tag, $J_E$ represents the 2D prediction key points obtained by projecting the human body grid surface corresponding to the preset 3D human body network, and $J_{GT}$ represents the 2D whole body human body key point labels.

9. A labeling system for 2D whole-body human body key points, characterized in that the labeling system comprises:

the first updating module is used for acquiring a face area diagram in the input image according to the face key points in the whole-body human key points, determining candidate face key points based on the face area diagram, and adjusting the face key points in the whole-body human key points by adopting the candidate face key points, wherein the candidate face key points are determined based on a transducer network;

the second updating module is used for acquiring a hand region diagram in the input image according to hand key points in the whole-body human body key points, determining candidate hand key points based on the hand region diagram, and adjusting the hand key points in the whole-body human body key points by adopting the candidate hand key points, wherein the candidate hand key points are determined based on a hand detector;

judging whether the hand area diagram meets preset conditions or not;

10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs executable by one or more processors to implement steps in the method for labeling 2D whole-body human keypoints according to any one of claims 1-6 and/or to implement steps in the method for labeling 3D human body mesh according to any one of claims 7-8.

11. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;

the processor, when executing the computer readable program, implements the steps of the method for labeling 2D whole-body human keypoints according to any one of claims 1-6 and/or implements the steps of the method for labeling 3D human body mesh according to any one of claims 7-8.