CN112837362A - Three-dimensional human body posture estimation method for obtaining space positioning and computer readable storage medium - Google Patents


Info

Publication number
CN112837362A
Authority
CN
China
Prior art keywords
dimensional; human body; body posture; posture estimation; coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110121062.8A
Other languages
Chinese (zh)
Inventor
王好谦 (Wang Haoqian)
高艺华 (Gao Yihua)
杨芳 (Yang Fang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202110121062.8A priority Critical patent/CN112837362A/en
Publication of CN112837362A publication Critical patent/CN112837362A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/08 Learning methods
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T7/00 Image analysis
                    • G06T7/70 Determining position or orientation of objects or cameras
                • G06T2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T2207/20 Special algorithmic details
                        • G06T2207/20081 Training; Learning
                        • G06T2207/20084 Artificial neural networks [ANN]
                    • G06T2207/30 Subject of image; Context of image processing
                        • G06T2207/30196 Human being; Person


Abstract

The invention provides a three-dimensional human body posture estimation method that obtains spatial localization, and a computer-readable storage medium. The method comprises the following steps: acquiring a single-person image from an original image using a human body detection network and normalizing it; predicting the two-dimensional coordinates of key points from the single-person image using a two-dimensional human body posture estimation method, and predicting the three-dimensional coordinates of the key points from their two-dimensional coordinates using a three-dimensional posture generator, obtaining a three-dimensional human body posture estimation result; simultaneously obtaining human body parameters from the features of the two-dimensional human body posture estimation network; correcting the scale of the three-dimensional human body posture estimation result according to the human body parameters; calculating the view-angle deviation and rotating the corrected three-dimensional human body posture estimation result into alignment with the camera coordinate system; and fitting the absolute depth according to the principle of perspective projection, obtaining the spatial localization and completing the three-dimensional human body posture estimation. The method improves result accuracy and reduces systematic error.

Description

Three-dimensional human body posture estimation method for obtaining space positioning and computer readable storage medium
Technical Field
The invention relates to the technical field of computer vision, in particular to a three-dimensional human body posture estimation method for obtaining space positioning and a computer readable storage medium.
Background
Human body posture estimation on a two-dimensional RGB image containing a person, that is, outputting the coordinate positions of specified human body key points from the picture information, is a very valuable research direction in the field of computer vision. Its detection results can further serve industrial fields such as human body reconstruction, human-computer interaction, behavior recognition, virtual reality and game modeling, and are applied in many products. Compared with two-dimensional postures, three-dimensional posture estimation results provide higher-dimensional, richer information in applications and have broad application prospects. In recent years, three-dimensional posture estimation has therefore attracted the attention of researchers at home and abroad.
When multiple human bodies are present in a picture, three-dimensional human body posture estimation methods divide into top-down and bottom-up approaches. The former first detects each single human body and then obtains the three-dimensional key-point coordinates by a chosen method, giving higher accuracy; the latter first detects all key points of all people and then, combining global information, connects the key points belonging to the same person through a matching method, giving better real-time performance. There are several ways to obtain the three-dimensional key-point coordinates, mainly: generating three-dimensional coordinates from the two-dimensional key-point coordinates; directly predicting the three-dimensional key-point coordinates from the image; and synchronously estimating the two-dimensional and three-dimensional key-point coordinates from the image with shared information.
In the method that generates three-dimensional coordinates from two-dimensional key-point coordinates, the two-dimensional key-point coordinates are first obtained from the original image with a two-dimensional human body posture estimation method, and a three-dimensional posture generator learns only from this two-dimensional point information to infer the corresponding third coordinate. The advantages are: two-dimensional human body posture estimation is mature and reliable, provides prior information for subsequent tasks, and yields a higher-dimensional, more concentrated feature than raw pictures; in addition, the data used for supervised training in the two-to-three-dimensional generation step are two-dimensional/three-dimensional coordinate pairs, a small total amount of data, so the method occupies less GPU memory than methods that take the original image as input, and trains quickly. The drawback is that the information obtainable from two-dimensional coordinates is limited, losing the rich information carried by the picture.
In the top-down three-dimensional human body posture estimation task, the evaluation indices are defined such that the prediction can only represent the coordinate position of each key point in a coordinate system rooted at the person itself, and cannot show the relative positions of different people in the picture. Solving this requires predicting the absolute position of each human body key point in the camera coordinate system, thereby realizing spatial localization of the human body and meeting the needs of multi-person relation understanding and visualization in multi-person three-dimensional posture estimation.
Existing top-down three-dimensional human body posture estimation methods usually take the three-dimensional key-point coordinates output by a neural network directly as the final result, without further processing. This implicitly assumes that the result has the same size and the same orientation as the real human body, an assumption that carries deviation. This error should be reduced as much as possible before the prediction is compared with the real label, which has not been solved effectively.
The above background disclosure is only for the purpose of assisting understanding of the concept and technical solution of the present invention and does not necessarily belong to the prior art of the present patent application, and should not be used for evaluating the novelty and inventive step of the present application in the case that there is no clear evidence that the above content is disclosed at the filing date of the present patent application.
Disclosure of Invention
In order to solve the existing problems, the invention provides a three-dimensional human body posture estimation method for obtaining space positioning and a computer readable storage medium.
In order to solve the above problems, the technical solution adopted by the present invention is as follows:
a three-dimensional human body posture estimation method for obtaining spatial localization comprises the following steps: S1: acquiring a single-person image from an original image using a human body detection network and normalizing it; S2: predicting the two-dimensional coordinates of key points from the single-person image using a two-dimensional human body posture estimation method, and predicting the three-dimensional coordinates of the key points from their two-dimensional coordinates using a three-dimensional posture generator, obtaining a three-dimensional human body posture estimation result; simultaneously obtaining human body parameters from the features of the two-dimensional human body posture estimation network; S3: correcting the scale of the three-dimensional human body posture estimation result according to the human body parameters; S4: calculating the view-angle deviation and rotating the corrected three-dimensional human body posture estimation result into alignment with the camera coordinate system; S5: fitting the absolute depth according to the principle of perspective projection, obtaining the spatial localization and completing the three-dimensional human body posture estimation.
Preferably, the original image is a single or multiple person image; detecting each human body range from the original image to obtain the single image; the standardization processing comprises filling the pixels in each human body range to a uniform proportion and scaling to a uniform size; the coordinate position label of the two-dimensional key point corresponding to the pixel is subjected to standardization processing along with the pixel; and performing decentralized processing on the coordinate position labels of the three-dimensional key points corresponding to the pixels.
Preferably, predicting the two-dimensional coordinates of key points from the single-person image using a two-dimensional human body posture estimation method and predicting the three-dimensional coordinates of the key points from their two-dimensional coordinates using a three-dimensional posture generator comprises: applying a two-dimensional posture estimation network to the single-person image I, the result being the predicted two-dimensional coordinates R of each joint point of the single-person image, described as:
R=NetA1(I)
predicting the three-dimensional coordinates of the key points from the two-dimensional coordinates of the key points through the three-dimensional posture generator to obtain a result P of three-dimensional human body posture estimation, wherein the result P is expressed as:
P=NetA2(R)
wherein NetA1 is a two-dimensional pose estimation network; NetA2 is a three-dimensional pose generator.
Preferably, the obtaining of the human body parameters from the features of the two-dimensional human body posture estimation network comprises: learning body parameters from the features of the two-dimensional pose estimation network NetA1 using network NetB, expressed as:
β=NetB(FA1)
where β is a value in the range 0.5 to 1 representing action information, and F denotes a feature of an intermediate network layer.
Preferably, the scale of the three-dimensional human body posture estimation result is corrected according to the human body parameters, obtaining the scale-corrected three-dimensional human body posture estimate P1:
P1 = βP.
Preferably, calculating the view-angle deviation and aligning the rotation-corrected three-dimensional human body posture estimation result with the camera coordinate system comprises: obtaining the rotation angles about the x-axis and the y-axis from the horizontal and vertical coordinates of the midpoint of the single-person image, respectively:
αx = arctan(c·y / f)   (rotation about the x-axis, from the vertical coordinate y)
αy = arctan(c·x / f)   (rotation about the y-axis, from the horizontal coordinate x)
the scale-corrected three-dimensional posture estimate P1 is rotated in reverse by the angle α, obtaining the view-corrected three-dimensional posture estimate P2:
P2 = Rotate_α(P1)
where Rotate_α performs the rotation operation on the corrected three-dimensional posture P1; x and y are the horizontal and vertical coordinates of the midpoint of the human body detection frame of the single-person image; f is the focal length of the imaging system of the original image; and c is the conversion ratio between the sensor size of the imaging plane of the original image and the pixel values of the original image.
Preferably, fitting the absolute depth of the root node according to the principle of perspective projection and solving the coordinates of the root node in the camera coordinate system specifically comprises: according to the two-dimensional key-point coordinates R, taking out the two-dimensional coordinates (x, y) of the root node, and solving the three-dimensional coordinates (X, Y, Z) of the root node of the corrected three-dimensional human body posture estimation result in the camera coordinate system;
from the reasoning of perspective projection, the relationship between a three-dimensional point (X, Y, Z) in space and its two-dimensional coordinates (x, y) imaged on the projection plane can be expressed as:
x = (f/c)·(X/Z),  y = (f/c)·(Y/Z)
wherein Z is the Z coordinate of the three-dimensional coordinate point;
assuming the value of Z as a determined value Z0To obtain corresponding X0,Y0Three-dimensional coordinates P of other key points in the camera coordinate systemc iObtained by the following formula:
Pc i=Pi+(X0,Y0,Z0)
Z0 is fitted iteratively, minimizing over all key points the reprojection error between the projected camera-space coordinates and the predicted two-dimensional coordinates:
Z0 = argmin_Z Σ_i ‖ ((f/c)·X_i/Z_i, (f/c)·Y_i/Z_i) − R_i ‖², where (X_i, Y_i, Z_i) = P_i + (X0, Y0, Z).
preferably, (X) will be0,Y0,Z0) As an approximate solution of three-dimensional coordinates (X, Y, Z) of the root node in the camera coordinate system, obtaining the positioning P of the human body posture in the camera coordinate system spacec
Pc=P+(X0,Y0,Z0)。
Preferably, the two-dimensional posture estimation network is Hourglass, SimpleBaseline or HRNet, and the three-dimensional posture generator uses fully-connected layers or graph neural network layers.
The invention also provides a computer-readable storage medium having stored thereon a computer program adapted to be loaded and executed by a processor to cause a computer device having said processor to perform the method as defined in any one of the above.
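As an illustration of the perspective-projection depth fitting described in the steps above, the following sketch fits the root depth by searching for the Z0 whose implied camera-space pose reprojects closest to the predicted two-dimensional key points. This is a hedged sketch under assumptions: the projection relation x = (f/c)·X/Z is used, and a simple grid search stands in for the iterative fit; function names and the candidate-depth range are my own.

```python
import numpy as np

def fit_root_depth(P2, R, x_root, y_root, f, c, z_grid=None):
    """Fit the absolute root depth Z0 (sketch): choose the Z0 whose
    implied camera-space pose reprojects closest to the predicted 2D
    keypoints R, under the pinhole relation x = (f/c) * X / Z.

    P2: (N, 3) root-relative pose; R: (N, 2) predicted 2D keypoints;
    (x_root, y_root): the root's 2D coordinates relative to the
    principal point. Returns the root's (X0, Y0, Z0).
    """
    if z_grid is None:
        z_grid = np.linspace(500.0, 10000.0, 200)   # candidate depths (e.g. mm)
    best_z, best_err = z_grid[0], np.inf
    for z0 in z_grid:
        # Back-project the root's 2D coordinates at trial depth z0.
        x0 = x_root * c * z0 / f
        y0 = y_root * c * z0 / f
        Pc = P2 + np.array([x0, y0, z0])            # camera-space keypoints
        proj = (f / c) * Pc[:, :2] / Pc[:, 2:3]     # reproject to the image
        err = np.sum((proj - R) ** 2)               # reprojection error
        if err < best_err:
            best_z, best_err = z0, err
    z0 = best_z
    return np.array([x_root * c * z0 / f, y_root * c * z0 / f, z0])
```

Adding the fitted root vector to the root-relative pose then yields the spatially localized pose Pc = P + (X0, Y0, Z0).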
The beneficial effects of the invention are as follows: a three-dimensional human body posture estimation method obtaining spatial localization and a computer-readable storage medium are provided, adding human body scale correction and view-angle deviation correction steps to the original top-down three-dimensional human body posture estimation method; this effectively reduces the misalignment between the prediction coordinate system and the camera coordinate system, improving result accuracy and reducing systematic error.
Furthermore, the invention adopts a perspective projection method to fit the absolute depth, thereby realizing the space positioning of the human body.
Drawings
Fig. 1 is a schematic diagram of a three-dimensional human body posture estimation method for obtaining spatial localization in an embodiment of the present invention.
Fig. 2 is a schematic flow chart of predicting three-dimensional coordinates from a single image in the embodiment of the present invention.
FIG. 3 is a diagram illustrating a deviation of a viewing angle of a camera according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. In addition, the connection may be for either a fixing function or a circuit connection function.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings for convenience in describing the embodiments of the present invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed in a particular orientation, and be in any way limiting of the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
As shown in fig. 1, the present invention provides a three-dimensional human body posture estimation method for obtaining spatial localization, comprising the following steps:
s1: acquiring a single image from an original image by adopting a human body detection network and carrying out standardization processing;
s2: predicting two-dimensional coordinates of key points from the single image by using a two-dimensional human body posture estimation method, and predicting three-dimensional coordinates of the key points from the two-dimensional coordinates of the key points by using a three-dimensional posture generator to obtain a three-dimensional human body posture estimation result; simultaneously acquiring human body parameters from the characteristics of the two-dimensional human body posture estimation network;
s3: correcting the scale of the result of the three-dimensional human body posture estimation according to the human body parameters;
s4: calculating the visual angle deviation, and aligning the three-dimensional human body posture estimation result after rotation correction with a camera coordinate system;
s5: and fitting absolute depth according to the principle of perspective projection, obtaining space positioning and finishing three-dimensional human body posture estimation.
The method adds the steps of human body scale correction and visual angle deviation correction on the basis of the original top-down three-dimensional human body posture estimation method, effectively reduces the non-coincidence degree between the prediction coordinate system and the camera coordinate system, and can achieve the purposes of improving the result accuracy and reducing the system error.
Furthermore, the invention adopts a perspective projection method to fit the absolute depth, thereby realizing the space positioning of the human body.
The first step of the top-down approach obtains a single-person image from the original image. It is to be understood that the original image may contain one or more people; each human body range is detected from the original image to obtain a single-person image. The normalization consists of padding the pixels of each human body range to a uniform aspect ratio and scaling to a uniform size; the two-dimensional key-point coordinate labels corresponding to the pixels are normalized along with them, and the three-dimensional key-point coordinate labels are decentered. Specifically, decentering means subtracting the hip-joint coordinates from the three-dimensional coordinates of all joints, yielding a root-relative form. In one embodiment of the invention, the larger of the length and width of the single-person detection frame is taken as the long edge; with the original detection frame centered, the short edge is first padded with pixels to a 1:1 aspect ratio, and the resulting square is then scaled to a specified size (256×256 or 384×384, configurable according to experimental conditions).
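The square-padding and resizing just described can be sketched as follows. This is an illustration, not the patented implementation: the function name is my own, border handling is elided, and a nearest-neighbor resize keeps the sketch dependency-free where a real pipeline would use an interpolating resize such as cv2.resize.

```python
import numpy as np

def normalize_crop(image, box, out_size=256):
    """Pad a detected person box to a 1:1 aspect ratio around its center,
    then scale to a fixed square size.

    image: HxWx3 array; box: (x_min, y_min, x_max, y_max) in pixels.
    Returns the normalized crop and the scale factor applied, so that
    2D keypoint labels can be normalized with the same transform.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    side = max(w, h)                        # longer edge becomes the square side
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2   # keep the original box centered
    half = side / 2
    # Integer square window, clipped to the image (border padding elided).
    xs = int(max(cx - half, 0))
    ys = int(max(cy - half, 0))
    crop = image[ys:ys + int(side), xs:xs + int(side)]
    # Nearest-neighbor resize to out_size x out_size.
    rows = (np.arange(out_size) * crop.shape[0] / out_size).astype(int)
    cols = (np.arange(out_size) * crop.shape[1] / out_size).astype(int)
    resized = crop[rows][:, cols]
    scale = out_size / side
    return resized, scale
```

The returned scale factor would also be applied to the two-dimensional key-point labels so that image and labels stay consistent.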
The normalized single person image I and label T will be used in pairs for subsequent network training and supervision. In the training process of the task, the invention does not train the detection process, but uses the human body range collected in the label to replace the predicted value of the human body detection network, thereby simplifying the training process and improving the training accuracy; in the network testing stage and practical application, the process is realized by adopting a pre-trained detection network, and the human body detection network can adopt mature detection networks such as YOLO series or Mask RCNN and the like.
As the core function of this task, a module for predicting the three-dimensional key-point coordinates from a single picture is established. Predicting the two-dimensional coordinates of key points from the single-person image using a two-dimensional human body posture estimation method and predicting the three-dimensional coordinates of the key points from their two-dimensional coordinates using a three-dimensional posture generator comprises:
applying a two-dimensional attitude estimation network to the single image I, wherein the obtained result is a two-dimensional coordinate prediction value R of each joint point of the single image, and is described as follows:
R=NetA1(I)
predicting the three-dimensional coordinates of the key points from the two-dimensional coordinates of the key points through the three-dimensional posture generator to obtain a result P of three-dimensional human body posture estimation, wherein the result P is expressed as:
P=NetA2(R)
wherein NetA1 is a two-dimensional pose estimation network; NetA2 is a three-dimensional pose generator.
P obtained at this point gives the relative coordinates of each key point in the root-node coordinate system. In a specific embodiment, the two-dimensional posture estimation network is Hourglass, SimpleBaseline or HRNet, and the three-dimensional posture generator uses fully-connected layers or graph neural network layers.
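For illustration, a toy fully-connected lifting generator of the kind named above can be sketched as follows. The weights here are random placeholders (the sketch only demonstrates the 2D-to-3D shape mapping of NetA2, not a trained model), and all names are my own.

```python
import numpy as np

def lift_2d_to_3d(R, params):
    """Toy fully-connected 3D posture generator: flattens N 2D keypoints,
    applies ReLU hidden layers and a linear head, and reshapes to N 3D
    keypoints (root-relative). A real generator would be trained on
    2D/3D coordinate pairs.
    """
    x = R.reshape(-1)                        # (N*2,) flattened 2D coordinates
    for W, b in params[:-1]:
        x = np.maximum(W @ x + b, 0.0)       # hidden layers with ReLU
    W, b = params[-1]
    out = W @ x + b                          # linear output head
    return out.reshape(-1, 3)                # (N, 3) root-relative 3D pose

def init_params(n_joints=17, hidden=64, seed=0):
    """Random placeholder weights for the layer dimensions above."""
    rng = np.random.default_rng(seed)
    dims = [n_joints * 2, hidden, hidden, n_joints * 3]
    return [(rng.standard_normal((o, i)) * 0.01, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]
```

A graph-neural-network generator would replace the dense matrices with message passing over the skeleton graph; the input/output shapes stay the same.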
Common evaluation indices in three-dimensional human body posture estimation are MPJPE (Mean Per Joint Position Error), P-MPJPE (Procrustes-analysis MPJPE), PCK (Percentage of Correct Keypoints), and so on. Each index computes the difference between predictions and labels: the closeness of each pair of corresponding key points is quantitatively evaluated and then averaged. Both the prediction and the label are arrays of N three-dimensional vectors representing the spatial coordinate values of N human body key points, and their coordinate systems are not unified. In one embodiment, N is determined by the dataset and fixed, typically 14, 16 or 17.
To achieve a comparison of two sets of spatial points, both are processed into a (root-relative) coordinate form of the relative root node. First, a root node (usually a hip joint) needs to be specified, three-dimensional coordinates of the root node are defined as (0, 0, 0), and coordinates of other key points estimated from a single person picture are three-dimensional vectors with the root node as an origin. The coordinates of each key point in the label are also unified into this form, and the vector (X, Y, Z) of the root node needs to be subtracted in the preprocessing to obtain a relative position. In this way, the two poses to be compared can be "aligned" between the two coordinate systems by coinciding the root nodes, calculating the predicted error in units of each human body.
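The root-relative alignment and per-joint error described above can be sketched as follows (a minimal illustration; joint index 0 is assumed to be the hip/root, and no Procrustes alignment is applied):

```python
import numpy as np

def root_relative(pose, root=0):
    """Express an (N, 3) pose relative to its root joint (e.g. the hip),
    so the root lands at (0, 0, 0)."""
    return pose - pose[root]

def mpjpe(pred, label, root=0):
    """Mean per-joint position error after aligning both poses at the
    root node: average Euclidean distance between corresponding joints."""
    p = root_relative(pred, root)
    t = root_relative(label, root)
    return float(np.linalg.norm(p - t, axis=1).mean())
```

Because both poses are root-aligned before comparison, a pure translation between prediction and label contributes no error; only the relative joint layout is scored.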
The above evaluation index has two problems: firstly, the scale of the predicted gesture cannot represent the real scale of the human body in the camera coordinate system; secondly, only displacement deviation is eliminated in the process of aligning the two coordinate systems, and angle deviation is not considered. The following describes these two points separately and proposes a solution.
Without the Procrustes transformation, the prediction is assumed by default to have the same size as the real human body, which introduces error. Because every single-person image is normalized, the three-dimensional results predicted from pictures of children, adults, standing postures and sitting postures have similar scale, where scale can be understood as the extent of the cuboid the key points can reach in space. To achieve more reasonable spatial localization, auxiliary information is added to correct the size of the estimated human posture so that it is closer to the real size. To keep the overall network light, no extra backbone is needed: while the two-dimensional human posture estimation network predicts the two-dimensional key-point coordinates R = NetA1(I), the features of its intermediate layers can be shared to learn the human body characteristic information. Obtaining the human body parameters from the features of the two-dimensional posture estimation network comprises the following steps:
learning body parameters from the features of the two-dimensional pose estimation network NetA1 using network NetB, expressed as:
β=NetB(FA1)
where β is a value in the range 0.5 to 1 representing action information, and F denotes a feature of an intermediate network layer. The role of NetB is to extract features; this network uses a configuration similar to ResNet-18, slightly adjusted at the input. The body parameter β learns the action information conveyed in the single-person picture: for example, β is close to 1 for a standing human body and close to 0.5 for a squatting one.
As shown in FIG. 2, the error correction of the invention includes scale correction and angle correction. The scale of the three-dimensional human body posture estimation result is corrected according to the human body parameters, giving the scale-corrected three-dimensional human body posture estimate P1:
P1 = βP.
Because a human body may appear anywhere in the original picture, existing top-down methods, after cropping the single-person picture, assume by default that the midpoint of the picture lies on the principal axis of the camera imaging system and that the human orientation shown in the picture coincides with the normal of the projection plane (the picture), i.e. the principal-axis direction. This assumption introduces an angular deviation when the predicted three-dimensional posture is placed back into the camera coordinate system. Specifically, the real three-dimensional key-point coordinates are collected and annotated in the camera coordinate system, whose axes align with the projection-plane normal; but by perspective projection theory, the image on the projection plane of an object offset by an angle α from the imaging principal axis is not of the object's plane parallel to the projection plane, but of the object's plane perpendicular to the lens-object line. The coordinate system of the predicted three-dimensional coordinates therefore has a rotational deviation of angle α from the camera coordinate system, as shown in FIG. 3.
continuing with fig. 2, to resolve this angular error, the three-dimensional pose P output from the previous step is rotated in reverse by an angle α before proceeding to the next step. Calculating the view angle deviation, wherein the alignment of the three-dimensional human body posture estimation result after the rotation correction and a camera coordinate system comprises the following steps:
acquiring the rotation angles around the x-axis and the y-axis from the vertical and horizontal coordinates of the midpoint of the single image, respectively:

αx = arctan(c·y / f)

αy = arctan(c·x / f)
The scale-corrected three-dimensional posture estimate P1 is rotated in reverse by the angle α to obtain the view-corrected three-dimensional posture estimate P2. The rotation rests on the assumption that the midpoint of the human body observed by the camera is approximately equal to the root node of the human body, and can be expressed as:

P2 = Rotateα(P1)

wherein Rotateα denotes the rotation operation applied to the scale-corrected three-dimensional posture P1; x and y are respectively the horizontal and vertical coordinates of the midpoint of the human body detection frame of the single image; f is the focal length of the imaging system of the original image, and c is the conversion ratio between the sensor size of the imaging plane and the pixel values of the original image; f and c are known parameters of the imaging system. The view-corrected three-dimensional posture estimate P2 obtained at this point gives the relative coordinates of each key point in the root-node coordinate system.
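The view-angle correction can be sketched with NumPy. The angle formulas and the order in which the two rotations are composed are assumptions of this sketch (the original formulas survive only as images in the source), with `f` and `c` being the known imaging parameters described above:

```python
import numpy as np

def view_correct(P1, x, y, f, c):
    """Rotate the root-relative pose P1 (J x 3) back by the view angles.

    x, y: midpoint coordinates of the detection box (pixels, relative to
    the image centre); f: focal length; c: sensor-to-pixel conversion ratio.
    """
    ax = np.arctan2(c * y, f)  # tilt about the x-axis, from the vertical offset
    ay = np.arctan2(c * x, f)  # pan about the y-axis, from the horizontal offset
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(ax), -np.sin(ax)],
                   [0.0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0.0, np.sin(ay)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(ay), 0.0, np.cos(ay)]])
    R = Ry @ Rx               # forward view rotation (column-vector convention)
    return P1 @ R             # rows @ R applies R.T, i.e. the reverse rotation
```

With x = y = 0 (a crop centred on the principal axis) the correction reduces to the identity, which is exactly the situation in which the implicit assumption of existing methods already holds.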
In order to transform the human body posture from the root-node coordinate system to the camera coordinate system, the three-dimensional coordinates (X, Y, Z) of the root node in the camera coordinate system must first be solved. Fitting the absolute depth of the root node according to the principle of perspective projection and solving the root-node coordinates in the camera coordinate system specifically comprises:
extracting the two-dimensional coordinates (x, y) of the root node from the two-dimensional key-point coordinates R, and solving the three-dimensional coordinates (X, Y, Z) of the root node of the corrected three-dimensional human body posture estimation result in the camera coordinate system;
From the reasoning of perspective projection, the relationship between a three-dimensional point (X, Y, Z) in space and its two-dimensional image coordinates (x, y) on the projection plane can be expressed as:

x = f·X / (c·Z), y = f·Y / (c·Z)

wherein Z is the depth (z-coordinate) of the three-dimensional point;
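This pinhole relation can be written as a two-line helper; the numeric values below are arbitrary illustrative choices, not parameters from the patent:

```python
def project(X, Y, Z, f, c):
    """Perspective projection of a camera-frame 3D point to pixel
    coordinates measured from the image centre: x = fX/(cZ), y = fY/(cZ)."""
    return (f * X) / (c * Z), (f * Y) / (c * Z)

# a point 5 units deep, offset 0.5 right and 0.25 up of the principal axis
x, y = project(0.5, 0.25, 5.0, f=1400.0, c=1.0)  # → (140.0, 70.0)
```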
In the prediction process the two-dimensional key-point coordinates R are available, and the two-dimensional coordinates (x, y) of the root node are taken from them. Assuming a determined value Z0 for Z yields the corresponding X0 and Y0; the three-dimensional coordinates Pc^i of the other key points in the camera coordinate system are then obtained by:

Pc^i = P^i + (X0, Y0, Z0)
Z0 is fitted iteratively by minimizing the reprojection error between the shifted pose and the predicted two-dimensional key points:

Z0 = argmin over Z of Σi ‖ proj(P^i + (X0, Y0, Z)) − R^i ‖²

where proj(·) denotes the perspective projection above, and X0 = x·c·Z/f, Y0 = y·c·Z/f follow from the two-dimensional root-node coordinates.
Taking (X0, Y0, Z0) as an approximate solution of the three-dimensional coordinates (X, Y, Z) of the root node in the camera coordinate system gives the spatial positioning Pc of the human body posture in the camera coordinate system:

Pc = P + (X0, Y0, Z0).
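The depth fitting can be sketched as follows, assuming the iterative fit minimizes the reprojection error between the shifted pose and the predicted two-dimensional key points (the exact objective in the original survives only as an image); a plain grid search stands in for whatever iteration scheme the patent uses:

```python
import numpy as np

def fit_root_depth(P, R, x, y, f, c, z_lo=0.5, z_hi=10.0, steps=901):
    """Fit the root depth Z0. P: root-relative 3D pose (J x 3); R:
    predicted 2D key points (J x 2); (x, y): root's 2D coordinates."""
    def reproj_error(z):
        # root position implied by depth z via the pinhole relation
        X0, Y0 = x * c * z / f, y * c * z / f
        Pc = P + np.array([X0, Y0, z])            # pose in camera frame
        proj = f * Pc[:, :2] / (c * Pc[:, 2:3])   # perspective projection
        return np.sum((proj - R) ** 2)
    zs = np.linspace(z_lo, z_hi, steps)
    return zs[int(np.argmin([reproj_error(z) for z in zs]))]
```

Once Z0 is found, adding (X0, Y0, Z0) to the root-relative pose places every key point in camera space, as in the equations above.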
The method of the invention was tested on the Human3.6M data set, yielding the experimental results shown in Table 1. The method of Martinez et al. is taken as the baseline: the same two-dimensional human body posture estimation network (hourglass network) and three-dimensional posture generator (fully connected layers) are adopted, the scale correction module and the view-angle correction module are added on this basis, and the same training procedure is followed.
TABLE 1 three-dimensional human posture estimation, prediction and correction experiment results
[Table 1 appears as an image in the original document; its numerical values, including the MPJPE (mm) results in the last column, are not recoverable here.]
Experimental results show that the method outperforms the baseline on the MPJPE (mm) metric (last column), improving the accuracy of the results and reducing the systematic error.
An embodiment of the present application further provides a control apparatus, including a processor and a storage medium for storing a computer program, wherein the processor is adapted to perform at least the method described above when executing the computer program.
Embodiments of the present application also provide a storage medium for storing a computer program, which when executed performs at least the method described above.
Embodiments of the present application further provide a processor, where the processor executes a computer program to perform at least the method described above.
The storage medium may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention is not to be considered limited to these descriptions. For those of ordinary skill in the art to which the invention pertains, several equivalent substitutions or obvious modifications may be made without departing from the spirit of the invention, and all such substitutions or modifications with the same performance or use shall be deemed to fall within the scope of protection of the invention.

Claims (10)

1. A three-dimensional human body posture estimation method for obtaining space positioning is characterized by comprising the following steps:
S1: acquiring a single image from an original image by adopting a human body detection network and carrying out standardization processing;
S2: predicting two-dimensional coordinates of key points from the single image by using a two-dimensional human body posture estimation method, and predicting three-dimensional coordinates of the key points from the two-dimensional coordinates of the key points by using a three-dimensional posture generator to obtain a three-dimensional human body posture estimation result; simultaneously acquiring human body parameters from the features of the two-dimensional human body posture estimation network;
S3: correcting the scale of the three-dimensional human body posture estimation result according to the human body parameters;
S4: calculating the view-angle deviation, and aligning the rotation-corrected three-dimensional human body posture estimation result with a camera coordinate system;
S5: fitting the absolute depth according to the principle of perspective projection, obtaining the spatial positioning, and completing the three-dimensional human body posture estimation.
2. The method of claim 1, wherein the original image is a single-person or multi-person image, and each human body range is detected from the original image to obtain the single image;
the standardization processing comprises filling the pixels in each human body range to a uniform proportion and scaling to a uniform size;
the coordinate position label of the two-dimensional key point corresponding to the pixel is subjected to standardization processing along with the pixel; and performing decentralized processing on the coordinate position labels of the three-dimensional key points corresponding to the pixels.
3. The method of obtaining a spatially positioned three-dimensional body pose estimation according to claim 2, wherein predicting two-dimensional coordinates of a keypoint from the single image using a two-dimensional body pose estimation method, predicting three-dimensional coordinates of the keypoint from the two-dimensional coordinates of the keypoint using a three-dimensional pose generator comprises:
applying a two-dimensional attitude estimation network to the single image I, wherein the obtained result is a two-dimensional coordinate prediction value R of each joint point of the single image, and is described as follows:
R=NetA1(I)
predicting the three-dimensional coordinates of the key points from the two-dimensional coordinates of the key points through the three-dimensional posture generator to obtain a result P of three-dimensional human body posture estimation, wherein the result P is expressed as:
P=NetA2(R)
wherein NetA1 is a two-dimensional pose estimation network; NetA2 is a three-dimensional pose generator.
4. The method of obtaining a spatially-oriented three-dimensional body pose estimation method of claim 3, wherein obtaining body parameters from features of the two-dimensional body pose estimation network comprises:
learning body parameters from the features of the two-dimensional pose estimation network NetA1 using network NetB, expressed as:
β=NetB(FA1)
wherein β is a value in the range 0.5–1 that represents action information, and FA1 denotes the feature values of an intermediate layer of the network NetA1.
5. The method of obtaining a spatially positioned three-dimensional human body posture estimation according to claim 4, wherein the scale of the three-dimensional human body posture estimation result is corrected according to the human body parameter, obtaining the scale-corrected three-dimensional posture estimate P1:

P1 = βP.
6. The method of obtaining a spatially oriented three-dimensional body pose estimate of claim 5, wherein computing a view angle deviation, aligning the rotation corrected three-dimensional body pose estimate with a camera coordinate system comprises:
acquiring the rotation angles around the x-axis and the y-axis from the vertical and horizontal coordinates of the midpoint of the single image, respectively:

αx = arctan(c·y / f)

αy = arctan(c·x / f)
rotating the scale-corrected three-dimensional posture estimate P1 in reverse by the angle α to obtain the view-corrected three-dimensional posture estimate P2:
P2=Rotateα(P1)
Wherein Rotateα denotes the rotation operation applied to the scale-corrected three-dimensional posture P1; x and y are respectively the horizontal and vertical coordinates of the midpoint of the human body detection frame of the single image; f is the focal length of the imaging system of the original image, and c is the conversion ratio between the sensor size of the imaging plane of the original image and the pixel values of the original image.
7. The method for obtaining a three-dimensional human body pose estimation of spatial localization according to claim 6, wherein fitting the absolute depth of the root node according to the principle of perspective projection, and solving the coordinates of the root node under the camera coordinate system specifically comprises:
extracting the two-dimensional coordinates (x, y) of the root node from the two-dimensional key-point coordinates R, and solving the three-dimensional coordinates (X, Y, Z) of the root node of the corrected three-dimensional human body posture estimation result in the camera coordinate system;
From the reasoning of perspective projection, the relationship between a three-dimensional point (X, Y, Z) in space and its two-dimensional image coordinates (x, y) on the projection plane can be expressed as:

x = f·X / (c·Z), y = f·Y / (c·Z)

wherein Z is the depth (z-coordinate) of the three-dimensional point;
assuming a determined value Z0 for Z to obtain the corresponding X0 and Y0, the three-dimensional coordinates Pc^i of the other key points in the camera coordinate system are obtained by:

Pc^i = P^i + (X0, Y0, Z0)
Z0 is fitted iteratively by minimizing the reprojection error between the shifted pose and the predicted two-dimensional key points:

Z0 = argmin over Z of Σi ‖ proj(P^i + (X0, Y0, Z)) − R^i ‖²

where proj(·) denotes the perspective projection above.
8. The method of claim 7, wherein (X0, Y0, Z0) is taken as an approximate solution of the three-dimensional coordinates (X, Y, Z) of the root node in the camera coordinate system, obtaining the spatial positioning Pc of the human body posture in the camera coordinate system:

Pc = P + (X0, Y0, Z0).
9. The method of claim 8, wherein the two-dimensional pose estimation network is Hourglass, Simple baseline or HRNet, and the three-dimensional pose generator is a fully-connected layer or a graph neural network layer.
10. A computer-readable storage medium, in which a computer program is stored which is adapted to be loaded and executed by a processor to cause a computer device having said processor to carry out the method of any one of claims 1 to 9.
CN202110121062.8A 2021-01-28 2021-01-28 Three-dimensional human body posture estimation method for obtaining space positioning and computer readable storage medium Pending CN112837362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110121062.8A CN112837362A (en) 2021-01-28 2021-01-28 Three-dimensional human body posture estimation method for obtaining space positioning and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110121062.8A CN112837362A (en) 2021-01-28 2021-01-28 Three-dimensional human body posture estimation method for obtaining space positioning and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112837362A true CN112837362A (en) 2021-05-25

Family

ID=75932285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110121062.8A Pending CN112837362A (en) 2021-01-28 2021-01-28 Three-dimensional human body posture estimation method for obtaining space positioning and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112837362A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643419A (en) * 2021-06-29 2021-11-12 清华大学 Human body inverse dynamics solving method and device based on deep learning
CN113679379A (en) * 2021-07-14 2021-11-23 深圳大学 Human body posture estimation method, device, equipment, system and medium based on sound waves
CN114066986A (en) * 2022-01-11 2022-02-18 南昌虚拟现实研究院股份有限公司 Three-dimensional coordinate determination method and device, electronic equipment and storage medium
CN114140515A (en) * 2021-11-29 2022-03-04 西安奥比拓疆科技有限公司 Three-dimensional human body dimension measuring method, system and computer readable storage medium
CN114264243A (en) * 2021-12-31 2022-04-01 深圳明锐理想科技有限公司 Method for detecting crimping welding spots and measuring line arc height between crimping welding spots
CN114581513A (en) * 2022-03-07 2022-06-03 清华大学 Space coordinate positioning method and device and electronic equipment
CN115862074A (en) * 2023-02-28 2023-03-28 科大讯飞股份有限公司 Human body direction determining method, human body direction determining device, screen control method, human body direction determining device and related equipment
WO2023102873A1 (en) * 2021-12-10 2023-06-15 Intel Corporation Enhanced techniques for real-time multi-person three-dimensional pose tracking using a single camera

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389539A (en) * 2015-10-15 2016-03-09 电子科技大学 Three-dimensional gesture estimation method and three-dimensional gesture estimation system based on depth data
CN108305283A (en) * 2018-01-22 2018-07-20 清华大学 Human bodys' response method and device based on depth camera and basic form
CN109214980A (en) * 2017-07-04 2019-01-15 百度在线网络技术(北京)有限公司 A kind of 3 d pose estimation method, device, equipment and computer storage medium
US20190045126A1 (en) * 2017-08-03 2019-02-07 Canon Kabushiki Kaisha Image pick-up apparatus and control method
US20190278983A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3d) pose estimation from a monocular camera
CN111563415A (en) * 2020-04-08 2020-08-21 华南理工大学 Binocular vision-based three-dimensional target detection system and method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAI YUANHAO,ET AL.: "Learning Delicate Local Representations for Multi-person Pose Estimation", 《COMPUTER VISION-ECCV2020》 *
GYEONGSIK MOON, ET AL.: "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", 《2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643419A (en) * 2021-06-29 2021-11-12 清华大学 Human body inverse dynamics solving method and device based on deep learning
CN113643419B (en) * 2021-06-29 2024-04-23 清华大学 Human body inverse dynamics solving method based on deep learning
CN113679379A (en) * 2021-07-14 2021-11-23 深圳大学 Human body posture estimation method, device, equipment, system and medium based on sound waves
CN113679379B (en) * 2021-07-14 2024-06-04 深圳大学 Human body posture estimation method, device, equipment, system and medium based on sound wave
CN114140515A (en) * 2021-11-29 2022-03-04 西安奥比拓疆科技有限公司 Three-dimensional human body dimension measuring method, system and computer readable storage medium
WO2023102873A1 (en) * 2021-12-10 2023-06-15 Intel Corporation Enhanced techniques for real-time multi-person three-dimensional pose tracking using a single camera
CN114264243A (en) * 2021-12-31 2022-04-01 深圳明锐理想科技有限公司 Method for detecting crimping welding spots and measuring line arc height between crimping welding spots
CN114066986A (en) * 2022-01-11 2022-02-18 南昌虚拟现实研究院股份有限公司 Three-dimensional coordinate determination method and device, electronic equipment and storage medium
CN114581513A (en) * 2022-03-07 2022-06-03 清华大学 Space coordinate positioning method and device and electronic equipment
CN114581513B (en) * 2022-03-07 2024-04-19 清华大学 Space coordinate positioning method and device and electronic equipment
CN115862074A (en) * 2023-02-28 2023-03-28 科大讯飞股份有限公司 Human body direction determining method, human body direction determining device, screen control method, human body direction determining device and related equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination