CN114612605A - Three-dimensional human body reconstruction method and device

Info

Publication number: CN114612605A
Application number: CN202210089804.8A
Authority: CN
Inventors: 朱翔昱; 雷震; 廖婷婷
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2022-01-25
Filing date: 2022-01-25
Publication date: 2022-06-10

Abstract

The invention provides a three-dimensional human body reconstruction method and device. The method comprises the following steps: acquiring an image of a target human body example as a target image; inputting the target image into a three-dimensional human body reconstruction model, and acquiring a three-dimensional human body reconstruction result of the target human body example output by the three-dimensional human body reconstruction model. The three-dimensional human body reconstruction model is used for performing three-dimensional human body reconstruction on the target human body example based on image characteristics and skin parameters corresponding to each preset sampling point, wherein the image characteristics are determined based on the target image, the skin parameters are determined based on the corresponding image characteristics, and all the preset sampling points are uniformly distributed in a three-dimensional space. The three-dimensional human body reconstruction method and device provided by the invention can acquire the image of the target human body example without constraints and without relying on an image sensor with calibrated camera parameters, so that three-dimensional human body reconstruction can be performed more simply and efficiently and is applicable to a wider range of scenes.

Description

Three-dimensional human body reconstruction method and device

Technical Field

The invention relates to the technical field of computer vision, in particular to a three-dimensional human body reconstruction method and a three-dimensional human body reconstruction device.

Background

Three-dimensional human reconstruction is a key problem in the field of computer graphics and computer vision. At present, three-dimensional human body reconstruction is widely applied to various fields of virtual reality AR/VR, movie and television entertainment, virtual fitting, industrial manufacturing and the like.

The existing three-dimensional human body reconstruction method can construct a three-dimensional model of a human body example based on a 2D image comprising the human body example. However, the existing three-dimensional human body reconstruction methods have strong limitations, such as: the 2D image needs to be acquired by an image sensor with calibrated camera parameters; or, the 2D image needs to be acquired by a laser scanner or a multi-camera array system, which is expensive, so that the existing three-dimensional human body reconstruction method is difficult to be widely applied to various scenes.

Disclosure of Invention

The invention provides a three-dimensional human body reconstruction method and a three-dimensional human body reconstruction device, which are used for overcoming the defect that the prior art is difficult to apply widely across scenes, so that three-dimensional human body reconstruction is applicable to a wider range of scenes.

The invention provides a three-dimensional human body reconstruction method, which comprises the following steps:

acquiring an image of a target human body example as a target image;

inputting the target image into a three-dimensional human body reconstruction model, and acquiring a three-dimensional human body reconstruction result of the target human body example output by the three-dimensional human body reconstruction model;

the three-dimensional human body reconstruction model is obtained by training based on a three-dimensional model of a sample human body example and a sample image, wherein the sample image comprises the three-dimensional model of the sample human body example;

the three-dimensional human body reconstruction model is used for performing three-dimensional human body reconstruction on the target human body example based on image characteristics and skin parameters corresponding to each preset sampling point, the image characteristics are determined based on the target image, the skin parameters are determined based on the corresponding image characteristics, and all the preset sampling points are uniformly distributed in a three-dimensional space.

According to the three-dimensional human body reconstruction method provided by the invention, the three-dimensional model of the sample human body example comprises a first sample model, a second sample model and a third sample model of the sample human body example, the postures of the first sample model and the third sample model are the same, the posture of the second sample model is a preset standard posture, and the acquisition modes of the first sample model and the third sample model are different; the sample image includes the first sample model;

the three-dimensional human body reconstruction model comprises: an image feature extraction layer, a skin parameter extraction layer, an attention mechanism layer and a result output layer;

correspondingly, the inputting the target image into a three-dimensional human body reconstruction model to obtain a three-dimensional human body reconstruction result of the target human body instance output by the three-dimensional human body reconstruction model specifically includes:

inputting the target image and the position information of each preset sampling point into the image feature extraction layer, and acquiring the image feature corresponding to each preset sampling point output by the image feature extraction layer;

inputting the position information of each preset sampling point and the corresponding image characteristics into the skin parameter extraction layer, and acquiring skin parameters corresponding to each preset sampling point output by the skin parameter extraction layer;

inputting the position information of each preset sampling point, the corresponding image characteristics and the skin parameters into the attention mechanism layer, and acquiring identification information of each preset sampling point output by the attention mechanism layer, wherein the identification information is used for indicating that each preset sampling point is or is not positioned in the three-dimensional human body model of the target human body example;

and inputting the identification information of each preset sampling point into the result output layer, and acquiring the three-dimensional human body reconstruction result output by the result output layer, wherein the three-dimensional human body reconstruction result comprises a first model of the target human body example, and the posture of the first model is the standard posture.
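As an illustrative sketch only, and not as a limitation of the method, the data flow through the four layers described above can be written as a chain of calls. The function name `reconstruct` and the four callables passed to it are hypothetical placeholders introduced here; the toy stand-ins below only show which outputs feed which inputs and are not the trained layers.

```python
def reconstruct(image, points, feature_layer, skin_layer, attention_layer, output_layer):
    """Chain the four layers in the order described above (all callables are placeholders)."""
    features = feature_layer(image, points)                   # image feature per sampling point
    skin_params = skin_layer(points, features)                # skin parameters per sampling point
    inside = attention_layer(points, features, skin_params)   # identification information
    return output_layer(inside)                               # reconstruction result


# Toy stand-ins, only to show the data flow; they are not the trained layers.
toy_result = reconstruct(
    image=None,
    points=[(0.0, 0.0, 0.0), (0.4, 0.4, 0.4)],
    feature_layer=lambda img, pts: [sum(p) for p in pts],
    skin_layer=lambda pts, feats: [f * 0.5 for f in feats],
    attention_layer=lambda pts, feats, skins: [f < 1.0 for f in feats],
    output_layer=lambda flags: flags,
)
```

Passing the layers as arguments keeps the sketch independent of any particular network implementation.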

According to the three-dimensional human body reconstruction method provided by the invention, the image feature extraction layer comprises: a data processing unit, a position updating unit and a feature extraction unit;

correspondingly, the inputting the target image and the position information of each preset sampling point into the image feature extraction layer to obtain the image feature corresponding to each preset sampling point specifically includes:

inputting the target image into the data processing unit, and acquiring a second model of the target human body example output by the data processing unit, wherein the second model carries image posture parameters corresponding to the target image;

inputting the target image, the image posture parameters and the position information of each preset sampling point into the position updating unit, and acquiring the updated position information of each preset sampling point output by the position updating unit, wherein the updated position information corresponds to the real posture of the target human body example in the target image;

and inputting the target image and the updated position information of each preset sampling point into the feature extraction unit, and acquiring the image features corresponding to each preset sampling point output by the feature extraction unit.

According to the three-dimensional human body reconstruction method provided by the invention, after the position information, the corresponding image characteristics and the skin parameters of each preset sampling point are input into the attention mechanism layer and the identification information of each preset sampling point output by the attention mechanism layer is obtained, the method further comprises the following steps:

and inputting the identification information of each preset sampling point and the second model into the result output layer, and acquiring the three-dimensional human body reconstruction result output by the result output layer, wherein the three-dimensional human body reconstruction result comprises a third model of the target human body example, and the posture of the third model is the real posture of the target human body example in the target image.

According to the three-dimensional human body reconstruction method provided by the invention, the loss function of the three-dimensional human body reconstruction model comprises a skin parameter loss function;

the skin parameter loss function is determined based on the predicted skin parameters corresponding to the sample sampling points and the skin parameter labels; the predicted skin parameters are output by the skin parameter extraction layer in training after the position information of the sample sampling points and the corresponding image characteristics are input into the skin parameter extraction layer in training; the image characteristics corresponding to the sample sampling points are determined based on the sample image.

According to the three-dimensional human body reconstruction method provided by the invention, the loss function of the three-dimensional human body reconstruction model comprises a classification loss function;

the classification loss function is determined based on the predicted identification information and the identification information labels of the sample sampling points; the predicted identification information of the sample sampling points is output by the attention mechanism layer in training after the position information of the sample sampling points, the corresponding image characteristics and the predicted skin parameters are input into the attention mechanism layer in training.
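The description does not state the concrete form of either loss. Purely as an assumed illustration, the sketch below uses a mean absolute error for the skin parameter loss and a binary cross-entropy for the classification loss; the function names and both choices are hypothetical and other forms would fit the description equally well.

```python
import numpy as np


def skin_parameter_loss(pred, label):
    """Mean absolute error between predicted skin parameters and labels (assumed form)."""
    return float(np.mean(np.abs(pred - label)))


def classification_loss(pred_prob, label, eps=1e-7):
    """Binary cross-entropy between predicted probabilities and 0/1 labels (assumed form)."""
    p = np.clip(pred_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))


# Toy values: two sampling points, two skin parameters each.
pred_skin = np.array([[0.7, 0.3], [0.2, 0.8]])
label_skin = np.array([[1.0, 0.0], [0.0, 1.0]])
pred_id = np.array([0.5, 0.5])
label_id = np.array([1.0, 0.0])

total_loss = skin_parameter_loss(pred_skin, label_skin) + classification_loss(pred_id, label_id)
```

How the two terms are weighted against each other is not specified in the description; the unweighted sum here is only a placeholder.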

The present invention also provides a three-dimensional human body reconstruction device, comprising:

the image acquisition module is used for acquiring an image of a target human body example as a target image;

the three-dimensional reconstruction module is used for inputting the target image into a three-dimensional human body reconstruction model and acquiring a three-dimensional human body reconstruction result of the target human body example output by the three-dimensional human body reconstruction model.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of any one of the three-dimensional human body reconstruction methods.

The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the three-dimensional human reconstruction method as described in any one of the above.

The invention also provides a computer program product comprising a computer program which, when executed by a processor, carries out the steps of the three-dimensional body reconstruction method as described in any one of the above.

According to the three-dimensional human body reconstruction method and device provided by the invention, an image of a target human body example is acquired as a target image and input into the three-dimensional human body reconstruction model. The three-dimensional human body reconstruction model obtains the image characteristics corresponding to each preset sampling point based on the target image, obtains the skin parameters corresponding to each preset sampling point based on the corresponding image characteristics, and performs three-dimensional human body reconstruction on the target human body example based on the image characteristics and skin parameters corresponding to each preset sampling point, so that the three-dimensional human body reconstruction result of the target human body example output by the three-dimensional human body reconstruction model is obtained. The three-dimensional human body reconstruction model is obtained by training based on a three-dimensional model of a sample human body example and a sample image comprising the three-dimensional model of the sample human body example, and the preset sampling points are uniformly distributed in the three-dimensional space. The image of the target human body example can therefore be acquired without constraints and without relying on an image sensor with calibrated camera parameters, an expensive laser scanner or a multi-camera array system, so that three-dimensional human body reconstruction can be performed more simply and efficiently and is applicable to a wider range of scenes.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a three-dimensional human body reconstruction method provided by the present invention;

FIG. 2 is a schematic structural diagram of a three-dimensional human body reconstruction model in the three-dimensional human body reconstruction method provided by the present invention;

FIG. 3 is a schematic flow chart of acquiring a sample image in the three-dimensional human body reconstruction method provided by the present invention;

FIG. 4 is a schematic flow chart of training a three-dimensional human body reconstruction model in the three-dimensional human body reconstruction method provided by the present invention;

FIG. 5 is a schematic structural diagram of a three-dimensional human body reconstruction device provided by the present invention;

FIG. 6 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be a fixed connection, a removable connection, or an integral connection; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.

Fig. 1 is a schematic flow chart of a three-dimensional human body reconstruction method provided by the invention. The three-dimensional human body reconstruction method of the present invention is described below with reference to fig. 1. As shown in fig. 1, the method includes: Step 101, acquiring an image of a target human body example as a target image.

Specifically, the target human body example is a human body example which needs to be subjected to three-dimensional human body reconstruction.

The image of the target human body example can be acquired as the target image by a conventional image sensor or by an electronic device with an image acquisition function. For example, a mobile phone with an image acquisition function can be used to acquire an image of the target human body example as the target image; alternatively, a conventional camera can be used to acquire an image of the target human body example as the target image.

It should be noted that, in the embodiment of the present invention, when the image of the target human body example is acquired, there is no need to be constrained by the camera parameters and other conditions. Compared with the traditional three-dimensional human body reconstruction method which needs to rely on an image sensor with calibrated camera parameters, a laser scanner with high price or a multi-camera array system to obtain the image of the target human body example, the three-dimensional human body reconstruction method provided by the invention can obtain the image of the target human body example without constraints, thereby improving the application range of the three-dimensional human body reconstruction method and enabling the three-dimensional human body reconstruction method provided by the invention to be more widely applicable to various scenes.

It should be noted that the number of images of the target human body instance may be one or more. In the case of acquiring multiple frames of images or a segment of video of the target human body example, each frame of image of the target human body example or each frame of image in the video may be used as the target image. Accordingly, the number of target images may be one or more.

Step 102, inputting the target image into the three-dimensional human body reconstruction model, and acquiring a three-dimensional human body reconstruction result of the target human body example output by the three-dimensional human body reconstruction model.

The three-dimensional human body reconstruction model is obtained after training based on a three-dimensional model of a sample human body example and a sample image, and the sample image comprises the three-dimensional model of the sample human body example.

The three-dimensional human body reconstruction model is used for performing three-dimensional human body reconstruction on a target human body example based on image characteristics and skin parameters corresponding to each preset sampling point, the image characteristics are determined based on a target image, the skin parameters are determined based on corresponding image characteristics, and all the preset sampling points are uniformly distributed in a three-dimensional space.

It should be noted that before the target image is input into the three-dimensional human body reconstruction model and the three-dimensional human body reconstruction result of the target human body example output by the three-dimensional human body reconstruction model is obtained, the three-dimensional human body reconstruction model may be trained based on the three-dimensional model of the sample human body example and the sample image including the three-dimensional model of the sample human body example, so as to obtain the trained three-dimensional human body reconstruction model.

Optionally, the three-dimensional human body reconstruction model may be trained as follows. First, a three-dimensional model of a sample human body example may be obtained in a number of ways; for example, it may be obtained based on a conventional laser scanner, the SMPL algorithm, and the like. There may be a plurality of three-dimensional models of the sample human body example. Second, an image including the three-dimensional model of the sample human body example may be obtained as a sample image. Finally, the three-dimensional human body reconstruction model is trained based on the sample image and the three-dimensional model of the sample human body example to obtain the trained three-dimensional human body reconstruction model.

After the trained three-dimensional human body reconstruction model is obtained, the target image can be input into the trained three-dimensional human body reconstruction model.

The trained three-dimensional human body reconstruction model can acquire image features corresponding to each preset sampling point based on a target image, skin parameters corresponding to each preset sampling point based on the image features corresponding to each preset sampling point, perform three-dimensional human body reconstruction on a target human body example based on the image features and the skin parameters corresponding to each preset sampling point, and acquire and output a three-dimensional human body reconstruction result of the target human body example.

It should be noted that all the preset sampling points are uniformly distributed in the three-dimensional space, and the distances between any two adjacent preset sampling points are equal to form a three-dimensional lattice, and the greater the density of the three-dimensional lattice is, the more accurate the obtained three-dimensional human body reconstruction result of the target human body example is.

Optionally, in an embodiment of the present invention, the preset sampling points form a three-dimensional cubic lattice, where the three-dimensional cubic lattice includes 256 × 256 × 256 preset sampling points.
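A uniformly spaced cubic lattice of preset sampling points can be generated as in the following sketch. The function name `make_sampling_grid`, the coordinate range [-0.5, 0.5] and the small resolution used for illustration are assumptions made here, not values fixed by the embodiment.

```python
import numpy as np


def make_sampling_grid(resolution, low=-0.5, high=0.5):
    """Return (resolution**3, 3) points on a uniform cubic lattice in [low, high]^3."""
    axis = np.linspace(low, high, resolution)           # equally spaced coordinates on one axis
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)


# A small resolution keeps the example cheap; a denser lattice costs resolution**3 points.
grid = make_sampling_grid(4)
```

Because the point count grows with the cube of the resolution, a denser lattice trades memory and inference time for a finer reconstruction.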

In the embodiment of the invention, an image of a target human body example is acquired as a target image and input into the three-dimensional human body reconstruction model. The three-dimensional human body reconstruction model obtains the image characteristics corresponding to each preset sampling point based on the target image, obtains the skin parameters corresponding to each preset sampling point based on the corresponding image characteristics, and performs three-dimensional human body reconstruction on the target human body example based on the image characteristics and skin parameters corresponding to each preset sampling point, so that the three-dimensional human body reconstruction result of the target human body example output by the three-dimensional human body reconstruction model is obtained. The three-dimensional human body reconstruction model is obtained by training based on a three-dimensional model of a sample human body example and a sample image comprising the three-dimensional model of the sample human body example, and the preset sampling points are uniformly distributed in the three-dimensional space. The image of the target human body example can therefore be acquired without constraints and without relying on an image sensor with calibrated camera parameters, an expensive laser scanner or a multi-camera array system, so that three-dimensional human body reconstruction can be performed more simply and efficiently and is applicable to a wider range of scenes.

Based on the content of each embodiment, the three-dimensional model of the sample human body example comprises a first sample model, a second sample model and a third sample model of the sample human body example, the postures of the first sample model and the third sample model are the same, the posture of the second sample model is a preset standard posture, and the obtaining modes of the first sample model and the third sample model are different; the sample image includes a first sample model.

Fig. 2 is a schematic structural diagram of a three-dimensional human body reconstruction model in the three-dimensional human body reconstruction method provided by the present invention. As shown in fig. 2, the three-dimensional human body reconstruction model includes: an image feature extraction layer, a skin parameter extraction layer, an attention mechanism layer and a result output layer.

It should be noted that a sample space coordinate system may be constructed in a three-dimensional space including the three-dimensional model of the sample human body example, and the coordinates of any point in the sample space coordinate system may be used as the position information of that point.

Specifically, the sample human body example can be scanned with a laser scanner to obtain a first sample model of the sample human body example. A third sample model of the sample human body example may be constructed based on the SMPL algorithm. The first sample model and the third sample model need to be closed models without holes. The third sample model constructed based on the SMPL algorithm carries the skin parameters corresponding to any point on the surface of the third sample model and the sample image posture parameters corresponding to the sample image.

When the laser scanner scans the sample human body example, the posture of the sample human body example is the sample posture of the sample human body example. The poses of the first and third sample models are both the sample poses described above.

It is noted that the first sample model may include the clothing folds of the sample human body example, whereas the third sample model does not include the clothing folds of the sample human body example.

Optionally, images of different scenes may also be acquired as background images. In order to give the three-dimensional human body reconstruction model better generalization, the acquired background images should reflect common scenes in daily life as realistically and as diversely as possible.

Fig. 3 is a schematic flow chart of acquiring a sample image in the three-dimensional human body reconstruction method provided by the present invention. As shown in fig. 3, data processing is performed on the first sample model to obtain the processed first sample model and the sample camera projection parameters corresponding to the first sample model. A background image is randomly selected, cropped and scaled to a preset size of 512 × 512, and the processed first sample model is rendered onto the background image by orthographic projection to obtain a sample image including the first sample model; a cut-out image, a depth image and a normal vector image of the first sample model are rendered at the same time.

After the processed first sample model has been rendered onto a background image, it may be determined whether all background images have been selected. If not, another background image may be randomly selected and processed, and the processed first sample model is rendered onto it. This is repeated until all background images have been selected, whereby a sample image set is obtained.

Optionally, the data processing of the first sample model specifically includes the following steps. First, the first sample model is normalized: it is translated in the sample space coordinate system so that the central point of the first sample model coincides with a preset base point in the sample space coordinate system, where the central point of the first sample model is determined based on the geometric features of the first sample model, and the preset base point may be the origin of the sample space coordinate system. Second, the first sample model is scaled to the range [-0.5, 0.5]. Finally, the first sample model is randomly rotated, translated and scaled along the y-axis.
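The normalization and augmentation steps above can be sketched as follows, operating on an array of vertex coordinates. Several details are assumptions made for illustration: the central point is taken as the bounding-box center, the scale uses the longest bounding-box side, and only a rotation about the y-axis is shown for the random augmentation. The function names are hypothetical.

```python
import numpy as np


def normalize_vertices(vertices):
    """Center vertices at the origin and scale them into [-0.5, 0.5]^3."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    center = (lo + hi) / 2.0      # assumed central point: bounding-box center
    extent = (hi - lo).max()      # longest bounding-box side
    return (vertices - center) / extent


def rotate_about_y(vertices, angle_rad):
    """Rotate vertices about the y-axis by angle_rad (radians)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return vertices @ rot.T


rng = np.random.default_rng(0)
verts = rng.uniform(-3.0, 7.0, size=(100, 3))   # stand-in for scanned mesh vertices
normalized = normalize_vertices(verts)
augmented = rotate_about_y(normalized, rng.uniform(0.0, 2.0 * np.pi))
```

Dividing by the longest side preserves the aspect ratio of the model, so only that axis spans the full range.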

By using the SCANimate model, a second sample model of the sample human body example in the standard posture can be constructed based on the first sample model, and the skin parameters corresponding to any point on the second sample model can be obtained.

Fig. 4 is a schematic flow chart of training a three-dimensional human body reconstruction model in the three-dimensional human body reconstruction method provided by the present invention, and as shown in fig. 4, after obtaining a first sample model, a second sample model, a third sample model of a sample human body example and a sample image including the first sample model, the three-dimensional human body reconstruction model may be trained based on the first sample model, the second sample model, the third sample model and the sample image.

It should be noted that, because a three-dimensional space can be sampled at arbitrarily fine resolution, training the three-dimensional human body reconstruction model with purely uniform sampling is very inefficient. Sampling only within a certain distance of the surface of the three-dimensional model of the sample human body example helps the three-dimensional human body reconstruction model in training find the decision boundary, but this sampling mode can cause over-fitting of the three-dimensional human body reconstruction model. Therefore, in the embodiment of the invention, the surface of the three-dimensional model of the sample human body example and the three-dimensional space near the surface are sampled in a certain proportion, so that the three-dimensional human body reconstruction model converges quickly and has better generalization capability.

Specifically, the second sample model is translated in the sample space coordinate system such that a center point of the second sample model coincides with a preset base point in the sample space coordinate system.

M₁ first sample sampling points are determined on the surface of the second sample model, and M₂ second sample sampling points are uniformly determined in the three-dimensional space near the surface of the second sample model. The first sample sampling points and the second sample sampling points together serve as the sample sampling points, where M₁ > M₂. Optionally, M₁ : M₂ = 16 : 1.

After the sample sampling points are determined, each first sample sampling point may be labeled 1 as its identification information label, indicating that the first sample sampling point is located on the surface of the second sample model. Each second sample sampling point may be labeled 0 as its identification information label, indicating that the second sample sampling point is not located on the surface of the second sample model. The identification information labels of the sample sampling points are thereby obtained.

After the sample sampling points are determined, the coordinates of each first sample sampling point can be obtained, and a normally distributed noise perturbation with a mean of 0 and a variance of 0.05 is added to these coordinates to give the position information of the first sample sampling point. The coordinates of each second sample sampling point may be used directly as the position information of the second sample sampling point. The position information of the sample sampling points is thereby obtained.
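The sampling and labelling described above can be sketched as follows. For illustration the surface is assumed to be available as an array of surface points, and the region "near the surface" is approximated by a bounding box; both are assumptions, as is the function name `build_sample_points`. The noise uses a standard deviation equal to the square root of the stated variance of 0.05.

```python
import numpy as np


def build_sample_points(surface_points, m1, m2, bounds=(-0.5, 0.5), noise_var=0.05, seed=0):
    """Return (m1 + m2, 3) sample positions and their 0/1 identification labels."""
    rng = np.random.default_rng(seed)
    # First sample sampling points: drawn from the surface, then perturbed.
    idx = rng.integers(0, len(surface_points), size=m1)
    first = surface_points[idx]
    first = first + rng.normal(0.0, np.sqrt(noise_var), size=first.shape)
    # Second sample sampling points: uniform in a box (assumed stand-in for "near the surface").
    second = rng.uniform(bounds[0], bounds[1], size=(m2, 3))
    points = np.concatenate([first, second], axis=0)
    labels = np.concatenate([np.ones(m1), np.zeros(m2)])   # 1 for first points, 0 for second
    return points, labels


surface = np.random.default_rng(1).uniform(-0.3, 0.3, size=(200, 3))  # toy surface points
points, labels = build_sample_points(surface, m1=1600, m2=100)        # ratio 16 : 1
```

Fixing the seed makes the sampled set reproducible, which is convenient when comparing training runs.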

After the position information of the sample sampling point is obtained, the position of the sample sampling point may be updated based on the third sample model, and the updated position information of the sample sampling point is obtained. The updated position information of the sample sampling points may correspond to the pose of the first sample model.

Specifically, the third sample model is translated in the sample space coordinate system, so that a central point of the third sample model and a central point of the second sample model coincide with a preset base point in the sample space coordinate system, and the second sample model, the third sample model and the sample sampling point are located in the same three-dimensional space.

For any sample sampling point, the point on the surface of the third sample model closest to the sample sampling point is determined as the associated sampling point corresponding to the sample sampling point. The skin parameters corresponding to the associated sampling point are taken as the skin parameter label of the sample sampling point.
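A minimal sketch of this association step is given below. It makes a simplifying assumption: the closest point on the surface is approximated by the closest mesh vertex, found by brute-force search. The function names and the toy two-vertex, two-joint model are hypothetical.

```python
def nearest_vertex_index(point, vertices):
    """Index of the vertex closest to `point` (brute force, squared Euclidean distance)."""
    best_i, best_d = 0, float("inf")
    for i, v in enumerate(vertices):
        d = sum((a - b) ** 2 for a, b in zip(point, v))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def skin_labels(sample_points, vertices, vertex_weights):
    """Copy the skin parameters of each sample point's associated (nearest) vertex."""
    return [list(vertex_weights[nearest_vertex_index(p, vertices)]) for p in sample_points]


# Toy third sample model: two vertices, each with weights for two joints.
vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
vertex_weights = [[1.0, 0.0], [0.25, 0.75]]
labels = skin_labels([[0.1, 0.0, 0.0], [0.9, 0.2, 0.0]], vertices, vertex_weights)
```

For large meshes a spatial index such as a k-d tree would normally replace the brute-force loop.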

The skin parameter labels and the position information of the sample sampling points, together with the sample image posture parameters corresponding to the sample image, are input into a key-point-driven linear blend skinning algorithm, and the updated position information of the sample sampling points is calculated.
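The skinning step above is commonly known as linear blend skinning: each point is moved by a weighted sum of per-joint rigid transforms, p' = Σₖ wₖ(Rₖp + tₖ). The sketch below illustrates that formula only. It assumes the per-joint rotations and translations have already been derived from the posture parameters, which is not shown, and all function names are hypothetical.

```python
import math


def rotate_z(angle):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform(rotation, translation, p):
    """Apply one rigid transform: R p + t."""
    return [sum(rotation[r][c] * p[c] for c in range(3)) + translation[r] for r in range(3)]


def linear_blend_skinning(p, weights, rotations, translations):
    """Blend the per-joint transformed positions of p with the skin weights."""
    out = [0.0, 0.0, 0.0]
    for w, rot, t in zip(weights, rotations, translations):
        q = transform(rot, t, p)
        out = [o + w * qc for o, qc in zip(out, q)]
    return out


# Two joints: one stays fixed, the other rotates by 90 degrees about z.
zero = [0.0, 0.0, 0.0]
posed = linear_blend_skinning(
    [1.0, 0.0, 0.0],
    weights=[0.5, 0.5],
    rotations=[rotate_z(0.0), rotate_z(math.pi / 2)],
    translations=[zero, zero],
)
```

A point influenced equally by both joints ends up halfway between its two transformed positions, which is the smooth blending behaviour the skin parameters are meant to provide.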

The image feature extraction layer in the three-dimensional human body reconstruction model can be constructed based on an Hourglass network. The updated position information of the sample sampling points, the sample image and the camera projection parameters corresponding to the first sample model are input into the image feature extraction layer. Based on the camera projection parameters and the updated position information of the sample sampling points, the image feature extraction layer can project the sample sampling points onto the image features, so that the image features corresponding to the sample sampling points output by the image feature extraction layer can be obtained.

The skin parameter extraction layer in the three-dimensional human body reconstruction model can be constructed based on an MLP network. After the image features corresponding to the sample sampling points are obtained, the position information of the sample sampling points and the corresponding image features can be input into a skin parameter extraction layer in training, and the predicted skin parameters corresponding to the sample sampling points output by the skin parameter extraction layer in training are obtained.

Optionally, the skin parameter extraction layer may obtain the predicted skin parameter corresponding to the sample sampling point based on the following formula:

S(p,I)＝MLP[Φ(p,I),p]

wherein S represents the skin parameter extraction layer; p represents the position information of the sample sampling point; I represents the sample image; Φ represents the image feature extraction model; and MLP represents an MLP network.

And training the skin parameter extraction layer based on the predicted skin parameters and the skin parameter labels corresponding to the sample sampling points to obtain the trained skin parameter extraction layer.

The attention mechanism layer in the three-dimensional human body reconstruction model can be constructed based on a Transformer network. The position information of the sample sampling points, together with the corresponding predicted skin parameters and image features, is input into the attention mechanism layer in training, and the predicted identification information of the sample sampling points output by the attention mechanism layer in training is acquired. The predicted identification information of a sample sampling point may be 1 or 0 and indicates whether the sample sampling point is located inside the second sample model: a value of 1 means that the sample sampling point is located inside the second sample model, and a value of 0 means that it is not.

Optionally, the attention mechanism layer may obtain the predicted identification information of the sample sampling point based on the following formula:

F(p,I)＝H[Φ(p,I),S(p,I),p]

where H denotes an attention mechanism model.

Based on the predicted identification information and the identification information labels of the sample sampling points, the attention mechanism layer can be trained, and the trained attention mechanism layer is obtained.

Correspondingly, inputting the target image into the three-dimensional human body reconstruction model, and obtaining a three-dimensional human body reconstruction result of the target human body example output by the three-dimensional human body reconstruction model, which specifically comprises the following steps: and inputting the target image and the position information of each preset sampling point into the image feature extraction layer, and acquiring the image feature corresponding to each preset sampling point output by the image feature extraction layer.

Specifically, a spatial coordinate system may be established in a three-dimensional space, and the position information of the preset sampling point may be represented by a coordinate of the preset sampling point in the spatial coordinate system (referred to as a coordinate of the preset sampling point for short).

After the target image is obtained, the target image and the coordinates of each preset sampling point can be input into an image feature extraction layer in the three-dimensional human body reconstruction model.

The image feature extraction layer constructed based on the Hourglass network can extract features of a target image, and can acquire image features corresponding to each preset sampling point through a numerical calculation method based on camera projection parameters corresponding to a first human body model, coordinates of each preset sampling point and the extracted image features, so that the image features corresponding to each preset sampling point output by the image feature extraction layer can be acquired.

And inputting the position information of each preset sampling point and the corresponding image characteristics into the skin parameter extraction layer, and acquiring skin parameters corresponding to each preset sampling point output by the skin parameter extraction layer.

Specifically, after the image features corresponding to each preset sampling point are obtained, the position information of each preset sampling point and the corresponding image features may be input into the trained skin parameter extraction layer.

The skin parameter extraction layer constructed based on the MLP network can acquire the skin parameter corresponding to each preset sampling point based on the position information of each preset sampling point and the corresponding image features, so that the skin parameter corresponding to each preset sampling point output by the skin parameter extraction layer can be acquired.

And inputting the position information of each preset sampling point, the corresponding image characteristics and the skin parameters into an attention mechanism layer, and acquiring identification information of each preset sampling point output by the attention mechanism layer, wherein the identification information is used for indicating whether the preset sampling point is positioned or not positioned in the three-dimensional human body model of the target human body example.

Specifically, after obtaining the skin parameter corresponding to each preset sampling point, the position information, the corresponding image feature, and the skin parameter of each preset sampling point may be input into the trained attention mechanism layer.

The attention mechanism layer constructed based on the Transformer network can perform network forward calculation based on the position information of each preset sampling point, the corresponding image characteristics and the skin parameters to acquire the identification information of each preset sampling point, so that the identification information of each preset sampling point output by the attention mechanism layer can be acquired. The identification information of the preset sampling point can be 1 or 0, and the preset sampling point is positioned in the three-dimensional human body model of the target human body example under the condition that the identification information of the preset sampling point is 1; and under the condition that the identification information of the preset sampling point is 0, the preset sampling point is not positioned in the three-dimensional human body model of the target human body example.

And inputting the identification information of each preset sampling point into the result output layer, and acquiring a three-dimensional human body reconstruction result output by the result output layer, wherein the three-dimensional human body reconstruction result comprises a first model of the target human body example, and the posture of the first model is a standard posture.

Specifically, after the identification information of each preset sampling point is acquired, the identification information of each preset sampling point may be input to the result output layer.

The result output layer can generate a first model of the target human body example through the Marching Cubes algorithm based on a preset threshold and the identification information of each preset sampling point. The first model is used as the three-dimensional human body reconstruction result of the target human body example, so that the three-dimensional human body reconstruction result of the target human body example output by the result output layer can be obtained.

It should be noted that the pose of the first model of the target human body example is the standard pose.

Optionally, the preset threshold may be 0.5.
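As a rough illustration of what the result output layer consumes, the sketch below builds preset sampling points on a uniform lattice and applies the 0.5 threshold to per-point scores. It does not implement the surface extraction algorithm itself. The lattice bounds, the strict `>` comparison and the `toy_score` stand-in for the network output are all assumptions made for the example.

```python
def uniform_grid(resolution, lo=-1.0, hi=1.0):
    """Preset sampling points evenly spaced on a resolution^3 lattice in [lo, hi]^3."""
    step = (hi - lo) / (resolution - 1)
    axis = [lo + i * step for i in range(resolution)]
    return [(x, y, z) for x in axis for y in axis for z in axis]


def binarize(scores, threshold=0.5):
    """1 = inside the body model, 0 = not inside, using the preset threshold."""
    return [1 if s > threshold else 0 for s in scores]


def toy_score(p):
    """Stand-in for the network output: indicator of a sphere of radius 0.5."""
    return 1.0 if sum(c * c for c in p) <= 0.25 else 0.0


grid = uniform_grid(5)
occupancy = binarize([toy_score(p) for p in grid])
```

In practice the per-point scores would be reshaped into a volume and passed to an existing surface extraction routine with 0.5 as the iso-level, which then returns the mesh of the first model.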

In the embodiment of the invention, the target image and the position information of each preset sampling point are input into the image feature extraction layer to obtain the image feature corresponding to each preset sampling point; the position information of each preset sampling point and the corresponding image feature are input into the skin parameter extraction layer to obtain the skin parameter corresponding to each preset sampling point; the position information of each preset sampling point, the corresponding image feature and the skin parameter are input into the attention mechanism layer to obtain the identification information, output by the attention mechanism layer, indicating whether each preset sampling point is located inside the three-dimensional human body model of the target human body example; and the identification information of each preset sampling point is input into the result output layer to obtain the three-dimensional human body reconstruction result, output by the result output layer, that includes the first model of the target human body example. The skin parameter of each preset sampling point is obtained through the skin parameter extraction layer constructed based on an MLP network, and the three-dimensional human body reconstruction is driven based on the skin parameter of each preset sampling point; compared with traditional three-dimensional human body reconstruction methods, in which the skin parameters are obtained from a parameterized human body model, the driving is smoother and more flexible. The fusion of multi-frame image features is realized through the attention mechanism layer constructed based on a Transformer network; compared with directly averaging the image features, the image features can be fused adaptively, so that the accuracy of the three-dimensional human body reconstruction can be improved.
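To illustrate the difference between adaptive fusion and a plain average of multi-frame features, the following sketch applies a bare scaled dot-product self-attention across frames and then averages the attended features. It is not the attention mechanism layer of the embodiment: it has no learned projections, and the function name `attention_fuse` is hypothetical.

```python
import math


def attention_fuse(frame_features):
    """Fuse per-frame feature vectors: self-attention across frames, then mean."""
    d = len(frame_features[0])
    attended = []
    for q in frame_features:
        # Similarity of this frame to every frame, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frame_features]
        m = max(scores)
        exp = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exp)
        weights = [e / total for e in exp]
        attended.append([sum(w * k[i] for w, k in zip(weights, frame_features)) for i in range(d)])
    n = len(attended)
    return [sum(f[i] for f in attended) / n for i in range(d)]


# Two frames agree, one disagrees: attention leans toward the agreeing frames.
fused = attention_fuse([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

A plain average of these three frames would give a first component of 2/3; the attention-weighted result is pulled further toward the two mutually consistent frames, which is the sense in which the fusion is adaptive.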

Based on the content of the foregoing embodiments, the image feature extraction layer includes: a data processing unit, a position updating unit and a feature extraction unit.

Correspondingly, inputting the target image and the position information of each preset sampling point into the image feature extraction layer and acquiring the image feature corresponding to each preset sampling point specifically comprises the following steps: inputting the target image into the data processing unit, and acquiring a second model of the target human body example output by the data processing unit, wherein the second model carries the image posture parameters corresponding to the target image.

Specifically, the target image is input to the data processing unit in the image feature extraction layer, and the data processing unit may generate the second model of the target human body instance based on the SMPL algorithm, so that the above-mentioned second model output by the data processing unit may be acquired.

It should be noted that although the pose of the second model generated based on the SMPL algorithm is the same as the real pose of the target human body instance in the target image, the second model is a rough three-dimensional human body model, and is prone to distortion, loss of details, and the like, and therefore the second model cannot be used as the three-dimensional human body reconstruction result of the target human body instance.

It should be noted that, when the data processing unit generates the second model based on the SMPL algorithm, the image posture parameter corresponding to the target image may be acquired, and the second model output by the data processing unit may carry the image posture parameter.

Inputting the target image, the image posture parameters and the position information of each preset sampling point into a position updating unit, and acquiring the updated position information of each preset sampling point output by the position updating unit, wherein the updated position information corresponds to the real posture of the target human body example in the target image.

Specifically, the target image, the image posture parameters and the position information of each preset sampling point are input to the position updating unit in the image feature extraction layer. The position updating unit can input the target image, the image posture parameters and the position information of each preset sampling point into a linear blend skinning algorithm and calculate the updated position information of each preset sampling point. The updated position information of each preset sampling point corresponds to the real posture of the target human body example in the target image.

Specifically, after the updated position information of each preset sampling point is obtained, the target image and the updated position information of each preset sampling point may be input to the feature extraction unit in the image feature extraction layer.

The feature extraction unit can extract features of the target image, and project each preset sampling point to the image features through orthogonal projection, so that the image features corresponding to each preset sampling point output by the feature extraction unit can be obtained.
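A minimal sketch of this projection step follows. It assumes an orthographic camera described by a scale and a 2D offset (hypothetical parameters `scale`, `tx`, `ty`), a single-channel feature map stored as rows of floats, and bilinear interpolation for reading the feature at a non-integer pixel location; none of these details are specified in the text.

```python
def orthographic_project(p, scale, tx, ty):
    """Orthogonal projection: drop the depth coordinate, then scale and shift x, y."""
    return scale * p[0] + tx, scale * p[1] + ty


def bilinear_sample(feature_map, u, v):
    """Read a 2D feature map at continuous pixel coordinates (u = column, v = row)."""
    h, w = len(feature_map), len(feature_map[0])
    u = min(max(u, 0.0), w - 1.0)  # clamp to the map
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0
    top = (1 - fx) * feature_map[y0][x0] + fx * feature_map[y0][x1]
    bottom = (1 - fx) * feature_map[y1][x0] + fx * feature_map[y1][x1]
    return (1 - fy) * top + fy * bottom


feature_map = [[0.0, 1.0], [2.0, 3.0]]
u, v = orthographic_project([0.5, 0.5, 4.0], scale=1.0, tx=0.0, ty=0.0)
value = bilinear_sample(feature_map, u, v)
```

Because the depth coordinate is discarded, all sampling points along the same viewing ray read the same image feature; the position information supplied alongside the feature is what lets later layers tell them apart.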

In the embodiment of the invention, the target image is input into the data processing unit in the image feature extraction layer to acquire the second model of the target human body example output by the data processing unit, together with the image posture parameters corresponding to the target image carried by the second model. The target image, the image posture parameters and the position information of each preset sampling point are input into the position updating unit in the image feature extraction layer to acquire the updated position information of each preset sampling point output by the position updating unit, where the updated position information corresponds to the real posture of the target human body example in the target image. The target image and the updated position information of each preset sampling point are input into the feature extraction unit in the image feature extraction layer to acquire the image feature corresponding to each preset sampling point output by the feature extraction unit. In this way, the rough second model can be used as prior knowledge to provide richer three-dimensional semantic information for the preset sampling points, which can further improve the accuracy of the three-dimensional human body reconstruction.

Based on the content of each embodiment, after inputting the position information of each preset sampling point, the corresponding image feature and the skin parameter into the attention mechanism layer and acquiring the identification information of each preset sampling point output by the attention mechanism layer, the method further includes: and inputting the identification information of each preset sampling point and the second model into a result output layer, and acquiring a three-dimensional human body reconstruction result output by the result output layer, wherein the three-dimensional human body reconstruction result comprises a third model of the target human body example, and the posture of the third model is the real posture of the target human body example in the target image.

Specifically, after the identification information of each preset sampling point output by the attention mechanism layer is obtained, the identification information of each preset sampling point and the second model of the target human body example can be input into the result output layer together.

The result output layer can generate a first model of the target human body example through the Marching Cubes algorithm based on the preset threshold and the identification information of each preset sampling point, and then perform posture deformation on the first model based on the second model to obtain a third model of the target human body example, wherein the third model and the second model have the same posture, namely the real posture of the target human body example in the target image.

Optionally, the preset threshold may be 0.5.

According to the embodiment of the invention, the identification information of each preset sampling point and the second model of the target human body example are input into the result output layer, the three-dimensional human body reconstruction result including the third model of the target human body example output by the result output layer is obtained, and the posture deformation of the first model can be carried out based on the rough second model on the basis of obtaining the first model of the standard posture, so that the third model which is the same as the real posture of the target human body example in the target image can be obtained more accurately and efficiently.

Based on the content of the above embodiments, the loss function of the three-dimensional human body reconstruction model includes: a skin parameter loss function.

The skin parameter loss function is determined based on the predicted skin parameters and the skin parameter labels corresponding to the sample sampling points. The predicted skin parameters are output by the skin parameter extraction layer in training after the position information of the sample sampling points and the corresponding image features are input into it. The image features corresponding to the sample sampling points are determined based on the sample image.

Specifically, the skin parameter loss function may be represented by the following formula:

wherein L_skin represents the skin parameter loss value of the sample sampling points; p represents a set of sample sampling points; S(p,I) represents the predicted skin parameter corresponding to a sample sampling point; and S^*(p,I) represents the skin parameter label corresponding to the sample sampling point. The training target is to minimize the skin parameter loss value L_skin of the sample sampling points.
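The exact form of the skin parameter loss is not reproduced in this text, so the sketch below is only one plausible instance: it assumes the loss is the squared difference between S(p,I) and S^*(p,I), averaged over the sample sampling points. The choice of a squared error, rather than for example an absolute error, is an assumption, and the function name is hypothetical.

```python
def skin_parameter_loss(predicted, labels):
    """Mean over sample points of the squared difference between prediction and label."""
    total = 0.0
    for s_pred, s_true in zip(predicted, labels):
        total += sum((a - b) ** 2 for a, b in zip(s_pred, s_true))
    return total / len(predicted)


# Two sample points, two joints each; the second prediction matches its label.
loss = skin_parameter_loss(
    predicted=[[0.5, 0.5], [1.0, 0.0]],
    labels=[[1.0, 0.0], [1.0, 0.0]],
)
```

Whatever norm is used, the loss is zero exactly when every predicted skin parameter equals its label, which is the condition the training target drives toward.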

It should be noted that, for a specific process of obtaining the predicted skin parameter and the skin parameter label corresponding to the sample sampling point, reference may be made to the contents of the foregoing embodiments, and details are not described here again.

According to the embodiment of the invention, the skin parameter loss function is determined based on the predicted skin parameters and the skin parameter labels corresponding to the sample sampling points, and the three-dimensional human body reconstruction model is trained based on the skin parameter loss function, so that the calculation accuracy of the three-dimensional human body reconstruction model can be improved.

Based on the content of the above embodiments, the loss function of the three-dimensional human body reconstruction model includes: a classification loss function.

Specifically, the classification loss function can be represented by the following formula:

wherein L_3D represents the classification loss value of the sample sampling points; p represents a set of sample sampling points; F(p,I) represents the predicted identification information of a sample sampling point; and F^*(p,I) represents the identification information label of the sample sampling point. The training target is to minimize the classification loss value L_3D of the sample sampling points.
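As with the skin parameter loss, the classification loss formula itself is not reproduced in this text. The sketch below assumes a mean squared error between the predicted identification information F(p,I), taken as a score in [0, 1], and the 0/1 label F^*(p,I). A binary cross-entropy would be an equally plausible choice; the function name is hypothetical.

```python
def classification_loss(predicted, labels):
    """Mean squared error between predicted scores and 0/1 identification labels."""
    return sum((f - t) ** 2 for f, t in zip(predicted, labels)) / len(predicted)


# Three sample points with labels 1, 0, 1 and soft predictions.
loss = classification_loss([0.9, 0.2, 0.6], [1, 0, 1])
```

The least confident prediction (0.6 against a label of 1) contributes the largest term, so minimizing the loss pushes the scores toward the labels.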

It should be noted that, for the specific process of obtaining the predicted identification information and the identification information labels of the sample sampling points, reference may be made to the contents of the foregoing embodiments, and details are not described here again.

Optionally, the skin parameter loss function and the classification loss function may be subjected to weighted summation to obtain a target loss function for training the three-dimensional human body reconstruction model. The target loss function can be expressed by the following formula:

wherein L represents the target loss function value; γ₁ and γ₂ respectively represent weight parameters; and θ represents the learnable model parameters. The training objective is to minimize the target loss function value L.

It should be noted that the weight parameters γ₁ and γ₂ may be predetermined based on a priori knowledge.

When the three-dimensional human body reconstruction model is trained based on the target loss function, the model parameters θ can be optimized based on the obtained target loss function value. While the target loss function value has not converged, training of the three-dimensional human body reconstruction model based on the target loss function continues; once the target loss function value has converged, training is stopped and the model parameters θ are output, yielding the trained three-dimensional human body reconstruction model.
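The weighted summation and the stop-on-convergence rule can be sketched as below. Several details are assumptions made for the example: γ₁ is paired with the skin parameter loss and γ₂ with the classification loss, convergence is declared when the loss changes by less than a tolerance between steps, and a one-parameter quadratic stands in for the real model. The names `target_loss`, `train_until_converged` and `step` are hypothetical.

```python
def target_loss(l_skin, l_3d, gamma1=1.0, gamma2=1.0):
    """Weighted sum of the two loss values (pairing of weights is an assumption)."""
    return gamma1 * l_skin + gamma2 * l_3d


def train_until_converged(step, tol=1e-6, max_iters=1000):
    """Call step() (one optimisation step returning the loss) until the loss stops changing."""
    previous = float("inf")
    for i in range(max_iters):
        loss = step()
        if abs(previous - loss) < tol:
            return loss, i + 1
        previous = loss
    return previous, max_iters


# Toy model with a single learnable parameter theta; the optimum is theta = 3.
theta = [0.0]


def step(lr=0.1):
    grad = 2.0 * (theta[0] - 3.0)      # gradient of the stand-in skin loss
    theta[0] -= lr * grad              # plain gradient descent update
    l_skin = (theta[0] - 3.0) ** 2     # stand-in for the skin parameter loss
    l_3d = 0.0                         # stand-in for the classification loss
    return target_loss(l_skin, l_3d)


final_loss, iters = train_until_converged(step)
```

The `max_iters` cap is a practical safeguard so that the loop still terminates if the loss never settles within the tolerance.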

According to the embodiment of the invention, the classification loss function is determined based on the predicted identification information and the identification information label of the sample sampling point, and the three-dimensional human body reconstruction model is trained based on the classification loss function, so that the calculation accuracy of the three-dimensional human body reconstruction model can be improved.

Fig. 5 is a schematic structural diagram of a three-dimensional human body reconstruction device provided by the invention. The three-dimensional human body reconstruction device provided by the present invention is described below with reference to fig. 5, and the three-dimensional human body reconstruction device described below and the three-dimensional human body reconstruction method provided by the present invention described above may be referred to correspondingly. As shown in fig. 5, the apparatus includes: an image acquisition module 501 and a three-dimensional reconstruction module 502.

An image obtaining module 501, configured to obtain an image of a target human body example as a target image.

The three-dimensional reconstruction module 502 is configured to input the target image into the three-dimensional human body reconstruction model, and obtain a three-dimensional human body reconstruction result of the target human body instance output by the three-dimensional human body reconstruction model.

Specifically, the image acquisition module 501 and the three-dimensional reconstruction module 502 are electrically connected.

The image acquisition module 501 may be used to acquire an image of a target human body instance as a target image, relying on a conventional image sensor or an electronic device with image acquisition functionality.

The three-dimensional reconstruction module 502 may be configured to input the target image into the trained three-dimensional human body reconstruction model, and obtain a three-dimensional human body reconstruction result of the target human body instance output by the three-dimensional human body reconstruction model.

Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor 610, a communication interface 620, a memory 630 and a communication bus 640, wherein the processor 610, the communication interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a three-dimensional human reconstruction method comprising: acquiring an image of a target human body example as a target image; inputting the target image into a three-dimensional human body reconstruction model, and acquiring a three-dimensional human body reconstruction result of a target human body example output by the three-dimensional human body reconstruction model; the three-dimensional human body reconstruction model is obtained by training a three-dimensional model based on a sample human body example and a sample image, wherein the sample image comprises the three-dimensional model of the sample human body example; the three-dimensional human body reconstruction model is used for performing three-dimensional human body reconstruction on a target human body example based on image characteristics and skin parameters corresponding to each preset sampling point, the image characteristics are determined based on a target image, the skin parameters are determined based on corresponding image characteristics, and all the preset sampling points are uniformly distributed in a three-dimensional space.

In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the three-dimensional human body reconstruction method provided by the above methods, the method comprising: acquiring an image of a target human body example as a target image; inputting the target image into a three-dimensional human body reconstruction model, and acquiring a three-dimensional human body reconstruction result of a target human body example output by the three-dimensional human body reconstruction model; the three-dimensional human body reconstruction model is obtained by training a three-dimensional model based on a sample human body example and a sample image, wherein the sample image comprises the three-dimensional model of the sample human body example; the three-dimensional human body reconstruction model is used for performing three-dimensional human body reconstruction on a target human body example based on image characteristics and skin parameters corresponding to each preset sampling point, the image characteristics are determined based on a target image, the skin parameters are determined based on corresponding image characteristics, and all the preset sampling points are uniformly distributed in a three-dimensional space.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for three-dimensional human body reconstruction provided by the above methods, the method comprising: acquiring an image of a target human body example as a target image; inputting the target image into a three-dimensional human body reconstruction model, and acquiring a three-dimensional human body reconstruction result of a target human body example output by the three-dimensional human body reconstruction model; the three-dimensional human body reconstruction model is obtained by training a three-dimensional model based on a sample human body example and a sample image, wherein the sample image comprises the three-dimensional model of the sample human body example; the three-dimensional human body reconstruction model is used for performing three-dimensional human body reconstruction on a target human body example based on image characteristics and skin parameters corresponding to each preset sampling point, the image characteristics are determined based on a target image, the skin parameters are determined based on corresponding image characteristics, and all the preset sampling points are uniformly distributed in a three-dimensional space.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A three-dimensional human body reconstruction method, comprising:

acquiring an image of a target human body example as a target image;

2. The three-dimensional human body reconstruction method according to claim 1, wherein the three-dimensional model of the sample human body instance comprises a first sample model, a second sample model and a third sample model of the sample human body instance, the postures of the first sample model and the third sample model are the same, the posture of the second sample model is a preset standard posture, and the first sample model and the third sample model are obtained in different manners; the sample image comprises the first sample model;

3. The three-dimensional human body reconstruction method according to claim 2, wherein the image feature extraction layer comprises: a data processing unit, a position updating unit and a feature extraction unit;

4. The three-dimensional human body reconstruction method according to claim 3, wherein after the position information, the corresponding image feature and the skin parameter of each preset sampling point are input into the attention mechanism layer and the identification information of each preset sampling point output by the attention mechanism layer is obtained, the method further comprises:

5. The three-dimensional human body reconstruction method according to claim 2, wherein the loss function of the three-dimensional human body reconstruction model comprises: a skin parameter loss function;

6. The three-dimensional human body reconstruction method according to claim 5, wherein the loss function of the three-dimensional human body reconstruction model comprises: a classification loss function;

7. A three-dimensional human body reconstruction device, comprising:

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the three-dimensional human body reconstruction method according to any one of claims 1 to 6.

9. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the three-dimensional human body reconstruction method according to any one of claims 1 to 6.

10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, carries out the steps of the three-dimensional human body reconstruction method according to any one of claims 1 to 6.