WO2022205760A1 - Three-dimensional human body reconstruction method and apparatus, and device and storage medium


Info

Publication number
WO2022205760A1
Authority: WIPO (PCT)
Prior art keywords: human body, model, dimensional, image, texture
Application number: PCT/CN2021/115122
Other languages: French (fr), Chinese (zh)
Inventors: 宋勃宇, 邓又铭, 刘文韬, 钱晨
Original Assignee: 深圳市慧鲤科技有限公司
Application filed by 深圳市慧鲤科技有限公司
Publication of WO2022205760A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques

Definitions

  • the present disclosure relates to image processing technology, and in particular, to a three-dimensional human body reconstruction method, apparatus, device and storage medium.
  • 3D human body reconstruction is an important problem in the field of computer vision and computer graphics.
  • the reconstructed human digital model has important applications in many fields, such as body measurement, virtual fitting, virtual anchor, game character custom design, virtual reality social networking and other fields.
  • how to project the human body in the real world into the virtual world to obtain a three-dimensional human body digital model is an important issue.
  • the digital reconstruction of a 3D human body is very complicated, requiring a scanner to perform continuous scanning around the scanning target from multiple angles without blind spots; moreover, the reconstruction results have the problem that the local reconstruction effect is not fine enough.
  • the embodiments of the present disclosure provide at least a three-dimensional human body reconstruction method, apparatus, device, and storage medium.
  • a three-dimensional human body reconstruction method, comprising: performing geometric reconstruction of the human body based on a human body image of a target human body to obtain a three-dimensional mesh model of the target human body; performing local geometric reconstruction on a local part of the target human body based on the human body image to obtain a three-dimensional mesh model of the local part; fusing the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain an initial three-dimensional model; and performing reconstruction of the human body texture of the target human body according to the initial three-dimensional model and the human body image to obtain a three-dimensional human body model of the target human body.
  • the performing geometric reconstruction of the human body based on the human body image of the target human body to obtain the three-dimensional mesh model of the target human body includes: performing three-dimensional reconstruction on the human body image of the target human body through a first deep neural network branch to obtain a first human body model; performing three-dimensional reconstruction on a partial image in the human body image through a second deep neural network branch to obtain a second human body model, wherein the partial image includes a partial area of the target human body; fusing the first human body model and the second human body model to obtain a fused human body model; and meshing the fused human body model to obtain the three-dimensional mesh model of the target human body.
  • the first deep neural network branch includes: a global feature sub-network and a first fitting sub-network;
  • the second deep neural network branch includes: a local feature sub-network and a second fitting sub-network;
  • the three-dimensional reconstruction of the human body image of the target human body through the first deep neural network branch to obtain a first human body model includes: performing feature extraction on the human body image through the global feature sub-network to obtain first image features;
  • the first human body model is obtained based on the first image features through the first fitting sub-network;
  • the performing three-dimensional reconstruction on the partial image in the human body image through the second deep neural network branch to obtain the second human body model includes: performing feature extraction on the partial image through the local feature sub-network to obtain second image features; and obtaining the second human body model through the second fitting sub-network based on the second image features and the intermediate features output by the first fitting sub-network.
  • the performing local geometric reconstruction on a local part of the target human body based on the human body image of the target human body to obtain the three-dimensional mesh model of the local part includes: performing feature extraction on the human body image of the target human body to obtain third image features; and determining the three-dimensional mesh model of the local part according to the third image features and a three-dimensional topology template of the local part.
  • the fusing the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model includes: obtaining multiple key points of the local part according to the human body image of the target human body; determining the information of first model key points corresponding to the multiple key points on the three-dimensional mesh model of the target human body, and determining the information of second model key points corresponding to the multiple key points on the three-dimensional mesh model of the local part; and fusing, based on the information of the first model key points and the information of the second model key points, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
  • the fusing the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial three-dimensional model includes: determining, based on the information of the first model key points and the information of the second model key points, a coordinate transformation relationship between the three-dimensional mesh model of the target human body and the three-dimensional mesh model of the local part; transforming, according to the coordinate transformation relationship, the three-dimensional mesh model of the local part into the coordinate system of the three-dimensional mesh model of the target human body; and fusing, in the transformed coordinate system, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
  • the human body image includes: a frontal texture and a background image of the target human body; the performing reconstruction of the human body texture of the target human body according to the initial three-dimensional model and the human body image to obtain the three-dimensional human body model of the target human body includes: performing human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask and the frontal texture of the target human body, wherein the first segmentation mask corresponds to the mask area of the frontal texture, and the second segmentation mask corresponds to the mask area of the back texture of the target human body; inputting the frontal texture, the first segmentation mask and the second segmentation mask into a texture generation network to obtain the back texture of the target human body; and obtaining, based on the back texture and the frontal texture, a textured three-dimensional human body model corresponding to the target human body.
  • the training of the texture generation network includes the following processing: performing human body segmentation on images of human body samples in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask and the frontal texture of the human body sample, wherein the first sample segmentation mask corresponds to the mask area of the frontal texture of the human body sample, and the second sample segmentation mask corresponds to the mask area of the back texture of the human body sample; and training an auxiliary texture generation network according to the frontal texture of the human body, a third sample segmentation mask and a fourth sample segmentation mask in an auxiliary human body image, wherein the auxiliary human body image is obtained by reducing the resolution of the image of the human body sample, the third sample segmentation mask corresponds to the mask area of the frontal texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to the mask area of the back texture of the human body in the auxiliary human body image.
  • the local part of the target human body is the face of the target human body; and/or, the human body image is an RGB image.
  • the method further includes: when performing the human body geometric reconstruction based on the human body image of the target human body, also obtaining the human skeleton structure of the target human body; and after obtaining the three-dimensional human body model of the target human body, determining, based on the three-dimensional human body model and the human skeleton structure, skin weights for driving the three-dimensional human body model.
  • a three-dimensional human body reconstruction device comprising:
  • an overall reconstruction module used for performing geometric reconstruction of the human body based on the human body image of the target human body to obtain a three-dimensional mesh model of the target human body;
  • a local reconstruction module configured to perform local geometric reconstruction on the local part of the target human body based on the human body image of the target human body to obtain a three-dimensional mesh model of the local part
  • a fusion processing module configured to fuse the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain an initial 3D model
  • the texture reconstruction module is used for reconstructing the human body texture of the target human body according to the initial three-dimensional model and the human body image, so as to obtain the three-dimensional human body model of the target human body.
  • when the overall reconstruction module is used to obtain the three-dimensional mesh model of the target human body, the processing includes: performing three-dimensional reconstruction on the human body image of the target human body through a first deep neural network branch to obtain a first human body model; performing three-dimensional reconstruction on a partial image in the human body image through a second deep neural network branch to obtain a second human body model, wherein the partial image includes a partial area of the target human body; fusing the first human body model and the second human body model to obtain a fused human body model; and meshing the fused human body model to obtain the three-dimensional mesh model of the target human body.
  • the local reconstruction module is specifically configured to: perform feature extraction on the human body image of the target human body to obtain third image features; according to the third image features and the three-dimensional topology template of the local part , and determine the three-dimensional mesh model of the local part.
  • the fusion processing module is specifically configured to: obtain multiple key points of the local part according to the human body image of the target human body; determine the information of first model key points corresponding to the multiple key points on the three-dimensional mesh model of the target human body, and determine the information of second model key points corresponding to the multiple key points on the three-dimensional mesh model of the local part; and fuse, based on the information of the first model key points and the information of the second model key points, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
  • the fusion processing module, when configured to fuse the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial three-dimensional model, is configured to: determine, based on the information of the first model key points and the information of the second model key points, the coordinate transformation relationship between the three-dimensional mesh model of the target human body and the three-dimensional mesh model of the local part; transform, according to the coordinate transformation relationship, the three-dimensional mesh model of the local part into the coordinate system of the three-dimensional mesh model of the target human body; and fuse, in the transformed coordinate system, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
  • the texture reconstruction module is specifically configured to: perform human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask and a frontal texture of the target human body; wherein the first segmentation mask A segmentation mask corresponds to the mask area of the front texture, and the second segmentation mask corresponds to the mask area of the back texture of the target human body; the front texture, the first segmentation mask and all the The second segmentation mask is input into a texture generation network to obtain the back texture of the target body; based on the back texture and the front texture, a textured 3D body model corresponding to the target body is obtained.
  • the apparatus further includes a model training module for training the texture generation network, configured to: perform human body segmentation on images of human body samples in the training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask and the frontal texture of the human body sample, wherein the first sample segmentation mask corresponds to the mask area of the frontal texture of the human body sample, and the second sample segmentation mask corresponds to the mask area of the back texture of the human body sample; train an auxiliary texture generation network according to the frontal texture of the human body, the third sample segmentation mask and the fourth sample segmentation mask in an auxiliary human body image, wherein the auxiliary human body image is obtained by reducing the resolution of the image of the human body sample, the third sample segmentation mask corresponds to the mask area of the frontal texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to the mask area of the back texture of the human body in the auxiliary human body image; and after the training of the auxiliary texture generation network is completed, train the texture generation network based on the frontal texture of the human body sample, the first sample segmentation mask and the second sample segmentation mask.
  • an electronic device comprising: a memory and a processor, where the memory is used for storing computer-readable instructions, and the processor is used for invoking the computer-readable instructions to implement the method described in any embodiment of the present disclosure.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the method described in any of the embodiments of the present disclosure.
  • a computer program product including a computer program, which implements the method described in any embodiment of the present disclosure when the computer program is executed by a processor.
  • the three-dimensional human body reconstruction method, apparatus, device, and storage medium provided by the embodiments of the present disclosure perform local geometric reconstruction on a local part of the target human body, and fuse the three-dimensional mesh model of the local part obtained by the local geometric reconstruction with the three-dimensional mesh model of the target human body, so that the local parts in the 3D mesh model of the target human body are more clear, fine and accurate, and the reconstruction effect of the local parts is improved; moreover, the method simplifies the user's cooperation process and makes three-dimensional human body reconstruction easier.
  • FIG. 1 shows a flowchart of a three-dimensional human body reconstruction method provided by at least one embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of a manner for obtaining a 3D mesh model based on a single human body image provided by at least one embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of an acquisition process of an initial three-dimensional model provided by at least one embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of a texture reconstruction process provided by at least one embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of a skin weight determination process provided by at least one embodiment of the present disclosure
  • FIG. 6 shows a schematic diagram of a manner for obtaining a three-dimensional mesh model based on a single human body image provided by at least one embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of the principle of texture generation provided by at least one embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of a training process of a texture generation network provided by at least one embodiment of the present disclosure
  • FIG. 9 shows a schematic diagram of a human body image provided by at least one embodiment of the present disclosure.
  • FIG. 10 shows a structural diagram of a three-dimensional human body reconstruction apparatus provided by at least one embodiment of the present disclosure
  • FIG. 11 shows a structural diagram of a three-dimensional human body reconstruction apparatus provided by at least one embodiment of the present disclosure.
  • 3D human body reconstruction has important applications in many fields, including but not limited to the following application scenarios:
  • the realism of some virtual reality application scenarios can be enhanced through 3D human body reconstruction, for example, virtual fitting, virtual cloud meetings, virtual classrooms, etc.
  • the 3D human body model obtained by 3D human body reconstruction can be imported into the game data to complete the generation of the personalized character.
  • 3D human body reconstruction has the following requirements: on the one hand, the user's cooperation process should be simplified as much as possible, to avoid giving the user a bad experience; on the other hand, a 3D human body model with higher accuracy should be obtained as far as possible; for example, in scenarios such as virtual cloud conferences or AR virtual interaction, there is a need for the 3D human body model obtained from 3D human body reconstruction to provide a higher sense of realism and immersion.
  • an embodiment of the present disclosure provides a three-dimensional human body reconstruction method, which aims to perform three-dimensional human body reconstruction of the user based on a photo of the user, simplify the user's cooperation process, and achieve a high-precision reconstruction effect.
  • FIG. 1 shows a flowchart of a three-dimensional human body reconstruction method provided by at least one embodiment of the present disclosure.
  • the method may include steps 100 to 106 .
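As a rough orientation, the four steps below can be sketched as a pipeline of placeholder stages. All function bodies here are hypothetical stand-ins for the reconstruction networks described in this disclosure, not an actual implementation:

```python
def geometric_reconstruction(image):
    # Step 100 stand-in: body-level geometric reconstruction.
    return {"kind": "body_mesh", "src": image}

def local_geometric_reconstruction(image):
    # Step 102 stand-in: local reconstruction of e.g. the face.
    return {"kind": "face_mesh", "src": image}

def fuse(local_mesh, body_mesh):
    # Step 104 stand-in: replace the face region of the body mesh.
    return {"kind": "initial_model",
            "parts": [body_mesh["kind"], local_mesh["kind"]]}

def reconstruct_texture(model, image):
    # Step 106 stand-in: add front/back texture to the initial model.
    return {**model, "textured": True}

def reconstruct_3d_human(image):
    body = geometric_reconstruction(image)       # step 100
    face = local_geometric_reconstruction(image) # step 102
    initial = fuse(face, body)                   # step 104
    return reconstruct_texture(initial, image)   # step 106

print(reconstruct_3d_human("frontal_photo.jpg"))
```

The point of the sketch is only the data flow: one input image feeds both reconstruction branches, and the texture step consumes both the fused geometry and the original image.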
  • in step 100, the geometric reconstruction of the human body is performed based on the single human body image of the target human body to obtain a three-dimensional mesh model of the target human body.
  • the target human body is the user on whom the 3D human body reconstruction is based. For example, if 3D human body reconstruction is performed on user Xiao Zhang, Xiao Zhang can be called the target human body, and the reconstructed 3D human body model is obtained based on Xiao Zhang's body, having high similarity to Xiao Zhang's posture, appearance, clothing and hairstyle.
  • the single human body image is a human body image of the target human body.
  • the embodiment of the present disclosure has no special requirements on the collection method and format of the human body image.
  • the single human body image may be a frontal photograph of the target human body.
  • the single human body image may be an RGB color image.
  • the acquisition cost of an image in this RGB format is low; for example, high-cost equipment such as a depth camera is not necessary during image acquisition, and the image can be acquired by ordinary shooting equipment.
  • the human body geometry can be reconstructed based on the single human body image of the target human body to obtain a three-dimensional mesh model.
  • the three-dimensional mesh model is a three-dimensional mesh (Mesh) representing the human body geometry, where the mesh includes several vertices and faces.
  • the three-dimensional mesh Mesh obtained by the above reconstruction and a pre-stored parameterized human body model can also be aligned and fitted with respect to the posture and body shape.
  • the parameterized human body model includes a mesh of the human body surface and a set of skeletal structures, which are controlled by a set of pose and body shape parameters, and the skeleton position and surface shape of the human body will change as the parameter values change.
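The parameterized model described above can be illustrated with a minimal sketch: surface vertices are a template deformed by shape parameters, and joint positions are regressed from the deformed surface. The template, shape basis and joint regressor below are tiny hypothetical stand-ins, not data from any real parameterized body model:

```python
def apply_shape(template, shape_basis, betas):
    """Deform template vertices: v_i = t_i + sum_k betas[k] * basis[k][i]."""
    deformed = []
    for i, v in enumerate(template):
        offset = [sum(betas[k] * shape_basis[k][i][d] for k in range(len(betas)))
                  for d in range(3)]
        deformed.append(tuple(v[d] + offset[d] for d in range(3)))
    return deformed

def regress_joints(vertices, joint_regressor):
    """Each skeleton joint is a fixed weighted average of surface vertices."""
    joints = []
    for weights in joint_regressor:
        joints.append(tuple(sum(w * v[d] for w, v in zip(weights, vertices))
                            for d in range(3)))
    return joints

# Toy example: two vertices, one shape direction, one joint.
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shape_basis = [[(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]]  # one hypothetical shape direction
verts = apply_shape(template, shape_basis, betas=[0.5])
joint = regress_joints(verts, [[0.5, 0.5]])[0]       # midpoint joint
print(verts)   # [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)]
print(joint)   # (0.5, 0.5, 0.0)
```

This mirrors the property stated above: changing the parameter values (`betas`) moves both the surface shape and, through the regressor, the skeleton positions.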
  • Fig. 2 illustrates a method of obtaining a 3D mesh model based on a single human image reconstruction.
  • a single human body image 21 of the target human body can be input into the first deep neural network branch 22 for three-dimensional reconstruction.
  • the first deep neural network branch 22 may include a global feature sub-network 221 and a first fitting sub-network 222 .
  • the features of the single human body image 21 can be extracted through the global feature sub-network 221 to obtain high-level image features of the single human body image 21, and the high-level image features can be referred to as first image features.
  • the global feature sub-network 221 may be an Hourglass convolutional network.
  • the first image feature is input to the first fitting sub-network 222, and the first fitting sub-network 222 can predict whether each voxel block in the three-dimensional space belongs to the interior of the target human body according to the first image feature.
  • the first fitting sub-network 222 may be a multilayer perceptron structure.
  • the first fitting sub-network 222 outputs a first human body model including each three-dimensional voxel block located inside the target human body.
  • the meshing process may continue to be performed on the first human body model.
  • the meshing process may be to apply the Marching Cubes algorithm in the voxel space to the first human body model to obtain a three-dimensional mesh model of the target human body.
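The occupancy-style fitting step can be sketched as follows: a predictor labels each voxel of a 3D grid as inside or outside the body, and a surface extractor such as Marching Cubes would then mesh the 0.5 level set. The network is replaced here by a stand-in occupancy function (a unit sphere), so this only illustrates the voxel-query pattern:

```python
def occupancy(x, y, z):
    """Stand-in for the first fitting sub-network's inside/outside score."""
    return 1.0 if x * x + y * y + z * z <= 1.0 else 0.0

def voxelize(res, predict):
    """Evaluate the predictor on a res^3 grid spanning [-1, 1]^3."""
    grid = {}
    for i in range(res):
        for j in range(res):
            for k in range(res):
                p = [-1.0 + 2.0 * n / (res - 1) for n in (i, j, k)]
                grid[(i, j, k)] = predict(*p)
    return grid

grid = voxelize(9, occupancy)
inside = sum(1 for v in grid.values() if v > 0.5)
print(inside)  # number of voxel samples classified as inside the sphere
```

In practice `predict` would condition on the first image features, and a library routine (e.g. a Marching Cubes implementation) would turn the occupancy grid into the vertices and faces of the mesh.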
  • in step 102, based on the single human body image of the target human body, a local high-definition geometric reconstruction is performed on a local part of the target human body to obtain a three-dimensional mesh model of the local part.
  • the three-dimensional mesh model of the target human body reconstructed in step 100 may be blurred in local parts of the target human body.
  • the local part may be a human face, or may be other local parts, such as a hand and other parts that need to reflect detailed features.
  • the above-mentioned 3D mesh model is relatively blurred in the face details of the target human body, yet the face is usually the area to which the user pays more attention; therefore, in this step, the local parts of the target human body can be individually geometrically reconstructed.
  • the reconstruction of the human face can use fine reconstruction with a fixed topology; that is, based on the image features obtained by feature extraction from the single human body image of the target human body, the position of each vertex in a three-dimensional topology template of the human face is fitted to obtain a three-dimensional mesh model of the face.
  • the semantic structure of the human face is consistent, so a 3D human face with a fixed topology structure can be used as a template, and the template can be called a 3D topology template of the human face.
  • the template includes a plurality of vertices, and each vertex fixedly corresponds to a face semantic; for example, one vertex represents the tip of the nose, and another vertex represents the corner of the eye.
  • each vertex position of the above-mentioned three-dimensional topology template of the face can be obtained by regression through a deep neural network.
  • the deep neural network may include a deep convolutional network and a graph convolutional network; a single human body image of the target human body may be input into the deep convolutional network to extract image features, and the extracted features may be referred to as third image features. Then the third image features and the 3D topology template of the face are used as the input of the graph convolutional network, and finally a 3D mesh model of the face output by the graph convolutional network is obtained; this 3D mesh model is closer to the target human face.
  • the input of the deep convolutional network may also be a partial image area containing a face captured from a single human body image of the target human body.
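The fixed-topology fitting described above can be sketched minimally: the template's vertices keep their semantics while a regressed offset moves each one, and a neighbourhood-averaging step stands in for the basic operation a graph convolution applies along the template's fixed edges. The toy template, offsets and edges below are hypothetical, not a real face topology:

```python
def fit_template(template, offsets):
    """Move each template vertex by its regressed offset; the topology
    (which vertex is the nose tip, eye corner, ...) never changes."""
    return [tuple(v[d] + o[d] for d in range(3))
            for v, o in zip(template, offsets)]

def graph_conv_step(vertices, edges, weight=0.5):
    """One neighbourhood-averaging step along the template's fixed edges."""
    nbrs = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    out = []
    for i, v in enumerate(vertices):
        if not nbrs[i]:
            out.append(v)
            continue
        mean = [sum(vertices[j][d] for j in nbrs[i]) / len(nbrs[i])
                for d in range(3)]
        out.append(tuple((1 - weight) * v[d] + weight * mean[d]
                         for d in range(3)))
    return out

# Toy template: two semantic vertices joined by one edge.
template = [(0.0, 0.0, 1.0), (0.3, 0.5, 0.8)]
offsets = [(0.0, -0.1, 0.0), (0.1, 0.0, 0.0)]  # would be regressed from image features
fitted = fit_template(template, offsets)
smoothed = graph_conv_step(fitted, edges=[(0, 1)])
print(fitted)
print(smoothed)
```

A real graph convolutional network would learn the per-vertex updates from the third image features rather than averaging positions, but the fixed vertex/edge structure it operates on is exactly the template property the passage above describes.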
  • in step 104, the 3D mesh model of the local part is fused with the 3D mesh model of the target human body to obtain an initial 3D model.
  • the 3D mesh model of the target human body reconstructed in step 100 may be somewhat blurred in a local part of the human body; taking the local part as a human face as an example, in step 102 the 3D mesh model of the human face is obtained through the separate geometric reconstruction of the face. In this step, the corresponding part of the 3D mesh model of the target human body from step 100 can be replaced by the 3D mesh model of the face, so that information such as head shape, body shape and posture of the 3D mesh model of the target human body is retained, while the facial features become more refined and accurate, achieving a better reconstruction effect.
  • the local part is a human face as an example here; in actual implementation, other local parts can also be independently reconstructed to make them clearer.
  • a single human body image of the target human body may be input into a pre-trained key point detection model, and a plurality of key points of local parts of the target human body in the image may be determined by the key point detection model.
  • as shown in FIG. 3, still taking the human face as the local part as an example, after multiple key points 31 of the human face are acquired, the corresponding model key points can be determined, according to the coordinates of these key points 31, both on the three-dimensional mesh model of the target human body and on the three-dimensional mesh model of the face. The information of the first model key points may include the key point identifier and the key point position of each first model key point; likewise, the information of the second model key points corresponding to the multiple key points on the three-dimensional mesh model of the face may include the key point identifier and the key point position of each second model key point.
  • based on the information of the first model key points and the information of the second model key points, the three-dimensional mesh model of the face can be fused into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
  • fusing the 3D mesh model of the face into the 3D mesh model of the target human body includes: determining the coordinate transformation relationship between the 3D mesh model of the target human body and the 3D mesh model of the face based on the information of the first model key points and the information of the second model key points, combined with the camera extrinsic parameters of the two models; transforming, based on the coordinate transformation relationship, the 3D mesh model of the face into the coordinate system of the 3D mesh model of the target human body; and fusing, in the transformed coordinate system, the 3D mesh model of the face into the 3D mesh model of the target human body.
  • the facial geometry on the 3D mesh model of the target body can be removed.
  • the 3D mesh model of the face and the 3D mesh model of the target body are integrated into a whole by means of Poisson reconstruction, and the obtained model can be called the initial 3D model.
  • the initial 3D model already has relatively clear facial features, similar head shape, body shape and other information, and the accuracy is high.
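The key-point-based alignment above can be sketched as follows: given the same face key points located on both models, estimate a transform mapping the face mesh into the body mesh's coordinate system. For brevity this hedged sketch solves only uniform scale plus translation from the point correspondences; the full method would also recover rotation (e.g. a Procrustes/Kabsch solve) and, as described above, use the camera extrinsic parameters:

```python
def estimate_scale_translation(src_pts, dst_pts):
    """Fit dst ~ s * src + t from corresponding 3D key points."""
    n = len(src_pts)
    cs = [sum(p[d] for p in src_pts) / n for d in range(3)]  # source centroid
    cd = [sum(p[d] for p in dst_pts) / n for d in range(3)]  # target centroid
    # Scale: ratio of total distances to the respective centroids.
    ds = sum(sum((p[d] - cs[d]) ** 2 for d in range(3)) ** 0.5 for p in src_pts)
    dd = sum(sum((p[d] - cd[d]) ** 2 for d in range(3)) ** 0.5 for p in dst_pts)
    s = dd / ds
    t = [cd[d] - s * cs[d] for d in range(3)]
    return s, t

def apply_transform(pt, s, t):
    return tuple(s * pt[d] + t[d] for d in range(3))

# Face key points in the face model's frame vs. the body model's frame (toy data).
src = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
dst = [(1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
s, t = estimate_scale_translation(src, dst)
print(s, t)  # scale halves the face mesh and shifts it onto the body mesh
print([apply_transform(p, s, t) for p in src])
```

Every vertex of the face mesh would be pushed through the same transform before the Poisson fusion step, so both meshes live in one coordinate system.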
  • in step 106, reconstruction of the human body texture of the target human body is performed according to the initial three-dimensional model and the single human body image to obtain a three-dimensional human body model with colored textures of the target human body.
  • since this embodiment performs three-dimensional human body reconstruction based on a single human body image of the target human body, part of the human body area is invisible; for example, if the frontal human body image of the target human body is used for reconstruction, the back of the target human body is invisible, which will cause missing textures. Therefore, in this step, the human body texture of the invisible area of the target human body can be predicted and completed according to the initial three-dimensional model and the single human body image, and fused with the human body texture in the single human body image, and then a textured 3D human body model is generated.
  • a deep learning network can be used to predict the human body back texture 41, and the back texture 41 can be combined with the human body front texture 42 in the single human body image to perform texture mapping on the initial 3D model, that is, to perform texture reconstruction on the initial 3D model.
  • the three-dimensional model 43 in FIG. 4 has mapped the above-mentioned back and front textures of the human body on the initial three-dimensional model.
  • the initial three-dimensional model obtained in step 104 is a mesh (Mesh) of the human body geometry, and this step adds human body texture to the model on the basis of the mesh model.
  • interpolation can be used to fill remaining texture gaps in the model, so as to complete the texture of the initial 3D model and obtain the 3D human body model 44 of the target human body.
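The texture-mapping step above can be sketched per surface point: a point takes its color from the front texture when it faces the camera and from the predicted back texture otherwise, and seam gaps are filled by interpolation. The colors below are hypothetical stand-ins for sampled texture values, and the facing test is simplified to the sign of the normal's z component:

```python
def sample_color(normal_z, front_color, back_color):
    """Pick the source texture from the facing direction of the surface point."""
    return front_color if normal_z > 0.0 else back_color

def blend_seam(c1, c2):
    """Linear interpolation used to fill small texture gaps at the seam."""
    return tuple((a + b) / 2 for a, b in zip(c1, c2))

front = (200, 180, 160)  # e.g. a color sampled from the input photo
back = (90, 90, 120)     # e.g. a color from the predicted back texture
print(sample_color(0.7, front, back))   # (200, 180, 160)
print(sample_color(-0.3, front, back))  # (90, 90, 120)
print(blend_seam(front, back))          # (145.0, 135.0, 140.0)
```

A real implementation would do this in texture space over the whole mesh, but the selection-then-interpolation logic is the same idea as combining textures 41 and 42 and filling the gaps.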
  • the three-dimensional human body reconstruction method of this embodiment performs local geometric reconstruction on a local part of the target human body, and fuses the three-dimensional mesh model of the local part obtained by the local geometric reconstruction with the three-dimensional mesh model of the target human body, so that the local parts in the initial 3D model are more clear, fine and accurate, which improves the reconstruction effect of the local parts; moreover, this method performs reconstruction based on a single human body image of the target human body, which also simplifies the user's cooperation process and makes three-dimensional human body reconstruction easier.
  • the skin weight for driving the three-dimensional human body model can be determined based on the three-dimensional human body model and the human skeleton structure of the target human body.
• The skin weights are used to drive the built three-dimensional human body model. For example, to drive the three-dimensional human body model to perform various actions, the model needs to be bound to the human skeleton structure; binding the model to the skeleton is called skinning. The model can then be driven by the movement of the bones, and the skin weight represents the influence of the skeleton joint points on the model vertices. According to the skin weights, the degree to which each vertex of the three-dimensional human body model is influenced by each skeleton joint point can be controlled, so as to better control the movement of the model.
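The skinning-driven deformation described here is commonly realized as linear blend skinning, where each vertex moves by the weighted combination of its bones' rigid transforms. The following is an illustrative sketch under that assumption, not the disclosure's implementation; all names are hypothetical.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, bone_transforms):
    """vertices: (V, 3); weights: (V, K) skin weights summing to 1 per vertex;
    bone_transforms: (K, 4, 4) per-joint rigid transforms (homogeneous)."""
    V = vertices.shape[0]
    homo = np.hstack([vertices, np.ones((V, 1))])               # (V, 4)
    per_bone = np.einsum('kij,vj->kvi', bone_transforms, homo)  # (K, V, 4)
    blended = np.einsum('vk,kvi->vi', weights, per_bone)        # (V, 4)
    return blended[:, :3]

# One vertex fully bound to a bone that translates +1 along x.
v = np.array([[0.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0]])
T0 = np.eye(4); T0[0, 3] = 1.0   # bone 0: translate x by 1
T1 = np.eye(4)                   # bone 1: identity
print(linear_blend_skinning(v, w, np.stack([T0, T1]))[0])  # [1. 0. 0.]
```

A vertex weighted 0.5/0.5 between the two bones above would move only half the distance, which is exactly the "influence" behaviour the skin weights are meant to control.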
• In some embodiments, calculating the skin weights of the three-dimensional human body model may include the following processing: in step 100, the human skeleton structure has been obtained according to the single human body image of the target human body; in this step, the human skeleton structure and the obtained three-dimensional human body model can be input into a deep learning network, and the skin weights of the model are obtained automatically through the deep learning network.
  • the attribute features corresponding to the vertices in the three-dimensional human body model 51 may be generated first according to the three-dimensional human body model 51 and the human skeleton structure 52 .
  • the attribute feature can be constructed by using the spatial positional relationship between each vertex and the human skeleton structure.
  • the attribute features of the vertex can include the following four features:
• where K is a positive integer (the number of skeleton joint points).
  • the attribute features of each vertex can be used as the input of the spatial graph convolutional attention network in the deep learning network.
  • the above features can be transformed into hidden layer features through a multilayer perceptron.
• The spatial graph convolutional attention network can predict, from the above hidden layer features, the weight with which each of the above K skeleton joint points affects each vertex, and a subsequent multilayer perceptron in the deep learning network can normalize these weights so that, for a given vertex, the influence weights of all skeleton joint points on that vertex sum to 1.
• The weights finally obtained for each vertex of the three-dimensional human body model, one per skeleton joint point, constitute the skin weights of that vertex.
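The normalization step above, making the K per-joint influence weights of one vertex positive and sum to 1, is typically a softmax. A minimal sketch under that assumption (the function name is hypothetical):

```python
import math

def normalize_skin_weights(raw_scores):
    """Softmax over the K per-joint scores of one vertex, so the
    resulting influence weights are positive and sum to 1."""
    m = max(raw_scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]

w = normalize_skin_weights([2.0, 1.0, 0.1])
print(round(sum(w), 6))  # 1.0
```

The ordering of the scores is preserved: the joint with the largest raw score keeps the largest normalized weight.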
• The three-dimensional human body reconstruction method of this embodiment can obtain the human skeleton structure from a single human body image of the target human body and automatically calculate the skin weights according to the human skeleton structure and the reconstructed three-dimensional human body model. This not only ensures that the semantic structure of the skeleton is consistent across different input images, but also quickly generates appropriate skin weights for different clothing and apparel shapes. The semantic consistency of the skeleton also facilitates the registration of the generated model and its skeleton with a ready-made action library.
• For example, the action library can store some human action sequences in advance, such as dancing and boxing; that is, the action library stores a series of skeleton motion sequences.
  • the present disclosure provides a method for three-dimensional human body reconstruction in another embodiment.
• The reconstruction process of this embodiment differs in that the process of human body geometric reconstruction performed on the single human body image of the target human body in step 100 has been improved, so as to improve the geometric reconstruction accuracy of the reconstructed three-dimensional mesh model of the target human body.
  • the same processing steps as the embodiment in FIG. 1 will not be described in detail, and only the differences will be mainly described.
  • a second deep neural network branch 61 is added.
  • the second deep neural network branch 61 may include: a local feature sub-network 611 and a second fitting sub-network 612 .
  • An image of a local area can be extracted from the single human body image 21 of the target human body to obtain a local image 62 , and the second deep neural network is used for three-dimensional reconstruction of the local image 62 .
• The body region of the target human body included in the partial image here may differ from the local part corresponding to the local geometric reconstruction in step 102. For example, the partial image here may include the area above the shoulders of the target human body, while the local part reconstructed in step 102 may be the face of the target human body.
  • the reconstruction above the shoulder of the target human body in FIG. 6 is just an example, and refined geometric reconstruction can also be performed on other human body regions of the target human body.
• During reconstruction, the first human body model is reconstructed through the first deep neural network branch 22, and the partial image 62 is input into the second deep neural network branch 61, where the local feature sub-network 611 performs feature extraction on the partial image to obtain second image features. Then, a second human body model is obtained through the second fitting sub-network 612 based on the second image features and the intermediate features output by the first fitting sub-network 222.
• The intermediate features may be features output by part of the network structure in the first fitting sub-network 222. For example, if the first fitting sub-network 222 includes a certain number of fully connected layers, the outputs of some of those fully connected layers are input into the second fitting sub-network 612 as the intermediate features.
• The structure of the second deep neural network branch 61 may be basically the same as that of the first deep neural network branch 22. For example, the global feature sub-network 221 in the first deep neural network branch 22 may include four blocks, each block including a certain number of feature extraction layers such as convolution layers and pooling layers, and the local feature sub-network 611 in the second deep neural network branch 61 may include one such block.
• After that, the first human body model and the second human body model may be fused to obtain a fused human body model, and the fused human body model is then meshed to obtain a three-dimensional mesh model of the target human body.
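The disclosure does not pin down how the two branch outputs are combined; one simple possibility is to blend the refined local-branch vertices back into the global model. The sketch below is an illustrative assumption (the blending scheme, `alpha`, and all names are hypothetical, not the patented method):

```python
import numpy as np

def fuse_models(global_verts, local_verts, local_idx, alpha=0.7):
    """Blend refined local-branch vertices into the global model.
    local_idx maps each local vertex to its index in the global model;
    alpha is the trust placed in the refined local prediction."""
    fused = global_verts.copy()
    fused[local_idx] = alpha * local_verts + (1.0 - alpha) * global_verts[local_idx]
    return fused

# Global model of 4 vertices; the local branch refines vertices 1 and 2.
g = np.zeros((4, 3))
l = np.ones((2, 3))
print(fuse_models(g, l, [1, 2], alpha=0.5)[1])  # [0.5 0.5 0.5]
```

Vertices outside the local region are left untouched, matching the idea that the second branch only refines the area covered by the partial image.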
• The three-dimensional human body reconstruction method of this embodiment not only improves the reconstruction effect of local parts by performing local geometric reconstruction on the local parts of the target human body, and simplifies the user's cooperation process by reconstructing from a single human body image of the target human body, but also reconstructs the partial image through the second deep neural network branch, which further improves the reconstruction effect of the corresponding local human body region of the target human body.
  • the present disclosure provides a three-dimensional human body reconstruction method in yet another embodiment.
• The reconstruction process of this further embodiment provides a specific method for predicting the back texture of the human body through a deep learning network.
  • the same processing steps as the embodiment in FIG. 1 will not be described in detail, and only the differences will be mainly described.
• A single human body image of the target human body sometimes includes a background image in addition to the frontal texture of the human body. Therefore, image segmentation can be performed first to segment out the frontal texture of the human body, and the back texture of the human body is then predicted based on the frontal texture.
  • the frontal image 71 of the target human body may be segmented to obtain a first segmentation mask 72 and the segmented frontal texture 73 of the target human body.
• The first segmentation mask 72 is horizontally flipped to obtain a second segmentation mask 74; then the front texture 73, the first segmentation mask 72 and the second segmentation mask 74 are input into the texture generation network 75, and the texture generation network 75 finally outputs the back texture of the target human body.
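The horizontal flip exploits the fact that the silhouette of the back view is approximately the mirror image of the front silhouette. A minimal sketch of this step (the function name and the nested-list mask representation are illustrative assumptions):

```python
def flip_mask_horizontally(mask):
    """Mirror a binary segmentation mask left-right: the back-view
    silhouette is approximated by the mirrored front-view silhouette."""
    return [row[::-1] for row in mask]

front = [[0, 1, 1],
         [0, 0, 1]]
print(flip_mask_horizontally(front))  # [[1, 1, 0], [1, 0, 0]]
```

In practice the same flip would be applied to an image-sized mask array before it is stacked with the front texture as network input.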
  • FIG. 7 is an example of obtaining the second segmentation mask 74 by horizontally flipping the first segmentation mask 72.
  • the actual implementation is not limited to this.
• For example, the frontal image of the target human body can be input into a pre-trained neural network that directly outputs the first segmentation mask and the second segmentation mask.
  • the front and back textures of the target human body can be mapped to the initial three-dimensional model of the human body, and the three-dimensional human body model of the target human body can be obtained.
  • the above-mentioned training process of the texture generation network 75 may include the following processing: please refer to FIG. 8 in combination, an auxiliary texture generation network 76 may be used.
  • the auxiliary texture generation network 76 may include a part of the network structure of the texture generation network 75 .
  • the texture generation network 75 may add a certain number of convolution layers to the auxiliary texture generation network 76 .
• The auxiliary texture generation network can be trained according to the auxiliary human body image, the third sample segmentation mask and the fourth sample segmentation mask in the training sample image set. After the auxiliary texture generation network is trained, at least part of its network parameters can be used to initialize part of the network parameters of the texture generation network, and the texture generation network is then trained based on the frontal texture of the human body sample, the first sample segmentation mask and the second sample segmentation mask.
• The auxiliary human body image is obtained by reducing the resolution of a single image of the human body sample; the first sample segmentation mask corresponds to the mask area of the front texture of the human body sample; the second sample segmentation mask corresponds to the mask area of the back texture of the human body sample; the third sample segmentation mask corresponds to the mask area of the front texture of the human body in the auxiliary human body image; and the fourth sample segmentation mask corresponds to the mask area of the back texture of the human body in the auxiliary human body image.
• The frontal texture 82 of the human body in the auxiliary human body image 81, the third sample segmentation mask 83 and the fourth sample segmentation mask 84 can be obtained by performing image segmentation on the auxiliary human body image 81. These are input into the auxiliary texture generation network 76 to obtain a first predicted value of the back texture of the human body in the auxiliary human body image 81; the network parameters of the auxiliary texture generation network 76 are then adjusted based on the first predicted value and the first real value of the back texture of the human body in the auxiliary human body image 81. After several iterations, the trained auxiliary texture generation network 76 is obtained.
• In addition to the loss calculated from the first predicted value and the first real value, the training supervision of the auxiliary texture generation network may also include other losses based on the first predicted value, for example a loss computed from the auxiliary human body image and the first predicted value.
  • the auxiliary human body image can be obtained by reducing the resolution of the frontal human body image 71 in FIG. 7 .
• Accordingly, the resolution of the frontal texture 82 of the human body in the auxiliary human body image 81 is lower than the resolution of the frontal texture 73 in FIG. 7.
• The network parameters of the trained auxiliary texture generation network can be used to initialize part of the network parameters of the texture generation network; that is, the auxiliary texture generation network and the texture generation network share some network weights. Then, the frontal texture of the human body sample, the first sample segmentation mask and the second sample segmentation mask in the training sample image set are input into the texture generation network to obtain a second predicted value of the back texture of the human body sample, and the network parameters of the texture generation network are adjusted based on the second predicted value and the second real value of the back texture. The resolution of the second real value is higher than the resolution of the first real value; that is, the resolution of the back texture output by the texture generation network is higher than that of the back texture output by the auxiliary texture generation network.
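The "shared weights" initialization described above amounts to copying, by name, every parameter the two networks have in common, while the extra layers of the larger texture generation network keep a fresh initialization. A hedged sketch of that bookkeeping with plain dictionaries (the parameter names and the zero placeholder init are illustrative assumptions):

```python
def init_from_auxiliary(aux_params, full_param_names):
    """Copy every parameter the two networks share by name; parameters
    that exist only in the larger network keep a fresh initialization."""
    params = {name: 0.0 for name in full_param_names}  # placeholder fresh init
    for name, value in aux_params.items():
        if name in params:
            params[name] = value
    return params

aux = {"enc.w": 1.5, "dec.w": -0.5}
full = ["enc.w", "dec.w", "extra_conv.w"]
print(init_from_auxiliary(aux, full))
# {'enc.w': 1.5, 'dec.w': -0.5, 'extra_conv.w': 0.0}
```

In a deep learning framework the same idea is usually expressed as loading a filtered state dict into the larger model before fine-tuning at the higher resolution.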
• The three-dimensional human body reconstruction method of this embodiment not only improves the reconstruction effect of local parts by performing local geometric reconstruction on the local parts of the target human body, and simplifies the user's cooperation process by reconstructing from a single human body image, but also automatically predicts the texture through the neural network, so that the generated texture effect is better; for example, the texture around the human body is more uniform and the colors are more realistic. Moreover, training the auxiliary texture generation network first and then training the texture generation network makes the training process of the texture generation network more stable and easier to converge.
  • a plurality of images of the target body from different angles may also be acquired to comprehensively perform the three-dimensional reconstruction of the target body.
• For example, three images may be acquired from different angles. Referring to FIG. 2, the three images can each be used as input to the global feature sub-network 221, obtaining the first image feature output by the global feature sub-network 221 for each of the three images. The three first image features are then fused, and the fused image feature is used as the input of the first fitting sub-network 222 for further processing.
• Similarly, local images can be obtained by extracting local regions from the three images; each of the three local images is used as input to the local feature sub-network 611, obtaining the second image features output by the local feature sub-network 611 for each of the three local images. The three second image features are then fused, and the fused image feature is used as the input of the second fitting sub-network 612 for further processing.
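The disclosure says the per-view features are "fused" without fixing the operator; one common choice is simple element-wise averaging. A minimal sketch under that assumption (the function name is hypothetical):

```python
import numpy as np

def fuse_view_features(features):
    """Average the per-view feature vectors into one fused feature
    that is then passed to the fitting sub-network."""
    return np.mean(np.stack(features), axis=0)

f1, f2, f3 = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])
print(fuse_view_features([f1, f2, f3]))  # [1. 1.]
```

Averaging keeps the fused feature the same dimensionality as each branch output, so the downstream fitting sub-network needs no architectural change regardless of how many views are supplied.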
• In the above embodiments, the neural network models involved can be trained separately; for example, the first deep neural network branch and the texture generation network may each be trained independently.
  • the three-dimensional human body model of user U1 is to be constructed based on a single human body image of user U1.
  • the single human body image may be a frontal image of user U1, including the frontal texture and background image of user U1.
  • the single human body image 91 of the user U1 includes a front texture 92 and a background image 93 of the user.
  • One aspect of reconstruction is to perform geometric reconstruction of the human body based on the single human body image 91 to obtain the three-dimensional mesh model of U1 and the human skeleton structure.
• For example, the single human body image 91 can be processed through the network shown in FIG. 6: the global feature sub-network and the first fitting sub-network in the first deep neural network branch process the single human body image 91 to obtain the first human body model, and the local feature sub-network and the second fitting sub-network in the second deep neural network branch process the image of the area above the shoulders in the single human body image 91 to obtain the second human body model.
• By fusing the first human body model and the second human body model, a fused human body model is obtained; the fused human body model is then meshed to obtain a three-dimensional mesh model (mesh) of the user U1.
  • Another aspect of reconstruction is to perform local geometric reconstruction on the face of the user U1 based on the single human body image 91 to obtain a three-dimensional mesh model of the face.
  • feature extraction can be performed on a single human body image 91, and the extracted image features and the three-dimensional face topology template are input into a graph convolutional neural network to obtain the face mesh of the user U1.
• The face mesh (the three-dimensional mesh model of the human face) obtained by the above reconstruction and the human body mesh of the user U1 (the three-dimensional mesh model of U1's human body) can then be fused to obtain the initial three-dimensional model of U1.
• Specifically, the identifiers and positions of the model key points corresponding to the face key points on the face mesh and on the human body mesh, respectively, can be determined, and based on the identifiers and positions of these model key points, the camera extrinsic parameters of the model and other parameters, the coordinate transformation relationship between the two models is determined.
• According to the coordinate transformation relationship, the face mesh is transformed into the coordinate system of the human body mesh, the face in the human body mesh is replaced with the face mesh, and the face mesh and the human body mesh are fused together through Poisson reconstruction to obtain the initial three-dimensional model of the user U1.
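The coordinate transformation between the two meshes can be estimated from the matched key points; one standard way to do this (a possible realization, not necessarily the one used in the disclosure) is the Kabsch least-squares alignment of the corresponding point sets:

```python
import numpy as np

def rigid_transform(src_pts, dst_pts):
    """Least-squares rotation R and translation t mapping the face-mesh
    key points (src) onto the matching body-mesh key points (dst):
    dst ≈ R @ src + t  (Kabsch algorithm)."""
    src_c, dst_c = src_pts.mean(0), dst_pts.mean(0)
    H = (src_pts - src_c).T @ (dst_pts - dst_c)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))               # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Key points offset by (0, 0, 1): the recovered transform should match.
src = np.array([[0., 0., 0.], [1., 0., 0.], [0., 2., 0.], [0., 0., 3.]])
dst = src + np.array([0., 0., 1.])
R, t = rigid_transform(src, dst)
print(np.allclose(R, np.eye(3)), np.allclose(t, [0., 0., 1.]))  # True True
```

If the two meshes also differ in scale, the same construction extends to a similarity transform by additionally estimating a scale factor from the centred point sets.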
• In addition, human body segmentation can be performed on the single human body image 91 to obtain the frontal texture of the human body with the background image removed, a first segmentation mask representing the frontal texture area of the human body, and a second segmentation mask representing the back texture area of the human body.
• Texture mapping is performed on the initial three-dimensional model based on the front texture and the back texture, the gap areas of the model are filled and completed with texture, and finally the textured three-dimensional human body model of U1 is obtained.
• The skin weights of the three-dimensional human body model can also be calculated by combining the reconstructed three-dimensional human body model of U1 with the human skeleton structure obtained when reconstructing the three-dimensional mesh model of U1; the model can then be driven to perform actions through these skin weights.
  • FIG. 10 illustrates a schematic structural diagram of a three-dimensional human body reconstruction apparatus.
  • the apparatus may include: an overall reconstruction module 1001 , a local reconstruction module 1002 , a fusion processing module 1003 and a texture reconstruction module 1004 .
  • the overall reconstruction module 1001 is configured to perform geometric reconstruction of the human body based on a single human body image of the target human body to obtain a three-dimensional mesh model of the target human body.
  • the local reconstruction module 1002 is configured to perform local geometric reconstruction on the local part of the target human body based on the single human body image of the target human body to obtain a three-dimensional mesh model of the local part.
  • the fusion processing module 1003 is configured to fuse the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain an initial 3D model.
  • the texture reconstruction module 1004 is configured to reconstruct the human body texture of the target human body according to the initial three-dimensional model and the single human body image, so as to obtain a three-dimensional human body model of the target human body.
• The overall reconstruction module 1001, when obtaining the three-dimensional mesh model of the target human body, is configured to: perform three-dimensional reconstruction on the single human body image of the target human body through the first deep neural network branch to obtain a first human body model; perform three-dimensional reconstruction on the partial image in the single human body image through the second deep neural network branch to obtain a second human body model, wherein the partial image includes a partial area of the target human body; fuse the first human body model and the second human body model to obtain a fused human body model; and mesh the fused human body model to obtain the three-dimensional mesh model of the target human body.
  • the local reconstruction module 1002 is specifically configured to: perform feature extraction on a single human body image of the target human body to obtain a third image feature; according to the third image feature and the three-dimensional topology of the local part A template is used to determine the three-dimensional mesh model of the local part.
  • the fusion processing module 1003 is specifically configured to: obtain multiple key points of the local part according to a single human body image of the target human body; information on the key points of the first model corresponding to the grid model, and determining the information on the key points of the second model corresponding to the plurality of key points on the three-dimensional grid model of the local part; based on the first model The information of the key points and the information of the key points of the second model are fused with the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target body to obtain the initial three-dimensional model.
• When fusing the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the key points of the first model and the information of the key points of the second model, the fusion processing module 1003 is configured to: determine, based on the information of the key points of the first model and the information of the key points of the second model, the coordinate transformation relationship between the three-dimensional mesh model of the target human body and the three-dimensional mesh model of the local part; transform the three-dimensional mesh model of the local part into the coordinate system of the three-dimensional mesh model of the target human body according to the coordinate transformation relationship; and fuse the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body in that coordinate system to obtain the initial three-dimensional model.
• The texture reconstruction module 1004 is specifically configured to: perform human body segmentation on the single human body image to obtain a first segmentation mask, a second segmentation mask and a frontal texture of the target human body, wherein the first segmentation mask corresponds to the mask area of the frontal texture and the second segmentation mask corresponds to the mask area of the back texture of the target human body; input the frontal texture, the first segmentation mask and the second segmentation mask into the texture generation network to obtain the back texture of the target human body; and obtain a textured three-dimensional human body model corresponding to the target human body based on the back texture and the frontal texture.
  • the apparatus may further include: a model training module 1005 .
• The model training module 1005 is used to perform the training of the texture generation network, including: performing human body segmentation on a single image of a human body sample in the training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask and the frontal texture of the human body sample, wherein the first sample segmentation mask corresponds to the mask area of the frontal texture of the human body sample, and the second sample segmentation mask corresponds to the mask area of the back texture of the human body sample; and training the auxiliary texture generation network according to the frontal texture of the human body in the auxiliary human body image, the third sample segmentation mask and the fourth sample segmentation mask, wherein the auxiliary human body image is obtained by reducing the resolution of the single image of the human body sample, the third sample segmentation mask corresponds to the mask area of the frontal texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to the mask area of the back texture of the human body in the auxiliary human body image.
  • the foregoing apparatus may be configured to execute any corresponding method described above, which is not repeated here for brevity.
  • An embodiment of the present disclosure further provides an electronic device, where the device includes a memory and a processor, where the memory is used to store computer-readable instructions, and the processor is used to invoke the computer instructions to implement any embodiment of this specification Methods.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, implements the method of any embodiment of the present specification.
• One or more embodiments of the present disclosure may be provided as a method, a system or a computer program product, the computer program product comprising a computer program that, when executed by a processor, is capable of implementing the method of any embodiment of this specification. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
• Embodiments of the subject matter and functional operations described in this disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this disclosure and their structural equivalents, or in combinations of one or more of them.
• Embodiments of the subject matter described in this disclosure may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
  • the processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, eg, an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from read only memory and/or random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
• Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic, magneto-optical or optical disks, to receive data from them, transfer data to them, or both. However, a computer need not have such devices.
• Moreover, the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
• Computer-readable media suitable for storage of computer program instructions and data include all forms of non-volatile memory, media and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.

Abstract

Provided in the embodiments of the present disclosure are a three-dimensional human body reconstruction method and apparatus, and a device and a storage medium. The method may comprise: performing human body geometric reconstruction on the basis of a human body image of a target human body to obtain a three-dimensional mesh model of the target human body; on the basis of the human body image, performing local geometric reconstruction on a local part of the target human body to obtain a three-dimensional mesh model of the local part; fusing the three-dimensional mesh model of the local part and the three-dimensional mesh model of the target human body to obtain an initial three-dimensional model; and performing human body texture reconstruction according to the initial three-dimensional model and the human body image, so as to obtain a three-dimensional human body model of the target human body. According to the embodiments of the present disclosure, a local part in a three-dimensional mesh model of a target human body is clear and accurate, thereby improving the reconstruction effect of the local part.

Description

Three-dimensional human body reconstruction method and apparatus, and device and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This patent application claims priority to Chinese Patent Application No. 202110352199.4, filed on March 31, 2021 and entitled "Three-dimensional Human Body Reconstruction Method, Apparatus, Device and Storage Medium", which is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to image processing technology, and in particular, to a three-dimensional human body reconstruction method, apparatus, device, and storage medium.
BACKGROUND
Three-dimensional (3D) human body reconstruction is an important problem in the fields of computer vision and computer graphics. Reconstructed digital human models have important applications in many fields, such as body measurement, virtual fitting, virtual anchoring, custom game character design, and virtual reality social networking. Among these, how to project a real-world human body into a virtual world to obtain a 3D digital human model is a key issue. However, digital reconstruction of a 3D human body is very complicated: it requires the operator to scan the target continuously from multiple angles without blind spots, and local regions of the reconstruction results are often not reconstructed finely enough.
SUMMARY OF THE INVENTION
In view of this, the embodiments of the present disclosure provide at least a three-dimensional human body reconstruction method, apparatus, device, and storage medium.
In a first aspect, a three-dimensional human body reconstruction method is provided, the method comprising:
performing human body geometric reconstruction based on a human body image of a target human body to obtain a three-dimensional mesh model of the target human body;
performing, based on the human body image of the target human body, local geometric reconstruction on a local part of the target human body to obtain a three-dimensional mesh model of the local part;
fusing the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain an initial three-dimensional model; and
performing reconstruction of a human body texture of the target human body according to the initial three-dimensional model and the human body image, to obtain a three-dimensional human body model of the target human body.
In one example, performing the human body geometric reconstruction based on the human body image of the target human body to obtain the three-dimensional mesh model of the target human body includes: performing three-dimensional reconstruction on the human body image of the target human body through a first deep neural network branch to obtain a first human body model; performing three-dimensional reconstruction on a partial image in the human body image through a second deep neural network branch to obtain a second human body model, wherein the partial image includes a local region of the target human body; fusing the first human body model and the second human body model to obtain a fused human body model; and performing meshing processing on the fused human body model to obtain the three-dimensional mesh model of the target human body.
In one example, the first deep neural network branch includes a global feature sub-network and a first fitting sub-network, and the second deep neural network branch includes a local feature sub-network and a second fitting sub-network. Performing the three-dimensional reconstruction on the human body image of the target human body through the first deep neural network branch to obtain the first human body model includes: performing feature extraction on the human body image through the global feature sub-network to obtain first image features; and obtaining the first human body model based on the first image features through the first fitting sub-network. Performing the three-dimensional reconstruction on the partial image in the human body image through the second deep neural network branch to obtain the second human body model includes: performing feature extraction on the partial image through the local feature sub-network to obtain second image features; and obtaining the second human body model through the second fitting sub-network based on the second image features and intermediate features output by the first fitting sub-network.
In one example, performing the local geometric reconstruction on the local part of the target human body based on the human body image of the target human body to obtain the three-dimensional mesh model of the local part includes: performing feature extraction on the human body image of the target human body to obtain third image features; and determining the three-dimensional mesh model of the local part according to the third image features and a three-dimensional topology template of the local part.
In one example, fusing the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model includes: obtaining a plurality of key points of the local part according to the human body image of the target human body; determining information of first model key points corresponding to the plurality of key points on the three-dimensional mesh model of the target human body, and determining information of second model key points corresponding to the plurality of key points on the three-dimensional mesh model of the local part; and fusing, based on the information of the first model key points and the information of the second model key points, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
In one example, fusing, based on the information of the first model key points and the information of the second model key points, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model includes: determining, based on the information of the first model key points and the information of the second model key points, a coordinate transformation relationship between the three-dimensional mesh model of the target human body and the three-dimensional mesh model of the local part; transforming, according to the coordinate transformation relationship, the three-dimensional mesh model of the local part into the coordinate system of the three-dimensional mesh model of the target human body; and fusing, in the transformed coordinate system, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
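As an illustration only (not part of the claimed method), the coordinate transformation relationship between the two mesh models can be estimated from the corresponding key points as a least-squares similarity transform (scale, rotation, translation), for example with the Umeyama method. The function names below are our own; the embodiment does not specify a particular estimator.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Estimate scale s, rotation R, translation t such that
    dst ~= s * R @ src + t in the least-squares sense.
    src, dst: (N, 3) arrays of corresponding key points."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1  # avoid reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

def transform_vertices(verts, s, R, t):
    """Apply the estimated similarity transform to every mesh vertex."""
    return s * verts @ R.T + t
```

With the transform estimated from the second model key points (local-part mesh) to the first model key points (body mesh), `transform_vertices` moves the entire local-part mesh into the body mesh's coordinate system before fusion.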
In one example, the human body image includes a frontal texture of the target human body and a background image. Performing the reconstruction of the human body texture of the target human body according to the initial three-dimensional model and the human body image to obtain the three-dimensional human body model of the target human body includes: performing human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask, and the frontal texture of the target human body, wherein the first segmentation mask corresponds to a mask region of the frontal texture, and the second segmentation mask corresponds to a mask region of a back texture of the target human body; inputting the frontal texture, the first segmentation mask, and the second segmentation mask into a texture generation network to obtain the back texture of the target human body; and obtaining, based on the back texture and the frontal texture, a textured three-dimensional human body model corresponding to the target human body.
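As a sketch of the data flow into the texture generation network, the frontal texture and the two segmentation masks can be concatenated channel-wise into a single input tensor. The (H, W, 5) layout below is an assumption we introduce for illustration; the actual input format of the network is not limited by this sketch.

```python
import numpy as np

def build_texture_net_input(front_texture, front_mask, back_mask):
    """Assemble a texture-generation-network input by channel-wise
    concatenation of the frontal texture (H, W, 3) and the two
    single-channel segmentation masks (H, W) -> (H, W, 5)."""
    front_mask = front_mask[..., None].astype(front_texture.dtype)
    back_mask = back_mask[..., None].astype(front_texture.dtype)
    # Zero out background pixels so only the person region is visible.
    masked_front = front_texture * front_mask
    return np.concatenate([masked_front, front_mask, back_mask], axis=-1)
```

The front mask tells the network which pixels of the input texture are valid, while the back mask indicates the silhouette region it should fill with generated back texture.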
In one example, training of the texture generation network includes the following processing: performing human body segmentation on an image of a human body sample in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask, and a frontal texture of the human body sample, wherein the first sample segmentation mask corresponds to a mask region of the frontal texture of the human body sample, and the second sample segmentation mask corresponds to a mask region of a back texture of the human body sample; training an auxiliary texture generation network according to a frontal texture of the human body in an auxiliary human body image, a third sample segmentation mask, and a fourth sample segmentation mask, wherein the auxiliary human body image is obtained by reducing the resolution of the image of the human body sample, the third sample segmentation mask corresponds to a mask region of the frontal texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to a mask region of the back texture of the human body in the auxiliary human body image; and after the training of the auxiliary texture generation network is completed, training the texture generation network based on the frontal texture of the human body sample, the first sample segmentation mask, and the second sample segmentation mask, wherein network parameters of the texture generation network include at least part of the network parameters of the trained auxiliary texture generation network.
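The coarse-to-fine reuse of parameters can be pictured as copying the trained auxiliary (low-resolution) network's parameters into the matching layers of the full-resolution texture generation network before its own training starts. The dict-of-arrays representation below is a simplification we introduce for illustration only.

```python
import numpy as np

def init_from_auxiliary(texture_net_params, aux_net_params):
    """Warm-start the texture generation network with parameters learned
    by the lower-resolution auxiliary network. Both arguments are
    name -> ndarray dicts; only layer names present in both networks are
    copied, so the texture network includes at least part of the
    auxiliary network's trained parameters."""
    shared = sorted(set(texture_net_params) & set(aux_net_params))
    for name in shared:
        texture_net_params[name] = aux_net_params[name].copy()
    return texture_net_params, shared
```

Layers that exist only in the full-resolution network keep their fresh initialization and are learned during the second training stage.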
In one example, the local part of the target human body is the face of the target human body; and/or the human body image is an RGB image.
In one example, the method further includes: when performing the human body geometric reconstruction based on the human body image of the target human body, further obtaining a human skeleton structure of the target human body; and after obtaining the three-dimensional human body model of the target human body, determining, based on the three-dimensional human body model and the human skeleton structure, skinning weights for driving the three-dimensional human body model.
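Once skinning weights are available, driving the three-dimensional human body model can be illustrated with standard linear blend skinning, where each vertex follows a weighted blend of per-bone rigid transforms. This sketch is an assumed illustration of the driving step, not the embodiment's exact formulation.

```python
import numpy as np

def linear_blend_skinning(verts, weights, bone_transforms):
    """Deform mesh vertices with skinning weights.
    verts: (V, 3) rest-pose vertices.
    weights: (V, B) per-vertex bone weights, each row summing to 1.
    bone_transforms: (B, 4, 4) homogeneous bone transforms.
    Returns the (V, 3) deformed vertices."""
    V = len(verts)
    homo = np.concatenate([verts, np.ones((V, 1))], axis=1)     # (V, 4)
    # Apply every bone transform to every vertex: (B, V, 4)
    per_bone = np.einsum('bij,vj->bvi', bone_transforms, homo)
    # Blend the per-bone results by the skinning weights.
    return np.einsum('vb,bvi->vi', weights, per_bone[..., :3])
```

A vertex fully bound to one bone follows that bone exactly; vertices near joints, with weights split across bones, move smoothly between them.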
In a second aspect, a three-dimensional human body reconstruction apparatus is provided, the apparatus comprising:
an overall reconstruction module, configured to perform human body geometric reconstruction based on a human body image of a target human body to obtain a three-dimensional mesh model of the target human body;
a local reconstruction module, configured to perform local geometric reconstruction on a local part of the target human body based on the human body image of the target human body to obtain a three-dimensional mesh model of the local part;
a fusion processing module, configured to fuse the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain an initial three-dimensional model; and
a texture reconstruction module, configured to perform reconstruction of a human body texture of the target human body according to the initial three-dimensional model and the human body image, to obtain a three-dimensional human body model of the target human body.
In one example, when obtaining the three-dimensional mesh model of the target human body, the overall reconstruction module is configured to: perform three-dimensional reconstruction on the human body image of the target human body through a first deep neural network branch to obtain a first human body model; perform three-dimensional reconstruction on a partial image in the human body image through a second deep neural network branch to obtain a second human body model, wherein the partial image includes a local region of the target human body; fuse the first human body model and the second human body model to obtain a fused human body model; and perform meshing processing on the fused human body model to obtain the three-dimensional mesh model of the target human body.
In one example, the local reconstruction module is specifically configured to: perform feature extraction on the human body image of the target human body to obtain third image features; and determine the three-dimensional mesh model of the local part according to the third image features and a three-dimensional topology template of the local part.
In one example, the fusion processing module is specifically configured to: obtain a plurality of key points of the local part according to the human body image of the target human body; determine information of first model key points corresponding to the plurality of key points on the three-dimensional mesh model of the target human body, and determine information of second model key points corresponding to the plurality of key points on the three-dimensional mesh model of the local part; and fuse, based on the information of the first model key points and the information of the second model key points, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
In one example, when fusing, based on the information of the first model key points and the information of the second model key points, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model, the fusion processing module is configured to: determine, based on the information of the first model key points and the information of the second model key points, a coordinate transformation relationship between the three-dimensional mesh model of the target human body and the three-dimensional mesh model of the local part; transform, according to the coordinate transformation relationship, the three-dimensional mesh model of the local part into the coordinate system of the three-dimensional mesh model of the target human body; and fuse, in the transformed coordinate system, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
In one example, the texture reconstruction module is specifically configured to: perform human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask, and a frontal texture of the target human body, wherein the first segmentation mask corresponds to a mask region of the frontal texture, and the second segmentation mask corresponds to a mask region of a back texture of the target human body; input the frontal texture, the first segmentation mask, and the second segmentation mask into a texture generation network to obtain the back texture of the target human body; and obtain, based on the back texture and the frontal texture, a textured three-dimensional human body model corresponding to the target human body.
In one example, the apparatus further includes a model training module configured to train the texture generation network, the training including: performing human body segmentation on an image of a human body sample in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask, and a frontal texture of the human body sample, wherein the first sample segmentation mask corresponds to a mask region of the frontal texture of the human body sample, and the second sample segmentation mask corresponds to a mask region of a back texture of the human body sample; training an auxiliary texture generation network according to a frontal texture of the human body in an auxiliary human body image, a third sample segmentation mask, and a fourth sample segmentation mask, wherein the auxiliary human body image is obtained by reducing the resolution of the image of the human body sample, the third sample segmentation mask corresponds to a mask region of the frontal texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to a mask region of the back texture of the human body in the auxiliary human body image; and after the training of the auxiliary texture generation network is completed, training the texture generation network based on the frontal texture of the human body sample, the first sample segmentation mask, and the second sample segmentation mask, wherein network parameters of the texture generation network include at least part of the network parameters of the trained auxiliary texture generation network.
In a third aspect, an electronic device is provided. The device includes a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer instructions to implement the method described in any embodiment of the present disclosure.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method described in any embodiment of the present disclosure.
In a fifth aspect, a computer program product is provided, including a computer program, wherein the computer program, when executed by a processor, implements the method described in any embodiment of the present disclosure.
According to the three-dimensional human body reconstruction method, apparatus, device, and storage medium provided by the embodiments of the present disclosure, local geometric reconstruction is performed on a local part of the target human body, and the three-dimensional mesh model of the local part obtained by the local geometric reconstruction is fused with the three-dimensional mesh model of the target human body, so that the local part in the three-dimensional mesh model of the target human body is clearer, finer, and more accurate, improving the reconstruction effect of the local part. Moreover, the method can perform reconstruction based on a single human body image of the target human body, which simplifies the user's cooperation process and makes three-dimensional human body reconstruction more convenient.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to more clearly illustrate the technical solutions in one or more embodiments of the present disclosure or in the related art, the accompanying drawings required in the description of the embodiments or the related art will be briefly introduced below. Obviously, the accompanying drawings in the following description are only some of the embodiments described in one or more embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 shows a flowchart of a three-dimensional human body reconstruction method provided by at least one embodiment of the present disclosure;

FIG. 2 shows a schematic diagram of a manner of obtaining a three-dimensional mesh model based on a single human body image, provided by at least one embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of a process of obtaining an initial three-dimensional model, provided by at least one embodiment of the present disclosure;

FIG. 4 shows a schematic diagram of a texture reconstruction process provided by at least one embodiment of the present disclosure;

FIG. 5 shows a schematic diagram of a process of determining skinning weights, provided by at least one embodiment of the present disclosure;

FIG. 6 shows a schematic diagram of a manner of obtaining a three-dimensional mesh model based on a single human body image, provided by at least one embodiment of the present disclosure;

FIG. 7 shows a schematic diagram of the principle of texture generation provided by at least one embodiment of the present disclosure;

FIG. 8 shows a schematic diagram of a training process of a texture generation network provided by at least one embodiment of the present disclosure;

FIG. 9 shows a schematic diagram of a human body image provided by at least one embodiment of the present disclosure;

FIG. 10 shows a structural diagram of a three-dimensional human body reconstruction apparatus provided by at least one embodiment of the present disclosure;

FIG. 11 shows a structural diagram of a three-dimensional human body reconstruction apparatus provided by at least one embodiment of the present disclosure.
DETAILED DESCRIPTION
In order to enable those skilled in the art to better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on one or more embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
Three-dimensional human body reconstruction has important applications in many fields, including but not limited to the following application scenarios.
For example, three-dimensional human body reconstruction can enhance the realism of some virtual reality application scenarios, such as virtual fitting, virtual cloud meetings, and virtual classrooms.
For another example, a three-dimensional human body model obtained through three-dimensional human body reconstruction can be imported into game data to generate a personalized character.
For yet another example, producing science fiction films currently requires technologies such as green screens and motion capture; the hardware equipment is expensive, and the overall process is time-consuming and complicated. Obtaining a virtual three-dimensional human body model through three-dimensional human body reconstruction can simplify the process and save resources.
Regardless of the application scenario, three-dimensional human body reconstruction has the following requirements. On the one hand, the user's cooperation process should be simplified as much as possible; for example, requiring the user to cooperate with multi-angle scanning demands considerable effort from the user and results in a poor experience. On the other hand, a three-dimensional human body model with the highest possible accuracy should be obtained; for example, in scenarios such as virtual cloud meetings or AR virtual interaction, the three-dimensional human body model obtained by reconstruction is required to have a higher sense of realism and immersion.
In order to solve the above problems, the embodiments of the present disclosure provide a three-dimensional human body reconstruction method, which aims to perform three-dimensional human body reconstruction of a user based on a single photo of the user, simplifying the user's cooperation process while achieving a high-precision reconstruction effect.
Referring to FIG. 1, FIG. 1 shows a flowchart of a three-dimensional human body reconstruction method provided by at least one embodiment of the present disclosure. The method may include steps 100 to 106.
In step 100, human body geometric reconstruction is performed based on a single human body image of a target human body to obtain a three-dimensional mesh model of the target human body.
Here, the target human body is the user on whom the three-dimensional human body reconstruction is based. For example, when three-dimensional human body reconstruction is performed on a user Xiao Zhang, Xiao Zhang may be referred to as the target human body, and the reconstructed three-dimensional human body model is obtained based on Xiao Zhang's body, with high similarity to Xiao Zhang's posture, appearance, clothing, hairstyle, and so on.
The single human body image is one human body image of the target human body. The embodiments of the present disclosure have no special requirements on the collection method or format of the human body image. In an exemplary manner, the single human body image may be a frontal full-body photograph of the target human body. For another example, the single human body image may be an RGB color image. An image in this RGB format is inexpensive to obtain; for example, no high-cost equipment such as a depth camera is needed during image collection, and an ordinary shooting device can capture it.
In this step, human body geometric reconstruction may be performed based on the single human body image of the target human body to obtain a three-dimensional mesh model, that is, a three-dimensional mesh representing the geometric shape of the human body, the mesh including a number of vertices and faces.
In one example, this embodiment may further align and fit the reconstructed three-dimensional mesh with a pre-stored parametric human body model in terms of posture and body shape. Specifically, the parametric human body model includes a mesh of the human body surface and a set of skeletal structures, which are controlled by a set of posture and body shape parameters; the skeletal positions and surface shape of the human body change as the parameter values change. After the three-dimensional mesh reconstructed in step 100 is geometrically aligned with the parametric human body model, the skeletal structure corresponding to the three-dimensional mesh reconstructed in step 100 is obtained. This skeletal structure will be used for the calculation of skinning weights in subsequent steps.
Referring to FIG. 2, a manner of obtaining a three-dimensional mesh model through reconstruction based on a single human body image is illustrated. As shown in FIG. 2, a single human body image 21 of the target human body may be input into a first deep neural network branch 22 for three-dimensional reconstruction. In an exemplary embodiment, the first deep neural network branch 22 may include a global feature sub-network 221 and a first fitting sub-network 222.
The global feature sub-network 221 extracts features from the single image 21, yielding high-level image features that may be called the first image features. For example, the global feature sub-network 221 may be an HourGlass convolutional network. The first image features are fed into the first fitting sub-network 222, which predicts, from those features, whether each voxel block in 3D space lies inside the target human body. For example, the first fitting sub-network 222 may be a multilayer perceptron. It outputs a first human body model comprising the 3D voxel blocks located inside the target body.
Next, the first human body model can be meshed; for example, the meshing may apply the Marching Cubes algorithm to the first human body model in voxel space to obtain the 3D mesh model of the target human body.
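The per-voxel inside/outside prediction that precedes Marching Cubes can be sketched as follows. This is a minimal numpy illustration, not the disclosed network: a sphere occupancy function stands in for the fitting sub-network, and the binarized volume is the kind of input a Marching Cubes implementation (e.g. `skimage.measure.marching_cubes`) would turn into a vertex/face mesh.

```python
import numpy as np

def occupancy_sphere(points, center, radius):
    """Stand-in for the first fitting sub-network: probability that a
    query point lies inside the body (here, inside a sphere)."""
    d = np.linalg.norm(points - center, axis=-1)
    return (d < radius).astype(np.float32)

# Regular voxel grid over the unit cube.
n = 32
axis = np.linspace(0.0, 1.0, n)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

# Evaluate occupancy at every voxel and binarize at 0.5; the resulting
# binary volume is what a Marching Cubes step consumes.
prob = occupancy_sphere(grid.reshape(-1, 3),
                        center=np.array([0.5, 0.5, 0.5]),
                        radius=0.3)
volume = (prob >= 0.5).reshape(n, n, n)
```

The extracted surface mesh then consists of the vertices and faces mentioned in the text.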
In step 102, based on the single image of the target human body, local high-definition geometric reconstruction is performed on a local part of the target body to obtain a 3D mesh model of that local part.
The 3D mesh model of the target body reconstructed in step 100 may be blurred at local parts of the body. The local part may be the face, or another part that needs to show fine detail, such as a hand. The above 3D mesh model tends to be blurry in facial detail, yet the face is usually the region users care about most; therefore, this step can geometrically reconstruct local parts of the target body separately.
Taking the face as an example of the local part: the face can be reconstructed with a fixed-topology fine reconstruction, i.e., based on image features extracted from the single image of the target body, the positions of the vertices of a 3D face topology template are fitted to obtain a 3D mesh model of the face. Specifically, the semantic structure of human faces is consistent, so a 3D face with a fixed topology can be used as a template, which may be called the 3D topology template of the face. The template contains multiple vertices, each fixedly corresponding to a facial semantic; for example, one vertex represents the tip of the nose and another the corner of the eye. During face reconstruction, the positions of the template's vertices can be regressed by a deep neural network.
For example, the deep neural network may include a deep convolutional network and a graph convolutional network. The single image of the target body is fed into the deep convolutional network to extract image features, which may be called the third image features. The third image features and the 3D face topology template are then used as inputs to the graph convolutional network, which finally outputs a 3D mesh model of a face close to the target person's face. Optionally, the input to the deep convolutional network may instead be a partial image region containing the face, cropped from the single image of the target body.
In step 104, the 3D mesh model of the local part is fused with the 3D mesh model of the target human body to obtain an initial 3D model.
The 3D mesh model of the target body reconstructed in step 100 may be somewhat blurred at a local part of the body, the face being the example here, while step 102 obtained a 3D mesh model of the face through its separate geometric reconstruction. In this step, the face mesh can replace the corresponding portion of the body mesh from step 100; this preserves the head shape, body shape, posture, and other information of the body mesh while making the facial features finer and more accurate, achieving a better reconstruction. It will of course be understood that the face is only an example of a local part; in practice, other local parts can likewise be reconstructed separately for greater clarity.
Specifically, the single image of the target body can first be fed into a pre-trained keypoint detection model, which determines multiple keypoints of the local part of the target body in the image. Referring to Fig. 3, still taking the face as the local part: after the face keypoints 31 are obtained, the model keypoints corresponding to them on the body mesh and on the face mesh can be determined from the keypoints' coordinates on the face. Specifically, information about the multiple first model keypoints corresponding to the face keypoints on the body mesh can be determined; for example, this information may include the identifier and position of each first model keypoint. Information about the second model keypoints corresponding to the face keypoints on the face mesh can likewise be determined; for example, this information may include the identifier and position of each second model keypoint.
After the information about the first and second model keypoints is obtained, the face mesh can be fused into the body mesh based on that information, yielding the initial 3D model.
In the embodiments of the present disclosure, fusing the face mesh into the body mesh includes: based on the first and second model keypoint information, combined with the camera extrinsics of the two models, determining the coordinate transformation between the body mesh and the face mesh; based on that transformation, transforming the face mesh into the coordinate system of the body mesh; and, in the transformed coordinate system, fusing the face mesh into the body mesh. For example, the facial geometry on the body mesh can be removed and replaced by the face mesh, and the two meshes can be merged into a single whole by Poisson reconstruction; the resulting model may be called the initial 3D model. This initial 3D model already has clear facial features along with similar head shape, posture, and other information, and its accuracy is high.
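The coordinate transformation between the two meshes can be estimated from the matched model keypoints. The sketch below uses the closed-form Umeyama/Kabsch similarity fit as an illustrative choice (the disclosure does not specify the solver), with synthetic keypoint arrays standing in for the first and second model keypoints.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (scale s, rotation R,
    translation t) mapping src keypoints onto dst keypoints
    (Umeyama's closed form)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # avoid a reflection
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / src_c.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Illustrative data: face-mesh keypoints and their body-mesh
# counterparts related by a known scale/rotation/translation.
rng = np.random.default_rng(0)
face_kp = rng.normal(size=(10, 3))
a = 0.3
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
body_kp = 1.5 * face_kp @ Rz.T + np.array([0.1, -0.2, 0.05])

s, R, t = similarity_transform(face_kp, body_kp)
# Apply the recovered transform to map face coordinates into body coordinates.
aligned = s * face_kp @ R.T + t
```

With the meshes in a common coordinate system, the Poisson merge described above can proceed.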
In step 106, the human body texture of the target body is reconstructed from the initial 3D model and the single image, yielding a 3D human body model of the target body with color texture.
Since this embodiment reconstructs the 3D body from a single image of the target body, part of the body region is invisible; for example, if a frontal image of the target body is used, the back of the body is not visible, which leads to missing texture. Therefore, in this step the body texture of the invisible regions can be predicted and completed from the initial 3D model and the single image, and then fused with the body texture present in the single image, generating a 3D body model with complete texture.
As shown in Fig. 4, taking a frontal image as the single image of the target body: a deep learning network can be used to predict the back texture 41 of the body, and this back texture 41 is combined with the front texture 42 from the single image to texture-map the initial 3D model, i.e., to reconstruct its texture. The 3D model 43 in Fig. 4 has the above back and front textures mapped onto the initial 3D model. The initial 3D model obtained in step 104 is a mesh of body geometry; this step adds body texture to the model on top of that mesh. In addition, for some still-invisible body regions, interpolation can be used to fill texture into gaps of the model, completing the texture of the initial 3D model and yielding the 3D body model 44 of the target body.
In the 3D human body reconstruction method of this embodiment, a local part of the target body undergoes local geometric reconstruction, and the resulting local mesh is fused with the body mesh, making the local part of the initial 3D model clearer, finer, and more accurate and improving the local reconstruction quality. Moreover, the method reconstructs from a single image of the target body, which also simplifies the user's cooperation and makes 3D body reconstruction easier.
In addition, after the 3D body model is obtained, the skinning weights used to drive it can be determined from the 3D body model and the skeletal structure of the target body. The skinning weights drive the built model: to make the 3D body model perform various actions, the model must be bound to the skeletal structure, and this binding of model to bones is called skinning. The motion of the skeleton then drives the model, and a skinning weight expresses how strongly a skeleton joint influences a model vertex; with these weights, the degree to which each vertex of the 3D body model is influenced by each skeletal joint can be controlled, so the model's motion can be controlled more precisely.
Specifically, computing the skinning weights of the 3D body model may include the following. The human skeletal structure was already obtained from the single image of the target body in step 100; in this step, that skeletal structure and the 3D body model obtained above are fed into a deep learning network, which produces the model's skinning weights automatically.
Referring to the example of Fig. 5, attribute features for each vertex of the 3D body model 51 can first be generated from the model 51 and the skeletal structure 52. These attribute features can be constructed from the spatial relationship between each vertex and the skeleton. For example, for one of the vertices, its attribute features may include the following four:
1) the position coordinates of the vertex;
2) the position coordinates of the K skeleton joints nearest to the vertex;
3) the geodesic distances from the vertex's position to each of those K skeleton joints;
4) for each of the K skeleton joints as a starting point, the angle between the vector pointing from that joint to the vertex and the bone on which the joint lies;
where K is a positive integer.
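The four per-vertex features above can be sketched as follows. This is an illustrative numpy version: the joint and bone-direction arrays are made up, and Euclidean distance stands in for the geodesic distance of item 3 (a real implementation would measure distance along the mesh surface).

```python
import numpy as np

def vertex_features(vertex, joints, bone_dirs, k=2):
    """Build the four attribute features for one mesh vertex.
    joints: (J, 3) joint positions; bone_dirs: (J, 3) unit direction of
    the bone each joint lies on. Euclidean distance is a stand-in for
    the geodesic distance of item 3."""
    dists = np.linalg.norm(joints - vertex, axis=1)
    nearest = np.argsort(dists)[:k]                 # K nearest joints
    kpos = joints[nearest]                          # feature 2
    kdist = dists[nearest]                          # feature 3 (stand-in)
    vecs = vertex - kpos                            # joint -> vertex vectors
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    cosang = np.sum(vecs * bone_dirs[nearest], axis=1)
    angles = np.arccos(np.clip(cosang, -1.0, 1.0))  # feature 4
    return vertex, kpos, kdist, angles              # feature 1 is the position

# Illustrative skeleton: three joints along the y axis, bones pointing +y.
joints = np.array([[0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 2.0, 0.0]])
bone_dirs = np.tile(np.array([[0.0, 1.0, 0.0]]), (3, 1))
v = np.array([1.0, 0.0, 0.0])
pos, kpos, kdist, ang = vertex_features(v, joints, bone_dirs, k=2)
```

Concatenating these per-vertex features (with K fixed) gives the fixed-size input vector the network expects.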
Continuing with Fig. 5: after the attribute features of each vertex are obtained, those features, together with the adjacency features between vertices, serve as input to a spatial graph-convolution attention network within the deep learning network. Before these features enter the spatial graph-convolution attention network, a multilayer perceptron can convert them into hidden-layer features. The spatial graph-convolution attention network then predicts, from the hidden features, the weight with which each of the K skeleton joints influences each vertex, and a subsequent multilayer perceptron in the deep learning network normalizes those weights so that, for any given vertex, the influence weights of the joints sum to 1. The resulting per-vertex weights of influence from each skeleton joint are that vertex's skinning weights.
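The final normalization, which makes each vertex's joint-influence weights sum to 1, can be sketched as a row-wise softmax over the network's raw scores. This is an assumption about the normalization's form; the disclosure only states that the weights are normalized.

```python
import numpy as np

def normalize_skinning_weights(scores):
    """scores: (V, K) raw per-joint influence scores for V vertices.
    A row-wise softmax turns them into skinning weights summing to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

raw = np.array([[2.0, 1.0, 0.0],
                [0.0, 0.0, 0.0]])
weights = normalize_skinning_weights(raw)
```

Each output row is non-negative and sums to 1, satisfying the constraint stated in the text.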
The 3D body reconstruction method of this embodiment can derive the skeletal structure from the single image of the target body and automatically compute the skinning weights from that skeleton and the reconstructed 3D body model. This guarantees a consistent semantic structure of the skeleton across different input images while quickly generating suitable skinning weights for different clothing shapes. Semantic consistency of the skeleton facilitates registering the model with off-the-shelf motion libraries; the benefit is that the generated model and skeleton can readily be applied to (registered with) a motion library. A motion library can store human motion sequences in advance, such as dancing or boxing, as sequences of moving skeletons, and the semantics and structure of all skeletons in the library are consistent. If the generated skeleton were random (joint semantics uncertain), it would be hard for the generated model to apply the motions in the library. By ensuring the semantic consistency of the generated skeleton, this embodiment therefore makes motion-library registration more convenient, and skinning weights computed from the specific shape make the motion of different body models look more natural.
Another embodiment of the present disclosure provides a 3D body reconstruction method. Compared with the embodiment of Fig. 1, the difference in the reconstruction flow is that the body geometry reconstruction from the single image of the target body in step 100 is improved, so as to raise the geometric accuracy of the reconstructed body mesh. Processing steps identical to those of the Fig. 1 embodiment are not detailed again; only the differences are described.
As shown in Fig. 6, a second deep neural network branch 61 is added on top of the network structure of Fig. 2. The second deep neural network branch 61 may include a local feature sub-network 611 and a second fitting sub-network 612. A local-region image can be extracted from the single image 21 of the target body, yielding a local image 62, and the second deep neural network is used for the 3D reconstruction of this local image 62.
Note that the body region covered by this local image need not be identical to the local part reconstructed in step 102; for example, the local image here may cover the region above the shoulders of the target body, while the local part reconstructed in step 102 may be the face. Of course, reconstructing the region above the shoulders in Fig. 6 is only an example; refined geometric reconstruction can also be performed on other body regions of the target body.
Specifically, continuing with Fig. 6: the first human body model is reconstructed through the first deep neural network branch 22, and the local image 62 is fed into the second deep neural network branch 61, where the local feature sub-network 611 extracts features from the local image, yielding second image features. The second fitting sub-network 612 then produces a second human body model based on the second image features and intermediate features output by the first fitting sub-network 222. The intermediate features may be features output by part of the network structure of the first fitting sub-network 222; for example, if the first fitting sub-network 222 contains a certain number of fully connected layers, the outputs of some of those layers may be fed as intermediate features into the second fitting sub-network 612.
Illustratively, the structure of the second deep neural network branch 61 may be basically the same as that of the first branch 22. For example, the global feature sub-network 221 in the first branch 22 may contain four blocks, each including a certain number of feature extraction layers such as convolution and pooling layers, while the local feature sub-network 611 in the second branch 61 may contain one such block. After the first and second human body models are obtained, they can be fused into a fused human body model, which is then meshed to obtain the 3D mesh model of the target body.
The 3D body reconstruction method of this embodiment not only improves the local reconstruction quality through local geometric reconstruction of the target body's local parts and simplifies the user's cooperation by reconstructing from a single image, but also reconstructs the local image through the second deep neural network, improving the reconstruction of the local body region of the target body.
Yet another embodiment of the present disclosure provides a 3D body reconstruction method. Compared with the embodiment of Fig. 1, its reconstruction flow provides a specific way of predicting the back texture of the body with a deep learning network. Processing steps identical to those of the Fig. 1 embodiment are not detailed again; only the differences are described.
As shown in Fig. 7, the single image of the target body sometimes includes a background as well as the body's front texture. In that case, image segmentation can be performed first to separate out the front texture, and the back texture of the body is then predicted from that front texture. For example, the frontal image 71 of the target body can be segmented to obtain a first segmentation mask 72 and the segmented front texture 73 of the target body. The first segmentation mask 72 is also flipped horizontally to obtain a second segmentation mask 74; the front texture 73, the first segmentation mask 72, and the second segmentation mask 74 are then fed into a texture generation network 75, which finally outputs the back texture of the target body.
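Deriving the second mask from the first is a single left-right mirror of the binary person mask, since the silhouette seen from behind is the mirror image of the silhouette seen from the front. A minimal numpy sketch with an illustrative mask:

```python
import numpy as np

# Illustrative binary person mask (1 = body pixel), with the body
# sitting toward the left of the frame.
first_mask = np.array([[0, 1, 1, 0, 0],
                       [0, 1, 1, 1, 0],
                       [0, 1, 1, 0, 0]], dtype=np.uint8)

# The back-view silhouette mirrors the front-view silhouette, so the
# second segmentation mask is a horizontal (left-right) flip.
second_mask = first_mask[:, ::-1]
```

Both masks, together with the front texture, form the three inputs of the texture generation network 75 described above.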
In addition, Fig. 7 takes horizontally flipping the first segmentation mask 72 into the second segmentation mask 74 as an example; actual implementations are not limited to this. For instance, the frontal image of the target body could be fed into a pre-trained neural network that directly outputs both the first and second segmentation masks. Once the front and back textures of the target body are obtained, they can be mapped onto the initial 3D model of the body to obtain the 3D body model of the target body.
The training of the above texture generation network 75 may include the following. Referring to Fig. 8, an auxiliary texture generation network 76 can be used. The auxiliary network 76 may include part of the network structure of the texture generation network 75; for example, the texture generation network 75 may be the auxiliary network 76 with a certain number of additional convolution layers.
During training, the auxiliary texture generation network can first be trained from the auxiliary body images, third sample segmentation masks, and fourth sample segmentation masks in the training image set. After the auxiliary network's training completes, at least some of its network parameters are used to initialize part of the texture generation network's parameters, and training continues on the texture generation network based on the front textures of the body samples, the first sample segmentation masks, and the second sample segmentation masks. Here, an auxiliary body image is obtained by reducing the resolution of a single image of a body sample; the first sample mask corresponds to the mask region of the sample's front texture, the second sample mask to the mask region of its back texture, the third sample mask to the mask region of the body's front texture in the auxiliary image, and the fourth sample mask to the mask region of the body's back texture in the auxiliary image.
Continuing with Fig. 8: image segmentation of the auxiliary body image 81 yields the front texture 82 of the body in the auxiliary image, the third sample segmentation mask 83, and the fourth sample segmentation mask 84, which are fed into the auxiliary texture generation network 76 to obtain a first predicted value of the back texture of the body in the auxiliary image 81. The network parameters of the auxiliary network 76 are then adjusted based on the first predicted value and the first ground-truth value of that back texture. After multiple iterations, a trained auxiliary texture generation network 76 is obtained. Besides the loss computed from the first predicted value and the first ground truth, the training supervision of the auxiliary network may include other losses based on the first predicted value, e.g., a feature loss computed from texture features of the auxiliary body image and the first prediction. The auxiliary body image may be obtained by reducing the resolution of the frontal body image 71 of Fig. 7; correspondingly, the resolution of the front texture 82 in the auxiliary image 81 is also lower than that of the front texture 73 in Fig. 7. The third sample mask corresponds to the mask region of the body's front texture in the auxiliary image, and the fourth sample mask to the mask region of the body's back texture in the auxiliary image.
After the auxiliary texture generation network's training completes, its network parameters can be used to initialize part of the texture generation network's parameters; that is, the texture generation network's parameters include at least some parameters of the trained auxiliary network, so the two networks share part of their weights. The front body textures, first sample masks, and second sample masks in the training image set for the texture generation network are then fed into the texture generation network to obtain a second predicted value of the back texture of the body sample, and the texture generation network's parameters are adjusted based on the second predicted value and the second ground-truth value of the back texture. The resolution of the second ground truth is higher than that of the first; that is, the back texture output by the texture generation network has a somewhat higher resolution than the one output by the auxiliary network.
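The partial weight sharing between the two networks amounts to copying every parameter whose name and shape match from the trained auxiliary network into the larger texture generation network, leaving the extra layers at their fresh initialization. A framework-agnostic sketch with parameter dictionaries (the layer names and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Trained auxiliary network: parameters keyed by layer name.
aux_params = {
    "conv1.weight": rng.normal(size=(8, 3, 3, 3)),
    "conv2.weight": rng.normal(size=(8, 8, 3, 3)),
}

# Texture generation network: the same backbone plus extra layers.
tex_params = {
    "conv1.weight": np.zeros((8, 3, 3, 3)),
    "conv2.weight": np.zeros((8, 8, 3, 3)),
    "conv3.weight": rng.normal(size=(3, 8, 3, 3)),  # extra layer, fresh init
}

# Copy every parameter whose name and shape match; the remainder keeps
# its fresh initialization and is learned in the second training stage.
shared = [k for k in tex_params
          if k in aux_params and tex_params[k].shape == aux_params[k].shape]
for k in shared:
    tex_params[k] = aux_params[k].copy()
```

Starting the second stage from these copied weights is what makes the texture generation network's training more stable than training it from scratch at full resolution.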
The three-dimensional human body reconstruction method of this embodiment not only improves the reconstruction effect of local parts by performing local geometric reconstruction on the local parts of the target human body, but also performs reconstruction based on a single human body image of the target human body, which simplifies the user's cooperation process. In addition, the texture is predicted automatically by a neural network, so that the generated texture is of better quality: for example, the texture over the whole body is more uniform and the colors are more realistic. Moreover, by first training the auxiliary texture generation network and then training the texture generation network, the training process of the texture generation network is more stable and converges more easily.
In other embodiments, in order to improve the reconstruction effect, multiple images of the target human body taken from different angles may also be acquired to jointly perform the three-dimensional reconstruction of the target human body. For example, suppose three images of the target human body are acquired from different angles. Referring to FIG. 2, the three images may each be used as an input of the global feature sub-network 221, yielding a first image feature output by the global feature sub-network 221 for each of the three images. The three first image features are then fused, and the fused image feature is used as the input of the first fitting sub-network 222 for further processing.
When the three-dimensional human body reconstruction adopts the network structure shown in FIG. 6, in addition to using the above three images as inputs of the global feature sub-network 221, local images may also be obtained by extracting local regions from the three images. The three local images are each used as an input of the local feature sub-network 611, yielding a second image feature output by the local feature sub-network 611 for each of the three local images. The three second image features are then fused, and the fused image feature is used as the input of the second fitting sub-network 612 for further processing.
As above, by acquiring multiple images of the target human body from different angles to jointly perform the three-dimensional human body reconstruction, a more refined three-dimensional human body model corresponding to the target human body can be obtained.
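The multi-view fusion described above (one first image feature per view, fused before the first fitting sub-network) can be sketched as follows. The feature extractor here is a hypothetical stand-in for the global feature sub-network 221, and element-wise averaging is only one possible fusion operator; the disclosure does not fix a particular one.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_global_feature(image):
    """Stand-in for the global feature sub-network: maps an image to a
    fixed-length feature vector (here just a random linear projection)."""
    proj = rng.normal(0, 1, (image.size, 128))
    return image.reshape(-1) @ proj

def fuse_features(features):
    """Fuse the per-view first image features into a single vector by
    element-wise averaging (one simple fusion choice)."""
    return np.mean(np.stack(features, axis=0), axis=0)

views = [rng.random((32, 32)) for _ in range(3)]      # three viewing angles
feats = [extract_global_feature(v) for v in views]    # one feature per view
fused = fuse_features(feats)   # input to the first fitting sub-network
```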
In addition, it should be noted that in each process step of the three-dimensional human body reconstruction method described in any embodiment of the present disclosure, the neural network models involved may each be trained separately. For example, the first deep neural network branch and the texture generation network may each undergo their own training.
An example of a three-dimensional human body reconstruction process is described below. Processing that is the same as that described in any of the foregoing method embodiments is only briefly explained here; for details, reference may be made to the foregoing embodiments.
In this example, it is assumed that a three-dimensional human body model of user U1 is to be constructed based on a single human body image of U1. The single human body image may be a frontal image of user U1, including the frontal texture of user U1 and a background image. As illustrated in FIG. 9, the single human body image 91 of user U1 includes the user's frontal texture 92 and a background image 93.
First, two aspects of reconstruction may be performed based on the single human body image 91 of user U1.
One aspect is geometric reconstruction of the human body based on the single human body image 91 to obtain the three-dimensional mesh model and skeletal structure of U1. For example, the single human body image 91 may be processed by the network shown in FIG. 6: the global feature sub-network and the first fitting sub-network in the first deep neural network branch process the single human body image 91 to obtain a first human body model, and the local feature sub-network and the second fitting sub-network in the second deep neural network branch process the image of the region above the shoulders in the single human body image 91 to obtain a second human body model. The first human body model and the second human body model are then fused to obtain a fused human body model, and the fused human body model is meshed to obtain the three-dimensional mesh model (mesh) of user U1.
The other aspect is local geometric reconstruction of the face of user U1 based on the single human body image 91 to obtain a three-dimensional mesh model of the face. Specifically, feature extraction may be performed on the single human body image 91, and the extracted image features together with a three-dimensional face topology template are input into a graph convolutional neural network to obtain the face mesh of user U1.
Next, the face mesh (the three-dimensional mesh model of the face) obtained by the above reconstruction and the body mesh of user U1 (the three-dimensional mesh model of U1's body) may be fused to obtain the initial three-dimensional model of U1.
Specifically, according to the schematic flow of FIG. 3, the key points of the face may be used to determine the identifiers and positions of the model key points corresponding to those key points on the face mesh and on the body mesh respectively; based on the identifiers and positions of these model key points, the extrinsic camera parameters of the models and other parameters, the coordinate transformation relationship between the models is determined. Based on this coordinate transformation relationship, the face mesh is transformed into the coordinate system of the body mesh, the face in the body mesh is replaced with the face mesh, and the face mesh and the body mesh are fused together through Poisson reconstruction to obtain the initial three-dimensional model of user U1.
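The coordinate transformation between the two meshes can be estimated from the corresponding model key points. A common choice, shown here as an assumed implementation since the disclosure does not name a specific algorithm, is a least-squares similarity transform (scale, rotation, translation) in the style of Umeyama's method:

```python
import numpy as np

def similarity_transform(src, dst):
    """Estimate scale s, rotation R and translation t mapping source key
    points (e.g. on the face mesh) onto the corresponding destination key
    points (e.g. on the body mesh), minimizing squared error."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))          # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Hypothetical corresponding key points on the two meshes (identity
# rotation, known scale and translation, so the result is checkable).
rng = np.random.default_rng(1)
face_kpts = rng.random((5, 3))
body_kpts = 2.0 * face_kpts + np.array([0.5, -0.2, 1.0])
s, R, t = similarity_transform(face_kpts, body_kpts)
aligned = (s * (R @ face_kpts.T)).T + t   # face points in body coordinates
```

Applying the recovered transform moves the face mesh into the coordinate system of the body mesh, after which the face region can be replaced and the meshes blended, e.g. through Poisson reconstruction as described above.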
Then, based on the above initial three-dimensional model and the single human body image 91 of user U1, the human body texture of U1 is reconstructed. Since the single human body image 91 contains the frontal texture of user U1, the back texture of U1 can be predicted from this frontal texture.
Specifically, human body segmentation may be performed on the single human body image 91 to obtain the frontal texture of the human body with the background removed and a first segmentation mask representing the frontal texture area of the human body, and the first segmentation mask is flipped to obtain a second segmentation mask representing the back texture area of the human body. The frontal texture, the first segmentation mask and the second segmentation mask are then input into the pre-trained texture generation network to obtain the back texture of user U1. Finally, texture mapping is performed on the initial three-dimensional model based on the frontal texture and the back texture, the texture in the seam areas of the model is filled and completed, and the textured three-dimensional human body model of U1 is finally obtained.
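The mask construction in this step can be sketched directly: a person-segmentation matte gives the frontal mask, and a horizontal flip of that mask serves as the back-texture mask, since the back silhouette mirrors the front one. The toy matte and the 0.5 threshold below are illustrative.

```python
import numpy as np

def make_masks(alpha):
    """From a person-segmentation alpha matte, build the frontal-texture
    mask and, by a left-right flip, the back-texture mask."""
    front_mask = (alpha > 0.5).astype(np.uint8)   # first segmentation mask
    back_mask = front_mask[:, ::-1]               # second segmentation mask
    return front_mask, back_mask

alpha = np.zeros((4, 6))
alpha[1:3, 1:3] = 1.0               # toy off-center silhouette
front_mask, back_mask = make_masks(alpha)
```

The two masks have identical area but mirrored positions, which keeps the predicted back texture pixel-aligned with the frontal one.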
To facilitate driving the constructed three-dimensional human body model, the skinning weights of the three-dimensional human body model may also be computed by combining the reconstructed three-dimensional human body model of U1 with the skeletal structure obtained when reconstructing the three-dimensional mesh model of U1. The model can subsequently be driven to perform actions through these skinning weights.
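Driving a model through skinning weights typically follows linear blend skinning: each posed vertex is a weight-blended sum of that vertex transformed by every bone. A minimal sketch, with toy vertices, weights and bone transforms (the disclosure does not prescribe a particular skinning formula):

```python
import numpy as np

def linear_blend_skinning(verts, weights, bone_transforms):
    """Pose mesh vertices with per-bone rigid transforms blended by the
    skinning weights: v_i' = sum_j w_ij * (R_j @ v_i + t_j)."""
    posed = np.zeros_like(verts)
    for j, (R, t) in enumerate(bone_transforms):
        posed += weights[:, j:j + 1] * (verts @ R.T + t)
    return posed

verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
# Two bones; each toy vertex is fully bound to one bone (rows sum to 1).
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
I = np.eye(3)
bones = [(I, np.zeros(3)),                 # bone 0: at rest
         (I, np.array([0.0, 1.0, 0.0]))]   # bone 1: translated upward
posed = linear_blend_skinning(verts, weights, bones)
```

Here only the second vertex follows the moved bone, illustrating how per-vertex weights localize the influence of each joint.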
FIG. 10 illustrates a schematic structural diagram of a three-dimensional human body reconstruction apparatus. As shown in FIG. 10, the apparatus may include: an overall reconstruction module 1001, a local reconstruction module 1002, a fusion processing module 1003 and a texture reconstruction module 1004.
The overall reconstruction module 1001 is configured to perform geometric reconstruction of the human body based on a single human body image of a target human body to obtain a three-dimensional mesh model of the target human body.
The local reconstruction module 1002 is configured to perform local geometric reconstruction on a local part of the target human body based on the single human body image of the target human body to obtain a three-dimensional mesh model of the local part.
The fusion processing module 1003 is configured to fuse the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain an initial three-dimensional model.
The texture reconstruction module 1004 is configured to reconstruct the human body texture of the target human body according to the initial three-dimensional model and the single human body image to obtain a three-dimensional human body model of the target human body.
In one example, when obtaining the three-dimensional mesh model of the target human body, the overall reconstruction module 1001 is configured to: perform three-dimensional reconstruction on the single human body image of the target human body through a first deep neural network branch to obtain a first human body model; perform three-dimensional reconstruction on a local image in the single human body image through a second deep neural network branch to obtain a second human body model, where the local image includes a local region of the target human body; fuse the first human body model and the second human body model to obtain a fused human body model; and perform meshing on the fused human body model to obtain the three-dimensional mesh model of the target human body.
In one example, the local reconstruction module 1002 is specifically configured to: perform feature extraction on the single human body image of the target human body to obtain a third image feature; and determine the three-dimensional mesh model of the local part according to the third image feature and a three-dimensional topology template of the local part.
In one example, the fusion processing module 1003 is specifically configured to: obtain multiple key points of the local part according to the single human body image of the target human body; determine information of first model key points corresponding to the multiple key points on the three-dimensional mesh model of the target human body, and determine information of second model key points corresponding to the multiple key points on the three-dimensional mesh model of the local part; and fuse the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial three-dimensional model.
In one example, when fusing the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial three-dimensional model, the fusion processing module 1003 is configured to: determine the coordinate transformation relationship between the three-dimensional mesh model of the target human body and the three-dimensional mesh model of the local part based on the information of the first model key points and the information of the second model key points; transform the three-dimensional mesh model of the local part into the coordinate system of the three-dimensional mesh model of the target human body according to the coordinate transformation relationship; and fuse the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body in the transformed coordinate system to obtain the initial three-dimensional model.
In one example, the texture reconstruction module 1004 is specifically configured to: perform human body segmentation on the single human body image to obtain a first segmentation mask, a second segmentation mask and the frontal texture of the target human body, where the first segmentation mask corresponds to the mask area of the frontal texture and the second segmentation mask corresponds to the mask area of the back texture of the target human body; input the frontal texture, the first segmentation mask and the second segmentation mask into a texture generation network to obtain the back texture of the target human body; and obtain the textured three-dimensional human body model corresponding to the target human body based on the back texture and the frontal texture.
In one example, as shown in FIG. 11, the apparatus may further include: a model training module 1005.
The model training module 1005 is configured to train the texture generation network, including: performing human body segmentation on a single image of a human body sample in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask and the frontal texture of the human body sample, where the first sample segmentation mask corresponds to the mask area of the frontal texture of the human body sample and the second sample segmentation mask corresponds to the mask area of the back texture of the human body sample; training an auxiliary texture generation network according to the frontal texture of the human body in an auxiliary human body image, a third sample segmentation mask and a fourth sample segmentation mask, where the auxiliary human body image is obtained by reducing the resolution of the single image of the human body sample, the third sample segmentation mask corresponds to the mask area of the frontal texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to the mask area of the back texture of the human body in the auxiliary human body image; and after the training of the auxiliary texture generation network is completed, training the texture generation network based on the frontal texture of the human body sample, the first sample segmentation mask and the second sample segmentation mask, where the network parameters of the texture generation network include at least some of the network parameters of the trained auxiliary texture generation network.
In some embodiments, the above apparatus may be used to execute any of the corresponding methods described above; for brevity, details are not repeated here.
An embodiment of the present disclosure further provides an electronic device. The device includes a memory and a processor, where the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer instructions to implement the method of any embodiment of this specification.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method of any embodiment of this specification.
Those skilled in the art will appreciate that one or more embodiments of the present disclosure may be provided as a method, a system or a computer program product, the computer program product comprising a computer program that, when executed by a processor, implements the method of any embodiment of this specification. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) containing computer-usable program code.
As used in the embodiments of the present disclosure, "and/or" means at least one of the two; for example, "A and/or B" covers three cases: A alone, B alone, and both A and B.
The embodiments of the present disclosure are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the data processing device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant parts, reference may be made to the description of the method embodiment.
Specific embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations described in the present disclosure may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed herein and their structural equivalents, or in a combination of one or more of these. Embodiments of the subject matter described in the present disclosure may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
The processes and logic flows described in the present disclosure may be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as special-purpose logic circuitry.
Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, the central processing unit receives instructions and data from a read-only memory and/or a random access memory. The essential components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks or optical disks, or is operatively coupled to such mass storage devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices. In addition, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.
Although the present disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as descriptions of features of specific embodiments of a particular disclosure. Certain features that are described in the present disclosure in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or sequentially, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the above embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.
Thus, specific embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims may be performed in a different order and still achieve desirable results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above descriptions are merely preferred embodiments of one or more embodiments of the present disclosure and are not intended to limit one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of one or more embodiments of the present disclosure shall fall within the protection scope of one or more embodiments of the present disclosure.

Claims (20)

  1. A three-dimensional human body reconstruction method, comprising:
    performing geometric reconstruction of a human body based on a human body image of a target human body to obtain a three-dimensional mesh model of the target human body;
    performing local geometric reconstruction on a local part of the target human body based on the human body image of the target human body to obtain a three-dimensional mesh model of the local part;
    fusing the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain an initial three-dimensional model; and
    reconstructing a human body texture of the target human body according to the initial three-dimensional model and the human body image to obtain a three-dimensional human body model of the target human body.
  2. The method according to claim 1, wherein performing geometric reconstruction of the human body based on the human body image of the target human body to obtain the three-dimensional mesh model of the target human body comprises:
    performing three-dimensional reconstruction on the human body image of the target human body through a first deep neural network branch to obtain a first human body model;
    performing three-dimensional reconstruction on a local image in the human body image through a second deep neural network branch to obtain a second human body model, wherein the local image comprises a local region of the target human body;
    fusing the first human body model and the second human body model to obtain a fused human body model; and
    performing meshing on the fused human body model to obtain the three-dimensional mesh model of the target human body.
  3. The method according to claim 2, wherein the first deep neural network branch comprises a global feature sub-network and a first fitting sub-network, and the second deep neural network branch comprises a local feature sub-network and a second fitting sub-network;
    the performing three-dimensional reconstruction on the human body image of the target human body through the first deep neural network branch to obtain the first human body model comprises: performing feature extraction on the human body image through the global feature sub-network to obtain a first image feature; and obtaining the first human body model based on the first image feature through the first fitting sub-network; and
    the performing three-dimensional reconstruction on the partial image in the human body image through the second deep neural network branch to obtain the second human body model comprises: performing feature extraction on the partial image through the local feature sub-network to obtain a second image feature; and obtaining the second human body model through the second fitting sub-network based on the second image feature and an intermediate feature output by the first fitting sub-network.
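As an illustrative aid (not part of the claims), the two-branch structure of claims 2 and 3 can be sketched as follows. Everything here is an assumption made for illustration: each "branch" is a stand-in function producing a scalar field over a voxel grid, the intermediate feature is a small vector, and fusion is a fixed-weight average; the actual sub-networks, field representation, and fusion rule are not specified at this level of the claims.

```python
import numpy as np

GRID = 8  # toy resolution of the implicit volume (assumed for illustration)

def global_branch(full_image):
    """Stand-in for the global feature sub-network plus first fitting
    sub-network: pools the full image into a first image feature and
    produces a coarse field over the whole body, exposing an
    intermediate feature for the local branch."""
    first_image_feature = full_image.mean()
    intermediate = np.array([first_image_feature, full_image.std()])
    coarse_field = np.full((GRID, GRID, GRID), first_image_feature)
    return coarse_field, intermediate

def local_branch(partial_image, intermediate):
    """Stand-in for the local feature sub-network plus second fitting
    sub-network: conditions the local prediction on the global branch's
    intermediate feature, as claim 3 describes."""
    second_image_feature = partial_image.mean()
    refined = 0.5 * second_image_feature + 0.5 * intermediate[0]
    return np.full((GRID, GRID, GRID), refined)

def fuse(first_model, second_model, local_weight=0.7):
    """Fuse the two predicted fields; a real system would then mesh the
    fused field (e.g. via marching cubes) to obtain the 3D mesh model."""
    return (1 - local_weight) * first_model + local_weight * second_model

image = np.linspace(0.0, 1.0, 64).reshape(8, 8)   # toy "human body image"
crop = image[2:6, 2:6]                            # partial image (local region)
coarse, inter = global_branch(image)
fine = local_branch(crop, inter)
fused = fuse(coarse, fine)
print(fused.shape)  # (8, 8, 8)
```

The point of the sketch is the data flow only: the local branch sees both the crop and an intermediate feature of the global branch, and the two predictions are merged before meshing.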
  4. The method according to any one of claims 1 to 3, wherein performing the local geometric reconstruction on the local part of the target human body based on the human body image of the target human body to obtain the three-dimensional mesh model of the local part comprises:
    performing feature extraction on the human body image of the target human body to obtain a third image feature; and
    determining the three-dimensional mesh model of the local part according to the third image feature and a three-dimensional topology template of the local part.
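A common way to realize claim 4's "image feature plus fixed topology template" step, shown here purely as an assumed illustration in the style of morphable-model fitting, is to keep the template's connectivity fixed and regress per-vertex offsets (via a small linear basis) from the image feature. The basis, the toy tetrahedron template, and the feature-to-coefficient mapping are all inventions of this sketch, not the patented method:

```python
import numpy as np

rng = np.random.default_rng(42)

# Fixed topology template: vertex positions + triangle indices (toy tetrahedron).
template_vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
template_faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])

# A small linear basis of vertex-offset "blendshapes" (assumed, for illustration).
basis = rng.standard_normal((2, template_vertices.size)) * 0.05

def reconstruct_local_part(third_image_feature):
    """Map an image feature to basis coefficients and deform the template.
    The connectivity (template_faces) never changes; only vertex
    positions move, so the output mesh keeps the template's topology."""
    coeffs = np.tanh(third_image_feature[:2])        # 2 deformation coefficients
    offsets = (coeffs @ basis).reshape(-1, 3)
    return template_vertices + offsets, template_faces

feature = np.array([0.3, -0.1, 0.7])                 # stand-in third image feature
verts, faces = reconstruct_local_part(feature)
print(verts.shape, faces.shape)  # (4, 3) (4, 3)
```

Keeping the topology fixed is what lets the later fusion step (claims 5 and 6) rely on known keypoint locations on the local-part mesh.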
  5. The method according to any one of claims 1 to 4, wherein fusing the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model comprises:
    obtaining a plurality of key points of the local part according to the human body image of the target human body;
    determining information of first model key points corresponding to the plurality of key points on the three-dimensional mesh model of the target human body, and determining information of second model key points corresponding to the plurality of key points on the three-dimensional mesh model of the local part; and
    fusing the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the first model key points and the information of the second model key points, to obtain the initial three-dimensional model.
  6. The method according to claim 5, wherein fusing the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the first model key points and the information of the second model key points, to obtain the initial three-dimensional model, comprises:
    determining a coordinate transformation relationship between the three-dimensional mesh model of the target human body and the three-dimensional mesh model of the local part based on the information of the first model key points and the information of the second model key points;
    transforming the three-dimensional mesh model of the local part into the coordinate system of the three-dimensional mesh model of the target human body according to the coordinate transformation relationship; and
    fusing, in the transformed coordinate system, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
  7. The method according to any one of claims 1 to 6, wherein the human body image comprises a frontal texture of the target human body and a background image; and
    the reconstructing the human body texture of the target human body according to the initial three-dimensional model and the human body image to obtain the three-dimensional human body model of the target human body comprises:
    performing human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask and the frontal texture of the target human body, wherein the first segmentation mask corresponds to a mask region of the frontal texture, and the second segmentation mask corresponds to a mask region of a back texture of the target human body;
    inputting the frontal texture, the first segmentation mask and the second segmentation mask into a texture generation network to obtain the back texture of the target human body; and
    obtaining a textured three-dimensional human body model corresponding to the target human body based on the back texture and the frontal texture.
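The claim fixes the texture generation network's inputs and output but not its architecture. The sketch below shows only that data flow, with a deliberately naive stand-in for the learned network: the input is the channel stack of frontal texture plus both masks, and the "prediction" is just a horizontal mirror of the front restricted to the back mask region. The mirroring baseline is an assumption of this sketch, not the claimed method:

```python
import numpy as np

def texture_network_stub(front_texture, front_mask, back_mask):
    """Stand-in for the texture generation network of claim 7.
    Input: channel stack [front RGB, front mask, back mask], shape (H, W, 5).
    Output: a back texture defined on the back mask region.
    Here the 'network' is a horizontal flip of the front texture."""
    x = np.concatenate([front_texture,
                        front_mask[..., None],
                        back_mask[..., None]], axis=-1)   # (H, W, 5) input
    mirrored = x[:, ::-1, :3]                             # flip the RGB channels
    return mirrored * back_mask[..., None]                # zero outside the mask

H = W = 4
front = np.arange(H * W * 3, dtype=float).reshape(H, W, 3) / (H * W * 3)
front_mask = np.ones((H, W))                              # toy full-frame masks
back_mask = np.ones((H, W))
back = texture_network_stub(front, front_mask, back_mask)
print(back.shape)  # (4, 4, 3)
```

In the claimed method a trained generative network replaces the flip, but the interface stays the same: front texture in, mask-shaped back texture out, after which front and back textures are mapped onto the initial 3D model.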
  8. The method according to claim 7, wherein training the texture generation network comprises:
    performing human body segmentation on an image of a human body sample in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask and a frontal texture of the human body sample, wherein the first sample segmentation mask corresponds to a mask region of the frontal texture of the human body sample, and the second sample segmentation mask corresponds to a mask region of a back texture of the human body sample;
    training an auxiliary texture generation network according to a frontal texture of a human body in an auxiliary human body image, a third sample segmentation mask and a fourth sample segmentation mask, wherein the auxiliary human body image is obtained by reducing the resolution of the image of the human body sample, the third sample segmentation mask corresponds to a mask region of the frontal texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to a mask region of a back texture of the human body in the auxiliary human body image; and
    after the training of the auxiliary texture generation network is completed, training the texture generation network based on the frontal texture of the human body sample, the first sample segmentation mask and the second sample segmentation mask, wherein network parameters of the texture generation network comprise at least part of the network parameters of the trained auxiliary texture generation network.
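Claim 8 describes a coarse-to-fine schedule: train an auxiliary network on down-sampled samples first, then initialize at least part of the full-resolution network from it. The sketch below illustrates only that parameter hand-over, with dictionaries of arrays standing in for real network weights and a trivial update rule standing in for training; all names and the update rule are assumptions of the sketch:

```python
import numpy as np

def downsample(image, factor=2):
    """Reduce resolution by block averaging (produces the auxiliary image)."""
    h, w = image.shape[0] // factor, image.shape[1] // factor
    return image[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

def train(params, data):
    """Stand-in 'training': nudge every parameter toward the data mean."""
    mean = np.mean([d.mean() for d in data])
    return {name: value * 0.9 + mean * 0.1 for name, value in params.items()}

samples = [np.random.default_rng(i).random((8, 8)) for i in range(4)]
aux_samples = [downsample(s) for s in samples]           # low-resolution copies

# Stage 1: train the auxiliary network on the low-resolution samples.
aux_params = train({"encoder": np.zeros(3), "decoder": np.zeros(3)}, aux_samples)

# Stage 2: the full network reuses at least part of the auxiliary parameters
# (here the encoder), then continues training at full resolution.
full_params = {"encoder": aux_params["encoder"].copy(),  # transferred
               "decoder": np.zeros(3)}                   # trained from scratch
full_params = train(full_params, samples)
print(sorted(full_params))  # ['decoder', 'encoder']
```

The design motive for such staging is the usual one: the low-resolution task is easier and cheaper, and its weights give the full-resolution network a better starting point than random initialization.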
  9. The method according to any one of claims 1 to 8, wherein
    the local part of the target human body is a face of the target human body; and/or
    the human body image is an RGB image.
  10. The method according to any one of claims 1 to 9, wherein the method further comprises:
    obtaining a human skeleton structure of the target human body when performing the geometric reconstruction of the human body based on the human body image of the target human body; and
    after the three-dimensional human body model of the target human body is obtained, determining, based on the three-dimensional human body model and the human skeleton structure, skinning weights for driving the three-dimensional human body model.
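Claim 10 leaves open how the skinning weights are computed from the model and the skeleton. One common heuristic, used here purely as an illustrative assumption, is inverse-distance weights from each vertex to each joint, which then drive the mesh via linear blend skinning:

```python
import numpy as np

def skinning_weights(vertices, joints, power=2.0):
    """Inverse-distance weight of each vertex w.r.t. each joint,
    normalized so every vertex's weights sum to 1."""
    d = np.linalg.norm(vertices[:, None, :] - joints[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, 1e-8) ** power
    return w / w.sum(axis=1, keepdims=True)

def linear_blend_skinning(vertices, weights, transforms):
    """Drive the mesh: blend per-joint 4x4 rigid transforms by the weights."""
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    per_joint = np.einsum('jab,vb->vja', transforms, homo)[..., :3]
    return np.einsum('vj,vja->va', weights, per_joint)

verts = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])
joints = np.array([[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]])   # toy two-joint skeleton
W = skinning_weights(verts, joints)

# Sanity check: identity joint transforms must leave the mesh unchanged.
T = np.stack([np.eye(4), np.eye(4)])
moved = linear_blend_skinning(verts, W, T)
print(np.allclose(moved, verts))  # True
```

Production pipelines typically refine such geometric weights (e.g. with heat-diffusion or learned weights), but the driving mechanism, blending per-joint transforms by per-vertex weights, is the same.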
  11. A three-dimensional human body reconstruction apparatus, wherein the apparatus comprises:
    an overall reconstruction module, configured to perform geometric reconstruction of a human body based on a human body image of a target human body to obtain a three-dimensional mesh model of the target human body;
    a local reconstruction module, configured to perform local geometric reconstruction on a local part of the target human body based on the human body image of the target human body to obtain a three-dimensional mesh model of the local part;
    a fusion processing module, configured to fuse the three-dimensional mesh model of the local part with the three-dimensional mesh model of the target human body to obtain an initial three-dimensional model; and
    a texture reconstruction module, configured to reconstruct a human body texture of the target human body according to the initial three-dimensional model and the human body image to obtain a three-dimensional human body model of the target human body.
  12. The apparatus according to claim 11, wherein
    the overall reconstruction module, when obtaining the three-dimensional mesh model of the target human body, is configured to: perform three-dimensional reconstruction on the human body image of the target human body through a first deep neural network branch to obtain a first human body model; perform three-dimensional reconstruction on a partial image in the human body image through a second deep neural network branch to obtain a second human body model, wherein the partial image comprises a partial region of the target human body; fuse the first human body model and the second human body model to obtain a fused human body model; and perform meshing on the fused human body model to obtain the three-dimensional mesh model of the target human body.
  13. The apparatus according to claim 11 or 12, wherein
    the local reconstruction module is configured to: perform feature extraction on the human body image of the target human body to obtain a third image feature; and determine the three-dimensional mesh model of the local part according to the third image feature and a three-dimensional topology template of the local part.
  14. The apparatus according to any one of claims 11 to 13, wherein
    the fusion processing module is configured to: obtain a plurality of key points of the local part according to a single human body image of the target human body; determine information of first model key points corresponding to the plurality of key points on the three-dimensional mesh model of the target human body, and determine information of second model key points corresponding to the plurality of key points on the three-dimensional mesh model of the local part; and fuse the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the first model key points and the information of the second model key points, to obtain the initial three-dimensional model.
  15. The apparatus according to claim 14, wherein
    the fusion processing module, when fusing the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial three-dimensional model, is configured to: determine a coordinate transformation relationship between the three-dimensional mesh model of the target human body and the three-dimensional mesh model of the local part based on the information of the first model key points and the information of the second model key points; transform the three-dimensional mesh model of the local part into the coordinate system of the three-dimensional mesh model of the target human body according to the coordinate transformation relationship; and fuse, in the transformed coordinate system, the three-dimensional mesh model of the local part into the three-dimensional mesh model of the target human body to obtain the initial three-dimensional model.
  16. The apparatus according to any one of claims 11 to 15, wherein
    the texture reconstruction module is configured to: perform human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask and a frontal texture of the target human body, wherein the first segmentation mask corresponds to a mask region of the frontal texture, and the second segmentation mask corresponds to a mask region of a back texture of the target human body; input the frontal texture, the first segmentation mask and the second segmentation mask into a texture generation network to obtain the back texture of the target human body; and obtain a textured three-dimensional human body model corresponding to the target human body based on the back texture and the frontal texture.
  17. The apparatus according to claim 16, wherein the apparatus further comprises:
    a model training module, configured to train the texture generation network, including: performing human body segmentation on an image of a human body sample in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask and a frontal texture of the human body sample, wherein the first sample segmentation mask corresponds to a mask region of the frontal texture of the human body sample, and the second sample segmentation mask corresponds to a mask region of a back texture of the human body sample; training an auxiliary texture generation network according to a frontal texture of a human body in an auxiliary human body image, a third sample segmentation mask and a fourth sample segmentation mask, wherein the auxiliary human body image is obtained by reducing the resolution of the image of the human body sample, the third sample segmentation mask corresponds to a mask region of the frontal texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to a mask region of the back texture of the human body in the auxiliary human body image; and after the training of the auxiliary texture generation network is completed, training the texture generation network based on the frontal texture of the human body sample, the first sample segmentation mask and the second sample segmentation mask, wherein network parameters of the texture generation network comprise at least part of the network parameters of the trained auxiliary texture generation network.
  18. An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer instructions to implement the method according to any one of claims 1 to 10.
  19. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 10.
  20. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 10.
PCT/CN2021/115122 2021-03-31 2021-08-27 Three-dimensional human body reconstruction method and apparatus, and device and storage medium WO2022205760A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110352199.4A CN113012282B (en) 2021-03-31 2021-03-31 Three-dimensional human body reconstruction method, device, equipment and storage medium
CN202110352199.4 2021-03-31

Publications (1)

Publication Number Publication Date
WO2022205760A1 true WO2022205760A1 (en) 2022-10-06

Family ID: 76387638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115122 WO2022205760A1 (en) 2021-03-31 2021-08-27 Three-dimensional human body reconstruction method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN113012282B (en)
WO (1) WO2022205760A1 (en)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012282B (en) * 2021-03-31 2023-05-19 深圳市慧鲤科技有限公司 Three-dimensional human body reconstruction method, device, equipment and storage medium
CN113658309B (en) * 2021-08-25 2023-08-01 北京百度网讯科技有限公司 Three-dimensional reconstruction method, device, equipment and storage medium
WO2023024036A1 (en) * 2021-08-26 2023-03-02 华为技术有限公司 Method and apparatus for reconstructing three-dimensional model of person
CN113989434A (en) * 2021-10-27 2022-01-28 聚好看科技股份有限公司 Human body three-dimensional reconstruction method and device
CN115458128B (en) * 2022-11-10 2023-03-24 北方健康医疗大数据科技有限公司 Method, device and equipment for generating digital human body image based on key points
CN115810095B (en) * 2022-12-21 2023-08-29 首都师范大学 Surface reconstruction method, device, equipment and medium based on semicircular canal automatic positioning
CN116704097B (en) * 2023-06-07 2024-03-26 好易购家庭购物有限公司 Digitized human figure design method based on human body posture consistency and texture mapping

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084884A (en) * 2019-04-28 2019-08-02 叠境数字科技(上海)有限公司 A kind of manikin facial area method for reconstructing
CN110378838A (en) * 2019-06-25 2019-10-25 深圳前海达闼云端智能科技有限公司 Become multi-view image generation method, device, storage medium and electronic equipment
US10621779B1 (en) * 2017-05-25 2020-04-14 Fastvdo Llc Artificial intelligence based generation and analysis of 3D models
CN111508079A (en) * 2020-04-22 2020-08-07 深圳追一科技有限公司 Virtual clothing fitting method and device, terminal equipment and storage medium
CN112950739A (en) * 2021-03-31 2021-06-11 深圳市慧鲤科技有限公司 Texture generation method, device, equipment and storage medium
CN112950769A (en) * 2021-03-31 2021-06-11 深圳市慧鲤科技有限公司 Three-dimensional human body reconstruction method, device, equipment and storage medium
CN113012282A (en) * 2021-03-31 2021-06-22 深圳市慧鲤科技有限公司 Three-dimensional human body reconstruction method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776981B1 (en) * 2019-06-07 2020-09-15 Snap Inc. Entertaining mobile application for animating a single image of a human body and applying effects
CN109887077B (en) * 2019-03-07 2022-06-03 百度在线网络技术(北京)有限公司 Method and apparatus for generating three-dimensional model
CN109978930B (en) * 2019-03-27 2020-11-10 杭州相芯科技有限公司 Stylized human face three-dimensional model automatic generation method based on single image
CN110136243B (en) * 2019-04-09 2023-03-17 五邑大学 Three-dimensional face reconstruction method, system, device and storage medium thereof


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116645468A (en) * 2023-05-05 2023-08-25 阿里巴巴达摩院(杭州)科技有限公司 Human body three-dimensional modeling method, method and device for training human body structure to generate model
CN116645468B (en) * 2023-05-05 2024-03-26 阿里巴巴达摩院(杭州)科技有限公司 Human body three-dimensional modeling method, method and device for training human body structure to generate model
CN117726907A (en) * 2024-02-06 2024-03-19 之江实验室 Training method of modeling model, three-dimensional human modeling method and device
CN117726907B (en) * 2024-02-06 2024-04-30 之江实验室 Training method of modeling model, three-dimensional human modeling method and device

Also Published As

Publication number Publication date
CN113012282A (en) 2021-06-22
CN113012282B (en) 2023-05-19


Legal Events

121: EP — the EPO has been informed by WIPO that EP was designated in this application (ref document number 21934399; country of ref document: EP; kind code of ref document: A1)
NENP: Non-entry into the national phase (ref country code: DE)
WWE: WIPO information — entry into national phase (ref document number 2023574335; country of ref document: JP)
32PN: EP — public notification in the EP bulletin as the address of the addressee cannot be established (free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.01.2024))