CN114429518A - Face model reconstruction method, device, equipment and storage medium - Google Patents

Face model reconstruction method, device, equipment and storage medium

Info

Publication number
CN114429518A
CN114429518A
Authority
CN
China
Prior art keywords
face
model
head
map
generator
Prior art date
Legal status
Pending
Application number
CN202111627286.2A
Other languages
Chinese (zh)
Inventor
刘永进
吕天
叶子鹏
夏萌霏
王雷
Current Assignee
Tsinghua University
Deep Blue Technology Shanghai Co Ltd
Original Assignee
Tsinghua University
Deep Blue Technology Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University, Deep Blue Technology Shanghai Co Ltd filed Critical Tsinghua University
Priority to CN202111627286.2A
Publication of CN114429518A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The invention provides a face model reconstruction method, a device, equipment and a storage medium, wherein the method comprises the following steps: inputting a frontal photo of the current face into a preset face description model to obtain face description information, where the face description information includes geometric parameters of the current face and the face description model is a deep neural network model; acquiring a face mesh model of the current face according to the geometric parameters; inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face; and inputting the face mesh model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model. The technical scheme of the invention improves the reconstruction precision of the face model.

Description

Face model reconstruction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for reconstructing a face model, an electronic device, and a non-transitory computer-readable storage medium.
Background
Three-dimensional face reconstruction refers to reconstructing a three-dimensional model of a face based on one or more images containing the face. The three-dimensional model is typically a mesh model of a human face. The three-dimensional face reconstruction technology is widely applied, for example, a three-dimensional model can be obtained based on a patient picture for diagnosis and treatment in medical science, and a virtual digital person can be generated based on a user picture in social media.
In the real world, the details of a human face are very rich, including the concavities and convexities of the face, variations in skin color, detail variations around the eyes, nose and mouth edges, and the positions and sizes of moles; together these details characterize the face. Existing three-dimensional face reconstruction methods struggle to handle all of these parts at the same time, so the reconstructed three-dimensional face exhibits phenomena such as misaligned and uncoordinated facial features, mesh deviation, and rough, unsmooth texture.
Because humans are very sensitive to the details of real faces, the unreality of a low-precision three-dimensional face is easily noticed, which greatly limits the application of three-dimensional face reconstruction methods in scenarios such as social media. For scenarios with extremely high precision requirements on the face model, for example in the field of medical research, a low-precision face may be unusable.
Disclosure of Invention
The invention provides a face model reconstruction method, a face model reconstruction device, electronic equipment and a non-transitory computer-readable storage medium, which are used for solving the problem of low face model reconstruction precision in the prior art and obtaining a face model with higher precision.
The invention provides a face model reconstruction method, which comprises the following steps: inputting a frontal photo of the current face into a preset face description model to obtain face description information, where the face description information includes geometric parameters of the current face and the face description model is a deep neural network model; acquiring a face mesh model of the current face according to the geometric parameters; inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face; and inputting the face mesh model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model.
According to the face model reconstruction method provided by the invention, after the reconstructed front face model is obtained, the method further comprises: selecting five frames of second color depth images at different angles from first color depth images of the current face at multiple angles; performing point processing on the second color depth images to obtain a frontal face point cloud; non-rigidly registering the front face model to the frontal face point cloud to obtain a half-face model; non-rigidly registering a template full-head model to the half-face model; and splicing the half-face model and the back-of-head part of the template full-head model to obtain a complete full-head model.
According to the face model reconstruction method provided by the invention, the generation network comprises a texture generator, a normal generator and a parallax generator, and the training method of the generation network comprises the following steps: training the texture generator until the texture generator converges to obtain an intermediate generator; introducing the normal generator and the parallax generator, loading the intermediate generator, and synchronously adjusting the texture generator, the normal generator and the parallax generator.
According to the face model reconstruction method provided by the invention, the training method of the face description model comprises: acquiring a training data set of historical face photos; and sequentially inputting the historical frontal face photos in the training data set into an initial face description model, and adjusting parameters of the face description model according to the resulting output data to obtain the final face description model.
According to the face model reconstruction method provided by the invention, the face mesh model of the current face is obtained according to the geometric parameters, and the method comprises the following steps: and acquiring the coordinates of all vertexes in the face mesh through the geometric parameters.
According to the face model reconstruction method provided by the invention, selecting five frames of second color depth images at different angles from the first color depth images of the current face at multiple angles comprises: acquiring the first color depth images; preprocessing the first color depth images, and selecting second color depth images at the following five different angles: frontal, 15° left, 30° left, 15° right and 30° right.
According to the face model reconstruction method provided by the invention, performing point processing on the second color depth images to obtain a frontal face point cloud comprises: performing point processing on the second color depth images to obtain a frontal face point cloud and key points. Selecting the second color depth images at the five different angles comprises: calculating the direction and angle of the head turn corresponding to the current face from the key points; and selecting the five second color depth images at different angles according to that direction and angle.
According to the face model reconstruction method provided by the invention, non-rigidly registering the front face model to the frontal face point cloud to obtain a half-face model comprises: matching the front face model against the frontal face point cloud, so that the front face model deforms its mesh to fit the frontal face point cloud. Non-rigidly registering the template full-head model to the half-face model comprises: matching the template full-head model against the half-face model, so that the template full-head model deforms to fit the half-face model.
According to the face model reconstruction method provided by the invention, splicing the half-face model and the back-of-head part of the template full-head model to obtain a complete full-head model comprises: position-matching the front face model and the back-of-head part of the template full-head model in space; obtaining new vertices between the front face model and the back-of-head part of the template full-head model by interpolation, and adding new triangular patches to the full-head model based on the new vertices.
According to the face model reconstruction method provided by the invention, obtaining new vertices between the front face model and the back-of-head part of the template full-head model by interpolation comprises: pairing the face edge vertices and back-of-head edge vertices after position matching, and interpolating using the spatial coordinates and normal vectors of the face edge vertices and back-of-head edge vertices with a first interpolation method.
According to the face model reconstruction method provided by the invention, the first interpolation method comprises any one of the following: linear interpolation, quadratic interpolation, and linear interpolation plus a trigonometric offset.
The invention provides a face model reconstruction device, comprising: a face description unit, configured to input a frontal photo of the current face into a preset face description model to obtain face description information, where the description information includes geometric parameters of the current face and the face description model is a deep neural network model; a mesh model obtaining unit, configured to obtain a face mesh model of the current face according to the geometric parameters; a map generating unit, configured to input the frontal photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face; and a face model reconstruction unit, configured to input the face mesh model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model.
According to the face model reconstruction device provided by the invention, the device further comprises: a selecting unit, configured to select five frames of second color depth images at different angles from the first color depth images of the current face at multiple angles; a processing unit, configured to perform point processing on the second color depth images to obtain a frontal face point cloud; a first registration unit, configured to non-rigidly register the front face model to the frontal face point cloud to obtain a half-face model; a second registration unit, configured to non-rigidly register the template full-head model to the half-face model; and a splicing unit, configured to splice the half-face model and the back-of-head part of the template full-head model to obtain a complete full-head model.
According to the human face model reconstruction device provided by the invention, the generating network comprises a texture generator, a normal generator and a parallax generator, and the device further comprises a generating unit used for: training the texture generator until the texture generator converges to obtain an intermediate generator; introducing the normal generator and the parallax generator, loading the intermediate generator, and synchronously adjusting the texture generator, the normal generator and the parallax generator.
According to the face model reconstruction device provided by the invention, the device further comprises a training unit, configured to: acquire a training data set of historical face photos; and sequentially input the historical frontal face photos in the training data set into an initial face description model, adjusting parameters of the face description model according to the resulting output data to obtain the final face description model.
According to the face model reconstruction device provided by the invention, the mesh model obtaining unit is further configured to: acquire the coordinates of all vertices in the face mesh through the geometric parameters.
According to the face model reconstruction device provided by the invention, the selecting unit is further configured to: acquire the first color depth images; preprocess the first color depth images, and select second color depth images at the following five different angles: frontal, 15° left, 30° left, 15° right and 30° right.
According to the face model reconstruction device provided by the invention, the processing unit is further configured to: perform point processing on the second color depth images to obtain a frontal face point cloud and key points; the selecting unit is further configured to: calculate the direction and angle of the head turn corresponding to the current face from the key points, and select the second color depth images at the five different angles according to that direction and angle.
According to the face model reconstruction device provided by the invention, the first registration unit is further configured to: match the front face model against the frontal face point cloud, so that the front face model deforms its mesh to fit the frontal face point cloud; the second registration unit is further configured to: match the template full-head model against the half-face model, so that the template full-head model deforms to fit the half-face model.
According to the face model reconstruction device provided by the invention, the splicing unit is further configured to: position-match the front face model and the back-of-head part of the template full-head model in space; obtain new vertices between the front face model and the back-of-head part of the template full-head model by interpolation, and add new triangular patches to the full-head model based on the new vertices.
According to the face model reconstruction device provided by the invention, the splicing unit is further configured to: pair the face edge vertices and back-of-head edge vertices after position matching, and interpolate using the spatial coordinates and normal vectors of the face edge vertices and back-of-head edge vertices by means of a first interpolation unit.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of any one of the above human face model reconstruction methods.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the face model reconstruction method as described in any of the above.
The face model reconstruction method, face model reconstruction device, electronic equipment and non-transitory computer-readable storage medium provided by the invention generate the face mesh model and high-precision texture map, parallax map and normal map from a single input face photo, and input them into the differentiable parallax renderer to obtain the reconstructed front face model, thereby ensuring the precision of the geometric parameters and texture of the face and improving the reconstruction precision of the face model.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a face model reconstruction method according to the present invention;
FIG. 2 is a schematic flow chart of generating a full-head model according to the present invention;
FIG. 3 is a second schematic flowchart of a face model reconstruction method according to the present invention;
FIG. 4 is a second schematic flow chart of generating a full-head model according to the present invention;
FIG. 5 is a schematic structural diagram of a face model reconstruction apparatus provided in the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used herein to describe various information in one or more embodiments of the present invention, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present invention. The word "if" as used herein may be interpreted as "upon", "when" or "in response to determining", depending on the context.
Three-dimensional face model reconstruction methods in the related art generally focus on improving either geometry or texture, and rarely consider all aspects of three-dimensional face reconstruction comprehensively, so they cannot obtain a three-dimensional face model of higher precision.
In order to solve the above problems, embodiments of the present invention provide a technical solution for reconstructing a face model.
The following detailed description of exemplary embodiments of the invention refers to the accompanying drawings.
Fig. 1 is a flowchart of a face model reconstruction method according to an embodiment of the present invention. The method provided by the embodiment of the invention can be executed by any electronic equipment with computer processing capability, such as a terminal or a server. As shown in fig. 1, the face model reconstruction method includes:
step 102, inputting the front photo of the current face into a preset face description model to obtain face description information, wherein the face description information comprises geometric parameters of the current face, and the face description model is a deep neural network model.
Specifically, the face description information includes the geometric parameters, position and orientation of the face. The geometric parameters of the face are the parameters of a 3D Morphable Model (3DMM) of the face. The frontal photo of the current face is a photo containing a single face. The face description model in the embodiment of the invention may be a weakly supervised three-dimensional face reconstruction model.
And 104, acquiring a face mesh model of the current face according to the geometric parameters.
Specifically, the coordinates of the vertices in the face mesh can be obtained from the geometric parameters of the face, thereby obtaining the face mesh model of the current face.
And 106, inputting the front photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face.
Specifically, the generation network includes a texture generator (TextureGAN), a normal generator (NormalGAN) and a parallax generator (ParallaxGAN), which generate the texture map, the normal map and the parallax map respectively.
And step 108, inputting the face mesh model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model.
In particular, the differentiable parallax renderer can be written based on PyTorch3D. It can effectively combine the texture map, normal map and parallax map simultaneously to obtain a detail-rich rendering result, and it can be computed efficiently on a GPU. Meanwhile, the arithmetic and matrix operations inside the differentiable parallax renderer are all differentiable, so training of the upstream neural networks can be supported. Here, PyTorch3D is a component library for three-dimensional shape rendering. The front face model contains information such as vertices, faces, vertex texture coordinates and vertex normal coordinates.
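As a minimal sketch of this data flow (steps 102 to 108), with every module passed in as a hypothetical callable rather than the patent's actual interfaces:

```python
def reconstruct_front_face(photo, face_description_model, mesh_from_coeffs,
                           texture_gan, normal_gan, parallax_gan, renderer):
    """Sketch of steps 102-108; all callables are assumed interfaces."""
    coeffs = face_description_model(photo)   # step 102: face description information
    mesh = mesh_from_coeffs(coeffs)          # step 104: face mesh from the geometric parameters
    tex = texture_gan(photo)                 # step 106: texture map
    nrm = normal_gan(photo)                  # step 106: normal map
    disp = parallax_gan(photo)               # step 106: parallax map
    return renderer(mesh, tex, nrm, disp)    # step 108: reconstructed front face model
```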
The technical scheme of the embodiment of the invention starts from a single face photo and obtains the texture map, normal map and parallax map simultaneously through the generation network; these maps change the lighting computation and texture sampling during rendering, so the face texture becomes finer and a more realistic concave-convex effect is produced.
Specifically, according to the technical scheme of the embodiment of the invention, the pose, diffuse reflection, geometric parameters and other information of the face are first obtained through pre-training of the face description model, from which the face mesh can be reconstructed; then, through the training of TextureGAN, NormalGAN and ParallaxGAN, high-precision texture, parallax and normal maps can be generated from the input face picture, giving a high-precision face texture. On this basis, the technical scheme of the embodiment of the invention can guarantee the high precision of both the geometric parameters and the textures of the face.
In the embodiment of the present invention, the training method of the face description model includes: acquiring a training data set of historical face photos; and sequentially inputting the historical frontal face photos in the training data set into an initial face description model, and adjusting parameters of the face description model according to the resulting output data to obtain the final face description model.
Specifically, a data set of a face photo is obtained, the face photo in the data set is sequentially input into an initial face description model, information such as 3DMM geometric parameters, pose, diffuse reflection and illumination of a human face is output, the initial face description model is trained to obtain a pre-training model A0, the parameters of the pre-training model A0 are kept unchanged, and the pre-training model A0 is a final face description model.
The above training data set may be an existing data set, such as the CelebA-HQ data set. The training dataset comprises a plurality of photographs containing faces.
It will be appreciated that the input to the face description model is a single face photo, and the output is the model's prediction of the face position δ_m, identity δ_id, expression δ_ex, diffuse reflection δ_alb, pose δ_pose and illumination coefficient δ_illu, forming the parameter set {δ_m, δ_id, δ_ex, δ_alb, δ_pose, δ_illu} ∈ R^258.
It should be understood that the network encoding these face coefficients in the face description model is a ResNet (residual network), which comprises convolutional layers, pooling layers, activation layers and fully connected layers.
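For illustration only, such an encoder can be sketched in PyTorch as a ResNet backbone whose final fully connected layer is resized to the 258-dimensional coefficient vector; the class name and the choice of ResNet-50 are assumptions, not specified by the patent:

```python
import torch.nn as nn
import torchvision.models as models

class FaceCoeffNet(nn.Module):
    """Regresses the 258-dimensional face coefficient vector from a frontal photo."""
    def __init__(self, coeff_dim=258):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Replace the classification head with a coefficient-regression head
        backbone.fc = nn.Linear(backbone.fc.in_features, coeff_dim)
        self.backbone = backbone

    def forward(self, img):          # img: (B, 3, H, W)
        return self.backbone(img)    # (B, 258): {δ_m, δ_id, δ_ex, δ_alb, δ_pose, δ_illu}
```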
Wherein the training goal of the face description model is to minimize a loss function L, including a position center loss LcPixel loss LpixelLoss of sensing order LperCharacteristic point loss LlanRegular term loss LnormAnd texture variance loss LvaThe formulas are respectively as follows:
[The formulas for the individual loss terms L_c, L_pixel, L_per, L_lan, L_norm and L_va appear as equation images in the original document.]
L_reg = λ_norm · L_norm + λ_va · L_va
L = λ_c · L_c + λ_pix · L_pixel + λ_per · L_per + λ_lan · L_lan + L_reg
where n is the number of training samples in the training set, each training sample being a frontal face photo; W and H are the width and height of the photo; p_ij denotes the probability that the pixel at position (i, j) in the picture is predicted to be a positive example, i.e. a face; y_ij denotes the label (face or background) of the pixel at position (i, j); and the parameter γ = 2. In addition, I and I* denote the rendered image and the real face image respectively; M_skin and M_mask denote the skin region and the face region in the sample respectively; ⊙ is the Hadamard product; Φ(·) denotes a depth feature extractor; I_k and I_k* denote the k-th face region of images I and I* respectively; m = 68 is the total number of facial feature points; q_{k,j} and q*_{k,j} denote the position of the j-th feature point of the k-th face region in the rendered result and in the real result respectively; ω_j is the weight of the j-th feature point, set to 20 for the mouth and 1 elsewhere; the weight parameters are set to λ_id = 1.0, λ_ex = 0.8 and λ_alb = 0.0017; and R in the texture variance loss denotes a predefined subset of the skin area covered by the diffuse reflection A.
The coefficients of each part in the final loss are:
λnorm=0.0001,λva=0.001,λc=1,λpix=100,λper=0.01,λlan=0.1
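Purely as an illustrative sketch of how these stated weights combine the individual terms (each loss term is assumed computed elsewhere, since several of their formulas survive only as images in the original):

```python
# Stated loss weights from the description above
COEF = dict(norm=0.0001, va=0.001, c=1.0, pix=100.0, per=0.01, lan=0.1)

def total_loss(L_c, L_pixel, L_per, L_lan, L_norm, L_va):
    """L = λ_c·L_c + λ_pix·L_pixel + λ_per·L_per + λ_lan·L_lan + L_reg."""
    L_reg = COEF["norm"] * L_norm + COEF["va"] * L_va
    return (COEF["c"] * L_c + COEF["pix"] * L_pixel
            + COEF["per"] * L_per + COEF["lan"] * L_lan + L_reg)
```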
The geometric parameters of the 3DMM comprise the parameter vectors of identity and expression, and the coordinates of all vertices in the face mesh can be computed from these geometric parameters so as to reconstruct the three-dimensional face.
The 3DMM geometric parameters, pose, diffuse reflection, illumination and other information are components occupying different subsets of the dimensions of the vector output by the face description model.
When performing three-dimensional face reconstruction, the coordinates of the vertices in the face mesh are computed from the identity and expression parameters by the 3DMM method:

S = S̄ + S_id · δ_id + S_ex · δ_ex

where S denotes the three-dimensional coordinates of the 35709 vertices, S̄ is the shape of the average face, S_id and S_ex are the basis vectors of identity and expression respectively, and δ_id and δ_ex are the parameters of identity and expression respectively.
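A minimal sketch of this computation; the interleaved xyz layout of the basis matrices is an assumption:

```python
import torch

def reconstruct_vertices(S_mean, S_id, S_ex, delta_id, delta_ex):
    """S = S_mean + S_id @ delta_id + S_ex @ delta_ex.

    S_mean : (3 * 35709,)        mean face shape (xyz interleaved)
    S_id   : (3 * 35709, n_id)   identity basis vectors
    S_ex   : (3 * 35709, n_ex)   expression basis vectors
    """
    S = S_mean + S_id @ delta_id + S_ex @ delta_ex
    return S.view(-1, 3)          # (35709, 3) vertex coordinates
```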
In the embodiment of the invention, the TextureGAN accepts a frontal face image as input, first encodes it with ResNet, and then decodes it to obtain the corresponding 512 × 512 three-channel (RGB) face texture map. The RGB three-channel texture map contains the texture information of the face; during face rendering, the color of each pixel can be obtained from the texture map by sampling and interpolation.
The NormalGAN accepts a frontal face image as input, first encodes it with ResNet, and then decodes it to obtain the corresponding 1024 × 1024 three-channel (RGB) normal map of the face. The RGB three-channel normal map contains the orientation of the normal vector at each point of the face, with R, G and B corresponding to the X, Y and Z components of the normal vector respectively. During face rendering, the normal corresponding to each pixel can be obtained from the normal map by sampling and interpolation, producing richer concave-convex detail when the illumination is computed.
The ParallaxGAN accepts a frontal face image as input, first encodes it with ResNet, and then decodes it to obtain the corresponding 1024 × 1024 single-channel parallax map of the face. The single-channel parallax map contains the height of each point of the face, from which the offset of the texture lookup can be computed using the camera position. During face rendering, the texture offset corresponding to each pixel can be obtained from the parallax map by sampling and interpolation; the sampling result is then corrected by this offset during texture sampling, further enhancing the concave-convex detail during rendering. Meanwhile, the front face model and the back-of-head model can be stitched and fused with high quality to obtain a complete, high-precision full-head model.
In the embodiment of the present invention, the training method of the generation network comprises the following two steps. First, the texture generator is trained until it converges, giving an intermediate generator, i.e. an intermediate model; the texture map precision is higher once the texture generator has converged. Second, the normal generator and the parallax generator are introduced, the intermediate generator is loaded, and the texture generator, the normal generator and the parallax generator are adjusted synchronously, i.e. the three models are tuned and trained together.
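A sketch of this two-stage schedule; the optimizers, learning rates, epoch counts and loss callables below are illustrative assumptions, not values given by the patent:

```python
import torch

def train_generation_network(texture_gan, normal_gan, parallax_gan,
                             loader, tex_loss, joint_loss,
                             stage1_epochs=50, stage2_epochs=20):
    # Stage 1: train the texture generator alone until it converges.
    opt1 = torch.optim.Adam(texture_gan.parameters(), lr=1e-4)
    for _ in range(stage1_epochs):
        for photo, target in loader:
            loss = tex_loss(texture_gan(photo), target)
            opt1.zero_grad(); loss.backward(); opt1.step()
    torch.save(texture_gan.state_dict(), "intermediate_generator.pt")

    # Stage 2: load the intermediate generator and tune all three generators jointly.
    texture_gan.load_state_dict(torch.load("intermediate_generator.pt"))
    params = (list(texture_gan.parameters()) + list(normal_gan.parameters())
              + list(parallax_gan.parameters()))
    opt2 = torch.optim.Adam(params, lr=1e-5)
    for _ in range(stage2_epochs):
        for photo, target in loader:
            loss = joint_loss(texture_gan(photo), normal_gan(photo),
                              parallax_gan(photo), target)
            opt2.zero_grad(); loss.backward(); opt2.step()
```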
In step 106, the photo is input into the texture generator TextureGAN to obtain the texture map of the face. Specifically, TextureGAN encodes the input photo with ResNet and then upsamples through convolutional, pooling and activation layers to obtain the output texture map. The TextureGAN structure may be: a pretrained ResNet encoding layer; a linear layer (1000-dimensional input, 128 × 64-dimensional output); a two-dimensional batch normalization layer; then a sequence of decoder blocks, each consisting of an upsampling layer (scale factor 2), a convolutional layer (kernel size 3, stride 1, padding 1), a two-dimensional batch normalization layer and a LeakyReLU activation layer (coefficient 0.2), with channel counts 128→128, 128→64, 64→32, 32→32, 32→32, 32→16 and 16→8; and finally an upsampling layer (scale factor 2), a convolutional layer (input channels 8, output channels 3, kernel size 3, stride 1, padding 1) and a hyperbolic tangent activation layer.
In step 106, the photo is likewise input into the normal generator NormalGAN to obtain the normal map of the face. Specifically, NormalGAN encodes the input photo with ResNet and then upsamples through convolutional, pooling and activation layers to obtain the output normal map. The NormalGAN structure may be the same chain as above: a pretrained ResNet encoding layer, a linear layer (1000-dimensional input, 128 × 64-dimensional output), a two-dimensional batch normalization layer, decoder blocks with channel counts 128→128, 128→64, 64→32, 32→32, 32→32, 32→16 and 16→8 (each block: upsampling with scale factor 2, convolution with kernel size 3, stride 1, padding 1, two-dimensional batch normalization, LeakyReLU with coefficient 0.2), followed by a final upsampling layer, a convolutional layer (input channels 8, output channels 3, kernel size 3, stride 1, padding 1) and a hyperbolic tangent activation layer.
In step 106, the photo is also input into the parallax generator ParallaxGAN to obtain the parallax map of the face. Specifically, ParallaxGAN encodes the input photo with ResNet and then upsamples through convolutional, pooling and activation layers to obtain the output parallax map. The ParallaxGAN structure may be the same chain again, except that the final convolutional layer has a single output channel (input channels 8, output channel 1, kernel size 3, stride 1, padding 1) before the hyperbolic tangent activation layer.
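A sketch of this shared decoder shape in PyTorch (an illustration of the stated layer chain, not the patent's code; reshaping the linear layer's output into a 128-channel feature map is an assumption):

```python
import torch.nn as nn

def up_block(c_in, c_out):
    """One decoder stage: upsample(x2) + conv(3x3, stride 1, padding 1) + BN + LeakyReLU(0.2)."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2),
    )

class MapDecoder(nn.Module):
    """Decoder shape shared by TextureGAN / NormalGAN / ParallaxGAN:
    out_channels is 3 for the texture and normal maps, 1 for the parallax map."""
    def __init__(self, out_channels=3):
        super().__init__()
        chans = [128, 128, 64, 32, 32, 32, 16, 8]
        self.body = nn.Sequential(*[up_block(chans[i], chans[i + 1])
                                    for i in range(len(chans) - 1)])
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv2d(8, out_channels, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),  # the final "hyperbolic activation layer"
        )

    def forward(self, code):  # code: (B, 128, h, w), reshaped from the linear layer
        return self.head(self.body(code))
```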
The differentiable parallax renderer in step 108 is a high-precision differentiable renderer that accepts the texture map, normal map, parallax map and mesh model of the face as input and outputs a rendered image of the high-precision reconstructed face. All arithmetic, sampling and interpolation operations in the high-precision differentiable renderer are differentiable, which guarantees that the model can be trained; the renderer can run on a GPU, accelerating the training and rendering process.
The high-precision differentiable renderer needs to obtain the information of each pixel after rendering, which involves: sampling the normal map with interpolation to obtain the normal vector corresponding to each pixel, for the subsequent light reflection computation; sampling the parallax map with interpolation to obtain the height corresponding to each pixel; computing the tangent space of the patch corresponding to the pixel, so that the camera position and other coordinates can be converted into tangent-space coordinates; computing the texture coordinate offset from the per-pixel height obtained above and the tangent-space position of the camera; and finally sampling the texture map with interpolation, introducing the offset produced by the parallax map during sampling. The color of each pixel is then computed from the sampled texture combined with the illumination model, giving the final rendering result.
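A minimal differentiable sketch of these per-pixel steps using plain PyTorch ops; the rasterization that yields per-pixel UVs and tangent-space view directions is assumed to happen upstream (e.g. via PyTorch3D), the height scale is an assumed parameter, and only the diffuse term of a Phong-style model is shown:

```python
import torch
import torch.nn.functional as F

def sample(tex, uv):
    # tex: (1, C, H, W); uv in [0, 1] -> grid_sample expects [-1, 1]
    return F.grid_sample(tex, uv * 2.0 - 1.0, mode="bilinear", align_corners=True)

def shade(uv, view_ts, light_dir, tex_map, normal_map, disp_map, height_scale=0.05):
    """uv: (1, H, W, 2) interpolated texture coords; view_ts: (1, H, W, 3) unit
    tangent-space view directions; light_dir: (3,) tangent-space light direction."""
    height = sample(disp_map, uv)                              # per-pixel height
    # Parallax mapping: shift the texture lookup along the view direction
    offset = view_ts[..., :2] / view_ts[..., 2:3].clamp(min=1e-4)
    uv_shifted = uv - offset * height.permute(0, 2, 3, 1) * height_scale
    albedo = sample(tex_map, uv_shifted)                       # offset-corrected texture sample
    normal = sample(normal_map, uv_shifted) * 2.0 - 1.0        # decode RGB -> XYZ
    normal = F.normalize(normal, dim=1)
    # Diffuse term of a Phong-style illumination model (specular omitted)
    n_dot_l = (normal * light_dir.view(1, 3, 1, 1)).sum(1, keepdim=True).clamp(min=0.0)
    return albedo * n_dot_l                                    # (1, 3, H, W) pixel colors
```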
The illumination model may be, for example, the Phong model.
All arithmetic, sampling and interpolation operations in the high-precision differentiable renderer being differentiable is what guarantees that the model can be trained. All operations of the differentiable renderer, such as the offset computation, are implemented with matrix operations, so the renderer runs efficiently on a GPU and accelerates the training and rendering process.
TextureGAN, NormalGAN and ParallaxGAN may be trained by computing a loss on the output of the high-precision differentiable renderer and backpropagating, with the pixel loss L_pixel as the optimization target:

[The formula for L_pixel appears as an equation image in the original document.]

I = R(I_p, I_n, I_t, M_e)

where n is the number of training samples in the training set, each training sample being a frontal face photo; W and H are the width and height of the photo respectively; y_ij denotes the label (face or background) of the pixel at position (i, j), and the parameter γ = 2. In addition, I and I* denote the rendered image and the real face image respectively; M denotes the face region in the sample; ⊙ is the Hadamard product; R denotes the high-precision differentiable parallax renderer; and I_p, I_n, I_t and M_e denote the parallax map, normal map, texture map and mesh model of the face respectively.
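Since the exact L_pixel formula survives only as an image, the sketch below shows one common masked photometric form consistent with the symbols above; it is an assumption, not the patent's formula:

```python
import torch

def pixel_loss(I_render, I_real, M):
    """Masked photometric loss. I_render, I_real: (B, 3, H, W); M: (B, 1, H, W) face mask."""
    diff = M * (I_render - I_real)             # Hadamard product with the face region mask
    return diff.abs().sum() / M.sum().clamp(min=1.0)
```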
In order to accomplish technical goals such as digital human generation, the complete head mesh must be reconstructable. The half-face model and the back-of-head model differ in parameters, precision and the like, so generating a full-head model by combining the front face model with the back-of-head mesh is an important research topic.
In the embodiment of the present invention, after the reconstructed front face model is obtained, the full-head model may be generated by combining it with the back-of-head mesh, as shown in fig. 2, through the following steps:
step 202, selecting five frames of second color depth images with different angles from the first color depth images with multiple angles of the current face.
In particular, the first color depth images may be color depth photographs obtained with a RealSense camera. RealSense is a three-dimensional depth camera that can capture RGB images together with the depth of parts of those images. There should be at least 5 frames of first color depth images, and each frame needs to contain both an RGB picture of the face and the corresponding depth map. When selecting the second color depth images, it suffices to find, for each target angle, the one image closest to that angle; exact matching is not required.
Step 204, performing point processing on the second color depth images to obtain a frontal face point cloud.
Specifically, the key points and the ear point clouds can be obtained at the same time as the frontal face point cloud. The ear point cloud can be divided into a left-ear point cloud and a right-ear point cloud.
And step 206, non-rigidly registering the front face model to the front face point cloud to obtain a half face model.
And step 208, non-rigidly registering the template full-head model to the half-face model.
Specifically, the template full-head model is a mesh model prepared in advance.
Step 210, splicing the half-face model and the back-of-head part of the template full-head model to obtain a complete full-head model.
According to the technical scheme of the embodiment of the invention, point clouds are obtained from the color depth images, and a high-precision full-head model can be generated by registering and splicing them. Through point cloud registration, the half-face model reconstructed from a single picture becomes more refined, and registering the template full-head model makes its shape tend toward the real head. Furthermore, the half-face model and the back-of-head part of the full-head model can be spliced by interpolation to obtain a complete, high-precision full-head model.
In step 202, the first color depth images are obtained and preprocessed, and second color depth images at the following five different angles are selected: frontal, 15° left, 30° left, 15° right and 30° right.
Specifically, the direction and angle of the head turn corresponding to the current face may be calculated from the key points, and the five second color depth images at different angles may be selected according to that direction and angle. The turn angle and direction can be calculated from the coordinates and relative positions of the 68 face key points output by the face key point detection module.
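A small sketch of this selection step; the yaw estimates are assumed to come from the 68-landmark pose computation described above, and the helper itself is illustrative:

```python
import numpy as np

def pick_views(frames, yaws, targets=(0.0, -15.0, -30.0, 15.0, 30.0)):
    """Pick, for each target head-turn angle (frontal, 15/30 degrees left and
    right), the frame whose estimated yaw is closest; exact matches not required."""
    yaws = np.asarray(yaws, dtype=float)
    return [frames[int(np.argmin(np.abs(yaws - t)))] for t in targets]
```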
In step 206, when the front face model is non-rigidly registered to the frontal face point cloud, the front face model is matched against the frontal face point cloud, so that the front face model deforms its mesh to fit the frontal face point cloud.
The face key point detection module ensures that the obtained point cloud covers exactly the face region, so the point cloud of the non-face (background) region is discarded. An ear detection module is needed to obtain the point clouds of the left and right ears, ensuring that the obtained point clouds cover exactly the left ear and the right ear respectively while the point clouds of the remaining regions are discarded. The algorithm used in the non-rigid registration step may be the NICP algorithm, a non-rigid variant of the ICP (Iterative Closest Point) algorithm.
In step 208, when the template full-head model is non-rigidly registered to the half-face model, the template full-head model is matched against the half-face model, so that the template full-head model deforms to fit the half-face model. The algorithm used in this non-rigid registration step may likewise be the NICP algorithm.
In step 210, the stitching process is implemented by interpolation and patch filling. Specifically, the front face model and the back-of-head part of the template full-head model are position-matched in space; new vertices between the front face model and the back-of-head part of the template full-head model are obtained by interpolation; and new triangular patches are added to the full-head model based on the new vertices.
Specifically, the face edge vertices and back-of-head edge vertices may be paired after position matching, and a first interpolation method may be used to interpolate using the spatial coordinates and normal vectors of the face edge vertices and the back-of-head edge vertices.
The first interpolation method may be any one of the following: linear interpolation, quadratic interpolation, and linear interpolation plus a trigonometric offset.
Among these, the most common is linear interpolation plus a trigonometric-function offset, whose formula is:

[The interpolation formulas appear as equation images in the original document.]

where P_0 and P_{M+1} are the coordinates of a matched pair of points on the face edge and the back-of-head edge respectively, N_0 and N_{M+1} are the normal vectors of the two matched points, i is the index of an interpolation point, M is the number of interpolation points, P_i* is the i-th interpolated point, and λ is a hyperparameter controlling the interpolation offset, defaulting to −0.001.
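Because the exact formula survives only as an image, the sketch below shows one plausible form of linear interpolation plus a trigonometric, normal-direction offset built from the stated ingredients (P_0, P_{M+1}, N_0, N_{M+1}, M, λ); it is an assumption, not the patent's formula:

```python
import numpy as np

def interpolate_seam(P0, P1, N0, N1, M, lam=-0.001):
    """Interpolate M new points between one matched pair of edge vertices:
    P0/N0 on the face edge, P1/N1 (= P_{M+1}/N_{M+1}) on the back-of-head edge."""
    pts = []
    for i in range(1, M + 1):
        t = i / (M + 1.0)
        base = (1.0 - t) * P0 + t * P1                        # linear interpolation
        normal = (1.0 - t) * N0 + t * N1                      # blended normal direction
        pts.append(base + lam * np.sin(np.pi * t) * normal)   # trigonometric offset
    return np.stack(pts)                                      # (M, 3) new seam vertices
```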
When the front face model and the back-of-head part of the template full-head model are position-matched in space, a semi-fixed topology can be used to ensure that the connection relationship between the face and the back of the head remains unchanged.
In the embodiment of the invention, the geometric parameters, texture map, normal map and parallax map of the face are generated automatically from one face picture, giving a high-precision three-dimensional face reconstruction result. Color depth images are obtained with a depth camera and a face point cloud is computed from them; the high-precision face reconstruction result is registered to the face point cloud to obtain a high-precision half-face model; meanwhile, the template full-head model is non-rigidly registered to the high-precision half-face model; and finally the high-precision half-face model is spliced to the back-of-head part of the full-head model to obtain a high-precision full-head model.
As shown in fig. 3, a frontal face photo 301 may be input into an encoder 302 of the face description model to obtain face description information 303 such as the orientation, lighting, identity, expression and diffuse reflection of the face; a decoder 304 of the face description model decodes the face description information 303 and outputs it to a three-dimensional mesh 305, giving the face mesh model. The frontal face photo 301 is input to a texture map generation network 306, a normal map generation network 307 and a parallax map generation network 308 respectively to obtain the texture map, normal map and parallax map, and the face mesh model, texture map, normal map and parallax map are input into the differentiable parallax renderer 309 to obtain the front face model 310.
As shown in fig. 4, a front face point cloud 403 may be obtained from the N color depth images 401. In step 411, a first registration is performed to register the front face model 402 to the front face point cloud 403 to obtain a half-face model 404. In step 412, a second registration is performed to register the template full-head model 405 onto the half-face model 404. In step 413, the back-of-head portion model 406 and the half-face model 404 are merged to obtain the full-head model 407.
The face model reconstruction method provided by the invention generates a face mesh model and high-precision texture, parallax and normal maps from a single input face photo, and inputs them into the differentiable parallax renderer to obtain a reconstructed front face model, thereby guaranteeing the precision of the geometric parameters and texture of the face and improving the reconstruction precision of the face model.
The following describes the face model reconstruction device provided by the present invention, and the face model reconstruction device described below and the face model reconstruction method described above may be referred to each other correspondingly.
As shown in fig. 5, an apparatus for reconstructing a face model according to an embodiment of the present invention includes:
the face description unit 502 is configured to input the current face front photo into a preset face description model to obtain face description information, where the face description information includes geometric parameters of the current face, and the face description model is a deep neural network model.
A mesh model obtaining unit 504, configured to obtain a face mesh model of the current face according to the geometric parameters.
And a map generating unit 506, configured to input the frontal photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face.
A face model reconstruction unit 508, configured to input the face mesh model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model.
In an embodiment of the present invention, the apparatus further includes: a selecting unit, configured to select five frames of second color depth images at different angles from the first color depth images of the current face at multiple angles; a processing unit, configured to perform point processing on the second color depth images to obtain a frontal face point cloud; a first registration unit, configured to non-rigidly register the front face model to the frontal face point cloud to obtain a half-face model; a second registration unit, configured to non-rigidly register the template full-head model to the half-face model; and a splicing unit, configured to splice the half-face model and the back-of-head part of the template full-head model to obtain a complete full-head model.
In an embodiment of the present invention, the generating network comprises a texture generator, a normal generator and a disparity generator, the apparatus further comprises a generating unit configured to: training the texture generator until the texture generator converges to obtain an intermediate generator; introducing the normal generator and the parallax generator, loading the intermediate generator, and synchronously adjusting the texture generator, the normal generator and the parallax generator.
In an embodiment of the present invention, the apparatus further includes a training unit, configured to: acquire a training data set of historical face photos; and sequentially input the historical frontal face photos in the training data set into an initial face description model, adjusting parameters of the face description model according to the resulting output data to obtain the final face description model.
In an embodiment of the present invention, the mesh model obtaining unit is further configured to: and acquiring the coordinates of all vertexes in the face mesh through the geometric parameters.
In an embodiment of the present invention, the selecting unit is further configured to: acquiring the first color depth image; preprocessing the first color depth image, and selecting a second color depth image with the following five different angles: face, 15 ° left, 30 ° left, 15 ° right, and 30 ° right.
In an embodiment of the present invention, the processing unit is further configured to: performing point processing on the second color depth image to obtain a frontal face point cloud and key points; the selecting unit is further configured to: calculating the direction and angle of the turning of the head corresponding to the current face according to the key points; and selecting the second color depth images at the five different angles according to the direction and the angle of the turning direction of the head of the person.
In an embodiment of the invention, the first registration unit is further configured to: matching the front face model by using the front face point cloud so as to enable the front face point cloud to carry out deformation fitting on a grid according to the front face model; the second registration unit is further configured to: and matching the template full-head model by using the half face model, so that the half face model is deformed and fitted according to the template full-head model.
In an embodiment of the present invention, the stitching unit is further configured to: spatially align the frontal face model with the back-of-head portion of the template full-head model; obtain new vertices between the frontal face model and the back-of-head portion of the template full-head model through interpolation; and add new triangular patches to the full-head model based on the new vertices.
In an embodiment of the present invention, the stitching unit is further configured to: pair the face-edge vertices with the head-rear-edge vertices after the spatial alignment, and interpolate between each pair using a first interpolation method based on the spatial coordinates and normal vectors of the face-edge and head-rear-edge vertices.
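A minimal sketch of bridging matched boundary vertices with linearly interpolated in-between vertices, corresponding to the simplest of the interpolation choices listed in the claims; the pairing and the number of samples are illustrative assumptions, and the normal vectors are ignored here for brevity.

```python
import numpy as np

def bridge_vertices(face_edge, head_edge, n_samples=3):
    """face_edge, head_edge: (K, 3) matched boundary vertices.
    Returns (K, n_samples, 3) new vertices along each connecting segment;
    consecutive rows can then be triangulated into new patches."""
    t = np.linspace(0.0, 1.0, n_samples + 2)[1:-1]        # interior samples only
    return ((1.0 - t)[None, :, None] * face_edge[:, None, :]
            + t[None, :, None] * head_edge[:, None, :])

new_verts = bridge_vertices(np.zeros((10, 3)), np.ones((10, 3)))
```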
Since each functional module of the face model reconstruction apparatus according to the exemplary embodiment of the present invention corresponds to a step of the exemplary embodiment of the face model reconstruction method, details not disclosed in the apparatus embodiment can be found in the embodiments of the face model reconstruction method of the present invention.
The face model reconstruction apparatus provided by the present invention generates a face mesh model together with high-precision texture, parallax, and normal maps from a single input face photo, and inputs them into the differentiable parallax renderer to obtain a reconstructed frontal face model, thereby preserving the accuracy of both the geometric parameters and the texture of the face and improving the reconstruction precision of the face model.
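The overall flow can be summarised in a short orchestration sketch; the three callables below are stubs standing in for the face description model, the generation network, and the differentiable parallax renderer, none of which is a real published API.

```python
# Stubs for illustration only; each stands in for a trained component.
def face_description_model(photo):
    return [0.0] * 144                       # assumed geometric parameters

def generation_network(photo):
    return "texture", "normal", "parallax"   # the three generated maps

def parallax_renderer(mesh, tex, nrm, par):
    return "reconstructed frontal face model"

def reconstruct(photo):
    geo_params = face_description_model(photo)      # step 1: describe the face
    mesh = ("face mesh", geo_params)                # step 2: mesh from parameters
    tex, nrm, par = generation_network(photo)       # step 3: generate the maps
    return parallax_renderer(mesh, tex, nrm, par)   # step 4: render the result
```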
Fig. 6 illustrates a schematic physical structure diagram of an electronic device. As shown in Fig. 6, the electronic device may include: a processor 610, a communications interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communications interface 620, and the memory 630 communicate with one another via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform the face model reconstruction method, the method comprising: inputting a frontal photo of a current face into a preset face description model to obtain face description information, wherein the face description information comprises geometric parameters of the current face, and the face description model is a deep neural network model; acquiring a face mesh model of the current face according to the geometric parameters; inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face; and inputting the face mesh model, the texture map, the normal map, and the parallax map into a differentiable parallax renderer to obtain a reconstructed frontal face model.
In addition, the logic instructions in the memory 630 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the face model reconstruction method provided by the methods above, the method comprising: inputting a frontal photo of a current face into a preset face description model to obtain face description information, wherein the face description information comprises geometric parameters of the current face, and the face description model is a deep neural network model; acquiring a face mesh model of the current face according to the geometric parameters; inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face; and inputting the face mesh model, the texture map, the normal map, and the parallax map into a differentiable parallax renderer to obtain a reconstructed frontal face model.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the above face model reconstruction method, the method comprising: inputting a frontal photo of a current face into a preset face description model to obtain face description information, wherein the face description information comprises geometric parameters of the current face, and the face description model is a deep neural network model; acquiring a face mesh model of the current face according to the geometric parameters; inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face; and inputting the face mesh model, the texture map, the normal map, and the parallax map into a differentiable parallax renderer to obtain a reconstructed frontal face model.
The apparatus embodiments described above are merely illustrative; units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (14)

1. A face model reconstruction method is characterized by comprising the following steps:
inputting a frontal photo of a current face into a preset face description model to obtain face description information, wherein the face description information comprises geometric parameters of the current face, and the face description model is a deep neural network model;
acquiring a face mesh model of the current face according to the geometric parameters;
inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face;
and inputting the face mesh model, the texture map, the normal map, and the parallax map into a differentiable parallax renderer to obtain a reconstructed frontal face model.
2. The method of claim 1, wherein after obtaining the reconstructed frontal face model, the method further comprises:
selecting five frames of second color depth images at different angles from first color depth images of the current face captured at multiple angles;
performing point-cloud processing on the second color depth images to obtain a frontal face point cloud;
non-rigidly registering the frontal face model to the frontal face point cloud to obtain a half-face model;
non-rigidly registering a template full-head model to the half-face model;
and stitching the half-face model with the back-of-head portion of the template full-head model to obtain a complete full-head model.
3. The method of claim 1, wherein the generation network comprises a texture generator, a normal generator, and a parallax generator, and wherein the training method of the generation network comprises:
training the texture generator until it converges to obtain an intermediate generator;
and introducing the normal generator and the parallax generator, loading the intermediate generator, and jointly adjusting the texture generator, the normal generator, and the parallax generator.
4. The method of claim 1, wherein the training method of the face description model comprises:
acquiring a training data set of historical frontal face photos;
and sequentially inputting the historical frontal face photos in the training data set into an initial face description model, and adjusting the parameters of the face description model according to the resulting output data to obtain the final face description model.
5. The method of claim 1, wherein obtaining the face mesh model of the current face according to the geometric parameters comprises:
acquiring the coordinates of all vertices in the face mesh from the geometric parameters.
6. The method of claim 2, wherein selecting the five frames of second color depth images at different angles from the first color depth images of the current face captured at multiple angles comprises:
acquiring the first color depth images;
and preprocessing the first color depth images, and selecting second color depth images at the following five angles:
frontal, 15° to the left, 30° to the left, 15° to the right, and 30° to the right.
7. The method of claim 6, wherein performing the point-cloud processing on the second color depth images to obtain the frontal face point cloud comprises:
performing point-cloud processing on the second color depth images to obtain the frontal face point cloud and key points;
and wherein selecting the second color depth images at the five different angles comprises: calculating the direction and angle of the head rotation corresponding to the current face from the key points;
and selecting the second color depth images at the five different angles according to the direction and angle of the head rotation.
8. The method of claim 7, wherein non-rigidly registering the frontal face model to the frontal face point cloud to obtain the half-face model comprises:
registering the frontal face model against the frontal face point cloud, so that the mesh of the frontal face model deforms to fit the point cloud;
and wherein non-rigidly registering the template full-head model to the half-face model comprises:
registering the template full-head model against the half-face model, so that the template full-head model deforms to fit the half-face model.
9. The method of claim 2, wherein stitching the half-face model with the back-of-head portion of the template full-head model to obtain the complete full-head model comprises:
spatially aligning the frontal face model with the back-of-head portion of the template full-head model;
and obtaining new vertices between the frontal face model and the back-of-head portion of the template full-head model through interpolation, and adding new triangular patches to the full-head model based on the new vertices.
10. The method of claim 9, wherein obtaining the new vertices between the frontal face model and the back-of-head portion of the template full-head model through interpolation comprises: pairing the face-edge vertices with the head-rear-edge vertices after the spatial alignment, and interpolating between them using a first interpolation method based on the spatial coordinates and normal vectors of the face-edge and head-rear-edge vertices.
11. The method of claim 10, wherein the first interpolation method comprises any one of the following interpolation methods: linear interpolation, quadratic interpolation, and linear interpolation plus a trigonometric offset.
12. A face model reconstruction apparatus, comprising:
a face description unit, configured to input a frontal photo of a current face into a preset face description model to obtain face description information, wherein the face description information comprises geometric parameters of the current face, and the face description model is a deep neural network model;
a mesh model obtaining unit, configured to obtain a face mesh model of the current face according to the geometric parameters;
a map generating unit, configured to input the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face;
and a face model reconstruction unit, configured to input the face mesh model, the texture map, the normal map, and the parallax map into a differentiable parallax renderer to obtain a reconstructed frontal face model.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 11 are implemented when the processor executes the program.
14. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.
CN202111627286.2A 2021-12-28 2021-12-28 Face model reconstruction method, device, equipment and storage medium Pending CN114429518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111627286.2A CN114429518A (en) 2021-12-28 2021-12-28 Face model reconstruction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111627286.2A CN114429518A (en) 2021-12-28 2021-12-28 Face model reconstruction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114429518A (en) 2022-05-03

Family

ID=81312062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111627286.2A Pending CN114429518A (en) 2021-12-28 2021-12-28 Face model reconstruction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114429518A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842121A (en) * 2022-06-30 2022-08-02 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping
CN114842121B (en) * 2022-06-30 2022-09-09 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping
CN116563432A (en) * 2023-05-15 2023-08-08 摩尔线程智能科技(北京)有限责任公司 Three-dimensional digital person generating method and device, electronic equipment and storage medium
CN116563432B (en) * 2023-05-15 2024-02-06 摩尔线程智能科技(北京)有限责任公司 Three-dimensional digital person generating method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2020192568A1 (en) Facial image generation method and apparatus, device and storage medium
CN109859296B (en) Training method of SMPL parameter prediction model, server and storage medium
US11302064B2 (en) Method and apparatus for reconstructing three-dimensional model of human body, and storage medium
US9679192B2 (en) 3-dimensional portrait reconstruction from a single photo
US10540817B2 (en) System and method for creating a full head 3D morphable model
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
CN110136243B (en) Three-dimensional face reconstruction method, system, device and storage medium thereof
CN111325851B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN105144247B (en) The generation of the three dimensional representation of user
CN111194550B (en) Processing 3D video content
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
US10839586B1 (en) Single image-based real-time body animation
US11222466B1 (en) Three-dimensional geometry-based models for changing facial identities in video frames and images
EP4293567A1 (en) Three-dimensional face reconstruction method and apparatus, device, and storage medium
CN114972632A (en) Image processing method and device based on nerve radiation field
US11257276B2 (en) Appearance synthesis of digital faces
CN114429518A (en) Face model reconstruction method, device, equipment and storage medium
EP3991140A1 (en) Portrait editing and synthesis
CN114067057A (en) Human body reconstruction method, model and device based on attention mechanism
CN112102477A (en) Three-dimensional model reconstruction method and device, computer equipment and storage medium
JP2024004444A (en) Three-dimensional face reconstruction model training, three-dimensional face image generation method, and device
CN113111861A (en) Face texture feature extraction method, 3D face reconstruction method, device and storage medium
CN115239861A (en) Face data enhancement method and device, computer equipment and storage medium
CN111028354A (en) Image sequence-based model deformation human face three-dimensional reconstruction scheme
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination