CN114429518B - Face model reconstruction method, device, equipment and storage medium
Classifications
- G06T17/00 (G - Physics; G06 - Computing, calculating or counting; G06T - Image data processing or generation, in general): Three dimensional [3D] modelling, e.g. data description of 3D objects
Abstract
The invention provides a face model reconstruction method, a device, equipment and a storage medium. The method comprises the following steps: inputting the front photo of the current face into a preset face description model to obtain description information of the face, wherein the description information comprises geometric parameters of the current face, and the face description model is a deep neural network model; acquiring a face grid model of the current face according to the geometric parameters; inputting the front photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face; and inputting the face grid model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model. The technical scheme of the invention can improve the accuracy of face model reconstruction.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for reconstructing a face model, an electronic device, and a non-transitory computer readable storage medium.
Background
The three-dimensional face reconstruction refers to reconstructing a three-dimensional model of a face based on one or more images containing the face. The three-dimensional model is typically a mesh model of the face. The three-dimensional face reconstruction technology is widely applied, for example, in medicine, a three-dimensional model can be obtained based on a patient photo for diagnosis and treatment, and in social media, a virtual digital person can be generated based on the user photo.
In the real world, the details of a human face are very rich, including the concavity and convexity of the face, variations in skin color, fine details around the edges of the eyes, nose and mouth, the positions and sizes of moles, and so on, which together characterize the face. Existing three-dimensional face reconstruction methods struggle to handle all of these aspects at the same time, so the reconstructed three-dimensional face exhibits problems such as misplaced or mismatched facial features, a deviated grid model, and rough, unsmooth textures.
Because human beings are very sensitive to the facial details of real people, they easily perceive the unrealism of low-precision three-dimensional faces, which greatly limits the application of three-dimensional face reconstruction in social media and other scenarios. In scenarios with extremely high requirements on face model accuracy, for example in the field of medical research, low-precision faces may be unusable.
Disclosure of Invention
The invention provides a face model reconstruction method, a device, electronic equipment and a non-transitory computer readable storage medium, which are used for solving the problem of low face model reconstruction precision in the prior art and obtaining a face model with higher precision.
The invention provides a face model reconstruction method, which comprises the following steps: inputting the front photo of the current face into a preset face description model to obtain description information of the face, wherein the description information comprises geometric parameters of the current face, and the face description model is a deep neural network model; acquiring a face grid model of the current face according to the geometric parameters; inputting the front photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face; and inputting the face grid model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model.
According to the face model reconstruction method provided by the invention, after the reconstructed front face model is obtained, the method further comprises the following steps: selecting five frames of second color depth images with different angles from the first color depth images with multiple angles of the current face; performing point processing on the second color depth image to obtain a front face point cloud; non-rigidly registering the frontal face model to the frontal face point cloud to obtain a half-face model; non-rigidly registering a template full-head model to the half-face model; and splicing the half face model with the head rear part of the template full-head model to obtain a complete full-head model.
According to the face model reconstruction method provided by the invention, the generation network comprises a texture generator, a normal generator and a parallax generator, and the training method of the generation network comprises the following steps: training the texture generator until the texture generator converges to obtain an intermediate generator; and introducing the normal generator and the parallax generator, loading the intermediate generator, and synchronously adjusting the texture generator, the normal generator and the parallax generator.
According to the face model reconstruction method provided by the invention, the training method of the face description model comprises the following steps: acquiring a training data set of historical face photos; and sequentially inputting the historical frontal face photos in the training data set into an initial face description model, and adjusting parameters of the face description model according to the obtained output data to obtain a final face description model.
According to the face model reconstruction method provided by the invention, acquiring the face grid model of the current face according to the geometric parameters comprises: acquiring the coordinates of all vertexes in the face grid from the geometric parameters.
According to the face model reconstruction method provided by the invention, selecting five frames of second color depth images with different angles from the first color depth images with multiple angles of the current face comprises: acquiring the first color depth images; preprocessing the first color depth images, and selecting second color depth images at the following five different angles: frontal face, 15° left, 30° left, 15° right, and 30° right.
According to the face model reconstruction method provided by the invention, performing point processing on the second color depth image to obtain the front face point cloud comprises: performing point processing on the second color depth image to obtain the front face point cloud and key points; and the selecting of the second color depth images of the five different angles comprises: calculating the turning direction and turning angle of the head of the person corresponding to the current face according to the key points; and selecting the second color depth images of the five different angles according to the turning direction and turning angle of the head.
According to the face model reconstruction method provided by the invention, non-rigidly registering the front face model to the front face point cloud to obtain a half face model comprises: matching the front face model to the front face point cloud, so that the front face model deforms its grid to fit the point cloud; and the non-rigid registration of the template full-head model to the half face model comprises: matching the template full-head model to the half face model, so that the template full-head model deforms to fit the half face model.
According to the face model reconstruction method provided by the invention, splicing the half face model with the head rear part of the template full-head model to obtain the complete full-head model comprises: performing position matching on the head rear part of the front face model and the template full-head model in space; and obtaining new vertexes from the front face model and the head rear part of the template full-head model through interpolation calculation, and adding new triangular patches into the full-head model based on the new vertexes.
According to the face model reconstruction method provided by the invention, obtaining the new vertexes from the front face model and the head rear part of the template full-head model through interpolation calculation comprises: matching face edge vertexes with head rear edge vertexes according to the position matching, and interpolating with a first interpolation method using the spatial coordinates and normal vectors of the face edge vertexes and the head rear edge vertexes.
According to the face model reconstruction method provided by the invention, the first interpolation method comprises any one of the following interpolation methods: linear interpolation, quadratic interpolation, and linear interpolation plus a trigonometric offset.
The invention provides a face model reconstruction device, which comprises: the face description unit is used for inputting the front photo of the current face into a preset face description model to obtain description information of the face, wherein the description information comprises geometric parameters of the current face, and the face description model is a deep neural network model; the grid model acquisition unit is used for acquiring a face grid model of the current face according to the geometric parameters; the map generation unit is used for inputting the front photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face; and the face model reconstruction unit is used for inputting the face grid model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model.
According to the invention, the device for reconstructing the face model further comprises: a selecting unit, configured to select five frames of second color depth images with different angles from the first color depth images with multiple angles of the current face; the processing unit is used for carrying out point processing on the second color depth image to obtain a front face point cloud; the first registration unit is used for non-rigidly registering the front face model to the front face point cloud to obtain a half face model; a second registration unit for non-rigidly registering the template full-head model to the half-face model; and the splicing unit is used for splicing the half face model with the head rear part of the template full-head model to obtain a complete full-head model.
According to the face model reconstruction device provided by the invention, the generation network comprises a texture generator, a normal generator and a parallax generator, and the device further comprises a generation unit for: training the texture generator until the texture generator converges to obtain an intermediate generator; and introducing the normal generator and the parallax generator, loading the intermediate generator, and synchronously adjusting the texture generator, the normal generator and the parallax generator.
The invention provides a face model reconstruction device, which further comprises a training unit for: acquiring a training data set of historical face photos; and sequentially inputting the historical frontal face photos in the training data set into an initial face description model, and adjusting parameters of the face description model according to the obtained output data to obtain a final face description model.
According to the face model reconstruction device provided by the invention, the grid model acquisition unit is further used for: and acquiring coordinates of all vertexes in the face grid through the geometric parameters.
According to the face model reconstruction device provided by the invention, the selecting unit is further used for: acquiring the first color depth images; preprocessing the first color depth images, and selecting second color depth images at the following five different angles: frontal face, 15° left, 30° left, 15° right, and 30° right.
According to the face model reconstruction device provided by the invention, the processing unit is further used for: performing point processing on the second color depth image to obtain a front face point cloud and key points; and the selecting unit is further used for: calculating the turning direction and turning angle of the head of the person corresponding to the current face according to the key points; and selecting the second color depth images of the five different angles according to the turning direction and turning angle of the head.
According to the face model reconstruction device provided by the invention, the first registration unit is further used for: matching the front face model to the front face point cloud, so that the front face model deforms its grid to fit the point cloud; and the second registration unit is further used for: matching the template full-head model to the half face model, so that the template full-head model deforms to fit the half face model.
According to the face model reconstruction device provided by the invention, the splicing unit is further used for: performing position matching on the head rear part of the front face model and the template full-head model in space; and obtaining new vertexes in the front face model and the head rear part of the template full-head model through interpolation calculation, and adding new triangular patches into the full-head model based on the new vertexes.
According to the face model reconstruction device provided by the invention, the splicing unit is further used for: matching face edge vertexes with head rear edge vertexes according to the position matching, and interpolating with a first interpolation method using the spatial coordinates and normal vectors of the face edge vertexes and the head rear edge vertexes.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor implements the steps of any of the face model reconstruction methods described above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the face model reconstruction method as described in any one of the above.
According to the face model reconstruction method, the device, the electronic equipment and the non-transitory computer readable storage medium, a face grid model and high-precision texture, normal and parallax maps are generated from a single input face photo and fed into a differentiable parallax renderer to obtain a reconstructed front face model, so that the precision of both the geometric parameters and the textures of the face is ensured, and the face model reconstruction precision is improved.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a face model reconstruction method provided by the invention;
FIG. 2 is a schematic flow chart of generating a full-head face model according to the present invention;
FIG. 3 is a second flow chart of the face model reconstruction method according to the present invention;
FIG. 4 is a second flow chart of generating a full-head face model according to the present invention;
Fig. 5 is a schematic structural diagram of a face model reconstruction device provided by the invention;
fig. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the invention to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
Reconstruction methods for three-dimensional face models in the related art generally focus on improving either geometry or texture, and rarely consider all aspects of three-dimensional face reconstruction comprehensively, so they cannot produce a three-dimensional face model of higher precision.
In order to solve the problems, the embodiment of the invention provides a technical scheme for reconstructing a face model.
The following describes example embodiments of the invention in detail with reference to the accompanying drawings.
Fig. 1 is a flowchart of a face model reconstruction method according to an embodiment of the present invention. The method provided by the embodiment of the invention can be executed by any electronic device with computer processing capability, such as a terminal or a server. As shown in fig. 1, the face model reconstruction method includes:
step 102, inputting the front photo of the current face into a preset face description model to obtain description information of the face, wherein the description information comprises geometric parameters of the current face, and the face description model is a deep neural network model.
Specifically, the face description information includes the geometric parameters, position, and orientation of the face. The geometric parameters of the face are those of a 3DMM (3D Morphable Model) of the face. The current face front photo is a photo containing a single face. The face description model in the embodiment of the invention can be a weakly supervised three-dimensional face reconstruction model.
And 104, acquiring a face grid model of the current face according to the geometric parameters.
Specifically, the coordinates of the vertexes in the face grid can be obtained from the geometric parameters of the face, so that the face grid model of the current face is obtained.
And 106, inputting the front photo of the current face into a preset generation network to generate a texture map, a normal map and a parallax map of the current face.
Specifically, the generation network includes a texture generator (TextureGAN), a normal generator (NormalGAN), and a parallax generator (ParallaxGAN), which generate the texture map, the normal map, and the parallax map, respectively.
And step 108, inputting the face grid model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model.
Specifically, the differentiable parallax renderer can be written on top of pytorch3d. It can effectively combine the texture map, normal map and parallax map at the same time to obtain a rendering result rich in detail, can be computed efficiently on the GPU, and, because its arithmetic and matrix operations are all differentiable, can support the training of the upstream neural networks. Here pytorch3d is a library of differentiable components for rendering three-dimensional shapes. The front face model contains information such as vertexes, faces, vertex texture coordinates, and vertex normal coordinates.
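For orientation, a minimal sketch of assembling a differentiable mesh renderer with pytorch3d follows; the patent's renderer additionally consumes the normal and parallax maps through a custom shader, so the stock Phong shader and the dummy geometry below are illustrative stand-ins rather than the patent's implementation:

```python
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, RasterizationSettings, MeshRasterizer,
    MeshRenderer, SoftPhongShader, PointLights, TexturesUV,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Dummy face geometry and a 512x512 texture map stand in for real outputs.
verts = torch.rand(100, 3, device=device)
faces = torch.randint(0, 100, (150, 3), device=device)
verts_uvs = torch.rand(100, 2, device=device)
tex_map = torch.rand(512, 512, 3, device=device)

mesh = Meshes(
    verts=[verts], faces=[faces],
    textures=TexturesUV(maps=tex_map[None], faces_uvs=[faces], verts_uvs=[verts_uvs]),
)

cameras = FoVPerspectiveCameras(device=device)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=RasterizationSettings(image_size=512),
    ),
    # The patent's renderer also samples the normal map and applies the
    # parallax offset in a custom shader; stock Phong shading stands in here.
    shader=SoftPhongShader(device=device, cameras=cameras,
                           lights=PointLights(device=device)),
)

image = renderer(mesh)  # (1, 512, 512, 4) RGBA, differentiable w.r.t. the inputs
```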
According to the technical scheme provided by the embodiment of the invention, the texture map, normal map and parallax map can be obtained simultaneously from a single face photo through the generation network, and the way light is calculated and textures are sampled during rendering is modified accordingly, making the face texture finer and producing a more realistic concave-convex effect.
Specifically, according to the technical scheme provided by the embodiment of the invention, information such as the pose, diffuse reflection and geometric parameters of the face is first obtained through the pre-trained face description model, so that the face grid can be reconstructed; then, through the training of TextureGAN, NormalGAN and ParallaxGAN, a high-precision texture map, parallax map and normal map can be generated from the input face photo, yielding high-precision face textures. On this basis, the technical scheme of the embodiment of the invention can ensure the high precision of both the geometric parameters and the textures of the face.
In the embodiment of the invention, the training method of the face description model comprises the following steps: acquiring a training data set of historical face photos; and sequentially inputting the historical frontal face photos in the training data set into an initial face description model, and adjusting parameters of the face description model according to the obtained output data to obtain the final face description model.
Specifically, a data set of a front face photo is obtained, the front face photo in the data set is sequentially input into an initial human face description model, information such as geometrical parameters, pose, diffuse reflection, illumination and the like of a human face 3DMM is output, the initial human face description model is trained to obtain a pre-training model A0, and parameters of the pre-training model A0 are kept unchanged, wherein the pre-training model A0 is the final human face description model.
The training data set above may employ an existing data set, such as CelebA-HQ data set. The training dataset includes a plurality of photographs containing faces.
It should be understood that the input of the face description model is a photo of the face at a certain moment, and the output is the model-predicted position $\delta_m$, identity $\delta_{id}$, expression $\delta_{ex}$, diffuse reflection $\delta_{alb}$, pose $\delta_{pose}$, and illumination coefficient $\delta_{illu}$ of the face, forming a parameter set $\{\delta_m, \delta_{id}, \delta_{ex}, \delta_{alb}, \delta_{pose}, \delta_{illu}\} \in \mathbb{R}^{258}$.
It should be understood that the network in the face description model that encodes the photo into the various face coefficients is a ResNet (residual network), which includes: convolution layers, pooling layers, activation layers, and a fully connected layer.
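A minimal sketch of such a coefficient regressor follows, assuming a torchvision ResNet-50 backbone; the per-coefficient split of the 258 dimensions is hypothetical, since the patent only specifies the total dimensionality:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FaceDescriptionModel(nn.Module):
    """Regress the 258-dim parameter set {δm, δid, δex, δalb, δpose, δillu}."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, 258)
        self.backbone = backbone
        # Hypothetical per-coefficient sizes summing to 258; the patent
        # only states the total dimensionality R^258.
        self.splits = {"m": 1, "id": 80, "ex": 64, "alb": 80, "pose": 6, "illu": 27}

    def forward(self, photo):           # photo: (B, 3, H, W)
        out = self.backbone(photo)      # (B, 258)
        chunks = torch.split(out, list(self.splits.values()), dim=1)
        return dict(zip(self.splits.keys(), chunks))

params = FaceDescriptionModel()(torch.rand(1, 3, 224, 224))
print({k: v.shape for k, v in params.items()})
```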
The training objective of the face description model is to minimize a loss function $L$ that combines a position center loss $L_c$, a pixel loss $L_{pixel}$, a perception-level loss $L_{per}$, a feature-point loss $L_{lan}$, a regularization loss $L_{norm}$, and a texture variance loss $L_{va}$:

$$L_{reg} = \lambda_{norm} L_{norm} + \lambda_{va} L_{va}$$

$$L = \lambda_c L_c + \lambda_{pix} L_{pix} + \lambda_{per} L_{per} + \lambda_{lan} L_{lan} + L_{reg}$$

Here $n$ denotes the number of training samples in the training set, each a frontal face photo, and $W, H$ are the width and height of the photo. $\hat{Y}_{ij}$ denotes the predicted probability that the pixel at position $(i, j)$ is a positive example, i.e., face, and $Y_{ij}$ denotes the label of the pixel at position $(i, j)$ (face or background), with the parameter $\gamma = 2$. In addition, $I$ and $I^*$ denote the rendered image and the real face image, respectively; $M_{skin}$ and $M_{mask}$ denote the skin and face regions in the sample; $\odot$ is the Hadamard product; $\phi(\cdot)$ denotes a depth feature extractor; $I_k$ and $I_k^*$ denote the $k$-th face region in the images $I$ and $I^*$, respectively; $m = 68$ is the total number of facial feature points; $q_{k,j}$ and $q_{k,j}^*$ denote the position of the $j$-th feature point of the $k$-th face region in the rendered result and in the real result, respectively; and $\omega_j$ is the weight of the $j$-th feature point, set to 20 for the mouth and 1 elsewhere. The weight parameters are $\lambda_{id} = 1.0$, $\lambda_{ex} = 0.8$, $\lambda_{alb} = 0.0017$, and $R$ in the texture variance loss denotes a predefined subset of the skin region covered by the diffuse reflection $A$.

The coefficients of each part in the final loss are: $\lambda_{norm} = 0.0001$, $\lambda_{va} = 0.001$, $\lambda_c = 1$, $\lambda_{pix} = 100$, $\lambda_{per} = 0.01$, $\lambda_{lan} = 0.1$.
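Assembled in code, the weighted combination might look like the sketch below; the sub-losses themselves are stand-ins supplied by the caller, and only the coefficients are taken from the text above:

```python
def total_loss(losses):
    """Combine sub-losses with the coefficients given above.

    `losses` maps names to scalar tensors: c, pix, per, lan, norm, va.
    The sub-loss functions are assumed to be defined elsewhere.
    """
    l_reg = 0.0001 * losses["norm"] + 0.001 * losses["va"]          # λnorm, λva
    return (1.0 * losses["c"] + 100.0 * losses["pix"]
            + 0.01 * losses["per"] + 0.1 * losses["lan"] + l_reg)   # λc, λpix, λper, λlan
```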
The geometric parameters of the 3DMM comprise the parameter vectors of identity and expression, and the coordinates of all vertexes in the face grid can be calculated from these geometric parameters so as to reconstruct the three-dimensional face.
The 3DMM geometric parameters, pose, diffuse reflection, illumination and other information all occupy subsets of the dimensions of the vector output by the face description model.
When the three-dimensional face is reconstructed, the coordinates of the vertexes in the face grid are calculated from the identity and expression parameters by the 3DMM method:

$$S = \bar{S} + S_{id}\,\delta_{id} + S_{ex}\,\delta_{ex}$$

where $S$ denotes the three-dimensional coordinates of the 35709 vertexes, $\bar{S}$ is the average face shape, $S_{id}, S_{ex}$ are the basis vectors of identity and expression, and $\delta_{id}, \delta_{ex}$ are the parameters of identity and expression, respectively.
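In code, this vertex reconstruction is a single affine combination; a sketch with assumed basis sizes:

```python
import torch

def reconstruct_vertices(s_mean, s_id, s_ex, delta_id, delta_ex):
    """S = S̄ + S_id·δid + S_ex·δex

    s_mean: (35709*3,) mean face shape
    s_id:   (35709*3, 80) identity basis   (basis sizes are assumed)
    s_ex:   (35709*3, 64) expression basis
    """
    s = s_mean + s_id @ delta_id + s_ex @ delta_ex
    return s.view(-1, 3)   # (35709, 3) vertex coordinates
```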
In the embodiment of the invention, TextureGAN receives a frontal face image as input, first encodes it with ResNet, and then decodes it to obtain a 512×512 three-channel RGB texture map of the corresponding face. The RGB three-channel texture map contains the texture information of the face; during face rendering, the color of each pixel can be obtained from the texture map through sampling and interpolation.
NormalGAN receives a frontal face image as input, first encodes it with ResNet, and then decodes it to obtain a 1024×1024 three-channel RGB normal map of the corresponding face. The RGB three-channel normal map contains the orientation of the normal vector at each point of the face, with RGB corresponding to the XYZ components of the normal vector. During face rendering, the normal direction corresponding to each pixel can be obtained from the normal map through sampling and interpolation, producing more concave-convex detail when illumination is calculated.
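For reference, converting the RGB-encoded normals back into unit XYZ vectors at sampling time is a two-line mapping, assuming the conventional [0, 1] to [-1, 1] encoding:

```python
import torch
import torch.nn.functional as F

def decode_normal_map(normal_rgb):
    """normal_rgb: (H, W, 3) in [0, 1]; RGB channels encode XYZ directions."""
    n = normal_rgb * 2.0 - 1.0          # map [0, 1] -> [-1, 1]
    return F.normalize(n, dim=-1)       # unit normal per texel
```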
ParallaxGAN receives a frontal face image as input, first encodes it with ResNet, and then decodes it to obtain a 1024×1024 single-channel parallax map of the corresponding face. The single-channel parallax map contains the height of each point of the face, from which the offset of the texture map can be calculated given the camera position. During face rendering, the texture-map offset corresponding to each pixel can be obtained from the parallax map through sampling and interpolation, so the sampling result is corrected by this offset during texture sampling, further enhancing the concave-convex detail of the rendering. Meanwhile, the front face model and the back-of-head model can be spliced and fused with high quality to obtain a complete, high-precision full-head model.
In the embodiment of the invention, the training method for the generation network comprises the following two steps. First, the texture generator is trained until it converges, giving an intermediate generator, i.e., an intermediate model; the texture maps it produces reach higher precision once it converges. Second, the normal generator and the parallax generator are introduced, the intermediate generator is loaded, and the texture generator, normal generator and parallax generator are adjusted synchronously, i.e., the three models are trained and fine-tuned together.
In step 106, the photo is input into the texture generator TextureGAN to obtain a texture map of the face. Specifically, TextureGAN encodes the input photo with ResNet and then upsamples it through convolution, pooling and activation layers to obtain the output texture map. The TextureGAN structure may be: a pre-trained ResNet encoding layer; a linear layer (input 1000 dimensions, output 128×64 dimensions); a two-dimensional batch normalization layer; followed by seven blocks, each consisting of an upsampling layer (factor 2), a convolution layer (kernel size 3, stride 1, padding 1), a two-dimensional batch normalization layer and a LeakyReLU activation layer (coefficient 0.2), with channel widths 128→128, 128→64, 64→32, 32→32, 32→16, 16→8 and 8→3; in the final block the batch normalization and LeakyReLU layers are replaced by a hyperbolic tangent activation layer.
In step 106, the photo is input into the normal generator NormalGAN to obtain a normal map of the face. Specifically, NormalGAN encodes the input photo with ResNet and then upsamples it through convolution, pooling and activation layers to obtain the output normal map. The NormalGAN structure may be identical to the TextureGAN structure described above, ending in a convolution layer with 3 output channels followed by a hyperbolic tangent activation layer.
In step 106, the photo is input into the parallax generator ParallaxGAN to obtain a parallax map of the face. Specifically, ParallaxGAN encodes the input photo with ResNet and then upsamples it through convolution, pooling and activation layers to obtain the output parallax map. The ParallaxGAN structure may be identical to the TextureGAN structure described above, except that the final convolution layer has 1 output channel (input channel 8, kernel size 3, stride 1, padding 1) before the hyperbolic tangent activation layer.
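Since the three generators share the same decoder, the whole family can be sketched once; reshaping the 128×64-dimensional linear output to (128, 8, 8), which after seven doublings yields a 1024×1024 map, is one reading of the patent's wording:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def up_block(c_in, c_out):
    return nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2),
    )

class MapGenerator(nn.Module):
    """Shared TextureGAN/NormalGAN/ParallaxGAN decoder; out_ch = 3 or 1."""
    def __init__(self, out_ch=3):
        super().__init__()
        self.encoder = resnet50(weights=None)          # 1000-dim code
        self.fc = nn.Linear(1000, 128 * 64)            # reshaped to (128, 8, 8)
        self.bn = nn.BatchNorm2d(128)
        self.blocks = nn.Sequential(
            up_block(128, 128), up_block(128, 64), up_block(64, 32),
            up_block(32, 32), up_block(32, 16), up_block(16, 8),
        )
        self.head = nn.Sequential(                     # final block: no BN/LeakyReLU
            nn.Upsample(scale_factor=2),
            nn.Conv2d(8, out_ch, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, photo):                          # photo: (B, 3, H, W)
        code = self.fc(self.encoder(photo)).view(-1, 128, 8, 8)
        return self.head(self.blocks(self.bn(code)))  # (B, out_ch, 1024, 1024)
```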
The differentiable parallax renderer in step 108 is a high-precision differentiable renderer that accepts the texture map, normal map, parallax map and grid model of the face as inputs and outputs a high-precision reconstructed rendered image of the face. All arithmetic, sampling and interpolation operations in the renderer are differentiable, which guarantees that the models can be trained. The renderer can run on the GPU to accelerate training and rendering.
The renderer needs to obtain the information of each pixel after rendering, which includes: obtaining the normal vector corresponding to each pixel through interpolated sampling of the normal map, which is then used to calculate light reflection and so on; obtaining the height corresponding to each pixel through interpolated sampling of the parallax map; converting the camera position and similar coordinates into tangent-space coordinates by computing the tangent space of the triangle corresponding to the pixel; calculating the offset of the texture coordinates from the previously obtained per-pixel height and the tangent-space camera position; and finally performing interpolated sampling of the texture map, introducing the offset generated by the parallax map during sampling. The color of each pixel is then calculated by combining the sampled texture with the illumination model, yielding the final rendering result.
Wherein the illumination model may be a Phong model.
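A compact sketch of the per-pixel parallax offset and corrected texture fetch follows, assuming the tangent-space view direction has already been computed per pixel; the height scale factor is illustrative:

```python
import torch
import torch.nn.functional as F

def parallax_sample(tex_map, parallax_map, uv, view_ts, scale=0.05):
    """tex_map: (1, 3, H, W); parallax_map: (1, 1, H, W) heights in [0, 1]
    uv: (1, h, w, 2) texture coords in [-1, 1] (grid_sample convention)
    view_ts: (1, h, w, 3) tangent-space view direction, unit length
    """
    h = F.grid_sample(parallax_map, uv, align_corners=False)       # per-pixel height
    # Offset texture coordinates toward the viewer, proportional to height.
    offset = view_ts[..., :2] / view_ts[..., 2:3].clamp(min=1e-4)
    uv_corr = uv + offset * h.permute(0, 2, 3, 1) * scale
    return F.grid_sample(tex_map, uv_corr, align_corners=False)    # corrected color
```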
All arithmetic, sampling and interpolation operations in the renderer are differentiable, so the training of the models can be guaranteed. All operations of the renderer, such as the offset calculation, are completed by matrix operations, so the renderer can run efficiently on the GPU, accelerating the training and rendering processes.
TextureGAN, NormalGAN and ParallaxGAN may be trained by calculating a loss on the output of the differentiable renderer and back-propagating it. The optimization target is the pixel loss $L_{pixel}$, computed on the rendered image

$$I = R(I_p, I_n, I_t, M_e)$$

where $n$ denotes the number of training samples in the training set, each a frontal face photo, and $W$ and $H$ are the width and height of the photo. $Y_{ij}$ denotes the label of the pixel at position $(i, j)$ (face or background), with the parameter $\gamma = 2$. In addition, $I$ and $I^*$ denote the rendered image and the real face image, respectively, $M$ denotes the face region in the sample, $\odot$ is the Hadamard product, $R$ denotes the high-precision differentiable parallax renderer, and $I_p, I_n, I_t, M_e$ denote the parallax map, normal map, texture map and grid model of the face, respectively.
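The patent's exact pixel-loss expression is not reproduced in the text above; one masked photometric form consistent with the symbol definitions would be:

```python
import torch

def pixel_loss(I, I_star, M):
    """I, I_star: (B, 3, H, W) rendered and real images; M: (B, 1, H, W) face mask.

    A masked-L2 form consistent with the symbols above; the patent's exact
    formula may differ.
    """
    diff = M * (I - I_star)     # Hadamard product with the face mask
    per_sample = diff.pow(2).flatten(1).sum(1) / M.flatten(1).sum(1).clamp(min=1.0)
    return per_sample.mean()
```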
In order to accomplish technical goals such as digital human generation, it is necessary to reconstruct a complete head mesh. The half face model differs from the back-of-head model in parameters, precision and so on, and combining the front face model with a back-of-head grid to generate a full-head model is an important research subject.
In the embodiment of the present invention, after obtaining the reconstructed frontal face model, a full-head model may also be generated by combining the mesh at the back of the head, as shown in fig. 2, and the steps are as follows:
Step 202, selecting five frames of second color depth images with different angles from the first color depth images with multiple angles of the current face.
In particular, the first color depth images may be color-depth photos captured by a RealSense camera. RealSense is a three-dimensional depth camera that obtains RGB images together with the depth of each part of the image. There should be at least 5 frames of first color depth images, and each frame needs to contain both an RGB photo of the face and the corresponding depth map. When selecting the second color depth images, it is only necessary to find the image closest to each target angle; exact matching is not required.
And 204, performing point processing on the second color depth image to obtain a front face point cloud.
Specifically, the front face point cloud is obtained, and meanwhile, the key point and the ear point cloud can be obtained. The ear point cloud may be divided into a left ear point cloud and a right ear point cloud.
And 206, non-rigidly registering the frontal face model to the frontal face point cloud to obtain a half-face model.
Step 208, non-rigidly registering the template full-head model to the half-face model.
Specifically, the template full-head model is a mesh model prepared in advance.
And step 210, splicing the half face model and the head rear part of the template full-head model to obtain a complete full-head model.
According to the technical scheme provided by the embodiment of the invention, point clouds are obtained from the color depth images and used for registration and splicing, so that a high-precision full-head model can be generated. Through point-cloud registration, the half face model reconstructed from the earlier single photo becomes finer. Registering the template full-head model keeps the overall shape of the head natural. Further, the half face model and the head rear part of the full-head model can be spliced by interpolation to obtain a complete, high-precision full-head model.
In step 202, the first color depth images are acquired and preprocessed, and second color depth images at the following five different angles are selected: frontal face, 15° left, 30° left, 15° right, and 30° right.
Specifically, the turning direction and turning angle of the head corresponding to the current face may be calculated from the key points, and the second color depth images of the five different angles may be selected according to that direction and angle. The turning angle and direction can be calculated from the coordinates and relative positions of the 68 facial key points output by the face key point detection module.
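A sketch of this selection step follows, assuming 68-point landmarks in image coordinates and using the horizontal asymmetry of the outer eye corners about the nose tip as a crude yaw proxy; the patent does not fix a particular formula:

```python
import numpy as np

TARGET_YAWS = [0.0, -15.0, -30.0, 15.0, 30.0]   # frontal, left/right 15° and 30°

def estimate_yaw(landmarks):
    """landmarks: (68, 2) facial key points; returns a crude yaw proxy in degrees."""
    left_eye, right_eye, nose = landmarks[36], landmarks[45], landmarks[30]
    # Ratio of nose-to-eye horizontal distances; symmetric => roughly 0 yaw.
    d_l, d_r = nose[0] - left_eye[0], right_eye[0] - nose[0]
    return np.degrees(np.arctan2(d_r - d_l, d_r + d_l))

def pick_frames(frames_landmarks):
    """For each target angle, pick the frame whose estimated yaw is closest."""
    yaws = np.array([estimate_yaw(lm) for lm in frames_landmarks])
    return [int(np.argmin(np.abs(yaws - t))) for t in TARGET_YAWS]
```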
In step 206, when the front face model is non-rigidly registered to the front face point cloud, the front face model is matched to the front face point cloud, so that the front face model deforms its grid to fit the point cloud.
The face key point detection module ensures that the obtained point cloud is exactly the point cloud of the face, discarding points of non-face parts, i.e., the background. To obtain the point clouds of the left and right ears, an ear detection module is used, so that the obtained point clouds are exactly those of the left and right ears, with the remaining points discarded. The algorithm used in the non-rigid registration step may be the NICP algorithm, a non-rigid variant of the ICP (Iterative Closest Point) algorithm.
In step 208, when the template full-head model is non-rigidly registered to the half face model, the template full-head model is matched to the half face model, so that the template full-head model deforms to fit the half face model. The algorithm used in this non-rigid registration step may also be the NICP algorithm.
In step 210, the splicing is implemented through interpolation and the filling of patches. Specifically, the front face model and the head rear part of the template full-head model are position-matched in space; new vertexes are then obtained between the front face model and the head rear part of the template full-head model through interpolation calculation, and new triangular patches are added to the full-head model based on these new vertexes.
Specifically, face edge vertexes can be matched with head rear edge vertexes after the position matching, and interpolation can be performed with a first interpolation method using the spatial coordinates and normal vectors of the face edge vertexes and the head rear edge vertexes.
The first interpolation method may be any one of the following: linear interpolation, quadratic interpolation, and linear interpolation plus a trigonometric offset.
The most common is linear interpolation plus a trigonometric-function offset, in which $P_0$ and $P_{M+1}$ are the coordinates of a matched pair of face and back-of-head points, $N_0$ and $N_{M+1}$ are the normal vectors at those two points, $i$ is the index of an interpolation point, $M$ is the number of interpolation points, $P_i^*$ is the coordinate of the final interpolation point, and $\lambda$ is a hyperparameter controlling the interpolation offset, defaulting to $\lambda = -0.001$.
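The formula itself is not reproduced in the text above, so the sketch below shows one plausible linear-plus-trigonometric form built from the stated symbols; the patent's exact expression may differ:

```python
import numpy as np

def stitch_interpolate(P0, P_end, N0, N_end, M, lam=-0.001):
    """Interpolate M points between matched edge vertices P0 and P_end.

    One plausible form: a linear blend of the positions plus a sinusoidal
    offset along the averaged normals, scaled by λ (an assumption, not
    the patent's verified formula).
    """
    pts = []
    for i in range(1, M + 1):
        t = i / (M + 1)
        base = (1 - t) * P0 + t * P_end                  # linear interpolation
        bulge = lam * np.sin(np.pi * t) * (N0 + N_end)   # trigonometric offset
        pts.append(base + bulge)
    return np.stack(pts)                                 # (M, 3) new vertices
```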
When the front face model and the head rear part of the template full-head model are matched in space, semi-fixed topology can be used to ensure that the connection relationship between the face and the head rear part is unchanged.
In the embodiment of the invention, the geometric parameters, texture map, normal map and parallax map of a face are automatically generated from a face photo, yielding a high-precision three-dimensional face reconstruction result. Color depth images are obtained with a depth camera and used to calculate a face point cloud; a high-precision half face model is obtained by registering the high-precision face reconstruction result to the face point cloud; the template full-head model is then non-rigidly registered to the high-precision half face model; and finally the high-precision half face model is spliced with the head rear part of the full-head model to obtain a high-precision full-head model.
As shown in fig. 3, a frontal face photo 301 may be input into an encoder 302 of the face description model to obtain face description information 303 such as face orientation, illumination, identity, expression and diffuse reflection; a decoder 304 of the face description model decodes the face description information 303 into a three-dimensional grid 305, yielding the face grid model. The frontal face photo 301 is also input into a texture map generation network 306, a normal map generation network 307 and a parallax map generation network 308 to obtain the texture map, normal map and parallax map, respectively, and the face grid model, texture map, normal map and parallax map are input into a differentiable parallax renderer 309 to obtain a front face model 310.
As shown in fig. 4, a front face point cloud 403 may be obtained from N color depth images 401. In step 411, a first registration is performed to register the frontal face model 402 to the frontal face point cloud 403 to obtain a half-face model 404. In step 412, a second registration is performed to register the template full head model 405 onto the half face model 404. In step 413, the head back part model 406 and the half face model 404 are spliced to obtain the full head model 407.
According to the face model reconstruction method provided by the invention, a face grid model and high-precision texture, parallax and normal maps are generated from a single input face photo and fed into a differentiable parallax renderer to obtain a reconstructed front face model, so that the precision of both the geometric parameters and the textures of the face is ensured, and the face model reconstruction precision is improved.
The face model reconstruction device provided by the invention is described below, and the face model reconstruction device described below and the face model reconstruction method described above can be referred to correspondingly.
As shown in fig. 5, a face model reconstruction device according to an embodiment of the present invention includes:
the face description unit 502 is configured to input a current face front photo into a preset face description model, and obtain description information of a face, where the description information includes geometric parameters of the current face, and the face description model is a deep neural network model.
And the mesh model obtaining unit 504 is configured to obtain a face mesh model of the current face according to the geometric parameter.
The map generating unit 506 is configured to input the current face front photo into a preset generating network to generate a texture map, a normal map and a parallax map of the current face.
And the face model reconstruction unit 508 is configured to input the face grid model, the texture map, the normal map and the parallax map into a differentiable parallax renderer to obtain a reconstructed front face model.
In an embodiment of the present invention, the apparatus further includes: and the selecting unit is used for selecting five frames of second color depth images with different angles from the first color depth images with the angles of the current face. And the processing unit is used for carrying out point processing on the second color depth image to obtain a front face point cloud. And the first registration unit is used for non-rigidly registering the front face model to the front face point cloud to obtain a half face model. And the second registration unit is used for non-rigidly registering the template full-head model to the half-face model. And the splicing unit is used for splicing the half face model with the head rear part of the template full-head model to obtain a complete full-head model.
In an embodiment of the present invention, the generating network includes a texture generator, a normal generator, and a parallax generator, and the apparatus further includes a generating unit configured to: training the texture generator until the texture generator converges to obtain an intermediate generator; and introducing the normal generator and the parallax generator, loading the intermediate generator, and synchronously adjusting the texture generator, the normal generator and the parallax generator.
In an embodiment of the present invention, the apparatus further includes a training unit configured to: acquire a training data set of historical face photos; and sequentially input the historical frontal face photos in the training data set into an initial face description model, and adjust parameters of the face description model according to the obtained output data to obtain a final face description model.
In an embodiment of the present invention, the mesh model obtaining unit is further configured to: acquire the coordinates of all vertexes in the face grid from the geometric parameters.
In an embodiment of the present invention, the selecting unit is further configured to: acquire the first color depth images; preprocess the first color depth images, and select second color depth images at the following five different angles: frontal face, 15° left, 30° left, 15° right, and 30° right.
In an embodiment of the present invention, the processing unit is further configured to: perform point processing on the second color depth image to obtain a front face point cloud and key points; and the selecting unit is further configured to: calculate the turning direction and turning angle of the head of the person corresponding to the current face according to the key points; and select the second color depth images of the five different angles according to the turning direction and turning angle of the head.
In an embodiment of the present invention, the first registration unit is further configured to: match the front face model to the front face point cloud, so that the front face model deforms its grid to fit the point cloud; and the second registration unit is further configured to: match the template full-head model to the half face model, so that the template full-head model deforms to fit the half face model.
In an embodiment of the present invention, the splicing unit is further configured to: perform position matching in space between the front face model and the head rear part of the template full-head model; and obtain new vertexes from the front face model and the head rear part of the template full-head model through interpolation calculation, and add new triangular patches to the full-head model based on the new vertexes.
In an embodiment of the present invention, the stitching unit is further configured to: match face-edge vertices with back-of-head-edge vertices according to the position alignment, and interpolate with a first interpolation method using the spatial coordinates and normal vectors of the face-edge vertices and the back-of-head-edge vertices.
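A minimal sketch of this boundary interpolation, assuming matched (M, 3) arrays of edge vertices and unit normals, and showing only the linear-interpolation variant:

```python
import numpy as np

def interpolate_boundary(face_v, face_n, head_v, head_n, n_new=3):
    """Generate rows of new vertices between matched boundary vertex pairs.

    face_v/head_v: (M, 3) matched edge-vertex positions;
    face_n/head_n: (M, 3) corresponding unit normals."""
    rows_v, rows_n = [], []
    for t in np.linspace(0.0, 1.0, n_new + 2)[1:-1]:   # interior samples only
        v = (1.0 - t) * face_v + t * head_v            # blend positions
        n = (1.0 - t) * face_n + t * head_n            # blend normals
        n /= np.linalg.norm(n, axis=1, keepdims=True)  # renormalize
        rows_v.append(v)
        rows_n.append(n)
    return np.stack(rows_v), np.stack(rows_n)
```

New triangular faces would then be built between adjacent rows of these vertices; the quadratic and trigonometric-offset variants mentioned later only change the blending weights.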
Since each functional module of the face model reconstruction device in the exemplary embodiments of the present invention corresponds to a step of the foregoing exemplary face model reconstruction method, please refer to the foregoing method embodiments for details not disclosed in the device embodiments.
The face model reconstruction device provided by the present invention generates a face mesh model together with a high-precision texture map, parallax map, and normal map from a single input face photo, and feeds them into a parallax differentiable renderer to obtain the reconstructed frontal face model. This preserves both the geometric parameters of the face and the precision of its texture, improving the reconstruction accuracy of the face model.
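The per-pixel texture offset described in the claims (computed from the camera position and the single-channel height) matches the classic parallax-mapping formulation. Below is a minimal NumPy sketch, in which the sign convention and the height scale are our assumptions rather than disclosed values:

```python
import numpy as np

def parallax_corrected_uv(uv, height, view_dir, scale=0.05):
    """Shift texture coordinates by a view-dependent parallax offset.

    uv:       (N, 2) base texture coordinates from rasterization
    height:   (N,) heights sampled from the single-channel parallax map
    view_dir: (N, 3) unit view vectors in tangent space (toward the camera)
    scale:    assumed height scale; not specified in the patent
    """
    # Classic parallax mapping: offset along the tangent-plane component of
    # the view direction, proportional to the sampled height.
    offset = view_dir[:, :2] / np.maximum(view_dir[:, 2:3], 1e-6)
    return uv - offset * height[:, None] * scale
```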
Fig. 6 illustrates a physical schematic diagram of an electronic device. As shown in Fig. 6, the electronic device may include: a processor 610, a communications interface 620, a memory 630, and a communication bus 640, where the processor 610, the communications interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a face model reconstruction method comprising: inputting a frontal photo of the current face into a preset face description model to obtain description information of the face, where the description information includes geometric parameters of the current face and the face description model is a deep neural network model; obtaining a face mesh model of the current face from the geometric parameters; inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face; and inputting the face mesh model, the texture map, the normal map, and the parallax map into a parallax differentiable renderer to obtain a reconstructed frontal face model.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the face model reconstruction method provided above, the method comprising: inputting a frontal photo of the current face into a preset face description model to obtain description information of the face, where the description information includes geometric parameters of the current face and the face description model is a deep neural network model; obtaining a face mesh model of the current face from the geometric parameters; inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face; and inputting the face mesh model, the texture map, the normal map, and the parallax map into a parallax differentiable renderer to obtain a reconstructed frontal face model.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the face model reconstruction method provided above, the method comprising: inputting a frontal photo of the current face into a preset face description model to obtain description information of the face, where the description information includes geometric parameters of the current face and the face description model is a deep neural network model; obtaining a face mesh model of the current face from the geometric parameters; inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face; and inputting the face mesh model, the texture map, the normal map, and the parallax map into a parallax differentiable renderer to obtain a reconstructed frontal face model.
The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the foregoing technical solution, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium such as a ROM/RAM, magnetic disk, or optical disk, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in parts of them.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, without such modifications and substitutions departing from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (14)
1. A face model reconstruction method, characterized by comprising the following steps:
inputting a frontal photo of the current face into a preset face description model to obtain description information of the face, wherein the description information comprises geometric parameters of the current face, and the face description model is a deep neural network model;
obtaining a face mesh model of the current face according to the geometric parameters;
inputting the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face;
inputting the face mesh model, the texture map, the normal map, and the parallax map into a parallax differentiable renderer to obtain a reconstructed frontal face model;
wherein the parallax map is a single-channel map containing the height of each point on the face, and the texture map offset is calculated from the camera position; during face rendering, the texture map offset corresponding to each pixel is obtained from the parallax map by sampling and interpolation, so that the sampling result is corrected according to the offset during texture sampling.
2. The method of claim 1, wherein after obtaining the reconstructed frontal face model, the method further comprises:
selecting five second color depth images at different angles from the multi-angle first color depth images of the current face;
performing point-cloud processing on the second color depth images to obtain a frontal face point cloud;
non-rigidly registering the frontal face model to the frontal face point cloud to obtain a half-face model;
non-rigidly registering a template full-head model to the half-face model;
and stitching the half-face model with the back-of-head portion of the template full-head model to obtain a complete full-head model.
3. The method of claim 1, wherein the generation network comprises a texture generator, a normal generator, and a parallax generator, and the training method of the generation network comprises:
training the texture generator until it converges to obtain an intermediate generator;
and introducing the normal generator and the parallax generator, loading the intermediate generator, and jointly adjusting the texture generator, the normal generator, and the parallax generator.
4. The method according to claim 1, wherein the training method of the face description model comprises:
acquiring a training data set of historical face photos;
and sequentially inputting the historical frontal face photos in the training data set into an initial face description model, and adjusting the parameters of the face description model according to the obtained outputs to obtain a final face description model.
5. The method of claim 1, wherein the obtaining a face mesh model of the current face according to the geometric parameters comprises:
obtaining the coordinates of all vertices in the face mesh from the geometric parameters.
6. The method of claim 2, wherein the selecting five second color depth images at different angles from the multi-angle first color depth images of the current face comprises:
acquiring the first color depth images;
preprocessing the first color depth images, and selecting second color depth images at the following five angles:
frontal face, 15° left, 30° left, 15° right, and 30° right.
7. The method of claim 6, wherein the performing point-cloud processing on the second color depth images to obtain a frontal face point cloud comprises:
performing point-cloud processing on the second color depth images to obtain a frontal face point cloud and key points;
the selecting second color depth images at the five different angles comprises: calculating the head-turn direction and angle of the person corresponding to the current face from the key points;
and selecting the second color depth images at the five different angles according to the head-turn direction and angle.
8. The method of claim 7, wherein the non-rigidly registering the frontal face model to the frontal face point cloud to obtain a half-face model comprises:
matching the frontal face model to the frontal face point cloud, so that the frontal face model mesh deforms to fit the frontal face point cloud;
and the non-rigidly registering the template full-head model to the half-face model comprises:
matching the template full-head model to the half-face model, so that the template full-head model deforms to fit the half-face model.
9. The method of claim 2, wherein the stitching the half-face model with the back-of-head portion of the template full-head model to obtain a complete full-head model comprises:
spatially matching the positions of the frontal face model and the back-of-head portion of the template full-head model;
and obtaining new vertices in the frontal face model and the back-of-head portion of the template full-head model through interpolation calculation, and adding new triangular faces to the full-head model based on the new vertices.
10. The method of claim 9, wherein the obtaining new vertices in the frontal face model and the back-of-head portion of the template full-head model through interpolation calculation comprises: matching face-edge vertices with back-of-head-edge vertices according to the position matching, and interpolating with a first interpolation method using the spatial coordinates and normal vectors of the face-edge vertices and the back-of-head-edge vertices.
11. The method of claim 10, wherein the first interpolation method comprises any one of the following: linear interpolation, quadratic interpolation, and linear interpolation plus a trigonometric offset.
12. A face model reconstruction device, comprising:
a face description unit configured to input a frontal photo of the current face into a preset face description model to obtain description information of the face, wherein the description information comprises geometric parameters of the current face, and the face description model is a deep neural network model;
a mesh model obtaining unit configured to obtain a face mesh model of the current face according to the geometric parameters;
a map generation unit configured to input the frontal photo of the current face into a preset generation network to generate a texture map, a normal map, and a parallax map of the current face;
a face model reconstruction unit configured to input the face mesh model, the texture map, the normal map, and the parallax map into a parallax differentiable renderer to obtain a reconstructed frontal face model;
wherein the parallax map is a single-channel map containing the height of each point on the face, and the texture map offset is calculated from the camera position; during face rendering, the texture map offset corresponding to each pixel is obtained from the parallax map by sampling and interpolation, so that the sampling result is corrected according to the offset during texture sampling.
13. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 11 when executing the program.
14. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 11.