CN114842136A - Single-image three-dimensional face reconstruction method based on differentiable renderer


Info

Publication number
CN114842136A
Authority
CN
China
Prior art keywords
image
face
dimensional face
renderer
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210365752.2A
Other languages
Chinese (zh)
Inventor
傅予力
梁俊韬
蔡磊
向友君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210365752.2A priority Critical patent/CN114842136A/en
Publication of CN114842136A publication Critical patent/CN114842136A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Graphics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a single-image three-dimensional face reconstruction method based on a differentiable renderer, comprising the following steps: S1, inputting the target face image into a pre-trained regression network to obtain initialized three-dimensional face parameters; S2, inputting the three-dimensional face parameters and pose parameters into a differentiable renderer to obtain a rendered image with the same pose as the input image and rendered images with different poses; S3, comparing the target image with the rendered image of the same pose to obtain a key point loss value and a pixel-level loss value; S4, comparing the target image with the rendered images of different poses to obtain a generative adversarial loss value and an identity consistency loss value; S5, optimizing the three-dimensional face parameters according to the calculated loss values, and returning the updated three-dimensional face parameters to step S2 until the iteration converges, obtaining the optimized three-dimensional face parameters. The invention introduces a truly differentiable renderer to obtain a higher-quality three-dimensional face reconstruction result.

Description

Single-image three-dimensional face reconstruction method based on differentiable renderer
Technical Field
The invention belongs to the field of image processing, and relates to a single-image three-dimensional face reconstruction method based on a differentiable renderer.
Background
With the continuous development of face recognition technology, three-dimensional face reconstruction has gradually become an important application branch of computer graphics. Traditional three-dimensional face reconstruction mainly depends on expensive three-dimensional scanning equipment and a large amount of manual post-processing. Therefore, how to use a two-dimensional face picture to perform fast and accurate three-dimensional face model reconstruction is a research focus.
The most advanced three-dimensional face reconstruction methods at present can be roughly divided into two types: learning-based methods and optimization-based methods. Deep-learning-based methods usually adopt a regression approach, taking a face image as input and learning to regress the corresponding three-dimensional face model parameters. However, these methods usually require a large amount of labeled data, and ground-truth three-dimensional face model parameters are difficult to acquire. On the other hand, optimization-based methods generally regard face imaging as a generative process: they take a series of parameters (including face geometry, albedo, texture, illumination, viewing angle, and the like) as input, generate a rendered image according to fixed graphics rules, and optimize the input parameters by minimizing the distance between the rendered image and the target image.
Recent developments in differentiable rendering provide an efficient optimization framework for both types of face reconstruction methods. In learning-based methods, the regressed parameters can be rendered into an image and optimized with a pixel loss, enabling unsupervised training. For optimization-based methods, the differentiable renderer enables gradient-based optimization, allowing more complex loss functions to be employed and stabilizing the training process. However, most existing methods simply use z-buffer rendering, which is not truly differentiable, especially when the triangle covering a pixel changes during optimization. As a result, the reconstructed three-dimensional faces remain flawed in realism and in texture details (hair, eyebrows, wrinkles).
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art by providing a single-image three-dimensional face reconstruction method based on a differentiable renderer, solving the problem that accurate three-dimensional face reconstruction cannot otherwise be achieved.
In order to achieve the purpose, the invention adopts the following technical scheme:
a single-image three-dimensional face reconstruction method based on a differentiable renderer comprises the following steps:
S1, inputting the target face image into a pre-trained three-dimensional face model to obtain initialized three-dimensional face parameters;
S2, inputting the initialized three-dimensional face parameters and pose parameters into a differentiable renderer to obtain a rendered image with the same pose as the input image and rendered images with different poses;
S3, comparing the target face image with the rendered image of the same pose to obtain a key point loss value and a pixel-level loss value;
S4, comparing the target face image with the rendered images of different poses to obtain a generative adversarial loss value and an identity consistency loss value;
S5, optimizing the three-dimensional face parameters according to the calculated loss values, and returning the updated three-dimensional face parameters to step S2 until the iteration converges, obtaining the optimized three-dimensional face parameters.
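For illustration only, the following PyTorch-style sketch shows how the S1-S5 iterate-render-compare-update loop can be organized; regress_params, soft_render, sample_pose, and the four loss callables are hypothetical placeholders, not the patent's actual implementation.

```python
import torch

# Hedged sketch of the S1-S5 loop; all helper callables are placeholders
# standing in for the pre-trained regression network, the differentiable
# renderer, the pose sampler, and the losses of steps S3/S4.
def reconstruct(target_img, regress_params, soft_render, sample_pose,
                pixel_loss, landmark_loss, adv_loss, id_loss, n_iters=200):
    # S1: initialize the 3D face parameters from the regression network
    alpha, beta, tex, pose0 = (p.detach().clone().requires_grad_(True)
                               for p in regress_params(target_img))
    opt = torch.optim.Adam([alpha, beta, tex, pose0], lr=1e-2)
    for _ in range(n_iters):
        # S2: render once under the input pose and once under a random pose
        same_pose = soft_render(alpha, beta, tex, pose0)
        diff_pose = soft_render(alpha, beta, tex, sample_pose())
        # S3: same-pose comparison gives pixel-level and key point losses
        # S4: cross-pose comparison gives adversarial and identity losses
        loss = (pixel_loss(target_img, same_pose)
                + landmark_loss(target_img, same_pose)
                + adv_loss(diff_pose)
                + id_loss(target_img, diff_pose))
        # S5: update the 3D face parameters and iterate until convergence
        opt.zero_grad()
        loss.backward()
        opt.step()
    return alpha, beta, tex, pose0
```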
Further, step S1 specifically includes:
inputting the target face image into the pre-trained Large Scale Face Model three-dimensional face model, and regressing the initialized three-dimensional face parameters, including an identity parameter α, an expression parameter β, a texture parameter T, and a pose parameter p_0;
the expression of the three-dimensional face model is:
S = S̄ + s_α·α + s_β·β
where S denotes the three-dimensional face shape, S̄ denotes the average face shape, s_α and s_β are the face identity and face expression basis vectors respectively, α is a 158-dimensional face identity coefficient, and β is a 29-dimensional face expression coefficient.
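A minimal numpy sketch of this linear shape model follows; the vertex count and the random bases are illustrative stand-ins, since a real system would load the mean shape and bases from the trained Large Scale Face Model.

```python
import numpy as np

# Sketch of the linear shape model S = S_mean + s_alpha @ alpha + s_beta @ beta.
n_verts = 53215                              # hypothetical vertex count
S_mean = np.zeros(3 * n_verts)               # average face shape, flattened xyz
s_alpha = np.random.randn(3 * n_verts, 158)  # identity basis (158-dim coeffs)
s_beta = np.random.randn(3 * n_verts, 29)    # expression basis (29-dim coeffs)

def face_shape(alpha, beta):
    """Return the (n_verts, 3) face shape for identity coefficients
    alpha (158,) and expression coefficients beta (29,)."""
    return (S_mean + s_alpha @ alpha + s_beta @ beta).reshape(n_verts, 3)

mean_face = face_shape(np.zeros(158), np.zeros(29))  # recovers the mean shape
```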
Further, the pose parameters are obtained by random sampling, wherein the pitch and yaw angles are sampled from U(-40°, 40°) and the roll angle is sampled from U(-15°, 15°).
Further, the initialized three-dimensional face parameters obtained by regression and the pose parameters are input into the differentiable renderer to respectively obtain a rendered image G_1 with the same pose as the input image and a rendered image G_2 with a different pose from the input image.
Further, the differentiable renderer handles occlusion through an aggregation mechanism over the probabilistic contributions of all triangles, implemented as follows:
a probability map is constructed to estimate the contribution D_j^i of triangle f_j to pixel P_i:
D_j^i = sigmoid(δ_j^i · d²(i, j) / σ)
where δ_j^i = +1 if pixel P_i lies inside triangle f_j and -1 otherwise, σ is a scalar controlling the sharpness of the probability distribution, d(i, j) is the Euclidean distance from pixel P_i to the edges of f_j, and sigmoid(x) = 1/(1 + e^(-x)) maps its input into (0, 1);
finally, through an aggregation function, the rendered output I_i at pixel P_i is:
I_i = Σ_j w_j^i · T_j + w_b^i · T_b
where T_j is the pixel value contributed by triangle f_j from the texture map, T_b denotes the background color, and the weights w_j^i and w_b^i normalize the probability maps D_j^i together with relative depth over all triangles covering P_i, so that nearer triangles dominate.
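A minimal numpy sketch of this soft rasterization at a single pixel, assuming precomputed pixel-to-edge distances, inside/outside flags, and per-triangle depths; the depth-softmax constants gamma and eps follow the common soft-rasterizer formulation and are assumptions, not values stated here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_pixel(d2, inside, depth, tri_colors, bg_color,
               sigma=1e-4, gamma=1e-4, eps=1e-3):
    """Soft-rasterize one pixel P_i from J candidate triangles."""
    delta = np.where(inside, 1.0, -1.0)      # +1 inside f_j, -1 outside
    D = sigmoid(delta * d2 / sigma)          # probability maps D_j^i
    # Depth-aware softmax over all triangles plus the background term
    logits = np.append(depth, eps) / gamma
    logits -= logits.max()                   # numerical stability
    scores = np.append(D, 1.0) * np.exp(logits)
    w = scores / scores.sum()                # weights w_j^i and w_b^i
    colors = np.vstack([tri_colors, bg_color])
    return w @ colors                        # rendered output I_i

# Two triangles over one pixel: the nearer one (depth 0.9) dominates.
I_i = soft_pixel(d2=np.array([2e-5, 4e-4]),
                 inside=np.array([True, False]),
                 depth=np.array([0.9, 0.4]),
                 tri_colors=np.array([[0.8, 0.6, 0.5], [0.3, 0.3, 0.3]]),
                 bg_color=np.zeros(3))
```

Because every triangle contributes a smooth probability rather than a hard z-buffer winner, gradients flow to triangles near the pixel even as the covering triangle changes.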
Further, step S3 is specifically:
calculating the pixel-level loss L_pix and the key point loss L_lan between the rendered image with the same pose as the input image and the input original image;
the pixel-level error is optimized based on pixel value differences and also optimizes the illumination parameters; its expression is:
L_pix = ||I_0 - R(α, β, T, p_0)||_1
where I_0 denotes the input original image, R denotes the differentiable renderer, and R(α, β, T, p_0) is the image rendered by the renderer under the pose parameter p_0;
for the key point loss, a deep alignment network M is adopted to perform 68-key-point detection on a face image;
the original image and the rendered image are input to the deep alignment network to obtain their 68 key point coordinates for the facial features, and the point-to-point Euclidean distances between corresponding key points are computed to optimize the reconstruction result so that the pose of the rendered image is consistent with that of the input image; the expression is:
L_lan = ||M(I_0) - M(R(α, β, T, p_0))||_2
where M(·) denotes inputting an image to the deep alignment network to obtain its 68 key point coordinates for the facial features.
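A short PyTorch-style sketch of these two same-pose losses; align_net stands in for the deep alignment network M and rendered for R(α, β, T, p_0), both placeholder callables introduced for illustration.

```python
import torch

def pixel_loss(target, rendered):
    # L_pix = ||I_0 - R(alpha, beta, T, p_0)||_1, averaged over pixels
    return (target - rendered).abs().mean()

def landmark_loss(target, rendered, align_net):
    # L_lan = ||M(I_0) - M(R(alpha, beta, T, p_0))||_2 over 68 (x, y) points
    lm_target = align_net(target)      # (68, 2) keypoints of the original
    lm_render = align_net(rendered)    # (68, 2) keypoints of the rendering
    return torch.norm(lm_target - lm_render, dim=-1).mean()
```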
Further, step S4 is specifically:
calculating the generative adversarial loss L_adv and the identity consistency loss L_id between the rendered image with a pose different from the input and the input image;
the generation path is regarded as the generator of a generative adversarial network, and a discriminator D is designed to judge whether an image was produced by the generator; the generation-adversarial mechanism is used to optimize the renderer to generate more realistic face images, with the expression:
L_adv = log D(I_0) + log(1 - D(R(α, β, T, p)))
where R(α, β, T, p) is the image rendered by the renderer under the random pose parameter p, and D(·) denotes inputting an image to the discriminator, which judges whether the input is a real image and outputs the probability that it is real;
a face recognition network is trained to extract the 256-dimensional face features of the input image and the rendered image respectively, the cosine distance between them is calculated as the similarity, and an identity consistency loss function is designed to judge whether the two images belong to the same person;
the identity consistency loss function expression is:
L_id = 1 - cos(F_R(R(α, β, T, p)), F_R(I_0))
where cos(·,·) denotes the cosine similarity of two vectors, F_R denotes the face recognition network, and F_R(·) denotes inputting an image to the face recognition network to obtain its 256-dimensional face features.
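Similarly, a hedged PyTorch-style sketch of the two cross-pose losses; disc stands in for the discriminator D and face_net for the recognition network F_R producing 256-dimensional embeddings, both placeholders.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(target, rendered, disc):
    # L_adv = log D(I_0) + log(1 - D(R(alpha, beta, T, p)))
    eps = 1e-8  # avoid log(0)
    return (torch.log(disc(target) + eps)
            + torch.log(1.0 - disc(rendered) + eps)).mean()

def identity_loss(target, rendered, face_net):
    # L_id = 1 - cos(F_R(R(alpha, beta, T, p)), F_R(I_0))
    emb_target = face_net(target)      # (256,) embedding of the original
    emb_render = face_net(rendered)    # (256,) embedding of the rendering
    return 1.0 - F.cosine_similarity(emb_target, emb_render, dim=-1)
```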
Further, MTCNN is specifically adopted as the deep alignment network: the face image is input to obtain its 68 key point coordinates for the facial features.
Further, the face recognition network specifically adopts Light CNN-29v2.
Further, the facial features are specifically the eyes, eyebrows, nose, mouth, and contour.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method introduces a truly differentiable renderer; compared with an incompletely differentiable renderer, it can better learn the dense correspondence between the two-dimensional image and the three-dimensional face using a gradient descent algorithm, and obtains a higher-quality three-dimensional face reconstruction result.
2. For optimization-based methods, the input three-dimensional face parameters are too abstract for a neural network, and the generated image often lacks realism and identity consistency; the method of the invention adds a generative adversarial network branch that can generate images consistent in identity with the target image but different in pose, and uses the introduced generative adversarial loss and identity consistency loss to give the finally optimized three-dimensional face result higher realism and more detailed texture features.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a network block diagram of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in fig. 1 and fig. 2, a single-image three-dimensional face reconstruction method based on a differentiable renderer includes the following steps:
S1, inputting the target face image into a pre-trained regression network to obtain initialized three-dimensional face parameters; in this embodiment, this step is specifically as follows:
the target face image is input into the pre-trained three-dimensional face model, adopting the currently widely used Large Scale Face Model as the three-dimensional model, to obtain the initialized three-dimensional face parameters, including an identity parameter α, an expression parameter β, a texture parameter T, a pose parameter p_0, and the like;
the expression of the 3DMM shape model is:
S = S̄ + s_α·α + s_β·β
where S denotes the three-dimensional face shape, S̄ denotes the average face shape, s_α and s_β are the face identity and face expression basis vectors respectively, α is a 158-dimensional face identity coefficient, and β is a 29-dimensional face expression coefficient.
S2, inputting the initialized three-dimensional face parameters and the pose parameters into the differentiable renderer to respectively obtain a rendered image G_1 with the same pose as the input image and a rendered image G_2 with a different pose.
In this embodiment, G_1 is rendered from the initialized three-dimensional face parameters obtained in step S1, comprising the identity parameter α, the expression parameter β, the texture parameter T, and the pose parameter p_0. In the rendering process of G_2, the input identity, expression, and texture parameters are the same as for G_1, but the pose parameter p is obtained by random sampling: the pitch and yaw angles are sampled from U(-40°, 40°), i.e. uniformly distributed with values ranging from -40° to 40°, and the roll angle is sampled from U(-15°, 15°), as sketched below.
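A small sketch of this uniform pose sampling; the (pitch, yaw, roll) layout and the degree-to-radian conversion are illustrative assumptions.

```python
import numpy as np

def sample_pose():
    """Sample a random pose p = (pitch, yaw, roll)."""
    pitch = np.random.uniform(-40.0, 40.0)  # U(-40 deg, 40 deg)
    yaw = np.random.uniform(-40.0, 40.0)    # U(-40 deg, 40 deg)
    roll = np.random.uniform(-15.0, 15.0)   # U(-15 deg, 15 deg)
    return np.deg2rad([pitch, yaw, roll])   # radians for the renderer
```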
The differentiable renderer in this embodiment differs from the z-buffer renderer of traditional rasterization rendering. At present, renderers that take a three-dimensional mesh as input mostly adopt z-buffer rasterization. The principle of a traditional rasterization renderer is based on computer graphics rules: given an input three-dimensional mesh, a triangular patch is assigned to each pixel of the rendered image, interpolation is performed according to the attributes (illumination, texture, normal vector) of the vertices of the triangular patch corresponding to that pixel, and the color of each pixel is computed to obtain the rendered two-dimensional image. This traditional rasterization process is discrete and non-differentiable; in single-image three-dimensional reconstruction research, a traditional rasterization renderer therefore cannot optimize the reconstruction parameters with a gradient descent algorithm, leading to low reconstruction quality for faces with large poses (i.e., non-frontal and partially occluded). The differentiable renderer of this embodiment handles occlusion through an aggregation mechanism over the probabilistic contributions of all triangles, implemented as follows:
A probability map is constructed to estimate the contribution D_j^i of triangle f_j to pixel P_i:
D_j^i = sigmoid(δ_j^i · d²(i, j) / σ)
where δ_j^i = +1 if pixel P_i lies inside triangle f_j and -1 otherwise, σ is a scalar controlling the sharpness of the probability distribution, d(i, j) is the Euclidean distance from pixel P_i to the edges of f_j, and sigmoid(x) = 1/(1 + e^(-x)) maps its input into (0, 1); in this embodiment, σ is 0.01.
Finally, through the aggregation function, the rendered output I_i at pixel P_i is:
I_i = Σ_j w_j^i · T_j + w_b^i · T_b
where T_j is the pixel value contributed by triangle f_j from the texture map, T_b denotes the background color, and the weights w_j^i and w_b^i normalize the probability maps D_j^i together with relative depth over all triangles covering P_i, so that nearer triangles dominate and occlusion is handled.
S3, calculating the pixel-level loss L_pix and the key point loss L_lan between the rendered image with the same pose as the input image and the target face image.
The pixel-level error (pixel loss) is optimized based on pixel value differences and also optimizes the illumination parameters, such as the ambient color and the direction, distance, and color of the light source, helping to improve the recovery of texture features. The expression is:
L_pix = ||I_0 - R(α, β, T, p_0)||_1
where I_0 denotes the target face image and R denotes the renderer.
For the key point loss (landmark loss), a deep alignment network M is adopted to perform 68-key-point detection on a face image. In this embodiment, MTCNN is used as the deep alignment network: the original image and the rendered image are input to obtain their 68 key point coordinates for the facial features (eyes, eyebrows, nose, mouth, and contour), and the point-to-point Euclidean distances between corresponding key points are computed to optimize the reconstruction result so that the pose of the rendered image is consistent with that of the input image. The expression is:
L_lan = ||M(I_0) - M(R(α, β, T, p_0))||_2
where M(·) denotes inputting an image to the deep alignment network to obtain its 68 key point coordinates for the facial features.
S4, calculating the generative adversarial loss L_adv and the identity consistency loss L_id between the rendered image with a pose different from the input and the input image.
The generation path is regarded as the generator part of a generative adversarial network, and a discriminator D is designed; the discriminator has a common encoder network structure, and the produced rendered image and the original image are input to it, each finally yielding a probability that the image is a real image, used to judge whether the image was produced by the generator (real/fake). The generation-adversarial mechanism is used to optimize the renderer to generate more realistic face images, with the expression:
L_adv = log D(I_0) + log(1 - D(R(α, β, T, p)))
where R(α, β, T, p) is the image rendered by the renderer under the random pose parameter p, and D(·) denotes inputting an image to the discriminator, which judges whether the input is a real image and outputs the probability that it is real.
In the generation-adversarial mechanism, the training goal of the generator is to produce sufficiently realistic images, while the training goal of the discriminator is to judge the authenticity of images more accurately. The generator and the discriminator are in a game relationship: in this embodiment and the formula above, the goal of the generator (the renderer) is to make the rendered image generated under the random pose parameter p "fool" the discriminator, until the trained discriminator can no longer distinguish real images from generated rendered images; the formula expresses how the generator (renderer) is designed and optimized, as sketched below.
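As a hedged sketch of this game, one alternating round might look as follows; disc, render_fn, and the two optimizers are placeholders, and the non-saturating generator objective (-log D) is a common substitute for log(1 - D) and an assumption here, not the patent's stated update rule.

```python
import torch

def gan_round(real_img, render_fn, disc, d_opt, g_opt):
    eps = 1e-8  # avoid log(0)
    # Discriminator step: learn to judge real vs rendered more accurately
    fake = render_fn().detach()          # block gradients into the renderer
    d_loss = -(torch.log(disc(real_img) + eps)
               + torch.log(1.0 - disc(fake) + eps)).mean()
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()
    # Generator (renderer) step: make the rendering fool the discriminator
    g_loss = -torch.log(disc(render_fn()) + eps).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```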
A face recognition network F_R is trained; in this embodiment, Light CNN-29v2 is adopted as the face recognition network to extract the 256-dimensional face features (embeddings) of the input image and the rendered image, the cosine distance between them is calculated as the similarity, and an identity consistency loss function is designed to judge whether the two images belong to the same person.
The identity consistency loss function expression is:
L_id = 1 - cos(F_R(R(α, β, T, p)), F_R(I_0))
where cos(·,·) denotes the cosine similarity of two vectors, F_R denotes the face recognition network, and F_R(·) denotes inputting an image to the face recognition network to obtain its 256-dimensional face features.
And S5, optimizing the three-dimensional face parameters according to the calculated loss values, and returning the updated three-dimensional face parameters to the step S2 until iteration is converged to obtain the optimized three-dimensional face parameters.
It should also be noted that in this specification, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A single-image three-dimensional face reconstruction method based on a differentiable renderer is characterized by comprising the following steps:
s1, inputting the target face image into a pre-trained three-dimensional face model to obtain an initialized three-dimensional face parameter;
s2, inputting the initialized three-dimensional face parameters and the initialized posture parameters into a differentiable renderer to respectively obtain a rendered image with the same posture as the input image and rendered images with different postures;
s3, comparing the target face image with the rendered image under the same posture to obtain a key point loss value and a pixel-level loss value;
s4, comparing the target face image with the rendered images in different postures to obtain a generated confrontation loss value and an identity consistency loss value;
And S5, optimizing the three-dimensional face parameters according to the calculated loss values, and returning the updated three-dimensional face parameters to the step S2 until iteration is converged to obtain the optimized three-dimensional face parameters.
2. The method for reconstructing a single-image three-dimensional face based on a differentiable renderer according to claim 1, wherein step S1 specifically comprises:
inputting the target face image into the pre-trained Large Scale Face Model three-dimensional face model, and regressing the initialized three-dimensional face parameters, including an identity parameter α, an expression parameter β, a texture parameter T, and a pose parameter p_0;
the expression of the three-dimensional face model is:
S = S̄ + s_α·α + s_β·β
where S denotes the three-dimensional face shape, S̄ denotes the average face shape, s_α and s_β are the face identity and face expression basis vectors respectively, α is a 158-dimensional face identity coefficient, and β is a 29-dimensional face expression coefficient.
3. The differentiable-renderer-based single-image three-dimensional face reconstruction method according to claim 1, wherein the pose parameters are obtained by random sampling, wherein the pitch and yaw angles are sampled from U(-40°, 40°) and the roll angle is sampled from U(-15°, 15°).
4. The method of claim 1, wherein the initialized three-dimensional face parameters and pose parameters obtained by the regression are input into the differentiable renderer to respectively obtain a rendered image G_1 with the same pose as the input image and a rendered image G_2 with a different pose from the input image.
5. The method for reconstructing a single-image three-dimensional face based on a differentiable renderer according to claim 4, wherein the differentiable renderer handles occlusion through an aggregation mechanism over the probabilistic contributions of all triangles, implemented as follows:
a probability map is constructed to estimate the contribution D_j^i of triangle f_j to pixel P_i:
D_j^i = sigmoid(δ_j^i · d²(i, j) / σ)
where δ_j^i = +1 if pixel P_i lies inside triangle f_j and -1 otherwise, σ is a scalar controlling the sharpness of the probability distribution, d(i, j) is the Euclidean distance from pixel P_i to the edges of f_j, and sigmoid(x) = 1/(1 + e^(-x)) maps its input into (0, 1);
finally, through an aggregation function, the rendered output I_i at pixel P_i is:
I_i = Σ_j w_j^i · T_j + w_b^i · T_b
where T_j is the pixel value contributed by triangle f_j from the texture map and T_b denotes the background color.
6. The method for reconstructing a single-image three-dimensional face based on a differentiable renderer according to claim 1, wherein step S3 specifically comprises:
calculating the pixel-level loss L_pix and the key point loss L_lan between the rendered image with the same pose as the input image and the input original image;
the pixel-level error is optimized based on pixel value differences and also optimizes the illumination parameters; its expression is:
L_pix = ||I_0 - R(α, β, T, p_0)||_1
where I_0 denotes the input original image, R denotes the differentiable renderer, and R(α, β, T, p_0) is the image rendered by the renderer under the pose parameter p_0;
for the key point loss, a deep alignment network M is adopted to perform 68-key-point detection on a face image;
the original image and the rendered image are input to the deep alignment network to obtain their 68 key point coordinates for the facial features, and the point-to-point Euclidean distances between corresponding key points are computed to optimize the reconstruction result so that the pose of the rendered image is consistent with that of the input image; the expression is:
L_lan = ||M(I_0) - M(R(α, β, T, p_0))||_2
where M(·) denotes inputting an image to the deep alignment network to obtain its 68 key point coordinates for the facial features.
7. The method for reconstructing a single-image three-dimensional face based on a differentiable renderer according to claim 1, wherein step S4 specifically comprises:
calculating the generative adversarial loss L_adv and the identity consistency loss L_id of the rendered image with a pose different from the input;
the generation path is regarded as the generator of a generative adversarial network, and a discriminator D is designed to judge whether an image was produced by the generator; the generation-adversarial mechanism is used to optimize the renderer to generate more realistic face images, with the expression:
L_adv = log D(I_0) + log(1 - D(R(α, β, T, p)))
where R(α, β, T, p) is the image rendered by the renderer under the random pose parameter p, and D(·) denotes inputting an image to the discriminator, which judges whether the input is a real image and outputs the probability that it is real;
a face recognition network is trained to extract the 256-dimensional face features of the input image and the rendered image respectively, the cosine distance between them is calculated as the similarity, and an identity consistency loss function is designed to judge whether the two images belong to the same person;
the identity consistency loss function expression is:
L_id = 1 - cos(F_R(R(α, β, T, p)), F_R(I_0))
where cos(·,·) denotes the cosine similarity of two vectors, F_R denotes the face recognition network, and F_R(·) denotes inputting an image to the face recognition network to obtain its 256-dimensional face features.
8. The differentiable-renderer-based single-image three-dimensional face reconstruction method as claimed in claim 6, wherein MTCNN is used as the deep alignment network: the face image is input to obtain its 68 key point coordinates for the facial features.
9. The method for reconstructing a single-image three-dimensional face based on a differentiable renderer as claimed in claim 7, wherein the face recognition network specifically adopts Light CNN-29v2.
10. The differentiable-renderer-based single-image three-dimensional face reconstruction method according to claim 6 or 7, wherein the facial features are the eyes, eyebrows, nose, mouth, and contour.
CN202210365752.2A 2022-04-08 2022-04-08 Single-image three-dimensional face reconstruction method based on differentiable renderer Pending CN114842136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210365752.2A CN114842136A (en) 2022-04-08 2022-04-08 Single-image three-dimensional face reconstruction method based on differentiable renderer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210365752.2A CN114842136A (en) 2022-04-08 2022-04-08 Single-image three-dimensional face reconstruction method based on differentiable renderer

Publications (1)

Publication Number Publication Date
CN114842136A true CN114842136A (en) 2022-08-02

Family

ID=82564953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210365752.2A Pending CN114842136A (en) 2022-04-08 2022-04-08 Single-image three-dimensional face reconstruction method based on differentiable renderer

Country Status (1)

Country Link
CN (1) CN114842136A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588108A (en) * 2022-11-02 2023-01-10 上海人工智能创新中心 Method, electronic device and medium for generating sequence images
CN115588108B (en) * 2022-11-02 2024-05-14 上海人工智能创新中心 Method, electronic equipment and medium for generating sequence image
CN116206035A (en) * 2023-01-12 2023-06-02 北京百度网讯科技有限公司 Face reconstruction method, device, electronic equipment and storage medium
CN116206035B (en) * 2023-01-12 2023-12-01 北京百度网讯科技有限公司 Face reconstruction method, device, electronic equipment and storage medium
CN116363329A (en) * 2023-03-08 2023-06-30 广州中望龙腾软件股份有限公司 Three-dimensional image generation method and system based on CGAN and LeNet-5
CN116363329B (en) * 2023-03-08 2023-11-03 广州中望龙腾软件股份有限公司 Three-dimensional image generation method and system based on CGAN and LeNet-5
CN116978102A (en) * 2023-08-04 2023-10-31 深圳市英锐存储科技有限公司 Face feature modeling and recognition method, chip and terminal
CN116993929A (en) * 2023-09-27 2023-11-03 北京大学深圳研究生院 Three-dimensional face reconstruction method and device based on human eye dynamic change and storage medium
CN116993929B (en) * 2023-09-27 2024-01-16 北京大学深圳研究生院 Three-dimensional face reconstruction method and device based on human eye dynamic change and storage medium
CN118115638A (en) * 2024-01-24 2024-05-31 广州紫为云科技有限公司 Monocular three-dimensional facial expression driving system based on deep learning and optimizing method

Similar Documents

Publication Publication Date Title
US11302064B2 (en) Method and apparatus for reconstructing three-dimensional model of human body, and storage medium
CN114842136A (en) Single-image three-dimensional face reconstruction method based on differentiable renderer
US9679192B2 (en) 3-dimensional portrait reconstruction from a single photo
Shi et al. Automatic acquisition of high-fidelity facial performances using monocular videos
CN113269862B (en) Scene self-adaptive fine three-dimensional face reconstruction method, system and electronic equipment
WO2020108304A1 (en) Method for reconstructing face mesh model, device, apparatus and storage medium
Liu et al. Humangaussian: Text-driven 3d human generation with gaussian splatting
JP2006520054A (en) Image matching from invariant viewpoints and generation of 3D models from 2D images
CN113111861A (en) Face texture feature extraction method, 3D face reconstruction method, device and storage medium
CN108564619B (en) Realistic three-dimensional face reconstruction method based on two photos
Jin et al. Robust 3D face modeling and reconstruction from frontal and side images
CN114648613A (en) Three-dimensional head model reconstruction method and device based on deformable nerve radiation field
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
US10521892B2 (en) Image lighting transfer via multi-dimensional histogram matching
CN115861525A (en) Multi-view face reconstruction method based on parameterized model
CN111402403B (en) High-precision three-dimensional face reconstruction method
Song et al. A generic framework for efficient 2-D and 3-D facial expression analogy
CN115953513A (en) Method, device, equipment and medium for reconstructing drivable three-dimensional human head model
CN118196306A (en) 3D modeling reconstruction system, method and device based on point cloud information and Gaussian cloud cluster
Wang et al. Digital twin: Acquiring high-fidelity 3D avatar from a single image
Feng et al. Fdgaussian: Fast gaussian splatting from single image via geometric-aware diffusion model
Ma et al. X-dreamer: Creating high-quality 3d content by bridging the domain gap between text-to-2d and text-to-3d generation
Jeong et al. Automatic generation of subdivision surface head models from point cloud data
CN115984510A (en) Stylized face texture modeling method, system, equipment and storage medium
CN111611997B (en) Cartoon customized image motion video generation method based on human body action migration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination