CN114842136A - Single-image three-dimensional face reconstruction method based on differentiable renderer - Google Patents
- Publication number
- CN114842136A (application CN202210365752.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- face
- dimensional face
- renderer
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06T15/005 — 3D image rendering; general purpose rendering architectures
- G06V40/161 — Human faces; detection, localisation, normalisation
- G06V40/168 — Human faces; feature extraction, face representation
Abstract
The invention discloses a single-image three-dimensional face reconstruction method based on a differentiable renderer, which comprises the following steps: S1, inputting the target face image into a pre-trained regression network to obtain initialized three-dimensional face parameters; S2, inputting the three-dimensional face parameters and pose parameters into a differentiable renderer to obtain a rendered image with the same pose as the input image and rendered images with different poses; S3, comparing the target image with the rendered image of the same pose to obtain a keypoint loss value and a pixel-level loss value; S4, comparing the target image with the rendered images of different poses to obtain an adversarial loss value and an identity consistency loss value; S5, optimizing the three-dimensional face parameters according to the calculated loss values, and returning the updated parameters to step S2 until the iteration converges, yielding the optimized three-dimensional face parameters. By introducing a truly differentiable renderer, the invention obtains a higher-quality three-dimensional face reconstruction result.
Description
Technical Field
The invention belongs to the field of image processing, and relates to a single-image three-dimensional face reconstruction method based on a differentiable renderer.
Background
With the continuous development of face recognition technology, three-dimensional face reconstruction has gradually become an important application branch of computer graphics. Traditional three-dimensional face reconstruction mainly depends on expensive three-dimensional scanning equipment and extensive manual post-processing. How to reconstruct a three-dimensional face model quickly and accurately from a two-dimensional face picture is therefore a research focus.
The most advanced three-dimensional face reconstruction methods at present can be roughly divided into two types: learning-based methods and optimization-based methods. Deep-learning-based methods usually adopt a regression scheme, taking a face image as input and learning to regress the corresponding three-dimensional face model parameters. However, these methods usually require a large amount of labelled data, and ground-truth three-dimensional face model parameters are difficult to acquire. Optimization-based methods, on the other hand, treat face imaging as a generative process: a series of parameters (including face geometry, albedo, texture, illumination, viewing angle, and so on) is taken as input, a rendered image is generated according to fixed graphics rules, and the input parameters are optimized by minimizing the distance between the rendered image and the target image.
Recent developments in differentiable renderers provide an efficient optimization framework for both types of face reconstruction method. In learning-based methods, the regressed parameters can be rendered into an image and optimized with a pixel loss, enabling unsupervised training. For optimization-based methods, the differentiable renderer introduces gradient-based optimization, allowing more complex loss functions to be employed and stabilizing the training process. However, most existing methods simply use z-buffer rendering, which is not truly differentiable, especially when the triangle covering a pixel changes during optimization. As a result, the reconstructed three-dimensional faces lack some realism and some texture details (hair, eyebrows, wrinkles).
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art by providing a single-image three-dimensional face reconstruction method based on a differentiable renderer, solving the problem that three-dimensional face reconstruction cannot otherwise be realized accurately.
In order to achieve the purpose, the invention adopts the following technical scheme:
A single-image three-dimensional face reconstruction method based on a differentiable renderer comprises the following steps:
S1, inputting the target face image into a pre-trained three-dimensional face model to obtain initialized three-dimensional face parameters;
S2, inputting the initialized three-dimensional face parameters and the pose parameters into a differentiable renderer to obtain a rendered image with the same pose as the input image and rendered images with different poses;
S3, comparing the target face image with the rendered image of the same pose to obtain a keypoint loss value and a pixel-level loss value;
S4, comparing the target face image with the rendered images of different poses to obtain an adversarial loss value and an identity consistency loss value;
S5, optimizing the three-dimensional face parameters according to the calculated loss values, and returning the updated parameters to step S2 until the iteration converges, yielding the optimized three-dimensional face parameters.
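The iterative procedure of steps S2 to S5 can be sketched in code. The sketch below is an illustration, not the patent's implementation: the toy renderer, the loss weights and the finite-difference gradient (standing in for backpropagation through a differentiable renderer) are all assumptions, and the keypoint, adversarial and identity terms are left as zero-valued stubs since they require auxiliary networks.

```python
import numpy as np

def total_loss(params, target, render, w=(1.0, 1.0, 0.1, 0.5)):
    """Weighted sum of the four loss terms; the weights are assumed, not from the patent."""
    same_pose = render(params, pose="input")
    l_pix = np.abs(target - same_pose).mean()  # pixel-level L1 loss (step S3)
    l_lan = 0.0  # keypoint loss stub: needs a face-alignment network (step S3)
    l_adv = 0.0  # adversarial loss stub: needs a discriminator (step S4)
    l_id = 0.0   # identity loss stub: needs a face-recognition network (step S4)
    return w[0] * l_pix + w[1] * l_lan + w[2] * l_adv + w[3] * l_id

def optimize(params, target, render, lr=0.5, iters=200, h=1e-4):
    """Central-difference gradient descent over the face parameters (step S5)."""
    params = params.astype(float)
    for _ in range(iters):
        grad = np.zeros_like(params)
        for i in range(params.size):
            e = np.zeros_like(params)
            e[i] = h
            grad[i] = (total_loss(params + e, target, render)
                       - total_loss(params - e, target, render)) / (2 * h)
        params -= lr * grad
    return params

# Toy "renderer": the rendered image is just the parameter vector itself.
toy_render = lambda p, pose: p
target = np.array([0.3, -0.2, 0.7])
fitted = optimize(np.zeros(3), target, toy_render)
```

With a real differentiable renderer, the finite-difference loop would be replaced by automatic differentiation, which is precisely what the renderer's differentiability enables.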
Further, step S1 specifically includes:
inputting the target face image into a pre-trained Large Scale Face Model three-dimensional face model, and regressing initialized three-dimensional face parameters including an identity parameter α, an expression parameter β, a texture parameter T and a pose parameter p_0;
the expression of the three-dimensional face model is as follows:
S = S̄ + s_α·α + s_β·β
where S denotes the three-dimensional face shape, S̄ the mean face shape, s_α and s_β the face identity and facial expression basis vectors respectively, α a 158-dimensional face identity coefficient, and β a 29-dimensional facial expression coefficient.
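As a concrete illustration of the linear model above, the snippet below builds a toy shape from randomly generated bases; the tiny dimensions (a 9-value shape vector, 3 identity and 2 expression coefficients instead of 158 and 29) are placeholders, not the real model data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 9                                  # toy flattened shape: 3 vertices x 3 coords
mean_shape = rng.normal(size=n)        # S-bar: the mean face shape
id_basis = rng.normal(size=(n, 3))     # s_alpha: identity basis (158-dim in the patent)
exp_basis = rng.normal(size=(n, 2))    # s_beta: expression basis (29-dim in the patent)

alpha = np.array([0.5, -0.1, 0.2])     # identity coefficients
beta = np.array([0.3, 0.0])            # expression coefficients

# S = S-bar + s_alpha * alpha + s_beta * beta
S = mean_shape + id_basis @ alpha + exp_basis @ beta
```

Setting β to zero recovers the identity-only, neutral-expression shape, which is how the linear decomposition separates who the face is from what it is doing.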
Further, the pose parameters are obtained by random sampling: the pitch and yaw angles are sampled from U(-40°, 40°), and the roll angle is sampled from U(-15°, 15°).
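A minimal sketch of this sampling scheme (angles in degrees, uniform distributions as stated above):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_pose(rng):
    """Random pose in degrees: pitch, yaw ~ U(-40, 40); roll ~ U(-15, 15)."""
    pitch, yaw = rng.uniform(-40.0, 40.0, size=2)
    roll = rng.uniform(-15.0, 15.0)
    return pitch, yaw, roll

poses = [sample_pose(rng) for _ in range(1000)]
```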
Further, the initialized three-dimensional face parameters and the pose parameters obtained by regression are input into the differentiable renderer to obtain a rendered image G_1 with the same pose as the input image and a rendered image G_2 with a different pose from the input image.
Further, the differentiable renderer handles occlusion through an aggregation mechanism over the probabilistic contributions of all triangles, implemented as follows:
a probability map is constructed to estimate the contribution D_j^i of triangle f_j to pixel P_i:
D_j^i = sigmoid(δ_ij · d²(i, j) / σ)
where δ_ij is +1 if pixel P_i lies inside triangle f_j and -1 otherwise, σ is a scalar controlling the sharpness of the probability distribution, d(i, j) is the Euclidean distance from pixel P_i to the edges of f_j, and sigmoid(·) maps its input into (0, 1);
finally, through an aggregation function, the rendered output I_i at pixel P_i is:
I_i = Σ_j w_j^i·T_j + w_b^i·T_b
where T_j is the pixel value on the texture map for triangle f_j, T_b denotes the background color, and the weights w_j^i (normalized together with the background weight w_b^i) combine the probability maps with the relative depths of the triangles at P_i.
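The probability map and aggregation above can be sketched for a single pixel as follows. This follows the standard soft-rasterizer formulation; the sign convention δ_ij = +1 inside / -1 outside, the depth-softmax weighting with temperature γ, and the unit background factor (exp(ε/γ) with ε = 0) are assumptions filled in from that formulation rather than stated in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_pixel(dists, inside, depths, tex, t_bg, sigma=0.01, gamma=0.1):
    """Soft contribution of all triangles to one pixel P_i.

    dists:  Euclidean distance from P_i to each triangle's nearest edge
    inside: +1 where P_i lies inside the triangle, -1 where outside
    depths: per-triangle depth at P_i (larger = closer, an assumed convention)
    tex:    per-triangle texture value T_j;  t_bg: background value T_b
    """
    D = sigmoid(inside * dists**2 / sigma)   # probability map D_j^i in (0, 1)
    z = np.exp(depths / gamma)               # depth-softmax factor
    raw = D * z
    denom = raw.sum() + 1.0                  # background gets exp(eps/gamma), eps = 0
    w = raw / denom                          # triangle weights w_j^i
    w_bg = 1.0 / denom                       # background weight w_b^i
    return (w * tex).sum() + w_bg * t_bg     # aggregated output I_i

# Pixel inside a near triangle (index 0), outside a farther one (index 1):
out = soft_pixel(np.array([0.05, 0.2]),
                 np.array([1.0, -1.0]),
                 np.array([1.0, 0.2]),
                 np.array([0.9, 0.1]),
                 t_bg=0.0)
```

Here the covering, nearer triangle dominates the aggregation, so `out` lands close to its texture value 0.9, while every quantity involved remains a smooth function of the geometry.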
Further, step S3 is specifically:
calculating the pixel-level loss L_pix and the keypoint loss L_lan between the generated rendered image with the same pose as the input image and the input original image;
the pixel-level loss is computed from pixel value differences and also optimizes the illumination parameters; the expression is as follows:
L_pix = ||I_0 - R(α, β, T, p_0)||_1
where I_0 denotes the input original image, R denotes the differentiable renderer, and R(α, β, T, p_0) is the image rendered under the initial pose parameter p_0;
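A minimal sketch of this L1 pixel term, with the renderer's output passed in directly; the per-pixel mean reduction is a choice made here, since the formula only specifies the L1 norm.

```python
import numpy as np

def pixel_loss(target, rendered):
    """L_pix = ||I_0 - R(alpha, beta, T, p_0)||_1, taken as the mean absolute
    per-pixel difference between the target and the rendered image."""
    return np.abs(target.astype(float) - rendered.astype(float)).mean()

target = np.full((4, 4, 3), 0.5)
rendered = np.full((4, 4, 3), 0.25)
loss = pixel_loss(target, rendered)
```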
for the keypoint loss, a deep alignment network M performs 68-keypoint detection on the face image;
the original image and the rendered image are input to the deep alignment network to obtain the 68 keypoint coordinates of the facial features in each image, and the point-to-point Euclidean distances between corresponding keypoints are minimized to optimize the reconstruction result, making the pose of the rendered image consistent with that of the input image; the expression is as follows:
L_lan = ||M(I_0) - M(R(α, β, T, p_0))||_2
where M(·) denotes inputting an image to the deep alignment network to obtain its 68 keypoint coordinates of the facial features.
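A minimal sketch of the keypoint term, with the 68 detected coordinates passed in directly (the alignment network M itself is out of scope here); reducing the 68 point-to-point Euclidean distances by their mean is an interpretation, since the formula does not spell out the reduction.

```python
import numpy as np

def landmark_loss(kp_a, kp_b):
    """L_lan over 68 (x, y) keypoints: mean point-to-point Euclidean distance."""
    return np.linalg.norm(kp_a - kp_b, axis=1).mean()

kp_input = np.zeros((68, 2))
kp_rendered = kp_input + np.array([3.0, 4.0])  # every point offset by a 3-4-5 shift
loss = landmark_loss(kp_input, kp_rendered)
```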
Further, step S4 is specifically:
calculating, for the generated rendered image whose pose differs from the input, the adversarial loss L_adv and the identity consistency loss L_id:
the rendering branch is regarded as the generator of a generative adversarial network, and a discriminator D is designed to judge whether an image was produced by the generator; the generator-adversarial mechanism drives the renderer to produce more realistic face images, with the expression:
L_adv = log D(I_0) + log(1 - D(R(α, β, T, p)))
where R(α, β, T, p) is the image rendered under the random pose parameter p, and D(·) denotes inputting an image to the discriminator, which judges whether the input is a real image and outputs the probability that it is real;
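A minimal numeric sketch of L_adv, with the discriminator reduced to the probabilities it outputs; the small `eps` guard is an addition to avoid log(0), not part of the formula.

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """L_adv = log D(I_0) + log(1 - D(R(alpha, beta, T, p))).

    d_real: discriminator probability that the target image is real
    d_fake: discriminator probability that the rendered image is real
    """
    return np.log(d_real + eps) + np.log(1.0 - d_fake + eps)

confident = adversarial_loss(1.0, 0.0)  # discriminator is never fooled
fooled = adversarial_loss(0.5, 0.5)     # discriminator reduced to guessing
```

From the generator's side, driving `d_fake` upward pushes the loss down, which is the "fool the discriminator" direction described in the text.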
a face recognition network is trained to extract 256-dimensional face features from the input image and the rendered image respectively; the cosine similarity between the two feature vectors is computed, and an identity consistency loss function is designed to judge whether the two images belong to the same person:
the identity consistency loss function expression is:
L_id = 1 - cos(F_R(R(α, β, T, p)), F_R(I_0))
where cos(·, ·) denotes the cosine similarity of two vectors and F_R(·) denotes inputting an image to the face recognition network F_R to obtain its 256-dimensional face feature.
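A minimal sketch of L_id on precomputed 256-dimensional embeddings; the recognition network F_R itself is out of scope and stands in as the vectors it would produce.

```python
import numpy as np

def identity_loss(feat_a, feat_b):
    """L_id = 1 - cos(F_R(rendered), F_R(target)) on 256-dim face embeddings."""
    cos_sim = feat_a @ feat_b / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b))
    return 1.0 - cos_sim

f = np.random.default_rng(1).normal(size=256)
```

Identical embeddings give a loss of 0, opposite embeddings give 2, so minimizing the loss pulls the rendered face toward the target identity.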
Further, MTCNN is specifically used as the deep alignment network: the face image is input to obtain its 68 keypoint coordinates of the facial features.
Further, the face recognition network specifically adopts Light CNN-29v 2.
Further, the human face features are specifically eyes, eyebrows, nose, mouth, and contours.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method introduces a truly differentiable renderer; compared with an incompletely differentiable renderer, the dense correspondence between the two-dimensional image and the three-dimensional face can be learned better with a gradient descent algorithm, yielding a higher-quality three-dimensional face reconstruction result.
2. For an optimization-based method, the input three-dimensional face parameters are too abstract for a neural network, and the generated image often lacks realism and identity consistency; the method of the invention adds a generative adversarial network branch that can generate images identity-consistent with the target image but under different poses, and the introduced adversarial loss and identity consistency loss give the finally optimized three-dimensional face result higher realism and more detailed texture features.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a network block diagram of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in fig. 1 and fig. 2, a single-image three-dimensional face reconstruction method based on a differentiable renderer includes the following steps:
S1, inputting the target face image into a pre-trained regression network to obtain initialized three-dimensional face parameters; in this embodiment, the concrete steps are as follows:
the target face image is input into a pre-trained three-dimensional face model; the currently common Large Scale Face Model is adopted as the three-dimensional model, yielding initialized three-dimensional face parameters including an identity parameter α, an expression parameter β, a texture parameter T, a pose parameter p_0, and so on;
the expression of the 3DMM shape model is as follows:
S = S̄ + s_α·α + s_β·β
where S denotes the three-dimensional face shape, S̄ the mean face shape, s_α and s_β the face identity and facial expression basis vectors respectively, α a 158-dimensional face identity coefficient, and β a 29-dimensional facial expression coefficient.
S2, inputting the initialized three-dimensional face parameters and pose parameters into the differentiable renderer to obtain a rendered image G_1 with the same pose as the input image and a rendered image G_2 with a different pose;
in this embodiment, G_1 is rendered from the three-dimensional face parameters initialized in step S1, namely the identity parameter α, the expression parameter β, the texture parameter T and the pose parameter p_0. During the rendering of G_2, the input identity, expression and texture parameters are the same as in the rendering of G_1, but the pose parameter p is obtained by random sampling: the pitch and yaw angles are sampled from U(-40°, 40°), i.e. a uniform distribution ranging from -40° to 40°, and the roll angle is sampled from U(-15°, 15°).
The differentiable renderer in this embodiment differs from the z-buffer renderer of traditional rasterization. Renderers that take a three-dimensional mesh as input mostly adopt z-buffer rasterization at present. The principle of a traditional rasterization renderer follows the rules of computer graphics: a three-dimensional mesh is input, a triangular patch is assigned to each pixel of the rendered image, and the color of each pixel is computed by interpolating the attributes (illumination, texture, normal vector) of the vertices of the triangle corresponding to that pixel, yielding the rendered two-dimensional image. Traditional rasterization is a discrete, non-differentiable process; in research on single-image three-dimensional reconstruction, a traditional rasterization renderer cannot optimize the reconstruction parameters with a gradient descent algorithm, leading to low reconstruction quality for large-pose faces (i.e. non-frontal, partially occluded faces). The differentiable renderer of this embodiment instead handles occlusion through an aggregation mechanism over the probabilistic contributions of all triangles, implemented as follows:
a probability map is constructed to estimate the contribution D_j^i of triangle f_j to pixel P_i:
D_j^i = sigmoid(δ_ij · d²(i, j) / σ)
where δ_ij is +1 if pixel P_i lies inside triangle f_j and -1 otherwise, σ is a scalar controlling the sharpness of the probability distribution, d(i, j) is the Euclidean distance from pixel P_i to the edges of f_j, and sigmoid(·) maps its input into (0, 1); in this embodiment, σ = 0.01.
Finally, through an aggregation function, the rendered output I_i at pixel P_i is:
I_i = Σ_j w_j^i·T_j + w_b^i·T_b
where T_j is the pixel value on the texture map for triangle f_j, T_b denotes the background color, and the weights w_j^i (normalized together with the background weight w_b^i) combine the probability maps with the relative depths of the triangles at P_i.
S3, calculating the pixel-level loss L_pix and the keypoint loss L_lan between the generated rendered image with the same pose as the input image and the target face image;
the pixel-level loss (pixel loss) is computed from pixel value differences; it also optimizes illumination parameters such as the ambient color and the direction, distance and color of the light source, which helps improve the recovery of texture features. The expression is as follows:
L_pix = ||I_0 - R(α, β, T, p_0)||_1
where I_0 denotes the target face image and R denotes the renderer;
for the keypoint loss (landmark loss), a deep alignment network M performs 68-keypoint detection on the face image. In this embodiment, MTCNN is used as the deep alignment network: the original image and the rendered image are input to obtain the 68 keypoint coordinates of the facial features (eyes, eyebrows, nose, mouth and contour) in each image, and the point-to-point Euclidean distances between corresponding keypoints are minimized to optimize the reconstruction result, making the pose of the rendered image consistent with that of the input image; the expression is as follows:
L_lan = ||M(I_0) - M(R(α, β, T, p_0))||_2
where M(·) denotes inputting an image to the deep alignment network to obtain its 68 keypoint coordinates of the facial features.
S4, calculating the adversarial loss L_adv and the identity consistency loss L_id between the input image and the generated rendered image whose pose differs from the input:
the rendering branch is regarded as the generator part (Generator) of a generative adversarial network, and a discriminator D with a generic discriminator network structure is designed; the rendered image and the original image are both input to D, which finally outputs a probability representing how likely the image is to be real, judging whether the image was produced by the generator (real/fake); the generator-adversarial mechanism drives the renderer to produce more realistic face images, with the expression:
L_adv = log D(I_0) + log(1 - D(R(α, β, T, p)))
where R(α, β, T, p) is the image rendered under the random pose parameter p, and D(·) denotes inputting an image to the discriminator, which judges whether the input is a real image and outputs the probability that it is real.
In the generator-adversarial mechanism, the training objective of the generator is to produce sufficiently realistic images, while the training objective of the discriminator is to judge image authenticity ever more accurately. The generator and the discriminator play a game: in this embodiment and the formula above, the goal of the generator (the renderer) is for the rendered image generated under the random pose parameter p to "fool" the discriminator, until the trained discriminator can no longer distinguish real images from generated rendered images; the formula expresses the design and optimization of the generator (renderer).
A face recognition network F_R is trained; in this embodiment, Light CNN-29v2 is adopted as the face recognition network to extract 256-dimensional face features (embeddings) from the input image and the rendered image, the cosine similarity between them is computed, and an identity consistency loss function is designed to judge whether the two images belong to the same person:
the identity consistency loss function expression is:
L_id = 1 - cos(F_R(R(α, β, T, p)), F_R(I_0))
where cos(·, ·) denotes the cosine similarity of two vectors and F_R(·) denotes inputting an image to the face recognition network to obtain its 256-dimensional face feature.
And S5, optimizing the three-dimensional face parameters according to the calculated loss values, and returning the updated three-dimensional face parameters to the step S2 until iteration is converged to obtain the optimized three-dimensional face parameters.
It should also be noted that in this specification, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A single-image three-dimensional face reconstruction method based on a differentiable renderer is characterized by comprising the following steps:
S1, inputting the target face image into a pre-trained three-dimensional face model to obtain initialized three-dimensional face parameters;
S2, inputting the initialized three-dimensional face parameters and the pose parameters into a differentiable renderer to obtain a rendered image with the same pose as the input image and rendered images with different poses;
S3, comparing the target face image with the rendered image of the same pose to obtain a keypoint loss value and a pixel-level loss value;
S4, comparing the target face image with the rendered images of different poses to obtain an adversarial loss value and an identity consistency loss value;
S5, optimizing the three-dimensional face parameters according to the calculated loss values, and returning the updated parameters to step S2 until the iteration converges, yielding the optimized three-dimensional face parameters.
2. The method for reconstructing a single-image three-dimensional face based on a differentiable renderer according to claim 1, wherein the step S1 specifically comprises:
inputting the target face image into a pre-trained Large Scale Face Model three-dimensional face model, and regressing initialized three-dimensional face parameters including an identity parameter α, an expression parameter β, a texture parameter T and a pose parameter p_0;
the expression of the three-dimensional face model being:
S = S̄ + s_α·α + s_β·β
where S denotes the three-dimensional face shape, S̄ the mean face shape, and s_α and s_β the face identity and facial expression basis vectors respectively.
3. The differentiable-renderer-based single-image three-dimensional face reconstruction method according to claim 1, wherein the pose parameters are obtained by random sampling, the pitch and yaw angles being sampled from U(-40°, 40°) and the roll angle from U(-15°, 15°).
4. The differentiable-renderer-based single-image three-dimensional face reconstruction method according to claim 1, wherein the initialized three-dimensional face parameters and the pose parameters obtained by regression are input into the differentiable renderer to obtain a rendered image G_1 with the same pose as the input image and a rendered image G_2 with a different pose from the input image.
5. The differentiable-renderer-based single-image three-dimensional face reconstruction method according to claim 4, wherein the differentiable renderer handles occlusion through an aggregation mechanism over the probabilistic contributions of all triangles, implemented as follows:
a probability map is constructed to estimate the contribution D_j^i of triangle f_j to pixel P_i:
D_j^i = sigmoid(δ_ij · d²(i, j) / σ)
where δ_ij is +1 if pixel P_i lies inside triangle f_j and -1 otherwise, σ is a scalar controlling the sharpness of the probability distribution, d(i, j) is the Euclidean distance from pixel P_i to the edges of f_j, and sigmoid(·) maps its input into (0, 1);
finally, through an aggregation function, the rendered output I_i at pixel P_i is:
I_i = Σ_j w_j^i·T_j + w_b^i·T_b
where T_j is the pixel value on the texture map and T_b denotes the background color.
6. The method for reconstructing a single-image three-dimensional face based on a differentiable renderer according to claim 1, wherein the step S3 specifically comprises:
calculating the pixel-level loss L_pix and the keypoint loss L_lan between the generated rendered image with the same pose as the input image and the input original image;
the pixel-level loss is computed from pixel value differences and also optimizes the illumination parameters; the expression is as follows:
L_pix = ||I_0 - R(α, β, T, p_0)||_1
where I_0 denotes the input original image, R denotes the differentiable renderer, and R(α, β, T, p_0) is the image rendered under the initial pose parameter p_0;
for the keypoint loss, a deep alignment network M performs 68-keypoint detection on the face image;
the original image and the rendered image are input to the deep alignment network to obtain the 68 keypoint coordinates of the facial features in each image, and the point-to-point Euclidean distances between corresponding keypoints are minimized to optimize the reconstruction result, making the pose of the rendered image consistent with that of the input image; the expression is as follows:
L_lan = ||M(I_0) - M(R(α, β, T, p_0))||_2
where M(·) denotes inputting an image to the deep alignment network to obtain its 68 keypoint coordinates of the facial features.
7. The method for reconstructing a single-image three-dimensional face based on a differentiable renderer according to claim 1, wherein the step S4 specifically comprises:
calculating the generation resisting loss L of the generated rendering image different from the input posture adv Loss of identity consistency L id :
The generation route is regarded as a generator part for generating the confrontation network, a discriminator D is designed to judge whether the generator generates an image or not, a generation-confrontation mechanism is utilized to optimize a renderer to generate a more realistic face image, and the expression is as follows:
L adv =log D(I 0 )+log(1-D(R(α,β,T,p)))
wherein R (alpha, beta, T, p) is a rendering image generated by the renderer under the random attitude parameter p; d () represents an input discriminator which judges whether an input image is a real image, to obtain a probability of being a real image;
A face recognition network is trained to extract 256-dimensional face features from the input image and the rendered image respectively; the cosine similarity of the two feature vectors is computed as their similarity measure, and an identity consistency loss function is designed to judge whether the two images belong to the same person:
the identity consistency loss function expression is:
L_id = 1 - cos(F_R(R(α, β, T, p)), F_R(I_0))
where cos(·, ·) denotes the cosine similarity of two vectors, F_R denotes the face recognition network, and F_R(·) denotes feeding an image to the face recognition network to obtain its 256-dimensional face features.
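The identity term is one minus the cosine similarity of two embeddings; a minimal NumPy sketch follows (illustrative, not part of the claims). The embeddings stand in for the outputs of F_R; the claim specifies 256-dimensional features, but the formula itself is dimension-agnostic.

```python
import numpy as np

def identity_loss(feat_rendered: np.ndarray, feat_input: np.ndarray) -> float:
    """Identity consistency loss 1 - cos(F_R(R(...)), F_R(I_0)),
    where the arguments are face embeddings (e.g., 256-dimensional)."""
    cos_sim = float(np.dot(feat_rendered, feat_input)
                    / (np.linalg.norm(feat_rendered) * np.linalg.norm(feat_input)))
    return 1.0 - cos_sim
```

Identical embeddings give a loss of 0 (same identity); orthogonal embeddings give a loss of 1.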
8. The differentiable-renderer-based single-image three-dimensional face reconstruction method according to claim 6, wherein MTCNN is used as the deep alignment network, taking the face image as input and obtaining its 68 keypoint coordinates for the facial features.
9. The differentiable-renderer-based single-image three-dimensional face reconstruction method according to claim 7, wherein the face recognition network specifically adopts Light CNN-29v2.
10. The differentiable-renderer-based single-image three-dimensional face reconstruction method according to claim 6 or 7, wherein the facial features are the eyes, eyebrows, nose, mouth, and contour.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210365752.2A CN114842136A (en) | 2022-04-08 | 2022-04-08 | Single-image three-dimensional face reconstruction method based on differentiable renderer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114842136A true CN114842136A (en) | 2022-08-02 |
Family
ID=82564953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210365752.2A Pending CN114842136A (en) | 2022-04-08 | 2022-04-08 | Single-image three-dimensional face reconstruction method based on differentiable renderer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114842136A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115588108A (en) * | 2022-11-02 | 2023-01-10 | 上海人工智能创新中心 | Method, electronic device and medium for generating sequence images |
CN115588108B (en) * | 2022-11-02 | 2024-05-14 | 上海人工智能创新中心 | Method, electronic equipment and medium for generating sequence image |
CN116206035A (en) * | 2023-01-12 | 2023-06-02 | 北京百度网讯科技有限公司 | Face reconstruction method, device, electronic equipment and storage medium |
CN116206035B (en) * | 2023-01-12 | 2023-12-01 | 北京百度网讯科技有限公司 | Face reconstruction method, device, electronic equipment and storage medium |
CN116363329A (en) * | 2023-03-08 | 2023-06-30 | 广州中望龙腾软件股份有限公司 | Three-dimensional image generation method and system based on CGAN and LeNet-5 |
CN116363329B (en) * | 2023-03-08 | 2023-11-03 | 广州中望龙腾软件股份有限公司 | Three-dimensional image generation method and system based on CGAN and LeNet-5 |
CN116978102A (en) * | 2023-08-04 | 2023-10-31 | 深圳市英锐存储科技有限公司 | Face feature modeling and recognition method, chip and terminal |
CN116993929A (en) * | 2023-09-27 | 2023-11-03 | 北京大学深圳研究生院 | Three-dimensional face reconstruction method and device based on human eye dynamic change and storage medium |
CN116993929B (en) * | 2023-09-27 | 2024-01-16 | 北京大学深圳研究生院 | Three-dimensional face reconstruction method and device based on human eye dynamic change and storage medium |
CN118115638A (en) * | 2024-01-24 | 2024-05-31 | 广州紫为云科技有限公司 | Monocular three-dimensional facial expression driving system based on deep learning and optimizing method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |