CN117557721A

CN117557721A - A single image detail three-dimensional face reconstruction method, system, equipment and medium

Info

Publication number: CN117557721A
Application number: CN202311463711.8A
Authority: CN
Inventors: 李慧斌; 王静婷; 余璀璨
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2023-11-06
Filing date: 2023-11-06
Publication date: 2024-02-13

Abstract

The invention discloses a method, system, equipment and medium for detailed three-dimensional face reconstruction from a single image. Three-dimensional face reconstruction, rendering and feature extraction are performed on a two-dimensional face image to obtain global features. Ray matching is performed on the images rendered from the reconstructed three-dimensional face, and the signed distance values of the face vertices and the local position features are calculated. An implicit bidirectional reflection function is obtained from the global features and the local position features, and SDF-Net and each component network of the implicit bidirectional reflection function are optimized. The detailed three-dimensional face geometric model and the color value of each vertex are then calculated and combined to form a detailed three-dimensional face model with high-fidelity texture. The invention achieves three-dimensional face reconstruction from a single input image, eliminating steps such as dataset construction and preprocessing; because the face texture is represented by an implicit bidirectional reflection function, high-fidelity three-dimensional face texture can be obtained.

Description

Method, system, equipment and medium for reconstructing detail three-dimensional face of single image

Technical Field

The invention belongs to the technical field of three-dimensional face recognition, and particularly relates to a method, a system, equipment and a medium for reconstructing a single-image detail three-dimensional face.

Background

In recent years, face images have been widely used in daily life, and applications such as face recognition, face beautification and face editing are ubiquitous. Concepts such as the "metaverse" and "digital humans" have also entered public view, bringing new experiences to people's life and entertainment. As an important component of "digital human" technology, three-dimensional face reconstruction has received widespread attention. Existing methods fall mainly into two categories: methods based on implicit spatial coding and methods based on explicit spatial regression.

However, existing methods have shortcomings in geometric detail and face texture. For three-dimensional face geometry, most methods are based on the linear three-dimensional morphable model (3DMM); the results are overly linear, lack personalized geometric detail, and show little visual difference between individuals. For three-dimensional face texture, most methods construct the texture by regressing per-vertex RGB values; the resulting texture lacks realism and does not account for factors such as illumination.

In addition, most three-dimensional face reconstruction methods are trained on constructed datasets. However, three-dimensional face data is difficult to acquire, few open-source datasets are available and their quality varies, and each dataset requires preprocessing and registration, which consumes considerable labor and time; moreover, the reconstruction result depends heavily on preprocessing quality. Exploring a three-dimensional face reconstruction method with personalized face geometry and high-fidelity texture is therefore an urgent and challenging problem.

Disclosure of Invention

In order to overcome the problems in the prior art, the invention aims to provide a single image detail three-dimensional face reconstruction method, a system, equipment and a medium.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

a single image detail three-dimensional face reconstruction method comprises the following steps:

acquiring a two-dimensional face image;

carrying out three-dimensional face reconstruction, rendering and feature extraction on the two-dimensional face image to obtain global features;

performing three-dimensional face reconstruction on the two-dimensional face image, performing ray matching on the rendered image, and calculating the signed distance values and local position features of the face vertices;

obtaining an implicit bidirectional reflection function according to the global features and the local position features;

optimizing each component network of the SDF-Net and the implicit bidirectional reflection function by adopting a self-supervision training method;

calculating a detailed three-dimensional face geometric model according to the optimized SDF-Net and the signed distance values of the face vertices;

according to each component network of the optimized bidirectional reflection function, calculating to obtain a color value corresponding to each vertex;

and combining the three-dimensional face geometric model containing the details and the color values corresponding to each vertex to form the detail three-dimensional face model with high-fidelity textures.

Further, three-dimensional face reconstruction, rendering and feature extraction are performed on the two-dimensional face image to obtain global image features and global depth features, and the method comprises the following steps:

performing three-dimensional face reconstruction on the two-dimensional face image and rendering the result to obtain a multi-view image set, and calculating a multi-view depth map set from the multi-view image set with a monocular depth estimation algorithm; and extracting features separately from the multi-view image set and the multi-view depth map set to obtain global image features and global depth features.

Further, performing three-dimensional face reconstruction on the two-dimensional face image, performing ray matching on the rendered image, and calculating the signed distance values and local position features of the face vertices comprise the following steps:

performing three-dimensional face reconstruction on the two-dimensional face image, and performing ray matching on the rendered image to obtain the intersection points of the rays with the implicit surface; and inputting the three-dimensional coordinates of these intersection points into SDF-Net to calculate the signed distance values and local position features of the face vertices.
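The ray matching step can be illustrated with sphere tracing, a standard way to intersect a ray with the zero level set of a signed distance field. This is a minimal sketch, not the patented implementation: the text names only a ray tracing algorithm, so sphere tracing is an assumption, and the analytic `sphere_sdf` is a hypothetical stand-in for SDF-Net.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Hypothetical stand-in for SDF-Net: signed distance to a sphere
    (negative inside, positive outside)."""
    return math.dist(p, center) - radius


def sphere_trace(sdf, origin, direction, max_steps=128, eps=1e-6, t_max=100.0):
    """March along origin + t * direction until |sdf| < eps.

    Returns the intersection point x_p as a list, or None if the ray misses.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]  # unit ray direction
    t = 0.0
    for _ in range(max_steps):
        x = [o + t * di for o, di in zip(origin, d)]
        dist = sdf(x)
        if abs(dist) < eps:
            return x  # reached the zero level set
        t += dist  # safe step: no surface lies closer than |dist|
        if t > t_max:
            break
    return None


# A ray from z = -3 towards +z hits the unit sphere at (0, 0, -1).
hit = sphere_trace(sphere_sdf, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0))
```

In an IDR-style pipeline the network would also return a feature vector at `x_p`, which plays the role of the local position feature.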

Further, a set of multi-view depth maps is calculated from the set of multi-view images by a monocular depth estimation algorithm.

Further, the SDF-Net and each component network of the implicit bidirectional reflection function are optimized through the convergence of the loss function;

The loss function is:

where α₁ is the first coefficient, α₂ is the second coefficient, α₃ is the third coefficient, and α₄ is the fourth coefficient;

Pixel-level loss

where p denotes the pixel at which a sampling point is located and P is the set of such pixels; the loss uses the mask of image i at pixel p and the pixel value of image i at pixel p, together with c_p(i), the pixel value calculated using the bidirectional reflection function; and i is the image index;

Mask loss

where M denotes the mask; the loss is taken over the set of colored images; s_{i,α}(p) is an activation function; and α is a hyperparameter;
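A minimal sketch of the pixel-level and mask losses, assuming an IDR-style formulation: an L1 color error over pixels whose ray hit the surface, and a binary cross-entropy between the object mask and a sigmoid activation s_α(p) = sigmoid(-α · min SDF along the ray). The text does not state the exact formulas, so the function names, the L1 choice and the sigmoid form of the activation are assumptions.

```python
import math


def pixel_loss(observed, rendered, hit_mask):
    """Mean L1 color error over pixels whose ray hit the surface.

    observed / rendered: one RGB tuple per pixel; hit_mask: one bool per pixel.
    """
    errors = [sum(abs(a - b) for a, b in zip(obs, ren))
              for obs, ren, hit in zip(observed, rendered, hit_mask) if hit]
    return sum(errors) / len(errors) if errors else 0.0


def soft_occupancy(min_sdf, alpha):
    """Assumed activation s_alpha(p) = sigmoid(-alpha * min SDF along the ray)."""
    z = max(-60.0, min(60.0, alpha * min_sdf))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))


def mask_loss(gt_mask, min_sdf_per_pixel, alpha, eps=1e-7):
    """Binary cross-entropy between the object mask and the soft occupancy,
    averaged over the given pixels and scaled by 1 / alpha (IDR-style)."""
    total = 0.0
    for m, s in zip(gt_mask, min_sdf_per_pixel):
        o = min(max(soft_occupancy(s, alpha), eps), 1.0 - eps)
        total += -(m * math.log(o) + (1 - m) * math.log(1.0 - o))
    return total / (alpha * len(gt_mask))
```

Larger α makes the activation closer to a hard inside/outside test, which is why it is usually annealed upward during training in IDR-like methods.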

Signed distance field loss

where 𝔼 denotes the expectation;

Given the nearest neighbor of point p*, the registration loss can be expressed as:

Normal loss
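The geometric terms can be sketched in the same hedged way. Reading the expectation-based signed distance field loss as an eikonal regularizer 𝔼[(‖∇f(x)‖ − 1)²], the registration loss as the mean distance from each point to its nearest neighbor, and the normal loss as 1 − cos between paired unit normals are all assumptions, as is the weighting in `total_loss`, which places α₁ to α₄ on the four non-pixel terms. The gradient is estimated by central differences so the sketch needs no autodiff library.

```python
import math


def eikonal_loss(sdf, points, h=1e-4):
    """Assumed SDF loss: mean of (||grad f(x)|| - 1)^2, gradient by central differences."""
    total = 0.0
    for p in points:
        grad = []
        for k in range(3):
            fwd = list(p)
            bwd = list(p)
            fwd[k] += h
            bwd[k] -= h
            grad.append((sdf(fwd) - sdf(bwd)) / (2 * h))
        total += (math.sqrt(sum(g * g for g in grad)) - 1.0) ** 2
    return total / len(points)


def registration_loss(points, targets):
    """Mean distance from each point to its nearest neighbor among targets (brute force)."""
    return sum(min(math.dist(p, q) for q in targets) for p in points) / len(points)


def normal_loss(normals_a, normals_b):
    """Mean (1 - cosine) between paired unit normals."""
    return sum(1.0 - sum(a * b for a, b in zip(na, nb))
               for na, nb in zip(normals_a, normals_b)) / len(normals_a)


def total_loss(l_pixel, l_mask, l_sdf, l_reg, l_normal, alphas):
    """Assumed weighting: pixel term unweighted, alpha_1..alpha_4 on the rest."""
    a1, a2, a3, a4 = alphas
    return l_pixel + a1 * l_mask + a2 * l_sdf + a3 * l_reg + a4 * l_normal
```

A brute-force nearest-neighbor search is used only for clarity; a k-d tree would be the usual choice for real meshes.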

Further, the implicit bidirectional reflection function is:

where the terms denote: the diffuse albedo; x_p, the intersection of the ray with the implicit surface; the intersection of the ray with the initial coarse mesh; the local position features of the j-th image; the global image features of the j-th image; the diffuse shading; the normal at the ray's intersection with the initial coarse mesh; the global depth features of the j-th image; a_s, the specular albedo; the specular shading; n_p, the normal at x_p; and v, the ray direction.

Further, the detailed three-dimensional face geometric model is as follows:

wherein G is _d A three-dimensional face geometry model representing details,represents the face vertex set, n represents the normal direction of the face vertex,/->Symbol distance value representing face vertex, G _c Representing an initial coarse mesh.

A low quality three-dimensional face recognition system, comprising:

the two-dimensional face image acquisition module is used for acquiring a two-dimensional face image;

the global feature acquisition module is used for carrying out three-dimensional face reconstruction, rendering and feature extraction on the two-dimensional face image to obtain global features;

the matching and calculating module is used for performing three-dimensional face reconstruction on the two-dimensional face image, performing ray matching on the rendered image, and calculating the signed distance values and local position features of the face vertices;

the implicit bidirectional reflection function calculation module is used for calculating the implicit bidirectional reflection function from the global features and the local position features;

an optimization module for optimizing each component network of the SDF-Net and the implicit bidirectional reflection function;

the detailed three-dimensional face geometric model calculation module is used for calculating a detailed three-dimensional face geometric model from the optimized SDF-Net and the signed distance values of the face vertices;

the color value calculation module is used for calculating and obtaining a color value corresponding to each vertex according to each component network of the optimized bidirectional reflection function;

and the combination module is used for combining the three-dimensional face geometric model containing details and the color value corresponding to each vertex to form the detail three-dimensional face model with high-fidelity textures.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the single image detail three-dimensional face reconstruction method as described above.

A computer readable storage medium storing a computer program which when executed by a processor implements the steps of a single image detail three-dimensional face reconstruction method as described above.

Compared with the prior art, the invention has the beneficial effects that:

In the invention, three-dimensional face reconstruction is achieved from a two-dimensional face image without additional training data. Because SDF-Net and the bidirectional reflection function are optimized by self-supervised training on the image set rendered from the three-dimensional reconstruction of the input image, only a single input image is needed, and steps such as dataset construction and preprocessing are eliminated. The method represents face texture with an implicit bidirectional reflection function and can therefore obtain high-fidelity three-dimensional face texture.

Furthermore, the invention fuses the local position features with the global image and depth features, and uses different fused features to guide the computation of each component network of the implicit bidirectional reflection function, yielding a high-fidelity three-dimensional face.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a graph showing the reconstruction geometry results of the present invention;

FIG. 3 is a graph of the reconstruction geometry and texture results of the present invention;

FIG. 4 is a flow chart of the present invention;

FIG. 5 is a schematic diagram of the system of the present invention.

Detailed Description

In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. The drawings illustrate preferred embodiments of the invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

The invention is divided into a texture part and a geometry part. First, for an input two-dimensional face image, a 3DMM-based method that uses StyleGAN generates and completes the texture for views not visible in the reference image. StyleGAN is trained on a dataset and can generate high-fidelity, sharp face textures. Using a traditional 3DMM model, a three-dimensional face with coarse geometry and high-fidelity texture is obtained as the initial three-dimensional face. The initial three-dimensional face is rendered to obtain multi-view two-dimensional face images, and the corresponding depth maps are generated with a monocular depth estimation algorithm.

To make full use of the two-dimensional image information, a cross-region-sensitive feature extractor is used to extract the corresponding depth map features. A local-global implicit differentiable rendering framework based on the signed distance field is then used to optimize the SDF-Net and the component networks of the bidirectional reflection function. A detailed three-dimensional face geometric model is calculated from the optimized SDF-Net, and the bidirectional reflection function value (i.e., the color value) of each vertex is obtained from the optimized component networks. Finally, the detailed three-dimensional face geometric model G_d is registered with the per-vertex bidirectional reflection function values, restoring the geometric details. In this way, the invention obtains a three-dimensional face with personalized geometric details and high-fidelity texture.

Specifically, referring to fig. 1 and fig. 4, the method for reconstructing a detail three-dimensional face of a single image according to the present invention is divided into two stages of coarse reconstruction and detail reconstruction, and specifically comprises the following steps:

(1) Coarse reconstruction stage

Step 1.1: first, for an input two-dimensional face image, a 3DMM-based method is used to reconstruct a three-dimensional face, and a three-dimensional face with coarse geometry and high-fidelity texture is obtained as the initial three-dimensional face.

Step 1.2: render the initial three-dimensional face to obtain a multi-view image set I* = {I_1, I_2, I_3, …, I_N}, and compute a multi-view depth map set D* = {D_1, D_2, D_3, …, D_N} from the multi-view image set with a monocular depth estimation algorithm, where I_1, I_2, I_3 and I_N are the first, second, third and N-th multi-view images, D_1, D_2, D_3 and D_N are the first, second, third and N-th multi-view depth maps, and N is the number of multi-view depth maps.
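The number and placement of the rendering viewpoints are not specified. A minimal sketch, assuming N cameras spaced evenly over a yaw arc and all aimed at the face center at the origin, could generate the camera positions and viewing directions for the multi-view image set I* as follows; `multiview_cameras`, the radius and the arc width are hypothetical choices.

```python
import math


def multiview_cameras(n_views, radius=2.5, max_yaw_deg=60.0):
    """Return (position, unit view direction) pairs for n_views cameras spaced
    evenly in yaw on a circle of the given radius, all aimed at the origin
    (assumed face center, face looking along +z)."""
    if n_views == 1:
        yaws = [0.0]
    else:
        step = 2.0 * max_yaw_deg / (n_views - 1)
        yaws = [-max_yaw_deg + k * step for k in range(n_views)]
    cameras = []
    for yaw in yaws:
        a = math.radians(yaw)
        position = (radius * math.sin(a), 0.0, radius * math.cos(a))
        direction = tuple(-c / radius for c in position)  # unit vector to origin
        cameras.append((position, direction))
    return cameras


cams = multiview_cameras(5)  # yaws -60, -30, 0, 30, 60 degrees
```

Each pair would then be turned into a view matrix for the renderer; the depth maps D* would come from running the monocular depth estimator on the rendered images.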

(2) Detail reconstruction stage

Step one: respectively extracting features from the multi-view image set and the multi-view depth map set to obtain global image features and global depth features;

Step two: perform ray matching for each pixel of each image in the multi-view image set, and use a ray tracing algorithm to obtain the intersection point x_p of the ray with the implicit surface and the intersection point of the ray with the coarse mesh.

The intersection point x_p of the ray with the implicit surface is taken as the sampling point, and its three-dimensional coordinates are input into SDF-Net to calculate the signed distance value of the face vertex and the local position features of the sampling point;

Step three: the bidirectional reflection function is regarded as being composed of four components: the diffuse albedo, the diffuse shading, the specular albedo a_s and the specular shading;

where the terms denote: the diffuse albedo; x_p, the intersection of the ray with the implicit surface; the intersection of the ray with the initial coarse mesh; the local position features of the j-th image; the global image features of the j-th image; the diffuse shading; the normal at the ray's intersection with the initial coarse mesh; the global depth features of the j-th image; a_s, the specular albedo, which is a constant; the specular shading; n_p, the normal at x_p; and v, the ray direction.

Thus, the implicit bidirectional reflection function can be written as:

Accordingly, different features are input into fully connected networks to calculate the values of the different components:

specifically, the diffuse albedo is calculated from the global image features and the local position features;

the diffuse shading and the specular shading are calculated from the global depth features and the local position features.
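The routing of fused features to the component networks can be sketched with tiny fully connected networks. This is a hypothetical illustration: the layer sizes, the random weights, the constant a_s = 0.04, which normals feed which shading network, and the composition color = diffuse albedo × diffuse shading + a_s × specular shading are all assumptions, since the text does not state the combining formula.

```python
import math
import random


def make_mlp(in_dim, hidden, out_dim, seed):
    """Hypothetical two-layer fully connected network with fixed random weights."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.3) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.3) for _ in range(hidden)] for _ in range(out_dim)]

    def forward(x):
        if len(x) != in_dim:
            raise ValueError(f"expected {in_dim} inputs, got {len(x)}")
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]  # ReLU
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return forward


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))


class ImplicitBRDF:
    """Routes fused features to the component networks (assumed layout)."""

    def __init__(self, d_local, d_img, d_depth, hidden=16, a_s=0.04):
        self.albedo_net = make_mlp(d_local + d_img, hidden, 3, seed=0)
        self.diffuse_net = make_mlp(d_local + d_depth + 3, hidden, 1, seed=1)
        self.specular_net = make_mlp(d_local + d_depth + 6, hidden, 1, seed=2)
        self.a_s = a_s  # constant specular albedo (value assumed)

    def __call__(self, f_local, f_img, f_depth, n_coarse, n_p, v):
        # Diffuse albedo from local position + global image features.
        a_d = [sigmoid(z) for z in self.albedo_net(f_local + f_img)]
        # Diffuse shading from local + global depth features and the coarse-mesh normal.
        s_d = sigmoid(self.diffuse_net(f_local + f_depth + list(n_coarse))[0])
        # Specular shading from local + global depth features, n_p and ray direction v.
        s_s = sigmoid(self.specular_net(f_local + f_depth + list(n_p) + list(v))[0])
        # Assumed composition: per-channel albedo * shading plus a specular term.
        return [a * s_d + self.a_s * s_s for a in a_d]


brdf = ImplicitBRDF(d_local=4, d_img=8, d_depth=8)
color = brdf([0.1] * 4, [0.2] * 8, [0.3] * 8,
             (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

Keeping the albedo network free of depth features and the shading networks free of image features is one way to encourage the albedo/shading separation the text describes.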

Step four: the SDF-Net and each component network of the bidirectional reflection function are optimized by minimizing the loss function until it converges.

The loss function can be expressed as:

Pixel-level loss

where p denotes the pixel at which a sampling point is located and P is the set of such pixels; the loss uses the mask of image i at pixel p and the pixel value of image i at pixel p, together with c_p(i), the pixel value calculated using the bidirectional reflection function; and i is the image index.

Mask loss

where M denotes the mask; the loss is taken over the set of colored pixels; s_{i,α}(p) is an activation function; and α is a hyperparameter.

Signed distance field loss

where 𝔼 denotes the expectation.

In addition, given the nearest neighbor of point p*, the registration loss can be expressed as:

Normal loss

Step five: calculate the detailed three-dimensional face geometric model from the optimized SDF-Net.

The detailed three-dimensional face geometric model G_d and the bidirectional reflection function value (i.e., the color value) of each vertex are combined to form the detailed three-dimensional face model with high-fidelity texture.
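The final combination step can be illustrated by writing the detailed vertices and their per-vertex colors to a Wavefront OBJ file. Per-vertex RGB after the coordinates (`v x y z r g b`) is a widely used but non-standard OBJ extension that tools such as MeshLab read; the helper `to_colored_obj` and the choice of OBJ are assumptions, not the patent's output format.

```python
def to_colored_obj(vertices, colors, faces):
    """Serialize a mesh with per-vertex RGB colors (0..1) as OBJ text."""
    lines = [f"v {x:.6f} {y:.6f} {z:.6f} {r:.4f} {g:.4f} {b:.4f}"
             for (x, y, z), (r, g, b) in zip(vertices, colors)]
    # OBJ face indices are 1-based.
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines) + "\n"


obj_text = to_colored_obj(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    [(0, 1, 2)],
)
```

A PLY file with `red`, `green` and `blue` vertex properties is the standard alternative when strict format compliance matters.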

FIG. 2 shows the reconstruction results of each method for five input face images. As FIG. 2 shows, the three-dimensional faces reconstructed by the present method (MMFG) contain more personalized detail than those of similar methods (FaceScape, Pix2Vertex and PBIDR).

FIG. 3 shows the reconstructed geometry and texture results of the present invention as four groups of samples with three panels each: "input image" is the input picture, "generated 3d mesh" is the reconstructed geometry, and "texture 3d mesh" is the textured reconstruction result. As FIG. 3 shows, the reconstruction results contain both personalized geometric detail and high-fidelity face texture.

Referring to fig. 5, another embodiment of the present invention provides a low-quality three-dimensional face recognition system, comprising:

Another embodiment of the invention provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the single image detail three-dimensional face reconstruction method as described above when the processor executes the computer program.

Another embodiment of the present invention provides a computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the single image detail three-dimensional face reconstruction method as described above.

The invention has the following advantages:

First: the method is specific to the input image and needs no additional training data. Because self-supervised training is used, three-dimensional reconstruction of a single image is achieved from the single input image alone, using only the small image set that the method itself generates, so steps such as dataset construction and preprocessing are omitted;

Second: the method represents the face texture with a bidirectional reflection function, yielding high-fidelity three-dimensional face texture. The invention fuses the local position features with the global image and depth features and uses different fused features to guide the computation of the different bidirectional reflection function components, so that a high-fidelity three-dimensional face can be reconstructed more accurately.

The foregoing describes preferred embodiments of the present invention and is not to be construed as limiting the claims. The present invention is not limited to the above embodiments, and its specific structure may vary. All such variations are intended to fall within the scope of the appended claims.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Claims

1. A single image detail three-dimensional face reconstruction method, which is characterized by including the following steps:

Obtain two-dimensional face images;

Perform 3D face reconstruction, rendering and feature extraction on 2D face images to obtain global features;

Perform three-dimensional face reconstruction on two-dimensional face images, perform ray matching on the rendered images, and calculate the signed distance values and local position features of the face vertices;

Based on the global features and local location features, the implicit bidirectional reflection function is obtained;

Self-supervised training method is used to optimize each component network of SDF-Net and implicit bidirectional reflection function;

Based on the optimized SDF-Net and the signed distance values of the face vertices, calculate the detailed three-dimensional face geometric model;

According to each component network of the optimized bidirectional reflection function, the color value corresponding to each vertex is calculated;

Combine the detailed 3D face geometric model and the color value corresponding to each vertex to form a detailed 3D face model with high-fidelity texture.

2. The single image detail three-dimensional face reconstruction method according to claim 1, characterized in that performing three-dimensional face reconstruction, rendering and feature extraction on the two-dimensional face image to obtain global image features and global depth features comprises the following steps:

Perform three-dimensional face reconstruction and rendering on the two-dimensional face image to obtain a multi-view image set, and calculate a multi-view depth map set from the multi-view image set with a monocular depth estimation algorithm; extract features separately from the multi-view image set and the multi-view depth map set to obtain global image features and global depth features.

3. The single image detail three-dimensional face reconstruction method according to claim 1, characterized in that performing three-dimensional face reconstruction on the two-dimensional face image, performing ray matching on the rendered image, and calculating the signed distance values and local position features of the face vertices comprises the following steps:

Perform 3D face reconstruction on the 2D face image and perform ray matching on the rendered image to obtain the intersection point of the ray and the implicit surface; input the 3D coordinates of this intersection point into SDF-Net to calculate the signed distance values of the face vertices and the local position features.

4. The single image detailed three-dimensional face reconstruction method according to claim 1, characterized in that the multi-view depth map set is calculated through a monocular depth estimation algorithm based on the multi-view image set.

5. The single image detail three-dimensional face reconstruction method according to claim 1, characterized in that, through the convergence of the loss function, each component network of SDF-Net and the implicit bidirectional reflection function is optimized;

The loss function is:

where α₁ is the first coefficient, α₂ is the second coefficient, α₃ is the third coefficient, and α₄ is the fourth coefficient;

Pixel-level loss

where p denotes the pixel at which the sampling point is located and P is the set of such pixels; the loss uses the mask of image i at pixel p and the pixel value of image i at pixel p, together with c_p(i), the pixel value calculated using the bidirectional reflection function; and i is the image index;

Mask loss

where M represents the mask; the loss is taken over the set of colored images; s_{i,α}(p) is the activation function; and α is the hyperparameter;

Signed distance field loss

where 𝔼 denotes the expectation;

Given the nearest neighbor point of point p, the registration loss can be expressed as:

normal loss

6. The single image detailed three-dimensional face reconstruction method according to claim 1, characterized in that the implicit bidirectional reflection function is:

where the terms denote: the diffuse albedo; x_p, the intersection point of the ray and the implicit surface; the intersection point of the ray and the initial rough mesh; the local position features corresponding to the j-th image; the global image features corresponding to the j-th image; the diffuse shading; the normal direction at the intersection point of the ray and the initial rough mesh; the global depth features corresponding to the j-th image; a_s, the specular reflection albedo; the specular reflection shading; n_p, the normal direction at the intersection point x_p between the ray and the implicit surface; and v, the ray direction.

7. The detailed three-dimensional face reconstruction method of a single image according to claim 1, characterized in that the detailed three-dimensional face geometric model is:

where G_d represents the detailed three-dimensional face geometric model, V represents the face vertex set, n represents the normal direction of the face vertices, the signed distance values of the face vertices are computed by SDF-Net, and G_c represents the initial rough mesh.

8. A low-quality three-dimensional face recognition system, characterized by:

Two-dimensional face image acquisition module, used to acquire two-dimensional face images;

The global feature acquisition module is used to perform three-dimensional face reconstruction, rendering and feature extraction on two-dimensional face images to obtain global features;

The matching and calculation module is used to perform three-dimensional face reconstruction on two-dimensional face images, perform ray matching on the rendered pictures, and calculate the signed distance values and local position features of the face vertices;

The implicit bidirectional reflection function calculation module is used to calculate the implicit bidirectional reflection function based on global features and local location features;

Optimization module, used to optimize each component network of SDF-Net and implicit bidirectional reflection function;

The detailed three-dimensional face geometric model calculation module is used to calculate the detailed three-dimensional face geometric model based on the optimized SDF-Net and the signed distance value of the face vertices;

The color value calculation module is used to calculate the color value corresponding to each vertex based on each component network of the optimized bidirectional reflection function;

The combination module is used to combine the detailed 3D face geometric model and the color value corresponding to each vertex to form a detailed 3D face model with high-fidelity texture.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the single image detail three-dimensional face reconstruction method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the single image detail three-dimensional face reconstruction method according to any one of claims 1 to 7.