Method, system, device and medium for detailed three-dimensional face reconstruction from a single image
Technical Field
The invention belongs to the technical field of three-dimensional face recognition, and in particular relates to a method, system, device and medium for detailed three-dimensional face reconstruction from a single image.
Background
In recent years, face images have become widely used in daily life, and applications such as face recognition, face beautification and face editing are ubiquitous. Concepts such as the "metaverse" and "digital humans" have also entered public awareness, bringing entirely new experiences to everyday life and entertainment. As an important component of digital-human technology, three-dimensional face reconstruction has likewise received widespread attention. Existing methods fall into two main categories: methods based on implicit spatial encoding and methods based on explicit spatial regression.
However, existing methods have shortcomings in both geometric detail and face texture. For three-dimensional face geometry, most methods are based on the linear three-dimensional morphable model (3DMM); their results are overly linear, the reconstructions lack personalized geometric detail, and the faces of different individuals look visually similar. For three-dimensional face texture, most methods build the texture by regressing per-vertex RGB values; such textures lack realism, and factors such as illumination are not taken into account.
In addition, most three-dimensional face reconstruction methods are trained on constructed datasets. However, three-dimensional face data are difficult to acquire, open-source datasets are few and of uneven quality, and each dataset requires its own preprocessing and registration, which consumes considerable labor and time; moreover, the reconstruction result depends heavily on preprocessing quality. Exploring a three-dimensional face reconstruction method that delivers personalized face geometry and high-fidelity texture is therefore an urgent and challenging problem.
Disclosure of Invention
To overcome the problems of the prior art, the invention aims to provide a method, system, device and medium for detailed three-dimensional face reconstruction from a single image.
To achieve the above purpose, the invention adopts the following technical solution:
a single-image detailed three-dimensional face reconstruction method, comprising the following steps:
acquiring a two-dimensional face image;
carrying out three-dimensional face reconstruction, rendering and feature extraction on the two-dimensional face image to obtain global features;
performing three-dimensional face reconstruction on the two-dimensional face image, casting rays through the rendered image, and computing the signed distance values of the face vertices and the local position features;
obtaining an implicit bidirectional reflectance function from the global features and the local position features;
optimizing the SDF-Net and each component network of the implicit bidirectional reflectance function with a self-supervised training method;
computing a detailed three-dimensional face geometric model from the optimized SDF-Net and the signed distance values of the face vertices;
computing the color value of each vertex with the optimized component networks of the bidirectional reflectance function;
and combining the detailed three-dimensional face geometric model with the color value of each vertex to form a detailed three-dimensional face model with high-fidelity texture.
Further, performing three-dimensional face reconstruction, rendering and feature extraction on the two-dimensional face image to obtain global image features and global depth features comprises:
reconstructing a three-dimensional face from the image and rendering it to obtain a multi-view image set; computing a multi-view depth map set from the multi-view image set with a monocular depth estimation algorithm; and extracting features from the multi-view image set and the multi-view depth map set respectively to obtain global image features and global depth features.
Further, performing three-dimensional face reconstruction on the two-dimensional face image, casting rays through the rendered image, and computing the signed distance values of the face vertices and the local position features comprises:
performing three-dimensional face reconstruction on the two-dimensional face image and casting rays through the rendered images to obtain the intersection points of the rays with the implicit surface; and inputting the three-dimensional coordinates of these intersection points into SDF-Net to compute the signed distance values of the face vertices and the local position features.
Further, the multi-view depth map set is computed from the multi-view image set with a monocular depth estimation algorithm.
Further, the SDF-Net and each component network of the implicit bidirectional reflectance function are optimized by minimizing a loss function until it converges;
the loss function is formed from the following terms, with α1, α2, α3 and α4 as the first, second, third and fourth weighting coefficients;
pixel-level loss: computed over P, the set of pixels in which the sampling points lie; for each such pixel p and each image i (i being the image index), it compares the pixel value of image i at p with c_p(i), the pixel value computed by the bidirectional reflectance function, taking into account the mask of image i at pixel p;
mask loss: defined on the mask M, where S is the set of colored pixels of image i, α(p) is an activation function, and α is a hyperparameter;
signed distance field loss: defined as an expectation over sampled points;
registration loss: defined using the nearest neighbor of each point p*;
normal loss.
Further, the implicit bidirectional reflectance function is composed of four components: the diffuse albedo, the diffuse shading, the specular albedo a_s and the specular shading.
The quantities involved are x_p, the intersection point of a ray with the implicit surface; the intersection point of the ray with the initial coarse mesh and the normal at that point; the local position feature, global image feature and global depth feature corresponding to the j-th image; n_p, the normal at x_p; and v, the ray direction.
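The text lists the four components but does not reproduce how they combine into a color. As a minimal sketch only, assuming the common albedo-times-shading decomposition c = ρ_d · s_d + a_s · s_s applied per RGB channel with scalar shading values (an assumption, not the invention's stated equation; the name `compose_color` is hypothetical):

```python
from typing import Sequence, Tuple


def compose_color(diffuse_albedo: Sequence[float],
                  diffuse_shading: float,
                  specular_albedo: float,
                  specular_shading: float) -> Tuple[float, ...]:
    """Combine four BRDF component values into one RGB color.

    Assumed form (not taken from the source): per channel,
    c = rho_d * s_d + a_s * s_s, clamped to [0, 1].
    """
    def clamp(x: float) -> float:
        return max(0.0, min(1.0, x))

    return tuple(clamp(rho * diffuse_shading + specular_albedo * specular_shading)
                 for rho in diffuse_albedo)


# Example: a skin-like albedo under moderate diffuse light with a weak highlight.
print(compose_color((0.5, 0.4, 0.3), 0.8, 0.1, 0.5))
```

Clamping to [0, 1] is an illustrative choice; the source does not state how out-of-range values are handled.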
Further, the detailed three-dimensional face geometric model is defined as follows: G_d denotes the detailed three-dimensional face geometric model, which is expressed in terms of the face vertex set, the normal n of each face vertex, the signed distance value of each face vertex, and the initial coarse mesh G_c.
A single-image detailed three-dimensional face reconstruction system, comprising:
the two-dimensional face image acquisition module is used for acquiring a two-dimensional face image;
the global feature acquisition module is used for carrying out three-dimensional face reconstruction, rendering and feature extraction on the two-dimensional face image to obtain global features;
the matching and calculating module is used for performing three-dimensional face reconstruction on the two-dimensional face image, casting rays through the rendered image, and calculating the signed distance values of the face vertices and the local position features;
the implicit bidirectional reflectance function calculation module is used for calculating the implicit bidirectional reflectance function from the global features and the local position features;
the optimization module is used for optimizing the SDF-Net and each component network of the implicit bidirectional reflectance function;
the detailed three-dimensional face geometric model calculation module is used for calculating the detailed three-dimensional face geometric model from the optimized SDF-Net and the signed distance values of the face vertices;
the color value calculation module is used for calculating the color value of each vertex with the optimized component networks of the bidirectional reflectance function;
and the combination module is used for combining the detailed three-dimensional face geometric model with the color value of each vertex to form a detailed three-dimensional face model with high-fidelity texture.
A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the single-image detailed three-dimensional face reconstruction method described above.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the single-image detailed three-dimensional face reconstruction method described above.
Compared with the prior art, the invention has the following beneficial effects:
the invention reconstructs a three-dimensional face from a single two-dimensional face image without additional training data. Because the SDF-Net and the bidirectional reflectance function are optimized with a self-supervised training method, only the single input image is needed: the image set used for training is obtained by reconstructing and rendering a three-dimensional face from that image, so steps such as dataset construction and preprocessing are omitted. The invention represents the face texture with an implicit bidirectional reflectance function, which yields high-fidelity three-dimensional face texture.
Furthermore, the invention fuses the local position features with the global image and depth features and, for each component network of the implicit bidirectional reflectance function, uses a different fused feature to guide the computation, producing a high-fidelity three-dimensional face.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 shows the reconstructed geometry results of the present invention;
FIG. 3 shows the reconstructed geometry and texture results of the present invention;
FIG. 4 is a flow chart of the present invention;
FIG. 5 is a schematic diagram of the system of the present invention.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. The drawings illustrate preferred embodiments of the invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The invention is divided into a texture part and a geometry part. First, for an input two-dimensional face image, a 3DMM-based method that uses StyleGAN generates and completes the texture of viewpoints that are invisible in the reference image. The StyleGAN is trained on a dataset and can generate high-fidelity, sharp face textures. Combined with a conventional 3DMM model, this yields a three-dimensional face with coarse geometry and high-fidelity texture, which serves as the initial three-dimensional face. The initial three-dimensional face is rendered to obtain multi-view two-dimensional face images, and a monocular depth estimation algorithm generates the corresponding depth maps.
To make full use of the two-dimensional image information, a cross-region sensitive feature extractor is used to extract features from the images and their corresponding depth maps. A local-global implicit differentiable rendering framework based on the signed distance field is then used to optimize the SDF-Net and each component network of the bidirectional reflectance function. The detailed three-dimensional face geometric model is computed from the optimized SDF-Net, and the optimized component networks of the bidirectional reflectance function give the bidirectional reflectance value (i.e., the color value) of each vertex. Finally, the detailed three-dimensional face geometric model G_d is registered with these per-vertex values, recovering the geometric details. In this way, the invention produces a three-dimensional face with personalized geometric details and high-fidelity texture.
Specifically, referring to FIG. 1 and FIG. 4, the single-image detailed three-dimensional face reconstruction method of the invention is divided into a coarse reconstruction stage and a detail reconstruction stage, and comprises the following steps:
(1) Coarse reconstruction stage
Step 1.1: for an input two-dimensional face image, a 3DMM-based method is first used to reconstruct a three-dimensional face with coarse geometry and high-fidelity texture, which serves as the initial three-dimensional face.
Step 1.2: the initial three-dimensional face is rendered to obtain a multi-view image set I* = {I_1, I_2, I_3, …, I_N}, and a multi-view depth map set D* = {D_1, D_2, D_3, …, D_N} is computed from the multi-view image set by a monocular depth estimation algorithm, where I_1, I_2, I_3 and I_N are the first, second, third and N-th multi-view images, D_1, D_2, D_3 and D_N are the first, second, third and N-th multi-view depth maps, and N is the number of views.
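Step 1.2 does not specify the rendering viewpoints. The following sketch shows one simple way to place N virtual cameras around the initial three-dimensional face before rendering I_1 … I_N; the function name `camera_positions`, the 2.5-unit radius, the ±60° yaw range and the +z frontal axis are all hypothetical choices:

```python
import math
from typing import List, Tuple


def camera_positions(n_views: int, radius: float = 2.5,
                     max_yaw_deg: float = 60.0) -> List[Tuple[float, float, float]]:
    """Place n_views cameras on a horizontal arc around a face centered at the origin.

    Yaw is spread evenly over [-max_yaw_deg, +max_yaw_deg]; +z is assumed to be
    the frontal direction, and every camera is meant to look at the origin.
    """
    if n_views < 1:
        raise ValueError("n_views must be >= 1")
    if n_views == 1:
        yaws = [0.0]
    else:
        step = 2.0 * max_yaw_deg / (n_views - 1)
        yaws = [-max_yaw_deg + k * step for k in range(n_views)]
    poses = []
    for yaw in yaws:
        t = math.radians(yaw)
        poses.append((radius * math.sin(t), 0.0, radius * math.cos(t)))
    return poses


for pos in camera_positions(5):
    print(pos)
```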
(2) Detail reconstruction stage
Step one: extract features from the multi-view image set and the multi-view depth map set respectively to obtain global image features and global depth features;
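The source does not detail how the cross-region sensitive feature extractor turns each image or depth map into a global feature. Purely as an illustrative assumption, if the extractor outputs a C × H × W feature map per view, global average pooling (a swapped-in, standard technique) would give the per-view global image and depth features; the names below are hypothetical:

```python
from typing import List, Tuple

# Hypothetical layout: one feature map per view, indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Collapse a C x H x W feature map into a C-dimensional global vector."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def global_features(image_maps: List[FeatureMap],
                    depth_maps: List[FeatureMap]) -> List[Tuple[List[float], List[float]]]:
    """Return (global image feature, global depth feature) for every view j."""
    if len(image_maps) != len(depth_maps):
        raise ValueError("each view needs an image map and a depth map")
    return [(global_average_pool(img), global_average_pool(dep))
            for img, dep in zip(image_maps, depth_maps)]


print(global_average_pool([[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 2.0]]]))
```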
Step two: cast a ray through each pixel of each image in the multi-view image set and, using a ray tracing algorithm, obtain the intersection point x_p of the ray with the implicit surface and the intersection point of the ray with the coarse mesh.
The intersection point x_p of the ray with the implicit surface is taken as a sampling point, and its three-dimensional coordinates are input into SDF-Net to compute the signed distance value of the face vertex and the local position feature of the sampling point;
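For an SDF-based implicit surface, the ray-surface intersection x_p is commonly found by sphere tracing, which advances along the ray by the current signed distance, a step that can never overshoot the surface. The sketch below assumes this technique (the source only says "ray tracing algorithm"); `sphere_trace` is hypothetical, and the analytic `unit_sphere_sdf` merely stands in for SDF-Net:

```python
import math
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def sphere_trace(sdf: Callable[[Vec3], float], origin: Vec3, direction: Vec3,
                 max_steps: int = 128, eps: float = 1e-5,
                 max_dist: float = 10.0) -> Optional[Vec3]:
    """Return the first point where the ray meets the zero level set of sdf, or None."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("ray direction must be non-zero")
    d = tuple(c / norm for c in direction)
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * c for o, c in zip(origin, d))
        dist = sdf(p)
        if abs(dist) < eps:
            return p          # close enough to the implicit surface: this is x_p
        t += dist             # the SDF value bounds the distance to the surface
        if t > max_dist:
            break             # the ray has left the scene without a hit
    return None


def unit_sphere_sdf(p: Vec3) -> float:
    """Stand-in SDF: unit sphere at the origin, negative inside."""
    return math.sqrt(sum(c * c for c in p)) - 1.0


print(sphere_trace(unit_sphere_sdf, (0.0, 0.0, 3.0), (0.0, 0.0, -1.0)))
```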
Step three: the method regards the bidirectional reflectance function as being composed of four components: the diffuse albedo, the diffuse shading, the specular albedo a_s, which is a constant, and the specular shading.
The quantities involved are x_p, the intersection point of a ray with the implicit surface; the intersection point of the ray with the initial coarse mesh and the normal at that point; the local position feature, global image feature and global depth feature corresponding to the j-th image; n_p, the normal at x_p; and v, the ray direction.
The implicit bidirectional reflectance function is thus expressed in terms of these four components.
Accordingly, different features are input into fully connected networks to compute the different component values:
specifically, the diffuse albedo is computed from the global image features and the local position features;
the diffuse shading and the specular shading are computed from the global depth features and the local position features.
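A minimal sketch of the feature routing described in step three, assuming concatenation as the fusion operator and a toy single-layer network in place of each fully connected component network. All names and sizes are hypothetical, and inputs such as x_p, the normals and the ray direction are omitted for brevity:

```python
import math
from typing import Dict, List, Sequence


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def component_network(x: Sequence[float], weights: Sequence[Sequence[float]],
                      bias: Sequence[float]) -> List[float]:
    """Toy stand-in for one component network: a single layer with outputs in (0, 1)."""
    return [sigmoid(z) for z in dense(x, weights, bias)]


def route_features(local_feat: Sequence[float], global_img_feat: Sequence[float],
                   global_depth_feat: Sequence[float]) -> Dict[str, List[float]]:
    """Build the fused input of each learned component, following step three:
    diffuse albedo   <- global image feature + local position feature;
    diffuse shading  <- global depth feature + local position feature;
    specular shading <- global depth feature + local position feature.
    """
    depth_local = list(global_depth_feat) + list(local_feat)
    return {
        "diffuse_albedo": list(global_img_feat) + list(local_feat),
        "diffuse_shading": depth_local,
        "specular_shading": list(depth_local),
    }
```

The specular albedo a_s is a constant in the text, so it receives no network in this sketch.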
Step four: compute the loss function and optimize the SDF-Net and each component network of the bidirectional reflectance function until the loss converges.
The loss function is formed from the following terms, with α1, α2, α3 and α4 as the first, second, third and fourth weighting coefficients:
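The equation for the total loss is not reproduced in the text, which lists five terms but four coefficients. One plausible reading, stated here only as an assumption, keeps the pixel-level loss unweighted and applies α1 to α4 to the remaining four terms in the order listed; `total_loss` is a hypothetical name:

```python
from typing import Sequence


def total_loss(pixel: float, mask: float, sdf: float, registration: float,
               normal: float, alphas: Sequence[float]) -> float:
    """Assumed weighting: L = L_pix + a1*L_mask + a2*L_sdf + a3*L_reg + a4*L_normal."""
    if len(alphas) != 4:
        raise ValueError("expected four coefficients alpha_1..alpha_4")
    a1, a2, a3, a4 = alphas
    return pixel + a1 * mask + a2 * sdf + a3 * registration + a4 * normal


print(total_loss(1.0, 2.0, 3.0, 4.0, 5.0, [0.5, 0.1, 1.0, 0.0]))
```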
Pixel-level loss: computed over P, the set of pixels in which the sampling points lie; for each such pixel p and each image i (i being the image index), it compares the pixel value of image i at p with c_p(i), the pixel value computed by the bidirectional reflectance function, taking into account the mask of image i at pixel p.
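A sketch of the pixel-level loss, assuming an L1 color error averaged over the sampled pixels that lie inside the mask (the exact norm and normalization are not given in the text; `pixel_loss` is a hypothetical name):

```python
from typing import Sequence


def pixel_loss(observed: Sequence[Sequence[float]],
               predicted: Sequence[Sequence[float]],
               mask: Sequence[int]) -> float:
    """Mean L1 RGB error over sampled pixels that fall inside the face mask.

    observed[k]  : pixel value of image i at the k-th sampled pixel p
    predicted[k] : c_p(i), the value produced by the reflectance networks
    mask[k]      : 1 if p lies inside the mask of image i, else 0
    """
    total, count = 0.0, 0
    for obs, pred, m in zip(observed, predicted, mask):
        if m:
            total += sum(abs(o - q) for o, q in zip(obs, pred))
            count += 1
    return total / count if count else 0.0
```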
Mask loss: defined on the mask M, where S is the set of colored pixels of image i, α(p) is an activation function, and α is a hyperparameter.
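The mask loss resembles the one used in implicit differentiable rendering, where a sigmoid of −α times the minimum signed distance along a ray acts as a soft occupancy that is compared with the mask by cross-entropy. The sketch below assumes that form, which the source does not confirm; the function names and the default α = 50 are illustrative:

```python
import math
from typing import Sequence


def soft_occupancy(min_sdf: float, alpha: float) -> float:
    """sigmoid(-alpha * min_sdf), computed without overflow."""
    z = -alpha * min_sdf
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mask_loss(min_sdf_along_rays: Sequence[float], in_mask: Sequence[int],
              alpha: float = 50.0) -> float:
    """Binary cross-entropy between mask labels and soft occupancy, scaled by 1/alpha."""
    eps = 1e-7
    total = 0.0
    for s, m in zip(min_sdf_along_rays, in_mask):
        o = min(max(soft_occupancy(s, alpha), eps), 1.0 - eps)
        total += -(m * math.log(o) + (1 - m) * math.log(1.0 - o))
    return total / (alpha * len(in_mask))
```

A ray whose minimum signed distance is negative has entered the surface, so its soft occupancy approaches 1; the loss is small when that agrees with the mask label.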
Signed distance field loss: defined as an expectation over sampled points.
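The text only says the signed distance field loss is an expectation. A common choice for SDF networks is the eikonal term E_x[(‖∇f(x)‖ − 1)²], which keeps f a valid distance function; the sketch below assumes that form and uses central differences in place of automatic differentiation so that it runs with the standard library alone (names are hypothetical):

```python
import math
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def numerical_gradient(f: Callable[[Vec3], float], p: Vec3, h: float = 1e-4) -> Vec3:
    """Central-difference gradient of a scalar field f at point p."""
    grad = []
    for k in range(3):
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        grad.append((f(tuple(plus)) - f(tuple(minus))) / (2.0 * h))
    return (grad[0], grad[1], grad[2])


def eikonal_loss(f: Callable[[Vec3], float], samples: Sequence[Vec3]) -> float:
    """Monte Carlo estimate of E[(||grad f(x)|| - 1)^2] over the sample points."""
    total = 0.0
    for p in samples:
        g = numerical_gradient(f, p)
        total += (math.sqrt(sum(c * c for c in g)) - 1.0) ** 2
    return total / len(samples)
```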
Registration loss: defined using the nearest neighbor of each point p*.
Normal loss.
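Assuming the nearest neighbor of p* is searched among the vertices of the initial coarse mesh and the loss is the mean squared distance between each point and its neighbor (neither detail is stated in the text), a brute-force sketch with hypothetical names is:

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def squared_distance(a: Vec3, b: Vec3) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(p: Vec3, candidates: Sequence[Vec3]) -> Vec3:
    """Linear search for the candidate closest to p."""
    return min(candidates, key=lambda q: squared_distance(p, q))


def registration_loss(points: Sequence[Vec3], reference: Sequence[Vec3]) -> float:
    """Mean squared distance from each point p* to its nearest reference vertex."""
    return sum(squared_distance(p, nearest_neighbor(p, reference))
               for p in points) / len(points)
```

For meshes with tens of thousands of vertices, a k-d tree would normally replace the linear search.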
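The form of the normal loss is not given. One common choice, assumed here, penalizes 1 − cos θ between the normal n_p at x_p and a reference normal, for example the normal at the ray's intersection with the initial coarse mesh; the names are hypothetical:

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def normal_loss(pred_normals: Sequence[Vec3], ref_normals: Sequence[Vec3]) -> float:
    """Mean (1 - cosine similarity): 0 when aligned, 2 when opposite."""
    total = 0.0
    for a, b in zip(pred_normals, ref_normals):
        a, b = normalize(a), normalize(b)
        total += 1.0 - sum(x * y for x, y in zip(a, b))
    return total / len(pred_normals)
```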
Step five: compute the detailed three-dimensional face geometric model from the optimized SDF-Net.
Here G_d denotes the detailed three-dimensional face geometric model, which is expressed in terms of the face vertex set, the normal n of each face vertex, the signed distance value of each face vertex, and the initial coarse mesh G_c.
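A minimal sketch of step five, assuming that G_d keeps the topology of G_c and moves each vertex v along its unit normal n by its signed distance, v' = v − s(v)·n, a first-order projection onto the zero level set. The exact formula is not reproduced in the text, the names are hypothetical, and an analytic sphere stands in for the optimized SDF-Net:

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def refine_vertices(vertices: Sequence[Vec3], normals: Sequence[Vec3],
                    sdf: Callable[[Vec3], float]) -> List[Vec3]:
    """Move each coarse-mesh vertex along its unit normal by -sdf(v)."""
    refined = []
    for v, n in zip(vertices, normals):
        length = math.sqrt(sum(c * c for c in n))
        unit = tuple(c / length for c in n)
        s = sdf(v)  # positive outside the surface, negative inside
        refined.append((v[0] - s * unit[0], v[1] - s * unit[1], v[2] - s * unit[2]))
    return refined


def unit_sphere_sdf(p: Vec3) -> float:
    return math.sqrt(sum(c * c for c in p)) - 1.0


# A vertex 0.2 outside and one 0.1 inside the sphere both land on its surface.
print(refine_vertices([(0.0, 0.0, 1.2), (0.9, 0.0, 0.0)],
                      [(0.0, 0.0, 1.0), (2.0, 0.0, 0.0)], unit_sphere_sdf))
```

Because the face list of G_c is reused unchanged, only vertex positions need to be stored for G_d under this assumption.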
The detailed three-dimensional face geometric model G_d is combined with the bidirectional reflectance value (i.e., the color value) of each vertex to form the detailed three-dimensional face model with high-fidelity texture.
Fig. 2 shows the reconstruction results of each method for five input face images. As can be seen from Fig. 2, the three-dimensional faces reconstructed by the present method (MMFG) contain more personalized detail than those of comparable methods (FaceScape, Pix2Vertex and PBIDR).
Fig. 3 shows the reconstructed geometry and texture results of the invention for four groups of samples, each presented in three columns: "input image" is the input picture, "generated 3d mesh" is the reconstructed geometry, and "texture 3d mesh" is the textured reconstruction. As can be seen from Fig. 3, the reconstruction results of the method contain both personalized geometric details and high-fidelity face textures.
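To combine G_d with the per-vertex color values into a single file, one option, assumed here and not specified by the source, is an ASCII PLY mesh with per-vertex RGB properties, which common mesh viewers can display. `to_ply` is a hypothetical helper that expects colors in [0, 1]:

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def to_ply(vertices: Sequence[Vec3], colors: Sequence[Vec3],
           faces: Sequence[Sequence[int]]) -> str:
    """Serialize a mesh with one RGB color per vertex as ASCII PLY text."""
    if len(vertices) != len(colors):
        raise ValueError("one color per vertex is required")
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(vertices)}",
        "property float x", "property float y", "property float z",
        "property uchar red", "property uchar green", "property uchar blue",
        f"element face {len(faces)}",
        "property list uchar int vertex_indices",
        "end_header",
    ]
    for (x, y, z), rgb in zip(vertices, colors):
        # Map [0, 1] floats to 0..255 bytes, clamping out-of-range values.
        r, g, b = (max(0, min(255, round(c * 255))) for c in rgb)
        lines.append(f"{x} {y} {z} {r} {g} {b}")
    for face in faces:
        lines.append(f"{len(face)} " + " ".join(str(i) for i in face))
    return "\n".join(lines) + "\n"
```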
Referring to FIG. 5, another embodiment of the invention provides a single-image detailed three-dimensional face reconstruction system, comprising:
the two-dimensional face image acquisition module is used for acquiring a two-dimensional face image;
the global feature acquisition module is used for carrying out three-dimensional face reconstruction, rendering and feature extraction on the two-dimensional face image to obtain global features;
the matching and calculating module is used for performing three-dimensional face reconstruction on the two-dimensional face image, casting rays through the rendered image, and calculating the signed distance values of the face vertices and the local position features;
the implicit bidirectional reflectance function calculation module is used for calculating the implicit bidirectional reflectance function from the global features and the local position features;
the optimization module is used for optimizing the SDF-Net and each component network of the implicit bidirectional reflectance function;
the detailed three-dimensional face geometric model calculation module is used for calculating the detailed three-dimensional face geometric model from the optimized SDF-Net and the signed distance values of the face vertices;
the color value calculation module is used for calculating the color value of each vertex with the optimized component networks of the bidirectional reflectance function;
and the combination module is used for combining the detailed three-dimensional face geometric model with the color value of each vertex to form a detailed three-dimensional face model with high-fidelity texture.
Another embodiment of the invention provides a computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the single-image detailed three-dimensional face reconstruction method described above.
Another embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the single-image detailed three-dimensional face reconstruction method described above.
The invention has the following advantages:
first: the method is specific to the input image and requires no additional training data. Because a self-supervised training method is adopted, three-dimensional face reconstruction from a single image is achieved using only the single input picture and the small dataset the method derives from it, omitting steps such as dataset construction and preprocessing;
second: the method represents the face texture with a bidirectional reflectance function to obtain high-fidelity three-dimensional face texture. The invention fuses the local position features with the global image and depth features and uses different fused features to guide the computation of the different bidirectional reflectance function components, so that a high-fidelity three-dimensional face can be better reconstructed.
The foregoing describes preferred embodiments of the invention and is not to be construed as limiting the claims. The invention is not limited to the above embodiments, and its specific structure may vary. All such variations falling within the scope of the appended claims are intended to be covered.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.