CN117011493B - Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation - Google Patents

Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation

Info

Publication number
CN117011493B
CN117011493B CN202311278686.6A CN202311278686A CN117011493B CN 117011493 B CN117011493 B CN 117011493B CN 202311278686 A CN202311278686 A CN 202311278686A CN 117011493 B CN117011493 B CN 117011493B
Authority
CN
China
Prior art keywords
dimensional
face
reconstruction
distance function
vertex
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311278686.6A
Other languages
Chinese (zh)
Other versions
CN117011493A (en)
Inventor
张力洋
柳欣
胡众旺
徐素文
黄忠湖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tiandu Xiamen Science And Technology Co ltd
Original Assignee
Tiandu Xiamen Science And Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tiandu Xiamen Science And Technology Co ltd filed Critical Tiandu Xiamen Science And Technology Co ltd
Priority to CN202311278686.6A priority Critical patent/CN117011493B/en
Publication of CN117011493A publication Critical patent/CN117011493A/en
Application granted granted Critical
Publication of CN117011493B publication Critical patent/CN117011493B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention relates to a three-dimensional face reconstruction method, a device and equipment based on symbol distance function representation, wherein the method comprises the following steps: extracting a face area image of a user; matching and fitting the face region image with a three-dimensional deformable face statistical model to obtain a rough three-dimensional geometric shape of the face, and marking the rough three-dimensional geometric shape as a first reconstruction result; converting the first reconstruction result based on a symbol distance function to obtain three-dimensional implicit expression data of the first reconstruction result, and optimizing the three-dimensional implicit expression data through a geometric refinement network and a differentiable rendering network to obtain a fine second reconstruction result; and converting the second reconstruction result back to three-dimensional geometric data in the form of a three-dimensional grid topological structure, and carrying out vertex registration and refinement on the three-dimensional geometric data to generate a final three-dimensional face reconstruction model. The invention can improve the quality and accuracy of the face reconstruction and meet the requirements of users on high-quality face representation.

Description

Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation
Technical Field
The invention belongs to the technical field of face reconstruction, and particularly relates to a three-dimensional face reconstruction method, device and equipment based on symbol distance function representation.
Background
Three-dimensional face reconstruction plays a key role in computer vision and graphics: by recovering a three-dimensional face model from a two-dimensional image or video, it enables many applications, for example in face recognition and identity verification, virtual reality and augmented reality, personalized virtual character creation, digital entertainment and the creative industry, and face editing and beautification. It therefore offers broad possibilities for face recognition, virtual reality, medical applications, the creative industry and the like, and brings more convenience and innovation to daily life.
Single-image face reconstruction refers to recovering a three-dimensional face model from a single two-dimensional image. Compared with traditional face reconstruction, it presents clear differences and difficulties. First, single-image face reconstruction uses only one two-dimensional image, whereas conventional face reconstruction typically requires multiple images or a video sequence. This increases the difficulty, because recovering a complete three-dimensional face structure from one image is a complex problem: the algorithm must infer missing depth, angle and shape information from limited data. Second, single-image face reconstruction must cope with interference from illumination, expression, pose and other factors. A single image usually cannot provide accurate information about a face under different illumination conditions, and the influence of facial expression and head pose on the face shape must also be considered; the complexity of these factors increases the challenge of an accurate conversion from a two-dimensional image to a three-dimensional face model. In addition, single-image face reconstruction must also solve the problem of texture and detail restoration. Texture information in a two-dimensional image is limited, while a three-dimensional face model needs to contain more detail and depth information, so accurately restoring facial texture and detail during reconstruction is difficult and typically requires texture synthesis and enhancement algorithms to improve reconstruction quality. Single-image face reconstruction therefore poses significant challenges compared with conventional face reconstruction methods.
Traditional three-dimensional face reconstruction methods fall into two categories, supervised and unsupervised. Supervised methods typically rely on large-scale labeled three-dimensional face datasets and achieve reconstruction by learning a mapping from two-dimensional images to three-dimensional models. However, these methods require significant manpower and resources to acquire and annotate three-dimensional data, which limits their scalability and cost effectiveness in practical applications. In addition, supervised methods are limited by illumination, viewing angle, expression and other factors, and high-quality three-dimensional reconstruction results are difficult to obtain. To overcome these limitations, unsupervised three-dimensional face reconstruction methods have emerged. By recovering the three-dimensional structure of a face from a single two-dimensional image, unsupervised methods do not depend on a labeled three-dimensional dataset, which reduces the cost of data acquisition and annotation and provides greater flexibility and adaptability. With the development of deep learning, unsupervised three-dimensional face reconstruction based on deep neural networks has attracted wide attention. Such methods use network architectures such as generative adversarial networks and variational autoencoders, and realize the reconstruction from two-dimensional image to three-dimensional model by learning the features and distribution of a large number of two-dimensional face images. However, these methods still face challenges such as handling illumination and expression changes and capturing details and textures.
Disclosure of Invention
The invention aims to provide a three-dimensional face reconstruction method, device and equipment based on a symbol distance function representation so as to solve the problems.
In order to solve the technical problems, the invention is realized by the following technical scheme:
a three-dimensional face reconstruction method based on symbol distance function representation comprises the following steps:
s1, acquiring a front head portrait image of a user, and extracting a face area image from the front head portrait image by using a face detection method;
s2, acquiring a pre-trained three-dimensional deformable face statistical model, matching and fitting the face region image with the three-dimensional deformable face statistical model to obtain a rough three-dimensional geometric shape of the face, and marking the rough three-dimensional geometric shape as a first reconstruction result;
s3, converting the first reconstruction result based on a symbol distance function to obtain three-dimensional implicit expression data of the first reconstruction result, and optimizing the three-dimensional implicit expression data through a geometric refinement network and a differentiable rendering network to obtain a fine second reconstruction result;
and S4, converting the second reconstruction result back to three-dimensional geometric data in the form of a three-dimensional grid topological structure, and carrying out vertex registration and refinement on the three-dimensional geometric data to generate a final three-dimensional face reconstruction model.
Preferably, step S1 specifically includes:
acquiring a front head portrait image I_o of the user;
Acquiring a face region in the front head portrait image by adopting a yolov7-face detection network model;
uniformly scaling the face region to N_w × N_h pixels to obtain a face region image I:
I = resize(Fdetector(I_o), (N_w, N_h))
wherein Fdetector represents the yolov7-face detection network model, resize represents the image resizing operation, N_w is the adjusted width, and N_h is the adjusted height.
Preferably, step S2 specifically includes:
aiming at the input face region image I, constructing the corresponding face rough three-dimensional geometry G and the corresponding UV-unfolded face texture image T_uv by using the Deep3DFace method;
wherein G = {V, S} consists of the geometric vertex set V = {v_1, …, v_{N_v}} and the geometric surfaces S connecting the vertices, N_v being the total number of vertices; UV unfolding aims at spreading the information of the three-dimensional face model surface onto a two-dimensional plane by way of planar projection.
Preferably, in step S3, the mesh structure of the face rough three-dimensional geometry G is a non-closed three-dimensional surface, and the ground-truth (GT) value of the symbol distance function for this mesh structure is obtained as follows: for a point x outside the geometric surface S of the face rough three-dimensional geometry G, a ray is cast from x along the propagation direction v assigned to it, and the intersection point x_s of this ray with the reconstructed surface is obtained by calculation with the sphere-tracing operator; SDF_gt(x) is the symbol distance function value of the geometric surface S referenced at the point x, and ε is defined as the thickness value of the mesh structure.
Preferably, in step S3, the geometric refinement network F_geo is formally defined as:
s(x) = F_geo(γ(x))
wherein γ(·) is the position-encoding operation, and s(x) represents the symbol distance function predicted value of the point x in three-dimensional space;
the differentiable rendering network F_render is formally defined as:
c(x_s) = F_render(x_s, v, n(x_s))
wherein v represents the light propagation direction value of the point x, n(x_s) represents the normal vector at x_s, and c(x_s) represents the RGB value predicted by the differentiable rendering network at x_s.
Preferably, in step S3, the overall optimization loss L(x) at the point x during optimization is expressed as follows:
L(x) = L_init(x) + λ1·L_eik(x) + λ2·L_photo(x)
wherein L_init(x) = |s(x) − SDF_gt(x)| is the coarse geometry initialization loss, L_eik(x) = E[(‖∇s(x)‖₂ − 1)²] is the Eikonal loss, and L_photo(x) = |c(x_s) − T_uv(p)| is the photometric loss;
∇ is the gradient operator, E is the mathematical expectation, T_uv(p) is the pixel value of the corresponding point p in the original input UV-unfolded face texture image T_uv, and λ1, λ2 are the weights.
Preferably, step S4 specifically includes:
carrying out forward propagation on an input vertex in the optimization process of a geometric refinement network on the rough three-dimensional geometric shape of the face to obtain a corresponding symbol distance function value;
gradient calculation is carried out by calculating the difference of the symbol distance function values of the adjacent vertexes, so that the normal direction of each vertex is obtained;
at each vertex, displacement is performed according to the estimated normal direction and the corresponding symbol distance function value, and the optimized symbol distance function is converted back into the original three-dimensional mesh topology to generate the final three-dimensional face reconstruction model; the magnitude of the displacement depends on the symbol distance function value, that is, points near the implicit surface have a smaller displacement and points far from the implicit surface have a larger displacement, and the refined vertex obtained by vertex-displacement registration is expressed as:
v̂_i = v_i − s(v_i)·n(v_i)
wherein v_i is the i-th vertex in the geometric vertex set V, and n(v_i) represents the normal vector at the vertex v_i.
The embodiment of the invention also provides a three-dimensional face reconstruction device based on the symbol distance function representation, which comprises:
the face region extraction unit is used for acquiring a front head portrait image of a user and extracting a face region image from the front head portrait image by using a face detection method;
the first reconstruction unit is used for acquiring a pre-trained three-dimensional deformable face statistical model, matching and fitting the face region image with the three-dimensional deformable face statistical model to obtain a rough three-dimensional geometric shape of the face, and marking the rough three-dimensional geometric shape as a first reconstruction result;
the second reconstruction unit is used for converting the first reconstruction result based on a symbol distance function to obtain three-dimensional implicit expression data of the first reconstruction result, and optimizing the three-dimensional implicit expression data through a geometric refinement network and a differentiable rendering network to obtain a fine second reconstruction result;
and the vertex registration unit is used for converting the second reconstruction result back to three-dimensional geometric data in the form of a three-dimensional grid topological structure, and carrying out vertex registration and refinement on the three-dimensional geometric data to generate a final three-dimensional face reconstruction model.
The embodiment of the invention also provides three-dimensional face reconstruction equipment based on the symbol distance function representation, which comprises a memory and a processor, wherein the memory stores a computer program which can be executed by the processor to realize three-dimensional face reconstruction based on the symbol distance function representation.
In summary, after the above scheme is adopted, the beneficial effects of the invention are mainly represented in the following aspects:
(1) The three-dimensional face reconstruction method provided by the embodiment of the invention can more accurately capture the shape details of the face, such as wrinkles and the like, avoid the reconstruction which is too smooth in vision, and keep the accuracy and the authenticity of the geometric structure.
(2) The embodiment of the invention only utilizes the monocular camera equipment commonly arranged in daily electronic equipment to acquire the front head portrait image information of the user, and does not depend on complex multi-view or depth sensor equipment, so that the application of the embodiment of the invention is more convenient and practical, can be widely applied to various intelligent equipment and application scenes, and provides better face reconstruction experience for the user.
(3) The unsupervised learning mode has high applicability, and the unsupervised face reconstruction method does not need a large-scale marked three-dimensional data set as a training sample. The method reduces the cost of data acquisition and labeling, is more flexible and extensible, and can be suitable for various face types and features.
(4) The embodiment of the invention can refine and optimize the face geometry automatically by using a fitting optimization algorithm and an implicit expression technology. By fitting and adjusting the image data, a high-quality face model can be quickly generated, the requirement of manual intervention is reduced, and the efficiency and the accuracy of face reconstruction are improved.
(5) The consistency of the geometric structures is maintained, and the consistency of the topological order of the face geometric structure can be maintained in the registering and refining process. This means that the reconstructed three-dimensional asset is more accurate and realistic in shape and topology, does not suffer from distortion or deformation, provides a more reliable three-dimensional asset, and facilitates further image- or audio-driven animation operations.
In general, the embodiment of the invention has wide application prospect and commercial value in the fields of face recognition, virtual reality, digital entertainment and the like. The method provides an effective solution for the technical development in the fields, can improve the quality and accuracy of face reconstruction, meets the requirements of users on high-quality face representation, and promotes the development and innovation of related industries.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a three-dimensional face reconstruction method based on a symbol distance function representation according to a first embodiment of the present invention.
Fig. 2 is a three-dimensional face reconstruction framework diagram.
Fig. 3 is a schematic structural diagram of a three-dimensional face reconstruction device based on a symbol distance function representation according to a second embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a first embodiment of the present invention provides a three-dimensional face reconstruction method based on a symbol distance function, which may be implemented by a three-dimensional face reconstruction device (hereinafter referred to as a reconstruction device) based on a symbol distance function, and in particular, executed by one or more processors in the reconstruction device, to implement the following steps:
s1, acquiring a front head portrait image of a user, and extracting a face area image from the front head portrait image by using a face detection method.
In this embodiment, the reconstruction device may be an electronic device equipped with a camera, such as a smart phone or a smart tablet. The reconstruction device acquires the front head portrait image I_o of the user through the equipped camera, and then extracts the face region image I of the user by using a face detection network.
Specifically, this embodiment can use the yolov7-face detection network model to acquire the face region from the front head portrait image I_o, and then uniformly scale the face region to N_w × N_h pixels to obtain the face region image I, which can be described as:
I = resize(Fdetector(I_o), (N_w, N_h))
wherein Fdetector represents the yolov7-face detection network model, resize represents the image resizing operation, N_w is the adjusted width, and N_h is the adjusted height. The present embodiment selects a fixed size for (N_w, N_h), but is not limited thereto.
In addition, it should be noted that other algorithms or models may be used for face detection, and the present invention is not limited in particular.
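As an illustration only, a minimal sketch of step S1 is given below, assuming the yolov7-face detector has been wrapped as a callable `Fdetector` that returns a single bounding box (x1, y1, x2, y2); the wrapper itself, the OpenCV-based resizing and the example target size of 224 × 224 pixels are assumptions of this sketch, not values fixed by the embodiment.

```python
# Sketch of step S1 under the assumption that a yolov7-face detector is wrapped
# as a callable returning one bounding box (x1, y1, x2, y2); the wrapper name
# `Fdetector` and the target size (n_w, n_h) = (224, 224) are illustrative only.
import cv2
import numpy as np

def extract_face_region(front_image: np.ndarray, Fdetector, n_w: int = 224, n_h: int = 224) -> np.ndarray:
    """Detect the face region in the frontal head image and resize it to (n_w, n_h)."""
    x1, y1, x2, y2 = Fdetector(front_image)       # face bounding box from the detector
    face_crop = front_image[y1:y2, x1:x2]         # crop the detected face region
    face_img = cv2.resize(face_crop, (n_w, n_h))  # uniform scaling: I = resize(Fdetector(I_o))
    return face_img
```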
S2, acquiring a pre-trained three-dimensional deformable face statistical model, matching and fitting the face region image with the three-dimensional deformable face statistical model to obtain a rough three-dimensional geometric shape of the face, and marking the rough three-dimensional geometric shape as a first reconstruction result.
In this embodiment, the three-dimensional deformable face statistical model provides statistical information about the three-dimensional shape, three-dimensional texture, facial expression and the like of a face, and can recover a high-quality, realistic three-dimensional face model from a single image without supervision. By fitting and optimizing the face region image I against this prior three-dimensional deformable face statistical model, reconstruction of the face geometry and texture can be realized. In particular, in this embodiment the three-dimensional deformable face statistical model is predicted using the currently available Deep3DFace deep learning model as a prior.
Specifically, for the input face region image I, the Deep3DFace method is used to construct the corresponding face rough three-dimensional geometry G and the corresponding UV-unfolded face texture image T_uv.
Here G = {V, S} consists of the geometric vertex set V = {v_1, …, v_{N_v}} and the geometric surfaces S connecting the vertices, where N_v is the number of vertices. UV unfolding aims at tiling and spreading the information of the three-dimensional face model surface onto a two-dimensional plane by way of planar projection; it is a standard operator and a routine operation for three-dimensional models and their corresponding texture maps.
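For readability, the following sketch fixes one possible in-memory layout for the first reconstruction result, i.e. the coarse geometry G = {V, S} and the UV-unfolded texture T_uv; the `CoarseFace` container and its field names are illustrative assumptions, and the Deep3DFace fitting itself is treated as an external step that is not reimplemented here.

```python
# Sketch of the data produced by step S2: the coarse geometry G = {V, S} and the
# UV-unfolded texture T_uv. The container layout below is an assumption used by
# later sketches, not a structure prescribed by the embodiment.
from dataclasses import dataclass
import numpy as np

@dataclass
class CoarseFace:
    vertices: np.ndarray    # V: (N_v, 3) float array of geometric vertices v_i
    faces: np.ndarray       # S: (N_f, 3) int array, each row indexing three vertices
    uv_texture: np.ndarray  # T_uv: (H, W, 3) UV-unfolded face texture image

    @property
    def num_vertices(self) -> int:
        return self.vertices.shape[0]  # N_v, the total number of vertices
```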
And S3, converting the first reconstruction result by adopting a function based on the symbol distance to obtain three-dimensional implicit expression data of the first reconstruction result, and optimizing the three-dimensional implicit expression data through a geometric refinement network and a differentiable rendering network to obtain a fine second reconstruction result.
In this embodiment, after the first reconstruction result is fitted through the Deep3DFace network, since the expression form of the three-dimensional deformable face statistical model is limited by the low-dimensional expression capability of the model base, fine details such as wrinkles cannot be reconstructed on high-frequency details, so that the three-dimensional implicit expression form based on the symbol distance function is introduced to better describe the geometric structure of the face, and finer geometric details are presented.
Firstly, converting the first reconstruction result by adopting a function based on a symbol distance to obtain three-dimensional implicit expression data of the first reconstruction result.
Wherein the symbolic distance function is a common three-dimensional implicit expression form for describing the geometry of the object. The symbolic distance function is implicitly expressed by defining a function that returns the distance to the nearest surface for a given spatial coordinate and takes into account the normal vector information of the surface. A distance of positive indicates that the point is outside the object, a distance of negative indicates that the point is inside the object, and a distance of zero indicates that the point is on the object surface.
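The sign convention described above can be illustrated with the analytic signed distance of a sphere; the example below is unrelated to the face model and serves only as a minimal, self-contained demonstration.

```python
# Tiny illustration of the sign convention: positive outside, negative inside,
# zero on the surface, using a sphere of radius r centered at c (not part of the disclosure).
import numpy as np

def sphere_sdf(x: np.ndarray, c: np.ndarray, r: float) -> float:
    return float(np.linalg.norm(x - c) - r)  # >0 outside, <0 inside, =0 on the surface

print(sphere_sdf(np.array([2.0, 0.0, 0.0]), np.zeros(3), 1.0))  #  1.0 (outside)
print(sphere_sdf(np.array([0.5, 0.0, 0.0]), np.zeros(3), 1.0))  # -0.5 (inside)
```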
For the face rough three-dimensional geometry G obtained in the above step, the mesh structure is a non-closed three-dimensional surface, so in this embodiment the GT (Ground Truth) value of the symbol distance function for G is obtained as follows: for a point x outside the geometric surface S of the face rough three-dimensional geometry G, a ray is cast from x along the propagation direction v assigned to it, and the intersection point x_s of this ray with the reconstructed surface is obtained by calculation with the sphere-tracing operator; SDF_gt(x) is the symbol distance function value of the geometric surface S referenced at the point x, and ε is defined as the thickness value of the mesh structure, introduced to better take fine deviations into account.
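A generic sphere-tracing routine of the kind referred to above can be sketched as follows; the callable `sdf`, the iteration budget and the convergence tolerance are assumptions made for illustration and are not prescribed by the embodiment.

```python
# Generic sphere-tracing sketch for locating the intersection x_s of the ray
# x + t*v with the surface {sdf == 0}. The `sdf` callable, step budget and
# tolerance are illustrative assumptions.
import numpy as np

def sphere_trace(sdf, x: np.ndarray, v: np.ndarray, max_steps: int = 64, eps: float = 1e-4):
    """March along direction v from x, stepping by the (unsigned) distance bound."""
    v = v / np.linalg.norm(v)
    t = 0.0
    for _ in range(max_steps):
        p = x + t * v
        d = sdf(p)
        if abs(d) < eps:   # close enough to the surface: p approximates x_s
            return p
        t += abs(d)        # safe step: the surface is at least |d| away
    return None            # no intersection found within the budget
```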
And then, carrying out fitting optimization on the constructed three-dimensional implicit expression data.
In this embodiment, the mapping relationship for optimizing the three-dimensional implicit expression data is constructed using two deep network structures, namely the geometric refinement network F_geo and the differentiable rendering network F_render. The geometric refinement network F_geo is formally defined as:
s(x) = F_geo(γ(x))
wherein γ(·) is the position-encoding operation used in NeRF, γ(x) is the position encoding of the point x, and s(x) is the symbol distance function predicted value of the point x in three-dimensional space.
In this embodiment, the geometric refinement network F_geo adopts a multi-layer perceptron (MLP) structure composed of 8 fully connected layers, in which a skip connection is introduced to connect the input layer with the 4th layer to enhance information transmission and gradient flow, and a Softplus activation function is used between layers for non-linear activation. The input dimension of the input layer is 3, representing the coordinate position of a point in three-dimensional space; the dimension of the intermediate layers is 512; and the output dimension of the final output layer is 1, representing the symbol distance function value corresponding to the point. With this network structure, the geometric refinement network of this embodiment is able to learn and predict the corresponding symbol distance function value from the three-dimensional coordinate position of an input point, so that reconstruction and refinement of the geometric surface from a discrete point set can be realized and more accurate geometric information is provided for subsequent three-dimensional reconstruction and application.
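A PyTorch sketch of such a geometric refinement network is given below, following the stated structure (8 fully connected layers, 512 hidden units, a skip connection feeding the input into the 4th layer, Softplus activations, a 1-dimensional signed-distance output); any detail not stated in the text, such as the Softplus beta value or weight initialization, is an assumption.

```python
# PyTorch sketch of the geometric refinement network F_geo as described above.
# Unstated details (Softplus beta, positional-encoding width) are assumptions.
import torch
import torch.nn as nn

class GeometryRefineNet(nn.Module):
    def __init__(self, in_dim: int = 3, hidden: int = 512, n_layers: int = 8, skip_at: int = 4):
        super().__init__()
        self.skip_at = skip_at
        layers = []
        for i in range(n_layers):
            d_in = in_dim if i == 0 else hidden
            if i == skip_at:                      # skip connection: re-inject the input here
                d_in = hidden + in_dim
            d_out = 1 if i == n_layers - 1 else hidden
            layers.append(nn.Linear(d_in, d_out))
        self.layers = nn.ModuleList(layers)
        self.act = nn.Softplus(beta=100)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for i, layer in enumerate(self.layers):
            if i == self.skip_at:
                h = torch.cat([h, x], dim=-1)     # concatenate the coordinates at layer 4
            h = layer(h)
            if i < len(self.layers) - 1:
                h = self.act(h)
        return h                                   # predicted SDF value s(x), shape (..., 1)
```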
The differentiable rendering network F_render is formally defined as:
c(x_s) = F_render(x_s, v, n(x_s))
wherein v represents the light propagation direction value of the point x, n(x_s) represents the normal vector at x_s, and c(x_s) represents the RGB value predicted by the differentiable rendering network at the intersection point x_s.
In this embodiment, the differentiable rendering network F_render is designed as a multi-layer perceptron (MLP) structure with four fully connected layers, with ReLU activation functions between layers and a Tanh activation function on the output layer to improve model performance and training stability. The dimension of the intermediate layers is 512, and the input dimension of the input layer is 9, representing the components of the rendering parameters (the intersection point, the light propagation direction and the normal vector) used to define attributes such as illumination and viewing angle during rendering so as to produce the final rendering effect; the output dimension of the output layer is 3, representing the RGB color value at the intersection point x_s. With such a network structure, realistic color information at the intersection point can be predicted and generated from the input rendering parameters.
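A corresponding PyTorch sketch of the differentiable rendering network is shown below (four fully connected layers, 512 hidden units, ReLU between layers, Tanh on the output, a 9-dimensional input and a 3-dimensional RGB output); splitting the 9-dimensional input into intersection point, ray direction and normal is an assumption consistent with the formal definition above.

```python
# PyTorch sketch of the differentiable rendering network F_render as described.
import torch
import torch.nn as nn

class RenderNet(nn.Module):
    def __init__(self, in_dim: int = 9, hidden: int = 512, out_dim: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim), nn.Tanh(),  # RGB in [-1, 1], rescaled outside if needed
        )

    def forward(self, x_s: torch.Tensor, v: torch.Tensor, normal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_s, v, normal], dim=-1))  # predicted color c(x_s)
```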
In this embodiment, the end-to-end learning of geometry and rendering properties can be achieved through a simultaneous optimization geometry refinement network and a differentiable rendering network, covering the coordinate points of the entire geometry surface S. In the simultaneous optimization process, the geometry refinement network and the differentiable rendering network are coupled to jointly optimize the objective function.
At the point x, the overall optimization loss L(x) is expressed as follows:
L(x) = L_init(x) + λ1·L_eik(x) + λ2·L_photo(x)
wherein L_init represents the coarse geometry initialization loss, L_init(x) = |s(x) − SDF_gt(x)|, which constrains the symbol distance function value s(x) predicted by the geometric refinement network F_geo at the point x with the ground-truth value SDF_gt(x), forcing the geometric refinement network F_geo to update its parameters.
Using the coarse geometry initialization loss L_init alone only considers the locations of the surface points and ignores distance continuity and smoothness constraints, which may lead to insufficient accuracy of the initialization result. This embodiment therefore introduces the Eikonal loss L_eik to achieve a regularization effect. Specifically, the Eikonal loss is optimized by minimizing the difference between the gradient of the implicit function output and the unit gradient, which preserves the continuity and smoothness of the distance field so that the final initialization result represents the face geometry more accurately:
L_eik(x) = E[(‖∇s(x)‖₂ − 1)²]
wherein ∇ is the gradient operator and E is the mathematical expectation. The optimization loss of the geometric refinement network F_geo at the point x is thus composed of L_init and L_eik.
through the steps, the three-dimensional geometric structure based on the 3DMM statistical model is reconstructed from the input face region image I, and is implicitly represented by using a symbol distance function, and the embodiment introduces a differential rendering mode to optimize the three-dimensional implicit expression, and allows parameters of the implicit function to be directly optimized through a back propagation algorithm so as to minimize rendering loss and improve reconstruction results.
For the intersection point x_s of the ray cast from the point x along the viewing direction v with the reconstructed surface, calculated in the above step, this embodiment constructs a photometric loss L_photo to measure the difference between the rendered pixel value and the ground-truth pixel value:
L_photo(x) = |c(x_s) − T_uv(p)|
wherein T_uv(p) is the pixel value of the corresponding point p in the original input UV-unfolded face texture image T_uv.
Finally, the above losses are jointly optimized; the overall optimization loss at the point x is formalized as:
L(x) = L_init(x) + λ1·L_eik(x) + λ2·L_photo(x)
wherein λ1 and λ2 are the weights.
it can be seen that the embodiment optimizes the geometry refinement network and the differentiable rendering network simultaneously, realizes the end-to-end learning of the geometry and the rendering attribute, and covers the coordinate points of the whole geometry surface. In the simultaneous optimization process, the geometry refinement network and the differentiable rendering network are coupled to jointly optimize the objective function. The objective function comprehensively considers the difference of rendering results and the loss of geometric deformation, and simultaneously updates the parameters of the geometric refinement network and the differentiable rendering network in each iteration through a back propagation algorithm so as to gradually improve the reconstruction results. The differentiable rendering network provides a feedback signal of the rendering result, guides the geometric refinement network to optimize parameters of the implicit function, and further improves the accuracy of the differentiable rendering network by adjusting the parameters of the implicit function.
And S4, converting the second reconstruction result back to three-dimensional geometric data in the form of a three-dimensional grid topological structure, and carrying out vertex registration and refinement on the three-dimensional geometric data to generate a final three-dimensional face reconstruction model.
The step S4 specifically includes:
carrying out forward propagation on an input vertex in the optimization process of a geometric refinement network on the rough three-dimensional geometric shape of the face to obtain a corresponding symbol distance function value;
gradient calculation is carried out by calculating the difference of the symbol distance function values of the adjacent vertexes, so that the normal direction of each vertex is obtained;
at each vertex, displacement is performed according to the estimated normal direction and the corresponding symbol distance function value, and the optimized symbol distance function is converted back into the original three-dimensional mesh topology to generate the final three-dimensional face reconstruction model; the magnitude of the displacement depends on the symbol distance function value, namely, points near the implicit surface have a smaller displacement and points far from the implicit surface have a larger displacement, and the refined vertex obtained by vertex-displacement registration is expressed as:
v̂_i = v_i − s(v_i)·n(v_i),  i = 1, …, N_v
wherein v_i is the i-th vertex in the geometric vertex set V, N_v is the total number of vertices, and n(v_i) represents the normal vector at the vertex v_i.
Through the steps, the optimized symbol distance function can be converted into an original three-dimensional grid topological structure, so that a reconstruction result with higher precision and authenticity is obtained, and the topological consistency of the original grid is kept, so that fine three-dimensional face reconstruction is realized.
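A minimal sketch of the vertex registration and refinement of step S4 is given below; it takes the SDF gradient with automatic differentiation instead of finite differences between neighboring vertices, which is an implementation assumption, while the displacement rule v̂_i = v_i − s(v_i)·n(v_i) follows the formula above.

```python
# Sketch of step S4: query the optimized SDF at each coarse-mesh vertex, estimate
# the normal from the SDF gradient, and displace each vertex along that normal by
# its signed-distance value. Autograd gradients are an assumption of this sketch.
import torch

def register_vertices(geo_net, vertices: torch.Tensor) -> torch.Tensor:
    """vertices: (N_v, 3) coarse-mesh vertices; returns the refined vertices."""
    v = vertices.clone().requires_grad_(True)
    s = geo_net(v)                                              # SDF values s(v_i), shape (N_v, 1)
    grad = torch.autograd.grad(s.sum(), v)[0]                   # gradient of the SDF at each vertex
    normals = grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)   # unit normal direction n(v_i)
    refined = v - s * normals                                   # move each vertex onto the surface
    return refined.detach()                                     # topology (faces S) is unchanged
```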
In summary, after the above scheme is adopted, the beneficial effects of the invention are mainly represented in the following aspects:
(1) The three-dimensional face reconstruction method provided by the embodiment of the invention can more accurately capture the shape details of the face, such as wrinkles and the like, avoid the reconstruction which is too smooth in vision, and keep the accuracy and the authenticity of the geometric structure.
(2) The embodiment of the invention only utilizes the monocular camera equipment commonly arranged in daily electronic equipment to acquire the front head portrait image information of the user, and does not depend on complex multi-view or depth sensor equipment, so that the application of the embodiment of the invention is more convenient and practical, can be widely applied to various intelligent equipment and application scenes, and provides better face reconstruction experience for the user.
(3) The unsupervised learning mode has high applicability, and the unsupervised face reconstruction method does not need a large-scale marked three-dimensional data set as a training sample. The method reduces the cost of data acquisition and labeling, is more flexible and extensible, and can be suitable for various face types and features.
(4) The embodiment of the invention can refine and optimize the face geometry automatically by using a fitting optimization algorithm and an implicit expression technology. By fitting and adjusting the image data, a high-quality face model can be quickly generated, the requirement of manual intervention is reduced, and the efficiency and the accuracy of face reconstruction are improved.
(5) The consistency of the geometric structures is maintained, and the consistency of the topological order of the face geometric structure can be maintained in the registering and refining process. This means that the reconstructed three-dimensional asset is more accurate and realistic in shape and topology, does not suffer from distortion or deformation, provides a more reliable three-dimensional asset, and facilitates further image- or audio-driven animation operations.
In general, the embodiment of the invention has wide application prospect and commercial value in the fields of face recognition, virtual reality, digital entertainment and the like. The method provides an effective solution for the technical development in the fields, can improve the quality and accuracy of face reconstruction, meets the requirements of users on high-quality face representation, and promotes the development and innovation of related industries.
The effect of the embodiment of the invention can be further demonstrated and verified from the following experimental results.
Different face images were selected for testing, with the effect shown in fig. 2. The first column is the input original two-dimensional image; the second column is the effect of rendering the 3DMM model mesh obtained by the initial Deep3DFace reconstruction onto the original image; the third column is the effect of rendering the finally obtained refined mesh onto the original image; the fourth column is the effect of rendering the finally obtained refined mesh with the texture map attached onto the original image. As shown, compared with reconstruction based on the 3DMM basis, the embodiment of the invention preserves more of the high-frequency features of the face (such as wrinkles) and performs well when processing inputs with different skin tones. However, because the input information available for three-dimensional face reconstruction from a single image is very limited, geometric distortion can still occur for some inputs with occlusion or additional illumination effects, and the invention still needs further optimization and improvement in such cases.
To further evaluate the performance of the method of the invention, the embodiment of the invention was compared with several face reconstruction methods on the NoW dataset (Sanyal S, Bolkart T, Feng H, et al. Learning to regress 3D face shape and expression from an image without 3D supervision [C] // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 7763-7772). Specifically, method 1: PRNet (Feng Y, Wu F, Shao X, et al. Joint 3D face reconstruction and dense alignment with position map regression network [C] // Proceedings of the European Conference on Computer Vision (ECCV). 2018: 534-551) and method 2: RingNet (Sanyal S, Bolkart T, Feng H, et al. Learning to regress 3D face shape and expression from an image without 3D supervision [C] // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 7763-7772) were selected for comparison. The comparison results are shown in Table 1; it can be seen that the invention achieves a finer reconstruction effect than methods 1 and 2.
Table 1. Comparison of evaluation indices on the NoW dataset
Referring to fig. 3, the second embodiment of the present invention further provides a three-dimensional face reconstruction device based on a symbol distance function representation, which includes:
a face region extraction unit 210, configured to obtain a front head portrait image of a user, and extract a face region image from the front head portrait image by using a face detection method;
the first reconstruction unit 220 is configured to obtain a pre-trained three-dimensional deformable face statistical model, match and fit the face region image with the three-dimensional deformable face statistical model, obtain a rough three-dimensional geometric shape of the face, and record the rough three-dimensional geometric shape as a first reconstruction result;
the second reconstruction unit 230 is configured to convert the first reconstruction result by using a symbol distance function to obtain three-dimensional implicit expression data of the first reconstruction result, and optimize the three-dimensional implicit expression data through a geometric refinement network and a differentiable rendering network to obtain a fine second reconstruction result;
and the vertex registration unit 240 is configured to convert the second reconstruction result back to three-dimensional geometric data in the form of a three-dimensional mesh topology structure, and perform vertex registration and refinement on the three-dimensional geometric data to generate a final three-dimensional face reconstruction model.
The third embodiment of the present invention further provides a three-dimensional face reconstruction device based on a symbol distance function representation, which includes a memory and a processor, where the memory stores a computer program, and the computer program is capable of being executed by the processor to implement the three-dimensional face reconstruction based on the symbol distance function representation.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein is merely one relationship describing the association of the associated objects, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)", depending on the context.
References to "first\second" in the embodiments are merely to distinguish similar objects and do not represent a particular ordering for the objects, it being understood that "first\second" may interchange a particular order or precedence where allowed. It is to be understood that the "first\second" distinguishing aspects may be interchanged where appropriate, such that the embodiments described herein may be implemented in sequences other than those illustrated or described herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. A three-dimensional face reconstruction method based on symbol distance function representation is characterized by comprising the following steps:
s1, acquiring a front head portrait image of a user, acquiring a face area in the front head portrait image by adopting a yolov7-face detection network model, and uniformly scaling the face area to the pixel size to obtain a face area image I:
where Fdetector represents the yolov7-face detection network model, resize represents the size change operation of the image,for the adjusted dimension width +.>The size and the height are adjusted;
s2, acquiring a pre-trained three-dimensional deformable face statistical model, constructing a corresponding face rough three-dimensional geometric shape G and a corresponding UV unfolded face texture image by using a Deep3DFace method aiming at an input face region image I, obtaining the face rough three-dimensional geometric shape, and recording the face rough three-dimensional geometric shape as a first reconstruction result,
wherein G= { V, S }, consists of a geometric shape vertex set and a geometric shape surface S connected by the vertices,for the total number of vertexes, the UV expansion aims at spreading and expanding the information on the surface of the three-dimensional face model to a two-dimensional plane in a plane projection mode;
s3, converting the first reconstruction result based on a symbol distance function to obtain three-dimensional implicit expression data of the first reconstruction result, optimizing the three-dimensional implicit expression data through a geometric refinement network and a differentiable rendering network to obtain a fine second reconstruction result,
wherein the geometrically refined network formalization is defined as follows:
wherein the method comprises the steps ofFor position-coding operations, +.>A sign distance function predictor representing a sign distance function at a point in three-dimensional space, wherein the differentiable rendering network +.>Formalized definition is as follows:
,
wherein,light propagation direction value representing point x, +.>Representation->Normal vector at>Representing that the differentiable rendering network predicts to be in +.>The RGB values at which the color data are obtained,
wherein the mesh structure of the face rough three-dimensional geometry G is a non-closed three-dimensional surface, and the GT value of the symbol distance function for the mesh structure is obtained as follows: for a point x outside the geometric surface S of the face rough three-dimensional geometry G, the intersection point x_s of the ray cast from x along the light propagation direction v with the reconstructed surface is calculated using the sphere-tracing operator; SDF_gt(x) is the symbol distance function value of the geometric surface S referenced at the point x, and ε is defined as the thickness value of the mesh structure;
s4, converting the second reconstruction result back to three-dimensional geometric data in the form of a three-dimensional grid topological structure, carrying out vertex registration and refinement on the three-dimensional geometric data to generate a final three-dimensional face reconstruction model, wherein,
the vertex registration of the three-dimensional geometric data is as follows: for the rough three-dimensional geometry of the human face, forward propagation is carried out on input vertexes in the optimization process of a geometric refinement network to obtain corresponding symbol distance function values, gradient calculation is carried out by calculating the difference of symbol distance function values of adjacent vertexes to obtain the normal direction of each vertex, displacement is carried out at each vertex according to the estimated normal direction and the corresponding symbol distance function values, the optimized symbol distance function is converted into an original three-dimensional grid topological structure to generate a final three-dimensional human face reconstruction model,
the vertex refinement of the three-dimensional geometric data is as follows: points near the implicit surface have a small displacement, while points far from the implicit surface have a large displacement, and the refined vertex obtained by vertex-displacement registration is expressed as:
v̂_i = v_i − s(v_i)·n(v_i)
wherein v_i is the i-th vertex in the geometric vertex set V, and n(v_i) represents the normal vector at the vertex v_i.
2. The three-dimensional face reconstruction method based on symbol distance function representation according to claim 1, wherein in step S3 the overall optimization loss L(x) at the point x during optimization is expressed as follows:
L(x) = L_init(x) + λ1·L_eik(x) + λ2·L_photo(x)
wherein L_init(x) = |s(x) − SDF_gt(x)| is the coarse geometry initialization loss, L_eik(x) = E[(‖∇s(x)‖₂ − 1)²] is the Eikonal loss, and L_photo(x) = |c(x_s) − T_uv(p)| is the photometric loss;
∇ is the gradient operator, E is the mathematical expectation, T_uv(p) is the pixel value of the corresponding point p in the original input UV-unfolded face texture image T_uv, and λ1, λ2 are the weights.
3. A three-dimensional face reconstruction device based on a symbolic distance function representation, comprising:
the face region extraction unit is used for acquiring a front head portrait image I_o of a user, acquiring a face region in the front head portrait image by adopting a yolov7-face detection network model, and uniformly scaling the face region to N_w × N_h pixels to obtain a face region image I:
I = resize(Fdetector(I_o), (N_w, N_h))
where Fdetector represents the yolov7-face detection network model, resize represents the image resizing operation, N_w is the adjusted width, and N_h is the adjusted height;
the first reconstruction unit is used for acquiring a pre-trained three-dimensional deformable face statistical model, constructing a corresponding face rough three-dimensional geometry G and a corresponding UV-unfolded face texture image T_uv by using a Deep3DFace method for the input face region image I, obtaining the face rough three-dimensional geometry, and recording it as a first reconstruction result,
wherein G = {V, S} consists of the geometric vertex set V = {v_1, …, v_{N_v}} and the geometric surfaces S connected by the vertices, N_v being the total number of vertices; UV unfolding spreads the information on the surface of the three-dimensional face model onto a two-dimensional plane by way of planar projection;
the second reconstruction unit is used for converting the first reconstruction result based on a symbol distance function to obtain three-dimensional implicit expression data of the first reconstruction result, and optimizing the three-dimensional implicit expression data through a geometric refinement network and a differentiable rendering network to obtain a fine second reconstruction result,
wherein the geometric refinement network F_geo is formally defined as:
s(x) = F_geo(γ(x))
wherein γ(·) is the position-encoding operation, and s(x) represents the symbol distance function predicted value of a point x in three-dimensional space, and wherein the differentiable rendering network F_render is formally defined as:
c(x_s) = F_render(x_s, v, n(x_s))
wherein v represents the light propagation direction value of the point x, n(x_s) represents the normal vector at x_s, and c(x_s) represents the RGB value predicted by the differentiable rendering network at x_s,
wherein the mesh structure of the face rough three-dimensional geometry G is a non-closed three-dimensional surface, and the GT value of the symbol distance function for the mesh structure is obtained as follows: for a point x outside the geometric surface S of the face rough three-dimensional geometry G, the intersection point x_s of the ray cast from x along the light propagation direction v with the reconstructed surface is calculated using the sphere-tracing operator; SDF_gt(x) is the symbol distance function value of the geometric surface S referenced at the point x, and ε is defined as the thickness value of the mesh structure;
a vertex registration unit for converting the second reconstruction result back to three-dimensional geometric data in the form of a three-dimensional grid topological structure, performing vertex registration and refinement on the three-dimensional geometric data to generate a final three-dimensional face reconstruction model, wherein,
the vertex registration of the three-dimensional geometric data is as follows: for the rough three-dimensional geometry of the human face, forward propagation is carried out on input vertexes in the optimization process of a geometric refinement network to obtain corresponding symbol distance function values, gradient calculation is carried out by calculating the difference of symbol distance function values of adjacent vertexes to obtain the normal direction of each vertex, displacement is carried out at each vertex according to the estimated normal direction and the corresponding symbol distance function values, the optimized symbol distance function is converted into an original three-dimensional grid topological structure to generate a final three-dimensional human face reconstruction model,
the vertex refinement of the three-dimensional geometric data is as follows: points near the implicit surface have a small displacement, while points far from the implicit surface have a large displacement, and the refined vertex obtained by vertex-displacement registration is expressed as:
v̂_i = v_i − s(v_i)·n(v_i)
wherein v_i is the i-th vertex in the geometric vertex set V, and n(v_i) represents the normal vector at the vertex v_i.
4. A three-dimensional face reconstruction device based on a symbolic distance function representation, comprising a memory and a processor, wherein the memory has stored therein a computer program executable by the processor to implement the three-dimensional face reconstruction method based on a symbolic distance function representation as claimed in any one of claims 1 to 2.
CN202311278686.6A 2023-10-07 2023-10-07 Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation Active CN117011493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311278686.6A CN117011493B (en) 2023-10-07 2023-10-07 Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311278686.6A CN117011493B (en) 2023-10-07 2023-10-07 Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation

Publications (2)

Publication Number Publication Date
CN117011493A CN117011493A (en) 2023-11-07
CN117011493B true CN117011493B (en) 2024-01-16

Family

ID=88574727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311278686.6A Active CN117011493B (en) 2023-10-07 2023-10-07 Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation

Country Status (1)

Country Link
CN (1) CN117011493B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184899A (en) * 2020-11-06 2021-01-05 中山大学 Three-dimensional reconstruction method based on symbolic distance function
WO2023085624A1 (en) * 2021-11-15 2023-05-19 Samsung Electronics Co., Ltd. Method and apparatus for three-dimensional reconstruction of a human head for rendering a human image
CN114494576A (en) * 2021-12-23 2022-05-13 南京大学 Rapid high-precision multi-view face three-dimensional reconstruction method based on implicit function
CN114648613A (en) * 2022-05-18 2022-06-21 杭州像衍科技有限公司 Three-dimensional head model reconstruction method and device based on deformable nerve radiation field
CN116071494A (en) * 2022-12-23 2023-05-05 杭州像衍科技有限公司 High-fidelity three-dimensional face reconstruction and generation method based on implicit nerve function
CN116310045A (en) * 2023-04-24 2023-06-23 天度(厦门)科技股份有限公司 Three-dimensional face texture creation method, device and equipment
CN116452748A (en) * 2023-04-25 2023-07-18 电子科技大学 Implicit three-dimensional reconstruction method, system, storage medium and terminal based on differential volume rendering

Also Published As

Publication number Publication date
CN117011493A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
Jam et al. A comprehensive review of past and present image inpainting methods
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
Huang et al. Indoor depth completion with boundary consistency and self-attention
Tauber et al. Review and preview: Disocclusion by inpainting for image-based rendering
Bronstein et al. Calculus of nonrigid surfaces for geometry and texture manipulation
Ahmed et al. Dense correspondence finding for parametrization-free animation reconstruction from video
Li et al. Detail-preserving and content-aware variational multi-view stereo reconstruction
Fyffe et al. Multi‐view stereo on consistent face topology
Ward et al. Depth director: A system for adding depth to movies
Zhong et al. Towards practical sketch-based 3d shape generation: The role of professional sketches
Hernandez et al. Accurate 3D face reconstruction via prior constrained structure from motion
Jin et al. Robust 3D face modeling and reconstruction from frontal and side images
EP3474185B1 (en) Classification of 2d images according to types of 3d arrangement
Xue et al. Symmetric piecewise planar object reconstruction from a single image
Yung et al. Efficient feature-based image registration by mapping sparsified surfaces
Liu et al. A new model-based method for multi-view human body tracking and its application to view transfer in image-based rendering
Zhang et al. Portrait relief modeling from a single image
Wang et al. Image-based building regularization using structural linear features
Pan et al. 3D reconstruction of Borobudur reliefs from 2D monocular photographs based on soft-edge enhanced deep learning
Zhan et al. Real-time 3D face modeling based on 3D face imaging
Wang et al. Copy and paste: Temporally consistent stereoscopic video blending
CN117011493B (en) Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation
Khan et al. Towards monocular neural facial depth estimation: Past, present, and future
Maninchedda et al. Semantic 3d reconstruction of heads

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant