CN113111861A - Face texture feature extraction method, 3D face reconstruction method, device and storage medium

Face texture feature extraction method, 3D face reconstruction method, device and storage medium

Info

Publication number
CN113111861A
Authority
CN
China
Prior art keywords
face
image
geometric model
initial
texture map
Legal status
Pending
Application number
CN202110519887.5A
Other languages
Chinese (zh)
Inventor
陈达勤
王淳
浣军
宋博宁
娄明
李曈
Current Assignee
Beijing Shenshang Technology Co ltd
Original Assignee
Beijing Shenshang Technology Co ltd
Application filed by Beijing Shenshang Technology Co., Ltd.
Priority to CN202110519887.5A
Publication of CN113111861A

Classifications

    • G06V 40/168: Recognition of biometric, human-related or animal-related patterns in image or video data; Human faces; Feature extraction; Face representation
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06V 10/267: Image preprocessing; Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 40/165: Human faces; Detection; Localisation; Normalisation using facial parts and geometric relationships

Abstract

The invention discloses a face texture feature extraction method, a 3D face reconstruction method, a device and a storage medium. The method comprises the following steps: extracting the 2D face keypoints of a face image; obtaining an initial texture map according to the face geometric model; mapping the face keypoints to the initial texture map to obtain the initial position of each keypoint; predicting a motion field for each keypoint to obtain the predicted position of each keypoint in the initial texture map; and constructing a transformation from the initial and predicted positions of the keypoints and warping the initial texture map to obtain the final texture map. With this scheme, the face keypoints are mapped to the initial texture map to obtain initial positions, the positions of the keypoints in the initial texture map are predicted, and the initial texture map is warped using the initial and predicted positions to obtain the final texture map. The resulting texture map is more realistic, the sensory difference between the rendered image and the actual image during 3D face construction is reduced, and the realism of the constructed 3D face model is improved.

Description

Face texture feature extraction method, 3D face reconstruction method, device and storage medium
Technical Field
The invention belongs to the technical field of 3D face modeling, and particularly relates to a face texture feature extraction method, a 3D face reconstruction method, a device and a storage medium.
Background
Image-based 3D face reconstruction recovers a face from a single image or a limited number of images, which may be taken by any device in any scene. Current image-based 3D face reconstruction is commonly built on the 3D Morphable Model (3DMM): a 3D face model is obtained by solving for suitable 3DMM parameters from an image.
A 3D face model is usually represented in mesh format, consisting of vertices and triangular faces (patches). Each vertex has 3D coordinates (x, y, z), and the triangular faces define the connections between vertices. Usually the connectivity of the triangular faces is kept fixed, and the vertex coordinates are adjusted to obtain face models of different shapes. As a parameterized statistical model, the 3DMM expresses the 3D vertex coordinates of the model by the following formula:
S = \bar{S} + B_{id}\alpha + B_{exp}\beta

where S holds the 3D coordinates of the vertices of the final 3D face model; \bar{S} is the mean face 3D model (mean geometry) in the statistical sense; B_{id} is a set of mutually orthogonal linear bases obtained from a statistical model, governing the geometry of the face under a neutral expression (face shape, shapes of the facial features, etc.); and B_{exp} is a set of mutually orthogonal linear bases obtained from a statistical model, corresponding to expressions, also called blendshapes.
\bar{S}, B_{id} and B_{exp} are fixed values given by the 3DMM; given any legal set of coefficients α, β, the 3D geometry of a corresponding face is determined. Recovering 3D geometric information from a face image therefore means solving for a suitable set of coefficients α and β such that the face shape represented by S matches the given face image.
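As a minimal numerical sketch of this formula (the dimensions and the stand-in basis values below are illustrative assumptions; in practice \bar{S}, B_{id} and B_{exp} come from a published 3DMM):

```python
import numpy as np

# Illustrative dimensions (assumptions): N vertices, 80 identity bases, 64 expression bases.
N, K_ID, K_EXP = 35709, 80, 64

S_mean = np.zeros(3 * N)                      # mean face geometry, flattened (x, y, z)
B_id = np.random.randn(3 * N, K_ID) * 1e-3    # stand-in for the identity bases
B_exp = np.random.randn(3 * N, K_EXP) * 1e-3  # stand-in for the expression bases (blendshapes)

def face_geometry(alpha, beta):
    """Evaluate S = S_mean + B_id @ alpha + B_exp @ beta, reshaped to (N, 3)."""
    return (S_mean + B_id @ alpha + B_exp @ beta).reshape(-1, 3)

S = face_geometry(np.zeros(K_ID), np.zeros(K_EXP))  # zero coefficients give the mean face
```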
Some 3DMM models also include a statistical model T for texture:
T = \bar{T} + B_{t}\delta

where \bar{T} is the average texture and B_{t} is a fixed value given by the 3DMM. Different textures can be obtained by solving for an appropriate coefficient δ, but such textures have a low sense of realism.
Existing schemes for estimating deformable 3D face model parameters from images regress the model parameters from the face image with a neural network, construct the face geometric model and texture from those parameters, and render. However, because the texture is not taken from the real image, the sensory difference between the rendered image and the actual image is large, and computing a visual difference between them is unreliable. How to reduce this sensory difference and improve the realism of the constructed 3D face model is therefore worth studying.
Disclosure of Invention
To solve the problem that unrealistic texture extraction in the prior art causes a large sensory difference between the rendered image and the actual image, the invention provides a face texture feature extraction method, a 3D face reconstruction method, a device and a storage medium. By predicting a motion field for the face keypoints and warping the texture map with the predicted motion field, the extracted texture map becomes more accurate, the sensory difference between the rendered image and the actual image is reduced, and the realism of the constructed 3D face model is improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a face texture feature extraction method in a first aspect, which comprises the following steps:
extracting 2D face key points of the face image;
obtaining an initial texture map according to the face geometric model;
mapping the key points of the human face to an initial texture map to obtain the initial positions of the key points;
predicting the motion field of each key point to obtain the predicted position of each key point in the initial texture map;
constructing a transformation according to the initial positions and the predicted positions of the key points, and warping the initial texture map to obtain a final texture map.
With this scheme, the face keypoints are mapped to the initial texture map to obtain initial positions, the positions of the keypoints in the initial texture map are predicted, and the initial texture map is warped using the initial and predicted positions to obtain the final texture map. The resulting texture map is more realistic, the sensory difference between the rendered image and the actual image during 3D face construction is reduced, and the realism of the constructed 3D face model is improved.
In one possible design, the obtaining the initial texture map according to the geometric model of the face includes:
projecting the geometric model of the human face, and determining the corresponding position of the vertex of the geometric model of the human face which is not shielded in the image of the human face;
extracting color information of the corresponding position of the vertex of the geometric model of the face which is not shielded in the face image;
and expanding the extracted color information according to the sampling relation between the human face geometric model and the texture map to obtain an initial texture map.
In one possible design, a neural network model is used to predict the motion field of each keypoint to obtain the predicted position of each keypoint.
The second aspect of the present invention provides a 3D face reconstruction method, including the following steps:
acquiring a face geometric model according to the face image;
extracting a face texture image from a face image based on a face geometric model by adopting the face texture feature extraction method of the first aspect;
converting a human face geometric model into information in a 2D mode, wherein the information in the 2D mode is a depth map or a normal vector map;
and inputting the information of the 2D mode and the face texture map into the trained neural network model to obtain a rendered image of the 3D face.
In 3D face reconstruction, the texture map is one of the key factors affecting the realism of the constructed 3D face model. A texture map acquired in this way is more realistic, the sensory difference between the rendered image and the actual image during 3D face construction is reduced, and the realism of the constructed 3D face model is improved.
In one possible design, the obtaining the geometric model of the human face includes:
acquiring an initial face geometric model of at least one face image and converting the at least one face image into a first silhouette image;
acquiring a projected image of the face geometric model, and converting the projected image into a second silhouette image;
and adjusting relevant parameters of the face geometric model and vertex coordinates of a corresponding optimization area of the face geometric model to enable the difference between the second silhouette image and the first silhouette image to be smaller than a threshold value, and acquiring the face geometric model corresponding to the second silhouette image at the moment as a final face geometric model, wherein the relevant parameters of the face geometric model comprise a parameter alpha relevant to the shape of the face, a parameter beta relevant to the expression of the face, a scale parameter s, a rotation parameter R and a translation parameter t.
When reconstructing a 3D face, the face geometric model is also one of the key factors in the realism of the constructed 3D face model. Existing methods estimate the face shape inaccurately, so the sensory difference between the rendered image and the actual image is large. This scheme further optimizes the face geometric model, improving the accuracy of face-shape estimation and hence the realism of the constructed 3D face model. When optimizing the face geometric model, the initial model is converted into the second silhouette image, and the difference between the second silhouette image and the first silhouette image converted from the face image is minimized by adjusting the relevant parameters of the face geometric model; that is, the initial face geometric model is adjusted to the optimal model, improving the accuracy of face-shape estimation.
In one possible design, the method further includes:
performing semantic segmentation on a face image to obtain at least one semantic segmentation region, wherein the semantic segmentation region comprises an eye region, an eyebrow region, a nose region and a mouth region;
determining vertex coordinates respectively corresponding to each semantic segmentation area in the at least one semantic segmentation area in the face geometric model;
the adjusting of the relevant parameters of the face geometric model and the vertex coordinates of the corresponding area of the face geometric model comprises:
and under the condition of keeping the vertex coordinates corresponding to other semantic segmentation areas unchanged, adjusting the related parameters of the face geometric model and the vertex coordinates corresponding to one semantic segmentation area to ensure that the difference between the first and second silhouette images is smaller than a threshold value.
According to the scheme, at least one semantic segmentation area is optimized and adjusted to achieve local optimization.
In a possible design, the adjusting the relevant parameters of the geometric face model and the vertex coordinates of the corresponding optimized region of the geometric face model so that the difference between the first and second silhouette images is smaller than a threshold value includes:
performing mask processing on an unoccluded area in the semantic segmentation area to obtain a mask image of the unoccluded area;
constructing a loss function according to the vertex coordinates of the corresponding optimized region of the face geometric model, the first silhouette image, the second silhouette image and the mask image of the unoccluded region;
calculating a derivative of a loss function to a relevant parameter of the face geometric model;
and adjusting the relevant parameters of the face geometric model to the minimum loss function according to the gradient by using a gradient descent algorithm.
A third aspect of the present invention provides a 3D face reconstruction device, comprising:
the image preprocessing unit is used for processing the face image to obtain 2D key points of the face image;
the face geometric model acquisition unit is used for acquiring a face geometric model according to a face image;
the initial texture feature extraction unit is used for extracting an initial texture image from the face image according to the face geometric model;
the texture warping unit is used for warping the initial texture map according to the fact that 2D face key points of the face image are mapped to key points in the initial texture map and a predicted motion field of each key point to obtain a final texture map;
the face geometric model conversion unit converts the face geometric model into information in a 2D mode, wherein the information in the 2D mode is a depth map or a normal vector map;
and the rendering unit generates a 3D face according to the information of the 2D mode and the final texture map.
A fourth aspect of the present invention provides an apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements a face texture feature extraction method as described in any one of the first aspect or a 3D face reconstruction method as described in the second aspect when executing the computer program.
A fifth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a face texture feature extraction method as described in any one of the first aspects or a 3D face reconstruction method as described in the second aspect.
Compared with the prior art, the invention at least has the following advantages and beneficial effects:
1. With this scheme, the keypoints are mapped to the initial texture map to obtain initial positions, the motion field of each keypoint is predicted to obtain predicted positions, and a transformation constructed from the initial and predicted keypoint positions warps the texture. The texture map obtained through this warping is more realistic, which improves the rendering effect and the accuracy of 3D face reconstruction.
2. With this scheme, the face texture features are warped in the 2D domain, which to some extent compensates for the sampling deviation caused by inaccurate estimation of the face geometric model, turns a deterministic sampling process into flexible sampling with a certain tolerance, and improves the accuracy of subsequent 3D face construction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a process diagram of preprocessing a face image.
Fig. 2 is a flowchart of a method for extracting facial texture features.
Fig. 3 is a flowchart of a 3D face reconstruction method.
FIG. 4 is a silhouette image of a geometric model of a human face.
Fig. 5 is a schematic diagram of a 3D face reconstruction device.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. Specific structural and functional details disclosed herein are merely illustrative of example embodiments of the invention. This invention may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments of the present invention.
It should be understood that the term "and/or", as it may appear herein, merely describes an association between objects and means that three relationships may exist; e.g., "A and/or B" may mean: A exists alone, B exists alone, or both A and B exist. In addition, the character "/", as it may appear herein, generally means that the associated objects before and after it are in an "or" relationship.
It should be understood that specific details are provided in the following description to facilitate a thorough understanding of example embodiments. However, it will be understood by those of ordinary skill in the art that the example embodiments may be practiced without these specific details. For example, systems may be shown in block diagrams in order not to obscure the examples in unnecessary detail. In other instances, well-known processes, structures and techniques may be shown without unnecessary detail in order to avoid obscuring example embodiments.
Examples
As shown in fig. 2, a first aspect of the present invention provides a face texture feature extraction method, which may be executed on a feature extraction device such as a computer, a server, or a portable intelligent device. The face texture feature extraction method comprises the following steps S101 to S105.
Step S101: the user uploads a face image, as shown in panel a of fig. 1. After receiving the face image, the feature extraction device first preprocesses the face, which includes extracting the 2D keypoints of the face image, giving panel b of fig. 1. The face in the image is then corrected and aligned to a standard reference coordinate system according to the keypoints, giving a normalized face image. Further, the normalized face image can be processed with additional preprocessing operations, including but not limited to deblurring, super-resolution and illumination equalization, to improve image quality and hence the accuracy with which the 3D face model coefficients and the extrinsic parameters relative to the camera coordinate system are acquired.
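A minimal sketch of the alignment step using OpenCV follows; the reference keypoint layout and the output size are assumptions of this sketch, not values from the patent:

```python
import cv2
import numpy as np

def normalize_face(image, keypoints_2d, ref_points, out_size=(256, 256)):
    """Align a face image to a standard reference coordinate system using a
    similarity transform estimated from corresponding 2D keypoints.
    keypoints_2d and ref_points are (K, 2) arrays; the reference layout
    (canonical positions of eyes, nose tip, mouth corners, ...) is assumed."""
    M, _ = cv2.estimateAffinePartial2D(
        keypoints_2d.astype(np.float32), ref_points.astype(np.float32)
    )  # 2x3 similarity matrix: rotation + uniform scale + translation
    return cv2.warpAffine(image, M, out_size)
```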
Semantic segmentation is applied to the face image to obtain at least one semantic-segmentation region and the region of any occluding object, where the semantic-segmentation regions include an eye region, an eyebrow region, a nose region and a mouth region. Any of these regions may be occluded by an occluding object, or unoccluded, i.e. a valid face region; occluding objects include glasses, masks and other obstructions. As shown in panel c of fig. 1, each distinct color represents a specific semantic region of the face (since panel c is rendered in black and white, the semantic regions appear at different gray levels); the resulting mask is denoted I_{mask}. If the face image contains occlusion, it can be completed with a face-completion algorithm before the texture map is extracted. If multiple face images are available, a good texture map can be produced by multi-image complementary fusion and/or a super-resolution algorithm combined with a face-completion algorithm.
Step S102: obtain an initial texture map according to the face geometric model. Specifically, the face geometric model is first obtained from the face image, and the initial texture map is then obtained from the model, as steps S1021 to S1022 below.
Step S1021: obtain the face geometric model from the face image. Since errors may exist in the normalization process, the normalized face image is input into a trained parameter-regression model to obtain the 3D face model coefficients corresponding to the face image and the extrinsic parameters of the face model relative to the camera coordinate system, where the 3D face model coefficients comprise a parameter α related to face shape and a parameter β related to facial expression, and the extrinsic parameters comprise a scale parameter s, a rotation parameter R and a translation parameter t.
A face geometric model is constructed from the estimated 3D face model coefficients and the extrinsic parameters relative to the camera coordinate system, and the corresponding positions are obtained through similarity-transform alignment, yielding the initial face geometric model S_0:
S_0 = s \cdot R \cdot (\bar{S} + B_{id}\alpha + B_{exp}\beta) + t

where \bar{S} is the mean face 3D model (mean geometry) in the statistical sense; B_{id} is a set of mutually orthogonal linear bases obtained from a statistical model, governing the geometry of the face under a neutral expression (face shape, shapes of the facial features, etc.); and B_{exp} is a set of mutually orthogonal linear bases obtained from a statistical model, corresponding to expressions, also called blendshapes.
If the number of vertices of the geometric model is small, step S1015 may be performed: vertex interpolation and/or mesh smoothing is applied to the initial face geometric model S_0. This increases the number of vertices while maintaining the continuity of the face mesh, raises the sampling rate, improves the definition of the texture map, and gives the fine-tuning of the face geometric model more degrees of freedom so that it can better fit the shape in the face image.
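The patent does not fix the smoothing algorithm; the sketch below uses simple Laplacian smoothing as one plausible realization of this step:

```python
import numpy as np

def laplacian_smooth(vertices, faces, iterations=3, lam=0.5):
    """Move each vertex a fraction lam toward the mean of its neighbors.
    vertices: (N, 3) float array; faces: (F, 3) vertex-index triples."""
    neighbors = [set() for _ in range(len(vertices))]
    for a, b, c in faces:                      # build adjacency from the triangles
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    V = vertices.astype(np.float64).copy()
    for _ in range(iterations):
        means = np.array([V[list(nb)].mean(axis=0) if nb else V[i]
                          for i, nb in enumerate(neighbors)])
        V += lam * (means - V)
    return V
```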
Step S1022: after the face geometric model is determined, the initial texture map is determined according to it.
Specifically, the face geometric model is projected, and the corresponding position in the face image of each unoccluded vertex of the model is determined, where an occluded region may be, for example, a region occluded by glasses or by another occluding object.
The color information at those corresponding positions in the face image is extracted.
The extracted color information is unwrapped according to the sampling relation between the face geometric model and the texture map to obtain the initial texture map; specifically, a UV-unwrapping operation is performed on the color information. If the resulting texture map is unclear, a super-resolution algorithm can be used to improve its definition.
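A minimal sketch of the sampling-and-unwrapping step follows; it uses nearest-neighbor sampling and per-vertex splatting for brevity (a production pipeline would rasterize inside each UV triangle), and all array conventions are assumptions:

```python
import numpy as np

def sample_vertex_colors(image, vertices_2d, visible_mask):
    """Sample a color for each unoccluded vertex at its projected 2D
    position (nearest-neighbor sampling; vertices_2d is (N, 2) in (x, y))."""
    H, W = image.shape[:2]
    xy = np.round(vertices_2d).astype(int)
    xy[:, 0] = np.clip(xy[:, 0], 0, W - 1)
    xy[:, 1] = np.clip(xy[:, 1], 0, H - 1)
    colors = np.zeros((len(vertices_2d), 3), dtype=np.float32)
    colors[visible_mask] = image[xy[visible_mask, 1], xy[visible_mask, 0]]
    return colors

def unwrap_to_texture(vertex_colors, uv_coords, tex_size=512):
    """Splat per-vertex colors onto the UV plane according to the fixed
    sampling relation (UV coordinates) between the model and the texture."""
    tex = np.zeros((tex_size, tex_size, 3), dtype=np.float32)
    uv = np.clip((uv_coords * (tex_size - 1)).astype(int), 0, tex_size - 1)
    tex[uv[:, 1], uv[:, 0]] = vertex_colors
    return tex
```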
Step S103: map the face keypoints to the initial texture map to obtain the initial position of each keypoint.
Step S104: use a neural network model to predict the motion field of each keypoint, obtaining the predicted position of each keypoint in the initial texture map.
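The patent leaves the predictor architecture open; the small CNN regressor below is an illustrative assumption of how per-keypoint 2D offsets could be predicted from the initial texture map:

```python
import torch
import torch.nn as nn

class KeypointMotionPredictor(nn.Module):
    """Regress one 2D offset (a motion-field sample) per keypoint from the
    initial texture map. The architecture is a hypothetical choice."""
    def __init__(self, num_keypoints=68):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_keypoints * 2)

    def forward(self, texture):                # texture: (B, 3, H, W)
        f = self.features(texture).flatten(1)  # (B, 64)
        # predicted position = initial position + returned offset
        return self.head(f).view(-1, self.num_keypoints, 2)
```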
Step S105: construct a transformation from the initial and predicted positions of the keypoints, and warp the initial texture map to obtain the final texture map.
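One way to realize this step is a piecewise-affine (or thin-plate-spline) warp fitted to the two point sets; the patent only requires that a transformation be constructed from them. A sketch using scikit-image:

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def warp_texture(initial_tex, init_pts, pred_pts):
    """Warp the initial texture map so each keypoint moves from init_pts to
    pred_pts (points given as (x, y)). skimage's warp() expects a map from
    output coordinates to input coordinates, so the transform is estimated
    in the direction predicted -> initial."""
    tform = PiecewiseAffineTransform()
    tform.estimate(np.asarray(pred_pts, float), np.asarray(init_pts, float))
    # Pixels outside the keypoint mesh are undefined for this transform and
    # come out as the fill value in this sketch.
    return warp(initial_tex, tform)
```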
With the face texture feature extraction method provided by the first aspect, the texture is warped; if occlusion exists in a semantic-segmentation region of the face image, completion or multi-image fusion based on multiple face images can be used to obtain a complete texture, improving texture integrity. The deep-learning-based neural network model realizes local warping deformation, which compensates the texture-extraction process for errors in the face geometric model and effectively improves the definition of the texture.
A second aspect of the present invention provides a 3D face reconstruction method, which may be executed on a face reconstruction device such as a computer, a server, or a portable intelligent device. As shown in fig. 3, the 3D face reconstruction method comprises the following steps S201 to S204:
step S201, a human face geometric model is obtained according to the human face image. This step can be obtained based on steps S101 and S102 in the first aspect, and is not described herein again.
Step S202, extracting a face texture image from the face image based on the face geometric model by adopting the face texture feature extraction method of the first aspect, and the specific process is detailed in the specific steps of the first aspect.
Step S203, converting the face geometric model into information of a 2D mode, wherein the information of the 2D mode is a depth map or a normal vector map;
and S204, inputting the 2D mode information and the face texture map into the trained neural network model to obtain a rendered image of the 3D face.
A third aspect of this embodiment provides a 3D face reconstruction method that is an optimization of the method of the second aspect. The 3D face reconstruction method specifically comprises the following steps S301 to S304:
step S301, acquiring a face geometric model according to the face image, wherein the step S3011 to step S30312 are specifically included.
Step S3011, an initial face geometric model may be obtained based on steps S101 and S102 in the first aspect.
Step S3012, further optimizing the face geometric model, specifically, the optimizing steps are as follows:
Convert the face image uploaded by the user into a first silhouette image I_{silhouette}.
Obtain a projected image of the face geometric model and convert it into a second silhouette image IS_{silhouette}. In this step, the projected image of the face geometric model can be obtained by applying a perspective-projection operation to the model.
Further, adjusting relevant parameters of the face geometric model and vertex coordinates of a corresponding optimization area of the face geometric model to enable the difference between the second silhouette image and the first silhouette image to be smaller than a threshold value, and acquiring the face geometric model corresponding to the second silhouette image at the moment as a final face geometric model.
The difference between the first and second silhouette images may be handled as follows: mask the unoccluded region among the semantic-segmentation regions to obtain a mask image of the unoccluded region; construct a loss function L, take the derivatives of L with respect to the five parameters α, β, s, R and t, and adjust the parameter values along the gradient with a gradient descent algorithm. Repeating the iteration several times minimizes L, i.e. adjusts the parameters to their optimal values.
Step S3012 performs iterative optimization on the initial geometric model; the iterative optimization may be local optimization, global optimization, or local optimization followed by global optimization. When a certain region is being optimized, the first and second silhouette images are further processed: the corresponding optimization regions of both images are set to the same color, which may be black or white, while the remaining regions are set to the opposite color.
The detailed process of this embodiment is first described for global optimization only: the eye, eyebrow, nose and mouth regions of the first and second silhouette images are set to white and the face region to black, as shown in fig. 4.
Constructing a loss function L:
L = \|(I_{silhouette} - IS_{silhouette}(\alpha, \beta, s, R, t)) \odot I_{mask}\|_1 + \sum_{i \in S_{all}} \sum_{j \in N_i} \lambda_i \|\Delta V_i - \Delta V_j\|^2

where I_{silhouette} is the first silhouette image converted from the face image; IS_{silhouette}(α, β, s, R, t) is the second silhouette image converted from the face geometric model; I_{mask} is the mask image of the unoccluded face region obtained by semantic segmentation; λ_i is the weight on the change of relative displacement between V_i and V_j; V_i denotes the i-th vertex of the optimization region in the face image; V_j denotes a vertex adjacent to V_i; N_i is the set of indices of the vertices V_j adjacent to V_i; and S_{all} is the set of indices of the vertices V_i in the eyebrow, eye, nose and mouth regions of the face.
Take the derivatives of the loss function L with respect to the five parameters α, β, s, R and t, and adjust the parameter values along the gradient with a gradient descent algorithm. Repeating the iteration several times minimizes L; that is, the parameters are adjusted to their optimal values such that the difference between the second and first silhouette images is smaller than the threshold, and the optimal face geometric model corresponding to the second silhouette image is taken as the final face geometric model S.
In the optimization process, α, β, s, R and t are treated as adjustable variables, and the geometric information is treated as adjustable as well, so the geometry is allowed to deform within a certain range on top of the constraints of α, β, s, R and t. This enlarges the expressive range of the geometric information and improves the accuracy of the constructed face geometric model.
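A gradient-descent sketch of this fitting loop in PyTorch follows. render_silhouette stands in for a differentiable silhouette renderer (e.g. a soft rasterizer) and is not implemented here; dV holds the per-vertex displacements, and edges lists adjacent vertex pairs of the optimization region:

```python
import torch

def optimize_geometry(alpha, beta, s, R, t, dV, I_sil, I_mask,
                      render_silhouette, edges, lam, steps=200, lr=1e-2):
    """Minimize the masked silhouette difference plus a smoothness term on
    neighboring vertex displacements by plain gradient descent."""
    params = [alpha, beta, s, R, t, dV]
    for p in params:
        p.requires_grad_(True)
    opt = torch.optim.SGD(params, lr=lr)
    i, j = edges                                  # index tensors of adjacent vertex pairs
    for _ in range(steps):
        IS_sil = render_silhouette(alpha, beta, s, R, t, dV)
        data_term = ((IS_sil - I_sil).abs() * I_mask).mean()         # masked silhouette difference
        smooth_term = (lam * (dV[i] - dV[j]).pow(2).sum(-1)).mean()  # adjacent vertices move alike
        loss = data_term + smooth_term
        opt.zero_grad()
        loss.backward()
        opt.step()
    return [p.detach() for p in params]
```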
The detailed process of this embodiment will now be described in a manner of performing a local optimization:
Taking mouth optimization as an example, the mouth regions of the first and second silhouette images are set to white (or black), and correspondingly the eye, eyebrow, nose and face regions are set to black (or white). The vertices V_i corresponding to the mouth region in the face geometric model are determined; the vertex coordinates corresponding to the other semantic-segmentation regions, such as the eye, eyebrow and nose regions, are kept unchanged, and only the vertex coordinates of the mouth region and the relevant parameters of the face geometric model are adjusted. Specifically, a loss function L is constructed from the vertex coordinates of the corresponding optimization region of the face geometric model, i.e. the vertex coordinates of the mouth region, and the first and second silhouette images:
L = \|(I_{silhouette} - IS_{silhouette}(\alpha, \beta, s, R, t)) \odot I_{mask}\|_1 + \sum_{i \in S_{mouth}} \sum_{j \in N_i} \lambda_i \|\Delta V_i - \Delta V_j\|^2

where I_{silhouette} is the first silhouette image converted from the face image; IS_{silhouette}(α, β, s, R, t) is the second silhouette image converted from the face geometric model; I_{mask} is the mask image of the unoccluded face region obtained by semantic segmentation; λ_i is the weight on the change of relative displacement between V_i and V_j; V_i denotes the i-th vertex of the optimization region, i.e. the mouth region; V_j denotes a vertex adjacent to V_i; N_i is the set of indices of the vertices V_j adjacent to V_i; and S_{mouth} is the set of indices of the vertices of the optimization region, i.e. the mouth region.
Take the derivatives of the loss function L with respect to the five parameters α, β, s, R and t, and adjust the parameter values along the gradient with a gradient descent algorithm. Repeating the iteration several times minimizes L; that is, the parameters are adjusted to their optimal values such that the difference between the second and first silhouette images is smaller than the threshold, and the optimal face geometric model corresponding to the second silhouette image is taken as the final face geometric model S.
In the optimization process, α, β, s, R and t are treated as adjustable variables, and the geometric information is treated as adjustable as well, so the geometry is allowed to deform within a certain range on top of the constraints of α, β, s, R and t. This enlarges the expressive range of the geometric information and improves the accuracy of the constructed face geometric model.
If only the eye area, the eyebrow area and the nose area are optimized separately, the processing mode of the mouth can be referred to.
Preferably, the detailed process of this embodiment is now described in a manner of local optimization followed by global optimization:
the description will be given by taking an example of a sequence of optimizing the mouth region, the eye region, the eyebrow region, and the nose region, and then performing global end-to-end optimization, but the local optimization may also adopt other ordering modes.
The mouth-optimization procedure above is used to obtain the optimal second silhouette image under local optimization, and the face geometric model S1 corresponding to it is determined. The eye regions of the first and second silhouette images are then set to white (or black), and the mouth, eyebrow, nose and face regions to black (or white). The vertex coordinates corresponding to the other semantic-segmentation regions, such as the mouth, eyebrow and nose regions, are kept unchanged, and only the vertex coordinates of the eye region and the relevant parameters of the face geometric model are adjusted. Specifically, a loss function L is constructed from the vertex coordinates of the corresponding optimization region of the face geometric model, i.e. the vertex coordinates of the eye region, and the first and second silhouette images, and iterative optimization continues on the eye region:
L = \|(I_{silhouette} - IS_{silhouette}(\alpha, \beta, s, R, t)) \odot I_{mask}\|_1 + \sum_{i \in S_{eye}} \sum_{j \in N_i} \lambda_i \|\Delta V_i - \Delta V_j\|^2

where I_{silhouette} is the first silhouette image converted from the face image; IS_{silhouette}(α, β, s, R, t) is the second silhouette image converted from the face geometric model; I_{mask} is the mask image of the unoccluded face region obtained by semantic segmentation; λ_i is the weight on the change of relative displacement between V_i and V_j; V_i denotes the i-th vertex of the optimization region, i.e. the eye region; V_j denotes a vertex adjacent to V_i; N_i is the set of indices of the vertices V_j adjacent to V_i; and S_{eye} is the set of indices of the vertices of the optimization region, i.e. the eye region.
Take the derivatives of the loss function L with respect to the five face-geometric-model parameters α, β, s, R and t, and adjust the parameter values along the gradient with a gradient descent algorithm. Repeating the iteration several times minimizes L; that is, the parameters are adjusted to their optimal values such that the difference between the second and first silhouette images is smaller than the threshold. The optimal face geometric model corresponding to the second silhouette image is taken as the final face geometric model S2; in other words, the face geometric model S1 is optimized into S2.
The eyebrow and nose regions are optimized with the same method. When optimizing the eyebrow region, the eyebrow regions of the first and second silhouette images are set to white (or black) and the mouth, eye, nose and face regions to black (or white); the vertex coordinates of the eye, nose and mouth regions are kept unchanged, and only the vertex coordinates of the eyebrow region and the relevant parameters of the face geometric model are adjusted, giving the optimal face geometric model S3. When optimizing the nose region, the nose regions of the first and second silhouette images are set to white (or black) and the mouth, eye, eyebrow and face regions to black (or white); the vertex coordinates of the eye, eyebrow and mouth regions are kept unchanged, and only the vertex coordinates of the nose region and the relevant parameters of the face geometric model are adjusted, giving the optimal face geometric model S4.
Finally, end-to-end global optimization is performed: the nose, mouth, eye and eyebrow regions of the first and second silhouette images are set to white (or black) and the remaining face regions to black (or white); the relevant parameters of the face geometric model and the vertex coordinates of the eye, eyebrow, nose and mouth regions in the model are adjusted, and the loss-function construction method above is used to solve for the face geometric model corresponding to the second silhouette image as the final face geometric model S.
Starting from the initial geometric information S_0, this scheme sequentially and independently optimizes and fine-tunes local face geometry such as the eyes, nose, mouth and face through local 3D spatial transformations, according to the semantic partition of the face-geometry vertices. The vertices of the corresponding semantics are allowed to change with the geometric parameters, each vertex can move freely in space, and a smoothness constraint is added between adjacent vertices. While a given region is being optimized, the other vertices are kept fixed, confining the change locally while preserving smooth continuity. After each semantic-segmentation region has been optimized, end-to-end optimization of the geometric coefficients and vertices is performed. With the scheme of this embodiment, whether in local or end-to-end optimization, α, β, s, R and t are treated as adjustable variables and the geometric information is treated as adjustable as well, allowing the geometry to deform within a certain range on top of the constraints of α, β, s, R and t, which enlarges the expressive range of the geometric information and improves the accuracy of the constructed face geometric model.
Based on this step, multiple face images of the same person can be optimized jointly, sharing the same α coefficient throughout the optimization. Specifically, the first and second silhouette images corresponding to the multiple face images are obtained with the method above; each optimization pass shares the same α coefficient while adjusting, per image, its own β, s, R, t and the vertex coordinates of the corresponding optimization region of the face geometric model, with the optimization region being the same in each pass.
Step S302: extract the face texture map from the face image based on the face geometric model, using the face texture feature extraction method of the first aspect; the specific process is detailed in the specific steps of the first aspect. If an occluded region exists in the face image, the face image can be completed with a face-completion algorithm in this step before the texture map is extracted. If multiple face images are available, they can be completed by multi-image complementary fusion and/or a super-resolution algorithm combined with a face-completion algorithm, and the texture maps then extracted to produce a good texture map. The multiple initial texture maps are each warped with the method above.
This step must be performed after the face geometric model optimization of step S301 has converged and the relevant parameters of the face geometric model have been frozen.
Step S303: convert the face geometric model into 2D-mode information, where the 2D-mode information is a depth map or a normal-vector map. This can be realized with a differentiable renderer: the face geometric model is input into the trained differentiable renderer, which outputs the depth map and normal-vector map.
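The per-vertex ingredients of those maps are straightforward to compute; the sketch below derives area-weighted vertex normals (rasterizing them, and the camera-space depth, to an image is the renderer's job):

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Area-weighted per-vertex normals: accumulate each face normal onto
    its three vertices, then normalize. vertices: (N, 3); faces: (F, 3)."""
    v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
    fn = np.cross(v1 - v0, v2 - v0)        # face normals, length ~ 2x triangle area
    vn = np.zeros_like(vertices)
    for k in range(3):
        np.add.at(vn, faces[:, k], fn)     # unbuffered per-vertex accumulation
    return vn / (np.linalg.norm(vn, axis=1, keepdims=True) + 1e-12)
```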
Step S304: input the 2D-mode information and the face texture map into the trained neural network model to obtain the rendered image of the 3D face. Because the differentiable renderer is based on graphics principles, it introduces a deviation in rendering realism, which is unfavorable for computing the perceptual difference between the rendered image and the real image; the method therefore adds a neural network model to perform the rendering, further improving the realism of the constructed 3D face. The neural network model uses a deep neural network to realize the rendering.
Rendering is an important step of 3D face reconstruction. The third aspect of this scheme uses a generative adversarial network to build a neural renderer that replaces a traditional graphics rendering engine and improves rendering realism.
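The patent specifies a GAN-based neural renderer but not its architecture; the generator sketch below, which maps the concatenated 2D-mode geometry information and warped texture map to an RGB rendering, is an illustrative assumption:

```python
import torch
import torch.nn as nn

class NeuralRenderer(nn.Module):
    """Hypothetical generator: depth/normal map + texture map -> RGB image.
    A real setup would pair this with a discriminator for adversarial training."""
    def __init__(self, geo_channels=3, tex_channels=3):
        super().__init__()
        c = geo_channels + tex_channels
        self.net = nn.Sequential(
            nn.Conv2d(c, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, geo_map, tex_map):
        return self.net(torch.cat([geo_map, tex_map], dim=1))
```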
With the face texture feature extraction method, learnable warping is performed in the 2D domain, which to some extent compensates for the sampling deviation caused by inaccurate estimation of the face geometric model parameters α, β, s, R and t, turning a deterministic sampling process into flexible sampling with a certain tolerance and improving the accuracy of subsequent 3D face construction.
The invention provides a 3D face reconstruction device, comprising an image preprocessing unit, a face geometric model acquisition unit, an initial texture feature extraction unit, a texture warping unit, a face geometric model conversion unit, and a rendering unit.
The image preprocessing unit processes the face image to obtain the 2D keypoints of the face image. In an optimized 3D face reconstruction device, the image preprocessing unit also performs region segmentation on the face image.
The face geometric model acquisition unit obtains the face geometric model from the face image. Specifically, it can be based on a parameter-regression model alone, or on a parameter-regression model plus a geometric optimization model, where the parameter-regression model takes the face image as input and outputs the 3D face model coefficients and the extrinsic parameters of the face model relative to the camera coordinate system, and the geometric optimization model uses steps S3011 to S3012 of the third aspect to optimize the face geometric model.
The initial texture feature extraction unit is used for extracting an initial texture image from the face image according to the face geometric model.
The texture warping unit warps the initial texture map according to the keypoints obtained by mapping the 2D face keypoints of the face image into the initial texture map and the predicted motion field of each keypoint, producing the final texture map. It can be realized with a mapping module, a neural network model and a warping module: the mapping module maps the face keypoints to the initial texture map to obtain the initial position of each keypoint; the neural network model predicts the local motion field of each keypoint in the initial texture map to obtain its predicted position; and the warping module constructs a transformation from the initial positions and the corresponding predicted positions and warps the initial texture map to obtain the final texture map.
The face geometric model conversion unit converts the face geometric model into information of a 2D mode, the information of the 2D mode is a depth map or a normal vector map, and the face geometric model conversion unit can render by adopting a differentiable renderer to obtain the depth map or the normal vector map.
The rendering unit generates a 3D face according to the information of the 2D mode and the final texture map, and the rendering unit can adopt a neural network renderer.
The texture warping unit is a differentiable, learnable module based on a neural network model and needs to be trained before use. During training, the initial texture map corresponding to a face image in the training set is obtained with any of the methods of the first to third aspects and input into the neural network model, yielding the final warped texture map corresponding to that face image. The warped texture map and the fine-tuned face geometric model are input into the differentiable renderer to obtain a rendered face image I_{render}. Because both the face geometric model and the texture come from the preprocessed face image I, the rendered image I_{render} should match I as closely as possible within the unoccluded mask I_{mask}. After the rendered face image is obtained, a loss function W is constructed:
W = |I_{render} - I| * I_{mask}
the texture warping unit is trained by minimizing the loss function W, as follows: the texture warping unit to key point motion field prediction is initialized to Identity mapping (a mapping that has the output equal to the input), i.e. the default predicted position is the current position. And fixing the fine-tuned human face geometric model without changing in training. Calculating a loss function W, and training a texture distortion module by using a gradient descent algorithm; the training is completed until the loss function converges to a small stable value.
The rendering unit adopts a neural network renderer and similarly needs to be trained before use. During training, the relevant parameters of the face geometric model and of the texture warping unit are fixed and not changed. The neural renderer is trained by constructing a loss function until the constructed loss converges to a small stable value, completing the training.
When building the device, since the geometric-parameter estimation and the texture-related parameter estimation influence each other, the face geometric model acquisition unit and the initial texture feature extraction unit are first trained separately until the two networks converge to good weights; the two networks are then connected and fine-tuned end to end to obtain a better combination of geometric and texture parameters.
A fifth aspect of the present invention provides a face texture feature extraction device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the face texture feature extraction method of the first aspect when executing the computer program. As specific examples, the memory may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), flash memory, first-in-first-out memory (FIFO) and/or first-in-last-out memory (FILO); the processor is not limited to an STM32F105-series microcontroller, an ARM or X86 microprocessor, or a processor with an integrated NPU (neural-network processing unit).
A sixth aspect of the present invention provides a 3D face reconstruction device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the 3D face reconstruction method of the second or third aspect and any of their possible designs. As in the preceding aspect, the memory may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), flash memory, first-in-first-out memory (FIFO) and/or first-in-last-out memory (FILO); the processor is not limited to an STM32F105-series microcontroller, an ARM or X86 microprocessor, or a processor with an integrated NPU (neural-network processing unit).
A seventh aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the face texture feature extraction method of the first aspect or the 3D face reconstruction method of the second or third aspect and any of their possible designs. A computer-readable storage medium is a carrier for storing data and may include, but is not limited to, floppy disks, optical disks, hard disks, flash memory, flash drives and/or memory sticks; the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
Finally, it should be noted that: the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A face texture feature extraction method is characterized by comprising the following steps:
extracting 2D face key points of the face image;
obtaining an initial texture map according to the face geometric model;
mapping the key points of the human face to an initial texture map to obtain the initial positions of the key points;
predicting the motion field of each key point to obtain the predicted position of each key point in the initial texture map;
constructing a transformation according to the initial positions and the predicted positions of the key points, and warping the initial texture map to obtain a final texture map.
2. The method for extracting facial texture features according to claim 1, wherein the obtaining of the initial texture map according to the facial geometric model comprises:
projecting the geometric model of the human face, and determining the corresponding position of the vertex of the geometric model of the human face which is not shielded in the image of the human face;
extracting color information of the corresponding position of the vertex of the geometric model of the face which is not shielded in the face image;
and expanding the extracted color information according to the sampling relation between the human face geometric model and the texture map to obtain an initial texture map.
3. The method for extracting facial texture features as claimed in claim 1, wherein a neural network model is used to predict the motion field of each key point to obtain the predicted position of each key point.
4. A3D face reconstruction method is characterized by comprising the following steps:
acquiring a face geometric model according to the face image;
extracting a face texture map from the face image based on the face geometric model by adopting the face texture feature extraction method of any one of claims 1 to 3;
converting a human face geometric model into information in a 2D mode, wherein the information in the 2D mode is a depth map or a normal vector map;
and inputting the information of the 2D mode and the face texture map into the trained neural network model to obtain a rendered image of the 3D face.
5. The 3D face reconstruction method according to claim 4, wherein the obtaining the face geometric model comprises:
acquiring an initial face geometric model of at least one face image and converting the at least one face image into a first silhouette image;
acquiring a projected image of the face geometric model, and converting the projected image into a second silhouette image;
and adjusting relevant parameters of the face geometric model and vertex coordinates of a corresponding optimization area of the face geometric model to enable the difference between the second silhouette image and the first silhouette image to be smaller than a threshold value, and acquiring the face geometric model corresponding to the second silhouette image at the moment as a final face geometric model, wherein the relevant parameters of the face geometric model comprise a parameter alpha relevant to the shape of the face, a parameter beta relevant to the expression of the face, a scale parameter s, a rotation parameter R and a translation parameter t.
6. The 3D face reconstruction method according to claim 5, characterized in that the method further comprises:
performing semantic segmentation on the face image to obtain at least one semantic segmentation region, wherein the semantic segmentation regions comprise an eye region, an eyebrow region, a nose region and a mouth region;
determining the vertex coordinates in the face geometric model corresponding to each of the at least one semantic segmentation region;
wherein adjusting the relevant parameters of the face geometric model and the vertex coordinates of the corresponding region of the face geometric model comprises:
and adjusting, while keeping the vertex coordinates corresponding to the other semantic segmentation regions unchanged, the relevant parameters of the face geometric model and the vertex coordinates corresponding to one semantic segmentation region, so that the difference between the first and second silhouette images is smaller than the threshold.
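One hypothetical way to realize the "other regions held fixed" condition of claim 6 in an autograd framework is to optimize a per-vertex offset whose gradient is zeroed outside the region currently being adjusted:

    import torch

    def region_offsets(num_verts, region_vertex_ids):
        """Per-vertex offsets whose gradients are masked so that only vertices
        of the current semantic segmentation region (eyes, eyebrows, nose or
        mouth) move; all other vertex coordinates remain unchanged."""
        offsets = torch.zeros(num_verts, 3, requires_grad=True)
        mask = torch.zeros(num_verts, 1)
        mask[region_vertex_ids] = 1.0
        offsets.register_hook(lambda g: g * mask)  # freeze out-of-region vertices
        return offsets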
7. The 3D face reconstruction method according to claim 5, wherein adjusting the relevant parameters of the face geometric model and the vertex coordinates of the corresponding optimization region of the face geometric model until the difference between the first and second silhouette images is smaller than a threshold comprises:
masking the unoccluded area within the semantic segmentation regions to obtain a mask image of the unoccluded area;
constructing a loss function from the vertex coordinates of the corresponding optimization region of the face geometric model, the first silhouette image, the second silhouette image and the mask image of the unoccluded area;
calculating the derivative of the loss function with respect to the relevant parameters of the face geometric model;
and adjusting the relevant parameters of the face geometric model according to the gradient by using a gradient descent algorithm until the loss function is minimized.
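Claim 7 does not give the loss in closed form; a natural reading is a mask-restricted silhouette discrepancy plus a small penalty on the optimized vertex coordinates, as in the sketch below (the penalty form and its weight are assumptions):

    import torch

    def silhouette_loss(first_sil, second_sil, mask, vertex_offsets, reg_weight=1e-3):
        """Masked difference between the two silhouette images, restricted to
        the unoccluded area, plus a penalty keeping the optimized vertices
        close to their initial coordinates."""
        data_term = (mask * (second_sil - first_sil) ** 2).sum() / mask.sum().clamp(min=1)
        reg_term = (vertex_offsets ** 2).mean()
        return data_term + reg_weight * reg_term

Each iteration then differentiates this loss with respect to the relevant parameters (and the in-region vertex offsets) and takes a descent step, as recited in the last two limitations.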
8. A 3D face reconstruction device, characterized by comprising:
an image preprocessing unit, configured to process a face image to obtain 2D key points of the face image;
a face geometric model acquisition unit, configured to acquire a face geometric model according to the face image;
an initial texture feature extraction unit, configured to extract an initial texture map from the face image according to the face geometric model;
a texture warping unit, configured to warp the initial texture map according to the key points obtained by mapping the 2D face key points of the face image onto the initial texture map and according to the predicted motion field of each key point, so as to obtain a final texture map;
a face geometric model conversion unit, configured to convert the face geometric model into 2D-modality information, wherein the 2D-modality information is a depth map or a normal vector map;
and a rendering unit, configured to generate a 3D face according to the 2D-modality information and the final texture map.
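Structurally, the device of claim 8 maps onto a pipeline whose members mirror the recited units; the skeleton below is hypothetical, with each member a callable standing in for one unit:

    class FaceReconstructionDevice:
        """Skeleton mirroring the units of claim 8."""
        def __init__(self, preprocess, get_geometry, extract_texture,
                     warp_texture, to_2d_modality, render):
            self.preprocess = preprocess            # image preprocessing unit
            self.get_geometry = get_geometry        # face geometric model acquisition unit
            self.extract_texture = extract_texture  # initial texture feature extraction unit
            self.warp_texture = warp_texture        # texture warping unit
            self.to_2d_modality = to_2d_modality    # face geometric model conversion unit
            self.render = render                    # rendering unit

        def reconstruct(self, face_image):
            keypoints = self.preprocess(face_image)
            model = self.get_geometry(face_image)
            initial_tex = self.extract_texture(face_image, model)
            final_tex = self.warp_texture(initial_tex, keypoints)
            modality = self.to_2d_modality(model)   # depth map or normal vector map
            return self.render(modality, final_tex)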
9. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the face texture feature extraction method of any one of claims 1 to 3 or the 3D face reconstruction method of any one of claims 4 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the face texture feature extraction method of any one of claims 1 to 3 or the 3D face reconstruction method of any one of claims 4 to 7.
CN202110519887.5A 2021-05-12 2021-05-12 Face texture feature extraction method, 3D face reconstruction method, device and storage medium Pending CN113111861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110519887.5A CN113111861A (en) 2021-05-12 2021-05-12 Face texture feature extraction method, 3D face reconstruction method, device and storage medium


Publications (1)

Publication Number Publication Date
CN113111861A true CN113111861A (en) 2021-07-13

Family

ID=76722085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110519887.5A Pending CN113111861A (en) 2021-05-12 2021-05-12 Face texture feature extraction method, 3D face reconstruction method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113111861A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050057569A1 (en) * 2003-08-26 2005-03-17 Berger Michael A. Static and dynamic 3-D human face reconstruction
CN101751689A (en) * 2009-09-28 2010-06-23 中国科学院自动化研究所 Three-dimensional facial reconstruction method
CN107578469A (en) * 2017-09-08 2018-01-12 明利 A kind of 3D human body modeling methods and device based on single photo
CN109255831A (en) * 2018-09-21 2019-01-22 南京大学 The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate
CN109978930A (en) * 2019-03-27 2019-07-05 杭州相芯科技有限公司 A kind of stylized human face three-dimensional model automatic generation method based on single image
CN112085836A (en) * 2020-09-03 2020-12-15 华南师范大学 Three-dimensional face reconstruction method based on graph convolution neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIXIN SHU et al.: "Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance", Proceedings of the European Conference on Computer Vision (ECCV), 31 December 2018 (2018-12-31), pages 651-654 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2607140A (en) * 2021-05-26 2022-11-30 Flawless Holdings Ltd Modification of objects in film
US11574655B2 (en) 2021-05-26 2023-02-07 Flawless Holdings Limited Modification of objects in film
US11699464B2 (en) 2021-05-26 2023-07-11 Flawless Holdings Limited Modification of objects in film
US11715495B2 (en) 2021-05-26 2023-08-01 Flawless Holdings Limited Modification of objects in film
GB2607140B (en) * 2021-05-26 2024-04-10 Flawless Holdings Ltd Modification of objects in film
CN113538682A (en) * 2021-07-19 2021-10-22 北京的卢深视科技有限公司 Model training method, head reconstruction method, electronic device, and storage medium
CN113538682B (en) * 2021-07-19 2022-05-31 合肥的卢深视科技有限公司 Model training method, head reconstruction method, electronic device, and storage medium
CN114863506A (en) * 2022-03-18 2022-08-05 珠海优特电力科技股份有限公司 Method, device and system for verifying access permission and identity authentication terminal
CN115082640A (en) * 2022-08-01 2022-09-20 聚好看科技股份有限公司 Single image-based 3D face model texture reconstruction method and equipment
US11830159B1 (en) 2022-12-08 2023-11-28 Flawless Holding Limited Generative films
CN116612213A (en) * 2023-07-19 2023-08-18 南京硅基智能科技有限公司 Digital business card generation method and system based on face recalculation algorithm

Similar Documents

Publication Publication Date Title
CN113111861A (en) Face texture feature extraction method, 3D face reconstruction method, device and storage medium
US11302064B2 (en) Method and apparatus for reconstructing three-dimensional model of human body, and storage medium
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
CN108921926B (en) End-to-end three-dimensional face reconstruction method based on single image
US9317970B2 (en) Coupled reconstruction of hair and skin
US9679192B2 (en) 3-dimensional portrait reconstruction from a single photo
CN111784821B (en) Three-dimensional model generation method and device, computer equipment and storage medium
CN111445582A (en) Single-image human face three-dimensional reconstruction method based on illumination prior
CN113269862A (en) Scene-adaptive fine three-dimensional face reconstruction method, system and electronic equipment
WO2020108304A1 (en) Method for reconstructing face mesh model, device, apparatus and storage medium
JP2023545200A (en) Parameter estimation model training method, parameter estimation model training apparatus, device, and storage medium
Cornelis et al. Real-time connectivity constrained depth map computation using programmable graphics hardware
CN112862949B (en) Object 3D shape reconstruction method based on multiple views
CN113298936A (en) Multi-RGB-D full-face material recovery method based on deep learning
CN111951383A (en) Face reconstruction method
CN114842136A (en) Single-image three-dimensional face reconstruction method based on differentiable renderer
WO2021063271A1 (en) Human body model reconstruction method and reconstruction system, and storage medium
CN115546442A (en) Multi-view stereo matching reconstruction method and system based on perception consistency loss
CN110909778A (en) Image semantic feature matching method based on geometric consistency
CN110717978A (en) Three-dimensional head reconstruction method based on single image
CN114429518A (en) Face model reconstruction method, device, equipment and storage medium
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN113112596A (en) Face geometric model extraction and 3D face reconstruction method, device and storage medium
CN116703992A (en) Accurate registration method, device and equipment for three-dimensional point cloud data and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination