CN116843834A - Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment - Google Patents


Info

Publication number
CN116843834A
CN116843834A
Authority
CN
China
Prior art keywords
dimensional
face
reconstruction
face image
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310809920.7A
Other languages
Chinese (zh)
Inventor
朱翔昱
徐淼
雷震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202310809920.7A priority Critical patent/CN116843834A/en
Publication of CN116843834A publication Critical patent/CN116843834A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/04Indexing scheme for image data processing or generation, in general involving 3D image data

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment. The method comprises: acquiring a face image to be reconstructed; and inputting the face image to be reconstructed into a three-dimensional face reconstruction model, which obtains and outputs a three-dimensional reconstruction result of the face image and the six-degree-of-freedom pose of the face in it. The three-dimensional face reconstruction model is trained on the three-dimensional reconstruction prediction result of a sample face image and the label three-dimensional reconstruction result of that image, while the six-degree-of-freedom pose of the sample face image is constrained by the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region. The method, device and equipment improve the accuracy of the face pose, and thereby the accuracy, reliability and robustness of the face reconstruction.

Description

Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment
Technical Field
The present invention relates to the field of computer vision, and in particular to a three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, apparatus and device.
Background
With the development of deep learning, technologies such as virtual try-on, virtual makeup, video editing, animation production and fatigue recognition are continuously improving, and demand for such applications on mobile phones and computers keeps growing.
Face reconstruction methods based on affine transformation suffer from unstable reconstruction, because changes in face pose introduce distortion into the picture. In addition, previous work on face pose usually considers only the three Euler angles of the face (pitch, yaw and roll) and ignores the offset of the face within the picture, so the obtained rotation parameters are not accurate enough to meet the requirements of application scenarios such as AR (Augmented Reality) and VR (Virtual Reality). For example, in virtual glasses try-on, inaccurate head pose estimation causes the glasses to be projected onto the wrong position on the head; the required pose consists not only of the three rotation angles but also, crucially, of the offset of the face in the picture. Similarly, the recently emerged virtual makeup applications require an accurate face pose to overlay the virtual makeup on the face at pixel level, which previous approaches likewise cannot achieve.
Disclosure of Invention
The invention provides a three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment, which address the defect of prior-art affine-transformation-based face reconstruction methods that changes in face pose distort the picture and make the reconstruction unstable.
The invention provides a three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, which comprises the following steps:
acquiring a face image to be reconstructed;
inputting the face image to be reconstructed into a three-dimensional face reconstruction model, and obtaining and outputting a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of the face in the face image to be reconstructed by the three-dimensional face reconstruction model;
the three-dimensional face reconstruction model is trained on the three-dimensional reconstruction prediction result of a sample face image and the label three-dimensional reconstruction result of that image, and the six-degree-of-freedom pose of the sample face image is constrained by the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region.
According to the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention, the training steps of the three-dimensional face reconstruction model comprise:
acquiring a first sample face image, a second sample face image containing only the face, the true six-degree-of-freedom pose of the face, a label three-dimensional reconstruction result, a label association matrix, the label three-dimensional coordinates corresponding to each point, and an initial three-dimensional face reconstruction model;
projecting the initial three-dimensional face reconstruction model and the real six-degree-of-freedom pose of the face into an image to obtain a face position coordinate label;
determining the position coordinates of the face in the second sample face image;
extracting the global face-image features of the first sample face image, and the three-dimensional features and local features corresponding to the second sample face image;
determining a three-dimensional reconstruction prediction result of the second sample face image based on the three-dimensional features corresponding to the second sample face image;
determining an association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region based on the global face-image features and the local features;
based on the incidence matrix and the three-dimensional reconstruction prediction result, obtaining a predicted three-dimensional point coordinate corresponding to each two-dimensional pixel;
and performing parameter iteration on the initial three-dimensional face reconstruction model based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, on the face position coordinate label and the position coordinates, on the association matrix and the label association matrix, and on the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point, to obtain the three-dimensional face reconstruction model.
According to the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention, performing parameter iteration on the initial three-dimensional face reconstruction model based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, the face position coordinate label and the position coordinates, the association matrix and the label association matrix, and the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point, to obtain the three-dimensional face reconstruction model, comprises:
determining a first loss based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result;
determining a second loss based on the face position coordinate label and the position coordinate;
determining a third loss based on the association matrix and the label association matrix;
determining a fourth loss based on the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point;
and carrying out parameter iteration on the initial three-dimensional face reconstruction model based on the first loss, the second loss, the third loss and the fourth loss to obtain the three-dimensional face reconstruction model.
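The four-term objective described above can be sketched as follows. The patent does not fix the exact loss forms or weights, so simple L1/L2 distances, unit weights, and all names here are illustrative assumptions:

```python
import numpy as np

def total_loss(pred, label, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four training losses (illustrative forms only)."""
    l1 = np.mean((pred["verts"] - label["verts"]) ** 2)   # reconstruction vs. label reconstruction
    l2 = np.mean(np.abs(pred["mask"] - label["mask"]))    # position coords vs. face position label
    l3 = np.mean((pred["assoc"] - label["assoc"]) ** 2)   # association matrix vs. label matrix
    l4 = np.mean((pred["pts3d"] - label["pts3d"]) ** 2)   # per-pixel 3-D points vs. label coords
    return w[0] * l1 + w[1] * l2 + w[2] * l3 + w[3] * l4
```

The weights `w` would in practice be tuned so that no single term dominates the parameter iteration.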
According to the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention, determining the three-dimensional reconstruction prediction result of the second sample face image based on the three-dimensional features corresponding to the second sample face image comprises:
determining a UV position map of the face based on the three-dimensional features;
and sequentially carrying out UV pairing and grid sampling on the UV position map of the face to obtain the three-dimensional reconstruction prediction result.
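As a rough sketch of the grid-sampling step, the following assumes the UV position map is an H x W x 3 array whose channels store 3-D coordinates, and recovers per-vertex 3-D points by bilinear sampling at each vertex's (u, v) coordinate. The function and variable names are illustrative, not from the patent:

```python
import numpy as np

def sample_uv_position_map(pos_map, uv_coords):
    """Bilinearly sample a (H, W, 3) UV position map at continuous
    (u, v) coordinates in [0, 1] to recover per-vertex 3-D points."""
    H, W, _ = pos_map.shape
    # convert normalized UV to continuous pixel coordinates
    x = uv_coords[:, 0] * (W - 1)
    y = uv_coords[:, 1] * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    # blend the four neighbouring texels
    top = (1 - wx) * pos_map[y0, x0] + wx * pos_map[y0, x1]
    bot = (1 - wx) * pos_map[y1, x0] + wx * pos_map[y1, x1]
    return (1 - wy) * top + wy * bot
```

UV pairing then amounts to looking up each mesh vertex's fixed (u, v) coordinate before sampling.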
According to the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention, determining the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region based on the global face-image features and the local features comprises:
sampling face two-dimensional point coordinates from the position coordinates of the face in the second sample face image;
obtaining a full-image two-dimensional point coordinate based on the mapping from the second sample face image to the first sample face image;
determining a two-dimensional local feature based on the two-dimensional point coordinates of the face, and determining a two-dimensional global feature based on the two-dimensional point coordinates of the whole image;
determining three-dimensional local features based on the three-dimensional reconstruction prediction result, and obtaining three-dimensional global features based on the three-dimensional local features and a multi-layer perceptron;
and fusing, based on a Transformer model, the two-dimensional local features, the two-dimensional global features, the three-dimensional local features and the three-dimensional global features, to obtain the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region.
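A full Transformer stacks attention with feed-forward layers; as a schematic only, the single scaled dot-product cross-attention map below (pixel features as queries, vertex features as keys) already has the row property the association matrix needs, namely that each pixel's weights are non-negative and sum to 1. All names and the reduction to one attention layer are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_association(pix_feats, vert_feats):
    """One scaled dot-product cross-attention map: rows index pixels,
    columns index mesh vertices; each row is a probability distribution."""
    d = pix_feats.shape[-1]
    return softmax(pix_feats @ vert_feats.T / np.sqrt(d), axis=-1)
```

Unlike the label matrix, which is exactly zero outside the three patch vertices, a softmax row is dense; the third loss would push it toward the sparse label pattern.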
According to the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention, after the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel are obtained based on the association matrix and the three-dimensional reconstruction prediction result, the method further comprises:
estimating the pose based on the two-dimensional pixels and the predicted three-dimensional point coordinates corresponding to the two-dimensional pixels, to obtain the predicted six-degree-of-freedom pose of the face.
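The patent does not name a specific solver for this 2-D/3-D correspondence step; a minimal stand-in is the Direct Linear Transform (DLT) below, which estimates the projection matrix from the correspondences and projects its rotation block onto SO(3). The choice of DLT and all names are assumptions:

```python
import numpy as np

def estimate_pose_dlt(pts2d, pts3d, K):
    """Recover a 6-DoF pose (R, t) from n >= 6 pixel / 3-D point pairs."""
    # normalize pixels with the camera intrinsics K
    pts_n = (np.linalg.inv(K) @ np.c_[pts2d, np.ones(len(pts2d))].T).T
    rows = []
    for (x, y, _), X in zip(pts_n, pts3d):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)          # projection matrix, up to scale and sign
    if np.linalg.det(P[:, :3]) < 0:   # pick the sign that puts points in front
        P = -P
    U, S, Vr = np.linalg.svd(P[:, :3])
    R = U @ Vr                        # nearest rotation to the 3x3 block
    t = P[:, 3] / S.mean()            # undo the arbitrary DLT scale
    return R, t
```

With exact correspondences this recovers the pose to numerical precision; on noisy per-pixel predictions, a robust PnP solver (e.g., RANSAC-wrapped) would be used instead.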
According to the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention, the determining of the position coordinates of the face in the second sample face image comprises the following steps:
inputting the second sample face image into a face segmentation model, and obtaining and outputting the position coordinates corresponding to the face by the face segmentation model; the face segmentation model is constructed based on a ResNet model.
The invention also provides a three-dimensional face reconstruction and six-degree-of-freedom pose estimation device, which comprises:
The acquisition unit is used for acquiring the face image to be reconstructed;
the three-dimensional reconstruction unit is used for inputting the face image to be reconstructed into a three-dimensional face reconstruction model, obtaining and outputting a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of a face in the face image to be reconstructed from the three-dimensional face reconstruction model;
the three-dimensional face reconstruction model is trained on the three-dimensional reconstruction prediction result of a sample face image and the label three-dimensional reconstruction result of that image, and the six-degree-of-freedom pose of the sample face image is constrained by the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes any one of the three-dimensional face reconstruction and six-degree-of-freedom pose estimation methods when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a three-dimensional face reconstruction and six degrees of freedom pose estimation method as described in any of the above.
The invention also provides a computer program product, which comprises a computer program, wherein the computer program realizes the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method according to any one of the above methods when being executed by a processor.
According to the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment provided by the invention, the three-dimensional face reconstruction model is trained on the three-dimensional reconstruction prediction result of the sample face image and the label three-dimensional reconstruction result of that image, while the six-degree-of-freedom pose of the sample face image is constrained by the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region, which improves the accuracy of the face pose and thereby the accuracy and reliability of the face reconstruction.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention;
FIG. 2 is a flow chart of the training steps of the three-dimensional face reconstruction model provided by the invention;
FIG. 3 is a flow chart of step 280 in the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention;
FIG. 4 is a flow chart of step 250 in the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention;
FIG. 5 is a flow chart of step 260 in the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the invention;
FIG. 6 is a schematic structural diagram of the three-dimensional face reconstruction and six-degree-of-freedom pose estimation device provided by the invention;
FIG. 7 is a schematic structural diagram of the electronic device provided by the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first", "second" and the like in the description and in the claims are used to distinguish between similar elements and do not necessarily describe a particular sequential or chronological order. It is to be understood that data so used may be interchanged where appropriate, so that embodiments of the application may be practiced in sequences other than those illustrated or described herein; objects distinguished by "first", "second", etc. are generally of one type.
The application provides a three-dimensional face reconstruction and six-degree-of-freedom pose estimation method. FIG. 1 is a flow chart of this method; as shown in FIG. 1, the method comprises the following steps:
step 110, a face image to be reconstructed is acquired.
Specifically, the face image to be reconstructed, i.e., the image from which a three-dimensional face is to be reconstructed, may be collected in advance by an image acquisition device, acquired in real time, or obtained from the Internet by downloading or scanning; the embodiment of the application places no particular limitation on this.
Step 120, inputting the face image to be reconstructed into a three-dimensional face reconstruction model, and obtaining and outputting a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of a face in the face image to be reconstructed from the three-dimensional face reconstruction model;
The three-dimensional face reconstruction model is trained on the three-dimensional reconstruction prediction result of a sample face image and the label three-dimensional reconstruction result of that image, and the six-degree-of-freedom pose of the sample face image is constrained by the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region.
Specifically, in order to better obtain the three-dimensional reconstruction result and the pose with six degrees of freedom of the face, before step 120, the three-dimensional face reconstruction model needs to be obtained through the following steps:
the sample face image and the label three-dimensional reconstruction result of the sample face image can be collected in advance, and an initial three-dimensional face reconstruction model can be constructed in advance.
In this process, the sample face image may be input into the initial three-dimensional face reconstruction model, which obtains and outputs the three-dimensional reconstruction prediction result of the sample face image. During training, the six-degree-of-freedom pose of the sample face image is constrained by the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region; the six-degree-of-freedom pose comprises 3D position information and face pose information.
The association matrix between each three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region is denoted M. If the i-th pixel lies on the triangular patch formed by the face vertices P_i, P_j and P_k, the probabilities that pixel i corresponds to these three vertices are w_i, w_j and w_k; that is, the entries of M at row i, columns i, j and k, are w_i, w_j and w_k, and the remaining entries are set to 0.
It is noted that w_i, w_j and w_k sum to 1.
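The construction above can be sketched numerically. The dense-matrix layout and all names below are illustrative assumptions; a real implementation would use a sparse matrix over tens of thousands of vertices:

```python
import numpy as np

def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of 2-D point p in triangle (a, b, c);
    they sum to 1 and are non-negative iff p lies inside the patch."""
    T = np.column_stack([b - a, c - a])
    wb, wc = np.linalg.solve(T, p - a)
    return np.array([1.0 - wb - wc, wb, wc])

def association_row(n_vertices, vert_ids, weights):
    """Row i of M: the three patch-vertex columns hold the barycentric
    weights of pixel i, every other column is 0."""
    row = np.zeros(n_vertices)
    row[list(vert_ids)] = weights
    return row
```

A pixel's predicted 3-D point then falls out as `row @ vertices` for a `(n_vertices, 3)` vertex array, which is the per-pixel coordinate used later in training.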
The association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region may be determined based on the global face-image features corresponding to the sample face image and the local features corresponding to the cropped image, within the sample face image, that contains only the face.
After the three-dimensional reconstruction prediction result is obtained from the initial three-dimensional face reconstruction model, it is compared with the label three-dimensional reconstruction result of the pre-collected sample face image, and a loss function value is computed from the degree of difference between the two. Parameter iteration is performed on the initial three-dimensional face reconstruction model based on this loss value; during training, the six-degree-of-freedom pose of the sample face image is additionally subject to a regression constraint through the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region. The initial three-dimensional face reconstruction model after parameter iteration is recorded as the three-dimensional face reconstruction model.
It can be understood that the greater the degree of difference between the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result of the sample face image collected in advance, the greater the loss function value; the smaller the degree of difference between the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result of the pre-collected sample face image, the smaller the loss function value.
Through this training, the three-dimensional face reconstruction model learns to obtain and output the three-dimensional reconstruction result of a face image to be reconstructed and the six-degree-of-freedom pose of the face in the face image to be reconstructed.
According to the method provided by the embodiment of the invention, the three-dimensional face reconstruction model is trained on the three-dimensional reconstruction prediction result of the sample face image and the label three-dimensional reconstruction result of that image, while the six-degree-of-freedom pose of the sample face image is constrained by the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region, which improves the accuracy of the face pose and thereby the accuracy and reliability of the face reconstruction.
Based on the above embodiments, fig. 2 is a schematic flow chart of a training step of the three-dimensional face reconstruction model provided by the present invention, and as shown in fig. 2, the training step of the three-dimensional face reconstruction model includes:
Step 210, acquiring a first sample face image, a second sample face image only containing a face, a true six-degree-of-freedom pose of the face, a label three-dimensional reconstruction result, a label association matrix, label three-dimensional coordinates corresponding to each point and an initial three-dimensional face reconstruction model;
step 220, projecting the initial three-dimensional face reconstruction model and the real six-degree-of-freedom pose of the face into an image to obtain a face position coordinate label;
step 230, determining the position coordinates of the face in the second sample face image;
step 240, extracting a global feature of the face image of the first sample face image, a three-dimensional feature and a local feature corresponding to the second sample face image;
step 250, determining a three-dimensional reconstruction prediction result of the second sample face image based on the three-dimensional features corresponding to the second sample face image;
step 260, determining the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region based on the global face-image features and the local features;
step 270, based on the correlation matrix and the three-dimensional reconstruction prediction result, obtaining predicted three-dimensional point coordinates corresponding to each two-dimensional pixel;
step 280, performing parameter iteration on the initial three-dimensional face reconstruction model based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, the face position coordinate label and the position coordinates, the association matrix and the label association matrix, and the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point, to obtain the three-dimensional face reconstruction model.
Specifically, in order to better obtain the three-dimensional face reconstruction model, the three-dimensional face reconstruction model may be trained by:
the method can acquire a first sample face image, a second sample face image only containing a face, a true six-degree-of-freedom pose of the face, a label three-dimensional reconstruction result, a label association matrix, label three-dimensional coordinates corresponding to each point and an initial three-dimensional face reconstruction model. The first sample face image, the second sample face image only containing the face and the real six-degree-of-freedom pose of the face are all determined based on a scanning model and camera parameters thereof, when the scanning model scans the face image, the coordinate system of the face scanning model is a world coordinate system, and the camera projection parameters are determined according to the scanning model. The face scanning model is obtained by scanning a three-dimensional face by a scanner.
Then, the initial three-dimensional face reconstruction model and the true six-degree-of-freedom pose of the face are projected into the image to obtain the face position coordinate label. For example, the initial three-dimensional face reconstruction model is projected, through the true six-degree-of-freedom pose of the face and the camera parameters, onto a background image of uniform value, and the pixels covered by the projected model are set to the opposite value (for instance, a white face on a black background, or, for an 800 x 800 original image, a solid background with all pixel values preset to 255 and the covered pixels set to 0). The resulting binary face segmentation image yields the face position coordinate label. In this projection the camera parameters serve as the camera intrinsics and the true six-degree-of-freedom pose of the face serves as the camera extrinsics; the camera projection parameters are determined by the scanning model.
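A point-splat sketch of this projection step follows; a real pipeline rasterizes the triangular patches rather than individual vertices, and the intrinsics, pose and names here are illustrative assumptions:

```python
import numpy as np

def render_face_mask(verts, R, t, K, hw=(800, 800)):
    """Project mesh vertices with pose (R, t) and intrinsics K, and mark
    the hit pixels in a binary mask standing in for the face position label."""
    cam = verts @ R.T + t                 # world -> camera (extrinsics = true pose)
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]           # perspective divide
    mask = np.zeros(hw, dtype=np.uint8)
    px = np.rint(uv).astype(int)
    ok = (px[:, 0] >= 0) & (px[:, 0] < hw[1]) & (px[:, 1] >= 0) & (px[:, 1] < hw[0])
    mask[px[ok, 1], px[ok, 0]] = 1        # row = v, column = u
    return mask
```

Inverting the 0/255 convention described above is then a single subtraction on the mask.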
When the initial three-dimensional face reconstruction model is projected onto the image through the true six-degree-of-freedom pose of the face and the camera parameters, the triangular patch formed by every three face vertices covers some of the face pixels, and the probability of each pixel corresponding to the face vertices is determined from the barycentric coordinates of the triangular patch and the pixel coordinates.
It should be noted that, each two-dimensional pixel has a probability of corresponding to only three vertices of the triangular patch where it is located, and the probabilities of corresponding to the remaining vertices are all 0.
Specifically, each association matrix is denoted M. If the i-th pixel lies on the triangular patch formed by the face vertices P_i, P_j and P_k, the probabilities that pixel i corresponds to these three vertices are w_i, w_j and w_k; that is, the entries of M at row i, columns i, j and k, are w_i, w_j and w_k, and the remaining entries are set to 0.
It is noted that w_i, w_j and w_k sum to 1.
The position coordinates of the face in the second sample face image are then determined; the second sample face image may be obtained by detecting the face in the first sample face image with a face detection model and cropping it out.
For example, a second sample face image including only a face may be input to the face segmentation model, and the face may be segmented to obtain the position coordinates of the face in the second sample face image.
Further, the global face-image features of the first sample face image and the three-dimensional features and local features corresponding to the second sample face image can be extracted. Here, the first sample face image and the second sample face image may be input into a feature extraction model to obtain the global face-image features of the first sample face image and the three-dimensional features and local features corresponding to the second sample face image.
The feature extraction model may be a multi-layer convolutional neural network (Convolutional Neural Network, CNN) with a cascade structure, a deep neural network (Deep Neural Networks, DNN), a ResNet model, or the like; the embodiment of the present invention places no particular limitation on this.
For example, an encoder of the feature extraction model is built on a ResNet model to extract features from the input image, and decoders are then built by stacking deconvolution layers to decode the image features, yielding the three-dimensional features, the global features and the local features of the face image respectively.
The encoder of the feature extraction model, constructed from the ResNet model, can be shared between the three-dimensional-feature and local-feature branches, and its parameters after model training can likewise be shared; the decoders, although identical in structure, do not share parameters.
The global feature of the face image reflects the feature information of the global level of the face image of the first sample, the three-dimensional feature of the face image of the second sample reflects the feature information of the three-dimensional level of the face image of the second sample, and the local feature of the face image of the second sample reflects the feature information of the local level of the face image of the second sample.
And determining a three-dimensional reconstruction prediction result of the second sample face image based on the three-dimensional features corresponding to the second sample face image, for example, determining a UV position diagram of the face based on the three-dimensional features corresponding to the second sample face image, and obtaining the three-dimensional reconstruction prediction result of the second sample face image based on the UV position diagram.
And determining an association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face part based on the global features and the local features of the face image, wherein the association matrix can be obtained based on a Transformer model.
And then, based on the incidence matrix and the three-dimensional reconstruction prediction result, obtaining the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel.
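The step above can be sketched as a simple matrix product, each row of the association matrix blending the reconstructed vertices into one predicted three-dimensional point (the small matrices below are illustrative values only):

```python
import numpy as np

# M: (m, n) association matrix -- m two-dimensional pixels, n mesh vertices,
#    each row a probability distribution over the vertices.
# V: (n, 3) vertex coordinates from the three-dimensional reconstruction.
M = np.array([[0.2, 0.3, 0.5],
              [1.0, 0.0, 0.0]])
V = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

# Predicted 3-D point for every pixel: probability-weighted blend of vertices.
X3d = M @ V
```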
Finally, after obtaining the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, the face position coordinate label and the position coordinates, the association matrix and the label association matrix, and the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point, parameter iteration is performed on the initial three-dimensional face reconstruction model based on these four pairs of quantities to obtain the three-dimensional face reconstruction model.
Based on the above embodiments, fig. 3 is a schematic flow chart of step 280 in the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the present invention, and as shown in fig. 3, step 280 includes:
step 281, determining a first loss based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result;
Step 282, determining a second loss based on the face position coordinate label and the position coordinate;
step 283, determining a third loss based on the correlation matrix and the tag correlation matrix;
step 284, determining a fourth loss based on the predicted three-dimensional point coordinates corresponding to the two-dimensional pixels and the label three-dimensional coordinates corresponding to the points;
and step 285, performing parameter iteration on the initial three-dimensional face reconstruction model based on the first loss, the second loss, the third loss and the fourth loss to obtain the three-dimensional face reconstruction model.
Specifically, the first loss can be determined based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, with the three-dimensional points represented by the UV position map. The specific calculation formula of the first loss is as follows:
L_r = W·∑||U* − U||
wherein U* is the UV position map generated from the label three-dimensional reconstruction result, U is the position map predicted by the model, and W is a weight matrix of the same size as the UV position map.
It can be appreciated that the greater the degree of difference between the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, the greater the first loss; the smaller the degree of difference between the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, the smaller the first loss.
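A minimal sketch of the first-loss computation above, assuming the UV position maps and weight matrix are stored as arrays of equal size (names and shapes are illustrative):

```python
import numpy as np

def uv_loss(U_pred, U_label, W):
    """First loss L_r = sum(W * |U* - U|): element-wise weighted L1
    distance between the predicted and label UV position maps."""
    return np.sum(W * np.abs(U_label - U_pred))

U_label = np.ones((2, 2, 3))           # toy label UV position map
U_pred = np.zeros((2, 2, 3))           # toy predicted position map
W = np.full((2, 2, 3), 0.5)            # larger weights emphasise key regions
loss = uv_loss(U_pred, U_label, W)
```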
The second penalty may be determined based on the face location coordinate label and the location coordinates.
It can be understood that the greater the degree of difference between the face position coordinate label and the position coordinate, the greater the second loss; the smaller the degree of difference between the face position coordinate label and the position coordinate, the smaller the second loss.
A third loss may be determined based on the association matrix and the tag association matrix as a KL-divergence loss between the two matrices, wherein M and M* are respectively the predicted association matrix and the label association matrix, λ is a weight, m is the number of two-dimensional pixels, n is the number of three-dimensional points, i denotes the i-th row, j denotes the j-th column, and D_KL denotes the KL divergence (Kullback-Leibler Divergence) loss.
It can be appreciated that the greater the degree of difference between the correlation matrix and the tag correlation matrix, the greater the third loss; the smaller the degree of difference between the correlation matrix and the tag correlation matrix, the smaller the third loss.
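Since the third loss is specified above only through its terms, the following is a hedged sketch that applies a row-wise KL divergence between the label and predicted association matrices; the averaging scheme and the stand-in weight lam are illustrative assumptions:

```python
import numpy as np

def kl_matrix_loss(M_pred, M_label, lam=1.0, eps=1e-12):
    """Row-wise KL divergence D_KL(M* || M) between the label and the
    predicted association matrices, averaged over the m pixel rows.
    eps avoids log(0); lam stands in for the weight lambda in the text."""
    P = M_label + eps   # label distribution per row
    Q = M_pred + eps    # predicted distribution per row
    kl_rows = np.sum(P * np.log(P / Q), axis=1)
    return lam * kl_rows.mean()

M_label = np.array([[0.2, 0.8], [0.5, 0.5]])
loss_zero = kl_matrix_loss(M_label, M_label)                      # identical
loss_pos = kl_matrix_loss(np.array([[0.9, 0.1], [0.5, 0.5]]), M_label)
```

Identical matrices give a zero loss, and the loss grows with the divergence between the rows, matching the behaviour described in the text.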
The fourth loss may be determined based on the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the tag three-dimensional coordinates corresponding to each point, where the fourth loss is formulated as follows:
L_c = ||X_3d − X_3d*||_1
wherein X_3d represents the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel, and X_3d* represents the label three-dimensional coordinates corresponding to each point.
It can be understood that the greater the degree of difference between the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the tag three-dimensional coordinates corresponding to each point, the greater the fourth loss; the smaller the degree of difference between the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the tag three-dimensional coordinates corresponding to each point, the smaller the fourth loss.
After the first loss, the second loss, the third loss and the fourth loss are obtained, parameter iteration can be performed on the initial three-dimensional face reconstruction model based on the first loss, the second loss, the third loss and the fourth loss, and the three-dimensional face reconstruction model is obtained.
Here, the initial three-dimensional face reconstruction model may be iterated by parameters based on the sum of the first, second, third, and fourth losses, or based on the weighted sum of the first, second, third, and fourth losses, resulting in a three-dimensional face reconstruction model.
Wherein the formula for determining the total loss based on the weighted sum of the first loss, the second loss, the third loss, and the fourth loss is as follows:
L = λ_1·L_r + λ_2·L_s + λ_3·L_m + λ_4·L_c
wherein L represents the total loss; λ_1, λ_2, λ_3 and λ_4 are respectively the weight parameters corresponding to the first loss, the second loss, the third loss and the fourth loss; L_r is the first loss, L_s the second loss, L_m the third loss, and L_c the fourth loss.
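The weighted-sum total loss above may be sketched as follows, with purely illustrative weight and loss values:

```python
# Total training loss L = λ1·L_r + λ2·L_s + λ3·L_m + λ4·L_c.
# The numeric weights and component losses below are illustrative only.
lambdas = {"r": 1.0, "s": 0.5, "m": 0.1, "c": 1.0}
losses = {"r": 2.0, "s": 4.0, "m": 10.0, "c": 1.5}   # L_r, L_s, L_m, L_c

total = sum(lambdas[k] * losses[k] for k in ("r", "s", "m", "c"))
```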
According to the method provided by the embodiment of the invention, the parameters of the initial three-dimensional face reconstruction model are optimized according to the first loss, the second loss, the third loss and the fourth loss, whether the total loss is converged or not is judged, and if the total loss is converged, training is stopped, so that the parameters of the three-dimensional face reconstruction model are obtained; otherwise, repeating the method until a trained three-dimensional face reconstruction model is obtained, thereby improving the reconstruction accuracy of the three-dimensional face reconstruction model.
Based on the above embodiment, fig. 4 is a schematic flow chart of step 250 in the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the present invention, and as shown in fig. 4, step 250 includes:
step 251, determining a UV position diagram of the face based on the three-dimensional features;
and step 252, sequentially performing UV pairing and grid sampling on the UV position map of the face to obtain the three-dimensional reconstruction prediction result.
Specifically, the three-dimensional feature, i.e., the three-dimensional feature map, may be used as the UV position map of the face. After the UV position map of the face is obtained, UV pairing (UV coordinates) and grid sampling may be performed on it sequentially, so as to obtain the three-dimensional reconstruction prediction result.
The three-dimensional reconstruction prediction result is obtained through a neural network regression UV position diagram mode, and preparation is made for subsequent association learning and posture estimation.
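A minimal sketch of sampling three-dimensional points from a UV position map; nearest-neighbour lookup is used here for simplicity, whereas an actual pipeline would typically interpolate bilinearly (grid sampling), and all names and shapes are illustrative:

```python
import numpy as np

def sample_vertices(uv_map, uv_coords):
    """Sample 3-D vertex coordinates from a UV position map.

    uv_map: (H, W, 3) position map -- each texel stores an (x, y, z) point.
    uv_coords: (n, 2) per-vertex UV coordinates in [0, 1].
    Nearest-neighbour lookup; real grid sampling would be bilinear.
    """
    H, W, _ = uv_map.shape
    rows = np.clip((uv_coords[:, 1] * (H - 1)).round().astype(int), 0, H - 1)
    cols = np.clip((uv_coords[:, 0] * (W - 1)).round().astype(int), 0, W - 1)
    return uv_map[rows, cols]

uv_map = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
verts = sample_vertices(uv_map, np.array([[0.0, 0.0], [1.0, 1.0]]))
```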
Based on the above embodiment, fig. 5 is a schematic flow chart of step 260 in the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method provided by the present invention, and as shown in fig. 5, step 260 includes:
step 261, sampling face two-dimensional point coordinates from the position coordinates of the face in the second sample face image;
step 262, obtaining full-image two-dimensional point coordinates based on the mapping from the second sample face image to the first sample face image;
step 263, determining a two-dimensional local feature based on the two-dimensional point coordinates of the face, and determining a two-dimensional global feature based on the two-dimensional point coordinates of the full map;
step 264, determining a three-dimensional local feature based on the three-dimensional reconstruction prediction result, and obtaining a three-dimensional global feature based on the three-dimensional local feature and a multi-layer perceptron;
and step 265, based on a Transformer model, fusing the two-dimensional local feature, the two-dimensional global feature, the three-dimensional local feature and the three-dimensional global feature to obtain an association matrix of the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face part.
Specifically, two-dimensional point coordinates of the face may be sampled from the position coordinates of the face in the second sample face image, where the two-dimensional point coordinates of the face reflect each two-dimensional position information of the face in the second sample face image.
The full-image two-dimensional point coordinates can be obtained based on the mapping from the second sample face image to the first sample face image, and the full-image two-dimensional point coordinates reflect the two-dimensional position information of the full-image plane in the first sample face image.
The two-dimensional local feature can be determined based on the two-dimensional point coordinates of the face, and the two-dimensional global feature can be determined based on the two-dimensional point coordinates of the whole image, wherein the two-dimensional local feature reflects local feature information of a two-dimensional layer, and the two-dimensional global feature reflects global feature information of the two-dimensional layer.
Then, based on the three-dimensional reconstruction prediction result, the three-dimensional local feature can be determined, and based on the three-dimensional local feature and the multi-layer perceptron, the three-dimensional global feature can be obtained.
The three-dimensional local feature here reflects the local feature information of the three-dimensional layer, and the three-dimensional global feature here reflects the global feature information of the three-dimensional layer.
Here, the three-dimensional local feature may be mapped to a three-dimensional global feature based on a Multi-Layer Perceptron (MLP).
After the two-dimensional local feature, the two-dimensional global feature, the three-dimensional local feature and the three-dimensional global feature are obtained, they can be fused based on a Transformer model, so as to obtain the association matrix of the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face part. The association matrix is formulated as follows:
M = T(f)
wherein T is a Transformer model containing an attention mechanism, f is the fused feature of the two-dimensional local feature, the two-dimensional global feature, the three-dimensional local feature and the three-dimensional global feature, and M is the learned association matrix.
It can be appreciated that a Transformer model containing a self-attention mechanism can handle the fused features well, making the learned association matrix more accurate.
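A toy sketch of how an attention mechanism can produce a row-stochastic association matrix M = T(f): each two-dimensional pixel feature attends over the three-dimensional point features. A real Transformer stacks such layers with learned projections, so every name and dimension below is illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_association(F2d, F3d):
    """Single scaled-dot-product attention step: pixel features F2d (m, d)
    attend over vertex features F3d (n, d), so each row of the returned
    matrix is a probability distribution over the mesh vertices."""
    d = F2d.shape[1]
    scores = F2d @ F3d.T / np.sqrt(d)   # (m pixels) x (n vertices)
    return softmax(scores, axis=1)

rng = np.random.default_rng(0)
M = attention_association(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)))
```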
According to the above scheme, the face image is input into the trained three-dimensional face reconstruction model and a forward network pass is performed: the global and local features of the face are extracted, the three-dimensional reconstruction prediction result of the face is reconstructed by combining the shape prior of the three-dimensional face, and the correspondence between three-dimensional face points and two-dimensional pixels is learned, so that the six-degree-of-freedom pose of the face can be calculated. Association matrix learning is optimized through the fused features learned and processed by the Transformer model containing an attention mechanism, and the perspective-projection-based reconstruction method improves the robustness of face reconstruction while also improving the accuracy of the face pose.
According to the method provided by the embodiment of the invention, based on a Transformer model, the two-dimensional local feature, the two-dimensional global feature, the three-dimensional local feature and the three-dimensional global feature are fused to obtain the association matrix of the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face part, which improves the accuracy of association matrix learning and makes the subsequent pose calculation more accurate.
Based on the above embodiment, step 270 further includes:
and step 271, estimating the pose based on the two-dimensional pixels and the predicted three-dimensional point coordinates corresponding to the two-dimensional pixels to obtain the pose with six degrees of freedom of face prediction.
Specifically, pose estimation is performed based on each two-dimensional pixel and the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel to obtain the predicted six-degree-of-freedom pose of the face, where the pose estimation may use a PnP (Perspective-n-Point) algorithm, which is not particularly limited in the embodiment of the invention.
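A hedged sketch of recovering a pose from such 2-D/3-D correspondences; a direct linear transform is used here as a simple stand-in for the PnP solver, assuming normalized image coordinates (intrinsics already removed) and at least six non-coplanar points:

```python
import numpy as np

def dlt_pnp(pts3d, pts2d):
    """Recover a 6-DoF pose (R, t) from 2-D/3-D correspondences with a
    direct linear transform -- an illustrative stand-in for PnP."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        Xh = [X, Y, Z, 1.0]
        A.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        A.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)                    # smallest singular vector
    P /= np.linalg.norm(P[2, :3])               # unit-norm third rotation row
    if (P[2, :3] @ pts3d[0] + P[2, 3]) < 0:     # keep points in front of camera
        P = -P
    U, _, Vt2 = np.linalg.svd(P[:, :3])         # project onto the rotations
    R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt2)]) @ Vt2
    return R, P[:, 3]

# Synthetic check: project points with a known pose, then recover it.
rng = np.random.default_rng(1)
ang = 0.3
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                   [np.sin(ang),  np.cos(ang), 0.0],
                   [0.0,          0.0,         1.0]])
t_true = np.array([0.1, -0.2, 5.0])
pts3d = rng.normal(size=(10, 3))
cam = pts3d @ R_true.T + t_true
pts2d = cam[:, :2] / cam[:, 2:3]
R_est, t_est = dlt_pnp(pts3d, pts2d)
```

With noiseless correspondences the pose is recovered essentially exactly; production code would instead call a robust PnP solver.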
Based on the above embodiment, step 230 includes:
inputting the second sample face image into a face segmentation model, and obtaining and outputting the position coordinates corresponding to the face by the face segmentation model; the face segmentation model is constructed based on a ResNet model.
Specifically, the second sample face image may be input into a face segmentation model, and the position coordinates corresponding to the face are obtained and output by the face segmentation model. For example, the encoder of the face segmentation model may be constructed from a ResNet model to extract the features of the face image, and a decoder may then be constructed by stacking deconvolution layers to decode the image features into the segmented image, thereby determining the position coordinates corresponding to the face.
The three-dimensional face reconstruction and six-degree-of-freedom pose estimation device provided by the invention is described below, and the three-dimensional face reconstruction and six-degree-of-freedom pose estimation device described below and the three-dimensional face reconstruction and six-degree-of-freedom pose estimation method described above can be correspondingly referred to each other.
Based on any one of the above embodiments, the present invention provides a three-dimensional face reconstruction and six-degree-of-freedom pose estimation device, and fig. 6 is a schematic structural diagram of the three-dimensional face reconstruction and six-degree-of-freedom pose estimation device provided by the present invention, as shown in fig. 6, the device includes:
an acquiring unit 610, configured to acquire a face image to be reconstructed;
the three-dimensional reconstruction unit 620 is configured to input the face image to be reconstructed into a three-dimensional face reconstruction model, obtain and output a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of a face in the face image to be reconstructed from the three-dimensional face reconstruction model;
the three-dimensional face reconstruction model is obtained by training based on a three-dimensional reconstruction prediction result of a sample face image and a label three-dimensional reconstruction result of the sample face image, and restraining six-degree-of-freedom pose of the sample face image based on an association matrix of the three-dimensional reconstruction prediction result and two-dimensional pixels of a face part.
According to the device provided by the embodiment of the invention, the three-dimensional face reconstruction model is obtained by training based on the three-dimensional reconstruction prediction result of the sample face image and the label three-dimensional reconstruction result of the sample face image and constraining the six-degree-of-freedom pose of the sample face image based on the three-dimensional reconstruction prediction result and the incidence matrix of the two-dimensional pixels of the face part, so that the accuracy of the face pose is improved, and the accuracy and reliability of the face reconstruction are further improved.
Based on any one of the above embodiments, the training of the three-dimensional face reconstruction model is implemented by the following units:
the sample acquiring unit is used for acquiring a first sample face image, a second sample face image only containing a face, a true six-degree-of-freedom pose of the face, a label three-dimensional reconstruction result, a label association matrix, label three-dimensional coordinates corresponding to each point and an initial three-dimensional face reconstruction model;
the face position coordinate marking unit is used for projecting the initial three-dimensional face reconstruction model and the real six-degree-of-freedom pose of the face into an image to obtain a face position coordinate marking;
a position coordinate determining unit, configured to determine a position coordinate of a face in the second sample face image;
The feature extraction unit is used for extracting the global features of the face image of the first sample face image, and the three-dimensional features and local features corresponding to the second sample face image;
a three-dimensional reconstruction prediction result determining unit, configured to determine a three-dimensional reconstruction prediction result of the second sample face image based on the three-dimensional feature corresponding to the second sample face image;
the incidence matrix determining unit is used for determining an incidence matrix of the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face part based on the global features and the local features of the face image;
determining a predicted three-dimensional point coordinate unit, which is used for obtaining predicted three-dimensional point coordinates corresponding to each two-dimensional pixel based on the incidence matrix and the three-dimensional reconstruction prediction result;
and the parameter iteration unit is used for carrying out parameter iteration on the initial three-dimensional face reconstruction model based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, the face position coordinate label and the position coordinate, the association matrix and the label association matrix, the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point, and the three-dimensional face reconstruction model is obtained.
Based on any of the above embodiments, the parameter iteration unit is specifically configured to:
determining a first loss based on the three-dimensional reconstruction prediction result and the tag three-dimensional reconstruction result;
determining a second loss based on the face position coordinate label and the position coordinate;
determining a third penalty based on the association matrix and the tag association matrix;
determining a fourth loss based on the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the tag three-dimensional coordinates corresponding to each point;
and carrying out parameter iteration on the initial three-dimensional face reconstruction model based on the first loss, the second loss, the third loss and the fourth loss to obtain the three-dimensional face reconstruction model.
Based on any of the above embodiments, the three-dimensional reconstruction prediction result determining unit is specifically configured to:
determining a UV position diagram of the face based on the three-dimensional features;
and sequentially carrying out UV pairing and grid sampling on the UV position map of the face to obtain the three-dimensional reconstruction prediction result.
Based on any of the above embodiments, the association matrix determining unit is specifically configured to:
sampling face two-dimensional point coordinates from the position coordinates of the face in the second sample face image;
Obtaining a full-image two-dimensional point coordinate based on the mapping from the second sample face image to the first sample face image;
determining a two-dimensional local feature based on the two-dimensional point coordinates of the face, and determining a two-dimensional global feature based on the two-dimensional point coordinates of the whole image;
determining three-dimensional local features based on the three-dimensional reconstruction prediction result, and obtaining three-dimensional global features based on the three-dimensional local features and a multi-layer perceptron;
and based on a Transformer model, fusing the two-dimensional local features, the two-dimensional global features, the three-dimensional local features and the three-dimensional global features to obtain an association matrix of the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face part.
Based on any of the above embodiments, the predicted three-dimensional point coordinate determining unit is further specifically configured to:
and estimating the pose based on the two-dimensional pixels and the predicted three-dimensional point coordinates corresponding to the two-dimensional pixels to obtain the pose of the face prediction with six degrees of freedom.
Based on any of the above embodiments, the position coordinate determining unit is specifically configured to:
inputting the second sample face image into a face segmentation model, and obtaining and outputting the position coordinates corresponding to the face by the face segmentation model; the face segmentation model is constructed based on a ResNet model.
Fig. 7 illustrates a physical schematic diagram of an electronic device, as shown in fig. 7, which may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform three-dimensional face reconstruction and six-degree-of-freedom pose estimation methods, including: acquiring a face image to be reconstructed; inputting the face image to be reconstructed into a three-dimensional face reconstruction model, and obtaining and outputting a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of the face in the face image to be reconstructed by the three-dimensional face reconstruction model; the three-dimensional face reconstruction model is obtained by training based on a three-dimensional reconstruction prediction result of a sample face image and a label three-dimensional reconstruction result of the sample face image, and restraining six-degree-of-freedom pose of the sample face image based on an association matrix of the three-dimensional reconstruction prediction result and two-dimensional pixels of a face part.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the three-dimensional face reconstruction and six-degree-of-freedom pose estimation methods provided by the above methods, and the method includes: acquiring a face image to be reconstructed; inputting the face image to be reconstructed into a three-dimensional face reconstruction model, and obtaining and outputting a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of the face in the face image to be reconstructed by the three-dimensional face reconstruction model; the three-dimensional face reconstruction model is obtained by training based on a three-dimensional reconstruction prediction result of a sample face image and a label three-dimensional reconstruction result of the sample face image, and restraining six-degree-of-freedom pose of the sample face image based on an association matrix of the three-dimensional reconstruction prediction result and two-dimensional pixels of a face part.
In still another aspect, the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, is implemented to perform the three-dimensional face reconstruction and six-degree-of-freedom pose estimation methods provided by the above methods, the method comprising: acquiring a face image to be reconstructed; inputting the face image to be reconstructed into a three-dimensional face reconstruction model, and obtaining and outputting a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of the face in the face image to be reconstructed by the three-dimensional face reconstruction model; the three-dimensional face reconstruction model is obtained by training based on a three-dimensional reconstruction prediction result of a sample face image and a label three-dimensional reconstruction result of the sample face image, and restraining six-degree-of-freedom pose of the sample face image based on an association matrix of the three-dimensional reconstruction prediction result and two-dimensional pixels of a face part.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A three-dimensional face reconstruction and six-degree-of-freedom pose estimation method is characterized by comprising the following steps:
acquiring a face image to be reconstructed;
inputting the face image to be reconstructed into a three-dimensional face reconstruction model, and obtaining and outputting a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of the face in the face image to be reconstructed by the three-dimensional face reconstruction model;
the three-dimensional face reconstruction model is obtained by training based on a three-dimensional reconstruction prediction result of a sample face image and a label three-dimensional reconstruction result of the sample face image, and restraining six-degree-of-freedom pose of the sample face image based on an association matrix of the three-dimensional reconstruction prediction result and two-dimensional pixels of a face part.
2. The three-dimensional face reconstruction and six-degree-of-freedom pose estimation method according to claim 1, wherein the training step of the three-dimensional face reconstruction model comprises:
acquiring a first sample face image, a second sample face image containing only a face, a true six-degree-of-freedom pose of the face, a label three-dimensional reconstruction result, a label association matrix, label three-dimensional coordinates corresponding to each point, and an initial three-dimensional face reconstruction model;
projecting the initial three-dimensional face reconstruction model with the true six-degree-of-freedom pose of the face into the image to obtain a face position coordinate label;
determining the position coordinates of the face in the second sample face image;
extracting global face-image features from the first sample face image, and three-dimensional features and local features corresponding to the second sample face image;
determining a three-dimensional reconstruction prediction result of the second sample face image based on the three-dimensional features corresponding to the second sample face image;
determining an association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region based on the global face-image features and the local features;
obtaining, based on the association matrix and the three-dimensional reconstruction prediction result, the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel;
and performing parameter iteration on the initial three-dimensional face reconstruction model based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, the face position coordinate label and the position coordinates, the association matrix and the label association matrix, and the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point, to obtain the three-dimensional face reconstruction model.
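The step of turning the association matrix into per-pixel three-dimensional coordinates can be sketched as a weighted combination of the reconstructed vertices: each two-dimensional pixel's predicted three-dimensional point is the association-weighted average of the vertices it associates with. Interpreting the rows as normalised weights is an assumption for illustration; the claim does not fix the exact combination rule.

```python
import numpy as np

def pixel_3d_coordinates(assoc, vertices):
    """Illustrative sketch: per-pixel 3-D points from the association
    matrix and the reconstructed vertices (shapes are assumptions).

    assoc    : (P, V) association matrix, one row per face pixel.
    vertices : (V, 3) reconstructed 3-D face vertices.
    """
    weights = assoc / assoc.sum(axis=1, keepdims=True)  # normalise each row
    return weights @ vertices                           # (P, 3) predicted points
```

With a one-hot row, a pixel simply inherits the coordinates of the vertex it is associated with; softer rows blend neighbouring vertices.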
3. The three-dimensional face reconstruction and six-degree-of-freedom pose estimation method according to claim 2, wherein the performing parameter iteration on the initial three-dimensional face reconstruction model based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result, the face position coordinate label and the position coordinates, the association matrix and the label association matrix, and the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point, to obtain the three-dimensional face reconstruction model, comprises:
determining a first loss based on the three-dimensional reconstruction prediction result and the label three-dimensional reconstruction result;
determining a second loss based on the face position coordinate label and the position coordinates;
determining a third loss based on the association matrix and the label association matrix;
determining a fourth loss based on the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel and the label three-dimensional coordinates corresponding to each point;
and performing parameter iteration on the initial three-dimensional face reconstruction model based on the first loss, the second loss, the third loss and the fourth loss to obtain the three-dimensional face reconstruction model.
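The four losses of claim 3 can be sketched as a single weighted training objective. The L2 form of each term, the dictionary keys, and the equal weights are illustrative assumptions; the claim does not specify the loss functions or their weighting.

```python
import numpy as np

def total_loss(pred, target, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four losses in claim 3 (illustrative sketch)."""
    w1, w2, w3, w4 = weights
    l1 = np.mean((pred["vertices"] - target["vertices"]) ** 2)        # first loss: reconstruction
    l2 = np.mean((pred["face_coords"] - target["face_coords"]) ** 2)  # second loss: face position
    l3 = np.mean((pred["assoc"] - target["assoc"]) ** 2)              # third loss: association matrix
    l4 = np.mean((pred["points3d"] - target["points3d"]) ** 2)        # fourth loss: per-pixel 3-D points
    return w1 * l1 + w2 * l2 + w3 * l3 + w4 * l4
```

Parameter iteration would then minimise this scalar by gradient descent; in a deep-learning framework the same expression would be built from framework tensors so that gradients flow back to the model.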
4. The three-dimensional face reconstruction and six-degree-of-freedom pose estimation method according to claim 2, wherein the determining the three-dimensional reconstruction prediction result of the second sample face image based on the three-dimensional features corresponding to the second sample face image comprises:
determining a UV position map of the face based on the three-dimensional features;
and sequentially performing UV pairing and grid sampling on the UV position map of the face to obtain the three-dimensional reconstruction prediction result.
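The grid-sampling step of claim 4 reads three-dimensional vertex positions out of the UV position map at fixed per-vertex UV coordinates. A minimal bilinear sampler is sketched below; the array shapes and the [0, 1] UV convention are assumptions, not the patented layout.

```python
import numpy as np

def sample_uv_position_map(uv_map, uv_coords):
    """Bilinear grid sampling of a UV position map (illustrative sketch).

    uv_map    : (H, W, 3) array, each texel holds an (x, y, z) coordinate.
    uv_coords : (N, 2) array of per-vertex UV coordinates in [0, 1].
    """
    h, w, _ = uv_map.shape
    u = uv_coords[:, 0] * (w - 1)
    v = uv_coords[:, 1] * (h - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, w - 1), np.minimum(v0 + 1, h - 1)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    top = uv_map[v0, u0] * (1 - du) + uv_map[v0, u1] * du  # blend along u
    bot = uv_map[v1, u0] * (1 - du) + uv_map[v1, u1] * du
    return top * (1 - dv) + bot * dv                       # (N, 3) vertices
```

UV pairing supplies the fixed `uv_coords` for every mesh vertex, so the sampled output is directly the reconstructed vertex set.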
5. The three-dimensional face reconstruction and six-degree-of-freedom pose estimation method according to claim 2, wherein the determining an association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region based on the global face-image features and the local features comprises:
sampling face two-dimensional point coordinates from the position coordinates of the face in the second sample face image;
obtaining full-image two-dimensional point coordinates based on the mapping from the second sample face image to the first sample face image;
determining two-dimensional local features based on the face two-dimensional point coordinates, and determining two-dimensional global features based on the full-image two-dimensional point coordinates;
determining three-dimensional local features based on the three-dimensional reconstruction prediction result, and obtaining three-dimensional global features based on the three-dimensional local features and a multi-layer perceptron;
and fusing, based on a Transformer model, the two-dimensional local features, the two-dimensional global features, the three-dimensional local features and the three-dimensional global features to obtain the association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region.
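The Transformer fusion of claim 5 can be reduced, for illustration, to a single dot-product attention step: every (2-D pixel, 3-D point) pair is scored by the similarity of their fused features and each pixel's row is normalised into an association distribution. This one-layer stand-in is a deliberate simplification, not the patented architecture.

```python
import numpy as np

def association_matrix(feat2d, feat3d):
    """Toy attention-style association (illustrative sketch).

    feat2d : (P, C) fused local+global features of the 2-D pixels.
    feat3d : (V, C) fused local+global features of the 3-D points.
    """
    scores = feat2d @ feat3d.T / np.sqrt(feat2d.shape[1])  # (P, V) scaled similarities
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)            # softmax: rows sum to 1
```

In a full Transformer, several such attention layers (with learned projections and feed-forward blocks) would refine the scores before the final normalisation.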
6. The three-dimensional face reconstruction and six-degree-of-freedom pose estimation method according to claim 2, wherein, after the obtaining the predicted three-dimensional point coordinates corresponding to each two-dimensional pixel based on the association matrix and the three-dimensional reconstruction prediction result, the method further comprises:
estimating the pose based on the two-dimensional pixels and their corresponding predicted three-dimensional point coordinates to obtain the predicted six-degree-of-freedom pose of the face.
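Recovering a six-degree-of-freedom pose from matched 2-D pixels and 3-D points is classically a perspective-n-point (PnP) problem. As a dependency-free illustration, the sketch below fits a scaled-orthographic camera by least squares and projects the result onto a valid rotation; a production system would more likely use a full PnP solver, and nothing here is claimed to be the patented estimator.

```python
import numpy as np

def pose_from_correspondences(pts2d, pts3d):
    """Scaled-orthographic pose fit from 2-D/3-D matches (illustrative).

    pts2d : (N, 2) pixel coordinates.
    pts3d : (N, 3) predicted 3-D point coordinates.
    Returns rotation R (3, 3), scale s, and 2-D translation t.
    """
    mean2d, mean3d = pts2d.mean(0), pts3d.mean(0)
    # Solve (x2d - mean2d) ≈ M (x3d - mean3d) for the 2x3 projection M.
    M, *_ = np.linalg.lstsq(pts3d - mean3d, pts2d - mean2d, rcond=None)
    M = M.T                                    # (2, 3)
    scale = np.linalg.norm(M, axis=1).mean()   # shared orthographic scale
    r1 = M[0] / np.linalg.norm(M[0])
    r2 = M[1] / np.linalg.norm(M[1])
    r3 = np.cross(r1, r2)                      # complete the rotation basis
    u, _, vt = np.linalg.svd(np.stack([r1, r2, r3]))
    R = u @ vt                                 # nearest orthonormal matrix
    t = mean2d - scale * (R[:2] @ mean3d)      # image-plane translation
    return R, scale, t
```

On clean correspondences the fit is exact: projecting known 3-D points with a known rotation, scale, and translation and feeding the pairs back in recovers the same pose.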
7. The three-dimensional face reconstruction and six-degree-of-freedom pose estimation method according to claim 2, wherein the determining the position coordinates of the face in the second sample face image comprises:
inputting the second sample face image into a face segmentation model, and obtaining and outputting, by the face segmentation model, the position coordinates corresponding to the face; the face segmentation model is constructed based on a ResNet model.
8. A three-dimensional face reconstruction and six-degree-of-freedom pose estimation device, characterized by comprising:
an acquisition unit, configured to acquire a face image to be reconstructed;
a three-dimensional reconstruction unit, configured to input the face image to be reconstructed into a three-dimensional face reconstruction model, and to obtain and output, from the three-dimensional face reconstruction model, a three-dimensional reconstruction result of the face image to be reconstructed and a six-degree-of-freedom pose of the face in the face image to be reconstructed;
wherein the three-dimensional face reconstruction model is trained based on a three-dimensional reconstruction prediction result of a sample face image and a label three-dimensional reconstruction result of the sample face image, with the six-degree-of-freedom pose of the sample face image constrained by an association matrix between the three-dimensional reconstruction prediction result and the two-dimensional pixels of the face region.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the three-dimensional face reconstruction and six degrees of freedom pose estimation method according to any of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the three-dimensional face reconstruction and six degrees of freedom pose estimation method according to any of claims 1 to 7.
CN202310809920.7A 2023-07-03 2023-07-03 Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment Pending CN116843834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310809920.7A CN116843834A (en) 2023-07-03 2023-07-03 Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310809920.7A CN116843834A (en) 2023-07-03 2023-07-03 Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment

Publications (1)

Publication Number Publication Date
CN116843834A true CN116843834A (en) 2023-10-03

Family

ID=88172099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310809920.7A Pending CN116843834A (en) 2023-07-03 2023-07-03 Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment

Country Status (1)

Country Link
CN (1) CN116843834A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542122A (en) * 2024-01-09 2024-02-09 北京渲光科技有限公司 Human body pose estimation and three-dimensional reconstruction method, network training method and device
CN117542122B (en) * 2024-01-09 2024-03-22 北京渲光科技有限公司 Human body pose estimation and three-dimensional reconstruction method, network training method and device
CN117576217A (en) * 2024-01-12 2024-02-20 电子科技大学 Object pose estimation method based on single-instance image reconstruction
CN117576217B (en) * 2024-01-12 2024-03-26 电子科技大学 Object pose estimation method based on single-instance image reconstruction
CN117853664A (en) * 2024-03-04 2024-04-09 云南大学 Three-dimensional face reconstruction method based on double-branch feature fusion
CN117853664B (en) * 2024-03-04 2024-05-14 云南大学 Three-dimensional face reconstruction method based on double-branch feature fusion

Similar Documents

Publication Publication Date Title
CN108764048B (en) Face key point detection method and device
CN113196289B (en) Human body action recognition method, human body action recognition system and equipment
CN116843834A (en) Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment
EP2153409B1 (en) Camera pose estimation apparatus and method for augmented reality imaging
CN111696196B (en) Three-dimensional face model reconstruction method and device
CN111080776B (en) Human body action three-dimensional data acquisition and reproduction processing method and system
CN112418195B (en) Face key point detection method and device, electronic equipment and storage medium
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN113643366B (en) Multi-view three-dimensional object attitude estimation method and device
CN111680573B (en) Face recognition method, device, electronic equipment and storage medium
US11961266B2 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN114663686A (en) Object feature point matching method and device, and training method and device
CN113888431A (en) Training method and device of image restoration model, computer equipment and storage medium
CN112907569A (en) Head image area segmentation method and device, electronic equipment and storage medium
CN112528902A (en) Video monitoring dynamic face recognition method and device based on 3D face model
WO2022208440A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN114663579A (en) Twin three-dimensional model generation method and device, electronic device and storage medium
CN112070077B (en) Deep learning-based food identification method and device
CN114419158A (en) Six-dimensional attitude estimation method, network training method, device, equipment and medium
Ghanem et al. Face completion using generative adversarial network with pretrained face landmark generator
CN113657403A (en) Image processing method and training method of image processing network
CN114005169B (en) Face key point detection method and device, electronic equipment and storage medium
Li SuperGlue-Based Deep Learning Method for Image Matching from Multiple Viewpoints
CN117542122B (en) Human body pose estimation and three-dimensional reconstruction method, network training method and device
CN114241013B (en) Object anchoring method, anchoring system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination