CN111524216B

CN111524216B - Method and device for generating three-dimensional face data

Info

Publication number: CN111524216B
Application number: CN202010281603.9A
Authority: CN
Inventors: 希滕; 姜志超; 张刚; 温圣召
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-04-10
Filing date: 2020-04-10
Publication date: 2023-06-27
Anticipated expiration: 2040-04-10
Also published as: CN111524216A

Abstract

The embodiment of the disclosure discloses a method and a device for generating three-dimensional face data, and relates to the field of computer vision. The method comprises the following steps: respectively inputting the random noise data into a shape generation network and a texture generation network to obtain a three-dimensional vertex position map and a three-dimensional texture map; generating three-dimensional face data based on the three-dimensional vertex position mapping, the three-dimensional texture mapping and preset camera gesture parameters; the shape generating network and the texture generating network are obtained based on the training of the generating countermeasure network, the shape generating network and the texture generating network comprise generators in the trained generating countermeasure network, and a discriminator in the generating countermeasure network is used for discriminating whether a face represented by three-dimensional face data obtained through restoration of the three-dimensional vertex position mapping and the three-dimensional texture mapping generated based on the shape generating network and the texture generating network is a real face or not. The method realizes the generation of high-precision three-dimensional face data.

Description

Method and device for generating three-dimensional face data

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for generating three-dimensional face data.

Background

The three-dimensional data of the human face has wide application in the fields of human face video processing technology, living body detection and identification, medical cosmetology and the like. However, in a practical scenario, high-precision three-dimensional data of a face is very rare, especially dense key point data of the face.

The current method comprises the steps of fitting a nonlinear 3DMM (3D Morphable Models, three-dimensional deformable model) on a two-dimensional face through two-dimensional face image data to generate a contour curved surface, and further generating three-dimensional key points of the face. However, the three-dimensional key points of the human face generated by the method have low precision and cannot be applied to actual scenes.

Disclosure of Invention

Embodiments of the present disclosure propose a method and apparatus, an electronic device, and a computer readable medium for generating three-dimensional face data.

In a first aspect, embodiments of the present disclosure provide a method of generating three-dimensional face data, including: respectively inputting the random noise data into a shape generation network and a texture generation network to obtain a three-dimensional vertex position map and a three-dimensional texture map of the three-dimensional face model; generating three-dimensional face data based on the three-dimensional vertex position mapping, the three-dimensional texture mapping and preset camera gesture parameters; the shape generating network and the texture generating network are obtained based on the training of the generating countermeasure network, the shape generating network and the texture generating network comprise generators in the trained generating countermeasure network, and a discriminator in the generating countermeasure network is used for discriminating whether a face represented by three-dimensional face data obtained through restoration of the three-dimensional vertex position mapping and the three-dimensional texture mapping generated based on the shape generating network and the texture generating network is a real face or not.

In some embodiments, the three-dimensional face data includes three-dimensional face keypoint data.

In some embodiments, the above method further comprises: and training based on the three-dimensional sample face data to obtain a shape generation network and a texture generation network.

In some embodiments, the training to obtain the shape generating network and the texture generating network based on the three-dimensional sample face data includes: inputting the random sample noise signals into a shape generating network to be trained and a texture generating network to be trained, and extracting vertex position maps of three-dimensional sample noise and texture maps of the three-dimensional sample noise corresponding to the random sample noise signals; generating a prediction result of the three-dimensional sample face based on the vertex position map of the three-dimensional sample noise, the texture map of the three-dimensional sample noise and camera gesture parameters corresponding to the three-dimensional sample face data; and judging the predicted result of the three-dimensional sample face and the three-dimensional sample face data by utilizing the to-be-trained discriminator, and iteratively adjusting parameters of the to-be-trained shape generating network, the to-be-trained texture generating network and the to-be-trained discriminator according to the judging result.

In some embodiments, the three-dimensional sample face data includes three-dimensional sample face key point scan data, and the three-dimensional sample face prediction result includes three-dimensional sample face key point prediction data; the above-mentioned utilizing the discriminator to be trained judges the prediction result and three-dimensional sample face data of three-dimensional sample face, including: respectively carrying out authenticity judgment on the key point scanning data of the three-dimensional sample face and the key point prediction data of the three-dimensional sample face by utilizing a to-be-trained discriminator; determining a discrimination error of a discriminator; and iteratively adjusting parameters of the shape generating network to be trained, the texture generating network to be trained and the discriminator to be trained according to the discrimination result, including: based on the discrimination errors, parameters of the shape generating network to be trained, the texture generating network to be trained and the discriminator to be trained are adjusted through back propagation.

In some embodiments, the three-dimensional sample face data includes key point scan data of the three-dimensional sample face and a two-dimensional image corresponding to the three-dimensional sample face, and the training based on the three-dimensional sample face data obtains a shape generating network and a texture generating network, including: inputting a two-dimensional image corresponding to the three-dimensional sample face into a shape generating network to be trained and a texture generating network to be trained, and extracting a vertex position mapping and a texture mapping corresponding to the three-dimensional sample face; generating a key point prediction result of the three-dimensional sample face based on the vertex position mapping and the texture mapping corresponding to the three-dimensional sample face and camera gesture parameters corresponding to the three-dimensional sample face data; judging the authenticity of the three-dimensional sample face key point prediction result by utilizing the to-be-trained discriminator, determining the discrimination error of the to-be-trained discriminator based on the difference between the three-dimensional sample face key point prediction result and the three-dimensional sample face key point scanning data, and iteratively adjusting the parameters of the to-be-trained shape generation network, the to-be-trained texture generation network and the to-be-trained discriminator based on the discrimination error.

In a second aspect, embodiments of the present disclosure provide an apparatus for generating three-dimensional face data, including: the first generation unit is configured to input random noise data into a shape generation network and a texture generation network respectively to obtain a three-dimensional vertex position map and a three-dimensional texture map of the three-dimensional face model; the second generation unit is configured to generate three-dimensional face data based on the three-dimensional vertex position mapping, the three-dimensional texture mapping and preset camera gesture parameters; the shape generating network and the texture generating network are obtained based on the training of the generating countermeasure network, the shape generating network and the texture generating network comprise generators in the trained generating countermeasure network, and a discriminator in the generating countermeasure network is used for discriminating whether a face represented by three-dimensional face data obtained through restoration of the three-dimensional vertex position mapping and the three-dimensional texture mapping generated based on the shape generating network and the texture generating network is a real face or not.

In some embodiments, the apparatus further comprises: and the training unit is configured to train to obtain a shape generation network and a texture generation network based on the three-dimensional sample face data.

In some embodiments, the training unit is configured to train to derive the shape generation network and the texture generation network as follows: inputting the random sample noise signals into a shape generating network to be trained and a texture generating network to be trained, and extracting vertex position maps of three-dimensional sample noise and texture maps of the three-dimensional sample noise corresponding to the random sample noise signals; generating a prediction result of the three-dimensional sample face based on the vertex position map of the three-dimensional sample noise, the texture map of the three-dimensional sample noise and camera gesture parameters corresponding to the three-dimensional sample face data; and judging the predicted result of the three-dimensional sample face and the three-dimensional sample face data by utilizing the to-be-trained discriminator, and iteratively adjusting parameters of the to-be-trained shape generating network, the to-be-trained texture generating network and the to-be-trained discriminator according to the judging result.

In some embodiments, the three-dimensional sample face data includes three-dimensional sample face key point scan data, and the three-dimensional sample face prediction result includes three-dimensional sample face key point prediction data; the training unit is configured to use a to-be-trained discriminator to discriminate the three-dimensional sample face prediction result and the three-dimensional sample face data according to the following mode: respectively judging key point scanning data of the three-dimensional sample face and key point prediction data of the three-dimensional sample face by utilizing a to-be-trained discriminator to judge authenticity; determining a discrimination error of a discriminator; and the training unit is configured to: based on the discrimination errors, parameters of the shape generating network to be trained, the texture generating network to be trained and the discriminator to be trained are adjusted through back propagation.

In some embodiments, the three-dimensional sample face data includes key point scan data of the three-dimensional sample face and a two-dimensional image corresponding to the three-dimensional sample face, and the training unit is configured to train to obtain the shape generating network and the texture generating network as follows: inputting a two-dimensional image corresponding to the three-dimensional sample face into a shape generating network to be trained and a texture generating network to be trained, and extracting a vertex position mapping and a texture mapping corresponding to the three-dimensional sample face; generating a key point prediction result of the three-dimensional sample face based on the vertex position mapping and the texture mapping corresponding to the three-dimensional sample face and camera gesture parameters corresponding to the three-dimensional sample face data; judging the authenticity of the three-dimensional sample face key point prediction result by utilizing the to-be-trained discriminator, determining the discrimination error of the to-be-trained discriminator based on the difference between the three-dimensional sample face key point prediction result and the three-dimensional sample face key point scanning data, and iteratively adjusting the parameters of the to-be-trained shape generation network, the to-be-trained texture generation network and the to-be-trained discriminator based on the discrimination error.

In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of generating three-dimensional face data as provided in the first aspect.

In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium having a computer program stored thereon, wherein the program when executed by a processor implements the method of generating three-dimensional face data provided in the first aspect.

According to the method and the device for generating the three-dimensional face data, the random noise data are respectively input into the shape generating network and the texture generating network to obtain the three-dimensional vertex position mapping and the three-dimensional texture mapping of the three-dimensional face model, the three-dimensional face data are generated based on the three-dimensional vertex position mapping and the three-dimensional texture mapping and the camera gesture parameters acquired in advance, the shape generating network and the texture generating network are obtained based on training of the generating countermeasure network, the shape generating network and the texture generating network comprise the trained generator in the generating type countermeasure network, the discriminator in the generating type countermeasure network is used for discriminating whether the face represented by the three-dimensional face data obtained through restoration of the three-dimensional vertex position mapping and the three-dimensional texture mapping generated based on the shape generating network and the texture generating network is a real face, high-precision three-dimensional face data generation is achieved, and abundant and high-quality training data can be provided for scenes depending on the three-dimensional face data effectively.

Drawings

Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:

FIG. 1 is an exemplary system architecture diagram in which embodiments of the present disclosure may be applied;

FIG. 2 is a flow chart of one embodiment of a method of generating three-dimensional face data according to the present disclosure;

FIG. 3 is a schematic diagram of one implementation principle of a flow of the method of generating three-dimensional face data of the present disclosure;

FIG. 4 is a flow diagram of a training method of a shape generation network and a texture generation network;

FIG. 5 is another flow diagram of a training method of a shape generation network and a texture generation network;

FIG. 6 is a schematic structural diagram of one embodiment of an apparatus of the present disclosure for generating three-dimensional face data;

fig. 7 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an exemplary system architecture 100 to which the method of generating three-dimensional face data or the apparatus of generating three-dimensional face data of the present disclosure may be applied.

As shown in fig. 1, a system architecture 100 may include

terminal devices

101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the

terminal devices

101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The

terminal devices

101, 102, 103 interact with the server 105 via the network 104 to receive or send messages or the like. The

terminal devices

101, 102, 103 may be user side devices on which various applications may be installed. Such as image/video processing class applications, payment applications, social platform class applications, and so forth. The user 110 may upload face images using the

terminal devices

101, 102, 103.

The

terminal devices

101, 102, 103 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablet computers, electronic book readers, laptop and desktop computers, and the like. When the

terminal devices

101, 102, 103 are software, they can be installed in the above-listed electronic devices. Which may be implemented as multiple software or software modules (e.g., multiple software or software modules for providing distributed services) or as a single software or software module. The present invention is not particularly limited herein.

The server 105 may be a server running various services, such as a server providing background support for applications running on the

terminal devices

101, 102, 103. The server 105 may receive the face image or the face video sent by the

terminal device

101, 102, 103, perform three-dimensional processing on the face image or the face video, and feed back the processing result to the

terminal device

101, 102, 103, for example, when it is determined that the living body of the object to be identified passes through after three-dimensional reconstruction based on the face, send data of corresponding user rights to the

terminal device

101, 102, 103.

The server 105 may also receive the image or video data uploaded by the

terminal devices

101, 102, 103 to construct a sample set corresponding to the deep learning model of various application scenarios in the face image or video processing technology.

It should be noted that, the method for generating three-dimensional face data provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly, the device for generating three-dimensional face data is generally disposed in the server 105.

In some scenarios, server 105 may obtain source data (e.g., noise figures) from a database, memory, or other device, at which point exemplary system architecture 100 may be absent from

terminal devices

101, 102, 103, and network 104.

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When server 105 is software, it may be implemented as multiple software or software modules (e.g., multiple software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to fig. 2, a flow 200 of one embodiment of a method of generating three-dimensional face data according to the present disclosure is shown. The method for generating the three-dimensional face data comprises the following steps:

step 201, inputting random noise data into a shape generating network and a texture generating network respectively to obtain a three-dimensional vertex position map and a three-dimensional texture map of the three-dimensional face model.

In this embodiment, an execution subject of the method of generating three-dimensional face data (such as a server shown in fig. 1) may acquire random noise data generated in advance from a storage device, or generate random noise data by uniformly sampling or gaussian distribution sampling by a signal generator. And respectively inputting the acquired random noise data into a pre-trained shape generation network and a texture generation network. The shape generation network may generate a three-dimensional vertex position map (i.e., UV position map) of the three-dimensional face model, and the texture generation network may generate a three-dimensional texture map (UV texture map) of the three-dimensional face model.

The UV coordinate system is a mapping coordinate system of the three-dimensional model, the UV position chart indicates the position characteristics of vertexes in the three-dimensional model, the gray value of three channels of each pixel indicates the three-dimensional coordinate of one three-dimensional key point in the three-dimensional face, and the UV texture chart indicates the surface texture characteristics of the three-dimensional model.

In the present embodiment, the shape generation network and the texture generation network described above are derived based on the generation of the countermeasure network training. The shape generating network and the texture generating network comprise generators in a trained generating type countermeasure network, and a discriminator in the generating countermeasure network is used for discriminating whether a face represented by three-dimensional face data obtained based on three-dimensional vertex position mapping and three-dimensional texture mapping restoration generated by the shape generating network and the texture generating network is a real face or not.

When the generating countermeasure network is trained, random noise can be input into the shape generating network to be trained and the texture generating network to be trained, the shape generating network to be trained and the texture generating network to be trained are used as generators in the generating countermeasure network, a three-dimensional vertex position map and a three-dimensional texture map are respectively generated, a three-dimensional face model is reconstructed based on the generated three-dimensional vertex position map and the three-dimensional texture map, and key point data of the three-dimensional face are obtained. The discriminator may discriminate whether the generated three-dimensional face model is a three-dimensional face (i.e., a real face) or a virtual face (i.e., a non-real face) constructed by performing three-dimensional key point detection on the face. The training target of the discriminator is to accurately discriminate real and virtual faces, and the training target of the generator is to generate a three-dimensional vertex position map and a three-dimensional texture map corresponding to three-dimensional face data which make the discriminator difficult to accurately discriminate. In the training process, parameters of the generator and the discriminator are adjusted through iteration, so that the authenticity discrimination probability of the discriminator on the three-dimensional face constructed based on the three-dimensional vertex position mapping and the three-dimensional texture mapping generated by the generator is close to 0.5.

Step 202, generating three-dimensional face data based on the three-dimensional vertex position map, the three-dimensional texture map and preset camera pose parameters.

The camera pose parameters are parameters characterizing the pose of a camera that acquires a three-dimensional face, including rotation parameters including pitch angle, yaw angle, and roll angle, and translation parameters. In this embodiment, preset camera pose parameters may be acquired to construct a three-dimensional face model.

In some optional implementation manners of this embodiment, a plurality of camera poses may be preset, and the three-dimensional vertex position map and the three-dimensional texture map obtained in step 201 are respectively combined with pose parameters of various camera poses to generate three-dimensional face data of different poses.

Specifically, the three-dimensional vertex position map and the three-dimensional texture map obtained in step 202 and the obtained camera pose parameters may be input into a trained three-dimensional face reconstruction model, and three-dimensional face data may be reconstructed. The trained three-dimensional face reconstruction model can be obtained based on real three-dimensional face data and corresponding camera gesture labeling data.

Or the three-dimensional vertex position mapping and the three-dimensional texture mapping can be converted into a three-dimensional coordinate system to obtain three-dimensional face data of the standard gesture, and then the three-dimensional face data of the standard gesture is converted according to the acquired camera gesture parameters to obtain the three-dimensional face data corresponding to each camera gesture.

Here, the three-dimensional face data may be a three-dimensional face model, may include three-dimensional face key point data, and may be dense three-dimensional face key point data or sparse three-dimensional face key point data. The sparse three-dimensional face key point data may be three-dimensional point data extracted from dense three-dimensional face key point data, which characterizes key parts of the face (such as five sense organs, forehead, cheek, chin, etc.).

Fig. 3 shows a schematic diagram of one implementation principle of a method flow of generating three-dimensional face data according to the present disclosure. As shown in fig. 3, random noise Ps and Pt are input to a shape generation network and a texture generation network, respectively, to generate a UV position map and a UV texture map, respectively. After joining the camera parameters Pc and Pl, three-dimensional face data shown in fig. 3 can be generated.

According to the method for generating three-dimensional face data, random noise data are respectively input into a shape generation network and a texture generation network to obtain a three-dimensional vertex position map and a three-dimensional texture map of a three-dimensional face model, and then the three-dimensional face data are generated based on the three-dimensional vertex position map and the three-dimensional texture map and camera gesture parameters acquired in advance; the shape generating network and the texture generating network are obtained based on the training of the generating countermeasure network, the shape generating network and the texture generating network comprise generators in the trained generating countermeasure network, and a discriminator in the generating countermeasure network is used for discriminating whether a face represented by three-dimensional face data obtained through restoration of the three-dimensional vertex position mapping and the three-dimensional texture mapping generated based on the shape generating network and the texture generating network is a real face or not, so that high-precision three-dimensional face data are generated. The method can quickly generate various high-quality three-dimensional face data, and can efficiently provide rich and high-quality training data for scenes depending on the three-dimensional face data.

In some embodiments, the method further comprises the step of training a texture generation network of the shape generation network based on the three-dimensional sample face data. The three-dimensional sample face data may include three-dimensional key point data of a face obtained by three-dimensional scanning of a real face. The three-dimensional sample face data can be mapped to a two-dimensional image plane, a three-dimensional vertex position map and a three-dimensional texture map of the two-dimensional data obtained by mapping are extracted by adopting a shape generating network to be trained and a texture generating network to be trained, and the three-dimensional vertex position map and the three-dimensional texture map extracted by the shape generating network to be trained and the texture generating network to be trained are based on camera parameters corresponding to the three-dimensional sample face data obtained in advance, so that a three-dimensional face is reconstructed; and judging whether the reconstructed three-dimensional face and the three-dimensional key point data of the face obtained by carrying out three-dimensional scanning on the real face are real faces or not by utilizing a discriminator. And constructing a loss function based on the discrimination error of the discriminator, and adjusting parameters of the shape generating network to be trained and the texture generating network to be trained through back propagation iteration.

In some embodiments, the shape generation network and texture generation network described above may be trained in accordance with the flow shown in FIG. 4. As shown in fig. 4, a flow 400 of a training method of a shape generation network and a texture generation network may include:

Step 401, inputting the random sample noise signal into a shape generating network to be trained and a texture generating network to be trained, and extracting a vertex position mapping of the three-dimensional sample noise and a texture mapping of the three-dimensional sample noise corresponding to the random sample noise signal.

The shape generation network to be trained and the texture generation network to be trained can be constructed based on the structural units of the neural network such as the convolution layer, the pooling layer and the like and used as generators in the generation countermeasure network. After random sample noise is input into a shape generating network to be trained and a texture generating network to be trained, two-dimensional images generated by the shape generating network to be trained and the texture generating network to be trained are used as vertex position mapping (UV position mapping) and texture mapping (UV texture mapping) of three-dimensional sample noise.

Step 402, generating a prediction result of the three-dimensional sample face based on the vertex position map of the three-dimensional sample noise, the texture map of the three-dimensional sample noise, and the camera pose parameters corresponding to the three-dimensional sample face data.

The three-dimensional sample face data may include camera pose parameters corresponding to the sample face data, where the camera pose parameters may be obtained by detecting a face pose in the three-dimensional sample face data. And reconstructing the three-dimensional sample face by using the camera attitude parameters and the vertex position mapping and the texture mapping generated based on the random noise signals to obtain a prediction result of the three-dimensional sample face.

And step 403, judging the three-dimensional sample face prediction result and the three-dimensional sample face data by utilizing the to-be-trained discriminator, and iteratively adjusting parameters of the to-be-trained shape generating network, the to-be-trained texture generating network and the to-be-trained discriminator according to the judgment result.

The three-dimensional sample face prediction result reconstructed based on the UV position map and the UV texture map generated by the shape generation network to be trained and the texture generation network to be trained can be used as non-real three-dimensional face data, and the three-dimensional sample face data is real three-dimensional face data. In this embodiment, a discriminator may be constructed based on the classification model to discriminate whether the input three-dimensional face data is real face data. The arbiter may calculate the probability that the input three-dimensional face data is real face data. If the authenticity probability value corresponding to the three-dimensional sample face data is far greater than the authenticity probability value corresponding to the prediction result of the three-dimensional sample face data, parameters of a shape generating network to be trained, a texture generating network to be trained and the discriminator to be trained can be adjusted, and vertex position mapping and texture mapping are respectively generated by utilizing the networks after the parameters are adjusted, so that the authenticity of the reconstructed three-dimensional sample face is judged. And repeatedly executing the steps until the difference between the authenticity probability value corresponding to the three-dimensional sample face data output by the discriminator and the authenticity probability value corresponding to the prediction result of the three-dimensional sample face data is reduced to a certain range, and when the two authenticity probability values are both close to 0.5, stopping training, and fixing parameters of the shape generating network to be trained and the texture generating network to be trained to obtain the trained shape generating network and texture generating network.

Further, the three-dimensional sample face data may include key point scan data of the three-dimensional sample face, and the prediction result of the three-dimensional sample face may include key point prediction data of the three-dimensional sample face. The key point scanning data of the three-dimensional sample face can be key point data obtained by scanning the three-dimensional face or a three-dimensional face model, and each key point scanning data comprises three-dimensional space coordinates of a key point. In the above-mentioned training method flow 400, the to-be-trained discriminant device may be used to respectively discriminate the key point scan data of the three-dimensional sample face and the key point prediction data of the three-dimensional sample face to perform the authenticity discrimination, and determine the discrimination error of the discriminant device; and adjusting parameters of the shape generating network to be trained, the texture generating network to be trained and the discriminant to be trained through back propagation based on the discriminant error.

Here, the discrimination errors may include a first discrimination error for real face data and a second discrimination error for non-real face data. The first discrimination error characterization judges the key point scanning data of the three-dimensional sample face as the confidence coefficient of the unreal face data; and the second discrimination error characterization judges the key point predicted data of the three-dimensional sample face as the confidence coefficient of the real face data. The first discrimination error and the second discrimination error may be directly added as a loss function, or a weighted sum of the first discrimination error and the second discrimination error may be calculated as a loss function. Based on the loss function, the parameters of the generator to be trained (including the shape generating network to be trained and the texture generating network to be trained) and the arbiter to be trained are alternately updated in iterative operations by back propagation.

The training process 400 uses the random noise signal as a sample, so that the shape generating network to be trained and the texture generating network to be trained can learn how to accurately convert the noise signal into the position map and the texture map of the three-dimensional face data, and the shape generating network and the texture generating network obtained by training are ensured to be suitable for three-dimensional face data construction based on the random noise data.

With continued reference to FIG. 5, a flow chart of another exemplary training method of the shape generation network and texture generation network is shown. As shown in fig. 5, a flow 500 of a training method of a shape generation network and a texture generation network may include:

step 501, inputting a two-dimensional image corresponding to a three-dimensional sample face into a shape generating network to be trained and a texture generating network to be trained, and extracting a vertex position mapping and a texture mapping corresponding to the three-dimensional sample face.

In this embodiment, the three-dimensional sample face data may include key point scan data of the three-dimensional sample face and a two-dimensional image corresponding to the three-dimensional sample face. The shape generating network to be trained and the texture generating network to be trained can be constructed, and after the two-dimensional images are respectively input into the shape generating network to be trained and the texture generating network to be trained, vertex position maps (UV position maps) and texture maps (UV texture maps) corresponding to the three-dimensional sample face data are obtained.

Step 502, generating a key point prediction result of the three-dimensional sample face based on the vertex position mapping and the texture mapping corresponding to the three-dimensional sample face and the camera gesture parameters corresponding to the three-dimensional sample face data.

The camera pose parameters corresponding to the three-dimensional sample face data can be calculated based on coordinates of the matched key points in the three-dimensional sample face data and the corresponding two-dimensional image, or the corresponding camera pose parameters can be obtained when the three-dimensional sample face data is constructed.

In this embodiment, three-dimensional face reconstruction may be performed by using the camera pose parameter and the vertex position map and the texture map generated based on the two-dimensional image, so as to obtain a key point prediction result of the three-dimensional sample face.

Step 503, judging the reality of the key point prediction result of the three-dimensional sample face by using the to-be-trained discriminator, determining the discrimination error of the to-be-trained discriminator based on the difference between the prediction result of the three-dimensional sample face key point and the key point scanning data of the three-dimensional sample face, and iteratively adjusting the parameters of the to-be-trained shape generating network, the to-be-trained texture generating network and the to-be-trained discriminator based on the discrimination error.

In this embodiment, a discriminator may be constructed, and the authenticity of the key point prediction result of the three-dimensional sample face is discriminated by using the discriminator, so as to obtain an authenticity discrimination result of the key point prediction result of the three-dimensional sample face, where the authenticity discrimination result may include a confidence that the key point prediction result of the three-dimensional sample face is an actual three-dimensional face key point.

In this embodiment, the difference between the predicted result of the three-dimensional sample face key point and the key point scan data of the three-dimensional sample face may be calculated, and the discrimination error of the discriminator may be determined according to the difference. Specifically, the smaller the difference between the predicted result of the three-dimensional sample face key point and the key point scanning data of the three-dimensional sample face is, the closer the predicted result of the three-dimensional sample face key point is to the real three-dimensional face data, the larger the expected probability that the predicted result of the three-dimensional sample face key point is judged to be the real three-dimensional face data is; on the contrary, the larger the difference between the predicted result of the three-dimensional sample face key point and the key point scanning data of the three-dimensional sample face is, the smaller the expected probability that the predicted result of the three-dimensional sample face key point is judged to be the real three-dimensional face data is. The discrimination error of the discriminator to be trained can be determined according to the consistency between the discrimination result of the discriminator to be trained and the expected probability that the prediction result of the three-dimensional sample face key point is discriminated to be real three-dimensional face data. And then, according to the discrimination errors, adopting a counter-propagation method to iteratively adjust the parameters of the shape generating network to be trained, the texture generating network to be trained and the discriminator to be trained for a plurality of times.

According to the flow 500 of the training method of the shape generating network and the texture generating network, the two-dimensional face image is used as the input of the shape generating network and the texture generating network to learn the generating method of the UV position diagram and the UV texture diagram, so that the randomness of input data is reduced, and the training speed of the shape generating network and the texture generating network can be increased. And the three-dimensional sample face data is used as the real three-dimensional face data to calculate the discrimination error of the discriminator, so that the training speed of the discriminator is accelerated, the training efficiency of generating an countermeasure network is improved, and the method is beneficial to reducing the calculation resources consumed in the training process.

Referring to fig. 6, as an implementation of the above method for generating three-dimensional face data, the present disclosure provides an embodiment of an apparatus for generating three-dimensional face data, where an embodiment of the apparatus corresponds to the embodiment of the above method, and the apparatus may be specifically applied to various electronic devices.

As shown in fig. 6, the apparatus 600 for generating three-dimensional face data of the present embodiment includes a first generating unit 601 and a second generating unit 602. The first generating unit 601 is configured to input random noise data into a shape generating network and a texture generating network respectively, so as to obtain a three-dimensional vertex position map and a three-dimensional texture map of the three-dimensional face model; the second generating unit 603 is configured to generate three-dimensional face data based on the three-dimensional vertex position map and the three-dimensional texture map and preset camera pose parameters; the shape generating network and the texture generating network are obtained based on the training of the generating countermeasure network, the shape generating network and the texture generating network comprise generators in the trained generating countermeasure network, and a discriminator in the generating countermeasure network is used for discriminating whether a face represented by three-dimensional face data obtained through restoration of the three-dimensional vertex position mapping and the three-dimensional texture mapping generated based on the shape generating network and the texture generating network is a real face or not.

In some embodiments, the apparatus 600 further comprises: and the training unit is configured to train to obtain a shape generation network and a texture generation network based on the three-dimensional sample face data.

In some embodiments, the three-dimensional sample face data includes three-dimensional sample face key point scan data, and the three-dimensional sample face prediction result includes three-dimensional sample face key point prediction data; the training unit is configured to use a to-be-trained discriminator to discriminate the three-dimensional sample face prediction result and the three-dimensional sample face data according to the following mode: respectively judging key point scanning data of the three-dimensional sample face and key point prediction data of the three-dimensional sample face by utilizing a to-be-trained discriminator to judge authenticity; determining a discrimination error of a discriminator; and the training unit is configured to adjust parameters of the shape generating network to be trained, the texture generating network to be trained and the discriminant to be trained by back propagation based on the discriminant error.

The units in the above-described device 600 correspond to the steps in the method described with reference to fig. 2 to 5. Thus, the operations, features and technical effects that can be achieved by the method for generating three-dimensional face data described above are equally applicable to the apparatus 600 and the units contained therein, and are not described herein again.

Referring now to fig. 7, a schematic diagram of an electronic device (e.g., the server shown in fig. 1) 700 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 7 is only one example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.

As shown in fig. 7, the electronic device 700 may include a processing means (e.g., a central processor, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data required for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

In general, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, a hard disk; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 shows an electronic device 700 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 7 may represent one device or a plurality of devices as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication device 709, or installed from storage 708, or installed from ROM 702. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 701. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Whereas in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: respectively inputting the random noise data into a shape generation network and a texture generation network to obtain a three-dimensional vertex position map and a three-dimensional texture map of the three-dimensional face model; generating three-dimensional face data based on the three-dimensional vertex position mapping, the three-dimensional texture mapping and preset camera gesture parameters; the shape generating network and the texture generating network are obtained based on the training of the generating countermeasure network, the shape generating network and the texture generating network comprise generators in the trained generating countermeasure network, and a discriminator in the generating countermeasure network is used for discriminating whether a face represented by three-dimensional face data obtained through restoration of the three-dimensional vertex position mapping and the three-dimensional texture mapping generated based on the shape generating network and the texture generating network is a real face or not.

Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments described in the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor includes a first generation unit and a second generation unit. The names of these units do not limit the unit itself in some cases, for example, the first generation unit may also be described as "a unit that inputs random noise data into a shape generation network and a texture generation network, respectively, to obtain a three-dimensional vertex position map and a three-dimensional texture map of the three-dimensional face model".

The foregoing description is only of the preferred embodiments of the present disclosure and description of the principles of the technology being employed. It will be appreciated by those skilled in the art that the scope of the invention referred to in this disclosure is not limited to the specific combination of features described above, but encompasses other embodiments in which features described above or their equivalents may be combined in any way without departing from the spirit of the invention. Such as the above-described features and technical features having similar functions (but not limited to) disclosed in the present application are replaced with each other.

Claims

1. A method of generating three-dimensional face data, comprising:

respectively inputting the random noise data into a shape generation network and a texture generation network to obtain a three-dimensional vertex position map and a three-dimensional texture map of the three-dimensional face model;

generating three-dimensional face data based on the three-dimensional vertex position map, the three-dimensional texture map and preset camera gesture parameters;

the shape generating network and the texture generating network are obtained based on the training of the generating countermeasure network, the shape generating network and the texture generating network comprise a generator in the trained generating countermeasure network, and a discriminator in the generating countermeasure network is used for discriminating whether a face represented by three-dimensional face data obtained through restoration of a three-dimensional vertex position map and a three-dimensional texture map generated based on the shape generating network and the texture generating network is a real face or not;

The shape generating network and the texture generating network are obtained based on three-dimensional sample face data training;

the shape generating network and the texture generating network are obtained based on the three-dimensional sample face data in a training mode, and the training steps comprise: inputting a random sample noise signal into a shape generating network to be trained and a texture generating network to be trained, and extracting a vertex position mapping of three-dimensional sample noise and a texture mapping of the three-dimensional sample noise corresponding to the random sample noise signal; generating a prediction result of the three-dimensional sample face based on the vertex position mapping of the three-dimensional sample noise, the texture mapping of the three-dimensional sample noise and camera gesture parameters corresponding to the three-dimensional sample face data; and judging the prediction result of the three-dimensional sample face and the three-dimensional sample face data by utilizing a to-be-trained discriminator, and iteratively adjusting parameters of the to-be-trained shape generating network, the to-be-trained texture generating network and the to-be-trained discriminator according to the judgment result.

2. The method of claim 1, wherein the three-dimensional face data comprises three-dimensional face keypoint data.

3. The method of claim 1, wherein the three-dimensional sample face data comprises keypoint scan data of a three-dimensional sample face, and the prediction result of the three-dimensional sample face comprises keypoint prediction data of the three-dimensional sample face;

the method for judging the three-dimensional sample face prediction result and the three-dimensional sample face data by utilizing the to-be-trained judging device comprises the following steps:

respectively judging the authenticity of the key point scanning data of the three-dimensional sample face and the key point prediction data of the three-dimensional sample face by using the to-be-trained discriminator;

determining a discrimination error of the discriminator; and

the iterative adjustment of the parameters of the shape generating network to be trained, the texture generating network to be trained and the discriminator to be trained according to the discrimination result comprises:

based on the discrimination errors, parameters of the shape generating network to be trained, the texture generating network to be trained and the discriminator to be trained are adjusted through back propagation.

4. A method of generating three-dimensional face data, comprising:

the three-dimensional sample face data comprises key point scanning data of a three-dimensional sample face and two-dimensional images corresponding to the three-dimensional sample face, the shape generating network and the texture generating network are obtained based on the three-dimensional sample face data in a training mode, and the training step comprises the following steps:

inputting a two-dimensional image corresponding to the three-dimensional sample face into a shape generating network to be trained and a texture generating network to be trained, and extracting a vertex position mapping and a texture mapping corresponding to the three-dimensional sample face;

Generating a key point prediction result of the three-dimensional sample face based on the vertex position mapping and the texture mapping corresponding to the three-dimensional sample face and camera gesture parameters corresponding to the three-dimensional sample face data;

judging the authenticity of the three-dimensional sample face key point prediction result by utilizing a to-be-trained discriminator, determining a discrimination error of the to-be-trained discriminator based on the difference between the three-dimensional sample face key point prediction result and the three-dimensional sample face key point scanning data, and iteratively adjusting the parameters of the to-be-trained shape generation network, the to-be-trained texture generation network and the to-be-trained discriminator based on the discrimination error.

5. An apparatus for generating three-dimensional face data, comprising:

the first generation unit is configured to input random noise data into a shape generation network and a texture generation network respectively to obtain a three-dimensional vertex position map and a three-dimensional texture map of the three-dimensional face model;

the second generation unit is configured to generate three-dimensional face data based on the three-dimensional vertex position mapping, the three-dimensional texture mapping and preset camera gesture parameters;

Wherein the shape generation network and the texture generation network are trained by a training unit based on three-dimensional sample face data, the training unit being configured to train to derive the shape generation network and the texture generation network as follows: inputting a random sample noise signal into a shape generating network to be trained and a texture generating network to be trained, and extracting a vertex position mapping of three-dimensional sample noise and a texture mapping of the three-dimensional sample noise corresponding to the random sample noise signal; generating a prediction result of the three-dimensional sample face based on the vertex position mapping of the three-dimensional sample noise, the texture mapping of the three-dimensional sample noise and camera gesture parameters corresponding to the three-dimensional sample face data; and judging the prediction result of the three-dimensional sample face and the three-dimensional sample face data by utilizing a to-be-trained discriminator, and iteratively adjusting parameters of the to-be-trained shape generating network, the to-be-trained texture generating network and the to-be-trained discriminator according to the judgment result.

6. The apparatus of claim 5, wherein the three-dimensional face data comprises three-dimensional face keypoint data.

7. The apparatus of claim 5, wherein the three-dimensional sample face data comprises keypoint scan data of a three-dimensional sample face, and the prediction result of the three-dimensional sample face comprises keypoint prediction data of the three-dimensional sample face;

the training unit is configured to judge the predicted result of the three-dimensional sample face and the three-dimensional sample face data by using a to-be-trained discriminator in the following manner:

determining a discrimination error of the discriminator; and

the training unit is configured to:

8. An apparatus for generating three-dimensional face data, comprising:

the shape generating network and the texture generating network are obtained by training a training unit based on three-dimensional sample face data;

the three-dimensional sample face data comprise key point scanning data of a three-dimensional sample face and two-dimensional images corresponding to the three-dimensional sample face, and the training unit is configured to train to obtain the shape generating network and the texture generating network according to the following modes:

9. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

when executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-4.

10. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-4.