CN114119697A - PIC-based 3D face model face texture diversity completion method - Google Patents
- Publication number
- CN114119697A CN114119697A CN202111403229.6A CN202111403229A CN114119697A CN 114119697 A CN114119697 A CN 114119697A CN 202111403229 A CN202111403229 A CN 202111403229A CN 114119697 A CN114119697 A CN 114119697A
- Authority
- CN
- China
- Prior art keywords
- face
- texture
- model
- pic
- incomplete
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/529—Depth or shape recovery from texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Graphics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Geometry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
- Image Processing (AREA)
Abstract
A PIC-based 3D face model face texture diversity completion method comprises: constructing a high-definition facial texture data set through a texture completion network and a face texture data set; estimating the parameter distribution of the latent space of the masked region of the facial texture with a conditional variational autoencoder (CVAE); decomposing the latent information in the parameter distribution after sampling; training an improved PIC network model with distribution regularization loss and appearance matching loss as the generation losses; and finally generating a complete facial texture with the trained improved PIC network model to replace the incomplete texture of the 3D face model. The method can generate a complete facial texture for a 3D face model reconstructed from a single image, using texture recovery together with face images taken from different angles.
Description
Technical Field
The invention relates to a technology in the field of image processing, in particular to a facial texture diversity completion method for 3D face models based on Pluralistic Image Completion (PIC).
Background
Existing deep-learning-based 3D face reconstruction algorithms can quickly generate a 3D model from an input image, but many problems remain in training. The clarity and completeness of the reconstructed facial texture are critical. Single-image 3D face reconstruction has an obvious limitation: one image cannot show the complete facial texture, especially for profile (side-face) images, so recovering as much of the missing side texture as possible is very important.
Face parameterized models designed after the 3D morphable model also contain texture parameters, so that given a picture of a person, the facial texture closest to that person can be obtained by fitting the texture parameters. However, this approach places high demands on the face database used to build the parameterized model: a model built from a Caucasian face database fits Asian facial textures relatively poorly, and textures fitted to images with non-natural skin tones, for example antique photographs, are seriously distorted. UV-GAN [Deng J, Cheng S, Xue N, et al. UV-GAN: Adversarial Facial UV Map Completion for Pose-Invariant Face Recognition, 2017] exploits the power of generative adversarial networks to cast facial texture completion as image-to-image translation, using the real pixels in the image to construct the facial texture closest to the given image. LAB [Wu W, Qian C, Yang S, et al. Look at Boundary: A Boundary-Aware Face Alignment Algorithm, CVPR 2018] designed an Attention-Net-GAN that constructs a facial texture map through PRNet-style regression [Zhu X, Lei Z, Liu X, et al. Face Alignment Across Large Poses: A 3D Solution, CVPR 2016] and directly yields a complete facial texture, saving the time UV-GAN spends acquiring the incomplete facial texture. UV-GAN and LAB mainly address occlusion caused by the head pose of the person in the input image and do not consider the case where the face is occluded by other objects. Na et al. [Na I S, Tran C, Nguyen D, et al. Facial UV map completion for pose-invariant face recognition: a novel adversarial approach based on coupled attention residual UNets. Human-centric Computing and Information Sciences, 2020, 10(1):1-17] designed a generative adversarial network specifically for this problem: it takes a person image with facial occlusion as input and outputs a complete face image. Part of its training set adds random masks, such as sunglasses over the eyes, glasses and mouth masks, placed according to the facial key points of the image.
Disclosure of Invention
The invention provides a PIC-based 3D face model face texture diversity completion method, which addresses the following defects of the prior art: the clarity and completeness of the reconstructed facial texture are low; single-image 3D face reconstruction cannot display the complete facial texture; interpolation over the texture image generally cannot recover facial texture detail; and when the single view is a profile image, the occluded facial details cannot be recovered by interpolation at all.
The invention is realized by the following technical scheme:
the invention relates to a 3D face model face texture diversity completion method based on PIC, which constructs a high-definition face texture data set through a texture completion network and a face texture data set; and then estimating parameter distribution of a latent space of a mask area of the facial texture by adopting a Convolution Variation Automatic Encoder (CVAE), decomposing implicit information in the parameter distribution after sampling, training an improved PIC network model by using distribution regularization loss and appearance matching loss as generation loss, and finally generating complete facial texture by adopting the trained improved PIC network model to replace the 3D face model with the incomplete texture.
The invention also relates to a system for realizing the method, comprising: an incomplete texture generation unit, a residual encoder, a residual decoder and a short- and long-term attention unit, wherein: the incomplete texture generation unit generates an incomplete texture image from the 3D face reconstruction model and a public face database; the residual encoder encodes the incomplete texture image to obtain latent information, which is then sampled; the residual decoder decodes the latent information to obtain a complete texture image; and the short- and long-term attention unit combines the sampled latent information with the encoded information of the incomplete texture image, finally yielding diversified facial texture completions.
Technical effects
The invention creates an algorithm for constructing a facial texture data set and improves the PIC completion framework, applying it to facial texture completion. A large number of high-definition profile images are generated by modifying the semantic vectors of the SeePrettyFace StyleGAN model; incomplete facial textures are generated by a 3D face reconstruction framework such as 3DDFA/PRNet; and the high-definition facial texture data set is obtained by symmetric completion. The invention improves PIC and applies it to facial texture completion by eliminating the reconstruction path of PIC and changing the local appearance matching loss into a global appearance matching loss.
Drawings
FIG. 1 is a flow chart of the present invention;
fig. 2 is a schematic diagram of a network architecture according to an embodiment.
Detailed Description
As shown in fig. 1, the present embodiment relates to a method for completing facial texture diversity of a 3D face model based on PIC, which includes:
step one, constructing a high-definition facial texture data set, which specifically comprises the following steps:
a1. The high-definition facial texture data set uses the StyleGAN-based high-definition face generation pretrained model published by SeePrettyFace, which establishes a mapping from an (18,512)-dimensional vector to a (1024,1024,3)-dimensional image. Among the randomly generated pictures of this model, a profile photo of a person is found; the corresponding semantic values in the slice (0:18, 200:400) of the (18,512)-dimensional vector are fixed, and the remaining positions are randomized, so that a large number of high-definition profile photos can be acquired rapidly.
The high-definition face generation pretrained model represents each face with an (18,512)-dimensional vector; slightly changing the vector generates a slightly different new face.
The mapping relation is: the high-definition face generation pretrained model maps (18,512)-dimensional vectors to (1024,1024,3)-dimensional images, and the vectors carry semantic attributes such as age, gender and expression.
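The slice-fixing procedure of step a1 can be sketched as follows. This is a minimal, dependency-free illustration with hypothetical names: `random_latent` stands in for drawing a fresh StyleGAN-style (18,512) latent, and the actual generator that maps the latent to a (1024,1024,3) image is not shown.

```python
import random

random.seed(0)

def random_latent():
    # A fresh (18, 512) latent, standing in for a StyleGAN w+ code.
    return [[random.gauss(0.0, 1.0) for _ in range(512)] for _ in range(18)]

def make_profile_latent(profile_w):
    # Keep the semantic slice [0:18, 200:400] that encodes the profile
    # (side-face) pose fixed, and re-randomize every other position, so
    # each new sample keeps the pose but varies identity and appearance.
    w = random_latent()
    for row_new, row_ref in zip(w, profile_w):
        row_new[200:400] = row_ref[200:400]
    return w

profile_w = random_latent()       # latent known to produce a profile photo
new_w = make_profile_latent(profile_w)
```

Feeding each `new_w` to the generator would yield a new high-definition profile photo, which is how the data set is populated quickly.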
a2. According to the high-definition profile face data set obtained in step a1, generate the corresponding high-definition incomplete textures with a pretrained MobileNet-based network model, and obtain a high-definition complete texture data set through mirror-symmetry processing.
The MobileNet-based network model is implemented with, but not limited to, 3DDFA [Zhu X, Lei Z, Liu X, et al. Face Alignment Across Large Poses: A 3D Solution, CVPR 2016].
a3. Add fake illumination and fake shadows to each complete texture in the high-definition complete texture data set, finally obtaining data pairs of illuminated incomplete facial textures and illuminated complete facial textures.
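The mirror-symmetry processing of step a2 rests on the rough left-right symmetry of a face in UV space: a missing texel can be filled from its horizontally mirrored counterpart. A minimal sketch, with textures represented as nested lists for simplicity (a real implementation would operate on image arrays):

```python
def symmetric_complete(texture, missing):
    """Fill holes in a texture from their mirrored counterparts.

    texture: H x W grid of pixel values; missing: H x W booleans,
    True marking texels absent from the rendered (incomplete) texture.
    """
    h, w = len(texture), len(texture[0])
    out = [row[:] for row in texture]
    for y in range(h):
        for x in range(w):
            if missing[y][x]:
                out[y][x] = texture[y][w - 1 - x]  # mirrored texel
    return out

# Right half observed (value 1.0), left half missing.
tex = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
miss = [[True, True, False, False] for _ in range(4)]
filled = symmetric_complete(tex, miss)
```

In the toy example the missing left half is filled entirely from the observed right half; texels missing on both sides would need a different fallback, which this sketch does not handle.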
Step two, acquiring the latent vector zc through the encoding network E, which specifically includes:
b1. Using a random texture mask, obtain the incomplete texture Im and the missing part Ic of the corresponding complete texture Ig.
The encoding network E is implemented with the encoding network of PIC [Zheng C, Cham T J, Cai J. Pluralistic Image Completion, CVPR 2019].
The random texture mask is: a random stripe/block mask, or a face mask specific to face images, i.e., a front/left/right face texture mask.
b2. Obtain the parameter distribution of the latent space with the conditional variational autoencoder (CVAE), then sample the latent vector zc from that distribution; zc contains the information of the missing region. Specifically, compute the variational lower bound of the conditional log-likelihood log p(Ic|Im) of a training example: log p(Ic|Im) ≥ -KL(qψ(zc|Ic) || pφ(zc|Im)) + E_{qψ(zc|Ic)}[log pθ(Ic|zc, Im)], wherein: Ig is the original image, Im the observable part, and Ic the missing part. The KL-divergence term regularizes the learned importance sampling function qψ(·|Ic) toward the latent prior pφ(zc|Im).
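For diagonal Gaussian distributions, the KL term of this bound has a closed form. A plain-Python sketch (the function name and list-based interface are illustrative, not from the patent):

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians,
    summed over latent dimensions. In training this regularizes the
    importance function q(z|Ic) toward the conditional prior p(z|Im)."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq
                     + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp)
                     - 1.0)
    return kl

# KL of a distribution with itself is exactly zero.
same = kl_diag_gaussians([0.5, -0.2], [0.0, 0.1], [0.5, -0.2], [0.0, 0.1])
```

The zero self-divergence is a quick sanity check that the closed form is implemented consistently before wiring it into a training loop.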
Step three: the decoder obtains the latent information zc encoded by the encoding network E and acquires the information of the missing region.
The missing region refers to: the missing image part Ic, which during training is used to infer the importance function qψ(·|Ic) = Nψ(·). The sampled latent vector zc therefore contains information about the missing region.
Step four: obtain different generation results through the generation network G and the latent information zc obtained in step two. Specifically: when sampling from the importance function qψ(·|Ic), information about the missing region is available, and the likelihood pθ(Ic|zc, Im) emphasizes the reconstruction of Ic. In contrast, when sampling from the learned conditional prior pφ(·|Im), which does not involve Ic, the likelihood model becomes independent of the original instance's missing part, which can encourage creative generation.
The generation network G is implemented with the generation network of PIC [Zheng C, Cham T J, Cai J. Pluralistic Image Completion, CVPR 2019].
Step five: update the encoding network parameters and the generation network parameters by back-propagation with the minimized total loss until the loss converges, which includes: minimizing the adversarial loss L_ad = E[log D(Ig)] + E[log(1 - D(Igen))], where Igen is the generated image and Ig is the real image, with the discriminator parameters θD updated by back-propagation; minimizing the distribution regularization loss L_KL = KL(qψ(zc|Ic) || pφ(zc|Im)), where the learned conditional prior pφ(·|Im) is also Gaussian and qψ(·|Ic) is regularized toward it; and minimizing the appearance matching loss L_app = ||Igen - Ig||_1, where Igen is the generated image and Ig is the real image.
As shown in fig. 2, the system for implementing the method of this embodiment comprises: an incomplete texture generation unit, a residual encoder, a residual decoder and a short- and long-term attention unit, wherein: the incomplete texture generation unit generates an incomplete texture image from the 3D face reconstruction model and a public face database; the residual encoder encodes the incomplete texture image to obtain latent information, which is then sampled; the residual decoder decodes the latent information to obtain a complete texture image; and the short- and long-term attention unit combines the sampled latent information with the encoded information of the incomplete texture image, finally yielding diversified facial texture completions.
The incomplete texture generation unit comprises: a 3D face model generation unit, an interpolated facial texture generation unit and an incomplete facial texture generation unit, wherein: the 3D face model generation unit constructs a 3D face model from the input face image; the interpolated facial texture generation unit performs image interpolation using the vertex information of the 3D face model and the face image to obtain an interpolated facial texture; and the incomplete facial texture generation unit extracts the facial texture of the visible part according to the depth information of the nearest surface layer, obtaining the incomplete facial texture.
In practical experiments, on a Tesla P40 GPU under the Python PyTorch framework, the model was trained from scratch with Adam optimization at a fixed learning rate of λ = 10^-4, with β1 = 0 and β2 = 0.999. The final loss weights were α_KL = α_app = 20 and α_ad = 1. With these parameters, training the model with random irregular and center-hole masks takes approximately 5 weeks, and the average inference time is 59 ms.
Compared with the prior art, the method can construct the facial texture data set at essentially zero cost; by improving PIC, eliminating its reconstruction path, and changing the local appearance matching loss into a global appearance matching loss, it makes the completion more thorough and the completed textures richer and clearer. It is also the first face texture diversity completion method for 3D face models.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (9)
1. A PIC-based 3D face model face texture diversity completion method, characterized in that a high-definition facial texture data set is constructed through a texture completion network and a face texture data set; the parameter distribution of the latent space of the masked region of the facial texture is then estimated with a conditional variational autoencoder, the latent information in the parameter distribution is decomposed after sampling, an improved PIC network model is trained with distribution regularization loss and appearance matching loss as the generation losses, and finally a complete facial texture is generated with the trained improved PIC network model to replace the incomplete texture of the 3D face model.
2. The PIC-based 3D face model face texture diversity completion method of claim 1, wherein constructing the high-definition facial texture data set specifically comprises:
a1. the high-definition facial texture data set uses a StyleGAN-based high-definition face generation pretrained model, which establishes a mapping from an (18,512)-dimensional vector to a (1024,1024,3)-dimensional image; a profile photo of a person is then found among the randomly generated pictures of the model, the corresponding semantic values in the slice (0:18, 200:400) of the (18,512)-dimensional vector are fixed, and the remaining positions are randomized, thereby quickly obtaining a large number of high-definition profile photos;
a2. according to the high-definition profile face data set obtained in step a1, the corresponding high-definition incomplete textures are generated with a pretrained MobileNet-based network model, and a high-definition complete texture data set is obtained through mirror-symmetry processing;
a3. fake illumination and fake shadows are added to each complete texture in the high-definition complete texture data set, finally obtaining data pairs of illuminated incomplete facial textures and illuminated complete facial textures.
3. The PIC-based 3D face model face texture diversity completion method of claim 2, wherein the high-definition face generation pretrained model represents each face with an (18,512)-dimensional vector, and slightly changing the vector generates a slightly different new face.
4. The PIC-based 3D face model face texture diversity completion method of claim 1, wherein, for the parameter distribution of the latent space of the facial texture mask region, a random texture mask is used to obtain the incomplete texture Im and the missing part Ic of the corresponding complete texture Ig; the conditional variational autoencoder CVAE is then used to obtain the parameter distribution of the latent space, and the latent vector zc, which contains the information of the missing region, is sampled from that distribution; the decoder then obtains the latent information zc encoded by the encoding network E and acquires the information of the missing region.
5. The PIC-based 3D face model face texture diversity completion method of claim 4, wherein the random texture mask is: a random stripe/block mask, or a face mask specific to face images, i.e., a front/left/right face texture mask.
6. The PIC-based 3D face model face texture diversity completion method of claim 1 or 4, wherein the sampling is: computing the variational lower bound of the conditional log-likelihood log p(Ic|Im) of a training example: log p(Ic|Im) ≥ -KL(qψ(zc|Ic) || pφ(zc|Im)) + E_{qψ(zc|Ic)}[log pθ(Ic|zc, Im)], wherein: Ig is the original image, Im the observable part, and Ic the missing part; the KL-divergence term regularizes the learned importance sampling function qψ(·|Ic) toward the latent prior pφ(zc|Im).
7. The PIC-based 3D face model face texture diversity completion method of claim 1, wherein training the improved PIC network model comprises: obtaining different generation results through the generation network G and the latent information zc obtained in step two, minimizing the adversarial loss, and updating the encoding network parameters and the generation network parameters by back-propagation until the loss converges.
8. The PIC-based 3D face model face texture diversity completion method of claim 7, wherein the minimized losses are: the adversarial loss L_ad = E[log D(Ig)] + E[log(1 - D(Igen))], where Igen is the generated image and Ig the real image, with the discriminator parameters θD updated by back-propagation; the distribution regularization loss L_KL = KL(qψ(zc|Ic) || pφ(zc|Im)), where the learned conditional prior pφ(·|Im) is also Gaussian and qψ(·|Ic) is regularized toward it; and the appearance matching loss L_app = ||Igen - Ig||_1, where Igen is the generated image and Ig is the real image.
9. A system for implementing the PIC-based 3D face model face texture diversity completion method of any one of claims 1-8, comprising: an incomplete texture generation unit, a residual encoder, a residual decoder and a short- and long-term attention unit, wherein: the incomplete texture generation unit generates an incomplete texture image from the 3D face reconstruction model and a public face database; the residual encoder encodes the incomplete texture image to obtain latent information, which is then sampled; the residual decoder decodes the latent information to obtain a complete texture image; and the short- and long-term attention unit combines the sampled latent information with the encoded information of the incomplete texture image, finally yielding diversified facial texture completions;
the incomplete texture generation unit comprises: a 3D face model generation unit, an interpolated facial texture generation unit and an incomplete facial texture generation unit, wherein: the 3D face model generation unit constructs a 3D face model from the input face image; the interpolated facial texture generation unit performs image interpolation using the vertex information of the 3D face model and the face image to obtain an interpolated facial texture; and the incomplete facial texture generation unit extracts the facial texture of the visible part according to the depth information of the nearest surface layer, obtaining the incomplete facial texture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111403229.6A CN114119697A (en) | 2021-11-24 | 2021-11-24 | PIC-based 3D face model face texture diversity completion method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114119697A true CN114119697A (en) | 2022-03-01 |
Family
ID=80371787
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111403229.6A Pending CN114119697A (en) | 2021-11-24 | 2021-11-24 | PIC-based 3D face model face texture diversity completion method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114119697A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116957991A (en) * | 2023-09-19 | 2023-10-27 | 北京渲光科技有限公司 | Three-dimensional model complement method and three-dimensional model complement model generation method |
CN116957991B (en) * | 2023-09-19 | 2023-12-15 | 北京渲光科技有限公司 | Three-dimensional model completion method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108520503B (en) | Face defect image restoration method based on self-encoder and generation countermeasure network | |
Ning et al. | Multi‐view frontal face image generation: a survey | |
Pumarola et al. | Ganimation: Anatomically-aware facial animation from a single image | |
Tang et al. | Real-time neural radiance talking portrait synthesis via audio-spatial decomposition | |
CN110738605B (en) | Image denoising method, system, equipment and medium based on transfer learning | |
CN113327278B (en) | Three-dimensional face reconstruction method, device, equipment and storage medium | |
CN111932444A (en) | Face attribute editing method based on generation countermeasure network and information processing terminal | |
CN113838176A (en) | Model training method, three-dimensional face image generation method and equipment | |
US11727628B2 (en) | Neural opacity point cloud | |
CN115914505B (en) | Video generation method and system based on voice-driven digital human model | |
Aakerberg et al. | Semantic segmentation guided real-world super-resolution | |
Roessle et al. | Ganerf: Leveraging discriminators to optimize neural radiance fields | |
Zhang et al. | Morphable model space based face super-resolution reconstruction and recognition | |
CN117422829A (en) | Face image synthesis optimization method based on nerve radiation field | |
CN114119697A (en) | PIC-based 3D face model face texture diversity completion method | |
Yang et al. | BareSkinNet: De‐makeup and De‐lighting via 3D Face Reconstruction | |
CN116703750A (en) | Image defogging method and system based on edge attention and multi-order differential loss | |
CN116703719A (en) | Face super-resolution reconstruction device and method based on face 3D priori information | |
Cao et al. | Guided cascaded super-resolution network for face image | |
Tal et al. | Nldnet++: A physics based single image dehazing network | |
Mir et al. | DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers | |
Lin et al. | FAEC‐GAN: An unsupervised face‐to‐anime translation based on edge enhancement and coordinate attention | |
Zhang et al. | MA-NeRF: Motion-Assisted Neural Radiance Fields for Face Synthesis from Sparse Images | |
Mohaghegh et al. | Robust monocular 3D face reconstruction under challenging viewing conditions | |
Guo et al. | Depth-Guided Robust Point Cloud Fusion NeRF for Sparse Input Views |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||