CN108171770B - Facial expression editing method based on generative adversarial network - Google Patents
- Publication number
- CN108171770B CN108171770B CN201810048098.6A CN201810048098A CN108171770B CN 108171770 B CN108171770 B CN 108171770B CN 201810048098 A CN201810048098 A CN 201810048098A CN 108171770 B CN108171770 B CN 108171770B
- Authority
- CN
- China
- Prior art keywords
- face
- picture
- generator
- expression
- discriminator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/001—Texturing; Colouring; Generation of texture or colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a facial expression editing method based on a generative adversarial network, comprising the following steps. In the data preparation stage, face images are manually labeled and cropped. In the model design stage, the model is formed from a generator and a discriminator. In the model training stage, real labeled face pictures and pictures produced by the generator are fed to the discriminator, which is trained to distinguish the distributions of real and generated samples while learning the distributions of facial expression and facial identity information; the face picture to be edited and an expression control vector are fed to the generator, which outputs a face picture controlled by that vector; the generator is then trained against the trained discriminator. These steps are repeated until the model is built, and the constructed model is then tested on input images. The invention enables the generator to produce face pictures closer to the real face picture distribution, preserves facial identity information better, and edits facial expression more effectively.
Description
Technical Field
The invention relates to an editing method, in particular to a facial expression editing method based on a generative adversarial network, and belongs to the technical field of computer vision.
Background
Facial expression editing requires controlling the expression of a face in a photograph while preserving the person's identity. The technique has wide application in facial animation, social software, and data-set augmentation for face recognition. Current facial expression editing methods are based on three-dimensional deformable face models. A representative method is the facial expression editing method based on a single camera and motion-capture data (patent number 201310451508.9), whose main technical means are: generate a three-dimensional face model of a user from a picture of the user, decouple the model to separate identity from expression, then synthesize a new three-dimensional face model by controlling the expression components, thereby realizing facial expression editing. The drawback of this method is that it is only suited to editing the expression of a three-dimensional face model, not of a two-dimensional face picture: when a facial expression changes, not only the face shape but also the surface texture of the face changes, and a method based on a three-dimensional face model has difficulty modifying facial texture.
Disclosure of Invention
To remedy these shortcomings, the invention provides a facial expression editing method based on a generative adversarial network.
To solve the technical problem, the invention adopts the following technical scheme. A facial expression editing method based on a generative adversarial network comprises the following overall steps:
step S1, data preparation phase
a. Manually label each face in an RGB image set with facial identity information and facial expression information; the annotation of each picture is represented as [i, j], where i indicates that the picture belongs to the i-th person and j indicates that the picture shows the j-th expression;
b. Crop the annotated faces out of the pictures with a face detector and a facial landmark detector, and align the faces;
step S2, model design phase
a. The model consists of two parts, a generator G and a discriminator D. The generator G takes a face picture to be edited and an expression control vector as input and produces a face picture controlled by that vector. The discriminator D takes pictures produced by the generator G and real labeled face pictures, distinguishes the distributions of real and generated samples, and learns the distributions of facial expression and facial identity information;
b. The generator G and the discriminator D together form a facial expression editing framework based on a generative adversarial network, on which adversarial training is carried out;
step S3, model training phase
a. Feed real labeled face pictures and pictures produced by the generator G to the discriminator D, and train D to distinguish the distributions of real and generated samples and to learn the distributions of facial expression and facial identity information; a picture produced by the generator G is labeled fake [0], and a real labeled face picture is labeled real [1, i, j];
b. Feed the face picture img0[i, j] to be edited and the expression control vector y to the generator G, which outputs a face picture controlled by the vector. Then feed this generated picture, labeled fake [0], to the discriminator D so that D is pushed to judge it real [1] with facial identity i and facial expression j, which drives the generator G to produce face pictures that are more realistic, preserve identity better, and follow the expression control more effectively;
c. Repeat step a three times and then step b once, so that the discriminator D is trained more thoroughly; the better trained D is, the more it benefits the training of the generator G;
d. Save the model parameters once per epoch, edit facial expressions on the test set, and inspect the pictures output by the generator G; stop training when G produces face pictures that meet the requirements, and keep the model parameters that give the best visual effect;
step S4, model testing phase
a. The input is an image I containing a human face;
b. Feed the image I to a face detector to obtain the face position, crop I at that position to obtain the face picture img0, and align the face in img0;
c. Feed the aligned face picture and the expression control vector to the generator G to obtain the expression-edited face picture img1.
The invention uses a distinctive fully convolutional network as the generator and makes the discriminator perform real/fake classification, facial expression classification, and facial identity classification simultaneously, ensuring that the generator produces face pictures closer to the real face picture distribution, preserves facial identity information better, and edits facial expression more effectively.
Drawings
FIG. 1 is a diagram of a model design architecture of the present invention.
Fig. 2 is a schematic overall flowchart of step S4.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
A facial expression editing method based on a generative adversarial network comprises the following overall steps:
step S1, data preparation phase
a. Manually label each face in an RGB image set with facial identity information and facial expression information; the annotation of each picture is represented as [i, j], where i indicates that the picture belongs to the i-th person (0 ≤ i < N) and j indicates that the picture shows the j-th expression (0 ≤ j < M); the whole picture set contains N persons and M expressions;
b. Crop the annotated faces out of the pictures with a face detector and a facial landmark detector, and align the faces;
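The labeling scheme above can be sketched in a few lines. This is an illustrative example, not code from the patent; the dataset sizes N and M and the one-hot encoding of the expression index are assumptions.

```python
# Illustrative sketch of the [i, j] annotation described above.
# N people and M expressions are assumed sizes, not values from the patent.
N, M = 10, 7

def make_label(i: int, j: int) -> list:
    """Return the [i, j] annotation for one face picture."""
    assert 0 <= i < N and 0 <= j < M, "indices must lie in the stated ranges"
    return [i, j]

def one_hot(index: int, length: int) -> list:
    """One-hot encoding, a common way to feed an index to a network."""
    v = [0.0] * length
    v[index] = 1.0
    return v

label = make_label(3, 5)   # picture of person 3 showing expression 5
y = one_hot(label[1], M)   # a possible expression control vector y
```

A real pipeline would attach such a label to every cropped, aligned face picture before training.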
step S2, model design phase
a. The model consists of two parts, a generator G and a discriminator D. The generator G takes a face picture to be edited and an expression control vector as input and produces a face picture controlled by that vector. The discriminator D takes pictures produced by the generator G and real labeled face pictures, distinguishes the distributions of real and generated samples, and learns the distributions of facial expression and facial identity information; the overall framework of the model is shown in FIG. 1;
b. The generator G and the discriminator D together form a facial expression editing framework based on a generative adversarial network, on which adversarial training is carried out; the network structures of the generator G and the discriminator D are shown in Tables 1 and 2, respectively.
Table 1. Network structure of the generator G
Generator G |
Aligned color face picture, 128 × 128 × 3 |
3 × 3 convolution, batch normalization, exponential linear unit (ELU) activation; 3 × 3 convolution, batch normalization, ELU activation |
2 × 2 max pooling |
3 × 3 convolution, batch normalization, ELU activation; 3 × 3 convolution, batch normalization, ELU activation |
2 × 2 max pooling |
3 × 3 convolution, batch normalization, ELU activation; 3 × 3 convolution, batch normalization, ELU activation |
2 × 2 nearest-neighbor upsampling |
3 × 3 convolution, batch normalization, ELU activation; 3 × 3 convolution, batch normalization, ELU activation |
2 × 2 nearest-neighbor upsampling |
3 × 3 convolution, batch normalization, ELU activation; 3 × 3 convolution, batch normalization, ELU activation |
3 × 3 convolution, hyperbolic tangent (tanh) activation |
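As a sanity check on Table 1, the spatial dimensions can be walked through the encoder-decoder: two 2 × 2 poolings halve 128 down to 32, and two 2 × 2 upsamplings restore 128. The sketch below assumes the 3 × 3 convolutions are zero-padded so they preserve spatial size, which the table does not state.

```python
# Shape walk-through of the generator in Table 1 (padding is an assumption).
def conv3x3(h, w):      # padded 3x3 conv: spatial size unchanged
    return h, w

def pool2x2(h, w):      # 2x2 max pooling halves each dimension
    return h // 2, w // 2

def upsample2x2(h, w):  # 2x2 nearest-neighbor upsampling doubles each dimension
    return h * 2, w * 2

shape = (128, 128)                   # aligned input picture, 128 x 128 x 3
for stage in (conv3x3, pool2x2,      # encoder: two conv+pool stages
              conv3x3, pool2x2,
              conv3x3,               # bottleneck at 32 x 32
              upsample2x2, conv3x3,  # decoder: two upsample+conv stages
              upsample2x2, conv3x3,
              conv3x3):              # final conv + tanh output
    shape = stage(*shape)

print(shape)  # (128, 128): the output matches the input resolution
```

This is consistent with the generator being fully convolutional: the output picture has the same 128 × 128 resolution as the input.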
Table 2. Network structure of the discriminator D
Discriminator D |
Aligned color face picture, 128 × 128 × 3 |
3 × 3 convolution, batch normalization, leaky rectified linear unit (LeakyReLU) activation (slope 0.02) |
3 × 3 convolution (stride 2), batch normalization, LeakyReLU activation (slope 0.02) |
3 × 3 convolution (stride 2), batch normalization, LeakyReLU activation (slope 0.02) |
3 × 3 convolution (stride 2), batch normalization, LeakyReLU activation (slope 0.02) |
3 × 3 convolution (stride 2), batch normalization, LeakyReLU activation (slope 0.02) |
Global average pooling |
Fully connected layer |
Real/fake classification; expression classification; identity classification |
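Similarly for Table 2, the four stride-2 convolutions shrink the 128 × 128 input to 8 × 8 before global average pooling. The sketch below assumes padding 1 for the strided 3 × 3 convolutions (not stated in the table), and the head sizes N = 10 and M = 7 are placeholder dataset sizes.

```python
# Spatial-size walk-through of the discriminator in Table 2.
def strided_conv(size):
    """3x3 convolution with stride 2, padding 1 (padding is an assumption)."""
    return (size + 2 * 1 - 3) // 2 + 1

sizes = [128]                 # aligned input picture
for _ in range(4):            # the four stride-2 convolutions
    sizes.append(strided_conv(sizes[-1]))
# sizes == [128, 64, 32, 16, 8]; global average pooling then reduces the
# 8x8 maps to one value per channel before the fully connected layer.

N, M = 10, 7                  # assumed numbers of identities and expressions
heads = {"real_fake": 1,      # real/fake classification
         "expression": M,     # expression classification
         "identity": N}       # identity classification
```

The three output heads correspond to the last row of Table 2: one fully connected layer feeding real/fake, expression, and identity classifiers.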
Step S3, model training phase
a. Feed real labeled face pictures and pictures produced by the generator G to the discriminator D, and train D to distinguish the distributions of real and generated samples and to learn the distributions of facial expression and facial identity information. A picture produced by the generator G is labeled fake [0], and a real labeled face picture is labeled real [1, i, j]; the identity label i and the expression label j both come from the real labeled face pictures;
b. Feed the face picture img0[i, j] to be edited and the expression control vector y to the generator G, which outputs a face picture controlled by the vector. Feed this generated picture, labeled fake [0], to the discriminator D so that D is pushed to judge it real [1] with facial identity i and facial expression j. In this way the generator G learns to produce more realistic face pictures, to preserve identity information well, and to follow the expression control more effectively;
c. Repeat step a three times and then step b once, so that the discriminator D is trained more thoroughly; the better trained D is, the more it benefits the training of the generator G;
d. Save the model parameters once per epoch (1 epoch equals one pass of training over all samples in the picture set) and edit facial expressions on the test set, inspecting the pictures output by the generator G. Stop training when G produces face pictures that meet the requirements, and keep the model parameters that give the best visual effect.
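The 3:1 alternation of steps a and b can be sketched as a loop with stand-in update functions; the real gradient steps on D and G are omitted, so only the schedule itself is being shown.

```python
# Sketch of the 3:1 training schedule: three discriminator updates for
# every generator update. The update functions are stand-ins.
d_steps, g_steps = 0, 0

def train_discriminator():
    """Stand-in for one D step on real [1, i, j] and fake [0] batches."""
    global d_steps
    d_steps += 1

def train_generator():
    """Stand-in for one G step that tries to make D output [1, i, j]."""
    global g_steps
    g_steps += 1

batches_per_epoch = 12            # assumed, for illustration
for batch in range(batches_per_epoch):
    train_discriminator()
    if (batch + 1) % 3 == 0:      # one G step after every three D steps
        train_generator()

print(d_steps, g_steps)           # 12 discriminator steps, 4 generator steps
```

In a real run, model parameters would also be checkpointed once per epoch, as step d describes.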
Step S4, model testing phase
a. The input is an image I containing a human face;
b. Feed the image I to a face detector to obtain the face position, crop I at that position to obtain the face picture img0, and align the face in img0;
c. Feed the aligned face picture and the expression control vector to the generator G to obtain the expression-edited face picture img1; the overall flow of this step is shown in FIG. 2.
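The test pipeline of step S4 can be sketched end to end with stubbed components. `detect_face`, `align`, and `generator_G` below are hypothetical stand-ins for a real detector, landmark-based aligner, and the trained generator; only the numpy cropping is concrete.

```python
# Sketch of S4: detect -> crop -> align -> generate (components stubbed).
import numpy as np

def detect_face(image):
    """Stand-in detector: returns a fixed (top, left, height, width) box."""
    return 16, 16, 128, 128

def crop(image, box):
    t, l, h, w = box
    return image[t:t + h, l:l + w]          # numpy slicing does the crop

def align(face):
    """Stand-in for landmark-based alignment; returns the face unchanged."""
    return face

def generator_G(face, y):
    """Stand-in for the trained generator; shape-preserving."""
    return face.copy()

I = np.zeros((160, 160, 3), dtype=np.float32)  # input image I with a face
img0 = align(crop(I, detect_face(I)))          # aligned 128 x 128 x 3 crop
y = np.eye(7)[2]                               # expression control vector
img1 = generator_G(img0, y)                    # expression-edited picture
```

Swapping in a real face detector and the trained G turns this skeleton into the actual test procedure.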
Compared with the prior art, the invention has the following key points and advantages:
First, regarding the generator G network: 1) the activation function is the exponential linear unit, and the upsampling layers use 2 × 2 nearest-neighbor upsampling; 2) the expression control information is added at the input end: it passes through a fully connected layer and a reshaping operation and is concatenated with the input face picture to form a four-channel tensor.
Beneficial effects: the exponential linear unit increases the nonlinearity of the network, giving it stronger nonlinear fitting capacity; compared with a deconvolution operation, 2 × 2 nearest-neighbor upsampling produces better-looking pictures; and adding the expression control information at the input end lets the face picture information and the expression control information interact earlier, yielding better control.
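The input-side conditioning in point 2) can be sketched as follows. The layer weights are random and the expression count M = 7 is an assumption, so only the tensor shapes are being illustrated, not the trained mapping.

```python
# Sketch of the four-channel conditioning: expression vector -> fully
# connected layer -> reshape to a 128 x 128 plane -> concatenate with RGB.
import numpy as np

rng = np.random.default_rng(0)
M = 7                                    # assumed number of expressions
W = rng.standard_normal((M, 128 * 128))  # fully connected layer weights

y = np.eye(M)[4]                         # expression control vector (one-hot)
plane = (y @ W).reshape(128, 128, 1)     # reshape to a single channel
img = rng.standard_normal((128, 128, 3)) # aligned face picture

x = np.concatenate([img, plane], axis=-1)  # four-channel generator input
print(x.shape)  # (128, 128, 4)
```

The generator's first convolution then sees the picture and the control signal jointly, which is the "earlier interaction" the text credits for better control.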
Second, regarding the discriminator D network: the discriminator D not only distinguishes the distributions of real and generated samples but also learns the distributions of facial expression and facial identity information.
Beneficial effects: the discriminator D performs real/fake classification, facial expression classification, and facial identity classification simultaneously, ensuring that the generator G produces face pictures closer to the real face picture distribution, preserves facial identity information better, and edits facial expression more effectively.
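One plausible reading of this three-way classification is a summed loss over the three heads. The sketch below uses random stand-in logits and assumes equal weighting of the terms, which the patent does not specify.

```python
# Sketch of a combined discriminator objective: real/fake + expression +
# identity terms, with numpy cross entropies and stand-in logits.
import numpy as np

def softmax_ce(logits, target):
    """Cross entropy of a softmax over `logits` against class `target`."""
    z = logits - logits.max()                  # numerically stable softmax
    log_p = z - np.log(np.exp(z).sum())
    return -log_p[target]

def binary_ce(logit, target):
    """Binary cross entropy; target is 1 for real, 0 for fake."""
    p = 1.0 / (1.0 + np.exp(-logit))
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

rng = np.random.default_rng(0)
rf_logit = rng.standard_normal()       # real/fake head output
expr_logits = rng.standard_normal(7)   # expression head (M = 7 assumed)
id_logits = rng.standard_normal(10)    # identity head (N = 10 assumed)

# For a real picture labelled [1, i, j], all three terms apply:
loss = (binary_ce(rf_logit, 1)
        + softmax_ce(expr_logits, 3)   # expression label j = 3
        + softmax_ce(id_logits, 6))    # identity label i = 6
```

The generator's loss would reuse the same expression and identity terms on its own outputs, which is how the three-way discriminator steers G.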
Third, the adversarial training process: 1) during adversarial training, the discriminator D distinguishes the distributions of real and generated samples and learns the distributions of facial expression and facial identity information; the generator G, on one hand, tries to deceive the discriminator D so as to reduce the difference between generated and real samples; on the other hand, it tries to make D judge the picture produced by G as showing the same person as G's input picture, with an expression matching the expression control vector.
Beneficial effects: the generator G produces face pictures closer to the real face picture distribution, preserves facial identity information better, and edits facial expression more effectively.
The above embodiments do not limit the present invention, and the invention is not restricted to the above examples; variations, modifications, additions, or substitutions made by those skilled in the art within the technical scope of the invention also fall within its scope of protection.
Claims (1)
1. A facial expression editing method based on a generative adversarial network, characterized in that the method comprises the following overall steps:
step S1, data preparation phase
a1, manually label each face in an RGB image set with facial identity information and facial expression information; the annotation of each picture is represented as [i, j], where i indicates that the picture belongs to the i-th person and j indicates that the picture shows the j-th expression;
b1, crop the annotated faces out of the pictures with a face detector and a facial landmark detector, and align the faces;
step S2, model design phase
a2, the model consists of two parts, a generator G and a discriminator D; the generator G takes a face picture to be edited and an expression control vector as input and produces a face picture controlled by that vector; the discriminator D takes pictures produced by the generator G and real labeled face pictures, distinguishes the distributions of real and generated samples, and learns the distributions of facial expression and facial identity information;
b2, the generator G and the discriminator D together form a facial expression editing framework based on a generative adversarial network, on which adversarial training is carried out;
step S3, model training phase
a3, feed real labeled face pictures and pictures produced by the generator G to the discriminator D, and train D to distinguish the distributions of real and generated samples and to learn the distributions of facial expression and facial identity information; a picture produced by the generator G is labeled fake [0], and a real labeled face picture is labeled real [1, i, j];
b3, feed the face picture img0[i, j] to be edited and the expression control vector y to the generator G, which outputs a face picture controlled by the vector; then feed this generated picture, labeled fake [0], to the discriminator D so that D is pushed to judge it real [1] with facial identity i and facial expression j, which drives the generator G to produce face pictures that are more realistic, preserve identity better, and follow the expression control more effectively;
c3, repeat step a3 three times and then step b3 once, training the discriminator D and the generator G;
d3, save the model parameters once per epoch, edit facial expressions on the test set, and inspect the pictures output by the generator G; stop training when G produces face pictures that meet the requirements; meanwhile, keep the model parameters used to generate the current face pictures;
step S4, model testing phase
a4, the input is an image I containing a human face;
b4, feed the image I to a face detector to obtain the face position, crop I at that position to obtain the face picture img0, and align the face in img0;
c4, feed the aligned face picture and the expression control vector to the generator G to obtain the expression-edited face picture img1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810048098.6A CN108171770B (en) | 2018-01-18 | 2018-01-18 | Facial expression editing method based on generative adversarial network
Publications (2)
Publication Number | Publication Date |
---|---|
CN108171770A CN108171770A (en) | 2018-06-15 |
CN108171770B (en) | 2021-04-06
Family
ID=62514820
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108171770B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||