CN110706303A - Face image generation method based on GANs - Google Patents

Face image generation method based on GANs

Info

Publication number
CN110706303A
CN110706303A
Authority
CN
China
Prior art keywords
image
training
face image
gans
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910975725.5A
Other languages
Chinese (zh)
Other versions
CN110706303B (en)
Inventor
和红杰
陈泓佑
陈帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN201910975725.5A priority Critical patent/CN110706303B/en
Publication of CN110706303A publication Critical patent/CN110706303A/en
Application granted granted Critical
Publication of CN110706303B publication Critical patent/CN110706303B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face image generation method based on GANs, relating to the field of computer technology. The face image produced by the generator is associated not only with the random vector but also with the feature vector, so the generated image is directly influenced by the features of the training images, which increases interpretability. Gradient vanishing is effectively avoided: decoding training can take place before the two-class adversarial training, which avoids the gradient-vanishing phenomenon caused by optimizing the JS divergence and improves the quality of the generated images. The decoder learns good structural features of the images, so the generator learns them as well, reducing images with distorted faces while learning image sharpness more reasonably. Because of the feature decoding constraint, the gradient descent direction is also constrained to some extent when the objective function is optimized, so fewer epochs are needed during training.

Description

Face image generation method based on GANs
Technical Field
The invention relates to the technical field of computers, in particular to a human face image generation method based on GANs.
Background
Image generation based on GANs is one of the hot topics of current artificial-intelligence research. In theory, GANs-based image generation can effectively simulate many kinds of image content, such as human faces, buildings, indoor scenes, flowers, and animal images. Generating such images also has practical significance. For example, effective generation of real or cartoon faces can replace the virtual creation of common character roles in film, television, or animation works, saving cost; for indoor scenes, generation can protect indoor background information that photographers want to keep private; and when the number of images in a certain category is small, more images of that category can be produced, achieving data augmentation.
The basic structure of GANs comprises two neural networks (hereinafter "networks"): a generator G and a discriminator D. The generator G produces a generated image G(z) from a fed random vector z, and the discriminator D performs two-class training using training-set images x and generated images G(z) as positive and negative samples. The goal of the generator G is to make the distribution of G(z) as similar as possible to the distribution of the image samples in the training set X; the goal of the discriminator D is to distinguish, as well as possible, the true labels of the positive samples x and the negative samples G(z). The two networks are trained in turn, and through this adversarial learning the generator G finally acquires the ability to generate images similar to the training set.
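As an illustration of this alternating scheme, the following sketch performs one discriminator update followed by one generator update in PyTorch. The network shapes, learning rate, and binary cross-entropy loss are assumptions of the sketch, not the configuration claimed later in this document.

import torch
import torch.nn as nn

z_dim, img_dim, k = 100, 64 * 64 * 3, 64    # latent size, flattened 64x64 RGB image, batchsize

G = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                  nn.Linear(512, img_dim), nn.Tanh())       # generator output in [-1, 1]
D = nn.Sequential(nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 1), nn.Sigmoid())          # discrimination value in (0, 1)

opt_G = torch.optim.RMSprop(G.parameters(), lr=2e-4)        # RMSProp, as in steps S43/S44 below
opt_D = torch.optim.RMSprop(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def adversarial_step(x):
    """One alternating update; x: (k, img_dim) training images scaled to [-1, 1]."""
    z = torch.rand(k, z_dim) * 2 - 1                        # excitation vector z, uniform on [-1, 1]
    fake = G(z)
    # Discriminator: positive samples x vs. negative samples G(z).
    d_loss = bce(D(x), torch.ones(k, 1)) + bce(D(fake.detach()), torch.zeros(k, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # Generator: push D's judgment of G(z) toward "real".
    g_loss = bce(D(G(z)), torch.ones(k, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()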
One of the more important lines of work for improving the diversity and quality of images generated by GANs is to optimize some other divergence instead of the JS divergence. For the classical GANs model, the optimization goal is the JS divergence between the training-data distribution and the generated-data distribution, as in the original GANs and DCGANs. However, JS-divergence optimization has a deficiency: when the overlap between the training-data distribution and the generated-data distribution is small, the JS divergence is approximately constant, so the gradient of the optimized objective function vanishes, which harms the image-generation quality of GANs. Typical replacements use other distances or divergences: WGANs, WGANsGP, and BEGANs take the Wasserstein (W) distance between the training-data and generated-data distributions as the optimization target, while LSGANs use the Pearson divergence. Compared with DCGANs, which optimize the JS divergence, these models improve generation quality to some extent. WGANs use relatively crude weight clipping to make the discriminator network satisfy the 1-Lipschitz condition, which affects generation quality to some degree. WGANsGP replaces weight clipping with a gradient penalty centered at 1 on the discriminator network, a more reasonable treatment. BEGANs improve the discriminator with an encoding-decoding idea and further optimize the W distance, but the optimization target is more complex, and generated images easily show small blotchy regions or lose detailed texture. In addition, all of these methods require a large number of parameter-update iterations to reach a good training result.
Disclosure of Invention
The object of the present invention is to provide a face image generation method based on GANs that alleviates the above problems.
In order to alleviate the above problems, the technical solution adopted by the invention is as follows:
the invention provides a face image generation method based on GANs, which comprises the following steps:
s1, acquiring a training set X, wherein the training set X consists of a plurality of face images;
s2, extracting the hidden features of all the face images in the training set X to obtain a hidden feature set C of the face images;
s3, face image decoding training, specifically comprising:
S31, sample batchsize face images from the training set X without repetition, and perform pixel-value scale transformation on the sampled face images;
S32, compute the Boolean value δ of the discriminant function according to equation (1),
δ = δ(t, r, l) ∈ {0, 1} (1) [the original equation image is not recoverable from the text; δ switches the decoding constraint on or off as a function of t, r and l]
where t is the current epoch number, r controls the frequency with which the decoding constraint is invoked, and l is the last epoch for which the decoding constraint holds; if δ = 1, continue to step S33; if δ = 0, take the generator G of the GANs as the adversarial-learning generator G and jump to step S4;
S33, construct a decoder Dec that has the same network structure as the generator G of the GANs and shares its weights, and perform decoding training on the decoder Dec with the RMSProp optimization method according to equation (2),
L_Dec = (λ/k) Σ_{i=1}^{k} ||Dec(c_i) − x_i||² (2)
where λ is the weight coefficient of the decoding loss function, x_i is the i-th face image after the pixel-value scale transformation of step S31, c_i is the hidden feature corresponding to x_i in the face-image hidden-feature set C, Dec(c_i) denotes the output image decoded from c_i by the decoder Dec, and k is the batchsize value;
through the training of the decoder Dec, the generator G of the GANs is updated by parameter sharing, and the updated generator G is taken as the adversarial-learning generator G;
S4, adversarial learning in face image generation, specifically comprising:
S41, sample batchsize face images from the training set X without repetition, and perform pixel-value scale transformation on the sampled face images;
S42, take the batchsize images after the pixel-value scale transformation of step S41 as positive samples, generate batchsize random vectors by a random generation method, and feed the batchsize random vectors, as the input information source, into the adversarial-learning generator G to obtain batchsize generated images as negative samples;
S43, feed the positive samples and the negative samples into the discriminator D of the GANs, perform weight-update training of the discriminator D with the RMSProp optimization method, and output the trained discriminator D, the optimized loss function being equation (3),
L_D = −(1/k) Σ_{i=1}^{k} [log D(x_i) + log(1 − D(G(z_i)))] (3)
where D(x_i) is the discrimination value of the discriminator D for the i-th positive sample x_i, and D(G(z_i)) is the discrimination value of the discriminator D for the i-th negative sample G(z_i);
S44, feed the batchsize random vectors obtained in step S42 into the adversarial-learning generator G, and perform weight-update training of the adversarial-learning generator G with the RMSProp optimization method, the optimized loss function being equation (4),
L_G = (1/k) Σ_{i=1}^{k} log(1 − D(G(z_i))) (4)
if the current epoch training is not finished or the current epoch number has not reached the maximum epoch number, jump to step S41; if the current epoch training is finished and the current epoch number has reached the maximum epoch number, output the trained generator G;
S45, save the GANs composed of the trained discriminator D and the trained generator G;
S5, generate image random vectors by a random generation method, use the GANs obtained in step S45 to generate face images with the image random vectors as input, and perform pixel-value scale transformation on the generated face images to complete the face image generation.
The technical effect of this solution is as follows: the face image produced by the generator is associated not only with the random vector but also with the feature vector, so the generated image is directly influenced by the features of the training images, which increases interpretability; gradient vanishing is effectively avoided, since decoding training can take place before the two-class adversarial training, avoiding the gradient-vanishing phenomenon caused by optimizing the JS divergence and improving the quality of the generated images; the decoder learns good structural features of the images, so the generator learns them as well, reducing images with distorted faces while learning image sharpness more reasonably; and because of the feature decoding constraint, the gradient descent direction is also constrained to some extent when the objective function is optimized, so fewer epochs can be used during training.
Optionally, the step S2 specifically includes:
S21, sample batchsize face images from the training set X without repetition, and perform pixel-value scale transformation on the sampled face images;
S22, train a feature learning network with the batchsize face images after the pixel-value scale transformation of step S21;
S23, extract the hidden features of the face images after the pixel-value scale transformation of step S21 through the feature learning network; if the hidden features of the face images in the training set X have not all been extracted, jump to step S21; otherwise, output the face-image hidden-feature set C.
The technical effect of this solution is as follows: after this processing, the hidden features of every image in the face-image training set X are obtained; X and C are in one-to-one correspondence, and each face image has its corresponding hidden feature.
Optionally, the pixel-value scale transformation of the face images in step S21, step S31 and step S41 is performed according to equation (5), which maps pixel values to [-1,1],
x_i ← x_i/127.5 − 1 (5)
where i is the index of an image among the batchsize images, i ∈ [1, batchsize].
The technical effect of this solution is as follows: the pixel-value range of each image becomes the real interval [-1,1], which standardizes the training-set images and makes them convenient to feed into the network for learning.
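A small helper pair illustrating this transform and its inverse (equation (7) below); the uint8 input type and the use of NumPy are assumptions of the sketch.

import numpy as np

def to_unit_range(img_u8: np.ndarray) -> np.ndarray:
    """Map uint8 pixels [0,255] -> float [-1,1]:  x <- x/127.5 - 1  (eq. (5))."""
    return img_u8.astype(np.float32) / 127.5 - 1.0

def to_pixel_range(img_f: np.ndarray) -> np.ndarray:
    """Map generator output [-1,1] -> [0,255]:  G(z) <- 127.5*(G(z)+1)  (eq. (7))."""
    return np.clip(127.5 * (img_f + 1.0), 0, 255).astype(np.uint8)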
Optionally, the step S22 specifically includes:
constructing an initial feature learning network, feeding the batchsize face images after the scale transformation of step S21 into the feature learning network, and training it fully with the Adam optimizer on the mean-square-error loss function shown in equation (6); after the maximum epoch number of the feature learning network is reached, the training of the feature learning network is complete,
L = (1/k) Σ_{i=1}^{k} ||x_i* − x_i||² (6)
where x_i* is the i-th reconstructed image output by the feature learning network, corresponding to the i-th face image x_i in the training set X.
The technical effect of this solution is as follows: the training image and the reconstructed image are made as similar as possible, and since equation (6) is a convex optimization function, the objective is easier to optimize; when the network has converged after sufficient training, the closer the training images are to the reconstructed images, the better the features of the network's middle layer represent the training-set images.
Optionally, the feature learning network is any one of a deep neural network, a convolutional neural network, a U-Net-type autoencoder, a DenseNet-type autoencoder and a sparse autoencoder.
The technical effect of this solution is as follows: such a network combined with equation (6) can reconstruct the face images of the training set, and after sufficient training the features in the feature set are taken from the features of the network's middle layer.
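For concreteness, a minimal convolutional autoencoder of this kind might be trained with the MSE loss of equation (6), exposing its bottleneck output as the hidden feature. The layer sizes and the 100-dimensional bottleneck are illustrative assumptions (chosen to match the 100-dimensional excitation vector mentioned later), not a prescribed architecture.

import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Autoencoder; the bottleneck output serves as the hidden feature c."""
    def __init__(self, feat_dim=100):
        super().__init__()
        self.encoder = nn.Sequential(                       # 3x64x64 -> feat_dim
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),           # -> 32x32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),          # -> 64x16x16
            nn.Flatten(), nn.Linear(64 * 16 * 16, feat_dim), nn.Tanh())
        self.decoder = nn.Sequential(                       # feat_dim -> 3x64x64
            nn.Linear(feat_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())  # outputs in [-1, 1]

    def forward(self, x):
        c = self.encoder(x)                                 # middle-layer output
        return self.decoder(c), c

net = FeatureNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)           # Adam, as in step S22

def reconstruct_step(x):
    """x: (k, 3, 64, 64) images scaled to [-1, 1]; MSE reconstruction, eq. (6)."""
    x_rec, _ = net(x)
    loss = ((x_rec - x) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()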
Optionally, in step S44, the adversarial-learning generator G is trained twice in succession.
The technical effect of this solution is as follows: the generation quality can be improved to a certain extent.
Optionally, the step S5 specifically includes:
S51, set the number N of required images, determine the distribution type of the image random vectors so that it is consistent with the distribution type of the random vectors in step S42, and load the GANs obtained in step S45;
S52, generate N image random vectors by a random generation method, feed them sequentially into the trained generator G, and output N generated images with pixel values in [-1,1];
S53, perform pixel-value scale transformation on the N generated images of step S52 using equation (7) to complete the face image generation,
G(z_j) ← 127.5 × (G(z_j) + 1) (7)
where G(z_j) is the j-th generated image, z_j is its image random vector, j = 1, 2, 3, ..., N.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of a face image generation method according to an embodiment of the present invention;
FIG. 2 is a general block diagram of the human face image generation GANs in the embodiment of the invention;
FIG. 3 is a flow chart of feature extraction in an embodiment of the present invention;
FIG. 4 is a flow chart of decoding constraint and countermeasure learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The key terms involved in the foregoing background and the following examples are explained as follows:
GANs: Generative Adversarial Networks, which learn by competition between two networks; by feeding in excitation vectors, the generator network is made to output samples similar to the training-set distribution.
JS divergence: Jensen–Shannon divergence, which measures the distance between two distributions. The greater the difference between two distributions, the larger their JS divergence, and once the difference exceeds a certain degree the JS divergence approaches a constant value; the smaller the difference, the smaller their JS divergence, with the minimum value 0 attained if and only if the two distributions are identical.
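For reference, the standard definition is the textbook formula (not reproduced from the patent text):
JS(P‖Q) = (1/2)·KL(P‖M) + (1/2)·KL(Q‖M), where M = (P + Q)/2 and KL is the Kullback–Leibler divergence.
It satisfies 0 ≤ JS(P‖Q) ≤ log 2, and JS(P‖Q) = log 2 exactly when P and Q have disjoint supports; this constant-value regime is what lies behind the vanishing-gradient problem discussed in the Background.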
DCGANs: deep Convolution generated adaptive Networks (DEEP) are the GANs models of a generator G and a discriminator D designed by utilizing a Convolution neural network, and the Deep Convolution generated adaptive Networks are one of the GANs standard models, and have relatively great breakthrough.
WGANs: wasserstein general adaptive Networks, a W distance generating countermeasure network, is a method of changing an optimization function to optimize the W distance between a generated image set distribution and a training image set distribution. Compared with the GANs for optimizing JS divergence, the method can overcome the problem of gradient disappearance caused by JS divergence, but WGANs need to enable the discriminator D to meet 1-Lipschitz continuity, so a brute force weight pruning method is used for the discriminator D.
WGANsGP: wasserstein genetic adaptive Networks, Gradient Penalty. Compared with WGANs, the WGANs model with the gradient penalty avoids a brute force processing mode of weight pruning, and the training effect is better.
BEGANs: boundary balanced GANs, Boundary balanced basic Networks. And improving the GANs optimization function by using a boundary balance strategy to avoid the gradient disappearance phenomenon possibly brought by optimizing JS divergence. The objective function it optimizes is the W distance.
LSGANs: the raw Squares genetic adaptive Networks. And the optimized objective function is converted into Pearson divergence, so that the gradient disappearance phenomenon possibly brought by JS divergence is avoided. The design of the loss function is a least square loss function.
Batchsize: the size of the sample fed by each batch of training GANs.
An Epoch: and in the training period, the number of the training sets is fixed, the training is carried out by feeding Batchsize samples into the GANs each time without repeating, and one training period is carried out after all the training set samples are covered (generally, samples sampled in each batch are sequentially sampled from the training sets.
Example 1
Referring to fig. 1, fig. 2 and fig. 4, an embodiment of the present invention provides a method for generating a human face image based on GANs, including the following steps:
and S1, acquiring a training set X, wherein the training set X consists of a plurality of face images.
In this embodiment, two methods of acquiring the training set X are given. The first is to center-crop the CELEBA face data set into face images of a fixed size, such as 64×64 or 96×96. The second is to crawl pictures of people on the public internet with a crawler, cut out the face images with face-recognition technology, and finally scale the images to a fixed size such as 64×64 or 96×96.
S2, extracting the hidden features of all the face images in the training set X to obtain a hidden feature set C of the face images.
S3, face image decoding training
S31, sample batchsize face images x_1, x_2, ..., x_k (k = batchsize) from the training set X without repetition, and scale their pixel values to [-1,1] according to equation (5), where i is the index of an image among the batchsize images, i ∈ [1, batchsize]; the transformed images are still denoted x_1, x_2, ..., x_k (k = batchsize).
S32, compute the Boolean value δ of the discriminant function according to equation (1),
δ = δ(t, r, l) ∈ {0, 1} (1) [the original equation image is not recoverable from the text; δ switches the decoding constraint on or off as a function of t, r and l]
where t is the current epoch number, r controls the frequency with which the decoding constraint is invoked, and l is the last epoch for which the decoding constraint holds. If δ = 1, continue to step S33; if δ = 0, take the generator G of the GANs as the adversarial-learning generator G and jump to step S4; that is, the generator G of the GANs directly performs adversarial learning in the following steps without decoding training.
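The equation image for (1) is not reproduced in the available text. One plausible reading consistent with the description (invoke decoding every r-th epoch, up to a last decoding epoch l) is sketched below; the exact rule is an assumption.

def delta(t: int, r: int, l: int) -> int:
    # Hypothetical gate for eq. (1): decode on every r-th epoch while t <= l.
    # The rule in the original equation image may differ.
    return 1 if (t % r == 0 and t <= l) else 0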
S33, construct a decoder Dec that has the same network structure as the generator G of the GANs and shares its weights, and perform decoding training on the decoder Dec with the RMSProp optimization method according to equation (2),
L_Dec = (λ/k) Σ_{i=1}^{k} ||Dec(c_i) − x_i||² (2)
where λ is the weight coefficient of the decoding loss function, x_i is the i-th face image after the pixel-value scale transformation of step S31, c_i is the hidden feature corresponding to x_i in the face-image hidden-feature set C, Dec(c_i) denotes the output image decoded from c_i by the decoder Dec, and k is the batchsize value;
through the training of the decoder Dec, the generator G of the GANs is updated by parameter sharing, and the generator G updated this time is taken as the adversarial-learning generator G; that is, the following steps perform adversarial learning with this updated generator G.
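A minimal sketch of this decoding pass, reusing the generator G and the torch import from the sketch after the Background section: binding Dec to the same module object realizes "same network structure and weight sharing", so every Dec update is a G update. This assumes the hidden feature c has the same dimension as the excitation vector z; the λ value and the per-element mean are assumptions of the sketch.

lam = 1.0                                    # decoding-loss weight coefficient λ (assumed value)
Dec = G                                      # shared weights: one module, two roles
opt_dec = torch.optim.RMSprop(Dec.parameters(), lr=2e-4)

def decode_step(x, c):
    """x: (k, img_dim) images scaled to [-1,1]; c: (k, z_dim) hidden features."""
    loss = lam * ((Dec(c) - x) ** 2).mean()  # eq. (2), up to a pixel-count factor
    opt_dec.zero_grad(); loss.backward(); opt_dec.step()
    return loss.item()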
S4, adversarial learning in face image generation
S41, sample batchsize face images x_1, x_2, ..., x_k (k = batchsize) from the training set X without repetition, and scale their pixel values to [-1,1] according to equation (5), where i is the index of an image among the batchsize images, i ∈ [1, batchsize]; the transformed images are still denoted x_1, x_2, ..., x_k (k = batchsize).
S42, obtain the positive and negative samples of the discriminator D. Take the batchsize pixel-value-scaled images x_1, x_2, ..., x_k (k = batchsize) of step S41 as positive samples. Randomly generate batchsize vectors z_1, z_2, ..., z_k (k = batchsize) by a random generation method; the z vector is the excitation vector of the image generated by the adversarial-learning generator G, serving as its input information source, and has a fixed dimension, e.g. 100. Each element of a z vector is a real number; one choice is the uniform distribution on [-1,1], though a normal distribution with mean 0 and standard deviation 1 can also be used, in which case the element values are not necessarily confined to [-1,1]. Feed the z vectors into the adversarial-learning generator G to obtain batchsize generated images G(z_1), G(z_2), ..., G(z_k) (k = batchsize) as negative samples.
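A sketch of the two sampling choices just described; the 100-dimensional default follows the example in the text.

import torch

def sample_z(k: int, z_dim: int = 100, dist: str = "uniform") -> torch.Tensor:
    if dist == "uniform":
        return torch.rand(k, z_dim) * 2 - 1   # uniform on [-1, 1]
    return torch.randn(k, z_dim)              # N(0, 1); values not confined to [-1, 1]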
S43, feed the positive samples and the negative samples into the discriminator D of the GANs, perform weight-update training of the discriminator D with the RMSProp optimization method, and output the trained discriminator D, the optimized loss function being equation (3),
L_D = −(1/k) Σ_{i=1}^{k} [log D(x_i) + log(1 − D(G(z_i)))] (3)
where D(x_i) is the discrimination value of the discriminator D for the i-th positive sample x_i, and D(G(z_i)) is the discrimination value of the discriminator D for the i-th negative sample G(z_i).
S44, feed the batchsize random vectors z_1, z_2, ..., z_k obtained in step S42 into the adversarial-learning generator G, and perform weight-update training of the adversarial-learning generator G with the RMSProp optimization method, training G twice in succession; the optimized loss function is equation (4),
L_G = (1/k) Σ_{i=1}^{k} log(1 − D(G(z_i))) (4)
Repeat steps S3–S4: for the training set X, feed batchsize images into the GANs for each training step; after all face images of the training set X have been covered, one epoch of training is complete. Continue this process until the maximum epoch number is reached, and finally save the trained generator G.
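A skeleton of this overall schedule, tying together the sketch functions above (delta, decode_step, adversarial_step); the loader is a hypothetical iterable yielding batches of scaled images and their hidden features.

def train(loader, max_epoch, r, l):
    """loader: hypothetical iterable of (x, c) batches — images in [-1,1] and features."""
    for t in range(1, max_epoch + 1):
        use_decode = delta(t, r, l) == 1       # eq. (1): one δ per epoch
        for x, c in loader:
            if use_decode:
                decode_step(x, c)              # S3: decoding constraint updates G via Dec
            adversarial_step(x)                # S4: D update, then G update
            # per step S44, the generator update inside the step may be run twice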
S45, save the GANs composed of the trained discriminator D and the trained generator G.
S5, face image generation
S51, setting the number N of required images, determining the distribution type of random vectors of the images to be consistent with the distribution type of the random vectors when the GANs are trained in the step S42, and loading the GANs obtained in the step S45;
s52, generating N random vectors z of images by using random generation method1,z2,...,zNSequentially feeding the N random vectors of the images into the trained generator G to obtain the original values G (z) of the output images of the N generators G (the value range of the output values is [ -1,1]])。
S53, using formula (7) to carry out image pixel value scale transformation on N G (z) to make the pixel value be [0,255], at this time, storing the N images G (z), completing the generation of the face image,
G(z_j) ← 127.5 × (G(z_j) + 1) (7)
where G(z_j) is the j-th generated image, j = 1, 2, 3, ..., N.
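A sketch of this generation stage, combining the trained generator with the rescaling of equation (7); the function and parameter names are illustrative.

import torch

@torch.no_grad()
def generate_faces(G, n: int, z_dim: int = 100) -> torch.Tensor:
    z = torch.rand(n, z_dim) * 2 - 1          # same distribution as in training (step S42)
    imgs = G(z)                               # generator outputs lie in [-1, 1]
    return (127.5 * (imgs + 1)).round().clamp(0, 255).to(torch.uint8)  # eq. (7), pixels in [0, 255]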
Compared with the prior art, the method for generating the human face image based on the GANs has the following advantages:
1. Increased interpretability: in most GANs, the information fed to the generator G is a random vector z, which means G generates images with a noise signal z as its excitation feature; it is then hard to interpret any intrinsic relation between the feature z of G(z) and the features c of the training set X. In the proposed face generation method, the excitation signal of the generator G (the decoder Dec shares the weights of the generator G, and their network structures are identical) is no longer only a random vector z but also the feature vector c of a training image x. The face image produced by the generator G is therefore related not only to z but also to c, which shows that the generated image G(z) is directly influenced by the feature c of the training image x, increasing interpretability.
2. The gradient-vanishing problem of optimizing the JS divergence is alleviated: optimizing equations (3) and (4) optimizes the JS divergence between the distribution of the training image set X and the distribution of the generated image set G(z). From the nature of the JS divergence, the more similar the two distributions, the less likely the JS divergence is to approach a constant, so gradient vanishing is effectively avoided. Decoding training can take place before the two-class adversarial training; its aim is to reconstruct the training image set X, which amounts to increasing the similarity between the distribution of G(z) and that of X. This helps avoid the gradient vanishing caused by optimizing the JS divergence and thus improves the quality of the generated images.
3. Improved visual effect of the generated images: the decoder Dec undergoes decoding training before the adversarial training. As noted in point 2, decoding reconstruction forces the output image of the decoder Dec to agree with the training image pixel by pixel, so the decoder Dec learns good structural features of the images; the generator G therefore learns better structural features too, reducing images with distorted faces, while the sharpness (texture features) of the images can be learned more reasonably.
4. Fewer training iterations: because the feature decoding constraint is imposed on the objective function, the gradient descent direction is also constrained to some extent when the objective function is optimized, so fewer epochs can be used in training.
Example 2
Referring to fig. 3, step S2 in embodiment 1 specifically includes:
S21, sequentially sample batchsize face images x_1, x_2, ..., x_k (k = batchsize) from the training set X without repetition (within one epoch, sequential sampling under a fixed ordering of the training set is sampling without repetition), and scale their pixel values to [-1,1] according to equation (5), where i is the index of an image among the batchsize images, i ∈ [1, batchsize]; the transformed images are still denoted x_1, x_2, ..., x_k (k = batchsize).
S22, train the feature learning network with the batchsize face images after the pixel-value scale transformation of step S21, as follows:
construct an initial feature learning network, feed the batchsize pixel-value-scaled images x_1, x_2, ..., x_k (k = batchsize) of step S21 into the feature learning network, and train it fully with the Adam optimizer on the mean-square-error loss function shown in equation (6); after the maximum epoch number of the feature learning network is reached, the training of the feature learning network is complete,
L = (1/k) Σ_{i=1}^{k} ||x_i* − x_i||² (6)
where x_i* is the i-th reconstructed image output by the feature learning network, corresponding to the i-th face image x_i in the training set X.
S23, after the feature learning network has been fully trained, feed the pixel-value-scaled face images x_1, x_2, ..., x_k (k = batchsize) of step S21 into the feature learning network and record the output values of the network's middle layer, obtaining the corresponding hidden features c_1, c_2, ..., c_k (k = batchsize) of the face images.
Following this feature extraction method, repeat steps S21 to S23 until the hidden-feature extraction of the whole training set X is complete, obtaining the face-image feature set C. The whole face-image feature extraction process is a pre-training process.
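One way to record the middle-layer output is a forward hook on the bottleneck, sketched below against the FeatureNet example given earlier; net is the name from that sketch, and image_batches is a hypothetical iterable of scaled training-image batches.

import torch

features = []

def grab(module, inputs, output):
    features.append(output.detach())               # bottleneck output = hidden feature c

handle = net.encoder.register_forward_hook(grab)   # hook the encoder's final output
with torch.no_grad():
    for x in image_batches:                        # hypothetical batches of scaled images
        net(x)                                     # forward pass; hook records c per batch
handle.remove()
C = torch.cat(features)                            # hidden-feature set C, row-aligned with X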
In this embodiment, the feature learning network may be any one of a deep neural network, a convolutional neural network, a U-Net-type autoencoder, a DenseNet-type autoencoder and a sparse autoencoder.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A face image generation method based on GANs is characterized by comprising the following steps:
s1, acquiring a training set X, wherein the training set X consists of a plurality of face images;
s2, extracting the hidden features of all the face images in the training set X to obtain a hidden feature set C of the face images;
s3, face image decoding training, specifically comprising:
S31, sample batchsize face images from the training set X without repetition, and perform pixel-value scale transformation on the sampled face images;
S32, compute the Boolean value δ of the discriminant function according to equation (1),
δ = δ(t, r, l) ∈ {0, 1} (1) [the original equation image is not recoverable from the text; δ switches the decoding constraint on or off as a function of t, r and l]
where t is the current epoch number, r controls the frequency with which the decoding constraint is invoked, and l is the last epoch for which the decoding constraint holds; if δ = 1, continue to step S33; if δ = 0, take the generator G of the GANs as the adversarial-learning generator G and jump to step S4;
S33, construct a decoder Dec that has the same network structure as the generator G of the GANs and shares its weights, and perform decoding training on the decoder Dec with the RMSProp optimization method according to equation (2),
L_Dec = (λ/k) Σ_{i=1}^{k} ||Dec(c_i) − x_i||² (2)
where λ is the weight coefficient of the decoding loss function, x_i is the i-th face image after the pixel-value scale transformation of step S31, c_i is the hidden feature corresponding to x_i in the face-image hidden-feature set C, Dec(c_i) denotes the output image decoded from c_i by the decoder Dec, and k is the batchsize value;
through the training of the decoder Dec, the generator G of the GANs is updated by parameter sharing, and the updated generator G is taken as the adversarial-learning generator G;
S4, adversarial learning in face image generation, specifically comprising:
S41, sample batchsize face images from the training set X without repetition, and perform pixel-value scale transformation on the sampled face images;
S42, take the batchsize images after the pixel-value scale transformation of step S41 as positive samples, generate batchsize random vectors by a random generation method, and feed the batchsize random vectors, as the input information source, into the adversarial-learning generator G to obtain batchsize generated images as negative samples;
S43, feed the positive samples and the negative samples into the discriminator D of the GANs, perform weight-update training of the discriminator D with the RMSProp optimization method, and output the trained discriminator D, the optimized loss function being equation (3),
L_D = −(1/k) Σ_{i=1}^{k} [log D(x_i) + log(1 − D(G(z_i)))] (3)
where D(x_i) is the discrimination value of the discriminator D for the i-th positive sample x_i, and D(G(z_i)) is the discrimination value of the discriminator D for the i-th negative sample G(z_i);
S44, feed the batchsize random vectors obtained in step S42 into the adversarial-learning generator G, and perform weight-update training of the adversarial-learning generator G with the RMSProp optimization method, the optimized loss function being equation (4),
L_G = (1/k) Σ_{i=1}^{k} log(1 − D(G(z_i))) (4)
if the current epoch training is not finished or the current epoch number has not reached the maximum epoch number, jump to step S41; if the current epoch training is finished and the current epoch number has reached the maximum epoch number, output the trained generator G;
S45, save the GANs composed of the trained discriminator D and the trained generator G;
S5, generate image random vectors by a random generation method, use the GANs obtained in step S45 to generate face images with the image random vectors as input, and perform pixel-value scale transformation on the generated face images to complete the face image generation.
2. The face image generation method based on GANs according to claim 1, wherein step S2 specifically comprises:
S21, sample batchsize face images from the training set X without repetition, and perform pixel-value scale transformation on the sampled face images;
S22, train a feature learning network with the batchsize face images after the pixel-value scale transformation of step S21;
S23, extract the hidden features of the face images after the pixel-value scale transformation of step S21 through the feature learning network; if the hidden features of the face images in the training set X have not all been extracted, jump to step S21; otherwise, output the face-image hidden-feature set C.
3. The face image generation method based on GANs according to claim 2, wherein in step S21, step S31 and step S41 the pixel-value scale transformation of the face images is performed according to equation (5), which maps pixel values to [-1,1],
x_i ← x_i/127.5 − 1 (5)
where i is the index of an image among the batchsize images, i ∈ [1, batchsize].
4. The face image generation method based on GANs according to claim 2, wherein step S22 specifically comprises:
constructing an initial feature learning network, feeding the batchsize face images after the pixel-value scale transformation of step S21 into the feature learning network, and training it fully with the Adam optimizer on the mean-square-error loss function shown in equation (6); after the maximum epoch number of the feature learning network is reached, the training of the feature learning network is complete,
L = (1/k) Σ_{i=1}^{k} ||x_i* − x_i||² (6)
where x_i* is the i-th reconstructed image output by the feature learning network, corresponding to the i-th face image x_i in the training set X.
5. The face image generation method based on GANs according to claim 4, wherein the feature learning network is any one of a deep neural network, a convolutional neural network, a U-Net-type autoencoder, a DenseNet-type autoencoder and a sparse autoencoder.
6. The face image generation method based on GANs according to claim 1, wherein in step S44 the adversarial-learning generator G is trained twice in succession.
7. The face image generation method based on GANs according to claim 1, wherein step S5 specifically comprises:
S51, set the number N of required images, determine the distribution type of the image random vectors so that it is consistent with the distribution type of the random vectors in step S42, and load the GANs obtained in step S45;
S52, generate N image random vectors by a random generation method, feed them sequentially into the trained generator G, and output N generated images with pixel values in [-1,1];
S53, perform pixel-value scale transformation on the N generated images of step S52 using equation (7) to complete the face image generation,
G(z_j) ← 127.5 × (G(z_j) + 1) (7)
where G(z_j) is the j-th generated image, z_j is its image random vector, j = 1, 2, 3, ..., N.
CN201910975725.5A 2019-10-15 2019-10-15 Face image generation method based on GANs Active CN110706303B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910975725.5A CN110706303B (en) 2019-10-15 2019-10-15 Face image generation method based on GANs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910975725.5A CN110706303B (en) 2019-10-15 2019-10-15 Face image generation method based on GANs

Publications (2)

Publication Number Publication Date
CN110706303A (en) 2020-01-17
CN110706303B CN110706303B (en) 2021-05-11

Family

ID=69198332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910975725.5A Active CN110706303B (en) 2019-10-15 2019-10-15 Face image generation method based on GANs

Country Status (1)

Country Link
CN (1) CN110706303B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353995A (en) * 2020-03-31 2020-06-30 成都信息工程大学 Cervical single cell image data generation method based on generation countermeasure network
CN112102194A (en) * 2020-09-15 2020-12-18 北京金山云网络技术有限公司 Face restoration model training method and device
CN112288013A (en) * 2020-10-30 2021-01-29 中南大学 Small sample remote sensing scene classification method based on element metric learning
CN112966429A (en) * 2020-08-11 2021-06-15 中国矿业大学 Non-linear industrial process modeling method based on WGANs data enhancement
CN113191950A (en) * 2021-05-07 2021-07-30 西南交通大学 Super-resolution face image reconstruction method
JP2021114279A (en) * 2020-01-20 2021-08-05 ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド Image generation method, generation device, electronic apparatus, computer readable medium, and computer program
CN114005170A (en) * 2022-01-05 2022-02-01 中国科学院自动化研究所 DeepFake defense method and system based on visual countermeasure reconstruction
WO2022062449A1 (en) * 2020-09-25 2022-03-31 平安科技(深圳)有限公司 User grouping method and apparatus, and electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268202A1 (en) * 2017-03-15 2018-09-20 Nec Laboratories America, Inc. Video surveillance system based on larger pose face frontalization
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description
CN109635774A (en) * 2018-12-21 2019-04-16 中山大学 A kind of human face synthesizing method based on generation confrontation network
CN109785258A (en) * 2019-01-10 2019-05-21 华南理工大学 A kind of facial image restorative procedure generating confrontation network based on more arbiters
CN109815928A (en) * 2019-01-31 2019-05-28 中国电子进出口有限公司 A kind of face image synthesis method and apparatus based on confrontation study

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268202A1 (en) * 2017-03-15 2018-09-20 Nec Laboratories America, Inc. Video surveillance system based on larger pose face frontalization
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description
CN109635774A (en) * 2018-12-21 2019-04-16 中山大学 A kind of human face synthesizing method based on generation confrontation network
CN109785258A (en) * 2019-01-10 2019-05-21 华南理工大学 A kind of facial image restorative procedure generating confrontation network based on more arbiters
CN109815928A (en) * 2019-01-31 2019-05-28 中国电子进出口有限公司 A kind of face image synthesis method and apparatus based on confrontation study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈泓佑 et al.: "DCGANs training method based on sub-sample set construction" (基于子样本集构建的DCGANs训练方法), 《自动化学报》 (Acta Automatica Sinica) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021114279A (en) * 2020-01-20 2021-08-05 ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド Image generation method, generation device, electronic apparatus, computer readable medium, and computer program
JP7084457B2 (en) 2020-01-20 2022-06-14 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Image generation methods, generators, electronic devices, computer-readable media and computer programs
US11463631B2 (en) 2020-01-20 2022-10-04 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating face image
CN111353995A (en) * 2020-03-31 2020-06-30 成都信息工程大学 Cervical single cell image data generation method based on generation countermeasure network
CN111353995B (en) * 2020-03-31 2023-03-28 成都信息工程大学 Cervical single cell image data generation method based on generation countermeasure network
CN112966429A (en) * 2020-08-11 2021-06-15 中国矿业大学 Non-linear industrial process modeling method based on WGANs data enhancement
CN112102194A (en) * 2020-09-15 2020-12-18 北京金山云网络技术有限公司 Face restoration model training method and device
WO2022062449A1 (en) * 2020-09-25 2022-03-31 平安科技(深圳)有限公司 User grouping method and apparatus, and electronic device and storage medium
CN112288013A (en) * 2020-10-30 2021-01-29 中南大学 Small sample remote sensing scene classification method based on element metric learning
CN113191950A (en) * 2021-05-07 2021-07-30 西南交通大学 Super-resolution face image reconstruction method
CN114005170A (en) * 2022-01-05 2022-02-01 中国科学院自动化研究所 DeepFake defense method and system based on visual countermeasure reconstruction

Also Published As

Publication number Publication date
CN110706303B (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN110706303B (en) Face image generation method based on GANs
CN113469356B (en) Improved VGG16 network pig identity recognition method based on transfer learning
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN110390638B (en) High-resolution three-dimensional voxel model reconstruction method
CN112215050A (en) Nonlinear 3DMM face reconstruction and posture normalization method, device, medium and equipment
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN111652049A (en) Face image processing model training method and device, electronic equipment and storage medium
CN111861945B (en) Text-guided image restoration method and system
CN112686817B (en) Image completion method based on uncertainty estimation
CN113989100B (en) Infrared texture sample expansion method based on style generation countermeasure network
Saquil et al. Ranking cgans: Subjective control over semantic image attributes
CN112819689B (en) Training method of human face attribute editing model, human face attribute editing method and human face attribute editing equipment
CN112183742A (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN108959512B (en) Image description network and technology based on attribute enhanced attention model
CN114330736A (en) Latent variable generative model with noise contrast prior
Shariff et al. Artificial (or) fake human face generator using generative adversarial network (GAN) machine learning model
CN114638408A (en) Pedestrian trajectory prediction method based on spatiotemporal information
CN116383639A (en) Knowledge distillation method, device, equipment and storage medium for generating countermeasure network
CN111667006A (en) Method for generating family font based on AttGan model
CN109598771B (en) Terrain synthesis method of multi-landform feature constraint
Golts et al. Image compression optimized for 3D reconstruction by utilizing deep neural networks
CN117750155A (en) Method and device for generating video based on image and electronic equipment
CN117522674A (en) Image reconstruction system and method combining local and global information
CN112528077A (en) Video face retrieval method and system based on video embedding
CN117094365A (en) Training method and device for image-text generation model, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant