CN111914617A - Face attribute editing method based on a balanced stacked generative adversarial network - Google Patents
Face attribute editing method based on a balanced stacked generative adversarial network
- Publication number
- CN111914617A (application number CN202010521351.2A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- image
- loss
- face
- discriminator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
        - G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
          - G06V40/16—Human faces, e.g. facial parts, sketches or expressions
            - G06V40/168—Feature extraction; Face representation
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F18/00—Pattern recognition
        - G06F18/20—Analysing
          - G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
            - G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00—Arrangements for image or video recognition or understanding
        - G06V10/20—Image preprocessing
          - G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
Abstract
The invention discloses a face attribute editing method based on a balanced stacked generative adversarial network, which comprises the following steps: 1) acquiring a data set containing face images and attribute labels, and preprocessing it; 2) constructing, according to the size of the face images, several conditional generative adversarial networks, each consisting of a paired generator and discriminator; 3) using the preprocessed data set, independently training all conditional generative adversarial networks for the different face attributes in a weighted-learning and residual-image-generation manner; 4) stacking all the trained generators into a stacked structure, and sequentially editing the corresponding face attributes of a preprocessed unknown face image. By applying weighted learning and residual image generation during training and a stacked structure during attribute editing, the model effectively handles the data imbalance problem, strengthens the editing of minority-class samples, improves image generation in attribute-irrelevant regions, and avoids the attribute entanglement problem.
Description
Technical Field
The invention relates to the technical field of image editing and machine learning, in particular to a face attribute editing method based on a balanced stacked generative adversarial network.
Background
Face attribute editing aims to change specific attributes of a given face image, such as adding glasses, removing a mustache, whitening skin, or even changing gender. This visual task enables semantic control of images and fine-grained image transformation. Meanwhile, with the rise of the selfie wave on social media, the wide spread of online video, and the intelligent design of game and animation characters, large-scale face image data are generated on the Internet every year, and the demand for face attribute editing grows increasingly strong. Face attribute editing is therefore widely used in application scenarios such as facial beautification, video restoration, and character synthesis. In addition, face images with edited attributes can be used for data augmentation in other machine learning vision tasks, such as face recognition, face detection, and face tracking.
In recent years, the generative adversarial network (GAN) has become a popular model in face attribute editing research by virtue of its strong image generation capability, producing high-fidelity and highly diverse editing results. Given a data set containing face images and attribute labels, the generator of a GAN is trained to manipulate specific image attributes so as to fool the discriminator, which learns to distinguish real images from fake images synthesized by the generator. The quality of the edited image improves through this mutual competition between generator and discriminator. However, face multi-attribute editing remains a challenging task because of the large number of attribute combinations: there are many attributes, and each attribute takes multiple values. As in other machine learning tasks, collecting samples, i.e., pairs of face images and attribute labels, is expensive and limited, so the samples cannot represent all attribute combinations. Because of this lack and imbalance of samples, current multi-attribute editing methods suffer from three problems. The first is attribute entanglement: editing one attribute unintentionally changes other attributes, because existing methods do not have enough samples to distinguish different attributes. Second, for lack of sufficient training samples, the generated images are unsatisfactory in attribute-irrelevant regions, which shows up as high noise and inaccurate editing. Finally, existing methods edit minority attribute values poorly: they treat all samples equally, so samples with minority attribute values are ignored, and learning multiple attributes simultaneously prevents them from adjusting the balance of each attribute individually.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a face attribute editing method based on a balanced stacked generative adversarial network, which uses several conditional generative adversarial networks to learn the editing of several attributes independently in the training stage, avoiding the attribute entanglement problem. In addition, the direct learning target of each conditional generative adversarial network is a residual image, which improves generation in attribute-irrelevant regions. Meanwhile, weighted learning gives samples with minority-class attribute values a larger learning weight, improving the editing of minority attribute values and properly addressing the challenges brought by unbalanced data sets.
In order to achieve this purpose, the technical scheme provided by the invention is as follows: a face attribute editing method based on a balanced stacked generative adversarial network, which adopts weighted learning and trains several conditional generative adversarial networks to solve the attribute imbalance problem, stacks the generators of all trained conditional generative adversarial networks into a stacked structure to solve the attribute entanglement problem, and uses residual image generation to solve the problem of inaccurate image editing; the method comprises the following steps:
1) acquiring a data set containing face images and attribute labels, and preprocessing it;
2) constructing, according to the size of the face images, several conditional generative adversarial networks, each consisting of a paired generator and discriminator;
3) using the preprocessed data set, independently training all conditional generative adversarial networks for the different face attributes in a weighted-learning and residual-image-generation manner;
4) stacking all the trained generators into a stacked structure, and sequentially editing the corresponding face attributes of a preprocessed unknown face image.
In step 1), the data set is obtained from a face data set published on the Internet; each face image contains a single face, and the preprocessing comprises cropping, scaling, and normalization, so that the face occupies the main frame of the image and pixel values lie between -1 and 1. A face image is denoted by $x$, and all $x$ form the face image set $\mathcal{X}$, i.e. $x \in \mathcal{X}$. The attribute label $y_j$ denotes the label of the value of the j-th attribute of the face, where $j = 1, 2, \ldots, m$ and $m$ is the number of attributes; all attribute labels $y_j$ form the attribute label set $\mathcal{Y}$, i.e. $y_j \in \mathcal{Y}$. The $m$ attribute labels combined with 1 face image form 1 sample pair.
In step 2), several conditional generative adversarial networks, each consisting of a paired generator and discriminator, are constructed according to the size of the face image, specifically as follows:
I. Constructing a generator $G_j$ with an encoder-decoder structure:

The encoder in generator $G_j$ consists of several convolutional layers and receives two variables: a face image $x$ from the face image set $\mathcal{X}$, and a target attribute label from the target attribute label set $\hat{\mathcal{Y}}$, namely a value $v_k$ of the j-th attribute different from the current attribute label $y_j$, where $y_j$ comes from the attribute label set $\mathcal{Y}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, and $m$ is the number of attributes; 1 attribute label combined with 1 face image forms 1 sample pair. $G_j$ then maps the two received variables into a hidden code $z$, which extracts the features of the face image and removes redundant information. The code $z$ is fed into a decoder consisting of several deconvolution layers to generate a residual image $\hat{r}_{v_k}$ that changes the target face attribute, where $\hat{r}_{v_k}$ is defined as the difference between the face image $x$ and the edited image $\hat{x}_{v_k}$; all $\hat{r}_{v_k}$ form the residual image set $\hat{\mathcal{R}}$, and all $\hat{x}_{v_k}$ form the edited image set $\hat{\mathcal{X}}$. Finally, the edited image is obtained by pixel-level superposition: $\hat{x}_{v_k} = x + \hat{r}_{v_k}$. Since minority-class samples are easily ignored in an unbalanced data set, every sample in the data set is given a learning weight $\omega$ representing its importance in training; samples with minority-class attribute values receive a large $\omega$, and vice versa. For a face image $x$ whose j-th attribute takes value $v_k$, $\omega$ is defined as:

$$\omega = \frac{\max_{1 \le l \le |V_j|} \left| \mathcal{X}^{j}_{v_l} \right|}{\left| \mathcal{X}^{j}_{v_k} \right|}$$
where $|V_j|$ is the number of distinct values of the j-th attribute, and $|\mathcal{X}^{j}_{v_k}|$ is the number of samples whose j-th attribute takes value $v_k$, i.e. $\mathcal{X}^{j}_{v_k} = \{ x \mid (x, y_j) \in (\mathcal{X}, \mathcal{Y}),\ y_j = v_k \}$, where $(x, y_j)$ is a sample pair formed by face image $x$ and attribute label $y_j$, $(\mathcal{X}, \mathcal{Y})$ is the set of such pairs, and $|\cdot|$ denotes the cardinality of a set. Thus $\omega$ is the ratio between the largest sample count among the values of the j-th attribute and the sample count for value $v_k$, and is always greater than or equal to 1: the smaller $|\mathcal{X}^{j}_{v_k}|$, the larger the weight $\omega$, and vice versa. This weight encourages the model to pay more attention to samples with minority-class attribute values during learning.
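By way of illustration, the learning weight could be computed from the label counts as in the following sketch, assuming the j-th attribute column is stored as a NumPy array; the function name and the example counts are illustrative, not part of the invention.

```python
import numpy as np

def attribute_weights(labels_j):
    """Per-sample learning weight for one attribute column.

    weight = (count of the most frequent value of the attribute) divided by
    (count of this sample's own value), so minority-value samples get
    weights > 1 and majority-value samples get weight 1.
    """
    values, counts = np.unique(labels_j, return_counts=True)
    count_of = dict(zip(values, counts))
    max_count = counts.max()
    return np.array([max_count / count_of[v] for v in labels_j])

# Example: 90 samples with value 0 and 10 with value 1.
labels = np.array([0] * 90 + [1] * 10)
w = attribute_weights(labels)
print(w[0], w[-1])  # 1.0 9.0
```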
II. Constructing the complete discriminator $D_j$ from two sub-discriminators:

Discriminator $D_j$ comprises two sub-discriminators: an authenticity discriminator $D^{real}_j$ and a class discriminator $D^{cls}_j$. The authenticity discriminator $D^{real}_j$ judges the authenticity of the input image, predicting the probability that it is a real image. The class discriminator $D^{cls}_j$ identifies whether the attribute value of the input image matches the target attribute value and predicts the degree of conformity, where different values of the same attribute are regarded as different classes. The two sub-discriminators share one multi-layer convolutional neural network but have independent two-layer fully connected heads.
III. Constructing the training target of the generator:

The adversarial loss $L^{G}_{adv}$, together with three other loss components, the classification loss $L^{G}_{cls}$, the reconstruction loss $L_{rec}$, and the regularization loss $L_{reg}$, is taken into account in generator $G_j$. The adversarial loss $L^{G}_{adv}$ ensures that the generated image is realistic; the classification loss $L^{G}_{cls}$ controls that the face image is edited correctly according to the target attribute value $v_k$; the reconstruction loss $L_{rec}$ strengthens the ability of generator $G_j$ to retain the information of attribute-irrelevant regions during editing; and the regularization loss $L_{reg}$ measures the L-1 norm of the residual image to enhance its sparsity, since the residual image should contain a large number of zero-valued pixels. Finally, the training target $L_G$ of generator $G_j$ comprises the four components and is defined as:

$$L_G = L^{G}_{adv} + \lambda_1 L^{G}_{cls} + \lambda_2 L_{rec} + \lambda_3 L_{reg}$$

where $\lambda_1, \lambda_2, \lambda_3$ are balance parameters indicating the importance of the corresponding losses. Notably, all four loss components take the learning weight into account so as to emphasize samples with minority-class attribute values. These loss components are as follows:
Adversarial loss: this loss quantifies the realism of the generated image and is defined as:

$$L^{G}_{adv} = -\,\mathbb{E}_{\hat{x}_{v_k} \sim \hat{\mathcal{X}}}\!\left[ \omega \log D^{real}_j(\hat{x}_{v_k}) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight. The authenticity discriminator $D^{real}_j$ judges the authenticity of the input image; minimizing this loss improves the similarity between images synthesized by $G_j$ and real images.
Classification loss: this loss measures the degree to which the edited value of the j-th attribute of face image $x$ matches the target attribute value $v_k$. It takes the form of a weighted binary cross-entropy loss, defined as:

$$L^{G}_{cls} = \mathbb{E}\!\left[ \omega \, \ell(v_k, \hat{x}_{v_k}) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the target attribute label comes from the target attribute label set $\hat{\mathcal{Y}}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\ell$ is the binary cross-entropy function, defined as:

$$\ell(y_j, x) = -\,y_j \log D^{cls}_j(x) - (1 - y_j) \log\!\left( 1 - D^{cls}_j(x) \right)$$

The class discriminator $D^{cls}_j$ judges whether the j-th attribute of the input image has been edited correctly.
Reconstruction loss: this loss avoids information loss when attribute-irrelevant regions are reconstructed in the generated image. The reconstruction loss $L_{rec}$ measures the difference between the original image $x$ and the reconstructed image $g_j$ from the reconstructed image set $\mathcal{G}$, where $g_j$ is the result of passing the edited image $\hat{x}_{v_k}$ through generator $G_j$ again, conditioned on the original attribute value, i.e. $g_j = \hat{x}_{v_k} + G_j(\hat{x}_{v_k}, y_j)$. The reconstruction loss is defined as:

$$L_{rec} = \mathbb{E}\!\left[ \omega \left\| x - g_j \right\|_1 \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, the reconstructed image $g_j$ comes from the reconstructed image set $\mathcal{G}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L-1 norm.
Regularization loss: in generator $G_j$, the direct learning target is the residual image rather than a complete image. The residual image represents local pixel changes for the target attribute; theoretically it should be sparse, with a large number of zero-valued pixels. A regularization loss is therefore introduced, defined as:

$$L_{reg} = \mathbb{E}\!\left[ \omega \left\| \hat{r}_{v_k} \right\|_1 \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the residual image $\hat{r}_{v_k}$ comes from the residual image set $\hat{\mathcal{R}}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L-1 norm.
IV. Constructing the training target of discriminator $D_j$:

The adversarial loss and the classification loss are taken into account in the training target of discriminator $D_j$, defined as:

$$L_D = L^{D}_{adv} + \lambda_4 L^{D}_{cls}$$

where $\lambda_4$ is a balance parameter indicating the importance of the classification loss. The two losses are defined as follows:
i. Adversarial loss: this loss encourages the authenticity discriminator $D^{real}_j$ to distinguish real images from generated fake images. Similar to the adversarial loss applied to generator $G_j$ above, it quantifies the realism of the input image via the weighted logarithm of the $D^{real}_j$ output; a smaller adversarial loss indicates poorer generator performance. It is defined as:

$$L^{D}_{adv} = -\,\mathbb{E}_{x \sim \mathcal{X}}\!\left[ \omega \log D^{real}_j(x) \right] - \mathbb{E}_{\hat{x}_{v_k} \sim \hat{\mathcal{X}}}\!\left[ \omega \log\!\left( 1 - D^{real}_j(\hat{x}_{v_k}) \right) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight.
ii. Classification loss: this loss quantifies the agreement between the predicted and true values of the j-th attribute of the input image. Minimizing the classification loss of the class discriminator $D^{cls}_j$ ensures that $D^{cls}_j$ judges the attribute values of input images accurately. The classification loss is defined as:

$$L^{D}_{cls} = \mathbb{E}_{x \sim \mathcal{X}}\!\left[ \omega \, \ell(y_j, x) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\ell$ is the binary cross-entropy function defined above as $\ell(y_j, x) = -\,y_j \log D^{cls}_j(x) - (1 - y_j) \log\!\left( 1 - D^{cls}_j(x) \right)$. The class discriminator $D^{cls}_j$ judges whether the j-th attribute of the input image is correct.
V. Constructing the respective optimizers of the generator and the discriminator:

To improve training stability and speed, both the generator and the discriminator use the Adam optimizer, which combines first-moment and second-moment estimates of the gradient to update the learning step and automatically adjust the learning rate.
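A minimal setup sketch, assuming G_j and D_j are the generator and discriminator modules; the learning rate and beta values below are common GAN defaults, not values prescribed by the invention.

```python
import torch

G_opt = torch.optim.Adam(G_j.parameters(), lr=2e-4, betas=(0.5, 0.999))
D_opt = torch.optim.Adam(D_j.parameters(), lr=2e-4, betas=(0.5, 0.999))
```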
In step 3), each conditional generative adversarial network is trained through the following steps (a training-loop sketch in code follows the list):
3.1) loading the preprocessed data set and the constructed conditional generative adversarial network;
3.2) inputting the face images in the data set and artificially specified target attribute values to the generator and the discriminator in batches;
3.3) calculating the respective loss values from the respective outputs of the generator and the discriminator and their respective training targets;
3.4) calculating the respective gradient values from the respective loss values of the generator and the discriminator and the Adam optimizer;
3.5) performing back-propagation according to the respective gradient values of the generator and the discriminator and adjusting the respective parameters.
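The following PyTorch sketch puts steps 3.1-3.5 together for one conditional network, reusing the loss functions and optimizers sketched above; the loader contract (yielding image, current binary attribute value, and learning weight) and the attribute-flipping rule are our assumptions.

```python
def train_one_network(G_j, D_j, loader, epochs, device, G_opt, D_opt):
    """Independently train one conditional GAN for a single binary attribute."""
    for _ in range(epochs):
        for x, y_j, w in loader:                     # steps 3.1 / 3.2
            x, y_j, w = x.to(device), y_j.to(device), w.to(device)
            target_v = 1.0 - y_j                     # target attribute value
            residual = G_j(x, target_v)              # residual image
            edited = (x + residual).clamp(-1, 1)     # pixel-level overlay

            # Discriminator step (3.3-3.5 for D_j).
            d_real_x, d_cls_x = D_j(x)
            d_real_fake, _ = D_j(edited.detach())
            d_loss = discriminator_loss(d_real_x, d_real_fake, d_cls_x, y_j, w)
            D_opt.zero_grad(); d_loss.backward(); D_opt.step()

            # Generator step (3.3-3.5 for G_j), including reconstruction.
            d_real_fake, d_cls_fake = D_j(edited)
            recon = edited + G_j(edited, y_j)        # edit back to original
            g_loss = generator_loss(d_real_fake, d_cls_fake, target_v,
                                    x, recon, residual, w)
            G_opt.zero_grad(); g_loss.backward(); G_opt.step()
```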
In step 4), all trained generators are stacked into a stacked structure, and the corresponding face attributes of a preprocessed unknown face image are edited sequentially through the following steps (see the sketch after this list):
4.1) stacking all generators;
4.2) preprocessing the unknown face image and artificially setting several target attribute values;
4.3) inputting the face image and the target attribute values to the generators one by one, each generator changing its corresponding attribute, generating a residual image, combining it with its input face image, and outputting a complete face image; the first generator receives the preprocessed unknown face image, each subsequent generator receives the complete face image output by the previous generator, and the output of the last generator is the complete face image with all attributes edited.
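A minimal sketch of the stacked editing pass, assuming each generator follows the interface used above; treating a None target as "attribute left unchanged" is our convention, not the patent's.

```python
def edit_attributes(generators, x, target_values):
    """Apply the stacked generators one by one (steps 4.1-4.3)."""
    out = x
    for G_j, v_k in zip(generators, target_values):
        if v_k is None:                       # attribute not edited
            continue
        residual = G_j(out, v_k)              # residual image for this attribute
        out = (out + residual).clamp(-1, 1)   # complete face image passed on
    return out
```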
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention solves the face multi-attribute editing task with a stacked structure for the first time, so that each face attribute can be edited independently and the adjustment of one attribute does not affect the other attributes.
2. The invention introduces residual image generation into the multi-attribute editing scenario for the first time, avoiding the information loss caused by reconstructing attribute-irrelevant parts of the image.
3. By applying a learning weight to each sample, the invention suppresses the influence of the unbalanced-data problem and improves the model's ability to edit minority-class attribute values.
4. When an additional attribute is considered or an existing attribute is dropped, the invention does not need to retrain the whole model from scratch; only part of the conditional generative adversarial networks need to be added or removed, which makes the method flexible.
5. Training each conditional generative adversarial network independently for a single attribute, together with residual image generation, reduces the learning difficulty of the single-attribute editing task, so each conditional generative adversarial network can adopt a lighter-weight structure, reducing training cost.
6. The invention finally generates an image using only the generators of the conditional generative adversarial networks of the relevant attributes, rather than all generators, which makes the model more efficient.
Drawings
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is a schematic diagram of the training phase of the present invention.
FIG. 3 is a schematic diagram of the image generation stage according to the present invention.
FIG. 4 is a diagram illustrating an example of independently editing a single attribute according to the present invention.
FIG. 5 is a diagram illustrating an example of editing 3 attributes one by one according to the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in FIGS. 1 to 3, the face attribute editing method based on a balanced stacked generative adversarial network provided in this embodiment uses a stacked structure, residual image generation, and weighted learning, and comprises the following steps:
1) Acquiring face images $x$ and attribute labels $y_j$, and preprocessing them. The data set is obtained from a face data set published on the Internet. Each face image contains a single face and is preprocessed by cropping, scaling, and normalization, so that the face occupies the main frame of the image and pixel values lie between -1 and 1; $x$ denotes a face image, and all $x$ form the face image set $\mathcal{X}$, i.e. $x \in \mathcal{X}$. The attribute labels indicate the values (also called attribute values) of several attributes of a face and are denoted $y_j$, where $j = 1, 2, \ldots, m$. In this example only 8 attributes are taken, i.e. $m = 8$; each is a binary attribute, such as mouth open/closed or glasses present/absent, so after processing each attribute has only two values, 0 and 1, with different imbalance ratios (the ratio between the numbers of samples with the two values of the same attribute). All attribute labels $y_j$ form the attribute label set $\mathcal{Y}$, i.e. $y_j \in \mathcal{Y}$; the 8 attribute labels combined with 1 face image form 1 sample pair. The samples are obtained from the face data set CelebA published on the web and divided into training data and test data in a certain proportion, containing 182,637 and 19,962 sample pairs, respectively.
Each obtained face image contains a human face, has three RGB channels, a size of 128 × 128, and jpg format.
The 8 obtained attributes are shown in Table 1:

TABLE 1 Attributes and corresponding imbalance ratios (the numeric ratios of the original table are not reproduced in this text)

| Attribute | Two values |
|---|---|
| Mouth | open / closed |
| Eye bags | present / absent |
| Eyebrows | bushy / thin |
| Cheek color | rosy / ordinary |
| Glasses | present / absent |
| Sideburns | present / absent |
| Skin tone | pale / normal |
| Mustache | present / absent |

Each attribute in Table 1 has two expression forms, corresponding to its two attribute values.
2) According to the size of the face image, constructing several conditional generative adversarial networks, each consisting of a paired generator and discriminator, as follows:
I. Constructing a generator $G_j$ with an encoder-decoder structure containing four convolutional layers and four deconvolutional layers, as shown in Table 2.
Table 2 Generator structure

| Generator |
|---|
| Conv(64,3,2), BN, Leaky ReLU |
| Conv(128,3,2), BN, Leaky ReLU |
| Conv(256,3,2), BN, Leaky ReLU |
| Conv(512,3,2), BN, Leaky ReLU |
| DeConv(256,3,2), BN, Leaky ReLU |
| DeConv(128,3,2), BN, Leaky ReLU |
| DeConv(64,3,2), BN, Leaky ReLU |
| DeConv(3,3,2), Tanh |
In Table 2, Conv(d, k, s) and DeConv(d, k, s) denote a convolutional layer and a deconvolution layer with d output channels, kernel size k, and stride s. BN is batch normalization. Leaky ReLU and Tanh are two different activation functions.
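As one possible reading of Table 2, the PyTorch sketch below builds the encoder-decoder for 128 × 128 inputs; tiling the target attribute label into an extra input channel is a common conditioning choice that the patent text does not fix, and the padding values are our choices to make the spatial sizes work out.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encoder-decoder of Table 2; outputs a residual image via Tanh."""

    def __init__(self):
        super().__init__()
        def conv(i, o):    # Conv(d, k=3, s=2), BN, Leaky ReLU
            return nn.Sequential(nn.Conv2d(i, o, 3, 2, 1),
                                 nn.BatchNorm2d(o), nn.LeakyReLU(0.2))
        def deconv(i, o):  # DeConv(d, k=3, s=2), BN, Leaky ReLU
            return nn.Sequential(
                nn.ConvTranspose2d(i, o, 3, 2, 1, output_padding=1),
                nn.BatchNorm2d(o), nn.LeakyReLU(0.2))
        self.encoder = nn.Sequential(conv(4, 64), conv(64, 128),
                                     conv(128, 256), conv(256, 512))
        self.decoder = nn.Sequential(
            deconv(512, 256), deconv(256, 128), deconv(128, 64),
            nn.ConvTranspose2d(64, 3, 3, 2, 1, output_padding=1), nn.Tanh())

    def forward(self, x, v):                # x: (B,3,128,128), v: (B,)
        v_map = v.view(-1, 1, 1, 1).expand(-1, 1, x.size(2), x.size(3))
        z = self.encoder(torch.cat([x, v_map], dim=1))   # hidden code z
        return self.decoder(z)                           # residual image
```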
II. Discriminator $D_j$ comprises two sub-discriminators: the authenticity discriminator $D^{real}_j$ and the class discriminator $D^{cls}_j$; see Table 3 for details.
TABLE 3 Discriminator structure (the rows of the original table are not reproduced in this text)
In Table 3, Conv(d, k, s) denotes a convolutional layer with d output channels, kernel size k, and stride s. BN is batch normalization. FC(c) is a fully connected layer with c-dimensional output. m is the number of target attributes. Leaky ReLU and Sigmoid are two different activation functions.
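Since the rows of Table 3 are not reproduced, the sketch below assumes a backbone mirroring the generator's encoder, with two independent two-layer fully connected heads ending in Sigmoid as the note describes; all channel and hidden sizes are assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Shared convolutional backbone with two independent FC heads."""

    def __init__(self, img_size=128):
        super().__init__()
        def conv(i, o):
            return nn.Sequential(nn.Conv2d(i, o, 3, 2, 1),
                                 nn.BatchNorm2d(o), nn.LeakyReLU(0.2))
        self.backbone = nn.Sequential(conv(3, 64), conv(64, 128),
                                      conv(128, 256), conv(256, 512),
                                      nn.Flatten())
        feat = 512 * (img_size // 16) ** 2
        def head():       # two-layer fully connected head, FC(256) -> FC(1)
            return nn.Sequential(nn.Linear(feat, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1), nn.Sigmoid())
        self.real_head = head()   # D_real: probability the image is real
        self.cls_head = head()    # D_cls: probability of the attribute value

    def forward(self, x):
        h = self.backbone(x)
        return self.real_head(h).squeeze(1), self.cls_head(h).squeeze(1)
```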
III. Constructing the training target of generator $G_j$. The adversarial loss $L^{G}_{adv}$, together with three other loss components, the classification loss $L^{G}_{cls}$, the reconstruction loss $L_{rec}$, and the regularization loss $L_{reg}$, is taken into account in generator $G_j$. Finally, the training target $L_G$ of $G_j$ comprises the four components and is defined as:

$$L_G = L^{G}_{adv} + \lambda_1 L^{G}_{cls} + \lambda_2 L_{rec} + \lambda_3 L_{reg}$$

where $\lambda_1, \lambda_2, \lambda_3$ are balance parameters indicating the importance of the corresponding losses. Notably, all four loss components take the learning weight into account so as to emphasize samples with minority-class attribute values. These losses are defined as follows:
Adversarial loss: this loss quantifies the realism of the generated image and is defined as:

$$L^{G}_{adv} = -\,\mathbb{E}_{\hat{x}_{v_k} \sim \hat{\mathcal{X}}}\!\left[ \omega \log D^{real}_j(\hat{x}_{v_k}) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight. The authenticity discriminator $D^{real}_j$ judges the authenticity of the input image; minimizing this loss improves the similarity between images synthesized by $G_j$ and real images.
Classification loss: this loss measures the degree to which the edited value of the j-th attribute of image $x$ matches the target attribute value $v_k$. It takes the form of a weighted binary cross-entropy loss, defined as:

$$L^{G}_{cls} = \mathbb{E}\!\left[ \omega \, \ell(v_k, \hat{x}_{v_k}) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the target attribute label comes from the target attribute label set $\hat{\mathcal{Y}}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\ell$ is the binary cross-entropy function, defined as $\ell(y_j, x) = -\,y_j \log D^{cls}_j(x) - (1 - y_j) \log\!\left( 1 - D^{cls}_j(x) \right)$. The class discriminator $D^{cls}_j$ judges whether the j-th attribute of the input image has been edited correctly.
Reconstruction loss: this loss avoids information loss when attribute-irrelevant regions are reconstructed in the generated image. The reconstruction loss $L_{rec}$ measures the difference between the original image $x$ and the reconstructed image $g_j$ from the reconstructed image set $\mathcal{G}$, where $g_j$ is the result of passing the edited image $\hat{x}_{v_k}$ through generator $G_j$ again, conditioned on the original attribute value, i.e. $g_j = \hat{x}_{v_k} + G_j(\hat{x}_{v_k}, y_j)$. The reconstruction loss is defined as:

$$L_{rec} = \mathbb{E}\!\left[ \omega \left\| x - g_j \right\|_1 \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, the reconstructed image $g_j$ comes from the reconstructed image set $\mathcal{G}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L-1 norm.
Regularization loss: in generator $G_j$, the residual image, rather than the full image, acts as the direct learning target and represents local pixel changes for the target attribute. The residual image should theoretically be sparse, with a large number of zero-valued pixels. A regularization loss is therefore introduced, defined as:

$$L_{reg} = \mathbb{E}\!\left[ \omega \left\| \hat{r}_{v_k} \right\|_1 \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the residual image $\hat{r}_{v_k}$ comes from the residual image set $\hat{\mathcal{R}}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L-1 norm.
IV. Constructing the training target of discriminator $D_j$. The adversarial loss and the classification loss are taken into account in the training target of discriminator $D_j$, defined as:

$$L_D = L^{D}_{adv} + \lambda_4 L^{D}_{cls}$$

where $\lambda_4$ is a balance parameter indicating the importance of the classification loss. The two losses are defined as follows:
i. Adversarial loss: this loss encourages the authenticity discriminator $D^{real}_j$ to distinguish real images from generated fake images. Similar to the adversarial loss applied to generator $G_j$ above, it quantifies the realism of the input image via the weighted logarithm of the $D^{real}_j$ output; a smaller adversarial loss indicates poorer generator performance. It is defined as:

$$L^{D}_{adv} = -\,\mathbb{E}_{x \sim \mathcal{X}}\!\left[ \omega \log D^{real}_j(x) \right] - \mathbb{E}_{\hat{x}_{v_k} \sim \hat{\mathcal{X}}}\!\left[ \omega \log\!\left( 1 - D^{real}_j(\hat{x}_{v_k}) \right) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight.
ii. Classification loss: this loss quantifies the agreement between the predicted and true values of the j-th attribute of the input image. Minimizing the classification loss of the class discriminator $D^{cls}_j$ ensures that $D^{cls}_j$ judges the attribute values of input images accurately. The classification loss is defined as:

$$L^{D}_{cls} = \mathbb{E}_{x \sim \mathcal{X}}\!\left[ \omega \, \ell(y_j, x) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\ell$ is the binary cross-entropy function defined above. The class discriminator $D^{cls}_j$ judges whether the j-th attribute of the input image is correct.
V. Constructing the optimizers of generator $G_j$ and discriminator $D_j$. To improve training stability and speed, both the generator and the discriminator use the Adam optimizer.
3) As shown in FIG. 2, the method utilizes the preprocessed data set to independently train all conditional generative adversarial networks for the different face attributes, adopting weighted learning and residual image generation.
Residual image generation and weighted learning are adopted in the training process to address attribute entanglement and the poor editing of minority-class attribute values. For a face image $x$ whose j-th attribute takes value $v_k$, the learning weight $\omega$ is defined as:

$$\omega = \frac{\max_{1 \le l \le |V_j|} \left| \mathcal{X}^{j}_{v_l} \right|}{\left| \mathcal{X}^{j}_{v_k} \right|}$$

where $|V_j|$ is the number of distinct values of the j-th attribute, and $|\mathcal{X}^{j}_{v_k}|$ is the number of samples whose j-th attribute takes value $v_k$, i.e. $\mathcal{X}^{j}_{v_k} = \{ x \mid (x, y_j) \in (\mathcal{X}, \mathcal{Y}),\ y_j = v_k \}$, where $(x, y_j)$ is a sample pair formed by face image $x$ and attribute label $y_j$, $(\mathcal{X}, \mathcal{Y})$ is the set of such pairs, and $|\cdot|$ denotes the cardinality of a set. The learning weight has two characteristics: the more samples a value has, the smaller the weight; and the weight is always greater than or equal to one. From this formula and the numbers of samples (or imbalance ratios) of the different attribute values, the weights shown in Table 4 below can be calculated.
TABLE 4 Weight values of all attribute values (the numeric values of the original table are not reproduced in this text)
The whole training process comprises the following steps:
3.1) loading the preprocessed data set and the constructed conditional generative adversarial network;
3.2) inputting the face images in the data set and artificially specified target attribute values to the generator and the discriminator in batches;
3.3) calculating the respective loss values from the respective outputs of the generator and the discriminator and their respective training targets;
3.4) calculating the gradient values of the respective parameters from the respective loss values of the generator and the discriminator and the Adam optimizer;
3.5) performing back-propagation according to the respective gradient values of the generator and the discriminator and adjusting the respective parameters.
4) As shown in FIG. 3, the method stacks all trained generators into a stacked structure and sequentially edits the corresponding face attributes of the preprocessed unknown face image through the following steps:
4.1) stacking all generators;
4.2) preprocessing the unknown face image and artificially setting several target attribute values;
4.3) inputting the face image and the target attribute values to the generators one by one, each generator changing its corresponding attribute, generating a residual image, combining it with its input face image, and outputting a complete face image. Except for the first generator, which receives the preprocessed unknown face image, each generator receives the complete face image output by the previous one; the output of the last generator is the complete face image with all attributes edited.
Finally, the example results shown in FIGS. 4 and 5 can be obtained by the method of the invention. FIG. 4 shows examples of editing single attributes independently: when the conditional generative adversarial networks are not stacked, the invention can independently and correctly edit the 8 attributes of the input image (eye bags, eyebrows, mouth, mustache, sideburns, glasses, cheek color, skin tone), with good editing of minority-class attribute values, accurate edits, low noise, and no mutual interference between different attributes, which can be verified by means of the residual images. FIG. 5 shows an example of editing 3 attributes (mouth, mustache, skin tone) one by one: when the conditional generative adversarial networks are stacked, the invention edits the 3 attributes of the input image one after another and finally completes all 3 edits, again with good editing of minority-class attribute values, accurate edits, low noise, and no interference between attributes, as the residual images confirm. These advantages jointly reflect the roles of the stacked structure, weighted learning, and residual image generation of the invention.
The above-mentioned embodiments are merely preferred embodiments of the invention, and the scope of the invention is not limited thereto; changes made according to the shape and principle of the invention should all be covered within its protection scope.
Claims (5)
1. A face attribute editing method based on a balanced stacked generative adversarial network, characterized in that the method adopts weighted learning and trains several conditional generative adversarial networks to solve the attribute imbalance problem, stacks the generators of all trained conditional generative adversarial networks into a stacked structure to solve the attribute entanglement problem, and uses residual image generation to solve the problem of inaccurate image editing; the method comprises the following steps:
1) acquiring a data set containing face images and attribute labels, and preprocessing it;
2) constructing, according to the size of the face images, several conditional generative adversarial networks, each consisting of a paired generator and discriminator;
3) using the preprocessed data set, independently training all conditional generative adversarial networks for the different face attributes in a weighted-learning and residual-image-generation manner;
4) stacking all the trained generators into a stacked structure, and sequentially editing the corresponding face attributes of a preprocessed unknown face image.
2. The face attribute editing method based on a balanced stacked generative adversarial network according to claim 1, wherein: in step 1), the data set is obtained from a face data set published on the Internet; each face image contains a single face, and the preprocessing comprises cropping, scaling, and normalization, so that the face occupies the main frame of the image and pixel values lie between -1 and 1; a face image is denoted by $x$, and all $x$ form the face image set $\mathcal{X}$, i.e. $x \in \mathcal{X}$; the attribute label $y_j$ denotes the label of the value of the j-th attribute of the face, where $j = 1, 2, \ldots, m$ and $m$ is the number of attributes; all attribute labels $y_j$ form the attribute label set $\mathcal{Y}$, i.e. $y_j \in \mathcal{Y}$; the $m$ attribute labels combined with 1 face image form 1 sample pair.
3. The face attribute editing method based on a balanced stacked generative adversarial network according to claim 1, wherein: in step 2), several conditional generative adversarial networks, each consisting of a paired generator and discriminator, are constructed according to the size of the face image, specifically as follows:
I. Constructing a generator $G_j$ with an encoder-decoder structure:

The encoder in generator $G_j$ consists of several convolutional layers and receives two variables: a face image $x$ from the face image set $\mathcal{X}$, and a target attribute label from the target attribute label set $\hat{\mathcal{Y}}$, namely a value $v_k$ of the j-th attribute different from the current attribute label $y_j$, where $y_j$ comes from the attribute label set $\mathcal{Y}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, and $m$ is the number of attributes; 1 attribute label combined with 1 face image forms 1 sample pair. $G_j$ then maps the two received variables into a hidden code $z$, which extracts the features of the face image and removes redundant information. The code $z$ is fed into a decoder consisting of several deconvolution layers to generate a residual image $\hat{r}_{v_k}$ that changes the target face attribute, where $\hat{r}_{v_k}$ is defined as the difference between the face image $x$ and the edited image $\hat{x}_{v_k}$; all $\hat{r}_{v_k}$ form the residual image set $\hat{\mathcal{R}}$, and all $\hat{x}_{v_k}$ form the edited image set $\hat{\mathcal{X}}$. Finally, the edited image is obtained by pixel-level superposition: $\hat{x}_{v_k} = x + \hat{r}_{v_k}$. Since minority-class samples are easily ignored in an unbalanced data set, every sample in the data set is given a learning weight $\omega$ representing its importance in training; samples with minority-class attribute values receive a large $\omega$, and vice versa. For a face image $x$ whose j-th attribute takes value $v_k$, $\omega$ is defined as:

$$\omega = \frac{\max_{1 \le l \le |V_j|} \left| \mathcal{X}^{j}_{v_l} \right|}{\left| \mathcal{X}^{j}_{v_k} \right|}$$

where $|V_j|$ is the number of distinct values of the j-th attribute, and $|\mathcal{X}^{j}_{v_k}|$ is the number of samples whose j-th attribute takes value $v_k$, i.e. $\mathcal{X}^{j}_{v_k} = \{ x \mid (x, y_j) \in (\mathcal{X}, \mathcal{Y}),\ y_j = v_k \}$, where $(x, y_j)$ is a sample pair formed by face image $x$ and attribute label $y_j$, $(\mathcal{X}, \mathcal{Y})$ is the set of such pairs, and $|\cdot|$ denotes the cardinality of a set; $\omega$ is the ratio between the largest sample count among the values of the j-th attribute and the sample count for value $v_k$, and is always greater than or equal to 1: the smaller $|\mathcal{X}^{j}_{v_k}|$, the larger the weight $\omega$, and vice versa; this weight encourages the model to pay more attention to samples with minority-class attribute values during learning;
II. Constructing the complete discriminator $D_j$ from two sub-discriminators:

Discriminator $D_j$ comprises two sub-discriminators: an authenticity discriminator $D^{real}_j$ and a class discriminator $D^{cls}_j$; the authenticity discriminator $D^{real}_j$ judges the authenticity of the input image, predicting the probability that it is a real image; the class discriminator $D^{cls}_j$ identifies whether the attribute value of the input image matches the target attribute value and predicts the degree of conformity, where different values of the same attribute are regarded as different classes; the two sub-discriminators share one multi-layer convolutional neural network but have independent two-layer fully connected heads;
III. Constructing the training target of the generator:

The adversarial loss $L^{G}_{adv}$, together with three other loss components, the classification loss $L^{G}_{cls}$, the reconstruction loss $L_{rec}$, and the regularization loss $L_{reg}$, is taken into account in generator $G_j$; the adversarial loss $L^{G}_{adv}$ ensures that the generated image is realistic; the classification loss $L^{G}_{cls}$ controls that the face image is edited correctly according to the target attribute value $v_k$; the reconstruction loss $L_{rec}$ strengthens the ability of generator $G_j$ to retain the information of attribute-irrelevant regions during editing; the regularization loss $L_{reg}$ measures the L-1 norm of the residual image to enhance its sparsity, since the residual image should contain a large number of zero-valued pixels; finally, the training target $L_G$ of generator $G_j$ comprises the four components and is defined as:

$$L_G = L^{G}_{adv} + \lambda_1 L^{G}_{cls} + \lambda_2 L_{rec} + \lambda_3 L_{reg}$$

where $\lambda_1, \lambda_2, \lambda_3$ are balance parameters indicating the importance of the corresponding losses; notably, all four loss components take the learning weight into account so as to emphasize samples with minority-class attribute values; these loss components are as follows:
Adversarial loss: this loss quantifies the realism of the generated image and is defined as:

$$L^{G}_{adv} = -\,\mathbb{E}_{\hat{x}_{v_k} \sim \hat{\mathcal{X}}}\!\left[ \omega \log D^{real}_j(\hat{x}_{v_k}) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight; the authenticity discriminator $D^{real}_j$ judges the authenticity of the input image; minimizing this loss improves the similarity between images synthesized by $G_j$ and real images;
Classification loss: this loss measures the degree to which the edited value of the j-th attribute of face image $x$ matches the target attribute value $v_k$; it takes the form of a weighted binary cross-entropy loss, defined as:

$$L^{G}_{cls} = \mathbb{E}\!\left[ \omega \, \ell(v_k, \hat{x}_{v_k}) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the target attribute label comes from the target attribute label set $\hat{\mathcal{Y}}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\ell$ is the binary cross-entropy function, defined as $\ell(y_j, x) = -\,y_j \log D^{cls}_j(x) - (1 - y_j) \log\!\left( 1 - D^{cls}_j(x) \right)$; the class discriminator $D^{cls}_j$ judges whether the j-th attribute of the input image has been edited correctly;
Reconstruction loss: this loss avoids information loss when attribute-irrelevant regions are reconstructed in the generated image; the reconstruction loss $L_{rec}$ measures the difference between the original image $x$ and the reconstructed image $g_j$ from the reconstructed image set $\mathcal{G}$, where $g_j$ is the result of passing the edited image $\hat{x}_{v_k}$ through generator $G_j$ again, conditioned on the original attribute value, i.e. $g_j = \hat{x}_{v_k} + G_j(\hat{x}_{v_k}, y_j)$; the reconstruction loss is defined as:

$$L_{rec} = \mathbb{E}\!\left[ \omega \left\| x - g_j \right\|_1 \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, the reconstructed image $g_j$ comes from the reconstructed image set $\mathcal{G}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L-1 norm;
Regularization loss: in generator $G_j$, the direct learning target is the residual image rather than a complete image; the residual image represents local pixel changes for the target attribute and theoretically should be sparse, with a large number of zero-valued pixels; a regularization loss is therefore introduced, defined as:

$$L_{reg} = \mathbb{E}\!\left[ \omega \left\| \hat{r}_{v_k} \right\|_1 \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the residual image $\hat{r}_{v_k}$ comes from the residual image set $\hat{\mathcal{R}}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L-1 norm;
IV. Constructing the training target of discriminator $D_j$:

The adversarial loss and the classification loss are taken into account in the training target of discriminator $D_j$, defined as:

$$L_D = L^{D}_{adv} + \lambda_4 L^{D}_{cls}$$

where $\lambda_4$ is a balance parameter indicating the importance of the classification loss; the two losses are defined as follows:
i. Adversarial loss: this loss encourages the authenticity discriminator $D^{real}_j$ to distinguish real images from generated fake images; similar to the adversarial loss applied to generator $G_j$ above, it quantifies the realism of the input image via the weighted logarithm of the $D^{real}_j$ output, and a smaller adversarial loss indicates poorer generator performance; it is defined as:

$$L^{D}_{adv} = -\,\mathbb{E}_{x \sim \mathcal{X}}\!\left[ \omega \log D^{real}_j(x) \right] - \mathbb{E}_{\hat{x}_{v_k} \sim \hat{\mathcal{X}}}\!\left[ \omega \log\!\left( 1 - D^{real}_j(\hat{x}_{v_k}) \right) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, the edited image $\hat{x}_{v_k}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ is the k-th value of the j-th attribute, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight; the authenticity discriminator $D^{real}_j$ judges the authenticity of the input image;
ii. Classification loss: this loss quantifies the agreement between the predicted and true values of the j-th attribute of the input image; minimizing the classification loss of the class discriminator $D^{cls}_j$ ensures that $D^{cls}_j$ judges the attribute values of input images accurately; the classification loss is defined as:

$$L^{D}_{cls} = \mathbb{E}_{x \sim \mathcal{X}}\!\left[ \omega \, \ell(y_j, x) \right]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}$, the face image $x$ comes from the face image set $\mathcal{X}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\ell(y_j, x)$ is the binary cross-entropy function defined as $\ell(y_j, x) = -\,y_j \log D^{cls}_j(x) - (1 - y_j) \log\!\left( 1 - D^{cls}_j(x) \right)$; the class discriminator $D^{cls}_j$ judges whether the j-th attribute of the input image is correct;
V. Constructing the respective optimizers of the generator and the discriminator:

To improve training stability and speed, both the generator and the discriminator use the Adam optimizer, which combines first-moment and second-moment estimates of the gradient to update the learning step and automatically adjust the learning rate.
4. The face attribute editing method based on a balanced stacked generative adversarial network according to claim 1, wherein: in step 3), each conditional generative adversarial network is trained through the following steps:
3.1) loading the preprocessed data set and the constructed conditional generative adversarial network;
3.2) inputting the face images in the data set and artificially specified target attribute values to the generator and the discriminator in batches;
3.3) calculating the respective loss values from the respective outputs of the generator and the discriminator and their respective training targets;
3.4) calculating the respective gradient values from the respective loss values of the generator and the discriminator and the Adam optimizer;
3.5) performing back-propagation according to the respective gradient values of the generator and the discriminator and adjusting the respective parameters.
5. The face attribute editing method based on a balanced stacked generative adversarial network according to claim 1, wherein: in step 4), all trained generators are stacked into a stacked structure, and the corresponding face attributes of a preprocessed unknown face image are edited sequentially through the following steps:
4.1) stacking all generators;
4.2) preprocessing the unknown face image and artificially setting several target attribute values;
4.3) inputting the face image and the target attribute values to the generators one by one, each generator changing its corresponding attribute, generating a residual image, combining it with its input face image, and outputting a complete face image; the first generator receives the preprocessed unknown face image, each subsequent generator receives the complete face image output by the previous generator, and the output of the last generator is the complete face image with all attributes edited.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010521351.2A CN111914617B (en) | 2020-06-10 | 2020-06-10 | Face attribute editing method based on a balanced stacked generative adversarial network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111914617A true CN111914617A (en) | 2020-11-10 |
CN111914617B CN111914617B (en) | 2024-05-07 |
Family
ID=73237577
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010521351.2A Active CN111914617B (en) | 2020-06-10 | 2020-06-10 | Face attribute editing method based on balanced stack type generation type countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111914617B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105678340A (en) * | 2016-01-20 | 2016-06-15 | 福州大学 | Automatic image marking method based on enhanced stack type automatic encoder |
CN108932693A (en) * | 2018-06-15 | 2018-12-04 | 中国科学院自动化研究所 | Face editor complementing method and device based on face geological information |
CN109377535A (en) * | 2018-10-24 | 2019-02-22 | 电子科技大学 | Facial attribute automatic edition system, method, storage medium and terminal |
CN109615582A (en) * | 2018-11-30 | 2019-04-12 | 北京工业大学 | A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description |
Non-Patent Citations (1)
Title |
---|
YU HE; YU NANNAN: "Fast-convergence GAN for chest X-ray image data augmentation based on multi-size convolution and residual units", Journal of Signal Processing (信号处理), no. 12, 25 December 2019 (2019-12-25) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112991150A (en) * | 2021-02-08 | 2021-06-18 | 北京字跳网络技术有限公司 | Style image generation method, model training method, device and equipment |
CN112990078A (en) * | 2021-04-02 | 2021-06-18 | 深圳先进技术研究院 | Facial expression generation method based on generation type confrontation network |
Also Published As
Publication number | Publication date |
---|---|
CN111914617B (en) | 2024-05-07 |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |