CN111914617A - Face attribute editing method based on a balanced stacked generative adversarial network - Google Patents

Face attribute editing method based on a balanced stacked generative adversarial network

Info

Publication number
CN111914617A
CN111914617A, CN202010521351.2A
Authority
CN
China
Prior art keywords
attribute
image
loss
face
discriminator
Prior art date
Legal status
Granted
Application number
CN202010521351.2A
Other languages
Chinese (zh)
Other versions
CN111914617B (en)
Inventor
王啸天
陈百基
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202010521351.2A
Publication of CN111914617A
Application granted
Publication of CN111914617B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]

Abstract

The invention discloses a face attribute editing method based on a balanced stacked generative adversarial network, comprising the following steps: 1) acquiring a data set containing face images and attribute labels, and preprocessing the data set; 2) constructing, according to the size of the face images, multiple conditional generative adversarial networks, each consisting of a paired generator and discriminator; 3) using the preprocessed data set, independently training all conditional generative adversarial networks for the different face attributes with weighted learning and residual-image generation; 4) stacking all trained generators into a stacked structure and sequentially editing the corresponding face attributes of a preprocessed unknown face image. Because weighted learning and residual-image generation are applied during training and a stacked structure is adopted during attribute editing, the model effectively alleviates data imbalance, strengthens the editing of minority-class samples, improves image generation in attribute-irrelevant regions, and avoids attribute entanglement.

Description

Face attribute editing method based on a balanced stacked generative adversarial network
Technical Field
The invention relates to the technical fields of image editing and machine learning, and in particular to a face attribute editing method based on a balanced stacked generative adversarial network.
Background
Face attribute editing aims to change specific attributes of a given face image, such as adding glasses, removing a mustache, whitening skin, or even changing gender. This vision task provides semantic controllability of images and fine-grained image transformation. With the rise of selfies on social media, the wide spread of online video, and the intelligent design of game or animation characters, large volumes of face image data are generated on the Internet every year, and the demand for face attribute editing grows increasingly strong. Face attribute editing is therefore widely used in application scenarios such as facial beautification, video restoration, and character synthesis. In addition, face images with edited attributes can be used for data augmentation in other vision-oriented machine learning tasks, such as face recognition, face detection, and face tracking.
In recent years, generative adversarial networks have become a popular model in face attribute editing research owing to their strong image generation capability, and have produced high-fidelity and highly diverse editing results. Given a data set containing face images and attribute labels, a trained generator in a generative adversarial network manipulates specific image attributes to fool a discriminator, which distinguishes real images from false images synthesized by the generator. The quality of edited images improves through the mutual competition of the generator and the discriminator. However, face multi-attribute editing remains challenging because of the huge number of attribute combinations, i.e., there are many attributes and each attribute takes multiple values. As in other machine learning tasks, collecting samples, i.e., pairs of face images and attribute labels, is expensive and limited, so the samples cannot represent all attribute combinations. Owing to this lack and imbalance of samples, current multi-attribute editing methods suffer from three problems. The first is attribute entanglement: editing one attribute changes other attributes, and balancing one attribute affects the balance of others, because existing methods do not have enough samples to distinguish different attributes. The second is that, lacking enough samples for learning, the generated images are unsatisfactory in attribute-irrelevant regions, which manifests as high noise and inaccurate editing. Finally, existing methods edit minority-class attribute values poorly, because they usually treat all samples equally so that minority-class samples are ignored, and the simultaneous learning of multiple attributes prevents them from adjusting the balance individually for each attribute.
Disclosure of Invention
The invention aims to overcome the above deficiencies of the prior art and provides a face attribute editing method based on a balanced stacked generative adversarial network, which uses multiple conditional generative adversarial networks to learn the editing of multiple attributes independently during training and thus avoids attribute entanglement. In addition, each conditional generative adversarial network directly learns a residual image as its target, which improves the generation quality of attribute-irrelevant regions. Meanwhile, weighted learning gives samples with minority-class attribute values larger learning weights, which improves the editing of minority-class attribute values and properly addresses the challenge posed by imbalanced data sets.
To achieve this purpose, the technical solution provided by the invention is as follows: a face attribute editing method based on a balanced stacked generative adversarial network, which solves the attribute imbalance problem by weighted learning and by training multiple conditional generative adversarial networks, solves the attribute entanglement problem by stacking the generators of all trained conditional generative adversarial networks into a stacked structure, and solves the problem of inaccurate image editing by generating residual images. The method comprises the following steps:
1) acquiring a data set containing face images and attribute labels, and preprocessing the data set;
2) constructing, according to the size of the face images, multiple conditional generative adversarial networks, each consisting of a paired generator and discriminator;
3) using the preprocessed data set, independently training all conditional generative adversarial networks for the different face attributes with weighted learning and residual-image generation;
4) stacking all trained generators into a stacked structure, and sequentially editing the corresponding face attributes of a preprocessed unknown face image.
In step 1), the data set is obtained from a face data set published on the Internet. Each face image contains a single face, and the preprocessing comprises cropping, scaling, and normalization, so that the face occupies the main part of the picture and pixel values lie between -1 and 1. A face image is denoted by $x$, and all $x$ form the face image set $\mathcal{X}$, i.e., $x \in \mathcal{X}$. The attribute label $y_j$ denotes the label of the value of the $j$-th face attribute, where $j = 1, 2, \ldots, m$ and $m$ is the number of attributes; all attribute labels $y_j$ form the attribute label set $\mathcal{Y}_j$, i.e., $y_j \in \mathcal{Y}_j$. The $m$ attribute labels together with 1 face image form 1 sample pair.
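As an illustration of this preprocessing, the following is a minimal Python sketch, not part of the patented method itself: it center-crops a single-face image, resizes it, and scales pixel values into [-1, 1]; the 128-pixel output size and the file handling are assumptions made only for the example.

```python
import numpy as np
from PIL import Image

def preprocess_face(path, out_size=128):
    """Crop, scale and normalize one face image so pixel values lie in [-1, 1]."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)                                  # center crop so the face fills the frame
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((out_size, out_size))
    return np.asarray(img, dtype=np.float32) / 127.5 - 1.0   # [0, 255] -> [-1, 1]

# one sample pair = one preprocessed image x plus its m attribute labels y_1 ... y_m
```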
In step 2), multiple conditional generative adversarial networks, each consisting of a paired generator and discriminator, are constructed according to the size of the face image, as follows.

I. Construct the generator $G_j$ with an encoder-decoder structure.

The encoder of $G_j$ consists of several convolutional layers and receives two variables: a face image $x$ from the face image set $\mathcal{X}$ and a target attribute label $\hat{v}_k$ from the target attribute label set $\hat{\mathcal{Y}}_j$, where $\hat{v}_k$ is a value different from the value $v_k$ of the attribute label $y_j$, $y_j$ comes from the attribute label set $\mathcal{Y}_j$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, and $m$ is the number of attributes; 1 attribute label together with 1 face image forms 1 sample pair. $G_j$ maps the two received variables into a latent code $z$, which extracts the features of the face image and removes redundant information. The latent code $z$ is then fed into a decoder consisting of several deconvolution layers to generate a residual image $\hat{r}$ used to change the target face attribute, where $\hat{r}$ is defined as the difference between the face image $x$ and the edited image $\hat{x}$, i.e., $\hat{r} = \hat{x} - x$; $\hat{\mathcal{R}}$ is the set of all $\hat{r}$ and $\hat{\mathcal{X}}$ is the set of all $\hat{x}$. Finally, the edited image $\hat{x}$ is obtained by superimposing $x$ and $\hat{r}$ at the pixel level. Since minority-class samples are easily ignored in an imbalanced data set, each sample in the data set is given a learning weight $\omega$ representing its importance in training; samples with minority-class attribute values receive a large $\omega$, and vice versa. For a face image $x$ whose $j$-th attribute takes the value $v_k$, $\omega$ is defined as:

$$\omega = \frac{\max_{k' \in \{1, \ldots, |\mathcal{Y}_j|\}} |\mathcal{X}^j_{k'}|}{|\mathcal{X}^j_k|}$$

where $|\mathcal{Y}_j|$ is the number of distinct values of the $j$-th attribute and $|\mathcal{X}^j_k|$ is the number of samples whose $j$-th attribute takes the value $k$, i.e.,

$$\mathcal{X}^j_k = \{(x, y_j) \in \mathcal{X} \times \mathcal{Y}_j : y_j = v_k\}$$

where $(x, y_j)$ is the sample pair formed by the face image $x$ and the attribute label $y_j$, $\mathcal{X} \times \mathcal{Y}_j$ is the set of pairs formed by the face image set $\mathcal{X}$ and the attribute label set $\mathcal{Y}_j$, and $|\cdot|$ denotes the cardinality of a set. $\omega$ is the ratio between the largest number of samples over all values of the $j$-th attribute and the number of samples taking the value $v_k$, and is always greater than or equal to 1: the larger $|\mathcal{X}^j_k|$ is, the smaller $\omega$, and vice versa. This weight encourages the model to pay more attention to samples with minority-class attribute values during learning.
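The weight $\omega$ can be precomputed from the attribute labels alone. The sketch below is one possible implementation under the assumption that the labels are stored as a 0/1 matrix with one row per sample and one column per attribute; it is an illustration, not the patent's code.

```python
import numpy as np

def attribute_weights(labels):
    """labels: (n_samples, m) array of 0/1 attribute values.
    Returns w of shape (m, 2), where w[j, k] is the learning weight omega
    for a sample whose j-th attribute takes the value k."""
    n_samples, m = labels.shape
    w = np.zeros((m, 2))
    for j in range(m):
        counts = np.array([(labels[:, j] == k).sum() for k in (0, 1)])
        w[j] = counts.max() / np.maximum(counts, 1)   # largest count / count of this value, always >= 1
    return w

# e.g. if 90% of samples have attribute j = 0, then w[j] is roughly [1.0, 9.0]
```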
II. Construct the complete discriminator $D_j$ from two sub-discriminators.

The discriminator $D_j$ comprises two sub-discriminators: an authenticity discriminator $D_j^{real}$ and a class discriminator $D_j^{cls}$. The authenticity discriminator $D_j^{real}$ judges the authenticity of the input image and predicts the probability that the input image is a real image. The class discriminator $D_j^{cls}$ identifies whether the attribute value of the input image matches the target attribute value and predicts the degree of conformity, where different values of the same attribute are regarded as different classes. The two sub-discriminators share one multi-layer convolutional neural network but have independent two-layer fully connected heads.
III. Construct the training target of the generator.

The adversarial loss $\mathcal{L}^{G}_{adv}$, together with three other loss components, the class loss $\mathcal{L}^{G}_{cls}$, the reconstruction loss $\mathcal{L}^{G}_{rec}$, and the regularization loss $\mathcal{L}^{G}_{reg}$, is taken into account in the generator $G_j$. The adversarial loss $\mathcal{L}^{G}_{adv}$ ensures that the generated image looks real; the class loss $\mathcal{L}^{G}_{cls}$ forces the face image to be correctly edited according to the target attribute value $\hat{v}_k$; the reconstruction loss $\mathcal{L}^{G}_{rec}$ strengthens the ability of the generator $G_j$ to retain the information of attribute-irrelevant regions during editing; and the regularization loss $\mathcal{L}^{G}_{reg}$ measures the L1 norm of the residual image to enhance its sparsity, since the residual image should contain a large number of zero-valued pixels. Finally, the training target $L_G$ of the generator $G_j$ comprises these four components and is defined as:

$$L_G = \mathcal{L}^{G}_{adv} + \lambda_{cls}\,\mathcal{L}^{G}_{cls} + \lambda_{rec}\,\mathcal{L}^{G}_{rec} + \lambda_{reg}\,\mathcal{L}^{G}_{reg}$$

where $\lambda_{cls}$, $\lambda_{rec}$, and $\lambda_{reg}$ are balance parameters that denote the importance of the corresponding losses. Notably, all four loss components take the learning weight $\omega$ into account in order to emphasize samples with minority-class attribute values. These loss components, followed by a code sketch of the combined objective, are as follows:
i. Adversarial loss $\mathcal{L}^{G}_{adv}$

This loss quantifies the degree of realism of the generated image and is defined as:

$$\mathcal{L}^{G}_{adv} = -\,\mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}},\, y_j \sim \mathcal{Y}_j}\big[\,\omega \log D_j^{real}(\hat{x})\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the edited image $\hat{x}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight. The authenticity discriminator $D_j^{real}$ judges the authenticity of the input image; by minimizing this loss, the similarity between the image synthesized by $G_j$ and a real image is improved.
ii. Class loss $\mathcal{L}^{G}_{cls}$

This loss measures the agreement between the edited value of the $j$-th attribute of the face image $x$ and the target attribute value $\hat{v}_k$. It takes the form of a weighted binary cross-entropy loss and is defined as follows:

$$\mathcal{L}^{G}_{cls} = \mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}},\, \hat{v}_k \sim \hat{\mathcal{Y}}_j}\big[\,\omega\, \ell\big(\hat{v}_k,\, D_j^{cls}(\hat{x})\big)\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the target attribute label $\hat{v}_k$ comes from the target attribute label set $\hat{\mathcal{Y}}_j$, the edited image $\hat{x}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight. $\ell$ is the binary cross-entropy function, defined as:

$$\ell(a, b) = -\big[a \log b + (1 - a)\log(1 - b)\big]$$

The class discriminator $D_j^{cls}$ judges whether the $j$-th attribute of the input image has been edited correctly.
iii. Reconstruction loss $\mathcal{L}^{G}_{rec}$

This loss avoids the loss of information when reconstructing attribute-irrelevant regions in the generated image. The reconstruction loss $\mathcal{L}^{G}_{rec}$ measures the difference between the original image $x$ and the reconstructed image $g_j$, where the reconstructed image $g_j$ in the reconstructed image set $\mathcal{G}_j$ is the result of feeding the edited image $\hat{x}$ back through the generator $G_j$ with the original attribute value $v_k$. The reconstruction loss is defined as follows:

$$\mathcal{L}^{G}_{rec} = \mathbb{E}_{x \sim \mathcal{X},\, y_j \sim \mathcal{Y}_j}\big[\,\omega\, \|x - g_j\|_1\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the face image $x$ comes from the face image set $\mathcal{X}$, the reconstructed image $g_j$ comes from the reconstructed image set $\mathcal{G}_j$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L1 norm.
iv. Regularization loss $\mathcal{L}^{G}_{reg}$

In the generator $G_j$, the direct learning target is the residual image rather than a complete image; the residual image represents the local pixel changes of the target attribute and should theoretically be sparse, containing a large number of zero-valued pixels. A regularization loss is therefore introduced and defined as follows:

$$\mathcal{L}^{G}_{reg} = \mathbb{E}_{\hat{r} \sim \hat{\mathcal{R}},\, y_j \sim \mathcal{Y}_j}\big[\,\omega\, \|\hat{r}\|_1\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the residual image $\hat{r}$ comes from the residual image set $\hat{\mathcal{R}}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L1 norm.
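As a concrete illustration of how these four weighted components could be combined, the following PyTorch sketch assumes a generator G that returns a residual image and a discriminator D that returns a realness probability and an attribute probability; this interface and the λ values are assumptions made for the example, not values fixed by the invention.

```python
import torch
import torch.nn.functional as F

def generator_loss(G, D, x, v_hat, v_orig, omega, lam_cls=10.0, lam_rec=100.0, lam_reg=1.0):
    """x: real images; v_hat: target attribute values; v_orig: original values;
    omega: per-sample learning weights; all tensors have batch dimension B."""
    r_hat = G(x, v_hat)                                   # residual image
    x_hat = x + r_hat                                     # edited image, pixel-level addition
    real_prob, cls_prob = D(x_hat)

    l_adv = -(omega * torch.log(real_prob + 1e-8)).mean()                       # realism
    l_cls = (omega * F.binary_cross_entropy(cls_prob, v_hat, reduction="none")).mean()
    g_rec = x_hat + G(x_hat, v_orig)                      # edit back toward the original value
    l_rec = (omega * (x - g_rec).abs().mean(dim=(1, 2, 3))).mean()              # keep unrelated regions
    l_reg = (omega * r_hat.abs().mean(dim=(1, 2, 3))).mean()                    # sparse residual
    return l_adv + lam_cls * l_cls + lam_rec * l_rec + lam_reg * l_reg
```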
IV. Construct the training target of the discriminator $D_j$.

The adversarial loss and the class loss are taken into account in the training target of the discriminator $D_j$, which is defined as follows:

$$L_D = \mathcal{L}^{D}_{adv} + \lambda'_{cls}\,\mathcal{L}^{D}_{cls}$$

where $\lambda'_{cls}$ is a balance parameter denoting the importance of the class loss. The two losses, and a matching code sketch after them, are as follows:
i. Adversarial loss:

This loss encourages the authenticity discriminator $D_j^{real}$ to distinguish real images from generated false images. Similar to the adversarial loss applied to the generator $G_j$, it quantifies the authenticity of the input image using the weighted logarithm of the judgment of $D_j^{real}$; a smaller adversarial loss indicates a poorer generator. The adversarial loss is defined as follows:

$$\mathcal{L}^{D}_{adv} = -\,\mathbb{E}_{x \sim \mathcal{X},\, y_j \sim \mathcal{Y}_j}\big[\,\omega \log D_j^{real}(x)\,\big] - \mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}},\, y_j \sim \mathcal{Y}_j}\big[\,\omega \log\big(1 - D_j^{real}(\hat{x})\big)\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the face image $x$ comes from the face image set $\mathcal{X}$, the edited image $\hat{x}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight.
ii. Class loss:

This loss likewise quantifies the agreement between the predicted value and the true value of the $j$-th attribute of the input image; minimizing the class loss of the class discriminator $D_j^{cls}$ ensures that $D_j^{cls}$ judges the attribute value of the input image accurately. The class loss is defined as follows:

$$\mathcal{L}^{D}_{cls} = \mathbb{E}_{x \sim \mathcal{X},\, y_j \sim \mathcal{Y}_j}\big[\,\omega\, \ell\big(y_j,\, D_j^{cls}(x)\big)\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the face image $x$ comes from the face image set $\mathcal{X}$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\ell$ is the binary cross-entropy function defined above. The class discriminator $D_j^{cls}$ judges whether the $j$-th attribute of the input image has been edited correctly.
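A matching sketch of the discriminator objective, under the same assumed interfaces as the generator sketch above:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(G, D, x, y_j, v_hat, omega, lam_cls=1.0):
    """x: real images whose j-th attribute value is y_j; v_hat: target value given to G."""
    real_prob, cls_prob = D(x)
    with torch.no_grad():                                 # the generator is fixed in this step
        x_hat = x + G(x, v_hat)
    fake_prob, _ = D(x_hat)

    # weighted adversarial loss: real images scored high, edited images scored low
    l_adv = -(omega * torch.log(real_prob + 1e-8)).mean() \
            - (omega * torch.log(1.0 - fake_prob + 1e-8)).mean()
    # weighted class loss on real images and their true attribute values
    l_cls = (omega * F.binary_cross_entropy(cls_prob, y_j, reduction="none")).mean()
    return l_adv + lam_cls * l_cls
```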
V. Construct the respective optimizers of the generator and the discriminator.

To improve training stability and speed, both the generator and the discriminator use the Adam optimizer, which combines first-moment and second-moment estimates of the gradient to update the learning step and automatically adjust the learning rate.
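For example, the two Adam optimizers might be configured as below; the learning rate and β values are common GAN settings chosen for illustration, not values prescribed by the invention.

```python
import torch

# G and D are the generator and discriminator modules of one conditional GAN
g_optimizer = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_optimizer = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
```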
In step 3), the training of each conditional generative adversarial network comprises the following steps, summarized by the code sketch after the list:
3.1) loading the preprocessed data set and the constructed conditional generative adversarial network;
3.2) inputting the face images in the data set and the artificially specified target attribute values to the generator and the discriminator in batches;
3.3) calculating the respective loss values from the respective outputs of the generator and the discriminator and their respective training targets;
3.4) calculating the respective gradient values from the respective loss values of the generator and the discriminator and the Adam optimizer;
3.5) performing backpropagation according to the respective gradient values of the generator and the discriminator, and updating their respective parameters.
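Steps 3.1) to 3.5) can be summarized by the following training-loop sketch; it assumes the loss functions and optimizers sketched above, a data loader that yields images, binary attribute labels, and per-sample weights, and that for a binary attribute the target value is simply the opposite of the current value.

```python
def train_one_network(G, D, loader, g_opt, d_opt, epochs=20):
    """Train one conditional GAN (one attribute) with weighted learning."""
    for _ in range(epochs):
        for x, y_j, omega in loader:                      # 3.2) batch of images, labels, weights
            v_hat = 1.0 - y_j                             # target value differs from the current one
            d_loss = discriminator_loss(G, D, x, y_j, v_hat, omega)   # 3.3)
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()        # 3.4) and 3.5)

            g_loss = generator_loss(G, D, x, v_hat, y_j, omega)       # 3.3)
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()        # 3.4) and 3.5)
```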
In step 4), all trained generators are stacked into a stacked structure, and the corresponding face attributes of the preprocessed unknown face image are edited sequentially, comprising the following steps (a sketch of this stage follows the list):
4.1) stacking all generators;
4.2) preprocessing unknown face images and artificially setting a plurality of target attribute values;
4.3) inputting the face image and the target attribute values to the generators one by one; each generator changes its corresponding attribute by generating a residual image, combining it with its input face image, and outputting a complete face image; the first generator receives the preprocessed unknown face image, every other generator receives the complete face image output by the previous generator, and the output of the last generator is the complete face image with all attributes edited.
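A minimal sketch of this stacked editing stage: each trained generator adds its residual to the image produced by the previous generator, and only the generators of the attributes actually being edited are used.

```python
import torch

@torch.no_grad()
def edit_attributes(generators, x, targets):
    """generators: trained G_j for the attributes to edit; targets: their target values;
    x: preprocessed unknown face image of shape (1, 3, H, W)."""
    out = x
    for G_j, v_hat in zip(generators, targets):
        residual = G_j(out, v_hat)        # residual image for this attribute only
        out = out + residual              # the next generator receives the full edited image
    return out                            # complete face image with all requested attributes edited
```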
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention is the first to solve the face multi-attribute editing task with a stacked structure, so that each face attribute can be edited independently and the balance adjustment of one attribute does not affect the balance of the others.
2. The invention is the first to introduce residual-image generation into the multi-attribute editing scenario, which avoids the information loss caused by reconstructing attribute-irrelevant image regions.
3. By applying a learning weight to each sample, the invention suppresses the influence of the imbalanced-data problem and improves the model's ability to edit minority-class attribute values.
4. When an additional attribute is considered or an existing attribute is dropped, the whole model does not need to be retrained from scratch; only part of the conditional generative adversarial networks need to be added or removed, which makes the method flexible.
5. Training a conditional generative adversarial network independently for each single attribute, together with residual-image generation, reduces the learning difficulty of the single-attribute editing task, so each conditional generative adversarial network can adopt a lighter network structure, reducing the training cost.
6. The final image generation depends only on the generators of the conditional generative adversarial networks of the relevant attributes rather than on all generators, which makes the model more efficient.
Drawings
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is a schematic diagram of the training phase of the present invention.
FIG. 3 is a schematic diagram of the image generation stage according to the present invention.
FIG. 4 is a diagram illustrating an example of independently editing a single attribute according to the present invention.
FIG. 5 is a diagram illustrating an example of editing 3 attributes one by one according to the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1 to fig. 3, the face attribute editing method based on a balanced stacked generative adversarial network provided in this embodiment uses a stacked structure, residual-image generation, and weighted learning, and comprises the following steps:
1) Acquire face images $x$ and attribute labels $y_j$, and preprocess them. The data set is obtained from a face data set published on the Internet. Each face image contains a single face and is preprocessed by cropping, scaling, and normalization, so that the face occupies the main part of the picture and pixel values lie between -1 and 1; $x$ denotes a face image, and all $x$ form the face image set $\mathcal{X}$, i.e., $x \in \mathcal{X}$. An attribute label indicates the value (also called the attribute value) of one of several attributes of the face and is denoted by $y_j$, where $j = 1, 2, \ldots, m$. In this example only 8 attributes are used, i.e., $m = 8$, and each attribute is binary (for example, mouth open or closed, glasses present or absent), so each attribute has only two values after processing, 0 and 1, with different imbalance ratios (the ratio between the numbers of samples taking the two values of the same attribute). All attribute labels $y_j$ form the attribute label set $\mathcal{Y}_j$, i.e., $y_j \in \mathcal{Y}_j$. The 8 attribute labels together with 1 face image form 1 sample pair. The samples are obtained from the face data set CelebA published on the web and divided into training data and test data in a fixed proportion, containing 182,637 and 19,962 sample pairs, respectively.

The obtained face images each contain a human face, have three RGB channels, a size of 128 × 128, and the jpg format.
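The following sketch shows one way to read the CelebA attribute annotations and split them as in this embodiment; the annotation file name and its -1/1 label convention follow the public CelebA release, and the 8 chosen attribute column names are assumptions made for illustration.

```python
ATTRS = ["Mouth_Slightly_Open", "Bags_Under_Eyes", "Bushy_Eyebrows", "Rosy_Cheeks",
         "Eyeglasses", "Sideburns", "Pale_Skin", "Mustache"]   # assumed 8 binary attributes

def load_celeba_labels(attr_file="list_attr_celeba.txt"):
    """Return a list of (filename, {attribute: 0/1}); CelebA stores labels as -1/1."""
    with open(attr_file) as f:
        lines = f.read().splitlines()
    header = lines[1].split()                              # attribute names
    cols = [header.index(a) for a in ATTRS]
    samples = []
    for line in lines[2:]:
        parts = line.split()                               # filename followed by the label values
        labels = {a: (1 if int(parts[c + 1]) == 1 else 0) for a, c in zip(ATTRS, cols)}
        samples.append((parts[0], labels))
    return samples

samples = load_celeba_labels()
train, test = samples[:182637], samples[182637:]           # split used in this embodiment
```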
The 8 obtained attributes are shown in Table 1.

TABLE 1. Attributes and corresponding imbalance ratios (the numeric ratios appear only as an image in the original).

Each attribute in Table 1 has two expression forms corresponding to its two attribute values. Mouth: open and closed; eye bags: present and absent; eyebrows: thick and thin; cheek color: rosy and normal; glasses: present and absent; sideburns: present and absent; skin tone: white and normal; mustache: present and absent.
2) According to the size of the face image, construct multiple conditional generative adversarial networks, each consisting of a paired generator and discriminator, as follows:

I. Construct the generator $G_j$ with an encoder-decoder structure containing four convolutional layers and four deconvolution layers, as shown in Table 2.
Table 2 generator structure
Generator
Conv(64,3,2),BN,Leaky ReLU
Conv(128,3,2),BN,Leaky ReLU
Conv(256,3,2),BN,Leaky ReLU
Conv(512,3,2),BN,Leaky ReLU
DeConv(256,3,2),BN,Leaky ReLU
DeConv(128,3,2),BN,Leaky ReLU
DeConv(64,3,2),BN,Leaky ReLU
DeConv(3,3,2),Tanh
In Table 2, Conv(d, k, s) and DeConv(d, k, s) denote a convolutional layer and a deconvolution layer, respectively, with d output channels, kernel size k, and stride s. BN is batch normalization. Leaky ReLU and Tanh are two different activation functions.
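Table 2 could be realised, for instance, as the PyTorch module below. The way the target attribute value is injected (broadcast as an extra input channel) and the Leaky ReLU slope are assumptions, since the table only lists the convolutional backbone.

```python
import torch
import torch.nn as nn

def conv(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, 2, 1), nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

def deconv(c_in, c_out):
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 3, 2, 1, output_padding=1),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

class Generator(nn.Module):
    """Encoder-decoder of Table 2; the output is a residual image in [-1, 1] (Tanh)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(conv(3 + 1, 64), conv(64, 128), conv(128, 256), conv(256, 512))
        self.decoder = nn.Sequential(deconv(512, 256), deconv(256, 128), deconv(128, 64),
                                     nn.ConvTranspose2d(64, 3, 3, 2, 1, output_padding=1), nn.Tanh())

    def forward(self, x, v_hat):
        # broadcast the scalar target attribute value as one extra input channel
        v_map = v_hat.view(-1, 1, 1, 1).expand(-1, 1, x.size(2), x.size(3))
        z = self.encoder(torch.cat([x, v_map], dim=1))     # latent code
        return self.decoder(z)                             # residual image

# edited = x + Generator()(x, v_hat)   # pixel-level superposition of image and residual
```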
II. The discriminator $D_j$ comprises two sub-discriminators: the authenticity discriminator $D_j^{real}$ and the class discriminator $D_j^{cls}$. See Table 3 for details.

TABLE 3. Discriminator structure (the layer listing appears only as an image in the original).

In Table 3, Conv(d, k, s) and DeConv(d, k, s) denote a convolutional layer and a deconvolution layer, respectively, with d output channels, kernel size k, and stride s. BN is batch normalization. FC(c) is a fully connected layer with a c-dimensional output. m is the number of target attributes. Leaky ReLU and Sigmoid are two different activation functions.
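Because the layer table is only available as an image in the original, the discriminator sketch below keeps just what the surrounding text states: a shared convolutional trunk and two independent two-layer fully connected heads, one for authenticity and one for the attribute class; the channel widths are assumptions.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Shared convolutional trunk with two 2-layer FC heads: D_real and D_cls."""
    def __init__(self, m=1, img_size=128):
        super().__init__()
        layers, c_in = [], 3
        for c_out in (64, 128, 256, 512):                  # assumed trunk widths
            layers += [nn.Conv2d(c_in, c_out, 3, 2, 1), nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2)]
            c_in = c_out
        self.trunk = nn.Sequential(*layers, nn.Flatten())
        feat = 512 * (img_size // 16) ** 2
        self.real_head = nn.Sequential(nn.Linear(feat, 512), nn.LeakyReLU(0.2),
                                       nn.Linear(512, 1), nn.Sigmoid())    # authenticity
        self.cls_head = nn.Sequential(nn.Linear(feat, 512), nn.LeakyReLU(0.2),
                                      nn.Linear(512, m), nn.Sigmoid())     # attribute value(s)

    def forward(self, x):
        h = self.trunk(x)
        return self.real_head(h).squeeze(1), self.cls_head(h).squeeze(1)
```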
III. Construct the training target of the generator $G_j$.

The adversarial loss $\mathcal{L}^{G}_{adv}$, together with three other loss components, the class loss $\mathcal{L}^{G}_{cls}$, the reconstruction loss $\mathcal{L}^{G}_{rec}$, and the regularization loss $\mathcal{L}^{G}_{reg}$, is taken into account in the generator $G_j$. Finally, the training target $L_G$ of $G_j$ comprises these four components and is defined as:

$$L_G = \mathcal{L}^{G}_{adv} + \lambda_{cls}\,\mathcal{L}^{G}_{cls} + \lambda_{rec}\,\mathcal{L}^{G}_{rec} + \lambda_{reg}\,\mathcal{L}^{G}_{reg}$$

where $\lambda_{cls}$, $\lambda_{rec}$, and $\lambda_{reg}$ are balance parameters that denote the importance of the corresponding losses. Notably, all four loss components take the learning weight $\omega$ into account in order to emphasize samples with minority-class attribute values. These losses are defined as follows:
i. Adversarial loss $\mathcal{L}^{G}_{adv}$

This loss quantifies the degree of realism of the generated image. It is defined as:

$$\mathcal{L}^{G}_{adv} = -\,\mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}},\, y_j \sim \mathcal{Y}_j}\big[\,\omega \log D_j^{real}(\hat{x})\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the edited image $\hat{x}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight. The authenticity discriminator $D_j^{real}$ judges the authenticity of the input image; by minimizing this loss, the similarity between the image synthesized by $G_j$ and a real image is improved.
ii. Class loss $\mathcal{L}^{G}_{cls}$

This loss measures the agreement between the edited value of the $j$-th attribute of the image $x$ and the target attribute value $\hat{v}_k$. It takes the form of a weighted binary cross-entropy loss and is defined as follows:

$$\mathcal{L}^{G}_{cls} = \mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}},\, \hat{v}_k \sim \hat{\mathcal{Y}}_j}\big[\,\omega\, \ell\big(\hat{v}_k,\, D_j^{cls}(\hat{x})\big)\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the target attribute label $\hat{v}_k$ comes from the target attribute label set $\hat{\mathcal{Y}}_j$, the edited image $\hat{x}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight. $\ell$ is the binary cross-entropy function, defined as:

$$\ell(a, b) = -\big[a \log b + (1 - a)\log(1 - b)\big]$$

The class discriminator $D_j^{cls}$ judges whether the $j$-th attribute of the input image has been edited correctly.
iii. Reconstruction loss $\mathcal{L}^{G}_{rec}$

This loss avoids the loss of information when reconstructing attribute-irrelevant regions in the generated image. The reconstruction loss $\mathcal{L}^{G}_{rec}$ measures the difference between the original image $x$ and the reconstructed image $g_j$, where the reconstructed image $g_j$ in the reconstructed image set $\mathcal{G}_j$ is the result of feeding the edited image $\hat{x}$ back through the generator $G_j$ with the original attribute value $v_k$. The reconstruction loss is defined as follows:

$$\mathcal{L}^{G}_{rec} = \mathbb{E}_{x \sim \mathcal{X},\, y_j \sim \mathcal{Y}_j}\big[\,\omega\, \|x - g_j\|_1\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the face image $x$ comes from the face image set $\mathcal{X}$, the reconstructed image $g_j$ comes from the reconstructed image set $\mathcal{G}_j$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L1 norm.
iv. Regularization loss $\mathcal{L}^{G}_{reg}$

In the generator $G_j$, the direct learning target is the residual image rather than a complete image; it represents the local pixel changes of the target attribute and should theoretically be sparse, with a large number of zero-valued pixels. A regularization loss is therefore introduced and defined as follows:

$$\mathcal{L}^{G}_{reg} = \mathbb{E}_{\hat{r} \sim \hat{\mathcal{R}},\, y_j \sim \mathcal{Y}_j}\big[\,\omega\, \|\hat{r}\|_1\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the residual image $\hat{r}$ comes from the residual image set $\hat{\mathcal{R}}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L1 norm.
IV. Construct the training target of the discriminator $D_j$.

The adversarial loss and the class loss are taken into account in the training target of the discriminator $D_j$, which is defined as follows:

$$L_D = \mathcal{L}^{D}_{adv} + \lambda'_{cls}\,\mathcal{L}^{D}_{cls}$$

where $\lambda'_{cls}$ is a balance parameter denoting the importance of the class loss. The two losses are defined as follows:
i. Adversarial loss:

This loss encourages the authenticity discriminator $D_j^{real}$ to distinguish real images from generated false images. Similar to the adversarial loss applied to the generator $G_j$, it quantifies the authenticity of the input image using the weighted logarithm of the judgment of $D_j^{real}$; a smaller adversarial loss indicates a poorer generator. The adversarial loss is defined as follows:

$$\mathcal{L}^{D}_{adv} = -\,\mathbb{E}_{x \sim \mathcal{X},\, y_j \sim \mathcal{Y}_j}\big[\,\omega \log D_j^{real}(x)\,\big] - \mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}},\, y_j \sim \mathcal{Y}_j}\big[\,\omega \log\big(1 - D_j^{real}(\hat{x})\big)\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the face image $x$ comes from the face image set $\mathcal{X}$, the edited image $\hat{x}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight.
ii. Class loss:

This loss likewise quantifies the agreement between the predicted value and the true value of the $j$-th attribute of the input image; minimizing the class loss of the class discriminator $D_j^{cls}$ ensures that $D_j^{cls}$ judges the attribute value of the input image accurately. The class loss is defined as follows:

$$\mathcal{L}^{D}_{cls} = \mathbb{E}_{x \sim \mathcal{X},\, y_j \sim \mathcal{Y}_j}\big[\,\omega\, \ell\big(y_j,\, D_j^{cls}(x)\big)\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the face image $x$ comes from the face image set $\mathcal{X}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\ell$ is the binary cross-entropy function defined above. The class discriminator $D_j^{cls}$ judges whether the $j$-th attribute of the input image has been edited correctly.
V. Construct the optimizers of the generator $G_j$ and the discriminator $D_j$. To improve training stability and speed, both the generator and the discriminator use the Adam optimizer.
3) As shown in FIG. 2, the method uses the preprocessed data set to independently train all conditional generative adversarial networks for the different face attributes with weighted learning and residual-image generation.
Residual-image generation and weighted learning are adopted during training to address attribute entanglement and the poor editing of minority-class attribute values. For a face image $x$ whose $j$-th attribute takes the value $v_k$, the learning weight $\omega$ is defined as:

$$\omega = \frac{\max_{k' \in \{1, \ldots, |\mathcal{Y}_j|\}} |\mathcal{X}^j_{k'}|}{|\mathcal{X}^j_k|}$$

where $|\mathcal{Y}_j|$ denotes the number of distinct values of the $j$-th attribute and $|\mathcal{X}^j_k|$ denotes the number of samples whose $j$-th attribute takes the value $k$, i.e.,

$$\mathcal{X}^j_k = \{(x, y_j) \in \mathcal{X} \times \mathcal{Y}_j : y_j = v_k\}$$

where $(x, y_j)$ is the sample pair formed by the face image $x$ and the attribute label $y_j$, $\mathcal{X} \times \mathcal{Y}_j$ is the set of pairs formed by the face image set $\mathcal{X}$ and the attribute label set $\mathcal{Y}_j$, and $|\cdot|$ denotes the cardinality of a set. The learning weight has two properties: the more samples a given attribute value has, the smaller its weight; and the weight is always greater than or equal to one. From the above formula for $\omega$ and the numbers of samples (or imbalance ratios) of the different attribute values, the weights shown in Table 4 can be calculated.

TABLE 4. Weight values of all attribute values (the numeric values appear only as an image in the original).
The whole training process comprises the following steps:
3.1) loading the preprocessed data set and the constructed conditional generative adversarial network;
3.2) inputting the face images in the data set and the artificially specified target attribute values to the generator and the discriminator in batches;
3.3) calculating the respective loss values from the respective outputs of the generator and the discriminator and their respective training targets;
3.4) calculating the gradient values of the respective parameters from the respective loss values of the generator and the discriminator and the Adam optimizer;
3.5) performing backpropagation according to the respective gradient values of the generator and the discriminator, and updating their respective parameters.
4) As shown in fig. 3, the method stacks all trained generators into a stacked structure and sequentially edits the corresponding face attributes of the preprocessed unknown face image, comprising the following steps:
4.1) stacking all generators;
4.2) preprocessing unknown face images and artificially setting a plurality of target attribute values;
4.3) inputting the face image and the target attribute values to the generators one by one; each generator changes its corresponding attribute by generating a residual image, combining it with its input face image, and outputting a complete face image. The first generator receives the preprocessed unknown face image, while every other generator receives the complete face image output by the previous generator. The output of the last generator is the complete face image with all attributes edited.
Finally, example results such as those shown in fig. 4 and fig. 5 can be obtained with the method of the invention. FIG. 4 shows examples of the invention editing single attributes independently. When the conditional generative adversarial networks are not stacked, the invention can independently and correctly edit the 8 attributes of the input image (eye bags, eyebrows, mouth, mustache, sideburns, glasses, cheek color, skin tone), with good editing of minority-class attribute values, accurate editing, low noise, and no mutual interference between different attributes, which can be verified through the residual images. FIG. 5 shows an example of editing 3 attributes (mouth, mustache, skin tone) one by one. When the conditional generative adversarial networks are stacked, the invention edits the 3 attributes of the input image one by one and finally completes all 3 edits, again with good editing of minority-class attribute values, accurate editing, low noise, and no mutual interference between attributes, as confirmed by the residual images. These advantages jointly reflect the roles of the stacked structure, weighted learning, and residual-image generation.
The above embodiment is merely a preferred embodiment of the invention, and the scope of the invention is not limited to it; any change made according to the shape and principle of the invention shall fall within the protection scope of the invention.

Claims (5)

1. A face attribute editing method based on a balanced stacked generative adversarial network, characterized in that the method solves the attribute imbalance problem by weighted learning and by training multiple conditional generative adversarial networks, solves the attribute entanglement problem by stacking the generators of all trained conditional generative adversarial networks into a stacked structure, and solves the problem of inaccurate image editing by generating residual images; the method comprises the following steps:
1) acquiring a data set containing face images and attribute labels, and preprocessing the data set;
2) constructing, according to the size of the face images, multiple conditional generative adversarial networks, each consisting of a paired generator and discriminator;
3) using the preprocessed data set, independently training all conditional generative adversarial networks for the different face attributes with weighted learning and residual-image generation;
4) stacking all trained generators into a stacked structure, and sequentially editing the corresponding face attributes of a preprocessed unknown face image.
2. The face attribute editing method based on a balanced stacked generative adversarial network according to claim 1, characterized in that: in step 1), the data set is obtained from a face data set published on the Internet; each face image contains a single face, and the preprocessing comprises cropping, scaling, and normalization, so that the face occupies the main part of the picture and pixel values lie between -1 and 1; a face image is denoted by $x$, and all $x$ form the face image set $\mathcal{X}$, i.e., $x \in \mathcal{X}$; the attribute label $y_j$ denotes the label of the value of the $j$-th face attribute, where $j = 1, 2, \ldots, m$ and $m$ is the number of attributes; all attribute labels $y_j$ form the attribute label set $\mathcal{Y}_j$, i.e., $y_j \in \mathcal{Y}_j$; the $m$ attribute labels together with 1 face image form 1 sample pair.
3. The face attribute editing method based on a balanced stacked generative adversarial network according to claim 1, characterized in that: in step 2), multiple conditional generative adversarial networks, each consisting of a paired generator and discriminator, are constructed according to the size of the face image, as follows:

I. Construct the generator $G_j$ with an encoder-decoder structure.

The encoder of $G_j$ consists of several convolutional layers and receives two variables: a face image $x$ from the face image set $\mathcal{X}$ and a target attribute label $\hat{v}_k$ from the target attribute label set $\hat{\mathcal{Y}}_j$, where $\hat{v}_k$ is a value different from the value $v_k$ of the attribute label $y_j$, $y_j$ comes from the attribute label set $\mathcal{Y}_j$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, and $m$ is the number of attributes; 1 attribute label together with 1 face image forms 1 sample pair; $G_j$ maps the two received variables into a latent code $z$, which extracts the features of the face image and removes redundant information; the latent code $z$ is then fed into a decoder consisting of several deconvolution layers to generate a residual image $\hat{r}$ used to change the target face attribute, where $\hat{r}$ is defined as the difference between the face image $x$ and the edited image $\hat{x}$, i.e., $\hat{r} = \hat{x} - x$, $\hat{\mathcal{R}}$ is the set of all $\hat{r}$, and $\hat{\mathcal{X}}$ is the set of all $\hat{x}$; finally, the edited image $\hat{x}$ is obtained by superimposing $x$ and $\hat{r}$ at the pixel level; since minority-class samples are easily ignored in an imbalanced data set, each sample in the data set is given a learning weight $\omega$ representing its importance in training, where samples with minority-class attribute values receive a large $\omega$, and vice versa; for a face image $x$ whose $j$-th attribute takes the value $v_k$, $\omega$ is defined as:

$$\omega = \frac{\max_{k' \in \{1, \ldots, |\mathcal{Y}_j|\}} |\mathcal{X}^j_{k'}|}{|\mathcal{X}^j_k|}$$

where $|\mathcal{Y}_j|$ is the number of distinct values of the $j$-th attribute and $|\mathcal{X}^j_k|$ is the number of samples whose $j$-th attribute takes the value $k$, i.e.,

$$\mathcal{X}^j_k = \{(x, y_j) \in \mathcal{X} \times \mathcal{Y}_j : y_j = v_k\}$$

where $(x, y_j)$ is the sample pair formed by the face image $x$ and the attribute label $y_j$, $\mathcal{X} \times \mathcal{Y}_j$ is the set of pairs formed by the face image set $\mathcal{X}$ and the attribute label set $\mathcal{Y}_j$, and $|\cdot|$ denotes the cardinality of a set; $\omega$ is the ratio between the largest number of samples over all values of the $j$-th attribute and the number of samples taking the value $v_k$, and is always greater than or equal to 1: the larger $|\mathcal{X}^j_k|$ is, the smaller $\omega$, and vice versa; this weight encourages the model to pay more attention to samples with minority-class attribute values during learning;

II. Construct the complete discriminator $D_j$ from two sub-discriminators.

The discriminator $D_j$ comprises two sub-discriminators: an authenticity discriminator $D_j^{real}$ and a class discriminator $D_j^{cls}$; the authenticity discriminator $D_j^{real}$ judges the authenticity of the input image and predicts the probability that the input image is a real image; the class discriminator $D_j^{cls}$ identifies whether the attribute value of the input image matches the target attribute value and predicts the degree of conformity, where different values of the same attribute are regarded as different classes; the two sub-discriminators share one multi-layer convolutional neural network but have independent two-layer fully connected heads;
III. Construct the training target of the generator.

The adversarial loss $\mathcal{L}^{G}_{adv}$, together with three other loss components, the class loss $\mathcal{L}^{G}_{cls}$, the reconstruction loss $\mathcal{L}^{G}_{rec}$, and the regularization loss $\mathcal{L}^{G}_{reg}$, is taken into account in the generator $G_j$; the adversarial loss $\mathcal{L}^{G}_{adv}$ ensures that the generated image looks real; the class loss $\mathcal{L}^{G}_{cls}$ forces the face image to be correctly edited according to the target attribute value $\hat{v}_k$; the reconstruction loss $\mathcal{L}^{G}_{rec}$ strengthens the ability of the generator $G_j$ to retain the information of attribute-irrelevant regions during editing; the regularization loss $\mathcal{L}^{G}_{reg}$ measures the L1 norm of the residual image to enhance its sparsity, since the residual image should contain a large number of zero-valued pixels; finally, the training target $L_G$ of the generator $G_j$ comprises these four components and is defined as:

$$L_G = \mathcal{L}^{G}_{adv} + \lambda_{cls}\,\mathcal{L}^{G}_{cls} + \lambda_{rec}\,\mathcal{L}^{G}_{rec} + \lambda_{reg}\,\mathcal{L}^{G}_{reg}$$

where $\lambda_{cls}$, $\lambda_{rec}$, and $\lambda_{reg}$ are balance parameters that denote the importance of the corresponding losses; notably, all four loss components take the learning weight $\omega$ into account in order to emphasize samples with minority-class attribute values; these loss components are as follows:
i. Adversarial loss $\mathcal{L}^{G}_{adv}$

This loss quantifies the degree of realism of the generated image and is defined as:

$$\mathcal{L}^{G}_{adv} = -\,\mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}},\, y_j \sim \mathcal{Y}_j}\big[\,\omega \log D_j^{real}(\hat{x})\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the edited image $\hat{x}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight; the authenticity discriminator $D_j^{real}$ judges the authenticity of the input image; by minimizing this loss, the similarity between the image synthesized by $G_j$ and a real image is improved;

ii. Class loss $\mathcal{L}^{G}_{cls}$

This loss measures the agreement between the edited value of the $j$-th attribute of the face image $x$ and the target attribute value $\hat{v}_k$; it takes the form of a weighted binary cross-entropy loss and is defined as follows:

$$\mathcal{L}^{G}_{cls} = \mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}},\, \hat{v}_k \sim \hat{\mathcal{Y}}_j}\big[\,\omega\, \ell\big(\hat{v}_k,\, D_j^{cls}(\hat{x})\big)\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the target attribute label $\hat{v}_k$ comes from the target attribute label set $\hat{\mathcal{Y}}_j$, the edited image $\hat{x}$ comes from the edited image set $\hat{\mathcal{X}}$, $v_k$ denotes that the $j$-th attribute takes the value $k$, $j = 1, 2, \ldots, m$, $m$ is the number of attributes, and $\omega$ is the learning weight; $\ell$ is the binary cross-entropy function, defined as:

$$\ell(a, b) = -\big[a \log b + (1 - a)\log(1 - b)\big]$$

The class discriminator $D_j^{cls}$ judges whether the $j$-th attribute of the input image has been edited correctly;
iii. Reconstruction loss $\mathcal{L}^{G}_{rec}$

This loss avoids the loss of information when reconstructing attribute-irrelevant regions in the generated image; the reconstruction loss $\mathcal{L}^{G}_{rec}$ measures the difference between the original image $x$ and the reconstructed image $g_j$, where the reconstructed image $g_j$ in the reconstructed image set $\mathcal{G}_j$ is the result of feeding the edited image $\hat{x}$ back through the generator $G_j$ with the original attribute value $v_k$; the reconstruction loss is defined as follows:

$$\mathcal{L}^{G}_{rec} = \mathbb{E}_{x \sim \mathcal{X},\, y_j \sim \mathcal{Y}_j}\big[\,\omega\, \|x - g_j\|_1\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the face image $x$ comes from the face image set $\mathcal{X}$, the reconstructed image $g_j$ comes from the reconstructed image set $\mathcal{G}_j$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L1 norm;

iv. Regularization loss $\mathcal{L}^{G}_{reg}$

In the generator $G_j$, the direct learning target is the residual image rather than a complete image; the residual image represents the local pixel changes of the target attribute and should theoretically be sparse, containing a large number of zero-valued pixels; a regularization loss is therefore introduced and defined as follows:

$$\mathcal{L}^{G}_{reg} = \mathbb{E}_{\hat{r} \sim \hat{\mathcal{R}},\, y_j \sim \mathcal{Y}_j}\big[\,\omega\, \|\hat{r}\|_1\,\big]$$

where $\mathbb{E}$ denotes the mathematical expectation, the attribute label $y_j$ comes from the attribute label set $\mathcal{Y}_j$, the residual image $\hat{r}$ comes from the residual image set $\hat{\mathcal{R}}$, $m$ is the number of attributes, $\omega$ is the learning weight, and $\|\cdot\|_1$ denotes the L1 norm;
IV, constructing a discriminator DjThe training target of (1):
the countermeasure loss and the classification loss are taken into account in the discriminator DjIn the training target of (2), the following is defined:
Figure FDA00025322485700000511
in the formula (I), the compound is shown in the specification,
Figure FDA00025322485700000512
is a balance factor, indicating the importance of the class loss; two losses are defined as follows:
i, fight against loss:
this loss is used to encourage the authenticity discriminator Dj realDistinguishing between real and false images generated, and applying to generator GjSimilar to the above antagonistic losses, also using a base Dj realThe weighted logarithm of the discrimination result quantifies the reality of the input image, and smaller countermeasure loss represents poorer performance of the generator, and the countermeasure loss is defined as follows:
Figure FDA0002532248570000061
in the formula (I), the compound is shown in the specification,
Figure FDA0002532248570000062
representing a computational mathematical expectation, attribute tag yjFrom attribute tag collections
Figure FDA0002532248570000063
The face image is from a face image set x and an edited image
Figure FDA0002532248570000064
From edited image sets
Figure FDA0002532248570000065
vkThe facticity discriminator takes the value of the jth attribute k, j is 1,2, …, m, m is the number of attributes, omega is the learning weight
Figure FDA0002532248570000066
Is used for judging the authenticity of the input image;
ii, classification loss:
this loss likewise quantifies how well the predicted value of the j-th attribute of the input image matches its true value; minimizing the classification loss of the class discriminator D_j^cls ensures that D_j^cls judges the attribute value of the input image accurately; the classification loss is defined as follows:

L_cls = E_{x, y_j}[ ω · ℓ(y_j, x) ]

in the formula, E[·] denotes the mathematical expectation, the attribute label y_j is drawn from the attribute label set, the face image x is drawn from the face image set, m is the number of attributes, ω is the learning weight, and ℓ(y_j, x) is a binary cross-entropy function defined as

ℓ(y_j, x) = −[ y_j · log D_j^cls(x) + (1 − y_j) · log(1 − D_j^cls(x)) ]

the class discriminator D_j^cls is used to judge whether the j-th attribute of the input image has been correctly edited;
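The weighted binary cross-entropy classification loss could be sketched as follows, assuming the class discriminator outputs one logit per image for the j-th attribute:

```python
import torch
import torch.nn.functional as F

def discriminator_cls_loss(cls_logits, y_j, omega):
    """Weighted binary cross-entropy between the class discriminator's logit
    prediction for the j-th attribute (shape [B]) and the ground-truth label
    y_j in {0, 1} (float tensor, shape [B]); omega is a per-sample weight ([B])."""
    per_sample = F.binary_cross_entropy_with_logits(cls_logits, y_j, reduction='none')
    return (omega * per_sample).mean()
```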
V, constructing the respective optimizers of the generator and the discriminator:
to improve training stability and speed, both the generator and the discriminator adopt the Adam optimizer, which combines first-moment and second-moment estimates of the gradient to update the learning step size and adjust the learning rate automatically.
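For illustration, the two Adam optimizers might be constructed as below; the learning rate and beta values are common GAN defaults rather than values taken from this patent, and G_j / D_j are placeholder modules standing in for whatever generator and discriminator architectures are used:

```python
import torch

# Placeholder modules standing in for the j-th generator and discriminator.
G_j = torch.nn.Sequential(torch.nn.Conv2d(3, 3, kernel_size=3, padding=1))
D_j = torch.nn.Sequential(torch.nn.Conv2d(3, 1, kernel_size=4, stride=2, padding=1))

# Adam maintains first- and second-moment estimates of the gradient (controlled
# by the beta coefficients) and adapts the effective step size per parameter.
g_optimizer = torch.optim.Adam(G_j.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_optimizer = torch.optim.Adam(D_j.parameters(), lr=2e-4, betas=(0.5, 0.999))
```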
4. The face attribute editing method based on a balanced stacked generative adversarial network according to claim 1, wherein in step 3) the training of each conditional generative adversarial network comprises the following steps:
3.1) loading the preprocessed data set and the constructed conditional generative adversarial network;
3.2) inputting the face images in the data set, together with artificially specified target attribute values, to the generator and the discriminator in batches;
3.3) calculating the respective loss values from the respective outputs of the generator and the discriminator and their respective training targets;
3.4) calculating the respective gradient values from the respective loss values of the generator and the discriminator and the Adam optimizer;
3.5) performing back-propagation according to the respective gradient values of the generator and the discriminator and updating their respective parameters.
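A single training iteration following steps 3.2)–3.5) might look like the sketch below; the generator's calling convention G_j(x, v_k), the loss callables, and the alternating update order are assumptions for illustration only:

```python
import torch

def train_step(G_j, D_j, g_optimizer, d_optimizer, x, v_k, g_loss_fn, d_loss_fn):
    """One illustrative iteration: forward pass, loss computation (3.3),
    gradient computation (3.4), and parameter update via back-propagation (3.5)."""
    # Generator forward pass: residual image plus input gives the edited image.
    residual = G_j(x, v_k)
    edited = x + residual

    # Discriminator update on real images and (detached) edited images.
    d_loss = d_loss_fn(D_j, x, edited.detach(), v_k)
    d_optimizer.zero_grad()
    d_loss.backward()
    d_optimizer.step()

    # Generator update using the discriminator's feedback.
    g_loss = g_loss_fn(D_j, x, edited, residual, v_k)
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
    return g_loss.item(), d_loss.item()
```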
5. The face attribute editing method based on a balanced stacked generative adversarial network according to claim 1, wherein in step 4) all trained generators are stacked to form a stack structure and the corresponding face attributes of a preprocessed unknown face image are edited in sequence, comprising the following steps:
4.1) stacking all generators;
4.2) preprocessing the unknown face image and artificially setting a plurality of target attribute values;
4.3) inputting the face image and the target attribute values to the generators one by one; each generator changes its corresponding attribute, generates a residual image, combines the residual image with its input face image, and outputs a complete face image; the first generator receives the preprocessed unknown face image, each subsequent generator receives the complete face image output by the previous generator, and the output of the last generator is the complete face image with all attributes edited.
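A sketch of the stacked inference in step 4.3), assuming each trained generator can be called as G_j(image, v_k) and returns a residual image:

```python
import torch

def edit_attributes(generators, x, target_values):
    """Sequentially edit attributes with the stacked generators: each generator
    produces a residual image for its own attribute, which is added to its input
    to form the complete face image passed on to the next generator."""
    image = x                                    # preprocessed unknown face image (4.2)
    for G_j, v_k in zip(generators, target_values):
        residual = G_j(image, v_k)               # residual for the j-th attribute
        image = image + residual                 # complete image for the next stage
    return image                                 # all requested attributes edited
```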
CN202010521351.2A 2020-06-10 2020-06-10 Face attribute editing method based on balanced stack type generation type countermeasure network Active CN111914617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010521351.2A CN111914617B (en) 2020-06-10 2020-06-10 Face attribute editing method based on balanced stack type generation type countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010521351.2A CN111914617B (en) 2020-06-10 2020-06-10 Face attribute editing method based on balanced stack type generation type countermeasure network

Publications (2)

Publication Number Publication Date
CN111914617A true CN111914617A (en) 2020-11-10
CN111914617B CN111914617B (en) 2024-05-07

Family

ID=73237577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010521351.2A Active CN111914617B (en) 2020-06-10 2020-06-10 Face attribute editing method based on balanced stack type generation type countermeasure network

Country Status (1)

Country Link
CN (1) CN111914617B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678340A (en) * 2016-01-20 2016-06-15 福州大学 Automatic image marking method based on enhanced stack type automatic encoder
CN108932693A (en) * 2018-06-15 2018-12-04 中国科学院自动化研究所 Face editor complementing method and device based on face geological information
CN109377535A (en) * 2018-10-24 2019-02-22 电子科技大学 Facial attribute automatic edition system, method, storage medium and terminal
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU HE; YU NANNAN: "Fast-convergence GAN chest X-ray image data augmentation based on multi-size convolution and residual units", Journal of Signal Processing (信号处理), no. 12, 25 December 2019 (2019-12-25) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991150A (en) * 2021-02-08 2021-06-18 北京字跳网络技术有限公司 Style image generation method, model training method, device and equipment
CN112990078A (en) * 2021-04-02 2021-06-18 深圳先进技术研究院 Facial expression generation method based on generation type confrontation network

Also Published As

Publication number Publication date
CN111914617B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
Wang et al. Face aging with identity-preserved conditional generative adversarial networks
TWI772805B (en) Method for training generative adversarial network, method for generating image, and computer-readable storage medium
CN108717568B (en) A kind of image characteristics extraction and training method based on Three dimensional convolution neural network
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN110706152B (en) Face illumination migration method based on generation of confrontation network
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN109214298A (en) A kind of Asia women face value Rating Model method based on depth convolutional network
CN111914617B (en) Face attribute editing method based on balanced stack type generation type countermeasure network
CN112541865A (en) Underwater image enhancement method based on generation countermeasure network
CN113724354B (en) Gray image coloring method based on reference picture color style
CN111882516B (en) Image quality evaluation method based on visual saliency and deep neural network
CN113642621A (en) Zero sample image classification method based on generation countermeasure network
CN111915545A (en) Self-supervision learning fusion method of multiband images
CN114581552A (en) Gray level image colorizing method based on generation countermeasure network
CN114004333A (en) Oversampling method for generating countermeasure network based on multiple false classes
CN113112416A (en) Semantic-guided face image restoration method
Wei et al. Universal deep network for steganalysis of color image based on channel representation
CN109947960A (en) The more attribute Combined estimator model building methods of face based on depth convolution
CN113609944A (en) Silent in-vivo detection method
CN114049675B (en) Facial expression recognition method based on light-weight two-channel neural network
CN111489405A (en) Face sketch synthesis system for generating confrontation network based on condition enhancement
Jolly et al. Bringing monochrome to life: A GAN-based approach to colorizing black and white images
CN115035052A (en) Forged face-changing image detection method and system based on identity difference quantification
CN114187380A (en) Color transfer method based on visual saliency and channel attention mechanism
Qian et al. An efficient fuzzy clustering-based color transfer method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant