CN115082292A - Human face multi-attribute editing method based on global attribute editing direction - Google Patents
- Publication number
- CN115082292A (application CN202210628783.2A)
- Authority
- CN
- China
- Prior art keywords: attribute, editing, representing, global, network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses a face multi-attribute editing method based on a global attribute editing direction, comprising the following steps: 1) acquire the data set attribute association graph, the attribute semantic embedding set, and the scale factor; 2) construct a global attribute editing network whose inputs are the data set attribute association graph, the attribute semantic embedding set, and the scale factor, and whose output is the global attribute editing direction; 3) design three target loss functions to optimize the constructed network and save the optimized network as a model; 4) in the model testing stage, perform multi-attribute editing on a face using a user-defined scale factor and the saved model. The method edits face attributes along a single global attribute editing direction, avoiding the repeated single-attribute edits that other methods require, and produces more natural and reasonable multi-attribute editing results with better preservation of facial appearance characteristics.
Description
Technical Field
The invention relates to the technical field of editing hidden-space attributes along editing directions, and in particular to a face multi-attribute editing method based on a global attribute editing direction, which edits multiple attributes of a given real face image to obtain an edited face image with an excellent attribute editing effect and well-preserved facial appearance characteristics.
Background
Face attribute editing covers both coarse-grained edits, such as face aging and face transformation, and fine-grained edits, such as modifying facial expression or hair color. The face editing task therefore plays an important role in daily life and practical applications, and research on it has attracted wide attention from academia and industry in recent years.
Most existing work focuses on single-attribute face editing; research on multi-attribute face editing is scarce. Although existing single-attribute editing methods can reach a multi-attribute result by applying several single-attribute edits in sequence, editing a face attribute repeatedly in this way may push the edited face beyond the editing-space boundary in the hidden space, so that the multi-attribute result suffers from severe artifacts or ghost faces. Meanwhile, some single-attribute editing methods change irrelevant attributes when the face is edited repeatedly, losing the identity information of the face, or fail to edit the intended attributes reasonably because of conflicts between the edited attributes.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a face multi-attribute editing method based on a global attribute editing direction. The method learns the global attribute editing direction with a global attribute editing network, removing the need of single-attribute editing methods to apply one edit per attribute when editing multiple attributes, and at the same time achieves a better multi-attribute editing effect and better preservation of facial appearance characteristics.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: the face multi-attribute editing method based on the global attribute editing direction comprises the following steps:
1) acquiring a data set attribute association diagram, an attribute semantic embedded set and a scale factor;
calculating a data set attribute association diagram according to attribute labels of each face in the CelebA-HQ face data set; inputting a set of required attribute description texts into a pre-trained CLIP text encoder, and outputting to obtain an attribute semantic embedded set; selecting an original face image and a target face image from CelebA-HQ face data set, subjecting the original face image and the target face image to a pre-trained hidden variable encoder to obtain an original hidden variable and a target hidden variable, subjecting the original hidden variable and the target hidden variable to a pre-trained hidden variable attribute classifier respectively to obtain an original attribute score and a target attribute score, and calculating the difference between the original attribute score and the target attribute score to obtain a scale factor;
2) model construction
Constructing a global attribute editing network whose inputs are the data set attribute association graph, the attribute semantic embedding set, and the scale factor, and whose output is the global attribute editing direction. The network consists of an improved graph convolutional neural network and an improved fully connected network. The improved graph convolutional neural network takes the data set attribute association graph and the attribute semantic embedding set as input and outputs the initial attribute editing direction, i.e. an initialized editing direction for every attribute. The improved fully connected network takes the initial attribute editing direction and the scale factor as input and outputs the global attribute editing direction; the scale factor adjusts the editing strength of each initialized attribute editing direction, yielding a more accurate global attribute editing direction;
3) model optimization
In order to better edit the attributes and retain the identity information, three target loss functions are designed to optimize the constructed global attribute editing network, and after optimization, an optimal global attribute editing network is obtained and comprises an optimal improved graph convolution neural network and an optimal improved full-connection network;
4) face property editing
In the model testing stage, the global attribute editing direction output by the optimal global attribute editing network is acted on a given face image in a hidden space, and the multi-attribute edited face image which is excellent in attribute editing effect and can keep identity information is obtained.
Further, the step 1) comprises the following steps:
a. obtaining dataset attribute association graphs
The CelebA-HQ face data set comprises 30000 face images in total, and each face image is labeled with 40 attributes; a label value of 1 on an attribute indicates that the face image has that attribute, and -1 indicates that it does not. For one face image, the attribute association graph A_i, which is essentially an adjacency matrix, is obtained as follows. First, A_i is initialized to all zeros; if the labels on the j-th and k-th attributes of the face image are both 1, the corresponding positions of A_i are set to 1, i.e. A_i[j][k] and A_i[k][j] become 1. All combinations of j and k are traversed, and the main diagonal of the adjacency matrix is set to 1. The resulting adjacency matrix is the attribute association graph A_i of the face image. The data set attribute association graph Â is obtained by summing and normalizing the 30000 per-image attribute association graphs:

Â = D^{-1}·A, where A = Σ_{i=1}^{n} A_i

where A_i denotes the attribute association graph of one face image, n denotes the number of face images in the CelebA-HQ face data set (30000), A denotes the matrix obtained by summing the attribute association graphs of the 30000 images, D^{-1} denotes the inverse of the degree matrix of the summed matrix A, and Â denotes the normalized matrix, i.e. the final data set attribute association graph;
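The construction above can be sketched in numpy; the function names are ours, not the patent's, and a tiny 3-attribute example stands in for the 40 CelebA-HQ attributes:

```python
import numpy as np

def image_attribute_graph(labels):
    """Per-image attribute association graph A_i from a vector of labels in {-1, 1}.

    A_i[j][k] = A_i[k][j] = 1 when attributes j and k are both present;
    the main diagonal is set to 1 for every attribute.
    """
    labels = np.asarray(labels)
    M = len(labels)
    A_i = np.zeros((M, M))
    present = np.flatnonzero(labels == 1)
    for j in present:
        for k in present:
            A_i[j, k] = 1.0          # covers both j != k and j == k
    np.fill_diagonal(A_i, 1.0)       # diagonal is 1 even for absent attributes
    return A_i

def dataset_attribute_graph(label_matrix):
    """Sum the per-image graphs, then normalize with the inverse degree matrix: D^-1 A."""
    A = sum(image_attribute_graph(l) for l in label_matrix)
    D_inv = np.diag(1.0 / A.sum(axis=1))
    return D_inv @ A
```

With the inverse-degree normalization, every row of the resulting graph sums to 1, so each entry can be read as a co-occurrence weight.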
b. attribute semantic embedding set
First, a set of required attribute description texts is initialized; each attribute description text is an English character string describing the corresponding attribute. Passing the English character strings through a pre-trained CLIP text encoder yields the corresponding attribute semantic embeddings, which together form the attribute semantic embedding set, expressed as:

E = [e_1, e_2, ..., e_M]^T

where E denotes the attribute semantic embedding set, M denotes the number of attributes, T denotes the transposition of the vector, e_1 denotes the semantic embedding of the first attribute, e_2 that of the second attribute, and e_M that of the M-th attribute;
c. scaling factor
Two face images are randomly selected from the CelebA-HQ face data set as the original image I_o and the target image I_t. Passing I_o and I_t through the pre-trained hidden variable encoder gives the original hidden variable w_o and the target hidden variable w_t, which are the mappings of I_o and I_t in the hidden space. Passing w_o and w_t through the pre-trained hidden variable attribute classifier gives the original attribute score S_o and the target attribute score S_t, expressed as:

S_o = C(w_o) = [s_1^o, s_2^o, ..., s_M^o]^T
S_t = C(w_t) = [s_1^t, s_2^t, ..., s_M^t]^T

where w_o and w_t denote the original and target hidden variables, C denotes the pre-trained hidden variable attribute classifier, S_o denotes the original attribute score and S_t the target attribute score; S_o and S_t are vectors of length M, where M denotes the number of attributes and also the length of the attribute score vector, T denotes the transpose of the vector, and s_i^o and s_i^t denote the values of S_o and S_t at the i-th position.

After obtaining the original attribute score S_o and the target attribute score S_t, the scale factor can be calculated, expressed as:

α_i = s_i^t - s_i^o, i = 1, ..., M
Alpha = [α_1, α_2, ..., α_M]^T

where s_i^t denotes the value of the target attribute score at the i-th position, s_i^o the value of the original attribute score at the i-th position, Alpha denotes the scale factor, α_i the value of the scale factor at the i-th position, M the number of attributes, and T the transpose of the vector.
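A minimal sketch of the score-and-difference computation: the fixed random linear layer below is a hypothetical stand-in for the pre-trained hidden variable attribute classifier C, used only to make the example runnable.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical stand-in for the pre-trained classifier C: a fixed linear map
# followed by sigmoid, producing M = 40 attribute scores in [0, 1].
rng = np.random.default_rng(0)
W_cls = rng.standard_normal((512, 40)) * 0.01

def attribute_scores(w):
    """C(w): map a (flattened) hidden variable to 40 scores in [0, 1]."""
    return sigmoid(w @ W_cls)

def scale_factor(s_o, s_t):
    """alpha_i = s_i^t - s_i^o: signed per-attribute editing strength."""
    return s_t - s_o
```

A positive alpha_i asks the edit to strengthen attribute i, a negative one to weaken it, and a value near zero leaves it untouched.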
Further, the step 2) comprises the following steps:
constructing a global attribute editing network, wherein the input of the network is a data set attribute association diagram, an attribute semantic embedded set and a scale factor, the output is a global attribute editing direction, and the global attribute editing network consists of an improved graph convolution neural network and an improved full-connection network, and is specifically represented as follows:
a. improved graph convolution neural network
Construct an improved graph convolutional neural network comprising two network modules, denoted block1 and block2. The input of block1 is the obtained data set attribute association graph and the attribute semantic embedding set, and its output is an intermediate variable; the input of block2 is the intermediate variable output by block1 together with the data set attribute association graph, and its output is the initial attribute editing direction. The whole process is represented as:

N_init = σ(Â · σ(Â · E · W_1) · W_2)

where N_init denotes the initial attribute editing direction output by the improved graph convolutional neural network, Â denotes the data set attribute association graph, E denotes the attribute semantic embedding set, W_1 and W_2 denote the weight parameters to be learned in the improved graph convolutional neural network, · denotes the matrix product, and σ(·) denotes a nonlinear activation function;
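Under this two-block reading, the forward pass can be sketched as follows; the Leaky-ReLU activation and the per-block weight split are assumptions, since the patent only names a generic nonlinearity σ(·):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gcn_block(A_hat, X, W):
    """One graph-convolution block: sigma(A_hat @ X @ W)."""
    return leaky_relu(A_hat @ X @ W)

def initial_edit_directions(A_hat, E, W1, W2):
    """block1 then block2: association graph + semantic embeddings
    -> one initial editing direction per attribute (rows of the output)."""
    H = gcn_block(A_hat, E, W1)      # block1: intermediate variable
    return gcn_block(A_hat, H, W2)   # block2: initial attribute editing directions
```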
b. improved fully connected network
Construct an improved fully connected network comprising two fully connected layers, denoted Linear1 and Linear2. The input of Linear1 is the initial attribute editing direction scaled by the scale factor, and its activation function is the Leaky-ReLU function; the output of Linear1 is the input of Linear2, which has no activation function and outputs the final global attribute editing direction. The whole process is represented as:

N_g = F(N_init × Alpha)

where N_g denotes the global attribute editing direction output by the improved fully connected network, F denotes the improved fully connected network, N_init denotes the initial attribute editing direction with dimension [40,512], Alpha denotes the scale factor with dimension [40,1], and × denotes the product operation; specifically, each 512-dimensional vector in N_init is multiplied by the corresponding value in Alpha.
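The product N_init × Alpha described above is a row-wise broadcast; a minimal sketch (function name is ours):

```python
import numpy as np

def scale_directions(N_init, alpha):
    """Multiply each attribute's direction (a row of N_init, shape [M, 512])
    by that attribute's scalar in alpha (shape [M] or [M, 1])."""
    alpha = np.reshape(alpha, (-1, 1))   # [M, 1] broadcasts over the 512 dims
    return N_init * alpha
```

Rows whose scale factor is zero are suppressed entirely, which is how unedited attributes contribute nothing to the global direction.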
Further, in step 3), in order to edit the attributes well while retaining identity information, three objective loss functions are designed to optimize the constructed global attribute editing network; after optimization, the optimal global attribute editing network is obtained, comprising the optimal improved graph convolutional neural network and the optimal improved fully connected network. The optimization is a gradient descent algorithm applied to the weight parameters of the constructed network, driving the values of the three objective functions as low as possible. The three objective loss functions are expressed as follows:
a. multi-attribute edit loss
Given the original hidden variable w_o and the target hidden variable w_t, the hidden variable attribute classifier yields the original attribute score S_o and the target attribute score S_t. Adding the obtained global attribute editing direction to the original hidden variable w_o gives the edited hidden variable, and passing the edited hidden variable through the pre-trained hidden variable attribute classifier gives the edited attribute score:

S_e = C(w_o + N_g)

where S_e denotes the edited attribute score, C the hidden variable attribute classifier, w_o the original hidden variable, and N_g the global attribute editing direction output by the global attribute editing network;

To make the edited attribute score S_e as close as possible to the target attribute score S_t, the multi-attribute editing loss is designed. With cond_i denoting the condition that one of s_i^t and s_i^o is greater than 0.5 and the other is less than 0.5 (i.e. attribute i is being edited), it is expressed as:

L_mae = -Σ_{i=1}^{M} 1[cond_i] · ( s_i^t · log s_i^e + (1 - s_i^t) · log(1 - s_i^e) )

where L_mae denotes the multi-attribute editing loss, s_i^t the value of the target attribute score at position i, s_i^e the value of the edited attribute score at position i, s_i^o the value of the original attribute score at position i, and log the logarithm operation;
b. multi-attribute retention loss
In order to ensure that attributes that are not edited do not change, the multi-attribute retention loss is designed. With cond_i as above, it is expressed as:

L_map = Σ_{i=1}^{M} 1[not cond_i] · || s_i^e - s_i^o ||_2

where L_map denotes the multi-attribute retention loss, s_i^t the value of the target attribute score at position i, s_i^e the value of the edited attribute score at position i, s_i^o the value of the original attribute score at position i, and ||·||_2 the l_2-norm;
c. space conservation loss
In order to prevent the original hidden variables from being excessively changed during editing, the space conservation loss is designed, and is specifically expressed as follows:
L_sp = ||N_g||_2

where L_sp denotes the space conservation loss, N_g the global attribute editing direction, and ||·||_2 the l_2-norm.
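Reading the first two losses as masked by whether an attribute is being edited (scores straddling 0.5), the three losses can be sketched in numpy; the masks and function names are our reconstruction, since the patent only lists the symbols each loss uses:

```python
import numpy as np

def multi_attribute_edit_loss(s_t, s_e, s_o, eps=1e-8):
    """Cross-entropy pulling edited scores toward the targets, on the
    attributes being edited (target and original scores disagree at 0.5)."""
    edited = (s_t > 0.5) != (s_o > 0.5)
    bce = -(s_t * np.log(s_e + eps) + (1 - s_t) * np.log(1 - s_e + eps))
    return float(np.sum(bce[edited]))

def multi_attribute_retention_loss(s_t, s_e, s_o):
    """l2 penalty keeping the unedited attributes at their original scores."""
    kept = (s_t > 0.5) == (s_o > 0.5)
    return float(np.linalg.norm((s_e - s_o)[kept], ord=2))

def space_conservation_loss(N_g):
    """L_sp = ||N_g||_2: keeps the editing direction small in hidden space."""
    return float(np.linalg.norm(np.ravel(N_g), ord=2))
```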
Further, in step 4), in the model testing stage, the global attribute editing direction output by the optimal global attribute editing network is applied to a given face image in the hidden space to obtain a multi-attribute edited face image which has an excellent attribute editing effect and can maintain identity information, including the following steps:
4.1) in the testing stage, the data set attribute association graph and the attribute semantic embedding set are passed through the optimal improved graph convolutional neural network to obtain the initial attribute editing direction, expressed as:

N_init = M_GCN(Â, E)

where N_init denotes the initial attribute editing direction, M_GCN the optimal improved graph convolutional neural network, Â the data set attribute association graph, and E the attribute semantic embedding set;
4.2) inputting the scale factor customized by the user and the initial attribute editing direction into the optimal improved fully-connected network, and outputting to obtain the global attribute editing direction, wherein the process is represented as follows:
N_g = M_F(N_init × Alpha_test)

where N_g denotes the global attribute editing direction, M_F the optimal improved fully connected network, N_init the initial attribute editing direction, and Alpha_test the user-defined scale factor;
4.3) for a given original image, the original image is subjected to a pre-trained hidden variable encoder to obtain an original hidden variable, then the global attribute editing direction is acted on the original hidden variable to obtain an edited hidden variable, finally the edited hidden variable is sent to a pre-trained decoder, and a final multi-attribute edited face image is obtained through output, wherein the process is represented as follows:
I e =G(w o +N g )
where I_e denotes the multi-attribute edited face image, G a pre-trained decoder, w_o the original hidden variable, and N_g the global attribute editing direction.
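The test-stage steps 4.1) to 4.3) chain together as follows; `gcn` and `fc` below are placeholders standing in for the optimal trained networks, and the flat 1-D hidden variable is a simplification of the [1,18,512] layout described in the embodiment:

```python
import numpy as np

def edit_face(w_o, A_hat, E, alpha_test, gcn, fc):
    """Test-stage pipeline: initial directions -> user-scaled global direction
    -> shifted hidden variable, ready for the pre-trained decoder G."""
    N_init = gcn(A_hat, E)                    # 4.1) initial directions, one row per attribute
    N_g = fc(N_init * alpha_test[:, None])    # 4.2) scale per attribute, then fully connected net
    return w_o + N_g                          # 4.3) edited hidden variable; decode with G(...)
```

Because the user chooses alpha_test freely, the same stored model edits any subset of attributes in one pass.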
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. By learning a single global attribute editing direction with a deep network, the invention removes the need of existing single-attribute editing methods to edit the face repeatedly when performing multi-attribute editing, making editing simpler.
2. Compared with other attribute editing methods, the inference time is shorter, so multi-attribute editing of a single face is faster.
3. Compared with most other attribute editing methods, more attributes of the face can be edited.
4. When editing a group of face attributes, the method generates a more reasonable and natural editing effect and preserves facial appearance characteristics well.
Drawings
FIG. 1 is a logic flow diagram of the method of the present invention.
FIG. 2 is a graph of the correlation of the attributes of the data sets obtained by the present invention.
FIG. 3 is a schematic diagram of the scale factor obtained by the present invention.
FIG. 4 is a schematic diagram of an improved graph-convolution neural network constructed in accordance with the present invention.
Fig. 5 is a schematic diagram of an improved fully-connected network constructed in accordance with the present invention.
Fig. 6 is a schematic view of a multi-attribute editing process of a human face according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in fig. 1, the present embodiment provides a human face multi-attribute editing method based on a global attribute editing direction, which includes the following steps:
1) and acquiring a data set attribute association diagram, an attribute semantic embedded set and a scale factor.
The process of obtaining the attribute association graph of the data set is shown in fig. 2 and is described as follows: first, the CelebA-HQ data set is downloaded from the Internet; it comprises 30000 faces, each face image at a resolution of 1024 × 1024, as shown in the leftmost picture in fig. 2. Each picture is labeled with the 40 standard CelebA attributes, namely ["5 o'clock shadow", "arched eyebrows", "attractive", "bags under eyes", "bald", "bangs", "big lips", "big nose", "black hair", "blond hair", "blurry", "brown hair", "bushy eyebrows", "chubby", "double chin", "eyeglasses", "goatee", "gray hair", "heavy makeup", "high cheekbones", "male", "mouth slightly open", "mustache", "narrow eyes", "no beard", "oval face", "pale skin", "pointy nose", "receding hairline", "rosy cheeks", "sideburns", "smiling", "straight hair", "wavy hair", "wearing earrings", "wearing hat", "wearing lipstick", "wearing necklace", "wearing necktie", "young"]. The label value of each attribute is 1 or -1: a value of 1 means the face image has the attribute, and -1 means it does not. For a given face image, the attribute association graph A_i is computed by initializing an all-zero adjacency matrix A_i of dimension [40,40]; if the labels on the j-th and k-th attributes of the face image are both 1, the corresponding positions in A_i are set to 1, i.e. A_i[j][k] and A_i[k][j] become 1. All combinations of j and k are traversed, with j and k ranging over [0,39], and the main diagonal of A_i is set to 1, i.e. A_i[j][j] = 1. The resulting adjacency matrix is the face image attribute association graph A_i, and each per-image attribute association graph is a symmetric matrix. The middle part of fig. 2 represents the attribute association graphs of the 30000 face images, i.e. 30000 matrices of dimension [40,40]. The data set attribute association graph Â is obtained by summing and normalizing the 30000 per-image graphs:

Â = D^{-1}·A, where A = Σ_{i=1}^{n} A_i

where A_i denotes the attribute association graph of the i-th face image, n denotes the number of face images in the CelebA-HQ face data set (30000), A denotes the matrix obtained by summing the attribute association graphs of the 30000 images, with dimension [40,40], and D^{-1} denotes the inverse of the degree matrix of the summed matrix A; the degree matrix is derived by summing each row of A and placing the results on the main diagonal, with all off-diagonal values set to 0. Â denotes the matrix obtained by normalizing the summed matrix, with each value in the range [0,1]; Â is the final data set attribute association graph;
the specific process for obtaining attribute semantic embedding is as follows: first, a set of 40 attribute description texts is initialized, each attribute description text is an english character string describing a corresponding attribute, for example, the description of the smile attribute is 'smiling'. The 40 English character strings are embedded with 40 corresponding attribute semantemes by a pre-trained CLIP text encoder, and the CLIP text encoder can be directly downloaded from the Internet. Each attribute description text outputs a semantic embedding, the dimension of each semantic embedding is [1,512], the 40 attribute semantic embeddings are attribute semantic embedding sets, the dimension is [40,512], and the semantic embeddings can be expressed as:
E=[e 1 ,e 2 ,...,e M ] T
in the formula, E represents attribute semantic embedded set, E is a vector with the length of M, M represents the number of attributes and the value of 40, and also represents the length of the vector, T represents transposition of the vector, E 1 Semantic Embedded representation representing the first Attribute, e 2 Semantic Embedded representation representing the second Attribute, e M A semantic embedded representation representing an Mth attribute;
the specific process for obtaining the scale factor is shown in fig. 3 and is described as follows: two face images are randomly selected from the CelebA-HQ face data set as the original image I_o and the target image I_t. The original image I_o and the target image I_t are passed through a pre-trained hidden variable encoder to obtain the original hidden variable w_o and the target hidden variable w_t, where w_o and w_t respectively represent the mappings of I_o and I_t in the hidden space and have dimension [1,18,512]. The hidden variables w_o and w_t are then passed through a pre-trained hidden variable attribute classifier, whose last layer uses the sigmoid(·) activation function, to obtain the original attribute score S_o and the target attribute score S_t, expressed as:
S_o = C(w_o)
S_t = C(w_t)
where w_o and w_t respectively represent the original and target hidden variables; C represents the pre-trained hidden variable attribute classifier; S_o represents the original attribute score and S_t the target attribute score; S_o and S_t are vectors of length M with dimension [1,40], and the value at each position lies in the range [0,1]; M represents the number of attributes, with value 40, and also the length of the attribute score vector; T represents the transpose of the vector; S_o^1, S_o^2, and S_o^M represent the values of the original attribute score at the first, second, and M-th positions; and S_t^1, S_t^2, and S_t^M represent the values of the target attribute score at the first, second, and M-th positions;
after obtaining the original attribute score S_o and the target attribute score S_t, the scale factor can be calculated, expressed as:
Alpha = [α_1, α_2, ..., α_M]^T
α_i = S_t^i − S_o^i, if (S_t^i − 0.5)(S_o^i − 0.5) < 0; α_i = 0, otherwise
where S_t^i represents the value of the target attribute score at the i-th position and S_o^i the value of the original attribute score at the i-th position; Alpha represents the scale factor, with dimension [1,40]; α_i represents the value of the scale factor at the i-th position, α_1 the value at the first position, α_2 the value at the second position, and α_M the value at the M-th position; M represents the number of attributes; and T represents the transpose of the vector. The condition (S_t^i − 0.5)(S_o^i − 0.5) < 0 means that one of S_t^i and S_o^i is greater than 0.5 and the other is less than 0.5; since the two values are scores of the attribute, this actually indicates that one of the original and target hidden variables possesses the attribute while the other does not.
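The scale-factor computation described above can be sketched in a few lines of NumPy. The code below is a minimal illustration only: the encoder and 40-way classifier are replaced by pre-computed score vectors, and the piecewise zeroing rule is an assumption inferred from the condition in the description (the patent's formula image is not reproduced on this page):

```python
import numpy as np

def scale_factor(s_o: np.ndarray, s_t: np.ndarray) -> np.ndarray:
    """Compute Alpha from original/target attribute scores in [0, 1].

    A position contributes only when the two scores lie on opposite
    sides of 0.5, i.e. exactly one image possesses the attribute.
    """
    flipped = (s_t - 0.5) * (s_o - 0.5) < 0   # attribute differs between images
    return np.where(flipped, s_t - s_o, 0.0)

# Toy example with M = 4 attributes instead of 40.
s_o = np.array([0.9, 0.2, 0.6, 0.4])          # original attribute scores
s_t = np.array([0.1, 0.3, 0.7, 0.8])          # target attribute scores
alpha = scale_factor(s_o, s_t)
print(alpha)                                   # only positions 0 and 3 are non-zero
```

Positions where both scores fall on the same side of 0.5 are zeroed, so the later editing direction leaves those attributes untouched.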
2) Model construction.
A global attribute editing network is constructed: the input of the network is the data set attribute association graph, the attribute semantic embedding set, and the scale factor, and the output is the global attribute editing direction; the global attribute editing network consists of an improved graph convolution neural network and an improved fully-connected network, specifically represented as follows:
a. improved graph convolution neural network
An improved graph convolution neural network is constructed, as shown in fig. 4. The improved graph convolution neural network comprises two network modules, denoted block1 and block2. The input of block1 is the obtained data set attribute association graph and the attribute semantic embedding set, and its output is an intermediate variable; the input of block2 is the intermediate variable output by block1 together with the data set attribute association graph, and its output is the initial attribute editing direction. The whole process can be simplified and expressed as:
N_init = σ(Â · E · W)
where N_init represents the initial attribute editing direction output by the improved graph convolution neural network, with dimension [40,512]; Â represents the data set attribute association graph, with dimension [40,40]; E represents the attribute semantic embedding set, with dimension [40,512]; W represents the weight parameters to be learned in the improved graph convolution neural network, with dimension [512,512]; · represents the vector dot product operation; and σ(·) represents a nonlinear activation function, specifically the Leaky-ReLU function.
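A propagation step of this form can be sketched with NumPy. The sketch below stacks two such blocks, as the block1/block2 description suggests; the random weights and the exact two-block composition are assumptions, since the patent text only gives the simplified one-line form:

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.01) -> np.ndarray:
    return np.where(x > 0, x, slope * x)

def gcn_block(a_hat: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution block: sigma(A_hat @ H @ W)."""
    return leaky_relu(a_hat @ h @ w)

rng = np.random.default_rng(0)
M, D = 40, 512                        # number of attributes, embedding width
a_hat = np.eye(M)                     # stand-in normalized association graph
e = rng.standard_normal((M, D))       # stand-in attribute semantic embeddings
w1 = rng.standard_normal((D, D)) / np.sqrt(D)
w2 = rng.standard_normal((D, D)) / np.sqrt(D)

h = gcn_block(a_hat, e, w1)           # block1: intermediate variable
n_init = gcn_block(a_hat, h, w2)      # block2: initial editing directions
print(n_init.shape)                   # (40, 512)
```

Each row of `n_init` is the initialized editing direction for one attribute, mixed with its neighbors according to the association graph.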
b. Improved fully connected network
An improved fully-connected network is constructed, as shown in fig. 5. The improved fully-connected network includes two fully-connected layers, denoted Linear1 and Linear2. The input of Linear1 is the initial attribute editing direction and the scale factor, and its activation function is the Leaky-ReLU function; the output of Linear1 is the input of Linear2, which has no activation function and finally outputs the global attribute editing direction. The whole process can be represented as:
N_g = F(N_init × Alpha)
where N_g represents the global attribute editing direction output by the improved fully-connected network, with dimension [1,512]; F represents the improved fully-connected network; N_init represents the initial attribute editing direction, with dimension [40,512]; Alpha represents the scale factor, with dimension [40,1]; and × represents the product operation: multiplying each 512-dimensional vector in N_init by the corresponding value in Alpha is equivalent to scaling every value in that 512-dimensional vector by a certain factor.
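The row-wise scaling followed by the two linear layers can be sketched as follows. The layer widths and the reduction from the 40 scaled rows down to a single [1,512] vector are assumptions; the patent only fixes the input shape [40,512], the scale-factor shape [40,1], and the output shape [1,512]:

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.01) -> np.ndarray:
    return np.where(x > 0, x, slope * x)

rng = np.random.default_rng(1)
M, D = 40, 512
n_init = rng.standard_normal((M, D))      # initial editing directions
alpha = rng.uniform(-1, 1, (M, 1))        # per-attribute editing strengths

scaled = alpha * n_init                   # row-wise scaling, still (40, 512)
x = scaled.sum(axis=0, keepdims=True)     # aggregate to (1, 512); an assumed reduction

w1 = rng.standard_normal((D, D)) / np.sqrt(D)
w2 = rng.standard_normal((D, D)) / np.sqrt(D)
h = leaky_relu(x @ w1)                    # Linear1 with Leaky-ReLU activation
n_g = h @ w2                              # Linear2, no activation
print(n_g.shape)                          # (1, 512)
```

Attributes with a zero scale factor contribute nothing to the aggregate, so only the attributes selected for editing shape the final direction.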
3) Model optimization.
In order to better edit the attributes while retaining identity information, three target loss functions are designed to optimize the constructed global attribute editing network; after optimization, the optimal global attribute editing network is obtained, comprising the optimal improved graph convolution neural network and the optimal improved fully-connected network. The optimization specifically uses a gradient descent algorithm: we optimize the weight parameters in the constructed global attribute editing network by gradient descent so that the values of the three objective functions become as small as possible. The three target loss functions are expressed as follows:
the multi-attribute editing loss is calculated as follows: given the original hidden variable w_o and the target hidden variable w_t, the original attribute score S_o and the target attribute score S_t can be obtained through the hidden variable attribute classifier. The obtained global attribute editing direction is added to the original hidden variable w_o; since the dimension of the global attribute editing direction is [1,512] and the dimension of the original hidden variable is [1,18,512], the addition is performed by copying the last-dimension vector of the global attribute editing direction 18 times to obtain dimension [1,18,512] and then adding it to the original hidden variable element-wise. This yields the edited hidden variable, which is passed through the pre-trained hidden variable attribute classifier to obtain the edited attribute score; the process can be expressed as follows:
S_e = C(w_o + N_g)
where S_e represents the edited attribute score, with dimension [1,40]; C represents the hidden variable attribute classifier; w_o represents the original hidden variable; and N_g represents the global attribute editing direction output by the improved fully-connected network;
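The replication step described above, copying the [1,512] direction across the 18 style layers before adding it to the hidden variable, can be sketched as follows (random arrays stand in for the encoder output and the learned direction):

```python
import numpy as np

rng = np.random.default_rng(2)
w_o = rng.standard_normal((1, 18, 512))    # original hidden variable
n_g = rng.standard_normal((1, 512))        # global attribute editing direction

# Copy the [1, 512] direction 18 times so it matches w_o, then add element-wise.
n_g_tiled = np.repeat(n_g[:, np.newaxis, :], 18, axis=1)   # (1, 18, 512)
w_edit = w_o + n_g_tiled

# NumPy broadcasting gives the same result without the explicit copy.
assert np.allclose(w_edit, w_o + n_g[:, np.newaxis, :])
print(w_edit.shape)                        # (1, 18, 512)
```

Applying the same direction to all 18 layers is what makes the edit "global" over the hidden representation rather than layer-specific.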
to make the edited attribute score S_e as close as possible to the target attribute score S_t, we design the multi-attribute editing loss, specifically expressed as follows:
L_mae = −Σ_{i=1}^{M} 1(S_t^i ≠ S_o^i) · [S_t^i · log S_e^i + (1 − S_t^i) · log(1 − S_e^i)]
where L_mae represents the multi-attribute editing loss; S_t^i represents the value of the target attribute score at position i; S_e^i represents the value of the edited attribute score at position i; S_o^i represents the value of the original attribute score at position i; the indicator 1(S_t^i ≠ S_o^i) restricts the sum to the attributes being edited; and log represents the logarithm operation;
the multi-attribute retention loss is calculated as follows: in order to keep the attributes that are not edited unchanged, we design the multi-attribute retention loss, specifically expressed as follows:
L_map = Σ_{i=1}^{M} 1(S_t^i = S_o^i) · ||S_e^i − S_o^i||_2
where L_map represents the multi-attribute retention loss; S_t^i represents the value of the target attribute score at position i; S_e^i represents the value of the edited attribute score at position i; S_o^i represents the value of the original attribute score at position i; the indicator 1(S_t^i = S_o^i) restricts the sum to the attributes that are not edited; and ||·||_2 represents the L2 norm;
the spatial retention loss is calculated as follows: in order to prevent the original hidden variable from being changed excessively during editing, a spatial retention loss is designed, specifically expressed as follows:
L_sp = ||N_g||_2
where L_sp represents the spatial retention loss; N_g represents the global attribute editing direction; and ||·||_2 represents the L2 norm.
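The three losses can be sketched together in NumPy. Because the patent's formula images are not reproduced on this page, the indicator-based masking and the cross-entropy form below are assumptions reconstructed from the surrounding prose, not a verbatim transcription:

```python
import numpy as np

def edit_losses(s_o, s_t, s_e, n_g, eps=1e-8):
    """Sketch of the multi-attribute editing, retention, and spatial losses."""
    edited = (s_t - 0.5) * (s_o - 0.5) < 0    # attributes being flipped
    kept = ~edited                            # attributes to leave unchanged
    # Editing loss: binary cross-entropy toward the target, on edited attributes.
    bce = -(s_t * np.log(s_e + eps) + (1 - s_t) * np.log(1 - s_e + eps))
    l_mae = np.sum(bce[edited])
    # Retention loss: edited score should stay near the original elsewhere
    # (per-scalar L2 norm reduces to an absolute value).
    l_map = np.sum(np.abs(s_e - s_o)[kept])
    # Spatial loss: keep the global editing direction short.
    l_sp = np.linalg.norm(n_g)
    return l_mae, l_map, l_sp

# Toy scores over 3 attributes; only the first attribute is edited.
s_o = np.array([0.9, 0.2, 0.6])
s_t = np.array([0.1, 0.3, 0.7])
s_e = np.array([0.2, 0.25, 0.65])
l_mae, l_map, l_sp = edit_losses(s_o, s_t, s_e, np.array([0.3, -0.4]))
print(l_mae, l_map, l_sp)
```

A weighted sum of these three terms would then be minimized by gradient descent, as the optimization step describes.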
We optimize the improved graph convolution neural network and the improved fully-connected network for ten epochs using the gradient descent algorithm, and save the resulting improved graph convolution neural network as the optimal improved graph convolution neural network and the resulting improved fully-connected network as the optimal improved fully-connected network.
4) Face attribute editing.
In the model testing stage, the global attribute editing direction output by the optimal global attribute editing network is applied, in the hidden space, to a given face image, so as to obtain a multi-attribute edited face image that has an excellent attribute editing effect and retains identity information. This includes the following steps:
4.1) In the testing stage, the data set attribute association graph and the attribute semantic embedding set are passed through the optimal improved graph convolution neural network to obtain the initial attribute editing direction, which can be expressed as follows:
N_init = M_GCN(Â, E)
where N_init represents the initial attribute editing direction; M_GCN represents the optimal improved graph convolution neural network; Â represents the data set attribute association graph; and E represents the attribute semantic embedding set;
4.2) The user-defined scale factor and the initial attribute editing direction are input into the optimal improved fully-connected network, which outputs the global attribute editing direction; the process can be expressed as follows:
N_g = M_F(N_init × Alpha_test)
where N_g represents the global attribute editing direction; M_F represents the optimal improved fully-connected network; N_init represents the initial attribute editing direction; and Alpha_test represents the user-defined scale factor;
4.3) After the global attribute editing direction is obtained, the flow of generating the edited face is shown in fig. 6: for a given original image, the original image is passed through the pre-trained hidden variable encoder to obtain the original hidden variable; the global attribute editing direction is then applied to the original hidden variable to obtain the edited hidden variable; finally, the edited hidden variable is sent to a pre-trained decoder, which outputs the final multi-attribute edited face image. The flow can be represented as:
I_e = G(w_o + N_g)
where I_e represents the multi-attribute edited face image; G represents the pre-trained decoder; w_o represents the original hidden variable; and N_g represents the global attribute editing direction.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (5)
1. A face multi-attribute editing method based on the global attribute editing direction, characterized by comprising the following steps:
1) acquiring a data set attribute association diagram, an attribute semantic embedded set and a scale factor;
calculating a data set attribute association diagram according to attribute labels of each face in the CelebA-HQ face data set; inputting a set of required attribute description texts into a pre-trained CLIP text encoder, and outputting to obtain an attribute semantic embedded set; selecting an original face image and a target face image from CelebA-HQ face data set, subjecting the original face image and the target face image to a pre-trained hidden variable encoder to obtain an original hidden variable and a target hidden variable, subjecting the original hidden variable and the target hidden variable to a pre-trained hidden variable attribute classifier respectively to obtain an original attribute score and a target attribute score, and calculating the difference between the original attribute score and the target attribute score to obtain a scale factor;
2) model construction
A global attribute editing network is constructed: the input of the network is the data set attribute association graph, the attribute semantic embedding set, and the scale factor, and the output is the global attribute editing direction; the global attribute editing network consists of an improved graph convolution neural network and an improved fully-connected network; the input of the improved graph convolution neural network is the data set attribute association graph and the attribute semantic embedding set, and its output is the initial attribute editing direction, with the aim of obtaining an initialized editing direction for each attribute; the input of the improved fully-connected network is the initial attribute editing direction and the scale factor, and its output is the global attribute editing direction, with the aim of adjusting the editing strength of each initialized attribute editing direction by the scale factor, thereby obtaining a more accurate global attribute editing direction;
3) model optimization
In order to better edit the attributes and retain the identity information, three target loss functions are designed to optimize the constructed global attribute editing network, and after optimization, an optimal global attribute editing network is obtained and comprises an optimal improved graph convolution neural network and an optimal improved full-connection network;
4) face property editing
In the model testing stage, the global attribute editing direction obtained by the output of the optimal global attribute editing network acts on a given face image in a hidden space, and the multi-attribute edited face image which has an excellent attribute editing effect and can keep identity information is obtained.
2. The method for editing human face multiple attributes based on global attribute editing direction as claimed in claim 1, wherein the step 1) comprises the following steps:
a. obtaining dataset attribute association graphs
The CelebA-HQ face data set comprises 30000 face images in total, and each face image has labels for 40 attributes; a label value of 1 for an attribute indicates that the face image possesses the attribute, and a label value of -1 indicates that it does not. For one face image, the face image attribute association graph A_i is obtained as follows. A_i is essentially an adjacency matrix: first, the face image attribute association graph A_i is initialized to all zeros; if the labels of the face image on the j-th attribute and the k-th attribute are both 1, the corresponding positions of the face image attribute association graph are set to 1, i.e. the values of A_i[j][k] and A_i[k][j] are changed to 1; all combinations of j and k are traversed, and the values on the main diagonal of the adjacency matrix are set to 1. The adjacency matrix obtained by this calculation is the face image attribute association graph A_i. The data set attribute association graph Â is obtained by summing and normalizing the 30000 face image attribute association graphs, expressed as follows:
A = Σ_{i=1}^{n} A_i
Â = D^{-1} · A
where A_i represents the attribute association graph of one face image; n represents the number of face images in the CelebA-HQ face data set, with value 30000; A represents the matrix obtained by summing the attribute association graphs of the 30000 images; D^{-1} represents the inverse of the degree matrix of the summed matrix A; and Â represents the matrix obtained by normalizing the summed matrix, i.e. the data set attribute association graph to be finally obtained;
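The construction of Â described above can be sketched directly from ±1 label vectors; the toy label matrix below stands in for the 30000-image CelebA-HQ annotations:

```python
import numpy as np

def dataset_association_graph(labels: np.ndarray) -> np.ndarray:
    """Build the normalized attribute association graph A_hat.

    labels: (n_images, M) array of +1 / -1 attribute labels.
    """
    n, m = labels.shape
    a_sum = np.zeros((m, m))
    for lab in labels:
        pos = lab == 1
        a_i = np.outer(pos, pos).astype(float)   # 1 where both attributes co-occur
        np.fill_diagonal(a_i, 1.0)               # main diagonal set to 1
        a_sum += a_i
    d_inv = np.diag(1.0 / a_sum.sum(axis=1))     # inverse degree matrix
    return d_inv @ a_sum                         # row-normalized association graph

# Toy data set: 3 images, 4 attributes.
labels = np.array([[ 1,  1, -1, -1],
                   [ 1, -1,  1, -1],
                   [-1,  1,  1, -1]])
a_hat = dataset_association_graph(labels)
print(a_hat.shape)                               # (4, 4)
print(np.allclose(a_hat.sum(axis=1), 1.0))       # each row sums to 1: True
```

Row normalization by the inverse degree matrix turns raw co-occurrence counts into the weighting used when the graph convolution mixes attribute embeddings.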
b. attribute semantic embedded collections
First, a set of the required attribute description texts is initialized, each attribute description text being an English character string that describes the corresponding attribute; the English character strings are passed through a pre-trained CLIP text encoder to obtain the corresponding attribute semantic embeddings, which together form the attribute semantic embedding set, expressed as follows:
E = [e_1, e_2, ..., e_M]^T
wherein E represents the attribute semantic embedding set, a vector of length M; M represents the number of attributes and also the length of the vector; T represents the transpose of the vector; e_1 represents the semantic embedding of the first attribute, e_2 the semantic embedding of the second attribute, and e_M the semantic embedding of the M-th attribute;
c. scaling factor
Two face images are randomly selected from the CelebA-HQ face data set as the original image I_o and the target image I_t; the original image I_o and the target image I_t are passed through a pre-trained hidden variable encoder to obtain the original hidden variable w_o and the target hidden variable w_t, where w_o and w_t respectively represent the mappings of I_o and I_t in the hidden space; the hidden variables w_o and w_t are passed through a pre-trained hidden variable attribute classifier to obtain the original attribute score S_o and the target attribute score S_t, expressed as:
S_o = C(w_o)
S_t = C(w_t)
where w_o and w_t respectively represent the original and target hidden variables; C represents the pre-trained hidden variable attribute classifier; S_o represents the original attribute score and S_t the target attribute score; S_o and S_t are vectors of length M; M represents the number of attributes and also the length of the attribute score vector; T represents the transpose of the vector; S_o^1, S_o^2, and S_o^M represent the values of the original attribute score at the first, second, and M-th positions; and S_t^1, S_t^2, and S_t^M represent the values of the target attribute score at the first, second, and M-th positions;
after obtaining the original attribute score S_o and the target attribute score S_t, the scale factor can be calculated, expressed as:
Alpha = [α_1, α_2, ..., α_M]^T
α_i = S_t^i − S_o^i, if (S_t^i − 0.5)(S_o^i − 0.5) < 0; α_i = 0, otherwise
where S_t^i represents the value of the target attribute score at the i-th position and S_o^i the value of the original attribute score at the i-th position; Alpha represents the scale factor; α_i represents the value of the scale factor at the i-th position, α_1 the value at the first position, α_2 the value at the second position, and α_M the value at the M-th position; M represents the number of attributes; and T represents the transpose of the vector.
3. The method for editing human face multiple attributes based on global attribute editing direction as claimed in claim 1, wherein the step 2) comprises the following steps:
a global attribute editing network is constructed: the input of the network is the data set attribute association graph, the attribute semantic embedding set, and the scale factor, and the output is the global attribute editing direction; the global attribute editing network consists of an improved graph convolution neural network and an improved fully-connected network, specifically represented as follows:
a. improved graph convolution neural network
An improved graph convolution neural network is constructed, comprising two network modules denoted block1 and block2; the input of block1 is the obtained data set attribute association graph and the attribute semantic embedding set, and its output is an intermediate variable; the input of block2 is the intermediate variable output by block1 together with the data set attribute association graph, and its output is the initial attribute editing direction; the whole process is represented as:
N_init = σ(Â · E · W)
where N_init represents the initial attribute editing direction output by the improved graph convolution neural network; Â represents the data set attribute association graph; E represents the attribute semantic embedding set; W represents the weight parameters to be learned in the improved graph convolution neural network; · represents the vector dot product operation; and σ(·) represents a nonlinear activation function;
b. improved fully connected network
An improved fully-connected network is constructed, comprising two fully-connected layers denoted Linear1 and Linear2; the input of Linear1 is the initial attribute editing direction and the scale factor, and its activation function is the Leaky-ReLU function; the output of Linear1 is the input of Linear2, which has no activation function and finally outputs the global attribute editing direction; the whole process is represented as:
N_g = F(N_init × Alpha)
where N_g represents the global attribute editing direction output by the improved fully-connected network; F represents the improved fully-connected network; N_init represents the initial attribute editing direction, with dimension [40,512]; Alpha represents the scale factor, with dimension [40,1]; and × represents the product operation, specifically multiplying each 512-dimensional vector in N_init by the corresponding value in Alpha.
4. The method for editing human face multiple attributes based on the global attribute editing direction according to claim 1, wherein: in step 3), in order to better edit the attributes while retaining identity information, three target loss functions are designed to optimize the constructed global attribute editing network; after optimization, the optimal global attribute editing network is obtained, comprising the optimal improved graph convolution neural network and the optimal improved fully-connected network; the optimization specifically uses a gradient descent algorithm, whose aim is to optimize the weight parameters in the constructed global attribute editing network so that the values of the three objective functions become as small as possible; the three target loss functions are expressed as follows:
a. multi-attribute edit loss
Given the original hidden variable w_o and the target hidden variable w_t, the original attribute score S_o and the target attribute score S_t can be obtained through the hidden variable attribute classifier; the obtained global attribute editing direction is added to the original hidden variable w_o to obtain the edited hidden variable, which is then passed through the pre-trained hidden variable attribute classifier to obtain the edited attribute score; the process is represented as follows:
S_e = C(w_o + N_g)
where S_e represents the edited attribute score; C represents the hidden variable attribute classifier; w_o represents the original hidden variable; and N_g represents the global attribute editing direction output by the global attribute editing network;
to make the edited attribute score S_e as close as possible to the target attribute score S_t, the multi-attribute editing loss is designed, specifically expressed as follows:
L_mae = −Σ_{i=1}^{M} 1(S_t^i ≠ S_o^i) · [S_t^i · log S_e^i + (1 − S_t^i) · log(1 − S_e^i)]
where L_mae represents the multi-attribute editing loss; S_t^i represents the value of the target attribute score at position i; S_e^i represents the value of the edited attribute score at position i; S_o^i represents the value of the original attribute score at position i; the indicator 1(S_t^i ≠ S_o^i) restricts the sum to the attributes being edited; and log represents the logarithm operation;
b. multi-attribute retention loss
In order to keep the attributes that are not edited unchanged, the multi-attribute retention loss is designed, specifically expressed as follows:
L_map = Σ_{i=1}^{M} 1(S_t^i = S_o^i) · ||S_e^i − S_o^i||_2
where L_map represents the multi-attribute retention loss; S_t^i represents the value of the target attribute score at position i; S_e^i represents the value of the edited attribute score at position i; S_o^i represents the value of the original attribute score at position i; the indicator 1(S_t^i = S_o^i) restricts the sum to the attributes that are not edited; and ||·||_2 represents the L2 norm;
c. space retention loss
In order to prevent the original hidden variable from being changed excessively during editing, the spatial retention loss is designed, specifically expressed as follows:
L_sp = ||N_g||_2
where L_sp represents the spatial retention loss; N_g represents the global attribute editing direction; and ||·||_2 represents the L2 norm.
5. The method for editing human face multiple attributes based on the global attribute editing direction according to claim 1, wherein: in step 4), in the model testing stage, the global attribute editing direction output by the optimal global attribute editing network is applied, in the hidden space, to a given face image to obtain a multi-attribute edited face image that has an excellent attribute editing effect and retains identity information, comprising the following steps:
4.1) In the testing stage, the data set attribute association graph and the attribute semantic embedding set are passed through the optimal improved graph convolution neural network to obtain the initial attribute editing direction, expressed as follows:
N_init = M_GCN(Â, E)
where N_init represents the initial attribute editing direction; M_GCN represents the optimal improved graph convolution neural network; Â represents the data set attribute association graph; and E represents the attribute semantic embedding set;
4.2) The user-defined scale factor and the initial attribute editing direction are input into the optimal improved fully-connected network, which outputs the global attribute editing direction; the process is represented as follows:
N_g = M_F(N_init × Alpha_test)
where N_g represents the global attribute editing direction; M_F represents the optimal improved fully-connected network; N_init represents the initial attribute editing direction; and Alpha_test represents the user-defined scale factor;
4.3) For a given original image, the original image is passed through the pre-trained hidden variable encoder to obtain the original hidden variable; the global attribute editing direction is then applied to the original hidden variable to obtain the edited hidden variable; finally, the edited hidden variable is sent to a pre-trained decoder, which outputs the final multi-attribute edited face image; the process is represented as:
I_e = G(w_o + N_g)
where I_e represents the multi-attribute edited face image; G represents the pre-trained decoder; w_o represents the original hidden variable; and N_g represents the global attribute editing direction.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210628783.2A | 2022-06-06 | 2022-06-06 | Human face multi-attribute editing method based on global attribute editing direction |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115082292A (en) | 2022-09-20 |
Family
ID=83249604
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210628783.2A Pending CN115082292A (en) | 2022-06-06 | 2022-06-06 | Human face multi-attribute editing method based on global attribute editing direction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115082292A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130108175A1 (en) * | 2011-10-28 | 2013-05-02 | Raymond William Ptucha | Image Recomposition From Face Detection And Facial Features |
US20140160134A1 (en) * | 2011-08-31 | 2014-06-12 | Timur Nuruahitovich Bekmambetov | Visualization of a natural language text |
CN111368662A (en) * | 2020-02-25 | 2020-07-03 | 华南理工大学 | Method, device, storage medium and equipment for editing attribute of face image |
CN111914617A (en) * | 2020-06-10 | 2020-11-10 | 华南理工大学 | Face attribute editing method based on balanced stack type generation countermeasure network |
CN111932444A (en) * | 2020-07-16 | 2020-11-13 | 中国石油大学(华东) | Face attribute editing method based on generation countermeasure network and information processing terminal |
CN112734873A (en) * | 2020-12-31 | 2021-04-30 | 北京深尚科技有限公司 | Image attribute editing method, device, equipment and medium for resisting generation network |
CN113963409A (en) * | 2021-10-25 | 2022-01-21 | 百果园技术(新加坡)有限公司 | Training of face attribute editing model and face attribute editing method |
CN114240736A (en) * | 2021-12-06 | 2022-03-25 | 中国科学院沈阳自动化研究所 | Method for simultaneously generating and editing any human face attribute based on VAE and cGAN |
-
2022
- 2022-06-06 CN CN202210628783.2A patent/CN115082292A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140160134A1 (en) * | 2011-08-31 | 2014-06-12 | Timur Nuruahitovich Bekmambetov | Visualization of a natural language text |
US20130108175A1 (en) * | 2011-10-28 | 2013-05-02 | Raymond William Ptucha | Image Recomposition From Face Detection And Facial Features |
CN111368662A (en) * | 2020-02-25 | 2020-07-03 | 华南理工大学 | Method, device, storage medium and equipment for editing attribute of face image |
CN111914617A (en) * | 2020-06-10 | 2020-11-10 | 华南理工大学 | Face attribute editing method based on balanced stack type generation countermeasure network |
CN111932444A (en) * | 2020-07-16 | 2020-11-13 | 中国石油大学(华东) | Face attribute editing method based on generation countermeasure network and information processing terminal |
CN112734873A (en) * | 2020-12-31 | 2021-04-30 | 北京深尚科技有限公司 | Image attribute editing method, device, equipment and medium for resisting generation network |
CN113963409A (en) * | 2021-10-25 | 2022-01-21 | 百果园技术(新加坡)有限公司 | Training of face attribute editing model and face attribute editing method |
CN114240736A (en) * | 2021-12-06 | 2022-03-25 | 中国科学院沈阳自动化研究所 | Method for simultaneously generating and editing any human face attribute based on VAE and cGAN |
Non-Patent Citations (2)
Title |
---|
随海亮;马军山;李丽莹;: "基于生成对抗网络与FACS的面部表情合成研究", 软件导刊, no. 06, 15 June 2020 (2020-06-15) * |
黄韬;贾西平;林智勇;马震远;: "基于生成对抗网络的文本引导人物图像编辑方法", 广东技术师范大学学报, no. 03, 25 June 2020 (2020-06-25) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109447906B (en) | Picture synthesis method based on generation countermeasure network | |
CN111814706B (en) | Face recognition and attribute classification method based on multitask convolutional neural network | |
Hou et al. | Improving variational autoencoder with deep feature consistent and generative adversarial training | |
CN107798349B (en) | Transfer learning method based on a deep sparse autoencoder | |
CN109472024A (en) | Text classification method based on a bidirectional recurrent attention neural network | |
CN108108677A (en) | Facial expression recognition method based on an improved CNN | |
CN108830287A (en) | Chinese image semantic description method based on a residual-connected Inception network integrated with multilayer GRUs | |
CN111127146B (en) | Information recommendation method and system based on a convolutional neural network and a denoising autoencoder | |
CN109815826A (en) | Method and device for generating a face attribute model | |
CN110188794B (en) | Deep learning model training method, device, equipment and storage medium | |
Xu et al. | (Retracted) Method of generating face image based on text description of generating adversarial network | |
CN117521672A (en) | Method for generating sequential images from long text based on a diffusion model | |
CN115393933A (en) | Video face emotion recognition method based on frame attention mechanism | |
CN110276396A (en) | Image caption generation method based on object saliency and cross-modal fusion features | |
Qu et al. | Perceptual-DualGAN: perceptual losses for image to image translation with generative adversarial nets | |
Da et al. | Brain CT image classification with deep neural networks | |
CN117522697A (en) | Face image generation method, face image generation system and model training method | |
CN116543289B (en) | Image description method based on encoder-decoder and Bi-LSTM attention model | |
Sawant et al. | Text to image generation using GAN | |
CN115082292A (en) | Human face multi-attribute editing method based on global attribute editing direction | |
CN116311472A (en) | Micro-expression recognition method and device based on multi-level graph convolution network | |
Fang et al. | Facial makeup transfer with GAN for different aging faces | |
CN116503499A (en) | Sketch generation method and system based on a cycle-consistent generative adversarial network | |
Yauri-Lozano et al. | Generative Adversarial Networks for text-to-face synthesis & generation: A quantitative–qualitative analysis of Natural Language Processing encoders for Spanish | |
CN113947520A (en) | Method for face makeup transfer based on a generative adversarial network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |