CN114863527B - Makeup style migration method based on FP-SCGAN model - Google Patents

Makeup style migration method based on FP-SCGAN model

Info

Publication number
CN114863527B
Authority
CN
China
Prior art keywords: makeup, image, migration, loss, discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210488449.1A
Other languages
Chinese (zh)
Other versions
CN114863527A (en)
Inventor
李妹纳
杭丽君
熊攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210488449.1A priority Critical patent/CN114863527B/en
Publication of CN114863527A publication Critical patent/CN114863527A/en
Application granted Critical
Publication of CN114863527B publication Critical patent/CN114863527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a makeup style migration method based on an FP-SCGAN model, which combines a feature pyramid with the SCGAN algorithm. The FP-SCGAN network includes four parts: PSEnc, FIEnc, MFDec and a Markov discriminator. PSEnc extracts the reference makeup features, FIEnc extracts the facial features of the picture to be migrated, MFDec fuses the facial features of the original picture with the makeup features of the reference picture, and the Markov discriminator measures the distance between the generated distribution and the real distribution. The improved algorithm solves problems encountered during makeup migration, such as unnatural edges around the eye sockets and the failure to migrate lighter eye makeup, and improves the migration effect compared with the current mainstream SCGAN makeup migration algorithm.

Description

Makeup style migration method based on FP-SCGAN model
Technical Field
The invention belongs to the technical field of makeup migration methods, and relates to a makeup style migration method based on an FP-SCGAN model.
Background
Computer vision is one of the most popular research directions in deep learning and is now widely applied in many fields. Along with the development and application of image processing algorithms, the short video industry has grown rapidly, and more and more functions such as camera filters, beautification and special effects have been implemented, attracting a large number of users. These functions are inseparable from the style migration algorithms in image processing.
The goal of image style migration is to migrate the style of a reference picture into another picture or pictures. Before neural networks, image style migration followed a common idea: analyze an image of a certain style, build a mathematical or statistical model, and then change the image to be migrated so that it better conforms to the established model. This approach has a major drawback: a program can basically handle only one style or one scene, so practical applications based on traditional style migration research are very limited. At present, style migration algorithms are mainly based on deep learning: a neural network extracts features from the style image and the image to be migrated, the features are fused, and the image is then restored by upsampling, thereby realizing style migration.
Many current style migration algorithms focus on the migration of face attributes, and makeup migration is a typical face attribute migration. GAN-based makeup migration algorithms perform very well among the many approaches. SCGAN can migrate the reference makeup to the target image well, and even for targets whose makeup positions differ greatly it can still produce a good migration effect. However, during migration, problems such as unnatural edges around the eye sockets and the failure to migrate lighter eye makeup easily occur.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a makeup style migration method based on an FP-SCGAN model, which combines a feature pyramid with SCGAN to obtain the makeup migration network FP-SCGAN, so that the above problems can be effectively solved and the migration effect improved. The method comprises the following steps:
The FP-SCGAN network comprises PSEnc, FIEnc, MFDec and a Markov discriminator. Training of the network is carried out in a mutual game between the generator G and the discriminator D, and the network converges when dynamic balance is finally reached. The training specifically comprises the following steps:
S10, obtaining style features: the non-makeup image x and the makeup image y are sent into FIEnc, and the facial features c_x, c_y of the pictures to be migrated are obtained through the feature extraction, downsampling and residual modules; the key regions of the makeup reference images are sent into PSEnc, features are extracted through a pretrained VGG19 network and fused by a feature pyramid, and the style features s_x, s_y are obtained;
S20, obtaining the fused features of the reference makeup image and the image to be migrated: the obtained style features are sent into a multi-layer perceptron, which maps them into a feature space to obtain the style feature codes code_x, code_y; the facial features of the pictures to be migrated and the style feature codes are sent into MFDec, and feature fusion is carried out through AdaIN in the decoder; meanwhile, AdaIN is also used to introduce features into the shallow layers of MFDec, and the fused features x_y, y_x, x_x, y_y of the reference makeup images and the images to be migrated are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: the parameters of the generator G are fixed, the generator loss is calculated, and the discriminator D is optimized so that its discrimination capability is enhanced; back propagation is then carried out and the discriminator parameters are updated, the two discriminators being used to discriminate the generated makeup image and the makeup-removal image respectively and having identical structures; then the parameters of the discriminator D are fixed, the discriminator loss is calculated, and the generator G is optimized so that its ability to deceive the discriminator D is enhanced;
S40, calculating the various losses: these comprise the identity loss, in which the generator reconstructs the image to be migrated; the makeup loss, which guides the migration of the makeup in the key regions; the local vgg loss, which enhances the retention of semantic information in the key regions; and the global vgg loss, which ensures that the generated image is similar to the original image in semantic information;
S50, updating the generator parameters: x_y, y_x are sent into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then c_{x,fake} with code_x and c_{y,fake} with code_y are respectively sent into MFDec to obtain x_rec and y_rec; the reconstruction loss is further calculated, which guides the network to carry out the overall style migration while preserving the basic features of the original image; finally, back propagation is carried out and the generator parameters are updated.
Preferably, the formula for calculating the generator loss is:

L_adv^{D_X} = E_{x~X,y~Y}[D_x(G(y,x))] - E_{x~X}[D_x(x)],  L_adv^{D_Y} = E_{x~X,y~Y}[D_y(G(x,y))] - E_{y~Y}[D_y(y)]

wherein E_{x~X} denotes the expectation over the true distribution of non-makeup images; E_{y~Y} denotes the expectation over the true distribution of makeup images; E_{x~X,y~Y} denotes the expectation over the generated images; D_x(·), D_y(·) represent the discriminator outputs on samples from the generated data; D_x, D_y represent the discriminator outputs on samples from the real data; G(x,y) is the migration of x with reference to the makeup of y; G(y,x) is the migration of y with reference to the makeup of x.
Preferably, the formula for calculating the discriminator loss is:

L_adv^{G_X} = -E_{x~X,y~Y}[D_x(G(y,x))],  L_adv^{G_Y} = -E_{x~X,y~Y}[D_y(G(x,y))]

wherein D_x(·), D_y(·) are the discriminator output functions, E_{x~X,y~Y} denotes the expectation over the generated images, G(x,y) is the migration of x with reference to the makeup of y, and G(y,x) is the migration of y with reference to the makeup of x.
Preferably, the calculation formula of the identity loss is:
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1

wherein G(x,x) is the migration of x with reference to the makeup of x; G(y,y) is the migration of y with reference to the makeup of y; ||·||_1 is the L1 loss, which calculates the absolute error between the real data and the generated data.
Preferably, the formula for calculating the makeup loss is:

L_makeup = Σ_i ||(G(x,y) - x̃) ⊙ M_{x,i}||_1 + Σ_i ||(G(y,x) - ỹ) ⊙ M_{y,i}||_1

wherein x̃ represents the generated pairing data of x, ỹ represents the generated pairing data of y, x represents an image without makeup, y represents an image with makeup, M_{x,i} is the face mask of the pre-makeup image and M_{y,i} is the face mask of the post-makeup image, i being the sequence number of the key regions comprising the three parts of the eye sockets, the face and the lips; G(x,y) represents the migration of x with reference to the makeup of y; G(y,x) represents the migration of y with reference to the makeup of x; ||·||_1 is the L1 loss, which calculates the absolute error between the real data and the generated data.
Preferably, the calculation formula of the local vgg loss is:

L_vgg^l = Σ_i ( ||F_l(G(x,y) ⊙ M_{x,i}) - F_l(x ⊙ M_{x,i})||_2 + ||F_l(G(y,x) ⊙ M_{y,i}) - F_l(y ⊙ M_{y,i})||_2 )

wherein M_{x,i} is the face mask of the pre-makeup image and M_{y,i} is the face mask of the post-makeup image, i being the sequence number of the key regions comprising the three parts of the eye sockets, the face and the lips; G(x,y) represents the migration of x with reference to the makeup of y; G(y,x) represents the migration of y with reference to the makeup of x; F_l(·) represents the layer-l feature in the vgg network; ||·||_2 represents the L2 loss, i.e. the squared error between the real data and the generated data.
Preferably, the calculation formula of the global vgg loss is:

L_vgg^g = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2

wherein G(x,y) represents the migration of x with reference to the makeup of y; G(y,x) represents the migration of y with reference to the makeup of x; F_l(·) represents the layer-l feature in the vgg network; ||·||_2 is the L2 loss.
Preferably, the calculation formula of the reconstruction loss is:
L_cyc = ||G(G(y,x), y) - y||_1 + ||G(G(x,y), x) - x||_1

wherein G(G(y,x), y) represents that y is first migrated with reference to the makeup of x and the result is then migrated back with reference to the makeup of y; G(G(x,y), x) represents that x is first migrated with reference to the makeup of y and the result is then migrated back with reference to the makeup of x; ||·||_1 is the L1 loss, which calculates the absolute error between the real data and the generated data.
The beneficial effects of the invention are as follows:
compared with the prior art, the invention provides a makeup style migration method based on an FP-SCGAN model, PSEnc is used for extracting reference makeup features, FIEnc is used for extracting facial features of pictures to be migrated, MFDec is used for fusing the facial features of original pictures and the makeup features of reference pictures, and Markov discriminators are used for measuring the distance between generated distribution and actual distribution. The improved algorithm can solve the problems that the eyesockets have unnatural edges, lighter eyepieces cannot be migrated and the like during makeup migration, and compared with the current mainstream SCGAN makeup migration algorithm, the migration effect is improved.
Drawings
FIG. 1 is a diagram of the overall structure of an FP-SCGAN network in a makeup style migration method based on an FP-SCGAN model according to an embodiment of the invention;
FIG. 2 is a FIEnc structure diagram in a makeup style migration method based on an FP-SCGAN model according to an embodiment of the invention;
FIG. 3 is a PSEnc structure diagram in a makeup style migration method based on an FP-SCGAN model according to an embodiment of the present invention;
FIG. 4 is a diagram of the MFDec structure in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the invention;
FIG. 5 is a schematic diagram of a Markov discriminator in a makeup style migration method based on an FP-SCGAN model according to an embodiment of the invention;
fig. 6 is a flow chart of steps of a makeup style migration method based on an FP-SCGAN model according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
On the contrary, the invention is intended to cover any alternatives, modifications, equivalents and variations that may be included within the spirit and scope of the invention as defined by the appended claims. Further, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention; however, the invention can be fully understood by those skilled in the art without some of the details described herein.
Referring to fig. 1, which shows the overall network structure of the present invention, the FP-SCGAN network is composed of four parts: PSEnc, FIEnc, MFDec and a discriminator.
PSEnc is used to extract the reference makeup features, including various types of information such as color, texture and edges. The network comprises feature extraction and feature fusion using a feature pyramid.
The PSEnc network structure is shown in fig. 3. Because the input makeup reference image contains a great deal of information irrelevant to the makeup, only the images of the three parts extracted from the reference image, namely the eye sockets, the face and the lips, are used as input. First, feature extraction is carried out on the reference makeup image through a pretrained VGG19 network, and the conv1_1, conv2_1, conv3_1 and conv4_1 feature maps output by the VGG19 network are fused using a feature pyramid structure. In order to enhance the feature extraction capability of the network, the four feature maps extracted by VGG19 are first convolved and then fused. Four levels of features are output after feature pyramid fusion; the extracted features are then sent into a fully connected layer and mapped to a suitable scale for the subsequent AdaIN.
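For illustration only, such a pyramid-style encoder might be sketched in PyTorch as follows; the channel widths, the 1x1 and 3x3 fusion convolutions and the size of the final embedding are assumptions made for the sketch and are not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class PSEncSketch(nn.Module):
    """Illustrative PSEnc: VGG19 features fused by a feature pyramid.
    Channel widths and the embedding size are assumptions, not the patent's values."""
    def __init__(self, style_dim=192):
        super().__init__()
        vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        # conv1_1, conv2_1, conv3_1 and conv4_1 sit at indices 0, 5, 10 and 19 of vgg19.features
        self.slices = nn.ModuleList([vgg[:1], vgg[1:6], vgg[6:11], vgg[11:20]])
        dims = [64, 128, 256, 512]
        # each VGG feature map is first convolved, as described above, before fusion
        self.lateral = nn.ModuleList([nn.Conv2d(d, 128, 1) for d in dims])
        self.smooth = nn.ModuleList([nn.Conv2d(128, 128, 3, padding=1) for _ in dims])
        self.fc = nn.Linear(128 * 4, style_dim)   # fully connected layer mapping to a style feature

    def forward(self, region):                    # region: masked eye-socket / face / lip image
        feats, h = [], region
        for s in self.slices:
            h = s(h)
            feats.append(h)
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down fusion: upsample the deeper map and add it to the shallower one
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        fused = [s(l) for s, l in zip(self.smooth, laterals)]
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in fused], dim=1)
        return self.fc(pooled)                    # style feature of one key region
```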
The FIEnc network is used for extracting facial features of the pictures to be migrated and comprises a feature extraction module, a downsampling module and a residual error module.
The FIEnc network structure is shown in fig. 2 and includes the feature extraction, downsampling and residual modules. To preserve the features of the image to be migrated, the network extracts facial features directly through stacked convolutions. The first two stages raise the feature dimension and downsample, and the residual module improves the expressive power of the network.
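A minimal sketch of such an encoder, assuming instance normalization, two stride-2 downsampling convolutions and three residual blocks (none of which are specified above), could look as follows.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain residual block used to deepen the encoder."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.InstanceNorm2d(dim), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.InstanceNorm2d(dim))

    def forward(self, x):
        return x + self.body(x)

class FIEncSketch(nn.Module):
    """Illustrative FIEnc: stacked convolutions for feature extraction, two stride-2
    convolutions for downsampling, then residual blocks, i.e. c = Res(Down(f(X)))."""
    def __init__(self, in_ch=3, base=64, n_res=3):
        super().__init__()
        self.extract = nn.Sequential(        # f: feature extraction
            nn.Conv2d(in_ch, base, 7, padding=3), nn.InstanceNorm2d(base), nn.ReLU(inplace=True))
        self.down = nn.Sequential(           # Down: raise channels, halve resolution twice
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1), nn.InstanceNorm2d(base * 4), nn.ReLU(inplace=True))
        self.res = nn.Sequential(*[ResBlock(base * 4) for _ in range(n_res)])  # Res

    def forward(self, x):
        return self.res(self.down(self.extract(x)))
```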
The MFDec network is used for fusing the facial features of the original image and the cosmetic features of the reference image, and the decoder adopts AdaIN. The network includes a residual module, upsampling, and convolution.
Referring to fig. 4, the three feature maps of different scales output by PSEnc are first mapped into different network layers through the MLP; because the features of the reference image lie in a different feature space from the features of the original image, using the MLP for mapping projects the reference features into a more reasonable feature space. The mapped feature maps are then fused with the feature maps output by FIEnc through AdaIN. AdaIN is also used in the shallow layers of the MFDec network to introduce features, so that lighter makeup in the reference makeup can be retained completely. A residual module is adopted in the backbone structure of the network to improve its expressive power, and upsampling and convolution are used to restore the features into an image.
When the multi-layer perceptron MLP is adopted, the reference features become closer to the distribution of the original image while retaining some of the original information, and the migration effect is more ideal.
All normalization layers of MFDec use AdaIN, in order to preserve as many features of the reference image as possible. The expression of AdaIN is

AdaIN(x) = γ · (x - μ) / sqrt(σ² + ε) + α

wherein x is the feature of the original image, μ is the mean of x in the channel direction, σ² is the variance of x in the channel direction, ε is a very small number, α is the mean of the reference image feature y in the channel direction, and γ is the standard deviation of y in the channel direction.
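A minimal sketch of this AdaIN operation, assuming the usual per-sample spatial statistics and a small ε, is given below; in FP-SCGAN the reference statistics α and γ would come from the style code produced by the MLP.

```python
import torch

def adain(x, ref_mean, ref_std, eps=1e-5):
    """AdaIN(x) = gamma * (x - mu) / sqrt(sigma^2 + eps) + alpha, as defined above.
    x: (N, C, H, W) content feature; ref_mean (alpha) and ref_std (gamma): (N, C) statistics."""
    mu = x.mean(dim=(2, 3), keepdim=True)                   # per-channel mean of x
    var = x.var(dim=(2, 3), keepdim=True, unbiased=False)   # per-channel variance of x
    x_norm = (x - mu) / torch.sqrt(var + eps)
    return ref_std.view(*ref_std.shape, 1, 1) * x_norm + ref_mean.view(*ref_mean.shape, 1, 1)
```

Replacing the affine parameters of every normalization layer in MFDec with (α, γ) predicted from the style code is what lets the decoder shift the content features toward the distribution of the reference image.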
The discriminator is used to measure the distance between the generated distribution and the real distribution. Considering that images generated with an ordinary discriminator tend to be blurry, a Markov discriminator is used. Compared with an ordinary discriminator, a Markov discriminator judges whether each local region is a generated image, so the generated image is finer.
the construction of the arbiter used is shown in fig. 5. Where SN is spectral normalization (Spectral Normalization), this normalization allows the network to meet lipschz continuity (Lipschitz continuity), limiting dramatic changes in the function, and making the model training process more stable. In addition, the design of the arbiter adopts the proposal in the WGAN, and the cross entropy loss is not adopted when the loss is calculated, but the L1 loss is adopted.
Network overall framework: in the forward pass of the network, the image to be made up is directly input into FIEnc to obtain the image features. The reference image is divided into the three parts of eye sockets, skin and lips, which are processed and then input into the fully connected layer to obtain the style code, i.e. the mean and variance for AdaIN. When the image features obtained from FIEnc pass through the residual layers of MFDec, since AdaIN is adopted in the residual layers, the mean and variance of the features shift towards the values in the style code, i.e. the distribution of the features is migrated to the distribution of the reference image. The photo after makeup migration is obtained after the features are upsampled twice.
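Assuming the illustrative modules sketched above and a hypothetical mlp/mfdec interface (neither is specified in this form by the patent), the forward pass described in this paragraph could be written as:

```python
import torch

def transfer_sketch(fienc, psenc, mlp, mfdec, source, ref_eye, ref_lip, ref_face):
    """Forward pass as described above: encode the face to be made up, extract the reference
    style from the three key regions, map it to a style code with the MLP, and decode with
    AdaIN inside MFDec. The mfdec(content, style_code) interface is an assumption."""
    content = fienc(source)                                  # facial features of the image to be migrated
    style = torch.cat([psenc(ref_eye), psenc(ref_lip), psenc(ref_face)], dim=1)
    style_code = mlp(style)                                  # means / variances consumed by AdaIN
    return mfdec(content, style_code)                        # photo after makeup migration
```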
The training of the network is performed in the mutual game between the generator G and the discriminator D, and the network converges when dynamic balance is finally reached. The loss function of FP-SCGAN is shown in equation (1.1):

L_total = min_G max_D ( λ_adv·L_adv + λ_cyc·L_cyc + λ_g·L_vgg^g + λ_l·L_vgg^l + λ_makeup·L_makeup )    (1.1)

wherein L_adv is the adversarial loss, including the generator loss and the discriminator loss, and λ_adv is its loss coefficient; L_cyc is the reconstruction loss and λ_cyc is its loss coefficient; L_vgg^g is the global vgg loss and λ_g is its loss coefficient; L_vgg^l is the local vgg loss and λ_l is its loss coefficient; L_makeup is the makeup loss and λ_makeup is its loss coefficient. min_G max_D indicates that the optimization process of the network is the adversarial training of G and D: when the parameters of G are fixed, max_D means that the confidence of D on real samples is enhanced as much as possible; when the parameters of D are fixed, min_G means that G is optimized so that the gap between real samples and generated samples is minimized.
During network training, the specific steps are as follows:
input: training setWherein x represents an image without makeup and y represents an image with makeup +.>Pairing data representing the generated x, +.>Pairing data representing the generated y, M x Face mask representing pre-makeup image, M y A face mask representing the post-cosmetic image. The training batch size is B, the training set data size is N, the learning rate is gamma, and the learning rate is gamma, the training iteration number is J, I 1 Is lost for L1.
Task: through continuous iterative training on the training set, the generator and the discriminator are made to converge, thereby achieving the goal of makeup migration.
Referring to fig. 6, the specific steps are:
S10, obtaining style features: the non-makeup image x and the makeup image y are sent into FIEnc, and the facial features c_x, c_y of the pictures to be migrated are obtained through the feature extraction, downsampling and residual modules; the key regions of the makeup reference images are sent into PSEnc, features are extracted through a pretrained VGG19 network and fused by a feature pyramid, and the style features s_x, s_y are obtained;
S20, obtaining the fused features of the reference makeup image and the image to be migrated: the obtained style features are sent into the MLP, which maps them into a feature space to obtain the style feature codes code_x, code_y; the facial features of the pictures to be migrated and the style feature codes are sent into MFDec, and feature fusion is carried out through AdaIN in the decoder; meanwhile, AdaIN is also used to introduce features into the shallow layers of MFDec, and the fused features x_y, y_x, x_x, y_y of the reference makeup images and the images to be migrated are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: the parameters of the generator G are fixed, the generator loss is calculated, and the discriminator D is optimized so that its discrimination capability is enhanced; back propagation is then carried out and the discriminator parameters are updated, the two discriminators being used to discriminate the generated makeup image and the makeup-removal image respectively and having identical structures; then the parameters of the discriminator D are fixed, the discriminator loss is calculated, and the generator G is optimized so that its ability to deceive the discriminator D is enhanced;
S40, calculating the various losses: these comprise the identity loss, in which the generator reconstructs the image to be migrated; the makeup loss, which guides the migration of the makeup in the key regions; the local vgg loss, which enhances the retention of semantic information in the key regions; and the global vgg loss, which ensures that the generated image is similar to the original image in semantic information;
S50, updating the generator parameters: x_y, y_x are sent into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then c_{x,fake} with code_x and c_{y,fake} with code_y are respectively sent into MFDec to obtain x_rec and y_rec; the reconstruction loss is further calculated, which guides the network to carry out the overall style migration while preserving the basic features of the original image; finally, back propagation is carried out and the generator parameters are updated.
S10 specifically comprises taking B samples from the N training samples to form a batch.
Sample x and sample y are fed into FIEnc (see fig. 2 for the structure) to obtain c_x and c_y. The calculation process is shown in formula (1.2), wherein X represents the input image, f is the feature extraction module, Down is the downsampling module, and Res is the residual module.
c=Res(Down(f(X))) (1.2)
The key regions of the face are fed into PSEnc (see fig. 3 for the structure) to obtain the style features s_x and s_y. The process of extracting the makeup style of an image is shown in formula (1.3).
s = concat(E(X*mask_eye), E(X*mask_lip), E(X*mask_face))    (1.3)
Wherein E is PSEnc, X is the input image, mask_eye is the mask of the eye-socket part of the input image, mask_lip is the lip mask of the input image, and mask_face is the face mask of the input image. concat means that the three features are spliced along the feature channel direction. The calculation of E(·) is shown in formula (1.4), wherein mask_item denotes the mask of one of the three key regions, VGG is the pretrained VGG network, and FP is the feature pyramid. After the VGG network extracts the features, the feature pyramid is used to fuse the features of different sizes.
E(X*mask_item) = FP(VGG(X*mask_item))    (1.4)
S20 specifically includes sending s_x and s_y into the MLP to obtain the feature codes code_x and code_y.
The obtained c_x, c_y and code_x, code_y are fed into MFDec (see fig. 4 for the structure) to give x_y, y_x, x_x, y_y; the calculation process is shown in formula (1.5).
out = conv(up(res(Dec(x, MLP(y_code)))))    (1.5)
Where conv is the convolutional layer, up is the upsampling, res is the residual block, and Dec () is the decoder.
S30 specifically includes fixing the parameters of the generator G and calculating the generator losses L_adv^{D_X} and L_adv^{D_Y}, which are used to optimize the discriminator D so that its discrimination capability is enhanced. The calculation process is shown in formula (1.6):

L_adv^{D_X} = E_{x~X,y~Y}[D_x(G(y,x))] - E_{x~X}[D_x(x)],  L_adv^{D_Y} = E_{x~X,y~Y}[D_y(G(x,y))] - E_{y~Y}[D_y(y)]    (1.6)

wherein D_x(·), D_y(·) represent the discriminator outputs on samples from the generated data, and D_x, D_y represent the discriminator outputs on samples from the real data.
Back propagation is then carried out and the discriminator parameters are updated (see fig. 5 for the structure). The two discriminators are used to discriminate the generated makeup image and the makeup-removal image respectively, and the two discriminators are identical in structure.
The parameters of the discriminator D are then fixed and the discriminator losses L_adv^{G_X} and L_adv^{G_Y} are calculated, which are used to optimize the generator G so that its ability to deceive the discriminator D is enhanced. The calculation process is shown in formula (1.7):

L_adv^{G_X} = -E_{x~X,y~Y}[D_x(G(y,x))],  L_adv^{G_Y} = -E_{x~X,y~Y}[D_y(G(x,y))]    (1.7)
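Because equations (1.6) and (1.7) are reconstructed here only from the variable definitions and the WGAN remark above, the following fragment is an assumed illustration of the two adversarial terms rather than the patented formulation; note that the description labels the discriminator-update term the "generator loss" and the generator-update term the "discriminator loss".

```python
def d_loss_sketch(D_x, D_y, x, y, fake_x, fake_y):
    """Assumed form of eq. (1.6), computed with G frozen to optimize D: raise the scores of
    real images and lower those of generated ones (fake_x = G(y, x), fake_y = G(x, y));
    WGAN-style linear scores are assumed since cross entropy is explicitly not adopted."""
    return (D_x(fake_x.detach()).mean() - D_x(x).mean()
            + D_y(fake_y.detach()).mean() - D_y(y).mean())

def g_loss_sketch(D_x, D_y, fake_x, fake_y):
    """Assumed form of eq. (1.7), computed with D frozen to optimize G: make the generated
    images score like real ones."""
    return -(D_x(fake_x).mean() + D_y(fake_y).mean())
```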
S40 specifically includes calculating the identity losses L_idt(x_x, x) and L_idt(y_y, y); the calculation process is shown in formula (1.8). This loss uses the generator to reconstruct x and y, so that the network retains the features of the original images to a greater extent.
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1    (1.8)
The makeup losses for x and y are then calculated; the calculation process is shown in formula (1.9). The effect of this loss is to guide the migration of the makeup in the key regions:

L_makeup = Σ_i ||(G(x,y) - x̃) ⊙ M_{x,i}||_1 + Σ_i ||(G(y,x) - ỹ) ⊙ M_{y,i}||_1    (1.9)
The local vgg losses are calculated; the calculation process is shown in formula (1.10), wherein M_{y,i} is the face mask of the post-makeup image, i is the sequence number of the key regions comprising the eye sockets, the face and the lips, M_{x,i} is defined similarly, and F_l(·) represents the layer-l feature in the vgg network. The effect of this loss is to enhance the retention of semantic information in the key regions:

L_vgg^l = Σ_i ( ||F_l(G(x,y) ⊙ M_{x,i}) - F_l(x ⊙ M_{x,i})||_2 + ||F_l(G(y,x) ⊙ M_{y,i}) - F_l(y ⊙ M_{y,i})||_2 )    (1.10)
The global vgg losses are calculated; the calculation process is shown in formula (1.11). The effect of this loss is to ensure that the generated image is similar to the original image in semantic information:

L_vgg^g = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2    (1.11)
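Under the same caveat as above, the makeup loss and the two vgg losses might be sketched as follows; the mask handling, the choice of VGG layer and the function names are assumptions made for the sketch.

```python
import torch.nn.functional as F

def makeup_loss_sketch(x_y, y_x, pseudo_x, pseudo_y, masks_x, masks_y):
    """Assumed form of eq. (1.9): L1 distance between the generated images and the
    pseudo-paired targets, restricted to each key region. x_y = G(x, y), y_x = G(y, x);
    masks_x / masks_y hold the eye-socket, face and lip masks M_{x,i} / M_{y,i}."""
    loss = 0.0
    for m_x, m_y in zip(masks_x, masks_y):
        loss = loss + F.l1_loss(x_y * m_x, pseudo_x * m_x)
        loss = loss + F.l1_loss(y_x * m_y, pseudo_y * m_y)
    return loss

def local_vgg_loss_sketch(vgg_layer, x_y, x, y_x, y, masks_x, masks_y):
    """Assumed form of eq. (1.10): squared error between the layer-l VGG features of the
    generated and original images inside each key region."""
    loss = 0.0
    for m_x, m_y in zip(masks_x, masks_y):
        loss = loss + F.mse_loss(vgg_layer(x_y * m_x), vgg_layer(x * m_x))
        loss = loss + F.mse_loss(vgg_layer(y_x * m_y), vgg_layer(y * m_y))
    return loss

def global_vgg_loss_sketch(vgg_layer, x_y, x, y_x, y):
    """Eq. (1.11): keep the whole-image VGG semantics of the generated images close to the originals."""
    return F.mse_loss(vgg_layer(x_y), vgg_layer(x)) + F.mse_loss(vgg_layer(y_x), vgg_layer(y))
```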
S50 specifically includes sending x_y and y_x into FIEnc to extract the content features c_{x,fake} and c_{y,fake}; c_{x,fake} and code_x are then fed into MFDec to obtain x_rec, and c_{y,fake} and code_y are fed into MFDec to obtain y_rec.
The reconstruction losses L_cyc(x_rec, x) and L_cyc(y_rec, y) are calculated; the calculation process is shown in formula (1.12), wherein G(G(y,x), y) represents that y is first migrated with reference to the makeup of x and the result is then migrated back with reference to the makeup of y, and G(G(x,y), x) is analogous. The effect of this loss is to guide the network to carry out the overall style migration while preserving the basic features of the original image.
L_cyc = ||G(G(y,x), y) - y||_1 + ||G(G(x,y), x) - x||_1    (1.12)
Finally, back propagation is carried out and the generator parameters are updated.
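Tying S30-S50 together, one training iteration might look like the following sketch; `losses` and `lambdas` are hypothetical containers for the loss functions above and their coefficients λ, neither of which is specified in this form by the patent.

```python
def train_step_sketch(G, D_x, D_y, opt_g, opt_d, x, y, losses, lambdas):
    """One illustrative training iteration covering S30-S50."""
    x_y, y_x = G(x, y), G(y, x)          # makeup transfer and makeup removal

    # S30: update the two discriminators with the generator frozen (eq. 1.6);
    # the adversarial_d sketch detaches the generated images internally.
    opt_d.zero_grad()
    d_loss = losses.adversarial_d(D_x, D_y, x, y, y_x, x_y)
    d_loss.backward()
    opt_d.step()

    # S30-S50: update the generator with the discriminators frozen
    opt_g.zero_grad()
    g_loss = (lambdas["adv"] * losses.adversarial_g(D_x, D_y, y_x, x_y)      # eq. 1.7
              + losses.identity(G, x, y)                                     # eq. 1.8
              + lambdas["makeup"] * losses.makeup(x_y, y_x, x, y)            # eq. 1.9
              + lambdas["l"] * losses.local_vgg(x_y, y_x, x, y)              # eq. 1.10
              + lambdas["g"] * losses.global_vgg(x_y, y_x, x, y)             # eq. 1.11
              + lambdas["cyc"] * losses.reconstruction(G, x_y, y_x, x, y))   # eq. 1.12
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```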
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (8)

1. A makeup style migration method based on an FP-SCGAN model, characterized in that the FP-SCGAN network comprises PSEnc, FIEnc, MFDec and a Markov discriminator, training of the network is carried out in a mutual game between the generator G and the discriminator D, and the network converges when dynamic balance is finally reached, the training specifically comprising the following steps:
S10, obtaining style features: the non-makeup image x and the makeup image y are sent into FIEnc, and the facial features c_x, c_y of the pictures to be migrated are obtained through the feature extraction, downsampling and residual modules; the key regions of the makeup reference images are sent into PSEnc, features are extracted through a pretrained VGG19 network and fused by a feature pyramid, and the style features s_x, s_y are obtained;
S20, obtaining the fused features of the reference makeup image and the image to be migrated: the obtained style features are sent into a multi-layer perceptron, which maps them into a feature space to obtain the style feature codes code_x, code_y; the facial features of the pictures to be migrated and the style feature codes are sent into MFDec, and feature fusion is carried out through AdaIN in the decoder; meanwhile, AdaIN is also used to introduce features into the shallow layers of MFDec, and the fused features x_y, y_x, x_x, y_y of the reference makeup images and the images to be migrated are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: the parameters of the generator G are fixed, the generator loss is calculated, and the discriminator D is optimized so that its discrimination capability is enhanced; back propagation is then carried out and the discriminator parameters are updated, the two discriminators being used to discriminate the generated makeup image and the makeup-removal image respectively and having identical structures; then the parameters of the discriminator D are fixed, the discriminator loss is calculated, and the generator G is optimized so that its ability to deceive the discriminator D is enhanced;
S40, calculating the various losses: these comprise the identity loss, in which the generator reconstructs the image to be migrated; the makeup loss, which guides the migration of the makeup in the key regions; the local vgg loss, which enhances the retention of semantic information in the key regions; and the global vgg loss, which ensures that the generated image is similar to the original image in semantic information;
S50, updating the generator parameters: x_y, y_x are sent into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then c_{x,fake} with code_x and c_{y,fake} with code_y are respectively sent into MFDec to obtain x_rec and y_rec; the reconstruction loss is further calculated, which guides the network to carry out the overall style migration while preserving the basic features of the original image; finally, back propagation is carried out and the generator parameters are updated.
2. The method of claim 1, wherein the formula for calculating the generator loss is:

L_adv^{D_X} = E_{x~X,y~Y}[D_x(G(y,x))] - E_{x~X}[D_x(x)],  L_adv^{D_Y} = E_{x~X,y~Y}[D_y(G(x,y))] - E_{y~Y}[D_y(y)]

wherein E_{x~X} denotes the expectation over the true distribution of non-makeup images; E_{y~Y} denotes the expectation over the true distribution of makeup images; E_{x~X,y~Y} denotes the expectation over the generated images; D_x(·), D_y(·) represent the discriminator outputs on samples from the generated data; D_x, D_y represent the discriminator outputs on samples from the real data; G(x,y) is the migration of x with reference to the makeup of y; G(y,x) is the migration of y with reference to the makeup of x.
3. The method of claim 1, wherein the formula for calculating the discriminator loss is:

L_adv^{G_X} = -E_{x~X,y~Y}[D_x(G(y,x))],  L_adv^{G_Y} = -E_{x~X,y~Y}[D_y(G(x,y))]

wherein D_x(·), D_y(·) represent the discriminator outputs on samples from the generated data, E_{x~X,y~Y} denotes the expectation over the generated images, G(x,y) is the migration of x with reference to the makeup of y, and G(y,x) is the migration of y with reference to the makeup of x.
4. The method of claim 1, wherein the identity loss is calculated by the formula:
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1

wherein G(x,x) is the migration of x with reference to the makeup of x; G(y,y) is the migration of y with reference to the makeup of y; ||·||_1 is the L1 loss, which calculates the absolute error between the real data and the generated data.
5. The method according to claim 1, wherein the calculation formula of the makeup loss is:

L_makeup = Σ_i ||(G(x,y) - x̃) ⊙ M_{x,i}||_1 + Σ_i ||(G(y,x) - ỹ) ⊙ M_{y,i}||_1

wherein x̃ represents the generated pairing data of x, ỹ represents the generated pairing data of y, x represents an image without makeup, y represents an image with makeup, M_{x,i} is the face mask of the pre-makeup image and M_{y,i} is the face mask of the post-makeup image, i being the sequence number of the key regions comprising the three parts of the eye sockets, the face and the lips; G(x,y) represents the migration of x with reference to the makeup of y; G(y,x) represents the migration of y with reference to the makeup of x; ||·||_1 is the L1 loss, which calculates the absolute error between the real data and the generated data.
6. The method of claim 1, wherein the local vgg loss is calculated as:

L_vgg^l = Σ_i ( ||F_l(G(x,y) ⊙ M_{x,i}) - F_l(x ⊙ M_{x,i})||_2 + ||F_l(G(y,x) ⊙ M_{y,i}) - F_l(y ⊙ M_{y,i})||_2 )

wherein M_{x,i} is the face mask of the pre-makeup image and M_{y,i} is the face mask of the post-makeup image, i being the sequence number of the key regions comprising the three parts of the eye sockets, the face and the lips; G(x,y) represents the migration of x with reference to the makeup of y; G(y,x) represents the migration of y with reference to the makeup of x; F_l(·) represents the layer-l feature in the vgg network; ||·||_2 represents the L2 loss, i.e. the squared error between the real data and the generated data.
7. The method of claim 1, wherein the global vgg penalty is calculated by the formula:

L_vgg^g = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2

wherein G(x,y) represents the migration of x with reference to the makeup of y; G(y,x) represents the migration of y with reference to the makeup of x; F_l(·) represents the layer-l feature in the vgg network; ||·||_2 is the L2 loss.
8. The method of claim 1, wherein the reconstruction loss is calculated as:
L_cyc = ||G(G(y,x), y) - y||_1 + ||G(G(x,y), x) - x||_1

wherein G(G(y,x), y) represents that y is first migrated with reference to the makeup of x and the result is then migrated back with reference to the makeup of y; G(G(x,y), x) represents that x is first migrated with reference to the makeup of y and the result is then migrated back with reference to the makeup of x; ||·||_1 is the L1 loss, which calculates the absolute error between the real data and the generated data.
CN202210488449.1A 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model Active CN114863527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210488449.1A CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210488449.1A CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Publications (2)

Publication Number Publication Date
CN114863527A CN114863527A (en) 2022-08-05
CN114863527B true CN114863527B (en) 2024-03-19

Family

ID=82634559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210488449.1A Active CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Country Status (1)

Country Link
CN (1) CN114863527B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464210A (en) * 2017-07-06 2017-12-12 浙江工业大学 An image style transfer method based on a generative adversarial network
CN107644006A (en) * 2017-09-29 2018-01-30 北京大学 An automatic Chinese font library generation method based on a deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10762337B2 (en) * 2018-04-27 2020-09-01 Apple Inc. Face synthesis using generative adversarial networks


Also Published As

Publication number Publication date
CN114863527A (en) 2022-08-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant