CN114863527A - Makeup style migration method based on FP-SCGAN model - Google Patents

Makeup style migration method based on FP-SCGAN model Download PDF

Info

Publication number
CN114863527A
CN114863527A CN202210488449.1A CN202210488449A
Authority
CN
China
Prior art keywords
makeup
image
loss
discriminator
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210488449.1A
Other languages
Chinese (zh)
Other versions
CN114863527B (en)
Inventor
李妹纳
杭丽君
熊攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210488449.1A priority Critical patent/CN114863527B/en
Publication of CN114863527A publication Critical patent/CN114863527A/en
Application granted granted Critical
Publication of CN114863527B publication Critical patent/CN114863527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a makeup style migration method based on an FP-SCGAN model, which combines a feature pyramid with the SCGAN algorithm. The FP-SCGAN network comprises four parts: PSEnc, FIEnc, MFDec, and a Markov discriminator. PSEnc extracts the reference makeup features, FIEnc extracts the facial features of the picture to be migrated, MFDec fuses the facial features of the original picture with the makeup features of the reference picture, and the Markov discriminator measures the distance between the generated distribution and the real distribution. The improved algorithm solves the problems of unnatural edges around the eye sockets and the failure to transfer light eye makeup during makeup transfer, and improves the transfer effect compared with the current mainstream SCGAN makeup transfer algorithm.

Description

Makeup style migration method based on FP-SCGAN model
Technical Field
The invention belongs to the technical field of makeup migration methods, and relates to a makeup style migration method based on an FP-SCGAN model.
Background
Computer vision is one of the most popular research areas in deep learning and is now widely applied in many fields. The development and application of image processing algorithms have accelerated the growth of the short-video industry, and more and more functions such as camera filters, beautification and special effects have appeared, attracting a large number of users. These functions rely heavily on the style migration algorithms within image processing.
The goal of image style migration is to migrate the style of a reference picture onto one or more other pictures. Before neural networks, image style migration followed a common idea: analyze images of a given style to build a mathematical or statistical model, then modify the image to be migrated so that it better fits the established model. This has an obvious drawback: one program can essentially handle only one style or one scene, so practical applications of traditional style migration are very limited. Current style migration algorithms are mainly based on deep learning: a neural network extracts features from the style image and the image to be migrated, the features are fused, and the image is then upsampled and restored to realize the style migration.
At present, many style migration algorithms focus on the migration of face attributes, of which makeup migration is a typical example. GAN-based makeup migration algorithms perform very well among these methods. SCGAN can transfer a reference makeup onto a target image well and still produces a good transfer effect even when the makeup positions differ greatly. However, problems such as unnatural edges around the eye sockets and failure to migrate light makeup easily arise during transfer.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a makeup style migration method based on an FP-SCGAN model, which combines a feature pyramid with SCGAN to form the makeup migration network FP-SCGAN, effectively solving the above problems and improving the migration effect. The method comprises the following steps:
The FP-SCGAN network comprises PSEnc, FIEnc, MFDec and a Markov discriminator. Training proceeds as a mutual game between the generator G and the discriminator D, and the network converges when dynamic balance is finally reached. The training specifically comprises the following steps:
S10, obtaining style features: sending the non-makeup image x and the makeup image y into FIEnc, and obtaining the facial features c_x, c_y of the pictures to be migrated through the feature extraction, down-sampling and residual modules; sending the key areas of the makeup reference images into PSEnc, extracting features through a pre-trained VGG19 network and fusing them through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining the features that fuse the reference makeup image and the image to be migrated: sending the obtained style features into a multilayer perceptron to map them into a feature space, obtaining the style codes code_x, code_y; sending the obtained facial features of the picture to be migrated together with the style codes into MFDec for feature fusion through AdaIN in the decoder; meanwhile, AdaIN is also used in the shallow layers of MFDec to introduce the features, and the fused features x_y, y_x, x_x, y_y are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: fixing the parameters of the generator G, calculating the generator loss, and optimizing the discriminator D so that its discrimination capability is enhanced, then performing back propagation and updating the discriminator parameters, there being two discriminators in total, used respectively for discriminating the generated makeup image and the makeup-removal image and having the same structure; fixing the parameters of the discriminator D, calculating the discriminator loss, and optimizing the generator G so that its ability to deceive the discriminator D is enhanced;
S40, calculating the losses: the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of the key-area makeup; the local vgg loss, which enhances the preservation of key-area semantic information; and the global vgg loss, which ensures that the generated image is semantically similar to the original image;
S50, updating the generator parameters: sending x_y, y_x into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then sending c_{x,fake} with code_x and c_{y,fake} with code_y separately into MFDec to obtain x_{rec} and y_{rec}; further calculating the reconstruction loss, which guides the network to perform overall style migration while preserving the basic features of the original image; and finally performing back propagation and updating the generator parameters.
Preferably, the formula for calculating the loss of the generator is:
L_adv^Dx = E_{x~X}[||D_x(x) - 1||_1] + E_{x~X,y~Y}[||D_x(G(y,x))||_1]
L_adv^Dy = E_{y~Y}[||D_y(y) - 1||_1] + E_{x~X,y~Y}[||D_y(G(x,y))||_1]
where E_{x~X} denotes the expectation over real non-makeup images; E_{y~Y} denotes the expectation over real makeup images; E_{x~X,y~Y} denotes the expectation over generated images; D_x(·), D_y(·) denote the discriminator outputs on generated data; D_x, D_y denote the discriminator outputs on real data; G(x,y) transfers x using the makeup of y as reference; and G(y,x) transfers y using the makeup of x as reference.
Preferably, the formula for calculating the loss of the discriminator is as follows:
L_adv^Gx = E_{x~X,y~Y}[||D_x(G(y,x)) - 1||_1]
L_adv^Gy = E_{x~X,y~Y}[||D_y(G(x,y)) - 1||_1]
where D_x(·), D_y(·) denote the discriminator outputs on generated data, E_{x~X,y~Y} denotes the expectation over generated images, G(x,y) transfers x using the makeup of y as reference, and G(y,x) transfers y using the makeup of x as reference.
Preferably, the calculation formula of the identity loss is as follows:
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1
where G(x,x) transfers x using the makeup of x as reference, G(y,y) transfers y using the makeup of y as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
Preferably, the cosmetic loss is calculated by the following formula:
L_makeup = Σ_i ( ||(G(x,y) - x̃)*M_{x,i}||_1 + ||(G(y,x) - ỹ)*M_{y,i}||_1 )
where x̃ denotes the pairing data generated for x, ỹ denotes the pairing data generated for y, x denotes the non-makeup image, y denotes the makeup image, M_{x,i} denotes the face mask of the pre-makeup image and M_{y,i} denotes the face mask of the makeup image, with i indexing the key areas, namely the eye sockets, the face and the lips; G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
Preferably, the calculation formula of the local vgg loss is as follows:
L_vgg^local = Σ_i ( ||F_l(G(x,y)*M_{x,i}) - F_l(x*M_{x,i})||_2 + ||F_l(G(y,x)*M_{y,i}) - F_l(y*M_{y,i})||_2 )
where M_{x,i} denotes the face mask of the pre-makeup image and M_{y,i} denotes the face mask of the makeup image, with i indexing the key areas, namely the eye sockets, the face and the lips; G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, F_l(·) denotes the layer-l feature of the vgg network, and ||·||_2 denotes the L2 loss, i.e., the squared error between the real data and the generated data.
Preferably, the global vgg penalty is calculated by the formula:
L_vgg^global = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2
where G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, F_l(·) denotes the layer-l feature of the vgg network, and ||·||_2 denotes the L2 loss.
Preferably, the reconstruction loss is calculated by the formula:
L_cyc = ||G(G(y,x),y) - y||_1 + ||G(G(x,y),x) - x||_1
where G(G(y,x),y) means that y is transferred with the makeup of x as reference and then transferred back with the makeup of y as reference, G(G(x,y),x) means that x is transferred with the makeup of y as reference and then transferred back with the makeup of x as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
The invention has the following beneficial effects:
compared with the prior art, the invention provides a makeup style migration method based on an FP-SCGAN model, PSEnc is used for extracting reference makeup features, FIEnc is used for extracting facial features of a picture to be migrated, MFDEC is used for fusing the facial features of an original picture and the makeup features of a reference picture, and a Markov discriminator is used for measuring the distance between a generated distribution and an actual distribution. The improved algorithm can solve the problems that an eye socket has an unnatural edge and light eye makeup cannot be transferred during makeup transfer, and compared with the conventional mainstream SCGAN makeup transfer algorithm, the transfer effect is improved.
Drawings
Fig. 1 is a diagram of an overall FP-SCGAN network structure in a makeup style migration method based on an FP-SCGAN model according to an embodiment of the present invention;
FIG. 2 is a structure diagram of FIEnc in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
FIG. 3 is a structure diagram of PSEnc in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
Fig. 4 is a structure diagram of MFDec in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
FIG. 5 is a structure diagram of the Markov discriminator in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
fig. 6 is a flowchart of steps of a makeup style migration method based on the FP-SCGAN model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
Referring to fig. 1, which is a diagram of the overall network structure of the present invention, the FP-SCGAN network is composed of four parts, which are PSEnc, FIEnc, MFDec and a discriminator.
PSEnc is used to extract the reference makeup features, which include various types of information such as color, texture and edges. The network performs feature extraction followed by feature fusion with a feature pyramid.
The PSEnc network structure is shown in Fig. 3. Because the input makeup reference image contains much information irrelevant to makeup, only the images of the eye sockets, the face and the lips extracted from the reference image are used as input. First, features are extracted from the reference makeup image by a pre-trained VGG19 network, and the conv1_1, conv2_1, conv3_1 and conv4_1 feature maps output by VGG19 are fused with a feature pyramid structure. To enhance the feature extraction capability of the network, the four feature maps extracted by VGG19 undergo convolution processing and feature-map fusion. Four layers of features are output after the pyramid fusion, and the extracted features are then sent into a fully connected layer to map them to a suitable scale for the subsequent AdaIN, as sketched below.
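The following PyTorch fragment is a minimal sketch of such a VGG19-plus-feature-pyramid encoder, not the patented implementation; the lateral 1x1 convolutions, the common channel width, the pooling and the fully connected output size are assumptions, while the conv1_1/conv2_1/conv3_1/conv4_1 tap positions follow torchvision's VGG19 layer ordering.

```python
# Hypothetical sketch of a PSEnc-style encoder: frozen VGG19 taps fused top-down
# in a feature pyramid, then projected to a style vector.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class FeaturePyramidEncoder(nn.Module):
    def __init__(self, out_dim=192, lat_ch=128):
        super().__init__()
        feats = vgg19(weights="IMAGENET1K_V1").features.eval()   # torchvision >= 0.13 weight enum
        for p in feats.parameters():
            p.requires_grad_(False)                     # pre-trained VGG19 stays frozen
        self.vgg = feats
        self.taps = {0: "c1", 5: "c2", 10: "c3", 19: "c4"}   # conv1_1, conv2_1, conv3_1, conv4_1
        self.lateral = nn.ModuleDict({
            "c1": nn.Conv2d(64, lat_ch, 1),
            "c2": nn.Conv2d(128, lat_ch, 1),
            "c3": nn.Conv2d(256, lat_ch, 1),
            "c4": nn.Conv2d(512, lat_ch, 1),
        })
        self.fc = nn.Linear(lat_ch * 4, out_dim)        # maps the fused pyramid to a style feature

    def forward(self, region):                          # region: masked orbit/face/lip image
        feats, h = {}, region
        for idx, layer in enumerate(self.vgg):
            h = layer(h)
            if idx in self.taps:
                feats[self.taps[idx]] = h
            if idx == 19:                               # nothing deeper than conv4_1 is needed
                break
        # top-down pyramid fusion: upsample the deeper level and add the lateral projection
        fused = {"c4": self.lateral["c4"](feats["c4"])}
        prev = fused["c4"]
        for name in ("c3", "c2", "c1"):
            lat = self.lateral[name](feats[name])
            prev = lat + F.interpolate(prev, size=lat.shape[-2:], mode="nearest")
            fused[name] = prev
        pooled = torch.cat([F.adaptive_avg_pool2d(f, 1).flatten(1) for f in fused.values()], dim=1)
        return self.fc(pooled)                          # per-region style feature
```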
The FIEnc network is used to extract the facial features of the picture to be migrated and comprises a feature extraction module, a down-sampling module and a residual module.
The FIEnc network structure is shown in Fig. 2 and includes the feature extraction, down-sampling and residual modules. To preserve the features of the image to be migrated, the network extracts the facial features directly through stacked convolutions. The first two stages raise the feature dimension and down-sample, and the residual module improves the expressive capacity of the network; a sketch follows.
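A minimal PyTorch sketch of an encoder of this shape is given below; it is an assumption-laden illustration (channel widths, normalisation choice and the number of residual blocks are not taken from the patent), intended only to make the three stages concrete.

```python
# Hypothetical FIEnc-style identity encoder: a convolution stem, two stride-2
# downsampling convolutions, then a stack of residual blocks.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class FIEnc(nn.Module):
    def __init__(self, ch=64, n_res=4):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(inplace=True))
        self.down = nn.Sequential(
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(*[ResBlock(ch * 4) for _ in range(n_res)])

    def forward(self, x):        # x: face image to be migrated, shape (B, 3, H, W)
        return self.res(self.down(self.stem(x)))   # facial content feature c
```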
The MFDec network is used to fuse the facial features of the original image with the makeup features of the reference image, and its decoder adopts AdaIN. The network includes residual modules, upsampling and convolution.
The MFDec network structure is shown in Fig. 4. The three feature maps of different scales output by PSEnc are first mapped into different network layers by the MLP, because the features of the reference picture lie in a different feature space from the features of the original picture; mapping them with the MLP places them in a more reasonable feature space. After the mapping, they are fused with the feature map output by FIEnc through AdaIN. AdaIN is also used in the shallow layers of the MFDec network to introduce the features, so that the lighter makeup of the reference can be fully retained. A residual module is adopted in the backbone of the network to improve its expressive capacity, and upsampling and convolution restore the features into an image.
After the multilayer perceptron MLP is adopted, the reference features stay closer to the distribution of the original image while keeping some of their original information, and the migration effect is better.
AdaIN is used in all layers of the MFDec to retain as many features of the reference image as possible, and is expressed as
AdaIN(x, y) = γ * (x - μ) / sqrt(σ² + ε) + α
where x is the feature of the original image, μ is the mean of x in the channel direction, σ² is the variance of x in the channel direction, ε is a very small number, α is the mean of the reference image feature y in the channel direction, and γ is the standard deviation of y in the channel direction.
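Written as code, the formula above corresponds to the following helper; this is a generic AdaIN sketch in PyTorch, with the epsilon value and the tensor layout chosen as assumptions.

```python
# Minimal AdaIN: normalise the content feature x per channel, then scale by the
# reference standard deviation and shift by the reference mean.
import torch

def adain(x, alpha, gamma, eps=1e-5):
    """x: (B, C, H, W) content feature; alpha, gamma: (B, C) reference mean / std."""
    mu = x.mean(dim=(2, 3), keepdim=True)                  # per-channel mean of x
    var = x.var(dim=(2, 3), keepdim=True, unbiased=False)  # per-channel variance of x
    x_hat = (x - mu) / torch.sqrt(var + eps)
    return gamma[:, :, None, None] * x_hat + alpha[:, :, None, None]
```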
The discriminator is used to measure the distance between the generated distribution and the real distribution. Considering that images generated under an ordinary discriminator tend to be blurred, a Markov discriminator is used. The Markov discriminator judges whether each local region is generated, which yields finer detail.
The structure of the discriminator is shown in Fig. 5. SN denotes spectral normalization; this normalization makes the network satisfy Lipschitz continuity, limits drastic changes of the function, and makes the training process more stable. In addition, the discriminator design follows the recommendation in WGAN, and the loss is calculated with an L1 loss instead of a cross-entropy loss. A minimal sketch of such a discriminator follows.
The overall framework of the network is as follows: during the forward pass, the image to be made up is input directly into FIEnc to obtain the image features. The reference image is divided into three parts, namely the eye sockets, the skin and the lips, and then passed through the encoder and a fully connected layer to obtain the style code, which provides the mean and variance used by AdaIN. When the image features obtained from FIEnc pass through the residual layers of MFDec, the mean and variance of the features are shifted towards the values in the style code by the AdaIN in those layers, i.e., the distribution of the features is shifted towards the distribution of the reference image. The picture after makeup transfer is obtained after two upsampling operations.
The training of the network proceeds as a mutual game between the generator G and the discriminator D, and the network converges when dynamic balance is finally reached. The loss function of FP-SCGAN is shown in formula (1.1), where L_adv is the adversarial loss, comprising the generator loss and the discriminator loss, and λ_adv is its loss coefficient; L_cyc is the reconstruction loss and λ_cyc is its loss coefficient; L_vgg^global is the global vgg loss and λ_g is its loss coefficient; L_vgg^local is the local vgg loss and λ_l is its loss coefficient; L_makeup is the makeup loss and λ_makeup is its loss coefficient.
L = λ_adv*L_adv + λ_cyc*L_cyc + λ_g*L_vgg^global + λ_l*L_vgg^local + λ_makeup*L_makeup  (1.1)
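For illustration, formula (1.1) reduces to a plain weighted sum; the helper below sketches it in Python, with placeholder weight values that are assumptions, since the coefficients are not stated here.

```python
# Weighted sum of the FP-SCGAN loss terms in formula (1.1); the default lambda
# values are placeholders, not values disclosed in the patent.
def total_loss(l_adv, l_cyc, l_vgg_global, l_vgg_local, l_makeup,
               lam_adv=1.0, lam_cyc=10.0, lam_g=0.005, lam_l=0.005, lam_makeup=1.0):
    return (lam_adv * l_adv + lam_cyc * l_cyc + lam_g * l_vgg_global
            + lam_l * l_vgg_local + lam_makeup * l_makeup)
```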
The optimization process of the network is expressed as adversarial training of G and D. When the parameters of G are fixed, maximizing the objective over D, max_D L_adv, strengthens the confidence of D in the real samples as much as possible. When the parameters of D are fixed, minimizing over G, min_G L_adv, narrows the gap between the real and generated samples as much as possible. Overall, the training solves the min-max game min_G max_D L(G, D).
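The alternating game can be sketched as a two-phase update per batch, as below; G, D_x, D_y, the optimisers, the batch dictionary keys and the two loss callbacks are stand-ins, not names used in the patent.

```python
# Hypothetical single training step of the G/D game: first optimise the
# discriminators with G frozen, then optimise the generator with D frozen.
import torch

def train_step(G, D_x, D_y, opt_G, opt_D, batch, disc_loss_fn, gen_loss_fn):
    x, y = batch["x"], batch["y"]                  # non-makeup / makeup images
    # Phase 1: G fixed, update the discriminators.
    with torch.no_grad():
        fake_x, fake_y = G(y, x), G(x, y)
    opt_D.zero_grad()
    disc_loss_fn(D_x, D_y, x, y, fake_x, fake_y).backward()
    opt_D.step()
    # Phase 2: D fixed, update the generator.
    opt_G.zero_grad()
    gen_loss_fn(G, D_x, D_y, x, y).backward()
    opt_G.step()
```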
During network training, the specific steps are as follows:
Input: training set {(x_n, y_n, x̃_n, ỹ_n, M_{x,n}, M_{y,n})}, n = 1, ..., N, where x denotes a non-makeup image, y denotes a makeup image, x̃ denotes the pairing data generated for x, ỹ denotes the pairing data generated for y, M_x denotes the face mask of the pre-makeup image, and M_y denotes the face mask of the post-makeup image. The training batch size is B, the amount of training data is N, the learning rate is γ, the number of training iterations is J, and ||·||_1 denotes the L1 loss.
Task: through continuous iterative training on the training set, make the generator and the discriminator converge, thereby achieving makeup transfer.
Referring to fig. 6, the specific steps are:
S10, obtaining style features: sending the non-makeup image x and the makeup image y into FIEnc, and obtaining the facial features c_x, c_y of the pictures to be migrated through the feature extraction, down-sampling and residual modules; sending the key areas of the makeup reference images into PSEnc, extracting features through a pre-trained VGG19 network and fusing them through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining the features that fuse the reference makeup image and the image to be migrated: sending the obtained style features into a multilayer perceptron to map them into a feature space, obtaining the style codes code_x, code_y; sending the obtained facial features of the picture to be migrated together with the style codes into MFDec for feature fusion through AdaIN in the decoder; meanwhile, AdaIN is also used in the shallow layers of MFDec to introduce the features, and the fused features x_y, y_x, x_x, y_y are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: fixing the parameters of the generator G, calculating the generator loss, and optimizing the discriminator D so that its discrimination capability is enhanced, then performing back propagation and updating the discriminator parameters, there being two discriminators in total, used respectively for discriminating the generated makeup image and the makeup-removal image and having the same structure; fixing the parameters of the discriminator D, calculating the discriminator loss, and optimizing the generator G so that its ability to deceive the discriminator D is enhanced;
S40, calculating the losses: the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of the key-area makeup; the local vgg loss, which enhances the preservation of key-area semantic information; and the global vgg loss, which ensures that the generated image is semantically similar to the original image;
S50, updating the generator parameters: sending x_y, y_x into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then sending c_{x,fake} with code_x and c_{y,fake} with code_y separately into MFDec to obtain x_{rec} and y_{rec}; further calculating the reconstruction loss, which guides the network to perform overall style migration while preserving the basic features of the original image; and finally performing back propagation and updating the generator parameters.
S10 specifically includes: taking B samples from the N training samples to form a batch {(x_b, y_b, x̃_b, ỹ_b, M_{x,b}, M_{y,b})}, b = 1, ..., B.
Feed the samples x and y into FIEnc (structure shown in Fig. 2) to obtain c_x and c_y. The calculation process is shown in formula (1.2), where X is the input image, f is the feature extraction module, Down is the down-sampling module, and Res is the residual module.
c=Res(Down(f(X))) (1.2)
Send the key regions of the face into PSEnc (structure shown in Fig. 3) to obtain the style features s_x and s_y. The process of extracting the makeup style of an image is shown in formula (1.3).
s = concat(E(X*mask_eye), E(X*mask_lip), E(X*mask_face))  (1.3)
where E is PSEnc, X is the input image, mask_eye is the eye-socket mask of the input image, mask_lip is the lip mask of the input image, and mask_face is the face mask of the input image. concat denotes splicing the three features along the feature channel direction. E(·) is calculated as shown in formula (1.4), where mask_item stands for the mask of any of the three key regions, VGG is a pre-trained VGG network, and FP is the feature pyramid. After the VGG network extracts the features, the feature pyramid fuses features of different sizes.
E(X*mask_item) = FP(VGG(X*mask_item))  (1.4)
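As a small illustration of formulas (1.3)-(1.4), the region masking and concatenation might look like the following; psenc stands for the FP(VGG(·)) pipeline (for instance the FeaturePyramidEncoder sketched earlier), and the mask dictionary keys are assumptions.

```python
# Sketch of s = concat(E(X*mask_eye), E(X*mask_lip), E(X*mask_face)).
import torch

def extract_style(psenc, image, masks):
    """image: (B, 3, H, W); masks: dict of binary masks 'eye', 'lip', 'face', each (B, 1, H, W)."""
    feats = [psenc(image * masks[k]) for k in ("eye", "lip", "face")]
    return torch.cat(feats, dim=1)   # style feature s, spliced along the channel direction
```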
S20 specifically includes: sending s_x and s_y into the MLP to obtain the feature codes code_x and code_y.
The obtained c_x and c_y together with code_x and code_y are fed into MFDec (structure shown in Fig. 4) to obtain x_y, y_x, x_x, y_y. The calculation process is shown in formula (1.5).
out = conv(up(res(Dec(x, MLP(y_code)))))  (1.5)
where conv is a convolutional layer, up is upsampling, res is a residual module, and Dec(·) is the decoder; a sketch follows.
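The decoder call chain of formula (1.5) might be organised as in the sketch below, where an MLP produces the style code, AdaIN-conditioned residual blocks fuse it with the content feature, and two upsampling stages plus a final convolution restore the image; all dimensions, block counts and the compact adain helper are assumptions consistent with the earlier sketches, not the patented layout.

```python
# Hypothetical MFDec-style decoder following out = conv(up(res(Dec(x, MLP(y_code))))).
import torch
import torch.nn as nn

def adain(x, alpha, gamma, eps=1e-5):        # same helper as sketched earlier, kept compact
    mu = x.mean(dim=(2, 3), keepdim=True)
    std = (x.var(dim=(2, 3), keepdim=True, unbiased=False) + eps).sqrt()
    return gamma[:, :, None, None] * (x - mu) / std + alpha[:, :, None, None]

class AdaINResBlock(nn.Module):
    def __init__(self, ch, style_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.affine = nn.Linear(style_dim, ch * 2)    # predicts (alpha, gamma) from the style code

    def forward(self, x, code):
        alpha, gamma = self.affine(code).chunk(2, dim=1)
        h = adain(torch.relu(self.conv1(x)), alpha, gamma)
        h = adain(torch.relu(self.conv2(h)), alpha, gamma)
        return x + h

class MFDec(nn.Module):
    def __init__(self, ch=256, style_dim=192, n_res=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(style_dim * 3, style_dim), nn.ReLU(inplace=True))
        self.res = nn.ModuleList([AdaINResBlock(ch, style_dim) for _ in range(n_res)])
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(ch, ch // 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2), nn.Conv2d(ch // 2, ch // 4, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.out = nn.Conv2d(ch // 4, 3, 7, padding=3)

    def forward(self, c, s):      # c: content feature from FIEnc, s: concatenated style feature
        code = self.mlp(s)
        for block in self.res:
            c = block(c, code)
        return torch.tanh(self.out(self.up(c)))
```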
S30 specifically includes: fixing the parameters of the generator G and calculating the generator losses L_adv^Dx and L_adv^Dy, which are used to optimize the discriminator D so that its discrimination capability is enhanced. The calculation process is shown in formula (1.6), where D_x(·), D_y(·) denote the discriminator outputs on generated data, and D_x, D_y denote the discriminator outputs on real data.
L_adv^Dx = E_{x~X}[||D_x(x) - 1||_1] + E_{x~X,y~Y}[||D_x(G(y,x))||_1]
L_adv^Dy = E_{y~Y}[||D_y(y) - 1||_1] + E_{x~X,y~Y}[||D_y(G(x,y))||_1]  (1.6)
Back propagation is then performed and the discriminator parameters are updated (structure shown in Fig. 5). The two discriminators are used to discriminate the generated makeup image and the makeup-removal image respectively, and they are identical in structure.
Fixing the parameters of the discriminator D, the discriminator losses L_adv^Gx and L_adv^Gy are calculated and used to optimize the generator G so that its ability to deceive the discriminator D is enhanced. The calculation process is shown in formula (1.7).
L_adv^Gx = E_{x~X,y~Y}[||D_x(G(y,x)) - 1||_1]
L_adv^Gy = E_{x~X,y~Y}[||D_y(G(x,y)) - 1||_1]  (1.7)
S40 specifically includes: calculating the identity losses L_idt(x_x, x) and L_idt(y_y, y); the calculation process is shown in formula (1.8). This loss uses the generator to reconstruct x and y, so that the network retains the characteristics of the original images to a greater extent.
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1  (1.8)
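In code, formula (1.8) is a pair of L1 reconstructions; a minimal sketch:

```python
# Identity loss: reconstruct each image using its own makeup as the reference.
import torch.nn.functional as F

def identity_loss(G, x, y):
    return F.l1_loss(G(x, x), x) + F.l1_loss(G(y, y), y)
```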
The makeup losses for the two transfer directions are calculated; the calculation process is shown in formula (1.9). The effect of this loss is to guide the migration of the key-area makeup.
L_makeup = Σ_i ( ||(G(x,y) - x̃)*M_{x,i}||_1 + ||(G(y,x) - ỹ)*M_{y,i}||_1 )  (1.9)
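A sketch of this masked L1 penalty is given below; the per-region looping and the way the pairing data enters are assumptions consistent with the variable definitions above, not the exact patented formula.

```python
# Hypothetical makeup loss: L1 distance to the pairing data inside each key-region mask.
import torch.nn.functional as F

def makeup_loss(fake_y, x_pair, masks_x, fake_x, y_pair, masks_y):
    """fake_y = G(x, y), fake_x = G(y, x); *_pair are the pairing data;
    masks_x / masks_y are lists of orbit, face and lip masks."""
    loss = 0.0
    for m_x, m_y in zip(masks_x, masks_y):
        loss = loss + F.l1_loss(fake_y * m_x, x_pair * m_x) \
                    + F.l1_loss(fake_x * m_y, y_pair * m_y)
    return loss
```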
The local vgg losses for the two transfer directions are calculated; the calculation process is shown in formula (1.10), where M_{y,i} denotes the face mask of the makeup image, i indexes the key areas (the eye sockets, the face and the lips), M_{x,i} is the corresponding mask of the pre-makeup image, and F_l(·) denotes the layer-l feature of the vgg network. The effect of this loss is to enhance the retention of semantic information in the key regions.
L_vgg^local = Σ_i ( ||F_l(G(x,y)*M_{x,i}) - F_l(x*M_{x,i})||_2 + ||F_l(G(y,x)*M_{y,i}) - F_l(y*M_{y,i})||_2 )  (1.10)
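A sketch of the masked perceptual comparison follows; vgg_l stands for a callable returning the layer-l VGG feature map F_l(·), and the choice of layer and masking scheme are assumptions.

```python
# Hypothetical local vgg loss: compare layer-l VGG features of each masked key region.
import torch.nn.functional as F

def local_vgg_loss(vgg_l, fake_y, x, masks_x, fake_x, y, masks_y):
    loss = 0.0
    for m_x, m_y in zip(masks_x, masks_y):
        loss = loss + F.mse_loss(vgg_l(fake_y * m_x), vgg_l(x * m_x)) \
                    + F.mse_loss(vgg_l(fake_x * m_y), vgg_l(y * m_y))
    return loss
```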
The global vgg losses for the two transfer directions are calculated; the calculation process is shown in formula (1.11). The effect of this loss is to ensure that the generated image is semantically similar to the original image.
L_vgg^global = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2  (1.11)
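The whole-image counterpart is a plain perceptual loss; a minimal sketch with the same vgg_l stand-in:

```python
# Global vgg loss: the generated image should keep the layer-l semantics of its source image.
import torch.nn.functional as F

def global_vgg_loss(vgg_l, fake_y, x, fake_x, y):
    return F.mse_loss(vgg_l(fake_y), vgg_l(x)) + F.mse_loss(vgg_l(fake_x), vgg_l(y))
```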
S50 specifically includes: sending x_y and y_x into FIEnc to extract the content features c_{x,fake} and c_{y,fake}; feeding c_{x,fake} together with code_x into MFDec to obtain x_{rec}, and feeding c_{y,fake} together with code_y into MFDec to obtain y_{rec}.
The reconstruction losses L_cyc(x_{rec}, x) and L_cyc(y_{rec}, y) are calculated; the calculation process is shown in formula (1.12), where G(G(y,x), y) means that y is transferred with the makeup of x as reference and then transferred back with the makeup of y as reference, and G(G(x,y), x) is analogous. This loss guides the network to perform overall style migration while preserving the basic features of the original image.
L_cyc = ||G(G(y,x),y) - y||_1 + ||G(G(x,y),x) - x||_1  (1.12)
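In code, formula (1.12) is a cycle-style double transfer; a minimal sketch:

```python
# Reconstruction (cycle) loss: transferring back with the original makeup as
# reference should recover the original image.
import torch.nn.functional as F

def reconstruction_loss(G, x, y):
    return F.l1_loss(G(G(y, x), y), y) + F.l1_loss(G(G(x, y), x), x)
```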
Finally, back propagation is performed and the generator parameters are updated.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A makeup style migration method based on FP-SCGAN model is characterized in that an FP-SCGAN network comprises PSEnc, FIEnc, MFDec and a Markov discriminator, the training of the network is carried out in the mutual game of a generator G and a discriminator D, and the network converges when the dynamic balance is finally achieved, the training specifically comprises the following steps:
S10, obtaining style features: sending the non-makeup image x and the makeup image y into FIEnc, and obtaining the facial features c_x, c_y of the pictures to be migrated through the feature extraction, down-sampling and residual modules; sending the key areas of the makeup reference images into PSEnc, extracting features through a pre-trained VGG19 network and fusing them through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining the features that fuse the reference makeup image and the image to be migrated: sending the obtained style features into a multilayer perceptron to map them into a feature space, obtaining the style codes code_x, code_y; sending the obtained facial features of the picture to be migrated together with the style codes into MFDec for feature fusion through AdaIN in the decoder; meanwhile, AdaIN is also used in the shallow layers of MFDec to introduce the features, and the fused features x_y, y_x, x_x, y_y are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: fixing the parameters of the generator G, calculating the generator loss, and optimizing the discriminator D so that its discrimination capability is enhanced, then performing back propagation and updating the discriminator parameters, there being two discriminators in total, used respectively for discriminating the generated makeup image and the makeup-removal image and having the same structure; fixing the parameters of the discriminator D, calculating the discriminator loss, and optimizing the generator G so that its ability to deceive the discriminator D is enhanced;
S40, calculating the losses: the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of the key-area makeup; the local vgg loss, which enhances the preservation of key-area semantic information; and the global vgg loss, which ensures that the generated image is semantically similar to the original image;
S50, updating the generator parameters: sending x_y, y_x into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then sending c_{x,fake} with code_x and c_{y,fake} with code_y separately into MFDec to obtain x_{rec} and y_{rec}; further calculating the reconstruction loss, which guides the network to perform overall style migration while preserving the basic features of the original image; and finally performing back propagation and updating the generator parameters.
2. The method of claim 1, wherein the formula for calculating the generator loss is:
L_adv^Dx = E_{x~X}[||D_x(x) - 1||_1] + E_{x~X,y~Y}[||D_x(G(y,x))||_1]
L_adv^Dy = E_{y~Y}[||D_y(y) - 1||_1] + E_{x~X,y~Y}[||D_y(G(x,y))||_1]
wherein E_{x~X} denotes the expectation over real non-makeup images; E_{y~Y} denotes the expectation over real makeup images; E_{x~X,y~Y} denotes the expectation over generated images; D_x(·), D_y(·) denote the discriminator outputs on generated data; D_x, D_y denote the discriminator outputs on real data; G(x,y) transfers x using the makeup of y as reference; and G(y,x) transfers y using the makeup of x as reference.
3. The method of claim 1, wherein the formula for calculating the discriminant loss is:
L_adv^Gx = E_{x~X,y~Y}[||D_x(G(y,x)) - 1||_1]
L_adv^Gy = E_{x~X,y~Y}[||D_y(G(x,y)) - 1||_1]
wherein D_x(·), D_y(·) denote the discriminator outputs on generated data, E_{x~X,y~Y} denotes the expectation over generated images, G(x,y) transfers x using the makeup of y as reference, and G(y,x) transfers y using the makeup of x as reference.
4. The method of claim 1, wherein the identity loss is calculated by the formula:
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1
wherein G(x,x) transfers x using the makeup of x as reference, G(y,y) transfers y using the makeup of y as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
5. The method of claim 1, wherein the cosmetic loss is calculated by the formula:
L_makeup = Σ_i ( ||(G(x,y) - x̃)*M_{x,i}||_1 + ||(G(y,x) - ỹ)*M_{y,i}||_1 )
wherein x̃ denotes the pairing data generated for x, ỹ denotes the pairing data generated for y, x denotes the non-makeup image, y denotes the makeup image, M_{x,i} denotes the face mask of the pre-makeup image and M_{y,i} denotes the face mask of the makeup image, with i indexing the key areas, namely the eye sockets, the face and the lips; G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
6. The method of claim 1, wherein the local vgg loss is calculated by the formula:
L_vgg^local = Σ_i ( ||F_l(G(x,y)*M_{x,i}) - F_l(x*M_{x,i})||_2 + ||F_l(G(y,x)*M_{y,i}) - F_l(y*M_{y,i})||_2 )
wherein M_{x,i} denotes the face mask of the pre-makeup image and M_{y,i} denotes the face mask of the makeup image, with i indexing the key areas, namely the eye sockets, the face and the lips; G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, F_l(·) denotes the layer-l feature of the vgg network, and ||·||_2 denotes the L2 loss, i.e., the squared error between the real data and the generated data.
7. The method of claim 1, wherein the global vgg penalty is calculated by:
L_vgg^global = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2
wherein G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, F_l(·) denotes the layer-l feature of the vgg network, and ||·||_2 denotes the L2 loss.
8. The method of claim 1, wherein the reconstruction loss is calculated by the formula:
L_cyc = ||G(G(y,x),y) - y||_1 + ||G(G(x,y),x) - x||_1
wherein G(G(y,x),y) means that y is transferred with the makeup of x as reference and then transferred back with the makeup of y as reference, G(G(x,y),x) means that x is transferred with the makeup of y as reference and then transferred back with the makeup of x as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
CN202210488449.1A 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model Active CN114863527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210488449.1A CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210488449.1A CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Publications (2)

Publication Number Publication Date
CN114863527A true CN114863527A (en) 2022-08-05
CN114863527B CN114863527B (en) 2024-03-19

Family

ID=82634559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210488449.1A Active CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Country Status (1)

Country Link
CN (1) CN114863527B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464210A (en) * 2017-07-06 2017-12-12 Zhejiang University of Technology An image style transfer method based on a generative adversarial network
CN107644006A (en) * 2017-09-29 2018-01-30 Peking University An automatic Chinese font library generation method based on a deep neural network
US20190332850A1 (en) * 2018-04-27 2019-10-31 Apple Inc. Face Synthesis Using Generative Adversarial Networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464210A (en) * 2017-07-06 2017-12-12 Zhejiang University of Technology An image style transfer method based on a generative adversarial network
CN107644006A (en) * 2017-09-29 2018-01-30 Peking University An automatic Chinese font library generation method based on a deep neural network
US20190332850A1 (en) * 2018-04-27 2019-10-31 Apple Inc. Face Synthesis Using Generative Adversarial Networks

Also Published As

Publication number Publication date
CN114863527B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
TWI779969B (en) Image processing method, processor, electronic device and computer-readable storage medium
CN112233038B (en) True image denoising method based on multi-scale fusion and edge enhancement
Liu et al. Detach and adapt: Learning cross-domain disentangled deep representation
CN110322416B (en) Image data processing method, apparatus and computer readable storage medium
CN107993238A (en) A kind of head-and-shoulder area image partition method and device based on attention model
CN111767906B (en) Face detection model training method, face detection device and electronic equipment
CN111754596A (en) Editing model generation method, editing model generation device, editing method, editing device, editing equipment and editing medium
CN113487618B (en) Portrait segmentation method, portrait segmentation device, electronic equipment and storage medium
CN111724400A (en) Automatic video matting method and system
WO2023066173A1 (en) Image processing method and apparatus, and storage medium and electronic device
WO2022148248A1 (en) Image processing model training method, image processing method and apparatus, electronic device, and computer program product
WO2021127916A1 (en) Facial emotion recognition method, smart device and computer-readabel storage medium
Sun et al. Masked lip-sync prediction by audio-visual contextual exploitation in transformers
Hu et al. Dear-gan: Degradation-aware face restoration with gan prior
CN113837290A (en) Unsupervised unpaired image translation method based on attention generator network
Xiao et al. Image hazing algorithm based on generative adversarial networks
CN114863527A (en) Dressing style migration method based on FP-SCGAN model
CN116342377A (en) Self-adaptive generation method and system for camouflage target image in degraded scene
WO2022252372A1 (en) Image processing method, apparatus and device, and computer-readable storage medium
Wang et al. MetaScleraSeg: an effective meta-learning framework for generalized sclera segmentation
CN114049303A (en) Progressive bone age assessment method based on multi-granularity feature fusion
Yoo et al. FastSwap: A Lightweight One-Stage Framework for Real-Time Face Swapping
WO2024099026A1 (en) Image processing method and apparatus, device, storage medium and program product
Wu et al. Semantic image inpainting based on generative adversarial networks
CN117275069B (en) End-to-end head gesture estimation method based on learnable vector and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant