CN114863527A - Makeup style migration method based on FP-SCGAN model - Google Patents

Makeup style migration method based on FP-SCGAN model Download PDF

Info

Publication number
CN114863527A
CN114863527A CN202210488449.1A CN202210488449A
Authority
CN
China
Prior art keywords
makeup
image
loss
discriminator
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210488449.1A
Other languages
Chinese (zh)
Other versions
CN114863527B (en)
Inventor
李妹纳
杭丽君
熊攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210488449.1A priority Critical patent/CN114863527B/en
Publication of CN114863527A publication Critical patent/CN114863527A/en
Application granted granted Critical
Publication of CN114863527B publication Critical patent/CN114863527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a makeup style migration method based on an FP-SCGAN model, which combines a feature pyramid with the SCGAN algorithm. The FP-SCGAN network comprises four parts: PSEnc, FIEnc, MFDec, and a Markov discriminator. PSEnc extracts the reference makeup features, FIEnc extracts the facial features of the picture to be migrated, MFDec fuses the facial features of the original picture with the makeup features of the reference picture, and the Markov discriminator measures the distance between the generated distribution and the real distribution. The improved algorithm solves the problems of unnatural edges around the eye sockets and the failure to transfer light eye makeup during makeup transfer, and improves the transfer effect compared with the current mainstream SCGAN makeup transfer algorithm.

Description

Makeup style migration method based on FP-SCGAN model
Technical Field
The invention belongs to the technical field of makeup migration methods, and relates to a makeup style migration method based on an FP-SCGAN model.
Background
Computer vision is one of the most popular research areas in deep learning and is now widely applied in many fields. The development and application of image processing algorithms have accelerated the growth of the short-video industry, and more and more functions such as camera filters, beautification and special effects have appeared, attracting a large number of users. These functions rely heavily on the style migration algorithms within image processing.
The goal of image style migration is to migrate the style of a reference picture onto one or more other pictures. Before neural networks, image style migration followed a common idea: analyze images of a given style to build a mathematical or statistical model, then modify the image to be migrated so that it better fits the established model. This has an obvious drawback: one program can essentially handle only one style or one scene, so practical applications of traditional style migration are very limited. Current style migration algorithms are mainly based on deep learning: a neural network extracts features from the style image and the image to be migrated, the features are fused, and the image is then upsampled and restored to realize the style migration.
At present, many style migration algorithms focus on the migration of face attributes, of which makeup migration is a typical example. GAN-based makeup migration algorithms perform very well among these methods. SCGAN can transfer a reference makeup onto a target image well and still produces a good transfer effect even when the makeup positions differ greatly. However, problems such as unnatural edges around the eye sockets and failure to migrate light makeup easily arise during transfer.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a makeup style migration method based on an FP-SCGAN model, which combines a feature pyramid with SCGAN to form the makeup migration network FP-SCGAN, effectively solving the above problems and improving the migration effect. The method comprises the following steps:
The FP-SCGAN network comprises PSEnc, FIEnc, MFDec and a Markov discriminator. Training proceeds as a mutual game between the generator G and the discriminator D, and the network converges when dynamic balance is finally reached. The training specifically comprises the following steps:
S10, obtaining style features: sending the non-makeup image x and the makeup image y into FIEnc, and obtaining the facial features c_x, c_y of the pictures to be migrated through the feature extraction, down-sampling and residual modules; sending the key areas of the makeup reference images into PSEnc, extracting features through a pre-trained VGG19 network and fusing them through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining the features that fuse the reference makeup image and the image to be migrated: sending the obtained style features into a multilayer perceptron to map them into a feature space, obtaining the style codes code_x, code_y; sending the obtained facial features of the picture to be migrated together with the style codes into MFDec for feature fusion through AdaIN in the decoder; meanwhile, AdaIN is also used in the shallow layers of MFDec to introduce the features, and the fused features x_y, y_x, x_x, y_y are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: fixing the parameters of the generator G, calculating the generator loss, and optimizing the discriminator D so that its discrimination capability is enhanced, then performing back propagation and updating the discriminator parameters, there being two discriminators in total, used respectively for discriminating the generated makeup image and the makeup-removal image and having the same structure; fixing the parameters of the discriminator D, calculating the discriminator loss, and optimizing the generator G so that its ability to deceive the discriminator D is enhanced;
S40, calculating the losses: the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of the key-area makeup; the local vgg loss, which enhances the preservation of key-area semantic information; and the global vgg loss, which ensures that the generated image is semantically similar to the original image;
S50, updating the generator parameters: sending x_y, y_x into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then sending c_{x,fake} with code_x and c_{y,fake} with code_y separately into MFDec to obtain x_{rec} and y_{rec}; further calculating the reconstruction loss, which guides the network to perform overall style migration while preserving the basic features of the original image; and finally performing back propagation and updating the generator parameters.
Preferably, the formula for calculating the loss of the generator is:
L_adv^Dx = E_{x~X}[||D_x(x) - 1||_1] + E_{x~X,y~Y}[||D_x(G(y,x))||_1]
L_adv^Dy = E_{y~Y}[||D_y(y) - 1||_1] + E_{x~X,y~Y}[||D_y(G(x,y))||_1]
where E_{x~X} denotes the expectation over real non-makeup images; E_{y~Y} denotes the expectation over real makeup images; E_{x~X,y~Y} denotes the expectation over generated images; D_x(·), D_y(·) denote the discriminator outputs on generated data; D_x, D_y denote the discriminator outputs on real data; G(x,y) transfers x using the makeup of y as reference; and G(y,x) transfers y using the makeup of x as reference.
Preferably, the formula for calculating the loss of the discriminator is as follows:
L_adv^Gx = E_{x~X,y~Y}[||D_x(G(y,x)) - 1||_1]
L_adv^Gy = E_{x~X,y~Y}[||D_y(G(x,y)) - 1||_1]
where D_x(·), D_y(·) denote the discriminator outputs on generated data, E_{x~X,y~Y} denotes the expectation over generated images, G(x,y) transfers x using the makeup of y as reference, and G(y,x) transfers y using the makeup of x as reference.
Preferably, the calculation formula of the identity loss is as follows:
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1
where G(x,x) transfers x using the makeup of x as reference, G(y,y) transfers y using the makeup of y as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
Preferably, the cosmetic loss is calculated by the following formula:
L_makeup = Σ_i ( ||(G(x,y) - x̃)*M_{x,i}||_1 + ||(G(y,x) - ỹ)*M_{y,i}||_1 )
where x̃ denotes the pairing data generated for x, ỹ denotes the pairing data generated for y, x denotes the non-makeup image, y denotes the makeup image, M_{x,i} denotes the face mask of the pre-makeup image and M_{y,i} denotes the face mask of the makeup image, with i indexing the key areas, namely the eye sockets, the face and the lips; G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
Preferably, the calculation formula of the local vgg loss is as follows:
L_vgg^local = Σ_i ( ||F_l(G(x,y)*M_{x,i}) - F_l(x*M_{x,i})||_2 + ||F_l(G(y,x)*M_{y,i}) - F_l(y*M_{y,i})||_2 )
where M_{x,i} denotes the face mask of the pre-makeup image and M_{y,i} denotes the face mask of the makeup image, with i indexing the key areas, namely the eye sockets, the face and the lips; G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, F_l(·) denotes the layer-l feature of the vgg network, and ||·||_2 denotes the L2 loss, i.e., the squared error between the real data and the generated data.
Preferably, the global vgg penalty is calculated by the formula:
L_vgg^global = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2
where G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, F_l(·) denotes the layer-l feature of the vgg network, and ||·||_2 denotes the L2 loss.
Preferably, the reconstruction loss is calculated by the formula:
L_cyc = ||G(G(y,x),y) - y||_1 + ||G(G(x,y),x) - x||_1
where G(G(y,x),y) means that y is transferred with the makeup of x as reference and then transferred back with the makeup of y as reference, G(G(x,y),x) means that x is transferred with the makeup of y as reference and then transferred back with the makeup of x as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
The invention has the following beneficial effects:
compared with the prior art, the invention provides a makeup style migration method based on an FP-SCGAN model, PSEnc is used for extracting reference makeup features, FIEnc is used for extracting facial features of a picture to be migrated, MFDEC is used for fusing the facial features of an original picture and the makeup features of a reference picture, and a Markov discriminator is used for measuring the distance between a generated distribution and an actual distribution. The improved algorithm can solve the problems that an eye socket has an unnatural edge and light eye makeup cannot be transferred during makeup transfer, and compared with the conventional mainstream SCGAN makeup transfer algorithm, the transfer effect is improved.
Drawings
Fig. 1 is a diagram of an overall FP-SCGAN network structure in a makeup style migration method based on an FP-SCGAN model according to an embodiment of the present invention;
FIG. 2 is a structure diagram of FIEnc in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
FIG. 3 is a structure diagram of PSEnc in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
Fig. 4 is a structure diagram of MFDec in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
FIG. 5 is a structure diagram of the Markov discriminator in the makeup style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
fig. 6 is a flowchart of steps of a makeup style migration method based on the FP-SCGAN model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
Referring to fig. 1, which is a diagram of the overall network structure of the present invention, the FP-SCGAN network is composed of four parts, which are PSEnc, FIEnc, MFDec and a discriminator.
PSEnc is used to extract the reference makeup features, which include various types of information such as color, texture and edges. The network performs feature extraction followed by feature fusion with a feature pyramid.
The PSEnc network structure is shown in Fig. 3. Because the input makeup reference image contains much information irrelevant to makeup, only the images of the eye sockets, the face and the lips extracted from the reference image are used as input. First, features are extracted from the reference makeup image by a pre-trained VGG19 network, and the conv1_1, conv2_1, conv3_1 and conv4_1 feature maps output by VGG19 are fused with a feature pyramid structure. To enhance the feature extraction capability of the network, the four feature maps extracted by VGG19 undergo convolution processing and feature-map fusion. Four layers of features are output after the pyramid fusion, and the extracted features are then sent into a fully connected layer to map them to a suitable scale for the subsequent AdaIN, as sketched below.
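The following PyTorch fragment is a minimal sketch of such a VGG19-plus-feature-pyramid encoder, not the patented implementation; the lateral 1x1 convolutions, the common channel width, the pooling and the fully connected output size are assumptions, while the conv1_1/conv2_1/conv3_1/conv4_1 tap positions follow torchvision's VGG19 layer ordering.

```python
# Hypothetical sketch of a PSEnc-style encoder: frozen VGG19 taps fused top-down
# in a feature pyramid, then projected to a style vector.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class FeaturePyramidEncoder(nn.Module):
    def __init__(self, out_dim=192, lat_ch=128):
        super().__init__()
        feats = vgg19(weights="IMAGENET1K_V1").features.eval()   # torchvision >= 0.13 weight enum
        for p in feats.parameters():
            p.requires_grad_(False)                     # pre-trained VGG19 stays frozen
        self.vgg = feats
        self.taps = {0: "c1", 5: "c2", 10: "c3", 19: "c4"}   # conv1_1, conv2_1, conv3_1, conv4_1
        self.lateral = nn.ModuleDict({
            "c1": nn.Conv2d(64, lat_ch, 1),
            "c2": nn.Conv2d(128, lat_ch, 1),
            "c3": nn.Conv2d(256, lat_ch, 1),
            "c4": nn.Conv2d(512, lat_ch, 1),
        })
        self.fc = nn.Linear(lat_ch * 4, out_dim)        # maps the fused pyramid to a style feature

    def forward(self, region):                          # region: masked orbit/face/lip image
        feats, h = {}, region
        for idx, layer in enumerate(self.vgg):
            h = layer(h)
            if idx in self.taps:
                feats[self.taps[idx]] = h
            if idx == 19:                               # nothing deeper than conv4_1 is needed
                break
        # top-down pyramid fusion: upsample the deeper level and add the lateral projection
        fused = {"c4": self.lateral["c4"](feats["c4"])}
        prev = fused["c4"]
        for name in ("c3", "c2", "c1"):
            lat = self.lateral[name](feats[name])
            prev = lat + F.interpolate(prev, size=lat.shape[-2:], mode="nearest")
            fused[name] = prev
        pooled = torch.cat([F.adaptive_avg_pool2d(f, 1).flatten(1) for f in fused.values()], dim=1)
        return self.fc(pooled)                          # per-region style feature
```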
The FIEnc network is used to extract the facial features of the picture to be migrated and comprises a feature extraction module, a down-sampling module and a residual module.
The FIEnc network structure is shown in Fig. 2 and includes the feature extraction, down-sampling and residual modules. To preserve the features of the image to be migrated, the network extracts the facial features directly through stacked convolutions. The first two stages raise the feature dimension and down-sample, and the residual module improves the expressive capacity of the network; a sketch follows.
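A minimal PyTorch sketch of an encoder of this shape is given below; it is an assumption-laden illustration (channel widths, normalisation choice and the number of residual blocks are not taken from the patent), intended only to make the three stages concrete.

```python
# Hypothetical FIEnc-style identity encoder: a convolution stem, two stride-2
# downsampling convolutions, then a stack of residual blocks.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class FIEnc(nn.Module):
    def __init__(self, ch=64, n_res=4):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(inplace=True))
        self.down = nn.Sequential(
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(*[ResBlock(ch * 4) for _ in range(n_res)])

    def forward(self, x):        # x: face image to be migrated, shape (B, 3, H, W)
        return self.res(self.down(self.stem(x)))   # facial content feature c
```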
The MFDec network is used to fuse the facial features of the original image with the makeup features of the reference image, and its decoder adopts AdaIN. The network includes residual modules, upsampling and convolution.
The MFDec network structure is shown in Fig. 4. The three feature maps of different scales output by PSEnc are first mapped into different network layers by the MLP, because the features of the reference picture lie in a different feature space from the features of the original picture; mapping them with the MLP places them in a more reasonable feature space. After the mapping, they are fused with the feature map output by FIEnc through AdaIN. AdaIN is also used in the shallow layers of the MFDec network to introduce the features, so that the lighter makeup of the reference can be fully retained. A residual module is adopted in the backbone of the network to improve its expressive capacity, and upsampling and convolution restore the features into an image.
After the multilayer perceptron MLP is adopted, the reference features stay closer to the distribution of the original image while keeping some of their original information, and the migration effect is better.
AdaIN is used in all layers of the MFDec to retain as many features of the reference image as possible, and is expressed as
AdaIN(x, y) = γ * (x - μ) / sqrt(σ² + ε) + α
where x is the feature of the original image, μ is the mean of x in the channel direction, σ² is the variance of x in the channel direction, ε is a very small number, α is the mean of the reference image feature y in the channel direction, and γ is the standard deviation of y in the channel direction.
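Written as code, the formula above corresponds to the following helper; this is a generic AdaIN sketch in PyTorch, with the epsilon value and the tensor layout chosen as assumptions.

```python
# Minimal AdaIN: normalise the content feature x per channel, then scale by the
# reference standard deviation and shift by the reference mean.
import torch

def adain(x, alpha, gamma, eps=1e-5):
    """x: (B, C, H, W) content feature; alpha, gamma: (B, C) reference mean / std."""
    mu = x.mean(dim=(2, 3), keepdim=True)                  # per-channel mean of x
    var = x.var(dim=(2, 3), keepdim=True, unbiased=False)  # per-channel variance of x
    x_hat = (x - mu) / torch.sqrt(var + eps)
    return gamma[:, :, None, None] * x_hat + alpha[:, :, None, None]
```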
The discriminator is used to measure the distance between the generated distribution and the real distribution. Considering that images generated under an ordinary discriminator tend to be blurred, a Markov discriminator is used. The Markov discriminator judges whether each local region is generated, which yields finer detail.
The structure of the discriminator is shown in Fig. 5. SN denotes spectral normalization; this normalization makes the network satisfy Lipschitz continuity, limits drastic changes of the function, and makes the training process more stable. In addition, the discriminator design follows the recommendation in WGAN, and the loss is calculated with an L1 loss instead of a cross-entropy loss. A minimal sketch of such a discriminator follows.
The overall framework of the network is as follows: during the forward pass, the image to be made up is input directly into FIEnc to obtain the image features. The reference image is divided into three parts, namely the eye sockets, the skin and the lips, and then passed through the encoder and a fully connected layer to obtain the style code, which provides the mean and variance used by AdaIN. When the image features obtained from FIEnc pass through the residual layers of MFDec, the mean and variance of the features are shifted towards the values in the style code by the AdaIN in those layers, i.e., the distribution of the features is shifted towards the distribution of the reference image. The picture after makeup transfer is obtained after two upsampling operations.
The training of the network proceeds as a mutual game between the generator G and the discriminator D, and the network converges when dynamic balance is finally reached. The loss function of FP-SCGAN is shown in formula (1.1), where L_adv is the adversarial loss, comprising the generator loss and the discriminator loss, and λ_adv is its loss coefficient; L_cyc is the reconstruction loss and λ_cyc is its loss coefficient; L_vgg^global is the global vgg loss and λ_g is its loss coefficient; L_vgg^local is the local vgg loss and λ_l is its loss coefficient; L_makeup is the makeup loss and λ_makeup is its loss coefficient.
L = λ_adv*L_adv + λ_cyc*L_cyc + λ_g*L_vgg^global + λ_l*L_vgg^local + λ_makeup*L_makeup  (1.1)
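For illustration, formula (1.1) reduces to a plain weighted sum; the helper below sketches it in Python, with placeholder weight values that are assumptions, since the coefficients are not stated here.

```python
# Weighted sum of the FP-SCGAN loss terms in formula (1.1); the default lambda
# values are placeholders, not values disclosed in the patent.
def total_loss(l_adv, l_cyc, l_vgg_global, l_vgg_local, l_makeup,
               lam_adv=1.0, lam_cyc=10.0, lam_g=0.005, lam_l=0.005, lam_makeup=1.0):
    return (lam_adv * l_adv + lam_cyc * l_cyc + lam_g * l_vgg_global
            + lam_l * l_vgg_local + lam_makeup * l_makeup)
```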
The optimization process of the network is expressed as adversarial training of G and D. When the parameters of G are fixed, maximizing the objective over D, max_D L_adv, strengthens the confidence of D in the real samples as much as possible. When the parameters of D are fixed, minimizing over G, min_G L_adv, narrows the gap between the real and generated samples as much as possible. Overall, the training solves the min-max game min_G max_D L(G, D).
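The alternating game can be sketched as a two-phase update per batch, as below; G, D_x, D_y, the optimisers, the batch dictionary keys and the two loss callbacks are stand-ins, not names used in the patent.

```python
# Hypothetical single training step of the G/D game: first optimise the
# discriminators with G frozen, then optimise the generator with D frozen.
import torch

def train_step(G, D_x, D_y, opt_G, opt_D, batch, disc_loss_fn, gen_loss_fn):
    x, y = batch["x"], batch["y"]                  # non-makeup / makeup images
    # Phase 1: G fixed, update the discriminators.
    with torch.no_grad():
        fake_x, fake_y = G(y, x), G(x, y)
    opt_D.zero_grad()
    disc_loss_fn(D_x, D_y, x, y, fake_x, fake_y).backward()
    opt_D.step()
    # Phase 2: D fixed, update the generator.
    opt_G.zero_grad()
    gen_loss_fn(G, D_x, D_y, x, y).backward()
    opt_G.step()
```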
During network training, the specific steps are as follows:
Input: training set {(x_n, y_n, x̃_n, ỹ_n, M_{x,n}, M_{y,n})}, n = 1, ..., N, where x denotes a non-makeup image, y denotes a makeup image, x̃ denotes the pairing data generated for x, ỹ denotes the pairing data generated for y, M_x denotes the face mask of the pre-makeup image, and M_y denotes the face mask of the post-makeup image. The training batch size is B, the amount of training data is N, the learning rate is γ, the number of training iterations is J, and ||·||_1 denotes the L1 loss.
Task: through continuous iterative training on the training set, make the generator and the discriminator converge, thereby achieving makeup transfer.
Referring to fig. 6, the specific steps are:
S10, obtaining style features: sending the non-makeup image x and the makeup image y into FIEnc, and obtaining the facial features c_x, c_y of the pictures to be migrated through the feature extraction, down-sampling and residual modules; sending the key areas of the makeup reference images into PSEnc, extracting features through a pre-trained VGG19 network and fusing them through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining the features that fuse the reference makeup image and the image to be migrated: sending the obtained style features into a multilayer perceptron to map them into a feature space, obtaining the style codes code_x, code_y; sending the obtained facial features of the picture to be migrated together with the style codes into MFDec for feature fusion through AdaIN in the decoder; meanwhile, AdaIN is also used in the shallow layers of MFDec to introduce the features, and the fused features x_y, y_x, x_x, y_y are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: fixing the parameters of the generator G, calculating the generator loss, and optimizing the discriminator D so that its discrimination capability is enhanced, then performing back propagation and updating the discriminator parameters, there being two discriminators in total, used respectively for discriminating the generated makeup image and the makeup-removal image and having the same structure; fixing the parameters of the discriminator D, calculating the discriminator loss, and optimizing the generator G so that its ability to deceive the discriminator D is enhanced;
S40, calculating the losses: the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of the key-area makeup; the local vgg loss, which enhances the preservation of key-area semantic information; and the global vgg loss, which ensures that the generated image is semantically similar to the original image;
S50, updating the generator parameters: sending x_y, y_x into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then sending c_{x,fake} with code_x and c_{y,fake} with code_y separately into MFDec to obtain x_{rec} and y_{rec}; further calculating the reconstruction loss, which guides the network to perform overall style migration while preserving the basic features of the original image; and finally performing back propagation and updating the generator parameters.
S10 specifically includes: taking B samples from the N training samples to form a batch {(x_b, y_b, x̃_b, ỹ_b, M_{x,b}, M_{y,b})}, b = 1, ..., B.
Feed the samples x and y into FIEnc (structure shown in Fig. 2) to obtain c_x and c_y. The calculation process is shown in formula (1.2), where X is the input image, f is the feature extraction module, Down is the down-sampling module, and Res is the residual module.
c=Res(Down(f(X))) (1.2)
Send the key regions of the face into PSEnc (structure shown in Fig. 3) to obtain the style features s_x and s_y. The process of extracting the makeup style of an image is shown in formula (1.3).
s = concat(E(X*mask_eye), E(X*mask_lip), E(X*mask_face))  (1.3)
where E is PSEnc, X is the input image, mask_eye is the eye-socket mask of the input image, mask_lip is the lip mask of the input image, and mask_face is the face mask of the input image. concat denotes splicing the three features along the feature channel direction. E(·) is calculated as shown in formula (1.4), where mask_item stands for the mask of any of the three key regions, VGG is a pre-trained VGG network, and FP is the feature pyramid. After the VGG network extracts the features, the feature pyramid fuses features of different sizes.
E(X*mask_item) = FP(VGG(X*mask_item))  (1.4)
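As a small illustration of formulas (1.3)-(1.4), the region masking and concatenation might look like the following; psenc stands for the FP(VGG(·)) pipeline (for instance the FeaturePyramidEncoder sketched earlier), and the mask dictionary keys are assumptions.

```python
# Sketch of s = concat(E(X*mask_eye), E(X*mask_lip), E(X*mask_face)).
import torch

def extract_style(psenc, image, masks):
    """image: (B, 3, H, W); masks: dict of binary masks 'eye', 'lip', 'face', each (B, 1, H, W)."""
    feats = [psenc(image * masks[k]) for k in ("eye", "lip", "face")]
    return torch.cat(feats, dim=1)   # style feature s, spliced along the channel direction
```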
S20 specifically includes: sending s_x and s_y into the MLP to obtain the feature codes code_x and code_y.
The obtained c_x and c_y together with code_x and code_y are fed into MFDec (structure shown in Fig. 4) to obtain x_y, y_x, x_x, y_y. The calculation process is shown in formula (1.5).
out = conv(up(res(Dec(x, MLP(y_code)))))  (1.5)
where conv is a convolutional layer, up is upsampling, res is a residual module, and Dec(·) is the decoder; a sketch follows.
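The decoder call chain of formula (1.5) might be organised as in the sketch below, where an MLP produces the style code, AdaIN-conditioned residual blocks fuse it with the content feature, and two upsampling stages plus a final convolution restore the image; all dimensions, block counts and the compact adain helper are assumptions consistent with the earlier sketches, not the patented layout.

```python
# Hypothetical MFDec-style decoder following out = conv(up(res(Dec(x, MLP(y_code))))).
import torch
import torch.nn as nn

def adain(x, alpha, gamma, eps=1e-5):        # same helper as sketched earlier, kept compact
    mu = x.mean(dim=(2, 3), keepdim=True)
    std = (x.var(dim=(2, 3), keepdim=True, unbiased=False) + eps).sqrt()
    return gamma[:, :, None, None] * (x - mu) / std + alpha[:, :, None, None]

class AdaINResBlock(nn.Module):
    def __init__(self, ch, style_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.affine = nn.Linear(style_dim, ch * 2)    # predicts (alpha, gamma) from the style code

    def forward(self, x, code):
        alpha, gamma = self.affine(code).chunk(2, dim=1)
        h = adain(torch.relu(self.conv1(x)), alpha, gamma)
        h = adain(torch.relu(self.conv2(h)), alpha, gamma)
        return x + h

class MFDec(nn.Module):
    def __init__(self, ch=256, style_dim=192, n_res=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(style_dim * 3, style_dim), nn.ReLU(inplace=True))
        self.res = nn.ModuleList([AdaINResBlock(ch, style_dim) for _ in range(n_res)])
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(ch, ch // 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2), nn.Conv2d(ch // 2, ch // 4, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.out = nn.Conv2d(ch // 4, 3, 7, padding=3)

    def forward(self, c, s):      # c: content feature from FIEnc, s: concatenated style feature
        code = self.mlp(s)
        for block in self.res:
            c = block(c, code)
        return torch.tanh(self.out(self.up(c)))
```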
S30 specifically includes: fixing the parameters of the generator G and calculating the generator losses L_adv^Dx and L_adv^Dy, which are used to optimize the discriminator D so that its discrimination capability is enhanced. The calculation process is shown in formula (1.6), where D_x(·), D_y(·) denote the discriminator outputs on generated data, and D_x, D_y denote the discriminator outputs on real data.
L_adv^Dx = E_{x~X}[||D_x(x) - 1||_1] + E_{x~X,y~Y}[||D_x(G(y,x))||_1]
L_adv^Dy = E_{y~Y}[||D_y(y) - 1||_1] + E_{x~X,y~Y}[||D_y(G(x,y))||_1]  (1.6)
Back propagation is then performed and the discriminator parameters are updated (structure shown in Fig. 5). The two discriminators are used to discriminate the generated makeup image and the makeup-removal image respectively, and they are identical in structure.
Fixing the parameters of the discriminator D, the discriminator losses L_adv^Gx and L_adv^Gy are calculated and used to optimize the generator G so that its ability to deceive the discriminator D is enhanced. The calculation process is shown in formula (1.7).
L_adv^Gx = E_{x~X,y~Y}[||D_x(G(y,x)) - 1||_1]
L_adv^Gy = E_{x~X,y~Y}[||D_y(G(x,y)) - 1||_1]  (1.7)
S40 specifically includes: calculating the identity losses L_idt(x_x, x) and L_idt(y_y, y); the calculation process is shown in formula (1.8). This loss uses the generator to reconstruct x and y, so that the network retains the characteristics of the original images to a greater extent.
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1  (1.8)
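In code, formula (1.8) is a pair of L1 reconstructions; a minimal sketch:

```python
# Identity loss: reconstruct each image using its own makeup as the reference.
import torch.nn.functional as F

def identity_loss(G, x, y):
    return F.l1_loss(G(x, x), x) + F.l1_loss(G(y, y), y)
```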
The makeup losses for the two transfer directions are calculated; the calculation process is shown in formula (1.9). The effect of this loss is to guide the migration of the key-area makeup.
L_makeup = Σ_i ( ||(G(x,y) - x̃)*M_{x,i}||_1 + ||(G(y,x) - ỹ)*M_{y,i}||_1 )  (1.9)
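A sketch of this masked L1 penalty is given below; the per-region looping and the way the pairing data enters are assumptions consistent with the variable definitions above, not the exact patented formula.

```python
# Hypothetical makeup loss: L1 distance to the pairing data inside each key-region mask.
import torch.nn.functional as F

def makeup_loss(fake_y, x_pair, masks_x, fake_x, y_pair, masks_y):
    """fake_y = G(x, y), fake_x = G(y, x); *_pair are the pairing data;
    masks_x / masks_y are lists of orbit, face and lip masks."""
    loss = 0.0
    for m_x, m_y in zip(masks_x, masks_y):
        loss = loss + F.l1_loss(fake_y * m_x, x_pair * m_x) \
                    + F.l1_loss(fake_x * m_y, y_pair * m_y)
    return loss
```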
The local vgg losses for the two transfer directions are calculated; the calculation process is shown in formula (1.10), where M_{y,i} denotes the face mask of the makeup image, i indexes the key areas (the eye sockets, the face and the lips), M_{x,i} is the corresponding mask of the pre-makeup image, and F_l(·) denotes the layer-l feature of the vgg network. The effect of this loss is to enhance the retention of semantic information in the key regions.
L_vgg^local = Σ_i ( ||F_l(G(x,y)*M_{x,i}) - F_l(x*M_{x,i})||_2 + ||F_l(G(y,x)*M_{y,i}) - F_l(y*M_{y,i})||_2 )  (1.10)
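A sketch of the masked perceptual comparison follows; vgg_l stands for a callable returning the layer-l VGG feature map F_l(·), and the choice of layer and masking scheme are assumptions.

```python
# Hypothetical local vgg loss: compare layer-l VGG features of each masked key region.
import torch.nn.functional as F

def local_vgg_loss(vgg_l, fake_y, x, masks_x, fake_x, y, masks_y):
    loss = 0.0
    for m_x, m_y in zip(masks_x, masks_y):
        loss = loss + F.mse_loss(vgg_l(fake_y * m_x), vgg_l(x * m_x)) \
                    + F.mse_loss(vgg_l(fake_x * m_y), vgg_l(y * m_y))
    return loss
```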
The global vgg losses for the two transfer directions are calculated; the calculation process is shown in formula (1.11). The effect of this loss is to ensure that the generated image is semantically similar to the original image.
L_vgg^global = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2  (1.11)
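The whole-image counterpart is a plain perceptual loss; a minimal sketch with the same vgg_l stand-in:

```python
# Global vgg loss: the generated image should keep the layer-l semantics of its source image.
import torch.nn.functional as F

def global_vgg_loss(vgg_l, fake_y, x, fake_x, y):
    return F.mse_loss(vgg_l(fake_y), vgg_l(x)) + F.mse_loss(vgg_l(fake_x), vgg_l(y))
```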
S50 specifically includes: sending x_y and y_x into FIEnc to extract the content features c_{x,fake} and c_{y,fake}; feeding c_{x,fake} together with code_x into MFDec to obtain x_{rec}, and feeding c_{y,fake} together with code_y into MFDec to obtain y_{rec}.
The reconstruction losses L_cyc(x_{rec}, x) and L_cyc(y_{rec}, y) are calculated; the calculation process is shown in formula (1.12), where G(G(y,x), y) means that y is transferred with the makeup of x as reference and then transferred back with the makeup of y as reference, and G(G(x,y), x) is analogous. This loss guides the network to perform overall style migration while preserving the basic features of the original image.
L_cyc = ||G(G(y,x),y) - y||_1 + ||G(G(x,y),x) - x||_1  (1.12)
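In code, formula (1.12) is a cycle-style double transfer; a minimal sketch:

```python
# Reconstruction (cycle) loss: transferring back with the original makeup as
# reference should recover the original image.
import torch.nn.functional as F

def reconstruction_loss(G, x, y):
    return F.l1_loss(G(G(y, x), y), y) + F.l1_loss(G(G(x, y), x), x)
```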
Finally, back propagation is performed and the generator parameters are updated.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A makeup style migration method based on FP-SCGAN model is characterized in that an FP-SCGAN network comprises PSEnc, FIEnc, MFDec and a Markov discriminator, the training of the network is carried out in the mutual game of a generator G and a discriminator D, and the network converges when the dynamic balance is finally achieved, the training specifically comprises the following steps:
S10, obtaining style features: sending the non-makeup image x and the makeup image y into FIEnc, and obtaining the facial features c_x, c_y of the pictures to be migrated through the feature extraction, down-sampling and residual modules; sending the key areas of the makeup reference images into PSEnc, extracting features through a pre-trained VGG19 network and fusing them through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining the features that fuse the reference makeup image and the image to be migrated: sending the obtained style features into a multilayer perceptron to map them into a feature space, obtaining the style codes code_x, code_y; sending the obtained facial features of the picture to be migrated together with the style codes into MFDec for feature fusion through AdaIN in the decoder; meanwhile, AdaIN is also used in the shallow layers of MFDec to introduce the features, and the fused features x_y, y_x, x_x, y_y are obtained through the MFDec network;
S30, optimizing the discriminator and the generator: fixing the parameters of the generator G, calculating the generator loss, and optimizing the discriminator D so that its discrimination capability is enhanced, then performing back propagation and updating the discriminator parameters, there being two discriminators in total, used respectively for discriminating the generated makeup image and the makeup-removal image and having the same structure; fixing the parameters of the discriminator D, calculating the discriminator loss, and optimizing the generator G so that its ability to deceive the discriminator D is enhanced;
S40, calculating the losses: the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of the key-area makeup; the local vgg loss, which enhances the preservation of key-area semantic information; and the global vgg loss, which ensures that the generated image is semantically similar to the original image;
S50, updating the generator parameters: sending x_y, y_x into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then sending c_{x,fake} with code_x and c_{y,fake} with code_y separately into MFDec to obtain x_{rec} and y_{rec}; further calculating the reconstruction loss, which guides the network to perform overall style migration while preserving the basic features of the original image; and finally performing back propagation and updating the generator parameters.
2. The method of claim 1, wherein the formula for calculating the generator loss is:
L_adv^Dx = E_{x~X}[||D_x(x) - 1||_1] + E_{x~X,y~Y}[||D_x(G(y,x))||_1]
L_adv^Dy = E_{y~Y}[||D_y(y) - 1||_1] + E_{x~X,y~Y}[||D_y(G(x,y))||_1]
wherein E_{x~X} denotes the expectation over real non-makeup images; E_{y~Y} denotes the expectation over real makeup images; E_{x~X,y~Y} denotes the expectation over generated images; D_x(·), D_y(·) denote the discriminator outputs on generated data; D_x, D_y denote the discriminator outputs on real data; G(x,y) transfers x using the makeup of y as reference; and G(y,x) transfers y using the makeup of x as reference.
3. The method of claim 1, wherein the formula for calculating the discriminant loss is:
L_adv^Gx = E_{x~X,y~Y}[||D_x(G(y,x)) - 1||_1]
L_adv^Gy = E_{x~X,y~Y}[||D_y(G(x,y)) - 1||_1]
wherein D_x(·), D_y(·) denote the discriminator outputs on generated data, E_{x~X,y~Y} denotes the expectation over generated images, G(x,y) transfers x using the makeup of y as reference, and G(y,x) transfers y using the makeup of x as reference.
4. The method of claim 1, wherein the identity loss is calculated by the formula:
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1
wherein G(x,x) transfers x using the makeup of x as reference, G(y,y) transfers y using the makeup of y as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
5. The method of claim 1, wherein the cosmetic loss is calculated by the formula:
L_makeup = Σ_i ( ||(G(x,y) - x̃)*M_{x,i}||_1 + ||(G(y,x) - ỹ)*M_{y,i}||_1 )
wherein x̃ denotes the pairing data generated for x, ỹ denotes the pairing data generated for y, x denotes the non-makeup image, y denotes the makeup image, M_{x,i} denotes the face mask of the pre-makeup image and M_{y,i} denotes the face mask of the makeup image, with i indexing the key areas, namely the eye sockets, the face and the lips; G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
6. The method of claim 1, wherein the local vgg loss is calculated by the formula:
L_vgg^local = Σ_i ( ||F_l(G(x,y)*M_{x,i}) - F_l(x*M_{x,i})||_2 + ||F_l(G(y,x)*M_{y,i}) - F_l(y*M_{y,i})||_2 )
wherein M_{x,i} denotes the face mask of the pre-makeup image and M_{y,i} denotes the face mask of the makeup image, with i indexing the key areas, namely the eye sockets, the face and the lips; G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, F_l(·) denotes the layer-l feature of the vgg network, and ||·||_2 denotes the L2 loss, i.e., the squared error between the real data and the generated data.
7. The method of claim 1, wherein the global vgg penalty is calculated by:
L_vgg^global = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2
wherein G(x,y) transfers x using the makeup of y as reference, G(y,x) transfers y using the makeup of x as reference, F_l(·) denotes the layer-l feature of the vgg network, and ||·||_2 denotes the L2 loss.
8. The method of claim 1, wherein the reconstruction loss is calculated by the formula:
L_cyc = ||G(G(y,x),y) - y||_1 + ||G(G(x,y),x) - x||_1
wherein G(G(y,x),y) means that y is transferred with the makeup of x as reference and then transferred back with the makeup of y as reference, G(G(x,y),x) means that x is transferred with the makeup of y as reference and then transferred back with the makeup of x as reference, and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
CN202210488449.1A 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model Active CN114863527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210488449.1A CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210488449.1A CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Publications (2)

Publication Number Publication Date
CN114863527A true CN114863527A (en) 2022-08-05
CN114863527B CN114863527B (en) 2024-03-19

Family

ID=82634559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210488449.1A Active CN114863527B (en) 2022-05-06 2022-05-06 Makeup style migration method based on FP-SCGAN model

Country Status (1)

Country Link
CN (1) CN114863527B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464210A (en) * 2017-07-06 2017-12-12 Zhejiang University of Technology An image style transfer method based on a generative adversarial network
CN107644006A (en) * 2017-09-29 2018-01-30 Peking University An automatic Chinese font library generation method based on a deep neural network
US20190332850A1 (en) * 2018-04-27 2019-10-31 Apple Inc. Face Synthesis Using Generative Adversarial Networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464210A (en) * 2017-07-06 2017-12-12 Zhejiang University of Technology An image style transfer method based on a generative adversarial network
CN107644006A (en) * 2017-09-29 2018-01-30 Peking University An automatic Chinese font library generation method based on a deep neural network
US20190332850A1 (en) * 2018-04-27 2019-10-31 Apple Inc. Face Synthesis Using Generative Adversarial Networks

Also Published As

Publication number Publication date
CN114863527B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
TWI779969B (en) Image processing method, processor, electronic device and computer-readable storage medium
CN112233038B (en) True image denoising method based on multi-scale fusion and edge enhancement
Liu et al. Detach and adapt: Learning cross-domain disentangled deep representation
CN110322416B (en) Image data processing method, apparatus and computer readable storage medium
CN107993238A (en) A kind of head-and-shoulder area image partition method and device based on attention model
CN111767906B (en) Face detection model training method, face detection device and electronic equipment
CN111754596A (en) Editing model generation method, editing model generation device, editing method, editing device, editing equipment and editing medium
CN113487618B (en) Portrait segmentation method, portrait segmentation device, electronic equipment and storage medium
CN111724400A (en) Automatic video matting method and system
WO2023066173A1 (en) Image processing method and apparatus, and storage medium and electronic device
WO2022148248A1 (en) Image processing model training method, image processing method and apparatus, electronic device, and computer program product
WO2021127916A1 (en) Facial emotion recognition method, smart device and computer-readabel storage medium
Sun et al. Masked lip-sync prediction by audio-visual contextual exploitation in transformers
Hu et al. Dear-gan: Degradation-aware face restoration with gan prior
CN113837290A (en) Unsupervised unpaired image translation method based on attention generator network
Xiao et al. Image hazing algorithm based on generative adversarial networks
CN114863527A (en) Dressing style migration method based on FP-SCGAN model
CN116342377A (en) Self-adaptive generation method and system for camouflage target image in degraded scene
WO2022252372A1 (en) Image processing method, apparatus and device, and computer-readable storage medium
Wang et al. MetaScleraSeg: an effective meta-learning framework for generalized sclera segmentation
CN114049303A (en) Progressive bone age assessment method based on multi-granularity feature fusion
Yoo et al. FastSwap: A Lightweight One-Stage Framework for Real-Time Face Swapping
WO2024099026A1 (en) Image processing method and apparatus, device, storage medium and program product
Wu et al. Semantic image inpainting based on generative adversarial networks
CN117275069B (en) End-to-end head gesture estimation method based on learnable vector and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant