CN114863527A - Dressing style migration method based on FP-SCGAN model - Google Patents
- Publication number
- CN114863527A (application CN202210488449.1A)
- Authority
- CN
- China
- Prior art keywords
- makeup
- image
- loss
- discriminator
- generator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V40/168 (Feature extraction; Face representation)
- G06N3/045 (Combinations of networks)
- G06N3/084 (Backpropagation, e.g. using gradient descent)
- G06V10/806 (Fusion of extracted features at sensor, preprocessing, feature extraction or classification level)
- G06V10/82 (Image or video recognition or understanding using neural networks)
Abstract
The invention discloses a makeup style migration method based on an FP-SCGAN model, which combines a feature pyramid with the SCGAN algorithm. The FP-SCGAN network comprises four parts: PSEnc, FIEnc, MFDec, and a Markov discriminator. PSEnc extracts the reference makeup features, FIEnc extracts the facial features of the picture to be migrated, MFDec fuses the facial features of the original picture with the makeup features of the reference picture, and the Markov discriminator measures the distance between the generated distribution and the real distribution. The improved algorithm resolves the unnatural edges around the eye sockets and the failure to transfer light eye makeup that occur during makeup transfer, and improves the transfer effect compared with the current mainstream SCGAN makeup transfer algorithm.
Description
Technical Field
The invention belongs to the technical field of makeup migration methods, and relates to a makeup style migration method based on an FP-SCGAN model.
Background
Computer vision is one of the most active research areas in deep learning and is now widely applied across many fields. The development and application of image-processing algorithms have accelerated the growth of the short-video industry: camera filters, beautification, special effects, and similar functions keep appearing and attract large numbers of users. These functions are inseparable from the style migration algorithms within image processing.
The goal of image style migration is to transfer the style of a reference picture onto one or more other pictures. Before neural networks, image style migration followed a common idea: analyze images of a given style to build a mathematical or statistical model, then alter the image to be migrated so that it better conforms to that model. This approach has a significant drawback: one program can essentially handle only one style or one scene, so practical applications of traditional style migration were very limited. Current style migration algorithms are mainly based on deep learning: a neural network extracts features from the style image and the image to be migrated, fuses them, and then upsamples and restores the result to realize the style migration.
Many current style migration algorithms focus on migrating face attributes, of which makeup migration is a typical case. Among these, GAN-based makeup migration algorithms perform particularly well. SCGAN can transfer a reference makeup onto a target image convincingly, and still produces good results even when the makeup regions of the two faces differ greatly in position. However, two problems readily occur during migration: the eye sockets acquire unnatural edges, and light makeup fails to transfer.
Disclosure of Invention
To address the defects of the prior art, the invention provides a makeup style migration method based on an FP-SCGAN model, combining a feature pyramid with SCGAN into a makeup migration network, FP-SCGAN, which effectively solves the above problems and improves the migration effect. The method comprises the following steps:
The FP-SCGAN network comprises PSEnc, FIEnc, MFDec, and a Markov discriminator. The training of the network is carried out in a mutual game between a generator G and a discriminator D, and the network converges when dynamic balance is finally reached. The training specifically comprises the following steps:
S10, obtaining style features: send the non-makeup image x and the makeup image y into FIEnc, and obtain the facial features c_x, c_y of the pictures to be migrated through the feature extraction, downsampling, and residual modules; send the key regions of the makeup reference images into PSEnc, extract features through a pre-trained VGG19 network, and fuse them through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining the features fusing the reference makeup image and the image to be migrated: send the obtained style features into a multilayer perceptron to map them into a feature space, obtaining the style codes code_x, code_y; send the obtained facial features and style codes into MFDec for feature fusion through the decoder's AdaIN; meanwhile, AdaIN is used in the shallow layers of MFDec to introduce the features; the MFDec network outputs the fused images x_y, y_x, x_x, y_y;
S30, optimizing the discriminator and generator: fix the parameters of the generator G, calculate the generator loss, and optimize the discriminator D so that its discrimination capability is enhanced; then back-propagate and update the discriminator parameters; there are two discriminators in total, identical in structure, used respectively to discriminate the generated makeup image and the generated makeup-removal image; fix the parameters of the discriminator D, calculate the discriminator loss, and optimize the generator G so that its ability to deceive the discriminator D is enhanced;
S40, calculating the remaining losses: the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of makeup in the key regions; the local vgg loss, which strengthens the preservation of semantic information in the key regions; and the global vgg loss, which ensures the generated image remains semantically similar to the original image;
S50, updating the generator parameters: send x_y, y_x into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then feed c_{x,fake} with code_x and c_{y,fake} with code_y separately into MFDec to obtain x_rec and y_rec; further calculate the reconstruction loss, which guides the network to perform the overall style migration while retaining the basic features of the original image; finally, back-propagate and update the generator parameters.
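The S10–S50 training procedure can be sketched as a minimal alternating-update loop. Everything here is a hypothetical stand-in: `G`, `D`, and `update` are stubs for the real FP-SCGAN generator, Markov discriminator, and optimizer step, and the loss terms are scalar toy versions of the L1-style losses described later in the specification.

```python
def train_step(x, y, G, D, update):
    """One iteration: generate transferred images, train D with G fixed,
    then train G (adversarial + identity + cycle terms) with D fixed."""
    # S10/S20: generate x with y's makeup, and y with x's makeup
    x_y, y_x = G(x, y), G(y, x)
    # S30a: fix G, push D toward 1 on real samples and 0 on generated ones
    d_loss = abs(D(x) - 1) + abs(D(y) - 1) + abs(D(x_y)) + abs(D(y_x))
    update("D", d_loss)
    # S30b: fix D, train G to make generated samples score as real
    g_adv = abs(D(x_y) - 1) + abs(D(y_x) - 1)
    # S40: identity loss (reconstruct an image using itself as reference)
    idt = abs(G(x, x) - x) + abs(G(y, y) - y)
    # S50: cycle-reconstruction loss (transfer there and back again)
    cyc = abs(G(G(x, y), x) - x) + abs(G(G(y, x), y) - y)
    g_loss = g_adv + idt + cyc
    update("G", g_loss)
    return d_loss, g_loss
```

With scalar stubs this runs end to end and shows the D-then-G update order the patent describes.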
Preferably, the generator loss is calculated as:

L_D = E_{x~X}[||D_x(x) - 1||_1] + E_{y~Y}[||D_y(y) - 1||_1] + E_{x~X,y~Y}[||D_x(G(y,x))||_1 + ||D_y(G(x,y))||_1]

wherein E_{x~X} represents the expectation over real non-makeup images; E_{y~Y} represents the expectation over real makeup images; E_{x~X,y~Y} represents the expectation over generated-image pairs; D_x(·), D_y(·) denote the discriminator outputs on samples drawn from the generated data; D_x, D_y denote the discriminator outputs on samples drawn from the real data; G(x, y) migrates x with the makeup of y as reference; G(y, x) migrates y with the makeup of x as reference.
Preferably, the discriminator loss is calculated as:

L_G = E_{x~X,y~Y}[||D_x(G(y,x)) - 1||_1 + ||D_y(G(x,y)) - 1||_1]

wherein D_x(·), D_y(·) are the output functions of the discriminators; E_{x~X,y~Y} represents the expectation over generated-image pairs; G(x, y) migrates x with the makeup of y as reference; G(y, x) migrates y with the makeup of x as reference.
Preferably, the calculation formula of the identity loss is as follows:
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1

wherein G(x, x) migrates x with the makeup of x as reference; G(y, y) migrates y with the makeup of y as reference; ||·||_1 is the L1 loss, i.e., the absolute error between the real data and the generated data.
Preferably, the makeup loss is calculated as:

L_makeup = Σ_i ||(G(x,y) - x̃) * M_{x,i}||_1 + Σ_i ||(G(y,x) - ỹ) * M_{y,i}||_1

wherein x̃ denotes the pairing data generated for x and ỹ the pairing data generated for y; x denotes the non-makeup image and y the makeup image; M_{x,i} is the face mask of the pre-makeup image and M_{y,i} that of the makeup image, where i indexes the key regions, comprising the three parts eye socket, face, and lips; G(x, y) migrates x with the makeup of y as reference; G(y, x) migrates y with the makeup of x as reference; ||·||_1 is the L1 loss, i.e., the absolute error between the real data and the generated data.
Preferably, the local vgg loss is calculated as:

L_vgg_local = Σ_i ||F_l(G(x,y) * M_{x,i}) - F_l(x * M_{x,i})||_2 + Σ_i ||F_l(G(y,x) * M_{y,i}) - F_l(y * M_{y,i})||_2

wherein M_{x,i} is the face mask of the pre-makeup image and M_{y,i} that of the makeup image, where i indexes the key regions, comprising the three parts eye socket, face, and lips; G(x, y) migrates x with the makeup of y as reference; G(y, x) migrates y with the makeup of x as reference; F_l(·) denotes the l-th layer feature of the vgg network; ||·||_2 is the L2 loss, i.e., the squared error between the real data and the generated data.
Preferably, the global vgg loss is calculated as:

L_vgg_global = ||F_l(G(x,y)) - F_l(x)||_2 + ||F_l(G(y,x)) - F_l(y)||_2

wherein G(x, y) migrates x with the makeup of y as reference; G(y, x) migrates y with the makeup of x as reference; F_l(·) denotes the l-th layer feature of the vgg network; ||·||_2 is the L2 loss.
Preferably, the reconstruction loss is calculated by the formula:
L_cyc = ||G(G(y,x), y) - y||_1 + ||G(G(x,y), x) - x||_1

wherein G(G(y,x), y) denotes migrating y with the makeup of x as reference and then migrating the result back with the makeup of y as reference; G(G(x,y), x) denotes migrating x with the makeup of y as reference and then migrating the result back with the makeup of x as reference; ||·||_1 is the L1 loss, i.e., the absolute error between the real data and the generated data.
The invention has the following beneficial effects:
Compared with the prior art, the invention provides a makeup style migration method based on an FP-SCGAN model: PSEnc extracts the reference makeup features, FIEnc extracts the facial features of the picture to be migrated, MFDec fuses the facial features of the original picture with the makeup features of the reference picture, and a Markov discriminator measures the distance between the generated distribution and the real distribution. The improved algorithm resolves the unnatural eye-socket edges and the failure to transfer light eye makeup that occur during makeup transfer, and improves the transfer effect compared with the current mainstream SCGAN makeup transfer algorithm.
Drawings
Fig. 1 is a diagram of an overall FP-SCGAN network structure in a makeup style migration method based on an FP-SCGAN model according to an embodiment of the present invention;
FIG. 2 is a FIEnc structure diagram in the dressing style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
FIG. 3 is a PSEnc structure diagram in the dressing style migration method based on the FP-SCGAN model according to the embodiment of the present invention;
fig. 4 is a structure diagram of an MFDec in a dressing style migration method based on an FP-SCGAN model according to an embodiment of the present invention;
FIG. 5 is a structure diagram of the Markov discriminator in the FP-SCGAN model-based makeup style migration method according to an embodiment of the present invention;
fig. 6 is a flowchart of steps of a makeup style migration method based on the FP-SCGAN model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications, equivalents and alternatives which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
Referring to fig. 1, which is a diagram of the overall network structure of the present invention, the FP-SCGAN network is composed of four parts, which are PSEnc, FIEnc, MFDec and a discriminator.
PSEnc is used to extract the reference makeup features, which include information such as color, texture, and edges. The network comprises feature extraction and feature fusion using a feature pyramid.
The PSEnc network structure is shown in fig. 3. Because the input makeup reference image contains much information irrelevant to makeup, only the images of the three key parts extracted from the reference image, the eye sockets, the face, and the lips, are input. First, features are extracted from the reference makeup image by a pre-trained VGG19 network, and the conv1_1, conv2_1, conv3_1, and conv4_1 feature maps output by the VGG19 network are fused using a feature pyramid structure. To strengthen the feature extraction capability of the network, the four feature maps extracted by VGG19 undergo convolution processing and feature-map fusion. The feature pyramid outputs four levels of fused features, which are then sent into a fully connected layer to map them to a scale suitable for the subsequent AdaIN.
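The top-down fusion that a feature pyramid performs on multi-scale maps can be illustrated with a small NumPy sketch. This is not the patent's exact network: the lateral 1x1 convolutions are omitted (all levels are assumed to already share one channel count C), and nearest-neighbour upsampling stands in for whatever interpolation the real model uses.

```python
import numpy as np

def fpn_fuse(feature_maps):
    """Top-down feature-pyramid fusion: starting from the coarsest map,
    upsample by 2x (nearest-neighbour) and add into the next finer map.
    feature_maps: list of (C, H, W) arrays ordered fine -> coarse, each
    level half the spatial size of the previous one."""
    fused = [feature_maps[-1]]
    for fm in reversed(feature_maps[:-1]):
        # np.kron with a (1, 2, 2) block of ones = 2x nearest upsampling
        up = np.kron(fused[-1], np.ones((1, 2, 2)))
        fused.append(fm + up)
    return fused[::-1]  # back to fine -> coarse order
```

The finest output level then carries information accumulated from every coarser level, which is the property the patent relies on to capture light makeup.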
The FIEnc network is used to extract the facial features of the picture to be migrated and comprises a feature extraction module, a downsampling module, and a residual module.
The FIEnc network structure is shown in fig. 2 and comprises the feature extraction, downsampling, and residual modules. To preserve the features of the image to be migrated, the network extracts the facial features directly through stacked convolutions. The first two stages raise the feature dimensionality and downsample, while the residual module improves the expressive capacity of the network.
The MFDec network fuses the facial features of the original image with the makeup features of the reference image; its decoder uses AdaIN. The network comprises a residual module, upsampling, and convolution.
The MFDec network structure is shown in fig. 4. The three feature maps of different scales output by PSEnc are first mapped into different network layers by an MLP: because the features of the reference picture lie in a different feature space from those of the original picture, mapping them with an MLP places them in a more reasonable feature space. After mapping, they are fused with the feature map output by FIEnc through AdaIN. AdaIN is also used in the shallow layers of the MFDec network to introduce features, so that the lighter makeup of the reference image can be fully retained. The backbone of the network uses a residual module to improve its expressive capacity, and upsampling and convolution restore the features into an image.
With the multilayer perceptron MLP, the reference features come closer to the distribution of the original image while keeping some of their original information, and the migration effect is better.
All layers of the MFDec use AdaIN, to ensure that as many features of the reference image as possible are retained. AdaIN is expressed as

AdaIN(x, y) = γ · (x - μ) / sqrt(σ² + ε) + α

where x is the feature of the original image, μ is the mean of x in the channel direction, σ² is the variance of x in the channel direction, ε is a very small number, α is the mean of the reference-image feature y in the channel direction, and γ is the standard deviation of y in the channel direction.
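The AdaIN formula above is straightforward to implement. A minimal NumPy version for (C, H, W) feature maps, computing μ and σ² from the content feature x and α and γ from the reference feature y, per channel:

```python
import numpy as np

def adain(x, y, eps=1e-5):
    """Adaptive instance normalization: normalize each channel of x to
    zero mean / unit variance, then re-scale to the channel statistics
    of the reference feature y (gamma * (x - mu)/sqrt(var + eps) + alpha)."""
    mu = x.mean(axis=(1, 2), keepdims=True)       # channel means of x
    var = x.var(axis=(1, 2), keepdims=True)       # channel variances of x
    alpha = y.mean(axis=(1, 2), keepdims=True)    # channel means of y
    gamma = y.std(axis=(1, 2), keepdims=True)     # channel std devs of y
    return gamma * (x - mu) / np.sqrt(var + eps) + alpha
```

After the call, each channel of the output carries the mean and standard deviation of the reference feature, which is exactly how MFDec injects the style codes.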
The discriminator measures the distance between the generated distribution and the real distribution. Considering that images produced under an ordinary discriminator tend to be blurred, a Markov discriminator is used: it judges whether each local region is a generated image, which preserves more detail.
the structure of the discriminator used is shown in fig. 5. The SN is Spectral Normalization (Spectral Normalization), and the Normalization mode can enable the network to meet the Lipschitz continuity (Lipschitz continuity), limit the violent change of the function and enable the model training process to be more stable. In addition, the design of the discriminator adopted the recommendation in WGAN, and the loss was calculated without using cross-entropy loss, but instead with L1 loss.
The overall framework of the network is as follows: in the forward pass, the image to be made up is input directly into FIEnc to obtain its image features. The reference image is split into three parts, eye sockets, skin, and lips, which are then passed through fully connected layers to obtain the style codes, namely the mean and variance used by AdaIN. When the image features from FIEnc pass through the residual layers of MFDec, the AdaIN in those layers shifts the mean and variance of the features toward the values in the style codes; that is, the distribution of the features is shifted toward the distribution of the reference image. After two upsampling stages, the picture with the transferred makeup is obtained.
The training of the network proceeds as a mutual game between the generator G and the discriminator D, and the network converges when dynamic balance is finally reached. The loss function of FP-SCGAN is shown in formula (1.1):

L = λ_adv · L_adv + λ_cyc · L_cyc + λ_g · L_vgg_global + λ_l · L_vgg_local + λ_makeup · L_makeup    (1.1)

In the formula, L_adv is the adversarial loss, comprising the generator loss and the discriminator loss, with loss coefficient λ_adv; L_cyc is the reconstruction loss with coefficient λ_cyc; L_vgg_global is the global vgg loss with coefficient λ_g; L_vgg_local is the local vgg loss with coefficient λ_l; L_makeup is the makeup loss with coefficient λ_makeup. The optimization of the network is the adversarial training of G and D, min_G max_D L_adv. When the parameters of G are fixed, max_D strengthens, as far as possible, D's confidence in the real samples. When the parameters of D are fixed, min_G optimizes G so that the gap between the real samples and the generated samples is minimized as far as possible.
During network training, the specific steps are as follows:
inputting: training setWherein x represents an unpainted image, y represents a made-up image,the pairing data representing the generated x is generated,pairing data representing generated y, M x Face mask, M, representing pre-makeup images y A face mask representing a post-makeup image. Training batch size is B, training set data volume is N, learning rate is gamma, training iteration number is J, | · | | survival 1 Is lost as L1.
Task: through continuous iterative training on the training set, bring the generator and the discriminator to convergence, thereby achieving makeup transfer.
Referring to fig. 6, the specific steps are:
S10, obtaining style features: send the non-makeup image x and the makeup image y into FIEnc, and obtain the facial features c_x, c_y of the pictures to be migrated through the feature extraction, downsampling, and residual modules; send the key regions of the makeup reference images into PSEnc, extract features through a pre-trained VGG19 network, and fuse them through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining the features fusing the reference makeup image and the image to be migrated: send the obtained style features into the MLP to map them into a feature space, obtaining the style codes code_x, code_y; send the obtained facial features and style codes into MFDec for feature fusion through the decoder's AdaIN; meanwhile, AdaIN is used in the shallow layers of MFDec to introduce the features; the MFDec network outputs the fused images x_y, y_x, x_x, y_y;
S30, optimizing the discriminator and generator: fix the parameters of the generator G, calculate the generator loss, and optimize the discriminator D so that its discrimination capability is enhanced; then back-propagate and update the discriminator parameters; there are two discriminators in total, identical in structure, used respectively to discriminate the generated makeup image and the generated makeup-removal image; fix the parameters of the discriminator D, calculate the discriminator loss, and optimize the generator G so that its ability to deceive the discriminator D is enhanced;
S40, calculating the remaining losses: the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of makeup in the key regions; the local vgg loss, which strengthens the preservation of semantic information in the key regions; and the global vgg loss, which ensures the generated image remains semantically similar to the original image;
S50, updating the generator parameters: send x_y, y_x into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then feed c_{x,fake} with code_x and c_{y,fake} with code_y separately into MFDec to obtain x_rec and y_rec; further calculate the reconstruction loss, which guides the network to perform the overall style migration while retaining the basic features of the original image; finally, back-propagate and update the generator parameters.
Feed sample x and sample y into FIEnc (structure shown in fig. 2) to obtain c_x and c_y; the calculation process is shown in formula (1.2), where X denotes the input image, f the feature extraction module, Down the downsampling module, and Res the residual module.
c = Res(Down(f(X)))    (1.2)
Send the key regions of the face into PSEnc (structure shown in fig. 3) to obtain the style features s_x and s_y; the process of extracting the image's makeup style is shown in formula (1.3).
s = concat(E(X * mask_eye), E(X * mask_lip), E(X * mask_face))    (1.3)
wherein E is PSEnc, X is the input image, mask_eye is the eye-socket mask of the input image, mask_lip is the lip mask of the input image, and mask_face is the face mask of the input image. concat denotes splicing the three features along the feature-channel direction. E(·) is calculated as shown in formula (1.4), where mask_item stands for the mask of any of the three key regions, VGG is the pre-trained VGG network, and FP is the feature pyramid. After the VGG network extracts the features, the feature pyramid fuses the features of different sizes.
E(X * mask_item) = FP(VGG(X * mask_item))    (1.4)
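Equations (1.3) and (1.4) amount to masking the image with each key-region mask, encoding each masked crop, and concatenating along the channel axis. A NumPy sketch, with a caller-supplied `encode` standing in for the FP(VGG(·)) branch:

```python
import numpy as np

def extract_style(img, masks, encode):
    """Apply each key-region mask (eye socket, lips, face) to the image,
    encode each masked region, and concatenate the results along the
    channel axis, mirroring s = concat(E(X*mask_eye), E(X*mask_lip),
    E(X*mask_face))."""
    parts = [encode(img * m) for m in masks]   # E(X * mask_item)
    return np.concatenate(parts, axis=0)       # splice along channels
```

Passing an identity `encode` makes the masking and concatenation behaviour easy to inspect; the real encoder would return deep feature maps instead.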
S20 specifically comprises: sending s_x and s_y into the MLP to obtain the style codes code_x and code_y.
Feed the obtained c_x and c_y together with code_x and code_y into MFDec (structure shown in fig. 4) to obtain x_y, y_x, x_x, y_y; the calculation process is shown in formula (1.5).
out = conv(up(res(Dec(x, MLP(y_code)))))    (1.5)
where conv is a convolutional layer, up is upsampling, res is a residual module, and Dec(·) is the decoder.
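Formula (1.5) is simply a composition of the MFDec stages; writing it out makes the data flow explicit. All stages are passed in as callables, since the real modules are deep-network layers:

```python
def mfdec_forward(c, s, dec, res, up, conv, mlp):
    """Equation (1.5) as a literal composition:
    out = conv(up(res(Dec(c, MLP(s))))),
    where c is the FIEnc content feature and s the PSEnc style feature."""
    return conv(up(res(dec(c, mlp(s)))))
```

Substituting simple scalar functions for each stage verifies the evaluation order without any network machinery.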
S30 specifically comprises: fix the parameters of the generator G and calculate the generator losses L_{D_x} and L_{D_y} for optimizing the discriminator D, so that its discrimination capability is enhanced. The calculation process is shown in formula (1.6):

L_{D_x} = E_{x~X}[||D_x(x) - 1||_1] + E_{x~X,y~Y}[||D_x(G(y,x))||_1]
L_{D_y} = E_{y~Y}[||D_y(y) - 1||_1] + E_{x~X,y~Y}[||D_y(G(x,y))||_1]    (1.6)

wherein D_x(·), D_y(·) denote the discriminator outputs on samples drawn from the generated data, and D_x, D_y denote the discriminator outputs on samples drawn from the real data.
Back-propagate and update the parameters of the discriminators (structure shown in fig. 5). The two discriminators, identical in structure, are used respectively to discriminate the generated makeup image and the generated makeup-removal image.
Fix the parameters of the discriminator D and calculate the discriminator losses L_{G_x} and L_{G_y} for optimizing the generator G, so that its ability to deceive the discriminator D is enhanced. The calculation process is shown in formula (1.7):

L_{G_x} = E_{x~X,y~Y}[||D_x(G(y,x)) - 1||_1]
L_{G_y} = E_{x~X,y~Y}[||D_y(G(x,y)) - 1||_1]    (1.7)
S40 specifically comprises calculating the identity losses L_idt(x_x, x) and L_idt(y_y, y); the calculation process is shown in formula (1.8). This loss uses the generator to reconstruct x and y, so that the network retains the features of the original images to a greater extent.
L_idt = ||G(x,x) - x||_1 + ||G(y,y) - y||_1    (1.8)
Calculate the makeup losses; the calculation process is shown in formula (1.9). The role of this loss is to guide the migration of makeup in the key regions.

L_makeup = Σ_i ||(G(x,y) - x̃) * M_{x,i}||_1 + Σ_i ||(G(y,x) - ỹ) * M_{y,i}||_1    (1.9)
Calculate the two local VGG losses; the calculation process is shown in formula (1.10), where M_{y,i} denotes the face mask of the made-up image, i denotes the index of a key region (orbit, face, or lips), M_{x,i} is defined analogously for the non-made-up image, and F_l(·) denotes the l-th layer features of the VGG network. This loss enhances the preservation of semantic information in the key regions.
Calculate the two global VGG losses; the calculation process is shown in formula (1.11). This loss ensures that the generated image is semantically similar to the original image.
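The local and global VGG losses of formulas (1.10)-(1.11) share one shape: an L2 distance between l-th layer features F_l(·), with the local variant additionally masked by a key-region mask M_i. A sketch, assuming the VGG features have already been extracted (feature extraction itself is outside the scope of this snippet):

```python
import numpy as np

def vgg_feature_loss(feat_a, feat_b, mask=None):
    """Sketch of formulas (1.10)-(1.11): squared error between layer-l VGG
    features; pass a key-region mask M_i for the local variant, or no mask
    for the global variant."""
    diff = feat_a - feat_b
    if mask is not None:
        diff = diff * mask                 # zero the loss outside the region
    return float((diff ** 2).mean())
```

With an all-zero mask the local loss ignores every difference, showing how the mask restricts supervision to the orbit, face, and lip regions.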
S50 includes sending x_y and y_x into FIEnc to extract the content features c_{x,fake} and c_{y,fake}. Then c_{x,fake} and code_x are fed into MFDec to obtain x_rec, and c_{y,fake} and code_y are fed into MFDec to obtain y_rec.
Calculate the reconstruction losses L_cyc(x_rec, x) and L_cyc(y_rec, y); the calculation process is shown in formula (1.12), where G(G(y, x), y) denotes migrating y with the makeup of x as reference and then migrating the result back with the makeup of y as reference, and G(G(x, y), x) is defined analogously. This loss guides the network to perform overall style migration while retaining the basic characteristics of the original image.
L_cyc = ||G(G(y, x), y) - y||_1 + ||G(G(x, y), x) - x||_1    (1.12)
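Formula (1.12) is a cycle-consistency term and can be written down directly; as with the identity loss, `G` is any callable taking (image, makeup reference):

```python
import numpy as np

def l1(a, b):
    """L1 loss: mean absolute error between two images."""
    return np.abs(a - b).mean()

def cycle_loss(G, x, y):
    """L_cyc = ||G(G(y, x), y) - y||_1 + ||G(G(x, y), x) - x||_1
    (formula 1.12): transferring makeup over and then back should
    recover the original image."""
    return l1(G(G(y, x), y), y) + l1(G(G(x, y), x), x)
```

Any generator that is invertible in this round-trip sense incurs zero cycle loss, which is what preserves the basic characteristics of the original image.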
Finally, back-propagate and update the generator parameters.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (8)
1. A makeup style migration method based on an FP-SCGAN model, characterized in that the FP-SCGAN network comprises PSEnc, FIEnc, MFDec and a Markov discriminator; training of the network proceeds as a mutual game between a generator G and a discriminator D, and the network converges when dynamic balance is finally reached; the training specifically comprises the following steps:
S10, obtaining style features: the non-made-up image x and the made-up image y are sent into FIEnc, and the facial features c_x, c_y of the images to be migrated are obtained through feature extraction, down-sampling and residual modules; the key regions of the makeup reference image are sent into PSEnc, where features are extracted through a pre-trained VGG19 network and fused through a feature pyramid to obtain the style features s_x, s_y;
S20, obtaining features fusing the reference makeup image and the image to be migrated: the obtained style features are sent into a multilayer perceptron that maps them into a feature space to obtain the style feature codes code_x, code_y; the obtained facial features of the images to be migrated and the style feature codes are sent into MFDec for feature fusion through AdaIN in the decoder; meanwhile, AdaIN is used in the shallow layers of MFDec to introduce the features, and the fused features x_y, y_x, x_x, y_y of the reference makeup image and the image to be migrated are obtained through the MFDec network;
S30, optimizing the discriminator and generator: the parameters of the generator G are fixed, the generator losses are calculated, and the discriminator D is optimized to enhance its discrimination capability; back propagation is then performed and the discriminator parameters are updated; there are two discriminators in total, used respectively to discriminate the generated makeup image and the makeup-removal image, and they have the same structure; the parameters of the discriminator D are then fixed, the discriminator losses are calculated, and the generator G is optimized to enhance its ability to deceive the discriminator D;
S40, calculating the losses: these comprise the identity loss, which uses the generator to reconstruct the image to be migrated; the makeup loss, which guides the migration of key-region makeup; the local VGG loss, which enhances the preservation of key-region semantic information; and the global VGG loss, which ensures that the generated image is semantically similar to the original image;
and S50, updating the generator parameters: x_y and y_x are sent into FIEnc to extract the content features c_{x,fake}, c_{y,fake}; then c_{x,fake} with code_x, and c_{y,fake} with code_y, are fed separately into MFDec to obtain x_rec and y_rec; the reconstruction loss is further calculated, which guides the network to perform overall style migration while retaining the basic characteristics of the original image; finally, back propagation is performed and the generator parameters are updated.
2. The method of claim 1, wherein the formula for calculating the generator loss is:
wherein E is x~X Representing the true probability of the non-makeup image; e y~Y Representing the true probability of the applied image; e x~X,y~Y Representing joint probabilities of generating images; d x (·),D y () represents the arbiter output of the sampled self-generated data; d x ,D y A discriminator output representing the sample from the real data; g (x, y) is to transfer x with the makeup of y as a reference; g (y, x) is the migration of y with the makeup of x as a reference.
3. The method of claim 1, wherein the formula for calculating the discriminant loss is:
wherein D_x(·), D_y(·) denote the discriminator outputs on samples drawn from the generated data; E_{x~X,y~Y} denotes the expectation over the image pairs used for generation; G(x, y) migrates x with the makeup of y as reference; and G(y, x) migrates y with the makeup of x as reference.
4. The method of claim 1, wherein the identity loss is calculated by the formula:
L idt =||G(x,x)-x|| 1 +||G(y,y)-y|| 1
wherein G(x, x) migrates x with the makeup of x as reference; G(y, y) migrates y with the makeup of y as reference; and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
5. The method of claim 1, wherein the cosmetic loss is calculated by the formula:
wherein the first term denotes the pairing data generated for x and the second term denotes the pairing data generated for y; x denotes the non-made-up image and y denotes the made-up image; M_{x,i} denotes the face mask of the pre-makeup image, where i denotes the index of a key region (orbit, face, or lips); M_{y,i} denotes the face mask of the made-up image, with i defined in the same way; G(x, y) denotes migrating x with the makeup of y as reference; G(y, x) denotes migrating y with the makeup of x as reference; and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
6. The method of claim 1, wherein the local vgg loss is calculated by the formula:
wherein M_{x,i} denotes the face mask of the pre-makeup image, where i denotes the index of a key region (orbit, face, or lips); M_{y,i} denotes the face mask of the made-up image, with i defined in the same way; G(x, y) denotes migrating x with the makeup of y as reference; G(y, x) denotes migrating y with the makeup of x as reference; F_l(·) denotes the l-th layer features of the VGG network; and ||·||_2 denotes the L2 loss, i.e., the squared error between the real data and the generated data.
7. The method of claim 1, wherein the global vgg penalty is calculated by:
wherein G(x, y) denotes migrating x with the makeup of y as reference; G(y, x) denotes migrating y with the makeup of x as reference; F_l(·) denotes the l-th layer features of the VGG network; and ||·||_2 denotes the L2 loss.
8. The method of claim 1, wherein the reconstruction loss is calculated by the formula:
L_cyc = ||G(G(y, x), y) - y||_1 + ||G(G(x, y), x) - x||_1
wherein G(G(y, x), y) denotes migrating y with the makeup of x as reference and then migrating the result back with the makeup of y as reference; G(G(x, y), x) denotes migrating x with the makeup of y as reference and then migrating the result back with the makeup of x as reference; and ||·||_1 denotes the L1 loss, i.e., the absolute error between the real data and the generated data.
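The losses enumerated in claims 2-8 are typically combined into one generator objective. The claims do not state how the terms are weighted, so both the dictionary keys and any weights a caller supplies below are illustrative assumptions:

```python
def total_generator_objective(losses, weights):
    """Hypothetical weighted sum of the generator-side losses of claims 2-8
    (adversarial, identity, makeup, local VGG, global VGG, reconstruction).
    The weighting coefficients are not given in the patent."""
    return sum(weights[name] * value for name, value in losses.items())
```

For example, with two terms and assumed weights 0.5 and 0.25, the objective is simply 0.5 * L_idt + 0.25 * L_cyc.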
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210488449.1A CN114863527B (en) | 2022-05-06 | 2022-05-06 | Makeup style migration method based on FP-SCGAN model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114863527A true CN114863527A (en) | 2022-08-05 |
CN114863527B CN114863527B (en) | 2024-03-19 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107464210A (en) * | 2017-07-06 | 2017-12-12 | 浙江工业大学 | Image style transfer method based on a generative adversarial network
CN107644006A (en) * | 2017-09-29 | 2018-01-30 | 北京大学 | Automatic generation method for a Chinese script font library based on deep neural networks
US20190332850A1 (en) * | 2018-04-27 | 2019-10-31 | Apple Inc. | Face Synthesis Using Generative Adversarial Networks |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |