CN115496650A - Makeup migration method based on generation countermeasure network - Google Patents
- Publication number: CN115496650A (application CN202211029533.3A)
- Authority: CN (China)
- Prior art keywords: image, makeup, generator, pixel, result
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/04 — Context-preserving geometric image transformations in the plane of the image, e.g. by using an importance map
- G06N3/02 — Neural networks
- G06N3/08 — Learning methods
- G06T5/40 — Image enhancement or restoration using histogram techniques
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; Image merging
- G06T2207/30201 — Face
Abstract
The invention provides a makeup transfer method based on a generative adversarial network, comprising the following steps: (1) dividing a certain number of images from a data set into a training set and a test set, and preprocessing each image by cropping and normalization; (2) passing the makeup image through a face style generator to obtain feature components, feeding the components into two convolutional layers for feature extraction, inputting the style code of the makeup image together with the feature map of the plain-face (non-makeup) image into a fusion block, encoding the fusion-block output with a face image encoder and feeding it into self-attention for information fusion, and obtaining, through Resblock modules, a face image carrying the makeup style of the makeup image; (3) using a discriminator to distinguish the real makeup image from the generated image, while the generator produces an image in the makeup-image style from the plain-face image and the makeup image input in step (2); (4) controlling the contour weight of the original image and the content weight of the makeup image by changing the code, to obtain the final image.
Description
Technical Field
The invention provides a makeup transfer method based on a generative adversarial network (GAN), and belongs to the field of image style transfer.
Background
Image style transfer, also referred to as image style conversion, converts the style of an input image into one or more designated styles; the goal is to preserve the content of the original image while rendering it with the texture or style of a reference image. Image style transfer is currently widely applied in image processing fields such as mobile camera filters and artistic image generation. With the development of mobile phones and portable smart photographing devices, face makeup transfer has attracted attention as a core technology of commercially available portrait-retouching software. Such software requires manual operation by the user and offers only a fixed number of makeup styles. Face makeup transfer technology can transfer a reference makeup onto a plain (non-makeup) face while keeping the facial structure unchanged, so that the makeup style is rendered as faithfully as possible.
Face makeup transfer is a challenging task, and current methods fall mainly into three categories: traditional methods, convolutional neural network (CNN) methods, and generative adversarial network methods. Makeup transfer based on traditional methods computes the color and illumination changes before and after makeup, adjusts the skin-texture and skin-color differences between the plain-face image and the makeup image, and transfers the makeup onto the plain face; such methods impose strict requirements on the before/after pictures and have low practicability. CNN-based methods first select, from a makeup face database, the picture most similar to the current plain face; then perform face segmentation with a fully convolutional segmentation network to extract the facial-feature regions; and finally complete makeup transfer for foundation (the face), lip gloss (the lips) and eye shadow (the eyes). Although such methods can control the makeup intensity, the overall effect is unnatural. With the continuous development of generative adversarial network technology, GAN-based face makeup transfer, owing to its ability to generate visually realistic images, significantly improves the transfer effect over both traditional and CNN-based methods, and has become a research hotspot in the field of face makeup transfer.
Makeup transfer based on traditional methods often lacks sufficient training data and shows poor transferability in experiments, while deep-learning-based methods often fail to restore makeup details and to transfer correctly under extreme poses and conditions. Problems such as unnecessary changes to the background and hair color, weak transfer between images with large differences, unrecognized partial face regions, insensitivity to some cosmetics such as blush, and rough edges in the transfer result hinder the application of makeup transfer in practice. Therefore, the invention comprehensively considers the advantages of the above approaches and designs a highly flexible, fully automatic makeup transfer method that is convenient for enterprise software use and deployment.
Disclosure of Invention
The present invention aims to overcome the deficiencies of the prior art and, based on a generative adversarial network, to easily implement global/local makeup application and makeup removal with shade control by editing style codes, without additional computational effort. The general steps are: first, design a UV map and extract two-dimensional plane information from a reference stereo makeup image, thereby determining the makeup components; second, design a face contour encoder that encodes into a high-dimensional space the information to be preserved, such as the head pose, facial expression, illumination and occlusion of the original face; finally, mix the extracted makeup components with the preserved encoded information using an attention mechanism, so as to generate the required makeup-transferred image.
To achieve the above object, the present invention provides a makeup transfer method based on a generative adversarial network, comprising the steps of:
(1) dividing a certain number of images of the Makeup Transfer data set into a training set and a test set, cropping and normalizing each image, numbering the key points of the face, and obtaining the corresponding region images from a mask;
(2) passing the makeup image through a face style generator to obtain the feature component y_i, i = {lip, skin, eye}, of each makeup part, representing the lips, skin and eyes respectively; feeding each part into two convolutional layers for feature extraction, and mapping each part y_i to a per-part style code z_i through pooling and convolution; passing the plain-face image through five 4×4 convolutional layers with spectral normalization to obtain a feature map; inputting the style code z_i of the makeup image and the plain-face feature map obtained by the makeup extraction part into a fusion block; encoding the parameters output by the fusion block with a face image encoder and feeding them into multi-head self-attention (MSA) for information fusion; and passing the fused information through four Resblock modules to obtain a face image carrying the makeup style of the makeup image;
(3) for the designed generator and encoder, selecting a cyclic training mode for parameter learning, in which a discriminator distinguishes a real makeup image from an image produced by the generator; the generator's input is a plain-face image and a makeup image, and its output is a face image carrying the makeup style of the makeup image;
(4) when a partial makeup style is required, controlling the contour weight ω_source of the original image and the content weight ω_ref of the makeup image by changing the code, to obtain the final image I.
Further, in step (1), 100 plain-face images (ori) and 300 makeup images (ref) of the Makeup Transfer data set are selected as the test set, and the rest are used as the training set. Each image read in is scaled to 256×256 and converted into a tensor, then normalized as (image − mean)/std, where the mean and std of each channel are 0.5. The face key points are labelled, and in the accompanying mask the pixel value 7 represents the upper lip, 9 the lower lip, 1, 6 and 13 the facial skin, 4 the left eye and 5 the right eye, from which the corresponding face regions on the mask are obtained.
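The preprocessing of step (1) can be sketched as follows; this is an illustrative reconstruction, not the patent's implementation, and the function names and the use of numpy are assumptions. It shows the (image − 0.5)/0.5 per-channel normalization and the mask-label lookup table described above.

```python
# Hypothetical sketch of step (1): normalize each channel with
# mean = std = 0.5, and look up face regions by the mask labels
# given in the text. Names are illustrative.
import numpy as np

# Mask pixel values stated in the patent text.
REGION_LABELS = {"upper_lip": [7], "lower_lip": [9],
                 "skin": [1, 6, 13], "left_eye": [4], "right_eye": [5]}

def normalize(image):
    """(image - mean) / std with mean = std = 0.5; maps [0, 1] to [-1, 1]."""
    return (image - 0.5) / 0.5

def region_mask(mask, region):
    """Boolean mask selecting one labelled face region."""
    return np.isin(mask, REGION_LABELS[region])

img = np.random.rand(256, 256, 3)            # stand-in for a resized face image
mask = np.random.randint(0, 14, (256, 256))  # stand-in for a parsing mask
x = normalize(img)
lips = region_mask(mask, "upper_lip")
```

A real pipeline would read images from disk and resize them to 256×256 first; those steps are framework-dependent and omitted here.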
Further, in step (2), makeup information info_makeup is obtained by encoding the reference makeup face; for the input plain-face image, contour information info_sketch is obtained by a contour extraction module; the two are fused and passed through facial-feature decoding to generate the required made-up face image, as follows:
(a) The makeup image is input into three channel-expanding convolution modules (ConvBlocks1) and three channel-reducing convolution modules (ConvBlocks2) to generate the target eye, skin and lip information {y_lip, y_skin, y_eyes}. ConvBlocks1 consists of a 4×4 convolution with stride 2 and padding 1, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation function; ConvBlocks2 consists of a 3×3 transposed convolution with stride 1 and padding 0, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation function;
(b) The plain-face image is input into five 4×4 convolutional layers; each layer is followed by spectral normalization to limit how sharply the loss function can change, and every layer except the last is followed by a LeakyReLU activation function, yielding a 16×16 image coding block;
(c) Further information is extracted from the contour information and fused with the makeup information;
(c1) The 16×16 image coding block of the plain-face image is input into the next-stage Convblock for further extraction, and a copy is concatenated into the third-stage fusion block of the fusion module. Each fusion block consists of three convolutional layers and two AdaIN layers; the AdaIN layer aligns the mean and variance of the image features to those of the style image, and is defined as:

AdaIN(F_j) = α·((F_j − μ(F_j)) / σ(F_j)) + β    (1)

where F_j denotes the input of the j-th-stage fusion block, μ(·) and σ(·) denote the mean and standard deviation used for normalization, and α and β denote the coefficients controlling the scaling and bias of AdaIN(·).

The information extracted by the second stage is input into the next-stage Convblock and into the second-stage fusion block of the fusion module, where the Convblock consists of a 3×3 convolutional layer with padding 2, spectral normalization and a LeakyReLU activation function; finally, the Convblock output is input into the first-stage fusion block;
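The AdaIN alignment of Eq. (1) can be sketched in a few lines of numpy; this is a minimal illustration under the stated definitions (statistics over spatial dimensions, scale α and bias β), not the patent's implementation, and all names are assumptions.

```python
# Minimal sketch of Eq. (1): whiten the content feature with its own
# mean/std, then re-style it with scale alpha and bias beta.
import numpy as np

def adain(f, alpha, beta, eps=1e-5):
    """alpha * (f - mu(f)) / sigma(f) + beta, statistics over spatial dims."""
    mu = f.mean(axis=(-2, -1), keepdims=True)
    sigma = f.std(axis=(-2, -1), keepdims=True)
    return alpha * (f - mu) / (sigma + eps) + beta

feat = np.random.randn(64, 16, 16)   # C x H x W content feature map
out = adain(feat, alpha=2.0, beta=0.5)
```

After the call, each channel of `out` has mean ≈ β and standard deviation ≈ α, which is exactly the "align statistics to the style" behaviour the text describes; in the network, α and β would be predicted from the style code z_i.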
(c2) The style code z_i of the makeup image and the feature map of the plain-face image obtained by the makeup extraction part are input into the fusion block; the parameters are encoded by the face image encoder, and the encoded parameters Z_l are fed into multi-head self-attention (MSA) for information fusion, with a residual connection after each block:

z′_l = MSA(z_{l−1}) + z_{l−1},  l = 1, …, L    (2)

where L denotes the total number of layers; the multi-head self-attention (MSA) is the channel-wise concatenation of k parallel self-attention (SA) blocks:

MSA(z) = [SA_1(z); SA_2(z); …; SA_k(z)]    (3)

SA(z) = softmax(q·kᵀ / √D_k)·v    (4)

[q, k, v] = z·U_qkv    (5)

where SA denotes the attention output, z ∈ R^((N+1)×D) is the input residual sequence of length N+1 and dimension D, U_qkv is the weight matrix of the linear transformation, and D_k is the projected dimension. The query, key and value [q, k, v] obtained by the linear transformation are used to compute the similarity between the current query and all keys; the similarity values pass through a Softmax layer to obtain a set of weights, and the attention value is the sum of the products of these weights with the corresponding values.

The MSA result is input into a multilayer perceptron consisting of three fully connected layers:

z_l = MLP(z′_l) + z′_l,  l = 1, …, L    (6)

where MLP denotes the fully connected layers and z′_l denotes the input of each fully connected layer, equal to MSA(z) when l = 1;
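One scaled dot-product self-attention head, as used inside the MSA block of Eqs. (3)-(5), can be sketched as below. This is an illustrative numpy reconstruction of the standard mechanism, with shapes and names assumed rather than taken from the patent.

```python
# Sketch of Eqs. (4)-(5): q, k, v come from one joint linear map U_qkv,
# and the softmax-weighted sum of values is returned.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(z, u_qkv):
    """z: (N, D) token sequence; u_qkv: (D, 3*Dk) joint projection."""
    d_k = u_qkv.shape[1] // 3
    q, k, v = np.split(z @ u_qkv, 3, axis=-1)        # Eq. (5)
    weights = softmax(q @ k.T / np.sqrt(d_k))        # similarity -> softmax
    return weights @ v                               # weighted sum of values

rng = np.random.default_rng(0)
z = rng.standard_normal((17, 32))                    # N+1 tokens, D = 32
out = self_attention(z, rng.standard_normal((32, 3 * 8)))
```

A multi-head version (Eq. 3) would run k such heads with separate projections and concatenate the outputs along the feature axis.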
(d) The mixed information is input into four Resblock modules. Each Resblock consists of two dimension-preserving Convblocks and a residual edge, with the difference that the first Convblock uses a ReLU activation function while the second uses none; the network thus finally generates the required made-up image.
Further, in step (3), the discriminator and the generator are trained cyclically, as follows:
(a) Training the discriminator: the plain-face image and the makeup image are input into the generator with its parameters set to makeup; the generator outputs the plain-face image made up in the style of the makeup image, and this result is input into the discriminator D_y. The plain-face image and the makeup image are also input into the generator with its parameters set to makeup removal; the generator outputs the makeup image with makeup removed in the style of the plain-face image, and this result is input into the discriminator D_x. Based on these results, the discriminator losses are obtained:

L_Dy = −E_{y∼P_Y}[log D_y(y)] − E_{x∼P_X, y∼P_Y}[log(1 − D_y(G(x, y)))]    (7)
L_Dx = −E_{x∼P_X}[log D_x(x)] − E_{x∼P_X, y∼P_Y}[log(1 − D_x(G(y, x)))]    (8)

where y∼P_Y denotes that the input domain is makeup images and x∼P_X that the input domain is plain-face images; D_y(y) denotes the discriminator's judgment of a real makeup image and D_x(x) its judgment of a real plain-face image; G(x, y) denotes the face image generated with the generator parameters set to makeup and G(y, x) the face image generated with the parameters set to makeup removal; L_Dy denotes the loss between the real makeup image and the generated makeup image as judged by the discriminator, and L_Dx the corresponding loss on the plain-face side. The total discriminator loss L_D = L_Dy + L_Dx is back-propagated through the gradient to optimize the discriminator;
(b) After training the discriminator, the generator is retrained:
(b1) The plain-face image and the makeup image are input into the generator with its parameters set to makeup; the generator outputs the made-up plain-face image, which is input into the discriminator D_y to obtain a judgment, from which the adversarial loss L_adv_y is obtained. The plain-face image and the makeup image are also input into the generator with its parameters set to makeup removal; the generator outputs the de-made-up makeup image, which is input into the discriminator D_x to obtain a judgment, from which the adversarial loss L_adv_x is obtained:

L_adv_y = −E_{x∼P_X, y∼P_Y}[log D_y(G(x, y))]    (9)
L_adv_x = −E_{x∼P_X, y∼P_Y}[log D_x(G(y, x))]    (10)

where L_adv_y denotes the loss of the makeup image generated by the generator, and L_adv_x denotes the loss of the plain-face image generated by the generator;
(b2) The fake makeup image and the fake plain-face image output by the generator in step (b1) are compared against histogram-matching results to obtain the makeup losses:

L_makeup_y = ‖G(x, y) − HM(x, y)‖₂    (11)
L_makeup_x = ‖G(y, x) − HM(y, x)‖₂    (12)

where HM(x, y) denotes the gray-histogram matching result when the generator parameters are set to makeup, HM(y, x) the matching result when they are set to makeup removal, and ‖·‖₂ denotes the L2 norm;
(b3) The fake makeup image output by the generator in step (b1) and the original plain-face image are input into the generator with its parameters set to makeup removal; the output is the fake makeup image with makeup removed in the style of the plain-face image, and a cycle-consistency loss between this result and the original plain-face image is obtained through an L1 loss. The fake plain-face image output by the generator and the makeup image are likewise input into the generator with its parameters set to makeup, and a cycle-consistency loss between the result and the makeup image is obtained through an L1 loss. The cycle-consistency loss L_cyc is defined as:

L_cyc = ‖G(F_PNI(G(x, y)), x) − x‖₁ + ‖G(F_PNI(G(y, x)), y) − y‖₁    (13)

where ‖·‖₁ denotes the L1 norm and F_PNI(·) denotes the transformation module applied to the generated image, which injects random interference to enhance the robustness of the model and is defined as:

F_PNI(x) = x + γ·η    (14)

where η denotes a noise term sampled from a Gaussian distribution and γ denotes the coefficient controlling the magnitude of η;
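The perturbation of Eq. (14) is a one-liner; the sketch below is an illustrative numpy version (function name and default γ are assumptions) showing Gaussian noise η scaled by γ being added to a generated image before it is fed back for the cycle loss.

```python
# Sketch of Eq. (14): F_PNI(x) = x + gamma * eta, with eta ~ N(0, 1).
import numpy as np

def f_pni(x, gamma=0.05, rng=None):
    """Inject scaled Gaussian noise into the generated image."""
    rng = rng if rng is not None else np.random.default_rng(0)
    eta = rng.standard_normal(x.shape)
    return x + gamma * eta

img = np.zeros((8, 8))
noisy = f_pni(img, gamma=0.1)
```

Because η has unit variance, the injected perturbation has standard deviation ≈ γ, so γ directly sets the interference strength.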
(b4) In order to preserve the characteristics of the original image while adopting the look of the makeup image, a local perceptual loss L_per is introduced to maintain local consistency in regions that need no transition:

L_per = ‖F_l(G(x, y)) − F_l(x)‖₂ + ‖F_l(G(y, x)) − F_l(y)‖₂    (15)

where F_l(·) denotes the face local generation module and ‖·‖₂ denotes the L2 norm.

Finally, the loss functions are combined to obtain the final generator loss L_G:

L_G = λ_G·(L_adv_y + L_adv_x) + λ_cyc·L_cyc + λ_makeup·(L_makeup_y + L_makeup_x) + λ_per·L_per    (16)

where λ_G, λ_cyc, λ_makeup and λ_per are the generator, cycle-consistency, histogram-matching and local-consistency control coefficients, respectively.
Further, the specific method of step (4) is as follows: the contour weight ω_source of the original image and the content weight ω_ref of the makeup image are controlled by changing the coefficient γ of the encoding, to obtain the final image I:

I = (1 − γ)·ω_ref + γ·ω_source    (17)
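Eq. (17) is a convex blend of the two weighted components; the sketch below illustrates it with stand-in arrays (all names are illustrative — in the method, ω_ref and ω_source come from the style codes).

```python
# Sketch of Eq. (17): I = (1 - gamma) * omega_ref + gamma * omega_source,
# with gamma in [0, 1] controlling how much of the original contour is kept.
import numpy as np

def blend(omega_ref, omega_source, gamma):
    """Interpolate makeup content weights and original contour weights."""
    return (1.0 - gamma) * omega_ref + gamma * omega_source

ref = np.full((4, 4), 1.0)      # stand-in makeup content weights
src = np.full((4, 4), 0.0)      # stand-in original contour weights
half = blend(ref, src, 0.5)     # equal mix of the two
```

γ = 0 reproduces the full makeup style, γ = 1 recovers the original contour, and intermediate values give the partial makeup styles mentioned in step (4).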
has the advantages that: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
the present invention discloses a makeup transfer method based on the creation of a antagonistic network, which exhibits great flexibility in makeup transfer to support makeup removal, makeup transfer and partially specific makeup transfer. The spectrum normalization is added into the network of the face contour, so that the discriminator meets the 1-Lipschitz condition, the intensity of function change is limited, parameters are more stable in the optimization process of a neural network, gradient explosion is not easy to occur, meanwhile, gaussian noise is injected in the process of training a generator to enable the generated image to be finer and smoother, and the generated image has the makeup style of a reference image and keeps the characteristics of an original image at the same time by introducing local consistency loss.
Drawings
FIG. 1 is a schematic diagram of a method provided in an embodiment of the invention;
FIG. 2 is a schematic diagram of a residual module provided in an embodiment of the invention;
FIG. 3 is a schematic diagram of a fusion module provided in an embodiment of the invention;
FIG. 4 is a graphical representation of makeup migration results under a test data set in an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only intended to illustrate the technical solutions of the invention more clearly, and should not be taken as limiting the scope of the invention.
The invention provides a makeup transfer method based on a generative adversarial network, which comprises the following steps:
(1) dividing a certain number of images of the Makeup Transfer data set into a training set and a test set, cropping and normalizing each image, numbering the key points of the face, and obtaining the corresponding region images from a mask;
(2) passing the makeup image through a face style generator to obtain the feature component y_i, i = {lip, skin, eye}, of each makeup part; feeding each part into two convolutional layers for feature extraction, and mapping each part y_i to a per-part style code z_i through pooling and convolution; passing the plain-face image through five 4×4 convolutional layers with spectral normalization to obtain a feature map; inputting the style code z_i of the makeup image and the plain-face feature map obtained by the makeup extraction part into a fusion block; encoding the parameters output by the fusion block with a face image encoder and feeding them into multi-head self-attention (MSA) for information fusion; and passing the fused information through four Resblock modules to obtain a face image carrying the makeup style of the makeup image;
(3) for the designed generator and encoder, selecting a cyclic training mode for parameter learning, in which a discriminator distinguishes a real makeup image from an image produced by the generator; the generator's input is a plain-face image and a makeup image, and its output is a face image carrying the makeup style of the makeup image;
(4) when a partial makeup style is required, controlling the contour weight ω_source of the original image and the content weight ω_ref of the makeup image by changing the code, to obtain the final image I.
In step (1), 100 plain-face images (ori) and 300 makeup images (ref) of the Makeup Transfer data set are selected as the test set, and the rest are used as the training set. Each image read in is scaled to 256×256 and converted into a tensor, then normalized as (image − mean)/std, where the mean and std of each channel are 0.5. The face key points are labelled, and in the accompanying mask the pixel value 7 represents the upper lip, 9 the lower lip, 1, 6 and 13 the facial skin, 4 the left eye and 5 the right eye, from which the corresponding face regions on the mask are obtained.
In the step (2), makeup information info is obtained based on the reference makeup face code for the input pixel image makeup (ii) a For input makeup image, and pixel-based image contour extraction module and contour information info sketch The two are fused, and a required face image with makeup is generated through facial feature decoding, and the specific method is as follows:
(a) The makeup image is input into 3 ascending-dimension convolution modules ConvBlocks1 and 3 descending-dimension convolution modules ConvBlocks2 to generate information of target eyes, skin and lips { y } lip ,y skin ,y eyes ConvBlocks1 consists of a 4 × 4 convolution with step size of 2, padding of 1, adaptationInstanceNorm 2d normalization and LeakyReLU activation function, convBlocks2 consists of a 3 × 3 transposed convolution with step size of 1, padding of 0, adaptationNorm 2d normalization and LeakyReLU activation function;
(b) Inputting the pixel image into 5 convolution layers of 4 multiplied by 4, each convolution layer is subjected to spectrum normalization to limit the intensity of the change of the loss function, and the pixel image except the last convolution layer is subjected to a LeakyRelu activation function after passing through the other convolution layers to obtain an image coding block of 16 multiplied by 16;
(c) Further extracting information aiming at the outline information, and fusing the outline information with the makeup information;
(c1) Inputting image coding blocks of pixel-color images 16 × 16 into Convblock of the next stage for further extraction, and copying a concat into a third-stage fusion block of a fusion module, wherein each fusion block is composed of three convolution layers and two AdaIN layers, and the AdaIN layers are used for aligning the mean value and the variance of image features to the mean value and the variance AdaIN () of the style images, and are defined as:
wherein, F j Represents the input of the fusion block of the j-th stage, μ (), σ () represents the normalized coefficients of the output of the fusion block, α, β represent the coefficients that control AdaIN (-) scaling and biasing;
The information extracted by the second stage is input into the next-stage ConvBlock and into the second-stage fusion block of the fusion module. This ConvBlock consists of a 3 × 3 convolution layer with padding 2, spectral normalization and a LeakyReLU activation function; finally, the output of this ConvBlock is input into the first-stage fusion block;
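The AdaIN alignment used inside the fusion blocks can be sketched in numpy: the content features are whitened per channel and then rescaled to the style statistics (the (C, H, W) channel layout is an assumption):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Align per-channel mean/std of content features to those of style features.
    content, style: arrays of shape (C, H, W)."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_sigma = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_sigma = style.std(axis=(1, 2), keepdims=True)
    return s_sigma * (content - c_mu) / (c_sigma + eps) + s_mu

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(3, 16, 16))
style = rng.normal(2.0, 0.5, size=(3, 16, 16))
out = adain(content, style)
# out now carries the style image's per-channel mean and variance
```

This is the statistic-transfer step that lets the makeup style modulate the plain-face features.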
(c2) The style code z_i of the makeup image and the feature map of the plain-face image obtained by the makeup extraction part are input into the fusion block; the parameters output by the fusion block are encoded by the face-image encoder, and the encoded parameters Z_l are fed into multi-head self-attention (MSA) for information fusion, with a residual connection after each block:
z′_l = MSA(Z_{l−1}) + Z_{l−1},  l = 1, …, L   (2)
where L represents the total number of layers; the multi-head self-attention (MSA) consists of k parallel self-attention (SA) blocks along the channel dimension:
MSA(z) = [SA_1(z); SA_2(z); …; SA_k(z)]   (3)

SA(z) = softmax(q kᵀ / √D_k) v   (4)

[q, k, v] = z U_qkv   (5)

where SA denotes the self-attention output, z ∈ R^((N+1)×D) is the input residual sequence of width N and length D, U_qkv is the weight matrix of the linear transformation, and D_k is the expanded dimension. [q, k, v] are the Query, Key and Value obtained after the linear transformation; the similarity between the current Query and all Keys is computed, a set of weights is obtained by passing the similarity values through a Softmax layer, and the Attention value is obtained by summing the products of these weights and the corresponding Values:
the result passing through the MSA is input to a multi-layer perceptron consisting of three fully-connected layers:
z_l = MLP(z′_l) + z′_l,  l = 1, …, L   (6)
where MLP denotes the fully-connected layers and z′_l denotes the input of each fully-connected layer, which is MSA(z) when l = 1;
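Equations (2)–(6) amount to a standard multi-head attention block; a compact numpy sketch (the per-head projection matrices, head count and sequence size are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(z, u_qkv, d_k):
    q, k, v = np.split(z @ u_qkv, 3, axis=-1)      # eq. (5): [q, k, v] = z U_qkv
    attn = softmax(q @ k.T / np.sqrt(d_k))         # Query-Key similarity -> Softmax weights
    return attn @ v                                # weighted sum of Values, eq. (4)

def msa_block(z, heads):
    # eq. (3): k parallel SA heads concatenated on the channel axis,
    # plus the residual connection of eq. (2)
    out = np.concatenate([self_attention(z, u, u.shape[1] // 3) for u in heads], axis=-1)
    return out + z

rng = np.random.default_rng(0)
N, D, n_heads = 9, 8, 2
z = rng.normal(size=(N, D))
heads = [rng.normal(size=(D, 3 * (D // n_heads))) for _ in range(n_heads)]
out = msa_block(z, heads)
print(out.shape)  # (9, 8)
```

Each head attends over the whole token sequence, and concatenating the k heads restores the input dimension so the residual addition is well defined.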
(d) The fused information is input into 4 ResBlock modules. Each ResBlock consists of two dimension-preserving ConvBlocks and a residual edge, the difference being that the first ConvBlock uses a ReLU activation function while the second uses no activation function. The network finally generates the required made-up image.
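The ResBlock in (d) can be sketched with pointwise (1 × 1) weights standing in for the dimension-preserving ConvBlocks (an illustrative simplification, not the patent's exact layer configuration):

```python
import numpy as np

def resblock(x, w1, w2):
    """x: (C, H, W); w1, w2: (C, C) pointwise weights.
    First ConvBlock is followed by ReLU, the second has no activation;
    the residual edge adds the input back."""
    h = np.einsum('oc,chw->ohw', w1, x)
    h = np.maximum(h, 0.0)                 # ReLU after the first ConvBlock
    h = np.einsum('oc,chw->ohw', w2, h)    # no activation after the second
    return x + h                           # residual edge

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
w1, w2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
y = resblock(x, w1, w2)
print(y.shape)  # (4, 8, 8)
```

Because both ConvBlocks preserve dimensions, four such blocks can be stacked without any projection on the residual edge.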
In step (3), the discriminator and the generator are trained cyclically; the specific method is as follows:
(a) Training the discriminator: the plain-face image and the makeup image are input into the generator with the generator parameter set to makeup application; the generator outputs the plain-face image made up in the style of the makeup image, and the made-up result is input into discriminator D_y. The plain-face image and the makeup image are then input into the generator with the parameter set to makeup removal; the generator outputs the makeup image with its makeup removed in the style of the plain-face image, and this result is input into discriminator D_x. The discriminator losses are obtained from these results:

L_D^y = −E_{y∼Y}[log D_y(y)] − E_{x∼X, y∼Y}[log(1 − D_y(G(x, y)))]   (7)

L_D^x = −E_{x∼X}[log D_x(x)] − E_{x∼X, y∼Y}[log(1 − D_x(G(y, x)))]   (8)

where E_{y∼Y} indicates that the input domain is the makeup images, E_{x∼X} indicates that the input domain is the plain-face images, D_y(y) is the discriminator's judgment of a real makeup image, D_x(x) is the discriminator's judgment of a real plain-face image, G(x, y) is the face image generated when the generator parameter is set to makeup application, and G(y, x) is the face image generated when the parameter is set to makeup removal. L_D^y is the discriminator's loss between the original and generated makeup images, and L_D^x is the loss between the original and generated plain-face images; the total discriminator loss L_D = L_D^y + L_D^x is back-propagated through the gradient to optimize the discriminator;
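With sigmoid discriminator outputs, the objective above reduces to binary cross-entropy on real and generated batches; a hedged numpy sketch (the sample probabilities are invented for illustration):

```python
import numpy as np

def d_loss(d_real, d_fake, eps=1e-8):
    """Discriminator loss: real samples pushed toward 1, generated toward 0.
    d_real, d_fake: discriminator output probabilities in (0, 1)."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

d_real = np.array([0.9, 0.8, 0.95])   # e.g. D_y on real makeup images y
d_fake = np.array([0.1, 0.2, 0.05])   # e.g. D_y on generator outputs G(x, y)
loss = d_loss(d_real, d_fake)
print(round(loss, 3))  # 0.253
```

The same function evaluated with D_x on real and generated plain-face images gives the second term, and the two are summed for the total discriminator loss.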
(b) After the discriminator has been trained, the generator is trained:
(b1) The plain-face image and the makeup image are input into the generator with the parameter set to makeup application; the generator outputs the plain-face image made up in the style of the makeup image, and the made-up image is input into discriminator D_y to obtain a decision result, from which the loss L_G^y is computed. The plain-face image and the makeup image are then input into the generator with the parameter set to makeup removal; the generator outputs the makeup image with its makeup removed in the style of the plain-face image, and the de-made-up image is input into discriminator D_x to obtain a decision result, from which the loss L_G^x is computed:

L_G^y = −E_{x∼X, y∼Y}[log D_y(G(x, y))]   (9)

L_G^x = −E_{x∼X, y∼Y}[log D_x(G(y, x))]   (10)

where L_G^y denotes the loss of the makeup image generated by the generator and L_G^x denotes the loss of the plain-face image generated by the generator;
(b2) The makeup losses of the fake makeup image and the fake plain-face image output by the generator in step (b1) are computed from histogram matching:

L_makeup^y = ||G(x, y) − HM(x, y)||_2   (11)

L_makeup^x = ||G(y, x) − HM(y, x)||_2   (12)

where HM(x, y) denotes the grayscale-histogram matching result when the generator parameter is set to makeup application, HM(y, x) denotes the grayscale-histogram matching result when it is set to makeup removal, and ||·||_2 denotes the L2 norm;
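Histogram matching remaps the generated image's intensity distribution onto the reference's; a minimal grayscale sketch of HM(·,·) and the resulting L2 makeup loss (single-channel arrays and a rank-ordering implementation are illustrative simplifications):

```python
import numpy as np

def histogram_match(source, reference):
    """Return source with its value distribution remapped to reference's,
    preserving the rank order of source pixels."""
    src_idx = np.argsort(source.ravel())
    matched = np.empty_like(source.ravel())
    matched[src_idx] = np.sort(reference.ravel())
    return matched.reshape(source.shape)

rng = np.random.default_rng(0)
generated = rng.uniform(0.0, 1.0, size=(16, 16))     # fake makeup image G(x, y)
reference = rng.uniform(0.3, 0.9, size=(16, 16))     # real makeup image y
hm = histogram_match(generated, reference)
makeup_loss = np.linalg.norm(generated - hm)         # L2 distance to the matched target
```

The matched image keeps the generated image's spatial structure while carrying exactly the reference's value distribution, which is what makes it a usable pseudo-ground-truth for the makeup loss.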
(b3) The fake makeup image output by the generator in step (b1) and the original plain-face image are input into the generator with the parameter set to makeup removal; the generator outputs the fake makeup image with its makeup removed in the style of the plain-face image, and the cycle-consistency loss is obtained from this result and the original plain-face image through an L1 loss. The fake plain-face image output by the generator and the makeup image are input into the generator with the parameter set to makeup application; the generator outputs the fake plain-face image made up in the style of the makeup image, and the cycle-consistency loss is obtained from this result and the makeup image through an L1 loss. The loss function L_cyc is defined as:

L_cyc = ||G(F_PNI(G(x, y)), x) − x||_1 + ||G(F_PNI(G(y, x)), y) − y||_1   (13)

where ||·||_1 denotes the L1 norm and F_PNI(·) is a transformation module applied to the generated image, which injects random perturbation to enhance the robustness of the model, defined as follows:
F_PNI(x) = x + γ · η   (14)
where η represents a noise term sampled from a Gaussian distribution and γ represents a coefficient that controls the magnitude of η;
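Equation (14) simply perturbs the generated image with scaled Gaussian noise before the cycle pass; a direct numpy sketch (the γ value is an illustrative assumption):

```python
import numpy as np

def f_pni(x, gamma=0.05, rng=None):
    """Perturbed-noise injection: F_PNI(x) = x + gamma * eta, eta ~ N(0, 1)."""
    rng = rng or np.random.default_rng(0)
    eta = rng.normal(size=x.shape)
    return x + gamma * eta

x = np.zeros((3, 8, 8))
x_noisy = f_pni(x, gamma=0.05)
# gamma scales the perturbation; the cycle reconstruction must survive this noise
```

Injecting noise between the two generator passes prevents the generator from hiding information in imperceptible pixel patterns, which makes the cycle-consistency constraint more robust.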
(b4) In order to preserve the characteristics of the original image apart from the makeup style of the cosmetic image, a local perceptual loss L_per is introduced to maintain local consistency in the regions that do not require transfer, where F_l(·) denotes the module that generates the local face regions, ||·||_1 denotes the L1 norm and ||·||_2 denotes the L2 norm;
Finally, the above loss functions are combined to obtain the final generator loss L_G:

L_G = λ_G (L_G^y + L_G^x) + λ_cyc L_cyc + λ_makeup (L_makeup^y + L_makeup^x) + λ_per L_per   (16)

where λ_G, λ_cyc, λ_makeup and λ_per are the generator control coefficient, the cycle-consistency control coefficient, the histogram-matching control coefficient and the local-consistency control coefficient, respectively.
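The final generator objective is a weighted sum of the four loss terms; a trivial sketch (both the loss values and the λ weights below are invented for illustration and are not taken from the patent):

```python
# Illustrative per-term loss values and control coefficients (assumptions)
losses = {"adv": 0.8, "cyc": 0.15, "makeup": 0.4, "per": 0.05}
lambdas = {"adv": 1.0, "cyc": 10.0, "makeup": 0.5, "per": 5.0}

total = sum(lambdas[k] * losses[k] for k in losses)  # L_G, eq. (16)
print(total)  # ≈ 2.75
```

Tuning the λ coefficients trades off realism (adversarial term) against identity preservation (cycle and local-consistency terms) and makeup fidelity (histogram term).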
In step (4), the specific method is as follows:
The contour weight ω_source of the original image and the content weight ω_ref of the cosmetic image are controlled by changing the coding coefficient γ to obtain the final image I:

I = (1 − γ) ω_ref + γ ω_source   (17).
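Equation (17) is a convex blend between the two weighted terms; a numpy sketch (treating ω_ref and ω_source as same-shape arrays is an assumption for illustration):

```python
import numpy as np

def blend(omega_ref, omega_source, gamma):
    """I = (1 - gamma) * omega_ref + gamma * omega_source  (eq. 17)."""
    return (1.0 - gamma) * omega_ref + gamma * omega_source

omega_ref = np.full((4, 4), 2.0)     # content weight of the cosmetic image
omega_source = np.full((4, 4), 6.0)  # contour weight of the original image
half = blend(omega_ref, omega_source, 0.5)
# gamma = 0 keeps only the makeup content; gamma = 1 keeps only the source contour
```

Sweeping γ between 0 and 1 therefore gives a continuous control over how much of the makeup style is applied, which is how partial makeup transfer is achieved.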
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and such modifications and variations also fall within the protection scope of the present invention.
Claims (5)
1. A makeup transfer method based on a generative adversarial network, characterized in that it comprises the following steps:
(1) For the Makeup Transfer dataset, divide a certain number of images into a training set and a test set, crop and normalize each image, number the key points of the face, and obtain the corresponding position images in the mask;
(2) The makeup image is processed by a face style generator to obtain feature components y_i of each makeup part, i = {lip, skin, eye}, representing the lips, skin and eyes respectively. Each part is fed into two convolution layers for feature extraction; after pooling and convolution, each part y_i is mapped to a per-part style code z_i. The plain-face image passes through 5 spectrally normalized 4 × 4 convolution layers to obtain a feature map. The style code z_i of the makeup image and the feature map of the plain-face image obtained by the makeup extraction part are input into the fusion block; the parameters output by the fusion block are encoded by the face-image encoder and fed into multi-head self-attention (MSA) for information fusion, and the fused information passes through 4 ResBlock modules to obtain a face image with the makeup style of the makeup image;
(3) For the designed generator and encoder, a cyclic training mode is selected for parameter learning, and a discriminator is used to distinguish real makeup images from the images generated by the generator; the inputs of the generator are a plain-face image and a makeup image, and its output is a face image with the makeup style of the makeup image;
(4) When only a partial makeup style needs to be generated, the contour weight ω_source of the original image and the content weight ω_ref of the cosmetic image are controlled by changing the coding coefficient to obtain the final image I.
2. The makeup transfer method based on a generative adversarial network according to claim 1, wherein in step (1), for the Makeup Transfer dataset, 100 plain-face images ori and 300 made-up images ref are selected as the test set and the rest as the training set; each read image is scaled to 256 × 256 and converted into a vector, and is normalized by the (image − mean)/variance method, where the mean and variance of each channel are both 0.5; the key points of the face are numbered: among the pixel values of the overlay mask, 7 represents the upper lip, 9 the lower lip, 1, 6 and 13 the facial skin, 4 the left eye and 5 the right eye, yielding the corresponding face regions on the mask.
3. The makeup transfer method based on a generative adversarial network according to claim 1 or 2, wherein in step (2), for the input plain-face (non-makeup) image, the makeup information info_makeup is obtained from the encoding of the reference makeup face, and the contour information info_sketch is obtained from the contour extraction module applied to the plain-face image; the two are fused, and the made-up face image is generated by decoding the facial features; the specific method comprises the following steps:
(a) The makeup image is input into 3 dimension-raising convolution modules ConvBlocks1 and 3 dimension-reducing convolution modules ConvBlocks2 to generate the target eye, skin and lip information {y_lip, y_skin, y_eyes}. ConvBlocks1 consists of a 4 × 4 convolution with stride 2 and padding 1, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation function; ConvBlocks2 consists of a 3 × 3 transposed convolution with stride 1 and padding 0, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation function;
(b) The plain-face image is input into 5 convolution layers of size 4 × 4. Each convolution layer undergoes spectral normalization to limit the intensity of change of the loss function, and every convolution layer except the last is followed by a LeakyReLU activation function, yielding a 16 × 16 image coding block;
(c) Further extract features from the contour information and fuse them with the makeup information;
(c1) The 16 × 16 image coding block of the plain-face image is input into the next-stage ConvBlock for further extraction, and a copy is concatenated (concat) into the third-stage fusion block of the fusion module. Each fusion block consists of three convolution layers and two AdaIN layers; the AdaIN layers align the mean and variance of the image features to the mean and variance of the style image, and AdaIN(·) is defined as:

AdaIN(F_j) = α · (F_j − μ(F_j)) / σ(F_j) + β   (1)

where F_j represents the input of the j-th stage fusion block, μ(·) and σ(·) denote the mean and standard deviation used to normalize the fusion-block output, and α, β are the coefficients that control the scaling and biasing of AdaIN(·);
The information extracted by the second stage is input into the next-stage ConvBlock and into the second-stage fusion block of the fusion module. This ConvBlock consists of a 3 × 3 convolution layer with padding 2, spectral normalization and a LeakyReLU activation function; finally, the output of this ConvBlock is input into the first-stage fusion block;
(c2) The style code z_i of the makeup image and the feature map of the plain-face image obtained by the makeup extraction part are input into the fusion block; the parameters output by the fusion block are encoded by the face-image encoder, and the encoded parameters Z_l are fed into multi-head self-attention (MSA) for information fusion, with a residual connection after each block:
z′_l = MSA(Z_{l−1}) + Z_{l−1},  l = 1, …, L   (2)
where L represents the total number of layers; the multi-head self-attention (MSA) consists of k parallel self-attention (SA) blocks along the channel dimension:
MSA(z) = [SA_1(z); SA_2(z); …; SA_k(z)]   (3)

SA(z) = softmax(q kᵀ / √D_k) v   (4)

[q, k, v] = z U_qkv   (5)

where SA denotes the self-attention output, z ∈ R^((N+1)×D) is the input residual sequence of width N and length D, U_qkv is the weight matrix of the linear transformation, and D_k is the expanded dimension. [q, k, v] are the Query, Key and Value obtained after the linear transformation; the similarity between the current Query and all Keys is computed, a set of weights is obtained by passing the similarity values through a Softmax layer, and the Attention value is obtained by summing the products of these weights and the corresponding Values:
the result of MSA is input to a multi-level perceptron consisting of three fully connected levels:
z_l = MLP(z′_l) + z′_l,  l = 1, …, L   (6)
where MLP denotes the fully-connected layers and z′_l denotes the input of each fully-connected layer, which is MSA(z) when l = 1;
(d) The fused information is input into 4 ResBlock modules. Each ResBlock consists of two dimension-preserving ConvBlocks and a residual edge, the difference being that the first ConvBlock uses a ReLU activation function while the second uses no activation function. The network finally generates the required made-up image.
4. The makeup transfer method based on a generative adversarial network according to claim 3, wherein in step (3), the discriminator and the generator are trained cyclically as follows:
(a) Training the discriminator: the plain-face image and the makeup image are input into the generator with the generator parameter set to makeup application; the generator outputs the plain-face image made up in the style of the makeup image, and the made-up result is input into discriminator D_y. The plain-face image and the makeup image are then input into the generator with the parameter set to makeup removal; the generator outputs the makeup image with its makeup removed in the style of the plain-face image, and this result is input into discriminator D_x. The discriminator losses are obtained from these results:

L_D^y = −E_{y∼Y}[log D_y(y)] − E_{x∼X, y∼Y}[log(1 − D_y(G(x, y)))]   (7)

L_D^x = −E_{x∼X}[log D_x(x)] − E_{x∼X, y∼Y}[log(1 − D_x(G(y, x)))]   (8)

where E_{y∼Y} indicates that the input domain is the makeup images, E_{x∼X} indicates that the input domain is the plain-face images, D_y(y) is the discriminator's judgment of a real makeup image, D_x(x) is the discriminator's judgment of a real plain-face image, G(x, y) is the face image generated when the generator parameter is set to makeup application, and G(y, x) is the face image generated when the parameter is set to makeup removal. L_D^y is the discriminator's loss between the original and generated makeup images, and L_D^x is the loss between the original and generated plain-face images; the total discriminator loss L_D = L_D^y + L_D^x is back-propagated through the gradient to optimize the discriminator;
(b) After the discriminator has been trained, the generator is trained:
(b1) The plain-face image and the makeup image are input into the generator with the parameter set to makeup application; the generator outputs the plain-face image made up in the style of the makeup image, and the made-up image is input into discriminator D_y to obtain a decision result, from which the loss L_G^y is computed. The plain-face image and the makeup image are then input into the generator with the parameter set to makeup removal; the generator outputs the makeup image with its makeup removed in the style of the plain-face image, and the de-made-up image is input into discriminator D_x to obtain a decision result, from which the loss L_G^x is computed:

L_G^y = −E_{x∼X, y∼Y}[log D_y(G(x, y))]   (9)

L_G^x = −E_{x∼X, y∼Y}[log D_x(G(y, x))]   (10)

where L_G^y denotes the loss of the makeup image generated by the generator and L_G^x denotes the loss of the plain-face image generated by the generator;
(b2) The makeup losses of the fake makeup image and the fake plain-face image output by the generator in step (b1) are computed from histogram matching:

L_makeup^y = ||G(x, y) − HM(x, y)||_2   (11)

L_makeup^x = ||G(y, x) − HM(y, x)||_2   (12)

where HM(x, y) denotes the grayscale-histogram matching result when the generator parameter is set to makeup application, HM(y, x) denotes the grayscale-histogram matching result when it is set to makeup removal, and ||·||_2 denotes the L2 norm;
(b3) The fake makeup image output by the generator in step (b1) and the original plain-face image are input into the generator with the parameter set to makeup removal; the generator outputs the fake makeup image with its makeup removed in the style of the plain-face image, and the cycle-consistency loss is obtained from this result and the original plain-face image through an L1 loss. The fake plain-face image output by the generator and the makeup image are input into the generator with the parameter set to makeup application; the generator outputs the fake plain-face image made up in the style of the makeup image, and the cycle-consistency loss is obtained from this result and the makeup image through an L1 loss. The loss function L_cyc is defined as:

L_cyc = ||G(F_PNI(G(x, y)), x) − x||_1 + ||G(F_PNI(G(y, x)), y) − y||_1   (13)

where ||·||_1 denotes the L1 norm and F_PNI(·) is a transformation module applied to the generated image, which injects random perturbation to enhance the robustness of the model, defined as follows:
F_PNI(x) = x + γ · η   (14)
where η represents a noise term sampled from a Gaussian distribution and γ represents a coefficient that controls the magnitude of η;
(b4) In order to preserve the characteristics of the original image apart from the makeup style of the cosmetic image, a local perceptual loss L_per is introduced to maintain local consistency in the regions that do not require transfer, where F_l(·) denotes the module that generates the local face regions, ||·||_1 denotes the L1 norm and ||·||_2 denotes the L2 norm;
Finally, the above loss functions are combined to obtain the final generator loss L_G:

L_G = λ_G (L_G^y + L_G^x) + λ_cyc L_cyc + λ_makeup (L_makeup^y + L_makeup^x) + λ_per L_per   (16)

where λ_G, λ_cyc, λ_makeup and λ_per are the generator control coefficient, the cycle-consistency control coefficient, the histogram-matching control coefficient and the local-consistency control coefficient, respectively.
5. The makeup transfer method based on a generative adversarial network according to claim 3, wherein the specific method of step (4) is as follows:
The contour weight ω_source of the original image and the content weight ω_ref of the cosmetic image are controlled by changing the coding coefficient γ to obtain the final image I:

I = (1 − γ) ω_ref + γ ω_source   (17).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211029533.3A CN115496650A (en) | 2022-08-25 | 2022-08-25 | Makeup migration method based on generation countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115496650A true CN115496650A (en) | 2022-12-20 |
Family
ID=84466608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211029533.3A Pending CN115496650A (en) | 2022-08-25 | 2022-08-25 | Makeup migration method based on generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115496650A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116863032A (en) * | 2023-06-27 | 2023-10-10 | 河海大学 | Flood disaster scene generation method based on generation countermeasure network |
CN116863032B (en) * | 2023-06-27 | 2024-04-09 | 河海大学 | Flood disaster scene generation method based on generation countermeasure network |
CN117036157A (en) * | 2023-10-09 | 2023-11-10 | 易方信息科技股份有限公司 | Editable simulation digital human figure design method, system, equipment and medium |
CN117036157B (en) * | 2023-10-09 | 2024-02-20 | 易方信息科技股份有限公司 | Editable simulation digital human figure design method, system, equipment and medium |
CN118014865A (en) * | 2024-04-10 | 2024-05-10 | 青岛童幻动漫有限公司 | Image fusion method for cartoon making |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||