CN115496650A - Makeup transfer method based on a generative adversarial network

Publication number: CN115496650A
Application number: CN202211029533.3A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 范国玉, 葛琦
Assignee (original and current): Nanjing University of Posts and Telecommunications
Priority and filing date: 2022-08-25
Publication date: 2022-12-20
Legal status: Pending

Classifications

    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06T5/40 - Image enhancement or restoration using histogram techniques
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T2207/20081 - Training; Learning
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G06T2207/20221 - Image fusion; Image merging
    • G06T2207/30201 - Face


Abstract

The invention provides a makeup transfer method based on a generative adversarial network, comprising the following steps: (1) for a data set, a certain number of images are divided into a training set and a test set, and each image is preprocessed by cropping and normalization; (2) the makeup image is passed through a face style generator to obtain feature components, which are fed into two convolutional layers for feature extraction; the style codes of the makeup image and the feature map of the plain-face (non-makeup) image are input to a fusion block, the parameters output by the fusion block are encoded by the face image encoder and sent to self-attention for information fusion, and a ResBlock module then produces a face image carrying the makeup style of the makeup image; (3) the discriminator distinguishes real makeup images from generated images, and the generator produces an image with the makeup style from the plain-face image and the makeup image input in step (2); (4) the contour weight of the original image and the content weight of the makeup image are controlled by changing the code to obtain the final image.

Description

Makeup transfer method based on a generative adversarial network
Technical Field
The invention provides a makeup transfer method based on a generative adversarial network and belongs to the field of image style transfer.
Background
Image style transfer, also referred to as image style conversion, is a method of converting the style of an input image into one or more specified styles; the goal is to preserve the content of the original image while rendering it in the texture or style of the reference. Image style transfer is now widely used in image-processing applications such as mobile camera filters and artistic image generation. With the spread of mobile phones and portable intelligent photographing devices, face makeup transfer has attracted attention as a core technology of portrait-processing software; commercially available beautification software, however, requires manual operation by the user and offers only a fixed number of makeup styles. Face makeup transfer applies a reference makeup to a plain (non-makeup) face while keeping the facial structure unchanged, so that the face keeps its identity and the makeup style is reproduced as faithfully as possible.
Face makeup transfer is a challenging task, and current methods fall into three main categories: traditional methods, convolutional-neural-network methods and generative-adversarial-network methods. Makeup transfer based on traditional methods computes the colour and illumination change before and after makeup, adjusts the skin texture and skin-tone difference between the plain-face image and the makeup image, and transfers the makeup to the plain-face image; these methods place high demands on the before/after pictures and have limited practicality. Convolutional-neural-network methods first select, from a makeup-face database, the picture most similar to the current plain face; a fully convolutional segmentation network then parses the face and extracts the facial-feature regions; finally, makeup transfer is completed for foundation (the face), lip gloss (the lips) and eye shadow (the eyes). Although such methods can control the makeup intensity, the overall effect is not natural. With the continued development of generative adversarial networks, GAN-based face makeup transfer, owing to its ability to generate visually realistic images, significantly improves the transfer result compared with traditional and convolutional-neural-network methods, and has become a research hotspot in the field of face makeup transfer.
Makeup transfer based on traditional methods often lacks sufficient training data and shows poor transferability in experiments, while deep-learning-based methods often struggle to reproduce makeup details and to transfer correctly under extreme poses and conditions. Problems such as unnecessary changes of background and hair colour, weak transfer between images with large differences, failure to recognise some facial regions, insensitivity to some cosmetics such as blush, and rough edges in the transfer result hinder the application of makeup transfer in everyday life. The invention therefore takes the advantages of the above approaches into account and designs a highly flexible, fully automatic makeup transfer method that is convenient to use and to deploy in enterprise software.
Disclosure of Invention
The present invention aims to overcome the deficiencies of the prior art. Based on a generative adversarial network, it achieves global or local makeup transfer and makeup removal with controllable shade simply by editing the style code, without additional computational effort. The general steps are as follows: first, a UV map is designed to extract two-dimensional plane information from the made-up reference image, thereby determining the makeup components; second, a face contour encoder is designed to encode, in a high-dimensional space, the information of the original face that must be preserved, such as head pose, facial expression, illumination and occlusion; finally, an attention mechanism mixes the extracted makeup components with the preserved encoded information, thereby generating the required makeup-transferred image.
To achieve the above object, the present invention provides a makeup transfer method based on a generative adversarial network, comprising the following steps:
(1) For the Makeup Transfer dataset, a certain number of images are divided into a training set and a test set, each image is cropped and normalized, the facial key points are numbered, and the corresponding region images are obtained from the parsing mask;
(2) The makeup image is passed through a face style generator to obtain a feature component y_i, i ∈ {lip, skin, eye}, for each makeup region (lips, skin and eyes); each component is fed into two convolutional layers for feature extraction, and after pooling and convolution each y_i is mapped to a per-region style code z_i; the plain-face (non-makeup) image is passed through five 4×4 convolutional layers with spectral normalization to obtain a feature map; the style codes z_i of the makeup image and the plain-face feature map are input to the fusion blocks, the parameters output by the fusion blocks are encoded by the face image encoder and sent to multi-head self-attention (MSA) for information fusion, and the fused information passes through four ResBlock modules to obtain a face image carrying the makeup style of the makeup image;
(3) For the designed generator and encoder, parameter learning uses a cyclic training scheme; the discriminator distinguishes a real makeup image from an image produced by the generator, and the generator takes the plain-face image and the makeup image as input and outputs a face image with the makeup style of the makeup image;
(4) When only part of the makeup style is required, the contour weight ω_source of the original image and the content weight ω_ref of the makeup image are controlled by changing the code, giving the final image I.
Further, in step (1), for the Makeup Transfer dataset, 100 plain-face images (ori) and 300 makeup images (ref) are selected as the test set and the remaining images form the training set. Each image read is scaled to 256×256 and converted to a tensor, and normalized as (image - mean)/variance, where the mean and variance of each channel are both 0.5. The facial key points are numbered, and in the face-parsing mask the pixel value 7 denotes the upper lip, 9 the lower lip, 1, 6 and 13 the facial skin, 4 the left eye and 5 the right eye, from which the corresponding face regions on the mask are obtained.
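A minimal preprocessing sketch of step (1) is given below, assuming PyTorch and Pillow; the function name, the file-based interface and the mask file format are illustrative assumptions, and only the 256×256 scaling, the (image - 0.5)/0.5 normalization and the region label values come from the text.

```python
import numpy as np
import torch
from PIL import Image

# Face-parsing label values listed in step (1).
REGIONS = {"upper_lip": [7], "lower_lip": [9], "skin": [1, 6, 13],
           "left_eye": [4], "right_eye": [5]}

def preprocess(image_path, mask_path, size=256):
    """Scale to 256x256, normalize with per-channel mean/variance 0.5, build region masks."""
    img = Image.open(image_path).convert("RGB").resize((size, size), Image.BILINEAR)
    x = torch.from_numpy(np.asarray(img).copy()).float().permute(2, 0, 1) / 255.0
    x = (x - 0.5) / 0.5                      # (image - mean) / variance, mean = variance = 0.5

    mask = Image.open(mask_path).resize((size, size), Image.NEAREST)
    m = torch.from_numpy(np.asarray(mask).copy()).long()
    region_masks = {name: torch.isin(m, torch.tensor(vals)).float()
                    for name, vals in REGIONS.items()}
    return x, region_masks
```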
Further, in step (2), the makeup information info_makeup is obtained by encoding the reference makeup face, and the contour information info_sketch is obtained from the contour extraction module applied to the input plain-face image; the two are fused and decoded into facial features to generate the required made-up face image. The specific method is as follows:
(a) The makeup image is fed into 3 dimension-raising convolution modules ConvBlocks1 and 3 dimension-reducing convolution modules ConvBlocks2 to generate the target eye, skin and lip information {y_lip, y_skin, y_eyes}. ConvBlocks1 consists of a 4×4 convolution with stride 2 and padding 1, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation; ConvBlocks2 consists of a 3×3 transposed convolution with stride 1 and padding 0, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation;
(b) The plain-face image is fed into five 4×4 convolutional layers; each layer is spectrally normalized to limit how sharply the loss function can change, and every layer except the last is followed by a LeakyReLU activation, yielding a 16×16 image coding block;
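A sketch of the plain-face encoder in (b), assuming PyTorch: five 4×4 convolutions wrapped in spectral normalization, with LeakyReLU after every layer but the last. The channel widths, the strides and the asymmetric padding of the final layer (used here so a 4×4 kernel keeps the 16×16 map) are assumptions; the text fixes only the kernel size, layer count, normalization, activation and output size.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def plain_face_encoder(in_ch: int = 3, widths=(64, 128, 256, 512, 512)):
    layers, prev = [], in_ch
    for i, w in enumerate(widths):
        if i < 4:   # four stride-2 layers: 256 -> 128 -> 64 -> 32 -> 16
            layers.append(spectral_norm(nn.Conv2d(prev, w, 4, stride=2, padding=1)))
        else:       # last layer keeps 16x16 (asymmetric pad so the 4x4 kernel fits exactly)
            layers.append(nn.ZeroPad2d((1, 2, 1, 2)))
            layers.append(spectral_norm(nn.Conv2d(prev, w, 4, stride=1, padding=0)))
        if i < len(widths) - 1:              # no activation after the last layer
            layers.append(nn.LeakyReLU(0.2, inplace=True))
        prev = w
    return nn.Sequential(*layers)
```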
(c) The contour information is further processed and fused with the makeup information;
(c1) The 16×16 image coding block of the plain-face image is fed into the next-stage ConvBlock for further extraction, and a copy is concatenated into the third-stage fusion block of the fusion module. Each fusion block consists of three convolutional layers and two AdaIN layers; the AdaIN layers align the mean and variance of the image features with those of the style image, AdaIN(·) being defined as:

AdaIN(F_j) = α · (F_j - μ(F_j)) / σ(F_j) + β    (1)

where F_j denotes the input of the j-th-stage fusion block, μ(·) and σ(·) denote the mean and standard deviation used for normalization, and α and β are the coefficients controlling the scaling and bias of AdaIN(·);
The information extracted at the second stage is fed into the next-stage ConvBlock and into the second-stage fusion block of the fusion module; this ConvBlock consists of a 3×3 convolutional layer with padding 2, spectral normalization and a LeakyReLU activation, and the output of the last ConvBlock is finally fed into the first-stage fusion block;
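A sketch of the AdaIN alignment of Eq. (1), assuming (N, C, H, W) tensors; in the fusion block the coefficients α and β would be predicted from the makeup (style) features, which is how the content statistics are aligned to the style statistics.

```python
import torch

def adain(content, alpha, beta, eps=1e-5):
    """AdaIN(F_j) = alpha * (F_j - mu(F_j)) / sigma(F_j) + beta, statistics per channel over H, W."""
    mu = content.mean(dim=(2, 3), keepdim=True)
    sigma = content.std(dim=(2, 3), keepdim=True) + eps
    return alpha * (content - mu) / sigma + beta
```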
(c2) The style codes z_i of the makeup image and the plain-face image feature map obtained by the makeup extraction part are input to the fusion block, the parameters are encoded by the face image encoder, and the encoded parameters Z_l are fed into multi-head self-attention (MSA) for information fusion, with a residual connection after each block:

z′_l = MSA(Z_{l-1}) + Z_{l-1},  l = 1, …, L    (2)

where L denotes the total number of layers; the multi-head self-attention is composed of k parallel self-attention (SA) heads concatenated along the channel dimension:

MSA(z) = [SA_1(z); SA_2(z); …; SA_k(z)]    (3)

SA(z) = softmax(q·k^T / √D_k) · v    (4)

[q, k, v] = z·U_qkv    (5)

where SA denotes the attention output, z ∈ R^{(N+1)×D} is the input residual sequence of width N and length D, U_qkv is the weight matrix of the linear transformation and D_k is the expanded dimension; [q, k, v] are the Query, Key and Value obtained by the linear transformation, the similarity between the current Query and all Keys is computed, the similarity values are passed through a Softmax layer to obtain a set of weights, and the attention value is the weighted sum of these weights with the corresponding Values;

The result of the MSA is fed into a multi-layer perceptron consisting of three fully connected layers:

z_l = MLP(z′_l) + z′_l,  l = 1, …, L    (6)

where MLP denotes the fully connected layers and z′_l denotes the input of each fully connected layer, which is MSA(z) when l = 1;
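A sketch of the attention-based fusion of Eqs. (2)-(6), assuming PyTorch's nn.MultiheadAttention as the MSA operator; the embedding size, head count, hidden width and the GELU activation in the three-layer MLP are assumptions not fixed by the text.

```python
import torch.nn as nn

class FusionAttentionBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8, hidden: int = 512):
        super().__init__()
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(                 # "three fully connected layers"
            nn.Linear(dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, z):                         # z: (batch, N+1, D)
        attn, _ = self.msa(z, z, z)               # Eqs. (3)-(5): q, k, v derived from z
        z = attn + z                              # Eq. (2): residual connection after MSA
        return self.mlp(z) + z                    # Eq. (6): residual connection after MLP
```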
(d) The mixed information is fed into four ResBlock modules; each ResBlock consists of two dimension-preserving ConvBlocks and a residual edge, the first ConvBlock using a ReLU activation and the second using no activation, so that the network finally generates the required made-up image.
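A sketch of one of the four ResBlock modules in (d): two dimension-preserving ConvBlocks plus a residual edge, the first ending in ReLU and the second with no activation. The 3×3 kernel and the instance normalization inside each ConvBlock are assumptions.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                    nn.InstanceNorm2d(ch), nn.ReLU(inplace=True))
        self.block2 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                    nn.InstanceNorm2d(ch))    # second ConvBlock: no activation

    def forward(self, x):
        return x + self.block2(self.block1(x))                # residual edge
```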
Further, in step (3), the discriminator and the generator are trained cyclically, as follows:
(a) Training the discriminators: the plain-face image and the makeup image are input to the generator with the generator parameter set to apply makeup, the generator outputs the plain-face image made up in the style of the makeup image, and this result is input to the discriminator D_y; the plain-face image and the makeup image are also input to the generator with the parameter set to remove makeup, the generator outputs the makeup image with the makeup removed in the style of the plain-face image, and this result is input to the discriminator D_x; from these results the discriminator losses are obtained:

L_D^y = -E_{y~P_Y}[log D_y(y)] - E_{x~P_X, y~P_Y}[log(1 - D_y(G(x, y)))]    (7)

L_D^x = -E_{x~P_X}[log D_x(x)] - E_{x~P_X, y~P_Y}[log(1 - D_x(G(y, x)))]    (8)

where y~P_Y indicates that the input is drawn from the makeup-image domain, x~P_X indicates that the input is drawn from the plain-face-image domain, D_y(y) is the discrimination result for a real makeup image, D_x(x) is the discrimination result for a real plain-face image, G(x, y) is the face image generated when the generator parameter is set to apply makeup, G(y, x) is the face image generated when the parameter is set to remove makeup, L_D^y is the loss between the real makeup image and the generated makeup image as judged by the discriminator, and L_D^x is the loss between the real plain-face image and the generated plain-face image as judged by the discriminator; the total discriminator loss L_D = L_D^y + L_D^x is back-propagated through the gradients to optimize the discriminators;
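One discriminator update corresponding to step (a) and Eqs. (7)-(8), sketched under assumptions: the generator signature G(source, reference, mode=...), the discriminator classes, the optimizer and the binary-cross-entropy form of the adversarial loss are illustrative, not taken verbatim from the patent.

```python
import torch
import torch.nn.functional as F

def discriminator_step(G, D_x, D_y, opt_D, x_plain, y_makeup):
    with torch.no_grad():
        fake_makeup = G(x_plain, y_makeup, mode="makeup")      # G(x, y)
        fake_plain  = G(y_makeup, x_plain, mode="removal")     # G(y, x)

    def adv_loss(D, real, fake):
        real_logit, fake_logit = D(real), D(fake)
        return (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
                + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))

    loss_D = adv_loss(D_y, y_makeup, fake_makeup) + adv_loss(D_x, x_plain, fake_plain)
    opt_D.zero_grad()
    loss_D.backward()          # back-propagate the total discriminator loss
    opt_D.step()
    return loss_D.item()
```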
(b) After training the discriminators, the generator is trained:
(b1) The plain-face image and the makeup image are input to the generator with the parameter set to apply makeup; the output, the plain-face image made up in the style of the makeup image, is input to the discriminator D_y to obtain a judgement, from which the loss L_G^y is computed. The plain-face image and the makeup image are input to the generator with the parameter set to remove makeup; the output, the makeup image with the makeup removed in the style of the plain-face image, is input to the discriminator D_x to obtain a judgement, from which the loss L_G^x is computed:

L_G^y = -E_{x~P_X, y~P_Y}[log D_y(G(x, y))]    (9)

L_G^x = -E_{x~P_X, y~P_Y}[log D_x(G(y, x))]    (10)

where L_G^y denotes the adversarial loss of the makeup image generated by the generator and L_G^x the adversarial loss of the plain-face image generated by the generator;
(b2) For the fake makeup image and the fake plain-face image output by the generator in step (b1), histogram-matching losses are computed:

L_HM^y = || G(x, y) - HM(x, y) ||_2    (11)

L_HM^x = || G(y, x) - HM(y, x) ||_2    (12)

where HM(x, y) denotes the grey-level histogram-matching result when the generator parameter is set to apply makeup, HM(y, x) the grey-level histogram-matching result when the parameter is set to remove makeup, and ||·||_2 denotes the L2 norm;
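A sketch of the histogram-matching loss of step (b2), in the spirit of Eqs. (11)-(12); skimage.exposure.match_histograms (scikit-image >= 0.19) is used here as the HM(·,·) operator and the inputs are single (C, H, W) images, both of which are assumptions. Applying the same loss per facial region with the parsing masks from step (1) would follow the same pattern.

```python
import torch
from skimage.exposure import match_histograms

def histogram_matching_loss(fake: torch.Tensor, source: torch.Tensor,
                            reference: torch.Tensor) -> torch.Tensor:
    """L2 distance between the generated image and the histogram-matched target HM(source, reference)."""
    src = source.detach().permute(1, 2, 0).cpu().numpy()        # (H, W, C)
    ref = reference.detach().permute(1, 2, 0).cpu().numpy()
    matched = match_histograms(src, ref, channel_axis=-1)        # HM(x, y)
    target = torch.from_numpy(matched).permute(2, 0, 1).to(fake)
    return torch.norm(fake - target, p=2)
```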
(b3) The fake makeup image output in step (b1) and the original plain-face image are input to the generator with the parameter set to remove makeup; the output is the fake makeup image with its makeup removed in the style of the plain-face image, and the L1 loss between this result and the original plain-face image gives the cycle-consistency loss L_cyc^x. The fake plain-face image output by the generator and the makeup image are input to the generator with the parameter set to apply makeup; the output is the fake plain-face image made up in the style of the makeup image, and the L1 loss between this result and the makeup image gives the cycle-consistency loss L_cyc^y. The loss functions L_cyc^x and L_cyc^y are defined as:

L_cyc^x = || F_PNI(G(G(x, y), x)) - x ||_1    (13a)

L_cyc^y = || F_PNI(G(G(y, x), y)) - y ||_1    (13b)

where ||·||_1 denotes the L1 norm and F_PNI(·) is a transformation module applied to the generated image that injects random perturbations to improve the robustness of the model, defined as:

F_PNI(x) = x + γ·η    (14)

where η is a noise term sampled from a Gaussian distribution and γ is a coefficient controlling the magnitude of η;
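A sketch of the cycle-consistency terms with Gaussian perturbation injection, Eqs. (13)-(14); the generator call signature and the value of γ are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pni(x: torch.Tensor, gamma: float = 0.05) -> torch.Tensor:
    """F_PNI(x) = x + gamma * eta, with eta ~ N(0, 1)."""
    return x + gamma * torch.randn_like(x)

def cycle_losses(G, x_plain, y_makeup, gamma=0.05):
    fake_makeup = G(x_plain, y_makeup, mode="makeup")           # G(x, y)
    fake_plain  = G(y_makeup, x_plain, mode="removal")          # G(y, x)
    rec_plain   = G(fake_makeup, x_plain, mode="removal")       # G(G(x, y), x)
    rec_makeup  = G(fake_plain, y_makeup, mode="makeup")        # G(G(y, x), y)
    loss_cyc_x = F.l1_loss(pni(rec_plain, gamma), x_plain)      # Eq. (13a)
    loss_cyc_y = F.l1_loss(pni(rec_makeup, gamma), y_makeup)    # Eq. (13b)
    return loss_cyc_x, loss_cyc_y
```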
(b4) To preserve the characteristics of the original image while transferring the look of the makeup image, a local perceptual loss L_per is introduced to maintain local consistency in the regions that should not change; it penalizes, with the L1 and L2 norms ||·||_1 and ||·||_2, the distance between the outputs of the face local generation module F_l(·) on the generated image and on the original image (Eq. (15));
Finally, the above losses are combined to obtain the final generator loss L_G:

L_G = λ_G(L_G^x + L_G^y) + λ_cyc(L_cyc^x + L_cyc^y) + λ_makeup(L_HM^x + L_HM^y) + λ_per·L_per    (16)

where λ_G, λ_cyc, λ_makeup and λ_per are the generator, cycle-consistency, histogram-matching and local-consistency control coefficients, respectively.
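Combining the generator losses as in Eq. (16); the individual terms are the ones sketched above, and the λ values shown are illustrative defaults rather than the patent's settings.

```python
def generator_loss(terms, lam_G=1.0, lam_cyc=10.0, lam_makeup=1.0, lam_per=0.5):
    """Weighted sum of Eq. (16): adversarial, cycle, histogram-matching and local-consistency terms."""
    return (lam_G * (terms["adv_x"] + terms["adv_y"])
            + lam_cyc * (terms["cyc_x"] + terms["cyc_y"])
            + lam_makeup * (terms["hm_x"] + terms["hm_y"])
            + lam_per * terms["per"])
```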
Further, in step (4), the specific method is as follows: the contour weight ω_source of the original image and the content weight ω_ref of the makeup image are controlled by changing the coefficient γ, giving the final image I:

I = (1 - γ)·ω_ref + γ·ω_source    (17).
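A sketch of the controllable blend of Eq. (17): γ trades the contour weight of the original image against the content weight of the makeup image. Restricting the blend to a single region with a parsing mask is an assumption suggested by step (1), not required by Eq. (17).

```python
from typing import Optional
import torch

def blend(omega_ref: torch.Tensor, omega_source: torch.Tensor, gamma: float,
          mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """I = (1 - gamma) * omega_ref + gamma * omega_source (Eq. (17))."""
    out = (1.0 - gamma) * omega_ref + gamma * omega_source
    if mask is not None:                     # optional: apply the blend only inside one region
        out = mask * out + (1.0 - mask) * omega_source
    return out
```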
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
The invention discloses a makeup transfer method based on a generative adversarial network that offers great flexibility, supporting makeup removal, full makeup transfer and partial, region-specific makeup transfer. Spectral normalization is added to the face-contour network so that the discriminator satisfies the 1-Lipschitz condition; this limits how sharply the function can change, makes the parameters more stable during optimization of the neural network and makes gradient explosion less likely. Gaussian noise is injected while training the generator so that the generated image is finer and smoother, and the introduced local consistency loss lets the generated image carry the makeup style of the reference image while preserving the characteristics of the original image.
Drawings
FIG. 1 is a schematic diagram of the method provided in an embodiment of the invention;
FIG. 2 is a schematic diagram of the residual module provided in an embodiment of the invention;
FIG. 3 is a schematic diagram of the fusion module provided in an embodiment of the invention;
FIG. 4 shows makeup transfer results on the test data set in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only intended to illustrate the technical solutions of the invention more clearly and should not be taken as limiting its scope.
The invention provides a makeup transfer method based on a generative adversarial network, which comprises the following steps:
(1) For the Makeup Transfer dataset, a certain number of images are divided into a training set and a test set, each image is cropped and normalized, the facial key points are numbered, and the corresponding region images are obtained from the parsing mask;
(2) The makeup image is passed through a face style generator to obtain a feature component y_i, i ∈ {lip, skin, eye}, for each makeup region (lips, skin and eyes); each component is fed into two convolutional layers for feature extraction, and after pooling and convolution each y_i is mapped to a per-region style code z_i; the plain-face image is passed through five 4×4 convolutional layers with spectral normalization to obtain a feature map; the style codes z_i of the makeup image and the plain-face feature map are input to the fusion blocks, the parameters output by the fusion blocks are encoded by the face image encoder and sent to multi-head self-attention (MSA) for information fusion, and the fused information passes through four ResBlock modules to obtain a face image carrying the makeup style of the makeup image;
(3) For the designed generator and encoder, parameter learning uses a cyclic training scheme; the discriminator distinguishes a real makeup image from an image produced by the generator, and the generator takes the plain-face image and the makeup image as input and outputs a face image with the makeup style of the makeup image;
(4) When only part of the makeup style is required, the contour weight ω_source of the original image and the content weight ω_ref of the makeup image are controlled by changing the code, giving the final image I.
In step (1), for the Makeup Transfer dataset, 100 plain-face images (ori) and 300 makeup images (ref) are selected as the test set and the remaining images form the training set. Each image read is scaled to 256×256 and converted to a tensor, and normalized as (image - mean)/variance, where the mean and variance of each channel are both 0.5. The facial key points are numbered, and in the face-parsing mask the pixel value 7 denotes the upper lip, 9 the lower lip, 1, 6 and 13 the facial skin, 4 the left eye and 5 the right eye, from which the corresponding face regions on the mask are obtained.
In step (2), the makeup information info_makeup is obtained by encoding the reference makeup face, and the contour information info_sketch is obtained from the contour extraction module applied to the input plain-face image; the two are fused and decoded into facial features to generate the required made-up face image, as follows:
(a) The makeup image is fed into 3 dimension-raising convolution modules ConvBlocks1 and 3 dimension-reducing convolution modules ConvBlocks2 to generate the target eye, skin and lip information {y_lip, y_skin, y_eyes}. ConvBlocks1 consists of a 4×4 convolution with stride 2 and padding 1, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation; ConvBlocks2 consists of a 3×3 transposed convolution with stride 1 and padding 0, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation;
(b) The plain-face image is fed into five 4×4 convolutional layers; each layer is spectrally normalized to limit how sharply the loss function can change, and every layer except the last is followed by a LeakyReLU activation, yielding a 16×16 image coding block;
(c) The contour information is further processed and fused with the makeup information;
(c1) The 16×16 image coding block of the plain-face image is fed into the next-stage ConvBlock for further extraction, and a copy is concatenated into the third-stage fusion block of the fusion module. Each fusion block consists of three convolutional layers and two AdaIN layers; the AdaIN layers align the mean and variance of the image features with those of the style image, AdaIN(·) being defined as:

AdaIN(F_j) = α · (F_j - μ(F_j)) / σ(F_j) + β    (1)

where F_j denotes the input of the j-th-stage fusion block, μ(·) and σ(·) denote the mean and standard deviation used for normalization, and α and β are the coefficients controlling the scaling and bias of AdaIN(·);
The information extracted at the second stage is fed into the next-stage ConvBlock and into the second-stage fusion block of the fusion module; this ConvBlock consists of a 3×3 convolutional layer with padding 2, spectral normalization and a LeakyReLU activation, and the output of the last ConvBlock is finally fed into the first-stage fusion block;
(c2) The style codes z_i of the makeup image and the plain-face image feature map obtained by the makeup extraction part are input to the fusion block, the parameters are encoded by the face image encoder, and the encoded parameters Z_l are fed into multi-head self-attention (MSA) for information fusion, with a residual connection after each block:

z′_l = MSA(Z_{l-1}) + Z_{l-1},  l = 1, …, L    (2)

where L denotes the total number of layers; the multi-head self-attention is composed of k parallel self-attention (SA) heads concatenated along the channel dimension:

MSA(z) = [SA_1(z); SA_2(z); …; SA_k(z)]    (3)

SA(z) = softmax(q·k^T / √D_k) · v    (4)

[q, k, v] = z·U_qkv    (5)

where SA denotes the attention output, z ∈ R^{(N+1)×D} is the input residual sequence of width N and length D, U_qkv is the weight matrix of the linear transformation and D_k is the expanded dimension; [q, k, v] are the Query, Key and Value obtained by the linear transformation, the similarity between the current Query and all Keys is computed, the similarity values are passed through a Softmax layer to obtain a set of weights, and the attention value is the weighted sum of these weights with the corresponding Values;
The result of the MSA is fed into a multi-layer perceptron consisting of three fully connected layers:

z_l = MLP(z′_l) + z′_l,  l = 1, …, L    (6)

where MLP denotes the fully connected layers and z′_l denotes the input of each fully connected layer, which is MSA(z) when l = 1;
(d) The mixed information is fed into four ResBlock modules; each ResBlock consists of two dimension-preserving ConvBlocks and a residual edge, the first ConvBlock using a ReLU activation and the second using no activation, so that the network finally generates the required made-up image.
In step (3), the discriminator and the generator are trained cyclically, as follows:
(a) Training the discriminators: the plain-face image and the makeup image are input to the generator with the generator parameter set to apply makeup, the generator outputs the plain-face image made up in the style of the makeup image, and this result is input to the discriminator D_y; the plain-face image and the makeup image are also input to the generator with the parameter set to remove makeup, the generator outputs the makeup image with the makeup removed in the style of the plain-face image, and this result is input to the discriminator D_x; from these results the discriminator losses are obtained:

L_D^y = -E_{y~P_Y}[log D_y(y)] - E_{x~P_X, y~P_Y}[log(1 - D_y(G(x, y)))]    (7)

L_D^x = -E_{x~P_X}[log D_x(x)] - E_{x~P_X, y~P_Y}[log(1 - D_x(G(y, x)))]    (8)

where y~P_Y indicates that the input is drawn from the makeup-image domain, x~P_X indicates that the input is drawn from the plain-face-image domain, D_y(y) is the discrimination result for a real makeup image, D_x(x) is the discrimination result for a real plain-face image, G(x, y) is the face image generated when the generator parameter is set to apply makeup, G(y, x) is the face image generated when the parameter is set to remove makeup, L_D^y is the loss between the real makeup image and the generated makeup image as judged by the discriminator, and L_D^x is the loss between the real plain-face image and the generated plain-face image as judged by the discriminator; the total discriminator loss L_D = L_D^y + L_D^x is back-propagated through the gradients to optimize the discriminators;
(b) After training the discriminators, the generator is trained:
(b1) The plain-face image and the makeup image are input to the generator with the parameter set to apply makeup; the output, the plain-face image made up in the style of the makeup image, is input to the discriminator D_y to obtain a judgement, from which the loss L_G^y is computed. The plain-face image and the makeup image are input to the generator with the parameter set to remove makeup; the output, the makeup image with the makeup removed in the style of the plain-face image, is input to the discriminator D_x to obtain a judgement, from which the loss L_G^x is computed:

L_G^y = -E_{x~P_X, y~P_Y}[log D_y(G(x, y))]    (9)

L_G^x = -E_{x~P_X, y~P_Y}[log D_x(G(y, x))]    (10)

where L_G^y denotes the adversarial loss of the makeup image generated by the generator and L_G^x the adversarial loss of the plain-face image generated by the generator;
(b2) For the fake makeup image and the fake plain-face image output by the generator in step (b1), histogram-matching losses are computed:

L_HM^y = || G(x, y) - HM(x, y) ||_2    (11)

L_HM^x = || G(y, x) - HM(y, x) ||_2    (12)

where HM(x, y) denotes the grey-level histogram-matching result when the generator parameter is set to apply makeup, HM(y, x) the grey-level histogram-matching result when the parameter is set to remove makeup, and ||·||_2 denotes the L2 norm;
(b3) The fake makeup image output in step (b1) and the original plain-face image are input to the generator with the parameter set to remove makeup; the output is the fake makeup image with its makeup removed in the style of the plain-face image, and the L1 loss between this result and the original plain-face image gives the cycle-consistency loss L_cyc^x. The fake plain-face image output by the generator and the makeup image are input to the generator with the parameter set to apply makeup; the output is the fake plain-face image made up in the style of the makeup image, and the L1 loss between this result and the makeup image gives the cycle-consistency loss L_cyc^y. The loss functions L_cyc^x and L_cyc^y are defined as:

L_cyc^x = || F_PNI(G(G(x, y), x)) - x ||_1    (13a)

L_cyc^y = || F_PNI(G(G(y, x), y)) - y ||_1    (13b)

where ||·||_1 denotes the L1 norm and F_PNI(·) is a transformation module applied to the generated image that injects random perturbations to improve the robustness of the model, defined as:

F_PNI(x) = x + γ·η    (14)

where η is a noise term sampled from a Gaussian distribution and γ is a coefficient controlling the magnitude of η;
(b4) To preserve the characteristics of the original image while transferring the look of the makeup image, a local perceptual loss L_per is introduced to maintain local consistency in the regions that should not change; it penalizes, with the L1 and L2 norms ||·||_1 and ||·||_2, the distance between the outputs of the face local generation module F_l(·) on the generated image and on the original image (Eq. (15));
Finally, the above losses are combined to obtain the final generator loss L_G:

L_G = λ_G(L_G^x + L_G^y) + λ_cyc(L_cyc^x + L_cyc^y) + λ_makeup(L_HM^x + L_HM^y) + λ_per·L_per    (16)

where λ_G, λ_cyc, λ_makeup and λ_per are the generator, cycle-consistency, histogram-matching and local-consistency control coefficients, respectively.
In step (4), the specific method is as follows: the contour weight ω_source of the original image and the content weight ω_ref of the makeup image are controlled by changing the coefficient γ, giving the final image I:

I = (1 - γ)·ω_ref + γ·ω_source    (17).
The above description is only a preferred embodiment of the present invention. It should be noted that a person skilled in the art can make several modifications and variations without departing from the technical principle of the invention, and such modifications and variations should also be regarded as falling within the scope of protection of the invention.

Claims (5)

1. A makeup transfer method based on a generative adversarial network, characterized in that it comprises the following steps:
(1) for the Makeup Transfer dataset, dividing a certain number of images into a training set and a test set, cropping and normalizing each image, numbering the facial key points, and obtaining the corresponding region images from the parsing mask;
(2) passing the makeup image through a face style generator to obtain a feature component y_i, i ∈ {lip, skin, eye}, for each makeup region (lips, skin and eyes); feeding each component into two convolutional layers for feature extraction and, after pooling and convolution, mapping each y_i to a per-region style code z_i; passing the plain-face (non-makeup) image through five 4×4 convolutional layers with spectral normalization to obtain a feature map; inputting the style codes z_i of the makeup image and the plain-face feature map to the fusion blocks, encoding the parameters output by the fusion blocks with the face image encoder, sending the encoded parameters to multi-head self-attention (MSA) for information fusion, and passing the fused information through four ResBlock modules to obtain a face image carrying the makeup style of the makeup image;
(3) for the designed generator and encoder, performing parameter learning with a cyclic training scheme, the discriminator distinguishing a real makeup image from an image produced by the generator, the generator taking the plain-face image and the makeup image as input and outputting a face image with the makeup style of the makeup image;
(4) when only part of the makeup style is required, controlling the contour weight ω_source of the original image and the content weight ω_ref of the makeup image by changing the code, to obtain the final image I.

2. The makeup transfer method based on a generative adversarial network according to claim 1, characterized in that, in step (1), for the Makeup Transfer dataset, 100 plain-face images (ori) and 300 makeup images (ref) are selected as the test set and the remaining images form the training set; each image read is scaled to 256×256 and converted to a tensor, and normalized as (image - mean)/variance, where the mean and variance of each channel are both 0.5; the facial key points are numbered, and in the face-parsing mask the pixel value 7 denotes the upper lip, 9 the lower lip, 1, 6 and 13 the facial skin, 4 the left eye and 5 the right eye, from which the corresponding face regions on the mask are obtained.

3. The makeup transfer method based on a generative adversarial network according to claim 1 or 2, characterized in that, in step (2), the makeup information info_makeup is obtained by encoding the reference makeup face, the contour information info_sketch is obtained from the contour extraction module applied to the plain-face image, and the two are fused and decoded into facial features to generate the made-up face image, as follows:
(a) the makeup image is fed into 3 dimension-raising convolution modules ConvBlocks1 and 3 dimension-reducing convolution modules ConvBlocks2 to generate the target eye, skin and lip information {y_lip, y_skin, y_eyes}; ConvBlocks1 consists of a 4×4 convolution with stride 2 and padding 1, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation, and ConvBlocks2 consists of a 3×3 transposed convolution with stride 1 and padding 0, AdaptiveInstanceNorm2d normalization and a LeakyReLU activation;
(b) the plain-face image is fed into five 4×4 convolutional layers, each spectrally normalized to limit how sharply the loss function can change, every layer except the last being followed by a LeakyReLU activation, yielding a 16×16 image coding block;
(c) the contour information is further processed and fused with the makeup information;
(c1) the 16×16 image coding block of the plain-face image is fed into the next-stage ConvBlock for further extraction, and a copy is concatenated into the third-stage fusion block of the fusion module; each fusion block consists of three convolutional layers and two AdaIN layers, the AdaIN layers aligning the mean and variance of the image features with those of the style image, AdaIN(·) being defined as:

AdaIN(F_j) = α · (F_j - μ(F_j)) / σ(F_j) + β    (1)

where F_j denotes the input of the j-th-stage fusion block, μ(·) and σ(·) denote the mean and standard deviation used for normalization, and α and β are the coefficients controlling the scaling and bias of AdaIN(·);
the information extracted at the second stage is fed into the next-stage ConvBlock and into the second-stage fusion block of the fusion module, this ConvBlock consisting of a 3×3 convolutional layer with padding 2, spectral normalization and a LeakyReLU activation, and the output of the last ConvBlock is finally fed into the first-stage fusion block;
(c2) the style codes z_i of the makeup image and the plain-face image feature map obtained by the makeup extraction part are input to the fusion block, the parameters are encoded by the face image encoder, and the encoded parameters Z_l are fed into multi-head self-attention (MSA) for information fusion, with a residual connection after each block:

z′_l = MSA(Z_{l-1}) + Z_{l-1},  l = 1, …, L    (2)

where L denotes the total number of layers, the multi-head self-attention being composed of k parallel self-attention (SA) heads concatenated along the channel dimension:

MSA(z) = [SA_1(z); SA_2(z); …; SA_k(z)]    (3)

SA(z) = softmax(q·k^T / √D_k) · v    (4)

[q, k, v] = z·U_qkv    (5)

where SA denotes the attention output, z ∈ R^{(N+1)×D} is the input residual sequence of width N and length D, U_qkv is the weight matrix of the linear transformation and D_k is the expanded dimension; [q, k, v] are the Query, Key and Value obtained by the linear transformation, the similarity between the current Query and all Keys is computed, the similarity values are passed through a Softmax layer to obtain a set of weights, and the attention value is the weighted sum of these weights with the corresponding Values;
the result of the MSA is fed into a multi-layer perceptron consisting of three fully connected layers:

z_l = MLP(z′_l) + z′_l,  l = 1, …, L    (6)

where MLP denotes the fully connected layers and z′_l denotes the input of each fully connected layer, which is MSA(z) when l = 1;
(d) the mixed information is fed into four ResBlock modules, each ResBlock consisting of two dimension-preserving ConvBlocks and a residual edge, the first ConvBlock using a ReLU activation and the second using no activation, so that the network finally generates the required made-up image.

4. The makeup transfer method based on a generative adversarial network according to claim 3, characterized in that, in step (3), the discriminator and the generator are trained cyclically as follows:
(a) training the discriminators: the plain-face image and the makeup image are input to the generator with the generator parameter set to apply makeup, the generator outputs the plain-face image made up in the style of the makeup image, and this result is input to the discriminator D_y; the plain-face image and the makeup image are also input to the generator with the parameter set to remove makeup, the generator outputs the makeup image with the makeup removed in the style of the plain-face image, and this result is input to the discriminator D_x; from these results the discriminator losses are obtained:

L_D^y = -E_{y~P_Y}[log D_y(y)] - E_{x~P_X, y~P_Y}[log(1 - D_y(G(x, y)))]    (7)

L_D^x = -E_{x~P_X}[log D_x(x)] - E_{x~P_X, y~P_Y}[log(1 - D_x(G(y, x)))]    (8)

where y~P_Y indicates that the input is drawn from the makeup-image domain, x~P_X indicates that the input is drawn from the plain-face-image domain, D_y(y) is the discrimination result for a real makeup image, D_x(x) is the discrimination result for a real plain-face image, G(x, y) is the face image generated when the generator parameter is set to apply makeup, G(y, x) is the face image generated when the parameter is set to remove makeup, L_D^y is the loss between the real makeup image and the generated makeup image as judged by the discriminator, and L_D^x is the loss between the real plain-face image and the generated plain-face image as judged by the discriminator; the total discriminator loss L_D = L_D^y + L_D^x is back-propagated through the gradients to optimize the discriminators;
(b) after training the discriminators, the generator is trained:
(b1) the plain-face image and the makeup image are input to the generator with the parameter set to apply makeup, the output (the plain-face image made up in the style of the makeup image) is input to the discriminator D_y to obtain a judgement, from which the loss L_G^y is computed; the plain-face image and the makeup image are input to the generator with the parameter set to remove makeup, the output (the makeup image with the makeup removed in the style of the plain-face image) is input to the discriminator D_x to obtain a judgement, from which the loss L_G^x is computed:

L_G^y = -E_{x~P_X, y~P_Y}[log D_y(G(x, y))]    (9)

L_G^x = -E_{x~P_X, y~P_Y}[log D_x(G(y, x))]    (10)

where L_G^y denotes the adversarial loss of the makeup image generated by the generator and L_G^x the adversarial loss of the plain-face image generated by the generator;
(b2) for the fake makeup image and the fake plain-face image output by the generator in step (b1), histogram-matching losses are computed:

L_HM^y = || G(x, y) - HM(x, y) ||_2    (11)

L_HM^x = || G(y, x) - HM(y, x) ||_2    (12)

where HM(x, y) denotes the grey-level histogram-matching result when the generator parameter is set to apply makeup, HM(y, x) the grey-level histogram-matching result when the parameter is set to remove makeup, and ||·||_2 denotes the L2 norm;
(b3) the fake makeup image output in step (b1) and the original plain-face image are input to the generator with the parameter set to remove makeup, the output being the fake makeup image with its makeup removed in the style of the plain-face image, and the L1 loss between this result and the original plain-face image gives the cycle-consistency loss L_cyc^x; the fake plain-face image output by the generator and the makeup image are input to the generator with the parameter set to apply makeup, the output being the fake plain-face image made up in the style of the makeup image, and the L1 loss between this result and the makeup image gives the cycle-consistency loss L_cyc^y; the loss functions are defined as:

L_cyc^x = || F_PNI(G(G(x, y), x)) - x ||_1    (13a)

L_cyc^y = || F_PNI(G(G(y, x), y)) - y ||_1    (13b)

where ||·||_1 denotes the L1 norm and F_PNI(·) is a transformation module applied to the generated image that injects random perturbations to improve the robustness of the model, defined as:

F_PNI(x) = x + γ·η    (14)

where η is a noise term sampled from a Gaussian distribution and γ is a coefficient controlling the magnitude of η;
(b4) to preserve the characteristics of the original image while transferring the look of the makeup image, a local perceptual loss L_per is introduced to maintain local consistency in the regions that should not change, penalizing with the L1 and L2 norms ||·||_1 and ||·||_2 the distance between the outputs of the face local generation module F_l(·) on the generated image and on the original image (Eq. (15));
finally, the above losses are combined to obtain the final generator loss L_G:

L_G = λ_G(L_G^x + L_G^y) + λ_cyc(L_cyc^x + L_cyc^y) + λ_makeup(L_HM^x + L_HM^y) + λ_per·L_per    (16)

where λ_G, λ_cyc, λ_makeup and λ_per are the generator, cycle-consistency, histogram-matching and local-consistency control coefficients, respectively.

5. The makeup transfer method based on a generative adversarial network according to claim 3, characterized in that step (4) is as follows: the contour weight ω_source of the original image and the content weight ω_ref of the makeup image are controlled by changing the coefficient γ, giving the final image I:

I = (1 - γ)·ω_ref + γ·ω_source    (17).
Application history

Application CN202211029533.3A was filed on 2022-08-25 by Nanjing University of Posts and Telecommunications, with a priority date of 2022-08-25, and was published as CN115496650A on 2022-12-20. The patent family consists of this single Chinese application. Status: pending.



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination