CN115760552A - Face image makeup migration method and system based on image makeup migration network

Info

Publication number
CN115760552A
Authority
CN
China
Prior art keywords
makeup
image
network
color
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211374753.XA
Other languages
Chinese (zh)
Inventor
熊盛武 (Xiong Shengwu)
孙朝阳 (Sun Zhaoyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202211374753.XA priority Critical patent/CN115760552A/en
Publication of CN115760552A publication Critical patent/CN115760552A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a face image makeup migration method and system based on an image makeup migration network. An original face image is first acquired and then input into the image makeup migration network to generate the final makeup migration result. The image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network. The semantic correspondence sub-network learns dense semantic correspondences between makeup-free images and makeup images, while the coloring sub-network renders the semantically aligned makeup features into the target image to generate the final makeup migration result. The invention formulates makeup migration as an image coloring problem, which avoids the lack of paired data. An alternative attention mechanism is proposed that computes pixel-level semantic correspondence at low resolution and uses it to aggregate patch blocks at high resolution, greatly reducing the amount of computation without losing makeup detail.

Description

Face image makeup migration method and system based on image makeup migration network
Technical Field
The invention belongs to image processing technology based on generative adversarial networks (GANs), and relates to a face image makeup migration method and system, in particular to a face image makeup migration method and system based on image coloring and alternative attention.
Background
Against the background of a booming cosmetics market, quickly and accurately recommending personalized makeup products to users has gradually become a research focus in the field of computer vision. As an effective means of addressing this need, makeup transfer technology has attracted attention for its wide range of application scenarios and large market demand. In recent years, makeup migration has received increasing attention from researchers at home and abroad; by combining disentangled representations with generative adversarial networks, numerous scholars have contributed greatly to the advancement of makeup migration technology.
However, two key challenges remain in makeup transfer. One is that the quality of the migration result is heavily affected by synthetic pseudo-paired data. Owing to the inherent nature of makeup migration, there is practically no paired data for supervised network training. To overcome this, existing methods synthesize pseudo-paired data through histogram matching or face landmark warping to supervise training, so the quality of the synthesized data strongly affects what the network can generate. The other difficulty is the computational cost of semantic correspondence. Existing work shows that learning semantic correspondences effectively improves the quality of makeup transfer, but computing them requires first measuring the relevance of every pixel to all other pixels and then aggregating all features according to that relevance; the computational complexity therefore grows quadratically with the feature map size, which severely limits the practical application of makeup transfer.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a face image makeup migration method and system based on image coloring and alternative attention.
The technical scheme adopted by the method is as follows: a face image makeup migration method based on an image makeup migration network comprises the following steps:
step 1: acquiring an original face image;
step 2: inputting the obtained original face image into an image makeup migration network to generate a final makeup migration result;
the image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network;
the semantic correspondence sub-network comprises a feature extractor and a substitute attention module; the feature extractor is used for extracting rich space semantic features for feature matching; the alternative attention module is used for successfully mapping the learning semantic corresponding relation under the low resolution into the high resolution features;
the image coloring sub-network comprises an identity encoder, a color distiller, and a decoder; the identity encoder is used for extracting the person's identity features; the color distiller is used for distilling the makeup features of the reference image; the decoder takes the unmodified person identity features and the semantically aligned makeup features as input and generates the final makeup transfer result;
the feature extractor consists of 3 conversion blocks with the step length of 2; the identity encoder and the color distiller each consist of 2 conversion blocks with a step size of 2 and 2 Residual blocks; the decoder consists of 4 Residual blocks and 2 conversion blocks, and performs up-sampling operation through bilinear interpolation; the alternative attention module consists of a pixel unshuffle operation, a cross-attention and a pixel shuffle;
the convergence block consists of a convergence Layer, an isolation normalization Layer and a ReLUActivate Layer which are connected in sequence; the Residual block is composed of two fusion blocks connected in series, with the last ReLUActivate layer removed.
The technical scheme adopted by the system of the invention is as follows: a face image makeup migration system based on an image makeup migration network comprises an original face image acquisition module and a makeup migration module:
the original face image acquisition module is used for acquiring an original face image;
the makeup transfer module is used for inputting the acquired original face image into an image makeup transfer network to generate a final makeup transfer result;
the image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network;
the semantic correspondence sub-network comprises a feature extractor and a substitute attention module; the feature extractor is used for extracting rich space semantic features for feature matching; the alternative attention module is used for successfully mapping the learning semantic corresponding relation under the low resolution into the high resolution features;
the image coloring sub-network comprises an identity encoder, a color distiller, and a decoder; the identity encoder is used for extracting the person's identity features; the color distiller is used for distilling the makeup features of the reference image; the decoder takes the unmodified person identity features and the semantically aligned makeup features as input and generates the final makeup transfer result;
the feature extractor consists of 3 Convolation blocks with the step length of 2; the identity encoder and the color distiller each consist of 2 Convolition blocks with a step size of 2 and 2 Residual blocks; the decoder consists of 4 Residual blocks and 2 conversion blocks, and performs up-sampling operation through bilinear interpolation; the alternative attention module consists of a pixel unshuffle operation, a cross-attention and a pixel shuffle;
the convergence block consists of a convergence Layer, an isolation normalization Layer and a ReLUActivate Layer which are connected in sequence; the Residual block is composed of two connected convergence blocks in series, with the last reluativationlayer removed.
Compared with the prior art, the invention has the beneficial effects that:
(1) Pseudo-pair data does not need to be generated, and the complexity of data collection is reduced;
(2) The generated makeup migration result has higher makeup similarity than prior methods;
(3) The proposed alternative attention mechanism greatly reduces the computational cost.
Drawings
Fig. 1 is a diagram of the overall architecture of the network according to the embodiment of the present invention, which includes a semantic correspondence sub-network and an image coloring sub-network. The semantic correspondence sub-network comprises a feature extractor and proposed alternative attention mechanism, and the image coloring sub-network comprises an identity encoder, a color distillation encoder and a decoder.
Fig. 2 is a schematic diagram of studying makeup migration from the image coloring perspective according to an embodiment of the present invention: the network takes as input a grayscale makeup-free image and a color makeup image, with the goal of migrating the colors of the makeup image into the makeup-free image.
FIG. 3 is a visual interpretation of the alternative attention according to an embodiment of the present invention: the proposed alternative attention mechanism computes the correlation matrix and aggregates features at different resolutions, greatly reducing the computational cost without loss of makeup detail.
FIG. 4 is a flowchart illustrating specific operations of an alternative attention module according to an embodiment of the present invention.
Fig. 5 is a comparison result between the embodiment of the present invention and other methods, where the first three rows are comparison results under the condition of a front face, and the last three rows are comparison results under different posture expressions.
Fig. 6 shows comparison results according to embodiments of the present invention, partially enlarged for better comparison of the makeup migration effects of the different methods.
Fig. 7 shows the semantic correspondence effect and the migration results according to the embodiment of the present invention; two celebrity images are selected in the last two rows to verify the generalization ability of the model.
Fig. 8 is a makeup interpolation chart according to an embodiment of the present invention, in which makeup features are extracted and interpolation of makeup effects is performed by interpolating the makeup features.
Fig. 9 is makeup editing sample 1 according to an embodiment of the present invention, illustrating the proposed user interaction mode in which the user can edit a desired makeup.
Fig. 10 is makeup editing sample 2 according to an embodiment of the present invention, further illustrating the same user interaction mode.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
The makeup migration is intended to transfer the makeup style of the reference image to the source image while maintaining the identity of the source image. While the existing cosmetic transfer methods achieve encouraging results, they still suffer from two drawbacks: one is that the resulting cosmetic similarity is affected by the synthetic pseudo-paired data, and the other is that the learned semantic correspondence requires a large amount of computation.
This embodiment proposes a makeup migration method based on image coloring and alternative attention. Given the close relationship between makeup and color change, makeup migration is formulated as an exemplar-based coloring problem: the grayscale source image is colored with a color reference image as the exemplar, i.e., applying makeup is simulated by coloring. At the same time, a new alternative attention mechanism efficiently maps low-resolution feature semantics onto high-resolution features, greatly reducing the computational cost. In addition, this embodiment explores a new, highly demanded makeup editing scenario and verifies the effectiveness of the method in that scenario. Extensive experiments demonstrate that the method is effective and reasonable.
The invention provides a face image makeup migration method based on an image makeup migration network, which comprises the following steps:
step 1: acquiring an original face image;
and 2, step: inputting the obtained original face image into an image makeup transfer network to generate a final makeup transfer result;
please refer to fig. 1, which is a diagram illustrating an overall architecture of the image makeup migration network according to the present embodiment. The image makeup migration network of the present embodiment includes a semantic correspondence sub-network and an image coloring sub-network. And the semantic correspondence sub-network is used for learning dense semantic correspondence between the makeup-free images and the makeup images. Since makeup transfer requires transferring makeup to a semantic location corresponding to a target image, learning semantic correspondence is necessary. The coloring sub-network is mainly used for rendering the makeup features after semantic alignment into a target image to generate a final makeup migration result. These two sub-networks are well-targeted and each plays its own role, working together to produce realistic cosmetic results.
Referring to fig. 2 and 3, the semantic correspondence sub-network includes a feature extractor and an alternative attention module. The feature extractor is used for extracting rich spatial semantic features for feature matching. This embodiment observes that the human face has a certain structural regularity: when one point of a face corresponds to another point, a corresponding relationship also exists around the two points. Based on this observation, the embodiment provides a novel alternative attention module that maps the semantic correspondence learned at low resolution onto the high-resolution features, greatly reducing the computational complexity while semantically aligning the high-resolution features. The image coloring sub-network includes an identity encoder, a color distiller, and a decoder. The identity encoder extracts the person's identity features, and the color distiller distills the makeup features of the reference image. The unmodified person identity features and the semantically aligned makeup features (produced by the semantic correspondence sub-network) are input into the decoder to generate the final makeup transfer result. The specific network architecture is built by stacking two main modules: a Convolution block and a Residual block. A Convolution block consists of a Convolution layer, an Instance Normalization layer, and a ReLU activation layer. A Residual block consists of two Convolution blocks, with the last activation layer removed. The feature extractor consists of 3 Convolution blocks with a stride of 2; the identity encoder and the color distiller each consist of 2 Convolution blocks with a stride of 2 and 2 Residual blocks; the decoder consists of 4 Residual blocks and 2 Convolution blocks and performs up-sampling through bilinear interpolation. The alternative attention module consists of a pixel unshuffle operation, a cross-attention, and a pixel shuffle.
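As a concrete illustration of the pixel unshuffle -> cross-attention -> pixel shuffle pipeline, a minimal PyTorch sketch follows; the downscale factor r and the tensor shapes are assumptions chosen so that the high-resolution feature matches the spatial size of the low-resolution correlation matrix.

```python
import torch
import torch.nn.functional as F

def alternative_attention(M, color_feat, r=4, epsilon=100.0):
    """Alternative attention sketch: M is a (B, h*w, h*w) correlation matrix
    computed at low resolution (h = H/r, w = W/r); color_feat is the
    (B, C, H, W) high-resolution makeup color feature."""
    B, C, H, W = color_feat.shape
    # pack each r x r patch into channels so the spatial size matches M
    low = F.pixel_unshuffle(color_feat, r)              # (B, C*r*r, h, w)
    v = low.flatten(2).transpose(1, 2)                  # (B, h*w, C*r*r)
    attn = F.softmax(epsilon * M, dim=-1)               # softmax over columns j
    warped = (attn @ v).transpose(1, 2)                 # aggregate patch features
    warped = warped.reshape(B, C * r * r, H // r, W // r)
    return F.pixel_shuffle(warped, r)                   # back to (B, C, H, W)
```

In this sketch each low-resolution position carries an entire r x r patch of the high-resolution feature, so one low-resolution correspondence warps a whole patch at once, which is exactly the structural-regularity observation above.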
For real data $(x_{gray}, y)$: denote the set of color makeup-free images as X, with a color makeup-free image $x \in X$; denote the set of color makeup images as Y, with a color makeup image $y \in Y$; $x_{gray}$ denotes the grayscale image of the color makeup-free image x;
the specific implementation of the step 2 comprises the following substeps:
(1) Concatenate x and y with their corresponding face parsing maps ($x_{parse}$, $y_{parse}$) and input them together into the feature extractor $E_{feature}$ to obtain the feature variables $f_x$ and $f_y$ of the makeup-free image x and the makeup image y:

$f_x = E_{feature}([x; x_{parse}]), \quad f_y = E_{feature}([y; y_{parse}])$

where $x_{parse}$, $y_{parse}$ denote the face parsing maps, obtained from the semantic segmentation network BiSeNet;
(2) Apply L2 normalization to the feature variables $f_x$ and $f_y$ along the channel dimension to obtain the normalized feature variables $\hat{f}_x$ and $\hat{f}_y$;
(3) For the local feature variables at each spatial position, compute a matching score describing the semantic correspondence by dot product, modeling the semantic correspondence and obtaining the correlation matrix M:

$M(p, q) = \hat{f}_x(p)^{\top} \hat{f}_y(q)$

where p and q are the spatial coordinates in the source image and the reference image respectively, $\hat{f}_x(p)$ denotes the local feature variable of the source image feature $\hat{f}_x$ at spatial position p, $\hat{f}_y(q)$ denotes the local feature variable of the reference image feature $\hat{f}_y$ at spatial position q, and $M(p, q) \in [0, 1]$ is the matching score of the two spatial positions p and q; the closer the matching score is to 0, the weaker the semantic correspondence; conversely, the closer the matching score is to 1, the stronger the semantic correspondence;
(4) In the image coloring sub-network, the color makeup-free image x is first converted to grayscale to obtain $x_{gray}$, which is then input into the identity encoder $E_{identity}$ to extract the identity feature $f_x^{id}$;
(5) For the color makeup image y, use the face parsing map $y_{parse}$ to separate the face region and point-multiply it with y to obtain $y_{face}$, which is then fed into the color distiller $E_{color}$ to acquire the color feature $f_y^{color}$; likewise acquire the background color feature $f_x^{back}$ of the makeup-free image x, where $x_{parse}$ is used to separate the non-face region, which is point-multiplied with x to obtain $x_{back}$;
(6) Input the computed correlation matrix M and the color feature $f_y^{color}$ of the makeup image into the proposed alternative attention module. First, apply the channel unshuffle operation to $f_y^{color}$ so that its spatial size matches the correlation matrix M; then aggregate the features at different spatial positions using the correlation matrix M to obtain the semantically aligned makeup feature $\tilde{f}_y^{color}$:

$\tilde{f}_y^{color}(i) = \sum_j \mathrm{softmax}_j\big(\epsilon \, M(i, j)\big) \cdot \mathrm{pixel\_unshuffle}(f_y^{color})(j)$

where $\mathrm{softmax}_j$ denotes the softmax operation over the columns j, M(i, j) is the computed correlation matrix, $\epsilon$ is a scaling factor set to 100, pixel_unshuffle denotes the channel unshuffle operation, and i, j denote spatial coordinates;
(7) Obtain the facial color feature $f_y^{warp}$ of the warped makeup image, with the same size as the input feature, through the inverse channel shuffle operation pixel_shuffle;
(8) Finally, concatenate the identity feature $f_x^{id}$ of the makeup-free image, the background color feature $f_x^{back}$ of the makeup-free image, and the facial color feature $f_y^{warp}$ of the warped makeup image along the channel dimension and input them into the decoder to obtain the final makeup migration result $\hat{y}$.
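The sub-steps above can be summarized in the sketch referenced before the list; the helper `face_mask`, the bundling of sub-networks in `nets`, and the channel-mean grayscale conversion are hypothetical conveniences, not details fixed by the embodiment.

```python
import torch
import torch.nn.functional as F

def face_mask(parse):
    # hypothetical helper: 1 where the parsing map labels a face region
    return (parse > 0).float()

def makeup_transfer_forward(x, y, x_parse, y_parse, nets, attention):
    """Sketch of sub-steps (1)-(8); `nets` holds E_feature, E_identity,
    E_color and the decoder, `attention` is the alternative attention module."""
    # (1) extract features conditioned on face parsing
    f_x = nets["feature"](torch.cat([x, x_parse], dim=1))
    f_y = nets["feature"](torch.cat([y, y_parse], dim=1))
    # (2) L2-normalize along the channel dimension
    f_x = F.normalize(f_x, p=2, dim=1)
    f_y = F.normalize(f_y, p=2, dim=1)
    # (3) correlation matrix M(p, q) via dot products over all position pairs
    M = f_x.flatten(2).transpose(1, 2) @ f_y.flatten(2)   # (B, h*w, h*w)
    # (4) identity feature from the grayscale source (channel mean as grayscale)
    f_id = nets["identity"](x.mean(dim=1, keepdim=True))
    # (5) reference face color and source background color features
    f_color = nets["color"](y * face_mask(y_parse))
    f_back = nets["color"](x * (1.0 - face_mask(x_parse)))
    # (6)+(7) alternative attention: unshuffle, aggregate with M, shuffle back
    f_warp = attention(M, f_color)
    # (8) concatenate along channels and decode the migration result
    return nets["decoder"](torch.cat([f_id, f_back, f_warp], dim=1))
```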
Input data during training include coloring simulation data $(y_{gray}, y_{warp})$ and real data $(x_{gray}, y)$, where $y_{gray}$ and $y_{warp}$ denote the grayscale image of y and an affine-transformation-warped image of y respectively, as shown in fig. 2; $x_{gray}$ denotes the grayscale image of the color makeup-free image x, and y denotes the color makeup image.
the following takes real data as an example to describe the makeup migration process, and the simulated data is the same as the processing flow and only has the difference of input.
In the semantic correspondence sub-network, in order to compute the correspondence, this embodiment adopts the strategy of first extracting high-level features and then computing correlations in the feature space. First, x and y are concatenated with their face parsing maps along the channel dimension and input together into the feature extractor, yielding the feature variables $f_x$ and $f_y$ of the source image $x_s$ and the reference image $y_r$:

$f_x = E_{feature}([x; x_{parse}]), \quad f_y = E_{feature}([y; y_{parse}])$

where $E_{feature}$ denotes the feature extractor. The semantic correspondence correlation matrix M is then obtained by the dot-multiplication operation of deep exemplar-based colorization, as detailed below.
the next step is then to perform a matching operation that measures the semantic relevance of the different spatial locations. To obtain more accurate semantic correspondence, L2 regularization is applied to feature variables along the channel dimension prior to feature matching
Figure BDA0003926142120000066
And
Figure BDA0003926142120000067
in (3), obtaining normalized feature variables. Then, for the local feature variable at each spatial position, calculating a matching score describing the semantic correspondence by adopting a point multiplication mode, and modeling the semantic correspondence, wherein a mathematical formula is described as follows:
Figure BDA0003926142120000068
where p and q are the spatial coordinates in the source image and the reference image respectively, $\hat{f}_x(p)$ denotes the local feature variable of the source image feature $\hat{f}_x$ at spatial position p, $\hat{f}_y(q)$ denotes the local feature variable of the reference image feature $\hat{f}_y$ at spatial position q, and $M(p, q) \in [0, 1]$ is the matching score of the two spatial positions p and q. The closer the matching score is to 0, the weaker the semantic correspondence; conversely, the closer the matching score is to 1, the stronger the semantic correspondence.
Inspired by disentangled representation, this embodiment builds an image coloring sub-network that decomposes the input image into content features and makeup features and achieves the makeup migration effect by exchanging makeup features. In the makeup migration network, the source image and the reference image are first decomposed into content features and makeup features by a content encoder and a makeup encoder with different functions; then the established semantic correspondence is used to warp the makeup features of the reference image so that they are semantically aligned with the content features of the source image; finally, the unmodified content features of the source image and the semantically aligned makeup features of the reference image are fused and input into the decoder to generate the final makeup transfer result. Concretely, in the image coloring sub-network, the color makeup-free image x is first converted to grayscale to obtain $x_{gray}$, which is input into the identity encoder to extract the identity feature $f_x^{id} = E_{identity}(x_{gray})$. Then, combining the color makeup image y with face parsing, $y_{face}$ is obtained through simple image processing operations and input into the color distiller to obtain the color feature $f_y^{color} = E_{color}(y_{face})$. In addition, since additional information such as the background of the makeup-free image does not need to change, the background color feature $f_x^{back}$ of the makeup-free image is also extracted.
In the makeup migration task, there are two main things to learn. One is to establish the semantic correspondence between the source image and the reference image; the other is to extract the makeup style of the reference image. The semantic correspondence ensures that the reference makeup is accurately rendered at the semantically corresponding locations of the source image, while the extracted makeup style ensures that the generated result resembles the reference makeup. Based on this analysis, this embodiment designs two networks: a semantic correspondence network for establishing semantic correspondence and a makeup transfer network for feature disentanglement and result generation. However, the computational complexity of the semantic correspondence network grows quadratically with the spatial size of the image; an alternative attention module is therefore designed to solve this problem, so that semantic correspondence can also be performed on high-resolution features, greatly improving makeup similarity.
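To make the quadratic cost concrete, a back-of-the-envelope estimate (assuming C-channel features of spatial size $H \times W$ and an assumed unshuffle factor r) reads:

$\text{full-resolution attention: } O\big((HW)^2 C\big), \qquad \text{low-resolution matching: } O\Big(\big(\tfrac{HW}{r^2}\big)^2 C\Big),$

$\text{patch aggregation: } O\Big(\big(\tfrac{HW}{r^2}\big)^2 \cdot C r^2\Big) = O\Big(\tfrac{(HW)^2 C}{r^2}\Big).$

That is, the matching cost falls by a factor of $r^4$ and the aggregation cost by a factor of $r^2$; with r = 4, the correlation computation is 256 times cheaper.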
Then, the computed correlation matrix M and the color feature $f_y^{color}$ of the makeup image are input into the proposed alternative attention mechanism. Referring to FIG. 4, a channel unshuffle operation is first applied to the color feature $f_y^{color}$ so that its spatial size matches the correlation matrix M. Then, features at different spatial positions are aggregated using the correlation matrix M to obtain the semantically aligned makeup feature:

$\tilde{f}_y^{color}(i) = \sum_j \mathrm{softmax}_j\big(\epsilon \, M(i, j)\big) \cdot \mathrm{pixel\_unshuffle}(f_y^{color})(j)$

Finally, the warped feature $f_y^{warp}$, with the same size as the input feature, is obtained through the channel shuffle operation pixel_shuffle. The intuitive interpretation of this operation is shown in fig. 3: the correlation is computed at low resolution, and the correspondence is mapped onto patches at high resolution, where feature aggregation is performed.
After $f_y^{color}$ is input into the alternative attention module, the makeup feature semantically aligned with the makeup-free image is obtained in combination with the correlation matrix M. Finally, the identity feature $f_x^{id}$ of the makeup-free image, the background color feature $f_x^{back}$ of the makeup-free image, and the facial color feature $f_y^{warp}$ of the warped makeup image are concatenated along the channel dimension and input into the decoder to obtain the final makeup migration result $\hat{y}$.
The loss functions adopted in the training process comprise a semantic loss function, an identity loss function, a local color loss function and a reconstruction loss function, and an overall loss function is obtained through weighting;
Semantic loss function: this embodiment uses the semantic loss function proposed in the article "A symmetric semantic-aware transformer network for makeup transfer and removal", which constrains semantic correspondences to be established within the same semantic space without supervision.
Identity loss function: this embodiment uses the identity loss function proposed in the article "A symmetric semantic-aware transformer network for makeup transfer and removal" to constrain the gradient consistency between the migration result and the makeup-free image, maintaining identity consistency.
Local color loss function: this embodiment proposes a differentiable local color histogram loss function:

$L_{color} = \sum_{item} \big\| \mathrm{hist}(\hat{y} \odot mask_{item}^{x}) - \mathrm{hist}(y \odot mask_{item}^{y}) \big\|_1$

where $item \in \{lip, eye, face\}$ denotes decomposing the face region into lips, eyes, and face (the face does not include the lips and eyes), $\odot$ denotes point-wise multiplication, hist denotes histogram statistics, $mask_{item}$ denotes the parsing mask of the corresponding face semantic information, and $\|\cdot\|_1$ denotes the L1 norm. This loss function constrains the generated result to be consistent with the reference makeup color distribution within the same local semantic region.
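As an illustration of how a histogram can be made differentiable, the following sketch uses Gaussian soft binning; the bin count, bandwidth, and per-channel treatment are assumptions, since the description only states that the local color histogram loss is differentiable.

```python
import torch

def soft_histogram(values, bins=32, sigma=0.02):
    """Differentiable histogram of values in [0, 1] via Gaussian soft binning."""
    centers = torch.linspace(0.0, 1.0, bins, device=values.device)
    weights = torch.exp(-0.5 * ((values.reshape(-1, 1) - centers) / sigma) ** 2)
    hist = weights.sum(dim=0)
    return hist / (hist.sum() + 1e-8)  # normalize to a distribution

def local_color_loss(result, reference, masks_x, masks_y):
    """L_color sketch: sum over {lip, eye, face} of the L1 distance between
    histograms of the masked generated result and the masked reference."""
    loss = result.new_zeros(())
    for item in ("lip", "eye", "face"):
        mx = masks_x[item][:, 0] > 0      # (B, H, W) boolean region of x
        my = masks_y[item][:, 0] > 0      # (B, H, W) boolean region of y
        for c in range(result.shape[1]):  # per color channel
            h_r = soft_histogram(result[:, c][mx])
            h_y = soft_histogram(reference[:, c][my])
            loss = loss + torch.abs(h_r - h_y).sum()
    return loss
```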
Reconstruction loss function: when the input data are $(y_{gray}, y_{warp})$, this embodiment uses the L1 norm and a VGG19-based perceptual loss to constrain the reconstruction error, i.e., the consistency between the migration result and $y_{gt}$. In addition, this embodiment also adopts the widely used adversarial loss, in which a least-squares loss is used instead of the negative log-likelihood loss to stabilize training.
The experimental results of the present invention are shown in fig. 7, and the comparative results with other methods are shown in fig. 5.
Fig. 6 is a method diagram showing comparative results of examples of the present invention, and the results are partially enlarged for better comparison of makeup migration effects of different methods.
Comparing the eye makeup migration effect, the results generated by BeautyGAN fail to effectively migrate the eye makeup of the reference image, as in the second and fifth rows, because region-level semantic correspondence cannot meet the demands of eye makeup migration. For the colorful eye shadow in the fifth row, PSGAN also fails to obtain a satisfactory makeup migration effect, and the eye shadow style of its result differs noticeably from the reference image.
Comparing the blush migration effect, BeautyGAN fails to effectively extract the blush style, so its results show almost no blush migration effect, as in the third and sixth rows. PSGAN extracts part of the blush information, but the resulting blush is lighter in color and still differs considerably from the reference makeup, as in the sixth and seventh rows. In contrast, the makeup migration results of this embodiment remain highly similar to the makeup style of the reference picture, whether for lipstick, eye shadow, or blush. In the fifth, sixth, and seventh rows, for large-area blush and colorful eye shadow, the other makeup transfer algorithms fail and cannot effectively render the makeup of the reference image into the source image, whereas the makeup transfer algorithm of this embodiment still generates realistic and accurate makeup migration results.
The invention further supports:
(1) Makeup migration intensity control: because the method extracts explicit makeup information, the intensity of makeup migration can be controlled simply by weighting the extracted makeup features, as shown in fig. 8 (see the sketch after this list).
(2) Makeup editing: the proposed network structure allows makeup editing, meaning that the user can paint a desired color at the corresponding semantic position of the reference image, and the method of this embodiment generates an ideal makeup migration result, thereby achieving makeup editing. Other makeup migration technologies based on generative networks do not have this function. See fig. 9 and fig. 10.
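Since the makeup information is carried by an explicit feature, intensity control and interpolation reduce to a linear blend in feature space before decoding; a minimal sketch, assuming the blend is applied to the semantically aligned makeup features (the figures do not pin down the exact blend site):

```python
import torch

def blend_makeup(f_id, f_back, f_warp_a, f_warp_b, decoder, alpha=0.5):
    """Interpolate between two aligned makeup features f_warp_a and f_warp_b;
    alpha = 1 reproduces makeup A, alpha = 0 reproduces makeup B, and setting
    f_warp_b to the source's own color feature yields intensity control."""
    f_warp = alpha * f_warp_a + (1.0 - alpha) * f_warp_b
    return decoder(torch.cat([f_id, f_back, f_warp], dim=1))
```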
The invention studies makeup migration from the coloring perspective and formulates it as an image coloring problem, so that a large number of easily collected color images can serve as self-supervision data for the network, which avoids the lack of paired data. Extensive experiments verify the rationality and effectiveness of this makeup transfer strategy in real environments. The invention proposes an alternative attention mechanism that computes pixel-level semantic correspondence at low resolution and uses it to aggregate patch blocks at high resolution, greatly reducing the amount of computation without losing makeup detail. The invention also explores a new makeup editing scenario and verifies the effectiveness of the method in that scenario.
It should be understood that the above description of the preferred embodiments is illustrative, and not restrictive, and that various changes and modifications may be made therein by those skilled in the art without departing from the scope of the invention as defined in the appended claims.

Claims (5)

1. A face image makeup migration method based on an image makeup migration network is characterized by comprising the following steps:
step 1: acquiring an original face image;
and 2, step: inputting the obtained original face image into an image makeup transfer network to generate a final makeup transfer result;
the image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network;
the semantic correspondence sub-network comprises a feature extractor and a substitute attention module; the feature extractor is used for extracting rich space semantic features for feature matching; the alternative attention module is used for successfully mapping the learning semantic corresponding relation under the low resolution into the high resolution features;
said image coloring subnetwork comprising an identity encoder, a color distiller, and a decoder; the identity encoder is used for extracting the identity characteristics of the person; the color distiller is used for distilling the makeup characteristic of the reference image; the decoder is used for inputting the unmodified character identity characteristics and the semantically aligned makeup features into the decoder to generate a final makeup transfer result;
the feature extractor consists of 3 Convolition blocks with the step length of 2 and a bilinear interpolation operation; the identity encoder and the color distiller each consist of 2 conversion blocks with a step size of 2 and 2 Residual blocks; the decoder is composed of 4 Residual blocks and 2 conversion blocks, and performs up-sampling operation through bilinear interpolation; the alternative attention module consists of a pixel unshuffle operation, a cross-attention and a pixel shuffle;
the convergence block consists of a convergence Layer, an isolation normalization Layer and a ReLUActivate Layer which are connected in sequence; the Residual block is composed of two connected convergence blocks in series, with the last reluativationlayer removed.
2. The face image makeup migration method based on the image makeup migration network according to claim 1, characterized in that: in step 2, for real data $(x_{gray}, y)$, the set of color makeup-free images is denoted X, with a color makeup-free image $x \in X$; the set of color makeup images is denoted Y, with a color makeup image $y \in Y$; $x_{gray}$ denotes the grayscale image of the color makeup-free image x;
the specific implementation of step 2 comprises the following sub-steps:
(1) Concatenate x and y with their corresponding face parsing maps ($x_{parse}$, $y_{parse}$) and input them together into the feature extractor $E_{feature}$ to obtain the feature variables $f_x$ and $f_y$ of the makeup-free image x and the makeup image y:

$f_x = E_{feature}([x; x_{parse}]), \quad f_y = E_{feature}([y; y_{parse}])$

where $x_{parse}$, $y_{parse}$ denote the face parsing maps, obtained from the semantic segmentation network BiSeNet;
(2) Apply L2 normalization to the feature variables $f_x$ and $f_y$ along the channel dimension to obtain the normalized feature variables $\hat{f}_x$ and $\hat{f}_y$;
(3) For the local feature variables at each spatial position, compute a matching score describing the semantic correspondence by dot product, modeling the semantic correspondence and obtaining the correlation matrix M:

$M(p, q) = \hat{f}_x(p)^{\top} \hat{f}_y(q)$

where p and q are the spatial coordinates in the source image and the reference image respectively, $\hat{f}_x(p)$ denotes the local feature variable of the source image feature $\hat{f}_x$ at spatial position p, $\hat{f}_y(q)$ denotes the local feature variable of the reference image feature $\hat{f}_y$ at spatial position q, and $M(p, q) \in [0, 1]$ is the matching score of the two spatial positions p and q; the closer the matching score is to 0, the weaker the semantic correspondence; conversely, the closer the matching score is to 1, the stronger the semantic correspondence;
(4) In the image coloring sub-network, the color makeup-free image x is first converted to grayscale to obtain $x_{gray}$, which is then input into the identity encoder $E_{identity}$ to extract the identity feature $f_x^{id}$;
(5) For the color makeup image y, use the face parsing map $y_{parse}$ to separate the face region and point-multiply it with y to obtain $y_{face}$, which is then fed into the color distiller $E_{color}$ to acquire the color feature $f_y^{color}$; likewise acquire the background color feature $f_x^{back}$ of the makeup-free image x, where $x_{parse}$ is used to separate the non-face region, which is point-multiplied with x to obtain $x_{back}$;
(6) Input the computed correlation matrix M and the color feature $f_y^{color}$ of the makeup image into the proposed alternative attention module. First, apply the channel unshuffle operation to $f_y^{color}$ so that its spatial size matches the correlation matrix M; then aggregate the features at different spatial positions using the correlation matrix M to obtain the semantically aligned makeup feature $\tilde{f}_y^{color}$:

$\tilde{f}_y^{color}(i) = \sum_j \mathrm{softmax}_j\big(\epsilon \, M(i, j)\big) \cdot \mathrm{pixel\_unshuffle}(f_y^{color})(j)$

where $\mathrm{softmax}_j$ denotes the softmax operation over the columns j, M(i, j) is the computed correlation matrix, $\epsilon$ is a scaling factor, pixel_unshuffle denotes the channel unshuffle operation, and i, j denote spatial coordinates;
(7) Obtain the facial color feature $f_y^{warp}$ of the warped makeup image, with the same size as the input feature, through the channel shuffle operation pixel_shuffle;
(8) Finally, concatenate the identity feature $f_x^{id}$ of the makeup-free image, the background color feature $f_x^{back}$ of the makeup-free image, and the facial color feature $f_y^{warp}$ of the warped makeup image along the channel dimension and input them into the decoder to obtain the final makeup migration result $\hat{y}$.
3. The face image makeup migration method based on the image makeup migration network according to claim 1, characterized in that: the trained image makeup migration network is obtained by jointly training the semantic correspondence sub-network and the image coloring sub-network;
when training the semantic correspondence sub-network, the input data comprise coloring simulation data $(y_{gray}, y_{warp})$ and real data $(x_{gray}, y)$, where $y_{gray}$ and $y_{warp}$ respectively denote the grayscale image of y and an affine-transformation-warped image of y, $x_{gray}$ denotes the grayscale image of the color makeup-free image x, and y denotes the color makeup image;
the loss functions adopted in the training process comprise a semantic loss function, an identity loss function, a local color loss function and a reconstruction loss function; obtaining an integral loss function through weighting;
the local color loss function is:

$L_{color} = \sum_{item} \big\| \mathrm{hist}(\hat{y} \odot mask_{item}^{x}) - \mathrm{hist}(y \odot mask_{item}^{y}) \big\|_1$

where $item \in \{lip, eye, face\}$ denotes decomposing the face region into lips, eyes, and face, $\odot$ denotes point-wise multiplication, hist denotes histogram statistics, $mask_{item}$ denotes the parsing mask of the corresponding face semantic information, and $\|\cdot\|_1$ denotes the L1 norm; $mask_{item}^{x}$ and $mask_{item}^{y}$ denote the binary masks of the corresponding lip, eye, and face regions of images x and y respectively, obtained by separating $x_{parse}$ and $y_{parse}$; $\hat{y}$ denotes the makeup migration result obtained during training.
4. The face image makeup migration method based on the image makeup migration network according to claim 1, characterized in that: the trained image makeup migration network is obtained by jointly training the semantic correspondence sub-network and the image coloring sub-network;
when training the image coloring sub-network, the input data comprise coloring simulation data $(y_{gray}, y_{warp})$ and real data $(x_{gray}, y)$, where $y_{gray}$ and $y_{warp}$ respectively denote the grayscale image of y and an affine-transformation-warped image of y; $x_{gray}$ denotes the grayscale image of the color makeup-free image x, and y denotes the color makeup image;
the loss functions adopted in the training process comprise a semantic loss function, an identity loss function, a local color loss function, a reconstruction loss function, and an adversarial loss function, and the overall loss function is obtained by weighting;
the local color loss function is:

$L_{color} = \sum_{item} \big\| \mathrm{hist}(\hat{y} \odot mask_{item}^{x}) - \mathrm{hist}(y \odot mask_{item}^{y}) \big\|_1$

where $item \in \{lip, eye, face\}$ denotes decomposing the face region into lips, eyes, and face, $\odot$ denotes point-wise multiplication, hist denotes histogram statistics, $mask_{item}$ denotes the parsing mask of the corresponding face semantic information, and $\|\cdot\|_1$ denotes the L1 norm; $mask_{item}^{x}$ and $mask_{item}^{y}$ denote the binary masks of the corresponding lip, eye, and face regions of images x and y respectively, obtained by separating $x_{parse}$ and $y_{parse}$; $\hat{y}$ denotes the makeup migration result obtained during training;
in the adversarial loss function, a least-squares loss is used instead of the negative log-likelihood loss to stabilize training.
5. A face image makeup migration system based on an image makeup migration network is characterized by comprising an original face image acquisition module and a makeup migration module:
the original face image acquisition module is used for acquiring an original face image;
the makeup migration module is used for inputting the obtained original face image into an image makeup migration network to generate a final makeup migration result;
the image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network;
the semantic correspondence sub-network comprises a feature extractor and a substitute attention module; the feature extractor is used for extracting rich space semantic features for feature matching; the alternative attention module is used for successfully mapping the learning semantic corresponding relation under the low resolution to the high resolution feature;
the image coloring sub-network comprises an identity encoder, a color distiller and a decoder; the identity encoder is used for extracting the identity characteristics of the person; the color distiller is used for distilling the makeup characteristic of the reference image; the decoder is used for inputting the unmodified character identity characteristics and the semantically aligned makeup features into the decoder to generate a final makeup transfer result;
the feature extractor is operated by 3 Convolition blocks with the step length of 2 and a bilinear interpolation; the identity encoder and the color distiller each consist of 2 Convolition blocks with a step size of 2 and 2 Residual blocks; the decoder is composed of 4 Residual blocks and 2 conversion blocks, and performs up-sampling operation through bilinear interpolation; the alternative attention module consists of a pixel unshuffle operation, a cross-attribute and a pixel shuffle;
the convention block consists of a convention Layer, an organization normalized Layer and a ReLUActivate Layer which are connected in sequence; the Residual block is composed of two fusion blocks connected in series, with the last ReLUActivate layer removed.
CN202211374753.XA 2022-11-04 2022-11-04 Face image makeup migration method and system based on image makeup migration network Pending CN115760552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211374753.XA CN115760552A (en) 2022-11-04 2022-11-04 Face image makeup migration method and system based on image makeup migration network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211374753.XA CN115760552A (en) 2022-11-04 2022-11-04 Face image makeup migration method and system based on image makeup migration network

Publications (1)

Publication Number Publication Date
CN115760552A true CN115760552A (en) 2023-03-07

Family

ID=85356236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211374753.XA Pending CN115760552A (en) 2022-11-04 2022-11-04 Face image makeup migration method and system based on image makeup migration network

Country Status (1)

Country Link
CN (1) CN115760552A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036157A (en) * 2023-10-09 2023-11-10 易方信息科技股份有限公司 Editable simulation digital human figure design method, system, equipment and medium
CN117036157B (en) * 2023-10-09 2024-02-20 易方信息科技股份有限公司 Editable simulation digital human figure design method, system, equipment and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination