CN115760552A - Face image makeup migration method and system based on image makeup migration network

Info

Publication number
CN115760552A
Authority
CN
China
Prior art keywords
makeup
image
network
color
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211374753.XA
Other languages
Chinese (zh)
Inventor
熊盛武 (Xiong Shengwu)
孙朝阳 (Sun Zhaoyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202211374753.XA priority Critical patent/CN115760552A/en
Publication of CN115760552A publication Critical patent/CN115760552A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a face image makeup migration method and system based on an image makeup migration network. An original face image is first acquired and then input into the image makeup migration network to generate the final makeup migration result. The image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network. The semantic correspondence sub-network learns dense semantic correspondences between makeup-free images and makeup images, while the coloring sub-network renders the semantically aligned makeup features into the target image to generate the final makeup migration result. The invention formulates makeup migration as an image coloring problem, which avoids the lack of paired data. An alternative attention mechanism is proposed that computes pixel-level semantic correspondence at low resolution and uses it to aggregate patch blocks at high resolution, greatly reducing the amount of computation without losing makeup detail.

Description

Face image makeup migration method and system based on image makeup migration network
Technical Field
The invention belongs to image processing technology based on generative adversarial networks (GANs), and relates to a face image makeup migration method and system, in particular to a face image makeup migration method and system based on image coloring and alternative attention.
Background
Against the background of a booming cosmetics market, quickly and accurately recommending personalized makeup products to users has gradually become a research focus in the field of computer vision. As an effective means of addressing this need, makeup transfer technology has attracted attention for its wide range of application scenarios and large market demand. In recent years, makeup migration has received increasing attention from researchers at home and abroad; by combining disentangled representations with generative adversarial networks, numerous scholars have contributed greatly to the advancement of makeup migration technology.
However, two key challenges remain in makeup transfer. One is that the quality of the migration result is heavily affected by synthetic pseudo-paired data. Owing to the inherent nature of makeup migration, there is practically no paired data for supervised network training. To overcome this, existing methods synthesize pseudo-paired data through histogram matching or face landmark warping to supervise training, so the quality of the synthesized data strongly affects what the network can generate. The other difficulty is the computational cost of semantic correspondence. Existing work shows that learning semantic correspondences effectively improves the quality of makeup transfer, but computing them requires first measuring the relevance of every pixel to all other pixels and then aggregating all features according to that relevance; the computational complexity therefore grows quadratically with the feature map size, which severely limits the practical application of makeup transfer.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a face image makeup migration method and system based on image coloring and alternative attention.
The technical scheme adopted by the method is as follows: a face image makeup migration method based on an image makeup migration network comprises the following steps:
step 1: acquiring an original face image;
step 2: inputting the obtained original face image into an image makeup migration network to generate a final makeup migration result;
the image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network;
the semantic correspondence sub-network comprises a feature extractor and a substitute attention module; the feature extractor is used for extracting rich space semantic features for feature matching; the alternative attention module is used for successfully mapping the learning semantic corresponding relation under the low resolution into the high resolution features;
the image coloring sub-network comprises an identity encoder, a color distiller, and a decoder; the identity encoder is used for extracting the person's identity features; the color distiller is used for distilling the makeup features of the reference image; the decoder takes the unmodified person identity features and the semantically aligned makeup features as input and generates the final makeup transfer result;
the feature extractor consists of 3 conversion blocks with the step length of 2; the identity encoder and the color distiller each consist of 2 conversion blocks with a step size of 2 and 2 Residual blocks; the decoder consists of 4 Residual blocks and 2 conversion blocks, and performs up-sampling operation through bilinear interpolation; the alternative attention module consists of a pixel unshuffle operation, a cross-attention and a pixel shuffle;
the convergence block consists of a convergence Layer, an isolation normalization Layer and a ReLUActivate Layer which are connected in sequence; the Residual block is composed of two fusion blocks connected in series, with the last ReLUActivate layer removed.
The technical scheme adopted by the system of the invention is as follows: a face image makeup migration system based on an image makeup migration network comprises an original face image acquisition module and a makeup migration module:
the original face image acquisition module is used for acquiring an original face image;
the makeup transfer module is used for inputting the acquired original face image into an image makeup transfer network to generate a final makeup transfer result;
the image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network;
the semantic correspondence sub-network comprises a feature extractor and a substitute attention module; the feature extractor is used for extracting rich space semantic features for feature matching; the alternative attention module is used for successfully mapping the learning semantic corresponding relation under the low resolution into the high resolution features;
the image coloring sub-network comprises an identity encoder, a color distiller, and a decoder; the identity encoder is used for extracting the person's identity features; the color distiller is used for distilling the makeup features of the reference image; the decoder takes the unmodified person identity features and the semantically aligned makeup features as input and generates the final makeup transfer result;
the feature extractor consists of 3 Convolation blocks with the step length of 2; the identity encoder and the color distiller each consist of 2 Convolition blocks with a step size of 2 and 2 Residual blocks; the decoder consists of 4 Residual blocks and 2 conversion blocks, and performs up-sampling operation through bilinear interpolation; the alternative attention module consists of a pixel unshuffle operation, a cross-attention and a pixel shuffle;
the convergence block consists of a convergence Layer, an isolation normalization Layer and a ReLUActivate Layer which are connected in sequence; the Residual block is composed of two connected convergence blocks in series, with the last reluativationlayer removed.
Compared with the prior art, the invention has the beneficial effects that:
(1) Pseudo-pair data does not need to be generated, and the complexity of data collection is reduced;
(2) The generated makeup migration result has higher makeup similarity than prior methods;
(3) The proposed alternative attention mechanism greatly reduces the computational cost.
Drawings
Fig. 1 is a diagram of the overall architecture of the network according to the embodiment of the present invention, which includes a semantic correspondence sub-network and an image coloring sub-network. The semantic correspondence sub-network comprises a feature extractor and proposed alternative attention mechanism, and the image coloring sub-network comprises an identity encoder, a color distillation encoder and a decoder.
Fig. 2 is a schematic diagram of studying makeup migration from the image coloring perspective according to an embodiment of the present invention: the network takes as input a grayscale makeup-free image and a color makeup image, with the goal of migrating the colors of the makeup image into the makeup-free image.
FIG. 3 is a visual interpretation of the alternative attention according to an embodiment of the present invention: the proposed alternative attention mechanism computes the correlation matrix and aggregates features at different resolutions, greatly reducing the computational cost without loss of makeup detail.
FIG. 4 is a flowchart illustrating specific operations of an alternative attention module according to an embodiment of the present invention.
Fig. 5 is a comparison result between the embodiment of the present invention and other methods, where the first three rows are comparison results under the condition of a front face, and the last three rows are comparison results under different posture expressions.
Fig. 6 shows comparison results according to embodiments of the present invention, partially enlarged for better comparison of the makeup migration effects of the different methods.
Fig. 7 shows the semantic correspondence effect and the migration results according to the embodiment of the present invention; two celebrity images are selected in the last two rows to verify the generalization ability of the model.
Fig. 8 is a makeup interpolation chart according to an embodiment of the present invention, in which makeup features are extracted and interpolation of makeup effects is performed by interpolating the makeup features.
Fig. 9 is makeup editing sample 1 according to an embodiment of the present invention, illustrating the proposed user interaction mode in which the user can edit a desired makeup.
Fig. 10 is makeup editing sample 2 according to an embodiment of the present invention, further illustrating the same user interaction mode.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
The makeup migration is intended to transfer the makeup style of the reference image to the source image while maintaining the identity of the source image. While the existing cosmetic transfer methods achieve encouraging results, they still suffer from two drawbacks: one is that the resulting cosmetic similarity is affected by the synthetic pseudo-paired data, and the other is that the learned semantic correspondence requires a large amount of computation.
This embodiment proposes a makeup migration method based on image coloring and alternative attention. Given the close relationship between makeup and color change, makeup migration is formulated as an exemplar-based coloring problem: the grayscale source image is colored with a color reference image as the exemplar, i.e., applying makeup is simulated by coloring. At the same time, a new alternative attention mechanism efficiently maps low-resolution feature semantics onto high-resolution features, greatly reducing the computational cost. In addition, this embodiment explores a new, highly demanded makeup editing scenario and verifies the effectiveness of the method in that scenario. Extensive experiments demonstrate that the method is effective and reasonable.
The invention provides a face image makeup migration method based on an image makeup migration network, which comprises the following steps:
step 1: acquiring an original face image;
and 2, step: inputting the obtained original face image into an image makeup transfer network to generate a final makeup transfer result;
please refer to fig. 1, which is a diagram illustrating an overall architecture of the image makeup migration network according to the present embodiment. The image makeup migration network of the present embodiment includes a semantic correspondence sub-network and an image coloring sub-network. And the semantic correspondence sub-network is used for learning dense semantic correspondence between the makeup-free images and the makeup images. Since makeup transfer requires transferring makeup to a semantic location corresponding to a target image, learning semantic correspondence is necessary. The coloring sub-network is mainly used for rendering the makeup features after semantic alignment into a target image to generate a final makeup migration result. These two sub-networks are well-targeted and each plays its own role, working together to produce realistic cosmetic results.
Referring to fig. 2 and 3, the semantic correspondence sub-network includes a feature extractor and an alternative attention module. The feature extractor is used for extracting rich spatial semantic features for feature matching. This embodiment observes that the human face has a certain structural regularity: when one point of a face corresponds to another point, a corresponding relationship also exists around the two points. Based on this observation, the embodiment provides a novel alternative attention module that maps the semantic correspondence learned at low resolution onto the high-resolution features, greatly reducing the computational complexity while semantically aligning the high-resolution features. The image coloring sub-network includes an identity encoder, a color distiller, and a decoder. The identity encoder extracts the person's identity features, and the color distiller distills the makeup features of the reference image. The unmodified person identity features and the semantically aligned makeup features (produced by the semantic correspondence sub-network) are input into the decoder to generate the final makeup transfer result. The specific network architecture is built by stacking two main modules: a Convolution block and a Residual block. A Convolution block consists of a Convolution layer, an Instance Normalization layer, and a ReLU activation layer. A Residual block consists of two Convolution blocks, with the last activation layer removed. The feature extractor consists of 3 Convolution blocks with a stride of 2; the identity encoder and the color distiller each consist of 2 Convolution blocks with a stride of 2 and 2 Residual blocks; the decoder consists of 4 Residual blocks and 2 Convolution blocks and performs up-sampling through bilinear interpolation. The alternative attention module consists of a pixel unshuffle operation, a cross-attention, and a pixel shuffle.
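As a concrete illustration of the pixel unshuffle -> cross-attention -> pixel shuffle pipeline, a minimal PyTorch sketch follows; the downscale factor r and the tensor shapes are assumptions chosen so that the high-resolution feature matches the spatial size of the low-resolution correlation matrix.

```python
import torch
import torch.nn.functional as F

def alternative_attention(M, color_feat, r=4, epsilon=100.0):
    """Alternative attention sketch: M is a (B, h*w, h*w) correlation matrix
    computed at low resolution (h = H/r, w = W/r); color_feat is the
    (B, C, H, W) high-resolution makeup color feature."""
    B, C, H, W = color_feat.shape
    # pack each r x r patch into channels so the spatial size matches M
    low = F.pixel_unshuffle(color_feat, r)              # (B, C*r*r, h, w)
    v = low.flatten(2).transpose(1, 2)                  # (B, h*w, C*r*r)
    attn = F.softmax(epsilon * M, dim=-1)               # softmax over columns j
    warped = (attn @ v).transpose(1, 2)                 # aggregate patch features
    warped = warped.reshape(B, C * r * r, H // r, W // r)
    return F.pixel_shuffle(warped, r)                   # back to (B, C, H, W)
```

In this sketch each low-resolution position carries an entire r x r patch of the high-resolution feature, so one low-resolution correspondence warps a whole patch at once, which is exactly the structural-regularity observation above.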
For real data $(x_{gray}, y)$: denote the set of color makeup-free images as X, with a color makeup-free image $x \in X$; denote the set of color makeup images as Y, with a color makeup image $y \in Y$; $x_{gray}$ denotes the grayscale image of the color makeup-free image x;
the specific implementation of the step 2 comprises the following substeps:
(1) Concatenate x and y with their corresponding face parsing maps ($x_{parse}$, $y_{parse}$) and input them together into the feature extractor $E_{feature}$ to obtain the feature variables $f_x$ and $f_y$ of the makeup-free image x and the makeup image y:

$f_x = E_{feature}([x; x_{parse}]), \quad f_y = E_{feature}([y; y_{parse}])$

where $x_{parse}$, $y_{parse}$ denote the face parsing maps, obtained from the semantic segmentation network BiSeNet;
(2) Apply L2 normalization to the feature variables $f_x$ and $f_y$ along the channel dimension to obtain the normalized feature variables $\hat{f}_x$ and $\hat{f}_y$;
(3) For the local feature variables at each spatial position, compute a matching score describing the semantic correspondence by dot product, modeling the semantic correspondence and obtaining the correlation matrix M:

$M(p, q) = \hat{f}_x(p)^{\top} \hat{f}_y(q)$

where p and q are the spatial coordinates in the source image and the reference image respectively, $\hat{f}_x(p)$ denotes the local feature variable of the source image feature $\hat{f}_x$ at spatial position p, $\hat{f}_y(q)$ denotes the local feature variable of the reference image feature $\hat{f}_y$ at spatial position q, and $M(p, q) \in [0, 1]$ is the matching score of the two spatial positions p and q; the closer the matching score is to 0, the weaker the semantic correspondence; conversely, the closer the matching score is to 1, the stronger the semantic correspondence;
(4) In the image coloring sub-network, the color makeup-free image x is first converted to grayscale to obtain $x_{gray}$, which is then input into the identity encoder $E_{identity}$ to extract the identity feature $f_x^{id}$;
(5) For the color makeup image y, use the face parsing map $y_{parse}$ to separate the face region and point-multiply it with y to obtain $y_{face}$, which is then fed into the color distiller $E_{color}$ to acquire the color feature $f_y^{color}$; likewise acquire the background color feature $f_x^{back}$ of the makeup-free image x, where $x_{parse}$ is used to separate the non-face region, which is point-multiplied with x to obtain $x_{back}$;
(6) Input the computed correlation matrix M and the color feature $f_y^{color}$ of the makeup image into the proposed alternative attention module. First, apply the channel unshuffle operation to $f_y^{color}$ so that its spatial size matches the correlation matrix M; then aggregate the features at different spatial positions using the correlation matrix M to obtain the semantically aligned makeup feature $\tilde{f}_y^{color}$:

$\tilde{f}_y^{color}(i) = \sum_j \mathrm{softmax}_j\big(\epsilon \, M(i, j)\big) \cdot \mathrm{pixel\_unshuffle}(f_y^{color})(j)$

where $\mathrm{softmax}_j$ denotes the softmax operation over the columns j, M(i, j) is the computed correlation matrix, $\epsilon$ is a scaling factor set to 100, pixel_unshuffle denotes the channel unshuffle operation, and i, j denote spatial coordinates;
(7) Obtain the facial color feature $f_y^{warp}$ of the warped makeup image, with the same size as the input feature, through the inverse channel shuffle operation pixel_shuffle;
(8) Finally, concatenate the identity feature $f_x^{id}$ of the makeup-free image, the background color feature $f_x^{back}$ of the makeup-free image, and the facial color feature $f_y^{warp}$ of the warped makeup image along the channel dimension and input them into the decoder to obtain the final makeup migration result $\hat{y}$.
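The sub-steps above can be summarized in the sketch referenced before the list; the helper `face_mask`, the bundling of sub-networks in `nets`, and the channel-mean grayscale conversion are hypothetical conveniences, not details fixed by the embodiment.

```python
import torch
import torch.nn.functional as F

def face_mask(parse):
    # hypothetical helper: 1 where the parsing map labels a face region
    return (parse > 0).float()

def makeup_transfer_forward(x, y, x_parse, y_parse, nets, attention):
    """Sketch of sub-steps (1)-(8); `nets` holds E_feature, E_identity,
    E_color and the decoder, `attention` is the alternative attention module."""
    # (1) extract features conditioned on face parsing
    f_x = nets["feature"](torch.cat([x, x_parse], dim=1))
    f_y = nets["feature"](torch.cat([y, y_parse], dim=1))
    # (2) L2-normalize along the channel dimension
    f_x = F.normalize(f_x, p=2, dim=1)
    f_y = F.normalize(f_y, p=2, dim=1)
    # (3) correlation matrix M(p, q) via dot products over all position pairs
    M = f_x.flatten(2).transpose(1, 2) @ f_y.flatten(2)   # (B, h*w, h*w)
    # (4) identity feature from the grayscale source (channel mean as grayscale)
    f_id = nets["identity"](x.mean(dim=1, keepdim=True))
    # (5) reference face color and source background color features
    f_color = nets["color"](y * face_mask(y_parse))
    f_back = nets["color"](x * (1.0 - face_mask(x_parse)))
    # (6)+(7) alternative attention: unshuffle, aggregate with M, shuffle back
    f_warp = attention(M, f_color)
    # (8) concatenate along channels and decode the migration result
    return nets["decoder"](torch.cat([f_id, f_back, f_warp], dim=1))
```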
Input data during training include coloring simulation data $(y_{gray}, y_{warp})$ and real data $(x_{gray}, y)$, where $y_{gray}$ and $y_{warp}$ denote the grayscale image of y and an affine-transformation-warped image of y respectively, as shown in fig. 2; $x_{gray}$ denotes the grayscale image of the color makeup-free image x, and y denotes the color makeup image.
the following takes real data as an example to describe the makeup migration process, and the simulated data is the same as the processing flow and only has the difference of input.
In the semantic correspondence sub-network, in order to compute the correspondence, this embodiment adopts the strategy of first extracting high-level features and then computing correlations in the feature space. First, x and y are concatenated with their face parsing maps along the channel dimension and input together into the feature extractor, yielding the feature variables $f_x$ and $f_y$ of the source image $x_s$ and the reference image $y_r$:

$f_x = E_{feature}([x; x_{parse}]), \quad f_y = E_{feature}([y; y_{parse}])$

where $E_{feature}$ denotes the feature extractor. The semantic correspondence correlation matrix M is then obtained by the dot-multiplication operation of deep exemplar-based colorization, as detailed below.
the next step is then to perform a matching operation that measures the semantic relevance of the different spatial locations. To obtain more accurate semantic correspondence, L2 regularization is applied to feature variables along the channel dimension prior to feature matching
Figure BDA0003926142120000066
And
Figure BDA0003926142120000067
in (3), obtaining normalized feature variables. Then, for the local feature variable at each spatial position, calculating a matching score describing the semantic correspondence by adopting a point multiplication mode, and modeling the semantic correspondence, wherein a mathematical formula is described as follows:
Figure BDA0003926142120000068
where p and q are the spatial coordinates in the source image and the reference image respectively, $\hat{f}_x(p)$ denotes the local feature variable of the source image feature $\hat{f}_x$ at spatial position p, $\hat{f}_y(q)$ denotes the local feature variable of the reference image feature $\hat{f}_y$ at spatial position q, and $M(p, q) \in [0, 1]$ is the matching score of the two spatial positions p and q. The closer the matching score is to 0, the weaker the semantic correspondence; conversely, the closer the matching score is to 1, the stronger the semantic correspondence.
Inspired by disentangled representation, this embodiment builds an image coloring sub-network that decomposes the input image into content features and makeup features and achieves the makeup migration effect by exchanging makeup features. In the makeup migration network, the source image and the reference image are first decomposed into content features and makeup features by a content encoder and a makeup encoder with different functions; then the established semantic correspondence is used to warp the makeup features of the reference image so that they are semantically aligned with the content features of the source image; finally, the unmodified content features of the source image and the semantically aligned makeup features of the reference image are fused and input into the decoder to generate the final makeup transfer result. Concretely, in the image coloring sub-network, the color makeup-free image x is first converted to grayscale to obtain $x_{gray}$, which is input into the identity encoder to extract the identity feature $f_x^{id} = E_{identity}(x_{gray})$. Then, combining the color makeup image y with face parsing, $y_{face}$ is obtained through simple image processing operations and input into the color distiller to obtain the color feature $f_y^{color} = E_{color}(y_{face})$. In addition, since additional information such as the background of the makeup-free image does not need to change, the background color feature $f_x^{back}$ of the makeup-free image is also extracted.
In the makeup migration task, there are two main things to learn. One is to establish the semantic correspondence between the source image and the reference image; the other is to extract the makeup style of the reference image. The semantic correspondence ensures that the reference makeup is accurately rendered at the semantically corresponding locations of the source image, while the extracted makeup style ensures that the generated result resembles the reference makeup. Based on this analysis, this embodiment designs two networks: a semantic correspondence network for establishing semantic correspondence and a makeup transfer network for feature disentanglement and result generation. However, the computational complexity of the semantic correspondence network grows quadratically with the spatial size of the image; an alternative attention module is therefore designed to solve this problem, so that semantic correspondence can also be performed on high-resolution features, greatly improving makeup similarity.
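To make the quadratic cost concrete, a back-of-the-envelope estimate (assuming C-channel features of spatial size $H \times W$ and an assumed unshuffle factor r) reads:

$\text{full-resolution attention: } O\big((HW)^2 C\big), \qquad \text{low-resolution matching: } O\Big(\big(\tfrac{HW}{r^2}\big)^2 C\Big),$

$\text{patch aggregation: } O\Big(\big(\tfrac{HW}{r^2}\big)^2 \cdot C r^2\Big) = O\Big(\tfrac{(HW)^2 C}{r^2}\Big).$

That is, the matching cost falls by a factor of $r^4$ and the aggregation cost by a factor of $r^2$; with r = 4, the correlation computation is 256 times cheaper.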
Then, the computed correlation matrix M and the color feature $f_y^{color}$ of the makeup image are input into the proposed alternative attention mechanism. Referring to FIG. 4, a channel unshuffle operation is first applied to the color feature $f_y^{color}$ so that its spatial size matches the correlation matrix M. Then, features at different spatial positions are aggregated using the correlation matrix M to obtain the semantically aligned makeup feature:

$\tilde{f}_y^{color}(i) = \sum_j \mathrm{softmax}_j\big(\epsilon \, M(i, j)\big) \cdot \mathrm{pixel\_unshuffle}(f_y^{color})(j)$

Finally, the warped feature $f_y^{warp}$, with the same size as the input feature, is obtained through the channel shuffle operation pixel_shuffle. The intuitive interpretation of this operation is shown in fig. 3: the correlation is computed at low resolution, and the correspondence is mapped onto patches at high resolution, where feature aggregation is performed.
After $f_y^{color}$ is input into the alternative attention module, the makeup feature semantically aligned with the makeup-free image is obtained in combination with the correlation matrix M. Finally, the identity feature $f_x^{id}$ of the makeup-free image, the background color feature $f_x^{back}$ of the makeup-free image, and the facial color feature $f_y^{warp}$ of the warped makeup image are concatenated along the channel dimension and input into the decoder to obtain the final makeup migration result $\hat{y}$.
The loss functions adopted in the training process comprise a semantic loss function, an identity loss function, a local color loss function and a reconstruction loss function, and an overall loss function is obtained through weighting;
Semantic loss function: this embodiment uses the semantic loss function proposed in the article "A symmetric semantic-aware transformer network for makeup transfer and removal", which constrains semantic correspondences to be established within the same semantic space without supervision.
Identity loss function: this embodiment uses the identity loss function proposed in the article "A symmetric semantic-aware transformer network for makeup transfer and removal" to constrain the gradient consistency between the migration result and the makeup-free image, maintaining identity consistency.
Local color loss function: this embodiment proposes a differentiable local color histogram loss function:

$L_{color} = \sum_{item} \big\| \mathrm{hist}(\hat{y} \odot mask_{item}^{x}) - \mathrm{hist}(y \odot mask_{item}^{y}) \big\|_1$

where $item \in \{lip, eye, face\}$ denotes decomposing the face region into lips, eyes, and face (the face does not include the lips and eyes), $\odot$ denotes point-wise multiplication, hist denotes histogram statistics, $mask_{item}$ denotes the parsing mask of the corresponding face semantic information, and $\|\cdot\|_1$ denotes the L1 norm. This loss function constrains the generated result to be consistent with the reference makeup color distribution within the same local semantic region.
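As an illustration of how a histogram can be made differentiable, the following sketch uses Gaussian soft binning; the bin count, bandwidth, and per-channel treatment are assumptions, since the description only states that the local color histogram loss is differentiable.

```python
import torch

def soft_histogram(values, bins=32, sigma=0.02):
    """Differentiable histogram of values in [0, 1] via Gaussian soft binning."""
    centers = torch.linspace(0.0, 1.0, bins, device=values.device)
    weights = torch.exp(-0.5 * ((values.reshape(-1, 1) - centers) / sigma) ** 2)
    hist = weights.sum(dim=0)
    return hist / (hist.sum() + 1e-8)  # normalize to a distribution

def local_color_loss(result, reference, masks_x, masks_y):
    """L_color sketch: sum over {lip, eye, face} of the L1 distance between
    histograms of the masked generated result and the masked reference."""
    loss = result.new_zeros(())
    for item in ("lip", "eye", "face"):
        mx = masks_x[item][:, 0] > 0      # (B, H, W) boolean region of x
        my = masks_y[item][:, 0] > 0      # (B, H, W) boolean region of y
        for c in range(result.shape[1]):  # per color channel
            h_r = soft_histogram(result[:, c][mx])
            h_y = soft_histogram(reference[:, c][my])
            loss = loss + torch.abs(h_r - h_y).sum()
    return loss
```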
Reconstruction loss function: when the input data are $(y_{gray}, y_{warp})$, this embodiment uses the L1 norm and a VGG19-based perceptual loss to constrain the reconstruction error, i.e., the consistency between the migration result and $y_{gt}$. In addition, this embodiment also adopts the widely used adversarial loss, in which a least-squares loss is used instead of the negative log-likelihood loss to stabilize training.
The experimental results of the present invention are shown in fig. 7, and the comparative results with other methods are shown in fig. 5.
Fig. 6 is a method diagram showing comparative results of examples of the present invention, and the results are partially enlarged for better comparison of makeup migration effects of different methods.
Comparing the eye makeup migration effect, the results generated by BeautyGAN fail to effectively migrate the eye makeup of the reference image, as in the second and fifth rows, because region-level semantic correspondence cannot meet the demands of eye makeup migration. For the colorful eye shadow in the fifth row, PSGAN also fails to obtain a satisfactory makeup migration effect, and the eye shadow style of its result differs noticeably from the reference image.
Comparing the blush migration effect, BeautyGAN fails to effectively extract the blush style, so its results show almost no blush migration effect, as in the third and sixth rows. PSGAN extracts part of the blush information, but the resulting blush is lighter in color and still differs considerably from the reference makeup, as in the sixth and seventh rows. In contrast, the makeup migration results of this embodiment remain highly similar to the makeup style of the reference picture, whether for lipstick, eye shadow, or blush. In the fifth, sixth, and seventh rows, for large-area blush and colorful eye shadow, the other makeup transfer algorithms fail and cannot effectively render the makeup of the reference image into the source image, whereas the makeup transfer algorithm of this embodiment still generates realistic and accurate makeup migration results.
The invention further supports:
(1) Makeup migration intensity control: because the method extracts explicit makeup information, the intensity of makeup migration can be controlled simply by weighting the extracted makeup features, as shown in fig. 8 (see the sketch after this list).
(2) Makeup editing: the proposed network structure allows makeup editing, meaning that the user can paint a desired color at the corresponding semantic position of the reference image, and the method of this embodiment generates an ideal makeup migration result, thereby achieving makeup editing. Other makeup migration technologies based on generative networks do not have this function. See fig. 9 and fig. 10.
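Since the makeup information is carried by an explicit feature, intensity control and interpolation reduce to a linear blend in feature space before decoding; a minimal sketch, assuming the blend is applied to the semantically aligned makeup features (the figures do not pin down the exact blend site):

```python
import torch

def blend_makeup(f_id, f_back, f_warp_a, f_warp_b, decoder, alpha=0.5):
    """Interpolate between two aligned makeup features f_warp_a and f_warp_b;
    alpha = 1 reproduces makeup A, alpha = 0 reproduces makeup B, and setting
    f_warp_b to the source's own color feature yields intensity control."""
    f_warp = alpha * f_warp_a + (1.0 - alpha) * f_warp_b
    return decoder(torch.cat([f_id, f_back, f_warp], dim=1))
```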
The invention studies makeup migration from the coloring perspective and formulates it as an image coloring problem, so that a large number of easily collected color images can serve as self-supervision data for the network, which avoids the lack of paired data. Extensive experiments verify the rationality and effectiveness of this makeup transfer strategy in real environments. The invention proposes an alternative attention mechanism that computes pixel-level semantic correspondence at low resolution and uses it to aggregate patch blocks at high resolution, greatly reducing the amount of computation without losing makeup detail. The invention also explores a new makeup editing scenario and verifies the effectiveness of the method in that scenario.
It should be understood that the above description of the preferred embodiments is illustrative, and not restrictive, and that various changes and modifications may be made therein by those skilled in the art without departing from the scope of the invention as defined in the appended claims.

Claims (5)

1. A face image makeup migration method based on an image makeup migration network is characterized by comprising the following steps:
step 1: acquiring an original face image;
and 2, step: inputting the obtained original face image into an image makeup transfer network to generate a final makeup transfer result;
the image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network;
the semantic correspondence sub-network comprises a feature extractor and a substitute attention module; the feature extractor is used for extracting rich space semantic features for feature matching; the alternative attention module is used for successfully mapping the learning semantic corresponding relation under the low resolution into the high resolution features;
said image coloring subnetwork comprising an identity encoder, a color distiller, and a decoder; the identity encoder is used for extracting the identity characteristics of the person; the color distiller is used for distilling the makeup characteristic of the reference image; the decoder is used for inputting the unmodified character identity characteristics and the semantically aligned makeup features into the decoder to generate a final makeup transfer result;
the feature extractor consists of 3 Convolition blocks with the step length of 2 and a bilinear interpolation operation; the identity encoder and the color distiller each consist of 2 conversion blocks with a step size of 2 and 2 Residual blocks; the decoder is composed of 4 Residual blocks and 2 conversion blocks, and performs up-sampling operation through bilinear interpolation; the alternative attention module consists of a pixel unshuffle operation, a cross-attention and a pixel shuffle;
the convergence block consists of a convergence Layer, an isolation normalization Layer and a ReLUActivate Layer which are connected in sequence; the Residual block is composed of two connected convergence blocks in series, with the last reluativationlayer removed.
2. The face image makeup migration method based on the image makeup migration network according to claim 1, characterized in that: in step 2, for real data $(x_{gray}, y)$, the set of color makeup-free images is denoted X, with a color makeup-free image $x \in X$; the set of color makeup images is denoted Y, with a color makeup image $y \in Y$; $x_{gray}$ denotes the grayscale image of the color makeup-free image x;
the specific implementation of step 2 comprises the following sub-steps:
(1) Concatenate x and y with their corresponding face parsing maps ($x_{parse}$, $y_{parse}$) and input them together into the feature extractor $E_{feature}$ to obtain the feature variables $f_x$ and $f_y$ of the makeup-free image x and the makeup image y:

$f_x = E_{feature}([x; x_{parse}]), \quad f_y = E_{feature}([y; y_{parse}])$

where $x_{parse}$, $y_{parse}$ denote the face parsing maps, obtained from the semantic segmentation network BiSeNet;
(2) Apply L2 normalization to the feature variables $f_x$ and $f_y$ along the channel dimension to obtain the normalized feature variables $\hat{f}_x$ and $\hat{f}_y$;
(3) For the local feature variables at each spatial position, compute a matching score describing the semantic correspondence by dot product, modeling the semantic correspondence and obtaining the correlation matrix M:

$M(p, q) = \hat{f}_x(p)^{\top} \hat{f}_y(q)$

where p and q are the spatial coordinates in the source image and the reference image respectively, $\hat{f}_x(p)$ denotes the local feature variable of the source image feature $\hat{f}_x$ at spatial position p, $\hat{f}_y(q)$ denotes the local feature variable of the reference image feature $\hat{f}_y$ at spatial position q, and $M(p, q) \in [0, 1]$ is the matching score of the two spatial positions p and q; the closer the matching score is to 0, the weaker the semantic correspondence; conversely, the closer the matching score is to 1, the stronger the semantic correspondence;
(4) In the image coloring sub-network, the color makeup-free image x is first converted to grayscale to obtain $x_{gray}$, which is then input into the identity encoder $E_{identity}$ to extract the identity feature $f_x^{id}$;
(5) For the color makeup image y, use the face parsing map $y_{parse}$ to separate the face region and point-multiply it with y to obtain $y_{face}$, which is then fed into the color distiller $E_{color}$ to acquire the color feature $f_y^{color}$; likewise acquire the background color feature $f_x^{back}$ of the makeup-free image x, where $x_{parse}$ is used to separate the non-face region, which is point-multiplied with x to obtain $x_{back}$;
(6) Input the computed correlation matrix M and the color feature $f_y^{color}$ of the makeup image into the proposed alternative attention module. First, apply the channel unshuffle operation to $f_y^{color}$ so that its spatial size matches the correlation matrix M; then aggregate the features at different spatial positions using the correlation matrix M to obtain the semantically aligned makeup feature $\tilde{f}_y^{color}$:

$\tilde{f}_y^{color}(i) = \sum_j \mathrm{softmax}_j\big(\epsilon \, M(i, j)\big) \cdot \mathrm{pixel\_unshuffle}(f_y^{color})(j)$

where $\mathrm{softmax}_j$ denotes the softmax operation over the columns j, M(i, j) is the computed correlation matrix, $\epsilon$ is a scaling factor, pixel_unshuffle denotes the channel unshuffle operation, and i, j denote spatial coordinates;
(7) Obtain the facial color feature $f_y^{warp}$ of the warped makeup image, with the same size as the input feature, through the channel shuffle operation pixel_shuffle;
(8) Finally, concatenate the identity feature $f_x^{id}$ of the makeup-free image, the background color feature $f_x^{back}$ of the makeup-free image, and the facial color feature $f_y^{warp}$ of the warped makeup image along the channel dimension and input them into the decoder to obtain the final makeup migration result $\hat{y}$.
3. The face image makeup migration method based on the image makeup migration network according to claim 1, characterized in that: the trained image makeup migration network is obtained by jointly training the semantic correspondence sub-network and the image coloring sub-network;
when training the semantic correspondence sub-network, the input data comprise coloring simulation data $(y_{gray}, y_{warp})$ and real data $(x_{gray}, y)$, where $y_{gray}$ and $y_{warp}$ respectively denote the grayscale image of y and an affine-transformation-warped image of y, $x_{gray}$ denotes the grayscale image of the color makeup-free image x, and y denotes the color makeup image;
the loss functions adopted in the training process comprise a semantic loss function, an identity loss function, a local color loss function and a reconstruction loss function; obtaining an integral loss function through weighting;
the local color loss function is:

$L_{color} = \sum_{item} \big\| \mathrm{hist}(\hat{y} \odot mask_{item}^{x}) - \mathrm{hist}(y \odot mask_{item}^{y}) \big\|_1$

where $item \in \{lip, eye, face\}$ denotes decomposing the face region into lips, eyes, and face, $\odot$ denotes point-wise multiplication, hist denotes histogram statistics, $mask_{item}$ denotes the parsing mask of the corresponding face semantic information, and $\|\cdot\|_1$ denotes the L1 norm; $mask_{item}^{x}$ and $mask_{item}^{y}$ denote the binary masks of the corresponding lip, eye, and face regions of images x and y respectively, obtained by separating $x_{parse}$ and $y_{parse}$; $\hat{y}$ denotes the makeup migration result obtained during training.
4. The face image makeup migration method based on the image makeup migration network according to claim 1, characterized in that: the trained image makeup migration network is obtained by jointly training the semantic correspondence sub-network and the image coloring sub-network;
when training the image coloring sub-network, the input data comprise coloring simulation data $(y_{gray}, y_{warp})$ and real data $(x_{gray}, y)$, where $y_{gray}$ and $y_{warp}$ respectively denote the grayscale image of y and an affine-transformation-warped image of y; $x_{gray}$ denotes the grayscale image of the color makeup-free image x, and y denotes the color makeup image;
the loss functions adopted in the training process comprise a semantic loss function, an identity loss function, a local color loss function, a reconstruction loss function, and an adversarial loss function, and the overall loss function is obtained by weighting;
the local color loss function is:

$L_{color} = \sum_{item} \big\| \mathrm{hist}(\hat{y} \odot mask_{item}^{x}) - \mathrm{hist}(y \odot mask_{item}^{y}) \big\|_1$

where $item \in \{lip, eye, face\}$ denotes decomposing the face region into lips, eyes, and face, $\odot$ denotes point-wise multiplication, hist denotes histogram statistics, $mask_{item}$ denotes the parsing mask of the corresponding face semantic information, and $\|\cdot\|_1$ denotes the L1 norm; $mask_{item}^{x}$ and $mask_{item}^{y}$ denote the binary masks of the corresponding lip, eye, and face regions of images x and y respectively, obtained by separating $x_{parse}$ and $y_{parse}$; $\hat{y}$ denotes the makeup migration result obtained during training;
in the adversarial loss function, a least-squares loss is used instead of the negative log-likelihood loss to stabilize training.
5. A face image makeup migration system based on an image makeup migration network is characterized by comprising an original face image acquisition module and a makeup migration module:
the original face image acquisition module is used for acquiring an original face image;
the makeup migration module is used for inputting the obtained original face image into an image makeup migration network to generate a final makeup migration result;
the image makeup migration network consists of a semantic correspondence sub-network and an image coloring sub-network;
the semantic correspondence sub-network comprises a feature extractor and a substitute attention module; the feature extractor is used for extracting rich space semantic features for feature matching; the alternative attention module is used for successfully mapping the learning semantic corresponding relation under the low resolution to the high resolution feature;
the image coloring sub-network comprises an identity encoder, a color distiller and a decoder; the identity encoder is used for extracting the identity characteristics of the person; the color distiller is used for distilling the makeup characteristic of the reference image; the decoder is used for inputting the unmodified character identity characteristics and the semantically aligned makeup features into the decoder to generate a final makeup transfer result;
the feature extractor is operated by 3 Convolition blocks with the step length of 2 and a bilinear interpolation; the identity encoder and the color distiller each consist of 2 Convolition blocks with a step size of 2 and 2 Residual blocks; the decoder is composed of 4 Residual blocks and 2 conversion blocks, and performs up-sampling operation through bilinear interpolation; the alternative attention module consists of a pixel unshuffle operation, a cross-attribute and a pixel shuffle;
the convention block consists of a convention Layer, an organization normalized Layer and a ReLUActivate Layer which are connected in sequence; the Residual block is composed of two fusion blocks connected in series, with the last ReLUActivate layer removed.
CN202211374753.XA 2022-11-04 2022-11-04 Face image makeup migration method and system based on image makeup migration network Pending CN115760552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211374753.XA CN115760552A (en) 2022-11-04 2022-11-04 Face image makeup migration method and system based on image makeup migration network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211374753.XA CN115760552A (en) 2022-11-04 2022-11-04 Face image makeup migration method and system based on image makeup migration network

Publications (1)

Publication Number Publication Date
CN115760552A true CN115760552A (en) 2023-03-07

Family

ID=85356236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211374753.XA Pending CN115760552A (en) 2022-11-04 2022-11-04 Face image makeup migration method and system based on image makeup migration network

Country Status (1)

Country Link
CN (1) CN115760552A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036157A (en) * 2023-10-09 2023-11-10 易方信息科技股份有限公司 Editable simulation digital human figure design method, system, equipment and medium
CN117036157B (en) * 2023-10-09 2024-02-20 易方信息科技股份有限公司 Editable simulation digital human figure design method, system, equipment and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination