CN110084121A

CN110084121A - Method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network

Info

Publication number: CN110084121A
Application number: CN201910240461.9A
Authority: CN
Inventors: 吴晨; 李雷; 陈芸
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2019-08-02

Abstract

The present invention discloses a method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network. The method comprises the following steps. S1: collect facial-expression pictures of various kinds and classify them one by one by expression. S2: preprocess the pictures by removing blurred photos, using a face detection algorithm to obtain five facial keypoints, and cropping the face pictures uniformly according to the keypoints. S3: build a cycle generative adversarial network consisting of a generator and a discriminator, feed the two classes of preprocessed pictures into the network, compute the loss function, and train the network. S4: take the trained generator as the facial expression transfer tool and apply it in practice. The spectrally normalized cycle generative adversarial network allows a single generator to perform transfer among multiple facial expressions, and the generated expressions are more natural and more robust.

Description

Method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network

Technical field

The present invention relates to a method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network, which can be used in image processing within computer vision.

Background art

In recent years, with the rapid development of artificial intelligence, deep learning has become a popular research field, and the introduction of generative adversarial networks has accelerated its progress. Ian Goodfellow and colleagues at the University of Montreal proposed the generative adversarial network in 2014, and it has since become one of the research hotspots in deep learning.

A generative adversarial network is a generative model whose structure is inspired by the two-player zero-sum game. It contains a generator and a discriminator. The generator learns the underlying distribution of the real data and then produces a fake data distribution that approximates the real one. The discriminator is a classifier that judges whether data are real. Through continual competitive learning between the two networks, the generator produces increasingly realistic fake data until they can pass for real data.

A cycle generative adversarial network combines the generative adversarial network with image-to-image translation. It is essentially two mirror-symmetric generative adversarial networks that form a ring. The two networks share two generators and two discriminators, that is, there are two discriminators and two generators in total. Cycle generative adversarial networks are well suited to picture style transfer, but when applied to transferring a local feature of a picture, such as removing glasses from a face, they tend to change the overall color tone of the picture; that is, other regions of the face may change after the glasses are removed.

Summary of the invention

The object of the present invention is to solve the above problems in the prior art by proposing a method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network.

The object of the invention is achieved through the following technical solution: a method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network, the method comprising the following steps:

S1: collect facial-expression pictures of various kinds and classify them one by one by expression to obtain a classified facial-expression dataset;

S2: preprocess the pictures of the classified facial-expression dataset obtained in step S1 by removing blurred photos, using a face detection algorithm to obtain five facial keypoints, and cropping the face pictures uniformly according to the keypoints, to obtain a preprocessed facial-expression dataset;

S3: build a cycle generative adversarial network consisting of a generator and a discriminator, feed the preprocessed facial-expression dataset obtained in step S2 into the network, compute the loss function and train the network, to obtain a trained generator model;

S4: take the trained generator obtained in step S3 as the facial expression transfer tool and apply it in practice.

Preferably, in step S1, the collected facial-expression pictures must be balanced across classes; each class needs more than two thousand pictures, and the faces must be clear and in an upright pose.

Preferably, in step S2, blurred photos are removed, a face detection algorithm is used to obtain five facial keypoints, and the face pictures are cropped uniformly according to the keypoints; if some class of facial-expression pictures has too few samples, data augmentation is applied to that class.

Preferably, in step S2, the cycle generative adversarial network includes a generator G and a discriminator D. The generator G is responsible for generating a color attention map and a mask attention map from a facial-expression picture and a facial-expression one-hot vector. The discriminator D is responsible for computing the probability that an input picture is a real facial-expression picture, together with the facial-expression one-hot vector of the input picture. The discriminator and the generator each need only one model to perform multi-domain facial expression conversion.

Preferably, the generator consists of four parts: convolutional layers, deconvolution layers, residual blocks and a convolutional attention module. The convolutional layers extract feature information from the picture through convolution operations, and the extracted information becomes increasingly abstract as convolution operations are stacked. The residual blocks in the network structure pass low-level features to higher layers and suppress vanishing gradients. The stacked convolutional layers act as an encoder that extracts high-dimensional information; the stacked deconvolution layers act as a decoder, and the decoding process restores low-level features from the feature vector. The discriminator consists of two parts: convolutional layers and a convolutional attention module. The input of the discriminator is a picture, and its output is the probability that the input picture is a real picture together with the facial-expression one-hot vector of the input picture.

Preferably, the convolutional attention module includes a channel attention module and a spatial attention module. The channel attention module weights the channel features to enhance important channel information; the spatial attention module weights the spatial features of the feature map to enhance important spatial information in the feature map.

Preferably, in step S3, constraining the spectral norm of each convolution in the generator and the discriminator makes the nonlinear mapping functions fitted by the whole generator and discriminator satisfy Lipschitz continuity, which helps the generator fit the real data distribution. In step S3, the color attention map and the mask attention map produced by the generator are synthesized with the real picture as shown in Fig. 2: first, the color attention map is multiplied pointwise with the mask attention map to obtain the values of the changed face region; next, the inverted mask attention map is multiplied pointwise with the real picture to obtain the values of the unchanged face region; finally, the two are added pointwise to obtain the synthesized facial-expression picture.

Preferably, the objectives of the discriminator and the generator are combined into a discriminator loss function and a classification loss function. The cycle loss function expresses the expectation that a picture remains consistent after the generator has converted its facial expression twice, and that the final expression one-hot vector is consistent with the expression one-hot vector of the original picture, so that a cycle is formed.

Preferably, training the spectrally normalized cycle generative adversarial network in step S3 includes the following steps:

S31: input the real picture of facial expression a from step S2 and the one-hot vector of facial expression b into the generator G to obtain a color attention map and a mask attention map; synthesize the color attention map, the mask attention map and the real picture of expression a into a generated picture of expression b; input the generated picture of expression b into the discriminator D to obtain the probability that it is real and a predicted facial-expression one-hot vector; minimize the difference between the predicted one-hot vector and the one-hot vector of expression b;

S32: input the generated picture of expression b from step S31 and the one-hot vector of facial expression a into the generator G to obtain a color attention map and a mask attention map; synthesize the color attention map, the mask attention map and the generated picture of expression b into a generated picture of expression a; minimize the difference between the generated picture of expression a and the real picture of expression a;

S33: train the generator G and the discriminator D so that the loss function of the network is minimized.

Preferably, the loss function in step S33 is designed as:

$L(G, D) = L_{GAN}(G, D) + \lambda L_{cyc}(G) + \beta L_{TV}(A) + \chi L_{label}(D)$

where G is the generator and D is the discriminator; $D_I$ is the discriminator's predicted probability that the input picture is a facial-expression picture; $D_{label}$ is the facial-expression one-hot vector that the discriminator predicts for the input picture; A is the mask attention map and $A_{i,j}$ is its entry in row i, column j; $x_a$ is a picture of facial expression a from the training samples; a is the facial-expression one-hot vector corresponding to $x_a$; b is the facial-expression one-hot vector specified during training; x' is the generated picture of expression b obtained by passing the picture of expression a and the one-hot vector b through the generator; x'' is the generated picture of expression a obtained by passing the generated picture of expression b and the one-hot vector a through the generator; $A_{G(x_a, b)}$ and $C_{G(x_a, b)}$ are the mask attention map and the color attention map obtained by passing $x_a$ and the one-hot vector b through the generator; $A_{G(x', a)}$ and $C_{G(x', a)}$ are the mask attention map and the color attention map obtained by passing x' and the one-hot vector a through the generator; λ, β and χ are hyperparameters; the product operator in the formulas denotes pointwise multiplication; $L_{GAN}$ is the discriminator loss, $L_{cyc}$ is the cycle loss, $L_{TV}$ is the total variation regularization loss, and $L_{label}$ is the label prediction loss.

Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects: the method addresses problems that arise when transferring local picture features, such as the failure to focus on the local feature and striped noise. The spectrally normalized cycle generative adversarial network allows a single generator to perform transfer among multiple facial expressions, and the generated expressions are more natural and more robust.

Description of the drawings

Fig. 1 is a flow chart of the method of the present invention for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network.

Fig. 2 is a structure diagram of the generator in the spectrally normalized cycle generative adversarial network of the present invention.

Fig. 3 is a structure diagram of the residual block in the spectrally normalized cycle generative adversarial network of the present invention.

Fig. 4 is a structure diagram of the convolutional attention module in the spectrally normalized cycle generative adversarial network of the present invention.

Fig. 5 is a structure diagram of the channel attention module of the convolutional attention module in the spectrally normalized cycle generative adversarial network of the present invention.

Fig. 6 is a structure diagram of the spatial attention module of the convolutional attention module in the spectrally normalized cycle generative adversarial network of the present invention.

Fig. 7 is a structure diagram of the discriminator in the spectrally normalized cycle generative adversarial network of the present invention.

Fig. 8 is a flow chart of how the generator in the spectrally normalized cycle generative adversarial network of the present invention synthesizes a generated picture.

Detailed description of the embodiments

The objects, advantages and features of the present invention are illustrated and explained through the following non-limiting description of preferred embodiments. These embodiments are only typical examples of applying the technical solution of the present invention; all technical solutions formed by equivalent replacement or equivalent transformation fall within the scope of protection claimed by the present invention.

The present invention discloses a method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network. The specific flow is shown in Fig. 1, and the method comprises the following steps:

S1: collect facial-expression pictures of various kinds and classify them one by one by expression to obtain a classified facial-expression dataset;

Specifically: locate picture websites that host facial-expression pictures and make sure the pictures are reasonably clear. Use web-crawling techniques to download each class of facial-expression pictures from the websites, making sure that each class contains more than 1,000 pictures.

S2: preprocess the pictures of the classified facial-expression dataset by removing blurred photos, using a face detection algorithm to obtain five facial keypoints, and cropping the face pictures uniformly according to the keypoints, to obtain a preprocessed facial-expression dataset;

Screen the pictures one by one and remove those that are blurred or whose content does not match. Crop the screened pictures uniformly to 128×128 pixels and save them separately by facial expression class.
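The uniform cropping by keypoints can be illustrated with a minimal sketch. The patent does not state how the crop box is derived from the five keypoints, so the square box, the `margin` padding and the function names below are assumptions made for illustration only.

```python
def square_crop_box(keypoints, margin=0.6):
    """Return (left, top, right, bottom) of a square box around facial keypoints.

    keypoints: list of (x, y) pairs, e.g. two eyes, nose tip, two mouth corners.
    margin: hypothetical padding, as a fraction of the keypoint extent.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    # Centre of the keypoint bounding box.
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0
    # Side length: the larger keypoint extent, enlarged by the margin.
    extent = max(max(xs) - min(xs), max(ys) - min(ys))
    half = extent * (1.0 + margin) / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


def scale_to_output(box, out_size=128):
    """Scale factor that maps the square crop to out_size x out_size pixels."""
    left, _, right, _ = box
    return out_size / (right - left)
```

Because every face is cropped relative to its own keypoints and then resized to the same 128×128 output, the facial features land at roughly the same positions in all training pictures.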

S3: build a cycle generative adversarial network consisting of a generator and a discriminator, feed the preprocessed facial-expression dataset into the network, compute the loss function and train the network, to obtain a trained generator model;

The network structure of the generator is shown in Fig. 2. The generator consists of four parts: convolutional layers, deconvolution layers, residual blocks and a convolutional attention module. The convolutional layers extract feature information from the picture through convolution operations, and the extracted information becomes increasingly abstract as convolution operations are stacked. The residual block in the network structure is shown in Fig. 3; residual blocks pass low-level features to higher layers and suppress vanishing gradients. The stacked convolutional layers act as an encoder that extracts high-dimensional information, and the stacked deconvolution layers act as a decoder.

In step S3, a convolutional attention module is added to the network structures of the generator and the discriminator. The convolutional attention module is shown in Fig. 4; it comprises a channel attention module and a spatial attention module. The channel attention module is shown in Fig. 5, and the spatial attention module is shown in Fig. 6. The channel attention module weights the channel features to enhance important channel information. The spatial attention module weights the spatial features of the feature map to enhance important spatial information in the feature map. The input of the generator is a picture of one class, and its output is a color attention map and a mask attention map.
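The weighting performed by the two attention modules can be sketched as follows. This is a deliberately simplified illustration: the learned layers that a real module would contain are replaced by a fixed sum of average and maximum responses, and the ordering (channel attention first, then spatial attention) is an assumption, since the details of Figs. 4 to 6 are not reproduced in the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(feat):
    """Scale each channel by a weight derived from its average and max response.

    feat: C x H x W nested lists. The fixed avg+max scoring stands in for
    learned layers and is an assumption of this sketch.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values) + max(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each spatial position by a weight derived from cross-channel statistics."""
    height, width = len(feat[0]), len(feat[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in feat]
    for i in range(height):
        for j in range(width):
            column = [channel[i][j] for channel in feat]
            weight = sigmoid(sum(column) / len(column) + max(column))
            for c, value in enumerate(column):
                out[c][i][j] = value * weight
    return out


def conv_attention(feat):
    """Channel attention followed by spatial attention (ordering assumed)."""
    return spatial_attention(channel_attention(feat))
```

Both weights lie strictly between 0 and 1, so the module only rescales features: strongly responding channels and positions are kept closer to their original magnitude than weak ones.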

In step S3, the continuity of the whole network is constrained in the network structures of the generator and the discriminator: by constraining the spectral norm of each convolution in the discriminator and the generator, the whole network is made to satisfy Lipschitz continuity.
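One common way to constrain the spectral norm of a layer is to estimate the largest singular value of its weight matrix by power iteration and divide the weights by it. The patent does not say how the constraint is computed, so the sketch below is an assumption; it also assumes that a convolution kernel has already been reshaped into a 2-D matrix, and all function names are illustrative.

```python
import math
import random


def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def transpose(w):
    return [list(col) for col in zip(*w)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(w, n_iter=50, seed=0):
    """Estimate the largest singular value of matrix w by power iteration."""
    rng = random.Random(seed)
    u = normalize([rng.gauss(0.0, 1.0) for _ in w])
    wt = transpose(w)
    for _ in range(n_iter):
        v = normalize(matvec(wt, u))  # right singular vector estimate
        u = normalize(matvec(w, v))   # left singular vector estimate
    # sigma = u^T W v
    return sum(ui * x for ui, x in zip(u, matvec(w, v)))


def spectrally_normalize(w, n_iter=50):
    """Divide w by its estimated spectral norm so the result has norm about 1."""
    sigma = spectral_norm(w, n_iter)
    return [[x / sigma for x in row] for row in w]
```

A linear map whose spectral norm is at most 1 is 1-Lipschitz, so normalizing every layer in this way bounds the Lipschitz constant of the stacked network, provided the activation functions are themselves 1-Lipschitz.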

In step S3, the cycle generative adversarial network includes a generator G and a discriminator D. The generator G is responsible for generating a color attention map and a mask attention map from a facial-expression picture and a facial-expression one-hot vector. The discriminator D is responsible for computing the probability that an input picture is a real facial-expression picture, together with the facial-expression one-hot vector of the input picture. The discriminator and the generator each need only one model to perform multi-domain facial expression conversion.
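A short sketch shows what a facial-expression one-hot vector looks like and one way it could be supplied to the generator alongside the picture. The patent does not describe how the vector and the picture are combined; appending one constant channel per label entry is an assumption borrowed from common multi-domain practice, and the function names are illustrative.

```python
def one_hot(index, num_classes):
    """One-hot vector for expression class `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("class index out of range")
    return [1.0 if k == index else 0.0 for k in range(num_classes)]


def condition_image(image, label):
    """Append one constant channel per label entry to a channels-first image.

    image: list of channels, each a list of rows (C x H x W).
    This conditioning scheme is an assumption, not taken from the patent.
    """
    height, width = len(image[0]), len(image[0][0])
    label_planes = [[[v] * width for _ in range(height)] for v in label]
    return image + label_planes
```

With this kind of conditioning a single generator can serve every target expression, because the desired expression is part of the input rather than being baked into a separate model.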

The network structure of the discriminator is shown in Fig. 7. The discriminator consists of two parts: convolutional layers and a convolutional attention module. The input of the discriminator is a picture, and its output is the probability that the input picture is a real picture together with the facial-expression one-hot vector of the input picture.

The loss function is composed of the discriminator loss function, the cycle loss function, the total variation loss function and the classification loss function. The discriminator aims to judge accurately whether a picture is real and to identify the expression class of the input picture. The generator aims to generate the facial-expression image that corresponds to the input face image and the specified facial-expression one-hot vector; that is, it wants the discriminator to judge the generated picture as real and to identify the class of the generated facial-expression image correctly. Together these objectives form the discriminator loss function and the classification loss function.
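The classification objective can be illustrated with a cross-entropy between the discriminator's predicted expression distribution and the target one-hot vector. The exact form of the classification loss is not given in the text, so cross-entropy is an assumption here, as are the function name and the `eps` stabilizer.

```python
import math


def label_loss(predicted, target, eps=1e-12):
    """Cross-entropy between a predicted distribution and a one-hot target.

    predicted: class probabilities that sum to 1.
    target: one-hot vector of the desired expression.
    eps: small constant that avoids log(0); an illustrative choice.
    """
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, target))
```

The loss is close to zero when the predicted probability of the target expression is close to 1 and grows as that probability falls, which is the behaviour the classification objective calls for.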

The cycle loss function expresses the expectation that a picture remains consistent after the generator has converted its facial expression twice and that the final expression one-hot vector is consistent with the expression one-hot vector of the original picture, so that a cycle is formed. The total variation loss function is intended to give the mask attention map A produced by the generator good continuity and to keep the mask attention map as small as possible, so that the face is changed as little as possible during facial expression conversion.
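A total variation penalty on the mask attention map can be sketched as the sum of squared differences between neighbouring entries, with a separate magnitude term that keeps the mask small. The text does not give the exact formula, so the squared differences and the mean-absolute-value magnitude term are assumptions of this sketch.

```python
def total_variation(mask):
    """Sum of squared differences between neighbouring entries of a 2-D mask."""
    height, width = len(mask), len(mask[0])
    tv = 0.0
    for i in range(height):
        for j in range(width):
            if i + 1 < height:  # vertical neighbour
                tv += (mask[i + 1][j] - mask[i][j]) ** 2
            if j + 1 < width:   # horizontal neighbour
                tv += (mask[i][j + 1] - mask[i][j]) ** 2
    return tv


def mask_magnitude(mask):
    """Mean absolute value of the mask; penalizing it keeps the edited region small."""
    values = [abs(v) for row in mask for v in row]
    return sum(values) / len(values)
```

A constant mask has zero total variation, while a mask with abrupt jumps between neighbouring pixels is penalized, which pushes the generator toward smooth, contiguous edit regions.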

The training process is as follows:

S31: input the real picture of facial expression a from step S2 and the one-hot vector of facial expression b into the generator G to obtain a color attention map and a mask attention map, then synthesize the color attention map, the mask attention map and the real picture of expression a into a generated picture of expression b.

Input the generated picture of expression b into the discriminator D to obtain the probability that the generated picture is real and a predicted facial-expression one-hot vector. Minimize the difference between the predicted one-hot vector and the one-hot vector of expression b.

S32: input the generated picture of expression b from step S31 and the one-hot vector of facial expression a into the generator G to obtain a color attention map and a mask attention map, then synthesize the color attention map, the mask attention map and the generated picture of expression b into a generated picture of expression a. The synthesis process is shown in Fig. 8. Minimize the difference between the generated picture of expression a and the real picture of expression a.
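The synthesis used in S31 and S32 (mask times color map, plus inverted mask times input picture) and the comparison between the reconstructed and the real picture can be sketched as follows. The blend follows the description of the synthesis process in this document; using the mean absolute difference as the distance is an assumption, since the cycle loss formula is not reproduced in the text.

```python
def compose(color, mask, image):
    """Pointwise blend: mask * color + (1 - mask) * image.

    color, image: C x H x W nested lists; mask: H x W values in [0, 1].
    Where the mask is 1 the output takes the color map (changed region);
    where it is 0 the output keeps the input picture (unchanged region).
    """
    return [
        [
            [m * c + (1.0 - m) * x for c, m, x in zip(c_row, m_row, x_row)]
            for c_row, m_row, x_row in zip(c_chan, mask, x_chan)
        ]
        for c_chan, x_chan in zip(color, image)
    ]


def l1_distance(a, b):
    """Mean absolute difference between two C x H x W images (assumed distance)."""
    diffs = [
        abs(p - q)
        for ca, cb in zip(a, b)
        for ra, rb in zip(ca, cb)
        for p, q in zip(ra, rb)
    ]
    return sum(diffs) / len(diffs)
```

In a training loop, `compose` would be applied once with the generator outputs for (real picture a, vector b) and once more with the outputs for (generated picture b, vector a); `l1_distance` between the second result and the real picture a then plays the role of the quantity minimized in S32.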

Loss function are as follows:

L (G, D)=L_GAN(G, D)+λ L_cyc(G)+βL_TV(A)+χL_label(D)

Wherein,

G is generator in the formula, and D is arbiter, D_IIt is that arbiter prediction inputs picture as the general of human face expression figure Rate, D_labelIt is the only hot vector of human face expression of arbiter prediction input picture, A is that mask pays attention to trying hard to, A_{I, j}For mask attention I-th row j of figure is arranged, x_aIt is the human face expression a picture in training sample, a x_aSolely hot vector, b are training to corresponding human face expression Solely hot vector (different and a), x ' are to be obtained by human face expression a picture and only hot vector b by generator to the human face expression of Shi Zhiding The human face expression b picture for the generation arrived, x " are that the human face expression b picture and only hot vector a by generating are obtained by generator The human face expression a picture of generation,It is x_aPicture and only hot vector b pay attention to trying hard to by the mask that generator obtains,It is x_aPicture and only hot vector b pay attention to trying hard to by the color that generator obtains, A_{G (x ', a)}It is x ' picture and only hot vector A pays attention to trying hard to by the mask that generator obtains, C_{G (x ', a)}It is the color note that x ' picture and only hot vector a are obtained by generator Meaning is tried hard to, and λ, β, χ are hyper parameter,To be multiplied point by point. L_GANIt is arbiter loss, L_cycIt is circulation loss, L_TVBe full variation just Then change loss, L_labelIt is Tag Estimation loss.
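The weighted sum in the loss formula can be written out directly. The default hyperparameter values below are hypothetical placeholders chosen for illustration; the document does not state the values of λ, β and χ.

```python
def total_loss(l_gan, l_cyc, l_tv, l_label, lam=10.0, beta=1e-4, chi=1.0):
    """L = L_GAN + lam * L_cyc + beta * L_TV + chi * L_label.

    lam, beta, chi correspond to the hyperparameters lambda, beta and chi;
    their default values here are hypothetical.
    """
    return l_gan + lam * l_cyc + beta * l_tv + chi * l_label
```

Raising `lam` favours faithful reconstruction after the two conversions, raising `beta` favours smoother masks, and raising `chi` favours generated pictures whose expression class is easy to recognize.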

S4: take the trained generator as the facial expression transfer tool and apply it in practice.

In summary, the present invention inputs facial-expression pictures and the corresponding facial-expression one-hot vectors into the spectrally normalized cycle generative adversarial network and trains the model to obtain a well-trained generator G. The generator G can then generate the corresponding facial-expression picture from an input face image and a specified facial-expression one-hot vector. Applying the spectrally normalized cycle generative adversarial network to facial expression transfer achieves good conversion of facial expressions and mitigates two shortcomings of the cycle generative adversarial network: that one model can be trained for only one kind of expression transfer, and that it overfits easily.

The present invention has many other embodiments; all technical solutions formed by equivalent replacement or equivalent transformation fall within the scope of protection of the present invention.

Claims

1. A method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network, characterized in that the method comprises the following steps:

S1: collect facial-expression pictures of various kinds and classify them one by one by expression to obtain a classified facial-expression dataset;

S2: preprocess the pictures of the classified facial-expression dataset obtained in step S1 by removing blurred photos, using a face detection algorithm to obtain five facial keypoints, and cropping the face pictures uniformly according to the keypoints, to obtain a preprocessed facial-expression dataset;

S3: build a cycle generative adversarial network consisting of a generator and a discriminator, feed the preprocessed facial-expression dataset obtained in step S2 into the network, compute the loss function and train the network, to obtain a trained generator model;

S4: take the trained generator obtained in step S3 as the facial expression transfer tool and apply it in practice.

2. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 1, characterized in that: in step S1, the collected facial-expression pictures must be balanced across classes, each class needs more than two thousand pictures, and the faces must be clear and in an upright pose.

3. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 1, characterized in that: in step S2, blurred photos are removed, a face detection algorithm is used to obtain five facial keypoints, and the face pictures are cropped uniformly according to the keypoints; if some class of facial-expression pictures has too few samples, data augmentation is applied to that class.

4. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 1, characterized in that: in step S2, the cycle generative adversarial network includes a generator G and a discriminator D; the generator G is responsible for generating a color attention map and a mask attention map from a facial-expression picture and a facial-expression one-hot vector; the discriminator D is responsible for computing the probability that an input picture is a real facial-expression picture, together with the facial-expression one-hot vector of the input picture; the discriminator and the generator each need only one model to perform multi-domain facial expression conversion.

5. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 4, characterized in that: the generator consists of four parts: convolutional layers, deconvolution layers, residual blocks and a convolutional attention module; the convolutional layers extract feature information from the picture through convolution operations, and the extracted information becomes increasingly abstract as convolution operations are stacked; the residual blocks in the network structure pass low-level features to higher layers and suppress vanishing gradients; the stacked convolutional layers act as an encoder that extracts high-dimensional information; the stacked deconvolution layers act as a decoder; the decoding process restores low-level features from the feature vector; the discriminator consists of two parts: convolutional layers and a convolutional attention module; the input of the discriminator is a picture, and its output is the probability that the input picture is a real picture together with the facial-expression one-hot vector of the input picture.

6. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 5, characterized in that: the convolutional attention module includes a channel attention module and a spatial attention module; the channel attention module weights the channel features to enhance important channel information; the spatial attention module weights the spatial features of the feature map to enhance important spatial information in the feature map.

7. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 4, characterized in that: in step S3, constraining the spectral norm of each convolution in the generator and the discriminator makes the nonlinear mapping functions fitted by the whole generator and discriminator satisfy Lipschitz continuity, which helps the generator fit the real data distribution; in step S3, the color attention map and the mask attention map produced by the generator are synthesized with the real picture as shown in Fig. 2: first, the color attention map is multiplied pointwise with the mask attention map to obtain the values of the changed face region; next, the inverted mask attention map is multiplied pointwise with the real picture to obtain the values of the unchanged face region; finally, the two are added pointwise to obtain the synthesized facial-expression picture.

8. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 4, characterized in that: the objectives of the discriminator and the generator are combined into a discriminator loss function and a classification loss function; the cycle loss function expresses the expectation that a picture remains consistent after the generator has converted its facial expression twice, and that the final expression one-hot vector is consistent with the expression one-hot vector of the original picture, so that a cycle is formed.

9. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 1, characterized in that: training the spectrally normalized cycle generative adversarial network in step S3 includes the following steps:

S31: input the real picture of facial expression a from step S2 and the one-hot vector of facial expression b into the generator G to obtain a color attention map and a mask attention map; synthesize the color attention map, the mask attention map and the real picture of expression a into a generated picture of expression b; input the generated picture of expression b into the discriminator D to obtain the probability that it is real and a predicted facial-expression one-hot vector; minimize the difference between the predicted one-hot vector and the one-hot vector of expression b;

S32: input the generated picture of expression b from step S31 and the one-hot vector of facial expression a into the generator G to obtain a color attention map and a mask attention map; synthesize the color attention map, the mask attention map and the generated picture of expression b into a generated picture of expression a; minimize the difference between the generated picture of expression a and the real picture of expression a;

10. The method for implementing facial expression transfer based on a spectrally normalized cycle generative adversarial network according to claim 1, characterized in that:

the loss function in step S33 is designed as:

$L(G, D) = L_{GAN}(G, D) + \lambda L_{cyc}(G) + \beta L_{TV}(A) + \chi L_{label}(D)$

where G is the generator and D is the discriminator; $D_I$ is the discriminator's predicted probability that the input picture is a facial-expression picture; $D_{label}$ is the facial-expression one-hot vector that the discriminator predicts for the input picture; A is the mask attention map and $A_{i,j}$ is its entry in row i, column j; $x_a$ is a picture of facial expression a from the training samples; a is the facial-expression one-hot vector corresponding to $x_a$; b is the facial-expression one-hot vector specified during training; x' is the generated picture of expression b obtained by passing the picture of expression a and the one-hot vector b through the generator; x'' is the generated picture of expression a obtained by passing the generated picture of expression b and the one-hot vector a through the generator; $A_{G(x_a, b)}$ and $C_{G(x_a, b)}$ are the mask attention map and the color attention map obtained by passing $x_a$ and the one-hot vector b through the generator; $A_{G(x', a)}$ and $C_{G(x', a)}$ are the mask attention map and the color attention map obtained by passing x' and the one-hot vector a through the generator; λ, β and χ are hyperparameters; the product operator in the formulas denotes pointwise multiplication; $L_{GAN}$ is the discriminator loss, $L_{cyc}$ is the cycle loss, $L_{TV}$ is the total variation regularization loss, and $L_{label}$ is the label prediction loss.