CN108288072A - Facial expression synthesis method based on a generative adversarial network - Google Patents
Facial expression synthesis method based on a generative adversarial network
- Publication number
- CN108288072A CN108288072A CN201810078963.1A CN201810078963A CN108288072A CN 108288072 A CN108288072 A CN 108288072A CN 201810078963 A CN201810078963 A CN 201810078963A CN 108288072 A CN108288072 A CN 108288072A
- Authority
- CN
- China
- Prior art keywords
- expression
- facial
- face
- facial expression
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06V10/7553 — Deformable models or variational models based on shape, e.g. active shape models [ASM] (under G06V10/75, organisation of the matching processes)
- G06V10/757 — Matching configurations of points or features (under G06V10/74, image or video pattern matching)
- G06V40/171 — Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships (under G06V40/16, human faces)
Abstract
Proposed in the present invention is a facial expression synthesis method based on a generative adversarial network. Its main contents include geometry-guided facial expression synthesis and facial geometric operations. The process is as follows: first, given a heat map of the target facial expression and a neutral frontal face, a new face image is synthesized accordingly; then all loss functions are weighted and summed to obtain the total loss function; next, facial expression editing is guided by the geometric positions of a set of fiducial points, and the facial expression transfer result is obtained with the expression synthesis model; finally, facial expression interpolation is performed by linearly adjusting the values of the shape parameters. Using a geometry-guided generative adversarial network, the invention can generate photorealistic images of different expressions from a single image, exercise fine-grained control over the synthesized image, and easily perform facial expression transfer and interpolation, realizing facial expression transfer and cross-expression recognition.
Description
Technical Field
The invention relates to the field of facial expression synthesis, and in particular to a facial expression synthesis method based on a generative adversarial network.
Background
The human face plays a very important role in conveying information during human communication, expressing emotional and mental states. In recent years, automatic processing of facial expressions by computers has become a hot research topic in computer vision, computer graphics, pattern recognition, and related fields, with wide application prospects in video conferencing, film production, intelligent human-computer interfaces, and more. Facial expression processing includes facial expression recognition and facial expression synthesis. Facial expression synthesis makes devices more convenient to use: if it enables a computer to generate fine, vivid facial expression animations, it can further increase the enjoyment of human-computer interaction and create a better interactive atmosphere. It can also be applied to character animation in films, games, and advertisements, where combining expression synthesis with data-driven or parameter-driven methods can greatly reduce production costs and improve efficiency. Reconstructing and synthesizing the face of a criminal suspect can provide key clues for case investigation and pursuit; expression synthesis can also make traditional computer-aided instruction more vivid and interesting, further improving students' enthusiasm for learning. However, conventional methods use variational autoencoders which, although able to generate high-resolution realistic images, are computationally very complex, and images generated by deep generative models often lack detail, are blurred, or have low resolution.
The invention provides a facial expression synthesis method based on a generative adversarial network, which comprises: first, given a heat map of a target facial expression and a neutral frontal face, correspondingly synthesizing a new face image; then performing a weighted summation of all loss functions to obtain a total loss function; then adopting the geometric positions of a set of fiducial points to guide facial expression editing and obtaining a facial expression transfer result with the expression synthesis model; and finally performing facial expression interpolation by linearly adjusting the values of the shape parameters. The invention uses a geometry-guided generative adversarial network to generate vivid images with different expressions from a single image, performs fine-grained control over the synthesized image, and can also easily carry out facial expression transfer and interpolation, realizing facial expression transfer and cross-expression recognition.
Disclosure of Invention
The invention aims to provide a facial expression synthesis method based on a generative adversarial network, in order to solve problems such as image blurring and low resolution.
In order to solve the above problems, the present invention provides a facial expression synthesis method based on a generative adversarial network, which mainly comprises:
geometrically guided facial expression synthesis;
and (II) geometric operation of the face.
In the geometry-guided facial expression synthesis, as in an Active Appearance Model (AAM), the face geometry is defined by a set of fiducial points; a heat map is used to encode the locations of the facial fiducials and provides a per-pixel likelihood for each fiducial location; given a heat map of the target facial expression and a frontal face without expression (hereinafter, a neutral face), a new face image (an expressive face) is synthesized accordingly;
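The patent does not disclose how the heat map is rendered from the fiducial points; a common construction (an assumption here) is one Gaussian channel per fiducial, so each channel peaks at its point's location. A minimal numpy sketch:

```python
import numpy as np

def fiducial_heatmap(points, size=64, sigma=1.5):
    """Render K fiducial points as a K-channel Gaussian heat map.

    Each channel encodes a per-pixel likelihood for one fiducial's
    location, in the spirit of the geometry heat map H_E.
    points: (K, 2) array of (x, y) coordinates in pixel units.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    maps = np.empty((len(points), size, size))
    for k, (px, py) in enumerate(points):
        d2 = (xs - px) ** 2 + (ys - py) ** 2
        maps[k] = np.exp(-d2 / (2.0 * sigma ** 2))  # peak 1.0 at the point
    return maps

hm = fiducial_heatmap(np.array([[10.0, 20.0], [40.0, 40.0]]))
```

The Gaussian width `sigma` trades localization sharpness against gradient smoothness; both the kernel and its width are illustrative choices, not values from the patent.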
given a pair of generators G_E: (I_N, H_E) → I_E and G_N: (I_E, H_E) → I_N, where I_N is a neutral face, I_E is an expressive face, and H_E is the heat map corresponding to I_E; the two discriminators associated with these two generators are D_E and D_N, which distinguish real triples (I, H, I′) from generated triples (I, H, G(I, H)); I and I′ are the neutral and expressive face images, or vice versa, depending on the generator;
H_E serves as an auxiliary condition in both face-editing modes, namely expression control during synthesis and expression removal; during expression synthesis, H_E specifies the target expression, so that G_E can convert the neutral face I_N into the desired expression; during expression removal, H_E guides the recovery of I_N from I_E;
the losses of the geometry-guided facial expression synthesis comprise an adversarial loss, a pixel loss, a cycle-consistency loss, and an identity-retention loss; the weighted sum of these four loss functions is the total loss function.
Further, for the adversarial loss and pixel loss: since the proposed face-editing model generates results conditioned on the input face image and heat map, a generative adversarial network (GAN) is applied in this conditional setting; the adversarial losses of the discriminator and the generator are given respectively as follows:
L_D-adv = −E_(I,H,I′)~P(I,H,I′) [log D(I, H, I′)] − E_(I,H)~P(I,H) [log(1 − D(I, H, G(I, H)))]   (1)
L_G-adv = −E_(I,H)~P(I,H) [log D(I, H, G(I, H))]   (2)
The generator is not only tasked with deceiving the discriminator, but also with synthesizing an image as close as possible to the ground truth; the per-pixel loss L_pixel forces the transformed face image to have a small distance to the ground-truth data in the original pixel space; L_pixel takes the following form:
L_pixel = E_(I,H,I′)~P(I,H,I′) ‖I′ − G(I, H)‖₁   (3)
The L1 distance is used because it discourages blurred output; (I, H, I′) is (I_N, H_E, I_E) or (I_E, H_E, I_N), depending on the generator.
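The per-pixel term of equation (3) is just a mean absolute difference over corresponding pixels; a one-function numpy sketch (a Monte-Carlo estimate over a batch, not the patent's implementation):

```python
import numpy as np

def l1_pixel_loss(target, generated):
    """Estimate of equation (3): mean absolute per-pixel difference
    between the ground-truth face I' and the generated face G(I, H)."""
    return float(np.mean(np.abs(target - generated)))

# Toy check: two images differing by a constant 0.5 everywhere.
loss = l1_pixel_loss(np.full((8, 8), 1.0), np.full((8, 8), 0.5))
```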
Further, for the cycle-consistency loss: the generators G_E and G_N construct a complete mapping cycle between the neutral face and the expressive face; if a face image is converted from a neutral expression to an angry expression and then converted back to a neutral expression, the same face image should ideally be obtained; therefore, an extra cycle-consistency loss L_cyc is introduced to ensure the consistency of the source image with the reconstructed image, e.g. I_N with G_N(G_E(I_N, H_E), H_E) and I_E with G_E(G_N(I_E, H_E), H_E); L_cyc is calculated as follows:
L_cyc = E_(I,H)~P(I,H) ‖I − G′(G(I, H), H)‖₁   (4)
where G′ is the generator opposite to G; if G is used to convert the neutral expression into the expression specified by the facial-geometry heat map H, then G′ is used to restore the neutral expression with the help of H.
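Equation (4) can be exercised with toy stand-ins for the two generators. Here "adding the expression" is modelled as simply adding the heat map, so G′ is an exact inverse and the cycle loss vanishes; the generators are illustrative placeholders, not the patent's networks:

```python
import numpy as np

def cycle_consistency_loss(I, H, G, G_prime):
    """Equation (4): mean |I - G'(G(I, H), H)| over a batch.

    G maps a face to the expression encoded by H; G' is the
    opposite generator that restores the original face.
    """
    reconstructed = G_prime(G(I, H), H)
    return float(np.mean(np.abs(I - reconstructed)))

# Toy invertible "generators": add / subtract the heat map.
G = lambda I, H: I + H
G_prime = lambda I, H: I - H

rng = np.random.default_rng(0)
I = rng.random((4, 8, 8))   # batch of toy images
H = rng.random((4, 8, 8))   # toy heat maps
loss = cycle_consistency_loss(I, H, G, G_prime)
```

In training, a nonzero L_cyc penalizes any information the round trip fails to restore.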
Further, for the identity-retention loss: facial expression editing should preserve facial identity features after expression synthesis and removal; thus, an identity-retention term is employed to enforce identity consistency:
L_identity = E_(I,H)~P(I,H) ‖F(I) − F(G(I, H))‖₁   (5)
where F is a feature extractor for face recognition; a lightweight convolutional neural network (Light CNN) model is adopted as the feature-extraction network, comprising 9 convolutional layers, 4 max-pooling layers, and 1 fully connected layer; the Light CNN is pre-trained as a classifier that can distinguish a large number of identities, so that it captures the features most salient for face recognition; with this loss, the face identity can thus be retained through the face-editing process.
Further, for the total loss function: the final complete objective of the generators G_N and G_E is a weighted sum of all the losses defined above; L_G-adv removes the distribution difference between real and generated samples, L_pixel ensures pixel-level correctness, L_cyc ensures the cycle consistency of the reconstructed image with the source image, and L_identity preserves identity through the mapping process; the total loss function is therefore:
L_G = L_G-adv + α₁ L_pixel + α₂ L_cyc + α₃ L_identity   (6)
where α₁, α₂, and α₃ are the loss weight coefficients.
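The total objective of equation (6) is a plain weighted sum of the four scalar loss terms. A minimal sketch; the default weight values are illustrative placeholders, since the patent does not disclose them:

```python
def total_generator_loss(l_adv, l_pixel, l_cyc, l_identity,
                         alpha1=10.0, alpha2=10.0, alpha3=1.0):
    """Equation (6): L_G = L_G-adv + a1*L_pixel + a2*L_cyc + a3*L_identity.

    The alpha defaults are assumptions for illustration only.
    """
    return l_adv + alpha1 * l_pixel + alpha2 * l_cyc + alpha3 * l_identity
```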
The facial geometric operations comprise facial expression editing, facial expression transfer, and facial expression synthesis and interpolation.
Further, facial expression editing is guided by the geometric positions of a set of fiducial points; since the human face has unique physiological structural characteristics, the correlations between fiducial positions are strong; changes in the face geometry should therefore be constrained to avoid implausible configurations; taking prior knowledge of the face-shape distribution into account, a parameterized shape model is built as a geometry generator, and a basic shape model is learned from annotated training images;
first, each face is normalized to the same scale according to the positions of the two eyes and rotated to the horizontal; Principal Component Analysis (PCA) is then applied to obtain a basic shape model of the K fiducial locations:
s(p) = s₀ + S p   (7)
where s₀ ∈ R^(2K×1), S ∈ R^(2K×N), and p ∈ R^(N×1); the basic shape s₀ is the mean shape over all training images, and the columns of S are the N eigenvectors corresponding to the N largest eigenvalues; different face geometries can be obtained by varying the value of the shape parameter p;
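The basic shape model of equation (7) is standard PCA on stacked fiducial coordinates. A self-contained numpy sketch under that reading (random data stands in for the annotated training shapes):

```python
import numpy as np

def fit_shape_model(shapes, n_components):
    """Learn the basic shape model of equation (7).

    shapes: (M, 2K) array; each row stacks the K (x, y) fiducials of
    one normalized training face.
    Returns the mean shape s0 (length 2K) and S (2K x N), whose
    columns are the eigenvectors of the N largest eigenvalues.
    """
    s0 = shapes.mean(axis=0)
    centered = shapes - s0
    cov = centered.T @ centered / len(shapes)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return s0, eigvecs[:, order]

def shape_from_params(s0, S, p):
    """Equation (7): s(p) = s0 + S p."""
    return s0 + S @ p

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 10))                 # 50 faces, K = 5 fiducials
s0, S = fit_shape_model(train, n_components=3)
neutral = shape_from_params(s0, S, np.zeros(3))   # p = 0 gives the mean shape
```

Because `eigh` returns orthonormal eigenvectors of the symmetric covariance, the columns of S form an orthonormal basis of the dominant shape variations.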
however, facial geometry is related not only to facial expression but also, to a large degree, to facial identity; facial geometry varies from person to person even under the same expression; for example, the distance between the eyes and the length of the nose depend largely on the identity of the face rather than the expression; based on these individual differences, an individual-specific shape model based on equation (7) is proposed, derived by replacing the mean shape s₀ with the neutral shape s₀^(i) of a specific individual; the individual-specific shape model is given by:
s^(i)(p) = s₀^(i) + S p   (8)
where s₀^(i) represents identity-related variation and p represents variation caused by facial expression.
Further, for facial expression transfer: given two expressive faces I_A and I_B, facial landmarks s_A and s_B are detected; the expression-removal model is first used to recover the neutral faces:
I_A^N = G_N(I_A, H_A),  I_B^N = G_N(I_B, H_B)   (9)
where I_A^N and I_B^N denote the neutral faces of I_A and I_B respectively; the neutral shapes s_A^N and s_B^N can then be obtained by facial landmark detection; the shape parameters are then derived by solving the following least-squares regression problem:
p_B = argmin_p ‖s_B − s_B^N − S p‖₂²   (10)
applying these shape parameters yields the transferred positions of the fiducial points:
ŝ_A = s_A^N + S p_B   (11)
the heat map is rendered from the transferred shape and concatenated with the corresponding neutral face as the input for expression synthesis; finally, the facial expression transfer result is obtained with the expression synthesis model:
I_A^B = G_E(I_A^N, Ĥ_A)   (12)
where the expression synthesis model is as represented above.
Further, for facial expression synthesis and interpolation: first, a neutral face image and shape parameters are prepared for the target expression; the neutral face is obtained with the proposed expression-removal model; shape parameters for a particular expression may be learned from the annotated training dataset with the basic shape model (as in equation (7)); once the values of the shape parameters are associated with certain semantic attributes (such as fear or surprise), they can be used to synthesize a facial expression of the desired semantic type; furthermore, facial expression interpolation may be performed by linearly adjusting the values of the shape parameters.
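Linearly adjusting the shape parameters amounts to scaling p between the neutral shape (t = 0) and the full target expression (t = 1) inside equation (7). A minimal sketch with illustrative toy values:

```python
import numpy as np

def interpolate_expression(s0, S, p_target, t):
    """Facial expression interpolation by linearly scaling the shape
    parameters: equation (7) evaluated at p = t * p_target."""
    return s0 + S @ (t * p_target)

# Toy shape model: 3 fiducials (2K = 6), 2 shape components.
s0 = np.arange(6, dtype=float)
S = np.eye(6)[:, :2]
p = np.array([2.0, 4.0])
halfway = interpolate_expression(s0, S, p, 0.5)   # intermediate expression
frames = [interpolate_expression(s0, S, p, t) for t in np.linspace(0, 1, 5)]
```

Rendering a heat map from each intermediate shape and feeding it to the synthesis generator would yield a smooth expression transition.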
Drawings
Fig. 1 is a system block diagram of a facial expression synthesis method based on generation of a confrontation network according to the present invention.
Fig. 2 is a flow chart of a facial expression synthesis method based on generation of a confrontation network according to the present invention.
Fig. 3 is an example of facial expression synthesis based on geometric guidance of a facial expression synthesis method of generating a confrontational network according to the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application can be combined with each other without conflict, and the present invention is further described in detail with reference to the drawings and specific embodiments.
Fig. 1 is a system block diagram of the facial expression synthesis method based on a generative adversarial network according to the present invention. It mainly comprises geometry-guided facial expression synthesis and facial geometric operations.
The losses of the geometry-guided facial expression synthesis comprise an adversarial loss, a pixel loss, a cycle-consistency loss, and an identity-retention loss; the weighted sum of these four loss functions is the total loss function.
For the adversarial loss and pixel loss: since the proposed face-editing model generates results conditioned on the input face image and heat map, a generative adversarial network (GAN) is applied in this conditional setting; the adversarial losses of the discriminator and the generator are given respectively as follows:
L_D-adv = −E_(I,H,I′)~P(I,H,I′) [log D(I, H, I′)] − E_(I,H)~P(I,H) [log(1 − D(I, H, G(I, H)))]   (1)
L_G-adv = −E_(I,H)~P(I,H) [log D(I, H, G(I, H))]   (2)
The generator is not only tasked with deceiving the discriminator, but also with synthesizing an image as close as possible to the ground truth; the per-pixel loss L_pixel forces the transformed face image to have a small distance to the ground-truth data in the original pixel space; L_pixel takes the following form:
L_pixel = E_(I,H,I′)~P(I,H,I′) ‖I′ − G(I, H)‖₁   (3)
The L1 distance is used because it discourages blurred output; (I, H, I′) is (I_N, H_E, I_E) or (I_E, H_E, I_N), depending on the generator.
For the cycle-consistency loss: the generators G_E and G_N construct a complete mapping cycle between the neutral face and the expressive face; if a face image is converted from a neutral expression to an angry expression and then converted back to a neutral expression, the same face image should ideally be obtained; therefore, an extra cycle-consistency loss L_cyc is introduced to ensure the consistency of the source image with the reconstructed image, e.g. I_N with G_N(G_E(I_N, H_E), H_E) and I_E with G_E(G_N(I_E, H_E), H_E); L_cyc is calculated as follows:
L_cyc = E_(I,H)~P(I,H) ‖I − G′(G(I, H), H)‖₁   (4)
where G′ is the generator opposite to G; if G is used to convert the neutral expression into the expression specified by the facial-geometry heat map H, then G′ is used to restore the neutral expression with the help of H.
For the identity-retention loss: facial expression editing should preserve facial identity features after expression synthesis and removal; thus, an identity-retention term is employed to enforce identity consistency:
L_identity = E_(I,H)~P(I,H) ‖F(I) − F(G(I, H))‖₁   (5)
where F is a feature extractor for face recognition; a lightweight convolutional neural network (Light CNN) model is adopted as the feature-extraction network, comprising 9 convolutional layers, 4 max-pooling layers, and 1 fully connected layer; the Light CNN is pre-trained as a classifier that can distinguish a large number of identities, so that it captures the features most salient for face recognition; with this loss, the face identity can thus be retained through the face-editing process.
For the total loss function: the final complete objective of the generators G_N and G_E is a weighted sum of all the losses defined above; L_G-adv removes the distribution difference between real and generated samples, L_pixel ensures pixel-level correctness, L_cyc ensures the cycle consistency of the reconstructed image with the source image, and L_identity preserves identity through the mapping process; the total loss function is therefore:
L_G = L_G-adv + α₁ L_pixel + α₂ L_cyc + α₃ L_identity   (6)
where α₁, α₂, and α₃ are the loss weight coefficients.
The facial geometric operations comprise facial expression editing, facial expression transfer, and facial expression synthesis and interpolation.
Facial expression editing is guided by the geometric positions of a set of fiducial points; since the human face has unique physiological structural characteristics, the correlations between fiducial positions are strong; changes in the face geometry should therefore be constrained to avoid implausible configurations; taking prior knowledge of the face-shape distribution into account, a parameterized shape model is built as a geometry generator, and a basic shape model is learned from annotated training images;
first, each face is normalized to the same scale according to the positions of the two eyes and rotated to the horizontal; Principal Component Analysis (PCA) is then applied to obtain a basic shape model of the K fiducial locations:
s(p) = s₀ + S p   (7)
where s₀ ∈ R^(2K×1), S ∈ R^(2K×N), and p ∈ R^(N×1); the basic shape s₀ is the mean shape over all training images, and the columns of S are the N eigenvectors corresponding to the N largest eigenvalues; different face geometries can be obtained by varying the value of the shape parameter p;
however, facial geometry is related not only to facial expression but also, to a large degree, to facial identity; facial geometry varies from person to person even under the same expression; for example, the distance between the eyes and the length of the nose depend largely on the identity of the face rather than the expression; based on these individual differences, an individual-specific shape model based on equation (7) is proposed, derived by replacing the mean shape s₀ with the neutral shape s₀^(i) of a specific individual; the individual-specific shape model is given by:
s^(i)(p) = s₀^(i) + S p   (8)
where s₀^(i) represents identity-related variation and p represents variation caused by facial expression.
For facial expression transfer: given two expressive faces I_A and I_B, facial landmarks s_A and s_B are detected; the expression-removal model is first used to recover the neutral faces:
I_A^N = G_N(I_A, H_A),  I_B^N = G_N(I_B, H_B)   (9)
where I_A^N and I_B^N denote the neutral faces of I_A and I_B respectively; the neutral shapes s_A^N and s_B^N can then be obtained by facial landmark detection; the shape parameters are then derived by solving the following least-squares regression problem:
p_B = argmin_p ‖s_B − s_B^N − S p‖₂²   (10)
applying these shape parameters yields the transferred positions of the fiducial points:
ŝ_A = s_A^N + S p_B   (11)
the heat map is rendered from the transferred shape and concatenated with the corresponding neutral face as the input for expression synthesis; finally, the facial expression transfer result is obtained with the expression synthesis model:
I_A^B = G_E(I_A^N, Ĥ_A)   (12)
where the expression synthesis model is as represented above.
For facial expression synthesis and interpolation: first, a neutral face image and shape parameters are prepared for the target expression; the neutral face is obtained with the proposed expression-removal model; shape parameters for a particular expression may be learned from the annotated training dataset with the basic shape model (as in equation (7)); once the values of the shape parameters are associated with certain semantic attributes (such as fear or surprise), they can be used to synthesize a facial expression of the desired semantic type; furthermore, facial expression interpolation may be performed by linearly adjusting the values of the shape parameters.
Fig. 2 is a flow chart of the facial expression synthesis method based on a generative adversarial network according to the present invention. The method first gives a heat map of the target facial expression and a neutral frontal face and correspondingly synthesizes a new face image, then performs a weighted summation of all loss functions to obtain a total loss function, guides facial expression editing with the geometric positions of a set of fiducial points, obtains a facial expression transfer result with the expression synthesis model, and finally performs facial expression interpolation by linearly adjusting the values of the shape parameters.
Fig. 3 is an example of geometry-guided facial expression synthesis of the facial expression synthesis method based on a generative adversarial network according to the present invention. As in an Active Appearance Model (AAM), the face geometry is defined by a set of fiducial points; a heat map is used to encode the locations of the facial fiducials and provides a per-pixel likelihood for each fiducial location; given a heat map of the target facial expression and a frontal face without expression (hereinafter, a neutral face), a new face image (an expressive face) is synthesized accordingly;
given a pair of generators G_E: (I_N, H_E) → I_E and G_N: (I_E, H_E) → I_N, where I_N is a neutral face, I_E is an expressive face, and H_E is the heat map corresponding to I_E; the two discriminators associated with these two generators are D_E and D_N, which distinguish real triples (I, H, I′) from generated triples (I, H, G(I, H)); I and I′ are the neutral and expressive face images, or vice versa, depending on the generator;
H_E serves as an auxiliary condition in both face-editing modes, namely expression control during synthesis and expression removal; during expression synthesis, H_E specifies the target expression, so that G_E can convert the neutral face I_N into the desired expression; during expression removal, H_E guides the recovery of I_N from I_E;
the losses of the geometry-guided facial expression synthesis comprise an adversarial loss, a pixel loss, a cycle-consistency loss, and an identity-retention loss; the weighted sum of these four loss functions is the total loss function.
As shown, the images in the first column are the input faces, and the remaining images are the input heat maps and the corresponding synthesis results.
It will be appreciated by persons skilled in the art that the invention is not limited to details of the foregoing embodiments and that the invention can be embodied in other specific forms without departing from the spirit or scope of the invention. In addition, various modifications and alterations of this invention may be made by those skilled in the art without departing from the spirit and scope of this invention, and such modifications and alterations should also be viewed as being within the scope of this invention. It is therefore intended that the following appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
Claims (10)
1. A facial expression synthesis method based on a generative adversarial network, characterized by mainly comprising: (I) geometry-guided facial expression synthesis; and (II) facial geometric operations.
2. The geometry-guided facial expression synthesis (I) according to claim 1, characterized in that, as in an Active Appearance Model (AAM), the face geometry is defined by a set of fiducial points; a heat map is used to encode the locations of the facial fiducials and provides a per-pixel likelihood for each fiducial location; given a heat map of the target facial expression and a frontal face without expression (hereinafter, a neutral face), a new face image (an expressive face) is synthesized accordingly;
given a pair of generators G_E: (I_N, H_E) → I_E and G_N: (I_E, H_E) → I_N, where I_N is a neutral face, I_E is an expressive face, and H_E is the heat map corresponding to I_E; the two discriminators associated with these two generators are D_E and D_N, which distinguish real triples (I, H, I′) from generated triples (I, H, G(I, H)); I and I′ are the neutral and expressive face images, or vice versa, depending on the generator;
H_E serves as an auxiliary condition in both face-editing modes, namely expression control during synthesis and expression removal; during expression synthesis, H_E specifies the target expression, so that G_E can convert the neutral face I_N into the desired expression; during expression removal, H_E guides the recovery of I_N from I_E;
the losses of the geometry-guided facial expression synthesis comprise an adversarial loss, a pixel loss, a cycle-consistency loss, and an identity-retention loss; the weighted sum of these four loss functions is the total loss function.
3. Antagonism and pixel loss based on claim 2, characterized in that the generative confrontation network (GAN) is applied to condition settings as a result of the proposed face editing model generating conditions on input face images and heat maps; the loss of opposition of the generator and the discriminator are respectively shown as follows:
the generator is tasked not only with deceiving the discriminator, but also with synthesizing an image as close as possible to the ground truth; the per-pixel loss L_pixel forces the transformed face image to stay close to the ground-truth data in the original pixel space; L_pixel takes the form:
L_pixel = E_{I,H,I′~P(I,H,I′)} ‖I′ − G(I, H)‖_1 (3)
the L1 distance is adopted rather than L2, since L1 encourages less blurring in the output; (I, H, I′) is (I_N, H_E, I_E) or (I_E, H_E, I_N), depending on the generator.
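The per-pixel L1 loss of equation (3) reduces to a mean absolute difference. A minimal numpy sketch (the toy face tensors are illustrative, not real data):

```python
import numpy as np

def l1_pixel_loss(target, generated):
    """L_pixel: mean absolute difference in the original pixel space."""
    return np.mean(np.abs(target - generated))

rng = np.random.default_rng(0)
I_true = rng.random((3, 64, 64))       # toy ground-truth expressive face
I_fake = I_true + 0.1                  # generator output, off by 0.1 per pixel
print(round(l1_pixel_loss(I_true, I_fake), 6))   # 0.1
```

A uniform 0.1 offset on every pixel yields exactly an L1 loss of 0.1, which makes the metric easy to sanity-check.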
4. The cycle-consistency loss according to claim 2, characterized in that the generators G_E and G_N constitute a complete mapping cycle between neutral and expressive faces; if a face image is converted from a neutral expression to an angry expression and then back to neutral, ideally the same face image should be obtained; therefore, an additional cycle-consistency loss L_cyc is introduced to enforce consistency between the source image and the reconstructed image, i.e. between I_N and G_N(G_E(I_N, H_E), H_E), and between I_E and G_E(G_N(I_E, H_E), H_E); L_cyc is computed as:
L_cyc = E_{I,H~P(I,H)} ‖I − G′(G(I, H), H)‖_1 (4)
where G′ is the generator opposite to G; if G converts the neutral expression into the expression specified by the facial-geometry heat map H, then G′ restores the neutral expression with the help of H.
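The cycle of equation (4) can be illustrated with toy stand-in generators; the additive pair below is an assumption made purely so the cycle closes exactly, not the claimed networks:

```python
import numpy as np

def cycle_loss(I, G, G_prime, H):
    """L_cyc: L1 distance between a source image and its reconstruction
    after mapping to the other expression domain and back."""
    return np.mean(np.abs(I - G_prime(G(I, H), H)))

# Toy stand-ins: G adds the heat map, G' subtracts it, so the
# round trip is perfect and the loss vanishes.
G = lambda I, H: I + H
G_prime = lambda I, H: I - H

rng = np.random.default_rng(1)
I_neutral = rng.random((64, 64))
H = rng.random((64, 64))
print(cycle_loss(I_neutral, G, G_prime, H) < 1e-12)   # True: perfect cycle
```

Replacing `G_prime` with a generator that ignores H would leave a large residual, which is exactly what the loss penalises during training.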
5. The identity-preservation loss according to claim 2, characterized in that facial expression editing should preserve the facial identity after expression synthesis and removal; thus, an identity-preservation term is employed to enforce identity consistency:
L_identity = E_{I,H~P(I,H)} ‖F(I) − F(G(I, H))‖_1 (5)
where F is a feature extractor for face recognition; a Light CNN model is adopted as the feature-extraction network, comprising 9 convolutional layers, 4 max-pooling layers and 1 fully-connected layer; the Light CNN is pre-trained as a classifier distinguishing a large number of identities, so that it captures the features most salient for face recognition; with this loss, the facial identity can thus be preserved through the face editing process.
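In equation (5), the claim assumes a pretrained face-recognition CNN as F. In the sketch below F is mocked by a fixed random linear projection purely for illustration; only the loss structure matches the claim:

```python
import numpy as np

def identity_loss(I, G_I, F):
    """L_identity: L1 distance between face-recognition features of the
    input face and of the edited face."""
    return np.sum(np.abs(F(I) - F(G_I)))

# Mock feature extractor: a fixed random linear map (stand-in for the
# pretrained recognition network in the claim).
rng = np.random.default_rng(2)
W = rng.random((8, 64 * 64))
F = lambda img: W @ img.ravel()

I = rng.random((64, 64))
print(identity_loss(I, I.copy(), F))   # 0.0: same face, same features
```

Because F is fixed (frozen) during training, gradients flow only into the generator, steering it to keep identity-relevant features unchanged.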
6. The total loss function according to claim 2, characterized in that the final full objective of the generators G_N and G_E is a weighted sum of all the losses defined above: L_G-adv removes the distribution difference between real and generated samples, L_pixel ensures pixel-level correctness, L_cyc enforces cycle consistency between the reconstructed and source images, and L_identity preserves identity through the mapping process; the total loss function is therefore:
L_G = L_G-adv + α_1 L_pixel + α_2 L_cyc + α_3 L_identity (6)
where α_1, α_2, α_3 are the loss weighting coefficients.
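Equation (6) is a plain weighted sum; the sketch below shows the arithmetic, with placeholder α values that are illustrative assumptions, not the coefficients of the claimed method:

```python
def total_generator_loss(l_adv, l_pixel, l_cyc, l_identity,
                         alpha1=10.0, alpha2=10.0, alpha3=1.0):
    """L_G = L_G-adv + a1*L_pixel + a2*L_cyc + a3*L_identity.
    The alpha values here are illustrative placeholders."""
    return l_adv + alpha1 * l_pixel + alpha2 * l_cyc + alpha3 * l_identity

print(total_generator_loss(0.5, 0.1, 0.02, 0.3))   # 2.0 (= 0.5 + 1.0 + 0.2 + 0.3)
```

In practice the α weights are hyper-parameters balancing realism, pixel fidelity, cycle consistency and identity preservation.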
7. The facial geometry operations (II) according to claim 1, characterized in that the facial geometry operations comprise facial expression editing, facial expression transfer, and facial expression synthesis and interpolation.
8. The facial expression editing according to claim 7, characterized in that the geometric locations of a set of fiducial points are used to guide facial expression editing; the human face has a distinctive physiological structure, so the fiducial locations are strongly correlated; changes to the face geometry should therefore be constrained to avoid implausible configurations; taking the prior knowledge of face shape distribution into account, a parameterized shape model is built as a geometry generator, and a basic shape model is learned from annotated training images;
first, the faces are normalized to the same scale according to the positions of the two eyes and rotated to be horizontal; Principal Component Analysis (PCA) is then applied to obtain a basic shape model of the K fiducial locations:
s(p) = s_0 + Sp (7)
where s_0 ∈ R^(2K×1), S ∈ R^(2K×N), p ∈ R^(N×1); the basic shape s_0 is the mean shape over all training images, and the columns of S are the N eigenvectors corresponding to the N largest eigenvalues; different face geometries can be obtained by varying the shape parameter p;
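The basic shape model of equation (7) can be fitted with an SVD-based PCA; a minimal numpy sketch on synthetic landmark data (the data and function name are illustrative):

```python
import numpy as np

def fit_shape_model(shapes, n_components):
    """Fit the basic shape model s(p) = s_0 + S p by PCA.
    `shapes` is (M, 2K): M training faces, K (x, y) fiducials each,
    assumed already scale-normalised and rotated upright."""
    s0 = shapes.mean(axis=0)                       # mean (basic) shape
    _, _, Vt = np.linalg.svd(shapes - s0, full_matrices=False)
    S = Vt[:n_components].T                        # (2K, N), largest modes first
    return s0, S

rng = np.random.default_rng(3)
shapes = rng.random((100, 2 * 68))                 # 100 toy faces, 68 fiducials each
s0, S = fit_shape_model(shapes, n_components=5)

# p = 0 reproduces the mean shape; varying p moves the fiducials.
print(np.allclose(s0 + S @ np.zeros(5), s0))       # True
print(S.shape)                                     # (136, 5)
```

SVD on the centred data returns the right singular vectors ordered by singular value, which correspond to the eigenvectors of the covariance matrix with the largest eigenvalues.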
however, the facial geometry is related not only to the facial expression but also, to a large extent, to the facial identity; the face geometry varies from person to person even under the same expression; for example, the distance between the eyes and the length of the nose depend largely on the facial identity rather than the expression; to account for these individual differences, an individual-specific shape model based on equation (7) is proposed, constructed by replacing the mean shape s_0 with the neutral shape of each individual; the individual-specific shape model is given by:

s(p) = s_N + Sp (8)

where s_N, the neutral shape of the individual, represents the identity-related variation, and p represents the variation caused by the facial expression.
9. The facial expression transfer according to claim 7, characterized in that, given two expressive faces I_A and I_B, their facial landmarks s_A and s_B are detected; the expression-removal model is first used to restore the neutral faces:

I_A^N = G_N(I_A, H_A), I_B^N = G_N(I_B, H_B)

where H_A and H_B are the heat maps encoded from s_A and s_B;
here I_A^N and I_B^N denote the neutral faces of I_A and I_B, respectively; the neutral shapes s_A^N and s_B^N can then be obtained by facial landmark detection on these neutral faces; the shape parameters are then derived by solving the following least-squares regression problem:

p_A = argmin_p ‖s_A − (s_A^N + Sp)‖^2
the shape parameters are then applied to the other identity to obtain the transferred fiducial positions:

s_A→B = s_B^N + S p_A
the heat map is generated from the transferred shape and concatenated with the corresponding neutral face as the input for expression synthesis; finally, the facial expression transfer result is obtained with the expression synthesis model:

I_A→B = G_E(I_B^N, H_A→B)

where H_A→B is the heat map encoded from the transferred shape s_A→B.
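The shape-space part of this transfer (solve for p on face A, re-apply on face B's neutral shape) can be sketched in numpy; the tiny 2-point "face" and 1-D expression basis below are toy assumptions:

```python
import numpy as np

def transfer_expression_shape(s_A, s_N_A, s_N_B, S):
    """Transfer A's expression onto B in shape space:
    1. solve p_A = argmin_p ||s_A - (s_N_A + S p)||^2 by least squares,
    2. apply p_A on top of B's neutral shape."""
    p_A, *_ = np.linalg.lstsq(S, s_A - s_N_A, rcond=None)
    return s_N_B + S @ p_A

# Toy example: a 2-point "face" (2K = 4) with a 1-D expression basis.
S = np.array([[1.0], [0.0], [0.0], [1.0]])
s_N_A = np.zeros(4)                     # A's neutral shape
s_N_B = np.ones(4)                      # B's neutral shape
s_A = s_N_A + S[:, 0] * 2.0             # A showing the expression with p = 2
print(transfer_expression_shape(s_A, s_N_A, s_N_B, S))   # [3. 1. 1. 3.]
```

The transferred shape is B's neutral geometry displaced by A's expression coefficients, which is then rendered as a heat map for the synthesis generator.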
10. The facial expression synthesis and interpolation according to claim 7, characterized in that a neutral face image and shape parameters are prepared for the target expression; the neutral face is obtained with the proposed expression-removal model; the shape parameters for a particular expression can be learned from the annotated training dataset via the basic shape model (equation (7)); once the values of the shape parameters are associated with certain semantic attributes (such as fear or surprise), they can be used to synthesize a facial expression of the desired semantic type; furthermore, facial expression interpolation can be performed by linearly adjusting the values of the shape parameters.
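The linear interpolation of expression intensity amounts to scaling the shape parameters before evaluating the shape model; a small numpy sketch with a hypothetical "surprise" parameter vector (an illustrative assumption):

```python
import numpy as np

def interpolate_expression(s0, S, p_target, t):
    """Scale the shape parameters linearly: t = 0 gives the neutral/mean
    shape, t = 1 the full target expression, intermediate t a blend."""
    return s0 + S @ (t * p_target)

s0 = np.zeros(4)
S = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
p_surprise = np.array([2.0, -1.0])      # hypothetical "surprise" parameters

print(interpolate_expression(s0, S, p_surprise, 0.0))   # [0. 0. 0. 0.]
print(interpolate_expression(s0, S, p_surprise, 0.5))   # [ 1.  -0.5  1.  -0.5]
```

Each interpolated shape would then be encoded as a heat map and fed to the synthesis generator to produce a face at the corresponding expression intensity.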
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810078963.1A CN108288072A (en) | 2018-01-26 | 2018-01-26 | A kind of facial expression synthetic method based on generation confrontation network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108288072A true CN108288072A (en) | 2018-07-17 |
Family
ID=62835796
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325549A (en) * | 2018-10-25 | 2019-02-12 | 电子科技大学 | A kind of facial image fusion method |
CN109409222A (en) * | 2018-09-20 | 2019-03-01 | 中国地质大学(武汉) | A kind of multi-angle of view facial expression recognizing method based on mobile terminal |
CN109447906A (en) * | 2018-11-08 | 2019-03-08 | 北京印刷学院 | A kind of picture synthetic method based on generation confrontation network |
CN109448083A (en) * | 2018-09-29 | 2019-03-08 | 浙江大学 | A method of human face animation is generated from single image |
CN109472838A (en) * | 2018-10-25 | 2019-03-15 | 广东智媒云图科技股份有限公司 | A kind of sketch generation method and device |
CN109558836A (en) * | 2018-11-28 | 2019-04-02 | 中国科学院深圳先进技术研究院 | A kind of processing method and relevant device of facial image |
CN109670491A (en) * | 2019-02-25 | 2019-04-23 | 百度在线网络技术(北京)有限公司 | Identify method, apparatus, equipment and the storage medium of facial image |
CN109840926A (en) * | 2018-12-29 | 2019-06-04 | 中国电子科技集团公司信息科学研究院 | A kind of image generating method, device and equipment |
CN109871888A (en) * | 2019-01-30 | 2019-06-11 | 中国地质大学(武汉) | A kind of image generating method and system based on capsule network |
CN109903363A (en) * | 2019-01-31 | 2019-06-18 | 天津大学 | Condition generates confrontation Network Three-dimensional human face expression moving cell synthetic method |
CN110084193A (en) * | 2019-04-26 | 2019-08-02 | 深圳市腾讯计算机系统有限公司 | Data processing method, equipment and medium for Facial image synthesis |
CN110148194A (en) * | 2019-05-07 | 2019-08-20 | 北京航空航天大学 | Image rebuilding method and device |
CN110210556A (en) * | 2019-05-29 | 2019-09-06 | 中国科学技术大学 | Pedestrian identifies data creation method again |
CN110276745A (en) * | 2019-05-22 | 2019-09-24 | 南京航空航天大学 | A kind of pathological image detection algorithm based on generation confrontation network |
CN110322548A (en) * | 2019-06-11 | 2019-10-11 | 北京工业大学 | A kind of three-dimensional grid model generation method based on several picture parametrization |
CN110363060A (en) * | 2019-04-04 | 2019-10-22 | 杭州电子科技大学 | The small sample target identification method of confrontation network is generated based on proper subspace |
CN110620884A (en) * | 2019-09-19 | 2019-12-27 | 平安科技(深圳)有限公司 | Expression-driven-based virtual video synthesis method and device and storage medium |
CN110689480A (en) * | 2019-09-27 | 2020-01-14 | 腾讯科技(深圳)有限公司 | Image transformation method and device |
CN110909814A (en) * | 2019-11-29 | 2020-03-24 | 华南理工大学 | Classification method based on feature separation |
WO2020062120A1 (en) * | 2018-09-29 | 2020-04-02 | 浙江大学 | Method for generating facial animation from single image |
CN111310647A (en) * | 2020-02-12 | 2020-06-19 | 北京云住养科技有限公司 | Generation method and device for automatic identification falling model |
CN111563427A (en) * | 2020-04-23 | 2020-08-21 | 中国科学院半导体研究所 | Method, device and equipment for editing attribute of face image |
CN111860041A (en) * | 2019-04-26 | 2020-10-30 | 北京陌陌信息技术有限公司 | Face conversion model training method, device, equipment and medium |
CN112215927A (en) * | 2020-09-18 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for synthesizing face video |
CN112364745A (en) * | 2020-11-04 | 2021-02-12 | 北京瑞莱智慧科技有限公司 | Method and device for generating countermeasure sample and electronic equipment |
CN112488984A (en) * | 2019-09-11 | 2021-03-12 | 中信戴卡股份有限公司 | Method and device for acquiring defect picture generation network and defect picture generation method |
CN112767250A (en) * | 2021-01-19 | 2021-05-07 | 南京理工大学 | Video blind super-resolution reconstruction method and system based on self-supervision learning |
CN113033511A (en) * | 2021-05-21 | 2021-06-25 | 中国科学院自动化研究所 | Face anonymization method based on control decoupling identity representation |
CN113643400A (en) * | 2021-08-23 | 2021-11-12 | 哈尔滨工业大学(威海) | Image generation method |
WO2021228183A1 (en) * | 2020-05-13 | 2021-11-18 | Huawei Technologies Co., Ltd. | Facial re-enactment |
CN113706379A (en) * | 2021-07-29 | 2021-11-26 | 山东财经大学 | Interlayer interpolation method and system based on medical image processing |
US11475608B2 (en) | 2019-09-26 | 2022-10-18 | Apple Inc. | Face image generation with pose and expression control |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008144728A1 (en) * | 2007-05-21 | 2008-11-27 | Bracco Imaging S.P.A. | Conjugates which bind a blood protein such as human serum albumin and methods of using the same in diagnostic and therapeutic applications |
CN104123562A (en) * | 2014-07-10 | 2014-10-29 | 华东师范大学 | Human body face expression identification method and device based on binocular vision |
CN107437077A (en) * | 2017-08-04 | 2017-12-05 | 深圳市唯特视科技有限公司 | A kind of method that rotation face based on generation confrontation network represents study |
Non-Patent Citations (1)
Title |
---|
SONG, LINGXIAO ET AL: "Geometry Guided Adversarial Facial Expression Synthesis", HTTPS://ARXIV.ORG/PDF/1712.03474.PDF |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20180717