CN112634392A - Facial expression synthesis method based on a geometric-prior generative adversarial network


Info

Publication number
CN112634392A
Authority
CN
China
Prior art keywords
expression
image
face
real
removal
Prior art date
Legal status
Pending
Application number
CN202011623820.8A
Other languages
Chinese (zh)
Inventor
侯峦轩 (Luanxuan Hou)
马鑫 (Xin Ma)
赫然 (Ran He)
孙哲南 (Zhenan Sun)
Current Assignee
Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co., Ltd.
Original Assignee
Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co., Ltd.
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-04-09
Application filed by Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co., Ltd.
Priority to CN202011623820.8A
Publication of CN112634392A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2132 Feature extraction based on discrimination criteria, e.g. discriminant analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships


Abstract

The invention discloses a facial expression synthesis method based on a geometric-prior generative adversarial network, which comprises the following steps: preprocessing the image data of an expression database, extracting expression key points from the face images, and producing face key-point heatmaps; taking the face image and the face key-point heatmap as the input of the network model, and training two pairs of generators and discriminators in the adversarial network to complete the expression generation and removal tasks simultaneously, thereby obtaining a geometric-prior generative adversarial network model capable of both expression generation and expression removal; using the trained geometric-prior generative adversarial network model to perform expression generation and removal on test data; and performing expression-invariant face recognition on the expression-removed images. The method optimizes the two tasks of facial expression synthesis and facial expression removal jointly, which speeds up the convergence of network training and effectively improves the generalization ability of the model.

Description

Facial expression synthesis method based on a geometric-prior generative adversarial network
Technical Field
The invention relates to the technical field of image processing, and in particular to a facial expression synthesis method based on a geometric-prior generative adversarial network.
Background
Facial expression synthesis is a typical image processing problem that aims to synthesize a specified expression for a specified person, and it has attracted wide attention in computer graphics, computer vision, pattern recognition, machine learning, and related fields. Synthesizing photo-realistic facial expression images is of great value to both academia and industry, and expression synthesis has many applications in facial animation, face editing, augmentation of face databases, and face recognition.
However, synthesizing "real" facial expression images remains difficult because of the extremely complex geometry of the human face, its countless wrinkles, and its subtle color and texture variations.
In recent years, deep learning has achieved remarkable results in many areas of machine vision; in particular, generative adversarial networks have had an enormous impact on image generation. A generative adversarial network, inspired by the two-player zero-sum game of game theory, consists of two networks, a generator and a discriminator, and uses the competition between them to continuously improve performance until an equilibrium is reached. Many variant networks have been derived from the adversarial idea, and they have driven significant advances in image synthesis, image super-resolution, image style transfer, and face synthesis. Research on face synthesis, including face frontalization, face completion, occlusion removal, multi-view face generation, and facial expression synthesis, has received wide attention from researchers. Adversarial face synthesis methods generally alternate between two processes: first, the generator applies a non-linear transform (typically convolution) to the input to obtain a generated image; then the discriminator judges whether the generated image is real or fake and back-propagates gradients to improve network performance.
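As an illustration of this two-phase loop, here is a minimal PyTorch sketch; the networks `G` and `D`, the optimizers, and the data tensors are illustrative assumptions rather than the patent's implementation, and `D` is assumed to end in a sigmoid:

```python
import torch
import torch.nn.functional as F

def adversarial_step(G, D, x, y, opt_G, opt_D):
    """One iteration of the generic adversarial scheme described above."""
    fake = G(x)  # phase 1: non-linear (convolutional) generation from the input

    # phase 2a: the discriminator judges real vs. generated samples
    real_score = D(y)
    fake_score = D(fake.detach())
    d_loss = F.binary_cross_entropy(real_score, torch.ones_like(real_score)) \
           + F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # phase 2b: the generator is updated to fool the discriminator
    score = D(fake)  # re-score without detach so gradients reach G
    g_loss = F.binary_cross_entropy(score, torch.ones_like(score))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```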
With the continuous development of science and technology, demand in many fields has risen accordingly, including film and advertising animation, online games, remote video calls, interactive user agents, and facial surgery planning; generating vivid face images matters both for a good user experience and for understanding facial information.
Against this background, it is therefore necessary to develop a high-performance facial expression synthesis method based on a geometric-prior generative adversarial network.
Disclosure of Invention
The invention aims to improve the visual realism of face synthesis while preserving the original identity features, and provides a facial expression synthesis method based on a geometric-prior generative adversarial network (G2-GAN for short).
The technical solution adopted to achieve this purpose is as follows:
A facial expression synthesis method based on a geometric-prior generative adversarial network comprises the following steps:
S1, preprocessing the image data of an expression database, extracting expression key points from the face images, and producing face key-point heatmaps;
S2, taking the face image and the face key-point heatmap as the input of the network model, and training two pairs of generators and discriminators in the adversarial network to complete the expression generation and removal tasks simultaneously, thereby obtaining a geometric-prior generative adversarial network model capable of both expression generation and expression removal; wherein,
for the expression generation task, the input is processed by a generator to obtain a generated expressive face, and its adversarial loss against the real expressive face is computed in a discriminator; for the expression removal task, the input is processed by a generator to obtain a generated expressionless face, and its adversarial loss against the real expressionless face is computed in a discriminator; training of the model is complete after repeated iterations reach stability;
S3, using the trained geometric-prior generative adversarial network model to perform expression generation and removal on the test data, and performing expression-invariant face recognition on the expression-removed images.
In step S2, for the expression generation task, the face key-point heatmap obtained from the expression key-point information is combined with the expressionless image as the input of the network model, and the expressive image serves as the real-image label; for the expression removal task, the face key-point heatmap obtained from the expression key-point information is combined with the expressive image as the input of the network model, and the expressionless image serves as the real-image label.
Specifically, step S2 comprises the following steps:
S21: initialize the network weight parameters for both the expression generation and expression removal tasks, where the loss functions of the generators are $L_{G_1}$ and $L_{G_2}$ and the loss functions of the discriminators are $L_{D_1}$ and $L_{D_2}$;
S22: for the expression generation task, the expressionless face and the expression key-point information are combined and input into the generator network $G_1$; the generated expressive image, the expressionless face, and the expression key-point information form a fake sample, and the real expressive face, the expressionless face, and the expression key-point information form a real sample; the fake and real samples are input into the discriminator network $D_1$, and the two networks are trained alternately by iteration until the generator loss $L_{G_1}$ and the discriminator loss $L_{D_1}$ both decrease and stabilize;
S23: for the expression removal task, the real expressive face and the expression key-point information are combined and input into the generator network $G_2$ to perform the removal of the expression; the generated expressionless image, the real expressive face, and the expression key-point information form a fake sample, and the real expressionless face, the real expressive face, and the expression key-point information form a real sample; the fake and real samples are input into the discriminator network $D_2$, and the two networks are trained alternately by iteration until the generator loss $L_{G_2}$ and the discriminator loss $L_{D_2}$ both decrease and stabilize;
S24: train the expression generation and expression removal tasks simultaneously until none of the loss functions decreases any further, obtaining the final adversarial network model, i.e., the model for facial expression editing.
In the expression generation task, the objective functions of the generator and the discriminator are expressed as follows:

$$L_{G_1} = L_{G_1\text{-}adv} + \alpha_1 L_{pixel} + \alpha_2 L_{cyc} + \alpha_3 L_{identity}$$

$$L_{D_1} = -\mathbb{E}_{I,H,I' \sim P(I,H,I')}\log D_1(I,H,I') - \mathbb{E}_{I,H \sim P(I,H)}\log\big(1 - D_1(I,H,G_1(I,H))\big)$$

wherein $I$ is the real expressionless image in the expression synthesis discriminator, $H$ is the expression face heatmap, $G_1$ is the expression synthesis generator, $D_1$ is the expression synthesis discriminator, $I'$ is the real expressive image in the expression synthesis discriminator, $L_{G_1\text{-}adv}$ is the adversarial loss between the generated image and the real image, $L_{pixel}$ is the inter-image $L_1$ norm term, $L_{cyc}$ is the cycle-consistency $L_1$ norm term, $L_{identity}$ is the inter-feature-layer $L_1$ norm term, $\alpha_1, \alpha_2, \alpha_3$ are the loss weight coefficients, and $\mathbb{E}_{I,H,I' \sim P(I,H,I')}$ is the expectation over the joint distribution of the real expressionless image, the expression face heatmap, and the real expressive image in the expression synthesis discriminator.
The objective functions of the generator and the discriminator in the expression removal task are expressed as follows:

$$L_{G_2} = L_{G_2\text{-}adv} + \alpha_1 L_{pixel} + \alpha_2 L_{cyc} + \alpha_3 L_{identity}$$

$$L_{D_2} = -\mathbb{E}_{I,H,I' \sim P(I,H,I')}\log D_2(I,H,I') - \mathbb{E}_{I,H \sim P(I,H)}\log\big(1 - D_2(I,H,G_2(I,H))\big)$$

wherein $I$ is the real expressive image in the expression removal discriminator, $H$ is the expression face heatmap, $G_2$ is the expression removal generator, $D_2$ is the expression removal discriminator, $I'$ is the real expressionless image in the expression removal discriminator, $L_{G_2\text{-}adv}$ is the adversarial loss between the generated image and the real image, $L_{pixel}$ is the inter-image $L_1$ norm term in the expression removal discriminator, $L_{cyc}$ is the cycle-consistency $L_1$ norm term in the expression removal discriminator, $L_{identity}$ is the inter-feature-layer $L_1$ norm term in the expression removal discriminator, and $\alpha_1, \alpha_2, \alpha_3$ are the loss weight coefficients, the same weights as in the objective functions of the expression generation task.
The real and fake sample tuples input into the discriminator in the expression generation task are as follows:

the fake tuple is [I, H, G(I, H)] and the real tuple is [I, H, I'],

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $I'$ is the real expressive image.

The real and fake sample tuples input into the discriminator in the expression removal task are as follows:

the fake tuple is [I, H, G(I, H)] and the real tuple is [I, H, I'],

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $I'$ is the real expressionless image.

The generator adversarial loss function in the expression synthesis task is:

$$L_{G_1\text{-}adv} = -\mathbb{E}_{I,H \sim P(I,H)}\log D\big(I,H,G(I,H)\big)$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $D$ is the expression synthesis discriminator.

The pixel loss function between image pairs in the expression synthesis task is:

$$L_{pixel} = \mathbb{E}_{I,H,I' \sim P(I,H,I')}\,\lVert I' - G(I,H)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $I'$ is the real expressive image.

The image cycle-consistency loss function in the expression synthesis task is:

$$L_{cyc} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert I - G'\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $G'$ is the expression removal generator.

The image-pair feature-layer loss function in the expression synthesis task is:

$$L_{identity} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert F(I) - F\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $F$ is the face image feature extractor.

The generator adversarial loss function in the expression removal task is:

$$L_{G_2\text{-}adv} = -\mathbb{E}_{I,H \sim P(I,H)}\log D\big(I,H,G(I,H)\big)$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $D$ is the expression removal discriminator.

The pixel loss function between image pairs in the expression removal task is:

$$L_{pixel} = \mathbb{E}_{I,H,I' \sim P(I,H,I')}\,\lVert I' - G(I,H)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $I'$ is the real expressionless face.

The image cycle-consistency loss function in the expression removal task is:

$$L_{cyc} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert I - G'\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $G'$ is the expression synthesis generator.

The image-pair feature-layer loss function in the expression removal task is:

$$L_{identity} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert F(I) - F\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $F$ is the face image feature extractor.
The invention uses two sub-adversarial networks for the two different tasks and can complete facial expression generation and facial expression removal at the same time. It uses image-level, feature-level, and cycle-consistency loss functions as constraints, preserving identity consistency and improving the accuracy and robustness of the network.
The invention uses the generative adversarial network as the basic model framework and designs a multi-task optimization model according to prior knowledge of facial geometry; jointly optimizing the two tasks of facial expression synthesis and facial expression removal speeds up the convergence of network training and effectively improves the generalization ability of the model.
For the specific problems in face synthesis, facial expression synthesis, and facial expression removal, the invention provides a widely applicable method: the geometric-prior multi-task adversarial network completes facial expression generation and removal simultaneously, and the recovered expressionless face image is used for expression-invariant face recognition.
The geometric-prior generative adversarial network model of the invention adopts multi-task, multi-objective optimization, so the model converges faster, performs better, and generalizes more strongly. In addition, the facial expression synthesis scheme of the invention preserves identity features well.
Drawings
Fig. 1 is a flow chart of the facial expression synthesis method based on a geometric-prior generative adversarial network of the invention.
Fig. 2 shows examples of facial expression generation and removal results on the CK+ expression database compared with real images, in which the upper two rows are real images and the lower two rows are generated images.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
The geometric-prior adversarial network of the invention produces a set of highly non-linear transforms for the tasks of facial expression generation and facial expression removal while preserving the original identity features well.
Fig. 1 is a flowchart of the facial expression synthesis method based on a geometric-prior generative adversarial network according to the invention; as shown in Fig. 1, the method comprises the following steps:
Step S1: extract the 68 facial feature points of each image in the data, normalize the image according to the positions of the two eyes, and crop it to a uniform size of 144 × 144; extract the 68 facial feature points of the cropped image again and produce the face key-point heatmap; finally, combine the key-point heatmap and the face image as the input data (a sketch of the heatmap construction is given below).
Step S2: train the geometric-prior generative adversarial network model with the training input data to complete the tasks of facial expression generation and facial expression removal.
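A minimal sketch of the key-point heatmap construction of step S1, assuming the 68 landmarks have already been extracted (for example with a standard 68-point face landmark detector) and scaled to the 144 × 144 crop; the Gaussian width `sigma` and the channel-wise stacking are assumptions, since the patent does not specify how the heatmap is rendered or combined:

```python
import numpy as np

def landmark_heatmap(landmarks, size=144, sigma=2.0):
    """Render 68 (x, y) landmark coordinates as a single-channel heatmap
    by placing a small Gaussian bump at each point."""
    yy, xx = np.mgrid[0:size, 0:size].astype(np.float32)
    heat = np.zeros((size, size), dtype=np.float32)
    for x, y in landmarks:  # landmarks: (68, 2) array of pixel coordinates
        heat = np.maximum(heat, np.exp(-((xx - x) ** 2 + (yy - y) ** 2)
                                       / (2.0 * sigma ** 2)))
    return heat

def make_input(face_rgb, landmarks):
    """Stack the heatmap with the RGB face image (assumed channel-wise
    concatenation) to form the combined network input of step S1."""
    heat = landmark_heatmap(landmarks)
    return np.concatenate([face_rgb, heat[..., None]], axis=-1)  # (144, 144, 4)
```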
To enlarge the effective number of input samples and improve the generalization ability of the network, 128 × 128 images are randomly cropped from the 144 × 144 images during training, and 128 × 128 images are center-cropped from the 144 × 144 images during testing.
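In torchvision terms (an assumed equivalent, not the patent's code), the two cropping regimes are:

```python
import torchvision.transforms as T

train_crop = T.RandomCrop(128)   # training: random 128x128 patch of the 144x144 image
test_crop  = T.CenterCrop(128)   # testing: deterministic 128x128 center crop
```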
In the generator of the adversarial network, the invention first performs down-sampling with a convolutional neural network structure and then performs up-sampling, so that the generated image has the same size as the input.
In one embodiment, the convolutional network consists of 13 convolutional layers; each layer uses 4 × 4 filters with stride 2 and padding 1, the number of filters first increases and then decreases, and forward (skip) connections link down-sampling and up-sampling layers that have the same number of filters and the same output size. The number of convolutional layers and the number and size of filters per layer can be chosen according to the actual situation.
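The following PyTorch sketch illustrates this encoder-decoder pattern with skip connections; it is abbreviated to three down/up stages rather than the 13 layers of the embodiment, and the channel counts (`base=64`) and the 4-channel input (RGB plus heatmap) are assumptions:

```python
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    """Simplified sketch of the generator: 4x4 convolutions with stride 2
    and padding 1, filter counts rising then falling, and skip connections
    between mirrored layers of equal size and filter count."""
    def __init__(self, in_ch=4, out_ch=3, base=64):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, 2, 1),
                                   nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2))
        self.down3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 4, 2, 1),
                                   nn.BatchNorm2d(base * 4), nn.LeakyReLU(0.2))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1),
                                 nn.BatchNorm2d(base * 2), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(base * 4, base, 4, 2, 1),
                                 nn.BatchNorm2d(base), nn.ReLU())
        self.up3 = nn.Sequential(nn.ConvTranspose2d(base * 2, out_ch, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        d1 = self.down1(x)   # 128 -> 64
        d2 = self.down2(d1)  # 64 -> 32
        d3 = self.down3(d2)  # 32 -> 16
        u1 = self.up1(d3)    # 16 -> 32
        u2 = self.up2(torch.cat([u1, d2], dim=1))    # skip: same size and filters
        return self.up3(torch.cat([u2, d1], dim=1))  # output matches input size
```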
In the discriminator, a convolutional neural network takes the real and fake sample tuples as input, and the output uses a patch-based adversarial loss to judge real versus fake.
In one embodiment, a 14 × 14 feature map is used to judge authenticity.
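A patch-based discriminator consistent with these numbers might look as follows; the 7 input channels (face image + heatmap + candidate image) and the layer widths are assumptions, but the kernels and strides are chosen so that a 128 × 128 input yields a 14 × 14 score map:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Sketch of a patch discriminator producing a 14x14 map of per-patch
    real/fake scores for a 128x128 input tuple."""
    def __init__(self, in_ch=7, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2),        # 128 -> 64
            nn.Conv2d(base, base * 2, 4, 2, 1),
            nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2),               # 64 -> 32
            nn.Conv2d(base * 2, base * 4, 4, 2, 1),
            nn.BatchNorm2d(base * 4), nn.LeakyReLU(0.2),               # 32 -> 16
            nn.Conv2d(base * 4, base * 8, 4, 1, 1),
            nn.BatchNorm2d(base * 8), nn.LeakyReLU(0.2),               # 16 -> 15
            nn.Conv2d(base * 8, 1, 4, 1, 1), nn.Sigmoid(),             # 15 -> 14
        )

    def forward(self, image, heatmap, candidate):
        # the tuple [I, H, candidate] is concatenated channel-wise
        return self.net(torch.cat([image, heatmap, candidate], dim=1))
```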
In this step, two pairs of generators and discriminators in the adversarial network are trained to complete the expression generation and removal tasks simultaneously; for the generation task, the expressionless face and the expression key-point information form the network input and the real expressive face is the label, while for the removal task the expressive face and the key-point information form the input and the real expressionless face is the label.
For the expression generation task, the input is processed by the generator to obtain a generated expressive face, whose adversarial loss against the real expressive face is computed in the discriminator.
For the expression removal task, the real expressive image is processed by the generator to obtain a generated expressionless face, whose adversarial loss against the real expressionless face is computed in the discriminator; model training is complete after many iterations, once the model has stabilized.
Exploiting the strong non-linear fitting ability of adversarial networks, the invention constructs, for the two tasks of facial expression generation and removal, an adversarial network that takes the expressionless and expressive images as input and outputs the images with the facial expression generated or removed.
In particular, the network preserves the identity information of the face well through the constraint of additional loss functions. Thus, with the network shown in Fig. 1, an adversarial network that simultaneously generates and removes expressions can be trained on the two related tasks, enabling expression editing and recovery of the expressionless face image.
In the testing stage, the expressionless face and an expression heatmap are first used as input to obtain the visual results shown in Fig. 2; then the expressive face and an expressionless heatmap are used as input to generate an expressionless face, and an expression-invariant face recognition experiment is performed, with the results shown in Table 1. An assumed sketch of this test-time usage follows.
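In the sketch below, the two trained generators are called `G1` (synthesis) and `G2` (removal); the tensor names and the `recognizer` are placeholders, not names from the patent:

```python
import torch

with torch.no_grad():
    # expression synthesis: neutral face + target-expression heatmap
    expressive = G1(torch.cat([neutral_face, expr_heatmap], dim=1))
    # expression removal: expressive face + neutral heatmap, as in the
    # recognition experiment of Table 1
    recovered = G2(torch.cat([expressive_face, neutral_heatmap], dim=1))
    identity_features = recognizer(recovered)  # expression-invariant recognition
```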
Specifically, the two tasks of the geometric-prior adversarial network are facial expression generation and facial expression removal. The objective functions of the generator and the discriminator in the expression generation task are expressed as follows:

$$L_{G_1} = L_{G_1\text{-}adv} + \alpha_1 L_{pixel} + \alpha_2 L_{cyc} + \alpha_3 L_{identity}$$

$$L_{D_1} = -\mathbb{E}_{I,H,I' \sim P(I,H,I')}\log D_1(I,H,I') - \mathbb{E}_{I,H \sim P(I,H)}\log\big(1 - D_1(I,H,G_1(I,H))\big)$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G_1$ is the expression synthesis generator, $D_1$ is the expression synthesis discriminator, $I'$ is the real expressive image, $L_{G_1\text{-}adv}$ is the adversarial loss between the generated image and the real image, $L_{pixel}$ is the inter-image $L_1$ norm term, $L_{cyc}$ is the cycle-consistency $L_1$ norm term, $L_{identity}$ is the inter-feature-layer $L_1$ norm term, and $\alpha_1, \alpha_2, \alpha_3$ are the loss weight coefficients.

The objective functions of the generator and the discriminator in the expression removal task are expressed as follows:

$$L_{G_2} = L_{G_2\text{-}adv} + \alpha_1 L_{pixel} + \alpha_2 L_{cyc} + \alpha_3 L_{identity}$$

$$L_{D_2} = -\mathbb{E}_{I,H,I' \sim P(I,H,I')}\log D_2(I,H,I') - \mathbb{E}_{I,H \sim P(I,H)}\log\big(1 - D_2(I,H,G_2(I,H))\big)$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G_2$ is the expression removal generator, $D_2$ is the expression removal discriminator, $I'$ is the real expressionless image, $L_{G_2\text{-}adv}$ is the adversarial loss between the generated image and the real image, $L_{pixel}$ is the inter-image $L_1$ norm term, $L_{cyc}$ is the cycle-consistency $L_1$ norm term, $L_{identity}$ is the inter-feature-layer $L_1$ norm term, and $\alpha_1, \alpha_2, \alpha_3$ are the loss weight coefficients.

The adversarial network model mainly completes the two tasks of facial expression generation and facial expression removal, and the final goal of training is that the loss functions $L_{G_1}$, $L_{D_1}$, $L_{G_2}$, and $L_{D_2}$ are all minimized and remain stable.
The real and fake sample tuples input into the discriminator in the expression generation task are as follows:

the fake tuple is [I, H, G(I, H)] and the real tuple is [I, H, I'],

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $I'$ is the real expressive image.

The real and fake sample tuples input into the discriminator in the expression removal task are as follows:

the fake tuple is [I, H, G(I, H)] and the real tuple is [I, H, I'],

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $I'$ is the real expressionless image.

The generator adversarial loss function in the expression synthesis task is:

$$L_{G_1\text{-}adv} = -\mathbb{E}_{I,H \sim P(I,H)}\log D\big(I,H,G(I,H)\big)$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $D$ is the expression synthesis discriminator.

The pixel loss function between image pairs in the expression synthesis task is:

$$L_{pixel} = \mathbb{E}_{I,H,I' \sim P(I,H,I')}\,\lVert I' - G(I,H)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $I'$ is the real expressive image.

The image cycle-consistency loss function in the expression synthesis task is:

$$L_{cyc} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert I - G'\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $G'$ is the expression removal generator.

The image-pair feature-layer loss function in the expression synthesis task is:

$$L_{identity} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert F(I) - F\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $F$ is the face image feature extractor.

The generator adversarial loss function in the expression removal task is:

$$L_{G_2\text{-}adv} = -\mathbb{E}_{I,H \sim P(I,H)}\log D\big(I,H,G(I,H)\big)$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $D$ is the expression removal discriminator.

The pixel loss function between image pairs in the expression removal task is:

$$L_{pixel} = \mathbb{E}_{I,H,I' \sim P(I,H,I')}\,\lVert I' - G(I,H)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $I'$ is the real expressionless face.

The image cycle-consistency loss function in the expression removal task is:

$$L_{cyc} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert I - G'\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $G'$ is the expression synthesis generator.

The image-pair feature-layer loss function in the expression removal task is:

$$L_{identity} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert F(I) - F\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $F$ is the face image feature extractor.
The adversarial network model is obtained by the following training:
Step S21: initialize the network weight parameters of the expression generation and removal tasks identically, with $\alpha_1, \alpha_2, \alpha_3 = 10, 5, 0.1$, batch size 5, and learning rate $10^{-4}$;
Step S22: for the expression generation task, the expressionless face and the expression key-point information are combined and input into the generator network $G_1$; the generated expressive image, the expressionless face, and the expression key-point information form a fake sample, and the real expressive face, the expressionless face, and the expression key-point information form a real sample; the fake and real samples are input into the discriminator network $D_1$, and the two networks are trained alternately by iteration until the generator loss $L_{G_1}$ and the discriminator loss $L_{D_1}$ both decrease and stabilize;
Step S23: for the expression removal task, the real expressive face and the expression key-point information are combined and input into the generator network $G_2$ to perform the removal of the expression; the generated expressionless image, the real expressive face, and the expression key-point information form a fake sample, and the real expressionless face, the real expressive face, and the expression key-point information form a real sample; the fake and real samples are input into the discriminator network $D_2$, and the two networks are trained alternately by iteration until the generator loss $L_{G_2}$ and the discriminator loss $L_{D_2}$ both decrease and stabilize;
Step S24: train the expression generation and removal tasks simultaneously until none of the loss functions decreases any further, so as to obtain the final adversarial network model.
Step S3: and generating a network model by using the trained confrontation, performing expression editing and removing processing on the test data, and performing expression-invariant face recognition on the image subjected to the expression removing operation.
To explain the specific embodiment of the invention in detail and verify its effectiveness, the proposed method is applied to a public face database, the CK+ facial expression database. The database contains 123 subjects and 593 video sequences covering 6 expressions, with the expression intensity increasing gradually from the first frame to the last.
Specifically, to train the geometric-prior adversarial network model for expression editing, the first frame and the second half of the images of each video sequence are selected as the data set; split by identity, the images of 100 subjects form the training set and the images of 23 subjects form the test set. Key points are extracted with a face key-point extraction network and processed into heatmap form; using the network structure and objective functions of the method, the expressionless image and the expression key-point heatmap are combined as input, and the neural network is trained through the adversarial game and gradient back-propagation between generators and discriminators. The weights of the different tasks are adjusted continuously during training until the network converges, yielding the model for facial expression editing.
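An assumed outline of the alternating two-task training schedule described above, reusing the hypothetical `adversarial_step` helper sketched in the Background section (the generators, discriminators, optimizers, and data loader are presumed already constructed; the choice of heatmap for the removal input follows the test-stage description):

```python
import torch

for epoch in range(num_epochs):
    for neutral, expressive, expr_heat, neutral_heat in loader:
        # expression generation task: G1/D1, conditioned on the neutral
        # face plus the target-expression heatmap
        adversarial_step(G1, D1,
                         x=torch.cat([neutral, expr_heat], dim=1),
                         y=expressive, opt_G=opt_G1, opt_D=opt_D1)
        # expression removal task: G2/D2
        adversarial_step(G2, D2,
                         x=torch.cat([expressive, neutral_heat], dim=1),
                         y=neutral, opt_G=opt_G2, opt_D=opt_D2)
```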
To test the effectiveness of the model, expression generation and removal were performed with the test-set data; the visualization results are shown in Fig. 2. Fig. 2 shows examples of facial expression generation and removal results on the CK+ expression database compared with real images, in which the upper two rows are real images and the lower two rows are generated images.
Meanwhile, to verify that the proposed model preserves identity information, the test set is adjusted: the combination of an expressive image and a key-point heatmap is used as input to obtain the expression-removed image, and the expressionless image is then used for an expression-invariant face recognition experiment.
In the experiment, the influence of different loss functions on model performance is verified and compared with the recognition results on the original images; the experimental results are shown in Table 1. This embodiment effectively demonstrates the effectiveness of the method for expression editing and identity preservation.
[Table 1: expression-invariant face recognition results under different loss-function configurations, compared with the original images; the table is reproduced only as an image in the original publication.]
The foregoing is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as within the protection scope of the invention.

Claims (15)

1. A facial expression synthesis method based on a geometric-prior generative adversarial network, characterized by comprising the following steps:
S1, preprocessing the image data of an expression database, extracting expression key points from the face images, and producing face key-point heatmaps;
S2, taking the face image and the face key-point heatmap as the input of the network model, and training two pairs of generators and discriminators in the adversarial network to complete the expression generation and removal tasks simultaneously, thereby obtaining a geometric-prior generative adversarial network model capable of both expression generation and expression removal; wherein,
for the expression generation task, the input is processed by a generator to obtain a generated expressive face, and its adversarial loss against the real expressive face is computed in a discriminator; for the expression removal task, the input is processed by a generator to obtain a generated expressionless face, and its adversarial loss against the real expressionless face is computed in a discriminator; training of the model is complete after repeated iterations reach stability;
S3, using the trained geometric-prior generative adversarial network model to perform expression generation and removal on the test data, and performing expression-invariant face recognition on the expression-removed images.
2. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 1, wherein in the expression generation task the face key-point heatmap obtained from the expression key-point information is combined with the expressionless image as the input of the network model and the expressive image serves as the real-image label, and in the expression removal task the face key-point heatmap obtained from the expression key-point information is combined with the expressive image as the input of the network model and the expressionless image serves as the real-image label.
3. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 1, wherein step S2 comprises:
S21: initializing the network weight parameters for both the expression generation and expression removal tasks, wherein the loss functions of the generators are $L_{G_1}$ and $L_{G_2}$ and the loss functions of the discriminators are $L_{D_1}$ and $L_{D_2}$;
S22: for the expression generation task, combining the expressionless face and the expression key-point information as input to the generator network $G_1$; the generated expressive image, the expressionless face, and the expression key-point information form a fake sample, and the real expressive face, the expressionless face, and the expression key-point information form a real sample; the fake and real samples are input into the discriminator network $D_1$, and the networks are trained alternately by iteration until the generator loss $L_{G_1}$ and the discriminator loss $L_{D_1}$ both decrease and stabilize;
S23: for the expression removal task, combining the real expressive face and the expression key-point information as input to the generator network $G_2$ to perform the removal of the expression; the generated expressionless image, the real expressive face, and the expression key-point information form a fake sample, and the real expressionless face, the real expressive face, and the expression key-point information form a real sample; the fake and real samples are input into the discriminator network $D_2$, and the networks are trained alternately by iteration until the generator loss $L_{G_2}$ and the discriminator loss $L_{D_2}$ both decrease and stabilize;
S24: training the expression generation and expression removal tasks simultaneously until none of the loss functions decreases any further, so as to obtain the final adversarial network model.
4. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 3, wherein the objective functions of the generator and the discriminator in the expression generation task are expressed as follows:

$$L_{G_1} = L_{G_1\text{-}adv} + \alpha_1 L_{pixel} + \alpha_2 L_{cyc} + \alpha_3 L_{identity}$$

$$L_{D_1} = -\mathbb{E}_{I,H,I' \sim P(I,H,I')}\log D_1(I,H,I') - \mathbb{E}_{I,H \sim P(I,H)}\log\big(1 - D_1(I,H,G_1(I,H))\big)$$

wherein $I$ is the real expressionless image in the expression synthesis discriminator, $H$ is the expression face heatmap, $G_1$ is the expression synthesis generator, $D_1$ is the expression synthesis discriminator, $I'$ is the real expressive image in the expression synthesis discriminator, $L_{G_1\text{-}adv}$ is the adversarial loss between the generated image and the real image, $L_{pixel}$ is the inter-image $L_1$ norm term, $L_{cyc}$ is the cycle-consistency $L_1$ norm term, $L_{identity}$ is the inter-feature-layer $L_1$ norm term, $\alpha_1, \alpha_2, \alpha_3$ are the loss weight coefficients, and $\mathbb{E}_{I,H,I' \sim P(I,H,I')}$ is the expectation over the joint distribution of the real expressionless image, the expression face heatmap, and the real expressive image in the expression synthesis discriminator.
5. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 3, wherein the objective functions of the generator and the discriminator in the expression removal task are expressed as follows:

$$L_{G_2} = L_{G_2\text{-}adv} + \alpha_1 L_{pixel} + \alpha_2 L_{cyc} + \alpha_3 L_{identity}$$

$$L_{D_2} = -\mathbb{E}_{I,H,I' \sim P(I,H,I')}\log D_2(I,H,I') - \mathbb{E}_{I,H \sim P(I,H)}\log\big(1 - D_2(I,H,G_2(I,H))\big)$$

wherein $I$ is the real expressive image in the expression removal discriminator, $H$ is the expression face heatmap, $G_2$ is the expression removal generator, $D_2$ is the expression removal discriminator, $I'$ is the real expressionless image in the expression removal discriminator, $L_{G_2\text{-}adv}$ is the adversarial loss between the generated image and the real image, $L_{pixel}$ is the inter-image $L_1$ norm term in the expression removal discriminator, $L_{cyc}$ is the cycle-consistency $L_1$ norm term in the expression removal discriminator, $L_{identity}$ is the inter-feature-layer $L_1$ norm term in the expression removal discriminator, and $\alpha_1, \alpha_2, \alpha_3$ are the loss weight coefficients.
6. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 4, wherein the real and fake sample tuples input into the discriminator in the expression generation task are:

the fake tuple is [I, H, G(I, H)] and the real tuple is [I, H, I'],

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $I'$ is the real expressive image.
7. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 5, wherein the real and fake sample tuples input into the discriminator in the expression removal task are:

the fake tuple is [I, H, G(I, H)] and the real tuple is [I, H, I'],

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $I'$ is the real expressionless image.
8. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 4, wherein the generator adversarial loss function in the expression generation task is:

$$L_{G_1\text{-}adv} = -\mathbb{E}_{I,H \sim P(I,H)}\log D\big(I,H,G(I,H)\big)$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $D$ is the expression synthesis discriminator.
9. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 4, wherein the pixel loss function between image pairs in the expression generation task is:

$$L_{pixel} = \mathbb{E}_{I,H,I' \sim P(I,H,I')}\,\lVert I' - G(I,H)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $I'$ is the real expressive image.
10. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 4, wherein the image cycle-consistency loss function in the expression generation task is:

$$L_{cyc} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert I - G'\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $G'$ is the expression removal generator.
11. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 4, wherein the image-pair feature-layer loss function in the expression generation task is:

$$L_{identity} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert F(I) - F\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressionless image, $H$ is the expression face heatmap, $G$ is the expression synthesis generator, and $F$ is the face image feature extractor.
12. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 5, wherein the generator adversarial loss function in the expression removal task is:

$$L_{G_2\text{-}adv} = -\mathbb{E}_{I,H \sim P(I,H)}\log D\big(I,H,G(I,H)\big)$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $D$ is the expression removal discriminator.
13. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 5, wherein the pixel loss function between image pairs in the expression removal task is:

$$L_{pixel} = \mathbb{E}_{I,H,I' \sim P(I,H,I')}\,\lVert I' - G(I,H)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $I'$ is the real expressionless face.
14. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 5, wherein the image cycle-consistency loss function in the expression removal task is:

$$L_{cyc} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert I - G'\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $G'$ is the expression synthesis generator.
15. The facial expression synthesis method based on a geometric-prior generative adversarial network as claimed in claim 5, wherein the image-pair feature-layer loss function in the expression removal task is:

$$L_{identity} = \mathbb{E}_{I,H \sim P(I,H)}\,\lVert F(I) - F\big(G(I,H)\big)\rVert_1$$

wherein $I$ is the real expressive image, $H$ is the expression face heatmap, $G$ is the expression removal generator, and $F$ is the face image feature extractor.
CN202011623820.8A (filed 2020-12-30, priority 2020-12-30): Facial expression synthesis method based on a geometric-prior generative adversarial network. Status: Pending. Publication: CN112634392A (en).

Priority Applications (1)

CN202011623820.8A, priority date 2020-12-30, filing date 2020-12-30: Facial expression synthesis method based on a geometric-prior generative adversarial network

Publications (1)

CN112634392A, published 2021-04-09

Family

ID=75289729

Country Status (1)

CN: CN112634392A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party

CN108230239A *, priority 2017-12-25, published 2018-06-29: Facial expression synthesis device
WO2020029356A1 *, priority 2018-08-08, published 2020-02-13: Method employing generative adversarial network for predicting face change

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

JIE CAO; YIBO HU; BING YU; RAN HE et al.: "3D Aided Duet GANs for Multi-View Face Image Synthesis", Geometry Guided Adversarial Facial Expression Synthesis *
RAN HE; ZHENAN SUN et al.: "Adversarial Cross-Spectral Face Completion for NIR-VIS Face Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence *
SONG, LINGXIAO; LU, ZHIHE; HE, RAN et al.: "Geometry Guided Adversarial Facial Expression Synthesis", 26th ACM Multimedia Conference (MM) *


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 2021-04-09)