CN111783658A - Two-stage expression animation generation method based on double generation countermeasure network - Google Patents
- Publication number: CN111783658A
- Application number: CN202010621885.2A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/174 — Facial expression recognition
- G06V40/176 — Dynamic expression
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T13/40 — 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G06V40/168 — Feature extraction; Face representation
- G06V40/171 — Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a two-stage expression animation generation method based on a dual generative adversarial network. In the first stage, an expression migration network, faceGAN, extracts the expression features from a target expression contour map and migrates them to the source face, generating a first-stage prediction image. In the second stage, a detail generation network, FineGAN, supplements and enriches the details of the eye and mouth regions, which contribute most to expression change, in the first-stage prediction image, generating a fine-grained second-stage prediction image that is synthesized into a facial video animation; both the expression migration network faceGAN and the detail generation network FineGAN are implemented as generative adversarial networks. The method generates the expression animation with adversarial networks in two stages, performing expression conversion in the first stage and optimizing image details in the second; the designated regions of the image are extracted by mask vectors for targeted optimization, and the combined use of local discriminators further improves the generation of these important parts.
Description
Technical Field
The technical scheme of the invention relates to image data processing in computer vision, and in particular to a two-stage expression animation generation method based on a dual generative adversarial network.
Background
Facial expression synthesis refers to transferring an expression from a target reference face to a source face: the identity information of the newly synthesized source face image remains unchanged, while its expression is kept consistent with the target reference face. The technique is gradually being applied in film and television production, virtual reality, criminal investigation, and other fields. Facial expression synthesis has important research value in both academia and industry, and how to robustly synthesize natural, vivid facial expressions has become a challenging and popular research topic.
Existing facial expression synthesis methods fall into two categories: traditional graphics methods and deep-learning-based image generation methods. Traditional graphics methods generally parameterize the source face image with a parametric model and design a model to perform the expression conversion and generate a new image, or warp the face image using feature correspondences and optical-flow maps to assemble a face patch from existing expression data. However, designing such models is intricate and laborious, incurs a very high computational cost, and generalizes poorly.
The second category is expression synthesis based on deep learning. A deep neural network first extracts facial features, mapping the image from high-dimensional space to a feature vector; the source expression features are then changed by adding an expression label, and a deep neural network synthesizes the target face image by mapping back to high-dimensional space. The appearance of GAN networks then brought the dawn of sharp image synthesis and attracted great attention as soon as they were proposed. In the field of image synthesis, a large number of GAN variants have been introduced to generate images. For example, a conditional generative adversarial network (CGAN) can generate an image under specific supervision information; in facial expression generation, an expression label can serve as the conditional supervision information to generate face images with different expressions. At present, GAN-based methods still have shortcomings: when generating expression animation, they may produce unreasonable artifacts, blurry generated images, or low resolution.
Facial expression generation is image-to-image conversion; the present invention aims to generate facial animation, which is image-to-video conversion and, compared with the facial expression generation task, adds the challenge of the time dimension. Xing et al. use a gender-preserving network in "GP-GAN for Synthesizing Faces from Landmarks" so that the network can learn more gender information, but the method still falls short in preserving face identity information, and the generated face may have identity characteristics different from the target face. CN108288072A discloses a facial expression synthesis method based on a generative adversarial network that does not consider fine-grained generation of the face image and omits the extraction of the source face image's detail features, so its generated results are blurry and low-resolution. CN110084121A discloses a facial expression migration method using a cycle-consistent generative adversarial network based on spectral normalization; it supervises the training process with an expression one-hot vector, whose discreteness limits the learning ability of the network, so that the network can only learn the target emotion category, such as happiness, sadness, or surprise, but not the emotion's intensity, and the method therefore falls short in generating emotion continuously. CN105069830A discloses an expression animation generation method and device that can only generate expression animations from six fixed templates; since human expressions are very rich and complex, the method has poor extensibility and cannot generate an arbitrary specified expression animation according to user requirements.
CN107944358A discloses a face generation method based on a deep convolutional adversarial network model that cannot ensure the invariance of face identity information during the expression generation process, so the generated face may be inconsistent with the target face.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in the first stage, an expression migration network extracts the features of the target expression and migrates them to the source face to generate a first-stage prediction image; this first-stage expression migration network is named FaceGAN (Face Generative Adversarial Network). In the second stage, a detail generation network enriches certain facial details in the first-stage prediction image, generates a fine-grained second-stage prediction image, and synthesizes the video animation; this second-stage detail generation network is named FineGAN (Fine Generative Adversarial Network). The method of the invention overcomes problems of the prior art such as blurry or low-resolution generated images and unreasonable artifacts in the generated results.
The technical scheme adopted by the invention to solve this technical problem is as follows: in the first stage, driven by the target expression contour map, the expression migration network faceGAN captures the expression features in the target expression contour map and migrates them to the source face to generate a first-stage prediction image; in the second stage, the detail generation network FineGAN serves as a supplement, enriching the details of the eye and mouth regions, which contribute relatively most to expression change, in the first-stage prediction image, generating a fine-grained second-stage prediction image, and synthesizing the facial animation. The specific steps are as follows:
firstly, acquiring a facial expression profile of each frame of image in a data set:
collecting a facial expression video sequence data set, using the Dlib machine learning library to extract the face in each frame of a video sequence and simultaneously obtain a set of feature points for each face, and then connecting the feature points in order with line segments to obtain the expression contour map of each frame of the video sequence, recorded as e = (e_1, e_2, ···, e_i, ···, e_n), where e denotes the set of all expression contour maps in a video sequence, i.e. the expression contour map sequence, n denotes the number of video frames, and e_i denotes the expression contour map of the i-th frame of the video sequence;
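The step above can be sketched as follows. The 68-point grouping is Dlib's standard landmark layout, but the closed-curve handling for eyes and mouth is a simplification of ours, not the patent's code, and the landmark values used below are hypothetical:

```python
# Illustrative sketch: building an expression contour map e_i from facial
# feature points by connecting them in order with line segments.

# Index ranges of Dlib's 68-point layout, grouped by facial component.
COMPONENTS = {
    "jaw": range(0, 17),
    "right_brow": range(17, 22),
    "left_brow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}

def contour_segments(landmarks):
    """Connect the feature points of each component in order with line
    segments, yielding the polylines that form the expression contour map."""
    segments = []
    for name, idx in COMPONENTS.items():
        pts = [landmarks[i] for i in idx]
        # Treat eyes and mouth as closed curves (our simplification):
        if name in ("right_eye", "left_eye", "mouth"):
            pts.append(pts[0])
        segments.extend(list(zip(pts, pts[1:])))
    return segments

def contour_sequence(landmark_frames):
    """e = (e_1, ..., e_n): one contour map per video frame."""
    return [contour_segments(frame) for frame in landmark_frames]
```

In a real pipeline the `landmarks` would come from Dlib's face detector and shape predictor applied to each video frame.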
the first stage is to build an expression migration network faceGAN, and the method comprises the following steps:
secondly, extracting the identity characteristics of the source face and the expression characteristics of the target expression profile graph, and preliminarily generating a first-stage prediction graph:
the faceshift network faceGAN comprises a generator G1And a discriminator D1Wherein the generator G1Comprising three sub-networks, respectively two encoders Encid and EncexpA decoder Dec1;
Firstly, a neutral expressionless image I_N of the source face and a target expression contour map sequence e are input; the identity encoder Enc_id then extracts the identity feature vector f_id of the neutral expressionless source-face image I_N, while the expression encoder Enc_exp extracts the set of expression feature vectors f_exp of the target expression contour map sequence e, where f_exp = (f_exp_1, f_exp_2, ···, f_exp_i, ···, f_exp_n). In formulas:

f_id = Enc_id(I_N) (1),

f_exp_i = Enc_exp(e_i) (2),

The identity feature vector f_id and the expression feature vector f_exp_i of the i-th frame are concatenated to obtain the feature vector f, i.e. f = f_id + f_exp_i; the feature vector f is fed to the decoder Dec1 for decoding, generating the first-stage prediction image I_pre-target, with I_pre-target = Dec1(f); finally, I_pre-target is input to the discriminator D1 to judge whether the image is real or fake;
and thirdly, taking the prediction image of the first stage as input, and adopting the concept of cycleGAN to reconstruct a neutral image of the source face:
the first-stage prediction image I_pre-target and the expression contour map e_N corresponding to the neutral expressionless image I_N in the second step are used as the input of faceGAN again; the identity encoder Enc_id extracts the identity features of the image I_pre-target, the expression encoder Enc_exp extracts the expression features of the expression contour map e_N, the processing of the second step is repeated, and the decoder generates the reconstructed image I_recon of I_N. The generation of the reconstructed image I_recon is expressed as:

I_recon = Dec1(Enc_id(I_pre-target) + Enc_exp(e_N)) (3);
fourthly, calculating a loss function in the faceGAN of the first-stage expression migration network:
the generator G in the FaceGAN of the first-stage expression migration network1The specific formula of the loss function is as follows:
wherein ,
wherein ,IrealFor the target true value, equation (5) is the penalty of the generator, D1(. represents) the discriminator D1The method comprises the steps that an object is true probability, an SSIM (structure description language) function in a formula (6) is used for measuring similarity between two images, a formula (7) is pixel loss, an MAE (mean square error) function is a mean square error function and is used for measuring a difference between a true value and a predicted value, a formula (8) is sensing loss, sensing characteristics of the images are extracted by using VGG-19, characteristics output by a last convolution layer in a VGG-19 network are used as sensing characteristics of the images, the sensing loss between the images and the true images is calculated and generated by the method, a formula (9) is reconstruction loss, and a neutral expressionless image I of a source face is calculatedNAnd its reconstructed image IreconThe distance between them;
The loss function of the discriminator D1 in the first-stage expression migration network faceGAN is specified by formulas (10) to (12), where formula (11) is the adversarial loss and formula (12) is the adversarial loss of the reconstructed image; λ1 and λ2 are the weight parameters of the similarity loss and the perceptual loss in the faceGAN generator G1, and λ3 is the weight parameter of the reconstructed image's adversarial loss in the faceGAN discriminator loss;
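The original formula images for (4) to (12) do not survive in this text. The following standard forms are a reconstruction consistent with the terms described above; the patent's exact formulations and weight placements may differ. Here φ(·) denotes the last-convolutional-layer features of VGG-19:

```latex
\begin{aligned}
L_{G_1} &= L^{G_1}_{adv} + \lambda_1 L_{ssim} + L_{pix} + \lambda_2 L_{per} + L_{recon} &(4)\\
L^{G_1}_{adv} &= -\log D_1(I_{pre\text{-}target}) &(5)\\
L_{ssim} &= 1 - \mathrm{SSIM}(I_{pre\text{-}target},\, I_{real}) &(6)\\
L_{pix} &= \mathrm{MAE}(I_{pre\text{-}target},\, I_{real}) &(7)\\
L_{per} &= \lVert \phi(I_{pre\text{-}target}) - \phi(I_{real}) \rVert_1 &(8)\\
L_{recon} &= \lVert I_{recon} - I_N \rVert_1 &(9)\\
L_{D_1} &= L^{D_1}_{adv} + \lambda_3 L^{recon}_{adv} &(10)\\
L^{D_1}_{adv} &= -\log D_1(I_{real}) - \log\bigl(1 - D_1(I_{pre\text{-}target})\bigr) &(11)\\
L^{recon}_{adv} &= -\log\bigl(1 - D_1(I_{recon})\bigr) &(12)
\end{aligned}
```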
and building a detail generation network FineGAN of the second stage, wherein the steps from the fifth step to the seventh step are as follows:
fifthly, local mask vectors adaptive to individuals are generated:
using the feature points of each face obtained in the first step, the eye region I_eye and the mouth region I_mouth are extracted, and an eye mask vector M_eye and a mouth mask vector M_mouth are set respectively. Taking the eyes as an example, the eye mask vector M_eye is constructed by setting the pixel values of the eye region in the image to 1 and the pixel values of all other regions to 0; the mouth mask vector M_mouth is formed in the same way as the eye mask vector M_eye;
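A minimal sketch of such an individual-adaptive mask, assuming a simple padded bounding box around hypothetical landmark points (the patent does not specify the exact mask geometry; a real implementation might use a polygon fill over the landmark contour instead):

```python
# Sketch: build the binary eye mask M_eye (1 inside the eye region, 0 elsewhere)
# from landmark points. The bounding-box construction is our assumption.
import numpy as np

def region_mask(shape, points, pad=2):
    """Binary mask: 1 inside the padded bounding box of the points, else 0."""
    mask = np.zeros(shape, dtype=np.float32)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    y0, y1 = max(min(ys) - pad, 0), min(max(ys) + pad, shape[0] - 1)
    x0, x1 = max(min(xs) - pad, 0), min(max(xs) + pad, shape[1] - 1)
    mask[y0:y1 + 1, x0:x1 + 1] = 1.0
    return mask

# Hypothetical eye landmarks (x, y) on a 64x64 image:
eye_points = [(20, 24), (28, 22), (36, 24), (28, 27)]
M_eye = region_mask((64, 64), eye_points)
```

Multiplying an image elementwise by `M_eye` keeps only the eye region, which is exactly how the local losses and local discriminators below consume the mask.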
and sixthly, inputting the prediction graph of the first stage into a network of a second stage to carry out detail optimization:
the detail generation network FineGAN comprises a generator G2 and a discriminator D2, where D2 is formed by a global discriminator D_global and two local discriminators, D_eye and D_mouth;
the first-stage prediction image I_pre-target and the neutral expressionless image I_N from the second step are input to the generator G2, which generates a second-stage prediction image I_target with richer facial detail; the second-stage prediction image I_target is then input to the three discriminators simultaneously: the global discriminator D_global performs global discrimination on I_target, driving it as close as possible to the target real image I_real, while the eye local discriminator D_eye and the mouth local discriminator D_mouth further optimize the eye and mouth regions of I_target to make it more lifelike. The second-stage prediction image I_target is expressed as:

I_target = G2(I_pre-target, I_N) (13);
and seventhly, calculating the loss functions in the second-stage FineGAN:
The loss function of the generator G2 is specified by formulas (14) to (19), where formula (15) is the adversarial loss, including a global adversarial loss and local adversarial losses, and the operator ⊙ is the Hadamard product; formula (16) is the pixel loss; formulas (17) and (18) are the local pixel losses, computing the L1 norm of the pixel difference between a local region of the generated image and the corresponding local region of the real image; formula (19) is the local perceptual loss; the total loss function of the generator G2 is the weighted sum of these loss functions;
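The mask-based local pixel loss of formulas (17) and (18) can be sketched as follows; normalizing by the mask area is our assumption, since the original formulas are not reproduced here:

```python
# Sketch: local pixel loss as the L1 norm of the Hadamard product of the mask
# with the pixel difference between generated and real images.
import numpy as np

def local_pixel_loss(generated, real, mask):
    diff = mask * np.abs(generated - real)   # Hadamard product M ⊙ |I_target - I_real|
    return float(diff.sum() / max(mask.sum(), 1.0))

gen = np.zeros((4, 4))                       # toy "generated" image
real = np.ones((4, 4))                       # toy "real" image
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                         # toy 2x2 local region, e.g. an eye
```

Because the mask zeroes out everything outside the region, gradients from this term only push the generator to improve the masked area, which is the point of the targeted optimization.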
The loss function of the discriminator D2 is specified by formulas (20) to (23), where formula (21) is the adversarial loss of the global discriminator and formulas (22) and (23) are the adversarial losses of the local discriminators; λ4 and λ5 are the weight parameters of the local adversarial losses in the FineGAN generator G2, λ6 and λ7 are the weight parameters of the eye pixel loss L_eye and the mouth pixel loss L_mouth in the FineGAN generator G2, λ8 is the weight parameter of the local perceptual loss in the FineGAN generator G2, and λ9 is the weight parameter of the global adversarial loss in the FineGAN discriminator D2;
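As with the first stage, the formula images for (14) to (23) are missing from this text. The following standard forms are a reconstruction consistent with the description above, and the patent's exact formulations may differ. M_eye and M_mouth are the mask vectors, ⊙ the Hadamard product, and φ(·) the VGG-19 perceptual features:

```latex
\begin{aligned}
L_{G_2} &= L^{G_2}_{adv} + L_{pix} + \lambda_6 L_{eye} + \lambda_7 L_{mouth} + \lambda_8 L^{local}_{per} &(14)\\
L^{G_2}_{adv} &= -\log D_{global}(I_{target}) - \lambda_4 \log D_{eye}(M_{eye} \odot I_{target}) - \lambda_5 \log D_{mouth}(M_{mouth} \odot I_{target}) &(15)\\
L_{pix} &= \lVert I_{target} - I_{real} \rVert_1 &(16)\\
L_{eye} &= \lVert M_{eye} \odot (I_{target} - I_{real}) \rVert_1 &(17)\\
L_{mouth} &= \lVert M_{mouth} \odot (I_{target} - I_{real}) \rVert_1 &(18)\\
L^{local}_{per} &= \textstyle\sum_{r \in \{eye,\,mouth\}} \lVert \phi(M_r \odot I_{target}) - \phi(M_r \odot I_{real}) \rVert_1 &(19)\\
L_{D_2} &= \lambda_9 L^{global}_{adv} + L^{eye}_{adv} + L^{mouth}_{adv} &(20)\\
L^{global}_{adv} &= -\log D_{global}(I_{real}) - \log\bigl(1 - D_{global}(I_{target})\bigr) &(21)\\
L^{eye}_{adv} &= -\log D_{eye}(M_{eye} \odot I_{real}) - \log\bigl(1 - D_{eye}(M_{eye} \odot I_{target})\bigr) &(22)\\
L^{mouth}_{adv} &= -\log D_{mouth}(M_{mouth} \odot I_{real}) - \log\bigl(1 - D_{mouth}(M_{mouth} \odot I_{target})\bigr) &(23)
\end{aligned}
```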
and eighth step, synthesizing a video:
each frame is generated independently; after all n frames (I_target_1, I_target_2, ···, I_target_i, ···, I_target_n) have been generated, the sequence of video frames is synthesized into the final facial animation;
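The independent per-frame generation and final assembly can be sketched as follows, with a stub standing in for the complete two-stage pipeline (faceGAN followed by FineGAN); the actual video encoding step is omitted:

```python
# Sketch: every frame is generated from the same neutral image (no recursion
# on previous frames), then the frames are collected into the animation.

def generate_frame(neutral_image, contour):
    """Stub for the two-stage pipeline: faceGAN then FineGAN."""
    return {"base": neutral_image, "expression": contour}

def synthesize_animation(neutral_image, contour_seq):
    # Because no frame depends on its predecessor, errors cannot propagate
    # forward through the sequence.
    frames = [generate_frame(neutral_image, e_i) for e_i in contour_seq]
    return frames  # to be written out as a video file by an encoder

video = synthesize_animation("I_N", ["e_1", "e_2", "e_3"])
```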
therefore, the generation of the two-stage expression animation based on the double-generation countermeasure network is completed, the expression in the face image is converted, and the image details are optimized.
In particular, the identity encoder Enc_id comprises 4 convolutional blocks, with a CBAM attention module added to the first 3 convolutional blocks; the expression encoder Enc_exp comprises 3 convolutional blocks, with a CBAM attention module added to the last convolutional block; and the decoder Dec1 comprises 4 deconvolutional blocks, with a CBAM attention module added to the first 3 blocks. The encoder and decoder of the network are joined by skip connections: specifically, the layer-1 output of the identity encoder Enc_id is connected to the input of the last layer of the decoder Dec1, the layer-2 output of Enc_id to the input of the second-to-last layer of Dec1, and the layer-3 output of Enc_id to the input of the third-to-last layer of Dec1. The CBAM attention modules let the network focus more on learning the important regions of the image, while the skip connections combine the high and low layers of the network so that it can learn low-level detail information such as facial texture.
In the two-stage expression animation generation method based on the dual generative adversarial network, GAN is the English abbreviation of the generative adversarial network model, in full Generative Adversarial Networks, a well-known algorithm in this technical field, and Dlib is a public library.
The beneficial effects and significant improvements of the invention compared with the prior art are as follows:
(1) compared with CN108288072A, the method of the invention has the advantages that the detail generation network can ensure the fine-grained generation of the human face animation, and two important areas of the mouth and eyes are optimized, so that the generation effect is more vivid and natural.
(2) Compared with CN110084121A, the method of the invention uses the expression contour map to supervise the learning process of the faceGAN network, so that the network can learn the continuous evolution of an expression, learn the degree of emotion, and generate smooth facial animation.
(3) Compared with CN105069830A, the method of the invention has the advantages that the target expression profile is used to guide the expression of the target expression of the network learning, the method is not limited to the type limitation of the expression, and the expression animation of any emotion required by the user can be generated.
(4) Compared with CN107944358A, the method of the invention trains the model with the ring network structure of CycleGAN and adds skip connections in faceGAN to ensure that the identity information of the generated face is consistent with the source face.
(5) According to the method, the global discriminator, the local discriminator and the local loss function (the formula (17) and the formula (18)) are arranged, so that the real degree of the whole generated image can be ensured, and two important areas, namely eyes and a mouth, can be generated in a refined mode.
(6) According to the method, the attention module and the second-stage detail generation network are added into the faceGAN, so that the local detail generation and fine-grained expression of the image are guaranteed.
The prominent substantive features of the invention are:
1) The method generates expression animation with adversarial networks in two stages: the first stage converts the expression and the second stage optimizes the image details. A mask-based local loss function is proposed: the designated region of the image is extracted by a mask vector for targeted optimization, and the combined use of local discriminators makes the generation of the important parts better.
2) In the present application, each frame of the video sequence is generated from the neutral image rather than recursively from preceding frames, which avoids errors in earlier frames propagating to later frames and progressively degrading generation quality. This input mode does require the network to learn the larger change from the neutral expression to other expressions, which increases training difficulty. After the first-stage network generates the prediction image, it is fed into the network again and the source input image is reconstructed using the ring-network concept of CycleGAN; this forces the network to retain identity features without increasing the number of model parameters. The loss functions of the model comprise adversarial loss, SSIM similarity loss, pixel loss, VGG perceptual loss, and reconstruction loss. The second-stage network of the present application comprises a generator, a global discriminator, and two local discriminators, with mask-based local discriminators and local loss functions added.
3) In faceGAN, the method applies the CycleGAN concept: the image after expression conversion is fed into the network again to reconstruct the source face image, forcing the network to keep the identity characteristics of the face and change only the expression. Meanwhile, skip connections in faceGAN fuse the network's high-level and low-level features, so that more facial identity information is learned from the low-level features. The application can thus perform expression conversion without changing the identity information of the face.
4) The invention provides a detail optimization network, FineGAN, which focuses on generating image details and emphatically optimizes the important eye and mouth regions. Proper weights are set to balance the pixel loss and the adversarial loss, and a perceptual loss is added to remove artifacts, so that the generated image contains no unreasonable artifacts and the network produces high-quality, vivid images with rich detail that accord with human vision.
5) The method has a relatively small number of network parameters and low space and time complexity; a single unified network can learn the migration of any expression type and the continuous change of emotional intensity, giving it good application prospects.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic block flow diagram of the method of the present invention.
In fig. 2, the odd rows are schematic diagrams of the facial feature points of the method of the present invention, and the even rows are facial expression contour diagrams.
Fig. 3 is a mask diagram of the present invention, wherein the first row is a face region image extracted after preprocessing the original data set, the second and fourth rows are visualizations of an eye mask vector and a mouth mask vector, respectively, and the third and fifth rows are partial region images extracted after applying the eye mask vector and the mouth mask vector to the source image.
FIG. 4 is a graph of the 3 experimental effects of the present invention, wherein the odd rows are the input to the method of the present invention, including a sequence of neutral images of the source face and a silhouette image of the target expression; the even rows are experimental results, i.e., the sequence of video frames that output the expressive animation.
Detailed Description
The embodiment shown in fig. 1 shows that the two-stage expression animation generation method based on the dual generation countermeasure network of the present invention has the following processes:
acquiring a facial expression profile of each frame of image in a data set → extracting the identity characteristic of a source face and the expression characteristic of a target expression profile, preliminarily generating a first-stage prediction image → taking the first-stage prediction image as input, adopting the concept of CycleGAN to reconstruct a neutral image of the source face → calculating a loss function in the first-stage faceGAN → generating a local mask vector adapted to an individual → inputting the first-stage prediction image into a network of a second stage, and carrying out detail optimization → calculating a loss function in the second-stage FineGAN → synthesizing a video.
Example 1
The two-stage expression animation generation method based on the double generation countermeasure network of the embodiment specifically comprises the following steps:
firstly, acquiring a facial expression profile of each frame of image in a data set:
collecting a facial expression video sequence data set, using the Dlib machine learning library to extract the face in each frame of a video sequence and simultaneously obtain 68 feature points per face (in the expression migration field, 68 feature points outline the face contour and the eye, mouth, and nose contours; 5- or 81-point configurations also exist), as shown in the odd rows of FIG. 2; the feature points are then connected in order with line segments to obtain the expression contour map of each frame of the video sequence, as shown in the even rows of FIG. 2, recorded as e = (e_1, e_2, ···, e_i, ···, e_n), where e denotes the set of all facial expression contour maps in a video sequence, n denotes the number of video frames, and e_i denotes the facial expression contour map of the i-th frame of a given video sequence;
The first stage builds the expression migration network FaceGAN, comprising the following steps:
secondly, extracting the identity characteristics of the source face and the expression characteristics of the target expression profile graph, and preliminarily generating a first-stage prediction graph:
FaceGAN includes a generator G1 and a discriminator D1, where the generator G1 comprises three sub-networks: two encoders Enc_id and Enc_exp, and a decoder Dec1;
First, the neutral expressionless image I_N of the source face and the target expression contour map sequence e are input. In this embodiment the input is the neutral face of user S010, and the target contour sequence runs from an expressionless face to a broad smile; the expression contour map extracted from the neutral expressionless image I_N is denoted e_N, with the specific inputs shown in the first row of FIG. 4. The identity encoder Enc_id then extracts the identity feature vector f_id of user S010, while the expression encoder Enc_exp extracts the expression feature vector set f_exp of the target expression contour map sequence, where f_exp = (f_exp_1, f_exp_2, …, f_exp_i, …, f_exp_n), expressed as:
f_id = Enc_id(I_N) (1),
f_exp_i = Enc_exp(e_i) (2),
The identity feature vector f_id and the i-th frame's expression feature vector f_exp_i are concatenated in series to obtain the feature vector f, i.e. f = f_id + f_exp_i. The feature vector f is fed to the decoder Dec1 for decoding, generating the first-stage prediction image I_pre-target, with I_pre-target = Dec1(f). Finally, I_pre-target is input to the discriminator D1 to judge whether the image is real or fake;
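The data flow of the first-stage generator, two encoders whose outputs are concatenated and then decoded, can be sketched with random matrices standing in for the learned sub-networks. All dimensions here (128-d codes, 64×64 grayscale images) are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the trained sub-networks of G1.
W_id  = rng.standard_normal((128, 64 * 64))   # Enc_id
W_exp = rng.standard_normal((128, 64 * 64))   # Enc_exp
W_dec = rng.standard_normal((64 * 64, 256))   # Dec1

def enc_id(img):   return np.tanh(W_id @ img.ravel())    # eq. (1)
def enc_exp(cmap): return np.tanh(W_exp @ cmap.ravel())  # eq. (2)

def generator_g1(i_n, e_i):
    f_id  = enc_id(i_n)                 # identity feature vector
    f_exp = enc_exp(e_i)                # expression feature vector
    f = np.concatenate([f_id, f_exp])   # serial concatenation f = f_id + f_exp_i
    # I_pre-target = Dec1(f)
    return np.tanh(W_dec @ f).reshape(64, 64)

i_n  = rng.random((64, 64))   # neutral source image I_N
e_i  = rng.random((64, 64))   # target expression contour map e_i
i_pre = generator_g1(i_n, e_i)
```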
and thirdly, taking the first-stage prediction image as input and adopting the idea of CycleGAN to reconstruct a neutral image of the source face:
The first-stage prediction image I_pre-target and the expression contour map e_N extracted from the neutral expressionless image I_N in the second step are again used as FaceGAN inputs, and the second step is repeated to generate the reconstructed image I_recon with the neutral expression of user S010; the generation of I_recon is expressed as:
I_recon = Dec1(Enc_id(I_pre-target) + Enc_exp(e_N)) (3);
fourthly, calculating a loss function in the FaceGAN in the first stage:
The loss function of the generator G1 in the first-stage FaceGAN is as follows:
wherein I_real is the target ground-truth value (Ground truth: the source face image bearing the target expression, i.e. the real image corresponding to the model's final prediction), here the real smiling image of user S010; equation (5) is the adversarial loss of the generator; the SSIM (structural similarity) function in equation (6) measures the similarity between two images; equation (7) is the pixel loss, in which the MAE (mean absolute error) function measures the difference between the ground truth and the predicted value; equation (8) is the perceptual loss, using VGG-19 to extract the perceptual features of the images; equation (9) is the reconstruction loss, measuring the distance between the neutral expressionless image I_N and the reconstructed image I_recon; the loss function of the generator G1 is the weighted sum of these loss terms;
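A numeric sketch of the first-stage generator loss terms described above (SSIM similarity, MAE pixel loss, cycle reconstruction). The SSIM here is a single-window simplification of the real sliding-window computation, the VGG-19 perceptual term is omitted, and all weights are placeholder assumptions:

```python
import numpy as np

def mae(a, b):
    """Mean absolute error, the eq. (7)-style pixel loss."""
    return float(np.mean(np.abs(a - b)))

def ssim_global(a, b, c1=0.01**2, c2=0.03**2):
    """Single-window (global) SSIM; real implementations slide a
    Gaussian window over the image and average the local scores."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                 ((mu_a**2 + mu_b**2 + c1) * (va + vb + c2)))

def g1_loss(i_pre, i_real, i_n, i_recon, d_score,
            w_sim=1.0, w_pix=1.0, w_rec=1.0):
    """Weighted sum of the FaceGAN generator loss terms (placeholder
    weights; the VGG-19 perceptual term of eq. (8) is omitted)."""
    l_adv = -np.log(d_score + 1e-8)            # eq. (5): fool D1
    l_sim = 1.0 - ssim_global(i_pre, i_real)   # eq. (6): SSIM similarity
    l_pix = mae(i_pre, i_real)                 # eq. (7): MAE pixel loss
    l_rec = mae(i_n, i_recon)                  # eq. (9): cycle reconstruction
    return float(l_adv + w_sim * l_sim + w_pix * l_pix + w_rec * l_rec)
```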
The loss function of the discriminator D1 in the first-stage FaceGAN is as follows:
wherein equation (11) is the adversarial loss, and equation (12) is the adversarial loss of the reconstructed image;
The identity encoder Enc_id comprises 4 convolution blocks, with a CBAM attention module added to the first 3; the expression encoder Enc_exp comprises 3 convolution blocks, with a CBAM attention module added to the last one; the decoder Dec1 comprises 4 deconvolution blocks, with a CBAM attention module added to the first 3. Skip connections join the high and low layers of the network: the output of layer 1 of the identity encoder Enc_id is connected to the input of the last layer of the decoder Dec1, the output of layer 2 to the input of the second-to-last layer, and the output of layer 3 to the input of the third-to-last layer. The convolution kernel size in this patent is 3 × 3.
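A possible PyTorch sketch of this skip-connected encoder/decoder wiring. The CBAM attention modules and the expression branch are omitted for brevity, and the channel widths are illustrative assumptions, not the patent's values:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """3x3 stride-2 convolution block (CBAM omitted in this sketch)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(c_in, c_out, 3, 2, 1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class DeconvBlock(nn.Module):
    """Stride-2 transposed-convolution upsampling block."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, 2, 1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class G1Sketch(nn.Module):
    """Encoder layer k of Enc_id feeds the decoder's k-th-from-last
    deconv block, as described in the text."""
    def __init__(self):
        super().__init__()
        self.enc = nn.ModuleList([ConvBlock(1, 32), ConvBlock(32, 64),
                                  ConvBlock(64, 128), ConvBlock(128, 256)])
        # decoder inputs widened to accept the concatenated skip features
        self.dec = nn.ModuleList([DeconvBlock(256, 128),
                                  DeconvBlock(128 + 128, 64),
                                  DeconvBlock(64 + 64, 32),
                                  DeconvBlock(32 + 32, 1)])
    def forward(self, x):
        skips = []
        for layer in self.enc:
            x = layer(x)
            skips.append(x)
        x = self.dec[0](x)
        for k, layer in enumerate(self.dec[1:]):
            x = layer(torch.cat([x, skips[2 - k]], dim=1))
        return x
```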
The second stage builds the detail generation network FineGAN, covering the fifth through seventh steps:
fifthly, generating local mask vectors adapted to the individual:
Using the 68 feature points of each face obtained in the first step, the eye region I_eye and the mouth region I_mouth are extracted, and the eye mask vector M_eye and the mouth mask vector M_mouth are constructed separately, as shown in the second and fourth rows of FIG. 3. Taking the eye as an example, M_eye is formed by setting the pixel values of the eye region in the image to 1 and those of all other regions to 0; the mouth mask vector M_mouth is formed in the same way as M_eye;
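Mask construction can be sketched as below; a bounding-box fill over the landmark points is used here as a simple stand-in for a true polygon fill of the eye or mouth region:

```python
import numpy as np

def region_mask(points, size=(256, 256)):
    """Binary mask M: 1 inside the bounding box of the given landmark
    points (x, y), 0 elsewhere. A real implementation would fill the
    landmark polygon (e.g. with cv2.fillPoly) rather than its box."""
    ys, xs = np.mgrid[0:size[0], 0:size[1]]
    pts = np.asarray(points)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return ((xs >= x0) & (xs <= x1) & (ys >= y0) & (ys <= y1)).astype(np.uint8)

# two illustrative corner landmarks of an eye region
m_eye = region_mask([(10, 20), (30, 40)])
```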
and sixthly, inputting the prediction graph of the first stage into a network of a second stage to carry out detail optimization:
FineGAN includes a generator G2 and a discriminator D2, where D2 consists of a global discriminator D_global and two local discriminators D_eye and D_mouth;
The first-stage prediction image I_pre-target and the neutral expressionless image I_N from the second step are input to the generator G2, generating the second-stage prediction image I_target, which contains more facial detail of user S010. I_target is then fed simultaneously to the three discriminators: D_global makes a global judgment on the generated I_target so that it is as close as possible to I_real, the real smiling image of user S010, while the eye local discriminator D_eye and the mouth local discriminator D_mouth further emphasize optimization of the eye and mouth regions of I_target, making the generated image I_target more realistic. The formula is:
I_target = G2(I_pre-target, I_N) (13);
and seventhly, calculating the loss functions of the second-stage FineGAN:
The loss function of the generator G2 is as follows:
wherein equation (15) is the adversarial loss, comprising a global adversarial loss and local adversarial losses, the operator ⊙ being the Hadamard product; equation (16) is the pixel loss; equations (17) and (18) are the local pixel losses, computing the L1 norm of the pixel difference between a local region of the generated image and the corresponding region of the real image; equation (19) is the local perceptual loss; the total loss function of the generator is the weighted sum of these loss terms;
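The Hadamard-masked local pixel loss of equations (17) and (18) can be sketched as follows; the adversarial and perceptual terms of the FineGAN objective are omitted in this sketch:

```python
import numpy as np

def local_pixel_loss(i_gen, i_real, mask):
    """L1 norm of the masked pixel difference:
    || M ⊙ I_target - M ⊙ I_real ||_1, where ⊙ is the Hadamard
    (element-wise) product with the binary region mask M."""
    return float(np.abs(mask * i_gen - mask * i_real).sum())

def fine_local_losses(i_gen, i_real, m_eye, m_mouth):
    """The two local pixel terms of eqs. (17)-(18), one per region."""
    return (local_pixel_loss(i_gen, i_real, m_eye),
            local_pixel_loss(i_gen, i_real, m_mouth))
```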
The loss function of the discriminator D2 is as follows:
wherein equation (21) is the adversarial loss of the global discriminator, and equations (22) and (23) are the adversarial losses of the local discriminators;
and eighth step, synthesizing a video:
Each frame is generated independently; once the n frames (I_target_1, I_target_2, …, I_target_i, …, I_target_n) have been generated, i.e. the gradual expression change of user S010 from expressionless to smiling, the video frame sequence is synthesized into the facial animation of user S010, as shown in the second row of FIG. 4;
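Frame assembly can be sketched as follows; actually encoding the sequence to a video file would use e.g. OpenCV's `VideoWriter` or `imageio.mimsave`, omitted here to keep the sketch dependency-free:

```python
import numpy as np

def synthesize_video(frames, fps=25):
    """Stack independently generated frames I_target_1..I_target_n into
    an (n, H, W) clip array and report its duration in seconds; the fps
    value here is an illustrative assumption."""
    clip = np.stack(frames, axis=0)
    duration = len(frames) / fps
    return clip, duration

# 25 dummy frames standing in for the generated I_target sequence
frames = [np.full((64, 64), i, dtype=np.uint8) for i in range(25)]
clip, dur = synthesize_video(frames)
```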
This completes the two-stage expression animation generation based on the dual generative adversarial network: the expression in the face image is converted and the image details are optimized.
In this embodiment, the weight parameter settings involved in the above steps are shown in Table 1, and good results are obtained over the whole sample database.
TABLE 1 Weight parameter settings for each loss in this embodiment
In the two-stage expression animation generation method based on the dual generative adversarial network, the generative adversarial network model is abbreviated GAN, for Generative Adversarial Networks, a well-known algorithm in the technical field.
Figure 4 shows the results of 3 embodiments of the invention. The second row is the generated video frame sequence of user S010 from neutral expression to smiling, the fourth row is the generated sequence of user S022 from neutral expression to a surprised open mouth, and the sixth row is the generated sequence of user S032 from neutral expression to a mouth turned down to the left. Fig. 4 shows that the method of the present invention can complete expression migration while retaining the face identity information, and can generate a continuously gradual video frame sequence to synthesize an animation video with the specified identity and the specified expression.
Nothing in this specification is said to apply to the prior art.
Claims (6)
1. A two-stage expression animation generation method based on a dual generative adversarial network, characterized in that: in the first stage, the expression migration network FaceGAN extracts the expression features from a target expression contour map, migrates them onto the source face, and generates a first-stage prediction image; in the second stage, the detail generation network FineGAN supplements and enriches the details of the eye and mouth regions, which contribute most to expression change, in the first-stage prediction image, generating a fine-grained second-stage prediction image that is synthesized into a facial video animation; both the expression migration network FaceGAN and the detail generation network FineGAN are implemented as generative adversarial networks.
2. The generation method of claim 1, wherein the expression migration network FaceGAN comprises a generator G1 and a discriminator D1, the generator G1 containing three sub-networks: an identity encoder Enc_id, an expression encoder Enc_exp, and a decoder Dec1;
the detail generation network FineGAN comprises a generator G2 and a discriminator D2, D2 consisting of a global discriminator D_global, an eye local discriminator D_eye, and a mouth local discriminator D_mouth.
3. The generation method of claim 1, characterized in that the method comprises the following specific steps:
firstly, acquiring a facial expression profile of each frame of image in a data set:
collecting a facial expression video sequence data set, extracting the face in each frame of each video sequence with the Dlib machine learning library while obtaining a number of feature points for each face, then connecting the feature points in sequence with line segments to obtain the expression contour map of each frame of the video sequence, denoted e = (e_1, e_2, …, e_i, …, e_n), where e represents the set of all expression contour maps in a video sequence, i.e. the expression contour map sequence; n is the number of video frames, and e_i is the expression contour map of the i-th frame;
The first stage builds the expression migration network FaceGAN, comprising the following steps:
secondly, extracting the identity characteristics of the source face and the expression characteristics of the target expression profile graph, and preliminarily generating a first-stage prediction graph:
the expression migration network FaceGAN comprises a generator G1 and a discriminator D1, where the generator G1 contains three sub-networks: two encoders Enc_id and Enc_exp, and a decoder Dec1;
First, the neutral expressionless image I_N of the source face and the target expression contour map sequence e are input; the identity encoder Enc_id extracts the identity feature vector f_id of the neutral expressionless image I_N of the source face, while the expression encoder Enc_exp extracts the expression feature vector set f_exp of the target expression contour map sequence e, where f_exp = (f_exp_1, f_exp_2, …, f_exp_i, …, f_exp_n), expressed as:
f_id = Enc_id(I_N) (1),
f_exp_i = Enc_exp(e_i) (2),
The identity feature vector f_id and the i-th frame's expression feature vector f_exp_i are concatenated in series to obtain the feature vector f, i.e. f = f_id + f_exp_i. The feature vector f is fed to the decoder Dec1 for decoding, generating the first-stage prediction image I_pre-target, with I_pre-target = Dec1(f). Finally, I_pre-target is input to the discriminator D1 to judge whether the image is real or fake;
and thirdly, taking the first-stage prediction image as input and adopting the idea of CycleGAN to reconstruct a neutral image of the source face:
The first-stage prediction image I_pre-target and the expression contour map e_N corresponding to the neutral expressionless image I_N in the second step are again used as FaceGAN inputs: the identity encoder Enc_id extracts the identity feature vector of the image I_pre-target, the expression encoder Enc_exp extracts the expression feature vector of the contour map e_N, the processing of the second step is repeated, and the decoder generates I_recon, the reconstructed image of I_N; the generation of the reconstructed image I_recon is expressed as:
I_recon = Dec1(Enc_id(I_pre-target) + Enc_exp(e_N)) (3);
fourthly, calculating the loss functions of the first-stage expression migration network FaceGAN:
The loss function of the generator G1 in the first-stage expression migration network FaceGAN is as follows:
wherein I_real is the target ground-truth value; equation (5) is the adversarial loss of the generator, D1(·) representing the probability assigned by the discriminator D1 that its input is real; the SSIM (structural similarity) function in equation (6) measures the similarity between two images; equation (7) is the pixel loss, in which the MAE (mean absolute error) function measures the difference between the ground truth and the predicted value; equation (8) is the perceptual loss: perceptual features are extracted with VGG-19, the features output by the last convolution layer of the VGG-19 network serve as the perceptual features of an image, and the perceptual loss between the generated and real images is computed; equation (9) is the reconstruction loss, computing the distance between the neutral expressionless image I_N of the source face and its reconstructed image I_recon;
The loss function of the discriminator D1 in the first-stage expression migration network FaceGAN is as follows:
wherein equation (11) is the adversarial loss and equation (12) is the adversarial loss of the reconstructed image, λ1 and λ2 being the weight parameters of the similarity loss and the perceptual loss in the FaceGAN generator G1 loss, and λ3 the weight parameter of the adversarial loss of the reconstructed image in the FaceGAN discriminator loss;
The second stage builds the detail generation network FineGAN, covering the fifth through seventh steps:
fifthly, generating local mask vectors adapted to the individual:
Using the feature points of each face obtained in the first step, the eye region I_eye and the mouth region I_mouth are extracted, and the eye mask vector M_eye and the mouth mask vector M_mouth are constructed separately; taking the eye as an example, the eye mask vector M_eye is constructed by setting the pixel values of the eye region in the image to 1 and those of all other regions to 0, and the mouth mask vector M_mouth is constructed in the same way as the eye mask vector M_eye;
and sixthly, inputting the prediction graph of the first stage into a network of a second stage to carry out detail optimization:
The detail generation network FineGAN comprises a generator G2 and a discriminator D2, where D2 consists of a global discriminator D_global and two local discriminators D_eye and D_mouth;
The first-stage prediction image I_pre-target and the neutral expressionless image I_N from the second step are input to the generator G2, generating the second-stage prediction image I_target with richer facial detail; the second-stage prediction image I_target is then fed simultaneously to the three discriminators: the global discriminator D_global makes a global judgment on I_target so that it is as close as possible to the target real image I_real, while the eye local discriminator D_eye and the mouth local discriminator D_mouth further optimize the eye and mouth regions of I_target, making the second-stage prediction image I_target more vivid; the second-stage prediction image I_target is expressed as:
I_target = G2(I_pre-target, I_N) (13);
and seventhly, calculating the loss functions of the second-stage FineGAN:
generator G2The specific formula of the loss function is as follows:
wherein ,
equation (15) is a penalty, including a global penalty and a local penalty, operatorIs a Hadamard product, formula (16) is a pixel loss, formula (17) and formula (18) are local pixel losses, an L1 norm of a pixel difference between a local region of the generated image and a local region of the real image is calculated, formula (19) is a local perceptual loss, and generator G is a maximum value of a local perceptual loss2The total loss function is the weighted sum of the loss functions;
The loss function of the discriminator D2 is as follows:
wherein equation (21) is the adversarial loss of the global discriminator and equations (22) and (23) are the adversarial losses of the local discriminators, λ4 and λ5 being the weight parameters of the local adversarial losses in the FineGAN generator G2 loss, λ6 and λ7 the weight parameters of the eye pixel loss and the mouth pixel loss in the FineGAN generator G2 loss, λ8 the weight parameter of the local perceptual loss in the FineGAN generator G2 loss, and λ9 the weight parameter of the global adversarial loss in the FineGAN discriminator D2 loss;
and eighth step, synthesizing a video:
Each frame is generated independently; once the n frames (I_target_1, I_target_2, …, I_target_i, …, I_target_n) have been generated, the video frame sequence is synthesized into the final facial animation;
This completes the two-stage expression animation generation based on the dual generative adversarial network: the expression in the face image is converted and the image details are optimized.
4. The generation method according to claim 2 or 3, characterized in that the identity encoder Enc_id comprises 4 convolution blocks, with a CBAM attention module added to the first 3; the expression encoder Enc_exp comprises 3 convolution blocks, with a CBAM attention module added to the last one; the decoder Dec1 comprises 4 deconvolution blocks, with a CBAM attention module added to the first 3; skip connections simultaneously join the high and low layers of the network: the output of layer 1 of the identity encoder Enc_id is connected to the input of the last layer of the decoder Dec1, the output of layer 2 to the input of the second-to-last layer, and the output of layer 3 to the input of the third-to-last layer.
6. The generation method according to claim 2, wherein the number of feature points obtained for each face in the first step is 68, the 68 feature points constituting the face contour and the eye, mouth and nose contours.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010621885.2A CN111783658B (en) | 2020-07-01 | 2020-07-01 | Two-stage expression animation generation method based on dual-generation reactance network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010621885.2A CN111783658B (en) | 2020-07-01 | 2020-07-01 | Two-stage expression animation generation method based on dual-generation reactance network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111783658A true CN111783658A (en) | 2020-10-16 |
CN111783658B CN111783658B (en) | 2023-08-25 |
Family
ID=72761358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010621885.2A Active CN111783658B (en) | 2020-07-01 | 2020-07-01 | Two-stage expression animation generation method based on dual-generation reactance network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111783658B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113033288A (en) * | 2021-01-29 | 2021-06-25 | 浙江大学 | Method for generating front face picture based on side face picture for generating confrontation network |
CN113326934A (en) * | 2021-05-31 | 2021-08-31 | 上海哔哩哔哩科技有限公司 | Neural network training method, and method and device for generating images and videos |
CN113343761A (en) * | 2021-05-06 | 2021-09-03 | 武汉理工大学 | Real-time facial expression migration method based on generation confrontation |
CN115100329A (en) * | 2022-06-27 | 2022-09-23 | 太原理工大学 | Multi-mode driving-based emotion controllable facial animation generation method |
CN115311261A (en) * | 2022-10-08 | 2022-11-08 | 石家庄铁道大学 | Method and system for detecting abnormality of cotter pin of suspension device of high-speed railway contact network |
US20230154088A1 (en) * | 2021-11-17 | 2023-05-18 | Adobe Inc. | Disentangling latent representations for image reenactment |
US11875601B2 (en) | 2020-12-24 | 2024-01-16 | Beijing Baidu Netcom Science and Technology Co., Ltd | Meme generation method, electronic device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002304638A (en) * | 2001-04-03 | 2002-10-18 | Atr Ningen Joho Tsushin Kenkyusho:Kk | Device and method for generating expression animation |
WO2019228317A1 (en) * | 2018-05-28 | 2019-12-05 | 华为技术有限公司 | Face recognition method and device, and computer readable medium |
CN110689480A (en) * | 2019-09-27 | 2020-01-14 | 腾讯科技(深圳)有限公司 | Image transformation method and device |
CN111243066A (en) * | 2020-01-09 | 2020-06-05 | 浙江大学 | Facial expression migration method based on self-supervision learning and confrontation generation mechanism |
-
2020
- 2020-07-01 CN CN202010621885.2A patent/CN111783658B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002304638A (en) * | 2001-04-03 | 2002-10-18 | Atr Ningen Joho Tsushin Kenkyusho:Kk | Device and method for generating expression animation |
WO2019228317A1 (en) * | 2018-05-28 | 2019-12-05 | 华为技术有限公司 | Face recognition method and device, and computer readable medium |
CN110689480A (en) * | 2019-09-27 | 2020-01-14 | 腾讯科技(深圳)有限公司 | Image transformation method and device |
CN111243066A (en) * | 2020-01-09 | 2020-06-05 | 浙江大学 | Facial expression migration method based on self-supervision learning and confrontation generation mechanism |
Non-Patent Citations (1)
Title |
---|
陈军波;刘蓉;刘明;冯杨;: "基于条件生成式对抗网络的面部表情迁移模型", 计算机工程, no. 04 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11875601B2 (en) | 2020-12-24 | 2024-01-16 | Beijing Baidu Netcom Science and Technology Co., Ltd | Meme generation method, electronic device and storage medium |
CN113033288A (en) * | 2021-01-29 | 2021-06-25 | 浙江大学 | Method for generating front face picture based on side face picture for generating confrontation network |
CN113033288B (en) * | 2021-01-29 | 2022-06-24 | 浙江大学 | Method for generating front face picture based on side face picture for generating confrontation network |
CN113343761A (en) * | 2021-05-06 | 2021-09-03 | 武汉理工大学 | Real-time facial expression migration method based on generation confrontation |
CN113326934A (en) * | 2021-05-31 | 2021-08-31 | 上海哔哩哔哩科技有限公司 | Neural network training method, and method and device for generating images and videos |
CN113326934B (en) * | 2021-05-31 | 2024-03-29 | 上海哔哩哔哩科技有限公司 | Training method of neural network, method and device for generating images and videos |
US20230154088A1 (en) * | 2021-11-17 | 2023-05-18 | Adobe Inc. | Disentangling latent representations for image reenactment |
US11900519B2 (en) * | 2021-11-17 | 2024-02-13 | Adobe Inc. | Disentangling latent representations for image reenactment |
CN115100329A (en) * | 2022-06-27 | 2022-09-23 | 太原理工大学 | Multi-mode driving-based emotion controllable facial animation generation method |
CN115311261A (en) * | 2022-10-08 | 2022-11-08 | 石家庄铁道大学 | Method and system for detecting abnormality of cotter pin of suspension device of high-speed railway contact network |
Also Published As
Publication number | Publication date |
---|---|
CN111783658B (en) | 2023-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111783658A (en) | Two-stage expression animation generation method based on double generation countermeasure network | |
US11367239B2 (en) | Textured neural avatars | |
CN112149459B (en) | Video saliency object detection model and system based on cross attention mechanism | |
CN111275518A (en) | Video virtual fitting method and device based on mixed optical flow | |
CN111798369A (en) | Face aging image synthesis method for generating confrontation network based on circulation condition | |
CN113807265B (en) | Diversified human face image synthesis method and system | |
JP2022513858A (en) | Data processing methods, data processing equipment, computer programs, and computer equipment for facial image generation | |
CN111612687B (en) | Automatic makeup method for face image | |
CN114581992A (en) | Human face expression synthesis method and system based on pre-training StyleGAN | |
Zhou et al. | Generative adversarial network for text-to-face synthesis and manipulation with pretrained bert model | |
CN116071494A (en) | High-fidelity three-dimensional face reconstruction and generation method based on implicit nerve function | |
CA3180427A1 (en) | Synthesizing sequences of 3d geometries for movement-based performance | |
CN114549387A (en) | Face image highlight removal method based on pseudo label | |
Sun et al. | A unified framework for biphasic facial age translation with noisy-semantic guided generative adversarial networks | |
CN111767842B (en) | Micro-expression type discrimination method based on transfer learning and self-encoder data enhancement | |
Hou et al. | Lifelong age transformation with a deep generative prior | |
Wang et al. | Fine-grained image style transfer with visual transformers | |
CN114565624A (en) | Image processing method for liver focus segmentation based on multi-phase stereo primitive generator | |
CN114627293A (en) | Image matting method based on multi-task learning | |
CN113343761A (en) | Real-time facial expression migration method based on generation confrontation | |
Quan et al. | Facial Animation Using CycleGAN | |
CN116542292B (en) | Training method, device, equipment and storage medium of image generation model | |
CN117036893B (en) | Image fusion method based on local cross-stage and rapid downsampling | |
Sreekala et al. | Human Imitation in Images and Videos using GANs. | |
Bo et al. | Style Transfer Analysis Based on Generative Adversarial Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||