CN113096206A - Human face generation method, device, equipment and medium based on attention mechanism network


Info

Publication number
CN113096206A
Authority
CN
China
Prior art keywords
expression
image
bidirectional
determining
generated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110277161.5A
Other languages
Chinese (zh)
Other versions
CN113096206B (en)
Inventor
文永明
黄绮恒
成慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110277161.5A priority Critical patent/CN113096206B/en
Publication of CN113096206A publication Critical patent/CN113096206A/en
Application granted granted Critical
Publication of CN113096206B publication Critical patent/CN113096206B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a face generation method, device, equipment and medium based on an attention mechanism network, wherein the method comprises the following steps: acquiring a training image and a training expression target domain; determining a mapping relation table according to the relation between the expression basic categories and the activation vectors; determining a mapping rule from the probability vector of the expression basic categories to the activation vector according to the mapping relation table; inputting the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model; and inputting an initial facial expression image to be generated and an expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule, and determining a continuous target facial expression image. The method can process continuous expression distributions, generate continuous facial expression images, and improve the robustness of the model to changes in background and illumination conditions.

Description

Human face generation method, device, equipment and medium based on attention mechanism network
Technical Field
The invention relates to the technical field of deep learning and image processing, and in particular to a face generation method, device, equipment and medium based on an attention mechanism network.
Background
There are two main approaches to facial expression generation. The first is image deformation based on face modeling, such as the three-dimensional morphable face model (3DMM). The three-dimensional deformation model is built on a three-dimensional face database, uses statistics of face shape and texture as constraints, and takes the influence of face pose and illumination into account, so the generated three-dimensional face model has high precision. A 3DMM can be used to construct a complete facial expression result, enabling functions such as angle conversion and continuous expression conversion, and facilitating model-based operations such as face swapping and expression changing. However, the accuracy depends heavily on the model used, and training such a model imposes relatively high requirements on data acquisition and processing.
The second is the neural-network-based generative model, to which the present invention belongs. A generative model, broadly defined, takes given training data and generates new samples that follow the original data distribution: the training data are drawn from some distribution, the trained model learns that same distribution, and realistic samples can then be drawn from it. Generative models include the pixel-sequence prediction models PixelCNN and PixelRNN, the variational autoencoder (VAE) and the generative adversarial network (GAN). PixelCNN defines a tractable density function and directly optimizes the likelihood of the training data. The VAE defines an intractable density function, models it through latent variables, and generates data resembling real samples by sampling; on top of an autoencoder, the latent vectors of the image encoding are made to follow a Gaussian distribution, image generation is achieved, and the lower bound of the data log-likelihood is optimized. However, generative models based on likelihood estimation of real samples depend on the chosen sample distribution, and generative models using approximate inference cannot reach the optimal solution and can only approach a feasible lower bound of the objective function, so the generated pictures tend to be blurry overall.
The generative adversarial network (GAN) is a generative model based on game theory. A GAN typically consists of two parts, a generator network and a discriminator network. The input to the generator is a random vector z and the output is an image, i.e. a fake sample. The fake samples should approximate the data distribution of the real samples as closely as possible, so that the discriminator cannot tell real samples from fake ones. The input of the discriminator is an image, and the output is a probability value indicating whether the input is a real or a fake sample (a value above 0.5 indicates a real sample, below 0.5 a fake sample). The GAN architecture has been shown to generate realistic, highly detailed images and has been successfully applied to image translation, image super-resolution, indoor scene modeling and more. The generator and the discriminator are trained alternately; during this alternating training, the fake samples produced by the generator approach the distribution of the real data ever more closely.
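Purely as background illustration (not the patent's method), the alternating training described above can be sketched in PyTorch as follows; the generator G, discriminator D, optimizers, the latent size z_dim and the assumption that D outputs probabilities are all placeholders.

import torch
import torch.nn.functional as F

def train_gan_step(G, D, opt_G, opt_D, real_images, z_dim=128):
    """One alternating update of the discriminator and the generator (illustrative sketch)."""
    batch = real_images.size(0)
    device = real_images.device
    ones = torch.ones(batch, 1, device=device)
    zeros = torch.zeros(batch, 1, device=device)

    # Discriminator step: push D(real) towards 1 and D(fake) towards 0.
    z = torch.randn(batch, z_dim, device=device)
    fake = G(z).detach()  # stop gradients flowing into G
    d_loss = F.binary_cross_entropy(D(real_images), ones) + F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step: push D(G(z)) towards 1, so fake samples approach the real data distribution.
    z = torch.randn(batch, z_dim, device=device)
    g_loss = F.binary_cross_entropy(D(G(z)), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()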
Existing methods based on generative adversarial networks have made remarkable progress in facial expression synthesis. Although these prior-art methods are effective at synthesizing discrete facial expressions, they can only generate a discrete number of expression classes, determined by the annotations in the data set. Because the expression category is treated as a discrete variable, the generated results are also discrete. The prior art cannot generate smoothly transitioning expressions and is not good at handling continuous expression distributions.
Disclosure of Invention
In view of this, embodiments of the present invention provide a face generation method, device, equipment and medium based on an attention mechanism network, so as to process continuous expression distributions and improve the robustness of the model to changes in background and illumination conditions.
In one aspect, the present invention provides a face generation method based on an attention mechanism network, including:
acquiring a training image and a training expression target domain;
determining a mapping relation table according to the relation between the expression basic category and the activation vector;
determining a mapping rule from the probability vector of the expression basic category to the activation vector according to the mapping relation table;
inputting the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and evaluation of the generated image;
and inputting the initial facial expression image to be generated and the expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule, and determining a continuous target facial expression image.
Optionally, the determining a mapping relationship table according to the relationship between the expression basic category and the activation vector includes:
determining the contraction state of specific facial muscles according to the expression basic categories;
determining different combinations of the activation vectors based on a contraction status of the particular facial muscle;
and determining the mapping relation table according to different combinations of the activation vectors.
Optionally, the inputting of the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating a generated image, includes:
determining a color mask for the training image;
determining an attention mask of the training image according to the training expression target domain;
determining the generated image according to the color mask and the attention mask.
Optionally, the inputting of the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating a generated image, includes:
obtaining a confidence parameter, and evaluating the generated image according to the confidence parameter;
mapping the generated image to a probability matrix and determining a loss value in combination with a loss function;
and modifying the bidirectional adversarial network according to the loss value to determine the bidirectional adversarial model.
Optionally, the inputting, according to the mapping rule, of an initial facial expression image to be generated and an expression basic category probability vector to be generated into the bidirectional adversarial model to determine a continuous target facial expression image includes:
inputting the probability vector of the expression basic category, and determining an expression target domain to be generated according to the mapping rule;
and inputting the expression target domain to be generated into the bidirectional adversarial model, and outputting the continuous target facial expression image.
Optionally, the inputting of the probability vector of the expression basic category and determining of an expression target domain to be generated according to the mapping rule includes:
determining a set of hyper-parameters according to the mapping rule;
determining the expression target domain to be generated according to an expression target domain generating formula and by combining the hyper-parameter and the probability vector of the expression basic category;
the expression target domain generation formula is as follows:
y_g = F(v) = Σ_i α_i·T(v);
where y_g is the expression target domain to be generated, F is the mapping function, α_i are the hyper-parameters, and T(v) is the corresponding activation vector obtained from the probability vector v of the expression basic category according to the mapping relation table T.
On the other hand, an embodiment of the invention further discloses a face generation device based on the attention mechanism network, which comprises:
the first module is used for acquiring a training image and a training expression target domain;
the second module is used for determining a mapping relation table according to the relation between the expression basic category and the activation vector;
a third module, configured to determine, according to the mapping relationship table, a mapping rule from the probability vector of the expression basic category to the activation vector;
a fourth module, configured to input the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating a generated image;
and a fifth module, configured to input the initial facial expression image to be generated and the expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule and determine a continuous target facial expression image.
On the other hand, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
On the other hand, the embodiment of the invention also discloses a computer readable storage medium, wherein the storage medium stores a program, and the program is executed by a processor to realize the method.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
Compared with the prior art, the invention adopting the above technical scheme has the following technical effects: a training image and a training expression target domain are acquired; a mapping relation table is determined according to the relation between the expression basic categories and the activation vectors; a mapping rule from the probability vector of the expression basic categories to the activation vector is determined according to the mapping relation table, so that probability vectors of the expression basic categories can be processed and continuous facial expressions can be generated with smoother transitions; the training image and the training expression target domain are input into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and evaluation of the generated image; the initial facial expression image to be generated and the expression basic category probability vector to be generated are input into the bidirectional adversarial model according to the mapping rule, and a continuous target facial expression image is determined; based on the attention mechanism, the robustness of the model to changes in background and illumination conditions can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a detailed flow chart of an embodiment of the present invention;
FIG. 2 is a flow chart of a training model according to an embodiment of the present invention;
FIG. 3 is a flow chart of generating a facial expression according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the face generation effect according to an embodiment of the present invention;
fig. 5 is a mapping table created according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Embodiments of the invention provide a face generation method, device, equipment and medium based on an attention mechanism network, so as to process continuous expression distributions and improve the robustness of the model to changes in background and illumination conditions.
The embodiment of the invention discloses a face generation method based on an attention mechanism network, which comprises the following steps of:
acquiring a training image and a training expression target domain;
determining a mapping relation table according to the relation between the expression basic category and the activation vector;
determining a mapping rule from the probability vector of the expression basic category to the activation vector according to the mapping relation table;
inputting the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and evaluation of the generated image;
and inputting the initial facial expression image to be generated and the expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule, and determining a continuous target facial expression image.
Further as a preferred embodiment, the determining a mapping relationship table according to the relationship between the expression basic category and the activation vector includes:
determining the contraction state of specific facial muscles according to the expression basic categories;
determining different combinations of the activation vectors based on a contraction status of the particular facial muscle;
and determining the mapping relation table according to different combinations of the activation vectors.
Referring to fig. 5, seven expression basic categories are provided, namely disgust, happiness, surprise, fear, anger, contempt and sadness, plus a neutral state. Each expression basic category is related to the contraction state of specific facial muscles: the 0 state indicates that the specific facial muscle is relaxed, and the 1 state indicates that it is contracted. The activation vectors are used to describe the expression basic categories; different combinations of activation vectors represent different contraction states of the specific facial muscles, and the mapping relation table is determined accordingly.
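For illustration only, such a mapping relation table can be represented as a small dictionary from expression basic categories to binary activation vectors; the concrete vectors and vector length below are placeholders, not the values of fig. 5.

# Hypothetical activation vectors: 1 means the specific facial muscle is contracted, 0 means relaxed.
# The real vectors are given by the mapping relation table of fig. 5; these are placeholders.
EXPRESSION_TO_ACTIVATION = {
    "happiness": [0, 0, 1, 0, 1, 0],
    "sadness":   [1, 0, 0, 1, 0, 0],
    "surprise":  [0, 1, 0, 0, 0, 1],
    # ... one entry per expression basic category
}

def activation_vector(category):
    """Look up the activation vector T(category) for one expression basic category."""
    return EXPRESSION_TO_ACTIVATION[category]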
As a further preferred embodiment, the inputting of the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating the generated image, includes:
determining a color mask for the training image;
determining an attention mask of the training image according to the training expression target domain;
determining the generated image according to the color mask and the attention mask.
Referring to fig. 2, the training image and the training expression target domain are input into the bidirectional adversarial model. The generation module in the bidirectional adversarial model comprises two generators, namely an attention mask generator and a color mask generator. The attention mask produced by the attention mask generator lets the neural network keep its attention on the parts related to the expression, focusing on the image regions meaningful for synthesizing the new expression, while reducing attention to parts unrelated to expression generation and keeping the rest of the image, such as hair, glasses, a hat or ornaments, unchanged. The attention mask generator renders only the elements related to the expression and focuses on the pixels defining the facial expression, and the color mask is used to handle the illumination conditions and remove the influence of illumination on facial expression generation. According to the training expression target domain, after the training image is processed with the color mask and the attention mask, the size of the activation vector is changed gradually to generate corresponding face images with complex emotions.
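A minimal sketch of this two-generator composition, assuming the common attention-based blending in which the attention mask A decides, pixel by pixel, how much of the input image is kept and how much of the color mask C is rendered; the module names, layer sizes, backbone and blending formula are assumptions, not details taken from the patent.

import torch
import torch.nn as nn

class MaskedGenerator(nn.Module):
    """Illustrative generator that outputs a color mask C and an attention mask A."""
    def __init__(self, backbone: nn.Module, img_channels: int = 3, feat_channels: int = 64):
        super().__init__()
        self.backbone = backbone  # shared encoder-decoder producing feat_channels feature maps (assumed)
        self.color_head = nn.Conv2d(feat_channels, img_channels, 7, padding=3)
        self.attn_head = nn.Conv2d(feat_channels, 1, 7, padding=3)

    def forward(self, image, target_domain):
        # Tile the target-domain activation vector and concatenate it with the image channels.
        cond = target_domain[:, :, None, None].expand(-1, -1, image.size(2), image.size(3))
        feat = self.backbone(torch.cat([image, cond], dim=1))
        color = torch.tanh(self.color_head(feat))      # color mask C
        attn = torch.sigmoid(self.attn_head(feat))     # attention mask A in [0, 1]
        # Keep expression-irrelevant regions from the input; render only expression-relevant pixels.
        generated = attn * image + (1.0 - attn) * color
        return generated, attn, color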
As a further preferred embodiment, the inputting of the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating the generated image, includes:
obtaining a confidence parameter, and evaluating the generated image according to the confidence parameter;
mapping the generated image to a probability matrix and determining a loss value in combination with a loss function;
and modifying the bidirectional adversarial network according to the loss value to determine the bidirectional adversarial model.
Referring to fig. 2, after the image is generated in the previous step, it is evaluated. The evaluation module in the bidirectional adversarial model comprises a conditional discriminator used during training to evaluate the fidelity of the generated image and the degree to which the expected expression is achieved. The conditional discriminator maps the generated image to a probability matrix Y_I whose dimensions are determined by the size of the generated image, where Y_I[i, j] represents the probability that the overlapping patch [i, j] is real, H is the height of the generated image and W is its width. In addition, to evaluate the expression constraint, a confidence parameter for the activation values is added on top of the conditional discriminator to estimate the activation vector values in the image, ŷ = (y_1, y_2, …, y_N)^T, where N denotes the number of activation vectors used and T denotes the transpose operation.
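For illustration only, such a conditional discriminator can be sketched as a PatchGAN-style critic with an auxiliary regression head; the layer sizes, the class name ConditionalDiscriminator and the use of PyTorch are assumptions, not details taken from the patent.

import torch
import torch.nn as nn

class ConditionalDiscriminator(nn.Module):
    """Illustrative critic: patch realism map Y_I plus an activation-vector regression head."""
    def __init__(self, n_activations: int, base_channels: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, base_channels, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_channels, base_channels * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_channels * 2, base_channels * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.patch_head = nn.Conv2d(base_channels * 4, 1, 3, padding=1)  # Y_I[i, j]: patch realism score
        self.act_head = nn.Sequential(                                   # estimate of the activation vector
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(base_channels * 4, n_activations)
        )

    def forward(self, image):
        feat = self.features(image)
        return self.patch_head(feat), self.act_head(feat)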
The overall loss function combines four loss terms: an image adversarial loss L_I, an attention loss L_A, a conditional expression loss L_y and an identity loss L_idt.
Image adversarial loss term L_I:
L_I = E[D_I(G(I_{y_o} | y_f))] - E[D_I(I_{y_o})] + λ_gp·L_gp,
where D_I represents the image discriminator, G represents the generator, I_{y_o} represents the input original face image, y_f represents the target expression domain, λ_gp represents the gradient penalty factor, E[·] denotes the expectation, G(I_{y_o} | y_f) represents the corresponding facial expression image produced by the generator, and L_gp is the gradient penalty term. The effect of this term is to push the distribution of the generated images towards the distribution of the training images, i.e. to make the generated images look more realistic. The loss is based on WGAN, since the original GAN is hard to train with the JS divergence and is prone to gradient vanishing or gradient explosion. The meaning of the loss is to maximize the discriminator output on the generated image and minimize it on the original image, with a gradient penalty term added to keep the gradient within a certain range.
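A sketch of a critic loss of this kind, following the standard WGAN-GP recipe (interpolated samples, unit-gradient-norm penalty) and assuming the discriminator of the previous sketch; the function name and the default penalty weight are placeholders rather than the patent's exact formulation.

import torch

def wgan_gp_critic_loss(D, real_img, fake_img, lambda_gp=10.0):
    """Standard WGAN-GP critic loss; D returns (patch score map, activation estimate)."""
    score_real, _ = D(real_img)
    score_fake, _ = D(fake_img)
    adv = score_fake.mean() - score_real.mean()

    # Gradient penalty on random interpolations between the real and the generated image.
    eps = torch.rand(real_img.size(0), 1, 1, 1, device=real_img.device)
    interp = (eps * real_img + (1 - eps) * fake_img).requires_grad_(True)
    score_interp, _ = D(interp)
    grads = torch.autograd.grad(outputs=score_interp.sum(), inputs=interp, create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    return adv + lambda_gp * penalty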
Attention loss term L_A:
L_A = λ_TV·E[Σ_{i,j} ((A_{i+1,j} - A_{i,j})^2 + (A_{i,j+1} - A_{i,j})^2)] + E[‖A‖_2],
where λ_TV represents a hyper-parameter, E[·] denotes the expectation, y_o represents the original expression domain, H and W represent the height and width of the image, and A represents the attention mask. Since the data set has no ground-truth values for the attention mask, the attention mask easily becomes oversaturated, i.e. all values tend to 1. The first term of the attention loss is a total-variation loss, originally used for image smoothing, and the second term is an L2 penalty.
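A minimal sketch of this attention regularization, assuming the usual total-variation smoothness term plus an L2 penalty that keeps the mask from saturating; the weighting value and function name are placeholders.

import torch

def attention_loss(attn_mask, lambda_tv=1e-4):
    """attn_mask: (B, 1, H, W) attention mask A produced by the generator."""
    # Total-variation term: penalize differences between neighbouring mask values (smoothing).
    tv = ((attn_mask[:, :, 1:, :] - attn_mask[:, :, :-1, :]) ** 2).mean() + \
         ((attn_mask[:, :, :, 1:] - attn_mask[:, :, :, :-1]) ** 2).mean()
    # L2 penalty: discourage the mask from saturating towards all ones.
    l2 = (attn_mask ** 2).mean()
    return lambda_tv * tv + l2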
Conditional expression loss term L_y: the original image and the generated image are each input into the discriminator, and the loss between the estimated expression (activation) vector and its ground-truth expression vector is computed for each.
Identity loss term L_idt: it brings the output of the second generator closer to the original image, ensuring that the face carrying the generated expression and the original image belong to the same person.
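Hedged sketches of these last two terms under a common interpretation: the conditional expression loss regresses the discriminator's activation estimates towards the target and original activation vectors, and the identity loss translates the image to the target expression and back and compares it with the original; the L1 reconstruction, MSE regression and the generator/discriminator of the earlier sketches are assumptions.

import torch
import torch.nn.functional as F

def conditional_expression_loss(D, real_img, fake_img, y_origin, y_target):
    """Regress the discriminator's activation estimates towards their ground-truth vectors."""
    _, y_hat_real = D(real_img)
    _, y_hat_fake = D(fake_img)
    return F.mse_loss(y_hat_real, y_origin) + F.mse_loss(y_hat_fake, y_target)

def identity_loss(G, real_img, y_origin, y_target):
    """Translate to the target expression and back; the result should match the original person."""
    fake_img, _, _ = G(real_img, y_target)
    reconstructed, _, _ = G(fake_img, y_origin)
    return F.l1_loss(reconstructed, real_img)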
As a further preferred embodiment, the determining of a continuous target facial expression image by inputting an initial facial expression image to be generated and an expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule includes:
inputting the probability vector of the expression basic category, and determining an expression target domain to be generated according to the mapping rule;
and inputting the expression target domain to be generated into the bidirectional adversarial model, and outputting the continuous target facial expression image.
Referring to fig. 3, the probability vector of the expression basic category is a vector of the percentage that each expression category contributes to the facial expression, and the expression target domains to be generated are obtained in one-to-one correspondence according to the mapping rule. The expression target domain to be generated is input into the bidirectional adversarial model trained as described above, combined with the facial image to be generated, the image is synthesized by the attention mask generator and the color mask generator, and the continuous target facial expression image is output. Referring to fig. 4, fig. 4 shows the generation effect for a complex face with 20% disgust, 50% anger and 30% sadness.
Further as a preferred embodiment, the inputting the probability vector of the expression basic category and determining an expression target domain to be generated according to the mapping rule includes:
determining a set of hyper-parameters according to the mapping rule;
determining the expression target domain to be generated according to an expression target domain generating formula and by combining the hyper-parameter and the probability vector of the expression basic category;
the expression target domain generation formula is as follows:
y_g = F(v) = Σ_i α_i·T(v);
where y_g is the expression target domain to be generated, F is the mapping function, α_i are the hyper-parameters, and T(v) is the corresponding activation vector obtained from the probability vector v of the expression basic category according to the mapping relation table T.
Referring to fig. 5, a set of hyper-parameters α = {α_1, α_2, α_3, …, α_i} is set to adjust the size of the activation vector corresponding to each expression category probability, the expression category probability vector v = {v_1, v_2, v_3, …, v_7} is varied, fig. 5 shows the mapping relation table T, and the corresponding expression target domain is finally obtained according to the expression target domain generation formula.
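As a purely numerical illustration of the generation formula y_g = Σ_i α_i·T(v), one plausible reading is that each category's activation vector, looked up from the mapping table, is scaled by its hyper-parameter and its probability and the results are summed; the table values, the α values and this reading itself are assumptions, not the contents of fig. 5.

import numpy as np

# Placeholder mapping table T: one activation vector per expression basic category (not the fig. 5 values).
T = {
    "disgust": np.array([1, 0, 0, 1, 0, 0], dtype=float),
    "anger":   np.array([0, 1, 1, 0, 0, 0], dtype=float),
    "sadness": np.array([0, 0, 1, 0, 1, 1], dtype=float),
}
alpha = {"disgust": 1.0, "anger": 1.0, "sadness": 1.0}  # hyper-parameters scaling each contribution

def target_domain(prob_vector):
    """y_g = sum over categories of alpha_i * probability_i * T_i (one plausible reading of F(v))."""
    y_g = np.zeros_like(next(iter(T.values())))
    for category, p in prob_vector.items():
        y_g += alpha[category] * p * T[category]
    return y_g

# Example: the 20% disgust, 50% anger, 30% sadness mix shown in fig. 4.
y_g = target_domain({"disgust": 0.2, "anger": 0.5, "sadness": 0.3})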
The specific operation flow of the embodiment of the present invention is further described below with reference to fig. 1: the embodiment of the invention constructs a mapping relation table for the relation between the seven expression basic categories and the activation vectors, designs a mapping rule from the probability vectors of the expression basic categories to the activation vectors, trains an attention-mechanism-based bidirectional adversarial model in an unsupervised manner, processes probability vectors of continuous expression categories and images through the attention-based model, and outputs continuous facial expression pictures containing the corresponding expression components.
The embodiment of the invention also discloses a face generation device based on the attention mechanism, which comprises:
the first module is used for acquiring a training image and a training expression target domain;
the second module is used for determining a mapping relation table according to the relation between the expression basic category and the activation vector;
a third module, configured to determine, according to the mapping relationship table, a mapping rule from the probability vector of the expression basic category to the activation vector;
a fourth module, configured to input the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating a generated image;
and a fifth module, configured to input the initial facial expression image to be generated and the expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule and determine a continuous target facial expression image.
Corresponding to the method of fig. 1, an embodiment of the present invention further provides an electronic device, including a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
Corresponding to the method of fig. 1, the embodiment of the present invention also provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In the prior art, methods based on generative adversarial networks can only effectively synthesize discrete facial expressions, generating only a discrete number of expression classes determined by the annotations of the data set. Because the expression category is treated as a discrete variable, the generated results are also discrete. The prior art cannot generate smoothly transitioning expressions and is not good at handling continuous expression distributions.
In summary, the face generation method, apparatus, device and medium based on attention mechanism network of the present invention have the following advantages:
1) in the generation of the facial expression, a mapping rule from the probability vector of the expression basic category to the activation vector is designed, so that the probability vector of continuous expression basic categories is processed, and continuous facial expression pictures containing corresponding expression components are generated.
2) When the probability vectors of the expression basic categories are processed, the attention mechanism principle is used in the generator: the neural network's attention to the parts relevant to expression generation is maintained while its attention to the irrelevant parts is reduced, so the generator focuses only on generating the continuous new expression and keeps the other elements unchanged, which improves the robustness of the model to changes in background and illumination conditions.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A face generation method based on an attention mechanism network is characterized by comprising the following steps:
acquiring a training image and a training expression target domain;
determining a mapping relation table according to the relation between the expression basic category and the activation vector;
determining a mapping rule from the probability vector of the expression basic category to the activation vector according to the mapping relation table;
inputting the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating a generated image;
and inputting the initial facial expression image to be generated and the expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule, and determining a continuous target facial expression image.
2. The method of claim 1, wherein the determining a mapping relationship table according to the relationship between the expression basic categories and the activation vectors comprises:
determining the contraction state of specific facial muscles according to the expression basic categories;
determining different combinations of the activation vectors based on a contraction status of the particular facial muscle;
and determining the mapping relation table according to different combinations of the activation vectors.
3. The method of claim 1, wherein the inputting of the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating a generated image, comprises:
determining a color mask for the training image;
determining an attention mask of the training image according to the training expression target domain;
determining the generated image according to the color mask and the attention mask.
4. The method for generating a human face based on an attention mechanism network according to claim 1 or 3, wherein the inputting of the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating a generated image, comprises:
obtaining a confidence parameter, and evaluating the generated image according to the confidence parameter;
mapping the generated image to a probability matrix and determining a loss value in combination with a loss function;
and modifying the bidirectional adversarial network according to the loss value to determine the bidirectional adversarial model.
5. The method of claim 1, wherein the inputting of an initial facial expression image to be generated and an expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule to determine a continuous target facial expression image comprises:
inputting the probability vector of the expression basic category, and determining an expression target domain to be generated according to the mapping rule;
and inputting the expression target domain to be generated into the bidirectional adversarial model, and outputting the continuous target facial expression image.
6. The method as claimed in claim 5, wherein the inputting the probability vector of the expression basic category and determining the expression target domain to be generated according to the mapping rule comprises:
determining a set of hyper-parameters according to the mapping rule;
determining the expression target domain to be generated according to an expression target domain generating formula and by combining the hyper-parameter and the probability vector of the expression basic category;
the expression target domain generation formula is as follows:
y_g = F(v) = Σ_i α_i·T(v);
where y_g is the expression target domain to be generated, F is the mapping function, α_i are the hyper-parameters, and T(v) is the corresponding activation vector obtained from the probability vector v of the expression basic category according to the mapping relation table T.
7. A human face generation apparatus based on attention mechanism network, comprising:
the first module is used for acquiring a training image and a training expression target domain;
the second module is used for determining a mapping relation table according to the relation between the expression basic category and the activation vector;
a third module, configured to determine, according to the mapping relationship table, a mapping rule from the probability vector of the expression basic category to the activation vector;
a fourth module, configured to input the training image and the training expression target domain into a bidirectional adversarial network for training to obtain a bidirectional adversarial model, wherein the bidirectional adversarial model includes attention and color masks and is used for evaluating a generated image;
and a fifth module, configured to input the initial facial expression image to be generated and the expression basic category probability vector to be generated into the bidirectional adversarial model according to the mapping rule and determine a continuous target facial expression image.
8. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program realizes the method of any one of claims 1-6.
9. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-6.
CN202110277161.5A 2021-03-15 2021-03-15 Human face generation method, device, equipment and medium based on attention mechanism network Active CN113096206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110277161.5A CN113096206B (en) 2021-03-15 2021-03-15 Human face generation method, device, equipment and medium based on attention mechanism network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110277161.5A CN113096206B (en) 2021-03-15 2021-03-15 Human face generation method, device, equipment and medium based on attention mechanism network

Publications (2)

Publication Number Publication Date
CN113096206A 2021-07-09
CN113096206B (en) 2022-09-23

Family

ID=76667410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110277161.5A Active CN113096206B (en) 2021-03-15 2021-03-15 Human face generation method, device, equipment and medium based on attention mechanism network

Country Status (1)

Country Link
CN (1) CN113096206B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109923559A (en) * 2016-11-04 2019-06-21 易享信息技术有限公司 Quasi- Recognition with Recurrent Neural Network
US20200218937A1 (en) * 2019-01-03 2020-07-09 International Business Machines Corporation Generative adversarial network employed for decentralized and confidential ai training
CN111028305A (en) * 2019-10-18 2020-04-17 平安科技(深圳)有限公司 Expression generation method, device, equipment and storage medium
CN111027425A (en) * 2019-11-28 2020-04-17 深圳市木愚科技有限公司 Intelligent expression synthesis feedback interaction system and method
CN111652121A (en) * 2020-06-01 2020-09-11 腾讯科技(深圳)有限公司 Training method of expression migration model, and expression migration method and device
CN112287858A (en) * 2020-11-03 2021-01-29 北京享云智汇科技有限公司 Recognition method and system for generating confrontation network expression

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALBERT PUMAROLA et al.: "GANimation: Anatomically-Aware Facial Animation from a Single Image", 15th European Conference on Computer Vision (ECCV 2018): Computer Vision - ECCV 2018 *
GU TIANCHENG (顾天成): "Research on Expression Generation with Generative Adversarial Networks", China Excellent Master's and Doctoral Dissertations Full-text Database (Master's), Information Science and Technology Series *

Also Published As

Publication number Publication date
CN113096206B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
Zhang et al. Stackgan++: Realistic image synthesis with stacked generative adversarial networks
Li et al. Controllable text-to-image generation
Goodfellow Nips 2016 tutorial: Generative adversarial networks
Habibie et al. A recurrent variational autoencoder for human motion synthesis
Turhan et al. Recent trends in deep generative models: a review
Reed et al. Deep visual analogy-making
Taylor et al. Two Distributed-State Models For Generating High-Dimensional Time Series.
CN109166144B (en) Image depth estimation method based on generation countermeasure network
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
KR102602112B1 (en) Data processing method, device, and medium for generating facial images
Yu et al. A video, text, and speech-driven realistic 3-D virtual head for human–machine interface
Ververas et al. Slidergan: Synthesizing expressive face images by sliding 3d blendshape parameters
CN114330736A (en) Latent variable generative model with noise contrast prior
Yang et al. Multiscale mesh deformation component analysis with attention-based autoencoders
Taylor Composable, distributed-state models for high-dimensional time series
CN114494543A (en) Action generation method and related device, electronic equipment and storage medium
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
Liu et al. Palm up: Playing in the latent manifold for unsupervised pretraining
CN113096206B (en) Human face generation method, device, equipment and medium based on attention mechanism network
Jang et al. Observational learning algorithm for an ensemble of neural networks
Guo et al. Optimizing latent distributions for non-adversarial generative networks
Zhang et al. On Open-Set, High-Fidelity and Identity-Specific Face Transformation
Rakesh et al. Generative Adversarial Network: Concepts, Variants, and Applications
Ardino Exploring Deep generative models for Structured Object Generation and Complex Scenes Manipulation
Lu et al. Cdvae: Co-embedding deep variational auto encoder for conditional variational generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant