CN113205449A - Expression migration model training method and device and expression migration method and device - Google Patents

Expression migration model training method and device and expression migration method and device

Info

Publication number
CN113205449A
CN113205449A (Application No. CN202110560292.4A)
Authority
CN
China
Prior art keywords
dimensional face
training
decoder
encoder
expression migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110560292.4A
Other languages
Chinese (zh)
Inventor
梁延研
冯梓原
林旭新
杨林
史少桦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Kingsoft Online Game Technology Co Ltd
Original Assignee
Zhuhai Kingsoft Online Game Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Kingsoft Online Game Technology Co Ltd
Priority to CN202110560292.4A
Publication of CN113205449A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/80 2D [Two Dimensional] animation, e.g. using sprites

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method and device for an expression migration model, and an expression migration method and device. The expression migration model comprises an encoder, a first decoder and a second decoder, and the training method comprises the following steps: acquiring a first three-dimensional face sample and a second three-dimensional face sample; training the first three-dimensional face sample based on the encoder and the first decoder, and training the second three-dimensional face sample based on the encoder and the second decoder; determining whether a training stop condition is reached, and stopping the training process in the event that the training stop condition is reached. According to the training method of the expression migration model, features are extracted by a graph convolutional neural network and training adopts an auto-encoder network structure, so that migration of an expression to a specific three-dimensional face is achieved.

Description

Expression migration model training method and device and expression migration method and device
Technical Field
The specification relates to the technical field of computers, in particular to a method and a device for training an expression migration model and a method and a device for expression migration.
Background
With the development of technology, the processing and analysis of facial expressions has become a research hotspot in the fields of computer vision and graphics, and facial expression migration is also widely applied. Facial expression migration means mapping a captured real user expression onto another target image, so that the facial expression is migrated to the target image. This technology not only enables a user to control the facial expression in a target picture or video with an input face, but also provides data augmentation for face recognition tasks.
Existing three-dimensional facial expression migration is mainly performed by detecting face key points and fitting the parameters of a 3DMM model (3D Morphable Model). The 3DMM model is constructed by Principal Component Analysis (PCA) of a database and can be regarded as an average face (mean face) obtained from a large number of faces in the database, so it always carries features of the faces in the database. A face reconstructed by the 3DMM model therefore only approximates the appearance and expression of the input rather than reproducing the exact expression, so the expression cannot be migrated to a specific three-dimensional face.
Disclosure of Invention
In view of this, the embodiments of the present specification provide a training method for an expression migration model and an expression migration method. The present specification also relates to a training apparatus for an expression migration model, an expression migration apparatus, a computing device, and a computer-readable storage medium, so as to solve the technical defects in the prior art.
According to a first aspect of embodiments of the present specification, there is provided a training method for an expression migration model, where the expression migration model includes an encoder, a first decoder, and a second decoder, the training method includes:
acquiring a first three-dimensional face sample and a second three-dimensional face sample;
training the first three-dimensional face sample based on the encoder and the first decoder, and training the second three-dimensional face sample based on the encoder and the second decoder;
determining whether a training stop condition is reached, and stopping the training process in the event that the training stop condition is reached.
Optionally, training a first three-dimensional face sample based on the encoder and the first decoder comprises:
inputting the initial vertex information and the adjacency matrix of the first three-dimensional face sample into the encoder to obtain a first encoding vector;
inputting the first coding vector into the first decoder to obtain a first decoding vector, and obtaining a loss value according to the first decoding vector and the initial vertex information;
adjusting a coefficient vector of a network layer in the encoder and the first decoder according to the loss value.
Optionally, training the second three-dimensional face sample based on the encoder and the second decoder comprises:
inputting the initial vertex information and the adjacency matrix of the second three-dimensional face sample into the encoder to obtain a second encoding vector,
inputting the second coding vector into the second decoder to obtain a second decoding vector, and obtaining a loss value according to the second decoding vector and the initial vertex information of the second three-dimensional face sample;
adjusting a coefficient vector of a network layer in the encoder and the second decoder according to the loss value.
Optionally, the encoder comprises a graph convolution neural network layer, a down-sampling layer and a full connection layer, the graph convolution neural network layer and the down-sampling layer are sequentially arranged at intervals, the first decoder and the second decoder comprise the full connection layer, an up-sampling layer and a convolutional neural network layer, and the up-sampling layer and the convolutional neural network layer are sequentially arranged at intervals.
Optionally, training a first preset number of first three-dimensional face samples and training a second preset number of second three-dimensional face samples are alternately performed.
Optionally, the obtaining a first three-dimensional face sample comprises:
the method comprises the steps of obtaining a plurality of face images, and carrying out face reconstruction on the face images to obtain a first three-dimensional face sample.
Optionally, the obtaining the first three-dimensional face sample further includes:
and after face reconstruction is carried out on the plurality of face images, carrying out spatial alignment on the reconstructed three-dimensional face and the second three-dimensional face sample.
According to a second aspect of embodiments of the present specification, there is provided an expression migration method using an expression migration model, the expression migration model including an encoder, a first decoder, and a second decoder and being trained in advance by the training method of any one of the above, the method including:
acquiring a first three-dimensional face of an expression to be migrated;
and performing expression migration on the first three-dimensional face based on the encoder and the second decoder to obtain a second three-dimensional face.
Optionally, performing expression migration on the first three-dimensional face based on the encoder and the second decoder comprises:
inputting the initial vertex information and the adjacency matrix of the first three-dimensional face into the encoder to obtain a first encoding vector;
inputting the first encoded vector to the second decoder.
Optionally, the obtaining of the first three-dimensional face with the expression to be migrated includes:
and intercepting multi-frame face images of the same face from a target video, and carrying out face reconstruction on the multi-frame face images to obtain a plurality of first three-dimensional faces of expressions to be migrated.
Optionally, the expression migration method further includes:
and generating an animation based on the second three-dimensional face.
According to a third aspect of embodiments of the present specification, there is provided a training apparatus for an expression migration model, the expression migration model including an encoder, a first decoder, and a second decoder, the training apparatus including:
a first obtaining module configured to obtain a first three-dimensional face sample and a second three-dimensional face sample;
a training module configured to train the first three-dimensional face sample based on the encoder and the first decoder, and train the second three-dimensional face sample based on the encoder and the second decoder;
a judging module configured to judge whether a training stop condition is reached and, in case the training stop condition is reached, stop the training process.
According to a fourth aspect of embodiments of the present specification, there is provided an expression migration apparatus using an expression migration model, the expression migration model including an encoder, a first decoder, and a second decoder and being trained in advance by the training method described in any one of the above, the expression migration apparatus including:
the second acquisition module is configured to acquire a first three-dimensional face of the expression to be migrated;
and the migration module is configured to perform expression migration on the first three-dimensional face based on the encoder and the second decoder to obtain a second three-dimensional face.
According to a fifth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, where the computer-executable instructions, when executed by the processor, implement the method for training an expression migration model according to the first aspect, or implement the operation steps of the expression migration method using an expression migration model according to the second aspect.
According to a sixth aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions, which when executed by a processor, implement the method for training an expression migration model according to the first aspect or the operation steps of the expression migration method using an expression migration model according to the second aspect.
The method for training the expression migration model according to this specification comprises: acquiring a first three-dimensional face sample and a second three-dimensional face sample; training the first three-dimensional face sample based on an encoder and a first decoder, and training the second three-dimensional face sample based on the encoder and a second decoder; and determining whether a training stop condition is reached and, in case the training stop condition is reached, stopping the training process.
The training method of the expression migration model according to this specification extracts features with a graph convolutional neural network and trains with an auto-encoder network structure, realizing the migration of expressions to a specific three-dimensional face.
Drawings
Fig. 1 is a flowchart illustrating a training method of an expression migration model according to an embodiment of the present specification;
fig. 2 is a process schematic diagram illustrating a training method of an expression migration model according to an embodiment of the present specification;
FIG. 3 is a schematic diagram illustrating a network architecture of an expression migration model according to an embodiment of the present specification;
FIG. 4 illustrates a schematic diagram of a down-sampling layer and an up-sampling layer in the expression migration model of FIG. 3;
FIG. 5 is a flowchart illustrating an expression migration method using an expression migration model according to an embodiment of the present specification;
fig. 6 is a flowchart illustrating an expression migration method using an expression migration model according to an embodiment of the present specification;
FIG. 7 is a flowchart illustrating a training method for an expression migration model according to an embodiment of the present specification;
FIG. 8 is a flowchart illustrating a process of a method for three-dimensional facial expression migration according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram illustrating a training apparatus for an expression migration model according to an embodiment of the present specification;
fig. 10 is a schematic structural diagram illustrating an expression transfer apparatus according to an embodiment of the present specification;
fig. 11 shows a block diagram of a computing device according to an embodiment of the present specification.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
In the present specification, a method for training an expression migration model is provided, and the present specification also relates to an apparatus for training an expression migration model, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
Fig. 1 is a flowchart illustrating a training method for an expression migration model according to an embodiment of the present specification, where the expression migration model includes an encoder, a first decoder, and a second decoder, and the training method specifically includes the steps of:
step 102: and acquiring a first three-dimensional face sample and a second three-dimensional face sample.
The first three-dimensional face sample and the second three-dimensional face sample are three-dimensional face samples of characters or virtual characters, and the first three-dimensional face sample and the second three-dimensional face sample correspond to different characters or virtual characters. A three-dimensional face with a given expression is represented by a three-dimensional face mesh with vertices arranged on the mesh; the three-dimensional face mesh is defined as a graph G = (V, A), where V ∈ R^{n×3} is the set of vertices and A is an adjacency matrix that characterizes the relationship between the vertices, each vertex being represented by a three-dimensional coordinate value. Each vertex of the three-dimensional face sample corresponds to one row and one column of the adjacency matrix, and the element at each position is determined by whether the vertices of the corresponding row and column are adjacent. In one embodiment, the first three-dimensional face sample is a three-dimensional face sample of a character, and the second three-dimensional face sample is a three-dimensional face sample of a virtual character, such as a three-dimensional face sample of a game character.
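For illustration only, the following is a minimal sketch (not taken from the patent; the helper name mesh_to_graph is an assumption) of how the vertex set and the adjacency matrix A of a triangle mesh can be built: each vertex is a three-dimensional coordinate, and A[i, j] is set to 1 when vertices i and j share an edge of some triangle.

```python
import numpy as np

def mesh_to_graph(vertices, faces):
    """vertices: (n, 3) float array; faces: (m, 3) integer array of vertex indices."""
    n = vertices.shape[0]
    A = np.zeros((n, n), dtype=np.float32)
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            A[a, b] = 1.0        # vertices a and b share an edge of this triangle
            A[b, a] = 1.0        # the adjacency matrix is symmetric
    return vertices.astype(np.float32), A

# toy example: a mesh consisting of a single triangle
V, A = mesh_to_graph(np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]]),
                     np.array([[0, 1, 2]]))
```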
Optionally, the obtaining a first three-dimensional face sample comprises:
the method comprises the steps of obtaining a plurality of face images, and carrying out face reconstruction on the face images to obtain a first three-dimensional face sample.
The first three-dimensional face sample can be obtained by performing three-dimensional face reconstruction on two-dimensional images, for example by capturing images frame by frame from a video and then performing three-dimensional face reconstruction to obtain the three-dimensional face samples. Three-dimensional face reconstruction can also be performed on a plurality of directly captured two-dimensional images to obtain the three-dimensional face samples.
The three-dimensional face can be reconstructed by conventional three-dimensional face reconstruction methods, for example single-image modeling (including muscle models and image-based modeling), multi-image modeling (including orthogonal-view modeling and multi-image modeling systems), model-based three-dimensional face reconstruction (including the generic face model CANDIDE-3 and the three-dimensional morphable model 3DMM), and end-to-end three-dimensional face reconstruction (including VRNet and PRNet). The two-dimensional images are thus reconstructed into three-dimensional faces.
Optionally, the obtaining the first three-dimensional face sample further includes:
and after face reconstruction is carried out on the plurality of face images, carrying out spatial alignment on the reconstructed three-dimensional face and the second three-dimensional face sample.
Spatial alignment, as shown in fig. 2, is used to align the reconstructed three-dimensional face with the second three-dimensional face sample in space, so that the eye, mouth and nose positions of the two faces coincide. Topology unification is then performed so that the reconstructed three-dimensional face and the second three-dimensional face sample have the same number of points: a three-dimensional face is formed by faces and the faces are formed by points, which are unified, and the indices of the points forming each face must also be the same, so that the faces of the two three-dimensional meshes correspond one to one and the two meshes come to share the same topology structure.
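As one possible way to realize the spatial alignment described above (the patent does not specify the algorithm; the rigid Procrustes/Kabsch alignment and the use of a few facial landmarks such as eye, nose and mouth points are assumptions for illustration), a sketch could look like this:

```python
import numpy as np

def rigid_align(src_landmarks, dst_landmarks, src_vertices):
    """Rotate and translate src_vertices so src_landmarks best match dst_landmarks."""
    mu_s, mu_d = src_landmarks.mean(0), dst_landmarks.mean(0)
    H = (src_landmarks - mu_s).T @ (dst_landmarks - mu_d)   # 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # avoid a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return src_vertices @ R.T + t   # aligned vertex coordinates
```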
Step 104: the first three-dimensional face sample is trained based on an encoder and a first decoder, and the second three-dimensional face sample is trained based on the encoder and a second decoder.
As shown in fig. 2, a first three-dimensional face sample is trained by an encoder and a first decoder, and a second three-dimensional face sample is trained by an encoder and a second decoder. Optionally, training a first three-dimensional face sample based on the encoder and the first decoder may be achieved by:
inputting the initial vertex information and the adjacency matrix of the first three-dimensional face sample into the encoder to obtain a first encoding vector;
inputting the first coding vector into the first decoder to obtain a first decoding vector, and obtaining a loss value according to the first decoding vector and the initial vertex information;
adjusting a coefficient vector of a network layer in the encoder and the first decoder according to the loss value.
Optionally, training the second three-dimensional face sample based on the encoder and the second decoder may be achieved by:
inputting the initial vertex information and the adjacency matrix of the second three-dimensional face sample into the encoder to obtain a second encoding vector,
inputting the second coding vector into the second decoder to obtain a second decoding vector, and obtaining a loss value according to the second decoding vector and the initial vertex information of the second three-dimensional face sample;
adjusting a coefficient vector of a network layer in the encoder and the second decoder according to the loss value.
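A minimal sketch of one training step for each of the two branches described above is given below; the module names (encoder, decoder), the tensor shapes and the use of an L1 reconstruction loss are assumptions, since the patent only states that a loss value is obtained from the decoded vector and the initial vertex information.

```python
import torch
import torch.nn.functional as F

def train_branch(encoder, decoder, vertices, adjacency, optimizer):
    """vertices: (batch, n, 3) tensor of initial vertex info; adjacency: (n, n) tensor."""
    optimizer.zero_grad()
    code = encoder(vertices, adjacency)    # encoding vector
    recon = decoder(code)                  # decoded vertex coordinates
    loss = F.l1_loss(recon, vertices)      # loss between decoded and initial vertices (L1 assumed)
    loss.backward()                        # gradients used to adjust the coefficient vectors
    optimizer.step()
    return loss.item()
```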
The network architecture for expression migration of a three-dimensional face is composed of an encoder and two decoders, as shown in fig. 3; the encoder encodes expression information, and the decoders recover the identity information of the three-dimensional face. Optionally, the encoder comprises graph convolution neural network layers, down-sampling layers and a fully connected layer, the graph convolution neural network layers and the down-sampling layers being sequentially arranged at intervals; the first decoder and the second decoder comprise a fully connected layer, up-sampling layers and convolutional neural network layers, the up-sampling layers and the convolutional neural network layers being sequentially arranged at intervals. As shown in fig. 3, G represents a graph convolutional neural network layer, F represents a fully connected layer, and the length of a box represents the number of vertices of the face mesh. Except for the last layer, each graph convolutional layer is followed by a down-sampling operation, the graph convolutional layers and the down-sampling layers being sequentially arranged at intervals, and each down-sampling reduces the number of input vertices to 1/4; the last layer is a fully connected layer, and its output is Z. Each down-sampling could also reduce the number of vertices to one sixth, one eighth, etc. of the original number, with the number of graph convolution layers unchanged. The numbers of encoder and decoder layers can be adjusted and are not limited to 4 layers; they can be 6 layers, 8 layers, etc., which is not limited in this application.
As shown in fig. 2 and fig. 3, the decoder part is composed of two decoders, which decode the corresponding 3D face meshes respectively. The fully connected layer is followed by an up-sampling layer, which is followed by a graph convolutional neural network layer. Graph convolution and up-sampling are performed alternately; each up-sampling restores the number of vertices to 4 times the previous number, and finally the vertices of the complete 3D face mesh are output. The first decoder comprises a fully connected layer, up-sampling layers and convolutional neural network layers, the up-sampling layers and the convolutional neural network layers being sequentially arranged at intervals. The original vertex information can be recovered during up-sampling: up-sampling corresponds to down-sampling, vertices are deleted during down-sampling and the coordinates of the deleted points are recorded at the same time, so that during up-sampling the original vertex information can be recovered as far as possible from the recorded coordinates.
As shown in FIG. 4, part (a) represents the original mesh; the down-sampling Q_d deletes some points to form part (b); convolution is then performed to form part (c); during the up-sampling Q_u the deleted points are projected onto the nearest mesh according to their recorded coordinates and recovered to form part (d), and the final number of points is unchanged compared with the original.
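The following simplified sketch illustrates the Q_d / Q_u idea described above: down-sampling keeps a subset of vertices while recording which ones were removed, and up-sampling puts a value back at each removed position. For simplicity the removed vertices are restored from their nearest kept vertex, whereas the patent projects the deleted points onto the nearest mesh; all names are illustrative assumptions.

```python
import numpy as np

def downsample(features, keep_idx):
    """features: (n, f) per-vertex features; keep_idx: indices of vertices kept."""
    return features[keep_idx]

def upsample(features, keep_idx, nearest_kept, n):
    """nearest_kept[i]: row in `features` of the kept vertex closest to vertex i."""
    out = np.zeros((n, features.shape[1]), dtype=features.dtype)
    out[keep_idx] = features                            # kept vertices pass through
    removed = np.setdiff1d(np.arange(n), keep_idx)
    out[removed] = features[nearest_kept[removed]]      # restore deleted vertices
    return out
```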
The vertex information is coordinate information of the vertexes of the three-dimensional face mesh, and provides input dimensionality, output dimensionality and an adjacency matrix when the graph convolution network is defined. The vertex set is input into a first graph convolution neural network of the encoder, and graph convolution is carried out by combining adjacency matrix information. In the graph convolution neural network, vertex information is aggregated by using a preset function according to a certain rule to obtain a new coding vector and output the new coding vector.
The graph convolution is explained below. The degree matrix D is computed from the adjacency matrix A, the Laplacian matrix is obtained as L = D - A, and the eigenvectors u_0, u_1, …, u_{n-1} of the Laplacian matrix are obtained, where L = U Λ U^T, U = [u_0, u_1, …, u_{n-1}], and Λ = diag([λ_0, λ_1, …, λ_{n-1}]) ∈ R^{n×n}. The general definition of graph convolution is:
x * y = U((U^T x) ⊙ (U^T y))    (1)
where x and y are inputs, * represents the convolution operation, U is the matrix of eigenvectors, U^T x represents the graph Fourier transform of x, U^T y represents the graph Fourier transform of y, and ⊙ represents the Hadamard product. x is the input vector to be convolved, y is the signal information on the graph, and the convolution operation is carried out after the Fourier transform.
Given a graph, its Laplacian matrix is obtained, and the eigenvalues and eigenvectors of the Laplacian matrix are computed. The eigenvectors are used as a set of basis vectors for the Fourier transform; the input on the graph is then Fourier-transformed into a representation in the frequency domain, the convolution operation is carried out in the frequency-domain space, and the result is transformed back to the original space. In other words, the signal to be convolved is transformed by this set of basis vectors into the space spanned by the basis, i.e. the frequency-domain space, convolved there, and then inverse-transformed back to the original space by the same basis vectors. The above convolution process is called spectral convolution.
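A minimal numerical sketch of the spectral convolution in formula (1), written with NumPy for illustration (the function name is an assumption):

```python
import numpy as np

def spectral_conv(x, y, A):
    """x, y: (n,) graph signals; A: (n, n) adjacency matrix."""
    D = np.diag(A.sum(axis=1))
    L = D - A                              # graph Laplacian
    _, U = np.linalg.eigh(L)               # columns of U are the eigenvectors
    return U @ ((U.T @ x) * (U.T @ y))     # x * y = U((U^T x) ⊙ (U^T y))
```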
The above is the theory of spectral convolution; the specific convolution operation is described below. The eigen-decomposition is approximated by Chebyshev polynomials, as shown in the following formula:
g_θ(L) = Σ_{k=0}^{K-1} θ_k T_k(L̃)    (2)
where g_θ is the convolution kernel, L̃ = 2L/λ_max - I_n is the scaled Laplacian, I_n is the n-dimensional identity matrix, θ is the Chebyshev coefficient vector to be trained by the model, T_k is the k-th order Chebyshev polynomial, the order K can be 6, 8, 10, etc. and can be set as desired, and λ_max is the maximum eigenvalue of the Laplacian matrix.
The Chebyshev polynomial recurrence is T_k(x) = 2x·T_{k-1}(x) - T_{k-2}(x), with T_0 = 1 and T_1 = x. The graph convolution after applying the Chebyshev polynomial is defined as follows:
y_j = Σ_{i=1}^{F_in} g_{θ_{i,j}}(L) x_i    (3)
where y_j is the j-th feature of the output y ∈ R^{n×F_out}, x ∈ R^{n×F_in} is the input with F_in features, and F_in is the coordinate dimension of each vertex, in this case F_in = 3. Each convolution layer has F_in × F_out Chebyshev coefficient vectors θ_{i,j} ∈ R^K, which are used as training parameters. F_out, the number of channels of the output vector, is specified when the graph convolution network layer is set up. In one embodiment, n = 3791 and F_out = 16. In one embodiment, during network training, K is set to 6, an SGD stochastic gradient descent optimizer is used, the weight decay rate is 0.0005, the batch size is 16, the learning rate is 0.008, and the learning-rate decay is 0.99. The weight decay is used for regularization to prevent overfitting. The batch size is a hyper-parameter, defined as 16 in this embodiment, which means that only 16 three-dimensional faces are taken for training at a time; it may be set to other values, preferably multiples of 2.
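The following is a hedged PyTorch sketch of a Chebyshev graph-convolution layer corresponding to formulas (2) and (3): each layer holds F_in × F_out coefficient vectors θ_{i,j} ∈ R^K and evaluates T_k(L̃)x by the Chebyshev recurrence. The class name, initialization and exact optimizer/scheduler calls are assumptions; the hyper-parameters follow the embodiment above.

```python
import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    def __init__(self, f_in, f_out, K, scaled_laplacian):
        super().__init__()
        self.K = K
        # scaled Laplacian L~ = 2L/lambda_max - I_n, shape (n, n)
        self.register_buffer("L", scaled_laplacian)
        # K Chebyshev coefficient matrices, i.e. F_in x F_out vectors theta_{i,j} in R^K
        self.theta = nn.Parameter(torch.randn(K, f_in, f_out) * 0.01)

    def forward(self, x):                                 # x: (batch, n, f_in)
        Tx_prev = x                                       # T_0(L~) x = x
        Tx = torch.einsum("ij,bjf->bif", self.L, x)       # T_1(L~) x = L~ x
        out = torch.einsum("bnf,fo->bno", Tx_prev, self.theta[0])
        if self.K > 1:
            out = out + torch.einsum("bnf,fo->bno", Tx, self.theta[1])
        for k in range(2, self.K):                        # T_k = 2 L~ T_{k-1} - T_{k-2}
            Tx_prev, Tx = Tx, 2 * torch.einsum("ij,bjf->bif", self.L, Tx) - Tx_prev
            out = out + torch.einsum("bnf,fo->bno", Tx, self.theta[k])
        return out                                        # (batch, n, f_out)

# optimizer settings from the embodiment (assumed mapping to PyTorch calls):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.008, weight_decay=0.0005)
# scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)
```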
Step 106: it is determined whether a training stop condition is reached and, in the event that the training stop condition is reached, the training process is stopped.
As shown in fig. 2, a discriminator is disposed after each of the first decoder and the second decoder. The discriminator judges the quality of the face decoded by the decoder: the closer the decoded face is to the input three-dimensional face, the more the result tends to be judged as true (real). The vertex coordinates of the decoded face are compared with the vertex coordinates of the reconstructed face, and when the difference is smaller than a preset threshold the result is judged as true (real), so that the generated result is pushed closer to the original input. During training, some three-dimensional face samples from the training set and some target face samples are input. The parameters changed while training on the three-dimensional face samples also apply to the target face samples; the encoder searches for one set of parameters that can encode the three-dimensional face samples and the target face samples at the same time, and this set of parameters is used when the model is applied. Because the encoder shared by the first three-dimensional face sample and the second three-dimensional face sample learns information common to different individuals, i.e. the network parameters of the hidden layer, while each decoder is responsible for reconstructing the corresponding individual during training and learns information specific to that individual (including different facial expressions), expression migration of an individual face is realized through the second decoder during application.
Optionally, training a first preset number of first three-dimensional face samples and training a second preset number of second three-dimensional face samples are performed alternately. That is, during training, the first batch may be three-dimensional face samples and the second batch may be target face samples. Alternatively, the samples may be trained in a shuffled order, or n three-dimensional face samples may be trained first, followed by n target face samples, alternating in this way. This is not limited by the present application.
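A short sketch of the alternating schedule described above, reusing the train_branch step sketched earlier (loader and optimizer names are assumptions):

```python
def train_epoch(encoder, decoder_1, decoder_2,
                loader_first, loader_second, opt_1, opt_2, adjacency):
    """Alternate one batch of first-face samples with one batch of second-face samples."""
    loss_a = loss_b = None
    for batch_a, batch_b in zip(loader_first, loader_second):
        loss_a = train_branch(encoder, decoder_1, batch_a, adjacency, opt_1)
        loss_b = train_branch(encoder, decoder_2, batch_b, adjacency, opt_2)
    return loss_a, loss_b
```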
Fig. 5 shows an expression migration method using an expression migration model, where the expression migration model includes an encoder, a first decoder, and a second decoder, and is trained in advance by the above training method, and the expression migration method includes:
step 502: acquiring a first three-dimensional face of an expression to be migrated;
the first three-dimensional face can be obtained by performing three-dimensional face reconstruction on the two-dimensional image, for example, capturing a plurality of frames of face images of the same face from the target video, and performing three-dimensional face reconstruction to obtain a plurality of first three-dimensional faces of the expression to be migrated. And three-dimensional face reconstruction can be carried out on a plurality of two-dimensional images obtained by direct shooting to obtain the three-dimensional face.
Step 504: and performing expression migration on the first three-dimensional face based on the encoder and the second decoder to obtain a second three-dimensional face.
The encoder comprises graph convolution neural network layers, down-sampling layers and a fully connected layer, the graph convolution neural network layers and the down-sampling layers being sequentially arranged at intervals; the first decoder and the second decoder comprise a fully connected layer, up-sampling layers and convolutional neural network layers, the up-sampling layers and the convolutional neural network layers being sequentially arranged at intervals. As shown in fig. 3, G represents a graph convolutional neural network layer, F represents a fully connected layer, and the length of a box represents the number of vertices of the face mesh. Except for the last layer, each graph convolutional layer is followed by a down-sampling operation, the graph convolutional layers and the down-sampling layers being sequentially arranged at intervals, and each down-sampling reduces the number of input vertices to 1/4; the last layer is a fully connected layer, and its output is Z. Each down-sampling could also reduce the number of vertices to one sixth, one eighth, etc. of the original number, with the number of graph convolution layers unchanged. The numbers of encoder and decoder layers can be adjusted and are not limited to 4 layers; they can be 6 layers, 8 layers, etc., which is not limited in this application.
The decoder part consists of two decoders, which decode the corresponding 3D face meshes respectively. The fully connected layer is followed by an up-sampling layer, which is followed by a graph convolutional neural network layer. Graph convolution and up-sampling are performed alternately; each up-sampling restores the number of vertices to 4 times the previous number, and finally the vertices of the complete 3D face mesh are output. The first decoder comprises a fully connected layer, up-sampling layers and convolutional neural network layers, the up-sampling layers and the convolutional neural network layers being sequentially arranged at intervals. The original vertex information can be recovered during up-sampling: up-sampling corresponds to down-sampling, vertices are deleted during down-sampling and the coordinates of the deleted points are recorded at the same time, so that during up-sampling the original vertex information can be recovered as far as possible from the recorded coordinates.
As shown in FIG. 4, part (a) is the original mesh; the down-sampling Q_d deletes some points while retaining their coordinates; part (c) is generated after convolution; during the up-sampling Q_u the deleted points are projected onto the nearest mesh according to their recorded coordinates and recovered, and the final number of points is unchanged compared with the original.
Optionally, performing expression migration on the first three-dimensional face based on the encoder and the second decoder may be implemented by:
inputting the initial vertex information and the adjacency matrix of the first three-dimensional face into the encoder to obtain a first encoding vector;
inputting the first encoded vector to the second decoder.
As shown in fig. 6, the initial vertex information and the adjacency matrix are input into the graph convolution neural network layers of the encoder, and the first encoding vector is output after the down-sampling layers and the fully connected layer. The first encoding vector is then input into the fully connected layer of the second decoder, a decoding vector is output after the up-sampling layers and the convolutional neural network layers, and the three-dimensional face after expression migration is obtained from the decoding vector.
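A minimal inference sketch of the migration step described above (function and module names are assumptions):

```python
import torch

@torch.no_grad()
def migrate_expression(encoder, decoder_2, vertices, adjacency):
    """vertices: (batch, n, 3) source face; returns the migrated target face vertices."""
    code = encoder(vertices, adjacency)   # first encoding vector
    return decoder_2(code)                # second three-dimensional face vertices
```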
The expression migration model is trained by the above training method. During training, the encoder shared by the first three-dimensional face sample and the second three-dimensional face sample learns information common to different individuals, i.e. the network parameters of the hidden layer, while each decoder is responsible for reconstructing the corresponding individual and learns information specific to that individual (including different facial expressions). Therefore, during application, expression migration of an individual face is realized for the first three-dimensional face through the second decoder: the expression of the first three-dimensional face is migrated to the target three-dimensional face, where the target three-dimensional face is the face reconstructed by the second decoder and obtained by training on a large number of second three-dimensional face samples.
In one embodiment, the expression migration method further includes:
and generating an animation based on the second three-dimensional face.
A plurality of first three-dimensional faces are input into the expression migration model, a plurality of corresponding second three-dimensional faces are output by the expression migration model, and an animation is generated based on the output second three-dimensional faces. For example, the animation may be generated using MPEG-4-based three-dimensional facial animation principles, and the application is not limited thereto. The expressions of the plurality of first three-dimensional faces are migrated to the target three-dimensional face to obtain a plurality of second three-dimensional faces, and the second three-dimensional faces are output as an animation, so that the expressions migrated from the original video can be displayed in animated form and an animation of a virtual character with these expressions is realized. Because facial expressions can be migrated to a virtual character to form the expressions of the virtual character, the resulting expressions of the virtual character are richer, which solves the problem that creating the expressions of game characters with traditional methods is time-consuming and labor-intensive.
The following describes the training method of the expression migration model with reference to fig. 7. Fig. 7 shows a processing flow chart of a training method for an expression migration model provided in an embodiment of the present specification, which specifically includes the following steps:
step 702: acquiring a first three-dimensional face sample and a three-dimensional face sample of a virtual character in a game picture;
the three-dimensional face sample of the virtual character may be an expressionless face sample or an expressive face sample of the same character, for example, an created virtual character face sample with an expression or an expressive virtual character face sample obtained by other methods or the expression migration method of this embodiment.
Step 704: inputting the initial vertex information of the first three-dimensional face sample into an encoder of the expression migration model to obtain a first encoding vector;
referring to fig. 3, the encoder includes a convolutional neural network layer, a downsampling layer, and a full connection layer, and the convolutional neural network layer and the downsampling layer are sequentially disposed at intervals.
Step 706: inputting the first coding vector into a first decoder of the expression migration model to obtain a first decoding vector, and obtaining a loss value according to the first decoding vector and the initial vertex information;
the first decoder comprises a full connection layer, an upper sampling layer and a convolutional neural network layer, wherein the upper sampling layer and the convolutional neural network layer are sequentially arranged at intervals.
Step 708: adjusting coefficient vectors of convolutional neural network layers in the encoder and the first decoder according to the loss values;
and adjusting the Chebyshev coefficient vector theta according to the loss value, and calculating a primary loss function and adjusting theta after the three-dimensional face sample of each batch is input.
Step 710: inputting the initial vertex information of the three-dimensional face sample of the virtual character into the encoder to obtain a second encoding vector;
and then training the three-dimensional face sample of the virtual character, inputting the three-dimensional face sample into an encoder, and obtaining a second encoding vector through a graph convolution neural network layer, a down-sampling layer and a connecting layer.
Step 712: inputting a second coding vector into a second decoder of the migration model to obtain a second decoding vector, and obtaining a loss value according to the second decoding vector and the initial vertex information of the three-dimensional face sample of the virtual character;
the second decoder comprises a full connection layer, an upper sampling layer and a convolutional neural network layer, wherein the upper sampling layer and the convolutional neural network layer are sequentially arranged at intervals.
Step 714: and adjusting coefficient vectors of the convolutional neural network layers in the encoder and the second decoder according to the loss value until a training stopping condition is reached, and stopping the training process.
The training stop condition is that the loss functions of the first decoder and the second decoder both converge. In one embodiment, stopping the training process until the training stop condition is reached may include:
judging whether the loss value is smaller than a preset threshold value or not;
if not, continuing training;
if so, determining that the training stop condition is reached.
The preset threshold is a critical value for the loss. When the loss value is greater than or equal to the preset threshold, there is still a certain deviation between the prediction of the initial model and the real result, the parameters of the initial model still need to be adjusted, and training continues; when the loss value is smaller than the preset threshold, the prediction of the initial model is sufficiently close to the real result and training can be stopped. The value of the preset threshold may be determined according to actual conditions, which is not limited by this specification.
The detailed parameters for each layer, which were specified in one embodiment, are shown in tables 1 and 2.
TABLE 1 Encoder architecture

Layer            Input size    Output size
Convolution      3791×3        3791×16
Down-sampling    3791×16       948×16
Convolution      948×16        948×16
Down-sampling    948×16        237×16
Convolution      237×16        237×16
Down-sampling    237×16        60×16
Convolution      60×16         60×32
Down-sampling    60×32         15×32
Fully connected  15×32         8

TABLE 2 Decoder architecture

Layer            Input size    Output size
Fully connected  8             15×32
Up-sampling      15×32         60×32
Convolution      60×32         60×32
Up-sampling      60×32         237×32
Convolution      237×32        237×16
Up-sampling      237×16        948×16
Convolution      948×16        948×16
Up-sampling      948×16        3791×16
Convolution      3791×16       3791×3
The input and output sizes in Tables 1 and 2 gave good experimental results, but they are not a limitation of the present application; the number of channels may be set as desired.
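For reference, the channel plan of Tables 1 and 2 could be written down as a configuration, e.g. a list of (stage type, number of vertices, channels) giving the output size after each stage; this is only an illustrative transcription of the tables, not code from the patent:

```python
# output size after each stage, per Tables 1 and 2
ENCODER_STAGES = [
    ("conv", 3791, 16), ("down", 948, 16),
    ("conv", 948, 16),  ("down", 237, 16),
    ("conv", 237, 16),  ("down", 60, 16),
    ("conv", 60, 32),   ("down", 15, 32),
]
CODE_DIM = 8            # output of the final fully connected layer (the code Z)
DECODER_STAGES = [
    ("fc", 15, 32),
    ("up", 60, 32),   ("conv", 60, 32),
    ("up", 237, 32),  ("conv", 237, 16),
    ("up", 948, 16),  ("conv", 948, 16),
    ("up", 3791, 16), ("conv", 3791, 3),
]
```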
The following describes an application method of the expression migration model with reference to fig. 8. Fig. 8 shows a processing flow chart of a method for three-dimensional facial expression migration provided in an embodiment of the present specification, which specifically includes the following steps:
Step 802: capturing each frame of image from a video, and carrying out face reconstruction to obtain three-dimensional faces;
Face reconstruction generates a corresponding three-dimensional face from the two-dimensional face in a two-dimensional picture; each captured frame is reconstructed into a three-dimensional face.
Step 804: inputting the three-dimensional faces into the expression migration model, performing three-dimensional facial expression migration, and migrating the expressions to the three-dimensional face of a virtual character;
The expression migration model is obtained by training with the method shown in fig. 7. The three-dimensional face is encoded by the encoder to output an encoding vector, the encoding vector is input into the second decoder and decoded to obtain a decoding vector, and expression migration of the three-dimensional face is thus realized: the expressions of the three-dimensional faces reconstructed from the multiple frames of the video are migrated to the three-dimensional face of the virtual character, for example an expressionless face of the virtual character.
Step 806: and generating animation based on the three-dimensional face after the expression of the virtual character is transferred, and displaying the expression transferred from the original video.
Because facial expressions can be transferred to virtual characters to form the expressions of the virtual characters, the resulting expressions of the virtual characters are richer, and the problem that creating the expressions of game characters with traditional methods is time-consuming and labor-intensive is solved.
Corresponding to the above method embodiment, an embodiment of a training device for an expression migration model is also provided in this specification, and fig. 9 shows a schematic structural diagram of the training device for an expression migration model provided in an embodiment of this specification. The expression migration model includes an encoder, a first decoder, and a second decoder, as shown in fig. 9, the apparatus includes:
a first obtaining module 902 configured to obtain a first three-dimensional face sample and a second three-dimensional face sample;
a training module 904 configured to train the first three-dimensional face sample based on the encoder and the first decoder, and train the second three-dimensional face sample based on the encoder and the second decoder;
a decision module 906 configured to decide whether a training stop condition is reached and, in case the training stop condition is reached, to stop the training process.
The above is a schematic scheme of the training apparatus for an expression migration model according to this embodiment. It should be noted that the technical solution of the training apparatus for expression migration models and the technical solution of the training method for expression migration models belong to the same concept, and details that are not described in detail in the technical solution of the training apparatus for expression migration models can be referred to the description of the technical solution of the training method for expression migration models.
Corresponding to the above method embodiment, the present specification further provides an expression migration apparatus using an expression migration model, and fig. 10 shows a schematic structural diagram of an expression migration apparatus provided in an embodiment of the present specification. The expression migration model includes an encoder, a first decoder and a second decoder and is trained in advance by the training method described in any one of the above, as shown in fig. 10, the apparatus includes:
a second obtaining module 1002, configured to obtain a first three-dimensional face of an expression to be migrated;
a migration module 1004 configured to perform expression migration on the first three-dimensional face based on the encoder and the second decoder, so as to obtain a second three-dimensional face.
The above is a schematic scheme of an expression migration apparatus according to this embodiment. It should be noted that the technical solution of the expression migration apparatus and the technical solution of the expression migration method using the expression migration model belong to the same concept, and details of the technical solution of the expression migration apparatus, which are not described in detail, can be referred to the description of the technical solution of the expression migration method using the expression migration model.
FIG. 11 illustrates a block diagram of a computing device 1100 provided in accordance with an embodiment of the present description. The components of the computing device 1100 include, but are not limited to, memory 1110 and a processor 1120. The processor 1120 is coupled to the memory 1110 via a bus 1130 and the database 1150 is used to store data.
The computing device 1100 also includes an access device 1140, the access device 1140 enabling the computing device 1100 to communicate via one or more networks 1160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 1140 may include one or more of any type of network interface, e.g., a Network Interface Card (NIC), wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 1100, as well as other components not shown in FIG. 11, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 11 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 1100 can be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1100 can also be a mobile or stationary server.
The processor 1120 is configured to execute computer-executable instructions, and the computer-executable instructions, when executed by the processor, implement the above-mentioned method for training expression migration models, or the above-mentioned operation steps of the expression migration method using expression migration models.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the above-mentioned expression migration model training method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the above-mentioned expression migration model training method.
An embodiment of the present specification further provides a computer-readable storage medium, which stores computer instructions, which when executed by a processor, are configured to implement the method for training an expression migration model described above, or the operation steps of the expression migration method using an expression migration model described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the training method of the expression migration model belong to the same concept, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the training method of the expression migration model.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code which may be in the form of source code, object code, an executable file or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, etc. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present disclosure. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for this description.
In the above embodiments, each embodiment is described with its own emphasis; for parts that are not described in detail in a given embodiment, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in explaining the specification. The alternative embodiments are not described exhaustively, and the specification is not limited to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the specification and its practical application, thereby enabling others skilled in the art to understand and make use of it. The specification is limited only by the claims and their full scope and equivalents.

Claims (15)

1. A training method of an expression migration model is characterized in that the expression migration model comprises an encoder, a first decoder and a second decoder, and the training method comprises the following steps:
acquiring a first three-dimensional face sample and a second three-dimensional face sample;
training the first three-dimensional face sample based on the encoder and the first decoder, and training the second three-dimensional face sample based on the encoder and the second decoder;
determining whether a training stop condition is reached, and stopping the training process in the event that the training stop condition is reached.
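For orientation, the following is a minimal sketch of the training scheme recited in claim 1, assuming a PyTorch-style implementation; the stand-in module sizes, the mean-squared reconstruction loss, and the stop condition (a step budget or a loss threshold) are illustrative assumptions, not the claimed implementation.

```python
# Illustrative sketch only: one shared encoder, two identity-specific decoders,
# trained until an (assumed) stop condition is reached.
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Stand-in encoder/decoder operating on flattened vertex coordinates."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, x):
        return self.net(x)

num_vertices = 1024                                  # assumed mesh resolution
encoder   = MLP(num_vertices * 3, 128)               # shared encoder
decoder_a = MLP(128, num_vertices * 3)               # first decoder (first identity)
decoder_b = MLP(128, num_vertices * 3)               # second decoder (second identity)

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder_a.parameters())
                       + list(decoder_b.parameters()), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(sample, decoder):
    """Reconstruct one batch through the shared encoder and the given decoder."""
    opt.zero_grad()
    recon = decoder(encoder(sample))
    loss = loss_fn(recon, sample)                    # reconstruction loss
    loss.backward()
    opt.step()
    return loss.item()

max_steps, tol = 1000, 1e-4                          # assumed training stop condition
for step in range(max_steps):
    sample_a = torch.randn(4, num_vertices * 3)      # placeholder first 3D face samples
    sample_b = torch.randn(4, num_vertices * 3)      # placeholder second 3D face samples
    loss_a = train_step(sample_a, decoder_a)
    loss_b = train_step(sample_b, decoder_b)
    if max(loss_a, loss_b) < tol:                    # training stop condition reached
        break
```

Sharing one optimizer over the encoder and both decoders is only one possible design choice; what matters for the claim is that both sample sets pass through the same encoder while each keeps its own decoder.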
2. The training method of claim 1, wherein training the first three-dimensional face sample based on the encoder and the first decoder comprises:
inputting the initial vertex information and the adjacency matrix of the first three-dimensional face sample into the encoder to obtain a first encoding vector;
inputting the first coding vector into the first decoder to obtain a first decoding vector, and obtaining a loss value according to the first decoding vector and the initial vertex information;
adjusting a coefficient vector of a network layer in the encoder and the first decoder according to the loss value.
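A hedged sketch of the single training step described in claim 2, assuming a graph-convolutional encoder that consumes vertex positions together with a normalised adjacency matrix; the layer widths, latent size, identity-matrix adjacency placeholder and mean-squared loss are illustrative assumptions.

```python
# Hypothetical claim-2 step: vertex positions plus the mesh adjacency matrix go through
# a graph-convolutional encoder; the first decoder reconstructs the vertices and the
# loss updates the coefficients of both the encoder and the first decoder.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.weight = nn.Linear(in_feats, out_feats, bias=False)

    def forward(self, x, adj):
        # Aggregate neighbouring vertices with the (row-normalised) adjacency matrix.
        return torch.relu(self.weight(adj @ x))

class Encoder(nn.Module):
    def __init__(self, num_vertices, latent_dim=64):
        super().__init__()
        self.gc = GraphConv(3, 16)
        self.fc = nn.Linear(num_vertices * 16, latent_dim)

    def forward(self, verts, adj):
        h = self.gc(verts, adj)
        return self.fc(h.reshape(1, -1))             # first encoding vector

class Decoder(nn.Module):
    def __init__(self, num_vertices, latent_dim=64):
        super().__init__()
        self.fc = nn.Linear(latent_dim, num_vertices * 3)

    def forward(self, code):
        return self.fc(code).reshape(-1, 3)          # first decoding vector as vertices

num_vertices = 1024                                  # assumed sample size
verts = torch.randn(num_vertices, 3)                 # initial vertex information
adj = torch.eye(num_vertices)                        # placeholder normalised adjacency matrix

encoder, decoder_a = Encoder(num_vertices), Decoder(num_vertices)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder_a.parameters()), lr=1e-4)

code = encoder(verts, adj)                           # first encoding vector
recon = decoder_a(code)                              # first decoding vector
loss = nn.functional.mse_loss(recon, verts)          # loss against initial vertex information
opt.zero_grad(); loss.backward(); opt.step()         # adjust network-layer coefficients
```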
3. Training method according to claim 1 or 2, wherein training the second three-dimensional face sample based on the encoder and the second decoder comprises:
inputting the initial vertex information and the adjacency matrix of the second three-dimensional face sample into the encoder to obtain a second encoding vector;
inputting the second coding vector into the second decoder to obtain a second decoding vector, and obtaining a loss value according to the second decoding vector and the initial vertex information of the second three-dimensional face sample;
adjusting a coefficient vector of a network layer in the encoder and the second decoder according to the loss value.
4. The training method of claim 1, wherein the encoder comprises a convolutional neural network layer, a downsampling layer and a fully-connected layer, the convolutional neural network layer and the downsampling layer are sequentially arranged at intervals, the first decoder and the second decoder comprise a fully-connected layer, an upsampling layer and a convolutional neural network layer, and the upsampling layer and the convolutional neural network layer are sequentially arranged at intervals.
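One way to realise the alternating layer arrangement of claim 4 is sketched below, treating the vertex list as a one-dimensional signal so that standard Conv1d, pooling and upsampling layers can stand in for mesh convolution and mesh down-/up-sampling; the channel widths and latent size are assumptions, not the claimed network.

```python
# Sketch of the claim-4 layer arrangement with assumed channel widths.
import torch
import torch.nn as nn

def make_encoder(latent_dim=64):
    # Convolution and down-sampling layers arranged alternately, then a fully connected layer.
    return nn.Sequential(
        nn.Conv1d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AvgPool1d(2),                                   # down-sampling layer
        nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AvgPool1d(2),                                   # down-sampling layer
        nn.Flatten(),
        nn.LazyLinear(latent_dim),                         # fully connected layer
    )

class DecoderHead(nn.Module):
    # Fully connected layer, then up-sampling and convolution layers arranged alternately.
    def __init__(self, num_vertices, latent_dim=64):
        super().__init__()
        self.base = num_vertices // 4
        self.fc = nn.Linear(latent_dim, 32 * self.base)
        self.deconv = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv1d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv1d(16, 3, 3, padding=1),
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 32, self.base))

num_vertices = 1024
enc, dec = make_encoder(), DecoderHead(num_vertices)
verts = torch.randn(1, 3, num_vertices)      # batch of vertex coordinates as channels
print(dec(enc(verts)).shape)                 # torch.Size([1, 3, 1024])
```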
5. The training method of claim 3, wherein training a first preset number of first three-dimensional face samples and training a second preset number of second three-dimensional face samples are performed alternately.
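A minimal sketch of the alternation described in claim 5, under the assumption that train_step_a and train_step_b are per-sample training steps such as the one sketched for claim 2, and that the two preset numbers are hyperparameters chosen by the user.

```python
# Illustrative scheduling only: a first preset number of first-identity samples,
# then a second preset number of second-identity samples, repeated alternately.
first_preset, second_preset = 8, 8        # assumed preset numbers

def alternate_epoch(first_samples, second_samples, train_step_a, train_step_b):
    i = j = 0
    while i < len(first_samples) or j < len(second_samples):
        for s in first_samples[i:i + first_preset]:      # first preset number of first samples
            train_step_a(s)
        i += first_preset
        for s in second_samples[j:j + second_preset]:    # second preset number of second samples
            train_step_b(s)
        j += second_preset
```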
6. The training method of claim 1, wherein obtaining a first three-dimensional face sample comprises:
acquiring a plurality of face images, and performing face reconstruction on the face images to obtain the first three-dimensional face sample.
7. The training method of claim 6, wherein obtaining a first three-dimensional face sample further comprises:
after performing face reconstruction on the plurality of face images, spatially aligning the reconstructed three-dimensional face with the second three-dimensional face sample.
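Claim 7 leaves the alignment method open; one common choice is a rigid (similarity) Procrustes fit of the reconstructed vertices onto the second three-dimensional face sample, sketched below as an assumption rather than the claimed procedure.

```python
# Illustrative spatial alignment: similarity Procrustes fit of reconstructed vertices
# onto the second-sample vertices (both arrays of shape (N, 3) with matching order).
import numpy as np

def procrustes_align(src, dst):
    """Return src rotated, scaled and translated to best match dst."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    u, s, vt = np.linalg.svd(src_c.T @ dst_c)
    r = u @ vt                                   # optimal rotation
    if np.linalg.det(r) < 0:                     # avoid reflections
        u[:, -1] *= -1
        r = u @ vt
    scale = s.sum() / (src_c ** 2).sum()         # optimal uniform scale
    return scale * src_c @ r + dst.mean(0)

reconstructed = np.random.rand(100, 3)           # placeholder reconstructed vertices
template = np.random.rand(100, 3)                # placeholder second-sample vertices
aligned = procrustes_align(reconstructed, template)
```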
8. An expression migration method using an expression migration model, wherein the expression migration model includes an encoder, a first decoder, and a second decoder, and is pre-trained by the training method of any one of claims 1 to 7, the expression migration method comprising:
acquiring a first three-dimensional face of an expression to be migrated;
performing expression migration on the first three-dimensional face based on the encoder and the second decoder to obtain a second three-dimensional face.
9. The expression migration method according to claim 8, wherein performing expression migration on the first three-dimensional face based on the encoder and the second decoder comprises:
inputting the initial vertex information and the adjacency matrix of the first three-dimensional face into the encoder to obtain a first encoding vector;
inputting the first encoded vector to the second decoder.
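A hedged sketch of the inference path of claims 8 and 9: the source face is encoded with the shared encoder and decoded with the second (target-identity) decoder; encoder and decoder_b are assumed to be modules like the illustrative ones sketched for claim 2, not the patented implementation.

```python
# Hypothetical expression-transfer inference step.
import torch

@torch.no_grad()
def transfer_expression(encoder, decoder_b, verts, adj):
    """verts: (N, 3) source vertices; adj: (N, N) normalised adjacency matrix."""
    encoder.eval()
    decoder_b.eval()
    code = encoder(verts, adj)       # first encoding vector of the source face
    return decoder_b(code)           # second 3D face carrying the migrated expression
```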
10. The expression migration method according to claim 8 or 9, wherein acquiring the first three-dimensional face of the expression to be migrated comprises:
capturing multiple frames of face images of the same face from a target video, and performing face reconstruction on the multiple frames of face images to obtain a plurality of first three-dimensional faces of expressions to be migrated.
11. The expression migration method according to claim 10, further comprising:
generating an animation based on the second three-dimensional face.
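A hedged sketch of claims 10 and 11: frames are sampled from the target video, each frame is reconstructed into a source mesh and migrated onto the target identity, and the resulting meshes are kept in order as animation keyframes. Here reconstruct_face_mesh and transfer_expression are hypothetical placeholders for steps defined elsewhere in the specification; the frame stride is an assumption.

```python
# Illustrative video-to-animation pipeline using OpenCV frame capture.
import cv2

def video_to_keyframes(video_path, reconstruct_face_mesh, transfer_expression, stride=2):
    cap = cv2.VideoCapture(video_path)
    keyframes, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:                               # sample every stride-th frame
            source_mesh = reconstruct_face_mesh(frame)        # first 3D face to be migrated
            keyframes.append(transfer_expression(source_mesh))  # second 3D face per frame
        index += 1
    cap.release()
    return keyframes            # the ordered target-identity meshes drive the animation
```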
12. An apparatus for training an expression migration model, wherein the expression migration model includes an encoder, a first decoder, and a second decoder, the apparatus comprising:
a first obtaining module configured to obtain a first three-dimensional face sample and a second three-dimensional face sample;
a training module configured to train the first three-dimensional face sample based on the encoder and the first decoder, and train the second three-dimensional face sample based on the encoder and the second decoder;
a judging module configured to judge whether a training stop condition is reached and, in case the training stop condition is reached, stop the training process.
13. An expression migration apparatus using an expression migration model, wherein the expression migration model includes an encoder, a first decoder, and a second decoder, and is pre-trained by the training method according to any one of claims 1 to 7, the expression migration apparatus comprising:
a second acquisition module configured to acquire a first three-dimensional face of an expression to be migrated;
a migration module configured to perform expression migration on the first three-dimensional face based on the encoder and the second decoder to obtain a second three-dimensional face.
14. A computing device, comprising:
a memory and a processor;
the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions to implement the method for training the expression migration model according to any one of claims 1 to 7 or the operation steps of the expression migration method using the expression migration model according to any one of claims 8 to 11.
15. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method for training an expression migration model according to any one of claims 1 to 7 or the operation steps of the expression migration method using an expression migration model according to any one of claims 8 to 11.
CN202110560292.4A 2021-05-21 2021-05-21 Expression migration model training method and device and expression migration method and device Pending CN113205449A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110560292.4A CN113205449A (en) 2021-05-21 2021-05-21 Expression migration model training method and device and expression migration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110560292.4A CN113205449A (en) 2021-05-21 2021-05-21 Expression migration model training method and device and expression migration method and device

Publications (1)

Publication Number Publication Date
CN113205449A true CN113205449A (en) 2021-08-03

Family

ID=77023024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110560292.4A Pending CN113205449A (en) 2021-05-21 2021-05-21 Expression migration model training method and device and expression migration method and device

Country Status (1)

Country Link
CN (1) CN113205449A (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376692A (en) * 2018-11-22 2019-02-22 河海大学常州校区 Migration convolution neural network method towards facial expression recognition
WO2020258668A1 (en) * 2019-06-26 2020-12-30 平安科技(深圳)有限公司 Facial image generation method and apparatus based on adversarial network model, and nonvolatile readable storage medium and computer device
CN111243066A (en) * 2020-01-09 2020-06-05 浙江大学 Facial expression migration method based on self-supervision learning and confrontation generation mechanism
CN111401216A (en) * 2020-03-12 2020-07-10 腾讯科技(深圳)有限公司 Image processing method, model training method, image processing device, model training device, computer equipment and storage medium
CN111488972A (en) * 2020-04-09 2020-08-04 北京百度网讯科技有限公司 Data migration method and device, electronic equipment and storage medium
CN111553267A (en) * 2020-04-27 2020-08-18 腾讯科技(深圳)有限公司 Image processing method, image processing model training method and device
CN111652121A (en) * 2020-06-01 2020-09-11 腾讯科技(深圳)有限公司 Training method of expression migration model, and expression migration method and device
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device
CN111767842A (en) * 2020-06-29 2020-10-13 杭州电子科技大学 Micro-expression type distinguishing method based on transfer learning and self-encoder data enhancement
CN111767744A (en) * 2020-07-06 2020-10-13 北京猿力未来科技有限公司 Training method and device for text style migration system
CN112233012A (en) * 2020-08-10 2021-01-15 上海交通大学 Face generation system and method
CN112348739A (en) * 2020-11-27 2021-02-09 广州博冠信息科技有限公司 Image processing method, device, equipment and storage medium
CN112541958A (en) * 2020-12-21 2021-03-23 清华大学 Parametric modeling method and device for three-dimensional face
CN112767519A (en) * 2020-12-30 2021-05-07 电子科技大学 Controllable expression generation method combined with style migration
CN112633425A (en) * 2021-03-11 2021-04-09 腾讯科技(深圳)有限公司 Image classification method and device
CN113610989A (en) * 2021-08-04 2021-11-05 北京百度网讯科技有限公司 Method and device for training style migration model and method and device for style migration
CN114283051A (en) * 2021-12-09 2022-04-05 湖南大学 Face image processing method and device, computer equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
XUE F: "Transfer: Learning relation-aware facial expression representations with transformers", 《PROCEEDINGS OF THE IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION》, 31 December 2021 (2021-12-31), pages 3601 - 3610 *
中国电子学会编著: "《2020-2021智能科学与技术学科发展报告》", 30 November 2022, 中国科学技术出版社, pages: 90 - 92 *
刘伦豪杰;王晨辉;卢慧;王家豪;: "基于迁移卷积神经网络的人脸表情识别", 电脑知识与技术, no. 07, 5 March 2019 (2019-03-05), pages 1 *
张江宁: "基于深度学习的条件式视觉内容生成研究及应用", 《中国博士学位论文全文数据库信息科技辑》, no. 02, 15 February 2023 (2023-02-15), pages 138 - 217 *
陈军波;刘蓉;刘明;冯杨;: "基于条件生成式对抗网络的面部表情迁移模型", 计算机工程, no. 04, 15 April 2020 (2020-04-15), pages 1 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705368A (en) * 2021-08-09 2021-11-26 上海幻电信息科技有限公司 Facial expression migration method and device and computer equipment
CN113762147A (en) * 2021-09-06 2021-12-07 网易(杭州)网络有限公司 Facial expression migration method and device, electronic equipment and storage medium
CN113781616A (en) * 2021-11-08 2021-12-10 江苏原力数字科技股份有限公司 Facial animation binding acceleration method based on neural network
WO2023185395A1 (en) * 2022-03-30 2023-10-05 北京字跳网络技术有限公司 Facial expression capturing method and apparatus, computer device, and storage medium
CN115601485A (en) * 2022-12-15 2023-01-13 阿里巴巴(中国)有限公司(Cn) Data processing method of task processing model and virtual character animation generation method
CN117540789A (en) * 2024-01-09 2024-02-09 腾讯科技(深圳)有限公司 Model training method, facial expression migration method, device, equipment and medium
CN117540789B (en) * 2024-01-09 2024-04-26 腾讯科技(深圳)有限公司 Model training method, facial expression migration method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN113205449A (en) Expression migration model training method and device and expression migration method and device
US10593021B1 (en) Motion deblurring using neural network architectures
CN111091045B (en) Sign language identification method based on space-time attention mechanism
EP3678059B1 (en) Image processing method, image processing apparatus, and a neural network training method
CN111325851B (en) Image processing method and device, electronic equipment and computer readable storage medium
Chen et al. The face image super-resolution algorithm based on combined representation learning
CN113822969B (en) Training neural radiation field model, face generation method, device and server
Sun et al. Learning image compressed sensing with sub-pixel convolutional generative adversarial network
CN110111256B (en) Image super-resolution reconstruction method based on residual distillation network
CN113177882B (en) Single-frame image super-resolution processing method based on diffusion model
Tuzel et al. Global-local face upsampling network
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
KR102602112B1 (en) Data processing method, device, and medium for generating facial images
CN113762147B (en) Facial expression migration method and device, electronic equipment and storage medium
Chen et al. MICU: Image super-resolution via multi-level information compensation and U-net
US20230154111A1 (en) Method and apparatus for three-dimensional reconstruction of a human head for rendering a human image
WO2022156621A1 (en) Artificial intelligence-based image coloring method and apparatus, electronic device, computer readable storage medium, and computer program product
CN113780249B (en) Expression recognition model processing method, device, equipment, medium and program product
CN116664397B (en) TransSR-Net structured image super-resolution reconstruction method
CN113392791A (en) Skin prediction processing method, device, equipment and storage medium
WO2022166840A1 (en) Face attribute editing model training method, face attribute editing method and device
CN114581918A (en) Text recognition model training method and device
Gao et al. Tetgan: A convolutional neural network for tetrahedral mesh generation
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
CN113128455A (en) Cell image reconstruction model training method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 519000 Room 102, 202, 302 and 402, No. 325, Qiandao Ring Road, Tangjiawan Town, high tech Zone, Zhuhai City, Guangdong Province, Room 102 and 202, No. 327 and Room 302, No. 329

Applicant after: Zhuhai Jinshan Digital Network Technology Co.,Ltd.

Address before: 519000 Room 102, 202, 302 and 402, No. 325, Qiandao Ring Road, Tangjiawan Town, high tech Zone, Zhuhai City, Guangdong Province, Room 102 and 202, No. 327 and Room 302, No. 329

Applicant before: ZHUHAI KINGSOFT ONLINE GAME TECHNOLOGY Co.,Ltd.
