CN114241553A - Method for migrating facial expressions to faces of virtual characters - Google Patents
Method for migrating facial expressions to faces of virtual characters
- Publication number
- CN114241553A (application no. CN202111477680.2A)
- Authority
- CN
- China
- Prior art keywords
- grid
- face
- virtual character
- expression
- mesh
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a method for transferring facial expressions to the faces of virtual characters. The user provides a small number of mesh pairs in which a human facial expression is matched with a virtual character expression; these pairs, together with a large number of automatically generated meshes with random expressions, serve as training data for a neural network, which learns a mapping that matches the expression of any human face mesh to that of the virtual character mesh. The method significantly reduces the workload of the expression migration workflow: the user needs to create no more than a few dozen expression-matched mesh pairs, and once the neural network is trained, the expression of any face mesh can be migrated to the virtual character mesh in real time, with a result that is perceptually consistent with the facial expression.
Description
Technical Field
The invention relates to the field of three-dimensional geometric modeling, and in particular to a method for migrating facial expressions to the faces of virtual characters.
Background
Expression migration technology is widely applied in live streaming, film, social software, VR applications, and other fields. Traditionally, expression migration has been realized with blendshape deformation animation, which requires an artist to manually create a large number of virtual character expression meshes corresponding to basic facial expressions; this places high demands on manual workload, professional skill, and experience. Geometry-based methods compute the deformation of an expressive face mesh relative to a neutral mesh and apply the same deformation to the neutral mesh of the virtual character. However, such methods are only suitable for meshes with similar shapes, while the facial features of a virtual character often differ from the human face: a cartoon-style character, for example, has much larger eyes, and in the migration result the character cannot fully close its eyes. More recent expression migration methods are based on machine learning. Unsupervised methods require no manual labor but use geometric features as a prior, so they suffer from problems similar to the geometry-based methods; supervised methods adopt a large number of manual labels as a prior, which lowers the demand for expertise but still requires considerable labor, and it is difficult to obtain high-quality expression results that meet the requirements of industrial production.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a method for migrating facial expressions to the faces of virtual characters. The method uses deep learning to compute the position and Laplacian of each expression mesh in a low-dimensional space, aligns the facial expression manifolds of the human face and the virtual character in that space according to a small number of expression-matched mesh pairs provided by the user, and finds the mapping from human facial expressions to matching virtual character expressions. The method significantly reduces the manual workload required for expression migration and has high practical value.
The object of the invention is achieved by the following technical solution:
a method of migrating a facial expression to the face of a virtual character, comprising the steps of:
(1) preparing a training data set: the training data set comprises face grid data, virtual role grid data and expression matching grid pairs provided by a user; the face grid data is obtained by randomly sampling expression coefficients of a public face data set; the virtual character grid data is obtained by randomly applying reasonable deformation processing to the virtual character; the expression matching grid pair comprises a face grid with an expression and a virtual character grid with the same expression as the face;
(2) respectively training a variational self-encoder network by using the face grid data and the virtual character grid data, encoding each grid by using the trained variational self-encoder network to obtain a low-dimensional implicit vector corresponding to each grid, and finding out K nearest neighbor grids of each grid by taking the Euclidean distance of the low-dimensional implicit vector as a measurement standard;
(3) adopting the training data set to jointly train a new variational self-encoder network corresponding to the face and a new variational self-encoder network corresponding to the virtual character, embedding the training data set into a low-dimensional space in the training process, and enabling the manifold of the face grid data in the low-dimensional space to be aligned with the manifold of the virtual character according to the position of the expression matching grid pair in the low-dimensional space and a Laplacian operator; the Laplace operator is the average value of the characteristics of the grid by subtracting the characteristics of K adjacent grids from the characteristics of the grid;
(4) adopting the training data set to train and circulate to generate an confrontation network, and learning to obtain the mapping of the face grid corresponding to the facial expression of the virtual character grid;
(5) inputting the face mesh of the expression to be migrated into a coder of a variational self-coder network corresponding to the trained face to obtain a corresponding low-dimensional implicit vector, then inputting the low-dimensional implicit vector into a trained cyclic generation confrontation network to obtain a low-dimensional implicit vector of the virtual character, and then inputting the low-dimensional implicit vector of the virtual character into a decoder of the variational self-coder network corresponding to the trained virtual character to obtain the virtual character of the migration expression.
Further, in step (3), while aligning the face mesh data with the manifold of the virtual character, the position of each face mesh in an expression-matched mesh pair is constrained to coincide with the position of the paired virtual character mesh, and its Laplacian is constrained to coincide with the Laplacian of the paired virtual character mesh.
Further, in step (4), when training the cycle GAN, a face mesh and a virtual character mesh are randomly selected from the training data set; the selected meshes together with their respective K nearest neighbor meshes constitute the mini-batch of training data.
Further, in step (4), when training the cycle GAN, the parameters of the two variational autoencoder networks trained in step (3) are fixed and the outputs of their encoders are used as the inputs of the cycle GAN; meanwhile, each input face mesh is constrained to correspond one-to-one, in position and Laplacian, with the migrated virtual character mesh obtained from it, and each input virtual character mesh is constrained to correspond one-to-one, in position and Laplacian, with the migrated face mesh obtained from it.
The invention has the following beneficial effects:
By combining deep learning, the trained neural network reaches a real-time rate of 30 frames per second in practical applications, so the invention can be widely used in scenarios with strict real-time requirements. In the final result, even if the facial features of the human face differ from those of the virtual character, the expression of the migrated virtual character mesh remains perceptually consistent with the input face mesh expression, thanks to the similarity of the overall expression manifolds combined with the local features of the user-provided expression-matched mesh pairs; this solves problems of traditional geometry-based methods, such as a large-eyed virtual character being unable to close its eyes.
Drawings
FIG. 1 is an architecture diagram of the algorithm of the present invention; the left part illustrates the Laplacian-aware expression embedding training process, and the right part illustrates the Laplacian-aware expression migration training process.
FIG. 2 shows expression migration results of the present invention on real data; from left to right: the real face image, the facial expression mesh generated by an existing expression tracker, and the virtual character mesh obtained after network migration.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and preferred embodiments, so that its objects and effects become more apparent. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
The core of the invention is to use a Laplacian-aware variational autoencoder network and a cycle-consistent generative adversarial network to find, from the expression-matched mesh pairs initially provided by the user, an expression-correspondence mapping from the human face to the virtual character's face that accords with human perception.
The invention comprises the following steps:
1. Prepare a training data set: the data required for training the neural networks comprise a large amount of face mesh data with random expressions, a large amount of virtual character mesh data with random expressions, and the expression-matched mesh pairs provided by the user.
1.1. Preparing a large amount of face mesh data with random expressions
Based on a public face data set, a large amount of face mesh data with random expressions can be generated automatically by randomly sampling the data set's expression coefficients. The random sampling can use a random number generator directly, or use a public facial image data set together with an existing expression tracker to capture the expression coefficients in the images and sample those.
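As a non-authoritative illustration, the random-combination idea can be sketched in a few lines of Python; the blendshape-style model here (`neutral`, `blendshape_deltas`) and the coefficient range are assumptions for illustration, not details taken from the patent:

```python
import numpy as np

def sample_face_meshes(neutral, blendshape_deltas, n_samples, rng=None):
    """Generate face meshes with random expressions from a blendshape model.

    neutral:           (V, 3) array, vertices of the expressionless face mesh
    blendshape_deltas: (B, V, 3) array, per-blendshape vertex offsets
    """
    rng = rng or np.random.default_rng()
    meshes = []
    for _ in range(n_samples):
        # draw each expression coefficient in a plausible range [0, 1]
        coeffs = rng.uniform(0.0, 1.0, size=len(blendshape_deltas))
        # linear combination of the deltas added to the neutral mesh
        meshes.append(neutral + np.tensordot(coeffs, blendshape_deltas, axes=1))
    return np.stack(meshes)
```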
1.2. Preparing a large amount of virtual character mesh data with random expressions
Using 3D modeling software such as Autodesk Maya, several groups of 3D deformations can be created for the virtual character mesh to simulate its expressions. By randomly sampling the weight coefficients of these deformations within reasonable ranges, a large number of virtual character meshes with random expressions can be generated automatically.
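The character side can be sketched the same way; the per-deformation weight ranges below stand in for the "reasonable range" mentioned above and are assumptions:

```python
import numpy as np

def sample_avatar_meshes(neutral, deform_deltas, weight_ranges, n, rng=None):
    """neutral: (V, 3); deform_deltas: (B, V, 3); weight_ranges: (B, 2)
    array giving a [low, high] sampling range per deformation."""
    rng = rng or np.random.default_rng()
    lo, hi = weight_ranges[:, 0], weight_ranges[:, 1]
    out = []
    for _ in range(n):
        w = rng.uniform(lo, hi)                          # per-shape weights
        out.append(neutral + np.tensordot(w, deform_deltas, axes=1))
    return np.stack(out)
```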
1.3. Preparing the user-provided expression-matched mesh pairs
Using 3D modeling software such as Autodesk Maya, the user manually creates a small number of mesh pairs (nine in the embodiment below), each consisting of a face mesh with an expression and a virtual character mesh with the same expression.
2. Find the K nearest neighbor meshes of each mesh in the data set, as follows:
2.1. Embedding face meshes and virtual character meshes in a low-dimensional space using variational autoencoder networks
To make mesh deformations easier for the neural network to learn, each mesh in the training data set of step 1 is first converted into a DR (deformation representation) feature (L. Gao, Y.-K. Lai, J. Yang, L.-X. Zhang, L. Kobbelt, and S. Xia. Sparse data driven mesh deformation. arXiv preprint arXiv:1709.01250, 2017). Two variational autoencoder networks are used to learn and encode the face meshes and the virtual character meshes in the training data set respectively, and the low-dimensional latent vector obtained after encoding each mesh is recorded:
l_A = Enc(M_A)

where M_A denotes the DR feature of mesh A, Enc(·) denotes the encoder part of the variational autoencoder network, and l_A denotes the low-dimensional latent vector obtained after the mesh is encoded.
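For illustration, a minimal variational autoencoder over flattened DR features might look as follows (a hedged sketch; the layer sizes, latent dimension, and class name are assumptions, not specified by the patent):

```python
import torch
import torch.nn as nn

class MeshVAE(nn.Module):
    """VAE over flattened DR features; all dimensions are illustrative."""
    def __init__(self, feat_dim, latent_dim=32, hidden=512):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)       # posterior mean
        self.logvar = nn.Linear(hidden, latent_dim)   # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, feat_dim))

    def encode(self, m):
        h = self.enc(m)
        return self.mu(h), self.logvar(h)

    def forward(self, m):
        mu, logvar = self.encode(m)
        # reparameterization trick: z = mu + sigma * epsilon
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar
```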
2.2. Finding the K nearest neighbors of a mesh
With the low-dimensional latent vectors from step 2.1, the K nearest neighbor meshes of each mesh are found with the K-Nearest Neighbors (KNN) algorithm, using the Euclidean distance between latent vectors as the metric; the set formed by the K nearest neighbor meshes of mesh A is denoted N(A).
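A sketch of this neighbor search, assuming the latent vectors are stacked into a matrix; scikit-learn's NearestNeighbors is one possible implementation, not mandated by the patent:

```python
from sklearn.neighbors import NearestNeighbors

def k_nearest_meshes(latents, k=8):
    """latents: (N, d) array of latent vectors l_A; returns an (N, k) array
    of indices of each mesh's K nearest neighbors (Euclidean distance)."""
    nn_index = NearestNeighbors(n_neighbors=k + 1).fit(latents)
    _, idx = nn_index.kneighbors(latents)
    return idx[:, 1:]   # drop column 0, which is each mesh itself
```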
3. Laplacian-aware expression embedding: using the training data set, jointly train a new variational autoencoder network for the face meshes and a new one for the virtual character, so that the manifold of the face mesh data in the low-dimensional space is aligned with the manifold of the virtual character.
3.1. Embedding expression meshes into a low-dimensional manifold
As when training the default variational autoencoders in step 2.1, the two new variational autoencoder networks are trained with mini-batch gradient descent to learn the training data set:
l_A = Enc(M_A),  M′_A = Dec(l_A)

where M_A, Enc(·), and l_A are as in step 2.1, Dec(·) denotes the decoder part of the variational autoencoder network, and M′_A denotes the DR feature reconstructed by the decoder from l_A. The loss terms applied to the network training in this step are:

L_rec = ||M′_A − M_A||_1
L_kld = KL(Q(l_A | A) ‖ N(0, I))

where L_rec denotes the reconstruction loss, L_kld the KL-divergence loss, KL(·‖·) the KL-divergence function, N(0, I) the standard normal distribution, and Q(l_A | A) the posterior distribution of l_A given A.
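These two loss terms can be sketched as follows, reusing the (mu, logvar) outputs of the MeshVAE sketch above; the mean reduction over batch and dimensions is an assumption:

```python
import torch
import torch.nn.functional as F

def vae_losses(m, m_rec, mu, logvar):
    # L_rec: L1 reconstruction loss between input and decoded DR features
    l_rec = F.l1_loss(m_rec, m)
    # L_kld: KL(Q(l_A|A) || N(0, I)), closed form for a diagonal Gaussian
    l_kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return l_rec, l_kld
```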
3.2. Aligning the manifold of the face mesh data in the low-dimensional space with that of the virtual character according to the expression-matched mesh pairs
At each training step, a pair (A_i, B_i) is randomly selected from the expression-matched mesh pairs, where A_i is the face mesh and B_i the virtual character mesh, and the K nearest neighbor meshes of each are found in order to compute the Laplacians. These meshes and their K neighbors form the mini-batch of training data. While the variational autoencoder networks learn these meshes, the networks are constrained so that the position and Laplacian of A_i in the low-dimensional manifold coincide with those of B_i; that is, besides L_rec and L_kld, the loss terms applied to the training also include:

L_anchor = ||l_{A_i} − l_{B_i}||_2
L_anchor-Laplacian = ||ℒ(l_{A_i}) − ℒ(l_{B_i})||_2

where L_anchor denotes the positional constraint on the expression-matched mesh pair in the low-dimensional manifold, L_anchor-Laplacian denotes the Laplacian constraint on the pair, and ℒ(·) denotes the Laplacian, computed as the mesh's latent vector minus the mean of its K nearest neighbors' latent vectors:

ℒ(l_A) = l_A − (1/K) Σ_{B ∈ N(A)} l_B
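A sketch of the Laplacian and the two alignment losses, assuming each mesh's latent vector and those of its K neighbors are available as tensors (all names are illustrative):

```python
import torch

def latent_laplacian(z, z_neighbors):
    """Laplacian of a mesh in the latent manifold: its latent vector minus
    the mean of its K nearest neighbors' latent vectors. z: (d,),
    z_neighbors: (K, d)."""
    return z - z_neighbors.mean(dim=0)

def alignment_losses(z_face, z_face_nbrs, z_avatar, z_avatar_nbrs):
    # L_anchor: the matched pair (A_i, B_i) should share a position
    l_anchor = torch.norm(z_face - z_avatar)
    # L_anchor-Laplacian: ... and share local (Laplacian) structure
    l_lap = torch.norm(latent_laplacian(z_face, z_face_nbrs)
                       - latent_laplacian(z_avatar, z_avatar_nbrs))
    return l_anchor, l_lap
```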
3.3. Alternately training the Laplacian-aware variational autoencoder networks
The two new variational autoencoder networks are trained by alternating the procedures of steps 3.1 and 3.2.
4. Laplacian-aware expression migration: based on the two low-dimensional manifolds obtained by the encoding in step 3, train a cycle-consistent generative adversarial network with the training data set, and learn the mapping that matches face mesh expressions to virtual character mesh expressions.
4.1. Training the Laplacian-aware cycle-consistent generative adversarial network
Because step 3 roughly aligns the manifolds of the human face and the virtual character's face meshes, meshes with matching expressions have similar positions and Laplacians on the two manifolds, which reduces the difficulty of finding the expression-matching mapping. The parameters of the two variational autoencoder networks trained in step 3 are fixed, the outputs of their encoders are used as the inputs of the cycle GAN, and the network finds the mapping that matches face meshes to virtual character meshes by expression, completing the expression migration.
When training the network — taking migration from the face to the virtual character as an example — a pair of meshes (A, B) and their K neighbors are randomly selected each time as the mini-batch of training data. The loss terms constraining the cycle GAN during training include:

L_cycle = ||l_A − T(S(l_A))||_2 + ||l_B − S(T(l_B))||_2
L_corr = ||l_A − S(l_A)||_2 + ||l_B − T(l_B)||_2
L_Laplacian = ||ℒ(l_A) − ℒ(S(l_A))||_2 + ||ℒ(l_B) − ℒ(T(l_B))||_2

where S is the generator network that migrates from the face mesh to the virtual character mesh, T is the generator network that migrates from the virtual character mesh to the face mesh, Dis_human and Dis_avatar are discriminator networks judging the authenticity of face meshes and virtual character meshes respectively and contribute the corresponding adversarial loss terms, L_cycle is the cycle-consistency constraint, L_corr is the positional constraint on a mesh in the manifold before and after migration, and L_Laplacian is the Laplacian constraint on a mesh in the manifold before and after migration. Similar definitions hold for the migration from the virtual character to the face.
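These constraints can be sketched as follows; the generators and Laplacian operators are passed in as callables, the adversarial terms are omitted (they follow the usual GAN formulation with Dis_human and Dis_avatar), and all names are illustrative assumptions:

```python
import torch

def cyclegan_losses(z_a, z_b, S, T, lap_face, lap_avatar):
    """z_a, z_b: latent vectors of a face mesh and a virtual character mesh.
    S: generator mapping face latents to avatar latents; T: the reverse.
    lap_face / lap_avatar: callables computing the Laplacian of a latent
    vector on the respective manifold (from its K migrated neighbors)."""
    # L_cycle: a round trip through both generators should be the identity
    l_cycle = torch.norm(z_a - T(S(z_a))) + torch.norm(z_b - S(T(z_b)))
    # L_corr: a mesh keeps its position in the manifold across migration
    l_corr = torch.norm(z_a - S(z_a)) + torch.norm(z_b - T(z_b))
    # L_Laplacian: ... and keeps its local Laplacian structure as well
    l_lap = (torch.norm(lap_face(z_a) - lap_avatar(S(z_a)))
             + torch.norm(lap_avatar(z_b) - lap_face(T(z_b))))
    return l_cycle, l_corr, l_lap
```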
4.2. Expression migration using the trained networks
With the Laplacian-aware variational autoencoder networks trained in step 3 and the cycle GAN trained in step 4.1, expression migration can be carried out: for any face mesh A, the networks output the DR feature of the migrated virtual character mesh,

M′_B = Dec(S(Enc(M_A)))

where Enc is the encoder of the face variational autoencoder and Dec is the decoder of the virtual character variational autoencoder. The virtual character mesh is then rebuilt from M′_B, yielding a virtual character mesh B_output with the same expression as A, which is the result of the expression migration.
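A sketch of this inference path, reusing the illustrative MeshVAE names from step 2.1; using the posterior mean as l_A at test time is an assumption:

```python
import torch

@torch.no_grad()
def migrate_expression(m_face, vae_face, vae_avatar, S):
    """m_face: DR feature of face mesh A; returns the DR feature M'_B of
    the migrated virtual character mesh, from which B_output is rebuilt."""
    mu, _ = vae_face.encode(m_face)   # l_A, taken as the posterior mean
    z_avatar = S(mu)                  # latent vector of the virtual character
    return vae_avatar.dec(z_avatar)   # M'_B, the decoded DR feature
```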
Example
The inventor implemented an embodiment of the invention on a desktop computer equipped with an AMD R7-3800X CPU, an Nvidia GeForce GTX 1660 SUPER graphics card, and 16 GB of memory. The public data sets FaceWarehouse and FaceForensics++ were used to obtain face meshes with random expressions, publicly released virtual characters were used to generate virtual character meshes with random expressions, and 9 expression-matched mesh pairs were created manually with Autodesk Maya to test the experimental results of the invention. The results show that, on this hardware configuration, the invention can migrate facial expressions to the face of a virtual character in real time at 30 frames per second. In the final migration results, the expression of the virtual character is perceptually consistent with the input facial expression, and difficulties encountered by traditional methods (such as a large-eyed virtual character being unable to close its eyes) are also resolved. By contrast, in a traditional expression migration workflow a user must manually create at least 40 pairs of face and virtual character meshes with identical expressions, and expression migration for each virtual character usually takes more than 3 hours.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and although the invention has been described in detail with reference to the foregoing examples, it will be apparent to those skilled in the art that various changes in the form and details of the embodiments may be made and equivalents may be substituted for elements thereof. All modifications, equivalents and the like which come within the spirit and principle of the invention are intended to be included within the scope of the invention.
Claims (4)
1. A method for migrating a facial expression to the face of a virtual character, comprising the following steps:
(1) preparing a training data set: the training data set comprises face mesh data, virtual character mesh data, and expression-matched mesh pairs provided by a user; the face mesh data are obtained by randomly sampling the expression coefficients of a public face data set; the virtual character mesh data are obtained by randomly applying reasonable deformations to the virtual character; each expression-matched mesh pair comprises a face mesh with an expression and a virtual character mesh with the same expression;
(2) training one variational autoencoder network on the face mesh data and another on the virtual character mesh data, encoding each mesh with the trained networks to obtain its low-dimensional latent vector, and finding the K nearest neighbor meshes of each mesh using the Euclidean distance between latent vectors as the metric;
(3) using the training data set, jointly training a new variational autoencoder network for the face and a new one for the virtual character, embedding the training data into a low-dimensional space during training, and aligning the manifold of the face mesh data in the low-dimensional space with that of the virtual character according to the positions and Laplacians of the expression-matched mesh pairs in that space; the Laplacian of a mesh is obtained by subtracting the mean of the features of its K nearest neighbor meshes from the feature of the mesh itself;
(4) using the training data set, training a cycle-consistent generative adversarial network, which learns the mapping that matches the facial expression of a face mesh to that of a virtual character mesh;
(5) inputting the face mesh whose expression is to be migrated into the encoder of the trained face variational autoencoder network to obtain its low-dimensional latent vector, inputting this latent vector into the trained cycle-consistent generative adversarial network to obtain the virtual character's latent vector, and then inputting the character's latent vector into the decoder of the trained virtual character variational autoencoder network to obtain the virtual character mesh with the migrated expression.
2. The method of claim 1, wherein in step (3), during the alignment of the face mesh data with the manifold of the virtual character, the position of each face mesh in an expression-matched mesh pair is constrained to coincide with the position of the paired virtual character mesh, and its Laplacian is constrained to coincide with the Laplacian of the paired virtual character mesh.
3. The method for migrating facial expressions to faces of virtual characters according to claim 1, wherein in step (4), when training the cycle-consistent generative adversarial network, a face mesh and a virtual character mesh are randomly selected from the training data set, and the selected meshes together with their respective K nearest neighbor meshes constitute the mini-batch of training data.
4. The method according to claim 1, wherein in step (4), when training the cycle-consistent generative adversarial network, the parameters of the two variational autoencoder networks trained in step (3) are fixed and the outputs of their encoders are used as the inputs of the cycle-consistent generative adversarial network; meanwhile, each input face mesh is constrained to correspond one-to-one, in position and Laplacian, with the migrated virtual character mesh obtained from it, and each input virtual character mesh is constrained to correspond one-to-one, in position and Laplacian, with the migrated face mesh obtained from it.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202111477680.2A (CN114241553A, en) | 2021-12-06 | 2021-12-06 | Method for migrating facial expressions to faces of virtual characters |
Publications (1)
| Publication Number | Publication Date |
| --- | --- |
| CN114241553A (en) | 2022-03-25 |
Family
ID=80753287
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202111477680.2A (pending; CN114241553A, en) | Method for migrating facial expressions to faces of virtual characters | 2021-12-06 | 2021-12-06 |
Country Status (1)
| Country | Link |
| --- | --- |
| CN | CN114241553A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN117876207A | 2023-12-06 | 2024-04-12 | 北京知传链科技有限公司 | Method for converting human expression into other biological facial expression based on model training |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |