CN112700524B - 3D character facial expression animation real-time generation method based on deep learning

Info

Publication number
CN112700524B
Authority
CN
China
Prior art keywords
animation, pictures, picture, decoder, encoder
Prior art date
Legal status
Active
Application number
CN202110316439.5A
Other languages
Chinese (zh)
Other versions
CN112700524A (en)
Inventor
赵锐 (Zhao Rui)
侯志迎 (Hou Zhiying)
Current Assignee
Jiangsu Yuanli Digital Technology Co ltd
Original Assignee
Jiangsu Yuanli Digital Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Yuanli Digital Technology Co ltd
Priority to CN202110316439.5A
Publication of CN112700524A
Application granted
Publication of CN112700524B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Abstract

The invention provides a 3D character facial expression animation real-time generation method based on deep learning, which comprises the following steps: acquiring training data and performing enhancement processing on it; building a generation model comprising 1 encoder and 3 decoders, wherein the encoder encodes the picture data of the training data into a latent space, and the 3 decoders decode the latent-space data into facial action pictures of actors, screen-shot pictures of animation files, and the controller values corresponding to the screen-shot pictures of the animation files; training the built generation model to obtain the optimal weights of the encoder and the decoders, yielding the optimal model; inputting actor pictures into the trained generation model, where the encoder encodes the pictures into the latent space and the corresponding decoder decodes the latent-space data to obtain the corresponding controller values; and inputting the controller values into animation software to generate the facial movements of the model.

Description

3D character facial expression animation real-time generation method based on deep learning
Technical Field
The invention relates to the technical field of animation production, in particular to a 3D character facial expression animation real-time generation method based on deep learning.
Background
At present, real-time driving of virtual characters' facial animation from the facial expressions in videos mainly relies on face key-point detection methods from computer vision. This approach has the following defects:
1. Poor generalization: to obtain a higher-accuracy driving mode, data must be re-annotated whenever the actor is changed;
2. Without re-annotating data, only lower-precision character models can be driven.
These disadvantages mean the method cannot meet the requirements of a 3D animation film production pipeline with high precision requirements (models with 20,000 to 30,000 points). At present, no mature solution on the market can directly generate high-precision 3D character animation from actors' facial performances.
Disclosure of Invention
The invention aims to provide a 3D character facial expression animation real-time generation method based on deep learning, which reduces early-stage preparation work, has a wide range of application, and can generate animation in real time.
The invention provides the following technical scheme:
A 3D character facial expression animation real-time generation method based on deep learning comprises the following steps:
S1, acquiring training data and performing enhancement processing on it, wherein the training data comprises animation files of the model and the values of the corresponding controllers, facial action pictures of actors, screen-shot pictures of the animation files, and the controller values corresponding to the screen-shot pictures of the animation files;
S2, building a generation model comprising 1 encoder and 3 decoders, wherein the encoder is used for encoding the picture data of the training data into a latent space, and the 3 decoders are used for decoding the latent-space data into facial action pictures of actors, screen-shot pictures of animation files, and the controller values corresponding to the screen-shot pictures of the animation files;
S3, training the built generation model to obtain the optimal weights of the encoder and the decoders, yielding the optimal model;
S4, inputting actor pictures into the trained generation model, where the encoder encodes the pictures into the latent space and the corresponding decoder decodes the latent-space data to obtain the corresponding controller values;
S5, inputting the controller values into animation software to generate the facial movements of the model.
Preferably, the training data enhancement processing in step S1 randomly changes the brightness of the actors' facial action pictures and the screen-shot pictures of the animation files, and performs data enhancement through rotation, displacement, noise addition, and simulated illumination changes.
Preferably, the facial action pictures of the actor and the screen-shot pictures of the animation file share the same encoder.
Preferably, in the training of the generation model in step S3, the training of the encoder, the decoder that generates the actors' facial action pictures, and the decoder that generates the screen-shot pictures of the animation files is a process in which the output reconstructs the input.
Preferably, the training method of the generation model is as follows (the loss functions are written out after this list):
Q1, the actors' facial action pictures and the screen-shot pictures of the animation files are input into the encoder, and the corresponding pictures are output through the decoder that generates the actors' facial action pictures and the decoder that generates the screen-shot pictures of the animation files; in this process the target output is the input picture, so the loss function value between the input and output pictures is calculated with the structural similarity index, and the weights of the encoder and the corresponding decoder are updated according to the loss function value;
Q2, the third decoder outputs the controller values corresponding to the screen-shot pictures; the loss function is obtained by averaging the absolute values of the differences of the individual controller values, and the weights of the corresponding decoder are updated according to the loss function value.
Preferably, the animation software of step S5 comprises Maya or UE.
Preferably, the encoder and the decoders employ convolutional neural networks.
The invention has the following beneficial effects: through this modeling, the facial actions of the corresponding animation model are generated from facial videos and photos of actors without paired data linking the actors' video frames to the animation files of the corresponding character, which greatly reduces early-stage data preparation; the actors can be changed at will without any data annotation work; and estimation can run in real time, i.e., the actors' facial video can be captured in real time and converted into the facial movements of the animation model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a schematic block diagram of the present invention.
Detailed Description
As shown in Fig. 1, a 3D character facial expression animation real-time generation method based on deep learning comprises the following steps:
S1, acquiring training data and performing enhancement processing on it, wherein the training data comprises animation files of the model and the values of the corresponding controllers, facial action pictures of actors, screen-shot pictures of the animation files, and the controller values corresponding to the screen-shot pictures of the animation files;
S2, building a generation model comprising 1 encoder and 3 decoders, wherein the encoder is used for encoding the picture data of the training data into a latent space, and the 3 decoders are used for decoding the latent-space data into facial action pictures of actors, screen-shot pictures of animation files, and the controller values corresponding to the screen-shot pictures of the animation files;
S3, training the built generation model to obtain the optimal weights of the encoder and the decoders, yielding the optimal model;
S4, inputting actor pictures into the trained generation model, where the encoder encodes the pictures into the latent space and the corresponding decoder decodes the latent-space data to obtain the corresponding controller values;
S5, inputting the controller values into animation software to generate the facial movements of the model.
The first embodiment is as follows:
Acquiring training data comprises:
A. acquiring an animation file of the model and the corresponding controller values, wherein the controller is a group of devices capable of controlling the facial actions of the animation model; it can be quantized into a group of values, and each group of values corresponds one-to-one to a facial action of the animation model;
B. acquiring a segment of facial action video of an actor;
C. aiming the screen capture at the face of the model in the animation file and performing the screen-shot operation to obtain each screen-shot picture and the corresponding controller values;
D. randomly changing the brightness of the actors' facial action pictures and the screen-shot pictures of the animation files, and performing data enhancement through rotation, displacement, noise addition, and simulated illumination changes to improve the robustness of the system; a minimal augmentation sketch is given after this list.
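Step D does not prescribe an implementation. A minimal sketch using PyTorch and torchvision follows; the jitter and affine parameters, and the use of a contrast jitter as a crude stand-in for simulated illumination change, are assumptions:

```python
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Noise-addition augmentation: zero-mean Gaussian noise on a [0, 1] tensor image."""
    def __init__(self, std=0.02):
        self.std = std

    def __call__(self, img):
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)

# Random brightness, rotation, displacement, noise addition, and (approximate)
# illumination change, applied to actor pictures and screen-shot pictures alike.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.2),         # brightness / illumination proxy
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05)),  # rotation + displacement
    transforms.ToTensor(),                                        # PIL image -> float tensor in [0, 1]
    AddGaussianNoise(std=0.02),                                   # noise addition
])
```

Because the transforms are resampled on every call, the network sees a slightly different variant of each picture every epoch, which is what improves robustness.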
Building the generation model: the model is trained by reconstructing its input through a neural network, and its hidden-layer vector has a dimensionality-reduction effect. The generation model comprises 1 encoder and 3 decoders. The encoder is used for encoding the picture data of the training data into a latent space that captures the meaning of the input data; the 3 decoders are used for decoding the latent-space data into facial action pictures of the actor (hereinafter "decoder A"), screen-shot pictures of the animation file (hereinafter "decoder B"), and the controller values corresponding to the screen-shot pictures of the animation file (hereinafter "decoder C"). The input data is reconstructed through the latent space, so the model finally obtained through neural-network training yields a latent representation of the input data in its hidden layer, which helps with data classification, visualization, and storage. This is in fact an unsupervised learning mode: only input data is required, with no labels or input-output pairs. Both the encoder and the decoders use convolutional neural networks, and the actor's facial action pictures share the same encoder with the screen-shot pictures of the animation file. A sketch of this architecture is given below.
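The patent fixes only the 1-encoder / 3-decoder topology, the shared encoder, and the use of convolutional networks; everything else in the following PyTorch sketch (layer widths, the 128x128 input resolution, and the number of controller values) is an assumption:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Downsampling block for the shared encoder."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, 4, stride=2, padding=1), nn.LeakyReLU(0.2))

def deconv_block(c_in, c_out):
    """Upsampling block for the two picture decoders."""
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1), nn.ReLU())

class Generator(nn.Module):
    """One shared encoder; decoder A (actor picture), decoder B (screen-shot picture),
    decoder C (controller values)."""
    def __init__(self, n_controllers=64):  # the rig size is an assumed value
        super().__init__()
        self.encoder = nn.Sequential(       # 3x128x128 -> 256x8x8 latent tensor
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128), conv_block(128, 256))
        self.decoder_a = nn.Sequential(     # latent -> actor facial action picture
            deconv_block(256, 128), deconv_block(128, 64), deconv_block(64, 32),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())
        self.decoder_b = nn.Sequential(     # latent -> animation screen-shot picture
            deconv_block(256, 128), deconv_block(128, 64), deconv_block(64, 32),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())
        self.decoder_c = nn.Sequential(     # latent -> controller value vector
            nn.Flatten(), nn.Linear(256 * 8 * 8, 512), nn.ReLU(), nn.Linear(512, n_controllers))

    def forward(self, x):
        z = self.encoder(x)                 # both picture domains pass through the same encoder
        return self.decoder_a(z), self.decoder_b(z), self.decoder_c(z)
```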
Here, training decoders A and B requires no labels, while training decoder C requires labels, i.e., paired screen-shot pictures of the animation file and the corresponding controller values.
The built generation model is then trained to obtain the optimal weights of the encoder and the decoders, yielding the optimal model, wherein the training of the encoder, the decoder that generates the actors' facial action pictures, and the decoder that generates the screen-shot pictures of the animation files is a process in which the output reconstructs the input.
Specifically, the training method of the generation model is as follows (a sketch of one training step is given after this list):
Q1, the actors' facial action pictures and the screen-shot pictures of the animation files are input into the encoder, and the corresponding pictures are output through the decoder that generates the actors' facial action pictures and the decoder that generates the screen-shot pictures of the animation files; in this process the target output is the input picture, so the loss function value between the input and output pictures is calculated with the structural similarity index, and the weights of the encoder and the corresponding decoder are updated according to the loss function value;
Q2, the third decoder outputs the controller values corresponding to the screen-shot pictures; the loss function is obtained by averaging the absolute values of the differences of the individual controller values, and the weights of the corresponding decoder are updated according to the loss function value.
The actors' pictures are input into the trained generation model; the encoder encodes the pictures into the latent space, and the corresponding decoder decodes the latent-space data to obtain the corresponding controller values; the controller values are then input into animation software (Maya, UE) to generate the facial movements of the model. A sketch of this inference path is given below.
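At run time only the encoder and decoder C are exercised. A hedged sketch follows; the Maya part is shown as comments because it runs inside Maya's Python interpreter, and the rig attribute names (face_rig.ctrl_0 and so on) are purely hypothetical:

```python
import torch

@torch.no_grad()
def frame_to_controllers(model, frame):
    """Encode one preprocessed actor frame and decode it with decoder C only."""
    z = model.encoder(frame.unsqueeze(0))   # 1x3xHxW -> latent tensor
    return model.decoder_c(z).squeeze(0)    # -> vector of controller values

# Inside Maya (attribute names are hypothetical; a production rig exposes its own):
# import maya.cmds as cmds
# values = frame_to_controllers(model, frame)
# for i, v in enumerate(values.tolist()):
#     cmds.setAttr("face_rig.ctrl_%d" % i, v)   # drive the facial rig in real time
```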
According to the invention, through this modeling, the facial actions of the corresponding animation model are generated from facial videos and photos of actors without paired data linking the actors' video frames to the animation files of the corresponding character, which greatly reduces early-stage data preparation; the actors can be changed at will without any data annotation work; and estimation can run in real time, i.e., the actors' facial video can be captured in real time and converted into the facial movements of the animation model.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A 3D character facial expression animation real-time generation method based on deep learning, characterized by comprising the following steps:
S1, acquiring training data and performing enhancement processing on it, wherein the training data comprises animation files of the model and the values of the corresponding controllers, facial action pictures of actors, screen-shot pictures of the animation files, and the controller values corresponding to the screen-shot pictures of the animation files;
S2, building a generation model comprising 1 encoder and 3 decoders, wherein the encoder is used for encoding the picture data of the training data into a latent space, and the 3 decoders are used for decoding the latent-space data into facial action pictures of actors, screen-shot pictures of animation files, and the controller values corresponding to the screen-shot pictures of the animation files;
S3, training the built generation model to obtain the optimal weights of the encoder and the decoders, yielding the optimal model;
the training method of the generation model being as follows:
Q1, the actors' facial action pictures and the screen-shot pictures of the animation files are input into the encoder, and the corresponding pictures are output through the decoder that generates the actors' facial action pictures and the decoder that generates the screen-shot pictures of the animation files; in this process the target output is the input picture, so the loss function value between the input and output pictures is calculated with the structural similarity index, and the weights of the encoder and the corresponding decoder are updated according to the loss function value;
Q2, the encoder, the decoder that generates the actors' facial action pictures, and the decoder that generates the screen-shot pictures of the animation files are trained so that the output reconstructs the input; the third decoder outputs the controller values corresponding to the screen-shot pictures, the loss function is obtained by averaging the absolute values of the differences of the individual controller values, and the weights of the corresponding decoder are updated according to the loss function value;
S4, inputting actor pictures into the trained generation model, wherein the encoder encodes the pictures into the latent space and the corresponding decoder decodes the latent-space data to obtain the corresponding controller values;
S5, inputting the controller values into animation software to generate the facial movements of the model.
2. The 3D character facial expression animation real-time generation method based on deep learning as claimed in claim 1, wherein the training data enhancement processing in step S1 randomly changes the brightness of the actors' facial action pictures and the screen-shot pictures of the animation files, and performs data enhancement through rotation, displacement, noise addition, and simulated illumination changes.
3. The method of claim 1, wherein the facial action pictures of the actor and the screen-shot pictures of the animation file share the same encoder.
4. The 3D character facial expression animation real-time generation method based on deep learning as claimed in claim 1, wherein the animation software of step S5 comprises Maya or UE.
5. The method of claim 1, wherein the encoder and the decoders employ convolutional neural networks.
CN202110316439.5A 2021-03-25 2021-03-25 3D character facial expression animation real-time generation method based on deep learning Active CN112700524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110316439.5A CN112700524B (en) 2021-03-25 2021-03-25 3D character facial expression animation real-time generation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110316439.5A CN112700524B (en) 2021-03-25 2021-03-25 3D character facial expression animation real-time generation method based on deep learning

Publications (2)

Publication Number Publication Date
CN112700524A (en) 2021-04-23
CN112700524B (en) 2021-07-02

Family

ID=75516776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110316439.5A Active CN112700524B (en) 2021-03-25 2021-03-25 3D character facial expression animation real-time generation method based on deep learning

Country Status (1)

Country Link
CN (1) CN112700524B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781616B (en) * 2021-11-08 2022-02-08 江苏原力数字科技股份有限公司 Facial animation binding acceleration method based on neural network
CN114898020A (en) * 2022-05-26 2022-08-12 唯物(杭州)科技有限公司 3D character real-time face driving method and device, electronic equipment and storage medium


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524226A (en) * 2020-04-21 2020-08-11 中国科学技术大学 Method for detecting key point and three-dimensional reconstruction of ironic portrait painting
CN111598979A (en) * 2020-04-30 2020-08-28 腾讯科技(深圳)有限公司 Method, device and equipment for generating facial animation of virtual character and storage medium
CN112200894A (en) * 2020-12-07 2021-01-08 江苏原力数字科技股份有限公司 Automatic digital human facial expression animation migration method based on deep learning framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Facial Animation Method Based on Deep Learning and Expression AU Parameters"; Yan Yanfu et al.; Journal of Computer-Aided Design & Computer Graphics; Nov. 30, 2019; vol. 31, no. 11; pp. 1973-1980 *
"Speech-Driven Facial Animation Synthesis Based on a Deep Learning Network Model"; Yue Yang; China Masters' Theses Full-text Database, Information Science and Technology; Mar. 15, 2021; no. 3; main text pp. 12-49 *

Also Published As

Publication number Publication date
CN112700524A (en) 2021-04-23

Similar Documents

Publication Publication Date Title
Mihajlovic et al. LEAP: Learning articulated occupancy of people
US20210201552A1 (en) Systems and methods for real-time complex character animations and interactivity
Lavagetto et al. The facial animation engine: Toward a high-level interface for the design of MPEG-4 compliant animated faces
CN112700524B (en) 3D character facial expression animation real-time generation method based on deep learning
AU2022263508A1 (en) Synthesizing sequences of 3D geometries for movement-based performance
Yang et al. Unifiedgesture: A unified gesture synthesis model for multiple skeletons
CN116385606A (en) Speech signal driven personalized three-dimensional face animation generation method and application thereof
JP2001231037A (en) Image processing system, image processing unit, and storage medium
US20230154090A1 (en) Synthesizing sequences of images for movement-based performance
Jiang et al. Animating arbitrary topology 3D facial model using the MPEG-4 FaceDefTables
US20240078726A1 (en) Multi-camera face swapping
Ming et al. High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
Yan et al. Pose-Driven Compression for Dynamic 3D Human via Human Prior Models
CN116432732A (en) Training method and device for hairline segment processing model and hairline segment processing method
WANG et al. Intelligent Facial Expression Generation Method Based on Artificial Intelligence
CN117292031A (en) Training method and device for 3D virtual digital lip animation generation model
CN115578493A (en) Maya expression coding method and system
Hovden et al. MPEG-4 FAP generation as an optimization problem
Liang et al. Individual face expressions and expression cloning

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant