CN113255457A - Animation character facial expression generation method and system based on facial expression recognition - Google Patents

Animation character facial expression generation method and system based on facial expression recognition Download PDF

Info

Publication number
CN113255457A
CN113255457A CN202110470655.5A CN202110470655A
Authority
CN
China
Prior art keywords
animation
facial
character
face
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110470655.5A
Other languages
Chinese (zh)
Inventor
潘烨
张睿思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110470655.5A priority Critical patent/CN113255457A/en
Publication of CN113255457A publication Critical patent/CN113255457A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/203D [Three Dimensional] animation
    • G06T13/403D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/02Non-photorealistic rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of animation production and discloses an animation character facial expression generation method based on facial expression recognition, which comprises the following steps: S1, recognizing the expressions of the human face and the animation character in a face data set and an animation data set through an emotion recognition network, and matching face pictures with animation data pictures; S2, learning, through deep learning, the mapping relation between face pictures with the same expression and character skeleton parameters to obtain an animation training network; S3, for each input frame of the video, using the animation training network to output a skeleton parameter result; and S5, performing three-dimensional reconstruction on the input picture to obtain the motion parameters of the character, and optimizing the skeleton parameters in combination with the geometric information of the face picture. Correspondingly, the invention also discloses an animation character facial expression generation system based on facial expression recognition. By controlling geometric features of the face such as the mouth and eyes more finely, the invention improves the audience's perception of the character's emotion changes.

Description

Animation character facial expression generation method and system based on facial expression recognition
Technical Field
The invention belongs to the technical field of animation production, and particularly relates to an animation character facial expression generation method and system based on facial expression recognition.
Background
In facial motion capture, traditional approaches such as ARKit use a camera to extract facial geometric information and map it onto a 3D model; the parameter information of the 3D model is obtained by learning the mapping relation between the two-dimensional video and the three-dimensional model parameters. In addition, commercial software such as Faceware obtains the 3D model parameters by reconstructing the two-dimensional input picture. These methods can effectively extract the geometric information of the face, but it remains difficult for viewers to perceive changes in the character's expression. Models such as ExprGen and DeepExpr attempt to improve on this problem, but while optimizing the appearance information they may lose important face geometry information, making it difficult to control the details of the animated character's face. These problems make it difficult to accurately express the character's emotional information while accurately conveying changes in the facial information. When viewers watch a video, changes in the animated character's expression and geometric details have a very important influence on the viewing experience.
Expression recognition and analysis are widely applied in fields such as human-computer interaction and computer graphics. In 1978, Ekman suggested that the facial muscle state of a particular expression could be characterized using the Facial Action Coding System (FACS). In general, the face can be divided into an upper face and a lower face, with only a weak association between the two: upper-face expressions involve the eyes, eyebrows, and cheeks, while lower-face expressions involve the lips, the nasal root, and the region between them. FACS defines 46 basic action units, which form more than seven thousand combinations capable of characterizing most observed expressions.
In animation, FACS is widely used for emotional perception and manipulation of cartoon characters. FACSGen controls the three-dimensional facial expression of a cartoon character by controlling action units, but superimposing units at the micro level makes it hard for users to perceive the character's emotion at the holistic level. HapFACS therefore improves on this, allowing animators to control the character expression both at the action-unit level and at the level of overall mood. However, micro- and macro-level controls also limit the generalization of character expressions, and the same expression is difficult to migrate from one character to another. The spatiotemporal facial expression animation editing method published by Wanxian Mei is used for post-editing of facial animation to meet animators' special application requirements: a Laplacian-based facial expression synthesis technique propagates the displacement of the edited feature points to the other vertices of the face model in the spatial domain, while a Gaussian function propagates the user's edits to adjacent frames of the animation sequence in the time domain, ensuring a smooth transition of the facial expression animation while keeping the geometric details of the face. Blue Sky Studios proposed a method that uses differential-subspace reconstruction to automatically generate character skeletons: by learning labels on the differential coordinates and then reconstructing the subspace meshes, deformation error can be effectively reduced and generalization improved.
For face tracking, conventional methods detect each window of an input picture using a multi-layer perceptron or ensemble learning, find the parts containing a face, and combine them. With the development of deep learning, deep convolutional networks and their many variants have greatly improved face detection. Models such as Fast R-CNN and YOLO can efficiently and accurately detect multiple faces in a picture. Sun X et al. combined the R-CNN framework with feature cascading, multi-scale training, model pre-training, and proper calibration of key parameters, achieving 83% accuracy on the FDDB benchmark. Researchers at the Chinese Academy of Sciences such as Wu S introduced a hierarchical attention mechanism into face recognition, extracting local face features with a Gaussian kernel model and modeling the relationships among features with an LSTM to achieve hierarchical perception of face features. The model achieved 96.42%, 94.84%, and 74.60% accuracy on the FDDB, WIDER FACE, and UFDD data sets, respectively.
By extracting facial features, facial emotion information can be effectively obtained and used as feature vectors for subsequent recognition and expression migration. Researchers at Imperial College London such as Zafeiriou S characterize facial expressions using a sparse signal processing method derived from the l1 optimization problem and classify the feature vectors with an SVM algorithm. By applying grid-based preprocessing to the face, the algorithm achieves better results than processing the raw picture directly, reaching 92.4% accuracy on the CK data set.
With the development of deep learning technology, deep neural networks are used for emotion perception from face images. Unlike traditional feature extraction, in deep learning the two processes of face detection and feature extraction are performed together. Emotion recognition with deep learning can be divided into three steps: preprocessing, deep feature learning, and deep feature classification. Preprocessing refers to operations such as face extraction, rotation correction, and data augmentation on the input image; the picture features are then extracted in an end-to-end learning manner. Yang H, a researcher at Binghamton University, New York, and colleagues learn facial expression information with a De-expression Residue Learning (DeRL) method. They first use a generative model to produce the neutral face picture corresponding to an expressive face; although the expression information is ultimately filtered out, the emotion information remains stored in the intermediate layers of the generative model, and facial expressions are classified by learning this residual information. The algorithm achieved 97.30%, 88.00%, 73.23%, and 84.17% accuracy on the CK+, Oulu-CASIA, MMI, and BU-3DFE data sets, respectively.
Traditional animation capture methods use a depth camera or a 3D scanner to directly extract face information and map it onto the animated character, but because the equipment is expensive and the setup is complex, they are difficult to use widely. For example, Weise T of the Swiss Federal Institute of Technology combines face geometric information with pre-stored face depth information and optimizes a probabilistic problem to obtain a blendshape parameter sequence, effectively improving the speed and stability of expression migration. In practice, however, the limited resolution makes it difficult to capture subtle geometric and motion changes of the human face, and changes in facial expression are also hard to capture. Researchers at Tsinghua University such as Bouaziz S improved the 3D parameter optimization algorithm to obtain a low-dimensional face parameter representation, effectively speeding up the extraction of blendshape parameters: an RGB-D camera first captures a face depth picture, which is mapped onto a 3D face model; the parameters are then reduced with PCA and geometrically transformed before being mapped onto the animation model. Using a 2D picture alone as the expression input can effectively reduce the equipment requirement and makes wide use of animation expression migration possible. Researchers at Zhejiang University such as Cao C proposed a calibration-free method for real-time expression migration: the algorithm first regresses the extracted two-dimensional facial feature points with a DDE (Displaced Dynamic Expression) model, then optimizes the face parameters by taking the camera error into account and maps the parameters onto the cartoon avatar.
Disclosure of Invention
The invention provides an animation character facial expression generation method and system based on facial expression recognition, aiming to solve the problem in existing animation production that face motion capture must both reproduce the geometric information of the face and accurately convey the facial expression.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a facial expression generation method of an animation character based on facial expression recognition comprises the following steps:
S1, recognizing the expressions of the human face and the animation character in a face data set and an animation data set through an emotion recognition network, and matching face pictures with animation data pictures;
S2, learning, through deep learning, the mapping relation between face pictures with the same expression and character skeleton parameters to obtain an animation training network;
S3, for each input frame of the video, using the animation training network to output a skeleton parameter result;
and S5, performing three-dimensional reconstruction on the input picture to obtain the motion parameters of the character, and optimizing the skeleton parameters in combination with the geometric information of the face picture.
Preferably, in step S1: front-face pictures with seven classes of labels (the six basic emotions and neutral) are selected from the face data set; each character in the animation data set comprises skeleton parameters for controlling the character's expression and seven classes of labels corresponding to the six basic emotions and neutral.
Preferably, step S1 includes the steps of:
S11, first, labeling the feature points of the 3D animation character, extracting the facial feature vector of the animation picture according to the selected face marker points, and rendering the 3D character to obtain a two-dimensional picture;
S12, then, for each picture in the face data set, searching the 3D animation data set for the most similar corresponding sample.
Preferably, step S12 includes the steps of:
S121, first, searching the whole animation data set for the several pictures with the most similar expression distance;
and S122, among these pictures, using the geometric distance to find the picture whose feature points are most similar to the face, and outputting it as the result.
Preferably, the method further comprises the following step: S4, interpolating the output skeleton parameters using the relation between the previous and next key frames.
Preferably, step S4 includes:
S41, first, classifying the emotion of the input face using the emotion recognition network of step S1 to obtain the parameter set of the corresponding expression in the animation data set, which is used for the parameter set search;
and S42, among all expression parameters, searching the two parameter combinations closest in L2 distance to the optimization result as results, and interpolating between key frames.
Preferably, in step S2: the face and animation data pictures matched in step S1 are used as training data to generate the skeleton parameters of the animation character.
Preferably, in step S5: the face geometric information includes any one or more of: the height between the left and right eyebrows, the left and right eye heights, the nose width, the left and right nose heights, the mouth width, the mouth height, and the lip height.
An animation character facial expression generation system based on facial expression recognition comprises:
the data preprocessing module is used for recognizing the expressions of the human faces and the animation characters in the face data set and the animation data set through a deep convolutional network, and matching face pictures with animation data pictures;
the offline training module is used for obtaining, through deep learning, the mapping relation between face pictures with the same expression and character skeleton parameters;
the online generation module first inputs a face key frame to the offline training module to obtain character skeleton parameters; then interpolates the obtained character skeleton parameters using the relation between the previous and next key frames; then performs three-dimensional reconstruction on the input face key frame to obtain the motion parameters of the character; and finally optimizes the skeleton parameters in combination with the geometric information of the face picture.
Compared with the prior art, the invention has the following beneficial effects:
whereas traditional methods consider only the geometric characteristics of the character, the invention introduces the character's emotion changes into three-dimensional animation modeling and optimizes the character with specific geometric details, improving the human-to-animation capture effect;
real-time automatic control of the animated character is realized;
traditional animation production requires a depth camera or similar equipment to extract face information, whereas reconstructing from an input two-dimensional face reduces the production cost;
interpolating between two key frames makes the character's facial expression transition smoothly and effectively, improving the user experience.
Drawings
FIG. 1 is a flow chart of an animation character facial expression generation method based on facial expression recognition according to the present invention;
FIG. 2 is a diagram of the optimization results of the animated character at different stages according to the present invention;
FIG. 3 is a diagram of interpolation effects for animated characters according to the present invention;
FIG. 4 is a graph of the same face migration results of the present invention;
FIG. 5 is a comparison of the effect of the present invention with a prior-art method based on expression information only.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1
An animation character facial expression generation method based on facial expression recognition comprises the following steps:
S1, recognizing the expressions of the human face and the animation character in a face data set and an animation data set through an emotion recognition network, and matching face pictures with animation data pictures;
the face data set used in this embodiment is from CK +, dispa, KDEF, and MMI. Wherein each data set selects a front face picture with seven types of labels of six degrees emotion and neutrality. And performing operations such as rotation, scaling and the like on the image pairs under each label in the data set to perform data enhancement, so that the number of the images under each label is equal, and finally approximately 10000 images are obtained. In the embodiment, the illustration and the experimental result display picture are both from the data set.
The 3D animation data set used in this embodiment comes from the FERG-3D-DB data set and includes four characters: Mery, Bonnie, Ray, and Malcolm. Each character comprises skeleton parameter values for controlling the character's expression and seven classes of labels corresponding to the six basic emotions and neutral, giving about 40000 samples.
The purpose of this stage is to train a neural network for classifying 2D face pictures and animation pictures. The face data set is first trained with the neural network shown in Table 1 and is classified into the seven categories of anger, disgust, fear, happiness, sadness, neutral, and surprise. For the 2D animation data set, the parameters of the network layers before the POOL layer are kept fixed, and the FC layers of the trained network are fine-tuned on the animation data set, which is likewise classified into anger, disgust, fear, happiness, sadness, neutral, and surprise.
TABLE 1 Emotion recognition network
(The layer-by-layer architecture of Table 1 is provided as an image in the original publication.)
This embodiment uses the PyTorch framework for end-to-end training of the network. The data set is split into training, validation, and test sets at a ratio of 8:1:1. An SGD (stochastic gradient descent) optimizer is used with momentum 0.9, weight decay 0.0005, and an initial learning rate of 0.01; the learning rate is reduced to 1/10 every 10 epochs. Training runs for 60 epochs on the face data set and 50 epochs on the animation data set, with a batch size of 50.
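The training recipe above can be written down in PyTorch roughly as follows. This is a sketch only: the real layer structure of Table 1 is published as an image, so the small EmotionNet below, the layer names, and the face_loader/animation_loader data loaders are placeholders, while the optimizer settings mirror the values stated in the text.

```python
import torch
from torch import nn, optim

class EmotionNet(nn.Module):
    """Stand-in for the Table 1 network: conv backbone, POOL layer, FC head named "fc"."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, num_classes))

    def forward(self, x):
        return self.fc(self.pool(self.features(x)).flatten(1))

model = EmotionNet()
criterion = nn.CrossEntropyLoss()

def train(loader, epochs, lr):
    params = [p for p in model.parameters() if p.requires_grad]
    opt = optim.SGD(params, lr=lr, momentum=0.9, weight_decay=0.0005)
    sched = optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.1)  # lr / 10 every 10 epochs
    for _ in range(epochs):
        for images, labels in loader:          # batches of 50 pictures
            opt.zero_grad()
            criterion(model(images), labels).backward()
            opt.step()
        sched.step()

# Stage 1: 60 epochs on the face data set at lr 0.01.
train(face_loader, epochs=60, lr=0.01)

# Stage 2: keep the layers before POOL fixed and fine-tune only the FC head
# on the animation data set for 50 epochs.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")
train(animation_loader, epochs=50, lr=0.01)
```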
For the 3D network, the animation picture corresponding to each face data set picture is searched for and used as the reference value for training the 3D animation network and the character migration network.
First, the feature points of the 3D animation character are marked in 3D modeling software, the facial feature vector of the animation picture is extracted according to the selected face geometric features, and the 3D character is rendered to obtain a two-dimensional picture. Then, for each picture in the face data set, the most similar corresponding sample in the 3D animation data set is searched for. The specific procedure is as follows: first, the whole animation data set is searched for the 30 pictures with the most similar expression distance; then, among these 30 pictures, the picture whose feature points are geometrically most similar to the face is found using the geometric distance and output as the result.
For the expression distance, this embodiment uses the Jensen-Shannon (JS) divergence as the measure of the expression distance between two pictures. After the face picture and the animation picture are input into the CNN of Table 1, the 512-dimensional vectors output at the FC2 layer are used as expression feature vectors, denoted H and C. The distance is given by formula (1), where M = (H + C) / 2 and D(H‖M) and D(C‖M) are Kullback-Leibler divergences:

JS(H‖C) = D(H‖M) / 2 + D(C‖M) / 2    (1)
For the geometric distance, the selected face geometric features are used as the geometric feature vector and normalized; the animation geometric feature vector c closest to the normalized face geometric feature vector h is searched for and output as the result.
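A sketch of this two-stage matching (expression distance, then geometric distance) is given below. The function and variable names are illustrative, and it assumes the FC2 expression vectors have already been normalized into probability distributions, which the patent does not spell out.

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def js(h: np.ndarray, c: np.ndarray) -> float:
    """Jensen-Shannon divergence of formula (1) between two 512-d expression vectors."""
    m = 0.5 * (h + c)
    return 0.5 * kl(h, m) + 0.5 * kl(c, m)

def match(face_expr, face_geom, anim_exprs, anim_geoms, k=30):
    """Return the index of the best-matching animation picture.

    face_expr: (512,) expression vector of the face picture.
    face_geom: (d,)   geometric feature vector of the face picture.
    anim_exprs: (N, 512), anim_geoms: (N, d) for the animation data set.
    """
    # Stage 1: the k animation pictures with the smallest expression distance.
    expr_dist = np.array([js(face_expr, c) for c in anim_exprs])
    candidates = np.argsort(expr_dist)[:k]
    # Stage 2: among those, the picture with the closest normalized geometric features.
    unit = lambda v: v / (np.linalg.norm(v) + 1e-12)
    geom_dist = [np.linalg.norm(unit(face_geom) - unit(anim_geoms[i])) for i in candidates]
    return int(candidates[int(np.argmin(geom_dist))])
```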
S2, obtaining an animation training network through deep learning of the mapping relation between the facial pictures with the same expression and the character skeleton parameters;
the purpose of this process is to train a neural network for 3D animated character parameter generation. The present embodiment uses the above-mentioned matched face-3D parameters as data for training. In the 3D animation training network shown in table 2, a human face picture is used as an input, and a cross entropy loss function of a network FC3 layer output result and a reference value is used as an objective function to optimize. The specific formula is shown in (2), wherein y is the output result of the 3D animation training network, and y' is the reference value of the face-3D parameter matching.
H(y,y′)=-∑iy′ilog(softmax(yi)) (2)
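Formula (2) is the usual softmax cross entropy; the snippet below is just a numerical rendering of it, with an arbitrary output dimension and with y' treated as a per-sample reference distribution (both choices are assumptions for illustration).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
y = torch.randn(50, 16)                          # FC3 outputs for a batch of 50 (output size illustrative)
y_ref = torch.rand(50, 16)
y_ref = y_ref / y_ref.sum(dim=1, keepdim=True)   # reference values y', normalized per sample

# Formula (2): H(y, y') = -sum_i y'_i * log(softmax(y)_i), averaged over the batch.
loss = -(y_ref * F.log_softmax(y, dim=1)).sum(dim=1).mean()
print(float(loss))
```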
TABLE 2 3D animation training network
(The layer-by-layer architecture of Table 2 is provided as an image in the original publication.)
This embodiment uses the PyTorch framework for end-to-end training of the network. An SGD optimizer is used with momentum 0.9, weight decay 0.0005, and an initial learning rate of 0.001; the learning rate is reduced to 1/10 every 10 epochs. Training runs for 50 epochs with a batch size of 50.
S3, aiming at the input of each frame of the video, using a network animation training network to output a skeleton parameter result;
S4, interpolating the output skeleton parameters using the relation between the previous and next key frames;
because the time interval exists in the reading of the facial expression of the human face, the interpolation algorithm can be utilized to interpolate the change of the character expression, so that the expression is excessively smooth. Because the reading time interval is short, and the facial expression generally hardly changes greatly, the embodiment obtains two skeleton parameters with the shortest emotional distance and geometric distance between two input key frames as candidate parameters for interpolation by filtering the 3D data set by two layers: firstly, classifying the emotion of the input face by using the emotion recognition network to obtain a parameter set of the corresponding expression in the 3D data set for searching geometric parameters. Of all expression parameters, two parameter combinations closest to the optimization result L2 are searched as a result, and interpolation is performed between key frames. The interpolation results are shown in fig. 3: by interpolating between the two key frames, the character expression transition problem between the front frame and the back frame can be smoothed, so that the expression is more natural.
S5, performing three-dimensional reconstruction on the input picture to obtain the motion parameters of the character, and optimizing the skeleton parameters in combination with the geometric information of the face picture.
The overall characteristics of the animated character, such as its expression, can be captured effectively through the neural network design. However, it is hard to guarantee that the neural network fully learns the detailed parameters of the animated character, such as how wide the eyes are open or how wide the nose is. This embodiment therefore fuses the geometric features with the overall features of the character to generate the final skeleton parameters.
For the face and animation data sets, the following facial landmark measurements are extracted as the face geometric features: the height between the left and right eyebrows (height from the highest to the lowest eyebrow feature point); the left and right eye heights (height from the highest to the lowest eye feature point); the nose width (width from the rightmost to the leftmost nose feature point); the left and right nose heights (height from the left-most/right-most feature point to the bottom of the nose); the mouth width (width from the rightmost to the leftmost mouth feature point); the mouth height (height from the highest to the lowest mouth feature point); and the lip height (height from the upper lip to the bottom of the lips). Each picture is scaled to 256 x 256 pixels for input to the neural network.
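As an illustration, these measurements could be read off a standard 68-point landmark layout as in the sketch below; the patent does not specify which landmark detector or indices are used, so the iBUG-68 indexing and the grouping of points are assumptions.

```python
import numpy as np

def face_geometry(lm: np.ndarray) -> np.ndarray:
    """Compute the geometric feature vector from 68 landmarks (iBUG ordering assumed).

    lm: array of shape (68, 2) holding (x, y) landmark coordinates.
    """
    span_y = lambda idx: lm[idx, 1].max() - lm[idx, 1].min()   # vertical extent of a point group
    span_x = lambda idx: lm[idx, 0].max() - lm[idx, 0].min()   # horizontal extent of a point group
    feats = np.array([
        span_y(range(17, 27)),   # eyebrow height (highest to lowest eyebrow point)
        span_y(range(36, 42)),   # left eye height
        span_y(range(42, 48)),   # right eye height
        span_x(range(31, 36)),   # nose width
        span_y(range(27, 36)),   # nose height (bridge to nose bottom)
        span_x(range(48, 60)),   # mouth width (outer lip contour)
        span_y(range(48, 60)),   # mouth height
        span_y(range(60, 68)),   # lip height (inner lip contour)
    ], dtype=np.float32)
    return feats / (np.linalg.norm(feats) + 1e-12)   # normalized geometric feature vector
```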
For face parameter optimization, the extracted facial feature vector is mapped onto the parameters that control the corresponding parts of the character's face.
For rotation parameter optimization, 3D reconstruction is carried out using information such as depth extracted from the two-dimensional image with OpenCV, and the xyz coordinate values of the head rotation are obtained and output as the result. Specifically, the face rotation matrix is obtained by solving the Perspective-n-Point (PnP) problem using the facial feature point coordinates (both sides of the eyes, both sides of the nose, and both sides of the mouth) that correspond to the character's facial feature points, together with the camera intrinsic parameters; the rotation matrix is then converted into the corresponding xyz coordinate values.
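A sketch of this step with OpenCV is shown below. The pinhole intrinsics (focal length equal to the image width, principal point at the center) and the Euler-angle convention are assumptions, since the text only states that PnP is solved with the camera intrinsics and the result converted to xyz values.

```python
import cv2
import numpy as np

def head_rotation_xyz(pts_2d: np.ndarray, pts_3d: np.ndarray, img_w: int, img_h: int):
    """Solve PnP and return the head rotation as xyz (Euler) angles in degrees.

    pts_2d: (N, 2) image coordinates of the feature points (eye, nose and mouth sides).
    pts_3d: (N, 3) corresponding 3D coordinates on the character's face model.
    """
    focal = float(img_w)                                   # assumed pinhole intrinsics
    camera_matrix = np.array([[focal, 0, img_w / 2],
                              [0, focal, img_h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))                         # assume no lens distortion

    ok, rvec, _tvec = cv2.solvePnP(pts_3d.astype(np.float64), pts_2d.astype(np.float64),
                                   camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP did not converge")

    R, _ = cv2.Rodrigues(rvec)                             # rotation vector -> rotation matrix
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    x = np.degrees(np.arctan2(R[2, 1], R[2, 2]))           # rotation about x
    y = np.degrees(np.arctan2(-R[2, 0], sy))               # rotation about y
    z = np.degrees(np.arctan2(R[1, 0], R[0, 0]))           # rotation about z
    return x, y, z
```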
The character images after face parameter and rotation parameter optimization are shown in FIG. 2. From left to right: the input face picture, the result of generating the character's overall skeleton parameters, the result of optimizing the character's skeleton parameters, and the result of adding the motion parameters.
Embodiment 2
With reference to FIGS. 1 to 5, an animation character facial expression generation system based on facial expression recognition comprises:
the data preprocessing module is used for recognizing the expressions of the human faces and the animation characters in the face data set and the animation data set through a deep convolutional network, and matching face pictures with animation data pictures;
the offline training module is used for obtaining, through deep learning, the mapping relation between face pictures with the same expression and character skeleton parameters;
the online generation module first inputs a face key frame to the offline training module to obtain character skeleton parameters; then interpolates the obtained character skeleton parameters using the relation between the previous and next key frames; then performs three-dimensional reconstruction on the input face key frame to obtain the motion parameters of the character; and finally optimizes the skeleton parameters in combination with the geometric information of the face picture.
It should be noted that, in this embodiment, any module or any function implemented by a module may be added to achieve the object of the first embodiment of the present invention; this is not described in detail here.
When training the facial expression recognition network and the character emotion recognition network, 80% of the pictures are used as the training set, 10% as the validation set, and 10% as the test set. The emotion recognition accuracy for the different characters is shown in Table 4. The test results demonstrate that the neural network designed by the invention can effectively recognize the expressions of both human faces and cartoon characters.
TABLE 4 Emotion recognition accuracy
(The per-character accuracy values of Table 4 are provided as an image in the original publication.)
The algorithm provided by the invention can effectively transfer facial expressions to different animation characters. It can also migrate different faces to the same character, ensuring the robustness of the effect. FIG. 4 shows the result of migrating a picture sequence of facial emotion changes to Mery. FIG. 5 shows the results of migrating the emotion changes of different faces to Mery. The results show that, for a single face, the character migration system preserves the accuracy of the expression transformation before and after the key frame; for different faces, the system can accurately recognize the different expressions, ensuring robustness to different inputs.
The invention compares the proposed algorithm with the animation effects of character migration based only on emotion and based only on geometric information; the final results are shown in FIG. 5: the first row is the input face picture, the second row is the output of the proposed system, and the third row is the output of the expression-based method. Compared with a character migration method based only on emotion, the algorithm of the invention controls the mouth and eyes more finely and improves the audience's perception of the character's emotion changes.
The invention provides a real-time animation generation algorithm combining facial expressions and geometric features. Transferring facial expressions effectively improves the audience's perception of the animated character, while control of the geometric features enables manipulation of the character's details. Real-time control and automatic generation of the character are achieved through the interpolation optimization algorithm.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions, and alterations can be made to these embodiments without departing from the principle and spirit of the invention, and such variants still fall within the scope of protection of the invention.

Claims (9)

1. An animation character facial expression generation method based on facial expression recognition, characterized by comprising the following steps:
S1, recognizing the expressions of the human face and the animation character in a face data set and an animation data set through an emotion recognition network, and matching face pictures with animation data pictures;
S2, learning, through deep learning, the mapping relation between face pictures with the same expression and character skeleton parameters to obtain an animation training network;
S3, for each input frame of the video, using the animation training network to output a skeleton parameter result;
and S5, performing three-dimensional reconstruction on the input picture to obtain the motion parameters of the character, and optimizing the skeleton parameters in combination with the geometric information of the face picture.
2. The animation character facial expression generation method based on facial expression recognition according to claim 1, wherein in step S1: front-face pictures with seven classes of labels (the six basic emotions and neutral) are selected from the face data set; each character in the animation data set comprises skeleton parameters for controlling the character's expression and seven classes of labels corresponding to the six basic emotions and neutral.
3. The animation character facial expression generation method based on facial expression recognition according to claim 2, wherein step S1 comprises the following steps:
S11, first, labeling the feature points of the 3D animation character, extracting the facial feature vector of the animation picture according to the selected face marker points, and rendering the 3D character to obtain a two-dimensional picture;
S12, then, for each picture in the face data set, searching the 3D animation data set for the most similar corresponding sample.
4. The animation character facial expression generation method based on facial expression recognition according to claim 3, wherein step S12 comprises the following steps:
S121, first, searching the whole animation data set for the several pictures with the most similar expression distance;
and S122, among these pictures, using the geometric distance to find the picture whose feature points are most similar to the face, and outputting it as the result.
5. The animation character facial expression generation method based on facial expression recognition, characterized by further comprising the following step: S4, interpolating the output skeleton parameters using the relation between the previous and next key frames.
6. The animation character facial expression generation method based on facial expression recognition according to claim 5, wherein step S4 comprises:
S41, first, classifying the emotion of the input face using the emotion recognition network of step S1 to obtain the parameter set of the corresponding expression in the animation data set, which is used for the parameter set search;
and S42, among all expression parameters, searching the two parameter combinations closest in L2 distance to the optimization result as results, and interpolating between key frames.
7. The animation character facial expression generation method based on facial expression recognition according to claim 1, wherein in step S2: the face and animation data pictures matched in step S1 are used as training data to generate the skeleton parameters of the animation character.
8. The animation character facial expression generation method based on facial expression recognition according to claim 1, wherein in step S5: the face geometric information includes any one or more of: the height between the left and right eyebrows, the left and right eye heights, the nose width, the left and right nose heights, the mouth width, the mouth height, and the lip height.
9. An animation character facial expression generation system based on facial expression recognition is characterized by comprising:
the data preprocessing module is used for recognizing the expressions of the human faces and the animation characters in the face data set and the animation data set through a deep convolutional network, and matching face pictures with animation data pictures;
the offline training module is used for obtaining, through deep learning, the mapping relation between face pictures with the same expression and character skeleton parameters;
the online generation module first inputs a face key frame to the offline training module to obtain character skeleton parameters; then interpolates the obtained character skeleton parameters using the relation between the previous and next key frames; then performs three-dimensional reconstruction on the input face key frame to obtain the motion parameters of the character; and finally optimizes the skeleton parameters in combination with the geometric information of the face picture.
CN202110470655.5A 2021-04-28 2021-04-28 Animation character facial expression generation method and system based on facial expression recognition Pending CN113255457A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110470655.5A CN113255457A (en) 2021-04-28 2021-04-28 Animation character facial expression generation method and system based on facial expression recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110470655.5A CN113255457A (en) 2021-04-28 2021-04-28 Animation character facial expression generation method and system based on facial expression recognition

Publications (1)

Publication Number Publication Date
CN113255457A true CN113255457A (en) 2021-08-13

Family

ID=77222532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110470655.5A Pending CN113255457A (en) 2021-04-28 2021-04-28 Animation character facial expression generation method and system based on facial expression recognition

Country Status (1)

Country Link
CN (1) CN113255457A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049678A (en) * 2022-01-11 2022-02-15 之江实验室 Facial motion capturing method and system based on deep learning
CN114494542A (en) * 2022-01-24 2022-05-13 广州喳喳科技有限公司 Character driving animation method and system based on convolutional neural network
CN114529640A (en) * 2022-02-17 2022-05-24 北京字跳网络技术有限公司 Moving picture generation method and device, computer equipment and storage medium
CN114898020A (en) * 2022-05-26 2022-08-12 唯物(杭州)科技有限公司 3D character real-time face driving method and device, electronic equipment and storage medium
USD969216S1 (en) * 2021-08-25 2022-11-08 Rebecca Hadley Educational poster
CN115797569A (en) * 2023-01-31 2023-03-14 盾钰(上海)互联网科技有限公司 Dynamic generation method and system for high-precision twin facial expression and action subdivision
CN115953515A (en) * 2023-03-14 2023-04-11 深圳崇德动漫股份有限公司 Animation image generation method, device, equipment and medium based on real person data
CN116485964A (en) * 2023-06-21 2023-07-25 海马云(天津)信息技术有限公司 Expression processing method, device and storage medium of digital virtual object

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473801A (en) * 2013-09-27 2013-12-25 中国科学院自动化研究所 Facial expression editing method based on single camera and motion capturing data
CN105528805A (en) * 2015-12-25 2016-04-27 苏州丽多数字科技有限公司 Virtual face animation synthesis method
CN106600667A (en) * 2016-12-12 2017-04-26 南京大学 Method for driving face animation with video based on convolution neural network
CN107154069A (en) * 2017-05-11 2017-09-12 上海微漫网络科技有限公司 A kind of data processing method and system based on virtual role
CN108876879A (en) * 2017-05-12 2018-11-23 腾讯科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium that human face animation is realized
CN109978975A (en) * 2019-03-12 2019-07-05 深圳市商汤科技有限公司 A kind of moving method and device, computer equipment of movement
CN112541445A (en) * 2020-12-16 2021-03-23 中国联合网络通信集团有限公司 Facial expression migration method and device, electronic equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473801A (en) * 2013-09-27 2013-12-25 中国科学院自动化研究所 Facial expression editing method based on single camera and motion capturing data
CN105528805A (en) * 2015-12-25 2016-04-27 苏州丽多数字科技有限公司 Virtual face animation synthesis method
CN106600667A (en) * 2016-12-12 2017-04-26 南京大学 Method for driving face animation with video based on convolution neural network
CN107154069A (en) * 2017-05-11 2017-09-12 上海微漫网络科技有限公司 A kind of data processing method and system based on virtual role
CN108876879A (en) * 2017-05-12 2018-11-23 腾讯科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium that human face animation is realized
CN109978975A (en) * 2019-03-12 2019-07-05 深圳市商汤科技有限公司 A kind of moving method and device, computer equipment of movement
CN112541445A (en) * 2020-12-16 2021-03-23 中国联合网络通信集团有限公司 Facial expression migration method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DEEPALI ANEJA 等: ""Learning to Generate 3D Stylized Character Expressions from Humans"", 《IEEE》 *
DEEPALI ANEJA 等: ""Modeling Stylized Character Expressions via Deep Learning"", 《SPRINGER》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USD969216S1 (en) * 2021-08-25 2022-11-08 Rebecca Hadley Educational poster
CN114049678A (en) * 2022-01-11 2022-02-15 之江实验室 Facial motion capturing method and system based on deep learning
CN114049678B (en) * 2022-01-11 2022-04-12 之江实验室 Facial motion capturing method and system based on deep learning
CN114494542A (en) * 2022-01-24 2022-05-13 广州喳喳科技有限公司 Character driving animation method and system based on convolutional neural network
CN114529640A (en) * 2022-02-17 2022-05-24 北京字跳网络技术有限公司 Moving picture generation method and device, computer equipment and storage medium
CN114529640B (en) * 2022-02-17 2024-01-26 北京字跳网络技术有限公司 Moving picture generation method, moving picture generation device, computer equipment and storage medium
CN114898020A (en) * 2022-05-26 2022-08-12 唯物(杭州)科技有限公司 3D character real-time face driving method and device, electronic equipment and storage medium
CN115797569A (en) * 2023-01-31 2023-03-14 盾钰(上海)互联网科技有限公司 Dynamic generation method and system for high-precision twin facial expression and action subdivision
CN115797569B (en) * 2023-01-31 2023-05-02 盾钰(上海)互联网科技有限公司 Dynamic generation method and system for high-precision degree twin facial expression action subdivision
CN115953515A (en) * 2023-03-14 2023-04-11 深圳崇德动漫股份有限公司 Animation image generation method, device, equipment and medium based on real person data
CN116485964A (en) * 2023-06-21 2023-07-25 海马云(天津)信息技术有限公司 Expression processing method, device and storage medium of digital virtual object
CN116485964B (en) * 2023-06-21 2023-10-13 海马云(天津)信息技术有限公司 Expression processing method, device and storage medium of digital virtual object

Similar Documents

Publication Publication Date Title
CN113255457A (en) Animation character facial expression generation method and system based on facial expression recognition
CN111275518B (en) Video virtual fitting method and device based on mixed optical flow
CN112887698B (en) High-quality face voice driving method based on nerve radiation field
Chuang et al. Mood swings: expressive speech animation
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
Li et al. Learning symmetry consistent deep cnns for face completion
De Castro et al. Automatic translation of sign language with multi-stream 3D CNN and generation of artificial depth maps
CN111967533B (en) Sketch image translation method based on scene recognition
Sinha et al. Identity-preserving realistic talking face generation
Yang et al. Controllable sketch-to-image translation for robust face synthesis
Xia et al. Controllable continuous gaze redirection
Gu et al. CariMe: unpaired caricature generation with multiple exaggerations
Kwolek et al. Recognition of JSL fingerspelling using deep convolutional neural networks
Hong et al. Dagan++: Depth-aware generative adversarial network for talking head video generation
CN113076918A (en) Video-based facial expression cloning method
CN117333604A (en) Character face replay method based on semantic perception nerve radiation field
Liu et al. 4D facial analysis: A survey of datasets, algorithms and applications
Ekmen et al. From 2D to 3D real-time expression transfer for facial animation
Zhao et al. Purifying naturalistic images through a real-time style transfer semantics network
US11734888B2 (en) Real-time 3D facial animation from binocular video
Wang et al. Expression-aware neural radiance fields for high-fidelity talking portrait synthesis
Borovikov et al. Applied monocular reconstruction of parametric faces with domain engineering
Sun et al. Generation of virtual digital human for customer service industry
Mu Pose Estimation‐Assisted Dance Tracking System Based on Convolutional Neural Network
Wang et al. Flow2Flow: Audio-visual cross-modality generation for talking face videos with rhythmic head

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210813