CN110599573A - Method for realizing real-time human face interactive animation based on monocular camera - Google Patents
- Publication number: CN110599573A (application CN201910839412.7A)
- Authority: CN (China)
- Prior art keywords: animation, model, face, voice, skin
- Prior art date: 2019-09-03
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to three-dimensional character animation technology and discloses a method for realizing real-time human face interactive animation based on a monocular camera. The method can be summarized as follows: capture a face video image and voice input information, and extract facial expression animation parameters and voice emotion animation parameters; learn a training sequence consisting of skeleton motion and the corresponding skin deformation through an action state space model, establish a virtual character skeleton skinning model based on auxiliary bone controllers, and drive this model with the extracted facial expression animation parameters and voice emotion animation parameters to generate real-time interactive animation.
Description
Technical Field
The invention relates to three-dimensional character animation technology, and in particular to a method for realizing real-time human face interactive animation based on a monocular camera.
Background
In recent years, the continuous development of computer hardware and software (for example, the augmented reality application development kit ARKit released by Apple Inc., and ARCore 1.0 and its series of supporting tools released by Google Inc.) has brought multimedia technology into a period of rapid development. At the same time, as people's demands on the visual quality of human-computer interaction interfaces grow, face modeling and animation technology plays an increasingly important role in human-computer interaction. Three-dimensional facial expression animation has a very wide range of applications, such as game entertainment, film production, human-computer interaction and advertising, and therefore has important application value and theoretical significance.
Since Parke's pioneering work on computer-generated facial animation in 1972[1], more and more researchers around the world have recognized the research and application value of three-dimensional face modeling and animation technology and have made many important contributions. As shown in fig. 1, these works mainly address how to represent the shape changes of the face with an effective model, how to capture facial expressions accurately and rapidly, how to build a fine three-dimensional face reconstruction model in real time, how to construct a digital face avatar, and how to drive it to generate a realistic face model.
In 2013, Cao et al.[2] proposed a real-time face tracking and animation method based on three-dimensional shape regression. The method uses a monocular camera as the acquisition device for face images and consists of two stages: preprocessing and real-time operation. In the preprocessing stage, the monocular camera captures a set of user-specific poses and expressions comprising a series of facial expressions and head rotations, and a facial feature point labelling algorithm is then used to semi-automatically annotate feature points in the user's face images. Building on such feature-point-calibrated face images, Cao et al.[3] constructed in 2014 the three-dimensional facial expression database FaceWarehouse for visual computing applications. The database provides a bilinear face model with the two attributes of identity and expression, which is fitted to generate a user-specific expression fusion (blendshape) model. From this expression fusion model, a three-dimensional face shape vector consisting of the three-dimensional positions of the feature points is computed for each image of the user acquired by the camera. The regression algorithm adopts a two-level boosted regression on shape-indexed features, and all images together with their corresponding three-dimensional face shape vectors are used as input to train a user-specific three-dimensional shape regressor. In the real-time operation stage, the user-specific three-dimensional shape regressor estimates, from the current and previous frames, three-dimensional shape parameters and face motion parameters, including rigid head transformation parameters and non-rigid facial expression motion parameters; these parameters are then transferred and mapped to a virtual character to drive it to generate expression animation corresponding to the face motion.
However, the above method has a certain limitation: for each new user, a preprocessing process of about 45 minutes is required to generate the user-specific expression fusion model and three-dimensional face shape regressor. In 2014, Cao et al.[4] further proposed a real-time face tracking algorithm based on displaced dynamic expression regression. It is also based on two-level cascade regression, but requires no preprocessing for a new user, thereby achieving real-time facial expression tracking and capture for arbitrary users.
The focus of both the real-time face tracking and animation method based on three-dimensional shape regression proposed by Cao et al. in 2013 and the real-time face tracking algorithm based on displaced dynamic expression regression proposed in 2014[4] lies in how to track large-amplitude facial movements in video accurately, efficiently and robustly, such as large expressions like frowning, laughing and mouth opening, together with rigid motions like head rotation and translation. However, both ignore detail information on the face, such as the wrinkles that appear when raising the eyebrows and the secondary motion of the facial skin caused by movement; such details are important cues that help people understand expressions and make the face more expressive.
Reference documents:
[1] Parke F I. Computer generated animation of faces[C]//ACM Conference. ACM, 1972: 451-457.
[2] Cao C, Weng Y, Lin S, et al. 3D shape regression for real-time facial animation[J]. ACM Transactions on Graphics, 2013, 32(4): 1.
[3] Cao C, Weng Y, Zhou S, et al. FaceWarehouse: A 3D Facial Expression Database for Visual Computing[J]. IEEE Transactions on Visualization & Computer Graphics, 2014, 20(3): 413-425.
[4] Cao C, Hou Q, Zhou K. Displaced dynamic expression regression for real-time facial tracking and animation[J]. ACM Transactions on Graphics, 2014, 33(4): 1-10.
[5] Ekman P, Friesen W V. Facial Action Coding System: Manual[M]. 1978.
[6] Duffy N, Helmbold D. Boosting Methods for Regression[J]. Machine Learning, 2002, 47(2-3): 153-200.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a method for realizing real-time human face interactive animation based on a monocular camera, which generates animation parameters by fusing facial expression capture and speech emotion recognition technologies and synthesizes visually plausible dynamic skin deformation animation in real time with a skeleton-based technique, so that the generated real-time animation is richer, more natural, more realistic and more expressive of individual character.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a method for realizing real-time human face interactive animation based on a monocular camera comprises the following steps:
s1, capturing a face video image through a monocular camera to obtain a face image sequence; simultaneously capturing voice input information through a voice sensor;
s2, marking human face characteristic points in the human face image sequence, and extracting human face expression animation parameters;
s3, extracting voice features from the captured voice input information and extracting voice emotion animation parameters;
S4, learning a training sequence consisting of skeleton motion and the corresponding skin deformation through the action state space model, establishing a virtual character skeleton skinning model based on the auxiliary bone controller, driving the virtual character skeleton skinning model with the extracted facial expression animation parameters and voice emotion animation parameters, and generating real-time interactive animation.
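For illustration only, the four steps S1-S4 can be organized into a per-frame loop roughly as in the following Python sketch. Every function and class here is a stub standing in for the modules described in this disclosure; the parameter sizes and the concatenation of the two parameter sets into one action vector are assumptions of the sketch, not the patented implementation.

```python
import numpy as np

def extract_expression_parameters(frame: np.ndarray) -> np.ndarray:
    """S2: landmark tracking -> facial expression AU parameters E (stub)."""
    return np.zeros(14)

def extract_voice_emotion_parameters(audio: np.ndarray) -> np.ndarray:
    """S3: speech emotion recognition -> voice emotion AU parameters V (stub)."""
    return np.zeros(14)

class SkinnedAvatar:
    """S4: ASSM-driven skeleton skinning model with auxiliary bones (stub)."""
    def step(self, action_vector: np.ndarray) -> None:
        pass

def process_frame(frame: np.ndarray, audio: np.ndarray, avatar: SkinnedAvatar) -> None:
    E = extract_expression_parameters(frame)      # facial expression parameters
    V = extract_voice_emotion_parameters(audio)   # voice emotion parameters
    avatar.step(np.concatenate([E, V]))           # drive the skinned model (S4)

# One captured video frame and audio chunk per iteration (S1).
process_frame(np.zeros((480, 640, 3)), np.zeros(1600), SkinnedAvatar())
```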
As a further optimization, in step S2, a double-layer cascade regression model is used to mark the facial feature points, and the Candide-3 face model based on the Facial Action Coding System is used as the parameter carrier to extract the facial expression animation parameters.
As a further optimization, the double-layer cascade regression model adopts a two-layer regression structure: the first layer adopts an enhanced regression model formed by combining T weak regressors in a superimposed manner; the second layer, for each weak regressor of the first layer, cascades K regression models into a strong regressor.
As a further optimization, step S3 specifically includes:
s31, analyzing and extracting the speech emotion information features in the speech input information;
s32, performing emotion recognition on the extracted voice emotion characteristics to finish emotion judgment;
S33, mapping the voice emotion result to the AU-based Facial Action Coding System and extracting the corresponding AU parameters to obtain the voice emotion animation parameters.
As a further optimization, in step S4, the action state space model is composed of three key elements (S, A, {P}), wherein:
S represents the set of facial expression states of the virtual character in each frame;
A represents a set of actions; the parameters obtained through facial expression recognition and speech emotion recognition serve as a set of action vectors that drive the state change of the virtual character in the next frame;
P is the state transition probability, representing the probability that the virtual character in expression state s_t ∈ S at the current frame t transitions to other states after performing action a_t ∈ A.
As a further optimization, in step S4, the method for establishing the virtual character skeleton skinning model based on the auxiliary bone controller comprises:
a. taking the prepared skeleton skinning model of the virtual character without auxiliary bones as the original model;
b. optimizing the skinning weights of the skeleton skinning model;
c. progressively inserting auxiliary bones into the regions where the approximation error between the original model and the target model is largest;
d. solving the two sub-problems of skinning weight optimization and auxiliary bone transformation optimization with a block coordinate descent algorithm;
e. constructing the auxiliary bone controller, wherein the skinning transformation q based on the auxiliary bone controller is represented as the combination of a static component x and a dynamic component y, q = x + y; the static component x is computed from the primary skeleton pose of the original model, and the dynamic component y is controlled by the action state space model.
The invention has the beneficial effects that:
1. Facial expressions are the outward manifestation of human emotions, but in some special cases facial expressions cannot fully express a character's inner emotion. If the face is driven point-to-point only by capturing and tracking facial expression feature points as parameters, the generated facial animation is clearly not vivid enough. For example, the facial expressions of a character when smiling and when laughing are similar, yet the spoken words differ; by adding speech emotion recognition, the current change in the character's emotional state can be captured better from the voice. By combining facial expression capture with speech emotion recognition, the invention can greatly improve the richness, naturalness and realism of the virtual character's expression animation.
2. Because the motion of the bones and the muscles together drives the change of skin appearance, and in order to simulate skin motion better, the invention adopts a skeleton skinning model, automatically adds auxiliary bones through a bone-based skinning decomposition algorithm, and jointly drives the primary bones, which simulate head skeletal motion, and the auxiliary bones, which simulate muscle motion, to animate the virtual character.
Drawings
FIG. 1 illustrates the current state of the art in three-dimensional face animation;
FIG. 2 is a schematic diagram of the implementation of real-time human face interaction animation according to the present invention;
FIG. 3 is a schematic diagram of an enhanced regression structure;
FIG. 4 is a schematic diagram of a two-layer cascade regression structure;
FIG. 5 is a diagram illustrating the state transition process of the ASSM.
Detailed Description
The invention aims to provide a method for realizing real-time human face interactive animation based on a monocular camera, which generates animation parameters by fusing facial expression capture and speech emotion recognition technologies and synthesizes visually plausible dynamic skin deformation animation in real time with a skeleton-based technique, so that the generated real-time animation is richer, more natural, more realistic and more expressive of individual character. To this end, the scheme of the invention is mainly realized from the following aspects:
1. face motion capture:
face motion capture includes two parts: non-rigid capture of facial expressions and head rigid transformation capture. According to the unique muscle movement characteristics of the facial expressions, the facial five sense organs are coordinated as a unified whole to show each facial expression. The method uses the intermediate description method with invariance as the reliable feature representation of the facial expression recognition to make up the deficiency of the bottom layer feature in the facial expression recognition.
2. Speech emotion recognition: the current emotional state of the performer is captured from the voice input and, through steps such as speech feature extraction, dimensionality reduction and classification, voice emotion animation parameters corresponding to the performer's current emotional state are generated (an illustrative sketch of this stage follows the list below).
3. Target digital avatar expression: a bone-based dynamic avatar expression method is used, which learns a training sequence consisting of skeleton motion and the corresponding skin deformation sequence to obtain an optimal transfer of nonlinear complex deformations, including those of soft tissue. The user's expression semantics extracted from face motion capture drive the motion of the character's head bones and provide programmed control of the auxiliary bones, so as to simulate the dynamic deformation of the facial skin.
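A minimal sketch of the speech emotion recognition aspect (item 2 above), assuming one generic acoustic feature vector per utterance; PCA and an SVM are used as stand-ins for the dimensionality reduction and classification steps, which the invention does not fix to any particular algorithm.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))     # placeholder acoustic feature vectors
y_train = rng.integers(0, 4, size=200)   # placeholder emotion labels (4 classes)

reducer = PCA(n_components=16).fit(X_train)                            # dimensionality reduction
clf = SVC(probability=True).fit(reducer.transform(X_train), y_train)   # classification

def recognize_emotion(features: np.ndarray) -> int:
    """Return the predicted emotion class for one utterance's feature vector."""
    return int(clf.predict(reducer.transform(features.reshape(1, -1)))[0])

print(recognize_emotion(rng.normal(size=64)))
```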
In terms of specific implementation, the principle of the method for implementing the real-time interactive animation of the human face based on the monocular camera is shown in fig. 2, and the method comprises the following steps:
(1) capturing a face video image through a monocular camera to obtain a face image sequence; simultaneously capturing voice input information through a voice sensor;
(2) capturing and tracking a human face: marking human face characteristic points from the captured human face image, and extracting human face expression animation parameters;
the positioning of the face feature points is a key link in face recognition, face tracking, face animation and three-dimensional face modeling. Due to factors such as human face diversity and illumination, locating human face feature points in natural environments is still a difficult challenge. The specific definition of the human face characteristic points is as follows: for a face shape containing N face feature points, S ═ x1,y1,...,xN,yN]For an input face picture, the goal of face feature point positioning is to estimate a face feature point shape S, so that S is equal to the true shape of the face feature pointHas the smallest difference of S andthe minimized alignment difference between can be defined as L2-normal formThis equation is used to guide the training of a face feature point locator or to evaluate the performance of a locating algorithm for face feature points.
The invention adopts an algorithm framework based on a regression model for real-time and efficient face detection and tracking.
a) Enhanced Regression (Boosted Regression)
Enhanced regression combines T weak regressors (R^1, ..., R^t, ..., R^T) in a superimposed manner. For a given face sample I and an initial shape S^0, each regressor computes a shape increment from the sample features and updates the current shape in a cascaded manner:

S^t = S^(t-1) + R^t(I, S^(t-1)), t = 1, ..., T    (1)

R^t(I, S^(t-1)) denotes the shape increment computed by regressor R^t from the input sample image I and the previous shape S^(t-1); R^t is determined by the input sample image I and the previous shape S^(t-1), and shape-indexed features are used to learn R^t, as shown in fig. 3.

Given N training samples {(I_i, Ŝ_i)}, i = 1, ..., N, where Ŝ_i denotes the true shape of the i-th sample I_i, the regressors (R^1, ..., R^t, ..., R^T) are trained in sequence until the training error no longer decreases. Each R^t is obtained by minimizing the alignment error, i.e.:

R^t = argmin_R Σ_i || Ŝ_i − (S_i^(t-1) + R(I_i, S_i^(t-1))) ||_2

where S_i^(t-1) denotes the previous shape estimate of the i-th image, and the output of R^t is a shape increment.
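The cascaded update of Eq. (1) can be sketched as follows. The weak regressors below are placeholders that each return a fixed increment; a real implementation would learn them from shape-indexed features as described above.

```python
import numpy as np

class ConstantRegressor:
    """Placeholder weak regressor R^t returning a stored shape increment."""
    def __init__(self, increment: np.ndarray):
        self.increment = increment
    def __call__(self, image, shape: np.ndarray) -> np.ndarray:
        return self.increment

def run_cascade(image, S0: np.ndarray, regressors) -> np.ndarray:
    """Apply S^t = S^(t-1) + R^t(I, S^(t-1)) for t = 1, ..., T."""
    S = S0.copy()
    for R in regressors:
        S = S + R(image, S)
    return S

S0 = np.zeros(6)                                               # initial shape estimate
regs = [ConstantRegressor(np.full(6, 0.1)) for _ in range(5)]  # T = 5 weak regressors
print(run_cascade(image=None, S0=S0, regressors=regs))
```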
b) Double-layer Cascade Regression (Two-level Boosted Regression)
The enhanced regression algorithm regresses the entire shape, but the large appearance differences among input images and the coarse initial face shape make a single-layer weak regressor insufficient: a single regressor is too weak, training converges slowly, and test results are poor. To make training converge faster and more stably, the invention adopts a two-layer cascade structure, as shown in fig. 4.
The first layer adopts the enhanced regression model of Eq. (1). For each regressor R^t of the first layer, K regression models are used in turn, i.e., R^t = (r^1, ..., r^k, ..., r^K); here r is called a primitive regressor, and a strong regressor is formed by cascading K primitive regressors. The difference between the two layers is that the input S^(t-1) of each regressor R^t in the first layer is different, whereas the inputs of all regressors r^k in the second layer are the same; for example, the inputs of all second-layer regressors of R^t are S^(t-1).
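One minimal reading of this two-level structure is sketched below: the K primitive regressors inside one strong regressor all receive the same input shape S^(t-1), while successive strong regressors receive the updated shape. The primitive regressors are again placeholder constants.

```python
import numpy as np

def strong_regressor(primitives, image, S_prev):
    """R^t: accumulate the increments of K primitive regressors, all fed the same S_prev."""
    delta = np.zeros_like(S_prev)
    for r in primitives:          # k = 1, ..., K; every r^k sees the same S^(t-1)
        delta += r(image, S_prev)
    return delta

def two_level_cascade(image, S0, levels):
    S = S0.copy()
    for primitives in levels:     # t = 1, ..., T; each R^t sees the updated shape
        S = S + strong_regressor(primitives, image, S)
    return S

toy = lambda image, S: np.full_like(S, 0.05)                  # placeholder primitive regressor
print(two_level_cascade(None, np.zeros(4), [[toy] * 3] * 2))  # T = 2, K = 3
```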
For generating the facial expression animation parameters, the invention uses the AU-based Facial Action Coding System (FACS) proposed by Ekman et al.[5], which describes a total of 44 basic action units, each controlled by an underlying facial region or muscle group. Specifically, the Candide-3 face model, which is based on the Facial Action Coding System, can be used as the parameter carrier to extract the AU parameters E corresponding to the facial expression.
The Candide-3 face model is represented as follows:

g = R(ḡ + Sσ + Aα) + t

where ḡ denotes the basic shape of the model, S is the static deformation matrix, A is the dynamic deformation matrix, σ is the static deformation parameter, α is the dynamic deformation parameter, and R and t denote the rigid head rotation matrix and the head translation, respectively. g is the column vector of the model's vertex coordinates and represents a specific facial expression shape. The model g is thus determined by the four groups of parameters R, t, α and σ.
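The following toy evaluation of this parameterization uses randomly sized placeholder matrices; the real Candide-3 model has fixed numbers of vertices, shape units and action units.

```python
import numpy as np

def candide3(g_bar, S, sigma, A, alpha, R, t):
    """g_bar: (3V,) base shape; S: (3V, m) static units; A: (3V, n) action units;
    R: (3, 3) head rotation; t: (3,) head translation. Returns deformed vertices (V, 3)."""
    shape = (g_bar + S @ sigma + A @ alpha).reshape(-1, 3)
    return shape @ R.T + t

rng = np.random.default_rng(1)
V, m, n = 8, 4, 3   # toy sizes
g = candide3(rng.normal(size=3 * V), rng.normal(size=(3 * V, m)), rng.normal(size=m),
             rng.normal(size=(3 * V, n)), rng.normal(size=n), np.eye(3), np.zeros(3))
print(g.shape)      # (8, 3)
```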
(3) Extracting voice features from the captured voice input information and extracting voice emotion animation parameters;
The speech emotion information features in the voice input are analyzed and extracted; emotion recognition is performed on the extracted voice emotion features to complete the emotion judgment; the voice emotion result is then mapped to the AU-based Facial Action Coding System, and the corresponding AU parameters are extracted to obtain the voice emotion animation parameters V.
(4) Learning a training sequence consisting of skeleton motion and the corresponding skin deformation through the action state space model, establishing a virtual character skeleton skinning model based on the auxiliary bone controller, driving the model with the extracted facial expression animation parameters and voice emotion animation parameters, and generating real-time interactive animation.
(a) Action State Space Model (ASSM):
The action state space model consists of three key elements (S, A, {P}), where:
S: a set of states, the facial expression states of the virtual character (e.g., happy, sad, etc.);
A: a set of actions; the parameters obtained from facial expression recognition and speech emotion recognition serve as a set of action vectors that drive the state change of the virtual character in the next frame;
P: the state transition probability, i.e., the probability that the virtual character in expression state s_t ∈ S at the current frame t transitions to other states after performing action a_t ∈ A.
The dynamic process of the ASSM is as follows: the virtual character starts in state s_0; driven by the performer's action vector a_0 ∈ A, it transitions to the next-frame state s_1 according to the probability P, then performs action a_1, and so on, giving the process shown in fig. 5.
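The ASSM dynamics can be simulated with a toy example such as the following; the states, action labels and transition table are illustrative placeholders, not values taken from the invention.

```python
import numpy as np

states = ["neutral", "happy", "sad"]            # S (placeholder expression states)
# P[s][a]: categorical distribution over next states given state s and action a.
P = {
    "neutral": {"smile_params": [0.1, 0.8, 0.1], "frown_params": [0.2, 0.1, 0.7]},
    "happy":   {"smile_params": [0.1, 0.9, 0.0], "frown_params": [0.5, 0.2, 0.3]},
    "sad":     {"smile_params": [0.3, 0.5, 0.2], "frown_params": [0.1, 0.0, 0.9]},
}

rng = np.random.default_rng(2)
s = "neutral"                                   # s_0
for t, a in enumerate(["smile_params", "smile_params", "frown_params"]):
    s = str(rng.choice(states, p=P[s][a]))      # transition according to P
    print(f"frame {t + 1}: action={a} -> state={s}")
```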
(b) Auxiliary bone framework:
Auxiliary bone placement. Given the index set P of primary bones, a global transformation matrix G_p is computed for each primary bone p ∈ P. Let v̄_i and Ḡ_p denote, respectively, the position of the i-th vertex of the static skin and the matrix of primary bone p in the rest pose, and let S_p denote the skinning transformation matrix corresponding to primary bone p. The index set of the secondary bones, called auxiliary bones, is denoted by H, and the corresponding skinning formula is:

v_i = Σ_(p ∈ P) w_(i,p) S_p v̄_i + Σ_(h ∈ H) w_(i,h) S_h v̄_i

where v_i denotes the position of the deformed skin vertex, w_(i,·) are the skinning weights, and S_h denotes the skinning matrix corresponding to the h-th auxiliary bone. The first term corresponds to skin deformation driven by the primary bones, and the second term provides additional control over the deformation using the auxiliary bones. The number of auxiliary bones is specified by the designer to balance deformation quality against computational cost.
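A minimal sketch of this skinning formula, treating the skinning transforms as 4x4 homogeneous matrices; the weights and translations are made-up values.

```python
import numpy as np

def translation(d):
    """Build a 4x4 homogeneous translation matrix (toy stand-in for a skinning transform)."""
    S = np.eye(4)
    S[:3, 3] = d
    return S

def skin_vertex(v_rest, primary, auxiliary):
    """v_rest: (3,) rest position; primary/auxiliary: lists of (weight, 4x4 matrix) pairs."""
    v_h = np.append(v_rest, 1.0)                 # homogeneous coordinates
    out = np.zeros(4)
    for w, S in primary:                         # first term: primary bones
        out += w * (S @ v_h)
    for w, S in auxiliary:                       # second term: auxiliary bones
        out += w * (S @ v_h)
    return out[:3]

v = np.array([0.0, 1.0, 0.0])
primary = [(0.7, translation([0.1, 0.0, 0.0]))]
auxiliary = [(0.3, translation([0.0, 0.2, 0.0]))]   # helper bone adds detail motion
print(skin_vertex(v, primary, auxiliary))
```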
Skinning decomposition. The skinning decomposition is divided into two sub-problems. The first sub-problem estimates the optimal skinning weights and skinning matrices that best approximate the training data at each frame t ∈ T. The second sub-problem approximates these discrete transformations with an auxiliary bone control model driven by the original skeleton.
Given the training sequence of primary-bone skinning matrices and the corresponding vertex animation, the skinning decomposition is formulated as a constrained least-squares problem that minimizes the sum of squared shape differences between the original model and the target model over the whole training data set:

min Σ_(t ∈ T) Σ_(i ∈ V) || v_i^t − ( Σ_(p ∈ P) w_(i,p) S_p^t v̄_i + Σ_(h ∈ H) w_(i,h) S_h^t v̄_i ) ||^2

subject to constraints on the skinning weights, including that each vertex is influenced by at most k bones.

In the above formula, || · ||_n denotes the ℓ_n norm and V denotes the index set of the skin vertices. The constant k limits the number of bones that may influence a single skin vertex, adjusting the balance between computational cost and accuracy.
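The block-coordinate-descent idea behind this decomposition can be illustrated with a drastically simplified toy in which per-frame vertex displacements are factorized into weights times per-bone displacement signals; the constraints of the real problem (non-negativity, the influence limit k, rigid bone transforms) are omitted here.

```python
import numpy as np

rng = np.random.default_rng(3)
n_vertices, n_bones, n_frames = 20, 3, 15
W_true = rng.random((n_vertices, n_bones))
D_true = rng.normal(size=(n_bones, n_frames))
V_anim = W_true @ D_true                            # training displacements to approximate

W = rng.random((n_vertices, n_bones))               # initial weights
for _ in range(30):                                 # alternate the two blocks
    D = np.linalg.lstsq(W, V_anim, rcond=None)[0]           # fix W, solve transforms
    W = np.linalg.lstsq(D.T, V_anim.T, rcond=None)[0].T     # fix D, solve weights
print(float(np.linalg.norm(V_anim - W @ D)))        # approximation error (near zero)
```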
Auxiliary bone controller. Assuming the auxiliary bones are driven by an original skeleton that has only spherical joints, the pose of the auxiliary bones is uniquely determined by the rotations r_p ∈ SO(3) of all primary bones. The skeleton pose is represented as a column vector:

u := Δt_0 || Δr_0 || r_1 || r_2 || … || r_|P|    (9)

where u ∈ R^(3|P|+6), || denotes the concatenation operator on vector values, |P| is the number of primary bones, Δt_0 ∈ R^3 represents the translation change of the root node, and Δr_0 ∈ SO(3) represents the orientation change of the root node.
Each auxiliary bone is attached to a primary bone as its child bone. Let φ(h) denote the primary bone corresponding to the h-th auxiliary bone and S_h the skinning matrix corresponding to the h-th auxiliary bone; S_h is then determined by the local transformation L_h of the auxiliary bone together with the global transformation of its parent bone φ(h). The local transformation L_h consists of a translation component t_h and a rotation component r_h.
The model assumes that skin deformation is modeled as the combination of a static deformation and a dynamic deformation: the former is determined by the pose of the primary skeleton, while the latter depends on the skeleton motion and the change of skin deformation over the past time steps. Thus, the skinning transformation q of an auxiliary bone is represented as the combination of a static component x and a dynamic component y, q = x + y. The static transformation x is computed from the skeletal pose, and the dynamic transformation y is controlled by a state space model that takes into account the accumulated information of previous skeletal poses and auxiliary bone transformations.
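A minimal numerical sketch of the q = x + y split is given below; the static map and the state-space matrices are illustrative placeholders rather than the learned controller of the approach described above.

```python
import numpy as np

rng = np.random.default_rng(4)
dim_u, dim_q = 9, 6                          # toy pose-vector / transform-vector sizes
Wx = 0.1 * rng.normal(size=(dim_q, dim_u))   # static map: x = Wx @ u
A = 0.8 * np.eye(dim_q)                      # decay of the dynamic component
B = 0.05 * rng.normal(size=(dim_q, dim_u))   # pose-to-dynamics coupling

y = np.zeros(dim_q)
for t in range(5):
    u = rng.normal(size=dim_u)   # per-frame primary skeleton pose vector
    x = Wx @ u                   # static component from the current pose
    y = A @ y + B @ u            # dynamic component accumulating past poses
    q = x + y                    # auxiliary-bone skinning transform (vectorized)
    print(f"frame {t}: |q| = {np.linalg.norm(q):.3f}")
```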
Claims (6)
1. A method for realizing real-time human face interactive animation based on a monocular camera is characterized by comprising the following steps:
s1, capturing a face video image through a monocular camera to obtain a face image sequence; simultaneously capturing voice input information through a voice sensor;
s2, marking human face characteristic points in the human face image sequence, and extracting human face expression animation parameters;
s3, extracting voice features from the captured voice input information and extracting voice emotion animation parameters;
S4, learning a training sequence consisting of skeleton motion and the corresponding skin deformation through the action state space model, establishing a virtual character skeleton skinning model based on the auxiliary bone controller, driving the virtual character skeleton skinning model with the extracted facial expression animation parameters and voice emotion animation parameters, and generating real-time interactive animation.
2. The method for realizing real-time interactive animation of human faces based on a monocular camera as recited in claim 1,
in step S2, a double-layer cascade regression model is used to mark the facial feature points, and the Candide-3 face model based on the Facial Action Coding System is used as the parameter carrier to extract the facial expression animation parameters.
3. The method for realizing real-time interactive animation of human faces based on a monocular camera as recited in claim 2,
the double-layer cascade regression model adopts a two-layer regression structure, wherein the first layer adopts an enhanced regression model formed by combining T weak regressors in a superimposed manner, and the second layer, for each weak regressor of the first layer, cascades K regression models into a strong regressor.
4. The method for realizing real-time interactive animation of human faces based on a monocular camera as recited in claim 1,
step S3 specifically includes:
s31, analyzing and extracting the speech emotion information features in the speech input information;
s32, performing emotion recognition on the extracted voice emotion characteristics to finish emotion judgment;
S33, mapping the voice emotion result to the AU-based Facial Action Coding System and extracting the corresponding AU parameters to obtain the voice emotion animation parameters.
5. The method for realizing real-time interactive animation of human faces based on a monocular camera as recited in claim 1,
in step S4, the action state space model is composed of three key elements (S, A, {P}), wherein:
S represents the set of facial expression states of the virtual character in each frame;
A represents a set of actions; the parameters obtained through facial expression recognition and speech emotion recognition serve as a set of action vectors that drive the state change of the virtual character in the next frame;
P is the state transition probability, representing the probability that the virtual character in expression state s_t ∈ S at the current frame t transitions to other states after performing action a_t ∈ A.
6. The method for realizing real-time interactive animation of human faces based on a monocular camera as recited in claim 1,
in step S4, the method for establishing the virtual character skeleton skinning model based on the auxiliary bone controller comprises:
a. taking the prepared skeleton skinning model of the virtual character without auxiliary bones as the original model;
b. optimizing the skinning weights of the skeleton skinning model;
c. progressively inserting auxiliary bones into the regions where the approximation error between the original model and the target model is largest;
d. solving the two sub-problems of skinning weight optimization and auxiliary bone transformation optimization with a block coordinate descent algorithm;
e. constructing the auxiliary bone controller, wherein the skinning transformation q based on the auxiliary bone controller is represented as the combination of a static component x and a dynamic component y, q = x + y; the static component x is computed from the primary skeleton pose of the original model, and the dynamic component y is controlled by the action state space model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910839412.7A CN110599573B (en) | 2019-09-03 | 2019-09-03 | Method for realizing real-time human face interactive animation based on monocular camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110599573A true CN110599573A (en) | 2019-12-20 |
CN110599573B CN110599573B (en) | 2023-04-11 |
Family
ID=68857773
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910839412.7A Active CN110599573B (en) | 2019-09-03 | 2019-09-03 | Method for realizing real-time human face interactive animation based on monocular camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110599573B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111813491A (en) * | 2020-08-19 | 2020-10-23 | 广州汽车集团股份有限公司 | Vehicle-mounted assistant anthropomorphic interaction method and device and automobile |
CN111968207A (en) * | 2020-09-25 | 2020-11-20 | 魔珐(上海)信息科技有限公司 | Animation generation method, device, system and storage medium |
CN112190921A (en) * | 2020-10-19 | 2021-01-08 | 珠海金山网络游戏科技有限公司 | Game interaction method and device |
CN112669424A (en) * | 2020-12-24 | 2021-04-16 | 科大讯飞股份有限公司 | Expression animation generation method, device, equipment and storage medium |
CN113050794A (en) * | 2021-03-24 | 2021-06-29 | 北京百度网讯科技有限公司 | Slider processing method and device for virtual image |
CN113269872A (en) * | 2021-06-01 | 2021-08-17 | 广东工业大学 | Synthetic video generation method based on three-dimensional face reconstruction and video key frame optimization |
CN113554745A (en) * | 2021-07-15 | 2021-10-26 | 电子科技大学 | Three-dimensional face reconstruction method based on image |
TWI773458B (en) * | 2020-11-25 | 2022-08-01 | 大陸商北京市商湯科技開發有限公司 | Method, device, computer equipment and storage medium for reconstruction of human face |
CN115588224A (en) * | 2022-10-14 | 2023-01-10 | 中南民族大学 | Face key point prediction method, virtual digital person generation method and device |
CN115731330A (en) * | 2022-11-16 | 2023-03-03 | 北京百度网讯科技有限公司 | Target model generation method, animation generation method, device and electronic equipment |
CN117809002A (en) * | 2024-02-29 | 2024-04-02 | 成都理工大学 | Virtual reality synchronization method based on facial expression recognition and motion capture |
CN117876549A (en) * | 2024-02-02 | 2024-04-12 | 广州一千零一动漫有限公司 | Animation generation method and system based on three-dimensional character model and motion capture |
- 2019-09-03 CN CN201910839412.7A patent/CN110599573B/en active Active
Patent Citations (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1466104A (en) * | 2002-07-03 | 2004-01-07 | 中国科学院计算技术研究所 | Statistics and rule combination based phonetic driving human face carton method |
JP2007058846A (en) * | 2005-07-27 | 2007-03-08 | Advanced Telecommunication Research Institute International | Statistic probability model creation apparatus for lip sync animation creation, parameter series compound apparatus, lip sync animation creation system, and computer program |
US20090231347A1 (en) * | 2008-03-11 | 2009-09-17 | Masanori Omote | Method and Apparatus for Providing Natural Facial Animation |
CN102473320A (en) * | 2009-07-13 | 2012-05-23 | 微软公司 | Bringing a visual representation to life via learned input from the user |
CN103093490A (en) * | 2013-02-02 | 2013-05-08 | 浙江大学 | Real-time facial animation method based on single video camera |
CN103218841A (en) * | 2013-04-26 | 2013-07-24 | 中国科学技术大学 | Three-dimensional vocal organ animation method combining physiological model and data driving model |
CN103279970A (en) * | 2013-05-10 | 2013-09-04 | 中国科学技术大学 | Real-time human face animation driving method by voice |
CN103824089A (en) * | 2014-02-17 | 2014-05-28 | 北京旷视科技有限公司 | Cascade regression-based face 3D pose recognition method |
CN103942822A (en) * | 2014-04-11 | 2014-07-23 | 浙江大学 | Facial feature point tracking and facial animation method based on single video vidicon |
CN105139438A (en) * | 2014-09-19 | 2015-12-09 | 电子科技大学 | Video face cartoon animation generation method |
JP2015092347A (en) * | 2014-11-19 | 2015-05-14 | Necプラットフォームズ株式会社 | Emotion-expressing animation face display system, method and program |
US20190082211A1 (en) * | 2016-02-10 | 2019-03-14 | Nitin Vats | Producing realistic body movement using body Images |
US20190197755A1 (en) * | 2016-02-10 | 2019-06-27 | Nitin Vats | Producing realistic talking Face with Expression using Images text and voice |
CN105787448A (en) * | 2016-02-28 | 2016-07-20 | 南京信息工程大学 | Facial shape tracking method based on space-time cascade shape regression |
CN106447785A (en) * | 2016-09-30 | 2017-02-22 | 北京奇虎科技有限公司 | Method for driving virtual character and device thereof |
CN106653052A (en) * | 2016-12-29 | 2017-05-10 | Tcl集团股份有限公司 | Virtual human face animation generation method and device |
CN106919251A (en) * | 2017-01-09 | 2017-07-04 | 重庆邮电大学 | A kind of collaborative virtual learning environment natural interactive method based on multi-modal emotion recognition |
CN107274464A (en) * | 2017-05-31 | 2017-10-20 | 珠海金山网络游戏科技有限公司 | A kind of methods, devices and systems of real-time, interactive 3D animations |
CN107886558A (en) * | 2017-11-13 | 2018-04-06 | 电子科技大学 | A kind of human face expression cartoon driving method based on RealSense |
CN109116981A (en) * | 2018-07-03 | 2019-01-01 | 北京理工大学 | A kind of mixed reality interactive system of passive touch feedback |
CN109493403A (en) * | 2018-11-13 | 2019-03-19 | 北京中科嘉宁科技有限公司 | A method of human face animation is realized based on moving cell Expression Mapping |
CN109635727A (en) * | 2018-12-11 | 2019-04-16 | 昆山优尼电能运动科技有限公司 | A kind of facial expression recognizing method and device |
CN109712627A (en) * | 2019-03-07 | 2019-05-03 | 深圳欧博思智能科技有限公司 | It is a kind of using speech trigger virtual actor's facial expression and the voice system of mouth shape cartoon |
CN110009716A (en) * | 2019-03-28 | 2019-07-12 | 网易(杭州)网络有限公司 | Generation method, device, electronic equipment and the storage medium of facial expression |
CN110070944A (en) * | 2019-05-17 | 2019-07-30 | 段新 | Training system is assessed based on virtual environment and the social function of virtual role |
Non-Patent Citations (5)
Title |
---|
CHEN CHEN et al.: "Real-time 3D facial expression control based on performance", 《2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD)》 * |
MOZAMMEL CHOWDHURY et al.: "Fuzzy rule based approach for face and facial feature extraction in biometric authentication", 《2016 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ)》 * |
SUN Chen: "Performance-driven real-time facial expression animation synthesis", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
LI Ke: "Research on facial expression animation", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
FAN Yiwen et al.: "Speech-driven face animation supporting expression details", 《Journal of Computer-Aided Design & Computer Graphics》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111813491B (en) * | 2020-08-19 | 2020-12-18 | 广州汽车集团股份有限公司 | Vehicle-mounted assistant anthropomorphic interaction method and device and automobile |
CN111813491A (en) * | 2020-08-19 | 2020-10-23 | 广州汽车集团股份有限公司 | Vehicle-mounted assistant anthropomorphic interaction method and device and automobile |
CN111968207B (en) * | 2020-09-25 | 2021-10-29 | 魔珐(上海)信息科技有限公司 | Animation generation method, device, system and storage medium |
CN111968207A (en) * | 2020-09-25 | 2020-11-20 | 魔珐(上海)信息科技有限公司 | Animation generation method, device, system and storage medium |
US11893670B2 (en) | 2020-09-25 | 2024-02-06 | Mofa (Shanghai) Information Technology Co., Ltd. | Animation generation method, apparatus and system, and storage medium |
CN112190921A (en) * | 2020-10-19 | 2021-01-08 | 珠海金山网络游戏科技有限公司 | Game interaction method and device |
TWI773458B (en) * | 2020-11-25 | 2022-08-01 | 大陸商北京市商湯科技開發有限公司 | Method, device, computer equipment and storage medium for reconstruction of human face |
CN112669424A (en) * | 2020-12-24 | 2021-04-16 | 科大讯飞股份有限公司 | Expression animation generation method, device, equipment and storage medium |
CN112669424B (en) * | 2020-12-24 | 2024-05-31 | 科大讯飞股份有限公司 | Expression animation generation method, device, equipment and storage medium |
US11842457B2 (en) | 2021-03-24 | 2023-12-12 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for processing slider for virtual character, electronic device, and storage medium |
CN113050794A (en) * | 2021-03-24 | 2021-06-29 | 北京百度网讯科技有限公司 | Slider processing method and device for virtual image |
CN113269872A (en) * | 2021-06-01 | 2021-08-17 | 广东工业大学 | Synthetic video generation method based on three-dimensional face reconstruction and video key frame optimization |
CN113554745A (en) * | 2021-07-15 | 2021-10-26 | 电子科技大学 | Three-dimensional face reconstruction method based on image |
CN113554745B (en) * | 2021-07-15 | 2023-04-07 | 电子科技大学 | Three-dimensional face reconstruction method based on image |
CN115588224A (en) * | 2022-10-14 | 2023-01-10 | 中南民族大学 | Face key point prediction method, virtual digital person generation method and device |
CN115731330A (en) * | 2022-11-16 | 2023-03-03 | 北京百度网讯科技有限公司 | Target model generation method, animation generation method, device and electronic equipment |
CN117876549A (en) * | 2024-02-02 | 2024-04-12 | 广州一千零一动漫有限公司 | Animation generation method and system based on three-dimensional character model and motion capture |
CN117809002A (en) * | 2024-02-29 | 2024-04-02 | 成都理工大学 | Virtual reality synchronization method based on facial expression recognition and motion capture |
CN117809002B (en) * | 2024-02-29 | 2024-05-14 | 成都理工大学 | Virtual reality synchronization method based on facial expression recognition and motion capture |
Also Published As
Publication number | Publication date |
---|---|
CN110599573B (en) | 2023-04-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |