CN110599573B - Method for realizing real-time human face interactive animation based on monocular camera


Info

Publication number
CN110599573B
CN110599573B (application CN201910839412.7A)
Authority
CN
China
Prior art keywords
animation
model
face
parameters
voice
Prior art date
Legal status
Active
Application number
CN201910839412.7A
Other languages
Chinese (zh)
Other versions
CN110599573A (en)
Inventor
谢宁
杨心如
申恒涛
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201910839412.7A
Publication of CN110599573A
Application granted
Publication of CN110599573B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/203D [Three Dimensional] animation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention relates to three-dimensional character animation technology and discloses a method for realizing real-time interactive face animation based on a monocular camera. The method can be summarized as follows: capture a face video stream and voice input, and extract facial expression animation parameters and speech emotion animation parameters; learn a training sequence consisting of bone motion and the corresponding skin deformation with an action state space model, build a virtual character bone-skinning model based on auxiliary bone controllers, and drive that model with the extracted facial expression and speech emotion animation parameters to generate real-time interactive animation.

Description

Method for realizing real-time human face interactive animation based on monocular camera
Technical Field
The invention relates to three-dimensional character animation technology, and in particular to a method for realizing real-time interactive face animation based on a monocular camera.
Background
In recent years, with the continuing development of computer hardware and software (for example Apple's augmented reality development kit ARKit, and Google's ARCore 1.0 with its series of supporting tools), multimedia technology has entered a period of rapid growth. At the same time, as the demands on the visual quality of human-computer interfaces keep rising, face modeling and animation play an increasingly important role in human-computer interaction. Three-dimensional facial expression animation has a very wide range of applications, such as games and entertainment, film production, human-computer interaction and advertising, and therefore has important application value and theoretical significance.
Since Parke's pioneering work on computer-generated face animation in 1972 [1], more and more researchers around the world have recognized the research and practical value of three-dimensional face modeling and animation and have made many important contributions. As shown in Fig. 1, this work mainly addresses how to represent the shape changes of the face with an effective model, how to capture facial expressions accurately and quickly, how to build a detailed three-dimensional face reconstruction in real time, how to construct a digital face avatar, and how to drive it to generate a realistic face model.
In 2013, Cao et al. [2] proposed a real-time face tracking and animation method based on three-dimensional shape regression. The method uses a monocular camera to acquire face images and consists of two stages, preprocessing and real-time operation. In the preprocessing stage, the monocular camera captures user-specific posed expressions, including a series of facial expressions and head rotations, and a facial feature point labeling algorithm is then used to semi-automatically label feature points on the user's face images. Building on such labeled face images, Cao et al. [3] constructed FaceWarehouse in 2014, a three-dimensional facial expression database for visual computing applications. The database provides a bilinear face model covering the two attributes of identity and expression, which is used to fit and generate a user-specific expression blendshape model. From that blendshape model, a three-dimensional face shape vector consisting of the three-dimensional positions of the feature points is computed for each user image acquired by the camera. The regression algorithm adopts a two-level boosted regression on shape-indexed features, and all images together with their corresponding three-dimensional face shape vectors are used as input to train a user-specific three-dimensional shape regressor. In the real-time stage, the user-specific regressor estimates, from the current and previous frames, three-dimensional shape parameters and face motion parameters, including rigid head transformation parameters and non-rigid facial expression motion parameters; these parameters are then transferred and mapped to a virtual character, driving it to produce expression animation corresponding to the face motion.
However, the above method has certain limitations: for each new user, a preprocessing stage of about 45 minutes is required to generate the user-specific expression blendshape model and three-dimensional shape regressor. In 2014, Cao et al. [4] therefore proposed a real-time face tracking algorithm based on displaced dynamic expression regression. It is also a two-level cascaded regression algorithm, but it requires no preprocessing for a new user, achieving real-time facial expression tracking and capture for arbitrary users.
The real-time face tracking and animation method based on three-dimensional shape regression proposed by Cao et al. in 2013 [2] and the real-time face tracking algorithm based on displaced dynamic expression regression proposed by them in 2014 [4] both focus on how to track large-amplitude facial motion in video accurately, efficiently and robustly, such as large expressions like frowning, laughing and mouth opening, together with rigid motion such as head rotation and translation. However, both ignore detail information on the face, such as the wrinkle lines that appear when raising the eyebrows and the secondary motion of the facial skin caused by movement; these details are important cues that help people understand expressions and make the face more expressive.
Reference:
[1] Parke F I. Computer generated animation of faces[C]//ACM Conference. ACM, 1972: 451-457.
[2] Cao C, Weng Y, Lin S, et al. 3D shape regression for real-time facial animation[J]. ACM Transactions on Graphics, 2013, 32(4): 1.
[3] Cao C, Weng Y, Zhou S, et al. FaceWarehouse: A 3D Facial Expression Database for Visual Computing[J]. IEEE Transactions on Visualization & Computer Graphics, 2014, 20(3): 413-425.
[4] Cao C, Hou Q, Zhou K. Displaced dynamic expression regression for real-time facial tracking and animation[J]. ACM Transactions on Graphics, 2014, 33(4): 1-10.
[5] Ekman P, Friesen W V. Facial Action Coding System: Manual[M]. 1978.
[6] Duffy N, Helmbold D. Boosting Methods for Regression[J]. Machine Learning, 2002, 47(2-3): 153-200.
Disclosure of the Invention
The technical problem to be solved by the invention is to provide a method for realizing real-time interactive face animation based on a monocular camera, in which animation parameters are generated by fusing facial expression capture with speech emotion recognition, and visually convincing dynamic skin deformation animation is synthesized in real time with a skeleton-based technique, so that the generated real-time animation is richer, more natural, more realistic and more characteristic of the performer.
The technical solution adopted by the invention to solve this problem is as follows:
a method for realizing real-time human face interactive animation based on a monocular camera comprises the following steps:
s1, capturing a face video image through a monocular camera to obtain a face image sequence; simultaneously capturing voice input information through a voice sensor;
s2, marking face characteristic points in the face image sequence, and extracting facial expression animation parameters;
s3, extracting voice features from the captured voice input information and extracting voice emotion animation parameters;
and S4, learning a training sequence consisting of bone motion and the corresponding skin deformation through the action state space model, establishing a virtual character bone-skinning model based on the auxiliary bone controller, driving this model with the extracted facial expression animation parameters and speech emotion animation parameters, and generating the real-time interactive animation.
As a further optimization, in step S2, a double-layer cascade regression model is adopted to locate the face feature points, and a Candide-3 face model based on the Facial Action Coding System is used as the parameter carrier to extract the facial expression animation parameters.
As a further optimization, the double-layer cascade regression model adopts a two-level regression structure: the first level is a boosted regression model formed by stacking T weak regressors; the second level builds, for each weak regressor of the first level, a strong regressor by cascading K regression models.
As a further optimization, step S3 specifically includes:
s31, analyzing and extracting the speech emotion information features in the speech input information;
s32, performing emotion recognition on the extracted voice emotion characteristics to finish emotion judgment;
and S33, mapping the speech emotion result onto the AU-based Facial Action Coding System and extracting the corresponding AU parameters to obtain the speech emotion animation parameters.
As a further optimization, in step S4, the action state space model is composed of three key elements (S, A, {P}), wherein:
S represents the set of facial expression states of the virtual character in each frame;
A represents a set of actions; the parameters obtained through facial expression recognition and speech emotion recognition are used as an action vector to drive the change of state of the virtual character in the next frame;
P is the state transition probability, representing the probability distribution over the other states reached when the virtual character, in the expression state s_t ∈ S of the current frame t, performs action a_t ∈ A.
As a further optimization, in step S4, the method for establishing a virtual character bone-skinning model based on an auxiliary bone controller includes:
a. taking the previously rigged bone-skinning model of the virtual character, without auxiliary bones, as the original model;
b. optimizing the skinning weights of the bone-skinning model;
c. gradually inserting auxiliary bones into the regions where the approximation error between the original model and the target facial model is largest;
d. solving the two sub-problems of skinning weight optimization and auxiliary bone transformation optimization with a block coordinate descent algorithm;
e. constructing the auxiliary bone controller, in which the skin transformation q based on the auxiliary bone controller is represented as the sum of a static component x and a dynamic component y, q = x + y; wherein the static component x is computed from the main skeleton pose of the original model, and the dynamic component y is controlled by the action state space model.
The invention has the beneficial effects that:
1. Facial expressions are an outward expression of human emotion, but in some special cases they cannot fully convey a character's inner emotional state. If the face is driven point-to-point only by capturing and tracking facial expression feature points as parameters, the generated facial animation is clearly not vivid enough. For example, when a character smiles and when it laughs, the facial expressions look similar, yet different words are being spoken; adding speech emotion recognition therefore makes it possible to capture the character's current change of emotional state better from the speech side. By combining facial expression capture with speech emotion recognition, the invention can greatly improve the richness, naturalness and realism of the virtual character's expression animation.
2. Because the motion of bones and muscles jointly drives the changes of the skin surface, the invention adopts a bone-skinning model in order to simulate skin motion better: auxiliary bones are added automatically by a bone-based skinning decomposition algorithm, and the main bones, which simulate the motion of the head skeleton, and the auxiliary bones, which simulate muscle motion, jointly drive the virtual character's animation.
Drawings
FIG. 1 illustrates the current state of the art of three-dimensional face animation;
FIG. 2 is a schematic diagram of the implementation of real-time human face interaction animation according to the present invention;
FIG. 3 is a schematic diagram of an enhanced regression structure;
FIG. 4 is a schematic diagram of a two-layer cascade regression structure;
FIG. 5 is a diagram illustrating the state transition process of the ASSM.
Detailed Description
The invention aims to provide a method for realizing real-time interactive face animation based on a monocular camera, in which animation parameters are generated by fusing facial expression capture with speech emotion recognition, and visually convincing dynamic skin deformation animation is synthesized in real time with a skeleton-based technique, so that the generated real-time animation is richer, more natural, more realistic and more characteristic of the performer. To this end, the scheme of the invention is mainly realized in the following aspects:
1. In terms of face motion capture:
face motion capture includes two parts: non-rigid capture of facial expressions and head rigid transformation capture. According to the unique muscle movement characteristics of the facial expressions, the facial five sense organs are coordinated as a unified whole to show each facial expression. The method uses the intermediate description method with invariance as the reliable feature representation of the facial expression recognition to make up the deficiency of the bottom layer feature in the facial expression recognition.
2. Speech emotion recognition: the performer's current emotional state is captured from the speech input, and the speech emotion animation parameters corresponding to that state are generated through speech feature extraction, dimensionality reduction, classification and related steps.
3. Target digital avatar expression: a bone-based dynamic avatar expression method is used, which learns a training sequence consisting of bone motion and the corresponding skin deformation sequence to obtain the optimal transfer of nonlinear, complex deformation, including that of soft tissue. The user's expression semantics extracted by face motion capture drive the motion of the character's head bones and procedurally control the auxiliary bones, thereby simulating the dynamic deformation of the facial skin.
In terms of specific implementation, the principle of the method for realizing real-time interactive face animation based on a monocular camera is shown in Fig. 2, and the method comprises the following steps:
(1) Capturing a face video image through a monocular camera to obtain a face image sequence; simultaneously capturing voice input information through a voice sensor;
(2) Capturing and tracking a human face: marking human face characteristic points from the captured human face image, and extracting human face expression animation parameters;
the positioning of the face feature points is a key link in face recognition, face tracking, face animation and three-dimensional face modeling. Due to factors such as human face diversity and illumination, locating human face feature points in natural environments is still a difficult challenge. The specific definition of the human face characteristic points is as follows: for a face shape containing N face feature points S = [ x ] 1 ,y 1 ,...,x N ,y N ]For an input face picture, the aim of face feature point positioning is to estimate a face feature point shape S, so that S is equal to the real shape of the face feature point
Figure GDA0004037009640000051
Difference of (2)Minimum value, S and->
Figure GDA0004037009640000052
The minimized alignment difference between can be defined as L 2 -normal form
Figure GDA0004037009640000053
This equation is used to guide the training of the face feature point locator or to evaluate the performance of the face feature point location algorithm.
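As a small illustration only (not part of the patent text), the L_2 alignment error above can be computed as in the following sketch; the flat [x_1, y_1, ..., x_N, y_N] layout follows the definition above, while the landmark count and the numbers are made up.

```python
import numpy as np

def alignment_error(S_est: np.ndarray, S_true: np.ndarray) -> float:
    """L2 alignment error between an estimated and a ground-truth face shape.

    Both shapes are flat vectors [x_1, y_1, ..., x_N, y_N].
    """
    return float(np.linalg.norm(S_est - S_true))

# Hypothetical usage with N = 68 landmarks (a common choice, not specified in the patent).
N = 68
S_true = np.random.rand(2 * N)
S_est = S_true + 0.01 * np.random.randn(2 * N)
print(alignment_error(S_est, S_true))
```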
The invention adopts an algorithm framework based on regression models to perform real-time, efficient face detection and tracking.
a) Boosted Regression
Boosted regression combines T weak regressors (R_1, ..., R_t, ..., R_T) by stacking them. For a given face sample I and an initial shape S_0, each regressor computes a shape increment from the sample features and updates the current shape in a cascaded manner:

S_t = S_{t-1} + R_t(I, S_{t-1}),   t = 1, ..., T    (1)

R_t(I, S_{t-1}) denotes the shape increment computed by regressor R_t from the input sample image I and the previous shape S_{t-1}; R_t is determined by both I and S_{t-1}, and is learned using shape-indexed features, as shown in Fig. 3.

Given N training samples \{(I_i, \hat{S}_i)\}_{i=1}^{N}, where \hat{S}_i denotes the ground-truth shape of the i-th sample image I_i, the regressors (R_1, ..., R_t, ..., R_T) are trained in turn until the training error no longer improves. Each R_t is computed by minimizing the alignment error:

R_t = \arg\min_{R} \sum_{i=1}^{N} \left\| \hat{S}_i - \left( S_i^{t-1} + R(I_i, S_i^{t-1}) \right) \right\|

where S_i^{t-1} denotes the previous shape estimate of the i-th image and the output of R_t is a shape increment.
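The following sketch (an illustration only, not the patent's implementation) shows how a trained cascade of weak regressors would be applied at test time according to equation (1); the regressors are assumed to be callables that map an image and a current shape to a shape increment.

```python
import numpy as np
from typing import Callable, Sequence

# A weak regressor maps (image, current shape) -> shape increment.
WeakRegressor = Callable[[np.ndarray, np.ndarray], np.ndarray]

def run_cascade(image: np.ndarray,
                S0: np.ndarray,
                regressors: Sequence[WeakRegressor]) -> np.ndarray:
    """Apply equation (1): S_t = S_{t-1} + R_t(I, S_{t-1}) for t = 1..T."""
    S = S0.copy()
    for R_t in regressors:
        S = S + R_t(image, S)   # each stage refines the current shape estimate
    return S
```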
b) Two-level Boosted Regression
The boosted regression above regresses the entire shape, but the large appearance variation of the input images and the coarse initial face shape make a single level of weak regressors insufficient: a single regressor is too weak, converges slowly during training, and gives poor results at test time. To make training converge faster and more stably, the invention adopts a two-level cascade structure, as shown in Fig. 4.
The first level uses the boosted regression model described above. Each first-level regressor R_t is in turn built from K regression models, R_t = (r_1, ..., r_k, ..., r_K); here r is called a primitive regressor, and the K primitive regressors are cascaded into one strong regressor. The difference between the two levels is that the input S_{t-1} is different for each first-level regressor R_t, whereas every second-level regressor r_k receives the same input; for example, all primitive regressors inside R_t take S_{t-1} as their input.
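A minimal structural sketch of this two-level cascade follows (illustrative only; how the K primitive regressors combine their outputs inside one stage is an assumption of the sketch, here their increments are simply accumulated while the stage input shape is held fixed).

```python
import numpy as np
from typing import Callable, Sequence

PrimitiveRegressor = Callable[[np.ndarray, np.ndarray], np.ndarray]

def run_two_level_cascade(image: np.ndarray,
                          S0: np.ndarray,
                          stages: Sequence[Sequence[PrimitiveRegressor]]) -> np.ndarray:
    """Two-level cascade: T stages, each a strong regressor made of K primitive regressors.

    All K primitive regressors of stage t see the same stage input S_{t-1};
    their increments are accumulated to form the stage output S_t.
    """
    S = S0.copy()
    for primitive_regressors in stages:          # first level: T stages
        stage_input = S.copy()                   # fixed input for the whole stage
        increment = np.zeros_like(S)
        for r_k in primitive_regressors:         # second level: K primitive regressors
            increment += r_k(image, stage_input)
        S = stage_input + increment
    return S
```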
When generating the facial expression animation parameters, the invention uses the AU-based Facial Action Coding System (FACS) proposed by Ekman et al. [5], which describes 44 basic action units, each controlled by an underlying facial part or muscle group. Specifically, the Candide-3 face model, which is built on the Facial Action Coding System, is used as the parameter carrier to extract the AU parameters E corresponding to the facial expression.
The Candide-3 face model is expressed as:

g = R(\bar{g} + S\sigma + A\alpha) + t

where \bar{g} denotes the base shape of the model, S is the static deformation matrix, A is the dynamic deformation matrix, \sigma is the static deformation parameter, \alpha is the dynamic deformation parameter, and R and t denote the rigid head rotation matrix and the head translation, respectively. g is the column vector of the model's vertex coordinates and represents a specific facial expression shape. The model g is therefore determined by the four parameters R, t, \alpha and \sigma.
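A small sketch of evaluating this Candide-3 parameterization with NumPy is given below; the matrix dimensions are assumptions of the sketch and the Candide-3 mesh data itself is not reproduced here.

```python
import numpy as np

def candide3_vertices(g_bar: np.ndarray,   # (3N,) base shape, vertices stacked as x, y, z
                      S: np.ndarray,       # (3N, m_s) static deformation (shape) units
                      A: np.ndarray,       # (3N, m_a) dynamic deformation (action) units
                      sigma: np.ndarray,   # (m_s,) static deformation parameters
                      alpha: np.ndarray,   # (m_a,) dynamic deformation parameters (AU intensities)
                      R: np.ndarray,       # (3, 3) rigid head rotation matrix
                      t: np.ndarray) -> np.ndarray:  # (3,) head translation
    """Evaluate g = R(g_bar + S*sigma + A*alpha) + t for every vertex."""
    deformed = (g_bar + S @ sigma + A @ alpha).reshape(-1, 3)  # apply shape and action units
    return (deformed @ R.T + t).reshape(-1)                    # rigid transform, flatten back
```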
(3) Extracting voice features from the captured voice input information and extracting the speech emotion animation parameters;
Analyzing and extracting the speech emotion features from the speech input information; performing emotion recognition on the extracted features to complete the emotion judgment; and mapping the speech emotion result onto the AU-based Facial Action Coding System to extract the corresponding AU parameters and obtain the speech emotion animation parameter V.
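The sketch below illustrates this step; the concrete speech features, the classifier and the emotion-to-AU table are not specified by the patent, so all of them are placeholder assumptions.

```python
import numpy as np

# Hypothetical emotion -> AU-parameter table; the patent does not give a concrete mapping,
# so these AU indices and intensities are placeholders for illustration only.
EMOTION_TO_AU = {
    "happy": {12: 0.8, 6: 0.5},   # lip corner puller, cheek raiser
    "sad":   {15: 0.7, 1: 0.4},   # lip corner depressor, inner brow raiser
    "angry": {4: 0.8, 7: 0.5},    # brow lowerer, lid tightener
}

def speech_emotion_animation_params(features: np.ndarray, classifier) -> dict:
    """Sketch of step S3: speech features -> emotion label -> AU parameters V.

    `features` is an already extracted, dimensionality-reduced speech feature vector;
    `classifier` is any fitted classifier with a scikit-learn style predict().
    """
    emotion = classifier.predict(features.reshape(1, -1))[0]  # emotion judgment (S32)
    return EMOTION_TO_AU.get(emotion, {})                     # map result to AU parameters (S33)
```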
(4) Learning a training sequence consisting of bone motion and corresponding skin deformation through an action state space model, establishing a virtual character bone skinning model based on an auxiliary bone controller, driving the virtual character bone skinning model through the extracted facial expression animation parameters and voice emotion animation parameters, and generating real-time interactive animation.
(a) Action State Space Model (ASSM):
the action state space model consists of three key elements (S, a, { P }), where:
s: representing a state set, facial expression states of the virtual character (such as happy, sad, etc.);
a: representing a group of action sets, taking parameters obtained through facial expression recognition and speech emotion recognition as a group of action vectors, and driving the change state of the next frame of virtual character;
p is state transition probability which represents the expression state s of the virtual character in the current frame t t E.g. S, by performing action a t e.A and then to other states.
The dynamic process of ASSM is as follows: virtual character in state s 0 Motion vector a of the performer 0 E is driven by A, and the state is transferred to the next frame state s according to the probability P 1 Then perform action a 1 8230, and so on we can get the process shown in fig. 5.
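The sketch below plays out one such transition step; the expression states and the transition model are hypothetical stand-ins, since the patent does not specify how P is obtained.

```python
import numpy as np

STATES = ["neutral", "happy", "sad"]   # hypothetical facial expression states in S

def assm_step(state: str, action: np.ndarray, P) -> str:
    """One ASSM transition: in state s_t, perform action a_t and sample s_{t+1} ~ P(. | s_t, a_t).

    `P(state, action)` is assumed to return a probability vector over STATES.
    """
    probs = P(state, action)
    return str(np.random.choice(STATES, p=probs))

# Placeholder transition model: mostly stay in the current state, ignore the action.
def dummy_P(state: str, action: np.ndarray) -> np.ndarray:
    probs = np.full(len(STATES), 0.1)
    probs[STATES.index(state)] = 0.8
    return probs / probs.sum()

print(assm_step("neutral", np.zeros(4), dummy_P))
```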
(b) Auxiliary bone framework:
Auxiliary bone placement. Let P be the index set of the main bones, used to compute the global transformation matrix G_p of each main bone p ∈ P. Let \bar{v}_i and \bar{G}_p denote the position of the i-th vertex of the static skin and the main bone matrix in the rest pose, so that G_p \bar{G}_p^{-1} is the skin transformation matrix corresponding to the main bone. The index set of the secondary bones, called auxiliary bones, is denoted by H, and the corresponding skinning formula is

v_i = \sum_{p \in P} w_{i,p} G_p \bar{G}_p^{-1} \bar{v}_i + \sum_{h \in H} w_{i,h} S_h \bar{v}_i

where v_i denotes the position of the deformed skin vertex, w_{i,p} and w_{i,h} are the skinning weights, and S_h denotes the skin matrix corresponding to the h-th auxiliary bone. The first term of this equation corresponds to the skin deformation driven by the main bones, and the second term provides additional control over the deformation through the auxiliary bones. The number of auxiliary bones is given by the designer to balance deformation quality against computational cost.
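A compact sketch of evaluating this skinning formula with NumPy follows; homogeneous 4x4 transforms and dense weight matrices are assumptions of the sketch.

```python
import numpy as np

def skin_vertices(v_bar: np.ndarray,      # (N, 3) rest-pose vertex positions
                  W_main: np.ndarray,     # (N, |P|) skinning weights for main bones
                  G: np.ndarray,          # (|P|, 4, 4) posed main-bone global transforms
                  G_bar: np.ndarray,      # (|P|, 4, 4) rest-pose main-bone global transforms
                  W_aux: np.ndarray,      # (N, |H|) skinning weights for auxiliary bones
                  S: np.ndarray) -> np.ndarray:  # (|H|, 4, 4) auxiliary-bone skin transforms
    """v_i = sum_p w_ip G_p G_bar_p^-1 v_bar_i + sum_h w_ih S_h v_bar_i."""
    v_h = np.concatenate([v_bar, np.ones((len(v_bar), 1))], axis=1)      # homogeneous (N, 4)
    main_T = G @ np.linalg.inv(G_bar)                                    # (|P|, 4, 4)
    main_part = np.einsum('ip,pab,ib->ia', W_main, main_T, v_h)[:, :3]   # main-bone term
    aux_part = np.einsum('ih,hab,ib->ia', W_aux, S, v_h)[:, :3]          # auxiliary-bone term
    return main_part + aux_part
```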
Skinning decomposition. The skinning decomposition is split into two sub-problems. The first sub-problem estimates the optimal skinning weights w_{i,h} and skin matrices S_h^t that best approximate the training data at every frame t ∈ T. The second sub-problem approximates, with the auxiliary bone control model, the discrete transformations S_h^t obtained for the original skeleton.

Given the main skeleton skin matrices G_p^t and the corresponding vertex animation v_i^t, the skinning decomposition is formulated as a constrained least-squares problem that minimizes, over the whole training data set, the sum of squared shape differences between the original model and the target model:

\min_{w,\, S} \sum_{t \in T} \sum_{i \in V} \left\| v_i^t - \left( \sum_{p \in P} w_{i,p} G_p^t \bar{G}_p^{-1} \bar{v}_i + \sum_{h \in H} w_{i,h} S_h^t \bar{v}_i \right) \right\|_2^2

subject to

w_{i,j} \ge 0, \qquad \sum_{j} w_{i,j} = 1, \qquad \| w_i \|_0 \le k

In the above formulation, \| \cdot \|_n denotes the l_n norm and V denotes the index set of the skin vertices. The constant k limits the number of bones that can influence each skin vertex, balancing computational cost against accuracy.
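The block coordinate descent used for the two sub-problems can be sketched structurally as follows; the inner solvers are passed in as callables because the patent does not spell out their closed forms, so this is only an illustration of the alternating scheme.

```python
import numpy as np
from typing import Callable, Tuple

def skinning_decomposition(v_bar: np.ndarray,           # (N, 3) rest-pose vertices
                           targets: np.ndarray,         # (F, N, 3) per-frame target vertices
                           W0: np.ndarray,              # (N, B) initial skinning weights
                           S0: np.ndarray,              # (F, B, 4, 4) initial bone transforms
                           solve_transforms: Callable,  # (W, v_bar, targets) -> S
                           solve_weights: Callable,     # (S, v_bar, targets) -> W
                           reconstruct: Callable,       # (W, S, v_bar) -> (F, N, 3)
                           iters: int = 10) -> Tuple[np.ndarray, np.ndarray]:
    """Block coordinate descent: alternately fix the weights and solve for the transforms,
    then fix the transforms and solve for the weights, until the squared error stops improving."""
    W, S = W0, S0
    prev_err = np.inf
    for _ in range(iters):
        S = solve_transforms(W, v_bar, targets)   # block 1: bone transform update
        W = solve_weights(S, v_bar, targets)      # block 2: skinning weight update
        err = float(np.sum((reconstruct(W, S, v_bar) - targets) ** 2))
        if prev_err - err < 1e-8:                 # no further improvement
            break
        prev_err = err
    return W, S
```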
Auxiliary bone controller. Assuming the auxiliary bones are driven by an original skeleton that has only spherical joints, the pose of an auxiliary bone is uniquely determined by the rotations r_p ∈ SO(3) of all rotational components of the main skeleton. The pose is written as a column vector:

u := \Delta t_0 \| \Delta r_0 \| r_1 \| r_2 \| \cdots \| r_{|P|}    (9)

where u ∈ R^{3|P|+6}, \| denotes the concatenation operator on vector values, |P| is the number of main bones, \Delta t_0 ∈ R^3 denotes the translational change of the root node, and \Delta r_0 ∈ SO(3) denotes the change of orientation of the root node.

Each auxiliary bone is attached to a main bone as its child bone. Let \Phi(h) denote the main bone corresponding to the h-th auxiliary bone and S_h the skin matrix of the h-th auxiliary bone; S_h is then given by the local transformation L_h composed with the global transformation of \Phi(h). The local transformation L_h consists of a translational component t_h and a rotational component r_h.

The model assumes that the skin deformation is modeled as the combination of a static deformation and a dynamic deformation: the former is determined by the pose of the main skeleton, while the latter depends on the skeleton motion and the change of the skin deformation over past time steps. Accordingly, the skin transformation q of an auxiliary bone is represented as the sum of a static component x and a dynamic component y, q = x + y. The static transformation x is computed from the skeletal pose, and the dynamic transformation y is controlled using a state space model that takes into account accumulated information from previous skeletal poses and auxiliary bone transformations.
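A minimal sketch of the static-plus-dynamic decomposition q = x + y is given below; the linear maps and the simple state-space update are placeholder choices, since the patent only fixes the overall structure.

```python
import numpy as np

class AuxiliaryBoneController:
    """Skin transformation of one auxiliary bone: q = x + y.

    x is a static function of the current skeleton pose vector u (equation (9));
    y follows a simple linear state-space update driven by u and by its own past values.
    The matrices below are illustrative parameters, not values given by the patent.
    """

    def __init__(self, dim_u: int, dim_q: int, rng=np.random.default_rng(0)):
        self.Wx = 0.1 * rng.standard_normal((dim_q, dim_u))     # static map: u -> x
        self.A = 0.8 * np.eye(dim_q)                            # dynamic state feedback
        self.B = 0.1 * rng.standard_normal((dim_q, dim_u))      # dynamic input map
        self.y = np.zeros(dim_q)                                # accumulated dynamic component

    def step(self, u: np.ndarray) -> np.ndarray:
        x = self.Wx @ u                          # static component from the skeleton pose
        self.y = self.A @ self.y + self.B @ u    # dynamic component with memory of past steps
        return x + self.y                        # q = x + y

controller = AuxiliaryBoneController(dim_u=12, dim_q=6)
q = controller.step(np.zeros(12))
```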

Claims (5)

1. A method for realizing real-time human face interactive animation based on a monocular camera is characterized by comprising the following steps:
s1, capturing a face video image through a monocular camera to obtain a face image sequence; simultaneously capturing voice input information through a voice sensor;
s2, marking face characteristic points in the face image sequence, and extracting facial expression animation parameters;
s3, extracting voice features from the captured voice input information and extracting voice emotion animation parameters;
s4, learning a training sequence consisting of skeleton motion and corresponding skin deformation through the action state space model, establishing a virtual character skeleton skin model based on an auxiliary bone controller, driving the virtual character skeleton skin model through the extracted facial expression animation parameters and voice emotion animation parameters, and generating real-time interactive animation;
in step S4, the action state space model is composed of three key elements (S, A, {P}), wherein:
S represents the set of facial expression states of the virtual character in each frame;
A represents a set of actions; the parameters obtained through facial expression recognition and speech emotion recognition are used as an action vector to drive the change of state of the virtual character in the next frame;
P is the state transition probability, representing the probability distribution over the other states reached when the virtual character, in the expression state s_t ∈ S of the current frame t, performs action a_t ∈ A.
2. The method for realizing the real-time interactive animation of the human face based on the monocular camera as recited in claim 1,
in step S2, a double-layer cascade regression model is adopted to locate the face feature points, and a Candide-3 face model based on the Facial Action Coding System is used as the parameter carrier to extract the facial expression animation parameters.
3. The method for realizing real-time interactive animation of human faces based on a monocular camera as recited in claim 2,
the double-layer cascade regression model adopts a two-level regression structure: the first level is a boosted regression model formed by stacking T weak regressors; the second level builds, for each weak regressor of the first level, a strong regressor by cascading K regression models.
4. The method for realizing real-time interactive animation of human faces based on a monocular camera as recited in claim 1,
step S3 specifically includes:
s31, analyzing and extracting the speech emotion information features in the speech input information;
s32, performing emotion recognition on the extracted voice emotion characteristics to finish emotion judgment;
and S33, mapping the speech emotion result onto the AU-based Facial Action Coding System and extracting the corresponding AU parameters to obtain the speech emotion animation parameters.
5. The method for realizing real-time interactive animation of human faces based on a monocular camera as recited in claim 1,
in step S4, the method for establishing a virtual character skeleton skin model based on an auxiliary bone controller includes:
a. taking the previously rigged bone-skinning model of the virtual character, without auxiliary bones, as the original model;
b. optimizing the skinning weights of the bone-skinning model;
c. gradually inserting auxiliary bones into the regions where the approximation error between the original model and the target facial model is largest;
d. solving the two sub-problems of skinning weight optimization and auxiliary bone transformation optimization with a block coordinate descent algorithm;
e. constructing the auxiliary bone controller, in which the skin transformation q based on the auxiliary bone controller is represented as the sum of a static component x and a dynamic component y, q = x + y; wherein the static component x is computed from the main skeleton pose of the original model, and the dynamic component y is controlled by the action state space model.
CN201910839412.7A 2019-09-03 2019-09-03 Method for realizing real-time human face interactive animation based on monocular camera Active CN110599573B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910839412.7A CN110599573B (en) 2019-09-03 2019-09-03 Method for realizing real-time human face interactive animation based on monocular camera


Publications (2)

Publication Number Publication Date
CN110599573A CN110599573A (en) 2019-12-20
CN110599573B true CN110599573B (en) 2023-04-11

Family

ID=68857773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910839412.7A Active CN110599573B (en) 2019-09-03 2019-09-03 Method for realizing real-time human face interactive animation based on monocular camera

Country Status (1)

Country Link
CN (1) CN110599573B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813491B (en) * 2020-08-19 2020-12-18 广州汽车集团股份有限公司 Vehicle-mounted assistant anthropomorphic interaction method and device and automobile
CN111968207B (en) * 2020-09-25 2021-10-29 魔珐(上海)信息科技有限公司 Animation generation method, device, system and storage medium
CN112190921A (en) * 2020-10-19 2021-01-08 珠海金山网络游戏科技有限公司 Game interaction method and device
CN112419454B (en) * 2020-11-25 2023-11-28 北京市商汤科技开发有限公司 Face reconstruction method, device, computer equipment and storage medium
CN112669424A (en) * 2020-12-24 2021-04-16 科大讯飞股份有限公司 Expression animation generation method, device, equipment and storage medium
CN113050794A (en) 2021-03-24 2021-06-29 北京百度网讯科技有限公司 Slider processing method and device for virtual image
CN113269872A (en) * 2021-06-01 2021-08-17 广东工业大学 Synthetic video generation method based on three-dimensional face reconstruction and video key frame optimization
CN113554745B (en) * 2021-07-15 2023-04-07 电子科技大学 Three-dimensional face reconstruction method based on image
CN115588224B (en) * 2022-10-14 2023-07-21 中南民族大学 Virtual digital person generation method and device based on face key point prediction
CN115731330A (en) * 2022-11-16 2023-03-03 北京百度网讯科技有限公司 Target model generation method, animation generation method, device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102473320A (en) * 2009-07-13 2012-05-23 微软公司 Bringing a visual representation to life via learned input from the user
CN106919251A (en) * 2017-01-09 2017-07-04 重庆邮电大学 A kind of collaborative virtual learning environment natural interactive method based on multi-modal emotion recognition

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1320497C (en) * 2002-07-03 2007-06-06 中国科学院计算技术研究所 Statistics and rule combination based phonetic driving human face carton method
JP4631078B2 (en) * 2005-07-27 2011-02-16 株式会社国際電気通信基礎技術研究所 Statistical probability model creation device, parameter sequence synthesis device, lip sync animation creation system, and computer program for creating lip sync animation
US8743125B2 (en) * 2008-03-11 2014-06-03 Sony Computer Entertainment Inc. Method and apparatus for providing natural facial animation
CN103093490B (en) * 2013-02-02 2015-08-26 浙江大学 Based on the real-time face animation method of single video camera
CN103218841B (en) * 2013-04-26 2016-01-27 中国科学技术大学 In conjunction with the three-dimensional vocal organs animation method of physiological models and data-driven model
CN103279970B (en) * 2013-05-10 2016-12-28 中国科学技术大学 A kind of method of real-time voice-driven human face animation
CN103824089B (en) * 2014-02-17 2017-05-03 北京旷视科技有限公司 Cascade regression-based face 3D pose recognition method
CN103942822B (en) * 2014-04-11 2017-02-01 浙江大学 Facial feature point tracking and facial animation method based on single video vidicon
CN105139438B (en) * 2014-09-19 2018-01-12 电子科技大学 video human face cartoon generation method
JP2015092347A (en) * 2014-11-19 2015-05-14 Necプラットフォームズ株式会社 Emotion-expressing animation face display system, method and program
US11736756B2 (en) * 2016-02-10 2023-08-22 Nitin Vats Producing realistic body movement using body images
WO2017137947A1 (en) * 2016-02-10 2017-08-17 Vats Nitin Producing realistic talking face with expression using images text and voice
CN105787448A (en) * 2016-02-28 2016-07-20 南京信息工程大学 Facial shape tracking method based on space-time cascade shape regression
CN106447785A (en) * 2016-09-30 2017-02-22 北京奇虎科技有限公司 Method for driving virtual character and device thereof
CN106653052B (en) * 2016-12-29 2020-10-16 Tcl科技集团股份有限公司 Virtual human face animation generation method and device
CN107274464A (en) * 2017-05-31 2017-10-20 珠海金山网络游戏科技有限公司 A kind of methods, devices and systems of real-time, interactive 3D animations
CN107886558A (en) * 2017-11-13 2018-04-06 电子科技大学 A kind of human face expression cartoon driving method based on RealSense
CN109116981A (en) * 2018-07-03 2019-01-01 北京理工大学 A kind of mixed reality interactive system of passive touch feedback
CN109493403A (en) * 2018-11-13 2019-03-19 北京中科嘉宁科技有限公司 A method of human face animation is realized based on moving cell Expression Mapping
CN109635727A (en) * 2018-12-11 2019-04-16 昆山优尼电能运动科技有限公司 A kind of facial expression recognizing method and device
CN109712627A (en) * 2019-03-07 2019-05-03 深圳欧博思智能科技有限公司 It is a kind of using speech trigger virtual actor's facial expression and the voice system of mouth shape cartoon
CN110009716B (en) * 2019-03-28 2023-09-26 网易(杭州)网络有限公司 Facial expression generating method and device, electronic equipment and storage medium
CN110070944B (en) * 2019-05-17 2023-12-08 段新 Social function assessment training system based on virtual environment and virtual roles


Also Published As

Publication number Publication date
CN110599573A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110599573B (en) Method for realizing real-time human face interactive animation based on monocular camera
Guo et al. Ad-nerf: Audio driven neural radiance fields for talking head synthesis
Magnenat-Thalmann et al. Handbook of virtual humans
Ersotelos et al. Building highly realistic facial modeling and animation: a survey
CN108288072A (en) A kind of facial expression synthetic method based on generation confrontation network
CN107274464A (en) A kind of methods, devices and systems of real-time, interactive 3D animations
Li et al. A survey of computer facial animation techniques
CN114170353A (en) Multi-condition control dance generation method and system based on neural network
CN108908353B (en) Robot expression simulation method and device based on smooth constraint reverse mechanical model
Krishna SignPose: Sign language animation through 3D pose lifting
CN114967937B (en) Virtual human motion generation method and system
CN110853131A (en) Virtual video data generation method for behavior recognition
Kobayashi et al. Motion Capture Dataset for Practical Use of AI-based Motion Editing and Stylization
CN115914660A (en) Method for controlling actions and facial expressions of digital people in meta universe and live broadcast
Victor et al. Pose Metrics: a New Paradigm for Character Motion Edition
Xia et al. Technology based on interactive theatre performance production and performance platform
Guo et al. Scene Construction and Application of Panoramic Virtual Simulation in Interactive Dance Teaching Based on Artificial Intelligence Technology
CN117496072B (en) Three-dimensional digital person generation and interaction method and system
Zhang et al. Implementation of Animation Character Action Design and Data Mining Technology Based on CAD Data
Tian et al. Augmented Reality Animation Image Information Extraction and Modeling Based on Generative Adversarial Network
de Aguiar et al. Representing and manipulating mesh-based character animations
Jia et al. A Novel Training Quantitative Evaluation Method Based on Virtual Reality
Johnson A Survey of Computer Graphics Facial Animation Methods: Comparing Traditional Approaches to Machine Learning Methods
Venkatrayappa et al. Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications
Gao The Application of Virtual Technology Based on Posture Recognition in Art Design Teaching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant