CN1920886A - Video flow based three-dimensional dynamic human face expression model construction method - Google Patents


Info

Publication number
CN1920886A
CN1920886A, CNA2006100533938A, CN200610053393A
Authority
CN
China
Prior art keywords
dimensional
frame
face
video
Prior art date
Legal status
Granted
Application number
CNA2006100533938A
Other languages
Chinese (zh)
Other versions
CN100416612C (en)
Inventor
庄越挺
张剑
肖俊
王玉顺
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CNB2006100533938A priority Critical patent/CN100416612C/en
Publication of CN1920886A publication Critical patent/CN1920886A/en
Application granted granted Critical
Publication of CN100416612C publication Critical patent/CN100416612C/en
Status: Expired - Fee Related

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a method for constructing a three-dimensional dynamic facial expression model from a video stream, which recovers three-dimensional facial expressions from the input video stream. The method comprises: (1) marking facial feature points on the first frame of the input video; (2) tracking the feature points with an affine-corrected optical flow method; (3) recovering three-dimensional motion data from the two-dimensional tracking data by a factorization method; (4) adapting a generic face model to the reconstructed three-dimensional data to generate a personalized face and its dynamic expression motion; (5) compressing the original video with the eigenface technique; (6) reconstructing the input video from the eigenfaces and projecting it as dynamic texture to synthesize a realistic virtual appearance. The invention has high temporal and spatial efficiency and high practical value.

Description

Three-dimensional dynamic human face expression model construction method based on video streams
Technical field
The present invention relates to the intersection of computer vision and computer graphics, and in particular to a method for constructing three-dimensional dynamic facial expression models from video streams.
Background art
Personalized face modeling and realistic expression animation have long been challenging problems, with wide application in virtual reality, film production, and entertainment. Since Parke's pioneering work in 1972 [1], research on face and expression modeling has made remarkable progress. According to the input data required, modeling approaches fall into three main classes: modeling from captured three-dimensional sample data; modeling from images; and modeling from video streams. Blanz et al. [2] learn the statistical structure of a three-dimensional face database and build a personalized face model from a single input face image; this requires an expensive laser scanner to acquire the database in advance, the data volume is very large, and the computational complexity is high. Deng et al. [3] capture the motion of a marked real face, extract independent expression parameters, and synthesize expressions; this likewise requires costly motion-capture equipment, and markers must be placed on the performer's face. Documents [4,5,6,7] extract three-dimensional information from images to reconstruct face models. Pighin et al. [4] reconstruct a face model from several photographs, but feature points must be marked manually on every image, and expression generation also requires much manual interaction. Document [5] models the face from standard orthogonal images and drives expressions with muscle vectors; its drawbacks are that the muscle vector positions are hard to set correctly and the orthogonality constraint is too strict, so the method lacks generality. Document [6] models the face from two frontal images; the camera must be calibrated in advance, few feature points are reconstructed, and a face mesh generated merely by interpolating feature points can hardly capture local facial features accurately. Document [7] also uses orthogonal images and obtains the face model by a progressive refinement optimization, with the same drawback of overly strict constraints. Li Zhang et al. [8] use structured light and stereo vision to reconstruct facial expressions from video streams; this requires hardware including a structured-light projector, the scanned models need tedious manual preprocessing, and the method is demanding on ambient illumination. Zicheng Liu et al. [9] propose a highly significant method, namely reconstructing a three-dimensional face model from uncalibrated video streams; the method places few demands on the input data, but corner detection and matching are not robust in themselves and are easily affected by illumination, which may cause reconstruction to fail.
Traditional facial animation methods mainly consider the geometric deformation of the face model [5,6,7,9], mapping texture to model vertices, so that when the mesh deforms the texture stretches and distorts with it; traditional texture mapping can therefore be regarded as static. Yet the human face is a highly non-rigid surface: facial expressions involve not only small geometric deformations of the surface (such as wrinkles) but also changes in skin color and appearance, and it is very difficult to simulate these variations purely through geometric deformation. In this sense, traditional texture mapping is not sufficient to produce facial expressions with a high degree of realism.
[1] Parke F. Computer generated animation of faces. Proceedings of the ACM Annual Conference, Boston, 1972: 451-457.
[2] Blanz V, Vetter T. A morphable model for the synthesis of 3D faces. Proceedings of SIGGRAPH '99, Los Angeles, 1999: 187-194.
[3] Deng Z, Bulut M, Neumann U, Narayanan S. Automatic dynamic expression synthesis for speech animation. Proceedings of IEEE Computer Animation and Social Agents, Geneva, 2004: 267-274.
[4] Pighin F, Hecker J, Lischinski D, Szeliski R, Salesin D H. Synthesizing realistic facial expressions from photographs. Proceedings of SIGGRAPH '98, Orlando, Florida, 1998: 75-84.
[5] Mei L, Bao H J, Peng Q S. Quick customization of particular human face and muscle-driven expression animation. Journal of Computer-Aided Design & Computer Graphics, 2001, 13(12): 1077-1082.
[6] Wang K, Zheng N N. 3D face modeling based on SFM algorithm. Chinese Journal of Computers, 2005, 28(6): 1048-1053.
[7] Su C Y, Zhuang Y T, Huang L, Wu F. Analysis-by-synthesis approach for facial modeling based on orthogonal images. Journal of Zhejiang University (Engineering Science), 2005, 39(2): 175-179.
[8] Zhang L, Snavely N, Curless B, Seitz S. Spacetime faces: high resolution capture for modeling and animation. ACM Transactions on Graphics, 2004, 23(3): 548-558.
[9] Liu Z C, Zhang Z Y, Jacobs C, Cohen M. Rapid modeling of animated faces from video images. ACM Multimedia, Los Angeles, 2000: 475-476.
Summary of the invention
The object of the present invention is to provide a method for constructing three-dimensional dynamic facial expression models from video streams.
The steps of the method are:
1) manually marking the positions of the facial feature points on the first frame of the input uncalibrated monocular video;
2) tracking the feature points marked on the first frame with an affine-corrected optical flow method, determining the position changes of these feature points in every frame of the video sequence;
3) recovering three-dimensional motion data from the two-dimensional tracking data with a factorization-based method;
4) averaging the first 3 frames of the three-dimensional motion data and adapting a generic three-dimensional face model to this mean to produce a personalized three-dimensional face model;
5) driving the personalized three-dimensional face model with the remaining three-dimensional motion data to generate dynamic three-dimensional facial expressions;
6) compressing the input video with an eigenface-based video compression method to reduce storage space;
7) reconstructing the input video from the eigenfaces and, in combination with the two-dimensional tracking data, automatically applying dynamic texture mapping to the dynamic three-dimensional face, generating a realistic three-dimensional facial expression sequence.
The facial feature points: they are defined according to the face definition parameters and facial animation parameters of the MPEG-4 standard. There are 40 of them, distributed along the facial contour, the eyes, the lip edges, and similar positions. They not only reflect the facial topology well but also describe the facial expression motion: while the face holds a neutral expression it can essentially be regarded as a rigid body, and the feature points then define the facial shape features; when the face performs an expression motion, the feature points define the facial animation parameters.
The affine-corrected optical flow method: the accuracy of traditional optical flow tracking is corrected by computing the affine transformation between video frames. Traditional optical flow tracking searches for the offset that minimizes the matching error over a neighborhood of the corresponding feature point: given two adjacent video frames $I_1$ and $I_2$, let the position of a feature point in $I_1$ be $f = (u, v)^T$ and its optical flow be $p = (p_u, p_v)^T$; the position of the corresponding point in $I_2$ is then $f + p$, and $p$ is obtained by minimizing

$$\sum_{f_t \in T} \left( I_2(f_t + p) - I_1(f_t) \right)^2$$

where $T$ is a square region centered at $f$. When the pose of the face or the illumination changes markedly between frames, tracking of the points on the nose, the chin, and the top of the head becomes very poor, while the points at the eye corners, the hairline, the mouth, and the cheeks are still tracked accurately. Let $P_1^a$ and $P_2^a$ therefore denote the accurately tracked feature points in $I_1$ and $I_2$; by assumption they are related by an affine transformation $w$, namely $P_2^a = w P_1^a = A P_1^a + B$. Applying $w$ to the feature points $P_1^{ia}$ of $I_1$ that need correction gives $P_w = w P_1^{ia}$. Let $P_o$ be the traditional optical flow tracking result of $P_1^{ia}$ in $I_2$; the tracking result of these feature points is then corrected as $P = \arg\min \left( |P - P_o|^2 + |P - P_w|^2 \right)$, i.e. $P_w$ is used as a constraint to further optimize $P_o$. A sketch of this correction appears below.
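As an illustration only, here is a minimal Python/NumPy sketch of the correction under stated assumptions: the accurately tracked point pairs and the plain optical-flow results are supplied as arrays, and all function names are ours, not the patent's.

```python
import numpy as np

def estimate_affine(p1_acc, p2_acc):
    """Least-squares affine transform w(x) = A x + B mapping the accurately
    tracked points of frame 1 (p1_acc, shape (k, 2)) onto frame 2 (p2_acc)."""
    k = p1_acc.shape[0]
    G = np.zeros((2 * k, 6))      # each point contributes two equations
    G[0::2, 0:2] = p1_acc
    G[0::2, 4] = 1.0
    G[1::2, 2:4] = p1_acc
    G[1::2, 5] = 1.0
    b = p2_acc.reshape(-1)        # (x1, y1, x2, y2, ...)
    x, *_ = np.linalg.lstsq(G, b, rcond=None)
    A = np.array([[x[0], x[1]], [x[2], x[3]]])
    B = np.array([x[4], x[5]])
    return A, B

def correct_flow(p_o, p1_rest, A, B):
    """Correct the plain optical-flow results P_o of the remaining feature
    points using the affine prediction P_w = w(P_1).  Minimizing
    |P - P_o|^2 + |P - P_w|^2 over P gives the midpoint P = (P_o + P_w)/2."""
    p_w = p1_rest @ A.T + B
    return 0.5 * (p_o + p_w)
```

Note that the stated objective has a closed-form midpoint solution, so the correction costs almost nothing per frame; in practice one could also weight the two terms to trust the affine prediction more under strong illumination change.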
The factorization-based method: the video imaging process is modeled by weak perspective projection. Under this method, a non-rigid shape is regarded as a weighted linear combination of a set of shape bases; the shape bases are a set of basic three-dimensional shapes from which any three-dimensional shape can be composed. Given the tracking data, the feature points in every frame are described by the weak perspective projection model as

$$P_{fn} = (x, y)_{fn}^T = \left[ e_f c_{f1} R_f \; \cdots \; e_f c_{fK} R_f \right] \cdot \left[ S_{1n} \; \cdots \; S_{Kn} \right]^T + t_f, \quad f = 1, \dots, F, \; n = 1, \dots, N$$

where $F$ and $N$ are the numbers of frames and of feature points, $e_f$ is the non-zero weak perspective scale factor, $S_{1n} \dots S_{Kn}$ are the $K$ shape bases, $c_{f1} \dots c_{fK}$ are the combination weights of the shape bases, $t_f$ is the translation, $R_f$ denotes the first two rows of the camera projection matrix of frame $f$, and $P_{fn}$ denotes the $n$-th feature point in frame $f$. Regarding the $x, y$ coordinates of each feature point in each frame as a $2 \times 1$ matrix, all tracking data form a $2F \times N$ matrix $P$ with $P = MS + T$, where $M$ is the generalized camera projection matrix, $S$ stacks the $K$ shape bases, and $T$ is the translation matrix:

$$M = \begin{bmatrix} e_1 c_{11} R_1 & \cdots & e_1 c_{1K} R_1 \\ \vdots & \ddots & \vdots \\ e_F c_{F1} R_F & \cdots & e_F c_{FK} R_F \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & \cdots & S_{1N} \\ \vdots & \ddots & \vdots \\ S_{K1} & \cdots & S_{KN} \end{bmatrix}$$

Subtracting the translation matrix yields the canonical form $P = MS$. Singular value decomposition of $P$ gives a rank-$3K$ approximation $\tilde{P} = \tilde{M} \cdot \tilde{S}$, where $K$ can be determined as $\operatorname{rank}(P)/3$. This decomposition is not unique: for any invertible $3K \times 3K$ matrix $A$, $\tilde{P} = \tilde{M} A \cdot A^{-1} \tilde{S}$ also holds. Once $A$ is known, the generalized camera projection matrix and the shape bases can be expressed as $M = \tilde{M} \cdot A$ and $S = A^{-1} \cdot \tilde{S}$. To compute $A$, the orthonormality of the projection matrix is first used as a constraint: let $Q = A A^T$, so that $M M^T = \tilde{M} Q \tilde{M}^T$, and let $\tilde{M}_i$ denote the $i$-th row of $\tilde{M}$, rows $2i-1$ and $2i$ belonging to frame $i$. The orthonormality of the projection matrix then gives the two orthogonality constraints

$$\tilde{M}_{2i-1} Q \tilde{M}_{2i-1}^T = \tilde{M}_{2i} Q \tilde{M}_{2i}^T, \qquad \tilde{M}_{2i-1} Q \tilde{M}_{2i}^T = 0.$$

Next, shape-basis constraints are used to eliminate the ambiguity that the orthogonality constraints leave in some cases. Denote the $k$-th three-column submatrix of $A$ by $a_k$ and let $Q_k = a_k a_k^T$, $k = 1, \dots, K$; according to the independence of the shape bases, another group of shape-basis constraints is set:

$$\tilde{M}_{2i-1} Q_k \tilde{M}_{2j-1}^T = \begin{cases} 1, & (i,j) \in \omega_1 \\ 0, & (i,j) \in \omega_2 \end{cases} \qquad \tilde{M}_{2i} Q_k \tilde{M}_{2j}^T = \begin{cases} 1, & (i,j) \in \omega_1 \\ 0, & (i,j) \in \omega_2 \end{cases}$$

$$\tilde{M}_{2i-1} Q_k \tilde{M}_{2j}^T = 0, \quad \tilde{M}_{2i} Q_k \tilde{M}_{2j-1}^T = 0, \quad (i,j) \in \omega_1 \cup \omega_2$$

$$\omega_1 = \{(i,j) \mid i = j = k\}, \qquad \omega_2 = \{(i,j) \mid i = 1, \dots, K, \; j = 1, \dots, F, \; i \neq k\}$$

Combining these two classes of constraints, $Q$ is solved correctly; $A$ is then obtained from $Q$ by singular value decomposition, and $M$ follows from $M = \tilde{M} \cdot A$. The scale factors $e_1, \dots, e_F$ can be regarded as constants, so the generalized camera projection matrix can be written

$$M = \begin{bmatrix} c_{11}^1 R_1 & \cdots & c_{1K}^1 R_1 \\ \vdots & \ddots & \vdots \\ c_{F1}^1 R_F & \cdots & c_{FK}^1 R_F \end{bmatrix}.$$

Since $R_f = \begin{bmatrix} r_{f1} & r_{f2} & r_{f3} \\ r_{f4} & r_{f5} & r_{f6} \end{bmatrix}$, $f = 1, \dots, F$, consists of the first two rows of the camera rotation matrix, unrolling the two rows of frame $f$ in $M$ gives

$$m_f = \begin{bmatrix} c_{f1}^1 r_{f1} & c_{f1}^1 r_{f2} & c_{f1}^1 r_{f3} & \cdots & c_{fK}^1 r_{f1} & c_{fK}^1 r_{f2} & c_{fK}^1 r_{f3} \\ c_{f1}^1 r_{f4} & c_{f1}^1 r_{f5} & c_{f1}^1 r_{f6} & \cdots & c_{fK}^1 r_{f4} & c_{fK}^1 r_{f5} & c_{fK}^1 r_{f6} \end{bmatrix},$$

and rearranging its elements yields the new matrix

$$m_f^1 = \begin{bmatrix} c_{f1}^1 r_{f1} & c_{f1}^1 r_{f2} & c_{f1}^1 r_{f3} & c_{f1}^1 r_{f4} & c_{f1}^1 r_{f5} & c_{f1}^1 r_{f6} \\ \vdots & & & & & \vdots \\ c_{fK}^1 r_{f1} & c_{fK}^1 r_{f2} & c_{fK}^1 r_{f3} & c_{fK}^1 r_{f4} & c_{fK}^1 r_{f5} & c_{fK}^1 r_{f6} \end{bmatrix},$$

which is the product of the column vector $(c_{f1}^1, \dots, c_{fK}^1)^T$ and the row vector $(r_{f1} \; r_{f2} \; r_{f3} \; r_{f4} \; r_{f5} \; r_{f6})$. The camera projection matrix of every frame and the shape-basis combination weights can therefore be obtained from $m_f^1$ by singular value decomposition, and from them the three-dimensional shape in Euclidean space, which is exactly the three-dimensional coordinates of the feature points. Computing the three-dimensional coordinates of the feature points of every frame in Euclidean space in fact yields a set of three-dimensional motion data.
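For concreteness, the following is a minimal Python/NumPy sketch of the rank-3K factorization step, under the assumption that the 2F×N tracking matrix is given; solving for the corrective matrix A from the constraints above is left out, so the factors are returned in the ambiguous form $\tilde{M}$, $\tilde{S}$.

```python
import numpy as np

def remove_translation(P_raw):
    """Subtract each frame's centroid (the translation t_f, taken as the row
    mean) to obtain the canonical form P = M S."""
    return P_raw - P_raw.mean(axis=1, keepdims=True)

def factorize_tracks(P, K):
    """Rank-3K factorization of the 2F x N canonical tracking matrix P.
    Returns M_tilde (2F x 3K) and S_tilde (3K x N); these are determined only
    up to an invertible 3K x 3K matrix A, which must then be fixed with the
    orthogonality and shape-basis constraints described in the text."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    r = 3 * K
    sqrt_s = np.sqrt(s[:r])
    M_tilde = U[:, :r] * sqrt_s            # split singular values evenly
    S_tilde = sqrt_s[:, None] * Vt[:r, :]
    return M_tilde, S_tilde
```

K itself can be estimated as rank(P)/3, for example by counting the singular values that stand above a noise threshold.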
The generic three-dimensional face model: it contains more than 3000 vertices and is obtained by registering, simplifying, and averaging several real three-dimensional faces acquired by laser scanning, so it can describe the fine structure of the face. The first 3 frames of the three-dimensional motion data are averaged and used as the three-dimensional feature points describing the facial shape, and the same number of feature vertices are designated on the generic three-dimensional face. The offset between the feature vertices and the feature points is denoted $d$; a radial basis function is trained with $d$ and the feature vertices, and the offsets of the remaining vertices are inferred by feeding them into the trained radial basis function, yielding the personalized three-dimensional face model.
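By way of illustration, here is a minimal Python/NumPy sketch of such radial basis function deformation with a Gaussian kernel; interpreting the embodiments' parameter 0.01 as the kernel width is our assumption, and all names are ours.

```python
import numpy as np

def _gaussian_kernel(a, b, sigma):
    """Pairwise Gaussian kernel matrix between point sets a and b (n x 3)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(centers, offsets, sigma=0.01):
    """Train the RBF from the feature vertices (centers, N x 3) and their
    offsets d (N x 3); returns the weight matrix of the interpolant."""
    Phi = _gaussian_kernel(centers, centers, sigma)   # N x N
    return np.linalg.solve(Phi, offsets)              # weights, N x 3

def apply_rbf(weights, centers, vertices, sigma=0.01):
    """Infer offsets for all mesh vertices and displace them, producing the
    personalized (or per-frame deformed) face mesh."""
    Phi = _gaussian_kernel(vertices, centers, sigma)  # num x N
    return vertices + Phi @ weights
```

The same pair of calls serves both uses in the text: fitting to the mean of the first 3 frames gives the personalized model, and fitting to per-frame offsets relative to the first frame drives the expression motion.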
The remaining three-dimensional motion data: all of the three-dimensional motion data except the first 3 frames used to define the facial shape; the expression driving of each frame is likewise carried out with the radial basis function.
The eigenface-based video compression method: given a video sequence of $F$ frames with per-frame resolution $R \times C$, all columns of each frame are stacked to convert the frame into an $RC \times 1$ column vector, so that the sequence becomes an $RC \times F$ sample matrix $X$. Let $\bar{X}$ be the sample mean; the normalized samples are $\tilde{X} = (X - \bar{X}) / F^{1/2}$. To cope with the problems brought by the high dimensionality, the eigenvectors are computed by QR decomposition combined with singular value decomposition:

$$[q, r] = \mathrm{QR}(\tilde{X}), \qquad [u, s, v] = \mathrm{SVD}(r), \qquad U = q \cdot u$$

The QR decomposition solves for the eigenvectors of a high-dimensional matrix in a numerically stable way. The eigenvectors $U$ obtained from the three formulas above reflect the statistical regularities contained in the sample space and are called eigenfaces. Projecting any video frame $f$ onto $U$ gives a group of projection coefficients $y = U^T (f - \bar{X})$, and $f$ can be reconstructed from the eigenfaces and these coefficients as $\tilde{f} = U \cdot y + \bar{X}$. For video transmission only the sample mean, the eigenvectors, the per-frame projection coefficients, the generic face model, and the three-dimensional feature point coordinates need to be transmitted, which saves storage space.
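A minimal Python/NumPy sketch of this compression scheme, assuming the frames are already flattened into the columns of a matrix; the function names are illustrative, not the patent's.

```python
import numpy as np

def eigenface_compress(frames, n_eig):
    """frames: RC x F matrix, one flattened frame per column.
    Returns (mean, eigenfaces U, per-frame projection coefficients Y)."""
    F = frames.shape[1]
    mean = frames.mean(axis=1, keepdims=True)
    X_norm = (frames - mean) / np.sqrt(F)
    q, r = np.linalg.qr(X_norm)          # QR first, for numerical stability
    u, s, vt = np.linalg.svd(r)
    U = (q @ u)[:, :n_eig]               # leading eigenfaces
    Y = U.T @ (frames - mean)            # y = U^T (f - mean), per frame
    return mean, U, Y

def eigenface_reconstruct(mean, U, Y):
    """Rebuild the frames: f_tilde = U y + mean."""
    return U @ Y + mean
```

Only mean, U, and Y need to be stored or transmitted, which is where the space saving over the raw RC×F matrix comes from.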
Dynamic texture mapping: the two-dimensional feature point coordinates obtained by tracking in each frame are regarded as the texture coordinates of a predefined set of feature vertices on the three-dimensional face model, so that the facial texture information automatically extracted from each video frame is mapped by interpolation, frame by frame, onto the face model reconstructed for the corresponding frame of the original video.
Dynamic texture mapping is divided into two steps:
1) Global texture mapping:
First make the following definitions:
$T = (u_n\, v_n)^T$: the coordinates of the feature points in each frame, where $n = 1 \dots N$ and $N$ is the number of feature points;
$num$: the number of all vertices of the three-dimensional face model;
$i$: the indices of a set of pre-specified feature vertices of the three-dimensional model, satisfying $i \subset \{1, \dots, num\}$ and $|i| = N$, with $i$ remaining unchanged throughout;
$P = (X[i]\, Y[i]\, Z[i])^T$: the coordinates of the model vertices corresponding to the image feature points in each frame;
During global texture mapping, the correspondence between the feature points and certain vertices of the three-dimensional model is specified at the first frame; every subsequent frame automatically updates $T$ and $P$, trains a radial basis function with $T$ and $P$, and performs the interpolation mapping.
2) Local texture optimization: global texture mapping depends on interactively specified initial feature vertices, and manually specified feature vertices may not be optimal, so an optimization process is needed to find accurate feature vertices (a sketch of this iteration is given after the procedure below).
To describe local texture optimization, make the following definitions:
$f$: a two-dimensional feature point obtained by tracking;
$S$: an initially specified feature vertex;
$f_1$: the two-dimensional feature point obtained from $S$ by weak perspective projection;
$\Delta p$: the error between $f$ and $f_1$;
$I_{input}$: the input video frame;
$I_{project}$: the two-dimensional image obtained by weak perspective projection of the reconstructed textured three-dimensional model;
$T$: the square region of image $I_{input}$ centered at $f$;
Local texture optimization is completed by an iterative process:
Loop
$\Delta p = \arg\min \sum_{f_i \in T} \| I_{input}(f_i) - I_{project}(f_i + \Delta p) \|^2$;
From $\Delta p$, invert the weak perspective projection model to obtain the offset $\Delta S$ of the three-dimensional feature vertex;
Update $S$: $S = S + \Delta S$;
Perform global texture mapping again and update $I_{project}$;
Until the change in $S$ is smaller than a threshold.
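A minimal Python sketch of this loop under stated assumptions: the renderer and the two solvers (render_project, solve_dp, backproject) are hypothetical callables standing in for the weak perspective projection machinery described above.

```python
import numpy as np

def optimize_feature_vertices(S, tracked_f, I_input, render_project,
                              solve_dp, backproject, tol=1e-3, max_iter=50):
    """Local texture optimization loop.  S: initial 3D feature vertices;
    tracked_f: tracked 2D feature points; I_input: the input video frame.
    tol and max_iter are our assumptions, not values from the patent."""
    for _ in range(max_iter):
        I_project = render_project(S)                  # re-render textured model
        dp = solve_dp(I_input, I_project, tracked_f)   # argmin of the patch error
        dS = backproject(dp, S)                        # lift the 2D offset to 3D
        S = S + dS                                     # update feature vertices
        if np.linalg.norm(dS) < tol:                   # S has stabilized
            return S
    return S
```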
The video-stream-based three-dimensional dynamic facial expression modeling method of the present invention is free of constraints from prior knowledge and can reconstruct three-dimensional facial expressions from natural video streams (such as film and television footage). Compared with traditional optical flow tracking, the affine-corrected optical flow tracking method needs no training data, is more robust to gray-level variation, reduces the number of iterations of the optical flow algorithm, and improves its time efficiency. Compared with traditional texture mapping, dynamic texture mapping produces more realistic and natural expression effects. The eigenface technique compresses the video effectively while preserving image quality, reducing the storage occupied by the original video.
Table 1 compares the compression efficiency of the eigenface technique with that of MPEG-2 compression. The number of eigenfaces used for compression is chosen according to the size of the original video, trading compression efficiency against image quality. In Table 1, e_f-5 indicates that 5 eigenfaces were used for compression, and so on. As the table shows, the compression ratio of the MPEG-2 technique stays around 60:1 regardless of the size of the video to be compressed, whereas the compression efficiency of the eigenface technique improves as the volume of the original video grows: a 1000-frame video compresses to 16.64 MB with MPEG-2 but to 14.83 MB with the eigenface technique (15 eigenfaces). In some application scenarios the eigenface technique is therefore close to MPEG-2 in compression efficiency and image quality, while its compression/decompression algorithms are simpler than those of MPEG-2.
Table 1. Compression efficiency of the eigenface technique vs. the MPEG-2 technique (space occupied by the video in each format, MB)

Frames | AVI    | MPEG-2 | e_f-5 | e_f-7 | e_f-10 | e_f-15
100    | 98.89  | 1.66   | 4.94  | —     | —      | —
200    | 197.78 | 3.33   | —     | 6.92  | —      | —
500    | 494.44 | 8.32   | —     | —     | 9.98   | —
1000   | 988.72 | 16.64  | —     | —     | —      | 14.83
The present invention can quickly and effectively recover dynamic three-dimensional facial expressions from uncalibrated monocular video streams. The generated expressions are realistic and natural, maintain high efficiency in both time and space, are more expressive than two-dimensional expressions, and have good practical value in virtual reality, human-computer interaction, entertainment, and film and animation production.
Description of drawings
Fig. 1 is a flowchart of the video-stream-based three-dimensional dynamic facial expression model construction method;
Fig. 2 shows the facial feature points of the present invention;
Fig. 3 shows the salient feature points of the present invention that can be tracked accurately without correction;
Fig. 4 compares the affine-corrected optical flow tracking of the present invention with plain optical flow tracking;
Fig. 5 compares the generic three-dimensional face model of the present invention with the personalized three-dimensional face model; (a)(c) are the front and side views of the generic face, and (b)(d) the front and side views of the personalized face;
Fig. 6 shows the tracked expression video frames and the corresponding expression-deformed three-dimensional face models; (a)(b)(c) are the angry, fearful, and surprised expressions tracked with the affine-corrected optical flow method, and (d)(e)(f) the corresponding model deformations;
Fig. 7 compares the dynamic texture mapping of the present invention with traditional static texture mapping; (a) shows the effect of dynamic texture mapping, and (b) that of static texture mapping;
Fig. 8 compares different video compression methods; (a) is an original video frame, (b) a frame reconstructed with 5 eigenfaces in the present invention, and (c) a frame compressed with MPEG-2;
Fig. 9 shows the final results of the three-dimensional dynamic expression modeling of the present invention; (a)(c)(e) are captured video frame sequences of angry, surprised, and fearful expressions respectively, and (b)(d)(f) the corresponding realistic dynamic three-dimensional expression sequences.
Embodiment
As shown in Fig. 1, the video-stream-based three-dimensional dynamic facial expression model construction method is implemented as follows:
First, the 40 predefined feature points are marked on the first frame of the uncalibrated monocular video; we developed an interactive tool with which the user conveniently marks the feature points on the first frame with the mouse, following prompts.
Second, the feature points are tracked robustly with the affine-corrected optical flow method. In optical flow tracking, the 8 feature points at the two mouth corners, the inner and outer corners of the eyes, and the two temples can be tracked accurately, so we use these 8 points to compute the affine transformation between two frames and use that transformation to optimize the optical flow tracking results of the remaining 32 feature points.
Third, the factorization-based algorithm recovers the three-dimensional coordinates of the feature points, and the generic face is deformed to obtain the personalized face model and the expression effects.
Fourth, we use the mean of the first 3 frames of three-dimensional feature point coordinates as the three-dimensional feature points describing the specific facial shape, and deform the generic face model with these points to obtain the personalized three-dimensional face model. The deformation is based on radial basis functions; the kernel of the radial basis function is a Gaussian whose parameter is set to 0.01.
Fifth, the successive three-dimensional feature point coordinates drive the frame-by-frame deformation of the personalized three-dimensional face model to produce continuous expressions; this deformation is likewise realized with radial basis functions.
Sixth, the eigenface technique compresses the input video to save storage. The number of eigenfaces depends on the number of frames of the input video: when the error between a frame reconstructed with n eigenfaces and the original frame falls below a threshold q, n is an appropriate number of eigenfaces; a sketch of this selection follows.
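Illustration only: a small Python sketch of this threshold rule, reusing the compression sketch above; the error metric, q, and the search cap are our assumptions.

```python
import numpy as np

def choose_eigenface_count(frames, q, n_max=30):
    """Return the smallest n whose mean per-pixel reconstruction error
    falls below the threshold q (falls back to n_max if none does)."""
    for n in range(1, n_max + 1):
        mean, U, Y = eigenface_compress(frames, n)
        err = np.abs(eigenface_reconstruct(mean, U, Y) - frames).mean()
        if err < q:
            return n
    return n_max
```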
Seventh, dynamic texture mapping uses texture variation rather than geometric deformation to simulate the subtle surface changes of the face during expression motion, such as wrinkles and skin color changes. "Dynamic" means the texture is updated at every frame of the three-dimensional animation rather than fixed once at the start. A continuous video stream contains far richer expression detail than a still image, and the reconstructed three-dimensional face corresponds strictly frame by frame to the original video stream, so we extract texture information from the input stream frame by frame and map it onto the three-dimensional face corresponding to that frame. Before dynamic texture mapping, 40 initial three-dimensional feature vertices are designated on the face model according to the 40 feature points; the 40 feature point coordinates obtained during video tracking can be regarded as the texture coordinates of this set of feature vertices. This establishes a set of correspondences from three-dimensional feature vertices to the two-dimensional image. Because the tracking data are known and the face model reconstructed in each frame is topologically invariant, these correspondences are invariant: at each frame the mapping only needs to update the previous frame's values with the current frame's feature point coordinates and three-dimensional feature point coordinates. Once this set of discrete correspondences is established, a dense correspondence between three-dimensional vertices and texture is obtained by radial basis function interpolation, completing the frame-by-frame texture mapping. Whether the pre-specified three-dimensional feature vertices are accurate affects the quality of dynamic texture mapping, so accurate feature vertex coordinates are obtained by an optical-flow-based iterative optimization starting from the initial feature vertices, after which the texture mapping is completed.
We captured three typical facial expressions, namely anger, surprise, and fear, with an uncalibrated hand-held Sony HDV 1080i camera; the video frame resolution is 1920×1080 pixels. After the manual marking in the first step, the remaining steps run automatically. Fig. 2 shows the 40 facial feature points defined by the present invention, and Fig. 3 the 8 accurately tracked feature points used to compute the inter-frame affine transformation. The affine-corrected optical flow tracking algorithm needs no training data and remains effective for horizontal/vertical rotations of up to 30°. The first row of Fig. 4 shows tracking with the affine-corrected optical flow method, and the second row plain optical-flow-based tracking. It is easy to see that plain optical flow makes mistakes when tracking the nose, chin, and crown points, while affine-corrected optical flow tracking solves this problem well and is more accurate than traditional optical flow tracking. During video capture we asked the performer to first hold a neutral expression and then perform anger, surprise, and fear in turn; each expression comprises a dynamic progression from neutral to peak amplitude. Since the face is neutral in the first 3 frames, the three-dimensional feature point coordinates there describe the shape features of the face; we average the feature point coordinates of the first 3 frames and deform the generic face model with this mean to obtain the personalized face model. Fig. 5 compares the generic and personalized three-dimensional face models: (a)(c) are the front and side views of the generic face, and (b)(d) of the personalized face. When the face performs expression motion, the reconstructed three-dimensional feature points drive the personalized face model well, producing the expression effects. We drive the model with radial-basis-function interpolation; when training the radial basis function we do not use the reconstructed three-dimensional feature point coordinates directly, but the offset of each frame's three-dimensional feature points relative to those of the first frame. Having obtained the offsets of the designated vertices, the radial basis function yields the offsets of the remaining vertices, and the radial-basis-function driving is carried out frame by frame. Fig. 6 shows the tracked expression video frames and the corresponding expression-deformed three-dimensional face models: (a)(b)(c) are the three typical expressions (anger, fear, surprise) tracked with the affine-corrected optical flow method, and (d)(e)(f) the corresponding model deformations. Compared with static texture mapping, the dynamic texture mapping method of the present invention provides a more natural appearance. Comparing Fig. 7(a) with Fig. 7(b), with dynamic texture mapping distinct wrinkles appear on the bridge of the nose, the chin, and both sides of the nose wings, expression details that static texture mapping cannot convey. Applying the eigenface-based compression algorithm to the original video sequence, we found that for a sequence of about 100 frames only 5 eigenfaces suffice to reconstruct every frame well, with very little loss of image quality. Applying the eigenface and MPEG-2 techniques to video compression, the image quality comparison is shown in Fig. 8: (a) is an original video frame, (b) a frame reconstructed with 5 eigenfaces, and (c) a frame compressed with MPEG-2. In image quality, eigenface-based video compression is very close to MPEG-2.
For the captured angry, surprised, and fearful expressions, we carried out expression modeling separately.
Embodiment 1
Modeling of the angry expression:
Step 1: the input video has 100 frames; the 40 predefined feature points are marked on the first frame of the uncalibrated monocular video, as shown in Fig. 2;
Step 2: the feature points are tracked robustly with the affine-corrected optical flow method; the 8 feature points at the two mouth corners, the inner and outer corners of the eyes, and the two temples are used to compute the affine transformation between two frames, which is used to optimize the optical flow tracking results of the remaining 32 feature points;
Step 3: the factorization-based algorithm recovers the three-dimensional coordinates of the feature points, and the generic face is deformed to obtain the personalized face model and the expression effects;
Step 4: the mean of the first 3 frames of three-dimensional feature point coordinates serves as the three-dimensional feature points describing the specific facial shape, and the generic face model is deformed with a radial basis function to obtain the personalized three-dimensional face model; the kernel of the radial basis function is a Gaussian whose parameter is set to 0.01;
Step 5: the successive three-dimensional feature point coordinates drive the frame-by-frame deformation of the personalized three-dimensional face model to produce continuous expressions, likewise realized with radial basis functions;
Step 6: the input video is compressed with 5 eigenfaces;
Step 7: the original input video is reconstructed frame by frame from the eigenface representation, and dynamic texture mapping maps each reconstructed video frame onto the corresponding expression-deformed three-dimensional face model, producing a realistic angry expression sequence.
From the 100-frame video this example reconstructs a 100-frame dynamic three-dimensional angry expression sequence; the wrinkles on the facial surface are clearly visible and very vivid, with rich expressiveness, usable for film and animation production and game development.
Embodiment 2
Modeling of the surprised expression:
Step 1: the input video has 80 frames; the 40 predefined feature points are marked on the first frame of the uncalibrated monocular video;
Step 2: the feature points are tracked robustly with the affine-corrected optical flow method; the 8 feature points at the two mouth corners, the inner and outer corners of the eyes, and the two temples are used to compute the affine transformation between two frames, which is used to optimize the optical flow tracking results of the remaining 32 feature points;
Step 3: the factorization-based algorithm recovers the three-dimensional coordinates of the feature points, and the generic face is deformed to obtain the personalized face model and the expression effects;
Step 4: the mean of the first 3 frames of three-dimensional feature point coordinates serves as the three-dimensional feature points describing the specific facial shape, and the generic face model is deformed with a radial basis function to obtain the personalized three-dimensional face model; the kernel of the radial basis function is a Gaussian whose parameter is set to 0.05;
Step 5: the successive three-dimensional feature point coordinates drive the frame-by-frame deformation of the personalized three-dimensional face model to produce continuous expressions, likewise realized with radial basis functions;
Step 6: the input video is compressed with 5 eigenfaces;
Step 7: the original input video is reconstructed frame by frame from the eigenface representation, and dynamic texture mapping maps each reconstructed video frame onto the corresponding expression-deformed three-dimensional face model, producing a realistic surprised expression sequence.
From the 80-frame video this example reconstructs an 80-frame dynamic three-dimensional surprised expression sequence; the lighting effects on the facial surface are fairly pronounced, and the surprised expression is quite vivid, usable for film and animation production and game development.
Embodiment 3
Modeling of the fearful expression:
Step 1: the input video has 100 frames; the 40 predefined feature points are marked on the first frame of the uncalibrated monocular video;
Step 2: the feature points are tracked robustly with the affine-corrected optical flow method; the 8 feature points at the two mouth corners, the inner and outer corners of the eyes, and the two temples are used to compute the affine transformation between two frames, which is used to optimize the optical flow tracking results of the remaining 32 feature points;
Step 3: the factorization-based algorithm recovers the three-dimensional coordinates of the feature points, and the generic face is deformed to obtain the personalized face model and the expression effects;
Step 4: the mean of the first 3 frames of three-dimensional feature point coordinates serves as the three-dimensional feature points describing the specific facial shape, and the generic face model is deformed with a radial basis function to obtain the personalized three-dimensional face model; the kernel of the radial basis function is a Gaussian whose parameter is set to 0.03;
Step 5: the successive three-dimensional feature point coordinates drive the frame-by-frame deformation of the personalized three-dimensional face model to produce continuous expressions, likewise realized with radial basis functions;
Step 6: the input video is compressed with 5 eigenfaces;
Step 7: the original input video is reconstructed frame by frame from the eigenface representation, and dynamic texture mapping maps each reconstructed video frame onto the corresponding expression-deformed three-dimensional face model, producing a realistic fearful expression sequence.
From the 100-frame video this example reconstructs a 100-frame dynamic three-dimensional fearful expression sequence; the expression details are quite vivid and fully convey the character's inner tension, usable for film and animation production, game development, and human-computer interaction.
The final results are shown in Fig. 9, the schematic of the final three-dimensional dynamic expression modeling: (a)(c)(e) are captured video frame sequences of angry, surprised, and fearful expressions respectively, and (b)(d)(f) the corresponding realistic dynamic three-dimensional expression sequences. For a 100-frame video sequence, the whole reconstruction takes about 7-8 minutes on a Pentium IV 2.4 GHz computer. The present invention places no special restrictions on the input video; it not only produces three-dimensional facial expression sequences with considerable realism but also maintains high performance in both time and space. In the present digital era, with digital video, digital communication, and digital libraries emerging constantly, this method, which creates character expressions for virtual environments from video material, follows the trend of the times and has broad application prospects, with especially high practical value in human-computer interaction, animation production, and entertainment.

Claims (8)

1. A three-dimensional dynamic human face expression model construction method based on video streams, characterized in that the steps of the method are:
1) manually marking the positions of the facial feature points on the first frame of the input uncalibrated monocular video;
2) tracking the feature points marked on the first frame with an affine-corrected optical flow method, determining the position changes of these feature points in every frame of the video sequence;
3) recovering three-dimensional motion data from the two-dimensional tracking data with a factorization-based method;
4) averaging the first 3 frames of the three-dimensional motion data and adapting a generic three-dimensional face model to this mean to produce a personalized three-dimensional face model;
5) driving the personalized three-dimensional face model with the remaining three-dimensional motion data to generate dynamic three-dimensional facial expressions;
6) compressing the input video with an eigenface-based video compression method to reduce storage space;
7) reconstructing the input video from the eigenfaces and, in combination with the two-dimensional tracking data, automatically applying dynamic texture mapping to the dynamic three-dimensional face, generating a realistic three-dimensional facial expression sequence.
2. The three-dimensional dynamic human face expression model construction method based on video streams according to claim 1, characterized in that the facial feature points are defined according to the face definition parameters and facial animation parameters of the MPEG-4 standard; there are 40 of them, distributed along the facial contour, the eyes, the lip edges, and similar positions; they not only reflect the facial topology well but also describe the facial expression motion: while the face holds a neutral expression it can essentially be regarded as a rigid body, and the feature points then define the facial shape features; when the face performs an expression motion, the feature points define the facial animation parameters.
3. The three-dimensional dynamic human face expression model construction method based on video streams according to claim 1, characterized in that the affine-corrected optical flow method is as follows: the accuracy of traditional optical flow tracking is corrected by computing the affine transformation between video frames; traditional optical flow tracking searches for the offset that minimizes the matching error over a neighborhood of the corresponding feature point: given two adjacent video frames $I_1$ and $I_2$, let the position of a feature point in $I_1$ be $f = (u, v)^T$ and its optical flow be $p = (p_u, p_v)^T$; the position of the corresponding point in $I_2$ is then $f + p$, and $p$ is obtained by minimizing $\sum_{f_t \in T} (I_2(f_t + p) - I_1(f_t))^2$, where $T$ is a square region centered at $f$; when the pose of the face or the illumination changes markedly, tracking of the points on the nose, the chin, and the top of the head becomes very poor, while the points at the eye corners, the hairline, the mouth, and the cheeks are still tracked accurately; let $P_1^a$ and $P_2^a$ therefore denote the accurately tracked feature points in $I_1$ and $I_2$; by assumption they are related by an affine transformation $w$, namely $P_2^a = w P_1^a = A P_1^a + B$; applying $w$ to the feature points $P_1^{ia}$ of $I_1$ that need correction gives $P_w = w P_1^{ia}$; letting $P_o$ be the traditional optical flow tracking result of $P_1^{ia}$ in $I_2$, the tracking result of these feature points is corrected as $P = \arg\min (|P - P_o|^2 + |P - P_w|^2)$, i.e. $P_w$ is used as a constraint to further optimize $P_o$.
4. The three-dimensional dynamic human face expression model construction method based on video streams according to claim 1, characterized in that the factorization-based method is as follows: the video imaging process is modeled by weak perspective projection; a non-rigid shape is regarded as a weighted linear combination of a set of shape bases, the shape bases being a set of basic three-dimensional shapes from which any three-dimensional shape can be composed; given the tracking data, the feature points in every frame are described by the weak perspective projection model as

$$P_{fn} = (x, y)_{fn}^T = \left[ e_f c_{f1} R_f \; \cdots \; e_f c_{fK} R_f \right] \cdot \left[ S_{1n} \; \cdots \; S_{Kn} \right]^T + t_f, \quad f = 1, \dots, F, \; n = 1, \dots, N$$

where $F$ and $N$ are the numbers of frames and of feature points, $e_f$ is the non-zero weak perspective scale factor, $S_{1n} \dots S_{Kn}$ are the $K$ shape bases, $c_{f1} \dots c_{fK}$ are the combination weights of the shape bases, $t_f$ is the translation, $R_f$ denotes the first two rows of the camera projection matrix of frame $f$, and $P_{fn}$ denotes the $n$-th feature point in frame $f$; regarding the $x, y$ coordinates of each feature point in each frame as a $2 \times 1$ matrix, all tracking data form a $2F \times N$ matrix $P$ with $P = MS + T$, where $M$ is the generalized camera projection matrix, $S$ stacks the $K$ shape bases, and $T$ is the translation matrix:

$$M = \begin{bmatrix} e_1 c_{11} R_1 & \cdots & e_1 c_{1K} R_1 \\ \vdots & \ddots & \vdots \\ e_F c_{F1} R_F & \cdots & e_F c_{FK} R_F \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & \cdots & S_{1N} \\ \vdots & \ddots & \vdots \\ S_{K1} & \cdots & S_{KN} \end{bmatrix}$$

subtracting the translation matrix yields the canonical form $P = MS$; singular value decomposition of $P$ gives a rank-$3K$ approximation $\tilde{P} = \tilde{M} \cdot \tilde{S}$, where $K$ can be determined as $\operatorname{rank}(P)/3$; this decomposition is not unique: for any invertible $3K \times 3K$ matrix $A$, $\tilde{P} = \tilde{M} A \cdot A^{-1} \tilde{S}$ also holds; once $A$ is known, the generalized camera projection matrix and the shape bases can be expressed as $M = \tilde{M} \cdot A$ and $S = A^{-1} \cdot \tilde{S}$; to compute $A$, the orthonormality of the projection matrix is first used as a constraint: let $Q = A A^T$, so that $M M^T = \tilde{M} Q \tilde{M}^T$, and let $\tilde{M}_i$ denote the $i$-th row of $\tilde{M}$, rows $2i-1$ and $2i$ belonging to frame $i$; the orthonormality of the projection matrix then gives the two orthogonality constraints

$$\tilde{M}_{2i-1} Q \tilde{M}_{2i-1}^T = \tilde{M}_{2i} Q \tilde{M}_{2i}^T, \qquad \tilde{M}_{2i-1} Q \tilde{M}_{2i}^T = 0;$$

next, shape-basis constraints are used to eliminate the ambiguity that the orthogonality constraints leave in some cases; denote the $k$-th three-column submatrix of $A$ by $a_k$ and let $Q_k = a_k a_k^T$, $k = 1, \dots, K$; according to the independence of the shape bases, another group of shape-basis constraints is set:

$$\tilde{M}_{2i-1} Q_k \tilde{M}_{2j-1}^T = \begin{cases} 1, & (i,j) \in \omega_1 \\ 0, & (i,j) \in \omega_2 \end{cases} \qquad \tilde{M}_{2i} Q_k \tilde{M}_{2j}^T = \begin{cases} 1, & (i,j) \in \omega_1 \\ 0, & (i,j) \in \omega_2 \end{cases}$$

$$\tilde{M}_{2i-1} Q_k \tilde{M}_{2j}^T = 0, \quad \tilde{M}_{2i} Q_k \tilde{M}_{2j-1}^T = 0, \quad (i,j) \in \omega_1 \cup \omega_2$$

$$\omega_1 = \{(i,j) \mid i = j = k\}, \qquad \omega_2 = \{(i,j) \mid i = 1, \dots, K, \; j = 1, \dots, F, \; i \neq k\}$$

combining these two classes of constraints, $Q$ is solved correctly; $A$ is then obtained from $Q$ by singular value decomposition, and $M$ follows from $M = \tilde{M} \cdot A$; the scale factors $e_1, \dots, e_F$ can be regarded as constants, so the generalized camera projection matrix can be written

$$M = \begin{bmatrix} c_{11}^1 R_1 & \cdots & c_{1K}^1 R_1 \\ \vdots & \ddots & \vdots \\ c_{F1}^1 R_F & \cdots & c_{FK}^1 R_F \end{bmatrix};$$

since $R_f = \begin{bmatrix} r_{f1} & r_{f2} & r_{f3} \\ r_{f4} & r_{f5} & r_{f6} \end{bmatrix}$, $f = 1, \dots, F$, consists of the first two rows of the camera rotation matrix, unrolling the two rows of frame $f$ in $M$ gives

$$m_f = \begin{bmatrix} c_{f1}^1 r_{f1} & c_{f1}^1 r_{f2} & c_{f1}^1 r_{f3} & \cdots & c_{fK}^1 r_{f1} & c_{fK}^1 r_{f2} & c_{fK}^1 r_{f3} \\ c_{f1}^1 r_{f4} & c_{f1}^1 r_{f5} & c_{f1}^1 r_{f6} & \cdots & c_{fK}^1 r_{f4} & c_{fK}^1 r_{f5} & c_{fK}^1 r_{f6} \end{bmatrix},$$

and rearranging its elements yields the new matrix

$$m_f^1 = \begin{bmatrix} c_{f1}^1 r_{f1} & c_{f1}^1 r_{f2} & c_{f1}^1 r_{f3} & c_{f1}^1 r_{f4} & c_{f1}^1 r_{f5} & c_{f1}^1 r_{f6} \\ \vdots & & & & & \vdots \\ c_{fK}^1 r_{f1} & c_{fK}^1 r_{f2} & c_{fK}^1 r_{f3} & c_{fK}^1 r_{f4} & c_{fK}^1 r_{f5} & c_{fK}^1 r_{f6} \end{bmatrix},$$

which is the product of the column vector $(c_{f1}^1, \dots, c_{fK}^1)^T$ and the row vector $(r_{f1} \; r_{f2} \; r_{f3} \; r_{f4} \; r_{f5} \; r_{f6})$; the camera projection matrix of every frame and the shape-basis combination weights can therefore be obtained from $m_f^1$ by singular value decomposition, and from them the three-dimensional shape in Euclidean space, which is exactly the three-dimensional coordinates of the feature points; computing the three-dimensional coordinates of the feature points of every frame in Euclidean space in fact yields a set of three-dimensional motion data.
5. The three-dimensional dynamic human face expression model construction method based on video streams according to claim 1, characterized in that the generic three-dimensional face model contains more than 3000 vertices and is obtained by registering, simplifying, and averaging several real three-dimensional faces acquired by laser scanning, so that it can describe the fine structure of the face; the first 3 frames of the three-dimensional motion data are averaged and used as the three-dimensional feature points describing the facial shape, and the same number of feature vertices are designated on the generic three-dimensional face; the offset between the feature vertices and the feature points is denoted $d$; a radial basis function is trained with $d$ and the feature vertices, and the offsets of the remaining vertices are inferred by feeding them into the trained radial basis function, thereby obtaining the personalized three-dimensional face model.
6. The three-dimensional dynamic human face expression model construction method based on video streams according to claim 1, characterized in that the remaining three-dimensional motion data are all of the three-dimensional motion data except the first 3 frames used to define the facial shape, and the expression driving of each frame is likewise carried out with the radial basis function.
7. The three-dimensional dynamic human face expression model construction method based on video streams according to claim 1, characterized in that the eigenface-based video compression method is as follows: given a video sequence of $F$ frames with per-frame resolution $R \times C$, all columns of each frame are stacked to convert the frame into an $RC \times 1$ column vector, so that the sequence becomes an $RC \times F$ sample matrix $X$; let $\bar{X}$ be the sample mean, so the normalized samples are $\tilde{X} = (X - \bar{X}) / F^{1/2}$; to cope with the problems brought by the high dimensionality, the eigenvectors are computed by QR decomposition combined with singular value decomposition:

$$[q, r] = \mathrm{QR}(\tilde{X}), \qquad [u, s, v] = \mathrm{SVD}(r), \qquad U = q \cdot u$$

the QR decomposition solves for the eigenvectors of a high-dimensional matrix in a numerically stable way; the eigenvectors $U$ obtained from the three formulas above reflect the statistical regularities contained in the sample space and are called eigenfaces; projecting any video frame $f$ onto $U$ gives a group of projection coefficients $y = U^T (f - \bar{X})$, and $f$ can be reconstructed from the eigenfaces and these coefficients as $\tilde{f} = U \cdot y + \bar{X}$; for video transmission only the sample mean, the eigenvectors, the per-frame projection coefficients, the generic face model, and the three-dimensional feature point coordinates need to be transmitted, which saves storage space.
8. a kind of three-dimensional dynamic human face expression model construction method according to claim 1 based on video flowing, it is characterized in that described dynamic texture mapping: regard every frame two dimensional character point position coordinates that tracking obtains the texture coordinate of a predefined stack features summit on the three-dimensional face model as, thereby map to the faceform that reconstruct corresponding frame by frame from original video with each frame of video by people's face texture information that interpolation will extract automatically;
The dynamic texture mapping is divided into two steps:
1) overall texture:
At first make as giving a definition:
T=(u nv n) T: characteristic point coordinates in every frame, n=1...N wherein, N is the number of unique point;
Num: the number on all summits in the three-dimensional face model;
I: the index on the three-dimensional model feature summit of a series of prior appointments, i satisfy i| (i  1 ..., num}) ∩ (| i|=N) } and in whole process i remain unchanged;
P=(X[i] Y[i] Z[i]) T: in every frame three-dimensional model with image characteristic point characteristic of correspondence apex coordinate;
When carrying out overall texture, the corresponding relation on first frame specific characteristic point and some three-dimensional model summits, the every frame thereafter upgrade T and P automatically and carry out interpolation with T and P training radial basis function and shine upon;
2) local grain optimization: overall texture depends on mutual appointment initial characteristics summit, and manual characteristic specified summit may not be optimum, therefore needs the process of an optimization to find feature summit accurately;
To describe the local texture optimization, make the following definitions:
f: a two-dimensional feature point obtained by tracking;
S: the initially specified feature vertex;
$f_1$: the two-dimensional point obtained from S by weak perspective projection;
$\Delta p$: the error between f and $f_1$;
$I_{input}$: the input video frame;
$I_{project}$: the two-dimensional image obtained by projecting the reconstructed, textured three-dimensional model through weak perspective projection;
T: the square region of image $I_{input}$ centered at f;
The local texture optimization is accomplished by an iterative process (see the second sketch after this claim):
Loop
$\Delta p = \arg\min_{\Delta p} \sum_{f_i \in T} \left\| I_{input}(f_i) - I_{project}(f_i + \Delta p) \right\|^2$;
Starting from $\Delta p$, recover the offset $\Delta S$ of the three-dimensional feature vertex by inverting the weak perspective projection model;
Update S, setting $S = S + \Delta S$;
Perform global texture mapping again and update $I_{project}$;
Until the change in S is smaller than a given threshold.
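To make step 1) concrete, here is a minimal sketch of the RBF-based global texture mapping referenced above, assuming a Gaussian kernel; the names, kernel width, and regularization term are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=30.0):
    """Pairwise Gaussian RBF kernel between two point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-d**2 / (2 * sigma**2))

def global_texture_coords(P, T, vertices, sigma=30.0):
    """P: (N, 3) feature vertices of the current frame; T: (N, 2) tracked 2-D
    feature points used as their texture coordinates; vertices: (num, 3) all
    model vertices.

    Trains an RBF on the (P, T) pairs and interpolates texture coordinates
    for every vertex; since T and P are updated each frame, the uv mapping
    follows the expression.
    """
    phi = gaussian_kernel(P, P, sigma)
    W = np.linalg.solve(phi + 1e-8 * np.eye(len(P)), T)   # fit: phi @ W = T
    return gaussian_kernel(vertices, P, sigma) @ W         # (num, 2) uv per vertex
```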
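And a sketch of the iterative loop in step 2); the `render` and `backproject` callables stand in for the patent's global texture mapping and weak perspective projection steps, which are defined elsewhere, and the window size, search radius, and stopping threshold are illustrative assumptions:

```python
import numpy as np

def refine_feature_vertex(I_input, f, S, render, backproject,
                          half=8, search=2, max_iter=20, tol=0.5):
    """Refine one feature vertex S so the textured model's projection
    matches the input frame around the tracked point f = (x, y).
    """
    # square window T of the claim: pixel coordinates centered at f
    xs, ys = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    win = np.stack([f[0] + xs.ravel(), f[1] + ys.ravel()], axis=1).astype(int)
    for _ in range(max_iter):
        I_project = render(S)                  # re-textured model projected to 2-D
        best_err, dp = np.inf, np.zeros(2)
        for dx in range(-search, search + 1):  # dp = argmin sum ||I_input - I_project||^2
            for dy in range(-search, search + 1):
                sh = win + [dx, dy]
                err = np.sum((I_input[win[:, 1], win[:, 0]].astype(float) -
                              I_project[sh[:, 1], sh[:, 0]].astype(float)) ** 2)
                if err < best_err:
                    best_err, dp = err, np.array([dx, dy], float)
        dS = backproject(dp)                   # lift dp to a 3-D offset dS
        S = S + dS                             # update the feature vertex
        if np.linalg.norm(dS) < tol:           # stop once S barely changes
            break
    return S
```

The grid search over offsets is one simple way to realize the argmin; any image-alignment scheme that minimizes the same SSD objective would serve.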
CNB2006100533938A 2006-09-14 2006-09-14 Video flow based three-dimensional dynamic human face expression model construction method Expired - Fee Related CN100416612C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100533938A CN100416612C (en) 2006-09-14 2006-09-14 Video flow based three-dimensional dynamic human face expression model construction method

Publications (2)

Publication Number Publication Date
CN1920886A 2007-02-28
CN100416612C CN100416612C (en) 2008-09-03

Family

ID=37778605

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100533938A Expired - Fee Related CN100416612C (en) 2006-09-14 2006-09-14 Video flow based three-dimensional dynamic human face expression model construction method

Country Status (1)

Country Link
CN (1) CN100416612C (en)


Legal Events

C06 / PB01: Publication
C10 / SE01: Entry into force of request for substantive examination
C14 / GR01: Grant of patent or utility model
C17 / CF01: Termination of patent right due to non-payment of annual fee
Granted publication date: 2008-09-03
Termination date: 2012-09-14