CN102402691A - Method for tracking gestures and actions of human face - Google Patents

Method for tracking gestures and actions of human face

Info

Publication number
CN102402691A
CN102402691A · CN2010102780635A · CN201010278063A
Authority
CN
China
Prior art keywords
face
human face
frame image
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010102780635A
Other languages
Chinese (zh)
Inventor
王阳生
周明才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN2010102780635A
Publication of CN102402691A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for tracking the pose and actions of a human face, comprising the following steps: step S1, frame images are extracted from a video stream, face detection is performed on the first frame of the input video or whenever tracking fails, and a face bounding box is obtained; step S2, during normal tracking, after the iteration on the previous frame image has converged, feature points with relatively salient texture in the face region of the previous frame image are matched to the corresponding points found in the current frame image, yielding feature-point matching results; step S3, the shape of an active appearance model is initialized from the face bounding box or the feature-point matching results, giving the initial value of the face shape in the current frame image; and step S4, the active appearance model is fitted with the inverse compositional algorithm to obtain the three-dimensional pose and facial action parameters of the face. With this method, online tracking can be completed fully automatically and in real time under ordinary illumination conditions.

Description

A method for tracking human face pose and actions
Technical field
The present invention relates to the fields of image processing and computer vision, and in particular to an image-based method for tracking human face pose and facial actions.
Background art
Estimating and tracking the pose and facial actions of a human face in images or video has important application value in fields such as human-computer interaction, virtual reality, intelligent surveillance, multi-pose face recognition and expression recognition. For example, in a video-driven expression animation system, the head pose and facial actions extracted from video of a real person can drive a virtual human or cartoon character to perform similar expressions, which has broad application prospects in industries such as interactive digital entertainment and animation production.
Depending on the information used by the tracking algorithm, face tracking methods can be divided into feature-based methods and appearance-based methods. Feature-based methods usually select image features that are easy to track and relatively robust to illumination, pose and expression, such as color, edges, corners, or points with semantic meaning. Because local feature-point matching usually requires no training data, feature-based methods are generally robust to illumination changes and variations in facial texture. A drawback of this class of methods, however, is that the tracking results are not accurate enough and tend to jitter. Appearance-based methods attempt to match a holistic face appearance model to the input image and thereby achieve face tracking; they generally require the whole facial texture of the current frame image to match the reconstructed image. Compared with feature-based methods, appearance-based methods exploit the texture information of the entire face region, so they can usually track more accurately and stably, without jitter. However, they are sensitive to the initial shape position and easily become trapped in local minima.
Among appearance-based methods, typical representatives are the 3D morphable model method and the active appearance model method with a combined 2D+3D deformable face model. Because the combined 2D+3D active appearance model algorithm has a large advantage over the 3D morphable model method in fitting speed, it is a practical choice.
The combined 2D+3D deformable face model active appearance model (2D+3D AAM) algorithm is essentially a two-dimensional active appearance model algorithm augmented with a 3D face mesh constraint, so its core is still the 2D active appearance model algorithm. To further improve the performance of the 2D active appearance model, researchers have improved it from many angles; improving the texture representation of the active appearance model is one of the most important of these directions. It is widely recognized that using raw gray-level information as the texture representation can hardly satisfy the demands of practical applications. In practice, the samples that can be collected are limited and cannot cover all the variable factors of imaging conditions, ethnicity, skin color, pose, expression, age, image quality and so on; and even if they could, the expressive power of a texture model trained on gray-level features alone is limited.
The active appearance model (AAM) fitting algorithm is an iterative optimization method based on gradient descent, and providing a good initial value is a prerequisite for its success. In video tracking, when the user moves quickly, simply using the converged result of the previous frame image as the initial value of the face shape in the current frame image easily traps the gradient-descent method in a local minimum rather than the correct global optimum, causing tracking interruption or failure. A common remedy for this problem is to integrate a particle filter algorithm into the tracking process; however, a major problem with such methods is that the computational cost is too large to meet real-time requirements.
Summary of the invention
In view of the above problems in the prior art, balancing the various performance requirements while taking into account the demand for computational speed in practical applications, the object of the present invention is to track the 3D pose and facial actions of a naturally moving face in video fully automatically, in real time and robustly. To this end, a fully automatic method for tracking human face pose and actions is provided.
To achieve this object, the technical solution proposed by the present invention for tracking face pose and actions comprises the following steps:
Step S1: extracting images frame by frame from the video stream, performing face detection on the first frame image of the input video or whenever tracking fails, and obtaining a face bounding box;
Step S2: during normal tracking, after the iteration on the previous frame image has converged, selecting feature points with relatively salient texture in the face region of the previous frame image, finding the corresponding matching points in the current frame image, and obtaining the matching results of these feature points;
Step S3: initializing the shape of the active appearance model from the face bounding box or the feature-point matching results, and obtaining the initial value of the face shape in the current frame image;
Step S4: fitting the active appearance model with the inverse compositional algorithm to obtain the 3D face pose and facial action parameters.
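Taken together, steps S1 to S4 imply the control flow sketched below. This is a schematic outline only, not the patent's implementation: each component is passed in as a callable, because its concrete form is specified in the detailed description rather than here.

```python
# Schematic main loop for steps S1-S4. All four components are callables:
# detect(frame) -> face box or None; match(prev, cur) -> point matches;
# init_shape(box=... or matches=...) -> initial AAM shape; fit(frame, init)
# -> (pose, action parameters) or None on fitting failure.
def track(frames, detect, match, init_shape, fit):
    prev, ok = None, False
    for frame in frames:                  # S1: frame-by-frame images
        if not ok:                        # S1: detect on the first frame
            box = detect(frame)           # or after a tracking failure
            init = None if box is None else init_shape(box=box)
        else:                             # S2 + S3: initialize the shape
            init = init_shape(matches=match(prev, frame))
        result = None if init is None else fit(frame, init)  # S4: AAM fit
        ok = result is not None
        prev = frame
        if ok:
            yield result                  # 3D pose and action parameters
```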
In the above method for tracking human face pose and actions, face detection is performed with an AdaBoost-based method, which yields the position and size of the face in the image.
The feature points are selected by first computing the determinant of the Hessian matrix at each pixel, and then choosing pixels with relatively large Hessian determinants as the feature points.
The previous frame image and the current frame image are each represented at multiple resolutions, in order to increase the speed of feature-point matching.
The feature-point matching uses block matching within a multi-resolution image framework to find, in the current frame image, the points that match the feature points of the previous frame image.
The active appearance model uses a multiband texture representation that fuses gray-level and edge information, in order to improve the adaptability of the active appearance model algorithm to illumination changes and its generalization to unseen faces.
The steps for initializing the shape of the active appearance model from the feature-point matching results are as follows:
Step S31: selecting feature points in the previous frame image, based on the located face;
Step S32: computing the barycentric coordinates of each feature point within the triangle it belongs to;
Step S33: performing feature-point matching in the current frame image;
Step S34: estimating the face shape from the matched feature points.
The active appearance model comprises a 2D face shape model, a 2D face texture model, a 3D deformable face model and an imaging model; the imaging model adopts full perspective projection, so as to obtain accurate face pose parameters.
The advantages of the present invention are:
1. Fully automatic face pose and action tracking. On the first frame image, or when tracking is lost, face detection locates the face and the resulting bounding box initializes the active appearance model; during normal tracking, feature-point matching provides the initial face shape for the current frame image. The whole tracking process is therefore completed automatically, without manual intervention.
2. Fast tracking: on a 2.8 GHz Pentium computer, real-time online tracking is achieved at 320 × 240 resolution.
3. Good generalization: the tracking algorithm can robustly track previously unseen people under ordinary illumination conditions.
Description of drawings
Fig. 1 is the flowchart of the fully automatic face pose and action tracking of the present invention.
Fig. 2a to Fig. 2c are schematic diagrams of the shape initialization process based on feature-point matching of the present invention.
Fig. 3a to Fig. 3c are the mean face images of the multiband texture representation of the present invention.
Fig. 4 is a schematic diagram of the full perspective projection model of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings. It should be noted that the described embodiments are intended only to aid understanding of the invention and in no way limit it.
Referring to Fig. 1, the fully automatic face pose and action tracking of the present invention is implemented according to the following steps:
1. Face detection
The purpose of face detection is to automatically locate the face in the image. The present invention uses the adaptive boosting (AdaBoost) algorithm for automatic face detection. AdaBoost is a widely used statistical learning algorithm that has been successfully applied to face detection and face classification. AdaBoost assigns a weight to each training sample and revises the weights iteratively: the weights of correctly classified samples are appropriately decreased, while those of misclassified samples are appropriately increased, so that learning focuses on the samples that are hard to classify.
Face detection is performed only on the first frame image or when tracking fails. The face bounding box returned by the detector is then used to initialize the active appearance model (AAM). Since the detection box carries only position and size information, only the x- and y-direction translation parameters and the global scale parameter of the AAM's global affine transformation can be initialized from it; the remaining parameters are set to zero.
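As a concrete illustration of this step, the sketch below uses OpenCV's stock Viola-Jones cascade — itself an AdaBoost-based detector, in the spirit of the method described here but not the patent's own implementation — to obtain the bounding box and initialize only the translation and scale of the global affine transform. The cascade file name and the use of the box width as the scale estimate are assumptions for illustration.

```python
import cv2

# OpenCV's stock Viola-Jones (AdaBoost) frontal-face cascade; a stand-in
# for the AdaBoost detector described in the text.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame_bgr):
    """Return the largest detected face box (x, y, w, h), or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda f: f[2] * f[3])

def init_global_affine(box):
    """Initialize only x/y translation and global scale of the AAM's global
    affine transform; all other parameters start at zero, as in the text.
    Using the box width as the scale estimate is an assumption."""
    x, y, w, h = box
    return {"tx": x + w / 2.0, "ty": y + h / 2.0,
            "scale": float(w), "theta": 0.0}
```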
2. Shape initialization based on feature-point matching
During normal tracking, the present invention uses feature-point matching to obtain the initial value of the face shape in the current frame image, which effectively improves the ability to capture fast motion. The basic idea of the shape initialization algorithm based on feature-point matching is: after the iteration on the previous frame image has converged, select some relatively salient feature points in the face region; match these feature points in the current frame image to obtain their matching results; and then estimate the shape parameters of the face in the current frame image from the matched point pairs. The concrete steps of the algorithm are as follows:
1) Select feature points in the previous frame image, based on the located face:
In the present invention, the feature points consist of two parts. One part is a set of pre-specified semantic points; we select 30 of them, including corner points and points on the eyes, eyebrows, nose and mouth. The other part is a set of relatively salient points chosen according to the value of the determinant of the Hessian matrix at each pixel, where the Hessian matrix is computed as
$$D(u, v) = \begin{pmatrix} I_{xx}(u, v) & I_{xy}(u, v) \\ I_{xy}(u, v) & I_{yy}(u, v) \end{pmatrix},$$
where $I_{xx}(u, v)$ is the square of the x-direction gradient at pixel $(u, v)$ (or a weighted sum of squares over neighboring points, to provide smoothing), $I_{yy}(u, v)$ is defined analogously for the y direction, and $I_{xy}(u, v)$ is the product of the x- and y-direction gradients at $(u, v)$ (or a weighted sum of such products over neighboring points).
The finally selected feature points are shown in Fig. 2a.
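A minimal sketch of this saliency criterion follows, assuming Sobel gradients and Gaussian weighting of the neighborhood products; the window size and point count are illustrative values, not values from the patent.

```python
import cv2
import numpy as np

def salient_points(gray, face_mask, n_points=50, ksize=5):
    """Pick the pixels inside face_mask with the largest determinant of the
    smoothed Hessian-style matrix D = [[Ixx, Ixy], [Ixy, Iyy]] built from
    weighted sums of gradient products, as in the formula above."""
    g = gray.astype(np.float32)
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
    Ixx = cv2.GaussianBlur(gx * gx, (ksize, ksize), 0)  # weighted sums of
    Iyy = cv2.GaussianBlur(gy * gy, (ksize, ksize), 0)  # squared gradients
    Ixy = cv2.GaussianBlur(gx * gy, (ksize, ksize), 0)  # and their product
    det = Ixx * Iyy - Ixy * Ixy
    det[~face_mask] = -np.inf              # restrict to the face region
    idx = np.argsort(det.ravel())[::-1][:n_points]
    ys, xs = np.unravel_index(idx, det.shape)
    return np.stack([xs, ys], axis=1)      # (n_points, 2) pixel coordinates
```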
2) Compute the barycentric coordinates of each feature point within the triangle it belongs to. Let $x = (x, y)$ be a point in the triangle with vertices $x_r^0$, $x_s^0$, $x_t^0$, where the subscripts r, s, t identify the triangle's three vertices. The point can then be expressed as a weighted average of the three vertices:
$$x = \alpha x_r^0 + \beta x_s^0 + \gamma x_t^0,$$
where the weights α, β and γ are the barycentric coordinates of the point within the triangle, computed as:
$$\alpha = \frac{x_s^0 y_t^0 - y_s^0 x_t^0 - x y_t^0 + y x_t^0 - y x_s^0 + x y_s^0}{x_s^0 y_t^0 - x_r^0 y_t^0 - x_s^0 y_r^0 - y_s^0 x_t^0 + y_r^0 x_t^0 + y_s^0 x_r^0},$$
$$\beta = \frac{x y_t^0 - x_r^0 y_t^0 - x y_r^0 - y x_t^0 + y_r^0 x_t^0 + y x_r^0}{x_s^0 y_t^0 - x_r^0 y_t^0 - x_s^0 y_r^0 - y_s^0 x_t^0 + y_r^0 x_t^0 + y_s^0 x_r^0}, \qquad \gamma = 1 - \alpha - \beta.$$
After the feature points are selected, the barycentric coordinates $(c_{i1}, c_{i2}, c_{i3})$ of each feature point $x_i$ within its enclosing mesh triangle are computed, together with the indices of the triangle's three vertices within the shape; these are used later when computing the initial face shape in the current frame image. Here $i = 1, 2, \ldots, M$, where M is the number of feature points.
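The following sketch implements the barycentric formulas above verbatim and checks them by reconstructing the point from its coordinates; the triangle and point values are arbitrary test data.

```python
import numpy as np

def barycentric(p, xr, xs, xt):
    """Barycentric coordinates (alpha, beta, gamma) of 2D point p in the
    triangle (xr, xs, xt), so that p = alpha*xr + beta*xs + gamma*xt."""
    x, y = p
    denom = (xs[0]*xt[1] - xr[0]*xt[1] - xs[0]*xr[1]
             - xs[1]*xt[0] + xr[1]*xt[0] + xs[1]*xr[0])
    alpha = (xs[0]*xt[1] - xs[1]*xt[0] - x*xt[1] + y*xt[0]
             - y*xs[0] + x*xs[1]) / denom
    beta = (x*xt[1] - xr[0]*xt[1] - x*xr[1] - y*xt[0]
            + xr[1]*xt[0] + y*xr[0]) / denom
    return alpha, beta, 1.0 - alpha - beta

# Check: reconstructing the point from its coordinates recovers it.
xr, xs, xt = np.array([0., 0.]), np.array([4., 0.]), np.array([0., 3.])
p = np.array([1., 1.])
a, b, g = barycentric(p, xr, xs, xt)
assert np.allclose(a*xr + b*xs + g*xt, p)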
3) Perform feature-point matching in the current frame image
The present invention matches feature points by block template matching. Concretely, a small region T centered on the feature point is taken from the previous frame image; a larger region I, also centered on that feature point, is then selected in the current frame image, large enough to contain T. The block T from the previous frame image is slid over the region I of the current frame image, and the normalized correlation coefficient of the overlapping parts is computed. The position with the maximum normalized correlation coefficient is taken as the position matching the feature point of the previous frame image. In the present invention, the size of region T is 9 × 9 (an empirical value); the size of region I is determined by the maximum expected displacement between two frames. A coarse-to-fine multi-resolution framework can be used to accelerate matching.
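A minimal sketch of this block-matching step is given below, using OpenCV's normalized correlation coefficient (cv2.TM_CCOEFF_NORMED). The 9 × 9 template size is the empirical value from the text; the search radius is an assumed stand-in for the expected inter-frame displacement.

```python
import cv2

def match_point(prev_gray, cur_gray, pt, tmpl=9, search_radius=16):
    """Match one feature point by sliding a tmpl x tmpl block T from the
    previous frame over a larger search region I in the current frame,
    scoring with the normalized correlation coefficient."""
    half_t = tmpl // 2
    half_s = half_t + search_radius
    x, y = int(pt[0]), int(pt[1])
    h, w = prev_gray.shape
    if not (half_s <= x < w - half_s and half_s <= y < h - half_s):
        return None, 0.0                   # too close to the image border
    T = prev_gray[y-half_t:y+half_t+1, x-half_t:x+half_t+1]
    I = cur_gray[y-half_s:y+half_s+1, x-half_s:x+half_s+1]
    score = cv2.matchTemplate(I, T, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(score)
    # Convert the best offset inside I back to the image coordinates of the
    # matched block center.
    mx = x - half_s + max_loc[0] + half_t
    my = y - half_s + max_loc[1] + half_t
    return (mx, my), max_val               # matched point and its NCC w_i
```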
Fig. 2b shows an example of feature-point matching results. As can be seen from the figure, most of the feature points are matched correctly, and only a small fraction fail to match. There are several possible reasons for a failed match; for example, a change in pose may cause the local texture to change markedly or even become occluded. Fig. 2c shows the initial shape of the active appearance model obtained from the feature-point matching: because the feature points are matched well, a fairly satisfactory initial shape for the active appearance model is obtained from them.
4) Estimate the face shape from the matched feature points
Suppose M matched feature points $z_i$ have been obtained in the current frame image. We can then use these matched points, together with the barycentric coordinates computed earlier, to roughly estimate the face shape of the current frame image. This is achieved by minimizing:
$$p_0 = \arg\min_p \sum_{i=1}^{M} -w_i \, \rho\!\left( \Big\| \sum_{j=1}^{3} c_{ij} W(x_{ij}; p) - z_i \Big\|^2, \; r \right),$$
where $w_i$ is the normalized correlation coefficient of the match, $\sum_{j=1}^{3} c_{ij} W(x_{ij}; p)$ is the position to which the i-th feature point is mapped in the current frame image under the given shape parameter p, $x_{ij}$ ($j = 1, 2, 3$) are the mean-shape key-point coordinates of the three vertices of the triangle containing the i-th feature point, and $\rho(\cdot, r)$ is a robust error function. The optimal shape parameter $p_0$ can be solved for iteratively with the Gauss-Newton algorithm. $\rho(\cdot, r)$ is defined as:
$$\rho(\delta, r) = \begin{cases} \dfrac{3\,(r^2 - \delta^2)}{4 r^3}, & \delta < r \\ 0, & \text{otherwise}, \end{cases}$$
where r is the confidence radius.
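The sketch below evaluates this robust matching score for given shape parameters; the warp W is passed in as a callable, since its concrete form depends on the AAM's shape model, and the default confidence radius is an assumption.

```python
import numpy as np

def rho(delta, r):
    """Robust error function of the formula above; delta is the squared
    residual passed in as the first argument of rho, r the confidence
    radius."""
    return np.where(delta < r,
                    3.0 * (r * r - delta * delta) / (4.0 * r**3), 0.0)

def matching_score(p, warp, tri_vertices, bary, matches, weights, r=10.0):
    """Score of shape parameters p against the matched points; the patent
    minimizes the negative of this weighted sum (e.g. by Gauss-Newton).
    tri_vertices[i]: (3, 2) mean-shape vertices of point i's triangle;
    bary[i]: its barycentric coordinates (c_i1, c_i2, c_i3);
    matches[i]: matched location z_i; weights[i]: NCC weight w_i;
    warp(vertices, p): the AAM warp W applied to the vertices."""
    total = 0.0
    for verts, c, z, w in zip(tri_vertices, bary, matches, weights):
        mapped = np.sum(np.asarray(c)[:, None] * warp(verts, p), axis=0)
        total += w * float(rho(np.sum((mapped - np.asarray(z)) ** 2), r))
    return total
```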
3. Extraction of face pose and action parameters
The present invention uses the combined 2D+3D deformable face model active appearance model algorithm to obtain the 3D face pose and facial action parameters. In essence, this algorithm can be regarded as a 2D active appearance model algorithm with an added 3D face mesh constraint.
3.1 Multiband texture representation fusing gray-level and edge information
The present invention proposes a multiband texture representation that fuses gray-level and edge information, to improve the adaptability of the active appearance model algorithm to illumination changes and its generalization to unseen faces. The basic idea is to first filter the image with three orthogonal filters, then square the two filtered gradient images to remove directional information, and finally apply an S-shaped (sigmoid) normalization to all three images (the gray-level band and the two squared gradient bands). The sigmoid function is defined as:
$$f(m) = \frac{m}{m + \bar{m}},$$
where $\bar{m}$ is the mean of m over the whole region of interest. The effect of this function is to map input values much smaller than the mean toward 0 and input values much larger than the mean toward 1. When applied to the normalized edge images, it effectively enhances true edges while suppressing possible noise.
In computing the multiband texture representation above, the gradients in the x and y directions are squared. This operation has two benefits:
1) it effectively enhances potential edges;
2) it removes the directional information of the edge gradient while preserving the edge strength information. When it is unknown whether the background is brighter or darker than the face, removing the gradient direction is necessary. In addition, sigmoid-normalizing $I_x^2$ and $I_y^2$ separately in this way keeps the computational cost relatively low.
It should also be noted that before computing the gradients, the input image must first be warped onto the mean shape to obtain the shape-free face image; the gradients are then computed on the shape-free face image, to guarantee that they are independent of in-plane rotation.
Fig. 3a to Fig. 3c show the mean face images of this multiband texture model: Fig. 3a is the mean face of the gray-level band, and Fig. 3b and Fig. 3c are the mean faces of the x- and y-direction gradient bands, respectively. As the figures show, the information in the three bands is strongly complementary: Fig. 3a characterizes the overall intensity distribution, Fig. 3b the edge distribution in the x direction, and Fig. 3c the edge distribution in the y direction. Thanks to the squaring and sigmoid normalization, the multiband texture representation fusing gray-level and edge information captures the essential edge structure of the face while filtering out inessential interference.
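A minimal sketch of the multiband texture computation follows, under the assumption — consistent with the description above — that the three bands are the gray level and the squared x and y gradients, each normalized by f(m) = m / (m + mean(m)) over the face region of the shape-free image.

```python
import cv2
import numpy as np

def sigmoid_norm(band, mask):
    """f(m) = m / (m + mean(m)), with the mean taken over the face region."""
    mean = float(band[mask].mean()) + 1e-8
    return band / (band + mean)

def multiband_texture(shape_free_gray, mask):
    """shape_free_gray: the face image already warped onto the mean shape,
    so the gradients are independent of in-plane rotation; mask: boolean
    map of face pixels. Returns an (H, W, 3) stack of the three bands."""
    g = shape_free_gray.astype(np.float32)
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
    bands = [g, gx * gx, gy * gy]        # squaring removes edge direction
    return np.stack([sigmoid_norm(b, mask) for b in bands], axis=-1)
```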
3.2 Three-dimensional face model and imaging model
The deformable 3D face model used in the present invention is the Candide-3 model. It can deform appropriately to approximate different individual face shapes, and it can also produce corresponding deformations in response to facial actions, which makes it very suitable for describing and tracking facial actions.
The shape g of the 3D deformable face mesh model can be represented by the vector obtained by concatenating the vertex coordinates $P_i = (x_i, y_i, z_i)^T$ of the 3D model:
$$g = (x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_n, y_n, z_n)^T,$$
where n is the number of mesh vertices, $(x_i, y_i, z_i)^T$ are the 3D coordinates of the i-th mesh vertex $P_i$, and T denotes matrix transposition. These coordinates are expressed in the model's own local coordinate system. A personalized face shape can be generated with the following shape model:
$$g = \bar{g} + S\sigma + A\alpha,$$
where $\bar{g}$ is the mean face shape; S and A are the shape-variation and action-variation matrices, each column of which corresponds to an independent mode of variation; and σ and α are the shape-variation and action-variation coefficient vectors. Sσ describes how the mesh model varies across different faces in global shape, such as the overall plumpness of the face, the distance between the eyes, and the positions of the eyes, eyebrows, nose and mouth within the face; Aα describes the mesh deformations caused by facial actions, such as opening the mouth or raising the eyebrows. In addition, we assume that shape variations and action variations are mutually independent.
The present invention adopts a full perspective projection model to describe the imaging process, as shown in Fig. 4, in which every coordinate system is a right-handed Cartesian coordinate system.
Let $P_o = (X_o, Y_o, Z_o)^T$ be a point on the Candide-3 face model. After a rigid transformation consisting of a rotation R and a translation T, its coordinates in the camera coordinate system are $P_c = (X_c, Y_c, Z_c)^T$:
$$\begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} = R \begin{pmatrix} X_o \\ Y_o \\ Z_o \end{pmatrix} + T.$$
The point $P_c$ is projected by full perspective projection to a point $P_i$ on the image plane; letting its coordinates in the camera coordinate system be $P_i = (X_i, Y_i, Z_i)^T$, the relation between the two coordinates is:
$$\begin{pmatrix} X_i \\ Y_i \\ Z_i \end{pmatrix} = \begin{pmatrix} f \cdot X_c / Z_c \\ f \cdot Y_c / Z_c \\ f \end{pmatrix},$$
where f is the digital focal length of the camera, which can be computed from the resolution of the whole image and the camera's viewing angle. The coordinates of the point $P_i$ in the image coordinate system are then obtained by:
$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{pmatrix} X_i + W/2 \\ -Y_i + H/2 \end{pmatrix},$$
where W and H are the width and height of the image, respectively. Because the positive y axis of the image coordinate system usually points downward, $Y_i$ is negated in the above formula; and because the origin of the image coordinate system is usually at the top-left corner of the image, the x and y coordinates are offset by half the image width and half the image height, respectively.
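The whole imaging pipeline above — rigid transform, perspective division, image-coordinate conversion — can be sketched as follows; the field-of-view helper reflects the remark that f is computed from the image resolution and viewing angle, under the assumption of a symmetric horizontal field of view.

```python
import numpy as np

def project(P_o, R, T, f, W, H):
    """P_o: (n, 3) model-frame points; R: (3, 3) rotation; T: (3,)
    translation; f: digital focal length; W, H: image size in pixels.
    Returns (n, 2) image coordinates (u, v)."""
    P_c = P_o @ R.T + T                # rigid transform to the camera frame
    X_i = f * P_c[:, 0] / P_c[:, 2]    # perspective division
    Y_i = f * P_c[:, 1] / P_c[:, 2]
    u = X_i + W / 2.0                  # shift the origin to the top-left
    v = -Y_i + H / 2.0                 # corner; the image y axis points down
    return np.stack([u, v], axis=1)

def digital_focal_length(W, fov_x_deg):
    """f from the image width and the horizontal viewing angle."""
    return (W / 2.0) / np.tan(np.radians(fov_x_deg) / 2.0)
```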
3.3 The combined 2D+3D deformable face model active appearance model algorithm based on the Candide-3 face model
The original combined 2D+3D active appearance model algorithm was proposed to improve the performance of the 2D active appearance model algorithm: a constraint is imposed on the 2D active appearance model requiring that the 2D shape generated from the 2D global affine transformation parameters and shape parameters be legal. Here, "legal" means that there exist valid 3D rotation and translation parameters and 3D face model deformation parameters such that the 2D shape obtained by projection onto the image plane is consistent with the shape generated by the 2D shape model. Mathematically, this constraint can be expressed as:
$$\min E_m = \min \big\| s'(p) - P\big(Q(g'(\sigma, \alpha))\big) \big\|^2,$$
where $g(\sigma, \alpha)$ is the Candide-3 face shape, $Q(x)$ applies to the 3D vertex vector x the 3D rigid transformation consisting of the rotation matrix R and the translation vector T, P projects each 3D vertex of $Q(x)$ onto the image plane to obtain its pixel coordinates in the image coordinate system, $s(p)$ is the 2D shape generated from the 2D shape parameters p, and the prime denotes a sub-part of the whole vector: because the key points of the 2D shape and the vertices of the 3D face mesh are not in one-to-one correspondence, only a small subset of them can be put in correspondence, and the prime denotes that corresponding part.
Adding the above constraint term to the objective function of the original 2D active appearance model yields the so-called combined 2D+3D deformable face model active appearance model:
$$\Big\| A_0 + \sum_{i=1}^{m} \lambda_i A_i - I(W(p)) \Big\|^2 + w_m \big\| s'(p) - P\big(Q(g'(\sigma, \alpha))\big) \big\|^2,$$
where $A_0$ is the mean face appearance, the $A_i$ are the first m face appearance basis vectors, the $\lambda_i$ are the appearance coefficients, $W(p)$ is the coordinate warp function, and $w_m$ is the weight of the 3D shape constraint term; in the present invention, $w_m = 0.1$.
To optimize the combined 2D+3D deformable face model active appearance model, the fast fitting algorithm of the 2D active appearance model can still be used. In the first step, the objective function is optimized in the subspace orthogonal to $\operatorname{span}(A_i)$:
$$\big\| A_0 - I(W(p)) \big\|^2_{\operatorname{span}(A_i)^\perp} + w_m \sum_i F_i^2(p; P; \sigma; \alpha),$$
where $F_i(p; P; \sigma; \alpha)$ is the position error of the i-th constraint point in the x and y directions; for conciseness, the errors in both directions are combined into a single expression. In the second step, the appearance parameters λ are solved for in the same way as in the 2D active appearance model.
For notational convenience, all unknowns are concatenated into a vector $q = (p; P; \sigma; \alpha)$. The optimization of the above objective can then be accomplished by iterating the following two steps.
First step: compute the parameter update:
$$\Delta q = -H_{3D}^{-1} \left[ \begin{pmatrix} \Delta p_{SD} \\ 0 \end{pmatrix} + w_m \sum_i \left( \frac{\partial F_i}{\partial q} \right)^{\!T} F_i(q) \right],$$
where $\Delta p_{SD}$ is the steepest-descent update of the 2D part, $H_{3D}$ is the Hessian matrix of the combined 2D+3D active appearance model, and the $F_i$ are the individual constraint terms, defined concretely as:
$$\Delta p_{SD} = \sum_{x \in s_0} SD(x) \cdot \big( A(x) - I(W(x; p)) \big),$$
$$H_{3D} = \begin{pmatrix} H_{2D} & 0 \\ 0 & 0 \end{pmatrix} + w_m \sum_i \left( \frac{\partial F_i}{\partial q} \right)^{\!T} \left( \frac{\partial F_i}{\partial q} \right),$$
$$H_{2D} = \sum_{x \in s_0} SD^T(x) \cdot SD(x),$$
$$SD(x) = \left[ \nabla A_0(x) \frac{\partial N}{\partial q} \;\; \nabla A_0(x) \frac{\partial W}{\partial p} \right]_{\operatorname{span}(A_i)^\perp}.$$
Second step: update the parameters. The parameter p is updated with the inverse compositional rule, while P, σ and α are updated additively.
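The first-step update formula can be sketched as follows, assuming that the 2D steepest-descent update Δp_SD, the 2D Hessian H_2D, and the stacked constraint residuals F_i with their Jacobians ∂F_i/∂q have already been computed elsewhere; only the assembly of H_3D and the solve for Δq are shown.

```python
import numpy as np

def update_step(H2d, dp_sd, F, J, wm=0.1):
    """One Gauss-Newton update of the combined 2D+3D objective.
    H2d: (k2, k2) Hessian of the 2D part; dp_sd: (k2,) steepest-descent
    update of the 2D part; F: (nc,) stacked constraint residuals F_i;
    J: (nc, kq) stacked Jacobians dF_i/dq, with kq >= k2.
    Returns the update dq for all parameters q = (p; P; sigma; alpha)."""
    kq = J.shape[1]
    H3d = np.zeros((kq, kq))            # embed H2d in the top-left block,
    H3d[:H2d.shape[0], :H2d.shape[1]] = H2d
    H3d += wm * J.T @ J                 # add the weighted constraint part
    rhs = np.zeros(kq)
    rhs[:dp_sd.shape[0]] = dp_sd        # [dp_sd; 0] padded to length kq
    rhs += wm * J.T @ F
    return -np.linalg.solve(H3d, rhs)   # dq = -H3d^{-1} [ ... ]
```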
It should be noted that, in order to keep the description concise, the above formulation omits a correction that is needed when taking the first-order Taylor expansion of $F_i(p; P; \sigma; \alpha)$: the increment of $F_i$ with respect to p must be mapped from the corresponding increment defined on the mean shape.
In the combined 2D+3D deformable face model active appearance model fitting algorithm above, the face pose parameters, the personalized shape parameters σ and the facial action parameters α can be solved for simultaneously. In a realistic tracking setting, the personalized shape parameters σ are usually determined in the tracking initialization stage and kept fixed during subsequent tracking; they are recomputed only when tracking fails and the mesh model needs to be reinitialized. During subsequent tracking, only the facial action parameters α need to be tracked. This brings two benefits:
1) It reduces the number of parameters that must be solved for during tracking, thereby improving tracking speed, while still providing the necessary action parameters.
2) Fixing the personalized shape coefficients imposes a stronger constraint on the shapes that the 2D active appearance model can generate.
The above description covers embodiments used to realize the present invention, and the scope of the present invention should therefore not be taken to be limited by it. Those skilled in the art should understand that any modification or partial replacement that does not depart from the scope of the present invention falls within the scope defined by the claims of the present invention.

Claims (8)

1. A method for tracking human face pose and actions, comprising the steps of:
Step S1: extracting images frame by frame from a video stream, performing face detection on the first frame image of the input video or whenever tracking fails, and obtaining a face bounding box;
Step S2: during normal tracking, after the iteration on the previous frame image has converged, selecting feature points with relatively salient texture in the face region of the previous frame image, finding the corresponding matching points in the current frame image, and obtaining the matching results of these feature points;
Step S3: initializing the shape of an active appearance model from the face bounding box or the feature-point matching results, and obtaining the initial value of the face shape in the current frame image;
Step S4: fitting the active appearance model with the inverse compositional algorithm to obtain the 3D face pose and facial action parameters.
2. The method for tracking human face pose and actions according to claim 1, further comprising the step of: performing face detection with an AdaBoost-based method, obtaining the position and size of the face in the image.
3. The method for tracking human face pose and actions according to claim 1, wherein the feature points are selected by first computing the value of the determinant of the Hessian matrix at each pixel, and then choosing pixels with relatively large Hessian determinants as the feature points.
4. The method for tracking human face pose and actions according to claim 1, wherein the previous frame image and the current frame image are each represented at multiple resolutions, in order to increase the speed of feature-point matching.
5. The method for tracking human face pose and actions according to claim 1, wherein the feature-point matching uses block matching within a multi-resolution image framework to find, in the current frame image, the points that match the feature points of the previous frame image.
6. The method for tracking human face pose and actions according to claim 1, wherein the active appearance model uses a multiband texture representation fusing gray-level and edge information, in order to improve the adaptability of the active appearance model algorithm to illumination changes and its generalization to unseen faces.
7. The method for tracking human face pose and actions according to claim 1, wherein the steps of initializing the shape of the active appearance model from the feature-point matching results are as follows:
Step S31: selecting feature points in the previous frame image, based on the located face;
Step S32: computing the barycentric coordinates of each feature point within the triangle it belongs to;
Step S33: performing feature-point matching in the current frame image;
Step S34: estimating the face shape from the matched feature points.
8. The method for tracking human face pose and actions according to claim 1, wherein the active appearance model comprises a 2D face shape model, a 2D face texture model, a 3D deformable face model and an imaging model; the imaging model adopts full perspective projection, so as to obtain accurate face pose parameters.
CN2010102780635A 2010-09-08 2010-09-08 Method for tracking gestures and actions of human face Pending CN102402691A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102780635A CN102402691A (en) 2010-09-08 2010-09-08 Method for tracking gestures and actions of human face


Publications (1)

Publication Number Publication Date
CN102402691A 2012-04-04

Family

ID=45884881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102780635A Pending CN102402691A (en) 2010-09-08 2010-09-08 Method for tracking gestures and actions of human face

Country Status (1)

Country Link
CN (1) CN102402691A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1811793A (en) * 2006-03-02 2006-08-02 复旦大学 Automatic positioning method for characteristic point of human faces
CN1866271A (en) * 2006-06-13 2006-11-22 北京中星微电子有限公司 AAM-based head pose real-time estimating method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周明才: "表情互动中的若干关键问题研究" (Research on several key problems in expression interaction), 《HTTP://LIB.IA.AC.CN/DLIB/LIST.ASP?LANG=GB&TYPE=&DOCGROUPID=10&DOCID=10071》 *

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750527A (en) * 2012-06-26 2012-10-24 浙江捷尚视觉科技有限公司 Long-time stable human face detection and tracking method in bank scene and long-time stable human face detection and tracking device in bank scene
CN102750527B (en) * 2012-06-26 2015-08-19 浙江捷尚视觉科技股份有限公司 The medium-term and long-term stable persona face detection method of a kind of bank scene and device
CN102867173A (en) * 2012-08-28 2013-01-09 华南理工大学 Human face recognition method and system thereof
CN102867173B (en) * 2012-08-28 2015-01-28 华南理工大学 Human face recognition method and system thereof
CN102867174B (en) * 2012-08-30 2016-01-20 中国科学技术大学 A kind of human face characteristic positioning method and device
CN102867174A (en) * 2012-08-30 2013-01-09 中国科学技术大学 Method and device for positioning human face features
CN102880866A (en) * 2012-09-29 2013-01-16 宁波大学 Method for extracting face features
CN102880866B (en) * 2012-09-29 2014-12-17 宁波大学 Method for extracting face features
CN109711304A (en) * 2013-05-21 2019-05-03 深圳市腾讯计算机系统有限公司 A kind of man face characteristic point positioning method and device
CN109711304B (en) * 2013-05-21 2022-06-14 深圳市腾讯计算机系统有限公司 Face feature point positioning method and device
CN103400119A (en) * 2013-07-31 2013-11-20 南京融图创斯信息科技有限公司 Face recognition technology-based mixed reality spectacle interactive display method
CN103400119B (en) * 2013-07-31 2017-02-15 徐坚 Face recognition technology-based mixed reality spectacle interactive display method
CN103870824A (en) * 2014-03-28 2014-06-18 海信集团有限公司 Method and device for capturing face in face detecting and tracking process
CN103870824B (en) * 2014-03-28 2017-10-20 海信集团有限公司 A kind of face method for catching and device during Face datection tracking
CN106462738B (en) * 2014-05-20 2020-10-09 依视路国际公司 Method for constructing a model of a person's face, method and apparatus for analyzing a pose using such a model
US10380411B2 (en) 2014-05-20 2019-08-13 Essilor International Method for constructing a model of the face of a person, method and device for posture analysis using such a model
CN106462738A (en) * 2014-05-20 2017-02-22 埃西勒国际通用光学公司 Method for constructing a model of the face of a person, method and device for posture analysis using such a model
CN104392447A (en) * 2014-11-28 2015-03-04 西南科技大学 Image matching method based on gray scale gradient
CN104392447B (en) * 2014-11-28 2017-10-13 西南科技大学 A kind of image matching method based on shade of gray
CN105989326A (en) * 2015-01-29 2016-10-05 北京三星通信技术研究有限公司 Method and device for determining three-dimensional position information of human eyes
CN105989326B (en) * 2015-01-29 2020-03-03 北京三星通信技术研究有限公司 Method and device for determining three-dimensional position information of human eyes
CN105447462A (en) * 2015-11-20 2016-03-30 小米科技有限责任公司 Facial pose estimation method and device
CN105447462B (en) * 2015-11-20 2018-11-20 小米科技有限责任公司 Face pose estimation and device
CN106228113A (en) * 2016-07-12 2016-12-14 电子科技大学 Human face characteristic point quick alignment method based on AAM
CN106295511A (en) * 2016-07-26 2017-01-04 北京小米移动软件有限公司 Face tracking method and device
CN106295511B (en) * 2016-07-26 2019-05-21 北京小米移动软件有限公司 Face tracking method and device
CN106778474A (en) * 2016-11-14 2017-05-31 深圳奥比中光科技有限公司 3D human body recognition methods and equipment
CN108229246A (en) * 2016-12-14 2018-06-29 上海交通大学 Real-time three-dimensional human face posture method for tracing based on vehicle computing machine platform
CN108345821A (en) * 2017-01-24 2018-07-31 成都理想境界科技有限公司 Face tracking method and apparatus
CN108345821B (en) * 2017-01-24 2022-03-08 成都理想境界科技有限公司 Face tracking method and device
CN107316029B (en) * 2017-07-03 2018-11-23 腾讯科技(深圳)有限公司 A kind of living body verification method and equipment
CN107316029A (en) * 2017-07-03 2017-11-03 腾讯科技(深圳)有限公司 A kind of live body verification method and equipment
WO2019097285A1 (en) * 2017-08-31 2019-05-23 Banuba Limited Computer-implemented methods and computer systems for real-time detection of human's emotions from visual recordings
WO2019042419A1 (en) * 2017-09-04 2019-03-07 腾讯科技(深圳)有限公司 Image tracking point acquisition method and device, and storage medium
US11164323B2 (en) * 2017-09-04 2021-11-02 Tencent Technology (Shenzhen) Company Limited Method for obtaining image tracking points and device and storage medium thereof
CN107633526A (en) * 2017-09-04 2018-01-26 腾讯科技(深圳)有限公司 A kind of image trace point acquisition methods and equipment, storage medium
CN108875506A (en) * 2017-11-17 2018-11-23 北京旷视科技有限公司 Face shape point-tracking method, device and system and storage medium
CN108875506B (en) * 2017-11-17 2022-01-07 北京旷视科技有限公司 Face shape point tracking method, device and system and storage medium
CN107992825A (en) * 2017-12-01 2018-05-04 青岛海尔智能家电科技有限公司 A kind of method and system of the recognition of face based on augmented reality
CN109874021A (en) * 2017-12-04 2019-06-11 腾讯科技(深圳)有限公司 Living broadcast interactive method, apparatus and system
WO2019157922A1 (en) * 2018-02-13 2019-08-22 视辰信息科技(上海)有限公司 Image processing method and device and ar apparatus
CN108510520B (en) * 2018-02-13 2019-03-08 视辰信息科技(上海)有限公司 A kind of image processing method, device and AR equipment
CN108510520A (en) * 2018-02-13 2018-09-07 视辰信息科技(上海)有限公司 A kind of image processing method, device and AR equipment
CN109859322A (en) * 2019-01-22 2019-06-07 广西大学 A kind of spectrum posture moving method based on deformation pattern
CN109859322B (en) * 2019-01-22 2022-12-06 广西大学 Spectral attitude migration method based on deformation graph
CN112464918A (en) * 2021-01-27 2021-03-09 昆山恒巨电子有限公司 Body-building action correcting method and device, computer equipment and storage medium
WO2023087891A1 (en) * 2021-11-18 2023-05-25 中兴通讯股份有限公司 Real-time facial image driving method and apparatus, electronic device, and storage medium
CN114863506A (en) * 2022-03-18 2022-08-05 珠海优特电力科技股份有限公司 Method, device and system for verifying access permission and identity authentication terminal

Similar Documents

Publication Publication Date Title
CN102402691A (en) Method for tracking gestures and actions of human face
Zuffi et al. Lions and tigers and bears: Capturing non-rigid, 3d, articulated shape from images
Dai et al. A 3d morphable model of craniofacial shape and texture variation
CN100416612C (en) Video flow based three-dimensional dynamic human face expression model construction method
Burl et al. A probabilistic approach to object recognition using local photometry and global geometry
CN101968846B (en) Face tracking method
US20220358770A1 (en) Scene reconstruction in three-dimensions from two-dimensional images
CN101499128B (en) Three-dimensional human face action detecting and tracing method based on video stream
US9298257B2 (en) Apparatus and method for controlling avatar using expression control point
CN102999942B (en) Three-dimensional face reconstruction method
Blake et al. Active contours: the application of techniques from graphics, vision, control theory and statistics to visual tracking of shapes in motion
Balan et al. Detailed human shape and pose from images
CN101964064B (en) Human face comparison method
Agudo et al. Simultaneous pose and non-rigid shape with particle dynamics
Bascle et al. Stereo matching, reconstruction and refinement of 3D curves using deformable contours
CN104036546A (en) Method for carrying out face three-dimensional reconstruction at any viewing angle on basis of self-adaptive deformable model
CN102376100A (en) Single-photo-based human face animating method
CN102654903A (en) Face comparison method
CN103733226A (en) Fast articulated motion tracking
Pan et al. Sketch-based skeleton-driven 2D animation and motion capture
CN105893984A (en) Face projection method for facial makeup based on face features
CN103714556A (en) Moving target tracking method based on pyramid appearance model
Bao et al. High-quality face capture using anatomical muscles
Chen et al. Single and sparse view 3d reconstruction by learning shape priors
Fayad et al. Non-rigid Structure from Motion using Quadratic Deformation Models.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120404