CN103258204A - Automatic micro-expression recognition method based on Gabor features and edge orientation histogram (EOH) features - Google Patents


Info

Publication number
CN103258204A
CN103258204A (application CN201210041341.4A; granted as CN103258204B)
Authority
CN
China
Prior art keywords
expression
gabor
feature
training
eoh
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100413414A
Other languages
Chinese (zh)
Other versions
CN103258204B (en)
Inventor
吴奇
申寻兵
傅小兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Psychology of CAS
Original Assignee
Institute of Psychology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Psychology of CAS filed Critical Institute of Psychology of CAS
Priority to CN201210041341.4A priority Critical patent/CN103258204B/en
Publication of CN103258204A publication Critical patent/CN103258204A/en
Application granted granted Critical
Publication of CN103258204B publication Critical patent/CN103258204B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides an automatic micro-expression recognition method. The method includes the following steps: step 10, capturing the face regions in the frame images of a video and preprocessing them; step 20, extracting Gabor features and EOH features from the corresponding face-region images; step 30, fusing the features to obtain the final representation of the target video, and obtaining an expression label sequence for each video frame through a trained classifier; step 40, scanning the expression label sequence, judging the duration of each expression, and outputting the expression class of each micro-expression found.

Description

An automatic micro-expression recognition method based on Gabor and EOH features
Technical field
The present invention relates to expression recognition and image recognition technology, and more specifically to an automatic micro-expression recognition method.
Background technology
The current international political situation is turbulent, and terrorist activity breaks out frequently in many places. Scientists and engineers around the world are searching for behavioral cues associated with violence and extreme behavior, and are attempting to develop technologies and methods that can detect such behavior.
Micro-expressions are closely tied to the processing of inner emotional information. They cannot be faked and are not under conscious control, so they reflect a person's true emotions and intentions. Micro-expressions are therefore an effective cue for lie detection and the detection of dangerous intent. The U.S. Department of Defense, the Central Intelligence Agency, the Department of Homeland Security, the Transportation Security Administration and others have even begun taking Ekman's training courses in order to use micro-expressions in counter-terrorism work.
Yet both the basic understanding of micro-expressions and their practical application remain very limited, mainly because research on micro-expression expression, the foundation of micro-expression research, is still in its infancy. Because the duration of a micro-expression is extremely short, untrained observers find it difficult to identify micro-expressions correctly. Although researchers have released micro-expression training tools, Frank, Herbasz, Sinuk, Keller and Nolan found that even after training with the Micro Expression Training Tool (METT), subjects still performed poorly at recognizing real-world micro-expressions (accuracy of only about 40%). At present, micro-expression researchers and practitioners who want to identify micro-expressions accurately must code videos that may contain micro-expressions frame by frame with the Facial Action Coding System (FACS). But FACS training is time-consuming: a coder generally needs about 100 hours of training to reach preliminary proficiency. Coding itself is also very time-consuming: coding 10 minutes of video takes at least 10 hours.
In the study by Porter and colleagues, even though only a low-speed camera (30 frames/second) was used to make short recordings of subjects' facial expressions, the video to be analyzed still ran to 300,000 frames. Even so, the number of micro-expressions captured was so small that the micro-expressions could only be summarized with descriptive statistics; inferential statistics could not be applied. If a high-speed camera were used to meet the demands of statistical analysis, the number of frames requiring manual coding would grow several-fold; and if facial expressions were recorded continuously (for example, in a hearing), the number of videos requiring manual coding would grow faster still. The slow coding speed and the massive volume of video data requiring manual processing make micro-expression coding laborious, time-consuming work and make both research and practice hard to sustain. This is the biggest obstacle facing current micro-expression research and application. Whether for research or for application, there is therefore a great demand for automated micro-expression analysis tools, and developing such tools is the top priority for micro-expression researchers today.
Combining the techniques and methods of computer vision with the relevant findings of psychology makes it possible to develop an automated micro-expression recognition system. In fact, two independent research groups, one in the United States and one in Japan, are already exploring this problem.
In 2009, Polikovsky, Kameda and Ohta proposed a method that uses 3D gradient orientation histograms to extract facial motion information. They divide the face into 12 regions of interest (ROI) and characterize each region by tracking it through the video and extracting its 3D gradient orientation histogram. They also collected a 'micro-expression database': subjects were asked to produce facial expressions with the lowest possible intensity and the fastest possible speed, and the process was filmed with a high-speed camera. K-means clustering results on this database show that the 3D gradient orientation histogram can effectively characterize the facial action units of different facial regions in their different phases.
From 2009 to 2011, Shreve et al. proposed a new method of expression video segmentation. They use optical flow to compute optical strain. They divide the face into 8 regions of interest and mask out the eyes, nose and mouth with a T-shaped marker. By comparing the optical strain computed in each region of interest with a threshold obtained by training, they segment video into macro-expressions (ordinary expressions) and micro-expressions.
Both lines of research are still far from the goal of automatic micro-expression recognition; they are only preliminary studies. The method of Polikovsky et al. can only distinguish the different AUs of different facial regions and their phases; it cannot directly measure expressions or their duration, yet measuring expression duration is a function that an automatic micro-expression recognition system must provide. Moreover, the way micro-expressions were elicited in that study is highly problematic: the micro-expressions were imitated on request, with expression intensity kept as low as possible, whereas researchers currently hold that micro-expressions are hard to fake and that what distinguishes a micro-expression from an ordinary expression is its very short duration, not its intensity. The method Shreve et al. proposed in 2009 could not segment video containing both micro-expressions and macro-expressions; in 2011 they first segmented both within a unified framework, but the results show that the method's capture rate for micro-expressions is very low (50%) and its false alarm rate very high (50%). Their micro-expression data set suffers from the same serious problem of being collected by asking subjects to imitate expressions. Most critically, the method of Shreve et al. only segments expression video; it cannot identify the class of the expressions a video contains.
Summary of the invention
To overcome the above defects in the prior art, the present invention proposes an automatic micro-expression recognition method.
According to one aspect of the present invention, an automatic micro-expression recognition method is proposed, comprising: step 10), capturing the face region in the frame images of a video and preprocessing it; step 20), extracting Gabor features and EOH features from the corresponding face-region images; step 30), fusing the features to obtain the final representation of the target video, and obtaining an expression label sequence for each video frame through a trained classifier; step 40), scanning the expression label sequence, judging the duration of each expression, and outputting the expression class of each micro-expression found.
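The scanning of step 40) can be illustrated as a run-length scan over the per-frame label sequence: a run of identical non-neutral labels whose duration falls under a threshold counts as a micro-expression. The sketch below is ours; the 0.5-second threshold and the label names are assumptions, not values fixed by the invention.

```python
from itertools import groupby

def detect_micro_expressions(labels, fps=30, max_duration_s=0.5):
    """Scan a per-frame expression label sequence (step 40 sketch).

    A run of consecutive identical non-neutral labels lasting at most
    `max_duration_s` seconds is reported as a micro-expression. The
    0.5 s cutoff is an assumed value, not taken from the patent.
    Returns (label, first_frame, last_frame) triples.
    """
    micro = []
    frame = 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label != "neutral" and n / fps <= max_duration_s:
            micro.append((label, frame, frame + n - 1))
        frame += n
    return micro

seq = ["neutral"] * 10 + ["fear"] * 8 + ["neutral"] * 12
print(detect_micro_expressions(seq))  # [('fear', 10, 17)]
```

At 30 frames/second an 8-frame run lasts about 0.27 s, so it is reported; a run longer than the cutoff would be treated as an ordinary (macro) expression and skipped.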
The method of the present application improves recognition speed and accuracy as well as the speed and efficiency of training, making practical micro-expression recognition possible.
Description of drawings
Fig. 1 is the flowchart of the micro-expression recognition method according to the present invention.
As shown in the figure, specific structures and devices are marked in the drawings in order to clearly illustrate the embodiments of the invention, but this is for illustration only and is not intended to limit the invention to these particular structures, devices and environments. According to concrete needs, those of ordinary skill in the art can adjust or modify these devices and environments, and such adjustments and modifications remain within the scope of the appended claims.
Embodiment
The automatic micro-expression recognition method provided by the invention is described in detail below in conjunction with the drawings and specific embodiments. In the following description, several different aspects of the present invention are described; however, those skilled in the art may practice the invention using only some or all of its structures or flows. Specific numbers, configurations and orders are set forth for definiteness of explanation, but it should be clear that the invention may also be practiced without these specific details. In other cases, well-known features are not described in detail so as not to obscure the invention.
Generally speaking, when performing micro-expression recognition, both the Gabor features and the EOH features of the micro-expression are extracted. After the combined Gabor+EOH representation of the target is obtained, the features are not all used directly for training; they are filtered first. Experiments show that it is effective to filter only the EOH features. The filtering method is as follows: with the Gentleboost (MI+DWT) obtained in the initial stage, run one round of training on the training samples after initializing the sample weights, obtaining the weighted error rate of the weak classifier on each feature dimension; if this error rate is higher than a preset threshold a, the feature is discarded, otherwise it is kept. After filtering, feature selection is performed with PreAvgGentleboost_EOHfilter. This method greatly reduces the computational complexity of the algorithm (cutting the feature count by about 74%) while preserving the classification precision of the classifier, making it possible to use Gabor+EOH features for micro-expression recognition.
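The EOH screening round described above might be sketched as follows: each feature dimension is scored by the weighted error of a simple single-threshold stump, and dimensions whose error exceeds the preset threshold are dropped. The median split, the stump form, and the cutoff 0.45 are our assumptions, standing in for the trained weak classifier and the threshold a.

```python
import numpy as np

def filter_eoh_features(feat, y, w, err_threshold=0.45):
    """Keep only feature dimensions on which a one-threshold decision
    stump achieves weighted error below `err_threshold`.

    feat: (n_samples, n_features) array; y: labels in {-1, +1};
    w: per-sample weights. The median threshold and the 0.45 cutoff
    are illustrative assumptions, not values from the patent.
    """
    keep = []
    for j in range(feat.shape[1]):
        theta = np.median(feat[:, j])
        pred = np.where(feat[:, j] > theta, 1, -1)
        # try both stump polarities, take the better one
        err = min(np.dot(w, pred != y), np.dot(w, -pred != y))
        if err < err_threshold:
            keep.append(j)
    return keep

y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
feat = np.stack([np.array([5.0, 6, 7, 8, 1, 2, 3, 4]),   # informative
                 np.array([1.0, 2, 1, 2, 1, 2, 1, 2])],  # uninformative
                axis=1)
w = np.full(8, 1 / 8)
print(filter_eoh_features(feat, y, w))  # [0]
```

The informative column separates the classes perfectly (error 0) and survives; the uninformative column has weighted error 0.5 under either polarity and is filtered out.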
When PreAvgGentleboost_EOHfilter performs feature selection, it considers both the weighted error and the weighted squared error of a weak classifier, rather than the weighted squared error alone. PreAvgGentleboost_EOHfilter sorts by this composite index, arranging the weak classifiers obtained in each training iteration in ascending order of the index. Weak classifiers are then selected from this list according to the selection method described previously. This solves the problem that directly adopting Gabor+EOH features can damage classifier accuracy. SVM and PreAvgGentleboost_EOHfilter are combined and used for training to obtain the new classifier.
Automatic micro-expression recognition based on Gabor features
The human visual system can be regarded approximately as a hierarchical structure in which visual information is processed layer by layer, with the primary visual cortex as the foundation of the hierarchy. It can therefore be inferred that the response of the human primary visual cortex to a static expression picture contains information useful for distinguishing different expressions; that is, the way the primary visual cortex represents a static expression picture is an effective representation for expression recognition. If a computer vision system can simulate or imitate this representation and analyze it in some way, then with the processing speed of a computer system, the resulting computer vision system may scan and identify expressions in a manner equivalent to a human, but at extreme speed. Such a system could identify every expression of every person in every frame of a video; after analyzing the recognition results, it should be able to tell, with a certain accuracy, whether the video contains micro-expressions and which expressions those micro-expressions are.
The wavelet transform is a mathematical tool developed over the past decade or so. Among wavelets, the Gabor wavelet transform, as a method of feature extraction and image representation, has been widely used in pattern recognition. The Gabor wavelet transform can achieve optimal localization in the spatial and frequency domains simultaneously, and can therefore describe well the local structure corresponding to spatial frequency (also called scale), spatial localization, and orientation selectivity. Researchers have verified that the filter responses of most primary-visual-cortex simple cells in mammals (including humans) can be simulated by a family of self-similar two-dimensional Gabor wavelets.
Face detection and preprocessing method
To improve the degree of automation of the system, the face detection algorithm proposed by Kienzle, Bakir, Franz and Scholkopf is adopted to capture faces in the video automatically. The algorithm uses five layers of support vector machines (SVM) to form a cascaded face detection system, in which the first-layer SVM is improved with a rank-deficient method to raise the detection speed of the system. The algorithm can detect faces in video in real time on a Pentium 4 2.8 GHz system. Its test results on the MIT-CMU face database show very high accuracy (hit rate 93.1%, false alarm rate 0.034%).
The image preprocessing process is as follows: after face detection finishes, the captured face image is first converted into an 8-bit grayscale image; it is then normalized to 48 x 48 pixels by bilinear interpolation. In this application, embodiment 1 and embodiment 2 both preprocess images in this way.
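The two preprocessing steps (8-bit grayscale conversion, then bilinear normalization to 48 x 48) might be sketched as follows with NumPy; the luminance weights and all helper names are ours, not from the patent.

```python
import numpy as np

def preprocess_face(img):
    """Convert a captured face image to 8-bit grayscale and bilinearly
    interpolate it to 48 x 48 pixels (preprocessing sketch).

    `img` is an (H, W) grayscale or (H, W, 3) RGB uint8 array. The
    ITU-R 601 luminance weights are our assumption.
    """
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 3:  # RGB -> luminance
        img = img @ np.array([0.299, 0.587, 0.114])
    h, w = img.shape
    ys = np.linspace(0, h - 1, 48)
    xs = np.linspace(0, w - 1, 48)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    # blend the four neighbouring pixels with bilinear weights
    out = (img[np.ix_(y0, x0)] * (1 - wy) * (1 - wx)
           + img[np.ix_(y0, x1)] * (1 - wy) * wx
           + img[np.ix_(y1, x0)] * wy * (1 - wx)
           + img[np.ix_(y1, x1)] * wy * wx)
    return out.round().astype(np.uint8)

face = (np.random.default_rng(0).random((120, 96, 3)) * 255).astype(np.uint8)
print(preprocess_face(face).shape)  # (48, 48)
```

A production system would normally delegate this to an image library; the point here is only that every face, whatever its captured size, enters feature extraction as a fixed 48 x 48 grayscale patch.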
The Gabor representation of facial expressions
A two-dimensional Gabor filter bank is used to extract features from the captured face image and form the Gabor representation of the facial expression. A two-dimensional Gabor filter is a plane wave with a Gaussian envelope; it can accurately extract local image features and has a degree of robustness to displacement, deformation, rotation, scale change and illumination change. Its Gabor kernel is defined as:
Ψ_{u,v}(z) = (‖k_{u,v}‖²/σ²) · exp(−‖k_{u,v}‖²‖z‖²/(2σ²)) · (exp(i·k_{u,v}·z) − exp(−σ²/2)). (1)
where k_{u,v} = k_v · (cos φ_u, sin φ_u), with k_v = k_max/f^v and φ_u = uπ/8. k_max is the maximum frequency, f is the spacing factor between the Gabor kernels in the frequency domain, and φ_u ∈ [0, π) determines the orientation of the Gabor filter. z = (x, y) denotes position, ‖·‖ denotes the modulus operation, and the parameters u and v control the orientation and scale of the Gabor filter respectively.
The first term of formula (1) determines the oscillatory part of the filter; the second compensates for the DC component, eliminating the sensitivity of the filter response to global illumination changes. The parameter σ controls the number of oscillations under the envelope function. When extracting the Gabor features of an expression image, one generally chooses several scales and orientations to form a bank of Gabor kernels, and then convolves the kernels with the given image to produce the image's Gabor features. The design of the Gabor filter bank depends on the choice of the parameters u, v, k_max, f and σ. In this application, a Gabor filter bank of 9 scales and 8 orientations is used to extract features from the captured face image. The concrete parameters are chosen as follows:
σ = 2π, k_max = π/2, f = √2, v = 0, …, 8, u = 0, …, 7. (2)
The Gabor representation |O(z)|_{u,v} of a facial expression image I(z) is produced by convolving the image with the Gabor filters, that is:
|O(z)|_{u,v} = √( Re(O(z))_{u,v}² + Im(O(z))_{u,v}² ),
Re(O(z))_{u,v} = I(z) * Re(Ψ_{u,v}(z)),
Im(O(z))_{u,v} = I(z) * Im(Ψ_{u,v}(z)). (3)
Each |O(z)|_{u,v} is converted into a column vector O_{u,v}; the O_{u,v} are then concatenated in order of scale and orientation to form a single column vector G(I), that is:
G(I) = O = (O_{0,0} O_{0,1} … O_{8,7}). (4)
Because there are multiple orientations and scales, the final Gabor feature dimensionality is as high as 48 × 48 × 9 × 8 = 165888, with very high redundancy.
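Equations (1)-(4) can be exercised end to end with a small NumPy sketch. The 17 x 17 kernel window and the FFT-based convolution are our assumptions (the patent does not fix either), but the 9 scales and 8 orientations reproduce the 165888-dimensional representation stated above.

```python
import numpy as np

def gabor_kernel(u, v, size=17, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Complex Gabor kernel of Eq. (1) with k_v = k_max / f**v, phi_u = u*pi/8.
    The 17 x 17 window size is an assumption."""
    kv = k_max / f ** v
    phi = u * np.pi / 8
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2 = kv ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    # oscillation minus the DC-compensation term of Eq. (1)
    carrier = np.exp(1j * kv * (np.cos(phi) * x + np.sin(phi) * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

def conv_same(img, ker):
    """FFT-based linear convolution cropped back to the image size."""
    H, W = img.shape
    h, w = ker.shape
    F = np.fft.ifft2(np.fft.fft2(img, (H + h - 1, W + w - 1)) *
                     np.fft.fft2(ker, (H + h - 1, W + w - 1)))
    return F[h // 2:h // 2 + H, w // 2:w // 2 + W]

def gabor_representation(img, scales=9, orients=8):
    """Eqs. (3)-(4): concatenate magnitude responses |O(z)|_{u,v}
    over all scales and orientations into one feature vector G(I)."""
    img = np.asarray(img, float)
    parts = [np.abs(conv_same(img, gabor_kernel(u, v))).ravel()
             for v in range(scales) for u in range(orients)]
    return np.concatenate(parts)

G = gabor_representation(np.random.default_rng(0).random((48, 48)))
print(G.shape)  # (165888,)
```

For a 48 x 48 input, 9 scales x 8 orientations gives 48 x 48 x 72 = 165888 dimensions, matching the redundancy figure in the text and motivating the feature selection described next.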
Embodiment 1: facial expression recognition combining Gabor features and Gentleboost
To analyze micro-expressions in video, one must first be able to identify ordinary expressions in static pictures. Embodiment 1 first assesses the facial expression recognition performance of the algorithm on the Cohn-Kanade (CK) facial expression database.
The database contains videos of 6 basic expressions from 100 university students (age range: 18-30 years). Among the models, 65% are female, 15% are African-American, and 3% are Asian or Latino. The capture procedure: the model performs the prescribed expression action as instructed, and a camera records the model's frontal facial expression as an analog S-video signal. The videos are finally stored as 640 × 480 8-bit grayscale images.
In embodiment 1, neutral-expression images and one or two peak-phase images of the 6 basic expressions are selected from 374 videos of 97 of the models to form the data set; 518 expression pictures are finally selected. To test the generalization performance of the algorithm, 10-fold cross validation is used for assessment.
Gentleboost is first adopted as the classifier. Gentleboost is a committee machine: it solves complex learning tasks by the principle of divide and conquer. It simulates the process of human group decision-making, seeking 'experts' that acquire knowledge about the problem and forming a final description of the problem by letting these experts vote. Gentleboost converts a weak learning model into a strong learning model by changing the distribution of the samples. The Gentleboost algorithm is therefore applicable to a classification problem as complex as expression recognition, and performs outstandingly. Because the Gentleboost algorithm implicitly performs feature selection during training, and the adopted Gabor representation has very high feature dimensionality, Gentleboost selects only a small number of features for the final classification, so the resulting classifier has very high classification speed and generalization performance.
Gentleboost is chosen over the more common Adaboost because researchers have shown that Gentleboost converges faster and is more accurate for object recognition. In embodiment 1, a Gentleboost improved with the mutual information (MI) and dynamical weight trimming (DWT) methods performs the expression recognition.
Because the adopted Gabor features are highly redundant, the mutual information method is used here to remove the information redundancy among the selected weak classifiers, rejecting ineffective weak classifiers and boosting the performance of Gentleboost.
Mutual information is a measure of the correlation between two random variables. The mutual information of two random variables X and Y may be defined as:
I(X;Y) = H(X) + H(Y) − H(X,Y) = H(X) − H(X|Y) = H(Y) − H(Y|X). (5)
If the random variable is discrete, its entropy H(X) may be defined as:
H(X)=-∑p(x)lg(p(x)) (6)
Suppose that at round T+1 of training, T weak classifiers {h_{v(1)}, h_{v(2)}, …, h_{v(T)}} have already been selected; the function giving the maximum MI between a candidate classifier and the selected classifiers may then be defined as:
R(h_j) = max_{t=1,2,…,T} MI(h_j, h_{v(t)}). (7)
where MI(h_j, h_{v(t)}) is the MI between the random variables h_j and h_{v(t)}.
R(h_j) is compared with a predetermined threshold TMI (threshold of mutual information) to judge whether the weak classifier should be discarded, guaranteeing that the information of a newly trained weak classifier is not already contained in the set of selected weak classifiers. If R(h_j) < TMI, the weak classifier is effective and is added to the weak classifier set; otherwise it is discarded, and the algorithm selects a new weak classifier from the candidate set until a qualified weak classifier is found. If no weak classifier qualifies, training stops. In this process the algorithm considers not only the MI of a weak classifier but also its performance: in each selection round, the algorithm selects the weak classifier that has the minimum weighted-squared error on the training set and also meets the MI condition.
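The MI screening of Eqs. (5)-(7) might look as follows on discrete weak-classifier outputs. The TMI value of 0.2 is a placeholder of ours, not a value taken from the patent, and base-2 logarithms are assumed.

```python
import numpy as np

def entropy(x):
    """Empirical entropy (base 2) of a discrete sample, Eq. (6)."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), Eq. (5), on paired discrete samples."""
    xy = np.stack([x, y], axis=1)
    _, counts = np.unique(xy, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = -np.sum(p * np.log2(p))
    return entropy(x) + entropy(y) - h_xy

def redundant(candidate, selected, tmi=0.2):
    """Eq. (7): R(h_j) >= TMI means the candidate duplicates information
    already captured by some selected weak classifier. TMI is assumed."""
    if not selected:
        return False
    return max(mutual_info(candidate, h) for h in selected) >= tmi

# outputs (+/-1) of weak classifiers on four training samples
h1 = np.array([1, 1, -1, -1])
h2 = np.array([1, -1, 1, -1])  # statistically independent of h1
print(round(mutual_info(h1, h1), 3), redundant(h2, [h1]))  # 1.0 False
```

A copy of an already-selected classifier shares its full 1 bit of information (MI = 1.0) and is rejected, while the independent classifier has MI = 0 and passes the screen.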
Dynamical weight trimming
After adding MI, selecting weak classifiers according to MI increases the number of iterations of the algorithm, so the training time becomes even longer than before the modification, making the training speed problem more serious. Since the training time of the algorithm is long, DWT is extended to Gentleboost here; the method is still referred to as DWT.
In each iteration of the algorithm, the samples of the training set are filtered according to their sample weights: if a sample weight w_i < t(β), the sample is rejected from the current training set. t(β) is the β-th percentile of the sample weight distribution in the current iteration.
Weak classifiers and the multiclass classification problem
A regression stump is adopted as the weak classifier of Gentleboost, namely:
h_t(x) = a·δ(x_f > θ) + b. (8)
In the formula above, x_f is the f-th feature of the feature vector x, θ is a threshold, δ is the indicator function, and a and b are regression parameters, where
b = Σ_i w_i y_i δ(x_f ≤ θ) / Σ_i w_i δ(x_f ≤ θ), (9)
a + b = Σ_i w_i y_i δ(x_f > θ) / Σ_i w_i δ(x_f > θ). (10)
In formulas (9) and (10), y is the class label (±1) of a sample.
The one-versus-rest method for the Gentleboost algorithm can be regarded as the Adaboost.MH algorithm under specific conditions. For the multiclass classification problem of expression recognition, the one-versus-rest method is adopted: the samples of one class are treated as the positive class in training and the samples of all other classes as the negative class, and the class of a sample is finally determined by voting, namely:
F(x) = argmax_l H(x, l), l = 1, 2, …, K. (11)
In the formula above, l is the class label of a sample, H(x, l) is the discriminant function of the two-class classifier, and K is the number of classes to be classified.
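Eqs. (8)-(11) can be illustrated with a toy regression stump and a one-versus-rest vote; the data, the threshold θ, and the uniform weights below are fabricated for illustration only.

```python
import numpy as np

def fit_stump(x, y, w, theta):
    """Weighted least-squares regression stump of Eqs. (8)-(10) on one
    feature column x, labels y in {-1, +1}, sample weights w."""
    above = x > theta
    b = np.dot(w[~above], y[~above]) / w[~above].sum()       # Eq. (9)
    a = np.dot(w[above], y[above]) / w[above].sum() - b      # Eq. (10)
    return lambda xf: a * (xf > theta) + b                   # Eq. (8)

def predict_multiclass(H_scores):
    """Eq. (11): per-class strong-classifier scores -> argmax vote.
    Classes are numbered 1..K."""
    return int(np.argmax(H_scores)) + 1

# toy data: the feature separates the two classes exactly at theta = 0
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1, -1, 1, 1])
w = np.full(4, 0.25)
h = fit_stump(x, y, w, theta=0.0)
print(h(np.array([-1.5, 1.5])))       # [-1.  1.]
print(predict_multiclass([0.2, 1.3, -0.5]))  # 2
```

On this separable toy set the fitted parameters are b = -1 and a = 2, so the stump outputs exactly the class labels; in real boosting the stump's real-valued output is accumulated into H(x, l) before the final vote.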
The experiment contrasts the performance of the Gentleboost algorithm with MI and DWT each switched on and off. Table 1 lists the accuracy and training time of each algorithm variant.
Table 1. Expression recognition accuracy and training time of the various Gentleboost algorithms
Among the various Gentleboost algorithms, the Gentleboost based on MI and DWT has the highest accuracy (88.61%), but such an accuracy is still not satisfactory for micro-expression recognition.
The results show that the MI+DWT combination boosts the training speed of Gentleboost: training speed rises to 1.44 times that of the original Gentleboost, saving nearly half of the training time. This effect is brought by DWT. It should be noted that the acceleration effect of DWT differs with the problem: the simpler the classification boundary, the better the acceleration. Adopting a higher threshold brings faster training, but also reduces classifier accuracy at the same time; when using DWT, the researcher must balance speed directly against accuracy. The parameter adopted here is set to β = 0.1, which should let the Gentleboost algorithm keep both high performance and fast training on most classification problems.
Embodiment 2: facial expression recognition combining Gabor features and GentleSVM
SVM is a general feed-forward neural network: it builds a hyperplane as the decision surface by minimizing structural risk, maximizing the margin between positive and negative examples. For expression recognition, the performance of SVM is close to that of Gentleboost; both are among the best in this field.
Gentleboost selects among the Gabor features, and SVM is then trained on the new representation formed by feature selection to form the final classifier. In this research, this combination is referred to as GentleSVM.
Embodiment 2 adopts the same data set as embodiment 1. To test the generalization performance of the algorithm, 10-fold cross validation is used for assessment.
After Gentleboost training finishes, the Gabor features adopted by the weak classifiers that Gentleboost selected are reconnected after removing redundant features (rejecting duplicates), forming the new Gabor representation of the facial expression, and SVM is trained on this new representation. SVM is a two-class classifier; for a multiclass pattern recognition problem such as expression recognition, it can be realized by decomposition into a combination of several two-class problems. Embodiment 2 realizes multiclass SVM classification in the one-versus-rest manner. Specifically:
F(x) = argmax_i ((w^i)^T φ(x) + b^i) / ‖w^i‖, i = 1, 2, …, K. (12)
In the formula above, i is the sample class label, w is the weight vector, φ(x) is the feature vector, and K is the number of classes.
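Given trained per-class parameters, the decision rule of Eq. (12) is a normalized argmax. The weight vectors below are made-up toy values, not a trained model; only the voting rule itself follows the equation.

```python
import numpy as np

def ovr_svm_predict(x, W, b):
    """Eq. (12): one-vs-rest linear SVM vote on the normalized decision
    values (w_i . x + b_i) / ||w_i||. W stacks one weight vector per
    class (K x d); classes are numbered 1..K."""
    scores = (W @ x + b) / np.linalg.norm(W, axis=1)
    return int(np.argmax(scores)) + 1

# toy "trained" parameters for K = 3 classes over 2-D features (made up)
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.zeros(3)
print(ovr_svm_predict(np.array([2.0, 0.5]), W, b))  # 1
```

The division by ‖w_i‖ turns each raw decision value into a signed distance from that class's hyperplane, so the votes of the K independently trained two-class SVMs are comparable.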
Considering that when Adaboost performs feature selection the performance of linear SVM and nonlinear SVM is very close, while the classification speed of linear SVM is far faster than that of nonlinear SVM, embodiment 2 adopts a 1-norm soft margin linear SVM as the classifier.
Embodiment 2 contrasts the expression recognition performance of various GentleSVM combinations and the original SVM. The results are shown in Tables 2 and 3.
Table 2. Expression recognition accuracy and training time of the various GentleSVM algorithms
Table 3. Recognition accuracy for each class of expression
As shown in table 2, the accuracy rate of all GentleSVM algorithm combination has all surpassed the accuracy rate of improved Gentleboost among original SVM and the embodiment one.In addition, all GentleSVM algorithm combination have all just been finished 10 times training within 20 seconds, this means after the training of finishing Gentleboost, and system only need pay the little time cost and just performance further can be improved.In all GentleSVM algorithm combination, the accuracy rate the highest (92.66%) of MI+DWT combination Expression Recognition.Combine with the result of embodiment one and to see that The above results explanation MI+DWT combination has promoted the performance when Gentleboost is used for classification with feature selecting effectively.
As shown in Table 3, all GentleSVM combinations, as well as SVM, recognize surprise, disgust, happiness, and neutral expressions very accurately. Sadness, anger, and fear are harder for all of these algorithms to identify. However, with the exception of SVM, every GentleSVM combination achieves over 80% accuracy on these three expressions. For a fully automatic expression-recognition system, such accuracy is within an acceptable range.
Embodiment 3: automatic micro-expression recognition based on Gabor features
Building on embodiment 2, embodiment 3 constructs automatic micro-expression recognition based on Gabor features.
To improve the generalization performance of the system, a new training set was collected to train the system. It contains 1135 expression images in total: 518 from the aforementioned CK database, 146 from the MMI expression database, 185 from the FG-NET database, 143 from the JAFFE database, 63 from the STOIC database, 54 from the POFA database, and 26 downloaded from the Internet.
METT contains 56 micro-expression videos in total, covering seven basic expressions: sadness, surprise, anger, disgust, fear, happiness, and contempt. The 48 videos expressing the first six basic expressions were selected as the test set for assessing system performance. The evaluation metrics are: micro-expression capture accuracy, i.e., the system must correctly state whether a video contains micro-expressions and how many; and micro-expression recognition accuracy, i.e., besides correctly capturing a micro-expression, the system must also correctly identify its expression class.
To handle differences in illumination among expression images from the different expression databases, an extra preprocessing step was added to the micro-expression recognition system. Besides the preprocessing of embodiments 1 and 2, the image gray levels are additionally normalized: the mean gray value is normalized to 0 and the variance of the gray values to 1.
To recognize micro-expressions, each frame of the video is first classified with the algorithm obtained in embodiment 2, yielding an expression-label output for the video. This label output is then scanned to locate the turning points where the expression changes in the video. The system then measures the duration of each expression from the turning points and the video frame rate. For example, suppose a video runs at 30 frames per second and its label output is 111222; the turning points of this video are then the first frame, the midpoint of the 1→2 label change, and the last frame, so in this example expressions 1 and 2 each last 1/10 s.
The system then extracts micro-expressions and their labels according to the micro-expression definition: an expression lasting between 1/25 s and 1/5 s is considered a micro-expression, while an expression lasting longer than 1/5 s is treated as an ordinary expression and ignored.
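The label-scanning and duration logic just described can be sketched as follows (a minimal sketch; the function name and the string encoding of labels are illustrative assumptions):

```python
def extract_micro_expressions(labels, fps, min_d=1/25, max_d=1/5):
    """Scan a per-frame expression-label sequence, measure each expression's
    duration from its run length and the frame rate, and keep only runs whose
    duration lies in [min_d, max_d] (the micro-expression definition)."""
    runs = []                        # list of [label, run_length_in_frames]
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    micro = []
    for lab, n in runs:
        duration = n / fps           # turning points fall at run boundaries/midpoints,
                                     # so a run of n frames lasts n/fps seconds
        if min_d <= duration <= max_d:
            micro.append((lab, duration))
    return micro
```

For the worked example in the text, `extract_micro_expressions("111222", 30)` yields two expressions of 1/10 s each, both within the micro-expression range.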
At present, since contempt is less universal than the other six basic expressions (Frank et al., 2009), the system is trained to recognize six kinds of micro-expression (sadness, surprise, anger, disgust, fear, happiness).
This embodiment compares performance under different training sets and preprocessing methods. If the training set is CK and no extra preprocessing is performed, the system's recognition accuracy on METT is quite low — only 50%, even below the pre-training scores of untrained human subjects on METT. When the new training set is used instead, the system's capture accuracy rises to 91.67%, with a recognition accuracy of 81.25%. If the system also applies the additional preprocessing described above, capture accuracy rises to 95.83% and recognition accuracy reaches 85.42%. This exceeds the post-training METT scores of trained human subjects (about 70%-80%; see Frank et al., 2009; Russell et al., 2006). These results highlight the importance of a large, representative training sample for automatic micro-expression recognition. Since the additional preprocessing contributes only about 4% when the new training set is used, the results suggest that sample representativeness matters more for micro-expression recognition, and that complicated preprocessing brings only limited performance gains.
Table 4. Recognition scores of the system on micro-expressions of different ethnic groups
Automatic micro-expression recognition based on the combined Gabor+EOH features
The above embodiments yield a preliminary automatic micro-expression recognition system that can analyze micro-expressions in video. The foregoing results show, however, that its accuracy can still be improved. One major issue is the small-sample problem: most researchers in automatic expression recognition today are Western, and the expression databases they use consist mostly of Caucasian faces, so obtaining a sufficiently large and representative training set is not easy.
How, then, can the algorithm's accuracy be improved when only small samples are available? This is the problem to be faced, and it can be addressed by changing the representation the system uses. A good representation makes the between-class distances among concepts large and the within-class distances small, and it also tolerates errors well. In short, a good representation can describe the classification boundary of the problem well even in the small-sample case.
EOH (Edge Orientation Histogram) features provide exactly these properties. What EOH extracts is information about image edges and their orientations — that is, about image shape — and it is insensitive to global illumination changes and to small-scale translation and rotation. The method also has a physiological basis: researchers have found that the receptive fields of cells in the primary visual cortex show marked orientation sensitivity; a single neuron responds only to stimulation within its receptive field, i.e., it responds strongly only to information in a certain frequency band, such as edges, line segments, and stripes of a specific orientation. EOH can be regarded as a simulation of this property. As noted earlier, biologically grounded visual representations help researchers build better computer-vision systems. Research has also shown that EOH can successfully extract facial features that distinguish genuine smiles from mocking ones and, more importantly, that EOH delivers outstanding performance even when the training sample is small.
Extraction of the combined Gabor+EOH features
Gabor feature extraction is as described above. EOH feature extraction and feature fusion proceed as follows:
First, edges are extracted from the image, here using the Sobel operator. The gradient of image I at point (x, y) is obtained by convolving the Sobel operator of the corresponding direction with the image:
G_x(x, y) = Sobel_x * I(x, y)
G_y(x, y) = Sobel_y * I(x, y)   (13)
Wherein:
Sobel_x = [ -1 0 1 ; -2 0 2 ; -1 0 1 ],  Sobel_y = [ -1 -2 -1 ; 0 0 0 ; 1 2 1 ]   (14)
The edge strength of image I at point (x, y) is then:
G(x, y) = sqrt(G_x(x, y)² + G_y(x, y)²)   (15)
To filter noise from the edge information, we set:
G'(x, y) = G(x, y) if G(x, y) ≥ T;  0 if G(x, y) < T   (16)
The edge orientation of image I at point (x, y) is defined as:
θ(x, y) = arctan(G_y(x, y) / G_x(x, y))   (17)
If the edge orientations are divided into K intervals, the edge orientation histogram of image I at point (x, y) can be computed as:
ψ_k(x, y) = G'(x, y) if θ(x, y) ∈ interval_k;  0 otherwise   (18)
The edge integral of the image can then be computed as E_k(R) = Σ_{(x,y)∈R} ψ_k(x, y)   (19), where R is an arbitrary region in image I.
Because the EOH feature dimensionality is very high, the captured face is scaled down to 24 × 24 pixels before EOH feature extraction. In addition, only ratio features are used:
A_{k1,k2}(R) = (E_{k1}(R) + ε) / (E_{k2}(R) + ε)   (20)
where ε is a smoothing factor.
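The pipeline of Eqs. (13)-(20) can be sketched as follows, assuming the Sobel kernels of Eq. (14) and a single whole-image region R; all names are illustrative, and the orientation-binning details are one plausible reading of the text:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def conv3x3(img, k):
    """'Valid' 3x3 convolution (kernel flipped, as in a true convolution, Eq. 13)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    kf = k[::-1, ::-1]
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kf)
    return out

def eoh_features(img, K=6, T=100.0, eps=0.01, signed=True):
    """EOH ratio features over the whole image, following Eqs. (13)-(20):
    Sobel gradients, magnitude thresholding at T, K orientation bins over
    [-pi, pi) (signed) or [-pi/2, pi/2) (unsigned), pairwise ratio features."""
    gx = conv3x3(img, SOBEL_X)
    gy = conv3x3(img, SOBEL_Y)
    mag = np.hypot(gx, gy)
    mag[mag < T] = 0.0                       # Eq. (16): suppress weak edges
    if signed:
        theta = np.arctan2(gy, gx)
        lo, hi = -np.pi, np.pi
    else:
        theta = np.arctan(np.divide(gy, gx, out=np.zeros_like(gy), where=gx != 0))
        lo, hi = -np.pi / 2, np.pi / 2
    bins = np.clip(((theta - lo) / (hi - lo) * K).astype(int), 0, K - 1)
    E = np.array([mag[bins == k].sum() for k in range(K)])   # Eq. (19), R = whole image
    return np.array([(E[a] + eps) / (E[b] + eps)             # Eq. (20) ratio features
                     for a in range(K) for b in range(K) if a != b])
```

In the actual system, E_k would be accumulated per sub-region via integral images rather than over the whole image; this sketch keeps only the single-region case for clarity.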
After Gabor and EOH feature extraction are complete, feature fusion is performed: the EOH features are converted into a column vector in the same way as the Gabor features, and this vector is then concatenated with the Gabor feature vector to form a new column vector, realizing the feature fusion:
F = {f_Gabor, f_EOH}   (21)
All subsequent feature extraction uses this method.
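The fusion of Eq. (21) is a plain concatenation; a minimal sketch (the function name is ours):

```python
import numpy as np

def fuse_features(f_gabor, f_eoh):
    """Eq. (21): flatten both feature sets to column vectors and append the
    EOH vector after the Gabor vector to form the fused representation F."""
    return np.concatenate([np.ravel(f_gabor), np.ravel(f_eoh)])
```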
Embodiment 4: expression recognition based on the combined Gabor+EOH features
Embodiment 4 uses the same data set, face-detection method, and preprocessing as embodiments 1 and 2. To test the generalization performance of the algorithm, 10-fold cross-validation is used for assessment.
The classifiers in embodiment 4 are the Gentleboost and GentleSVM obtained in embodiments 1 and 2, respectively, so as to examine system performance under different classifiers.
The algorithm is modified here by removing the error-rate check condition, i.e., DWT becomes WT: the sample-filtering threshold no longer changes, and no retraining is triggered by an excessive error rate. The experimental results show that with the combined Gabor+EOH features, the recognition accuracy of Gentleboost is quite low — only about 50%, even below the Gabor-only case. This indicates that under the combined Gabor+EOH features, Gentleboost is unsuitable for recognition.
The recognition results of GentleSVM are less affected by the Gentleboost feature selection (accuracy 85%-88%), mainly because the features selected by the Gentleboost stage of GentleSVM undergo feature fusion: the features selected for the different expressions are concatenated, and redundant features are removed. Nevertheless, both GentleSVM variants now achieve lower recognition accuracy than when Gabor features alone are used, again showing that Gentleboost fails when the Gabor and EOH features are mixed.
Embodiment 5: Gentleboost based on the average error rank
In embodiment 5, Gentleboost is further improved using an average-error-rank method; the newly obtained algorithm is called AvgGentleboost. This method addresses the failure of the Gentleboost algorithm under the mixed Gabor and EOH representation. Consider iteration t, in which Gentleboost has trained N weak classifiers h_{i,t}, i = 1, 2, …, N, on some training set A_t. Let E_{i,t} be the weighted error of h_{i,t} on A_t and ε_{i,t} its weighted squared error, with E_t = {E_{i,t} | i = 1, 2, …, N} and ε_t = {ε_{i,t} | i = 1, 2, …, N}. Then:
r_{1,i,t} = Rank(E_t, E_{i,t}),
r_{2,i,t} = Rank(ε_t, ε_{i,t}).   (22)
In the formula above, Rank(x, y) is a ranking function: it gives the position of y within x after x is sorted in ascending order.
The average error rank of weak classifier h_{i,t} is defined as:
R_{i,t} = (r_{1,i,t} + r_{2,i,t}) / 2.   (23)
In iteration t, Gentleboost then selects the weak classifier with the smallest average error rank. That is:
j = argmin_i R_{i,t},  i = 1, 2, …, N,
h_t = h_{j,t}   (24)
If MI is also used during training, Gentleboost selects the first weak classifier that satisfies the MI filter condition in the candidate list obtained by sorting the average error ranks in ascending order.
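The selection rule of Eqs. (22)-(24) can be sketched as follows (a minimal sketch; 0-based ranks are used, which does not change the argmin):

```python
import numpy as np

def select_by_avg_error_rank(err, sq_err):
    """Eqs. (22)-(24): rank each weak classifier by its weighted error and by
    its weighted squared error (ascending), average the two ranks, and return
    the index of the classifier with the smallest average rank."""
    err = np.asarray(err, dtype=float)       # E_{i,t} for i = 1..N
    sq_err = np.asarray(sq_err, dtype=float) # eps_{i,t} for i = 1..N
    r1 = np.argsort(np.argsort(err))         # ascending rank of each entry (0-based)
    r2 = np.argsort(np.argsort(sq_err))
    avg = (r1 + r2) / 2.0                    # Eq. (23)
    return int(np.argmin(avg))               # Eq. (24)
```

With the MI variant, one would instead walk the candidates in ascending average-rank order and take the first that passes the MI filter.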
Embodiment 5 uses the same data set, face-detection method, and preprocessing as embodiment 4. To test the generalization performance of the algorithm, 10-fold cross-validation is used for assessment.
Both the Gabor+EOH and the Gabor features are used in embodiment 5, to compare system performance under the different feature-extraction methods.
The classifiers in embodiment 5 are the newly obtained AvgGentleboost and AvgGentleSVM, compared against embodiment 4 to examine system performance under different classifiers.
With the Gabor representation, AvgGentleboost performs almost identically to Gentleboost; only when few features are used does AvgGentleboost show performance above Gentleboost. This shows that within the range where the error-minimization method of the Gentleboost algorithm still applies, AvgGentleboost is comparable to Gentleboost. When the representation is switched to the mixed Gabor+EOH features, AvgGentleboost performs far better than Gentleboost, demonstrating the effectiveness of AvgGentleboost. With AvgGentleboost as the classifier, the Gabor+EOH representation outperforms Gabor alone; at the same time, the AvgGentleboost+Gabor+EOH combination also outperforms the Gentleboost+Gabor combination, again showing the effectiveness of the new algorithm for the expression-recognition problem.
Table 5.1. Best expression-recognition accuracy obtained by the various Gentleboost algorithm combinations
Table 5.2. Best expression-recognition accuracy obtained by the various GentleSVM algorithm combinations
Under all conditions, AvgGentleSVM and GentleSVM perform similarly. With the Gabor representation, AvgGentleSVM is almost completely identical to GentleSVM. When the representation used by AvgGentleSVM is switched to Gabor+EOH, the algorithm's accuracy reaches up to 94.01%, exceeding the Gabor-only case; the differences between the classifiers appear mainly when more features are used. With the Gabor+EOH representation, AvgGentleSVM outperforms GentleSVM, which shows the effectiveness of AvgGentleboost for feature selection.
Based on the above results, we can conclude that, relative to Gentleboost, AvgGentleboost improves both classification and feature selection. However, the experimental results show that compared with the algorithm obtained in embodiment 2, the accuracy gain of the algorithm of embodiment 5 is still limited, leaving room for further improvement.
Embodiment 6: AvgGentleboost with feature pre-filtering
In practical applications, overly slow training severely limits the range of parameter tuning and the size of the usable training set. To actually apply the mixed Gabor+EOH representation to micro-expression recognition, the slow training of the Gentleboost algorithm must be resolved. Gentleboost trains slowly because it performs an exhaustive search when training weak classifiers, in order to pick the optimal one. The time complexity of this training method is O(NMT log N), where N is the number of samples in the training set, M the feature dimensionality of a sample, and T the number of weak classifiers to be trained. When any one of N, M, or T is large, training time becomes very long; when two or all three are large, training speed may become unacceptable.
Since the previously proposed method of decomposing the variable M cannot be combined with the existing AvgGentleboost, this work proposes decomposing M via feature pre-filtering, to further accelerate the training process. Combined with DWT, this method optimizes N and M simultaneously, allowing the algorithm to train quickly even under large-sample, high-dimensional conditions.
Before the Gentleboost iterations, the sample features are pre-filtered: the Gentleboost algorithm trains one weak classifier per feature on all training samples. If the error rate of that weak classifier on the training set is greater than or equal to a preset threshold α, the corresponding feature is discarded; otherwise it is kept. Finally, all kept features are concatenated into a new representation, forming a new training set, on which AvgGentleboost is then trained. We call this combination PreAvgGentleboost.
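The pre-filtering step can be sketched as follows, using a simple threshold stump as a stand-in for the per-feature Gentleboost weak learner (the stump, the function names, and the unweighted error are illustrative assumptions):

```python
import numpy as np

def stump_error(feature, y):
    """Best training error of a one-feature threshold stump (both polarities),
    standing in for the per-feature weak classifier of the pre-filtering step."""
    best = 1.0
    for t in np.unique(feature):
        for pol in (1, -1):
            pred = np.where(pol * (feature - t) >= 0, 1, -1)
            best = min(best, float(np.mean(pred != y)))
    return best

def prefilter_features(X, y, alpha):
    """Pre-filtering sketch: fit one weak classifier per feature on all training
    samples, discard a feature if its training error is >= alpha, keep it
    otherwise; return the column indices of the surviving features."""
    return [j for j in range(X.shape[1]) if stump_error(X[:, j], y) < alpha]
```

With the α = m/N threshold used later in this embodiment, a feature survives only if its stump beats the trivial "always negative" strategy.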
Embodiment 6 uses the same data set, face-detection method, and preprocessing as embodiment 5. To test the generalization performance of the algorithm, 10-fold cross-validation is used for assessment.
Embodiment 6 uses the combined Gabor+EOH features, but with different extraction parameters, to examine system performance under different parameter settings.
The classifiers in embodiment 6 are AvgGentleboost and AvgGentleSVM, plus the newly obtained PreAvgGentleboost and PreAvgGentleSVM, to verify the effectiveness of the feature pre-filtering.
The EOH extraction parameters adopted in embodiment 6 are:
K ∈ {4, 6, 8},  θ ∈ [-π, π) or [-π/2, π/2),  T = 100,  ε = 0.01   (25)
Note that in the training set the number of positive samples of any given expression class is far smaller than the number of negative samples, so there exists a trivial strategy for this two-class problem: assign every sample to the negative class. If the accuracy of a weak classifier obtained during feature pre-filtering is at or below the accuracy of this trivial strategy, the feature it uses cannot usefully distinguish the positive class from the negative class. Therefore, in this embodiment, the pre-filtering parameter is set to α = m/N, where m is the number of positive samples of a given expression class and N the number of training samples in the training set.
Preliminary experiments show that the training speed of AvgGentleboost is greatly improved: under the condition K = 4, θ ∈ [-π, π), the training time drops from nearly 20 days to 10 hours. The best expression-recognition accuracies obtained by the feature pre-filtering algorithm under different parameters are shown in Table 6.1:
Table 6.1. Best expression-recognition accuracy of the feature pre-filtering algorithm under different parameters
The table shows that feature pre-filtering can significantly reduce the number of features used for training, filtering out about 97% of them and thereby greatly reducing training complexity. However, the experiments also show that filtering the Gabor+EOH features directly in this way, while reducing training complexity, damages the classification precision of both Gentleboost and GentleSVM to varying degrees: both classifiers drop to accuracies even lower than before the EOH features were added. Conditions using signed edge gradients all outperform those using unsigned edge gradients. For Gentleboost, 6-direction EOH is the best parameter, with 8 and 4 directions giving similar results. For GentleSVM, the more directions the EOH features have, the more its performance suffers; with 8-direction EOH, its performance even falls below that of Gentleboost with the same parameters.
Such results likely occur because the feature pre-filtering algorithm removes too many features, greatly reducing feature redundancy. As is well known, boosting algorithms tend to perform better when processing samples with highly redundant features. Directly filtering the Gabor+EOH features therefore destroys a working condition required by the feature-selection algorithm we use.
How, then, can the high redundancy of the Gabor+EOH features be preserved so that training is accelerated without damaging classifier performance? The Gabor features are 160,000-dimensional — a dimensionality still within an acceptable range — so the long training time is mainly caused by the excessive dimensionality of the EOH features (a curse of dimensionality). Combining the results in the table above with those of embodiment 5, we believe that adding ever-finer EOH features cannot significantly boost the algorithm's performance, likely because many EOH features are invalid for expression recognition. Hence, if the Gabor features are kept intact and only the EOH features are filtered, it may be possible to achieve our goal of accelerating training while preserving classification precision.
Taking the experimental results in the table above as a basis, we selected the more significant parameters and ran a new experiment in which only the EOH features are filtered. We call these algorithms PreAvgGentleboost_EOHfilter and PreAvgGentleSVM_EOHfilter.
Table 6.2. Best expression-recognition accuracy of PreAvgGentleboost_EOHfilter under different parameters
According to the results in Table 6.2, when 6-direction signed EOH features are used, the best performance of PreAvgGentleboost_EOHfilter improves considerably over the results in Table 6.1. Under both parameter conditions of Table 6.2, the algorithm's highest recognition accuracy falls slightly below the 91.31% peak that PreAvgGentleboost attains under the Gabor+EOH condition in embodiment 5, yet its computational complexity is greatly reduced.
Table 6.3. Best expression-recognition accuracy of PreAvgGentleSVM_EOHfilter under different parameters
According to the results in Table 6.3, PreAvgGentleSVM_EOHfilter performs best with 4-direction signed EOH features, much better than under the 6-direction signed condition. Under both parameter conditions of Table 6.3, PreAvgGentleSVM_EOHfilter outperforms PreAvgGentleboost_EOHfilter. Indeed, across all six of our embodiments, GentleSVM has consistently outperformed Gentleboost, so we have reason to believe that for expression recognition the performance of GentleSVM is superior to that of Gentleboost.
In Table 6.3, the best result obtained with the (4, 2) parameters (94.4%) is similar to the best accuracy achieved by AvgGentleSVM under the Gabor+EOH condition in embodiment 5 (94.01%). Yet with the same parameters, the training complexity of PreAvgGentleSVM_EOHfilter is 74.08% lower than that of AvgGentleSVM, greatly shortening training time, which makes it feasible to build a micro-expression recognition system based on the Gabor+EOH features. In embodiment 7, we use this algorithm to actually build such a system.
Embodiment 7: automatic micro-expression recognition based on the combined Gabor+EOH features
On the basis of embodiment 6, an automatic micro-expression recognition system based on the combined Gabor+EOH features is built. Embodiment 7 uses two training sets. The first is identical to the new training set obtained in embodiment 3 and is called training set 1. The second, training set 2, is formed by adding to training set 1 977 Asian facial-expression images extracted from existing facial-expression databases. By using the different training sets, embodiment 7 examines the influence of training-set size on system performance.
The test set of embodiment 7 is identical to that of embodiment 3.
The face-detection algorithm of embodiment 7 is identical to that of embodiment 6; the Gabor feature-extraction preprocessing to that of embodiment 3; the EOH feature-extraction preprocessing to that of embodiment 6; and the feature-extraction method to that of embodiment 5. The classifiers in embodiment 7 are GentleSVM, AvgGentleSVM, and the newly obtained PreAvgGentleSVM_EOHfilter, to examine system performance under different classifiers. The comparative experimental results of embodiment 7 are shown in the tables below.
Table 7.1. Micro-expression capture performance of the different algorithms under different parameters
Table 7.2. Micro-expression recognition performance of the different algorithms under different parameters
Table 7.1 shows that, for micro-expression capture, with the smaller training set 1 the capture accuracy of AvgGentleSVM-Gabor exceeds that of GentleSVM-Gabor, while PreAvgGentleSVM_EOHfilter reaches the maximum of 100% when more features are used, beating AvgGentleSVM-Gabor. When the larger training set 2 is used instead, the capture accuracies of GentleSVM-Gabor and AvgGentleSVM-Gabor improve further and reach the 100% ceiling under certain parameters, while PreAvgGentleSVM_EOHfilter reaches nearly 100% under every parameter. These results show that PreAvgGentleSVM_EOHfilter captures micro-expressions better than the other two algorithms, and that the modifications we made for micro-expression capture are effective.
Table 7.2 shows that, for micro-expression recognition, with the smaller training set 1 the recognition accuracy of AvgGentleSVM-Gabor is essentially the same as that of GentleSVM-Gabor, slightly better only when more features are used. PreAvgGentleSVM_EOHfilter reaches up to 87.5% recognition accuracy under this condition, exceeding AvgGentleSVM-Gabor.
When the larger training set 2 is used instead, the recognition accuracies of GentleSVM-Gabor and AvgGentleSVM-Gabor improve further, but only by about 2%, and the two algorithms then perform identically. In contrast, PreAvgGentleSVM_EOHfilter gains much more from the larger training set, reaching 91.67% recognition accuracy — far better than the average level of trained human subjects (about 80%).
Finally, it should be noted that the above embodiments are intended only to describe the technical solution of the invention, not to limit it; the invention can be extended to other modifications, variations, applications, and embodiments, all of which are considered within the spirit and teaching scope of the invention.

Claims (11)

1. An automatic micro-expression recognition method, comprising:
step 10), capturing the face region in the frame images of a video and preprocessing it;
step 20), extracting Gabor features and EOH features from the image of the corresponding face region;
step 30), fusing the corresponding features to obtain the final representation of the target video, and obtaining the expression label sequence of each video frame through a classifier obtained by training;
step 40), scanning the expression label sequence, judging the duration of each expression, and outputting the expression class according to the micro-expressions obtained.
2. The method according to claim 1, wherein step 10 comprises: first converting the captured face image into an 8-bit gray-level image, and normalizing the image to 48 × 48 pixels by bilinear interpolation.
3. The method according to claim 1, wherein step 20 comprises: performing feature extraction on the captured face image with a bank of two-dimensional Gabor filters, to form the Gabor representation of the facial expression. A two-dimensional Gabor filter is a plane wave with a Gaussian envelope, whose Gabor kernel is defined as
Ψ_{u,v}(z) = (||k_{u,v}||² / σ²) · exp(−||k_{u,v}||² ||z||² / (2σ²)) · (exp(i k_{u,v} z) − exp(−σ²/2)),
where k_{u,v} = k_v e^{iφ_u}, k_v = k_max / f^v, k_max is the maximum frequency, f is the spacing factor between the Gabor kernels in the frequency domain, φ_u ∈ [0, π), z = (x, y) denotes the position, and the parameters u and v control the orientation and scale of the Gabor filter, respectively.
4. The method according to claim 3, wherein step 20 further comprises: performing edge extraction on the image, here using the Sobel operator, wherein the gradient of the image at a point is obtained by convolving the Sobel operator of the corresponding direction with the image, and the captured face is scaled to 24 × 24 pixels.
5. The method according to claim 4, wherein step 20 further comprises: after completing the Gabor and EOH feature extraction, converting the EOH features into a column vector in the same way as the Gabor features, and concatenating it with the Gabor feature vector to form a new column vector, thereby performing the feature fusion.
6. The method according to claim 5, wherein step 30 comprises:
adopting Gentleboost as the classifier, and using a mutual-information method to remove information redundancy among the selected weak classifiers, so as to reject invalid weak classifiers; further,
in each selection round, selecting the weak classifier that has the minimum weighted squared error on the training set and meets the MI condition; further,
filtering the samples in the training set according to their sample weights.
7. The method according to claim 5, wherein the training step of step 30 further comprises:
Selecting from the fused Gabor and EOH features with PreAvgGentleboost_EOHfilter, and training an SVM feed-forward network on the new representation formed through feature selection to produce the classifier; wherein,
After the training of PreAvgGentleboost_EOHfilter is completed, the features adopted by its selected weak classifiers are re-concatenated after redundant features are removed, forming a new representation of the facial expression, on which the SVM is trained.
8. The method according to claim 5, wherein the training step of step 30 further comprises: for Gentleboost, selecting the weak classifier with the minimum average error rate in each training iteration.
9. The method according to claim 5, wherein the preprocessing of step 10 further comprises normalizing the mean of the image gray levels to 0 and the variance of the image gray levels to 1.
10. The method according to claim 1, wherein step 40 comprises:
Step 410), identifying each frame image in the video to obtain the expression labels of the video;
Step 420), scanning the output expression labels to confirm the turning points of expression changes in the video;
Step 430), measuring the duration of each expression from the obtained turning points and the frame rate of the video, and extracting the micro-expressions and their labels according to the micro-expression definition.
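The scan in steps 410–430 could be sketched as below; the 1/25 s – 0.5 s duration window is one common micro-expression definition and the 'neutral' label name is an assumption, neither quoted from the patent:

```python
def extract_micro_expressions(labels, fps, min_s=1 / 25, max_s=0.5):
    """Sketch of claim 10: scan the per-frame expression label sequence,
    find turning points where the label changes, convert each segment's
    frame count to seconds via the video frame rate, and keep segments
    whose duration fits the micro-expression definition. Returns
    (label, first_frame, last_frame, duration_seconds) tuples."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:  # turning point
            dur = (i - start) / fps
            if labels[start] != 'neutral' and min_s <= dur <= max_s:
                segments.append((labels[start], start, i - 1, dur))
            start = i
    return segments
```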
11. The method according to claim 6, 7 or 8, wherein the training step of step 30 further comprises:
Pre-filtering the sample features before the Gentleboost training iterations;
Using the Gentleboost algorithm to train one weak classifier for each feature on all training samples, and discarding the feature corresponding to a weak classifier whose error rate on the training set is greater than or equal to a pre-set threshold;
Concatenating the retained features to form a new representation, and thus a new training set;
For the composite Gabor and EOH feature, filtering only the EOH features.
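The pre-filtering pass of this claim might be sketched as below; the median-threshold stump with best polarity and the 0.3 error threshold are assumed placeholders for the claim's unspecified weak-classifier form and pre-set threshold:

```python
import numpy as np

def prefilter_features(X, y, max_err=0.3):
    """Sketch of claim 11's pre-filtering: train a one-feature weak
    classifier per feature on all training samples, and abandon the
    feature if its training error rate reaches the pre-set threshold.
    Returns the indices of the retained features."""
    keep = []
    for j in range(X.shape[1]):
        t = np.median(X[:, j])
        pred = np.where(X[:, j] > t, 1, -1)
        err = min(np.mean(pred != y), np.mean(pred == y))  # best polarity
        if err < max_err:  # error >= threshold -> feature abandoned
            keep.append(j)
    return keep
```

For the composite Gabor + EOH vector, this filter would be applied only to the columns holding EOH features, as the claim specifies.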
CN201210041341.4A 2012-02-21 2012-02-21 A kind of automatic micro-expression recognition method based on Gabor and EOH feature Active CN103258204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210041341.4A CN103258204B (en) 2012-02-21 2012-02-21 A kind of automatic micro-expression recognition method based on Gabor and EOH feature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210041341.4A CN103258204B (en) 2012-02-21 2012-02-21 A kind of automatic micro-expression recognition method based on Gabor and EOH feature

Publications (2)

Publication Number Publication Date
CN103258204A true CN103258204A (en) 2013-08-21
CN103258204B CN103258204B (en) 2016-12-14

Family

ID=48962108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210041341.4A Active CN103258204B (en) 2012-02-21 2012-02-21 A kind of automatic micro-expression recognition method based on Gabor and EOH feature

Country Status (1)

Country Link
CN (1) CN103258204B (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440509A (en) * 2013-08-28 2013-12-11 山东大学 Effective micro-expression automatic identification method
CN104123562A (en) * 2014-07-10 2014-10-29 华东师范大学 Human body face expression identification method and device based on binocular vision
CN104298981A (en) * 2014-11-05 2015-01-21 河北工业大学 Face microexpression recognition method
CN104820495A (en) * 2015-04-29 2015-08-05 姜振宇 Abnormal micro-expression recognition and reminding method and device
CN105047194A (en) * 2015-07-28 2015-11-11 东南大学 Self-learning spectrogram feature extraction method for speech emotion recognition
CN105184285A (en) * 2015-10-20 2015-12-23 南京信息工程大学 Posture-spanning colored image facial expression recognition of direct push type migration group sparse discriminant analysis
CN105913038A (en) * 2016-04-26 2016-08-31 哈尔滨工业大学深圳研究生院 Video based dynamic microexpression identification method
CN106127131A (en) * 2016-06-17 2016-11-16 安徽理工大学 A kind of face identification method based on mutual information printenv locality preserving projections algorithm
CN106228145A (en) * 2016-08-04 2016-12-14 网易有道信息技术(北京)有限公司 A kind of facial expression recognizing method and equipment
CN106485228A (en) * 2016-10-14 2017-03-08 深圳市唯特视科技有限公司 A kind of children's interest point analysis method that is expressed one's feelings based on video face
CN106485227A (en) * 2016-10-14 2017-03-08 深圳市唯特视科技有限公司 A kind of Evaluation of Customer Satisfaction Degree method that is expressed one's feelings based on video face
CN106570474A (en) * 2016-10-27 2017-04-19 南京邮电大学 Micro expression recognition method based on 3D convolution neural network
CN106934382A (en) * 2017-03-20 2017-07-07 许彐琼 Method and apparatus based on video identification terror suspect
CN106971180A (en) * 2017-05-16 2017-07-21 山东大学 A kind of micro- expression recognition method based on the sparse transfer learning of voice dictionary
CN107242876A (en) * 2017-04-20 2017-10-13 合肥工业大学 A kind of computer vision methods for state of mind auxiliary diagnosis
CN107273876A (en) * 2017-07-18 2017-10-20 山东大学 A kind of micro- expression automatic identifying method of ' the grand micro- transformation models of to ' based on deep learning
CN107578014A (en) * 2017-09-06 2018-01-12 上海寒武纪信息科技有限公司 Information processor and method
CN108073888A (en) * 2017-08-07 2018-05-25 中国科学院深圳先进技术研究院 A kind of teaching auxiliary and the teaching auxiliary system using this method
CN108229268A (en) * 2016-12-31 2018-06-29 商汤集团有限公司 Expression Recognition and convolutional neural networks model training method, device and electronic equipment
CN108256469A (en) * 2018-01-16 2018-07-06 华中师范大学 facial expression recognition method and device
CN109145837A (en) * 2018-08-28 2019-01-04 厦门理工学院 Face emotion identification method, device, terminal device and storage medium
CN109190564A (en) * 2018-09-05 2019-01-11 厦门集微科技有限公司 A kind of method, apparatus of image analysis, computer storage medium and terminal
CN109271977A (en) * 2018-11-23 2019-01-25 四川长虹电器股份有限公司 The automatic classification based training method, apparatus of bill and automatic classification method, device
CN109934278A (en) * 2019-03-06 2019-06-25 宁夏医科大学 A kind of high-dimensional feature selection method of information gain mixing neighborhood rough set
WO2019184125A1 (en) * 2018-03-30 2019-10-03 平安科技(深圳)有限公司 Micro-expression-based risk identification method and device, equipment and medium
CN110363066A (en) * 2019-05-23 2019-10-22 闽南师范大学 Utilize the mood automatic identification method of adjustment of Internet of Things and LED light mixing technology
CN110688874A (en) * 2018-07-04 2020-01-14 杭州海康威视数字技术股份有限公司 Facial expression recognition method and device, readable storage medium and electronic equipment
CN112800951A (en) * 2021-01-27 2021-05-14 华南理工大学 Micro-expression identification method, system, device and medium based on local base characteristics
TWI731297B (en) * 2018-05-22 2021-06-21 大陸商深圳壹賬通智能科技有限公司 Risk prediction method and apparatus, storage medium, and server
JPWO2023002636A1 (en) * 2021-07-21 2023-01-26

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163285A (en) * 2011-03-09 2011-08-24 北京航空航天大学 Cross-domain video semantic concept detection method based on active learning


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WU Q et al.: "The machine knows what you are hiding: An automatic micro-expression recognition system", AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION: LECTURE NOTES IN COMPUTER SCIENCE, vol. 6975, 12 October 2011 (2011-10-12), pages 152 - 162, XP019167958 *
SUN He: "Research on Gender Classification Based on Facial Features", China Master's Theses Full-text Database, Information Science and Technology, no. 06, 15 June 2008 (2008-06-15) *
ZHU Huaiyi: "Multi-feature Analysis and System Implementation for Face Detection and Recognition", China Master's Theses Full-text Database, Information Science and Technology, no. 06, 15 June 2008 (2008-06-15) *

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440509B (en) * 2013-08-28 2016-05-11 山东大学 A kind of effective micro-expression automatic identifying method
CN103440509A (en) * 2013-08-28 2013-12-11 山东大学 Effective micro-expression automatic identification method
CN104123562A (en) * 2014-07-10 2014-10-29 华东师范大学 Human body face expression identification method and device based on binocular vision
CN104298981A (en) * 2014-11-05 2015-01-21 河北工业大学 Face microexpression recognition method
CN104820495A (en) * 2015-04-29 2015-08-05 姜振宇 Abnormal micro-expression recognition and reminding method and device
CN105047194A (en) * 2015-07-28 2015-11-11 东南大学 Self-learning spectrogram feature extraction method for speech emotion recognition
CN105047194B (en) * 2015-07-28 2018-08-28 东南大学 A kind of self study sound spectrograph feature extracting method for speech emotion recognition
CN105184285A (en) * 2015-10-20 2015-12-23 南京信息工程大学 Posture-spanning colored image facial expression recognition of direct push type migration group sparse discriminant analysis
CN105913038A (en) * 2016-04-26 2016-08-31 哈尔滨工业大学深圳研究生院 Video based dynamic microexpression identification method
CN105913038B (en) * 2016-04-26 2019-08-06 哈尔滨工业大学深圳研究生院 A kind of micro- expression recognition method of dynamic based on video
CN106127131A (en) * 2016-06-17 2016-11-16 安徽理工大学 A kind of face identification method based on mutual information printenv locality preserving projections algorithm
CN106228145A (en) * 2016-08-04 2016-12-14 网易有道信息技术(北京)有限公司 A kind of facial expression recognizing method and equipment
CN106228145B (en) * 2016-08-04 2019-09-03 网易有道信息技术(北京)有限公司 A kind of facial expression recognizing method and equipment
CN106485228A (en) * 2016-10-14 2017-03-08 深圳市唯特视科技有限公司 A kind of children's interest point analysis method that is expressed one's feelings based on video face
CN106485227A (en) * 2016-10-14 2017-03-08 深圳市唯特视科技有限公司 A kind of Evaluation of Customer Satisfaction Degree method that is expressed one's feelings based on video face
CN106570474A (en) * 2016-10-27 2017-04-19 南京邮电大学 Micro expression recognition method based on 3D convolution neural network
CN106570474B (en) * 2016-10-27 2019-06-28 南京邮电大学 A kind of micro- expression recognition method based on 3D convolutional neural networks
CN108229268A (en) * 2016-12-31 2018-06-29 商汤集团有限公司 Expression Recognition and convolutional neural networks model training method, device and electronic equipment
CN106934382A (en) * 2017-03-20 2017-07-07 许彐琼 Method and apparatus based on video identification terror suspect
CN107242876A (en) * 2017-04-20 2017-10-13 合肥工业大学 A kind of computer vision methods for state of mind auxiliary diagnosis
CN106971180A (en) * 2017-05-16 2017-07-21 山东大学 A kind of micro- expression recognition method based on the sparse transfer learning of voice dictionary
CN106971180B (en) * 2017-05-16 2019-05-07 山东大学 A kind of micro- expression recognition method based on the sparse transfer learning of voice dictionary
CN107273876B (en) * 2017-07-18 2019-09-10 山东大学 A kind of micro- expression automatic identifying method of ' the macro micro- transformation model of to ' based on deep learning
CN107273876A (en) * 2017-07-18 2017-10-20 山东大学 A kind of micro- expression automatic identifying method of ' the grand micro- transformation models of to ' based on deep learning
CN108073888A (en) * 2017-08-07 2018-05-25 中国科学院深圳先进技术研究院 A kind of teaching auxiliary and the teaching auxiliary system using this method
CN107578014A (en) * 2017-09-06 2018-01-12 上海寒武纪信息科技有限公司 Information processor and method
CN108256469A (en) * 2018-01-16 2018-07-06 华中师范大学 facial expression recognition method and device
WO2019184125A1 (en) * 2018-03-30 2019-10-03 平安科技(深圳)有限公司 Micro-expression-based risk identification method and device, equipment and medium
TWI731297B (en) * 2018-05-22 2021-06-21 大陸商深圳壹賬通智能科技有限公司 Risk prediction method and apparatus, storage medium, and server
CN110688874A (en) * 2018-07-04 2020-01-14 杭州海康威视数字技术股份有限公司 Facial expression recognition method and device, readable storage medium and electronic equipment
CN110688874B (en) * 2018-07-04 2022-09-30 杭州海康威视数字技术股份有限公司 Facial expression recognition method and device, readable storage medium and electronic equipment
CN109145837A (en) * 2018-08-28 2019-01-04 厦门理工学院 Face emotion identification method, device, terminal device and storage medium
CN109190564A (en) * 2018-09-05 2019-01-11 厦门集微科技有限公司 A kind of method, apparatus of image analysis, computer storage medium and terminal
CN109271977A (en) * 2018-11-23 2019-01-25 四川长虹电器股份有限公司 The automatic classification based training method, apparatus of bill and automatic classification method, device
CN109934278B (en) * 2019-03-06 2023-06-27 宁夏医科大学 High-dimensionality feature selection method for information gain mixed neighborhood rough set
CN109934278A (en) * 2019-03-06 2019-06-25 宁夏医科大学 A kind of high-dimensional feature selection method of information gain mixing neighborhood rough set
CN110363066A (en) * 2019-05-23 2019-10-22 闽南师范大学 Utilize the mood automatic identification method of adjustment of Internet of Things and LED light mixing technology
CN112800951A (en) * 2021-01-27 2021-05-14 华南理工大学 Micro-expression identification method, system, device and medium based on local base characteristics
CN112800951B (en) * 2021-01-27 2023-08-08 华南理工大学 Micro-expression recognition method, system, device and medium based on local base characteristics
JPWO2023002636A1 (en) * 2021-07-21 2023-01-26
JP7323248B2 (en) 2021-07-21 2023-08-08 株式会社ライフクエスト STRESS DETERMINATION DEVICE, STRESS DETERMINATION METHOD, AND PROGRAM

Also Published As

Publication number Publication date
CN103258204B (en) 2016-12-14

Similar Documents

Publication Publication Date Title
CN103258204A (en) Automatic micro-expression recognition method based on Gabor features and edge orientation histogram (EOH) features
CN107145842B (en) Face recognition method combining LBP characteristic graph and convolutional neural network
CN106650806B (en) A kind of cooperating type depth net model methodology for pedestrian detection
CN101739555B (en) Method and system for detecting false face, and method and system for training false face model
CN103761531B (en) The sparse coding license plate character recognition method of Shape-based interpolation contour feature
CN102844766B (en) Human eyes images based multi-feature fusion identification method
CN101923652B (en) Pornographic picture identification method based on joint detection of skin colors and featured body parts
CN103886589B (en) Object-oriented automated high-precision edge extracting method
CN102262729B (en) Fused face recognition method based on integrated learning
Zhao et al. Fingerprint image synthesis based on statistical feature models
CN105138968A (en) Face authentication method and device
CN105005765A (en) Facial expression identification method based on Gabor wavelet and gray-level co-occurrence matrix
CN103077399B (en) Based on the biological micro-image sorting technique of integrated cascade
CN103268485A (en) Sparse-regularization-based face recognition method capable of realizing multiband face image information fusion
CN103488974A (en) Facial expression recognition method and system based on simulated biological vision neural network
CN106127230B (en) Image-recognizing method based on human visual perception
CN103839033A (en) Face identification method based on fuzzy rule
CN103714326A (en) One-sample face identification method
CN109671274A (en) A kind of highway risk automatic evaluation method based on latent structure and fusion
CN105095880A (en) LGBP encoding-based finger multi-modal feature fusion method
CN108256423A (en) Alzheimer&#39;s disease feature extracting method and system based on population correlation coefficient
CN103942545A (en) Method and device for identifying faces based on bidirectional compressed data space dimension reduction
CN103942572A (en) Method and device for extracting facial expression features based on bidirectional compressed data space dimension reduction
Tallapragada et al. Iris recognition based on combined feature of GLCM and wavelet transform
CN101561875A (en) Method for positioning two-dimensional face images

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant