CN102831447A - Method for identifying multi-class facial expressions at high precision - Google Patents

Method for identifying multi-class facial expressions at high precision

Info

Publication number
CN102831447A
CN102831447A
Authority
CN
China
Prior art keywords
expression
image
node
haar
Prior art date
Legal status
Granted
Application number
CN2012103144354A
Other languages
Chinese (zh)
Other versions
CN102831447B (en)
Inventor
罗森林
谢尔曼
潘丽敏
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201210314435.4A
Publication of CN102831447A
Application granted
Publication of CN102831447B
Status: Expired - Fee Related

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a method for high-precision recognition of multi-class facial expressions based on Haar-like features, belonging to the technical fields of computer science and graphics and image processing. First, high-accuracy face detection is achieved using Haar-like features and a cascaded face-detection classifier; next, feature selection is performed on the high-dimensional Haar-like features using the AdaBoost.MH algorithm; finally, an expression classifier is trained with the Random Forest algorithm to complete expression recognition. Compared with the prior art, the method reduces training and recognition time while increasing the multi-class expression recognition rate, and can be conveniently parallelized to further raise recognition speed and meet the requirements of real-time processing and mobile computing. The method recognizes both static images and dynamic video at high precision, and is applicable not only to desktop computers but also to mobile computing platforms such as mobile phones and tablets.

Description

High-precision recognition method for multi-class facial expressions
Technical field
The present invention relates to a high-precision method for recognizing multi-class facial expressions based on Haar-like features, and belongs to the technical fields of computer science and graphics and image processing.
Background art
Facial expressions are an important channel of human communication, and facial expression recognition (FER), as a technology in human-computer interaction, is receiving increasing attention. People usually divide the diverse range of expressions into several basic classes and then solve the recognition problem with classification techniques. For example, the Cohn-Kanade and JAFFE facial expression databases record 6 expressions: anger, disgust, fear, happiness, sadness, and surprise; the CAS-PEAL-R1 facial expression database records 5 expressions: smiling, frowning, surprise, open mouth, and closed eyes.
Facial expression recognition must solve 2 basic problems: 1. how to extract feature vectors representative and discriminative enough to characterize different facial expressions; 2. which high-accuracy, high-speed recognition method to use to distinguish the different facial expressions. Surveying existing facial expression recognition technology, the commonly used approaches are:
1. In terms of feature extraction:
(1) Optical-flow features: the video image sequence is binarized or converted to grayscale, and features are then extracted from the optical-flow motion field of the sequence to obtain a feature sequence. When applied to expression recognition, the problems of this method are, first, that feature extraction is not fast enough and, second, that the recognition accuracy of the discrimination model is insufficient.
(2) Gabor features: a Gabor filter is divided into a certain number of channels, and a two-dimensional Gabor wavelet transform is then applied to the normalized facial expression image to extract its texture features. The drawback of this method is its slow extraction speed, which causes difficulty in real-time applications.
(3) Expression moment features: for each frame in a facial expression image sequence, the normalized displacements of facial key points and the lengths of particular geometric features are extracted in turn and assembled into a feature column vector; all feature column vectors of the sequence are arranged in order to form a feature matrix, and each feature matrix represents one facial expression image sequence. Because this method relies on identifying facial key points, both its extraction speed and its precision are deficient.
(4) Image local features based on two-dimensional partial least squares: the sample image is first divided into several equal-sized sub-blocks by expression class, the texture features of each sub-block are extracted with the LBP operator to form a local texture feature matrix, and two-dimensional partial least squares with an adaptive weighting mechanism is then used to extract statistical features from the local texture feature matrix. The algorithm design of this method is comparatively complex, and its extraction speed is relatively low, so it is not suitable for real-time processing.
(5) Features based on AVR and enhanced LBP: wavelet decomposition is applied to the standardized face image, LBP features are then extracted, the enhanced variance ratio (AVR) feature values and an additional penalty factor are computed, and finally several groups of feature values of different dimensionality, distinguished by their AVR values, are extracted. This method requires steps such as wavelet transformation, LBP feature extraction, and penalty-factor calculation; its extraction speed is relatively low and cannot meet real-time processing demands.
(6) Face parameter features: the positions of the facial organs within the facial region are first identified, and the texture and contour parameters of each organ (such as the eyes, nose, eyebrows, and mouth corners) are then extracted from the facial image information as feature vectors. Because this method depends on recognizing facial organs, it has defects in recognition precision and feature representativeness.
In addition, earlier studies used features such as histograms, gradient histograms, and expression key-point motion features based on piecewise affine transformations. For feature types of high dimensionality, dimension reduction is often applied; common feature dimension-reduction methods include clustered linear discriminant analysis, PCA, and so on.
2. In terms of expression discrimination methods:
(1) Support vector machine (SVM) algorithm: the SVM is built on the VC-dimension theory of statistical learning theory and the principle of structural risk minimization; given limited sample information, it seeks the optimal trade-off between model complexity (i.e., learning precision on the specific training samples) and learning ability (the ability to recognize arbitrary samples without error) so as to obtain the best generalization ability. During training, the SVM's kernel function and kernel parameters must be repeatedly tuned, so the training process is often complicated, which is an important shortcoming of the algorithm; furthermore, SVM is a binary classification algorithm, so recognizing multiple classes requires further modification of the algorithm.
(2) Canonical correlation analysis: this method borrows the dimension-reduction idea of principal component analysis, extracting principal components from two groups of variables such that the correlation between the components extracted from the two groups is maximized while the components extracted within the same group are mutually orthogonal; the overall linear relationship between the two groups is then described by the correlations of the extracted components. This method describes linear relationships accurately, but its precision when measuring more complex relationships is unsatisfactory, which limits its use.
(3) Histogram matching: the input of this method is two groups of histogram statistics, generally treated as two one-dimensional vectors; a one-dimensional distance metric (such as Euclidean distance, chi-square, histogram intersection, Bhattacharyya distance, or earth mover's distance) is then used to measure the similarity of the histogram statistics. However, this method places strict requirements on the design of the histogram bins and the representativeness of the statistics; if these two points are not well satisfied, recognition performance suffers greatly.
(4) AdaBoost algorithm: this is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then assemble these weak classifiers into a stronger final classifier (a strong classifier). The algorithm itself works by changing the data distribution. One limitation of this method is its training time: for high-dimensional data of large volume it usually needs a great deal of time to train; another is weak-classifier selection, since finding the optimal weak classifier usually requires extensive experimentation.
In summary, for the application scenario of high-precision, high-speed recognition of multiple expressions, existing feature extraction methods suffer from limited feature representativeness, insufficient precision, and insufficient extraction speed; meanwhile, existing expression discrimination methods are limited by unsatisfactory recognition accuracy, excessive complexity, a restricted number of recognizable expression classes, and low recognition speed.
Summary of the invention
The purpose of the present invention is to solve the problem of high-precision, high-speed recognition of multiple facial expressions by proposing a facial expression recognition method based on Haar-like features.
The design principle of the present invention is: first, high-accuracy face detection is realized using Haar-like features and a cascaded face-detection classifier; next, feature selection is applied to the high-dimensional Haar-like features using the AdaBoost.MH algorithm; finally, an expression classifier is trained with the random forest algorithm to complete expression recognition.
The technical scheme of the present invention is achieved through the following steps:
Step 1: to automatically extract facial-region images, first perform offline training with multiple facial-region images as positive samples and multiple non-facial-region images as negative samples to obtain a face detection classifier. The face detection classifier can be obtained by a variety of conventional training methods in the prior art; the present invention uses the AdaBoost cascade classifier training method based on Haar-like features.
Step 2: on the basis of step 1, carry out the offline training of the facial expression classifier. The detailed process is as follows:
Step 2.1: first annotate the face-image training data with expression labels. The specific method is: collect pictures or videos of the various expression classes to be recognized (for expression videos, extract key frames as training images) to form training image set A, containing m pictures; then use consecutive integers as the class label of each picture or key frame, forming the expression class label set Y = {1, 2, ..., k}, where k is the number of expression classes to be recognized.
Step 2.2: perform facial-region extraction on each training image annotated in step 2.1 to obtain the cropped face images.
The specific method of facial-region extraction is: first compute the integral image of each image. The integral image has the same size as the original image, and its value at any point (x, y) is the sum of the pixel values of all points (x', y') of the original image above and to the left of, and including, the corresponding point:

ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y'),

where ii(x, y) denotes the value of point (x, y) on the integral image and i(x', y') denotes the pixel value of point (x', y') on the original image.
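For illustration, the integral image and the constant-time rectangle sum it enables can be sketched in a few lines of Python with NumPy (a minimal sketch, not the patent's implementation; the zero padding is an implementation convenience):

import numpy as np

def integral_image(img):
    # ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y
    ii = np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)
    # Pad a zero row/column so rectangle sums need no boundary checks.
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, top, left, height, width):
    # Sum over any rectangle from 4 reads of the padded integral image.
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])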
After the integral image is computed, use the face detection classifier obtained in step 1 together with a sliding-window method to extract the Haar-like features of the image within each window region and quickly locate the facial region; then crop out the facial-region image, scale it to the size required for expression recognition, keep the original expression label, and form training image set B.
Step 2.3: to train the expression classifier, perform a second Haar-like feature extraction on each image in the training image set B formed in step 2.2. The specific method of Haar-like feature extraction is: compute the integral image of each cropped image, and from each integral image compute the corresponding H-dimensional Haar-like feature values (where H is determined by the Haar-like feature types used and the picture size).
The H-dimensional Haar-like feature vector of each image is recorded as one row, and the H-dimensional Haar-like feature vectors of all m images together constitute a feature matrix X with m rows and H columns;
Step 2.4: using the AdaBoost.MH algorithm, perform feature selection on the Haar-like feature matrix X obtained in step 2.3. The AdaBoost.MH algorithm runs F rounds of computation, iteratively screening single-feature weak classifiers and selecting F principal feature dimensions from the H-dimensional Haar-like feature set, to obtain a principal feature matrix X' with m rows and F columns;
The weak classifiers used in the iterative screening must satisfy the following conditions: 1. the input of a weak classifier is a one-dimensional feature value (i.e., one specific dimension of the feature vector); 2. for each class label to be recognized, the output of the weak classifier is 1 or -1.
The detailed process of feature selection with the AdaBoost.MH algorithm is:
Step 2.4.1: initialize the weight of each image, denoted D_1(i, y_i) = 1/(mk), where y_i \in Y denotes the expression class label of the i-th image, i = 1...m;
Step 2.4.2: start round f of the iteration (f = 1...F): feed each column of the feature matrix X in turn into a weak classifier and perform H computations, obtaining

r_{f,j} = \sum_{i=1}^{m} D_f(i, y_i) \, K[y_i] \, h_j(x_{i,j}, y_i),

where j = 1...H; x_{i,j} denotes the j-th element of the i-th row of X (i.e., the j-th feature value of the feature vector of the i-th training image); h_j(x_{i,j}, y_i) denotes the weak classifier taking x_{i,j} as input; D_f(i, y_i) denotes the weight of the i-th training image in round f; and

K[y_i] = \begin{cases} +1 & y_i \in Y = \{1, 2, \ldots, k\} \\ -1 & y_i \notin Y = \{1, 2, \ldots, k\} \end{cases};
After the H computations finish, take the maximum of the H values r_{f,j} obtained in this round, denote it r_f, and take the weak classifier h_j(x_j, Y) corresponding to r_f, whose input is the j-th feature dimension x_j of X, as the weak classifier h_f(x_j, Y) screened out in round f; at the same time, add x_j to the new feature space as a selected feature dimension;
Step 2.4.3: compute the weight \alpha_f of the weak classifier h_f(x_j, Y) selected in step 2.4.2:

\alpha_f = \frac{1}{2} \ln\!\left(\frac{1 + r_f}{1 - r_f}\right);
Step 2.4.4: compute the weight D_{f+1} of each image for round f+1:

D_{f+1}(i, y_i) = \frac{D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i))}{Z_f}, \quad i = 1 \ldots m,

where h_f(x_{i,j}, y_i) denotes the weak classifier screened out in round f, taking the j-th feature value of the i-th image as input, and Z_f is the normalization factor

Z_f = \sum_{i=1}^{m} D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i));
Step 2.4.5: substitute the new weights obtained in step 2.4.4 into step 2.4.2 and iterate the procedure of steps 2.4.2 to 2.4.4 until F principal feature dimensions have been selected; extract the corresponding F columns from the feature matrix X to form a principal feature matrix X' with m rows and F columns;
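The screening loop of steps 2.4.1-2.4.5 can be condensed as follows (a Python/NumPy sketch under the assumption that every weak classifier has been pre-evaluated into a matrix of +/-1 outputs; all names are illustrative):

import numpy as np

def adaboost_mh_select(stump_out, K, k, F):
    # stump_out: m x H array, entry (i, j) = h_j(x_{i,j}, y_i) in {+1, -1}.
    # K: length-m array of K[y_i] values; k: number of expression classes.
    m, H = stump_out.shape
    D = np.full(m, 1.0 / (m * k))                 # step 2.4.1
    selected = []
    for f in range(F):                            # step 2.4.2: F rounds
        r = (D * K) @ stump_out                   # r_{f,j} for every j at once
        j = int(np.argmax(r))                     # dimension with maximal r_{f,j}
        selected.append(j)
        alpha = 0.5 * np.log((1 + r[j]) / (1 - r[j]))     # step 2.4.3
        D = D * np.exp(-alpha * K * stump_out[:, j])      # step 2.4.4
        D = D / D.sum()                           # divide by Z_f
    return selected                               # columns of X forming X'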
Step 3: using the principal feature matrix X' obtained through step 2 and the expression class label set Y annotated in step 2.1, train and generate the expression recognition classifier. The training process follows the random forest algorithm; the specific method is:
Step 3.1: according to the design requirements for the number of decision trees T and the node feature dimensionality u, generate T CART classification decision trees. The record format of a tree's root node is N(J), that of an intermediate node is N(V, J), and that of a leaf node is (V, J, y_t), where J denotes the split-feature dimension of node N, V denotes the feature value of node N, and y_t denotes the class label of node N.
The generation method of each CART classification decision tree is:
Step 3.1.1: perform sampling with replacement m times, each time extracting one row of the principal feature matrix X', to form a new matrix X'' with m rows and F columns for growing this CART classification decision tree; the training-sample labels corresponding to the rows of X'' form a new expression class label set Y'';
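The bootstrap sampling of step 3.1.1 amounts to the following (a small sketch; X_prime and labels stand for X' and its per-row labels):

import numpy as np

def bootstrap_sample(X_prime, labels, rng):
    # Draw m row indices with replacement; rows of X'' may repeat.
    m = X_prime.shape[0]
    idx = rng.integers(0, m, size=m)
    return X_prime[idx], labels[idx]              # X'' and Y''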
Step 3.1.2: starting from the root node, perform node splitting node by node until the growth of the whole tree is complete. The splitting process of each node is:
a) Randomly select u columns x''_j from the feature matrix X'' as the training data required for this node split, where x''_j denotes the j-th column of X'' (i.e., the j-th dimension of the F-dimensional Haar-like feature space);
b) For each selected column x''_j, compute its information gain IG_j, obtaining u values IG_j:

IG_j = IG(Y'' \mid x''_j) = H(Y'') - H(Y'' \mid x''_j),   (1)

where H(Y'') is the information entropy of the expression class label set Y'':

H(Y'') = -\sum_{w=1}^{k} p(y'' = V_w) \cdot \log_2 p(y'' = V_w),

with V_w denoting the value of the w-th class label in Y'', i.e., V_w \in \{1, 2, \ldots, k\}; and H(Y'' \mid x''_j) is the conditional entropy of the expression class label set Y'' given x''_j:

H(Y'' \mid x''_j) = \sum_{s=1}^{h} p(x''_j = V_s) \cdot H(Y'' \mid x''_j = V_s)
                 = -\sum_{s=1}^{h} p(x''_j = V_s) \cdot \sum_{w=1}^{q} p(y'' = V_{w|s}) \cdot \log_2 p(y'' = V_{w|s}),

where V_s denotes the s-th of the distinct values taken by the elements of column x''_j; V_{w|s} denotes an expression class label corresponding to V_s; H(Y'' \mid x''_j = V_s) is the information entropy of the set of expression class labels corresponding to V_s; and h \le m, q \le k;
c) Compare the u information gain values IG_j obtained in step b, extract from X'' the column with the largest IG_j, denoted x''_J, and at the same time record the column index J of this column in X' as the split-feature dimension of this node, for use during recognition;
d) Count the number c of distinct feature values in x''_J, then create c child nodes of the current node, each taking one distinct feature value as its node feature value V, and treat each child node as the root of a subtree. To generate each new subtree: extract from X'' the row vectors whose element in x''_J equals that feature value to form a new feature matrix X_v, and extract the corresponding expression class labels to form a new expression class label set Y_v; then substitute (X_v, Y_v) for (X'', Y'') and recursively carry out operations a-d until one of the following conditions is met, at which point the growth of this subtree terminates (a sketch of the split computation follows this list):
1. when X_v has fewer than 2 rows, or all feature values in every column are equal, so that this node cannot split further, the most frequent label in the corresponding expression class label set Y_v is saved as the class label y_t of the node;
2. when the expression class labels in Y_v at this node are all identical, this unique expression class label is saved as the class label y_t of the node.
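Assuming the feature values are discrete (the patent splits on exact value equality), the split selection of steps a-c can be sketched as:

import numpy as np

def entropy(labels):
    # H(Y) = -sum_w p(V_w) * log2 p(V_w)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_split(X2, Y2, u, rng):
    # Step a: draw u candidate columns; steps b-c: keep the column
    # maximizing IG_j = H(Y'') - H(Y'' | x''_j).
    cols = rng.choice(X2.shape[1], size=u, replace=False)
    base = entropy(Y2)
    best_j, best_ig = None, -np.inf
    for j in cols:
        cond = 0.0
        for v in np.unique(X2[:, j]):             # distinct values V_s
            mask = X2[:, j] == v
            cond += mask.mean() * entropy(Y2[mask])
        ig = base - cond
        if ig > best_ig:
            best_j, best_ig = j, ig
    return best_j                                  # split-feature dimension J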
Step 3.2: save all T CART classification decision trees to form the final random forest expression recognition classifier.
Step 4: using the random forest expression recognition classifier obtained by the offline training of step 3, perform online recognition on a static image or dynamic video to be tested;
1) The recognition method for static images is:
Step a: extract the facial region from the static image to be recognized;
Step b: on the basis of step a, following the Haar-like feature extraction method of step 2.2 and the principal feature matrix X' obtained in step 2.4.5, extract the F-dimensional Haar-like features required for recognition and form the feature vector of the expression image to be recognized, denoted x; x_J denotes the J-th feature value of the feature vector x;
Step c: using the random forest expression recognition classifier obtained by the offline training of step 3, let each of the T CART classification decision trees recognize the feature vector x of the expression image to be recognized; recognition by each CART classification decision tree starts from the root node, and the detailed process is:
c.1. Obtain the split-feature dimension J of the current node from the classifier and read the J-th feature value x_J of the feature vector x of the expression image to be recognized;
c.2. Search among the child nodes of the current node and select the child node whose node feature value is closest to x_J;
c.3. Recursively repeat operations c.1-c.2 until the current node is a leaf node, then stop the recursion and output the class label y_t of that leaf node as the recognition result of this CART classification decision tree;
Step d: tally the T output results y_t of the T CART classification decision trees in the random forest expression recognition classifier, and output the class label appearing with the highest frequency as the final recognition result.
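Steps c and d amount to descending each tree by nearest node value and then taking a majority vote. A compact sketch, with each tree represented as a nested dict (an illustrative data structure, not the patent's storage format):

from collections import Counter

def predict_tree(node, x):
    # Internal nodes: {'J': split dim, 'children': {V: subtree}};
    # leaf nodes: {'label': y_t}.
    while 'label' not in node:
        xJ = x[node['J']]                                      # step c.1
        v = min(node['children'], key=lambda V: abs(V - xJ))   # step c.2
        node = node['children'][v]
    return node['label']                                       # step c.3

def predict_forest(trees, x):
    votes = Counter(predict_tree(t, x) for t in trees)         # step d
    return votes.most_common(1)[0][0]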
2) The recognition method for dynamic video is:
Step e: decode the video file, extract each frame of data, and obtain the image sequence to be recognized;
Step f: on the basis of step e, perform facial-region extraction on each image in the image sequence to be recognized, obtaining the face-image sequence to be recognized;
Step g: following the Haar-like feature extraction method described in step 2.2, extract from each face image in the face-image sequence obtained in step f the F-dimensional Haar-like features selected in step 2.4.5;
Step h: on the basis of step g, use the random forest expression recognition classifier obtained by the offline training of step 3 to recognize each face image in the face-image sequence, obtaining an expression class sequence; the recognition process for each face image is identical to steps c and d;
Step i: smooth the expression class sequence obtained in step h to remove spurious judgements (burrs) within the recognition sequence, obtaining the final recognition result.
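The patent does not fix a particular smoothing scheme; one plausible reading of step i is a sliding-window majority (mode) filter over the per-frame labels, sketched below:

from collections import Counter

def smooth_labels(seq, window=5):
    # Replace each frame's label with the most frequent label inside a
    # centered window, suppressing single-frame burrs.
    half = window // 2
    return [Counter(seq[max(0, t - half): t + half + 1]).most_common(1)[0][0]
            for t in range(len(seq))]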
Beneficial effect
Compared with methods based on features such as skin color, edges, Gabor, and wavelet transforms, the face detection method used by the present invention is fast and highly accurate.
Compared with methods such as Gabor features, wavelet-transform features, and optical-flow features, the technique used by the present invention achieves higher accuracy with a smaller computational cost; it is applicable not only to desktop computers but also to mobile computing platforms such as mobile phones and tablets.
Compared with machine learning methods such as AdaBoost and SVM, with traditional canonical correlation analysis, and with methods based on template matching and similarity measurement, the present invention realizes the final recognition of expressions by the method of "feature selection + random forest", achieving faster recognition and higher recognition accuracy; it can also be conveniently parallelized, further improving recognition efficiency and meeting the demands of real-time processing and mobile computing.
Brief description of the drawings
Fig. 1 is a schematic of the facial expression recognition method of the present invention;
Fig. 2 is a schematic diagram of the face-image extraction method in the embodiment;
Fig. 3 shows the 10 classes of Haar-like features used by the face-image extraction method in the embodiment;
Fig. 4 shows the 5 classes of Haar-like features used for facial expression recognition in the embodiment;
Fig. 5 is a schematic diagram of the "feature selection + random forest" method in the embodiment;
Fig. 6 shows, for the test on the CAS-PEAL-R1 expression database in the embodiment, the performance of the present invention versus the traditional AdaBoost.MH algorithm: panel (a) is the recognition accuracy of traditional AdaBoost.MH on each expression class, and panel (b) is the recognition accuracy of the proposed "feature selection + random forest" method on each expression class;
Fig. 7 shows the overall-accuracy performance of the proposed "feature selection + random forest" method and AdaBoost.MH in the test on the JAFFE expression database in the embodiment.
Embodiment
To better illustrate the objects and advantages of the present invention, the embodiment of the inventive method is described in further detail below with reference to the accompanying drawings and examples.
Three tests were designed and deployed, taking static pictures and dynamic video as input: (1) a static-picture test on the CAS-PEAL-R1 expression database, (2) a static-picture test on the JAFFE expression database, and (3) a test on dynamic video.
CAS-PEAL-R1 is a face database built by the Institute of Computing Technology, Chinese Academy of Sciences; its expression subset contains frontal photographs of 379 people, with 5 expressions recorded per person: frowning, smiling, closed eyes, open mouth, and surprise. JAFFE records the 6 expression classes of 10 Japanese women: happiness, sadness, dejection, anger, surprise, and disgust.
To plot the algorithm performance curves, different parameter combinations (i.e., values of u and T) had to be tested for their effect on static-picture recognition; therefore, in tests (1) and (2), 400 random forest classifiers trained under different node feature dimensionalities u were each evaluated. In the first round of feature selection, F was fixed at 10000; u ranged from 10 to 4000 in steps of 10; T was taken as \sqrt{u} rounded up to an integer; and the overall accuracy under each (u, T) pair was recorded. For an N-dimensional confusion matrix C, the overall accuracy P is defined as:
P = \frac{\sum_{i=1}^{N} c_{ii}}{\sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij}} \times 100\%.   (2)
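Equation (2) is the trace of the confusion matrix divided by its total; for reference (a sketch, with C as a NumPy array of counts):

import numpy as np

def overall_accuracy(C):
    # P = correct (diagonal) counts / all counts, as a percentage.
    return 100.0 * np.trace(C) / C.sum()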
To evaluate algorithm performance more objectively, in test (1) the overall accuracy of every point on the performance curve was obtained using 10-fold cross-validation. For test (2), because the total number of pictures in the expression database is small, deploying a 10-fold cross-test was inappropriate, so the stricter held-out (open) testing method was used.
For test (3), video shot with a camera was taken as input, and the recognition result and recognition time of each frame were displayed on screen.
The three test processes are described one by one below. All tests were completed on the same computer, concretely configured as: Intel dual-core CPU (1.8 GHz), 1 GB of RAM, Windows XP SP3 operating system.
In the three tests, the same face detection classifier was used to automatically extract the facial-region images. The training flow of the face detection classifier is shown in Fig. 2: the AdaBoost cascade classifier training method based on Haar-like features was used, training on the 10 classes of Haar-like features shown in Fig. 3.
In addition, the same weak classifier was used in the three tests. The weak classifier is defined as:
h_j(x, y) = \begin{cases} 1 & p_{j,y}\, x_j < p_{j,y}\, \theta_{j,y} \\ -1 & p_{j,y}\, x_j \ge p_{j,y}\, \theta_{j,y} \end{cases}   (3)
where x_j denotes the input of the weak classifier, \theta_{j,y} denotes the threshold obtained after training, and p_{j,y} indicates the direction of the inequality sign.
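Equation (3) is a one-feature decision stump and transcribes directly (a sketch; theta and p are the trained threshold and polarity):

def weak_classifier(x_j, theta, p):
    # Output 1 when p * x_j < p * theta, otherwise -1 (equation (3)).
    return 1 if p * x_j < p * theta else -1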
1. Static-picture test on the CAS-PEAL-R1 expression database
From each of the 5 expression classes of the CAS-PEAL-R1 expression database, 150 images were selected as experimental data. To carry out 10-fold cross-validation, for each (u, T) combination the 750 images were randomly divided into 10 groups, and 10 rounds of expression-classifier training and recognition testing were performed. In each round, 1 of the 10 groups served as test data for evaluating the accuracy of the classifier, and the remaining 9 groups served as training data for the offline training of the expression classifier. After the 10 rounds, every image had been tested exactly once; the test results were aggregated into a confusion matrix, and the overall accuracy was computed according to equation (2).
The 10 rounds of testing are identical; the concrete flow of each round is:
Step 1: load the face detection classifier.
Step 2: carry out the offline training of the facial expression classifier.
Step 2.1: first annotate the face-image training data with expression labels. Because the expressions of the CAS-PEAL-R1 expression database are recorded in the file names, this step simply maps the label keyword in each file name to the corresponding integer in {1, 2, 3, 4, 5} and dumps it into the annotation repository.
Step 2.2: perform facial-region extraction on each training image annotated in step 2.1 to obtain the cropped face images. The specific method is: first compute the integral image of the whole image; then, with the face detection classifier obtained in step 1, extract the Haar-like features of the image within each sliding-window region and quickly locate the facial region (the growth factor of the sliding window is 1.2); finally crop out the facial-region image, scale it to the size required for expression recognition (in this embodiment, 32 × 32 pixels), keep the original expression label, and form the training image set.
Step 2.3: to perform expression recognition, carry out the second Haar-like feature extraction on the face images cropped in step 2.2.
Fig. 4 illustrates the 5 classes of Haar-like features used in this embodiment; these features have the following characteristics:
1. Fast computation. With the integral image, extracting a Haar-like feature of any size requires only a fixed number of reads plus additions and subtractions: a Haar-like feature comprising 2 rectangles needs only 6 points read from the integral image, a 3-rectangle feature only 8 points, and a 4-rectangle feature only 9 points (a sketch follows after this list).
2. Strong discriminability. The dimensionality of the Haar-like feature space is very high: taking the 5 feature classes used in this embodiment as an example, for a single 32 × 32 image the total dimensionality of the 5 feature classes exceeds 510,000; the specific numbers are shown in Table 1.
Table 1. Number of each of the 5 classes of Haar-like features for a 32 × 32 image
[Per-class counts as tabulated in the original; they total 510,112.]
This dimensionality far exceeds the pixel count of the picture itself and is also markedly higher than that of traditional features such as Gabor or facial expression key-point features, so it has greater discriminative potential.
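As promised above, a horizontal two-rectangle Haar-like feature needs only 6 integral-image reads, because the two rectangles share an edge; a sketch built on the padded integral image of the earlier rect_sum example (coordinates are illustrative):

def haar_two_rect_horizontal(ii, top, left, height, width):
    # Two side-by-side height x width rectangles; the shared edge lets
    # the 8 corner reads collapse to 6 distinct integral-image points.
    a = ii[top, left]
    b = ii[top, left + width]
    c = ii[top, left + 2 * width]
    d = ii[top + height, left]
    e = ii[top + height, left + width]
    f = ii[top + height, left + 2 * width]
    return (e - d - b + a) - (f - e - c + b)   # left sum minus right sum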
The specific extraction method is: compute the integral image of each cropped face image and, from each integral image, compute all 510,112 Haar-like feature values. The 510,112-dimensional Haar-like feature vector of each image is recorded as one row, and the feature vectors of all 675 images together constitute a feature matrix X with 675 rows and 510,112 columns.
In the following description, y_i \in Y denotes the expression class label of the i-th image; x_i denotes the i-th row of X (i.e., the 510,112-dimensional Haar-like feature vector of the i-th training image); x_j denotes the j-th column of X (i.e., the j-th dimension of the 510,112-dimensional Haar-like feature space); and x_{i,j} denotes the j-th element of the i-th row of X (i.e., the j-th feature value of the feature vector of the i-th training image);
Step 2.4: using the AdaBoost.MH algorithm, perform feature selection on the Haar-like feature matrix X obtained in step 2.3. The AdaBoost.MH algorithm runs 10,000 rounds of computation, iteratively screening single-feature weak classifiers and selecting 10,000 principal feature dimensions from the Haar-like feature set, yielding a principal feature matrix X' with 675 rows and 10,000 columns;
The weak classifiers used in the iterative computation must satisfy the following conditions: 1. the input of a weak classifier is a one-dimensional feature value (i.e., one specific dimension of the feature vector); 2. for each class label to be recognized, the output of the weak classifier is 1 or -1.
The detailed process of feature selection with the AdaBoost.MH algorithm is:
Step 2.4.1: initialize the weight of each image, denoted D_1(i, y_i) = 1/(675 × 5).
Step 2.4.2: start the current round of iteration (below, f denotes the round number): feed each column of the feature matrix X in turn into a weak classifier and perform 510,112 computations, computing the value r_{f,j} according to:

r_{f,j} = \sum_{i=1}^{m} D_f(i, y_i) \, K[y_i] \, h_j(x_{i,j}, y_i),

where j = 1...510112; x_{i,j} denotes the j-th element of the i-th row of X; h_j(x_{i,j}, y_i) denotes the weak classifier taking x_{i,j} as input; D_f(i, y_i) denotes the weight of the i-th training image in the current (f-th) round of iteration; and

K[y_i] = \begin{cases} +1 & y_i \in Y = \{1, 2, \ldots, k\} \\ -1 & y_i \notin Y = \{1, 2, \ldots, k\} \end{cases};
After the 510,112 computations finish, compare the 510,112 values r_{f,j} computed in this round and take the maximum, denoted r_f; then find the feature dimension j that makes r_{f,j} reach the maximum value r_f, and take the weak classifier h_j(x_j, Y) with the j-th feature value as input as the weak classifier screened out in this round (denoted h_f(x_j, Y) for convenience below); at the same time, add the j-th feature dimension x_j used by that weak classifier to the new feature space as a selected feature dimension;
Step 2.4.3: compute the weight \alpha_f of the weak classifier h_f(x_j, Y) selected in step 2.4.2:

\alpha_f = \frac{1}{2} \ln\!\left(\frac{1 + r_f}{1 - r_f}\right);
Step 2.4.4: compute the weight D_{f+1} of each image for the next round of iteration:

D_{f+1}(i, y_i) = \frac{D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i))}{Z_f}, \quad i = 1 \ldots 675,

where h_f(x_{i,j}, y_i) denotes the weak classifier screened out in round f, taking the j-th feature value of the i-th image as input, and Z_f is the normalization factor

Z_f = \sum_{i=1}^{675} D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i));
Step 2.4.5: substitute the new weights obtained in step 2.4.4 into step 2.4.2 and iterate the procedure of steps 2.4.2 to 2.4.4 for 10,000 rounds, thereby screening out 10,000 principal feature dimensions and forming the new feature space; that is, the feature columns screened out round by round are extracted from the feature matrix X to form a principal feature matrix X' with 675 rows and 10,000 columns;
Step 3: using the principal feature matrix X' obtained through step 2 and the expression class label set Y = {1, 2, 3, 4, 5} annotated in step 2.1, train and generate the expression recognition classifier. The training process follows the random forest algorithm; the specific method is:
Step 3.1: according to the design requirements for the number of decision trees T and the node feature dimensionality u, generate T CART classification decision trees. The record format of a tree's root node is N(J), that of an intermediate node is N(V, J), and that of a leaf node is (V, J, y_t), where J denotes the split-feature dimension of node N, V denotes the feature value of node N, and y_t denotes the class label of node N.
The generation method of each CART classification decision tree is:
Step 3.1.1: perform sampling with replacement 675 times, each time extracting one row of the principal feature matrix X', to form a new matrix X'' with 675 rows and 10,000 columns, dedicated to growing this CART classification decision tree; the training-sample labels corresponding to the rows of X'' form a new expression class label set Y'';
Step 3.1.2: starting from the root node, perform node splitting node by node until the growth of the whole tree is complete. The splitting process of each node is:
a) Randomly select u columns x''_j from the feature matrix X'' as the training data required for this node split, where x''_j denotes the j-th column of X'' (i.e., the j-th dimension of the F-dimensional Haar-like feature space);
b) For each selected column x''_j, compute its information gain IG_j, obtaining u values IG_j:

IG_j = IG(Y'' \mid x''_j) = H(Y'') - H(Y'' \mid x''_j),   (1)

where H(Y'') is the information entropy of the expression class label set Y'':

H(Y'') = -\sum_{w=1}^{k} p(y'' = V_w) \cdot \log_2 p(y'' = V_w),

with V_w denoting the value of the w-th class label in Y'', i.e., V_w \in \{1, 2, \ldots, k\}; and H(Y'' \mid x''_j) is the conditional entropy of the expression class label set Y'' given x''_j:

H(Y'' \mid x''_j) = \sum_{s=1}^{h} p(x''_j = V_s) \cdot H(Y'' \mid x''_j = V_s)
                 = -\sum_{s=1}^{h} p(x''_j = V_s) \cdot \sum_{w=1}^{q} p(y'' = V_{w|s}) \cdot \log_2 p(y'' = V_{w|s}),

where V_s denotes the s-th of the distinct values taken by the elements of column x''_j; V_{w|s} denotes an expression class label corresponding to V_s; H(Y'' \mid x''_j = V_s) is the information entropy of the set of expression class labels corresponding to V_s; and h \le m, q \le k;
c) Compare the u information gain values IG_j obtained in step b, extract from X'' the column with the largest IG_j, denoted x''_J, and at the same time record the column index J of this column in X' as the split-feature dimension of this node, for use during recognition;
d) Count the number c of distinct feature values in x''_J, then create c child nodes of the current node, each taking one distinct feature value as its node feature value V, and treat each child node as the root of a subtree. To generate each new subtree: extract from X'' the row vectors whose element in x''_J equals that feature value to form a new feature matrix X_v, and extract the corresponding expression class labels to form a new expression class label set Y_v; then substitute (X_v, Y_v) for (X'', Y'') and recursively carry out operations a-d until one of the following conditions is met, at which point the growth of this subtree terminates:
1. when X_v has fewer than 2 rows, or all feature values in every column are equal, so that this node cannot split further, the most frequent label in the corresponding expression class label set Y_v is saved as the class label y_t of the node;
2. when the expression class labels in Y_v at this node are all identical, this unique expression class label is saved as the class label y_t of the node.
Step 3.2: save all T CART classification decision trees together to form the final random forest expression recognition classifier.
Step 4: using the expression recognition classifier trained in step 3, perform expression recognition on each test picture, recording the recognition result and the recognition time. The specific method for recognizing each test picture is:
Step a: extract the facial region from the static image to be recognized;
Step b: on the basis of step a, following the Haar-like feature extraction method of step 2.2 and the feature selection result of step 2.4.5, extract the 10,000-dimensional Haar-like features required for recognition and form the feature vector of the expression image to be recognized, denoted x; x_J denotes the J-th feature value of the feature vector x;
Step c: using the feature vector x extracted in step b and the random forest expression recognition classifier obtained by the offline training of step 3, let each of the T CART classification decision trees in the classifier recognize x; recognition by each CART classification decision tree starts from the root node, and the detailed process is:
c.1. Obtain the split-feature dimension J of the current node from the classifier and read the J-th feature value x_J of the feature vector x;
c.2. Search among the child nodes of the current node and select the child node child_v whose node value v is closest to x_J;
c.3. Recursively repeat operations c.1-c.2 until the current node is a leaf node, then stop the recursion and output the class label y_t of that leaf node as the recognition result of this CART classification decision tree;
Step d: tally the T output results y_t of the T CART classification decision trees in the random forest expression recognition classifier, and output the class label appearing with the highest frequency as the final recognition result.
Step e: after the 10 rounds of testing finish, aggregate the recognition results of all 750 pictures, obtain the confusion matrix, and compute the overall accuracy according to equation (2).
2. Static-picture test on the JAFFE expression database
JAFFE records the 6 expression classes of 10 Japanese women: happiness, sadness, dejection, anger, surprise, and disgust. Because the total number of pictures in the database is small (216 images in all), deploying a 10-fold cross-test was inappropriate, so the stricter held-out (open) testing method was used: from each expression class, 10 images were drawn as the test set, and the remaining 26 served as the training set.
The concrete flow is similar to that of test 1; the differences are: (1) the number of expression classes k is 6, and (2) the number of training images m is 156.
3. Test on dynamic video
To test the recognition performance of the present invention on dynamic video, video shot with a camera was taken as input, and the recognition result and recognition time of each frame were displayed on screen. The expression-classifier parameters with the best recognition performance in test 1 were chosen, namely T = 60 and u = 3550 (i.e., the classifier contains 60 CART classification decision trees, and each node of each tree grows on 3550 dimensions randomly drawn from the 10,000-dimensional features). The concrete steps are:
Step 1: from the video data obtained from a USB camera, extract each frame of data to obtain the image sequence to be recognized.
Step 2: on the basis of step 1, perform facial-region extraction on each image in the image sequence to be recognized, obtaining the face-image sequence to be recognized.
Step 3: extract the F-dimensional Haar-like features (in this example, F = 10000) from each face image in the face-image sequence obtained in step 2.
Step 4: on the basis of step 3, use the random forest expression classifier obtained by the offline training of step 3 in test 1 (T = 60, u = 3550) to recognize each face image in the face-image sequence, obtain the expression class sequence, and output the recognition result and recognition time of each frame on the display. The recognition process for each face image is identical to step 4 of test 1.
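For reference, the online capture-detect-classify loop could look roughly as follows with OpenCV (a hedged sketch: the cv2 calls are standard, but detector.xml and extract_selected_haar are assumed names standing for the embodiment's trained cascade and the F = 10000 selected-feature extractor; predict_forest and smooth_labels are the sketches given earlier):

import cv2

cap = cv2.VideoCapture(0)                             # USB camera
cascade = cv2.CascadeClassifier('detector.xml')       # assumed trained cascade
history = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.2):
        face = cv2.resize(gray[y:y + h, x:x + w], (32, 32))
        feats = extract_selected_haar(face)           # assumed helper
        history.append(predict_forest(forest, feats))
    # smooth_labels(history) would yield the de-burred sequence of step i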
Test results
For test 1, to contrast the expression recognition accuracy of the method proposed by the present invention with that of the AdaBoost.MH method, an experiment similar to test 1 was carried out with the traditional AdaBoost.MH method and compared against the proposed "feature selection + random forest" method. The recognition accuracies of the two methods on each expression class are shown in Fig. 6. The figure clearly shows that AdaBoost.MH attains its highest recognition rate on closed eyes, while its recognition rates on the two classes open mouth and surprise are relatively low (never exceeding 75%). By contrast, once u > 900 the recognition accuracy of the "feature selection + random forest" method exceeds 90% on all 5 expression classes.
In terms of recognition speed, Table 3 records the average time of each stage when performing expression recognition on the 750 pictures of test 1. As can be seen, the recognition time of the "feature selection + random forest" method is 5.2 ms; adding the time overhead of face detection, the recognition speed reaches 27.62 frames/second.
Table 3. Recognition time of the "feature selection + random forest" method
[Per-stage average times as tabulated in the original; the method's per-image recognition time is 5.2 ms.]
For test 2, an experiment similar to test 2 was likewise carried out with the traditional AdaBoost.MH method and compared against the proposed "feature selection + random forest" method. Fig. 7 shows the overall accuracy of the two methods; the accuracy of the method of the present invention is clearly and significantly higher than that of the AdaBoost.MH method.
In test 3, the method exhibited very high recognition accuracy; meanwhile, the expression recognition time of a single frame was about 5 ms.
The results of the above three tests show that the present invention is both highly accurate and fast. The 10-fold cross-validation on the CAS-PEAL-R1 expression database shows an overall recognition accuracy of 94.7%; in the held-out test on the JAFFE expression database, a recognition accuracy of 91.2% was likewise obtained; and in terms of recognition speed, the average recognition time per face is 5.2 ms, which meets the demands of real-time recognition.

Claims (5)

1. A high-precision recognition method for multi-class facial expressions, characterized by comprising the following steps:
Step 1: perform offline training with multiple facial-region images as positive samples and multiple non-facial-region images as negative samples, obtaining a face detection classifier;
Step 2: on the basis of step 1, perform offline training of the facial expression classifier; the detailed process is as follows:
Step 2.1: annotate the face-image training data with expression labels; the specific method is: collect pictures of the various expression classes to be recognized, or key frames of videos, forming training image set A containing m pictures; use consecutive integers as the class label of each picture or key frame, forming the expression class label set Y = {1, 2, ..., k}, where k is the number of expression classes to be recognized;
Step 2.2: perform facial-region extraction on each training image annotated in step 2.1; the cropped face images form training image set B;
Step 2.3: to train the expression classifier, perform a second Haar-like feature extraction on each image in the training image set B formed in step 2.2; the specific method of Haar-like feature extraction is: compute the integral image of each cropped image, and from each integral image compute the corresponding H-dimensional Haar-like feature values;
the H-dimensional Haar-like feature vector of each image is recorded as one row, and the H-dimensional Haar-like feature vectors of all m images together constitute a feature matrix X with m rows and H columns;
Step 2.4: using the AdaBoost.MH algorithm, perform feature selection on the Haar-like feature matrix X obtained in step 2.3; the detailed process is:
Step 2.4.1: initialize the weight of each image, denoted D_1(i, y_i) = 1/(mk), where y_i \in Y denotes the expression class label of the i-th image, i = 1...m;
Step 2.4.2: start round f of the iteration, f = 1...F: feed each column of the feature matrix X in turn into a weak classifier and perform H computations, obtaining

r_{f,j} = \sum_{i=1}^{m} D_f(i, y_i) \, K[y_i] \, h_j(x_{i,j}, y_i),

where j = 1...H; x_{i,j} denotes the j-th element of the i-th row of X; h_j(x_{i,j}, y_i) denotes the weak classifier taking x_{i,j} as input; D_f(i, y_i) denotes the weight of the i-th training image in round f; and

K[y_i] = \begin{cases} +1 & y_i \in Y = \{1, 2, \ldots, k\} \\ -1 & y_i \notin Y = \{1, 2, \ldots, k\} \end{cases};
after the H computations finish, take the maximum of the H values r_{f,j} obtained in this round, denote it r_f, and take the weak classifier h_j(x_j, Y) corresponding to r_f, whose input is the j-th feature dimension x_j of X, as the weak classifier h_f(x_j, Y) screened out in round f; at the same time, add x_j to the new feature space as a selected feature dimension;
Step 2.4.3: compute the weight \alpha_f of the weak classifier h_f(x_j, Y) selected in step 2.4.2:

\alpha_f = \frac{1}{2} \ln\!\left(\frac{1 + r_f}{1 - r_f}\right);
Step 2.4.4: compute the weight D_{f+1} of each image for round f+1:

D_{f+1}(i, y_i) = \frac{D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i))}{Z_f}, \quad i = 1 \ldots m,

where h_f(x_{i,j}, y_i) denotes the weak classifier screened out in round f, taking the j-th feature value of the i-th image as input, and Z_f is the normalization factor

Z_f = \sum_{i=1}^{m} D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i));
Step 2.4.5: substitute the new weights obtained in step 2.4.4 into step 2.4.2 and iterate the procedure of steps 2.4.2 to 2.4.4 until F principal feature dimensions have been selected; extract the corresponding F columns from the feature matrix X to form a principal feature matrix X' with m rows and F columns;
Step 3: using the principal feature matrix X' obtained through step 2 and the expression class label set Y annotated in step 2.1, train and generate the expression recognition classifier; the training process follows the random forest algorithm, and the specific method is:
Step 3.1: according to the design requirements for the number of decision trees T and the node feature dimensionality u, generate T CART classification decision trees; the record format of a tree's root node is N(J), that of an intermediate node is N(V, J), and that of a leaf node is (V, J, y_t), where J denotes the split-feature dimension of node N, V denotes the feature value of node N, and y_t denotes the class label of node N;
The generation method of each CART classification decision tree is:
Step 3.1.1: perform sampling with replacement m times, each time extracting one row of the principal feature matrix X', to form a new matrix X'' with m rows and F columns for growing this CART classification decision tree; the training-sample labels corresponding to the rows of X'' form a new expression class label set Y'';
Step 3.1.2: starting from the root node, perform node splitting node by node until the growth of the whole tree is complete; the splitting process of each node is:
a) Randomly select u columns x''_j from the feature matrix X'' as the training data required for this node split, where x''_j denotes the j-th column of X'';
b) For each selected column x''_j, compute its information gain IG_j, obtaining u values IG_j:

IG_j = IG(Y'' \mid x''_j) = H(Y'') - H(Y'' \mid x''_j),

where H(Y'') is the information entropy of the expression class label set Y'':

H(Y'') = -\sum_{w=1}^{k} p(y'' = V_w) \cdot \log_2 p(y'' = V_w),

with V_w denoting the value of the w-th class label in Y'', V_w \in \{1, 2, \ldots, k\}; and H(Y'' \mid x''_j) is the conditional entropy of the expression class label set Y'' given x''_j:

H(Y'' \mid x''_j) = \sum_{s=1}^{h} p(x''_j = V_s) \cdot H(Y'' \mid x''_j = V_s)
                 = -\sum_{s=1}^{h} p(x''_j = V_s) \cdot \sum_{w=1}^{q} p(y'' = V_{w|s}) \cdot \log_2 p(y'' = V_{w|s}),

where V_s denotes the s-th of the distinct values taken by the elements of column x''_j; V_{w|s} denotes an expression class label corresponding to V_s; H(Y'' \mid x''_j = V_s) is the information entropy of the set of expression class labels corresponding to V_s; and h \le m, q \le k;
c)The u information gain value IG that comparison step b is obtainedj, and IG will be made in X "jThe maximum row of value are extracted, and are denoted as x "J, while the columns J that this is listed in X ' is recorded, tieed up as the disruptive features of this node;
d)Count x "JIn all different characteristic values quantity c, then set up c respectively using different characteristic value as node characteristic value V child node to current node, and using the child node as the root node of subtree, the new subtree of generation, the growing method of subtree is:By x "JIn all values be equal to the element of this feature value where row vector in X " propose, constitute new eigenmatrix Xv, then by the corresponding institute's espressiove class label proposition of proposed row vector, constitute new expression class label set Yv;Then (X is usedv, Yv) (X ", Y ") is substituted, step a-d operation is recursively carried out, until meeting following condition for the moment, terminates the growth of this subtree:
① X_v has fewer than 2 rows, or every column of X_v has all feature values equal so that the node cannot be split further; in this case the most frequent label in the corresponding expression class label set Y_v is saved as the class label y_t of the node;
② all expression class labels in Y_v at this node are identical; in this case that unique expression class label is saved as the class label y_t of the node;
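For concreteness, the split computation of steps a-c can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: entropy, conditional_entropy and best_split are illustrative names, and X'' is assumed to be a NumPy array of discrete feature values.

    import numpy as np

    def entropy(labels):
        # H(Y'') = -sum_w p(y''_w = V_w) * log2 p(y''_w = V_w)
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def conditional_entropy(col, labels):
        # H(Y'' | x''_j) = sum_s p(x''_j = V_s) * H(Y'' | x''_j = V_s)
        return sum((col == v).mean() * entropy(labels[col == v])
                   for v in np.unique(col))

    def best_split(X, y, u, rng):
        # Step a: draw u candidate columns; steps b-c: keep the column
        # with the largest information gain IG_j and return its index J.
        candidates = rng.choice(X.shape[1], size=u, replace=False)
        base = entropy(y)
        gains = [base - conditional_entropy(X[:, j], y) for j in candidates]
        return int(candidates[int(np.argmax(gains))])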
Step 3.2: save all T CART classification decision trees, forming the final random forest expression recognition classifier;
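Putting steps 3.1 and 3.2 together, a hedged sketch of the whole training loop might look like this, reusing best_split from the sketch above. The dict-based nodes merely stand in for the patent's record formats N(J), N(V, J) and (V, J, y_t), and stop condition ① is simplified to checking only the chosen column.

    def majority(y):
        # Most frequent expression class label, used as the leaf label y_t.
        vals, counts = np.unique(y, return_counts=True)
        return vals[np.argmax(counts)]

    def grow_tree(X, y, u, rng):
        # Stop conditions: fewer than 2 rows (①) or pure labels (②).
        if len(y) < 2 or len(np.unique(y)) == 1:
            return {"label": majority(y)}
        J = best_split(X, y, u, rng)
        values = np.unique(X[:, J])
        if len(values) < 2:            # chosen column cannot split the node (①)
            return {"label": majority(y)}
        # Step d: one child per distinct value of column J, each grown
        # recursively on the rows (X_v, Y_v) that carry that value.
        children = {v: grow_tree(X[X[:, J] == v], y[X[:, J] == v], u, rng)
                    for v in values}
        return {"J": J, "children": children}

    def train_forest(X_prime, Y, T, u, seed=0):
        # Step 3.1.1: each tree grows on m rows of X' drawn with replacement.
        rng = np.random.default_rng(seed)
        m = X_prime.shape[0]
        forest = []
        for _ in range(T):
            rows = rng.integers(0, m, size=m)
            forest.append(grow_tree(X_prime[rows], Y[rows], u, rng))
        return forest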
Step 4: using the random forest expression recognition classifier obtained by offline training in step 3, perform online recognition on a still image or dynamic video under test;
1) The recognition method for still images is:
Step a: extract the facial region from the still image to be recognized;
Step b: on the basis of step a, extract the F-dimensional Haar-like features needed for recognition according to the Haar-like feature extraction method and the principal feature matrix X' obtained in step 2.4.5, forming the feature vector of the expression image to be recognized, denoted x; x_J denotes the J-th feature value of x;
Step c: each of the T CART classification decision trees in the random forest expression recognition classifier obtained by offline training in step 3 classifies the feature vector x of the expression image to be recognized. Each CART classification decision tree starts from its root node; the detailed process is:
c.1. obtain the splitting feature dimension J of the current node from the classifier, and read the J-th feature value x_J of the feature vector x of the expression image to be recognized;
c.2. search the child nodes of the current node and select the child whose node feature value is closest to x_J;
c.3. recursively repeat operations c.1-c.2 until the current node is a leaf node; then stop the recursion and output the class label y_t of that leaf node as the recognition result of this CART classification decision tree;
Step d: tally the T output results y_t of the T CART classification decision trees in the random forest expression recognition classifier, and output the most frequent class label as the final recognition result.
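Steps c and d admit a similarly short sketch, reusing the tree dictionaries from the training sketch above; the "closest child" rule of step c.2 is read here as a minimum absolute difference, which is one plausible interpretation.

    import numpy as np

    def predict_tree(node, x):
        # Steps c.1-c.3: descend by the recorded split dimension J until a leaf.
        while "label" not in node:
            J = node["J"]
            # Step c.2: the child whose node feature value V is closest to x_J.
            v = min(node["children"], key=lambda V: abs(V - x[J]))
            node = node["children"][v]
        return node["label"]                  # leaf class label y_t

    def predict_forest(forest, x):
        # Step d: majority vote over the T per-tree results.
        votes = [predict_tree(t, x) for t in forest]
        vals, counts = np.unique(votes, return_counts=True)
        return vals[np.argmax(counts)]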
2) The recognition method for dynamic video is:
Step e: decode the video file and extract every frame, obtaining the sequence of images to be recognized;
Step f: on the basis of step e, extract the facial region data from each image in the sequence, obtaining the sequence of face images to be recognized;
Step g: following the Haar-like feature extraction method described in step 2.2, extract from every face image in the sequence obtained in step f the F-dimensional Haar-like features filtered out in step 2.4.5;
Step h: on the basis of step g, classify every face image in the sequence with the random forest expression recognition classifier obtained by offline training in step 3, obtaining the expression class sequence; each face image is recognized exactly as in steps c and d;
Step i: smooth the expression class sequence obtained in step h to remove burrs within the recognized sequence, obtaining the final recognition result.
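The claim does not specify the smoothing of step i; one plausible reading, sketched below under that assumption, is a sliding-window mode filter that replaces each frame's label with the most frequent label in a small window around it, removing single-frame burrs.

    import numpy as np

    def smooth_labels(seq, window=5):
        # Majority label in a window centred on each frame;
        # the window size 5 is an assumed parameter.
        half = window // 2
        out = []
        for i in range(len(seq)):
            vals, counts = np.unique(seq[max(0, i - half):i + half + 1],
                                     return_counts=True)
            out.append(vals[np.argmax(counts)])
        return out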
2. The high-precision multi-class facial expression recognition method according to claim 1, characterized in that, in the feature selection method described in step 2.4, the input of the weak classifier used in the iterative computation is a one-dimensional feature value of the feature vector, and, for the expression class label y to be recognized, the output of the weak classifier is 1 or -1.
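Read literally, such a weak classifier is a one-dimensional decision stump; the threshold-and-polarity form below is an assumption about its shape, since the claim only fixes the input (a single feature value) and the output (1 or -1).

    def weak_classify(feature_value, threshold, polarity=1):
        # Output 1 if the sample is judged to show expression y, else -1;
        # threshold and polarity would be fitted during the AdaBoost.MH
        # iterations (their fitting is not specified by this claim).
        return polarity if feature_value >= threshold else -polarity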
3. The high-precision multi-class facial expression recognition method according to claim 1, characterized in that the facial region data extraction described in step 2.2 proceeds as follows: first compute the integral image of each image; the integral image has the same size as the original image, and its value at any point (x, y) is the sum of the pixel values of all points (x', y') of the original image with x' ≤ x and y' ≤ y:
$ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y')$,
where ii(x, y) denotes the value of the point (x, y) on the integral image and i(x', y') denotes the pixel value of the point (x', y') on the original image;
after the integral image has been computed, extract the Haar-like features of the image within a sliding window using the sliding-window method according to the face detection classifier obtained in step 1, and quickly locate the facial region; then crop the image of the facial region, scale it to the size required for expression recognition, and keep the original expression label.
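With the integral image computed once, the pixel sum over any rectangle, the building block of every Haar-like feature, costs only four lookups. A minimal sketch (rect_sum is an illustrative helper, not named in the claim):

    import numpy as np

    def integral_image(img):
        # ii(x, y) = sum of i(x', y') for all x' <= x, y' <= y.
        return img.cumsum(axis=0).cumsum(axis=1)

    def rect_sum(ii, x0, y0, x1, y1):
        # Sum of the original pixels in [x0, x1] x [y0, y1], inclusive.
        total = ii[y1, x1]
        if x0 > 0:
            total -= ii[y1, x0 - 1]
        if y0 > 0:
            total -= ii[y0 - 1, x1]
        if x0 > 0 and y0 > 0:
            total += ii[y0 - 1, x0 - 1]
        return float(total)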
4. The high-precision multi-class facial expression recognition method according to claim 1, characterized in that the value of H described in step 2.3 is determined by the Haar-like feature types used and by the image dimensions.
5. The high-precision multi-class facial expression recognition method according to claim 1, characterized in that the face detection classifier described in step 1 is obtained with the AdaBoost cascade-classifier training method based on Haar-like features.
CN201210314435.4A 2012-08-30 2012-08-30 Method for identifying multi-class facial expressions at high precision Expired - Fee Related CN102831447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210314435.4A CN102831447B (en) 2012-08-30 2012-08-30 Method for identifying multi-class facial expressions at high precision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210314435.4A CN102831447B (en) 2012-08-30 2012-08-30 Method for identifying multi-class facial expressions at high precision

Publications (2)

Publication Number Publication Date
CN102831447A true CN102831447A (en) 2012-12-19
CN102831447B CN102831447B (en) 2015-01-21

Family

ID=47334573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210314435.4A Expired - Fee Related CN102831447B (en) 2012-08-30 2012-08-30 Method for identifying multi-class facial expressions at high precision

Country Status (1)

Country Link
CN (1) CN102831447B (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TAKESHI MITA, TOSHIMITSU KANEKO, OSAMU HORI: "Joint Haar-like Features for Face Detection", Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05) *
谢尔曼, 罗森林, 潘丽敏: "Turbo-Boost Facial Expression Recognition Algorithm Based on Haar Features" (基于Haar特征的Turbo-Boost表情识别算法), Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报) *
马景义 et al.: "A Quasi-Adaptive Classification Random Forest Algorithm" (拟自适应分类随机森林算法), Application of Statistics and Management (数理统计与管理) *

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103248955A (en) * 2013-04-22 2013-08-14 深圳Tcl新技术有限公司 Identity recognition method and device based on intelligent remote control system
WO2015078007A1 (en) * 2013-11-29 2015-06-04 徐勇 Quick human face alignment method
WO2015089949A1 (en) * 2013-12-19 2015-06-25 成都品果科技有限公司 Human face clustering method merging lbp and gabor features
CN105303149A (en) * 2014-05-29 2016-02-03 腾讯科技(深圳)有限公司 Figure image display method and apparatus
CN104284252A (en) * 2014-09-10 2015-01-14 康佳集团股份有限公司 Method for generating electronic photo album automatically
CN104376333A (en) * 2014-09-25 2015-02-25 电子科技大学 Facial expression recognition method based on random forests
CN105718937A (en) * 2014-12-03 2016-06-29 财团法人资讯工业策进会 Multi-class object classification method and system
CN105718937B (en) * 2014-12-03 2019-04-05 财团法人资讯工业策进会 Multi-class object classification method and system
CN104966088A (en) * 2015-06-02 2015-10-07 南昌航空大学 Fracture image recognition method based on Grouplet-variational relevance vector machine
CN104966088B (en) * 2015-06-02 2018-10-23 南昌航空大学 Based on small echo in groups-variation interconnection vector machine fracture surface image recognition methods
CN105117740A (en) * 2015-08-21 2015-12-02 北京旷视科技有限公司 Font identification method and device
CN105139041A (en) * 2015-08-21 2015-12-09 北京旷视科技有限公司 Method and device for recognizing languages based on image
CN106874921A (en) * 2015-12-11 2017-06-20 清华大学 Image classification method and device
CN106528586A (en) * 2016-05-13 2017-03-22 上海理工大学 Human behavior video identification method
CN106778677A (en) * 2016-12-30 2017-05-31 东北农业大学 Feature based selection and driver's fatigue state recognition method and device of facial multizone combining classifiers
CN106845531A (en) * 2016-12-30 2017-06-13 东北农业大学 The method and system of face fatigue state identification are carried out using the first yojan of relative covering
CN107341688A (en) * 2017-06-14 2017-11-10 北京万相融通科技股份有限公司 The acquisition method and system of a kind of customer experience
WO2019042080A1 (en) * 2017-08-29 2019-03-07 Hu Man Ren Gong Zhi Neng Ke Ji (Shanghai) Limited Image data processing system and method
CN108288048A (en) * 2018-02-09 2018-07-17 中国矿业大学 Based on the facial emotions identification feature selection method for improving brainstorming optimization algorithm
CN108288048B (en) * 2018-02-09 2021-11-23 中国矿业大学 Facial emotion recognition feature selection method based on improved brainstorming optimization algorithm
CN108460364A (en) * 2018-03-27 2018-08-28 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108460364B (en) * 2018-03-27 2022-03-11 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108710820A (en) * 2018-03-30 2018-10-26 百度在线网络技术(北京)有限公司 Infantile state recognition methods, device and server based on recognition of face
CN108734214A (en) * 2018-05-21 2018-11-02 Oppo广东移动通信有限公司 Image-recognizing method and device, electronic equipment, storage medium
WO2019223513A1 (en) * 2018-05-21 2019-11-28 Oppo广东移动通信有限公司 Image recognition method, electronic device and storage medium
CN109086657A (en) * 2018-06-08 2018-12-25 华南理工大学 A kind of ear detection method, system and model based on machine learning
CN108982544A (en) * 2018-06-20 2018-12-11 青岛联合创智科技有限公司 A kind of printed circuit board flaw component detection method
CN109359532A (en) * 2018-09-12 2019-02-19 中国人民解放军国防科技大学 BGP face recognition method based on heuristic information
CN109359675A (en) * 2018-09-28 2019-02-19 腾讯科技(武汉)有限公司 Image processing method and equipment
CN109359675B (en) * 2018-09-28 2022-08-12 腾讯科技(武汉)有限公司 Image processing method and apparatus
CN109784143A (en) * 2018-11-27 2019-05-21 中国电子科技集团公司第二十八研究所 A kind of micro- expression classification method based on optical flow method
CN109376717A (en) * 2018-12-14 2019-02-22 中科软科技股份有限公司 Personal identification method, device, electronic equipment and the storage medium of face comparison
CN109829959A (en) * 2018-12-25 2019-05-31 中国科学院自动化研究所 Expression edition method and device based on face parsing
CN109840485B (en) * 2019-01-23 2021-10-08 科大讯飞股份有限公司 Micro-expression feature extraction method, device, equipment and readable storage medium
CN109840485A (en) * 2019-01-23 2019-06-04 科大讯飞股份有限公司 A kind of micro- human facial feature extraction method, apparatus, equipment and readable storage medium storing program for executing
CN110175578A (en) * 2019-05-29 2019-08-27 厦门大学 Micro- expression recognition method based on depth forest applied to criminal investigation
CN110175578B (en) * 2019-05-29 2021-06-22 厦门大学 Deep forest-based micro expression identification method applied to criminal investigation
CN110321845A (en) * 2019-07-04 2019-10-11 北京奇艺世纪科技有限公司 A kind of method, apparatus and electronic equipment for extracting expression packet from video
CN110321845B (en) * 2019-07-04 2021-06-18 北京奇艺世纪科技有限公司 Method and device for extracting emotion packets from video and electronic equipment
US11961327B2 (en) 2019-09-02 2024-04-16 Boe Technology Group Co., Ltd. Image processing method and device, classifier training method, and readable storage medium
CN110532971A (en) * 2019-09-02 2019-12-03 京东方科技集团股份有限公司 Image procossing and device, training method and computer readable storage medium
CN112492389B (en) * 2019-09-12 2022-07-19 上海哔哩哔哩科技有限公司 Video pushing method, video playing method, computer device and storage medium
CN112492389A (en) * 2019-09-12 2021-03-12 上海哔哩哔哩科技有限公司 Video pushing method, video playing method, computer device and storage medium
WO2021218415A1 (en) * 2020-04-30 2021-11-04 京东方科技集团股份有限公司 Expression recognition method and apparatus, electronic device, and storage medium
CN112149596A (en) * 2020-09-29 2020-12-29 厦门理工学院 Abnormal behavior detection method, terminal device and storage medium
CN113111789B (en) * 2021-04-15 2022-12-20 山东大学 Facial expression recognition method and system based on video stream
CN113111789A (en) * 2021-04-15 2021-07-13 山东大学 Facial expression recognition method and system based on video stream
US11843878B2 (en) 2021-06-11 2023-12-12 Infineon Technologies Ag Sensor devices, electronic devices, method for performing object detection by a sensor device, and method for performing object detection by an electronic device
CN114004982A (en) * 2021-10-27 2022-02-01 中国科学院声学研究所 Acoustic Haar feature extraction method and system for underwater target recognition
CN114373214A (en) * 2022-01-14 2022-04-19 平安普惠企业管理有限公司 User psychological analysis method, device, equipment and storage medium based on micro expression

Also Published As

Publication number Publication date
CN102831447B (en) 2015-01-21

Similar Documents

Publication Publication Date Title
CN102831447B (en) Method for identifying multi-class facial expressions at high precision
Liu et al. The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification
Bar et al. Classification of artistic styles using binarized features derived from a deep neural network
Dollár et al. Integral channel features.
EP2948877B1 (en) Content based image retrieval
Shotton et al. Semantic texton forests for image categorization and segmentation
CN102938065B (en) Face feature extraction method and face identification method based on large-scale image data
CN103605952B (en) Based on the Human bodys&#39; response method that Laplce&#39;s canonical group is sparse
CN104778457A (en) Video face identification algorithm on basis of multi-instance learning
CN101187986A (en) Face recognition method based on supervisory neighbour keeping inlaying and supporting vector machine
Folego et al. From impressionism to expressionism: Automatically identifying van Gogh's paintings
Chen et al. Face recognition algorithm based on VGG network model and SVM
Wang et al. S3D: Scalable pedestrian detection via score scale surface discrimination
CN106778714A (en) LDA face identification methods based on nonlinear characteristic and model combination
CN103942572A (en) Method and device for extracting facial expression features based on bidirectional compressed data space dimension reduction
Tian et al. Support vector machine with mixture of kernels for image classification
CN112560894A (en) Improved 3D convolutional network hyperspectral remote sensing image classification method and device
Sasankar et al. A study for Face Recognition using techniques PCA and KNN
CN113887509B (en) Rapid multi-modal video face recognition method based on image set
Lu et al. Real-time facial expression recognition based on pixel-pattern-based texture feature
Liu et al. Gabor feature representation method based on block statistics and its application to facial expression recognition
Zhao et al. Sign text detection in street view images using an integrated feature
Jiang et al. Face recognition by combining wavelet transform and K-nearest neighbor
Agarwal et al. HOG feature and vocabulary tree for content-based image retrieval
Li et al. Block-based bag of words for robust face recognition under variant conditions of facial expression, illumination, and partial occlusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150121

Termination date: 20150830

EXPY Termination of patent right or utility model