CN102831447B - Method for identifying multi-class facial expressions at high precision

Publication number: CN102831447B (granted from application CN102831447A)
Application number: CN201210314435.4A
Inventors: 罗森林, 谢尔曼, 潘丽敏
Assignee (original and current): Beijing Institute of Technology BIT
Original language: Chinese (zh)
Legal status: Expired - Fee Related
Abstract

The invention relates to a method for identifying multi-class facial expressions at high precision based on Haar-like features, belonging to the technical field of computer science and graphics/image processing. First, high-accuracy face detection is achieved using Haar-like features and a cascaded face detection classifier; next, feature selection is performed on the high-dimensional Haar-like features using the AdaBoost.MH algorithm; finally, an expression classifier is trained with the Random Forest algorithm to complete expression recognition. Compared with the prior art, the method reduces training and recognition time while increasing the multi-class expression recognition rate, and can be conveniently parallelized to further raise recognition speed and meet the requirements of real-time processing and mobile computing. The method recognizes both static images and dynamic video at high precision, and is applicable not only to desktop computers but also to mobile computing platforms such as cellphones and tablet computers.

Description

Method for high-precision recognition of multi-class facial expressions
Technical field
The present invention relates to a high-precision method for recognizing multi-class facial expressions based on Haar-like features, and belongs to the technical field of computer science and graphics/image processing.
Background art
Facial expression is an important channel of human communication, and facial expression recognition (FER), as a technology of human-computer interaction, is receiving growing attention. People usually divide the diverse range of expressions into several base classes and then use classification techniques to solve the recognition problem. For example, the Cohn-Kanade and JAFFE facial expression databases record 6 expressions: anger, disgust, fear, happiness, sadness, and surprise; the CAS-PEAL-R1 facial expression database records 5 expressions: smiling, frowning, surprise, open mouth, and closed eyes.
Facial expression recognition must solve 2 basic problems: 1. how to extract feature vectors with strong representativeness and high discrimination to characterize different facial expressions; 2. which high-accuracy, high-speed recognition method to use to distinguish different facial expressions. Surveying existing facial expression recognition technology, the commonly used methods are:
1. For feature extraction:
(1) Optical flow features: the video image sequence is binarized or converted to grayscale, and features are then extracted from the optical flow motion field of the sequence to obtain a feature sequence. Applied to expression recognition, the method has two problems: first, feature extraction is not fast enough; second, the recognition accuracy of the discriminative model is insufficient.
(2) Gabor features: a Gabor filter bank is divided into several channels, and a two-dimensional Gabor wavelet transform is applied to the normalized facial expression image to extract its texture features. The drawback of this method is that extraction is relatively slow, making real-time application difficult.
(3) Expression feature moment features: for each frame in the expression image sequence, the normalized key-point displacements of the face and the lengths of particular geometric features are extracted in turn and assembled into a feature column vector; all column vectors of the sequence are arranged in order to form a feature matrix, and each feature matrix represents one expression image sequence. Because the method involves identifying facial key points, both extraction speed and precision are deficient.
(4) Image local feature extraction based on two-dimensional partial least squares: the sample image is first divided by expression class into several equally sized sub-blocks, the texture features of each sub-block are extracted with the LBP operator to form a local texture feature matrix, and, with an adaptive weighting mechanism, two-dimensional partial least squares is used to extract statistical features from that matrix. The algorithm is relatively complex to design and extraction is slow, so it is unsuitable for real-time processing.
(5) Features based on AVR and enhanced LBP: the standard face image undergoes wavelet decomposition, LBP features are extracted, the augmented variance ratio (AVR) feature values and an additional penalty factor are then computed, and finally several groups of feature values of different dimensions, distinguished by their AVR values, are extracted. The method requires wavelet transformation, LBP feature extraction, penalty-factor computation, and other steps; extraction is slow and cannot meet the demands of real-time processing.
(6) Facial parameter features: the positions of the facial organs are first identified within the face region, and then the texture and contour parameters of each organ (eyes, nose, eyebrows, mouth corners, etc.) are extracted from the image information as the feature vector. The method involves recognizing facial organs, so both recognition precision and feature representativeness are deficient.
In addition, some earlier research also used histograms, gradient histograms, expression feature-point motion features based on piecewise affine transformations, and so on. For feature types of high dimensionality, dimensionality reduction is usually involved; common feature dimensionality-reduction methods include clustered linear discriminant analysis and principal component analysis (PCA).
2. For expression discrimination:
(1) Support vector machine (SVM): the support vector machine is built on the VC-dimension theory of statistical learning and the principle of structural risk minimization; given limited sample information, it seeks the optimal compromise between model complexity (the learning precision on the given training samples) and learning ability (the ability to classify arbitrary samples without error) so as to obtain the best generalization ability. During training, the kernel function and its parameters must be adjusted repeatedly for optimization, so the training process is often complex, which is an important shortcoming of this algorithm; in addition, SVM is a binary classification algorithm, so recognizing multiple classes requires further modification of the algorithm.
(2) Canonical correlation analysis: using the dimensionality-reduction idea of PCA, principal components are extracted from each of two groups of variables such that the correlation between the components extracted from the two groups is maximized while the components extracted within the same group remain mutually uncorrelated; the linear relationship between the two groups as a whole is then described by the pairwise correlations between the components extracted from each group. The method describes linear relationships relatively accurately, but its precision when measuring more complex relationships is unsatisfactory, which is a limitation of this algorithm in use.
(3) Histogram matching: the input consists of two groups of histogram statistics, usually treated as two one-dimensional vectors, and a one-dimensional distance metric (such as Euclidean distance, chi-square, histogram intersection, Bhattacharyya distance, or earth mover's distance) is then used to measure the similarity of the histogram statistics. However, the method imposes relatively strict requirements on the design of the histogram bins and on the representativeness of the statistics; if these two requirements are not met well, recognition performance suffers greatly.
(4) AdaBoost: an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then combine these weak classifiers into a stronger final classifier (strong classifier); the algorithm itself works by re-weighting the data distribution. One limitation is training time: for high-dimensional data of large volume, the method often needs a great deal of time to train. Another is the choice of weak classifiers, which frequently requires extensive experimentation to find the optimum.
In summary, for the application scenario of high-precision, high-speed recognition of multiple expressions, existing feature extraction methods suffer from limited feature representativeness and insufficient precision and extraction speed; meanwhile, existing expression discrimination methods suffer from limitations such as unsatisfactory recognition precision, excessive complexity, a limited number of recognizable expression classes, and low recognition speed.
Summary of the invention
The object of the invention is to solve the problem of high-precision, high-speed recognition of multiple facial expressions by proposing a facial expression recognition method based on Haar-like features.
The design concept of the present invention is: first, Haar-like features and a cascaded face detection classifier are used to achieve high-accuracy face detection; then the AdaBoost.MH algorithm is used to perform feature selection on the high-dimensional Haar-like features; finally, the random forest algorithm is used to train the expression classifier and complete expression recognition.
The technical scheme of the present invention is realized as follows:
Step 1: to achieve automatic extraction of the face region image, multiple face region images (as positive samples) and multiple non-face region images (as negative samples) are first used for offline training to obtain a face detection classifier. The classifier can be obtained by any of several conventional training methods in the prior art; the present invention adopts the AdaBoost cascade classifier training method based on Haar-like features.
Step 2: on the basis of step 1, offline training of the facial expression classifier is carried out. The detailed process is as follows:
Step 2.1: first, expression labels are applied to the face image training data. The concrete method is: collect pictures or videos of each expression class to be recognized (for expression videos, key frames are extracted as training images), forming training image set A, in which the number of pictures is m; then use consecutive integers to number the class label of each picture or key frame, forming the expression class label set Y = {1, 2, ..., k}, where k is the number of expression classes to be recognized.
Step 2.2: face region data are extracted from each training image labeled in step 2.1, obtaining the cropped face images.
The concrete method of face region extraction is: first compute the integral image of each image. The integral image has the same size as the original image; the value at any point (x, y) on it is the sum of the pixel values at the corresponding point (x', y') of the original image and all pixels above and to its left:

ii(x, y) = \sum_{x' \le x, \, y' \le y} i(x', y'),

where ii(x, y) denotes the value of point (x, y) on the integral image, and i(x', y') denotes the pixel value of point (x', y') on the original image.
After computing the integral image, the face detection classifier obtained in step 1 is applied with a sliding-window method, extracting the Haar-like features of the image inside each window to locate the face region quickly; the face region is then cropped out, scaled to the size required for expression recognition, and kept with its original expression label, forming training image set B.
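As a concrete illustration of step 2.2, the following minimal Python sketch computes the integral image by cumulative sums and crops a detected face box to the training size. The function names, the box format, and the nearest-neighbour resize are illustrative assumptions; the cascade detector itself is taken as given.

```python
import numpy as np

def integral_image(img):
    """ii(x, y): cumulative sum so that ii[y, x] covers the pixel itself and
    everything above and to the left, matching the formula above."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def crop_face(img, box, out_size=32):
    """Cut out a detected face box (x, y, w, h) and scale it to out_size square."""
    x, y, w, h = box
    face = img[y:y + h, x:x + w]
    ys = np.arange(out_size) * h // out_size   # nearest-neighbour resize
    xs = np.arange(out_size) * w // out_size
    return face[np.ix_(ys, xs)]
```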
Step 2.3: to train the expression classifier, a second Haar-like feature extraction is performed on every image in training image set B formed in step 2.2. The concrete method of Haar-like feature extraction is: compute the integral image of each cropped image, and from each integral image compute the corresponding H-dimensional Haar-like feature values (where H is determined by the Haar-like feature types adopted and the image size).
The H-dimensional Haar-like feature vector of each image is recorded as one row, so that the H-dimensional Haar-like feature vectors of all m images form the feature matrix X with m rows and H columns;
Step 2.4: the AdaBoost.MH algorithm is used to perform feature selection on the Haar-like feature matrix X obtained in step 2.3. The algorithm runs F rounds of computation, iteratively screening single-feature weak classifiers so as to select F principal dimensions from the H-dimensional Haar-like feature value set, yielding the principal feature matrix X' with m rows and F columns;
The weak classifiers used in the iterative screening must satisfy the following conditions: 1. the input of a weak classifier is a one-dimensional feature value (a specific dimension of the feature vector); 2. for the class label to be recognized, the output of a weak classifier is 1 or -1.
The detailed process of feature selection with AdaBoost.MH is:
Step 2.4.1: initialize the weight of each image, denoted D_1(i, y_i) = 1/(mk), where y_i ∈ Y denotes the expression class label of the i-th image, i = 1...m;
Step 2.4.2: start round f of the iteration (f = 1...F): taking each column of the feature matrix X in turn as the input of a weak classifier, perform H computations to obtain r_{f,j}:

r_{f,j} = \sum_{i=1}^{m} D_f(i, y_i) \, K[y_i] \, h_j(x_{i,j}, y_i),

where j = 1...H; x_{i,j} denotes the j-th element of the i-th row of X (i.e. the j-th feature value of the feature vector of the i-th training image); h_j(x_{i,j}, y_i) denotes the weak classifier taking x_{i,j} as input; D_f(i, y_i) denotes the weight of the i-th training image in round f; and

K[y_i] = \begin{cases} +1, & y_i \in Y = \{1, 2, \ldots, k\} \\ -1, & y_i \notin Y = \{1, 2, \ldots, k\} \end{cases};
After the H computations end, take the maximum of the H values r_{f,j} obtained in this round, denoted r_f, and take the weak classifier h_j(x_j, Y) corresponding to r_f, which uses the j-th feature dimension x_j of X as input, as the weak classifier h_f(x_j, Y) selected in round f; at the same time, add x_j to the new feature space as a selected feature dimension;
Step 2.4.3: compute the weight α_f of the weak classifier h_f(x_j, Y) selected in step 2.4.2:

\alpha_f = \frac{1}{2} \ln \left( \frac{1 + r_f}{1 - r_f} \right);
Step 2.4.4: compute the weight D_{f+1} of each image in round f+1:

D_{f+1}(i, y_i) = \frac{D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i))}{Z_f}, \quad i = 1 \ldots m,

where h_f(x_{i,j}, y_i) denotes the weak classifier selected in round f, taking the j-th feature value of the i-th image as input, and Z_f is the normalization factor

Z_f = \sum_{i=1}^{m} D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i));
Step 2.4.5: substitute the new weights obtained in step 2.4.4 back into step 2.4.2 and iterate the procedure of steps 2.4.2 to 2.4.4 until F principal feature dimensions have been selected; then extract the corresponding F columns of the feature matrix X, forming the principal feature matrix X' with m rows and F columns;
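The screening loop of steps 2.4.1-2.4.5 can be summarized in code. The sketch below assumes that the per-column weak classifiers have already been trained as threshold stumps (cf. formula (3) in the embodiment) and follows the update rules exactly as stated above; the names are illustrative and the final renormalization stands in for Z_f, so this is a sketch, not a full AdaBoost.MH implementation.

```python
import numpy as np

def select_features(X, y, stumps, F, k):
    """X: (m, H) feature matrix; y: labels in {1..k}; stumps[j] maps one
    feature value to +/-1; F: number of dimensions to select."""
    m, H = X.shape
    D = np.full(m, 1.0 / (m * k))        # D_1(i, y_i) = 1/(mk)
    K = np.ones(m)                        # K[y_i] = +1, since every y_i is in Y
    selected = []
    for f in range(F):
        # r_{f,j} = sum_i D_f(i, y_i) * K[y_i] * h_j(x_{i,j}, y_i)
        r = np.array([(D * K * np.array([stumps[j](v) for v in X[:, j]])).sum()
                      for j in range(H)])
        j_best = int(np.argmax(r))        # r_f and its column
        selected.append(j_best)
        alpha = 0.5 * np.log((1 + r[j_best]) / (1 - r[j_best]))
        h_out = np.array([stumps[j_best](v) for v in X[:, j_best]])
        D = D * np.exp(-alpha * K * h_out)
        D /= D.sum()                      # plays the role of Z_f
    return X[:, selected], selected
```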
Step 3: use the principal feature matrix X' obtained through step 2 and the expression class label set Y labeled in step 2.1 to train the expression recognition classifier. Training follows the random forest algorithm; the concrete method is:
Step 3.1: according to the number of decision trees T and the node feature dimensionality u required by the design, generate T CART classification decision trees. The record format of a tree's root node is N(J), that of an intermediate node is N(V, J), and that of a leaf node is N(V, J, y_t), where J denotes the split feature dimension of node N, V denotes the feature value of node N, and y_t denotes the class label of node N.
The generation method of each CART classification decision tree is:
Step 3.1.1: perform m draws with replacement, each extracting one row of the principal feature matrix X', to form a new matrix X'' with m rows and F columns, used for growing this CART classification decision tree; the training sample labels corresponding to the rows of X'' form the new expression class label set Y''.
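A minimal sketch of the bootstrap draw of step 3.1.1, assuming X_prime holds the m x F principal feature matrix and y its label vector; the names are illustrative.

```python
import numpy as np

def bootstrap_sample(X_prime, y, rng=None):
    """m draws with replacement from the rows of X', yielding the X'' and
    Y'' that grow one CART tree."""
    rng = rng or np.random.default_rng()
    m = X_prime.shape[0]
    idx = rng.integers(0, m, size=m)
    return X_prime[idx], y[idx]
```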
Step 3.1.2: starting from the root node, carry out node splitting node by node until the whole tree is grown. The splitting process of each node is:
a) From the feature matrix X'', randomly select u columns as the training data needed for this node split, where x''_j denotes the j-th column of X'' (i.e. the j-th dimension of the F-dimensional Haar-like feature space);
b) Compute the information gain IG_j of each selected column x''_j, obtaining u values (a code sketch of this computation follows the list below):

IG_j = IG(Y'' \mid x''_j) = H(Y'') - H(Y'' \mid x''_j), \quad (1)

where H(Y'') is the information entropy of the expression class label set Y'':

H(Y'') = -\sum_{w=1}^{k} p(y'' = V_w) \cdot \log_2 p(y'' = V_w),
where V_w denotes the value of the w-th class label in Y'', i.e. V_w ∈ {1, 2, ..., k};
and H(Y'' \mid x''_j) is the conditional entropy of the expression class label set Y'':

H(Y'' \mid x''_j) = \sum_{s=1}^{h} p(x''_j = V_s) \cdot H(Y'' \mid x''_j = V_s) = -\sum_{s=1}^{h} p(x''_j = V_s) \cdot \sum_{w=1}^{q} p(y'' = V_{w|s}) \cdot \log_2 p(y'' = V_{w|s}),

where V_s denotes the s-th of the distinct values taken by the elements of x''_j; V_{w|s} denotes an expression class label corresponding to V_s; H(Y'' \mid x''_j = V_s) is the information entropy of the set of expression class labels corresponding to V_s; x''_j denotes the elements of the j-th column of X''; h ≤ m, q ≤ k;
c) Compare the u information gain values IG_j obtained in step b, extract the column of X'' that maximizes IG_j, denoted x''_J, and at the same time record the index J of this column in X' as the split feature dimension of this node, for use during recognition;
d) Count the number c of distinct feature values in x''_J, then create c child nodes of the current node, one for each distinct feature value taken as the node feature value V, and, taking each child node as the root of a subtree, generate the new subtree. The growing method of a subtree is: extract from X'' the row vectors whose element in x''_J equals that feature value, forming the new feature matrix X_v; extract the expression class labels corresponding to the extracted row vectors, forming the new expression class label set Y_v; then substitute (X_v, Y_v) for (X'', Y'') and recursively perform the operations of steps a-d until one of the following conditions is met, ending the growth of this subtree:
1. the node cannot be split further (the number of rows of X_v is less than 2, or all feature values within every column are equal); the label with the highest frequency in the corresponding expression class label set Y_v is saved as the class label y_t of this node;
2. the expression class labels in Y_v under this node are all identical; the unique expression class label is saved as the class label y_t of this node.
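As referenced in step b above, the split criterion can be sketched as follows: entropy and information gain are computed over the discrete feature values exactly as in formula (1), with illustrative helper names.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """H(Y'') over a list of class labels."""
    n = len(labels)
    return -sum((c / n) * np.log2(c / n) for c in Counter(labels).values())

def information_gain(col, labels):
    """IG_j = H(Y'') - H(Y'' | x''_j), with x''_j treated as discrete values."""
    n = len(labels)
    cond = 0.0
    for v in set(col):
        sub = [labels[i] for i, x in enumerate(col) if x == v]
        cond += (len(sub) / n) * entropy(sub)
    return entropy(labels) - cond

def best_split(X2, y2, candidate_cols):
    """Step c: among the u randomly chosen columns, pick the max-IG one."""
    gains = {j: information_gain(list(X2[:, j]), y2) for j in candidate_cols}
    return max(gains, key=gains.get)
```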
Step 3.2: save all T CART classification decision trees, forming the final random forest expression recognition classifier.
Step 4: use the random forest expression recognition classifier obtained by the offline training of step 3 to perform online recognition on the static image or dynamic video to be tested;
1) The recognition method for a static image is:
Step a: extract the face region in the static image to be recognized;
Step b: on the basis of step a, according to the Haar-like feature extraction method described in step 2.2 and the principal feature matrix X' obtained in step 2.4.5, extract the F-dimensional Haar-like features needed for recognition, forming the feature vector of the expression image to be recognized, denoted x; let x_J denote the J-th feature value of the vector x;
Step c: the T CART classification decision trees in the random forest expression recognition classifier obtained by the offline training of step 3 each recognize the feature vector x of the expression image; recognition by each CART classification decision tree starts from the root node, and the detailed process is:
c.1. Obtain the split feature dimension J of the current node from the classifier and read the J-th feature value x_J of the feature vector x;
c.2. Search the child nodes of the current node and select the child node whose node feature value is closest to x_J;
c.3. Recursively repeat operations c.1-c.2 until the current node is a leaf node; stop the recursion and output the class label y_t of this leaf node as the recognition result of this CART classification decision tree;
Step d: tally the T output results y_t of the T CART classification decision trees in the random forest expression recognition classifier and output the class label with the highest frequency as the final recognition result.
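A compact sketch of the recognition pass of steps c and d: each stored tree is walked from the root by reading its split dimension J and descending to the child whose node value is closest to x_J, and the forest takes a majority vote. The Node class is an illustrative stand-in for the stored N(V, J, y_t) records.

```python
from collections import Counter

class Node:
    def __init__(self, J=None, children=None, label=None):
        self.J = J                       # split feature dimension J
        self.children = children or {}   # node feature value V -> child Node
        self.label = label               # y_t, set only on leaf nodes

def tree_predict(node, x):
    while node.label is None:
        # c.2: descend to the child whose node value is closest to x[J]
        v = min(node.children, key=lambda V: abs(V - x[node.J]))
        node = node.children[v]
    return node.label

def forest_predict(trees, x):
    # step d: majority vote over the T tree outputs
    votes = Counter(tree_predict(t, x) for t in trees)
    return votes.most_common(1)[0][0]
```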
2) The recognition method for dynamic video is:
Step e: decode the video file and extract every frame, obtaining the image sequence to be recognized;
Step f: on the basis of step e, extract the face region data of each image in the sequence to be recognized, obtaining the face image sequence to be recognized;
Step g: according to the Haar-like feature extraction method described in step 2.2, extract from each face image of the sequence obtained in step f the F-dimensional Haar-like features selected in step 2.4.5;
Step h: on the basis of step g, use the random forest expression recognition classifier obtained by the offline training of step 3 to recognize each face image in the sequence, obtaining the expression class sequence; the recognition of each face image is identical to steps c and d;
Step i: smooth the expression class sequence obtained in step h, removing spurious single-frame judgments in the middle of the sequence, to obtain the final recognition result.
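The patent does not spell out the smoothing of step i; one plausible reading is a sliding-window mode filter that replaces isolated single-frame judgments with the locally dominant label, as sketched below (the window size is an assumption).

```python
from collections import Counter

def smooth_labels(seq, window=5):
    """Replace each frame's label by the most frequent label in a window
    centred on it; isolated single-frame decisions get voted away."""
    half = window // 2
    out = []
    for t in range(len(seq)):
        lo, hi = max(0, t - half), min(len(seq), t + half + 1)
        out.append(Counter(seq[lo:hi]).most_common(1)[0][0])
    return out
```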
Beneficial effects
Compared with methods based on skin color, edges, Gabor features, wavelet transforms, and similar features, the face detection method adopted by the present invention is fast and highly accurate.
Compared with methods based on Gabor features, wavelet transform features, optical flow features, and the like, the technique adopted by the present invention achieves higher accuracy at lower computational cost, and is applicable not only to desktop computers but also to mobile computing platforms such as mobile phones and tablet computers.
Compared with machine learning methods such as AdaBoost and SVM, with traditional canonical correlation analysis, and with similarity-measurement methods based on template matching, the present invention adopts the "feature selection + random forest" method to realize the final expression recognition, achieving faster recognition and higher recognition accuracy; it can also be parallelized easily to further improve recognition efficiency and meet the demands of real-time processing and mobile computing.
Brief description of the drawings
Fig. 1 is the schematic diagram of the facial expression recognition method of the present invention;
Fig. 2 is the schematic diagram of the face image extraction method in the embodiment;
Fig. 3 shows the 10 classes of Haar-like features used by the face image extraction method in the embodiment;
Fig. 4 shows the 5 classes of Haar-like features adopted for facial expression recognition in the embodiment;
Fig. 5 is the schematic diagram of the "feature selection + random forest" method in the embodiment;
Fig. 6 shows the performance comparison between the present invention and the traditional AdaBoost.MH algorithm on the CAS-PEAL-R1 expression database in the embodiment: panel (a) shows the recognition accuracy of traditional AdaBoost.MH for each expression class, and panel (b) that of the "feature selection + random forest" method proposed by the invention;
Fig. 7 compares the overall accuracy of the "feature selection + random forest" method of the present invention and of AdaBoost.MH on the JAFFE expression database in the embodiment.
Embodiment
To better illustrate the objects and advantages of the present invention, the method of the invention is described in further detail below with reference to the drawings and examples.
With static pictures and dynamic video as input respectively, 3 tests were designed and deployed: (1) a static picture test on the CAS-PEAL-R1 expression database, (2) a static picture test on the JAFFE expression database, (3) a dynamic video test.
CAS-PEAL-R1 is a face database built by the Institute of Computing Technology, Chinese Academy of Sciences; its facial expression subset comprises frontal photographs of 379 people, each person recording 5 expressions: frowning, smiling, closed eyes, open mouth, and surprise. JAFFE records 6 expression classes of 10 Japanese women: happy, sad, dejected, angry, surprised, and disgusted.
To plot the algorithm performance curves, the effect of different parameter combinations (i.e. values of u and T) on static picture recognition had to be tested, so in tests (1) and (2) the 400 random forest classifiers trained under different node feature dimensionalities u were each evaluated. In the first round of feature selection, the value of F was fixed at 10000; the value of u rose from 10 to 4000 in steps of 10; the value of T was taken as √u rounded up to an integer; and the overall accuracy under each pair of (u, T) values was recorded. For an N-dimensional confusion matrix C, the overall accuracy P is defined as:
P = \frac{\sum_{i=1}^{N} c_{ii}}{\sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij}} \times 100\%. \quad (2)
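Formula (2) in code, for an N x N confusion matrix:

```python
import numpy as np

def overall_accuracy(C):
    """P = trace(C) / sum(C) * 100%, for an N x N confusion matrix C."""
    return C.trace() / C.sum() * 100.0
```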
To analyze algorithm performance more objectively, the overall accuracy of every point on the performance curve in test (1) was obtained by 10-fold cross-validation. For test (2), because the expression database contains fewer pictures in total, 10-fold cross-validation was unsuitable, so the stricter open-set test method was adopted.
For test (3), video captured by a camera was used as input, and the recognition result and recognition time of every frame were displayed on screen.
The 3 test procedures are described one by one below. All tests were completed on the same computer, concretely configured as: Intel dual-core CPU (1.8 GHz), 1 GB RAM, Windows XP SP3 operating system.
In all 3 tests, the same face detection classifier was used for the automatic extraction of the face region image. The concrete training flow of the face detection classifier, shown in Fig. 2, adopts the AdaBoost cascade classifier training method based on Haar-like features, trained with the 10 classes of Haar-like features shown in Fig. 3.
In addition, the same weak classifiers are used in all 3 tests. A weak classifier is defined as:

h_j(x, y) = \begin{cases} 1, & p_{j,y} x_j < p_{j,y} \theta_{j,y} \\ -1, & p_{j,y} x_j \ge p_{j,y} \theta_{j,y} \end{cases} \quad (3)

where x_j denotes the input of the weak classifier, θ_{j,y} denotes the threshold obtained after training, and p_{j,y} indicates the direction of the inequality sign.
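Formula (3) is a decision stump on a single feature dimension. The sketch below implements the classification rule together with a brute-force search for the threshold and polarity minimizing the weighted error; the training search is a standard construction assumed here, not quoted from the patent.

```python
def weak_classify(x_j, theta, p):
    """Formula (3): 1 if p * x_j < p * theta, else -1; p in {+1, -1}."""
    return 1 if p * x_j < p * theta else -1

def fit_stump(values, targets, weights):
    """Exhaustive search for the (theta, p) minimising the weighted error.
    values: one feature column; targets: +/-1; weights: sample weights."""
    best, best_err = (0.0, 1), float("inf")
    for theta in sorted(set(values)):
        for p in (1, -1):
            err = sum(w for v, t, w in zip(values, targets, weights)
                      if weak_classify(v, theta, p) != t)
            if err < best_err:
                best_err, best = err, (theta, p)
    return best
```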
1. Static picture test on the CAS-PEAL-R1 expression database
From the 5 kinds of expression images in the CAS-PEAL-R1 expression database, 150 images each were selected as experimental data. For the 10-fold cross-validation, for each combination of F and T values the 750 images were randomly divided into 10 groups, and 10 rounds of expression classifier training and recognition testing were carried out. In each round, 1 of the 10 groups served as test data for assessing classifier accuracy, and the remaining 9 groups served as training data for the offline training of the expression classifier. After the 10 rounds, with each image having been tested exactly once, the test results were aggregated into a confusion matrix, and the overall accuracy was computed according to formula (2).
The 10 rounds of testing follow the same procedure; the concrete flow of each round is:
Step 1: load the face detection classifier.
Step 2: carry out the offline training of the facial expression classifier.
Step 2.1: first apply expression labels to the face image training data. Because the expressions of the CAS-PEAL-R1 expression database are recorded in the filenames, the work of this step is to map the label keyword in each filename directly to the corresponding integer in {1, 2, 3, 4, 5} and dump it into the label store.
Step 2.2: to obtain the cropped face images, extract the face region data of each training image labeled in step 2.1. The concrete method is: first compute the integral image of the whole image; then, with the face detection classifier obtained in step 1, use the sliding-window method to extract the Haar-like features of the image within the window and locate the face region quickly (the window scaling factor is 1.2); finally crop out the image of the face region, scale it to the size required for expression recognition (32 × 32 px in this embodiment), and keep the original expression label, forming the training image set.
Step 2.3: to carry out expression recognition, perform the second Haar-like feature extraction on the face images cropped out in step 2.2.
Fig. 4 shows the 5 classes of Haar-like features used in this embodiment; these features have the following three characteristics:
1. Fast computation. With the integral image, extracting a Haar-like feature of any size requires only a fixed number of data reads and additions/subtractions: a Haar-like feature comprising 2 rectangles needs only 6 points read from the integral image for its additions/subtractions, a 3-rectangle feature needs only 8 points, and a 4-rectangle feature needs only 9 points (a sketch of this rectangle-sum trick follows this feature list).
2. Strong discrimination. The dimensionality of the Haar-like feature space is very high: for the 5 feature classes used in this embodiment on a 32 × 32 image, the total dimensionality of the 5 classes exceeds 510,000; the concrete counts are shown in Table 1.
Table 1. Counts of the 5 classes of Haar-like features for a 32 × 32 image
This dimensionality far exceeds the pixel count of the picture itself and is also far higher than the dimensionality of traditional features such as Gabor features or expression feature-point features, so it has greater discriminative potential.
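As noted under characteristic 1 above, the speed comes from the rectangle-sum identity on the integral image. A sketch follows, using a zero-padded integral image as an implementation convenience not taken from the patent:

```python
import numpy as np

def padded_integral(img):
    """Integral image with one zero row/column so rect_sum needs no checks."""
    ii = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, x, y, w, h):
    # 4 reads per rectangle; adjacent rectangles share corner points, so a
    # 2-rectangle feature needs 6 distinct reads, 3 rectangles 8, 4 rectangles 9
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_2rect(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half minus right half."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```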
The concrete extraction method is: compute the integral image of each cropped face image and, from each integral image, compute all 510112 corresponding Haar-like feature values; record the 510112-dimensional Haar-like feature vector of each image as one row, so that the Haar-like feature vectors of all 675 images form the feature matrix X of 675 rows and 510112 columns.
In the following description, y_i ∈ Y denotes the expression class label of the i-th image; x_i denotes the i-th row of the feature matrix X (i.e. the 510112-dimensional Haar-like feature vector of the i-th training image); x_j denotes the j-th column of X (i.e. the j-th dimension of the 510112-dimensional Haar-like feature space); x_{i,j} denotes the j-th element of the i-th row of X (i.e. the j-th feature value of the feature vector of the i-th training image);
Step 2.4: the AdaBoost.MH algorithm is used to perform feature selection on the Haar-like feature matrix X obtained in step 2.3. The algorithm runs 10000 rounds of computation, iteratively screening single-feature weak classifiers so as to select 10000 principal dimensions from the 510112-dimensional Haar-like feature value set, yielding a principal feature matrix X' of 675 rows and 10000 columns;
The weak classifiers used in the above iterative computation must satisfy the following conditions: 1. the input of a weak classifier is a one-dimensional feature value (a specific dimension of the feature vector); 2. for the class label to be recognized, the output of a weak classifier is 1 or -1.
The detailed process of feature selection with AdaBoost.MH is:
Step 2.4.1: initialize the weight of each image, denoted D_1(i, y_i) = 1/(675 × 5).
Step 2.4.2: start the current round of iteration (below, f denotes the round number of the iteration): taking each column of the feature matrix X in turn as the input of a weak classifier, perform 510112 computations, calculating the value of r_{f,j} according to:

r_{f,j} = \sum_{i=1}^{m} D_f(i, y_i) \, K[y_i] \, h_j(x_{i,j}, y_i),

where j = 1...510112; x_{i,j} denotes the j-th element of the i-th row of X; h_j(x_{i,j}, y_i) denotes the weak classifier taking x_{i,j} as input; D_f(i, y_i) denotes the weight of the i-th training image in the current round (round f); and

K[y_i] = \begin{cases} +1, & y_i \in Y = \{1, 2, \ldots, k\} \\ -1, & y_i \notin Y = \{1, 2, \ldots, k\} \end{cases};
After the above 510112 computations end, compare the 510112 values r_{f,j} computed in this round and take their maximum, denoted r_f; then find the weak classifier h_j(x_j, Y) that attains the maximum value r_f, which uses the j-th feature dimension as input, and take it as the weak classifier selected in this round (denoted h_f(x_j, Y) below for convenience); at the same time add the j-th feature dimension x_j adopted by this weak classifier to the new feature space as a selected feature dimension;
Step 2.4.3: compute the weight α_f of the weak classifier h_f(x_j, Y) selected in step 2.4.2:

\alpha_f = \frac{1}{2} \ln \left( \frac{1 + r_f}{1 - r_f} \right);
Step 2.4.4: compute the weight D_{f+1} of each image in the next round of iteration:

D_{f+1}(i, y_i) = \frac{D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i))}{Z_f}, \quad i = 1 \ldots 675,

where h_f(x_{i,j}, y_i) denotes the weak classifier selected in round f, taking the j-th feature value of the i-th image as input, and Z_f is the normalization factor

Z_f = \sum_{i=1}^{m} D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i));
Step 2.4.5: substitute the new weights obtained in step 2.4.4 back into step 2.4.2 and iterate the procedure of steps 2.4.2 to 2.4.4 for 10000 rounds, thereby selecting 10000 principal feature dimensions and forming the new feature space; that is, extract from the feature matrix X the feature columns selected round by round, forming the principal feature matrix X' of 675 rows and 10000 columns;
Step 3: use the principal feature matrix X' obtained through step 2 and the expression class label set Y = {1, 2, 3, 4, 5} labeled in step 2.1 to train the expression recognition classifier. Training follows the random forest algorithm; the concrete method is:
Step 3.1: according to the number of decision trees T and the node feature dimensionality u required by the design, generate T CART classification decision trees. The record format of a tree's root node is N(J), that of an intermediate node is N(V, J), and that of a leaf node is N(V, J, y_t), where J denotes the split feature dimension of node N, V denotes the feature value of node N, and y_t denotes the class label of node N.
The generation method of each CART classification decision tree is:
Step 3.1.1: perform 675 draws with replacement, each extracting one row of the principal feature matrix X', to form a new matrix X'' of 675 rows and 10000 columns, dedicated to growing this CART classification decision tree; the training sample labels corresponding to the rows of X'' form the new expression class label set Y'';
Step 3.1.2: starting from the root node, carry out node splitting node by node until the whole tree is grown. The splitting process of each node is:
a) From the feature matrix X'', randomly select u columns as the training data needed for this node split, where x''_j denotes the j-th column of X'' (i.e. the j-th dimension of the F-dimensional Haar-like feature space);
b) Compute the information gain IG_j of each selected column x''_j, obtaining u values:

IG_j = IG(Y'' \mid x''_j) = H(Y'') - H(Y'' \mid x''_j), \quad (1)

where H(Y'') is the information entropy of the expression class label set Y'':

H(Y'') = -\sum_{w=1}^{k} p(y'' = V_w) \cdot \log_2 p(y'' = V_w),
where V_w denotes the value of the w-th class label in Y'', i.e. V_w ∈ {1, 2, ..., k};
and H(Y'' \mid x''_j) is the conditional entropy of the expression class label set Y'':

H(Y'' \mid x''_j) = \sum_{s=1}^{h} p(x''_j = V_s) \cdot H(Y'' \mid x''_j = V_s) = -\sum_{s=1}^{h} p(x''_j = V_s) \cdot \sum_{w=1}^{q} p(y'' = V_{w|s}) \cdot \log_2 p(y'' = V_{w|s}),
where V_s denotes the s-th of the distinct values taken by the elements of x''_j; V_{w|s} denotes an expression class label corresponding to V_s; H(Y'' \mid x''_j = V_s) is the information entropy of the set of expression class labels corresponding to V_s; x''_j denotes the elements of the j-th column of X''; h ≤ m, q ≤ k;
c) Compare the u information gain values IG_j obtained in step b, extract the column of X'' that maximizes IG_j, denoted x''_J, and at the same time record the index J of this column in X' as the split feature dimension of this node, for use during recognition;
d) Count the number c of distinct feature values in x''_J, then create c child nodes of the current node, one for each distinct feature value taken as the node feature value V, and, taking each child node as the root of a subtree, generate the new subtree. The growing method of a subtree is: extract from X'' the row vectors whose element in x''_J equals that feature value, forming the new feature matrix X_v; extract the expression class labels corresponding to the extracted row vectors, forming the new expression class label set Y_v; then substitute (X_v, Y_v) for (X'', Y'') and recursively perform the operations of steps a-d until one of the following conditions is met, ending the growth of this subtree:
1. the node cannot be split further (the number of rows of X_v is less than 2, or all feature values within every column are equal); the label with the highest frequency in the corresponding expression class label set Y_v is saved as the class label y_t of this node;
2. the expression class labels in Y_v under this node are all identical; the unique expression class label is saved as the class label y_t of this node.
Step 3.2: save all T CART classification decision trees together, forming the final random forest expression recognition classifier.
Step 4: use the expression recognition classifier trained in step 3 to perform expression recognition on each test picture, recording the recognition result and the recognition time. The concrete recognition method for each test picture is:
Step a: extract the face region in the static image to be recognized;
Step b: on the basis of step a, according to the Haar-like feature extraction method described in step 2.2 and the feature selection result of step 2.4.5, extract the 10000-dimensional Haar-like features needed for recognition, forming the feature vector of the expression image to be recognized, denoted x; let x_J denote the J-th feature value of the vector x;
Step c: using the feature vector x extracted in step b, the T CART classification decision trees in the random forest expression recognition classifier obtained by the offline training of step 3 each recognize the feature vector x of the expression image; recognition by each CART classification decision tree starts from the root node, and the detailed process is:
c.1. Obtain the split feature dimension J of the current node from the classifier and read the J-th feature value x_J of the feature vector x;
c.2. Search the child nodes of the current node and select the child node child_v whose node value v is closest to x_J;
c.3. Recursively repeat operations c.1-c.2 until the current node is a leaf node; stop the recursion and output the class label y_t of this leaf node as the recognition result of this CART classification decision tree;
Step d: tally the T output results y_t of the T CART classification decision trees in the random forest expression recognition classifier and output the class label with the highest frequency as the final recognition result.
Step e: after the 10 rounds of testing end, aggregate and compare the recognition results of all 750 pictures, obtain the confusion matrix, and compute the overall accuracy according to formula (2).
2. Static picture test on the JAFFE expression database
JAFFE records 6 expression classes of 10 Japanese women: happy, sad, dejected, angry, surprised, and disgusted. Because the expression database contains few pictures in total (216 images in all), 10-fold cross-validation was unsuitable, so the stricter open-set test method was adopted: 10 images of each expression class were drawn as the test set, and the remaining 26 of each class served as the training set.
The concrete flow is similar to test 1, differing in that: (1) the number of expression classes k is 6; (2) the number of training images m is 156.
3. Dynamic video test
To test the recognition performance of the present invention on dynamic video, video captured by a camera was used as input, and the recognition result and recognition time of every frame were displayed on screen. The expression classifier parameters with the best recognition performance in test 1 were chosen, namely T = 60, u = 3550 (the classifier has 60 CART classification decision trees; each node of each tree is grown by randomly drawing 3550 of the 10000 selected feature dimensions). The concrete steps are:
Step 1: acquire the video data from a USB camera and extract every frame, obtaining the image sequence to be recognized.
Step 2: on the basis of step 1, extract the face region data of each image in the sequence to be recognized, obtaining the face image sequence to be recognized.
Step 3: extract from each face image of the sequence obtained in step 2 the F-dimensional Haar-like features (in this example, F = 10000).
Step 4: on the basis of step 3, use the random forest expression classifier obtained by the offline training of step 3 in test 1 (T = 60, u = 3550) to recognize each face image in the sequence, obtaining the expression class sequence, and output the recognition result and recognition time of every frame on the display. The recognition of each face image is identical to step 4 of test 1.
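Putting the video test together, a hypothetical end-to-end loop might look as follows. It composes the helpers sketched earlier (crop_face, forest_predict, smooth_labels), while detect_face and extract_selected_features stand in for the cascade detector and the 10000-dimensional feature extraction and are assumptions, as is the use of OpenCV for capture.

```python
import cv2

def recognize_video(source, trees, window=5):
    cap = cv2.VideoCapture(source)           # video file path or camera index
    labels = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        box = detect_face(gray)              # assumed: cascade face detector
        if box is None:
            labels.append(labels[-1] if labels else 0)
            continue
        face = crop_face(gray, box, out_size=32)
        x = extract_selected_features(face)  # assumed: the 10000 selected dims
        labels.append(forest_predict(trees, x))
    cap.release()
    return smooth_labels(labels, window)     # step i: remove spurious frames
```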
Test results
For test 1, to contrast the method of the present invention with the AdaBoost.MH method in expression recognition accuracy, the traditional AdaBoost.MH method was applied in an experiment similar to test 1 and compared with the "feature selection + random forest" method of the present invention. The recognition accuracy of the two methods for each expression class is shown in Fig. 6. The figure clearly shows that AdaBoost.MH has the highest recognition rate for closed eyes and rather low rates (never above 75%) for the two classes open mouth and surprise. By contrast, for u > 900 the recognition accuracy of the "feature selection + random forest" method exceeds 90% for all 5 expression classes.
As for recognition speed, Table 3 records the average time of each stage when recognizing the 750 pictures of test 1. The "feature selection + random forest" recognition takes 5.2 ms; adding the time overhead of face detection, the recognition speed reaches 27.62 frames/second.
Table 3. Recognition times of the "feature selection + random forest" method
For test 2, the traditional AdaBoost.MH method was likewise applied in an experiment similar to test 2 and compared with the "feature selection + random forest" method of the present invention. Fig. 7 shows the overall accuracy of the two methods; the accuracy of the method of the present invention is significantly higher than that of the AdaBoost.MH method.
In test 3, the method achieves very high recognition accuracy; meanwhile, the expression recognition time per frame is around 5 ms.
The experimental results of the above 3 tests show that the present invention is both highly accurate and fast. The 10-fold cross-validation results on the CAS-PEAL-R1 expression database show an overall recognition accuracy of 94.7%; in the open-set test on the JAFFE expression database, a recognition accuracy of 91.2% was also obtained; as for recognition speed, the average recognition time per face is 5.2 ms, which meets the demand of real-time recognition.

Claims (5)

1. A method for high-precision recognition of multi-class facial expressions, characterized in that it comprises the following steps:
Step 1: use multiple face region images as positive samples and multiple non-face region images as negative samples for offline training, obtaining a face detection classifier;
Step 2: on the basis of step 1, carry out the offline training of the facial expression classifier; the detailed process is as follows:
Step 2.1: apply expression labels to the face image training data; the concrete method is: collect pictures or video key frames of each expression class to be recognized, forming training image set A, in which the number of pictures is m; use consecutive integers to number the class label of each picture or key frame, forming the expression class label set Y = {1, 2, ..., k}, where k is the number of expression classes to be recognized;
Step 2.2: extract the face region data of each training image labeled in step 2.1, obtaining the cropped face images and forming training image set B;
Step 2.3: to train the expression classifier, perform a second Haar-like feature extraction on every image in training image set B formed in step 2.2; the concrete method of Haar-like feature extraction is: compute the integral image of each cropped image and, from each integral image, compute the corresponding H-dimensional Haar-like feature values;
record the H-dimensional Haar-like feature vector of each image as one row, so that the H-dimensional Haar-like feature vectors of all m images form the feature matrix X with m rows and H columns;
Step 2.4: use the AdaBoost.MH algorithm to perform feature selection on the Haar-like feature matrix X obtained in step 2.3; the detailed process is:
Step 2.4.1: initialize the weight of each image, denoted D_1(i, y_i) = 1/(mk), where y_i ∈ Y denotes the expression class label of the i-th image, i = 1...m;
Step 2.4.2: start round f of the iteration, f = 1...F: taking each column of the feature matrix X in turn as the input of a weak classifier, perform H computations to obtain r_{f,j}:

r_{f,j} = \sum_{i=1}^{m} D_f(i, y_i) \, K[y_i] \, h_j(x_{i,j}, y_i),

where j = 1...H; x_{i,j} denotes the j-th element of the i-th row of X; h_j(x_{i,j}, y_i) denotes the weak classifier taking x_{i,j} as input; D_f(i, y_i) denotes the weight of the i-th training image in round f; and

K[y_i] = \begin{cases} +1, & y_i \in Y = \{1, 2, \ldots, k\} \\ -1, & y_i \notin Y = \{1, 2, \ldots, k\} \end{cases};
After the H computations end, take the maximum of the H values r_{f,j} obtained in this round, denoted r_f, and take the weak classifier h_j(x_j, Y) corresponding to r_f, which uses the j-th feature dimension x_j of X as input, as the weak classifier h_f(x_j, Y) selected in round f; at the same time, add x_j to the new feature space as a selected feature dimension;
Step 2.4.3: compute the weight α_f of the weak classifier h_f(x_j, Y) selected in step 2.4.2:

\alpha_f = \frac{1}{2} \ln \left( \frac{1 + r_f}{1 - r_f} \right);
Step 2.4.4: compute the weight D_{f+1} of each image in round f+1:

D_{f+1}(i, y_i) = \frac{D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i))}{Z_f}, \quad i = 1 \ldots m,

where h_f(x_{i,j}, y_i) denotes the weak classifier selected in round f, taking the j-th feature value of the i-th image as input, and Z_f is the normalization factor

Z_f = \sum_{i=1}^{m} D_f(i, y_i) \exp(-\alpha_f K[y_i] h_f(x_{i,j}, y_i));
Step 2.4.5: substitute the new weights obtained in step 2.4.4 back into step 2.4.2 and iterate the procedure of steps 2.4.2 to 2.4.4 until F Haar-like feature dimensions have been selected; then extract the corresponding F columns of the feature matrix X, forming the principal feature matrix X' with m rows and F columns;
Step 3: use the principal feature matrix X' obtained through step 2.4.5 and the expression class label set Y labeled in step 2.1 to train the expression recognition classifier; training follows the random forest algorithm; the concrete method is:
Step 3.1: according to the number of decision trees T and the node feature dimensionality u required by the design, generate T CART classification decision trees; the record format of a tree's root node is N(J), that of an intermediate node is N(V, J), and that of a leaf node is N(V, J, y_t), where J denotes the split feature dimension of node N, V denotes the feature value of node N, and y_t denotes the class label of node N;
The generation method of each CART classification decision tree is:
Step 3.1.1: perform m draws with replacement, each extracting one row of the principal feature matrix X', to form a new matrix X'' with m rows and F columns, used for growing this CART classification decision tree; the training sample labels corresponding to the rows of X'' form the new expression class label set Y'';
Step 3.1.2: starting from the root node, carry out node splitting node by node until the whole tree is grown; the splitting process of each node is:
a) From the matrix X'', randomly select u columns as the training data needed for this node split, where x''_j denotes the j-th column of X'';
b) Compute the information gain IG_j of each selected column x''_j, obtaining u values:

IG_j = IG(Y'' \mid x''_j) = H(Y'') - H(Y'' \mid x''_j),

where H(Y'') is the information entropy of the expression class label set Y'':

H(Y'') = -\sum_{w=1}^{k} p(y'' = V_w) \cdot \log_2 p(y'' = V_w),
where V_w denotes the value of the w-th class label in Y'', V_w ∈ {1, 2, ..., k};
and H(Y'' \mid x''_j) is the conditional entropy of the expression class label set Y'':

H(Y'' \mid x''_j) = \sum_{s=1}^{h} p(x''_j = V_s) \cdot H(Y'' \mid x''_j = V_s) = -\sum_{s=1}^{h} p(x''_j = V_s) \cdot \sum_{w=1}^{q} p(y'' = V_{w|s}) \cdot \log_2 p(y'' = V_{w|s}),

where V_s denotes the s-th of the distinct values taken by the elements of x''_j; V_{w|s} denotes an expression class label corresponding to V_s; H(Y'' \mid x''_j = V_s) is the information entropy of the set of expression class labels corresponding to V_s; x''_j denotes the elements of the j-th column of X''; h ≤ m, q ≤ k;
C) u information gain value IG obtaining of comparison step b j, and by X " in make IG jthe maximum row of value extract, and are denoted as x " j, the columns J this be listed in X ' records simultaneously, and the disruptive features as this node is tieed up;
D) x is added up " jin the quantity c of all different eigenwerts, then set up c respectively with the child node of different eigenwert for node eigenwert V to current node, and using this child node as the root node of subtree, generate new subtree, the growing method of subtree is: by x " jin all values equal the element place X of this eigenwert " in row vector propose, form new eigenmatrix X v, then institute's espressiove class label corresponding for proposed row vector is proposed, form new expression class label set Y v; Then (X is used v, Y v) substitute (X ", Y "), recursively carries out the operation of step a-d, until meet following condition for the moment, terminates the growth of this subtree:
1. X vline number be less than 2, or in each row all eigenwerts all equal cause this node cannot continue division time, the expression class label set Y of correspondence vthe highest label of the middle frequency of occurrences is as the class label y of this node tpreserve;
2. the Y under this node vin expression class label all identical time, using the class label y of unique expression class label as this node tpreserve;
Step 3.2, save all T CART classification decision trees; together they form the final random forest expression recognition classifier;
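Likewise as a hedged sketch, the outer loop of steps 3.1.1-3.2 might look as follows; grow_tree is an assumed routine implementing the recursive split of step 3.1.2:

    import numpy as np

    def train_forest(X1, Y, T, u, seed=0):
        # X1 is the m x F principal feature matrix X'; Y holds the
        # corresponding expression class labels
        rng = np.random.default_rng(seed)
        forest = []
        for _ in range(T):
            # step 3.1.1: m draws with replacement give the bootstrap
            # sample X'' and its expression class label set Y''
            idx = rng.integers(0, X1.shape[0], size=X1.shape[0])
            X2, Y2 = X1[idx], Y[idx]
            # step 3.1.2: grow one CART tree on the bootstrap sample
            forest.append(grow_tree(X2, Y2, u, rng))
        return forest  # step 3.2: the saved trees are the classifier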
Step 4, use the random forest expression recognition classifier obtained by the offline training of step 3 to perform online recognition on the still image or dynamic video under test;
1) the recognition method for a still image is (a traversal-and-voting sketch follows step d below):
Step a, extract the facial region from the still image to be recognized;
Step b, on the basis of step a, extract the F-dimensional Haar-like features needed for recognition, according to the Haar-like feature extraction method and the principal feature matrix X' obtained in step 2.4.5, forming the feature vector of the expression image to be recognized, denoted x; x_J denotes the J-th feature value of the feature vector x;
Step c, let each of the T CART classification decision trees in the random forest expression recognition classifier obtained by the offline training of step 3 classify the feature vector x of the expression image to be recognized; classification by each CART classification decision tree starts from the root node and proceeds as follows:
c.1) obtain from the classifier the splitting feature dimension J of the current node, and read the J-th feature value x_J of the feature vector x of the expression image to be recognized;
c.2) search the child nodes of the current node and select the child node whose node feature value is closest to x_J;
c.3) carry out operations c.1)-c.2) recursively until the current node is a leaf node; then stop the recursion and output the class label y_t of this leaf node as the recognition result of this CART classification decision tree;
Step d, tally the T output results y_t of the T CART classification decision trees in the random forest expression recognition classifier, and output the class label with the highest frequency of occurrence as the final recognition result;
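A minimal sketch of steps c and d under assumed data structures; the Node layout below is our illustration, as the patent does not prescribe how the trees are stored:

    from collections import Counter
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        split_dim: int = -1   # J, the splitting feature dimension
        value: float = 0.0    # V, the node feature value
        label: int = -1       # y_t, set on leaf nodes
        children: list = field(default_factory=list)

    def classify(forest, x):
        votes = []
        for root in forest:
            node = root
            while node.children:    # c.3): recurse until a leaf
                j = node.split_dim  # c.1): read feature value x_J
                # c.2): child whose node feature value is closest to x_J
                node = min(node.children, key=lambda c: abs(c.value - x[j]))
            votes.append(node.label)
        # step d: majority vote over the T tree outputs y_t
        return Counter(votes).most_common(1)[0][0]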
2) the recognition method for dynamic video is:
Step e, decode the video file and extract the data of every frame, obtaining the image sequence to be recognized;
Step f, on the basis of step e, extract the facial region data from every image in the image sequence to be recognized, obtaining the face image sequence to be recognized;
Step g, according to the Haar-like feature extraction method described in step 2.4, extract from every face image in the face image sequence obtained in step f the F-dimensional Haar-like features selected in step 2.4.5;
Step h, on the basis of step g, use the random forest expression recognition classifier obtained by the offline training of step 3 to recognize every face image in the face image sequence to be recognized, obtaining an expression class sequence; each face image is recognized exactly as in steps c and d;
Step i, smooth the expression class sequence obtained in step h to remove spurious judgements in the middle of the recognition sequence, obtaining the final recognition result.
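The patent does not specify the smoothing of step i; one plausible reading, sketched here purely as an assumption, is a sliding-window majority filter over the per-frame labels:

    from collections import Counter

    def smooth_labels(seq, width=5):
        # replace each frame's label by the majority label within a
        # window of the given width, suppressing isolated spurious
        # judgements in the expression class sequence
        half = width // 2
        out = []
        for t in range(len(seq)):
            window = seq[max(0, t - half): t + half + 1]
            out.append(Counter(window).most_common(1)[0][0])
        return out

For example, smooth_labels(['happy', 'happy', 'angry', 'happy', 'happy'], 3) suppresses the single stray 'angry' frame.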
2. The method for identifying multi-class facial expressions at high precision according to claim 1, characterized in that: in the feature selection method described in step 2.4, the input of the weak classifier used in the iterative computation is a one-dimensional feature value from the feature vector, and, for the expression class label y to be recognized, the output of the weak classifier is 1 or -1.
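Such a weak learner is commonly a decision stump; the threshold-and-polarity form below is our assumption, kept only to the one-dimensional input and the ±1 output that the claim requires:

    def weak_classifier(feature_value, threshold, polarity=1):
        # one-dimensional feature value in; +1 (label y) or -1 out
        return polarity if feature_value >= threshold else -polarity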
3. The method for identifying multi-class facial expressions at high precision according to claim 1, characterized in that: the facial region data described in step 2.2 are extracted as follows: first compute the integral image of each image; the integral image is the same size as the original image, and the value of any point (x, y) on it is the sum of the pixel values of the corresponding point (x', y') in the original image and of all points above and to its left:
$ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y')$,
where ii(x, y) denotes the value of point (x, y) on the integral image, and i(x', y') denotes the pixel value of point (x', y') in the original image;
after the integral image has been computed, use the face detection classifier obtained in step 1 together with a sliding-window method to extract the Haar-like features of the image inside the sliding-window region and rapidly determine the facial region; then crop out the image of the facial region, scale it to the size required for expression recognition, and keep the original expression annotation.
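For concreteness, a hedged sketch of the integral image and of the constant-time rectangle sum it enables, which is the basis of fast Haar-like feature evaluation; box_sum is an illustrative helper name:

    import numpy as np

    def integral_image(img):
        # ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y
        return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

    def box_sum(ii, r0, c0, r1, c1):
        # pixel sum over the rectangle [r0..r1] x [c0..c1], obtained
        # from at most four integral-image lookups
        total = int(ii[r1, c1])
        if r0 > 0:
            total -= int(ii[r0 - 1, c1])
        if c0 > 0:
            total -= int(ii[r1, c0 - 1])
        if r0 > 0 and c0 > 0:
            total += int(ii[r0 - 1, c0 - 1])
        return total

A two-rectangle Haar-like feature is then simply the difference of two box_sum calls over adjacent rectangles.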
4. The method for identifying multi-class facial expressions at high precision according to claim 1, characterized in that: the value of H described in step 2.3 is determined by the Haar-like feature types adopted and by the image dimensions.
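To illustrate why H depends on both factors, the sketch below counts one Haar-like type (a horizontal two-rectangle feature) at every admissible size and position of a square window; the function name and the restriction to a single type are ours:

    def count_two_rect_features(win=24):
        # enumerate every width/height and every placement of a
        # horizontally paired two-rectangle feature in a win x win window
        n = 0
        for w in range(2, win + 1, 2):      # total width of the pair
            for h in range(1, win + 1):
                n += (win - w + 1) * (win - h + 1)
        return n

Summing such counts over all feature types yields the total dimensionality H, which grows rapidly with the window size.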
5. The method for identifying multi-class facial expressions at high precision according to claim 1, characterized in that: the face detection classifier described in step 1 is obtained by the AdaBoost cascade classifier training method based on Haar-like features.
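A cascade of this kind evaluates cheap stages first and rejects non-face windows early; a hedged sketch of cascade evaluation, with the stage format assumed rather than taken from the patent:

    def cascade_detect(window, stages):
        # stages: assumed list of (weak_classifiers, threshold) pairs;
        # a window is accepted as a face only if it passes every stage
        for classifiers, threshold in stages:
            score = sum(clf(window) for clf in classifiers)
            if score < threshold:
                return False   # rejected early by this stage
        return True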
CN201210314435.4A 2012-08-30 2012-08-30 Method for identifying multi-class facial expressions at high precision Expired - Fee Related CN102831447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210314435.4A CN102831447B (en) 2012-08-30 2012-08-30 Method for identifying multi-class facial expressions at high precision

Publications (2)

Publication Number Publication Date
CN102831447A (en) 2012-12-19
CN102831447B (en) 2015-01-21 (granted)

Family

ID=47334573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210314435.4A Expired - Fee Related CN102831447B (en) 2012-08-30 2012-08-30 Method for identifying multi-class facial expressions at high precision

Country Status (1)

Country Link
CN (1) CN102831447B (en)


Legal Events

Code: Title / Description
C06: Publication
PB01: Publication
C10: Entry into substantive examination
SE01: Entry into force of request for substantive examination
C14: Grant of patent or utility model
GR01: Patent grant
CF01: Termination of patent right due to non-payment of annual fee (granted publication date: 20150121; termination date: 20150830)
EXPY: Termination of patent right or utility model