CN105139004A - Face expression identification method based on video sequences - Google Patents

Face expression identification method based on video sequences

Info

Publication number: CN105139004A (application CN201510612526.XA); granted as CN105139004B (zh)
Authority: CN (China)
Prior art keywords: human face, face expression, expression sequence, image, sequence image
Legal status: Granted; Active
Inventors: 于明, 郭迎春, 师硕, 于洋, 刘依, 阎刚, 邓玉娟
Current and original assignee: Hebei University of Technology
Application CN201510612526.XA filed by Hebei University of Technology; publication of CN105139004A; application granted; publication of CN105139004B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/192 Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194 References adjustable by an adaptive method, e.g. learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a facial expression recognition method based on video sequences and relates to methods for identifying figures. The method extracts the dynamic spatio-temporal texture features of facial expression sequences with the HCBP-TOP algorithm. It comprises the steps of preprocessing the facial expression sequences; layering and partitioning the facial expression sequence images with a spatial pyramid partitioning scheme; extracting the dynamic spatio-temporal texture features of the facial expression sequences with the HCBP-TOP algorithm; and training and predicting facial expressions with an SVM classifier. The invention overcomes the defects of the prior art that the center pixel is not taken into account, local detail information is neglected, the efficiency and precision of facial expression recognition are low, and general applicability is lacking.

Description

Facial expression recognition method based on video sequences
Technical field
The technical scheme of the present invention relates to methods for identifying figures, specifically to a facial expression recognition method based on video sequences.
Background art
Expression is the most effective mode of human emotional communication. In recent years facial expression recognition systems have found important applications in fields involving vision systems and pattern recognition, such as psychological research, video conferencing, affective computing, intelligent human-machine interaction and the medical industry. As human-computer interaction technology advances on all fronts, how to let computer systems perceive human expressions more capably has become a focus of current artificial intelligence research, and the research and development of facial expression recognition systems is of great significance.
Early facial expression recognition methods concentrated on expressions in still images, but the main problem with still images is that temporal information is ignored. A facial expression is a dynamic process, and its temporal information plays a very important role. Facial expression recognition methods based on video sequences can better reflect the changes of the expression itself and thereby improve the accuracy and robustness of recognition. Research on expression recognition based on video sequences therefore has important scientific value.
Existing facial expression recognition methods based on video sequences include the following. In 1999, Bartlett et al. of the University of California extracted facial image features with multi-scale, multi-orientation Gabor filters and classified them with support vector machines (SVM) to recognize different expressions; Gabor features, however, are computationally expensive, high-dimensional and susceptible to illumination interference. In 2003, Wang Yubo et al. of Tsinghua University extracted Haar-like features of facial images and classified expressions with an algorithm based on continuous Adaboost. Haar-like geometric features have the advantages of intuitiveness, low dimensionality and strong descriptive power, but the method is rather sensitive to edge and line features and its feature extraction precision is not high; moreover, when the background of the image or video is complex, the Adaboost classifier produces more misrecognitions. In 2009, Liao of the University of North Carolina combined dominant local binary patterns (Dominant LBP, DLBP) with the Gabor method to extract features, selecting the dominant features of the LBP algorithm to make computation faster, and obtained good texture-classification results by combining features extracted with DLBP and Gabor. But this method has two shortcomings: on the one hand, LBP does not consider the influence of the center pixel when expressing image texture characteristics; on the other hand, the method does not make full use of temporal information, so part of the information is lost and the recognition rate is unsatisfactory. For the LBP defect of ignoring the center pixel, the centralized binary pattern (hereinafter CBP), proposed in recent years on the basis of LBP, provides a line of solution, and on this basis multi-scale CBP (MCBP) and MCBP with the embedded image Euclidean distance (IMED), called MCBP-IMED, have been proposed. LBP is widely used in the field of expression recognition owing to its gray-scale invariance and rotation invariance, but it is difficult for it to obtain a large spatial support area, it is not robust to changes of illumination direction and viewing angle, and its texture-classification performance is not fully satisfactory.
Summary of the invention
The technical problem to be solved by the invention is to provide a facial expression recognition method based on video sequences: a method that extracts the dynamic spatio-temporal texture features of facial expression sequences with Haar-like Centralized Binary Patterns from Three Orthogonal Planes (hereinafter HCBP-TOP), overcoming the defects of the prior art that the center pixel is not considered, local detail information is ignored, the efficiency and precision of expression recognition are low, and general applicability is lacking.
The technical scheme adopted by the invention to solve this technical problem is a facial expression recognition method based on video sequences that uses the HCBP-TOP algorithm to extract the dynamic spatio-temporal texture features of facial expression sequences. The concrete steps are as follows:
First step, preprocessing of the facial expression sequence images:
(1) Cropping of the facial expression sequence images:
The facial expression sequence images read from an existing facial expression video sequence database are transformed from RGB space to gray space using formula (1):
Gray = 0.299R + 0.587G + 0.114B (1),
where Gray is the gray value, generally ranging from 0 to 255, R is the red component, G the green component and B the blue component.
According to the features of the human face and the "three sections, five eyes" geometric model of facial proportions, the facial expression sequence images transformed to gray space are cropped: let the horizontal distance between the two eyes be d and take the midpoint of the line connecting them as the reference point; the upper boundary is set 0.55d above the reference point, the lower boundary 1.45d below it, the left boundary 0.9d to the left and the right boundary 0.9d to the right, thus completing the cropping of the facial expression sequence images;
(2) Scaling of the facial expression sequence images:
The facial expression sequences cropped in (1) are rescaled with the bicubic interpolation algorithm to normalize the image size; after scaling, each facial image is 64 × 64 pixels;
(3) Gray-level balancing of the facial expression sequence images:
Histogram equalization is applied to the facial expression sequence images obtained in (2), which completes the preprocessing of the facial expression sequence images;
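For illustration, a minimal sketch of this preprocessing step follows, written in Python with OpenCV; the eye coordinates are assumed to come from an external eye detector, which the method does not specify, so left_eye and right_eye are hypothetical inputs.

```python
import cv2

def preprocess_frame(bgr, left_eye, right_eye):
    """Crop, scale and equalize one frame as in the first step (a sketch)."""
    # RGB -> gray; OpenCV's conversion uses the same weights as formula (1).
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # Eye distance d and the midpoint of the eye line as reference point.
    d = abs(right_eye[0] - left_eye[0])
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    # Face box: 0.55d above, 1.45d below, 0.9d left and right of the reference.
    top, bottom = int(cy - 0.55 * d), int(cy + 1.45 * d)
    left, right = int(cx - 0.9 * d), int(cx + 0.9 * d)
    face = gray[max(top, 0):bottom, max(left, 0):right]
    # Bicubic interpolation to the normalized 64 x 64 size.
    face = cv2.resize(face, (64, 64), interpolation=cv2.INTER_CUBIC)
    # Histogram equalization for gray-level balancing.
    return cv2.equalizeHist(face)
```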
Second step, layering and partitioning the facial expression sequence images with the spatial pyramid partitioning scheme:
Spatial pyramid partitioning progressively subdivides the image in space. The scheme adopted here divides the facial expression sequence image by powers of 2 along the horizontal and vertical coordinate directions. Let the total number of pyramid levels be L+1, with level indices i = 0, 1, 2, ..., L. At level i the facial expression sequence image is divided into 2^i blocks in the horizontal direction and 2^i blocks in the vertical direction, i.e. into 2^i × 2^i blocks in total. Here L = 2 is set, so the spatial pyramid partitioning scheme divides the facial expression sequence images preprocessed in the first step into 2+1 = 3 levels: level 0 is the original facial expression sequence image, and level i divides the original image into 2^i × 2^i sub-blocks, i = 1, 2;
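A small sketch of this partition in Python/NumPy; applied per frame, for L = 2 it yields the 1 + 4 + 16 = 21 sub-blocks described above (names are illustrative).

```python
import numpy as np

def pyramid_blocks(img, L=2):
    """Split a 2-D frame into 2^i x 2^i sub-blocks for each level i = 0..L."""
    h, w = img.shape
    levels = []
    for i in range(L + 1):
        n = 2 ** i  # blocks per axis at level i
        levels.append([img[r * h // n:(r + 1) * h // n,
                           c * w // n:(c + 1) * w // n]
                       for r in range(n) for c in range(n)])
    return levels  # levels[0] holds the whole frame, levels[2] its 16 sub-blocks

blocks = pyramid_blocks(np.zeros((64, 64), dtype=np.uint8))  # 64, 32 and 16 px blocks
```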
Third step, extracting the dynamic spatio-temporal texture features of the facial expression sequence images with the HCBP-TOP algorithm:
The HCBP-TOP algorithm extracts the dynamic spatio-temporal texture features of the layered blocks of the facial expression sequence images, the "layered blocks" being those obtained in the second step. After the spatial pyramid partitioning of the second step, the HCBP-TOP algorithm further takes, for each sub-block, the feature vectors of the three dimensions X axis, Y axis and time axis T and concatenates them into the overall feature vector of the block; the feature vectors of all blocks of the whole image are then integrated to form the feature data of one layer of the image; finally the dynamic spatio-temporal texture histograms of the facial expression sequence images of all pyramid levels are concatenated, according to assigned weights, into the dynamic spatio-temporal texture histogram of the whole facial expression sequence image. The concrete method is as follows:
(1) Extracting the HCBP features of the image sub-blocks:
Within each level, the HCBP-TOP algorithm extracts the HCBP features of the image sub-blocks at corresponding positions in each sequence, and the histogram of the dynamic spatio-temporal texture features of the facial expression sequence image is computed for each sub-block; the histograms of all sub-blocks are then concatenated into the histogram of each level of the whole facial expression sequence, which yields the deformation information of the expression sequence in the X-Y plane and its motion information in the X-T and Y-T planes. The extraction is computed by the following HCBP coding:
The eight HCBP encoding models M_1 to M_8 are given by formula (2):
$$M_1=\begin{bmatrix}1&1&1&0&0\\1&-1&-1&-1&0\\1&-1&0&0&0\\0&-1&0&0&0\\0&0&0&0&0\end{bmatrix},\quad M_2=\begin{bmatrix}1&1&1&1&1\\-1&-1&-1&-1&-1\\0&0&0&0&0\\0&0&0&0&0\\0&0&0&0&0\end{bmatrix},\quad M_3=\begin{bmatrix}0&0&1&1&1\\0&-1&-1&-1&1\\0&0&0&-1&1\\0&0&0&-1&0\\0&0&0&0&0\end{bmatrix},$$
$$M_4=\begin{bmatrix}0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\end{bmatrix},\quad M_5=\begin{bmatrix}0&0&0&0&0\\0&0&0&-1&0\\0&0&0&-1&1\\0&-1&-1&-1&1\\0&0&1&1&1\end{bmatrix},\quad M_6=\begin{bmatrix}0&0&0&0&0\\0&0&0&0&0\\0&0&0&0&0\\-1&-1&-1&-1&-1\\1&1&1&1&1\end{bmatrix},$$
$$M_7=\begin{bmatrix}0&0&0&0&0\\0&-1&0&0&0\\1&-1&0&0&0\\1&-1&-1&-1&0\\1&1&1&0&0\end{bmatrix},\quad M_8=\begin{bmatrix}1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\end{bmatrix}\qquad(2),$$
In the above formula, in each model the five outermost pixels are assigned weight 1, the five adjacent pixels of the second outer ring weight -1, and all other positions weight 0. The center point P_0 records the texture variation that stores the Haar-like type feature. In each of the X-Y, X-T and Y-T planes, a 5 × 5 window as in formula (3) is formed with P_0 at its center, surrounded by the 24 neighborhood points P_i (i = 1, 2, ..., 24):
$$W(x,y)=\begin{bmatrix}P_9&P_{10}&P_{11}&P_{12}&P_{13}\\P_{24}&P_1&P_2&P_3&P_{14}\\P_{23}&P_8&P_0&P_4&P_{15}\\P_{22}&P_7&P_6&P_5&P_{16}\\P_{21}&P_{20}&P_{19}&P_{18}&P_{17}\end{bmatrix}\qquad(3),$$
As can be seen from the above description, any pixel I(x, y, t) in an image sequence I has around it the small windows W_j(x, y), j = 0, 1, 2, where W_0(x, y) is the window of the pixel in the X-Y plane, W_1(x, y) its window in the X-T plane and W_2(x, y) its window in the Y-T plane. The HCBP value f_j(x, y, t) of W_j(x, y) is computed as:
$$f_j(x,y,t)=\mathrm{HCBP}(I(x,y,t))=\sum_{k=1}^{8}B(a_{j,k})\times 2^{8-k}\qquad(4),$$
where
$$B(a_{j,k})=\begin{cases}1,&a_{j,k}\ge T_{j,k}\\0,&a_{j,k}<T_{j,k}\end{cases}\qquad(5),$$
$$T_{j,k}=5\left(I(x,y,t)-\frac{C_k\cdot W_j(x,y)+I(x,y,t)}{11}\right)\qquad(6),$$
$$a_{j,k}=M_k\cdot W_j(x,y)\qquad(7),$$
where M_k is the encoding model; C_k is the summation model over the nonzero-weight pixels of each encoding model, C_k = |M_k|; B(·) is the threshold comparison function of the HCBP value; T_{j,k} is the threshold; and a_{j,k} is the decimal number obtained by convolving the small window W_j(x, y) with the encoding model M_k.
Each image sub-block is scanned pixel by pixel with the models M_1 to M_8, the model window being 5 × 5 pixels, and each model has its own threshold. The threshold is computed as follows: first sum the 10 nonzero-weight pixels of the model, C_k · W_j(x, y), together with the center pixel, 11 pixels in all; then take the mean of these 11 pixels; the threshold of the model is 5 times the difference between the center pixel and this mean. Formula (4) is then used to compute the dynamic spatio-temporal texture features of the facial expression sequence: the sum of the 5 pixel values with weight 1 minus the sum of the 5 pixel values with weight -1, i.e. the change information between the outer and inner sides, gives a decimal number a_{j,k}; a_{j,k} is compared with the threshold T_{j,k}, and the HCBP-TOP code bit is 1 if a_{j,k} is not smaller than the threshold and 0 otherwise;
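The following sketch restates formulas (4) to (7) for a single 5 × 5 window in NumPy. The model matrices are transcribed from formula (2); the function is one illustrative reading of the coding step, not a reference implementation.

```python
import numpy as np

# The eight 5 x 5 encoding models M_1..M_8 of formula (2): five outer-ring
# weights 1 and five inner-ring weights -1 along one edge or corner each.
M = np.zeros((8, 5, 5), dtype=np.int32)
M[0] = [[1,1,1,0,0],[1,-1,-1,-1,0],[1,-1,0,0,0],[0,-1,0,0,0],[0,0,0,0,0]]
M[1] = [[1,1,1,1,1],[-1,-1,-1,-1,-1],[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]]
M[2] = [[0,0,1,1,1],[0,-1,-1,-1,1],[0,0,0,-1,1],[0,0,0,-1,0],[0,0,0,0,0]]
M[3] = [[0,0,0,-1,1]] * 5
M[4] = [[0,0,0,0,0],[0,0,0,-1,0],[0,0,0,-1,1],[0,-1,-1,-1,1],[0,0,1,1,1]]
M[5] = [[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0],[-1,-1,-1,-1,-1],[1,1,1,1,1]]
M[6] = [[0,0,0,0,0],[0,-1,0,0,0],[1,-1,0,0,0],[1,-1,-1,-1,0],[1,1,1,0,0]]
M[7] = [[1,-1,0,0,0]] * 5

def hcbp_code(window):
    """HCBP code of one 5 x 5 window W_j(x, y), per formulas (4)-(7)."""
    center = float(window[2, 2])                  # the center point P_0
    code = 0
    for k in range(8):
        a = float(np.sum(M[k] * window))          # a_{j,k} = M_k . W_j(x, y)
        c = float(np.sum(np.abs(M[k]) * window))  # C_k . W_j: the 10 nonzero-weight pixels
        t = 5.0 * (center - (c + center) / 11.0)  # threshold T_{j,k}
        code = (code << 1) | int(a >= t)          # bit k carries weight 2^(8-k)
    return code                                   # value in 0..255
```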
(2) Extracting the dynamic spatio-temporal texture HCBP-TOP features of the facial expression sequence images:
The HCBP-TOP algorithm extracts the dynamic spatio-temporal texture features of the facial expression sequence images from the hierarchical pyramid sub-blocks generated in the second step. From the HCBP features of the image sub-blocks extracted in (1), the dynamic spatio-temporal texture HCBP-TOP feature of the facial expression sequence image in each block is computed: for each sub-block, the HCBP feature vectors of the three dimensions X, Y, T in the XY, XT and YT directions are concatenated into the overall HCBP-TOP feature vector of the block, the feature vectors of all blocks of the whole image are integrated into the feature data of one layer of the image, and the feature histogram of that layer is obtained with a histogram function. Because each layer, and each image block within a layer, carries different spatial information and plays a different role in image classification, the dynamic spatio-temporal texture histograms of the facial expression sequence images of all pyramid levels are concatenated into the histogram of the whole facial expression sequence image according to assigned weights. The weight assignment principle is that the layer of large-scale sub-blocks receives a small histogram weight and the layer of small-scale sub-blocks a large one: the histogram weight of pyramid level i is defined as i + 1, so the original-image histogram of level 0 receives weight 1 and the level-1 histogram weight 2; the higher the level, the larger the weight of that level's histogram, i.e. the larger the proportion of that level's feature information in the overall representation. The histograms of all levels are then merged, according to the assigned weights, into the dynamic spatio-temporal texture histogram of the facial expression sequence image as a whole, where level i has 2^i × 2^i sub-blocks and the feature values of each sub-block range from 0 to 255. Normalization yields the final dynamic spatio-temporal texture representation of the facial expression sequence image, and the extracted feature data are fed into the SVM for classifier training;
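As a sketch of the weighted concatenation, assuming each sub-block of each level has already been coded into HCBP values (e.g. by applying a function like hcbp_code above pixel-wise in the three planes); the names and shapes here are illustrative.

```python
import numpy as np

def weighted_pyramid_feature(levels_codes):
    """Concatenate per-block 256-bin histograms, weighting level i by i + 1.

    levels_codes[i] is a list of integer arrays (values 0..255) holding the
    HCBP-TOP codes of the 2^i x 2^i sub-blocks of pyramid level i.
    """
    parts = []
    for i, blocks in enumerate(levels_codes):
        for codes in blocks:
            hist = np.bincount(np.ravel(codes), minlength=256).astype(np.float64)
            parts.append((i + 1) * hist)          # level-0 weight 1, level-1 weight 2, ...
    feat = np.concatenate(parts)
    return feat / (np.linalg.norm(feat) + 1e-12)  # final normalization
```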
Fourth step, training and predicting facial expressions with an SVM classifier:
The dynamic spatio-temporal texture feature data of the facial expression sequence images extracted in the third step are fed into the SVM to train the classifier and to predict facial expressions, that is, to judge to which expression class the extracted features actually belong. The leave-one-out method is adopted, and the average result of the experiments is taken as the expression recognition rate. The concrete operation flow is as follows:
(1) The dynamic spatio-temporal texture feature data of the facial expression sequence images extracted in the third step are fed into the SVM for classifier training. From these feature samples, the feature matrix of the training-sample sequences, the feature matrix of the test-sample sequences and the corresponding training and test class-label matrices are constructed, the values of the class-label matrices being the classification categories of the samples;
(2) A self-defined kernel function is adopted for the dynamic spatio-temporal texture features of the local facial expression sequence images, and cross-validation is used to select the optimal parameters c and g, giving c = 790 and g = 1.9. The feature matrices of the training samples and test samples are first fed into the svmtrain function to obtain the support vectors; the feature matrix of the test samples together with the support vectors is then fed into the svmpredict function for prediction, which completes the expression recognition. Experiments on the Cohn-Kanade and SFEW databases recognize the six expressions anger, disgust, fear, happiness, sadness and surprise, thus completing the recognition of facial expressions.
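A hedged sketch of this step, substituting scikit-learn's SVC for the LIBSVM svmtrain/svmpredict calls named above; mapping c and g onto the C and gamma of an RBF kernel is an assumption, since the self-defined kernel is not spelled out here, and the feature arrays are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
train_x = rng.random((60, 256 * 21 * 3))  # placeholder HCBP-TOP vectors (21 blocks, 3 planes)
train_y = rng.integers(0, 6, 60)          # six expression classes
test_x = rng.random((10, 256 * 21 * 3))

clf = SVC(C=790, gamma=1.9, kernel="rbf")  # c = 790, g = 1.9 from cross-validation
clf.fit(train_x, train_y)                  # plays the role of svmtrain
pred = clf.predict(test_x)                 # plays the role of svmpredict
```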
In the above facial expression recognition method based on video sequences, the HCBP-TOP algorithm is an algorithm combining CBP features with Haar-like features; it extracts the dynamic spatio-temporal texture features of the partitioned facial expression sequences from sub-image sequences of the same frequency, the HCBP-TOP histogram being defined as:
$$H_{i,j}=\sum_{x,y,t}E\{f_j(x,y,t)=i\},\qquad i=0,\dots,255;\; j=0,1,2\qquad(8),$$
where f_j(x, y, t) denotes the HCBP value of the pixel I(x, y, t) in the j-th plane (j = 0: XY; 1: XT; 2: YT), and the function E{f} is defined as:
$$E\{f\}=\begin{cases}1,&\text{if }f=i\\0,&\text{else}\end{cases}\qquad(9).$$
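In code, formulas (8) and (9) amount to one 256-bin count per orthogonal plane; a sketch assuming the per-plane HCBP codes are available as integer arrays:

```python
import numpy as np

def hcbp_top_histograms(codes_xy, codes_xt, codes_yt):
    """H_{i,j} of formula (8): one 256-bin histogram per plane j = 0, 1, 2."""
    return np.stack([np.bincount(np.ravel(c), minlength=256)
                     for c in (codes_xy, codes_xt, codes_yt)])
```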
In the above facial expression recognition method based on video sequences, the spatial pyramid partitioning and the SVM classifier are well known.
The beneficial effects of the invention are as follows. Compared with the prior art, its outstanding substantive features and marked improvements are:
(1) In the layering step for the facial expression sequence images, the method overcomes the defect that extracting features from the whole image ignores local detail information and lowers the recognition rate: the spatial pyramid partitioning scheme divides the image sequence into L+1 levels, level i being divided into 2^i × 2^i sub-blocks, i ∈ {0, 1, ..., L}; features are extracted per sub-block at every level, and the statistical feature histograms are given different weights according to the level, the finer the partition the larger the weight, so that detail information is reflected through the weights. This raises the expression recognition rate and is generally applicable to various image sequences;
(2) In the step of extracting the dynamic spatio-temporal texture features with the HCBP-TOP algorithm, relative to the existing HLBP, whose neglect of the center pixel leads to a low recognition speed, the method takes the mean of the ten nonzero-weight elements and the central element of each pattern and uses five times the difference between this 11-element mean and the center pixel as the threshold of each pattern, improving the expression recognition rate;
(3) In the classifier training and prediction step, an SVM classifier is used for the training and prediction of facial expressions: the Haar-like-based centralized binary pattern, HCBP, extracts the dynamic spatio-temporal texture features of the facial expression sequence images; the spatial pyramid partitioning scheme assigns the weights of the feature histograms; the statistical histograms are merged into the dynamic spatio-temporal texture features of the sequence as a whole; and the extracted feature data are fed into the SVM to train the classifier and predict expressions. These Haar-like-based centralized binary pattern features describe facial expression characteristics well and give very high recognition precision, further increasing the practicality of the expression recognition system;
(4) The method makes effective use of the spatio-temporal feature information of facial expressions, overcomes the defect of the LBP recognition method of not considering the center pixel, improves the efficiency and precision of facial expression recognition in video sequences and raises the speed and efficiency of training, giving the method greater practical and general value.
The following embodiments further demonstrate the outstanding substantive features and marked improvements of the invention.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a schematic flowchart of the steps of the facial expression recognition method based on video sequences of the invention.
Fig. 2 is a schematic diagram of the process of extracting the dynamic spatio-temporal texture features of facial expression sequence images with the HCBP-TOP algorithm in the method of the invention.
Detailed description of the embodiments
The embodiment illustrated in Fig. 1 shows that the flow of the facial expression recognition method based on video sequences of the invention is: preprocessing of the facial expression sequences → layering and partitioning of the facial expression sequence images with the spatial pyramid partitioning scheme → extraction of the dynamic spatio-temporal texture features of the facial expression sequence images with the HCBP-TOP algorithm → training and prediction of facial expressions with an SVM classifier.
The embodiment illustrated in Fig. 2 shows that the process of extracting the dynamic spatio-temporal texture features of the facial expression sequence images with the HCBP-TOP algorithm in the method of the invention is: the facial expression sequence images processed by the spatial pyramid partitioning scheme undergo dynamic spatio-temporal texture feature extraction in the three dimensions X, Y, T, where X and Y denote the horizontal and vertical directions and T the time domain; on the three orthogonal planes XY, XT and YT, the deformation information of the XY plane and the motion information of the XT and YT planes are extracted; the feature vectors of the three dimensions are concatenated into the overall feature vector of each block, and the feature vectors of all blocks of the whole image are then integrated into the feature data of one layer of the image.
To elaborate further: after the spatial pyramid partitioning of the second step, the HCBP-TOP algorithm obtains for each sub-block the feature vectors of the three dimensions X, Y, T and concatenates them into the overall feature vector of the block; the feature vectors of all blocks of the whole image are integrated into the feature data of one layer of the image; and, because each layer and each image block within a layer carries different spatial information and plays a different role in classification, the dynamic spatio-temporal texture feature data of the facial expression sequence images of all layers are merged, according to the assigned weights, into the feature data of the whole image. The merged feature data are normalized to obtain the final image feature representation.
Embodiment
The present embodiment is a facial expression recognition method based on video sequences, namely a method that extracts the dynamic spatio-temporal texture features of facial expression sequences with the HCBP-TOP algorithm. The concrete steps are as follows:
First step, preprocessing of the facial expression sequence images:
(1) Cropping of the facial expression sequence images:
The facial expression sequence images read from an existing facial expression video sequence database are transformed from RGB space to gray space using formula (1):
Gray = 0.299R + 0.587G + 0.114B (1),
where Gray is the gray value, generally ranging from 0 to 255, R is the red component, G the green component and B the blue component.
According to the features of the human face and the "three sections, five eyes" geometric model of facial proportions, the facial expression sequence images transformed to gray space are cropped: let the horizontal distance between the two eyes be d and take the midpoint of the line connecting them as the reference point; the upper boundary is set 0.55d above the reference point, the lower boundary 1.45d below it, the left boundary 0.9d to the left and the right boundary 0.9d to the right, thus completing the cropping of the facial expression sequence images;
(2) Scaling of the facial expression sequence images:
The facial expression sequences cropped in (1) are rescaled with the bicubic interpolation algorithm to normalize the image size; after scaling, each facial image is 64 × 64 pixels;
(3) Gray-level balancing of the facial expression sequence images:
Histogram equalization is applied to the facial expression sequence images obtained in (2), which completes the preprocessing of the facial expression sequence images;
Second step, layering and partitioning the facial expression sequence images with the spatial pyramid partitioning scheme:
Spatial pyramid partitioning progressively subdivides the image in space. The scheme adopted here divides the facial expression sequence image by powers of 2 along the horizontal and vertical coordinate directions. Let the total number of pyramid levels be L+1, with level indices i = 0, 1, 2, ..., L. At level i the facial expression sequence image is divided into 2^i blocks in the horizontal direction and 2^i blocks in the vertical direction, i.e. into 2^i × 2^i blocks in total. Here L = 2 is set, so the spatial pyramid partitioning scheme divides the facial expression sequence images preprocessed in the first step into 2+1 = 3 levels: level 0 is the original facial expression sequence image, and level i divides the original image into 2^i × 2^i sub-blocks, i = 1, 2;
Third step, extracting the dynamic spatio-temporal texture features of the facial expression sequence images with the HCBP-TOP algorithm:
The HCBP-TOP algorithm extracts the dynamic spatio-temporal texture features of the layered blocks of the facial expression sequence images, the "layered blocks" being those obtained in the second step. After the spatial pyramid partitioning of the second step, the HCBP-TOP algorithm further takes, for each sub-block, the feature vectors of the three dimensions X axis, Y axis and time axis T and concatenates them into the overall feature vector of the block; the feature vectors of all blocks of the whole image are then integrated to form the feature data of one layer of the image; finally the dynamic spatio-temporal texture histograms of the facial expression sequence images of all pyramid levels are concatenated, according to assigned weights, into the dynamic spatio-temporal texture histogram of the whole facial expression sequence image. The concrete method is as follows:
(1) Extracting the HCBP features of the image sub-blocks:
Within each level, the HCBP-TOP algorithm extracts the HCBP features of the image sub-blocks at corresponding positions in each sequence, and the histogram of the dynamic spatio-temporal texture features of the facial expression sequence image is computed for each sub-block; the histograms of all sub-blocks are then concatenated into the histogram of each level of the whole facial expression sequence, which yields the deformation information of the expression sequence in the X-Y plane and its motion information in the X-T and Y-T planes. The extraction is computed by the following HCBP coding:
The eight HCBP encoding models M_1 to M_8 are given by formula (2):
$$M_1=\begin{bmatrix}1&1&1&0&0\\1&-1&-1&-1&0\\1&-1&0&0&0\\0&-1&0&0&0\\0&0&0&0&0\end{bmatrix},\quad M_2=\begin{bmatrix}1&1&1&1&1\\-1&-1&-1&-1&-1\\0&0&0&0&0\\0&0&0&0&0\\0&0&0&0&0\end{bmatrix},\quad M_3=\begin{bmatrix}0&0&1&1&1\\0&-1&-1&-1&1\\0&0&0&-1&1\\0&0&0&-1&0\\0&0&0&0&0\end{bmatrix},$$
$$M_4=\begin{bmatrix}0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\end{bmatrix},\quad M_5=\begin{bmatrix}0&0&0&0&0\\0&0&0&-1&0\\0&0&0&-1&1\\0&-1&-1&-1&1\\0&0&1&1&1\end{bmatrix},\quad M_6=\begin{bmatrix}0&0&0&0&0\\0&0&0&0&0\\0&0&0&0&0\\-1&-1&-1&-1&-1\\1&1&1&1&1\end{bmatrix},$$
$$M_7=\begin{bmatrix}0&0&0&0&0\\0&-1&0&0&0\\1&-1&0&0&0\\1&-1&-1&-1&0\\1&1&1&0&0\end{bmatrix},\quad M_8=\begin{bmatrix}1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\end{bmatrix}\qquad(2),$$
In the above formula, in each model the five outermost pixels are assigned weight 1, the five adjacent pixels of the second outer ring weight -1, and all other positions weight 0. The center point P_0 records the texture variation that stores the Haar-like type feature. In each of the X-Y, X-T and Y-T planes, a 5 × 5 window as in formula (3) is formed with P_0 at its center, surrounded by the 24 neighborhood points P_i (i = 1, 2, ..., 24):
$$W(x,y)=\begin{bmatrix}P_9&P_{10}&P_{11}&P_{12}&P_{13}\\P_{24}&P_1&P_2&P_3&P_{14}\\P_{23}&P_8&P_0&P_4&P_{15}\\P_{22}&P_7&P_6&P_5&P_{16}\\P_{21}&P_{20}&P_{19}&P_{18}&P_{17}\end{bmatrix}\qquad(3),$$
As can be seen from the above description, any pixel I(x, y, t) in an image sequence I has around it the small windows W_j(x, y), j = 0, 1, 2, where W_0(x, y) is the window of the pixel in the X-Y plane, W_1(x, y) its window in the X-T plane and W_2(x, y) its window in the Y-T plane. The HCBP value f_j(x, y, t) of W_j(x, y) is computed as:
$$f_j(x,y,t)=\mathrm{HCBP}(I(x,y,t))=\sum_{k=1}^{8}B(a_{j,k})\times 2^{8-k}\qquad(4),$$
where
$$B(a_{j,k})=\begin{cases}1,&a_{j,k}\ge T_{j,k}\\0,&a_{j,k}<T_{j,k}\end{cases}\qquad(5),$$
$$T_{j,k}=5\left(I(x,y,t)-\frac{C_k\cdot W_j(x,y)+I(x,y,t)}{11}\right)\qquad(6),$$
$$a_{j,k}=M_k\cdot W_j(x,y)\qquad(7),$$
where M_k is the encoding model; C_k is the summation model over the nonzero-weight pixels of each encoding model, C_k = |M_k|; B(·) is the threshold comparison function of the HCBP value; T_{j,k} is the threshold; and a_{j,k} is the decimal number obtained by convolving the small window W_j(x, y) with the encoding model M_k.
Each image sub-block is scanned pixel by pixel with the models M_1 to M_8, the model window being 5 × 5 pixels, and each model has its own threshold. The threshold is computed as follows: first sum the 10 nonzero-weight pixels of the model, C_k · W_j(x, y), together with the center pixel, 11 pixels in all; then take the mean of these 11 pixels; the threshold of the model is 5 times the difference between the center pixel and this mean. Formula (4) is then used to compute the dynamic spatio-temporal texture features of the facial expression sequence: the sum of the 5 pixel values with weight 1 minus the sum of the 5 pixel values with weight -1, i.e. the change information between the outer and inner sides, gives a decimal number a_{j,k}; a_{j,k} is compared with the threshold T_{j,k}, and the HCBP-TOP code bit is 1 if a_{j,k} is not smaller than the threshold and 0 otherwise;
(2) Extracting the dynamic spatio-temporal texture HCBP-TOP features of the facial expression sequence images:
The HCBP-TOP algorithm extracts the dynamic spatio-temporal texture features of the facial expression sequence images from the hierarchical pyramid sub-blocks generated in the second step. From the HCBP features of the image sub-blocks extracted in step (1), the dynamic spatio-temporal texture HCBP-TOP feature of the facial expression sequence image in each block is computed: for each sub-block, the HCBP feature vectors of the three dimensions X, Y, T in the XY, XT and YT directions are concatenated into the overall HCBP-TOP feature vector of the block, the feature vectors of all blocks of the whole image are integrated into the feature data of one layer of the image, and the feature histogram of that layer is obtained with a histogram function. Because each layer, and each image block within a layer, carries different spatial information and plays a different role in image classification, the dynamic spatio-temporal texture histograms of the facial expression sequence images of all pyramid levels are concatenated into the histogram of the whole facial expression sequence image according to assigned weights. The weight assignment principle is that the layer of large-scale sub-blocks receives a small histogram weight and the layer of small-scale sub-blocks a large one: the histogram weight of pyramid level i is defined as i + 1, so the original-image histogram of level 0 receives weight 1 and the level-1 histogram weight 2; the higher the level, the larger the weight of that level's histogram, i.e. the larger the proportion of that level's feature information in the overall representation. The histograms of all levels are then merged, according to the assigned weights, into the dynamic spatio-temporal texture histogram of the facial expression sequence image as a whole, where level i has 2^i × 2^i sub-blocks and the feature values of each sub-block range from 0 to 255. Normalization yields the final dynamic spatio-temporal texture representation of the facial expression sequence image, and the extracted feature data are fed into the SVM for classifier training;
Fourth step, training and predicting facial expressions with an SVM classifier:
The dynamic spatio-temporal texture feature data of the facial expression sequence images extracted in the third step are fed into the SVM to train the classifier and to predict facial expressions, that is, to judge to which expression class the extracted features actually belong. The leave-one-out method is adopted, and the average result of the experiments is taken as the expression recognition rate. The concrete operation flow is as follows:
(1) The dynamic spatio-temporal texture feature data of the facial expression sequence images extracted in the third step are fed into the SVM for classifier training. From these feature samples, the feature matrix of the training-sample sequences, the feature matrix of the test-sample sequences and the corresponding training and test class-label matrices are constructed, the values of the class-label matrices being the classification categories of the samples;
(2) A self-defined kernel function is adopted for the dynamic spatio-temporal texture features of the local facial expression sequence images, and cross-validation is used to select the optimal parameters c and g, giving c = 790 and g = 1.9. The feature matrices of the training samples and test samples are first fed into the svmtrain function to obtain the support vectors; the feature matrix of the test samples together with the support vectors is then fed into the svmpredict function for prediction, which completes the expression recognition. Experiments on the Cohn-Kanade and SFEW databases recognize the six expressions anger, disgust, fear, happiness, sadness and surprise, thus completing the recognition of facial expressions.
In the facial expression recognition method based on video sequences of the present embodiment, the HCBP-TOP algorithm is an algorithm combining CBP features with Haar-like features; the HCBP-TOP algorithm extracts from sub-image sequences of the same frequency the improved dynamic spatio-temporal texture features of the partitioned facial expression sequences, the HCBP-TOP histogram being defined as:
$$H_{i,j}=\sum_{x,y,t}E\{f_j(x,y,t)=i\},\qquad i=0,\dots,255;\; j=0,1,2\qquad(8),$$
where f_j(x, y, t) denotes the HCBP value of the pixel I(x, y, t) in the j-th plane (j = 0: XY; 1: XT; 2: YT), and the function E{f} is defined as:
$$E\{f\}=\begin{cases}1,&\text{if }f=i\\0,&\text{else}\end{cases}\qquad(9).$$
The present embodiment was tested on two existing facial expression video sequence databases, Cohn-Kanade and SFEW. From the Cohn-Kanade database, 327 facial expression video sequences were chosen; in the experiments the sequences were divided into anger, disgust, fear, happiness, sadness and surprise, comprising 38, 35, 41, 50, 43 and 48 image sequences respectively. Each sequence contains 10 frames, the start frame being the neutral expression and the end frame the apex of the expression, 3270 images in all. The experiments were run on the MATLAB R2012a platform under Windows 7, and recognition experiments were carried out on facial expression videos captured under normal, low and intense illumination. To evaluate the method of the present embodiment effectively, 3270 frames covering different skin colors and different illuminations were extracted from the facial expression video sequences for experimental analysis; the accurate recognition rate of the present embodiment is 92.86% and the false detection rate 7.14%.
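A sketch of the leave-one-out protocol behind these figures, assuming precomputed feature vectors and labels per sequence and the same assumed SVC parameters as in the earlier sketch:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def leave_one_out_rate(features, labels):
    """Average leave-one-out recognition rate over all sequences."""
    hits = 0
    for train_idx, test_idx in LeaveOneOut().split(features):
        clf = SVC(C=790, gamma=1.9).fit(features[train_idx], labels[train_idx])
        hits += int(clf.predict(features[test_idx])[0] == labels[test_idx][0])
    return hits / len(labels)
```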
To verify the advantage of the method of the present embodiment in expression recognition rate, it was compared with two facial expression recognition methods commonly used for expression recognition, one extracting the dynamic spatio-temporal texture features of expression sequences with the LBP-TOP algorithm and one with the HLBP-TOP algorithm; SVM classifiers were trained and comparative expression recognition experiments were carried out on Cohn-Kanade. Table 1 lists the recognition rates of the different algorithms on the Cohn-Kanade database. Training and test samples were selected by randomly choosing part of the video sequences of each class as training samples and using the remainder as test samples, ensuring that the training and test samples do not overlap, which better guarantees the generality and correctness of the experimental results.
Table 1. Recognition rates of facial expressions of different algorithms on the Cohn-Kanade database
The number of levels of the spatial pyramid partitioning scheme also affects the expression recognition rate; Table 2 lists the influence of different numbers of levels on the recognition rate. The best results are obtained when the level parameter of the spatial pyramid partitioning is 2, i.e. when three levels are used.
Table 2. Influence of different numbers of spatial pyramid levels on the average facial expression recognition rate on the Cohn-Kanade database
From the SFEW database, 940 facial expression video sequence images were chosen for testing; in the experiments the sequences were divided into anger, disgust, fear, happiness, sadness and surprise, comprising 214, 66, 116, 227, 198 and 119 images respectively. Table 3 lists the recognition rates of the different algorithms on the SFEW database.
Table 3. Recognition rates of facial expressions of different algorithms on the SFEW database
The results show that the recognition rate of the method of the present embodiment, which extracts the dynamic spatio-temporal texture features of facial expression sequences with the HCBP-TOP algorithm, is clearly better than that of the facial expression recognition methods that extract them with the LBP-TOP algorithm or the HLBP-TOP algorithm.
The spatial pyramid partitioning and the SVM classifier described in the present embodiment are well known, and the equipment involved is well known in the art and commercially available.

Claims (2)

1. A facial expression recognition method based on video sequences, characterized in that it is a facial expression recognition method that extracts the dynamic spatio-temporal texture features of facial expression sequences with the HCBP-TOP algorithm, the concrete steps being as follows:
First step, preprocessing of the facial expression sequence images:
(1) Cropping of the facial expression sequence images:
The facial expression sequence images read from an existing facial expression video sequence database are transformed from RGB space to gray space using formula (1):
Gray = 0.299R + 0.587G + 0.114B (1),
where Gray is the gray value, generally ranging from 0 to 255, R is the red component, G the green component and B the blue component.
According to the features of the human face and the "three sections, five eyes" geometric model of facial proportions, the facial expression sequence images transformed to gray space are cropped: let the horizontal distance between the two eyes be d and take the midpoint of the line connecting them as the reference point; the upper boundary is set 0.55d above the reference point, the lower boundary 1.45d below it, the left boundary 0.9d to the left and the right boundary 0.9d to the right, thus completing the cropping of the facial expression sequence images;
(2) Scaling of the facial expression sequence images:
The facial expression sequences cropped in (1) are rescaled with the bicubic interpolation algorithm to normalize the image size; after scaling, each facial image is 64 × 64 pixels;
(3) Gray-level balancing of the facial expression sequence images:
Histogram equalization is applied to the facial expression sequence images obtained in (2), which completes the preprocessing of the facial expression sequence images;
Second step, layering and partitioning the facial expression sequence images with the spatial pyramid partitioning scheme:
Spatial pyramid partitioning progressively subdivides the image in space. The scheme adopted here divides the facial expression sequence image by powers of 2 along the horizontal and vertical coordinate directions. Let the total number of pyramid levels be L+1, with level indices i = 0, 1, 2, ..., L. At level i the facial expression sequence image is divided into 2^i blocks in the horizontal direction and 2^i blocks in the vertical direction, i.e. into 2^i × 2^i blocks in total. Here L = 2 is set, so the spatial pyramid partitioning scheme divides the facial expression sequence images preprocessed in the first step into 2+1 = 3 levels: level 0 is the original facial expression sequence image, and level i divides the original image into 2^i × 2^i sub-blocks, i = 1, 2;
Third step, extracting the dynamic spatio-temporal texture features of the facial expression sequence images with the HCBP-TOP algorithm:
The HCBP-TOP algorithm extracts the dynamic spatio-temporal texture features of the layered blocks of the facial expression sequence images, the "layered blocks" being those obtained in the second step. After the spatial pyramid partitioning of the second step, the HCBP-TOP algorithm further takes, for each sub-block, the feature vectors of the three dimensions X axis, Y axis and time axis T and concatenates them into the overall feature vector of the block; the feature vectors of all blocks of the whole image are then integrated to form the feature data of one layer of the image; finally the dynamic spatio-temporal texture histograms of the facial expression sequence images of all pyramid levels are concatenated, according to assigned weights, into the dynamic spatio-temporal texture histogram of the whole facial expression sequence image. The concrete method is as follows:
(1) Extracting the HCBP features of the image sub-blocks:
Within each level, the HCBP-TOP algorithm extracts the HCBP features of the image sub-blocks at corresponding positions in each sequence, and the histogram of the dynamic spatio-temporal texture features of the facial expression sequence image is computed for each sub-block; the histograms of all sub-blocks are then concatenated into the histogram of each level of the whole facial expression sequence, which yields the deformation information of the expression sequence in the X-Y plane and its motion information in the X-T and Y-T planes. The extraction is computed by the following HCBP coding:
The eight HCBP encoding models M_1 to M_8 are given by formula (2):
$$M_1=\begin{bmatrix}1&1&1&0&0\\1&-1&-1&-1&0\\1&-1&0&0&0\\0&-1&0&0&0\\0&0&0&0&0\end{bmatrix},\quad M_2=\begin{bmatrix}1&1&1&1&1\\-1&-1&-1&-1&-1\\0&0&0&0&0\\0&0&0&0&0\\0&0&0&0&0\end{bmatrix},\quad M_3=\begin{bmatrix}0&0&1&1&1\\0&-1&-1&-1&1\\0&0&0&-1&1\\0&0&0&-1&0\\0&0&0&0&0\end{bmatrix},$$
$$M_4=\begin{bmatrix}0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\\0&0&0&-1&1\end{bmatrix},\quad M_5=\begin{bmatrix}0&0&0&0&0\\0&0&0&-1&0\\0&0&0&-1&1\\0&-1&-1&-1&1\\0&0&1&1&1\end{bmatrix},\quad M_6=\begin{bmatrix}0&0&0&0&0\\0&0&0&0&0\\0&0&0&0&0\\-1&-1&-1&-1&-1\\1&1&1&1&1\end{bmatrix},$$
$$M_7=\begin{bmatrix}0&0&0&0&0\\0&-1&0&0&0\\1&-1&0&0&0\\1&-1&-1&-1&0\\1&1&1&0&0\end{bmatrix},\quad M_8=\begin{bmatrix}1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\\1&-1&0&0&0\end{bmatrix}\qquad(2),$$
In the above formula, in each model the five outermost pixels are assigned weight 1, the five adjacent pixels of the second outer ring weight -1, and all other positions weight 0. The center point P_0 records the texture variation that stores the Haar-like type feature. In each of the X-Y, X-T and Y-T planes, a 5 × 5 window as in formula (3) is formed with P_0 at its center, surrounded by the 24 neighborhood points P_i (i = 1, 2, ..., 24):
$$W(x,y)=\begin{bmatrix}P_9&P_{10}&P_{11}&P_{12}&P_{13}\\P_{24}&P_1&P_2&P_3&P_{14}\\P_{23}&P_8&P_0&P_4&P_{15}\\P_{22}&P_7&P_6&P_5&P_{16}\\P_{21}&P_{20}&P_{19}&P_{18}&P_{17}\end{bmatrix}\qquad(3),$$
As can be seen from the above description, any pixel I(x, y, t) in an image sequence I has around it the small windows W_j(x, y), j = 0, 1, 2, where W_0(x, y) is the window of the pixel in the X-Y plane, W_1(x, y) its window in the X-T plane and W_2(x, y) its window in the Y-T plane. The HCBP value f_j(x, y, t) of W_j(x, y) is computed as:
$$
f_j(x,y,t)=\mathrm{HCBP}(I(x,y,t))=\sum_{k=1}^{8}B(a_{j,k})\times 2^{8-k}\tag{4}
$$
Wherein
$$
B(a_{j,k})=\begin{cases}1,&a_{j,k}\ge T_{j,k}\\0,&a_{j,k}<T_{j,k}\end{cases}\tag{5}
$$
$$
T_{j,k}=5\left(I(x,y,t)-\frac{C_k\cdot W_j(x,y)+I(x,y,t)}{11}\right)\tag{6}
$$
$$
a_{j,k}=M_k\cdot W_j(x,y)\tag{7}
$$
where M_k is the encoding model; C_k is the pixel-summation model formed by the non-zero weights of each encoding model, C_k = |M_k|; B(x) is the threshold comparison function of the HCBP value; T_{j,k} is the threshold; and a_{j,k} is the decimal number obtained by convolving the small window W_j(x, y) with the encoding model M_k;
For each image sub-block, the models M_1–M_8 are used to scan the pixels of the image region with a model window of 5×5 pixels, each model having its own threshold. The threshold is computed as follows: first the sum of the 11 pixels consisting of the 10 non-zero-weight pixels of the model and the center pixel is calculated, i.e. C_k·W_j(x, y) plus the center pixel; then the mean value of these 11 pixels is obtained; the threshold of each model is 5 times the difference between the center pixel and this mean value. Formula (4) is then used to compute the dynamic spatio-temporal texture features of the facial expression sequence: the sum of the 5 pixel values with weight 1 minus the sum of the 5 pixel values with weight -1, i.e. the change information between the outer and the inner ring, yields a decimal number a_{j,k}; a_{j,k} is compared with the threshold T_{j,k}, and the HCBP-TOP code bit is set to 1 if a_{j,k} is not smaller than the threshold, and to 0 otherwise;
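As a rough illustration of the coding just described, the following Python/NumPy sketch computes the HCBP code of a single 5×5 window per formulas (4)-(7). It is a minimal sketch under two assumptions not stated in the text: the function and variable names are invented for illustration, and M_3–M_8 are generated as successive 90° clockwise rotations of M_1 and M_2, which reproduces the matrices of formula (2) exactly.

```python
import numpy as np

# M1 (corner model) and M2 (edge model) copied from formula (2); rotating
# each clockwise by 0, 90, 180 and 270 degrees yields M1..M8 in order.
M1 = np.array([[1,  1,  1,  0, 0],
               [1, -1, -1, -1, 0],
               [1, -1,  0,  0, 0],
               [0, -1,  0,  0, 0],
               [0,  0,  0,  0, 0]])
M2 = np.array([[ 1,  1,  1,  1,  1],
               [-1, -1, -1, -1, -1],
               [ 0,  0,  0,  0,  0],
               [ 0,  0,  0,  0,  0],
               [ 0,  0,  0,  0,  0]])
MODELS = []
for r in range(4):                    # r clockwise quarter-turns
    MODELS.append(np.rot90(M1, -r))   # M1, M3, M5, M7
    MODELS.append(np.rot90(M2, -r))   # M2, M4, M6, M8  -> order M1..M8

def hcbp(window):
    """HCBP code of one 5x5 window W_j(x, y); window[2, 2] is the
    center pixel I(x, y, t)."""
    center = float(window[2, 2])
    code = 0
    for k, Mk in enumerate(MODELS, start=1):         # k = 1..8
        a = float(np.sum(Mk * window))               # a_{j,k}, formula (7)
        ring = float(np.sum(np.abs(Mk) * window))    # C_k . W_j(x, y)
        T = 5.0 * (center - (ring + center) / 11.0)  # threshold, formula (6)
        if a >= T:                                   # B(a_{j,k}), formula (5)
            code += 2 ** (8 - k)                     # bit weight, formula (4)
    return code
```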
(2) Extracting the dynamic spatio-temporal texture HCBP-TOP features of the facial expression sequence images:
The HCBP-TOP algorithm is applied to the hierarchical pyramid sub-blocks generated in the second step to extract the dynamic spatio-temporal texture features of the facial expression sequence images. Following the HCBP feature computation for image sub-blocks in (1) above, the dynamic spatio-temporal texture HCBP-TOP feature of the facial expression sequence image in each block is calculated: for each sub-block, the HCBP feature vectors in the XY, XT and YT directions of the three dimensions X, Y and T are combined and concatenated into the overall in-block feature vector HCBP-TOP; the feature vectors of all blocks of the whole image are then integrated to form the feature data of one layer of the image, and the feature histogram of that layer is obtained with the histogram function. Because each image layer, and the spatial information of the image blocks within each layer, plays a different role in image classification, the dynamic spatio-temporal texture feature histograms of the facial expression sequence images of the pyramid layers are concatenated into the histogram of the whole facial expression sequence image according to assigned weights. The weight allocation principle is that the histogram of the layer whose sub-blocks are of large scale receives a small weight, while the histogram of the layer whose sub-blocks are of small scale receives a large weight. The histogram weight of pyramid layer i is defined as i + 1: the original-image feature histogram of layer 0 is assigned weight 1, the layer-1 feature histogram is assigned weight 2, and the larger the layer number, the larger the weight assigned to that layer's feature histogram, i.e. the larger the proportion of that layer's feature information in the total feature representation. The feature histograms of all layers are then fused and concatenated, according to the assigned weights, into the dynamic spatio-temporal texture feature histogram of the whole image, where layer i contains 2^i × 2^i sub-blocks and the dynamic spatio-temporal texture feature value of each sub-block ranges from 0 to 255. The final dynamic spatio-temporal texture feature representation of the facial expression sequence image is obtained after normalization, and the extracted feature data are fed into the SVM to train the classifier.
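A minimal sketch of the weight-allocated histogram fusion just described; the function name is invented, and the exact normalization step is not specified in the text, so L2 normalization is assumed here:

```python
import numpy as np

def weighted_pyramid_histogram(layer_hists):
    """Fuse per-layer HCBP-TOP histograms into the whole-sequence feature.
    layer_hists[i] is the concatenated histogram of all 2**i x 2**i
    sub-blocks of pyramid layer i; layer i receives weight i + 1."""
    parts = [(i + 1) * np.asarray(h, dtype=float)
             for i, h in enumerate(layer_hists)]
    feature = np.concatenate(parts)
    return feature / (np.linalg.norm(feature) + 1e-12)  # assumed L2 norm
```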
The 4th step: an SVM classifier is adopted for the training and prediction of facial expressions:
The dynamic spatio-temporal texture feature data of the facial expression sequence images extracted in the 3rd step are fed into the SVM classifier for the training and prediction of facial expressions, i.e. to judge to which class of facial expression the extracted features actually belong. The leave-one-out method is adopted, and the average result of the experiments is taken as the expression recognition rate. The concrete operation flow is as follows:
(1) The dynamic spatio-temporal texture feature data of the facial expression sequence images extracted in the 3rd step above are fed into the SVM to train the classifier. From these feature samples, the training and testing classification sample matrices are constructed, corresponding respectively to the feature matrix of the training-sample facial expression sequence images and the feature matrix of the test-sample facial expression sequence images; the values in the training and testing classification sample matrices are the classification categories of the samples;
(2) A self-defined kernel function is adopted for the dynamic spatio-temporal texture features of the local facial expression sequence images, and cross-validation is used to select the optimal parameters c and g, giving the penalty factor c = 790 and the kernel parameter g = 1.9. First, the dynamic spatio-temporal texture feature matrices of the training-sample and test-sample facial expression sequence images are fed into the svmtrain function to obtain the support vectors; then the feature matrix of the test-sample facial expression sequence images and the above support vectors are fed into the svmpredict function for prediction, which completes the facial expression recognition. Experiments on the Cohn-Kanade database and the SFEW database recognize the 6 expressions anger, disgust, fear, happiness, sadness and surprise, thereby completing the recognition of facial expressions.
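The text names LIBSVM's svmtrain and svmpredict with a self-defined kernel; purely as an illustrative stand-in, the sketch below reproduces the leave-one-out evaluation with scikit-learn's SVC, an RBF kernel substituting for the self-defined kernel, and the parameter values quoted above:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

LABELS = ["anger", "disgust", "fear", "happy", "sad", "surprise"]

def loo_recognition_rate(X, y, c=790.0, g=1.9):
    """Leave-one-out expression recognition rate.
    X: one HCBP-TOP feature vector per expression sequence;
    y: integer class indices into LABELS."""
    clf = SVC(C=c, gamma=g, kernel="rbf")  # RBF stands in for the custom kernel
    scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
    return scores.mean()  # average of the per-fold 0/1 accuracies
```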
2. The facial expression recognition method based on video sequences according to claim 1, characterized in that: the HCBP-TOP algorithm is an algorithm combining CBP features with Haar-like features; the HCBP-TOP algorithm extracts the dynamic spatio-temporal texture features of the blocked facial expression sequence from sub-image sequences of the same frequency, where the HCBP-TOP histogram is defined as follows:
$$
H_{i,j}=\sum_{x,y,t}E\{f_j(x,y,t)=i\},\quad i=0,\ldots,255;\;j=0,1,2\tag{8}
$$
where f_j(x, y, t) denotes the HCBP value of pixel I(x, y, t) in the j-th plane (j = 0: XY; 1: XT; 2: YT), and the function E{f} is defined as follows:
$$
E\{f\}=\begin{cases}1,&\text{if }f=i\\0,&\text{else}\end{cases}\tag{9}
$$
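A small sketch of the histogram of formula (8), assuming the per-plane HCBP code maps f_0 (XY), f_1 (XT) and f_2 (YT) have already been computed for every pixel of the sequence; the function name is illustrative:

```python
import numpy as np

def hcbp_top_histogram(code_maps):
    """H_{i,j} of formula (8): for each plane j (0: XY, 1: XT, 2: YT),
    count how many pixels carry HCBP code i, i = 0..255."""
    return np.stack([np.bincount(np.ravel(c).astype(np.int64), minlength=256)
                     for c in code_maps])
```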
CN201510612526.XA 2015-09-23 2015-09-23 Facial expression recognizing method based on video sequence Active CN105139004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510612526.XA CN105139004B (en) 2015-09-23 2015-09-23 Facial expression recognizing method based on video sequence


Publications (2)

Publication Number Publication Date
CN105139004A true CN105139004A (en) 2015-12-09
CN105139004B CN105139004B (en) 2018-02-06

Family

ID=54724347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510612526.XA Active CN105139004B (en) 2015-09-23 2015-09-23 Facial expression recognizing method based on video sequence

Country Status (1)

Country Link
CN (1) CN105139004B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090226049A1 (en) * 2008-01-31 2009-09-10 University Of Southern California Practical Modeling and Acquisition of Layered Facial Reflectance
CN103488974A (en) * 2013-09-13 2014-01-01 南京华图信息技术有限公司 Facial expression recognition method and system based on simulated biological vision neural network
CN104298981A (en) * 2014-11-05 2015-01-21 河北工业大学 Face microexpression recognition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PHILIPP MICHEL et al.: "Real time facial expression recognition in video using support vector machines", International Conference on Multimodal Interfaces *
YU Ming et al.: "Facial expression recognition based on LGBP features and sparse representation", Computer Engineering and Design *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701495A (en) * 2016-01-05 2016-06-22 贵州大学 Image texture feature extraction method
CN105701459B (en) * 2016-01-06 2019-04-16 Oppo广东移动通信有限公司 A kind of image display method and terminal device
CN105701459A (en) * 2016-01-06 2016-06-22 广东欧珀移动通信有限公司 Picture display method and terminal device
CN107045618A (en) * 2016-02-05 2017-08-15 北京陌上花科技有限公司 A kind of facial expression recognizing method and device
CN107045618B (en) * 2016-02-05 2020-07-03 北京陌上花科技有限公司 Facial expression recognition method and device
CN106341724A (en) * 2016-08-29 2017-01-18 刘永娜 Expression image marking method and system
CN107294947A (en) * 2016-08-31 2017-10-24 张梅 Parking information public service platform based on Internet of Things
CN106446810A (en) * 2016-09-12 2017-02-22 合肥工业大学 Computer vision method used for mental state analysis
CN106980811A (en) * 2016-10-21 2017-07-25 商汤集团有限公司 Facial expression recognizing method and expression recognition device
CN106845483A (en) * 2017-02-10 2017-06-13 杭州当虹科技有限公司 A kind of video high definition printed words detection method
CN108537194A (en) * 2018-04-17 2018-09-14 谭红春 A kind of expression recognition method of the hepatolenticular degeneration patient based on deep learning and SVM
CN109145754A (en) * 2018-07-23 2019-01-04 上海电力学院 Merge the Emotion identification method of facial expression and limb action three-dimensional feature
CN109124604A (en) * 2018-09-20 2019-01-04 南方医科大学珠江医院 A kind of appraisal procedure of neonatal pain degree
CN109409296A (en) * 2018-10-30 2019-03-01 河北工业大学 The video feeling recognition methods that facial expression recognition and speech emotion recognition are merged
CN109409296B (en) * 2018-10-30 2020-12-01 河北工业大学 Video emotion recognition method integrating facial expression recognition and voice emotion recognition
CN110175526A (en) * 2019-04-28 2019-08-27 平安科技(深圳)有限公司 Dog Emotion identification model training method, device, computer equipment and storage medium
CN110321805A (en) * 2019-06-12 2019-10-11 华中科技大学 A kind of dynamic expression recognition methods based on sequential relationship reasoning
CN110321805B (en) * 2019-06-12 2021-08-10 华中科技大学 Dynamic expression recognition method based on time sequence relation reasoning
CN110427848A (en) * 2019-07-23 2019-11-08 京东方科技集团股份有限公司 A kind of psychoanalysis system
CN111931677A (en) * 2020-08-19 2020-11-13 北京影谱科技股份有限公司 Face detection method and device and face expression detection method and device
CN112580648A (en) * 2020-12-14 2021-03-30 成都中科大旗软件股份有限公司 Method for realizing image information identification based on image segmentation technology
CN114926886A (en) * 2022-05-30 2022-08-19 山东大学 Micro expression action unit identification method and system

Also Published As

Publication number Publication date
CN105139004B (en) 2018-02-06

Similar Documents

Publication Publication Date Title
CN105139004A (en) Face expression identification method based on video sequences
CN105956560B Vehicle type recognition method based on multi-scale pooled deep convolution features
CN106023220B Vehicle appearance component image segmentation method based on deep learning
CN107564025B Electric power equipment infrared image semantic segmentation method based on deep neural networks
CN110532900B Facial expression recognition method based on U-Net and LS-CNN
Li et al. Unsupervised learning of edges
CN103942577B Person identification method in video surveillance based on automatically built sample database and composite features
CN104182772B Gesture recognition method based on deep learning
CN106599854B Automatic facial expression recognition method based on multi-feature fusion
CN108171112A Vehicle recognition and tracking method based on convolutional neural networks
CN105005765A Facial expression recognition method based on Gabor wavelets and gray-level co-occurrence matrix
CN105825502B Saliency-guided dictionary learning method for weakly supervised image parsing
CN106682569A Fast traffic sign recognition method based on convolutional neural networks
CN105139039A Method for recognizing facial micro-expressions in video sequences
CN106127196A Facial expression classification and recognition method based on dynamic texture features
CN104281853A Behavior recognition method based on 3D convolutional neural networks
CN106909938B View-independent behavior recognition method based on deep learning networks
CN110827260B Cloth defect classification method based on LBP features and convolutional neural networks
CN105046197A Clustering-based multi-template pedestrian detection method
CN105205449A Sign language recognition method based on deep learning
CN109753950A Dynamic facial expression recognition method
CN104598885A Method for detecting and locating text signs in street view images
CN104298974A Human behavior recognition method based on depth video sequences
CN110503613A Single-image rain removal method based on cascaded dilated convolutional neural networks
CN109034066A Building recognition method based on multi-feature fusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant