CN104376333A - Facial expression recognition method based on random forests - Google Patents
- Publication number: CN104376333A (application CN201410503590.XA)
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06V40/174 — Facial expression recognition (G06V40/176 — Dynamic expression)
- G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
Abstract
The invention discloses a facial expression recognition method based on random forests, comprising four steps: extraction of the AAM displacement feature, extraction of the AUs in a facial expression sequence, training of a facial expression classification model, and facial expression recognition. The method proposes a novel AAM displacement feature for training and learning AUs, and finally performs facial expression recognition from the AUs. Compared with other feature representations used in the same kind of recognition, it better describes the expression information and the change-process information contained in an expression sequence. Random forests are applied to facial expression recognition for the first time, and in this field they achieve a better classification and recognition effect than the currently common support vector machine (SVM) method. For AU recognition on the CK+ database, the method can achieve a perfect recognition effect.
Description
Technical Field
The invention relates to a facial expression recognition method, in particular to a facial expression recognition method based on a random forest.
Background
As important non-verbal information, facial expressions convey very rich information and effectively supplement verbal communication. Often a person's joy, anger, sadness and happiness are reflected directly in the face. If a computer can automatically recognize facial expressions, it can interact with people better and provide more humanized services. Facial expression recognition can be widely applied in fields such as medical treatment, human-computer interaction (HCI), video games and data-driven animation. It has attracted growing research attention over the past twenty years, and a variety of advanced algorithms have been proposed [1]. Facial expressions have the following characteristics: (1) They are subtle. On the one hand, certain expressions are very similar to one another; on the other hand, faces contain abundant micro-expressions that are hard to perceive and capture. (2) They vanish in an instant. (3) They are complicated and changeable. Seen organ by organ, expression is mainly reflected in the eyebrows, eyes, nose, mouth and so on; each organ has multiple states (eyes open, eyes closed, squinting, etc.), and different states of different organs combine into different expressions. (4) They are easily influenced by factors such as pose and illumination. Expressions are closely related to facial pose, and the same expression can look significantly different when viewed from the front, the side or other angles.
In addition, the influence of lighting on facial expressions is very pronounced: strong lighting can wash the whole face image out to white, while dim lighting blurs the face in the image and makes expressions harder to distinguish. These are common problems and challenges throughout biometric identification. For this reason, most facial expression recognition research remains at the stage of experimental exploration, and the widely adopted facial expression databases are built under controlled conditions, particularly with controlled illumination and viewing angle. Facial expression recognition therefore remains a significant and challenging problem.
A patent of Beijing Zhongxing Microelectronics Co., Ltd., "Facial expression recognition method and system, and training method and system of expression classifier", was filed with and approved by the State Intellectual Property Office of China on December 21, 2009 and published on November 3, 2010 with publication number CN 101877056A. That method adopts a Hidden Markov Model (HMM), and its overall performance is poor. Second, its training uses the neutral and weak expressions in the facial expression sequence; a weak expression carries less information than an exaggerated one, so part of the information is lost. In addition, both fused binary images and gray-level images are used during feature extraction, making the calculation process complex.
A patent of Beijing University of Aeronautics and Astronautics, "Facial expression recognition method based on feature point vectors and texture deformation energy parameters", was filed with and approved by the State Intellectual Property Office of China on October 17, 2012 and published on February 27, 2013 with publication number CN 102945361A. It uses the AAM tool in OPENCV to locate feature points on the neutral and terminal (exaggerated) expressions of a facial expression sequence, selects 26 feature points to form feature-point vectors, and splits these into Euclidean distances between feature points and angles of their connecting lines. Feature blocks are then built around the feature points, a texture-deformation-energy coefficient matrix is calculated, and PCA yields the final texture-deformation-energy parameter. The three are combined as training data for an RBF neural network, which finally performs facial expression recognition. Although the AAM feature covers both shape and texture, it is not directly related to facial expression, and using AAM directly as the feature for expression recognition gives unsatisfactory results. Moreover, redundancy is removed from all three constituent features during extraction, which raises several problems: on the one hand, removing redundancy inevitably removes some useful information; on the other hand, redundant information is sometimes necessary in experiments, and a proper amount of redundancy can benefit the result, as the calculation of the HOG feature illustrates well.
A patent of Northwestern Polytechnical University, "Facial expression recognition method based on combined features of facial action units", was filed with and approved by the State Intellectual Property Office of China on December 21, 2012 and published on April 24, 2013 with publication number CN 103065122A. The face database adopted in that patent is self-constructed, so its rationality and effectiveness are questionable. Second, the algorithm uses single images rather than image sequences; an expression is in fact a continuous process of change, and a single image contains insufficient expression information. In addition, the AU labels used for training were produced manually (by 10 researchers); reliable AU labeling requires experts in the field, so errors are inevitable. Although the patent states that it defines the finally adopted threshold, the threshold's reliability is poor. Moreover, LBP has no direct link to AUs, so learning AU combinations from LBP is of debatable soundness.
Facial expression recognition can be applied in many fields, such as the automatic smile capture built into cameras, natural interaction between intelligent robots and people, and capturing patients' expressions of pain during medical treatment so that they can be rescued in time. Although facial expression recognition has been studied for a long time and many solutions have been proposed, practical results are not yet ideal, and even at the experimental research stage there is still room to improve recognition performance.
Disclosure of Invention
In view of the above disadvantages, the present invention aims to provide a new facial expression recognition algorithm that recognizes the 7 basic facial expressions on the CK+ database and obtains a higher recognition rate than previous algorithms.
The invention studies the facial expressions of image sequences. The core methods used are the Random Forest and the Active Appearance Model (AAM). A random forest is composed of a set of decision trees, while the AAM is mainly used to describe shape and texture features.
The method extracts the AAM feature points of the first frame (neutral expression) of an expression sequence (running from neutral expression to exaggerated expression) and tracks them with the LK algorithm to obtain the AAM feature points of the last frame (exaggerated expression). The displacements of the AAM feature points are taken as feature vectors and fed into a random forest for training, which identifies the AUs contained in the expression. The AUs are then passed to a second random forest for further training to obtain an expression classification model, which is finally used for facial expression recognition.
The specific technical scheme of the invention is as follows:
step 1 AAM Displacement feature extraction
AAM builds on the Active Shape Model (ASM) and was originally proposed by Edwards, Cootes and Taylor. An AAM includes two parts, a shape model and a texture model; the present invention uses the shape model. The shape model is defined by the coordinates of n feature points:
s=(x1,y1,x2,y2,…,xn,yn)T
It can equivalently be written as a linear model:

s = s̄ + Σ_{i=1}^{m} p_i s_i

where p_i are the shape parameters and the basis shapes s_i are obtained by Principal Component Analysis (PCA) dimensionality reduction of the training data (s̄ denotes the mean shape). However, for calculation and tracking the invention does not use this second expression but the first, simpler one.
Step 1.1, selecting a first frame (neutral expression) of a facial expression image sequence, extracting AAM feature points of the first frame, and setting as:
s0=[x10,y10,x20,y20,...,xn0,yn0]
step 1.2, tracking the AAM characteristic points from the first frame to the last frame by using an LK tracking algorithm. The LK optical flow tracking algorithm employed by the present invention is an improved version of the classic LK algorithm proposed by j
Wherein,is the initial estimate at pyramid L level, which can be calculated from L-1 level:
gL=2(gL-1+dL-1)
the final optical flow can be expressed in the form:
wherein L ismIs the deepest level of the pyramid. Thus, dLThe solution can be solved as follows:
η_k^L = (G^L)^{−1} b_k^L

where

G^L = Σ_{x∈N(u)} ∇I^L(x) (∇I^L(x))^T,   b_k^L = Σ_{x∈N(u)} ∇I^L(x) δI_k^L(x)

and in both formulae

δI_k^L(x) = I^L(x) − J^L(x + g^L + v^L)

where v^L is the optical flow vector at level L (updated at iteration k as v_k^L = v_{k−1}^L + η_k^L) and K is the maximum number of iterations per level in the LK algorithm.
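The per-level iteration η_k^L = (G^L)^{−1} b_k^L above can be sketched in numpy for a single pyramid level. This is an illustrative single-window version with an integer-shift warp (a real tracker would use bilinear interpolation and the full pyramid); the synthetic images and window choice are made up for the sketch.

```python
import numpy as np

def lk_translation(I, J, n_iters=5):
    """Single-level Lucas-Kanade over one window: iterate eta_k = G^{-1} b_k.

    G   = sum over the window of grad(I) grad(I)^T
    b_k = sum over the window of grad(I) * deltaI_k, deltaI_k = I - (J warped by v).
    Sketch only: the warp uses integer shifts (np.roll)."""
    I = I.astype(float)
    gy, gx = np.gradient(I)                      # template image gradients
    G = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    v = np.zeros(2)                              # flow estimate (vx, vy)
    for _ in range(n_iters):
        rx, ry = int(round(v[0])), int(round(v[1]))
        Jw = np.roll(J, shift=(-ry, -rx), axis=(0, 1))   # J warped by current v
        dI = I - Jw                                       # deltaI_k
        b = np.array([np.sum(gx * dI), np.sum(gy * dI)])
        v += np.linalg.solve(G, b)               # eta_k = G^{-1} b_k
    return v

# Synthetic check: J is I shifted right by one pixel, so the true flow is (1, 0).
y, x = np.mgrid[0:64, 0:64]
I = np.sin(2 * np.pi * x / 16) + np.sin(2 * np.pi * y / 16)
J = np.roll(I, 1, axis=1)
v = lk_translation(I, J)
```

On the smooth periodic test pair the estimate converges close to the true one-pixel shift.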
The LK optical flow tracker thus yields the AAM feature points of the last frame (the exaggerated expression), denoted:
sn=[x1n,y1n,x2n,y2n,...,xnn,ynn]
step 1.3, according to the results obtained in step 1.1 and step 1.2, the displacement of the AAM characteristic point is calculated:
st=[x1n-x10,y1n-y10,x2n-x20,y2n-y20,...,xnn-xn0,ynn-yn0];
the obtained AAM displacement characteristic is obtained.
Step 2, AU extraction in facial expression sequence
Action Units (AUs) come from the Facial Action Coding System (FACS), which was proposed by Paul Ekman and Wallace V. Friesen.
Step 2.1: the AAM displacement features obtained in step 1 are taken as the input of the random forest.
The input AAM displacement feature data are first randomized into K subsets; each subset is passed into its corresponding decision tree to obtain K results, which are then classified according to the random forest definition below to obtain the final classification result.
The definition of random forest is as follows:
H(x) = argmax_Y Σ_{i=1}^{K} I(h_i(x) = Y)

where H(x) is the final fused classifier, K is the number of decision trees, h_i(x) represents a single decision tree, Y is a class label, and I(h_i(x) = Y) indicates that tree i assigns x to class Y.
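The majority-vote rule H(x) = argmax_Y Σ I(h_i(x) = Y) can be sketched directly; the threshold rules standing in for trained decision trees below are hypothetical, not the patent's learned trees:

```python
from collections import Counter

def forest_predict(trees, x):
    """H(x) = argmax_Y sum_i I(h_i(x) = Y): majority vote over K decision trees.
    Each tree is any callable mapping a sample x to a class label."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Toy "forest" of three hypothetical threshold rules on a displacement vector:
trees = [
    lambda x: "AU12" if sum(x) > 5 else "AU15",
    lambda x: "AU12" if x[0] > 1 else "AU15",
    lambda x: "AU15",
]
label = forest_predict(trees, [3, 4])   # two of the three trees vote AU12
```

The fused decision follows the majority even though one tree disagrees, which is exactly what the definition above expresses.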
Step 2.2: the displacement of the AAM feature points reflects changes of facial expression, such as eye closing, eye widening and mouth opening, which is exactly the information AUs represent; the two are consistent. It is therefore reasonable to feed AAM displacement features into a random forest for training to obtain the face's AU information in the sequence, and the final experimental results verify the feasibility. Step 2.1 thus finally yields the AUs contained in the facial expression sequence.
Step 3, training a facial expression classification model
Facial expressions are primarily combinations of facial muscle-group movements, and an AU is the movement of one facial muscle or muscle group; a combination of AUs therefore represents a corresponding facial expression. The expression of a sequence can be obtained by analyzing the AU combination it contains.
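As a purely illustrative example of "AU combination represents expression": the combinations below follow common FACS conventions and are not taken from the patent, which learns this mapping with a second random forest rather than a lookup table.

```python
# Hypothetical illustration: expressions as characteristic AU combinations.
EXAMPLE_COMBOS = {
    frozenset({6, 12}): "happiness",        # cheek raiser + lip-corner puller
    frozenset({1, 4, 15}): "sadness",       # brow raise/lower + lip-corner depressor
    frozenset({1, 2, 5, 27}): "surprise",   # brow raisers + upper-lid raiser + mouth stretch
}

def lookup_expression(aus):
    """Map a set of AUs to an expression, 'unknown' if the combination is unlisted."""
    return EXAMPLE_COMBOS.get(frozenset(aus), "unknown")
```

A learned classifier replaces this table in the actual method precisely because real AU combinations are noisy and incomplete.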
The AU information of the facial expression sequences obtained in step 2 is used as training data and fed into the random forest again for training, yielding the facial expression classification model.
Step 4 facial expression recognition
The process of facial expression recognition is similar to the process of training the classification model. The facial expression to be classified is processed as in step 1 to obtain its AAM displacement feature vector, then as in step 2 to obtain the AU information contained in the sequence; finally the AU information is fed into the random forest expression classification model obtained in step 3 to produce the final facial expression recognition result.
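The four steps can be sketched as composed functions. Every callable below is a hypothetical stand-in for a component defined elsewhere (AAM extractor, LK tracker, the two trained forests); only the data flow mirrors the patent.

```python
def recognize_expression(sequence, extract_landmarks, track_to_last, au_forest, expr_forest):
    """End-to-end flow of steps 1-4 with stand-in components."""
    s0 = extract_landmarks(sequence[0])          # step 1.1: AAM points, first frame
    sn = track_to_last(sequence, s0)             # step 1.2: LK tracking to last frame
    st = [b - a for a, b in zip(s0, sn)]         # step 1.3: displacement feature
    aus = au_forest(st)                          # step 2: AUs from displacement
    return expr_forest(aus)                      # steps 3-4: expression from AUs

# Toy stubs standing in for each stage (two frames of flattened landmarks):
seq = [[0.0, 0.0, 1.0, 1.0], [0.5, -0.2, 1.0, 2.0]]
result = recognize_expression(
    seq,
    extract_landmarks=lambda frame: frame,
    track_to_last=lambda s, s0: s[-1],
    au_forest=lambda st: {6, 12} if st[3] > 0.5 else set(),
    expr_forest=lambda aus: "happiness" if aus == {6, 12} else "other",
)
```

Swapping each stub for the real component recovers the full recognition pipeline without changing this control flow.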
The invention mainly targets the 7 basic facial expressions: anger, contempt, disgust, fear, happiness, sadness and surprise.
The beneficial effects of the invention are a new facial expression recognition algorithm, specifically: (1) A new AAM displacement feature is proposed for training and learning AUs, with facial expression recognition ultimately relying on the AUs. It describes the expression information and change-process information contained in an expression sequence better than other feature representations used in the same kind of recognition. (2) The random forest is used for facial expression recognition for the first time, and compared with the commonly used support vector machine (SVM) method it achieves a better classification and recognition effect in this field. (3) For AU recognition on the CK+ database, the automatic recognition method of the invention can achieve a perfect recognition effect.
The method adopts the CK+ database, provided by domain experts such as Patrick Lucey, which is currently the internationally standard database for facial expression recognition research; it also supplies AU and expression information for each image sequence. This avoids the defect of prior art III (publication number CN 103065122A), while the method likewise avoids the defects of prior art I (publication number CN 101877056A) and II (publication number CN 102945361A).
Drawings
Fig. 1 is a flowchart of a facial expression recognition method according to the present invention.
Fig. 2 is a diagram of the AAM displacement feature extraction process.
FIG. 3 is a diagram of a random forest training and recognition process.
Detailed Description
A facial expression recognition method based on random forests comprises the following steps:
step 1 AAM Displacement feature extraction
AAM builds on the Active Shape Model (ASM) and was originally proposed by Edwards, Cootes and Taylor. An AAM includes two parts, a shape model and a texture model; the present invention uses the shape model. The shape model is defined by the coordinates of n feature points:
s=(x1,y1,x2,y2,…,xn,yn)T
It can equivalently be written as a linear model:

s = s̄ + Σ_{i=1}^{m} p_i s_i

where p_i are the shape parameters and the basis shapes s_i are obtained by Principal Component Analysis (PCA) dimensionality reduction of the training data (s̄ denotes the mean shape). However, for calculation and tracking the invention does not use this second expression but the first, simpler one. The detailed process is shown in fig. 2.
Step 1.1 selects the first frame (for example, neutral expression) of the facial expression image sequence, extracts the AAM feature points, and sets as:
s0=[x10,y10,x20,y20,...,xn0,yn0]
step 1.2 tracking the AAM feature points from the first frame (neutral expression) to the last frame (exaggerated expression) using the LK tracking algorithm. The LK optical flow tracking algorithm employed by the present invention is an improved version of the classic LK algorithm proposed by j
Wherein,is the initial estimate at pyramid L level, which can be calculated from L-1 level:
gL=2(gL-1+dL-1)
the final optical flow can be expressed in the form:
wherein L ismIs the deepest level of the pyramid. Thus, dLThe solution can be solved as follows:
η_k^L = (G^L)^{−1} b_k^L

where

G^L = Σ_{x∈N(u)} ∇I^L(x) (∇I^L(x))^T,   b_k^L = Σ_{x∈N(u)} ∇I^L(x) δI_k^L(x)

and in both formulae

δI_k^L(x) = I^L(x) − J^L(x + g^L + v^L)

where v^L is the optical flow vector at level L (updated at iteration k as v_k^L = v_{k−1}^L + η_k^L) and K is the maximum number of iterations per level in the LK algorithm.
The LK optical flow tracker thus yields the AAM feature points of the last frame (the exaggerated expression), denoted:
sn=[x1n,y1n,x2n,y2n,...,xnn,ynn]
step 1.3, according to the results obtained in step 1.1 and step 1.2, the displacement of the AAM characteristic point is calculated:
st=[x1n-x10,y1n-y10,x2n-x20,y2n-y20,...,xnn-xn0,ynn-yn0]。
This is the calculated AAM displacement feature, as shown in block 1 of fig. 1.
Step 2, AU extraction in facial expression sequence
Action Units (AUs) come from the Facial Action Coding System (FACS), which was proposed by Paul Ekman and Wallace V. Friesen. The core idea of an AU is to describe the movement of one facial muscle or muscle group, such as the raising of the eyebrows in AU1. Since some AUs occur only very rarely, the invention retains only the 17 AUs with a high probability of occurrence: {1,2,4,5,6,7,9,12,14,15,17,20,23,24,25,27,38}. The AU extraction process is shown in block 2 of fig. 1.
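One plausible input encoding for the second random forest is a binary vector over the 17 retained AUs. The patent does not fix this encoding, so the sketch below is an assumption for illustration:

```python
# The 17 AUs retained by the patent (high-occurrence subset of FACS).
AU_SET = [1, 2, 4, 5, 6, 7, 9, 12, 14, 15, 17, 20, 23, 24, 25, 27, 38]

def encode_aus(active):
    """Binary indicator vector over the 17 retained AUs -- a hypothetical
    input representation for the expression-classification forest."""
    active = set(active)
    return [1 if au in active else 0 for au in AU_SET]

vec = encode_aus([6, 12])   # ones at the positions of AU6 and AU12
```

A fixed-length vector like this lets sequences with different AU counts share one feature space.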
Step 2.1: the AAM displacement features obtained in step 1 are taken as the input of the random forest, shown in fig. 3.
as shown in fig. 3, the input AAM displacement feature data (i.e., D in the figure) is first randomized to obtain K pieces of data, and then the K pieces of data are transmitted to corresponding decision trees to obtain K results, and then classified according to the following random forest definitions to obtain the final classification result.
The definition of random forest is as follows:
H(x) = argmax_Y Σ_{i=1}^{K} I(h_i(x) = Y)

where H(x) is the final fused classifier, K is the number of decision trees, h_i(x) represents a single decision tree, Y is a class label, and I(h_i(x) = Y) indicates that tree i assigns x to class Y.
Step 2.2: the displacement of the AAM feature points reflects changes of facial expression, such as eye closing, eye widening and mouth opening, which is exactly the information AUs represent; the two are consistent. It is therefore reasonable to feed AAM displacement features into a random forest for training to obtain the face's AU information in the sequence, and the final experimental results verify the feasibility. Step 2.1 thus finally yields the AUs contained in the facial expression sequence.
Step 3, training a facial expression classification model
Facial expressions are primarily combinations of facial muscle-group movements, and an AU is the movement of one facial muscle or muscle group; a combination of AUs therefore represents a corresponding facial expression. The expression of a sequence can be obtained by analyzing the AU combination it contains.
As shown in block 3 of fig. 1, the AU information of the facial expression sequences obtained in step 2 is used as training data and fed into the random forest again for training, yielding the facial expression classification model.
Step 4 facial expression recognition
The process of facial expression recognition is similar to the process of training the classification model. The facial expression to be classified is processed as in step 1 to obtain its AAM displacement feature vector, then as in step 2 to obtain the AU information contained in the sequence; finally the AU information is fed into the random forest expression classification model obtained in step 3 to produce the final facial expression recognition result.
The invention mainly targets the 7 basic facial expressions: anger, contempt, disgust, fear, happiness, sadness and surprise.
Claims (5)
1. A facial expression recognition method based on random forests comprises the following steps:
step 1, extracting AAM displacement characteristics;
step 2, extracting AU in the facial expression sequence;
step 3, training a facial expression classification model;
and 4, recognizing the facial expression.
2. The random forest based facial expression recognition method of claim 1, wherein: the method for extracting the AAM displacement characteristics in the step 1 comprises the following steps:
using a shape model, the shape model is defined as the coordinates of n feature points:
s=(x1,y1,x2,y2,…,xn,yn)T
it can equivalently be written as a linear model:

s = s̄ + Σ_{i=1}^{m} p_i s_i

where p_i are the shape parameters and the basis shapes s_i are obtained by Principal Component Analysis (PCA) dimensionality reduction of the training data;
step 1.1, selecting a first frame of a facial expression image sequence, extracting AAM feature points of the first frame, and setting as:
s0=[x10,y10,x20,y20,...,xn0,yn0]
step 1.2, tracking the AAM characteristic points from the first frame to the last frame by using an LK tracking algorithm; finding an image rate that minimizes the value of the error matching function
Wherein,is the initial estimate at pyramid L level, which can be calculated from L-1 level:
gL=2(gL-1+dL-1)
the final optical flow can be expressed in the form:
wherein L ismIs the deepest level of the pyramid, and thus, dLThe solution can be solved as follows:
η_k^L = (G^L)^{−1} b_k^L

where

G^L = Σ_{x∈N(u)} ∇I^L(x) (∇I^L(x))^T,   b_k^L = Σ_{x∈N(u)} ∇I^L(x) δI_k^L(x)

and in both formulae

δI_k^L(x) = I^L(x) − J^L(x + g^L + v^L)

where v^L is the optical flow vector at level L and K is the maximum number of iterations per level in the LK algorithm;
obtaining the AAM feature points of the last frame through the LK optical-flow tracking algorithm, and setting them as:
s_n = [x_1n, y_1n, x_2n, y_2n, ..., x_nn, y_nn]
step 1.3, according to the results obtained in step 1.1 and step 1.2, calculating the displacement of the AAM feature points:
s_t = [x_1n − x_10, y_1n − y_10, x_2n − x_20, y_2n − y_20, ..., x_nn − x_n0, y_nn − y_n0].
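Step 1.3 is an elementwise difference between the last-frame and first-frame feature vectors. A minimal sketch (the helper name is ours):

```python
def displacement_features(s0, sn):
    """s_t = s_n - s_0: per-coordinate displacement of the AAM
    feature points between the first and last frame."""
    if len(s0) != len(sn):
        raise ValueError("feature vectors must have equal length")
    return [cn - c0 for c0, cn in zip(s0, sn)]

# Example: two feature points, x and y coordinates interleaved
# as in the patent's notation.
s_first = [10.0, 20.0, 30.0, 40.0]
s_last = [12.0, 19.0, 33.0, 44.0]
s_t = displacement_features(s_first, s_last)  # [2.0, -1.0, 3.0, 4.0]
```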
3. The random forest based facial expression recognition method of claim 1, wherein: the method for extracting the AUs from the facial expression sequence in step 2 comprises the following steps:
step 2.1, using the AAM displacement features obtained in step 1 as input to the random forest: the input AAM displacement feature data are first randomized to obtain K sub-datasets, the K sub-datasets are then passed into the corresponding decision trees to obtain K results, and classification is performed according to the following random forest definition to obtain the final classification result;
the random forest is defined as follows:
H(x) = arg max_Y Σ_{i=1}^{K} I(h_i(x) = Y)
wherein H(x) is the final fused classifier, K is the number of decision trees, h_i(x) denotes a single decision tree, Y is a class label, and I(h_i(x) = Y) is the indicator function that equals 1 when h_i(x) assigns x to class Y and 0 otherwise.
step 2.2, finally obtaining the AUs corresponding to the facial expression sequence.
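The majority-vote fusion defined above can be sketched in a few lines. This is our illustration of the standard argmax-over-votes rule, not code from the patent; the AU labels in the example are placeholders.

```python
from collections import Counter

def forest_vote(tree_predictions):
    """H(x) = argmax_Y sum_i I(h_i(x) = Y): majority vote over the
    K decision-tree outputs, per the random forest definition."""
    counts = Counter(tree_predictions)
    # most_common(1) picks the class with the highest vote count;
    # ties are broken by first occurrence in the vote list.
    return counts.most_common(1)[0][0]

# Example: K = 5 trees voting on an AU label for one input sequence.
votes = ["AU12", "AU12", "AU6", "AU12", "AU6"]
predicted_au = forest_vote(votes)  # "AU12"
```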
4. The random forest based facial expression recognition method of claim 1, wherein: the method for training the facial expression classification model in step 3 comprises the following steps:
taking the AU information of the facial expression sequences obtained in step 2 as training data, and inputting it into the random forest again for training to obtain the facial expression classification model.
5. The random forest based facial expression recognition method of claim 1, wherein: the method for recognizing the facial expression in step 4 comprises the following steps:
processing the facial expression to be classified according to step 1 to obtain its AAM displacement feature vector, processing that vector according to step 2 to obtain the AU information contained in the sequence, and finally inputting the AU information into the random forest facial expression classification model obtained in step 3 to obtain the final facial expression recognition result.
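The recognition pipeline of this claim chains the three earlier stages. The sketch below shows only the data flow; both "forests" are trivial hypothetical stand-ins for the trained models the patent describes, and the AU-to-expression rule (AU6 + AU12 as the smile pattern) is just an illustrative example.

```python
def displacement_features(s0, sn):
    # Step 1: AAM displacement feature vector (last frame - first frame).
    return [cn - c0 for c0, cn in zip(s0, sn)]

def au_forest(features):
    # Stand-in for the first random forest (step 2): maps the AAM
    # displacement vector to a set of action units.
    return ["AU6", "AU12"] if sum(features) > 0 else ["AU4"]

def expression_forest(aus):
    # Stand-in for the second random forest (step 3 model): maps AUs
    # to an expression label.
    return "happy" if "AU12" in aus else "neutral"

def recognize(s_first, s_last):
    feats = displacement_features(s_first, s_last)  # step 1
    aus = au_forest(feats)                          # step 2
    return expression_forest(aus)                   # step 4, via step-3 model

label = recognize([10.0, 20.0], [14.0, 21.0])  # "happy"
```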
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410503590.XA CN104376333A (en) | 2014-09-25 | 2014-09-25 | Facial expression recognition method based on random forests |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104376333A true CN104376333A (en) | 2015-02-25 |
Family
ID=52555229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410503590.XA Pending CN104376333A (en) | 2014-09-25 | 2014-09-25 | Facial expression recognition method based on random forests |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104376333A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831447A (en) * | 2012-08-30 | 2012-12-19 | 北京理工大学 | Method for identifying multi-class facial expressions at high precision |
Non-Patent Citations (1)
Title |
---|
CHEN Xiong: "Random forest expression recognition based on sequence features", China Master's Theses Full-text Database, Information Science and Technology series * |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105528611A (en) * | 2015-06-24 | 2016-04-27 | 广州三瑞医疗器械有限公司 | Ache identification classifier training method and device |
CN106169072B (en) * | 2016-07-07 | 2019-03-19 | 中国科学院上海微系统与信息技术研究所 | A kind of face identification method and system based on Taylor expansion |
CN106169072A (en) * | 2016-07-07 | 2016-11-30 | 中国科学院上海微系统与信息技术研究所 | A kind of face identification method based on Taylor expansion and system |
CN106384083A (en) * | 2016-08-31 | 2017-02-08 | 上海交通大学 | Automatic face expression identification and information recommendation method |
CN107066951A (en) * | 2017-03-15 | 2017-08-18 | 中国地质大学(武汉) | A kind of recognition methods of spontaneous expression of face and system |
CN107066951B (en) * | 2017-03-15 | 2020-01-14 | 中国地质大学(武汉) | Face spontaneous expression recognition method and system |
CN107633207A (en) * | 2017-08-17 | 2018-01-26 | 平安科技(深圳)有限公司 | AU characteristic recognition methods, device and storage medium |
CN107633207B (en) * | 2017-08-17 | 2018-10-12 | 平安科技(深圳)有限公司 | AU characteristic recognition methods, device and storage medium |
CN107704810A (en) * | 2017-09-14 | 2018-02-16 | 南京理工大学 | A kind of expression recognition method suitable for medical treatment and nursing |
CN109784123A (en) * | 2017-11-10 | 2019-05-21 | 浙江思考者科技有限公司 | The analysis and judgment method of real's expression shape change |
CN107862292B (en) * | 2017-11-15 | 2019-04-12 | 平安科技(深圳)有限公司 | Personage's mood analysis method, device and storage medium |
WO2019095571A1 (en) * | 2017-11-15 | 2019-05-23 | 平安科技(深圳)有限公司 | Human-figure emotion analysis method, apparatus, and storage medium |
CN107862292A (en) * | 2017-11-15 | 2018-03-30 | 平安科技(深圳)有限公司 | Personage's mood analysis method, device and storage medium |
CN107895154B (en) * | 2017-11-28 | 2020-08-25 | 中国地质大学(武汉) | Method and system for forming facial expression intensity calculation model |
CN107895154A (en) * | 2017-11-28 | 2018-04-10 | 中国地质大学(武汉) | The forming method and system of facial expression strength model |
CN109063679A (en) * | 2018-08-24 | 2018-12-21 | 广州多益网络股份有限公司 | A kind of human face expression detection method, device, equipment, system and medium |
CN109409273A (en) * | 2018-10-17 | 2019-03-01 | 中联云动力(北京)科技有限公司 | A kind of motion state detection appraisal procedure and system based on machine vision |
CN109753922A (en) * | 2018-12-29 | 2019-05-14 | 北京建筑大学 | Anthropomorphic robot expression recognition method based on dense convolutional neural networks |
CN109711378A (en) * | 2019-01-02 | 2019-05-03 | 河北工业大学 | Human face expression automatic identifying method |
CN109711378B (en) * | 2019-01-02 | 2020-12-22 | 河北工业大学 | Automatic facial expression recognition method |
CN109961054A (en) * | 2019-03-29 | 2019-07-02 | 山东大学 | It is a kind of based on area-of-interest characteristic point movement anxiety, depression, angry facial expression recognition methods |
CN110175578A (en) * | 2019-05-29 | 2019-08-27 | 厦门大学 | Micro- expression recognition method based on depth forest applied to criminal investigation |
CN110175578B (en) * | 2019-05-29 | 2021-06-22 | 厦门大学 | Deep forest-based micro expression identification method applied to criminal investigation |
US11961327B2 (en) | 2019-09-02 | 2024-04-16 | Boe Technology Group Co., Ltd. | Image processing method and device, classifier training method, and readable storage medium |
CN110532971B (en) * | 2019-09-02 | 2023-04-28 | 京东方科技集团股份有限公司 | Image processing apparatus, training method, and computer-readable storage medium |
CN110532971A (en) * | 2019-09-02 | 2019-12-03 | 京东方科技集团股份有限公司 | Image procossing and device, training method and computer readable storage medium |
CN110992455A (en) * | 2019-12-08 | 2020-04-10 | 北京中科深智科技有限公司 | Real-time expression capturing method and system |
CN111353390A (en) * | 2020-01-17 | 2020-06-30 | 道和安邦(天津)安防科技有限公司 | Micro-expression recognition method based on deep learning |
CN111523367B (en) * | 2020-01-22 | 2022-07-22 | 湖北科技学院 | Intelligent facial expression recognition method and system based on facial attribute analysis |
CN111523367A (en) * | 2020-01-22 | 2020-08-11 | 湖北科技学院 | Intelligent facial expression recognition method and system based on facial attribute analysis |
CN111460952B (en) * | 2020-03-26 | 2023-11-21 | 心图熵动科技(苏州)有限责任公司 | Method, system and prediction system for generating face recognition rule of schizophrenia |
CN111460952A (en) * | 2020-03-26 | 2020-07-28 | 心图熵动科技(苏州)有限责任公司 | Generation method, system and prediction system of schizophrenia face recognition rule |
CN112287745A (en) * | 2020-07-23 | 2021-01-29 | 南京航空航天大学 | Cognitive emotion recognition method based on geodesic distance and sample entropy |
CN112287745B (en) * | 2020-07-23 | 2024-05-03 | 南京航空航天大学 | Cognitive emotion recognition method based on geodesic distance and sample entropy |
CN112949522A (en) * | 2021-03-11 | 2021-06-11 | 重庆邮电大学 | Portrait data classification method based on support vector machine |
CN112949522B (en) * | 2021-03-11 | 2022-06-21 | 重庆邮电大学 | Portrait data classification method based on support vector machine |
CN113313020A (en) * | 2021-05-27 | 2021-08-27 | 成都威爱新经济技术研究院有限公司 | Unmarked facial expression capturing method and system based on virtual human |
CN113221825A (en) * | 2021-05-31 | 2021-08-06 | 国网浙江省电力有限公司杭州供电公司 | Electric power safety control face recognition method based on machine learning |
CN113221825B (en) * | 2021-05-31 | 2022-03-18 | 国网浙江省电力有限公司杭州供电公司 | Electric power safety control face recognition method based on machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104376333A (en) | Facial expression recognition method based on random forests | |
Mohammadpour et al. | Facial emotion recognition using deep convolutional networks | |
Burkert et al. | Dexpression: Deep convolutional neural network for expression recognition | |
Fan et al. | A dynamic framework based on local Zernike moment and motion history image for facial expression recognition | |
Tian et al. | Recognizing action units for facial expression analysis | |
Jalal et al. | Human daily activity recognition with joints plus body features representation using Kinect sensor | |
CN106778796B (en) | Human body action recognition method and system based on hybrid cooperative training | |
Youssif et al. | Arabic sign language (arsl) recognition system using hmm | |
CN102254180B (en) | Geometrical feature-based human face aesthetics analyzing method | |
CN109815826A (en) | The generation method and device of face character model | |
Haq et al. | Boosting the face recognition performance of ensemble based LDA for pose, non-uniform illuminations, and low-resolution images | |
CN111680550B (en) | Emotion information identification method and device, storage medium and computer equipment | |
Youssef et al. | Auto-optimized multimodal expression recognition framework using 3D kinect data for ASD therapeutic aid | |
Gudipati et al. | Efficient facial expression recognition using adaboost and haar cascade classifiers | |
Ullah et al. | Emotion recognition from occluded facial images using deep ensemble model | |
Ahmad et al. | Facial expression recognition using lightweight deep learning modeling | |
CN107895154B (en) | Method and system for forming facial expression intensity calculation model | |
Uddin et al. | Human activity recognition using robust spatiotemporal features and convolutional neural network | |
Nayak et al. | Distribution-based dimensionality reduction applied to articulated motion recognition | |
Lv et al. | A spontaneous facial expression recognition method using head motion and AAM features | |
Manolova et al. | Facial expression classification using supervised descent method combined with PCA and SVM | |
Dembani et al. | UNSUPERVISED FACIAL EXPRESSION DETECTION USING GENETIC ALGORITHM. | |
Wang et al. | PAU-Net: Privileged action unit network for facial expression recognition | |
Li et al. | Hmm based eyebrow recognition | |
Ali et al. | Smile detection using data amalgamation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20150225 |