CN112381047B - Enhanced recognition method for facial expression image - Google Patents

Info

Publication number: CN112381047B
Authority: CN (China)
Application number: CN202011377211.9A
Other languages: Chinese (zh)
Other versions: CN112381047A
Inventors: 谢巍 (Xie Wei), 刘彦汝 (Liu Yanru), 钱文轩 (Qian Wenxuan), 谢苗苗 (Xie Miaomiao)
Current and original assignee: South China University of Technology (SCUT)
Legal status: Active (the listed status is an assumption by Google, not a legal conclusion)
Application filed by South China University of Technology (SCUT)
Priority to CN202011377211.9A
Published as CN112381047A; application granted and published as CN112381047B


Classifications

    • G06V40/161 - Human faces: detection; localisation; normalisation
    • G06V40/168 - Human faces: feature extraction; face representation
    • G06V40/174 - Facial expression recognition
    • G06F18/213 - Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F18/2148 - Generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
    • G06F18/24323 - Tree-organised classifiers
    • G06N20/20 - Machine learning: ensemble learning
    • Y02D10/00 - Energy efficient computing (cross-sectional tag)
    • Y02T10/40 - Engine management systems (cross-sectional tag)

Abstract

The invention discloses a facial expression image enhancement and recognition method comprising the following steps: 1) perform face localization with a Haar-feature-based Adaboost cascade detector, frame and crop the face region, preprocess the cropped image, and store it; 2) establish a mapping relation between facial appearance and facial shape with a cascaded regression tree algorithm in a regression model, and extract the facial feature points; 3) compute the corresponding Euclidean distances with the facial expression representation model to obtain a six-element array characterizing the facial expression features; 4) train a classification model with a random forest algorithm and input the six-element array into the trained model for classification and recognition. On the basis of recognizing facial expression images in the standard pose, the invention can also recognize expression images with a certain degree of head deflection; it offers higher recognition efficiency and faster running speed, meets practical application requirements, and is better suited to real scenes.

Description

Enhanced recognition method for facial expression image
Technical Field
The invention relates to the technical field of computer vision and pattern recognition, in particular to a method for enhancing and recognizing facial expression images.
Background
Facial expression recognition analyzes a person's specific mood by extracting specific expression images from pictures or videos, enabling better human-machine interaction. Because expressions carry a high degree of information, facial expression recognition plays an important role in psychological analysis, clinical medicine, safe driving, criminal investigation, and other fields. Facial expression recognition at the present stage mainly targets the standard pose, i.e. the frontal facial expression; under normal conditions, however, people tend to deflect their heads unconsciously when making an expression, and many complex situations arise in practical applications, so the recognition performance cannot meet the expected needs.
Current approaches to facial expression recognition with head deflection fall into three categories: face-keypoint-based methods, appearance-based methods, and pose-based methods. Keypoint-based methods mainly rely on a geometric model to locate the key points before recognition; they require a large number of keypoint-annotated samples, and automatic annotation of key points is difficult (He Jun, He Zhongwen, Cai Jianfeng, et al. A novel deflection-angle facial expression recognition method [J]. Computer Application Research, 2018, 35(01): 282-286). Appearance-based methods obtain the facial expression features under different poses and reduce the interference of expression-irrelevant factors in the image, avoiding the difficulty of keypoint extraction, but their recognition performance is mediocre (Wang Chenxing, Liang Yu. A new expression recognition method [J]. Electronic Technology and Software Engineering, 2018(06): 67). Pose-based methods come in two kinds: one groups the expression library by face pose and trains, recognizes, and classifies within each group; the other builds a relation between non-frontal and frontal face samples, maps the non-frontal face to a frontal one, and then classifies the frontal face (Zheng Wenming, Feng Tian. Non-frontal facial expression recognition method based on pose normalization [P]. Jiangsu: CN103400105A, 2013-11-20). The latter works well, but the algorithm is complex and slow, making it unsuitable for practical applications.
Disclosure of Invention
The invention addresses the above problems with a facial expression enhancement and recognition method. The method extracts features with a regression model, accurately extracting facial expression features under a deflection angle, and combines this with ensemble learning, effectively improving the method's practicality. It not only recognizes facial expression images in the standard pose but also performs well on images with a certain degree of head deflection; the ensemble-learning approach reduces algorithmic complexity, increases running speed, and better suits real scenes.
The invention is realized at least by one of the following technical schemes.
A method for enhanced recognition of a facial expression image, the method comprising the steps of:
1) Carrying out face positioning by using an Adaboost cascade detector based on Haar characteristics, framing out a face part, cutting out the face part, carrying out image preprocessing on the cut image, and storing the image;
2) Establishing a mapping relation between the appearance of the face and the shape of the face by using a regression model, realizing the alignment of the face, extracting facial feature points of the face, and determining the feature points of the face;
3) Calculating the corresponding Euclidean distance of the facial feature points to obtain a six-element array representing the facial expression features;
4) And inputting the six-element array into a trained classification model to realize classification recognition.
Preferably, in step 1), the face positioning is performed by using an Adaboost cascade detector based on Haar features, which specifically includes:
Firstly, the Haar features of the image are obtained: the integral image is traversed to obtain the Haar feature values, which serve as the input of the classifier. Each input sample is given the same initial weight to train a weak classifier, and the weak classifier with the smallest error is selected as the optimal weak classifier of the current round. The weight for the next optimal weak classifier is computed from the error between the current optimal weak classifier's predicted value and the true value. After repeated iterations, the N optimal weak classifiers obtained are combined by weighting into a strong classifier, and finally several strong classifiers are cascaded for detection and localization.
Preferably, step 2) establishes the mapping relation between the facial appearance and the facial shape with a cascaded regression tree algorithm in the regression model, realizing face alignment and extracting the h facial feature points.
Preferably, extracting the h facial feature points specifically comprises:
A 300-W database is selected as the training sample set. Let a_i = (x_i, y_i), i = 1, 2, ..., h, be the coordinates of the feature points on one picture P in the sample set, and define S = (a_1^T, a_2^T, ..., a_m^T), m = h, as the coordinate vector of all feature points on a picture, called the shape. The regression iteration formula is:

S^(t+1) = S^(t) + λ_t(P, S^(t)),    ΔS^(t+1) = S_z - S^(t+1)

where λ_t is a regression model formed by a cascade of regressors, S^(t) is the current estimate of S, λ_t(·,·) is the update vector predicted by the cascaded regressor from the image, S_z is the real face shape, and ΔS^(t+1) is the residual. The current face shape and the sample image are input into the regression model, which predicts an update vector and yields a new shape estimate, i.e. a new current face shape, together with the residual at that moment, i.e. the difference between the current shape and the real shape. The regression model then updates iteratively according to the current residual, continuously reducing it and gradually approaching the actual face shape, until the facial feature points are finally extracted accurately.
Preferably, the training process of the regression model formed by the N regressors specifically includes:
A sample set (P_1, S_1), ..., (P_n, S_n) is defined, where P_r (r = 1, 2, ..., n) is an expression image in the sample set and S_r (r = 1, 2, ..., n) is the shape vector corresponding to each expression image;

the training sample images, the initial shape estimates, and the residuals are input, with learning rate ρ; the initialized regression function λ_0 is:

λ_0 = C = argmin_C Σ_{r=1}^{n} ||ΔS_r^(t) - C||²

where C is the constant that minimizes the initial prediction loss function, ΔS_r^(t) is the residual, and n = NR, where R is the initialization multiple of each expression image;

the squared error is used as the loss function; differentiating the loss function yields the gradient, which is taken as the fitting target at each iteration step:

g_r = ΔS_r^(t) - f_{k-1}(P_r)

where f is the regression function, ΔS_r^(t) is the residual, and S_r^(t) is the current estimate of S_r;

for k = 1, ..., K and i = 1, ..., N, a regression tree λ_ik is constructed from the weak classifier G, where K is the number of weak classifiers G, and the model is updated:

f_k(P) = f_{k-1}(P) + ρ λ_ik(P)

Finally λ_t is obtained, completing the construction of the regression model.
Preferably, step 3) is to calculate the corresponding euclidean distance by using the facial expression characterization model to obtain a six-element array for characterizing facial expression features.
Preferably, calculating the corresponding Euclidean distances with the facial expression representation model specifically comprises: classification and distinction are carried out through the facial expression representation model, with the model measured by Euclidean distance, to compute a six-element array D = (d_1, d_2, d_3, d_4, d_5, d_6), where d_1 is the distance between the two eyebrows, d_2 the distance between eyebrow and eye, d_3 the distance between the upper and lower boundaries of the eye, d_4 the height of the mouth, d_5 the width of the mouth, and d_6 the distance from the mouth corner to the highest point of the upper lip.
Preferably, in step 4), a classification model is trained by using a random forest algorithm, and the six-element array is input into the trained model to realize classification recognition.
Preferably, the classification and identification specifically comprises the following steps:
step one, selecting an expression database fer2013 for training and randomly extracting part of samples and part of attributes;
Step two, the splitting attribute is determined from the candidate attributes using the Gini coefficient, nodes are generated, CART decision trees are produced, and the generated decision trees form a random forest;
and thirdly, after the samples are input, N classification results are generated in the forest, voting is carried out on the classification results obtained by all the input samples by adopting a voting mechanism, and the class with the largest voting frequency is the output identification result.
Preferably, the determining the splitting attribute from the candidate attributes by using the Gini coefficient is specifically:
Let the sample set X contain N categories; the Gini coefficient is defined as:

Gini(X) = 1 - Σ_{i=1}^{N} σ_i²

where σ_i is the frequency of occurrence of class i in the sample set X. If X is divided under the selected attribute x into two sample subsets X_1 and X_2, the summed Gini coefficient of the two subsets after division is:

Gini_split(x)(X) = (M_1/M) Gini(X_1) + (M_2/M) Gini(X_2)

where M_1 and M_2 are the numbers of samples in X_1 and X_2 respectively, and M is the number of samples in X; the Gini gain is:

ΔGini = Gini(X) - Gini_split(x)(X)
and selecting the attribute with the smallest Gini coefficient as the splitting attribute at each splitting node.
According to the invention, the original image is preprocessed with the Haar-feature-based Adaboost cascade detector; the pose mapping relation is established with the cascaded regression tree algorithm to realize face alignment; the facial expression features are obtained in combination with the facial expression representation model; and finally the random forest algorithm performs classification and recognition. Accurate extraction of facial expression features under a deflection angle is thereby achieved, recognition efficiency is improved, and the method has practical value.
Compared with the prior art, the invention has the beneficial effects that:
(1) Feature extraction with the cascaded regression tree algorithm avoids the low localization accuracy of directly extracted facial key points: the face is first aligned by regression iteration and the features are then extracted, which effectively improves the extraction accuracy of non-frontal facial expression features.
(2) The random forest algorithm is adopted for classification and identification, so that the identification efficiency is improved, the running time is shortened, and the practical value is enhanced.
(3) The deflection range of facial expression recognition is expanded, and the applicable scene of the system is enlarged.
Drawings
Fig. 1 is a flowchart of an enhanced recognition method for facial expression images in this embodiment.
Detailed Description
In order that those skilled in the art will better understand the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings, wherein it is to be understood that the illustrated embodiments are merely examples of some, but not all, of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
This embodiment discloses an enhanced recognition method for facial expression images that not only recognizes facial expression images in the standard pose but also performs well on images with a certain degree of head deflection; by adopting ensemble learning it reduces algorithmic complexity, increases running speed, and better suits real scenes.
As shown in fig. 1, the method for enhancing and identifying a facial expression image in this embodiment specifically includes the following steps:
1) Face localization is performed with the Haar-feature-based Adaboost cascade detector; the face region is framed and cropped, the cropped image is preprocessed by grayscale normalization and scale normalization, and the image is stored. The specific process is as follows:
Firstly, the Haar features of the image are obtained: the integral image is traversed to obtain the Haar feature values, which serve as the input of the classifier. Each input sample is given the same initial weight to train a weak classifier, and the weak classifier with the smallest error is selected as the optimal weak classifier of the current round. The weight for the next optimal weak classifier is computed from the error between the current optimal weak classifier's predicted value and the true value. After repeated iterations, the N optimal weak classifiers obtained are combined by weighting into a strong classifier, and finally several strong classifiers are cascaded for detection and localization.
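As an illustrative sketch of the integral-image step described above (not the patent's implementation; the function names and the toy image are invented for this example), a two-rectangle Haar-like feature value can be read off an integral image with four lookups per rectangle:

```python
# Sketch: integral image and a two-rectangle Haar-like feature value.
# 'img' is a toy grayscale array; all names here are illustrative only.

def integral_image(img):
    """Cumulative sums so any rectangle sum costs four lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Left-minus-right two-rectangle feature (a vertical-edge detector)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

img = [[1, 1, 5, 5],
       [1, 1, 5, 5]]
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 4, 2))  # left block sums to 4, right to 20 → -16
```

Such feature values, computed in constant time per rectangle, are what the cascade's weak classifiers threshold on.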
After the face region is framed, the face frame is enlarged by an appropriate ratio to ensure that a complete face image is obtained. After cropping, the image is preprocessed, specifically by scale normalization and grayscale normalization: the image is shrunk appropriately to increase its smoothness and clarity, and weighted-average graying is applied so that accurate image information is preserved as far as possible while the computational load is reduced.
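The weighted-average graying step can be sketched as follows; the 0.299/0.587/0.114 luma weights are a common convention and an assumption here, since the patent does not state its coefficients:

```python
# Sketch of weighted-average graying (standard luma weights; the exact
# coefficients used by the patent are not specified, so these are assumed).
def to_gray(pixel_rgb):
    r, g, b = pixel_rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

print(round(to_gray((255, 255, 255))))  # pure white maps to 255
```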
2) The mapping relation between the facial appearance and the facial shape is established with a cascaded regression tree algorithm in the regression model, realizing face alignment and extracting the 68 facial feature points, specifically:
The algorithm involves a two-layer regression process. The 300-W database is selected as the training sample set. Let a_i = (x_i, y_i), i = 1, 2, ..., 68, be the coordinates of the feature points on one picture P in the sample set, and define S = (a_1^T, a_2^T, ..., a_m^T), m = 68, as the coordinate vector of all feature points on a picture, called the shape. In the first-layer regression, the regression iteration formula is:

S^(t+1) = S^(t) + λ_t(P, S^(t)),    ΔS^(t+1) = S_z - S^(t+1)

where λ_t is a regression model formed by a cascade of regressors, S^(t) is the current estimate of S, λ_t(·,·) is the update vector predicted by the cascaded regressor from the image, S_z is the real face shape, and ΔS^(t+1) is the residual. The current face shape and the sample image are input into the regression model, which predicts an update vector and yields a new shape estimate, i.e. a new current face shape, together with the residual at that moment, i.e. the difference between the current shape and the real shape. The regression model then updates iteratively according to the current residual, continuously reducing it and gradually approaching the actual face shape, until the facial feature points are finally extracted accurately.
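A minimal numeric sketch of this iteration, using a 1-D "shape" and an invented stand-in regressor that predicts half of the remaining residual at each stage, shows the residual shrinking toward zero:

```python
# Toy sketch of the cascaded iteration S^(t+1) = S^(t) + λ_t(P, S^(t)).
# The "regressor" predicting 50% of the residual is an invented stand-in
# for a trained cascade stage.
def cascade_align(s_true, s_init, stages=10, step=0.5):
    s = s_init
    for _ in range(stages):
        update = step * (s_true - s)  # λ_t's prediction of the residual
        s = s + update
    return s

s = cascade_align(100.0, 0.0)
print(abs(100.0 - s) < 0.2)  # residual after 10 stages ≈ 0.098 → True
```

Each applied edit reduces the residual geometrically, mirroring how the cascade drives the shape estimate toward the real face shape.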
The second-layer regression trains the regression model formed by the N regressors. The training process is as follows:
A sample set (P_1, S_1), ..., (P_n, S_n) is defined in the training database, where P_r (r = 1, 2, ..., n) is an expression image in the sample set and S_r (r = 1, 2, ..., n) is the shape vector corresponding to each expression image.
The training sample images, the initial shape estimates, and the residuals are input, with learning rate ρ; the initialized regression function λ_0 is:

λ_0 = C = argmin_C Σ_{r=1}^{n} ||ΔS_r^(t) - C||²

where C is the constant that minimizes the initial prediction loss function, ΔS_r^(t) is the residual, and n = NR, where R is the initialization multiple of each expression image.
The squared error is used as the loss function; differentiating the loss function yields the gradient, which is taken as the fitting target at each iteration step:

g_r = ΔS_r^(t) - f_{k-1}(P_r)

where f is the regression function, ΔS_r^(t) is the residual, and S_r^(t) is the current estimate of S_r.
For k = 1, ..., K and i = 1, ..., N, a regression tree λ_ik is constructed from the weak classifier G, where K is the number of weak classifiers G, and the model is updated:

f_k(P) = f_{k-1}(P) + ρ λ_ik(P)

Finally λ_t is obtained, completing the construction of the regression model.
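The boosting-style update described above can be sketched with a toy booster; the constant-mean fit below is an invented stand-in for a regression tree, and the numbers are illustrative only:

```python
# Minimal sketch of a gradient-boosting update f_k = f_{k-1} + ρ·λ_k,
# where each λ_k fits the current residuals. A constant mean fit stands
# in for the regression tree of the actual method.
def boost_fit(targets, rounds=50, rho=0.1):
    pred = [0.0] * len(targets)         # initial constant prediction C = 0
    for _ in range(rounds):
        resid = [t - p for t, p in zip(targets, pred)]
        step = sum(resid) / len(resid)  # λ_k: fit the residual mean
        pred = [p + rho * step for p in pred]
    return pred

print([round(p, 2) for p in boost_fit([4.0, 4.0, 4.0])])  # → [3.98, 3.98, 3.98]
```

With the squared-error loss, each round's fitting target is exactly the remaining residual, so the predictions approach the targets geometrically at rate (1 - ρ).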
After the model file is trained, inputting an expression picture yields the accurately extracted 68 facial feature points.
3) The corresponding Euclidean distances are calculated with the facial expression representation model to obtain a six-element array characterizing the facial expression features, specifically:
Given that individual expressions can be classified and distinguished through the structural features of the face, a facial expression representation model measured by Euclidean distance is adopted. The extracted facial feature points are input into the model, which computes a six-element array D = (d_1, d_2, d_3, d_4, d_5, d_6), where d_1 is the distance between the two eyebrows, d_2 the distance between eyebrow and eye, d_3 the distance between the upper and lower boundaries of the eye, d_4 the height of the mouth, d_5 the width of the mouth, and d_6 the distance from the mouth corner to the highest point of the upper lip.
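Assembling the six-element array from landmark coordinates can be sketched as follows; the landmark names and coordinates below are invented placeholders, not the patent's actual 68-point indexing:

```python
# Sketch of building D = (d1, ..., d6) from landmark coordinates.
# The dictionary keys and the sample points are illustrative assumptions.
import math

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def expression_array(lm):
    """lm: dict of named landmark points (invented keys)."""
    return (
        euclid(lm["brow_l"], lm["brow_r"]),        # d1: between the eyebrows
        euclid(lm["brow_l"], lm["eye_top_l"]),     # d2: eyebrow to eye
        euclid(lm["eye_top_l"], lm["eye_bot_l"]),  # d3: eye opening
        euclid(lm["lip_top"], lm["lip_bot"]),      # d4: mouth height
        euclid(lm["mouth_l"], lm["mouth_r"]),      # d5: mouth width
        euclid(lm["mouth_l"], lm["lip_top"]),      # d6: corner to upper lip
    )

lm = {"brow_l": (30, 20), "brow_r": (70, 20), "eye_top_l": (30, 30),
      "eye_bot_l": (30, 36), "lip_top": (50, 60), "lip_bot": (50, 72),
      "mouth_l": (38, 66), "mouth_r": (62, 66)}
print(expression_array(lm))  # d1..d5 are 40, 10, 6, 12, 24; d6 ≈ 13.42
```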
4) Training a classification model by using a random forest algorithm, and inputting the six-element array into the trained model to realize classification recognition, wherein the method specifically comprises the following steps of:
the facial expression library fer2013 is selected for training, most of images in the expression library rotate on a plane and a non-plane, and a plurality of images are shielded by a hand, hair, scarf and other shielding objects, so that the facial expression library fer is more in line with actual life scenes. The specific process of training by adopting the random forest algorithm is as follows:
Firstly, the expression database fer2013 is selected for training; the expression images in the fer2013 database are preprocessed and their features extracted, and partial samples and partial attributes are randomly drawn. Secondly, the splitting attribute is determined from the candidate attributes using the Gini coefficient, nodes are generated, CART decision trees are produced, and the generated decision trees form a random forest. Determining the splitting attribute from the candidate attributes with the Gini coefficient specifically comprises:
If the sample set X contains N categories, the Gini coefficient is defined as:

Gini(X) = 1 - Σ_{i=1}^{N} σ_i²

where σ_i is the frequency of occurrence of class i in the sample set X. If X is divided under the selected attribute x into two sample subsets X_1 and X_2, the summed Gini coefficient of the two subsets after division is:

Gini_split(x)(X) = (M_1/M) Gini(X_1) + (M_2/M) Gini(X_2)

where M_1 and M_2 are the numbers of samples in X_1 and X_2 respectively, and M is the number of samples in X; the Gini gain is:

ΔGini = Gini(X) - Gini_split(x)(X)
and selecting the attribute with the smallest Gini coefficient as the splitting attribute at each splitting node.
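The Gini computation above can be sketched directly; the class labels below are illustrative:

```python
# Sketch of Gini-based split scoring: Gini(X) = 1 - Σ σ_i², and the split
# score is the size-weighted sum over the two subsets after division.
def gini(labels):
    n = len(labels)
    freqs = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(f * f for f in freqs)

def gini_split(left, right):
    m = len(left) + len(right)
    return (len(left) / m) * gini(left) + (len(right) / m) * gini(right)

# A split that cleanly separates the classes scores 0 (pure subsets):
print(gini_split(["happy", "happy"], ["sad", "sad"]))  # 0.0
print(round(gini(["happy", "sad"]), 2))                # 0.5
```

At each node, candidate attributes are scored this way and the one yielding the purest division is chosen.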
And thirdly, after the samples are input, N classification results are generated in the forest, voting is carried out on the classification results obtained by all the input samples by adopting a voting mechanism, and the class with the largest voting frequency is the output identification result.
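The voting step can be sketched in a few lines; the class names are illustrative:

```python
# Sketch of the forest's majority vote: each of the N trees casts one class
# vote and the most frequent class is the recognition result.
from collections import Counter

def forest_vote(tree_outputs):
    return Counter(tree_outputs).most_common(1)[0][0]

print(forest_vote(["happy", "sad", "happy", "neutral", "happy"]))  # happy
```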
After the optimal model parameters are determined, the obtained six-element array is input into the model to obtain the final classification and recognition result.
The foregoing examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited by the foregoing examples, and any changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principles of the present invention are intended to be equivalent, and are included in the scope of the present invention.

Claims (6)

1. The facial expression image enhancement and recognition method is characterized by comprising the following steps of:
1) Carrying out face positioning by using an Adaboost cascade detector based on Haar characteristics, framing out a face part, cutting out the face part, carrying out image preprocessing on the cut image, and storing the image;
2) Establishing a mapping relation between the appearance of the face and the shape of the face by using a regression model, realizing the alignment of the face, extracting facial feature points of the face, and determining the feature points of the face;
specifically, a cascaded regression tree algorithm in the regression model is used to establish the mapping relation between the facial appearance and the facial shape, realizing face alignment and extracting the h facial feature points; extracting the h facial feature points specifically comprises:
a 300-W database is selected as the training sample set; let a_i = (x_i, y_i), i = 1, 2, ..., h, be the coordinates of the feature points on one picture P in the sample set, and define S = (a_1^T, a_2^T, ..., a_m^T), m = h, as the coordinate vector of all feature points on a picture, called the shape; the regression iteration formula is:

S^(t+1) = S^(t) + λ_t(P, S^(t)),    ΔS^(t+1) = S_z - S^(t+1)

where λ_t is a regression model formed by a cascade of regressors, S^(t) is the current estimate of S, λ_t(·,·) is the update vector predicted by the cascaded regressor from the image, S_z is the real face shape, and ΔS^(t+1) is the residual; the current face shape and the sample image are input into the regression model, which predicts an update vector and yields a new shape estimate, i.e. a new current face shape, together with the residual at that moment, i.e. the difference between the current shape and the real shape; the regression model updates iteratively according to the current residual, continuously reducing it and gradually approaching the real face shape, until the facial feature points are finally extracted accurately;
3) Calculating the corresponding Euclidean distance of the facial feature points to obtain a six-element array representing the facial expression features;
4) The six-element array is input into a trained classification model to realize classification recognition, the classification model is trained by utilizing a random forest algorithm, and the six-element array is input into the trained model to realize classification recognition; the classification and identification specifically comprises the following steps:
step one, selecting an expression database fer2013 for training and randomly extracting part of samples and part of attributes;
determining splitting attributes from the attributes to be selected by utilizing a Gini coefficient, generating nodes, generating CART decision trees, and forming a random forest by the generated multiple decision trees;
and thirdly, after the samples are input, N classification results are generated in the forest, voting is carried out on the classification results obtained by all the input samples by adopting a voting mechanism, and the class with the largest voting frequency is the output identification result.
2. The enhanced recognition method for a facial expression image according to claim 1, wherein in step 1) an Adaboost cascade detector based on Haar features is used for face positioning, specifically comprising:
firstly, the Haar features of the image are obtained: the integral image is traversed to compute the Haar feature values, which serve as the input to the classifier. Each input datum is given the same initial weight to train a weak classifier, and the weak classifier with the smallest error is selected as the optimal weak classifier of the round. The weight for the next round is calculated from the error between the predicted and true values of the current optimal weak classifier, and this procedure is iterated repeatedly. The N optimal weak classifiers thus obtained are combined by weighting into a strong classifier, and finally a plurality of strong classifiers are cascaded for detection and positioning.
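A minimal sketch of the integral-image lookup and of a two-rectangle Haar feature value, assuming the standard Viola-Jones definitions; weak-classifier training and cascading are omitted.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    """Sum of pixels in a rectangle via four lookups on a zero-padded table."""
    p = np.pad(ii, ((1, 0), (1, 0)))
    return p[top + h, left + w] - p[top, left + w] - p[top + h, left] + p[top, left]

def haar_two_rect(ii, top, left, h, w):
    """A two-rectangle (edge) Haar feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, top, left, h, half) - rect_sum(ii, top, left + half, h, half)

img = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0]], dtype=float)
ii = integral_image(img)
# A bright left half against a dark right half gives a strong positive response.
print(haar_two_rect(ii, 0, 0, 2, 4))  # → 4.0
```

In a full detector these feature values become the classifier inputs described above; thousands of them are evaluated per window, which is why the constant-time rectangle sums matter.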
3. The enhanced recognition method for a facial expression image according to claim 1, wherein the training process of the regression model formed by the regressors specifically comprises:
defining a sample set (P_1, S_1), ..., (P_n, S_n), where P_r (r = 1, 2, ..., n) is an expression image in the sample set and S_r (r = 1, 2, ..., n) is the shape vector corresponding to each expression image;
inputting the training sample images, the initial shape estimates and the residual quantities, with learning rate ρ; the initialized regression function λ_0 is:

λ_0 = argmin_C Σ_{r=1}^{N} ||ΔS_r^(t) − C||²
where C is a constant that minimizes the initial prediction loss function, ΔS_r^(t) is the residual quantity, and N = nR, where R is the initialization multiple of each expression image;
the squared error is used as the loss function; differentiating the loss function yields the gradient, which is taken as the fitting target at each iteration step:

g_rk = ΔS_r^(t) − λ_{k−1}(P_r, Ŝ_r^(t))

where λ_{k−1} is the regression function, ΔS_r^(t) is the residual quantity, and Ŝ_r^(t) is the current estimate of S_r;
for k = 1, ..., K and i = 1, ..., N, a regression tree λ_ik is constructed on the basis of the weak classifiers G, where K is the number of weak classifiers G, and the regression function is updated:

λ_k = λ_{k−1} + ρ λ_ik
finally λ_t is obtained, completing the construction of the regression model.
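The boosting loop of claim 3 can be sketched with a squared loss on a scalar toy target. Depth-1 stumps stand in for the patent's regression trees, and ρ is the shrinkage (learning rate); with squared error the negative gradient is exactly the residual, so each round fits the current residuals, as in the claim.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a depth-1 regression tree (stump) to the current residuals."""
    best = None
    for thr in np.unique(x):
        left, right = residual[x <= thr], residual[x > thr]
        rv = right.mean() if right.size else 0.0
        pred = np.where(x <= thr, left.mean(), rv)
        err = float(((residual - pred) ** 2).sum())
        if best is None or err < best[0]:
            best = (err, thr, left.mean(), rv)
    _, thr, lv, rv = best
    return lambda q: np.where(q <= thr, lv, rv)

def boost(x, y, rounds=100, rho=0.5):
    """Squared-loss gradient boosting: each round fits the residuals."""
    # lambda_0 = argmin_C sum_r (y_r - C)^2, i.e. the mean of the targets
    f = np.full_like(y, y.mean())
    for _ in range(rounds):
        g = fit_stump(x, y - f)   # negative gradient of squared loss = residual
        f = f + rho * g(x)        # lambda_k = lambda_{k-1} + rho * g_k
    return f

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
f = boost(x, y)
```

Because each stump fits the residual pattern exactly here, the residual shrinks by a factor of (1 − ρ) per round, illustrating the claimed convergence of λ_k toward the targets.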
4. The enhanced recognition method for a facial expression image according to claim 1, wherein step 3) calculates the corresponding Euclidean distances by using a facial expression characterization model to obtain a six-element array characterizing the facial expression features.
5. The enhanced recognition method for a facial expression image according to claim 4, wherein calculating the corresponding Euclidean distances using the facial expression characterization model comprises: classifying and distinguishing through the facial expression characterization model, measured by Euclidean distance, to calculate the six-element array D = (d_1, d_2, d_3, d_4, d_5, d_6), where d_1 represents the distance between the two eyebrows, d_2 the distance between the eyebrows and the eyes, d_3 the distance between the upper and lower boundaries of the eye, d_4 the height of the mouth, d_5 the width of the mouth, and d_6 the distance from the corner of the mouth to the highest position of the upper lip.
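A sketch of assembling D from landmark coordinates; the landmark names and toy coordinates below are illustrative assumptions, not the patent's actual feature-point indices.

```python
import numpy as np

def euclid(p, q):
    """Euclidean distance between two 2D points."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

def expression_features(lm):
    """Build D = (d1..d6) from a dict of named landmark coordinates."""
    return (
        euclid(lm["brow_inner_l"], lm["brow_inner_r"]),  # d1: between the eyebrows
        euclid(lm["brow_mid_l"],   lm["eye_top_l"]),     # d2: eyebrow to eye
        euclid(lm["eye_top_l"],    lm["eye_bottom_l"]),  # d3: eye opening
        euclid(lm["lip_top"],      lm["lip_bottom"]),    # d4: mouth height
        euclid(lm["mouth_l"],      lm["mouth_r"]),       # d5: mouth width
        euclid(lm["mouth_l"],      lm["lip_top"]),       # d6: corner to upper-lip peak
    )

# Toy landmark positions (pixels) for a single face.
lm = {
    "brow_inner_l": (40.0, 30.0), "brow_inner_r": (60.0, 30.0),
    "brow_mid_l": (35.0, 28.0), "eye_top_l": (35.0, 38.0),
    "eye_bottom_l": (35.0, 44.0),
    "lip_top": (50.0, 62.0), "lip_bottom": (50.0, 70.0),
    "mouth_l": (42.0, 66.0), "mouth_r": (58.0, 66.0),
}
D = expression_features(lm)
print(D[0])  # → 20.0
```

The resulting six-tuple is exactly the feature vector fed to the random forest classifier in step 4).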
6. The enhanced recognition method for a facial expression image according to claim 1, wherein determining the splitting attribute from the candidate attributes by using the Gini coefficient specifically comprises:
letting the sample set X contain N categories, the Gini coefficient is defined as:

Gini(X) = 1 − Σ_{i=1}^{N} σ_i²
where σ_i represents the frequency of occurrence of class i in the sample set X; if X is divided under a selected attribute x into two sample subsets X_1 and X_2, the weighted sum of the Gini coefficients of the two subsets after division is:

Gini_split(x)(X) = (M_1/M) · Gini(X_1) + (M_2/M) · Gini(X_2)
where M_1 and M_2 are the numbers of samples in X_1 and X_2 respectively and M is the number of samples in X; the Gini coefficient gain is then:
ΔGini = Gini(X) − Gini_split(x)(X)
and at each splitting node, the attribute with the smallest Gini coefficient after division is selected as the splitting attribute.
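The Gini computations of claim 6 can be sketched directly from the formulas; the toy labels are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini(X) = 1 - sum_i sigma_i^2, sigma_i = frequency of class i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left_labels, right_labels):
    """Weighted Gini of the two subsets produced by a candidate split."""
    m1, m2 = len(left_labels), len(right_labels)
    m = m1 + m2
    return (m1 / m) * gini(left_labels) + (m2 / m) * gini(right_labels)

# A split that separates the classes perfectly drives the split Gini to 0,
# maximising the gain Gini(X) - Gini_split(x)(X).
X = ["sad", "sad", "happy", "happy"]
print(gini(X))                                         # → 0.5
print(gini_split(["sad", "sad"], ["happy", "happy"]))  # → 0.0
```

Since Gini(X) is fixed at a node, choosing the attribute with the smallest split Gini is the same as choosing the attribute with the largest gain.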
CN202011377211.9A 2020-11-30 2020-11-30 Enhanced recognition method for facial expression image Active CN112381047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011377211.9A CN112381047B (en) 2020-11-30 2020-11-30 Enhanced recognition method for facial expression image

Publications (2)

Publication Number Publication Date
CN112381047A CN112381047A (en) 2021-02-19
CN112381047B (en) 2023-08-22

Family

ID=74590391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011377211.9A Active CN112381047B (en) 2020-11-30 2020-11-30 Enhanced recognition method for facial expression image

Country Status (1)

Country Link
CN (1) CN112381047B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065552A (en) * 2021-03-29 2021-07-02 天津大学 Method for automatically positioning head shadow measurement mark point
CN113111789B (en) * 2021-04-15 2022-12-20 山东大学 Facial expression recognition method and system based on video stream
CN117437522A (en) * 2023-12-19 2024-01-23 福建拓尔通软件有限公司 Face recognition model training method, face recognition method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631436A (en) * 2016-01-27 2016-06-01 桂林电子科技大学 Face alignment method based on cascade position regression of random forests
CN106682598A (en) * 2016-12-14 2017-05-17 华南理工大学 Multi-pose facial feature point detection method based on cascade regression
CN108108677A (en) * 2017-12-12 2018-06-01 重庆邮电大学 One kind is based on improved CNN facial expression recognizing methods
CN109961006A (en) * 2019-01-30 2019-07-02 东华大学 A kind of low pixel multiple target Face datection and crucial independent positioning method and alignment schemes
CN111523367A (en) * 2020-01-22 2020-08-11 湖北科技学院 Intelligent facial expression recognition method and system based on facial attribute analysis


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant