CN112381047A - Method for enhancing and identifying facial expression image - Google Patents

Method for enhancing and identifying facial expression image

Info

Publication number
CN112381047A
Authority
CN
China
Prior art keywords
face
facial expression
image
shape
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011377211.9A
Other languages
Chinese (zh)
Other versions
CN112381047B (en)
Inventor
谢巍
刘彦汝
钱文轩
谢苗苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202011377211.9A priority Critical patent/CN112381047B/en
Publication of CN112381047A publication Critical patent/CN112381047A/en
Application granted granted Critical
Publication of CN112381047B publication Critical patent/CN112381047B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an enhanced recognition method for facial expression images, comprising the following steps: 1) perform face localization with an Adaboost cascade detector based on Haar features, frame and crop the face region, preprocess the cropped image, and store it; 2) establish a mapping between face appearance and face shape with a cascaded regression tree algorithm in a regression model, and extract the facial feature points; 3) compute the corresponding Euclidean distances with a facial expression representation model to obtain a six-element array characterizing the facial expression features; 4) train a classification model with a random forest algorithm and feed the six-element array into the trained model for classification and recognition. On top of recognizing facial expression images in the standard pose, the method also recognizes expression images with a certain amount of head deflection, offers higher recognition efficiency and faster running speed, meets the needs of practical applications, and is better suited to real-world scenes.

Description

Method for enhancing and identifying facial expression image
Technical Field
The invention relates to the technical field of computer vision and pattern recognition, and in particular to an enhanced recognition method for facial expression images.
Background
Facial expression recognition technology analyzes a person's specific mood by extracting expression images from pictures or videos, enabling better human-computer interaction. Because expressions carry a high density of information, facial expression recognition plays an important role in psychological analysis, clinical medicine, safe driving, criminal investigation, and similar fields. Current facial expression recognition mainly targets expressions in the standard pose, i.e., frontal facial expressions. In practice, however, people often deflect their heads unconsciously when making an expression, and real applications face many complex situations, so the recognition performance frequently falls short of expectations.
Existing attempts at facial expression recognition under head deflection fall into three categories: methods based on facial key points, appearance-based methods, and pose-based methods. Key-point-based methods locate the key points with a geometric model and then recognize from them; they require a large number of samples with annotated key points, and the key points are difficult to annotate automatically (hero, loyal, Chua Jian, Kong Gao, Kongshi. A new deflection-angle facial expression recognition method [J]. Application Research of Computers, 2018, 35(01):282-286). Appearance-based methods obtain local or global expression features of the face under different poses, reducing interference from expression-irrelevant factors in the image and avoiding the difficulty of key-point extraction, but their recognition performance is mediocre (Wang Chenxing, Liang. A new expression recognition method [J]. Electronic Technology & Software Engineering, 2018(06):67). Pose-based methods divide into two kinds: one groups the expression library by face pose and performs grouped training, recognition, and classification; the other establishes a relationship between non-frontal and frontal face samples, maps the non-frontal face to a frontal one, and then classifies and recognizes the frontal face (Zheng civilization, Von Tianke. Non-frontal facial expression recognition method based on pose normalization [P]. Jiangsu: CN103400105A, 2013-11-20). The latter works well but, owing to its algorithmic complexity and slow running speed, is unsuitable for practical applications.
Disclosure of Invention
To address these problems, the invention provides an enhanced recognition method for facial expressions. The method extracts features with a regression model, which allows accurate extraction of facial expression features at a deflection angle, and combines this with ensemble learning to improve the method's practicality. It recognizes facial expression images in the standard pose, performs well on expression images with a certain amount of head deflection, and, by adopting ensemble learning, reduces algorithmic complexity and increases running speed, making it better suited to real-world scenes.
The invention is realized by at least one of the following technical schemes.
An enhanced recognition method for facial expression images comprises the following steps:
1) performing face localization with an Adaboost cascade detector based on Haar features, framing and cropping the face region, preprocessing the cropped image, and storing it;
2) establishing a mapping between face appearance and face shape with a regression model, achieving face alignment, and extracting the facial feature points;
3) computing the Euclidean distances corresponding to the facial feature points to obtain a six-element array characterizing the facial expression features;
4) feeding the six-element array into a trained classification model for classification and recognition.
Preferably, in step 1), face localization with the Haar-feature-based Adaboost cascade detector specifically comprises:
first obtaining the Haar features of the image and computing the Haar feature values by traversal with an integral image, then using the feature values as classifier input. Every input sample is given the same initial weight to train a weak classifier; the weak classifier with the smallest error is selected as the optimal weak classifier of the current round; the weight for the next optimal weak classifier is computed from the error between the predicted and true values of the current one; after multiple iterations, the N optimal weak classifiers are combined by weighting into a strong classifier; finally several strong classifiers are cascaded for detection and localization.
Preferably, in step 2), a cascaded regression tree algorithm in the regression model establishes the mapping between face appearance and face shape, achieves face alignment, and extracts the h facial feature points.
Preferably, extracting the h facial feature points specifically comprises:
selecting the 300-W database as the training sample set, defining $a_i = (x_i, y_i)$, $i = 1, 2, \ldots, h$, as the coordinates of the feature points in one picture $P$ of the sample set, and defining $S = (a_1^\top, a_2^\top, \ldots, a_m^\top)$, $m = h$, as the coordinate vector of all feature points in one picture, called the shape. The regression iteration formulas are:

$$\hat{S}^{(t+1)} = \hat{S}^{(t)} + \lambda_t(P, \hat{S}^{(t)})$$

$$\Delta S^{(t+1)} = S_z - \hat{S}^{(t+1)}$$

where $\lambda_t$ is one regressor in a regression model formed by cascading several regressors, $\hat{S}^{(t)}$ is the current estimate of $S$, $\lambda_t(\cdot)$ is the update vector predicted by the cascaded regressor from the image, $S_z$ is the true face shape, and $\Delta S^{(t+1)}$ is the residual. The current face shape and the sample image are fed into the regression model, which predicts an update vector to obtain a new shape estimate (the new current face shape) and its residual, i.e., the difference between the current shape and the true shape. The regression model is updated iteratively according to the current residual, continually reducing it, gradually approaching the true face shape, and finally extracting the facial feature points accurately.
Preferably, the training process of the regression model formed by the N regressors specifically comprises:
defining the sample set $(P_1, S_1), \ldots, (P_n, S_n)$, where $P_r$ ($r = 1, 2, \ldots, n$) are the expression images in the sample set and the shape vector corresponding to each expression image is $S_r$ ($r = 1, 2, \ldots, n$);
inputting the training sample images, the initial shape estimates, and the residual amounts, with learning rate $\rho$; the initialized regression function $f_0$ is:

$$f_0(P, \hat{S}^{(t)}) = \arg\min_{C} \sum_{r=1}^{N} \left\| \Delta S_r^{(t)} - C \right\|^2$$

where $C$ is the constant that minimizes the initial prediction loss function, $\Delta S_r^{(t)}$ is the residual amount, and $N = nR$, where $R$ is the initialization multiple of each expression image;
taking the squared error as the loss function and differentiating it to obtain the gradient that is fitted at each iteration step:

$$r_{rk} = \Delta S_r^{(t)} - f_{k-1}(P_r, \hat{S}_r^{(t)})$$

where $f_{k-1}$ is the regression function of the previous round, $\Delta S_r^{(t)}$ is the residual amount, and $\hat{S}_r^{(t)}$ is the current estimate of $S_r$;
for $k = 1, \ldots, K$, constructing a regression tree $\lambda_k$ from the weak classifiers $G$, where $K$ is the number of weak classifiers $G$, and updating:

$$f_k(P, \hat{S}^{(t)}) = f_{k-1}(P, \hat{S}^{(t)}) + \rho\, \lambda_k(P, \hat{S}^{(t)})$$

Finally $\lambda_t = f_K$ is obtained, completing the construction of the regression model.
Preferably, in step 3), the facial expression representation model computes the corresponding Euclidean distances to obtain the six-element array characterizing the facial expression features.
Preferably, computing the corresponding Euclidean distances with the facial expression representation model specifically comprises: classifying and discriminating with the facial expression representation model, measured by Euclidean distance, and computing the six-element array $D = (d_1, d_2, d_3, d_4, d_5, d_6)$, where $d_1$ is the distance between the two eyebrows, $d_2$ the distance between eyebrow and eye, $d_3$ the distance between the upper and lower eyelid boundaries, $d_4$ the height of the mouth, $d_5$ the width of the mouth, and $d_6$ the distance from the mouth corner to the highest point of the upper lip.
Preferably, in step 4), a random forest algorithm trains the classification model, and the six-element array is fed into the trained model for classification and recognition.
Preferably, the classification and recognition specifically comprise the following steps:
Step 1: select the expression database fer2013 for training and randomly draw a subset of the samples and a subset of the attributes;
Step 2: determine the splitting attribute among the candidate attributes with the Gini coefficient, generate the nodes, grow a CART decision tree, and form a random forest from the multiple decision trees generated;
Step 3: after a sample is input, the forest produces N classification results; a voting mechanism tallies the classification results obtained for all input samples, and the class with the most votes is the output recognition result.
Preferably, determining the splitting attribute among the candidate attributes with the Gini coefficient specifically comprises:
letting the sample set $X$ contain $N$ classes; the Gini coefficient is defined as:

$$\mathrm{Gini}(X) = 1 - \sum_{i=1}^{N} \sigma_i^2$$

where $\sigma_i$ is the frequency of class $i$ in the sample set $X$; if $X$ is split under the selected attribute $x$ into two sample subsets $X_1$ and $X_2$, the weighted sum of the Gini coefficients of the two subsets after the split is:

$$\mathrm{Gini}_{split(x)}(X) = \frac{M_1}{M}\,\mathrm{Gini}(X_1) + \frac{M_2}{M}\,\mathrm{Gini}(X_2)$$

where $M_1$ and $M_2$ are the numbers of samples in $X_1$ and $X_2$, and $M$ is the number of samples in $X$; the Gini gain is:

$$\Delta\mathrm{Gini} = \mathrm{Gini}(X) - \mathrm{Gini}_{split(x)}(X)$$

At each splitting node, the attribute with the smallest split Gini coefficient is selected as the splitting attribute.
The invention preprocesses the original image with a Haar-feature-based Adaboost cascade detector, establishes the pose mapping with a cascaded regression tree algorithm to achieve face alignment, obtains the facial expression features with the facial expression representation model, and finally classifies and recognizes with a random forest algorithm. It thereby extracts facial expression features accurately at a deflection angle, improves recognition efficiency, and has definite practical value.
Compared with the prior art, the invention has the following beneficial effects:
(1) Features are extracted with the cascaded regression tree algorithm, which overcomes the low localization accuracy of directly extracting face key points; the face is first aligned through regression iteration and the features are then extracted, effectively improving the extraction accuracy of non-frontal expression features.
(2) A random forest algorithm performs the classification and recognition, improving recognition efficiency, shortening running time, and enhancing practical value.
(3) The deflection range of facial expression recognition is enlarged, broadening the scenes to which the system applies.
Drawings
Fig. 1 is a schematic flow chart of the enhanced recognition method for facial expression images according to this embodiment.
Detailed Description
To help those skilled in the art better understand the technical solution of the invention, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Clearly, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
This embodiment discloses an enhanced recognition method for facial expression images that recognizes expression images in the standard pose, performs well on expression images with a certain amount of head deflection, reduces algorithmic complexity through ensemble learning, increases running speed, and is better suited to real-world scenes.
As shown in fig. 1, the method for enhancing and identifying a facial expression image in this embodiment specifically includes the following steps:
1) Perform face localization with a Haar-feature-based Adaboost cascade detector, frame and crop the face region, apply gray-level normalization and scale normalization to the cropped image, and store it. The specific process is as follows:
First obtain the Haar features of the image and compute the Haar feature values by traversal with an integral image, then use them as classifier input. Give every input sample the same initial weight and train a weak classifier; select the weak classifier with the smallest error as the optimal weak classifier of the current round; compute the weight of the next optimal weak classifier from the error between the predicted and true values of the current one; after multiple iterations, combine the N optimal weak classifiers by weighting into a strong classifier; finally cascade several strong classifiers for detection and localization.
Locate and frame the face region, then expand the face frame outward by a suitable proportion to ensure a complete face image is captured. After cropping, preprocess the image with scale normalization and gray-level normalization: shrink the image moderately to increase its smoothness and clarity, and convert it to grayscale by weighted averaging, preserving the image's information as accurately as possible while reducing the computational load.
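As a hedged illustration of the preprocessing just described (frame expansion, weighted-average graying, scale normalization), the following Python sketch uses only NumPy. The Haar/Adaboost detector itself is assumed to come from an external library such as OpenCV's `cv2.CascadeClassifier` and is not reimplemented; the 0.299/0.587/0.114 graying weights are the common BT.601 luma coefficients, an assumption since the patent does not state its weights.

```python
import numpy as np

def to_gray_weighted(img_rgb):
    """Weighted-average graying: luma = 0.299 R + 0.587 G + 0.114 B.
    The exact weights are an assumption (BT.601); the patent only says
    'weighted average graying'."""
    weights = np.array([0.299, 0.587, 0.114])
    return img_rgb.astype(np.float64) @ weights

def expand_box(x, y, w, h, ratio, img_w, img_h):
    """Expand a detected face box outward by `ratio` on each side,
    clipped to the image, so the whole face is kept before cropping."""
    dx, dy = int(w * ratio), int(h * ratio)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return x0, y0, x1, y1

def scale_normalize(gray, out_size=64):
    """Crude scale normalization by block averaging to out_size x out_size.
    (A real pipeline would use cv2.resize; this keeps the sketch NumPy-only.)"""
    h, w = gray.shape
    ys = np.arange(out_size + 1) * h // out_size
    xs = np.arange(out_size + 1) * w // out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = gray[ys[i]:ys[i + 1], xs[i]:xs[i + 1]].mean()
    return out

# Tiny demo: a synthetic 128x128 RGB "face crop" stands in for a detection
img = np.random.default_rng(0).integers(0, 256, (128, 128, 3))
gray = to_gray_weighted(img)
face = scale_normalize(gray, out_size=64)
box = expand_box(10, 10, 20, 20, 0.25, 100, 100)
```

The output size of 64 and the expansion ratio of 0.25 are illustrative choices, not values from the patent.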
2) Establish the mapping between face appearance and face shape with a cascaded regression tree algorithm in the regression model, achieve face alignment, and extract 68 facial feature points. The specific steps are as follows:
The algorithm comprises a two-layer regression process. The 300-W database is selected as the training sample set. Define $a_i = (x_i, y_i)$, $i = 1, 2, \ldots, 68$, as the coordinates of the feature points in one picture $P$ of the sample set, and define $S = (a_1^\top, a_2^\top, \ldots, a_m^\top)$, $m = 68$, as the coordinate vector of all feature points in one picture, called the shape. In the first-layer regression, the regression iteration formulas are:

$$\hat{S}^{(t+1)} = \hat{S}^{(t)} + \lambda_t(P, \hat{S}^{(t)})$$

$$\Delta S^{(t+1)} = S_z - \hat{S}^{(t+1)}$$

where $\lambda_t$ is one regressor in a regression model formed by cascading several regressors, $\hat{S}^{(t)}$ is the current estimate of $S$, $\lambda_t(\cdot)$ is the update vector predicted by the cascaded regressor from the image, $S_z$ is the true face shape, and $\Delta S^{(t+1)}$ is the residual. The current face shape and the sample image are fed into the regression model, which predicts an update vector to obtain a new shape estimate (the new current face shape) and its residual, i.e., the difference between the current shape and the true shape. The regression model is updated iteratively according to the current residual, continually reducing it, gradually approaching the true face shape, and finally extracting the facial feature points accurately.
The second-layer regression trains the regression model formed by the N regressors. The training process is specifically:
Define the sample set $(P_1, S_1), \ldots, (P_n, S_n)$ in the training database, where $P_r$ ($r = 1, 2, \ldots, n$) are the expression images in the sample set and $S_r$ ($r = 1, 2, \ldots, n$) is the shape vector corresponding to each expression image.
Input the training sample images, the initial shape estimates, and the residual amounts, with learning rate $\rho$. The initialized regression function $f_0$ is:

$$f_0(P, \hat{S}^{(t)}) = \arg\min_{C} \sum_{r=1}^{N} \left\| \Delta S_r^{(t)} - C \right\|^2$$

where $C$ is the constant that minimizes the initial prediction loss function, $\Delta S_r^{(t)}$ is the residual amount, and $N = nR$, where $R$ is the initialization multiple of each expression image.
Take the squared error as the loss function and differentiate it to obtain the gradient that is fitted at each iteration step:

$$r_{rk} = \Delta S_r^{(t)} - f_{k-1}(P_r, \hat{S}_r^{(t)})$$

where $f_{k-1}$ is the regression function of the previous round, $\Delta S_r^{(t)}$ is the residual amount, and $\hat{S}_r^{(t)}$ is the current estimate of $S_r$.
For $k = 1, \ldots, K$, construct a regression tree $\lambda_k$ from the weak classifiers $G$, where $K$ is the number of weak classifiers $G$, and update:

$$f_k(P, \hat{S}^{(t)}) = f_{k-1}(P, \hat{S}^{(t)}) + \rho\, \lambda_k(P, \hat{S}^{(t)})$$

Finally $\lambda_t = f_K$ is obtained, completing the construction of the regression model.
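The boosting loop above — a constant initialization $f_0$, residual targets $r_{rk}$, and shrinkage by $\rho$ — can be sketched numerically. This is a hedged toy version: the patent's weak regressors are regression trees over image features, whereas here each weak regressor is a depth-1 stump on synthetic features, and the target is one-dimensional for brevity, so only the residual-shrinking behavior of the update $f_k = f_{k-1} + \rho\,\lambda_k$ is demonstrated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))              # stand-in image features
y = X[:, 0] * 2.0 + np.sin(X[:, 1])      # stand-in residual targets (1-D ΔS_r)

rho, K = 0.5, 50                         # learning rate ρ and number of weak regressors K
f = np.full(n, y.mean())                 # f_0: the constant C minimizing Σ ||ΔS_r − C||²
stumps = []

for k in range(K):
    r = y - f                            # gradient targets r_rk = ΔS_r − f_{k−1}
    j = k % p                            # cycle through features (toy feature selection)
    thr = np.median(X[:, j])
    left = X[:, j] <= thr
    # Depth-1 regression tree λ_k: predict the mean residual of each leaf
    g = np.where(left, r[left].mean(), r[~left].mean())
    stumps.append((j, thr))
    f = f + rho * g                      # f_k = f_{k−1} + ρ λ_k

err0 = np.mean((y - y.mean()) ** 2)      # loss of the constant model f_0
errK = np.mean((y - f) ** 2)             # loss after K boosting rounds
```

Because each leaf-mean stump has non-negative correlation with the residual, the training loss decreases monotonically across rounds, which is the convergence behavior the iteration formulas describe.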
After the model file has been trained, inputting an expression picture yields the 68 accurately extracted facial feature points.
3) Compute the corresponding Euclidean distances with the facial expression representation model to obtain the six-element array characterizing the facial expression features, specifically:
Since individual expressions can be classified and discriminated by the structural characteristics of facial expressions, a facial expression representation model measured by Euclidean distance is adopted. The extracted facial feature points are fed into the model, which computes the six-element array $D = (d_1, d_2, d_3, d_4, d_5, d_6)$, where $d_1$ is the distance between the two eyebrows, $d_2$ the distance between eyebrow and eye, $d_3$ the distance between the upper and lower eyelid boundaries, $d_4$ the height of the mouth, $d_5$ the width of the mouth, and $d_6$ the distance from the mouth corner to the highest point of the upper lip.
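A hedged sketch of computing $D = (d_1, \ldots, d_6)$ from 68 landmarks follows. The landmark indices use the common iBUG 300-W 68-point convention (eyebrows 17-26, eyes 36-47, mouth 48-67); the patent does not state its indexing, so every index choice below is an assumption for illustration.

```python
import numpy as np

def dist(pts, i, j):
    """Euclidean distance between landmarks i and j."""
    return float(np.linalg.norm(pts[i] - pts[j]))

def expression_features(pts):
    """Six-element array D from a (68, 2) landmark array.
    All index choices are assumptions following the iBUG 300-W layout."""
    d1 = dist(pts, 21, 22)   # gap between the two inner eyebrow ends
    d2 = dist(pts, 19, 37)   # eyebrow to upper eyelid (left side)
    d3 = dist(pts, 37, 41)   # upper vs lower eyelid boundary (left eye)
    d4 = dist(pts, 51, 57)   # mouth height: upper lip to lower lip
    d5 = dist(pts, 48, 54)   # mouth width: corner to corner
    d6 = dist(pts, 48, 51)   # mouth corner to highest point of upper lip
    return np.array([d1, d2, d3, d4, d5, d6])

# Demo with synthetic landmarks in a 200x200 image frame
rng = np.random.default_rng(0)
pts = rng.uniform(0, 200, size=(68, 2))
D = expression_features(pts)
```

In practice one would normalize these distances (e.g., by inter-ocular distance) to remove scale effects, though the patent does not specify such a step.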
4) Train the classification model with a random forest algorithm and feed the six-element array into the trained model for classification and recognition, specifically:
The facial expression library fer2013 is selected for training. Most images in this library exhibit in-plane and out-of-plane rotation, and many are occluded by hands, hair, scarves, and the like, so the library is closer to real-life situations. The specific training process with the random forest algorithm is:
Step 1: select the expression database fer2013 for training; after preprocessing and feature extraction on the expression images of the fer2013 database, randomly draw a subset of the samples and a subset of the attributes. Step 2: determine the splitting attribute among the candidate attributes with the Gini coefficient, generate the nodes, grow CART decision trees, and form a random forest from the multiple decision trees generated. Determining the splitting attribute with the Gini coefficient proceeds as follows:
Let the sample set $X$ contain $N$ classes. The Gini coefficient is defined as:

$$\mathrm{Gini}(X) = 1 - \sum_{i=1}^{N} \sigma_i^2$$

where $\sigma_i$ is the frequency of class $i$ in the sample set $X$. If $X$ is split under the selected attribute $x$ into two sample subsets $X_1$ and $X_2$, the weighted sum of the Gini coefficients of the two subsets after the split is:

$$\mathrm{Gini}_{split(x)}(X) = \frac{M_1}{M}\,\mathrm{Gini}(X_1) + \frac{M_2}{M}\,\mathrm{Gini}(X_2)$$

where $M_1$ and $M_2$ are the numbers of samples in $X_1$ and $X_2$, and $M$ is the number of samples in $X$. The Gini gain is:

$$\Delta\mathrm{Gini} = \mathrm{Gini}(X) - \mathrm{Gini}_{split(x)}(X)$$

At each splitting node, the attribute with the smallest split Gini coefficient is selected as the splitting attribute.
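The Gini formulas above translate directly into code. This is a minimal illustration on a toy label set, not the patent's implementation:

```python
from collections import Counter

def gini(labels):
    """Gini(X) = 1 - sum(sigma_i^2), with sigma_i the class-i frequency in X."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left_labels, right_labels):
    """Weighted Gini of a binary split: (M1/M)·Gini(X1) + (M2/M)·Gini(X2).
    The best split is the one minimizing this value (maximizing the gain)."""
    m1, m2 = len(left_labels), len(right_labels)
    m = m1 + m2
    return (m1 / m) * gini(left_labels) + (m2 / m) * gini(right_labels)

labels = ["happy"] * 5 + ["sad"] * 5
# A perfect split isolates each class -> split Gini of 0
pure_split = gini_split(["happy"] * 5, ["sad"] * 5)
# A useless split leaves both subsets mixed -> higher split Gini
mixed_split = gini_split(["happy", "sad", "happy", "sad", "happy"],
                         ["sad", "happy", "sad", "happy", "sad"])
```

A CART node would evaluate `gini_split` for every candidate attribute and threshold and keep the minimum, exactly as the text describes.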
Step 3: after a sample is input, the forest produces N classification results; a voting mechanism tallies the classification results obtained for all input samples, and the class with the most votes is the output recognition result.
After the optimal model parameters are determined, the obtained six-element array is fed into the model to obtain the final classification and recognition result.
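Step 4 as a whole — training a random forest of Gini-split CART trees on six-element arrays and predicting by majority vote — can be sketched with scikit-learn. The library choice is an assumption (the patent names no implementation), and the synthetic data below merely stands in for features extracted from fer2013: two expression classes separated mainly by mouth height $d_4$.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic six-element arrays (d1..d6) for two hypothetical classes:
# class 0 has a small mouth height d4, class 1 a large one.
n = 100
X0 = rng.normal(loc=[30, 15, 8, 5, 40, 10], scale=1.0, size=(n, 6))
X1 = rng.normal(loc=[30, 15, 8, 25, 40, 10], scale=1.0, size=(n, 6))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

# Each tree is a CART grown on a bootstrap sample with random feature
# subsets; the forest predicts by majority vote over the trees.
forest = RandomForestClassifier(n_estimators=50, criterion="gini", random_state=0)
forest.fit(X, y)

probe = np.array([[30, 15, 8, 24, 40, 10]])   # large mouth height -> class 1
pred = int(forest.predict(probe)[0])
acc = forest.score(X, y)
```

With `criterion="gini"` the split selection matches the Gini-gain rule of step 2; the number of trees (50) is an illustrative choice, not a value from the patent.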
The above embodiments are preferred embodiments of the invention, but the embodiments of the invention are not limited to them. Any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the invention shall be deemed an equivalent replacement and is included in the protection scope of the invention.

Claims (10)

1. An enhanced recognition method for facial expression images, characterized by comprising the following steps:
1) performing face localization with an Adaboost cascade detector based on Haar features, framing and cropping the face region, preprocessing the cropped image, and storing it;
2) establishing a mapping between face appearance and face shape with a regression model, achieving face alignment, and extracting the facial feature points;
3) computing the Euclidean distances corresponding to the facial feature points to obtain a six-element array characterizing the facial expression features;
4) feeding the six-element array into a trained classification model for classification and recognition.
2. The enhanced recognition method for facial expression images according to claim 1, characterized in that in step 1) the face localization with the Haar-feature-based Adaboost cascade detector specifically comprises:
first obtaining the Haar features of the image and computing the Haar feature values by traversal with an integral image, then using the feature values as classifier input; giving every input sample the same initial weight and training a weak classifier; selecting the weak classifier with the smallest error as the optimal weak classifier of the current round; computing the weight of the next optimal weak classifier from the error between the predicted and true values of the current one; after multiple iterations, combining the N optimal weak classifiers by weighting into a strong classifier; and finally cascading several strong classifiers for detection and localization.
3. The method for enhanced recognition of facial expression images according to claim 2, wherein step 2) establishes the mapping relation between the face appearance and the face shape by using a cascaded regression tree algorithm as the regression model, achieves face alignment, and extracts h facial feature points.
4. The method of claim 3, wherein extracting the h facial feature points specifically comprises:
selecting the 300-W database as the training sample set, and defining a_i = (x_i, y_i) (i = 1, 2, ..., h) as the coordinates of the i-th feature point in a picture P of the sample set; S = (a_1^T, a_2^T, ..., a_m^T), with m = h, is the coordinate vector of all feature points in one picture and is called the shape; the regression iteration formulas are:

Ŝ^(t+1) = Ŝ^(t) + λ_t(P, Ŝ^(t))

ΔS^(t+1) = S_z − Ŝ^(t+1)

wherein λ_t is one regressor of the regression model formed by cascading a plurality of regressors, Ŝ^(t) is the current estimate of S, λ_t(P, Ŝ^(t)) is the update vector predicted by the cascaded regressor from the image and the current estimate, S_z is the true face shape, and ΔS^(t+1) is the residual; the current face shape and the sample image are input into the regression model, which predicts an update vector to obtain a new shape estimate, i.e. the new current face shape, and the residual of the current face shape, i.e. the difference between the current shape and the true shape; the regression model is iteratively updated according to the current residual, so that the residual decreases continuously, the estimate gradually approaches the true face shape, and finally the facial feature points are extracted accurately.
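The iteration loop of claim 4 can be sketched with a stand-in regressor: for illustration, each stage predicts a fixed fraction of the true residual, where a real cascade would use learned regression trees. The shapes below are invented toy vectors, not real landmark data:

```python
import numpy as np

def cascade_align(S0, S_true, n_stages=10, step=0.5):
    """Toy cascade: each stage adds a predicted update vector to the
    current shape estimate, shrinking the residual S_true - S_hat.
    The update here is a stub standing in for lambda_t(P, S_hat)."""
    S_hat = S0.copy()
    residual_norms = []
    for _ in range(n_stages):
        update = step * (S_true - S_hat)   # stand-in for the learned regressor
        S_hat = S_hat + update
        residual_norms.append(np.linalg.norm(S_true - S_hat))
    return S_hat, residual_norms

S0 = np.zeros(10)                    # initial (mean) shape: 5 points, flattened
S_true = np.linspace(1.0, 2.0, 10)   # toy "ground-truth" shape S_z
S_hat, residuals = cascade_align(S0, S_true)
```

The residual norm shrinks monotonically stage by stage, mirroring the claim's statement that the estimate gradually approaches the true shape.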
5. The method of claim 4, wherein the training process of the regression model formed by the cascaded regressors specifically comprises:
defining the sample set (P_1, S_1), ..., (P_n, S_n), wherein P_r (r = 1, 2, ..., n) are the expression images in the sample set and S_r (r = 1, 2, ..., n) is the shape vector corresponding to each expression image;
inputting the training sample images, the initial shape estimates and the residuals, with learning rate ρ; the initialised regression function f_0 is:

f_0 = argmin_C Σ_{r=1}^{N} ||ΔS_r^(0) − C||²

wherein C is the constant minimising the initial prediction loss function, N = nR, and R is the number of initialisations of each expression image;
taking the squared error as the loss function, the gradient obtained by differentiating the loss function serves as the fitting target in each iteration step:

g_r = ΔS_r^(t) − f_{k−1}(P_r, Ŝ_r^(t)), r = 1, 2, ..., N

wherein f_{k−1} is the current regression function, ΔS_r^(t) is the residual, and Ŝ_r^(t) is the current estimate of S_r;
for k = 1, ..., K, a regression tree λ_k is constructed on the basis of a weak classifier G fitted to g_r, K being the number of weak classifiers G, and the regression function is updated:

f_k = f_{k−1} + ρ λ_k

finally obtaining λ_t = f_K and completing the construction of the regression model.
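The gradient-boosting construction of claim 5 can be sketched on one-dimensional toy data, with depth-1 regression stumps standing in for the regression trees λ_k; all data and the stump learner are assumptions made for illustration:

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a depth-1 regression tree (stump) on 1-D inputs by scanning
    thresholds and minimising the squared error of the two leaf means."""
    best = None
    for thr in np.unique(x):
        left, right = residual[x <= thr], residual[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= thr, left.mean(), right.mean())
        sse = ((residual - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, lv, rv = best
    return lambda q: np.where(q <= thr, lv, rv)

def boost(x, y, rho=0.5, K=20):
    """Gradient boosting with squared loss: start from the constant C
    minimising the loss (the mean), then repeatedly fit a stump to the
    current residual (the negative gradient) and step by rho."""
    pred = np.full_like(y, y.mean())
    for _ in range(K):
        tree = fit_stump(x, y - pred)     # fit the residual g_r
        pred = pred + rho * tree(x)       # f_k = f_{k-1} + rho * lambda_k
    return pred

x = np.arange(8, dtype=float)
y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
pred = boost(x, y)
```

Each round halves the remaining residual on this toy step function, so the fitted values converge to the targets.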
6. The method for enhanced recognition of facial expression images according to claim 5, wherein step 3) calculates the corresponding Euclidean distances by using a facial expression representation model to obtain the six-element array characterizing the facial expression features.
7. The method of claim 6, wherein calculating the Euclidean distances using the facial expression representation model specifically comprises: performing classification and discrimination through the facial expression representation model, measured by Euclidean distance, to compute the six-element array D = (d1, d2, d3, d4, d5, d6), wherein d1 denotes the distance between the two eyebrows, d2 the distance between eyebrow and eye, d3 the distance between the upper and lower boundaries of the eye, d4 the height of the mouth, d5 the width of the mouth, and d6 the distance from the mouth corner to the highest point of the upper lip.
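The six-element array of claim 7 can be sketched as follows; the landmark names and coordinates below are hypothetical stand-ins for the h extracted feature points, not the patent's actual landmark scheme:

```python
import math

def euclid(p, q):
    """Euclidean distance between two 2-D landmark coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def expression_vector(pts):
    """Build D = (d1..d6) from a dict of landmark coordinates.
    The keys are invented names for the relevant feature points."""
    return (
        euclid(pts["brow_l_inner"], pts["brow_r_inner"]),  # d1: between eyebrows
        euclid(pts["brow_l_inner"], pts["eye_l_top"]),     # d2: eyebrow to eye
        euclid(pts["eye_l_top"], pts["eye_l_bottom"]),     # d3: eye opening
        euclid(pts["lip_top"], pts["lip_bottom"]),         # d4: mouth height
        euclid(pts["mouth_l"], pts["mouth_r"]),            # d5: mouth width
        euclid(pts["mouth_l"], pts["lip_top"]),            # d6: corner to upper lip
    )

# toy landmark coordinates (pixels) for one detected face
pts = {
    "brow_l_inner": (40, 30), "brow_r_inner": (60, 30),
    "eye_l_top": (40, 40), "eye_l_bottom": (40, 46),
    "lip_top": (50, 70), "lip_bottom": (50, 82),
    "mouth_l": (42, 76), "mouth_r": (58, 76),
}
D = expression_vector(pts)
```

The resulting tuple is the feature vector that claim 8 feeds into the trained classifier.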
8. The method for enhanced recognition of facial expression images according to claim 7, wherein in step 4), a random forest algorithm is used to train the classification model, and the six-element array is input into the trained model to realize classification and recognition.
9. The method for enhanced recognition of facial expression images according to claim 8, wherein the classification and recognition specifically comprise the following steps:
step one, selecting the fer2013 expression database for training and randomly extracting a subset of the samples and a subset of the attributes;
step two, determining the splitting attribute from the candidate attributes by using the Gini coefficient, generating nodes to build a CART decision tree, and forming a random forest from the multiple generated decision trees;
step three, after a sample is input, the forest produces N classification results; a voting mechanism is applied to the classification results of all the trees, and the class with the most votes is the output recognition result.
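The voting mechanism of step three can be sketched as a simple majority vote over per-tree outputs; the tree outputs below are invented for illustration:

```python
from collections import Counter

def forest_predict(tree_outputs):
    """Majority vote over the class labels emitted by the individual
    decision trees; the most-voted label is the recognised expression."""
    votes = Counter(tree_outputs)
    return votes.most_common(1)[0][0]

# hypothetical outputs of N = 7 decision trees for one input sample
label = forest_predict(["happy", "happy", "neutral", "happy",
                        "sad", "happy", "neutral"])
```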
10. The method for enhanced recognition of facial expression images according to claim 9, wherein determining the splitting attribute from the candidate attributes by using the Gini coefficient specifically comprises:
letting the sample set X contain N classes, the Gini coefficient being defined as:

Gini(X) = 1 − Σ_{i=1}^{N} σ_i²

wherein σ_i denotes the frequency of occurrence of class i in the sample set X; if X is divided under the selected attribute x into two sample subsets X_1 and X_2, the weighted sum of the Gini coefficients of the two subsets is:

Gini_split(x)(X) = (M_1 / M) · Gini(X_1) + (M_2 / M) · Gini(X_2)

wherein M_1 and M_2 are the numbers of samples in X_1 and X_2 respectively, and M is the number of samples in X; the reduction of the Gini coefficient is:

ΔGini = Gini(X) − Gini_split(x)(X)

and at each splitting node the attribute with the minimum Gini_split(x)(X), i.e. the maximum ΔGini, is selected as the splitting attribute.
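The Gini computation of claim 10 can be sketched as follows; the toy labels and candidate splits are invented for illustration:

```python
from collections import Counter

def gini(labels):
    """Gini(X) = 1 - sum_i sigma_i^2, sigma_i the frequency of class i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(labels, mask):
    """Gini(X) - Gini_split(X): the reduction achieved by splitting X
    into the subsets selected / rejected by the boolean mask."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    split = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - split

labels = ["happy", "happy", "sad", "sad"]
perfect = [True, True, False, False]   # separates the two classes exactly
useless = [True, False, True, False]   # leaves both subsets mixed
```

A perfect split achieves the maximum reduction (here 0.5), while a split that leaves both subsets mixed reduces nothing, which is why the node chooses the attribute with the largest ΔGini.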
CN202011377211.9A 2020-11-30 2020-11-30 Enhanced recognition method for facial expression image Active CN112381047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011377211.9A CN112381047B (en) 2020-11-30 2020-11-30 Enhanced recognition method for facial expression image


Publications (2)

Publication Number Publication Date
CN112381047A true CN112381047A (en) 2021-02-19
CN112381047B CN112381047B (en) 2023-08-22

Family

ID=74590391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011377211.9A Active CN112381047B (en) 2020-11-30 2020-11-30 Enhanced recognition method for facial expression image

Country Status (1)

Country Link
CN (1) CN112381047B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631436A (en) * 2016-01-27 2016-06-01 桂林电子科技大学 Face alignment method based on cascade position regression of random forests
CN106682598A (en) * 2016-12-14 2017-05-17 华南理工大学 Multi-pose facial feature point detection method based on cascade regression
CN108108677A (en) * 2017-12-12 2018-06-01 重庆邮电大学 One kind is based on improved CNN facial expression recognizing methods
CN109961006A (en) * 2019-01-30 2019-07-02 东华大学 A kind of low pixel multiple target Face datection and crucial independent positioning method and alignment schemes
CN111523367A (en) * 2020-01-22 2020-08-11 湖北科技学院 Intelligent facial expression recognition method and system based on facial attribute analysis


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065552A (en) * 2021-03-29 2021-07-02 天津大学 Method for automatically positioning head shadow measurement mark point
CN113111789A (en) * 2021-04-15 2021-07-13 山东大学 Facial expression recognition method and system based on video stream
CN113111789B (en) * 2021-04-15 2022-12-20 山东大学 Facial expression recognition method and system based on video stream
CN117437522A (en) * 2023-12-19 2024-01-23 福建拓尔通软件有限公司 Face recognition model training method, face recognition method and device
CN117437522B (en) * 2023-12-19 2024-05-03 福建拓尔通软件有限公司 Face recognition model training method, face recognition method and device

Also Published As

Publication number Publication date
CN112381047B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
US10929649B2 (en) Multi-pose face feature point detection method based on cascade regression
Mahmood et al. WHITE STAG model: Wise human interaction tracking and estimation (WHITE) using spatio-temporal and angular-geometric (STAG) descriptors
CN108268838B (en) Facial expression recognition method and facial expression recognition system
Jiang et al. Multi-layered gesture recognition with Kinect.
Ding et al. Features versus context: An approach for precise and detailed detection and delineation of faces and facial features
Sung et al. Example-based learning for view-based human face detection
US7912246B1 (en) Method and system for determining the age category of people based on facial images
Amor et al. 4-D facial expression recognition by learning geometric deformations
CN112381047B (en) Enhanced recognition method for facial expression image
CN104616316B (en) Personage's Activity recognition method based on threshold matrix and Fusion Features vision word
CN103279768B (en) A kind of video face identification method based on incremental learning face piecemeal visual characteristic
CN106599785B (en) Method and equipment for establishing human body 3D characteristic identity information base
Li et al. Efficient 3D face recognition handling facial expression and hair occlusion
CN103093237B (en) A kind of method for detecting human face of structure based model
CN107392105B (en) Expression recognition method based on reverse collaborative salient region features
More et al. Gait recognition by cross wavelet transform and graph model
Xia et al. Face occlusion detection using deep convolutional neural networks
EP2535787A2 (en) 3D free-form gesture recognition system for character input
Du High-precision portrait classification based on mtcnn and its application on similarity judgement
Bhuyan et al. Trajectory guided recognition of hand gestures having only global motions
CN110516638B (en) Sign language recognition method based on track and random forest
Juang et al. Human posture classification using interpretable 3-D fuzzy body voxel features and hierarchical fuzzy classifiers
Saabni Facial expression recognition using multi Radial Bases Function Networks and 2-D Gabor filters
Khan et al. Suspect identification using local facial attributed by fusing facial landmarks on the forensic sketch
Kelly et al. Recognition of spatiotemporal gestures in sign language using gesture threshold hmms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant