CN115937937A - Facial expression recognition method based on improved residual neural network - Google Patents

Facial expression recognition method based on improved residual neural network

Info

Publication number
CN115937937A
CN115937937A (application CN202211530645.7A)
Authority
CN
China
Prior art keywords
facial expression
neural network
expression recognition
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211530645.7A
Other languages
Chinese (zh)
Inventor
Zhang Xuguang (张旭光)
Zhang Weiguang (张伟光)
Xie Qiangwei (谢强伟)
Fang Yinfeng (方银锋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202211530645.7A
Publication of CN115937937A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image recognition, and discloses a facial expression recognition method based on an improved residual neural network, which comprises the following steps: step 1: preprocessing data; step 2: constructing a facial expression recognition model based on an improved residual neural network; step 3: performing iterative training on the facial expression recognition model; step 4: acquiring a facial expression recognition result based on the trained model. The invention designs two residual modules based on the residual idea, thereby preventing the network degradation problem. The invention introduces the Inception module, which solves the problem of insufficient feature extraction. The invention uses the Mish activation function in place of the common ReLU activation function, further improving the robustness of the model and the accuracy of facial expression recognition. The network constructed by the invention adopts a learning rate decay mechanism, thereby preventing overfitting.

Description

Facial expression recognition method based on improved residual neural network
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a facial expression recognition method based on an improved residual neural network.
Background
With the rapid development of artificial intelligence technology, facial expression recognition has become an important research topic in the field of computer image processing, with wide application prospects in human-computer interaction, safe driving, online education, and other fields. For example, in safe driving, a camera can acquire the driver's facial expression in real time, judge whether the driving state is good, and help avoid accidents; in education, the classroom state of students can be assessed by recognizing their facial expressions, and the teaching method adjusted to achieve a better teaching effect.
Facial expression recognition mainly comprises three parts: preprocessing, feature extraction, and classification. The recognition accuracy of an algorithm depends mainly on the feature extraction method, and feature extraction methods can be divided into traditional methods and deep-learning-based methods. Traditional feature extraction methods include Local Binary Patterns (LBP), the Scale-Invariant Feature Transform (SIFT), and Gabor wavelet transforms. Although traditional facial expression recognition methods can achieve good results, the features they generate and use are shallow features, and deeper high-level semantic features cannot be obtained from the original image. In addition, to obtain a good recognition effect, these traditional algorithms must be combined with hand-crafted features, which often introduce unexpected human factors and errors into the feature extraction and recognition process. The advent of deep learning solves these problems well.
In recent years, with the development of deep learning, deep convolutional neural networks (DCNNs) have made breakthroughs in image classification, recognition, and related fields. As network depth increases, features can be fitted better, but training becomes more difficult due to vanishing and exploding gradients. In addition, many network models still suffer from insufficient feature extraction.
Disclosure of Invention
The invention aims to provide a facial expression recognition method based on an improved residual neural network to solve the above technical problems.
To this end, the specific technical scheme of the facial expression recognition method based on the improved residual neural network is as follows:
A facial expression recognition method based on an improved residual neural network comprises the following steps:
step 1: preprocessing data;
step 2: constructing a facial expression recognition model based on an improved residual neural network;
step 3: performing iterative training on the facial expression recognition model;
step 4: acquiring a facial expression recognition result based on the trained facial expression recognition model.
Further, the step 1 comprises the following steps:
step 1.1: acquiring a face image data set, detecting the face region with the Haar cascade algorithm, cropping the face part, resizing the picture to 48 × 48, and dividing the pictures into a training set and a test set at a ratio of 8:2;
step 1.2: the acquired facial expression data sets are the CK+ expression data set and the KDEF data set. The CK+ data set comprises 593 video sequences from 123 subjects, of which 327 sequences have an expression label on the last frame; the last three pictures of each labeled sequence are selected, for a total of 981 pictures covering 7 emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise. The KDEF data set comprises 4900 pictures, of which 980 frontal face pictures are selected as the experimental data set, covering 7 emotions: anger, neutral, disgust, fear, happiness, sadness, and surprise.
Further, the model of step 2 comprises a residual feature extraction module, an Inception feature extraction module, and a classification module.
Further, the residual feature extraction module comprises 1 convolutional layer, 7 instances of residual module I, and 2 instances of residual module II, where residual module I comprises 2 convolutional layers and 2 batch normalization layers, and residual module II comprises 3 convolutional layers and 3 batch normalization layers;
the residual feature extraction module uses the Mish activation function:
$f(x) = x \cdot \tanh(\ln(1 + e^{x}))$
the skip connection uses element-wise feature addition:
$F_{add}(h, w, c) = F_{a}(h, w, c) + F_{b}(h, w, c)$
where $1 \le h \le H$, $1 \le w \le W$, $1 \le c \le C$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the two input features, and $F_{add}$ denotes the summed feature.
Further, the Inception feature extraction module is composed of 9 Inception modules, and feature fusion uses channel-wise feature concatenation:
$F_{cat}(h, w, c) = F_{a}(h, w, c), \quad 1 \le c \le C_{a}$
$F_{cat}(h, w, C_{a} + c) = F_{b}(h, w, c), \quad 1 \le c \le C_{b}$
where $1 \le h \le H$, $1 \le w \le W$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the features being fused (with $C_{a}$ and $C_{b}$ channels respectively), and $F_{cat}$ denotes the concatenated feature.
Further, the classification module uses a Softmax activation function and a cross-entropy loss function:
$\hat{y}_{i} = \frac{e^{x_{i}}}{\sum_{k=1}^{K} e^{x_{k}}}$
$L = -\sum_{i=1}^{K} y_{i} \ln \hat{y}_{i}$
In the Softmax formula, $i$ indexes the output nodes, $x_{i}$ is the output value of the $i$-th node, and $K$ is the number of output nodes, i.e., the number of classification classes (the summation index $k$ runs from 1 to $K$); the Softmax function converts the multi-class outputs into a probability distribution over $[0, 1]$. In the loss formula, $\hat{y}_{i}$ is the prediction produced by the Softmax function and $y_{i}$ is the true value.
Further, the step 3 comprises the following specific steps:
the input data size of the network is 48×48×1 and the output size is 7×1; the convolution kernel size of the convolutional layers is 3×3, the kernel size of the connection layer is 1×1, and the output dimension is set to 7; the batch size is set to 32, the number of iterations to 200, and the initial learning rate to $0.9 \times 10^{-3}$; the learning rate decays as the number of iterations increases, according to:
$lr_{y} = lr_{x} \times dr$
where $lr_{y}$ is the learning rate after decay, $lr_{x}$ the learning rate before decay, and $dr$ the decay coefficient;
the preset network is trained with the training set, and the hyperparameters are tuned according to performance on the test set to obtain an optimal model.
Further, the step 4 comprises the following specific steps:
the processed test set is taken as input samples, which are fed into the trained near-optimal network to perform facial expression recognition.
The facial expression recognition method based on the improved residual neural network has the following advantages:
1. The invention designs two residual modules based on the residual idea and constructs a 20-layer residual neural network, preventing the network degradation problem.
2. The invention introduces an Inception module and designs a facial expression recognition model based on an improved residual neural network, solving the problem of insufficient feature extraction.
3. The invention uses the Mish activation function in place of the common ReLU activation function, further improving the robustness of the model and the accuracy of facial expression recognition.
4. The network constructed by the invention adopts a learning rate decay mechanism, preventing overfitting.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 is a network architecture diagram of the method of the present invention;
FIG. 3 is a schematic diagram of the residual modules designed in the present invention;
FIG. 4 is a graph of the results of the CK + data set experiments performed by the method of the present invention;
fig. 5 is a graph of experimental results of the method of the present invention on a KDEF dataset.
Detailed Description
To better convey the purpose, structure, and function of the present invention, the facial expression recognition method based on an improved residual neural network is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, the facial expression recognition method based on an improved residual neural network of the present invention includes the following steps:
A face image data set is acquired. To avoid the influence of the picture background on the recognition result, the face region is detected with the Haar cascade algorithm, the face part is cropped, the picture is resized to 48 × 48, and the pictures are divided into a training set and a test set at a ratio of 8:2.
The obtained facial expression data sets are the CK+ expression data set and the KDEF data set. The CK+ data set comprises 593 video sequences from 123 subjects, of which 327 sequences have an expression label on the last frame; the experiment selects the last three pictures of each labeled sequence, for a total of 981 pictures covering 7 emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise.
The KDEF data set comprises 4900 pictures, of which 980 frontal face pictures are selected as the experimental data set, covering 7 emotions: anger, neutral, disgust, fear, happiness, sadness, and surprise.
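As a concrete illustration of this preprocessing step, the sketch below shows a minimal OpenCV pipeline: Haar-cascade face detection, cropping, 48 × 48 grayscale resizing, and an 8:2 split. It is an assumed reconstruction, not code from the patent; the cascade file, helper names, and the fixed random seed are illustrative.

```python
# Minimal sketch of the step-1 preprocessing (assumed OpenCV/scikit-learn
# pipeline; function names, paths, and the random seed are illustrative).
import cv2
from sklearn.model_selection import train_test_split

# Haar cascade shipped with OpenCV for frontal face detection
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(image_path):
    """Detect the face, crop it, and resize to a 48x48 grayscale patch."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    faces = face_cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                      # no face found; skip this sample
    x, y, w, h = faces[0]                # take the first detected face
    return cv2.resize(img[y:y + h, x:x + w], (48, 48))

# X: list of 48x48 face crops, y: integer emotion labels in 0..6
# An 8:2 train/test split as described above:
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, random_state=42)
```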
As shown in fig. 2, a facial expression recognition model based on the improved residual neural network is constructed from the designed residual modules and Inception modules. It is a 20-layer residual neural network consisting of a residual feature extraction module (10 layers), an Inception feature extraction module (9 layers), and a classification module (1 layer).
As shown in fig. 3, the residual feature extraction module includes 1 convolutional layer, 7 instances of residual module I, and 2 instances of residual module II, where residual module I comprises 2 convolutional layers and 2 batch normalization layers, and residual module II comprises 3 convolutional layers and 3 batch normalization layers.
The residual feature extraction module uses the Mish activation function, shown below. Mish's non-monotonicity preserves small negative inputs as negative outputs, which improves expressive capacity and gradient flow.
$f(x) = x \cdot \tanh(\ln(1 + e^{x}))$
The skip connection uses element-wise feature addition:
$F_{add}(h, w, c) = F_{a}(h, w, c) + F_{b}(h, w, c)$
where $1 \le h \le H$, $1 \le w \le W$, $1 \le c \le C$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the two input features, and $F_{add}$ denotes the summed feature.
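For concreteness, the following PyTorch sketch implements residual module I: two 3 × 3 convolutions with batch normalization, the Mish activation, and the additive skip connection defined above. The channel width and the shape-preserving convolutions are illustrative assumptions, since Table 1's actual parameters are not available in this text.

```python
# Sketch of residual module I with Mish (assumed PyTorch realization;
# layer widths are placeholders, not the patent's Table 1 values).
import torch
import torch.nn as nn
import torch.nn.functional as F

def mish(x):
    # f(x) = x * tanh(ln(1 + e^x)); softplus(x) computes ln(1 + e^x)
    return x * torch.tanh(F.softplus(x))

class ResidualBlockI(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = mish(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return mish(out + x)   # skip connection: F_add = F_a + F_b
```

Residual module II would follow the same pattern with a third convolution/batch-normalization pair.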
The Inception feature extraction module is composed of 9 Inception modules. Feature fusion uses channel-wise feature concatenation:
$F_{cat}(h, w, c) = F_{a}(h, w, c), \quad 1 \le c \le C_{a}$
$F_{cat}(h, w, C_{a} + c) = F_{b}(h, w, c), \quad 1 \le c \le C_{b}$
where $1 \le h \le H$, $1 \le w \le W$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the features being fused (with $C_{a}$ and $C_{b}$ channels respectively), and $F_{cat}$ denotes the concatenated feature.
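The concatenation above is what `torch.cat` performs along the channel axis. Below is a hedged sketch of one Inception-style module; the four-branch layout (1 × 1, 3 × 3, 5 × 5, pooled) follows the standard Inception design, and the branch widths are assumptions, as Table 2's values are not recoverable here.

```python
# Assumed Inception-style module; branch widths are illustrative.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch, c1, c3, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)
        self.b3 = nn.Conv2d(in_ch, c3, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, c5, 5, padding=2)
        self.bp = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, cp, 1))

    def forward(self, x):
        # channel-wise concatenation: output has c1 + c3 + c5 + cp channels,
        # matching F_cat defined above
        return torch.cat(
            [self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```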
The classification module comprises 1 fully connected layer and uses a Softmax activation function with a cross-entropy loss function:
$\hat{y}_{i} = \frac{e^{x_{i}}}{\sum_{k=1}^{K} e^{x_{k}}}$
$L = -\sum_{i=1}^{K} y_{i} \ln \hat{y}_{i}$
In the Softmax formula, $i$ indexes the output nodes and $x_{i}$ is the output value of the $i$-th node. $K$ is the number of output nodes, i.e., the number of classification classes, and the summation index $k$ runs from 1 to $K$. The Softmax function converts the multi-class outputs into a probability distribution over $[0, 1]$. In the loss formula, $\hat{y}_{i}$ is the prediction produced by the Softmax function and $y_{i}$ is the true value.
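As a worked illustration of the classification head, the snippet below evaluates the Softmax and cross-entropy formulas for a batch of K = 7 logits; note that PyTorch's `nn.CrossEntropyLoss` fuses the two steps (it applies log-softmax internally), so raw logits are passed to it. The batch size and random inputs are placeholders.

```python
# Softmax + cross-entropy for K = 7 expression classes (illustrative inputs).
import torch
import torch.nn as nn

logits = torch.randn(32, 7)              # batch of 32, K = 7 output nodes
probs = torch.softmax(logits, dim=1)     # y_hat_i = exp(x_i) / sum_k exp(x_k)
targets = torch.randint(0, 7, (32,))     # ground-truth class indices

loss = nn.CrossEntropyLoss()(logits, targets)  # L = -sum_i y_i * ln(y_hat_i)
```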
The parameter settings of the residual modules and the Inception modules are shown in Tables 1 and 2; in Table 1, 1_1 to 1_7 denote the seven instances of residual module I, and 2_1 and 2_2 denote the two instances of residual module II.
Table 1: residual module parameter settings
[Table 1 appears only as an image in the source; its parameter values are not recoverable.]
Table 2: Inception module parameter settings
[Table 2 appears only as an image in the source; its parameter values are not recoverable.]
The input data size of the network is 48×48×1 and the output size is 7×1. The convolution kernel of the convolutional layers is 3×3, the kernel of the connection layer is 1×1, and the output dimension is set to 7. The batch size is set to 32, the number of iterations to 200, and the initial learning rate to $0.9 \times 10^{-3}$. The learning rate decays as the number of iterations increases, according to:
$lr_{y} = lr_{x} \times dr$
where $lr_{y}$ is the learning rate after decay, $lr_{x}$ the learning rate before decay, and $dr$ the decay coefficient. The decay coefficient settings are shown in Table 3.
Table 3: learning rate decay parameters at different iterations
[Table 3 appears only as an image in the source; its values are not recoverable.]
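The decay rule $lr_{y} = lr_{x} \times dr$ is a simple multiplicative schedule. A sketch using PyTorch's `MultiplicativeLR` follows; the coefficient 0.95 is a placeholder, since Table 3's actual per-stage values were lost with the table image.

```python
# Multiplicative learning-rate decay lr <- lr * dr (dr = 0.95 is assumed).
import torch
from torch.optim.lr_scheduler import MultiplicativeLR

net = torch.nn.Linear(48 * 48, 7)   # stand-in for the full 20-layer network
optimizer = torch.optim.SGD(net.parameters(), lr=0.9e-3)
scheduler = MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.95)

for epoch in range(200):            # 200 iterations, as configured above
    # ... one training pass over the 32-sample batches ...
    scheduler.step()                # apply lr_y = lr_x * dr
```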
The preset network is trained with the training set, and the hyperparameters are tuned according to performance on the test set to obtain an optimal model.
The processed test set is used as input samples, which are fed into the trained near-optimal network for facial expression recognition. As shown in fig. 4 and 5, the facial expression recognition model based on the improved residual neural network achieves an accuracy of 96.37% on the CK+ data set and 93.38% on the KDEF data set, higher than other recognition methods; Table 4 compares the recognition rates of 7 methods on the CK+ and KDEF data sets.
Table 4: accuracy of the method of the invention and comparison methods on the CK+ and KDEF data sets
[Table 4 appears only as an image in the source; the per-method figures are not recoverable.]
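A minimal evaluation sketch corresponding to this step is given below; `model` and `test_loader` are assumed to be the trained network and a loader over the preprocessed test set, and the accuracy computation is the standard argmax comparison rather than code from the patent.

```python
# Assumed evaluation loop: accuracy of the trained model on the test set.
import torch

model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:       # batches of 48x48x1 face crops
        preds = model(images).argmax(dim=1)  # predicted expression class
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"test accuracy: {correct / total:.4f}")
```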
It is to be understood that the present invention has been described with reference to certain embodiments, and that various changes in the features and embodiments, or equivalent substitutions may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (8)

1. A facial expression recognition method based on an improved residual neural network, characterized by comprising the following steps:
step 1: preprocessing data;
step 2: constructing a facial expression recognition model based on an improved residual neural network;
step 3: performing iterative training on the facial expression recognition model;
step 4: acquiring a facial expression recognition result based on the trained facial expression recognition model.
2. The method for recognizing facial expressions based on an improved residual neural network as claimed in claim 1, wherein step 1 comprises the following steps:
step 1.1: acquiring a face image data set, detecting the face region with the Haar cascade algorithm, cropping the face part, resizing the picture to 48 × 48, and dividing the pictures into a training set and a test set at a ratio of 8:2;
step 1.2: the acquired facial expression data sets are the CK+ expression data set and the KDEF data set; the CK+ data set comprises 593 video sequences from 123 subjects, of which 327 sequences have an expression label on the last frame; the last three pictures of each labeled sequence are selected, for a total of 981 pictures covering 7 emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise; the KDEF data set comprises 4900 pictures, of which 980 frontal face pictures are selected as the experimental data set, covering 7 emotions: anger, neutral, disgust, fear, happiness, sadness, and surprise.
3. The method of claim 1, wherein the model of step 2 comprises a residual feature extraction module, an Inception feature extraction module, and a classification module.
4. The method for recognizing facial expressions based on an improved residual neural network of claim 3, wherein the residual feature extraction module comprises 1 convolutional layer, 7 instances of residual module I, and 2 instances of residual module II, where residual module I comprises 2 convolutional layers and 2 batch normalization layers, and residual module II comprises 3 convolutional layers and 3 batch normalization layers;
the residual feature extraction module uses the Mish activation function:
$f(x) = x \cdot \tanh(\ln(1 + e^{x}))$
the skip connection uses element-wise feature addition:
$F_{add}(h, w, c) = F_{a}(h, w, c) + F_{b}(h, w, c)$
where $1 \le h \le H$, $1 \le w \le W$, $1 \le c \le C$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the two input features, and $F_{add}$ denotes the summed feature.
5. The method according to claim 3, wherein the Inception feature extraction module comprises 9 Inception modules, and feature fusion uses channel-wise feature concatenation:
$F_{cat}(h, w, c) = F_{a}(h, w, c), \quad 1 \le c \le C_{a}$
$F_{cat}(h, w, C_{a} + c) = F_{b}(h, w, c), \quad 1 \le c \le C_{b}$
where $1 \le h \le H$, $1 \le w \le W$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the features being fused (with $C_{a}$ and $C_{b}$ channels respectively), and $F_{cat}$ denotes the concatenated feature.
6. The method for recognizing facial expressions based on an improved residual neural network according to claim 3, wherein the classification module uses a Softmax activation function and a cross-entropy loss function:
$\hat{y}_{i} = \frac{e^{x_{i}}}{\sum_{k=1}^{K} e^{x_{k}}}$
$L = -\sum_{i=1}^{K} y_{i} \ln \hat{y}_{i}$
In the Softmax formula, $i$ indexes the output nodes, $x_{i}$ is the output value of the $i$-th node, and $K$ is the number of output nodes, i.e., the number of classification classes; the Softmax function converts the multi-class outputs into a probability distribution over $[0, 1]$. In the loss formula, $\hat{y}_{i}$ is the prediction produced by the Softmax function and $y_{i}$ is the true value.
7. The method for recognizing facial expressions based on an improved residual neural network as claimed in claim 1, wherein step 3 comprises the following specific steps:
the input data size of the network is 48×48×1 and the output size is 7×1; the convolution kernel size of the convolutional layers is 3×3, the kernel size of the connection layer is 1×1, and the output dimension is set to 7; the batch size is set to 32, the number of iterations to 200, and the initial learning rate to $0.9 \times 10^{-3}$; the learning rate decays as the number of iterations increases, according to:
$lr_{y} = lr_{x} \times dr$
where $lr_{y}$ is the learning rate after decay, $lr_{x}$ the learning rate before decay, and $dr$ the decay coefficient;
the preset network is trained with the training set, and the hyperparameters are tuned according to performance on the test set to obtain an optimal model.
8. The method for recognizing facial expressions based on an improved residual neural network as claimed in claim 1, wherein step 4 comprises the following specific steps:
the processed test set is taken as input samples, which are fed into the trained near-optimal network to perform facial expression recognition.
CN202211530645.7A (filed 2022-12-01) — Facial expression recognition method based on improved residual neural network — Pending — CN115937937A (en)

Priority Applications (1)

Application CN202211530645.7A — priority/filing date 2022-12-01 — Facial expression recognition method based on improved residual neural network (CN115937937A)

Applications Claiming Priority (1)

Application CN202211530645.7A — priority/filing date 2022-12-01 — Facial expression recognition method based on improved residual neural network (CN115937937A)

Publications (1)

CN115937937A — published 2023-04-07

Family

ID=86549990

Family Applications (1)

CN202211530645.7A — Facial expression recognition method based on improved residual neural network — Pending

Country Status (1)

Country Link
CN (1) CN115937937A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116469151A (en) * 2023-05-11 2023-07-21 山东省人工智能研究院 Facial expression-based generation type AI face detection method
CN116469151B (en) * 2023-05-11 2024-02-02 山东省人工智能研究院 Facial expression-based generation type AI face detection method

Similar Documents

Publication Publication Date Title
CN109992783B (en) Chinese word vector modeling method
Das et al. Sign language recognition using deep learning on custom processed static gesture images
CN108197294B (en) Text automatic generation method based on deep learning
CN108304823B (en) Expression recognition method based on double-convolution CNN and long-and-short-term memory network
CN108830287A (en) The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN109948691A Image description generation method and device based on deep residual network and attention
CN112395393B (en) Remote supervision relation extraction method based on multitask and multiple examples
CN109389166A (en) The depth migration insertion cluster machine learning method saved based on partial structurtes
Patro et al. Robust explanations for visual question answering
CN107909115A (en) A kind of image Chinese subtitle generation method
CN113688894B (en) Fine granularity image classification method integrating multiple granularity features
CN115937937A (en) Facial expression recognition method based on improved residual neural network
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN115659947A (en) Multi-item selection answering method and system based on machine reading understanding and text summarization
CN111460146A (en) Short text classification method and system based on multi-feature fusion
CN112434686B (en) End-to-end misplaced text classification identifier for OCR (optical character) pictures
CN110728144A (en) Extraction type document automatic summarization method based on context semantic perception
CN107506351B (en) Twitter semantic similarity analysis method based on character convolution network
Jadhav et al. Content based facial emotion recognition model using machine learning algorithm
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN115761235A (en) Zero sample semantic segmentation method, system, equipment and medium based on knowledge distillation
Yang et al. Multi-intent text classification using dual channel convolutional neural network
CN114116974A (en) Emotional cause extraction method based on attention mechanism
Htet et al. Real-Time Myanmar Sign Language Recognition Using Deep Learning
CN110543569A (en) Network layer structure for short text intention recognition and short text intention recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination