CN115937937A - Facial expression recognition method based on improved residual neural network - Google Patents

Facial expression recognition method based on improved residual neural network

Info

Publication number
CN115937937A
CN115937937A (application CN202211530645.7A)
Authority
CN
China
Prior art keywords
facial expression
neural network
expression recognition
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211530645.7A
Other languages
Chinese (zh)
Inventor
Zhang Xuguang (张旭光)
Zhang Weiguang (张伟光)
Xie Qiangwei (谢强伟)
Fang Yinfeng (方银锋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202211530645.7A
Publication of CN115937937A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image recognition, and discloses a facial expression recognition method based on an improved residual neural network, which comprises the following steps: step 1: preprocessing data; step 2: constructing a facial expression recognition model based on an improved residual neural network; step 3: performing iterative training on the facial expression recognition model; step 4: acquiring a facial expression recognition result based on the trained model. The invention designs two residual modules based on the residual idea, thereby preventing the network degradation problem. The invention introduces the Inception module, which solves the problem of insufficient feature extraction. The invention uses the Mish activation function in place of the common ReLU activation function, further improving the robustness of the model and the accuracy of facial expression recognition. The network constructed by the invention adopts a learning rate decay mechanism, thereby preventing overfitting.

Description

Facial expression recognition method based on improved residual neural network
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a facial expression recognition method based on an improved residual neural network.
Background
With the rapid development of artificial intelligence technology, facial expression recognition has become an important research topic in the field of computer image processing, with wide application prospects in human-computer interaction, safe driving, online education, and other fields. For example, in safe driving, a camera can acquire the driver's facial expression in real time, judge whether the driving state is good, and help avoid accidents; in education, the classroom state of students can be assessed by recognizing their facial expressions, and the teaching method adjusted to achieve a better teaching effect.
Facial expression recognition mainly comprises three parts: preprocessing, feature extraction, and classification. The recognition accuracy of an algorithm depends mainly on the feature extraction method, and feature extraction methods can be divided into traditional methods and deep-learning-based methods. Traditional feature extraction methods include Local Binary Patterns (LBP), the Scale-Invariant Feature Transform (SIFT), and Gabor wavelet transforms. Although traditional facial expression recognition methods can achieve good results, the features they generate and use are shallow features, and deeper high-level semantic features cannot be obtained from the original image. In addition, to obtain a good recognition effect, these traditional algorithms must be combined with hand-crafted features, which often introduce unexpected human factors and errors into the feature extraction and recognition process. The advent of deep learning solves these problems well.
In recent years, with the development of deep learning, deep convolutional neural networks (DCNNs) have made breakthroughs in image classification, recognition, and related fields. As network depth increases, features can be fitted better, but training becomes more difficult due to vanishing and exploding gradients. In addition, many network models still suffer from insufficient feature extraction.
Disclosure of Invention
The invention aims to provide a facial expression recognition method based on an improved residual neural network to solve the above technical problems.
To this end, the specific technical scheme of the facial expression recognition method based on the improved residual neural network is as follows:
A facial expression recognition method based on an improved residual neural network comprises the following steps:
step 1: preprocessing data;
step 2: constructing a facial expression recognition model based on an improved residual neural network;
step 3: performing iterative training on the facial expression recognition model;
step 4: acquiring a facial expression recognition result based on the trained facial expression recognition model.
Further, the step 1 comprises the following steps:
step 1.1: acquiring a face image data set, detecting the face region with the Haar cascade algorithm, cropping the face part, resizing the picture to 48 × 48, and dividing the pictures into a training set and a test set at a ratio of 8:2;
step 1.2: the acquired facial expression data sets are the CK+ expression data set and the KDEF data set. The CK+ data set comprises 593 video sequences from 123 subjects, of which 327 sequences have an expression label on the last frame; the last three pictures of each labeled sequence are selected, for a total of 981 pictures covering 7 emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise. The KDEF data set comprises 4900 pictures, of which 980 frontal face pictures are selected as the experimental data set, covering 7 emotions: anger, neutral, disgust, fear, happiness, sadness, and surprise.
Further, the model of step 2 comprises a residual feature extraction module, an Inception feature extraction module, and a classification module.
Further, the residual feature extraction module comprises 1 convolutional layer, 7 instances of residual module I, and 2 instances of residual module II, where residual module I comprises 2 convolutional layers and 2 batch normalization layers, and residual module II comprises 3 convolutional layers and 3 batch normalization layers;
the residual feature extraction module uses the Mish activation function:
$f(x) = x \cdot \tanh(\ln(1 + e^{x}))$
the skip connection uses element-wise feature addition:
$F_{add}(h, w, c) = F_{a}(h, w, c) + F_{b}(h, w, c)$
where $1 \le h \le H$, $1 \le w \le W$, $1 \le c \le C$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the two input features, and $F_{add}$ denotes the summed feature.
Further, the Inception feature extraction module is composed of 9 Inception modules, and feature fusion uses channel-wise feature concatenation:
$F_{cat}(h, w, c) = F_{a}(h, w, c), \quad 1 \le c \le C_{a}$
$F_{cat}(h, w, C_{a} + c) = F_{b}(h, w, c), \quad 1 \le c \le C_{b}$
where $1 \le h \le H$, $1 \le w \le W$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the features being fused (with $C_{a}$ and $C_{b}$ channels respectively), and $F_{cat}$ denotes the concatenated feature.
Further, the classification module uses a Softmax activation function and a cross-entropy loss function:
$\hat{y}_{i} = \frac{e^{x_{i}}}{\sum_{k=1}^{K} e^{x_{k}}}$
$L = -\sum_{i=1}^{K} y_{i} \ln \hat{y}_{i}$
In the Softmax formula, $i$ indexes the output nodes, $x_{i}$ is the output value of the $i$-th node, and $K$ is the number of output nodes, i.e., the number of classification classes (the summation index $k$ runs from 1 to $K$); the Softmax function converts the multi-class outputs into a probability distribution over $[0, 1]$. In the loss formula, $\hat{y}_{i}$ is the prediction produced by the Softmax function and $y_{i}$ is the true value.
Further, the step 3 comprises the following specific steps:
the input data size of the network is 48×48×1 and the output size is 7×1; the convolution kernel size of the convolutional layers is 3×3, the kernel size of the connection layer is 1×1, and the output dimension is set to 7; the batch size is set to 32, the number of iterations to 200, and the initial learning rate to $0.9 \times 10^{-3}$; the learning rate decays as the number of iterations increases, according to:
$lr_{y} = lr_{x} \times dr$
where $lr_{y}$ is the learning rate after decay, $lr_{x}$ the learning rate before decay, and $dr$ the decay coefficient;
the preset network is trained with the training set, and the hyperparameters are tuned according to performance on the test set to obtain an optimal model.
Further, the step 4 comprises the following specific steps:
the processed test set is taken as input samples, which are fed into the trained near-optimal network to perform facial expression recognition.
The facial expression recognition method based on the improved residual neural network has the following advantages:
1. The invention designs two residual modules based on the residual idea and constructs a 20-layer residual neural network, preventing the network degradation problem.
2. The invention introduces an Inception module and designs a facial expression recognition model based on an improved residual neural network, solving the problem of insufficient feature extraction.
3. The invention uses the Mish activation function in place of the common ReLU activation function, further improving the robustness of the model and the accuracy of facial expression recognition.
4. The network constructed by the invention adopts a learning rate decay mechanism, preventing overfitting.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 is a network architecture diagram of the method of the present invention;
FIG. 3 is a schematic diagram of the residual modules designed in the present invention;
FIG. 4 is a graph of the results of the CK + data set experiments performed by the method of the present invention;
fig. 5 is a graph of experimental results of the method of the present invention on a KDEF dataset.
Detailed Description
To better convey the purpose, structure, and function of the present invention, the facial expression recognition method based on an improved residual neural network is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, the facial expression recognition method based on an improved residual neural network of the present invention includes the following steps:
A face image data set is acquired. To avoid the influence of the picture background on the recognition result, the face region is detected with the Haar cascade algorithm, the face part is cropped, the picture is resized to 48 × 48, and the pictures are divided into a training set and a test set at a ratio of 8:2.
The obtained facial expression data sets are the CK+ expression data set and the KDEF data set. The CK+ data set comprises 593 video sequences from 123 subjects, of which 327 sequences have an expression label on the last frame; the experiment selects the last three pictures of each labeled sequence, for a total of 981 pictures covering 7 emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise.
The KDEF data set comprises 4900 pictures, of which 980 frontal face pictures are selected as the experimental data set, covering 7 emotions: anger, neutral, disgust, fear, happiness, sadness, and surprise.
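As a concrete illustration of this preprocessing step, the sketch below shows a minimal OpenCV pipeline: Haar-cascade face detection, cropping, 48 × 48 grayscale resizing, and an 8:2 split. It is an assumed reconstruction, not code from the patent; the cascade file, helper names, and the fixed random seed are illustrative.

```python
# Minimal sketch of the step-1 preprocessing (assumed OpenCV/scikit-learn
# pipeline; function names, paths, and the random seed are illustrative).
import cv2
from sklearn.model_selection import train_test_split

# Haar cascade shipped with OpenCV for frontal face detection
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(image_path):
    """Detect the face, crop it, and resize to a 48x48 grayscale patch."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    faces = face_cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                      # no face found; skip this sample
    x, y, w, h = faces[0]                # take the first detected face
    return cv2.resize(img[y:y + h, x:x + w], (48, 48))

# X: list of 48x48 face crops, y: integer emotion labels in 0..6
# An 8:2 train/test split as described above:
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, random_state=42)
```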
As shown in fig. 2, a facial expression recognition model based on the improved residual neural network is constructed from the designed residual modules and Inception modules. It is a 20-layer residual neural network consisting of a residual feature extraction module (10 layers), an Inception feature extraction module (9 layers), and a classification module (1 layer).
As shown in fig. 3, the residual feature extraction module includes 1 convolutional layer, 7 instances of residual module I, and 2 instances of residual module II, where residual module I comprises 2 convolutional layers and 2 batch normalization layers, and residual module II comprises 3 convolutional layers and 3 batch normalization layers.
The residual feature extraction module uses the Mish activation function, shown below. Mish's non-monotonicity preserves small negative inputs as negative outputs, which improves expressive capacity and gradient flow.
$f(x) = x \cdot \tanh(\ln(1 + e^{x}))$
The skip connection uses element-wise feature addition:
$F_{add}(h, w, c) = F_{a}(h, w, c) + F_{b}(h, w, c)$
where $1 \le h \le H$, $1 \le w \le W$, $1 \le c \le C$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the two input features, and $F_{add}$ denotes the summed feature.
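For concreteness, the following PyTorch sketch implements residual module I: two 3 × 3 convolutions with batch normalization, the Mish activation, and the additive skip connection defined above. The channel width and the shape-preserving convolutions are illustrative assumptions, since Table 1's actual parameters are not available in this text.

```python
# Sketch of residual module I with Mish (assumed PyTorch realization;
# layer widths are placeholders, not the patent's Table 1 values).
import torch
import torch.nn as nn
import torch.nn.functional as F

def mish(x):
    # f(x) = x * tanh(ln(1 + e^x)); softplus(x) computes ln(1 + e^x)
    return x * torch.tanh(F.softplus(x))

class ResidualBlockI(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = mish(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return mish(out + x)   # skip connection: F_add = F_a + F_b
```

Residual module II would follow the same pattern with a third convolution/batch-normalization pair.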
The Inception feature extraction module is composed of 9 Inception modules. Feature fusion uses channel-wise feature concatenation:
$F_{cat}(h, w, c) = F_{a}(h, w, c), \quad 1 \le c \le C_{a}$
$F_{cat}(h, w, C_{a} + c) = F_{b}(h, w, c), \quad 1 \le c \le C_{b}$
where $1 \le h \le H$, $1 \le w \le W$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the features being fused (with $C_{a}$ and $C_{b}$ channels respectively), and $F_{cat}$ denotes the concatenated feature.
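The concatenation above is what `torch.cat` performs along the channel axis. Below is a hedged sketch of one Inception-style module; the four-branch layout (1 × 1, 3 × 3, 5 × 5, pooled) follows the standard Inception design, and the branch widths are assumptions, as Table 2's values are not recoverable here.

```python
# Assumed Inception-style module; branch widths are illustrative.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch, c1, c3, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)
        self.b3 = nn.Conv2d(in_ch, c3, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, c5, 5, padding=2)
        self.bp = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, cp, 1))

    def forward(self, x):
        # channel-wise concatenation: output has c1 + c3 + c5 + cp channels,
        # matching F_cat defined above
        return torch.cat(
            [self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```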
The classification module comprises 1 fully connected layer and uses a Softmax activation function with a cross-entropy loss function:
$\hat{y}_{i} = \frac{e^{x_{i}}}{\sum_{k=1}^{K} e^{x_{k}}}$
$L = -\sum_{i=1}^{K} y_{i} \ln \hat{y}_{i}$
In the Softmax formula, $i$ indexes the output nodes and $x_{i}$ is the output value of the $i$-th node. $K$ is the number of output nodes, i.e., the number of classification classes, and the summation index $k$ runs from 1 to $K$. The Softmax function converts the multi-class outputs into a probability distribution over $[0, 1]$. In the loss formula, $\hat{y}_{i}$ is the prediction produced by the Softmax function and $y_{i}$ is the true value.
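As a worked illustration of the classification head, the snippet below evaluates the Softmax and cross-entropy formulas for a batch of K = 7 logits; note that PyTorch's `nn.CrossEntropyLoss` fuses the two steps (it applies log-softmax internally), so raw logits are passed to it. The batch size and random inputs are placeholders.

```python
# Softmax + cross-entropy for K = 7 expression classes (illustrative inputs).
import torch
import torch.nn as nn

logits = torch.randn(32, 7)              # batch of 32, K = 7 output nodes
probs = torch.softmax(logits, dim=1)     # y_hat_i = exp(x_i) / sum_k exp(x_k)
targets = torch.randint(0, 7, (32,))     # ground-truth class indices

loss = nn.CrossEntropyLoss()(logits, targets)  # L = -sum_i y_i * ln(y_hat_i)
```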
The parameter settings of the residual modules and the Inception modules are shown in Tables 1 and 2; in Table 1, 1_1 to 1_7 denote the seven instances of residual module I, and 2_1 and 2_2 denote the two instances of residual module II.
Table 1: residual module parameter settings
[Table 1 appears only as an image in the source; its parameter values are not recoverable.]
Table 2: Inception module parameter settings
[Table 2 appears only as an image in the source; its parameter values are not recoverable.]
The input data size of the network is 48×48×1 and the output size is 7×1. The convolution kernel of the convolutional layers is 3×3, the kernel of the connection layer is 1×1, and the output dimension is set to 7. The batch size is set to 32, the number of iterations to 200, and the initial learning rate to $0.9 \times 10^{-3}$. The learning rate decays as the number of iterations increases, according to:
$lr_{y} = lr_{x} \times dr$
where $lr_{y}$ is the learning rate after decay, $lr_{x}$ the learning rate before decay, and $dr$ the decay coefficient. The decay coefficient settings are shown in Table 3.
Table 3: learning rate decay parameters at different iterations
[Table 3 appears only as an image in the source; its values are not recoverable.]
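The decay rule $lr_{y} = lr_{x} \times dr$ is a simple multiplicative schedule. A sketch using PyTorch's `MultiplicativeLR` follows; the coefficient 0.95 is a placeholder, since Table 3's actual per-stage values were lost with the table image.

```python
# Multiplicative learning-rate decay lr <- lr * dr (dr = 0.95 is assumed).
import torch
from torch.optim.lr_scheduler import MultiplicativeLR

net = torch.nn.Linear(48 * 48, 7)   # stand-in for the full 20-layer network
optimizer = torch.optim.SGD(net.parameters(), lr=0.9e-3)
scheduler = MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.95)

for epoch in range(200):            # 200 iterations, as configured above
    # ... one training pass over the 32-sample batches ...
    scheduler.step()                # apply lr_y = lr_x * dr
```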
The preset network is trained with the training set, and the hyperparameters are tuned according to performance on the test set to obtain an optimal model.
The processed test set is used as input samples, which are fed into the trained near-optimal network for facial expression recognition. As shown in fig. 4 and 5, the facial expression recognition model based on the improved residual neural network achieves an accuracy of 96.37% on the CK+ data set and 93.38% on the KDEF data set, higher than other recognition methods; Table 4 compares the recognition rates of 7 methods on the CK+ and KDEF data sets.
Table 4: accuracy of the method of the invention and comparison methods on the CK+ and KDEF data sets
[Table 4 appears only as an image in the source; the per-method figures are not recoverable.]
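A minimal evaluation sketch corresponding to this step is given below; `model` and `test_loader` are assumed to be the trained network and a loader over the preprocessed test set, and the accuracy computation is the standard argmax comparison rather than code from the patent.

```python
# Assumed evaluation loop: accuracy of the trained model on the test set.
import torch

model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:       # batches of 48x48x1 face crops
        preds = model(images).argmax(dim=1)  # predicted expression class
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"test accuracy: {correct / total:.4f}")
```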
It is to be understood that the present invention has been described with reference to certain embodiments, and that various changes in the features and embodiments, or equivalent substitutions may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (8)

1. A facial expression recognition method based on an improved residual neural network, characterized by comprising the following steps:
step 1: preprocessing data;
step 2: constructing a facial expression recognition model based on an improved residual neural network;
step 3: performing iterative training on the facial expression recognition model;
step 4: acquiring a facial expression recognition result based on the trained facial expression recognition model.
2. The method for recognizing facial expressions based on an improved residual neural network as claimed in claim 1, wherein step 1 comprises the following steps:
step 1.1: acquiring a face image data set, detecting the face region with the Haar cascade algorithm, cropping the face part, resizing the picture to 48 × 48, and dividing the pictures into a training set and a test set at a ratio of 8:2;
step 1.2: the acquired facial expression data sets are the CK+ expression data set and the KDEF data set; the CK+ data set comprises 593 video sequences from 123 subjects, of which 327 sequences have an expression label on the last frame; the last three pictures of each labeled sequence are selected, for a total of 981 pictures covering 7 emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise; the KDEF data set comprises 4900 pictures, of which 980 frontal face pictures are selected as the experimental data set, covering 7 emotions: anger, neutral, disgust, fear, happiness, sadness, and surprise.
3. The method of claim 1, wherein the model of step 2 comprises a residual feature extraction module, an Inception feature extraction module, and a classification module.
4. The method for recognizing facial expressions based on an improved residual neural network of claim 3, wherein the residual feature extraction module comprises 1 convolutional layer, 7 instances of residual module I, and 2 instances of residual module II, where residual module I comprises 2 convolutional layers and 2 batch normalization layers, and residual module II comprises 3 convolutional layers and 3 batch normalization layers;
the residual feature extraction module uses the Mish activation function:
$f(x) = x \cdot \tanh(\ln(1 + e^{x}))$
the skip connection uses element-wise feature addition:
$F_{add}(h, w, c) = F_{a}(h, w, c) + F_{b}(h, w, c)$
where $1 \le h \le H$, $1 \le w \le W$, $1 \le c \le C$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the two input features, and $F_{add}$ denotes the summed feature.
5. The method according to claim 3, wherein the Inception feature extraction module comprises 9 Inception modules, and feature fusion uses channel-wise feature concatenation:
$F_{cat}(h, w, c) = F_{a}(h, w, c), \quad 1 \le c \le C_{a}$
$F_{cat}(h, w, C_{a} + c) = F_{b}(h, w, c), \quad 1 \le c \le C_{b}$
where $1 \le h \le H$, $1 \le w \le W$; the point $(h, w, c)$ indexes a pixel of the feature map, $F_{a}$ and $F_{b}$ denote the features being fused (with $C_{a}$ and $C_{b}$ channels respectively), and $F_{cat}$ denotes the concatenated feature.
6. The method for recognizing facial expressions based on an improved residual neural network according to claim 3, wherein the classification module uses a Softmax activation function and a cross-entropy loss function:
$\hat{y}_{i} = \frac{e^{x_{i}}}{\sum_{k=1}^{K} e^{x_{k}}}$
$L = -\sum_{i=1}^{K} y_{i} \ln \hat{y}_{i}$
In the Softmax formula, $i$ indexes the output nodes, $x_{i}$ is the output value of the $i$-th node, and $K$ is the number of output nodes, i.e., the number of classification classes; the Softmax function converts the multi-class outputs into a probability distribution over $[0, 1]$. In the loss formula, $\hat{y}_{i}$ is the prediction produced by the Softmax function and $y_{i}$ is the true value.
7. The method for recognizing facial expressions based on an improved residual neural network as claimed in claim 1, wherein step 3 comprises the following specific steps:
the input data size of the network is 48×48×1 and the output size is 7×1; the convolution kernel size of the convolutional layers is 3×3, the kernel size of the connection layer is 1×1, and the output dimension is set to 7; the batch size is set to 32, the number of iterations to 200, and the initial learning rate to $0.9 \times 10^{-3}$; the learning rate decays as the number of iterations increases, according to:
$lr_{y} = lr_{x} \times dr$
where $lr_{y}$ is the learning rate after decay, $lr_{x}$ the learning rate before decay, and $dr$ the decay coefficient;
the preset network is trained with the training set, and the hyperparameters are tuned according to performance on the test set to obtain an optimal model.
8. The method for recognizing facial expressions based on an improved residual neural network as claimed in claim 1, wherein step 4 comprises the following specific steps:
the processed test set is taken as input samples, which are fed into the trained near-optimal network to perform facial expression recognition.
CN202211530645.7A (filed 2022-12-01) — Facial expression recognition method based on improved residual neural network — Pending — CN115937937A (en)

Priority Applications (1)

Application CN202211530645.7A — priority/filing date 2022-12-01 — Facial expression recognition method based on improved residual neural network (CN115937937A)

Applications Claiming Priority (1)

Application CN202211530645.7A — priority/filing date 2022-12-01 — Facial expression recognition method based on improved residual neural network (CN115937937A)

Publications (1)

CN115937937A — published 2023-04-07

Family

ID=86549990

Family Applications (1)

CN202211530645.7A — Facial expression recognition method based on improved residual neural network — Pending

Country Status (1)

Country Link
CN (1) CN115937937A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116469151A (en) * 2023-05-11 2023-07-21 山东省人工智能研究院 Facial expression-based generation type AI face detection method
CN116469151B (en) * 2023-05-11 2024-02-02 山东省人工智能研究院 Facial expression-based generation type AI face detection method

Similar Documents

Publication Publication Date Title
CN109992783B (en) Chinese word vector modeling method
Das et al. Sign language recognition using deep learning on custom processed static gesture images
CN108197294B (en) Text automatic generation method based on deep learning
CN108304823B (en) Expression recognition method based on double-convolution CNN and long-and-short-term memory network
CN108830287A (en) The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN109948691A Image description generation method and device based on deep residual network and attention
CN112395393B (en) Remote supervision relation extraction method based on multitask and multiple examples
CN109389166A (en) The depth migration insertion cluster machine learning method saved based on partial structurtes
Patro et al. Robust explanations for visual question answering
CN107909115A (en) A kind of image Chinese subtitle generation method
CN113688894B (en) Fine granularity image classification method integrating multiple granularity features
CN115937937A (en) Facial expression recognition method based on improved residual neural network
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN115659947A (en) Multi-item selection answering method and system based on machine reading understanding and text summarization
CN111460146A (en) Short text classification method and system based on multi-feature fusion
CN112434686B (en) End-to-end misplaced text classification identifier for OCR (optical character) pictures
CN110728144A (en) Extraction type document automatic summarization method based on context semantic perception
CN107506351B (en) Twitter semantic similarity analysis method based on character convolution network
Jadhav et al. Content based facial emotion recognition model using machine learning algorithm
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN115761235A (en) Zero sample semantic segmentation method, system, equipment and medium based on knowledge distillation
Yang et al. Multi-intent text classification using dual channel convolutional neural network
CN114116974A (en) Emotional cause extraction method based on attention mechanism
Htet et al. Real-Time Myanmar Sign Language Recognition Using Deep Learning
CN110543569A (en) Network layer structure for short text intention recognition and short text intention recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination