CN110084122B - Dynamic human face emotion recognition method based on deep learning


Info

Publication number
CN110084122B
CN110084122B
Authority
CN
China
Prior art keywords
image
human face
neural network
sequence
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910242066.4A
Other languages
Chinese (zh)
Other versions
CN110084122A (en)
Inventor
吴家皋
张华杰
陈欣宇
周峻全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201910242066.4A
Publication of CN110084122A
Application granted
Publication of CN110084122B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G06V 40/176 Dynamic expression

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dynamic face emotion recognition method based on deep learning, which comprises the following steps: S1, acquiring a face image sequence; S2, extracting the image features of each image in the face image sequence with a VGG convolutional neural network; S3, recognizing the face emotion with an LSTM recurrent neural network, using the image features extracted in S2; S4, repeatedly training the network with a loss function and optimizing the network parameters to construct a complete dynamic face emotion recognition model. The invention focuses on analyzing dynamic face emotion and, by collecting and analyzing a face image sequence, effectively accounts for the staged way in which human emotion unfolds. By combining a VGG convolutional neural network with an LSTM recurrent neural network to process the face image sequence, the invention markedly improves the accuracy of emotion recognition.

Description

Dynamic human face emotion recognition method based on deep learning
Technical Field
The invention relates to a computer information processing method, and in particular to a dynamic face emotion recognition method based on deep learning that combines a VGG network with an LSTM network, belonging to the fields of artificial intelligence and pattern recognition.
Background
In recent years, with the wave of artificial intelligence, human-computer interaction has become a research hotspot, and among its many research directions, emotion recognition has attracted wide attention from researchers. Emotion recognition means giving a machine the ability to recognize a user's emotion and respond accordingly, i.e. giving the machine the ability to "think". As emotion recognition technology develops and its accuracy keeps improving, it is bound to play a major role in fields such as education, healthcare, and transportation.
Although emotion recognition developed rapidly after convolutional neural networks were proposed, classification accuracy remains unsatisfactory because recognition has been limited to single images. The main reasons are as follows:
First, emotion is a dynamic process from generation to expression: it takes time for an emotion to present itself fully and reach a peak at some moment, whereas prior-art emotion recognition generally analyzes only a single picture, which inevitably lowers accuracy. Moreover, in many past studies, researchers either used only a CNN convolutional neural network to process facial images, lacking any analysis of the emotion's dynamic course, or used only an RNN recurrent neural network, which accounts for time but is comparatively weak at image processing. Hence, no matter how a single network is optimized, its contribution to emotion recognition remains limited.
In addition, most existing research adopts the AlexNet network. Compared with the basic AlexNet, the VGG network has more convolutional layers, and its greater depth gives it stronger expressive power; it also has the characteristics of local connectivity and weight sharing and computes quickly. If the VGG network is used together with the time-sensitive LSTM network, the changing course of an expression can be detected more sensitively, so the network's predictions achieve higher accuracy.
In summary, how to provide, on the basis of the prior art, a dynamic face emotion recognition method that combines a VGG network with an LSTM network has become a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above defects in the prior art, the present invention aims to provide a dynamic face emotion recognition method based on deep learning, which is characterized by comprising the following steps:
s1, acquiring a face image sequence;
s2, extracting the image characteristics of each image in the face image sequence by using a VGG convolutional neural network;
s3, identifying the face emotion by using an LSTM recurrent neural network and combining the image features extracted in the S2;
s4, repeatedly training the network by using a loss function, optimizing network parameters, and constructing a complete dynamic human face emotion recognition model;
and S5, recognizing the emotion of the human face by using the dynamic human face emotion recognition model.
Preferably, S1 specifically comprises the following steps: detecting a face with the Adaboost algorithm, and then acquiring a face image sequence in chronological order.
Preferably, S2 specifically comprises the following steps:
S21, performing scaling processing and graying processing on the images, and gathering all processed images to form a training set;
S22, extracting features from the images in the training set with a VGG convolutional neural network, through its convolutional layers, pooling layers, and fully connected layers;
S23, generating a nonlinear representation by using the softmax function as the activation function, constraining the values of all image feature vectors to [0, 1].
Preferably, the scaling and graying of the images in S21 specifically comprises the following steps: performing scaling processing and graying processing on each image in the face image sequence to convert it into a standard 28 × 28 grayscale image.
Preferably, S3 specifically comprises the following steps:
S31, inputting the image feature vectors into the LSTM recurrent neural network in time order;
S32, performing feature-weighted fusion on the feature vector sequence output by the LSTM recurrent neural network to obtain the final classification result.
Preferably, S32 specifically comprises the following steps:
Let the feature vector sequence output by the LSTM recurrent neural network be v_i, i = 1, 2, …, n, where n is the length of the sequence. The weighted fusion of the feature vectors is

V = Σ_{i=1}^{n} w_i v_i

where the weighting coefficient w_i (given in the original filing only as formula image BDA0002009960210000042) is computed from max(v_i), the largest component of v_i, max_2(v_i), the second-largest component of v_i, and the constant b = 0.05.
Compared with the prior art, the invention has the following advantages:
The dynamic face emotion recognition method based on deep learning provided by the invention focuses on analyzing dynamic face emotion and, by collecting and analyzing a face image sequence, effectively accounts for the staged way in which human emotion unfolds.
Meanwhile, the invention combines the VGG convolutional neural network and the LSTM recurrent neural network, processing the face image sequence through both networks, making full use of each network's strengths and markedly improving the accuracy of emotion recognition. In addition, the invention performs weighted fusion of the face features in time order, further improving how well the dynamic change of face emotion is learned.
The invention also provides a reference for other related problems in the same field, can be expanded and extended on that basis, and can be applied to other technical schemes involving emotion recognition and analysis methods, so it has very broad application prospects.
The following detailed description of embodiments of the present invention, given in conjunction with the accompanying drawings, is provided to facilitate understanding of the technical solutions of the present invention.
Drawings
Fig. 1 is a schematic structural diagram of a dynamic human face emotion recognition model constructed by the present invention.
Detailed Description
The dynamic face emotion recognition method based on deep learning provided by the invention combines the VGG network and the LSTM network to achieve dynamic face emotion recognition. Emotion recognition now has a wide range of applications, such as fatigue-driving detection and emotion monitoring of depression patients, and a dynamic recognition method offers pertinence and accuracy that existing static emotion recognition cannot match.
The method generally comprises two parts: face detection and emotion recognition. First, face detection is performed with the Adaboost algorithm to obtain a face image sequence. For emotion classification, a network combining VGG and LSTM analyzes the face sequence produced by the dynamic change of facial expression; a feature weighting function assigns each feature in each sequence a weight according to the recency of its picture, and the weighted features are fused to obtain the final prediction.
Further, the dynamic human face emotion recognition method based on deep learning comprises the following steps.
S1, obtaining a face image sequence.
S1 specifically comprises the following steps: detecting the face with the Adaboost algorithm, then acquiring a face image sequence in chronological order.
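By way of illustration, the Viola-Jones detector shipped with OpenCV is a widely available AdaBoost-based face detector and can stand in for this step. The sketch below is a minimal example, not the patent's implementation; the video source, sequence length, and detector parameters are all illustrative assumptions:

```python
# Illustrative sketch of S1: AdaBoost-based (Haar cascade) face detection,
# collecting face crops in chronological order. All parameters are assumed.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def collect_face_sequence(video_path, max_len=16):
    """Return up to max_len face crops, one per frame, in time order."""
    crops = []
    cap = cv2.VideoCapture(video_path)
    while len(crops) < max_len:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            x, y, w, h = faces[0]              # keep the first detected face
            crops.append(frame[y:y + h, x:x + w])
    cap.release()
    return crops
```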
And S2, extracting the image characteristics of each image in the face image sequence by using a VGG convolutional neural network.
The S2 specifically comprises the following steps:
and S21, carrying out scaling processing and graying processing on the images, and summarizing all the processed images to form a training set.
The image scaling and graying processing specifically includes the following steps: and carrying out scaling processing and graying processing on each image in the face image sequence to convert the image into a standard 28X 28 grayscale image.
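A minimal sketch of this preprocessing, assuming OpenCV and an added [0, 1] normalization that the patent does not state:

```python
# Illustrative sketch of S21: gray and scale each face crop to 28 x 28.
import cv2
import numpy as np

def preprocess(face_bgr):
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)  # graying processing
    small = cv2.resize(gray, (28, 28))                 # scaling processing
    return small.astype(np.float32) / 255.0            # assumed normalization
```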
S22, extracting features from the images in the training set with a VGG convolutional neural network, through its convolutional layers, pooling layers, and fully connected layers.
S23, generating a nonlinear representation by using the softmax function as the activation function, constraining the values of all image feature vectors to [0, 1].
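For concreteness, a small VGG-style extractor in PyTorch might look like the sketch below; the layer counts, channel widths, and 128-dimensional output are illustrative assumptions, since the patent does not fix the exact configuration:

```python
# Illustrative sketch of S22/S23: VGG-style convolution + pooling + fully
# connected layers, with softmax constraining the feature vector to [0, 1].
import torch
import torch.nn as nn

class VGGFeatures(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # 14 -> 7
        )
        self.fc = nn.Linear(64 * 7 * 7, feat_dim)       # fully connected layer

    def forward(self, x):                               # x: (batch, 1, 28, 28)
        h = self.conv(x).flatten(1)
        return torch.softmax(self.fc(h), dim=1)         # values in [0, 1]
```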
S3, recognizing the face emotion with an LSTM recurrent neural network, using the image features extracted in S2.
S3 specifically comprises the following steps:
S31, inputting the image feature vectors into the LSTM recurrent neural network in time order.
S32, performing feature-weighted fusion on the feature vector sequence output by the LSTM recurrent neural network to obtain the final classification result. The calculation proceeds as follows:
Let the feature vector sequence output by the LSTM recurrent neural network be v_i, i = 1, 2, …, n, where n is the length of the sequence. The weighted fusion of the feature vectors is

V = Σ_{i=1}^{n} w_i v_i

where the weighting coefficient w_i (given in the original filing only as formula image BDA0002009960210000062) is computed from max(v_i), the largest component of v_i, max_2(v_i), the second-largest component of v_i, and the constant b = 0.05.
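Because the exact weighting formula survives only as an image in the filing, the NumPy sketch below adopts one plausible reading of the named ingredients: each step's weight is its largest minus its second-largest component plus b, normalized over the sequence. This formula is an assumption for illustration, not the patent's:

```python
# Illustrative sketch of S32: confidence-weighted fusion of the LSTM outputs.
# The weight below (max(v_i) - max_2(v_i) + b, normalized) is an assumed
# reconstruction from the quantities the patent names; b = 0.05 as stated.
import numpy as np

def weighted_fusion(V, b=0.05):
    """V: (n, d) array holding one feature vector v_i per time step."""
    top2 = -np.sort(-V, axis=1)[:, :2]     # largest and second-largest parts
    w = top2[:, 0] - top2[:, 1] + b        # assumed confidence-margin weight
    w = w / w.sum()                        # normalize over the sequence
    return (w[:, None] * V).sum(axis=0)    # fused feature vector
```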
S4, repeatedly training the network with a suitable loss function and optimizing the network parameters to construct the complete dynamic face emotion recognition model, whose structure is shown schematically in Fig. 1.
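The patent does not name the loss function; the sketch below uses cross-entropy with Adam, a common pairing for emotion classification. The model, data loader, and hyperparameters are illustrative assumptions:

```python
# Illustrative sketch of S4: repeated training of the combined network.
# Cross-entropy and Adam are assumed; the patent says only "a loss function".
import torch
import torch.nn as nn

def train(model, loader, epochs=30, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for seqs, labels in loader:        # seqs: (batch, T, 1, 28, 28)
            logits = model(seqs)           # model: VGG + LSTM + fusion head
            loss = loss_fn(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
```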
S5, recognizing the face emotion with the constructed dynamic face emotion recognition model.
In general, the method first uses the Adaboost method to detect the face and acquire a face image sequence, then extracts features from the face images with a VGG convolutional neural network. The feature vectors tracing the dynamic change of the facial expression are then fed into the LSTM recurrent neural network in time order. Finally, weighted fusion of the LSTM network's output feature vector sequence yields the final expression classification. The invention fully considers how facial expressions change, creatively combines the VGG and LSTM networks, and, through this late feature fusion, effectively improves the learning of the dynamic change of face emotion, so dynamic facial expression recognition reaches higher accuracy.
The dynamic face emotion recognition method based on deep learning provided by the invention focuses on analyzing dynamic face emotion and, by collecting and analyzing a face image sequence, effectively accounts for the staged way in which human emotion unfolds.
Meanwhile, the invention combines the VGG convolutional neural network and the LSTM recurrent neural network, processing the face image sequence through both networks, making full use of each network's strengths and markedly improving the accuracy of emotion recognition. In addition, the invention performs weighted fusion of the face features in time order, further improving how well the dynamic change of face emotion is learned.
The invention also provides a reference for other related problems in the same field, can be expanded and extended on that basis, and can be applied to other technical schemes involving emotion recognition and analysis methods, so it has very broad application prospects.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and that it may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein, and any reference signs in the claims shall not be construed as limiting the claims.
Furthermore, although this specification is described in terms of embodiments, not every embodiment contains only a single technical solution; this manner of description is merely for clarity. The specification should be taken as a whole, and the technical solutions in the embodiments may be combined as appropriate by one of ordinary skill in the art to form other embodiments.

Claims (4)

1. A dynamic face emotion recognition method based on deep learning is characterized by comprising the following steps:
s1, acquiring a face image sequence;
s2, extracting the image characteristics of each image in the face image sequence by using a VGG convolutional neural network;
s3, identifying the face emotion by using an LSTM recurrent neural network and combining the image features extracted in the S2;
the S3 specifically comprises the following steps:
s31, inputting the plurality of image feature vectors into an LSTM recurrent neural network according to a time sequence;
s32, performing feature weighted fusion on the output feature vector sequence of the LSTM recurrent neural network to obtain a final classification result;
the S32 specifically includes the following steps:
let the feature vector sequence output by the LSTM recurrent neural network be v i I =1,2, \ 8230, n is the length of the sequence, and the weighted fusion of the feature vectors is
Figure FDA0003744243290000011
Wherein the weighting coefficients
Figure FDA0003744243290000012
max(v i ) Is a vector v i Max _2 (v) of i ) Is a vector v i The second largest component of, b =0.05;
s4, repeatedly training the network by using a loss function, optimizing network parameters, and constructing a complete dynamic human face emotion recognition model;
and S5, recognizing the emotion of the human face by using the dynamic human face emotion recognition model.
2. The dynamic face emotion recognition method based on deep learning according to claim 1, wherein S1 specifically comprises the following steps: detecting a face with the Adaboost algorithm, and then acquiring a face image sequence in chronological order.
3. The dynamic face emotion recognition method based on deep learning according to claim 1, wherein S2 specifically comprises the following steps:
S21, performing scaling processing and graying processing on the images, and gathering all processed images to form a training set;
S22, extracting features from the images in the training set with a VGG convolutional neural network, through its convolutional layers, pooling layers, and fully connected layers;
S23, generating a nonlinear representation by using the softmax function as the activation function, constraining the values of all image feature vectors to [0, 1].
4. The dynamic face emotion recognition method based on deep learning according to claim 3, wherein the scaling and graying of the images in S21 specifically comprises the following steps: performing scaling processing and graying processing on each image in the face image sequence to convert it into a standard 28 × 28 grayscale image.
Application CN201910242066.4A, filed 2019-03-28 (priority date 2019-03-28): Dynamic human face emotion recognition method based on deep learning. Granted as CN110084122B (Active).

Priority Applications (1)

Application Number: CN201910242066.4A (granted as CN110084122B)
Priority Date: 2019-03-28; Filing Date: 2019-03-28
Title: Dynamic human face emotion recognition method based on deep learning


Publications (2)

Publication Number Publication Date
CN110084122A CN110084122A (en) 2019-08-02
CN110084122B (en) 2022-10-04

Family

ID=67413731

Family Applications (1)

Application Number: CN201910242066.4A (Active; granted as CN110084122B)
Title: Dynamic human face emotion recognition method based on deep learning
Priority Date: 2019-03-28; Filing Date: 2019-03-28

Country Status (1)

Country: CN; Publication: CN110084122B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7023613B2 (en) * 2017-05-11 2022-02-22 キヤノン株式会社 Image recognition device and learning device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304826A (en) * 2018-03-01 2018-07-20 河海大学 Facial expression recognizing method based on convolutional neural networks
CN108921042A (en) * 2018-06-06 2018-11-30 四川大学 A kind of face sequence expression recognition method based on deep learning

Also Published As

Publication number Publication date
CN110084122A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
CN107506722A (en) One kind is based on depth sparse convolution neutral net face emotion identification method
Wang et al. Fast sign language recognition benefited from low rank approximation
CN110287805A (en) Micro- expression recognition method and system based on three stream convolutional neural networks
CN107341452A (en) Human bodys' response method based on quaternary number space-time convolutional neural networks
CN110765873B (en) Facial expression recognition method and device based on expression intensity label distribution
CN110084266B (en) Dynamic emotion recognition method based on audio-visual feature deep fusion
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN107590432A (en) A kind of gesture identification method based on circulating three-dimensional convolutional neural networks
CN108921037B (en) Emotion recognition method based on BN-acceptance double-flow network
CN111339847A (en) Face emotion recognition method based on graph convolution neural network
CN109359527B (en) Hair region extraction method and system based on neural network
CN105718889A (en) Human face identity recognition method based on GB(2D)2PCANet depth convolution model
CN113221663B (en) Real-time sign language intelligent identification method, device and system
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
CN114782737A (en) Image classification method, device and storage medium based on improved residual error network
Vasudevan et al. Introduction and analysis of an event-based sign language dataset
CN116580453A (en) Human body behavior recognition method based on space and time sequence double-channel fusion model
CN113705384B (en) Facial expression recognition method considering local space-time characteristics and global timing clues
Jaymon et al. Real time emotion detection using deep learning
CN111401116A (en) Bimodal emotion recognition method based on enhanced convolution and space-time L STM network
Pang et al. Dance video motion recognition based on computer vision and image processing
CN110084122B (en) Dynamic human face emotion recognition method based on deep learning
CN112560618A (en) Behavior classification method based on skeleton and video feature fusion
CN109583406B (en) Facial expression recognition method based on feature attention mechanism
Bie et al. Facial expression recognition from a single face image based on deep learning and broad learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant