CN110084122B - Dynamic human face emotion recognition method based on deep learning


Info

Publication number
CN110084122B
CN110084122B
Authority
CN
China
Prior art keywords
image
human face
neural network
sequence
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910242066.4A
Other languages
Chinese (zh)
Other versions
CN110084122A (en)
Inventor
吴家皋
张华杰
陈欣宇
周峻全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201910242066.4A
Publication of CN110084122A
Application granted
Publication of CN110084122B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G06V 40/176 Dynamic expression

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dynamic face emotion recognition method based on deep learning, which comprises the following steps: S1, acquiring a face image sequence; S2, extracting the image features of each image in the face image sequence with a VGG convolutional neural network; S3, recognizing the face emotion with an LSTM recurrent neural network, using the image features extracted in S2; S4, repeatedly training the network with a loss function and optimizing the network parameters to construct a complete dynamic face emotion recognition model. The invention focuses on analyzing dynamic face emotion and, by collecting and analyzing a face image sequence, effectively accounts for the staged way in which human emotion unfolds. By combining a VGG convolutional neural network with an LSTM recurrent neural network to process the face image sequence, the invention markedly improves the accuracy of emotion recognition.

Description

Dynamic human face emotion recognition method based on deep learning
Technical Field
The invention relates to a computer information processing method, and in particular to a dynamic face emotion recognition method based on deep learning that combines a VGG network with an LSTM network, belonging to the fields of artificial intelligence and pattern recognition.
Background
In recent years, with the wave of artificial intelligence, human-computer interaction has become a research hotspot, and among its many research directions, emotion recognition has attracted wide attention from researchers. Emotion recognition means giving a machine the ability to recognize a user's emotion and respond accordingly, i.e. giving the machine the ability to "think". As emotion recognition technology develops and its accuracy keeps improving, it is bound to play a major role in fields such as education, healthcare, and transportation.
Although emotion recognition developed rapidly after convolutional neural networks were proposed, classification accuracy remains unsatisfactory because recognition has been limited to single images. The main reasons are as follows:
First, emotion is a dynamic process from generation to expression: it takes time for an emotion to present itself fully and reach a peak at some moment, whereas prior-art emotion recognition generally analyzes only a single picture, which inevitably lowers accuracy. Moreover, in many past studies, researchers either used only a CNN convolutional neural network to process facial images, lacking any analysis of the emotion's dynamic course, or used only an RNN recurrent neural network, which accounts for time but is comparatively weak at image processing. Hence, no matter how a single network is optimized, its contribution to emotion recognition remains limited.
In addition, most existing research adopts the AlexNet network. Compared with the basic AlexNet, the VGG network has more convolutional layers, and its greater depth gives it stronger expressive power; it also has the characteristics of local connectivity and weight sharing and computes quickly. If the VGG network is used together with the time-sensitive LSTM network, the changing course of an expression can be detected more sensitively, so the network's predictions achieve higher accuracy.
In summary, how to provide, on the basis of the prior art, a dynamic face emotion recognition method that combines a VGG network with an LSTM network has become a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above defects in the prior art, the present invention aims to provide a dynamic face emotion recognition method based on deep learning, which is characterized by comprising the following steps:
s1, acquiring a face image sequence;
s2, extracting the image characteristics of each image in the face image sequence by using a VGG convolutional neural network;
s3, identifying the face emotion by using an LSTM recurrent neural network and combining the image features extracted in the S2;
s4, repeatedly training the network by using a loss function, optimizing network parameters, and constructing a complete dynamic human face emotion recognition model;
and S5, recognizing the emotion of the human face by using the dynamic human face emotion recognition model.
Preferably, S1 specifically comprises the following steps: detecting a face with the Adaboost algorithm, and then acquiring a face image sequence in chronological order.
Preferably, S2 specifically comprises the following steps:
S21, performing scaling processing and graying processing on the images, and gathering all processed images to form a training set;
S22, extracting features from the images in the training set with a VGG convolutional neural network, through its convolutional layers, pooling layers, and fully connected layers;
S23, generating a nonlinear representation by using the softmax function as the activation function, constraining the values of all image feature vectors to [0, 1].
Preferably, the scaling and graying of the images in S21 specifically comprises the following steps: performing scaling processing and graying processing on each image in the face image sequence to convert it into a standard 28 × 28 grayscale image.
Preferably, S3 specifically comprises the following steps:
S31, inputting the image feature vectors into the LSTM recurrent neural network in time order;
S32, performing feature-weighted fusion on the feature vector sequence output by the LSTM recurrent neural network to obtain the final classification result.
Preferably, S32 specifically comprises the following steps:
Let the feature vector sequence output by the LSTM recurrent neural network be v_i, i = 1, 2, …, n, where n is the length of the sequence. The weighted fusion of the feature vectors is

V = Σ_{i=1}^{n} w_i v_i

where the weighting coefficient w_i (given in the original filing only as formula image BDA0002009960210000042) is computed from max(v_i), the largest component of v_i, max_2(v_i), the second-largest component of v_i, and the constant b = 0.05.
Compared with the prior art, the invention has the following advantages:
The dynamic face emotion recognition method based on deep learning provided by the invention focuses on analyzing dynamic face emotion and, by collecting and analyzing a face image sequence, effectively accounts for the staged way in which human emotion unfolds.
Meanwhile, the invention combines the VGG convolutional neural network and the LSTM recurrent neural network, processing the face image sequence through both networks, making full use of each network's strengths and markedly improving the accuracy of emotion recognition. In addition, the invention performs weighted fusion of the face features in time order, further improving how well the dynamic change of face emotion is learned.
The invention also provides a reference for other related problems in the same field, can be expanded and extended on that basis, and can be applied to other technical schemes involving emotion recognition and analysis methods, so it has very broad application prospects.
The following detailed description of embodiments of the present invention, given in conjunction with the accompanying drawings, is provided to facilitate understanding of the technical solutions of the present invention.
Drawings
Fig. 1 is a schematic structural diagram of a dynamic human face emotion recognition model constructed by the present invention.
Detailed Description
The dynamic face emotion recognition method based on deep learning provided by the invention combines the VGG network and the LSTM network to achieve dynamic face emotion recognition. Emotion recognition now has a wide range of applications, such as fatigue-driving detection and emotion monitoring of depression patients, and a dynamic recognition method offers pertinence and accuracy that existing static emotion recognition cannot match.
The method generally comprises two parts: face detection and emotion recognition. First, face detection is performed with the Adaboost algorithm to obtain a face image sequence. For emotion classification, a network combining VGG and LSTM analyzes the face sequence produced by the dynamic change of facial expression; a feature weighting function assigns each feature in each sequence a weight according to the recency of its picture, and the weighted features are fused to obtain the final prediction.
Further, the dynamic human face emotion recognition method based on deep learning comprises the following steps.
S1, obtaining a face image sequence.
S1 specifically comprises the following steps: detecting the face with the Adaboost algorithm, then acquiring a face image sequence in chronological order.
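By way of illustration, the Viola-Jones detector shipped with OpenCV is a widely available AdaBoost-based face detector and can stand in for this step. The sketch below is a minimal example, not the patent's implementation; the video source, sequence length, and detector parameters are all illustrative assumptions:

```python
# Illustrative sketch of S1: AdaBoost-based (Haar cascade) face detection,
# collecting face crops in chronological order. All parameters are assumed.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def collect_face_sequence(video_path, max_len=16):
    """Return up to max_len face crops, one per frame, in time order."""
    crops = []
    cap = cv2.VideoCapture(video_path)
    while len(crops) < max_len:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            x, y, w, h = faces[0]              # keep the first detected face
            crops.append(frame[y:y + h, x:x + w])
    cap.release()
    return crops
```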
And S2, extracting the image characteristics of each image in the face image sequence by using a VGG convolutional neural network.
The S2 specifically comprises the following steps:
and S21, carrying out scaling processing and graying processing on the images, and summarizing all the processed images to form a training set.
The image scaling and graying processing specifically includes the following steps: and carrying out scaling processing and graying processing on each image in the face image sequence to convert the image into a standard 28X 28 grayscale image.
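A minimal sketch of this preprocessing, assuming OpenCV and an added [0, 1] normalization that the patent does not state:

```python
# Illustrative sketch of S21: gray and scale each face crop to 28 x 28.
import cv2
import numpy as np

def preprocess(face_bgr):
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)  # graying processing
    small = cv2.resize(gray, (28, 28))                 # scaling processing
    return small.astype(np.float32) / 255.0            # assumed normalization
```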
S22, extracting features from the images in the training set with a VGG convolutional neural network, through its convolutional layers, pooling layers, and fully connected layers.
S23, generating a nonlinear representation by using the softmax function as the activation function, constraining the values of all image feature vectors to [0, 1].
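For concreteness, a small VGG-style extractor in PyTorch might look like the sketch below; the layer counts, channel widths, and 128-dimensional output are illustrative assumptions, since the patent does not fix the exact configuration:

```python
# Illustrative sketch of S22/S23: VGG-style convolution + pooling + fully
# connected layers, with softmax constraining the feature vector to [0, 1].
import torch
import torch.nn as nn

class VGGFeatures(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # 14 -> 7
        )
        self.fc = nn.Linear(64 * 7 * 7, feat_dim)       # fully connected layer

    def forward(self, x):                               # x: (batch, 1, 28, 28)
        h = self.conv(x).flatten(1)
        return torch.softmax(self.fc(h), dim=1)         # values in [0, 1]
```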
S3, recognizing the face emotion with an LSTM recurrent neural network, using the image features extracted in S2.
S3 specifically comprises the following steps:
S31, inputting the image feature vectors into the LSTM recurrent neural network in time order.
S32, performing feature-weighted fusion on the feature vector sequence output by the LSTM recurrent neural network to obtain the final classification result. The calculation proceeds as follows:
Let the feature vector sequence output by the LSTM recurrent neural network be v_i, i = 1, 2, …, n, where n is the length of the sequence. The weighted fusion of the feature vectors is

V = Σ_{i=1}^{n} w_i v_i

where the weighting coefficient w_i (given in the original filing only as formula image BDA0002009960210000062) is computed from max(v_i), the largest component of v_i, max_2(v_i), the second-largest component of v_i, and the constant b = 0.05.
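Because the exact weighting formula survives only as an image in the filing, the NumPy sketch below adopts one plausible reading of the named ingredients: each step's weight is its largest minus its second-largest component plus b, normalized over the sequence. This formula is an assumption for illustration, not the patent's:

```python
# Illustrative sketch of S32: confidence-weighted fusion of the LSTM outputs.
# The weight below (max(v_i) - max_2(v_i) + b, normalized) is an assumed
# reconstruction from the quantities the patent names; b = 0.05 as stated.
import numpy as np

def weighted_fusion(V, b=0.05):
    """V: (n, d) array holding one feature vector v_i per time step."""
    top2 = -np.sort(-V, axis=1)[:, :2]     # largest and second-largest parts
    w = top2[:, 0] - top2[:, 1] + b        # assumed confidence-margin weight
    w = w / w.sum()                        # normalize over the sequence
    return (w[:, None] * V).sum(axis=0)    # fused feature vector
```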
S4, repeatedly training the network with a suitable loss function and optimizing the network parameters to construct the complete dynamic face emotion recognition model, whose structure is shown schematically in Fig. 1.
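The patent does not name the loss function; the sketch below uses cross-entropy with Adam, a common pairing for emotion classification. The model, data loader, and hyperparameters are illustrative assumptions:

```python
# Illustrative sketch of S4: repeated training of the combined network.
# Cross-entropy and Adam are assumed; the patent says only "a loss function".
import torch
import torch.nn as nn

def train(model, loader, epochs=30, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for seqs, labels in loader:        # seqs: (batch, T, 1, 28, 28)
            logits = model(seqs)           # model: VGG + LSTM + fusion head
            loss = loss_fn(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
```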
S5, recognizing the face emotion with the constructed dynamic face emotion recognition model.
In general, the method first uses the Adaboost method to detect the face and acquire a face image sequence, then extracts features from the face images with a VGG convolutional neural network. The feature vectors tracing the dynamic change of the facial expression are then fed into the LSTM recurrent neural network in time order. Finally, weighted fusion of the LSTM network's output feature vector sequence yields the final expression classification. The invention fully considers how facial expressions change, creatively combines the VGG and LSTM networks, and, through this late feature fusion, effectively improves the learning of the dynamic change of face emotion, so dynamic facial expression recognition reaches higher accuracy.
The dynamic face emotion recognition method based on deep learning provided by the invention focuses on analyzing dynamic face emotion and, by collecting and analyzing a face image sequence, effectively accounts for the staged way in which human emotion unfolds.
Meanwhile, the invention combines the VGG convolutional neural network and the LSTM recurrent neural network, processing the face image sequence through both networks, making full use of each network's strengths and markedly improving the accuracy of emotion recognition. In addition, the invention performs weighted fusion of the face features in time order, further improving how well the dynamic change of face emotion is learned.
The invention also provides a reference for other related problems in the same field, can be expanded and extended on that basis, and can be applied to other technical schemes involving emotion recognition and analysis methods, so it has very broad application prospects.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and that it may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein, and any reference signs in the claims shall not be construed as limiting the claims.
Furthermore, although this specification is described in terms of embodiments, not every embodiment contains only a single technical solution; this manner of description is merely for clarity. The specification should be taken as a whole, and the technical solutions in the embodiments may be combined as appropriate by one of ordinary skill in the art to form other embodiments.

Claims (4)

1. A dynamic face emotion recognition method based on deep learning is characterized by comprising the following steps:
s1, acquiring a face image sequence;
s2, extracting the image characteristics of each image in the face image sequence by using a VGG convolutional neural network;
s3, identifying the face emotion by using an LSTM recurrent neural network and combining the image features extracted in the S2;
the S3 specifically comprises the following steps:
s31, inputting the plurality of image feature vectors into an LSTM recurrent neural network according to a time sequence;
s32, performing feature weighted fusion on the output feature vector sequence of the LSTM recurrent neural network to obtain a final classification result;
the S32 specifically includes the following steps:
let the feature vector sequence output by the LSTM recurrent neural network be v i I =1,2, \ 8230, n is the length of the sequence, and the weighted fusion of the feature vectors is
Figure FDA0003744243290000011
Wherein the weighting coefficients
Figure FDA0003744243290000012
max(v i ) Is a vector v i Max _2 (v) of i ) Is a vector v i The second largest component of, b =0.05;
s4, repeatedly training the network by using a loss function, optimizing network parameters, and constructing a complete dynamic human face emotion recognition model;
and S5, recognizing the emotion of the human face by using the dynamic human face emotion recognition model.
2. The dynamic face emotion recognition method based on deep learning according to claim 1, wherein S1 specifically comprises the following steps: detecting a face with the Adaboost algorithm, and then acquiring a face image sequence in chronological order.
3. The dynamic face emotion recognition method based on deep learning according to claim 1, wherein S2 specifically comprises the following steps:
S21, performing scaling processing and graying processing on the images, and gathering all processed images to form a training set;
S22, extracting features from the images in the training set with a VGG convolutional neural network, through its convolutional layers, pooling layers, and fully connected layers;
S23, generating a nonlinear representation by using the softmax function as the activation function, constraining the values of all image feature vectors to [0, 1].
4. The dynamic face emotion recognition method based on deep learning according to claim 3, wherein the scaling and graying of the images in S21 specifically comprises the following steps: performing scaling processing and graying processing on each image in the face image sequence to convert it into a standard 28 × 28 grayscale image.
Application CN201910242066.4A, filed 2019-03-28 (priority date 2019-03-28): Dynamic human face emotion recognition method based on deep learning. Granted as CN110084122B (Active).

Priority Applications (1)

Application Number: CN201910242066.4A (granted as CN110084122B)
Priority Date: 2019-03-28; Filing Date: 2019-03-28
Title: Dynamic human face emotion recognition method based on deep learning


Publications (2)

Publication Number Publication Date
CN110084122A CN110084122A (en) 2019-08-02
CN110084122B (en) 2022-10-04

Family

ID=67413731

Family Applications (1)

Application Number: CN201910242066.4A (Active; granted as CN110084122B)
Title: Dynamic human face emotion recognition method based on deep learning
Priority Date: 2019-03-28; Filing Date: 2019-03-28

Country Status (1)

Country: CN; Publication: CN110084122B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7023613B2 (en) * 2017-05-11 2022-02-22 キヤノン株式会社 Image recognition device and learning device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304826A (en) * 2018-03-01 2018-07-20 河海大学 Facial expression recognizing method based on convolutional neural networks
CN108921042A (en) * 2018-06-06 2018-11-30 四川大学 A kind of face sequence expression recognition method based on deep learning

Also Published As

Publication number Publication date
CN110084122A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
CN107506722A (en) One kind is based on depth sparse convolution neutral net face emotion identification method
Wang et al. Fast sign language recognition benefited from low rank approximation
CN110287805A (en) Micro- expression recognition method and system based on three stream convolutional neural networks
CN107341452A (en) Human bodys' response method based on quaternary number space-time convolutional neural networks
CN110765873B (en) Facial expression recognition method and device based on expression intensity label distribution
CN110084266B (en) Dynamic emotion recognition method based on audio-visual feature deep fusion
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN107590432A (en) A kind of gesture identification method based on circulating three-dimensional convolutional neural networks
CN108921037B (en) Emotion recognition method based on BN-acceptance double-flow network
CN111339847A (en) Face emotion recognition method based on graph convolution neural network
CN109359527B (en) Hair region extraction method and system based on neural network
CN105718889A (en) Human face identity recognition method based on GB(2D)2PCANet depth convolution model
CN113221663B (en) Real-time sign language intelligent identification method, device and system
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
CN114782737A (en) Image classification method, device and storage medium based on improved residual error network
Vasudevan et al. Introduction and analysis of an event-based sign language dataset
CN116580453A (en) Human body behavior recognition method based on space and time sequence double-channel fusion model
CN113705384B (en) Facial expression recognition method considering local space-time characteristics and global timing clues
Jaymon et al. Real time emotion detection using deep learning
CN111401116A (en) Bimodal emotion recognition method based on enhanced convolution and space-time L STM network
Pang et al. Dance video motion recognition based on computer vision and image processing
CN110084122B (en) Dynamic human face emotion recognition method based on deep learning
CN112560618A (en) Behavior classification method based on skeleton and video feature fusion
CN109583406B (en) Facial expression recognition method based on feature attention mechanism
Bie et al. Facial expression recognition from a single face image based on deep learning and broad learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant