CN110705430A - Multi-person facial expression recognition method and system based on deep learning - Google Patents
- Publication number
- CN110705430A (application number CN201910916023.XA)
- Authority
- CN
- China
- Prior art keywords
- expression recognition
- expression
- volume
- training
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a method and a system for recognizing the facial expressions of multiple persons based on deep learning, wherein the recognition method comprises the following steps: 1. establishing an expression recognition model; 2. constructing a training sample set, and training the parameters of the expression recognition model; 3. detecting the faces in the image to be recognized with an MTCNN network to obtain the face windows in the image to be recognized; and inputting the detected face regions into the trained expression recognition model to obtain an expression classification result for each face in the image. The recognition method applies deep learning to expression recognition, can quickly complete the task of recognizing the facial expressions of multiple people, and achieves a high recognition rate.
Description
Technical Field
The invention belongs to the technical field of expression recognition, and particularly relates to a method and a system for recognizing facial expressions of multiple persons based on deep learning.
Background
Facial expression is a common outward expression of human emotion and is generally used to identify emotion when people communicate. With the development of human-computer interaction, facial expression recognition has become a hot topic in recent decades; it is widely applied in fields such as traffic, medicine and education, and permeates many aspects of daily life.
Traditional expression recognition algorithms extract features manually; the process is complex and computationally expensive. The concept of deep learning originates from artificial neural networks and essentially refers to a class of methods for effectively training neural networks with deep structures. The convolutional neural network, a special feedforward neural network, is the most important model in deep learning. Standard convolutional neural networks, comprising input, convolutional, pooling and output layers, have greatly advanced the development of image classification, recognition and understanding techniques. Through multi-level convolution, deep learning can automatically learn the features related to facial expression and finally complete facial expression recognition.
However, owing to the complexity of the facial expression recognition problem, the effect of applying deep learning techniques to facial expression recognition is affected by factors such as pose, occlusion and illumination, and recognition accuracy remains low.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a method for recognizing the facial expressions of multiple persons with high recognition accuracy.
The technical scheme is as follows: the invention discloses a method for recognizing facial expressions of multiple persons based on deep learning, which comprises a training stage and a recognition stage, wherein the training stage comprises the following steps:
(1) establishing an expression recognition model with a VGG-19 network structure, comprising: 5 convolutional blocks and 5 maximum pooling layers arranged alternately in sequence, followed by 2 fully-connected layers and a softmax classification layer; among the 5 convolutional blocks, the first and second convolutional blocks each comprise 2 convolutional layers, and the third, fourth and fifth convolutional blocks each comprise 4 convolutional layers; the softmax classification layer is a 7-class classification layer;
the parameters of the 5 convolutional blocks and the 5 maximum pooling layers adopt the parameters of a pre-trained VGG-19 network;
(2) constructing a training sample set, and training the parameters of the 2 fully-connected layers and the softmax classification layer in the expression recognition model;
the training sample set consists of images from the CK+ and fer2013 expression data sets and comprises 7 types of expressions: anger; disgust; fear; happy; sad; surprised; neutral; all pictures are unified into 48 × 48 gray-level images;
training the constructed expression recognition network model with a stochastic gradient descent algorithm using adaptive moment estimation;
the identification phase comprises the steps of:
(3) detecting the face in the image to be recognized by adopting an MTCNN network to obtain a face window in the image to be recognized; and inputting the detected face area into a trained expression recognition model for recognition to obtain an expression classification result of each face in the image to be recognized.
The training sample set further comprises data-enhanced versions of the images in the CK+ and fer2013 expression data sets; the data enhancement comprises randomly rotating, scaling, horizontally or vertically projection-transforming, and horizontally flipping the images.
In the MTCNN network, the stride of all convolutions is 1, and the stride of pooling is 2; the activation function is PReLU:

PReLU(x) = max(0, x) + α · min(0, x)

wherein α ≤ 1 is an adjustable parameter.
On the other hand, the invention discloses a multi-person facial expression recognition system for realizing the method, which comprises: a face detection module, an expression recognition training module and an expression recognition model; the face detection module is used for detecting a face region in an input image;
the expression recognition training module is used for training an expression recognition model according to a training sample set;
the expression recognition model is used for recognizing the facial expressions in the detected face area to obtain an expression classification result.
The face detection module, the expression recognition training module and the expression recognition model run on computers equipped with NVIDIA GTX 1080Ti GPUs.
Beneficial effects: compared with the prior art, the method for identifying the facial expressions of multiple persons disclosed by the invention has the following advantages: 1. the invention applies deep learning to expression recognition, can quickly complete the task of facial expression recognition for multiple people, and maintains a high recognition rate even under the influence of factors such as pose, occlusion and illumination; 2. the method avoids the limitation of traditional expression recognition algorithms, which rely on manually extracted features: through multi-level convolution, deep learning automatically learns the features related to facial expression; 3. the invention uses transfer learning, which greatly reduces the number of network parameters to be trained, effectively extracts the multi-layer features of facial expression, and improves expression recognition accuracy while maintaining speed.
Drawings
FIG. 1 is a flow chart of a method for recognizing facial expressions of multiple persons according to the present invention;
FIG. 2 is a network architecture diagram of an expression recognition model of the present invention;
FIG. 3 is a flow chart of face detection in the present invention;
fig. 4 is a block diagram of a system for recognizing facial expressions of multiple persons according to the present invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the detailed description.
As shown in FIG. 1, the invention discloses a method for recognizing facial expressions of multiple persons based on deep learning, which comprises a training stage and a recognition stage, wherein the training stage comprises the following steps:
step 1, establishing an expression recognition model, wherein the expression recognition model has a VGG-19 network structure, and comprises the following steps: sequentially alternating 5 volume blocks and 5 maximum pooling layers, 2 full-link layers and softmax sorting layers, as shown in fig. 2; in the 5 volume blocks, the first volume block and the second volume block respectively comprise 2 volume layers, and the third volume block, the fourth volume block and the fifth volume block respectively comprise 4 volume layers, and 16 volume layers are formed in total; the softmax classification layer is a classification layer of 7 classifications;
The VGG-19 pre-training model parameters are migrated to the expression recognition network model as the parameters of the 16 convolutional layers and 5 pooling layers. The expression recognition model has the same structure as the VGG-19 network, except that 2 fully-connected layers replace the 3 fully-connected layers of VGG-19, which reduces the amount of computation; the smaller number of parameters also mitigates overfitting. The 1st fully-connected layer has 1024 nodes and the 2nd fully-connected layer has 4 nodes; the original softmax classification layer of VGG-19 is replaced with a 7-class classification layer.
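As a rough illustration (not code from the patent), the following pure-Python sketch traces a 48 × 48 gray input through the backbone described above, assuming the standard VGG-19 channel widths and "same"-padded 3 × 3 convolutions followed by 2 × 2 stride-2 max-pooling in each block:

```python
# Block layout of the modified VGG-19 backbone: (number of 3x3 conv
# layers, output channels). The channel widths are the standard VGG-19
# values, assumed here since the patent does not list them.
BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]

def trace(size=48):
    """Return the (height, width, channels) shape after each block for a
    square gray input; "same"-padded 3x3 convs keep the spatial size, and
    each 2x2 stride-2 max-pool halves it."""
    shapes = []
    for n_convs, channels in BLOCKS:
        size = size // 2  # the pooling layer at the end of the block
        shapes.append((size, size, channels))
    return shapes

print(trace())
# [(24, 24, 64), (12, 12, 128), (6, 6, 256), (3, 3, 512), (1, 1, 512)]
```

The 2 + 2 + 4 + 4 + 4 = 16 convolutional layers match the count given above, and the final 1 × 1 × 512 feature map is what the fully-connected layers would flatten.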
The training sample set consists of images from the CK+ and fer2013 expression data sets and comprises 7 types of expressions: anger; disgust; fear; happy; sad; surprised; neutral; all pictures are unified into 48 × 48 gray-scale images. To improve the generalization ability of the model, the training sample set further includes data-enhanced versions of the CK+ and fer2013 images; the data enhancement comprises randomly rotating, scaling, horizontally or vertically projection-transforming, and horizontally flipping the images.
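Of the augmentations listed, the horizontal flip is simple enough to sketch without an image library. The snippet below is an illustrative assumption, not the patent's implementation; it treats a grayscale image as a list of pixel rows:

```python
import random

def hflip(img):
    """Horizontally flip a grayscale image stored as a list of pixel rows."""
    return [row[::-1] for row in img]

def augment(img, rng=None):
    """Apply the horizontal flip with probability 0.5. The other listed
    augmentations (rotation, scaling, projective transforms) would normally
    come from an image library and are omitted from this sketch."""
    rng = rng or random.Random()
    return hflip(img) if rng.random() < 0.5 else img

print(hflip([[1, 2], [3, 4]]))  # [[2, 1], [4, 3]]
```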
The invention trains the constructed expression recognition network model with a stochastic gradient descent algorithm using adaptive moment estimation, i.e., the Adam algorithm.
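The Adam update referred to above can be sketched for a single scalar parameter as follows (textbook form with the usual default hyperparameters, which the patent does not specify):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; lr, b1, b2, eps are the
    common default values, assumed here rather than taken from the patent."""
    m = b1 * m + (1 - b1) * grad           # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2      # biased second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta**2, gradient 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(abs(theta) < 0.1)  # True: theta has moved close to the minimum at 0
```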
The identification phase comprises the steps of:
the MTCNN multitask cascade convolution neural network comprises three cascade convolution neural networks, and the face position is predicted step by step and the characteristic points are calibrated from coarse to fine. The three cascaded convolutional neural networks are a proposed Network, a purified Network and an Output Network.
As shown in fig. 3, the face detection step is:
s1, zooming the input image according to different scales to form an image pyramid which is used as the input of the three-layer cascade network;
s2, rapidly generating a face candidate window and a boundary regression vector thereof by using a proposed Network Proposal Network; correcting the candidate window using the boundary regression vector; combining the overlapping windows by using a non-maximum value inhibition method;
s3, purifying the face candidate window selected in the step S2 by using a purification Network; the RefineNework also corrects the candidate window using a boundary regression vector; further merging the overlapping windows by using a non-maximum value inhibition method;
s4, screening the face candidate window in the step S3 by using an Output Network to obtain one or more accurate face positions, and finishing face detection.
In the invention, the stride of all MTCNN convolutions is 1, and the stride of pooling is 2; an activation layer follows each convolutional layer and fully-connected layer, and the activation function is PReLU:

PReLU(x) = max(0, x) + α · min(0, x)

wherein α ≤ 1 is an adjustable parameter.
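A scalar sketch of the PReLU activation above; the initial value α = 0.25 is a common choice, not a figure from the patent (in practice α is learned during training):

```python
def prelu(x, alpha=0.25):
    """PReLU: identity for positive inputs, slope alpha (alpha <= 1) for
    negative inputs. alpha = 0.25 is an assumed illustrative value."""
    return x if x > 0 else alpha * x

print([prelu(v) for v in (-2.0, 0.0, 3.0)])  # [-0.5, 0.0, 3.0]
```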
And inputting the face area detected by the MTCNN into a trained expression recognition model for recognition to obtain an expression classification result of each face in the image to be recognized.
As shown in fig. 4, the recognition system for recognizing the facial expressions of multiple persons comprises: a face detection module 1, an expression recognition training module 2 and an expression recognition model 3; the face detection module is used for detecting a face region in an input image; the expression recognition training module is used for training the expression recognition model on a training sample set; the expression recognition model is used for recognizing the facial expressions in the detected face regions to obtain an expression classification result.
In order to improve the training speed of the multi-person facial expression recognition system, the face detection module, the expression recognition training module and the expression recognition model in this embodiment run on computers equipped with NVIDIA GTX 1080Ti GPUs.
Claims (8)
1. A method for recognizing the facial expressions of multiple persons based on deep learning, characterized by comprising a training stage and a recognition stage, wherein the training stage comprises the following steps:
(1) establishing an expression recognition model with a VGG-19 network structure, comprising: 5 convolutional blocks and 5 maximum pooling layers arranged alternately in sequence, followed by 2 fully-connected layers and a softmax classification layer; among the 5 convolutional blocks, the first and second convolutional blocks each comprise 2 convolutional layers, and the third, fourth and fifth convolutional blocks each comprise 4 convolutional layers; the softmax classification layer is a 7-class classification layer;
the parameters of the 5 convolutional blocks and the 5 maximum pooling layers adopt the parameters of a pre-trained VGG-19 network;
(2) constructing a training sample set, and training the parameters of the 2 fully-connected layers and the softmax classification layer in the expression recognition model;
the training sample set consists of images from the CK+ and fer2013 expression data sets and comprises 7 types of expressions: anger; disgust; fear; happy; sad; surprised; neutral; all pictures are unified into 48 × 48 gray-level images;
training the constructed expression recognition network model with a stochastic gradient descent algorithm using adaptive moment estimation;
the identification phase comprises the steps of:
(3) detecting the face in the image to be recognized by adopting an MTCNN network to obtain a face window in the image to be recognized; and inputting the detected face area into a trained expression recognition model for recognition to obtain an expression classification result of each face in the image to be recognized.
2. The method of claim 1, wherein the training sample set further comprises data-enhanced versions of the images in the CK+ and fer2013 expression data sets; the data enhancement comprises randomly rotating, scaling, horizontally or vertically projection-transforming, and horizontally flipping the images.
4. A multi-person facial expression recognition system based on deep learning, characterized by comprising: a face detection module, an expression recognition training module and an expression recognition model; the face detection module is used for detecting a face region in an input image;
the expression recognition training module is used for training an expression recognition model according to a training sample set;
the expression recognition model is used for recognizing the facial expressions in the detected face area to obtain an expression classification result.
5. The system of claim 4, wherein the face detection module detects faces in the input image using an MTCNN network; in the MTCNN network, the stride of all convolutions is 1, and the stride of pooling is 2; the activation function is PReLU:

PReLU(x) = max(0, x) + α · min(0, x)

wherein α ≤ 1 is an adjustable parameter.
6. The system of claim 4, wherein the expression recognition model has a VGG-19 network structure, comprising: 5 convolutional blocks and 5 maximum pooling layers arranged alternately in sequence, followed by 2 fully-connected layers and a softmax classification layer; among the 5 convolutional blocks, the first and second convolutional blocks each comprise 2 convolutional layers, and the third, fourth and fifth convolutional blocks each comprise 4 convolutional layers; the softmax classification layer is a 7-class classification layer;
the parameters of the 5 convolutional blocks and the 5 maximum pooling layers adopt the parameters of a pre-trained VGG-19 network.
7. The system of claim 4, wherein the training sample set consists of images from the CK+ and fer2013 expression data sets together with data-enhanced versions of those images, comprising 7 types of expressions: anger; disgust; fear; happy; sad; surprised; neutral; all pictures are unified into 48 × 48 gray-level images;
the data enhancement comprises randomly rotating, scaling, horizontally or vertically projection-transforming, and horizontally flipping the images.
8. The system of claim 4, wherein the face detection module, the expression recognition training module, and the expression recognition model run on computers equipped with NVIDIA GTX 1080Ti GPUs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910916023.XA CN110705430A (en) | 2019-09-26 | 2019-09-26 | Multi-person facial expression recognition method and system based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110705430A true CN110705430A (en) | 2020-01-17 |
Family
ID=69198076
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910916023.XA Pending CN110705430A (en) | 2019-09-26 | 2019-09-26 | Multi-person facial expression recognition method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110705430A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105654049A (en) * | 2015-12-29 | 2016-06-08 | 中国科学院深圳先进技术研究院 | Facial expression recognition method and device |
CN107491726A (en) * | 2017-07-04 | 2017-12-19 | 重庆邮电大学 | A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks |
CN107729872A (en) * | 2017-11-02 | 2018-02-23 | 北方工业大学 | Facial expression recognition method and device based on deep learning |
CN109002766A (en) * | 2018-06-22 | 2018-12-14 | 北京邮电大学 | A kind of expression recognition method and device |
Non-Patent Citations (3)
Title |
---|
KAIPENG ZHANG ET AL.: "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks", IEEE Signal Processing Letters *
XU KEHU ET AL.: "Intelligent Computing Methods and Their Applications" (《智能计算方法及其应用》), 31 July 2019 *
HUANG XIAOPING: "Research on Contemporary Deep Machine Learning Methods and Applications" (《当代机器深度学习方法与应用研究》), 30 November 2017 *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111466878A (en) * | 2020-04-14 | 2020-07-31 | 合肥工业大学 | Real-time monitoring method and device for pain symptoms of bedridden patients based on expression recognition |
CN111507241A (en) * | 2020-04-14 | 2020-08-07 | 四川聚阳科技集团有限公司 | Lightweight network classroom expression monitoring method |
CN111680760A (en) * | 2020-06-16 | 2020-09-18 | 北京联合大学 | Clothing style identification method and device, electronic equipment and storage medium |
CN111738178A (en) * | 2020-06-28 | 2020-10-02 | 天津科技大学 | Wearing mask facial expression recognition method based on deep learning |
CN112801002A (en) * | 2021-02-05 | 2021-05-14 | 黑龙江迅锐科技有限公司 | Facial expression recognition method and device based on complex scene and electronic equipment |
CN113011253A (en) * | 2021-02-05 | 2021-06-22 | 中国地质大学(武汉) | Face expression recognition method, device, equipment and storage medium based on ResNeXt network |
CN113011253B (en) * | 2021-02-05 | 2023-04-21 | 中国地质大学(武汉) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network |
CN113069080A (en) * | 2021-03-22 | 2021-07-06 | 上海交通大学医学院附属第九人民医院 | Difficult airway assessment method and device based on artificial intelligence |
CN113642467A (en) * | 2021-08-16 | 2021-11-12 | 江苏师范大学 | Facial expression recognition method based on improved VGG network model |
CN113642467B (en) * | 2021-08-16 | 2023-12-01 | 江苏师范大学 | Facial expression recognition method based on improved VGG network model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110705430A (en) | Multi-person facial expression recognition method and system based on deep learning | |
Park et al. | A depth camera-based human activity recognition via deep learning recurrent neural network for health and social care services | |
Acharya et al. | Deep learning based large scale handwritten Devanagari character recognition | |
WO2022111236A1 (en) | Facial expression recognition method and system combined with attention mechanism | |
CN105469065B (en) | A kind of discrete emotion identification method based on recurrent neural network | |
CN107679526B (en) | Human face micro-expression recognition method | |
CN107562784A (en) | Short text classification method based on ResLCNN models | |
CN106803069A (en) | Crowd's level of happiness recognition methods based on deep learning | |
CN109543667A (en) | A kind of text recognition method based on attention mechanism | |
CN110119785A (en) | Image classification method based on multilayer spiking convolutional neural network | |
CN108830252A (en) | A kind of convolutional neural networks human motion recognition method of amalgamation of global space-time characteristic | |
CN112906604B (en) | Behavior recognition method, device and system based on skeleton and RGB frame fusion | |
CN108133188A (en) | A kind of Activity recognition method based on motion history image and convolutional neural networks | |
Said et al. | Design of a face recognition system based on convolutional neural network (CNN) | |
CN112464865A (en) | Facial expression recognition method based on pixel and geometric mixed features | |
CN103824054A (en) | Cascaded depth neural network-based face attribute recognition method | |
CN107657233A (en) | Static sign language real-time identification method based on modified single multi-target detection device | |
CN104361316A (en) | Dimension emotion recognition method based on multi-scale time sequence modeling | |
Jayadeep et al. | Mudra: convolutional neural network based Indian sign language translator for banks | |
CN105469041A (en) | Facial point detection system based on multi-task regularization and layer-by-layer supervision neural networ | |
CN107657204A (en) | The construction method and facial expression recognizing method and system of deep layer network model | |
CN110097089A (en) | A kind of sensibility classification method of the documentation level based on attention combination neural net | |
CN110059593B (en) | Facial expression recognition method based on feedback convolutional neural network | |
CN109117817A (en) | The method and device of recognition of face | |
CN108710950A (en) | A kind of image quantization analysis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200117 ||