CN109635709B - Facial expression recognition method based on significant expression change area assisted learning - Google Patents
- Publication number
- CN109635709B · CN201811490141.0A
- Authority
- CN
- China
- Prior art keywords
- network
- layer
- expression
- auxiliary
- main
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a facial expression recognition method based on auxiliary learning of regions with significant expression change. An auxiliary learning network is built to extract features from the regions of a facial expression image where expression changes are most significant. The main network and the auxiliary learning network share the parameters of their first 3 feature extraction layers, and the features extracted by the fourth and fifth layers of the auxiliary learning network are fused, by feature weighting, with those of the fourth and fifth layers of the main network, so that the main network can learn the significant-expression-region features from the auxiliary network. A face detection and positioning algorithm is applied to the facial expression data set to obtain face region images for training the main network; the face region images are preprocessed to obtain images containing the regions with significant expression change, which are used to train the auxiliary learning network. As a result, the main network used for expression recognition focuses more attention on the regions with significant expression change and extracts expression features that are more discriminative and robust.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a facial expression recognition method based on the auxiliary learning of a significant expression change area.
Background
In communication between people, facial expressions convey information that reflects the rich inner world of human beings; they are an important carrier of human behavioral and emotional information. With the development of science and technology, facial expression recognition has been deeply researched and widely applied in various fields, and is often used in human-computer interaction.
Facial expression recognition generally comprises the following steps: acquiring facial expression images; cropping and normalizing the original images; extracting expression features; training a model; and classifying the expressions. The key step is expression feature extraction, since the effectiveness of the extracted features determines the level of recognition performance. In the prior art, recognition is generally performed on the whole facial expression image, yet the important information conveyed by a facial expression is carried mainly by changes in the eyes, lips and mouth. Extracting features from the whole facial expression image therefore easily loses part of the expression feature information, so some of the original feature information is lost and the resulting recognition performance is unsatisfactory. In addition, the extracted feature dimension is very large, which is unfavorable for the subsequent classification stage, and the recognition accuracy is not high.
Disclosure of Invention
The invention provides a facial expression recognition method based on the auxiliary learning of regions with significant expression change. It aims to solve the problems of the prior art that, because feature extraction is performed only on the whole facial expression image, the recognition performance is low, the extracted feature dimension is very large, and the subsequent classification stage is hindered; the recognition accuracy is effectively improved through parameter sharing between an auxiliary learning network and a main network.
In order to achieve the purpose of the invention, the technical scheme is as follows: a facial expression recognition method based on the auxiliary learning of a significant expression change area comprises the following steps:
s1: constructing a main network comprising 5 feature extraction layers for extracting facial expression features, inputting the extracted high-level semantic features into a full connection layer, and inputting the features output by the full connection layer into a Softmax classification layer for expression classification to obtain the expression result judged by the network;
s2: constructing an auxiliary learning network comprising 5 layers of feature extraction layers for extracting significant expression features in a human face, inputting the extracted high-level semantic features into a full connection layer, and inputting the features output by the full connection layer into a Softmax classification layer for expression classification operation to obtain an expression result judged by the network;
s3: sharing the parameters of the first 3 feature extraction layers between the main network and the auxiliary learning network; then weighting and fusing the output features of the fourth and fifth layers of the auxiliary learning network with the output features of the fourth and fifth layers of the main network respectively, and inputting the fused features into the main network to continue the extraction of high-level semantic features;
s4: the main network and the auxiliary learning network adopt cross entropy loss functions to judge the network loss, the back propagation of the network is carried out according to the judgment result of the network loss, the parameters of the main network and the auxiliary learning network are adjusted, and the main network and the auxiliary learning network are continuously optimized;
s5: extracting corresponding face area images in each image from the facial expression data set with the facial expression labels by using a face detection and positioning algorithm, and inputting the face area images into a main network for training; meanwhile, preprocessing the face region image to obtain an image with a region with a significant change in expression, and inputting the image into an auxiliary learning network for training; training the main network and the auxiliary learning network in an alternate training mode;
s6: inputting the facial expression image to be recognized into the main network to complete facial expression recognition.
Preferably, each of the 5 layers of feature extraction layers of the main network and the 5 layers of feature extraction layers of the auxiliary learning network includes a convolution layer, a pooling layer, a Batch Normalization layer and a ReLU layer; and 5 layers of feature extraction layers of the main network and 5 layers of feature extraction layers of the auxiliary learning network are used for extracting facial expression features.
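As an illustration only (not the patented implementation; the kernel size, 2x2 pooling window, and single-channel input are assumptions), one such feature extraction layer — convolution, pooling, Batch Normalization, ReLU — can be sketched as:

```python
import numpy as np

def feature_extraction_layer(x, kernel, eps=1e-5):
    """One feature extraction layer as described in the text:
    convolution -> 2x2 max pooling -> batch normalization -> ReLU.
    Minimal single-channel 2-D sketch; shapes are illustrative."""
    kh, kw = kernel.shape
    h, w = x.shape
    # valid convolution
    conv = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(conv.shape[0]):
        for j in range(conv.shape[1]):
            conv[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    # 2x2 max pooling with stride 2
    ph, pw = conv.shape[0] // 2, conv.shape[1] // 2
    pooled = conv[:ph * 2, :pw * 2].reshape(ph, 2, pw, 2).max(axis=(1, 3))
    # batch normalization (here normalized over the single feature map)
    normed = (pooled - pooled.mean()) / np.sqrt(pooled.var() + eps)
    # ReLU
    return np.maximum(normed, 0.0)

x = np.arange(64, dtype=float).reshape(8, 8)   # hypothetical 8x8 input patch
out = feature_extraction_layer(x, np.ones((3, 3)))
```

The 8x8 input shrinks to 6x6 after the 3x3 valid convolution and to 3x3 after pooling; in the patent each of the five layers stacks these four operations.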
Preferably, in step S3, the weighted fusion is calculated as follows:

F_fused^i = α·F_main^i + (1 − α)·F_aux^i

wherein: α is a weighting factor; F_main^i is the feature output of the i-th layer of the main network, F_aux^i is the feature output of the i-th layer of the auxiliary network, and F_fused^i is the fused main-auxiliary feature vector, with i = 4, 5;
the fused features are input into the next layer as output features of the corresponding layer of the main network and continue to be propagated forwards.
Further, α is 0.5, so that the features extracted by the fourth and fifth feature extraction layers of the main network and the auxiliary learning network are each weighted by 0.5.
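The described feature-weighted fusion with α = 0.5 can be sketched as follows; the array shapes are hypothetical, and the form F_fused = α·F_main + (1 − α)·F_aux is inferred from the equal 0.5 weighting stated above:

```python
import numpy as np

def fuse_features(f_main, f_aux, alpha=0.5):
    """Weighted fusion of the i-th layer outputs (i = 4, 5):
    F_fused = alpha * F_main + (1 - alpha) * F_aux."""
    return alpha * f_main + (1.0 - alpha) * f_aux

# With alpha = 0.5, the main and auxiliary features each contribute half.
f_main = np.full((4, 4), 2.0)   # hypothetical layer-4 output of the main network
f_aux = np.zeros((4, 4))        # hypothetical layer-4 output of the auxiliary network
fused = fuse_features(f_main, f_aux)   # every element is 0.5*2.0 + 0.5*0.0 = 1.0
```

The fused map then replaces the main network's layer output and propagates forward, as the text describes.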
Preferably, in step S4, the loss of the cross entropy loss function is calculated as follows:

Loss = −Σ_k y_k·log(p_k)

where y_k is the ground-truth indicator of expression class k and p_k is the corresponding Softmax output probability. The whole network aims to minimize the joint loss function of the main network and the auxiliary learning network:

argmin(Loss_main + Loss_auxiliary)

wherein: Loss_main is the loss function of the main network and Loss_auxiliary is the loss function of the auxiliary learning network.
Preferably, in step S5, the training mode is to train the main network three times, then train the auxiliary learning network once, and repeat this cycle.
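The "train the main network three times, then the auxiliary learning network once" cycle can be expressed as a simple step selector (a sketch; the zero-based step numbering is an assumption):

```python
def network_for_step(step):
    """Return which network is trained at a given step under the
    3:1 alternating schedule: main, main, main, auxiliary, repeat."""
    return "auxiliary" if step % 4 == 3 else "main"

schedule = [network_for_step(s) for s in range(8)]
# -> ['main', 'main', 'main', 'auxiliary', 'main', 'main', 'main', 'auxiliary']
```

Because the first three layers are shared, every auxiliary step also updates part of the main network's parameters.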
Further, in step S5, the image with the significantly changing expression area includes feature data of an eye and an eyebrow area and feature data of a lip and a mouth area.
The invention has the following beneficial effects:
1. according to the method, an auxiliary learning network is built to extract the characteristics of the significant expression change area in the facial expression image, parameters of the main network and the first 3 layers of characteristic extraction layers of the auxiliary learning network are shared, and the characteristics extracted by the fourth layer and the fifth layer of the auxiliary learning network are subjected to characteristic weighting fusion with the fourth layer and the fifth layer of the main network, so that the main network structure can learn the characteristics of some significant expression areas in the auxiliary network.
2. Processing the facial expression data set with the facial expression labels by using a facial detection and positioning algorithm to obtain corresponding facial area images, and training a main network; and preprocessing the face region image to obtain an image with a region with significant expression change, and training the auxiliary learning network, so that the main network for expression recognition can focus more attention on the region with significant expression change, and expression features with more recognizability and robustness are extracted.
Drawings
FIG. 1 is a diagram of the total feature extraction layer of the present invention.
Fig. 2 is a feature extraction level diagram of the nth level feature extraction.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
Example 1
As shown in fig. 1, a facial expression recognition method based on the auxiliary learning of regions with significant expression change includes:
s1: constructing a main network comprising 5 feature extraction layers for extracting facial expression features, wherein each of the 5 feature extraction layers comprises a convolution layer, a pooling layer, a Batch Normalization layer and a ReLU layer, as shown in FIG. 2; the extracted high-level semantic features are input into fully connected layers (a three-layer fully connected structure is adopted in this embodiment), and the features output by the fully connected layers are input into a Softmax classification layer for expression classification, obtaining the expression result judged by the network.
S2: constructing an auxiliary learning network comprising 5 feature extraction layers for extracting the significant expression features of the face, wherein each of the 5 feature extraction layers likewise comprises a convolution layer, a pooling layer, a Batch Normalization layer and a ReLU layer, as shown in FIG. 2; the extracted high-level semantic features are input into fully connected layers (again a three-layer fully connected structure), and the features output by the fully connected layers are input into a Softmax classification layer for expression classification, obtaining the expression result judged by the network.
S3: in order to enable the main network to learn the features of the significant expression regions in the auxiliary learning network, the parameters of the first 3 feature extraction layers are shared between the main network and the auxiliary learning network; in order to let the two networks still learn different features, the output features of the fourth and fifth layers of the auxiliary learning network are weighted and fused with the output features of the fourth and fifth layers of the main network respectively, and the fused features are then input into the main network to continue the extraction of high-level semantic features;
S4: in order to improve the facial expression recognition accuracy, both the main network and the auxiliary learning network adopt a cross entropy loss function to evaluate the network loss, back propagation is performed according to the evaluated loss, the network parameters are adjusted, and the main network and the auxiliary learning network are continuously optimized;
S5: extracting the corresponding face region image from each image of the facial expression data set with facial expression labels by using a face detection and positioning algorithm, and inputting the face region images into the main network for training; meanwhile, preprocessing the face region images to obtain images with the regions of significant expression change, and inputting these into the auxiliary learning network for training; the main network and the auxiliary learning network are trained in an alternate training mode;
In this embodiment, the CK+ data set is adopted as the facial expression data set: the CK+ sequences are split into frames, and the last 3 frames of each sequence are taken as the labelled expression data set; then the Adaboost face detection and positioning algorithm in OpenCV is used to extract the corresponding face region image from each image of the collected facial expression data set CK+, which removes, to a certain extent, the influence of background noise on facial expression recognition.
S6: inputting the facial expression image to be recognized into the main network to complete facial expression recognition.
In step S3 of this embodiment, the weighted fusion is calculated as follows:

F_fused^i = α·F_main^i + (1 − α)·F_aux^i

wherein: α is a weighting factor; F_main^i is the feature output of the i-th layer of the main network, F_aux^i is the feature output of the i-th layer of the auxiliary network, and F_fused^i is the fused main-auxiliary feature vector, with i = 4, 5;
the fused features are input into the next layer as output features of the corresponding layer of the main network and continue to be propagated forwards.
Here α is 0.5, so that the features extracted by the fourth and fifth feature extraction layers of the main network and the auxiliary learning network are each weighted by 0.5.
In step S4 of this embodiment, the loss of the cross entropy loss function is calculated as follows:

Loss = −Σ_k y_k·log(p_k)

where y_k is the ground-truth indicator of expression class k and p_k is the corresponding Softmax output probability. The whole network aims to minimize the joint loss function of the main network and the auxiliary learning network:

argmin(Loss_main + Loss_auxiliary)

wherein: Loss_main is the loss function of the main network and Loss_auxiliary is the loss function of the auxiliary learning network.
The parameter information of the main network is continuously adjusted through optimization of the loss function, so that the main network for expression recognition focuses more attention on the regions with significant expression change and extracts expression features with better discriminability and robustness.
In step S5 of this embodiment, the regions with significant expression change mainly comprise the eye and eyebrow region and the region near the lips and mouth; the removed region is the area near the nose, because its contribution to expression recognition is very small. The retained upper and lower parts are spliced back into a complete face image region; after this preprocessing, the image with the significant expression change regions used as the auxiliary network input is obtained, containing the feature data of the eye and eyebrow region and of the lip and mouth region.
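The preprocessing described above — keep the eye/eyebrow band and the lip/mouth band, drop the nose band between them, and splice the remaining parts back into one image — can be sketched as follows; the fractional band boundaries are hypothetical:

```python
import numpy as np

def salient_region_image(face, eye_frac=0.45, mouth_frac=0.70):
    """Keep the upper band (eyes and eyebrows) and the lower band (lips and
    mouth), remove the nose band in between, and splice the two parts back
    into one image. The fractional cut points are illustrative assumptions,
    not values from the patent."""
    h = face.shape[0]
    upper = face[: int(h * eye_frac)]     # eye and eyebrow region
    lower = face[int(h * mouth_frac):]    # lip and mouth region
    return np.vstack([upper, lower])

face = np.arange(100 * 60).reshape(100, 60)   # hypothetical 100x60 face crop
salient = salient_region_image(face)           # 45 + 30 = 75 rows remain
```

The spliced image is what the auxiliary learning network receives as input during training.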
Because the main network and the auxiliary learning network share the parameters of the first 3 layers, the training adopts an alternate training strategy: the main network is trained three times, then the auxiliary learning network is trained once, and this cycle is repeated. In this embodiment, the face region image extracted from each image of the collected facial expression data set CK+ is input into the main network for training; meanwhile, the face region image is preprocessed to obtain the image with the significant expression change regions, which is input into the auxiliary learning network for training.
The main network and the auxiliary learning network form a recognition model; inputting the image to be recognized into this model realizes facial expression recognition.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (4)
1. A facial expression recognition method based on the auxiliary learning of regions with significant expression change, characterized in that the recognition method comprises the following steps:
s1: constructing a main network comprising 5 layers of feature extraction layers for extracting facial expression features, inputting the extracted high-level semantic features into a full connection layer, and inputting the features output by the full connection layer into a Softmax classification layer for expression classification operation to obtain an expression result judged by the network;
s2: constructing an auxiliary learning network comprising 5 layers of feature extraction layers for extracting significant expression features in the human face, inputting the extracted high-level semantic features into a full connection layer, and inputting the features output by the full connection layer into a Softmax classification layer for expression classification operation to obtain an expression result judged by the network;
s3: sharing the parameters of the first 3 feature extraction layers between the main network and the auxiliary learning network; then weighting and fusing the output features of the fourth and fifth layers of the auxiliary learning network with the output features of the fourth and fifth layers of the main network respectively, and inputting the fused features into the main network to continue the extraction of high-level semantic features;
s4: the main network and the auxiliary learning network both adopt a cross entropy loss function to judge the network loss, perform back propagation of the network according to the judgment result of the network loss, adjust the parameters of the main network and the auxiliary learning network, and continuously optimize the main network and the auxiliary learning network;
s5: respectively extracting the corresponding face region image from each image of the facial expression data set with facial expression labels by using a face detection and positioning algorithm, and inputting the face region images into the main network for training; meanwhile, preprocessing the face region images to obtain images with the regions of significant expression change, and inputting these into the auxiliary learning network for training; training the main network and the auxiliary learning network in an alternate training mode;
s6: inputting the facial expression image to be recognized into the main network to complete facial expression recognition;
in step S3, the weighted fusion is calculated as follows:

F_fused^i = α·F_main^i + (1 − α)·F_aux^i

wherein: α is a weighting factor; F_main^i is the feature output of the i-th layer of the main network, F_aux^i is the feature output of the i-th layer of the auxiliary network, and F_fused^i is the fused main-auxiliary feature vector, with i = 4, 5;
the fused features are input into the next layer for continuous forward propagation as output features of the corresponding layer of the main network;
the α is 0.5, so that the features extracted by the fourth and fifth layers of the main network and the auxiliary learning network are each weighted by 0.5;
in step S4, the loss of the cross entropy loss function is calculated as follows:

Loss = −Σ_k y_k·log(p_k)

where y_k is the ground-truth indicator of expression class k and p_k is the corresponding Softmax output probability. The whole network aims to minimize the joint loss function of the main network and the auxiliary learning network:

argmin(Loss_main + Loss_auxiliary)

wherein: Loss_main is the loss function of the main network and Loss_auxiliary is the loss function of the auxiliary learning network.
2. The facial expression recognition method based on the significant expression change region aided learning of claim 1, wherein: each of the 5 layers of feature extraction layers of the main network and the 5 layers of feature extraction layers of the auxiliary learning network comprises a convolution layer, a pooling layer, a Batch Normalization layer and a ReLU layer; and 5 layers of feature extraction layers of the main network and 5 layers of feature extraction layers of the auxiliary learning network are used for extracting facial expression features.
3. The facial expression recognition method based on the significant expression change region aided learning of claim 1, wherein: in step S5, the alternate training mode is to train the main network three times, then train the auxiliary learning network once, and repeat this cycle.
4. The facial expression recognition method based on the significant expression change region aided learning of claim 1, wherein: in step S5, the image with the significant expression change region comprises the feature data of the eye and eyebrow region and the feature data of the lip and mouth region.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811490141.0A CN109635709B (en) | 2018-12-06 | 2018-12-06 | Facial expression recognition method based on significant expression change area assisted learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635709A CN109635709A (en) | 2019-04-16 |
CN109635709B true CN109635709B (en) | 2022-09-23 |
Family
ID=66071879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811490141.0A Active CN109635709B (en) | 2018-12-06 | 2018-12-06 | Facial expression recognition method based on significant expression change area assisted learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635709B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112307889B (en) * | 2020-09-22 | 2022-07-26 | 北京航空航天大学 | Face detection algorithm based on small auxiliary network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015018372A (en) * | 2013-07-10 | 2015-01-29 | NEC Corporation | Expression extraction model learning device, expression extraction model learning method and computer program |
CN107292256A (en) * | 2017-06-14 | 2017-10-24 | Xidian University | Facial expression recognition method using a deep convolutional wavelet neural network based on an auxiliary task |
CN107316061A (en) * | 2017-06-22 | 2017-11-03 | South China University of Technology | An imbalanced-class ensemble method based on deep transfer learning |
CN107423727A (en) * | 2017-08-14 | 2017-12-01 | Henan Institute of Engineering | Neural-network-based method for recognizing complex facial expressions |
CN108921024A (en) * | 2018-05-31 | 2018-11-30 | Southeast University | Expression recognition method based on facial landmark information and dual-network joint training |
Non-Patent Citations (1)
Title |
---|
Attributes for Improved Attributes: A Multi-Task Network Utilizing Implicit and Explicit Relationships for Facial Attribute Classification; Emily Hand et al.; Thirty-First AAAI Conference on Artificial Intelligence; 2017-02-12; Vol. 31; entire text * |
Also Published As
Publication number | Publication date |
---|---|
CN109635709A (en) | 2019-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108829677B (en) | Multi-modal attention-based automatic image title generation method | |
CN111340814B (en) | RGB-D image semantic segmentation method based on multi-mode self-adaptive convolution | |
CN113158862B (en) | Multitasking-based lightweight real-time face detection method | |
CN113221639A (en) | Micro-expression recognition method for representative AU (AU) region extraction based on multitask learning | |
CN110059598A (en) | The Activity recognition method of the long time-histories speed network integration based on posture artis | |
CN111967272B (en) | Visual dialogue generating system based on semantic alignment | |
CN114360005B (en) | Micro-expression classification method based on AU region and multi-level transducer fusion module | |
CN107016046A (en) | Intelligent robot dialogue method and system based on visual display | |
CN112330718B (en) | CNN-based three-level information fusion visual target tracking method | |
CN109558805A (en) | Human bodys' response method based on multilayer depth characteristic | |
CN110175248A (en) | A kind of Research on face image retrieval and device encoded based on deep learning and Hash | |
CN109712108A (en) | A visual positioning method based on a diverse discriminative candidate-box generation network | |
CN112669343A (en) | Zhuang minority nationality clothing segmentation method based on deep learning | |
CN106127112A (en) | Data Dimensionality Reduction based on DLLE model and feature understanding method | |
CN109859222A (en) | Edge extracting method and system based on cascade neural network | |
CN111401116B (en) | Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network | |
CN110633689B (en) | Face recognition model based on semi-supervised attention network | |
CN116129289A (en) | Attention edge interaction optical remote sensing image saliency target detection method | |
CN110188791B (en) | Visual emotion label distribution prediction method based on automatic estimation | |
CN109635709B (en) | Facial expression recognition method based on significant expression change area assisted learning | |
CN114764941A (en) | Expression recognition method and device and electronic equipment | |
CN110472655A (en) | A kind of marker machine learning identifying system and method for border tourism | |
CN111901610B (en) | Parallel image description method based on multilayer encoder | |
Shao et al. | DCMSTRD: End-to-end Dense Captioning via Multi-Scale Transformer Decoding | |
CN109583406B (en) | Facial expression recognition method based on feature attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||