CN114049675B - Facial expression recognition method based on light-weight two-channel neural network - Google Patents

Facial expression recognition method based on light-weight two-channel neural network

Info

Publication number
CN114049675B
CN114049675B (application CN202111430259.6A)
Authority
CN
China
Prior art keywords
channel
layer
feature
network
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111430259.6A
Other languages
Chinese (zh)
Other versions
CN114049675A (en)
Inventor
樊春晓
王振兴
林杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202111430259.6A priority Critical patent/CN114049675B/en
Publication of CN114049675A publication Critical patent/CN114049675A/en
Application granted granted Critical
Publication of CN114049675B publication Critical patent/CN114049675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of expression recognition methods and discloses a facial expression recognition method based on a lightweight dual-channel neural network, comprising the following steps: S1, image preprocessing and construction of a graph structure; S2, construction of a GCN-based lightweight dual-channel network that automatically extracts the global and local features of the expression; S3, feature fusion and expression classification. The invention constructs a graph structure from the input expression image and uses the GCN to automatically extract two kinds of local features, facial-expression geometry and texture, avoiding interference from hand-crafted factors and thereby improving the accuracy of the expression classification results. The lightweight dual-channel network retains excellent classification performance despite its simplified architecture, few layers and small parameter count, and runs faster with better robustness.

Description

Facial expression recognition method based on light-weight two-channel neural network
Technical Field
The invention relates to the technical field of expression recognition methods, in particular to a facial expression recognition method based on a light-weight double-channel neural network.
Background
Existing traditional methods based on local features mainly encode and characterize the expression-variable areas of the face (such as the eyes, mouth and nose). However, the local facial features extracted by these methods are easily disturbed by human factors and can lose facial-expression information, making classification inaccurate. Deep learning methods based on global features take raw RGB face data as input, which improves recognition accuracy but also increases network complexity; and because facial-expression datasets are limited in size, overfitting occurs easily. Whether traditional methods based on local features or deep methods based on global features, these algorithms suffer from large performance fluctuations and poor robustness when facing complex scenes.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the defects of the prior art, the invention provides a facial expression recognition method based on a lightweight dual-channel neural network, which solves the problems of low accuracy and poor robustness in existing methods.
(II) Technical solution
To achieve the above purpose, the invention provides the following technical solution: a facial expression recognition method based on a lightweight dual-channel neural network, comprising the following operation steps:
S1, image preprocessing and construction of a graph structure
S11, preprocessing the face image of the input picture: first convert the face image to grayscale to reduce the data dimensionality;
S12, performing face detection and cropping to reduce the influence of face-irrelevant background information in the image on feature extraction;
S13, normalizing the cropped face image to a uniform size of 224x224 and using it as the input of the CNN channel;
S14, construction of graph structure
Detecting facial feature points in the input image and constructing a graph in which every two nodes are connected; the distance between feature points forms the weight of each edge, yielding a weighted adjacency matrix A ∈ R^(N×N) that represents the geometric features of the expression, while the pixel values around each feature point serve as the attributes of the graph nodes, yielding a node feature matrix X ∈ R^(N×D) that represents the texture features of the expression;
S2, constructing a GCN-based lightweight dual-channel network and automatically extracting the global and local features of the expression
S21, global feature channel-CNN channel
The CNN channel consists of 5 convolution units, each containing one convolution layer (3x3 kernel) and one max-pooling layer (2x2 kernel); a rectified linear unit serves as the activation function of each convolution layer; a vectorization layer flattens the multidimensional data into a one-dimensional global feature vector to ease the subsequent concatenation of feature vectors; and a batch normalization layer is added;
In batch training, the activations of each batch are normalized to zero mean and unit variance. For an m-dimensional input X = {x^(1), ..., x^(m)}, each dimension is regularized as

x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)])

where E and Var are the expectation and variance of the input X. The input to a layer in the CNN has four dimensions, so each dimension is normalized separately. By using batch normalization, all samples in a mini-batch are tied together, so the network does not produce a deterministic result from a single training sample: the output of a sample no longer depends only on the sample itself but also on the other samples in the same batch, and since the batch is drawn randomly each time, this avoids overfitting to some extent;
S22, local feature channel-GCN channel
The graph convolutional network is conceptually similar to an ordinary convolutional neural network; for the node feature matrix X and the weighted adjacency matrix A, the layer-wise propagation rule is:

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )

The GCN channel is specifically composed of 4 graph convolution layers;
S3, feature fusion and expression classification
S31, concatenating the global features and local features extracted by the two channels to obtain a connection feature vector;
S32, feeding it into a fully connected layer for feature fusion and expression classification to obtain the final classification result.
Preferably, in the step S2, the four dimensions refer to batch size, channels, width and height.
Preferably, in step S2, Ã = A + I_N, where I_N is the identity matrix; D̃ is the degree matrix of Ã, given by D̃_ii = Σ_j Ã_ij; W^(l) is a trainable weight matrix; H^(l) is the feature of each layer, and for the input layer H^(0) = X; σ denotes an activation function, e.g. ReLU(·) = max(0, ·).
Preferably, in step S3, the connection feature vector can be expressed as v_c = (v_g, v_l), where v_c, v_g and v_l denote the connection feature vector, the global feature vector and the local feature vector, respectively.
(III) beneficial effects
The invention provides a facial expression recognition method based on a lightweight dual-channel neural network, with the following beneficial effects:
(1) The invention constructs a graph structure from the input expression image and uses the GCN to automatically extract two kinds of local features, facial-expression geometry and texture, avoiding interference from hand-crafted factors and improving the accuracy of the expression classification results.
(2) The invention extracts local and global features simultaneously through two channels and fuses them into a comprehensive representation, obtaining better recognition results than methods using a single type of feature.
(3) The lightweight dual-channel network retains excellent classification performance despite its simplified architecture, few layers and small parameter count, and runs faster with better robustness.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a component diagram of the present invention;
FIG. 3 is a detailed table of CNN channels of the present invention;
fig. 4 is a GCN channel detailed information table of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1-4, the present invention provides a technical solution: the facial expression recognition method based on the light-weight double-channel neural network comprises the following operation steps:
S1, image preprocessing and construction of a graph structure
S11, preprocessing the face image of the input picture to better extract facial-expression characteristics: first convert the face image to grayscale to reduce the data dimensionality;
S12, performing face detection and cropping to reduce the influence of face-irrelevant background information in the image on feature extraction;
S13, normalizing the cropped face image to a uniform size of 224x224 and using it as the input of the CNN channel;
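As a minimal sketch of steps S11 to S13, the preprocessing can be expressed in plain NumPy. This is an illustrative assumption-laden version: a real pipeline would typically use a detector library (e.g. OpenCV or dlib) to produce the face box, and proper interpolation rather than the nearest-neighbour resize used here.

```python
import numpy as np

def preprocess_face(img_rgb, box, size=224):
    """Grayscale -> crop to the detected face box -> resize to size x size.
    box = (top, left, height, width) is assumed to come from a face detector.
    Nearest-neighbour resizing keeps the sketch dependency-free."""
    # standard luminance weights for RGB -> grey
    grey = img_rgb @ np.array([0.299, 0.587, 0.114])
    t, l, h, w = box
    face = grey[t:t + h, l:l + w]
    # nearest-neighbour index maps for the resize
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return face[np.ix_(rows, cols)]
```

The returned 224x224 array is what the CNN channel would consume as input.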
S14, construction of graph structure
Unlike the CNN, whose input is the whole picture, the input of the GCN is a graph structure. To construct a graph structure from a facial expression, facial feature points are detected in the input image and a graph is constructed as in fig. 2. In fig. 2, every two graph nodes are connected and the distances between feature points form the weights of the edges, yielding a weighted adjacency matrix A ∈ R^(N×N) that represents the geometric features of the expression; the pixel values around each feature point are the attributes of the graph nodes, yielding a node feature matrix X ∈ R^(N×D) that represents the texture features of the expression;
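The graph construction of S14 can be sketched as follows. The 5x5 pixel neighbourhood used as the node attribute is an assumption (the patent does not fix the patch size), and `landmarks` is assumed to come from a facial feature-point detector.

```python
import numpy as np

def build_graph(grey, landmarks, patch=5):
    """Fully connected graph over facial landmarks.
    Edge weight = Euclidean distance between landmark coordinates
    (weighted adjacency matrix A); node attribute = flattened pixel
    patch around the landmark (node feature matrix X)."""
    pts = np.asarray(landmarks, dtype=float)          # (N, 2) as (row, col)
    diff = pts[:, None, :] - pts[None, :, :]
    A = np.sqrt((diff ** 2).sum(-1))                  # (N, N) pairwise distances
    r = patch // 2
    padded = np.pad(grey, r, mode="edge")             # handle border landmarks
    X = np.stack([
        padded[int(y):int(y) + patch, int(x):int(x) + patch].ravel()
        for y, x in pts
    ])                                                # (N, patch*patch)
    return A, X
```

A is symmetric with a zero diagonal, as expected for a distance-weighted undirected graph.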
S2, constructing a GCN-based lightweight dual-channel network and automatically extracting the global and local features of the expression
S21, global feature channel-CNN channel
The CNN channel consists of 5 convolution units, each containing one convolution layer and one max-pooling layer, with a 3x3 convolution kernel and a 2x2 pooling kernel; the details of this channel are set forth in fig. 3. A rectified linear unit serves as the activation function of each convolution layer, and a vectorization layer flattens the multidimensional data into a one-dimensional global feature vector to ease the subsequent concatenation of feature vectors. In addition, a batch normalization layer is added to address the large intra-class and small inter-class differences of facial expressions. Unlike the face recognition task, where one category represents a single person, in facial expression recognition one category contains many individuals; images belonging to the same expression class can therefore differ in appearance, gender, skin color and age, creating large intra-class differences.
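Each of the 5 convolution units halves the spatial resolution through its 2x2 max pooling. Assuming the 3x3 convolutions preserve spatial size via padding (an assumption; the patent does not state the padding), the feature-map sizes for a 224x224 input can be checked with a short sketch:

```python
def cnn_channel_shapes(size=224, units=5):
    """Spatial size of the feature map after each of the 5 convolution
    units: only the 2x2 max pool (stride 2) changes the resolution."""
    shapes = []
    for _ in range(units):
        size = size // 2          # 2x2 pooling halves each spatial dimension
        shapes.append(size)
    return shapes
```

For a 224x224 input this yields maps of 112, 56, 28, 14 and finally 7, so the vectorization layer flattens a 7x7 map per channel.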
In batch training, the activations of each batch are normalized to zero mean and unit variance. For an m-dimensional input X = {x^(1), ..., x^(m)}, each dimension is regularized as

x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)])

where E and Var are the expectation and variance of the input X. The input to a layer in the CNN has four dimensions (batch size, channels, width and height), so each dimension is normalized separately. By using batch normalization, all samples in a mini-batch are tied together, so the network does not produce a deterministic result from a single training sample: the output of a sample no longer depends only on the sample itself but also on the other samples in the same batch, and since the batch is drawn randomly each time, this avoids overfitting to some extent;
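The per-dimension normalization x̂ = (x − E[x]) / √Var[x] can be sketched directly over the batch axis. The small `eps` added for numerical stability is an assumption, as is the omission of the usual learnable scale and shift parameters:

```python
import numpy as np

def batch_norm(X, eps=1e-5):
    """Normalise each dimension of an m-sample batch (rows = samples)
    to zero mean and unit variance: x_hat = (x - E[x]) / sqrt(Var[x] + eps)."""
    return (X - X.mean(axis=0)) / np.sqrt(X.var(axis=0) + eps)
```

After normalization each column of the batch has mean approximately 0 and variance approximately 1, illustrating how a sample's output depends on the statistics of the whole batch.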
S22, local feature channel-GCN channel
The graph convolutional network is conceptually similar to an ordinary convolutional neural network; for the node feature matrix X and the weighted adjacency matrix A, the layer-wise propagation rule is:

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )

where Ã = A + I_N and I_N is the identity matrix; D̃ is the degree matrix of Ã, given by D̃_ii = Σ_j Ã_ij; W^(l) is a trainable weight matrix; H^(l) is the feature of each layer, and for the input layer H^(0) = X; σ denotes an activation function, e.g. ReLU(·) = max(0, ·).
The GCN channel is specifically composed of 4 graph convolution layers; detailed information is shown in fig. 4;
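A single propagation step of the GCN rule H^(l+1) = σ(D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l)) can be sketched in NumPy. The ReLU activation is the example named in the text; the weight matrix would be learned in practice and is a placeholder here:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution propagation step with symmetric
    normalisation: A~ = A + I, D~_ii = sum_j A~_ij."""
    A_tilde = A + np.eye(A.shape[0])                  # add self-loops
    d = A_tilde.sum(axis=1)                           # degree vector of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    H_next = D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)                    # ReLU activation
```

Stacking four such layers (with its own W per layer) would mirror the 4-layer GCN channel; the final layer's node features would be pooled into the local feature vector v_l.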
S3, feature fusion and expression classification
S31, concatenating the global features and local features extracted by the two channels to obtain a connection feature vector, which can be expressed as:

v_c = (v_g, v_l)

where v_c, v_g and v_l denote the connection feature vector, the global feature vector and the local feature vector, respectively;
S32, feeding it into a fully connected layer for feature fusion and expression classification to obtain the final classification result.
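Steps S31 and S32 can be sketched as concatenation (v_c = (v_g, v_l)) followed by one fully connected layer with a softmax. The 7-class output and the weight values are illustrative assumptions; the patent does not fix the number of expression classes:

```python
import numpy as np

def fuse_and_classify(v_g, v_l, W, b):
    """Concatenate the global and local feature vectors, then apply a
    fully connected layer and a numerically stable softmax."""
    v_c = np.concatenate([v_g, v_l])                  # v_c = (v_g, v_l)
    logits = W @ v_c + b
    e = np.exp(logits - logits.max())                 # subtract max for stability
    return e / e.sum()                                # class probabilities
```

The output is a probability distribution over expression classes; the argmax gives the final classification result.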
The method solves the low accuracy and poor robustness of existing methods. The GCN channel extracts local features and the CNN channel extracts global features; the dual-channel neural network fuses the global and local features into a comprehensive representation, yielding better recognition results than a single type of feature. In addition, the compact lightweight dual-channel network simplifies the architecture, addressing both network complexity and overfitting, and is highly robust.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A facial expression recognition method based on a light-weight double-channel neural network, characterized by comprising the following operation steps:
S1, image preprocessing and construction of a graph structure
S11, preprocessing the face image of the input picture: first convert the face image to grayscale to reduce the data dimensionality;
S12, performing face detection and cropping to reduce the influence of face-irrelevant background information in the image on feature extraction;
S13, normalizing the cropped face image to a uniform size of 224x224 and using it as the input of the CNN channel;
S14, construction of graph structure
Detecting facial feature points in the input image and constructing a graph in which every two nodes are connected; the distance between feature points forms the weight of each edge, yielding a weighted adjacency matrix A ∈ R^(N×N) that represents the geometric features of the expression, while the pixel values around each feature point serve as the attributes of the graph nodes, yielding a node feature matrix X ∈ R^(N×D) that represents the texture features of the expression;
S2, constructing a GCN-based lightweight dual-channel network and automatically extracting the global and local features of the expression
S21, global feature channel-CNN channel
The CNN channel consists of 5 convolution units, each containing one convolution layer (3x3 kernel) and one max-pooling layer (2x2 kernel); a rectified linear unit serves as the activation function of each convolution layer; a vectorization layer flattens the multidimensional data into a one-dimensional global feature vector to ease the subsequent concatenation of feature vectors; and a batch normalization layer is added;
In batch training, the activations of each batch are normalized to zero mean and unit variance. For an m-dimensional input X = {x^(1), ..., x^(m)}, each dimension is regularized as

x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)])

where E and Var are the expectation and variance of the input X. The input to a layer in the CNN has four dimensions, so each dimension is normalized separately. By using batch normalization, all samples in a mini-batch are tied together, so the network does not produce a deterministic result from a single training sample: the output of a sample no longer depends only on the sample itself but also on the other samples in the same batch, and since the batch is drawn randomly each time, this avoids overfitting to some extent;
S22, local feature channel-GCN channel
The graph convolutional network is conceptually similar to an ordinary convolutional neural network; for the node feature matrix X and the weighted adjacency matrix A, the layer-wise propagation rule is:

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )

The GCN channel is specifically composed of 4 graph convolution layers;
S3, feature fusion and expression classification
S31, concatenating the global features and local features extracted by the two channels to obtain a connection feature vector;
S32, feeding it into a fully connected layer for feature fusion and expression classification to obtain the final classification result.
2. The facial expression recognition method based on a lightweight two-channel neural network according to claim 1, wherein in the step S2, four dimensions refer to batch size, channels, width and height.
3. The facial expression recognition method based on the lightweight two-channel neural network according to claim 1, wherein in step S2, Ã = A + I_N, where I_N is the identity matrix; D̃ is the degree matrix of Ã, given by D̃_ii = Σ_j Ã_ij; W^(l) is a trainable weight matrix; H^(l) is the feature of each layer, and for the input layer H^(0) = X; σ denotes an activation function, e.g. ReLU(·) = max(0, ·).
4. The facial expression recognition method based on the lightweight two-channel neural network according to claim 1, wherein in step S3, the connection feature vector can be expressed as:
v_c = (v_g, v_l)
where v_c, v_g and v_l denote the connection feature vector, the global feature vector and the local feature vector, respectively.
CN202111430259.6A 2021-11-29 2021-11-29 Facial expression recognition method based on light-weight two-channel neural network Active CN114049675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111430259.6A CN114049675B (en) 2021-11-29 2021-11-29 Facial expression recognition method based on light-weight two-channel neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111430259.6A CN114049675B (en) 2021-11-29 2021-11-29 Facial expression recognition method based on light-weight two-channel neural network

Publications (2)

Publication Number Publication Date
CN114049675A CN114049675A (en) 2022-02-15
CN114049675B (en) 2024-02-13

Family

ID=80211583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111430259.6A Active CN114049675B (en) 2021-11-29 2021-11-29 Facial expression recognition method based on light-weight two-channel neural network

Country Status (1)

Country Link
CN (1) CN114049675B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612987A (en) * 2022-03-17 2022-06-10 深圳集智数字科技有限公司 Expression recognition method and device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN108491835A (en) * 2018-06-12 2018-09-04 常州大学 Binary channels convolutional neural networks towards human facial expression recognition
WO2019196308A1 (en) * 2018-04-09 2019-10-17 平安科技(深圳)有限公司 Device and method for generating face recognition model, and computer-readable storage medium
CN112766220A (en) * 2021-02-01 2021-05-07 西南大学 Dual-channel micro-expression recognition method and system, storage medium and computer equipment

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN109815785A (en) * 2018-12-05 2019-05-28 四川大学 A kind of face Emotion identification method based on double-current convolutional neural networks

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
WO2019196308A1 (en) * 2018-04-09 2019-10-17 平安科技(深圳)有限公司 Device and method for generating face recognition model, and computer-readable storage medium
CN108491835A (en) * 2018-06-12 2018-09-04 常州大学 Binary channels convolutional neural networks towards human facial expression recognition
CN112766220A (en) * 2021-02-01 2021-05-07 西南大学 Dual-channel micro-expression recognition method and system, storage medium and computer equipment

Non-Patent Citations (2)

Title
A facial expression recognition method based on an improved convolutional neural network; Zou Jiancheng; Cao Xiuling; Journal of North China University of Technology; 2020-04-15 (02); full text *
Dual-channel convolutional neural network for facial expression recognition; Cao Jinmeng; Ni Rongrong; Yang Biao; Journal of Nanjing Normal University (Engineering and Technology Edition); 2018-09-20 (03); full text *

Also Published As

Publication number Publication date
CN114049675A (en) 2022-02-15

Similar Documents

Publication Publication Date Title
US11417148B2 (en) Human face image classification method and apparatus, and server
CN110288018B (en) WiFi identity recognition method fused with deep learning model
CN108615010B (en) Facial expression recognition method based on parallel convolution neural network feature map fusion
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
WO2021082480A1 (en) Image classification method and related device
CN112580590A (en) Finger vein identification method based on multi-semantic feature fusion network
CN110084266B (en) Dynamic emotion recognition method based on audio-visual feature deep fusion
CN104268593A (en) Multiple-sparse-representation face recognition method for solving small sample size problem
CN105956570B (en) Smiling face's recognition methods based on lip feature and deep learning
CN110717423B (en) Training method and device for emotion recognition model of facial expression of old people
CN111160130B (en) Multi-dimensional collision recognition method for multi-platform virtual identity account
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
CN112842348B (en) Automatic classification method for electrocardiosignals based on feature extraction and deep learning
CN116052218B (en) Pedestrian re-identification method
CN115862091A (en) Facial expression recognition method, device, equipment and medium based on Emo-ResNet
CN111368734B (en) Micro expression recognition method based on normal expression assistance
CN111694977A (en) Vehicle image retrieval method based on data enhancement
CN114049675B (en) Facial expression recognition method based on light-weight two-channel neural network
CN110349176B (en) Target tracking method and system based on triple convolutional network and perceptual interference learning
Guo et al. Multifeature extracting CNN with concatenation for image denoising
CN111523483A (en) Chinese food dish image identification method and device
CN113378620B (en) Cross-camera pedestrian re-identification method in surveillance video noise environment
CN114241564A (en) Facial expression recognition method based on inter-class difference strengthening network
CN113763417B (en) Target tracking method based on twin network and residual error structure
CN111914617B (en) Face attribute editing method based on balanced stack type generation type countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant