CN113469116A - Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network

Info

Publication number
CN113469116A
CN113469116A (application CN202110817351.1A)
Authority
CN
China
Prior art keywords
convolution
layer
facial expression
neural network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110817351.1A
Other languages
Chinese (zh)
Inventor
Huo Hua (霍华)
Yu Yali (于亚丽)
Liu Junqiang (刘俊强)
Liu Zhonghua (刘中华)
Yu Chunhao (于春豪)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Science and Technology filed Critical Henan University of Science and Technology
Priority to CN202110817351.1A priority Critical patent/CN113469116A/en
Publication of CN113469116A publication Critical patent/CN113469116A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a facial expression recognition method combining LBP features with a lightweight neural network. Following the factorization idea of the Xception model, the method splits each full convolution layer into two partial convolutions, a depthwise convolution and a pointwise convolution, for feature extraction, greatly reducing the parameter count and the computational cost. Meanwhile, to solve the problems of vanishing and exploding gradients and to improve gradient propagation between convolutional layers, the method adopts the inverted residual structure with a linear bottleneck from MobileNetV2. Moreover, because a fully-connected layer overfits easily, causing both a heavy dependence on dropout regularization and harm to the generalization ability of the whole network, global average pooling replaces the CNN fully-connected layer: all pixel values of each channel's feature map are averaged, and the resulting feature vector is sent directly to the Softmax layer for classification, strengthening the correspondence between feature maps and categories and preventing overfitting of the overall structure.

Description

Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network
Technical Field
The invention relates to the technical field of image processing, in particular to a facial expression recognition method combining LBP characteristics and a lightweight neural network.
Background
With changes in how people interact, research on facial expression recognition is receiving more and more attention. Studies have shown that facial expressions convey a very large share of the information in communication, up to 55%, while 38% derives from the speaker's tone, voice and rhythm, and only 7% depends on the spoken content. As a form of non-verbal communication, facial expressions convey information that helps listeners infer the speaker's emotional changes and intentions, complementing language. Human facial expressions therefore play a crucial role in interpersonal communication. In 1971, the psychologists Ekman et al. first proposed that humans have six major emotions, each reflecting a unique expression of distinct mental activity. These six emotions, called the basic emotions, are anger, happiness, sadness, surprise, disgust and fear. Facial expression recognition is an interdisciplinary field spanning artificial intelligence, neuroscience and computer science; it has broad applications in psychoanalysis, clinical medicine, vehicle monitoring and commerce, and is widely used in practical scenarios such as human-computer interaction, online education, intelligent monitoring and medical services.
Extracting expression features is the key step in facial expression recognition, and expression classification is the goal; effective feature extraction improves classification accuracy. Current methods fall into two categories. The first is traditional facial expression recognition with hand-crafted features, which designs a suitable feature extraction algorithm for particular feature requirements and combines it with different classifiers, such as the Gabor wavelet transform, Local Binary Patterns (LBP) and Active Appearance Models (AAM); these traditional methods struggle to extract high-level features, so part of the expression information is lost and the accuracy of facial expression classification suffers. The second is facial expression recognition based on deep learning, whose main concern is how to build and train a deep neural network model: the facial expression features extracted by a deep network are easier to visualize and better capture the essence of facial expression, yielding hierarchical expression features, after which a fast and accurate classifier is designed and the classification result is output.
With the development of artificial intelligence, deep neural networks have gradually become a research hotspot in image processing. A deep neural network can abstract low-level features into more effective feature representations through the nonlinear relations among different learning tasks, replacing traditional hand-crafted features. At the same time, training a model with a multilayer structure on large amounts of data yields effective features that improve recognition accuracy. Solving the difficulty and incompleteness of feature extraction in traditional methods with an improved lightweight neural network model is therefore a research hotspot in this field.
Disclosure of Invention
In view of this, the present invention aims to provide a facial expression recognition method combining LBP features and a lightweight neural network, which solves the weak robustness and weak generalization ability of recognition models in the prior art and improves the accuracy of facial expression recognition in scenes closer to real life.
To achieve this purpose, the invention adopts the following technical scheme: a facial expression recognition method combining LBP features and a lightweight neural network, comprising the following steps:
Step 1: acquiring the facial expression data set Fer2013 and preprocessing the facial expression pictures in the data set to obtain original image data;
Step 2: extracting local texture features of the face region from the original image data obtained in step 1 using rotation-invariant LBP (local binary patterns);
Step 3: randomly dividing the data set obtained in step 2 into a training set and a test set, and performing data enhancement on the training set;
Step 4: inputting the data set processed in step 3 into the improved lightweight neural network for feature extraction and constructing a model; then compiling the model, training it with the augmented training set, and testing it with the test set;
Step 5: extracting facial expression features with the model obtained in step 4, and classifying the extracted features with a Softmax classifier to obtain the facial expression recognition result.
Further, the preprocessing of the facial expression pictures in step 1 specifically includes the following steps:
1.1, resizing each image to 48 × 48 and expanding the image dimension through Expand_dims(), so that the image input to the lightweight neural network is a 48 × 48 × 1 single-channel image;
1.2, obtaining the Emotion expression labels of the expression data set and the one-hot code of each expression through Get_dummy();
1.3, image graying: converting the three-channel color image into a single-channel grayscale image through a color space conversion function;
1.4, pixel normalization: adjusting feature scales of different dimensions to a similar range through normalization, i.e. dividing each pixel value of the image by 255 to normalize it into the 0-1 interval.
Further, in step 3 the data set is randomly divided into a training set and a test set with an 8:2 ratio of facial expression pictures, after which real-time data enhancement is applied to the facial expression pictures in the training set.
Further, the feature extraction in step 4 specifically includes the following steps:
4.1, inputting the data set processed in step 3 into the improved lightweight neural network, first performing a two-dimensional convolution through a 2D convolution layer with kernel size 3 × 3, 32 kernels and stride 1, giving a convolution output of size 48 × 48 × 32;
then passing the output through 7 inverted residual modules, each comprising a Bottleneck layer, the Bottleneck layers being repeated 1, 2, 3, 4, 3, 3 and 1 times respectively, with strides of 1, 1, 1, 2, 1, 2 and 1 and channel numbers of 16, 24, 32, 64, 96, 160 and 320 in sequence;
4.2, then feeding the output of the 7 inverted residual modules into a pointwise convolution layer with 512 channels and a global average pooling layer that replaces the fully-connected layer;
4.3, finally feeding the output into a pointwise convolution layer with 7 channels, wherein except for the dimension-reducing 1 × 1 convolution, every 2D convolution, dimension-raising 1 × 1 convolution and depthwise convolution operation passes through a normalization layer and a ReLU6 activation layer to accelerate network convergence and strengthen the extraction of nonlinear features, while the dimension-reducing 1 × 1 convolution is only normalized and passes through no nonlinear activation layer, finally forming a 1 × 512 feature result.
Further, a Bottleneck layer comprises two pointwise convolution layers and one depthwise convolution layer; the kernel size of the depthwise convolution is 3 × 3 and that of the pointwise convolutions is 1 × 1.
Further, in step 4, the model is compiled through compile().
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a facial expression recognition method combining LBP features and a lightweight neural network, addressing the large number of training parameters and the heavy computation of conventional neural models;
2. To solve the problems of vanishing and exploding gradients and to improve gradient propagation between convolutional layers, the method adopts the inverted residual structure with a linear bottleneck from MobileNetV2;
3. Because a fully-connected layer overfits easily, causing both a heavy dependence on Dropout regularization and harm to the generalization ability of the whole network, global average pooling replaces the CNN fully-connected layer: all pixel values of each channel's feature map are averaged, and the resulting feature vector is sent directly to the Softmax layer for classification, strengthening the correspondence between feature maps and categories and preventing overfitting of the overall structure;
4. For the data, the method experiments on the Fer2013 data set, which better matches real-life scenes, improving facial expression recognition accuracy and giving stronger robustness and generalization ability.
Drawings
FIG. 1 is a flow chart of the facial expression recognition method of the present invention combining LBP features and a lightweight neural network;
FIG. 2 is a schematic diagram of texture feature extraction using rotation-invariant LBP;
FIG. 3 is a schematic view of rotation-invariant LBP;
FIG. 4 is a schematic comparison of an original image and the corresponding rotation-invariant LBP feature image;
FIG. 5 is the confusion matrix computed on the test set;
FIG. 6 shows part of the Fer2013 data set after preprocessing;
FIG. 7 is a schematic flow chart of inputting the LBP feature images processed in step 3 into the improved lightweight neural network for processing.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all of them; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The principle of the invention is as follows. To solve the difficulty and incompleteness of feature extraction in traditional methods, the invention designs a model combining LBP features with an improved lightweight neural network; the input picture, after LBP feature extraction, is a 48 × 48 single-channel grayscale image. Following the factorization idea of the Xception model, the full convolution layer is split into two partial convolutions, a depthwise convolution and a pointwise convolution, to extract features. Meanwhile, to solve the problems of vanishing and exploding gradients and to improve gradient propagation between convolutional layers, the model adopts the inverted residual structure with a linear bottleneck from MobileNetV2. To strengthen the correspondence between feature maps and categories, the last layer of the network replaces the CNN fully-connected layer with global average pooling: all pixel values of each channel's feature map are averaged, and the resulting feature vector is sent directly to the Softmax layer for expression classification. Most facial expression recognition work uses other, noise-free data sets; this network model uses the Fer2013 data set, provided for the 2013 Kaggle facial expression analysis competition, most of which was collected by web crawlers and therefore contains some noise, such as non-face pictures, cartoon faces, profile faces, and overly dark or bright faces. However, the Fer2013 data is more complete and better matches real-life scenes, so Fer2013 is chosen for training and testing the model.
The invention provides a facial expression recognition method combining LBP characteristics and a lightweight neural network, which comprises the following specific implementation methods:
Fig. 1 is a schematic flow chart of the facial expression recognition method combining LBP features and a lightweight neural network according to the present invention, which mainly includes:
Step 1: acquiring the facial expression data set Fer2013 and preprocessing the facial expression pictures in the data set to obtain original image data; Fig. 6 shows part of the Fer2013 data set after preprocessing;
Step 2: extracting local texture features of the face region from the original image data obtained in step 1 using rotation-invariant LBP (local binary patterns);
Step 3: randomly dividing the data set obtained in step 2 into a training set and a test set, and performing data enhancement on the training set;
Step 4: inputting the data set processed in step 3 into the improved lightweight neural network for feature extraction and constructing a model; then compiling the model, training it with the augmented training set, and testing it with the test set;
Step 5: extracting facial expression features with the model obtained in step 4, and classifying the extracted features with a Softmax classifier to obtain the facial expression recognition result.
The implementation of each step of the invention is detailed as follows:
Step 1: acquiring the facial expression data set Fer2013 and preprocessing the facial expression pictures in the data set to obtain original image data.
Fer2013 is a data set provided for a Kaggle facial expression analysis competition. Most of its images were collected by web crawlers, so it contains some noise, such as non-face pictures, cartoon faces, profile faces, and overly dark or bright faces. The data set does not provide pictures directly; instead the expression labels, picture data and usage are stored in a csv file whose first row is a header explaining the meaning of each column: the first column is the expression label, the second column is the original picture pixel data, and the last column is the usage. The Fer2013 facial expression data set consists of 35887 facial expression pictures: 28709 training images (Training), 3589 public test images (PublicTest) and 3589 private test images (PrivateTest). Each picture is a grayscale image with size fixed at 48 × 48. There are 7 expressions, corresponding to the numeric labels 0-6: 0 - angry; 1 - disgust; 2 - fear; 3 - happy; 4 - sad; 5 - surprised; 6 - neutral.
Firstly, the Fer2013 data set is loaded: the pixel values in the second column of the csv file are read, the image list is traversed, and each pixel string is converted into a 48 × 48 matrix; the image dimension is then expanded through Expand_dims(), so that the picture input to the lightweight neural network is a 48 × 48 × 1 single-channel image. The Emotion expression labels in the first column of the expression data set are read, and the One-Hot Encoding of each expression is obtained through Get_dummy().
Secondly, after the pixel values and expression labels are processed, pixel normalization is performed: feature scales of different dimensions are adjusted to a similar range by dividing each pixel value of the image by 255, normalizing it into the 0-1 interval. Because image normalization preserves affine invariance, the information stored in the image is unchanged by normalization.
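As an illustration, a minimal Python sketch of this loading and preprocessing flow follows, assuming the standard fer2013.csv layout (columns emotion, pixels, Usage); the file path and variable names are illustrative, not taken from the patent.

    import numpy as np
    import pandas as pd

    data = pd.read_csv("fer2013.csv")

    # Parse each space-separated pixel string into a 48 x 48 matrix.
    pixels = np.array([np.array(row.split(), dtype="float32").reshape(48, 48)
                       for row in data["pixels"]])

    # Expand the image dimension so the network sees 48 x 48 x 1 single-channel images.
    images = np.expand_dims(pixels, axis=-1)

    # Normalize every pixel value into the 0-1 interval.
    images /= 255.0

    # One-hot encode the 7 emotion labels.
    labels = pd.get_dummies(data["emotion"]).values.astype("float32")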
Finally, when the trained model is used for real-time detection, face detection is first performed with an OpenCV cascade classifier: the picture is scaled and a 48 × 48 region is cropped, the face is framed with a rectangle, and the face coordinates and size are saved in a vector. The detected face image is then grayed: a color image is sensitive to the light source and easily affected by illumination, and using it directly would interfere with expression recognition, so the three-channel color image is converted into a single-channel grayscale image through a color space conversion function.
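A hedged sketch of this real-time detection step, assuming OpenCV's bundled Haar cascade for frontal faces; the frame source is illustrative:

    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    frame = cv2.imread("frame.jpg")                        # one captured frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)         # color space conversion to grayscale
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:                             # face coordinates and size
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))  # crop and scale to 48 x 48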
Step 2: extracting local texture features of the face region from the original image data obtained in step 1 using rotation-invariant LBP (local binary patterns).
To extract richer features from the image and improve recognition accuracy, rotation-invariant LBP is applied to the original image to extract local texture features of the face region. Fig. 4 shows a comparison between an original image and the corresponding rotation-invariant LBP feature image.
LBP is an operator used to describe texture features, first proposed by T. Ojala et al. in 1996; the features it extracts are local texture features of images. The original LBP operator is defined in a 3 × 3 window and covers only a small area within a fixed radius, which obviously cannot satisfy textures of different sizes and frequencies; moreover, the original operator is grayscale-invariant but not rotation-invariant, so rotating the image yields different LBP values. To adapt to texture features of different scales while achieving both grayscale and rotation invariance, a rotation-invariant LBP operator, also called the LBP rotation-invariant pattern, is adopted.
In the LBP rotation-invariant pattern, the 3 × 3 neighborhood is extended to an arbitrary neighborhood and the square neighborhood is replaced by a circular one. The improved LBP operator allows P pixels in a circular neighborhood of radius R, denoted (P, R); the circular neighborhood is rotated continuously to obtain a series of initially defined LBP values, and the minimum is taken as the LBP value of the neighborhood. The principle of LBP texture feature extraction is shown in Fig. 2, where black and white denote pixels weaker and stronger than the central pixel. An image area is flat (i.e. featureless) when the surrounding pixels are all black or all white. A contiguous run of black or white pixels is considered a "uniform" pattern and can be interpreted as a corner or edge; if the pixels switch back and forth between black and white, the area is considered "non-uniform".
Expressed mathematically: given a central pixel (xc, yc), with P sampling points and a sampling circle of radius R, the resulting LBP value in decimal form is:

LBP(P,R)(xc, yc) = Σ_{p=0..P-1} s(ip - ic) * 2^p
where p indexes the P sampling points in the circular region, ic denotes the gray value of the central pixel of the circular neighborhood, ip the gray value of the p-th pixel on the circular boundary, and the function s(x) is defined as:

s(x) = 1 if x >= 0, and s(x) = 0 if x < 0
A total of P sampling points lie on the circular boundary, with coordinates computed by:

xp = xc + R * cos(2πp / P)
yp = yc - R * sin(2πp / P)
The value obtained from the original LBP is converted into a binary code, and a circular bit-shift operation is applied to it; the rotation-invariant LBP takes the minimum of all results:

LBP_ri(P,R) = min{ ROR(LBP(P,R), k) | k = 0, 1, ..., P-1 }

where ROR(x, k) circularly rotates the P-bit binary code x to the right k times.
Fig. 3 is a schematic view of rotation-invariant LBP. Taking P = 8 and R = 1 as an example, with white = 1 and black = 0, the numbers below the operators in the figure are the corresponding LBP values: the initially computed code of the point is 11100001 = 225; rotating it yields 8 LBP patterns, and the minimum of these, 15 (00001111), is taken as the LBP value of the point. The LBP value of the point is therefore 15 no matter how the picture is rotated; this is rotation invariance.
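A minimal sketch of this extraction step, using scikit-image's local_binary_pattern with the "ror" method, which takes the minimum over all circular bit rotations exactly as described above; P = 8, R = 1 matches the example of Fig. 3, and the helper name is illustrative.

    import numpy as np
    from skimage.feature import local_binary_pattern

    def extract_rotation_invariant_lbp(gray_image, P=8, R=1.0):
        # local_binary_pattern expects an integer-valued grayscale image.
        image = np.ascontiguousarray(gray_image, dtype=np.uint8)
        # method="ror": rotation-invariant LBP, the minimum over circular shifts.
        lbp = local_binary_pattern(image, P, R, method="ror")
        return lbp.astype(np.float32)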
Step 3: randomly dividing the data set obtained in step 2 into a training set and a test set, and performing data enhancement on the training set.
A cross-validation-style method is adopted to divide the data randomly and objectively, reducing human factors: 80% of the expression pictures form the training set and 20% the test set. The seed of the random number generator is set to 42 to fix the random state, so that the same random numbers are produced on every run; random number generation depends on the seed and obeys two rules: different seeds generate different random numbers, while the same seed generates the same random numbers even across different runs.
Because the amount of data is limited, an ImageDataGenerator performs real-time data enhancement on the divided training pictures, generating batches of tensor image data; the generator produces pictures batch by batch, the model is trained in generator form, and enhancement is applied to each batch of training pictures on the fly. Within each batch, the random rotation range of the image is set to 10 degrees, the horizontal and vertical translation range to 0.1, the random scaling range to 0.1, and random horizontal flipping is applied, effectively enlarging the data set and strengthening the generalization ability of the model.
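A sketch of the 8:2 split and the real-time augmentation described above, assuming scikit-learn and Keras; `images` and `labels` are the arrays from the preprocessing sketch, and the batch size is an assumption.

    from sklearn.model_selection import train_test_split
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Seed 42 fixes the random state, so the split is identical on every run.
    x_train, x_test, y_train, y_test = train_test_split(
        images, labels, test_size=0.2, random_state=42)

    datagen = ImageDataGenerator(
        rotation_range=10,        # random rotation within 10 degrees
        width_shift_range=0.1,    # horizontal translation range
        height_shift_range=0.1,   # vertical translation range
        zoom_range=0.1,           # random scaling range
        horizontal_flip=True)     # random horizontal flipping

    # The generator produces augmented batches on the fly during training.
    train_flow = datagen.flow(x_train, y_train, batch_size=64)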
Step 4: inputting the data set processed in step 3 into the improved lightweight neural network for feature extraction and constructing a model; the model is then compiled and trained with the augmented training set and tested with the test set. Fig. 5 shows the confusion matrix computed on the test set.
1) Separable convolution: following the factorization idea of the Xception model, the full convolution layer is split into two partial convolutions, a depthwise convolution layer and a pointwise convolution. The kernel size of the depthwise convolution is WK × HK × 1 and the number of kernels equals the number of input picture channels; each channel is convolved independently, so the number of feature maps produced in this step exactly equals the number of input channels. The pointwise convolution then combines the feature maps generated independently by each channel into new feature maps; its kernel size is 1 × 1 × M, where M is the number of channels of the previous layer, and there are as many output feature maps as kernels. Together, these two partial convolutions greatly reduce the parameter count and computation compared with conventional convolution.
The depthwise convolution parameter count P1 and computation count C1 (ignoring additions) are:
P1 = WK * HK * M
C1 = WK * HK * 1 * WF * HF * M
The pointwise convolution parameter count P2 and computation count C2 (ignoring additions) are:
P2 = 1 * 1 * M * N
C2 = 1 * 1 * M * WF * HF * N
The separable convolution (Xception) parameter count P3 and computation count C3 are:
P3 = P1 + P2
C3 = C1 + C2
The standard convolution parameter count P4 and computation count C4 are:
P4 = WK * HK * M * N
C4 = WF * HF * N * (WK * HK * M)
The ratio of separable convolution (Xception) parameters to standard convolution parameters is:

P3 / P4 = (WK * HK * M + M * N) / (WK * HK * M * N) = 1/N + 1/(WK * HK)    (1)
The ratio of separable convolution (Xception) computation to standard convolution computation is:

C3 / C4 = (WK * HK * WF * HF * M + M * WF * HF * N) / (WF * HF * N * WK * HK * M) = 1/N + 1/(WK * HK)    (2)
where K denotes the convolution kernel, F the input feature map, M the number of input picture channels, N the number of output channels, and WK = HK, WF = HF.
As can be seen from equations (1) and (2), splitting the convolution into depthwise and pointwise parts greatly reduces the parameter count and computation compared with conventional convolution.
2) Inverted residual structure with a linear bottleneck: to solve the problems of vanishing and exploding gradients and to improve gradient propagation between convolutional layers, an inverted residual structure is adopted: a 1 × 1 convolution first raises the dimension, a 3 × 3 depthwise convolution layer follows, and a final 1 × 1 convolution reduces the dimension, i.e. expansion first, then compression. Because the final 1 × 1 convolution performs dimension reduction, and the ReLU6 activation function adds useful nonlinearity only in high-dimensional space, whereas in low-dimensional space a linear mapping preserves features while a nonlinear mapping destroys them, the final 1 × 1 convolution is changed to a linear mapping without the ReLU6 activation function. The linear bottleneck and inverted residual structure optimize the network: the hierarchy is deeper, yet the model is smaller and faster.
The inverted residual structure parameter count P5 and computation count C5 (one depthwise and two pointwise convolutions) are:
P5 = P1 + 2 * P2
C5 = C1 + 2 * C2
3) The LBP feature images processed in step 3 are input to the improved lightweight neural network. A two-dimensional convolution is first performed by a 2D convolution layer with kernel size 3 × 3, 32 kernels and stride 1, giving a convolution output of size 48 × 48 × 32. The output then passes through 7 inverted residual modules (each containing a Bottleneck layer; one Bottleneck layer comprises two pointwise convolution layers and one depthwise convolution layer, with kernel sizes 3 × 3 for the depthwise convolution and 1 × 1 for the pointwise convolutions). The Bottleneck layers are repeated 1, 2, 3, 4, 3, 3 and 1 times respectively, with strides of 1, 1, 1, 2, 1, 2 and 1 and channel numbers of 16, 24, 32, 64, 96, 160 and 320 in sequence; Fig. 7 is a schematic flow chart of processing the LBP feature images in the improved lightweight neural network. The output of the 7 inverted residual modules is then fed into a pointwise convolution layer with 512 channels and a global average pooling layer that replaces the fully-connected layer; finally the output is fed into a pointwise convolution layer with 7 channels. Except for the dimension-reducing 1 × 1 convolution, every 2D convolution, dimension-raising 1 × 1 convolution and depthwise convolution operation passes through a normalization layer and a ReLU6 activation layer to accelerate network convergence and strengthen the extraction of nonlinear features, while the dimension-reducing 1 × 1 convolution is only normalized, with no nonlinear activation. Because the last layer at the end of the MobileNetV2 network is computationally heavy, this application uses 512 channels in the last layer, finally forming a 1 × 512 feature result.
In a conventional convolutional neural network, the feature maps of the last convolutional layer are vectorized, passed into fully-connected layers, and then fed to the Softmax layer for classification. However, fully-connected layers overfit easily, causing both a heavy dependence on Dropout regularization and harm to the generalization ability of the whole network. This application therefore replaces the CNN fully-connected layers with global average pooling: all pixel values of each channel's feature map are averaged to give a new 1 × 1 feature map, turning a W × H × D tensor into a 1 × 1 × D tensor, and the resulting vector is sent directly to the Softmax layer for classification. Compared with fully-connected layers, global average pooling has the following advantages:
First, by strengthening the correspondence between feature maps and categories, it is better suited to a convolutional structure.
Second, the global average pooling layer has no parameters to optimize, which reduces both the parameter count and the computation.
Third, global average pooling is itself a structural regularizer and inherently prevents overfitting of the overall structure.
Fourth, it aggregates global spatial information, making the network more robust to spatial variations in the input picture.
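To make the replacement concrete, a minimal Keras sketch of this network tail follows, under the same assumptions as the block sketch above: the 512-channel pointwise convolution, global average pooling in place of the fully-connected layer, and the 7-way Softmax. The final 7-channel pointwise convolution acting on the pooled 1 × 512 vector is written here as an equivalent Dense layer.

    from tensorflow.keras import layers

    def network_tail(features):
        h = layers.Conv2D(512, 1, use_bias=False)(features)  # pointwise, 512 channels
        h = layers.BatchNormalization()(h)
        h = layers.ReLU(6.0)(h)
        h = layers.GlobalAveragePooling2D()(h)  # mean of all pixels per channel: W x H x D -> 1 x D
        return layers.Dense(7, activation="softmax")(h)      # 7 expression classes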
4) The learning process is configured through compile(), selecting the cross-entropy loss function categorical_crossentropy. In machine learning, cross-entropy expresses the difference between the true probability distribution and the predicted distribution: the smaller the cross-entropy, the better the model's predictions. Cross-entropy is typically combined with a Softmax function, which processes the output so that the predicted values of the classes sum to 1, after which the loss is computed by cross-entropy. The model is then trained with the augmented training set: the fit_generator() function accepts batches of data, performs backpropagation, and updates the model weights, repeating the process until the desired number of epochs is reached.
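A sketch of this training configuration, assuming a Keras `model` assembled from the blocks above and the `train_flow` generator from the augmentation sketch; the Adam optimizer and the epoch count are assumptions, not specified in this passage.

    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # cross-entropy paired with Softmax
                  metrics=["accuracy"])

    # fit_generator() consumes augmented batches, backpropagates, and updates
    # the weights until the requested number of epochs is reached.
    model.fit_generator(train_flow,
                        epochs=100,
                        validation_data=(x_test, y_test))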
5) Forward propagation inputs image data through the input layer, propagates it forward, and outputs a result through the output layer. Forward propagation requires the input a[0], i.e. X, to initialize the input values of the first layer: a[0] corresponds to the input features of one training sample and A[0] to the input features of the whole training set, so these form the input to the first forward function in the chain below; repeating this step computes forward propagation layer by layer from left to right.
The steps of forward propagation are as follows:
z[L] = W[L] * a[L-1] + b[L]
a[L] = g[L](z[L])
where L denotes the L-th layer of the neural network, W the weights, a the activation unit, b the bias unit, and g the activation function.
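A NumPy illustration of one forward-propagation step from the formulas above; the layer sizes are illustrative.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)           # activation function g

    rng = np.random.default_rng(0)
    a_prev = rng.standard_normal((64, 1))   # a[L-1]: activations of layer L-1
    W = rng.standard_normal((32, 64))       # W[L]: weights of layer L
    b = np.zeros((32, 1))                   # b[L]: bias unit of layer L

    z = W @ a_prev + b                      # z[L] = W[L] * a[L-1] + b[L]
    a = relu(z)                             # a[L] = g[L](z[L])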
6) Backpropagation computes the difference between the forward-propagation output and the ground truth, called the loss or error. The error of the last layer is computed first, then the error of each layer is solved backwards layer by layer, and the weights are updated from the error values and the learning rate.
Table 1 compares the parameters of the model of the present invention with Xception and MobileNetV2:
[Table 1 is reproduced as an image in the original publication]
As can be seen from Table 1, the parameters of the model of the present invention are greatly reduced compared with Xception and MobileNetV2.
Step 5: extracting facial expression features with the model obtained in step 4, and classifying the extracted features with a Softmax classifier to obtain the facial expression recognition result.
Table 2 compares the test-set accuracy of different algorithm models on the Fer2013 data set (7 classes):
[Table 2 is reproduced as an image in the original publication]
As can be seen from Table 2, the model of the invention effectively extracts facial expression features and classifies expressions, with better accuracy and robustness. Although the model achieves good results, there is still much room for improvement, and research on feature extraction will continue in pursuit of higher accuracy.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A facial expression recognition method combining LBP features and a lightweight neural network, characterized by comprising the following steps:
Step 1: acquiring the facial expression data set Fer2013 and preprocessing the facial expression pictures in the data set to obtain original image data;
Step 2: extracting local texture features of the face region from the original image data obtained in step 1 using rotation-invariant LBP (local binary patterns);
Step 3: randomly dividing the data set obtained in step 2 into a training set and a test set, and performing data enhancement on the training set;
Step 4: inputting the data set processed in step 3 into the improved lightweight neural network for feature extraction and constructing a model; then compiling the model, training it with the augmented training set, and testing it with the test set;
Step 5: extracting facial expression features with the model obtained in step 4, and classifying the extracted features with a Softmax classifier to obtain the facial expression recognition result.
2. The facial expression recognition method combining LBP features and a lightweight neural network as claimed in claim 1, wherein the preprocessing of the facial expression pictures in the data set in step 1 specifically comprises the following steps:
1.1, resizing each image to 48 × 48 and expanding the image dimension through Expand_dims(), so that the image input to the lightweight neural network is a 48 × 48 × 1 single-channel image;
1.2, obtaining the Emotion expression labels of the expression data set and the one-hot code of each expression through Get_dummy();
1.3, image graying: converting the three-channel color image into a single-channel grayscale image through a color space conversion function;
1.4, pixel normalization: adjusting feature scales of different dimensions to a similar range through normalization, i.e. dividing each pixel value of the image by 255 to normalize it into the 0-1 interval.
3. The facial expression recognition method combining LBP features and a lightweight neural network as claimed in claim 1, wherein in step 3 the data set is randomly divided into a training set and a test set with an 8:2 ratio of facial expression pictures, after which real-time data enhancement is applied to the facial expression pictures in the training set.
4. The facial expression recognition method combining LBP features and a lightweight neural network as claimed in claim 1, wherein the feature extraction in step 4 specifically comprises the following steps:
4.1, inputting the data set processed in step 3 into the improved lightweight neural network, first performing a two-dimensional convolution through a 2D convolution layer with kernel size 3 × 3, 32 kernels and stride 1, giving a convolution output of size 48 × 48 × 32;
then passing the output through 7 inverted residual modules, each comprising a Bottleneck layer, the Bottleneck layers being repeated 1, 2, 3, 4, 3, 3 and 1 times respectively, with strides of 1, 1, 1, 2, 1, 2 and 1 and channel numbers of 16, 24, 32, 64, 96, 160 and 320 in sequence;
4.2, then feeding the output of the 7 inverted residual modules into a pointwise convolution layer with 512 channels and a global average pooling layer that replaces the fully-connected layer;
4.3, finally feeding the output into a pointwise convolution layer with 7 channels, wherein except for the dimension-reducing 1 × 1 convolution, every 2D convolution, dimension-raising 1 × 1 convolution and depthwise convolution operation passes through a normalization layer and a ReLU6 activation layer to accelerate network convergence and strengthen the extraction of nonlinear features, while the dimension-reducing 1 × 1 convolution is only normalized and passes through no nonlinear activation layer, finally forming a 1 × 512 feature result.
5. The method of claim 4, wherein a Bottleneck layer comprises two pointwise convolution layers and one depthwise convolution layer, the kernel size of the depthwise convolution being 3 × 3 and that of the pointwise convolutions 1 × 1.
6. The facial expression recognition method combining LBP features and a lightweight neural network as claimed in claim 4, wherein in step 4 the model is compiled through compile().
CN202110817351.1A 2021-07-20 2021-07-20 Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network Pending CN113469116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110817351.1A CN113469116A (en) 2021-07-20 2021-07-20 Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110817351.1A CN113469116A (en) 2021-07-20 2021-07-20 Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network

Publications (1)

Publication Number Publication Date
CN113469116A true CN113469116A (en) 2021-10-01

Family

ID=77881343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110817351.1A Pending CN113469116A (en) 2021-07-20 2021-07-20 Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network

Country Status (1)

Country Link
CN (1) CN113469116A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963442A (en) * 2021-10-25 2022-01-21 重庆科技学院 Fall-down behavior identification method based on comprehensive body state features

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860046A (en) * 2019-04-26 2020-10-30 四川大学 Facial expression recognition method for improving MobileNet model
CN112464766A (en) * 2020-11-17 2021-03-09 北京农业智能装备技术研究中心 Farmland automatic identification method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860046A (en) * 2019-04-26 2020-10-30 四川大学 Facial expression recognition method for improving MobileNet model
CN112464766A (en) * 2020-11-17 2021-03-09 北京农业智能装备技术研究中心 Farmland automatic identification method and system

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
ADITYA SHARMA: "Machine Learning for OpenCV: Intelligent Image Processing with Python and scikit-learn" (Chinese edition, 2nd ed.), Beijing: China Machine Press, pages 117-162
LI HAO et al.: "Research on Facial Expression Recognition Based on LBP and Deep Learning", 2019 International Conference on Robots & Intelligent System (ICRIS), 22 August 2019
MARK SANDLER et al.: "MobileNetV2: Inverted Residuals and Linear Bottlenecks", arXiv, 21 March 2019, pages 3-4
QIAO Nidan: "Deep Learning and Medical Big Data", Shanghai Scientific & Technical Publishers, 31 January 2020
WANG Wenfeng et al.: "MATLAB Computer Vision and Machine Cognition", Beihang University Press, 31 August 2017
SHEN Hao et al.: "Facial Expression Recognition Based on Multilayer Feature Fusion of Lightweight Convolutional Networks", Laser & Optoelectronics Progress, vol. 58, no. 6, 31 March 2021, pages 1-8

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963442A (en) * 2021-10-25 2022-01-21 重庆科技学院 Fall-down behavior identification method based on comprehensive body state features

Similar Documents

Publication Publication Date Title
Zhang et al. A face emotion recognition method using convolutional neural network and image edge computing
CN109886121B (en) Human face key point positioning method for shielding robustness
Teow Understanding convolutional neural networks using a minimal model for handwritten digit recognition
Mungra et al. PRATIT: a CNN-based emotion recognition system using histogram equalization and data augmentation
Tian et al. Ear recognition based on deep convolutional network
Kas et al. New framework for person-independent facial expression recognition combining textural and shape analysis through new feature extraction approach
CN113989890A (en) Face expression recognition method based on multi-channel fusion and lightweight neural network
CN112818764A (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN112800891B (en) Discriminative feature learning method and system for micro-expression recognition
Li et al. LBAN-IL: A novel method of high discriminative representation for facial expression recognition
Xu et al. Face expression recognition based on convolutional neural network
CN113379655A (en) Image synthesis method for generating antagonistic network based on dynamic self-attention
Tabernik et al. Towards deep compositional networks
Sreemathy et al. Sign language recognition using artificial intelligence
Podder et al. Time efficient real time facial expression recognition with CNN and transfer learning
Phornchaicharoen et al. Face recognition using transferred deep learning for feature extraction
CN114170659A (en) Facial emotion recognition method based on attention mechanism
CN113469116A (en) Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network
Bodavarapu et al. An optimized neural network model for facial expression recognition over traditional deep neural networks
CN113011436A (en) Traditional Chinese medicine tongue color and fur color collaborative classification method based on convolutional neural network
Sekmen et al. Unsupervised deep learning for subspace clustering
Latumakulita et al. Combination of Feature Extractions for Classification of Coral Reef Fish Types Using Backpropagation Neural Network
Goel et al. Efficient feature extraction using DCT for gender classification
Liu et al. Emotion recognition based on multi-composition deep forest and transferred convolutional neural network
Wang Facial affect detection using convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211001