CN112528764B - Facial expression recognition method, system and device and readable storage medium

Info

Publication number: CN112528764B
Authority: CN (China)
Prior art keywords: facial expression, image, data set, occlusion
Legal status: Active
Application number: CN202011337423.4A
Other languages: Chinese (zh)
Other versions: CN112528764A
Inventor: 孙国辉
Current Assignee: Hangzhou Xinhe Shengshi Technology Co ltd
Original Assignee: Hangzhou Xinhe Shengshi Technology Co ltd
Application filed by Hangzhou Xinhe Shengshi Technology Co ltd
Priority to CN202011337423.4A
Publication of CN112528764A (application)
Application granted
Publication of CN112528764B (grant)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a facial expression recognition method, which comprises the following steps: acquiring facial expression images, and forming the acquired facial expression images into a non-occluded facial expression image data set; performing occlusion processing on each facial expression image sample in the non-occluded facial expression image data set to form an occluded facial expression image data set; mixing the non-occluded facial expression image data set and the occluded facial expression image data set in equal proportion to obtain an image data set, and dividing the image data set in equal proportion into a training data set and a verification data set; constructing an occlusion facial expression recognition model, training it on the training data set and, after training, verifying the training result with the verification data set as input to obtain the trained occlusion facial expression recognition model; and performing expression recognition on a facial expression image to be detected, with or without an occluder, based on the occlusion facial expression recognition model to obtain a facial expression image and an expression recognition result.

Description

Facial expression recognition method, system and device and readable storage medium
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a facial expression recognition method, system and device, and a readable storage medium.
Background
At present, facial expression recognition technology has broad application prospects and value in fields such as human-computer interaction, intelligent education and assistive medicine, and has attracted wide attention from experts in computer science, psychology, education and related fields. Current facial expression recognition systems can recognize various facial expressions well in natural scenes; however, facial expression images captured in real scenes often contain various occluders, such as glasses, masks, scarves and hair, as well as self-occlusion caused by spontaneous body movements. Occluders vary widely in size and shape and can cover any part of the face, so recognizing facial expressions under occlusion has become an urgent problem in practical applications. Existing recognition methods can be divided into traditional facial expression recognition methods and deep-learning-based facial expression recognition methods.
Patents related to traditional facial expression recognition methods include CN 101369310B, CN 105825183A and CN 109711283A. These patents mainly reconstruct the occluded facial expression image through traditional image processing techniques such as principal component analysis and low-rank decomposition, and then apply a corresponding classifier for facial expression recognition. Such methods depend on how well the occluded image is reconstructed, and the reconstruction quality directly affects the subsequent feature extraction and recognition accuracy. Because occluders in real application scenarios are highly complex and variable, the reconstruction quality cannot be guaranteed, which limits the practical value of these methods.
Patents related to deep-learning-based occluded expression recognition include CN 110837777A, which manually constructs a batch of occluded facial expression training samples and builds a corresponding deep network for image feature extraction and expression classification. In addition, patent publication No. CN 110119723A extracts global and local features of the occluded image using facial feature points and learns the weight of each feature with an attention mechanism. However, this method relies on facial feature points to extract local facial regions, and when the feature points are misaligned the probability of misclassification increases.
There are other deep-learning-based facial expression recognition methods as well, but they share a common problem: most of them cannot guarantee that the model learns facial expression features that are independent of occlusion.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides a facial expression recognition method, system, device and readable storage medium.
To solve the above technical problem, the invention adopts the following technical scheme:
a facial expression recognition method comprises the following steps:
acquiring facial expression images, and forming the acquired facial expression images into a non-occluded facial expression image data set;
performing random occlusion processing on each facial expression image sample in the non-occluded facial expression image data set to form an occluded facial expression image data set;
mixing the non-occluded facial expression image data set and the occluded facial expression image data set in equal proportion to obtain an image data set, and dividing the image data set in equal proportion into a training data set and a verification data set;
constructing an occlusion facial expression recognition model, training it on the training data set and, after training, verifying the training result with the verification data set as input to obtain the trained occlusion facial expression recognition model;
and performing expression recognition on a facial expression image to be detected, with or without an occluder, based on the occlusion facial expression recognition model to obtain a facial expression image and an expression recognition result.
As an implementation manner, the method further comprises an image data set calibration step:
labeling the following based on the characteristics of each facial expression image sample in the image dataset:
facial expression type, whether occlusion processing was applied, and the corresponding occluded or unoccluded image.
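For illustration only, one labeled sample in such a data set might be recorded as follows (a minimal Python sketch; the field names and paths are assumptions, not defined by the patent):

    # Hypothetical annotation record for one image sample in the mixed data set;
    # field names and paths are illustrative assumptions.
    sample = {
        "image_path": "data/occluded/img_0001.png",
        "expression": "happy",                       # facial expression type
        "is_occluded": True,                         # whether occlusion processing was applied
        "paired_image": "data/clean/img_0001.png",   # corresponding unoccluded image
    }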
As an implementation, the occlusion facial expression recognition model comprises an encoder, a decoder and an expression classifier;
the specific steps for constructing the occlusion facial expression recognition model are as follows:
the encoder receives an input image, outputs image features, represented as:
f=E(I)
the method comprises the steps that I represents an input image, f represents image characteristics, the input image comprises an unshielded facial expression image in an unshielded facial expression image data set and an unshielded facial expression image in an unshielded facial expression image data set, the image characteristics comprise an unshielded facial expression image characteristic, a reconstructed image facial expression image characteristic and an unshielded facial expression characteristic, and the reconstructed image facial expression image characteristic is dynamically generated by the unshielded facial expression image in the training process;
The occluded facial expression image features, the reconstructed-image facial expression features and the non-occluded facial expression features are input into a fully connected layer in the expression classifier, which maps them to expression classes; the probability that the facial expression image belongs to each expression class is obtained through a softmax function, and the expression classification loss is calculated, expressed as:

L_emotion = -log( exp(W_i^T f + b_i) / Σ_{j=1}^{C} exp(W_j^T f + b_j) )

wherein W represents the parameter matrix of the fully connected layer and C represents the total number of expression categories; b represents the bias term of the fully connected layer, j indexes all expression categories, f represents the feature vector of the input image, i represents the expression category to which the image belongs, and exp denotes the exponential function;
The occluded facial expression image features are input into the decoder for prediction, yielding a predicted reconstruction of the non-occluded facial expression image corresponding to the occluded image; based on the difference between this prediction and the actual non-occluded facial expression image, the pixel-level image reconstruction loss L_recon between the reconstructed image and the corresponding non-occluded image is calculated, expressed as:

L_recon = |I - D(E(I))|_1
wherein I denotes an input image, D denotes a decoder, and E denotes an encoder;
Taking the training data set as input, the feature representation of each layer of the original non-occluded image in the training data set is extracted through the encoder, the similarity of the output features of each convolutional layer of the encoder is constrained through a loss function, and the feature constraint loss L_feature is calculated;
The total training loss of the occlusion facial expression recognition model is obtained from the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss, expressed as: L_total = L_emotion + w1·L_feature + w2·L_recon, where w1 and w2 represent the weights of the feature constraint loss and the image reconstruction loss, respectively;
The weights of the model M are updated based on a gradient descent method, and the computation of the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss is repeated until the facial expression recognition results on the non-occluded facial expression image set are obtained, at which point the computation stops;
and the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss are substituted into the occlusion facial expression recognition model to obtain the trained occlusion facial expression recognition model; a minimal code sketch of this architecture follows.
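For concreteness, the following is a minimal PyTorch-style sketch of the encoder-decoder-classifier architecture described above. The layer sizes, depths, module names and the seven-class default are assumptions; the patent does not fix a concrete network topology.

    import torch
    import torch.nn as nn

    class OcclusionFERModel(nn.Module):
        # Shared encoder E, decoder D and expression classifier, as described above.
        def __init__(self, num_classes=7, feat_dim=256):
            super().__init__()
            # Encoder E: stacked convolutional layers producing the feature map
            # from which the feature vector f = E(I) is pooled.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.proj = nn.Linear(128, feat_dim)
            # Decoder D: reconstructs the non-occluded image from the encoder features.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )
            # Expression classifier: fully connected layer (W, b); the softmax is
            # applied inside the cross-entropy loss during training.
            self.classifier = nn.Linear(feat_dim, num_classes)

        def forward(self, image):
            feat_map = self.encoder(image)                  # convolutional features
            f = self.proj(self.pool(feat_map).flatten(1))   # f = E(I)
            recon = self.decoder(feat_map)                  # D(E(I))
            logits = self.classifier(f)                     # W^T f + b
            return f, recon, logits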
As an implementation manner, the specific gradient descent procedure is: based on the total training loss L_total, the gradient with respect to the parameters θ is calculated as

g = ∂L_total / ∂θ

and the parameters θ are updated along the gradient direction, specifically:

θ ← θ - α·g

where α represents the learning rate.
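A minimal sketch of this update rule in PyTorch, assuming a model and a computed total loss as above (plain SGD is shown; the patent does not mandate a particular optimizer, and the learning rate is an assumed value):

    import torch

    # theta <- theta - alpha * dL_total/d(theta), applied to all model parameters.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # alpha = 0.01 (assumed)

    optimizer.zero_grad()
    total_loss.backward()   # computes g = dL_total / d(theta) by backpropagation
    optimizer.step()        # gradient-direction parameter update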
As an implementable manner, the method further comprises a step of testing the occlusion facial expression recognition model, specifically:
inputting a test data set containing both non-occluded and occluded facial expression images into a face detection model to obtain the face positions;
cropping the test data set based on the face positions to obtain an aligned facial expression image data set;
and inputting the aligned facial expression image data set into the occlusion facial expression recognition model to obtain the facial expressions and thus the test results.
As an implementation manner, the cropping is performed as follows: the facial feature points to be recognized are mapped to standard facial feature points by an affine transformation to obtain an aligned face image; a sketch of this alignment follows.
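A sketch of such an alignment using OpenCV (the 5-point landmark layout and the 112x112 template coordinates below are illustrative assumptions, not values from the patent):

    import cv2
    import numpy as np

    # Illustrative standard landmark template (x, y) for a 112x112 aligned face:
    # two eye centers, nose tip, two mouth corners.
    STD_POINTS = np.float32([
        [38.3, 51.7], [73.5, 51.5],
        [56.0, 71.7],
        [41.5, 92.4], [70.7, 92.2],
    ])

    def align_face(image, landmarks):
        # Estimate a partial affine (similarity) transform mapping the detected
        # feature points onto the standard template, then warp the image.
        matrix, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), STD_POINTS)
        return cv2.warpAffine(image, matrix, (112, 112))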
A facial expression recognition system comprises an image acquisition module, an occlusion processing module, a data set acquisition module, a model construction and training module and a result acquisition module;
the image acquisition module is used for acquiring facial expression images and forming them into a non-occluded facial expression image data set;
the occlusion processing module is used for performing random occlusion processing on each facial expression image sample in the non-occluded facial expression image data set to form an occluded facial expression image data set;
the data set acquisition module is used for mixing the non-occluded facial expression image data set and the occluded facial expression image data set in equal proportion to obtain an image data set, and dividing the image data set in equal proportion into a training data set and a verification data set;
the model construction and training module is used for constructing an occlusion facial expression recognition model, training it on the training data set and, after training, verifying the training result with the verification data set as input to obtain the trained occlusion facial expression recognition model;
the result acquisition module is used for performing expression recognition on a facial expression image to be detected, with or without an occluder, based on the occlusion facial expression recognition model to obtain a facial expression image and an expression recognition result.
As an implementation, the system further comprises a model testing module configured to:
input a test data set containing both non-occluded and occluded facial expression images into a face detection model to obtain the face positions;
crop the test data set based on the face positions to obtain an aligned facial expression image data set;
and input the aligned facial expression image data set into the occlusion facial expression recognition model to obtain the facial expressions and thus the test results.
A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the following method steps:
acquiring facial expression images, and forming the acquired facial expression images into a non-occluded facial expression image data set;
performing random occlusion processing on each facial expression image sample in the non-occluded facial expression image data set to form an occluded facial expression image data set;
mixing the non-occluded facial expression image data set and the occluded facial expression image data set in equal proportion to obtain an image data set, and dividing the image data set in equal proportion into a training data set and a verification data set;
constructing an occlusion facial expression recognition model, training it on the training data set and, after training, verifying the training result with the verification data set as input to obtain the trained occlusion facial expression recognition model;
and performing expression recognition on a facial expression image to be detected, with or without an occluder, based on the occlusion facial expression recognition model to obtain a facial expression image and an expression recognition result.
A facial expression recognition apparatus comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following method steps when executing the computer program:
acquiring facial expression images, and forming the acquired facial expression images into a non-occluded facial expression image data set;
performing random occlusion processing on each facial expression image sample in the non-occluded facial expression image data set to form an occluded facial expression image data set;
mixing the non-occluded facial expression image data set and the occluded facial expression image data set in equal proportion to obtain an image data set, and dividing the image data set in equal proportion into a training data set and a verification data set;
constructing an occlusion facial expression recognition model, training it on the training data set and, after training, verifying the training result with the verification data set as input to obtain the trained occlusion facial expression recognition model;
and performing expression recognition on a facial expression image to be detected, with or without an occluder, based on the occlusion facial expression recognition model to obtain a facial expression image and an expression recognition result.
By adopting the above technical scheme, the invention achieves notable technical effects:
Training and recognizing on occluded facial expressions significantly improves the recognition accuracy for occluded facial expressions. An encoder-decoder network structure reconstructs the corresponding non-occluded facial expression image, and the reconstructed image is also used for expression recognition, so that the expression features extracted by the encoder can implicitly infer from the unoccluded regions of the input image and discount the occluded regions, enhancing the robustness of facial expression recognition under occlusion. A shared encoder simultaneously extracts feature representations of the original image and the corresponding occluded image, and an L2 loss function constrains the output features of each convolutional layer of the encoder to be similar, which further ensures that the model extracts occlusion-independent facial expression features; the final encoder can therefore extract facial expression features that are independent of occlusion, and the trained recognition model can recognize both occluded and non-occluded facial expression images.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic overall flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the overall architecture of the system of the present invention;
FIG. 3 is a schematic diagram of a neural network for occlusion robust face recognition;
FIG. 4 is a schematic diagram of the testing process of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, which are illustrative of the present invention and are not to be construed as being limited thereto.
Example 1:
a facial expression recognition method, as shown in fig. 1, includes the following steps:
s100, obtaining facial expression images, and forming the obtained facial expression images into an unobstructed facial expression image data set;
s200, carrying out random shielding treatment on each facial expression image sample in the non-shielding facial expression image data set to form a shielding facial expression image data set;
s300, mixing the non-occlusion facial expression image data set and the occlusion facial expression image data set in equal proportion to obtain an image data set, and dividing the image data set into a training data set and a verification data set in equal proportion;
s400, constructing an occlusion facial expression recognition model, training the occlusion facial expression recognition model based on a training data set, taking a verification data set as input after the training is finished, and verifying a training result to obtain the occlusion facial expression recognition model;
s500, performing expression recognition on the facial expression image to be detected with or without the shielding object based on the shielding facial expression recognition model to obtain a facial expression image and an expression recognition result.
In step S200, occluded facial expression images are generated automatically from the facial expression images: the categories and position distributions of occluders are counted in a real occluded facial expression image data set, a batch of occluder images is collected according to the counted occluder categories (such images may, for example, be downloaded from the Internet), and the occluder images are then pasted onto the non-occluded face images at positions guided by the facial feature points, thereby constructing a batch of occluded facial expression image data sets. Here, a facial expression image is simply a face image carrying some expression, such as a photograph of a smiling, crying or sad face.
The generation process for virtual facial expression image samples is as follows: collect a batch of non-occluded facial expression images; collect a batch of occluder template images according to the occluder material counted in advance; randomly occlude arbitrary positions of the facial expression images with the occluder template images to generate an occluded facial expression data set; and mix the original and occluded images in equal proportion to obtain the training samples. A sketch of the random occlusion step follows.
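A minimal sketch of this random occlusion step (the RGBA occluder templates and the paste policy are assumptions; templates are presumed smaller than the face image):

    import random
    import numpy as np

    def random_occlude(face, occluder_templates):
        # Paste one randomly chosen RGBA occluder template at a random position,
        # blending it onto the face image with its alpha mask.
        occ = random.choice(occluder_templates)
        h, w = occ.shape[:2]
        H, W = face.shape[:2]
        y = random.randint(0, H - h)   # random top-left corner inside the face image
        x = random.randint(0, W - w)
        alpha = occ[:, :, 3:4].astype(np.float32) / 255.0
        region = face[y:y + h, x:x + w].astype(np.float32)
        blended = alpha * occ[:, :, :3] + (1.0 - alpha) * region
        face[y:y + h, x:x + w] = blended.astype(face.dtype)
        return face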
Further, the non-occluded and occluded facial expression image data sets are mixed in equal proportion to obtain an image data set, which is divided in equal proportion into a training data set and a verification data set. With the occlusion facial expression recognition model constructed, trained and verified on these data sets, a facial expression image to be detected that carries an occluder can be used as the model input, and the non-occluded facial expression image and the expression recognition result are obtained quickly.
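A sketch of the mixing and splitting (a hypothetical helper; "equal proportion" is read here as a 50/50 train/verification split of the 1:1 mixed set, though a stratified split is an equally plausible reading):

    import random

    def build_datasets(clean_samples, occluded_samples, seed=0):
        # Mix non-occluded and occluded samples 1:1, shuffle, then split 50/50.
        assert len(clean_samples) == len(occluded_samples)
        mixed = list(clean_samples) + list(occluded_samples)
        random.Random(seed).shuffle(mixed)
        half = len(mixed) // 2
        return mixed[:half], mixed[half:]   # training set, verification set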
In summary, the scheme provided by the invention trains and recognizes on occluded facial expressions, which significantly improves the recognition accuracy for occluded facial expressions; the encoder-decoder network reconstructs the corresponding non-occluded image and uses it for expression recognition, so that the expression features extracted by the encoder implicitly infer from the unoccluded regions of the input image and discount the occluded regions, enhancing robustness under occlusion; and the shared encoder extracts feature representations of the original and corresponding occluded images simultaneously while an L2 loss constrains the output features of each convolutional layer to be similar, ensuring that the model extracts occlusion-independent facial expression features, so that the trained model can recognize both occluded and non-occluded facial expression images.
In one embodiment, the method further comprises an image dataset calibration step:
labeling the following based on the characteristics of each facial expression image sample in the image dataset:
facial expression type, whether occlusion processing exists, and corresponding occluded or unoccluded images.
In order to better identify each image, every image sample in the image data set is calibrated; the calibration can take the form of labels, marks and the like, which facilitates subsequent use and verification.
In one embodiment, the model training process is described in detail. Model training uses an encoder, a decoder and an expression classifier; that is, the occlusion facial expression recognition model comprises an encoder, a decoder and an expression classifier.
The specific steps for constructing the occlusion facial expression recognition model are as follows:
the encoder receives an input image, outputs image features, represented as:
f=E(I)
the method comprises the steps that I represents an input image, f represents image characteristics, the input image comprises an unshielded facial expression image in an unshielded facial expression image data set and an unshielded facial expression image in an unshielded facial expression image data set, the image characteristics comprise an unshielded facial expression image characteristic, a reconstructed image facial expression image characteristic and an unshielded facial expression characteristic, and the reconstructed image facial expression image characteristic is dynamically generated by the unshielded facial expression image in the training process;
The occluded facial expression image features, the reconstructed-image facial expression features and the non-occluded facial expression features are input into a fully connected layer in the expression classifier, which maps them to expression classes; the probability that the facial expression image belongs to each expression class is obtained through a softmax function, and the expression classification loss is calculated, expressed as:

L_emotion = -log( exp(W_i^T f + b_i) / Σ_{j=1}^{C} exp(W_j^T f + b_j) )

wherein W represents the parameter matrix of the fully connected layer and C represents the total number of expression categories; b represents the bias term of the fully connected layer, j indexes all expression categories, f represents the feature vector of the input image, i represents the expression category to which the image belongs, and exp denotes the exponential function;
The occluded facial expression image features are input into the decoder for prediction, yielding a predicted reconstruction of the non-occluded facial expression image corresponding to the occluded image; based on the difference between this prediction and the actual non-occluded facial expression image, the pixel-level image reconstruction loss L_recon between the reconstructed image and the corresponding non-occluded image is calculated, expressed as:

L_recon = |I - D(E(I))|_1
wherein I denotes an input image, D denotes a decoder, and E denotes an encoder;
Taking the training data set as input, the feature representation of each layer of the original non-occluded image in the training data set is extracted through the encoder, the similarity of the output features of each convolutional layer of the encoder is constrained through a loss function, and the feature constraint loss L_feature is calculated;
The total training loss of the occlusion facial expression recognition model is obtained from the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss, expressed as: L_total = L_emotion + w1·L_feature + w2·L_recon, where w1 and w2 represent the weights of the feature constraint loss and the image reconstruction loss, respectively;
The weights of the model M are updated based on a gradient descent method, and the computation of the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss is repeated until the facial expression recognition results on the non-occluded facial expression image set are obtained, at which point the computation stops;
and the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss are substituted into the occlusion facial expression recognition model to obtain the trained occlusion facial expression recognition model.
In training the occlusion facial expression recognition model, an encoder-decoder neural network is used for occlusion removal: the network comprises an encoder and a decoder; the encoder takes the occluded facial expression image as input and extracts a feature representation of it, and the decoder reconstructs the non-occluded facial expression image from this representation. The parameter weights of the encoder-decoder network are updated with a gradient back-propagation algorithm. As the input image passes through the intermediate layers of the encoder-decoder network, the receptive field of the network grows, so the network can adaptively interpolate the occluded regions using both local and global information of the facial expression image and recover an accurate reconstruction of the occluded regions. In this process, the image features extracted by the encoder implicitly infer, and attempt to remove, the occluded regions during the decoding stage, which enhances the robustness of facial expression recognition under occlusion. To further ensure the quality of the reconstructed non-occluded facial expression image, the reconstructed image is fed back into the shared encoder, its image features are extracted, and expression recognition is performed on it.
Third, an end-to-end framework for occluded facial expression recognition is used. As shown in FIG. 3, the input to the encoder also contains the original non-occluded facial expression image, and an L2 loss function constrains the outputs of each convolutional layer of the encoder for the non-occluded image and the occluded image to be similar, forcing every convolutional layer of the encoder to extract occlusion-independent facial expression features. Three loss functions are used during network training: 1) the pixel-level image reconstruction loss; 2) the feature constraint loss between the input occluded image and the corresponding non-occluded image; and 3) the facial expression classification loss. The pixel-level image reconstruction loss updates the weights of both the encoder and the decoder, the feature constraint loss updates only the weights of the encoder, and the facial expression classification loss updates the weights of the classifier. Through the joint action of these losses during training, the encoder-decoder network ensures that the model extracts occlusion-independent facial expression features. A training-step sketch follows.
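The following training-step sketch combines the three losses, assuming the OcclusionFERModel sketched earlier. For brevity the feature constraint is applied to the pooled feature only, whereas the patent constrains the output of every convolutional layer; the loss weights w1 and w2 are left as free parameters.

    import torch
    import torch.nn.functional as F

    def training_step(model, occluded, clean, labels, w1=1.0, w2=1.0):
        # L_total = L_emotion + w1 * L_feature + w2 * L_recon
        f_occ, recon, logits_occ = model(occluded)    # occluded branch
        f_clean, _, logits_clean = model(clean)       # same shared encoder, clean image
        _, _, logits_rec = model(recon)               # re-encode the reconstructed image

        # 1) Facial expression classification loss (softmax cross-entropy) over the
        #    occluded, clean and reconstructed-image features; updates the classifier.
        l_emotion = (F.cross_entropy(logits_occ, labels)
                     + F.cross_entropy(logits_clean, labels)
                     + F.cross_entropy(logits_rec, labels))

        # 2) Pixel-level reconstruction loss L_recon = |I - D(E(I))|_1, using the
        #    clean image as target; updates the encoder and decoder.
        l_recon = F.l1_loss(recon, clean)

        # 3) Feature constraint loss: L2 distance between occluded-image and
        #    clean-image features; detaching the clean branch, so the constraint
        #    acts on the occluded path, is one possible design choice.
        l_feature = F.mse_loss(f_occ, f_clean.detach())

        return l_emotion + w1 * l_feature + w2 * l_recon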
In one embodiment, the specific gradient descent procedure is: based on the total training loss L_total, the gradient with respect to the parameters θ is calculated as

g = ∂L_total / ∂θ

and the parameters θ are updated along the gradient direction, specifically:

θ ← θ - α·g

where α represents the learning rate.
In addition, in one embodiment, in order to test the accuracy of the trained occlusion facial expression recognition model, the method further includes a step of testing the model. The specific test process is as follows:
inputting a test data set containing both non-occluded and occluded facial expression images into a face detection model to obtain the face positions;
cropping the test data set based on the face positions to obtain an aligned facial expression image data set;
and inputting the aligned facial expression image data set into the occlusion facial expression recognition model to obtain the facial expressions and thus the test results. This process checks whether the occlusion facial expression recognition model is accurate and whether it is overfitted.
In the testing process, the cropping is performed as follows: the facial feature points to be recognized are mapped to standard facial feature points by an affine transformation to obtain an aligned face image. The face detection model can be any face detection model, for example one based on MT-CNN or SeetaFace. In the whole testing stage, only the image feature encoder and the expression classifier are needed for expression recognition; no occluded-image reconstruction is performed, which reduces model inference time. Therefore, when the occlusion facial expression recognition model is used for recognition, expression recognition can be performed on a facial expression image to be detected with or without an occluder, yielding a facial expression image and an expression recognition result. A test-time sketch follows.
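A sketch of this test-time pipeline (the detector interface and class list are placeholders, and align_face refers to the alignment sketch given earlier; only the encoder and classifier branches of the sketch model are executed, skipping the decoder):

    import torch

    # Assumed expression classes; the patent does not enumerate them.
    EXPRESSIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

    @torch.no_grad()
    def recognize_expression(model, detect_face, image):
        # detect_face is a hypothetical wrapper around any face detector
        # (e.g. MT-CNN or SeetaFace) returning the face box and feature points.
        box, landmarks = detect_face(image)
        face = align_face(image, landmarks)           # affine alignment to the template
        x = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        feat_map = model.encoder(x)                   # encoder only; decoder is skipped
        f = model.proj(model.pool(feat_map).flatten(1))
        probs = torch.softmax(model.classifier(f), dim=1)
        return EXPRESSIONS[int(probs.argmax())]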
Example 2:
a facial expression recognition system, as shown in FIG. 2, includes an image acquisition module 100, an occlusion processing module 200, a data set acquisition module 300, a model construction and training module 400, and a result acquisition module 500;
the image acquisition module 100 is configured to acquire a facial expression image, and form the acquired facial expression image into an unobstructed facial expression image dataset;
the shielding processing module 200 is configured to perform shielding processing on each facial expression image sample in the non-shielding facial expression image dataset to form a shielding facial expression image dataset;
the data set acquisition module 300 is configured to perform equal proportion mixing on the non-occlusion facial expression image data set and the occlusion facial expression image data set to obtain an image data set, and divide the image data set into a training data set and a verification data set in equal proportion;
the model construction and training module 400 is used for constructing an occlusion facial expression recognition model, training the occlusion facial expression recognition model based on a training data set, taking a verification data set as input after the training is finished, and verifying a training result to obtain the occlusion facial expression recognition model;
the result obtaining module 500 is configured to perform expression recognition on the facial expression image to be detected with or without a blocking object based on the blocking facial expression recognition model to obtain a facial expression image and an expression recognition result.
In one embodiment, the system further comprises a model testing module 600 configured to:
input a test data set containing both non-occluded and occluded facial expression images into a face detector to obtain the face positions;
crop the test data set based on the face positions to obtain a second facial expression image data set;
and input the second facial expression image data set into the occlusion facial expression recognition model to obtain the facial expressions and thus the test results.
The occlusion processing module 200 is further arranged to perform an image data set calibration step:
labeling the following based on the characteristics of each facial expression image sample in the image data set:
facial expression type, whether occlusion processing was applied, and the corresponding occluded or unoccluded image.
The model building and training module 400 is arranged to: the occlusion facial expression recognition model comprises an encoder, a decoder and an expression classifier;
the method for constructing the facial expression recognition model comprises the following specific steps:
the encoder receives an input image, outputs image features, represented as:
f=E(I)
wherein I represents the input image and f represents the image features; the input images comprise occluded facial expression images from the occluded facial expression image data set and non-occluded facial expression images from the non-occluded facial expression image data set; the image features comprise occluded facial expression image features, reconstructed-image facial expression features and non-occluded facial expression features, the reconstructed-image facial expression features being dynamically generated from the occluded facial expression images during training;
The occluded facial expression image features, the reconstructed-image facial expression features and the non-occluded facial expression features are input into a fully connected layer in the expression classifier, which maps them to expression classes; the probability that the facial expression image belongs to each expression class is obtained through a softmax function, and the expression classification loss is calculated, expressed as:

L_emotion = -log( exp(W_i^T f + b_i) / Σ_{j=1}^{C} exp(W_j^T f + b_j) )

wherein W represents the parameter matrix of the fully connected layer and C represents the total number of expression categories; b represents the bias term of the fully connected layer, j indexes all expression categories, f represents the feature vector of the input image, i represents the expression category to which the image belongs, and exp denotes the exponential function;
The occluded facial expression image features are input into the decoder for prediction, yielding a predicted reconstruction of the non-occluded facial expression image corresponding to the occluded image; based on the difference between this prediction and the actual non-occluded facial expression image, the pixel-level image reconstruction loss L_recon between the reconstructed image and the corresponding non-occluded image is calculated, expressed as:

L_recon = |I - D(E(I))|_1
wherein I denotes an input image, D denotes a decoder, and E denotes an encoder;
Taking the training data set as input, the feature representation of each layer of the original non-occluded image in the training data set is extracted through the encoder, the similarity of the output features of each convolutional layer of the encoder is constrained through a loss function, and the feature constraint loss L_feature is calculated;
The total training loss of the occlusion facial expression recognition model is obtained from the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss, expressed as: L_total = L_emotion + w1·L_feature + w2·L_recon, where w1 and w2 represent the weights of the feature constraint loss and the image reconstruction loss, respectively;
The weights of the model M are updated based on a gradient descent method, and the computation of the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss is repeated until the facial expression recognition results on the non-occluded facial expression image set are obtained, at which point the computation stops;
and the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss are substituted into the occlusion facial expression recognition model to obtain the trained occlusion facial expression recognition model.
The model building and training module 400 is arranged to: the specific process of gradient descent is as follows: based on the overall training loss LtotalCalculating a gradient to the parameter θ
Figure GDA0003123545840000122
And updating a parameter θ in the gradient direction, specifically:
Figure GDA0003123545840000123
where α represents the learning rate.
The model testing module 600 is further configured such that the cropping is performed as follows: the facial feature points to be recognized are mapped to standard facial feature points by an affine transformation to obtain an aligned face image.
Example 3:
a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method steps of:
acquiring a facial expression image, and forming the acquired facial expression image into a non-occlusion facial expression image data set;
carrying out shielding treatment on each facial expression image sample in the non-shielding facial expression image dataset to form a shielding facial expression image dataset;
carrying out equal proportion mixing on the non-shielding facial expression image data set and the shielding facial expression image data set to obtain an image data set, and dividing the image data set into a training data set and a verification data set in equal proportion;
constructing an occlusion facial expression recognition model, training the occlusion facial expression recognition model based on a training data set, taking a verification data set as input after the training is finished, and verifying a training result to obtain the occlusion facial expression recognition model;
and performing expression recognition on the facial expression image to be detected with or without a shielding object based on the shielding facial expression recognition model to obtain a facial expression image and an expression recognition result.
In one embodiment, when the processor executes the computer program, it further implements the image data set calibration step of:
labeling the following based on the characteristics of each facial expression image sample in the image data set:
facial expression type, whether occlusion processing was applied, and the corresponding occluded or unoccluded image.
As one implementation, the occlusion facial expression recognition model includes an encoder, a decoder and an expression classifier;
the specific steps for constructing the occlusion facial expression recognition model are as follows:
the encoder receives an input image, outputs image features, represented as:
f=E(I)
the method comprises the steps that I represents an input image, f represents image characteristics, the input image comprises an unshielded facial expression image in an unshielded facial expression image data set and an unshielded facial expression image in an unshielded facial expression image data set, the image characteristics comprise an unshielded facial expression image characteristic, a reconstructed image facial expression image characteristic and an unshielded facial expression characteristic, and the reconstructed image facial expression image characteristic is dynamically generated by the unshielded facial expression image in the training process;
The occluded facial expression image features, the reconstructed-image facial expression features and the non-occluded facial expression features are input into a fully connected layer in the expression classifier, which maps them to expression classes; the probability that the facial expression image belongs to each expression class is obtained through a softmax function, and the expression classification loss is calculated, expressed as:

L_emotion = -log( exp(W_i^T f + b_i) / Σ_{j=1}^{C} exp(W_j^T f + b_j) )

wherein W represents the parameter matrix of the fully connected layer and C represents the total number of expression categories; b represents the bias term of the fully connected layer, j indexes all expression categories, f represents the feature vector of the input image, i represents the expression category to which the image belongs, and exp denotes the exponential function;
The occluded facial expression image features are input into the decoder for prediction, yielding a predicted reconstruction of the non-occluded facial expression image corresponding to the occluded image; based on the difference between this prediction and the actual non-occluded facial expression image, the pixel-level image reconstruction loss L_recon between the reconstructed image and the corresponding non-occluded image is calculated, expressed as:

L_recon = |I - D(E(I))|_1
wherein I denotes an input image, D denotes a decoder, and E denotes an encoder;
Taking the training data set as input, the feature representation of each layer of the original non-occluded image in the training data set is extracted through the encoder, the similarity of the output features of each convolutional layer of the encoder is constrained through a loss function, and the feature constraint loss L_feature is calculated;
The total training loss of the occlusion facial expression recognition model is obtained from the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss, expressed as: L_total = L_emotion + w1·L_feature + w2·L_recon, where w1 and w2 represent the weights of the feature constraint loss and the image reconstruction loss, respectively;
The weights of the model M are updated based on a gradient descent method, and the computation of the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss is repeated until the facial expression recognition results on the non-occluded facial expression image set are obtained, at which point the computation stops;
and the expression classification loss, the pixel-level image reconstruction loss and the feature constraint loss are substituted into the occlusion facial expression recognition model to obtain the trained occlusion facial expression recognition model.
In one embodiment, when the processor executes the computer program, the specific gradient descent procedure implemented is: based on the total training loss L_total, the gradient with respect to the parameters θ is calculated as

g = ∂L_total / ∂θ

and the parameters θ are updated along the gradient direction, specifically:

θ ← θ - α·g

where α represents the learning rate.
In one embodiment, when the processor executes the computer program, it further implements a step of testing the occlusion facial expression recognition model, specifically:
inputting a test data set containing both non-occluded and occluded facial expression images into a face detection model to obtain the face positions;
cropping the test data set based on the face positions to obtain an aligned facial expression image data set;
and inputting the aligned facial expression image data set into the occlusion facial expression recognition model to obtain the facial expressions and thus the test results.
In one embodiment, when the processor executes the computer program, the cropping is implemented as follows: the facial feature points to be recognized are mapped to standard facial feature points by an affine transformation to obtain an aligned face image.
Example 4:
in one embodiment, a facial expression recognition apparatus is provided, and the facial expression recognition apparatus may be a server or a mobile terminal. The facial expression recognition device comprises a processor, a memory, a network interface and a database which are connected through a system bus. Wherein, the processor of the facial expression recognition device is used for providing calculation and control capability. The memory of the facial expression recognition device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. All data of the database facial expression recognition device. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of facial expression recognition.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. In addition, it should be noted that the specific embodiments described in the present specification may differ in the shape of the components, the names of the components, and the like. All equivalent or simple changes of the structure, the characteristics and the principle of the invention which are described in the patent conception of the invention are included in the protection scope of the patent of the invention. Various modifications, additions and substitutions for the specific embodiments described may be made by those skilled in the art without departing from the scope of the invention as defined in the accompanying claims.

Claims (9)

1. A facial expression recognition method, characterized by comprising the following steps:
acquiring facial expression images, and forming the acquired facial expression images into a non-occluded facial expression image data set;
performing random occlusion processing on each facial expression image sample in the non-occluded facial expression image data set to form an occluded facial expression image data set;
mixing the non-occluded facial expression image data set and the occluded facial expression image data set in equal proportion to obtain an image data set, and dividing the image data set in equal proportion into a training data set and a verification data set;
constructing an occlusion facial expression recognition model, training it on the training data set and, after training, verifying the training result with the verification data set as input to obtain the trained occlusion facial expression recognition model;
performing expression recognition on a facial expression image to be detected, with or without an occluder, based on the occlusion facial expression recognition model to obtain a facial expression image and an expression recognition result;
the occlusion facial expression recognition model comprises an encoder, a decoder and an expression classifier;
constructing the occlusion facial expression recognition model specifically comprises the following steps:
the encoder receives an input image and outputs image features, expressed as:
f=E(I)
wherein I represents the input image and f represents the image features; the input image comprises unoccluded facial expression images from the unoccluded facial expression image data set and occluded facial expression images from the occluded facial expression image data set; the image features comprise occluded facial expression image features, reconstructed-image facial expression features, and unoccluded facial expression features, the reconstructed-image features being generated dynamically from the occluded facial expression images during training;
inputting the occluded facial expression image features, the reconstructed-image facial expression features, and the unoccluded facial expression features into a fully connected layer in the expression classifier, which maps them to expression classes; obtaining through a softmax function the probability that the facial expression image belongs to each expression class, and calculating the expression classification loss value, expressed as:
L_emotion = −log( e^(W_i·f + b_i) / Σ_{j=1..C} e^(W_j·f + b_j) )
wherein W represents the parameter matrix of the fully connected layer, C represents the total number of expression classes, b represents the bias term of the fully connected layer, j indexes all expression classes, f represents the feature vector of the input image, i represents the expression class to which the image belongs, and e denotes the exponential;
inputting the occluded facial expression image features into the decoder for prediction to obtain a predicted reconstruction of the corresponding unoccluded facial expression image, and calculating the pixel-level image reconstruction loss between the reconstructed image and the corresponding unoccluded image based on the difference between the prediction result and the actual unoccluded facial expression image, the pixel image reconstruction loss value L_recon being expressed as:
L_recon = |I − D(E(I))|_1
wherein I denotes an input image, D denotes a decoder, and E denotes an encoder;
taking the training data set as input, extracting through the encoder the feature representation of each layer for the original unoccluded images in the training data set, constraining through a loss function the similarity between the output features of each convolutional layer of the encoder and the corresponding features of the original unoccluded image, and calculating the feature constraint loss value L_feature;
obtaining the total training loss of the occlusion facial expression recognition model from the expression classification loss value, the pixel image reconstruction loss value, and the feature constraint loss value, the total training loss being expressed as: L_total = L_emotion + w_1·L_feature + w_2·L_recon, wherein w_1 and w_2 represent the weights of the feature constraint loss and the image reconstruction loss, respectively;
updating the weights of the model M based on a gradient descent method, and repeating the computation of the expression classification loss value, the pixel image reconstruction loss value, and the feature constraint loss value until a facial expression recognition result for the unoccluded facial expression image set is obtained, at which point the computation stops;
and substituting the expression classification loss value, the pixel image reconstruction loss value, and the feature constraint loss value into the occlusion facial expression recognition model to obtain the trained occlusion facial expression recognition model (illustrative sketches of the occlusion augmentation and of the combined training loss follow this claim).
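
For illustration only (not part of the claims): the random occlusion and equal-proportion mixing steps above can be sketched in Python. This is a minimal sketch under assumptions the claim does not fix — a rectangular zero-filled occluder with arbitrarily chosen patch-size fractions, and images handled as NumPy arrays.

import random

import numpy as np

rng = np.random.default_rng(0)

def random_occlude(image: np.ndarray, min_frac: float = 0.2, max_frac: float = 0.5) -> np.ndarray:
    # Zero out a randomly placed rectangular patch to simulate an occluder.
    # The patch shape and size range are illustrative assumptions.
    h, w = image.shape[:2]
    ph = int(h * rng.uniform(min_frac, max_frac))
    pw = int(w * rng.uniform(min_frac, max_frac))
    top = int(rng.integers(0, h - ph + 1))
    left = int(rng.integers(0, w - pw + 1))
    occluded = image.copy()
    occluded[top:top + ph, left:left + pw] = 0
    return occluded

def build_datasets(unoccluded: list) -> tuple:
    # Pair each unoccluded sample with its occluded counterpart (so the
    # reconstruction loss can later compare the two), mix the pairs in
    # equal proportion, and split the mixture equally into training and
    # verification sets.
    pairs = [(img, random_occlude(img)) for img in unoccluded]
    random.shuffle(pairs)
    half = len(pairs) // 2
    return pairs[:half], pairs[half:]  # training set, verification set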
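The three loss terms and the weighted total can likewise be sketched. The PyTorch snippet below is a hedged illustration, not the patented implementation: it assumes encoder, decoder, and classifier are given nn.Module instances, that the encoder returns a flat feature vector f = E(I), and that the feature constraint is an L1 distance between occluded-input and unoccluded-input features (the claim only requires constraining per-layer feature similarity without fixing the function); w1 and w2 are free hyperparameters.

import torch
import torch.nn.functional as F

def total_training_loss(encoder, decoder, classifier,
                        occluded, unoccluded, labels,
                        w1: float = 1.0, w2: float = 1.0) -> torch.Tensor:
    # L_total = L_emotion + w1 * L_feature + w2 * L_recon
    f_occ = encoder(occluded)      # f = E(I) for the occluded input
    f_clean = encoder(unoccluded)  # features of the unoccluded original

    # Expression classification loss: softmax cross-entropy over the
    # fully connected layer's outputs W·f + b.
    l_emotion = F.cross_entropy(classifier(f_occ), labels)

    # Pixel reconstruction loss L_recon = |I - D(E(I))|_1, taken against
    # the unoccluded target image.
    l_recon = F.l1_loss(decoder(f_occ), unoccluded)

    # Feature constraint loss: pull occluded-input features toward the
    # unoccluded-input features (a single layer here for brevity).
    l_feature = F.l1_loss(f_occ, f_clean.detach())

    return l_emotion + w1 * l_feature + w2 * l_recon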
2. The facial expression recognition method according to claim 1, further comprising an image data set labeling step:
labeling each facial expression image sample in the image data set, according to its characteristics, with:
the facial expression type, whether occlusion processing has been applied, and the corresponding occluded or unoccluded image.
3. The facial expression recognition method according to claim 2, wherein the gradient descent specifically comprises: calculating, from the total training loss L_total, the gradient with respect to the parameters θ:
g = ∂L_total / ∂θ
and updating the parameters θ along the negative gradient direction, specifically:
θ ← θ − α·g
where α represents the learning rate (a minimal sketch of this update follows the claim).
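
For illustration (continuing the PyTorch framing above): the update θ ← θ − α·∂L_total/∂θ is ordinary gradient descent; the learning rate value below is an arbitrary placeholder.

import torch

def gradient_descent_step(parameters, total_loss, alpha: float = 0.01) -> None:
    # Compute dL_total/dtheta for every parameter and step each one in
    # the negative gradient direction: theta <- theta - alpha * grad.
    grads = torch.autograd.grad(total_loss, parameters)
    with torch.no_grad():
        for theta, grad in zip(parameters, grads):
            theta -= alpha * grad

In practice the same update is obtained with torch.optim.SGD(model.parameters(), lr=alpha) followed by total_loss.backward() and optimizer.step().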
4. The facial expression recognition method according to claim 1, further comprising a step of testing the occlusion facial expression recognition model, specifically:
inputting a test data set mixing unoccluded facial expression images and occluded facial expression images into a face detection model to obtain face positions;
cropping the test data set based on the face positions to obtain an aligned facial expression image data set;
and inputting the aligned facial expression image data set into the occlusion facial expression recognition model to obtain facial expressions, and thereby the test result.
5. The facial expression recognition method according to claim 4, wherein the cropping specifically comprises: applying an affine transformation that maps the facial feature points to be recognized onto standard facial feature points, to obtain an aligned face image (an illustrative alignment sketch follows this claim).
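
For illustration (reading the transformation in claim 5 as an affine/similarity alignment, which the machine translation renders as "radial"): the OpenCV sketch below is one common way to realize landmark-based alignment; the template points and output size are assumptions, not taken from the patent.

import cv2
import numpy as np

def align_face(image: np.ndarray, landmarks: np.ndarray,
               template: np.ndarray, size: tuple = (112, 112)) -> np.ndarray:
    # Fit a similarity transform (rotation, uniform scale, translation)
    # taking the detected landmarks (N, 2) onto the standard template
    # points (N, 2), then warp the image into the aligned crop.
    matrix, _ = cv2.estimateAffinePartial2D(
        landmarks.astype(np.float32), template.astype(np.float32))
    return cv2.warpAffine(image, matrix, size)

A typical template holds five canonical positions (eye centers, nose tip, mouth corners) expressed in the coordinate frame of the target crop.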
6. A facial expression recognition system, characterized by comprising an image acquisition module, an occlusion processing module, a data set acquisition module, a model construction and training module, and a result acquisition module;
the image acquisition module is used for acquiring facial expression images and forming the acquired facial expression images into an unoccluded facial expression image data set;
the occlusion processing module is used for applying random occlusion processing to each facial expression image sample in the unoccluded facial expression image data set to form an occluded facial expression image data set;
the data set acquisition module is used for mixing the unoccluded facial expression image data set and the occluded facial expression image data set in equal proportion to obtain an image data set, and dividing the image data set in equal proportion into a training data set and a verification data set;
the model construction and training module is used for constructing an occlusion facial expression recognition model, training it on the training data set, and, after training is finished, taking the verification data set as input to verify the training result, obtaining the trained occlusion facial expression recognition model;
the result acquisition module is used for performing expression recognition, based on the occlusion facial expression recognition model, on a facial expression image to be detected, with or without occlusion, to obtain a facial expression image and an expression recognition result;
the occlusion facial expression recognition model comprises an encoder, a decoder and an expression classifier;
constructing the occlusion facial expression recognition model specifically comprises the following steps:
the encoder receives an input image and outputs image features, expressed as:
f=E(I)
wherein I represents the input image and f represents the image features; the input image comprises unoccluded facial expression images from the unoccluded facial expression image data set and occluded facial expression images from the occluded facial expression image data set; the image features comprise occluded facial expression image features, reconstructed-image facial expression features, and unoccluded facial expression features, the reconstructed-image features being generated dynamically from the occluded facial expression images during training;
inputting the occluded facial expression image features, the reconstructed-image facial expression features, and the unoccluded facial expression features into a fully connected layer in the expression classifier, which maps them to expression classes; obtaining through a softmax function the probability that the facial expression image belongs to each expression class, and calculating the expression classification loss value, expressed as:
L_emotion = −log( e^(W_i·f + b_i) / Σ_{j=1..C} e^(W_j·f + b_j) )
wherein W represents the parameter matrix of the fully connected layer, C represents the total number of expression classes, b represents the bias term of the fully connected layer, j indexes all expression classes, f represents the feature vector of the input image, i represents the expression class to which the image belongs, and e denotes the exponential;
inputting the occluded facial expression image features into the decoder for prediction to obtain a predicted reconstruction of the corresponding unoccluded facial expression image, and calculating the pixel-level image reconstruction loss between the reconstructed image and the corresponding unoccluded image based on the difference between the prediction result and the actual unoccluded facial expression image, the pixel image reconstruction loss value L_recon being expressed as:
L_recon = |I − D(E(I))|_1
wherein I denotes an input image, D denotes a decoder, and E denotes an encoder;
taking the training data set as input, extracting through the encoder the feature representation of each layer for the original unoccluded images in the training data set, constraining through a loss function the similarity between the output features of each convolutional layer of the encoder and the corresponding features of the original unoccluded image, and calculating the feature constraint loss value L_feature;
obtaining the total training loss of the occlusion facial expression recognition model from the expression classification loss value, the pixel image reconstruction loss value, and the feature constraint loss value, the total training loss being expressed as: L_total = L_emotion + w_1·L_feature + w_2·L_recon, wherein w_1 and w_2 represent the weights of the feature constraint loss and the image reconstruction loss, respectively;
updating the weights of the model M based on a gradient descent method, and repeating the computation of the expression classification loss value, the pixel image reconstruction loss value, and the feature constraint loss value until a facial expression recognition result for the unoccluded facial expression image set is obtained, at which point the computation stops;
and substituting the expression classification loss value, the pixel image reconstruction loss value, and the feature constraint loss value into the occlusion facial expression recognition model to obtain the trained occlusion facial expression recognition model.
7. The facial expression recognition system according to claim 6, further comprising a model testing module configured to:
input a test data set mixing unoccluded facial expression images and occluded facial expression images into a face detection model to obtain face positions;
crop the test data set based on the face positions to obtain an aligned facial expression image data set;
and input the aligned facial expression image data set into the occlusion facial expression recognition model to obtain facial expressions, and thereby the test result.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 5.
9. A device for facial expression recognition, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method steps of any of claims 1 to 5 when executing the computer program.
CN202011337423.4A 2020-11-25 2020-11-25 Facial expression recognition method, system and device and readable storage medium Active CN112528764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011337423.4A CN112528764B (en) 2020-11-25 2020-11-25 Facial expression recognition method, system and device and readable storage medium


Publications (2)

Publication Number Publication Date
CN112528764A CN112528764A (en) 2021-03-19
CN112528764B true CN112528764B (en) 2021-09-03

Family

ID=74993728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011337423.4A Active CN112528764B (en) 2020-11-25 2020-11-25 Facial expression recognition method, system and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN112528764B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421317B (en) * 2021-06-10 2023-04-18 浙江大华技术股份有限公司 Method and system for generating image and electronic equipment
CN113947803B (en) * 2021-12-22 2022-03-25 北京的卢深视科技有限公司 Model training, sample data generation method for face recognition and electronic equipment
CN114943995A (en) * 2022-05-12 2022-08-26 北京百度网讯科技有限公司 Training method of face recognition model, face recognition method and device
CN116403269B (en) * 2023-05-17 2024-03-26 智慧眼科技股份有限公司 Method, system, equipment and computer storage medium for analyzing occlusion human face
CN116740452B (en) * 2023-06-19 2023-12-22 北京数美时代科技有限公司 Image classification method, system and storage medium based on image restoration

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3428843A1 (en) * 2017-07-14 2019-01-16 GB Group plc Improvements relating to face recognition
CN109886167A (en) * 2019-02-01 2019-06-14 中国科学院信息工程研究所 One kind blocking face identification method and device
CN110837777A (en) * 2019-10-10 2020-02-25 天津大学 Partial occlusion facial expression recognition method based on improved VGG-Net

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107085704A (en) * 2017-03-27 2017-08-22 杭州电子科技大学 Fast face expression recognition method based on ELM own coding algorithms
CN111582059B (en) * 2020-04-20 2022-07-15 哈尔滨工程大学 Face expression recognition method based on variational self-encoder


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WGAN-Based Robust Occluded Facial Expression Recognition; Yang Lu et al.; IEEE Access; 2019-07-11; full text *
A Survey of Methods for Partially Occluded Face Recognition; Wang Huixing et al.; Journal of Wuhan University (Natural Science Edition); 2020-10; Vol. 66, No. 5; full text *


Similar Documents

Publication Publication Date Title
CN112528764B (en) Facial expression recognition method, system and device and readable storage medium
CN109147010B (en) Method, device and system for generating face image with attribute and readable storage medium
CN107958230B (en) Facial expression recognition method and device
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
CN113159143B (en) Infrared and visible light image fusion method and device based on jump connection convolution layer
CN110263755B (en) Eye ground image recognition model training method, eye ground image recognition method and eye ground image recognition device
CN111160350A (en) Portrait segmentation method, model training method, device, medium and electronic equipment
CN111401339B (en) Method and device for identifying age of person in face image and electronic equipment
US11935213B2 (en) Laparoscopic image smoke removal method based on generative adversarial network
CN111597910A (en) Face recognition method, face recognition device, terminal equipment and medium
CN115130656A (en) Training method, device and equipment of anomaly detection model and storage medium
CN116229531A (en) Face front image synthesis method for collaborative progressive generation countermeasure network
CN111401343A (en) Method for identifying attributes of people in image and training method and device for identification model
CN110008922A (en) Image processing method, unit, medium for terminal device
CN114529962A (en) Image feature processing method and device, electronic equipment and storage medium
CN116704585A (en) Face recognition method based on quality perception
CN111652242A (en) Image processing method, image processing device, electronic equipment and storage medium
CN116363732A (en) Face emotion recognition method, device, equipment and storage medium
Kakkar Facial expression recognition with LDPP & LTP using deep belief network
Gowroju et al. Deep neural network for accurate age group prediction through pupil using the optimized UNet model
CN112967216B (en) Method, device, equipment and storage medium for detecting key points of face image
CN115205943A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113435248A (en) Mask face recognition base enhancement method, device, equipment and readable storage medium
CN112348112A (en) Training method and device for image recognition model and terminal equipment
WO2024099004A1 (en) Image processing model training method and apparatus, and electronic device, computer-readable storage medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant