CN110472583B - Human face micro-expression recognition system based on deep learning - Google Patents

Human face micro-expression recognition system based on deep learning

Info

Publication number
CN110472583B
Authority
CN
China
Prior art keywords
image
feature
discriminant
features
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910758794.0A
Other languages
Chinese (zh)
Other versions
CN110472583A (en)
Inventor
龚泽辉
李东
张国生
冯省城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910758794.0A priority Critical patent/CN110472583B/en
Publication of CN110472583A publication Critical patent/CN110472583A/en
Application granted granted Critical
Publication of CN110472583B publication Critical patent/CN110472583B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Abstract

The embodiment of the invention discloses a face micro-expression recognition system based on deep learning, which comprises a deep network model for performing face micro-expression recognition on an input image, the model comprising a feature extraction module and an image recognition module. The feature extraction module is used for extracting image recognition features and comprises a depth feature extraction submodule and a discriminant feature extraction submodule. The depth feature extraction submodule sequentially comprises a first convolution layer and a plurality of hole (dilated) convolution modules; the hole convolution module performs data processing on the convolution result output by the first convolution layer and outputs depth features. The discriminant feature extraction submodule crops the depth features by using a plurality of discriminant regions obtained from a discriminant region proposal network and enlarges the cropped features to serve as the image recognition features. The image recognition module performs micro-expression recognition on the image recognition features and outputs a recognition result. The method and the device can recognize face micro-expressions efficiently, quickly and accurately.

Description

Human face micro-expression recognition system based on deep learning
Technical Field
The embodiment of the invention relates to the technical field of computer vision, in particular to a human face micro-expression recognition system based on deep learning.
Background
In recent years, deep learning has become a research hotspot thanks to the rapid development of computing resources. Computer vision, owing to its great practical value, is one of its most popular research directions and has achieved large performance gains over conventional machine learning in tasks such as image classification, object detection and image segmentation. Although language is the first-choice tool for human communication, the information conveyed by expressions is richer; micro-expressions can convey real feelings and motivations, and face micro-expression recognition helps computer vision technology develop towards greater intelligence.
When the related art performs face micro-expression recognition, the processing has to be divided into several independent steps, which makes the pipeline cumbersome; the original image has to be cropped and features extracted from the cropped regions by a convolution network multiple times, so the testing time is long and the efficiency is low; in addition, the network model involves a manual feature design process, so the final performance of the network is bottlenecked and remains limited.
For example, face micro-expression recognition may include the following steps: first, face detection is performed; facial landmark points are then detected on the detected face image by combining the Sobel operator edge detection algorithm with the Shi-Tomasi corner detection algorithm; the detected landmark points define the input features of a Multi-Layer Perceptron (MLP) neural network, which identifies the facial expression. In addition, in the related-art method for expression classification and micro-expression detection based on deep learning, a series of cropping regions can be obtained through detection of facial landmark points, and the crops of the original image are each fed into a deep learning network structure to obtain features for the final micro-expression classification.
Disclosure of Invention
The embodiment of the disclosure provides a face micro-expression recognition system based on deep learning, which solves the low accuracy and low efficiency caused by the multiple steps of manual feature design and testing, and recognizes face micro-expressions efficiently, quickly and accurately.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
the embodiment of the invention provides a face micro expression recognition system based on deep learning, which comprises a deep network model for carrying out face micro expression recognition on an input image, wherein the deep network model comprises a feature extraction module for extracting image recognition features and an image recognition module for carrying out micro expression recognition on the image recognition features and outputting a recognition result;
the feature extraction module comprises a depth feature extraction submodule and a discriminant feature extraction submodule;
the depth feature extraction submodule sequentially comprises a first convolution layer and a plurality of hole (dilated) convolution modules; the hole convolution module is used for performing data processing on the convolution result output by the first convolution layer and outputting depth features;
the discriminant feature extraction submodule is used for cropping the depth features by using a plurality of discriminant regions obtained based on a discriminant region proposal network, and enlarging the cropped features to serve as the image recognition features.
Optionally, the discriminant feature extraction sub-module includes:
a discriminant region center point determination unit, for obtaining the center point coordinates of N discriminant regions by using the discriminant region proposal network based on the depth features; the discriminant region proposal network sequentially comprises a hole convolution module, a convolution layer and a fully connected layer along the data stream processing direction;
a discriminant region determination unit, for determining the corresponding discriminant region based on the center point coordinates and a preset side length of each discriminant region;
a cropping unit, configured to crop the depth features using each discriminant region;
and a feature enlargement unit, for respectively enlarging the feature map size of each of the N cropped features to the feature map size of the depth features.
Optionally, the cropping unit is configured to crop the depth features by using each discriminant region based on a first formula; the first formula is:
F_crop^i(x, y) = F_deep(x, y) · M_i(x, y);
M_i(x, y) = [δ(x - (x_i - L/2)) - δ(x - (x_i + L/2))] · [δ(y - (y_i - L/2)) - δ(y - (y_i + L/2))];
δ(x) = 1/(1 + exp(-kx));
where F_crop^i is the feature obtained by cropping the depth features with the i-th discriminant region, F_deep is the depth feature, x and y are coordinate values along the width direction and the height direction of the feature map of the depth features, (x_i, y_i) is the center point coordinate of the i-th discriminant region, k is a constant greater than zero, and L is the side length.
Optionally, the feature enlargement unit is configured to enlarge the cropped features according to a second formula, the second formula being:
F_scale^i(x_t, y_t) = Σ_{m=x_s}^{x_s+1} Σ_{n=y_s}^{y_s+1} F_crop^i(m, n) · (1 - |x_t/λ_W - m|) · (1 - |y_t/λ_H - n|);
x_s = [x_t/λ_W], y_s = [y_t/λ_H], λ_H = H/L, λ_W = W/L;
where F_scale^i(x_t, y_t) is the pixel value of F_scale^i at position (x_t, y_t), F_crop^i(m, n) is the pixel value of the feature map of the cropped depth feature at position (m, n), H and W are respectively the height and width of the feature map, [·] denotes rounding down, and L is the side length.
Optionally, the depth feature extraction submodule includes 4 hole convolution modules with the same structure, and each hole convolution module sequentially includes a 1 × 1 convolution layer, a first BN normalization layer, a first leaky linear rectification function (Leaky ReLU) layer, a 3 × 3 hole convolution layer, a second BN normalization layer, and a second leaky linear rectification function layer along the data stream processing direction.
Optionally, the first BN normalization layer includes:
a mean calculation unit, for calculating the pixel mean of each channel using
μ_B(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} Y_1^b(c, i, j),
where μ_B(c) is the pixel mean of channel c, B is the total number of images contained in the current training batch, Y_1^b(c, i, j) is the feature map of the b-th input image of the current training batch, and h and w are respectively the height and width of a feature map channel;
a variance calculation unit, for calculating the pixel variance of each channel using
σ_B^2(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} (Y_1^b(c, i, j) - μ_B(c))^2,
where σ_B^2(c) is the pixel variance of channel c;
a normalization unit, for normalizing Y_1^b(c, i, j) using
Ŷ_1^b(c, i, j) = (Y_1^b(c, i, j) - μ_B(c)) / sqrt(σ_B^2(c) + ε)
to obtain the normalized image Ŷ_1^b(c, i, j), where ε is a small positive constant;
an image processing unit, for performing image processing on Ŷ_1^b(c, i, j) using
Y_2^b(c, i, j) = γ·Ŷ_1^b(c, i, j) + β,
where γ is a scaling factor and β is a translation factor.
Optionally, the deep network model further includes an image preprocessing module, configured to convert an image format of the image to be recognized into a preset network input format, where the image preprocessing module includes:
the image scaling submodule is used for scaling the size of the image to be identified to a preset size;
the normalization submodule is used for performing pixel normalization on the image to be recognized by using a third formula; the third formula is:
p̂_{i,j,c} = p_{i,j,c} - (1/(M·H·W)) Σ_{m=1}^{M} Σ_{i=1}^{H} Σ_{j=1}^{W} p_{i,j,c}^m;
where p_{i,j,c} is the pixel value at position (i, j) of channel c of the image to be recognized, p̂_{i,j,c} is the pixel value after normalization, H is the height of the image to be recognized, W is the width of the image to be recognized, p_{i,j,c}^m is the pixel value at position (i, j) of channel c of the m-th image, and M is the total number of images.
Optionally, the image preprocessing module further includes:
the brightness adjusting submodule is used for adjusting the brightness of the image to be identified according to a preset brightness proportion value; the brightness proportion value is selected from a brightness proportion range, and the brightness proportion range is [0.5, 1.5 ];
the contrast adjusting submodule is used for adjusting the contrast of the image to be recognized according to a preset contrast ratio value; the contrast ratio value is selected from a contrast ratio range, and the contrast ratio range is [0.5, 1.5].
Optionally, the image recognition module further includes:
the pooling submodule is used for performing global average pooling on each image recognition feature by using a fourth formula, the fourth formula being:
g_i = (1/(H_scale·W_scale)) Σ_{m=1}^{H_scale} Σ_{n=1}^{W_scale} F_scale^i(m, n);
where H_scale and W_scale are respectively the height and width of each image recognition feature F_scale^i, and F_scale^i(m, n) is the pixel value of F_scale^i at position (m, n);
the fully connected layer submodule is used for uniformly storing the image recognition features processed by the pooling submodule into a feature data set;
and the feature recognition submodule is used for recognizing the image features in the feature data set and outputting a result.
Optionally, the feature recognition submodule includes:
a target feature vector calculation unit, for calculating a target feature vector f_avg from the feature data set by using f_avg = (1/N) Σ_{i=1}^{N} f_i, where the feature data set comprises N feature vectors f_1, ..., f_N;
a category vector output unit, for calculating the category vector o_i of each type of micro-expression to which the image recognition features belong by using a fifth formula, the fifth formula being:
o_i = exp(f_avg(i)) / Σ_{j=1}^{num_cls} exp(f_avg(j));
where num_cls is the total number of categories of face micro-expressions, and f_avg(i) is the value of the i-th element of the target feature vector f_avg.
The technical scheme provided by the present application has the following advantages. First, the depth feature extraction submodule extracts features from the input image to obtain depth features; then the discriminant feature extraction submodule takes the depth features as input and obtains a series of discriminant features through further feature enhancement; finally, the image recognition module classifies the discriminant features and outputs the expression classification result. The face micro-expression image to be recognized is fed directly into the deep network model to obtain the final micro-expression classification result, so testing is convenient. The features required for classification are learned automatically from the input images in a data-driven manner, so no manual feature design is needed; this removes the burden of manual feature design, solves the low accuracy and low efficiency caused by the cumbersome multi-step manual feature design and testing of the prior art, and realizes efficient, fast and accurate recognition of face micro-expressions.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the related art, the drawings required to be used in the description of the embodiments or the related art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a structural diagram of a specific embodiment of a facial micro-expression recognition system based on deep learning according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a data processing flow of a feature extraction module according to an embodiment of the present invention;
fig. 3 is a structural diagram of a specific implementation of a hole convolution module according to an embodiment of the present invention;
fig. 4 is a structural diagram of a specific implementation of the discriminant feature extraction sub-module according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an embodiment of a discriminant region proposal network according to an embodiment of the present invention;
fig. 6 is a structural diagram of an embodiment of an image recognition module according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of image preprocessing according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
Having described the technical solutions of the embodiments of the present invention, various non-limiting embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is a structural diagram of a deep-learning-based facial micro-expression recognition system according to an embodiment of the present invention. In a specific implementation manner, the embodiment of the present invention may include the following:
the human face micro expression recognition system based on deep learning can comprise a deep network model 1, wherein the deep network model 1 is used for carrying out human face micro expression recognition on an input image and can comprise a feature extraction module 11 and an image recognition module 12.
The feature extraction module 11 is configured to extract image recognition features, where the image recognition features may include depth features and discriminant features, and accordingly, the depth feature extraction submodule 111 and the discriminant feature extraction submodule 112 may be respectively used for extraction.
In the present application, the depth feature extraction sub-module 111 may sequentially include a first convolution layer and a plurality of hole (dilated) convolution modules; the hole convolution module performs data processing on the convolution result output by the first convolution layer and outputs the depth features. The first convolution layer convolves the input image to be recognized and feeds the result into the first hole convolution module; the first hole convolution module processes the received data and feeds its output into the second hole convolution module, and so on, until the data output by the last hole convolution module is the depth feature of the image to be recognized. Hole convolution enlarges the receptive field and avoids the loss of spatial information caused by pooling, which helps improve the recognition accuracy of the model.
Optionally, referring to fig. 2, the depth feature extraction sub-module 111 may include a first 7 × 7 convolution layer and 4 identical hole convolution modules, and each hole convolution module may sequentially include a 1 × 1 convolution layer, a first BN normalization layer, a first leaky linear rectification function layer, a 3 × 3 hole convolution layer, a second BN normalization layer, and a second leaky linear rectification function layer along the data stream processing direction; the structure of the hole convolution module may be as shown in fig. 3. In this embodiment, the hole convolution module may first convolve its input X with the 1 × 1 convolution template K_{1×1} and store the result in Y_1, i.e. Y_1(i, j) = (K_{1×1} * X)(i, j), where (i, j) denotes a pixel position. Batch normalization may then be applied to Y_1. The current training batch may contain B input images, so the input of batch normalization is {Y_1^b | b = 1, ..., B}, where Y_1^b is the feature map obtained from the b-th image of the current input batch, and c, h and w are respectively the number of channels, the height and the width of the feature map; the output of the first BN normalization layer is Y_2. After the BN normalization layer, a Leaky RELU (leaky rectified linear unit) nonlinear activation function may be applied to Y_2 to obtain the activation output Y_3 = LRELU(Y_2), where LRELU(x) = x for x > 0 and LRELU(x) = a·x for x ≤ 0, a being a small positive slope. A 3 × 3 convolution template K_{3×3} is then used to perform the hole convolution operation on Y_3, and the output is stored in Y_4; with the hole convolution rate set to l, the 3 × 3 hole convolution operation is:
Y_4(i, j) = Σ_{m=-1}^{1} Σ_{n=-1}^{1} K_{3×3}(m, n) · Y_3(i + l·m, j + l·n).
After Y_4 is obtained, the second BN normalization layer normalizes Y_4 to obtain Y_5, and the Leaky RELU nonlinear activation function in the second leaky linear rectification function layer is applied to Y_5 to obtain Y_6, which is the output of the hole convolution module.
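For illustration, a minimal PyTorch sketch of the depth feature extraction sub-module consistent with the structure described above is given below; the channel count, the stride of the first 7 × 7 convolution, the dilation rate l and the leaky slope are assumptions made for the sketch and are not fixed by the embodiment.

```python
import torch
import torch.nn as nn

class HoleConvModule(nn.Module):
    """One hole (dilated) convolution module: 1x1 conv -> BN -> Leaky ReLU -> 3x3 dilated conv -> BN -> Leaky ReLU."""
    def __init__(self, channels=64, dilation=2, negative_slope=0.01):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),    # Y1: 1x1 convolution
            nn.BatchNorm2d(channels),                                    # Y2: first BN normalization layer
            nn.LeakyReLU(negative_slope),                                # Y3: first Leaky ReLU layer
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation, bias=False),  # Y4: 3x3 hole convolution, rate l
            nn.BatchNorm2d(channels),                                    # Y5: second BN normalization layer
            nn.LeakyReLU(negative_slope),                                # Y6: second Leaky ReLU layer
        )

    def forward(self, x):
        return self.block(x)

class DepthFeatureExtractor(nn.Module):
    """First 7x7 convolution followed by 4 identical hole convolution modules (cf. fig. 2)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(3, channels, kernel_size=7, stride=2, padding=3)  # first convolution layer
        self.dcms = nn.Sequential(*[HoleConvModule(channels) for _ in range(4)])

    def forward(self, img):                 # img: (B, 3, 227, 227) preprocessed input
        return self.dcms(self.conv1(img))   # depth feature F_deep
```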
In this embodiment, the first BN normalization layer and the second BN normalization layer are used for batch normalization of the input data and may both include the corresponding structures; the first BN normalization layer may include:
a mean calculation unit, for calculating the pixel mean of each channel using
μ_B(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} Y_1^b(c, i, j),
where μ_B(c) is the pixel mean of channel c, B is the total number of images contained in the current training batch, Y_1^b(c, i, j) is the feature map of the b-th input image of the current training batch, and h and w are respectively the height and width of a feature map channel;
a variance calculation unit, for calculating the pixel variance of each channel using
σ_B^2(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} (Y_1^b(c, i, j) - μ_B(c))^2,
where σ_B^2(c) is the pixel variance of channel c;
a normalization unit, for normalizing Y_1^b(c, i, j) using
Ŷ_1^b(c, i, j) = (Y_1^b(c, i, j) - μ_B(c)) / sqrt(σ_B^2(c) + ε)
to obtain the normalized image Ŷ_1^b(c, i, j), where ε is an arbitrarily small positive constant;
an image processing unit, for performing image processing on Ŷ_1^b(c, i, j) using
Y_2^b(c, i, j) = γ·Ŷ_1^b(c, i, j) + β,
where γ is a scaling factor and β is a translation factor, both of which can be learned by the network itself.
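A small NumPy sketch of the four units above (mean, variance, normalization and scale/shift, acting per channel on a batch of shape B × C × h × w) may look as follows; the values of γ, β and ε are illustrative only.

```python
import numpy as np

def bn_forward(Y1, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalize Y1 of shape (B, C, h, w) per channel, as in the four units above."""
    mu = Y1.mean(axis=(0, 2, 3), keepdims=True)                   # mean unit: mu_B(c)
    var = ((Y1 - mu) ** 2).mean(axis=(0, 2, 3), keepdims=True)    # variance unit: sigma_B^2(c)
    Y1_hat = (Y1 - mu) / np.sqrt(var + eps)                       # normalization unit
    return gamma * Y1_hat + beta                                  # image processing unit: scale and shift
```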
In the present application, referring to fig. 4, the discriminant feature extraction sub-module 112 may be configured to crop the depth features using a plurality of discriminant regions obtained from a Discriminant Region Proposal Network (DRPN), and to enlarge the cropped features to serve as the image recognition features. That is, the discriminant feature extraction sub-module 112 takes the depth features extracted by the depth feature extraction sub-module 111 as input, obtains a series of discriminant features through further feature enhancement, and uses the discriminant region proposal network to localize discriminant regions in the face micro-expression image. The discriminant region proposal network may sequentially include a hole convolution module, a convolution layer, and a fully connected layer along the data stream processing direction, for example as shown in fig. 5, where the convolution layer may be a 1 × 1 convolution layer, the DCM module is a hole convolution module, and the number of neurons output by the fully connected layer is 2N, corresponding in order to the center point coordinates of the N discriminant regions. The discriminant region proposal network can automatically identify the discriminant regions in an image that contribute to classification, which solves the problem that the prior art needs to identify the discriminant regions in micro-expression images manually.
In one embodiment, the discriminative feature extraction sub-module may include:
a discriminant region center point determination unit, configured to obtain N center point coordinates S = {(x_i, y_i) | i = 1, ..., N} by using the discriminant region proposal network based on the depth features.
The discriminant region determination unit is configured to determine the corresponding discriminant region based on the center point coordinates and the preset side length of each discriminant region; for example, if the preset side length is L, the N discriminant regions are R_i = [x_i - L/2, x_i + L/2] × [y_i - L/2, y_i + L/2], i = 1, ..., N.
And a cropping unit, configured to crop the depth features by using each discriminant region. Any resizing algorithm can be adopted to crop the feature map of the depth features, which does not affect the implementation of the present application. Optionally, for example, the depth features may be cropped using each discriminant region based on the first formula; the first formula may be:
F_crop^i(x, y) = F_deep(x, y) · M_i(x, y);
M_i(x, y) = [δ(x - (x_i - L/2)) - δ(x - (x_i + L/2))] · [δ(y - (y_i - L/2)) - δ(y - (y_i + L/2))];
δ(x) = 1/(1 + exp(-kx));
where F_crop^i is the feature obtained by cropping the depth features with the i-th discriminant region, F_deep is the depth feature, x and y are coordinate values along the width direction and the height direction of the feature map of the depth features (for example, the upper left corner of the image may be taken as the coordinate origin), k is a constant greater than zero, L is the side length, and δ(·) is a variant of the sigmoid function. It should be noted that the related art crops the original image and sends the crops to a convolution network for feature extraction, which is inefficient and leads to long testing time; here, in contrast, the features are cropped directly on the depth feature map, so the image only needs to pass through feature extraction once.
And the feature enlargement unit is configured to respectively enlarge the feature map size of each of the N cropped features to the feature map size of the depth features. After the feature map of the depth features is cropped, a plurality of feature maps are obtained, and the size of each cropped feature map may be adjusted to the size of the feature map of the depth features by using any resizing algorithm, which is not limited in this application. In one embodiment, the cropped features can be enlarged according to the second formula; after the feature enlargement operation, a series of discriminant features F_scale^i (i = 1, ..., N) are obtained. The second formula may be:
F_scale^i(x_t, y_t) = Σ_{m=x_s}^{x_s+1} Σ_{n=y_s}^{y_s+1} F_crop^i(m, n) · (1 - |x_t/λ_W - m|) · (1 - |y_t/λ_H - n|);
x_s = [x_t/λ_W], y_s = [y_t/λ_H], λ_H = H/L, λ_W = W/L;
where F_scale^i(x_t, y_t) is the pixel value of F_scale^i at position (x_t, y_t), F_crop^i(m, n) is the pixel value of the feature map of the cropped depth feature at position (m, n), H and W are respectively the height and width of the feature map, [·] denotes rounding down, and L is the side length.
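The following PyTorch sketch puts the discriminant-feature path together under stated assumptions: a DRPN (a single dilated convolution standing in for the DCM, a 1 × 1 convolution and a fully connected layer with 2N outputs) predicts N center points, each center defines an L × L square region, the depth feature is soft-cropped with the sigmoid-difference mask of the first formula, and the cropped window is enlarged back to H × W by bilinear resampling. N, L, k, the feature-map size and the channel counts are illustrative assumptions, not values fixed by the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscriminantFeatureExtractor(nn.Module):
    def __init__(self, channels=64, feat_hw=(56, 56), num_regions=4, side=14, k=10.0):
        super().__init__()
        self.N, self.L, self.k = num_regions, side, k
        self.H, self.W = feat_hw
        # discriminant region proposal network: DCM -> 1x1 conv -> fully connected layer (2N outputs);
        # a single dilated convolution stands in for the DCM in this sketch
        self.drpn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.Conv2d(channels, 8, kernel_size=1),
            nn.Flatten(),
            nn.Linear(8 * self.H * self.W, 2 * self.N),
        )

    def forward(self, f_deep):                               # f_deep: (B, C, H, W) depth feature
        B, _, H, W = f_deep.shape
        centres = torch.sigmoid(self.drpn(f_deep)).view(B, self.N, 2)
        xs, ys = centres[..., 0] * W, centres[..., 1] * H    # N centre points (x_i, y_i)
        gx = torch.arange(W, device=f_deep.device, dtype=f_deep.dtype).view(1, 1, -1)
        gy = torch.arange(H, device=f_deep.device, dtype=f_deep.dtype).view(1, -1, 1)
        features = []
        for i in range(self.N):
            xi, yi = xs[:, i].view(B, 1, 1), ys[:, i].view(B, 1, 1)
            # sigmoid-difference mask of the first formula: close to 1 inside the L x L box, 0 outside
            mx = torch.sigmoid(self.k * (gx - (xi - self.L / 2))) - \
                 torch.sigmoid(self.k * (gx - (xi + self.L / 2)))
            my = torch.sigmoid(self.k * (gy - (yi - self.L / 2))) - \
                 torch.sigmoid(self.k * (gy - (yi + self.L / 2)))
            crop = f_deep * (mx * my).unsqueeze(1)           # soft crop of the depth feature
            # hard-slice the L x L window around each centre, then enlarge it back to H x W
            x0 = (xs[:, i].clamp(self.L / 2, W - self.L / 2) - self.L / 2).long().tolist()
            y0 = (ys[:, i].clamp(self.L / 2, H - self.L / 2) - self.L / 2).long().tolist()
            win = torch.cat([crop[b:b + 1, :, y0[b]:y0[b] + self.L, x0[b]:x0[b] + self.L]
                             for b in range(B)], dim=0)
            features.append(F.interpolate(win, size=(self.H, self.W),
                                          mode='bilinear', align_corners=False))
        return features                                      # N discriminant features F_scale^i
```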
In the present application, the image recognition module 12 may be configured to perform micro-expression recognition on the image recognition features and output a recognition result. The recognition result may be the category of the facial micro-expression of the image to be recognized, such as sadness, surprise or fear; it may also be the probability that the facial micro-expression of the image to be recognized belongs to each type of expression, which does not affect the implementation of the present method.
In one embodiment, referring to fig. 6, the image recognition module 12 may include:
a pooling sub-module, configured to perform Global Average Pooling (GAP) on each image recognition feature F_scale^i by using a fourth formula and store the result in g_i. The fourth formula may be:
g_i = (1/(H_scale·W_scale)) Σ_{m=1}^{H_scale} Σ_{n=1}^{W_scale} F_scale^i(m, n);
where H_scale and W_scale are respectively the height and width of each image recognition feature F_scale^i, and F_scale^i(m, n) is the pixel value of F_scale^i at position (m, n);
and a fully connected layer sub-module, configured to uniformly store the image recognition features processed by the pooling sub-module into a feature data set. That is, after GAP has been performed on each image recognition feature in turn, a fully connected layer is applied to each g_i; the number of output neurons of the fully connected layer is the same as the number of micro-expression categories, set to num_cls, and the results are stored into the feature data set {f_i | i = 1, ..., N}.
And the feature recognition sub-module is configured to recognize the image features in the feature data set and output a result. In one embodiment, the feature recognition sub-module may include:
a target feature vector calculation unit, configured to calculate a target feature vector f_avg from the feature data set by using f_avg = (1/N) Σ_{i=1}^{N} f_i, the feature data set comprising the N feature vectors f_1, ..., f_N;
a category vector output unit, configured to apply the softmax activation function to f_avg to obtain the final category output vector o, i.e. to calculate the category vector o_i of each type of micro-expression to which the image recognition features belong by using a fifth formula, the fifth formula being:
o_i = exp(f_avg(i)) / Σ_{j=1}^{num_cls} exp(f_avg(j));
where num_cls is the total number of categories of face micro-expressions, and f_avg(i) is the value of the i-th element of the target feature vector f_avg.
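A minimal sketch of the image recognition module under these definitions might look as follows; the channel count and the assumption of 7 expression categories are illustrative, and the input is the list of N discriminant features produced above.

```python
import torch
import torch.nn as nn

class ImageRecognitionModule(nn.Module):
    def __init__(self, channels=64, num_cls=7):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)         # fourth formula: global average pooling
        self.fc = nn.Linear(channels, num_cls)     # fully connected layer with num_cls output neurons

    def forward(self, features):                   # features: list of N discriminant maps, each (B, C, H, W)
        f = [self.fc(self.gap(x).flatten(1)) for x in features]   # feature data set {f_i}
        f_avg = torch.stack(f, dim=0).mean(dim=0)                 # target feature vector f_avg
        return torch.softmax(f_avg, dim=1)                        # fifth formula: category vector o
```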
It should be noted that the deep network model 1 of the present application is a model obtained by end-to-end training based on a deep learning method. During training, the class cross-entropy loss function can be used, and the stochastic gradient descent algorithm is used for end-to-end training optimization.
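A hedged sketch of such end-to-end optimization, assuming the whole deep network model outputs the category vector o and that a data loader yields preprocessed images with expression labels, could look like this (learning rate, momentum and epoch count are illustrative):

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=30, lr=0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)  # stochastic gradient descent
    model.train()
    for _ in range(epochs):
        for images, labels in loader:              # preprocessed images and expression labels
            optimizer.zero_grad()
            probs = model(images)                  # category vector o from the whole deep network model
            loss = F.nll_loss(torch.log(probs + 1e-8), labels)  # class cross-entropy on the softmax output
            loss.backward()                        # end-to-end gradients through all modules
            optimizer.step()
```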
In the technical scheme provided by the embodiment of the invention, the depth feature extraction submodule first extracts features from the input image to obtain depth features; the discriminant feature extraction submodule then takes the depth features as input and obtains a series of discriminant features through further feature enhancement; finally, the image recognition module classifies the discriminant features and outputs the expression classification result. Because the deep network model that performs face micro-expression recognition on the input image is trained and tested end to end, the face micro-expression image to be recognized is fed in directly and the final micro-expression classification result is obtained, so testing is convenient. The features required for classification are learned automatically from the input images in a data-driven manner without manual feature design, which removes the burden of manual feature design, solves the low accuracy and low efficiency caused by the cumbersome multi-step manual feature design and testing of the prior art, and realizes efficient, fast and accurate recognition of face micro-expressions.
In another embodiment, in order to improve the accuracy and efficiency of the model for identifying the facial micro-expressions, before extracting the image identification features, the image to be identified may be subjected to image preprocessing. In view of this, the deep learning-based face micro-expression recognition system may further include an image preprocessing module, which is configured to convert an image format of the image to be recognized into a preset network input format. In one embodiment, the image pre-processing module may include:
the image scaling submodule is used for scaling the size of the image to be identified to a preset size; for example, the image to be recognized is scaled to 227 × 227.
The normalization submodule is used for performing pixel normalization on the image to be recognized by using the following formula:
p̂_{i,j,c} = p_{i,j,c} - (1/(M·H·W)) Σ_{m=1}^{M} Σ_{i=1}^{H} Σ_{j=1}^{W} p_{i,j,c}^m;
where p_{i,j,c} is the pixel value at position (i, j) of channel c of the image to be recognized, p̂_{i,j,c} is the pixel value after normalization, H is the height of the image to be recognized, W is the width of the image to be recognized, p_{i,j,c}^m is the pixel value at position (i, j) of channel c of the m-th image, and M is the total number of images.
Based on the above embodiment, referring to fig. 7, the image preprocessing module may further include:
a brightness adjusting sub-module, configured to adjust the brightness of the image to be recognized according to a preset brightness proportion value. The brightness proportion value may be selected from a brightness proportion range of [0.5, 1.5], that is, the brightness of the image to be recognized may be adjusted by a factor of 0.5 to 1.5; the brightness proportion value may also be a value outside 0.5 to 1.5, which does not affect the implementation of the present application.
And a contrast adjusting sub-module, configured to adjust the contrast of the image to be recognized according to a preset contrast ratio value, where the contrast ratio value is selected from a contrast ratio range of [0.5, 1.5]. That is, the contrast of the image to be recognized may be adjusted by a factor of 0.5 to 1.5; the contrast ratio value may also be a value outside 0.5 to 1.5, which does not affect the implementation of the present application.
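By way of example, the preprocessing described in this embodiment could be sketched as follows, assuming PIL images as input; the scaling of pixel values to [0, 1] and the concrete channel-mean values are assumptions, since the embodiment only specifies mean subtraction over the training images.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def preprocess(img, channel_mean=(0.485, 0.456, 0.406), train=True):
    """Convert a PIL image into the preset network input format described above."""
    img = img.resize((227, 227), Image.BILINEAR)                   # image scaling submodule
    if train:
        img = ImageEnhance.Brightness(img).enhance(random.uniform(0.5, 1.5))  # brightness submodule
        img = ImageEnhance.Contrast(img).enhance(random.uniform(0.5, 1.5))    # contrast submodule
    arr = np.asarray(img, dtype=np.float32) / 255.0                # scale pixels to [0, 1] (assumption)
    arr = arr - np.asarray(channel_mean, dtype=np.float32)         # third formula: subtract the channel mean
    return arr.transpose(2, 0, 1)                                  # C x H x W network input
```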
In summary, the present application provides an end-to-end expression classification method based on deep learning: the face micro-expression image to be recognized is input directly and the final micro-expression classification result is obtained, so testing is convenient; the features required for classification are learned automatically from the input image in a data-driven manner, without manual feature design, which removes the burden of manual feature design; the image only needs to go through feature extraction once and the features (rather than the original image) are cropped, so testing time is short and efficiency is high; and the DRPN can automatically identify the discriminant regions in an image that contribute to classification.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The human face micro-expression recognition system based on deep learning provided by the invention is described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (9)

1. A human face micro expression recognition system based on deep learning is characterized by comprising a deep network model for carrying out human face micro expression recognition on an input image, wherein the deep network model comprises a feature extraction module for extracting image recognition features and an image recognition module for carrying out micro expression recognition on the image recognition features and outputting a recognition result;
the feature extraction module comprises a depth feature extraction submodule and a discriminant feature extraction submodule;
the depth feature extraction submodule sequentially comprises a first convolution layer and a plurality of hole (dilated) convolution modules; the hole convolution module is used for performing data processing on the convolution result output by the first convolution layer and outputting depth features; the depth feature extraction submodule comprises 4 hole convolution modules with the same structure, and each hole convolution module sequentially comprises a 1 × 1 convolution layer, a first BN normalization layer, a first leaky linear rectification function layer, a 3 × 3 hole convolution layer, a second BN normalization layer and a second leaky linear rectification function layer along the data stream processing direction; the discriminant feature extraction submodule is used for cropping the depth features by using a plurality of discriminant regions obtained based on a discriminant region proposal network, and enlarging the cropped features to serve as the image recognition features;
wherein the discriminant feature extraction submodule comprises: a discriminant region center point determination unit, for obtaining the center point coordinates of N discriminant regions by using the discriminant region proposal network based on the depth features, the discriminant region proposal network sequentially comprising a hole convolution module, a convolution layer and a fully connected layer along the data stream processing direction; and a discriminant region determination unit, for determining the corresponding discriminant region based on the center point coordinates and a preset side length of each discriminant region.
2. The deep-learning-based face micro-expression recognition system of claim 1, wherein the discriminant feature extraction sub-module comprises:
a cropping unit, configured to crop the depth features using each discriminant region;
and a feature enlargement unit, for respectively enlarging the feature map size of each of the N cropped features to the feature map size of the depth features.
3. The deep-learning-based face micro-expression recognition system of claim 2, wherein the cropping unit is configured to crop the depth features by using each discriminant region based on a first formula; the first formula is:
F_crop^i(x, y) = F_deep(x, y) · M_i(x, y);
M_i(x, y) = [δ(x - (x_i - L/2)) - δ(x - (x_i + L/2))] · [δ(y - (y_i - L/2)) - δ(y - (y_i + L/2))];
δ(x) = 1/(1 + exp(-kx));
where F_crop^i is the feature obtained by cropping the depth features with the i-th discriminant region, F_deep is the depth feature, x and y are coordinate values along the width direction and the height direction of the feature map of the depth features, k is a constant greater than zero, L is the preset side length, and (x_i, y_i) is the center point coordinate of the i-th discriminant region.
4. The deep-learning-based face micro-expression recognition system of claim 2, wherein the feature enlargement unit is configured to enlarge the cropped features according to a second formula, the second formula being:
F_scale^i(x_t, y_t) = Σ_{m=x_s}^{x_s+1} Σ_{n=y_s}^{y_s+1} F_crop^i(m, n) · (1 - |x_t/λ_W - m|) · (1 - |y_t/λ_H - n|);
x_s = [x_t/λ_W], y_s = [y_t/λ_H], λ_H = H/L, λ_W = W/L;
where F_scale^i(x_t, y_t) is the pixel value of F_scale^i at position (x_t, y_t), F_crop^i(m, n) is the pixel value of the feature map of the cropped depth feature at position (m, n), H and W are respectively the height and width of the feature map, [·] denotes rounding down, and L is the preset side length.
5. The deep-learning-based face micro-expression recognition system of claim 1, wherein the first BN normalization layer comprises:
a mean calculation unit, for calculating the pixel mean of each channel using
μ_B(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} Y_1^b(c, i, j),
where μ_B(c) is the pixel mean of channel c, B is the total number of images contained in the current training batch, Y_1^b(c, i, j) is the feature map of the b-th input image of the current training batch, and h and w are respectively the height and width of the feature map channel;
a variance calculation unit, for calculating the pixel variance of each channel using
σ_B^2(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} (Y_1^b(c, i, j) - μ_B(c))^2,
where σ_B^2(c) is the pixel variance of channel c;
a normalization unit, for normalizing Y_1^b(c, i, j) using
Ŷ_1^b(c, i, j) = (Y_1^b(c, i, j) - μ_B(c)) / sqrt(σ_B^2(c) + ε)
to obtain the normalized image Ŷ_1^b(c, i, j), where ε is a positive constant;
and an image processing unit, for performing image processing on Ŷ_1^b(c, i, j) using
Y_2^b(c, i, j) = γ·Ŷ_1^b(c, i, j) + β,
where γ is a scaling factor and β is a translation factor.
6. The deep-learning-based face micro-expression recognition system according to any one of claims 1 to 5, wherein the deep network model further comprises an image preprocessing module for converting an image format of an image to be recognized into a preset network input format, the image preprocessing module comprising:
an image scaling submodule, for scaling the size of the image to be recognized to a preset size;
and a normalization submodule, for performing pixel normalization on the image to be recognized by using a third formula; the third formula is:
p̂_{i,j,c} = p_{i,j,c} - (1/(M·H·W)) Σ_{m=1}^{M} Σ_{i=1}^{H} Σ_{j=1}^{W} p_{i,j,c}^m;
where p_{i,j,c} is the pixel value at position (i, j) of channel c of the image to be recognized, p̂_{i,j,c} is the pixel value after normalization, H is the height of the image to be recognized, W is the width of the image to be recognized, p_{i,j,c}^m is the pixel value at position (i, j) of channel c of the m-th image, and M is the total number of images.
7. The deep learning based face micro expression recognition system of claim 6, wherein the image preprocessing module further comprises:
the brightness adjusting submodule is used for adjusting the brightness of the image to be identified according to a preset brightness proportion value; the brightness proportion value is selected from a brightness proportion range, and the brightness proportion range is [0.5, 1.5 ];
the contrast adjusting submodule is used for adjusting the contrast of the image to be identified according to a preset contrast ratio value; the contrast ratio value is selected from a contrast ratio range, the contrast ratio range being [0.5, 1.5 ].
8. The deep learning based face micro expression recognition system according to any one of claims 1-5, wherein the image recognition module further comprises:
the pooling submodule is used for performing global average pooling on each image recognition feature by using a fourth formula, the fourth formula being:
g_i = (1/(H_scale·W_scale)) Σ_{m=1}^{H_scale} Σ_{n=1}^{W_scale} F_scale^i(m, n);
where H_scale and W_scale are respectively the height and width of each image recognition feature F_scale^i, and F_scale^i(m, n) is the pixel value of F_scale^i at position (m, n);
the fully connected layer submodule is used for uniformly storing the image recognition features processed by the pooling submodule into a feature data set;
and the feature recognition submodule is used for recognizing the image features in the feature data set and outputting a result.
9. The deep learning based face micro-expression recognition system according to claim 8, wherein the feature recognition sub-module comprises:
a target feature vector calculation unit, for calculating a target feature vector f_avg from the feature data set by using f_avg = (1/N) Σ_{i=1}^{N} f_i, where the feature data set comprises N feature vectors f_1, ..., f_N;
and a category vector output unit, for calculating the category vector o_i of each type of micro-expression to which the image recognition features belong by using a fifth formula, the fifth formula being:
o_i = exp(f_avg(i)) / Σ_{j=1}^{num_cls} exp(f_avg(j));
where num_cls is the total number of categories of face micro-expressions, and f_avg(i) is the value of the i-th element of the target feature vector f_avg.
CN201910758794.0A 2019-08-16 2019-08-16 Human face micro-expression recognition system based on deep learning Active CN110472583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910758794.0A CN110472583B (en) 2019-08-16 2019-08-16 Human face micro-expression recognition system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910758794.0A CN110472583B (en) 2019-08-16 2019-08-16 Human face micro-expression recognition system based on deep learning

Publications (2)

Publication Number Publication Date
CN110472583A CN110472583A (en) 2019-11-19
CN110472583B (en) 2022-04-19

Family

ID=68511791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910758794.0A Active CN110472583B (en) 2019-08-16 2019-08-16 Human face micro-expression recognition system based on deep learning

Country Status (1)

Country Link
CN (1) CN110472583B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274895A (en) * 2020-01-15 2020-06-12 新疆大学 CNN micro-expression identification method based on cavity convolution


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11587304B2 (en) * 2017-03-10 2023-02-21 Tusimple, Inc. System and method for occluding contour detection
US10726858B2 (en) * 2018-06-22 2020-07-28 Intel Corporation Neural network for speech denoising trained with deep feature losses

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460830A (en) * 2018-05-09 2018-08-28 厦门美图之家科技有限公司 Image repair method, device and image processing equipment
CN109492529A (en) * 2018-10-08 2019-03-19 中国矿业大学 A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion
CN109902715A (en) * 2019-01-18 2019-06-18 南京理工大学 A kind of method for detecting infrared puniness target based on context converging network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Efficient Emotion Recognition based on Hybrid Emotion Recognition Neural Network;Yang-Yen Ou等;《2018 International Conference on Orange Technologies (ICOT)》;20190506;2-4 *
Look Closer to See Better:Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition;Jianlong Fu等;《2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)》;20171109;4478-4481 *

Also Published As

Publication number Publication date
CN110472583A (en) 2019-11-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant