CN110472583B - Human face micro-expression recognition system based on deep learning - Google Patents

Human face micro-expression recognition system based on deep learning

Info

Publication number
CN110472583B
Authority
CN
China
Prior art keywords
image
feature
discriminant
features
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910758794.0A
Other languages
Chinese (zh)
Other versions
CN110472583A (en)
Inventor
龚泽辉
李东
张国生
冯省城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910758794.0A priority Critical patent/CN110472583B/en
Publication of CN110472583A publication Critical patent/CN110472583A/en
Application granted granted Critical
Publication of CN110472583B publication Critical patent/CN110472583B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Abstract

The embodiment of the invention discloses a face micro-expression recognition system based on deep learning, which comprises a deep network model for performing face micro-expression recognition on an input image, the model comprising a feature extraction module and an image recognition module. The feature extraction module is used for extracting image recognition features and comprises a depth feature extraction submodule and a discriminant feature extraction submodule. The depth feature extraction submodule sequentially comprises a first convolution layer and a plurality of hole (dilated) convolution modules; the hole convolution module performs data processing on the convolution result output by the first convolution layer and outputs depth features. The discriminant feature extraction submodule crops the depth features by using a plurality of discriminant regions obtained from a discriminant region proposal network and enlarges the cropped features to serve as the image recognition features. The image recognition module performs micro-expression recognition on the image recognition features and outputs a recognition result. The method and the device can recognize face micro-expressions efficiently, quickly and accurately.

Description

Human face micro-expression recognition system based on deep learning
Technical Field
The embodiment of the invention relates to the technical field of computer vision, in particular to a human face micro-expression recognition system based on deep learning.
Background
In recent years, deep learning has become a research hotspot thanks to the rapid development of computing resources. Computer vision, owing to its great practical value, is one of its most popular research directions and has achieved large performance gains over conventional machine learning in tasks such as image classification, object detection and image segmentation. Although language is the first-choice tool for human communication, the information conveyed by expressions is richer; micro-expressions can convey real feelings and motivations, and face micro-expression recognition helps computer vision technology develop towards greater intelligence.
When the related art performs face micro-expression recognition, the processing has to be divided into several independent steps, which makes the pipeline cumbersome; the original image has to be cropped and features extracted from the cropped regions by a convolution network multiple times, so the testing time is long and the efficiency is low; in addition, the network model involves a manual feature design process, so the final performance of the network is bottlenecked and remains limited.
For example, face micro-expression recognition may include the following steps: first, face detection is performed; facial landmark points are then detected on the detected face image by combining the Sobel operator edge detection algorithm with the Shi-Tomasi corner detection algorithm; the detected landmark points define the input features of a Multi-Layer Perceptron (MLP) neural network, which identifies the facial expression. In addition, in the related-art method for expression classification and micro-expression detection based on deep learning, a series of cropping regions can be obtained through detection of facial landmark points, and the crops of the original image are each fed into a deep learning network structure to obtain features for the final micro-expression classification.
Disclosure of Invention
The embodiment of the disclosure provides a face micro-expression recognition system based on deep learning, which solves the low accuracy and low efficiency caused by the multiple steps of manual feature design and testing, and recognizes face micro-expressions efficiently, quickly and accurately.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
the embodiment of the invention provides a face micro expression recognition system based on deep learning, which comprises a deep network model for carrying out face micro expression recognition on an input image, wherein the deep network model comprises a feature extraction module for extracting image recognition features and an image recognition module for carrying out micro expression recognition on the image recognition features and outputting a recognition result;
the feature extraction module comprises a depth feature extraction submodule and a discriminant feature extraction submodule;
the depth feature extraction submodule sequentially comprises a first convolution layer and a plurality of hole (dilated) convolution modules; the hole convolution module is used for performing data processing on the convolution result output by the first convolution layer and outputting depth features;
the discriminant feature extraction submodule is used for cropping the depth features by using a plurality of discriminant regions obtained based on a discriminant region proposal network, and enlarging the cropped features to serve as the image recognition features.
Optionally, the discriminant feature extraction sub-module includes:
a discriminant region center point determination unit, for obtaining the center point coordinates of N discriminant regions by using the discriminant region proposal network based on the depth features; the discriminant region proposal network sequentially comprises a hole convolution module, a convolution layer and a fully connected layer along the data stream processing direction;
a discriminant region determination unit, for determining the corresponding discriminant region based on the center point coordinates and a preset side length of each discriminant region;
a cropping unit, configured to crop the depth features using each discriminant region;
and a feature enlargement unit, for respectively enlarging the feature map size of each of the N cropped features to the feature map size of the depth features.
Optionally, the cropping unit is configured to crop the depth features by using each discriminant region based on a first formula; the first formula is:
F_crop^i(x, y) = F_deep(x, y) · M_i(x, y);
M_i(x, y) = [δ(x - (x_i - L/2)) - δ(x - (x_i + L/2))] · [δ(y - (y_i - L/2)) - δ(y - (y_i + L/2))];
δ(x) = 1/(1 + exp(-kx));
where F_crop^i is the feature obtained by cropping the depth features with the i-th discriminant region, F_deep is the depth feature, x and y are coordinate values along the width direction and the height direction of the feature map of the depth features, (x_i, y_i) is the center point coordinate of the i-th discriminant region, k is a constant greater than zero, and L is the side length.
Optionally, the feature enlargement unit is configured to enlarge the cropped features according to a second formula, the second formula being:
F_scale^i(x_t, y_t) = Σ_{m=x_s}^{x_s+1} Σ_{n=y_s}^{y_s+1} F_crop^i(m, n) · (1 - |x_t/λ_W - m|) · (1 - |y_t/λ_H - n|);
x_s = [x_t/λ_W], y_s = [y_t/λ_H], λ_H = H/L, λ_W = W/L;
where F_scale^i(x_t, y_t) is the pixel value of F_scale^i at position (x_t, y_t), F_crop^i(m, n) is the pixel value of the feature map of the cropped depth feature at position (m, n), H and W are respectively the height and width of the feature map, [·] denotes rounding down, and L is the side length.
Optionally, the depth feature extraction submodule includes 4 hole convolution modules with the same structure, and each hole convolution module sequentially includes a 1 × 1 convolution layer, a first BN normalization layer, a first leaky linear rectification function (Leaky ReLU) layer, a 3 × 3 hole convolution layer, a second BN normalization layer, and a second leaky linear rectification function layer along the data stream processing direction.
Optionally, the first BN normalization layer includes:
a mean calculation unit, for calculating the pixel mean of each channel using
μ_B(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} Y_1^b(c, i, j),
where μ_B(c) is the pixel mean of channel c, B is the total number of images contained in the current training batch, Y_1^b(c, i, j) is the feature map of the b-th input image of the current training batch, and h and w are respectively the height and width of a feature map channel;
a variance calculation unit, for calculating the pixel variance of each channel using
σ_B^2(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} (Y_1^b(c, i, j) - μ_B(c))^2,
where σ_B^2(c) is the pixel variance of channel c;
a normalization unit, for normalizing Y_1^b(c, i, j) using
Ŷ_1^b(c, i, j) = (Y_1^b(c, i, j) - μ_B(c)) / sqrt(σ_B^2(c) + ε)
to obtain the normalized image Ŷ_1^b(c, i, j), where ε is a small positive constant;
an image processing unit, for performing image processing on Ŷ_1^b(c, i, j) using
Y_2^b(c, i, j) = γ·Ŷ_1^b(c, i, j) + β,
where γ is a scaling factor and β is a translation factor.
Optionally, the deep network model further includes an image preprocessing module, configured to convert an image format of the image to be recognized into a preset network input format, where the image preprocessing module includes:
the image scaling submodule is used for scaling the size of the image to be identified to a preset size;
the normalization submodule is used for performing pixel normalization on the image to be recognized by using a third formula; the third formula is:
p̂_{i,j,c} = p_{i,j,c} - (1/(M·H·W)) Σ_{m=1}^{M} Σ_{i=1}^{H} Σ_{j=1}^{W} p_{i,j,c}^m;
where p_{i,j,c} is the pixel value at position (i, j) of channel c of the image to be recognized, p̂_{i,j,c} is the pixel value after normalization, H is the height of the image to be recognized, W is the width of the image to be recognized, p_{i,j,c}^m is the pixel value at position (i, j) of channel c of the m-th image, and M is the total number of images.
Optionally, the image preprocessing module further includes:
the brightness adjusting submodule is used for adjusting the brightness of the image to be identified according to a preset brightness proportion value; the brightness proportion value is selected from a brightness proportion range, and the brightness proportion range is [0.5, 1.5 ];
the contrast adjusting submodule is used for adjusting the contrast of the image to be recognized according to a preset contrast ratio value; the contrast ratio value is selected from a contrast ratio range, and the contrast ratio range is [0.5, 1.5].
Optionally, the image recognition module further includes:
the pooling submodule is used for performing global average pooling on each image recognition feature by using a fourth formula, the fourth formula being:
g_i = (1/(H_scale·W_scale)) Σ_{m=1}^{H_scale} Σ_{n=1}^{W_scale} F_scale^i(m, n);
where H_scale and W_scale are respectively the height and width of each image recognition feature F_scale^i, and F_scale^i(m, n) is the pixel value of F_scale^i at position (m, n);
the fully connected layer submodule is used for uniformly storing the image recognition features processed by the pooling submodule into a feature data set;
and the feature recognition submodule is used for recognizing the image features in the feature data set and outputting a result.
Optionally, the feature recognition submodule includes:
a target feature vector calculation unit, for calculating a target feature vector f_avg from the feature data set by using f_avg = (1/N) Σ_{i=1}^{N} f_i, where the feature data set comprises N feature vectors f_1, ..., f_N;
a category vector output unit, for calculating the category vector o_i of each type of micro-expression to which the image recognition features belong by using a fifth formula, the fifth formula being:
o_i = exp(f_avg(i)) / Σ_{j=1}^{num_cls} exp(f_avg(j));
where num_cls is the total number of categories of face micro-expressions, and f_avg(i) is the value of the i-th element of the target feature vector f_avg.
The technical scheme provided by the present application has the following advantages. First, the depth feature extraction submodule extracts features from the input image to obtain depth features; then the discriminant feature extraction submodule takes the depth features as input and obtains a series of discriminant features through further feature enhancement; finally, the image recognition module classifies the discriminant features and outputs the expression classification result. The face micro-expression image to be recognized is fed directly into the deep network model to obtain the final micro-expression classification result, so testing is convenient. The features required for classification are learned automatically from the input images in a data-driven manner, so no manual feature design is needed; this removes the burden of manual feature design, solves the low accuracy and low efficiency caused by the cumbersome multi-step manual feature design and testing of the prior art, and realizes efficient, fast and accurate recognition of face micro-expressions.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the related art, the drawings required to be used in the description of the embodiments or the related art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a structural diagram of a specific embodiment of a facial micro-expression recognition system based on deep learning according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a data processing flow of a feature extraction module according to an embodiment of the present invention;
fig. 3 is a structural diagram of a specific implementation of a hole convolution module according to an embodiment of the present invention;
fig. 4 is a structural diagram of a specific implementation of the discriminant feature extraction sub-module according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an embodiment of a discriminant region proposal network according to an embodiment of the present invention;
fig. 6 is a structural diagram of an embodiment of an image recognition module according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of image preprocessing according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
Having described the technical solutions of the embodiments of the present invention, various non-limiting embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is a structural diagram of a deep-learning-based facial micro-expression recognition system according to an embodiment of the present invention. In a specific implementation manner, the embodiment of the present invention may include the following:
the human face micro expression recognition system based on deep learning can comprise a deep network model 1, wherein the deep network model 1 is used for carrying out human face micro expression recognition on an input image and can comprise a feature extraction module 11 and an image recognition module 12.
The feature extraction module 11 is configured to extract image recognition features, where the image recognition features may include depth features and discriminant features, and accordingly, the depth feature extraction submodule 111 and the discriminant feature extraction submodule 112 may be respectively used for extraction.
In the present application, the depth feature extraction sub-module 111 may sequentially include a first convolution layer and a plurality of hole (dilated) convolution modules; the hole convolution module performs data processing on the convolution result output by the first convolution layer and outputs the depth features. The first convolution layer convolves the input image to be recognized and feeds the result into the first hole convolution module; the first hole convolution module processes the received data and feeds its output into the second hole convolution module, and so on, until the data output by the last hole convolution module is the depth feature of the image to be recognized. Hole convolution enlarges the receptive field and avoids the loss of spatial information caused by pooling, which helps improve the recognition accuracy of the model.
Optionally, referring to fig. 2, the depth feature extraction sub-module 111 may include a first 7 × 7 convolution layer and 4 identical hole convolution modules, and each hole convolution module may sequentially include a 1 × 1 convolution layer, a first BN normalization layer, a first leaky linear rectification function layer, a 3 × 3 hole convolution layer, a second BN normalization layer, and a second leaky linear rectification function layer along the data stream processing direction; the structure of the hole convolution module may be as shown in fig. 3. In this embodiment, the hole convolution module may first convolve its input X with the 1 × 1 convolution template K_{1×1} and store the result in Y_1, i.e. Y_1(i, j) = (K_{1×1} * X)(i, j), where (i, j) denotes a pixel position. Batch normalization may then be applied to Y_1. The current training batch may contain B input images, so the input of batch normalization is {Y_1^b | b = 1, ..., B}, where Y_1^b is the feature map obtained from the b-th image of the current input batch, and c, h and w are respectively the number of channels, the height and the width of the feature map; the output of the first BN normalization layer is Y_2. After the BN normalization layer, a Leaky RELU (leaky rectified linear unit) nonlinear activation function may be applied to Y_2 to obtain the activation output Y_3 = LRELU(Y_2), where LRELU(x) = x for x > 0 and LRELU(x) = a·x for x ≤ 0, a being a small positive slope. A 3 × 3 convolution template K_{3×3} is then used to perform the hole convolution operation on Y_3, and the output is stored in Y_4; with the hole convolution rate set to l, the 3 × 3 hole convolution operation is:
Y_4(i, j) = Σ_{m=-1}^{1} Σ_{n=-1}^{1} K_{3×3}(m, n) · Y_3(i + l·m, j + l·n).
After Y_4 is obtained, the second BN normalization layer normalizes Y_4 to obtain Y_5, and the Leaky RELU nonlinear activation function in the second leaky linear rectification function layer is applied to Y_5 to obtain Y_6, which is the output of the hole convolution module.
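For illustration, a minimal PyTorch sketch of the depth feature extraction sub-module consistent with the structure described above is given below; the channel count, the stride of the first 7 × 7 convolution, the dilation rate l and the leaky slope are assumptions made for the sketch and are not fixed by the embodiment.

```python
import torch
import torch.nn as nn

class HoleConvModule(nn.Module):
    """One hole (dilated) convolution module: 1x1 conv -> BN -> Leaky ReLU -> 3x3 dilated conv -> BN -> Leaky ReLU."""
    def __init__(self, channels=64, dilation=2, negative_slope=0.01):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),    # Y1: 1x1 convolution
            nn.BatchNorm2d(channels),                                    # Y2: first BN normalization layer
            nn.LeakyReLU(negative_slope),                                # Y3: first Leaky ReLU layer
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation, bias=False),  # Y4: 3x3 hole convolution, rate l
            nn.BatchNorm2d(channels),                                    # Y5: second BN normalization layer
            nn.LeakyReLU(negative_slope),                                # Y6: second Leaky ReLU layer
        )

    def forward(self, x):
        return self.block(x)

class DepthFeatureExtractor(nn.Module):
    """First 7x7 convolution followed by 4 identical hole convolution modules (cf. fig. 2)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(3, channels, kernel_size=7, stride=2, padding=3)  # first convolution layer
        self.dcms = nn.Sequential(*[HoleConvModule(channels) for _ in range(4)])

    def forward(self, img):                 # img: (B, 3, 227, 227) preprocessed input
        return self.dcms(self.conv1(img))   # depth feature F_deep
```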
In this embodiment, the first BN normalization layer and the second BN normalization layer are used for batch normalization of the input data and may both include the corresponding structures; the first BN normalization layer may include:
a mean calculation unit, for calculating the pixel mean of each channel using
μ_B(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} Y_1^b(c, i, j),
where μ_B(c) is the pixel mean of channel c, B is the total number of images contained in the current training batch, Y_1^b(c, i, j) is the feature map of the b-th input image of the current training batch, and h and w are respectively the height and width of a feature map channel;
a variance calculation unit, for calculating the pixel variance of each channel using
σ_B^2(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} (Y_1^b(c, i, j) - μ_B(c))^2,
where σ_B^2(c) is the pixel variance of channel c;
a normalization unit, for normalizing Y_1^b(c, i, j) using
Ŷ_1^b(c, i, j) = (Y_1^b(c, i, j) - μ_B(c)) / sqrt(σ_B^2(c) + ε)
to obtain the normalized image Ŷ_1^b(c, i, j), where ε is an arbitrarily small positive constant;
an image processing unit, for performing image processing on Ŷ_1^b(c, i, j) using
Y_2^b(c, i, j) = γ·Ŷ_1^b(c, i, j) + β,
where γ is a scaling factor and β is a translation factor, both of which can be learned by the network itself.
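A small NumPy sketch of the four units above (mean, variance, normalization and scale/shift, acting per channel on a batch of shape B × C × h × w) may look as follows; the values of γ, β and ε are illustrative only.

```python
import numpy as np

def bn_forward(Y1, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalize Y1 of shape (B, C, h, w) per channel, as in the four units above."""
    mu = Y1.mean(axis=(0, 2, 3), keepdims=True)                   # mean unit: mu_B(c)
    var = ((Y1 - mu) ** 2).mean(axis=(0, 2, 3), keepdims=True)    # variance unit: sigma_B^2(c)
    Y1_hat = (Y1 - mu) / np.sqrt(var + eps)                       # normalization unit
    return gamma * Y1_hat + beta                                  # image processing unit: scale and shift
```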
In the present application, referring to fig. 4, the discriminant feature extraction sub-module 112 may be configured to crop the depth features using a plurality of discriminant regions obtained from a Discriminant Region Proposal Network (DRPN), and to enlarge the cropped features to serve as the image recognition features. That is, the discriminant feature extraction sub-module 112 takes the depth features extracted by the depth feature extraction sub-module 111 as input, obtains a series of discriminant features through further feature enhancement, and uses the discriminant region proposal network to localize discriminant regions in the face micro-expression image. The discriminant region proposal network may sequentially include a hole convolution module, a convolution layer, and a fully connected layer along the data stream processing direction, for example as shown in fig. 5, where the convolution layer may be a 1 × 1 convolution layer, the DCM module is a hole convolution module, and the number of neurons output by the fully connected layer is 2N, corresponding in order to the center point coordinates of the N discriminant regions. The discriminant region proposal network can automatically identify the discriminant regions in an image that contribute to classification, which solves the problem that the prior art needs to identify the discriminant regions in micro-expression images manually.
In one embodiment, the discriminative feature extraction sub-module may include:
a discriminant region center point determination unit, configured to obtain N center point coordinates S = {(x_i, y_i) | i = 1, ..., N} by using the discriminant region proposal network based on the depth features.
The discriminant region determination unit is configured to determine the corresponding discriminant region based on the center point coordinates and the preset side length of each discriminant region; for example, if the preset side length is L, the N discriminant regions are R_i = [x_i - L/2, x_i + L/2] × [y_i - L/2, y_i + L/2], i = 1, ..., N.
And a cropping unit, configured to crop the depth features by using each discriminant region. Any resizing algorithm can be adopted to crop the feature map of the depth features, which does not affect the implementation of the present application. Optionally, for example, the depth features may be cropped using each discriminant region based on the first formula; the first formula may be:
F_crop^i(x, y) = F_deep(x, y) · M_i(x, y);
M_i(x, y) = [δ(x - (x_i - L/2)) - δ(x - (x_i + L/2))] · [δ(y - (y_i - L/2)) - δ(y - (y_i + L/2))];
δ(x) = 1/(1 + exp(-kx));
where F_crop^i is the feature obtained by cropping the depth features with the i-th discriminant region, F_deep is the depth feature, x and y are coordinate values along the width direction and the height direction of the feature map of the depth features (for example, the upper left corner of the image may be taken as the coordinate origin), k is a constant greater than zero, L is the side length, and δ(·) is a variant of the sigmoid function. It should be noted that the related art crops the original image and sends the crops to a convolution network for feature extraction, which is inefficient and leads to long testing time; here, in contrast, the features are cropped directly on the depth feature map, so the image only needs to pass through feature extraction once.
And the feature enlargement unit is configured to respectively enlarge the feature map size of each of the N cropped features to the feature map size of the depth features. After the feature map of the depth features is cropped, a plurality of feature maps are obtained, and the size of each cropped feature map may be adjusted to the size of the feature map of the depth features by using any resizing algorithm, which is not limited in this application. In one embodiment, the cropped features can be enlarged according to the second formula; after the feature enlargement operation, a series of discriminant features F_scale^i (i = 1, ..., N) are obtained. The second formula may be:
F_scale^i(x_t, y_t) = Σ_{m=x_s}^{x_s+1} Σ_{n=y_s}^{y_s+1} F_crop^i(m, n) · (1 - |x_t/λ_W - m|) · (1 - |y_t/λ_H - n|);
x_s = [x_t/λ_W], y_s = [y_t/λ_H], λ_H = H/L, λ_W = W/L;
where F_scale^i(x_t, y_t) is the pixel value of F_scale^i at position (x_t, y_t), F_crop^i(m, n) is the pixel value of the feature map of the cropped depth feature at position (m, n), H and W are respectively the height and width of the feature map, [·] denotes rounding down, and L is the side length.
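The following PyTorch sketch puts the discriminant-feature path together under stated assumptions: a DRPN (a single dilated convolution standing in for the DCM, a 1 × 1 convolution and a fully connected layer with 2N outputs) predicts N center points, each center defines an L × L square region, the depth feature is soft-cropped with the sigmoid-difference mask of the first formula, and the cropped window is enlarged back to H × W by bilinear resampling. N, L, k, the feature-map size and the channel counts are illustrative assumptions, not values fixed by the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscriminantFeatureExtractor(nn.Module):
    def __init__(self, channels=64, feat_hw=(56, 56), num_regions=4, side=14, k=10.0):
        super().__init__()
        self.N, self.L, self.k = num_regions, side, k
        self.H, self.W = feat_hw
        # discriminant region proposal network: DCM -> 1x1 conv -> fully connected layer (2N outputs);
        # a single dilated convolution stands in for the DCM in this sketch
        self.drpn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.Conv2d(channels, 8, kernel_size=1),
            nn.Flatten(),
            nn.Linear(8 * self.H * self.W, 2 * self.N),
        )

    def forward(self, f_deep):                               # f_deep: (B, C, H, W) depth feature
        B, _, H, W = f_deep.shape
        centres = torch.sigmoid(self.drpn(f_deep)).view(B, self.N, 2)
        xs, ys = centres[..., 0] * W, centres[..., 1] * H    # N centre points (x_i, y_i)
        gx = torch.arange(W, device=f_deep.device, dtype=f_deep.dtype).view(1, 1, -1)
        gy = torch.arange(H, device=f_deep.device, dtype=f_deep.dtype).view(1, -1, 1)
        features = []
        for i in range(self.N):
            xi, yi = xs[:, i].view(B, 1, 1), ys[:, i].view(B, 1, 1)
            # sigmoid-difference mask of the first formula: close to 1 inside the L x L box, 0 outside
            mx = torch.sigmoid(self.k * (gx - (xi - self.L / 2))) - \
                 torch.sigmoid(self.k * (gx - (xi + self.L / 2)))
            my = torch.sigmoid(self.k * (gy - (yi - self.L / 2))) - \
                 torch.sigmoid(self.k * (gy - (yi + self.L / 2)))
            crop = f_deep * (mx * my).unsqueeze(1)           # soft crop of the depth feature
            # hard-slice the L x L window around each centre, then enlarge it back to H x W
            x0 = (xs[:, i].clamp(self.L / 2, W - self.L / 2) - self.L / 2).long().tolist()
            y0 = (ys[:, i].clamp(self.L / 2, H - self.L / 2) - self.L / 2).long().tolist()
            win = torch.cat([crop[b:b + 1, :, y0[b]:y0[b] + self.L, x0[b]:x0[b] + self.L]
                             for b in range(B)], dim=0)
            features.append(F.interpolate(win, size=(self.H, self.W),
                                          mode='bilinear', align_corners=False))
        return features                                      # N discriminant features F_scale^i
```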
In the present application, the image recognition module 12 may be configured to perform micro-expression recognition on the image recognition features and output a recognition result. The recognition result may be the category of the facial micro-expression of the image to be recognized, such as sadness, surprise or fear; it may also be the probability that the facial micro-expression of the image to be recognized belongs to each type of expression, which does not affect the implementation of the present method.
In one embodiment, referring to fig. 6, the image recognition module 12 may include:
a pooling sub-module, configured to perform Global Average Pooling (GAP) on each image recognition feature F_scale^i by using a fourth formula and store the result in g_i. The fourth formula may be:
g_i = (1/(H_scale·W_scale)) Σ_{m=1}^{H_scale} Σ_{n=1}^{W_scale} F_scale^i(m, n);
where H_scale and W_scale are respectively the height and width of each image recognition feature F_scale^i, and F_scale^i(m, n) is the pixel value of F_scale^i at position (m, n);
and a fully connected layer sub-module, configured to uniformly store the image recognition features processed by the pooling sub-module into a feature data set. That is, after GAP has been performed on each image recognition feature in turn, a fully connected layer is applied to each g_i; the number of output neurons of the fully connected layer is the same as the number of micro-expression categories, set to num_cls, and the results are stored into the feature data set {f_i | i = 1, ..., N}.
And the feature recognition sub-module is configured to recognize the image features in the feature data set and output a result. In one embodiment, the feature recognition sub-module may include:
a target feature vector calculation unit, configured to calculate a target feature vector f_avg from the feature data set by using f_avg = (1/N) Σ_{i=1}^{N} f_i, the feature data set comprising the N feature vectors f_1, ..., f_N;
a category vector output unit, configured to apply the softmax activation function to f_avg to obtain the final category output vector o, i.e. to calculate the category vector o_i of each type of micro-expression to which the image recognition features belong by using a fifth formula, the fifth formula being:
o_i = exp(f_avg(i)) / Σ_{j=1}^{num_cls} exp(f_avg(j));
where num_cls is the total number of categories of face micro-expressions, and f_avg(i) is the value of the i-th element of the target feature vector f_avg.
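A minimal sketch of the image recognition module under these definitions might look as follows; the channel count and the assumption of 7 expression categories are illustrative, and the input is the list of N discriminant features produced above.

```python
import torch
import torch.nn as nn

class ImageRecognitionModule(nn.Module):
    def __init__(self, channels=64, num_cls=7):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)         # fourth formula: global average pooling
        self.fc = nn.Linear(channels, num_cls)     # fully connected layer with num_cls output neurons

    def forward(self, features):                   # features: list of N discriminant maps, each (B, C, H, W)
        f = [self.fc(self.gap(x).flatten(1)) for x in features]   # feature data set {f_i}
        f_avg = torch.stack(f, dim=0).mean(dim=0)                 # target feature vector f_avg
        return torch.softmax(f_avg, dim=1)                        # fifth formula: category vector o
```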
It should be noted that the deep network model 1 of the present application is a model obtained by end-to-end training based on a deep learning method. During training, the class cross-entropy loss function can be used, and the stochastic gradient descent algorithm is used for end-to-end training optimization.
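A hedged sketch of such end-to-end optimization, assuming the whole deep network model outputs the category vector o and that a data loader yields preprocessed images with expression labels, could look like this (learning rate, momentum and epoch count are illustrative):

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=30, lr=0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)  # stochastic gradient descent
    model.train()
    for _ in range(epochs):
        for images, labels in loader:              # preprocessed images and expression labels
            optimizer.zero_grad()
            probs = model(images)                  # category vector o from the whole deep network model
            loss = F.nll_loss(torch.log(probs + 1e-8), labels)  # class cross-entropy on the softmax output
            loss.backward()                        # end-to-end gradients through all modules
            optimizer.step()
```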
In the technical scheme provided by the embodiment of the invention, the depth feature extraction submodule first extracts features from the input image to obtain depth features; the discriminant feature extraction submodule then takes the depth features as input and obtains a series of discriminant features through further feature enhancement; finally, the image recognition module classifies the discriminant features and outputs the expression classification result. Because the deep network model that performs face micro-expression recognition on the input image is trained and tested end to end, the face micro-expression image to be recognized is fed in directly and the final micro-expression classification result is obtained, so testing is convenient. The features required for classification are learned automatically from the input images in a data-driven manner without manual feature design, which removes the burden of manual feature design, solves the low accuracy and low efficiency caused by the cumbersome multi-step manual feature design and testing of the prior art, and realizes efficient, fast and accurate recognition of face micro-expressions.
In another embodiment, in order to improve the accuracy and efficiency of the model for identifying the facial micro-expressions, before extracting the image identification features, the image to be identified may be subjected to image preprocessing. In view of this, the deep learning-based face micro-expression recognition system may further include an image preprocessing module, which is configured to convert an image format of the image to be recognized into a preset network input format. In one embodiment, the image pre-processing module may include:
the image scaling submodule is used for scaling the size of the image to be identified to a preset size; for example, the image to be recognized is scaled to 227 × 227.
The normalization submodule is used for performing pixel normalization on the image to be recognized by using the following formula:
p̂_{i,j,c} = p_{i,j,c} - (1/(M·H·W)) Σ_{m=1}^{M} Σ_{i=1}^{H} Σ_{j=1}^{W} p_{i,j,c}^m;
where p_{i,j,c} is the pixel value at position (i, j) of channel c of the image to be recognized, p̂_{i,j,c} is the pixel value after normalization, H is the height of the image to be recognized, W is the width of the image to be recognized, p_{i,j,c}^m is the pixel value at position (i, j) of channel c of the m-th image, and M is the total number of images.
Based on the above embodiment, referring to fig. 7, the image preprocessing module may further include:
a brightness adjusting sub-module, configured to adjust the brightness of the image to be recognized according to a preset brightness proportion value. The brightness proportion value may be selected from a brightness proportion range of [0.5, 1.5], that is, the brightness of the image to be recognized may be adjusted by a factor of 0.5 to 1.5; the brightness proportion value may also be a value outside 0.5 to 1.5, which does not affect the implementation of the present application.
And a contrast adjusting sub-module, configured to adjust the contrast of the image to be recognized according to a preset contrast ratio value, where the contrast ratio value is selected from a contrast ratio range of [0.5, 1.5]. That is, the contrast of the image to be recognized may be adjusted by a factor of 0.5 to 1.5; the contrast ratio value may also be a value outside 0.5 to 1.5, which does not affect the implementation of the present application.
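By way of example, the preprocessing described in this embodiment could be sketched as follows, assuming PIL images as input; the scaling of pixel values to [0, 1] and the concrete channel-mean values are assumptions, since the embodiment only specifies mean subtraction over the training images.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def preprocess(img, channel_mean=(0.485, 0.456, 0.406), train=True):
    """Convert a PIL image into the preset network input format described above."""
    img = img.resize((227, 227), Image.BILINEAR)                   # image scaling submodule
    if train:
        img = ImageEnhance.Brightness(img).enhance(random.uniform(0.5, 1.5))  # brightness submodule
        img = ImageEnhance.Contrast(img).enhance(random.uniform(0.5, 1.5))    # contrast submodule
    arr = np.asarray(img, dtype=np.float32) / 255.0                # scale pixels to [0, 1] (assumption)
    arr = arr - np.asarray(channel_mean, dtype=np.float32)         # third formula: subtract the channel mean
    return arr.transpose(2, 0, 1)                                  # C x H x W network input
```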
In summary, the present application provides an end-to-end expression classification method based on deep learning: the face micro-expression image to be recognized is input directly and the final micro-expression classification result is obtained, so testing is convenient; the features required for classification are learned automatically from the input image in a data-driven manner, without manual feature design, which removes the burden of manual feature design; the image only needs to go through feature extraction once and the features (rather than the original image) are cropped, so testing time is short and efficiency is high; and the DRPN can automatically identify the discriminant regions in an image that contribute to classification.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The human face micro-expression recognition system based on deep learning provided by the invention is described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (9)

1. A human face micro expression recognition system based on deep learning is characterized by comprising a deep network model for carrying out human face micro expression recognition on an input image, wherein the deep network model comprises a feature extraction module for extracting image recognition features and an image recognition module for carrying out micro expression recognition on the image recognition features and outputting a recognition result;
the feature extraction module comprises a depth feature extraction submodule and a discriminant feature extraction submodule;
the depth feature extraction submodule sequentially comprises a first convolution layer and a plurality of hole (dilated) convolution modules; the hole convolution module is used for performing data processing on the convolution result output by the first convolution layer and outputting depth features; the depth feature extraction submodule comprises 4 hole convolution modules with the same structure, and each hole convolution module sequentially comprises a 1 × 1 convolution layer, a first BN normalization layer, a first leaky linear rectification function layer, a 3 × 3 hole convolution layer, a second BN normalization layer and a second leaky linear rectification function layer along the data stream processing direction; the discriminant feature extraction submodule is used for cropping the depth features by using a plurality of discriminant regions obtained based on a discriminant region proposal network, and enlarging the cropped features to serve as the image recognition features;
wherein the discriminant feature extraction submodule comprises: a discriminant region center point determination unit, for obtaining the center point coordinates of N discriminant regions by using the discriminant region proposal network based on the depth features, the discriminant region proposal network sequentially comprising a hole convolution module, a convolution layer and a fully connected layer along the data stream processing direction; and a discriminant region determination unit, for determining the corresponding discriminant region based on the center point coordinates and a preset side length of each discriminant region.
2. The deep-learning-based face micro-expression recognition system of claim 1, wherein the discriminant feature extraction sub-module comprises:
a cropping unit, configured to crop the depth features using each discriminant region;
and a feature enlargement unit, for respectively enlarging the feature map size of each of the N cropped features to the feature map size of the depth features.
3. The deep-learning-based face micro-expression recognition system of claim 2, wherein the cropping unit is configured to crop the depth features by using each discriminant region based on a first formula; the first formula is:
F_crop^i(x, y) = F_deep(x, y) · M_i(x, y);
M_i(x, y) = [δ(x - (x_i - L/2)) - δ(x - (x_i + L/2))] · [δ(y - (y_i - L/2)) - δ(y - (y_i + L/2))];
δ(x) = 1/(1 + exp(-kx));
where F_crop^i is the feature obtained by cropping the depth features with the i-th discriminant region, F_deep is the depth feature, x and y are coordinate values along the width direction and the height direction of the feature map of the depth features, k is a constant greater than zero, L is the preset side length, and (x_i, y_i) is the center point coordinate of the i-th discriminant region.
4. The deep-learning-based face micro-expression recognition system of claim 2, wherein the feature enlargement unit is configured to enlarge the cropped features according to a second formula, the second formula being:
F_scale^i(x_t, y_t) = Σ_{m=x_s}^{x_s+1} Σ_{n=y_s}^{y_s+1} F_crop^i(m, n) · (1 - |x_t/λ_W - m|) · (1 - |y_t/λ_H - n|);
x_s = [x_t/λ_W], y_s = [y_t/λ_H], λ_H = H/L, λ_W = W/L;
where F_scale^i(x_t, y_t) is the pixel value of F_scale^i at position (x_t, y_t), F_crop^i(m, n) is the pixel value of the feature map of the cropped depth feature at position (m, n), H and W are respectively the height and width of the feature map, [·] denotes rounding down, and L is the preset side length.
5. The deep-learning-based face micro-expression recognition system of claim 1, wherein the first BN normalization layer comprises:
a mean calculation unit, for calculating the pixel mean of each channel using
μ_B(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} Y_1^b(c, i, j),
where μ_B(c) is the pixel mean of channel c, B is the total number of images contained in the current training batch, Y_1^b(c, i, j) is the feature map of the b-th input image of the current training batch, and h and w are respectively the height and width of the feature map channel;
a variance calculation unit, for calculating the pixel variance of each channel using
σ_B^2(c) = (1/(B·h·w)) Σ_{b=1}^{B} Σ_{i=1}^{h} Σ_{j=1}^{w} (Y_1^b(c, i, j) - μ_B(c))^2,
where σ_B^2(c) is the pixel variance of channel c;
a normalization unit, for normalizing Y_1^b(c, i, j) using
Ŷ_1^b(c, i, j) = (Y_1^b(c, i, j) - μ_B(c)) / sqrt(σ_B^2(c) + ε)
to obtain the normalized image Ŷ_1^b(c, i, j), where ε is a positive constant;
and an image processing unit, for performing image processing on Ŷ_1^b(c, i, j) using
Y_2^b(c, i, j) = γ·Ŷ_1^b(c, i, j) + β,
where γ is a scaling factor and β is a translation factor.
6. The deep-learning-based face micro-expression recognition system according to any one of claims 1 to 5, wherein the deep network model further comprises an image preprocessing module for converting an image format of an image to be recognized into a preset network input format, the image preprocessing module comprising:
an image scaling submodule, for scaling the size of the image to be recognized to a preset size;
and a normalization submodule, for performing pixel normalization on the image to be recognized by using a third formula; the third formula is:
p̂_{i,j,c} = p_{i,j,c} - (1/(M·H·W)) Σ_{m=1}^{M} Σ_{i=1}^{H} Σ_{j=1}^{W} p_{i,j,c}^m;
where p_{i,j,c} is the pixel value at position (i, j) of channel c of the image to be recognized, p̂_{i,j,c} is the pixel value after normalization, H is the height of the image to be recognized, W is the width of the image to be recognized, p_{i,j,c}^m is the pixel value at position (i, j) of channel c of the m-th image, and M is the total number of images.
7. The deep learning based face micro expression recognition system of claim 6, wherein the image preprocessing module further comprises:
the brightness adjusting submodule is used for adjusting the brightness of the image to be identified according to a preset brightness proportion value; the brightness proportion value is selected from a brightness proportion range, and the brightness proportion range is [0.5, 1.5 ];
the contrast adjusting submodule is used for adjusting the contrast of the image to be identified according to a preset contrast ratio value; the contrast ratio value is selected from a contrast ratio range, the contrast ratio range being [0.5, 1.5 ].
8. The deep learning based face micro expression recognition system according to any one of claims 1-5, wherein the image recognition module further comprises:
the pooling submodule is used for performing global average pooling on each image recognition feature by using a fourth formula, the fourth formula being:
g_i = (1/(H_scale·W_scale)) Σ_{m=1}^{H_scale} Σ_{n=1}^{W_scale} F_scale^i(m, n);
where H_scale and W_scale are respectively the height and width of each image recognition feature F_scale^i, and F_scale^i(m, n) is the pixel value of F_scale^i at position (m, n);
the fully connected layer submodule is used for uniformly storing the image recognition features processed by the pooling submodule into a feature data set;
and the feature recognition submodule is used for recognizing the image features in the feature data set and outputting a result.
9. The deep learning based face micro-expression recognition system according to claim 8, wherein the feature recognition sub-module comprises:
a target feature vector calculation unit, for calculating a target feature vector f_avg from the feature data set by using f_avg = (1/N) Σ_{i=1}^{N} f_i, where the feature data set comprises N feature vectors f_1, ..., f_N;
and a category vector output unit, for calculating the category vector o_i of each type of micro-expression to which the image recognition features belong by using a fifth formula, the fifth formula being:
o_i = exp(f_avg(i)) / Σ_{j=1}^{num_cls} exp(f_avg(j));
where num_cls is the total number of categories of face micro-expressions, and f_avg(i) is the value of the i-th element of the target feature vector f_avg.
CN201910758794.0A 2019-08-16 2019-08-16 Human face micro-expression recognition system based on deep learning Active CN110472583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910758794.0A CN110472583B (en) 2019-08-16 2019-08-16 Human face micro-expression recognition system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910758794.0A CN110472583B (en) 2019-08-16 2019-08-16 Human face micro-expression recognition system based on deep learning

Publications (2)

Publication Number Publication Date
CN110472583A CN110472583A (en) 2019-11-19
CN110472583B (en) 2022-04-19

Family

ID=68511791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910758794.0A Active CN110472583B (en) 2019-08-16 2019-08-16 Human face micro-expression recognition system based on deep learning

Country Status (1)

Country Link
CN (1) CN110472583B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274895A (en) * 2020-01-15 2020-06-12 新疆大学 CNN micro-expression identification method based on cavity convolution


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11587304B2 (en) * 2017-03-10 2023-02-21 Tusimple, Inc. System and method for occluding contour detection
US10726858B2 (en) * 2018-06-22 2020-07-28 Intel Corporation Neural network for speech denoising trained with deep feature losses

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460830A (en) * 2018-05-09 2018-08-28 厦门美图之家科技有限公司 Image repair method, device and image processing equipment
CN109492529A (en) * 2018-10-08 2019-03-19 中国矿业大学 A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion
CN109902715A (en) * 2019-01-18 2019-06-18 南京理工大学 A kind of method for detecting infrared puniness target based on context converging network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Efficient Emotion Recognition based on Hybrid Emotion Recognition Neural Network;Yang-Yen Ou等;《2018 International Conference on Orange Technologies (ICOT)》;20190506;2-4 *
Look Closer to See Better:Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition;Jianlong Fu等;《2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)》;20171109;4478-4481 *

Also Published As

Publication number Publication date
CN110472583A (en) 2019-11-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant