CN112800891B - Discriminative feature learning method and system for micro-expression recognition - Google Patents

Discriminative feature learning method and system for micro-expression recognition

Info

Publication number
CN112800891B
Authority
CN
China
Prior art keywords
expression
micro
layer
image
frame
Prior art date
Legal status
Active
Application number
CN202110060936.3A
Other languages
Chinese (zh)
Other versions
CN112800891A (en)
Inventor
卢官明
韩震
卢峻禾
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202110060936.3A
Publication of CN112800891A
Application granted
Publication of CN112800891B
Legal status: Active

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
                • G06V40/174 Facial expression recognition
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
              • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a discriminative feature learning method and system for micro-expression recognition. First, the initial frame and the peak frame of a micro-expression video sequence are extracted and preprocessed, and the optical flow between the peak frame and the initial frame is computed to obtain an optical flow map. Then, an image whose expression category differs from that of the peak frame is selected from an ordinary-expression image library and cropped, and the cropped image block replaces the corresponding region of the peak frame image to obtain a composite image. Next, a dual-stream convolutional neural network model based on a class activation map attention mechanism is constructed; the optical flow map and the composite image are fed into the two branches of the network, and the model is trained. Finally, the trained model is used to extract highly discriminative features from an input video sequence for micro-expression classification and recognition. The method effectively prevents the model from overfitting, enables it to learn highly discriminative micro-expression features, and improves the accuracy of micro-expression recognition.

Description

Discriminative feature learning method and system for micro-expression recognition
Technical Field
The invention relates to a method and a system for learning discriminative features for micro-expression recognition, and belongs to the field of micro-expression recognition and artificial intelligence.
Background
Facial expressions are a non-verbal behavior through which humans convey emotion, and they are also an important channel for machines to understand human emotion. Ordinary (macro-)expressions are produced when a person does not suppress the expression of emotion; the facial movements have large amplitude and last a relatively long time. In some situations, however, people deliberately suppress and hide their emotions, and the suppressed emotions are spontaneously revealed by extremely brief facial movements called micro-expressions. A micro-expression lasts an extremely short time, less than 0.2 s, and the facial motion is so subtle that humans recognize micro-expressions with low accuracy. At present, micro-expression recognition means classifying micro-expression sample sequences from existing databases, and it is generally divided into two steps: feature extraction and classification. Most of the work concentrates on feature extraction, and micro-expression recognition methods can be roughly divided into two categories according to how features are extracted: the first is based on hand-crafted features, and the second is based on features extracted by convolutional neural networks.
Methods based on hand-crafted features have achieved certain results in micro-expression recognition over decades of development, but they require specialized prior knowledge and a complicated parameter-tuning process, and their generalization ability and robustness are poor. With the rapid development of machine learning and deep learning, convolutional neural networks have achieved good performance in many fields of computer vision, and more and more researchers have applied them to micro-expression recognition. Ruicong proposed combining 3D convolutional neural networks (3D-CNNs) with transfer learning: the 3D-CNN is first trained with supervision on the ordinary-expression database Oulu-CASIA, and the pre-trained model is then fine-tuned on micro-expression data; to alleviate the shortage of database samples, the authors expanded the database sevenfold by image flipping and rotation. Kim combined a convolutional neural network with a long short-term memory network (LSTM) to extract the spatial and temporal information of micro-expression video sequences: the convolutional neural network learns the spatial information of each frame, and the LSTM then learns the temporal information across frames; experimental results show that this method outperforms LBP-TOP and its variants. Liong et al. computed the optical flow between the initial frame and the peak frame of a micro-expression, extracted and fused features from the horizontal- and vertical-direction optical flow maps with a dual-stream convolutional neural network, and finally performed classification.
The Chinese patent application "Micro-expression recognition method and system based on a channel attention mechanism" (application No. CN202010687230.5, publication No. CN112001241A) computes the horizontal component, the vertical component, and the optical flow magnitude of the optical flow between the peak frame and the initial frame to form a three-dimensional tensor, feeds this tensor into a micro-expression recognition network model based on a channel attention mechanism, and finally obtains the classification result. Because the input of this method is based only on optical flow information, it cannot effectively extract the spatial information of the micro-expression video sequence.
The Chinese patent application "A micro-expression recognition method based on a 3D convolutional neural network" (application No. CN201610954555.9, publication No. CN106570474A) extracts, for every frame of the micro-expression video sequence, a grayscale channel feature map, horizontal- and vertical-direction gradient channel feature maps, and horizontal- and vertical-direction optical flow channel feature maps to obtain a feature map group for the sequence to be recognized, and then feeds the feature map group into a 3D convolutional neural network for further feature extraction and classification. This method processes every frame of the micro-expression video sequence, so the amount of computation is extremely large; moreover, the training data are not augmented, so the model easily overfits during training.
Although convolutional neural networks have achieved excellent performance in micro-expression recognition, many challenges remain. First, training a convolutional neural network requires a large number of samples, while micro-expression databases are small: the micro-expression video library CASME II contains only 256 micro-expression video sequences, which easily leads to model overfitting. Second, compared with ordinary expressions, micro-expressions have small motion amplitude and weak intensity; a generic convolutional neural network model usually attends only to regions with obvious facial changes (such as the mouth and eyes) and ignores regions with small changes, so the information it extracts is insufficient. How to improve the model's ability to learn discriminative micro-expression features is therefore an important factor in improving micro-expression recognition accuracy.
Disclosure of Invention
Purpose of the invention: Aiming at problems such as model overfitting and insufficient extraction of discriminative micro-expression features in micro-expression recognition methods that extract features with convolutional neural networks, the present invention provides a discriminative feature learning method and system for micro-expression recognition. In addition, to enhance the model's ability to learn spatio-temporal discriminative features, the method uses the spatial stream branch of a dual-stream convolutional neural network to generate a class activation map and uses this map to apply attention enhancement to the input of the temporal stream branch.
Technical solution: To achieve the above purpose, the invention adopts the following technical solution:
a discriminative feature learning method for micro-expression recognition comprises the following steps:
(1) extracting initial frames and peak frames of video sequence samples in a micro-expression video library;
(2) normalizing the sizes of the images of the initial frame and the peak frame to be uniform into NxN pixels, and amplifying the normalized images by using different amplification factors to perform Euler motion to obtain a plurality of groups of images of the micro expression initial frame and the peak frame;
(3) calculating optical flow information between each group of micro expression peak frames and the initial frame to obtain an optical flow graph;
(4) for each micro expression peak frame image, selecting an image with an expression type different from the peak frame from a common expression image library, cutting the image, and replacing a corresponding area of the peak frame image with the cut image block to obtain a composite image containing two different expression type labels; the positions of the image blocks to be cut are randomly selected, and the sizes of the image blocks to be cut are controlled by the superparameters which are uniformly distributed from 0 to 1;
(5) constructing a double-current convolutional neural network model based on a class activation graph attention machine mechanism; the model is divided into a time flow branch and a space flow branch, the space flow branch sequentially comprises a feature extraction layer, a global average pooling layer, a full connection layer, a classification layer and a class activation map generation layer, the time flow branch sequentially comprises an attention enhancement layer, a feature extraction layer, a global average pooling layer, a full connection layer and a classification layer, and finally, a decision fusion layer is used for combining the outputs of the two flow classification layers; the class activation graph generation layer outputs a class activation graph according to the feature graph output by the feature extraction layer of the spatial stream branch and the weight between the full connection layer and the global average pooling layer; the attention enhancement layer of the time stream branch utilizes the class activation map output by the spatial stream branch to carry out attention enhancement on the input of the time stream branch;
(6) respectively inputting the light flow graph and the synthetic image into two branches of the constructed double-current convolution neural network model, and training the model;
(7) extracting an initial frame and a peak frame from an input video sequence, carrying out size normalization and Euler motion amplification pretreatment on the initial frame and the peak frame, further calculating optical flow information between the peak frame and the initial frame to obtain an optical flow diagram, respectively inputting the optical flow diagram and a pretreated peak frame image into two branches of a trained double-current convolution neural network model, extracting and obtaining micro-expression characteristics with strong discriminative power, and using the micro-expression characteristics for micro-expression classification identification.
Further, step (1) comprises the following sub-steps:
(1.1) taking the first frame of the micro-expression video sequence as the initial frame of the sequence;
(1.2) letting k be the total number of frames of the micro-expression video sequence, and subtracting the first frame image from each subsequent frame image to obtain difference images:

$D_m = F_m \ominus F_1, \quad m = 2, 3, \ldots, k$

where $\ominus$ denotes pixel-wise subtraction, m is the frame index, $F_m$ is the m-th frame image, and $F_1$ is the first frame image;
(1.3) computing the sum of pixel values of each difference image:

$S_m = \sum_i \sum_j D_m(i, j)$

where $D_m(i, j)$ is the pixel value of the difference image at coordinates (i, j);
(1.4) obtaining the frame index of the peak frame image:

$p = \arg\max_m S_m$

(1.5) the peak frame image is the p-th frame image $F_p$ of the micro-expression video sequence corresponding to the frame index p.
Further, step (4) comprises the following sub-steps:
(4.1) letting G be a micro-expression peak frame image from step (3) with category label $l_G$, and selecting an ordinary-expression image O whose category label $l_O$ differs from that of the micro-expression peak frame image;
(4.2) normalizing the ordinary-expression image O to N×N pixels, the same size as the micro-expression peak frame image;
(4.3) generating the coordinates $R = (C_x, C_y, C_h, C_w)$ of the bounding box of the cropping region, the purpose being to remove the pixels of the micro-expression peak frame image G inside the cropping region and replace them with the pixels of the ordinary expression O inside the same region, where $C_x$ and $C_y$ are the abscissa and ordinate of the center of the bounding box and $C_h$ and $C_w$ are its height and width:

$C_h = N\sqrt{1-\delta}, \quad C_w = N\sqrt{1-\delta}$

where δ is a hyper-parameter obeying a uniform distribution on [0, 1], and $C_x$ and $C_y$ obey a uniform distribution on [0, N];
(4.4) generating from the cropping-region bounding box R a binary mask $T \in \{0, 1\}^{N \times N}$; the mask T has size N×N and consists of 0s and 1s, its value being 0 inside the region corresponding to the bounding box R and 1 elsewhere;
(4.5) generating from the binary mask T a composite image carrying two different expression category labels:

$\hat{G} = T \odot G + (I - T) \odot O$

where $\hat{G}$ is the resulting composite image, I is an N×N mask whose values are all 1, and $\odot$ denotes element-wise multiplication.
Further, the specific structure of the spatial stream branch of the dual-stream convolutional neural network model based on the class activation map attention mechanism constructed in step (5) is as follows:
The feature extraction layer of the spatial stream branch extracts features from the input of the spatial stream branch to obtain a multi-channel feature map $M \in \mathbb{R}^{H \times H \times c}$, where the feature map size is H×H and the number of channels is c.
The global average pooling layer of the spatial stream branch uses an H×H pooling kernel to convert the feature map M output by the feature extraction layer into c feature values:

$\theta_n = \frac{1}{H \times H}\sum_{i=1}^{H}\sum_{j=1}^{H} M_n(i, j)$

where $\theta_n$ is the n-th feature value output by the global average pooling layer and $M_n(i, j)$ is the value of the n-th channel feature map at coordinates (i, j).
The fully connected layer of the spatial stream branch fully connects the output of the global average pooling layer to v output neurons and outputs a v-dimensional feature vector:

$\xi_n = \sum_{j=1}^{c} w_j^n \theta_j$

where $\xi_n$ is the n-th feature value output by the fully connected layer and $w_j^n$ is the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer.
The Softmax classification layer of the spatial stream branch fully connects the feature vector output by the fully connected layer to v output nodes corresponding to the expression categories and outputs a v-dimensional vector in which each component represents the probability of belonging to the corresponding category, v being the number of categories.
The class activation map generation layer of the spatial stream branch generates the class activation map corresponding to a given category:

$A^n = \sum_{j=1}^{c} w_j^n M_j$

where $M_j$ is the j-th channel feature map output by the feature extraction layer, $w_j^n$ is the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer, and $A^n$ is the class activation map corresponding to the n-th category, of size H×H. During training, the class activation map generation layer outputs the class activation map of the category given by the micro-expression peak frame image label; when the trained dual-stream convolutional neural network model is used for micro-expression recognition, it outputs the class activation map of the category with the highest probability in the Softmax classification layer of the spatial stream branch.
Further, the specific structure of the temporal stream branch of the dual-stream convolutional neural network model based on the class activation map attention mechanism constructed in step (5) is as follows:
The attention enhancement layer of the temporal stream branch uses the class activation map output by the spatial stream branch to apply attention enhancement to the input of the temporal stream branch. The size of the class activation map is first aligned with that of the input of the temporal stream branch:

$A_{up} = \mathrm{Upsample}(A^n)$

where Upsample() is an upsampling function that changes the size of the class activation map from H×H to N×N. The values of the upsampled class activation map are then mapped into the range (0, 1):

$\tilde{A} = \mathrm{Sig}(A_{up})$

where Sig() is the Sigmoid function, which maps the values of the class activation map into (0, 1). Finally, the class activation map is used to apply attention enhancement to the input of the temporal stream branch:

$\hat{a} = a \odot (I + \tilde{A})$

where a is the input of the temporal stream branch, $\hat{a}$ is the input after attention enhancement, I is an N×N mask whose values are all 1, and $\odot$ denotes element-wise multiplication.
The attention-enhanced input of the temporal stream branch then passes, in order, through the feature extraction layer, the global average pooling layer, and the fully connected layer of the temporal stream branch, and finally the Softmax classification layer of the temporal stream outputs the probability that the temporal stream input belongs to each category.
Further, step (6) comprises the following sub-steps:
(6.1) initializing the network weights with a random initialization method;
(6.2) feeding the composite image of step (4) into the spatial stream branch of the dual-stream convolutional neural network and constructing the loss function of the spatial stream branch from the output of its Softmax classification layer:

$L_s = \left(-\phi_s[l_G] + \log\sum_j \exp(\phi_s[j])\right)\cdot\delta + \left(-\phi_s[l_O] + \log\sum_j \exp(\phi_s[j])\right)\cdot(1-\delta)$

where $l_G$ and $l_O$ are the category labels of the micro-expression peak frame image G and of the ordinary expression O used for synthesis, δ is the hyper-parameter of step (4), $\phi_s[j]$ is the value of the spatial stream branch Softmax classification layer output corresponding to category label j, $\phi_s[l_G]$ is the value corresponding to category label $l_G$, and $\phi_s[l_O]$ is the value corresponding to category label $l_O$;
(6.3) feeding the optical flow map of step (3) into the temporal stream branch of the dual-stream convolutional neural network and constructing the loss function of the temporal stream branch from the output of its Softmax classification layer:

$L_t = -\phi_t[l_G] + \log\sum_j \exp(\phi_t[j])$

where $\phi_t[l_G]$ is the value of the temporal stream branch Softmax classification layer output corresponding to category label $l_G$ and $\phi_t[j]$ is the value corresponding to category label j;
(6.4) adding the spatial stream loss function and the temporal stream loss function to obtain the total loss function of the dual-stream convolutional neural network:

$L_{sum} = L_t + L_s$

and performing gradient computation and weight updating of the dual-stream convolutional neural network model according to the total loss $L_{sum}$;
(6.5) obtaining the trained dual-stream convolutional neural network model after repeated iterative training.
Based on the same inventive concept, the invention discloses a discriminative feature learning system for micro-expression recognition, which comprises:
a preprocessing module for extracting the initial frame and the peak frame of each video sequence sample in a micro-expression video library, normalizing the initial frame and peak frame images to a uniform size of N×N pixels, and applying Eulerian motion magnification to the normalized images with different magnification factors to obtain multiple groups of micro-expression initial frame and peak frame images;
an optical flow computation module for computing the optical flow between the peak frame and the initial frame of each group to obtain an optical flow map;
an image synthesis module for selecting, for each micro-expression peak frame image, an image from an ordinary-expression image library whose expression category differs from that of the peak frame, cropping it, and replacing the corresponding region of the peak frame image with the cropped image block to obtain a composite image carrying two different expression category labels, the position of the cropped image block being chosen at random and its size being controlled by a hyper-parameter drawn from a uniform distribution on [0, 1];
a network model construction and training module for constructing a dual-stream convolutional neural network model based on a class activation map attention mechanism, the model being divided into a temporal stream branch and a spatial stream branch, the spatial stream branch consisting, in order, of a feature extraction layer, a global average pooling layer, a fully connected layer, a classification layer and a class activation map generation layer, the temporal stream branch consisting, in order, of an attention enhancement layer, a feature extraction layer, a global average pooling layer, a fully connected layer and a classification layer, with a decision fusion layer finally merging the outputs of the classification layers of the two streams, wherein the class activation map generation layer outputs a class activation map from the feature maps produced by the feature extraction layer of the spatial stream branch and the weights between the fully connected layer and the global average pooling layer, and the attention enhancement layer of the temporal stream branch uses the class activation map output by the spatial stream branch to apply attention enhancement to the input of the temporal stream branch; the module feeds the optical flow map and the composite image into the two branches of the constructed dual-stream convolutional neural network model and trains the model;
a micro-expression recognition module for extracting the initial frame and the peak frame from an input video sequence, preprocessing them with size normalization and Eulerian motion magnification, computing the optical flow between the peak frame and the initial frame to obtain an optical flow map, feeding the optical flow map and the preprocessed peak frame image into the two branches of the trained dual-stream convolutional neural network model, and extracting highly discriminative micro-expression features for micro-expression classification and recognition.
Based on the same inventive concept, the invention also discloses a discriminative feature learning system for micro-expression recognition comprising at least one computing device, the computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program, when loaded into the processor, implementing the above discriminative feature learning method for micro-expression recognition.
Beneficial effects: Compared with the prior art, the invention has the following advantages:
(1) The invention constructs a dual-stream convolutional neural network based on a class activation map attention mechanism in which the spatial stream branch and the temporal stream branch are not independent of each other: the spatial stream branch generates a class activation map, and this class activation map is used to apply attention enhancement to the input of the temporal stream branch. The class activation map generated by the spatial stream branch indicates which regions carry strongly discriminative micro-expression features in the spatial domain; in order to make the model also attend to the regions with strongly discriminative features in the temporal domain, the class activation map is used to apply attention enhancement to the input of the temporal stream branch. The class activation map is prior knowledge generated by the spatial stream branch, and this prior knowledge supplements the information available to the temporal stream branch, which strengthens the model's ability to learn spatio-temporal discriminative features and thereby improves micro-expression recognition accuracy.
(2) In the model training stage, the constructed dual-stream convolutional neural network model is trained with composite images and optical flow maps, where a composite image carries two different expression category labels: the category label of the micro-expression peak frame and the category label of the ordinary expression. The benefits are as follows. First, for a micro-expression peak frame image of a given category, composite images are obtained by combining it with ordinary-expression images of different categories, which further enlarges the training set and prevents the model from overfitting. Second, a composite image contains an ordinary-expression part and a micro-expression peak frame part; compared with an entire micro-expression peak frame image, which is hard to recognize, the composite image is more suitable for training the network, because ordinary expressions are easier to recognize than micro-expressions: when the network is trained with composite images, its main task is to recognize the micro-expression part of the composite image, i.e., it only needs to extract discriminative micro-expression features from a certain region, which reduces the training difficulty of the model. Third, during training, the micro-expression peak frame category label of the composite image is combined with the loss function of the network to guide the model to learn features of the regions of the micro-expression peak frame image that are not replaced by the ordinary-expression image; because the replaced regions of the micro-expression image are chosen at random, i.e., every part of the micro-expression image may be replaced, the model can, as training proceeds, fully learn the discriminative micro-expression features of every facial region instead of attending only to certain regions where facial changes are obvious (such as the mouth and eyes).
(3) The discriminative feature learning method for micro-expression recognition provided by the invention achieves automatic feature extraction through end-to-end training, without the need to design feature extractors by hand, and is simple and efficient.
Drawings
FIG. 1 is a flow chart of the method of an embodiment of the present invention;
FIG. 2 is a structural diagram of the dual-stream convolutional neural network based on a class activation map attention mechanism constructed in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
As shown in fig. 1, the discriminative feature learning method for micro-expression recognition disclosed in the embodiment of the present invention specifically includes the following steps:
Step (1): extracting the initial frame and the peak frame of each video sequence sample in the micro-expression video library. In this embodiment, the CASME II database is used as the data source, and the initial frame and the peak frame of each micro-expression video sequence sample are extracted through the following sub-steps:
(1.1) taking the first frame $F_1$ of the micro-expression video sequence as the initial frame of the micro-expression image sequence;
(1.2) letting k be the total number of frames of the micro-expression video sequence, and subtracting the first frame image from each subsequent frame image to obtain difference images:

$D_m = F_m \ominus F_1, \quad m = 2, 3, \ldots, k$

where $\ominus$ denotes pixel-wise subtraction, m is the frame index, $F_m$ is the m-th frame image, and $F_1$ is the first frame image, i.e. the initial frame image;
(1.3) computing the sum of pixel values of each difference image:

$S_m = \sum_i \sum_j D_m(i, j)$

where $D_m(i, j)$ is the pixel value of the difference image at coordinates (i, j);
(1.4) obtaining the frame index of the peak frame image:

$p = \arg\max_m S_m$

(1.5) the peak frame image is the p-th frame image $F_p$ of the micro-expression video sequence corresponding to the frame index p, as illustrated by the sketch below.
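The following is a minimal sketch of this peak-frame selection, assuming the sequence has already been decoded into equally sized grayscale NumPy arrays; the function and variable names are illustrative rather than taken from the patent, and the pixel-wise subtraction is taken here as an absolute difference.

```python
import numpy as np

def find_apex_frame(frames):
    """Return (frame index p counted from 1, peak frame) for a list of grayscale frames."""
    onset = frames[0].astype(np.int32)
    best_idx, best_score = 1, -1.0
    for m in range(1, len(frames)):                          # frames 2..k
        diff = np.abs(frames[m].astype(np.int32) - onset)    # D_m = F_m (-) F_1, taken as |.|
        score = float(diff.sum())                            # S_m: sum of pixel values of D_m
        if score > best_score:
            best_score, best_idx = score, m
    return best_idx + 1, frames[best_idx]                    # p = argmax_m S_m
```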
Step (2): normalizing the initial frame and peak frame images obtained in step (1) to a uniform size of N×N pixels (N may be chosen from 112 to 448), and applying Eulerian motion magnification to the normalized images with several different magnification factors (the magnification factor α may be chosen from 2 to 20) to obtain multiple groups of micro-expression initial frame and peak frame images. In this embodiment the image size is set to 224×224 pixels, and Eulerian motion magnification with factors 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12 is applied, so that the subtle facial changes are amplified; after Eulerian motion magnification with ten different factors, the number of samples is ten times the original.
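The sketch below is a heavily simplified stand-in for this magnification step: it only amplifies, by the factor α, the per-pixel difference of the peak frame from the initial frame. The full Eulerian video magnification method instead amplifies a temporally filtered multi-scale (Laplacian-pyramid) decomposition of the sequence, so this is an illustration of the augmentation idea rather than the actual algorithm.

```python
import numpy as np

def magnify_pair(onset, apex, alpha):
    """Amplify the motion from the initial frame to the peak frame by factor alpha (e.g. 3..12)."""
    onset_f = onset.astype(np.float32)
    apex_f = apex.astype(np.float32)
    magnified = onset_f + alpha * (apex_f - onset_f)         # linear difference amplification
    return np.clip(magnified, 0, 255).astype(np.uint8)

# Ten magnification factors give ten (initial frame, magnified peak frame) pairs per sample:
# pairs = [(onset, magnify_pair(onset, apex, a)) for a in range(3, 13)]
```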
Step (3): computing the optical flow between the peak frame and the initial frame of each group to obtain an optical flow map. This specifically comprises the following sub-steps:
(3.1) computing the optical flow between the peak frame and its initial frame with the DeepFlow algorithm to obtain the optical flow maps $U_x$ and $U_y$ along the x-axis and y-axis directions; at every position, squaring the optical flow values of $U_x$ and $U_y$, adding them, and taking the square root to obtain a further optical flow map $U_z$:

$U_z(i, j) = \sqrt{U_x(i, j)^2 + U_y(i, j)^2}$

where $U_z(i, j)$, $U_x(i, j)$ and $U_y(i, j)$ are the optical flow values of $U_z$, $U_x$ and $U_y$ at coordinates (i, j);
(3.2) linearly rescaling $U_x$, $U_y$ and $U_z$ into the interval [0, 1]:

$\tilde{U}(i, j) = \dfrac{U(i, j) - U_{\min}}{U_{\max} - U_{\min}}, \quad U \in \{U_x, U_y, U_z\}$

where $U_{\min}$ and $U_{\max}$ are the minimum and maximum optical flow values of the corresponding optical flow map, $\tilde{U}$ is the linearly rescaled optical flow map, and $\tilde{U}(i, j)$ and $U(i, j)$ are the optical flow values at coordinates (i, j) of the rescaled and original optical flow maps;
(3.3) stacking $\tilde{U}_x$, $\tilde{U}_y$ and $\tilde{U}_z$ to form the final three-channel optical flow map U of size 224×224×3, as sketched below.
Step (4): obtaining composite images from the micro-expression peak frame images and ordinary-expression images whose categories differ from those of the peak frame images. Specifically, an image whose expression category differs from that of the peak frame is selected from an ordinary-expression image library and cropped, and the cropped image block replaces the corresponding region of the peak frame image, yielding a composite image carrying two different expression category labels. In this embodiment the ordinary-expression images come from the FERPlus database, which has 7 ordinary-expression categories; only the ordinary-expression images of the three shared categories (happiness, surprise and disgust) are used. For a micro-expression peak frame image of a given category, composite images are obtained with ordinary-expression images of categories different from it, which further enlarges the number of samples. For example, if the expression category of the micro-expression peak frame is happiness, composite images are obtained with ordinary-expression images of the categories surprise and disgust; if the expression category of the micro-expression peak frame is repression, composite images are obtained with ordinary expressions of the categories happiness, surprise and disgust. Obtaining a composite image from an ordinary-expression image and a micro-expression peak frame image comprises the following sub-steps:
(4.1) letting G be the peak frame image of step (3) with category label $l_G$, and selecting an ordinary expression O whose category label $l_O$ differs from that of the micro-expression peak frame image;
(4.2) normalizing the ordinary-expression image O to 224×224 pixels, the same size as the micro-expression peak frame image;
(4.3) generating the coordinates $R = (C_x, C_y, C_h, C_w)$ of the bounding box of the cropping region, the purpose being to remove the pixels of the micro-expression peak frame image G inside the cropping region and replace them with the pixels of the ordinary expression O inside the same region:

$C_h = N\sqrt{1-\delta}, \quad C_w = N\sqrt{1-\delta}$

where N is the normalized size of step (2), i.e. 224, δ is a hyper-parameter obeying a uniform distribution on [0, 1], and $C_x$ and $C_y$ obey a uniform distribution on [0, N];
(4.4) generating from the cropping-region bounding box R a binary mask $T \in \{0, 1\}^{N \times N}$; the mask T has size N×N and consists of 0s and 1s, its value being 0 inside the region corresponding to the bounding box R and 1 elsewhere;
(4.5) generating from the binary mask T a composite image carrying two different expression category labels (a code sketch of this construction follows):

$\hat{G} = T \odot G + (I - T) \odot O$

where $\hat{G}$ is the resulting composite image, I is an N×N mask whose values are all 1, and $\odot$ denotes element-wise multiplication.
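The following is a sketch of the composite-image construction (CutMix-style mixing). The $N\sqrt{1-\delta}$ box size follows the formula as reconstructed above and should be treated as an assumption; the names are illustrative.

```python
import numpy as np

def make_composite(peak_img, ordinary_img, rng=np.random):
    """Replace a random box of the micro-expression peak frame with the ordinary-expression
    image; returns (composite image, delta), where delta later weights the two labels."""
    n = peak_img.shape[0]                          # images are N x N (x 3) and the same size
    delta = rng.uniform(0.0, 1.0)                  # hyper-parameter, uniform on [0, 1]
    ch = cw = int(round(n * np.sqrt(1.0 - delta))) # assumed box size N * sqrt(1 - delta)
    cx, cy = rng.randint(0, n), rng.randint(0, n)  # box centre, uniform on [0, N)

    x1, x2 = np.clip(cx - cw // 2, 0, n), np.clip(cx + cw // 2, 0, n)
    y1, y2 = np.clip(cy - ch // 2, 0, n), np.clip(cy + ch // 2, 0, n)

    mask = np.ones((n, n, 1), dtype=peak_img.dtype)   # T: 0 inside the box, 1 outside
    mask[y1:y2, x1:x2, :] = 0
    composite = mask * peak_img + (1 - mask) * ordinary_img
    return composite, delta
```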
Step (5): constructing the dual-stream convolutional neural network model based on a class activation map attention mechanism; as shown in FIG. 2, the network model is divided into a temporal stream branch and a spatial stream branch. The spatial stream branch consists, in order, of a feature extraction layer, a global average pooling layer, a fully connected layer, a Softmax classification layer and a class activation map generation layer; the temporal stream branch consists, in order, of an attention enhancement layer, a feature extraction layer, a global average pooling layer, a fully connected layer and a Softmax classification layer; finally, a decision fusion layer merges the outputs of the Softmax classification layers of the two streams. The specific functions of these layers are as follows:
The feature extraction layer of the spatial stream branch extracts features from the input of the spatial stream branch to obtain a multi-channel feature map $M \in \mathbb{R}^{H \times H \times c}$, where the feature map size is H×H and the number of channels is c. The feature extraction layer may use the feature extraction part of any convolutional neural network used in deep learning (e.g., ResNet, VGGNet, AlexNet). In this embodiment the multi-channel feature map is $M \in \mathbb{R}^{7 \times 7 \times 512}$, i.e. the feature map size is 7×7 and the number of channels is 512, and the feature extraction layer adopts the feature extraction part of ResNet-18 (i.e., the part of ResNet-18 from the first convolutional layer to the end of the last convolutional layer).
The global average pooling layer of the spatial stream branch uses an H×H pooling kernel to convert the feature map M output by the feature extraction layer into c feature values:

$\theta_n = \frac{1}{H \times H}\sum_{i=1}^{H}\sum_{j=1}^{H} M_n(i, j)$

where $\theta_n$ is the n-th feature value output by the global average pooling layer and $M_n(i, j)$ is the value of the n-th channel feature map at coordinates (i, j).
The fully connected layer of the spatial stream branch fully connects the output of the global average pooling layer to v output neurons and outputs a v-dimensional feature vector:

$\xi_n = \sum_{j=1}^{c} w_j^n \theta_j$

where $\xi_n$ is the n-th feature value output by the fully connected layer and $w_j^n$ is the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer.
The Softmax classification layer of the spatial stream branch fully connects the feature vector output by the fully connected layer to v output nodes corresponding to the expression categories and outputs a v-dimensional vector in which each component represents the probability of belonging to the corresponding category, v being the number of categories. The micro-expression video library CASME II adopted in this embodiment has five micro-expression categories (happiness, surprise, disgust, repression and others), so v = 5.
The class activation map generation layer of the spatial stream branch generates the class activation map corresponding to a given category:

$A^n = \sum_{j=1}^{c} w_j^n M_j$

where $M_j$ is the j-th channel feature map output by the feature extraction layer, $w_j^n$ is the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer, and $A^n$ is the class activation map corresponding to the n-th category, of size H×H; in this embodiment the class activation map has size 7×7. Because the label is available during training, the class activation map generation layer outputs the class activation map of the category given by the micro-expression peak frame image label; when the trained dual-stream convolutional neural network is used for micro-expression recognition, the class activation map generation layer outputs the class activation map of the category with the highest probability in the Softmax classification layer of the spatial stream branch.
The attention enhancement layer of the temporal stream branch uses the class activation map output by the spatial stream branch to apply attention enhancement to the input of the temporal stream branch. The size of the class activation map is first aligned with that of the input of the temporal stream branch:

$A_{up} = \mathrm{Upsample}(A^n)$

where Upsample() is an upsampling function that changes the size of the class activation map from H×H to N×N. The values of the upsampled class activation map are then mapped into the range (0, 1):

$\tilde{A} = \mathrm{Sig}(A_{up})$

where Sig() is the Sigmoid function, which maps the values of the class activation map into (0, 1). Finally, the class activation map is used to apply attention enhancement to the input of the temporal stream branch:

$\hat{a} = a \odot (I + \tilde{A})$

where a is the input of the temporal stream branch, $\hat{a}$ is the input after attention enhancement, and I is an N×N mask whose values are all 1.
The attention-enhanced input then passes, in order, through the feature extraction layer, the global average pooling layer and the fully connected layer of the temporal stream branch, whose structures are the same as those of the spatial stream, and finally the Softmax classification layer outputs the probability that the temporal stream input belongs to each category.
The decision fusion layer adds the output of the temporal stream branch Softmax classification layer and the output of the spatial stream branch Softmax classification layer to obtain the category scores of the whole dual-stream convolutional neural network for the input, and takes the category with the largest score as the final classification result.
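The PyTorch sketch below puts the two branches, the class-activation-map attention enhancement and the decision fusion together. The ResNet-18 backbone, the layer sizes (224×224 input, 7×7×512 features, v = 5 classes) and the residual form of the enhancement follow this embodiment as described above, but the code is an illustrative reconstruction under those assumptions, not the patented implementation itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class StreamBranch(nn.Module):
    """Feature extraction + global average pooling + fully connected layer of one stream."""
    def __init__(self, num_classes):
        super().__init__()
        backbone = resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # conv part only
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        fmap = self.features(x)                       # (B, 512, 7, 7)
        logits = self.fc(self.gap(fmap).flatten(1))   # (B, v)
        return fmap, logits

class CamAttentionTwoStream(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        self.spatial = StreamBranch(num_classes)      # input: composite / peak frame image
        self.temporal = StreamBranch(num_classes)     # input: 3-channel optical flow map

    def forward(self, peak_img, flow_map, target_class=None):
        fmap, s_logits = self.spatial(peak_img)
        # class index: ground-truth label during training, predicted class at recognition time
        cls = s_logits.argmax(1) if target_class is None else target_class
        w = self.spatial.fc.weight[cls]                            # (B, 512) FC weights w_j^n
        cam = torch.einsum('bc,bchw->bhw', w, fmap).unsqueeze(1)   # CAM: sum_j w_j^n * M_j
        cam = F.interpolate(cam, size=flow_map.shape[-2:],
                            mode='bilinear', align_corners=False)  # 7x7 -> 224x224
        att = torch.sigmoid(cam)                                   # map values into (0, 1)
        enhanced_flow = flow_map * (1.0 + att)                     # a * (I + A~)
        _, t_logits = self.temporal(enhanced_flow)
        fused = F.softmax(s_logits, dim=1) + F.softmax(t_logits, dim=1)  # decision fusion
        return s_logits, t_logits, fused
```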
and (6): training the double-current convolutional neural network constructed in the step (5) by using the data obtained in the steps (3) and (4), wherein the training comprises the following sub-steps:
(6.1) initializing network weights using a random initialization method;
(6.2) new image obtained in step (4)
Figure BDA0002902318110000141
Inputting into spatial stream branches of a double-stream convolutional neural network according to spaceThe output of the stream branch Softmax classification layer constructs a loss function of the spatial stream branch:
L s =(-φ s [l G ]+log((∑ j exp(φ s [j])))×δ+(-φ s [l o ]+log(∑ j exp(φ s [j]) 1-delta) in the above formula G 、l O Respectively a category label corresponding to the micro expression peak value frame image G and a category label corresponding to the common expression O in the step (4), delta is a hyper-parameter phi in the step (4) s [j]Represents the value, phi, of the spatial stream tributary Softmax classification layer output corresponding to the class label j s [l G ]Representing the class label l in the output of the Softmax classification layer of the spatial stream branch G Value of (phi) s [l O ]Representing class label l in spatial stream branch Softmax classification layer output O A value of (d);
(6.3) inputting the optical flow diagram U obtained in the step (3) into a time flow branch of the dual-flow convolutional neural network, and constructing a loss function of the time flow branch according to the output of the Softmax classification layer of the time flow branch:
Figure BDA0002902318110000142
middle phi of the above formula t [l G ]Representing the corresponding class label l in the output of the Softmax classification layer of the time flow branch G Value of (phi) t [j]Representing the value of the time flow branch Softmax classification layer output corresponding to the class label j;
(6.4) adding the spatial flow loss function and the time flow loss function to obtain the total loss function of the dual-flow convolution neural network:
L sum =L t +L s
from the total loss function L of the double-flow convolutional neural network sum Performing gradient calculation and weight updating on the double-current convolution neural network model by using a back propagation algorithm;
and (6.5) carrying out iterative training for multiple times (such as 50 times) to obtain a trained double-current convolutional neural network model.
Step (7): extracting the initial frame and the peak frame from an input video sequence, preprocessing them with size normalization and Eulerian motion magnification (a single specific magnification factor is used at recognition time), computing the optical flow between the peak frame and the initial frame to obtain an optical flow map, feeding the optical flow map and the preprocessed peak frame image into the two branches of the trained dual-stream convolutional neural network model, and extracting highly discriminative micro-expression features for micro-expression classification and recognition.
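An end-to-end recognition sketch for this step, combining the helper functions sketched above with the trained model, is given below; the single magnification factor and the grayscale-to-3-channel replication of the peak frame are illustrative assumptions.

```python
import numpy as np
import torch

def recognize(model, frames, alpha=8, device='cpu'):
    """frames: list of grayscale face frames already normalized to 224 x 224."""
    p, apex = find_apex_frame(frames)                 # step (1): peak frame index and image
    onset = frames[0]
    apex_mag = magnify_pair(onset, apex, alpha)       # step (2): one magnification factor
    flow = optical_flow_map(onset, apex_mag)          # step (3): 224 x 224 x 3 in [0, 1]

    to_tensor = lambda a: torch.from_numpy(np.ascontiguousarray(a)).float()
    # grayscale peak frame replicated to 3 channels for the ResNet-style backbone
    peak_t = to_tensor(apex_mag)[None, None].repeat(1, 3, 1, 1) / 255.0
    flow_t = to_tensor(flow).permute(2, 0, 1)[None]

    model.eval()
    with torch.no_grad():
        _, _, fused = model(peak_t.to(device), flow_t.to(device))
    return int(fused.argmax(1).item())                # predicted expression category
```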
Based on the same inventive concept, the embodiment of the present invention discloses a discriminative feature learning system for micro-expression recognition, which comprises:
a preprocessing module for extracting the initial frame and the peak frame of each video sequence sample in the micro-expression video library, normalizing the initial frame and peak frame images to a uniform size of N×N pixels, and applying Eulerian motion magnification to the normalized images with different magnification factors to obtain multiple groups of micro-expression initial frame and peak frame images;
an optical flow computation module for computing the optical flow between the peak frame and the initial frame of each group to obtain an optical flow map;
an image synthesis module for selecting, for each micro-expression peak frame image, an image from an ordinary-expression image library whose expression category differs from that of the peak frame, cropping it, and replacing the corresponding region of the peak frame image with the cropped image block to obtain a composite image carrying two different expression category labels, the position of the cropped image block being chosen at random and its size being controlled by a hyper-parameter drawn from a uniform distribution on [0, 1];
a network model construction and training module for constructing a dual-stream convolutional neural network model based on a class activation map attention mechanism, the model being divided into a temporal stream branch and a spatial stream branch, the spatial stream branch consisting, in order, of a feature extraction layer, a global average pooling layer, a fully connected layer, a classification layer and a class activation map generation layer, the temporal stream branch consisting, in order, of an attention enhancement layer, a feature extraction layer, a global average pooling layer, a fully connected layer and a classification layer, with a decision fusion layer finally merging the outputs of the classification layers of the two streams, wherein the class activation map generation layer outputs a class activation map from the feature maps produced by the feature extraction layer of the spatial stream branch and the weights between the fully connected layer and the global average pooling layer, and the attention enhancement layer of the temporal stream branch uses the class activation map output by the spatial stream branch to apply attention enhancement to the input of the temporal stream branch; the module feeds the optical flow map and the composite image into the two branches of the constructed dual-stream convolutional neural network model and trains the model;
a micro-expression recognition module for extracting the initial frame and the peak frame from an input video sequence, preprocessing them with size normalization and Eulerian motion magnification, computing the optical flow between the peak frame and the initial frame to obtain an optical flow map, feeding the optical flow map and the preprocessed peak frame image into the two branches of the trained dual-stream convolutional neural network model, and extracting highly discriminative micro-expression features for micro-expression classification and recognition.
Based on the same inventive concept, the discriminative feature learning system for micro-expression recognition disclosed in the embodiment of the present invention comprises at least one computing device, the computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program, when loaded into the processor, implementing the above discriminative feature learning method for micro-expression recognition.
The technical solutions described above are only preferred technical solutions of the present invention; modifications that a person skilled in the art may make to some parts of them still embody the principles of the present invention and fall within the protection scope of the present invention.

Claims (8)

1. A method for discriminative feature learning for microexpression recognition, the method comprising the steps of:
(1) extracting initial frames and peak frames of video sequence samples in a micro-expression video library;
(2) normalizing the sizes of the images of the initial frame and the peak frame to be uniform into NxN pixels, and amplifying the normalized images by using different amplification factors to perform Euler motion to obtain a plurality of groups of images of the micro expression initial frame and the peak frame;
(3) calculating optical flow information between each group of micro expression peak frames and the initial frame to obtain an optical flow graph;
(4) for each micro expression peak value frame image, selecting an image of which the expression type is different from that of the peak value frame from a common expression image library, cutting the image, and replacing a corresponding area of the peak value frame image with an image block obtained by cutting to obtain a composite image containing two different expression type labels; the positions of the image blocks to be cut are randomly selected, and the sizes of the image blocks to be cut are controlled by the superparameters which are uniformly distributed from 0 to 1;
(5) constructing a double-current convolution neural network model based on a class activation graph attention mechanism; the model is divided into a time flow branch and a space flow branch, the space flow branch sequentially comprises a feature extraction layer, a global average pooling layer, a full connection layer, a classification layer and a class activation map generation layer, the time flow branch sequentially comprises an attention enhancement layer, a feature extraction layer, a global average pooling layer, a full connection layer and a classification layer, and finally, a decision fusion layer is used for combining the outputs of the two flow classification layers; the class activation graph generation layer outputs a class activation graph according to the feature graph output by the feature extraction layer of the spatial stream branch and the weight between the full connection layer and the global average pooling layer; the attention enhancement layer of the time stream branch utilizes the class activation map output by the spatial stream branch to carry out attention enhancement on the input of the time stream branch;
(6) respectively inputting the optical flow map and the composite image into the two branches of the constructed dual-stream convolutional neural network model, and training the model;
(7) extracting an initial frame and a peak frame from an input video sequence, performing size normalization and Eulerian motion magnification preprocessing on the initial frame and the peak frame, then calculating the optical flow information between the peak frame and the initial frame to obtain an optical flow map, respectively inputting the optical flow map and the preprocessed peak frame image into the two branches of the trained dual-stream convolutional neural network model, extracting micro-expression features with strong discriminative power, and using the micro-expression features for micro-expression classification and recognition.
2. The method for discriminative feature learning for micro-expression recognition according to claim 1, wherein the step (1) comprises the following sub-steps:
(1.1) using a first frame of the micro-expression video sequence as an initial frame of the micro-expression video sequence;
(1.2) setting the total frame number of the micro-expression video sequence as k, and subtracting the first frame image from each frame image starting from the second frame to obtain difference images:

$D_m = F_m \ominus F_1, \quad m = 2, 3, \ldots, k$

where $\ominus$ denotes the subtraction operation on corresponding pixels, $m$ denotes the frame number, $F_m$ denotes the m-th frame image, and $F_1$ is the first frame image;
(1.3) calculating the sum of the pixel values of each difference image:

$S_m = \sum_{i} \sum_{j} D_m(i, j)$

where $D_m(i, j)$ denotes the pixel value of the difference image at the coordinate position $(i, j)$;
(1.4) obtaining the frame number of the peak frame image:

$p = \arg\max_{m \in \{2, \ldots, k\}} S_m$
(1.5) the peak frame image is the p-th frame image $F_p$ in the micro-expression video sequence corresponding to the frame number p.
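A minimal sketch of the peak-frame selection described in claim 2 above, assuming the frames are available as grayscale NumPy arrays; the function name and the absolute-difference interpretation of the pixel-wise subtraction are assumptions for illustration.

```python
# Hypothetical sketch of claim 2: pick the apex (peak) frame as the frame whose
# difference image with respect to the first frame has the largest pixel sum.
import numpy as np

def find_peak_frame(frames):
    """frames: sequence of k grayscale images of identical size."""
    first = frames[0].astype(np.float32)                      # F_1, the initial frame
    sums = []
    for m in range(1, len(frames)):                           # frames 2 .. k
        diff = np.abs(frames[m].astype(np.float32) - first)   # D_m, difference image
        sums.append(diff.sum())                               # S_m, sum of its pixels
    p = int(np.argmax(sums)) + 1                              # 0-based index of the peak frame
    return p, frames[p]                                       # frame number and F_p
```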
3. The method for discriminative feature learning for micro-expression recognition according to claim 1, wherein the step (4) comprises the following sub-steps:
(4.1) denoting the micro-expression peak frame image in the step (3) as $G$ and its corresponding class label as $l_G$, and selecting a common expression image $O$ whose class label $l_O$ differs from that of the micro-expression peak frame image;
(4.2) normalizing the size of the common expression image $O$ to N×N pixels, the same size as the micro-expression peak frame image;
(4.3) generating the coordinates $R = (C_x, C_y, C_h, C_w)$ of the bounding box of the cropping region, the purpose being to remove the pixels of the micro-expression peak frame image $G$ inside the cropping region and replace them with the pixels of the corresponding cropping region of the common expression image $O$, wherein $C_x$ and $C_y$ respectively denote the abscissa and ordinate of the centre point of the bounding box, and $C_h$ and $C_w$ respectively denote the height and width of the bounding box, which are determined by the hyperparameter $\delta$; $\delta$ obeys a uniform distribution between 0 and 1, and $C_x$ and $C_y$ obey a uniform distribution between 0 and N;
(4.4) generating a binary mask $T \in \{0, 1\}^{N \times N}$ from the cropping-region bounding box $R$; the mask $T$ has a size of N×N and consists of 0s and 1s, its value inside the region corresponding to the bounding box $R$ being 0 and all remaining values being 1;
(4.5) generating, from the binary mask $T$, a composite image containing two different expression class labels:

$\tilde{G} = T \odot G + (I - T) \odot O$

where $\tilde{G}$ is the resulting composite image, $I$ is a mask of size N×N whose values are all 1, and $\odot$ denotes the multiplication of corresponding elements.
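A minimal sketch of the image synthesis described in claim 3 above, assuming single-channel N×N images; the square-root relation between the hyperparameter δ and the bounding-box size is an assumption, chosen so that the label weights δ and 1−δ used in claim 6 match the preserved and replaced image areas.

```python
# Hypothetical sketch of claim 3: replace a randomly placed box of the peak frame
# image G with the corresponding region of a common expression image O.
import numpy as np

def make_composite(G, O, N, rng=np.random.default_rng()):
    """G, O: N x N arrays carrying different expression class labels."""
    delta = rng.uniform(0.0, 1.0)                     # hyperparameter in (0, 1)
    ch = cw = int(round(N * np.sqrt(1.0 - delta)))    # assumed box height/width
    cx, cy = rng.integers(0, N, size=2)               # box centre, uniform on [0, N)
    x1, x2 = max(cx - cw // 2, 0), min(cx + cw // 2, N)
    y1, y2 = max(cy - ch // 2, 0), min(cy + ch // 2, N)
    T = np.ones((N, N), dtype=G.dtype)                # binary mask: 0 inside the box
    T[y1:y2, x1:x2] = 0
    composite = T * G + (1 - T) * O                   # T ⊙ G + (I − T) ⊙ O
    return composite, delta                           # delta later weights the two labels
```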
4. The method for discriminative feature learning for micro-expression recognition according to claim 1, wherein the spatial stream branch of the dual-stream convolutional neural network model based on the class activation map attention mechanism constructed in the step (5) has the following specific structure:
the feature extraction layer of the spatial stream branch extracts features from the input of the spatial stream branch to obtain a multi-channel feature map $M$, the size of the feature map being H×H and the number of channels being c;
the global average pooling layer of the spatial stream branch uses an H×H pooling kernel to convert the feature map $M$ output by the feature extraction layer into c feature values:

$\theta_n = \frac{1}{H \times H} \sum_{i=1}^{H} \sum_{j=1}^{H} M_n(i, j)$

where $\theta_n$ denotes the n-th feature value output by the global average pooling layer and $M_n(i, j)$ denotes the value of the n-th channel feature map at the coordinate position $(i, j)$;
the fully connected layer of the spatial stream branch fully connects the output of the global average pooling layer to v output neurons and outputs a v-dimensional feature vector:

$\xi_n = \sum_{j=1}^{c} w_n^{j} \theta_j$

where $\xi_n$ denotes the n-th feature value output by the fully connected layer and $w_n^{j}$ denotes the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer;
the Softmax classification layer of the spatial stream branch fully connects the feature vector output by the fully connected layer to v output nodes corresponding to the expression classes and outputs a v-dimensional vector, each component of which represents the probability of belonging to the corresponding class, where v is the number of classes;
the class activation map generation layer of the spatial stream branch generates the class activation map corresponding to a given class n:

$A_n = \sum_{j=1}^{c} w_n^{j} M_j$

where $M_j$ denotes the j-th channel feature map output by the feature extraction layer, $w_n^{j}$ denotes the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer, and the class activation map $A_n$ has a size of H×H; during training, the class activation map generation layer outputs the class activation map of the class corresponding to the label of the micro-expression peak frame image, and during micro-expression recognition with the trained dual-stream convolutional neural network model it outputs the class activation map of the class with the highest probability in the Softmax classification layer of the spatial stream branch.
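A minimal sketch of the class activation map generation described in claim 4 above, assuming the spatial-stream feature maps and fully connected weights are available as NumPy arrays; array shapes and names are assumptions for illustration.

```python
# Hypothetical sketch of claim 4: weight the c channel feature maps by the fully
# connected weights of one class and sum them into an H x H class activation map.
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """feature_maps: (c, H, H) output of the spatial-stream feature extraction layer.
    fc_weights: (v, c) weights between the global-average-pooled features and the
    v class neurons of the fully connected layer."""
    w = fc_weights[class_idx]                       # weights w_n^j for class n
    cam = np.tensordot(w, feature_maps, axes=1)     # sum_j w_n^j * M_j  ->  (H, H)
    return cam
```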
5. The method for discriminative feature learning for micro-expression recognition according to claim 4, wherein the time stream branch of the dual-stream convolutional neural network model based on the class activation map attention mechanism constructed in the step (5) has the following specific structure:
the attention enhancement layer of the time stream branch performs attention enhancement on the input of the time stream branch by using the class activation map output by the spatial stream branch: the size of the class activation map is first aligned with the input of the time stream branch,

$A' = \mathrm{Upsample}(A_n)$

where Upsample() is an upsampling function that changes the size of the class activation map from H×H to N×N; the values of the upsampled class activation map are then mapped to between 0 and 1,

$A'' = \mathrm{Sig}(A')$

where Sig() is the Sigmoid function; finally, attention enhancement is performed on the input of the time stream branch using the class activation map:

$\tilde{a} = (I + A'') \odot a$

where $a$ is the input of the time stream branch, $\tilde{a}$ is the input after the attention enhancement, $I$ is a mask of size N×N whose values are all 1, and $\odot$ denotes the multiplication of corresponding elements;
the time flow branch input subjected to attention enhancement sequentially passes through a feature extraction layer, a global average pooling layer and a full connection layer of the time flow branch, and finally the probability that the time flow input belongs to each category is output through a time flow Softmax classification layer.
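A minimal sketch of the attention enhancement described in claim 5 above, assuming a two-channel optical flow input and using OpenCV for the upsampling; the residual form (I + A″) ⊙ a follows the all-ones mask I mentioned in the claim and is otherwise an assumption.

```python
# Hypothetical sketch of claim 5: upsample the class activation map, squash it with
# a Sigmoid and use it to re-weight the optical-flow input of the time stream branch.
import cv2
import numpy as np

def enhance_time_stream_input(flow_input, cam):
    """flow_input: (N, N, 2) optical-flow image; cam: (H, H) class activation map."""
    N = flow_input.shape[0]
    cam_up = cv2.resize(cam.astype(np.float32), (N, N))   # Upsample: H x H -> N x N
    cam_sig = 1.0 / (1.0 + np.exp(-cam_up))               # Sigmoid into (0, 1)
    return (1.0 + cam_sig)[..., None] * flow_input        # (I + A'') ⊙ a, per channel
```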
6. The method for discriminative feature learning for micro-expression recognition according to claim 1, wherein the step (6) comprises the following sub-steps:
(6.1) initializing network weights using a random initialization method;
(6.2) inputting the composite image from the step (4) into the spatial stream branch of the dual-stream convolutional neural network, and constructing the loss function of the spatial stream branch from the output of the Softmax classification layer of the spatial stream branch:
$L_s = \left(-\phi_s[l_G] + \log\Big(\sum_j \exp(\phi_s[j])\Big)\right) \times \delta + \left(-\phi_s[l_O] + \log\Big(\sum_j \exp(\phi_s[j])\Big)\right) \times (1 - \delta)$

where $l_G$ and $l_O$ are respectively the class labels of the micro-expression peak frame image $G$ and of the common expression image $O$ used for synthesis, $\delta$ is the hyperparameter in the step (4), $\phi_s[j]$ denotes the value corresponding to class label $j$ in the output of the Softmax classification layer of the spatial stream branch, $\phi_s[l_G]$ denotes the value corresponding to class label $l_G$ in that output, and $\phi_s[l_O]$ denotes the value corresponding to class label $l_O$ in that output;
(6.3) inputting the optical flow map from the step (3) into the time stream branch of the dual-stream convolutional neural network, and constructing the loss function of the time stream branch from the output of the Softmax classification layer of the time stream branch:
$L_t = -\phi_t[l_G] + \log\Big(\sum_j \exp(\phi_t[j])\Big)$

where $\phi_t[l_G]$ denotes the value corresponding to class label $l_G$ in the output of the Softmax classification layer of the time stream branch, and $\phi_t[j]$ denotes the value corresponding to class label $j$ in that output;
(6.4) adding the spatial stream loss function and the time stream loss function to obtain the total loss function of the dual-stream convolutional neural network:

$L_{sum} = L_t + L_s$

and performing gradient computation and weight updating on the dual-stream convolutional neural network model according to the total loss function $L_{sum}$;
(6.5) obtaining the trained dual-stream convolutional neural network model through repeated iterative training.
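A minimal sketch of the loss construction described in claim 6 above, written with a numerically stable log-sum-exp and assuming φ_s and φ_t are the raw v-dimensional scores of the two classification layers; variable names are assumptions for illustration.

```python
# Hypothetical sketch of claim 6: spatial-stream loss mixed with weights delta and
# (1 - delta), time-stream loss on the micro-expression label, summed into L_sum.
import numpy as np

def log_sum_exp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def total_loss(phi_s, phi_t, l_G, l_O, delta):
    """phi_s, phi_t: v-dimensional outputs of the spatial / time stream classifiers."""
    lse_s, lse_t = log_sum_exp(phi_s), log_sum_exp(phi_t)
    L_s = (-phi_s[l_G] + lse_s) * delta + (-phi_s[l_O] + lse_s) * (1.0 - delta)
    L_t = -phi_t[l_G] + lse_t
    return L_t + L_s   # L_sum used for gradient computation and weight updates
```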
7. A system for discriminative feature learning for micro-expression recognition, comprising:
the preprocessing module is used for extracting the initial frame and the peak frame of each video sequence sample in the micro-expression video library, normalizing the images of the initial frame and the peak frame to a uniform size of N×N pixels, and applying Eulerian motion magnification with different magnification factors to the normalized images to obtain a plurality of groups of micro-expression initial frame and peak frame images;
the optical flow information calculation module is used for calculating the optical flow information between each group of micro-expression peak frame and initial frame to obtain an optical flow map;
the image synthesis module is used for selecting, for each micro-expression peak frame image, an image whose expression class differs from that of the peak frame from a common expression image library, cropping it, and replacing the corresponding region of the peak frame image with the cropped image block to obtain a composite image containing two different expression class labels; the position of the image block to be cropped is selected randomly, and its size is controlled by a hyperparameter obeying a uniform distribution between 0 and 1;
the network model building and training module is used for building a dual-stream convolutional neural network model based on a class activation map attention mechanism, the model being divided into a time stream branch and a spatial stream branch, the spatial stream branch sequentially comprising a feature extraction layer, a global average pooling layer, a fully connected layer, a classification layer and a class activation map generation layer, the time stream branch sequentially comprising an attention enhancement layer, a feature extraction layer, a global average pooling layer, a fully connected layer and a classification layer, and finally a decision fusion layer combining the outputs of the classification layers of the two streams; the class activation map generation layer outputs a class activation map according to the feature map output by the feature extraction layer of the spatial stream branch and the weights between the fully connected layer and the global average pooling layer; the attention enhancement layer of the time stream branch uses the class activation map output by the spatial stream branch to perform attention enhancement on the input of the time stream branch; and the module respectively inputs the optical flow map and the composite image into the two branches of the constructed dual-stream convolutional neural network model and trains the model;
and the micro-expression recognition module is used for extracting an initial frame and a peak frame from an input video sequence, performing size normalization and Eulerian motion magnification preprocessing on the initial frame and the peak frame, then calculating the optical flow information between the peak frame and the initial frame to obtain an optical flow map, respectively inputting the optical flow map and the preprocessed peak frame image into the two branches of the trained dual-stream convolutional neural network model, extracting micro-expression features with strong discriminative power, and using the micro-expression features for micro-expression classification and recognition.
8. A system for discriminative feature learning for micro-expression recognition, comprising at least one computing device, the computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the method for discriminative feature learning for micro-expression recognition according to any one of claims 1 to 6.
CN202110060936.3A 2021-01-18 2021-01-18 Discriminative feature learning method and system for micro-expression recognition Active CN112800891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110060936.3A CN112800891B (en) 2021-01-18 2021-01-18 Discriminative feature learning method and system for micro-expression recognition

Publications (2)

Publication Number Publication Date
CN112800891A CN112800891A (en) 2021-05-14
CN112800891B true CN112800891B (en) 2022-08-26

Family

ID=75809985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110060936.3A Active CN112800891B (en) 2021-01-18 2021-01-18 Discriminative feature learning method and system for micro-expression recognition

Country Status (1)

Country Link
CN (1) CN112800891B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723287A (en) * 2021-08-30 2021-11-30 平安科技(深圳)有限公司 Micro-expression identification method, device and medium based on bidirectional cyclic neural network
CN114005157B (en) * 2021-10-15 2024-05-10 武汉烽火信息集成技术有限公司 Micro-expression recognition method for pixel displacement vector based on convolutional neural network
CN114550272B (en) * 2022-03-14 2024-04-09 东南大学 Micro-expression recognition method and device based on video time domain dynamic attention model
CN116311472B (en) * 2023-04-07 2023-10-31 湖南工商大学 Micro-expression recognition method and device based on multi-level graph convolution network
CN117456586A (en) * 2023-11-17 2024-01-26 江南大学 Micro expression recognition method, system, equipment and medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287805A (en) * 2019-05-31 2019-09-27 东南大学 Micro- expression recognition method and system based on three stream convolutional neural networks
CN110516571A (en) * 2019-08-16 2019-11-29 东南大学 Inter-library micro- expression recognition method and device based on light stream attention neural network
CN112115796A (en) * 2020-08-21 2020-12-22 西北大学 Attention mechanism-based three-dimensional convolution micro-expression recognition algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A facial expression recognition method based on an improved convolutional neural network; Zou Jiancheng et al.; Journal of North China University of Technology; 2020-04-15 (No. 02); full text *

Also Published As

Publication number Publication date
CN112800891A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112800891B (en) Discriminative feature learning method and system for micro-expression recognition
Giannopoulos et al. Deep learning approaches for facial emotion recognition: A case study on FER-2013
Oyedotun et al. Deep learning in vision-based static hand gesture recognition
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN110276248B (en) Facial expression recognition method based on sample weight distribution and deep learning
Gaddam et al. Human facial emotion detection using deep learning
CN111582136B (en) Expression recognition method and device, electronic equipment and storage medium
Ali et al. Facial emotion detection using neural network
DANDIL et al. Real-time Facial Emotion Classification Using Deep Learning Article Sidebar
CN113392766A (en) Attention mechanism-based facial expression recognition method
Xu et al. Face expression recognition based on convolutional neural network
Santhoshkumar et al. Deep learning approach: emotion recognition from human body movements
Zhao et al. Cbph-net: A small object detector for behavior recognition in classroom scenarios
CN114170659A (en) Facial emotion recognition method based on attention mechanism
Gantayat et al. Study of algorithms and methods on emotion detection from facial expressions: a review from past research
Gupta et al. Performance improvement in handwritten devanagari character classification
Kumar et al. Bird species classification from images using deep learning
Kale et al. Age, gender and ethnicity classification from face images with CNN-based features
Handa et al. Incremental approach for multi-modal face expression recognition system using deep neural networks
Kanungo Analysis of Image Classification Deep Learning Algorithm
Srininvas et al. A framework to recognize the sign language system for deaf and dumb using mining techniques
CN113469116A (en) Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network
Pradeep et al. Recognition of Indian Classical Dance Hand Gestures
Nayak et al. Facial Expression Recognition based on Feature Enhancement and Improved Alexnet
Thiruthuvanathan et al. EMONET: A Cross Database Progressive Deep Network for Facial Expression.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant