CN112800891B - Discriminative feature learning method and system for micro-expression recognition - Google Patents
- Publication number
- CN112800891B CN112800891B CN202110060936.3A CN202110060936A CN112800891B CN 112800891 B CN112800891 B CN 112800891B CN 202110060936 A CN202110060936 A CN 202110060936A CN 112800891 B CN112800891 B CN 112800891B
- Authority
- CN
- China
- Prior art keywords
- expression
- micro
- layer
- image
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a discriminative feature learning method and system for micro-expression recognition. First, the initial frame and the peak frame of a micro-expression video sequence are extracted and preprocessed, and the optical flow between the peak frame and the initial frame is computed to obtain an optical flow map. Next, an image whose expression category differs from that of the peak frame is selected from a common expression image library and cropped, and the cropped image block replaces the corresponding region of the peak frame image to yield a composite image. A dual-stream convolutional neural network model based on a class activation map attention mechanism is then constructed; the optical flow map and the composite image are input into the two branches of the network respectively, and the model is trained. Finally, the trained model extracts strongly discriminative features from an input video sequence for micro-expression classification and recognition. The method effectively prevents the model from overfitting, enables it to learn strongly discriminative micro-expression features, and improves micro-expression recognition accuracy.
Description
Technical Field
The invention relates to a method and a system for learning discriminative features for micro-expression recognition, and belongs to the field of micro-expression recognition and artificial intelligence.
Background
An expression is a non-verbal behavior that conveys human emotion, and it is also an important channel through which a robot can understand human emotion. Ordinary expressions are produced when the expression of emotion is not suppressed; their facial motion amplitude is large and their duration is long. In some situations, however, people deliberately suppress and hide their emotions, and these suppressed emotions leak out spontaneously as extremely fast facial expressions, called micro-expressions. A micro-expression lasts less than 0.2 second, and its facial motion changes are so subtle that human recognition accuracy for micro-expressions is low. At present, micro-expression recognition means classifying micro-expression sample sequences from existing databases, and it generally comprises two steps: feature extraction and classification. Most work focuses on feature extraction, and micro-expression recognition methods can be roughly divided into two types by how features are obtained: the first is based on hand-crafted features, the second on features extracted by convolutional neural networks.
Methods based on hand-crafted features have achieved some success in micro-expression recognition over decades of development, but they require specialized prior knowledge and a complicated parameter tuning process, and their generalization ability and robustness are poor. With the rapid development of machine learning and deep learning, convolutional neural networks have achieved good performance in many fields of computer vision, and more and more researchers apply them to micro-expression recognition. Ruicong proposed combining 3D convolutional neural networks (3D-CNNs) with transfer learning: the 3D-CNNs are first trained with supervision on the common expression database Oulu-CASIA, and the pre-trained model is then used for micro-expression training; to address the scarcity of database samples, the authors expanded the database sevenfold using image flipping and rotation. Kim combined a convolutional neural network with a long short-term memory network (LSTM) to extract the spatial and temporal information of micro-expression video sequences, learning the spatial information of each frame with the convolutional neural network and the temporal information between frames with the LSTM; experimental results show that this method outperforms LBP-TOP and its variants. Liong et al. computed optical flow from the initial frame and the peak frame of the micro-expression, extracted and fused features of the horizontal and vertical optical flow maps with a dual-stream convolutional neural network, and finally performed classification.
In the Chinese patent application "Micro-expression recognition method and system based on a channel attention mechanism" (application No. CN202010687230.5, publication No. CN112001241A), a three-dimensional tensor is formed from the horizontal component, the vertical component, and the magnitude of the optical flow between the peak frame and the initial frame; the tensor is input into a micro-expression recognition network model based on a channel attention mechanism, and a classification result is finally obtained. Because the input of that method is based only on optical flow information, the spatial information of the micro-expression video sequence cannot be effectively extracted.
The Chinese patent application "A micro-expression recognition method based on a 3D convolutional neural network" (application No. CN201610954555.9, publication No. CN106570474A) extracts, for each frame of the micro-expression video sequence, a grayscale channel feature map, horizontal and vertical gradient channel feature maps, and horizontal and vertical optical flow channel feature maps to obtain a feature map group for the sequence to be recognized, then inputs the group into a 3D convolutional neural network for further feature extraction and classification. Because the method processes every frame of the micro-expression video sequence, its computational cost is extremely high; moreover, the training data are not augmented, so the model easily overfits during training.
Although convolutional neural networks have achieved excellent performance in micro-expression recognition, many challenges remain. First, training a convolutional neural network requires a large number of samples, while micro-expression databases are small: the CASME II micro-expression video library has only 256 micro-expression video sequences, which easily causes model overfitting. Second, compared with common expressions, micro-expressions have small motion amplitude and weak intensity; a typical convolutional neural network model tends to focus only on regions with obvious facial changes (such as the mouth and eyes) while ignoring regions with subtle changes, so the model extracts insufficient information. How to improve the model's ability to learn discriminative micro-expression features is therefore an important factor in improving micro-expression recognition accuracy.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems of model overfitting and insufficient extraction of discriminative micro-expression features in micro-expression recognition methods based on convolutional-neural-network features, the invention provides a discriminative feature learning method and system for micro-expression recognition. In addition, to strengthen the model's learning of spatio-temporal discriminative features, the method uses the spatial-stream branch of a dual-stream convolutional neural network to generate a class activation map and uses this map for attention enhancement of the input of the temporal-stream branch.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:
a discriminative feature learning method for micro-expression recognition comprises the following steps:
(1) extracting initial frames and peak frames of video sequence samples in a micro-expression video library;
(2) normalizing the images of the initial frame and the peak frame to a uniform size of N×N pixels, and applying Eulerian motion magnification with different magnification factors to the normalized images to obtain multiple groups of micro-expression initial frame and peak frame images;
(3) calculating the optical flow between each group of micro-expression peak frames and initial frames to obtain an optical flow map;
(4) for each micro-expression peak frame image, selecting an image whose expression category differs from that of the peak frame from a common expression image library, cropping it, and replacing the corresponding region of the peak frame image with the cropped image block to obtain a composite image carrying two different expression category labels; the position of the cropped image block is chosen at random, and its size is controlled by a hyper-parameter uniformly distributed between 0 and 1;
(5) constructing a dual-stream convolutional neural network model based on a class activation map attention mechanism; the model is divided into a temporal-stream branch and a spatial-stream branch. The spatial-stream branch comprises, in order, a feature extraction layer, a global average pooling layer, a fully connected layer, a classification layer, and a class activation map generation layer; the temporal-stream branch comprises, in order, an attention enhancement layer, a feature extraction layer, a global average pooling layer, a fully connected layer, and a classification layer; finally, a decision fusion layer combines the outputs of the two streams' classification layers. The class activation map generation layer outputs a class activation map from the feature map produced by the spatial-stream feature extraction layer and the weights between the fully connected layer and the global average pooling layer; the attention enhancement layer of the temporal-stream branch uses the class activation map output by the spatial-stream branch to apply attention enhancement to the temporal-stream input;
(6) respectively inputting the optical flow map and the composite image into the two branches of the constructed dual-stream convolutional neural network model, and training the model;
(7) extracting the initial frame and the peak frame from an input video sequence, preprocessing them with size normalization and Eulerian motion magnification, and computing the optical flow between the peak frame and the initial frame to obtain an optical flow map; the optical flow map and the preprocessed peak frame image are then input into the two branches of the trained dual-stream convolutional neural network model respectively, and strongly discriminative micro-expression features are extracted for micro-expression classification and recognition.
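Step (3) computes optical flow between the peak frame and the initial frame but, in this section, does not name a particular optical-flow algorithm. As a purely illustrative sketch of the quantity being computed, the snippet below estimates a single displacement vector between the two frames with a whole-image Lucas-Kanade solve; this is an assumption-laden stand-in, not the patent's method:

```python
import numpy as np

def lucas_kanade_global(onset, apex):
    """Estimate one (u, v) flow vector between the onset (initial) and apex
    (peak) frames by solving the Lucas-Kanade normal equations over the
    whole image. Illustrative only: the patent computes a dense optical
    flow map, and the algorithm it uses is not specified here."""
    I1 = onset.astype(np.float64)
    I2 = apex.astype(np.float64)
    Ix = np.gradient(I1, axis=1)   # horizontal spatial gradient
    Iy = np.gradient(I1, axis=0)   # vertical spatial gradient
    It = I2 - I1                   # temporal difference
    # Brightness-constancy constraint Ix*u + Iy*v + It = 0, least squares
    A = np.array([[(Ix * Ix).sum(), (Ix * Iy).sum()],
                  [(Ix * Iy).sum(), (Iy * Iy).sum()]])
    b = -np.array([(Ix * It).sum(), (Iy * It).sum()])
    u, v = np.linalg.solve(A, b)
    return u, v
```

Solving the same normal equations over small windows instead of the whole image yields the dense horizontal/vertical flow maps that serve as the temporal-stream input.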
Further, the step (1) comprises the following substeps:
(1.1) taking a first frame of the micro-expression video sequence as an initial frame of the micro-expression video sequence;
(1.2) setting the total number of frames of the micro-expression video sequence as k, and, from the second frame on, subtracting the first frame image from each frame image to obtain a difference image:

D_m = F_m ⊖ F_1, m = 2, 3, …, k

where ⊖ denotes subtraction of corresponding pixels, m is the frame number, F_m is the m-th frame image, and F_1 is the first frame image;
(1.3) calculating the sum of pixel values of each difference image:

S_m = Σ_i Σ_j D_m(i, j), m = 2, 3, …, k

where D_m(i, j) is the pixel value of the difference image at coordinate (i, j);
(1.4) obtaining the frame number of the peak frame image:

p = argmax_m S_m;
(1.5) the peak frame image is the p-th frame image F_p in the micro-expression video sequence corresponding to frame number p.
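In code, sub-steps (1.2)–(1.4) amount to picking the frame farthest, in summed pixel difference, from the initial frame. A minimal NumPy sketch, assuming the pixel subtraction ⊖ is taken as an absolute difference:

```python
import numpy as np

def find_peak_frame(frames):
    """frames: list of equally sized grayscale images; frames[0] is the
    initial frame F_1. Returns the 1-based number p of the peak frame,
    i.e. the frame whose summed pixel difference from F_1 is largest.
    Assumes the patent's pixel subtraction is an absolute difference."""
    onset = frames[0].astype(np.float64)
    # S_m = sum over (i, j) of D_m(i, j), for m = 2 .. k
    sums = [np.abs(f.astype(np.float64) - onset).sum() for f in frames[1:]]
    return int(np.argmax(sums)) + 2  # +2: frame numbers are 1-based, m starts at 2
```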
Further, the step (4) comprises the following substeps:
(4.1) setting the micro-expression peak frame image from step (3) as G with category label l_G, and selecting a common expression image O whose class label l_O differs from that of the micro-expression peak frame image;
(4.2) normalizing the size of the common expression image O to N×N pixels, the same size as the micro-expression peak frame image;
(4.3) generating the bounding-box coordinates of the clipping region, R = (C_x, C_y, C_h, C_w), in order to remove the pixels of the micro-expression peak frame image G inside the clipping region and replace them with the corresponding pixels of the common expression image O, where C_x and C_y are the abscissa and ordinate of the box center and C_h and C_w are the height and width of the box:

C_h = N·√(1 − δ), C_w = N·√(1 − δ)

where δ is a hyper-parameter obeying a uniform distribution between 0 and 1, and C_x and C_y obey a uniform distribution between 0 and N;
(4.4) generating a binary mask T ∈ {0, 1}^{N×N} from the clipping-region bounding box R; the mask T has size N×N and consists of 0s and 1s, with value 0 inside the region corresponding to the bounding box R and value 1 elsewhere;
(4.5) generating the composite image containing two different expression category labels from the binary mask T:

G̃ = T ⊙ G + (I − T) ⊙ O

where G̃ is the resulting composite image, I is an N×N mask whose values are all 1, and ⊙ denotes element-wise multiplication.
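Sub-steps (4.3)–(4.5) can be sketched as follows. The box side N·√(1−δ) is an assumption, chosen so that a fraction δ of G survives, matching the δ/(1−δ) loss weighting of step (6); the patent only states that δ is uniform on [0, 1]:

```python
import numpy as np

def make_composite(G, O, delta, rng):
    """Replace a random box of the N x N peak frame G with the matching
    region of the common expression image O. Returns the composite image
    and the binary mask T (0 inside the cut region, 1 elsewhere)."""
    N = G.shape[0]
    side = int(round(N * np.sqrt(1.0 - delta)))   # assumed box side
    cx, cy = rng.integers(0, N, size=2)           # box centre ~ U(0, N)
    x0, x1 = max(cx - side // 2, 0), min(cx + side // 2, N)
    y0, y1 = max(cy - side // 2, 0), min(cy + side // 2, N)
    T = np.ones((N, N))                           # binary mask
    T[y0:y1, x0:x1] = 0.0
    return T * G + (1.0 - T) * O, T               # T*G + (I - T)*O
```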
Further, the specific structure of the spatial-stream branch of the dual-stream convolutional neural network model based on the class activation map attention mechanism constructed in step (5) is as follows:
the feature extraction layer of the spatial stream branch extracts the features of the spatial stream branch input to obtain a multi-channel feature mapThe size of the characteristic diagram is H multiplied by H, and the number of channels is c;
global average pooling layer of spatial stream branches, feature map output from feature extraction layer using H × H pooling kernelConversion to c eigenvalues:
in the above formula θ n The nth feature value representing the global average pooling layer output,representing the value of the nth channel feature map at the coordinate (i, j) position;
the full-connection layer of the spatial flow branch fully connects the output of the global average pooling layer to v output neurons, and outputs a v-dimensional feature vector:
xi in the above formula n The nth characteristic value representing the output of the full connection layer,representing the weight of the connection between the nth output neuron of the full connection layer and the jth characteristic value output by the global average pooling layer;
the Softmax classification layer of the spatial flow branch fully connects the feature vectors output by the full connection layer to v output nodes corresponding to the expression classes, a v-dimensional feature vector is output, the number of each dimension in the vector represents the probability of belonging to the class, wherein v is the number of the classes;
the class activation graph generation layer of the spatial stream branch generates a class activation graph corresponding to a certain class:
in the above formula M j A j-th channel feature map representing the output of the feature extraction layer,a weight representing the connection of the nth output neuron of the full connection layer with the jth eigenvalue output by the global average pooling layer,the class activation graph corresponding to the nth class is represented and has the size of H multiplied by H, the class activation graph generation layer outputs the class activation graph corresponding to the micro expression peak frame image label during training, and the class activation graph generation layer outputs a spatial stream branch when the micro expression is identified by using the trained double-current convolutional neural network modelClass activation graph of the class with highest probability in Softmax classification layer.
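The generation layer above follows the standard CAM construction: a weighted sum of the channel feature maps using the fully connected weights of the target class. A minimal sketch:

```python
import numpy as np

def class_activation_map(features, fc_weights, n):
    """features: (c, H, H) channel maps M_j from the feature extraction
    layer; fc_weights: (v, c) weights w_{n,j} between the fully connected
    layer and the global-average-pooled values. Returns the H x H class
    activation map for class n as the weighted sum over channels."""
    return np.einsum('j,jhw->hw', fc_weights[n], features)
```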
Further, the specific structure of the temporal-stream branch of the dual-stream convolutional neural network model based on the class activation map attention mechanism constructed in step (5) is as follows:
the attention enhancement layer of the temporal streaming branch performs attention enhancement on the input of the temporal streaming branch by using the class activation map of the spatial streaming branch output: the size of the class activation graph is first aligned with the input of the time flow leg:
in the above formula, Upsample () is an upsampling function, the size of the class activation map is changed from H × H to N × N by the upsampling function, and then the value on the class activation map is mapped to be between 0 and 1:
sig () is a Sigmoid function, which maps values on the class activation graph to be between 0 and 1, and finally performs attention mechanism enhancement on the input of the time stream branch by using the class activation graph:
in the above equation a is the input of the time stream branch,is the input after the attention mechanism enhancement, I is a mask of size N x N, all values of 1,representing the multiplication of corresponding elements;
the time flow branch input subjected to attention enhancement sequentially passes through a feature extraction layer, a global average pooling layer and a full connection layer of the time flow branch, and finally the probability that the time flow input belongs to each category is output through a time flow Softmax classification layer.
Further, the step (6) comprises the following substeps:
(6.1) initializing network weights using a random initialization method;
(6.2) inputting the composite image from step (4) into the spatial-stream branch of the dual-stream convolutional neural network, and constructing the spatial-stream loss function from the output of the spatial-stream Softmax classification layer:
L_s = (−φ_s[l_G] + log(Σ_j exp(φ_s[j]))) · δ + (−φ_s[l_O] + log(Σ_j exp(φ_s[j]))) · (1 − δ)

where l_G and l_O are the category labels of the micro-expression peak frame image G and of the common expression image O used in the synthesis, δ is the hyper-parameter from step (4), φ_s[j] is the value of the spatial-stream Softmax classification layer output corresponding to class label j, and φ_s[l_G] and φ_s[l_O] are the values corresponding to the class labels l_G and l_O;
(6.3) inputting the optical flow map from step (3) into the temporal-stream branch of the dual-stream convolutional neural network, and constructing the temporal-stream loss function from the output of the temporal-stream Softmax classification layer:
L_t = −φ_t[l_G] + log(Σ_j exp(φ_t[j]))

where φ_t[l_G] is the value of the temporal-stream Softmax classification layer output corresponding to class label l_G and φ_t[j] is the value corresponding to class label j;
(6.4) adding the spatial-stream and temporal-stream loss functions to obtain the total loss function of the dual-stream convolutional neural network:

L_sum = L_t + L_s

and performing gradient computation and weight updating on the dual-stream convolutional neural network model according to the total loss function L_sum;
(6.5) obtaining the trained dual-stream convolutional neural network model through repeated iterative training.
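The two loss terms of sub-steps (6.2)–(6.4) can be written out directly from the Softmax-layer inputs. A NumPy sketch (a deep-learning framework would normally compute these with built-in cross-entropy ops):

```python
import numpy as np

def cross_entropy(phi, label):
    """-phi[l] + log(sum_j exp(phi[j])), as written in steps (6.2)-(6.3)."""
    return -phi[label] + np.log(np.exp(phi).sum())

def total_loss(phi_s, phi_t, l_G, l_O, delta):
    """L_sum = L_t + L_s: the spatial loss mixes the composite image's two
    labels with weights delta and 1 - delta; the temporal loss is the plain
    cross-entropy on the micro-expression label."""
    L_s = delta * cross_entropy(phi_s, l_G) + (1.0 - delta) * cross_entropy(phi_s, l_O)
    L_t = cross_entropy(phi_t, l_G)
    return L_t + L_s
```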
Based on the same inventive concept, the invention discloses a discriminative feature learning system for micro-expression recognition, which comprises:
the preprocessing module is used for extracting the initial frame and the peak frame of each video sequence sample in the micro-expression video library, normalizing the images of the initial frame and the peak frame to a uniform size of N×N pixels, and applying Eulerian motion magnification with different magnification factors to the normalized images to obtain multiple groups of micro-expression initial frame and peak frame images;
the optical flow information calculation module is used for calculating the optical flow between each group of micro-expression peak frames and initial frames to obtain an optical flow map;
the image synthesis module is used for selecting, for each micro-expression peak frame image, an image whose expression category differs from that of the peak frame from a common expression image library, cropping it, and replacing the corresponding region of the peak frame image with the cropped image block to obtain a composite image carrying two different expression category labels; the position of the cropped image block is chosen at random, and its size is controlled by a hyper-parameter uniformly distributed between 0 and 1;
the network model building and training module is used for building a dual-stream convolutional neural network model based on a class activation map attention mechanism. The model is divided into a temporal-stream branch and a spatial-stream branch; the spatial-stream branch comprises, in order, a feature extraction layer, a global average pooling layer, a fully connected layer, a classification layer, and a class activation map generation layer; the temporal-stream branch comprises, in order, an attention enhancement layer, a feature extraction layer, a global average pooling layer, a fully connected layer, and a classification layer; finally, a decision fusion layer combines the outputs of the two streams' classification layers. The class activation map generation layer outputs a class activation map from the feature map produced by the spatial-stream feature extraction layer and the weights between the fully connected layer and the global average pooling layer; the attention enhancement layer of the temporal-stream branch uses the class activation map output by the spatial-stream branch to apply attention enhancement to the temporal-stream input. The module inputs the optical flow map and the composite image into the two branches of the constructed model respectively and trains the model;
and the micro-expression recognition module is used for extracting the initial frame and the peak frame from an input video sequence, preprocessing them with size normalization and Eulerian motion magnification, computing the optical flow between the peak frame and the initial frame to obtain an optical flow map, inputting the optical flow map and the preprocessed peak frame image into the two branches of the trained dual-stream convolutional neural network model respectively, and extracting strongly discriminative micro-expression features for micro-expression classification and recognition.
Based on the same inventive concept, the invention discloses a discriminative feature learning system for micro-expression recognition comprising at least one computing device, the computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the above discriminative feature learning method for micro-expression recognition.
Beneficial effects: compared with the prior art, the invention has the following advantages:
(1) the invention constructs a dual-stream convolutional neural network based on a class activation map attention mechanism in which the spatial-stream and temporal-stream branches are not independent of each other: the spatial-stream branch generates a class activation map, which is used for attention enhancement of the temporal-stream input. The class activation map generated by the spatial-stream branch indicates which regions in the spatial domain carry strongly discriminative micro-expression features; so that the model also attends to the regions with strongly discriminative features in the temporal domain, this class activation map is used to apply attention enhancement to the input of the temporal-stream branch. The class activation map is prior knowledge produced by the spatial-stream branch; supplemented with this prior knowledge, the temporal-stream branch strengthens the model's ability to learn spatio-temporal discriminative features, thereby improving micro-expression recognition accuracy;
(2) in the model training stage, the constructed dual-stream convolutional neural network model is trained with a composite image and an optical flow map, where the composite image carries two different expression category labels: a micro-expression peak frame category label and a common expression category label. This has three benefits. First, for a micro-expression peak frame image of a given category, composite images are obtained from common expression images of other categories, which further augments the training samples and prevents the model from overfitting. Second, the composite image contains both a common expression part and a micro-expression peak frame part; compared with a whole micro-expression peak frame image, which is hard to recognize, the composite image is better suited to training the network: because a common expression is easier to recognize than a micro-expression, the network's main task when trained on composite images is to recognize the micro-expression part, i.e. to extract the discriminative micro-expression features of only a certain region, which reduces the training difficulty of the model. Third, during training, the micro-expression peak frame label of the composite image, combined with the network's loss function, guides the model to learn the features of the peak frame regions that were not replaced by the common expression image; since the replaced regions are random, i.e. every part of the micro-expression image may be replaced, the model fully learns the discriminative micro-expression features of every facial region as training proceeds, rather than attending only to specific regions where facial changes are obvious (such as the mouth and eyes);
(3) the discriminative feature learning method for micro-expression recognition provided by the invention realizes automatic feature extraction through end-to-end training, without manually designing a feature extractor, and is simple and efficient.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
fig. 2 is a structural diagram of a dual-flow convolutional neural network based on a class activation graph attention machine mechanism constructed in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
As shown in fig. 1, the discriminative feature learning method for micro-expression recognition disclosed in the embodiment of the present invention specifically includes the following steps:
Step (1): extracting the initial frame and the peak frame of each video sequence sample in the micro-expression video library. In this embodiment, with the CASME II database as the data source, the method for extracting the initial frame and the peak frame of each micro-expression video sequence sample specifically includes the following sub-steps:
(1.1) taking the first frame F_1 of the micro-expression video sequence as the initial frame of the micro-expression image sequence;
(1.2) setting the total frame number of the micro-expression video sequence as k, and subtracting the first frame image from each frame image from the second frame onward to obtain difference images:

D_m = F_m ⊖ F_1, m = 2, 3, …, k

where ⊖ represents subtraction of corresponding pixels, m represents the frame number, F_m represents the m-th frame image, and F_1 is the first frame image, i.e. the initial frame image;
(1.3) calculating the sum of the pixel values of each difference image:

S_m = Σ_{i,j} D_m(i,j), m = 2, 3, …, k

where D_m(i,j) represents the pixel value of the difference image at the coordinate (i,j) position;
(1.4) obtaining the frame number of the peak frame image:

p = argmax_m S_m, m = 2, 3, …, k;
(1.5) the peak frame image is the p-th frame image F_p in the micro-expression video sequence corresponding to the frame number p.
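Sub-steps (1.1)–(1.5) amount to picking the frame that differs most from the onset frame. A minimal sketch in NumPy (function and variable names are illustrative, not from the patent; indices here are 0-based):

```python
import numpy as np

def find_peak_frame(frames):
    """Locate the peak (apex) frame of a micro-expression clip by frame
    differencing against the first (onset) frame, as in sub-steps (1.2)-(1.4)."""
    onset = frames[0].astype(np.int32)
    # S_m: sum of absolute pixel differences between frame m and the onset frame
    sums = [np.abs(f.astype(np.int32) - onset).sum() for f in frames[1:]]
    # the frame with the largest total difference is taken as the peak frame
    return int(np.argmax(sums)) + 1  # +1 restores the position within the clip

# tiny synthetic clip: the frame at index 2 deviates most from the onset
clip = [np.zeros((4, 4), np.uint8),
        np.full((4, 4), 10, np.uint8),
        np.full((4, 4), 50, np.uint8),
        np.full((4, 4), 20, np.uint8)]
print(find_peak_frame(clip))  # → 2
```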
Step (2): normalizing the sizes of the initial frame images and the peak frame images obtained in step (1) to a uniform N × N pixels (N may be chosen from 112 to 448), and performing Euler motion magnification on the normalized images with different amplification factors (the amplification factor α may be chosen from 2 to 20) to obtain multiple groups of micro-expression initial frame and peak frame images. In this embodiment, the image size is set to 224 × 224 pixels, and Euler motion magnification with amplification factors of 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12 is applied, so that slight facial changes are amplified; after Euler motion magnification with these ten different amplification factors, the sample size is ten times the original.
And (3): and calculating the optical flow information between each group of micro expression peak frames and the initial frame to obtain an optical flow graph. The method specifically comprises the following substeps:
(3.1) calculating the optical flow information between each peak frame and its initial frame using the DeepFlow algorithm to obtain optical flow graphs U_x and U_y along the x-axis and y-axis directions, and obtaining another optical flow graph U_z by taking, at every position, the square root of the sum of the squared optical flow values of U_x and U_y:

U_z(i,j) = √(U_x(i,j)² + U_y(i,j)²)

where U_z(i,j), U_x(i,j) and U_y(i,j) represent the optical flow values at the coordinate (i,j) position of the optical flow graphs U_z, U_x and U_y respectively;
(3.2) linearly rescaling U_x, U_y and U_z to the interval [0, 1]:

Ū_d(i,j) = (U_d(i,j) − min(U_d)) / (max(U_d) − min(U_d)), d ∈ {x, y, z}

where min(U_d) and max(U_d) represent the minimum and maximum optical flow values in the corresponding optical flow graph, Ū_d is the optical flow graph after the linear change, and Ū_d(i,j) is its optical flow value at the coordinate (i,j) position;
(3.3) stacking Ū_x, Ū_y and Ū_z to form the final three-channel optical flow graph U with dimensions 224 × 224 × 3.
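Steps (3.1)–(3.3) can be sketched as follows, assuming the two flow components are already computed (e.g. by DeepFlow); the helper below only performs the magnitude, rescaling and stacking steps, with names of our own choosing:

```python
import numpy as np

def build_flow_map(u_x, u_y):
    """Combine horizontal/vertical optical flow into the three-channel map
    of step (3): magnitude channel U_z, then each channel rescaled to [0, 1].

    u_x, u_y: H x W float arrays of flow components.
    Returns an H x W x 3 array.
    """
    u_z = np.sqrt(u_x ** 2 + u_y ** 2)  # per-pixel flow magnitude

    def rescale(u):
        # linear change of the flow values into the [0, 1] interval
        lo, hi = u.min(), u.max()
        return (u - lo) / (hi - lo) if hi > lo else np.zeros_like(u)

    return np.stack([rescale(u_x), rescale(u_y), rescale(u_z)], axis=-1)

ux = np.array([[0.0, 1.0], [2.0, 3.0]])
uy = np.array([[3.0, 2.0], [1.0, 0.0]])
flow = build_flow_map(ux, uy)
print(flow.shape)  # (2, 2, 3)
```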
Step (4): for each micro-expression peak frame image, obtaining composite images using ordinary expression images whose categories differ from that of the peak frame image. Specifically, an image whose expression category differs from that of the peak frame is selected from an ordinary expression image library, the image is cropped, and the corresponding region of the peak frame image is replaced by the cropped image block, yielding a composite image containing two different expression class labels. In this embodiment, the ordinary expression images come from the FerPlus database, which has 7 ordinary expression categories; only the ordinary expression images of the three categories shared with the micro-expression classes, namely happiness, surprise and disgust, are used. For a micro-expression peak frame image of a given category, composite images are obtained with ordinary expression images of each different category, further enlarging the sample size. For example, if the expression category of the micro-expression peak frame is happiness, composite images are obtained using ordinary expression images of the categories surprise and disgust; if the expression category of the micro-expression peak frame is repression, composite images are obtained using ordinary expressions of the categories happiness, surprise and disgust. The method for obtaining a composite image from an ordinary expression image and a micro-expression peak frame image includes the following sub-steps:
(4.1) setting the peak frame image in step (3) as G with corresponding class label l_G, and selecting an ordinary expression image O whose class label l_O differs from that of the micro-expression peak frame image;
(4.2) normalizing the scale of the ordinary expression image O to 224 × 224 pixels, the same size as the micro-expression peak frame image;
(4.3) generating the coordinates R = (C_x, C_y, C_h, C_w) of the bounding box of the clipping region, in order to remove the pixels of the micro-expression peak frame image G within the clipping region and replace them with the pixels of the ordinary expression O in the corresponding region:

C_h = C_w = N√(1 − δ)

where N is the normalized scale in step (2), i.e. 224, δ is a hyper-parameter obeying a uniform distribution between 0 and 1 (so that the replaced area fraction is 1 − δ), and C_x, C_y obey a uniform distribution between 0 and N;
(4.4) generating a binary mask T ∈ {0,1}^{N×N} from the clipping region bounding box R; the mask T has size N × N and consists of 0s and 1s, its value being 0 in the region corresponding to the clipping region bounding box R and 1 elsewhere;
(4.5) generating from the binary mask T a composite image G̃ containing two different expression class labels:

G̃ = T ⊙ G + (1 − T) ⊙ O

where ⊙ denotes multiplication of corresponding elements, applied to every channel.
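A minimal sketch of sub-steps (4.3)–(4.5). The crop-side formula N·√(1 − δ) is an assumption borrowed from the usual CutMix convention (the patent's own formula image is not reproduced here); it makes the replaced area fraction 1 − δ, matching the δ/(1 − δ) label weights used later in the loss:

```python
import numpy as np

def make_composite(g, o, rng=None):
    """CutMix-style synthesis of step (4): replace a random region of the
    micro-expression peak frame `g` with the same region of an ordinary
    expression image `o`. Names and the crop-side convention are assumptions."""
    if rng is None:
        rng = np.random.default_rng()
    n = g.shape[0]                                   # images are n x n (x channels)
    delta = float(rng.uniform())                     # label-mixing weight, U(0, 1)
    half = int(n * np.sqrt(1.0 - delta)) // 2        # half the crop side N*sqrt(1-delta)
    cx, cy = int(rng.integers(0, n)), int(rng.integers(0, n))  # crop centre
    x1, x2 = max(cx - half, 0), min(cx + half, n)
    y1, y2 = max(cy - half, 0), min(cy + half, n)
    mask = np.ones((n, n), g.dtype)                  # binary mask T: 0 inside the crop
    mask[y1:y2, x1:x2] = 0
    composite = g * mask[..., None] + o * (1 - mask[..., None])
    return composite, delta

g = np.full((8, 8, 3), 200, np.uint8)   # stand-in micro-expression peak frame
o = np.zeros((8, 8, 3), np.uint8)       # stand-in ordinary expression image
img, delta = make_composite(g, o, np.random.default_rng(0))
print(img.shape)  # (8, 8, 3)
```

The returned δ is kept alongside the composite image because the training loss in step (6.2) weights the two class labels by δ and 1 − δ.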
Step (5): constructing a dual-stream convolutional neural network model based on the class activation graph attention mechanism. As shown in FIG. 2, the network model can be divided into a temporal stream branch and a spatial stream branch. The spatial stream branch comprises, in sequence, a feature extraction layer, a global average pooling layer, a fully connected layer, a Softmax classification layer and a class activation graph generation layer; the temporal stream branch comprises, in sequence, an attention enhancement layer, a feature extraction layer, a global average pooling layer, a fully connected layer and a Softmax classification layer; finally a decision fusion layer merges the outputs of the two streams' Softmax classification layers. The specific functions of the layers are as follows:
the feature extraction layer of the spatial stream branch extracts features from the input of the spatial stream branch to obtain a multi-channel feature map M ∈ R^{H×H×c}, with feature map size H × H and channel number c. The feature extraction layer may use the feature extraction part of any convolutional neural network in deep learning (e.g. ResNet, VGGNet, AlexNet). In this embodiment the multi-channel feature map is M ∈ R^{7×7×512}, that is, the feature map size is 7 × 7 and the number of channels is 512, and the feature extraction layer adopts the feature extraction part of ResNet-18 (i.e. the part from the first convolutional layer of ResNet-18 to the end of the last convolutional layer);
the global average pooling layer of the spatial stream branch converts the feature map M output by the feature extraction layer into c feature values using an H × H pooling kernel:

θ_n = (1/H²) Σ_{i=1}^{H} Σ_{j=1}^{H} M_n(i,j), n = 1, 2, …, c

where θ_n represents the n-th feature value output by the global average pooling layer and M_n(i,j) represents the value at the coordinate (i,j) position of the n-th channel feature map;
the fully connected layer of the spatial stream branch fully connects the output of the global average pooling layer to v output neurons and outputs a v-dimensional feature vector:

ξ_n = Σ_{j=1}^{c} w_{n,j} θ_j, n = 1, 2, …, v

where ξ_n represents the n-th feature value output by the fully connected layer and w_{n,j} represents the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer;
the Softmax classification layer of the spatial stream branch fully connects the feature vector output by the fully connected layer to v output nodes corresponding to the expression classes and outputs a v-dimensional vector in which each dimension represents the probability of belonging to the corresponding class, where v is the number of classes; the micro-expression video library CASME II adopted in this embodiment has five micro-expression categories, namely happiness, surprise, disgust, repression and others, so v is 5;
the class activation graph generation layer of the spatial stream branch generates the class activation graph corresponding to a given class:

A_n = Σ_{j=1}^{c} w_{n,j} M_j

where M_j represents the j-th channel feature map output by the feature extraction layer, w_{n,j} represents the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer, and A_n is the class activation graph corresponding to the n-th class, of size H × H; in this embodiment the class activation graph size is 7 × 7. Because labels are available during training, the class activation graph generation layer outputs the class activation graph of the class corresponding to the micro-expression peak frame image label; when the trained dual-stream convolutional neural network is used for micro-expression recognition, it outputs the class activation graph of the class with the highest probability in the spatial stream branch Softmax classification layer;
the attention enhancement layer of the temporal stream branch performs attention enhancement on the input of the temporal stream branch using the class activation graph output by the spatial stream branch. The size of the class activation graph is first aligned with the input of the temporal stream branch:

A′_n = Upsample(A_n)

where Upsample() is an upsampling function that changes the size of the class activation graph from H × H to N × N; the values on the class activation graph are then mapped to between 0 and 1:
A″_n = Sig(A′_n)

where Sig() is the Sigmoid function, which maps the values on the class activation graph to between 0 and 1; finally the class activation graph is used to apply attention mechanism enhancement to the input of the temporal stream branch:

Ã = (I + A″_n) ⊙ A

where A is the input of the temporal stream branch, Ã is the input after attention mechanism enhancement, I is a mask of size N × N whose values are all 1, and ⊙ denotes multiplication of corresponding elements;
the attention-enhanced input then passes in sequence through the feature extraction layer, global average pooling layer and fully connected layer of the temporal stream branch, whose structures are the same as those of the spatial stream, and finally the probability that the stream's input belongs to each category is output through its Softmax classification layer;
the decision fusion layer adds the output of the time flow branch Softmax classification layer and the output of the space flow branch Softmax classification layer to obtain a category score of the whole double-flow convolutional neural network for input prediction, and takes the category corresponding to the maximum score as a final classification result;
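The two branches, the class activation graph generation layer and the attention enhancement layer described above can be sketched in PyTorch as follows. This is an illustrative miniature, not the patent's exact network: a small plain-convolution backbone stands in for ResNet-18, and all class and variable names are our own:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamCAMNet(nn.Module):
    """Minimal sketch of the dual-stream network of step (5)."""
    def __init__(self, num_classes=5, channels=32):
        super().__init__()
        def backbone(in_ch):  # stand-in for the ResNet-18 feature extraction part
            return nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU())
        self.spatial_feat = backbone(3)
        self.temporal_feat = backbone(3)
        self.spatial_fc = nn.Linear(channels, num_classes)   # weights define the CAMs
        self.temporal_fc = nn.Linear(channels, num_classes)

    def forward(self, peak_img, flow_map, label=None):
        # spatial stream: features -> global average pooling -> FC logits
        fmap = self.spatial_feat(peak_img)                   # (B, c, H, H)
        phi_s = self.spatial_fc(fmap.mean(dim=(2, 3)))       # (B, v) logits
        # class activation graph of the labelled class (training) or top class
        cls = label if label is not None else phi_s.argmax(1)
        w = self.spatial_fc.weight[cls]                      # (B, c)
        cam = torch.einsum('bc,bchw->bhw', w, fmap)          # A_n, size (B, H, H)
        # attention enhancement of the temporal-stream input: (I + Sig(Upsample(A))) * A
        att = torch.sigmoid(F.interpolate(cam.unsqueeze(1),
                                          size=flow_map.shape[-2:],
                                          mode='bilinear', align_corners=False))
        flow_att = (1.0 + att) * flow_map
        # temporal stream and decision fusion (sum of the two softmax outputs)
        phi_t = self.temporal_fc(self.temporal_feat(flow_att).mean(dim=(2, 3)))
        return phi_s.softmax(1) + phi_t.softmax(1)

net = DualStreamCAMNet()
scores = net(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224),
             label=torch.tensor([0, 1]))
print(scores.shape)  # torch.Size([2, 5])
```

The predicted class is then `scores.argmax(1)`, i.e. the category with the maximum fused score.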
and (6): training the double-current convolutional neural network constructed in the step (5) by using the data obtained in the steps (3) and (4), wherein the training comprises the following sub-steps:
(6.1) initializing network weights using a random initialization method;
(6.2) inputting the composite image G̃ obtained in step (4) into the spatial stream branch of the dual-stream convolutional neural network, and constructing the loss function of the spatial stream branch from the output of the spatial stream branch Softmax classification layer:

L_s = (−φ_s[l_G] + log Σ_j exp(φ_s[j])) × δ + (−φ_s[l_O] + log Σ_j exp(φ_s[j])) × (1 − δ)

where l_G and l_O are respectively the class label corresponding to the micro-expression peak frame image G and the class label corresponding to the ordinary expression O in step (4), δ is the hyper-parameter in step (4), φ_s[j] represents the value in the spatial stream branch Softmax classification layer output corresponding to class label j, φ_s[l_G] represents the value corresponding to class label l_G, and φ_s[l_O] represents the value corresponding to class label l_O;
(6.3) inputting the optical flow graph U obtained in step (3) into the temporal stream branch of the dual-stream convolutional neural network, and constructing the loss function of the temporal stream branch from the output of the temporal stream branch Softmax classification layer:

L_t = −φ_t[l_G] + log Σ_j exp(φ_t[j])

where φ_t[l_G] represents the value in the temporal stream branch Softmax classification layer output corresponding to class label l_G and φ_t[j] represents the value corresponding to class label j;
(6.4) adding the spatial stream loss function and the temporal stream loss function to obtain the total loss function of the dual-stream convolutional neural network:

L_sum = L_t + L_s

and performing gradient calculation and weight updating on the dual-stream convolutional neural network model according to the total loss function L_sum using the back-propagation algorithm;
(6.5) performing multiple iterations of training (e.g. 50) to obtain the trained dual-stream convolutional neural network model.
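The spatial stream loss of sub-step (6.2) is a δ-weighted mix of two cross-entropy terms, since −φ[l] + log Σ_j exp(φ[j]) is exactly the cross-entropy of logits φ against label l. A short PyTorch sketch with illustrative names:

```python
import torch
import torch.nn.functional as F

def mixed_spatial_loss(phi_s, l_g, l_o, delta):
    """L_s of sub-step (6.2): cross-entropy against the micro-expression label
    l_g weighted by delta, plus cross-entropy against the ordinary-expression
    label l_o weighted by 1 - delta (the label mixing of the composite image).

    phi_s: (B, v) logits feeding the spatial-stream Softmax layer.
    """
    return (delta * F.cross_entropy(phi_s, l_g)
            + (1.0 - delta) * F.cross_entropy(phi_s, l_o))

logits = torch.tensor([[2.0, 0.5, 0.1, 0.0, 0.0]])
loss = mixed_spatial_loss(logits, torch.tensor([0]), torch.tensor([2]), 0.7)
print(float(loss) > 0)  # True
```

The temporal stream loss L_t is the plain cross-entropy against l_G alone, and the total loss is L_sum = L_t + L_s.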
Step (7): extracting the initial frame and the peak frame from an input video sequence and preprocessing them with size normalization and Euler motion magnification (during recognition, a single specific amplification factor is used); then calculating the optical flow information between the peak frame and the initial frame to obtain an optical flow graph; finally inputting the optical flow graph and the preprocessed peak frame image into the two branches of the trained dual-stream convolutional neural network model respectively, and extracting highly discriminative micro-expression features for micro-expression classification and recognition.
Based on the same inventive concept, the embodiment of the invention discloses a discriminative characteristic learning system for micro expression recognition, which comprises:
the preprocessing module is used for extracting the initial frame and the peak frame of the video sequence sample in the micro-expression video library; normalizing the sizes of the images of the initial frame and the peak frame to be uniform into NxN pixels, and amplifying the normalized images by using different amplification factors to perform Euler motion to obtain a plurality of groups of images of the micro expression initial frame and the peak frame;
the optical flow information calculation module is used for calculating optical flow information between each group of micro expression peak frames and the initial frame to obtain an optical flow graph;
the image synthesis module is used for selecting an image with an expression type different from that of the peak frame from a common expression image library for each micro expression peak frame image, cutting the image, and replacing a corresponding area of the peak frame image with the cut image block to obtain a synthesized image containing two different expression type labels; the positions of the image blocks to be cut are randomly selected, and the sizes of the image blocks to be cut are controlled by the hyper-parameters which are uniformly distributed from 0 to 1;
the network model building and training module is used for building a double-current convolutional neural network model based on a class activation graph attention machine mechanism, the model is divided into a time flow branch and a space flow branch, the space flow branch sequentially comprises a feature extraction layer, a global average pooling layer, a full connection layer, a classification layer and a class activation graph generation layer, the time flow branch sequentially comprises an attention enhancement layer, a feature extraction layer, a global average pooling layer, a full connection layer and a classification layer, and finally the output of the two stream classification layers is combined by a decision fusion layer; the class activation graph generation layer outputs a class activation graph according to the feature graph output by the feature extraction layer of the spatial stream branch and the weight between the full connection layer and the global average pooling layer; the attention enhancement layer of the time stream branch utilizes the class activation map output by the spatial stream branch to carry out attention enhancement on the input of the time stream branch; respectively inputting the light flow graph and the synthetic image into two branches of the constructed double-current convolution neural network model, and training the model;
and the micro-expression recognition module is used for extracting an initial frame and a peak frame from an input video sequence, carrying out size normalization and Euler motion amplification pretreatment on the initial frame and the peak frame, further calculating optical flow information between the peak frame and the initial frame to obtain an optical flow diagram, respectively inputting the optical flow diagram and the pretreated peak frame image into two branches of a trained double-current convolution neural network model, extracting and obtaining micro-expression characteristics with strong discriminative power, and using the micro-expression characteristics for micro-expression classification recognition.
Based on the same inventive concept, the embodiment of the invention also discloses a discriminative feature learning system for micro-expression recognition, comprising at least one computing device, the computing device comprising a memory, a processor and a computer program stored on the memory and runnable on the processor, wherein the computer program, when loaded into the processor, implements the above discriminative feature learning method for micro-expression recognition.
The technical solutions described above represent only the preferred technical solutions of the present invention; modifications to some parts that may be made by those skilled in the art without departing from the principles of the present invention all fall within the protection scope of the present invention.
Claims (8)
1. A method for discriminative feature learning for microexpression recognition, the method comprising the steps of:
(1) extracting initial frames and peak frames of video sequence samples in a micro-expression video library;
(2) normalizing the sizes of the images of the initial frame and the peak frame to be uniform into NxN pixels, and amplifying the normalized images by using different amplification factors to perform Euler motion to obtain a plurality of groups of images of the micro expression initial frame and the peak frame;
(3) calculating optical flow information between each group of micro expression peak frames and the initial frame to obtain an optical flow graph;
(4) for each micro expression peak value frame image, selecting an image of which the expression type is different from that of the peak value frame from a common expression image library, cutting the image, and replacing a corresponding area of the peak value frame image with an image block obtained by cutting to obtain a composite image containing two different expression type labels; the positions of the image blocks to be cut are randomly selected, and the sizes of the image blocks to be cut are controlled by the superparameters which are uniformly distributed from 0 to 1;
(5) constructing a double-current convolution neural network model based on a class activation graph attention mechanism; the model is divided into a time flow branch and a space flow branch, the space flow branch sequentially comprises a feature extraction layer, a global average pooling layer, a full connection layer, a classification layer and a class activation map generation layer, the time flow branch sequentially comprises an attention enhancement layer, a feature extraction layer, a global average pooling layer, a full connection layer and a classification layer, and finally, a decision fusion layer is used for combining the outputs of the two flow classification layers; the class activation graph generation layer outputs a class activation graph according to the feature graph output by the feature extraction layer of the spatial stream branch and the weight between the full connection layer and the global average pooling layer; the attention enhancement layer of the time stream branch utilizes the class activation map output by the spatial stream branch to carry out attention enhancement on the input of the time stream branch;
(6) respectively inputting the light flow graph and the synthetic image into two branches of the constructed double-current convolution neural network model, and training the model;
(7) extracting an initial frame and a peak frame from an input video sequence, carrying out size normalization and Euler motion amplification pretreatment on the initial frame and the peak frame, further calculating optical flow information between the peak frame and the initial frame to obtain an optical flow diagram, respectively inputting the optical flow diagram and a pretreated peak frame image into two branches of a trained double-current convolution neural network model, extracting and obtaining micro-expression characteristics with strong discriminative power, and using the micro-expression characteristics for micro-expression classification identification.
2. The method for learning distinctive features for micro expression recognition according to claim 1, wherein the step (1) comprises the following sub-steps:
(1.1) using a first frame of the micro-expression video sequence as an initial frame of the micro-expression video sequence;
(1.2) setting the total frame number of the micro-expression video sequence as k, and subtracting the first frame image from each frame image from the second frame onward to obtain difference images:

D_m = F_m ⊖ F_1, m = 2, 3, …, k

where ⊖ represents subtraction of corresponding pixels, m represents the frame number, F_m represents the m-th frame image, and F_1 is the first frame image;
(1.3) calculating the sum of the pixel values of each difference image:

S_m = Σ_{i,j} D_m(i,j), m = 2, 3, …, k

where D_m(i,j) represents the pixel value of the difference image at the coordinate (i,j) position;
(1.4) obtaining the frame number of the peak frame image:

p = argmax_m S_m, m = 2, 3, …, k;
(1.5) the peak frame image is the p-th frame image F_p in the micro-expression video sequence corresponding to the frame number p.
3. The method for learning distinctive features for micro expression recognition according to claim 1, wherein the step (4) comprises the following sub-steps:
(4.1) setting the micro-expression peak frame image in step (3) as G with corresponding class label l_G, and selecting an ordinary expression image O whose class label l_O differs from that of the micro-expression peak frame image;
(4.2) normalizing the size of the ordinary expression image O to N × N pixels, the same size as the micro-expression peak frame image;
(4.3) generating the coordinates R = (C_x, C_y, C_h, C_w) of the bounding box of the clipping region, in order to remove the pixels of the micro-expression peak frame image G within the clipping region and replace them with the pixels of the ordinary expression O in the corresponding region, where C_x and C_y respectively represent the abscissa and ordinate of the center point of the bounding box, and C_h and C_w respectively represent the height and width of the bounding box:

C_h = C_w = N√(1 − δ)

where δ is a hyper-parameter obeying a uniform distribution between 0 and 1, and C_x and C_y obey a uniform distribution between 0 and N;
(4.4) generating a binary mask T ∈ {0,1}^{N×N} from the clipping region bounding box R; the mask T has size N × N and consists of 0s and 1s, its value being 0 in the region corresponding to the clipping region bounding box R and 1 elsewhere;
(4.5) generating from the binary mask T a composite image G̃ containing two different expression class labels:

G̃ = T ⊙ G + (1 − T) ⊙ O

where ⊙ denotes multiplication of corresponding elements.
4. The method for learning discriminative features for micro-expression recognition according to claim 1, wherein the spatial stream branch of the dual-stream convolutional neural network model based on the class activation graph attention mechanism constructed in step (5) has the following specific structure:
the feature extraction layer of the spatial stream branch extracts features from the input of the spatial stream branch to obtain a multi-channel feature map M ∈ R^{H×H×c}, with feature map size H × H and channel number c;
the global average pooling layer of the spatial stream branch converts the feature map M output by the feature extraction layer into c feature values using an H × H pooling kernel:

θ_n = (1/H²) Σ_{i=1}^{H} Σ_{j=1}^{H} M_n(i,j), n = 1, 2, …, c

where θ_n represents the n-th feature value output by the global average pooling layer and M_n(i,j) represents the value of the n-th channel feature map at the coordinate (i,j) position;
the fully connected layer of the spatial stream branch fully connects the output of the global average pooling layer to v output neurons and outputs a v-dimensional feature vector:

ξ_n = Σ_{j=1}^{c} w_{n,j} θ_j, n = 1, 2, …, v

where ξ_n represents the n-th feature value output by the fully connected layer and w_{n,j} represents the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer;
the Softmax classification layer of the spatial stream branch fully connects the feature vector output by the fully connected layer to v output nodes corresponding to the expression classes and outputs a v-dimensional vector in which each dimension represents the probability of belonging to the corresponding class, where v is the number of classes;
the class activation graph generation layer of the spatial stream branch generates the class activation graph corresponding to a given class:

A_n = Σ_{j=1}^{c} w_{n,j} M_j

where M_j represents the j-th channel feature map output by the feature extraction layer, w_{n,j} represents the weight connecting the n-th output neuron of the fully connected layer with the j-th feature value output by the global average pooling layer, and A_n is the class activation graph corresponding to the n-th class, of size H × H; during training the class activation graph generation layer outputs the class activation graph of the class corresponding to the micro-expression peak frame image label, and during micro-expression recognition with the trained dual-stream convolutional neural network model it outputs the class activation graph of the class with the highest probability in the spatial stream branch Softmax classification layer.
5. The method for learning discriminative features for micro-expression recognition according to claim 4, wherein the temporal stream branch of the dual-stream convolutional neural network model based on the class activation graph attention mechanism constructed in step (5) has the following specific structure:
the attention enhancement layer of the temporal stream branch performs attention enhancement on the input of the temporal stream branch using the class activation graph output by the spatial stream branch: the size of the class activation graph is first aligned with the input of the temporal stream branch:

A′_n = Upsample(A_n)

where Upsample() is an upsampling function that changes the size of the class activation graph from H × H to N × N; the values on the class activation graph are then mapped to between 0 and 1:
A″_n = Sig(A′_n)

where Sig() is the Sigmoid function, which maps the values on the class activation graph to between 0 and 1; finally the class activation graph is used to apply attention mechanism enhancement to the input of the temporal stream branch:

Ã = (I + A″_n) ⊙ A

where A is the input of the temporal stream branch, Ã is the input after attention mechanism enhancement, I is a mask of size N × N whose values are all 1, and ⊙ indicates multiplication of corresponding elements;
the time flow branch input subjected to attention enhancement sequentially passes through a feature extraction layer, a global average pooling layer and a full connection layer of the time flow branch, and finally the probability that the time flow input belongs to each category is output through a time flow Softmax classification layer.
6. The method for learning distinctive features for micro expression recognition according to claim 1, wherein said step (6) comprises the following sub-steps:
(6.1) initializing network weights using a random initialization method;
(6.2) inputting the composite image of step (4) into the spatial stream branch of the dual-stream convolutional neural network, and constructing the loss function of the spatial stream branch from the output of the spatial stream branch Softmax classification layer:

L_s = (−φ_s[l_G] + log Σ_j exp(φ_s[j])) × δ + (−φ_s[l_O] + log Σ_j exp(φ_s[j])) × (1 − δ)

where l_G and l_O are respectively the class labels corresponding to the micro-expression peak frame image G and to the ordinary expression O used for synthesis, δ is the hyper-parameter in step (4), φ_s[j] represents the value in the spatial stream branch Softmax classification layer output corresponding to class label j, φ_s[l_G] represents the value corresponding to class label l_G, and φ_s[l_O] represents the value corresponding to class label l_O;
(6.3) inputting the optical flow graph of step (3) into the temporal stream branch of the dual-stream convolutional neural network, and constructing the loss function of the temporal stream branch from the output of the temporal stream branch Softmax classification layer:

L_t = −φ_t[l_G] + log Σ_j exp(φ_t[j])

where φ_t[l_G] represents the value in the temporal stream branch Softmax classification layer output corresponding to class label l_G and φ_t[j] represents the value corresponding to class label j;
(6.4) adding the spatial stream loss function and the temporal stream loss function to obtain the total loss function of the dual-stream convolutional neural network:

L_sum = L_t + L_s

and performing gradient calculation and weight updating on the dual-stream convolutional neural network model according to the total loss function L_sum;
(6.5) obtaining the trained dual-stream convolutional neural network model through repeated iterative training.
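Steps (6.2)-(6.4) combine into the total training objective; a minimal NumPy sketch follows, with illustrative names:

```python
import numpy as np

def cross_entropy(logits, label):
    # -phi[label] + log(sum_j exp(phi[j]))
    return -logits[label] + np.log(np.sum(np.exp(logits)))

def total_loss(phi_s, phi_t, l_G, l_O, delta):
    """L_sum = L_t + L_s as in step (6.4): the spatial stream uses the
    delta-mixed two-label loss, the temporal stream a plain cross-entropy
    on the micro-expression label l_G."""
    L_s = delta * cross_entropy(phi_s, l_G) + (1.0 - delta) * cross_entropy(phi_s, l_O)
    L_t = cross_entropy(phi_t, l_G)
    return L_t + L_s
```

In training, L_sum would be backpropagated through both branches jointly; the gradient machinery itself is left to the deep learning framework.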
7. A discriminative feature learning system for micro-expression recognition, comprising:
a preprocessing module for extracting the initial frame and the peak frame of each video sequence sample in a micro-expression video library, normalizing the initial-frame and peak-frame images to a uniform size of N×N pixels, and applying Eulerian motion magnification with different magnification factors to the normalized images to obtain multiple groups of micro-expression initial-frame and peak-frame images;
an optical flow computation module for computing the optical flow information between the micro-expression peak frame and the initial frame of each group to obtain an optical flow map;
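The patent does not name a specific optical flow algorithm; in practice a dense method such as TV-L1 or Farneback would be used. As a self-contained toy stand-in, a naive block-matching estimator between the initial (onset) frame and the peak (apex) frame can be sketched as:

```python
import numpy as np

def block_matching_flow(prev, curr, block=8, search=2):
    """Naive block-matching optical flow between the initial frame (prev)
    and the peak frame (curr); returns per-block (dy, dx) displacements.
    A toy illustration only, not the dense flow a real system would use."""
    H, W = prev.shape
    flow = np.zeros((H // block, W // block, 2))
    for by in range(H // block):
        for bx in range(W // block):
            y0, x0 = by * block, bx * block
            ref = prev[y0:y0 + block, x0:x0 + block]
            best, best_d = np.inf, (0, 0)
            # exhaustive search over a small displacement window
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    if 0 <= y1 and y1 + block <= H and 0 <= x1 and x1 + block <= W:
                        cand = curr[y1:y1 + block, x1:x1 + block]
                        err = np.sum((ref - cand) ** 2)
                        if err < best:
                            best, best_d = err, (dy, dx)
            flow[by, bx] = best_d
    return flow
```

For a frame pair shifted horizontally by two pixels, interior blocks recover the displacement (0, 2) exactly.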
an image synthesis module for selecting, for each micro-expression peak frame image, an image whose expression class differs from that of the peak frame from an ordinary expression image library, cropping an image block from it, and replacing the corresponding region of the peak frame image with the cropped block to obtain a synthesized image carrying two different expression class labels; the position of the cropped block is selected at random, and its size is controlled by a hyper-parameter drawn uniformly from 0 to 1;
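The synthesis step resembles CutMix-style mixing. A minimal sketch follows, under the assumption (not stated in the claim) that the hyper-parameter δ is the area fraction of the peak frame left uncovered, so the pasted ordinary-expression patch covers roughly a 1-δ fraction; all names are illustrative:

```python
import numpy as np

def synthesize(apex, ordinary, rng=None):
    """Paste a randomly placed crop of an ordinary-expression image into
    the micro-expression peak (apex) frame. Returns the mixed image and
    delta, the hyper-parameter later used to mix the two labels."""
    rng = rng or np.random.default_rng()
    H, W = apex.shape[:2]
    delta = rng.uniform(0.0, 1.0)                 # delta ~ U(0, 1)
    h = max(1, int(H * np.sqrt(1.0 - delta)))     # patch area ~ (1 - delta)
    w = max(1, int(W * np.sqrt(1.0 - delta)))
    y = rng.integers(0, H - h + 1)                # random patch position
    x = rng.integers(0, W - w + 1)
    mixed = apex.copy()
    mixed[y:y + h, x:x + w] = ordinary[y:y + h, x:x + w]
    return mixed, delta
```

The returned delta would then weight the two class labels in the spatial-stream loss of claim 6.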
a network model construction and training module for building a dual-stream convolutional neural network model based on a class-activation-map attention mechanism; the model is divided into a temporal-stream branch and a spatial-stream branch, where the spatial-stream branch comprises, in order, a feature extraction layer, a global average pooling layer, a fully connected layer, a classification layer, and a class activation map generation layer, the temporal-stream branch comprises, in order, an attention enhancement layer, a feature extraction layer, a global average pooling layer, a fully connected layer, and a classification layer, and a decision fusion layer finally combines the outputs of the two streams' classification layers; the class activation map generation layer outputs a class activation map from the feature maps produced by the spatial-stream feature extraction layer and the weights between the fully connected layer and the global average pooling layer; the attention enhancement layer of the temporal-stream branch uses the class activation map output by the spatial-stream branch to apply attention enhancement to the temporal-stream input; the optical flow map and the synthesized image are respectively input into the two branches of the constructed dual-stream convolutional neural network model to train the model;
and a micro-expression recognition module for extracting the initial frame and the peak frame from an input video sequence, preprocessing them by size normalization and Eulerian motion magnification, computing the optical flow information between the peak frame and the initial frame to obtain an optical flow map, inputting the optical flow map and the preprocessed peak frame image respectively into the two branches of the trained dual-stream convolutional neural network model, and extracting highly discriminative micro-expression features for micro-expression classification and recognition.
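At inference time the decision fusion layer of claim 7 merges the two streams' classification outputs. The claim does not specify the fusion rule; a common choice, sketched here as an assumption, is to average the two Softmax distributions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def decision_fusion(phi_s, phi_t):
    """Average the spatial- and temporal-stream Softmax outputs and pick
    the argmax class (one plausible fusion rule, not fixed by the claim)."""
    p = 0.5 * (softmax(phi_s) + softmax(phi_t))
    return int(np.argmax(p)), p
```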
8. A discriminative feature learning system for micro-expression recognition, comprising at least one computing device that includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the discriminative feature learning method for micro-expression recognition according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110060936.3A CN112800891B (en) | 2021-01-18 | 2021-01-18 | Discriminative feature learning method and system for micro-expression recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110060936.3A CN112800891B (en) | 2021-01-18 | 2021-01-18 | Discriminative feature learning method and system for micro-expression recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112800891A CN112800891A (en) | 2021-05-14 |
CN112800891B true CN112800891B (en) | 2022-08-26 |
Family
ID=75809985
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110060936.3A Active CN112800891B (en) | 2021-01-18 | 2021-01-18 | Discriminative feature learning method and system for micro-expression recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112800891B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723287A (en) * | 2021-08-30 | 2021-11-30 | 平安科技(深圳)有限公司 | Micro-expression identification method, device and medium based on bidirectional cyclic neural network |
CN114005157B (en) * | 2021-10-15 | 2024-05-10 | 武汉烽火信息集成技术有限公司 | Micro-expression recognition method for pixel displacement vector based on convolutional neural network |
CN114550272B (en) * | 2022-03-14 | 2024-04-09 | 东南大学 | Micro-expression recognition method and device based on video time domain dynamic attention model |
CN116311472B (en) * | 2023-04-07 | 2023-10-31 | 湖南工商大学 | Micro-expression recognition method and device based on multi-level graph convolution network |
CN117456586A (en) * | 2023-11-17 | 2024-01-26 | 江南大学 | Micro expression recognition method, system, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287805A (en) * | 2019-05-31 | 2019-09-27 | 东南大学 | Micro- expression recognition method and system based on three stream convolutional neural networks |
CN110516571A (en) * | 2019-08-16 | 2019-11-29 | 东南大学 | Inter-library micro- expression recognition method and device based on light stream attention neural network |
CN112115796A (en) * | 2020-08-21 | 2020-12-22 | 西北大学 | Attention mechanism-based three-dimensional convolution micro-expression recognition algorithm |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287805A (en) * | 2019-05-31 | 2019-09-27 | 东南大学 | Micro- expression recognition method and system based on three stream convolutional neural networks |
CN110516571A (en) * | 2019-08-16 | 2019-11-29 | 东南大学 | Inter-library micro- expression recognition method and device based on light stream attention neural network |
CN112115796A (en) * | 2020-08-21 | 2020-12-22 | 西北大学 | Attention mechanism-based three-dimensional convolution micro-expression recognition algorithm |
Non-Patent Citations (1)
Title |
---|
A facial expression recognition method based on an improved convolutional neural network; Zou Jiancheng et al.; Journal of North China University of Technology; 2020-04-15 (No. 02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112800891A (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112800891B (en) | Discriminative feature learning method and system for micro-expression recognition | |
Giannopoulos et al. | Deep learning approaches for facial emotion recognition: A case study on FER-2013 | |
Oyedotun et al. | Deep learning in vision-based static hand gesture recognition | |
CN110532900B (en) | Facial expression recognition method based on U-Net and LS-CNN | |
CN110276248B (en) | Facial expression recognition method based on sample weight distribution and deep learning | |
Gaddam et al. | Human facial emotion detection using deep learning | |
CN111582136B (en) | Expression recognition method and device, electronic equipment and storage medium | |
Ali et al. | Facial emotion detection using neural network | |
Dandil et al. | Real-time Facial Emotion Classification Using Deep Learning | |
CN113392766A (en) | Attention mechanism-based facial expression recognition method | |
Xu et al. | Face expression recognition based on convolutional neural network | |
Santhoshkumar et al. | Deep learning approach: emotion recognition from human body movements | |
Zhao et al. | Cbph-net: A small object detector for behavior recognition in classroom scenarios | |
CN114170659A (en) | Facial emotion recognition method based on attention mechanism | |
Gantayat et al. | Study of algorithms and methods on emotion detection from facial expressions: a review from past research | |
Gupta et al. | Performance improvement in handwritten devanagari character classification | |
Kumar et al. | Bird species classification from images using deep learning | |
Kale et al. | Age, gender and ethnicity classification from face images with CNN-based features | |
Handa et al. | Incremental approach for multi-modal face expression recognition system using deep neural networks | |
Kanungo | Analysis of Image Classification Deep Learning Algorithm | |
Srininvas et al. | A framework to recognize the sign language system for deaf and dumb using mining techniques | |
CN113469116A (en) | Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network | |
Pradeep et al. | Recognition of Indian Classical Dance Hand Gestures | |
Nayak et al. | Facial Expression Recognition based on Feature Enhancement and Improved Alexnet | |
Thiruthuvanathan et al. | EMONET: A Cross Database Progressive Deep Network for Facial Expression. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |