CN115049957A - Micro-expression identification method and device based on contrast amplification network - Google Patents
Micro-expression identification method and device based on contrast amplification network
- Publication number
- CN115049957A (application number CN202210605395.2A)
- Authority: CN (China)
- Prior art keywords: micro, sample, expression, contrast, source
- Prior art date: 2022-05-31
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
The invention discloses a micro-expression identification method and device based on a contrast amplification network, wherein the method comprises the following steps: (1) acquiring a micro-expression database; (2) converting each micro-expression video into a micro-expression frame sequence and, after preprocessing, sampling frames from it as source samples; (3) for each source sample, calculating the distance between the source sample and the remaining frames in an embedding space, and mapping the distances into probabilities to obtain a distance probability distribution; (4) sampling a plurality of frames from the remaining video frames as negative samples according to the probability distribution; (5) constructing a contrast amplification network; (6) applying data enhancement to each source sample to form an anchor sample and a positive sample, and inputting the anchor sample, the positive sample and the corresponding negative samples as training samples into the contrast amplification network for training, the loss function being the sum of the intra-video contrast loss, the inter-class contrast loss and the cross-entropy loss; (7) preprocessing the micro-expression video to be identified, inputting it into the trained contrast amplification network, and identifying the micro-expression category. The invention achieves higher accuracy and is more convenient.
Description
Technical Field
The invention relates to image processing technology, and in particular to a micro-expression identification method and device based on a contrast amplification network.
Background
Micro-expressions are spontaneous, transient facial expressions that reveal the true emotional state a person is trying to hide. For this reason, recognizing micro-expressions is of great significance for affective computing and psychotherapy. However, micro-expressions are induced by subtle facial movements and last much shorter than macroscopic facial expressions, so automatic micro-expression recognition is a challenging task.
Micro-expressions are a dynamic process of facial muscle movement, and capturing this motion is essential for accurately identifying them. However, because the intensity of micro-expressions is low, extracting motion features is very difficult. To address this problem, many methods employ motion magnification techniques to amplify the micro-expression and make the facial motion more pronounced. Conventional and learning-based magnification techniques, such as Eulerian motion magnification (EMM), global Lagrangian motion magnification (GLMM) and learning-based motion magnification (LMM), have demonstrated that amplifying the micro-expression intensity can further enhance recognition performance. Despite this advantage, existing magnification methods are not yet fully adaptive. When different subjects show different expression states, the trends of facial muscle change are inconsistent, and a single amplification level cannot suit all micro-expression samples. For example, the same magnification factor may be insufficient for one micro-expression sample, so that the amplified expression still does not show its emotion category, while being excessive for another sample and thereby introducing noise. Therefore, the accuracy of prior-art amplification-based micro-expression recognition still needs to be improved.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a micro-expression identification method and device based on a contrast amplification network with higher accuracy.
The technical scheme is as follows: the micro-expression identification method based on the contrast amplifying network comprises the following steps:
(1) acquiring a micro-expression database, wherein the micro-expression database comprises a plurality of micro-expression videos and corresponding micro-expression category labels;
(2) converting each micro-expression video in the database into a micro-expression frame sequence, preprocessing the micro-expression frame sequence, and sampling a fixed number of frames from each micro-expression frame sequence as source samples;
(3) for each source sample, calculating the distance in the embedding space between the source sample and the remaining frames of the frame sequence it belongs to, and mapping the distances into probabilities to obtain the distance probability distribution between each source sample and the remaining frames of the sequence;
(4) for each source sample, sampling a plurality of frames from the rest video frames according to the calculated probability distribution as corresponding negative samples;
(5) constructing a contrast amplifying network, which comprises Resnet-18 without a full connection layer, a multi-layer perceptron connected behind the Resnet-18, and a softmax layer connected behind the multi-layer perceptron;
(6) performing data enhancement on each source sample in the micro-expression frame sequence to form a pair of samples, namely an anchor sample and a positive sample, inputting the anchor sample, the positive sample and a corresponding negative sample as training samples into a contrast amplification network for training, wherein a loss function during training is the sum of contrast loss in a video, contrast loss between classes and cross entropy loss, and optimizing the training through gradient descent;
(7) preprocessing the micro expression video to be identified according to the step (2), and inputting the preprocessed micro expression video into a trained contrast amplifying network to identify the micro expression category.
Further, the preprocessing in the step (2) includes face registration and face region cropping.
Further, the sampling a fixed number of frames from each micro-expression frame sequence as a source sample in step (2) specifically includes:
for a preprocessed micro-expression frame sequence, sampling N frames as source samples according to uniform distribution, wherein the form of uniform distribution is as follows:
where f(n) represents the probability density function used to sample the source samples, n represents the index of a sampled source sample, and N represents the number of sampled source samples.
Further, the step (3) specifically comprises:
(3-1) for each micro-expression frame sequence, extracting the feature vectors u of the source samples and of the remaining frames with a ResNet-18 network model pre-trained on the macro-expression database FER+:
u_i = g(x_i),  u_i^j = g(x_i^j),  i = 1, …, N,  j = 1, …, N_i
where g(·) is a neural network, N represents the number of source samples, x_i and x_i^j represent the i-th source sample and the j-th remaining frame associated with x_i, u_i and u_i^j denote their respective feature vectors, and N_i represents the number of remaining frames associated with x_i;
(3-2) for each micro-expression frame sequence, calculating, according to the following formula, the distance in the embedding space between the feature vectors of each source sample and those of the remaining frames of that sequence:
where d_i^j represents the distance between the i-th source sample and its j-th remaining frame in the embedding space;
(3-3) mapping, for each sequence of micro-expression frames, different distances into probabilities using a softmax function according to the following formula, thereby obtaining a distance probability distribution of each source sample from the remaining frames:
where p_i^j represents the probability obtained by mapping d_i^j, and P represents the distance probability distribution between the source sample and the remaining frames.
Further, the step (4) specifically comprises: for each source sample, sampling from the distance probability distribution, and taking the video frames corresponding to the sampled probabilities as negative samples.
Further, the loss function in step (6) is specifically:
where the total loss is the weighted sum of the three terms; λ_1, λ_2 and λ_3 are the weights of the three loss functions, namely the intra-video contrast loss, the inter-class contrast loss and the cross-entropy loss.
Further, the specific function of contrast loss in the video is as follows:
where I is the micro-expression database; the feature vectors in the loss are the outputs of the multi-layer perceptron after the positive sample, the anchor sample and the j-th negative sample of the i-th source sample of the k-th micro-expression video in I are fed into the contrast amplifying network, the positive and anchor samples being obtained by data enhancement; τ is a temperature coefficient, S is the number of negative samples, and N is the number of source samples.
Further, the specific function of the inter-class contrast loss is as follows:
where I is the micro-expression database; P(k) ≡ {p ∈ A(k) : y_p = y_k} denotes the set of samples in A(k) that have the same micro-expression class label as the current sample k; A(k) denotes the set formed by the positive sample of the current sample and the corresponding negative samples; v = LSTM(z_1, z_2, …, z_N), where LSTM(z_1, z_2, …, z_N) denotes integrating z_1, z_2, …, z_N into a single feature vector of the video sample with a long short-term memory network (LSTM); z_1, z_2, …, z_N denote the feature vectors output by the multi-layer perceptron after the 1st, 2nd, …, N-th source samples are fed into the contrast amplifying network; τ is a temperature coefficient; and v_k, v_p, v_a denote, respectively, the LSTM-integrated feature vector of the current video sample with subscript k, the feature vectors of the samples in the same batch that have the same label as v_k, and the feature vectors of the samples in the same batch other than v_k.
Further, the cross entropy loss specific function is as follows:
where C is the number of micro-expression categories, y_c is the label value for class c, and p_c is the predicted probability of belonging to class c.
The micro-expression recognition device based on the contrast amplification network comprises a processor and a computer program stored on a memory and executable on the processor, and the processor implements the above method when executing the program.
Beneficial effects: compared with the prior art, the invention has the following remarkable advantages. By adopting the idea of contrastive learning and constructing positive and negative samples, the network optimization process concentrates more on distinguishing the differences between positive and negative samples and enlarges their distance in the embedding space, which makes the intensity contrast and the category contrast more obvious and directly yields more discriminative features. As a result, the network perceives motion change more readily, the recognition accuracy is improved, fewer manually set hyper-parameters are needed, and the method is more convenient.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a micro expression recognition method based on a contrast amplifying network according to the present invention;
FIG. 2 is a schematic diagram of a contrast amplifying network designed by the present invention;
FIG. 3 is a schematic diagram of the negative sample generation process.
Detailed Description
The embodiment provides a micro-expression recognition method based on a contrast amplification network, as shown in FIG. 1, including:
(1) acquiring a micro-expression database, wherein the micro-expression database comprises a plurality of micro-expression videos and corresponding micro-expression category labels. To improve recognition accuracy, multiple common databases may be employed.
(2) Each micro-expression video in the database is converted into a micro-expression frame sequence, the micro-expression frame sequence is preprocessed, and a fixed number of frames are sampled from each micro-expression frame sequence to serve as source samples.
Wherein the preprocessing comprises face registration and face region cropping. The sampling of a fixed number of frames from each micro-expression frame sequence as source samples specifically comprises:
for a preprocessed micro-expression frame sequence, sampling N frames as source samples according to uniform distribution, wherein the form of uniform distribution is as follows:
where f(n) represents the probability density function used to sample the source samples, n represents the index of a sampled source sample, and N represents the number of sampled source samples.
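The uniform-distribution formula itself is not reproduced above; a plausible reconstruction, assuming each of the N source-sample indices is drawn with equal probability, is:

```latex
% Assumption: each source-sample index n is equally likely.
f(n) = \frac{1}{N}, \qquad n = 1, 2, \dots, N
```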
(3) For each source sample, calculating the distance in the embedding space between the source sample and the remaining frames of its frame sequence, and mapping the distances into probabilities to obtain the distance probability distribution between each source sample and the remaining frames of the sequence.
The method specifically comprises the following steps:
(3-1) extracting feature vectors u of the source samples and the residual frames by using a pre-trained neural network for each micro expression frame sequence:
u_i = g(x_i),  u_i^j = g(x_i^j),  i = 1, ..., N,  j = 1, ..., N_i
where g(·) is a neural network, which may be, for example, a ResNet-18 pre-trained on a macro-expression database, the pre-training using the FER+ dataset; N denotes the number of source samples; x_i and x_i^j represent the i-th source sample and the j-th remaining frame associated with x_i; u_i and u_i^j denote their respective feature vectors; and N_i represents the number of remaining frames associated with x_i;
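A minimal sketch of such a frame-level feature extractor g(·), assuming PyTorch/torchvision; ImageNet weights are used here only as a stand-in for the FER+ pre-training described above, and the 224×224 input size is an assumption:

```python
import torch
import torch.nn as nn
from torchvision import models

class FrameEncoder(nn.Module):
    """g(.): maps a preprocessed face frame to an embedding vector u."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()      # drop the classification head
        self.backbone = backbone

    @torch.no_grad()
    def forward(self, frames):           # frames: (B, 3, 224, 224) tensor, assumed
        return self.backbone(frames)     # (B, 512) feature vectors
```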
(3-2) for each micro-expression frame sequence, calculating, according to the following formula, the distance in the embedding space between the feature vectors of each source sample and those of the remaining frames of that sequence:
where d_i^j represents the distance between the i-th source sample and its j-th remaining frame in the embedding space;
(3-3) mapping the different distances into probabilities using a softmax function for each sequence of micro-expression frames according to the following equation, thereby obtaining a distance probability distribution of each source sample from the rest frames:
where p_i^j represents the probability obtained by mapping d_i^j, and P represents the distance probability distribution between the source sample and the remaining frames.
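The formulas referred to in (3-2) and (3-3) are not reproduced above. A plausible reconstruction, assuming a Euclidean distance in the embedding space and a softmax taken over the raw distances of each source sample, is:

```latex
% Assumptions: Euclidean embedding distance; softmax over distances, so frames
% farther from the source sample receive a higher sampling probability.
d_i^j = \left\| u_i - u_i^j \right\|_2, \qquad
p_i^j = \frac{\exp\!\left(d_i^j\right)}{\sum_{j'=1}^{N_i}\exp\!\left(d_i^{j'}\right)}, \qquad
P = \left\{ p_i^j \right\}_{j=1}^{N_i}
```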
Thus, a probability distribution over the source sample and all remaining video frames, with the distance as a discrete variable, is obtained: the farther a frame is from the source sample at the feature level, the greater its probability of being selected as a negative sample. Within the same micro-expression sequence, the differences between frames are in fact reflected only in the intensity information, and the greater the distance between frames, the more obvious the intensity difference. By computing the distances between the source sample and the remaining video frames, it is ensured that frames that are farther away, i.e. frames with larger intensity differences, are sampled as negative samples with higher probability.
(4) For each source sample, a number of frames are sampled from the remaining video frames as corresponding negative samples according to the calculated probability distribution, as shown in fig. 3.
Step (4) specifically comprises: for each source sample, sampling from the distance probability distribution, and taking the video frames corresponding to the sampled probabilities as negative samples.
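A minimal sketch of this negative-sample selection (steps (3)–(4)), assuming a Euclidean embedding distance and a softmax over raw distances (both are assumptions, since the exact formulas are not reproduced above):

```python
import numpy as np

def sample_negatives(u_src, u_rest, num_negatives, rng=None):
    """Sample negative frames for one source sample.

    u_src  : (D,)   embedding of the source sample
    u_rest : (M, D) embeddings of the remaining frames of the same video
    Frames farther from the source sample in the embedding space are
    sampled with higher probability.
    """
    rng = np.random.default_rng() if rng is None else rng
    dists = np.linalg.norm(u_rest - u_src, axis=1)   # assumed Euclidean distance d_i^j
    shifted = dists - dists.max()                    # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum()  # p_i^j
    idx = rng.choice(len(u_rest), size=num_negatives, replace=False, p=probs)
    return idx

# usage (illustrative): neg_idx = sample_negatives(u_i, u_rest_i, num_negatives=4)
```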
(5) A contrast amplifying network is constructed, as shown in FIG. 2, comprising a ResNet-18 without its fully connected layer, a multi-layer perceptron connected behind it, and a softmax layer connected behind the multi-layer perceptron.
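A minimal sketch of this architecture, assuming PyTorch; the hidden width and projection dimension of the multi-layer perceptron and the number of classes are illustrative choices not specified above:

```python
import torch.nn as nn
from torchvision import models

class ContrastAmplificationNet(nn.Module):
    """ResNet-18 without its FC layer, followed by an MLP (Proj) and a softmax classifier."""
    def __init__(self, num_classes=5, proj_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                    # ResNet-18 without the fully connected layer
        self.encoder = backbone                        # Enc(.)
        self.projector = nn.Sequential(                # Proj(.): multi-layer perceptron
            nn.Linear(512, 512), nn.ReLU(inplace=True),
            nn.Linear(512, proj_dim),
        )
        self.classifier = nn.Linear(proj_dim, num_classes)

    def forward(self, x):                              # x: (B, 3, H, W) face frames
        h = self.encoder(x)
        z = self.projector(h)                          # feature vector used by the contrast losses
        probs = self.classifier(z).softmax(dim=1)      # softmax layer for classification
        return z, probs
```

During training, the projector output z feeds the two contrast losses, while the softmax output feeds the cross-entropy loss.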
(6) Data enhancement is carried out on each source sample in the micro-expression frame sequence to form a pair of samples, namely an anchor sample and a positive sample, the anchor sample, the positive sample and the corresponding negative sample are used as training samples to be input into a contrast amplification network for training, a loss function in the training process is the sum of contrast loss in a video, contrast loss between classes and cross entropy loss, and training is optimized through gradient descent.
Wherein the loss function is specifically:
where the total loss is the weighted sum of the three terms; λ_1, λ_2 and λ_3 are the weights of the three loss functions, namely the intra-video contrast loss, the inter-class contrast loss and the cross-entropy loss.
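The total-loss formula is not reproduced above; from the description it is the weighted sum of the three terms, which can be written as (the symbols L_vid, L_cls and L_ce are introduced here only for readability):

```latex
\mathcal{L} = \lambda_1\,\mathcal{L}_{\mathrm{vid}}
            + \lambda_2\,\mathcal{L}_{\mathrm{cls}}
            + \lambda_3\,\mathcal{L}_{\mathrm{ce}}
```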
Each source sample obtained by sampling the micro-expression frame sequence is denoted x, and a pair of samples is obtained from it by data enhancement. In the actual calculation, the two enhanced samples serve as anchor and positive samples of each other: when one is used as the anchor sample, the other is the positive sample, and vice versa. The anchor sample and the positive sample should carry the same intensity information and may differ only in color, style, and so on, so Gaussian noise and random grayscale transformation are chosen for data enhancement. In the contrast amplification network, the ResNet-18 without the fully connected layer is denoted Enc(·) and the multi-layer perceptron is denoted Proj(·). The enhanced samples and all negative samples corresponding to the source sample are fed into the contrast amplification network, and the loss function computed at the feature level is the intra-video contrast loss, whose specific form is as follows:
where I is the micro-expression database; the feature vectors in the loss are the outputs of the multi-layer perceptron after the positive sample, the anchor sample and the j-th negative sample of the i-th source sample of the k-th micro-expression video in I are fed into the contrast amplification network, the positive and anchor samples being obtained by data enhancement; τ is a temperature coefficient, S is the number of negative samples, and N is the number of source samples.
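The exact intra-video contrast loss is not reproduced above. A plausible reconstruction, assuming the standard InfoNCE form with the anchor–positive pair contrasted against the S negative frames of the same video, is:

```latex
% Assumption: InfoNCE-style loss; z^{a}_{k,i}, z^{+}_{k,i}, z^{-j}_{k,i} are the MLP outputs
% for the anchor, positive and j-th negative of the i-th source sample of video k.
\mathcal{L}_{\mathrm{vid}} = -\sum_{k \in I}\sum_{i=1}^{N}
\log\frac{\exp\!\big(z^{a}_{k,i}\cdot z^{+}_{k,i}/\tau\big)}
{\exp\!\big(z^{a}_{k,i}\cdot z^{+}_{k,i}/\tau\big)
 + \sum_{j=1}^{S}\exp\!\big(z^{a}_{k,i}\cdot z^{-j}_{k,i}/\tau\big)}
```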
The specific function of the inter-class contrast loss is as follows:
where I is the micro-expression database; P(k) ≡ {p ∈ A(k) : y_p = y_k} denotes the set of samples in A(k) that have the same micro-expression class label as the current sample k; A(k) denotes the set formed by the positive sample of the current sample and the corresponding negative samples; v = LSTM(z_1, z_2, …, z_N), where LSTM(z_1, z_2, …, z_N) denotes integrating z_1, z_2, …, z_N into a single feature vector of the video sample with a long short-term memory network (LSTM); z_1, z_2, …, z_N denote the feature vectors output by the multi-layer perceptron after the 1st, 2nd, …, N-th source samples are fed into the contrast amplification network; τ is a temperature coefficient; and v_k, v_p, v_a denote, respectively, the LSTM-integrated feature vector of the current video sample with subscript k, the feature vectors of the samples in the same batch that have the same label as v_k, and the feature vectors of the samples in the same batch other than v_k.
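The exact inter-class contrast loss is likewise not reproduced above. Given the stated definitions of P(k), A(k), v_k, v_p and v_a, a plausible reconstruction following the supervised contrastive loss form is:

```latex
% Assumption: supervised-contrastive form over the LSTM-integrated video features.
\mathcal{L}_{\mathrm{cls}} = \sum_{k \in I}\frac{-1}{|P(k)|}
\sum_{p \in P(k)}\log
\frac{\exp\!\big(v_k \cdot v_p/\tau\big)}
     {\sum_{a \in A(k)}\exp\!\big(v_k \cdot v_a/\tau\big)}
```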
The inter-class loss alone does not let the network know the specific class of a sample, so a softmax function is needed to guide the classification; the specific function is as follows:
where C is the number of micro-expression categories, y_c is the label value for class c, and p_c is the predicted probability of belonging to class c.
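From the stated definitions of C, y_c and p_c, the cross-entropy term takes the standard form (a reconstruction, as the formula itself is not reproduced above):

```latex
\mathcal{L}_{\mathrm{ce}} = -\sum_{c=1}^{C} y_c \log p_c
```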
(7) Preprocessing the micro-expression video to be identified according to step (2), and inputting the preprocessed video into the trained contrast amplifying network to identify the micro-expression category.
The embodiment also provides a micro-expression recognition device based on a contrast amplification network, which comprises a processor and a computer program stored on a memory and executable on the processor; the processor implements the above method when executing the computer program.
In order to verify the effectiveness of the present invention, micro-expression recognition is performed on the CASME2 micro-expression database, the SAMM micro-expression database and the HS sub-database of the SMIC database, and the verification results and a comparison with other recent methods are shown in Table 1:
TABLE 1
Method | Year | CASME2 | SAMM | SMIC-HS
---|---|---|---|---
LBP-SIP | 2014 | 45.36 | 36.76 | 42.12
MagGA | 2018 | 63.30 | N/A | N/A
DSSN | 2019 | 70.78 | 57.35 | 63.41
TSCNN-I | 2020 | 74.05 | 63.53 | 72.74
LBPAccP u2 | 2021 | 69.03 | N/A | 76.59
AU-GCN | 2021 | 74.27 | 74.26 | N/A
Proposed method (the invention) | 2022 | 79.03 | 77.21 | 77.91
In Table 1, N/A indicates that there is no relevant record.
The expressions of the CASME2 database are processed as follows: categories with fewer than 10 samples are omitted to avoid a severe class-imbalance problem, and a five-class recognition task (happiness, repression, disgust, fear and surprise) is completed. The expressions of the SAMM database are processed in the same way: categories with fewer than 10 samples are omitted to avoid severe class imbalance, and a five-class recognition task (happiness, anger, disgust, fear and surprise) is completed. The SMIC database is divided into positive, negative and surprise, and the corresponding three-class recognition task is completed.
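A minimal sketch of this class filtering (the threshold of 10 comes from the text above; the data representation is an assumption):

```python
from collections import Counter

def filter_rare_classes(samples, min_count=10):
    """samples: list of (video, label) pairs; drop classes with fewer than min_count samples."""
    counts = Counter(label for _, label in samples)
    kept = {label for label, n in counts.items() if n >= min_count}
    return [(video, label) for video, label in samples if label in kept]
```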
Experimental results show that the micro-expression recognition method provided by the invention achieves a higher micro-expression recognition rate. Compared with traditional micro-expression amplification schemes, the method avoids the complexity of manually setting some of the hyper-parameters, adapts better to individual subjects, and is more convenient.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
1. A micro expression recognition method based on a contrast amplifying network is characterized by comprising the following steps:
(1) acquiring a micro-expression database, wherein the micro-expression database comprises a plurality of micro-expression videos and corresponding micro-expression category labels;
(2) converting each micro-expression video in the database into a micro-expression frame sequence, preprocessing the micro-expression frame sequence, and sampling a fixed number of frames from each micro-expression frame sequence as source samples;
(3) for each source sample, calculating the distance in the embedding space between the source sample and the remaining frames of the frame sequence it belongs to, and mapping the distances into probabilities to obtain the distance probability distribution between each source sample and the remaining frames of the sequence;
(4) for each source sample, sampling a plurality of frames from the remaining video frames as corresponding negative samples according to the calculated probability distribution;
(5) constructing a contrast amplifying network, which comprises Resnet-18 without a full connection layer, a multi-layer perceptron connected behind the Resnet-18, and a softmax layer connected behind the multi-layer perceptron;
(6) performing data enhancement on each source sample in the micro-expression frame sequence to form a pair of samples, namely an anchor sample and a positive sample, inputting the anchor sample, the positive sample and a corresponding negative sample as training samples into a contrast amplification network for training, wherein a loss function during training is the sum of contrast loss in a video, contrast loss between classes and cross entropy loss, and optimizing the training through gradient descent;
(7) preprocessing the micro expression video to be identified according to the step (2), and inputting the preprocessed micro expression video into a trained contrast amplifying network to identify the micro expression category.
2. The micro expression recognition method based on the contrast amplifying network according to claim 1, wherein: the preprocessing in the step (2) comprises face registration and face region cropping.
3. The micro expression recognition method based on the contrast amplifying network according to claim 1, wherein: sampling a fixed number of frames from each micro expression frame sequence as source samples in the step (2) specifically comprises:
for a preprocessed micro-expression frame sequence, sampling N frames as source samples according to uniform distribution, wherein the form of uniform distribution is as follows:
where f(n) represents the probability density function used to sample the source samples, n represents the index of a sampled source sample, and N represents the number of sampled source samples.
4. The micro expression recognition method based on the contrast amplifying network as claimed in claim 1, wherein: the step (3) specifically comprises the following steps:
(3-1) extracting feature vectors u of the source sample and the residual frame by using a pre-trained neural network for each micro-expression frame sequence:
u_i = g(x_i),  u_i^j = g(x_i^j),  i = 1, …, N,  j = 1, …, N_i
where g(·) is a neural network, N represents the number of source samples, x_i and x_i^j represent the i-th source sample and the j-th remaining frame associated with x_i, u_i and u_i^j denote their respective feature vectors, and N_i represents the number of remaining frames associated with x_i;
(3-2) for each micro-expression frame sequence, calculating, according to the following formula, the distance in the embedding space between the feature vectors of each source sample and those of the remaining frames of that sequence:
where d_i^j represents the distance between the i-th source sample and its j-th remaining frame in the embedding space;
(3-3) mapping, for each sequence of micro-expression frames, different distances into probabilities using a softmax function according to the following formula, thereby obtaining a distance probability distribution of each source sample from the remaining frames:
5. The micro expression recognition method based on the contrast amplifying network according to claim 1, wherein: the step (4) specifically comprises the following steps:
for each source sample, sampling from the distance probability distribution, and taking the video frames corresponding to the sampled probabilities as negative samples.
6. The micro expression recognition method based on the contrast amplifying network according to claim 1, wherein: the loss function in the step (6) is specifically:
7. The micro expression recognition method based on the contrast amplifying network as claimed in claim 6, wherein: the specific function of contrast loss in the video is as follows:
where I is the micro-expression database; the feature vectors in the loss are the outputs of the multi-layer perceptron after the positive sample, the anchor sample and the j-th negative sample of the i-th source sample of the k-th micro-expression video in I are fed into the contrast amplifying network, the positive and anchor samples being obtained by data enhancement; τ is a temperature coefficient, S is the number of negative samples, and N is the number of source samples.
8. The micro expression recognition method based on the contrast amplifying network as claimed in claim 6, wherein: the inter-class contrast loss is specifically the following function:
where I is the micro-expression database; P(k) ≡ {p ∈ A(k) : y_p = y_k} denotes the set of samples in A(k) that have the same micro-expression class label as the current sample k; A(k) denotes the set formed by the positive sample of the current sample and the corresponding negative samples; v = LSTM(z_1, z_2, …, z_N), where LSTM(z_1, z_2, …, z_N) denotes integrating z_1, z_2, …, z_N into a single feature vector of the video sample with a long short-term memory network (LSTM); z_1, z_2, …, z_N denote the feature vectors output by the multi-layer perceptron after the 1st, 2nd, …, N-th source samples are fed into the contrast amplifying network; τ is a temperature coefficient; and v_k, v_p, v_a denote, respectively, the LSTM-integrated feature vector of the current video sample with subscript k, the feature vectors of the samples in the same batch that have the same label as v_k, and the feature vectors of the samples in the same batch other than v_k.
9. The micro expression recognition method based on the contrast amplifying network as claimed in claim 6, wherein: the cross entropy loss specific function is as follows:
where C is the number of micro-expression categories, y_c is the label value for class c, and p_c is the predicted probability of belonging to class c.
10. A micro expression recognition device based on a contrast amplifying network, characterized in that: it comprises a processor and a computer program stored on a memory and executable on the processor, wherein: the processor, when executing the program, implements the method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210605395.2A CN115049957B (en) | 2022-05-31 | 2022-05-31 | Micro-expression recognition method and device based on contrast amplification network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210605395.2A CN115049957B (en) | 2022-05-31 | 2022-05-31 | Micro-expression recognition method and device based on contrast amplification network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115049957A true CN115049957A (en) | 2022-09-13 |
CN115049957B CN115049957B (en) | 2024-06-07 |
Family
ID=83160189
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210605395.2A Active CN115049957B (en) | 2022-05-31 | 2022-05-31 | Micro-expression recognition method and device based on contrast amplification network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115049957B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180005272A1 (en) * | 2016-06-30 | 2018-01-04 | Paypal, Inc. | Image data detection for micro-expression analysis and targeted data services |
CN112200065A (en) * | 2020-10-09 | 2021-01-08 | 福州大学 | Micro-expression classification method based on action amplification and self-adaptive attention area selection |
CN113537008A (en) * | 2021-07-02 | 2021-10-22 | 江南大学 | Micro-expression identification method based on adaptive motion amplification and convolutional neural network |
CN114241573A (en) * | 2021-12-23 | 2022-03-25 | 华南师范大学 | Facial micro-expression recognition method and device, electronic equipment and storage medium |
-
2022
- 2022-05-31 CN CN202210605395.2A patent/CN115049957B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN115049957B (en) | 2024-06-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |