CN115049957A - Micro-expression identification method and device based on contrast amplification network - Google Patents

Micro-expression identification method and device based on contrast amplification network

Info

Publication number
CN115049957A
CN115049957A (application CN202210605395.2A; granted publication CN115049957B)
Authority
CN
China
Prior art keywords
micro
sample
expression
contrast
source
Prior art date
Legal status
Granted
Application number
CN202210605395.2A
Other languages
Chinese (zh)
Other versions
CN115049957B (en)
Inventor
郑文明
魏梦婷
宗源
江星洵
刘佳腾
薛云龙
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210605395.2A
Publication of CN115049957A
Application granted
Publication of CN115049957B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/71: Indexing; Data structures therefor; Storage structures
    • G06F 16/75: Clustering; Classification
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Arrangements using pattern recognition or machine learning, using neural networks
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174: Facial expression recognition
    • G06V 40/176: Dynamic expression
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20112: Image segmentation details
    • G06T 2207/20132: Image cropping
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30196: Human being; Person
    • G06T 2207/30201: Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a micro-expression identification method and device based on a contrast amplification network. The method comprises the following steps: (1) acquiring a micro-expression database; (2) converting each micro-expression video into a micro-expression frame sequence, preprocessing it, and sampling frames from it as source samples; (3) for each source sample, calculating the distance between the source sample and the remaining frames in an embedding space and mapping the distances into probabilities to obtain a distance probability distribution; (4) sampling a plurality of frames from the remaining video frames as negative samples according to the probability distribution; (5) constructing a contrast amplification network; (6) applying data enhancement to each source sample to form an anchor sample and a positive sample, and inputting the anchor sample, the positive sample and the corresponding negative samples into the contrast amplification network as training samples, the loss function being the sum of the intra-video contrast loss, the inter-class contrast loss and the cross-entropy loss; (7) preprocessing the micro-expression video to be identified and inputting it into the trained contrast amplification network to identify the micro-expression category. The invention achieves higher accuracy and is more convenient.

Description

Micro-expression identification method and device based on contrast amplification network
Technical Field
The invention relates to an image processing technology, in particular to a micro-expression identification method and device based on a contrast amplification network.
Background
Micro-expressions are spontaneous, transient facial expressions that reveal the genuine emotional state a person tries to conceal. Recognizing micro-expressions is therefore of great significance for affective computing and psychological treatment. However, micro-expressions are produced by subtle facial movements and last far shorter than macro facial expressions, which makes automatic micro-expression recognition a challenging task.
Micro-expressions are a dynamic process of facial muscle movement, and capturing this motion is essential for identifying them accurately. However, because the intensity of micro-expressions is low, extracting motion features is very difficult. To address this problem, many methods employ motion magnification techniques to amplify the micro-expression and make the facial motion more pronounced. Conventional magnification techniques and deep-learning filters, such as Eulerian motion magnification (EMM), global Lagrangian motion magnification (GLMM) and learning-based motion magnification (LMM), have demonstrated that amplifying micro-expression intensity can further improve recognition performance. Despite this advantage, existing magnification methods are not fully adaptive: different subjects exhibit different expression states and inconsistent trends of facial muscle movement, so a single magnification level cannot fit all micro-expression samples. For example, the same magnification factor may be insufficient for one micro-expression sample, leaving its emotion category hard to distinguish, while being excessive for another sample and introducing noise. Therefore, the accuracy of existing magnification-based micro-expression recognition still needs to be improved.
Disclosure of Invention
Purpose of the invention: aiming at the problems in the prior art, the invention provides a micro-expression identification method and device based on a contrast amplification network with higher accuracy.
The technical scheme is as follows: the micro-expression identification method based on the contrast amplifying network comprises the following steps:
(1) acquiring a micro-expression database, wherein the micro-expression database comprises a plurality of micro-expression videos and corresponding micro-expression category labels;
(2) converting each micro-expression video in the database into a micro-expression frame sequence, preprocessing the micro-expression frame sequence, and sampling a fixed number of frames from each micro-expression frame sequence as source samples;
(3) for each source sample, calculating the distance, in an embedding space, between the source sample and the remaining frames of the frame sequence in which it is located, and mapping the distances into probabilities to obtain the distance probability distribution between each source sample and the remaining frames of the sequence;
(4) for each source sample, sampling a plurality of frames from the rest video frames according to the calculated probability distribution as corresponding negative samples;
(5) constructing a contrast amplification network, which comprises a Resnet-18 without its fully connected layer, a multi-layer perceptron connected behind the Resnet-18, and a softmax layer connected behind the multi-layer perceptron;
(6) performing data enhancement on each source sample in the micro-expression frame sequence to form a pair of samples, namely an anchor sample and a positive sample, inputting the anchor sample, the positive sample and a corresponding negative sample as training samples into a contrast amplification network for training, wherein a loss function during training is the sum of contrast loss in a video, contrast loss between classes and cross entropy loss, and optimizing the training through gradient descent;
(7) preprocessing the micro-expression video to be identified according to step (2), and inputting the preprocessed video into the trained contrast amplification network to identify the micro-expression category.
Further, the preprocessing in the step (2) includes face registration and face region cropping.
Further, the sampling a fixed number of frames from each micro-expression frame sequence as a source sample in step (2) specifically includes:
for a preprocessed micro-expression frame sequence, sampling N frames as source samples according to uniform distribution, wherein the form of uniform distribution is as follows:
f(n) = 1/N, n = 1, 2, ..., N
where f(n) represents the probability density function for sampling the source samples, n is the index of a sampled source sample, and N is the number of sampled source samples.
Further, the step (3) specifically comprises:
(3-1) for each micro-expression frame sequence, extracting the feature vectors of the source samples and of the remaining frames with a Resnet-18 network pre-trained on the macro-expression database FER+:
u_i = g(x_i), i = 1, ..., N
u_i^j = g(x_i^j), j = 1, ..., N_i
where g(·) is the pre-trained neural network, N is the number of source samples, x_i and x_i^j denote the i-th source sample and the j-th remaining frame of x_i, u_i and u_i^j denote the feature vectors of x_i and x_i^j, and N_i denotes the number of remaining frames of x_i;
(3-2) for each micro-expression frame sequence, calculating the distance in the embedding space between the feature vectors of the remaining frames and of the source samples according to the following formula:
d_i^j = ||u_i - u_i^j||
where d_i^j represents the distance between the i-th source sample and its j-th remaining frame in the embedding space;
(3-3) for each micro-expression frame sequence, mapping the different distances into probabilities with a softmax function according to the following formulas, thereby obtaining the distance probability distribution between each source sample and the remaining frames:
p_i^j = exp(d_i^j) / Σ_{m=1}^{N_i} exp(d_i^m)
P_i = [p_i^1, p_i^2, ..., p_i^{N_i}]
where p_i^j represents the probability that d_i^j is mapped to, and P_i represents the distance probability distribution between the i-th source sample and its remaining frames.
Further, the step (4) specifically comprises: for each source sample, sampling from its distance probability distribution and taking the video frames corresponding to the sampled probabilities as negative samples.
Further, the loss function in step (6) is specifically:
L = λ_1 L_vid + λ_2 L_cls + λ_3 L_ce
where L is the total loss, λ_1, λ_2, λ_3 are the weights of the three loss terms, L_vid is the intra-video contrast loss, L_cls is the inter-class contrast loss, and L_ce is the cross-entropy loss.
Further, the specific function of the intra-video contrast loss is as follows:
L_vid = - Σ_{k∈I} (1/N) Σ_{i=1}^{N} log[ exp(sim(z_a(k,i), z_p(k,i))/τ) / ( exp(sim(z_a(k,i), z_p(k,i))/τ) + Σ_{j=1}^{S} exp(sim(z_a(k,i), z_n(k,i,j))/τ) ) ]
where I is the micro-expression database; z_p(k,i), z_a(k,i) and z_n(k,i,j) are the feature vectors output by the multi-layer perceptron of the contrast amplification network for, respectively, the positive sample, the anchor sample and the j-th negative sample obtained by data enhancement from the i-th source sample of the k-th micro-expression video in I; sim(·,·) is the similarity in the embedding space; τ is the temperature coefficient; S is the number of negative samples; and N is the number of source samples.
Further, the specific function of the inter-class contrast loss is as follows:
L_cls = Σ_{k∈I} ( -1/|P(k)| ) Σ_{p∈P(k)} log[ exp(sim(v_k, v_p)/τ) / Σ_{a∈A(k)} exp(sim(v_k, v_a)/τ) ]
where I is the micro-expression database; P(k) ≡ {p ∈ A(k) : y_p = y_k} denotes the set of samples in A(k) that carry the same micro-expression category label as the current sample k; A(k) denotes the set formed by the positive sample of the current sample and the corresponding negative samples; v = LSTM(z_1, z_2, ..., z_N) denotes the video-level feature vector obtained by integrating z_1, z_2, ..., z_N with a long short-term memory (LSTM) network; z_1, z_2, ..., z_N are the feature vectors output by the multi-layer perceptron when the 1st, 2nd, ..., N-th source samples are input into the contrast amplification network; τ is the temperature coefficient; and v_k, v_p, v_a denote, respectively, the LSTM-integrated feature vector of the current video sample k, the feature vectors of the samples in the same batch that share the label of k, and the feature vectors of the other samples in the same batch.
Further, the cross entropy loss specific function is as follows:
L_ce = - Σ_{c=1}^{C} y_c log(p_c)
where C is the number of micro-expression categories, y_c is the label value of class c, and p_c is the probability of belonging to class c.
The micro-expression recognition device based on the contrast amplification network comprises a processor and a computer program stored on a memory and executable on the processor, and the processor implements the above method when executing the program.
Beneficial effects: compared with the prior art, the invention has the following remarkable advantages. By adopting the idea of contrastive learning and constructing positive and negative samples, the network optimization process concentrates on distinguishing the differences between positive and negative samples, which enlarges their distance in the embedding space, makes the intensity contrast and the category contrast more obvious, and directly yields more discriminative features. The network therefore perceives motion changes more smoothly, the recognition accuracy is improved, fewer hyper-parameters need to be set manually, and the method is more convenient.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a micro expression recognition method based on a contrast amplifying network according to the present invention;
FIG. 2 is a schematic diagram of a contrast amplifying network designed by the present invention;
fig. 3 is a schematic diagram of a negative example generation process.
Detailed Description
The embodiment provides a micro-expression recognition method based on a contrast amplification network, as shown in FIG. 1, including:
(1) Acquiring a micro-expression database, which comprises a plurality of micro-expression videos and the corresponding micro-expression category labels. To improve recognition accuracy, multiple public databases may be employed.
(2) Each micro-expression video in the database is converted into a micro-expression frame sequence, the micro-expression frame sequence is preprocessed, and a fixed number of frames are sampled from each micro-expression frame sequence to serve as source samples.
Wherein the preprocessing comprises face registration and face region cropping. The sampling of a fixed number of frames from each micro-expression frame sequence as source samples specifically comprises:
for a preprocessed micro-expression frame sequence, sampling N frames as source samples according to uniform distribution, wherein the form of uniform distribution is as follows:
f(n) = 1/N, n = 1, 2, ..., N
where f(n) represents the probability density function for sampling the source samples, n is the index of a sampled source sample, and N is the number of sampled source samples.
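For illustration, a minimal PyTorch sketch of this uniform source-frame sampling could look like the following; the tensor layout (T, C, H, W), sampling without replacement and the default of 8 source frames are assumptions rather than details fixed by the patent text.

```python
import torch

def sample_source_frames(frames: torch.Tensor, n_sources: int = 8, seed: int = 0):
    """Uniformly sample n_sources source frames from a preprocessed video tensor.

    frames: tensor of shape (T, C, H, W) holding the micro-expression frame sequence.
    Returns the source frames, the remaining frames (the candidate pool for
    negative sampling in steps (3)-(4)), and the sampled indices.
    """
    gen = torch.Generator().manual_seed(seed)
    t = frames.shape[0]
    probs = torch.full((t,), 1.0 / t)              # uniform probability over all frames
    idx = torch.multinomial(probs, n_sources, replacement=False, generator=gen)
    idx, _ = torch.sort(idx)                       # keep the temporal order of the sources
    mask = torch.ones(t, dtype=torch.bool)
    mask[idx] = False
    return frames[idx], frames[mask], idx
```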
(3) For each source sample, calculating the distance in the embedding space between the source sample and the remaining frames of its frame sequence, and mapping the distances into probabilities to obtain the distance probability distribution between each source sample and the remaining frames.
The method specifically comprises the following steps:
(3-1) for each micro-expression frame sequence, extracting the feature vectors of the source samples and of the remaining frames with a pre-trained neural network:
u_i = g(x_i), i = 1, ..., N
u_i^j = g(x_i^j), j = 1, ..., N_i
where g(·) is a neural network, which may be, for example, a Resnet-18 pre-trained on the macro-expression dataset FER+; N denotes the number of source samples, x_i and x_i^j denote the i-th source sample and the j-th remaining frame of x_i, u_i and u_i^j denote their feature vectors, and N_i denotes the number of remaining frames of x_i;
(3-2) for each micro-expression frame sequence, calculating the distance in the embedding space between the feature vectors of the remaining frames and of the source samples according to the following formula:
d_i^j = ||u_i - u_i^j||
where d_i^j represents the distance between the i-th source sample and its j-th remaining frame in the embedding space;
(3-3) for each micro-expression frame sequence, mapping the different distances into probabilities with a softmax function according to the following formulas, thereby obtaining the distance probability distribution between each source sample and the remaining frames:
p_i^j = exp(d_i^j) / Σ_{m=1}^{N_i} exp(d_i^m)
P_i = [p_i^1, p_i^2, ..., p_i^{N_i}]
where p_i^j represents the probability that d_i^j is mapped to, and P_i represents the distance probability distribution between the i-th source sample and its remaining frames.
In this way, a probability distribution over all the remaining video frames, with the distance to the source sample as the discrete variable, is obtained: the farther a frame is from the source sample at the feature level, the larger its probability of being selected as a negative sample. Within the same micro-expression sequence, the difference between frames is in fact reflected only in the intensity information, and the larger the distance between frames, the more obvious the intensity difference. Computing the distances between the source sample and the remaining video frames therefore ensures that, when negative samples are drawn, the video frames that are farther away, i.e. those with a larger intensity difference, are sampled with higher probability.
(4) For each source sample, a number of frames are sampled from the remaining video frames as corresponding negative samples according to the calculated probability distribution, as shown in fig. 3.
The step (4) specifically comprises: for each source sample, sampling from its distance probability distribution and taking the video frames corresponding to the sampled probabilities as negative samples.
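A minimal sketch of steps (3)-(4), assuming a generic pre-trained feature extractor g (for example the FER+-pre-trained Resnet-18 mentioned above), Euclidean distance in the embedding space and a fixed number of negatives per source sample; the distance metric and tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_negatives(g, source_frame, remaining_frames, n_negatives=4):
    """Sample negative frames for one source frame.

    g: pre-trained feature extractor mapping (B, C, H, W) -> (B, D).
    source_frame: (C, H, W); remaining_frames: (M, C, H, W).
    Frames that lie farther from the source in the embedding space receive a
    larger probability, so the drawn negatives tend to differ more in intensity.
    """
    u = g(source_frame.unsqueeze(0))              # (1, D) embedding of the source sample
    u_rest = g(remaining_frames)                  # (M, D) embeddings of the remaining frames
    d = torch.norm(u_rest - u, dim=1)             # distances (Euclidean here, an assumption)
    p = F.softmax(d, dim=0)                       # softmax maps distances to probabilities
    idx = torch.multinomial(p, n_negatives, replacement=False)
    return remaining_frames[idx], p
```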
(5) A contrast amplification network is constructed, as shown in FIG. 2, comprising a Resnet-18 without its fully connected layer, a multi-layer perceptron connected behind it, and a softmax layer connected behind the multi-layer perceptron.
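A sketch of such a network in PyTorch is given below; the 128-dimensional projection, the hidden width of the perceptron and attaching the classifier to the projected features (rather than to the backbone features) are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
from torchvision import models

class ContrastAmplificationNet(nn.Module):
    """Resnet-18 without its fully connected layer (Enc), a multi-layer
    perceptron head (Proj) and a softmax classification layer."""

    def __init__(self, num_classes: int = 5, proj_dim: int = 128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                        # remove the fully connected layer
        self.enc = backbone                                # Enc(.): 512-d backbone features
        self.proj = nn.Sequential(                         # Proj(.): projection used by the contrast losses
            nn.Linear(512, 512), nn.ReLU(inplace=True), nn.Linear(512, proj_dim))
        self.classifier = nn.Linear(proj_dim, num_classes)

    def forward(self, x):
        h = self.enc(x)                                    # (B, 512)
        z = self.proj(h)                                   # (B, proj_dim)
        logits = self.classifier(z)
        probs = torch.softmax(logits, dim=1)               # softmax layer for classification
        return z, logits, probs
```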
(6) Data enhancement is applied to each source sample in the micro-expression frame sequence to form a pair of samples, namely an anchor sample and a positive sample. The anchor sample, the positive sample and the corresponding negative samples are then input into the contrast amplification network as training samples. The loss function during training is the sum of the intra-video contrast loss, the inter-class contrast loss and the cross-entropy loss, and training is optimized by gradient descent.
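Before turning to the loss, the sketch below illustrates the data enhancement that turns one source sample into an anchor/positive pair, using the Gaussian noise and random grayscale transformation described further below; the noise level and the grayscale probability are assumed values.

```python
import torch

def make_anchor_positive(x: torch.Tensor, noise_std: float = 0.05, gray_p: float = 0.5):
    """Create two augmented views of a source frame x of shape (C, H, W).

    Both views keep the same intensity (motion) information and differ only in
    color/style, so they can serve as anchor and positive sample for each other.
    """
    def augment(img):
        out = img + noise_std * torch.randn_like(img)        # additive Gaussian noise
        if torch.rand(1).item() < gray_p:                     # random grayscale transformation
            out = out.mean(dim=0, keepdim=True).expand_as(out).clone()
        return out.clamp(0.0, 1.0)
    return augment(x), augment(x)
```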
Wherein the loss function is specifically:
L = λ_1 L_vid + λ_2 L_cls + λ_3 L_ce
where L is the total loss, λ_1, λ_2, λ_3 are the weights of the three loss terms, L_vid is the intra-video contrast loss, L_cls is the inter-class contrast loss, and L_ce is the cross-entropy loss.
Each source sample obtained by sampling the micro-expression frame sequence is denoted x, and data enhancement produces a pair of samples x' and x''. In the actual calculation, x' and x'' serve as anchor and positive sample for each other: when x' is used as the anchor sample, x'' is the positive sample, and vice versa. The anchor sample and the positive sample should carry the same intensity information and may differ only in color, style and so on, so Gaussian noise and random grayscale transformation are selected for data enhancement. The Resnet-18 without the fully connected layer in the contrast amplification network is denoted Enc(·), and the multi-layer perceptron is denoted Proj(·). The enhanced samples and all the negative samples corresponding to the source sample are fed into the contrast amplification network, and the loss function computed at the feature level is the intra-video contrast loss, whose specific function is as follows:
L_vid = - Σ_{k∈I} (1/N) Σ_{i=1}^{N} log[ exp(sim(z_a(k,i), z_p(k,i))/τ) / ( exp(sim(z_a(k,i), z_p(k,i))/τ) + Σ_{j=1}^{S} exp(sim(z_a(k,i), z_n(k,i,j))/τ) ) ]
where I is the micro-expression database; z_p(k,i), z_a(k,i) and z_n(k,i,j) are the feature vectors output by the multi-layer perceptron of the contrast amplification network for, respectively, the positive sample, the anchor sample and the j-th negative sample obtained by data enhancement from the i-th source sample of the k-th micro-expression video in I; sim(·,·) is the similarity in the embedding space; τ is the temperature coefficient; S is the number of negative samples; and N is the number of source samples.
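An illustrative PyTorch sketch of an intra-video contrast loss of this form is shown below; L2-normalizing the projections and using the dot product as the similarity are assumptions, since the exact similarity measure is not reproduced here.

```python
import torch
import torch.nn.functional as F

def intra_video_contrast_loss(z_anchor, z_pos, z_neg, tau: float = 0.07):
    """InfoNCE-style intra-video contrast loss for one batch of source samples.

    z_anchor, z_pos: (B, D) projections of the anchor and positive views.
    z_neg: (B, S, D) projections of the S negatives sampled for each source sample.
    """
    za = F.normalize(z_anchor, dim=1)
    zp = F.normalize(z_pos, dim=1)
    zn = F.normalize(z_neg, dim=2)
    pos = torch.sum(za * zp, dim=1, keepdim=True) / tau        # (B, 1) anchor-positive similarity
    neg = torch.bmm(zn, za.unsqueeze(2)).squeeze(2) / tau      # (B, S) anchor-negative similarities
    logits = torch.cat([pos, neg], dim=1)                      # the positive sits at index 0
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)                     # -log softmax of the positive term
```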
The specific function of the inter-class contrast loss is as follows:
L_cls = Σ_{k∈I} ( -1/|P(k)| ) Σ_{p∈P(k)} log[ exp(sim(v_k, v_p)/τ) / Σ_{a∈A(k)} exp(sim(v_k, v_a)/τ) ]
where I is the micro-expression database; P(k) ≡ {p ∈ A(k) : y_p = y_k} denotes the set of samples in A(k) that carry the same micro-expression category label as the current sample k; A(k) denotes the set formed by the positive sample of the current sample and the corresponding negative samples; v = LSTM(z_1, z_2, ..., z_N) denotes the video-level feature vector obtained by integrating z_1, z_2, ..., z_N with a long short-term memory (LSTM) network; z_1, z_2, ..., z_N are the feature vectors output by the multi-layer perceptron when the 1st, 2nd, ..., N-th source samples are input into the contrast amplification network; τ is the temperature coefficient; and v_k, v_p, v_a denote, respectively, the LSTM-integrated feature vector of the current video sample k, the feature vectors of the samples in the same batch that share the label of k, and the feature vectors of the other samples in the same batch.
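The inter-class term can be sketched in this supervised-contrastive style as follows: the N per-source-sample projections of a video are first integrated into one video-level vector v with an LSTM, and the videos in the same batch that share a label act as positives. The LSTM hidden size, using its last hidden state as v, and treating all other batch samples as the comparison set A(k) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoAggregator(nn.Module):
    """Integrate the projections z_1,...,z_N of one video into a single vector v."""
    def __init__(self, dim: int = 128, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)

    def forward(self, z_seq):                     # z_seq: (B, N, dim)
        _, (h_n, _) = self.lstm(z_seq)
        return h_n[-1]                            # (B, hidden): last hidden state as v

def inter_class_contrast_loss(v, labels, tau: float = 0.07):
    """Supervised contrastive loss over video-level vectors v (B, D) and labels (B,)."""
    v = F.normalize(v, dim=1)
    b = v.size(0)
    sim = v @ v.t() / tau                                          # (B, B) pairwise similarities
    self_mask = torch.eye(b, dtype=torch.bool, device=v.device)
    sim = sim.masked_fill(self_mask, float('-inf'))                # a sample is not its own contrast
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)                # avoid -inf * 0 in the sum below
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask   # P(k): same-label pairs
    n_pos = pos_mask.sum(dim=1).clamp(min=1)
    per_sample = -(log_prob * pos_mask.float()).sum(dim=1) / n_pos
    return per_sample[pos_mask.any(dim=1)].mean()                  # skip samples without a positive
```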
The contrast losses alone do not let the network know the specific class of a sample, so a softmax classifier with a cross-entropy loss is needed to guide the classification; the specific function is as follows:
L_ce = - Σ_{c=1}^{C} y_c log(p_c)
where C is the number of micro-expression categories, y_c is the label value of class c, and p_c is the probability of belonging to class c.
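Finally, the three terms are combined as the weighted sum given in step (6); the sketch below reuses the loss functions from the earlier sketches, and the default weights λ_1 = λ_2 = λ_3 = 1 are assumptions.

```python
import torch.nn.functional as F

def total_loss(z_anchor, z_pos, z_neg, v, labels, logits,
               lam1: float = 1.0, lam2: float = 1.0, lam3: float = 1.0, tau: float = 0.07):
    """Weighted sum of intra-video contrast, inter-class contrast and cross-entropy losses."""
    l_vid = intra_video_contrast_loss(z_anchor, z_pos, z_neg, tau)   # from the earlier sketch
    l_cls = inter_class_contrast_loss(v, labels, tau)                # from the earlier sketch
    l_ce = F.cross_entropy(logits, labels)                           # softmax cross-entropy on class logits
    return lam1 * l_vid + lam2 * l_cls + lam3 * l_ce
```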
(7) The micro-expression video to be identified is preprocessed according to step (2) and input into the trained contrast amplification network to identify the micro-expression category.
The embodiment also provides a micro-expression recognition device based on the contrast amplification network, which comprises a processor and a computer program stored on a memory and executable on the processor, the processor implementing the above method when executing the program.
To verify the effectiveness of the present invention, micro-expression recognition is performed on the CASME2 micro-expression database, the SAMM micro-expression database and the HS sub-database of the SMIC database. The results and the comparison with other recent methods are shown in Table 1:
TABLE 1
Method           Year   CASME2   SAMM    SMIC-HS
LBP-SIP          2014   45.36    36.76   42.12
MagGA            2018   63.30    N/A     N/A
DSSN             2019   70.78    57.35   63.41
TSCNN-I          2020   74.05    63.53   72.74
LBPAccP u2       2021   69.03    N/A     76.59
AU-GCN           2021   74.27    74.26   N/A
Proposed method  2022   79.03    77.21   77.91
In Table 1, N/A indicates that there is no relevant record.
The CASME2 database is processed as follows: categories with fewer than 10 samples are omitted to avoid severe sample imbalance, and a 5-class recognition task is completed over happiness, repression, disgust, fear and surprise. The SAMM database is processed in the same way: classes with fewer than 10 samples are omitted to avoid severe sample imbalance, and a 5-class recognition task is completed over happiness, anger, disgust, fear and surprise. The SMIC database is divided into positive, negative and surprise, and a three-class recognition task is completed.
The experimental results show that the proposed micro-expression recognition method achieves a higher recognition rate. Compared with traditional micro-expression magnification approaches, it avoids the complexity of manually setting some hyper-parameters, adapts better to individual subjects, and is more convenient.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A micro expression recognition method based on a contrast amplifying network is characterized by comprising the following steps:
(1) acquiring a micro-expression database, wherein the micro-expression database comprises a plurality of micro-expression videos and corresponding micro-expression category labels;
(2) converting each micro-expression video in the database into a micro-expression frame sequence, preprocessing the micro-expression frame sequence, and sampling a fixed number of frames from each micro-expression frame sequence as source samples;
(3) for each source sample, calculating the distance, in an embedding space, between the source sample and the remaining frames of the frame sequence in which it is located, and mapping the distances into probabilities to obtain the distance probability distribution between each source sample and the remaining frames of the sequence;
(4) for each source sample, sampling a plurality of frames from the remaining video frames as corresponding negative samples according to the calculated probability distribution;
(5) constructing a contrast amplification network, which comprises a Resnet-18 without its fully connected layer, a multi-layer perceptron connected behind the Resnet-18, and a softmax layer connected behind the multi-layer perceptron;
(6) performing data enhancement on each source sample in the micro-expression frame sequence to form a pair of samples, namely an anchor sample and a positive sample, inputting the anchor sample, the positive sample and a corresponding negative sample as training samples into a contrast amplification network for training, wherein a loss function during training is the sum of contrast loss in a video, contrast loss between classes and cross entropy loss, and optimizing the training through gradient descent;
(7) preprocessing the micro-expression video to be identified according to step (2), and inputting the preprocessed video into the trained contrast amplification network to identify the micro-expression category.
2. The micro-expression recognition method based on the contrast amplification network according to claim 1, wherein: the preprocessing in the step (2) comprises face registration and face region cropping.
3. The micro-expression recognition method based on the contrast amplification network according to claim 1, wherein the sampling of a fixed number of frames from each micro-expression frame sequence as source samples in the step (2) specifically comprises:
for a preprocessed micro-expression frame sequence, sampling N frames as source samples according to uniform distribution, wherein the form of uniform distribution is as follows:
f(n) = 1/N, n = 1, 2, ..., N
where f(n) represents the probability density function for sampling the source samples, n is the index of a sampled source sample, and N is the number of sampled source samples.
4. The micro-expression recognition method based on the contrast amplification network according to claim 1, wherein the step (3) specifically comprises:
(3-1) for each micro-expression frame sequence, extracting the feature vectors of the source samples and of the remaining frames with a pre-trained neural network:
u_i = g(x_i), i = 1, ..., N
u_i^j = g(x_i^j), j = 1, ..., N_i
where g(·) is a neural network, N represents the number of source samples, x_i and x_i^j denote the i-th source sample and the j-th remaining frame of x_i, u_i and u_i^j denote their feature vectors, and N_i denotes the number of remaining frames of x_i;
(3-2) for each micro-expression frame sequence, calculating the distance in the embedding space between the feature vectors of the remaining frames and of the source samples according to the following formula:
d_i^j = ||u_i - u_i^j||
where d_i^j represents the distance between the i-th source sample and its j-th remaining frame in the embedding space;
(3-3) for each micro-expression frame sequence, mapping the different distances into probabilities with a softmax function according to the following formulas, thereby obtaining the distance probability distribution between each source sample and the remaining frames:
p_i^j = exp(d_i^j) / Σ_{m=1}^{N_i} exp(d_i^m)
P_i = [p_i^1, p_i^2, ..., p_i^{N_i}]
where p_i^j represents the probability that d_i^j is mapped to, and P_i represents the distance probability distribution between the i-th source sample and its remaining frames.
5. The micro expression recognition method based on the contrast amplifying network according to claim 1, wherein: the step (4) specifically comprises the following steps:
for each source sample, sampling from its distance probability distribution and taking the video frames corresponding to the sampled probabilities as negative samples.
6. The micro expression recognition method based on the contrast amplifying network according to claim 1, wherein: the loss function in the step (6) is specifically:
L = λ_1 L_vid + λ_2 L_cls + λ_3 L_ce
where L is the total loss, λ_1, λ_2, λ_3 are the weights of the three loss terms, L_vid is the intra-video contrast loss, L_cls is the inter-class contrast loss, and L_ce is the cross-entropy loss.
7. The micro expression recognition method based on the contrast amplifying network as claimed in claim 6, wherein: the specific function of contrast loss in the video is as follows:
L_vid = - Σ_{k∈I} (1/N) Σ_{i=1}^{N} log[ exp(sim(z_a(k,i), z_p(k,i))/τ) / ( exp(sim(z_a(k,i), z_p(k,i))/τ) + Σ_{j=1}^{S} exp(sim(z_a(k,i), z_n(k,i,j))/τ) ) ]
where I is the micro-expression database; z_p(k,i), z_a(k,i) and z_n(k,i,j) are the feature vectors output by the multi-layer perceptron of the contrast amplification network for, respectively, the positive sample, the anchor sample and the j-th negative sample obtained by data enhancement from the i-th source sample of the k-th micro-expression video in I; sim(·,·) is the similarity in the embedding space; τ is the temperature coefficient; S is the number of negative samples; and N is the number of source samples.
8. The micro-expression recognition method based on the contrast amplification network according to claim 6, wherein the specific function of the inter-class contrast loss is as follows:
L_cls = Σ_{k∈I} ( -1/|P(k)| ) Σ_{p∈P(k)} log[ exp(sim(v_k, v_p)/τ) / Σ_{a∈A(k)} exp(sim(v_k, v_a)/τ) ]
where I is the micro-expression database; P(k) ≡ {p ∈ A(k) : y_p = y_k} denotes the set of samples in A(k) that carry the same micro-expression category label as the current sample k; A(k) denotes the set formed by the positive sample of the current sample and the corresponding negative samples; v = LSTM(z_1, z_2, ..., z_N) denotes the video-level feature vector obtained by integrating z_1, z_2, ..., z_N with a long short-term memory (LSTM) network; z_1, z_2, ..., z_N are the feature vectors output by the multi-layer perceptron when the 1st, 2nd, ..., N-th source samples are input into the contrast amplification network; τ is the temperature coefficient; and v_k, v_p, v_a denote, respectively, the LSTM-integrated feature vector of the current video sample k, the feature vectors of the samples in the same batch that share the label of k, and the feature vectors of the other samples in the same batch.
9. The micro expression recognition method based on the contrast amplifying network as claimed in claim 6, wherein: the cross entropy loss specific function is as follows:
L_ce = - Σ_{c=1}^{C} y_c log(p_c)
where C is the number of micro-expression categories, y_c is the label value of class c, and p_c is the probability of belonging to class c.
10. A micro-expression recognition device based on a contrast amplification network, characterized by comprising a processor and a computer program stored on a memory and executable on the processor, wherein: the processor, when executing the program, implements the method of any one of claims 1-9.
CN202210605395.2A 2022-05-31 2022-05-31 Micro-expression recognition method and device based on contrast amplification network Active CN115049957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210605395.2A CN115049957B (en) 2022-05-31 2022-05-31 Micro-expression recognition method and device based on contrast amplification network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210605395.2A CN115049957B (en) 2022-05-31 2022-05-31 Micro-expression recognition method and device based on contrast amplification network

Publications (2)

Publication Number Publication Date
CN115049957A true CN115049957A (en) 2022-09-13
CN115049957B CN115049957B (en) 2024-06-07

Family

ID=83160189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210605395.2A Active CN115049957B (en) 2022-05-31 2022-05-31 Micro-expression recognition method and device based on contrast amplification network

Country Status (1)

Country Link
CN (1) CN115049957B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180005272A1 (en) * 2016-06-30 2018-01-04 Paypal, Inc. Image data detection for micro-expression analysis and targeted data services
CN112200065A (en) * 2020-10-09 2021-01-08 福州大学 Micro-expression classification method based on action amplification and self-adaptive attention area selection
CN113537008A (en) * 2021-07-02 2021-10-22 江南大学 Micro-expression identification method based on adaptive motion amplification and convolutional neural network
CN114241573A (en) * 2021-12-23 2022-03-25 华南师范大学 Facial micro-expression recognition method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN115049957B (en) 2024-06-07

Similar Documents

Publication Publication Date Title
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN109101938B (en) Multi-label age estimation method based on convolutional neural network
CN108171318B (en) Convolution neural network integration method based on simulated annealing-Gaussian function
CN114038037B (en) Expression label correction and identification method based on separable residual error attention network
CN108304823A (en) A kind of expression recognition method based on two-fold product CNN and long memory network in short-term
CN114176607B (en) Electroencephalogram signal classification method based on vision transducer
CN110263174B (en) Topic category analysis method based on focus attention
CN112257449A (en) Named entity recognition method and device, computer equipment and storage medium
CN117198468B (en) Intervention scheme intelligent management system based on behavior recognition and data analysis
CN113112994B (en) Cross-corpus emotion recognition method based on graph convolution neural network
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
CN113243924A (en) Identity recognition method based on electroencephalogram signal channel attention convolution neural network
CN110991554B (en) Improved PCA (principal component analysis) -based deep network image classification method
CN117333146A (en) Manpower resource management system and method based on artificial intelligence
CN114511912A (en) Cross-library micro-expression recognition method and device based on double-current convolutional neural network
CN112132257A (en) Neural network model training method based on pyramid pooling and long-term memory structure
Okokpujie et al. Predictive modeling of trait-aging invariant face recognition system using machine learning
CN113762005B (en) Feature selection model training and object classification methods, devices, equipment and media
CN118014545A (en) Recruitment interview AI scoring algorithm
CN118037423A (en) Method and system for evaluating repayment willingness of farmers after agricultural loans
CN113535928A (en) Service discovery method and system of long-term and short-term memory network based on attention mechanism
CN112084944A (en) Method and system for identifying dynamically evolved expressions
CN112863650A (en) Cardiomyopathy identification system based on convolution and long-short term memory neural network
CN115049957B (en) Micro-expression recognition method and device based on contrast amplification network
CN116047418A (en) Multi-mode radar active deception jamming identification method based on small sample

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant