CN112800998A - Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA - Google Patents


Info

Publication number
CN112800998A
CN112800998A (application CN202110159085.8A; granted publication CN112800998B)
Authority
CN
China
Prior art keywords
emotion
expression
electroencephalogram
vector
feature vector
Prior art date
Legal status: Granted
Application number
CN202110159085.8A
Other languages
Chinese (zh)
Other versions
CN112800998B (en)
Inventor
卢官明 (Lu Guanming)
朱清扬 (Zhu Qingyang)
卢峻禾 (Lu Junhe)
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority: CN202110159085.8A
Publication of CN112800998A
Application granted
Publication of CN112800998B
Legal status: Active

Classifications

    • G06F 2218/00: Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/08: Feature extraction
    • G06F 2218/12: Classification; Matching
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Combinations of networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a multi-modal emotion recognition method and system fusing an attention mechanism and discriminative multi-set canonical correlation analysis (DMCCA). The method comprises the following steps: extracting electroencephalogram signal features, peripheral physiological signal features and expression features from preprocessed electroencephalogram signals, peripheral physiological signals and facial expression videos, respectively; extracting discriminative electroencephalogram emotional features, peripheral physiological emotional features and expression emotional features by using an attention mechanism; obtaining electroencephalogram-peripheral physiological-expression multi-modal emotional features from the electroencephalogram, peripheral physiological and expression emotional features by using the DMCCA method; and classifying the multi-modal emotional features with a classifier. The method adopts the attention mechanism to selectively focus on the features with emotion-discriminating power in each modality and, combined with DMCCA, makes full use of the correlation and complementarity among the emotional features of different modalities, so that the accuracy and robustness of emotion recognition can be effectively improved.

Description

Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA
Technical Field
The invention relates to the technical field of emotion recognition and artificial intelligence, in particular to a multi-modal emotion recognition method and system that fuse an attention mechanism and discriminative multi-set canonical correlation analysis (DMCCA).
Background
Human emotion is a psychological and physiological state that accompanies conscious processes and plays an important role in interpersonal communication. With the continuous progress of technologies such as artificial intelligence, people pay increasing attention to more intelligent and humanized Human-Computer Interaction (HCI). Users increasingly expect machines to be able to perceive, understand and even express emotion, so as to realize humanized human-computer interaction and better serve human beings. Emotion recognition is a branch of affective computing and a fundamental, core technology for realizing human-computer emotional interaction; it has become a research hotspot in fields such as computer science, cognitive science and artificial intelligence, and has attracted wide attention from academia and industry. For example, in clinical care, if the emotional state of a patient, especially a patient with an expression disorder, can be known, different care measures can be taken to improve the quality of care. There is also growing interest in the psychological and behavioral monitoring of patients with mental disorders, friendly human-machine interaction for emotional robots, and the like.
In the past, many studies on emotion recognition focused on recognizing human emotional states from a single modality, such as speech-based emotion recognition and facial-expression-based emotion recognition. The emotional information conveyed by speech or expression alone is incomplete and easily affected by external factors: facial expression recognition is sensitive to occlusion and illumination changes, while speech-based emotion recognition is affected by environmental noise and by voice differences between subjects. In addition, people sometimes put on a fake smile or deliberately keep silent to conceal their real emotions; in such cases facial expressions or body postures can be deceptive, and speech-based emotion recognition fails entirely when a person remains silent. Single-modality emotion recognition therefore has inherent limitations. As a result, more and more researchers are turning to emotion recognition based on multi-modal information fusion, expecting to exploit the complementarity between modalities to build robust emotion recognition models and achieve higher recognition accuracy.
At present, the more common information fusion strategies in multi-modal emotion recognition research are decision-level fusion and feature-level fusion. Decision-level fusion usually starts from the recognition result of each individual modality and then makes a decision according to certain rules, such as the Mean rule, the Sum rule, the Max rule, or a majority-voting mechanism, to obtain the final recognition result. Decision-level fusion takes into account the different contributions of different modalities to emotion recognition, but ignores the correlation between them; its performance depends not only on the recognition rate of each single modality but also on the decision-level fusion algorithm itself. Feature-level fusion combines the emotional features of multiple modalities into a fused feature vector and thus exploits the complementarity of features from different modalities; however, how to determine the weights of the features of different modalities so as to reflect their different contributions to emotion classification and recognition is the key to multi-modal feature fusion, and it remains an open and challenging problem.
Disclosure of Invention
Purpose of the invention: aiming at the low accuracy and poor robustness of single-modality emotion recognition and the shortcomings of existing multi-modal emotional feature fusion methods, the invention aims to provide a multi-modal emotion recognition method and system that fuse an attention mechanism and discriminative multi-set canonical correlation analysis (DMCCA).
Technical solution: to achieve the above purpose, the invention adopts the following technical solution:
a multi-modal emotion recognition method fusing an attention mechanism and DMCCA comprises the following steps:
(1) extracting electroencephalogram signal feature vectors and expression feature vectors from the preprocessed electroencephalogram signals and facial expression videos by using respective trained neural network models, and extracting peripheral physiological signal feature vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and statistical features thereof;
(2) mapping the electroencephalogram signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into a plurality of groups of feature vectors through linear transformation matrixes respectively, determining importance weights of different feature vector groups by using an attention mechanism module respectively, and forming an electroencephalogram emotion feature vector, a peripheral physiological emotion feature vector and an expression emotion feature vector which have the same dimension and are discriminating through weighting fusion;
(3) determining a projection matrix for each emotional feature vector by applying a discriminative multi-set canonical correlation analysis (DMCCA) method to the electroencephalogram emotional feature vector, the peripheral physiological emotional feature vector and the expression emotional feature vector and maximizing the correlation among the emotional features of different modalities for samples of the same class, projecting each emotional feature vector into a common subspace, and obtaining the electroencephalogram-peripheral physiological-expression multi-modal emotional feature vector after additive fusion;
(4) classifying the multi-modal emotional feature vectors by using a classifier to obtain the emotion category.
Further, the specific steps of extracting the discriminative electroencephalogram emotional features, peripheral physiological emotional features and expression emotional features by using attention mechanism modules in step (2) include:
(2.1) representing the electroencephalogram signal features extracted in step (1) in matrix form as F^(1), and mapping them through a linear transformation matrix W^(1) into M_1 groups of feature vectors, 4 ≤ M_1 ≤ 16, each group of feature vectors having dimension N, 16 ≤ N ≤ 64; let E^(1) = [e_1^(1), e_2^(1), …, e_{M_1}^(1)], then the linear transformation expression is:
E^(1) = (F^(1))^T W^(1)
wherein the superscript (1) denotes the electroencephalogram modality and T denotes transposition;
determining the importance weights of the different feature vector groups by using a first attention mechanism module, and forming the discriminative electroencephalogram emotional feature vector by weighted fusion, wherein the weight α_r^(1) of the r-th group of electroencephalogram signal feature vectors and the electroencephalogram emotional feature vector x^(1) are expressed as:
α_r^(1) = exp((u^(1))^T e_r^(1)) / Σ_{k=1}^{M_1} exp((u^(1))^T e_k^(1))
x^(1) = Σ_{r=1}^{M_1} α_r^(1) e_r^(1)
wherein r = 1, 2, …, M_1, e_r^(1) denotes the r-th group of electroencephalogram signal feature vectors, u^(1) is a trainable linear transformation parameter vector, and exp(·) denotes the exponential function with the natural constant e as its base;
(2.2) representing the peripheral physiological signal features extracted in step (1) in matrix form as F^(2), and mapping them through a linear transformation matrix W^(2) into M_2 groups of feature vectors, 4 ≤ M_2 ≤ 16; let E^(2) = [e_1^(2), e_2^(2), …, e_{M_2}^(2)], then the linear transformation expression is:
E^(2) = (F^(2))^T W^(2)
wherein the superscript (2) denotes the peripheral physiological modality;
determining the importance weights of the different feature vector groups by using a second attention mechanism module, and forming the discriminative peripheral physiological emotional feature vector by weighted fusion, wherein the weight α_s^(2) of the s-th group of peripheral physiological signal feature vectors and the peripheral physiological emotional feature vector x^(2) are expressed as:
α_s^(2) = exp((u^(2))^T e_s^(2)) / Σ_{k=1}^{M_2} exp((u^(2))^T e_k^(2))
x^(2) = Σ_{s=1}^{M_2} α_s^(2) e_s^(2)
wherein s = 1, 2, …, M_2, e_s^(2) denotes the s-th group of peripheral physiological signal feature vectors, and u^(2) is a trainable linear transformation parameter vector;
(2.3) representing the expression features extracted in step (1) in matrix form as F^(3), and mapping them through a linear transformation matrix W^(3) into M_3 groups of feature vectors, 4 ≤ M_3 ≤ 16; let E^(3) = [e_1^(3), e_2^(3), …, e_{M_3}^(3)], then the linear transformation expression is:
E^(3) = (F^(3))^T W^(3)
wherein the superscript (3) denotes the expression modality;
determining the importance weights of the different feature vector groups by using a third attention mechanism module, and forming the discriminative expression emotional feature vector by weighted fusion, wherein the weight α_t^(3) of the t-th group of expression feature vectors and the expression emotional feature vector x^(3) are expressed as:
α_t^(3) = exp((u^(3))^T e_t^(3)) / Σ_{k=1}^{M_3} exp((u^(3))^T e_k^(3))
x^(3) = Σ_{t=1}^{M_3} α_t^(3) e_t^(3)
wherein t = 1, 2, …, M_3, e_t^(3) denotes the t-th group of expression feature vectors, and u^(3) is a trainable linear transformation parameter vector.
Further, the step (3) specifically comprises the following sub-steps:
(3.1) obtaining the trained DMCCA projection matrices Ω ∈ ℝ^(N×d), Φ ∈ ℝ^(N×d) and Ψ ∈ ℝ^(N×d) corresponding respectively to the electroencephalogram emotional features, the peripheral physiological emotional features and the expression emotional features, 32 ≤ d ≤ 128;
(3.2) using the projection matrices Ω, Φ and Ψ to project the electroencephalogram emotional feature vector x^(1), the peripheral physiological emotional feature vector x^(2) and the expression emotional feature vector x^(3) extracted in step (2) into a d-dimensional common subspace, wherein the projection of the electroencephalogram emotional feature vector x^(1) into the d-dimensional common subspace is Ω^T x^(1), the projection of the peripheral physiological emotional feature vector x^(2) is Φ^T x^(2), and the projection of the expression emotional feature vector x^(3) is Ψ^T x^(3);
(3.3) adding Ω^T x^(1), Φ^T x^(2) and Ψ^T x^(3) to obtain the fused electroencephalogram-peripheral physiological-expression multi-modal emotional feature vector Ω^T x^(1) + Φ^T x^(2) + Ψ^T x^(3).
Further, the projection matrices Ω, Φ, and Ψ in step (3.1) are obtained by training in the following steps:
(3.1.1) extracting training samples of all emotion categories from the training sample set to generate 3 groups of emotional feature vectors X^(i) = [x_1^(i), x_2^(i), …, x_M^(i)] ∈ ℝ^(N×M), wherein x_m^(i) ∈ ℝ^N, M is the number of training samples, N is the dimension of x_m^(i), i = 1, 2, 3, m = 1, 2, …, M; let i = 1 denote the electroencephalogram modality, i = 2 the peripheral physiological modality and i = 3 the expression modality, so that x_m^(1) denotes an electroencephalogram emotional feature vector, x_m^(2) a peripheral physiological emotional feature vector and x_m^(3) an expression emotional feature vector;
(3.1.2) calculating the mean of the column vectors of X^(i) and centering X^(i) by subtracting this mean from each column;
(3.1.3) based on the idea of discriminative multi-set canonical correlation analysis (DMCCA), solving for a set of projection matrices Ω, Φ and Ψ such that the linear correlation of same-class samples in the common projection subspace is maximized while the inter-class dispersion of the data within each modality is maximized and the intra-class dispersion of the data within each modality is minimized; let w^(i) be the projection vector of X^(i), i = 1, 2, 3, then the objective function of DMCCA is:
ρ = ( Σ_{i=1}^{3} Σ_{j=1, j≠i}^{3} (w^(i))^T cov(X^(i), X^(j)) w^(j) + Σ_{i=1}^{3} (w^(i))^T S_b^(i) w^(i) ) / ( Σ_{i=1}^{3} (w^(i))^T S_w^(i) w^(i) )
wherein S_w^(i) denotes the intra-class dispersion matrix of X^(i), S_b^(i) denotes the inter-class dispersion matrix of X^(i), cov(·, ·) denotes the covariance, i, j ∈ {1, 2, 3};
the following optimization model is constructed and solved to obtain the projection matrices Ω, Φ and Ψ:
max over w^(1), w^(2), w^(3) of  Σ_{i=1}^{3} Σ_{j=1, j≠i}^{3} (w^(i))^T cov(X^(i), X^(j)) w^(j) + Σ_{i=1}^{3} (w^(i))^T S_b^(i) w^(i)
s.t.  Σ_{i=1}^{3} (w^(i))^T S_w^(i) w^(i) = 1
further, solving the optimization model of the DMCCA objective function using Lagrange multiplier (Lagrange multiplier) can obtain the following Lagrange (Lagrange) function:
Figure BDA0002935593770000065
wherein λ is Lagrange multiplier, and then respectively calculating L (w)(1),w(2),w(3)) To w(1)、w(2)And w(3)And making it zero, i.e. order
Figure BDA0002935593770000066
To obtain
Figure BDA0002935593770000067
By further simplifying the above equation, the following generalized eigenvalue problem can be obtained:
Figure BDA0002935593770000068
the first d maximum eigenvalues lambda are selected by solving the generalized eigenvalue problem in the above formula1≥λ2≥…≥λdCorresponding characteristic vector, namely obtaining a projection matrix
Figure BDA0002935593770000071
Figure BDA0002935593770000072
And
Figure BDA0002935593770000073
based on the same inventive concept, the multi-modal emotion recognition system integrating the attention mechanism and the DMCCA, provided by the invention, comprises:
the characteristic primary extraction module is used for respectively extracting electroencephalogram signal characteristic vectors and expression characteristic vectors from the preprocessed electroencephalogram signals and facial expression videos by using respective trained neural network models, and extracting peripheral physiological signal characteristic vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and statistical characteristics thereof;
the characteristic identification enhancement module is used for mapping the electroencephalogram signal characteristic vector, the peripheral physiological signal characteristic vector and the expression characteristic vector into a plurality of groups of characteristic vectors through linear transformation matrixes respectively, determining importance weights of different characteristic vector groups respectively by using the attention mechanism module, and forming an electroencephalogram emotion characteristic vector, a peripheral physiological emotion characteristic vector and an expression emotion characteristic vector which have the same dimension and have identification power through weighting fusion;
the projection matrix determining module is used for determining a projection matrix of each emotion characteristic vector by maximizing the correlation among different modal emotion characteristics of the same class of samples by using a discrimination multi-set canonical correlation analysis (DMCCA) method;
the feature fusion module is used for projecting the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector to a public subspace through respective corresponding projection matrixes, and obtaining an electroencephalogram-peripheral physiological-expression multi-mode emotion feature vector after addition and fusion;
and the classification and identification module is used for classifying and identifying the multi-mode emotion feature vectors by using the classifier to obtain the emotion types.
Based on the same inventive concept, the multi-modal emotion recognition system fusing the attention mechanism and the DMCCA provided by the invention comprises at least one computing device, wherein the computing device comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and the computer program realizes the multi-modal emotion recognition method fusing the attention mechanism and the DMCCA when being loaded to the processor.
Beneficial effects: compared with the prior art, the invention has the following technical effects:
(1) according to the invention, an attention mechanism is adopted to selectively focus on the significant characteristics playing a key role in emotion recognition in each mode, the characteristics with emotion identification capability are adaptively learned, and the accuracy and robustness of multi-mode emotion recognition can be effectively improved.
(2) The invention adopts a discriminative multi-set canonical correlation analysis method and introduces the category information of the samples. By maximizing the correlation among the emotional features of different modalities for samples of the same class, maximizing the inter-class dispersion of the emotional features within each modality and minimizing their intra-class dispersion, it can mine the nonlinear correlation relationships among different modalities, make full use of the correlation and complementarity among the electroencephalogram, peripheral physiological and expression emotional features, and at the same time eliminate some invalid redundant features, so that the discriminative power and robustness of the feature representation can be effectively improved.
(3) Compared with a single-mode emotion recognition method, the method comprehensively utilizes various modal information in the emotion expression process, can combine the characteristics of different modes and fully utilize the complementarity of the characteristics to mine multi-mode emotion characteristics, and can effectively improve the accuracy and robustness of emotion recognition.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
fig. 2 is a block diagram of an embodiment of the present invention.
Detailed Description
For a more detailed understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawings and specific examples.
As shown in fig. 1 and fig. 2, a multi-modal emotion recognition method combining an attention mechanism and a DMCCA provided by an embodiment of the present invention mainly includes the following steps:
(1) extracting electroencephalogram signal feature vectors and expression feature vectors from the preprocessed electroencephalogram signals and facial expression videos by using the trained neural network models respectively, and extracting peripheral physiological signal feature vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and statistical features thereof.
In this embodiment, the DEAP (Database for Emotion Analysis using Physiological signals) emotion database is used; in practice, other emotion databases containing electroencephalogram signals, peripheral physiological signals and facial expression videos may also be used. The DEAP database is a published multi-modal emotion database collected by Koelstra et al. at Queen Mary University of London. It contains the physiological signals of 32 subjects recorded while they watched 40 one-minute music video excerpts of different types used as evoking stimuli, as well as the peripheral physiological signals and facial expression videos of the first 22 subjects while watching the music videos. Each subject completed 40 trials and filled in a Self-Assessment Manikin (SAM) questionnaire immediately after each trial. The SAM questionnaire contains the subject's ratings of each video on the Arousal, Valence, Dominance and Liking scales. Arousal represents the degree of excitation of a person's state, ranging gradually from a calm state to an excited state and rated on a scale from 1 to 9; valence, also called pleasantness, represents how pleasant a person's mood is, ranging gradually from a Negative state to a Positive state and likewise rated from 1 to 9; dominance ranges from submissive (or "without control") to dominant (or "in control"); liking indicates the subject's personal preference for the video. After each trial the subject selects scores representing his or her emotional state, which are used for the subsequent classification and recognition analysis of emotion categories.
In the DEAP database, the physiological signals were recorded at a 512 Hz sampling rate and down-sampled to 128 Hz (the preprocessed, down-sampled data are provided officially). The physiological signal array of each subject is 40 × 40 × 8064 (40 music video clips of different types, 40 physiological signal channels, 8064 sampling points). Among the 40 physiological signal channels, the first 32 channels carry electroencephalogram signals and the last 8 channels carry peripheral physiological signals. The 8064 sampling points correspond to 63 s of signal at the 128 Hz sampling rate, and each trial includes a 3 s pre-trial baseline before the stimulus.
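As an illustration of the data layout described above, the following sketch separates one subject's preprocessed DEAP array into electroencephalogram and peripheral channels and discards the 3 s pre-trial baseline; it assumes a NumPy array of shape 40 × 40 × 8064, and the function and variable names are purely illustrative:

```python
import numpy as np

def split_deap_trials(subject_data: np.ndarray, fs: int = 128, baseline_s: int = 3):
    """Split one subject's DEAP array (40 trials x 40 channels x 8064 samples)
    into EEG and peripheral signals, dropping the 3 s pre-trial baseline."""
    assert subject_data.shape[1:] == (40, 8064), "expected 40 channels x 8064 samples per trial"
    start = baseline_s * fs                      # 3 s x 128 Hz = 384 baseline samples
    eeg = subject_data[:, :32, start:]           # first 32 channels: EEG, 60 s of signal
    peripheral = subject_data[:, 32:, start:]    # last 8 channels: peripheral physiological signals
    return eeg, peripheral

# Example with random data standing in for one subject's preprocessed recording
dummy = np.random.randn(40, 40, 8064)
eeg, peripheral = split_deap_trials(dummy)
print(eeg.shape, peripheral.shape)               # (40, 32, 7680) (40, 8, 7680)
```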
In this embodiment, 880 samples containing electroencephalogram signals, peripheral physiological signals and facial expression videos are used as training samples, and classification and recognition are carried out separately on the four dimensions of arousal, valence, dominance and liking.
The Neural Network model for extracting the electroencephalogram signal features can adopt a Long Short-Term Memory (LSTM) Network or a Convolutional Neural Network (CNN), and the Neural Network model for extracting the expression features can adopt a 3D Convolutional Neural Network, a CNN-LSTM, and the like. In this embodiment, a trained Convolutional Neural Network (CNN) model is used to perform feature extraction on the preprocessed electroencephalogram signal, so as to obtain a 256-dimensional electroencephalogram signal feature vector; extracting 128-dimensional peripheral physiological signal characteristic vectors of preprocessed peripheral physiological signals such as electrocardio, respiration, electrooculogram and myoelectricity by extracting Low Level Descriptors (LLD) of signal waveforms and statistical characteristics (including average value, standard deviation, power spectrum, median, maximum value and minimum value) of the LLD; and extracting 256-dimensional expression feature vectors from the preprocessed facial expression video by using a trained CNN-LSTM model.
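The specific low-level descriptors that make up the 128-dimensional peripheral physiological feature vector are not enumerated above, so the following is only a minimal sketch of the kind of per-channel waveform statistics listed (mean, standard deviation, median, maximum, minimum, and a coarse power-spectrum summary); all names are illustrative assumptions:

```python
import numpy as np

def waveform_statistics(signal: np.ndarray, n_bands: int = 10) -> np.ndarray:
    """Statistical features of one peripheral-signal channel: mean, standard deviation,
    median, max, min, plus the average power in a few frequency bands."""
    stats = [signal.mean(), signal.std(), np.median(signal), signal.max(), signal.min()]
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)   # simple periodogram
    bands = np.array_split(psd, n_bands)                   # coarse spectral summary
    stats.extend(float(b.mean()) for b in bands)
    return np.asarray(stats)

# One 60 s channel at 128 Hz yields a 15-dimensional descriptor here; concatenating such
# descriptors over the 8 peripheral channels (and further statistics) is one way to reach
# a fixed-length vector such as the 128 dimensions used in this embodiment.
x = np.random.randn(60 * 128)
print(waveform_statistics(x).shape)   # (15,)
```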
(2) Respectively extracting the discriminative electroencephalogram emotional feature vector, peripheral physiological emotional feature vector and expression emotional feature vector from the electroencephalogram signal feature vector, the peripheral physiological signal feature vector and the expression feature vector by using attention mechanism modules.
(3) Obtaining the electroencephalogram-peripheral physiological-expression multi-modal emotional feature vector from the electroencephalogram emotional feature vector, the peripheral physiological emotional feature vector and the expression emotional feature vector by using the discriminative multi-set canonical correlation analysis (DMCCA) method.
(4) Classifying the multi-modal emotional feature vectors by using a classifier to obtain the emotion category.
Further, the specific steps of extracting the discriminative electroencephalogram emotional features, peripheral physiological emotional features and expression emotional features by using attention mechanism modules in step (2) include:
(2.1) representing the electroencephalogram signal features extracted in step (1) in matrix form as F^(1), and mapping them through a linear transformation matrix W^(1) into M_1 groups of feature vectors, 4 ≤ M_1 ≤ 16, each group of feature vectors having dimension N, 16 ≤ N ≤ 64; let E^(1) = [e_1^(1), e_2^(1), …, e_{M_1}^(1)], then the linear transformation expression is:
E^(1) = (F^(1))^T W^(1)
wherein the superscript (1) denotes the electroencephalogram modality and T denotes transposition.
Determining the importance weights of the different feature vector groups by using a first attention mechanism module, and forming the discriminative electroencephalogram emotional feature vector by weighted fusion, wherein the weight α_r^(1) of the r-th group of electroencephalogram signal feature vectors and the electroencephalogram emotional feature vector x^(1) are expressed as:
α_r^(1) = exp((u^(1))^T e_r^(1)) / Σ_{k=1}^{M_1} exp((u^(1))^T e_k^(1))
x^(1) = Σ_{r=1}^{M_1} α_r^(1) e_r^(1)
wherein r = 1, 2, …, M_1, e_r^(1) denotes the r-th group of electroencephalogram signal feature vectors, u^(1) is a trainable linear transformation parameter vector, and exp(·) denotes the exponential function with the natural constant e as its base. In this embodiment, M_1 = 8 and N = 32.
To train the parameters of the linear transformation matrix W^(1), a softmax classifier is connected after the first attention mechanism module, and the electroencephalogram emotional feature vector x^(1) output by the first attention mechanism module is connected to the C output nodes of the softmax classifier, which output a probability distribution vector ŷ^(1) = [ŷ_1^(1), ŷ_2^(1), …, ŷ_C^(1)] after passing through the softmax function, wherein ŷ_c^(1), c ∈ [1, C], is the predicted probability of the c-th emotion category and C is the number of emotion categories.
Further, the parameters of the linear transformation matrix W^(1) are trained with the cross-entropy loss function shown in the following formula:
Loss^(1) = − (1/M) Σ_{m=1}^{M} Σ_{c=1}^{C} y_{m,c}^(1) log ŷ_{m,c}^(1)
wherein x^(1) is the 32-dimensional electroencephalogram emotional feature vector; ŷ_m^(1) denotes the probability distribution vector of the emotion categories predicted by the softmax classification model for the m-th electroencephalogram sample; y_{m,c}^(1) denotes the real emotion category label of the m-th electroencephalogram sample under one-hot coding, i.e. y_{m,c}^(1) = 1 if the real emotion category label of the m-th electroencephalogram sample is c, and y_{m,c}^(1) = 0 otherwise; ŷ_{m,c}^(1) denotes the probability that the softmax classification model predicts the m-th electroencephalogram sample as category c; Loss^(1) denotes the loss function used when training the linear transformation matrix W^(1); in this embodiment, C = 2 and M = 880.
Iterative training is carried out continuously through the error back-propagation algorithm until the model parameters reach their optimal values. The electroencephalogram emotional feature vector x^(1) can then be extracted from the electroencephalogram signal of a newly input test sample.
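A minimal sketch of such an attention branch with its softmax training head, written in PyTorch, is given below. It assumes batched vector inputs and a single trainable scoring vector per modality; the class and variable names are illustrative and this is not the patent's reference implementation:

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Maps a D-dimensional modality feature to M groups of N dimensions,
    weights the groups with a learned softmax attention, and sums them."""
    def __init__(self, in_dim: int, m_groups: int, n_dim: int, n_classes: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, m_groups * n_dim, bias=False)  # plays the role of W
        self.score = nn.Linear(n_dim, 1, bias=False)                 # trainable scoring vector u
        self.classifier = nn.Linear(n_dim, n_classes)                # softmax head used only for training
        self.m_groups, self.n_dim = m_groups, n_dim

    def forward(self, f: torch.Tensor):
        e = self.proj(f).view(-1, self.m_groups, self.n_dim)    # (batch, M, N) group vectors
        alpha = torch.softmax(self.score(e), dim=1)              # (batch, M, 1) importance weights
        x = (alpha * e).sum(dim=1)                                # (batch, N) emotional feature vector
        return x, self.classifier(x)                              # logits feed a cross-entropy loss

# Training sketch for the EEG branch of this embodiment (256-d input, M_1 = 8, N = 32, C = 2)
model = ModalityAttention(in_dim=256, m_groups=8, n_dim=32, n_classes=2)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
feats, labels = torch.randn(16, 256), torch.randint(0, 2, (16,))
x, logits = model(feats)
loss = nn.functional.cross_entropy(logits, labels)   # cross-entropy, as in Loss^(1)
loss.backward()
optim.step()
print(x.shape)   # torch.Size([16, 32])
```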
(2.2) representing the peripheral physiological signal features extracted in step (1) in matrix form as F^(2), and mapping them through a linear transformation matrix W^(2) into M_2 groups of feature vectors, 4 ≤ M_2 ≤ 16; let E^(2) = [e_1^(2), e_2^(2), …, e_{M_2}^(2)], then the linear transformation expression is:
E^(2) = (F^(2))^T W^(2)
wherein the superscript (2) denotes the peripheral physiological modality.
Determining the importance weights of the different feature vector groups by using a second attention mechanism module, and forming the discriminative peripheral physiological emotional feature vector by weighted fusion, wherein the weight α_s^(2) of the s-th group of peripheral physiological signal feature vectors and the peripheral physiological emotional feature vector x^(2) are expressed as:
α_s^(2) = exp((u^(2))^T e_s^(2)) / Σ_{k=1}^{M_2} exp((u^(2))^T e_k^(2))
x^(2) = Σ_{s=1}^{M_2} α_s^(2) e_s^(2)
wherein s = 1, 2, …, M_2, e_s^(2) denotes the s-th group of peripheral physiological signal feature vectors, and u^(2) is a trainable linear transformation parameter vector. In this embodiment, M_2 = 4.
To train the parameters of the linear transformation matrix W^(2), a softmax classifier is connected after the second attention mechanism module, and the peripheral physiological emotional feature vector x^(2) output by the second attention mechanism module is connected to the C output nodes of the softmax classifier, which output a probability distribution vector ŷ^(2) = [ŷ_1^(2), ŷ_2^(2), …, ŷ_C^(2)] after passing through the softmax function.
Further, the parameters of the linear transformation matrix W^(2) are trained with the cross-entropy loss function shown in the following formula:
Loss^(2) = − (1/M) Σ_{m=1}^{M} Σ_{c=1}^{C} y_{m,c}^(2) log ŷ_{m,c}^(2)
wherein x^(2) is the 32-dimensional peripheral physiological emotional feature vector; ŷ_m^(2) denotes the probability distribution vector of the emotion categories predicted by the softmax classification model; y_{m,c}^(2) denotes the real emotion category label of the m-th peripheral physiological signal sample under one-hot coding, i.e. y_{m,c}^(2) = 1 if the real emotion category label of the m-th peripheral physiological signal sample is c, and y_{m,c}^(2) = 0 otherwise; ŷ_{m,c}^(2) denotes the probability that the softmax classification model predicts the m-th peripheral physiological signal sample as category c; Loss^(2) denotes the loss function used when training the linear transformation matrix W^(2); in this embodiment, C = 2 and M = 880.
Iterative training is carried out continuously through the error back-propagation algorithm until the model parameters reach their optimal values. The peripheral physiological emotional feature vector x^(2) can then be extracted from the peripheral physiological signals of a newly input test sample.
(2.3) representing the expression features extracted in step (1) in matrix form as F^(3), and mapping them through a linear transformation matrix W^(3) into M_3 groups of feature vectors, 4 ≤ M_3 ≤ 16; let E^(3) = [e_1^(3), e_2^(3), …, e_{M_3}^(3)], then the linear transformation expression is:
E^(3) = (F^(3))^T W^(3)
wherein the superscript (3) denotes the expression modality.
Determining the importance weights of the different feature vector groups by using a third attention mechanism module, and forming the discriminative expression emotional feature vector by weighted fusion, wherein the weight α_t^(3) of the t-th group of expression feature vectors and the expression emotional feature vector x^(3) are expressed as:
α_t^(3) = exp((u^(3))^T e_t^(3)) / Σ_{k=1}^{M_3} exp((u^(3))^T e_k^(3))
x^(3) = Σ_{t=1}^{M_3} α_t^(3) e_t^(3)
wherein t = 1, 2, …, M_3, e_t^(3) denotes the t-th group of expression feature vectors, and u^(3) is a trainable linear transformation parameter vector. In this embodiment, M_3 = 8.
To train the parameters of the linear transformation matrix W^(3), a softmax classifier is connected after the third attention mechanism module, and the expression emotional feature vector x^(3) output by the third attention mechanism module is connected to the C output nodes of the softmax classifier, which output a probability distribution vector ŷ^(3) = [ŷ_1^(3), ŷ_2^(3), …, ŷ_C^(3)] after passing through the softmax function.
Further, the parameters of the linear transformation matrix W^(3) are trained with the cross-entropy loss function shown in the following formula:
Loss^(3) = − (1/M) Σ_{m=1}^{M} Σ_{c=1}^{C} y_{m,c}^(3) log ŷ_{m,c}^(3)
wherein x^(3) is the 32-dimensional expression emotional feature vector; ŷ_m^(3) denotes the probability distribution vector of the emotion categories predicted by the softmax classification model; y_{m,c}^(3) denotes the real emotion category label of the m-th expression video sample under one-hot coding, i.e. y_{m,c}^(3) = 1 if the real emotion category label of the m-th expression video sample is c, and y_{m,c}^(3) = 0 otherwise; ŷ_{m,c}^(3) denotes the probability that the softmax classification model predicts the m-th expression video sample as category c; Loss^(3) denotes the loss function used when training the linear transformation matrix W^(3); in this embodiment, C = 2 and M = 880.
Iterative training is carried out continuously through the error back-propagation algorithm until the model parameters reach their optimal values. The expression emotional feature vector x^(3) can then be extracted from the facial expression video of a newly input test sample.
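Continuing the ModalityAttention sketch above, the three branches of this embodiment differ only in their input dimension and number of groups; a hypothetical instantiation might look as follows:

```python
import torch  # requires the ModalityAttention class from the previous sketch

# Instantiating the three attention branches with the dimensions of this embodiment
# (illustrative names only; each branch is trained with its own softmax head).
eeg_branch  = ModalityAttention(in_dim=256, m_groups=8, n_dim=32, n_classes=2)
per_branch  = ModalityAttention(in_dim=128, m_groups=4, n_dim=32, n_classes=2)
expr_branch = ModalityAttention(in_dim=256, m_groups=8, n_dim=32, n_classes=2)

f_eeg, f_per, f_expr = torch.randn(1, 256), torch.randn(1, 128), torch.randn(1, 256)
x1, _ = eeg_branch(f_eeg)    # x^(1): 32-d electroencephalogram emotional feature vector
x2, _ = per_branch(f_per)    # x^(2): 32-d peripheral physiological emotional feature vector
x3, _ = expr_branch(f_expr)  # x^(3): 32-d expression emotional feature vector
```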
Further, the step (3) specifically comprises the following sub-steps:
(3.1) obtaining the trained DMCCA projection matrices Ω ∈ ℝ^(N×d), Φ ∈ ℝ^(N×d) and Ψ ∈ ℝ^(N×d) corresponding respectively to the electroencephalogram emotional features, the peripheral physiological emotional features and the expression emotional features, 32 ≤ d ≤ 128. In this embodiment, d = 40.
(3.2) using the projection matrices Ω, Φ and Ψ to project the electroencephalogram emotional feature vector x^(1), the peripheral physiological emotional feature vector x^(2) and the expression emotional feature vector x^(3) extracted in step (2) into a d-dimensional common subspace, wherein the projection of the electroencephalogram emotional feature vector x^(1) into the d-dimensional common subspace is Ω^T x^(1), the projection of the peripheral physiological emotional feature vector x^(2) is Φ^T x^(2), and the projection of the expression emotional feature vector x^(3) is Ψ^T x^(3).
(3.3) adding Ω^T x^(1), Φ^T x^(2) and Ψ^T x^(3) to obtain the fused electroencephalogram-peripheral physiological-expression multi-modal emotional feature vector Ω^T x^(1) + Φ^T x^(2) + Ψ^T x^(3).
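Given trained projection matrices, sub-steps (3.2) and (3.3) reduce to three matrix-vector products and an element-wise sum; the following NumPy sketch uses random placeholders for Ω, Φ, Ψ and the three 32-dimensional emotional feature vectors:

```python
import numpy as np

N, d = 32, 40
rng = np.random.default_rng(0)
Omega, Phi, Psi = (rng.standard_normal((N, d)) for _ in range(3))   # stand-ins for trained matrices
x1, x2, x3 = (rng.standard_normal(N) for _ in range(3))             # stand-ins for x^(1), x^(2), x^(3)

# Steps (3.2)-(3.3): project each emotional feature vector into the d-dimensional
# common subspace and fuse by addition.
z = Omega.T @ x1 + Phi.T @ x2 + Psi.T @ x3
print(z.shape)   # (40,) multi-modal emotional feature vector fed to the classifier
```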
Further, the projection matrices Ω, Φ, and Ψ in step (3.1) are obtained by training in the following steps:
(3.1.1) generating 3 groups of emotional feature vectors X^(i) = [x_1^(i), x_2^(i), …, x_M^(i)] ∈ ℝ^(N×M) from the samples of the C emotion categories in the training sample set, wherein x_m^(i) ∈ ℝ^N, M is the number of training samples and N is the dimension of x_m^(i), i = 1, 2, 3, m = 1, 2, …, M (in this embodiment the data size of the sample set is not large, so all samples participate in the calculation; for a sample set with a large data size, samples of each emotion category can be extracted randomly); let i = 1 denote the electroencephalogram modality, i = 2 the peripheral physiological modality and i = 3 the expression modality, so that x_m^(1) denotes an electroencephalogram emotional feature vector, x_m^(2) a peripheral physiological emotional feature vector and x_m^(3) an expression emotional feature vector; in this embodiment, C = 2, M = 880 and N = 32.
(3.1.2) calculating the mean vector μ^(i) = (1/M) Σ_{m=1}^{M} x_m^(i) of the columns of X^(i), and centering X^(i) by subtracting μ^(i) from each column to obtain X̃^(i) = [x_1^(i) − μ^(i), x_2^(i) − μ^(i), …, x_M^(i) − μ^(i)]; for convenience of description, the centered X̃^(i) is still denoted X^(i) below, i.e. all x_m^(i) are assumed to have been centered.
(3.1.3) The idea of discriminative multi-set canonical correlation analysis (DMCCA) is to find a set of projection matrices Ω, Φ and Ψ that maximize the linear correlation of same-class samples in the common projection subspace while also maximizing the inter-class dispersion of the data within each modality and minimizing the intra-class dispersion of the data within each modality; let w^(i) be the projection vector of X^(i), i = 1, 2, 3, then the objective function of DMCCA is:
ρ = ( Σ_{i=1}^{3} Σ_{j=1, j≠i}^{3} (w^(i))^T cov(X^(i), X^(j)) w^(j) + Σ_{i=1}^{3} (w^(i))^T S_b^(i) w^(i) ) / ( Σ_{i=1}^{3} (w^(i))^T S_w^(i) w^(i) )
wherein S_w^(i) denotes the intra-class dispersion matrix of X^(i), S_b^(i) denotes the inter-class dispersion matrix of X^(i), cov(·, ·) denotes the covariance, i, j ∈ {1, 2, 3}.
The solution of the DMCCA objective function can be expressed as the following optimization model:
max over w^(1), w^(2), w^(3) of  Σ_{i=1}^{3} Σ_{j=1, j≠i}^{3} (w^(i))^T cov(X^(i), X^(j)) w^(j) + Σ_{i=1}^{3} (w^(i))^T S_b^(i) w^(i)
s.t.  Σ_{i=1}^{3} (w^(i))^T S_w^(i) w^(i) = 1
(3.1.4) Solving the optimization model of the DMCCA objective function by the Lagrange multiplier method yields the following Lagrange function:
L(w^(1), w^(2), w^(3)) = Σ_{i=1}^{3} Σ_{j=1, j≠i}^{3} (w^(i))^T cov(X^(i), X^(j)) w^(j) + Σ_{i=1}^{3} (w^(i))^T S_b^(i) w^(i) − λ ( Σ_{i=1}^{3} (w^(i))^T S_w^(i) w^(i) − 1 )
wherein λ is the Lagrange multiplier; the partial derivatives of L(w^(1), w^(2), w^(3)) with respect to w^(1), w^(2) and w^(3) are then computed and set to zero, i.e.
∂L/∂w^(1) = 0, ∂L/∂w^(2) = 0, ∂L/∂w^(3) = 0
which gives
Σ_{j≠i} cov(X^(i), X^(j)) w^(j) + S_b^(i) w^(i) = λ S_w^(i) w^(i),  i = 1, 2, 3
By further simplifying the above equations, the following generalized eigenvalue problem can be obtained:
[ S_b^(1)             cov(X^(1), X^(2))   cov(X^(1), X^(3)) ] [ w^(1) ]     [ S_w^(1)   0         0       ] [ w^(1) ]
[ cov(X^(2), X^(1))   S_b^(2)             cov(X^(2), X^(3)) ] [ w^(2) ] = λ [ 0         S_w^(2)   0       ] [ w^(2) ]
[ cov(X^(3), X^(1))   cov(X^(3), X^(2))   S_b^(3)           ] [ w^(3) ]     [ 0         0         S_w^(3) ] [ w^(3) ]
By solving the generalized eigenvalue problem in the above formula and selecting the eigenvectors corresponding to the first d largest eigenvalues λ_1 ≥ λ_2 ≥ … ≥ λ_d, the projection matrices Ω ∈ ℝ^(N×d), Φ ∈ ℝ^(N×d) and Ψ ∈ ℝ^(N×d) are obtained. In this embodiment, d = 40.
Based on the same inventive concept, the multi-modal emotion recognition system integrating the attention mechanism and the DMCCA provided by the embodiment of the invention comprises:
the characteristic primary extraction module is used for respectively extracting electroencephalogram signal characteristic vectors and expression characteristic vectors from the preprocessed electroencephalogram signals and facial expression videos by using respective trained neural network models, and extracting peripheral physiological signal characteristic vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and statistical characteristics thereof;
the characteristic identification enhancement module is used for mapping the electroencephalogram signal characteristic vector, the peripheral physiological signal characteristic vector and the expression characteristic vector into a plurality of groups of characteristic vectors through linear transformation matrixes respectively, determining importance weights of different characteristic vector groups respectively by using the attention mechanism module, and forming an electroencephalogram emotion characteristic vector, a peripheral physiological emotion characteristic vector and an expression emotion characteristic vector which have the same dimension and have identification power through weighting fusion;
the projection matrix determining module is used for determining a projection matrix of each emotion characteristic vector by maximizing the correlation among different modal emotion characteristics of the same type of samples by using a DMCCA method;
the feature fusion module is used for projecting the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector to a public subspace through respective corresponding projection matrixes, and obtaining the electroencephalogram-peripheral physiological-expression multi-mode emotion feature vector after addition and fusion;
and the classification and identification module is used for classifying and identifying the multi-mode emotion feature vectors by using the classifier to obtain the emotion types.
For specific implementation of each module, reference is made to the above method embodiment, and details are not repeated. Those skilled in the art will appreciate that the modules in the embodiments may be adaptively changed and arranged in one or more systems different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components.
Based on the same inventive concept, the multi-modal emotion recognition system combining the attention mechanism and the DMCCA provided by the embodiment of the invention comprises at least one computing device, wherein the computing device comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, and the computer program realizes the multi-modal emotion recognition method combining the attention mechanism and the DMCCA when being loaded into the processor.
The technical solution disclosed by the invention includes not only the technical methods described in the above embodiments, but also technical solutions formed by any combination of the above technical methods. Those skilled in the art can make certain improvements and modifications without departing from the principles of the present invention, and such improvements and modifications are also to be considered within the scope of the present invention.

Claims (7)

1. The multimode emotion recognition method integrating the attention mechanism and the DMCCA is characterized by comprising the following steps of:
(1) extracting electroencephalogram signal feature vectors and expression feature vectors from the preprocessed electroencephalogram signals and facial expression videos by using respective trained neural network models, and extracting peripheral physiological signal feature vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and statistical features thereof;
(2) mapping the electroencephalogram signal feature vector, the peripheral physiological signal feature vector and the expression feature vector into a plurality of groups of feature vectors through linear transformation matrixes respectively, determining importance weights of different feature vector groups by using an attention mechanism module respectively, and forming an electroencephalogram emotion feature vector, a peripheral physiological emotion feature vector and an expression emotion feature vector which have the same dimension and are discriminating through weighting fusion;
(3) determining a projection matrix for each emotional feature vector by applying a discriminative multi-set canonical correlation analysis (DMCCA) method to the electroencephalogram emotional feature vector, the peripheral physiological emotional feature vector and the expression emotional feature vector and maximizing the correlation among the emotional features of different modalities for samples of the same class, projecting each emotional feature vector into a common subspace, and obtaining the electroencephalogram-peripheral physiological-expression multi-modal emotional feature vector after additive fusion;
(4) classifying the multi-modal emotional feature vectors by using a classifier to obtain the emotion category.
2. The multi-modal emotion recognition method combining attention mechanism and DMCCA as recited in claim 1, wherein step (2) comprises the sub-steps of:
(2.1) representing the electroencephalogram signal features extracted in step (1) in matrix form as F^(1), and mapping them through a linear transformation matrix W^(1) into M_1 groups of feature vectors, 4 ≤ M_1 ≤ 16, each group of feature vectors having dimension N, 16 ≤ N ≤ 64; let E^(1) = [e_1^(1), e_2^(1), …, e_{M_1}^(1)], then the linear transformation expression is:
E^(1) = (F^(1))^T W^(1)
wherein the superscript (1) denotes the electroencephalogram modality and T denotes transposition;
determining the importance weights of the different feature vector groups by using a first attention mechanism module, and forming the discriminative electroencephalogram emotional feature vector by weighted fusion, wherein the weight α_r^(1) of the r-th group of electroencephalogram signal feature vectors and the electroencephalogram emotional feature vector x^(1) are expressed as:
α_r^(1) = exp((u^(1))^T e_r^(1)) / Σ_{k=1}^{M_1} exp((u^(1))^T e_k^(1))
x^(1) = Σ_{r=1}^{M_1} α_r^(1) e_r^(1)
wherein r = 1, 2, …, M_1, e_r^(1) denotes the r-th group of electroencephalogram signal feature vectors, u^(1) is a trainable linear transformation parameter vector, and exp(·) denotes the exponential function with the natural constant e as its base;
(2.2) representing the peripheral physiological signal features extracted in step (1) in matrix form as F^(2), and mapping them through a linear transformation matrix W^(2) into M_2 groups of feature vectors, 4 ≤ M_2 ≤ 16; let E^(2) = [e_1^(2), e_2^(2), …, e_{M_2}^(2)], then the linear transformation expression is:
E^(2) = (F^(2))^T W^(2)
wherein the superscript (2) denotes the peripheral physiological modality;
determining the importance weights of the different feature vector groups by using a second attention mechanism module, and forming the discriminative peripheral physiological emotional feature vector by weighted fusion, wherein the weight α_s^(2) of the s-th group of peripheral physiological signal feature vectors and the peripheral physiological emotional feature vector x^(2) are expressed as:
α_s^(2) = exp((u^(2))^T e_s^(2)) / Σ_{k=1}^{M_2} exp((u^(2))^T e_k^(2))
x^(2) = Σ_{s=1}^{M_2} α_s^(2) e_s^(2)
wherein s = 1, 2, …, M_2, e_s^(2) denotes the s-th group of peripheral physiological signal feature vectors, and u^(2) is a trainable linear transformation parameter vector;
(2.3) representing the expression features extracted in step (1) in matrix form as F^(3), and mapping them through a linear transformation matrix W^(3) into M_3 groups of feature vectors, 4 ≤ M_3 ≤ 16; let E^(3) = [e_1^(3), e_2^(3), …, e_{M_3}^(3)], then the linear transformation expression is:
E^(3) = (F^(3))^T W^(3)
wherein the superscript (3) denotes the expression modality;
determining the importance weights of the different feature vector groups by using a third attention mechanism module, and forming the discriminative expression emotional feature vector by weighted fusion, wherein the weight α_t^(3) of the t-th group of expression feature vectors and the expression emotional feature vector x^(3) are expressed as:
α_t^(3) = exp((u^(3))^T e_t^(3)) / Σ_{k=1}^{M_3} exp((u^(3))^T e_k^(3))
x^(3) = Σ_{t=1}^{M_3} α_t^(3) e_t^(3)
wherein t = 1, 2, …, M_3, e_t^(3) denotes the t-th group of expression feature vectors, and u^(3) is a trainable linear transformation parameter vector.
3. The multi-modal emotion recognition method combining attention mechanism and DMCCA as recited in claim 2, wherein step (3) comprises the sub-steps of:
(3.1) obtaining the trained DMCCA projection matrices Ω ∈ ℝ^(N×d), Φ ∈ ℝ^(N×d) and Ψ ∈ ℝ^(N×d) corresponding respectively to the electroencephalogram emotional features, the peripheral physiological emotional features and the expression emotional features, 32 ≤ d ≤ 128;
(3.2) using the projection matrices Ω, Φ and Ψ to project the electroencephalogram emotional feature vector x^(1), the peripheral physiological emotional feature vector x^(2) and the expression emotional feature vector x^(3) extracted in step (2) into a d-dimensional common subspace, wherein the projection of the electroencephalogram emotional feature vector x^(1) into the d-dimensional common subspace is Ω^T x^(1), the projection of the peripheral physiological emotional feature vector x^(2) is Φ^T x^(2), and the projection of the expression emotional feature vector x^(3) is Ψ^T x^(3);
(3.3) adding Ω^T x^(1), Φ^T x^(2) and Ψ^T x^(3) to obtain the fused electroencephalogram-peripheral physiological-expression multi-modal emotional feature vector Ω^T x^(1) + Φ^T x^(2) + Ψ^T x^(3).
4. The multi-modal emotion recognition method integrating an attention mechanism and DMCCA according to claim 3, wherein the projection matrices Ω, Φ and Ψ in step (3.1) are obtained by training:
(3.1.1) extracting training samples of each emotion category from the training sample set to generate 3 groups of emotional feature vectors X^(i) = [x_1^(i), x_2^(i), …, x_M^(i)] ∈ ℝ^(N×M), wherein x_m^(i) ∈ ℝ^N, M is the number of training samples, i = 1, 2, 3, m = 1, 2, …, M; let i = 1 denote the electroencephalogram modality, i = 2 the peripheral physiological modality and i = 3 the expression modality, so that x_m^(1) denotes an electroencephalogram emotional feature vector, x_m^(2) a peripheral physiological emotional feature vector and x_m^(3) an expression emotional feature vector;
(3.1.2) calculating the mean of the column vectors of X^(i) and centering X^(i) by subtracting this mean from each column;
(3.1.3) based on the idea of discriminative multi-set canonical correlation analysis (DMCCA), solving for a set of projection matrices Ω, Φ and Ψ such that the linear correlation of same-class samples in the common projection subspace is maximized while the inter-class dispersion of the data within each modality is maximized and the intra-class dispersion of the data within each modality is minimized; let w^(i) be the projection vector of X^(i), i = 1, 2, 3, then the objective function of DMCCA is:
ρ = ( Σ_{i=1}^{3} Σ_{j=1, j≠i}^{3} (w^(i))^T cov(X^(i), X^(j)) w^(j) + Σ_{i=1}^{3} (w^(i))^T S_b^(i) w^(i) ) / ( Σ_{i=1}^{3} (w^(i))^T S_w^(i) w^(i) )
wherein S_w^(i) denotes the intra-class dispersion matrix of X^(i), S_b^(i) denotes the inter-class dispersion matrix of X^(i), cov(·, ·) denotes the covariance, i, j ∈ {1, 2, 3}; the following optimization model is constructed and solved to obtain the projection matrices Ω, Φ and Ψ:
max over w^(1), w^(2), w^(3) of  Σ_{i=1}^{3} Σ_{j=1, j≠i}^{3} (w^(i))^T cov(X^(i), X^(j)) w^(j) + Σ_{i=1}^{3} (w^(i))^T S_b^(i) w^(i)
s.t.  Σ_{i=1}^{3} (w^(i))^T S_w^(i) w^(i) = 1.
5. The multi-modal emotion recognition method integrating an attention mechanism and DMCCA according to claim 4, wherein the constructed optimization model of the DMCCA objective function is solved by the Lagrange multiplier method, specifically: the optimization model is rewritten as a Lagrange function L(w^(1), w^(2), w^(3)) (given as a formula image in the original claim), where λ is the Lagrange multiplier; the partial derivatives of L(w^(1), w^(2), w^(3)) with respect to w^(1), w^(2) and w^(3) are computed and set to zero, i.e. ∂L/∂w^(i) = 0 for i = 1, 2, 3; further simplification of the resulting equations yields a generalized eigenvalue problem; by solving this generalized eigenvalue problem and selecting the eigenvectors corresponding to the first d largest eigenvalues λ1 ≥ λ2 ≥ … ≥ λd, the projection matrices Ω, Φ and Ψ are obtained.
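For illustration of this final solution step, the sketch below solves a generalized eigenvalue problem with scipy.linalg.eigh, keeps the eigenvectors of the d largest eigenvalues and splits them into per-modality projection matrices. Because the exact DMCCA matrices are given only as formula images in the original claim, the matrices A and B here are random symmetric stand-ins, not the claimed construction.

import numpy as np
from scipy.linalg import eigh

d1, d2, d3, d = 256, 64, 512, 64          # assumed modality dimensions and subspace size
n = d1 + d2 + d3

rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                          # symmetric stand-in for the left-hand-side matrix
B = M @ M.T + n * np.eye(n)                # symmetric positive-definite stand-in for the right-hand side

# Generalized eigenvalue problem A w = lambda B w (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = eigh(A, B)
top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]   # eigenvectors of the d largest eigenvalues

# Split the stacked eigenvectors w = [w^(1); w^(2); w^(3)] into projection matrices.
Omega = top[:d1, :]          # shape (d1, d)
Phi   = top[d1:d1 + d2, :]   # shape (d2, d)
Psi   = top[d1 + d2:, :]     # shape (d3, d)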
6. the multimode emotion recognition system fusing an attention mechanism and DMCCA is characterized by comprising:
the characteristic primary extraction module is used for respectively extracting electroencephalogram signal characteristic vectors and expression characteristic vectors from the preprocessed electroencephalogram signals and facial expression videos by using respective trained neural network models, and extracting peripheral physiological signal characteristic vectors from the preprocessed peripheral physiological signals by extracting signal waveform descriptors and statistical characteristics thereof;
the characteristic identification enhancement module is used for mapping the electroencephalogram signal characteristic vector, the peripheral physiological signal characteristic vector and the expression characteristic vector into a plurality of groups of characteristic vectors through linear transformation matrixes respectively, determining importance weights of different characteristic vector groups respectively by using the attention mechanism module, and forming an electroencephalogram emotion characteristic vector, a peripheral physiological emotion characteristic vector and an expression emotion characteristic vector which have the same dimension and have identification power through weighting fusion;
the projection matrix determining module is used for determining a projection matrix of each emotion characteristic vector by maximizing the correlation among different modal emotion characteristics of the same class of samples by using a discrimination multi-set canonical correlation analysis (DMCCA) method;
the feature fusion module is used for projecting the electroencephalogram emotion feature vector, the peripheral physiological emotion feature vector and the expression emotion feature vector to a public subspace through respective corresponding projection matrixes, and obtaining an electroencephalogram-peripheral physiological-expression multi-mode emotion feature vector after addition and fusion;
and the classification and identification module is used for classifying and identifying the multi-mode emotion feature vectors by using the classifier to obtain the emotion types.
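To make the feature discriminability enhancement module recited above concrete, here is a minimal sketch for a single modality; the dot-product attention against one query vector, the number of feature-vector groups and all dimensions are illustrative assumptions, since the claim does not fix the form of the attention mechanism.

import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def enhance(x, transforms, query):
    # Map x into several feature-vector groups with linear transformation matrices,
    # score each group with a dot-product attention against a query vector (assumed
    # form), and fuse the groups using their softmax importance weights.
    groups = [W @ x for W in transforms]
    weights = softmax(np.array([g @ query for g in groups]))
    return sum(w * g for w, g in zip(weights, groups))

# Toy example: a 310-dimensional EEG feature vector mapped into 4 groups of size 64.
rng = np.random.default_rng(0)
x_eeg = rng.standard_normal(310)
transforms = [rng.standard_normal((64, 310)) for _ in range(4)]
query = rng.standard_normal(64)
eeg_emotion_vec = enhance(x_eeg, transforms, query)   # discriminative EEG emotion feature vector
assert eeg_emotion_vec.shape == (64,)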
7. A multi-modal emotion recognition system integrating an attention mechanism and DMCCA, comprising at least one computing device that comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the multi-modal emotion recognition method integrating an attention mechanism and DMCCA according to any one of claims 1-5.
CN202110159085.8A 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA Active CN112800998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110159085.8A CN112800998B (en) 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110159085.8A CN112800998B (en) 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Publications (2)

Publication Number Publication Date
CN112800998A true CN112800998A (en) 2021-05-14
CN112800998B CN112800998B (en) 2022-07-29

Family

ID=75814276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110159085.8A Active CN112800998B (en) 2021-02-05 2021-02-05 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Country Status (1)

Country Link
CN (1) CN112800998B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510456A (en) * 2018-03-27 2018-09-07 华南理工大学 The sketch of depth convolutional neural networks based on perception loss simplifies method
CN109145983A (en) * 2018-08-21 2019-01-04 电子科技大学 A kind of real-time scene image, semantic dividing method based on lightweight network
CN109543502A (en) * 2018-09-27 2019-03-29 天津大学 A kind of semantic segmentation method based on the multiple dimensioned neural network of depth

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUAN Qiuzhuang et al.: "Research on an on-board SAR target recognition system based on deep learning neural networks", Aerospace Shanghai (《上海航天》) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297981B (en) * 2021-05-27 2023-04-07 西北工业大学 End-to-end electroencephalogram emotion recognition method based on attention mechanism
CN113297981A (en) * 2021-05-27 2021-08-24 西北工业大学 End-to-end electroencephalogram emotion recognition method based on attention mechanism
CN113326781A (en) * 2021-05-31 2021-08-31 合肥工业大学 Non-contact anxiety recognition method and device based on face video
CN113326781B (en) * 2021-05-31 2022-09-02 合肥工业大学 Non-contact anxiety recognition method and device based on face video
CN113269173A (en) * 2021-07-20 2021-08-17 佛山市墨纳森智能科技有限公司 Method and device for establishing emotion recognition model and recognizing human emotion
CN113749656A (en) * 2021-08-20 2021-12-07 杭州回车电子科技有限公司 Emotion identification method and device based on multi-dimensional physiological signals
CN113749656B (en) * 2021-08-20 2023-12-26 杭州回车电子科技有限公司 Emotion recognition method and device based on multidimensional physiological signals
CN113616209A (en) * 2021-08-25 2021-11-09 西南石油大学 Schizophrenia patient discrimination method based on space-time attention mechanism
CN113616209B (en) * 2021-08-25 2023-08-04 西南石油大学 Method for screening schizophrenic patients based on space-time attention mechanism
CN113729710A (en) * 2021-09-26 2021-12-03 华南师范大学 Real-time attention assessment method and system integrating multiple physiological modes
CN114298189A (en) * 2021-12-20 2022-04-08 深圳市海清视讯科技有限公司 Fatigue driving detection method, device, equipment and storage medium
CN114947852B (en) * 2022-06-14 2023-01-10 华南师范大学 Multi-mode emotion recognition method, device, equipment and storage medium
CN114947852A (en) * 2022-06-14 2022-08-30 华南师范大学 Multi-mode emotion recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112800998B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN112800998B (en) Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA
Abdullah et al. Multimodal emotion recognition using deep learning
CN108805087B (en) Time sequence semantic fusion association judgment subsystem based on multi-modal emotion recognition system
CN108877801B (en) Multi-turn dialogue semantic understanding subsystem based on multi-modal emotion recognition system
CN108899050B (en) Voice signal analysis subsystem based on multi-modal emotion recognition system
CN108805089B (en) Multi-modal-based emotion recognition method
CN108805088B (en) Physiological signal analysis subsystem based on multi-modal emotion recognition system
CN112784798A (en) Multi-modal emotion recognition method based on feature-time attention mechanism
CN111210846B (en) Parkinson speech recognition system based on integrated manifold dimensionality reduction
CN111134666A (en) Emotion recognition method of multi-channel electroencephalogram data and electronic device
JP6831453B2 (en) Signal search device, method, and program
Schels et al. Multi-modal classifier-fusion for the recognition of emotions
Jinliang et al. EEG emotion recognition based on granger causality and capsnet neural network
Xie et al. WT feature based emotion recognition from multi-channel physiological signals with decision fusion
Rayatdoost et al. Subject-invariant EEG representation learning for emotion recognition
Lu et al. Speech depression recognition based on attentional residual network
CN116230234A (en) Multi-mode feature consistency psychological health abnormality identification method and system
Chen et al. Patient emotion recognition in human computer interaction system based on machine learning method and interactive design theory
Chen et al. Hybrid feature embedded sparse stacked autoencoder and manifold dimensionality reduction ensemble for mental health speech recognition
Peng Research on Emotion Recognition Based on Deep Learning for Mental Health
Li et al. Acoustic-articulatory emotion recognition using multiple features and parameter-optimized cascaded deep learning network
Li et al. An optimized multi-label TSK fuzzy system for emotion recognition of multimodal physiological signals
Kächele Machine learning systems for multimodal affect recognition
Kapse et al. Advanced deep learning techniques for depression detection: a review
Akalya devi et al. Multimodal emotion recognition framework using a decision-level fusion and feature-level fusion approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant