CN114424940A - Emotion recognition method and system based on multi-mode spatiotemporal feature fusion - Google Patents
Emotion recognition method and system based on multi-mode spatiotemporal feature fusion
- Publication number
- CN114424940A (application CN202210101019.XA)
- Authority
- CN
- China
- Prior art keywords
- modal
- data
- fusion
- emotion recognition
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
- A61B5/165—Evaluating the state of mind, e.g. depression, anxiety
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
Abstract
The invention belongs to the technical field of emotion recognition, and provides an emotion recognition method and system based on multi-mode spatiotemporal feature fusion, which comprises the following steps: acquiring original physiological data; preprocessing the acquired original physiological data to obtain multi-modal physiological data; respectively extracting spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data; performing feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features; and classifying according to the obtained fusion characteristics to obtain a result of emotion recognition.
Description
Technical Field
The disclosure belongs to the technical field of emotion recognition, and particularly relates to an emotion recognition method and system based on multi-mode spatiotemporal feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Emotional changes are usually produced under the stimulation of the external environment and are accompanied by changes in individual physiological characteristics and psychological responses, so they can be measured and modeled by scientific methods. With the development and popularization of portable and wearable sensors, acquiring physiological signals has become much easier, and emotion recognition methods based on physiological signals have attracted more and more researchers. Such methods infer changes in a person's emotional state by processing physiological signals, and they have been widely used in many fields such as fatigue-driving detection and medical care. However, many studies show that although physiological signals such as the electroencephalogram (EEG), electrocardiogram (ECG), galvanic skin response (GSR), respiration (RSP), and electrooculogram (EOG) are correlated with specific emotions, each signal has different characteristics and behaves differently in emotion recognition tasks, so each physiological signal has usually been studied independently.
Current emotion recognition methods based on physiological signals can be roughly divided into two categories: single-modal and multi-modal emotion recognition methods. Among single-modal methods, those based on electroencephalogram signals are the most common, and physiological signals such as the electrocardiogram, galvanic skin response, and respiration have also performed well in emotion recognition. Multi-modal emotion recognition combines the data and features of multiple modalities to produce a single classification result, and it mainly comprises three levels: data-level fusion, feature-level fusion, and decision-level fusion. Whether single-modal or multi-modal, these methods mainly consist of data preprocessing, feature extraction, feature optimization, feature fusion, and emotion classification, and their differences lie chiefly in the feature engineering. Therefore, how to extract features with strong emotion characterization capability and apply them to an emotion recognition task has become a key challenge.
In the inventor's view, existing emotion recognition methods suffer from the following technical problems:
(1) Traditional emotion recognition methods based on physiological signals mainly extract hand-crafted features, such as statistical and frequency-domain features, from the data according to expert knowledge and experience. These features are highly interpretable, but they demand substantial domain expertise and can lose information, which limits recognition accuracy. Some researchers have proposed extracting high-level features with neural networks as an alternative; however, different network structures differ greatly in the quality of the features they extract. How to use a neural network to extract high-level features with strong emotion characterization capability remains a major technical difficulty.
(2) Multi-modal techniques mainly comprise data-level fusion, feature-level fusion, and decision-level fusion, and research in the emotion recognition field focuses on the latter two. Decision-level fusion trains a classifier for each modality and then integrates their results; it makes use of the useful information of each modality and is simple to implement and highly interpretable, but it can lose the complementary information between modalities. Feature-level fusion extracts features from the original data and fuses the features of the modalities into a single fused feature for the recognition task, so it can fully exploit the emotional complementarity between modalities.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides an emotion recognition method and system based on multi-modal spatiotemporal feature fusion. ECG, RSP, and eye movement signals are used as input; the original physiological data are linearly interpolated and denoised to eliminate the influence of outliers and noise on recognition accuracy; a convolutional neural network (CNN) and a long short-term memory network (LSTM) extract the temporal and spatial features of the physiological data to characterize emotion; and a multimodal compact bilinear pooling layer fuses the temporal and spatial features, making full use of the complementary information between different physiological signals while retaining effective information and reducing dimensionality. This solves the technical problem of fusing complementary information across modalities and improves the recognition accuracy of the model.
According to some embodiments, a first aspect of the present disclosure provides an emotion recognition method based on multi-modal spatiotemporal feature fusion, which adopts the following technical solutions:
a method for recognizing emotion based on multi-modal spatiotemporal feature fusion comprises the following steps:
acquiring original physiological data;
preprocessing the acquired original physiological data to obtain multi-modal physiological data;
respectively extracting spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data;
performing feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and classifying according to the obtained fusion characteristics to obtain a result of emotion recognition.
Here, the acquired raw physiological data include at least electrocardiogram (ECG), respiration (RSP), and eye movement signals.
As a further technical limitation, the original emotion data set comprises an emotional stimulation stage and a self-evaluation stage, physiological signals in the original emotion data set are cut, and emotional stimulation stage data are intercepted; performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes; and performing noise reduction processing on the data by using a wavelet noise reduction method to eliminate the influence of noise on the identification effect.
As a further technical limitation, the preprocessing comprises:
the original emotion data set comprises an emotional stimulation stage and a self-evaluation stage; the physiological signals in the original emotion data set are cut, and the emotional stimulation stage data are intercepted;
performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes;
and carrying out noise reduction processing on the data by using a wavelet noise reduction method to eliminate the influence of noise on the identification effect.
Furthermore, the preprocessed multi-modal physiological data is converted into a gray scale image and input into the neural network, and the spatial features of the multi-modal physiological data are extracted from the gray scale image.
Furthermore, the preprocessed multi-modal physiological data are respectively input into the neural network, and the time characteristics of the multi-modal physiological data are extracted.
As a further technical limitation, feature-level fusion is carried out on the temporal features and spatial features of the multi-modal physiological data extracted by the neural networks to obtain fusion features, which are used for the emotion recognition task. The specific process is as follows: the CountSketch algorithm counts the occurrence frequency of each element to realize the mapping from a high dimension to a low dimension; the dimension-reduced features are then fused through a bilinear pooling method to obtain the fusion features.
As a further technical limitation, a classification task is performed according to the obtained fusion features to obtain a final emotion recognition result, and the specific process is as follows: training an SVM classifier; and inputting the fusion features into a classifier to obtain a final recognition result.
According to some embodiments, a second aspect of the present disclosure provides an emotion recognition system based on multi-modal spatiotemporal feature fusion, which adopts the following technical solutions:
an emotion recognition system based on multi-modal spatiotemporal feature fusion, comprising:
the acquisition module is configured to acquire original physiological data, and preprocess the acquired original physiological data to obtain multi-modal physiological data;
the fusion module is configured to respectively extract spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data, and perform feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and the recognition module is configured to classify according to the obtained fusion characteristics to obtain a result of emotion recognition.
According to some embodiments, a third aspect of the present disclosure provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium, on which a program is stored, which, when executed by a processor, implements the steps in the method for emotion recognition based on multimodal spatiotemporal feature fusion as described in the first aspect of the present disclosure.
According to some embodiments, a fourth aspect of the present disclosure provides an electronic device, which adopts the following technical solutions:
an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the method for emotion recognition based on multimodal spatiotemporal feature fusion according to the first aspect of the present disclosure when executing the program.
Compared with the prior art, the beneficial effect of this disclosure is:
1. The method first preprocesses the original physiological signals, where preprocessing comprises data truncation, linear interpolation, and noise reduction. Second, a spatial feature extraction module based on a two-dimensional convolutional neural network (2D-CNN) is provided, which converts the preprocessed multi-modal physiological data into a 1 × 120 × 120 image (1 denotes the number of layers and 120 × 120 the pixel size of the image) and inputs the image into the 2D-CNN to extract the spatial features of the multi-modal physiological data. Third, a temporal feature extraction module based on a long short-term memory network (LSTM) is provided, which inputs each preprocessed multi-modal physiological signal into the LSTM to extract its temporal features. Fourth, a Multimodal Compact Bilinear Pooling method performs feature-level fusion of the temporal and spatial features to obtain fusion features. Fifth, the fusion features are input into a trained classifier to obtain the final emotion recognition result.
2. The system described in this disclosure consists of five parts: a data preprocessing module, a spatial feature extraction module, a temporal feature extraction module, a feature fusion module, and an emotion classification module. Analysis shows that when human emotion changes or fluctuates strongly, signals such as the EEG and RSP also change, for example with increased amplitude and frequency, so the current emotion can be identified accurately by analyzing these waveform patterns. The spatial feature extraction module proposed in this disclosure converts the preprocessed multi-modal physiological data into 1 × 120 × 120 images (1 denotes the number of layers and 120 × 120 the pixel size of the images), extracts the spatial features of the physiological signals using a 2D-CNN network, and thereby exploits the waveform patterns of the physiological signals to the greatest extent.
3. In order to extract the time information in the physiological signal in a targeted manner, the disclosure provides a time characteristic extraction module, which respectively inputs the preprocessed physiological signal into the LSTM to extract the time characteristic, and utilizes the time information in the physiological signal to the maximum extent.
4. In order to fully integrate the time characteristic and the space characteristic and utilize complementary information between different modes, the invention provides a characteristic fusion module which fuses the time characteristic and the space characteristic by adopting a multimode compact bilinear pooling method to obtain a fusion characteristic for an emotion recognition task.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a flowchart of an emotion recognition method based on multi-modal spatiotemporal feature fusion in a first embodiment of the disclosure;
FIG. 2 is a specific working schematic diagram of an emotion recognition method based on multi-modal spatiotemporal feature fusion in a first embodiment of the disclosure;
FIG. 3 is a flowchart illustration of an emotion recognition method based on multi-modal spatiotemporal feature fusion in an embodiment of the disclosure;
fig. 4 is a schematic structural diagram of spatial feature extraction in the first embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of temporal feature extraction in the first embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of feature fusion in the first embodiment of the present disclosure;
fig. 7 is a block diagram of a structure of an emotion recognition system based on multi-modal spatiotemporal feature fusion in a second embodiment of the disclosure.
Detailed Description
The present disclosure is further illustrated by the following examples in conjunction with the accompanying drawings.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example one
The embodiment I of the disclosure introduces an emotion recognition method based on multi-mode spatiotemporal feature fusion.
As shown in FIG. 1, the emotion recognition method based on multi-modal spatiotemporal feature fusion is characterized by comprising the following steps:
acquiring original physiological data;
preprocessing the acquired original physiological data to obtain multi-modal physiological data;
respectively extracting spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data;
performing feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and classifying according to the obtained fusion characteristics to obtain a result of emotion recognition.
As shown in fig. 2 and fig. 3, the embodiment provides an emotion recognition method based on multi-modal spatiotemporal feature fusion, and the embodiment is exemplified by applying the method to a server, it can be understood that the method can also be applied to a terminal, and can also be applied to a system comprising the terminal and the server, and is implemented through interaction between the terminal and the server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network server, cloud communication, middleware service, a domain name service, a security service CDN, a big data and artificial intelligence platform, and the like. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein. In this embodiment, the method includes the steps of:
step S01: for an original emotion data set D, data cutting, linear interpolation and noise reduction processing are carried out on physiological signals in the data set D to obtain a data set D*;
Step S02: for data set D*The physiological signal S in (1) is converted into a gray image I _ S, and the gray image I _ S is input into the 2D-CNN to obtain a spatial characteristic Fspatio;
Step S03: for data level D*The physiological signal S is input into the LSTM to obtain a time characteristic Ftemporal;
Step S04: spatial feature FspatioAnd time characteristic FtemporalInputting the data into a multi-mode compact bilinear pooling layer for feature fusion to obtain a fusion feature Ffusion;
Step S05: fusing the features FfusionInput to the classificationAnd obtaining a final recognition result in the device.
In step S01 of the embodiment, the original emotion data set includes an emotional stimulation phase and a self-evaluation phase, and the physiological signals in the original emotion data set are clipped to capture emotional stimulation phase data; performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes; and performing noise reduction processing on the data by using a wavelet noise reduction method to eliminate the influence of noise on the identification effect.
Initialize the input original emotion data set D = [S_1, S_2, …, S_N], where S_n is the physiological signal sequence collected in the n-th emotional stimulation experiment, n ∈ N, and N denotes the number of experiments. The physiological signal sequence collected in each experiment is S_n = [ECG_{1,n}, ECG_{2,n}, ECG_{3,n}, RSP_n, Eye_Data_n], where ECG_{i,n} (i = 1, 2, 3) denotes the electrocardiogram signal of the i-th channel collected in the n-th emotional stimulation experiment, RSP_n denotes the respiration data collected in the n-th emotional stimulation experiment, and Eye_Data_n denotes the eye movement data collected in the n-th emotional stimulation experiment. For the physiological signal sequence S_n of each experiment in data set D, perform data truncation to intercept the physiological signals of the emotional stimulation segment, then apply linear interpolation and noise reduction to obtain the preprocessed data set D*. A minimal code sketch of this preprocessing follows.
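The sketch below is a non-authoritative Python illustration of step S01, assuming the stimulation-segment boundaries are known; the wavelet family (db4), decomposition level, and threshold rule are assumptions not specified in the patent.

```python
import numpy as np
import pywt

def preprocess(signal: np.ndarray, start: int, end: int) -> np.ndarray:
    x = signal[start:end].astype(float)              # truncate to the emotional stimulation stage

    # Linear interpolation over missing samples (NaN values)
    idx = np.arange(x.size)
    mask = np.isnan(x)
    if mask.any():
        x[mask] = np.interp(idx[mask], idx[~mask], x[~mask])

    # Wavelet denoising: soft-threshold the detail coefficients and reconstruct
    coeffs = pywt.wavedec(x, "db4", level=4)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745   # noise level estimate
    thr = sigma * np.sqrt(2 * np.log(x.size))        # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, "db4")[: x.size]

# Example: clean one hypothetical ECG channel recorded during the stimulation segment
clean_ecg = preprocess(np.random.randn(6000), start=500, end=5500)
```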
In step S02 of this embodiment, as shown in fig. 4, the present disclosure uses the matplotlib toolkit to convert each physiological signal sequence in data set D* into a 1 × 120 × 120 grayscale image (1 denotes the number of layers, 120 × 120 the pixel size of the image), then builds a 2D-CNN network with the PyTorch toolkit, inputs the grayscale image into the network, and extracts the spatial feature F_spatio. A minimal sketch of the signal-to-image conversion follows.
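This sketch renders a preprocessed signal as a 1 × 120 × 120 grayscale array using matplotlib, as the text describes; the figure size, DPI, and waveform plot style are illustrative assumptions chosen only to hit a 120 × 120 pixel canvas.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def signal_to_gray_image(signal: np.ndarray, size: int = 120) -> np.ndarray:
    fig = plt.figure(figsize=(size / 100, size / 100), dpi=100)
    ax = fig.add_axes([0, 0, 1, 1])
    ax.axis("off")
    ax.plot(signal, color="black", linewidth=0.5)    # draw the physiological waveform
    fig.canvas.draw()
    rgba = np.asarray(fig.canvas.buffer_rgba())      # (120, 120, 4) pixel buffer
    plt.close(fig)
    gray = rgba[..., :3].mean(axis=2) / 255.0        # collapse RGB to a single layer
    return gray[np.newaxis, ...]                     # shape (1, 120, 120): I_s

i_s = signal_to_gray_image(np.sin(np.linspace(0, 20, 1000)))
```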
For the design of the spatial feature extraction module, a grayscale image I_s of dimension 120 × 120 × 1 is first input into the 2D-CNN; the 2D-CNN applies 2 convolution operations of 5 × 5 × 32 and 2 convolution operations of 5 × 5 × 64 to obtain a 30 × 30 × 64 feature map, and a spatial feature F_spatio of dimension 1 × 50 is then obtained through two fully connected layers. The convolution operation can be expressed as:
s(t) = (x ∗ ω)(t)    (1)
where the first argument x is the input data, the second argument ω is called the kernel function, and s(t) is the output, i.e., the feature map.
Taking the grayscale image I_s of one experiment as an example, the calculation process of extracting spatial features with the 2D-CNN is as follows (a hedged code sketch follows the list):
(1) First convolutional layer: input is the 120 × 120 × 1 grayscale image I_s (height × width × number of color channels); the layer has 32 convolution kernels, each of size 7 × 7 × 1; the output matrix has size 60 × 60 × 32;
(2) Second convolutional layer: input is a 60 × 60 × 32 matrix; the layer has 32 convolution kernels, each of size 7 × 7 × 32; the output matrix has size 60 × 60 × 32;
(3) Third convolutional layer: input is a 60 × 60 × 32 matrix; the layer has 64 convolution kernels, each of size 7 × 7 × 32; the output matrix has size 30 × 30 × 64;
(4) Fourth convolutional layer: input is a 30 × 30 × 64 matrix; the layer has 64 convolution kernels, each of size 7 × 7 × 64; the output matrix has size 30 × 30 × 64;
(5) First fully connected layer: input is a 30 × 30 × 64 matrix; output is a 1 × 512 matrix;
(6) Second fully connected layer: input is a 1 × 512 matrix; output is a 1 × 50 feature matrix.
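A minimal PyTorch sketch of this spatial feature extractor follows. The kernel counts (32, 32, 64, 64), 7 × 7 kernels, and the 512 → 50 fully connected head follow the layer list above; the strides, padding, and ReLU activations are assumptions chosen so that the feature-map sizes (120 → 60 → 60 → 30 → 30) match.

```python
import torch
import torch.nn as nn

class SpatialFeatureExtractor(nn.Module):
    def __init__(self, feature_dim: int = 50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, stride=2, padding=3),   # 120x120x1 -> 60x60x32
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=7, stride=1, padding=3),  # 60x60x32 -> 60x60x32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=7, stride=2, padding=3),  # 60x60x32 -> 30x30x64
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=7, stride=1, padding=3),  # 30x30x64 -> 30x30x64
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(30 * 30 * 64, 512),  # first fully connected layer
            nn.ReLU(),
            nn.Linear(512, feature_dim),   # second fully connected layer -> F_spatio
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 120, 120) grayscale images I_s
        return self.fc(self.conv(x))

# Example: extract F_spatio for a batch of 8 grayscale images
f_spatio = SpatialFeatureExtractor()(torch.randn(8, 1, 120, 120))  # shape (8, 50)
```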
In step S03 of this embodiment, as shown in fig. 5, the present disclosure builds an LSTM network with the PyTorch toolkit and extracts a temporal feature F_temporal from each physiological signal sequence in data set D*. The LSTM relies on the concept of a gate, which is essentially a fully connected layer and can be expressed as:
g(x)=σ(wx+b) (2)
where x is the input vector, w is the weight vector of the gate, b is the bias term, and g (x) is the output vector.
The LSTM uses two gates to control the content of the cell state c. One is the forget gate (Forget Gate), which determines how much of the cell state c_{t-1} at the previous moment is retained in the current cell state c_t. The forget gate can be expressed as:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (3)
where W_f is the weight matrix of the forget gate, [h_{t-1}, x_t] denotes the concatenation of the two vectors into a single longer vector, b_f is the bias term of the forget gate, and σ is the sigmoid function.
The other is the input gate (Input Gate), which determines how much of the network's current input x_t is saved to the cell state c_t. The input gate can be expressed as:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    (4)
where W_i is the weight matrix of the input gate, b_i is the bias term of the input gate, and σ is the sigmoid function.
The currently input cell state c̃_t is calculated from the previous output and the current input, with the formula:

c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)    (5)

where W_c and b_c are the weight matrix and bias term of the cell state. Next, the cell state c_t at the current time is calculated: the previous cell state c_{t-1} is multiplied element-wise by the forget gate f_t, the currently input cell state c̃_t is multiplied element-wise by the input gate i_t, and the two products are added, with the formula:

c_t = f_t · c_{t-1} + i_t · c̃_t    (6)
where "·" in equation (6) denotes element-wise multiplication.
The output gate (Output Gate) of the LSTM controls how much of the cell state c_t is output to the current output value h_t of the LSTM. The calculation formula is as follows:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (7)
The final output of the LSTM is determined jointly by the output gate and the cell state, with the formula:
h_t = o_t · tanh(c_t)    (8)
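A minimal PyTorch sketch of the temporal feature extraction described by equations (2)–(8): each preprocessed physiological signal sequence is passed through an LSTM and the final hidden state is taken as F_temporal. The hidden size, number of input channels, and output dimension are assumptions not given in the patent.

```python
import torch
import torch.nn as nn

class TemporalFeatureExtractor(nn.Module):
    def __init__(self, n_channels: int = 5, hidden_size: int = 64, feature_dim: int = 50):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, feature_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time_steps, n_channels), e.g. 3 ECG channels + RSP + eye data
        _, (h_n, _) = self.lstm(x)      # h_n: (1, batch, hidden_size), last hidden state
        return self.fc(h_n[-1])         # F_temporal: (batch, feature_dim)

# Example: a batch of 8 sequences, 1000 time steps, 5 modal channels
f_temporal = TemporalFeatureExtractor()(torch.randn(8, 1000, 5))  # shape (8, 50)
```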
In step S04 of this embodiment, as shown in fig. 6, the present disclosure uses the CountSketch algorithm to count the occurrence frequency of each element and to map the features F_spatio and F_temporal from a high dimension to a low dimension, obtaining the corresponding dimension-reduced features and reducing the computational load of the subsequent feature fusion. The fusion feature F_fusion is then obtained through a Fast Fourier Transform (FFT), element-wise (point) multiplication, and an Inverse Fast Fourier Transform (IFFT). A minimal sketch of this fusion step follows.
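In this NumPy sketch, CountSketch projects F_spatio and F_temporal to a lower dimension, and the two sketches are combined by FFT, element-wise multiplication, and inverse FFT (a circular convolution), which approximates the bilinear (outer-product) interaction used in compact bilinear pooling. The output dimension d and the random seed are illustrative assumptions.

```python
import numpy as np

def count_sketch(v: np.ndarray, h: np.ndarray, s: np.ndarray, d: int) -> np.ndarray:
    """Project vector v to d dimensions using hash indices h and random signs s."""
    sketch = np.zeros(d)
    np.add.at(sketch, h, s * v)          # accumulate signed values into hashed bins
    return sketch

def mcb_fuse(f_spatio: np.ndarray, f_temporal: np.ndarray, d: int = 32, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # Independent hash and sign functions for the two modalities
    h1, h2 = rng.integers(0, d, f_spatio.size), rng.integers(0, d, f_temporal.size)
    s1, s2 = rng.choice([-1, 1], f_spatio.size), rng.choice([-1, 1], f_temporal.size)
    psi1 = count_sketch(f_spatio, h1, s1, d)
    psi2 = count_sketch(f_temporal, h2, s2, d)
    # FFT -> element-wise product -> IFFT approximates the outer-product (bilinear) feature
    return np.real(np.fft.ifft(np.fft.fft(psi1) * np.fft.fft(psi2)))

f_fusion = mcb_fuse(np.random.randn(50), np.random.randn(50))  # F_fusion: shape (32,)
```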
In step S05 of this embodiment, a classification task is performed according to the obtained fusion features to obtain a final emotion recognition result, which specifically includes:
training an SVM classifier;
and inputting the fusion features into a classifier to obtain a final recognition result.
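A minimal scikit-learn sketch of this classification step, using a hypothetical training set of fusion features and emotion labels; the SVM kernel and regularization parameter are assumptions not stated in the patent.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: one fusion feature vector F_fusion per trial
X_train = np.random.randn(100, 32)          # fusion features
y_train = np.random.randint(0, 2, 100)      # emotion labels, e.g. low/high valence

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)                   # train the SVM classifier

X_test = np.random.randn(10, 32)
y_pred = clf.predict(X_test)                # final emotion recognition result
```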
Example two
The second embodiment of the disclosure introduces an emotion recognition system based on multi-modal spatiotemporal feature fusion.
Fig. 7 shows an emotion recognition system based on multi-modal spatiotemporal feature fusion, which includes:
the acquisition module is configured to acquire original physiological data, and preprocess the acquired original physiological data to obtain multi-modal physiological data;
the fusion module is configured to respectively extract spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data, and perform feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and the recognition module is configured to classify according to the obtained fusion characteristics to obtain a result of emotion recognition.
The detailed steps are the same as the emotion recognition method based on multi-modal spatiotemporal feature fusion provided in the first embodiment, and are not described herein again.
EXAMPLE III
The third embodiment of the disclosure provides a computer-readable storage medium.
A computer-readable storage medium, on which a program is stored, which when executed by a processor implements the steps in the method for emotion recognition based on multi-modal spatiotemporal feature fusion according to an embodiment of the present disclosure.
The detailed steps are the same as the emotion recognition method based on multi-modal spatiotemporal feature fusion provided in the first embodiment, and are not described again here.
Example four
The fourth embodiment of the disclosure provides an electronic device.
An electronic device includes a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for emotion recognition based on multi-modal spatiotemporal feature fusion according to an embodiment of the present disclosure.
The detailed steps are the same as the emotion recognition method based on multi-modal spatiotemporal feature fusion provided in the first embodiment, and are not described herein again.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. A method for recognizing emotion based on multi-modal spatiotemporal feature fusion is characterized by comprising the following steps:
acquiring original physiological data;
preprocessing the acquired original physiological data to obtain multi-modal physiological data;
respectively extracting spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data;
performing feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and classifying according to the obtained fusion characteristics to obtain a result of emotion recognition.
2. The emotion recognition method based on multi-modal spatiotemporal feature fusion as claimed in claim 1, wherein the original emotion data set comprises an emotional stimulation phase and a self-assessment phase, the physiological signals in the original emotion data set are cut, and the emotional stimulation phase data are intercepted; performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes; and performing noise reduction processing on the data by using a wavelet noise reduction method to eliminate the influence of noise on the identification effect.
3. A method of emotion recognition based on multi-modal spatiotemporal feature fusion as recited in claim 1, wherein said preprocessing comprises:
the original emotion data set comprises an emotional stimulation stage and a self-evaluation stage; the physiological signals in the original emotion data set are cut, and the emotional stimulation stage data are intercepted;
performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes;
and performing noise reduction processing on the data by using a wavelet noise reduction method to eliminate the influence of noise on the identification effect.
4. The emotion recognition method based on fusion of multi-modal spatiotemporal features as defined in claim 3, wherein the preprocessed multi-modal physiological data is converted into gray scale images and inputted into the neural network, and the spatial features of the multi-modal physiological data are extracted from the gray scale images.
5. The emotion recognition method based on fusion of multi-modal spatiotemporal features as defined in claim 3, wherein the pre-processed multi-modal physiological data are respectively inputted into the neural network to extract the temporal features of the multi-modal physiological data.
6. The emotion recognition method based on multi-modal spatiotemporal feature fusion as claimed in claim 1, wherein the temporal features and spatial features of the multi-modal physiological data extracted by the neural networks are subjected to feature-level fusion to obtain fusion features for the emotion recognition task, and the specific process is as follows: counting the occurrence frequency of each element by using the CountSketch algorithm to realize the mapping from a high dimension to a low dimension; and fusing the dimension-reduced features through a bilinear pooling method to obtain the fusion features.
7. The emotion recognition method based on multi-modal spatiotemporal feature fusion as claimed in claim 1, wherein, according to the obtained fusion features, a classification task is performed to obtain a final emotion recognition result, and the specific process is as follows: training an SVM classifier; and inputting the fusion features into a classifier to obtain a final recognition result.
8. An emotion recognition system based on multi-modal spatiotemporal feature fusion, characterized by comprising:
the acquisition module is configured to acquire original physiological data, and preprocess the acquired original physiological data to obtain multi-modal physiological data;
the fusion module is configured to respectively extract spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data, and perform feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and the recognition module is configured to classify according to the obtained fusion characteristics to obtain a result of emotion recognition.
9. A computer-readable storage medium, on which a program is stored, which when executed by a processor performs the steps in the method for emotion recognition based on multi-modal spatiotemporal feature fusion as claimed in any of claims 1-7.
10. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for emotion recognition based on multi-modal spatiotemporal feature fusion as claimed in any of claims 1-7 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210101019.XA CN114424940A (en) | 2022-01-27 | 2022-01-27 | Emotion recognition method and system based on multi-mode spatiotemporal feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210101019.XA CN114424940A (en) | 2022-01-27 | 2022-01-27 | Emotion recognition method and system based on multi-mode spatiotemporal feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114424940A true CN114424940A (en) | 2022-05-03 |
Family
ID=81313284
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210101019.XA Pending CN114424940A (en) | 2022-01-27 | 2022-01-27 | Emotion recognition method and system based on multi-mode spatiotemporal feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114424940A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114694234A (en) * | 2022-06-02 | 2022-07-01 | 杭州智诺科技股份有限公司 | Emotion recognition method, system, electronic device and storage medium |
CN114913590A (en) * | 2022-07-15 | 2022-08-16 | 山东海量信息技术研究院 | Data emotion recognition method, device and equipment and readable storage medium |
CN115985464A (en) * | 2023-03-17 | 2023-04-18 | 山东大学齐鲁医院 | Muscle fatigue degree classification method and system based on multi-modal data fusion |
CN117520826A (en) * | 2024-01-03 | 2024-02-06 | 武汉纺织大学 | Multi-mode emotion recognition method and system based on wearable equipment |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066583A (en) * | 2017-04-14 | 2017-08-18 | 华侨大学 | A kind of picture and text cross-module state sensibility classification method merged based on compact bilinearity |
KR20190128933A (en) * | 2018-05-09 | 2019-11-19 | 연세대학교 산학협력단 | Emotion recognition apparatus and method based on spatiotemporal attention |
CN109614895A (en) * | 2018-10-29 | 2019-04-12 | 山东大学 | A method of the multi-modal emotion recognition based on attention Fusion Features |
CN109508375A (en) * | 2018-11-19 | 2019-03-22 | 重庆邮电大学 | A kind of social affective classification method based on multi-modal fusion |
CN109934158A (en) * | 2019-03-11 | 2019-06-25 | 合肥工业大学 | Video feeling recognition methods based on local strengthening motion history figure and recursive convolution neural network |
CN110532409A (en) * | 2019-07-30 | 2019-12-03 | 西北工业大学 | Image search method based on isomery bilinearity attention network |
CN111297380A (en) * | 2020-02-12 | 2020-06-19 | 电子科技大学 | Emotion recognition method based on space-time convolution core block |
CN111461204A (en) * | 2020-03-30 | 2020-07-28 | 华南理工大学 | Emotion identification method based on electroencephalogram signals and used for game evaluation |
CN112120716A (en) * | 2020-09-02 | 2020-12-25 | 中国人民解放军军事科学院国防科技创新研究院 | Wearable multi-mode emotional state monitoring device |
CN112381008A (en) * | 2020-11-17 | 2021-02-19 | 天津大学 | Electroencephalogram emotion recognition method based on parallel sequence channel mapping network |
CN113057633A (en) * | 2021-03-26 | 2021-07-02 | 华南理工大学 | Multi-modal emotional stress recognition method and device, computer equipment and storage medium |
CN113288146A (en) * | 2021-05-26 | 2021-08-24 | 杭州电子科技大学 | Electroencephalogram emotion classification method based on time-space-frequency combined characteristics |
CN113705398A (en) * | 2021-08-17 | 2021-11-26 | 陕西师范大学 | Music electroencephalogram space-time characteristic classification method based on convolution-long and short term memory network |
Non-Patent Citations (1)
Title |
---|
CHENG, Cheng: "Research on Emotion Recognition Methods Based on Multimodal Deep Learning" (基于多模态深度学习的情感识别方法研究), China Master's Theses Full-text Database, 15 August 2021 (2021-08-15), pages 1-4 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114694234A (en) * | 2022-06-02 | 2022-07-01 | 杭州智诺科技股份有限公司 | Emotion recognition method, system, electronic device and storage medium |
CN114694234B (en) * | 2022-06-02 | 2023-02-03 | 杭州智诺科技股份有限公司 | Emotion recognition method, system, electronic device and storage medium |
CN114913590A (en) * | 2022-07-15 | 2022-08-16 | 山东海量信息技术研究院 | Data emotion recognition method, device and equipment and readable storage medium |
CN115985464A (en) * | 2023-03-17 | 2023-04-18 | 山东大学齐鲁医院 | Muscle fatigue degree classification method and system based on multi-modal data fusion |
CN117520826A (en) * | 2024-01-03 | 2024-02-06 | 武汉纺织大学 | Multi-mode emotion recognition method and system based on wearable equipment |
CN117520826B (en) * | 2024-01-03 | 2024-04-05 | 武汉纺织大学 | Multi-mode emotion recognition method and system based on wearable equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tao et al. | EEG-based emotion recognition via channel-wise attention and self attention | |
CN111209885B (en) | Gesture information processing method and device, electronic equipment and storage medium | |
CN113693613B (en) | Electroencephalogram signal classification method, electroencephalogram signal classification device, computer equipment and storage medium | |
CN114424940A (en) | Emotion recognition method and system based on multi-mode spatiotemporal feature fusion | |
Guo et al. | A hybrid fuzzy cognitive map/support vector machine approach for EEG-based emotion classification using compressed sensing | |
Majumdar et al. | Robust greedy deep dictionary learning for ECG arrhythmia classification | |
CN110555468A (en) | Electroencephalogram signal identification method and system combining recursion graph and CNN | |
Peng et al. | OGSSL: A semi-supervised classification model coupled with optimal graph learning for EEG emotion recognition | |
Zhao et al. | Applying contrast-limited adaptive histogram equalization and integral projection for facial feature enhancement and detection | |
Jinliang et al. | EEG emotion recognition based on granger causality and capsnet neural network | |
WO2022012668A1 (en) | Training set processing method and apparatus | |
WO2022188793A1 (en) | Electrophysiological signal classification processing method and apparatus, computer device and storage medium | |
Miah et al. | Movie oriented positive negative emotion classification from eeg signal using wavelet transformation and machine learning approaches | |
Paul et al. | Deep learning and its importance for early signature of neuronal disorders | |
Tang et al. | A hybrid SAE and CNN classifier for motor imagery EEG classification | |
CN114595725B (en) | Electroencephalogram signal classification method based on addition network and supervised contrast learning | |
CN115238796A (en) | Motor imagery electroencephalogram signal classification method based on parallel DAMSCN-LSTM | |
Kulkarni et al. | Analysis of DEAP dataset for emotion recognition | |
CN117874570A (en) | Electroencephalogram signal multi-classification method, equipment and medium based on mixed attention mechanism | |
Houssein et al. | TFCNN-BiGRU with self-attention mechanism for automatic human emotion recognition using multi-channel EEG data | |
Indira et al. | Deep learning methods for data science | |
CN112259228A (en) | Depression screening method by dynamic attention network non-negative matrix factorization | |
Malathi et al. | An estimation of PCA feature extraction in EEG-based emotion prediction with support vector machines | |
CN116421200A (en) | Brain electricity emotion analysis method of multi-task mixed model based on parallel training | |
Taha et al. | EEG Emotion Recognition Via Ensemble Learning Representations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |