CN114424940A - Emotion recognition method and system based on multi-mode spatiotemporal feature fusion - Google Patents

Emotion recognition method and system based on multi-mode spatiotemporal feature fusion

Info

Publication number
CN114424940A
Authority
CN
China
Prior art keywords
modal
data
fusion
emotion recognition
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210101019.XA
Other languages
Chinese (zh)
Inventor
郑向伟
郭鲠源
张利峰
郑法
高鹏志
嵇存
李淑芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202210101019.XA
Publication of CN114424940A
Legal status: Pending


Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/16: Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B 5/165: Evaluating the state of mind, e.g. depression, anxiety
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B 5/7235: Details of waveform analysis
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Public Health (AREA)
  • Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Fuzzy Systems (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Developmental Disabilities (AREA)
  • Educational Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of emotion recognition and provides an emotion recognition method and system based on multi-modal spatiotemporal feature fusion, comprising the following steps: acquiring raw physiological data; preprocessing the acquired raw physiological data to obtain multi-modal physiological data; extracting the spatial and temporal features of the multi-modal data from the obtained multi-modal physiological data; performing feature-level fusion of the extracted spatial and temporal features to obtain fusion features; and classifying the fusion features to obtain the emotion recognition result.

Description

Emotion recognition method and system based on multi-mode spatiotemporal feature fusion
Technical Field
The disclosure belongs to the technical field of emotion recognition, and particularly relates to an emotion recognition method and system based on multi-mode spatiotemporal feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Emotional changes are usually produced under the stimulation of the external environment and are accompanied by changes in individual physiological characteristics and psychological responses, so they can be measured and modeled by scientific methods. With the development and popularization of portable and wearable sensors, acquiring physiological signals has become easier, and emotion recognition methods based on physiological signals have attracted more and more researchers. Such methods analyze the emotional changes of the human mind by processing physiological signals, and they have been widely used in fields such as fatigue-driving detection and medical care. Many studies show that physiological signals such as electroencephalogram (EEG), electrocardiogram (ECG), galvanic skin response (GSR), respiration (RSP) and electrooculogram (EOG) are correlated with specific emotions; however, each physiological signal has different characteristics and behaves differently in emotion recognition tasks, so each signal has usually been studied independently.
Current emotion recognition methods based on physiological signals can be roughly divided into two categories: single-modal methods and multi-modal methods. Among single-modal methods, emotion recognition based on EEG signals is the most widely used, and physiological signals such as ECG, galvanic skin response and respiration also perform well in emotion recognition. Multi-modal emotion recognition combines the data and features of multiple modalities to obtain a joint classification result, and it mainly operates at three levels: data-level fusion, feature-level fusion and decision-level fusion. Both single-modal and multi-modal recognition mainly involve data preprocessing, feature extraction, feature optimization, feature fusion and emotion classification, and the core of these methods lies in the feature engineering stage. Therefore, how to extract features with strong emotion characterization capability and apply them to emotion recognition tasks is a key challenge.
According to the inventors, existing emotion recognition methods have the following technical problems:
(1) Traditional emotion recognition methods based on physiological signals mainly extract hand-crafted features, such as statistical and frequency-domain features, from the data according to professional knowledge and experience. These features are highly interpretable, but they demand considerable domain expertise and can cause information loss, which reduces recognition accuracy. Some researchers have proposed extracting high-level features with neural networks instead; however, different network structures differ greatly in their feature extraction performance. How to use neural networks to extract high-level features with strong emotion characterization capability remains a major technical difficulty.
(2) Multi-modal techniques mainly comprise data-level fusion, feature-level fusion and decision-level fusion, and research in emotion recognition has concentrated on the latter two. Decision-level fusion integrates the results after training a classifier for each modality; it makes full use of the useful information of each modality and is simple to implement and highly interpretable, but it may lose complementary information between modalities. Feature-level fusion extracts features from the raw data and fuses the features of each modality into fusion features used for the recognition task, which makes full use of the emotional complementarity between modalities.
Disclosure of Invention
To solve the above problems, the present disclosure provides an emotion recognition method and system based on multi-modal spatiotemporal feature fusion. ECG, RSP and eye movement signals are used as input; linear interpolation and noise reduction are applied to the raw physiological data to eliminate the influence of outliers and noise on recognition accuracy; a convolutional neural network (CNN) and a long short-term memory network (LSTM) are used to extract the temporal and spatial features of the physiological data to characterize emotion; and a multi-modal compact bilinear pooling layer fuses the temporal and spatial features, making full use of the complementary information between different physiological signals while retaining effective information and reducing dimensionality. This solves the technical problem of fusing complementary information across modalities and improves the recognition accuracy of the model.
According to some embodiments, a first aspect of the present disclosure provides an emotion recognition method based on multi-modal spatiotemporal feature fusion, which adopts the following technical solutions:
a method for recognizing emotion based on multi-modal spatiotemporal feature fusion comprises the following steps:
acquiring original physiological data;
preprocessing the acquired original physiological data to obtain multi-modal physiological data;
respectively extracting spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data;
performing feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and classifying according to the obtained fusion characteristics to obtain a result of emotion recognition.
Here, the acquired raw physiological data include at least electrocardiogram (ECG), respiration (RSP) and eye movement signals.
As a further technical limitation, the original emotion data set comprises an emotional stimulation stage and a self-evaluation stage, physiological signals in the original emotion data set are cut, and emotional stimulation stage data are intercepted; performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes; and performing noise reduction processing on the data by using a wavelet noise reduction method to eliminate the influence of noise on the identification effect.
As a further technical limitation, the preprocessing comprises:
the original emotion data set comprises an emotional stimulation stage and a self-evaluation stage; the physiological signals in the original emotion data set are cut, and the emotional stimulation stage data are intercepted;
performing linear interpolation on the intercepted data to eliminate the influence of missing values introduced during data acquisition and processing;
and performing noise reduction on the data with a wavelet noise reduction method to eliminate the influence of noise on the recognition effect.
Furthermore, the preprocessed multi-modal physiological data is converted into a gray scale image and input into the neural network, and the spatial features of the multi-modal physiological data are extracted from the gray scale image.
Furthermore, the preprocessed multi-modal physiological data are respectively input into the neural network, and the time characteristics of the multi-modal physiological data are extracted.
As a further technical limitation, feature-level fusion is performed on the temporal and spatial features of the multi-modal physiological data extracted by the neural networks to obtain fusion features, which are used for the emotion recognition task. The specific process is as follows: the CountSketch algorithm counts the occurrence frequency of each element to map the features from a high dimension to a low dimension; the dimension-reduced features are then fused by a bilinear pooling method to obtain the fusion features.
As a further technical limitation, a classification task is performed according to the obtained fusion features to obtain a final emotion recognition result, and the specific process is as follows: training an SVM classifier; and inputting the fusion features into a classifier to obtain a final recognition result.
According to some embodiments, a second aspect of the present disclosure provides an emotion recognition system based on multi-modal spatiotemporal feature fusion, which adopts the following technical solutions:
an emotion recognition system based on multi-modal spatiotemporal feature fusion, comprising:
the acquisition module is configured to acquire original physiological data, and preprocess the acquired original physiological data to obtain multi-modal physiological data;
the fusion module is configured to respectively extract spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data, and perform feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and the recognition module is configured to classify according to the obtained fusion characteristics to obtain a result of emotion recognition.
According to some embodiments, a third aspect of the present disclosure provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium, on which a program is stored, which, when executed by a processor, implements the steps in the method for emotion recognition based on multimodal spatiotemporal feature fusion as described in the first aspect of the present disclosure.
According to some embodiments, a fourth aspect of the present disclosure provides an electronic device, which adopts the following technical solutions:
an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the method for emotion recognition based on multimodal spatiotemporal feature fusion according to the first aspect of the present disclosure when executing the program.
Compared with the prior art, the beneficial effect of this disclosure is:
1. The method first preprocesses the raw physiological signals, including data truncation, linear interpolation and noise reduction; second, it provides a spatial feature extraction module based on a two-dimensional convolutional neural network (2D-CNN), which converts the preprocessed multi-modal physiological data into a 1 × 120 × 120 image (1 represents the number of layers and 120 × 120 represents the pixel size of the image) and inputs the image into the 2D-CNN to extract spatial features; third, it provides a temporal feature extraction module based on a long short-term memory network (LSTM), into which the preprocessed multi-modal physiological data are separately input to extract temporal features; fourth, a multimodal compact bilinear pooling (MCB) method performs feature-level fusion of the temporal and spatial features to obtain fusion features; fifth, the fusion features are input into a trained classifier to obtain the final emotion recognition result.
2. The system described in this disclosure consists of five parts: a data preprocessing module, a spatial feature extraction module, a temporal feature extraction module, a feature fusion module and an emotion classification module. Analysis shows that when human emotion changes or fluctuates strongly, signals such as EEG and RSP also change, for example in amplitude and frequency, so the current emotion can be identified accurately by analyzing these waveform patterns. The spatial feature extraction module proposed in this disclosure converts the preprocessed multi-modal physiological data into 1 × 120 × 120 images (1 represents the number of layers and 120 × 120 represents the pixel size of the images) and extracts the spatial features of the physiological signals with a 2D-CNN, making maximal use of the waveform patterns of the physiological signals.
3. To extract the temporal information in the physiological signals in a targeted manner, the disclosure provides a temporal feature extraction module, which inputs each preprocessed physiological signal into the LSTM to extract temporal features, making maximal use of the temporal information in the physiological signals.
4. To fully integrate the temporal and spatial features and exploit the complementary information between different modalities, the disclosure provides a feature fusion module, which fuses the temporal and spatial features with a multimodal compact bilinear pooling method to obtain fusion features for the emotion recognition task.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a flowchart of an emotion recognition method based on multi-modal spatiotemporal feature fusion in a first embodiment of the disclosure;
FIG. 2 is a specific working schematic diagram of an emotion recognition method based on multi-modal spatiotemporal feature fusion in a first embodiment of the disclosure;
FIG. 3 is a flowchart illustration of an emotion recognition method based on multi-modal spatiotemporal feature fusion in an embodiment of the disclosure;
FIG. 4 is a schematic structural diagram of spatial feature extraction in the first embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of temporal feature extraction in the first embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of feature fusion in the first embodiment of the present disclosure;
FIG. 7 is a block diagram of the structure of an emotion recognition system based on multi-modal spatiotemporal feature fusion in the second embodiment of the disclosure.
Detailed Description
The present disclosure is further illustrated by the following examples in conjunction with the accompanying drawings.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments of the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example one
The embodiment I of the disclosure introduces an emotion recognition method based on multi-mode spatiotemporal feature fusion.
As shown in FIG. 1, the emotion recognition method based on multi-modal spatiotemporal feature fusion is characterized by comprising the following steps:
acquiring original physiological data;
preprocessing the acquired original physiological data to obtain multi-modal physiological data;
respectively extracting spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data;
performing feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and classifying according to the obtained fusion characteristics to obtain a result of emotion recognition.
As shown in fig. 2 and fig. 3, this embodiment provides an emotion recognition method based on multi-modal spatiotemporal feature fusion. The embodiment is illustrated by applying the method to a server; it can be understood that the method can also be applied to a terminal, or to a system comprising a terminal and a server, implemented through interaction between the terminal and the server. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, laptop computer, desktop computer, smart speaker or smart watch. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application. In this embodiment, the method includes the following steps:
step S01: for an original emotion data set D, data cutting, linear interpolation and noise reduction processing are carried out on physiological signals in the data set D to obtain a data set D*
Step S02: for data set D*The physiological signal S in (1) is converted into a gray image I _ S, and the gray image I _ S is input into the 2D-CNN to obtain a spatial characteristic Fspatio
Step S03: for data level D*The physiological signal S is input into the LSTM to obtain a time characteristic Ftemporal
Step S04: spatial feature FspatioAnd time characteristic FtemporalInputting the data into a multi-mode compact bilinear pooling layer for feature fusion to obtain a fusion feature Ffusion
Step S05: fusing the features FfusionInput to the classificationAnd obtaining a final recognition result in the device.
In step S01 of the embodiment, the original emotion data set includes an emotional stimulation phase and a self-evaluation phase, and the physiological signals in the original emotion data set are clipped to capture emotional stimulation phase data; performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes; and performing noise reduction processing on the data by using a wavelet noise reduction method to eliminate the influence of noise on the identification effect.
The input original emotion data set is initialized as D = [S_1, S_2, …, S_N], where S_n is the physiological signal sequence collected in the n-th emotional stimulation experiment, n ∈ N, and N denotes the number of experiments. The physiological signal sequence collected in each experiment is S_n = [ECG_{1,n}, ECG_{2,n}, ECG_{3,n}, RSP_n, Eye_Data_n], where ECG_{i,n} denotes the electrocardiogram signal of the i-th channel collected in the n-th emotional stimulation experiment (i = 1, 2, 3), RSP_n denotes the respiration data collected in the n-th experiment, and Eye_Data_n denotes the eye movement data collected in the n-th experiment. For each experiment's physiological signal sequence S_n in the data set D, data truncation is performed to intercept the physiological signal of the emotional stimulation segment, followed by linear interpolation and noise reduction, yielding the preprocessed data set D*.
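The following is a minimal Python sketch of this preprocessing step; the segment boundaries, the db4 wavelet, the decomposition level and the soft-threshold rule are illustrative assumptions and are not fixed by this disclosure.

```python
# Hypothetical preprocessing sketch: data truncation, linear interpolation of
# missing samples, and wavelet denoising of each channel in a sequence S_n.
import numpy as np
import pywt

def truncate(signal, stim_start, stim_end):
    """Keep only the emotional-stimulation segment of a 1-D signal."""
    return signal[stim_start:stim_end]

def interpolate_missing(signal):
    """Fill NaN samples by linear interpolation over the sample index."""
    x = np.arange(len(signal))
    mask = np.isnan(signal)
    signal = signal.copy()
    signal[mask] = np.interp(x[mask], x[~mask], signal[~mask])
    return signal

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Soft-threshold the detail coefficients and reconstruct the signal."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # noise estimate from finest level
    thr = sigma * np.sqrt(2 * np.log(len(signal)))      # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

def preprocess_sequence(seq, stim_start, stim_end):
    """Apply truncation, interpolation and denoising to every channel of S_n."""
    return [wavelet_denoise(interpolate_missing(truncate(ch, stim_start, stim_end)))
            for ch in seq]
```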
In step S02 of the present embodiment, as shown in fig. 4, the present disclosure uses the matplotlib toolkit to convert each physiological signal sequence in the data set D* into a 1 × 120 × 120 grayscale image (1 represents the number of layers and 120 × 120 represents the pixel size of the image). A 2D-CNN network is then built with the PyTorch toolkit, the grayscale image is input into the network, and the spatial feature F_spatio is extracted.
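A minimal sketch of this signal-to-image conversion is shown below, assuming matplotlib renders the waveform off-screen and the result is resampled to 120 × 120; the figure size, the DPI and the choice to overlay all channels in a single plot are assumptions.

```python
# Hypothetical conversion of one preprocessed signal sequence into a
# 1 x 120 x 120 grayscale image using matplotlib and Pillow.
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")                   # render off-screen, no display needed
import matplotlib.pyplot as plt
from PIL import Image

def sequence_to_gray_image(channels, size=120):
    fig, ax = plt.subplots(figsize=(1, 1), dpi=size)    # 1 inch at 120 dpi -> 120 x 120 px
    for ch in channels:                                  # overlay all modalities as waveforms
        ax.plot(ch, color="black", linewidth=0.5)
    ax.axis("off")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=size)
    plt.close(fig)
    buf.seek(0)
    img = Image.open(buf).convert("L").resize((size, size))   # force grayscale, fix size
    return np.asarray(img, dtype=np.float32)[None, :, :] / 255.0   # shape (1, 120, 120)
```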
For the design of the spatial feature extraction module: first, a grayscale image I_s of dimension 120 × 120 × 1 is input into the 2D-CNN, which includes two 5 × 5 × 32 convolution operations and two 5 × 5 × 64 convolution operations to obtain a 30 × 30 × 64 feature map, and the spatial feature F_spatio of dimension 1 × 50 is then obtained through two fully connected layers. The convolution operation can be expressed as:
s(t) = (x ∗ ω)(t)   (1)
where the first argument x is called the input data, the second argument ω is called the kernel function, and s(t) is the output, i.e. the feature map.
Taking the grayscale image I_s of one experiment as an example, the calculation process of extracting spatial features with the 2D-CNN is as follows:
(1) First convolutional layer: the input is the 120 × 120 × 1 grayscale image I_s (height × width × number of color channels); the layer has 32 convolution kernels, each of size 7 × 7 × 1, and the output matrix has size 60 × 60 × 32;
(2) Second convolutional layer: the input is a 60 × 60 × 32 matrix; the layer has 32 convolution kernels, each of size 7 × 7 × 32, and the output matrix has size 60 × 60 × 32;
(3) Third convolutional layer: the input is a 60 × 60 × 32 matrix; the layer has 64 convolution kernels, each of size 7 × 7 × 32, and the output matrix has size 30 × 30 × 64;
(4) Fourth convolutional layer: the input is a 30 × 30 × 64 matrix; the layer has 64 convolution kernels, each of size 7 × 7 × 64, and the output matrix has size 30 × 30 × 64;
(5) First fully connected layer: the input is a 30 × 30 × 64 matrix and the output is a 1 × 512 matrix;
(6) Second fully connected layer: the input is a 1 × 512 matrix and the output is a 1 × 50 feature matrix.
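A possible PyTorch sketch of this spatial feature extractor, following the layer-by-layer sizes listed above, is given below; the strides and paddings are assumptions chosen so that the stated output sizes are reproduced.

```python
# Hypothetical 2D-CNN spatial feature extractor:
# 120x120x1 -> 60x60x32 -> 60x60x32 -> 30x30x64 -> 30x30x64 -> 512 -> 50.
import torch
import torch.nn as nn

class SpatialCNN(nn.Module):
    def __init__(self, feat_dim=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),   # 120 -> 60
            nn.Conv2d(32, 32, kernel_size=7, stride=1, padding=3), nn.ReLU(),  # 60 -> 60
            nn.Conv2d(32, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),  # 60 -> 30
            nn.Conv2d(64, 64, kernel_size=7, stride=1, padding=3), nn.ReLU(),  # 30 -> 30
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(30 * 30 * 64, 512), nn.ReLU(),   # first fully connected layer
            nn.Linear(512, feat_dim),                  # 1 x 50 spatial feature F_spatio
        )

    def forward(self, gray_image):                     # gray_image: (batch, 1, 120, 120)
        return self.classifier(self.features(gray_image))

# Example: one grayscale image I_s -> a 1 x 50 spatial feature
f_spatio = SpatialCNN()(torch.randn(1, 1, 120, 120))   # shape (1, 50)
```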
In step S03 of the present embodiment, as shown in fig. 5, the present disclosure builds an LSTM network with the PyTorch toolkit and extracts the temporal feature F_temporal from each physiological signal sequence in the data set D*. The LSTM relies on the concept of a gate, which is essentially a fully connected layer and can be expressed as:
g(x) = σ(wx + b)   (2)
where x is the input vector, w is the weight vector of the gate, b is the bias term, σ is the sigmoid function, and g(x) is the output vector.
The LSTM uses two gates to control the contents of the cell state c. One is the forget gate, which determines how much of the cell state c_{t-1} at the previous moment is kept in the current cell state c_t. The forget gate can be expressed as:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)   (3)
where W_f is the weight matrix of the forget gate, [h_{t-1}, x_t] denotes the concatenation of the two vectors into one longer vector, b_f is the bias term of the forget gate, and σ is the sigmoid function.
The other is the input gate, which determines how much of the current network input x_t is saved into the cell state c_t. The input gate can be expressed as:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)   (4)
where W_i is the weight matrix of the input gate, b_i is the bias term of the input gate, and σ is the sigmoid function.
The candidate cell state for the current input, c̃_t, is calculated from the previous output and the current input as follows:
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)   (5)
Next, the cell state c_t at the current time is calculated: the previous cell state c_{t-1} is multiplied element-wise by the forget gate f_t, the candidate cell state c̃_t is multiplied element-wise by the input gate i_t, and the two products are added:
c_t = f_t · c_{t-1} + i_t · c̃_t   (6)
where "·" here denotes element-wise multiplication.
The output gate of the LSTM controls how much of the cell state c_t is output to the current output value h_t of the LSTM:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)   (7)
The final output of the LSTM is determined jointly by the output gate and the cell state:
h_t = o_t · tanh(c_t)   (8)
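Equations (2) through (8) describe the standard LSTM gating that PyTorch's nn.LSTM implements internally, so the temporal feature extractor can be sketched as below; the hidden size, the projection to 50 dimensions and the use of the last time step as F_temporal are assumptions.

```python
# Hypothetical LSTM temporal feature extractor for one preprocessed sequence.
import torch
import torch.nn as nn

class TemporalLSTM(nn.Module):
    def __init__(self, in_channels=5, hidden_size=64, feat_dim=50):
        super().__init__()
        self.lstm = nn.LSTM(input_size=in_channels, hidden_size=hidden_size,
                            batch_first=True)
        self.proj = nn.Linear(hidden_size, feat_dim)   # map hidden state to F_temporal

    def forward(self, seq):                 # seq: (batch, time_steps, channels)
        _, (h_n, _) = self.lstm(seq)        # h_n: (1, batch, hidden_size)
        return self.proj(h_n[-1])           # F_temporal: (batch, feat_dim)

# Example: a sequence with 5 channels (3 ECG channels, RSP, eye movement)
f_temporal = TemporalLSTM()(torch.randn(1, 1000, 5))    # shape (1, 50)
```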
In step S04 of this embodiment, as shown in fig. 6, the present disclosure uses the CountSketch algorithm to count the occurrence frequency of each element, mapping F_spatio and F_temporal from a high dimension to a low dimension and obtaining the dimension-reduced features, which reduces the computational load of the subsequent feature fusion. The fusion feature F_fusion is then obtained by a Fast Fourier Transform (FFT), an element-wise (point) multiplication, and an Inverse Fast Fourier Transform (IFFT).
In step S05 of this embodiment, a classification task is performed according to the obtained fusion features to obtain a final emotion recognition result, which specifically includes:
training an SVM classifier;
and inputting the fusion features into a classifier to obtain a final recognition result.
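A minimal sketch of this classification step with scikit-learn is given below; the RBF kernel, the penalty parameter and the train/test split are assumptions.

```python
# Hypothetical final classification step: train an SVM on fusion features,
# then predict emotion labels for held-out samples.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_and_classify(fusion_features, labels):
    """fusion_features: (num_samples, d) array of F_fusion; labels: emotion classes."""
    x_train, x_test, y_train, y_test = train_test_split(
        fusion_features, labels, test_size=0.2, random_state=0)
    clf = SVC(kernel="rbf", C=1.0)           # train the SVM classifier
    clf.fit(x_train, y_train)
    predictions = clf.predict(x_test)        # final emotion recognition results
    return clf, predictions
```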
Example two
The second embodiment of the disclosure introduces an emotion recognition system based on multi-modal spatiotemporal feature fusion.
Fig. 7 shows an emotion recognition system based on multi-modal spatiotemporal feature fusion, which includes:
the acquisition module is configured to acquire original physiological data, and preprocess the acquired original physiological data to obtain multi-modal physiological data;
the fusion module is configured to respectively extract spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data, and perform feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and the recognition module is configured to classify according to the obtained fusion characteristics to obtain a result of emotion recognition.
The detailed steps are the same as the emotion recognition method based on multi-modal spatiotemporal feature fusion provided in the first embodiment, and are not described herein again.
EXAMPLE III
The third embodiment of the disclosure provides a computer-readable storage medium.
A computer-readable storage medium, on which a program is stored, which when executed by a processor implements the steps in the method for emotion recognition based on multi-modal spatiotemporal feature fusion according to an embodiment of the present disclosure.
The detailed steps are the same as the emotion recognition method based on multi-modal spatiotemporal feature fusion provided in the first embodiment, and are not described again here.
Example four
The fourth embodiment of the disclosure provides an electronic device.
An electronic device includes a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for emotion recognition based on multi-modal spatiotemporal feature fusion according to an embodiment of the present disclosure.
The detailed steps are the same as the emotion recognition method based on multi-modal spatiotemporal feature fusion provided in the first embodiment, and are not described herein again.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A method for recognizing emotion based on multi-modal spatiotemporal feature fusion is characterized by comprising the following steps:
acquiring original physiological data;
preprocessing the acquired original physiological data to obtain multi-modal physiological data;
respectively extracting spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data;
performing feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and classifying according to the obtained fusion characteristics to obtain a result of emotion recognition.
2. The emotion recognition method based on multi-modal spatiotemporal feature fusion as claimed in claim 1, wherein the original emotion data set comprises an emotional stimulation phase and a self-assessment phase, the physiological signals in the original emotion data set are cut, and the emotional stimulation phase data are intercepted; performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes; and performing noise reduction processing on the data by using a wavelet noise reduction method to eliminate the influence of noise on the identification effect.
3. A method of emotion recognition based on multi-modal spatiotemporal feature fusion as recited in claim 1, wherein said preprocessing comprises:
the original emotion data set comprises an emotional stimulation stage and a self-evaluation stage; the physiological signals in the original emotion data set are cut, and the emotional stimulation stage data are intercepted;
performing linear interpolation on the intercepted data to eliminate the influence of missing values in the data acquisition and processing processes;
and performing noise reduction on the data with a wavelet noise reduction method to eliminate the influence of noise on the recognition effect.
4. The emotion recognition method based on fusion of multi-modal spatiotemporal features as defined in claim 3, wherein the preprocessed multi-modal physiological data is converted into gray scale images and inputted into the neural network, and the spatial features of the multi-modal physiological data are extracted from the gray scale images.
5. The emotion recognition method based on fusion of multi-modal spatiotemporal features as defined in claim 3, wherein the pre-processed multi-modal physiological data are respectively inputted into the neural network to extract the temporal features of the multi-modal physiological data.
6. The emotion recognition method based on multi-modal spatiotemporal feature fusion as claimed in claim 1, wherein the temporal features and the spatial features of the multi-modal physiological data extracted by the neural networks are subjected to feature-level fusion to obtain fusion features for the emotion recognition task, and the specific process is as follows: counting the occurrence frequency of each element with the CountSketch algorithm to map the features from a high dimension to a low dimension; and fusing the dimension-reduced features through a bilinear pooling method to obtain the fusion features.
7. The emotion recognition method based on multi-modal spatiotemporal feature fusion as claimed in claim 1, wherein, according to the obtained fusion features, a classification task is performed to obtain a final emotion recognition result, and the specific process is as follows: training an SVM classifier; and inputting the fusion features into a classifier to obtain a final recognition result.
8. An emotion recognition system based on multi-modal spatiotemporal feature fusion, characterized by comprising:
the acquisition module is configured to acquire original physiological data, and preprocess the acquired original physiological data to obtain multi-modal physiological data;
the fusion module is configured to respectively extract spatial characteristics and temporal characteristics of the multi-modal data based on the obtained multi-modal physiological data, and perform feature level fusion on the spatial characteristics and the temporal characteristics of the extracted multi-modal data to obtain fusion features;
and the recognition module is configured to classify according to the obtained fusion characteristics to obtain a result of emotion recognition.
9. A computer-readable storage medium, on which a program is stored, which when executed by a processor performs the steps in the method for emotion recognition based on multi-modal spatiotemporal feature fusion as claimed in any of claims 1-7.
10. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for emotion recognition based on multi-modal spatiotemporal feature fusion as claimed in any of claims 1-7 when executing the program.
CN202210101019.XA 2022-01-27 2022-01-27 Emotion recognition method and system based on multi-mode spatiotemporal feature fusion Pending CN114424940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210101019.XA CN114424940A (en) 2022-01-27 2022-01-27 Emotion recognition method and system based on multi-mode spatiotemporal feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210101019.XA CN114424940A (en) 2022-01-27 2022-01-27 Emotion recognition method and system based on multi-mode spatiotemporal feature fusion

Publications (1)

Publication Number Publication Date
CN114424940A true CN114424940A (en) 2022-05-03

Family

ID=81313284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210101019.XA Pending CN114424940A (en) 2022-01-27 2022-01-27 Emotion recognition method and system based on multi-mode spatiotemporal feature fusion

Country Status (1)

Country Link
CN (1) CN114424940A (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190128933A (en) * 2018-05-09 2019-11-19 연세대학교 산학협력단 Emotion recognition apparatus and method based on spatiotemporal attention
CN109614895A (en) * 2018-10-29 2019-04-12 山东大学 A method of the multi-modal emotion recognition based on attention Fusion Features
CN109508375A (en) * 2018-11-19 2019-03-22 重庆邮电大学 A kind of social affective classification method based on multi-modal fusion
CN109934158A (en) * 2019-03-11 2019-06-25 合肥工业大学 Video feeling recognition methods based on local strengthening motion history figure and recursive convolution neural network
CN111297380A (en) * 2020-02-12 2020-06-19 电子科技大学 Emotion recognition method based on space-time convolution core block
CN111461204A (en) * 2020-03-30 2020-07-28 华南理工大学 Emotion identification method based on electroencephalogram signals and used for game evaluation
CN112120716A (en) * 2020-09-02 2020-12-25 中国人民解放军军事科学院国防科技创新研究院 Wearable multi-mode emotional state monitoring device
CN112381008A (en) * 2020-11-17 2021-02-19 天津大学 Electroencephalogram emotion recognition method based on parallel sequence channel mapping network
CN113057633A (en) * 2021-03-26 2021-07-02 华南理工大学 Multi-modal emotional stress recognition method and device, computer equipment and storage medium
CN113288146A (en) * 2021-05-26 2021-08-24 杭州电子科技大学 Electroencephalogram emotion classification method based on time-space-frequency combined characteristics
CN113705398A (en) * 2021-08-17 2021-11-26 陕西师范大学 Music electroencephalogram space-time characteristic classification method based on convolution-long and short term memory network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程程 (Cheng Cheng): "Research on Emotion Recognition Methods Based on Multimodal Deep Learning", China Master's Theses Full-text Database, 15 August 2021 (2021-08-15), pages 1 - 4 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694234A (en) * 2022-06-02 2022-07-01 杭州智诺科技股份有限公司 Emotion recognition method, system, electronic device and storage medium
CN114694234B (en) * 2022-06-02 2023-02-03 杭州智诺科技股份有限公司 Emotion recognition method, system, electronic device and storage medium
CN114913590A (en) * 2022-07-15 2022-08-16 山东海量信息技术研究院 Data emotion recognition method, device and equipment and readable storage medium
CN115985464A (en) * 2023-03-17 2023-04-18 山东大学齐鲁医院 Muscle fatigue degree classification method and system based on multi-modal data fusion
CN117520826A (en) * 2024-01-03 2024-02-06 武汉纺织大学 Multi-mode emotion recognition method and system based on wearable equipment
CN117520826B (en) * 2024-01-03 2024-04-05 武汉纺织大学 Multi-mode emotion recognition method and system based on wearable equipment

Similar Documents

Publication Publication Date Title
Tao et al. EEG-based emotion recognition via channel-wise attention and self attention
CN111209885B (en) Gesture information processing method and device, electronic equipment and storage medium
CN114424940A (en) Emotion recognition method and system based on multi-mode spatiotemporal feature fusion
CN113693613B (en) Electroencephalogram signal classification method, electroencephalogram signal classification device, computer equipment and storage medium
Majumdar et al. Robust greedy deep dictionary learning for ECG arrhythmia classification
Guo et al. A hybrid fuzzy cognitive map/support vector machine approach for EEG-based emotion classification using compressed sensing
CN110555468A (en) Electroencephalogram signal identification method and system combining recursion graph and CNN
Peng et al. OGSSL: A semi-supervised classification model coupled with optimal graph learning for EEG emotion recognition
Zhao et al. Applying contrast-limited adaptive histogram equalization and integral projection for facial feature enhancement and detection
WO2022012668A1 (en) Training set processing method and apparatus
Jinliang et al. EEG emotion recognition based on granger causality and capsnet neural network
Paul et al. Deep learning and its importance for early signature of neuronal disorders
Miah et al. Movie Oriented Positive Negative Emotion Classification from EEG Signal using Wavelet transformation and Machine learning Approaches
CN114595725B (en) Electroencephalogram signal classification method based on addition network and supervised contrast learning
Tang et al. A hybrid SAE and CNN classifier for motor imagery EEG classification
CN115238796A (en) Motor imagery electroencephalogram signal classification method based on parallel DAMSCN-LSTM
Salim et al. A review on hand gesture and sign language techniques for hearing impaired person
Indira et al. Deep Learning Methods for Data Science
CN112259228A (en) Depression screening method by dynamic attention network non-negative matrix factorization
Bhanumathi et al. Feedback artificial shuffled shepherd optimization-based deep maxout network for human emotion recognition using EEG signals
Kulkarni et al. Analysis of DEAP dataset for emotion recognition
CN116421200A (en) Brain electricity emotion analysis method of multi-task mixed model based on parallel training
WO2022188793A1 (en) Electrophysiological signal classification processing method and apparatus, computer device and storage medium
Zou et al. Multi-task motor imagery EEG classification using broad learning and common spatial pattern
Malathi et al. An estimation of PCA feature extraction in EEG-based emotion prediction with support vector machines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination