CN113297981A - End-to-end electroencephalogram emotion recognition method based on attention mechanism - Google Patents

End-to-end electroencephalogram emotion recognition method based on attention mechanism

Info

Publication number
CN113297981A
CN113297981A (application CN202110584519.9A / CN202110584519A)
Authority
CN
China
Prior art keywords
electroencephalogram
sample
attention
matrix
lead
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110584519.9A
Other languages
Chinese (zh)
Other versions
CN113297981B (en)
Inventor
刘继尧
蒋冬梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202110584519.9A priority Critical patent/CN113297981B/en
Publication of CN113297981A publication Critical patent/CN113297981A/en
Application granted granted Critical
Publication of CN113297981B publication Critical patent/CN113297981B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an end-to-end electroencephalogram (EEG) emotion recognition method based on an attention mechanism. First, the length of an EEG sample, the length of the slices within the sample, and the slice window shift are determined, and the EEG sample is sliced. EEG features are then computed and rearranged according to the distribution of the acquisition device's leads over the head, yielding heat-map-like EEG features. These features are input into an attention-based convolutional neural network to obtain a short-time feature representation of the EEG signal. Taking this short-time representation as input, an LSTM network augmented with an attention mechanism learns the importance of different time periods within the sample for emotion classification and produces a long-term feature representation of the EEG sample. Finally, the long-term representation of the whole sample enters a fully connected layer for emotion classification. The method improves the accuracy of EEG-based emotion classification and provides a reference for how long an EEG pattern persists when the emotion changes.

Description

End-to-end electroencephalogram emotion recognition method based on attention mechanism
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to an electroencephalogram emotion recognition method.
Background
In recent years, automatic emotion recognition has attracted wide attention; in human-computer interaction, for example, a machine can automatically recognize the emotion of an observed person and respond accordingly. Current emotion recognition work falls into two categories. One is discrete emotion recognition, which classifies a person's emotion into a number of common states such as happiness, sadness and anger. The other is continuous emotion recognition, which expresses a person's emotional state in two dimensions: Arousal, representing the degree of excitement, and Valence, representing the degree of pleasure. The two approaches describe emotion from different angles and convey different information, and both are focal points of research in the emotion field.
In the past years, emotion classification from EEG signals has achieved many important results, and much research has shown that models attending simultaneously to the spatial and temporal characteristics of EEG signals outperform models that take global EEG features as input for classifying emotion or mental stress. The paper "EEG-Based Emotion Recognition using 3D Convolutional Neural Networks" (International Journal of Advanced Computer Science and Applications (IJACSA), 9(8), 2018) used a 3D-CNN to recognize human emotion from the multi-channel EEG data of the DEAP dataset, reaching recognition accuracies of 87.44% and 88.49%. The paper "Spatial-Temporal Recurrent Neural Network for Emotion Recognition" (IEEE Transactions on Cybernetics, vol. 49, no. 3, pp. 839-847) proposed a new deep learning framework, the spatial-temporal recurrent neural network (STRNN), which integrates the spatial and temporal information of the signal sources into a unified system and classifies discrete EEG-based emotions using spatiotemporal features.
Researchers have thus achieved notable results with spatiotemporal emotion classification models. Owing to the complexity of emotion and to individual variability, however, continuous-dimension emotion estimation still faces the following challenges:
1) The "key frame" problem. In long-sequence emotion classification tasks, the emotional state at each moment correlates strongly with the most recent moments, correlates even more strongly with the emotional information of certain key moments, and may bear little relation to emotional information from long before. Previous EEG-based emotion classification research treated all past emotional information as equally important when estimating the emotional state at each moment, which makes it difficult for the model to capture key contextual information and harms its generalization ability and accuracy.
2) Sample length selection. Input samples of traditional EEG-based emotion classification models often differ in length. Emotion-related features are generally computed over 1-second or 4-second windows, so when no temporal model is used the sample length is usually set to 1 s or 4 s, whereas in medicine EEG features such as power spectral density are computed over 10 s. There is no unified rule for choosing the sample length and the window length over which emotion-related features are computed, and the classification accuracy obtained varies with the sample length.
3) The model attention problem. Traditional EEG-based emotion classification generally reduces feature dimensionality using global features or methods such as principal component analysis (PCA) and takes the reduced features as model input, or divides the EEG leads into different regions, computes emotion-related EEG features region by region as model input, and finally compares the results. No weights are assigned to different brain regions, frequency bands or time periods, which makes it difficult to mine, from a machine learning perspective, the characteristics of EEG signals and the importance of features at different times and locations.
In conclusion, existing EEG-based emotion classification is constrained by these traditional methods: key spatiotemporal information is hard to discover, which easily leads to low classification accuracy and poor generalization.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an end-to-end electroencephalogram (EEG) emotion recognition method based on an attention mechanism. First, the length of an EEG sample, the length of the slices within the sample, and the slice window shift are determined, and the EEG sample is sliced. EEG features are then computed and rearranged according to the distribution of the acquisition device's leads over the head, yielding heat-map-like EEG features. These features are input into an attention-based convolutional neural network to obtain a short-time feature representation of the EEG signal. Taking this short-time representation as input, an LSTM network augmented with an attention mechanism learns the importance of different time periods within the sample for emotion classification and produces a long-term feature representation of the EEG sample. Finally, the long-term representation of the whole sample enters a fully connected layer for emotion classification. The method improves the accuracy of EEG-based emotion classification and provides a reference for how long an EEG pattern persists when the emotion changes.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: preprocessing an electroencephalogram signal;
setting the length of an electroencephalogram sample, the length of a slice in the electroencephalogram sample and the slice window shift of the electroencephalogram sample, and slicing an electroencephalogram signal to obtain an electroencephalogram sample slice;
for each lead in an EEG sample slice, computing power spectral density and differential entropy features over 5 different frequency bands, and assembling the power spectral density and differential entropy features into a matrix to form the EEG signal features;
step 2: constructing an emotion classification model based on an electroencephalogram signal, wherein the emotion classification model comprises a convolutional neural network based on an attention mechanism and an LSTM network based on the attention mechanism;
step 2-1: a convolutional neural network based on an attention mechanism;
the attention-based convolutional neural network comprises a band attention module and a lead attention module;
step 2-1-1: inputting the EEG signal features into the band attention module: the features are first pooled, the pooled one-dimensional features are taken as the input of a convolutional neural network, and the network output is then passed through a Softmax function; the band attention module generates a band attention weight vector from the feature relations among the frequency bands, as follows:
W_b(E) = σ(f(Pool(E)))
where E denotes the EEG signal features, Pool(·) denotes the pooling operation, f(·) denotes the convolutional neural network, and σ(·) denotes the Softmax function;
step 2-1-2: inputting the band attention weight vector into the lead attention module and generating a lead attention matrix from the spatial relations of the EEG energy, the lead attention matrix being a two-dimensional square matrix;
first, a one-dimensional pooling operation is performed on the band attention weight vector, and the pooled feature values are assembled into a two-dimensional square matrix according to each lead's original position over the brain region; a convolution kernel is then used to convolve this square matrix, with the padding and stride chosen so that the size of the square matrix in the lead dimension remains unchanged; the lead attention matrix is computed as:
W_C(W_b) = σ(filter(Pool(W_b)))
where filter(·) denotes the convolution operation;
step 2-1-3: adding the lead attention matrix to the EEG signal features to obtain the short-time features of the EEG signal:
W_f = E + W_C(W_b(E))
step 2-2: an LSTM network based on an attention mechanism;
a temporal attention module is added to the LSTM network to capture the temporal context of the EEG signal;
define Q_t as the feature vector of the EEG sample at time t, and Q = (Q_1, ..., Q_t, ..., Q_n) as the feature set of a single sample, where n denotes the number of slices in the single sample, i.e., the number of short-time features; define K and V as the key and value mappings of the EEG features, respectively;
Q, K and V are linearly projected into several subspaces; in each subspace, attention weights over the time steps are computed and used to weight the features, yielding the feature vector of the whole sample; finally, this vector is linearly projected once more to obtain the final feature representation; the whole temporal attention module is computed as:
Q = Q·W_Q
K = K·W_K
V = V·W_V
A = softmax(Q·K^T / √d_k)
Attention(Q, K, V) = A·V
where Q is the weighted EEG signal feature W_f; W_Q, W_K and W_V denote the query, key and value projection matrices, respectively; T denotes matrix transposition; and d_k is a scaling constant;
inputting the short-time features of the EEG sample obtained in step 2-1-3 into the attention-based LSTM network to obtain the long-term features of the EEG sample;
step 2-3: a forward propagation module;
the forward propagation module comprises two linear mappings and a nonlinear activation function, and the calculation formula is as follows:
Cls(x) = Softmax((x·W_1 + b_1)·W_2 + b_2)
where x denotes the long-term features of the EEG sample, W_1, W_2, b_1 and b_2 being neural network parameters, the W terms denoting neuron weights and the b terms denoting biases;
inputting the long-term characteristics of the electroencephalogram sample into a forward propagation module, and outputting an emotion classification result of the electroencephalogram signal;
step 3: training the EEG-based emotion classification model; after training, the final emotion classification model is obtained and used to perform emotion judgment on input EEG signals.
Preferably, the step of slicing the electroencephalogram signal is as follows:
setting 3 different sample lengths, dividing the samples with different intra-sample slice lengths and window shifts, and processing and slicing the samples as dynamic sequences; power spectral density and differential entropy features are then computed over 5 different frequency bands for each lead in a sample slice; the feature values of all leads in the same band are placed into a 9 × 9 two-dimensional square matrix according to the distribution of the leads on the EEG acquisition device, and the 5 per-band lead feature matrices are stacked along the band dimension into a three-dimensional 5 × 9 × 9 matrix as the input of the emotion classification model.
The invention has the following beneficial effects:
1. By adding an attention mechanism to the CNN and LSTM models, the invention enables the models to attend more closely to the signal characteristics of particular moments and locations, improving the accuracy of EEG-based emotion classification and providing a reference for the segment length of EEG samples in emotion classification, i.e., the duration for which an EEG pattern persists when the emotion changes.
2. The time periods, leads and frequency bands whose energy fluctuates strongly under emotion-inducing stimuli are mined, which helps simplify the computation of emotion-related EEG features and improves the real-time performance and generalization ability of EEG-based emotion classification.
Drawings
FIG. 1 is a diagram of an emotion classification model structure based on electroencephalogram signals.
FIG. 2 is a block diagram of a band attention module according to the present invention.
Figure 3 is a block diagram of a lead attention module of the present invention.
FIG. 4 is a block diagram of a short-time feature extraction module according to the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention provides an emotion classification model composed of a convolutional neural network (CNN), a long short-term memory network (LSTM), an attention mechanism, and a temporal convolutional network (TCN). The model comprises two sub-networks connected in series: the recombined features of each slice in an EEG sample are first input into the attention-based convolutional neural network to obtain a new weighted, dimension-reduced short-time feature representation of the EEG signal; this short-time representation is then taken as input to an LSTM network augmented with an attention mechanism, which learns the importance of different time periods within the sample for emotion classification and yields a long-term feature representation of the EEG sample. The long-term representation of the whole sample is taken as input and enters a fully connected layer for emotion classification.
As shown in fig. 1, an end-to-end electroencephalogram emotion recognition method based on an attention mechanism includes the following steps:
step 1: preprocessing an electroencephalogram signal;
setting the length of an EEG sample, the length of the slices within the sample, and the slice window shift, and slicing the EEG signal to obtain EEG sample slices; the features are rearranged according to the distribution of the acquisition device's leads over the head, yielding heat-map-like EEG signal features;
setting 3 different sample lengths, dividing the samples with different intra-sample slice lengths and window shifts, and processing and slicing the samples as dynamic sequences; power spectral density and differential entropy features are then computed over 5 different frequency bands for each lead in a sample slice; the feature values of all leads in the same band are placed into a 9 × 9 two-dimensional square matrix according to the distribution of the leads on the EEG acquisition device, and the 5 per-band lead feature matrices are stacked along the band dimension into a three-dimensional 5 × 9 × 9 matrix as the input of the emotion classification model.
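By way of illustration, the following is a minimal sketch of this per-slice feature construction, assuming a 32-lead recording sampled at 128 Hz; the band edges, the Welch settings, and the LEAD_POS lead-to-grid mapping are placeholder assumptions rather than the patent's exact choices, and only the differential-entropy value per band is scattered onto the grid here.

```python
# Minimal sketch of step 1 under assumed parameters (not from the patent):
# 32 leads, 128 Hz sampling, illustrative band edges and 9x9 lead layout.
import numpy as np
from scipy.signal import welch

FS = 128                                                  # sampling rate (assumed)
BANDS = [(1, 4), (4, 8), (8, 14), (14, 31), (31, 50)]     # 5 bands (assumed edges)
# Hypothetical lead -> (row, col) scalp positions; replace with the real montage.
LEAD_POS = [(i // 8 + 1, i % 8) for i in range(32)]

def slice_features(x):
    """x: (32, n_samples) EEG slice -> (5, 9, 9) per-band differential-entropy maps."""
    freqs, psd = welch(x, fs=FS, nperseg=FS)              # per-lead power spectral density
    maps = np.zeros((5, 9, 9))
    for b, (lo, hi) in enumerate(BANDS):
        power = psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
        de = 0.5 * np.log(2 * np.pi * np.e * power)       # differential entropy (Gaussian assumption)
        for lead, (r, c) in enumerate(LEAD_POS):
            maps[b, r, c] = de[lead]                      # scatter into the 9x9 lead grid
    return maps                                           # stacked 5 x 9 x 9 model input

E = slice_features(np.random.randn(32, FS))               # one 1-second slice, for example
```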
Step 2: constructing an emotion classification model based on an electroencephalogram signal, wherein the emotion classification model comprises a convolutional neural network based on an attention mechanism and an LSTM network based on the attention mechanism;
step 2-1: a convolutional neural network based on an attention mechanism;
the attention-based convolutional neural network comprises a band attention module and a lead attention module;
step 2-1-1: as shown in FIG. 2, the EEG signal features are input into the band attention module, which compresses the spatial dimension of the input features: the features are pooled, the pooled one-dimensional features are taken as the input of a convolutional neural network, and the network output is passed through a Softmax function; the band attention module generates a band attention weight vector from the feature relations among the frequency bands, as follows:
W_b(E) = σ(f(Pool(E)))
where E denotes the EEG signal features, Pool(·) denotes the pooling operation, f(·) denotes the convolutional neural network, and σ(·) denotes the Softmax function;
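A minimal PyTorch sketch of this band attention module follows, taking the 5 × 9 × 9 features above as input; the average pooling, the 1-D convolution kernel size and the channel counts are illustrative assumptions, not values given by the patent.

```python
import torch
import torch.nn as nn

class BandAttention(nn.Module):
    """W_b(E) = softmax(f(Pool(E))) over the 5 frequency bands."""
    def __init__(self):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # Pool(.): squeeze each 9x9 map
        self.f = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # f(.): conv across the band axis
        self.sigma = nn.Softmax(dim=-1)                     # sigma(.): Softmax

    def forward(self, e):                                   # e: (batch, 5, 9, 9)
        v = self.pool(e).flatten(1)                         # (batch, 5) band descriptors
        v = self.f(v.unsqueeze(1)).squeeze(1)               # cross-band feature relations
        return self.sigma(v)                                # band attention weights W_b

w_b = BandAttention()(torch.randn(8, 5, 9, 9))              # -> (8, 5)
```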
step 2-1-2: as shown in FIG. 3, the band attention weight vector is input into the lead attention module, and a lead attention matrix is generated from the spatial relations of the EEG energy, the lead attention matrix being a two-dimensional square matrix;
first, a one-dimensional pooling operation is performed on the band attention weight vector, and the pooled feature values are assembled into a two-dimensional square matrix according to each lead's original position over the brain region; a convolution kernel is then used to convolve this square matrix, with the padding and stride chosen so that the size of the square matrix in the lead dimension remains unchanged; the lead attention matrix is computed as:
W_C(W_b) = σ(filter(Pool(W_b)))
where filter(·) denotes the convolution operation;
step 2-1-3: as shown in FIG. 4, the lead attention matrix is added to the EEG signal features to obtain the short-time features of the EEG signal:
W_f = E + W_C(W_b(E))
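Continuing the sketch, the lead attention module and the residual combination of step 2-1-3 might look as follows; the reading that the band weights are first applied to the features and then pooled across the band axis before being scattered back over the 9 × 9 lead grid is our interpretation of the translated text, and the 3 × 3 kernel with padding 1 and stride 1 is one choice that keeps the lead matrix size unchanged.

```python
import torch
import torch.nn as nn

class LeadAttention(nn.Module):
    """W_C(W_b) = softmax(filter(Pool(W_b))) over the 9x9 lead grid."""
    def __init__(self):
        super().__init__()
        # kernel 3, padding 1, stride 1 keeps the 9x9 lead matrix size unchanged
        self.filter = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.sigma = nn.Softmax(dim=-1)

    def forward(self, e, w_b):                          # e: (B, 5, 9, 9), w_b: (B, 5)
        weighted = e * w_b[:, :, None, None]            # apply the band attention weights
        lead_map = weighted.mean(dim=1, keepdim=True)   # Pool(.): pool across bands -> (B, 1, 9, 9)
        a = self.filter(lead_map)                       # filter(.): spatial convolution
        a = self.sigma(a.flatten(1)).view_as(lead_map)  # lead attention matrix W_C
        return e + a                                    # step 2-1-3: W_f = E + W_C(W_b(E))

e = torch.randn(8, 5, 9, 9)
w_f = LeadAttention()(e, torch.softmax(torch.randn(8, 5), dim=-1))   # -> (8, 5, 9, 9)
```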
step 2-2: an LSTM network based on an attention mechanism;
Within a sample of limited length, different short-time stimulation segments influence emotion differently; different video scenes and different musical passages, for example, evoke different feelings in people. To quantify the influence of stimulation along the time dimension, a temporal attention module is added to the LSTM network to capture the temporal context of the EEG signal;
define Q_t as the feature vector of the EEG sample at time t, and Q = (Q_1, ..., Q_t, ..., Q_n) as the feature set of a single sample, where n denotes the number of slices in the single sample, i.e., the number of short-time features; define K and V as the key and value mappings of the EEG features, respectively;
Q, K and V are linearly projected into several subspaces; in each subspace, attention weights over the time steps are computed and used to weight the features, yielding the feature vector of the whole sample; finally, this vector is linearly projected once more to obtain the final feature representation; the whole temporal attention module is computed as:
Q = Q·W_Q
K = K·W_K
V = V·W_V
A = softmax(Q·K^T / √d_k)
Attention(Q, K, V) = A·V
where Q is the weighted EEG signal feature W_f; W_Q, W_K and W_V denote the query, key and value projection matrices, respectively; T denotes matrix transposition; and d_k is a scaling constant;
inputting the short-time features of the EEG sample obtained in step 2-1-3 into the attention-based LSTM network to obtain the long-term features of the EEG sample;
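A minimal sketch of step 2-2 follows: an LSTM over the per-slice short-time features, followed by multi-head self-attention over the time steps, which realizes the per-subspace Q, K, V projections and the softmax(Q·K^T / √d_k)·V weighting above; the input width 405 (a flattened 5 × 9 × 9 feature), the hidden size 128, the 4 heads, and the mean over time steps are assumed values, not the patent's.

```python
import torch
import torch.nn as nn

class TemporalAttentionLSTM(nn.Module):
    def __init__(self, d_in=405, d_model=128, n_heads=4):
        super().__init__()
        self.lstm = nn.LSTM(d_in, d_model, batch_first=True)
        # multi-head attention: per-subspace Q, K, V projections + scaled dot product
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(d_model, d_model)           # final linear projection

    def forward(self, x):                                 # x: (batch, n_slices, d_in)
        h, _ = self.lstm(x)                               # temporal context per slice
        a, _ = self.attn(h, h, h)                         # attention over the time steps
        return self.proj(a.mean(dim=1))                   # long-term sample features

long_feats = TemporalAttentionLSTM()(torch.randn(8, 10, 405))   # -> (8, 128)
```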
step 2-3: a forward propagation module;
the forward propagation module comprises two linear mappings and a nonlinear activation function, and the calculation formula is as follows:
Cls(x) = Softmax((x·W_1 + b_1)·W_2 + b_2)
where x denotes the long-term features of the EEG sample, W_1, W_2, b_1 and b_2 being neural network parameters, the W terms denoting neuron weights and the b terms denoting biases;
inputting the long-term characteristics of the electroencephalogram sample into a forward propagation module, and outputting an emotion classification result of the electroencephalogram signal;
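Step 2-3 might be sketched as below: two linear mappings terminated by Softmax, matching Cls(x) above and reading the Softmax as the nonlinear activation the text mentions; the hidden width of 64 and the 3 output classes (e.g., the positive/neutral/negative case of the embodiment) are assumptions, and changing n_classes adapts the head to other label sets.

```python
import torch
import torch.nn as nn

class ForwardModule(nn.Module):
    """Cls(x) = Softmax((x W1 + b1) W2 + b2): two linear mappings, Softmax output."""
    def __init__(self, d_model=128, n_hidden=64, n_classes=3):
        super().__init__()
        self.fc1 = nn.Linear(d_model, n_hidden)          # first linear mapping (W1, b1)
        self.fc2 = nn.Linear(n_hidden, n_classes)        # second linear mapping (W2, b2)

    def forward(self, x):                                # x: long-term sample features
        return torch.softmax(self.fc2(self.fc1(x)), dim=-1)

probs = ForwardModule()(torch.randn(8, 128))             # emotion class probabilities
```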
step 3: training the EEG-based emotion classification model; after training, the final emotion classification model is obtained and used to perform emotion judgment on input EEG signals.
The specific embodiment is as follows:
This embodiment designs an EEG lead-band attention module, inserts it into a CNN encoder, and combines it with an attention-based LSTM model, extending the encoder into a model that can simultaneously capture temporal context dependencies for multi-modal features; an emotion classification framework is implemented with this model. Experiments show that the classification accuracy reaches 0.91 on the Arousal emotion dimension and 0.89 on the Valence emotion dimension, and that the accuracy of classifying the discrete positive, neutral and negative emotions reaches 0.92.
The invention designs an emotion classification model based on a Convolutional Neural Network (CNN), a long-short term memory network (LSTM) and an Attention Mechanism (Attention Mechanism):
1) According to the distribution of the EEG leads on the acquisition device, a new EEG feature organization is designed and used as input; this organization better matches the spatiotemporal characteristics of the EEG signals collected by the acquisition device. On this new feature organization, the influence of the sample duration, the intra-sample slice length, and the slice time window shift on emotion classification is explored.
2) A band-channel attention module is proposed on top of the new EEG feature organization to learn the importance of features from different bands of different leads for emotion classification; combined with an attention mechanism along the time dimension, it jointly learns the relationship between the stimulation content and the changes in the subject's EEG signals under emotional stimulation. At each moment in an EEG sample, the picture-like feature input automatically learns the importance of each feature point for emotion classification within the time slice. The model then learns the interaction information between different moments, and each moment's importance for emotion classification, through the attention mechanism. Finally, the importance-weighted sum is taken as the input of a fully connected layer to obtain the emotion class probability values.
3) A complete set of EEG-based emotion classification models is provided; by changing the number of nodes in the last layer, the models adapt to both continuous-dimension and discrete emotion classification. The model contains two sub-networks in total: first, a band-channel feature learning network uses CNN and attention to extract a short-time feature representation of the EEG signal, which serves as the input of the long-sequence network model; second, a long-sequence EEG emotion prediction network is trained, using LSTM and attention, to derive from the input short-time EEG features a high-level feature representation fused with the EEG temporal context, and to classify the emotional state.

Claims (2)

1. An end-to-end electroencephalogram emotion recognition method based on an attention mechanism is characterized by comprising the following steps:
step 1: preprocessing an electroencephalogram signal;
setting the length of an electroencephalogram sample, the length of a slice in the electroencephalogram sample and the slice window shift of the electroencephalogram sample, and slicing an electroencephalogram signal to obtain an electroencephalogram sample slice;
for each lead in an electroencephalogram (EEG) sample slice, computing power spectral density and differential entropy features over 5 different frequency bands, and assembling the power spectral density and differential entropy features into a matrix to form the EEG signal features;
step 2: constructing an emotion classification model based on an electroencephalogram signal, wherein the emotion classification model comprises a convolutional neural network based on an attention mechanism and an LSTM network based on the attention mechanism;
step 2-1: a convolutional neural network based on an attention mechanism;
the attention-based convolutional neural network comprises a band attention module and a lead attention module;
step 2-1-1: inputting the EEG signal features into the band attention module: the features are first pooled, the pooled one-dimensional features are taken as the input of a convolutional neural network, and the network output is then passed through a Softmax function; the band attention module generates a band attention weight vector from the feature relations among the frequency bands, as follows:
W_b(E) = σ(f(Pool(E)))
where E denotes the EEG signal features, Pool(·) denotes the pooling operation, f(·) denotes the convolutional neural network, and σ(·) denotes the Softmax function;
step 2-1-2: inputting the band attention weight vector into the lead attention module and generating a lead attention matrix from the spatial relations of the EEG energy, the lead attention matrix being a two-dimensional square matrix;
first, a one-dimensional pooling operation is performed on the band attention weight vector, and the pooled feature values are assembled into a two-dimensional square matrix according to each lead's original position over the brain region; a convolution kernel is then used to convolve this square matrix, with the padding and stride chosen so that the size of the square matrix in the lead dimension remains unchanged; the lead attention matrix is computed as:
W_C(W_b) = σ(filter(Pool(W_b)))
where filter(·) denotes the convolution operation;
step 2-1-3: adding the lead attention matrix to the EEG signal features to obtain the short-time features of the EEG signal:
W_f = E + W_C(W_b(E))
step 2-2: an LSTM network based on an attention mechanism;
a temporal attention module is added to the LSTM network to capture the temporal context of the EEG signal;
define Q_t as the feature vector of the EEG sample at time t, and Q = (Q_1, ..., Q_t, ..., Q_n) as the feature set of a single sample, where n denotes the number of slices in the single sample, i.e., the number of short-time features; define K and V as the key and value mappings of the EEG features, respectively;
Q, K and V are linearly projected into several subspaces; in each subspace, attention weights over the time steps are computed and used to weight the features, yielding the feature vector of the whole sample; finally, this vector is linearly projected once more to obtain the final feature representation; the whole temporal attention module is computed as:
Q = Q·W_Q
K = K·W_K
V = V·W_V
A = softmax(Q·K^T / √d_k)
Attention(Q, K, V) = A·V
where Q is the weighted EEG signal feature W_f; W_Q, W_K and W_V denote the query, key and value projection matrices, respectively; T denotes matrix transposition; and d_k is a scaling constant;
inputting the short-time features of the EEG sample obtained in step 2-1-3 into the attention-based LSTM network to obtain the long-term features of the EEG sample;
step 2-3: a forward propagation module;
the forward propagation module comprises two linear mappings and a nonlinear activation function, and the calculation formula is as follows:
Cls(x) = Softmax((x·W_1 + b_1)·W_2 + b_2)
where x denotes the long-term features of the EEG sample, W_1, W_2, b_1 and b_2 being neural network parameters, the W terms denoting neuron weights and the b terms denoting biases;
inputting the long-term characteristics of the electroencephalogram sample into a forward propagation module, and outputting an emotion classification result of the electroencephalogram signal;
step 3: training the EEG-based emotion classification model; after training, the final emotion classification model is obtained and used to perform emotion judgment on input EEG signals.
2. The attention mechanism-based end-to-end electroencephalogram emotion recognition method as recited in claim 1, wherein the step of slicing the electroencephalogram signal is as follows:
setting 3 different sample lengths, dividing the samples with different intra-sample slice lengths and window shifts, and processing and slicing the samples as dynamic sequences; power spectral density and differential entropy features are then computed over 5 different frequency bands for each lead in a sample slice; the feature values of all leads in the same band are placed into a 9 × 9 two-dimensional square matrix according to the distribution of the leads on the EEG acquisition device, and the 5 per-band lead feature matrices are stacked along the band dimension into a three-dimensional 5 × 9 × 9 matrix as the input of the emotion classification model.
CN202110584519.9A 2021-05-27 2021-05-27 End-to-end electroencephalogram emotion recognition method based on attention mechanism Active CN113297981B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110584519.9A CN113297981B (en) 2021-05-27 2021-05-27 End-to-end electroencephalogram emotion recognition method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110584519.9A CN113297981B (en) 2021-05-27 2021-05-27 End-to-end electroencephalogram emotion recognition method based on attention mechanism

Publications (2)

Publication Number Publication Date
CN113297981A true CN113297981A (en) 2021-08-24
CN113297981B CN113297981B (en) 2023-04-07

Family

ID=77325577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110584519.9A Active CN113297981B (en) 2021-05-27 2021-05-27 End-to-end electroencephalogram emotion recognition method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN113297981B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113616209A (en) * 2021-08-25 2021-11-09 西南石油大学 Schizophrenia patient discrimination method based on space-time attention mechanism
CN114224342A (en) * 2021-12-06 2022-03-25 南京航空航天大学 Multi-channel electroencephalogram emotion recognition method based on space-time fusion feature network

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109770925A (en) * 2019-02-03 2019-05-21 闽江学院 A kind of fatigue detection method based on depth time-space network
CN109793528A (en) * 2019-01-28 2019-05-24 华南理工大学 A kind of mood classification method based on dynamic brain function network
KR20190130808A (en) * 2018-05-15 2019-11-25 연세대학교 산학협력단 Emotion Classification Device and Method using Convergence of Features of EEG and Face
CN110515456A (en) * 2019-08-14 2019-11-29 东南大学 EEG signals emotion method of discrimination and device based on attention mechanism
CN110610168A (en) * 2019-09-20 2019-12-24 合肥工业大学 Electroencephalogram emotion recognition method based on attention mechanism
CN110897648A (en) * 2019-12-16 2020-03-24 南京医科大学 Emotion recognition classification method based on electroencephalogram signal and LSTM neural network model
CN111222464A (en) * 2020-01-07 2020-06-02 中国医学科学院生物医学工程研究所 Emotion analysis method and system
CN111259761A (en) * 2020-01-13 2020-06-09 东南大学 Electroencephalogram emotion recognition method and device based on migratable attention neural network
CN111353390A (en) * 2020-01-17 2020-06-30 道和安邦(天津)安防科技有限公司 Micro-expression recognition method based on deep learning
CN111709267A (en) * 2020-03-27 2020-09-25 吉林大学 Electroencephalogram signal emotion recognition method of deep convolutional neural network
CN112101152A (en) * 2020-09-01 2020-12-18 西安电子科技大学 Electroencephalogram emotion recognition method and system, computer equipment and wearable equipment
CN112244873A (en) * 2020-09-29 2021-01-22 陕西科技大学 Electroencephalogram time-space feature learning and emotion classification method based on hybrid neural network
CN112800998A (en) * 2021-02-05 2021-05-14 南京邮电大学 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190130808A (en) * 2018-05-15 2019-11-25 연세대학교 산학협력단 Emotion Classification Device and Method using Convergence of Features of EEG and Face
CN109793528A (en) * 2019-01-28 2019-05-24 华南理工大学 A kind of mood classification method based on dynamic brain function network
CN109770925A (en) * 2019-02-03 2019-05-21 闽江学院 A kind of fatigue detection method based on depth time-space network
CN110515456A (en) * 2019-08-14 2019-11-29 东南大学 EEG signals emotion method of discrimination and device based on attention mechanism
CN110610168A (en) * 2019-09-20 2019-12-24 合肥工业大学 Electroencephalogram emotion recognition method based on attention mechanism
CN110897648A (en) * 2019-12-16 2020-03-24 南京医科大学 Emotion recognition classification method based on electroencephalogram signal and LSTM neural network model
CN111222464A (en) * 2020-01-07 2020-06-02 中国医学科学院生物医学工程研究所 Emotion analysis method and system
CN111259761A (en) * 2020-01-13 2020-06-09 东南大学 Electroencephalogram emotion recognition method and device based on migratable attention neural network
CN111353390A (en) * 2020-01-17 2020-06-30 道和安邦(天津)安防科技有限公司 Micro-expression recognition method based on deep learning
CN111709267A (en) * 2020-03-27 2020-09-25 吉林大学 Electroencephalogram signal emotion recognition method of deep convolutional neural network
CN112101152A (en) * 2020-09-01 2020-12-18 西安电子科技大学 Electroencephalogram emotion recognition method and system, computer equipment and wearable equipment
CN112244873A (en) * 2020-09-29 2021-01-22 陕西科技大学 Electroencephalogram time-space feature learning and emotion classification method based on hybrid neural network
CN112800998A (en) * 2021-02-05 2021-05-14 南京邮电大学 Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ANUBHAV et al.: "An Efficient Approach to EEG-Based Emotion Recognition using LSTM Network", CSPA 2020 *
TONG ZHANG et al.: "Spatial–Temporal Recurrent Neural Network for Emotion Recognition", IEEE Transactions on Cybernetics *
XIAOBIN ZHENG: "Research on Video Emotion Recognition Based on Attention Mechanism LSTM Model", NCCE 2018 *
TANG Yuhao et al.: "Dimensional emotion recognition method based on a hierarchical attention mechanism" (基于层次注意力机制的维度情感识别方法), Computer Engineering (计算机工程) *
TAO Wei: "Research on EEG emotion recognition methods based on attention mechanisms" (基于注意力机制的脑电情绪识别方法研究), China Master's Theses Full-text Database, Medicine and Health Sciences (monthly) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113616209A (en) * 2021-08-25 2021-11-09 西南石油大学 Schizophrenia patient discrimination method based on space-time attention mechanism
CN113616209B (en) * 2021-08-25 2023-08-04 西南石油大学 Method for screening schizophrenic patients based on space-time attention mechanism
CN114224342A (en) * 2021-12-06 2022-03-25 南京航空航天大学 Multi-channel electroencephalogram emotion recognition method based on space-time fusion feature network
CN114224342B (en) * 2021-12-06 2023-12-15 南京航空航天大学 Multichannel electroencephalogram signal emotion recognition method based on space-time fusion feature network

Also Published As

Publication number Publication date
CN113297981B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN112244873A (en) Electroencephalogram time-space feature learning and emotion classification method based on hybrid neural network
Kollias et al. Training deep neural networks with different datasets in-the-wild: The emotion recognition paradigm
CN113297981B (en) End-to-end electroencephalogram emotion recognition method based on attention mechanism
Asadi et al. A convolution recurrent autoencoder for spatio-temporal missing data imputation
CN114052735B (en) Deep field self-adaption-based electroencephalogram emotion recognition method and system
CN113749657B (en) Brain electricity emotion recognition method based on multi-task capsule
CN110135244B (en) Expression recognition method based on brain-computer collaborative intelligence
CN111079658A (en) Video-based multi-target continuous behavior analysis method, system and device
CN113288146A (en) Electroencephalogram emotion classification method based on time-space-frequency combined characteristics
Lu et al. Multi-object detection method based on YOLO and ResNet hybrid networks
Shao et al. EEG-based emotion recognition with deep convolution neural network
CN112507893A (en) Distributed unsupervised pedestrian re-identification method based on edge calculation
CN114578967A (en) Emotion recognition method and system based on electroencephalogram signals
CN114488069A (en) Radar high-resolution range profile identification method based on graph neural network
CN107045624A (en) A kind of EEG signals pretreatment rolled into a ball based on maximum weighted and sorting technique
CN113558644A (en) Emotion classification method, medium and equipment for 3D matrix and multidimensional convolution network
CN116956222A (en) Multi-complexity behavior recognition system and method based on self-adaptive feature extraction
CN115801152B (en) WiFi action recognition method based on hierarchical transformer model
Uddin et al. A convolutional neural network for real-time face detection and emotion & gender classification
CN116421200A (en) Brain electricity emotion analysis method of multi-task mixed model based on parallel training
Afrin et al. AI based facial expression recognition for autism children
Jayakumar et al. CNN based Music Recommendation system based on Age, Gender and Emotion
CN114742107A (en) Method for identifying perception signal in information service and related equipment
Jamal et al. Cloud-Based Human Emotion Classification Model from EEG Signals
Jyoti et al. A single hierarchical network for face, action unit and emotion detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant