CN114176607B - Electroencephalogram signal classification method based on vision Transformer - Google Patents
Electroencephalogram signal classification method based on vision Transformer
- Publication number
- CN114176607B (application CN202111616915.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- eeg
- num
- module
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- A61B5/369 — Electroencephalography [EEG]
- A61B5/165 — Evaluating the state of mind, e.g. depression, anxiety
- A61B5/168 — Evaluating attention deficit, hyperactivity
- A61B5/18 — Devices for psychotechnics; evaluating the psychological state of vehicle drivers or machine operators
- A61B5/374 — Detecting the frequency distribution of signals, e.g. detecting delta, theta, alpha, beta or gamma waves
- A61B5/7203 — Signal processing for noise prevention, reduction or removal
- A61B5/7235 — Details of waveform analysis
- A61B5/725 — Waveform analysis using specific filters, e.g. Kalman or adaptive filters
- A61B5/7267 — Classification of physiological signals or data involving training the classification device
- G06F18/2415 — Classification techniques based on parametric or probabilistic models
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- A61B2503/22 — Motor vehicle operators, e.g. drivers, pilots, captains
Abstract
The invention discloses an electroencephalogram (EEG) signal classification method based on a vision Transformer. The method first preprocesses the data to obtain labeled, processed EEG data; then constructs an EEG signal classification model based on Vision Transformer; and finally trains the EEG signal classification model with the preprocessed EEG data. The method embeds EEG samples through a feature embedding scheme suited to EEG data, then learns both the local features of each EEG sample and the long-range temporal dependencies between consecutive EEG signals, achieving good performance on EEG signal classification tasks.
Description
Technical Field
The invention relates to the field of electroencephalogram signal recognition within biometric recognition, and in particular to an electroencephalogram signal classification method based on the vision Transformer (Vision Transformer, ViT).
Background
Electroencephalography (EEG) records brain activity with electrophysiological measurements: the signal is formed by summing the postsynaptic potentials generated synchronously by large populations of neurons when the brain is active. It captures the electrical changes during brain activity and is an overall reflection of the electrophysiological activity of brain nerve cells at the cerebral cortex or scalp surface. EEG signals carry rich, varied, and objective physiological information, and are widely studied and analyzed in brain-computer interfaces (Brain Computer Interface, BCI) to establish a direct connection and information exchange between a human or animal brain and external devices. Because EEG is more objective and convenient than other physiological measurements such as nuclear magnetic resonance, it is also used to judge physiological states, for example analyzing a person's emotion from the EEG signal, or judging whether a driver is fatigued in fatigue-driving applications.
Traditional EEG analysis methods operate in the time domain or the frequency domain. Common time-domain methods include waveform characterization and autoregressive (AR) models. Common frequency-domain methods include the Fourier transform, power spectral density, non-parametric spectrum estimation, and AR-model-based power spectrum estimation. These traditional methods require manual feature extraction from the EEG signal, which is time-consuming and labor-intensive. With the development of artificial intelligence, more and more learning-based methods are applied to EEG analysis: classical machine learning methods such as KNN, SVM, and LDA, and deep learning methods such as CNN, RNN, and LSTM. Machine learning and deep learning methods automatically extract and learn features from EEG signals and have greatly advanced EEG analysis. Although these methods have met with some success, they do not fully exploit the distinctive characteristics of EEG signals.
The EEG signal is a time-series signal with long-range dependencies. The Transformer model proposed in the paper Attention Is All You Need has been extremely successful in the NLP field; it can learn long-range dependencies within sentences, but applying it directly to EEG classification does not achieve ideal results. The paper An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale proposes the ViT model for the image domain, where it has been extremely successful, and an EEG signal can in a certain sense be viewed as a special image. However, using ViT directly for EEG classification also fails to achieve ideal results, because it has no module for learning local features, which are important for EEG signals.
Disclosure of Invention
To address the poor performance of applying the Transformer to EEG signal classification and its inability to learn local features, the invention provides an EEG signal classification method based on the vision Transformer, EEG Vision Transformer (EEGViT), which learns local features through an EEG Transformer Encoder module and learns the temporal dependencies between consecutive samples through a Sequence In Time Transformer Encoder module, thereby improving classification performance.
To overcome the shortcomings of existing methods, the invention adopts the following technical scheme:
An electroencephalogram signal classification method based on the vision Transformer comprises the following steps:
Step 1: data preprocessing, namely acquiring labeled processed EEG data:
Step 2: establishing an electroencephalogram signal classification model based on Vision Transformer;
step 3: training an electroencephalogram classification model by the preprocessed EEG data:
Further, the specific method of step 1 is as follows:
The public SEED emotion dataset is adopted. The preprocessed EEG data provided in the SEED dataset is segmented into 1-second windows, and differential entropy features are extracted from each EEG channel of the segmented data, yielding differential entropy feature data of 62 channels x 5 frequency bands. The data is finally flattened into a one-dimensional sample of length 310, whose label is the label of the corresponding stimulus emotion. Each group of num consecutive samples forms one input to the model.
Further, the specific method of step 2 is as follows:
The Vision Transformer-based EEG classification model (EEG Vision Transformer) improves on Vision Transformer: a CNN module is added after the multi-head attention in the Transformer Encoder of Vision Transformer to form the EEG Transformer Encoder module, and an MBConv module is added after the multi-head attention in the Transformer Encoder to form the Sequence In Time Transformer Encoder module. The classification model contains num EEG Transformer Encoder modules, which learn the local features of num consecutive samples; the learned local features of the num samples are input to the Sequence In Time Transformer Encoder module to learn the temporal dependencies between the samples, yielding num tokens with temporal dependencies. Finally, the num tokens are input to num classifiers for classification, where each classifier is an MLP module.
Further, the specific method of step 3 is as follows:
Input: the labeled, processed EEG data samples, where each group of num consecutive samples forms one input to the model (total_group groups in total); the maximum iteration number N; the EEG sample size EEG_size; the patch size patch_size; and the embedding dimension embed_dim.
Step 3.1: initializing:
The num EEG Transformer Encoder modules, the num classifiers, and one Sequence In Time Transformer Encoder module are initialized, and the initial iteration number is set to t=1.
Step 3.2: the local features of the individual samples are learned by EEG Transformer Encoder module.
Step 3.3: the time sequence dependence among samples is learned by Sequence In Time Transformer Encoder modules.
Step 3.4: the learned features are classified by a classifier.
Step 3.5: Calculate the cross entropy from the classification result and the true label, and update the model parameters by minimizing the cross-entropy loss.
Step 3.6: steps 3.2 to 3.5 are performed once for each set of data entered;
Step 3.7: after all input data are processed, the iteration times t=t+1 return to the step 3.2 for iteration until the set iteration times are reached;
Output: the predicted probability of each electroencephalogram signal class.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention provides a feature embedding method suitable for EEG samples.
(2) The invention can learn the local characteristics of the input data through the EEG Transformer Encoder module.
(3) According to the invention, the time dependency relationship between the continuous samples is learned through the Sequence In Time Transformer Encoder module, so that better classification performance is obtained.
In summary, the EEG Vision Transformer (EEGViT) provided by the invention embeds EEG samples through a feature embedding method suited to EEG data, then learns the local features of each EEG sample and the long-range temporal dependencies between consecutive EEG signals, obtaining better performance on EEG signal classification tasks.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a block diagram of EEG Vision Transformer modules;
Fig. 3 is a block diagram of Sequence In Time Transformer Encoder modules.
Detailed Description
The invention is further described below with reference to the drawings and examples.
The flow of the present invention is shown in figure 1.
Step 1: data preprocessing, namely acquiring labeled processed EEG data:
The public SEED emotion dataset is adopted; its Preprocessed_EEG folder contains EEG data downsampled to 200 Hz and preprocessed with a 0-75 Hz band-pass filter. As shown in the data-processing part of Fig. 1, the preprocessed EEG data provided in the SEED dataset is segmented into 1-second windows, and differential entropy features are extracted from each EEG channel of the segmented data, where the differential entropy is defined as follows:
DE(X) = −∫ f(x) log f(x) dx = (1/2) log(2πeσ²), with f(x) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²))
Where X follows a Gaussian distribution N(μ, σ²), μ is the mean of the distribution, σ is the standard deviation of the distribution, x is the variable, π and e are constants, exp is the exponential operation, and log is the logarithmic operation.
After differential entropy feature extraction, differential entropy feature data of 62 channels x 5 frequency bands is obtained, which is finally flattened into a one-dimensional sample of length 310; the label of the sample is the label of the corresponding stimulus emotion. Each group of num consecutive samples forms one input to the model.
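As a rough illustration of this preprocessing step, the sketch below computes per-channel, per-band differential entropy under the Gaussian assumption and flattens the result into a length-310 sample. The random input array stands in for one second of band-filtered SEED data; its shape (62 channels x 5 bands x 200 samples at 200 Hz) follows the description above, but the data itself is synthetic.

```python
import numpy as np

def differential_entropy(segment):
    """Differential entropy of a 1-D segment, assuming the band-filtered
    samples are approximately Gaussian: DE = 0.5 * log(2 * pi * e * sigma^2)."""
    sigma2 = np.var(segment)
    return 0.5 * np.log(2 * np.pi * np.e * sigma2)

# Synthetic stand-in for one 1-second window: 62 channels x 5 bands x 200 samples.
rng = np.random.default_rng(0)
segment = rng.normal(size=(62, 5, 200))

# One DE value per (channel, band) -> 62 x 5 features,
# flattened into the one-dimensional length-310 sample described above.
de = np.apply_along_axis(differential_entropy, -1, segment)
sample = de.reshape(-1)
print(sample.shape)  # (310,)
```

In the actual method each of these length-310 samples would carry the emotion label of its stimulus, and num consecutive samples would be grouped as one model input.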
Step 2: establishing an electroencephalogram signal classification model based on Vision Transformer;
The Vision Transformer-based EEG classification model EEG Vision Transformer is composed of num EEG Transformer Encoder modules, one Sequence In Time Transformer Encoder module, and num classifiers. The EEG Transformer Encoder is formed by adding a CNN module after the multi-head attention in the Transformer Encoder of Vision Transformer; its structure is shown in Fig. 2. The Embedded Patches are the input of the module: they are first normalized and then fed into the multi-head self-attention module. A shortcut branch starts after the Embedded Patches, so the Embedded Patches are added to the output of the multi-head attention module before being sent on to the following modules: a norm module, the CNN module, another norm module, and an MLP module. Because the input sample is one-dimensional, the CNN module uses one-dimensional convolution. A second shortcut branch connects the data produced by that first residual addition to the final output: it is added to the MLP output to form the final output of the module. The Sequence In Time Transformer Encoder is formed by adding an MBConv module after the multi-head attention in the Transformer Encoder of Vision Transformer; the MBConv module structure is introduced in the paper EfficientNetV2: Smaller Models and Faster Training. The Sequence In Time Transformer Encoder structure is shown in Fig. 3; it is similar to the EEG Transformer Encoder module, with the CNN module replaced by the MBConv module. The classifier is an MLP module.
Step 3: training an electroencephalogram classification model by the preprocessed EEG data:
Input: the labeled processed EEG data samples, the consecutive num samples are 1 set of input models, total_group, maximum iteration number N, EEG sample size EEG_size, patch block size (patch_size needs to be divisible by EEG_size), and embedded dimension size emped_dim.
Step 3.1: initializing: randomly initializing num EEG Transformer Encoder modules, sequence In Time Transformer Encoder modules, hyper-parameters of num classifiers and initial iteration times t=1.
Step 3.2: learning local features of a single sample;
The num input samples learn the local features of the individual samples through the num EEG Transformer Encoder modules; the specific steps of each EEG Transformer Encoder are as follows:
a) Feature segmentation: unlike the two-dimensional image input of ViT, each sample of EEGViT is one-dimensional feature data of length 310 (5 frequency bands x 62 channels); the input one-dimensional feature is segmented into EEG_size/patch_size one-dimensional data segments of size patch_size.
b) Feature embedding: the segmented data segments are projected to the embedding dimension embed_dim, and a token for local feature learning is added, yielding the Embedded Patches. The processed Embedded Patches are input to the EEG Transformer Encoder to learn local features.
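The segmentation and embedding steps above can be sketched as follows. The concrete values patch_size = 10 and embed_dim = 32 are assumptions for illustration (the patent fixes only EEG_size = 310 and requires patch_size to divide it), and the learned projection is replaced by a random matrix:

```python
import numpy as np

# Assumed sizes: eeg_size = 310 (from the patent); patch_size = 10 and
# embed_dim = 32 are illustrative choices, not values from the patent.
eeg_size, patch_size, embed_dim = 310, 10, 32

rng = np.random.default_rng(0)
x = rng.normal(size=eeg_size)                            # one flattened EEG sample

# a) Feature segmentation: eeg_size / patch_size = 31 one-dimensional segments.
patches = x.reshape(eeg_size // patch_size, patch_size)  # (31, 10)

# b) Feature embedding: project each segment to embed_dim and prepend
#    an extra token used for local feature learning.
W = rng.normal(size=(patch_size, embed_dim))             # stand-in projection
cls_token = np.zeros((1, embed_dim))
tokens = np.vstack([cls_token, patches @ W])             # (31 + 1, embed_dim)
print(tokens.shape)  # (32, 32)
```

The resulting (EEG_size/patch_size + 1) x embed_dim token matrix corresponds to the Embedded Patches fed into the EEG Transformer Encoder.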
c) Local feature learning: the local features are learned by the EEG Transformer Encoder. Unlike the Transformer Encoder of ViT, the EEG Transformer Encoder adds a CNN module between the Multi-Head Attention module and the MLP module in order to learn and extract the local feature information of the input EEG data. The Embedded Patches are first normalized and then input to the Multi-Head Attention module, whose formula is:
MultiHead(Q,K,V) = Concat(head_1, …, head_h)W^O
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
Where Q is the query matrix, K is the key matrix, V is the value matrix; W_i^Q, W_i^K, and W_i^V are the parameter matrices of the i-th head for the query, key, and value; W^O is the output parameter matrix; and Concat is the concatenation operation.
Wherein the Attention is computed as:
Attention(Q,K,V) = softmax(QK^T / √d_k) V
Where Q is the query matrix, K is the key matrix, V is the value matrix, d_k is the dimension of the query and key matrices, and softmax is the softmax operation.
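As a rough NumPy illustration of the multi-head attention described above: the head count h = 4, token count 32, and embed_dim = 32 are assumed values for this sketch, and all parameter matrices are random stand-ins for learned weights.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(X, Wq, Wk, Wv, Wo):
    # head_i = Attention(X Wq_i, X Wk_i, X Wv_i); concatenate, then project by Wo.
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
n_tokens, embed_dim, h = 32, 32, 4        # assumed sizes
d = embed_dim // h
X = rng.normal(size=(n_tokens, embed_dim))
Wq = rng.normal(size=(h, embed_dim, d))
Wk = rng.normal(size=(h, embed_dim, d))
Wv = rng.normal(size=(h, embed_dim, d))
Wo = rng.normal(size=(embed_dim, embed_dim))

out = multi_head(X, Wq, Wk, Wv, Wo)
print(out.shape)  # (32, 32): one updated vector per token
```

Each output row mixes information from all patches, which is how the interdependencies between the segmented data are learned.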
Through the Multi-Head Attention module, the interdependencies between the segmented data are learned.
The output of the Multi-Head Attention module is added to the Embedded Patches, normalized, and then sent to the CNN module to learn the local features of the input data. Thanks to convolution-kernel parameter sharing and sparse inter-layer connections, the CNN module can extract local features of the input data with little computation. The CNN module uses a convolution kernel of size 3 with stride 1.
The output of the CNN module is normalized and input to the MLP module to further learn hidden features in the data. Since a convolution with kernel size 3 and stride 1 reduces the input length by 2, the MLP input layer has size embed_dim−2; the hidden layer is enlarged to learn hidden features and has size embed_dim×4; and the output layer has size embed_dim. The output of the MLP module is added to the data obtained from the first residual addition (Multi-Head Attention output plus Embedded Patches) to form the output of the EEG Transformer Encoder. This output consists of EEG_size/patch_size+1 tokens, numbered 0 through EEG_size/patch_size; the local-feature token numbered 0 is returned for feature learning in subsequent modules.
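The size arithmetic above (kernel 3, stride 1, no padding shrinks each token by 2, giving the MLP input size embed_dim−2) can be checked with a minimal "valid" 1-D convolution; embed_dim = 32 and the averaging kernel are assumed values for illustration:

```python
import numpy as np

def conv1d_valid(x, kernel):
    # 'Valid' 1-D convolution, stride 1, no padding:
    # output length = len(x) - len(kernel) + 1.
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

embed_dim = 32                       # assumed embedding dimension
token = np.arange(embed_dim, dtype=float)
kernel = np.ones(3) / 3.0            # kernel size 3, stride 1, as in the text

y = conv1d_valid(token, kernel)
print(len(y))  # embed_dim - 2 = 30 -> the MLP input layer size
```

This is why the MLP input layer in the EEG Transformer Encoder is embed_dim−2 rather than embed_dim.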
Since the CNN module and the MLP module are well known to those skilled in the art, the structure thereof is not explained in detail.
Step 3.3: The num local features learned by the num EEG Transformer Encoder modules constitute new Embedded Patches and are sent to the Sequence In Time Transformer Encoder module to learn the dependencies among the num consecutive samples. The learning process of the Sequence In Time Transformer Encoder is similar to that of the EEG Transformer Encoder: the num local features learn their interdependencies through the Multi-Head Attention module, the MBConv module then learns further features between samples, and finally the MLP module learns the hidden features among the samples. The MLP input layer has size num, the hidden layer size is set to num×4, and the output layer has size num. The module finally outputs num feature tokens with mutual temporal dependencies.
Step 3.4: The returned num tokens are input to the num classifiers for classification. Each classifier is an MLP module with input layer size embed_dim and output layer size equal to the number of classes (3 for the SEED dataset), with no hidden layer. The num classifiers finally output the predicted probabilities that the num samples belong to each class.
Step 3.5: the cross entropy is calculated according to the classification result and the real label,
Where M is the number of classes, y ic is the sign function, 1 is taken when the class is equal to the real label, otherwise 0 is taken, p ic is the predicted probability that the sample belongs to class c, and num is the number of samples.
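A minimal sketch of this cross-entropy calculation, using made-up predictions for num = 4 samples over the M = 3 SEED emotion classes (the probability values are illustrative only):

```python
import numpy as np

def cross_entropy(p, y):
    """Mean cross-entropy over num samples.
    p: (num, M) predicted class probabilities; y: (num,) integer true labels."""
    num = p.shape[0]
    # Only the y_ic = 1 terms survive the inner sum, i.e. log p[i, y[i]].
    return -np.mean(np.log(p[np.arange(num), y]))

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6],
              [0.3, 0.4, 0.3]])
y = np.array([0, 1, 2, 1])

loss = cross_entropy(p, y)
print(round(float(loss), 4))  # 0.5017
```

The loss shrinks as the probability assigned to each true class grows, which is what minimizing it during training encourages.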
The model parameters are updated by minimizing the cross-entropy loss; the optimization method adopted is stochastic gradient descent (SGD):
g = (1/m) ∇_θ Σ_{j=1}^{m} L(x_j, y_j; θ)
θ ← θ − εg# (5)
Where m is the number of samples drawn from the input to update the parameters, g is the gradient, ε is the learning rate, and θ denotes the parameters of the model.
Step 3.6: steps 3.2 to 3.5 are performed once for each set of data;
Step 3.7: after all input data are processed, the iteration times t=t+1 return to the step 3.2 for iteration until the set iteration times are reached;
Output: the predicted probability that the electroencephalogram signal belongs to each category; the category with the highest probability is taken as the predicted category.
Claims (7)
1. An electroencephalogram signal classification method based on the vision Transformer, characterized by comprising the following steps:
Step 1: data preprocessing, namely acquiring labeled processed EEG data:
Step 2: establishing an electroencephalogram signal classification model based on Vision Transformer;
The Vision Transformer-based electroencephalogram signal classification model EEG Vision Transformer is composed of num EEG Transformer Encoder modules, one Sequence In Time Transformer Encoder module and num classifiers;
EEG Transformer Encoder is formed by adding a CNN module after multi-head attention in Transformer Encoder of Vision Transformer; sequence In Time Transformer Encoder is constituted by adding a MBConv module to the multi-head attention in Transformer Encoder of Vision Transformer;
the num EEG Transformer Encoder modules are used for learning local features of continuous num samples, and the learned local features of the num samples are input into the Sequence In Time Transformer Encoder module to learn time sequence dependence among the samples to obtain num token with time sequence dependence; finally, inputting num token into num classifiers for classification, wherein the classifiers are MLP modules;
step 3: and training an electroencephalogram signal classification model through the preprocessed EEG data.
2. The electroencephalogram signal classification method based on the vision Transformer according to claim 1, wherein the specific method of step 1 is as follows:
Adopting the public SEED emotion dataset; segmenting the preprocessed EEG data provided in the SEED dataset into 1-second windows, and extracting differential entropy features from each EEG channel of the segmented data to obtain differential entropy feature data of 62 channels x 5 frequency bands; finally flattening the data into a one-dimensional sample of length 310, wherein the label of the sample is the label of the corresponding stimulus emotion; each group of num consecutive samples forms one input to the model.
3. The electroencephalogram signal classification method based on the vision Transformer according to claim 2, wherein the specific method of step 3 is as follows:
Input: the labeled, processed EEG data samples, wherein each group of num consecutive samples forms one input to the model (total_group groups in total); the maximum iteration number N; the EEG sample size EEG_size; the patch size patch_size; and the embedding dimension embed_dim;
step 3.1: initializing:
initializing num EEG Transformer Encoder modules and num classifiers, one Sequence In Time Transformer Encoder module, and performing initial iteration times t=1;
step 3.2: learning the local features of each sample through the EEG Transformer Encoder modules;
step 3.3: learning the time-sequence dependencies among samples through the Sequence In Time Transformer Encoder module;
step 3.4: classifying the learned features through the classifiers;
step 3.5: calculating the cross entropy according to the classification results and the real labels, and updating the model parameters by minimizing the cross entropy loss;
step 3.6: performing steps 3.2 to 3.5 once for each group of input data;
step 3.7: after all input data are processed, incrementing the iteration count t=t+1 and returning to step 3.2 until the set number of iterations is reached;
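The iteration structure of steps 3.1 to 3.7 can be sketched as the skeleton below; `step_fn` is a placeholder for one pass of steps 3.2 to 3.5 (encode, sequence-encode, classify, update) on one group of num consecutive samples, not the claimed model itself.

```python
import numpy as np

def run_training(step_fn, groups, max_iters):
    """Outer training loop: iterate max_iters (N) times over all
    total_group groups, applying step_fn (a stand-in for steps 3.2-3.5)
    once per group. Returns the per-group losses."""
    losses = []
    t = 1                                    # step 3.1: initial iteration count
    while t <= max_iters:                    # step 3.7: repeat until N reached
        for group in groups:                 # step 3.6: each group once
            losses.append(step_fn(group))    # steps 3.2-3.5
        t += 1
    return losses

# toy usage: total_group = 3 groups of num = 4 samples, a dummy "loss"
groups = [np.ones((4, 310)) for _ in range(3)]
losses = run_training(lambda g: float(g.mean()), groups, max_iters=2)
```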
4. The electroencephalogram signal classification method based on a vision Transformer according to claim 3, wherein the specific method of step 3.2 is as follows:
The num input samples learn the local features of each individual sample through the num EEG Transformer Encoder modules; the specific steps of each EEG Transformer Encoder are as follows:
a) Feature segmentation: unlike the two-dimensional picture input of ViT, each sample of EEGViT is one-dimensional feature data of length 310 (62 channels × 5 frequency bands), and the input one-dimensional feature is segmented into EEG_size/patch_size one-dimensional data segments according to the size patch_size;
b) Feature embedding: flattening the segmented data segments according to the embedding dimension embed_dim, and adding a token for local feature learning to obtain the Embedded Patches; the processed Embedded Patches are input into the EEG Transformer Encoder to learn local features;
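Steps a) and b) can be sketched as follows, assuming a linear projection of each patch to embed_dim and a prepended learnable token; `W_patch` and `cls_token` are illustrative trainable parameters, not names from the claim.

```python
import numpy as np

def embed_patches(sample, patch_size, W_patch, cls_token):
    """Split one length-EEG_size sample into EEG_size/patch_size
    one-dimensional segments, project each to embed_dim, and prepend
    a token for local feature learning."""
    eeg_size = sample.shape[0]
    n_patches = eeg_size // patch_size
    patches = sample[:n_patches * patch_size].reshape(n_patches, patch_size)
    tokens = patches @ W_patch                       # (n_patches, embed_dim)
    return np.vstack([cls_token[None, :], tokens])   # (n_patches + 1, embed_dim)

rng = np.random.default_rng(1)
sample = rng.standard_normal(310)            # EEG_size = 310
W = rng.standard_normal((10, 16)) * 0.1      # patch_size = 10, embed_dim = 16
cls = np.zeros(16)
embedded = embed_patches(sample, 10, W, cls)  # 31 patches + 1 token
```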
c) Local feature learning: the local features are learned by the EEG Transformer Encoder; unlike the Transformer Encoder of ViT, in order to learn the local characteristic information of the input brain data, the EEG Transformer Encoder adds a CNN module between the Multi-Head Attention module and the MLP module for learning and extracting local characteristics of the input data; the Embedded Patches are first normalized and then input into the Multi-Head Attention module, wherein the formula of the Multi-Head Attention is:
MultiHead(Q,K,V) = Concat(head_1,…,head_h)W^O, where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)   (1)
where Q is the query matrix, K is the key matrix, V is the value matrix, W_i^Q is the parameter matrix of the i-th head for the query matrix, W_i^K is the parameter matrix of the i-th head for the key matrix, W_i^V is the parameter matrix of the i-th head for the value matrix, W^O is the output parameter matrix, and Concat is the concatenation operation;
wherein the equation of the Attention is:
Attention(Q,K,V) = softmax(QK^T/√d_k)V   (2)
where Q is the query matrix, K is the key matrix, V is the value matrix, d_k is the dimension of the query and key matrices, and softmax is the softmax operation;
Through the Multi-Head Attention module, the interdependence among the segmented data segments can be learned;
The data output by the Multi-Head Attention module is added to the Embedded Patches data, normalized, and then sent into the CNN module to learn local characteristics of the input data; thanks to convolution kernel parameter sharing and the sparsity of inter-layer connections, the CNN module can extract local features of the input data with a small amount of calculation; the convolution kernel size of the CNN module is 3 and the stride is 1;
The data output by the CNN module is normalized and input into the MLP module to further learn hidden features in the data; after the convolution operation with kernel size 3 and stride 1, the data size is reduced by 2, so the input layer size of the MLP is embed_dim-2; to learn hidden features, the hidden layer is enlarged, with its size set to embed_dim×4, and the output layer size is embed_dim; the output of the MLP module is added to the sum of the Multi-Head Attention output and the Embedded Patches data to form the output of the EEG Transformer Encoder; this output consists of EEG_size/patch_size+1 tokens, numbered 0 through EEG_size/patch_size, and the local-feature token numbered 0 is returned for feature learning in the subsequent module.
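The encoder block of step c) can be sketched in numpy as below. This is a deliberately simplified, single-head version under stated assumptions: LayerNorm and the multi-head projections are omitted, the unpadded kernel-size-3 convolution runs along the embedding axis (shrinking embed_dim by 2 as in the claim), and all weights are illustrative stand-ins.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Attention(Q,K,V) = softmax(Q K^T / sqrt(d_k)) V."""
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def conv1d_k3(x, kernel):
    """Kernel-size-3, stride-1, unpadded convolution along the embedding
    axis; output width is embed_dim - 2."""
    windows = np.lib.stride_tricks.sliding_window_view(x, 3, axis=-1)
    return windows @ kernel                     # (tokens, embed_dim - 2)

def eeg_encoder_block(x, kernel, W1, W2):
    """Single-head sketch: attention + residual, CNN over features, then
    an MLP of sizes embed_dim-2 -> embed_dim*4 -> embed_dim, added back
    to the residual branch."""
    a = x + attention(x, x, x)       # Multi-Head Attention + skip connection
    h = np.maximum(conv1d_k3(a, kernel) @ W1, 0)  # hidden layer, ReLU
    return a + h @ W2                # output: (tokens, embed_dim)

rng = np.random.default_rng(2)
d = 16                                           # embed_dim
x = rng.standard_normal((32, d))                 # 31 patches + 1 token
out = eeg_encoder_block(x,
                        rng.standard_normal(3) * 0.1,
                        rng.standard_normal((d - 2, 4 * d)) * 0.1,
                        rng.standard_normal((4 * d, d)) * 0.1)
local_token = out[0]                             # token 0, returned onward
```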
5. The electroencephalogram signal classification method based on a vision Transformer according to claim 4, wherein the specific method of step 3.3 is as follows:
The num local features learned by the num EEG Transformer Encoder modules constitute new Embedded Patches and are sent to the Sequence In Time Transformer Encoder module to learn the dependency relationship among the num consecutive samples; the learning process of the Sequence In Time Transformer Encoder is similar to that of the EEG Transformer Encoder: the num local features learn their interdependence through a Multi-Head Attention module, then learn further inter-sample features through an MBConv module, and finally learn hidden features among the samples through an MLP module; the input layer size of this MLP module is num, the hidden layer size is set to num×4, and the output layer size is num; finally, num feature tokens with mutual time-sequence dependency are output.
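The final MLP of this sequence encoder can be sketched as below, reading the claim's layer sizes num → num×4 → num as mixing information across the num stacked token vectors; the attention and MBConv stages are omitted and the weights are illustrative.

```python
import numpy as np

def sequence_mlp(tokens, W1, W2):
    """tokens: (num, embed_dim) stack of the local-feature tokens from the
    num EEG Transformer Encoders. Mixes across the num axis with hidden
    size num*4, returning num feature tokens."""
    h = np.maximum(W1 @ tokens, 0)   # (num*4, embed_dim)
    return W2 @ h                    # (num, embed_dim)

rng = np.random.default_rng(3)
num, d = 4, 16
local_tokens = rng.standard_normal((num, d))   # token 0 of each encoder
W1 = rng.standard_normal((4 * num, num)) * 0.1
W2 = rng.standard_normal((num, 4 * num)) * 0.1
seq_tokens = sequence_mlp(local_tokens, W1, W2)
```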
6. The electroencephalogram signal classification method based on a vision Transformer according to claim 5, wherein the specific method of step 3.4 is as follows:
The returned num tokens are input into the num classifiers for classification; each classifier is an MLP module with input layer size embed_dim, output layer size equal to the number of categories (3 for the SEED dataset), and no hidden layer; finally the num classifiers obtain the predicted probabilities that the num samples belong to each category.
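A sketch of one such hidden-layer-free classifier, with a softmax added here to turn the 3 output logits into class probabilities; `W` and `b` are illustrative parameters.

```python
import numpy as np

def classify(token, W, b):
    """Map one embed_dim feature token to class probabilities
    (3 classes for SEED) via a single linear layer plus softmax."""
    logits = token @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(4)
d, n_classes = 16, 3
probs = classify(rng.standard_normal(d),
                 rng.standard_normal((d, n_classes)) * 0.1,
                 np.zeros(n_classes))
```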
7. The electroencephalogram signal classification method based on a vision Transformer according to claim 6, wherein the specific method of step 3.5 is as follows:
the cross entropy is calculated according to the classification results and the real labels:
L = -(1/num) Σ_{i=1}^{num} Σ_{c=1}^{M} y_ic log(p_ic)   (3)
where M is the number of categories; y_ic is the indicator function, taking 1 when the real label of sample i is category c and 0 otherwise; p_ic is the predicted probability that sample i belongs to category c; and num is the number of samples;
the model parameters are updated by minimizing the cross entropy loss; the adopted optimization method is stochastic gradient descent (SGD):
g = (1/m) ∇_θ Σ_i L(f(x_i; θ), y_i)   (4)
θ = θ - εg   (5)
where m is the number of samples taken from the input samples for updating the parameters, g is the gradient, ε is the learning rate, and θ denotes the parameters in the model.
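The loss and update of step 3.5 can be written directly in numpy; this is a minimal sketch with one-hot labels and a plain SGD step (the gradient is passed in rather than derived).

```python
import numpy as np

def cross_entropy_loss(p, y):
    """Mean cross entropy over num samples:
    L = -(1/num) * sum_i sum_c y_ic * log(p_ic),
    with one-hot labels y and predicted probabilities p."""
    return float(-np.mean(np.sum(y * np.log(p), axis=1)))

def sgd_step(theta, g, eps):
    """Plain SGD update: theta <- theta - eps * g."""
    return theta - eps * g

# near-perfect predictions give a loss near 0
p = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.98, 0.01]])
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
loss = cross_entropy_loss(p, y)
theta = sgd_step(np.array([1.0, -2.0]), np.array([0.5, -0.5]), eps=0.1)
```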
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111616915.1A CN114176607B (en) | 2021-12-27 | 2021-12-27 | Electroencephalogram signal classification method based on vision transducer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114176607A CN114176607A (en) | 2022-03-15 |
CN114176607B true CN114176607B (en) | 2024-04-19 |
Family
ID=80606165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111616915.1A Active CN114176607B (en) | 2021-12-27 | 2021-12-27 | Electroencephalogram signal classification method based on vision transducer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114176607B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114782933A (en) * | 2022-05-09 | 2022-07-22 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Driver fatigue detection system based on multi-mode Transformer network |
CN115422983A (en) * | 2022-11-04 | 2022-12-02 | 智慧眼科技股份有限公司 | Emotion classification method and device based on brain wave signals |
CN115969381B (en) * | 2022-11-16 | 2024-04-30 | 西北工业大学 | Electroencephalogram signal analysis method based on multi-band fusion and space-time transducer |
CN115813409B (en) * | 2022-12-02 | 2024-08-23 | 复旦大学 | Motion image electroencephalogram decoding method with ultralow delay |
CN115844425B (en) * | 2022-12-12 | 2024-05-17 | 天津大学 | DRDS brain electrical signal identification method based on transducer brain region time sequence analysis |
CN117281534B (en) * | 2023-11-22 | 2024-03-22 | 广东省人民医院 | Multi-index anesthesia state monitoring method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111616721A (en) * | 2020-05-31 | 2020-09-04 | 天津大学 | Emotion recognition system based on deep learning and brain-computer interface and application |
CN112101152A (en) * | 2020-09-01 | 2020-12-18 | 西安电子科技大学 | Electroencephalogram emotion recognition method and system, computer equipment and wearable equipment |
KR20210051419A (en) * | 2019-10-30 | 2021-05-10 | 한밭대학교 산학협력단 | System for classificating mental workload using eeg and method thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111460892A (en) * | 2020-03-02 | 2020-07-28 | 五邑大学 | Electroencephalogram mode classification model training method, classification method and system |
- 2021-12-27: CN CN202111616915.1A patent/CN114176607B/en active Active
Non-Patent Citations (1)
Title |
---|
Introducing Attention Mechanism for EEG Signals: Emotion Recognition with Vision Transformers;Arjun 等;2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC);20211104;5723-5726 * |
Also Published As
Publication number | Publication date |
---|---|
CN114176607A (en) | 2022-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114176607B (en) | Electroencephalogram signal classification method based on vision transducer | |
CN110532900B (en) | Facial expression recognition method based on U-Net and LS-CNN | |
Yuan et al. | Patients’ EEG data analysis via spectrogram image with a convolution neural network | |
CN114052735B (en) | Deep field self-adaption-based electroencephalogram emotion recognition method and system | |
CN111444960A (en) | Skin disease image classification system based on multi-mode data input | |
Mensch et al. | Learning neural representations of human cognition across many fMRI studies | |
CN112766355B (en) | Electroencephalogram signal emotion recognition method under label noise | |
Kaziha et al. | A convolutional neural network for seizure detection | |
CN113392733B (en) | Multi-source domain self-adaptive cross-tested EEG cognitive state evaluation method based on label alignment | |
CN108256629A (en) | The unsupervised feature learning method of EEG signal based on convolutional network and own coding | |
CN113011330B (en) | Electroencephalogram signal classification method based on multi-scale neural network and cavity convolution | |
CN114564990B (en) | Electroencephalogram signal classification method based on multichannel feedback capsule network | |
CN113554110B (en) | Brain electricity emotion recognition method based on binary capsule network | |
CN112465069A (en) | Electroencephalogram emotion classification method based on multi-scale convolution kernel CNN | |
CN113243924A (en) | Identity recognition method based on electroencephalogram signal channel attention convolution neural network | |
CN113069117A (en) | Electroencephalogram emotion recognition method and system based on time convolution neural network | |
CN114841216B (en) | Electroencephalogram signal classification method based on model uncertainty learning | |
He et al. | What catches the eye? Visualizing and understanding deep saliency models | |
CN107045624B (en) | Electroencephalogram signal preprocessing and classifying method based on maximum weighted cluster | |
CN117332300A (en) | Motor imagery electroencephalogram classification method based on self-attention improved domain adaptation network | |
CN110991554A (en) | Improved PCA (principal component analysis) -based deep network image classification method | |
CN111914922A (en) | Hyperspectral image classification method based on local convolution and cavity convolution | |
CN116746947A (en) | Cross-subject electroencephalogram signal classification method based on online test time domain adaptation | |
CN116269442A (en) | Multi-head attention-based multidimensional motor imagery electroencephalogram signal classification method | |
CN115310491A (en) | Class-imbalance magnetic resonance whole brain data classification method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||