CN114595725B - Electroencephalogram signal classification method based on an addition network and supervised contrastive learning

Info

Publication number
CN114595725B (granted publication of application CN202210253209.3A)
Authority
CN
China
Legal status
Active
Application number
CN202210253209.3A
Other languages
Chinese (zh)
Other versions
CN114595725A (application publication)
Inventor
李畅 (Li Chang)
赵禹阊 (Zhao Yuchang)
宋仁成 (Song Rencheng)
刘羽 (Liu Yu)
成娟 (Cheng Juan)
陈勋 (Chen Xun)
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
2022-03-15
Filing date
2022-03-15
Publication date
2024-02-20
2022-03-15 Application filed by Hefei University of Technology
2022-03-15 Priority to CN202210253209.3A
2022-06-07 Publication of CN114595725A
2024-02-20 Application granted
2024-02-20 Publication of CN114595725B
Status: Active


Classifications

    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/372 Analysis of electroencephalograms
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification involving training the classification device
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06F2218/04 Preprocessing: Denoising
    • G06F2218/08 Feature extraction (signal processing)
    • G06F2218/12 Classification; Matching


Abstract

The invention discloses an electroencephalogram (EEG) signal classification method based on an addition network and supervised contrastive learning, comprising the following steps: 1, selecting channel data from the raw EEG recordings and slicing them into samples as preprocessing; 2, establishing an addition-network classification model; 3, designing a hybrid loss function as the optimization target of the classification model; and 4, training the network on the input data and completing EEG classification with the trained optimal model. By replacing multiplication with addition, the invention greatly reduces computational complexity and energy consumption; with a loss function that mixes supervised contrastive loss and cross-entropy loss, signal classification is completed automatically, without manual feature extraction or preprocessing of the raw EEG, and classification accuracy is markedly improved, which increases the application value of EEG in medicine and other fields.

Description

Electroencephalogram signal classification method based on an addition network and supervised contrastive learning
Technical Field
The invention relates to the field of electroencephalogram (EEG) signal classification, and in particular to a deep learning method that automatically classifies and makes predictions from a subject's raw EEG data.
Background
The brain controls human behavior, emotion and other physiological activities, and the electrical activity of the cerebral cortex carries rich information, including information about a person's emotions, motor imagery and diseases. With the development of brain-computer interfaces and intelligent healthcare, EEG signals are widely used in fields such as affective computing, motor imagery and medical health. If the information in EEG signals can be fully mined and different EEG signals accurately classified, the practical value of EEG in medicine and related fields will increase.
Electroencephalography (EEG) is a portable technique for recording the electrical activity of the cerebral cortex and can detect a wide range of information related to brain function. Intracranial EEG signals are acquired through electrodes implanted beneath the scalp, while scalp EEG signals are acquired through electrodes placed on the scalp surface. Intracranial EEG suits long-term implantable monitoring systems and generally has a high signal-to-noise ratio, whereas scalp EEG requires no implantation and is non-invasive for the patient, so scalp EEG is the more common choice in practice. Studies of subjects' EEG data show that some activities associated with the EEG signal begin to show signs minutes to hours before onset, so the associated activity can be classified predictively by capturing information in the EEG signal. However, analyzing EEG signals requires extensive expertise and expert experience and is time-consuming and labor-intensive; moreover, EEG signals are continuous in time and a subject produces them at every moment, so a system that automatically predicts and classifies EEG signals is needed.
In traditional EEG-based prediction and classification algorithms, researchers usually denoise the EEG signal, extract relevant features, and then classify the resulting features with a classifier to obtain a prediction. Common features include Hjorth parameters, statistical moments, accumulated energy, autoregressive coefficients and Lyapunov exponents; common classifiers include support vector machines and Bayesian classifiers. However, extracting such features also requires rich expert experience, and the classification result depends largely on the extracted features, which leads to poor generalization; moreover, traditional classifiers offer limited room for improving EEG classification performance.
In recent years, deep learning has been widely applied in the brain-computer interface field: it can automatically learn suitable features from the input, learn feature extraction and classification jointly, and obtain more accurate predictions in EEG classification tasks. However, deep learning methods usually carry heavy computational and hardware costs, which is a disadvantage for clinical deployment, mobile applications and implantable devices. Past approaches have mainly designed either a feature-preprocessing scheme or a special network architecture. Feature preprocessing typically converts the raw EEG into features of various forms and includes operations such as filtering and denoising; although this yields cleaner data, important information may be lost at the same time. Special network structures work well in the specific situations they target, but their performance degrades markedly in complex and diverse environments. Both kinds of method also ignore the inherent associations within the data.
Disclosure of Invention
The invention provides an electroencephalogram signal classification method based on an addition network and supervised contrastive learning to overcome the defects of the prior art, so that EEG classification can be performed automatically in a low-energy, low-latency, hardware-friendly setting while the classification accuracy is markedly improved, thereby increasing the application value of EEG in medicine and other fields.
In order to achieve this aim, the invention adopts the following technical scheme:
The invention discloses an electroencephalogram signal classification method based on an addition network and supervised contrastive learning, characterized by comprising the following steps:
Step 1, acquiring an EEG dataset with label information, and performing channel selection and sample-segmentation preprocessing on the raw EEG signals in the dataset to obtain N EEG samples of duration T that form a training sample set denoted X = {X_1, X_2, ..., X_n, ..., X_N}, where X_n ∈ R^{W×H} represents the n-th EEG sample, H represents the number of EEG channels, W = T×s represents the number of sampling points, and s represents the sampling rate of the EEG signal; the label of the n-th EEG sample X_n is denoted Y_n, and the label set corresponding to the training sample set X is denoted Y = {Y_1, Y_2, ..., Y_n, ..., Y_N};
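For illustration, the slicing in step 1 can be sketched as follows; this is a minimal NumPy sketch under assumed shapes and names (non-overlapping windows, a hypothetical segment_eeg helper), not the patent's own preprocessing code.

```python
# Hypothetical sketch of the step-1 slicing: cut a continuous (samples x H)
# EEG recording into N non-overlapping segments of duration T seconds, giving
# samples X_n of shape (W, H) with W = T x s. Names and shapes are assumptions.
import numpy as np

def segment_eeg(recording: np.ndarray, s: int, T: float) -> np.ndarray:
    """recording: (total_samples, H) array; returns (N, W, H) with W = T*s."""
    W = int(T * s)                       # sampling points per segment
    N = recording.shape[0] // W          # number of complete segments
    return recording[:N * W].reshape(N, W, recording.shape[1])

# e.g. a 1-hour, 256 Hz, 23-channel recording cut into 30 s samples
X = segment_eeg(np.random.randn(3600 * 256, 23), s=256, T=30.0)
print(X.shape)  # (120, 7680, 23)
```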
Step 2, establishing an addition network model, and comprising the following steps: the system comprises a one-dimensional convolution layer, M addition convolution modules, a self-adaptive pooling layer, a projection layer and a classification layer;
the addition convolution module consists of an addition convolution layer and an addition convolution residual layer; let the addition convolution kernel of the mth addition convolution block have the size h m Step length w m The ReLU activation function and batch normalization operation are adopted between the addition convolution layers and between the addition convolution residual layers; wherein, only the first addition convolution block is provided with the maximum pooling operation; m=1, 2, …, M;
Step 2.1, initializing model parameters:
initializing the weights of all convolution layers using device_unique_initialization;
Step 2.2, inputting the n-th EEG sample X_n ∈ R^{W×H} into the addition network model, where the one-dimensional convolution layer performs temporal feature extraction and dimensionality reduction to give the n-th one-dimensional convolution feature sequence F_n^{(0)} = {f_{n,1}^{(0)}, ..., f_{n,x}^{(0)}, ..., f_{n,C_0}^{(0)}}, in which f_{n,x}^{(0)} denotes the x-th feature map of F_n^{(0)} output by the one-dimensional convolution layer and C_0 denotes the number of feature maps in F_n^{(0)};
Step 2.3, processing by the addition convolution modules:
Step 2.3.1, when m = 1, taking the n-th one-dimensional convolution feature sequence F_n^{(0)} as the input of the m-th addition convolution module and denoting it as the module's feature sequence F_n^{(m)} = {f_{n,1}^{(m)}, ..., f_{n,x}^{(m)}, ..., f_{n,C_m}^{(m)}}, where f_{n,x}^{(m)} denotes the x-th feature map of F_n^{(m)} and C_m the number of feature maps in F_n^{(m)};
Step 2.3.2, processing the feature sequence F_n^{(m)} of the m-th addition convolution module by the m-th addition convolution layer to obtain that layer's feature sequence A_n^{(m)} = {a_{n,1}^{(m)}, ..., a_{n,x}^{(m)}, ...}, where a_{n,x}^{(m)} denotes the x-th feature map output by the m-th addition convolution layer;
processing the same feature sequence F_n^{(m)} by the m-th addition convolution residual layer to obtain that layer's feature sequence B_n^{(m)} = {b_{n,1}^{(m)}, ..., b_{n,x}^{(m)}, ...}, where b_{n,x}^{(m)} denotes the x-th feature map output by the m-th addition convolution residual layer;
Step 2.3.3, adding the output A_n^{(m)} of the m-th addition convolution layer and the output B_n^{(m)} of the m-th addition convolution residual layer element-wise to obtain the m-th fused feature sequence U_n^{(m)} = {u_{n,1}^{(m)}, ..., u_{n,x}^{(m)}, ...}, where u_{n,x}^{(m)} denotes the x-th feature map of U_n^{(m)};
Step 2.3.4, judging whether m = 1 holds; if so, applying a max-pooling operation to the m-th fused feature sequence U_n^{(m)} to obtain the feature sequence F_n^{(m+1)} of the (m+1)-th addition convolution module, and otherwise taking U_n^{(m)} directly as F_n^{(m+1)} = {f_{n,1}^{(m+1)}, ..., f_{n,x}^{(m+1)}, ..., f_{n,C_{m+1}}^{(m+1)}}, where f_{n,x}^{(m+1)} denotes the x-th feature map and C_{m+1} the number of feature maps of F_n^{(m+1)};
Step 2.3.5, assigning m+1 to m; if m > M, proceeding to step 2.3.6, and otherwise returning to step 2.3.2;
Step 2.3.6, processing the feature sequence F_n^{(M)} of the M-th addition convolution module, whose x-th feature map is f_{n,x}^{(M)} and which contains C_M feature maps, by the adaptive pooling layer to obtain the feature vector v_n = {v_{n,1}, ..., v_{n,r}, ..., v_{n,R}}, where v_n denotes the output of the adaptive pooling layer and R the number of feature values;
Step 2.4, processing by the projection layer and the classification layer:
Step 2.4.1, projecting the feature vector v_n output by the adaptive pooling layer into the feature space through the fully connected projection layer to obtain the projection feature vector z_n = {z_{n,1}, ..., z_{n,r}, ...}, where z_{n,r} denotes the r-th feature value of z_n;
meanwhile, processing the feature vector v_n by the fully connected classification layer to obtain the probabilities p_n = {p_{n,1}, p_{n,2}, ..., p_{n,a}, ..., p_{n,k}} that the n-th sample belongs to the different classes, where p_{n,a} denotes the probability, output by the classification layer, that the n-th EEG sample X_n belongs to class a, and k denotes the number of classes;
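The head of step 2.4 can be rendered as the following minimal PyTorch sketch; the class name SCLHead and all layer sizes are illustrative assumptions, not values fixed by the patent.

```python
# Minimal sketch of step 2.4: adaptive pooling yields the feature vector v_n;
# a fully connected projection layer yields z_n for the contrastive loss, and
# a fully connected classification layer yields the k class scores whose
# softmax gives p_n. Dimensions here are assumptions.
import torch
import torch.nn as nn

class SCLHead(nn.Module):
    def __init__(self, feat_dim=128, proj_dim=64, num_classes=2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)               # adaptive pooling layer
        self.project = nn.Linear(feat_dim, proj_dim)      # projection layer
        self.classify = nn.Linear(feat_dim, num_classes)  # classification layer

    def forward(self, feats):                 # feats: (B, C_M, L) backbone output
        v = self.pool(feats).squeeze(-1)      # v_n: (B, C_M)
        z = self.project(v)                   # z_n, consumed by L_sup
        logits = self.classify(v)             # softmax(logits) gives p_n
        return z, logits

z, logits = SCLHead()(torch.randn(4, 128, 50))
print(z.shape, logits.shape)  # torch.Size([4, 64]) torch.Size([4, 2])
```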
Step 3, randomly extracting a number of samples from the EEG samples of the training sample set X to form a batch of data denoted x = {x_1, x_2, ..., x_i, ..., x_m}, with corresponding labels y = {y_1, y_2, ..., y_i, ..., y_m}, where x_i denotes the i-th EEG sample in the batch, y_i denotes the label of x_i, and m denotes the batch size;
after the batch is processed according to steps 2.2 to 2.4, the projection layer outputs the feature vectors z = {z_1, z_2, ..., z_i, ..., z_m}, where z_i denotes the feature vector of the i-th EEG sample in the batch, and the classification layer outputs the probabilities p = {p_1, p_2, ..., p_i, ..., p_m}, where p_i denotes the probability vector of the i-th EEG sample in the batch;
establishing the hybrid loss function L by formulas (1)-(3):

L = α·L_sup + (1 − α)·L_cross-entropy   (1)

L_sup = Σ_{i=1}^{m} (−1 / N_{y_i}) Σ_{j=1, j≠i}^{m} 1(y_j = y_i) · log [ exp(z_i·z_j / τ) / Σ_{t=1, t≠i}^{m} exp(z_i·z_t / τ) ]   (2)

L_cross-entropy = −(1/m) Σ_{i=1}^{m} Σ_{a=1}^{k} y_{i,a} · log p_{i,a}   (3)

In formulas (1)-(3), α is a parameter adjusting the weights of the two error terms, L_sup denotes the supervised contrastive loss, and L_cross-entropy denotes the cross-entropy loss; N_{y_i} denotes the number of samples in the batch that have the same label as the i-th EEG sample; 1(y_j = y_i) takes the value 1 when y_j = y_i with j ≠ i and 0 otherwise; z_j denotes the feature vector, output by the projection layer, of the j-th EEG sample in the batch; τ denotes the hyperparameter controlling the smoothness of training; z_t denotes the feature vector, output by the projection layer, of the t-th EEG sample in the batch; y_{i,a} denotes the one-hot indicator that the i-th sample belongs to class a, and p_{i,a} the predicted probability of class a for the i-th sample;
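The hybrid loss of formulas (1)-(3) admits a compact batch implementation; the sketch below is an assumed rendering (the function name and the default α and τ are illustrative), not code from the patent.

```python
# Sketch of Eqs. (1)-(3): supervised contrastive loss on the L2-normalised
# projections plus cross-entropy on the classifier logits, mixed by alpha.
import torch
import torch.nn.functional as F

def hybrid_loss(z, logits, labels, alpha=0.5, tau=0.1):
    z = F.normalize(z, dim=1)                    # place each z_i on the unit hypersphere
    sim = z @ z.t() / tau                        # pairwise z_i . z_t / tau
    m = z.shape[0]
    eye = torch.eye(m, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, -1e9)             # exclude t = i from the denominator
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~eye  # j != i with y_j = y_i
    n_pos = pos.sum(dim=1).clamp(min=1)          # N_{y_i}, guarded against zero
    l_sup = -(log_prob * pos).sum(dim=1) / n_pos # Eq. (2), per sample i
    l_ce = F.cross_entropy(logits, labels)       # Eq. (3)
    return alpha * l_sup.mean() + (1 - alpha) * l_ce  # Eq. (1)
```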
Step 4, training the addition network model with the Adam optimizer based on the training sample set X, computing the hybrid loss function L, and adjusting the learning rate during training by an adaptive learning-rate method until the validation loss no longer decreases or the maximum number of training iterations is reached, thereby obtaining the trained addition network model used for classifying EEG signals.
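Step 4 can be sketched as the following loop, assuming a model that returns (projection, logits) as in the earlier head sketch and reusing the hybrid_loss above; ReduceLROnPlateau stands in for the adaptive learning-rate method detailed later in the embodiment, and all names are illustrative.

```python
# Minimal training-loop sketch for step 4: Adam optimization of the hybrid
# loss, learning-rate reduction on a validation plateau, and early stopping
# once the validation loss no longer decreases.
import torch

def train(model, train_loader, val_loader, epochs=100, patience=10):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=3)
    best, wait = float('inf'), 0
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            z, logits = model(x)
            loss = hybrid_loss(z, logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(hybrid_loss(*model(x), y).item() for x, y in val_loader)
        sched.step(val)                       # adaptive learning-rate adjustment
        best, wait = (val, 0) if val < best else (best, wait + 1)
        if wait >= patience:                  # validation loss stopped improving
            break
    return model
```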
1. The invention provides a deep-learning addition network model and uses an addition network for EEG classification for the first time: cheap additions replace complex multiplications, achieving lower computational and hardware cost while maintaining comparable accuracy.
2. The invention uses a supervised contrastive loss for EEG classification for the first time. The contrastive loss fully explores the inherent relations in the data, gathering samples of the same class together while pushing samples of different classes apart; combined with the cross-entropy loss, it improves EEG classification performance.
3. The invention is an end-to-end model: it needs no manual denoising or feature preprocessing of the raw EEG signal and trains directly on the raw EEG data, which better fits the data-driven mode of deep learning, so extensive expert experience and expertise are unnecessary and better generalization is obtained.
Drawings
FIG. 1 is a schematic diagram of a network architecture according to the present invention;
FIG. 2 is a schematic diagram of a conventional convolution calculation;
FIG. 3 is a schematic diagram of an additive convolution calculation of the present invention;
FIG. 4 is a schematic diagram of the contrastive learning of the present invention;
FIG. 5 is a comparison of the AUC of EEG classification on the CHB-MIT database;
FIG. 6 is a comparison of the sensitivity of EEG classification on the CHB-MIT database;
FIG. 7 is a comparison of the FPR of EEG classification on the CHB-MIT database.
Detailed Description
In this embodiment, an EEG signal classification method based on an addition network and supervised contrastive learning mainly uses the addition network to classify EEG signals. By changing the similarity measure used in convolution, the addition network replaces the multiplications in convolution with additions, reducing the computational cost; similar samples are drawn close to each other and dissimilar samples pushed apart, finally achieving an accurate classification effect. As shown in FIG. 1, the method specifically comprises the following steps:
Step 1, acquiring an EEG dataset with label information, and performing channel selection and sample-segmentation preprocessing on the raw EEG signals in the dataset to obtain N EEG samples of duration T that form a training sample set denoted X = {X_1, X_2, ..., X_n, ..., X_N}, where X_n ∈ R^{W×H} represents the n-th EEG sample, H represents the number of EEG channels, W = T×s represents the number of sampling points, and s represents the sampling rate of the EEG signal; the label of the n-th EEG sample X_n is denoted Y_n, and the label set corresponding to X is denoted Y = {Y_1, Y_2, ..., Y_n, ..., Y_N}. This method uses the public epileptic EEG datasets CHB-MIT and Kaggle;
Step 2, establishing an addition network model comprising a one-dimensional convolution layer, M addition convolution modules, an adaptive pooling layer, a projection layer and a classification layer;
each addition convolution module consists of an addition convolution layer and an addition convolution residual layer; the addition convolution kernel of the m-th addition convolution module has size h_m and stride w_m; a ReLU activation function and batch normalization follow each addition convolution layer and each addition convolution residual layer, and only the first addition convolution module contains a max-pooling operation; m = 1, 2, ..., M. This method sets M = 3, with h_1 = 11×1, w_1 = 1×1; h_2 = 5×5, w_2 = 2×2; h_3 = 5×5, w_3 = 2×2;
Step 2.1, initializing model parameters:
initializing the weights of all convolution layers using device_unique_initialization;
Step 2.2, inputting the n-th EEG sample X_n ∈ R^{W×H} into the addition network model, where the one-dimensional convolution layer first performs temporal feature extraction and dimensionality reduction to give the n-th one-dimensional convolution feature sequence F_n^{(0)} = {f_{n,1}^{(0)}, ..., f_{n,x}^{(0)}, ..., f_{n,C_0}^{(0)}}, in which f_{n,x}^{(0)} denotes the x-th feature map of F_n^{(0)} and C_0 the number of feature maps. Because the raw EEG signal is used, the signal contains noise; the one-dimensional convolution serves as a denoiser while also reducing the data dimension. The convolution kernel used in this experiment has size 21×1 with stride 1, and the max-pooling operation has size 8×1; a sketch of this front end follows.
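The front end sketched below uses only the sizes quoted above (kernel 21, stride 1, max pooling 8); the channel counts (23 in, 16 out) are illustrative assumptions.

```python
# Sketch of the 1-D front end of step 2.2: temporal convolution for feature
# extraction and denoising, batch normalisation, ReLU, then max pooling for
# dimensionality reduction. Channel counts are assumptions.
import torch
import torch.nn as nn

front_end = nn.Sequential(
    nn.Conv1d(23, 16, kernel_size=21, stride=1, padding=10),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=8),
)

out = front_end(torch.randn(1, 23, 7680))  # one 30 s, 256 Hz, 23-channel sample
print(out.shape)  # torch.Size([1, 16, 960])
```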
Step 2.3, processing by the addition convolution modules:
Step 2.3.1, when m = 1, taking the n-th one-dimensional convolution feature sequence F_n^{(0)} as the input of the m-th addition convolution module and denoting it as the module's feature sequence F_n^{(m)} = {f_{n,1}^{(m)}, ..., f_{n,x}^{(m)}, ..., f_{n,C_m}^{(m)}}, where f_{n,x}^{(m)} denotes the x-th feature map of F_n^{(m)} and C_m the number of feature maps in F_n^{(m)};
Step 2.3.2, processing the feature sequence F_n^{(m)} by the m-th addition convolution layer to obtain that layer's feature sequence A_n^{(m)} = {a_{n,1}^{(m)}, ..., a_{n,x}^{(m)}, ...}, where a_{n,x}^{(m)} denotes the x-th feature map output by the m-th addition convolution layer;
processing the same feature sequence F_n^{(m)} by the m-th addition convolution residual layer to obtain that layer's feature sequence B_n^{(m)} = {b_{n,1}^{(m)}, ..., b_{n,x}^{(m)}, ...}, where b_{n,x}^{(m)} denotes the x-th feature map output by the m-th addition convolution residual layer. A conventional convolution measures similarity by the inner product between the feature map and the filter, whereas the additive convolution measures it by the L1 distance between the two. Suppose the filter of some layer of the network is F ∈ R^{h×w×c_in×c_out}, where h×w is the filter size and c_in and c_out denote the numbers of input and output channels, and the input feature map is X ∈ R^{H×W×c_in}, where H and W denote the input feature-map size; the output feature O is computed as formula (1):

O(a,b,c) = Σ_{i=1}^{h} Σ_{j=1}^{w} Σ_{k=1}^{c_in} X(a+i, b+j, k) × F(i, j, k, c)   (1)

In formula (1), 1 ≤ a ≤ H, 1 ≤ b ≤ W, 1 ≤ c ≤ c_out, and the larger the output feature O, the higher the similarity between the two, as shown in FIG. 2. The additive convolution instead computes the L1 distance, changing the multiplications into subtractions (which a computer performs conveniently through complement addition), as formula (2):

O(a,b,c) = − Σ_{i=1}^{h} Σ_{j=1}^{w} Σ_{k=1}^{c_in} | X(a+i, b+j, k) − F(i, j, k, c) |   (2)

Taking the negative of the L1 distance as the similarity measure, the larger the output feature O, the smaller the L1 distance and the higher the similarity, as shown in FIG. 3. Since the values computed by a conventional convolution may be positive or negative while those computed by the additive convolution are always negative, the batch normalization used in conventional convolution is applied, which makes better use of the conventional activation functions.
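Formula (2) can be written directly with an unfold over input patches; the PyTorch sketch below (class name AdderConv2d, deliberately naive) is an illustrative assumption, not the patent's implementation. A practical version would use a fused kernel instead of materializing the full patch-filter difference tensor.

```python
# Sketch of the additive convolution of Eq. (2): the response of filter c at
# position (a, b) is the negative L1 distance between the input patch and the
# filter, replacing the inner product of ordinary convolution (Eq. (1)).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdderConv2d(nn.Module):
    def __init__(self, c_in, c_out, kernel_size, stride=1, padding=0):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(c_out, c_in, kernel_size, kernel_size))
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding

    def forward(self, x):
        b, _, h, w = x.shape
        k, s, p = self.kernel_size, self.stride, self.padding
        h_out = (h + 2 * p - k) // s + 1
        w_out = (w + 2 * p - k) // s + 1
        patches = F.unfold(x, k, stride=s, padding=p)   # (B, c_in*k*k, L) patch columns
        filt = self.weight.flatten(1)                   # (c_out, c_in*k*k)
        # Eq. (2): O = -sum |X - F| for every (patch, filter) pair
        out = -(patches.unsqueeze(1) - filt[None, :, :, None]).abs().sum(dim=2)
        return out.view(b, filt.shape[0], h_out, w_out)

y = AdderConv2d(8, 16, kernel_size=5, stride=2, padding=2)(torch.randn(1, 8, 32, 32))
print(y.shape)  # torch.Size([1, 16, 16, 16])
```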
Step 2.3.3, adding the output A_n^{(m)} of the m-th addition convolution layer and the output B_n^{(m)} of the m-th addition convolution residual layer element-wise to obtain the m-th fused feature sequence U_n^{(m)} = {u_{n,1}^{(m)}, ..., u_{n,x}^{(m)}, ...}, where u_{n,x}^{(m)} denotes the x-th feature map of U_n^{(m)};
Step 2.3.4, judging whether m = 1 holds; if so, applying a max-pooling operation to the m-th fused feature sequence U_n^{(m)} to obtain the feature sequence F_n^{(m+1)} of the (m+1)-th addition convolution module, and otherwise taking U_n^{(m)} directly as F_n^{(m+1)} = {f_{n,1}^{(m+1)}, ..., f_{n,x}^{(m+1)}, ..., f_{n,C_{m+1}}^{(m+1)}}, where f_{n,x}^{(m+1)} denotes the x-th feature map and C_{m+1} the number of feature maps of F_n^{(m+1)};
Step 2.3.5, assigning m+1 to m; if m > M, proceeding to step 2.3.6, and otherwise returning to step 2.3.2;
Step 2.3.6, processing the feature sequence F_n^{(M)} of the M-th addition convolution module, whose x-th feature map is f_{n,x}^{(M)} and which contains C_M feature maps, by the adaptive pooling layer to obtain the feature vector v_n = {v_{n,1}, ..., v_{n,r}, ..., v_{n,R}}, where v_n denotes the output of the adaptive pooling layer and R the number of feature values;
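Steps 2.3.2 to 2.3.4 assemble into one addition convolution module; the sketch below combines the AdderConv2d above with batch normalisation and ReLU, where the 1×1 residual path and the padding scheme are assumptions chosen so the two branches fuse with matching shapes.

```python
# Sketch of one addition convolution module: an addition convolution layer and
# a parallel addition convolution residual layer, summed element-wise as in
# step 2.3.3. The residual-path design is an assumption for illustration.
import torch
import torch.nn as nn

class AdderBlock(nn.Module):
    def __init__(self, c_in, c_out, kernel_size, stride):
        super().__init__()
        self.conv = nn.Sequential(          # m-th addition convolution layer
            AdderConv2d(c_in, c_out, kernel_size, stride, padding=kernel_size // 2),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
        )
        self.residual = nn.Sequential(      # m-th addition convolution residual layer
            AdderConv2d(c_in, c_out, 1, stride, padding=0),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
        )

    def forward(self, x):
        # element-wise fusion; shapes match because both branches share the
        # stride and the main branch uses 'same'-style padding for odd kernels
        return self.conv(x) + self.residual(x)

u = AdderBlock(16, 32, kernel_size=5, stride=2)(torch.randn(1, 16, 32, 32))
print(u.shape)  # torch.Size([1, 32, 16, 16])
```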
Step 2.4, processing by the projection layer and the classification layer:
Step 2.4.1, projecting the feature vector v_n output by the adaptive pooling layer into the feature space through the fully connected projection layer to obtain the projection feature vector z_n = {z_{n,1}, ..., z_{n,r}, ...}, where z_{n,r} denotes the r-th feature value of z_n;
meanwhile, processing v_n by the fully connected classification layer to obtain the probabilities p_n = {p_{n,1}, p_{n,2}, ..., p_{n,a}, ..., p_{n,k}} that the n-th sample belongs to the different classes, where p_{n,a} denotes the probability, output by the classification layer, that the n-th EEG sample X_n belongs to class a, and k denotes the number of classes. The projection layer maps the feature vectors into the feature space, where normalization scales their length to 1 so that all feature vectors fall on a hypersphere; the contrastive loss then gathers the projection features of the same class together while keeping feature vectors of different classes far apart, as shown in FIG. 4.
Step 3, randomly extracting a number of samples from the EEG samples of the training sample set X to form a batch of data denoted x = {x_1, x_2, ..., x_i, ..., x_m}, with corresponding labels y = {y_1, y_2, ..., y_i, ..., y_m}, where x_i denotes the i-th EEG sample in the batch, y_i denotes the label of x_i, and m denotes the batch size;
after the batch is processed according to steps 2.2 to 2.4, the projection layer outputs the feature vectors z = {z_1, z_2, ..., z_i, ..., z_m}, where z_i denotes the feature vector of the i-th EEG sample in the batch, and the classification layer outputs the probabilities p = {p_1, p_2, ..., p_i, ..., p_m}, where p_i denotes the probability vector of the i-th EEG sample in the batch;
establishing the hybrid loss function L by formulas (3)-(5):

L = α·L_sup + (1 − α)·L_cross-entropy   (3)

L_sup = Σ_{i=1}^{m} (−1 / N_{y_i}) Σ_{j=1, j≠i}^{m} 1(y_j = y_i) · log [ exp(z_i·z_j / τ) / Σ_{t=1, t≠i}^{m} exp(z_i·z_t / τ) ]   (4)

L_cross-entropy = −(1/m) Σ_{i=1}^{m} Σ_{a=1}^{k} y_{i,a} · log p_{i,a}   (5)

In formulas (3)-(5), α is a parameter adjusting the weights of the two error terms, L_sup denotes the supervised contrastive loss, and L_cross-entropy denotes the cross-entropy loss; N_{y_i} denotes the number of samples in the batch that have the same label as the i-th EEG sample; 1(y_j = y_i) takes the value 1 when y_j = y_i with j ≠ i and 0 otherwise; z_j denotes the projection feature vector of the j-th EEG sample in the batch; τ denotes the hyperparameter controlling the smoothness of training; z_t denotes the projection feature vector of the t-th EEG sample in the batch; y_{i,a} and p_{i,a} denote the one-hot label indicator and the predicted probability of class a for the i-th sample;
Step 4, training the addition network model with the Adam optimizer based on the training sample set X, computing the hybrid loss function L, and adjusting the learning rate during training by an adaptive learning-rate method until the validation loss no longer decreases or the maximum number of training iterations is reached, thereby obtaining the trained addition network model used for classifying EEG signals. Because the way similarity is computed is changed, the back-propagated gradients change as well, which makes the gradient magnitudes differ greatly between layers; to let the network converge and learn a better model, the invention uses an adaptive learning-rate method that adjusts the learning rate according to the number of parameters in each layer, computed as formulas (6)-(7):

ΔF_l = η × θ_l × ΔL(F_l)   (6)

θ_l = λ·√z / ||ΔL(F_l)||_2   (7)

In formulas (6)-(7), η denotes the global learning rate, θ_l denotes the local learning rate of the l-th layer of the network, and ΔL(F_l) denotes the gradient of the filter of the l-th layer; λ denotes the hyperparameter controlling the magnitude of the local learning rate, and z denotes the number of parameters of the l-th layer filter F_l.
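Formulas (6)-(7) amount to rescaling each adder layer's gradient between the backward pass and the optimizer step; the sketch below is one assumed rendering (function name and λ value illustrative).

```python
# Sketch of Eqs. (6)-(7): scale each adder filter's gradient by the local
# learning rate theta_l = lambda * sqrt(z) / ||grad||_2, so layers with very
# different gradient magnitudes still learn at comparable speed. Call this
# after loss.backward() and before optimizer.step().
import torch

def scale_adder_gradients(adder_weights, lam=0.1, eps=1e-12):
    for w in adder_weights:                  # w: filter F_l of one adder layer
        if w.grad is None:
            continue
        z = w.numel()                        # number of parameters of F_l
        theta = lam * (z ** 0.5) / (w.grad.norm(2) + eps)   # Eq. (7)
        w.grad.mul_(theta)                   # Eq. (6); eta is applied by the optimizer
```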
Specifically, the addition network with supervised contrastive learning (SCL-AddNets) is compared with several advanced deep learning methods for EEG classification: a one-dimensional convolutional neural network (1D+CNN), a deep convolutional neural network with multilayer perceptron (DCNN+MLP), a deep convolutional neural network with bidirectional long short-term memory (DCNN+Bi-LSTM), and a residual network (ResCNN). The performance indicators on the CHB-MIT and Kaggle databases are as follows:
TABLE 1 Average EEG classification performance of different methods on the CHB-MIT database

Method        Sensitivity (%)   AUC     FPR (/h)
1D+CNN        88.7              0.881   0.172
DCNN+MLP      87.8              0.861   0.208
ResCNN        89.9              0.911   0.140
SCL-AddNets   94.9              0.942   0.077
TABLE 2 Average EEG classification performance of different methods on the Kaggle database

Method        Sensitivity (%)   AUC     FPR (/h)
1D+CNN        80.9              0.808   0.134
DCNN+MLP      82.9              0.811   0.156
ResCNN        81.2              0.829   0.161
SCL-AddNets   89.1              0.831   0.120
TABLE 3 Comparison of computational complexity and parameter count of different methods

Method        Parameters (×10^6)   Multiplications   Additions    Energy (mJ)   Latency
1D+CNN        1.07                 0.80×10^9         0.80×10^9    3.68          4.80
DCNN+MLP      0.43                 0.45×10^9         0.45×10^9    2.07          2.70
ResCNN        0.12                 0.31×10^9         0.31×10^9    1.43          1.86
SCL-AddNets   0.12                 7.57×10^6         0.54×10^9    0.51          1.11
The per-subject cross-validation results for the 19 subjects are shown in FIGS. 5, 6 and 7. Analysis of the results:

The experimental results in Tables 1 and 2 show that, compared with the other deep learning methods in the EEG classification field (1D+CNN, DCNN+MLP and ResCNN), SCL-AddNets improves on every index: on both databases it predicts the pre-seizure period more accurately while reducing the number of false alarms in the inter-seizure period. Table 3 shows that SCL-AddNets converts a large number of multiplications into additions, which greatly improves both energy consumption and latency. In addition, as can be seen from FIGS. 5, 6 and 7, the model improves markedly for most subjects; since the relevant brain regions and the signal distributions of the different EEG classes differ between subjects, this indicates that the method has good recognition ability and strong generalization across subjects.
In summary, the invention makes full use of the rich information contained in the raw EEG signal: the addition network reduces the computational cost while maintaining classification accuracy, and, combined with supervised contrastive learning, draws similar samples together and pushes dissimilar samples apart, achieving a more accurate EEG classification effect. In the binary classification experiments on the public CHB-MIT and Kaggle datasets, the method classifies pre-seizure EEG data faster and more accurately while reducing the number of false alarms in the inter-seizure period, outperforming most conventional deep learning methods.

Claims (1)

1. An electroencephalogram signal classification method based on an addition network and supervised contrastive learning, characterized by comprising the following steps:
Step 1, acquiring an EEG dataset with label information, and performing channel selection and sample-segmentation preprocessing on the raw EEG signals in the dataset to obtain N EEG samples of duration T that form a training sample set denoted X = {X_1, X_2, ..., X_n, ..., X_N}, where X_n ∈ R^{W×H} represents the n-th EEG sample, H represents the number of EEG channels, W = T×s represents the number of sampling points, and s represents the sampling rate of the EEG signal; the label of the n-th EEG sample X_n is denoted Y_n, and the label set corresponding to the training sample set X is denoted Y = {Y_1, Y_2, ..., Y_n, ..., Y_N};
Step 2, establishing an addition network model comprising a one-dimensional convolution layer, M addition convolution modules, an adaptive pooling layer, a projection layer and a classification layer;
each addition convolution module consists of an addition convolution layer and an addition convolution residual layer; the addition convolution kernel of the m-th addition convolution module has size h_m and stride w_m; a ReLU activation function and batch normalization follow each addition convolution layer and each addition convolution residual layer, and only the first addition convolution module contains a max-pooling operation; m = 1, 2, ..., M;
Step 2.1, initializing model parameters:
initializing the weights of all convolution layers using device_unique_initialization;
Step 2.2, inputting the n-th EEG sample X_n ∈ R^{W×H} into the addition network model, where the one-dimensional convolution layer performs temporal feature extraction and dimensionality reduction to give the n-th one-dimensional convolution feature sequence F_n^{(0)} = {f_{n,1}^{(0)}, ..., f_{n,x}^{(0)}, ..., f_{n,C_0}^{(0)}}, in which f_{n,x}^{(0)} denotes the x-th feature map of F_n^{(0)} output by the one-dimensional convolution layer and C_0 denotes the number of feature maps in F_n^{(0)};
Step 2.3, processing by the addition convolution modules:
Step 2.3.1, when m = 1, taking the n-th one-dimensional convolution feature sequence F_n^{(0)} as the input of the m-th addition convolution module and denoting it as the module's feature sequence F_n^{(m)} = {f_{n,1}^{(m)}, ..., f_{n,x}^{(m)}, ..., f_{n,C_m}^{(m)}}, where f_{n,x}^{(m)} denotes the x-th feature map of F_n^{(m)} and C_m the number of feature maps in F_n^{(m)};
Step 2.3.2, processing the feature sequence F_n^{(m)} of the m-th addition convolution module by the m-th addition convolution layer to obtain that layer's feature sequence A_n^{(m)} = {a_{n,1}^{(m)}, ..., a_{n,x}^{(m)}, ...}, where a_{n,x}^{(m)} denotes the x-th feature map output by the m-th addition convolution layer;
processing the same feature sequence F_n^{(m)} by the m-th addition convolution residual layer to obtain that layer's feature sequence B_n^{(m)} = {b_{n,1}^{(m)}, ..., b_{n,x}^{(m)}, ...}, where b_{n,x}^{(m)} denotes the x-th feature map output by the m-th addition convolution residual layer;
Step 2.3.3, adding the output A_n^{(m)} of the m-th addition convolution layer and the output B_n^{(m)} of the m-th addition convolution residual layer element-wise to obtain the m-th fused feature sequence U_n^{(m)} = {u_{n,1}^{(m)}, ..., u_{n,x}^{(m)}, ...}, where u_{n,x}^{(m)} denotes the x-th feature map of U_n^{(m)};
Step 2.3.4, judging whether m = 1 holds; if so, applying a max-pooling operation to the m-th fused feature sequence U_n^{(m)} to obtain the feature sequence F_n^{(m+1)} of the (m+1)-th addition convolution module, and otherwise taking U_n^{(m)} directly as F_n^{(m+1)} = {f_{n,1}^{(m+1)}, ..., f_{n,x}^{(m+1)}, ..., f_{n,C_{m+1}}^{(m+1)}}, where f_{n,x}^{(m+1)} denotes the x-th feature map and C_{m+1} the number of feature maps of F_n^{(m+1)};
Step 2.3.5, assigning m+1 to m; if m > M, proceeding to step 2.3.6, and otherwise returning to step 2.3.2;
Step 2.3.6, processing the feature sequence F_n^{(M)} of the M-th addition convolution module, whose x-th feature map is f_{n,x}^{(M)} and which contains C_M feature maps, by the adaptive pooling layer to obtain the feature vector v_n = {v_{n,1}, ..., v_{n,r}, ..., v_{n,R}}, where v_n denotes the output of the adaptive pooling layer and R the number of feature values;
Step 2.4, processing by the projection layer and the classification layer:
Step 2.4.1, projecting the feature vector v_n output by the adaptive pooling layer into the feature space through the fully connected projection layer to obtain the projection feature vector z_n = {z_{n,1}, ..., z_{n,r}, ...}, where z_{n,r} denotes the r-th feature value of z_n;
meanwhile, processing the feature vector v_n by the fully connected classification layer to obtain the probabilities p_n = {p_{n,1}, p_{n,2}, ..., p_{n,a}, ..., p_{n,k}} that the n-th sample belongs to the different classes, where p_{n,a} denotes the probability, output by the classification layer, that the n-th EEG sample X_n belongs to class a, and k denotes the number of classes;
Step 3, randomly extracting a number of samples from the EEG samples of the training sample set X to form a batch of data denoted x = {x_1, x_2, ..., x_i, ..., x_m}, with corresponding labels y = {y_1, y_2, ..., y_i, ..., y_m}, where x_i denotes the i-th EEG sample in the batch, y_i denotes the label of x_i, and m denotes the batch size;
after the batch is processed according to steps 2.2 to 2.4, the projection layer outputs the feature vectors z = {z_1, z_2, ..., z_i, ..., z_m}, where z_i denotes the feature vector of the i-th EEG sample in the batch, and the classification layer outputs the probabilities p = {p_1, p_2, ..., p_i, ..., p_m}, where p_i denotes the probability vector of the i-th EEG sample in the batch;
establishing the hybrid loss function L by formulas (1)-(3):

L = α·L_sup + (1 − α)·L_cross-entropy   (1)

L_sup = Σ_{i=1}^{m} (−1 / N_{y_i}) Σ_{j=1, j≠i}^{m} 1(y_j = y_i) · log [ exp(z_i·z_j / τ) / Σ_{t=1, t≠i}^{m} exp(z_i·z_t / τ) ]   (2)

L_cross-entropy = −(1/m) Σ_{i=1}^{m} Σ_{a=1}^{k} y_{i,a} · log p_{i,a}   (3)

In formulas (1)-(3), α is a parameter adjusting the weights of the two error terms, L_sup denotes the supervised contrastive loss, and L_cross-entropy denotes the cross-entropy loss; N_{y_i} denotes the number of samples in the batch that have the same label as the i-th EEG sample; 1(y_j = y_i) takes the value 1 when y_j = y_i with j ≠ i and 0 otherwise; z_j denotes the feature vector, output by the projection layer, of the j-th EEG sample in the batch; τ denotes the hyperparameter controlling the smoothness of training; z_t denotes the feature vector, output by the projection layer, of the t-th EEG sample in the batch; y_{i,a} and p_{i,a} denote the one-hot label indicator and the predicted probability of class a for the i-th sample;
Step 4, training the addition network model with the Adam optimizer based on the training sample set X, computing the hybrid loss function L, and adjusting the learning rate during training by an adaptive learning-rate method until the validation loss no longer decreases or the maximum number of training iterations is reached, thereby obtaining the trained addition network model used for classifying EEG signals.
CN202210253209.3A (filed 2022-03-15; priority date 2022-03-15). Electroencephalogram signal classification method based on an addition network and supervised contrastive learning. Granted as CN114595725B; status: Active.

Publications (2)

Publication number / Publication date
CN114595725A 2022-06-07
CN114595725B 2024-02-20 (grant)

Family

ID=81817097


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115607170B (en) * 2022-11-18 2023-04-25 中国科学技术大学 Lightweight sleep staging method based on single-channel electroencephalogram signals and application
CN115700104B (en) * 2022-12-30 2023-04-25 中国科学技术大学 Self-interpretable electroencephalogram signal classification method based on multi-scale prototype learning


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932480B * 2018-06-08 2022-03-15 University of Electronic Science and Technology of China Distributed optical fiber sensing signal feature learning and classifying method based on 1D-CNN

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021143353A1 * 2020-01-13 2021-07-22 Tencent Technology (Shenzhen) Co., Ltd. Gesture information processing method and apparatus, electronic device, and storage medium
AU2020103901A4 * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN112766229A * 2021-02-08 2021-05-07 Nanjing Forestry University Human face point cloud image intelligent identification system and method based on attention mechanism
CN113011330A * 2021-03-19 2021-06-22 University of Science and Technology of China Electroencephalogram signal classification method based on multi-scale neural network and dilated convolution
CN113673434A * 2021-08-23 2021-11-19 Hefei University of Technology Electroencephalogram emotion recognition method based on efficient convolutional neural network and contrastive learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Na, Tang Xianlun, Liu Qing. EEG signal feature extraction and recognition based on semi-supervised learning. Advanced Engineering Sciences, no. S2 (full text cited). *
Cen Shijie, He Yuanlie, Chen Xiaocong. Monocular depth estimation combining attention and unsupervised deep learning. Journal of Guangdong University of Technology, no. 4 (full text cited). *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant