CN114601474A

CN114601474A - Source domain sample screening method for motor imagery transfer learning

Info

Publication number: CN114601474A
Application number: CN202111669390.8A
Authority: CN
Inventors: 祝磊; 楚超; 杨君婷; 张建海; 何光发
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2022-06-10

Abstract

The invention discloses a source domain sample screening method for motor imagery transfer learning, comprising the following steps: extracting a segment of a subject's electroencephalogram (EEG) signal as motor imagery data; setting the motor imagery data of one subject as the test set and the data of the other subjects as the training set; aligning the EEG signals of the test set and the training set using the Euclidean alignment (EA) method; screening the training set with the TS algorithm to obtain the final source domain training set; extracting features from the EEG signals using common spatial patterns (CSP); and fitting the parameters of a linear discriminant analysis (LDA) classifier on the final source domain training set, classifying the test set data, and identifying the limb involved in the subject's motor imagery. Alignment first reduces the overall difference between the source domain samples and the target samples; the source domain samples that still differ most from the target samples are then eliminated, so that the difference between source domain samples and target samples is reduced as far as possible.

Description

Source domain sample screening method for motor imagery transfer learning

Technical Field

The invention belongs to the field of motor imagery transfer learning, and particularly relates to a source domain sample screening method for motor imagery transfer learning.

Background

Transfer learning improves classification performance in a target domain by exploiting knowledge learned in a related domain, and it has developed rapidly in the field of brain-computer interfaces. However, EEG signals are weak in amplitude and complex, and the EEG signals of different people are not identical even when they perform the same activity. A classifier trained directly on samples from other subjects therefore tends to lose accuracy when applied to a target subject; that is, negative transfer easily occurs.

Chinese patent CN 111728609A discloses an electroencephalogram signal classification method, a classification model training method, a device and a medium. Features are extracted from the EEG signal to obtain the corresponding signal features; a difference distribution proportion is obtained, which represents the influence of different types of difference distribution on the distributions of the signal features and the source domain features in the feature domain; the signal features and the source domain features are then aligned according to this proportion to obtain aligned signal features; and the aligned signal features are classified to obtain the motor imagery type corresponding to the EEG signal. Based on the idea of transfer learning, this method enables the classification model to recognize several types of EEG signals, but it does not sufficiently reduce the differences among subjects, so accuracy is reduced.

Disclosure of Invention

To address the loss of test accuracy caused by differences between the subjects from whom EEG signals are recorded, the invention provides a source domain sample screening method that combines EA and TS, reducing the differences between subjects as far as possible and thereby improving accuracy.

The technical problem of the invention is mainly solved by the following technical scheme. The invention comprises the following steps:

S1, extracting a segment of the subject's EEG signal as motor imagery data;

S2, setting the motor imagery data of one subject as the test set and the data of the other subjects as the training set;

S3, aligning the EEG signals of the test set and the training set using the Euclidean alignment (EA) method;

S4, screening the training set with the TS algorithm to obtain the final source domain training set;

S5, extracting features from the EEG signals using CSP;

S6, fitting the LDA parameters on the final source domain training set, classifying the test set data, and identifying the limb involved in the subject's motor imagery.

In step S5, CSP extracts feature information from the EEG signal by spatial filtering. Specifically, CSP finds the spatial filters that maximize the difference between the two classes of signals.

The LDA in step S6 is a supervised classifier that projects the input sample data into a low-dimensional space. In this space the intra-class distance of the sample data is made as small as possible and the inter-class distance as large as possible, after which the sample data is classified.

ETS combines the advantages of EA and TS: the algorithm first reduces the overall difference between the source domain samples and the target samples through alignment, then eliminates the source domain samples that still differ most from the target samples to obtain a new set of source domain samples, reducing the difference between subjects as far as possible.

Preferably, the step S1 specifically includes: processing the target signal with an 8-30 Hz band-pass filter to remove muscle artifacts, line noise and DC drift, and extracting the EEG signal from 0.5 to 3.5 seconds after the motor imagery cue as the motor imagery data. A passband that is too wide degrades the filtering effect.
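Step S1 can be illustrated with a minimal sketch. The patent does not name a filter design, so the FFT mask below is a swapped-in simplification rather than the filter actually used; the 250 Hz sampling rate, the function names and the synthetic two-channel signal are assumptions made only for this illustration.

```python
import numpy as np

FS = 250  # sampling rate in Hz (assumed for this illustration)

def bandpass_fft(signal, fs, low=8.0, high=30.0):
    """Zero every frequency bin outside [low, high] Hz (plain FFT mask, not a designed filter)."""
    n_samples = signal.shape[-1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectrum = np.fft.rfft(signal, axis=-1)
    spectrum[..., (freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=n_samples, axis=-1)

def extract_epoch(filtered, cue_sample, fs, start=0.5, stop=3.5):
    """Cut the window from `start` to `stop` seconds after the cue."""
    first = cue_sample + int(round(start * fs))
    last = cue_sample + int(round(stop * fs))
    return filtered[:, first:last]

# Synthetic 2-channel recording: in-band rhythm + 50 Hz line noise + DC offset.
t = np.arange(0, 8 * FS) / FS
raw = np.vstack([
    np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t) + 3.0,
    np.sin(2 * np.pi * 12 * t) + 0.5 * np.sin(2 * np.pi * 50 * t) - 1.0,
])
filtered = bandpass_fft(raw, FS)
epoch = extract_epoch(filtered, cue_sample=2 * FS, fs=FS)
print(epoch.shape)  # (2, 750)
```

At 250 Hz the 3-second window holds 750 samples per channel; the mask removes both the DC offset and the 50 Hz component while leaving the 10 Hz and 12 Hz rhythms intact.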

Preferably, the step S3 specifically includes:

S31, calculating the arithmetic mean $\bar{R}$ of the covariance matrices of the motor imagery trials of the test set;

S32, projecting all the motor imagery data $X_i$ onto the arithmetic mean $\bar{R}$ of the covariance matrices to obtain the aligned EEG data;

S33, after alignment, obtaining the mean covariance matrix of the n samples of the subject.

Projecting the data of each sample of a subject onto the mean covariance matrix of all of that subject's samples reduces inter-subject variability, thereby reducing negative transfer.

Preferably, the arithmetic mean of the covariance matrices in step S31 is specifically:

$$\bar{R} = \frac{1}{n}\sum_{i=1}^{n} X_i X_i^{T}$$

where $\bar{R}$ is the arithmetic mean of the covariance matrices, $X_i$ is the EEG data of the i-th motor imagery trial of the subject, and n is the number of trials performed by the subject.

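Preferably, the aligned EEG data in step S32 are:

$$\tilde{X}_i = \bar{R}^{-1/2} X_i$$

where $\tilde{X}_i$ represents the projection of the motor imagery data $X_i$ onto the arithmetic mean $\bar{R}$ of the covariance matrices.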

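The mean covariance matrix in step S33 is:

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{X}_i \tilde{X}_i^{T} = I$$

where $\tilde{X}_i$ is the aligned data of the i-th trial. It follows that after EA the mean covariance matrix of every subject is the identity matrix, so the distance between the mean covariance matrices of different subjects is zero, which is very advantageous for transfer learning.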
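The identity result follows in one line from the definitions of steps S31 and S32. In the derivation below, $\bar{R}$ denotes the arithmetic mean of the covariance matrices and $\tilde{X}_i = \bar{R}^{-1/2}X_i$ the aligned trial; these symbols are assumed here because the original formula images are not reproduced in this text. The derivation uses only the symmetry of $\bar{R}^{-1/2}$.

```latex
\frac{1}{n}\sum_{i=1}^{n}\tilde{X}_i\tilde{X}_i^{T}
  = \frac{1}{n}\sum_{i=1}^{n}\bar{R}^{-1/2}X_iX_i^{T}\bar{R}^{-1/2}
  = \bar{R}^{-1/2}\Big(\frac{1}{n}\sum_{i=1}^{n}X_iX_i^{T}\Big)\bar{R}^{-1/2}
  = \bar{R}^{-1/2}\,\bar{R}\,\bar{R}^{-1/2}
  = I
```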

Preferably, the step S4 specifically includes:

S41, setting the test set as the target samples $\{\tilde{X}_i^{t}\}_{i=1}^{n^t}$ and the training set as the source domain samples $\{\tilde{X}_j^{s}\}_{j=1}^{n^s}$;

S42, calculating the Euclidean distance between the covariance matrices of each pair of aligned samples $\tilde{X}_i^{t}$ and $\tilde{X}_j^{s}$;

S43, selecting the k source domain samples closest to each target sample according to the distances obtained in S42;

S44, removing repeated samples from the selected k × n source domain samples to obtain the final source domain training set.

The purpose of step S4 is to reduce the difference between the source domain samples participating in training and the target samples by eliminating source domain samples farther away from the target samples.

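Preferably, the Euclidean distance in step S42 is:

$$d_N = \left\| \tilde{X}_i^{t}\big(\tilde{X}_i^{t}\big)^{T} - \tilde{X}_j^{s}\big(\tilde{X}_j^{s}\big)^{T} \right\|$$

where $N = 1, \dots, n^t \times n^s$ indexes the pairs of target and source domain samples, $d_N$ represents the Euclidean distance between the covariance matrices of $\tilde{X}_i^{t}$ and $\tilde{X}_j^{s}$, $\tilde{X}_i^{t}$ denotes the i-th target sample after alignment, and $\tilde{X}_j^{s}$ denotes the j-th source domain sample after alignment.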
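The screening of steps S41 to S44 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `screen_source_samples` is hypothetical, each trial is a channels × samples array that has already been aligned, and the Euclidean distance between covariance matrices is taken as the Frobenius norm of their difference.

```python
import numpy as np

def screen_source_samples(target_trials, source_trials, k):
    """Return sorted indices of source trials that are among the k nearest to any target trial."""
    cov_t = target_trials @ target_trials.transpose(0, 2, 1)   # (n_t, c, c)
    cov_s = source_trials @ source_trials.transpose(0, 2, 1)   # (n_s, c, c)
    diff = cov_t[:, None, :, :] - cov_s[None, :, :, :]         # (n_t, n_s, c, c)
    dist = np.sqrt((diff ** 2).sum(axis=(2, 3)))               # Frobenius norm per pair
    nearest = np.argsort(dist, axis=1)[:, :k]                  # k nearest per target trial
    return np.unique(nearest)                                  # each duplicate kept once

# Toy trials whose covariance is scale^2 * I, so distances are easy to check by hand.
base = np.eye(2)
source_trials = np.stack([s * base for s in (1, 2, 3, 4, 5, 6)])
target_trials = np.stack([s * base for s in (1.1, 3.1)])
kept = screen_source_samples(target_trials, source_trials, k=2)
print(kept)  # [0 1 2]
```

With two target trials and k = 2, four source indices are selected (0 and 1 for the first target, 2 and 1 for the second); removing the duplicate leaves three source samples in the final training set.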

The invention has the beneficial effects that:

1. The method adopts EA, an EEG data alignment method in Euclidean space, in which each sample of a subject is projected onto the mean covariance matrix of all of that subject's samples, so that the difference among subjects is reduced and negative transfer is therefore reduced;

2. according to the TS algorithm, a new source domain sample is obtained by eliminating a source domain sample which is far away from a target sample, so that the difference between the source domain sample and the target sample which participate in training is reduced;

3. the ETS provided by combining the advantages of EA and TS is adopted, the algorithm reduces the difference between the source domain sample and the target sample substantially through alignment, then the source domain sample with larger difference with the target sample is removed, a new source domain sample is obtained, and the difference between the subjects is reduced to the maximum extent.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The specific embodiments described herein are merely illustrative of the spirit of the invention.

Example: a source domain sample screening method for motor imagery transfer learning according to this embodiment, as shown in FIG. 1, includes:

1. Extracting a segment of the subject's EEG signal as motor imagery data. Specifically, a band-pass filter with a passband of 8-30 Hz is used to remove muscle artifacts, line noise and DC drift from the target signal, and the EEG signal from 0.5 to 3.5 seconds after the motor imagery cue is then extracted as the motor imagery data.

2. The motor imagery data of one subject is set as a test set, and the data of other subjects is set as a training set.
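The split described in step 2 amounts to a leave-one-subject-out arrangement. A minimal sketch, assuming the trials are stored in a dictionary keyed by a subject identifier (the identifiers, array shapes and function name are hypothetical):

```python
import numpy as np

def leave_one_subject_out(data_by_subject, target_subject):
    """Return (source, target): target holds one subject's trials, source everyone else's."""
    target = data_by_subject[target_subject]
    source = {s: x for s, x in data_by_subject.items() if s != target_subject}
    return source, target

# Toy data: 3 subjects, each with 4 trials of shape (channels=2, samples=5).
rng = np.random.default_rng(0)
data_by_subject = {f"A0{k}": rng.standard_normal((4, 2, 5)) for k in (1, 2, 3)}

for subject in data_by_subject:
    source, target = leave_one_subject_out(data_by_subject, subject)
    print(subject, "test trials:", target.shape[0], "source subjects:", sorted(source))
```

The source data is kept grouped by subject in this sketch so that a per-subject alignment can still be applied before the trials are pooled.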

3. Preprocessing the motor imagery data of the subjects. The preprocessing comprises EA alignment and TS sample screening. EA makes the data distributions of different subjects more similar, requires no labeled data from the new subject, and has a low computational cost. TS sample screening computes the Euclidean distance between the aligned source domain samples and the aligned target samples and selects the k source domain samples closest to each target sample. Repeated samples are then removed from the selection (only one copy of each is kept), and the result is used as the final source domain training set.

The specific steps of EA alignment include:

3.11, calculating the arithmetic mean of the covariance matrices of the motor imagery trials of the test set:

$$\bar{R} = \frac{1}{n}\sum_{i=1}^{n} X_i X_i^{T}$$

where $X_i$ represents the EEG data of the i-th motor imagery trial of the subject, and n represents the number of trials of the subject;

3.21, projecting all the motor imagery data $X_i$ onto $\bar{R}$ to obtain the aligned EEG data:

$$\tilde{X}_i = \bar{R}^{-1/2} X_i$$

where $\tilde{X}_i$ represents the projection of the motor imagery data $X_i$ onto $\bar{R}$;

3.31, the mean covariance matrix of the n aligned samples of the subject is:

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{X}_i \tilde{X}_i^{T} = I$$
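The three EA steps above can be checked numerically with a short sketch. It assumes the trials of one subject are stacked in an array of shape (trials, channels, samples); the function names are hypothetical, and the inverse square root is computed by eigendecomposition, which is one common choice rather than something the text prescribes.

```python
import numpy as np

def inv_sqrt_spd(matrix):
    """Inverse square root of a symmetric positive-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T

def euclidean_align(trials):
    """Align one subject's trials; `trials` has shape (n_trials, n_channels, n_samples)."""
    covs = trials @ trials.transpose(0, 2, 1)   # X_i X_i^T for every trial
    reference = covs.mean(axis=0)               # arithmetic mean of the covariance matrices
    return inv_sqrt_spd(reference) @ trials     # projected (aligned) trials

# Random toy trials with different per-channel amplitudes.
rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 4, 100)) * np.array([1.0, 2.0, 0.5, 3.0])[None, :, None]
aligned = euclidean_align(trials)
mean_cov = (aligned @ aligned.transpose(0, 2, 1)).mean(axis=0)
print(np.allclose(mean_cov, np.eye(4)))  # True
```

Because the alignment uses no labels, it can be run separately on each source subject and on the target subject before any screening or training.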

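The specific TS sample screening steps comprise:

3.12, setting the test set as the target samples $\{\tilde{X}_i^{t}\}_{i=1}^{n^t}$ and the training set as the source domain samples $\{\tilde{X}_j^{s}\}_{j=1}^{n^s}$;

3.22, calculating the Euclidean distance between the covariance matrices of each pair of aligned samples $\tilde{X}_i^{t}$ and $\tilde{X}_j^{s}$:

$$d_N = \left\| \tilde{X}_i^{t}\big(\tilde{X}_i^{t}\big)^{T} - \tilde{X}_j^{s}\big(\tilde{X}_j^{s}\big)^{T} \right\|$$

where $N = 1, \dots, n^t \times n^s$ indexes the pairs of target and source domain samples, $d_N$ represents the Euclidean distance between the covariance matrices of $\tilde{X}_i^{t}$ and $\tilde{X}_j^{s}$, $\tilde{X}_i^{t}$ denotes the i-th target sample after alignment, and $\tilde{X}_j^{s}$ denotes the j-th source domain sample after alignment;

3.32, selecting the k source domain samples closest to each target sample according to the distances obtained in step 3.22;

3.42, removing repeated samples from the selected k × n source domain samples to obtain the final source domain training set.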

4. Filtering the EEG signals with CSP to obtain the features of the EEG signals.

CSP extracts feature information from the EEG signal by spatial filtering. Specifically, CSP finds the spatial filters that maximize the difference between the two classes of signals.
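A minimal two-class CSP sketch is given below to make the spatial filtering concrete. It is an illustration under assumptions, not the embodiment's code: the function names and the number of filter pairs are hypothetical, each trial covariance is normalized by its trace, and log-normalized variance is used as the feature, which is a common convention that the text does not specify.

```python
import numpy as np

def fit_csp(trials_a, trials_b, n_pairs=1):
    """Return spatial filters (rows) with extreme variance ratios between the two classes."""
    def mean_cov(trials):
        covs = trials @ trials.transpose(0, 2, 1)
        covs = covs / np.trace(covs, axis1=1, axis2=2)[:, None, None]  # trace-normalize
        return covs.mean(axis=0)

    cov_a, cov_b = mean_cov(trials_a), mean_cov(trials_b)
    # Whiten the composite covariance, then diagonalize class A in the whitened space.
    eigvals, eigvecs = np.linalg.eigh(cov_a + cov_b)
    whitener = (eigvecs / np.sqrt(eigvals)).T
    lam, rot = np.linalg.eigh(whitener @ cov_a @ whitener.T)
    filters = rot.T @ whitener                  # rows ordered by ascending eigenvalue
    picks = np.r_[np.arange(n_pairs), np.arange(len(lam) - n_pairs, len(lam))]
    return filters[picks]

def csp_features(trials, filters):
    """Log of the normalized variance of each spatially filtered trial."""
    projected = filters @ trials                # (n_trials, n_filters, n_samples)
    var = projected.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))

# Toy data: class A is strong on channel 0, class B on channel 1.
rng = np.random.default_rng(0)
trials_a = rng.standard_normal((30, 3, 200)) * np.array([3.0, 1.0, 1.0])[None, :, None]
trials_b = rng.standard_normal((30, 3, 200)) * np.array([1.0, 3.0, 1.0])[None, :, None]
filters = fit_csp(trials_a, trials_b)
feat_a, feat_b = csp_features(trials_a, filters), csp_features(trials_b, filters)
print(feat_a.mean(axis=0), feat_b.mean(axis=0))
```

The first selected filter yields low variance for class A and high variance for class B, and the last filter does the opposite, so the two feature columns separate the classes in opposite directions.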

5. Classifying the test set data with LDA. Specifically, the LDA parameters are fitted on the final source domain training set, the test set data is classified, and the limb involved in the subject's motor imagery is identified.

LDA is a supervised classifier that projects the input sample data into a low-dimensional space. In this space the intra-class distance of the sample data is made as small as possible and the inter-class distance as large as possible, after which the sample data is classified.
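The projection idea behind LDA can be sketched for the two-class case. The class below is a hypothetical, minimal Fisher discriminant with a pooled covariance estimate, written only to illustrate the description; it is not the classifier configuration used in the embodiment, and the toy features stand in for CSP features.

```python
import numpy as np

class TwoClassLDA:
    """Fisher linear discriminant for labels 0 and 1 with a pooled covariance estimate."""

    def fit(self, features, labels):
        x0, x1 = features[labels == 0], features[labels == 1]
        mean0, mean1 = x0.mean(axis=0), x1.mean(axis=0)
        # Pooled within-class scatter, normalized by the degrees of freedom.
        scatter = (x0 - mean0).T @ (x0 - mean0) + (x1 - mean1).T @ (x1 - mean1)
        pooled = scatter / (len(features) - 2)
        # Projection direction: small within-class spread, large between-class gap.
        self.weights = np.linalg.solve(pooled, mean1 - mean0)
        # Threshold halfway between the projected class means, shifted by the class priors.
        self.bias = -0.5 * self.weights @ (mean0 + mean1) + np.log(len(x1) / len(x0))
        return self

    def predict(self, features):
        return (features @ self.weights + self.bias > 0).astype(int)

# Toy features: two well-separated Gaussian clouds.
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
train_y = np.repeat([0, 1], 50)
test_x = np.array([[-2.0, -2.0], [2.0, 2.0]])
clf = TwoClassLDA().fit(train_x, train_y)
print(clf.predict(test_x))  # [0 1]
```

In the method described here, `fit` would be called on features of the screened source domain training set and `predict` on features of the target subject's test trials.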

A more specific example:

The feasibility of the above method was verified using the motor imagery data set Dataset 2a of the 2008 international BCI competition (BCI Competition IV). The data were acquired as follows. Before acquisition starts, the subject sits in a comfortable chair and relaxes while waiting for the experiment to begin. A fixation cross then appears in the center of the black screen in front of the subject, accompanied by a short alert tone, reminding the subject that acquisition is about to start. After two seconds, a cue for one of four motor imagery tasks (left hand, right hand, feet or tongue) appears at random at the position of the cross in the center of the screen, and the subject performs the corresponding motor imagery. After the cue disappears, the subject rests until the next acquisition. In this way the data set collects EEG data for four different motor imagery tasks from nine healthy subjects. The EEG signals were sampled at 250 Hz, and 22 EEG channels and 3 electro-oculogram (EOG) channels were recorded. The experiment mainly considers the influence of source domain data screening on the classification of EEG signals. Therefore, only the EEG data of left-hand and right-hand motor imagery and their labels were used.

In order to better observe the influence of ETS on classification of electroencephalogram signals, the ETS is compared with the following three preprocessing methods in the experiment:

RAW: training by using all source domain data and labels thereof;

TS: a method for screening source domain samples using the Riemannian distance;

EA: an alignment method in Euclidean space can effectively reduce the difference between source domain samples.

As shown in the table, the classification accuracy obtained by training on the source domain samples screened by ETS is 11.04% higher than the accuracy obtained by training on all source domain samples, which shows that ETS benefits the classification of EEG signals. Preprocessing with ETS also improves the classification accuracy for the target subject markedly compared with the other two methods. This shows that the invention provides a better preprocessing method for motor imagery transfer learning: the difference between subjects is first reduced through alignment, the Euclidean distances between the source domain samples and the target samples are then calculated in turn, and the source domain samples at relatively short distances are selected as the final source domain sample set. The method is simple to operate and easy to understand, can greatly improve the classification accuracy of motor imagery transfer learning, and is suitable for wider use.

Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Although terms such as test set, ETS and motor imagery data are used frequently herein, the use of other terms is not excluded. These terms are used only to describe and explain the nature of the invention more conveniently; construing them as any additional limitation would be contrary to the spirit of the invention.

Claims

1. A source domain sample screening method for motor imagery transfer learning is characterized by comprising the following steps:

S1, extracting a segment of the subject's EEG signal as motor imagery data;

S5, extracting features from the EEG signals using CSP;

2. The method for screening source domain samples for motor imagery transfer learning according to claim 1, wherein the step S1 specifically includes: processing the EEG signals of the subject with an 8-30 Hz band-pass filter to remove muscle artifacts, line noise and DC drift, and extracting the EEG signals from 0.5 to 3.5 seconds after the motor imagery cue as the motor imagery data.

3. The method for screening source domain samples for motor imagery transfer learning according to claim 1, wherein the step S3 specifically includes:

S32, projecting all the motor imagery data $X_i$ onto the arithmetic mean $\bar{R}$ of the covariance matrices to obtain the aligned EEG data;

S33, after alignment, obtaining the mean covariance matrix of the n samples of the subject.

4. The method for screening source domain samples for motor imagery transfer learning according to claim 3, wherein the arithmetic mean of the covariance matrices in step S31 is specifically:

$$\bar{R} = \frac{1}{n}\sum_{i=1}^{n} X_i X_i^{T}$$

where $X_i$ is the EEG data of the i-th motor imagery trial of the subject, and n is the number of trials performed by the subject.

5. The method for screening source domain samples for motor imagery transfer learning according to claim 3, wherein the aligned EEG data of step S32 are:

$$\tilde{X}_i = \bar{R}^{-1/2} X_i$$

where $\tilde{X}_i$ represents the projection of the motor imagery data $X_i$ onto the arithmetic mean $\bar{R}$ of the covariance matrices.

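6. The method as claimed in claim 3, wherein the mean covariance matrix in step S33 is:

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{X}_i \tilde{X}_i^{T} = I$$

where $\tilde{X}_i$ is the aligned data of the i-th trial.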

7. The method for screening source domain samples for motor imagery transfer learning according to claim 1, wherein the step S4 specifically includes:

s41 setting the test set as the target sample