CN110390272B

CN110390272B - EEG signal feature dimension reduction method based on weighted principal component analysis

Info

Publication number: CN110390272B
Application number: CN201910582226.XA
Authority: CN
Inventors: 董娜; 李英杰; 常建芳; 高忠科
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-06-30
Filing date: 2019-06-30
Publication date: 2023-07-18
Anticipated expiration: 2039-06-30
Also published as: CN110390272A

Abstract

An EEG signal feature dimension reduction method based on weighted principal component analysis comprises the following steps: extracting m samples of fatigue-driving EEG signals, and dividing the samples into a training set and a test set for training to obtain an overall classification accuracy A and n different classification accuracies; taking the difference between the overall accuracy A and each of the n classification accuracies to obtain n differences; normalizing the n differences to obtain n weights; constructing a weight diagonal matrix from the n weights; writing the m EEG signal samples as an m × n matrix; multiplying the m × n matrix by the weight diagonal matrix to obtain weighted EEG signal feature data; computing and decomposing the covariance matrix of the weighted data to obtain the eigenvalues of the covariance matrix and the corresponding unit eigenvectors; selecting the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix and combining them into a mapping matrix; and thereby obtaining the dimension-reduced EEG signal feature data. The invention effectively improves classification accuracy and reduces the training time of the recognition model.

Description

EEG signal feature dimension reduction method based on weighted principal component analysis

Technical Field

The invention relates to an EEG signal feature dimension reduction method, and in particular to an EEG signal feature dimension reduction method based on weighted principal component analysis.

Background

In recent decades the number of automobiles in China has increased sharply, and traffic accidents have increased with it [1]. According to related reports, China has become one of the countries with frequent traffic accidents. Many factors cause traffic accidents, and driver fatigue is one of the main ones. A fatigued driver is distracted and thinks less clearly, so reactions slow and control of the vehicle weakens, which raises the likelihood of a traffic accident [2]. It is therefore important to detect a driver's fatigue state accurately and quickly. Fatigue detection based on physiological information is an objective means [3] that detects and identifies the driver's fatigue state from changes in physiological indices. Studies show that the fatigue state of a driver can be effectively evaluated and detected through physiological information such as body temperature, blood pressure, electrocardiogram (ECG) and electromyogram (EMG). Among these approaches, analysis of the electroencephalogram (EEG) signal is the most commonly used.

The EEG reflects the electrophysiological activity of the brain's nervous system, so the fatigue state of a driver can be detected and analyzed through EEG signals. Because EEG features are usually high-dimensional, they are difficult to process, and the high-dimensional features therefore need to be reduced in dimension. Principal component analysis (PCA) is a commonly used dimension reduction method that isolates the main influencing factors among many variables, reveals the underlying structure of the data, and simplifies complex problems.

Liu J et al. proposed a hybrid feature dimension reduction scheme for emotion classification [4]: 14 different features are extracted from the EEG, minimum-redundancy maximum-relevance (mRMR) selection maximizes the correlation between the features and the class variable while minimizing the correlation among features, and PCA then further reduces the resulting features by extracting principal components. Bousseta R et al. extracted EEG features using the continuous wavelet transform (CWT) and empirical mode decomposition (EMD) [5], reduced the feature dimension with PCA, and classified left- and right-hand motor imagery with SVM classifiers using linear and radial basis function (RBF) kernels. Sun Ying et al. proposed a PCA-based fusion algorithm for nonlinear global features and power spectral entropy [6]: nonlinear geometric features of the EEG are extracted as new emotional EEG features, combined with nonlinear attribute features such as power spectral entropy and the Hurst exponent, reduced and fused by PCA, and classified with an SVM for emotion recognition. Neshov NN et al. proposed an algorithm that identifies five mental tasks from 6-channel EEG data [7]; the main idea is to divide the original EEG signal into several frames, compute the spectrum of each frame, apply the second derivative of a Gaussian to extract features, and use PCA to reduce the feature dimension. Li Xin et al. extracted EEG data from 14 channels in the brain regions representative of 8 positive and negative emotions [8], reconstructed the delta, theta, alpha and beta rhythms by wavelet decomposition, and used PCA to fuse the wavelet features, approximate entropy and Hurst exponent features and to reduce the feature dimension. Zarei R et al. proposed a feature extraction method combining PCA with a cross-covariance technique (CCOV) [9] to extract discriminative information about mental states from EEG signals in brain-computer interface applications; a correlation-based variable selection method with best-first search is then applied to the extracted features to identify the optimal feature set characterizing the distribution of the mental state signals.

In these applications of PCA, the dimension reduction is based on data variance alone, so every feature dimension is treated equally. However, different features play different roles in the recognition process [10], and the feature dimensions should therefore not be treated equally. It follows from the above analysis that the PCA dimension reduction method still needs to be improved.

References

[1] Wang Fuwang, Wang Hong. EEG feature analysis of the fatigue state of coach drivers [J]. Chinese Journal of Scientific Instrument, 2013, 34(5): 1146-1152.

[2] Yu Xulei, Li Xiangze. Real-time fatigue driving detection system based on portable EEG data [J]. Modern Information Technology, 2018, 2(04): 45-47.

[3] She Jianfang, Liu Jiang, Li Xueying. Optimization study of a fatigue driving detection and recognition model based on random forest [J]. Automobile Utility Technology, 2018, No. 268(13): 46-50.

[4] Liu J, Meng H, Li M, et al. Emotion detection from EEG recordings based on supervised and unsupervised dimension reduction [J]. Concurrency and Computation: Practice and Experience, 2018: e4446.

[5] Bousseta R, Tayeb S, Ouakouak I E, et al. EEG efficient classification of imagined right and left hand movement using RBF kernel SVM and the joint CWT_PCA [J]. 2018.

[6] Ying S, Jianghe M A, Xueying Z. EEG emotion recognition based on nonlinear global features and spectral feature [J]. Computer Engineering and Applications, 2018.

[7] Neshov NN, Manolova AH, Draganov IR, et al. Classification of Mental Tasks from EEG Signals Using Spectral Analysis, PCA and SVM [J]. Cybernetics and Information Technologies, 2018.

[8] Li Xin, Cai Erjuan, Tian Yanxiu, et al. An improved electroencephalogram feature extraction algorithm and its use in emotion recognition [J]. Journal of Biomedical Engineering, 2017(04): 32-39+50.

[9] Zarei R, He J, Siuly S, et al. A PCA Aided Cross-Covariance Scheme for Discriminative Feature Extraction from EEG Signals [J]. Computer Methods and Programs in Biomedicine, 2017, 146: 47.

[10] Wang Yongxin, Zhang Huaxiang, Wang Shuang. Principal component analysis algorithm based on attribute weighting [J]. Journal of University of Jinan (Natural Science Edition), 2015, 29(6): 438-443.

Disclosure of Invention

The invention aims to solve the technical problem of providing an EEG signal feature dimension reduction method based on weighted principal component analysis, which can improve the fatigue driving state classification accuracy.

The technical scheme adopted by the invention is as follows: an EEG signal feature dimension reduction method based on weighted principal component analysis, comprising the steps of:

1) Extracting m samples of fatigue-driving EEG signals using an AR model, each sample containing n-dimensional feature data;

2) Dividing the m EEG signal samples into a training set and a test set, training an SVM on the training set, and classifying the test set to obtain the overall classification accuracy $A$;

3) Removing the first feature dimension from the n-dimensional feature data of each EEG signal sample to obtain (n-1)-dimensional feature data for each sample, dividing the m EEG signal samples into a training set and a test set, training an SVM on the training set, and classifying the test set to obtain the classification accuracy $A_1$; removing the second feature dimension from the n-dimensional feature data of each EEG signal sample, dividing the m EEG signal samples into a training set and a test set, and training and classifying with the remaining n-1 dimensions using the SVM to obtain the classification accuracy $A_2$; and so on, finally obtaining n classification accuracies $A_1, A_2, \ldots, A_i, \ldots, A_n$;

4) Subtracting each of the classification accuracies $A_1, A_2, \ldots, A_i, \ldots, A_n$ obtained in step 3) from the overall classification accuracy $A$ obtained in step 2) to obtain n differences; when a difference is negative, the removed i-th feature corresponding to that accuracy $A_i$ has a negative influence on classification, and when a difference is positive, the removed i-th feature corresponding to that accuracy $A_i$ has a positive influence on classification;

5) Normalizing the n differences to obtain n weights;

6) For the n weights $w_i$, $i = 1, 2, \ldots, n$, constructing the weight diagonal matrix $W_{n \times n}$:

$$W_{n \times n} = \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix}$$

7) Writing the m EEG signal samples as an m × n matrix $X_{m \times n}$:

$$X_{m \times n} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$$

where $x_1, \ldots, x_m$ are the m EEG signal samples;

8) Multiplying the m × n matrix $X_{m \times n}$ by the weight diagonal matrix $W_{n \times n}$ to obtain the weighted EEG signal feature data $Z_{m \times n}$:

$$Z_{m \times n} = X_{m \times n} W_{n \times n}$$

9) Computing the covariance matrix $C'$ of the weighted EEG signal feature data $Z_{m \times n}$;

10) Decomposing the covariance matrix $C'$ to obtain the eigenvalues of the covariance matrix and the unit eigenvectors corresponding to the eigenvalues;

11) Selecting, according to a given cumulative contribution rate, the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix, and combining them to form the mapping matrix $P'$;

12) Using the mapping matrix $P'$, obtaining the dimension-reduced EEG signal feature data $Y'$ by the following formula:

$$Y' = Z_{m \times n} P' \tag{9}$$

where $Z_{m \times n}$ is the weighted EEG signal feature data.
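As an illustration of steps 6) to 8), the following minimal NumPy sketch shows that multiplying the sample matrix by the weight diagonal matrix simply rescales each feature column by its weight. The function name `weight_features` and the demo values are hypothetical and are not part of the patent.

```python
import numpy as np

def weight_features(X, w):
    """Return Z = X @ diag(w) for an m-by-n feature matrix X and n weights w."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    if X.ndim != 2 or X.shape[1] != w.shape[0]:
        raise ValueError("need one weight per feature column")
    W = np.diag(w)   # step 6: n-by-n weight diagonal matrix
    return X @ W     # step 8: weighted feature data Z

# Hypothetical demo values: 2 samples, 3 features.
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
w_demo = np.array([1.0, 0.5, 0.0])
Z_demo = weight_features(X_demo, w_demo)
```

A weight of 0 removes the corresponding feature from the weighted data entirely, so the choice of normalization in step 5) directly controls which features can contribute to the principal components.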

The normalization in step 5) uses the following formula:

$$w = \frac{d - \min}{\max - \min}$$

where d is the value before normalization, max is the largest of the n differences, min is the smallest of the n differences, and w is the weight.
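A minimal sketch of this normalization, assuming it is the usual min-max scaling implied by the definitions of d, max, min and w. The function name `minmax_weights`, the demo differences, and the handling of the case where all differences are equal (not addressed in the text) are assumptions made for illustration.

```python
import numpy as np

def minmax_weights(diffs):
    """Map the n accuracy differences to weights in [0, 1] by min-max scaling."""
    d = np.asarray(diffs, dtype=float)
    lo, hi = d.min(), d.max()
    if hi == lo:
        # Assumption: the text does not cover this case, so fail loudly.
        raise ValueError("all differences are equal; weights are undefined")
    return (d - lo) / (hi - lo)

# Hypothetical differences A - A_i for four features.
diffs_demo = [0.02, -0.01, 0.05, 0.00]
weights_demo = minmax_weights(diffs_demo)
```

With these demo values the weights are approximately 0.5, 0, 1 and 0.167: the feature whose removal hurts accuracy most receives weight 1, and the feature with the smallest difference receives weight 0.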

The covariance matrix in step 9) is calculated by the following formula:

where $C'$ is the covariance matrix, n is the feature dimension of each sample, and $Z_{m \times n}$ is the weighted EEG signal feature data.

The covariance matrix $C'$ in step 10) is decomposed using the following formula:

$$\lambda_i u_i = C' u_i, \quad i = 1, 2, 3, \ldots, n \tag{6}$$

where $\lambda_i$ is the i-th eigenvalue of the covariance matrix and $u_i$ is the unit eigenvector corresponding to $\lambda_i$; the n eigenvalues obtained are arranged in descending order to give $\lambda_1, \lambda_2, \ldots, \lambda_n$.
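Steps 9) and 10) can be sketched with NumPy as follows. The column centering and the 1/(m-1) scaling are assumptions of this sketch, not taken from the patent's covariance formula; a different constant scale factor changes the eigenvalues by that factor but leaves the eigenvectors and the contribution rates unchanged. The function name `sorted_eig` is hypothetical.

```python
import numpy as np

def sorted_eig(Z):
    """Eigen-decompose the n-by-n covariance of Z (m samples by n features).

    Returns eigenvalues in descending order and the matching unit
    eigenvectors as columns.
    """
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)                # assumption: center each feature column
    C = Zc.T @ Zc / (Z.shape[0] - 1)       # assumption: 1/(m-1) scaling
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: symmetric input, ascending output
    order = np.argsort(eigvals)[::-1]      # reorder to descending eigenvalues
    return eigvals[order], eigvecs[:, order]

# Hypothetical weighted data: 50 samples, 4 features.
rng = np.random.default_rng(0)
Z_demo = rng.normal(size=(50, 4))
lam_demo, U_demo = sorted_eig(Z_demo)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.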

Step 11) comprises selecting the first k eigenvalues of the covariance matrix, the cumulative contribution rate of the first i eigenvalues being calculated as follows:

$$\alpha_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{n} \lambda_j}$$

According to the cumulative contribution rate, the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix are selected and combined to form the mapping matrix $P'$:

$$P' = [u_1, u_2, \ldots, u_k] \tag{8}$$

where $i \le k$, $\alpha_i$ is the cumulative contribution rate, $\lambda_j$ is the j-th eigenvalue of the covariance matrix, and $u_1, u_2, \ldots, u_k$ are the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix.
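The selection of k and the construction of the mapping matrix can be sketched as below. The sketch assumes that k is the smallest number of leading eigenvalues whose cumulative contribution rate reaches the given rate, which is the usual reading of "according to a given cumulative contribution rate". The names `choose_k` and `mapping_matrix` and the demo values are hypothetical.

```python
import numpy as np

def choose_k(eigvals, rate=0.95):
    """Return (k, alpha) where alpha[i-1] is the cumulative contribution rate
    of the first i eigenvalues and k is the smallest count reaching `rate`.

    `eigvals` must already be sorted in descending order.
    """
    lam = np.asarray(eigvals, dtype=float)
    alpha = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(alpha, rate)) + 1   # first index with alpha >= rate
    return min(k, lam.size), alpha

def mapping_matrix(eigvecs, k):
    """Stack the first k unit eigenvectors (columns) into P' = [u_1, ..., u_k]."""
    return np.asarray(eigvecs)[:, :k]

# Hypothetical eigenvalues and an identity eigenvector basis for the demo.
lam_demo = np.array([4.0, 3.0, 2.0, 1.0])
k_demo, alpha_demo = choose_k(lam_demo, rate=0.85)
P_demo = mapping_matrix(np.eye(4), k_demo)
Y_demo = np.ones((5, 4)) @ P_demo   # Y' = Z P' for a 5-by-4 stand-in Z
```

For the demo eigenvalues the cumulative rates are 0.4, 0.7, 0.9 and 1.0, so a rate of 0.85 selects k = 3 and the projected data has 3 columns.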

The EEG signal feature dimension reduction method based on weighted principal component analysis of the invention has the following beneficial effects:

First, conventional principal component analysis reduces dimension on the basis of data variance and treats every feature dimension equally, although different features play different roles in the recognition process. The invention reduces dimension on the basis of the attributes of the feature data, strengthening the features that are critical to recognition and weakening the features that contribute little to it.

Second, by reducing the dimension of the EEG feature data, the invention effectively improves classification accuracy and reduces the training time of the recognition model.

Drawings

FIG. 1 is a flow chart of a method of feature dimensionality reduction in EEG signals based on weighted principal component analysis in accordance with the present invention;

FIG. 2a is a graph of experimental results for subject 1;

FIG. 2b is a graph of experimental results for subject 2;

FIG. 2c is a graph of experimental results for subject 3;

FIG. 2d is a graph of experimental results for subject 4;

FIG. 2e is a graph of experimental results for subject 5.

Detailed Description

The EEG signal feature dimension reduction method based on weighted principal component analysis of the invention is described in detail below with reference to the embodiments and drawings.

In the method, the features are removed one at a time and the influence of each feature on the fatigue state classification performance is measured; the resulting decreases in accuracy are normalized and used as the weights of the corresponding features; finally, a weighted principal component analysis is constructed to reduce the dimension of the features.
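The remove-one-feature procedure can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM used in the method; this substitution, the function names `centroid_accuracy` and `accuracy_differences`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Stand-in classifier (nearest class centroid), NOT the SVM of the method."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dist.argmin(axis=1)]
    return float(np.mean(pred == y_te))

def accuracy_differences(X, y, train_idx, test_idx, accuracy_fn=centroid_accuracy):
    """Return the overall accuracy A and the n differences A - A_i, where A_i
    is the accuracy obtained with the i-th feature removed."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)

    def acc(cols):
        return accuracy_fn(X[np.ix_(train_idx, cols)], y[train_idx],
                           X[np.ix_(test_idx, cols)], y[test_idx])

    all_cols = np.arange(X.shape[1])
    A = acc(all_cols)
    diffs = np.array([A - acc(np.delete(all_cols, i)) for i in all_cols])
    return A, diffs

# Toy data: feature 0 separates the classes, feature 1 does not.
X_demo = np.array([[0, 0], [0, 10], [10, 1], [10, 11], [1, 4], [9, 4.5]])
y_demo = np.array([0, 0, 1, 1, 0, 1])
A_demo, diffs_demo = accuracy_differences(X_demo, y_demo, [0, 1, 2, 3], [4, 5])
```

In the toy data, removing feature 0 lowers the accuracy from 1.0 to 0.5 (difference 0.5), while removing feature 1 leaves it unchanged (difference 0), so feature 0 would receive the larger weight after normalization.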

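The EEG signal feature dimension reduction method based on weighted principal component analysis of the invention comprises the following steps:

1) Extracting m samples of fatigue-driving EEG signals using an AR model, each sample containing n-dimensional feature data;

2) Dividing the m EEG signal samples into a training set and a test set, training an SVM on the training set, and classifying the test set to obtain the overall classification accuracy $A$;

3) Removing the first feature dimension from the n-dimensional feature data of each EEG signal sample to obtain (n-1)-dimensional feature data for each sample, dividing the m EEG signal samples into a training set and a test set, training an SVM on the training set, and classifying the test set to obtain the classification accuracy $A_1$; removing the second feature dimension from the n-dimensional feature data of each EEG signal sample, dividing the m EEG signal samples into a training set and a test set, and training and classifying with the remaining n-1 dimensions using the SVM to obtain the classification accuracy $A_2$; and so on, finally obtaining n classification accuracies $A_1, A_2, \ldots, A_i, \ldots, A_n$;

4) Subtracting each of the classification accuracies $A_1, A_2, \ldots, A_i, \ldots, A_n$ obtained in step 3) from the overall classification accuracy $A$ obtained in step 2) to obtain n differences; when a difference is negative, the removed i-th feature corresponding to that accuracy $A_i$ has a negative influence on classification, and when a difference is positive, the removed i-th feature corresponding to that accuracy $A_i$ has a positive influence on classification;

5) Normalizing the n differences to obtain n weights; the normalization uses the following formula:

$$w = \frac{d - \min}{\max - \min}$$

where d is the value before normalization, max is the largest of the n differences, min is the smallest of the n differences, and w is the weight;

6) For the n weights $w_i$, $i = 1, 2, \ldots, n$, constructing the weight diagonal matrix $W_{n \times n}$:

$$W_{n \times n} = \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix}$$

7) Writing the m EEG signal samples as an m × n matrix $X_{m \times n}$:

$$X_{m \times n} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$$

where $x_1, \ldots, x_m$ are the m EEG signal samples;

8) Multiplying the m × n matrix $X_{m \times n}$ by the weight diagonal matrix $W_{n \times n}$ to obtain the weighted EEG signal feature data $Z_{m \times n}$:

$$Z_{m \times n} = X_{m \times n} W_{n \times n}$$

9) Computing the covariance matrix $C'$ of the weighted EEG signal feature data $Z_{m \times n}$; the covariance matrix is calculated by the following formula:

where $C'$ is the covariance matrix, n is the feature dimension of each sample, and $Z_{m \times n}$ is the weighted EEG signal feature data;

10) Decomposing the covariance matrix $C'$ to obtain the eigenvalues of the covariance matrix and the unit eigenvectors corresponding to the eigenvalues; the covariance matrix $C'$ is decomposed using the following formula:

$$\lambda_i u_i = C' u_i, \quad i = 1, 2, 3, \ldots, n \tag{6}$$

where $\lambda_i$ is the i-th eigenvalue of the covariance matrix and $u_i$ is the unit eigenvector corresponding to $\lambda_i$; the n eigenvalues obtained are arranged in descending order to give $\lambda_1, \lambda_2, \ldots, \lambda_n$;

11) Selecting, according to a given cumulative contribution rate, the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix, and combining them to form the mapping matrix $P'$; this comprises selecting the first k eigenvalues of the covariance matrix, the cumulative contribution rate of the first i eigenvalues being calculated as follows:

$$\alpha_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{n} \lambda_j}$$

According to the cumulative contribution rate, the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix are selected and combined to form the mapping matrix $P'$:

$$P' = [u_1, u_2, \ldots, u_k] \tag{8}$$

where $i \le k$, $\alpha_i$ is the cumulative contribution rate, $\lambda_j$ is the j-th eigenvalue of the covariance matrix, and $u_1, u_2, \ldots, u_k$ are the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix;

12) Using the mapping matrix $P'$, obtaining the dimension-reduced EEG signal feature data $Y'$ by the following formula:

$$Y' = Z_{m \times n} P' \tag{9}$$

where $Z_{m \times n}$ is the weighted EEG signal feature data.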

EEG signal acquisition in the embodiments of the invention was carried out in simulated driving experiments. Five right-handed college students aged 19 to 26 (3 men and 2 women; average age: 22.73) participated voluntarily, and none of them had any psychiatric disorder. For the two days before the experiment, the subjects were required to avoid anti-fatigue beverages and drugs that cause drowsiness. They were also required to rest properly, sleeping more than 7 hours per night. Since none of the subjects had used the driving simulator before, they practiced driving until they were proficient before the experiment.

The experiments used a PGFD001 driving simulator equipped with pedals, a steering wheel and a clutch. In the virtual driving software 3DInstructor2, an ordinary automobile, the pharton 2.0l, was used with automatic gear shifting by default. The experimental road was a highway with almost no curves. In addition, a webcam 360D618, a projector and stereo speakers were added to make the simulation more immersive.

To make fatigue more likely, the experiment was run from 14:00 to 15:30, a period that studies have shown to be prone to fatigue, and lasted about 90 minutes. The whole-scalp EEG signal of each subject was collected in an isolated and quiet room. The subject's face was also monitored through a front-facing camera to verify the degree of fatigue. Before the experiment started there were 10 minutes of scene setting and approximately 20 minutes of driving practice. After the experiment started, the subject drove continuously until reporting mild fatigue; this phase typically lasted about 30 minutes and was taken as the alert state. Then, after a 10-minute transition of continued driving, the subject drove for another 30 minutes, which was taken as the fatigue state. After this stage, each subject was asked to answer some questions and give post-experiment comments. The recording time for each subject was approximately 90 minutes, varying slightly between individuals. As the experiment progressed, the subjects became increasingly fatigued, as shown by both their subjective evaluations and their observed behavior, which are important indicators of fatigue state.

Data acquisition and data preprocessing:

To measure the driver's fatigue state, EEG was recorded with an ESI Neuroscan system with 40 electrodes arranged according to the international 10/20 system, at a sampling frequency of 200 Hz. Before acquisition, the skin impedance at each EEG electrode was brought below 5 kΩ by injecting conductive gel. During the experiment, all subjects limited unnecessary body movements as far as possible, kept a constant speed as far as possible, and avoided collisions. Of the 40 electrodes, two are defined as reference electrodes, four serve as internal structures, and four (placed in the horizontal and vertical directions) are used to monitor eye movements. The raw EEG signal is contaminated by high-frequency noise and by low-frequency noise from ocular activity, and therefore requires preprocessing with the EEGLAB toolbox. After the noise and ocular artifacts are removed, a 30-channel EEG signal is obtained.

The data collected in the first 10 min of the alert state are taken as non-fatigue data, and the data collected in the last 10 min of the fatigue state are taken as fatigue data. The data are segmented with a fixed-length sliding window of 1 second without overlap, which gives 600 windows per state and therefore 1200 samples for each subject.
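The segmentation arithmetic can be checked with a short sketch: 10 minutes at 200 Hz, cut into non-overlapping 1-second windows, gives 600 windows per state and 1200 per subject. The function name `segment` and the array layout (channels by samples) are assumptions of this sketch; zeros stand in for the recorded signals.

```python
import numpy as np

def segment(signal, fs=200, window_s=1.0):
    """Split a (channels, samples) array into non-overlapping windows.

    Returns an array of shape (n_windows, channels, samples_per_window);
    trailing samples that do not fill a whole window are dropped.
    """
    signal = np.asarray(signal)
    win = int(round(fs * window_s))
    n_win = signal.shape[1] // win
    trimmed = signal[:, :n_win * win]
    return trimmed.reshape(signal.shape[0], n_win, win).transpose(1, 0, 2)

fs = 200
ten_minutes = np.zeros((30, 10 * 60 * fs), dtype=np.float32)  # placeholder data
alert = segment(ten_minutes, fs)      # first 10 min of the alert state
fatigue = segment(ten_minutes, fs)    # last 10 min of the fatigue state
n_samples = alert.shape[0] + fatigue.shape[0]
```

Each window has shape (30, 200), that is, 30 channels by 200 time points, and is the unit from which one feature vector is extracted.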

The embodiment of the invention uses an AR model to extract features from the EEG data. The AR model is:

$$x(t) = \sum_{k=1}^{P} a(k)\, x(t-k) + e(t)$$

where $x(t)$ is the EEG data at time t, P is the AR model order, $e(t)$ is a white noise sequence, and $a(k)$ are the AR model coefficients.
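A minimal sketch of AR feature extraction, assuming the convention x(t) = Σ a(k) x(t-k) + e(t) and an ordinary least-squares fit. The text does not say which estimator was used (Burg and Yule-Walker are common alternatives), so the estimator and the names `ar_coefficients` and `ar_features` are assumptions.

```python
import numpy as np

def ar_coefficients(x, order):
    """Least-squares estimate of a(1..P) in x(t) = sum_k a(k) x(t-k) + e(t)."""
    x = np.asarray(x, dtype=float)
    # The row for time t holds [x(t-1), x(t-2), ..., x(t-P)], for t = P .. len(x)-1.
    lagged = np.column_stack([x[order - k: len(x) - k] for k in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coeffs

def ar_features(window, order):
    """Concatenate per-channel AR coefficients of a (channels, samples) window."""
    return np.concatenate([ar_coefficients(ch, order) for ch in window])

# Noise-free AR(2) series with known coefficients 0.5 and -0.3.
x_demo = [1.0, 0.5]
for _ in range(18):
    x_demo.append(0.5 * x_demo[-1] - 0.3 * x_demo[-2])
a_demo = ar_coefficients(x_demo, order=2)

# Hypothetical 30-channel, 200-point window: order 3 gives 30 * 3 = 90 features.
rng = np.random.default_rng(0)
features_demo = ar_features(rng.normal(size=(30, 200)), order=3)
```

On the noise-free demo series the fit recovers the generating coefficients, and the feature vector length equals the order multiplied by the number of channels.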

AR modeling requires choosing a model order; orders 3, 4 and 5 are used here. The number of AR features equals the AR order multiplied by the 30 EEG channels, so order 3 gives 90 features, order 4 gives 120 features, and order 5 gives 150 features.

For comparison, the embodiment of the invention also uses power spectral density (PSD) for feature extraction. The EEG comprises four frequency bands: delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz) and beta (14-30 Hz). The total power of each band, computed by trapezoidal numerical integration, is used as a feature, giving 4 features per channel and 120 features in total for the 30 EEG channels.
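The band-power features can be sketched as below. The text does not state which PSD estimator was used, so a plain periodogram is assumed here; the trapezoidal rule is written out explicitly. The names `BANDS` and `band_powers` and the 10 Hz test signal are hypothetical.

```python
import numpy as np

BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13), "beta": (14, 30)}

def band_powers(x, fs=200):
    """Total power per EEG band for one channel: periodogram + trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))   # assumption: simple periodogram
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs <= hi)
        f, p = freqs[mask], psd[mask]
        # Trapezoidal integration of the PSD over the band.
        powers[name] = float(np.sum((p[1:] + p[:-1]) / 2.0 * np.diff(f)))
    return powers

# One-second, 200 Hz test signal: a pure 10 Hz sine, which lies in the alpha band.
t_demo = np.arange(200) / 200.0
powers_demo = band_powers(np.sin(2 * np.pi * 10 * t_demo))
```

Applying `band_powers` to each of the 30 channels of a window and concatenating the four values per channel gives the 120-dimensional PSD feature vector described above.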

The EEG signal feature dimension reduction method based on weighted principal component analysis is verified using a classifier and classification metrics:

The support vector machine (SVM) is an efficient supervised binary classifier that plays an important role in classifying small-sample, nonlinear and high-dimensional data. The invention selects the SVM as the classifier, with the radial basis function (RBF) as the kernel function.

The following three evaluation metrics are used to evaluate the classification results:

Accuracy:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Sensitivity:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

TP is the number of correctly identified positive samples, that is, the number of fatigue-driving samples identified as fatigue driving; TN is the number of correctly identified negative samples, that is, the number of normal-driving samples identified as normal driving; FN is the number of positive samples that were not identified, that is, the number of fatigue-driving samples identified as normal driving; FP is the number of negative samples that were not identified, that is, the number of normal-driving samples identified as fatigue driving. Accuracy reflects the proportion of correctly classified samples among all samples, sensitivity reflects the classification accuracy on positive samples, and specificity reflects the classification accuracy on negative samples.
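The three metrics follow directly from the four counts defined above. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    Positive = fatigue driving, negative = normal driving.
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # share of fatigue samples found
        "specificity": tn / (tn + fp),   # share of normal samples found
    }

# Hypothetical counts: 50 fatigue and 50 normal test samples.
metrics_demo = classification_metrics(tp=45, tn=40, fp=10, fn=5)
```

With these counts the accuracy is 0.85, the sensitivity 0.9 and the specificity 0.8, which shows how a classifier can be better at recognizing one class than the other while the overall accuracy hides the difference.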

MATLAB experiment settings:

In the 30-channel EEG data set of the 5 subjects, 1200 samples were obtained for each subject. The invention uses an AR model of order 3, 4 and 5 to extract features from the data set, and also uses power spectral density as a feature extraction method.

In the embodiment of the invention, WPCA is used to reduce the dimension of the feature data with the cumulative contribution rate set to 0.95, and an SVM is selected as the classifier. To verify the reliability of the experimental results, 10-fold cross-validation was used: the data set was divided into 10 parts, with 1 part used as the test set and the remaining 9 parts as the training set. This was repeated 10 times so that each part served once as the test set, and the average of the 10 test results was taken as the final result.
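The 10-fold protocol can be sketched as an index generator plus an averaging loop. Shuffling before splitting and the fixed seed are assumptions of this sketch, as are the names `kfold_indices` and `cross_validate`; the `evaluate` callback stands for whatever trains on the training indices and returns a score on the test indices.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each part is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)  # assumption: shuffled split
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average evaluate(train_idx, test_idx) over the k folds."""
    scores = [evaluate(tr, te) for tr, te in kfold_indices(n_samples, k, seed)]
    return float(np.mean(scores))

# 1200 samples per subject: each fold tests on 120 and trains on 1080.
splits_demo = list(kfold_indices(1200))
mean_demo = cross_validate(1200, lambda tr, te: len(te) / 1200)
```

For 1200 samples every fold holds out 120 samples and trains on the remaining 1080.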

Claims

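1. An EEG signal feature dimension reduction method based on weighted principal component analysis, comprising the steps of:

1) Extracting m samples of fatigue-driving EEG signals using an AR model, each sample containing n-dimensional feature data;

2) Dividing the m EEG signal samples into a training set and a test set, training an SVM on the training set, and classifying the test set to obtain the overall classification accuracy $A$;

3) Removing the first feature dimension from the n-dimensional feature data of each EEG signal sample to obtain (n-1)-dimensional feature data for each sample, dividing the m EEG signal samples into a training set and a test set, training an SVM on the training set, and classifying the test set to obtain the classification accuracy $A_1$; removing the second feature dimension from the n-dimensional feature data of each EEG signal sample, dividing the m EEG signal samples into a training set and a test set, and training and classifying with the remaining n-1 dimensions using the SVM to obtain the classification accuracy $A_2$; and so on, finally obtaining n classification accuracies $A_1, A_2, \ldots, A_i, \ldots, A_n$;

4) Subtracting each of the classification accuracies $A_1, A_2, \ldots, A_i, \ldots, A_n$ obtained in step 3) from the overall classification accuracy $A$ obtained in step 2) to obtain n differences; when a difference is negative, the removed i-th feature corresponding to that accuracy $A_i$ has a negative influence on classification, and when a difference is positive, the removed i-th feature corresponding to that accuracy $A_i$ has a positive influence on classification;

5) Normalizing the n differences to obtain n weights;

6) For the n weights $w_i$, $i = 1, 2, \ldots, n$, constructing the weight diagonal matrix $W_{n \times n}$:

$$W_{n \times n} = \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix}$$

7) Writing the m EEG signal samples as an m × n matrix $X_{m \times n}$:

$$X_{m \times n} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$$

where $x_1, \ldots, x_m$ are the m EEG signal samples;

8) Multiplying the m × n matrix $X_{m \times n}$ by the weight diagonal matrix $W_{n \times n}$ to obtain the weighted EEG signal feature data $Z_{m \times n}$:

$$Z_{m \times n} = X_{m \times n} W_{n \times n}$$

9) Computing the covariance matrix $C'$ of the weighted EEG signal feature data $Z_{m \times n}$; the covariance matrix is calculated by the following formula:

where $C'$ is the covariance matrix, n is the feature dimension of each sample, and $Z_{m \times n}$ is the weighted EEG signal feature data;

10) Decomposing the covariance matrix $C'$ to obtain the eigenvalues of the covariance matrix and the unit eigenvectors corresponding to the eigenvalues; the covariance matrix $C'$ is decomposed using the following formula:

$$\lambda_i u_i = C' u_i, \quad i = 1, 2, 3, \ldots, n \tag{6}$$

where $\lambda_i$ is the i-th eigenvalue of the covariance matrix and $u_i$ is the unit eigenvector corresponding to $\lambda_i$; the n eigenvalues obtained are arranged in descending order to give $\lambda_1, \lambda_2, \ldots, \lambda_n$;

11) Selecting, according to a given cumulative contribution rate, the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix, and combining them to form the mapping matrix $P'$; this comprises selecting the first k eigenvalues of the covariance matrix, the cumulative contribution rate of the first i eigenvalues being calculated as follows:

$$\alpha_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{n} \lambda_j}$$

According to the cumulative contribution rate, the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix are selected and combined to form the mapping matrix $P'$:

$$P' = [u_1, u_2, \ldots, u_k] \tag{8}$$

where $i \le k$, $\alpha_i$ is the cumulative contribution rate, $\lambda_j$ is the j-th eigenvalue of the covariance matrix, and $u_1, u_2, \ldots, u_k$ are the unit eigenvectors corresponding to the first k eigenvalues of the covariance matrix;

12) Using the mapping matrix $P'$, obtaining the dimension-reduced EEG signal feature data $Y'$ by the following formula:

$$Y' = Z_{m \times n} P' \tag{9}$$

where $Z_{m \times n}$ is the weighted EEG signal feature data.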

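2. The method of claim 1, wherein the normalization in step 5) is performed using the following formula:

$$w = \frac{d - \min}{\max - \min}$$

where d is the value before normalization, max is the largest of the n differences, min is the smallest of the n differences, and w is the weight.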