CN109903852A

CN109903852A - User-customized intelligent epilepsy prediction method based on PCA-LDA

Info

Publication number: CN109903852A
Application number: CN201910048393.6A
Authority: CN
Inventors: 曹九稳; 胡丁寒
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2019-06-18

Abstract

The invention discloses a user-customized intelligent epilepsy prediction method based on PCA-LDA. The steps of the invention are as follows: Step 1, divide the EEG signal into classes and apply a SMOTE-based sample resampling algorithm to convert the imbalanced classification problem into a balanced classification problem; Step 2, apply a 6-level wavelet packet transform to the EEG signal and extract statistical features (SFs) and wavelet packet features (WPFs); Step 3, reduce the dimension of the high-dimensional features fused from SFs and WPFs using principal component analysis and linear discriminant analysis; Step 4, apply the deep forest algorithm to construct an epilepsy prediction model for the imbalanced sample dataset. The invention not only overcomes the imbalance of clinical EEG samples, but also addresses the long training time and low efficiency of training on high-dimensional features, and achieves accurate prediction of the ictal (seizure) and preictal (pre-seizure) periods.

Description

User-customized intelligent epileptic seizure prediction method based on PCA-LDA

Technical field

The invention belongs to the fields of EEG signal processing and smart healthcare, and relates to a user-customized intelligent epileptic seizure prediction method that fuses statistical features (SFs) and wavelet packet features (WPFs) on the basis of PCA-LDA. The invention is an intelligent monitoring algorithm for imbalanced EEG datasets.

Background art

Traditional epilepsy monitoring models are usually built from the EEG signals of a large number of epilepsy patients, and the modeling premise often assumes that ictal, preictal and interictal samples are balanced. In clinical practice, however, because epileptic seizures are sporadic and the ictal period is transient, almost every user-customized seizure prediction algorithm faces the problem of modeling an imbalanced dataset. For an individual patient, the ictal period is short compared with the interictal and preictal periods, so the corresponding EEG sample set is extremely imbalanced, and traditional prediction algorithms built on balanced datasets cannot solve this class of problem. To address this, the invention uses the SMOTE resampling method and proposes a user-customized intelligent epileptic seizure prediction method that fuses statistical features (SFs) and wavelet packet features (WPFs) on the basis of PCA-LDA; at the same time, the preictal EEG samples are divided into sub-classes, so that the preictal and ictal periods can be detected accurately.

Summary of the invention

Aimed at the sporadic nature of epileptic seizures, in which the ictal period is short and the other periods are long, the present invention uses the SMOTE resampling method to handle the sample imbalance problem, proposes a feature extraction method that fuses SFs and WPFs on the basis of PCA-LDA, and combines it with the deep forest algorithm to build an accurate prediction model of the preictal and ictal periods for imbalanced samples.

The technical solution of the present invention mainly comprises the following steps:

Step 1: divide the EEG signal into classes and apply the SMOTE (Synthetic Minority Over-sampling Technique) sample resampling method, converting the imbalanced classification problem into a balanced classification problem.

Step 2: apply a 6-level wavelet packet transform to the EEG signal and extract statistical features (SFs) and wavelet packet features (WPFs).

Step 3: reduce the dimension of the high-dimensional features fused from SFs and WPFs using principal component analysis (PCA) and linear discriminant analysis (LDA).

Step 4: use the deep forest algorithm to build an epilepsy prediction model for the imbalanced dataset.

Step 1 is carried out as follows:

1-1 Class division of the EEG signal. The ictal period is labeled as the Seizure class. Data from four hours after a seizure to four hours before the next seizure are labeled as the Interictal class. The one-hour preictal period is divided into three consecutive, non-overlapping segments: 60 to 40 minutes before seizure onset is the Pre1 class, 40 to 20 minutes before onset is the Pre2 class, and 20 to 0 minutes before onset is the Pre3 class.
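A minimal sketch of the labeling rule in 1-1, assuming seizure onset and end times are available as annotation lists in minutes. The function name `label_frame` and the boundary handling (each preictal segment excludes its earlier edge and includes its later edge) are illustrative choices, not taken from the patent; frames that fall into none of the five classes return `None` and would be left out of the dataset.

```python
def label_frame(t, onsets, offsets):
    """Assign a frame starting at time t (minutes) to one of the 5 classes.

    onsets/offsets: seizure start and end times in minutes (same length).
    Returns "Seizure", "Pre1", "Pre2", "Pre3", "Interictal", or None.
    """
    # Ictal: the frame starts inside an annotated seizure.
    for on, off in zip(onsets, offsets):
        if on <= t < off:
            return "Seizure"
    # Preictal: the hour before an onset, split into three 20-minute segments.
    for on in onsets:
        lead = on - t  # minutes remaining until this onset
        if 40 < lead <= 60:
            return "Pre1"
        if 20 < lead <= 40:
            return "Pre2"
        if 0 < lead <= 20:
            return "Pre3"
    # Interictal: at least 4 h (240 min) after every earlier seizure end
    # and at least 4 h before every later seizure onset.
    after_ok = all(t >= off + 240 for off in offsets if off <= t)
    before_ok = all(on - t >= 240 for on in onsets if on > t)
    if after_ok and before_ok:
        return "Interictal"
    return None  # transition data, excluded from training


labels = [label_frame(t, [300], [302]) for t in (0, 250, 270, 290, 301, 400, 600)]
print(labels)
# ['Interictal', 'Pre1', 'Pre2', 'Pre3', 'Seizure', None, 'Interictal']
```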

1-2 Because the Seizure-class data in the EEG signal of a single epilepsy patient are short in duration while the data of the other classes are long, the samples are imbalanced. To solve this problem, the SMOTE resampling method analyzes the Seizure-class samples and generates new minority-class samples to balance the data, thereby forming a balanced classification problem.
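The resampling in 1-2 can be sketched with a plain NumPy version of basic SMOTE, in which each synthetic sample is placed at a random point on the segment joining a minority-class sample to one of its k nearest minority-class neighbours. The function name `smote`, the choice k = 5, and applying it to feature vectors rather than raw EEG frames are assumptions for illustration; the patent does not state these details. Libraries such as imbalanced-learn also provide a maintained `SMOTE` class.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=None):
    """Return n_new synthetic minority-class samples (basic SMOTE)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances inside the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)     # randomly chosen seed samples
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


rng = np.random.default_rng(0)
X_seizure = rng.normal(size=(20, 4))          # toy Seizure-class features
X_synth = smote(X_seizure, n_new=180, seed=1)
X_balanced = np.vstack([X_seizure, X_synth])  # 200 Seizure-class samples
print(X_balanced.shape)  # (200, 4)
```

Because every synthetic point is an interpolation between two real minority samples, the new samples stay inside the region already occupied by the Seizure class instead of duplicating existing frames.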

Step 2 is carried out as follows:

2-1 Each class of signal is cut into 2 s frames, each frame serving as an input x, with a frame overlap of 50%. A 6-level wavelet packet decomposition is applied to every channel of each input frame, so the last level after decomposition contains 64 sub-bands $s_m$, m = 1, 2, ..., 64. For the 15 low-frequency sub-bands $s_{m'}$, m' = 1, 2, ..., 15, the mean, standard deviation, median, kurtosis coefficient and skewness coefficient of each band are computed to form the feature SFs. For an M-channel EEG signal, these five statistical features form a 75*M-dimensional feature vector SFs.
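The framing and decomposition in 2-1 can be sketched in NumPy as below. The patent does not name the mother wavelet or the sampling rate, so this sketch assumes the Haar wavelet (chosen only because it fits in a few lines) and a hypothetical 256 Hz rate in the usage lines. Each of the 64 level-6 sub-bands is reconstructed back to the frame length, in line with the "reconstructed coefficients" of the text, and the bands are returned in frequency order so that the first 15 are the lowest-frequency ones. The function names are hypothetical; PyWavelets offers general-purpose wavelet packets.

```python
import numpy as np

def frames(x, fs, win_s=2.0, overlap=0.5):
    """Cut a (channels, samples) array into 2 s frames with 50% overlap."""
    win = int(round(win_s * fs))
    hop = int(round(win * (1 - overlap)))
    starts = range(0, x.shape[-1] - win + 1, hop)
    return np.stack([x[..., s:s + win] for s in starts])  # (frames, ch, win)

def wavelet_packet_bands(x, level=6):
    """Haar wavelet packet of a 1-D frame.

    Returns 2**level arrays, each the reconstruction of one terminal node
    back to len(x), ordered from the lowest to the highest frequency band.
    len(x) must be divisible by 2**level.
    """
    r2 = np.sqrt(2.0)
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):                    # split every node: a (low), d (high)
        nodes = [half for c in nodes
                 for half in ((c[0::2] + c[1::2]) / r2, (c[0::2] - c[1::2]) / r2)]
    bands = []
    for n, c in enumerate(nodes):             # n is the natural (Paley) index
        for lev in range(level):              # undo the last split first
            sign = -1.0 if (n >> lev) & 1 else 1.0   # bit 1 marks a d-branch
            up = np.empty(2 * len(c))
            up[0::2] = c / r2
            up[1::2] = sign * c / r2
            c = up
        bands.append(c)
    # Frequency-ordered band f is the natural-order node gray(f) = f ^ (f >> 1).
    return [bands[f ^ (f >> 1)] for f in range(2 ** level)]


fs = 256                                      # hypothetical sampling rate (Hz)
eeg = np.random.default_rng(0).normal(size=(2, 10 * fs))  # toy 2-channel, 10 s
F = frames(eeg, fs)                           # shape (9, 2, 512)
bands = wavelet_packet_bands(F[0, 0])         # 64 arrays of length 512
```

The Gray-code reordering is needed because the high-pass branch of each split mirrors the spectrum after downsampling, so the natural node order is not the frequency order.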

2-2 The mean is calculated by the following formula:

$$\mu = \frac{1}{N}\sum_{k=1}^{N} s_{m'}(k)$$

2-3 The standard deviation is calculated by the following formula:

$$\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(s_{m'}(k) - \mu\right)^{2}}$$

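2-4 The median is calculated as follows:

Sort $s_{m'}(k)$ in ascending order as $s_{m'}^{(1)} \le s_{m'}^{(2)} \le \cdots \le s_{m'}^{(N)}$. When N is odd, the median is $s_{m'}^{((N+1)/2)}$; when N is even, the median is $\frac{1}{2}\left(s_{m'}^{(N/2)} + s_{m'}^{(N/2+1)}\right)$.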

2-5 The kurtosis coefficient is calculated by the following formula:

$$K = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k) - \mu}{\sigma}\right)^{4}$$

2-6 The skewness coefficient is calculated by the following formula:

$$S = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k) - \mu}{\sigma}\right)^{3}$$

where $s_{m'}$ (m' = 1, 2, ..., 15) denotes the wavelet packet coefficients reconstructed from the m'-th sub-band at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, μ is the mean of the m'-th sub-band coefficients, and σ is their standard deviation.
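Given the frequency-ordered sub-bands of one channel, the five statistics of 2-2 to 2-6 can be computed as follows. This sketch assumes the population (1/N) standard deviation and the non-excess kurtosis; other normalizations (1/(N-1), or kurtosis minus 3) are equally common, so the exact convention is an assumption. The function name is hypothetical, and concatenating the per-channel vectors of an M-channel frame gives the 75*M-dimensional SFs.

```python
import numpy as np

def statistical_features(bands, n_low=15):
    """Mean, std, median, kurtosis, skewness of the n_low lowest bands.

    bands: sequence of 1-D arrays in frequency order (lowest first).
    Returns a vector of length 5 * n_low (75 for the default).
    """
    feats = []
    for s in bands[:n_low]:
        s = np.asarray(s, dtype=float)
        mu, sigma = s.mean(), s.std()            # population std (1/N)
        z = (s - mu) / sigma if sigma > 0 else np.zeros_like(s)
        feats += [mu, sigma, np.median(s),
                  np.mean(z ** 4),               # kurtosis (non-excess)
                  np.mean(z ** 3)]               # skewness
    return np.array(feats)


toy_bands = [np.array([1.0, 2.0, 3.0, 4.0])] * 64  # toy stand-in for 64 bands
sf = statistical_features(toy_bands)
print(sf.shape)   # (75,)
# sf[:5] is approximately [2.5, 1.118, 2.5, 1.64, 0.0]
```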

2-7 The feature vector WPFs consists of the energy ratio and three kinds of entropy: the energy ratios extracted from the 64 sub-bands of the last level, and the Shannon entropy, log energy entropy and norm entropy extracted from the 15 low-frequency sub-bands. For an M-channel EEG signal with frame length N, the energy ratios and the three entropies form a 109*M-dimensional feature vector WPFs.

2-8 The energy ratio is calculated by the following formula:

$$E_m = \sum_{k=1}^{N} s_m(k)^{2}, \qquad ER_m = \frac{E_m}{\sum_{j=1}^{64} E_j}$$

where $s_m$ (m = 1, 2, ..., 64) denotes the wavelet packet coefficients reconstructed from the m-th sub-band coefficients at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, μ is the mean of the m-th sub-band coefficients, and σ is their standard deviation.

2-9 The Shannon entropy is calculated by the following formula:

$$H_{\mathrm{Sh}} = -\sum_{k=1}^{N} s_{m'}(k)^{2}\,\log\left(s_{m'}(k)^{2}\right)$$

2-10 The log energy entropy is calculated by the following formula:

$$H_{\mathrm{LE}} = \sum_{k=1}^{N} \log\left(s_{m'}(k)^{2}\right)$$

2-11 The norm entropy is calculated by the following formula:

$$H_{\mathrm{N}} = \sum_{k=1}^{N} \left|s_{m'}(k)\right|^{p}$$

where $s_{m'}$ (m' = 1, 2, ..., 15) denotes the wavelet packet coefficients reconstructed from the m'-th sub-band coefficients at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, and p ≥ 1 is a positive integer.
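A matching sketch for 2-7 to 2-11 is below. The entropy definitions are assumed to follow those of the legacy MATLAB `wentropy` function (Shannon: minus the sum of s² log s²; log energy: the sum of log s²; norm: the sum of |s|^p, with zero coefficients contributing nothing to the logarithmic terms), and p = 1 is a hypothetical choice of exponent, since the patent states only that p ≥ 1 is a positive integer. The function name is hypothetical; per channel the output has 64 + 3*15 = 109 entries.

```python
import numpy as np

def wavelet_packet_features(bands, n_low=15, p=1):
    """Energy ratios of all bands + 3 entropies of the n_low lowest bands.

    bands: sequence of 1-D arrays in frequency order (lowest first).
    Returns len(bands) + 3 * n_low values (64 + 45 = 109 by default).
    """
    bands = [np.asarray(s, dtype=float) for s in bands]
    energy = np.array([np.sum(s ** 2) for s in bands])
    ratio = energy / energy.sum()               # assumes a non-silent frame
    entropies = []
    for s in bands[:n_low]:
        s2 = s[s != 0] ** 2                     # drop zeros: 0*log(0) := 0
        shannon = -np.sum(s2 * np.log(s2))
        log_energy = np.sum(np.log(s2))
        norm = np.sum(np.abs(s) ** p)           # norm entropy, p >= 1
        entropies += [shannon, log_energy, norm]
    return np.concatenate([ratio, entropies])


toy_bands = [np.full(8, 0.5)] * 64               # toy stand-in for 64 bands
wpf = wavelet_packet_features(toy_bands)
print(wpf.shape)  # (109,)
```

Energy ratios use all 64 bands so that they sum to 1 over the full spectrum, while the entropies are restricted to the low-frequency bands, mirroring the split described in the text.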

Step 3 is carried out as follows:

3-1 The multi-channel feature vectors SFs and WPFs are fused into a (184*M)-dimensional feature vector, to which PCA is applied for principal component analysis and dimensionality reduction; the features within the first 90% of cumulative variance contribution are retained, yielding lower-dimensional features.
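The per-channel feature count and the 90% retention rule in 3-1 can be written out explicitly. Reading "retain the features within the first 90% of variance contribution" as a cumulative-variance threshold is an interpretation, with $\lambda_1 \ge \lambda_2 \ge \cdots$ the eigenvalues of the covariance matrix of the fused training features and $d^{*}$ the number of principal components kept:

$$\underbrace{5 \times 15}_{\text{SFs}} + \underbrace{64 + 3 \times 15}_{\text{WPFs}} = 75 + 109 = 184 \quad\Rightarrow\quad \text{fused dimension} = 184M$$

$$d^{*} = \min\left\{ d \;:\; \frac{\sum_{i=1}^{d} \lambda_i}{\sum_{i=1}^{184M} \lambda_i} \ge 0.90 \right\}$$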

3-2 LDA is applied to the lower-dimensional features obtained in 3-1 for a second dimensionality reduction. The LDA output dimension can be at most the number of classes minus one; since this classification problem has 5 sample classes, the LDA output dimension is set to 4.

3-3 Through these two dimensionality reduction operations, the high-dimensional feature vector fusing SFs and WPFs is reduced to a low dimension, yielding the low-dimensional feature that fuses SFs and WPFs on the basis of PCA-LDA.

3-4 The above feature extraction, fusion and dimensionality reduction are applied to all training samples to obtain the training-set features.
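Steps 3-1 to 3-4 map naturally onto scikit-learn, assuming that library is available: `PCA` accepts a float `n_components` between 0 and 1 as a target fraction of explained variance, and `LinearDiscriminantAnalysis` projects onto at most (number of classes - 1) = 4 axes. In this sketch the reducer is fitted on the training features only and then reused to transform test samples, as 4-2 requires. The helper name `make_pca_lda` is hypothetical and the data are random placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

def make_pca_lda(n_classes=5, variance=0.90):
    """PCA keeping `variance` of the total variance, then LDA to C - 1 dims."""
    return make_pipeline(
        PCA(n_components=variance, svd_solver="full"),
        LinearDiscriminantAnalysis(n_components=n_classes - 1),
    )


rng = np.random.default_rng(0)
M = 2                                          # toy channel count
X_train = rng.normal(size=(500, 184 * M))      # fused SFs + WPFs (placeholder)
y_train = rng.integers(0, 5, size=500)         # Seizure, Pre1-3, Interictal
reducer = make_pca_lda().fit(X_train, y_train)
Z_train = reducer.transform(X_train)           # shape (500, 4)
X_test = rng.normal(size=(10, 184 * M))
Z_test = reducer.transform(X_test)             # shape (10, 4)
```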

Step 4 is carried out as follows:

4-1 Using the deep forest algorithm with a three-level cascade, the optimal classifier model is constructed by tuning the number of random forests in each cascade level and the parameters of the decision trees in each random forest.

4-2 For any test sample, features are extracted with the above method of fusing SFs and WPFs on the basis of PCA-LDA, and the prediction result is computed with the trained classifier model.
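The cascade in 4-1 and the prediction in 4-2 can be sketched with scikit-learn forests. This is a simplified stand-in for deep forest (gcForest), not the patent's implementation: each of the three levels holds one random forest and one completely-random-style forest (`ExtraTreesClassifier` with `max_features=1`), the out-of-fold class-probability vectors of a level are appended to the original features to form the next level's input, and multi-grained scanning and automatic level growth are omitted. The class name `CascadeForest` and the tree counts are hypothetical; the `deep-forest` Python package offers a full implementation (`deepforest.CascadeForestClassifier`).

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

class CascadeForest:
    """Simplified gcForest-style cascade with a fixed number of levels."""

    def __init__(self, n_levels=3, n_estimators=100, random_state=0):
        self.n_levels = n_levels
        self.n_estimators = n_estimators
        self.random_state = random_state

    def _make_level(self):
        return [RandomForestClassifier(n_estimators=self.n_estimators,
                                       random_state=self.random_state),
                ExtraTreesClassifier(n_estimators=self.n_estimators,
                                     max_features=1,
                                     random_state=self.random_state)]

    def fit(self, X, y):
        self.levels_ = []
        aug = X
        for _ in range(self.n_levels):
            forests = self._make_level()
            # Out-of-fold class vectors limit overfitting of the next level.
            probs = [cross_val_predict(f, aug, y, cv=3, method="predict_proba")
                     for f in forests]
            for f in forests:
                f.fit(aug, y)
            self.levels_.append(forests)
            aug = np.hstack([X] + probs)   # original features + class vectors
        return self

    def predict_proba(self, X):
        aug = X
        for forests in self.levels_:
            probs = [f.predict_proba(aug) for f in forests]
            aug = np.hstack([X] + probs)
        return np.mean(probs, axis=0)      # average of the last level's forests

    def predict(self, X):
        classes = self.levels_[-1][0].classes_
        return classes[np.argmax(self.predict_proba(X), axis=1)]


rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 4))                   # toy PCA-LDA features
y = (Z[:, 0] > 0).astype(int) + (Z[:, 1] > 0)   # toy labels with 3 classes
model = CascadeForest(n_estimators=50).fit(Z, y)
pred = model.predict(Z[:5])
```

In this sketch, the number of levels, the forests per level and `n_estimators` are the knobs that 4-1 describes tuning; more levels add capacity at the cost of training time.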

The beneficial effects of the present invention are as follows:

The present invention addresses the sporadic nature of epileptic seizures and the transience of the ictal period faced by user-customized seizure monitoring systems. It uses the SMOTE resampling method, fuses SFs and WPFs on the basis of PCA-LDA, and combines them with a deep forest classifier to build an accurate epilepsy prediction model. It not only overcomes the imbalance of clinical EEG samples, but also addresses the long training time and low efficiency of training on high-dimensional features, and achieves accurate prediction of the ictal and preictal periods.

Description of the drawings

Fig. 1 is the algorithm flowchart of the invention.

Fig. 2 shows the deep forest cascade structure used in the invention.

Specific embodiment

The invention is described in detail below with reference to the accompanying drawings and specific embodiments.

As shown in Figs. 1 and 2, the implementation steps of the user-customized intelligent epileptic seizure prediction method based on PCA-LDA, i.e. the general method for predicting the ictal and preictal periods, are described in detail in the Summary of the invention; that is, the technical solution of the present invention mainly comprises the following steps:

Step 1: divide the EEG signal into classes and apply the SMOTE (Synthetic Minority Over-sampling Technique) resampling method for the small minority class, converting the imbalanced classification problem into a balanced classification problem.

Step 1 is carried out as described in sub-steps 1-1 and 1-2 of the Summary of the invention.

Step 2 is carried out as follows:

2-1 Each class of signal is cut into 2 s frames, each frame serving as an input x, with a frame overlap of 50%. A 6-level wavelet packet decomposition is applied to every channel of each input frame, so the last level after decomposition contains 64 sub-bands $s_m$, m = 1, 2, ..., 64. For the 15 low-frequency sub-bands $s_{m'}$, m' = 1, 2, ..., 15, the mean, standard deviation, median, kurtosis coefficient and skewness coefficient of each band are computed to form the feature SFs. For an M-channel EEG signal, these five statistical features form a (75*M)-dimensional feature vector SFs.

2-2 The mean is calculated by the following formula:

$$\mu = \frac{1}{N}\sum_{k=1}^{N} s_{m'}(k)$$

where $s_{m'}$ (m' = 1, 2, ..., 15) denotes the wavelet packet coefficients reconstructed from the m'-th sub-band at level 6 after wavelet packet decomposition, and N is the number of points in each sub-band.

2-3 The standard deviation is calculated by the following formula:

$$\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(s_{m'}(k) - \mu\right)^{2}}$$

where $s_{m'}$ (m' = 1, 2, ..., 15) denotes the wavelet packet coefficients reconstructed from the m'-th sub-band coefficients at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, and μ is the mean of the m'-th sub-band coefficients.

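2-4 The median is calculated as follows: sort $s_{m'}(k)$ in ascending order as $s_{m'}^{(1)} \le s_{m'}^{(2)} \le \cdots \le s_{m'}^{(N)}$. When N is odd, the median is $s_{m'}^{((N+1)/2)}$; when N is even, the median is $\frac{1}{2}\left(s_{m'}^{(N/2)} + s_{m'}^{(N/2+1)}\right)$.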

2-5 The kurtosis coefficient is calculated by the following formula:

$$K = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k) - \mu}{\sigma}\right)^{4}$$

where $s_{m'}$ (m' = 1, 2, ..., 15) denotes the wavelet packet coefficients reconstructed from the m'-th sub-band coefficients at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, μ is the mean of the m'-th sub-band coefficients, and σ is their standard deviation.

2-6 The skewness coefficient is calculated by the following formula:

$$S = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k) - \mu}{\sigma}\right)^{3}$$

2-7 The feature WPFs consists of the energy ratio and three kinds of entropy: the energy ratios extracted from the 64 sub-bands of the last level, and the Shannon entropy, log energy entropy and norm entropy extracted from the 15 low-frequency sub-bands. For an M-channel EEG signal with frame length N, the energy ratios and the three entropies form a (109*M)-dimensional feature vector WPFs.

2-8 The energy ratio is calculated by the following formula:

$$E_m = \sum_{k=1}^{N} s_m(k)^{2}, \qquad ER_m = \frac{E_m}{\sum_{j=1}^{64} E_j}$$

where $s_m$ (m = 1, 2, ..., 64) denotes the wavelet packet coefficients reconstructed from the m-th sub-band coefficients at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, μ is the mean of the m-th sub-band coefficients, and σ is their standard deviation.

2-9 The Shannon entropy is calculated by the following formula:

$$H_{\mathrm{Sh}} = -\sum_{k=1}^{N} s_{m'}(k)^{2}\,\log\left(s_{m'}(k)^{2}\right)$$

where $s_{m'}$ (m' = 1, 2, ..., 15) denotes the wavelet packet coefficients reconstructed from the m'-th sub-band coefficients at level 6 after wavelet packet decomposition, and N is the number of points in each sub-band.

2-10 The log energy entropy is calculated by the following formula:

$$H_{\mathrm{LE}} = \sum_{k=1}^{N} \log\left(s_{m'}(k)^{2}\right)$$

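2-11 The norm entropy is calculated by the following formula, where p ≥ 1 is a positive integer:

$$H_{\mathrm{N}} = \sum_{k=1}^{N} \left|s_{m'}(k)\right|^{p}$$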

Step 3 is carried out as follows:

3-1 The multi-channel SFs and WPFs features are fused into a (184*M)-dimensional feature vector, to which PCA is applied for principal component analysis and dimensionality reduction; the features within the first 90% of cumulative variance contribution are retained, yielding lower-dimensional features.

3-2 LDA is applied to the lower-dimensional features obtained in 3-1 for a second dimensionality reduction. In general, the LDA output dimension can be at most the number of classes minus one; since this classification problem has 5 sample classes, the LDA output dimension is set to 4.

3-3 Through these two dimensionality reduction operations, the high-dimensional fused SFs and WPFs features are reduced to a low dimension, yielding the low-dimensional feature that fuses SFs and WPFs on the basis of PCA-LDA.

Step 4 is carried out as described in sub-steps 4-1 and 4-2 of the Summary of the invention.

To achieve better prediction of the ictal and preictal periods, several choices and design considerations for practical application are described below, as a reference for other applications of the invention:

When processing epileptic EEG signals, this method sets each input frame to 2 seconds and the frame overlap to 50%. To address the sporadic nature of epileptic seizures and the transience of the ictal period faced by clinical user-customized monitoring systems, the invention uses the SMOTE resampling method to handle the sample imbalance problem. The statistical features extracted in steps 2-1 to 2-6 and the three entropies extracted in steps 2-9 to 2-11 are computed only on the 15 low-frequency sub-bands, because the low-frequency bands contain the most important information of the epileptic EEG signal; the energy ratios extracted in step 2-8, by contrast, are computed on all 64 sub-bands to preserve the completeness of the energy information.

In PCA dimensionality reduction, the features within the first 90% of cumulative variance contribution are retained; in the secondary LDA reduction, the reduced dimension is set to 4. The deep forest classifier finally used is configured with three cascade levels, because too few cascade levels lead to poor classification performance, while too many lead to excessively long training time.

The invention can be used to solve the problem, faced by clinical user-customized seizure monitoring systems, of the sporadic nature of epileptic seizures and the transience of the ictal period: new minority-class samples are synthesized by analyzing the existing minority-class samples, low-dimensional feature vectors are obtained by fusing SFs and WPFs on the basis of PCA-LDA, and, combined with the deep forest classification model, accurate prediction of the ictal and preictal periods can be achieved.

Claims

1. A user-customized intelligent epileptic seizure prediction method based on PCA-LDA, characterized in that it comprises the following steps:

Step 1, divide the EEG signal into classes and apply the SMOTE-based sample resampling method to convert the imbalanced classification problem into a balanced classification problem;

Step 2, apply a 6-level wavelet packet transform to the EEG signal and extract statistical features and wavelet packet features;

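Step 3, reduce the dimension of the high-dimensional features fused from SFs and WPFs using principal component analysis and linear discriminant analysis;

Step 4, use the deep forest algorithm to build an epilepsy prediction model for the imbalanced dataset.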

2. The user-customized intelligent epileptic seizure prediction method based on PCA-LDA according to claim 1, characterized in that step 1 is implemented as follows:

1-1 class division of the EEG signal: the ictal period is labeled as the Seizure class; data from four hours after a seizure to four hours before the next seizure are labeled as the Interictal class; the one-hour preictal period is divided into three consecutive, non-overlapping segments, namely 60 to 40 minutes before seizure onset as the Pre1 class, 40 to 20 minutes before onset as the Pre2 class, and 20 to 0 minutes before onset as the Pre3 class;

1-2 because the Seizure-class data in the EEG signal of a single epilepsy patient are short in duration while the data of the other classes are long, the samples are imbalanced; to solve this problem, the SMOTE resampling method analyzes the Seizure-class samples and generates new minority-class samples to balance the data, thereby forming a balanced classification problem.

3. The user-customized intelligent epileptic seizure prediction method based on PCA-LDA according to claim 2, characterized in that step 2 is implemented as follows:

2-1 each class of signal is cut into 2 s frames, each frame serving as an input x, with a frame overlap of 50%; a 6-level wavelet packet decomposition is applied to every channel of each input frame, so the last level after decomposition contains 64 sub-bands $s_m$, m = 1, 2, ..., 64; for the 15 low-frequency sub-bands $s_{m'}$, m' = 1, 2, ..., 15, the mean, standard deviation, median, kurtosis coefficient and skewness coefficient of each band are computed to form the feature SFs; for an M-channel EEG signal, these five statistical features form a 75*M-dimensional feature vector SFs;

2-2 the mean is calculated by the following formula:

$$\mu = \frac{1}{N}\sum_{k=1}^{N} s_{m'}(k)$$

2-3 the standard deviation is calculated by the following formula:

$$\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(s_{m'}(k) - \mu\right)^{2}}$$

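2-4 the median is calculated as follows: sort $s_{m'}(k)$ in ascending order as $s_{m'}^{(1)} \le s_{m'}^{(2)} \le \cdots \le s_{m'}^{(N)}$; when N is odd, the median is $s_{m'}^{((N+1)/2)}$; when N is even, the median is $\frac{1}{2}\left(s_{m'}^{(N/2)} + s_{m'}^{(N/2+1)}\right)$;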

2-5 the kurtosis coefficient is calculated by the following formula:

$$K = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k) - \mu}{\sigma}\right)^{4}$$

2-6 the skewness coefficient is calculated by the following formula:

$$S = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k) - \mu}{\sigma}\right)^{3}$$

where $s_{m'}$ (m' = 1, 2, ..., 15) denotes the wavelet packet coefficients reconstructed from the m'-th sub-band at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, μ is the mean of the m'-th sub-band coefficients, and σ is their standard deviation;

2-7 the feature vector WPFs consists of the energy ratio and three kinds of entropy, namely the energy ratios extracted from the 64 sub-bands of the last level, and the Shannon entropy, log energy entropy and norm entropy extracted from the 15 low-frequency sub-bands; for an M-channel EEG signal with frame length N, the energy ratios and the three entropies form a 109*M-dimensional feature vector WPFs;

2-8 the energy ratio is calculated by the following formula:

$$E_m = \sum_{k=1}^{N} s_m(k)^{2}, \qquad ER_m = \frac{E_m}{\sum_{j=1}^{64} E_j}$$

where $s_m$ (m = 1, 2, ..., 64) denotes the wavelet packet coefficients reconstructed from the m-th sub-band coefficients at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, μ is the mean of the m-th sub-band coefficients, and σ is their standard deviation;

2-9 the Shannon entropy is calculated by the following formula:

$$H_{\mathrm{Sh}} = -\sum_{k=1}^{N} s_{m'}(k)^{2}\,\log\left(s_{m'}(k)^{2}\right)$$

2-10 the log energy entropy is calculated by the following formula:

$$H_{\mathrm{LE}} = \sum_{k=1}^{N} \log\left(s_{m'}(k)^{2}\right)$$

2-11 the norm entropy is calculated by the following formula:

$$H_{\mathrm{N}} = \sum_{k=1}^{N} \left|s_{m'}(k)\right|^{p}$$

where $s_{m'}$ (m' = 1, 2, ..., 15) denotes the wavelet packet coefficients reconstructed from the m'-th sub-band coefficients at level 6 after wavelet packet decomposition, N is the number of points in each sub-band, and p ≥ 1 is a positive integer.

4. The user-customized intelligent epileptic seizure prediction method based on PCA-LDA according to claim 3, characterized in that step 3 is implemented as follows:

3-1 the multi-channel feature vectors SFs and WPFs are fused into a 184*M-dimensional feature vector, to which PCA is applied for principal component analysis and dimensionality reduction; the features within the first 90% of cumulative variance contribution are retained, yielding lower-dimensional features;

3-2 LDA is applied to the lower-dimensional features obtained in 3-1 for a second dimensionality reduction; the LDA output dimension is at most the number of classes minus one; in this classification problem there are 5 sample classes, so the LDA output dimension is set to 4;

3-3 through these two dimensionality reduction operations, the high-dimensional feature vector fusing SFs and WPFs is reduced to a low dimension, yielding the low-dimensional feature that fuses SFs and WPFs on the basis of PCA-LDA;

5. The user-customized intelligent epileptic seizure prediction method based on PCA-LDA according to claim 4, characterized in that step 4 is implemented as follows:

4-1 using the deep forest algorithm with a three-level cascade, the optimal classifier model is constructed by tuning the number of random forests in each cascade level and the parameters of the decision trees in each random forest;

4-2 for any test sample, features are extracted with the above method of fusing SFs and WPFs on the basis of PCA-LDA, and the prediction result is computed with the trained classifier model.