Summary of the invention
The present invention addresses the fact that epileptic seizures are sporadic, with a short ictal phase and much longer other periods. It uses the SMOTE resampling method to handle the imbalanced training set, proposes a feature extraction method that fuses SFs and WPFs by PCA-LDA, and combines it with the deep forest algorithm to build an accurate model for predicting the preictal and ictal stages of epileptic seizures from imbalanced samples.
The technical solution of the present invention mainly comprises the following steps:
Step 1: divide the EEG signal into categories and, using the SMOTE (Synthetic Minority Oversampling Technique) resampling method, convert the imbalanced-sample classification problem into a balanced-sample classification problem.
Step 2: apply a 6-level wavelet packet transform to the EEG signal and extract statistical features (SFs) and wavelet packet features (WPFs).
Step 3: reduce the dimensionality of the high-dimensional features obtained by fusing SFs and WPFs, using principal component analysis (PCA) and linear discriminant analysis (LDA).
Step 4: build the epilepsy prediction model for the data set with unequal class sizes using the deep forest algorithm.
Step 1 comprises the following specific steps:
1-1 Category division of the EEG signal. The ictal phase is labeled the Seizure class. Data from four hours after the end of one seizure to four hours before the onset of the next seizure are labeled the Interictal class. The one-hour preictal period is divided into three consecutive, non-overlapping segments: from 60 to 40 minutes before onset is the Pre1 class, from 40 to 20 minutes before onset is the Pre2 class, and from 20 to 0 minutes before onset is the Pre3 class.
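The labelling rule of step 1-1 can be illustrated with the following minimal Python sketch. It assumes that seizure onset and end times are available as (onset, end) pairs in minutes, for example from clinical annotations. The function name `label_time` and the choice to return None for data outside the five classes (such as the hours just after a seizure, or between one and four hours before an onset) are illustrative assumptions, not part of the invention.

```python
from typing import List, Optional, Tuple

# Preictal windows (name, minutes before onset: start, stop), as in step 1-1.
PREICTAL_WINDOWS = [("Pre1", 60, 40), ("Pre2", 40, 20), ("Pre3", 20, 0)]
INTERICTAL_GAP_MIN = 4 * 60  # four hours on either side of a seizure


def label_time(t: float, seizures: List[Tuple[float, float]]) -> Optional[str]:
    """Return the class of time t (minutes) given (onset, end) seizure intervals.

    Returns None for data that belongs to none of the five classes.
    """
    for onset, end in seizures:
        if onset <= t < end:
            return "Seizure"
    for onset, _ in seizures:
        for name, start_before, stop_before in PREICTAL_WINDOWS:
            if onset - start_before <= t < onset - stop_before:
                return name
    # Interictal: at least 4 h after the previous seizure ended
    # and at least 4 h before the next onset.
    prev_end = max((e for _, e in seizures if e <= t), default=None)
    next_onset = min((o for o, _ in seizures if o > t), default=None)
    if prev_end is not None and t - prev_end < INTERICTAL_GAP_MIN:
        return None
    if next_onset is not None and next_onset - t < INTERICTAL_GAP_MIN:
        return None
    return "Interictal"
```

For a single seizure annotated as (600.0, 601.5), minute 550 falls in Pre1, minute 590 in Pre3, and minute 400 is discarded because it lies between one and four hours before the onset.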
1-2 In the EEG signal of a single patient, the Seizure class covers a short time while the other classes cover much longer periods, which makes the training samples imbalanced. To solve this problem, the SMOTE resampling method analyses the Seizure-class samples and synthesizes new samples until the data are balanced, forming a balanced-sample classification problem.
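The resampling of step 1-2 can be sketched with the basic SMOTE procedure below, written with NumPy only. It assumes that SMOTE is applied to Seizure-class feature vectors (one row per frame) and that the number of synthetic samples is chosen by the caller, for example as the difference between a larger class size and the Seizure class size; the function `smote` and its `k` and `seed` parameters are hypothetical. Each synthetic sample is a random point on the segment between a Seizure-class sample and one of its k nearest Seizure-class neighbours.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Synthesize n_new minority-class samples with basic SMOTE.

    X_min must contain at least two minority-class samples (rows).
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    # Pairwise squared Euclidean distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                     # random minority samples
    pick = neighbours[base, rng.integers(0, k, size=n_new)]   # one neighbour for each
    gap = rng.random((n_new, 1))                               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])
```

In practice, the imbalanced-learn library provides an equivalent `SMOTE` class whose `fit_resample(X, y)` method returns the balanced data set.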
Step 2 comprises the following specific steps:
2-1 Each class of signal is segmented into 2 s frames with 50% frame overlap, each frame serving as an input x. Each channel of each input frame undergoes 6-level wavelet packet decomposition, so the last level contains 64 sub-bands $s_m$, m = 1, 2, ..., 64. For each of the 15 low-frequency sub-bands $s_{m'}$, m′ = 1, 2, ..., 15, the mean, standard deviation, median, kurtosis coefficient and skewness coefficient are calculated, forming the statistical features SFs. For an M-channel EEG signal, these five statistics form an SFs feature vector of 75 × M dimensions (5 × 15 = 75 per channel).
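The framing and decomposition of step 2-1 are sketched below. Assumptions: the sampling rate `fs` is a parameter (at an assumed 256 Hz, a 2 s frame has 512 samples, each of the 64 level-6 sub-bands is 2 Hz wide and holds 8 coefficients, and the 15 lowest sub-bands cover 0 to 30 Hz); the Haar wavelet is used because the invention does not name a wavelet; and the raw level-6 coefficients are returned instead of the reconstructed sub-band coefficients used in steps 2-2 to 2-11. The subtle point is ordering: a wavelet packet tree lists nodes in natural (Paley) order, and the frequency-ordered band f sits at natural position f XOR (f >> 1), the Gray code of f, which is what makes "the 15 low-frequency sub-bands" well defined. The helper names are hypothetical.

```python
import numpy as np


def frame_signal(x: np.ndarray, fs: int, frame_sec: float = 2.0, overlap: float = 0.5) -> np.ndarray:
    """Split a (channels, samples) array into overlapping frames.

    Assumes the recording is at least one frame long.
    Returns an array of shape (n_frames, channels, frame_len).
    """
    frame_len = int(round(frame_sec * fs))
    hop = int(round(frame_len * (1 - overlap)))
    n_frames = 1 + (x.shape[-1] - frame_len) // hop
    return np.stack([x[:, i * hop : i * hop + frame_len] for i in range(n_frames)])


def haar_wpd(x: np.ndarray, level: int = 6) -> np.ndarray:
    """Haar wavelet packet decomposition of a 1-D signal.

    Returns (2**level, len(x) // 2**level) coefficients with rows sorted
    from the lowest to the highest frequency band.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nxt = []
        for c in nodes:
            c = c[: len(c) // 2 * 2]                   # drop an odd trailing sample
            even, odd = c[0::2], c[1::2]
            nxt.append((even + odd) / np.sqrt(2))      # approximation (low-pass)
            nxt.append((even - odd) / np.sqrt(2))      # detail (high-pass)
        nodes = nxt
    # Natural order -> frequency order: band f is the node at position gray(f).
    return np.stack([nodes[f ^ (f >> 1)] for f in range(2**level)])
```

With PyWavelets, `pywt.WaveletPacket(x, wavelet, maxlevel=6).get_level(6, order="freq")` returns the level-6 nodes already in frequency order, and other wavelets can be substituted for Haar.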
2-2 The mean is calculated by the following formula:
$$\mu = \frac{1}{N}\sum_{k=1}^{N} s_{m'}(k)$$
2-3 The standard deviation is calculated by the following formula:
$$\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(s_{m'}(k)-\mu\right)^{2}}$$
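2-4 The median is calculated as follows. Sort $s_{m'}(k)$ in ascending order as $s_{m'}^{(1)} \le s_{m'}^{(2)} \le \cdots \le s_{m'}^{(N)}$. When N is odd, the median is
$$\mathrm{Med} = s_{m'}^{((N+1)/2)}$$
and when N is even, the median is
$$\mathrm{Med} = \frac{1}{2}\left(s_{m'}^{(N/2)} + s_{m'}^{(N/2+1)}\right)$$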
2-5 The kurtosis coefficient is calculated by the following formula:
$$\mathrm{Kurt} = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k)-\mu}{\sigma}\right)^{4}$$
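2-6 The skewness coefficient is calculated by the following formula:
$$\mathrm{Skew} = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k)-\mu}{\sigma}\right)^{3}$$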
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, N denotes the number of points in each sub-band, μ denotes the mean of the coefficients of the m′-th sub-band, and σ denotes their standard deviation.
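The five statistics of steps 2-2 to 2-6 for one channel can be computed as follows. The sketch assumes `bands` holds the level-6 sub-band coefficients of one channel with rows in ascending frequency order; it uses the population standard deviation (1/N) and the non-excess kurtosis, and guards constant sub-bands against division by zero. The name `statistical_features` is hypothetical.

```python
import numpy as np


def statistical_features(bands: np.ndarray, n_low: int = 15) -> np.ndarray:
    """SFs for one channel: mean, std, median, kurtosis and skewness of
    each of the n_low lowest-frequency sub-bands (rows of `bands`).

    Returns a vector of length 5 * n_low (75 for n_low = 15), ordered
    band by band as [mean, std, median, kurtosis, skewness].
    """
    s = bands[:n_low].astype(float)                  # (n_low, N) coefficients
    mu = s.mean(axis=1, keepdims=True)               # mean
    sigma = s.std(axis=1, keepdims=True)             # population std (1/N)
    med = np.median(s, axis=1, keepdims=True)        # averages the middle pair if N is even
    z = (s - mu) / np.where(sigma > 0, sigma, 1.0)   # standardized coefficients
    kurt = (z**4).mean(axis=1, keepdims=True)        # kurtosis (not excess)
    skew = (z**3).mean(axis=1, keepdims=True)        # skewness
    return np.concatenate([mu, sigma, med, kurt, skew], axis=1).ravel()
```

Concatenating the vectors of the M channels gives the 75 × M-dimensional SFs.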
2-7 The feature vector WPFs consists of the energy ratio and three kinds of entropy: the energy ratios extracted from the 64 sub-bands of the last level, and the Shannon entropy, log-energy entropy and norm entropy extracted from the 15 low-frequency sub-bands. For an M-channel EEG signal, the energy ratios and the three entropies form a WPFs feature vector of 109 × M dimensions (64 + 3 × 15 = 109 per channel).
2-8 The energy ratio is calculated by the following formula:
$$E_m = \sum_{k=1}^{N} s_m(k)^{2}, \qquad R_m = \frac{E_m}{\sum_{j=1}^{64} E_j}$$
where $s_m$, m = 1, 2, ..., 64, denotes the wavelet packet coefficients obtained by reconstructing the m-th sub-band of level 6 after wavelet packet decomposition, N denotes the number of points in each sub-band, and $E_m$ denotes the energy of the m-th sub-band.
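2-9 The Shannon entropy is calculated by the following formula (with the convention 0 · log 0 = 0):
$$H_{\mathrm{Sh}} = -\sum_{k=1}^{N} s_{m'}(k)^{2}\,\log s_{m'}(k)^{2}$$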
2-10 The log-energy entropy is calculated by the following formula (with the convention log 0 = 0):
$$H_{\mathrm{LE}} = \sum_{k=1}^{N} \log s_{m'}(k)^{2}$$
2-11 The norm entropy is calculated by the following formula:
$$H_{\mathrm{N}} = \sum_{k=1}^{N} \left|s_{m'}(k)\right|^{p}$$
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, N denotes the number of points in each sub-band, and p ≥ 1 is an integer.
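Likewise, the 109 wavelet packet features of steps 2-7 to 2-11 for one channel can be sketched as below. Assumptions: rows of `bands` are in ascending frequency order, 0 · log 0 and log 0 are treated as 0 (the usual convention for wavelet entropies), the frame is not identically zero, and the norm-entropy exponent defaults to p = 2 because the invention only requires an integer p ≥ 1. The name `wavelet_packet_features` is hypothetical.

```python
import numpy as np


def wavelet_packet_features(bands: np.ndarray, n_low: int = 15, p: int = 2) -> np.ndarray:
    """WPFs for one channel: energy ratio of every sub-band, followed by the
    Shannon, log-energy and norm entropies of the n_low lowest sub-bands.

    `bands` is (n_bands, N) with rows in frequency order (low to high);
    for 64 bands and n_low = 15 the result has 64 + 3 * 15 = 109 entries.
    """
    s = bands.astype(float)
    sq = s**2
    energy = sq.sum(axis=1)
    ratio = energy / energy.sum()                     # energy ratio R_m

    low = sq[:n_low]
    # log is evaluated only where s^2 > 0; zero coefficients contribute 0.
    safe_log = np.log(np.where(low > 0, low, 1.0))
    shannon = -(low * safe_log).sum(axis=1)           # -sum s^2 log s^2
    log_energy = safe_log.sum(axis=1)                 # sum log s^2
    norm = (np.abs(s[:n_low]) ** p).sum(axis=1)       # sum |s|^p
    return np.concatenate([ratio, shannon, log_energy, norm])
```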
Step 3 comprises the following specific steps:
3-1 Fuse the multichannel feature vectors SFs and WPFs into a high-dimensional feature vector of 184 × M dimensions (75 × M + 109 × M), and apply PCA to it for principal component analysis and dimensionality reduction, retaining the leading components whose cumulative variance contribution ratio reaches 90%. This yields a lower-dimensional feature.
3-2 Apply LDA to the lower-dimensional feature obtained in 3-1 for a second dimensionality reduction. The LDA output dimension can be at most the number of classes minus one. In this classification problem there are 5 classes, so the LDA output dimension is set to 4.
3-3 Through these two reduction operations, the high-dimensional feature vector fusing SFs and WPFs is reduced to a low-dimensional one, giving the low-dimensional feature based on PCA-LDA fusion of SFs and WPFs.
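The two-stage reduction of steps 3-1 to 3-4 can be sketched in NumPy as a small fit/transform class, so that the projection learned on the training set is reused unchanged for test samples. PCA keeps the smallest number of components whose cumulative variance ratio reaches 90%; LDA then whitens the within-class scatter and keeps the leading eigenvectors of the between-class scatter, capped at the number of classes minus one (4 for the five classes Seizure, Interictal, Pre1, Pre2 and Pre3). The class name `PcaLda` and the small ridge term that keeps the within-class scatter invertible are assumptions of this sketch.

```python
import numpy as np


class PcaLda:
    """PCA keeping a cumulative variance ratio, followed by LDA to n_lda dims."""

    def __init__(self, var_ratio: float = 0.90, n_lda: int = 4, ridge: float = 1e-6):
        self.var_ratio, self.n_lda, self.ridge = var_ratio, n_lda, ridge

    def fit(self, X: np.ndarray, y: np.ndarray) -> "PcaLda":
        # PCA via SVD of the centred data.
        self.mean_ = X.mean(axis=0)
        _, sv, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        cum = np.cumsum(sv**2) / np.sum(sv**2)
        k = min(int(np.searchsorted(cum, self.var_ratio)) + 1, len(sv))
        self.pca_ = vt[:k].T                                  # (d, k) projection
        Z = (X - self.mean_) @ self.pca_

        # LDA: within-class scatter Sw and between-class scatter Sb.
        classes = np.unique(y)
        mu = Z.mean(axis=0)
        Sw = np.zeros((k, k))
        Sb = np.zeros((k, k))
        for c in classes:
            Zc = Z[y == c]
            centred = Zc - Zc.mean(axis=0)
            diff = Zc.mean(axis=0) - mu
            Sw += centred.T @ centred
            Sb += len(Zc) * np.outer(diff, diff)
        Sw += self.ridge * np.trace(Sw) / k * np.eye(k)       # keep Sw invertible
        # Whiten with Sw^(-1/2), then diagonalise the whitened Sb.
        w_val, w_vec = np.linalg.eigh(Sw)
        W = w_vec @ np.diag(1.0 / np.sqrt(w_val)) @ w_vec.T
        b_val, b_vec = np.linalg.eigh(W @ Sb @ W)
        n_out = min(self.n_lda, len(classes) - 1, k)          # at most C - 1 directions
        self.lda_ = W @ b_vec[:, np.argsort(b_val)[::-1][:n_out]]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) @ self.pca_ @ self.lda_
```

A library alternative is scikit-learn's `PCA(n_components=0.90)` followed by `LinearDiscriminantAnalysis(n_components=4)`.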
3-4 Apply the above feature extraction, fusion and dimensionality reduction to all training samples to obtain the training-set features.
Step 4 comprises the following specific steps:
4-1 Using the deep forest algorithm with three cascade levels, construct the optimal classifier model by tuning the number of random forests in each cascade level and the decision-tree parameters of each random forest.
4-2 For any test sample, extract features with the above method based on PCA-LDA fusion of SFs and WPFs, and compute the prediction with the trained classifier model.
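Step 4 can be illustrated with the cascade part of a deep forest (gcForest-style), built here from scikit-learn forests; multi-grained scanning is omitted. At each of the three levels, every forest outputs a class-probability vector, computed out-of-fold on the training set to limit overfitting, and these vectors are appended to the original features for the next level; the last level's vectors are averaged and the most probable class is predicted. The class `CascadeForest`, the alternation of random forests with extra-trees forests, and the parameters `n_forests`, `n_trees` and `cv` are assumptions standing in for the number of random forests per cascade level and the decision-tree parameters that step 4-1 tunes.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


class CascadeForest:
    """Minimal cascade forest: each level's class vectors are appended to the
    original features and passed to the next level."""

    def __init__(self, n_levels=3, n_forests=2, n_trees=100, cv=3, seed=0):
        self.n_levels, self.n_forests = n_levels, n_forests
        self.n_trees, self.cv, self.seed = n_trees, cv, seed

    def _make_forests(self, level):
        forests = []
        for i in range(self.n_forests):
            # Alternate ordinary random forests with extra-trees forests.
            cls = RandomForestClassifier if i % 2 == 0 else ExtraTreesClassifier
            forests.append(cls(n_estimators=self.n_trees,
                               random_state=self.seed + 10 * level + i))
        return forests

    def fit(self, X, y):
        self.levels_ = []
        feats = X
        for level in range(self.n_levels):
            forests = self._make_forests(level)
            # Out-of-fold class probabilities become the next level's extra features.
            probas = [cross_val_predict(f, feats, y, cv=self.cv, method="predict_proba")
                      for f in forests]
            for f in forests:
                f.fit(feats, y)
            self.levels_.append(forests)
            feats = np.hstack([X] + probas)
        self.classes_ = forests[0].classes_
        return self

    def predict_proba(self, X):
        feats = X
        for level, forests in enumerate(self.levels_):
            probas = [f.predict_proba(feats) for f in forests]
            if level < len(self.levels_) - 1:
                feats = np.hstack([X] + probas)
        return np.mean(probas, axis=0)  # average the last level's class vectors

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

For step 4-2, a test frame is passed through the same feature extraction and the fitted PCA-LDA projection before calling `predict`.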
The beneficial effects of the present invention are as follows:
The present invention addresses the problem, faced by patient-specific seizure monitoring systems, that epileptic seizures are sporadic and the ictal phase is transient. Using the SMOTE resampling method, PCA-LDA fusion of SFs and WPFs, and a deep forest classifier, it builds an accurate epilepsy prediction model that not only overcomes the imbalance of clinical EEG training samples but also avoids the long training time and low efficiency caused by high-dimensional features, while achieving accurate prediction of both the ictal and the preictal phases.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in Figs. 1 and 2, the implementation steps of the PCA-LDA-based patient-specific intelligent epilepsy prediction method for the ictal and preictal phases are described in detail in the Summary of the invention; that is, the technical solution of the present invention mainly comprises the following steps:
Step 1: divide the EEG signal into categories and, using the SMOTE (Synthetic Minority Oversampling Technique) minority-sample resampling method, convert the imbalanced-sample classification problem into a balanced-sample classification problem.
Step 2: apply a 6-level wavelet packet transform to the EEG signal and extract statistical features (SFs) and wavelet packet features (WPFs).
Step 3: reduce the dimensionality of the high-dimensional features obtained by fusing SFs and WPFs, using principal component analysis (PCA) and linear discriminant analysis (LDA).
Step 4: build the epilepsy prediction model for the data set with unequal class sizes using the deep forest algorithm.
Step 1 comprises the following specific steps:
1-1 Category division of the EEG signal. The ictal phase is labeled the Seizure class. Data from four hours after the end of one seizure to four hours before the onset of the next seizure are labeled the Interictal class. The one-hour preictal period is divided into three consecutive, non-overlapping segments: from 60 to 40 minutes before onset is the Pre1 class, from 40 to 20 minutes before onset is the Pre2 class, and from 20 to 0 minutes before onset is the Pre3 class.
1-2 In the EEG signal of a single patient, the Seizure class covers a short time while the other classes cover much longer periods, which makes the training samples imbalanced. To solve this problem, the SMOTE resampling method analyses the Seizure-class samples and synthesizes new samples until the data are balanced, forming a balanced-sample classification problem.
Step 2 comprises the following specific steps:
2-1 Each class of signal is segmented into 2 s frames with 50% frame overlap, each frame serving as an input x. Each channel of each input frame undergoes 6-level wavelet packet decomposition, so the last level contains 64 sub-bands $s_m$, m = 1, 2, ..., 64. For each of the 15 low-frequency sub-bands $s_{m'}$, m′ = 1, 2, ..., 15, the mean, standard deviation, median, kurtosis coefficient and skewness coefficient are calculated, forming the statistical features SFs. For an M-channel EEG signal, these five statistics form an SFs feature vector of 75 × M dimensions.
2-2 The mean is calculated by the following formula:
$$\mu = \frac{1}{N}\sum_{k=1}^{N} s_{m'}(k)$$
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, and N denotes the number of points in each sub-band.
2-3 The standard deviation is calculated by the following formula:
$$\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(s_{m'}(k)-\mu\right)^{2}}$$
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, N denotes the number of points in each sub-band, and μ denotes the mean of the coefficients of the m′-th sub-band.
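2-4 The median is calculated as follows. Sort $s_{m'}(k)$ in ascending order as $s_{m'}^{(1)} \le s_{m'}^{(2)} \le \cdots \le s_{m'}^{(N)}$. When N is odd, the median is
$$\mathrm{Med} = s_{m'}^{((N+1)/2)}$$
and when N is even, the median is
$$\mathrm{Med} = \frac{1}{2}\left(s_{m'}^{(N/2)} + s_{m'}^{(N/2+1)}\right)$$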
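2-5 The kurtosis coefficient is calculated by the following formula:
$$\mathrm{Kurt} = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k)-\mu}{\sigma}\right)^{4}$$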
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, N denotes the number of points in each sub-band, μ denotes the mean of the coefficients of the m′-th sub-band, and σ denotes their standard deviation.
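2-6 The skewness coefficient is calculated by the following formula:
$$\mathrm{Skew} = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{s_{m'}(k)-\mu}{\sigma}\right)^{3}$$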
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, N denotes the number of points in each sub-band, μ denotes the mean of the coefficients of the m′-th sub-band, and σ denotes their standard deviation.
2-7 The feature WPFs consists of the energy ratio and three kinds of entropy: the energy ratios extracted from the 64 sub-bands of the last level, and the Shannon entropy, log-energy entropy and norm entropy extracted from the 15 low-frequency sub-bands. For an M-channel EEG signal, the energy ratios and the three entropies form a WPFs feature vector of 109 × M dimensions.
2-8 The energy ratio is calculated by the following formula:
$$E_m = \sum_{k=1}^{N} s_m(k)^{2}, \qquad R_m = \frac{E_m}{\sum_{j=1}^{64} E_j}$$
where $s_m$, m = 1, 2, ..., 64, denotes the wavelet packet coefficients obtained by reconstructing the m-th sub-band of level 6 after wavelet packet decomposition, N denotes the number of points in each sub-band, and $E_m$ denotes the energy of the m-th sub-band.
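2-9 The Shannon entropy is calculated by the following formula (with the convention 0 · log 0 = 0):
$$H_{\mathrm{Sh}} = -\sum_{k=1}^{N} s_{m'}(k)^{2}\,\log s_{m'}(k)^{2}$$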
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, and N denotes the number of points in each sub-band.
2-10 The log-energy entropy is calculated by the following formula (with the convention log 0 = 0):
$$H_{\mathrm{LE}} = \sum_{k=1}^{N} \log s_{m'}(k)^{2}$$
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, and N denotes the number of points in each sub-band.
2-11 The norm entropy is calculated by the following formula:
$$H_{\mathrm{N}} = \sum_{k=1}^{N} \left|s_{m'}(k)\right|^{p}$$
where $s_{m'}$, m′ = 1, 2, ..., 15, denotes the wavelet packet coefficients obtained by reconstructing the m′-th sub-band of level 6 after wavelet packet decomposition, N denotes the number of points in each sub-band, and p ≥ 1 is an integer.
Step 3 comprises the following specific steps:
3-1 Fuse the multichannel SFs and WPFs features into a high-dimensional feature vector of 184 × M dimensions, and apply PCA to it for principal component analysis and dimensionality reduction, retaining the leading components whose cumulative variance contribution ratio reaches 90%. This yields a lower-dimensional feature.
3-2 Apply LDA to the lower-dimensional feature obtained in 3-1 for a second dimensionality reduction. The LDA output dimension can be at most the number of classes minus one. In this classification problem there are 5 classes, so the LDA output dimension is set to 4.
3-3 Through these two reduction operations, the high-dimensional fused SFs and WPFs feature is reduced to a low-dimensional one, giving the low-dimensional feature based on PCA-LDA fusion of SFs and WPFs.
3-4 Apply the above feature extraction, fusion and dimensionality reduction to all training samples to obtain the training-set features.
Step 4 comprises the following specific steps:
4-1 Using the deep forest algorithm with three cascade levels, construct the optimal classifier model by tuning the number of random forests in each cascade level and the decision-tree parameters of each random forest.
4-2 For any test sample, extract features with the above method based on PCA-LDA fusion of SFs and WPFs, and compute the prediction with the trained classifier model.
To achieve better prediction of the ictal and preictal phases, several choices and design considerations from practical application are described below as a reference for other applications of the invention:
When processing epileptic EEG signals, this method sets each input frame to 2 seconds with a frame overlap of 50%.
Because patient-specific seizure monitoring systems used clinically face sporadic seizures and a transient ictal phase, the present invention uses the SMOTE resampling method to handle the imbalanced training set. The statistical features extracted in steps 2-1 to 2-6 and the three entropies extracted in steps 2-9 to 2-11 are computed only on the 15 low-frequency sub-bands, because the low-frequency bands contain the main information of the epileptic EEG signal, whereas the energy ratio of step 2-8 is computed on all 64 sub-bands to preserve the completeness of the energy information.
In the PCA reduction, the leading components accounting for 90% of the cumulative variance are retained; in the second reduction by LDA, the output dimension is set to 4. The deep forest classifier finally used has three cascade levels as a compromise: too few cascade levels can lead to poor classification, while too many make the training time too long.
The proposed invention can be used to solve the problem, faced by clinical patient-specific seizure monitoring systems, that epileptic seizures are sporadic and the ictal phase is transient. By analysing the minority-class samples and synthesizing new minority-class samples, obtaining low-dimensional feature vectors with the method based on PCA-LDA fusion of SFs and WPFs, and combining them with a deep forest classification model, accurate prediction of the ictal and preictal phases can be achieved.