WO2020061971A1 - Epilepsy brain wave state detection method based on machine learning
- Publication number
- WO2020061971A1 (application PCT/CN2018/108154)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- frequency
- brain wave
- sample
- data set
- Prior art date
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
- A61B5/316—Modalities, i.e. specific diagnostic methods
- A61B5/369—Electroencephalography [EEG]
- A61B5/372—Analysis of electroencephalograms
Definitions
- Step 1: Data import. Import the brain wave data of epilepsy patients and label each record with its state.
- This step includes:
- Step 1.1: Import the brain wave data as a data set M = {m_1, m_2, ..., m_r}, in which each row is a patient's time-domain signal over one sampling period and each column is the time-domain value obtained under one pulse.
- Here r is the number of samples in the data set, m_i = (m_i1, m_i2, ..., m_is), and s is the number of data points in a brain wave sample, i.e. the number of columns.
- Step 2: Normalization. Choose a suitable new maximum and minimum, and map the EEG time-domain signal data to a smaller value interval by min-max normalization.
- This step includes:
- Step 2.1: Obtain the maximum value max and the minimum value min of the brain wave data set.
- Step 2.2: Set the new maximum value new_max and the new minimum value new_min as required, and transform each value v in the data set to v' with the normalization formula:
- $$v' = \frac{v - \mathit{min}}{\mathit{max} - \mathit{min}}\,(\mathit{new\_max} - \mathit{new\_min}) + \mathit{new\_min}$$
- Step 3: Time-to-frequency-domain conversion. Apply the fast Fourier transform to each time-domain brain wave record, and compute the power spectrum from the resulting amplitudes.
- This step includes:
- Step 3.1: Convert each time-domain sample m_t = (m_t1, m_t2, ..., m_ts) to the frequency domain with the fast Fourier transform; each resulting value is a complex number of the form a_jk + i b_jk, and the dimension of the data set changes accordingly.
- Step 3.2: Compute the power spectrum from the amplitude and phase values of the frequency spectrum: for each value a_jk + i b_jk, compute the corresponding power spectrum value c_jk.
- The power spectrum value c_jk then replaces the original value a_jk + i b_jk, converting the data set from M to C.
- Step 4: Frequency range selection. Select a suitable low-frequency band to replace the original frequency-domain signal, removing high-frequency noise.
- This step includes:
- Step 4.2: Find the minimum value P such that every sample X_t satisfies the threshold condition of formula (10).
- R is a user-specified threshold with a recommended range of [0.9, 1); the intent is to represent the original sample by a small number of low-frequency components and to discard the high-frequency noise in the EEG signal.
- Step 4.3: Using the P obtained from formula (10), keep only columns 1 to P of each original sample in place of the original data, thereby removing the noise.
- Step 5: Linear adaptive dimensionality reduction of the frequency-domain signal. Reduce the dimension of the data with the linear adaptive dimensionality reduction technique so that classification can be performed effectively, as shown in FIG. 2.
- This step includes:
- Step 5.1: Determine the adaptive final neighborhood points of each sample point X_i, which mainly includes:
- Step 5.1.1: Run the K-Means algorithm based on Euclidean distance on the data set; the resulting clusters are used when selecting the final neighborhood points.
- The main steps are as follows. First, k cluster centers are chosen at random. Each sample is assigned to the nearest cluster center by Euclidean distance, which completes the first assignment. The center of each cluster is then recomputed from the samples assigned to it, and the samples are reassigned to the nearest cluster by Euclidean distance; this repeats until the cluster centers no longer change or the number of iterations reaches a set threshold.
- Step 5.1.2: Select k initial neighborhood points for each sample point of the data set, and denote the initial neighborhood points of the sample point X_i as X_i1, X_i2, ..., X_ik, where k is the preset number of initial neighborhood points.
- Step 5.1.3: For the sample point X_i, compute the mean distance between X_i and its initial neighborhood points X_i1, X_i2, ..., X_ik:
- $$DM_i = \frac{1}{k}\sum_{j=1}^{k} D_{ij}$$
- where D_ij is the Euclidean distance between the sample point X_i and its neighborhood point X_ij.
- Step 5.1.4: Among the initial neighborhood points, select the final neighborhood points of each sample point X_i by the following rule: if the distance D_ij between the initial neighborhood point X_ij and the sample point X_i is less than the mean distance DM_i, or if X_i and X_ij belong to the same cluster, then X_ij is a final neighborhood point of X_i; otherwise it is not.
- Step 5.2: Reconstruct X_i as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix W of the samples so that the reconstruction cost error ε(W) is minimized.
- The weight vector of sample X_i is w_i = (ω_i1, ω_i2, ..., ω_in)^T, and W = (w_1, w_2, ..., w_n).
- Step 5.3: Use the local embedding weights W to solve for the optimal mapping into the low-dimensional space, so that the embedding cost error ε(Y) in the p-dimensional space is minimized.
- Here Z_i is the embedded representation of X_i in the low-dimensional space, Z = {Z_1, Z_2, ..., Z_n}, F = (I_n - W)^T (I_n - W), tr denotes the trace of a matrix, n is the total number of samples in the data set, and I_n is the n-th order identity matrix, whose main-diagonal elements are 1 and whose other elements are 0.
- Subject to a normalization constraint on Z, the problem reduces to finding the smallest non-zero eigenvalues of the matrix F.
- Let u_1, u_2, ..., u_P be the eigenvectors corresponding to the P non-zero eigenvalues of F arranged in ascending order; the eigenvectors of the first p (smallest) non-zero eigenvalues are selected.
- In this way, the P-dimensional data set from step 4 becomes a p-dimensional data set through linear adaptive dimensionality reduction.
- Step 6: Build the support vector machine classification model. Use a support vector machine classifier to build a prediction model on the training data set.
- The classification margin equals 2 / ‖W‖, so constructing the optimal hyperplane is converted into a constrained minimization problem in W that maximizes this margin.
- a = (a_1, a_2, ..., a_l), where each a_i > 0 is a Lagrange multiplier; solving the problem yields the optimal weight vector W and the optimal offset b.
- Step 7: Epilepsy state classification and prediction. Use the established classification model to predict the state of brain waves whose state is unknown.
- This step includes:
- Step 7.1: Import the unlabeled epilepsy brain wave data set.
- Step 7.2: Transform and process the data set according to steps 2, 3, 4 and 5 in sequence.
- Step 7.3: Use the support vector machine model built in step 6 to classify and predict the state of the processed epilepsy brain wave data.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Psychology (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Engineering & Computer Science (AREA)
- Psychiatry (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Complex Calculations (AREA)
Abstract
A machine-learning-based method for detecting the brain wave state of epilepsy. The method comprises the following steps: data import: importing brain wave data of an epilepsy patient and labeling its state; normalization: setting a suitable new maximum value and new minimum value, and mapping the brain wave time-domain signal data to a smaller value interval by a normalization transform; time-to-frequency-domain conversion: applying the fast Fourier transform to each brain wave time-domain record and computing its power spectrum from the resulting amplitudes; frequency range selection: selecting a suitable low-frequency band to replace the original frequency-domain signal, removing high-frequency noise; linear adaptive dimension reduction of the frequency-domain signal: reducing the dimension of the data with a linear adaptive dimension reduction technique so that classification can be performed effectively; establishment of a support vector machine classification and prediction model: using a support vector machine classifier to build a prediction model on a training data set; and epilepsy state classification and prediction: using the established model to classify and predict the state of a brain wave whose state is unknown.
Description
The invention belongs to the technical field of machine learning and medical diagnosis, and in particular relates to a machine-learning-based method for detecting the brain wave state of epilepsy.
Electroencephalographic (EEG) activity is the spontaneous, rhythmic potential change generated by neurons in the cerebral cortex; it is the overall reflection, at the cerebral cortex or the scalp surface, of the electrophysiological activity of brain nerve cells. EEG signals contain a large amount of physiological and disease information, and different physiological states have different EEG signal characteristics, so EEG signals provide a diagnostic basis for detecting the state of epilepsy. With the rapid development of science and technology, human society has begun to enter a revolutionary era in which massive data serve as the foundation and information technology is used for knowledge creation and information mining; the entry of information technology into the medical industry is a need of the times.
In medical diagnosis, using data mining to detect the state of epileptic brain waves has become a major focus of the medical industry. The aim is to build a predictive model from the brain wave data of epilepsy patients during seizure and inter-seizure periods to assist doctors in diagnosing the state. The main methods for analyzing epileptic brain waves are time-domain analysis, frequency-domain analysis and time-frequency analysis. Time-domain analysis directly extracts EEG-like characteristic waves for observation; time-frequency analysis combines time-domain and frequency-domain signals to analyze the brain wave signal comprehensively. In frequency-domain analysis, power spectrum estimation is the main tool: it transforms the brain wave, whose amplitude varies over time, into a spectrum of EEG power as a function of frequency, so that the distribution and variation of EEG rhythms can be observed directly.
In the frequency dimensionality reduction step of frequency-domain analysis, the power spectrum is usually summed in groups of 10 consecutive frequencies. For example, with 50 columns of frequency data, columns 1 to 10 are summed into one group, and so on, which reduces the data from 50 columns to 5 columns; this achieves dimensionality reduction and improves the efficiency of subsequent computation. However, such a crude method directly sums consecutive power spectrum data and ignores the relationships among the frequency components, so information specific to individual samples is lost and the predictive performance of epilepsy state detection ultimately degrades.
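The grouping described above can be sketched as follows. This is only an illustration of the prior-art reduction the text criticizes; the function name `bin_power_spectrum` and the all-ones input are hypothetical.

```python
def bin_power_spectrum(row, group_size=10):
    """Sum consecutive groups of `group_size` power values (prior-art reduction)."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [sum(row[i:i + group_size]) for i in range(0, len(row), group_size)]


# 50 frequency columns collapse to 5 summed columns.
spectrum = [1.0] * 50  # hypothetical power values
reduced = bin_power_spectrum(spectrum)
```

Each output column depends only on its own 10 inputs, so any structure shared across groups is discarded, which is the weakness the text points out.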
Summary of the Invention
To overcome the shortcomings of the prior art, the present invention provides a machine-learning-based method for detecting the brain wave state of epilepsy, comprising the following main steps:
Step 1: Data import. Import the brain wave data of epilepsy patients and label each record with its state.
Step 2: Normalization. Choose a suitable new maximum and minimum, and map the EEG time-domain signal data to a smaller value interval by min-max normalization.
This step mainly includes:
Step 2.1: Obtain the maximum value max and the minimum value min of the brain wave data set.
Step 2.2: Set the new maximum value new_max and the new minimum value new_min as required, and apply the normalization formula
$$v' = \frac{v - \mathit{min}}{\mathit{max} - \mathit{min}}\,(\mathit{new\_max} - \mathit{new\_min}) + \mathit{new\_min} \tag{1}$$
to every value v in the data set (yielding v'), which transforms its range from [min, max] to the interval [new_min, new_max].
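A minimal sketch of Step 2, assuming the data set has been flattened into one list of values so that min and max are taken over the whole data set. The function name and the handling of a constant signal are assumptions, not part of the source.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case (assumption): a constant signal maps to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]
```

For instance, `min_max_normalize([2.0, 4.0, 6.0], -1.0, 1.0)` maps the smallest value to -1, the midpoint to 0 and the largest value to 1.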
Step 3: Time-to-frequency-domain conversion. Apply the fast Fourier transform to each time-domain brain wave record, and compute the power spectrum from the amplitude of each resulting value.
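Step 3 can be illustrated with a small pure-Python transform. Two choices below are assumptions, because the source formulas are not available here: the power value is taken as the squared magnitude a² + b² of each complex coefficient a + ib, and only the first half of the spectrum is kept (for a real signal the second half mirrors the first). A naive discrete Fourier transform is used for clarity; a fast Fourier transform yields the same coefficients.

```python
import cmath
import math


def dft(signal):
    """Naive O(s^2) discrete Fourier transform of a real-valued sequence."""
    s = len(signal)
    return [
        sum(signal[n] * cmath.exp(-2j * cmath.pi * k * n / s) for n in range(s))
        for k in range(s)
    ]


def power_spectrum(signal):
    """Squared magnitude of the first half of the spectrum (assumed convention)."""
    half = len(signal) // 2
    return [abs(c) ** 2 for c in dft(signal)[:half]]


# Hypothetical input: one cycle of a cosine over 8 samples, so the power
# concentrates in frequency bin 1.
example_signal = [math.cos(2 * math.pi * n / 8) for n in range(8)]
example_power = power_spectrum(example_signal)
```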
Step 4: Frequency range selection. Select a suitable low-frequency band to replace the original frequency-domain signal, removing high-frequency noise.
This step mainly includes:
Step 4.1: Randomly select d samples from the data set and denote them {X_1, X_2, ..., X_d}, where X_i = (x_i1, x_i2, ..., x_im) and m is the dimension of the data set.
Step 4.2: Find the minimum value P such that every sample X_t satisfies the threshold condition of formula (2).
In that condition, R is a user-specified threshold with a recommended range of [0.9, 1); the intent is to represent the original sample by a small number of low-frequency components and to discard the high-frequency noise in the EEG signal.
Step 4.3: Using the P obtained from formula (2), keep only columns 1 to P of each original sample in place of the original data, thereby removing the noise.
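The following sketch shows one way Steps 4.2 and 4.3 could work. Formula (2) is not reproduced in this text, so the condition is assumed to be a cumulative power ratio: the first P columns of a sample must hold at least a fraction R of that sample's total power. All function names are hypothetical.

```python
def smallest_prefix(sample, ratio):
    """Smallest P whose first P values hold at least `ratio` of the total (assumed rule)."""
    total = sum(sample)
    running = 0.0
    for p, value in enumerate(sample, start=1):
        running += value
        if running >= ratio * total:
            return p
    return len(sample)


def select_cutoff(samples, ratio=0.9):
    """Smallest P that works for every sample, i.e. the largest per-sample P."""
    return max(smallest_prefix(s, ratio) for s in samples)


def truncate(samples, p):
    """Keep columns 1..P of every sample (Step 4.3)."""
    return [s[:p] for s in samples]


# Hypothetical power spectra for two samples.
spectra = [[9.5, 0.3, 0.2], [5.0, 4.5, 0.5]]
cutoff = select_cutoff(spectra, ratio=0.9)
low_freq = truncate(spectra, cutoff)
```

Taking the maximum over the sampled rows is what makes P the minimum value that satisfies the condition for every sample at once.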
Step 5: Linear adaptive dimensionality reduction of the frequency-domain signal. Reduce the dimension of the data with the linear adaptive dimensionality reduction technique so that classification can be performed effectively.
This step mainly includes:
Step 5.1: Denote the data set X = {X_1, X_2, ..., X_n}, where X_i = (x_i1, x_i2, ..., x_iP) and P is the dimension of the data set, and determine the adaptive final neighborhood points of each sample point X_i:
Step 5.1.1: Run the K-Means algorithm based on Euclidean distance (a distance-based clustering algorithm that uses distance as the measure of similarity) on the data set; the resulting clusters are used when selecting the final neighborhood points.
Step 5.1.2: Select the initial neighborhood points X_i1, X_i2, ..., X_ik of the sample point X_i, where k is the preset number of initial neighborhood points.
Step 5.1.3: For the sample point X_i, compute the mean distance between X_i and its initial neighborhood points X_i1, X_i2, ..., X_ik:
$$DM_i = \frac{1}{k}\sum_{j=1}^{k} D_{ij}$$
where D_ij is the Euclidean distance between the sample point X_i and its neighborhood point X_ij.
Step 5.1.4: Among the initial neighborhood points, select the final neighborhood points of each sample point X_i by the following rule: if the distance D_ij between the initial neighborhood point X_ij and the sample point X_i is less than the mean distance DM_i, or if X_i and X_ij belong to the same cluster, then X_ij is a final neighborhood point of X_i; otherwise it is not.
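The selection rule of Steps 5.1.2 to 5.1.4 can be sketched as below. Two points are assumptions: the cluster labels are taken as given (for example from a K-Means run as in Step 5.1.1), and the initial neighborhood points are taken to be the k nearest other samples, which the text does not state explicitly. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def final_neighbors(points, labels, k):
    """Indices of the final neighborhood points of every sample point."""
    result = []
    for i, p in enumerate(points):
        # Step 5.1.2 (assumed rule): the k nearest other points are the initial neighbors.
        initial = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        # Step 5.1.3: mean distance DM_i over the initial neighbors.
        mean_dist = sum(d for d, _ in initial) / len(initial)
        # Step 5.1.4: keep a neighbor if it is closer than DM_i or shares the cluster.
        result.append(
            [j for d, j in initial if d < mean_dist or labels[j] == labels[i]]
        )
    return result


# Hypothetical 1-D data: three nearby points in cluster 0 and one outlier in cluster 1.
demo_points = [[0.0], [1.0], [2.0], [10.0]]
demo_labels = [0, 0, 0, 1]
demo_neighbors = final_neighbors(demo_points, demo_labels, k=2)
```

In this toy data the outlier keeps only one of its two initial neighbors, because the other is both farther than the mean distance and in a different cluster, so the neighborhood size adapts per sample.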
Step 5.2: Reconstruct X_i as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix W of the samples so that the reconstruction cost error ε(W) is minimized.
In the cost function, the coefficient ω_ij is the weight of X_j in the reconstruction of X_i; ω_ij = 0 when X_j is not a neighborhood point of X_i; Σ_j ω_ij = 1; and n is the total number of samples in the data set. The embedding weights of each sample X_i are obtained with the Lagrange multiplier method.
In that solution, G_i = (G_jk) with G_jk = (X_i - X_j)^T (X_i - X_k), where X_j and X_k are neighborhood points of X_i; 1_n is the n-dimensional column vector of ones, 1_n = (1, 1, ..., 1)^T, and 1_n^T is its transpose; w_i = (ω_i1, ω_i2, ..., ω_in)^T and W = (w_1, w_2, ..., w_n).
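The formulas for Step 5.2 are not reproduced in this text. As a hedged reconstruction, the symbols defined above match the standard locally linear embedding (LLE) weight problem, which would read as follows; treat this as the textbook form rather than the exact expressions of the source.

```latex
\varepsilon(W) = \sum_{i=1}^{n} \Bigl\| X_i - \sum_{j=1}^{n} \omega_{ij} X_j \Bigr\|^{2},
\qquad \sum_{j=1}^{n} \omega_{ij} = 1,
\qquad \omega_{ij} = 0 \ \text{if } X_j \text{ is not a neighborhood point of } X_i

% Under the sum-to-one constraint the i-th term equals w_i^T G_i w_i,
% and the Lagrange multiplier method gives
w_i = \frac{G_i^{-1}\, 1_n}{1_n^{T}\, G_i^{-1}\, 1_n}
```

The second line follows because, when the weights sum to one, X_i - Σ_j ω_ij X_j = Σ_j ω_ij (X_i - X_j), whose squared norm is Σ_j Σ_k ω_ij ω_ik G_jk.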
Step 5.3: Use the local embedding weights W to solve for the optimal mapping into the low-dimensional space, so that the embedding cost error ε(Y) in the p-dimensional space is minimized.
Here Z_i is the embedded representation of X_i in the low-dimensional space, Z = {Z_1, Z_2, ..., Z_n}, F = (I_n - W)^T (I_n - W), tr denotes the trace of a matrix, n is the total number of samples in the data set, and I_n is the n-th order identity matrix, whose main-diagonal elements are 1 and whose other elements are 0. Subject to a normalization constraint on Z, the problem reduces to finding the smallest non-zero eigenvalues of the matrix F. Let u_1, u_2, ..., u_P be the eigenvectors corresponding to the P non-zero eigenvalues of F arranged in ascending order; selecting the first p non-zero eigenvalues from smallest to largest, the sample data set embedded in the low-dimensional space is U = (u_1, u_2, ..., u_p)^T. In this way, the P-dimensional data set from step 4 becomes a p-dimensional data set through linear adaptive dimensionality reduction.
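The cost function and the constraint of Step 5.3 are likewise missing from this text. A hedged reconstruction in the standard LLE form, taking the (i, j) entry of W to be ω_ij so that F = (I_n - W)^T (I_n - W) as stated above, would be:

```latex
\varepsilon(Y) = \sum_{i=1}^{n} \Bigl\| Z_i - \sum_{j=1}^{n} \omega_{ij} Z_j \Bigr\|^{2}
             = \sum_{i=1}^{n} \sum_{j=1}^{n} F_{ij}\, Z_i^{T} Z_j ,
\qquad
F_{ij} = \delta_{ij} - \omega_{ij} - \omega_{ji} + \sum_{k=1}^{n} \omega_{ki}\, \omega_{kj}

% Assumed constraints (standard LLE), which exclude the trivial solution Z = 0:
\sum_{i=1}^{n} Z_i = 0, \qquad \frac{1}{n} \sum_{i=1}^{n} Z_i Z_i^{T} = I_p
```

Here δ_ij is 1 when i = j and 0 otherwise. Under these constraints the minimizer is built from the eigenvectors of F with the smallest non-zero eigenvalues, which is the selection the paragraph above describes.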
Step 6, building the support vector machine classification prediction model: use a support vector machine classifier to build a prediction model on the training data set.
Step 7, epilepsy state classification prediction: use the established prediction model to classify brain waves of unknown state.
The beneficial effects of the present invention are as follows. It provides an effective machine-learning-based method for detecting the state of epileptic brain waves. It reduces the dimensionality of the data set with a linear adaptive dimensionality reduction algorithm that introduces the concepts of clustering and mean; constrained by these two, the neighborhood points of each sample point can be selected adaptively, and the low-dimensional embedding of the data set preserves the topological and manifold structure of the original data set well. The invention performs well in detection efficiency, computational overhead, and accuracy.
FIG. 1 shows a flowchart of the machine-learning-based method for detecting the epilepsy brain wave state according to the present invention.
FIG. 2 shows a flowchart of the linear adaptive dimensionality reduction algorithm.
The preferred embodiments of the present invention are further described below with reference to the accompanying drawings and embodiments.
The flowchart in FIG. 1 gives the complete implementation process of the present invention:
Step 1, data import: import the brain wave data of epilepsy patients and label the state of each record.
This step includes:
Step 1.1: Import the brain wave data. Each row is the time-domain signal of a patient over one sampling period, and each column is the time-domain signal acquired at one pulse. The data set is $M = \{m_1, m_2, \ldots, m_r\}$, where $r$ is the number of samples in the data set, $m_i = (m_{i1}, m_{i2}, \ldots, m_{is})$, and $s$ is the number of sampling points in one brain wave record, i.e. the number of columns.
Step 1.2: Record the state of each sample in the array $y = (y_1, y_2, \ldots, y_r)$: the interictal (between-seizure) state is recorded as $-1$ and the ictal (seizure) state as $1$.
Step 2, normalization: choose a suitable new maximum and minimum, and map the brain wave time-domain signal data to a smaller new value interval using the normalization transform.
This step includes:
Step 2.1: Obtain the maximum value max and the minimum value min from the brain wave data set.
Step 2.2: Set the new maximum new_max and the new minimum new_min as required, and apply the normalization formula
$$v' = \frac{v - \min}{\max - \min}\,(\mathrm{new\_max} - \mathrm{new\_min}) + \mathrm{new\_min}$$
to every value $v$ in the data set (yielding $v'$), which maps its range from [min, max] to the interval [new_min, new_max].
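The mapping in steps 2.1 and 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `min_max_normalize`, the sample values, and the target interval [-1, 1] are hypothetical, and the global minimum and maximum are taken over the whole data set as step 2.1 describes.

```python
def min_max_normalize(data, new_min=0.0, new_max=1.0):
    """Map every value of a 2-D list from [min, max] to [new_min, new_max]."""
    values = [v for row in data for v in row]
    old_min, old_max = min(values), max(values)      # step 2.1
    if old_max == old_min:
        raise ValueError("all values are equal, so [min, max] is degenerate")
    scale = (new_max - new_min) / (old_max - old_min)
    # Step 2.2: shift to zero, rescale, then shift to the new minimum.
    return [[(v - old_min) * scale + new_min for v in row] for row in data]


# Hypothetical two-record data set (values chosen only for illustration).
M = [[-120.0, 30.0, 80.0], [0.0, 45.0, -20.0]]
normalized = min_max_normalize(M, new_min=-1.0, new_max=1.0)
```

In this sketch the smallest input value lands on new_min and the largest on new_max; every other value keeps its relative position inside the interval.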
Step 3, time-frequency conversion: apply the fast Fourier transform to each time-domain brain wave record, and combine the amplitude-frequency values of each data point into its power spectrum.
This step includes:
Step 3.1: Because the brain wave time-domain signal satisfies the continuous multivariate generalized Beta distribution, i.e. the Dirichlet distribution, take the time-domain function $x(t)$ on the interval $[0, T]$, let $\Delta t$ be a small time interval with $T = N\Delta t$, and let $x_n$ be the sampled value of $x(t)$ at $t_n = n\Delta t$. Replacing $x(t)$ by the sum over the $x_n$, the fast Fourier transform is computed as:
[formula missing in the source text]
Because of the symmetry of the fast Fourier transform, each time-domain sample $m_t = (m_{t1}, m_{t2}, \ldots, m_{ts})$ is converted to [formula missing in the source text], and the dimension of the data set changes from $s$ to [value missing in the source text].
Step 3.2: Compute the power spectrum from the amplitude and phase values of the frequency spectrum. For each value $a_{jk} + i b_{jk}$, the power spectrum $c_{jk}$ is computed as:
[formula missing in the source text]
The power spectrum $c_{jk}$ then replaces the original value $a_{jk} + i b_{jk}$, and the data set $M$ is converted to $C$.
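Step 3 can be illustrated with NumPy. The sketch rests on two assumptions that the text here does not confirm: the symmetry of the transform is exploited by keeping only the one-sided spectrum (`numpy.fft.rfft`, which returns s//2 + 1 bins for a record of length s), and the power of a value a + ib is taken as a² + b². The function name `power_spectrum` and the test signal are hypothetical.

```python
import numpy as np


def power_spectrum(samples):
    """Squared magnitude of the one-sided FFT of each row (assumed definition)."""
    samples = np.asarray(samples, dtype=float)
    spectrum = np.fft.rfft(samples, axis=1)        # complex values a + ib
    return spectrum.real ** 2 + spectrum.imag ** 2  # assumed c = a^2 + b^2


# Hypothetical record: a sine with 5 cycles, sampled 64 times.
t = np.arange(64) / 64.0
M = np.array([np.sin(2 * np.pi * 5 * t)])
C = power_spectrum(M)
```

For this test signal the power concentrates in frequency bin 5, which is a quick sanity check that the rows of `C` are indexed by frequency.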
Step 4, frequency range selection: replace the original frequency-domain signal with a suitable low-frequency portion to remove high-frequency noise.
This step includes:
Step 4.1: Randomly select $d$ samples from the data set, denoted $\{X_1, X_2, \ldots, X_d\}$ with $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, where $m$ is the dimension of the data set.
Step 4.2: Find the smallest value $P$ such that every sample $X_t$ satisfies the condition:
[formula (10) missing in the source text]
where $R$ is a user-specified threshold with a recommended range of $[0.9, 1)$. The purpose is to represent the original sample by a smaller set of low-frequency components and to remove the high-frequency noise from the EEG signal.
Step 4.3: Using the $P$ obtained from formula (10), keep columns 1 to $P$ of each original sample in place of the original data, thereby removing the noise.
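The condition of formula (10) is not legible in this text, so the following sketch assumes one plausible reading: P is the smallest column count for which the first P columns hold at least a fraction R of the total power in each of the d randomly chosen samples. That cumulative-ratio criterion, the function name `select_cutoff`, the `seed` parameter, and the sample matrix are all assumptions made for illustration.

```python
import numpy as np


def select_cutoff(C, d, R=0.9, seed=0):
    """Smallest P whose first P columns reach ratio R in every sampled row."""
    C = np.asarray(C, dtype=float)
    rng = np.random.default_rng(seed)
    rows = rng.choice(C.shape[0], size=d, replace=False)       # step 4.1
    chosen = C[rows]
    ratio = np.cumsum(chosen, axis=1) / chosen.sum(axis=1, keepdims=True)
    # argmax on a boolean row gives the first column where ratio >= R.
    first_ok = (ratio >= R).argmax(axis=1)
    return int(first_ok.max()) + 1                              # step 4.2


# Hypothetical power spectra: most power sits in the first two columns.
C = np.array([[8.0, 1.5, 0.25, 0.25],
              [5.0, 4.5, 0.25, 0.25],
              [9.5, 0.25, 0.125, 0.125]])
P = select_cutoff(C, d=3, R=0.9)
C_low = C[:, :P]                                                # step 4.3
```

Taking the maximum over the sampled rows is what makes the condition hold for every selected sample rather than only on average.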
Step 5, linear adaptive dimensionality reduction of the frequency-domain signal: reduce the dimensionality of the data with the linear adaptive dimensionality reduction technique so that classification can be carried out effectively, as shown in FIG. 2.
This step includes:
Step 5.1: Let the data set be $X = \{X_1, X_2, \ldots, X_n\}$ with $X_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$, where $P$ is the dimension of the data set, and determine the adaptive final neighborhood points of each sample point $X_i$. This mainly includes:
Step 5.1.1: Cluster the data set with the K-Means algorithm based on Euclidean distance; the clusters are used when deciding the final neighborhood points. The main steps are as follows. First, $k$ cluster centers are initialized at random, and each sample is assigned to the cluster of its nearest center by Euclidean distance, which completes the first assignment. Then the center of each cluster is recomputed from its samples and the samples are reassigned to the nearest cluster by Euclidean distance, repeating until the cluster centers no longer change or the number of iterations reaches a set threshold.
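The K-Means loop of step 5.1.1 can be sketched as follows. This is a minimal NumPy illustration, not the patent's code: the function name `k_means`, the `max_iter` and `seed` parameters, the choice of initial centers from the samples themselves, and the handling of empty clusters are assumptions.

```python
import numpy as np


def k_means(X, k, max_iter=100, seed=0):
    """Plain K-Means with Euclidean distance; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random start
    for _ in range(max_iter):
        # Assign every sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):                # centers stable
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: two well-separated groups of three points each.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = k_means(X, k=2)
```

The loop stops on either of the two conditions named in the step: unchanged centers or the iteration limit.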
Step 5.1.2: Select $k$ initial neighborhood points for every sample point of the data set. Denote the initial neighborhood points of sample point $X_i$ by $X_{i1}, X_{i2}, \ldots, X_{ik}$, where $k$ is the preset number of initial neighborhood points.
Step 5.1.3: For sample point $X_i$, compute the mean distance between $X_i$ and its initial neighborhood points $X_{i1}, X_{i2}, \ldots, X_{ik}$:
$$DM_i = \frac{1}{k} \sum_{j=1}^{k} D_{ij}$$
where $D_{ij}$ is the Euclidean distance between sample point $X_i$ and its neighborhood point $X_{ij}$.
Step 5.1.4: Among the initial neighborhood points, select the final neighborhood points of each sample point $X_i$ by the following rule: if the distance $D_{ij}$ between the initial neighborhood point $X_{ij}$ and the sample point $X_i$ is less than the mean distance $DM_i$, or if $X_i$ and $X_{ij}$ belong to the same cluster, then $X_{ij}$ is a final neighborhood point of $X_i$; otherwise it is not.
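Steps 5.1.2 to 5.1.4 can be sketched as a single function. The text does not say how the k initial neighborhood points are chosen, so this sketch assumes the k nearest points by Euclidean distance; the function name `adaptive_neighbors` and the sample data are hypothetical, and the cluster labels are taken as given (for example from a K-Means run).

```python
import numpy as np


def adaptive_neighbors(X, labels, k):
    """Final neighbor indices per sample, following the rule of step 5.1.4."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = []
    for i in range(len(X)):
        order = np.argsort(dist[i])
        initial = order[order != i][:k]       # assumed: k nearest, excluding X_i
        dm = dist[i, initial].mean()          # distance mean DM_i (step 5.1.3)
        # Keep a point if it is closer than the mean OR shares X_i's cluster.
        keep = (dist[i, initial] < dm) | (labels[initial] == labels[i])
        neighbors.append(initial[keep].tolist())
    return neighbors


# Hypothetical 1-D data: three nearby points in cluster 0, one outlier in cluster 1.
X = [[0.0], [1.0], [2.0], [10.0]]
labels = [0, 0, 0, 1]
neighbors = adaptive_neighbors(X, labels, k=2)
```

In this example the outlier keeps only its closer initial neighbor, because the farther one is neither below the mean distance nor in the same cluster. A sample can end up with no final neighbor when all its initial neighbors are equally distant and lie in other clusters, which a full implementation would need to handle.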
Step 5.2: Reconstruct $X_i$ as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix $W$ of the samples so that the reconstruction cost error $\varepsilon(W)$ is minimized:
$$\min \varepsilon(W) = \sum_{i=1}^{n} \Big\| X_i - \sum_{j=1}^{n} \omega_{ij} X_j \Big\|^2$$
Here the coefficient $\omega_{ij}$ is the weight of $X_j$ in the reconstruction of $X_i$; $\omega_{ij} = 0$ when $X_j$ is not a neighborhood point of $X_i$; $\sum_j \omega_{ij} = 1$; and $n$ is the total number of records in the data set. Solving with the Lagrange multiplier method gives the embedding weights of sample $X_i$:
$$w_i = \frac{G_i^{-1} 1_n}{1_n^{T} G_i^{-1} 1_n}$$
where $G_i = (G_{jk})$ with $G_{jk} = (X_i - X_j)^{T}(X_i - X_k)$ ($X_j$ and $X_k$ are neighborhood points of $X_i$); $1_n = (1, 1, \ldots, 1)^{T}$ is the $n$-dimensional column vector of ones and $1_n^{T}$ is its transpose; $w_i = (\omega_{i1}, \omega_{i2}, \ldots, \omega_{in})^{T}$; and $W = (w_1, w_2, \ldots, w_n)$.
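The weight computation of step 5.2 can be sketched with NumPy. It follows the standard locally-linear-embedding solution, in which the weights of a sample are obtained by solving $G_i w = 1$ and rescaling so that they sum to one. Two details are assumptions rather than part of the text: the small regularization term `reg`, added because $G_i$ can be singular, and the convention that row i of `W` holds the weights $\omega_{ij}$. The function name `reconstruction_weights` and the sample data are hypothetical.

```python
import numpy as np


def reconstruction_weights(X, neighbors, reg=1e-3):
    """Return W with W[i, j] = omega_ij; rows sum to 1 over each neighbor set."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    W = np.zeros((n, n))
    for i, idx in enumerate(neighbors):
        if not idx:                            # no final neighbors: leave row at 0
            continue
        diff = X[i] - X[idx]                   # rows are X_i - X_j
        G = diff @ diff.T                      # G_jk = (X_i - X_j)^T (X_i - X_k)
        # Assumed regularization so that G stays invertible.
        G += reg * max(np.trace(G), 1e-12) * np.eye(len(idx))
        w = np.linalg.solve(G, np.ones(len(idx)))
        W[i, idx] = w / w.sum()                # enforce sum_j omega_ij = 1
    return W


# Hypothetical data: the corners of a unit square, each with two neighbors.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
neighbors = [[1, 2], [0, 3], [0, 3], [1, 2]]
W = reconstruction_weights(X, neighbors)
```

Entries for non-neighbors stay at zero, which matches the requirement that $\omega_{ij} = 0$ outside the neighborhood.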
Step 5.3: Using the local embedding weights $W$, solve for the optimal mapping into the low-dimensional space so that the embedding cost error $\varepsilon(Y)$ in the $p$-dimensional space is minimized:
$$\min \varepsilon(Y) = \sum_{i=1}^{n} \Big\| Z_i - \sum_{j=1}^{n} \omega_{ij} Z_j \Big\|^2 = \operatorname{tr}(Z F Z^{T})$$
where $Z_i$ is the embedded representation of $X_i$ in the low-dimensional space, $Z = \{Z_1, Z_2, \ldots, Z_n\}$, $F = (I_n - W)^{T}(I_n - W)$, $\operatorname{tr}$ denotes the trace of a matrix, $n$ is the total number of records in the data set, and $I_n$ is the $n$-th order identity matrix (ones on the main diagonal, zeros elsewhere). $Z$ must also satisfy a constraint, [formula missing in the source text].
This problem amounts to finding the smallest non-zero eigenvalues of the matrix $F$. Let $u_1, u_2, \ldots, u_P$ be the eigenvectors corresponding to the $P$ non-zero eigenvalues of $F$ arranged in ascending order. Taking the $p$ smallest non-zero eigenvalues, the sample data set embedded in the low-dimensional space is $U = (u_1, u_2, \ldots, u_p)^{T}$. In this way, the $P$-dimensional data set from step 4 becomes a $p$-dimensional data set through linear adaptive dimensionality reduction.
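The eigenvector selection of step 5.3 can be sketched with NumPy. The sketch assumes that row i of `W` holds the weights of sample i and that an eigenvalue counts as non-zero when it exceeds a small tolerance `tol`; the function name `embed`, the tolerance, and the example weight matrix are hypothetical.

```python
import numpy as np


def embed(W, p, tol=1e-9):
    """Rows of the result are the eigenvectors of F for the p smallest
    non-zero eigenvalues, i.e. U = (u_1, ..., u_p)^T with shape (p, n)."""
    n = W.shape[0]
    I = np.eye(n)
    F = (I - W).T @ (I - W)                    # symmetric, positive semi-definite
    eigvals, eigvecs = np.linalg.eigh(F)       # eigenvalues in ascending order
    nonzero = np.flatnonzero(eigvals > tol)    # skip (numerically) zero eigenvalues
    if len(nonzero) < p:
        raise ValueError("F has fewer than p non-zero eigenvalues")
    return eigvecs[:, nonzero[:p]].T


# Hypothetical weight matrix for five samples; every row sums to 1.
W = np.array([
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5, 0.0],
])
U = embed(W, p=2)
```

Column i of `U` is then the p-dimensional representation of sample i. Because each row of `W` sums to one, the all-ones vector has eigenvalue zero and is skipped by the tolerance test.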
Step 6, building the support vector machine classification prediction model: use a support vector machine classifier to build a prediction model on the training data set.
Given a training sample set $(X_i, y_i)$, $i = 1, 2, \ldots, l$, with $X_i \in R^n$ and $y_i \in \{\pm 1\}$, and a hyperplane written as $(W \cdot X) + b = 0$ with $W, X \in R^n$, the separating surface classifies all samples correctly with a classification margin if it satisfies the constraint:
$$y_i[(W \cdot X_i) + b] \ge 1, \quad i = 1, 2, \ldots, l \tag{15}$$
The classification margin is then $2/\|W\|$, so constructing the optimal hyperplane becomes the problem of minimizing $\varphi(W)$ subject to this constraint:
$$\min \varphi(W) = \frac{1}{2}\|W\|^2$$
This optimization problem is solved by introducing Lagrange multipliers:
$$L(W, b, a) = \frac{1}{2}\|W\|^2 - \sum_{i=1}^{l} a_i \{ y_i[(W \cdot X_i) + b] - 1 \}$$
where $a = (a_1, a_2, \ldots, a_l)$ and each $a_i > 0$ is a Lagrange multiplier. Solving gives the optimal weight vector $W$ and the optimal bias $b$:
$$W = \sum_{i=1}^{l} a_i y_i X_i \quad \text{and} \quad b = y_j - \sum_{i=1}^{l} a_i y_i (X_i \cdot X_j)$$
where $j \in \{ j \mid a_j > 0 \}$, and $a$ satisfies $\sum_{i=1}^{l} a_i y_i = 0$ and $a_i \ge 0$. The optimal classification function is therefore:
$$f(X) = \operatorname{sgn}\{(W \cdot \varphi(X)) + b\} \tag{18}$$
Here $\operatorname{sgn}$ returns an integer that satisfies
$$\operatorname{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$
Using this method, a support vector machine classification prediction model is built from the brain wave data set and its corresponding epilepsy states, and is used for epilepsy state detection.
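The margin constraint (15) and the classification function (18) can be checked on a toy example. The sketch below does not train a support vector machine; it takes a hand-picked hyperplane and only evaluates it, treating the feature map in (18) as the identity. The names `sgn`, `decision_value`, `classify`, and `satisfies_margin`, the hyperplane, and the sample points are all hypothetical, and `sgn` returning 0 for a zero argument is an assumption.

```python
def sgn(value):
    """Sign as an integer: 1, 0 or -1 (the zero case is an assumption)."""
    return 1 if value > 0 else (-1 if value < 0 else 0)


def decision_value(W, b, X):
    """(W . X) + b for a linear classifier with the identity feature map."""
    return sum(w * x for w, x in zip(W, X)) + b


def classify(W, b, X):
    """Classification function in the form of equation (18)."""
    return sgn(decision_value(W, b, X))


def satisfies_margin(W, b, samples, labels):
    """Check constraint (15): y_i[(W . X_i) + b] >= 1 for every sample."""
    return all(y * decision_value(W, b, X) >= 1
               for X, y in zip(samples, labels))


# Hypothetical training set: label -1 (interictal) on the left, +1 (ictal) on the right.
samples = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0), (4.0, 1.0)]
labels = [-1, -1, 1, 1]
W, b = (1.0, 0.0), -2.0          # hand-picked hyperplane x_1 = 2
margin = 2.0 / sum(w * w for w in W) ** 0.5
```

With this hyperplane the points (1, 1) and (3, 0) lie exactly on the margin, so they play the role of support vectors, and the margin width is 2/‖W‖ = 2.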
Step 7, epilepsy state classification prediction: use the established prediction model to classify brain waves of unknown state.
This step includes:
Step 7.1: Import the unlabeled epilepsy brain wave data set.
Step 7.2: Transform and process the data set by applying steps 2, 3, 4, and 5 in sequence.
Step 7.3: Use the support vector machine classification prediction model built in step 6 to classify the state of the processed epilepsy brain wave data.
Claims (4)
- A machine-learning-based method for detecting the epilepsy brain wave state, characterized by comprising the following steps:
  Step 1, data import: import the brain wave data of epilepsy patients and label the state of each record.
  Step 2, normalization: choose a suitable new maximum and minimum, and map the brain wave time-domain signal data to a smaller new value interval using the normalization transform.
  Step 3, time-frequency conversion: apply the fast Fourier transform to each time-domain brain wave record, and combine the amplitude-frequency values of each data point into its power spectrum.
  Step 4, frequency range selection: replace the original frequency-domain signal with a suitable low-frequency portion to remove high-frequency noise.
  Step 5, linear adaptive dimensionality reduction of the frequency-domain signal: reduce the dimensionality of the data with the linear adaptive dimensionality reduction technique so that classification can be carried out effectively.
  Step 6, building the support vector machine classification prediction model: use a support vector machine classifier to build a prediction model on the training data set.
  Step 7, epilepsy state classification prediction: use the established prediction model to classify brain waves of unknown state.
- The machine-learning-based method for detecting the epilepsy brain wave state according to claim 1, characterized in that step 2 chooses a suitable new maximum and minimum and maps the brain wave time-domain signal data to a smaller new value interval using the normalization transform. Step 2 further comprises:
  Step 2.1: Obtain the maximum value max and the minimum value min from the brain wave data set.
  Step 2.2: Set the new maximum new_max and the new minimum new_min as required, and apply the normalization formula
  $$v' = \frac{v - \min}{\max - \min}\,(\mathrm{new\_max} - \mathrm{new\_min}) + \mathrm{new\_min}$$
  to every value $v$ in the data set (yielding $v'$), which maps its range from [min, max] to the interval [new_min, new_max].
- The machine-learning-based method for detecting the epilepsy brain wave state according to claim 1, characterized in that step 4 performs frequency range selection, replacing the original frequency-domain signal with a suitable low-frequency portion to remove high-frequency noise. Step 4 further comprises:
  Step 4.1: Randomly select $d$ samples from the data set, denoted $\{X_1, X_2, \ldots, X_d\}$ with $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, where $m$ is the dimension of the data set.
  Step 4.2: Find the smallest value $P$ such that every sample $X_t$ satisfies the condition:
  [formula (2) missing in the source text]
  where $R$ is a user-specified threshold with a recommended range of $[0.9, 1)$. The purpose is to represent the original sample by a smaller set of low-frequency components and to remove the high-frequency noise from the EEG signal.
  Step 4.3: Using the $P$ obtained from formula (2), keep columns 1 to $P$ of each original sample in place of the original data, thereby removing the noise.
- The machine-learning-based method for detecting the epilepsy brain wave state according to claim 1, characterized in that step 5 performs linear adaptive dimensionality reduction of the frequency-domain signal so that classification can be carried out effectively. Step 5 further comprises:
  Step 5.1: Let the data set be $X = \{X_1, X_2, \ldots, X_n\}$ with $X_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$, where $P$ is the dimension of the data set, and determine the adaptive final neighborhood points of each sample point $X_i$. Step 5.1 further comprises:
  Step 5.1.1: Cluster the data set with the K-Means algorithm based on Euclidean distance (K-Means is a distance-based clustering algorithm that uses distance as the measure of similarity); the clusters are used when deciding the final neighborhood points.
  Step 5.1.2: Select the initial neighborhood points $X_{i1}, X_{i2}, \ldots, X_{ik}$ of sample point $X_i$, where $k$ is the preset number of initial neighborhood points.
  Step 5.1.3: For sample point $X_i$, compute the mean distance between $X_i$ and its initial neighborhood points $X_{i1}, X_{i2}, \ldots, X_{ik}$:
  $$DM_i = \frac{1}{k} \sum_{j=1}^{k} D_{ij}$$
  where $D_{ij}$ is the Euclidean distance between sample point $X_i$ and its neighborhood point $X_{ij}$.
  Step 5.1.4: Among the initial neighborhood points, select the final neighborhood points of each sample point $X_i$ by the following rule: if the distance $D_{ij}$ between the initial neighborhood point $X_{ij}$ and the sample point $X_i$ is less than the mean distance $DM_i$, or if $X_i$ and $X_{ij}$ belong to the same cluster, then $X_{ij}$ is a final neighborhood point of $X_i$; otherwise it is not.
  Step 5.2: Reconstruct $X_i$ as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix $W$ of the samples so that the reconstruction cost error $\varepsilon(W)$ is minimized:
  $$\min \varepsilon(W) = \sum_{i=1}^{n} \Big\| X_i - \sum_{j=1}^{n} \omega_{ij} X_j \Big\|^2$$
  Here the coefficient $\omega_{ij}$ is the weight of $X_j$ in the reconstruction of $X_i$; $\omega_{ij} = 0$ when $X_j$ is not a neighborhood point of $X_i$; $\sum_j \omega_{ij} = 1$; and $n$ is the total number of records in the data set. Solving with the Lagrange multiplier method gives the embedding weights of sample $X_i$:
  $$w_i = \frac{G_i^{-1} 1_n}{1_n^{T} G_i^{-1} 1_n}$$
  where $G_i = (G_{jk})$ with $G_{jk} = (X_i - X_j)^{T}(X_i - X_k)$ ($X_j$ and $X_k$ are neighborhood points of $X_i$); $1_n = (1, 1, \ldots, 1)^{T}$ is the $n$-dimensional column vector of ones and $1_n^{T}$ is its transpose; $w_i = (\omega_{i1}, \omega_{i2}, \ldots, \omega_{in})^{T}$; and $W = (w_1, w_2, \ldots, w_n)$.
  Step 5.3: Using the local embedding weights $W$, solve for the optimal mapping into the low-dimensional space so that the embedding cost error $\varepsilon(Y)$ in the $p$-dimensional space is minimized:
  $$\min \varepsilon(Y) = \sum_{i=1}^{n} \Big\| Z_i - \sum_{j=1}^{n} \omega_{ij} Z_j \Big\|^2 = \operatorname{tr}(Z F Z^{T})$$
  where $Z_i$ is the embedded representation of $X_i$ in the low-dimensional space, $Z = \{Z_1, Z_2, \ldots, Z_n\}$, $F = (I_n - W)^{T}(I_n - W)$, $\operatorname{tr}$ denotes the trace of a matrix, $n$ is the total number of records in the data set, and $I_n$ is the $n$-th order identity matrix (ones on the main diagonal, zeros elsewhere). $Z$ must also satisfy a constraint, [formula missing in the source text]. This problem amounts to finding the smallest non-zero eigenvalues of the matrix $F$. Let $u_1, u_2, \ldots, u_P$ be the eigenvectors corresponding to the $P$ non-zero eigenvalues of $F$ arranged in ascending order. Taking the $p$ smallest non-zero eigenvalues, the sample data set embedded in the low-dimensional space is $U = (u_1, u_2, \ldots, u_p)^{T}$. In this way, the $P$-dimensional data set from step 4 becomes a $p$-dimensional data set through linear adaptive dimensionality reduction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/108154 WO2020061971A1 (en) | 2018-09-27 | 2018-09-27 | Epilepsy brain wave state detection method based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020061971A1 true WO2020061971A1 (en) | 2020-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18934564; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18934564; Country of ref document: EP; Kind code of ref document: A1 |