WO2020061971A1 - Epilepsy brain wave state detection method based on machine learning - Google Patents

Epilepsy brain wave state detection method based on machine learning

Info

Publication number
WO2020061971A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
frequency
brain wave
sample
data set
Prior art date
Application number
PCT/CN2018/108154
Other languages
French (fr)
Chinese (zh)
Inventor
蔡洪斌
卢光辉
尤婷婷
Original Assignee
电子科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 电子科技大学
Priority to PCT/CN2018/108154
Publication of WO2020061971A1

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B5/24: Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316: Modalities, i.e. specific diagnostic methods
    • A61B5/369: Electroencephalography [EEG]
    • A61B5/372: Analysis of electroencephalograms

Definitions

  • the invention belongs to the technical field of machine learning and medical diagnosis, and particularly relates to a method for detecting brain wave state of epilepsy based on machine learning.
  • EEG activity is a spontaneous, rhythmic potential change generated by neurons in the cerebral cortex, and it is the overall reflection of the electrophysiological activity of cerebral nerve cells on the surface of the cerebral cortex or scalp.
  • EEG signals contain a large amount of physiological and disease information; different physiological states have different characteristics of EEG signals. Therefore, EEG signals provide a diagnostic basis for the detection of epilepsy status.
  • human society has begun to enter a revolutionary era based on massive data and the use of information technology for knowledge creation and information mining; the entry of information technology into the medical industry is a need for the development of the times.
  • The main analysis methods for epilepsy brain waves are time-domain analysis, frequency-domain analysis and time-frequency analysis.
  • The time-domain analysis method directly extracts characteristic waves similar to the EEG for observation; time-frequency analysis combines the time-domain and frequency-domain signals for a comprehensive analysis of the EEG signal.
  • In frequency-domain analysis, power spectrum estimation is the main tool: it transforms the brain wave, whose amplitude varies with time, into a spectrum of EEG power versus frequency, so that the distribution and changes of the EEG rhythms can be observed directly.
  • For frequency dimensionality reduction, the power spectrum is usually summed in groups of 10 consecutive frequencies; with 50 groups of frequency data, for example, groups 1 to 10 are summed into one group, and so on, which reduces the data from 50 columns to 5 and makes subsequent calculations more efficient.
  • Such a coarse dimensionality reduction method sums consecutive power spectrum data directly and ignores the relationships between frequencies, so information specific to individual samples is lost and the prediction performance of epilepsy state detection eventually degrades.
  • the present invention provides a method for detecting brain wave state of epilepsy based on machine learning, which includes the following main steps:
  • Step 1 Import data, import brain wave data from patients with epilepsy, and mark their status.
  • Step 2 Normalize the transformation process, formulate appropriate new maximum and minimum values, and map the EEG time-domain signal data to a smaller new value interval according to the normalized transformation technique.
  • This step mainly includes:
  • Step 2.1 Obtain the maximum value max and the minimum value min from the brain wave data set.
  • Step 2.2 Set the new maximum value new_max and minimum value new_min as required and use the normalized transformation calculation formula:
  • Step 3 time-frequency domain conversion, performing fast Fourier transform on the time-domain data of each brainwave, and comprehensively calculating the amplitude frequency of each data as its power spectrum.
  • Step 4 Select a frequency domain range, select a suitable low frequency signal to replace the original frequency domain signal, and remove high frequency signal noise.
  • This step mainly includes:
  • Step 4.2: find the minimum value P such that every sample X_t satisfies the condition:
  • R is a user-specified threshold with a recommended range of [0.9, 1); the aim is to represent the original sample with a smaller set of low-frequency signals and to remove the high-frequency noise in the EEG signal.
  • Step 4.3: using the P obtained from formula (2), take columns 1 to P of the original samples in place of the original data, which removes the noise.
  • Step 5 The linear adaptive dimension reduction of the frequency domain signal is performed, and the linear adaptive dimension reduction technology is adopted to perform the data dimension reduction in order to effectively perform the classification processing.
  • This step mainly includes:
  • Step 5.1.1: use the K-means algorithm based on Euclidean distance (a distance-based clustering algorithm that uses distance as the measure of similarity) to cluster the data set; the clusters are used later when selecting the final neighborhood points.
  • Step 5.1.2 select the initial neighborhood points X i1 , X i2 , ..., X ik of the sample points X i , where k represents the set number of initial neighborhood points.
  • Step 5.1.3 For the sample point X i , calculate the average distance between the initial neighborhood points X i1 , X i2 , ..., X ik and the sample X i , and the calculation formula is:
  • D ik represents the Euclidean distance between the sample point X i and its neighborhood point X ik .
  • Step 5.1.4 In the initial neighborhood points, select the final neighborhood point for each sample point X i , which must meet the condition: if the distance D ik between the initial neighborhood point X ik and the sample point X i is less than the distance mean DM i , or if X i belongs to the same cluster as the initial neighborhood point X ik , then X ik is the final neighborhood point of X i ; otherwise it is not.
  • Step 5.2: reconstruct X_i as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix W of the samples so that the reconstruction cost error ε(W) is minimized.
  • the calculation formula is:
  • the embedding weights for the sample X_i are:
  • w_i = (ω_i1, ω_i2, ..., ω_in)^T
  • W = (w_1, w_2, ..., w_n).
  • Step 5.3: use the local embedding weights W to solve for the optimal mapping into the low-dimensional space, so that the embedding cost error ε(Y) in the p-dimensional space is minimized, that is:
  • Z_i is the embedded representation of X_i in the low-dimensional space
  • Z = {Z_1, Z_2, ..., Z_n}
  • F = (I_n - W)^T (I_n - W)
  • tr denotes the trace of a matrix
  • n denotes the total number of records in the data set
  • I_n is the n-th order identity matrix, whose main-diagonal elements are 1 and whose other elements are all 0.
  • Z is subject to a constraint; the problem amounts to finding the smallest non-zero eigenvalues of the matrix F.
  • let u_1, u_2, ..., u_P be the eigenvectors corresponding to the P non-zero eigenvalues of F arranged in ascending order; selecting the first p non-zero eigenvalues, from smallest to largest, gives the sample data set embedded in the low-dimensional space as U = (u_1, u_2, ..., u_p)^T.
  • the P-dimensional data set from step 4 thus becomes a p-dimensional data set through linear adaptive dimensionality reduction.
  • Step 6 Establish a support vector machine classification and prediction model, and use the support vector machine classifier to establish a prediction model on the training data set.
  • Step 7. Classification and prediction of the status of epilepsy. Use the established prediction classification model to perform state classification prediction on brain waves of unknown states.
  • The beneficial effects of the present invention are: it provides an effective machine-learning-based method for detecting the state of epilepsy brain waves; it reduces the dimension of the data set with a linear adaptive dimensionality reduction algorithm into which the concepts of clustering and mean are introduced.
  • Constrained by these two concepts, the neighborhood points of each sample point can be selected adaptively, and the low-dimensional embedding of the data set preserves the topology and manifold structure of the original data set well; the invention performs well in terms of detection efficiency, computing overhead and accuracy.
  • FIG. 1 shows a flowchart of a method for detecting an epilepsy brain wave state based on machine learning according to the present invention.
  • Figure 2 shows a flowchart of a linear adaptive dimensionality reduction algorithm.
  • Step 1 Import data, import brain wave data from patients with epilepsy, and mark their status.
  • This step includes:
  • Step 1.1 import brain wave data
  • each row of data represents the patient's time-domain signal over one sampling period
  • each column represents the time-domain signal acquired at one pulse
  • r denotes the number of samples in the data set
  • m_i = (m_i1, m_i2, ..., m_is)
  • s denotes the number of sampling points in one brain wave record, i.e. the number of columns.
  • Step 2 Normalize the transformation process, formulate appropriate new maximum and minimum values, and map the EEG time-domain signal data to a smaller new value interval according to the normalized transformation technique.
  • This step includes:
  • Step 2.1 Obtain the maximum value max and the minimum value min from the brain wave data set.
  • Step 2.2 Set the new maximum value new_max and minimum value new_min as required and use the normalized transformation calculation formula:
  • Step 3 time-frequency domain conversion, performing fast Fourier transform on the time-domain data of each brainwave, and comprehensively calculating the amplitude frequency of each data as its power spectrum.
  • This step includes:
  • each time-domain sample m t (m t1 , m t2 , ..., m ts ) is converted to Dataset dimensions are transformed from s to
  • Step 3.2 Perform power spectrum calculation according to the amplitude and phase values in the frequency spectrum. For each data such as a jk + ib jk , the power spectrum c jk is calculated as:
  • the power spectrum c jk is used instead of its original data a jk + ib jk , and the data set is converted from M to C.
  • Step 4 Select a frequency domain range, select a suitable low frequency signal to replace the original frequency domain signal, and remove high frequency signal noise.
  • This step includes:
  • Step 4.2: find the minimum value P such that every sample X_t satisfies the condition:
  • R is a user-specified threshold with a recommended range of [0.9, 1); the aim is to represent the original sample with a smaller set of low-frequency signals and to remove the high-frequency noise in the EEG signal.
  • Step 4.3: using the P obtained from formula (10), take columns 1 to P of the original samples in place of the original data, which removes the noise.
  • Step 5 The linear adaptive dimension reduction of the frequency-domain signal is performed, and the linear adaptive dimension reduction technique is adopted to reduce the dimension of the data in order to effectively perform classification processing, as shown in FIG. 2.
  • This step includes:
  • the adaptive final neighborhood points of the sample points X i mainly include:
  • Step 5.1.1 Use K-Means algorithm based on Euclidean distance to perform cluster analysis on the data set for the selection and judgment of the final neighborhood points.
  • the main steps are as follows: first, k cluster centers are chosen at random, and each sample is assigned to the cluster of its nearest center according to the Euclidean distance criterion, which completes the first assignment. After that, the center of each cluster is recomputed from the samples it contains, and the samples are reassigned to clusters by the same criterion; this repeats until the cluster centers no longer change or the number of clustering iterations reaches a set threshold.
  • Step 5.1.2: select k initial neighborhood points for each sample point of the data set, and denote the initial neighborhood points of the sample point X_i as X_i1, X_i2, ..., X_ik, where k is the preset number of initial neighborhood points.
  • Step 5.1.3 For the sample point X i , calculate the average distance between the initial neighborhood points X i1 , X i2 , ..., X ik and the sample X i , and the calculation formula is:
  • D ik represents the Euclidean distance between the sample point X i and its neighborhood point X ik .
  • Step 5.1.4 In the initial neighborhood points, select the final neighborhood point for each sample point X i , which must meet the condition: if the distance D ik between the initial neighborhood point X ik and the sample point X i is less than the distance mean DM i , or if X i belongs to the same cluster as the initial neighborhood point X ik , then X ik is the final neighborhood point of X i ; otherwise it is not.
  • Step 5.2: reconstruct X_i as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix W of the samples so that the reconstruction cost error ε(W) is minimized.
  • the calculation formula is:
  • w_i = (ω_i1, ω_i2, ..., ω_in)^T
  • W = (w_1, w_2, ..., w_n).
  • Step 5.3: use the local embedding weights W to solve for the optimal mapping into the low-dimensional space, so that the embedding cost error ε(Y) in the p-dimensional space is minimized, that is:
  • Z_i is the embedded representation of X_i in the low-dimensional space
  • Z = {Z_1, Z_2, ..., Z_n}
  • F = (I_n - W)^T (I_n - W)
  • tr denotes the trace of a matrix
  • n denotes the total number of records in the data set
  • I_n is the n-th order identity matrix, whose main-diagonal elements are 1 and whose other elements are all 0.
  • Z is subject to a constraint; the problem amounts to finding the smallest non-zero eigenvalues of the matrix F.
  • let u_1, u_2, ..., u_P be the eigenvectors corresponding to the P non-zero eigenvalues of F arranged in ascending order, and select the first p non-zero eigenvalues, from smallest to largest.
  • the P-dimensional data set from step 4 thus becomes a p-dimensional data set through linear adaptive dimensionality reduction.
  • Step 6 Establish a support vector machine classification and prediction model, and use the support vector machine classifier to establish a prediction model on the training data set.
  • the classification margin can be calculated as 2 / ‖W‖, so constructing the optimal hyperplane is converted into a minimization problem in W under the constraint:
  • a (a 1 , a 2 , ..., a l ), and any a i > 0 is a Lagrangian multiplier; the optimal weight vector W and the optimal offset b obtained by the solution are: and Where j ⁇ ⁇ j
  • Step 7. Classification and prediction of the status of epilepsy. Use the established prediction classification model to perform state classification prediction on brain waves of unknown states.
  • This step includes:
  • Step 7.1 Import the unlabeled epilepsy brain wave data set.
  • Step 7.2 transform and process the data set according to steps 2, 3, 4, and 5 in sequence
  • Step 7.3 Use the support vector machine classification prediction model established in step 6 to perform state classification prediction on the processed epilepsy brain wave data.
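The K-means procedure described under step 5.1.1 (random initial centers, nearest-center assignment, center update, stop when nothing changes or an iteration cap is hit) can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `kmeans`, the `seed` and `max_iter` parameters, and the choice to draw the initial centers from the samples themselves are not taken from the patent.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points with Euclidean K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # assumption: initial centers are k random samples
    labels = None
    for _ in range(max_iter):            # iteration cap (the "set threshold")
        # Assign every sample to its nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:         # assignments stable, so centers are stable too
            break
        labels = new_labels
        # Recompute each center as the mean of the samples assigned to it.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:                  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = kmeans(points, k=2)
```

The returned labels are what step 5.1.4 would consult when it checks whether two points belong to the same cluster.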

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Psychology (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Engineering & Computer Science (AREA)
  • Psychiatry (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Complex Calculations (AREA)

Abstract

An epilepsy brain wave state detection method based on machine learning. The method comprises the following steps: data import: importing brain wave data of an epilepsy patient and labeling its state; normalization: setting a suitable new maximum value and a suitable new minimum value, and mapping the brain wave time-domain signal data to a smaller new value interval according to a normalization transform; time-domain to frequency-domain conversion: carrying out a fast Fourier transform on each piece of brain wave time-domain data, and combining the amplitude-frequency components of each piece of data to obtain its power spectrum; frequency range selection: selecting suitable low-frequency signals to replace the original frequency-domain signal, and removing high-frequency signal noise; linear adaptive dimension reduction of the frequency-domain signal: using a linear adaptive dimension reduction technique to reduce the data dimension, so that classification can be carried out effectively; establishment of a support vector machine classification and prediction model: using a support vector machine classifier to establish a prediction model on a training data set; and epilepsy state classification and prediction: using the established prediction and classification model to classify and predict the state of a brain wave in an unknown state.

Description

Epilepsy brain wave state detection method based on machine learning
Technical field
The invention belongs to the technical field of machine learning and medical diagnosis, and in particular relates to a machine-learning-based method for detecting the brain wave state of epilepsy.
Background
EEG activity is the spontaneous, rhythmic change in electrical potential generated by neurons in the cerebral cortex; it is the overall reflection, on the surface of the cerebral cortex or scalp, of the electrophysiological activity of brain nerve cells. EEG signals contain a large amount of physiological and disease information, and different physiological states have different EEG signal characteristics, so EEG signals provide a diagnostic basis for detecting the state of epilepsy. With the rapid development of science and technology, human society has entered an era in which massive data serve as the basis for knowledge creation and information mining through information technology; bringing information technology into the medical industry is a need of the times.
In medical diagnosis, using data mining techniques to detect the state of epilepsy brain waves has become a major focus of the medical industry. The aim is to build a predictive model from the brain wave data of epilepsy patients in the seizure and interictal periods to assist doctors in diagnosing the state. The main analysis methods for epilepsy brain waves are time-domain analysis, frequency-domain analysis and time-frequency analysis. The time-domain analysis method directly extracts characteristic waves similar to the EEG for observation; time-frequency analysis combines the time-domain and frequency-domain signals for a comprehensive analysis of the EEG signal. In frequency-domain analysis, power spectrum estimation is the main tool: it transforms the brain wave, whose amplitude varies with time, into a spectrum of EEG power versus frequency, so that the distribution and changes of the EEG rhythms can be observed directly.
In the frequency dimensionality reduction step of frequency-domain analysis, the power spectrum is usually summed in groups of 10 consecutive frequencies; with 50 groups of frequency data, for example, groups 1 to 10 are summed into one group, and so on, which reduces the data from 50 columns to 5, achieves dimensionality reduction and makes subsequent calculations more efficient. However, such a coarse method sums consecutive power spectrum data directly and ignores the relationships between frequencies, so information specific to individual samples is lost and the prediction performance of epilepsy state detection eventually degrades.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention provides a machine-learning-based method for detecting the brain wave state of epilepsy, which includes the following main steps:
Step 1, data import: import the brain wave data of epilepsy patients and label their states.
Step 2, normalization: choose suitable new maximum and minimum values, and map the time-domain brain wave signal data to a smaller new value interval using the normalization transform.
This step mainly includes:
Step 2.1: obtain the maximum value max and the minimum value min from the brain wave data set.
Step 2.2: set the new maximum value new_max and the new minimum value new_min as required, and use the normalization formula:
Figure PCTCN2018108154-appb-000001
Apply the normalization to each value v in the data set (converting it to v'), which transforms its range from [min, max] to the interval [new_min, new_max].
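The normalization in step 2 can be sketched in Python as follows. The formula image is not reproduced in this text, so the sketch assumes the standard min-max mapping v' = (v - min) / (max - min) * (new_max - new_min) + new_min, which is consistent with mapping [min, max] onto [new_min, new_max]. The function name `min_max_normalize` and the sample values are illustrative only.

```python
def min_max_normalize(data, new_min=0.0, new_max=1.0):
    """Map every value of a 2-D list from [min, max] to [new_min, new_max]."""
    flat = [v for row in data for v in row]
    old_min, old_max = min(flat), max(flat)      # step 2.1: global min and max
    if old_max == old_min:
        raise ValueError("data set is constant; the mapping is undefined")
    scale = (new_max - new_min) / (old_max - old_min)
    # Step 2.2 (assumed min-max form): shift to zero, rescale, shift to new_min.
    return [[(v - old_min) * scale + new_min for v in row] for row in data]

eeg = [[-120.0, 30.0, 80.0], [10.0, -40.0, 60.0]]   # made-up time-domain values
normalized = min_max_normalize(eeg, new_min=-1.0, new_max=1.0)
```

Using the global minimum and maximum of the whole data set, rather than per-row values, follows step 2.1, which takes both from the data set as a whole.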
Step 3, time-domain to frequency-domain conversion: apply the fast Fourier transform to each brain wave time-domain record, and combine the amplitude-frequency components of each value to obtain its power spectrum.
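A minimal sketch of step 3 with NumPy is shown below. It assumes that the power of a spectral coefficient a + ib is its squared magnitude a² + b²; the patent's own power spectrum formula is not reproduced in this text and may differ (for example by a square root or a scaling factor). The use of `numpy.fft.rfft`, which keeps s/2 + 1 frequency bins for a real signal of length s, is likewise a choice made for this example.

```python
import numpy as np

def power_spectrum(samples):
    """Return a^2 + b^2 for every FFT coefficient a + ib of each time-domain row."""
    spectrum = np.fft.rfft(np.asarray(samples, dtype=float), axis=1)
    return spectrum.real ** 2 + spectrum.imag ** 2

# Illustrative input: one record of 16 samples holding a sine with 2 cycles.
t = np.arange(16)
record = np.sin(2 * np.pi * 2 * t / 16)
spectrum_power = power_spectrum([record])   # the power concentrates in bin 2
```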
Step 4, frequency range selection: select suitable low-frequency signals in place of the original frequency-domain signal to remove high-frequency noise.
This step mainly includes:
Step 4.1: randomly select d samples from the data set, denoted {X_1, X_2, ..., X_d}, with X_i = (x_i1, x_i2, ..., x_im), where m is the dimension of the data set.
Step 4.2: find the minimum value P such that every sample X_t satisfies the condition:
Figure PCTCN2018108154-appb-000002
where R is a user-specified threshold with a recommended range of [0.9, 1). The aim is to represent the original sample with a smaller set of low-frequency signals and to remove the high-frequency noise from the EEG signal.
Step 4.3: using the P obtained from formula (2), take columns 1 to P of the original samples in place of the original data, which removes the noise.
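Steps 4.2 and 4.3 can be illustrated with the sketch below. Formula (2) is not reproduced in this text, so the condition is assumed to be that the first P columns of a sample hold at least a fraction R of that sample's total power; the function name `select_cutoff` and the numbers are made up for the example, and the random choice of d samples in step 4.1 is left to the caller.

```python
def select_cutoff(power_rows, ratio=0.9):
    """Smallest P such that, in every row, columns 1..P hold at least `ratio` of the row total."""
    cutoff = 0
    for row in power_rows:
        total = sum(row)
        running = 0.0
        for p, value in enumerate(row, start=1):
            running += value
            if running >= ratio * total:     # this row is satisfied with p columns
                cutoff = max(cutoff, p)      # P must satisfy every row
                break
    return cutoff

# Two made-up power spectra with the low frequencies first.
selected = [[6.0, 3.5, 0.4, 0.1], [5.0, 2.0, 2.5, 0.5]]
P = select_cutoff(selected, ratio=0.9)
truncated = [row[:P] for row in selected]    # step 4.3: keep columns 1..P
```

Taking the maximum over rows is what makes P the minimum value that works for every selected sample at once.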
Step 5, linear adaptive dimensionality reduction of the frequency-domain signal: reduce the dimension of the data with the linear adaptive dimensionality reduction technique so that classification can be carried out effectively.
This step mainly includes:
Step 5.1: denote the data set X = {X_1, X_2, ..., X_n}, with X_i = (x_i1, x_i2, ..., x_iP), where P is the dimension of the data set, and determine the adaptive final neighborhood points of each sample point X_i:
Step 5.1.1: use the K-means algorithm based on Euclidean distance (a distance-based clustering algorithm that uses distance as the measure of similarity) to cluster the data set; the clusters are used when selecting the final neighborhood points.
Step 5.1.2: select the initial neighborhood points X_i1, X_i2, ..., X_ik of the sample point X_i, where k is the preset number of initial neighborhood points.
Step 5.1.3: for the sample point X_i, compute the mean distance between X_i and its initial neighborhood points X_i1, X_i2, ..., X_ik, using the formula:
Figure PCTCN2018108154-appb-000003
where D_ik denotes the Euclidean distance between the sample point X_i and its neighborhood point X_ik.
Step 5.1.4: from the initial neighborhood points, select the final neighborhood points of each sample point X_i according to the following condition: if the distance D_ik between the initial neighborhood point X_ik and the sample point X_i is less than the mean distance DM_i, or X_i and X_ik belong to the same cluster, then X_ik is a final neighborhood point of X_i; otherwise it is not.
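The neighborhood selection of steps 5.1.2 to 5.1.4 can be sketched as follows. Two points are assumptions rather than statements from the patent: the k initial neighborhood points are taken to be the k nearest points by Euclidean distance, and DM_i is taken to be the arithmetic mean of the k distances. The cluster labels are supplied by the caller (for instance from a K-means run), and `final_neighbors` is a hypothetical name.

```python
import math

def final_neighbors(points, labels, k):
    """Per point, keep the initial neighbors that are closer than DM_i or share its cluster."""
    result = []
    for i, p in enumerate(points):
        # Step 5.1.2 (assumed): the k nearest points form the initial neighborhood.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        # Step 5.1.3 (assumed): DM_i is the mean of the k distances.
        mean_dist = sum(d for d, _ in dists) / len(dists)
        # Step 5.1.4: keep a neighbor if D < DM_i or it lies in the same cluster.
        result.append([j for d, j in dists if d < mean_dist or labels[j] == labels[i]])
    return result

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
labels = [0, 0, 0, 1, 1]                 # cluster index of each point
neighbors = final_neighbors(points, labels, k=3)
```

In this toy data set the far-away points that enter an initial neighborhood only because k is too large are dropped again, which is the adaptive effect the two conditions are meant to produce.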
Step 5.2: reconstruct X_i as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix W of the samples so that the reconstruction cost error ε(W) is minimized. The formula is:
Figure PCTCN2018108154-appb-000004
where the coefficient ω_ij denotes the weight of X_j in the reconstruction of X_i; ω_ij = 0 when X_j is not a neighborhood point of X_i; Σ_j ω_ij = 1; and n is the total number of records in the data set. Solving with the Lagrange multiplier method gives the embedding weights for the sample X_i:
Figure PCTCN2018108154-appb-000005
where G_i = (G_jk) with G_jk = (X_i - X_j)^T (X_i - X_k) (X_j and X_k are neighborhood points of X_i); 1_n is the n-dimensional column vector of ones, i.e. 1_n = (1, 1, ..., 1)^T; 1_n^T is the transpose of 1_n; w_i = (ω_i1, ω_i2, ..., ω_in)^T and W = (w_1, w_2, ..., w_n).
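A hedged NumPy sketch of the weight computation in step 5.2 follows. The formula images are not reproduced in this text, so the sketch assumes the usual locally-linear-embedding solution, in which the weights over the neighbors of X_i are G_i^{-1} 1 normalized to sum to 1. The small regularization term is an addition made here for numerical stability and is not part of the description; row i of the returned matrix holds the weights of X_i, whereas the text stacks the w_i as columns.

```python
import numpy as np

def reconstruction_weights(X, neighbors, reg=1e-3):
    """Row i holds the weights that rebuild X[i] from its final neighbors (rows sum to 1)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    W = np.zeros((n, n))                    # omega_ij = 0 for non-neighbors
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue                        # no neighbors: leave the row at zero
        diffs = X[i] - X[nbrs]              # rows are X_i - X_j
        G = diffs @ diffs.T                 # local Gram matrix with entries G_jk
        # Assumption: regularize G so the linear system stays solvable.
        G += reg * (np.trace(G) or 1.0) * np.eye(len(nbrs))
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()            # enforce sum_j omega_ij = 1
    return W

X = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
W = reconstruction_weights(X, neighbors=[[1, 2], [0, 2], [0, 1]])
```

The third point lies midway between the other two, so its weights come out as one half each.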
Step 5.3: use the local embedding weights W to solve for the optimal mapping into the low-dimensional space, so that the embedding cost error ε(Y) of the embedding into the p-dimensional space is minimized, that is:
Figure PCTCN2018108154-appb-000006
where Z_i is the embedded representation of X_i in the low-dimensional space, Z = {Z_1, Z_2, ..., Z_n}, F = (I_n - W)^T (I_n - W), tr denotes the trace of a matrix, n is the total number of records in the data set, and I_n is the n-th order identity matrix, whose main-diagonal elements are 1 and whose other elements are all 0. Z must satisfy
Figure PCTCN2018108154-appb-000007
The problem amounts to finding the smallest non-zero eigenvalues of the matrix F. Let u_1, u_2, ..., u_P be the eigenvectors corresponding to the P non-zero eigenvalues of F arranged in ascending order; selecting the first p non-zero eigenvalues, from smallest to largest, gives the sample data set embedded in the low-dimensional space as U = (u_1, u_2, ..., u_p)^T. In this way, the P-dimensional data set from step 4 becomes a p-dimensional data set through linear adaptive dimensionality reduction.
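The eigenvalue problem of step 5.3 can be sketched with NumPy as below. The sketch assumes a weight matrix whose rows sum to 1 (row i holding the weights of X_i), treats eigenvalues below a small tolerance `tol` as zero, and returns the embedding with one row per sample, i.e. the transpose of U as written in the text. The ring-shaped weight matrix is made up purely to exercise the function.

```python
import numpy as np

def embed(W, p, tol=1e-9):
    """Eigenvectors of F = (I - W)^T (I - W) for the p smallest non-zero eigenvalues."""
    n = W.shape[0]
    I = np.eye(n)
    F = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(F)         # F is symmetric; eigenvalues ascend
    keep = np.flatnonzero(eigvals > tol)[:p]     # skip the (near-)zero eigenvalues
    return eigvecs[:, keep]                      # shape (n, p): row i is Z_i

# Illustrative weights: 6 points on a ring, each rebuilt from its two neighbors.
n = 6
W_ring = np.zeros((n, n))
for i in range(n):
    W_ring[i, (i - 1) % n] = 0.5
    W_ring[i, (i + 1) % n] = 0.5
Z = embed(W_ring, p=2)
```

Because the rows of the weight matrix sum to 1, the constant vector has eigenvalue zero and is discarded, so the retained coordinates are centered.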
Step 6, building the support vector machine classification and prediction model: use a support vector machine classifier to build a prediction model on the training data set.
Step 7, epilepsy state classification and prediction: use the established prediction and classification model to classify and predict the state of brain waves whose state is unknown.
The beneficial effects of the present invention are as follows. It provides an effective machine-learning-based method for detecting the state of epilepsy brain waves. It reduces the dimension of the data set with a linear adaptive dimensionality reduction algorithm into which the concepts of clustering and mean are introduced; constrained by these two concepts, the neighborhood points of each sample point can be selected adaptively, and the low-dimensional embedding of the data set preserves the topology and manifold structure of the original data set well. The invention performs well in terms of detection efficiency, computing overhead and accuracy.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a flowchart of the machine-learning-based method for detecting the epilepsy brain wave state according to the present invention.
FIG. 2 shows a flowchart of the linear adaptive dimensionality reduction algorithm.
DETAILED DESCRIPTION
The preferred embodiments of the present invention are further described below with reference to the accompanying drawings and embodiments.
The flowchart in FIG. 1 gives the specific process of the entire implementation of the present invention:
Step 1. Data import: import the brain wave data of patients with epilepsy and label its state.
This step includes:
Step 1.1. Import the brain wave data. Each row represents the patient's time-domain signal over one sampling period, and each column represents the time-domain signal acquired at one pulse. The data set is M = {m_1, m_2, ..., m_r}, where r is the number of samples in the data set, m_i = (m_i1, m_i2, ..., m_is), and s is the number of sampling points in one brain wave record, i.e., the number of columns.
Step 1.2. Record the state of each sample in the array y = (y_1, y_2, ..., y_r): the interictal (between-seizure) state is recorded as -1 and the ictal (seizure) state as 1.
Step 2. Normalization: choose suitable new maximum and minimum values, and map the brain wave time-domain signal data to a smaller new value interval by a normalization transformation.
This step includes:
Step 2.1. Obtain the maximum value max and the minimum value min from the brain wave data set.
Step 2.2. Set the new maximum value new_max and the new minimum value new_min as required, and use the normalization formula:
Figure PCTCN2018108154-appb-000008
Apply the normalization to every value v in the data set (yielding v'), which maps its range from [min, max] to the interval [new_min, new_max].
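The formula image for step 2.2 is not reproduced in this text. As a minimal sketch, assuming it is the usual min-max form v' = (v - min) / (max - min) × (new_max - new_min) + new_min, the rescaling could look as follows in Python. The function name min_max_rescale and the sample values are illustrative only.

```python
def min_max_rescale(rows, new_min=0.0, new_max=1.0):
    """Map every value v in `rows` from [min, max] to [new_min, new_max]."""
    flat = [v for row in rows for v in row]
    old_min, old_max = min(flat), max(flat)      # step 2.1: global min and max
    if old_max == old_min:
        raise ValueError("all values are equal, so [min, max] has zero width")
    # Assumed min-max formula (the source gives it only as an image).
    scale = (new_max - new_min) / (old_max - old_min)
    return [[(v - old_min) * scale + new_min for v in row] for row in rows]


# Two made-up records with three sampling points each.
M = [[-40.0, 0.0, 60.0],
     [10.0, 35.0, -15.0]]
M_scaled = min_max_rescale(M, new_min=-1.0, new_max=1.0)
```

The minimum and maximum are taken over the whole data set rather than per record, which follows the wording of step 2.1.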
Step 3. Time-frequency domain conversion: apply the fast Fourier transform to each brain wave time-domain record, and combine the amplitude-frequency components of each value to obtain its power spectrum.
This step includes:
Step 3.1. Because the brain wave time-domain signal satisfies the continuous multivariate generalized Beta distribution, i.e., the Dirichlet distribution, for the signal function x(t) on the interval [0, T] let Δt be a small time interval with T = NΔt. The sample of x(t) at t_n = nΔt is x_n, and the sum over the samples x_n is used in place of x(t), so the fast Fourier transform is computed as:
Figure PCTCN2018108154-appb-000009
Owing to the symmetry of the fast Fourier transform, each time-domain sample m_t = (m_t1, m_t2, ..., m_ts) is converted to
Figure PCTCN2018108154-appb-000010
Figure PCTCN2018108154-appb-000011
and the dimension of the data set changes from s to
Figure PCTCN2018108154-appb-000012
Step 3.2. Compute the power spectrum from the amplitude and phase values in the spectrum. For each value a_jk + i b_jk, the power spectrum c_jk is computed as:
Figure PCTCN2018108154-appb-000013
The power spectrum c_jk then replaces the original value a_jk + i b_jk, and the data set is converted from M to C.
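Step 3 can be sketched with NumPy as follows. Two points are assumptions, because the corresponding formulas appear only as images: the power spectrum is taken to be the squared magnitude a² + b², and the redundant half of the spectrum is dropped with numpy.fft.rfft, which returns s // 2 + 1 coefficients for s real input points (the reduced dimension stated in the source may be counted differently). The function name power_spectrum and the test signals are illustrative.

```python
import numpy as np


def power_spectrum(samples):
    """Return one power-spectrum row per time-domain row in `samples`."""
    x = np.asarray(samples, dtype=float)
    # For real input the spectrum is conjugate-symmetric, so rfft keeps only
    # the non-redundant half: s points in, s // 2 + 1 complex values a + ib out.
    spectrum = np.fft.rfft(x, axis=1)
    # Assumed definition of c_jk: squared magnitude a^2 + b^2.
    return spectrum.real ** 2 + spectrum.imag ** 2


t = np.arange(8)
M = np.vstack([np.cos(2 * np.pi * t / 8),   # one full cycle over 8 samples
               np.ones(8)])                 # constant signal
C = power_spectrum(M)                       # shape (2, 5)
```

In this toy input the cosine puts all of its power in frequency bin 1 and the constant signal puts all of its power in bin 0.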
Step 4. Frequency range selection: keep a suitable low-frequency band in place of the full frequency-domain signal to remove high-frequency noise.
This step includes:
Step 4.1. Randomly select d samples from the data set, denoted {X_1, X_2, ..., X_d}, where X_i = (x_i1, x_i2, ..., x_im) and m is the dimension of the data set.
Step 4.2. Find the minimum value P such that every sample X_t satisfies the condition:
Figure PCTCN2018108154-appb-000014
where R is a user-specified threshold with a recommended range of [0.9, 1). The intent is to represent the original sample with a small number of low-frequency components and to discard the high-frequency noise in the EEG signal.
Step 4.3. Using the P obtained from formula (10), keep columns 1 to P of each original sample in place of the original data, which removes the noise.
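The condition in step 4.2 is available only as an image. The sketch below assumes it requires the first P columns of a sample to hold at least a fraction R of that sample's total power, i.e. (x_t1 + ... + x_tP) / (x_t1 + ... + x_tm) ≥ R. The function name smallest_cutoff and the sample values are hypothetical.

```python
def smallest_cutoff(samples, R=0.9):
    """Smallest P such that, in every sample, the first P columns hold
    at least a fraction R of that sample's total (non-negative) power."""
    P = 1
    for row in samples:
        total = sum(row)
        running = 0.0
        for j, value in enumerate(row, start=1):
            running += value
            if running >= R * total:   # assumed form of the step 4.2 condition
                P = max(P, j)          # P must work for every sample
                break
    return P


# Two made-up power-spectrum rows (low frequencies first).
C = [[8.0, 1.0, 1.0, 0.0],
     [5.0, 3.0, 1.5, 0.5]]
P = smallest_cutoff(C, R=0.9)
C_low = [row[:P] for row in C]   # step 4.3: keep columns 1..P
```

Taking the maximum over the per-sample cutoffs gives the smallest P that satisfies the condition for all d selected samples at once.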
Step 5. Linear adaptive dimensionality reduction of the frequency-domain signal: reduce the dimension of the data with the linear adaptive dimensionality reduction technique so that classification can be performed effectively, as shown in FIG. 2.
This step includes:
Step 5.1. Denote the data set X = {X_1, X_2, ..., X_n}, where X_i = (x_i1, x_i2, ..., x_iP) and P is the dimension of the data set, and determine the adaptive final neighborhood points of each sample point X_i. This mainly includes:
Step 5.1.1. Cluster the data set with the K-Means algorithm based on Euclidean distance; the result is used when deciding the final neighborhood points. The main steps are: first, k cluster centers are initialized at random, and each sample selects the nearest cluster center by Euclidean distance and is assigned to the corresponding cluster, completing the first assignment; then the center of each cluster is recomputed from the samples in it and the samples are reassigned to the appropriate clusters by Euclidean distance, until the cluster centers no longer change or the number of clustering iterations reaches a set threshold.
Step 5.1.2. Select k initial neighborhood points for each sample point in the data set. Denote the initial neighborhood points of the sample point X_i as X_i1, X_i2, ..., X_ik, where k is the preset number of initial neighborhood points.
Step 5.1.3. For the sample point X_i, compute the mean distance DM_i between X_i and its initial neighborhood points X_i1, X_i2, ..., X_ik:
Figure PCTCN2018108154-appb-000015
where D_ik is the Euclidean distance between the sample point X_i and its neighborhood point X_ik.
Step 5.1.4. From the initial neighborhood points, select the final neighborhood points of each sample point X_i by the following condition: if the distance D_ik between the initial neighborhood point X_ik and the sample point X_i is less than the mean distance DM_i, or X_i and X_ik belong to the same cluster, then X_ik is a final neighborhood point of X_i; otherwise it is not.
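Steps 5.1.2 to 5.1.4 can be sketched in plain Python as follows. The cluster labels are taken as given (they would come from the K-Means run of step 5.1.1), the initial neighborhood points are assumed to be the k nearest other points, and DM_i is assumed to be the arithmetic mean of the k distances. The function name adaptive_neighbors and the toy points are illustrative.

```python
import math


def adaptive_neighbors(points, labels, k):
    """For each point i, return the indices of its final neighborhood points."""
    final = []
    for i, p in enumerate(points):
        # Step 5.1.2: the k nearest other points are the initial neighbors.
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        # Step 5.1.3: mean distance DM_i to the initial neighbors.
        mean_dist = sum(d for d, _ in nearest) / len(nearest)
        # Step 5.1.4: keep a neighbor if it is closer than DM_i
        # or lies in the same cluster as point i.
        final.append(
            [j for d, j in nearest if d < mean_dist or labels[j] == labels[i]]
        )
    return final


points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
labels = [0, 0, 0, 1, 1]            # hypothetical K-Means cluster labels
neighbors = adaptive_neighbors(points, labels, k=3)
```

With k = 3 every point initially reaches into the other group, and the two conditions then prune those far, different-cluster neighbors, so the neighborhood size adapts to the local density.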
Step 5.2. Reconstruct X_i as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix W of the samples so that the reconstruction cost error ε(W) is minimized:
Figure PCTCN2018108154-appb-000016
where the coefficient ω_ij is the weight of X_j in the reconstruction of X_i; when X_j is not a neighborhood point of X_i, ω_ij = 0; and Σ_j ω_ij = 1. n is the total number of samples in the data set. Solving with the Lagrange multiplier method, the embedding weight matrix for the sample X_i is:
Figure PCTCN2018108154-appb-000018
where G_i = (G_jk) with G_jk = (X_i - X_j)^T (X_i - X_k) (X_j and X_k are neighborhood points of X_i); 1_n is the column vector of dimension n, i.e., 1_n = (1, 1, ..., 1)^T; 1_n^T is the transpose of 1_n; w_i = (ω_i1, ω_i2, ..., ω_in)^T and W = (w_1, w_2, ..., w_n).
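The closed-form weights of step 5.2 are given only as an image. The sketch below assumes the standard result of minimizing the reconstruction error under the sum-to-one constraint with a Lagrange multiplier, w_i = G_i^{-1} 1 / (1^T G_i^{-1} 1), restricted to the neighborhood points of X_i. The small ridge term reg is an addition for numerical stability and is not described in the source; the function name reconstruction_weights and the sample data are hypothetical.

```python
import numpy as np


def reconstruction_weights(X, i, neighbors, reg=1e-3):
    """Weights that best rebuild X[i] from X[neighbors] and sum to 1."""
    diffs = X[i] - X[neighbors]             # row j is X_i - X_j
    G = diffs @ diffs.T                     # G_jk = (X_i - X_j)^T (X_i - X_k)
    # Assumption: a small ridge keeps G invertible when the neighbors are
    # linearly dependent (for example, more neighbors than dimensions).
    G = G + reg * np.trace(G) * np.eye(len(neighbors))
    w = np.linalg.solve(G, np.ones(len(neighbors)))   # G^{-1} 1
    return w / w.sum()                                # divide by 1^T G^{-1} 1


X = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]])
nbrs = [1, 2]
w = reconstruction_weights(X, 0, nbrs)
rebuilt = w @ X[nbrs]                       # weighted combination of neighbors
```

Here X[0] lies midway between its two neighbors, so the two weights come out equal and the weighted combination reproduces X[0].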
Step 5.3. Use the local embedding weights W to solve for the best mapping into the low-dimensional space, so that the embedding cost error ε(Y) in the p-dimensional space is minimized, that is:
Figure PCTCN2018108154-appb-000019
where Z_i is the embedded representation of X_i in the low-dimensional space, Z = {Z_1, Z_2, ..., Z_n}, F = (I_n - W)^T (I_n - W), tr denotes the trace of a matrix, n is the total number of samples in the data set, and I_n is the n-th order identity matrix, whose main-diagonal elements are all 1 and whose remaining elements are all 0. Z needs to satisfy
Figure PCTCN2018108154-appb-000020
This problem amounts to finding the smallest non-zero eigenvalues of the matrix F. Let u_1, u_2, ..., u_P be the eigenvectors corresponding to the P non-zero eigenvalues of F arranged in ascending order. Selecting the first p non-zero eigenvalues from smallest to largest, the sample data set embedded in the low-dimensional space is U = (u_1, u_2, ..., u_p)^T. In this way, the P-dimensional data set from step 4 becomes a p-dimensional data set through linear adaptive dimensionality reduction.
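The eigenvector selection of step 5.3 can be sketched with NumPy as follows. The sketch assumes W is stored as an n × n array whose row i holds the weights used to rebuild X_i, and it treats eigenvalues below a small tolerance tol as zero; both choices, the function name embed, and the ring-shaped example are assumptions for illustration.

```python
import numpy as np


def embed(W, p, tol=1e-9):
    """Return an n x p array whose row i is the embedding Z_i."""
    n = W.shape[0]
    I = np.eye(n)
    F = (I - W).T @ (I - W)
    # F is symmetric, so eigh applies; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(F)
    keep = np.flatnonzero(eigvals > tol)[:p]   # p smallest non-zero eigenvalues
    return eigvecs[:, keep]                    # columns are u_1, ..., u_p


# Toy weights: six points on a ring, each rebuilt as the mean of its two
# ring neighbors.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
Z = embed(W, p=2)
```

The returned array is the transpose of U = (u_1, ..., u_p)^T as written in the text, so each row is one embedded sample.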
Step 6. Support vector machine classification model building: use a support vector machine classifier to build a prediction model from the training data set.
Given a training sample set (X_i, y_i), i = 1, 2, ..., l, with X_i ∈ R^n and y_i ∈ {±1}, the hyperplane is written (W·X) + b = 0, with W, X ∈ R^n. For the separating surface to classify all samples correctly with a classification margin, the following constraint must be met:
y_i[(W·X_i) + b] ≥ 1, i = 1, 2, ..., l    (15)
The classification margin can be computed as 2/‖W‖, so the problem of constructing the optimal hyperplane is converted into minimizing φ(W) under constraint (15):
Figure PCTCN2018108154-appb-000021
This optimization problem is solved by introducing the Lagrange multiplier method:
Figure PCTCN2018108154-appb-000022
where a = (a_1, a_2, ..., a_l) and every a_i > 0 is a Lagrange multiplier. Solving gives the optimal weight vector W and the optimal bias b, respectively:
Figure PCTCN2018108154-appb-000023
and
Figure PCTCN2018108154-appb-000024
where j ∈ {j | a_j > 0}, and a satisfies
Figure PCTCN2018108154-appb-000025
and
Figure PCTCN2018108154-appb-000026
Figure PCTCN2018108154-appb-000027
Therefore, the optimal classification function is:
f(X) = sgn{(W·φ(X)) + b}    (18)
sgn returns an integer value that satisfies
Figure PCTCN2018108154-appb-000028
With this method, a support vector machine classification model is built from the brain wave data set and its corresponding epilepsy states and is used for epilepsy state detection.
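The last part of step 6 can be illustrated with a small sketch. It does not solve the optimization problem: the Lagrange multipliers a are taken as given, and the expressions W = Σ_i a_i y_i X_i and b = y_j - W·X_j for a support vector j are the standard hard-margin results, assumed here because the source shows them only as images. The feature map φ is taken to be the identity (a linear classifier), sgn is assumed to return 0 at zero, and all names and data are illustrative.

```python
def hyperplane_from_multipliers(X, y, a):
    """Recover (W, b) of a linear SVM from given Lagrange multipliers a."""
    dim = len(X[0])
    # W = sum_i a_i * y_i * X_i
    W = [sum(a_i * y_i * x[d] for a_i, y_i, x in zip(a, y, X))
         for d in range(dim)]
    # For any support vector j (a_j > 0): y_j * (W.X_j + b) = 1, and since
    # y_j is +1 or -1 this gives b = y_j - W.X_j.
    j = next(idx for idx, a_j in enumerate(a) if a_j > 0)
    b = y[j] - sum(w * v for w, v in zip(W, X[j]))
    return W, b


def classify(W, b, x):
    """f(x) = sgn(W.x + b), returning 1, 0 or -1."""
    s = sum(w * v for w, v in zip(W, x)) + b
    return (s > 0) - (s < 0)


# Two training points, one per state; a = (0.5, 0.5) solves this tiny problem.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
W, b = hyperplane_from_multipliers(X, y, a=[0.5, 0.5])
ictal = classify(W, b, (2.0, 3.0))          # lies on the +1 side
interictal = classify(W, b, (-0.5, 1.0))    # lies on the -1 side
```

In practice the multipliers would come from a quadratic programming solver or from an existing SVM library, and a non-linear kernel would replace the plain dot product.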
Step 7. Epilepsy state classification and prediction: use the established classification model to predict the state of brain waves whose state is unknown.
This step includes:
Step 7.1. Import the epilepsy brain wave data set whose states are unlabeled.
Step 7.2. Transform and process the data set by steps 2, 3, 4 and 5 in sequence.
Step 7.3. Use the support vector machine classification model built in step 6 to predict the state of the processed epilepsy brain wave data.

Claims (4)

  1. A method for detecting the epilepsy brain wave state based on machine learning, characterized by comprising the following steps:
    Step 1. Data import: import the brain wave data of patients with epilepsy and label its state.
    Step 2. Normalization: choose suitable new maximum and minimum values, and map the brain wave time-domain signal data to a smaller new value interval by a normalization transformation.
    Step 3. Time-frequency domain conversion: apply the fast Fourier transform to each brain wave time-domain record, and combine the amplitude-frequency components of each value to obtain its power spectrum.
    Step 4. Frequency range selection: keep a suitable low-frequency band in place of the full frequency-domain signal to remove high-frequency noise.
    Step 5. Linear adaptive dimensionality reduction of the frequency-domain signal: reduce the dimension of the data with the linear adaptive dimensionality reduction technique so that classification can be performed effectively.
    Step 6. Support vector machine classification model building: use a support vector machine classifier to build a prediction model from the training data set.
    Step 7. Epilepsy state classification and prediction: use the established classification model to predict the state of brain waves whose state is unknown.
  2. The method for detecting the epilepsy brain wave state based on machine learning according to claim 1, characterized in that in step 2 suitable new maximum and minimum values are chosen and the brain wave time-domain signal data is mapped to a smaller new value interval by a normalization transformation. Step 2 further comprises:
    Step 2.1. Obtain the maximum value max and the minimum value min from the brain wave data set.
    Step 2.2. Set the new maximum value new_max and the new minimum value new_min as required, and use the normalization formula:
    Figure PCTCN2018108154-appb-100001
    Apply the normalization to every value v in the data set (yielding v'), which maps its range from [min, max] to the interval [new_min, new_max].
  3. The method for detecting the epilepsy brain wave state based on machine learning according to claim 1, characterized in that in step 4 a frequency range is selected and a suitable low-frequency band is kept in place of the full frequency-domain signal to remove high-frequency noise. Step 4 further comprises:
    Step 4.1. Randomly select d samples from the data set, denoted {X_1, X_2, ..., X_d}, where X_i = (x_i1, x_i2, ..., x_im) and m is the dimension of the data set.
    Step 4.2. Find the minimum value P such that every sample X_t satisfies the condition:
    Figure PCTCN2018108154-appb-100002
    where R is a user-specified threshold with a recommended range of [0.9, 1). The intent is to represent the original sample with a small number of low-frequency components and to discard the high-frequency noise in the EEG signal.
    Step 4.3. Using the P obtained from formula (2), keep columns 1 to P of each original sample in place of the original data, which removes the noise.
  4. The method for detecting the epilepsy brain wave state based on machine learning according to claim 1, characterized in that in step 5 linear adaptive dimensionality reduction is applied to the frequency-domain signal so that classification can be performed effectively. Step 5 further comprises:
    Step 5.1. Denote the data set X = {X_1, X_2, ..., X_n}, where X_i = (x_i1, x_i2, ..., x_iP) and P is the dimension of the data set, and determine the adaptive final neighborhood points of each sample point X_i. Step 5.1 further comprises:
    Step 5.1.1. Cluster the data set with the K-Means algorithm based on Euclidean distance (K-Means is a distance-based clustering algorithm that uses distance as its similarity measure); the result is used when deciding the final neighborhood points.
    Step 5.1.2. Select the initial neighborhood points X_i1, X_i2, ..., X_ik of the sample point X_i, where k is the preset number of initial neighborhood points.
    Step 5.1.3. For the sample point X_i, compute the mean distance DM_i between X_i and its initial neighborhood points X_i1, X_i2, ..., X_ik:
    Figure PCTCN2018108154-appb-100003
    where D_ik is the Euclidean distance between the sample point X_i and its neighborhood point X_ik.
    Step 5.1.4. From the initial neighborhood points, select the final neighborhood points of each sample point X_i by the following condition: if the distance D_ik between the initial neighborhood point X_ik and the sample point X_i is less than the mean distance DM_i, or X_i and X_ik belong to the same cluster, then X_ik is a final neighborhood point of X_i; otherwise it is not.
    Step 5.2. Reconstruct X_i as a linear combination of the final neighborhood points determined in step 5.1, and compute the local embedding weight matrix W of the samples so that the reconstruction cost error ε(W) is minimized:
    Figure PCTCN2018108154-appb-100004
    where the coefficient ω_ij is the weight of X_j in the reconstruction of X_i; when X_j is not a neighborhood point of X_i, ω_ij = 0; and Σ_j ω_ij = 1; n is the total number of samples in the data set. Solving with the Lagrange multiplier method, the embedding weight matrix for the sample X_i is:
    Figure PCTCN2018108154-appb-100005
    where G_i = (G_jk) with G_jk = (X_i - X_j)^T (X_i - X_k) (X_j and X_k are neighborhood points of X_i); 1_n is the column vector of dimension n, i.e., 1_n = (1, 1, ..., 1)^T; 1_n^T is the transpose of 1_n; w_i = (ω_i1, ω_i2, ..., ω_in)^T and W = (w_1, w_2, ..., w_n).
    Step 5.3. Use the local embedding weights W to solve for the best mapping into the low-dimensional space, so that the embedding cost error ε(Y) in the p-dimensional space is minimized, that is:
    Figure PCTCN2018108154-appb-100006
    where Z_i is the embedded representation of X_i in the low-dimensional space, Z = {Z_1, Z_2, ..., Z_n}, F = (I_n - W)^T (I_n - W), tr denotes the trace of a matrix, n is the total number of samples in the data set, and I_n is the n-th order identity matrix, whose main-diagonal elements are all 1 and whose remaining elements are all 0. Z needs to satisfy
    Figure PCTCN2018108154-appb-100007
    This problem amounts to finding the smallest non-zero eigenvalues of the matrix F. Let u_1, u_2, ..., u_P be the eigenvectors corresponding to the P non-zero eigenvalues of F arranged in ascending order. Selecting the first p non-zero eigenvalues from smallest to largest, the sample data set embedded in the low-dimensional space is U = (u_1, u_2, ..., u_p)^T. In this way, the P-dimensional data set from step 4 becomes a p-dimensional data set through linear adaptive dimensionality reduction.
PCT/CN2018/108154 2018-09-27 2018-09-27 Epilepsy brain wave state detection method based on machine learning WO2020061971A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/108154 WO2020061971A1 (en) 2018-09-27 2018-09-27 Epilepsy brain wave state detection method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/108154 WO2020061971A1 (en) 2018-09-27 2018-09-27 Epilepsy brain wave state detection method based on machine learning

Publications (1)

Publication Number Publication Date
WO2020061971A1 true WO2020061971A1 (en) 2020-04-02

Family

ID=69950941

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108154 WO2020061971A1 (en) 2018-09-27 2018-09-27 Epilepsy brain wave state detection method based on machine learning

Country Status (1)

Country Link
WO (1) WO2020061971A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014039790A1 (en) * 2012-09-07 2014-03-13 Children's Medical Center Corporation Detection of epileptogenic brains with non-linear analysis of electromagnetic signals
CN104899436A (en) * 2015-05-29 2015-09-09 北京航空航天大学 Electroencephalogram signal time-frequency analysis method based on multi-scale radial basis function and improved particle swarm optimization algorithm
CN105956623A (en) * 2016-05-04 2016-09-21 太原理工大学 Epilepsy electroencephalogram signal classification method based on fuzzy entropy
CN106821376A (en) * 2017-03-28 2017-06-13 南京医科大学 A kind of epileptic attack early warning system and method based on deep learning algorithm
CN107049239A (en) * 2016-12-28 2017-08-18 苏州国科康成医疗科技有限公司 Epileptic electroencephalogram (eeg) feature extracting method based on wearable device
CN107569228A (en) * 2017-08-22 2018-01-12 北京航空航天大学 Encephalic EEG signals characteristic wave identification device based on band information and SVMs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YOU, TINGLING: "Research and Application of Preprocess Technology in Health Data", CNKI MASTER’S THESES, 28 February 2018 (2018-02-28) *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18934564

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18934564

Country of ref document: EP

Kind code of ref document: A1