CN113869382A - Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability - Google Patents
Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability
- Publication number
- CN113869382A (application CN202111084540.9A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- sample
- category
- semi
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
- A61B5/316—Modalities, i.e. specific diagnostic methods
- A61B5/369—Electroencephalography [EEG]
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/40—Detecting, measuring or recording for evaluating the nervous system
- A61B5/4076—Diagnosing or monitoring particular conditions of the nervous system
- A61B5/4094—Diagnosing or monitoring seizure diseases, e.g. epilepsy
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/259—Fusion by voting
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Public Health (AREA)
- Neurology (AREA)
- Veterinary Medicine (AREA)
- Evolutionary Biology (AREA)
- Heart & Thoracic Surgery (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Mathematical Physics (AREA)
- Physiology (AREA)
- Psychiatry (AREA)
- Neurosurgery (AREA)
- Psychology (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Signal Processing (AREA)
- Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
Abstract
The invention relates to the technical field of electroencephalogram signal identification, in particular to a semi-supervised learning method for identifying epileptic electroencephalogram signals based on domain embedding probability, comprising the following steps: 1. collecting and preprocessing electroencephalogram signals; 2. constructing a labeled set $X_l$ and an unlabeled set $X_u$; 3. forming a homogeneous sample pair set M and a heterogeneous sample pair set D; 4. constructing a semi-supervised incidence matrix on the data set X; 5. projecting each sample in X into a low-dimensional space; 6. constructing a domain embedding probability matrix; 7. updating the category labels of the k label vectors with the smallest one-dimensional entropy; 8. updating $X_l$, $X_u$, M and D; 9. judging whether the set $X_u$ is empty; 10. classifying the electroencephalogram signal sample under test $x_{test}$ to obtain the identification result. The invention uses feature projection and domain embedding, preserves the local structure of the data, gives the low-dimensional representation of the electroencephalogram signals high distinguishability and discriminability, and can accurately classify and identify epileptic electroencephalogram signals.
Description
Technical Field
The invention relates to the technical field of electroencephalogram signal identification, in particular to a semi-supervised learning method for identifying epileptic electroencephalogram signals based on domain embedding probability.
Background
Epilepsy is a disorder of brain function. During a seizure, a patient may experience temporary clouding of consciousness or uncontrollable convulsions, which cause great physical and mental harm to the patient and the patient's family. Electroencephalography can accurately record the various waveforms that occur during a seizure, so electroencephalogram analysis is an important basis for diagnosing epileptic seizures. Electroencephalogram signals are random and non-stationary; clinicians can judge an electroencephalogram subjectively using prior knowledge, but this is error-prone and inefficient. Automatic identification and monitoring of epileptic electroencephalogram signals helps improve the accuracy of manual diagnosis and reduce workload. In the big data era, machine learning is highly regarded as an important means of electroencephalogram analysis. The first step of machine-learning-based epileptic electroencephalogram signal identification is signal acquisition. Non-invasive acquisition only requires attaching electrodes to the corresponding scalp surface; it is simple, convenient and harmless to the subject, and is therefore widely used. The second step is preprocessing. Signals collected from scalp electrodes are very weak and often mixed with artifacts and noise, so after acquisition an effective preprocessing method is needed to remove redundant information, reduce the dimension and extract the useful signal. Common preprocessing methods include electrode screening, removal of artifacts such as electrooculogram and electromyogram activity, and other time-domain and spatial filtering methods. The third step is feature extraction. Current research on electroencephalogram feature extraction focuses mainly on time-domain analysis, frequency-domain analysis, combined time-frequency analysis, spatial filtering and nonlinear dynamics analysis. Once effective features have been extracted, they must be classified to achieve automatic epilepsy detection, so the classification algorithm is a key link in the design of an epilepsy recognition task.
Researchers have used a variety of methods to address this problem. Zhouyou proposed an electroencephalogram detection method and device using a wavelet neural network, in which the extracted feature vectors are fed into a classifier obtained from the wavelet neural network to mark abnormal electroencephalogram signals. Gong Guang hong et al. proposed a method for automatically identifying multi-stage epileptic electroencephalogram signals based on a supervised gradient boosting machine, which examines epileptic signals with a gradient boosting classifier. gac bin et al. proposed an epileptic seizure detection device and early warning system based on multi-data acquisition, which uses several extracted characteristic parameters as feature vectors to train the decision trees of a random forest classifier and form a random forest model. Meizhen et al. proposed an electroencephalogram signal processing method and epilepsy detection system that preprocesses the signals, eliminates frequency bands, extracts time-domain and entropy-based features, and finally selects an optimal feature subset with an improved correlation-based feature selection method.
However, these methods are traditional supervised classification methods: a large number of labeled electroencephalogram signal samples must be acquired to train a classifier that performs well, and acquiring them costs considerable time, labor and money. Automatic epilepsy detection with only a small number of labeled samples therefore has great research significance and practical value. In addition, electroencephalogram features generally have high dimensionality; high-dimensional data is harder to analyze than low-dimensional data and may contain useless and redundant feature information. These characteristics pose a major challenge to the practical processing of epileptic electroencephalogram signals.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: on the basis of improving the effect of the existing image segmentation method, and in view of the feature differences between electroencephalogram signals in different health states, to amplify those differences when samples are few and labeled samples are insufficient, express them in numerical form and have them recognized by a classifier, thereby forming an automatic epileptic electroencephalogram signal identification method that is fast, widely applicable and accurate. The method concerns the feature dimensionality reduction and classification used in the automatic detection of epileptic electroencephalogram signals; it preserves the integrity of the local information of the signals, enlarges the differences between electroencephalogram information in different states, and improves classification accuracy by using a suitable semi-supervised learning model.
The technical scheme adopted by the invention is as follows: a semi-supervised learning method for identifying epileptic electroencephalogram signals based on domain embedding probability, comprising the following steps:
Step 1: collect original electroencephalogram signals of different categories and preprocess them;
Step 2: perform feature extraction on the preprocessed electroencephalogram signals to obtain a feature data set $X = \{x_1, x_2, \ldots, x_n\}$ containing n training samples, where $x_i \in \mathbb{R}^d$ is the i-th feature vector in X and d denotes the dimension of a sample. The first l samples $\{x_1, x_2, \ldots, x_l\}$ carry electroencephalogram category labels and are denoted $X_l$; the corresponding category label matrix is $Y_l = \{y_1, y_2, \ldots, y_l\}$, a matrix of l rows and c columns in which the j-th component of the label vector $y_i$ indicates that sample $x_i$ belongs to the j-th class, and c is the number of electroencephalogram signal classes. The last (n-l) samples $\{x_{l+1}, x_{l+2}, \ldots, x_n\}$ of X are unlabeled and are denoted $X_u$; the corresponding category label matrix is $Y_u = \{y_{l+1}, y_{l+2}, \ldots, y_n\}$, a zero matrix of (n-l) rows and c columns;
Step 3: using the category label matrix $Y_l$, pair the samples in the set $X_l$ to form a homogeneous sample pair set M and a heterogeneous sample pair set D, where $M = \{(x_i, x_j) \mid y_i = y_j\}$ and $D = \{(x_i, x_j) \mid y_i \ne y_j\}$;
Step 4: construct a semi-supervised incidence matrix U on the data set X, in which the element $U_{ij}$ in row i, column j is defined by formula (1), where $1 \le i \le n$, $1 \le j \le n$, a is a positive number greater than 1, and b is a positive number greater than 1 and less than a;
Step 5: denote the projection matrix as $A \in \mathbb{R}^{d \times e}$ with $0 < e \le d$; each sample $x_i$ in the data set X is projected by A into the low-dimensional space $\mathbb{R}^e$, and its low-dimensional feature is expressed as:
$z_i = A^T x_i$, (2)
The projection matrix A is calculated by formula (3), in which I denotes the identity matrix, |M| and |D| denote the numbers of sample pairs in the sets M and D respectively, Tr{·} denotes the trace of a matrix, and the superscript T denotes the matrix transpose. A Lagrange multiplier α is introduced and formula (3) is solved by the Lagrange multiplier method to obtain formula (4). Eigenvalue decomposition is performed on the matrix obtained from formula (4), and A is updated with the eigenvectors corresponding to the e largest eigenvalues;
Step 6: construct the domain embedding probability matrix S of the low-dimensional space, in which the element $S_{ij}$ in row i, column j denotes the probability that $z_i$ selects $z_j$ as a neighbor and is calculated by formula (5), where $\mathrm{dis}(z_i, z_j)$ denotes the Euclidean distance from $z_i$ to $z_j$, and $S_{ij}$ follows a Gaussian distribution centered at $z_i$ with the identity matrix as its covariance matrix;
Step 7: input the semi-supervised incidence matrix U, the projection matrix A and the domain embedding probability matrix S into the semi-supervised learning model of domain embedding probability, and update the category labels of the k label vectors with the smallest one-dimensional entropy;
Step 7.1: the objective function of the semi-supervised learning model of domain embedding probability is given by formula (6), where $\|\cdot\|_2$ denotes the 2-norm and λ is a regularization parameter that is a positive real number. Let $G_{ij} = S_{ij} + \lambda U_{ij}$ and construct the matrix G with $G_{ij}$ as its elements; construct the diagonal matrix Q with $Q_{ii}$ as its diagonal elements, and split Q into 4 blocks after its l-th row and l-th column. Formula (6) can then be expressed as formula (7). Setting the partial derivative of formula (7) with respect to $Y_u$ equal to 0 gives the update expression of $Y_u$, formula (8);
Step 7.2: calculate the one-dimensional entropy of each label vector in $Y_u$ by formula (9), where $y_i \in Y_u$ and $y_{i,k}$ denotes the k-th component of the label vector $y_i$. Formula (9) expresses the information content of the components of a label vector and the uncertainty of the category to which it belongs: the larger the entropy, the greater the uncertainty of the category; the smaller the entropy, the greater the certainty;
Step 7.3: take the label vectors corresponding to the k smallest one-dimensional entropies and update the class labels of the corresponding samples according to their largest components, that is, if the j-th component of $y_i$ is the largest, sample $x_i$ is assigned to the j-th class;
Step 8: according to the class labels of the k label vectors with the smallest one-dimensional entropy, add the corresponding samples to the set $X_l$, delete them from the set $X_u$, and reconstruct the sets M and D;
Step 9: judge whether the set $X_u$ is empty; if not, go to step 4; if so, go to step 10;
Step 10: collect the electroencephalogram signal sample under test $x_{test}$ and, after preprocessing, calculate its low-dimensional feature representation $z_{test}$ by formula (2). Calculate the distance $\mathrm{dis}(z_i, z_{test})$ between $z_{test}$ and the low-dimensional feature representation $z_i$ of each sample in the data set $X_l$, select the r feature vectors with the smallest $\mathrm{dis}(z_i, z_{test})$, and by voting take the category that occurs most often among the r feature vectors as the category of $x_{test}$; finally, judge from the category of $x_{test}$ whether it is an epileptic electroencephalogram signal.
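The nearest-neighbor vote of step 10 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `knn_vote`, the toy coordinates and the class names are assumptions, and ties are broken in favor of the class encountered first.

```python
from collections import Counter

import numpy as np


def knn_vote(z_test, Z_labeled, labels, r=5):
    """Return the majority class among the r labeled samples nearest to z_test."""
    # Euclidean distance dis(z_i, z_test) to every labeled low-dimensional sample
    dists = np.linalg.norm(Z_labeled - z_test, axis=1)
    # Indices of the r smallest distances
    nearest = np.argsort(dists)[:r]
    # Majority vote over the classes of those r neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy low-dimensional features (hypothetical values)
Z = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 4.9]])
y = ["normal", "normal", "normal", "ictal", "ictal"]
predicted = knn_vote(np.array([0.05, 0.05]), Z, y, r=3)
```

An odd r reduces the chance of ties when only two classes compete among the neighbors.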
The invention has the following beneficial effects:
1. the method is based on a semi-supervised machine learning algorithm and needs only a small number of labeled samples for model training, which removes a large amount of manual labeling work;
2. the method learns a low-dimensional representation of the samples using feature projection and domain embedding; it preserves the local structure of the data while fusing the pairing relations among samples into the model, so that the low-dimensional representation of epileptic electroencephalogram signals is distinguishable and discriminative;
3. unlike common approaches that complete the dimensionality reduction task and the classification task in separate stages, the method integrates the low-dimensional representation of the samples and the identification of sample labels into one learning task;
4. through iterative optimization, the invention reduces the demand of semi-supervised learning for labeled samples step by step, until the label information of all training samples has been identified.
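The selection of confident samples in each iteration (steps 7.2 and 7.3) can be sketched as below. Formula (9) is not legible in this text, so the sketch assumes the usual Shannon entropy of each row-normalized label vector; the function names, the `eps` smoothing constant and the toy matrix are likewise assumptions, and the label scores are assumed non-negative.

```python
import numpy as np


def label_entropy(Y, eps=1e-12):
    """Shannon entropy of each row of Y (assumed form of the one-dimensional entropy)."""
    # Normalize each row so it can be read as a distribution over classes
    P = Y / (Y.sum(axis=1, keepdims=True) + eps)
    return -(P * np.log(P + eps)).sum(axis=1)


def select_confident(Y_u, k):
    """Pick the k rows of Y_u with the smallest entropy and their arg-max classes."""
    H = label_entropy(Y_u)
    idx = np.argsort(H)[:k]             # k most certain unlabeled samples
    classes = Y_u[idx].argmax(axis=1)   # class = position of the largest component
    return idx, classes


# Toy soft-label matrix for three unlabeled samples and three classes
Y_u = np.array([[0.90, 0.05, 0.05],
                [0.34, 0.33, 0.33],
                [0.10, 0.80, 0.10]])
idx, classes = select_confident(Y_u, k=2)
```

Rows 0 and 2 are selected because their mass is concentrated on one class, while the near-uniform row 1 stays unlabeled for a later iteration.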
Drawings
FIG. 1 is a flow chart of a domain embedding probability-based semi-supervised learning epileptic electroencephalogram signal identification method of the present invention;
FIG. 2 is a diagram of brain electrical signals in accordance with one embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples, which are simplified schematic drawings and illustrate only the basic structure of the invention in a schematic manner, and therefore only show the structures relevant to the invention.
Fig. 1 is a flowchart of a domain embedding probability-based semi-supervised learning epileptic electroencephalogram signal identification method according to an embodiment of the invention, and the method comprises the following 10 steps.
Step 1: acquire original electroencephalogram signals of different categories and preprocess them. This embodiment takes the epilepsy electroencephalogram data set of the University of Bonn, Germany, as the original signals, as shown in Fig. 2. The data set is divided into five subsets, Set A to Set E; each subset contains 100 samples of the same type, and each sample is an electroencephalogram time series of 4097 points sampled at 173.61 Hz with a duration of 23.6 s, from which artifacts have been removed manually with filtering at 0.53-40 Hz. Set A and Set B are electroencephalogram signals collected from 5 healthy subjects with eyes open and eyes closed, respectively; Set C and Set D are electroencephalogram signals collected from 5 epileptic patients during seizure-free intervals, in the area contralateral to the focus and in the focus area, respectively; Set E is electroencephalogram signals collected from the focus area during seizures. In this embodiment, Set A and Set B are treated as normal data, Set C and Set D as interictal (seizure-free interval) data, and Set E as ictal (seizure) data. 80% of the data set is used for training and the remaining 20% for testing. The original signals are preprocessed with EEGLAB, an open source toolbox for MATLAB, including down-sampling, filtering, electrode re-referencing, baseline correction and independent component analysis, to obtain electroencephalogram signals that are as free of noise as possible;
Step 2: perform feature extraction on the preprocessed electroencephalogram signals. In this embodiment, a 4-level discrete wavelet transform decomposition with the dmey wavelet basis is applied to the electroencephalogram data set to obtain wavelet packet nodes for 16 frequency band spaces, and 3 types of features are extracted from the original signals at frequencies below 27.13 Hz. The first type is time-domain features, consisting of descriptive statistics: mean, median, minimum, maximum, skewness, standard deviation, peak, first quartile, third quartile and interquartile range. The second type is entropy-based features: the sample entropy reflects signal complexity by measuring the probability that a new pattern is generated in the signal, and the larger its value, the more complex the corresponding sample sequence. The third type is time-frequency domain features: the band energy feature reflects the energy of the electroencephalogram signal in a time-frequency localized space, and the projection of the original signal onto each orthogonal wavelet packet space at each decomposition level represents the time-frequency localized information of the source signal in the corresponding time-frequency resolution space. The 3 types of features total 58 dimensions.
Feature extraction yields a feature data set $X = \{x_1, x_2, \ldots, x_n\}$ containing n training samples, where $x_i \in \mathbb{R}^d$ is the i-th feature vector in X and d denotes the dimension of a sample. The first l samples $\{x_1, x_2, \ldots, x_l\}$ carry electroencephalogram category labels and are denoted $X_l$; the corresponding category label matrix is $Y_l = \{y_1, y_2, \ldots, y_l\}$, a matrix of l rows and c columns in which the j-th component of the label vector $y_i$ indicates that sample $x_i$ belongs to the j-th class, and c is the number of electroencephalogram signal classes. The last (n-l) samples $\{x_{l+1}, x_{l+2}, \ldots, x_n\}$ of X are unlabeled and are denoted $X_u$; the corresponding category label matrix is $Y_u = \{y_{l+1}, y_{l+2}, \ldots, y_n\}$, a zero matrix of (n-l) rows and c columns. In this embodiment n = 160 and l = 20;
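The descriptive statistical features named in step 2 can be computed for a single signal segment roughly as follows. This is an illustrative sketch only: the function name and dictionary keys are assumptions, skewness is computed as the standardized third central moment, quartiles use NumPy's default linear interpolation, and the "peak" feature is omitted because its exact definition is not given here.

```python
import numpy as np


def descriptive_features(signal):
    """Time-domain descriptive statistics of one EEG segment (illustrative subset)."""
    x = np.asarray(signal, dtype=float)
    mean, std = x.mean(), x.std()
    q1, q3 = np.percentile(x, [25, 75])
    # Standardized third central moment; defined as 0 for a constant signal
    skew = ((x - mean) ** 3).mean() / std ** 3 if std > 0 else 0.0
    return {
        "mean": mean,
        "median": np.median(x),
        "min": x.min(),
        "max": x.max(),
        "skewness": skew,
        "std": std,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


feats = descriptive_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

In a full pipeline these values would be concatenated with the entropy and band-energy features to form one feature vector per sample.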
Step 3: using the category label matrix $Y_l$, pair the samples in the set $X_l$ to form a homogeneous sample pair set M and a heterogeneous sample pair set D, where $M = \{(x_i, x_j) \mid y_i = y_j\}$ and $D = \{(x_i, x_j) \mid y_i \ne y_j\}$;
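Building the two pair sets of step 3 from the labeled samples can be sketched as below. The sketch stores index pairs rather than the feature vectors themselves and enumerates each unordered pair once (i < j); both choices, and the function name `build_pair_sets`, are assumptions made for illustration.

```python
from itertools import combinations


def build_pair_sets(labels):
    """Split all unordered pairs of labeled samples into same-class (M) and different-class (D)."""
    M, D = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            M.append((i, j))   # homogeneous pair: y_i == y_j
        else:
            D.append((i, j))   # heterogeneous pair: y_i != y_j
    return M, D


# Four labeled samples with class indices 0, 0, 1, 2
M, D = build_pair_sets([0, 0, 1, 2])
```

With l labeled samples there are l(l-1)/2 unordered pairs in total, so |M| + |D| grows quadratically as step 8 moves samples from $X_u$ into $X_l$.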
Step 4: construct a semi-supervised incidence matrix U on the data set X, in which the element $U_{ij}$ in row i, column j is defined by formula (1), where $1 \le i \le n$, $1 \le j \le n$, a is a positive number greater than 1, and b is a positive number greater than 1 and less than a; in this embodiment a = 3 and b = 2;
Step 5: denote the projection matrix as $A \in \mathbb{R}^{d \times e}$, with e = 20 in this embodiment; each sample $x_i$ in the data set X is projected by A into the low-dimensional space $\mathbb{R}^e$, and its low-dimensional feature is expressed as:
$z_i = A^T x_i$, (2)
The projection matrix A is calculated by formula (3), in which I denotes the identity matrix, |M| and |D| denote the numbers of sample pairs in the sets M and D respectively, Tr{·} denotes the trace of a matrix, and the superscript T denotes the matrix transpose. A Lagrange multiplier α is introduced and formula (3) is solved by the Lagrange multiplier method to obtain formula (4). Eigenvalue decomposition is performed on the matrix obtained from formula (4), and A is updated with the eigenvectors corresponding to the e largest eigenvalues;
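The last part of step 5, taking the eigenvectors of the e largest eigenvalues as the columns of A and projecting with formula (2), can be sketched as follows. The matrix that is decomposed comes from formula (4), which is not legible in this text, so the sketch uses a hypothetical symmetric stand-in `C`; the function names are also assumptions.

```python
import numpy as np


def top_e_projection(C, e):
    """Columns are eigenvectors of the symmetric matrix C for its e largest eigenvalues."""
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:e]   # indices of the e largest eigenvalues
    return eigvecs[:, order]                # shape (d, e)


def project(A, X):
    """Apply z_i = A^T x_i to every row x_i of X."""
    return X @ A


# Hypothetical stand-in for the matrix obtained from formula (4)
C = np.diag([1.0, 5.0, 3.0])
A = top_e_projection(C, e=2)
Z = project(A, np.array([[1.0, 2.0, 3.0]]))
```

Eigenvectors are only determined up to sign, so individual coordinates of `Z` may flip sign between runs or libraries without changing the distances used in later steps.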
Step 6: construct the domain embedding probability matrix S of the low-dimensional space, in which the element $S_{ij}$ in row i, column j denotes the probability that $z_i$ selects $z_j$ as a neighbor and is calculated by formula (5), where $\mathrm{dis}(z_i, z_j)$ denotes the Euclidean distance from $z_i$ to $z_j$, and $S_{ij}$ follows a Gaussian distribution centered at $z_i$ with the identity matrix as its covariance matrix;
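One plausible reading of step 6 is sketched below. Formula (5) is not legible in this text, so the normalization used here is an assumption borrowed from stochastic neighbor embedding: a unit-variance Gaussian kernel on squared Euclidean distance, zero self-probability, and rows normalized to sum to 1. The function name is hypothetical.

```python
import numpy as np


def neighbor_probabilities(Z):
    """Row-stochastic matrix S where S[i, j] is the probability that z_i picks z_j as neighbor."""
    # Pairwise squared Euclidean distances dis(z_i, z_j)^2
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    # Gaussian centered at z_i with identity covariance (assumed form of formula (5))
    W = np.exp(-sq / 2.0)
    np.fill_diagonal(W, 0.0)   # a point does not select itself
    return W / W.sum(axis=1, keepdims=True)


# Three points on a line: the first two are close, the third is far away
Z = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
S = neighbor_probabilities(Z)
```

Under this reading S is generally not symmetric, because each row is normalized by its own point's neighborhood.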
Step 7: input the semi-supervised incidence matrix U, the projection matrix A and the domain embedding probability matrix S into the semi-supervised learning model of domain embedding probability, and determine the category labels of the k label vectors with the smallest one-dimensional entropy; in this embodiment k = 7;
Step 7.1: the objective function of the semi-supervised learning model of domain embedding probability is given by formula (6), where $\|\cdot\|_2$ denotes the 2-norm and λ is a regularization parameter that is a positive real number. Let $G_{ij} = S_{ij} + \lambda U_{ij}$ and construct the matrix G with $G_{ij}$ as its elements; construct the diagonal matrix Q with $Q_{ii}$ as its diagonal elements, and split Q into 4 blocks after its l-th row and l-th column. Formula (6) can then be expressed as formula (7). Setting the partial derivative of formula (7) with respect to $Y_u$ equal to 0 gives the update expression of $Y_u$, formula (8);
Step 7.2: calculate the one-dimensional entropy of each label vector in $Y_u$ by formula (9), where $y_i \in Y_u$ and $y_{i,k}$ denotes the k-th component of the label vector $y_i$. Formula (9) expresses the information content of the components of a label vector and the uncertainty of the category to which it belongs: the larger the entropy, the greater the uncertainty of the category; the smaller the entropy, the greater the certainty;
Step 7.3: take the label vectors corresponding to the k smallest one-dimensional entropies and update the class labels of the corresponding samples according to their largest components, that is, if the j-th component of $y_i$ is the largest, sample $x_i$ is assigned to the j-th class;
Step 8: according to the class labels of the k label vectors with the smallest one-dimensional entropy, add the corresponding samples to the set $X_l$, delete them from the set $X_u$, and reconstruct the sets M and D;
Step 9: judge whether the set $X_u$ is empty; if not, go to step 4; if so, go to step 10;
Step 10: collect the electroencephalogram signal sample under test $x_{test}$ and, after preprocessing, calculate its low-dimensional feature representation $z_{test}$ by formula (2). Calculate the distance $\mathrm{dis}(z_i, z_{test})$ between $z_{test}$ and the low-dimensional feature representation $z_i$ of each sample in the data set $X_l$, select the r feature vectors with the smallest $\mathrm{dis}(z_i, z_{test})$, and by voting take the category that occurs most often among the r feature vectors as the category of $x_{test}$; finally, judge from the category of $x_{test}$ whether it is an epileptic electroencephalogram signal. In this embodiment r = 5. The indexes used to evaluate classification performance in this embodiment are classification accuracy, recall and F-Score, and Table 1 lists the result for each index. As the results in Table 1 show, the classifier is accurate on all 3 types of electroencephalogram signals and can efficiently identify epileptic electroencephalogram signals automatically.
Table 1. Statistical indexes of this embodiment
Subset | Accuracy | Recall | F-Score |
Set A and Set B | 94.56% | 95.05% | 94.88% |
Set C and Set D | 95.24% | 95.75% | 95.60% |
Set E | 96.49% | 96.91% | 96.64% |
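Per-class indexes of the kind reported in Table 1 can be computed from predictions as sketched below. The text does not state how the first column was calculated, so this sketch assumes per-class precision for illustration; the function name and the toy label lists are hypothetical and are not the patent's data.

```python
def per_class_scores(y_true, y_pred, cls):
    """Precision, recall and F1 of one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)   # correctly predicted as cls
    fp = sum(t != cls and p == cls for t, p in pairs)   # wrongly predicted as cls
    fn = sum(t == cls and p != cls for t, p in pairs)   # cls samples that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with the three classes used in the embodiment
y_true = ["ictal", "ictal", "normal", "normal", "interictal"]
y_pred = ["ictal", "normal", "normal", "normal", "interictal"]
scores = per_class_scores(y_true, y_pred, "normal")
```

For the "normal" class in this toy example, one ictal sample is misclassified as normal, which lowers precision while recall stays at 1.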
The method is based on a semi-supervised machine learning algorithm and needs only a small number of labeled samples for model training, which removes a large amount of manual labeling work. It learns a low-dimensional representation of the samples using feature projection and domain embedding, preserving the local structure of the data while fusing the pairing relations among samples into the model, so that the low-dimensional representation of epileptic electroencephalogram signals is distinguishable and discriminative. Unlike common approaches that complete the dimensionality reduction task and the classification task in separate stages, it integrates the low-dimensional representation of the samples and the identification of sample labels into one learning task. Through iterative optimization, the demand of semi-supervised learning for labeled samples is reduced step by step until the label information of all training samples has been identified.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.
Claims (3)
1. A semi-supervised learning method for identifying epileptic electroencephalogram signals based on domain embedding probability, characterized by comprising the following steps:
Step 1: collect original electroencephalogram signals of different categories and preprocess them;
Step 2: perform feature extraction on the preprocessed electroencephalogram signals to obtain a feature data set $X = \{x_1, x_2, \ldots, x_n\}$ containing n training samples, where $x_i \in \mathbb{R}^d$ is the i-th feature vector in X and d denotes the dimension of a sample. The first l samples $\{x_1, x_2, \ldots, x_l\}$ carry electroencephalogram category labels and are denoted $X_l$; the corresponding category label matrix is $Y_l = \{y_1, y_2, \ldots, y_l\}$, a matrix of l rows and c columns in which the j-th component of the label vector $y_i$ indicates that sample $x_i$ belongs to the j-th class, and c is the number of electroencephalogram signal classes. The last (n-l) samples $\{x_{l+1}, x_{l+2}, \ldots, x_n\}$ of X are unlabeled and are denoted $X_u$; the corresponding category label matrix is $Y_u = \{y_{l+1}, y_{l+2}, \ldots, y_n\}$, a zero matrix of (n-l) rows and c columns;
Step 3: using the category label matrix $Y_l$, pair the samples in the set $X_l$ to form a homogeneous sample pair set M and a heterogeneous sample pair set D, where $M = \{(x_i, x_j) \mid y_i = y_j\}$ and $D = \{(x_i, x_j) \mid y_i \ne y_j\}$;
step 4: constructing a semi-supervised incidence matrix U on the data set X, where the element $U_{ij}$ in the i-th row and j-th column of U is defined as:
where $1\le i\le n$, $1\le j\le n$, a is a positive number greater than 1, and b is a positive number greater than 1 and less than a;
step 5: denoting the projection matrix as $A\in R^{d\times e}$, $0<e\le d$, and using A to project each sample $x_i$ in the data set X into the low-dimensional space $R^{e}$, where the low-dimensional feature is expressed as:
$z_i=A^{T}x_i$, (2)
the projection matrix A is calculated as:
where I denotes the identity matrix, |M| and |D| denote the numbers of sample pairs in the sets M and D respectively, Tr{·} denotes the trace of a matrix, T denotes matrix transposition, and the following is set:
introducing a Lagrange coefficient α and solving formula (3) with the Lagrange multiplier method gives:
eigenvalue decomposition is performed on the resulting matrix, and the matrix A is updated with the eigenvectors corresponding to the e largest eigenvalues;
step 6: constructing a domain embedding probability matrix S in the low-dimensional space, where the element $S_{ij}$ in the i-th row and j-th column of S denotes the probability that $z_i$ selects $z_j$ as a neighbor, and $S_{ij}$ is calculated as:
where $\mathrm{dis}(z_i,z_j)$ denotes the Euclidean distance from $z_i$ to $z_j$, and $S_{ij}$ follows a Gaussian distribution centered at $z_i$ with the identity matrix as its covariance matrix;
step 7: inputting the semi-supervised incidence matrix U, the projection matrix A and the domain embedding probability matrix S into the semi-supervised learning model of domain embedding probability, and updating the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy;
step 8: according to the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy, adding the corresponding samples to the set $X_l$, deleting them from the set $X_u$, and reconstructing the sets M and D;
step 9: determining whether the set $X_u$ is empty; if it is not empty, going to step 4, and if it is empty, going to step 10;
step 10: collecting an electroencephalogram signal sample $x_{test}$ to be tested, preprocessing it, and computing its low-dimensional feature representation $z_{test}$ according to formula (2); calculating the distance $\mathrm{dis}(z_i,z_{test})$ between $z_{test}$ and the low-dimensional feature $z_i$ of each sample in the data set $X_l$; selecting the r feature vectors with the smallest $\mathrm{dis}(z_i,z_{test})$ and, by voting, taking the category that occurs most often among these r feature vectors as the category of $x_{test}$; and finally determining from the category of $x_{test}$ whether $x_{test}$ is an epileptic electroencephalogram signal.
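Steps 3, 5 (formula (2)) and 10 of claim 1 are stated completely in the text, so they can be illustrated with a minimal Python sketch. The function names, the use of integer class ids instead of one-hot label vectors, and the choice to store each unordered pair once as index pairs (i, j) with i < j are assumptions made for this illustration; the projection matrix A is taken as given, because the formulas that define it are not reproduced in the text.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def build_pairs(labels):
    """Step 3: split all pairs of labeled samples into same-class (M)
    and different-class (D) index pairs (assumption: i < j, stored once)."""
    M, D = [], []
    for i, j in combinations(range(len(labels)), 2):
        (M if labels[i] == labels[j] else D).append((i, j))
    return M, D


def project(A, X):
    """Formula (2): z_i = A^T x_i for every row x_i of X.
    X has shape (n, d), A has shape (d, e); the result has shape (n, e)."""
    return X @ A


def knn_vote(Z_train, labels, z_test, r):
    """Step 10: majority vote among the r training features closest
    to z_test in Euclidean distance."""
    dists = np.linalg.norm(Z_train - z_test, axis=1)
    nearest = np.argsort(dists, kind="stable")[:r]
    # On a tie, Counter returns the class that was encountered first,
    # i.e. the class of the nearest tied neighbor.
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

For example, with labels `[0, 0, 1, 1]`, `build_pairs` returns `M = [(0, 1), (2, 3)]` and four pairs in `D`. An odd value of r avoids ties in the two-class case.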
2. The semi-supervised learning method for identifying epileptic electroencephalogram signals based on domain embedding probability according to claim 1, wherein the objective function of the semi-supervised learning model of domain embedding probability in step 7 is expressed as:
where $\|\cdot\|_2$ denotes the 2-norm and λ is a regularization parameter taking a positive real value; letting $G_{ij}=S_{ij}+\lambda U_{ij}$, a matrix G is constructed with $G_{ij}$ as its elements; a diagonal matrix Q is constructed with $Q_{ii}$ as its diagonal elements; the Q matrix is split into 4 blocks after its l-th row and l-th column; formula (6) can then be expressed as:
setting the derivative of formula (7) with respect to $Y_u$ equal to 0 gives the update expression of $Y_u$:
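Formulas (5) to (8) are not reproduced in the text, so the following Python sketch is only an illustration under stated assumptions, not the claimed formulas. It assumes (a) that $S_{ij}$ is a row-normalised Gaussian affinity $\exp(-\mathrm{dis}(z_i,z_j)^2/2)$ with $S_{ii}=0$, which is one common reading of "a Gaussian centered at $z_i$ with identity covariance", and (b) that the objective is the usual graph-smoothness term $\sum_{ij}G_{ij}\|y_i-y_j\|_2^2$, whose minimiser over $Y_u$ is the harmonic-function solution of a block-partitioned graph Laplacian. The function names are hypothetical, and the incidence matrix U is passed in as an argument because its definition (formula (1)) is also missing.

```python
import numpy as np


def neighbor_probabilities(Z):
    """Assumed form of S: row-normalised Gaussian affinities between the
    rows of Z, with zero probability of a point selecting itself."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    W = np.exp(-0.5 * sq)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)


def propagate_labels(S, U, Y_l, lam):
    """Assumed update for Y_u: minimise sum_ij G_ij * ||y_i - y_j||^2
    with G = S + lam * U, keeping the first l rows fixed to Y_l."""
    l = Y_l.shape[0]
    G = S + lam * U
    # The quadratic objective depends only on the symmetric part of G.
    W = 0.5 * (G + G.T)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    L_uu, L_ul = L[l:, l:], L[l:, :l]       # blocks split after row/column l
    # Zero gradient with respect to Y_u: L_uu @ Y_u + L_ul @ Y_l = 0
    return np.linalg.solve(L_uu, -L_ul @ Y_l)
```

With one-hot rows in `Y_l` and a connected graph, each row of the returned matrix sums to 1, so it can be read as a soft class assignment for the corresponding unlabeled sample.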
3. The semi-supervised learning method for identifying epileptic electroencephalogram signals based on domain embedding probability according to claim 1, wherein updating the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy in step 7 comprises: calculating the one-dimensional entropy of each label vector in $Y_u$, the one-dimensional entropy being calculated as:
where $y_i\in Y_u$ and $y_{i,k}$ denotes the k-th component of the label vector $y_i$; formula (9) expresses the amount of information in the components of a label vector and the uncertainty of the category to which the sample belongs: the larger the entropy, the greater the uncertainty of the category, and the smaller the entropy, the smaller the uncertainty; the k feature vectors with the smallest one-dimensional entropy are then extracted.
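The selection described in claim 3 can be sketched in Python as follows. Formula (9) is not reproduced in the text, so the sketch assumes the Shannon entropy $-\sum_k y_{i,k}\log y_{i,k}$ of each label vector after normalising it to sum to 1; the function names, the `eps` guard, and the conversion of the selected rows to one-hot labels via argmax are likewise assumptions. Note that the claim uses k both for the component index and for the number of selected vectors; in the code, `k` is only the number selected.

```python
import numpy as np


def label_entropy(Y_u, eps=1e-12):
    """Assumed one-dimensional entropy of each row of Y_u.
    All-zero rows (no label information yet) get infinite entropy."""
    P = np.clip(Y_u, 0.0, None)
    totals = P.sum(axis=1, keepdims=True)
    P = P / np.where(totals > 0, totals, 1.0)
    H = -np.sum(P * np.log(P + eps), axis=1)
    return np.where(totals[:, 0] > 0, H, np.inf)


def select_most_certain(Y_u, k):
    """Return the indices of the k lowest-entropy rows of Y_u and
    one-hot labels for them (assumption: hard label = argmax)."""
    order = np.argsort(label_entropy(Y_u), kind="stable")[:k]
    hard = np.eye(Y_u.shape[1])[np.argmax(Y_u[order], axis=1)]
    return order, hard
```

In the iteration of steps 7 to 9, the returned indices would identify the samples moved from $X_u$ to $X_l$, with the hard labels appended to $Y_l$.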
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111084540.9A CN113869382A (en) | 2021-09-16 | 2021-09-16 | Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111084540.9A CN113869382A (en) | 2021-09-16 | 2021-09-16 | Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113869382A true CN113869382A (en) | 2021-12-31 |
Family
ID=78996128
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111084540.9A Withdrawn CN113869382A (en) | 2021-09-16 | 2021-09-16 | Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113869382A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115251909A (en) * | 2022-07-15 | 2022-11-01 | 山东大学 | Electroencephalogram signal hearing assessment method and device based on space-time convolutional neural network |
CN115905837A (en) * | 2022-11-17 | 2023-04-04 | 杭州电子科技大学 | Semi-supervised self-adaptive labeling regression electroencephalogram emotion recognition method for automatic abnormality detection |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115251909A (en) * | 2022-07-15 | 2022-11-01 | 山东大学 | Electroencephalogram signal hearing assessment method and device based on space-time convolutional neural network |
CN115251909B (en) * | 2022-07-15 | 2024-04-30 | 山东大学 | Method and device for evaluating hearing by electroencephalogram signals based on space-time convolutional neural network |
CN115905837A (en) * | 2022-11-17 | 2023-04-04 | 杭州电子科技大学 | Semi-supervised self-adaptive labeling regression electroencephalogram emotion recognition method for automatic abnormality detection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shoeibi et al. | A comprehensive comparison of handcrafted features and convolutional autoencoders for epileptic seizures detection in EEG signals | |
Li et al. | Epileptic seizure detection in EEG signals using a unified temporal-spectral squeeze-and-excitation network | |
CN107924472B (en) | Image classification method and system based on brain computer interface | |
Li et al. | Automatic epileptic EEG detection using DT-CWT-based non-linear features | |
Ilakiyaselvan et al. | Deep learning approach to detect seizure using reconstructed phase space images | |
Zhang et al. | Bi-dimensional approach based on transfer learning for alcoholism pre-disposition classification via EEG signals | |
CN114176607B (en) | Electroencephalogram signal classification method based on vision transducer | |
Agarwal et al. | Classification of alcoholic and non-alcoholic EEG signals based on sliding-SSA and independent component analysis | |
CN113869382A (en) | Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability | |
CN114366124A (en) | Epilepsia electroencephalogram identification method based on semi-supervised deep convolution channel attention single classification network | |
CN112438741B (en) | Driving state detection method and system based on electroencephalogram feature transfer learning | |
Dissanayake et al. | Patient-independent epileptic seizure prediction using deep learning models | |
CN116864121A (en) | Health risk screening system | |
Zhou et al. | Phase space reconstruction, geometric filtering based Fisher discriminant analysis and minimum distance to the Riemannian means algorithm for epileptic seizure classification | |
Guan | Application of logistic regression algorithm in the diagnosis of expression disorder in Parkinson's disease | |
Ohannesian et al. | Epileptic seizures detection from EEG recordings based on a hybrid system of Gaussian mixture model and random forest classifier | |
CN107045624A (en) | Electroencephalogram signal preprocessing and classifying method based on maximum weighted cluster | |
Jasira et al. | DyslexiScan: a dyslexia detection method from handwriting using CNN LSTM model | |
Wu et al. | SCNet: A spatial feature fused convolutional network for multi-channel EEG pathology detection | |
Goshvarpour et al. | An innovative information-based strategy for epileptic EEG classification | |
CN116982993A (en) | Electroencephalogram signal classification method and system based on high-dimensional random matrix theory | |
CN116340825A (en) | Method for classifying cross-tested RSVP (respiratory tract protocol) electroencephalogram signals based on transfer learning | |
Ulhaq et al. | Epilepsy seizures classification with EEG signals: A machine learning approach | |
Satapathy et al. | A deep neural model CNN-LSTM network for automated sleep staging based on a single-channel EEG signal | |
Hamad et al. | A hybrid automated detection of epileptic seizures in EEG based on wavelet and machine learning techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20211231 |