CN113869382A - Semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability - Google Patents

Semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability

Info

Publication number
CN113869382A
Authority
CN
China
Prior art keywords
matrix
sample
category
semi
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111084540.9A
Other languages
Chinese (zh)
Inventor
倪彤光
顾晓清
蒋亦樟
薛婧
钱鹏江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN202111084540.9A priority Critical patent/CN113869382A/en
Publication of CN113869382A publication Critical patent/CN113869382A/en
Withdrawn legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/40 Detecting, measuring or recording for evaluating the nervous system
    • A61B5/4076 Diagnosing or monitoring particular conditions of the nervous system
    • A61B5/4094 Diagnosing or monitoring seizure diseases, e.g. epilepsy
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Abstract

The invention relates to the technical field of electroencephalogram signal identification, and in particular to a semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability, comprising the following steps: 1. collect and preprocess electroencephalogram signals; 2. construct a labeled set X_l and an unlabeled set X_u; 3. form a homogeneous sample pair set M and a heterogeneous sample pair set D; 4. construct a semi-supervised incidence matrix on the data set X; 5. project each sample in X into a low-dimensional space; 6. construct a domain embedding probability matrix; 7. update the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy; 8. update X_l, X_u, M and D; 9. judge whether the set X_u is empty; 10. classify the electroencephalogram test sample x_test to obtain the identification result. Using feature projection and domain embedding, the invention preserves the local structure of the data, gives the low-dimensional representation of the electroencephalogram signals high distinguishability and discriminability, and can accurately classify and identify epileptic electroencephalogram signals.

Description

Semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability
Technical Field
The invention relates to the technical field of electroencephalogram signal identification, and in particular to a semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability.
Background
Epilepsy is a brain dysfunction disorder. During a seizure the patient may experience temporary clouding of consciousness or uncontrollable convulsions, causing great physical and mental harm to the patient and his or her family. Electroencephalography can accurately record the various waveforms that occur during epileptic activity, so electroencephalogram analysis is an important basis for diagnosing epileptic seizures. Electroencephalogram signals are random and non-stationary; clinicians can judge electroencephalograms subjectively by combining prior knowledge, but such judgment is error-prone and inefficient. Automatic identification and monitoring of epileptic electroencephalogram signals helps improve diagnostic accuracy and reduce workload. In the big-data era, machine learning is regarded as an important means of electroencephalogram analysis. The first step of machine-learning-based epileptic electroencephalogram signal identification is acquisition of the electroencephalogram signals. Non-invasive acquisition only requires attaching electrodes to the corresponding scalp surface; because this mode of acquisition is simple and harmless to the subject, it is widely used. The second step is preprocessing of the electroencephalogram signals. The signals collected from scalp electrodes are very weak and are often mixed with various artifacts and noise. Therefore, after acquisition, an effective preprocessing method is needed to remove redundant information, reduce the dimensionality and extract the useful electroencephalogram signal. Common preprocessing methods include electrode screening, removal of artifacts such as electrooculogram and electromyogram interference, and other time-domain and spatial filtering methods. The third step is feature extraction. Current research on electroencephalogram signal analysis mainly focuses on time-domain, frequency-domain and joint time-frequency analysis, spatial filtering methods and nonlinear dynamics. After effective electroencephalogram features are extracted, they must be classified to realize automatic epilepsy detection. The classification algorithm is therefore a key link in designing the epilepsy recognition task.
Researchers have used a variety of methods to address this problem. Zhou You proposed an electroencephalogram detection method and device using a wavelet neural network, in which the extracted feature vectors are fed into a classifier obtained from the wavelet neural network so that abnormal electroencephalogram signals are marked. Gong Guanghong et al. proposed a method for automatically identifying multi-stage epileptic electroencephalogram signals based on a supervised gradient booster, in which a gradient-boosting classifier examines the epileptic signals. Gao Bin et al. proposed an epileptic seizure detection device and early-warning system based on multi-source data acquisition, which uses the extracted characteristic parameters as feature vectors to train multiple decision trees of a random forest classifier, forming a random forest model. Mei Zhen et al. proposed an EEG signal processing method and epilepsy detection system that preprocesses the EEG signals, eliminates frequency bands, extracts time-domain and entropy-based features, and finally selects an optimal feature subset with an improved correlation-based feature selection method.
However, these methods are traditional supervised classification methods: a large number of labeled electroencephalogram samples must be acquired to train a classifier with good performance, and acquiring such labels is time-consuming, labor-intensive and expensive. Automatic epilepsy detection with only a small number of labeled samples therefore has great research significance and practical value. In addition, electroencephalogram features are generally high-dimensional, and high-dimensional data are more difficult to analyse than low-dimensional data and may contain useless and redundant feature information. These characteristics pose a great challenge to the practical processing of epileptic electroencephalogram signals.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the characteristic differences of electroencephalogram signals under different health states, and in the scenario of small samples with insufficient labeled samples, the characteristic differences are amplified, expressed in a numerical form and recognized by a classifier, forming an automatic epileptic electroencephalogram signal recognition method that is fast, widely applicable and highly accurate in classification. The method covers the feature dimensionality reduction and classification used in the automatic detection of epileptic electroencephalogram signals; it effectively preserves the integrity of the local information of the electroencephalogram signals, enlarges the differences between electroencephalogram information in different states, and improves the classification accuracy of epileptic electroencephalogram signals with a suitable semi-supervised learning model.
The technical scheme adopted by the invention is as follows: a semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability, comprising the following steps:
Step 1: collecting original electroencephalogram signals of different categories and preprocessing the signals;
Step 2: performing feature extraction on the preprocessed electroencephalogram signals to obtain a feature data set X = {x_1, x_2, ..., x_n} containing n training samples, where x_i is the i-th feature vector in X, x_i ∈ R^d, and d denotes the dimension of a sample. The first l samples {x_1, x_2, ..., x_l} carry category labels of the electroencephalogram signals and are denoted X_l; the corresponding category label matrix is denoted Y_l = {y_1, y_2, ..., y_l}, a matrix of l rows and c columns in which the label vector y_i has y_{i,j} = 1 to indicate that sample x_i belongs to the j-th category, and c is the number of electroencephalogram signal categories. The last (n-l) samples {x_{l+1}, x_{l+2}, ..., x_n} of the sample set X are unlabeled and are denoted X_u; the corresponding category label matrix is denoted Y_u = {y_{l+1}, y_{l+2}, ..., y_n}, a zero matrix of (n-l) rows and c columns;
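For illustration only, the construction of the label matrices described in step 2 can be sketched as follows; the function name, the NumPy representation and the integer class encoding are assumptions, not part of the patent.

```python
import numpy as np

def build_label_matrices(labels, n, c):
    """Build Y_l (one-hot, l x c) and Y_u (all-zero, (n-l) x c) as described in step 2.

    labels : integer class indices (0..c-1) of the first l labeled samples
    n      : total number of training samples
    c      : number of electroencephalogram signal categories
    """
    l = len(labels)
    Y_l = np.zeros((l, c))
    Y_l[np.arange(l), labels] = 1.0   # y_{i,j} = 1 marks that sample x_i belongs to class j
    Y_u = np.zeros((n - l, c))        # unlabeled samples start with all-zero label vectors
    return Y_l, Y_u
```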
and step 3: tag matrix Y by classlAt XlMatching samples in the set to form a homogeneous sample pair set M and a heterogeneous sample pair set D, wherein M { (x)i,xj)|yi=yj},D={(xi,xj)|yi≠yj};
Step 4: constructing a semi-supervised incidence matrix U on the data set X, where the element U_ij in row i and column j of U is defined as:

[Equation (1)]

where 1 ≤ i ≤ n, 1 ≤ j ≤ n, a is a positive number greater than 1, and b is a positive number greater than 1 and less than a;
Step 5: denoting the projection matrix as A ∈ R^(d×e) with 0 < e ≤ d, each sample x_i in the data set X is projected into a low-dimensional space R^e through A, the low-dimensional feature being expressed as:

z_i = A^T x_i,  (2)

and the projection matrix A is calculated from:

[Equation (3)]

where I denotes the identity matrix, |M| and |D| denote the numbers of sample pairs in the sets M and D respectively, Tr{ } denotes the trace of a matrix, and the superscript T denotes matrix transposition; let

[Auxiliary definitions (two unnumbered equations)]

Introducing a Lagrange multiplier α and solving equation (3) by the Lagrange multiplier method yields:

[Equation (4)]

Eigenvalue decomposition is performed on the resulting matrix, and the matrix A is updated with the eigenvectors corresponding to the e largest eigenvalues;
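Equation (3) and its auxiliary definitions are reproduced only as images in the original filing, so the following sketch assumes a common pair-based discriminative projection: keep homogeneous pairs close while spreading heterogeneous pairs apart, solved as a generalized eigenvalue problem. The scatter matrices, the regularizer and the solver choice are assumptions for illustration, not the patent's exact objective.

```python
import numpy as np
from scipy.linalg import eigh

def pair_scatter(X, pairs):
    """Average outer-product scatter of the differences x_i - x_j over a pair set."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for i, j in pairs:
        diff = (X[i] - X[j])[:, None]
        S += diff @ diff.T
    return S / max(len(pairs), 1)

def fit_projection(X, M, D, e, reg=1e-6):
    """Columns of A are the eigenvectors of the e largest generalized eigenvalues of (S_D, S_M)."""
    d = X.shape[1]
    S_M = pair_scatter(X, M) + reg * np.eye(d)   # homogeneous-pair scatter, kept small
    S_D = pair_scatter(X, D)                     # heterogeneous-pair scatter, made large
    vals, vecs = eigh(S_D, S_M)                  # generalized symmetric eigendecomposition
    return vecs[:, np.argsort(vals)[::-1][:e]]   # keep the e largest eigenvalues, as in step 5

# Low-dimensional features, equation (2): each row of Z is z_i = A^T x_i
# A = fit_projection(X, M, D, e=20); Z = X @ A
```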
Step 6: constructing a domain embedding probability matrix S in the low-dimensional space, where the element S_ij in row i and column j of S denotes the probability that z_i selects z_j as a neighbour, calculated as:

[Equation (5)]

where dis(z_i, z_j) denotes the Euclidean distance from z_i to z_j, and S_ij follows a Gaussian distribution centred at z_i with the identity matrix as its covariance matrix;
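Equation (5) is also given only as an image; under the stated description (a Gaussian centred at z_i with the identity matrix as covariance, normalized over candidate neighbours as in stochastic neighbour embedding) a sketch might look like the following, though the patent's exact normalization may differ:

```python
import numpy as np
from scipy.spatial.distance import cdist

def neighbour_probabilities(Z):
    """Row-stochastic matrix S with S_ij proportional to exp(-dis(z_i, z_j)^2 / 2) and S_ii = 0."""
    sq_dist = cdist(Z, Z, metric="sqeuclidean")
    W = np.exp(-sq_dist / 2.0)             # unit-variance Gaussian kernel centred at z_i
    np.fill_diagonal(W, 0.0)               # a point does not select itself as a neighbour
    return W / W.sum(axis=1, keepdims=True)
```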
Step 7: inputting the semi-supervised incidence matrix U, the projection matrix A and the domain embedding probability matrix S into the semi-supervised learning model of domain embedding probability, and updating the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy;
Step 7.1: the objective function of the semi-supervised learning model of domain embedding probability is expressed as:

[Equation (6)]

where ‖·‖_2 denotes the 2-norm and λ is a regularization parameter (a positive real number); let G_ij = S_ij + λU_ij and construct the matrix G with G_ij as its elements; let

[Definition of Q_ii (unnumbered equation)]

and construct the diagonal matrix Q with Q_ii as its elements; splitting Q after the l-th row and l-th column into four blocks, equation (6) can be expressed as:

[Equation (7)]

Setting the derivative of equation (7) with respect to Y_u equal to 0 gives the update expression for Y_u:

[Equation (8)]
Step 7.2: calculating the one-dimensional entropy of each label vector in Y_u as:

H(y_i) = -Σ_{k=1}^{c} y_{i,k} log y_{i,k},  (9)

where y_i ∈ Y_u and y_{i,k} denotes the k-th component of the label vector y_i; equation (9) expresses the information content of the components of a label vector and the uncertainty of the category it belongs to: the larger the entropy, the greater the uncertainty of the category, and the smaller the entropy, the greater the certainty of the category;

Step 7.3: taking out the feature vectors corresponding to the k smallest one-dimensional entropies and updating the category labels of the corresponding samples according to their largest component, i.e. if the j-th component of y_i is the largest, setting y_{i,j} = 1;
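A sketch of steps 7.2 and 7.3, assuming the one-dimensional entropy of equation (9) is the Shannon entropy of the (normalized) label vector; the k rows of Y_u with the smallest entropy are taken as the most certain predictions and converted to hard one-hot labels. Variable names and the normalization step are illustrative assumptions.

```python
import numpy as np

def most_confident_labels(Y_u, k, eps=1e-12):
    """Return the indices and one-hot labels of the k rows of Y_u with the smallest entropy."""
    P = np.clip(Y_u, eps, None)
    P = P / P.sum(axis=1, keepdims=True)              # normalize each label vector
    H = -(P * np.log(P)).sum(axis=1)                  # one-dimensional entropy, equation (9)
    idx = np.argsort(H)[:k]                           # smallest entropy = most certain category
    hard = np.zeros_like(Y_u[idx])
    hard[np.arange(len(idx)), Y_u[idx].argmax(axis=1)] = 1.0  # set y_{i,j} = 1 at the largest component
    return idx, hard
```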
Step 8: according to the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy, adding the corresponding samples to the set X_l, deleting them from the set X_u, and reconstructing the sets M and D;
Step 9: judging whether the set X_u is empty; if not, returning to step 4; if so, proceeding to step 10;
Step 10: collecting an electroencephalogram test sample x_test and preprocessing it, then computing its low-dimensional feature representation z_test according to equation (2); calculating the distance dis(z_i, z_test) between z_test and the low-dimensional representation z_i of each sample in the set X_l; selecting the r feature vectors with the smallest dis(z_i, z_test), and by voting taking the category that occurs most often among these r feature vectors as the category of x_test; finally, determining from the category of x_test whether it is an epileptic electroencephalogram signal.
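Step 10 amounts to an r-nearest-neighbour majority vote in the projected space; a minimal sketch follows (function and variable names are illustrative):

```python
import numpy as np

def classify_test_sample(x_test, A, Z_l, labels_l, r=5):
    """Project x_test with equation (2) and vote among its r nearest labeled neighbours."""
    z_test = A.T @ x_test
    dist = np.linalg.norm(Z_l - z_test, axis=1)       # dis(z_i, z_test) for every labeled sample
    nearest = np.argsort(dist)[:r]                    # r smallest distances
    classes, counts = np.unique(labels_l[nearest], return_counts=True)
    return classes[np.argmax(counts)]                 # the most frequent category wins the vote
```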
The invention has the following beneficial effects:
1. The method is based on a semi-supervised machine learning algorithm; only a small proportion of labeled samples is needed for model training, greatly reducing the manual labeling work;
2. The method learns low-dimensional representations of the samples using feature projection and domain embedding, preserving the local structure of the data while fusing the pairing relations between samples into the model, so that the low-dimensional representation is distinguishable and discriminative for epileptic electroencephalogram signals;
3. Unlike approaches that complete the dimensionality-reduction task and the classification task in separate stages, the method aggregates the low-dimensional representation of the samples and the identification of sample labels into a single learning task;
4. Through iterative optimization, the method progressively reduces the requirement of semi-supervised learning for labeled samples until the label information of all training samples has been identified.
Drawings
FIG. 1 is a flow chart of a domain embedding probability-based semi-supervised learning epileptic electroencephalogram signal identification method of the present invention;
FIG. 2 is a diagram of electroencephalogram signals in accordance with one embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples, which are simplified schematic drawings and illustrate only the basic structure of the invention in a schematic manner, and therefore only show the structures relevant to the invention.
Fig. 1 is a flowchart of a domain embedding probability-based semi-supervised learning epileptic electroencephalogram signal identification method according to an embodiment of the invention, and the method comprises the following 10 steps.
Step 1: acquiring different types of original brain electrical signals, preprocessing the acquired original brain electrical signals, taking an epilepsia brain electrical data set of the university of Bayon Germany as an example of the original brain electrical signals, as shown in fig. 2: the data Set is divided into five subsets of Set A to Set E, each subset comprises 100 samples of the same type, each sample comprises 4097 electroencephalogram time sequences, the data sampling frequency is 173.61Hz, the duration is 23.6s, artifacts are removed by artificial filtering at 0.53-40 Hz, and the subsets of Set A and Set B are electroencephalogram signals collected by 5 healthy subjects under the eye opening and eye closing states respectively; the Set C and Set D subsets are respectively electroencephalogram signals collected by 5 epileptic patients in a focus contralateral area and a focus area in a seizure intermission period; the Set E subset is the electroencephalogram signals collected from the focal zone during the attack period. In this embodiment, Set a and Set B are classified as normal data, Set C and Set D are classified as episodic data, and Set E is considered as episodic data. 80% of the data set was used for training and the remaining 20% was used for testing. Preprocessing the acquired original electroencephalogram signals through an open source tool box EEGlab of MATLAB, including down-sampling, filtering, re-referencing electrodes, baseline correction, independent component analysis and the like, and finally obtaining pure noise-free electroencephalogram signals as far as possible;
Step 2: performing feature extraction on the preprocessed electroencephalogram signals. In this embodiment a 4-level discrete wavelet transform decomposition using the dmey wavelet basis is applied to the electroencephalogram data set, yielding wavelet packet nodes for 16 frequency-band spaces, and 3 types of features are extracted from the original signal below 27.13 Hz. The first type is time-domain features, namely descriptive statistics: mean, median, minimum, maximum, skewness, standard deviation, kurtosis, first quartile, third quartile and interquartile range. The second type is entropy-based features: the physical meaning of sample entropy is to reflect signal complexity by measuring the probability of a new pattern appearing in the signal; the larger the value, the more complex the corresponding sample sequence. The third type is time-frequency-domain features: the band-energy features reflect the energy of the electroencephalogram signal in a time-frequency localized space, each orthogonal wavelet packet space projection component of the original signal at each decomposition level representing the time-frequency localization information of the source signal in the corresponding time-frequency resolution space. The 3 types of features total 58 dimensions. After feature extraction, a feature data set X = {x_1, x_2, ..., x_n} containing n training samples is obtained, where x_i is the i-th feature vector in X, x_i ∈ R^d, and d denotes the dimension of a sample. The first l samples {x_1, x_2, ..., x_l} carry category labels of the electroencephalogram signals and are denoted X_l; the corresponding category label matrix is denoted Y_l = {y_1, y_2, ..., y_l}, a matrix of l rows and c columns in which the label vector y_i has y_{i,j} = 1 to indicate that sample x_i belongs to the j-th category, and c is the number of electroencephalogram signal categories. The last (n-l) samples {x_{l+1}, x_{l+2}, ..., x_n} of the sample set X are unlabeled and are denoted X_u; the corresponding category label matrix is denoted Y_u = {y_{l+1}, y_{l+2}, ..., y_n}, a zero matrix of (n-l) rows and c columns. In this embodiment n = 160 and l = 20;
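A sketch of part of the wavelet-based feature extraction described above, using the PyWavelets implementation of the dmey wavelet; the sample-entropy features, the node selection and the full 58-dimensional feature list of the embodiment are not reproduced here:

```python
import numpy as np
import pywt
from scipy.stats import kurtosis, skew

def extract_features(segment):
    """Descriptive statistics of the segment plus sub-band energies of a 4-level dmey DWT."""
    q1, med, q3 = np.percentile(segment, [25, 50, 75])
    stats = [segment.mean(), med, segment.min(), segment.max(), skew(segment),
             segment.std(), kurtosis(segment), q1, q3, q3 - q1]
    coeffs = pywt.wavedec(segment, "dmey", level=4)      # [cA4, cD4, cD3, cD2, cD1]
    energies = [float(np.sum(c ** 2)) for c in coeffs]   # band energy of each sub-band
    return np.array(stats + energies)
```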
and step 3: tag matrix Y by classlAt XlMatching samples in the set to form a homogeneous sample pair set M and a heterogeneous sample pair set D, wherein M { (x)i,xj)|yi=yj},D={(xi,xj)|yi≠yj};
Step 4: constructing a semi-supervised incidence matrix U on the data set X, where the element U_ij in row i and column j of U is defined as:

[Equation (1)]

where 1 ≤ i ≤ n, 1 ≤ j ≤ n, a is a positive number greater than 1, and b is a positive number greater than 1 and less than a; in this embodiment a = 3 and b = 2;
Step 5: denoting the projection matrix as A ∈ R^(d×e), with e = 20 in this embodiment, each sample x_i in the data set X is projected into a low-dimensional space R^e through A, the low-dimensional feature being expressed as:

z_i = A^T x_i,  (2)

and the projection matrix A is calculated from:

[Equation (3)]

where I denotes the identity matrix, |M| and |D| denote the numbers of sample pairs in the sets M and D respectively, Tr{ } denotes the trace of a matrix, and the superscript T denotes matrix transposition; let

[Auxiliary definitions (two unnumbered equations)]

Introducing a Lagrange multiplier α and solving equation (3) by the Lagrange multiplier method yields:

[Equation (4)]

Eigenvalue decomposition is performed on the resulting matrix, and the matrix A is updated with the eigenvectors corresponding to the e largest eigenvalues;
Step 6: constructing a domain embedding probability matrix S in the low-dimensional space, where the element S_ij in row i and column j of S denotes the probability that z_i selects z_j as a neighbour, calculated as:

[Equation (5)]

where dis(z_i, z_j) denotes the Euclidean distance from z_i to z_j, and S_ij follows a Gaussian distribution centred at z_i with the identity matrix as its covariance matrix;
Step 7: inputting the semi-supervised incidence matrix U, the projection matrix A and the domain embedding probability matrix S into the semi-supervised learning model of domain embedding probability, and updating the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy; in this embodiment k = 7;
Step 7.1: the objective function of the semi-supervised learning model of domain embedding probability is expressed as:

[Equation (6)]

where ‖·‖_2 denotes the 2-norm and λ is a regularization parameter (a positive real number); let G_ij = S_ij + λU_ij and construct the matrix G with G_ij as its elements; let

[Definition of Q_ii (unnumbered equation)]

and construct the diagonal matrix Q with Q_ii as its elements; splitting Q after the l-th row and l-th column into four blocks, equation (6) can be expressed as:

[Equation (7)]

Setting the derivative of equation (7) with respect to Y_u equal to 0 gives the update expression for Y_u:

[Equation (8)]
Step 7.2: calculating the one-dimensional entropy of each label vector in Y_u as:

H(y_i) = -Σ_{k=1}^{c} y_{i,k} log y_{i,k},  (9)

where y_i ∈ Y_u and y_{i,k} denotes the k-th component of the label vector y_i; equation (9) expresses the information content of the components of a label vector and the uncertainty of the category it belongs to: the larger the entropy, the greater the uncertainty of the category, and the smaller the entropy, the greater the certainty of the category;

Step 7.3: taking out the feature vectors corresponding to the k smallest one-dimensional entropies and updating the category labels of the corresponding samples according to their largest component, i.e. if the j-th component of y_i is the largest, setting y_{i,j} = 1;
Step 8: according to the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy, adding the corresponding samples to the set X_l, deleting them from the set X_u, and reconstructing the sets M and D;
Step 9: judging whether the set X_u is empty; if not, returning to step 4; if so, proceeding to step 10;
Step 10: collecting an electroencephalogram test sample x_test and preprocessing it, then computing its low-dimensional feature representation z_test according to equation (2); calculating the distance dis(z_i, z_test) between z_test and the low-dimensional representation z_i of each sample in the set X_l; selecting the r feature vectors with the smallest dis(z_i, z_test), and by voting taking the category that occurs most often among these r feature vectors as the category of x_test; finally, determining from the category of x_test whether it is an epileptic electroencephalogram signal. In this embodiment r = 5. The indexes used to evaluate classification performance are classification accuracy, recall and F-Score; Table 1 gives the statistical indexes of this embodiment. As can be seen from Table 1, the classifier achieves high accuracy on the classification of the 3 types of electroencephalogram signals and can efficiently realize automatic identification of epileptic electroencephalogram signals.
TABLE 1 Statistical indexes of the present embodiment

                     Accuracy    Recall     F-Score
    Set A and Set B  94.56%      95.05%     94.88%
    Set C and Set D  95.24%      95.75%     95.60%
    Set E            96.49%      96.91%     96.64%
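The accuracy, recall and F-Score values in Table 1 are standard per-class classification metrics; with scikit-learn they could be computed roughly as follows (a generic sketch; the accuracy column is interpreted here as per-class precision, which is an assumption, and this is not the evaluation script of the embodiment):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

def per_class_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Per-class precision, recall and F-Score for the three EEG categories."""
    return {
        "precision": precision_score(y_true, y_pred, labels=list(labels), average=None),
        "recall": recall_score(y_true, y_pred, labels=list(labels), average=None),
        "f_score": f1_score(y_true, y_pred, labels=list(labels), average=None),
    }
```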
The method is based on a semi-supervised machine learning algorithm and needs only a small proportion of labeled samples for model training, reducing a great deal of manual labeling work. It learns low-dimensional representations of the samples using feature projection and domain embedding, preserving the local structure of the data while fusing the pairing relations between samples into the model, so that the low-dimensional representation is distinguishable and discriminative for epileptic electroencephalogram signals. Unlike approaches that complete the dimensionality-reduction and classification tasks in separate stages, it aggregates the low-dimensional representation of samples and sample label identification into a single learning task. Through iterative optimization, it gradually reduces the requirement of semi-supervised learning for labeled samples until the label information of all training samples has been identified.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (3)

1. A semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability, characterized by comprising the following steps:
Step 1: collecting original electroencephalogram signals of different categories and preprocessing the signals;
Step 2: performing feature extraction on the preprocessed electroencephalogram signals to obtain a feature data set X = {x_1, x_2, ..., x_n} containing n training samples, where x_i is the i-th feature vector in X, x_i ∈ R^d, and d denotes the dimension of a sample; the first l samples {x_1, x_2, ..., x_l} carry category labels of the electroencephalogram signals and are denoted X_l; the corresponding category label matrix is denoted Y_l = {y_1, y_2, ..., y_l}, a matrix of l rows and c columns in which the label vector y_i has y_{i,j} = 1 to indicate that sample x_i belongs to the j-th category, and c is the number of electroencephalogram signal categories; the last (n-l) samples {x_{l+1}, x_{l+2}, ..., x_n} of the sample set X are unlabeled and are denoted X_u; the corresponding category label matrix is denoted Y_u = {y_{l+1}, y_{l+2}, ..., y_n}, a zero matrix of (n-l) rows and c columns;
and step 3: tag matrix Y by classlAt XlCarrying out sample pairing in the set to form a homogeneous sample pair set M and a heterogeneous sample pair set D, wherein M is a great face(xi,xj)|yi=yj},D={(xi,xj)|yi≠yj};
Step 4: constructing a semi-supervised incidence matrix U on the data set X, where the element U_ij in row i and column j of U is defined as:

[Equation (1)]

where 1 ≤ i ≤ n, 1 ≤ j ≤ n, a is a positive number greater than 1, and b is a positive number greater than 1 and less than a;
Step 5: denoting the projection matrix as A ∈ R^(d×e) with 0 < e ≤ d, each sample x_i in the data set X is projected into a low-dimensional space R^e through A, the low-dimensional feature being expressed as:

z_i = A^T x_i,  (2)

and the projection matrix A is calculated from:

[Equation (3)]

where I denotes the identity matrix, |M| and |D| denote the numbers of sample pairs in the sets M and D respectively, Tr{ } denotes the trace of a matrix, and T denotes matrix transposition; let

[Auxiliary definitions (two unnumbered equations)]

Introducing a Lagrange multiplier α and solving equation (3) by the Lagrange multiplier method yields:

[Equation (4)]

Eigenvalue decomposition is performed on the resulting matrix, and the matrix A is updated with the eigenvectors corresponding to the e largest eigenvalues;
Step 6: constructing a domain embedding probability matrix S in the low-dimensional space, where the element S_ij in row i and column j of S denotes the probability that z_i selects z_j as a neighbour, calculated as:

[Equation (5)]

where dis(z_i, z_j) denotes the Euclidean distance from z_i to z_j, and S_ij follows a Gaussian distribution centred at z_i with the identity matrix as its covariance matrix;
Step 7: inputting the semi-supervised incidence matrix U, the projection matrix A and the domain embedding probability matrix S into the semi-supervised learning model of domain embedding probability, and updating the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy;
and 8: adding the corresponding sample into X according to the class label corresponding to the k minimum one-dimensional entropy characteristic vectorslIs collected from XuDeleting in a centralized manner, and reconstructing a set M and a set D;
Step 9: judging whether the set X_u is empty; if not, returning to step 4; if so, proceeding to step 10;
Step 10: collecting an electroencephalogram test sample x_test and preprocessing it, then computing its low-dimensional feature representation z_test according to equation (2); calculating the distance dis(z_i, z_test) between z_test and the low-dimensional representation z_i of each sample in the set X_l; selecting the r feature vectors with the smallest dis(z_i, z_test), and by voting taking the category that occurs most often among these r feature vectors as the category of x_test; finally, determining from the category of x_test whether it is an epileptic electroencephalogram signal.
2. The semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability according to claim 1, wherein the objective function of the semi-supervised learning model of domain embedding probability in step 7 is expressed as:

[Equation (6)]

where ‖·‖_2 denotes the 2-norm and λ is a regularization parameter (a positive real number); let G_ij = S_ij + λU_ij and construct the matrix G with G_ij as its elements; let

[Definition of Q_ii (unnumbered equation)]

and construct the diagonal matrix Q with Q_ii as its elements; splitting Q after the l-th row and l-th column into four blocks, equation (6) can be expressed as:

[Equation (7)]

setting the derivative of equation (7) with respect to Y_u equal to 0 gives the update expression for Y_u:

[Equation (8)]
3. The semi-supervised learning epileptic electroencephalogram signal identification method based on domain embedding probability according to claim 1, wherein updating the category labels corresponding to the k feature vectors with the smallest one-dimensional entropy in step 7 comprises: calculating the one-dimensional entropy of each label vector in Y_u as:

H(y_i) = -Σ_{k=1}^{c} y_{i,k} log y_{i,k},  (9)

where y_i ∈ Y_u and y_{i,k} denotes the k-th component of the label vector y_i; equation (9) expresses the information content of the components of a label vector and the uncertainty of the category it belongs to: the larger the entropy, the greater the uncertainty of the category, and the smaller the entropy, the greater the certainty of the category; taking out the k feature vectors corresponding to the smallest one-dimensional entropies; and updating the category labels of the corresponding samples according to their largest component, i.e. if the j-th component of y_i is the largest, setting y_{i,j} = 1.
CN202111084540.9A 2021-09-16 2021-09-16 Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability Withdrawn CN113869382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111084540.9A CN113869382A (en) 2021-09-16 2021-09-16 Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111084540.9A CN113869382A (en) 2021-09-16 2021-09-16 Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability

Publications (1)

Publication Number Publication Date
CN113869382A true CN113869382A (en) 2021-12-31

Family

ID=78996128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111084540.9A Withdrawn CN113869382A (en) 2021-09-16 2021-09-16 Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability

Country Status (1)

Country Link
CN (1) CN113869382A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115251909A (en) * 2022-07-15 2022-11-01 山东大学 Electroencephalogram signal hearing assessment method and device based on space-time convolutional neural network
CN115905837A (en) * 2022-11-17 2023-04-04 杭州电子科技大学 Semi-supervised self-adaptive labeling regression electroencephalogram emotion recognition method for automatic abnormality detection
CN115251909B (en) * 2022-07-15 2024-04-30 山东大学 Method and device for evaluating hearing by electroencephalogram signals based on space-time convolutional neural network


Similar Documents

Publication Publication Date Title
Kumar et al. Classification of seizure and seizure-free EEG signals using local binary patterns
Shoeibi et al. A comprehensive comparison of handcrafted features and convolutional autoencoders for epileptic seizures detection in EEG signals
CN107924472B (en) Image classification method and system based on brain computer interface
Li et al. Automatic epileptic EEG detection using DT-CWT-based non-linear features
Ilakiyaselvan et al. Deep learning approach to detect seizure using reconstructed phase space images
Lewicki A review of methods for spike sorting: the detection and classification of neural action potentials
Zhang et al. Bi-dimensional approach based on transfer learning for alcoholism pre-disposition classification via EEG signals
Agarwal et al. Classification of alcoholic and non-alcoholic EEG signals based on sliding-SSA and independent component analysis
CN114366124A (en) Epilepsia electroencephalogram identification method based on semi-supervised deep convolution channel attention single classification network
CN114176607B (en) Electroencephalogram signal classification method based on vision transducer
CN107045624B (en) Electroencephalogram signal preprocessing and classifying method based on maximum weighted cluster
Dissanayake et al. Patient-independent epileptic seizure prediction using deep learning models
Guan Application of logistic regression algorithm in the diagnosis of expression disorder in Parkinson's disease
CN113869382A (en) Semi-supervised learning epilepsia electroencephalogram signal identification method based on domain embedding probability
Zhou et al. Phase space reconstruction, geometric filtering based Fisher discriminant analysis and minimum distance to the Riemannian means algorithm for epileptic seizure classification
Ohannesian et al. Epileptic seizures detection from EEG recordings based on a hybrid system of Gaussian mixture model and random forest classifier
CN116864121A (en) Health risk screening system
Chavan et al. A review on BCI emotions classification for EEG signals using deep learning
Goshvarpour et al. An Innovative Information-Based Strategy for Epileptic EEG Classification
Jasira et al. DyslexiScan: a dyslexia detection method from handwriting using CNN LSTM model
CN112438741B (en) Driving state detection method and system based on electroencephalogram feature transfer learning
Satapathy et al. A deep neural model CNN-LSTM network for automated sleep staging based on a single-channel EEG signal
Hamad et al. A hybrid automated detection of epileptic seizures in EEG based on wavelet and machine learning techniques
Rudas et al. On activity identification pipelines for a low-accuracy EEG device
Ulhaq et al. Epilepsy seizures classification with EEG signals: A machine learning approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20211231)