CN114521903A - Electroencephalogram attention recognition system and method based on feature selection - Google Patents


Info

Publication number
CN114521903A
Authority
CN
China
Prior art keywords
feature
features
electroencephalogram
attention
samples
Prior art date
Legal status
Pending
Application number
CN202210138817.XA
Other languages
Chinese (zh)
Inventor
徐欣
董志欢
项忠泽
聂旭
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202210138817.XA
Publication of CN114521903A
Legal status: pending

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/372 Analysis of electroencephalograms
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/725 Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G06F2218/12 Classification; Matching

Abstract

The invention discloses an electroencephalogram attention recognition system and method based on feature selection, in the technical field of digital signal processing and electroencephalogram signals. The method comprises the following steps:
- collecting electroencephalogram signals during different attention tasks;
- preprocessing the acquired electroencephalogram signals;
- segmenting the preprocessed electroencephalogram signals into a plurality of samples and extracting features from the samples;
- using the ReliefF algorithm to calculate a weight for each feature according to the correlation between the feature and the class label, taking the mean of all weights as a threshold, and keeping the features whose weights exceed the threshold as initially selected features;
- calculating the mutual information of the initially selected features and checking the features through correlation;
- using RFECV to score and rank the channel information in the features according to its performance on a classifier, and selecting the best features as the final feature set;
- performing attention electroencephalogram recognition on the resulting feature set with a support vector machine.

The invention reduces feature redundancy and improves classification accuracy.

Description

Electroencephalogram attention recognition system and method based on feature selection
Technical Field
The invention relates to an electroencephalogram attention recognition system and method based on feature selection, and belongs to the technical field of digital signal processing and electroencephalogram signals.
Background
Attention is the ability of an individual's mental activity to concentrate on a particular object and to be directed toward it. The level of attention affects quality of life, directly or indirectly, in many respects:
- In medicine and health, detecting, identifying and studying attention helps the rehabilitation of conditions such as the common attention-deficit hyperactivity disorder (ADHD).
- In safe driving, a driver's degree of concentration bears directly on personal safety.
- In professional training such as archery and shooting, an ideal attention state helps athletes achieve better results.
- In education, research on attention helps to analyze and adjust a learner's state, and to analyze and even intervene in the learning process.

Depending on whether there is a predetermined purpose and whether volitional effort is required, attention can be divided into passive attention and active attention. Active attention is particularly important in work and study, so the invention mainly studies electroencephalogram signals under active attention. Active attention can be drawn and maintained by making the purpose and task of an activity explicit and by exploiting indirect interests such as rewards. In the data acquisition phase of the invention, different levels of attention are induced by having the subject perform different tasks.
With the rapid development of computer technology and progress in basic biological cognition, researchers have come to recognize the electroencephalogram signal as a bioelectrical signal that reflects an individual's physiological and psychological information. Studying it can therefore reveal the internal state of the brain, and even the individual's state of thought.

Electroencephalogram signals change with many internal and external factors, including:
- the individual's physiological condition, mental state, emotion and psychological activity;
- changes in the individual's environment and scene.

These factors give electroencephalogram signals different characteristics. Studying the information carried by waves with different characteristics allows electroencephalogram signals to be applied in different fields.
In past studies, researchers judged the level of attention from an individual's eye state, facial expression, sitting posture and the like. Such judgments, however, are somewhat subjective. With the development of cognitive psychology, researchers have found that the cerebral cortex is the highest level of attention control and regulates subcortical structures. As a result, attention level recognition based on electroencephalogram signals has gradually risen.
In attention level recognition based on electroencephalogram signals, extracting effective features that distinguish different attention states is an important precondition for accurate classification. By the type of feature parameter, feature extraction methods fall into three groups:
- Time-domain analysis. Time-domain features of the electroencephalogram signal are more concrete and intuitive. They are usually statistics such as the mean, variance, peak-to-peak difference, fractal dimension and higher-order zero-crossing analysis.
- Frequency-domain analysis. Frequency-domain parameters include energy and the power spectrum.
- Time-frequency-domain analysis. Time-frequency parameters mainly use the wavelet transform to separate signals of different rhythms from the electroencephalogram signal, and take the root mean square and energy of the wavelet coefficients as feature values for classification.

Different types of feature parameters provide rich detail, but they are also redundant. This redundancy increases the amount of computation and can even reduce recognition accuracy.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides a system and a method for recognizing electroencephalogram attention based on feature selection.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
in a first aspect, the invention provides a feature selection-based electroencephalogram attention recognition method, which comprises the following steps:
acquiring electroencephalogram signals in different attention tasks;
preprocessing the acquired electroencephalogram signals;
segmenting the preprocessed electroencephalogram signals into a plurality of samples, and then extracting features from the samples;
using the ReliefF algorithm to calculate the weight of each feature according to the correlation between the extracted feature and the class label, taking the mean of all weights as a threshold, and taking the features whose weights exceed the threshold as initially selected features;
calculating the mutual information of the initially selected features, and checking the features through correlation;
using RFECV to score and rank the channel information in the features according to its performance on a classifier, and selecting the best features as the final features to obtain a feature set;
and performing attention electroencephalogram recognition on the obtained feature set using a support vector machine.
Further, there are four different attention tasks:
task 1: high-attention task: browsing plus mental arithmetic. Browse a 10 x 10 number matrix (the numbers 1 to 100 placed randomly without repetition) and find the prime numbers in it;
task 2: medium-attention task: browsing. Browse the provided argumentative text material;
task 3: low-attention task: mind-wandering. Stare at the text material from Task 2 while thinking about things unrelated to the task;
task 4: non-attention task: resting;
the duration of Task 1 runs from when the subject starts browsing the first number to when the last number has been judged, and the durations of Tasks 2 to 4 equal that of Task 1.
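The Task 1 stimulus is simple to generate: the numbers 1 to 100, each used once, are placed at random in a 10 x 10 grid, and the primes form the answer key. The Python sketch below is illustrative only. The function names `make_task1_matrix` and `is_prime` are hypothetical, and the patent does not describe how its matrices were produced.

```python
import random


def is_prime(n):
    """Trial-division primality test, adequate for 1..100."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def make_task1_matrix(seed=None):
    """Hypothetical Task 1 stimulus: 1..100 shuffled into a 10 x 10 grid, plus the prime answer key."""
    rng = random.Random(seed)
    numbers = list(range(1, 101))
    rng.shuffle(numbers)  # each number appears exactly once, at a random position
    matrix = [numbers[row * 10:(row + 1) * 10] for row in range(10)]
    primes = sorted(n for n in numbers if is_prime(n))  # answer key for scoring
    return matrix, primes
```

There are 25 primes below 100, so every grid contains the same 25 targets in different positions.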
Furthermore, when acquiring electroencephalogram signals during the different attention tasks, leads are selected over three groups of regions:
- the frontal lobe and central area (beta waves);
- the occipital and parietal lobes (alpha waves);
- the central area (theta waves).

The selected leads are Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1 and O2.
Further, preprocessing the acquired electroencephalogram signals uses the EEGLAB toolbox in MATLAB, with the sampling rate set to 512 Hz, and comprises:
- band-pass filtering the raw signal with an FIR filter to retain the 0.5-30 Hz band, which removes components outside the usable frequency range, including higher-frequency electromyographic signals and lower-frequency electrocardiographic, galvanic skin and respiratory signals;
- removing electrooculographic signals with independent component analysis (ICA), by analyzing all ICA components and deleting the ocular components;
- removing bad segments.
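The filtering step can be sketched with SciPy as a stand-in for the EEGLAB/MATLAB tooling named in the text. The sketch makes several assumptions that the text does not specify:
- the filter length (`numtaps=1025`);
- the Hamming window (the `firwin` default);
- zero-phase application with `filtfilt`.

Only the 0.5-30 Hz band and the 512 Hz rate come from the description. The ICA and bad-segment steps are not sketched.

```python
import numpy as np
from scipy.signal import filtfilt, firwin

FS = 512  # Hz, sampling rate stated in the text


def bandpass_fir(eeg, low=0.5, high=30.0, fs=FS, numtaps=1025):
    """Zero-phase FIR band-pass filter; eeg has shape (channels, samples).

    numtaps (odd, as a band-pass FIR requires) and the window are assumptions.
    A longer filter sharpens the transition around the 0.5 Hz edge.
    """
    taps = firwin(numtaps, [low, high], pass_zero=False, fs=fs)  # Hamming-windowed band-pass
    # filtfilt runs the filter forward and backward, so there is no phase shift.
    # The input must be longer than 3 * numtaps samples (its default padding).
    return filtfilt(taps, [1.0], eeg, axis=-1)
```

A 10 Hz (alpha) component passes almost unchanged, while a 60 Hz component is strongly attenuated.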
Further, after the preprocessed electroencephalogram signals are segmented into a plurality of samples, feature extraction comprises the following. The raw data in each sample consist of 10 channels. From the electroencephalogram data of each channel, 10 feature parameters are extracted in turn:
- the rectified mean, maximum and peak-to-peak difference;
- the root mean square and standard deviation;
- the margin factor and sample entropy;
- the energy ratios of the theta, alpha and beta waves.

Each sample therefore yields a 100-dimensional (10 channels x 10 parameters) feature vector.
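The ten per-channel parameters can be sketched in NumPy. The text names the parameters but not their exact definitions, so the following are assumptions:
- Rhythm band edges: theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz.
- Each energy ratio is taken relative to the 0.5-30 Hz energy.
- The margin factor is computed as the clearance factor, max|x| / mean(sqrt|x|)^2.
- Sample entropy uses m = 2 and r = 0.2·std.
- The 100-dimensional vector is grouped parameter by parameter.

```python
import numpy as np

FS = 512  # Hz, sampling rate stated in the text
# Assumed rhythm bands in Hz; the text does not give exact edges.
BANDS = ((4.0, 8.0), (8.0, 13.0), (13.0, 30.0))  # theta, alpha, beta
TOTAL_BAND = (0.5, 30.0)  # band kept by the preprocessing filter


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with Chebyshev distance; m=2 and r=0.2*std are assumed defaults."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n_templates = len(x) - m  # same template count for lengths m and m + 1

    def count_matches(length):
        templates = np.lib.stride_tricks.sliding_window_view(x, length)[:n_templates]
        count = 0
        for i in range(n_templates - 1):
            # Compare template i only with later templates, which excludes self-matches.
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.count_nonzero(d <= r)
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf


def band_energy_ratios(x, fs=FS):
    """Share of spectral energy in each rhythm band relative to the 0.5-30 Hz band."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power[(freqs >= TOTAL_BAND[0]) & (freqs <= TOTAL_BAND[1])].sum()
    return [power[(freqs >= lo) & (freqs < hi)].sum() / total for lo, hi in BANDS]


def channel_features(x, fs=FS):
    """The 10 per-channel parameters, in the order listed in the text."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    return [
        abs_x.mean(),                                # rectified mean
        x.max(),                                     # maximum
        np.ptp(x),                                   # peak-to-peak difference
        np.sqrt(np.mean(x ** 2)),                    # root mean square
        x.std(),                                     # standard deviation
        abs_x.max() / np.mean(np.sqrt(abs_x)) ** 2,  # margin factor (assumed definition)
        sample_entropy(x),                           # sample entropy
        *band_energy_ratios(x, fs),                  # theta, alpha, beta energy ratios
    ]


def sample_features(window, fs=FS):
    """window: (10 channels, n_samples) -> 100-dim vector, grouped parameter by parameter."""
    per_channel = np.array([channel_features(ch, fs) for ch in window])  # (channels, 10)
    return per_channel.T.ravel()
```

A pure 10 Hz tone places nearly all of its energy in the alpha ratio and has lower sample entropy than white noise. These checks offer a quick sanity test of the definitions.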
Further, calculating the weight of each extracted feature according to the correlation between the feature and the class label, taking the mean of all weights as a threshold, and taking the features whose weights exceed the threshold as initially selected features comprises the following steps:
1. Randomly draw a sample R from the training sample set with the ReliefF algorithm.
2. Find the k nearest neighbours H_j (j = 1, ..., k) of R among the samples of the same class as R.
3. For each class C different from the class of R, find the k nearest neighbours M(C)_j among the samples of class C.
4. Update the weight of each feature. The weight of feature F is calculated as follows:
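$$W(F) = W(F) - \sum_{j=1}^{k} \frac{\mathrm{diff}(F, R, H_j)}{mk} + \sum_{C \neq \mathrm{class}(R)} \frac{\frac{P(C)}{1 - P(\mathrm{class}(R))} \sum_{j=1}^{k} \mathrm{diff}(F, R, M(C)_j)}{mk}$$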
where:
- W(F) is the weight of feature F;
- m is the number of randomly drawn samples and k is the number of nearest neighbours;
- class(R) is the class of the randomly drawn sample R;
- P(C) is the probability of class C and P(class(R)) is the probability of the class of R;
- diff(F, R, H_j) and diff(F, R, M(C)_j) are the distances between two samples on feature F, calculated as follows:
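$$\mathrm{diff}(F, R_1, R_2) = \begin{cases} \dfrac{|R_1[F] - R_2[F]|}{\max(F) - \min(F)}, & F \text{ continuous} \\ 0, & F \text{ discrete and } R_1[F] = R_2[F] \\ 1, & F \text{ discrete and } R_1[F] \neq R_2[F] \end{cases}$$
where max(F) and min(F) are the maximum and minimum values of feature F over all samples;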
in the formula, R_1[F] and R_2[F] are the values of the F-th feature of samples R_1 and R_2, respectively.
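A compact NumPy sketch of the weight update and the mean-weight threshold described above follows. Several details are assumptions, because the text does not fix them:
- Features are treated as continuous.
- Neighbours are found by Manhattan distance on range-normalised features.
- m defaults to all samples, drawn without replacement (so m must not exceed the sample count).
- k defaults to 10.
- P(C) is estimated from class frequencies.

`relieff_weights` and `select_above_mean` are hypothetical helper names.

```python
import numpy as np


def relieff_weights(X, y, m=None, k=10, seed=0):
    """Multi-class ReliefF weights for continuous features (m <= n_samples)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = n if m is None else m                   # number of randomly drawn samples R
    span = X.max(axis=0) - X.min(axis=0)        # max(F) - min(F) for each feature
    span[span == 0] = 1.0                       # constant features then give diff = 0
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))      # P(C) from class frequencies
    W = np.zeros(d)
    for idx in rng.choice(n, size=m, replace=False):
        R, c = X[idx], y[idx]
        diff = np.abs(X - R) / span             # diff(F, R, S) for every sample S and feature F
        dist = diff.sum(axis=1)                 # neighbour search distance (assumed Manhattan)
        for C in classes:
            candidates = np.flatnonzero((y == C) & (np.arange(n) != idx))
            nearest = candidates[np.argsort(dist[candidates])[:k]]
            contrib = diff[nearest].sum(axis=0) / (m * k)
            if C == c:
                W -= contrib                    # near hits H_j lower the weight
            else:
                W += prior[C] / (1.0 - prior[c]) * contrib  # near misses M(C)_j raise it
    return W


def select_above_mean(W):
    """Initial selection: keep features whose weight exceeds the mean weight."""
    return np.flatnonzero(W > W.mean())
```

On synthetic data where only the first column separates the classes, that column receives the largest weight and survives the threshold.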
Further, calculating the mutual information of the initially selected features and checking the features through correlation comprises the following. The mutual information is calculated as:
$$I(X,Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)$$
where:
- I(X, Y) is the mutual information;
- H(X) and H(Y) are the information entropies of random variables X and Y;
- H(X|Y) is the conditional entropy of X given Y;
- H(X, Y) is the joint entropy of (X, Y).

These are calculated as follows:
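$$H(X) = -\sum_{x} P(x)\log P(x), \qquad H(Y) = -\sum_{y} P(y)\log P(y)$$
$$H(X|Y) = -\sum_{x,y} P(x,y)\log P(x|y), \qquad H(X,Y) = -\sum_{x,y} P(x,y)\log P(x,y)$$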
in the formula: x and y are specific values of X and Y, and P(x, y) is the joint probability that x and y occur together;
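The identity I(X, Y) = H(X) + H(Y) - H(X, Y) translates directly into code once the continuous features are discretised. The text says neither how features are discretised nor whether mutual information is taken against the class label or between pairs of features. This sketch therefore makes three assumptions:
- features are binned into 10 equal-width bins;
- mutual information is measured in bits;
- each feature is compared with the class label.

The same function could also be applied to two features to flag redundancy. All function names are hypothetical.

```python
import numpy as np


def entropy_bits(values):
    """Shannon entropy in bits of discrete values; rows of a 2-D array are joint outcomes."""
    _, counts = np.unique(values, return_counts=True, axis=0)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def discretize(x, bins=10):
    """Map a continuous feature to equal-width bin indices 0..bins-1 (assumed binning)."""
    edges = np.histogram_bin_edges(x, bins=bins)
    return np.digitize(x, edges[1:-1])


def mutual_information(x, y, bins=10):
    """I(X, Y) = H(X) + H(Y) - H(X, Y) for a continuous feature x and discrete labels y."""
    xd = discretize(np.asarray(x, dtype=float), bins)
    yd = np.asarray(y)
    joint = np.column_stack([xd, yd])  # each row is one (x, y) outcome
    return entropy_bits(xd) + entropy_bits(yd) - entropy_bits(joint)
```

A feature that fully determines four equally likely classes carries 2 bits. An unrelated feature carries close to 0 bits, apart from a small positive bias from the finite sample.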
Further, using RFECV to score and rank the channel information in the features according to its performance on the classifier, and selecting the best features as the final features to obtain a feature set, comprises:
performing feature selection with the RFECV method, which combines recursive feature elimination with cross-validation, wherein:
recursive feature elimination comprises:
step 1) setting the initial feature set to all available features;
step 2) building a model with the current feature set, and scoring and ranking the importance of each feature;
step 3) deleting the one or more features with the lowest scores, i.e. the least important features, and updating the feature set;
step 4) repeating steps 2) and 3) until the importance of all features has been ranked;
the cross-validation comprises:
obtaining the importance of each feature from the result of recursive feature elimination, and selecting different numbers of features in order of importance;
performing cross-validation on each selected feature subset;
and choosing the number of features with the highest average score to complete feature selection.
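scikit-learn's `RFECV` implements the procedure above: it eliminates features recursively and then uses the mean cross-validated score to pick how many to keep. The sketch below makes three assumptions:
- A linear SVM serves as the ranking estimator, ranking features by coefficient magnitude.
- Features are eliminated one at a time.
- Folds are stratified five-fold.

The text ranks "channel information", whereas `RFECV` ranks individual feature columns. Grouping by channel would need a custom loop that this sketch does not attempt.

```python
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def rfecv_select(X, y, folds=5, seed=0):
    """Recursive feature elimination with a cross-validated choice of subset size."""
    # Standardise so linear-SVM coefficients are comparable across features.
    # Fitting the scaler on all data is a simplification that slightly leaks information.
    X_std = StandardScaler().fit_transform(X)
    selector = RFECV(
        estimator=SVC(kernel="linear"),  # exposes coef_, used to rank features (assumed estimator)
        step=1,                          # remove the single lowest-ranked feature per round
        cv=StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed),
        scoring="accuracy",              # mean CV accuracy decides the number of features
    )
    selector.fit(X_std, y)
    return selector
```

After fitting, `selector.support_` masks the kept columns (`X[:, selector.support_]`) and `selector.ranking_` gives 1 for every selected feature.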
In a second aspect, the present invention provides a feature selection-based electroencephalogram attention recognition system, including:
an acquisition module: used for collecting electroencephalogram signals during different attention tasks;
a preprocessing module: used for preprocessing the acquired electroencephalogram signals;
a feature extraction module: used for segmenting the preprocessed electroencephalogram signals into a plurality of samples, and then extracting features from the samples;
a feature initial selection module: used for calculating the weight of each feature according to the correlation between the extracted feature and the class label, taking the mean of all weights as a threshold, and taking the features whose weights exceed the threshold as initially selected features;
a check module: used for calculating the mutual information of the initially selected features and checking the features through correlation;
a feature final selection module: used for scoring and ranking the channel information in the features according to its performance on the classifier, and selecting the best features as the final features to obtain a feature set;
an identification module: used for performing attention electroencephalogram recognition with a support vector machine.
In a third aspect, the invention provides a feature selection-based electroencephalogram attention recognition device, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any of the above.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses an electroencephalogram attention recognition method based on ReliefF-MI and RFECV. Because electroencephalogram signals reflect an individual's mental state, they can be used to identify different levels of attention.

The method sets four groups of cognitive tasks of different difficulty to induce attention electroencephalograms at different levels, and extracts 100-dimensional features in total from 10 electroencephalogram channels. To reduce feature redundancy and improve classification accuracy, it proposes a feature selection algorithm combining ReliefF-MI with RFECV, which joins filter and wrapper feature selection. This addresses the high dimensionality, redundant feature information and heavy computation of earlier attention level recognition.

In terms of recognition rate, the method is compared with the traditional approach of classifying directly after feature extraction. It reduces the feature dimension from 100 to 36 and raises the four-class accuracy from 85.7% to 91.4%. The experimental results show that combining ReliefF-MI with RFECV reduces feature redundancy while maintaining classification accuracy.
Drawings
FIG. 1 is a flow chart of electroencephalogram attention recognition based on ReliefF-MI and RFECV according to an embodiment of the present invention;
FIG. 2 is a flow chart of the attention electroencephalogram induction experiment according to the first embodiment of the present invention;
FIG. 3 is a distribution diagram of the selected electroencephalogram channels according to an embodiment of the present invention;
FIG. 4 is a waveform diagram of electroencephalogram signals before and after preprocessing according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a feature vector according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the weights of each feature parameter on different channels calculated by the ReliefF algorithm according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the mutual information of each feature parameter on different channels calculated by the MI algorithm according to the first embodiment of the present invention;
FIG. 8 is a diagram of the score given to each channel by the RFECV algorithm according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the classification confusion matrix after feature extraction according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The first embodiment is as follows:
In this embodiment, the goal is to detect electroencephalograms at different degrees of attention. The subject performs four experimental tasks of different difficulty, which induce four types of electroencephalogram signals: high-, medium-, low- and non-attention states. After multi-dimensional features are extracted, ReliefF-MI combined with RFECV is used for feature selection, reducing feature redundancy while preserving recognition accuracy. This addresses the excessive dimensionality, redundant feature information and heavy computation of conventional attention level recognition. Compared with the traditional approach of classifying directly after feature extraction, the method effectively improves attention recognition accuracy.
FIG. 1 is a flow chart of electroencephalogram attention recognition based on ReliefF-MI and RFECV, which mainly comprises the following steps:
Step 1) collecting electroencephalogram signals with electroencephalogram acquisition equipment from Nanjing Weisi Medical; scalp electrode placement: the channels corresponding to the target brain regions are selected.
Step 2) the subject completes four attention tasks of different cognitive difficulty under quiet experimental conditions, which induce high-, medium-, low- and non-attention states; the subject's electroencephalogram signals during each task are recorded;
Step 3) preprocessing the electroencephalogram signals acquired in step 2). Preprocessing reduces interference from electromyographic and electrooculographic signals, power-line noise and the like, reduces the influence of artifacts from other functional areas of the brain, and improves signal quality. It mainly comprises filtering, channel screening, power-line interference removal, ocular artifact removal and re-referencing;
Step 4) segmenting the signals processed in step 3) with a 4 s sliding window with 50% overlap, giving 3403 samples, and then extracting features. The main parameters are the rectified mean, maximum, peak-to-peak difference, root mean square, standard deviation, margin factor, sample entropy, and the energy ratios of the theta, alpha and beta waves, giving 100-dimensional original features.
Step 5) the ReliefF algorithm calculates the weight of each original feature parameter from step 4) according to the correlation between the feature and the class label. The mean of all weights is taken as a threshold, and the features whose weights exceed the threshold are taken as initially selected features.
Step 6) the mutual information of the features initially selected in step 5) is calculated, and the features are checked through correlation.
Step 7) the RFECV algorithm scores and ranks the channel information in the features according to its performance on the classifier, and the best features are selected as the final features.
Step 8) the feature set obtained in step 7) is used for attention electroencephalogram recognition with a support vector machine, and the recognition accuracy is compared with the classification accuracy obtained with the original features.
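The comparison in step 8) can be sketched as cross-validated SVM accuracy on all features versus on the selected columns. Three choices are assumptions not given in the text:
- the RBF kernel with C = 1;
- stratified five-fold splitting;
- per-fold standardisation.

`compare_accuracy` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_accuracy(X, y, selected, folds=5, seed=0):
    """Mean CV accuracy of an SVM on all features vs. on the selected subset."""
    X = np.asarray(X)
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    # The scaler sits inside the pipeline, so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))  # kernel and C assumed
    full = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    reduced = cross_val_score(model, X[:, selected], y, cv=cv, scoring="accuracy").mean()
    return float(full), float(reduced)
```

If `selected` was chosen using the same data, the reduced score is optimistic. An unbiased comparison would repeat the selection inside each training fold.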
Fig. 2 shows the experimental procedure for inducing attention electroencephalograms in step 2).

The subject completes four attention tasks of different cognitive difficulty under quiet experimental conditions, performing Task 1 to Task 4 in sequence once the experiment starts. The duration of Task 1 runs from when the subject browses the first number to when the last number has been judged, and the durations of Tasks 2 to 4 equal that of Task 1. In the figure, t1 is the time the subject took for Task 1.

A 30 s rest follows each task. The subject fills in a questionnaire to self-assess the attention state during the experiment, so that samples can be screened against the subject's own evaluation to obtain raw data that meet the experimental requirements. After each round, the subject also ranks their attention levels during the four tasks from high to low, and only data from subjects whose ranking agrees with the experimental design are kept.

Each subject performs the experiment twice, with a 30 min interval between the two sessions. The four attention tasks are as follows:
Task 1: high-attention task: browsing plus mental arithmetic. Browse a 10 x 10 number matrix (the numbers 1 to 100 placed randomly without repetition) and find the prime numbers in it.
Task 2: medium-attention task: browsing. Browse the provided argumentative text material.
Task 3: low-attention task: mind-wandering. Keep the eyes on the text material from Task 2 while thinking about things unrelated to the task.
Task 4: non-attention task: resting. Relax as much as possible and think of nothing.
Fig. 3 shows the positions on the cerebral cortex of the electroencephalogram channels selected in step 1). Compared with the non-attention state, the human electroencephalogram in the attention state contains more beta activity and less theta and alpha activity. Leads are therefore selected over three groups of regions:
- the frontal lobe and central area (beta waves);
- the occipital and parietal lobes (alpha waves);
- the central area (theta waves).

The final leads are Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1 and O2;
the preprocessing of the acquired electroencephalogram signals in step 3) uses the EEGLAB toolbox in MATLAB, with the sampling rate set to 512 Hz, and proceeds as follows:
- First, an FIR filter band-pass filters the raw signal, retaining the 0.5-30 Hz band. This removes components outside the usable frequency range, including higher-frequency electromyographic signals and lower-frequency electrocardiographic, galvanic skin and respiratory signals.
- Next, electrooculographic signals are removed with independent component analysis: all ICA components are analyzed and the ocular components are deleted.
- Finally, bad segments are removed manually.

Fig. 4 shows the electroencephalogram waveforms before and after preprocessing. The dark waveform is the signal before preprocessing and the light waveform is the signal after it; the comparison shows that artifacts such as ocular artifacts have been removed.
The feature extraction in step 4) is as follows. By the type of feature parameter, feature extraction methods fall into three groups:
- Time-domain analysis. Time-domain features of the electroencephalogram signal are more concrete and intuitive. They are usually statistics such as the mean, variance, peak-to-peak difference, peak-to-valley distance, fractal dimension and higher-order zero-crossing analysis.
- Frequency-domain analysis. Frequency-domain parameters include energy and the power spectrum.
- Time-frequency-domain analysis. Time-frequency parameters mainly use the wavelet transform to separate signals of different rhythms from the electroencephalogram signal, and take the root mean square and energy of the wavelet coefficients as feature values for classification.

The raw data in each sample consist of 10 channels. From each channel, 10 feature parameters are extracted in turn:
- the rectified mean, maximum and peak-to-peak difference;
- the root mean square and standard deviation;
- the margin factor and sample entropy;
- the energy ratios of the theta, alpha and beta waves.

Each sample therefore yields a 100-dimensional (10 channels x 10 parameters) feature vector. Fig. 5 is a schematic diagram of the feature vector obtained in step 4). The original feature vector consists of 10 feature parameters, each calculated from the data of 10 channels in different brain regions, for a total dimension of 100 (10 parameters x 10 channels).
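Before the per-sample features above can be computed, step 4) cuts the preprocessed recording into 4 s windows with 50% overlap. At 512 Hz, that is 2048-sample windows advanced by 1024 samples. The NumPy sketch below makes two assumptions:
- a trailing stretch shorter than one window is dropped;
- `sliding_windows` is a hypothetical name.

```python
import numpy as np


def sliding_windows(eeg, fs=512, win_s=4.0, overlap=0.5):
    """Cut (channels, samples) into (n_windows, channels, win_len) overlapping windows.

    A trailing stretch shorter than one window is dropped (an assumption).
    """
    eeg = np.asarray(eeg)
    win = int(round(win_s * fs))               # 4 s * 512 Hz = 2048 samples
    step = int(round(win * (1.0 - overlap)))   # 50% overlap -> 1024-sample hop
    starts = range(0, eeg.shape[-1] - win + 1, step)
    if len(starts) == 0:                       # recording shorter than one window
        return np.empty((0, eeg.shape[0], win), dtype=eeg.dtype)
    return np.stack([eeg[:, s:s + win] for s in starts])
```

A 20 s, 10-channel recording yields 9 windows, and the second half of each window equals the first half of the next.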
Step 5), computing the weights of the original feature parameters with the ReliefF algorithm, is as follows:
Feature selection algorithms generally fall into three categories: filter, wrapper and embedded methods. A filter method is independent of the subsequent classifier and scores every feature by its divergence or by its correlation with the label, so it runs efficiently, generalizes well and can be paired with different classifiers. A wrapper method evaluates each feature subset directly by the performance of the classifier, so it is closely tied to that classifier but usually achieves higher classification accuracy. An embedded method builds feature selection into the learning algorithm itself as one of its components, as in a typical decision-tree algorithm.
The ReliefF algorithm is a filter-type feature selection algorithm. Its core idea is to weight each feature by its correlation with the class label, where the correlation is measured by how well the feature distinguishes between samples that lie close to each other. For multi-class problems, ReliefF repeatedly draws a random sample R from the training set, finds the k nearest neighbours H_j (j = 1, ..., k) of R among the samples of the same class, finds the k nearest neighbours M_j(C) of R in each class C different from the class of R, and then updates the weight of every feature. The weight of a feature F is updated as follows:
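$$W(F) = W(F) - \sum_{j=1}^{k}\frac{\mathrm{diff}(F,R,H_j)}{mk} + \sum_{C \neq \mathrm{class}(R)} \frac{\frac{P(C)}{1-P(\mathrm{class}(R))}\sum_{j=1}^{k}\mathrm{diff}(F,R,M_j(C))}{mk}$$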
where W(F) is the weight of feature F, m is the number of randomly drawn samples, k is the number of nearest neighbours, class(R) is the class of the randomly drawn sample R, P(C) is the prior probability of class C, P(class(R)) is the prior probability of the class of R, and diff(F, R, H_j) and diff(F, R, M_j(C)) are distances between two samples on feature F, computed as follows:
[Equation image: definition of diff(F, R_1, R_2), the distance between samples R_1 and R_2 on feature F]
where R_1[F] is the value of feature F in sample R_1 and R_2[F] is the value of feature F in sample R_2.
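The ReliefF weight update can be sketched in pure Python as follows. This is an illustration, not the implementation used in the patent: it assumes continuous features and uses the range-normalised difference |R1[F] - R2[F]| / (max(F) - min(F)) for diff (the usual ReliefF choice for numeric attributes), finds neighbours by the Manhattan distance over those normalised differences, and applies the mean-weight threshold stated in the claims for the initial selection. Passing `m=None` visits every sample once instead of drawing m random samples, which makes the result deterministic.

```python
import random

def relieff(X, y, k=2, m=None, seed=0):
    """ReliefF feature weights (higher = more relevant). X: list of rows, y: labels."""
    n, d = len(X), len(X[0])
    span = []
    for f in range(d):
        col = [row[f] for row in X]
        span.append((max(col) - min(col)) or 1.0)  # guard against constant features

    def diff(f, a, b):
        # Assumed diff for continuous features: range-normalised absolute difference.
        return abs(X[a][f] - X[b][f]) / span[f]

    def nearest(r, c):
        # k nearest samples of class c to sample r (Manhattan distance, self excluded).
        cands = [i for i in range(n) if y[i] == c and i != r]
        cands.sort(key=lambda i: sum(diff(f, r, i) for f in range(d)))
        return cands[:k]

    prior = {c: y.count(c) / n for c in set(y)}
    rng = random.Random(seed)
    picks = list(range(n)) if m is None else [rng.randrange(n) for _ in range(m)]
    w = [0.0] * d
    for r in picks:
        hits = nearest(r, y[r])
        misses = {c: nearest(r, c) for c in prior if c != y[r]}
        for f in range(d):
            w[f] -= sum(diff(f, r, h) for h in hits) / (len(picks) * k)
            for c, ms in misses.items():
                scale = prior[c] / (1.0 - prior[y[r]])
                w[f] += scale * sum(diff(f, r, j) for j in ms) / (len(picks) * k)
    return w

def select_above_mean(weights):
    """Initial selection: keep features whose weight exceeds the mean weight."""
    threshold = sum(weights) / len(weights)
    return [f for f, v in enumerate(weights) if v > threshold]
```

In the patent's setting, X would hold the 100-dimensional feature vectors and y the attention-task labels.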
Fig. 6 is a line graph of the normalized weight of each feature parameter on each channel, computed with the ReliefF algorithm. The original features are divided by channel into 10 vectors of 10 dimensions, each dimension representing one type of feature parameter. ReliefF computes the weight of each parameter from its correlation with the class labels, and the weights are normalized to the range 0 to 1; the results are shown in Fig. 6. As Fig. 6 shows, the maximum value and the peak-to-peak value have low weights, below 0.2, on every channel, so these two feature types are removed and the remaining features are retained as the initially selected features.
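The screening rule behind Fig. 6 can be expressed as a small helper. It assumes a ReliefF weight vector has already been computed for each channel (10 channels × 10 parameters), min-max normalises each channel's weights to [0, 1] separately (the text does not say whether normalisation is per channel or global), and reports the parameter types whose normalised weight is below 0.2 on every channel. `minmax` and `low_weight_params` are hypothetical names.

```python
def minmax(values):
    """Rescale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def low_weight_params(weights_by_channel, threshold=0.2):
    """Indices of parameter types whose normalised weight is below threshold on every channel."""
    norm = [minmax(ws) for ws in weights_by_channel]
    n_params = len(norm[0])
    return [p for p in range(n_params) if all(ch[p] < threshold for ch in norm)]
```

With the parameter order used in the text (rectified mean, maximum, peak-to-peak value, ...), a result of `[1, 2]` would correspond to the two feature types removed above.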
Step 6), computing the mutual information of the features initially selected in step 5) and verifying them by correlation, is as follows. The mutual information I(X, Y) measures the dependence between random variables X and Y; the larger its value, the more strongly the two variables are related. It is computed as:
I(X,Y)=H(X)-H(X|Y)=H(X)+H(Y)-H(X,Y)
where H(X) is the entropy of X, H(X|Y) is the conditional entropy of X given Y, H(Y) is the entropy of Y and H(X, Y) is the joint entropy of (X, Y). These are computed as follows:
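$$H(X) = -\sum_{x} P(x)\log P(x),\quad H(Y) = -\sum_{y} P(y)\log P(y),\quad H(X,Y) = -\sum_{x}\sum_{y} P(x,y)\log P(x,y),\quad H(X|Y) = -\sum_{x}\sum_{y} P(x,y)\log P(x|y)$$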
where x and y are particular values of X and Y, P(x) and P(y) are their marginal probabilities, and P(x, y) is the joint probability that X = x and Y = y occur together.
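The identity I(X, Y) = H(X) + H(Y) - H(X, Y) can be evaluated directly with a plug-in (histogram) estimator. The sketch assumes each continuous feature is first discretised into equal-width bins (10 here, an arbitrary choice) and uses natural logarithms, so values are in nats; the text does not specify an estimator or a logarithm base.

```python
import math
from collections import Counter

def entropy(values):
    """Plug-in Shannon entropy (nats) of a sequence of hashable symbols."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X, Y) = H(X) + H(Y) - H(X, Y), estimated from paired observations."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def discretize(values, bins=10):
    """Equal-width binning of a continuous feature (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

`mutual_information(discretize(feature_values), labels)` scores one feature parameter on one channel against the class labels, which is the quantity plotted per channel in Fig. 7.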
Fig. 7 is a line graph of the mutual information of each feature parameter on each channel, computed with the MI algorithm. The initially selected features retained by ReliefF are divided by channel into 10 vectors of 8 dimensions, each dimension representing one type of feature parameter, and the mutual information between each feature parameter and the class labels is computed; the results are shown in Fig. 7. As Fig. 7 shows, the two feature parameters E_data/E_all and E_beta/E_all have low mutual information on every channel, indicating a weak correlation with the class, so these two feature types are removed and the remaining features are retained as the verified features.
Step 7), screening out the final features with the RFECV algorithm, is as follows. Steps 5) and 6) both use filter-type feature selection, which is computationally efficient but may not give ideal classification accuracy. To compensate, this step uses RFECV to score and rank the channels according to how well the channel information in the features performs on the classifier, and selects the best features as the final features. Feature selection with RFECV has two stages, an RFE stage and a CV stage: RFE (recursive feature elimination) scores and ranks the importance of the features, and CV (cross-validation) selects the optimal number of features. Specifically:
First, recursive feature elimination proceeds as follows:
(1) set the initial feature set to all available features;
(2) fit a model on the current feature set and score and rank the importance of each feature;
(3) delete the one or more lowest-scoring, i.e. least important, features and update the feature set;
(4) repeat steps (2) and (3) until the importance of every feature has been ranked.
Second, cross-validation proceeds as follows:
using the feature importances obtained from recursive feature elimination, select subsets containing successively different numbers of features;
evaluate each selected subset by cross-validation;
and select the number of features with the highest mean score to complete the feature selection.
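A dependency-free sketch of the two RFECV stages, applied to channels rather than individual features as in Fig. 8, is given below. It is a simplification under stated assumptions: a nearest-centroid classifier stands in for the SVM and KNN classifiers; a channel's importance is taken as the summed spread of the class centroids over its features (a stand-in for the |coef_| a linear model would provide, and meaningful only if the features are standardised first); and the ranking is computed once on all data rather than inside every fold, as scikit-learn's `RFECV` does. All function names are hypothetical.

```python
from statistics import mean

def centroids(X, y, idx, feats):
    """Per-class mean of the selected features over the rows in idx."""
    out = {}
    for c in sorted({y[i] for i in idx}):
        rows = [X[i] for i in idx if y[i] == c]
        out[c] = [mean(r[f] for r in rows) for f in feats]
    return out

def predict(cents, row, feats):
    """Nearest-centroid rule (squared Euclidean distance)."""
    return min(cents, key=lambda c: sum((row[f] - m) ** 2 for f, m in zip(feats, cents[c])))

def cv_score(X, y, feats, folds=3):
    """Mean accuracy over folds; sample i goes to fold i % folds."""
    n = len(X)
    scores = []
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        test = [i for i in range(n) if i % folds == k]
        cents = centroids(X, y, train, feats)
        scores.append(mean(1.0 if predict(cents, X[i], feats) == y[i] else 0.0 for i in test))
    return mean(scores)

def rfecv_channels(X, y, feats_by_group, folds=3):
    """Recursively drop the least important channel, then keep the best-scoring subset."""
    active = list(feats_by_group)
    history, eliminated = [], []
    all_idx = list(range(len(X)))
    while active:
        feats = [f for g in active for f in feats_by_group[g]]
        history.append((list(active), cv_score(X, y, feats, folds)))  # CV stage
        cents = centroids(X, y, all_idx, feats)  # RFE stage: importance on all data
        spread = {f: max(v[j] for v in cents.values()) - min(v[j] for v in cents.values())
                  for j, f in enumerate(feats)}
        worst = min(active, key=lambda g: sum(spread[f] for f in feats_by_group[g]))
        active.remove(worst)
        eliminated.append(worst)
    # Highest CV score wins; ties are broken toward fewer channels.
    best_groups, best_score = max(history, key=lambda h: (h[1], -len(h[0])))
    return best_groups, best_score, eliminated[::-1]  # ranking: most important first
```

With `feats_by_group` mapping each electrode name to the indices of its retained parameters, the returned ranking plays the role of the channel order reported for Fig. 8 and `best_groups` is the retained channel set.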
Fig. 8 shows the recognition rates, on different classifiers, of the feature subsets generated by the RFECV algorithm after the channels have been scored and ranked. RFECV ranks the channels as Fp2 > O2 > Fp1 > P4 > O1 > F4 > C4 > F3 > P3 > C3, and the classification accuracy of the feature subsets for different numbers of channels with the SVM and KNN classifiers is shown in the line graph of Fig. 8. The figure shows that once the number of channels has been reduced to 6, removing further channels makes the classification rate fall rapidly. The information from the four lowest-ranked channels (C4, F3, P3 and C3) is therefore removed to obtain the final selected features.
Fig. 9 shows the classification confusion matrices before and after feature selection. The original features (100 dimensions, 10 parameters × 10 channels) and the final features (36 dimensions, 6 parameters × 6 channels) were each fed to the classifier, giving recognition rates of 85.7% and 91.4% respectively. The confusion matrix for the final selected features is shown in Fig. 9.
Example two:
the feature selection-based electroencephalogram attention recognition system of this embodiment can implement the feature selection-based electroencephalogram attention recognition method of the first embodiment, and comprises:
an acquisition module: used to collect electroencephalogram signals during different attention tasks;
a preprocessing module: used to preprocess the acquired electroencephalogram signals;
a feature extraction module: used to segment the preprocessed electroencephalogram signals into a plurality of samples and then extract features from the samples;
an initial feature selection module: used to calculate the weight of each extracted feature according to its correlation with the class labels, take the mean of all weights as a threshold, and keep the features whose weight exceeds the threshold as the initially selected features;
a verification module: used to calculate the mutual information of the initially selected features and verify the features by their correlation with the class labels;
a final feature selection module: used to score and rank the channel information in the features by its performance on the classifier and select the best features as the final features, giving a feature set;
an identification module: used to recognize attention electroencephalogram signals with a support vector machine.
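The identification module can be illustrated with a linear SVM trained by Pegasos-style stochastic sub-gradient descent and wrapped one-vs-rest for the attention classes. The text does not state the kernel, regularisation or multi-class scheme, so the linear kernel, `lam`, `epochs`, the constant bias feature and the one-vs-rest wrapper are all assumptions; in practice a library SVM (for example LIBSVM) would normally be used instead.

```python
import random

def _augment(row):
    # Append a constant 1 so the bias is learned (and regularised) as an ordinary weight.
    return list(row) + [1.0]

def train_binary_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Linear SVM via Pegasos stochastic sub-gradient descent; y must be -1 or +1."""
    data = [_augment(r) for r in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step on the L2 term
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def decision(w, row):
    """Signed score of a row under weight vector w."""
    return sum(wj * xj for wj, xj in zip(w, _augment(row)))

def train_ovr(X, labels, **kwargs):
    """One binary SVM per class (one-vs-rest)."""
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}

def predict_ovr(models, row):
    """Pick the class whose binary SVM gives the largest decision value."""
    return max(models, key=lambda c: decision(models[c], row))
```

`train_ovr(feature_set, task_labels)` trains one classifier per attention task, and `predict_ovr` assigns a new feature vector to the task with the largest decision value.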
Example three:
an embodiment of the invention also provides a feature selection-based electroencephalogram attention recognition device, which can implement the feature selection-based electroencephalogram attention recognition method of the first embodiment and comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the following steps of the method:
acquiring electroencephalogram signals during different attention tasks;
preprocessing the acquired electroencephalogram signals;
segmenting the preprocessed electroencephalogram signals into a plurality of samples, and then extracting features from the samples;
calculating the weight of each extracted feature according to its correlation with the class labels, taking the mean of all weights as a threshold, and taking the features whose weight exceeds the threshold as the initially selected features;
calculating the mutual information of the initially selected features and verifying the features by their correlation with the class labels;
scoring and ranking the channel information in the features by its performance on the classifier, and selecting the best features as the final features to obtain a feature set;
performing attention electroencephalogram recognition on the obtained feature set with a support vector machine.
Example four:
an embodiment of the present invention further provides a computer-readable storage medium that can implement the feature selection-based electroencephalogram attention recognition method of the first embodiment; a computer program is stored on it which, when executed by a processor, implements the following steps of the method:
acquiring electroencephalogram signals during different attention tasks;
preprocessing the acquired electroencephalogram signals;
segmenting the preprocessed electroencephalogram signals into a plurality of samples, and then extracting features from the samples;
calculating the weight of each extracted feature according to its correlation with the class labels, taking the mean of all weights as a threshold, and taking the features whose weight exceeds the threshold as the initially selected features;
calculating the mutual information of the initially selected features and verifying the features by their correlation with the class labels;
scoring and ranking the channel information in the features by its performance on the classifier, and selecting the best features as the final features to obtain a feature set;
recognizing attention electroencephalogram signals from the obtained feature set with a support vector machine.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A feature selection-based electroencephalogram attention recognition method, characterized by comprising the following steps:
acquiring electroencephalogram signals during different attention tasks;
preprocessing the acquired electroencephalogram signals;
segmenting the preprocessed electroencephalogram signals into a plurality of samples, and then extracting features from the samples;
calculating, with the ReliefF algorithm, the weight of each extracted feature according to its correlation with the class labels, taking the mean of all weights as a threshold, and taking the features whose weight exceeds the threshold as the initially selected features;
calculating the mutual information of the initially selected features and verifying the features by their correlation with the class labels;
scoring and ranking, with RFECV, the channel information in the features by its performance on a classifier, and selecting the best features as the final features to obtain a feature set;
performing attention electroencephalogram recognition on the obtained feature set with a support vector machine.
2. The feature selection-based electroencephalogram attention recognition method of claim 1, wherein there are four different attention tasks, comprising:
Task 1, high-attention task: scanning and mental arithmetic; the subject scans a 10 × 10 matrix of numbers (1 to 100, randomly arranged without repetition) and finds the prime numbers in it;
Task 2, attention task: reading; the subject reads the provided argumentative text;
Task 3, low-attention task: mind-wandering; the subject stares at the text from Task 2 while thinking about things unrelated to the task;
Task 4, non-attention task: rest;
the duration of Task 1 runs from when the subject starts scanning the first number until the last number has been judged, and Tasks 2 to 4 last as long as Task 1.
3. The feature selection-based electroencephalogram attention recognition method of claim 1, wherein, when acquiring electroencephalogram signals during the different attention tasks, electrodes are selected over the frontal lobe and central region (β waves), the occipital and parietal lobes (α waves) and the central region (θ waves), namely Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1 and O2.
4. The feature selection-based electroencephalogram attention recognition method of claim 1, wherein preprocessing the acquired electroencephalogram signals comprises: using the EEGLAB toolbox in MATLAB with the sampling rate set to 512 Hz; first band-pass filtering the raw signal with an FIR filter to retain the 0.5 to 30 Hz band, which removes unusable frequency components, including higher-frequency electromyographic signals and lower-frequency electrocardiographic, galvanic skin and respiratory signals; then removing electrooculographic signals by independent component analysis, inspecting all ICA components and deleting the electrooculographic components; and finally removing bad segments.
5. The feature selection-based electroencephalogram attention recognition method of claim 1, wherein segmenting the preprocessed electroencephalogram signals into a plurality of samples and extracting features from the samples comprises: the raw data of each sample consist of 10 channels; 10 feature parameters, namely the rectified mean, maximum, peak-to-peak value, root mean square, standard deviation, margin factor, sample entropy, and the θ-, α- and β-band energy ratios, are extracted in turn from the electroencephalogram data of each channel, so that each sample yields a 100-dimensional (10 channels × 10 parameters) feature vector.
6. The feature selection-based electroencephalogram attention recognition method of claim 1, wherein calculating, with the ReliefF algorithm, the weight of each extracted feature according to its correlation with the class labels, taking the mean of all weights as a threshold, and taking the features whose weight exceeds the threshold as the initially selected features comprises:
the ReliefF algorithm randomly draws a sample R from the training sample set, finds the k nearest neighbours H_j of R among the samples of the same class, finds the k nearest neighbours M_j(C) of R in each class C different from the class of R, and then updates the weight of each feature, the weight of a feature F being updated as follows:
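$$W(F) = W(F) - \sum_{j=1}^{k}\frac{\mathrm{diff}(F,R,H_j)}{mk} + \sum_{C \neq \mathrm{class}(R)} \frac{\frac{P(C)}{1-P(\mathrm{class}(R))}\sum_{j=1}^{k}\mathrm{diff}(F,R,M_j(C))}{mk}$$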
where W(F) is the weight of feature F, m is the number of randomly drawn samples, k is the number of nearest neighbours, class(R) is the class of the randomly drawn sample R, P(C) is the prior probability of class C, P(class(R)) is the prior probability of the class of R, and diff(F, R, H_j) and diff(F, R, M_j(C)) are distances between two samples on feature F, computed as follows:
[Equation image: definition of diff(F, R_1, R_2), the distance between samples R_1 and R_2 on feature F]
where R_1[F] is the value of feature F in sample R_1 and R_2[F] is the value of feature F in sample R_2.
7. The feature selection-based electroencephalogram attention recognition method of claim 1, wherein calculating the mutual information of the initially selected features and verifying the features by correlation comprises:
calculating the mutual information as:
I(X,Y)=H(X)-H(X|Y)=H(X)+H(Y)-H(X,Y)
where I(X, Y) is the mutual information, H(X) is the entropy of X, H(X|Y) is the conditional entropy of X given Y, H(Y) is the entropy of Y and H(X, Y) is the joint entropy of (X, Y), computed as follows:
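$$H(X) = -\sum_{x} P(x)\log P(x),\quad H(Y) = -\sum_{y} P(y)\log P(y),\quad H(X,Y) = -\sum_{x}\sum_{y} P(x,y)\log P(x,y),\quad H(X|Y) = -\sum_{x}\sum_{y} P(x,y)\log P(x|y)$$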
where x and y are particular values of X and Y, P(x) and P(y) are their marginal probabilities, and P(x, y) is the joint probability that X = x and Y = y occur together;
and wherein scoring and ranking, with RFECV, the channel information in the features by its performance on the classifier, selecting the best features as the final features and obtaining a feature set comprises:
performing feature selection with the RFECV method, which comprises recursive feature elimination and cross-validation, wherein:
recursive feature elimination comprises:
step 1) setting the initial feature set to all available features;
step 2) fitting a model on the current feature set and scoring and ranking the importance of each feature;
step 3) deleting the one or more lowest-scoring, i.e. least important, features and updating the feature set;
step 4) repeating steps 2) and 3) until the importance of every feature has been ranked;
cross-validation comprises:
selecting, according to the feature importances obtained from recursive feature elimination, subsets containing successively different numbers of features;
performing cross-validation on each selected subset;
and selecting the number of features with the highest mean score to complete the feature selection.
8. A feature selection-based electroencephalogram attention recognition system, characterized by comprising:
an acquisition module: used to collect electroencephalogram signals during different attention tasks;
a preprocessing module: used to preprocess the acquired electroencephalogram signals;
a feature extraction module: used to segment the preprocessed electroencephalogram signals into a plurality of samples and then extract features from the samples;
an initial feature selection module: used to calculate, with the ReliefF algorithm, the weight of each extracted feature according to its correlation with the class labels, take the mean of all weights as a threshold, and keep the features whose weight exceeds the threshold as the initially selected features;
a verification module: used to calculate the mutual information of the initially selected features and verify the features by their correlation with the class labels;
a final feature selection module: used to score and rank, with RFECV, the channel information in the features by its performance on a classifier and select the best features as the final features, giving a feature set;
an identification module: used to recognize attention electroencephalogram signals with a support vector machine.
9. A feature selection-based electroencephalogram attention recognition device, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 7.
CN202210138817.XA 2022-02-15 2022-02-15 Electroencephalogram attention recognition system and method based on feature selection Pending CN114521903A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210138817.XA CN114521903A (en) 2022-02-15 2022-02-15 Electroencephalogram attention recognition system and method based on feature selection

Publications (1)

Publication Number Publication Date
CN114521903A true CN114521903A (en) 2022-05-24

Family

ID=81622897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210138817.XA Pending CN114521903A (en) 2022-02-15 2022-02-15 Electroencephalogram attention recognition system and method based on feature selection

Country Status (1)

Country Link
CN (1) CN114521903A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115349861A (en) * 2022-08-23 2022-11-18 山东大学 Mental stress detection system and method based on single-channel electroencephalogram signal
TWI810988B (en) * 2022-06-24 2023-08-01 國立中正大學 Method of enhancing classification of electroencephalography signals by time-frequency domain channel weighted technique and system thereof

Citations (4)

Publication number Priority date Publication date Assignee Title
US20060115146A1 (en) * 2004-11-30 2006-06-01 Nec Corporation Pathological diagnosis support device, program, method, and system
CN106919948A (en) * 2015-12-28 2017-07-04 西南交通大学 A kind of recognition methods for driving Sustained attention level
CN106913333A (en) * 2015-12-28 2017-07-04 西南交通大学 A kind of choosing method of the sensivity feature index of Sustained attention level
CN111242225A (en) * 2020-01-16 2020-06-05 南京邮电大学 Fault detection and diagnosis method based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN106886792B (en) Electroencephalogram emotion recognition method for constructing multi-classifier fusion model based on layering mechanism
CN110070105B (en) Electroencephalogram emotion recognition method and system based on meta-learning example rapid screening
Cai et al. A case-based reasoning model for depression based on three-electrode EEG data
Cai et al. Study on feature selection methods for depression detection using three-electrode EEG data
Zhang et al. Automatic artifact removal from electroencephalogram data based on a priori artifact information
CN114521903A (en) Electroencephalogram attention recognition system and method based on feature selection
CN105894039A (en) Emotion recognition modeling method, emotion recognition method and apparatus, and intelligent device
Nagar et al. Brain mapping based stress identification using portable eeg based device
Salvaris et al. Wavelets and ensemble of FLDs for P300 classification
Prasanth et al. Deep learning for interictal epileptiform spike detection from scalp EEG frequency sub bands
Hosseini et al. Emotional stress recognition using a new fusion link between electroencephalogram and peripheral signals
Chen et al. Scalp EEG-based pain detection using convolutional neural network
CN108175425A (en) A kind of analysis processing device and the cognition index analysis method of latent energy value test
CN113349780A (en) Method for evaluating influence of emotional design on online learning cognitive load
Pane et al. Identifying rules for electroencephalograph (EEG) emotion recognition and classification
CN113208593A (en) Multi-modal physiological signal emotion classification method based on correlation dynamic fusion
CN103077205A (en) Method for carrying out semantic voice search by sound stimulation induced ERP (event related potential)
Sun et al. Automatic epileptic seizure detection using PSO-based feature selection and multilevel spectral analysis for EEG signals
Velásquez-Martínez et al. Motor imagery classification for BCI using common spatial patterns and feature relevance analysis
Wei et al. Automatic Sleep Staging Based on Contextual Scalograms and Attention Convolution Neural Network Using Single-channel EEG
Placidi et al. Classification strategies for a single-trial binary Brain Computer Interface based on remembering unpleasant odors
CN115640827B (en) Intelligent closed-loop feedback network method and system for processing electrical stimulation data
Gao et al. An ICA/HHT hybrid approach for automatic ocular artifact correction
CN115686208A (en) Music induced emotion recognition method and system based on EEG
Hidalgo‐Muñoz et al. Affective valence detection from EEG signals using wrapper methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination