AU2019101151A4 - Classify Mental States from EEG Signal Using Xgboost Algorithm - Google Patents

Classify Mental States from EEG Signal Using Xgboost Algorithm

Info

Publication number
AU2019101151A4
AU2019101151A4
Authority
AU
Australia
Prior art keywords
accuracy
xgboost
data
algorithm
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2019101151A
Inventor
Ke Chen
Jiachen Jiang
Yuan Ma
Yihao WANG
Huimin Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chen Ke Miss
Zhang Huimin Miss
Original Assignee
Chen Ke Miss
Zhang Huimin Miss
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chen Ke Miss, Zhang Huimin Miss filed Critical Chen Ke Miss
Priority to AU2019101151A priority Critical patent/AU2019101151A4/en
Application granted granted Critical
Publication of AU2019101151A4 publication Critical patent/AU2019101151A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/16Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
    • A61B5/165Evaluating the state of mind, e.g. depression, anxiety
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/24Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316Modalities, i.e. specific diagnostic methods
    • A61B5/369Electroencephalography [EEG]
    • A61B5/372Analysis of electroencephalograms
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/16Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
    • A61B5/168Evaluating attention deficit, hyperactivity
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7253Details of waveform analysis characterised by using transforms
    • A61B5/7257Details of waveform analysis characterised by using transforms using Fourier transforms
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Public Health (AREA)
  • Surgery (AREA)
  • Veterinary Medicine (AREA)
  • General Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Physiology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • Developmental Disabilities (AREA)
  • Educational Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Social Psychology (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

Brain-computer interface (BCI) is a leading-edge technique which allows the brain to communicate with external devices. It has been applied in several fields, such as medical rehabilitation and virtual reality. This invention introduces a technique that can be applied in the education field to monitor and analyze users' electroencephalogram (EEG) so that their mental states can be identified. The classifier uses the XGBoost algorithm, which combines Bayes, KNN and SVM, and its accuracy can reach 80%. By using this technique, a teacher can obtain the concentration status of students in real time and adjust his or her teaching method or remind a student whose attention is wandering. [Figure 1: flow chart — data collection; data import; feature extraction (mean, variance, correlation, peak, skew, absolute sum); data pre-processing; visualization; classifiers (Bayes, KNN, SVM, ANN, XGBoost); fine-tuning; result analysis]

Description

FIELD
This invention is in the field of EEG signal data processing using the XGBoost algorithm and serves to identify mental states.
BACKGROUND
Brain-computer interface (BCI), also called direct neural interface or brain-machine interface, is a technique providing a direct communication channel between the brain and external devices, independent of the peripheral nervous system. A common function of BCI equipment is to monitor and analyze wearers' brainwaves in order to get detailed information about the functional activity of different brain areas and then transform those data into a specific form. Based on this function, the technique has been applied in several fields, including medical rehabilitation, video games, etc. Furthermore, with the development of EEG technology and wireless transmission technology, BCI equipment has been designed to be wireless and has become smaller and lighter, so it can be worn and used in more fields. Thus, the public is attaching more expectations to the application of BCI in the education field, which is supposed to create a revolutionary change in education.
As a matter of fact, a prevailing problem of most education methods is obtaining the real-time learning state of each student, and attention monitoring based on BCI can probably play an important role here. Creating an attention monitoring method therefore becomes a priority.
An attention monitoring system requires a set of BCI equipment to detect the EEG signals of the wearer and then process these signals through an algorithm which extracts effective data and transforms those data into specific statements of attention concentration. This invention uses the XGBoost algorithm to process the data. XGBoost initially started as a research project by Tianqi Chen as part of the Distributed (Deep) Machine Learning Community (DMLC) group. It became well known in ML competition circles after its use in the winning solution of the Higgs Machine Learning Challenge. A main advantage of this method is its conciseness and immediacy, which allows real-time monitoring to be achieved. Furthermore, by using this algorithm, the accuracy of identifying the concentration state can reach up to 80%.
SUMMARY
To improve the immediacy and accuracy of the existing methods for classifying different states of attention, this invention proposes a real-time attention monitoring method based on the machine learning method XGBoost. By combining machine learning methods with significant features extracted from the FFT power spectrum of 64 channels of EEG signal, the proposed method notably reduces the processing time and increases the accuracy up to 81.02% under 5-fold cross-validation, demonstrating the model's good stability and powerful classification ability.
The technical solution of this invention is implemented as follows:
This attention monitoring method for EEG signals includes an EEG signal database, a feature extraction module, a feature pre-processing module and different types of classifiers. The power spectrum of the EEG database is imported into the feature extraction module, which extracts features from the original data. Then the representative characteristics are standardized or normalized in the feature pre-processing module. Finally, based on these features and 4 kinds of labels, the classifier can identify 4 states of attention: concentration, wandering, fatigue, and sleepiness.
The following steps are included:
Step (1), import the original data of the 3 frequency bandwidths of the FFT power spectrum (alpha, theta, and beta), then combine the 7 time windows of the 64 channels into a single window.
Step (2), extract 19 features from the original FFT power spectrum data: 3 means, 3 variances, 3 autocorrelations, 3 pairwise correlations between the 3 bandwidths, 3 skews, 3 peaks (kurtoses), and the total sum of all data in each sample.
Step (3), to adapt the features to different classifiers, choose a specific pre-processing method from 3 algorithms: Standardization, Scaling features to a range, and Normalization.
Step (4), use different classifiers, including the Naive Bayes classifier, KNN classifier, SVM classifier, XGBoost classifier, and ANN, to identify the 4 states of attention.
Step (5), fine-tune all the models to reach their best performance and then compare the performance of the different machine learning classifiers.
On the whole, compared with other machine learning methods such as KNN or SVM, the XGBoost algorithm has the advantage of integrating many methods, making its accuracy higher under the same conditions. Compared with deep learning methods such as ANN, which require plenty of training time, this method has a higher rate of convergence while maintaining acceptable accuracy.
DESCRIPTION OF DRAWING
Figure 1 is the flow chart of the experiment.
Figure 2 is the tree model of one algorithm in XGBoost.
Figure 3 is a single neuron in an ANN.
Figure 4 is the architecture of the three-layer perceptron.
Figure 5 is the line chart of the absolute sum of all sets.
Figure 6 is the process of finding the parameter K in KNN which gives the maximum accuracy.
Figure 7 is the process of finding the parameter C in SVM which gives the maximum accuracy.
DESCRIPTION OF PREFERRED EMBODIMENT
Method
Collection of data
The data source used in this experiment was the result of a test in which fifteen graduate students were shown characters. The continuous EEG was recorded with 64 sintered Ag/AgCl electrodes placed around the scalp. The data was digitized at a sampling rate of 500 Hz using an online pass-band filter. For the analysis, the EEG data were down-sampled to 250 Hz and baseline-corrected by removing the average of each channel. The whole file contains 2670 sets of experiment data, each consisting of seven time windows. The collected data was the Fast Fourier Transform power in three frequency bands: theta θ (4-7 Hz), alpha α (8-13 Hz) and beta β (14-30 Hz). Since 64 electrodes worked simultaneously, 64 values were recorded in each frequency band, giving 192 values in each time window. According to the status of the testers, the data was divided into four groups, each labeled by a specific number: 1 stands for concentration, 2 for wandering, 3 for fatigue and 4 for sleepiness. The tag is listed in the last column of the file.
Data Import
In order to distinguish the four statuses of the electroencephalogram, the group used Python to build a classifier. First, the data was imported into the program and, for flexibility of data processing, converted into an array. Since the file contains seven time windows, the mean value of each electrode in every window was calculated as in formula (1) and rearranged into a new array.

$$\bar{x}_n = \frac{1}{7}\sum_{z=0}^{6} x_{n+192z} \tag{1}$$
Most of the features of the data were extracted based on this new array.
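As an illustration, a minimal Python sketch of this import-and-average step is given below, assuming a hypothetical CSV layout in which each row holds the 7 × 192 FFT values followed by the status tag; the file name and loader are placeholders, not the original implementation.

```python
import numpy as np

# Assumed layout: each row = 7 time windows x 192 values
# (64 electrodes x 3 frequency bands), last column = status tag.
raw = np.loadtxt("eeg_fft_power.csv", delimiter=",")  # hypothetical file name
features_raw, tags = raw[:, :1344], raw[:, -1]

# Formula (1): average each of the 192 electrode/band values over the
# 7 time windows, collapsing 1344 columns into 192.
windows = features_raw.reshape(-1, 7, 192)  # (n_sets, n_windows, n_values)
mean_array = windows.mean(axis=1)           # (n_sets, 192)
```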
Feature extraction
There were 19 features extracted from each set of data: six features from each frequency band and one feature from the whole set. The mean value and the variance were calculated as in formulas (2) and (3).
$$\bar{x} = \mu = \frac{1}{N}\sum_{i}^{N} x_i \tag{2}$$

$$s^2 = \frac{1}{N}\sum_{i}^{N}(x_i - \mu)^2 \tag{3}$$
Kurtosis refers to the peakedness of the probability distribution of a real random variable. The higher the kurtosis, the sharper the peak, meaning that the increase in variance is caused by outliers which are extremely greater or less than the mean value. It can be calculated by formula (4).

$$\text{kurtosis} = \frac{\frac{1}{N}\sum_{i}^{N}(x_i - \mu)^4}{\left(\frac{1}{N}\sum_{i}^{N}(x_i - \mu)^2\right)^2} - 3 \tag{4}$$

Another parameter related to the probability distribution is skewness. It is a measure of the direction and degree of asymmetry of the probability distribution curve with respect to the mean value. Formula (5) shows the calculation of skewness.

$$\text{skew} = \frac{\frac{1}{N}\sum_{i}^{N}(x_i - \mu)^3}{\left(\frac{1}{N}\sum_{i}^{N}(x_i - \mu)^2\right)^{3/2}} \tag{5}$$
Autocorrelation represents the correlation between two adjacent elements in one frequency band, while cross-correlation reflects the relationship between two values from different frequency bands. As for cross-correlation, there are three combinations of the three frequency bands. Formulas (6) and (7) show the calculation of autocorrelation and cross-correlation.
$$\text{autocorrelation} = \frac{\sum_{i}^{N-1}(x_i - \mu)(x_{i+1} - \mu)}{\sum_{i}^{N}(x_i - \mu)^2} \tag{6}$$

$$\text{crosscorrelation} = \frac{\sum_{i}^{N}(x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i}^{N}(x_i - \mu_x)^2}\sqrt{\sum_{i}^{N}(y_i - \mu_y)^2}} \tag{7}$$
In order to extract the above features from each frequency band, the value of i was set to 0, 64 or 128, and the value of N was always 64 larger than i.
The last feature was related to the whole set of data. The measured electric potential carries a positive or negative sign, so simply adding the values together could cancel the effect of some electrodes and lead to inaccurate features. Therefore, the group used the sum of the absolute values of all data, calculated as in formula (8).

$$\text{abs\_sum} = \sum_{z=0}^{1343} |x_z| \tag{8}$$
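The complete 19-feature computation can be sketched as follows, assuming the per-set input is the raw 7 × 192 array; scipy's kurtosis and skew defaults match formulas (4) and (5), and the lag-1 autocorrelation follows formula (6). This is an illustrative reconstruction, not the original code.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def extract_features(sample):
    """Compute the 19 features of formulas (2)-(8) for one 1344-value set."""
    mean_windows = sample.reshape(7, 192).mean(axis=0)          # formula (1)
    bands = [mean_windows[i:i + 64] for i in (0, 64, 128)]      # theta, alpha, beta

    feats = []
    for x in bands:
        mu = x.mean()
        feats.append(mu)                    # (2) mean
        feats.append(x.var())               # (3) variance
        feats.append(kurtosis(x))           # (4) peak (excess kurtosis)
        feats.append(skew(x))               # (5) skewness
        # (6) lag-1 autocorrelation within the band
        feats.append(((x[:-1] - mu) * (x[1:] - mu)).sum() / ((x - mu) ** 2).sum())
    # (7) pairwise cross-correlations of the three bands
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        feats.append(np.corrcoef(bands[a], bands[b])[0, 1])
    feats.append(np.abs(sample).sum())      # (8) abs_sum over the whole set
    return np.array(feats)                  # 19 features in total
```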
Data pre-processing
There were three pre-processing algorithms: StandScaler, MinMaxScaler and Normalizer. They rescale the obtained values to comparable ranges so that the differences between values are reduced. The following formulas show the methods of these three algorithms.
$$\text{StandScaler}: \quad x' = \frac{x_i - \mu}{s} \tag{9}$$

$$\text{MinMaxScaler}: \quad x' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{10}$$

$$\text{Normalizer}: \quad \|x\|_p = \left(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\right)^{1/p} \tag{11}$$
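In scikit-learn terms, the three options can be sketched as below; the document's "StandScaler" corresponds to scikit-learn's StandardScaler, and the feature matrix here is a random placeholder.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer

X = np.random.rand(2670, 19)  # placeholder for the real (n_sets, 19) features

scalers = {
    "StandScaler": StandardScaler(),      # (9): zero mean, unit variance
    "MinMaxScaler": MinMaxScaler(),       # (10): rescale each feature to [0, 1]
    "Normalizer": Normalizer(norm="l2"),  # (11): unit L2 norm per sample
}
X_scaled = {name: s.fit_transform(X) for name, s in scalers.items()}
```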
Visualization
It was hard to tell the differences between the four statuses by reading the raw numeric values of these features. Therefore, for each feature, its value in each set was taken as the vertical axis and the set number as the horizontal axis, and a line chart was drawn. In this way, the feature could be visualized and becomes easier to compare between the different statuses.
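A possible rendering of this visualization step, with placeholder feature values and hypothetical group boundaries, is sketched below.

```python
import numpy as np
import matplotlib.pyplot as plt

feature = np.random.rand(2670)   # placeholder for e.g. the abs_sum feature
boundaries = [700, 1400, 2100]   # hypothetical boundaries between status groups

plt.plot(feature, linewidth=0.5)
for b in boundaries:
    plt.axvline(b, color="red")  # red lines segmenting the four statuses
plt.xlabel("set number")
plt.ylabel("feature value")
plt.show()
```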
Classifier
Bayes
Naive Bayes is one of the classifiers that make predictions by estimating probabilities. It is based on Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions shown in the dataset. One advantage of the naive Bayes classifier is that it does not need a huge amount of data for classification, while it still works well in some complex situations.
The principle of Bayes' theorem and the naive Bayes classifier is as follows.

$$P(AB) = P(A|B)P(B) = P(B|A)P(A) \tag{12}$$

As a result, given training data,

$$P(B|A) = \frac{P(A|B)P(B)}{P(A)} \tag{13}$$

For example, if there are three features in our data,

$$P(B|A) \propto P(X_1|B)P(X_2|B)P(X_3|B)P(B) \tag{14}$$

where X is one of the features, A is the combination of all the features and B is one of the tags wanted.
Note that, in the naive Bayes classifier, each feature is assumed to contribute independently to the probability of each tag.
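A minimal sketch of a Gaussian naive Bayes baseline under the document's 3/7 test/train split, with placeholder data standing in for the pre-processed features:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

X = np.random.rand(2670, 19)       # placeholder feature matrix
y = np.random.randint(1, 5, 2670)  # placeholder tags 1-4

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
nb = GaussianNB().fit(X_train, y_train)
pred = nb.predict(X_test)
print(accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
```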
KNN
K-nearest neighbor (KNN) is a basic classification and regression method. Its basic idea is: given a test case, find the nearest K instance points in the training set based on some distance measure, and then predict based on the information of those k nearest neighbors.
The distance measure, the selection of the K value and the classification decision rule are the three basic elements of the k-nearest neighbor method. According to the selected distance measure, such as Manhattan distance or Euclidean distance, the distance between the test case and each instance point in the training set can be calculated, and the K nearest neighbors selected according to the K value. Finally, the test cases are classified according to the classification decision rule.
1) Distance: The distance between two instance points in feature space reflects their similarity. The feature space of KNN is generally the n-dimensional real vector space $R^n$. The distance used is the Euclidean distance, but it can also be another distance, such as the more general $L_p$ distance or Minkowski distance. Let the feature space $X$ be the n-dimensional real vector space $R^n$, with $x_i, x_j \in X$ and $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})^T$; the $L_p$ distance of $x_i, x_j$ is defined as

$$L_p(x_i, x_j) = \left(\sum_{l=1}^{n} \left|x_i^{(l)} - x_j^{(l)}\right|^p\right)^{1/p} \tag{15}$$

When $p = \infty$, it is the maximum distance over the coordinates:

$$L_\infty(x_i, x_j) = \max_{l} \left|x_i^{(l)} - x_j^{(l)}\right| \tag{16}$$
2) Selection of K Value: The choice of the K value has a significant impact on the results of KNN. In applications, the K value usually takes a relatively small value, and cross-validation is usually used to select the optimal K value, as in the sketch after this list.
3) Classification Decision Rules: The classification decision rule in KNN is usually majority voting, that is, the class of the input instance is determined by the majority class among its k nearest training instances.
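The K search might look like this, trying K = 1..30 and keeping the value with the best 5-fold cross-validated accuracy; placeholder data stands in for the real feature matrix.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(2670, 19)       # placeholder feature matrix
y = np.random.randint(1, 5, 2670)  # placeholder tags 1-4

scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 31)]
best_k = int(np.argmax(scores)) + 1
print(best_k, max(scores))
```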
SVM
As an algorithm which has been widely used in the biological field, support-vector machines (SVMs, also support-vector networks) perform well for data classification. The principle of SVM is to separate categories by a clear gap, that is, to find a hyperplane in an N-dimensional space such that the distance from it to the nearest data point on each side is maximized. For example, if N=2 the hyperplane is just a line, and if N=3 the hyperplane is a plane that separates the features.
To identify the right hyperplane, there are some tuning parameters in SVM:
1) Kernel: It takes a low-dimensional input space and transforms it into a higher-dimensional space when a non-linear separation is needed. There are choices such as "linear", "rbf", "poly" and others.
2) Regularization: Often known as C, it defines how much misclassification of each training example should be avoided. As the value of C is increased, the width of the margin shrinks and the hyperplane tolerates fewer misclassified training points.
3) Gamma: It defines how far the influence of a single training example reaches.
In this experiment, a support-vector machine is employed to classify the four degrees of one's attention. Given that there are several features in our research, different kernel functions are taken into account to realize the conversion from a two-dimensional space to a multi-dimensional space. They are:
1) Polynomial kernel: For degree-d polynomials, the polynomial kernel is defined as

$$K(x, y) = (x^T y + c)^d, \quad c > 0 \tag{17}$$

where x and y are vectors in the input space.
2) Radial basis function kernel:

$$K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) \tag{18}$$

where σ is a free parameter and $\|x - x'\|^2$ is the squared Euclidean distance between the two feature vectors.
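A brief sketch of an RBF-kernel SVM on placeholder data; C=115 echoes the tuned value reported in Table 3, while gamma is left at scikit-learn's default.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X = np.random.rand(2670, 19)       # placeholder feature matrix
y = np.random.randint(1, 5, 2670)  # placeholder tags 1-4

svm = SVC(kernel="rbf", C=115, gamma="scale")  # RBF kernel of formula (18)
print(cross_val_score(svm, X, y, cv=5).mean())
```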
XGBoost
Extreme gradient boosting (XGBoost) is one of the most powerful approaches for most regression and classification problems. In this algorithm, many classifiers, each performing poorly on its own, are integrated to form a strong classifier. The algorithms are built as tree models, and XGBoost combines those tree models into a boosted tree model based on the CART regression tree. The objective function of XGBoost consists of two parts: the training loss and the complexity of the trees.
$$obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{19}$$

where $y_i$ is the corresponding tag of training sample i and $\hat{y}_i$ is its predicted value. The complexity term exists to prevent over-fitting, and $f_k$ is the k-th tree model.

$$\Omega(w) = \lambda\|w\|^2 \tag{20}$$

where w is the score of the leaf nodes and λ keeps the leaf scores in a small range so that over-fitting does not happen.
Figure 2 shows the structure of one such tree model. In XGBoost, many tree models are combined to draw the final conclusion.
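A minimal sketch of the boosted-tree classifier of formula (19) on placeholder data, using the xgboost Python package (an assumption; the patent does not name its implementation):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_score

X = np.random.rand(2670, 19)
y = np.random.randint(0, 4, 2670)  # xgboost expects labels starting at 0

# Each added tree f_k reduces the training loss while the Omega term
# of formula (20) penalizes overly complex trees.
clf = xgb.XGBClassifier(objective="multi:softmax")
print(cross_val_score(clf, X, y, cv=5).mean())
```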
ANN
Artificial neural networks are computing systems inspired by biological neural networks; they are a collection of connected units called artificial neurons that imitate the human brain.
Figure 3 shows the structure of a single neuron, where X is the input, Y the output and w the weight of each input. In addition, b is the bias, which provides every node with a trainable constant value. The function f introduces nonlinearity; a linear neural network was employed in our experiment.
There are three layers in a feedforward neural network:
1) Input layer: It provides information from the outside world and passes information to the hidden layer. A feedforward neural network only has one input layer.
2) Hidden layer: It sits between the input layer and the output layer and plays a quite important role in the network. It transfers information from the input layer to the output layer, and a feedforward neural network can have zero or multiple hidden layers.
3) Output layer: It transfers information to the outside world. A feedforward neural network only has one output layer.
Figure 4 illustrates the architectural graph of the three-layer perceptron.
Since the neural network is the most widespread algorithm nowadays, different types of artificial neural networks have emerged, such as convolutional neural networks. Inspired by the way biological neural networks work in the human brain, artificial neural networks let computers analyze data and make predictions much as humans do.
However, due to time and other factors, the linear neural network was the only neural network employed in this experiment.
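For illustration, the network of Table 6 (19 inputs, one hidden layer of 30 units, 4 outputs) can be approximated with scikit-learn's MLPClassifier; the identity activation stands in for the linear network, and this is not the document's own SoftMax class.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(2670, 19)       # placeholder feature matrix
y = np.random.randint(1, 5, 2670)  # placeholder tags 1-4

# 19 -> 30 -> 4 architecture; identity activation approximates a linear net.
net = MLPClassifier(hidden_layer_sizes=(30,), activation="identity",
                    max_iter=1000)
net.fit(X, y)
print(net.score(X, y))
```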
Fine-Tuning
In order to get a better-performing classifier, the parameters in all models, such as the parameter K in KNN, the parameter C in SVM and the series of parameters in XGBoost, need to be adjusted. The best-performing parameter value differed between pre-processing algorithms. To test the Bayes, KNN and SVM algorithms, the data was first split into a test set and a training set in a 3:7 ratio and the highest accuracy was recorded. Then, for a further performance test, the group used 5-fold cross-validation and recorded the average accuracy. As for the XGBoost algorithm, since there were many parameters, one parameter was changed at a time with the rest fixed to reduce the complexity. The ANN was trained many times to improve its performance.
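The one-parameter-at-a-time tuning can be sketched with a grid search; the grid below varies only max_depth and is illustrative.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

X = np.random.rand(2670, 19)
y = np.random.randint(0, 4, 2670)

# Vary one parameter with the rest fixed, scoring by 5-fold cross-validation.
search = GridSearchCV(xgb.XGBClassifier(objective="multi:softmax"),
                      param_grid={"max_depth": [3, 4, 5, 6, 7]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```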
Results
The feature of the absolute sum of each set is visualized in Figure 5. The red lines segment the chart into four parts, which correspond to the four types of status. From the chart it can be seen that, because some extremely high values exist in the data file, the differences between samples within one type can be very large, which increases the difficulty for the computer to classify the samples.
Tables 1 to 3 show the tested accuracy results of the three basic models. In comparison, the performance of the naive Bayes algorithm was the worst: its average accuracy was about 40%. The Normalizer was the best pre-processing algorithm for Bayes, with an average accuracy of about 46%. In this case the classifier could easily distinguish state 1 and state 2 but confused the other two types.
Table 1: Accuracy of Naive Bayes

| Pre-processing Algorithm | Accuracy of 3/7 division | 5-fold Cross Validation | Average Accuracy | Confusion Matrix |
| --- | --- | --- | --- | --- |
| StandScaler | 40.82% | [0.63364486, 0.2953271, 0.36891386, 0.45778612, 0.2945591] | 41.00% | [[116, 14, 85, 5], [28, 36, 160, 6], [15, 0, 166, 3], [25, 4, 129, 9]] |
| MinMaxScaler | 45.69% | [0.63364486, 0.2953271, 0.36891386, 0.45778612, 0.2945591] | 41.00% | [[106, 16, 106, 7], [7, 66, 121, 13], [12, 2, 182, 1], [20, 15, 115, 12]] |
| Normalizer | 51.94% | [0.52897196, 0.34953271, 0.37640449, 0.59662289, 0.4727955] | 46.49% | [[123, 35, 32, 34], [6, 148, 9, 46], [21, 71, 77, 26], [15, 71, 19, 68]] |
To find the highest accuracy, the program traversed the value of K from 1 to 30 in the KNN algorithm and drew the line chart. Figure 6 shows how the accuracy varies with the parameter K; the three charts from left to right correspond to the StandScaler, MinMaxScaler and Normalizer pre-processing algorithms respectively.
According to Table 2, in the KNN algorithm the best accuracy reached 70% with the 3:7 test/train split. The best average accuracy was 57.78%, obtained with the StandScaler pre-processing algorithm and parameter K=13.
Table 2: Accuracy of KNN

| Pre-processing Algorithm | Accuracy of 3/7 division | 5-fold Cross Validation | Average Accuracy | Confusion Matrix |
| --- | --- | --- | --- | --- |
| StandScaler (K=13) | 70.29% | [0.39065421, 0.74953271, 0.75842697, 0.65666041, 0.33395872] | 57.78% | [[169, 25, 18, 13], [20, 154, 23, 15], [25, 25, 129, 21], [13, 19, 21, 111]] |
| MinMaxScaler (K=15) | 69.41% | [0.39626168, 0.68598131, 0.71348315, 0.61350844, 0.32270169] | 54.64% | [[173, 12, 22, 12], [15, 150, 32, 22], [26, 32, 134, 22], [10, 15, 25, 99]] |
| Normalizer (K=24) | 56.68% | [0.45046729, 0.4317757, 0.47191011, 0.50093809, 0.32645403] | 43.63% | [[151, 50, 25, 16], [9, 157, 19, 23], [28, 60, 79, 30], [29, 33, 25, 67]] |
Figure 7 shows the variation of accuracy as the parameter C in the SVM algorithm increases. The two charts, from left to right, correspond to the StandScaler and Normalizer pre-processing algorithms respectively.
Among the three basic models, the performance of the SVM algorithm was the best. The accuracy was about 75% with the 3:7 split, and the average accuracy was 62.66% when using StandScaler.
Table 3: Accuracy of SVM

| Pre-processing Algorithm | Accuracy of 3/7 division | 5-fold Cross Validation | Average Accuracy | Confusion Matrix |
| --- | --- | --- | --- | --- |
| StandScaler (C=115) | 75.41% | [0.44299065, 0.77943925, 0.79026217, 0.73733583, 0.38273921] | 62.66% | [[189, 26, 10, 7], [10, 184, 4, 7], [20, 39, 121, 13], [12, 27, 22, 110]] |
| Normalizer (C=65) | 53.56% | [0.48598131, 0.4317757, 0.38389513, 0.53470919, 0.43714822] | 45.47% | [[168, 40, 12, 14], [9, 177, 8, 20], [33, 77, 44, 33], [27, 68, 31, 40]] |
None of the basic models performed well enough as a classifier on its own. However, once they were combined using the XGBoost algorithm, the accuracy improved markedly. Table 4 shows the test results of the XGBoost algorithm. The average accuracy was between 75% and 81% depending on the data pre-processing algorithm. As with KNN and SVM, StandScaler had the best performance in XGBoost.
Table 4: Accuracy of XGBoost

| Pre-processing Algorithm | 5-fold Cross Validation | Confusion Matrix |
| --- | --- | --- |
| StandScaler | 81.02% | [[189, 21, 11, 2], [10, 166, 17, 12], [16, 25, 155, 17], [8, 2, 11, 139]] |
| MinMaxScaler | 80.77% | [[189, 19, 11, 4], [7, 169, 16, 13], [21, 23, 153, 16], [9, 4, 11, 136]] |
| Normalizer | 75.91% | [[182, 26, 8, 7], [5, 165, 18, 17], [17, 30, 139, 27], [7, 15, 16, 122]] |
Table 5 shows the values of the parameters in XGBoost at its best performance.
Table 5: Values of parameters in XGBoost

| Parameter | Value |
| --- | --- |
| booster | gbtree |
| objective | multi:softmax |
| max_depth | 5 |
| min_child_weight | 2 |
| gamma | 0.25 |
| lambda | 2 |
| subsample | 0.8 |
| colsample_bytree | 0.8 |
| n_estimators | 250 |
| learning_rate | 0.1 |
| seed | 1000 |
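Expressed through xgboost's scikit-learn wrapper (an assumed but conventional mapping: Table 5's "lambda" becomes reg_lambda and "seed" becomes random_state), the tuned configuration would be:

```python
import xgboost as xgb

# Tuned configuration of Table 5.
clf = xgb.XGBClassifier(
    booster="gbtree",
    objective="multi:softmax",
    max_depth=5,
    min_child_weight=2,
    gamma=0.25,
    reg_lambda=2,        # "lambda" in Table 5
    subsample=0.8,
    colsample_bytree=0.8,
    n_estimators=250,
    learning_rate=0.1,
    random_state=1000,   # "seed" in Table 5
)
```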
Table 6 shows the accuracy of the ANN algorithm. The accuracy depends on the training time: as the training time increases, the accuracy on both the training set and the test set increases. The accuracy on the test set reached 74% at a training time of 1000. However, the group did not continue the experiment with longer training times, because training would take a long time and there is a risk of over-fitting after thousands of iterations.
Table 6: Accuracy of ANN

| Network Parameter | Train time | loss_data | Train set | Test set | Accuracy of train set | Accuracy of test set |
| --- | --- | --- | --- | --- | --- | --- |
| net = SoftMax(n_feature=19, n_hidden1=30, n_out=4) | 100 | 95.44839713 | 2170 | 500 | 71.20% | 71.34% |
| net = SoftMax(n_feature=19, n_hidden1=30, n_out=4) | 500 | 95.44839713 | 2170 | 500 | 76.00% | 72.00% |
| net = SoftMax(n_feature=19, n_hidden1=30, n_out=4) | 1000 | 95.44839713 | 2170 | 500 | 88.85% | 74.00% |
In conclusion, the XGBoost algorithm integrated the advantages of the other machine learning methods and obtained the highest accuracy as a classifier. As for the deep learning method ANN, it could possibly achieve better accuracy than XGBoost, but that would require plenty of training time. Therefore, XGBoost with the StandScaler data pre-processing method is the better classification algorithm and is suitable for attention monitoring to identify mental states quickly and accurately.

Claims (1)

1. A method to classify mental states from an EEG signal using the XGBoost algorithm, wherein the StandScaler data pre-processing method and the XGBoost algorithm, which combines Bayes, KNN and SVM, are used to identify the mental status; it can be used in education to obtain the concentration status of students in real time.
AU2019101151A 2019-09-30 2019-09-30 Classify Mental States from EEG Signal Using Xgboost Algorithm Ceased AU2019101151A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2019101151A AU2019101151A4 (en) 2019-09-30 2019-09-30 Classify Mental States from EEG Signal Using Xgboost Algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2019101151A AU2019101151A4 (en) 2019-09-30 2019-09-30 Classify Mental States from EEG Signal Using Xgboost Algorithm

Publications (1)

Publication Number Publication Date
AU2019101151A4 true AU2019101151A4 (en) 2020-01-23

Family

ID=69160473

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2019101151A Ceased AU2019101151A4 (en) 2019-09-30 2019-09-30 Classify Mental States from EEG Signal Using Xgboost Algorithm

Country Status (1)

Country Link
AU (1) AU2019101151A4 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111020028A (en) * 2020-02-21 2020-04-17 天津医科大学 Method for positioning and judging urinary tract infection part based on pilus antigen gene distribution
CN111603135A (en) * 2020-05-11 2020-09-01 江南大学 Low-power-consumption epilepsy detection circuit based on master-slave support vector machine
CN111603135B (en) * 2020-05-11 2021-09-28 江南大学 Low-power-consumption epilepsy detection circuit based on master-slave support vector machine
CN111671421A (en) * 2020-06-24 2020-09-18 安徽智趣小天使信息科技有限公司 Electroencephalogram-based children demand sensing method
CN111671421B (en) * 2020-06-24 2023-06-27 安徽智趣小天使信息科技有限公司 Electroencephalogram-based children demand sensing method
WO2022160842A1 (en) * 2021-01-26 2022-08-04 华中师范大学 Student collaboration state assessment method and system based on electroencephalogram data
CN113610068A (en) * 2021-10-11 2021-11-05 江西风向标教育科技有限公司 Test question disassembling method, system, storage medium and equipment based on test paper image
CN114129165A (en) * 2021-12-10 2022-03-04 北京邮电大学 Psychological assessment method, system and storage medium based on credible assessment scale
CN117633615A (en) * 2023-11-21 2024-03-01 国网江苏省电力有限公司南京供电分公司 Identification method, device, equipment and storage medium for power regulation sensitive data

Similar Documents

Publication Publication Date Title
AU2019101151A4 (en) Classify Mental States from EEG Signal Using Xgboost Algorithm
Tao et al. EEG-based emotion recognition via channel-wise attention and self attention
Asif et al. SeizureNet: Multi-spectral deep feature learning for seizure type classification
Kabir et al. Epileptic seizure detection from EEG signals using logistic model trees
Yang et al. A recurrence quantification analysis-based channel-frequency convolutional neural network for emotion recognition from EEG
Kundu et al. P300 detection with brain–computer interface application using PCA and ensemble of weighted SVMs
Siuly et al. Exploring sampling in the detection of multicategory EEG signals
Chaurasiya et al. Binary DE-based channel selection and weighted ensemble of SVM classification for novel brain–computer interface using Devanagari script-based P300 speller paradigm
Satapathy et al. Weighted majority voting based ensemble of classifiers using different machine learning techniques for classification of eeg signal to detect epileptic seizure.
Joshi et al. Deep BiLSTM neural network model for emotion detection using cross-dataset approach
Ramzan et al. Fused CNN-LSTM deep learning emotion recognition model using electroencephalography signals
Jaiswal et al. Epileptic seizure detection in EEG signal with GModPCA and support vector machine
Asghar et al. AI inspired EEG-based spatial feature selection method using multivariate empirical mode decomposition for emotion classification
Taqi et al. Classification and discrimination of focal and non-focal EEG signals based on deep neural network
Samavat et al. Deep learning model with adaptive regularization for EEG-based emotion recognition using temporal and frequency features
Agarwal et al. Classification of alcoholic and non-alcoholic EEG signals based on sliding-SSA and independent component analysis
Sameer et al. CNN based framework for detection of epileptic seizures
Padfield et al. Sparse learning of band power features with genetic channel selection for effective classification of EEG signals
Hariharan et al. A new feature constituting approach to detection of vocal fold pathology
Calvo et al. Measuring concept semantic relatedness through common spatial pattern feature extraction on EEG signals
Wu et al. A new subject-specific discriminative and multi-scale filter bank tangent space mapping method for recognition of multiclass motor imagery
Ramakrishnan et al. Epileptic eeg signal classification using multi-class convolutional neural network
Dagdevir et al. Determination of effective signal processing stages for brain computer interface on BCI competition IV data set 2b: a review study
Immanuel et al. Analysis of EEG signal with feature and feature extraction techniques for emotion recognition using deep learning techniques
Salimpour et al. Stockwell transform and semi-supervised feature selection from deep features for classification of BCI signals

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry