CN112200016A - Electroencephalogram signal emotion recognition based on ensemble learning method AdaBoost - Google Patents

Electroencephalogram signal emotion recognition based on ensemble learning method AdaBoost

Info

Publication number
CN112200016A
Authority
CN
China
Prior art keywords
data
classifier
emotion
features
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010977310.4A
Other languages
Chinese (zh)
Inventor
陈宇 (Chen Yu)
常锐 (Chang Rui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Forestry University
Original Assignee
Northeast Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Forestry University
Priority to CN202010977310.4A
Publication of CN112200016A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention relates to an electroencephalogram (EEG) signal emotion recognition method based on the ensemble learning method AdaBoost, comprising the following steps: first, the DEAP data set (already down-sampled to 128 Hz and artifact-removed) is imported, the last 60 s of data of the first 32 channels are taken from each of the 32 data files, and the data and 0/1 labels are extracted; then feature extraction and feature selection are performed, extracting emotion-related time-domain, frequency-domain and nonlinear features from the EEG data; next, binary emotion classification is performed on 4 dimensions (Valence, Arousal, Dominance, Liking): the extracted feature data are divided into a training set and a test set, the training set is used to train the AdaBoost classifier, and the classification effect is verified on the test set by 5-fold cross-validation. In addition, comparison experiments with a Random Forest classifier and an XGBoost classifier show that the method of the invention performs best and markedly improves emotion recognition accuracy.

Description

Electroencephalogram signal emotion recognition based on ensemble learning method AdaBoost
Technical field:
The invention relates to the fields of machine learning and emotion recognition, in particular to electroencephalogram (EEG) signal emotion recognition based on the ensemble learning method AdaBoost.
Background art:
Emotions are well known to play an important role in people's daily learning and life, simplifying and enlivening interpersonal communication. Emotion can be expressed through speech intonation and facial expression, which are easily perceived by others, and also through physiological changes of the nervous system, which are not. Because speech intonation and facial expression can be faked, however, they are not reliable indices for judging emotion, whereas physiological signals are more accurate. Therefore, with the development of artificial intelligence technology, emotion recognition based on physiological signals, especially electroencephalogram (EEG) signals, is becoming a popular research topic and attracting much attention.
However, the EEG signal is a chaotic time series: the rich emotional information stored in it is not evident but is expressed as values that change over time, and these values also contain information irrelevant to emotion as well as noise that disturbs emotion recognition. If the raw EEG data were fed directly into a classifier, the classifier could hardly recognize anything, which would severely impair the emotion recognition effect and strip the research of practical significance.
Therefore, the information contained in the EEG signal must be mined deeply: on the basis of filtering out noise and irrelevant information, effective information is extracted and emotion recognition is then performed on it. This scheme appears in a large number of EEG-related studies, namely: extract features from the preprocessed EEG signal, then feed the emotion-related feature data into a classifier for emotion recognition, improving the recognition accuracy to a certain extent. Although the feature extraction and classification methods attempted vary, most of these schemes share a common problem: the extracted features are of a single category, generally only time-domain or frequency-domain features are considered, so the emotional feature information can hardly be reflected comprehensively, and the classification methods used are mostly confined to traditional machine learning (e.g. KNN, SVM, ANN), which limits the classification performance.
Summary of the invention:
The invention aims to overcome the shortcomings of existing methods and provides an electroencephalogram signal emotion recognition method based on the ensemble learning method AdaBoost. The method uses the emotion recognition benchmark data set DEAP (Python version) and, after feature extraction, performs binary emotion classification on 4 emotion dimensions (Valence, Arousal, Dominance, Liking) with an AdaBoost classifier. In the feature extraction link in particular, three classes of features (time-domain, time-frequency-domain and nonlinear) are extracted and the feature dimension is reduced by a supervised feature selection method, addressing the problems of existing EEG emotion recognition methods that the extracted features reflect emotion-related information incompletely and that classifier performance is limited. The method comprises the following steps:
step 1: reading in the preprocessed data set, and determining a data range to be used;
step 2: performing feature extraction and feature selection on the electroencephalogram signal data, and extracting features related to emotion;
step 3: send the feature data obtained in step 2 into an AdaBoost classifier for training, and test the classifier's performance through the main experiment and the comparison experiments.
The implementation of step 1 comprises:
step 1.1: read the DEAP data set source files downloaded from the official website (the source files are preprocessed to 128 Hz) and import the last 60 s of data of the first 32 channels of the 32 .dat files (removing the first 3 s of emotion-irrelevant baseline signal data);
step 1.2: store the data part in the features_raw.csv file and the labels of the 4 emotion dimensions in the labels0-3.dat files, where a rating score > 5.0 is labeled 1 and 0 otherwise, preparing for the subsequent feature extraction, feature selection and binary classification.
The implementation of step 2 comprises:
step 2.1: feature extraction: in order to capture as much of the detail information in the electroencephalogram signal as possible, several classes of features are extracted simultaneously. Time-domain, time-frequency-domain and nonlinear features are extracted in turn from the last 60 s of data of all 32 channels. Step 2.1 comprises the following sub-steps:
step 2.1.1: time-domain feature extraction, covering the Mean, standard deviation (Std), Range, Skewness and Kurtosis together with the three Hjorth parameters Activity, Mobility and Complexity, i.e. 8 statistical features;
step 2.1.2: time-frequency-domain feature extraction: based on a wavelet packet decomposition with the db4 mother wavelet, 4 emotion-related frequency bands (Theta, Alpha, Beta, Gamma) are extracted, and Wavelet Energy and Wavelet Entropy are computed on each band, giving 2 × 4 = 8 features;
step 2.1.3: nonlinear feature extraction, comprising Power Spectral Density (PSD) and Differential Entropy (DE), 2 features in total.
step 2.2: feature combination: step 2.1 extracts 18 feature values per channel of each subject's video data, so each trial contains 32 × 18 = 576 feature values; these are spliced into a (32 × 40) × 576 = 1280 × 576 feature value matrix and stored in a train.csv file.
step 2.3: feature selection: the supervised feature selection method Linear Discriminant Analysis (LDA) is adopted to reduce the dimensionality of the extracted features, further improving the emotion-relevance of the retained features and enhancing the robustness of the classifier.
The implementation of step 3 comprises:
step 3.1: read the processed feature data from the train.csv file, and randomly divide it into a training set and a test set at a 4:1 ratio (80% training set, 20% test set);
step 3.2: classifier training: feed the training set into the AdaBoost ensemble learning classifier for binary classification training, continuously tuning the classifier parameters to determine the optimal parameter combination and strengthen the classifier's learning ability;
step 3.3: classifier performance testing: step 3.3 comprises the following sub-steps:
step 3.3.1: feed the test set into the classifier trained in step 3.2 and perform binary classification tests on the 4 emotion dimensions (Valence, Arousal, Dominance, Liking) based on 5-fold cross-validation;
step 3.3.2: run comparison experiments with other classifiers, here Random Forest and XGBoost;
step 3.3.3: for the main and comparison experiments, consider the following 4 performance indexes: accuracy, precision, recall and F1-Score, and plot the confusion matrix and result curves to evaluate classifier performance from multiple angles.
The invention has the following beneficial effects. Considering that in existing EEG emotion recognition methods the extracted features are of a single type and can hardly reflect emotional information comprehensively, and that the classification methods used are mostly confined to traditional machine learning so that classifier performance is limited, the invention starts from the two key links of feature extraction and classification. First, in feature extraction, multiple characteristics of the EEG signal are considered comprehensively: taking each channel of each subject's video file as a unit, 8 time-domain, 8 time-frequency-domain and 2 nonlinear features are extracted and combined into feature vectors, all feature vectors are assembled into a feature matrix, and a supervised feature selection method then reduces the feature dimension and strengthens the correlation between the extracted features and emotion. Second, in classification, the ensemble learning method AdaBoost performs binary EEG classification on the 4 emotion dimensions, comparison experiments use Random Forest and XGBoost classifiers, and 5-fold cross-validation together with 4 separate performance indexes, a confusion matrix and result curves evaluates the classifiers from multiple angles. The invention makes a clear advance on the existing problems, and classifier performance is effectively improved.
Description of the drawings:
FIG. 1 is a flowchart of an electroencephalogram signal emotion recognition method based on an ensemble learning method AdaBoost.
Fig. 2 is a diagram of electrode names and electrode locations corresponding to different brain regions.
FIG. 3 is a diagram showing the distribution of positive and negative examples in 4 emotion dimensions in the data Labels used in step 1.
Fig. 4 is a diagram showing the original signal and the decomposed signals A4 and D4 to D1, taking channel Fp1 of subject 1 as an example.
Fig. 5 is a schematic diagram of the algorithm idea of the feature selection method LDA.
Fig. 6 is a working schematic diagram of the ensemble learning method AdaBoost.
Fig. 7 is a schematic diagram of a confusion matrix.
FIG. 8 is a classification result confusion matrix for the AdaBoost classifier in 4 emotion dimensions.
FIG. 9 shows the ACC values of the 3 classifiers on the 4 emotion dimensions under 5-fold cross-validation.
FIG. 10 shows the F1-Score values of the 3 classifiers on the 4 emotion dimensions under 5-fold cross-validation.
Detailed description of embodiments:
the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of a specific process for carrying out the present invention, which mainly comprises the following three steps:
1. Preprocessing:
The invention uses the emotion recognition benchmark data set DEAP (Python version), which collected the physiological signals and corresponding emotion ratings of 32 volunteers; each volunteer watched 40 music videos carrying different emotions while their physiological signals were recorded into data files s01.dat-s32.dat. During recording, 40 signal channels were captured in total (the first 32 are EEG leads, the last 8 are peripheral physiological signals including electrooculogram, electromyogram, respiration and others) at a sampling frequency of 512 Hz, which a series of filtering operations then reduced to 128 Hz. Each data file contains the following two matrices:
(1) data matrix: 40 × 40 × 8064, where the first 40 is the number of videos, the second 40 the number of acquisition channels, and 8064 = 63 × 128 the samples of 63 seconds of experimental data per video and channel; the first 3 seconds are baseline data recorded before each trial, and the remaining 60 seconds are data recorded during the trial;
(2) labels matrix: 40 × 4, the 4 columns being the ratings, on a scale of 1-9, of the 4 emotion dimensions Valence, Arousal, Dominance and Liking.
Since the object of study of the invention is the EEG signal, this step takes from the 32 .dat files the last 60 s of data of the first 32 channels (discarding the last 8 peripheral channels and the first 3 s of emotion-irrelevant baseline data), stores the data part in the features_raw.csv file and the labels of the 4 emotion dimensions in the labels0-3.dat files, where a rating > 5.0 is labeled 1 and 0 otherwise (the distribution of positive and negative examples on the 4 emotion dimensions is shown in FIG. 3), preparing for the subsequent feature extraction, feature selection and binary classification.
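As a concrete illustration of this step, a minimal Python sketch is given below. It assumes the standard pickled format of the DEAP Python version (each sXX.dat file is a dictionary with a 'data' matrix of 40 × 40 × 8064 and a 'labels' matrix of 40 × 4); the file paths are illustrative and the writing of features_raw.csv / labels0-3.dat is omitted. This is a sketch under those assumptions, not the exact code of the invention.

```python
import pickle
import numpy as np

FS = 128                 # sampling rate of the preprocessed DEAP data
BASELINE = 3 * FS        # first 3 s of baseline samples to discard

all_data, all_labels = [], []
for subject in range(1, 33):
    with open(f"s{subject:02d}.dat", "rb") as f:
        record = pickle.load(f, encoding="latin1")
    # 40 videos x first 32 EEG channels x last 60 s of samples
    all_data.append(record["data"][:, :32, BASELINE:])
    # binarize the 1-9 ratings: score > 5.0 -> 1, otherwise 0
    all_labels.append((record["labels"] > 5.0).astype(int))

data = np.concatenate(all_data)      # shape (1280, 32, 7680)
labels = np.concatenate(all_labels)  # shape (1280, 4): Valence, Arousal, Dominance, Liking
```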
2. Feature extraction and feature selection:
As a chaotic time series, the EEG signal carries rich emotional feature information, and the key of this link is how to extract the emotion-related feature information from it. Related studies show that EEG features mainly comprise time-domain, frequency-domain, time-frequency-domain and nonlinear dynamics features, the latter two being the more relevant to emotion. Therefore, to better capture the detail information in the EEG signal, 3 classes of features are extracted: time-domain, time-frequency-domain and nonlinear dynamics features.
(1) Time domain characteristics:
① Mean: $\mathrm{Mean}(x) = \frac{1}{N}\sum_{i=1}^{N} x_i$

② Standard deviation (Std): $\mathrm{Std}(x) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mathrm{Mean}(x)\right)^{2}}$

③ Range: $\mathrm{Range}(x) = \max(x) - \min(x)$

④ Skewness: $\mathrm{Skewness}(x) = \mathrm{Mean}\left(\left(\frac{x - \mathrm{Mean}(x)}{\mathrm{Std}(x)}\right)^{3}\right)$

⑤ Kurtosis: $\mathrm{Kurtosis}(x) = \mathrm{Mean}\left(\left(\frac{x - \mathrm{Mean}(x)}{\mathrm{Std}(x)}\right)^{4}\right)$

⑥ Hjorth parameter Activity: $\mathrm{Activity}(x) = \mathrm{Std}(x)^{2}$

⑦ Hjorth parameter Mobility: $\mathrm{Mobility}(x) = \sqrt{\frac{\mathrm{Activity}(x')}{\mathrm{Activity}(x)}}$

⑧ Hjorth parameter Complexity: $\mathrm{Complexity}(x) = \frac{\mathrm{Mobility}(x')}{\mathrm{Mobility}(x)}$

where $x = (x_1, \ldots, x_N)$ is the signal of one channel and $x'$ is its first-order difference.
wherein: skewness (Skewness) is the standard third-order central moment of a sample, and more emphasis is placed on describing the symmetry of overall value distribution; the Kurtosis (Kurtosis) is a standard fourth-order central moment of the sample, and the steepness of the overall all-value distribution form is described more emphatically, so that the data distribution form can be better described by combining the Kurtosis and the steepness; the Hjorth parameter provides a method that can quickly compute three important features of the signal in the time domain: mobility, and Complexity. The method is widely applied to the field of physiological signal processing.
(2) Time-frequency domain characteristics:
Besides the time domain, time-frequency-domain features are another important feature class of EEG signals. Studies show that wavelet packet decomposition provides effective multi-resolution analysis of non-stationary signals and overcomes the limitation of the plain wavelet transform, which at each level decomposes only the low-frequency sub-band and cannot extract the high-frequency sub-bands at the same high resolution; among the mother wavelets, db4 has good smoothness and detects the variations of EEG signals well, so the invention adopts a 4-level wavelet packet decomposition based on the db4 mother wavelet. Concretely, the EEG signal is decomposed into the 4-level detail signals D4-D1 and the level-4 approximation signal A4, whose values are the wavelet coefficients of each level; at 128 Hz these correspond to the frequency bands Theta (4-8 Hz, D4), Alpha (8-16 Hz, D3), Beta (16-32 Hz, D2) and Gamma (32-64 Hz, D1). Taking channel Fp1 of subject 1 as an example, the original signal and the decomposed signals A4, D4-D1 are shown in FIG. 4. Then two features, wavelet energy and wavelet entropy, are extracted from the wavelet coefficients of each frequency band.
① Wavelet Energy: $E_j = \sum_{k} c_{j,k}^{2}$

② Wavelet Entropy: $\mathrm{WE}_j = -\sum_{k} p_{j,k}\ln p_{j,k}$, with $p_{j,k} = \frac{c_{j,k}^{2}}{E_j}$

where $c_{j,k}$ is the $k$-th wavelet coefficient of frequency band $j$.
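As an illustration, the sketch below realizes the 4-level db4 decomposition with PyWavelets and the per-band energy/entropy definitions above. Using pywt.wavedec is our assumption about the implementation: it yields exactly the A4, D4-D1 sub-band structure described in the text, even though the text names the technique wavelet packet decomposition.

```python
import numpy as np
import pywt

def wavelet_features(x):
    """Wavelet energy and entropy on the Theta/Alpha/Beta/Gamma bands."""
    # coeffs = [A4, D4, D3, D2, D1]; at fs = 128 Hz this maps to
    # D4 = Theta (4-8 Hz), D3 = Alpha (8-16), D2 = Beta (16-32), D1 = Gamma (32-64)
    coeffs = pywt.wavedec(x, "db4", level=4)
    feats = []
    for c in coeffs[1:]:                  # keep D4..D1, drop the approximation A4
        energy = np.sum(c ** 2)           # wavelet energy E_j
        p = c ** 2 / energy               # energy distribution of the coefficients
        entropy = -np.sum(p * np.log(p + 1e-12))  # wavelet entropy WE_j
        feats += [energy, entropy]
    return feats                          # 2 features x 4 bands = 8
```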
(3) nonlinear kinetic characteristics:
Since the human brain is a typical nonlinear dynamical system, the emotion-related information reflected by the nonlinear dynamics features of the EEG signal is also highly representative. The invention computes the classical Power Spectral Density (PSD) and Differential Entropy (DE) features on the 60 s data of each channel using a short-time Fourier transform with non-overlapping Hamming windows. Differential entropy is defined for continuous random variables, and its calculation formula can be expressed as:
$h(X) = -\int_{-\infty}^{+\infty} f(x)\ln f(x)\,\mathrm{d}x = \frac{1}{2}\ln\left(2\pi e\sigma^{2}\right)$

wherein $x$ obeys the Gaussian distribution $N(\mu, \sigma^{2})$ and $f(x)$ is its probability density function. Research shows that, for an EEG sequence of fixed length on a given frequency band, the differential entropy equals the logarithm of its power spectral density.
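A sketch of the two nonlinear features follows; scipy.signal.welch with noverlap=0 stands in for the non-overlapping Hamming-window short-time Fourier transform, and DE uses the Gaussian closed form above rather than a spectral estimate. The exact windowing parameters are not specified by the invention, so ours are assumptions.

```python
import numpy as np
from scipy.signal import welch

def nonlinear_features(x, fs=128):
    """Mean PSD and differential entropy of one channel's 60 s segment."""
    # non-overlapping 1 s Hamming windows, averaged over the segment
    _, pxx = welch(x, fs=fs, window="hamming", nperseg=fs, noverlap=0)
    psd = np.mean(pxx)
    # DE of a Gaussian signal: 0.5 * ln(2*pi*e*sigma^2)
    de = 0.5 * np.log(2 * np.pi * np.e * np.var(x))
    return psd, de
```

Per the equivalence noted above, a per-band DE could equally be obtained as the logarithm of that band's PSD.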
Through the above steps, 18 feature values are extracted per channel of each subject's video data, so each trial contains 32 × 18 = 576 feature values; these are spliced into a (32 × 40) × 576 = 1280 × 576 feature value matrix and stored in a train.csv file.
To reduce the feature dimension and further strengthen the emotion-relevance of the retained features, Linear Discriminant Analysis (LDA) is used for feature selection. LDA is a supervised dimensionality reduction technique, i.e. every sample of the data set has a class output, and its idea is: project the data onto a lower-dimensional space such that, after projection, samples of the same class lie as close together as possible while the centers of different classes lie as far apart as possible, as shown in FIG. 5. Since the invention performs a binary EEG classification task, the algorithm is realized as follows:
Let the data set be $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where each sample $x_i$ is an $n$-dimensional vector and $y_i \in \{0, 1\}$. Define $N_j$ as the number of class-$j$ samples, $X_j$ as the set of class-$j$ samples, $\mu_j$ as the mean vector of the class-$j$ samples, and $\Sigma_j$ as the covariance matrix of class $j$ (strictly speaking, a covariance matrix without the denominator), with $j = 0, 1$. Then:

$\mu_j = \frac{1}{N_j}\sum_{x \in X_j} x$

$\Sigma_j = \sum_{x \in X_j}(x - \mu_j)(x - \mu_j)^{T}$

In the binary task, the data only need to be projected onto a straight line. If the projection direction is the vector $w$, the projection of any sample $x_i$ onto the line is $w^{T}x_i$, and the centers $\mu_0, \mu_1$ of the two classes project to $w^{T}\mu_0$ and $w^{T}\mu_1$. Combining the LDA idea of "within-class data as close as possible, between-class data as far as possible", define:

Within-class divergence matrix: $S_w = \Sigma_0 + \Sigma_1$

Between-class divergence matrix: $S_b = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^{T}$

This yields the optimization objective (a generalized Rayleigh quotient) to be maximized:

$J(w) = \frac{w^{T}S_b w}{w^{T}S_w w}$

Note that $S_b w$ is always parallel to $\mu_0 - \mu_1$, so one may let $S_b w = \lambda(\mu_0 - \mu_1)$. Substituting this into the generalized eigenvalue equation $S_w^{-1}S_b w = \lambda w$ yields the solution $w = S_w^{-1}(\mu_0 - \mu_1)$.

In summary, the original sample set is projected into the 1-dimensional space spanned by the base vector $w$, and the projected feature set is the desired dimension-reduced feature set.
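The closed-form solution above transcribes directly into code; the following sketch (our own variable names) fits the two-class LDA direction and returns the 1-dimensional projected features:

```python
import numpy as np

def lda_project(X, y):
    """Two-class LDA: project X onto w = Sw^{-1}(mu0 - mu1)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class divergence Sw = Sigma0 + Sigma1 (no 1/N denominator, as above)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu0 - mu1)   # w = Sw^{-1}(mu0 - mu1)
    return X @ w                         # dimension-reduced (1-D) feature set
```

scikit-learn's LinearDiscriminantAnalysis performs an equivalent supervised reduction and may be preferable in practice.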
3. Classification:
On the basis of the previous two steps, the obtained feature set can be fed into a classifier for training and performance evaluation. First, the read-in feature set is randomly divided into a training set and a test set at a 4:1 ratio (80% training set, 20% test set), ensuring that the two sets do not intersect; the training set is then fed into a classifier for binary classification training, here the ensemble learning method AdaBoost.
AdaBoost is an adaptive boosting algorithm from the field of ensemble learning and is effective on binary problems. Its basic principle is iteration: one new weak classifier is added and trained per round until a predetermined, sufficiently small error rate is reached, and every training sample carries a weight indicating its probability of being selected into the training set of the next classifier. If a sample has been classified accurately, its selection probability is lowered when constructing the next training set; conversely, if a sample is misclassified, its weight is increased. In this way AdaBoost concentrates on the samples that are easy to misclassify, which improves the generalization ability of the classifier and makes overfitting unlikely.
The working principle diagram of AdaBoost is shown in fig. 6, and the mathematical description is as follows:
(1) Initialize the weight distribution of the training data:

$D_1 = (w_{11}, \ldots, w_{1i}, \ldots, w_{1N}), \quad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \ldots, N$

wherein $D_1$ is the weight distribution of the samples at the first iteration, $w_{11}$ is the weight of the first sample at the first iteration, and $N$ is the total number of samples;

(2) Perform $m = 1, 2, \ldots, M$ iterations:

① Learn the training samples with weight distribution $D_m$ to obtain a weak classifier $G_m(x): x \to \{-1, +1\}$, whose performance is measured by the value $\varepsilon_m$ of the error function:

$\varepsilon_m = \sum_{i=1}^{N} w_{m,i}\, I\left(G_m(x_i) \neq y_i\right)$
② Compute the say (i.e. weight) $\alpha_m$ of the weak classifier $G_m$, which represents the importance of $G_m$ in the final classifier:

$\alpha_m = \frac{1}{2}\ln\frac{1 - \varepsilon_m}{\varepsilon_m}$
From the above formula, $\alpha_m$ increases as $\varepsilon_m$ decreases, namely: a classifier with a small error carries a large weight in the final classifier;
③ Update the weight distribution of the training samples for the next iteration, increasing the weights of misclassified samples and decreasing those of correctly classified samples:

$D_{m+1} = (w_{m+1,1}, w_{m+1,2}, \ldots, w_{m+1,i}, \ldots, w_{m+1,N})$

$w_{m+1,i} = \frac{w_{m,i}}{Z_m}\exp\left(-\alpha_m y_i G_m(x_i)\right)$

$Z_m = \sum_{i=1}^{N} w_{m,i}\exp\left(-\alpha_m y_i G_m(x_i)\right)$

wherein $D_{m+1}$ is the sample weight distribution of the next iteration, $w_{m+1,i}$ is the weight of the $i$-th sample at the next iteration, $y_i$ is the class (+1/-1) of the $i$-th sample, and $G_m(x_i)$ is the classification result (+1/-1) of the weak classifier on sample $x_i$; $y_i G_m(x_i)$ equals 1 if the classification is correct and -1 otherwise;
(3) Combine the weak classifiers into a strong classifier:

① Take the weighted sum of the classifiers of all iterations:

$f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$

② Apply the sign function to the sum to obtain the final strong classifier $G(x)$:

$G(x) = \mathrm{sign}\left(f(x)\right) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$
In the experiments of the invention, implementation and parameter tuning during the training stage showed that the best performance is obtained when the number of weak learner iterations is set to about 30.
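A training-stage sketch with scikit-learn follows; n_estimators=30 mirrors the roughly 30 weak learners found best above, while the remaining parameters are illustrative defaults rather than the invention's tuned combination. X and y stand for the LDA-reduced features and the 0/1 labels of one emotion dimension.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# the 4:1 random split of step 3.1 (80% training set, 20% test set)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = AdaBoostClassifier(n_estimators=30)   # ~30 weak learners (decision stumps)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```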
After the training stage, the test set is fed into the trained classifier, binary classification tests are performed on the 4 emotion dimensions (Valence, Arousal, Dominance, Liking) based on 5-fold cross-validation, and comparison experiments are run with Random Forest and XGBoost classifiers. For the main and comparison experiments, the following 4 performance indexes are considered: accuracy, precision, recall and F1-Score; the confusion matrix and result curves are plotted to evaluate classifier performance from multiple angles.
Confusion matrix: in a classification task, several combinations exist between the predicted result (Predict) and the real result (Real), and the matrix of these combinations is the confusion matrix. In the binary task there are 2 × 2 = 4 combinations, denoted TP, FP, FN and TN, giving the confusion matrix shown in FIG. 7.
True Positive (TP): predicted positive and actually positive;
False Positive (FP): predicted positive but actually negative;
False Negative (FN): predicted negative but actually positive;
True Negative (TN): predicted negative and actually negative.
The performance indexes are:

(1) Accuracy: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

(2) Precision: $\mathrm{Precision} = \frac{TP}{TP + FP}$

(3) Recall: $\mathrm{Recall} = \frac{TP}{TP + FN}$

(4) F1-Score: $F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
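For illustration, the evaluation protocol (5-fold cross-validation plus the four indexes and the confusion matrix) can be realized with scikit-learn as sketched below, continuing the variables of the training sketch:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy on the whole feature set
print("5-fold ACC:", cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean())

y_pred = clf.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1-Score :", f1_score(y_test, y_pred))
# scikit-learn lays the confusion matrix out as [[TN, FP], [FN, TP]] for labels (0, 1)
print(confusion_matrix(y_test, y_pred))
```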
The larger these performance index values, the better the classifier performance. The final results are shown in FIG. 8 to FIG. 10. The analysis and result plots show that the method of the invention performs best and, compared with traditional feature extraction and classification methods, improves the emotion recognition accuracy markedly.
Finally, it should be understood that the parts of the specification not set forth in detail belong to the prior art.
While the invention has been described with reference to specific embodiments and procedures, those skilled in the art will understand that the invention is not limited thereto, and various changes and substitutions may be made without departing from its spirit. The scope of the invention is limited only by the appended claims.
The embodiments described herein with reference to the accompanying drawings are exemplary only and should not be taken as limiting the invention.

Claims (3)

1. An electroencephalogram signal preprocessing method, comprising the following steps:
step 1: read the DEAP data set source files downloaded from the official website (the source files are preprocessed to 128 Hz) and import the last 60 s of data of the first 32 channels of the 32 .dat files (removing the first 3 s of emotion-irrelevant baseline signal data);
step 2: store the data part and the labels of the 4 emotion dimensions in the features_raw.csv file and the labels0-3.dat files respectively, where a rating score > 5.0 is stored as 1 and 0 otherwise, preparing for the subsequent feature extraction, feature selection and classification work.
2. A feature extraction and feature selection method according to claim 1, comprising the following steps:
step 1: feature extraction: in order to capture as much of the detail information in the electroencephalogram signal as possible, several classes of features are extracted simultaneously; time-domain, time-frequency-domain and nonlinear features are extracted in turn from the last 60 s of data of all 32 channels. Step 1 comprises the following sub-steps:
step 1.1: time-domain feature extraction, covering the Mean, standard deviation (Std), Range, Skewness and Kurtosis together with the three Hjorth parameters Activity, Mobility and Complexity, i.e. 8 statistical features;
step 1.2: time-frequency-domain feature extraction: based on a wavelet packet decomposition with the db4 mother wavelet, 4 emotion-related frequency bands (Theta, Alpha, Beta, Gamma) are extracted, and Wavelet Energy and Wavelet Entropy are computed on each band, giving 2 × 4 = 8 features;
step 1.3: nonlinear feature extraction, comprising Power Spectral Density (PSD) and Differential Entropy (DE), 2 features in total.
step 2: feature combination: step 1 extracts 18 feature values per channel of each subject's video data, so each trial contains 32 × 18 = 576 feature values; these are spliced into a (32 × 40) × 576 = 1280 × 576 feature value matrix and stored in a train.csv file.
step 3: feature selection: the supervised feature selection method Linear Discriminant Analysis (LDA) is adopted to reduce the dimensionality of the extracted features, further improving the emotion-relevance of the retained features and enhancing the robustness of the classifier.
3. An electroencephalogram signal classification method based on the ensemble learning method AdaBoost according to claim 2, comprising the following steps:
step 1: read the processed feature data from the train.csv file, and randomly divide it into a training set and a test set at a 4:1 ratio (80% training set, 20% test set);
step 2: classifier training: feed the training set into the AdaBoost ensemble learning classifier for binary classification training, continuously tuning the classifier parameters to determine the optimal parameter combination and strengthen the classifier's learning ability;
step 3: classifier performance testing: step 3 comprises the following sub-steps:
step 3.1: feed the test set into the classifier trained in step 2 and perform binary classification tests on the 4 emotion dimensions (Valence, Arousal, Dominance, Liking) based on 5-fold cross-validation;
step 3.2: run comparison experiments with other classifiers, here Random Forest and XGBoost;
step 3.3: for the main and comparison experiments, consider the following 4 performance indexes: accuracy, precision, recall and F1-Score, and plot the confusion matrix and result curves to evaluate classifier performance from multiple angles.
CN202010977310.4A 2020-09-17 2020-09-17 Electroencephalogram signal emotion recognition based on ensemble learning method AdaBoost Pending CN112200016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010977310.4A CN112200016A (en) 2020-09-17 2020-09-17 Electroencephalogram signal emotion recognition based on ensemble learning method AdaBoost

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010977310.4A CN112200016A (en) 2020-09-17 2020-09-17 Electroencephalogram signal emotion recognition based on ensemble learning method AdaBoost

Publications (1)

Publication Number Publication Date
CN112200016A true CN112200016A (en) 2021-01-08

Family

ID=74015281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010977310.4A Pending CN112200016A (en) 2020-09-17 2020-09-17 Electroencephalogram signal emotion recognition based on ensemble learning method AdaBoost

Country Status (1)

Country Link
CN (1) CN112200016A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886792A (en) * 2017-01-22 2017-06-23 北京工业大学 A kind of brain electricity emotion identification method that Multiple Classifiers Combination Model Based is built based on layering
US20190347476A1 (en) * 2018-05-09 2019-11-14 Korea Advanced Institute Of Science And Technology Method for estimating human emotions using deep psychological affect network and system therefor
CN109498041A (en) * 2019-01-15 2019-03-22 吉林大学 Driver road anger state identification method based on brain electricity and pulse information
CN110135285A (en) * 2019-04-26 2019-08-16 中国人民解放军战略支援部队信息工程大学 It is a kind of to use the brain electrical silence state identity identifying method and device of singly leading equipment
CN110414548A (en) * 2019-06-06 2019-11-05 西安电子科技大学 The level Bagging method of sentiment analysis is carried out based on EEG signals
CN110610168A (en) * 2019-09-20 2019-12-24 合肥工业大学 Electroencephalogram emotion recognition method based on attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HARIKUMAR RAJAGURU et al.: "Analysis of adaboost classifier from compressed EEG features for epilepsy detection", 2017 International Conference on Computing Methodologies and Communication (ICCMC) *
QIAO XIE et al.: "Electroencephalogram Emotion Recognition Based on A Stacking Classification Model", 2018 37th Chinese Control Conference (CCC) *
WANG Yongzong: "Research on EEG feature combination and channel optimization selection for emotion recognition", China Excellent Master's and Doctoral Dissertations Full-text Database (Master), Information Science and Technology *
GUO Jinliang: "EEG emotion recognition based on sparse group lasso-Granger causality features", China Excellent Master's and Doctoral Dissertations Full-text Database (Master), Information Science and Technology *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113180659A (en) * 2021-01-11 2021-07-30 华东理工大学 Electroencephalogram emotion recognition system based on three-dimensional features and cavity full convolution network
CN113180659B (en) * 2021-01-11 2024-03-08 华东理工大学 Electroencephalogram emotion recognition method based on three-dimensional feature and cavity full convolution network
CN112836593A (en) * 2021-01-15 2021-05-25 西北大学 Emotion recognition method and system fusing prior and automatic electroencephalogram characteristics
CN112836593B (en) * 2021-01-15 2023-06-20 西北大学 Emotion recognition method and system integrating priori and automatic electroencephalogram features
CN112883855A (en) * 2021-02-04 2021-06-01 东北林业大学 Electroencephalogram signal emotion recognition based on CNN + data enhancement algorithm Borderline-SMOTE
CN113191232A (en) * 2021-04-21 2021-07-30 西安交通大学 Electro-hydrostatic actuator fault identification method based on multi-mode homologous features and XGboost model
CN114027840A (en) * 2021-11-12 2022-02-11 江苏科技大学 Emotional electroencephalogram recognition method based on variational modal decomposition
CN114202524A (en) * 2021-12-10 2022-03-18 中国人民解放军陆军特色医学中心 Performance evaluation method and system of multi-modal medical image
CN116028882A (en) * 2023-03-29 2023-04-28 深圳市傲天科技股份有限公司 User labeling and classifying method, device, equipment and storage medium
CN116028882B (en) * 2023-03-29 2023-06-02 深圳市傲天科技股份有限公司 User labeling and classifying method, device, equipment and storage medium
CN116211322A (en) * 2023-03-31 2023-06-06 上海外国语大学 Depression recognition method and system based on machine learning electroencephalogram signals

Similar Documents

Publication Publication Date Title
CN112200016A (en) Electroencephalogram signal emotion recognition based on ensemble learning method AdaBoost
Golmohammadi et al. Automatic analysis of EEGs using big data and hybrid deep learning architectures
Hussein et al. Epileptic seizure detection: A deep learning approach
CN111134666A (en) Emotion recognition method of multi-channel electroencephalogram data and electronic device
Travieso et al. Detection of different voice diseases based on the nonlinear characterization of speech signals
CN110472649B (en) Electroencephalogram emotion classification method and system based on multi-scale analysis and integrated tree model
Hemmerling et al. Voice data mining for laryngeal pathology assessment
Hariharan et al. Objective evaluation of speech dysfluencies using wavelet packet transform with sample entropy
Diykh et al. Texture analysis based graph approach for automatic detection of neonatal seizure from multi-channel EEG signals
CN115770044B (en) Emotion recognition method and device based on electroencephalogram phase amplitude coupling network
CN112364697A (en) Electroencephalogram emotion recognition method based on R-LSTM model
Ge et al. Applicability of hyperdimensional computing to seizure detection
Khare et al. Multiclass sleep stage classification using artificial intelligence based time-frequency distribution and CNN
Hariharan et al. A hybrid expert system approach for telemonitoring of vocal fold pathology
Yao et al. A cnn-transformer deep learning model for real-time sleep stage classification in an energy-constrained wireless device
Kumar et al. Comparison of Machine learning models for Parkinson’s Disease prediction
Sharan Cough sound detection from raw waveform using SincNet and bidirectional GRU
Xie et al. Multi-view features fusion for birdsong classification
CN114091529A (en) Electroencephalogram emotion recognition method based on generation countermeasure network data enhancement
Ghoraani et al. Discriminant non-stationary signal features’ clustering using hard and fuzzy cluster labeling
Boualoulou et al. CNN and LSTM for the classification of parkinson's disease based on the GTCC and MFCC
CN114742107A (en) Method for identifying perception signal in information service and related equipment
Prawira et al. Emotion classification using fast fourier transform and recurrent neural networks
US20220180129A1 (en) Fcn-based multivariate time series data classification method and device
Mishra et al. Improvement of emotion classification performance using multi-resolution variational mode decomposition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210108