CN115659207A - Electroencephalogram emotion recognition method and system - Google Patents

Electroencephalogram emotion recognition method and system

Info

Publication number
CN115659207A
Authority
CN
China
Prior art keywords
electroencephalogram
emotion
network
learning
channels
Prior art date
Legal status
Pending
Application number
CN202210581855.2A
Other languages
Chinese (zh)
Inventor
刘三女牙
杨宗凯
朱晓亮
荣文婷
戴志诚
赵亮
何自力
杨巧来
孙君懿
刘艮东
Current Assignee
Central China Normal University
Original Assignee
Central China Normal University
Priority date
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202210581855.2A
Publication of CN115659207A

Abstract

The invention provides an electroencephalogram emotion recognition method and system comprising the following steps: determining electroencephalogram signals of M channels of an electroencephalogram emotion entity to be identified, each channel corresponding to one electroencephalogram signal measurement position; and inputting the electroencephalogram signals of the M channels into a pre-trained electroencephalogram emotion recognition network model to recognize the corresponding electroencephalogram emotion. A fourth-order Butterworth band-pass filter filters the electroencephalogram signals of the M channels into five sub-bands and selects the electroencephalogram signals of the C channels that are highly correlated with electroencephalogram emotion; a multi-channel parallel convolutional neural network extracts combined sequence features, covering channel features and time features, from the preprocessed electroencephalogram signals in the different frequency bands; an attention network fuses the combined sequence features of the different frequency bands to obtain fused features; a feature extraction network extracts depth features from the fused features; and a classification network classifies the depth features to identify the corresponding electroencephalogram emotion. The method achieves high accuracy in electroencephalogram emotion recognition.

Description

Electroencephalogram emotion recognition method and system
Technical Field
The invention belongs to the field of electroencephalogram emotion recognition, and particularly relates to an electroencephalogram emotion recognition method and system.
Background
Emotion reflects people's genuine psychological responses to things, and emotion recognition therefore has wide applications in medicine, education and related fields. Current emotion recognition research falls largely into two categories: one acquires the subject's emotional state in a non-contact manner from external behavior (speech, text, images, etc.); the other studies neurophysiological states, i.e. it acquires physiological signals such as the electrocardiogram (ECG), electroencephalogram (EEG) and heart rate (PPG). Research based on external behavior captures emotion only to a limited extent, whereas focusing on the neurophysiological representation of emotion is more objective, so this work studies emotion recognition further using the EEG physiological signal.
At present, many researchers have built EEG datasets, such as the SEED and DEAP datasets, to study the six basic human emotions, but few have built EEG datasets that focus on emotions arising during learning. In recent years, research on learning emotion has generally relied on multiple external behavioral expressions; Sharap et al., for example, studied students' engagement in an online learning scenario by combining eye, head and facial-muscle movements. However, in real learning scenarios students remain largely calm, and facial-muscle movements are small in amplitude and short in duration, so expression features are difficult to capture; moreover, facial expressions can be disguised, so expression-based research struggles to reflect students' true emotions.
In summary, although existing EEG emotion classification techniques achieve good recognition results, the following problems remain: (1) methods that recognize emotion from various channel combinations within individual sub-bands fail to combine the features of the five sub-bands well; (2) methods that explore band correlations across the full set of channels are an important research trend, yet not all brain regions of the EEG signal carry valid affective information, so this approach fails to concentrate on capturing the important emotion-related channels.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide an electroencephalogram emotion recognition method and system, and aims to solve the problem that the electroencephalogram emotion recognition accuracy is low due to the fact that an existing electroencephalogram emotion recognition method cannot concentrate on capturing important emotion channels and cannot combine different sub-band features.
In order to achieve the above object, in a first aspect, the present invention provides an electroencephalogram emotion recognition method, including the following steps:
determining electroencephalogram signals of M channels of an electroencephalogram emotion entity to be identified; each channel corresponds to one electroencephalogram signal measuring position;
inputting the electroencephalogram signals of the M channels into a pre-trained electroencephalogram emotion recognition network model to recognize corresponding electroencephalogram emotions; the electroencephalogram emotion recognition network model comprises: a fourth-order Butterworth band-pass filter, a multi-channel parallel convolutional neural network, an attention network, a feature extraction network and a classification network; the fourth-order Butterworth band-pass filter is used for filtering the electroencephalogram signals of the M channels into five sub-bands, selecting from the electroencephalogram signals of the M channels the electroencephalogram signals of the C channels that are highly correlated with electroencephalogram emotion, and taking the five sub-bands corresponding to the electroencephalogram signals of the C channels as the preprocessed electroencephalogram signals; the multi-channel parallel convolutional neural network is used for extracting combined sequence features, corresponding to channel features and time features, of the preprocessed electroencephalogram signals in different frequency bands; the attention network is used for fusing the combined sequence features of different frequency bands to obtain fused features; the feature extraction network is used for extracting the depth features of the fused features; the classification network is used for classifying the depth features so as to identify the corresponding electroencephalogram emotion; the five sub-bands correspond respectively to delta waves, theta waves, alpha waves, beta waves and gamma waves; M and C are positive integers, and C is less than or equal to M.
In an alternative example, the C channels are located in the temporal lobe of the subject.
In an optional example, the multichannel parallel convolutional neural network is configured to extract combined sequence features of the preprocessed brain electrical signals in different frequency bands, specifically:
the electroencephalogram sequence X_f^C is input into the multi-channel parallel convolutional neural network to extract the combined sequence features containing the channel features and the time features of the different frequency bands, obtaining the features F_f^C:
F_f^C = ReLU(h * X_f^C)
F^C = {F_δ^C, F_θ^C, F_α^C, F_β^C, F_γ^C}
wherein F_f^C is the combined sequence feature output by the multi-channel parallel convolutional neural network in frequency band f under the combination of C channels; F^C represents the set of 5 sub-band features extracted by the multi-channel parallel convolutional neural network under the combination of C channels; ReLU represents the nonlinear excitation function, increasing the nonlinearity between network layers; h* denotes the convolution performed on the input electroencephalogram sequence X_f^C; X_f^C represents the electroencephalogram signal at frequency band f for the combination of C channels.
In an optional example, the attention network is configured to fuse the combined sequence features of different frequency bands to obtain a fused feature, specifically:
features of n frequency bands are selected from F^C and input into the attention network, which fuses the combined sequence features of the selected n frequency bands and outputs a feature F' that fuses channel, time series and frequency band:
Weight_k = Sigmoid(q^T Mult(Select(F^C)^{×n}))
F' = Mult(Select(F^C)^{×n}) * Weight_k
wherein Select denotes selecting a combination of n frequency bands from F^C; Mult is the multiplication of the selected band features; Sigmoid is the activation function, mapping the output to between 0 and 1 so that it serves as a threshold for the output weight; q^T computes the similarity; Weight_k is the self-attention weight of the selected n frequency bands; F' is the fused feature output by the attention network; n is less than or equal to 5 and is a positive integer.
In an optional example, the method further comprises the steps of:
predetermining a video material library containing three types of video materials that trigger different learning emotions; the three learning emotions are: engaged learning emotion, neutral learning emotion and bored learning emotion;
determining an electroencephalogram signal set to be trained corresponding to the video material library; the electroencephalogram signal set to be trained comprises: electroencephalogram signals corresponding to engaged learning emotion, electroencephalogram signals corresponding to neutral learning emotion and electroencephalogram signals corresponding to bored learning emotion; the engaged-learning-emotion electroencephalogram signals are obtained while a learner watches video material that triggers engaged learning emotion, the neutral-learning-emotion signals while the learner watches video material that triggers neutral learning emotion, and the bored-learning-emotion signals while the learner watches video material that triggers bored learning emotion;
and training the electroencephalogram emotion recognition network model by adopting the electroencephalogram signal set to be trained to obtain the trained electroencephalogram emotion recognition network model.
In a second aspect, the present invention provides a brain electric emotion recognition system, including:
the electroencephalogram signal determining unit is used for determining electroencephalogram signals of M channels of an electroencephalogram emotion entity to be identified; each channel corresponds to one electroencephalogram signal measurement position;
the electroencephalogram emotion recognition unit is used for inputting the electroencephalogram signals of the M channels into a pre-trained electroencephalogram emotion recognition network model so as to recognize the corresponding electroencephalogram emotion; the electroencephalogram emotion recognition network model comprises: a fourth-order Butterworth band-pass filter, a multi-channel parallel convolutional neural network, an attention network, a feature extraction network and a classification network; the fourth-order Butterworth band-pass filter is used for filtering the electroencephalogram signals of the M channels into five sub-bands, selecting from the electroencephalogram signals of the M channels the electroencephalogram signals of the C channels that are highly correlated with electroencephalogram emotion, and taking the five sub-bands corresponding to the electroencephalogram signals of the C channels as the preprocessed electroencephalogram signals; the multi-channel parallel convolutional neural network is used for extracting combined sequence features, corresponding to channel features and time features, of the preprocessed electroencephalogram signals in different frequency bands; the attention network is used for fusing the combined sequence features of different frequency bands to obtain fused features; the feature extraction network is used for extracting the depth features of the fused features; the classification network is used for classifying the depth features so as to identify the corresponding electroencephalogram emotion; the five sub-bands correspond respectively to delta waves, theta waves, alpha waves, beta waves and gamma waves; M and C are positive integers, and C is less than or equal to M.
In an alternative example, the C channels are located in the temporal lobe of the subject.
In an optional example, the multichannel parallel convolutional neural network is configured to extract combined sequence features of the preprocessed brain electrical signals in different frequency bands, specifically:
the electroencephalogram sequence X_f^C is input into the multi-channel parallel convolutional neural network to extract the combined sequence features containing the channel features and the time features of the different frequency bands, obtaining the features F_f^C:
F_f^C = ReLU(h * X_f^C)
F^C = {F_δ^C, F_θ^C, F_α^C, F_β^C, F_γ^C}
wherein F_f^C is the combined sequence feature output by the multi-channel parallel convolutional neural network in frequency band f under the combination of C channels; F^C represents the set of 5 sub-band features extracted by the multi-channel parallel convolutional neural network under the combination of C channels; ReLU represents the nonlinear excitation function, increasing the nonlinearity between network layers; h* denotes the convolution performed on the input electroencephalogram sequence X_f^C; X_f^C represents the electroencephalogram signal at frequency band f for the combination of C channels.
In an optional example, the attention network is configured to fuse the combined sequence features of different frequency bands to obtain a fused feature, specifically:
features of n frequency bands are selected from F^C and input into the attention network, which fuses the combined sequence features of the selected n frequency bands and outputs a feature F' that fuses channel, time series and frequency band:
Weight_k = Sigmoid(q^T Mult(Select(F^C)^{×n}))
F' = Mult(Select(F^C)^{×n}) * Weight_k
wherein Select denotes selecting a combination of n frequency bands from F^C; Mult is the multiplication of the selected band features; Sigmoid is the activation function, mapping the output to between 0 and 1 so that it serves as a threshold for the output weight; q^T computes the similarity; Weight_k is the self-attention weight of the selected n frequency bands; F' is the fused feature output by the attention network; n is less than or equal to 5 and is a positive integer.
In one optional example, the system further comprises:
the model training unit is used for predetermining a video material library containing three types of video materials that trigger different learning emotions, the three learning emotions being engaged learning emotion, neutral learning emotion and bored learning emotion; determining an electroencephalogram signal set to be trained corresponding to the video material library, the set comprising electroencephalogram signals corresponding to engaged learning emotion, electroencephalogram signals corresponding to neutral learning emotion and electroencephalogram signals corresponding to bored learning emotion, where the engaged-learning-emotion electroencephalogram signals are obtained while a learner watches video material that triggers engaged learning emotion, the neutral-learning-emotion signals while the learner watches video material that triggers neutral learning emotion, and the bored-learning-emotion signals while the learner watches video material that triggers bored learning emotion; and training the electroencephalogram emotion recognition network model with the electroencephalogram signal set to be trained to obtain the trained electroencephalogram emotion recognition network model.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
the invention provides an electroencephalogram emotion recognition method and system, wherein learning video segments are used as emotion stimulating materials, a video material library containing 28 different learning emotions is established, a 32-channel EEG measuring device is used for collecting EEG signals of learners in the process of watching learning videos, noise and artifacts of the EEG signals are filtered, the EEG signals are input into a multi-channel parallel convolutional neural network to extract channel and time characteristics, an attention network is input to extract inter-band combination characteristics, and the EEG signals are input into a classification network to classify and recognize the three emotions of the learners. According to the invention, the attention module is embedded into the convolutional neural network, so that the emotion recognition accuracy of an EEG signal learner is effectively improved.
The invention provides an EEG emotion recognition method and system and, by attending to the relations among frequency bands, channels and time-series features, proposes an EEG emotion classification network with attention fusion of multi-channel frequency-band features (ECN-AF). Experiments on the SEED dataset confirm that ECN-AF achieves better classification accuracy than the baseline models. An elicitation experiment for an online learning scenario is designed, and an LE-EEG dataset comprising three learning emotions (boredom, engagement and neutral) is established from EEG signals collected from 45 subjects. Cross-dataset validation of ECN-AF demonstrates its robustness: it not only performs well on the public SEED data but also shows clear advantages on the self-built LE-EEG dataset.
Drawings
FIG. 1 is a flow chart of an electroencephalogram emotion recognition method provided by an embodiment of the invention;
FIG. 2 is a technical flowchart of an electroencephalogram emotion recognition method combining multi-channel frequency band feature attention fusion provided by an embodiment of the present invention;
fig. 3 is a subjective scoring statistical chart of learning video segments according to an embodiment of the present invention;
fig. 4 is a visualized presentation diagram of subjective score statistical analysis of a learning video clip according to an embodiment of the present invention;
FIG. 5 is a block diagram of an overall model provided in an embodiment of the present invention;
FIG. 6 is a graph of validation set accuracy during training of three models based on LE-EEG datasets according to an embodiment of the present invention.
Fig. 7 is an architecture diagram of an electroencephalogram emotion recognition system provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention carries out research and establishes an EEG learner-emotion dataset for learning-emotion classification. D'Mello notes that the six basic emotions, although common in everyday life, are mostly absent during learning sessions lasting 30 minutes to 2 hours, and therefore defines six learning emotions, including boredom, engagement, confusion, frustration, delight and surprise, arranged in ascending order of persistence on a time scale: (delight = surprise) < (confusion = frustration) < (boredom = engagement/flow). Considering the time scale and the probability of occurrence, the invention studies the two emotions with the longer time scale, boredom and engagement, and adds a neutral emotional state, giving three learning-emotion states in total. When the teaching content is knowledge that students find interesting, understand deeply and accept, the learner expresses a positive learning emotion through smiling, nodding and the like, and this emotional state is described as the engaged emotion; when the teaching content is knowledge that students find uninteresting and cannot understand, the learner expresses a negative learning emotion through behaviors such as sighing, and this emotional state is described as the bored emotion; most of the time learners are in a comprehensible neutral state, which we define as the neutral emotion.
At present, deep-learning methods extract EEG features for emotion classification mainly along two lines: the first discusses the emotion-recognition effect of various channel combinations within individual sub-bands, and the second explores the emotion-recognition effect of band correlations over the full set of channels. For recognizing emotion from multi-channel combinations within sub-bands, SVM, KNN, CNN, graph neural networks and the like are usually adopted; the graph neural network models the features of multi-channel electroencephalograms and learns the intrinsic relations between different electroencephalogram (EEG) channels. For exploring band correlations, four-dimensional convolutional recurrent neural networks, transferable attention neural networks and attention three-dimensional dense networks are often adopted. The four-dimensional convolutional recurrent neural network maps the full set of EEG channels onto a two-dimensional image, stacks all sub-bands to make the features three-dimensional, and uses a 2D CNN to extract channel and frequency-band features and an LSTM to extract temporal features. The transferable attention neural network extracts features of the whole brain region with two directional RNN modules and highlights key brain regions with the fused features of a global attention layer to classify emotion. The attention three-dimensional dense network uses attention modules to combine the spatial and temporal importance features of the five sub-bands for emotion classification.
The invention provides an electroencephalogram emotion recognition method and system combining multichannel frequency band characteristic attention fusion, wherein the method comprises the following steps: selecting a knowledge point video clip, and establishing a video material library for triggering learners to generate three different learning emotions; a 32-channel electroencephalogram cap collects an EEG signal in the process that a learner watches a learning video; EEG signals are preliminarily filtered, and noise and artifact interference are eliminated; dividing the continuously acquired EEG signal into a plurality of original segments by windows, filtering the segment of signal into five sub-bands by adopting a fourth-order Butterworth band-pass filter in all the original segments, normalizing the EEG signal of the five sub-bands and selecting a channel to obtain a preprocessed EEG signal; inputting the preprocessed EEG signal into a multi-channel parallel convolution neural network to extract channel and time characteristics to obtain five sub-band characteristics, and then inputting the five sub-band characteristics into an attention network to extract inter-band combination characteristics; and inputting the inter-band combination characteristics into a classification network to classify and identify three emotions of the learner. According to the invention, the attention module is embedded into the convolutional neural network, so that the accuracy of learning electroencephalogram emotion recognition is improved.
Aiming at the defects of the prior art, the invention aims to provide an electroencephalogram emotion recognition method and system combined with multi-channel frequency-band feature attention fusion, which focus on the learner's emotion during the learning process, consider specific channels and a subset of frequency bands, and exploit attention-network fusion to improve emotion classification.
In order to realize the aim, the invention provides an electroencephalogram emotion recognition method combining multichannel frequency band characteristic attention fusion, which comprises the following steps of:
selecting a knowledge point video clip, and establishing a video material library for triggering learners to generate three different learning emotions;
a 32-channel electroencephalogram cap collects an EEG signal in the process that a learner watches a learning video;
EEG signals are preliminarily filtered, and noise and artifact interference are eliminated;
dividing the continuously collected EEG signals into a plurality of subsections by windows, filtering the signals into five sub-bands by adopting a fourth-order Butterworth band-pass filter, normalizing the EEG signals of the five sub-bands and selecting channels to obtain preprocessed EEG signals;
inputting the preprocessed EEG signal into a multi-channel parallel convolution neural network to extract channel and time characteristics to obtain five sub-band characteristics;
inputting the five sub-band characteristics into an attention network to extract inter-band combination characteristics;
and inputting the inter-band combined features into a classification network to classify and identify three emotions of the learner.
Selecting the knowledge point video clips as described above, and establishing a video material library for triggering the learner to generate three different learning emotions; the method specifically comprises the following steps:
course segments are selected from China University MOOC and Bilibili according to positive or negative vocabulary in the course evaluations, such as "engaged/concentrated" and "boring", and 49 graduate students majoring in computer science are recruited to participate in the evaluation, yielding a video material library that triggers three different learning emotions in learners.
The method for collecting the EEG signals in the process of watching and learning the videos of the learner through the 32-channel EEG cap specifically comprises the following steps:
the 32-channel EEG measuring equipment EPOC Flex salt Sensor Kit is used for EEG data acquisition in the learning process, the reference electrode is the tested left and right earlobe electrodes, and the sampling frequency is 128Hz.
The above-mentioned EEG signal preliminary filtering, noise and artifact interference rejection specifically includes:
The EEG signals are preprocessed with band-pass filtering and automatic artifact handling using MATLAB R2020b with the EEGLAB toolbox and the ICLabel and ADJUST plug-ins; after the artifacts are handled with the automatic toolkits, some bad data are removed manually by visual inspection, and relatively clean EEG data are finally obtained.
Inputting the preprocessed EEG signal into a multi-channel parallel convolution neural network to extract channel and time characteristics to obtain five sub-band characteristics; the method specifically comprises the following steps:
the features extracted by the multichannel parallel convolution neural network are subband channel features extracted after the preprocessed EEG data passes through two layers of convolution neural networks, an average pooling layer and a normalization layer;
the extracting of the inter-band combination features by the sub-band feature input attention network specifically includes: the attention network calculates attention weight, assigns the weight to frequency band features, acquires frequency band attention feature vectors and outputs new attention features.
FIG. 1 is a flow chart of an electroencephalogram emotion recognition method provided by an embodiment of the invention; as shown in fig. 1, the method comprises the following steps:
s101, determining electroencephalogram signals of M channels of an electroencephalogram emotion entity to be identified; each channel corresponds to one electroencephalogram signal measurement position;
S102, inputting the electroencephalogram signals of the M channels into a pre-trained electroencephalogram emotion recognition network model to recognize the corresponding electroencephalogram emotion; the electroencephalogram emotion recognition network model comprises: a fourth-order Butterworth band-pass filter, a multi-channel parallel convolutional neural network, an attention network, a feature extraction network and a classification network; the fourth-order Butterworth band-pass filter is used for filtering the electroencephalogram signals of the M channels into five sub-bands, selecting from the electroencephalogram signals of the M channels the electroencephalogram signals of the C channels that are highly correlated with electroencephalogram emotion, and taking the five sub-bands corresponding to the electroencephalogram signals of the C channels as the preprocessed electroencephalogram signals; the multi-channel parallel convolutional neural network is used for extracting combined sequence features, corresponding to channel features and time features, of the preprocessed electroencephalogram signals in different frequency bands; the attention network is used for fusing the combined sequence features of different frequency bands to obtain fused features; the feature extraction network is used for extracting the depth features of the fused features; the classification network is used for classifying the depth features so as to identify the corresponding electroencephalogram emotion; the five sub-bands correspond respectively to delta waves, theta waves, alpha waves, beta waves and gamma waves; M and C are positive integers, and C is less than or equal to M.
In an alternative example, the C channels are located in the temporal lobe of the subject.
In an optional example, the method further comprises the steps of:
predetermining a video material library containing three types of video materials that trigger different learning emotions; the three learning emotions are: engaged learning emotion, neutral learning emotion and bored learning emotion;
collecting electroencephalogram signals while a learner watches video materials that trigger the different types of learning emotion; the electroencephalogram emotion corresponding to watching video material that triggers engaged learning emotion is the engaged learning emotion, the electroencephalogram emotion corresponding to watching video material that triggers neutral learning emotion is the neutral learning emotion, and the electroencephalogram emotion corresponding to watching video material that triggers bored learning emotion is the bored learning emotion;
and training the electroencephalogram emotion recognition network model by adopting the acquired electroencephalogram signals to obtain the trained electroencephalogram emotion recognition network model.
Specifically, the method for recognizing electroencephalogram emotion through multi-channel frequency band feature attention fusion provided by the invention is a detailed technical scheme below, and fig. 2 is a technical flow chart of the method for recognizing electroencephalogram emotion through multi-channel frequency band feature attention fusion provided by the embodiment of the invention; as shown in fig. 2, the method comprises the following steps:
s1, selecting knowledge point video clips and establishing three video material libraries with different learning emotions. Selecting a knowledge point video clip, wherein the video clip is from an MOOC network and a beep-li network of China university, selecting a course clip according to active or passive vocabularies including input \ concentration, boring and the like in course evaluation, and carrying out video primary screening by taking a computer professional researcher as a academic background;
based on this approach, 50 course segments were collected and preliminarily divided into three learning-emotion libraries: 18 videos intended to trigger engaged learning emotion in learners, 17 videos for neutral learning emotion and 15 videos for bored learning emotion;
subjective evaluation was carried out on the 50 collected course segments: 49 graduate students were recruited as raters, a learning-state self-assessment questionnaire was used in which the items adopt a 5-point scale, and 44 valid questionnaires were collected;
the 44 effective questionnaire data are imported into statistical software SPSS27.0, statistical analysis is carried out on the data by methods such as descriptive statistics, reliability analysis, variance analysis and the like, FIG. 3 is a subjective score statistical chart provided in the embodiment of the invention, and FIG. 3 shows 28 video segments with higher subjective score consistency of the testee. Fig. 4 is a visual presentation of subjective score statistical analysis provided in an example of the invention, showing the dispersion of five-point scores from 44 available questionnaires and the average score of emotional segments. Obtaining a video material library containing three different learning emotions: 14 videos for triggering learners to participate in learning emotions, 6 videos for triggering learners to learn neutral emotions and 8 videos for triggering learners to learn boring emotions.
S2. A 32-channel electroencephalogram cap collects EEG signals while learners watch the learning videos. From the video material library obtained in step S1, 7 videos that trigger engaged learning emotion, 6 videos that trigger neutral learning emotion and 7 videos that trigger bored learning emotion were selected for the EEG acquisition experiment. The invention recruited 47 graduate students majoring in computer science to take part in the electroencephalogram experiments. Data acquisition used electroencephalogram equipment produced by EMOTIV: the hardware was the EPOC Flex Saline Sensor Kit and the software EmotivPRO v2.0, with wireless transmission; conductive gel was applied to each subject, and raw learning-emotion EEG signals were recorded for 1 hour and 30 minutes.
S3. The EEG signals are preliminarily filtered to remove noise and artifact interference and to generate the input dataset. The raw EEG signals acquired in S2 are imported into MATLAB R2020b and processed with the EEGLAB toolbox and the ICLabel and ADJUST plug-ins: first the channel locations are set, power-frequency interference is removed with a 49-51 Hz notch filter, 0.1 Hz high-pass and 40 Hz low-pass filtering are applied, independent component analysis (ICA) is performed to remove blink and muscle-tension components, and bad segments are removed by visual inspection and similar operations; relatively clean EEG data are finally obtained, called the learner-emotion (LE-EEG) dataset, comprising EEG signals of the three learning emotions of boredom, engagement and neutral.
S4, FIG. 5 is an integral model structure diagram of an electroencephalogram emotion recognition network combined with multi-channel frequency band feature attention fusion (ECN-AF) provided by the embodiment of the invention; as shown in fig. 5, the overall model comprises three main modules:
1) Module 1: the band-division and channel-selection module. In this module, the acquired EEG signal is first divided into original segments by a sliding window with a window size of 10 seconds and a step size of 2 seconds; secondly, five different frequency bands are extracted from each original segment by a fourth-order Butterworth band-pass filter and each band is normalized separately to obtain single-band signals; finally, channels are selected according to a channel-combination scheme to generate the electroencephalogram sequence that is input to the neural network.
2) Module 2: the band-attention feature-extraction module. This module consists of a multi-channel parallel convolutional neural network and an attention network. First, the electroencephalogram sequence output by module 1 is fed into the multi-channel parallel convolutional neural network, which extracts channel and time-series features of the different frequency bands. Second, the features extracted from the different frequency bands are fed into the attention network, which fuses the channel and time-series features of the different frequency bands and outputs combined features covering channel, time series and frequency band.
3) Module 3: the feature depth-fusion and classification module. In this module, the combined features output by module 2 are first used as the input of the feature depth-fusion module, which extracts them with a deep network and outputs depth features. The depth features are then input into the classification module, which gives the final classification result.
The emotion-classification accuracy of the ECN-AF model is verified on the public electroencephalogram dataset SEED and on the self-built LE-EEG dataset from S3. The public SEED dataset contains 62-channel EEG signals from 15 subjects at a sampling rate of 200 Hz. The self-built LE-EEG dataset from S3 contains 32-channel EEG signals from 45 subjects at a sampling rate of 128 Hz.
In block 1, first split the SEED dataset with all EEG data in the LE-EEG dataset in S3 into multiple original segments with W as the window size, as follows:
W=T*C
wherein W is the window size; t is the window length; c is the number of channels.
First, the SEED dataset and all EEG data in the LE-EEG dataset in S3 are split into multiple original segments with W as window size, resulting in original segments W of 2000 x 62 and 1280 x 32 in the SEED and LE-EEG datasets, respectively, all data being split with sliding window, with window length T of 10 seconds and step size of 2 seconds.
S = {W_1, W_2, W_3, …, W_i, …, W_{n-1}, W_n}
where S represents the set of n original segments obtained from one subject; W_i represents the i-th original segment; n represents the total number of samples of one subject. Thus, in the SEED dataset the sample size S for each subject was 4896 and the total sample size for all 15 subjects was 73,440; in the LE-EEG dataset each subject's sample size S was between 1082 and 1650, with a total sample size of 60,376 for all 45 subjects.
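As a concrete illustration of this windowing step, the sketch below segments a continuous multi-channel recording into overlapping windows; the 10 s window, 2 s step and 128 Hz sampling rate follow the text, while the function and variable names are illustrative assumptions rather than part of the patent.

```python
# Sliding-window segmentation sketch (function/variable names are illustrative assumptions).
import numpy as np

def split_into_segments(eeg, fs, win_sec=10.0, step_sec=2.0):
    """eeg: array of shape (n_channels, n_samples); returns (n_segments, n_channels, T)."""
    t = int(win_sec * fs)          # window length T in samples
    step = int(step_sec * fs)      # step size in samples
    starts = range(0, eeg.shape[1] - t + 1, step)
    return np.stack([eeg[:, s:s + t] for s in starts])

# Example: an LE-EEG style recording (32 channels, 128 Hz, 5 minutes of signal)
recording = np.random.randn(32, 128 * 300)
segments = split_into_segments(recording, fs=128)
print(segments.shape)   # (n_segments, 32, 1280): each window W has T * C = 1280 * 32 samples
```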
Secondly, all the original segments in S are filtered into five sub-bands by a fourth-order butterworth band-pass filter: delta (1-4 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (31-50 Hz).
|H(w)|^2 = 1 / (1 + (w / w_c)^(2·N_f))
where N_f is the order of the filter, i.e. N_f = 4; w is the frequency; w_c is the normalized cut-off frequency; f_1 ~ f_2 is the pass-band interval of the band-pass filter; H(S) is the sub-band signal obtained by filtering S with the fourth-order Butterworth band-pass filter; W is the frequency range of the sub-band.
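A minimal sketch of this sub-band filtering using SciPy's Butterworth design follows; the five band edges and the fourth-order filter come from the text, while the choice of `scipy.signal.butter`/`filtfilt` and the zero-phase filtering are assumptions about one reasonable implementation.

```python
# Fourth-order Butterworth band-pass filtering into the five sub-bands
# (the SciPy-based implementation is an assumption; the patent does not prescribe a library).
import numpy as np
from scipy.signal import butter, filtfilt

SUB_BANDS = {"delta": (1, 4), "theta": (4, 7), "alpha": (8, 13),
             "beta": (13, 30), "gamma": (31, 50)}

def band_filter(segment, fs, low, high, order=4):
    """segment: (n_channels, T). Returns the segment filtered to the [low, high] Hz pass band."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, segment, axis=-1)   # zero-phase filtering along time

def split_into_bands(segment, fs):
    """Returns an array of shape (5, n_channels, T): one copy of the segment per sub-band."""
    return np.stack([band_filter(segment, fs, lo, hi) for lo, hi in SUB_BANDS.values()])
```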
Thirdly, the EEG signal H(S) filtered by the band-pass filter is input to a normalization layer, which ensures that the EEG data share the same measurement scale after normalization and yields the single-band signal:
S_f = (H(S) - AVG) / STD
where S_f is the normalized EEG signal; f is one of the five divided sub-bands; STD is the standard deviation; AVG is the average value.
Finally, channel selection is performed on the normalized EEG signal S_f to obtain the electroencephalogram sequence that is input to the neural network. Previous studies have found that suitable channel combinations can improve recognition performance.
The SEED dataset places 62 channel electrodes according to the international 10-20 system, and the self-built LE-EEG dataset places 32 channel electrodes according to the international 10-20 system. Each electrode position consists of letters and a number: 'Fp' denotes prefrontal (frontal-pole) electrodes, 'F' frontal electrodes, 'AF' electrodes between the frontal pole and the frontal region, 'T' temporal electrodes, 'O' occipital electrodes, 'P' parietal electrodes, 'C' central electrodes, 'Z' midline electrodes between the left and right hemispheres, 'FC' fronto-central electrodes, 'FT' fronto-temporal electrodes, 'CP' centro-parietal electrodes, 'PO' parieto-occipital electrodes, and 'TP' temporo-parietal electrodes; the numbers index the electrodes, with odd numbers over the left hemisphere and even numbers over the right hemisphere.
For example, Zheng et al. used a six-channel combination of "FT7", "FT8", "T7", "T8", "TP7" and "TP8" for emotion classification. Furthermore, Zheng et al. designed four different electrode placement patterns based on the peak characteristics of the weight distribution and the asymmetry of emotion processing, finally adopting "FT7", "T7", "TP7", "P7", "C5", "CP5", "FT8", "T8", "TP8", "P8", "C6", "CP6"; compared with full-channel prediction, this 12-channel combination achieved the best result with 86.65% classification accuracy. This demonstrates that fewer channel combinations can achieve better experimental results than full-channel recognition. Combining the above studies, we adopt the following setting:
X_f^C = {S_f^c | c ∈ C}
where X_f^C is the EEG signal at frequency band f for channel combination C; C is the channel-combination scheme. In the SEED dataset, C1 and C2 are taken as C1 = {"FT7", "FT8", "T7", "T8", "TP7", "TP8"} and C2 = {"FT7", "T7", "TP7", "P7", "C5", "CP5", "FT8", "T8", "TP8", "P8", "C6", "CP6"}, respectively. These channels are located in the temporal lobe, which is consistent with the results of studies on the distribution of emotion-related brain areas. In the SEED dataset, the C2 channel combination is finally selected for basic-emotion electroencephalogram emotion recognition. Considering that the numbers of available EEG channels in the LE-EEG and SEED datasets differ (32 channels and 62 channels, respectively), and also for the robustness of the verification algorithm, C3 = {"T7", "P7", "CP5", "T8", "P8", "CP6"} is preset to be as consistent as possible with C2, allowing cross-dataset verification. In the LE-EEG dataset, the C3 channel combination is finally selected for learning-emotion electroencephalogram emotion recognition.
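The channel selection can be expressed as picking the rows of the normalized sub-band signal that correspond to the chosen electrode names, as in the sketch below; the 32-electrode montage ordering is an illustrative assumption and must match the actual acquisition layout.

```python
# Channel-combination selection sketch (the electrode ordering in MONTAGE_32 is an assumption).
import numpy as np

C3 = ["T7", "P7", "CP5", "T8", "P8", "CP6"]   # combination used for the LE-EEG dataset

def select_channels(s_f, montage, combination):
    """s_f: (n_channels, T) normalized sub-band signal; returns X_f^C of shape (len(combination), T)."""
    idx = [montage.index(name) for name in combination]
    return s_f[idx]

# Example with a hypothetical 32-electrode ordering:
MONTAGE_32 = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "FC5", "FC1", "FC2", "FC6",
              "T7", "C3", "Cz", "C4", "T8", "CP5", "CP1", "CP2", "CP6",
              "P7", "P3", "Pz", "P4", "P8", "PO3", "PO4", "O1", "Oz", "O2", "AF3", "AF4"]
s_f = np.random.randn(32, 1280)
x_f_c3 = select_channels(s_f, MONTAGE_32, C3)   # shape (6, 1280)
```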
In module 2, first, the electroencephalogram sequence X_f^C output by module 1 is input into the multi-channel parallel convolutional neural network, and the network extracts channel and time-series features of the different frequency bands to obtain the features F_f^C:
F_f^C = ReLU(h * X_f^C)
F^C = {F_δ^C, F_θ^C, F_α^C, F_β^C, F_γ^C}
where F_f^C is the feature output by the convolutional network in frequency band f under channel combination C; F^C represents the set of 5 sub-band features extracted by the convolutional network under channel combination C.
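A minimal PyTorch sketch of the multi-channel parallel convolutional network follows (two convolutional layers, average pooling and normalization per branch, one branch per sub-band, as described earlier); the layer widths and kernel sizes are illustrative assumptions, not the patented configuration.

```python
# Sketch of the multi-channel parallel CNN (layer widths/kernel sizes are illustrative assumptions).
import torch
import torch.nn as nn

class BandBranch(nn.Module):
    """One branch: extracts channel and time-series features of a single sub-band X_f^C."""
    def __init__(self, n_channels=6, out_channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, out_channels, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(out_channels, out_channels, kernel_size=7, padding=3), nn.ReLU(),
            nn.AvgPool1d(kernel_size=4),
            nn.BatchNorm1d(out_channels))

    def forward(self, x_fc):                 # x_fc: (batch, n_channels, T)
        return self.net(x_fc)                # F_f^C: (batch, out_channels, T // 4)

class ParallelBandCNN(nn.Module):
    """Five parallel branches, one per sub-band; the stacked output is the feature set F^C."""
    def __init__(self, n_bands=5, n_channels=6):
        super().__init__()
        self.branches = nn.ModuleList(BandBranch(n_channels) for _ in range(n_bands))

    def forward(self, x):                    # x: (batch, n_bands, n_channels, T)
        return torch.stack([b(x[:, i]) for i, b in enumerate(self.branches)], dim=1)
```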
Second, features of n frequency bands are selected from F^C and put into the attention network, which fuses the channel and time-series features of the selected n frequency bands and outputs a combined feature F' covering channel, time series and frequency band. The internal structure of the attention module provided in the embodiment of the invention is shown in FIG. 5: the features enter the attention module, feature associations are computed to obtain the attention weight Weight_k, the input features are weighted by this attention weight, and a new attention feature vector F' is output; the invention uses the feature F^C as the attention-network input.
Weight_k = Sigmoid(q^T Mult(Select(F^C)^{×n}))
F' = Mult(Select(F^C)^{×n}) * Weight_k
where Weight_k is the attention weight; Select denotes selecting a combination of n frequency bands from F^C, the n frequency bands being chosen from the 5 sub-bands; finally, 3 sub-bands, namely delta, beta and gamma, are selected for each of the two datasets for electroencephalogram emotion recognition; Mult is the multiplication of the selected band features; q^T computes the similarity; F' is the combined feature output by the attention network.
The attention network computes the feature weights for emotion recognition through an attention mechanism, assigns weights above the threshold to the regions attended to for emotion recognition and weights below the threshold to irrelevant regions, associates feature information across frequency bands, suppresses irrelevant interfering features, and obtains the combined feature F'.
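The band-attention fusion can be sketched as below, following Weight_k = Sigmoid(q^T Mult(Select(F^C))) and F' = Mult(Select(F^C)) * Weight_k; the exact shape of the query q and the axes over which the dot product is taken are assumptions, since the patent only names the operations.

```python
# Band-attention fusion sketch (query shape and reduction axes are assumptions).
import torch
import torch.nn as nn

class BandAttentionFusion(nn.Module):
    def __init__(self, feature_dim=32, band_indices=(0, 3, 4)):
        super().__init__()                     # default indices stand for delta, beta, gamma (an assumption)
        self.band_indices = list(band_indices)
        self.q = nn.Parameter(torch.randn(feature_dim))   # learnable query q

    def forward(self, f_c):                    # f_c: (batch, n_bands, feature_dim, T')
        selected = f_c[:, self.band_indices]              # Select(F^C): keep n chosen bands
        mult = selected.prod(dim=1)                        # Mult: element-wise product over the n bands
        score = torch.einsum("d,bdt->bt", self.q, mult)    # q^T applied over the feature dimension
        weight = torch.sigmoid(score).unsqueeze(1)         # Weight_k in (0, 1), one weight per time step
        return mult * weight                               # F' = Mult(...) * Weight_k
```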
Module 3 is composed of a feature depth-fusion module and a classification module. First, the combined feature F' output by module 2 is used as the input of the feature depth-fusion module, which is formed by two convolutional layers, a pooling layer and a normalization layer; the combined feature F' passes through these layers in turn and the depth features are output. The depth features are then input into the classification module, which is formed by two convolutional layers, a global pooling layer and a fully connected layer; the depth features pass through these layers in turn and the final classification result is output.
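A sketch of module 3 (two convolutional layers with pooling and normalization for depth fusion, then two convolutional layers, global pooling and a fully connected layer for classification) is given below; the layer widths are illustrative assumptions.

```python
# Sketch of the feature depth-fusion and classification module (layer widths are assumptions).
import torch.nn as nn

class DepthFusionClassifier(nn.Module):
    def __init__(self, in_channels=32, n_classes=3):
        super().__init__()
        self.fusion = nn.Sequential(                        # feature depth-fusion module
            nn.Conv1d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AvgPool1d(2),
            nn.BatchNorm1d(64))
        self.classifier = nn.Sequential(                    # classification module
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                         # global pooling
            nn.Flatten(),
            nn.Linear(128, n_classes))                       # fully connected output layer

    def forward(self, fused):                                # fused: (batch, in_channels, T')
        return self.classifier(self.fusion(fused))           # logits over the three learning emotions
```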
Through the above steps, emotion classification of the EEG signal is realized. During training, a cross-entropy loss function is used and the loss value is optimized by stochastic gradient descent; Sigmoid is used as the activation function, the learning rate of the model is set to 0.001 and the weight decay to 0.0001, the learning rate is adjusted dynamically during training, and the optimizer is set to Adam, finally reaching the best result. The experiments use average accuracy (ACC) and standard deviation (STD) as the evaluation indices for emotion recognition; the larger the accuracy value, the better the recognition effect.
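The training configuration described above (cross-entropy loss, Adam, learning rate 0.001, weight decay 0.0001, dynamic learning-rate adjustment) can be sketched as follows; the scheduler scheme and epoch count are assumptions not specified in the text.

```python
# Training-loop sketch following the stated hyper-parameters (scheduler and epoch count are assumptions).
import torch
import torch.nn as nn

def train(model, loader, epochs=50, device="cpu"):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)  # dynamic LR (assumed scheme)
    for _ in range(epochs):
        for x, y in loader:                       # x: band-filtered EEG segments, y: emotion labels
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)         # cross-entropy loss
            loss.backward()
            optimizer.step()
        scheduler.step()
```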
Specifically, to verify the effectiveness of the Attention network, we compared three fused band methods, namely feature addition fusion, feature multiplication fusion, and Attention weight fusion, which are denoted as Add, mult, and Attention in table 1, respectively. The comparison of the accuracy of emotion recognition on the SEED data set by the attention fusion method and other fusion methods is shown in Table 1:
TABLE 1
(Table 1 is provided as an image in the original publication; it reports emotion-recognition accuracy on the SEED dataset for the Add, Mult and Attention fusion methods under different sub-band combinations and the C1 and C2 channel combinations.)
The experiments of the invention find, firstly, that the proposed attention network on the whole performs better when more frequency bands are combined; however, more frequency-band combinations do not guarantee better emotion classification. For example, for the sub-bands (δ, α, β, γ) shown in the last row of Table 1, compared with the sub-band combinations in the other rows of Table 1: (i) the model performance with the Add fusion mode is degraded (see columns 2 and 5 of the last row of Table 1) but remains relatively stable; (ii) the model performance with the Mult or Attention fusion mode (see columns 3 and 6 or columns 4 and 7 of the last row of Table 1) is severely degraded. The reasons may include: during model training, the Mult and Attention fusion modes make the number of training parameters grow exponentially, causing severe overfitting from over-training the model.
Secondly, we can see that the best performance obtained with C2 (see columns 5-7 of Table 1) is always higher than with C1 (see columns 2-4 of Table 1). To illustrate, take the sub-bands (δ, γ) as an example. From row 4 of Table 1 we can see that: (i) for C1, the best performance of 95.63% was achieved with the Attention fusion method; (ii) for C2, the best performance of 95.70% was likewise achieved with the Attention fusion method; that is, C2 achieved a 0.07% improvement in accuracy over C1.
Third, for C2, the best two performances were achieved with the Attention fusion method on the sub-bands (α, β, γ) and (δ, β, γ), namely 96.02% and 96.45% (see the second and third rows of the last column of Table 1). Take the sub-bands (δ, β, γ) as an example: compared with Add and Mult, the Attention fusion method achieves accuracy improvements of 0.67% and 0.30%. This indicates that attention fusion can improve classification performance, because larger attention weights are assigned to the more important features.
Specifically, the comparison of the accuracy of emotion recognition on the SEED data set by the method of the present invention and the methods proposed by other researchers is shown in table 2:
TABLE 2
(Table 2 is provided as an image in the original publication; it compares the emotion-recognition accuracy of the proposed method with methods proposed by other researchers on the SEED dataset.)
Based on the above experiments, the invention adopts the delta, beta and gamma frequency bands and attention fusion mode to complete comparison. On the SEED data set, the model here was compared to the baseline model. The results are shown in Table 2. The performance of our model was improved by 2.21% compared to the best baseline model (see "RGNN" row in table 2).
Specifically, the comparison of the accuracy of emotion recognition performed on the self-created LE-EEG dataset by the method of the present invention with other methods is shown in table 3:
On the LE-EEG dataset, considering that the numbers of available EEG channels in the LE-EEG and SEED datasets differ (the SEED and LE-EEG datasets have 62 channels and 32 channels, respectively), and also for the robustness of the verification algorithm, C3 = {"T7", "P7", "CP5", "T8", "P8", "CP6"} is preset to be as consistent as possible with C2, allowing cross-dataset verification. In the LE-EEG dataset, the C3 channel combination is finally selected for learning-emotion electroencephalogram emotion recognition.
With reference to the baseline models on the SEED dataset, two baseline models that can be reproduced with shared code, 4D_CRNN and SOGNN, were selected for comparison in the validation on the LE-EEG dataset. Table 3 lists the comparison with the baseline models. Our model's performance improved by 27.32% and 20.42% compared with the two baseline models (see column 3 of the rows "4D_CRNN", "SOGNN" and "invention (C3)" in Table 3), confirming that the network is robust across different datasets. FIG. 6 shows the validation-set accuracy curves during training of the three different models on the LE-EEG dataset; the invention (the ECN-AF model) yields the best performance.
TABLE 3
(Table 3 is provided as an image in the original publication; it compares the emotion-recognition accuracy of the proposed method with other methods on the self-built LE-EEG dataset.)
As can be seen from tables 2 and 3, the electroencephalogram emotion recognition method of multi-channel frequency band feature attention fusion (ECN-AF) constructed by the invention has better accuracy in emotion recognition on the SEED data set and the self-built LE-EEG data set than the current mainstream method.
Fig. 7 is an architecture diagram of an electroencephalogram emotion recognition system provided in an embodiment of the present invention, as shown in fig. 7, including:
the electroencephalogram signal determining unit 710 is used for determining electroencephalogram signals of M channels of an electroencephalogram emotion entity to be identified; each channel corresponds to one electroencephalogram signal measurement position;
the electroencephalogram emotion recognition unit 720 is used for inputting the electroencephalogram signals of the M channels into a pre-trained electroencephalogram emotion recognition network model so as to recognize the corresponding electroencephalogram emotions; the electroencephalogram emotion recognition network model comprises: a fourth-order Butterworth band-pass filter, a multi-channel parallel convolutional neural network, an attention network, a feature extraction network and a classification network; the fourth-order Butterworth band-pass filter is used for filtering the electroencephalogram signals of the M channels into five sub-bands, selecting from the electroencephalogram signals of the M channels the electroencephalogram signals of the C channels that are highly correlated with electroencephalogram emotion, and taking the five sub-bands corresponding to the electroencephalogram signals of the C channels as the preprocessed electroencephalogram signals; the multi-channel parallel convolutional neural network is used for extracting combined sequence features, corresponding to channel features and time features, of the preprocessed electroencephalogram signals in different frequency bands; the attention network is used for fusing the combined sequence features of different frequency bands to obtain fused features; the feature extraction network is used for extracting the depth features of the fused features; the classification network is used for classifying the depth features so as to identify the corresponding electroencephalogram emotion; the five sub-bands correspond respectively to delta waves, theta waves, alpha waves, beta waves and gamma waves; M and C are positive integers, and C is less than or equal to M.
The model training unit 730 is used for predetermining a video material library containing three types of video materials that trigger different learning emotions, the three learning emotions being engaged learning emotion, neutral learning emotion and bored learning emotion; collecting electroencephalogram signals while a learner watches video materials that trigger the different types of learning emotion, where the electroencephalogram emotion corresponding to watching video material that triggers engaged learning emotion is the engaged learning emotion, the electroencephalogram emotion corresponding to watching video material that triggers neutral learning emotion is the neutral learning emotion, and the electroencephalogram emotion corresponding to watching video material that triggers bored learning emotion is the bored learning emotion; and training the electroencephalogram emotion recognition network model with the collected electroencephalogram signals to obtain the trained electroencephalogram emotion recognition network model.
It should be noted that, for detailed function implementation of each unit in fig. 7, reference may be made to the description in the foregoing method embodiment, and details are not described herein.
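For concreteness, the following PyTorch sketch assembles the components named above (per-band parallel convolution, frequency-band attention fusion, feature extraction and classification) into a single module that could be trained with the loop sketched earlier. The layer sizes, kernel shapes, the learned query vector q and the choice of fusing all five sub-bands are assumptions made for illustration, not details fixed by this embodiment.

```python
# Illustrative network sketch; dimensions, kernel sizes and the attention query are assumed.
import torch
import torch.nn as nn

class ECNAF(nn.Module):
    def __init__(self, c_channels=6, n_bands=5, n_classes=3, d=32):
        super().__init__()
        # Multi-channel parallel CNN: one branch per sub-band, F_f^C = ReLU(h_g(X_f^C))
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, d, kernel_size=(c_channels, 9), padding=(0, 4)),
                          nn.ReLU())
            for _ in range(n_bands))
        self.q = nn.Parameter(torch.randn(d))        # query vector for the self-attention weight
        self.extract = nn.Sequential(                # feature extraction network
            nn.Conv2d(d, d, kernel_size=(1, 9), padding=(0, 4)), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classify = nn.Linear(d, n_classes)      # classification network

    def forward(self, x):                            # x: (B, n_bands, C, T) preprocessed EEG
        feats = [branch(x[:, f:f + 1]) for f, branch in enumerate(self.branches)]
        fused = torch.stack(feats, dim=0).prod(dim=0)        # Mult(...) over the fused bands
        score = (fused.mean(dim=(2, 3)) * self.q).sum(dim=1, keepdim=True)
        weight = torch.sigmoid(score)                        # Weight_k mapped into (0, 1)
        fused = fused * weight.view(-1, 1, 1, 1)             # F' = Mult(...) * Weight_k
        depth = self.extract(fused).flatten(1)               # depth features
        return self.classify(depth)                          # three-class emotion logits

# Usage: logits = ECNAF()(torch.randn(8, 5, 6, 400))
```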
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. The electroencephalogram emotion recognition method is characterized by comprising the following steps:
determining electroencephalogram signals of M channels of an electroencephalogram emotion entity to be identified; each channel corresponds to one electroencephalogram signal measurement position;
inputting the electroencephalogram signals of the M channels into a pre-trained electroencephalogram emotion recognition network model to recognize the corresponding electroencephalogram emotion; wherein the electroencephalogram emotion recognition network model comprises: a fourth-order Butterworth band-pass filter, a multichannel parallel convolutional neural network, an attention network, a feature extraction network and a classification network; the fourth-order Butterworth band-pass filter is used for filtering the electroencephalogram signals of the M channels into five sub-bands, selecting from the M channels the electroencephalogram signals of the C channels with a high degree of correlation with electroencephalogram emotion, and taking the five sub-bands corresponding to the electroencephalogram signals of the C channels as preprocessed electroencephalogram signals; the multichannel parallel convolutional neural network is used for extracting combined sequence features corresponding to channel features and time features of the preprocessed electroencephalogram signals under different frequency bands; the attention network is used for fusing the combined sequence features of different frequency bands to obtain fused features; the feature extraction network is used for extracting depth features of the fused features; the classification network is used for classifying the depth features so as to recognize the corresponding electroencephalogram emotion; the five sub-bands respectively correspond to delta waves, theta waves, alpha waves, beta waves and gamma waves; and M and C are positive integers, with C less than or equal to M.
2. The electroencephalogram emotion recognition method of claim 1, wherein the C channels are located over the temporal lobes of the person being measured.
3. The electroencephalogram emotion recognition method according to claim 1, wherein the multichannel parallel convolutional neural network is used for extracting combined sequence features of the preprocessed electroencephalogram signal under different frequency bands, and specifically comprises the following steps:
inputting the electroencephalogram sequence X_f^C into the multichannel parallel convolutional neural network to extract the combined sequence features containing the channel features and the time features of different frequency bands, so as to obtain the features F_f^C:

F_f^C = ReLU(h_g(X_f^C))

F^C = {F_1^C, F_2^C, …, F_5^C}

wherein F_f^C is the combined sequence feature output by the multichannel parallel convolutional neural network for the f-th frequency band under the combination of the C channels; F^C represents the set of 5 sub-band features extracted by the multichannel parallel convolutional neural network under the combination of the C channels; ReLU represents a nonlinear excitation function used to increase the nonlinear relation between network layers; h_g denotes the convolution performed on the input electroencephalogram sequence X_f^C; and X_f^C represents the electroencephalogram signal of the f-th frequency band under the combination of the C channels.
4. The electroencephalogram emotion recognition method according to claim 3, wherein the attention network is configured to fuse the combined sequence features of different frequency bands to obtain a fused feature, and specifically includes:
selecting n frequency bands from F^C, fusing, by the attention network, the combined sequence features of the selected n frequency bands, and outputting a feature F' that fuses channel, time-sequence and frequency-band information:

Weight_k = Sigmoid(q^T · Mult(Select(F^C)^{×n}))

F' = Mult(Select(F^C)^{×n}) * Weight_k

wherein Select represents selecting a combination of n frequency bands from F^C; Mult denotes the element-wise multiplication of the selected frequency-band features; Sigmoid is an activation function that maps the output to the interval (0, 1) to serve as the threshold of the output weight; q^T is used to calculate the similarity; Weight_k is the self-attention weight of the selected n frequency bands; F' is the fused feature output by the attention network; and n is a positive integer not greater than 5.
5. The electroencephalogram emotion recognition method according to any one of claims 1 to 4, further comprising the steps of:
predetermining a video material library containing three types of video materials that trigger different learning emotions; the three learning emotions being respectively an invested learning emotion, a neutral learning emotion and a boring learning emotion;
determining an electroencephalogram signal set to be trained corresponding to the video material library; the electroencephalogram signal set to be trained comprising: an electroencephalogram signal corresponding to the invested learning emotion, an electroencephalogram signal corresponding to the neutral learning emotion, and an electroencephalogram signal corresponding to the boring learning emotion; wherein the electroencephalogram signal corresponding to the invested learning emotion is acquired while a learner watches a video material triggering the invested learning emotion, the electroencephalogram signal corresponding to the neutral learning emotion is acquired while the learner watches a video material triggering the neutral learning emotion, and the electroencephalogram signal corresponding to the boring learning emotion is acquired while the learner watches a video material triggering the boring learning emotion;
and training the electroencephalogram emotion recognition network model by adopting the electroencephalogram signal set to be trained to obtain the trained electroencephalogram emotion recognition network model.
6. An electroencephalogram emotion recognition system, comprising:
the electroencephalogram signal determining unit is used for determining electroencephalogram signals of M channels of an electroencephalogram emotion entity to be identified; each channel corresponds to one electroencephalogram signal measurement position;
the electroencephalogram emotion recognition unit is used for inputting the electroencephalogram signals of the M channels into a pre-trained electroencephalogram emotion recognition network model so as to recognize the corresponding electroencephalogram emotion; the electroencephalogram emotion recognition network model comprises: a fourth-order Butterworth band-pass filter, a multichannel parallel convolutional neural network, an attention network, a feature extraction network and a classification network; the fourth-order Butterworth band-pass filter is used for filtering the electroencephalogram signals of the M channels into five sub-bands, selecting from the M channels the electroencephalogram signals of the C channels with a high degree of correlation with electroencephalogram emotion, and taking the five sub-bands corresponding to the electroencephalogram signals of the C channels as the preprocessed electroencephalogram signals; the multichannel parallel convolutional neural network is used for extracting combined sequence features corresponding to channel features and time features of the preprocessed electroencephalogram signals under different frequency bands; the attention network is used for fusing the combined sequence features of different frequency bands to obtain fused features; the feature extraction network is used for extracting depth features of the fused features; the classification network is used for classifying the depth features so as to recognize the corresponding electroencephalogram emotion; the five sub-bands respectively correspond to delta waves, theta waves, alpha waves, beta waves and gamma waves; and M and C are positive integers, with C less than or equal to M.
7. The electroencephalogram emotion recognition system of claim 6, wherein the C channels are located over the temporal lobes of the person being measured.
8. The electroencephalogram emotion recognition system of claim 6, wherein the multichannel parallel convolutional neural network is used for extracting combined sequence features of the preprocessed electroencephalogram signal under different frequency bands, and specifically comprises:
inputting the electroencephalogram sequence X_f^C into the multichannel parallel convolutional neural network to extract the combined sequence features containing the channel features and the time features of different frequency bands, so as to obtain the features F_f^C:

F_f^C = ReLU(h_g(X_f^C))

F^C = {F_1^C, F_2^C, …, F_5^C}

wherein F_f^C is the combined sequence feature output by the multichannel parallel convolutional neural network for the f-th frequency band under the combination of the C channels; F^C represents the set of 5 sub-band features extracted by the multichannel parallel convolutional neural network under the combination of the C channels; ReLU represents a nonlinear excitation function used to increase the nonlinear relation between network layers; h_g denotes the convolution performed on the input electroencephalogram sequence X_f^C; and X_f^C represents the electroencephalogram signal of the f-th frequency band under the combination of the C channels.
9. The electroencephalogram emotion recognition system of claim 8, wherein the attention network is configured to fuse the combined sequence features of different frequency bands to obtain fused features, and specifically:
selecting n frequency bands from F^C, fusing, by the attention network, the combined sequence features of the selected n frequency bands, and outputting a feature F' that fuses channel, time-sequence and frequency-band information:

Weight_k = Sigmoid(q^T · Mult(Select(F^C)^{×n}))

F' = Mult(Select(F^C)^{×n}) * Weight_k

wherein Select represents selecting a combination of n frequency bands from F^C; Mult denotes the element-wise multiplication of the selected frequency-band features; Sigmoid is an activation function that maps the output to the interval (0, 1) to serve as the threshold of the output weight; q^T is used to calculate the similarity; Weight_k is the self-attention weight of the selected n frequency bands; F' is the fused feature output by the attention network; and n is a positive integer not greater than 5.
10. The electroencephalogram emotion recognition system according to any one of claims 6 to 9, further comprising:
the model training unit is used for predetermining a video material library containing three types of video materials that trigger different learning emotions, the three learning emotions being respectively an invested learning emotion, a neutral learning emotion and a boring learning emotion; determining an electroencephalogram signal set to be trained corresponding to the video material library, the electroencephalogram signal set to be trained comprising: an electroencephalogram signal corresponding to the invested learning emotion, an electroencephalogram signal corresponding to the neutral learning emotion, and an electroencephalogram signal corresponding to the boring learning emotion; wherein the electroencephalogram signal corresponding to the invested learning emotion is acquired while a learner watches a video material triggering the invested learning emotion, the electroencephalogram signal corresponding to the neutral learning emotion is acquired while the learner watches a video material triggering the neutral learning emotion, and the electroencephalogram signal corresponding to the boring learning emotion is acquired while the learner watches a video material triggering the boring learning emotion; and training the electroencephalogram emotion recognition network model by adopting the electroencephalogram signal set to be trained to obtain the trained electroencephalogram emotion recognition network model.
CN202210581855.2A 2022-05-26 2022-05-26 Electroencephalogram emotion recognition method and system Pending CN115659207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210581855.2A CN115659207A (en) 2022-05-26 2022-05-26 Electroencephalogram emotion recognition method and system

Publications (1)

Publication Number Publication Date
CN115659207A true CN115659207A (en) 2023-01-31

Family

ID=85024049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210581855.2A Pending CN115659207A (en) 2022-05-26 2022-05-26 Electroencephalogram emotion recognition method and system

Country Status (1)

Country Link
CN (1) CN115659207A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304642A (en) * 2023-05-18 2023-06-23 中国第一汽车股份有限公司 Emotion recognition early warning and model training method, device, equipment and storage medium
CN116304642B (en) * 2023-05-18 2023-08-18 中国第一汽车股份有限公司 Emotion recognition early warning and model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106886792B (en) Electroencephalogram emotion recognition method for constructing multi-classifier fusion model based on layering mechanism
George et al. Recognition of emotional states using EEG signals based on time-frequency analysis and SVM classifier.
CN110070105B (en) Electroencephalogram emotion recognition method and system based on meta-learning example rapid screening
CN112932502B (en) Electroencephalogram emotion recognition method combining mutual information channel selection and hybrid neural network
CN110916631A (en) Student classroom learning state evaluation system based on wearable physiological signal monitoring
CN110353673B (en) Electroencephalogram channel selection method based on standard mutual information
CN112381008B (en) Electroencephalogram emotion recognition method based on parallel sequence channel mapping network
CN114224342B (en) Multichannel electroencephalogram signal emotion recognition method based on space-time fusion feature network
CN113598774A (en) Active emotion multi-label classification method and device based on multi-channel electroencephalogram data
CN114533086B (en) Motor imagery brain electrolysis code method based on airspace characteristic time-frequency transformation
CN109871831B (en) Emotion recognition method and system
CN112450947B (en) Dynamic brain network analysis method for emotional arousal degree
CN111797747A (en) Potential emotion recognition method based on EEG, BVP and micro-expression
Vieira et al. Understanding the design neurocognition of mechanical engineers when designing and problem-solving
Hasan et al. Fine-grained emotion recognition from eeg signal using fast fourier transformation and cnn
CN114366124A (en) Epilepsia electroencephalogram identification method based on semi-supervised deep convolution channel attention single classification network
CN113974627B (en) Emotion recognition method based on brain-computer generated confrontation
CN111671445A (en) Consciousness disturbance degree analysis method
CN115659207A (en) Electroencephalogram emotion recognition method and system
CN114662547A (en) MSCRNN emotion recognition method and device based on electroencephalogram signals
Immanuel et al. Recognition of emotion with deep learning using EEG signals-the next big wave for stress management in this covid-19 outbreak
CN113180659A (en) Electroencephalogram emotion recognition system based on three-dimensional features and cavity full convolution network
CN115690528A (en) Electroencephalogram signal aesthetic evaluation processing method, device, medium and terminal across main body scene
Wang et al. Residual learning attention cnn for motion intention recognition based on eeg data
Huynh et al. An investigation of ensemble methods to classify electroencephalogram signaling modes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination