CN114767130A - Multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging - Google Patents


Info

Publication number
CN114767130A
CN114767130A (application CN202210440906.XA)
Authority
CN
China
Prior art keywords
electroencephalogram
signal
model
feature fusion
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210440906.XA
Other languages
Chinese (zh)
Inventor
徐华兴
胡飞
常加兴
毛晓波
李立国
郑鹏远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University
Priority to CN202210440906.XA
Publication of CN114767130A
Legal status: Pending

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/372 Analysis of electroencephalograms
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching


Abstract

The invention discloses a multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging. The method combines multi-scale and time-series imaging algorithms and realizes emotion recognition by converting electroencephalogram signals into images. It preserves the spatial information of the electroencephalogram signals; the multi-scale algorithm reduces the amount of computation and finds latent electroencephalogram signal patterns; high-dimensional information is simultaneously encoded into the images so that they contain rich information; the advantages of machine vision are fully exploited by extracting the high-dimensional features of the images with a 2D CNN model; and better emotion classification results are obtained through different multi-modal feature fusion methods.

Description

Multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging
Technical Field
The invention relates to the field of physiological signal processing, in particular to a multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging.
Background
Emotion is a complex psychological and physiological state that affects people's cognition, behavior, and interpersonal interactions. According to cognitive and neurophysiological theories, emotions play an important role in human brain activity and can be detected in electroencephalogram (EEG) signals. Thus, effective emotion recognition can be performed using EEG signals.
Traditional EEG-based emotion recognition methods mainly use 1D CNN (one-dimensional convolutional neural network) techniques to extract signal features from the electroencephalogram and train a classifier to realize emotion recognition. These methods attend only to time-domain or frequency-domain information, so the spatial information of the electroencephalogram is severely lost and classification performance is limited; a great deal of effort is also needed to search the raw EEG signal for emotion-related signal features and to construct the corresponding correlations, the feature computation is time-consuming, and the generalization ability is very limited. In recent years, deep learning has developed vigorously in many fields, providing more possibilities for constructing emotion classification models.
Disclosure of Invention
The invention aims to provide a multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging, in order to solve the problems of traditional emotion recognition methods that emotion-related features must be constructed by hand, feature computation is time-consuming, model generalization ability is poor, electroencephalogram spatial information is severely lost, and classification performance is limited.
In order to realize this purpose, the invention adopts the following technical scheme. Owing to the success of convolutional neural networks in image classification, time-series classification using time-series encoded imaging has also shown high performance. The invention therefore combines multi-scale and time-series imaging algorithms and realizes emotion recognition by converting electroencephalogram signals into images: it preserves the spatial information of the electroencephalogram signals; the multi-scale algorithm reduces the amount of computation and finds latent electroencephalogram signal patterns; high-dimensional information is simultaneously encoded into the images so that they contain rich information; the advantages of machine vision are fully exploited by extracting the high-dimensional features of the images with a 2D CNN (two-dimensional convolutional neural network) model; and better emotion classification results are obtained through different multi-modal feature fusion methods.
The invention discloses a multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging, which comprises the following steps:
s1, performing baseline removal on the original electroencephalogram signal by using a python code to obtain a first electroencephalogram signal;
s2, performing multi-scale processing on the first electroencephalogram signal to obtain a second electroencephalogram signal;
s3, converting the second brain electrical signal into an image by using a time series imaging algorithm to obtain N image data sets;
s4, performing data enhancement on the N image data sets to construct N samples and label sets;
s5, obtaining N first feature vectors by the N samples and the label sets through a ResNet model and a DNN-01 model respectively;
s6, combining the N first feature vectors to form 3 second feature vectors;
s7, forming a multi-modal feature fusion electroencephalogram emotion classification model by the 3 second feature vectors through a DNN-02 model respectively;
and S8, randomly dividing the N samples and label sets from step S4 into M parts using a ten-fold cross-validation method, taking M-1 parts as training data and the remaining part as test data, and training the multi-modal feature fusion electroencephalogram emotion classification model to obtain an electroencephalogram emotion classification recognition model.
Further, the baseline removal includes the following: the baseline signal and the experimental signal in the original electroencephalogram signal are divided into K segments and I segments of length L respectively, and the average value of all baseline signal segments is subtracted from each experimental signal segment.
Further, the mathematical definition of the multi-scale processing is:

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau + 1}^{j\tau} x_i, \qquad 1 \le j \le \left\lfloor \frac{L}{\tau} \right\rfloor$$

where $\tau$ is the set time scale, $L$ is the length of the original electroencephalogram signal, $x_i$ is the signal value of the original electroencephalogram signal at time $i$, $y_j^{(\tau)}$ is the second electroencephalogram signal, and $j$ is the index of the second electroencephalogram signal.
Further, the data enhancement is Mixup.
Further, the 3 second feature vectors are: a second feature vector formed by adding the N first feature vectors; a second feature vector composed of the maximum values at the same positions of the N first feature vectors; and a second feature vector formed by weighted combination of the N first feature vectors using a fully connected layer.
Further, the weights in the weighted combination are the parameters of the N first feature vectors after fully-connected-layer training.
Further, the loss calculation formula of the multi-modal feature fusion electroencephalogram emotion classification model is:

$$L = \lambda_1 \sum_i L_i + \lambda_2 L_{\mathrm{com}}$$

where $\lambda_1$ and $\lambda_2$ are preset parameters; $L_i$ is the loss of the N samples and label sets passed through the ResNet model and the DNN-01 model, $i$ being an integer in $\{1, \ldots, N\}$; and $L_{\mathrm{com}}$ is the loss of the 3 second feature vectors passed respectively through the DNN-02 model.
The invention has the advantages that it combines multi-scale and time-series imaging algorithms and realizes emotion recognition by converting electroencephalogram signals into images. Compared with traditional EEG-based emotion recognition methods, it not only preserves the spatial information of the electroencephalogram signals but also reduces the amount of computation through the multi-scale algorithm and finds latent electroencephalogram signal patterns; high-dimensional information is encoded into the images so that they contain rich information; the advantages of machine vision are fully exploited by extracting the high-dimensional features of the images with the 2D CNN model; and better emotion classification results are obtained through different multi-modal feature fusion methods.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a graph of 32 electrode positions in an embodiment of the method of the invention.
FIG. 3 is a schematic diagram of spatial information obtained after electroencephalogram signals are converted into images in the embodiment of the method.
Fig. 4 is a schematic diagram of a ResNet network structure in the embodiment of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art without creative effort, based on the embodiments of the present invention, fall within the protection scope of the present invention.
As shown in FIG. 1, the multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging comprises the following steps:
s1, performing baseline removal on the original electroencephalogram signal by using a python code to obtain a first electroencephalogram signal;
this embodiment downloads an electroencephalogram signal data set from the disclosed DEAP as raw data. In the DEAP database, 32 participants participated in the experiment. Each participant was asked to watch 40 one minute music videos and electroencephalographic signals were recorded from 32 electrodes according to the international 10-20 system, the electrode positions being shown in fig. 2. Participants scored the allotment, arousal, disposition and preference on a continuous scale between 1 and 9 after viewing each video. The data recorded by each participant included 40 pieces of electroencephalographic (abbreviated in english: EEG) data and corresponding labels. Each segment of brain electrical data contains 60 seconds of the experimental signal and a 3 second baseline signal in a relaxed state.
Because the human electroencephalogram signal is non-stationary, it is easily influenced by tiny changes in the surrounding environment; moreover, the electroencephalogram signal evoked by an emotional stimulus is also influenced, to a certain extent, by the emotional state before the stimulus is received. Removing the baseline can therefore achieve better classification results. In the invention, the baseline signal and the experimental signal are divided into K segments and I segments of length L respectively, and python code is then used to remove the baseline signal (i.e., the electroencephalogram signal in the relaxed state) from the electroencephalogram signal by subtracting the average value of all baseline signal segments from each experimental signal segment. The mathematical expression is:

$$\bar{B} = \frac{1}{K} \sum_{k=1}^{K} B_k$$

$$\hat{E}_i = E_i - \bar{B}, \qquad i = 1, 2, \ldots, I$$

where $\bar{B}$ is the average value of all baseline signal segments, $B_k$ is the $k$-th baseline signal segment, $E_i$ is the $i$-th experimental signal segment, and $\hat{E}_i$ is the $i$-th experimental signal segment after baseline removal.
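For concreteness, this baseline-removal step can be sketched in a few lines of NumPy. The sketch assumes DEAP-style trials (a 3 s baseline followed by 60 s of signal at 128 Hz) and a segment length of L = 128 samples, i.e. one second; these values are our assumptions, not prescribed by the text above.

```python
import numpy as np

def remove_baseline(raw, fs=128, baseline_sec=3):
    """Step S1 sketch: subtract the mean of all baseline segments from
    every experimental segment. raw: (channels, samples), baseline first.
    Segment length L = fs (one second) is an assumption."""
    L = fs
    baseline = raw[:, :baseline_sec * fs]              # K segments of length L
    trial = raw[:, baseline_sec * fs:]                 # I segments of length L
    B = baseline.reshape(raw.shape[0], -1, L)          # (C, K, L)
    E = trial.reshape(raw.shape[0], -1, L)             # (C, I, L)
    B_mean = B.mean(axis=1, keepdims=True)             # average baseline segment
    return (E - B_mean).reshape(raw.shape[0], -1)      # baseline-removed signal

# Example: one DEAP-style trial, 32 channels, 3 s baseline + 60 s signal at 128 Hz
raw = np.random.randn(32, (3 + 60) * 128)
print(remove_baseline(raw).shape)                      # (32, 7680)
```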
s2, performing multi-scale processing on the first electroencephalogram signal to obtain a second electroencephalogram signal;
because the potential electroencephalogram mode of the electroencephalogram signal is unknown and the relevant time scale is also unknown, the electroencephalogram signal can be processed in a multi-scale mode, the data size can be reduced, different scale modes can be learned by a machine, and the classification performance is improved. The mathematical definition of the multiscale process is:
Figure RE-870281DEST_PATH_IMAGE015
wherein the content of the first and second substances,
Figure RE-158043DEST_PATH_IMAGE016
in order to set the time scale for the device,
Figure RE-636036DEST_PATH_IMAGE017
is the length of the original brain electrical signal,
Figure RE-478090DEST_PATH_IMAGE018
is the signal value of the original brain electrical signal at the ith moment,
Figure RE-201195DEST_PATH_IMAGE019
is the second brain electrical signal, j is the index of the second brain electrical signal.
When $\tau = 1$, $y_j^{(1)} = x_j$; that is, the coarse-grained series is simply the original electroencephalogram signal.

When $\tau = 2$, $y^{(2)}$ is the coarse-grained time series formed by averaging the original electroencephalogram signal values at every two consecutive time points. That is, when $j = 1$, $y_1^{(2)} = (x_1 + x_2)/2$; when $j = 2$, $y_2^{(2)} = (x_3 + x_4)/2$; when $j = 3$, $y_3^{(2)} = (x_5 + x_6)/2$.

When $\tau = 3$, $y^{(3)}$ is the coarse-grained time series formed by averaging the original electroencephalogram signal values at every three consecutive time points. That is, when $j = 1$, $y_1^{(3)} = (x_1 + x_2 + x_3)/3$; when $j = 2$, $y_2^{(3)} = (x_4 + x_5 + x_6)/3$; when $j = 3$, $y_3^{(3)} = (x_7 + x_8 + x_9)/3$.

The coarse-grained time series are the second electroencephalogram signals.
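A minimal NumPy sketch of this coarse-graining (the same construction used in multiscale entropy analysis) follows; the function name is ours.

```python
import numpy as np

def coarse_grain(x, tau):
    """Step S2 multi-scale processing: y_j = mean of x over the j-th
    non-overlapping window of length tau, j = 1..floor(L/tau)."""
    L = len(x)
    n = L // tau                          # floor(L / tau) coarse-grained points
    return x[:n * tau].reshape(n, tau).mean(axis=1)

x = np.arange(1.0, 10.0)                  # x_1..x_9 = 1..9
print(coarse_grain(x, 1))                 # tau=1: the original signal
print(coarse_grain(x, 2))                 # [1.5 3.5 5.5 7.5], e.g. y_1=(x_1+x_2)/2
print(coarse_grain(x, 3))                 # [2. 5. 8.],        e.g. y_1=(x_1+x_2+x_3)/3
```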
S3, converting the second brain electrical signal into an image by using a time series imaging algorithm to obtain N image data sets;
the step of converting the second brain electrical signal into the image can utilize information in the original brain electrical signal and encode high-dimensional information into the image, so that the image contains rich information, the advantages of the existing machine vision can be fully utilized, and a better emotion classification result can be obtained. The invention adopts a time series imaging algorithm (MDF for short) to convert the second brain electrical signal.
First, from a time series after a certain coarse graining
Figure DEST_PATH_IMAGE032
Figure 446946DEST_PATH_IMAGE033
Take n values as a basic time series unit, and record as
Figure DEST_PATH_IMAGE034
An integer of (d); and setting the interval d of the values and the initial index s of the values.
When d =1, it indicates a time series from a certain coarse grain
Figure 126320DEST_PATH_IMAGE035
Taking n values continuously as a basic time sequence unit, then
Figure DEST_PATH_IMAGE036
When the n is =2, the number of the n is more than 2,
Figure 543658DEST_PATH_IMAGE037
when n =3, the number of the bits is increased,
Figure DEST_PATH_IMAGE038
when in use
Figure 825734DEST_PATH_IMAGE039
Time represents a time series from a certain coarse grain
Figure DEST_PATH_IMAGE040
Taking n values as a basic time sequence unit, then
Figure 301144DEST_PATH_IMAGE041
When n =2, the number of the bits is increased,
Figure DEST_PATH_IMAGE042
when the n is =3, the number of the n is more than 3,
Figure 276053DEST_PATH_IMAGE043
namely, it is
Figure DEST_PATH_IMAGE044
Then, the difference between the basic units is calculated according to different intervals d to obtain a new time sequence which is recorded as
Figure 915107DEST_PATH_IMAGE045
The concrete steps are shown as the following formula:
Figure DEST_PATH_IMAGE046
wherein the content of the first and second substances,
Figure 735296DEST_PATH_IMAGE047
as a result of this, it is possible to,
Figure DEST_PATH_IMAGE048
the lengths are different, and a new sequence needs to be constructed
Figure 510616DEST_PATH_IMAGE049
The formula is expressed as:
Figure DEST_PATH_IMAGE050
wherein the content of the first and second substances,
Figure 125268DEST_PATH_IMAGE051
then, an MDF matrix is constructed, the formula being:
Figure DEST_PATH_IMAGE052
when n is determined, the corresponding
Figure 251618DEST_PATH_IMAGE053
Namely have
Figure DEST_PATH_IMAGE054
Is composed of elements capable of generating
Figure 609918DEST_PATH_IMAGE055
A channel data, wherein
Figure DEST_PATH_IMAGE056
The matrix of individual channels can be defined as:
Figure 770904DEST_PATH_IMAGE057
wherein the content of the first and second substances,
Figure DEST_PATH_IMAGE058
to fill in
Figure 556457DEST_PATH_IMAGE059
The element with a value of 0 in the matrix, each channel of the MDF image is defined as:
Figure DEST_PATH_IMAGE060
wherein
Figure 435682DEST_PATH_IMAGE061
Representing a Hadamard product (Hadamard product),
Figure DEST_PATH_IMAGE062
is that
Figure 128832DEST_PATH_IMAGE063
Matrix rotation
Figure DEST_PATH_IMAGE064
The matrix of the latter is then formed,
Figure 239264DEST_PATH_IMAGE065
to prevent from
Figure DEST_PATH_IMAGE066
And
Figure 195718DEST_PATH_IMAGE067
when the two are added, they are overlapped,
Figure DEST_PATH_IMAGE068
meanwhile, after each channel data of the second electroencephalogram signal is converted into an image by using the time series imaging algorithm, the images are spliced into a large image according to the physical positions of the corresponding channels, and the spatial information of the electroencephalogram signal is reserved as much as possible, as shown in fig. 3. The letters in the figure indicate 32 electrode positions for electrode placement according to the international 10-20 system.
S4, performing data enhancement on the N image data sets to construct N samples and label sets;
Mixup is a simple and efficient data enhancement method that constructs new training samples and labels by linear interpolation. Using Mixup can remarkably enhance the generalization ability of the network architecture at very little computational cost. The specific process of this step is expressed by the formulas:

$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j$$

$$\tilde{y} = \lambda y_i + (1 - \lambda) y_j$$

where $x_i$ and $x_j$ are sample data in the image dataset; $y_i$ and $y_j$ are the labels corresponding to $x_i$ and $x_j$; and $\lambda$ is a parameter obeying the $\mathrm{Beta}(\alpha, \alpha)$ distribution, $\lambda \in [0, 1]$, where $\alpha \in (0, \infty)$.
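A minimal NumPy sketch of Mixup under this formulation follows; alpha = 0.2 is a commonly used value, not one stated in the text.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Step S4 sketch: lambda ~ Beta(alpha, alpha), then linear
    interpolation of both the samples and their one-hot labels."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x1, x2 = np.random.rand(3, 64, 64), np.random.rand(3, 64, 64)   # two images
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])             # one-hot labels
x_mix, y_mix = mixup(x1, y1, x2, y2)
```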
S5, passing the N samples and label sets through a ResNet model and a DNN-01 model respectively to obtain N first feature vectors;
The classical ResNet model is used as the feature extraction network. Compared with a traditional CNN (convolutional neural network), it has two differences: (1) the residual structure, which makes it possible to build very deep network structures, and (2) Batch Normalization layers. These solve two problems of traditional convolutional networks: (1) vanishing or exploding gradients, and (2) the degradation problem.
The ResNet18 network is mainly composed of an input layer, convolutional layers, Batch Normalization layers, activation functions, pooling layers, residual structures, a fully connected layer, and a softmax layer; the specific structure is shown in fig. 4.
In the embodiment of the application, when the MDF algorithm is used to convert the second electroencephalogram signal, $n$ is set to 2, 3 and 4 respectively, yielding three different image data sets. Considering that the images converted under different settings contain different information, in order to make full use of all the image data, the invention feeds the three image data sets into a ResNet18 model and a DNN-01 model respectively, extracts the high-level features of the different images, and obtains 3 first feature vectors, denoted $F_1$, $F_2$ and $F_3$.
The network structure of the DNN-01 model is shown in Table 1 below, where FL denotes a fully connected layer, RELU denotes the rectified linear unit activation function, Dr denotes dropout (random deactivation), and the numbers in the Output column denote the dimension of the features output by the layer.
TABLE 1 DNN-01 network structure
Although all three image data sets pass through ResNet18 and DNN-01 models of the same structure, the parameters after training differ because the image data differ.
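A PyTorch sketch of one such branch is given below. Because Table 1 survives only as an image, the DNN-01 layer widths here are placeholders, and torchvision's stock ResNet18 stands in for the ResNet backbone; treat it as a sketch of the branch structure, not the patent's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class Branch(nn.Module):
    """One of the three step-S5 branches: ResNet18 backbone followed by a
    small DNN head (layer widths are placeholders, not Table 1's values)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        backbone = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop FC
        self.dnn01 = nn.Sequential(              # stand-in for DNN-01: FL+RELU+Dr
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, feat_dim),
        )

    def forward(self, x):                        # x: (B, 3, H, W); 3 channels assumed
        h = self.backbone(x).flatten(1)          # (B, 512) pooled backbone features
        return self.dnn01(h)                     # a first feature vector F_i

branches = nn.ModuleList(Branch() for _ in range(3))  # one per data set (n = 2, 3, 4)
f1, f2, f3 = (b(torch.randn(4, 3, 64, 64)) for b in branches)
```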
S6, combining the 3 first feature vectors to form 3 second feature vectors;

As shown in fig. 4, in the present application a feature combiner (hereinafter Comber) module combines the 3 first feature vectors into 3 second feature vectors. The specific combination methods are as follows:

The first combination method, denoted SUM, adds the three first feature vectors $F_1$, $F_2$ and $F_3$ to obtain a new vector $F_{\mathrm{sum}}$:

$$F_{\mathrm{sum}} = F_1 + F_2 + F_3$$

The second combination method, denoted MAX, takes the maximum value of the three first feature vectors $F_1$, $F_2$ and $F_3$ at each position and forms a new vector $F_{\max}$ from these maxima:

$$F_{\max} = \max(F_1, F_2, F_3)$$

The third combination method, denoted FC, assumes that the three first feature vectors $F_1$, $F_2$ and $F_3$ are linearly related, and combines them with weights using a fully connected layer to obtain a new vector $F_{\mathrm{fc}}$:

$$F_{\mathrm{fc}} = w_1 F_1 + w_2 F_2 + w_3 F_3$$

where $w_1$, $w_2$ and $w_3$ are the weights of the three first feature vectors, i.e. parameters learned by training the fully connected layer.
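The three combination methods can be sketched in PyTorch as follows. The FC method is realized here as a single linear layer over the concatenated vectors, one possible reading of a learned weighted combination; names and dimensions are ours.

```python
import torch
import torch.nn as nn

class Comber(nn.Module):
    """Step S6 feature combiner: SUM, MAX and FC fusion of the three
    first feature vectors (a sketch of the methods described above)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        # FC fusion: a fully connected layer learns the combination weights
        self.fc = nn.Linear(3 * feat_dim, feat_dim, bias=False)

    def forward(self, f1, f2, f3):
        f_sum = f1 + f2 + f3                              # SUM
        f_max = torch.max(torch.max(f1, f2), f3)          # MAX, element-wise
        f_fc = self.fc(torch.cat([f1, f2, f3], dim=1))    # FC, learned weights
        return f_sum, f_max, f_fc

comber = Comber()
f1, f2, f3 = (torch.randn(4, 128) for _ in range(3))
f_sum, f_max, f_fc = comber(f1, f2, f3)
```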
S7, feeding the 3 combined second feature vectors each into a DNN-02 model to form the multi-modal feature fusion electroencephalogram emotion classification model; the DNN-02 network structure is shown in Table 2.

TABLE 2 DNN-02 network structure
And S8, randomly dividing the N samples and label sets from step S4 into 10 parts using a ten-fold cross-validation method, taking 9 parts as training data and the remaining 1 part as test data, and training the multi-modal feature fusion electroencephalogram emotion classification model to obtain an electroencephalogram emotion classification recognition model.
In this embodiment, the electroencephalogram signal of each experimenter is converted using MDF with n = 2, 3 and 4, giving three different image data sets and 2400 pictures in total. The ten-fold cross-validation method randomly divides the image data sets into 10 parts, of which 9 parts are taken in turn as training data and 1 part as test data. The training-set pictures are then fed to the multi-modal feature fusion electroencephalogram emotion classification model obtained in step S7 for training, with the Adam optimizer, a learning rate of 0.0001, the cross-entropy loss function, and a batch size of 32, yielding the electroencephalogram emotion classification recognition model.
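A skeleton of this training setup might look as follows; the model factory, image tensor and label tensor are assumed names, one training epoch per fold is shown for brevity, and per-fold evaluation is elided.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

def train_ten_fold(model_fn, images, labels, device="cpu"):
    """Step S8 sketch: ten-fold cross-validation with Adam (lr = 0.0001),
    cross-entropy loss and batch size 32, as stated in the embodiment."""
    for tr, te in KFold(n_splits=10, shuffle=True).split(np.arange(len(images))):
        model = model_fn().to(device)
        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        loss_fn = nn.CrossEntropyLoss()
        model.train()
        for i in range(0, len(tr), 32):                  # mini-batches of 32
            idx = tr[i:i + 32]
            opt.zero_grad()
            loss = loss_fn(model(images[idx].to(device)), labels[idx].to(device))
            loss.backward()
            opt.step()
        # ...evaluate on images[te], labels[te] for this fold...
```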
In this embodiment, a total of 4 loss functions are defined, $L_1$, $L_2$, $L_3$ and $L_{\mathrm{com}}$, to train the classification network. $L_1$, $L_2$ and $L_3$ optimize the losses of the three different image data sets passed through the individual ResNet18 and DNN-01 network structures, and $L_{\mathrm{com}}$ optimizes the loss of the second feature vectors passed respectively through the DNN-02 model. The loss $L$ of the entire network structure is optimized according to the formula:

$$L = \lambda_1 \sum_{i=1}^{3} L_i + \lambda_2 L_{\mathrm{com}}$$

where $\lambda_1$ and $\lambda_2$ are preset parameters that focus learning on specific features or on their combination; in this embodiment they are set to 1/3 and 1.0, making the contribution of each image data set equal during training.
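Under this reading, the overall loss can be sketched as below; for brevity a single DNN-02 output is shown for the combined features (the embodiment passes each of the three combined vectors through DNN-02), and all names are ours.

```python
import torch
import torch.nn as nn

def total_loss(branch_logits, fused_logits, targets, lam1=1/3, lam2=1.0):
    """Sketch of the embodiment's overall loss
    L = lam1 * (L1 + L2 + L3) + lam2 * Lcom, with cross-entropy throughout.
    branch_logits: the three per-branch outputs; fused_logits: the DNN-02
    output on the combined features."""
    ce = nn.CrossEntropyLoss()
    branch_term = sum(ce(logits, targets) for logits in branch_logits)  # L1+L2+L3
    return lam1 * branch_term + lam2 * ce(fused_logits, targets)        # + Lcom

targets = torch.randint(0, 2, (4,))
branch_logits = [torch.randn(4, 2, requires_grad=True) for _ in range(3)]
fused_logits = torch.randn(4, 2, requires_grad=True)
loss = total_loss(branch_logits, fused_logits, targets)
loss.backward()
```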

Claims (7)

1. A multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging, characterized by comprising the following steps:
s1, performing baseline removal on the original electroencephalogram signal by using a python code to obtain a first electroencephalogram signal;
s2, carrying out multi-scale processing on the first electroencephalogram signal to obtain a second electroencephalogram signal;
s3, converting the second brain electrical signal into an image by using a time series imaging algorithm to obtain N image data sets;
s4, performing data enhancement on the N image data sets to construct N samples and label sets;
s5, obtaining N first feature vectors by the N samples and the label sets through a ResNet model and a DNN-01 model respectively;
s6, combining the N first feature vectors to form 3 second feature vectors;
s7, forming a multi-modal feature fusion electroencephalogram emotion classification model by the 3 second feature vectors through a DNN-02 model respectively;
and S8, randomly dividing the N samples and label sets from step S4 into ten parts using a ten-fold cross-validation method, taking nine parts as training data and the remaining part as test data, and training the multi-modal feature fusion electroencephalogram emotion classification model to obtain an electroencephalogram emotion classification recognition model.
2. The multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging as claimed in claim 1, characterized in that the baseline removal includes the following: the baseline signal and the experimental signal in the original electroencephalogram signal are divided into K segments and I segments of length L respectively, and the average value of all baseline signal segments is subtracted from each experimental signal segment.
3. The multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging as claimed in claim 1, characterized in that the mathematical definition of the multi-scale processing is:

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau + 1}^{j\tau} x_i, \qquad 1 \le j \le \left\lfloor \frac{L}{\tau} \right\rfloor$$

where $\tau$ is the set time scale, $L$ is the length of the original electroencephalogram signal, $x_i$ is the signal value of the original electroencephalogram signal at time $i$, $y_j^{(\tau)}$ is the second electroencephalogram signal, and $j$ is the index of the second electroencephalogram signal.
4. The multi-modality feature fusion electroencephalogram emotion recognition method based on multi-scale imaging as claimed in claim 1, wherein: the data enhancement is Mixup.
5. The multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging as claimed in claim 1, characterized in that the 3 second feature vectors are: a second feature vector formed by adding the N first feature vectors; a second feature vector composed of the maximum values at the same positions of the N first feature vectors; and a second feature vector formed by weighted combination of the N first feature vectors using a fully connected layer.
6. The multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging as claimed in claim 5, characterized in that the weights in the weighted combination are the parameters of the N first feature vectors after fully-connected-layer training.
7. The multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging as claimed in claim 1, characterized in that the loss calculation formula of the multi-modal feature fusion electroencephalogram emotion classification model is:

$$L = \lambda_1 \sum_i L_i + \lambda_2 L_{\mathrm{com}}$$

where $\lambda_1$ and $\lambda_2$ are preset parameters; $L_i$ is the loss of the N samples and label sets passed through the ResNet model and the DNN-01 model, $i$ being an integer in $\{1, \ldots, N\}$; and $L_{\mathrm{com}}$ is the loss of the 3 second feature vectors passed respectively through the DNN-02 model.
CN202210440906.XA 2022-04-26 2022-04-26 Multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging Pending CN114767130A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210440906.XA CN114767130A (en) 2022-04-26 2022-04-26 Multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210440906.XA CN114767130A (en) 2022-04-26 2022-04-26 Multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging

Publications (1)

Publication Number Publication Date
CN114767130A true CN114767130A (en) 2022-07-22

Family

ID=82432172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210440906.XA Pending CN114767130A (en) 2022-04-26 2022-04-26 Multi-modal feature fusion electroencephalogram emotion recognition method based on multi-scale imaging

Country Status (1)

Country Link
CN (1) CN114767130A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238835A (en) * 2022-09-23 2022-10-25 华南理工大学 Electroencephalogram emotion recognition method, medium and equipment based on double-space adaptive fusion
CN115644870A (en) * 2022-10-21 2023-01-31 东北林业大学 Electroencephalogram signal emotion recognition method based on TSM-ResNet model
CN115644870B (en) * 2022-10-21 2024-03-08 东北林业大学 Electroencephalogram signal emotion recognition method based on TSM-ResNet model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination