CN115422973A - Electroencephalogram emotion recognition method of space-time network based on attention - Google Patents

Electroencephalogram emotion recognition method of space-time network based on attention

Info

Publication number
CN115422973A
Authority
CN
China
Prior art keywords
electroencephalogram
data
attention
emotion recognition
space
Prior art date
Legal status
Pending
Application number
CN202211072485.6A
Other languages
Chinese (zh)
Inventor
高瞻
朱琳
邵叶秦
王杰华
Current Assignee
Nantong University
Original Assignee
Nantong University
Priority date
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202211072485.6A priority Critical patent/CN115422973A/en
Publication of CN115422973A publication Critical patent/CN115422973A/en
Pending legal-status Critical Current

Landscapes

  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention provides an electroencephalogram emotion recognition method based on an attention spatio-temporal network, which comprises the following steps: S1, acquiring electroencephalogram data and label data from subjects watching video clips; S2, preprocessing the data, including down-sampling, removing the baseline signal, removing electro-oculogram and blink physiological artifacts, band-pass filtering to retain the 4.0-45.0 Hz electroencephalogram signal, and segmenting the data; S3, constructing a spatio-temporal convolutional neural network model based on kernel attention; S4, training the neural network model and adjusting its parameters to obtain the classification results of electroencephalogram emotion recognition; and S5, verifying the effectiveness of the kernel-attention-based spatio-temporal network model using the chosen evaluation indexes, with an SVM model as the baseline and several deep neural network models for comparison. The invention achieves high recognition accuracy, enables machines to effectively recognize and understand human emotion, makes human-computer interaction friendlier, and has strong application value in the field of human-computer interaction.

Description

Electroencephalogram emotion recognition method of space-time network based on attention
Technical Field
The invention relates to the fields of human-computer interaction and affective computing, and in particular to an electroencephalogram emotion recognition method based on an attention spatio-temporal network.
Background
Emotion is one of the most fundamental components of human experience. In emotional psychology, emotion refers to a mental activity mediated by the subject's needs: it reflects the relationship between an individual and the significance of its environment, is composed of multiple components with a multi-dimensional structure and multi-level integration, and is a mental process and motivational force that interacts with cognition in the service of survival, adaptation and interpersonal interaction. As humans strive to make technology more compatible and friendly, affective computing has emerged as a natural next step, allowing machines to recognize and understand human emotion and to express care. Although affective computing is a relatively young branch of artificial intelligence, it is one of the key technologies needed for machines to possess emotion and ultimately achieve intelligence in the true sense.
In addition, emotion recognition can analyze an individual's biological signals, including facial data, electroencephalogram data and peripheral physiological data, through machine learning and deep learning models. Although facial expression data are relatively cheap to obtain, research has found that some people can control their facial expressions to hide their emotions. Many studies therefore turn to physiological signal data, and electroencephalogram data, as a typical physiological signal, have been widely used in recent years.
When dealing with emotion recognition based on electroencephalogram data, a frequently encountered problem is how to extract discriminative emotional features from the electroencephalogram signal. Traditional feature extraction methods extract electroencephalogram features in the time domain, frequency domain and time-frequency domain, and rely on manual design.
Disclosure of Invention
The invention aims to provide an electroencephalogram emotion recognition method based on an attention spatio-temporal network, which improves the accuracy of electroencephalogram emotion recognition so that a machine can acquire the ability to recognize and understand human emotion and show care, thereby promoting friendly human-machine communication.
In order to solve the above technical problem, an embodiment of the invention provides an electroencephalogram emotion recognition method based on an attention spatio-temporal network, which comprises the following steps:
S1, acquiring electroencephalogram data and label data from subjects watching video clips;
S2, preprocessing the data, including: down-sampling, removing the baseline signal, removing electro-oculogram and blink physiological artifacts, band-pass filtering to retain the 4.0-45.0 Hz electroencephalogram signal, and segmenting the data;
S3, constructing a spatio-temporal convolutional neural network model based on kernel attention;
S4, training the neural network model of step S3 and adjusting its parameters to obtain the classification results of electroencephalogram emotion recognition;
and S5, using the chosen evaluation indexes and comparing against an SVM baseline and several deep neural network models, verifying the effectiveness of the kernel-attention-based spatio-temporal network model.
In step S1, the electroencephalogram data for each video segment watched by a subject comprise a 3 s baseline signal and a 60 s emotional stimulation signal; a 32-channel Biosemi ActiveTwo system with a sampling frequency of 512 Hz and the international standard 10/20 electrode placement system are used to acquire the user's electroencephalogram data, and different emotion labels are assigned to the electroencephalogram data recorded under different emotional states; the label data comprise arousal and valence, each dimension is rated on a 9-point scale, and a larger value represents a higher degree of emotional activation.
Step S2 comprises the following steps:
S2.1, removing the 3 s baseline signal from the electroencephalogram data;
S2.2, down-sampling the 512 Hz signal to 128 Hz;
S2.3, removing electro-oculogram signals and blink physiological artifacts from the electroencephalogram data by blind source separation;
S2.4, removing high-frequency noise interference by applying a 4.0-45.0 Hz band-pass filter to the electroencephalogram data;
S2.5, for the class label of each dimension, selecting 5 as the threshold and projecting the 9 discrete values into a low class and a high class of that dimension;
S2.6, data expansion: the 60 s of electroencephalogram data in each trial are divided into smaller non-overlapping 4 s segments, so each trial yields 15 segments.
In step S3, the kernel-attention-based spatio-temporal convolutional neural network model comprises a kernel attention module, a spatial dependency module and a temporal dependency module, and step S3 comprises the following steps:
S3.1, acquiring the node attributes of the electroencephalogram with the kernel attention module: cascaded convolutional layers extract strong features, and the electroencephalogram features learned by different kernels are fused and concatenated to serve as the node attributes of the electroencephalogram representation;
S3.2, extracting the spatial feature information of the electroencephalogram data with the spatial dependency module;
and S3.3, extracting the time-series information of the electroencephalogram data with the temporal dependency module, which adopts a long short-term memory network model so that historical information influences the current state.
Wherein step S3.1 comprises the steps of:
S3.1.1, the kernel length is set as different proportions of the sampling rate f_s; the ratio coefficient is denoted α_k ∈ R, where k indexes the convolutional layers from 1 to K, and the kernel size of the k-th layer, T_core^k, is given by:
T_core^k = α_k * f_s;
S3.1.2, after step S2 the electroencephalogram data are expressed as:
X_i ∈ R^(c×l), i ∈ [1, ..., n];
where n is the number of electroencephalogram samples, c is the number of electroencephalogram channels, and l is the length of a single sample;
S3.1.3, the electroencephalogram features learned by the different kernels are fused and concatenated, and the final feature of the kernel attention module is expressed as:
X_KA = F_bn(F_avgpool(F_cat(F_conv^1(X_i), ..., F_conv^K(X_i))));
where F_bn is the batch normalization function, F_avgpool the average pooling function, F_cat the feature fusion (concatenation) function, and F_conv^k the convolution with the k-th kernel.
Wherein step S3.2 comprises the steps of:
S3.2.1, the spatial dependency module consists of a basic unit repeated four times, namely a 2D convolutional layer, a batch normalization layer, a Leaky ReLU layer and an average pooling layer; taking the output features of the kernel attention module in step S3.1 as input, the feature produced by one unit of the spatial dependency module is expressed as:
X_S = F_avgpool(F_Leakyrelu(F_bn(F_spatial_conv(X_KA)))), applied four times in cascade;
where F_Leakyrelu is the activation function and F_spatial_conv is the spatial convolution function;
S3.2.2, the convolutional neural network uses the Leaky ReLU activation function, which avoids the problem that, when the learning rate is too large, the gradient of some neurons in the network stays at 0; the formula is:
f(x) = max(0, x) + leak * min(0, x);
where leak takes the value 0.01.
In step S3.3, the long short-term memory network model introduces a gate structure into each LSTM unit (cell): each gate consists of a sigmoid function and an element-wise product, and the output of the sigmoid function lies in the interval [0,1], controlling how much information is discarded or added, i.e., forgetting or memorizing. Each LSTM unit comprises a forget gate, an input gate and an output gate, as follows:
S3.3.1, forget gate: the output h_(t-1) of the previous unit and the input x_t of the current unit are fed to a sigmoid function, which yields for each entry of C_(t-1) a value in [0,1] controlling the degree to which the previous cell state is forgotten; the formula is:
f_t = σ(W_f · [h_(t-1), x_t] + b_f);
where σ is the sigmoid function, h_(t-1) is the hidden state of the previous unit, x_t is the electroencephalogram input, W_f is the forget gate weight matrix and b_f is the forget gate bias;
S3.3.2, input gate: the addition of new information is controlled together with a tanh function that generates a candidate state; the input gate i_t yields for each entry of the cell state C_t a value in [0,1] used to update the state of the memory cell; the formulas are:
i_t = σ(W_i · [h_(t-1), x_t] + b_i);
C_t = f_t * C_(t-1) + i_t * tanh(W_C · [h_(t-1), x_t] + b_C);
where σ is the sigmoid function, tanh is the hyperbolic tangent function, W_i is the input gate weight matrix, b_i the input gate bias, C_t the new cell state, W_C the cell state weight matrix and b_C the cell state bias;
S3.3.3, output gate: the cell state is activated; the output gate o_t yields for each entry of the cell state C_t a value in [0,1] that controls the filtering of the current cell state, and the hidden state h_t passed to the next unit is output; the formulas are:
o_t = σ(W_o · [h_(t-1), x_t] + b_o);
h_t = o_t * tanh(C_t);
where W_o is the output gate weight matrix and b_o is the output gate bias.
Step S4 comprises the following steps:
S4.1, cross-validation: grouping the electroencephalogram data processed in step S2 into training data and validation data, first training the network model of step S3 with the training data, and then testing the trained model with the validation data;
and S4.2, improving the model performance by parameter tuning, saving the network model, and outputting the classification results of emotion recognition.
Step S4.1 specifically comprises the following steps:
in each of the 10 folds of cross-validation, 1 fold is selected as test data and the remaining 9 folds as training data; within the 9 training folds, the data are randomly divided into 70% training data and 30% validation data.
Step S5 comprises the following steps:
S5.1, performance evaluation: accuracy is selected as the first evaluation index; it is one of the most common evaluation indexes for classification problems and measures whether the predictions on the data set are accurate, being defined as the ratio of the number of correctly predicted samples to the total number of samples; the performance evaluation formula is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN);
wherein TP is true positive, TN is true negative, FP is false positive, FN is false negative;
S5.2, because the data set is class-imbalanced to a certain extent, the F1 score is selected as a second evaluation index to quantify the electroencephalogram emotion recognition results; combining the recall and precision of the classifier, it is defined as their harmonic mean:
F1 = 2 * Precision * Recall / (Precision + Recall);
S5.3, for model comparison, an SVM (support vector machine) from traditional machine learning is selected as the baseline model and compared with deep learning networks such as DeepConvNet, EEGNet and TSception.
The technical scheme of the invention has the following beneficial effects:
The invention provides a spatio-temporal convolutional neural network method based on kernel attention. Unlike hand-crafted feature extraction, multi-scale one-dimensional convolution kernels over the time and channel dimensions form an attention module that obtains the node attributes of the electroencephalogram; to learn the spatial information relevant to emotion recognition effectively, a convolutional neural network processes the spatial dependency of the electroencephalogram; and since electroencephalogram data are typical time-series data, they are processed with a long short-term memory network model. By capturing the temporal dependency of the electroencephalogram signal with the long short-term memory network, the method effectively improves the accuracy of electroencephalogram emotion recognition and has high application value in the field of human-computer interaction.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic view of the model structure of the present invention;
FIG. 3 is a schematic diagram of the time-dependent module model of the present invention.
Detailed Description
To make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, an embodiment of the present invention provides an electroencephalogram emotion recognition method based on an attention spatio-temporal network, comprising the following steps:
S1, acquiring electroencephalogram data and label data from subjects watching video clips;
S2, preprocessing the data, including: down-sampling, removing the baseline signal, removing electro-oculogram and blink physiological artifacts, band-pass filtering to retain the 4.0-45.0 Hz electroencephalogram signal, and segmenting the data;
S3, constructing a spatio-temporal convolutional neural network model based on kernel attention;
S4, training the neural network model of step S3 and adjusting its parameters to obtain the classification results of electroencephalogram emotion recognition;
and S5, verifying the effectiveness of the kernel-attention-based spatio-temporal network model using the chosen evaluation indexes, with an SVM model as the baseline and several deep neural network models for comparison.
In step S1, the electroencephalogram data for each video segment watched by a subject comprise a 3 s baseline signal and a 60 s emotional stimulation signal; a 32-channel Biosemi ActiveTwo system with a sampling frequency of 512 Hz and the international standard 10/20 electrode placement system are used to acquire the user's electroencephalogram data, and different emotion labels are assigned to the electroencephalogram data recorded under different emotional states; the label data comprise arousal and valence, each dimension is rated on a 9-point scale, and a larger value represents a higher degree of emotional activation.
Step S2 comprises the following steps:
S2.1, removing the 3 s baseline signal from the electroencephalogram data;
S2.2, down-sampling the 512 Hz signal to 128 Hz;
S2.3, removing electro-oculogram signals and blink physiological artifacts from the electroencephalogram data by blind source separation;
S2.4, removing high-frequency noise interference by applying a 4.0-45.0 Hz band-pass filter to the electroencephalogram data;
S2.5, for the class label of each dimension, selecting 5 as the threshold and projecting the 9 discrete values into a low class and a high class of that dimension;
S2.6, data expansion: the 60 s of electroencephalogram data in each trial are divided into smaller non-overlapping 4 s segments, so each trial yields 15 segments.
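For illustration only, the following is a minimal preprocessing sketch of steps S2.1-S2.6 under the assumption of DEAP-style raw trials of shape [channels, samples] recorded at 512 Hz; the blind source separation of step S2.3 is omitted, and all function and variable names are illustrative rather than part of the disclosure:

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

def preprocess_trial(raw, fs=512, fs_new=128, baseline_s=3, seg_s=4):
    """Drop the 3 s baseline, downsample 512 Hz -> 128 Hz,
    band-pass 4.0-45.0 Hz, and split into non-overlapping 4 s segments."""
    x = raw[:, baseline_s * fs:]                    # S2.1: remove the 3 s baseline
    x = decimate(x, fs // fs_new, axis=-1)          # S2.2: downsample to 128 Hz
    b, a = butter(4, [4.0, 45.0], btype="band", fs=fs_new)
    x = filtfilt(b, a, x, axis=-1)                  # S2.4: 4.0-45.0 Hz band-pass
    seg_len = seg_s * fs_new
    n_seg = x.shape[-1] // seg_len                  # 60 s / 4 s = 15 segments (S2.6)
    x = x[:, :n_seg * seg_len]
    return x.reshape(x.shape[0], n_seg, seg_len).transpose(1, 0, 2)

# S2.5: project the 9-point rating of each dimension into low/high with threshold 5
binarize = lambda ratings: (np.asarray(ratings) > 5).astype(int)
```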
As shown in fig. 2, in step S3 the kernel-attention-based spatio-temporal convolutional neural network model comprises a kernel attention module, a spatial dependency module and a temporal dependency module, and step S3 comprises the following steps:
S3.1, acquiring the node attributes of the electroencephalogram with the kernel attention module: cascaded convolutional layers extract strong features, and the electroencephalogram features learned by different kernels are fused and concatenated to serve as the node attributes of the electroencephalogram representation;
S3.2, extracting the spatial feature information of the electroencephalogram data with the spatial dependency module;
and S3.3, extracting the time-series information of the electroencephalogram data with the temporal dependency module, which adopts a long short-term memory network model so that historical information influences the current state.
Wherein step S3.1 comprises the steps of:
S3.1.1, to let the network learn a dynamic node representation, the kernel length is set as different proportions of the sampling rate f_s; the ratio coefficient is denoted α_k ∈ R, where k indexes the convolutional layers from 1 to K, and the kernel size of the k-th layer, T_core^k, is given by:
T_core^k = α_k * f_s;
S3.1.2, after step S2 the electroencephalogram data are expressed as:
X_i ∈ R^(c×l), i ∈ [1, ..., n];
where n is the number of electroencephalogram samples, c is the number of electroencephalogram channels, and l is the length of a single sample;
S3.1.3, the electroencephalogram features learned by the different kernels are fused and concatenated, and the final feature of the kernel attention module is expressed as:
X_KA = F_bn(F_avgpool(F_cat(F_conv^1(X_i), ..., F_conv^K(X_i))));
where F_bn is the batch normalization function, F_avgpool the average pooling function, F_cat the feature fusion (concatenation) function, and F_conv^k the convolution with the k-th kernel.
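As an illustration of the multi-scale kernel idea of step S3.1 (not the patented implementation), the following PyTorch sketch builds one temporal convolution branch per ratio coefficient α_k, concatenates the branch outputs, and applies average pooling and batch normalization; the α values and channel counts are assumed placeholders:

```python
import torch
import torch.nn as nn

class KernelAttention(nn.Module):
    """One temporal convolution per ratio coefficient alpha_k (kernel length =
    alpha_k * f_s), then concatenation, average pooling and batch normalization."""
    def __init__(self, fs=128, alphas=(0.5, 0.25, 0.125), out_ch=16):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(1, out_ch, kernel_size=(1, int(a * fs)))   # F_conv per kernel
            for a in alphas
        )
        self.pool = nn.AvgPool2d(kernel_size=(1, 4))             # F_avgpool
        self.bn = nn.BatchNorm2d(out_ch * len(alphas))           # F_bn

    def forward(self, x):                        # x: [batch, 1, channels, time]
        feats = [branch(x) for branch in self.branches]
        t = min(f.shape[-1] for f in feats)      # crop branches to a common length
        feats = torch.cat([f[..., :t] for f in feats], dim=1)    # F_cat
        return self.bn(self.pool(feats))

# usage sketch: a batch of 4 s segments at 128 Hz over 32 channels
# out = KernelAttention()(torch.randn(8, 1, 32, 512))
```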
Step S3.2 comprises the following steps:
S3.2.1, the spatial dependency module consists of a basic unit repeated four times, namely a 2D convolutional layer, a batch normalization layer, a Leaky ReLU layer and an average pooling layer; taking the output features of the kernel attention module in step S3.1 as input, the feature produced by one unit of the spatial dependency module is expressed as:
X_S = F_avgpool(F_Leakyrelu(F_bn(F_spatial_conv(X_KA)))), applied four times in cascade;
where F_Leakyrelu is the activation function and F_spatial_conv is the spatial convolution function;
S3.2.2, the convolutional neural network uses the Leaky ReLU activation function; compared with the traditional ReLU activation function, it avoids the problem that, when the learning rate is too large, the gradient of some neurons in the network stays at 0; the formula is:
f(x) = max(0, x) + leak * min(0, x);
where leak takes the value 0.01.
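A corresponding sketch of the spatial dependency module, again an assumption-laden illustration in PyTorch: the patent fixes only the unit structure (2D convolution, batch normalization, Leaky ReLU with leak = 0.01, average pooling) repeated four times, while the kernel sizes and channel widths below are illustrative:

```python
import torch.nn as nn

def spatial_unit(in_ch, out_ch):
    """Basic unit: 2D convolution -> batch normalization -> Leaky ReLU -> average pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=(3, 3), padding=1),  # F_spatial_conv
        nn.BatchNorm2d(out_ch),                                    # F_bn
        nn.LeakyReLU(negative_slope=0.01),                         # leak = 0.01
        nn.AvgPool2d(kernel_size=(1, 2)),                          # F_avgpool
    )

# the basic unit repeated four times; input width 48 matches the kernel-attention sketch above
spatial_module = nn.Sequential(
    spatial_unit(48, 64),
    spatial_unit(64, 64),
    spatial_unit(64, 128),
    spatial_unit(128, 128),
)
```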
In step S3.3, since electroencephalogram data are typical time-series data and the long short-term memory (LSTM) network model handles time-series data well, using historical information to influence the current state, this network is used to obtain the temporal dependency of the electroencephalogram; the specific design is shown in fig. 3.
In this embodiment, the long short-term memory network model introduces a gate structure into each LSTM unit (cell): each gate consists of a sigmoid function and an element-wise product, and the output of the sigmoid function lies in [0,1], where 0 represents completely discarding and 1 represents completely passing the information, thereby controlling whether information is discarded or added, i.e., forgetting or memorizing. Each LSTM unit comprises a forget gate, an input gate and an output gate, as follows:
S3.3.1, forget gate: the output h_(t-1) of the previous unit and the input x_t of the current unit are fed to a sigmoid function, which yields for each entry of C_(t-1) a value in [0,1] controlling the degree to which the previous cell state is forgotten; the formula is:
f_t = σ(W_f · [h_(t-1), x_t] + b_f);
where σ is the sigmoid function, h_(t-1) is the hidden state of the previous unit, x_t is the electroencephalogram input, W_f is the forget gate weight matrix and b_f is the forget gate bias;
S3.3.2, input gate: the addition of new information is controlled together with a tanh function that generates a candidate state; the input gate i_t yields for each entry of the cell state C_t a value in [0,1] used to update the state of the memory cell; the formulas are:
i_t = σ(W_i · [h_(t-1), x_t] + b_i);
C_t = f_t * C_(t-1) + i_t * tanh(W_C · [h_(t-1), x_t] + b_C);
where σ is the sigmoid function, tanh is the hyperbolic tangent function, W_i is the input gate weight matrix, b_i the input gate bias, C_t the new cell state, W_C the cell state weight matrix and b_C the cell state bias;
S3.3.3, output gate: the cell state is activated; the output gate o_t yields for each entry of the cell state C_t a value in [0,1] that controls the filtering of the current cell state, and the hidden state h_t passed to the next unit is output; the formulas are:
o_t = σ(W_o · [h_(t-1), x_t] + b_o);
h_t = o_t * tanh(C_t);
where W_o is the output gate weight matrix and b_o is the output gate bias.
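The gate equations above correspond to the standard LSTM that PyTorch's nn.LSTM implements internally; a minimal temporal-dependency sketch (assumptions: PyTorch, spatial features flattened per time step, illustrative layer sizes) could look as follows:

```python
import torch
import torch.nn as nn

class TemporalClassifier(nn.Module):
    """LSTM over the per-time-step spatial features; the last hidden state h_t
    drives the low/high classification of one label dimension."""
    def __init__(self, feat_dim, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: [batch, time_steps, feat_dim]
        _, (h_t, _) = self.lstm(x)     # forget/input/output gates handled inside nn.LSTM
        return self.fc(h_t[-1])        # logits for the low/high classes
```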
Step S4 comprises the following steps:
S4.1, cross-validation: using the neural network model of step S3, in each of the 10 folds of cross-validation 1 fold is selected as test data and the remaining 9 folds as training data; within the 9 training folds, the data are randomly divided into 70% training data and 30% validation data;
and S4.2, improving the model performance by parameter tuning, saving the network model, and outputting the classification results of emotion recognition.
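A sketch of this protocol, under the assumption of PyTorch and scikit-learn; build_model, eeg_segments and labels are placeholders for the step-S3 model and the step-S2 data, and the epoch count and learning rate are illustrative:

```python
import torch
from sklearn.model_selection import KFold, train_test_split

kf = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (trainval_idx, test_idx) in enumerate(kf.split(eeg_segments)):
    # within the 9 training folds: 70% training / 30% validation
    tr_idx, va_idx = train_test_split(trainval_idx, test_size=0.3, random_state=0)
    model = build_model()                                   # step-S3 network (placeholder)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(50):                                 # epoch count is illustrative
        model.train()
        logits = model(eeg_segments[tr_idx])
        loss = loss_fn(logits, labels[tr_idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # validation data (va_idx) guides parameter tuning / model selection (S4.2)
    # test_idx is held out for the final evaluation of this fold (S5)
```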
Step S5 includes the steps of:
S5.1, performance evaluation: accuracy is selected as the first evaluation index; it is one of the most common evaluation indexes for classification problems and measures whether the predictions on the data set are accurate, being defined as the ratio of the number of correctly predicted samples to the total number of samples; the formula is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN);
wherein TP is true positive, TN is true negative, FP is false positive, FN is false negative;
S5.2, because the data set is class-imbalanced to a certain extent, the F1 score is selected as a second evaluation index to quantify the electroencephalogram emotion recognition results; combining the recall and precision of the classifier, it is defined as their harmonic mean:
F1 = 2 * Precision * Recall / (Precision + Recall);
S5.3, for model comparison, an SVM (support vector machine) from traditional machine learning is selected as the baseline model and compared with deep learning networks such as DeepConvNet, EEGNet and TSception.
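For reference, both indexes are available in scikit-learn; y_true and y_pred below are assumed to be the test labels and model predictions for one fold:

```python
from sklearn.metrics import accuracy_score, f1_score

acc = accuracy_score(y_true, y_pred)   # (TP + TN) / (TP + TN + FP + FN)
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
print(f"accuracy = {acc:.4f}, F1 = {f1:.4f}")
```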
In the experiments, comparison method 1 adopts the traditional machine learning SVM, method 2 adopts EEGNet, method 3 adopts DeepConvNet, and method 4 adopts TSception; analysis of the evaluation indexes shows that the proposed method effectively improves emotion recognition accuracy.
The kernel-attention-based spatio-temporal convolutional neural network method for electroencephalogram emotion recognition has high recognition accuracy, enables a machine to effectively recognize and understand human emotion, makes human-computer interaction friendlier, and has high application value in the field of human-computer interaction.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An electroencephalogram emotion recognition method based on an attention space-time network, characterized by comprising the following steps:
S1, acquiring electroencephalogram data and label data from subjects watching video clips;
S2, preprocessing the data, including: down-sampling, removing the baseline signal, removing electro-oculogram and blink physiological artifacts, band-pass filtering to retain the 4.0-45.0 Hz electroencephalogram signal, and segmenting the data;
S3, constructing a space-time convolutional neural network model based on kernel attention;
S4, training the neural network model of step S3 and adjusting its parameters to obtain the classification results of electroencephalogram emotion recognition;
and S5, using the chosen evaluation indexes and comparing against an SVM baseline and several deep neural network models, verifying the effectiveness of the kernel-attention-based space-time network model.
2. The electroencephalogram emotion recognition method based on attention space-time network according to claim 1, characterized in that, in step S1, the electroencephalogram data for each video segment watched by a subject comprise a 3 s baseline signal and a 60 s emotional stimulation signal; a 32-channel Biosemi ActiveTwo system with a sampling frequency of 512 Hz and the international standard 10/20 system are used to acquire the user's electroencephalogram data, and different emotion labels are assigned to the electroencephalogram data under different emotional states;
the label data comprise arousal and valence, each dimension is rated on a 9-point scale, and a larger value represents a higher degree of emotional activation.
3. The electroencephalogram emotion recognition method based on attention space-time network, according to claim 1, characterized in that step S2 comprises the steps of:
S2.1, removing the 3 s baseline signal from the electroencephalogram data;
S2.2, down-sampling the 512 Hz signal to 128 Hz;
S2.3, removing electro-oculogram signals and blink physiological artifacts from the electroencephalogram data by blind source separation;
S2.4, removing high-frequency noise interference by applying a 4.0-45.0 Hz band-pass filter to the electroencephalogram data;
S2.5, for the class label of each dimension, selecting 5 as the threshold and projecting the 9 discrete values into a low class and a high class of that dimension;
S2.6, data expansion: the 60 s of electroencephalogram data in each trial are divided into smaller non-overlapping 4 s segments, so each trial yields 15 segments.
4. The electroencephalogram emotion recognition method of an attention-based space-time network, as claimed in claim 1, wherein in step S3, the kernel attention-based space-time convolutional neural network model includes a kernel attention module, a spatial dependency module and a temporal dependency module, and includes the steps of:
S3.1, acquiring the node attributes of the electroencephalogram with the kernel attention module;
S3.2, extracting the spatial feature information of the electroencephalogram data with the spatial dependency module;
and S3.3, extracting the time-series information of the electroencephalogram data with the temporal dependency module, which adopts a long short-term memory network model so that historical information influences the current state.
5. The electroencephalogram emotion recognition method based on attention space-time network, according to claim 4, characterized in that step S3.1 includes the steps of:
S3.1.1, the kernel length is set as different proportions of the sampling rate f_s; the ratio coefficient is denoted α_k ∈ R, where k indexes the convolutional layers from 1 to K, and the kernel size of the k-th layer, T_core^k, is given by:
T_core^k = α_k * f_s;
S3.1.2, after step S2 the electroencephalogram data are expressed as:
X_i ∈ R^(c×l), i ∈ [1, ..., n];
where n is the number of electroencephalogram samples, c is the number of electroencephalogram channels, and l is the length of a single sample;
S3.1.3, the electroencephalogram features learned by the different kernels are fused and concatenated, and the final feature of the kernel attention module is expressed as:
X_KA = F_bn(F_avgpool(F_cat(F_conv^1(X_i), ..., F_conv^K(X_i))));
where F_bn is the batch normalization function, F_avgpool the average pooling function, F_cat the feature fusion (concatenation) function, and F_conv^k the convolution with the k-th kernel.
6. The electroencephalogram emotion recognition method based on attention space-time network, according to claim 4, characterized in that step S3.2 comprises the steps of:
S3.2.1, the spatial dependency module consists of a basic unit repeated four times, namely a 2D convolutional layer, a batch normalization layer, a Leaky ReLU layer and an average pooling layer; taking the output features of the kernel attention module in step S3.1 as input, the feature produced by one unit of the spatial dependency module is expressed as:
X_S = F_avgpool(F_Leakyrelu(F_bn(F_spatial_conv(X_KA)))), applied four times in cascade;
where F_Leakyrelu is the activation function and F_spatial_conv is the spatial convolution function;
s3.2.2, the activation function of the convolutional neural network adopts a Leaky ReLU activation function, and the formula is as follows:
f(x)=max(0,x)+leak*min(0,x);
wherein, the value of leak is 0.01.
7. The electroencephalogram emotion recognition method based on attention space-time network according to claim 4, characterized in that, in step S3.3, the long short-term memory network model introduces a gate structure, consisting of a sigmoid function and an element-wise product, into each LSTM unit; the output of the sigmoid function lies in the interval [0,1] and controls whether information is discarded or added, i.e., forgetting or memorizing, wherein each LSTM unit comprises a forget gate, an input gate and an output gate, as follows:
S3.3.1, forget gate: the output h_(t-1) of the previous unit and the input x_t of the current unit are fed to a sigmoid function, which yields for each entry of C_(t-1) a value in [0,1] controlling the degree to which the previous cell state is forgotten; the formula is:
f_t = σ(W_f · [h_(t-1), x_t] + b_f);
where σ is the sigmoid function, h_(t-1) is the hidden state of the previous unit, x_t is the electroencephalogram input, W_f is the forget gate weight matrix and b_f is the forget gate bias;
S3.3.2, input gate: the addition of new information is controlled together with a tanh function that generates a candidate state; the input gate i_t yields for each entry of the cell state C_t a value in [0,1] used to update the state of the memory cell; the formulas are:
i_t = σ(W_i · [h_(t-1), x_t] + b_i);
C_t = f_t * C_(t-1) + i_t * tanh(W_C · [h_(t-1), x_t] + b_C);
where σ is the sigmoid function, tanh is the hyperbolic tangent function, W_i is the input gate weight matrix, b_i the input gate bias, C_t the new cell state, W_C the cell state weight matrix and b_C the cell state bias;
S3.3.3, output gate: the cell state is activated; the output gate o_t yields for each entry of the cell state C_t a value in [0,1] that controls the filtering of the current cell state, and the hidden state h_t passed to the next unit is output; the formulas are:
o_t = σ(W_o · [h_(t-1), x_t] + b_o);
h_t = o_t * tanh(C_t);
where W_o is the output gate weight matrix and b_o is the output gate bias.
8. The electroencephalogram emotion recognition method based on attention space-time network, according to claim 1, characterized in that step S4 includes the steps of:
S4.1, cross-validation: grouping the electroencephalogram data processed in step S2 into training data and validation data, first training the network model of step S3 with the training data, and then testing the trained model with the validation data;
and S4.2, improving the model effect through parameter adjustment, storing the network model, and outputting the classification result of emotion recognition.
9. The electroencephalogram emotion recognition method based on attention space-time network according to claim 8, characterized in that the step S4.1 specifically comprises the following steps:
in each of the 10 folds of cross-validation, 1 fold is selected as test data and the remaining 9 folds as training data; within the 9 training folds, the data are randomly divided into 70% training data and 30% validation data.
10. The electroencephalogram emotion recognition method based on attention space-time network, according to claim 1, characterized in that step S5 comprises the steps of:
S5.1, performance evaluation: accuracy is selected as the first evaluation index, and the performance evaluation formula is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN);
wherein TP is true positive, TN is true negative, FP is false positive, FN is false negative;
S5.2, the F1 score is selected as an evaluation index to quantify the electroencephalogram emotion recognition results; combining the recall and precision of the classifier, it is defined as their harmonic mean, with the formula:
F1 = 2 * Precision * Recall / (Precision + Recall);
S5.3, for model comparison, an SVM (support vector machine) from traditional machine learning is selected as the baseline model and compared with deep learning networks comprising DeepConvNet, EEGNet and TSception.
CN202211072485.6A 2022-09-02 2022-09-02 Electroencephalogram emotion recognition method of space-time network based on attention Pending CN115422973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211072485.6A CN115422973A (en) 2022-09-02 2022-09-02 Electroencephalogram emotion recognition method of space-time network based on attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211072485.6A CN115422973A (en) 2022-09-02 2022-09-02 Electroencephalogram emotion recognition method of space-time network based on attention

Publications (1)

Publication Number Publication Date
CN115422973A true CN115422973A (en) 2022-12-02

Family

ID=84202429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211072485.6A Pending CN115422973A (en) 2022-09-02 2022-09-02 Electroencephalogram emotion recognition method of space-time network based on attention

Country Status (1)

Country Link
CN (1) CN115422973A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056831A (en) * 2023-10-13 2023-11-14 南京龙垣信息科技有限公司 Method for identifying heart monologue based on convolutional neural network
CN117332317A (en) * 2023-10-23 2024-01-02 昆明理工大学 EEG emotion recognition method combining attention residual error network with LSTM
CN117332317B (en) * 2023-10-23 2024-04-19 昆明理工大学 EEG emotion recognition method combining attention residual error network with LSTM


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination