CN115444431A - Electroencephalogram emotion classification model generation method based on mutual information driving - Google Patents

Electroencephalogram emotion classification model generation method based on mutual information driving

Info

Publication number
CN115444431A
CN115444431A (application CN202211076101.8A)
Authority
CN
China
Prior art keywords
electroencephalogram
data
features
mutual information
electroencephalogram emotion
Prior art date
Legal status
Pending
Application number
CN202211076101.8A
Other languages
Chinese (zh)
Inventor
吴清锋
王颖东
阮群生
周昌乐
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202211076101.8A
Publication of CN115444431A
Legal status: Pending


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/372 Analysis of electroencephalograms
    • A61B5/374 Detecting the frequency distribution of signals, e.g. detecting delta, theta, alpha, beta or gamma waves
    • A61B5/386 Accessories or supplementary instruments therefor
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/725 Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Psychiatry (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Artificial Intelligence (AREA)
  • Psychology (AREA)
  • Physiology (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The application relates to a method for generating an electroencephalogram emotion classification model driven by mutual information. By estimating the upper and lower bounds of the mutual information between two kinds of features, the entangled electroencephalogram features are divided, as far as possible, into individualized, classification-irrelevant features and common, classification-relevant features. Generalization is performed through the common, classification-relevant feature extractor model, thereby improving the accuracy of electroencephalogram emotion classification in unknown domains.

Description

Electroencephalogram emotion classification model generation method based on mutual information driving
Technical Field
The application relates to the technical field of electroencephalogram emotion recognition, in particular to an electroencephalogram emotion classification model generation method and system based on mutual information driving and a storage medium.
Background
Emotion is the result of concerted activity of cortical and subcortical neural processes and is highly transient. The electroencephalogram records spontaneous discharge activity that is not under voluntary control and offers high temporal resolution while being simple and convenient to acquire, which makes it feasible to recognize emotion and reveal complex neural mechanisms from it. Electroencephalogram-based emotion research has received increasing attention in recent years and can be used in virtual reality technology and, in the future, in emotional communication between companion robots, service robots and their users. Virtual reality technology will be available for social interaction, online learning, large-scale technical demonstrations and the treatment of psychological diseases. When a person's device-measured emotion can interact with other elements in real time, a quantitative result can be provided to the patient, particularly in the treatment of affective disorders or depression.
At present, researchers at home and abroad recognize human emotions through expression, voice and posture signals, and many pattern recognition methods have been applied with some success; however, because these signals are controllable and can be disguised, the results cannot exclude the influence of the subjects' subjective factors and sometimes cannot reveal latent, real emotional states. Moreover, because of individual differences in electroencephalogram signals, the accuracy of emotion classification for individuals from unknown domains is currently low.
Disclosure of Invention
Aiming at the problem that the accuracy of individual emotion classification in the unknown field is low due to individual difference of electroencephalogram signal emotion classification at present, the application provides an electroencephalogram emotion classification model generation method and system based on mutual information driving and a storage medium.
In a first aspect, the application provides a method for generating an electroencephalogram emotion classification model based on mutual information driving, which includes the following steps:
s101: acquiring electroencephalogram emotion training data;
s102: performing data preprocessing on the electroencephalogram emotion training data, then performing alpha, beta, theta and gamma four-frequency-band filtering on the preprocessed electroencephalogram emotion training data, and combining the filtered electroencephalogram emotion training data with the original electroencephalogram emotion training data to form electroencephalogram input data;
s103: performing two-dimensional convolution on the electroencephalogram input data by using a two-dimensional convolution network to obtain initial characteristics of an electroencephalogram signal;
s104: inputting the initial features of the electroencephalogram signals into a feature extractor based on a self-attention mechanism, and outputting electroencephalogram features based on attention;
s105: inputting the attention-based electroencephalogram features into one-dimensional convolution and pooling for low-dimensional feature mapping, thereby obtaining key features based on time dimensions and channels;
s106: performing feature segmentation on the key features based on the time dimension and the channel, and then further reducing the dimension of the segmented features by using a global feature extractor so as to obtain global features;
s107: performing label prediction on the global features by using a fully-connected neural network to obtain a training initial model;
s108: and optimizing the training initial model by using a loss function to obtain an electroencephalogram emotion classification model.
By adopting the above technical scheme, the method and the device estimate the upper and lower bounds of the mutual information between the two kinds of features and divide the entangled electroencephalogram features into individualized, classification-irrelevant features and common, classification-relevant features. Generalization is performed through the common, classification-relevant feature extractor model, thereby improving the accuracy of electroencephalogram emotion classification in unknown domains.
Preferably, in S103, the performing two-dimensional convolution on the electroencephalogram input data by using the two-dimensional convolution network specifically includes:
Em_or = Conv2D(X_input);
wherein Em_or is the initial feature of the electroencephalogram signal, X_input is the electroencephalogram input data, and X_input ∈ R^(T×C×5).
Preferably, in S104, the feature extractor based on the self-attention mechanism includes a plurality of attention blocks using a multi-attention mechanism, where the multi-attention mechanism specifically includes:
Sparsity(q_i, K) = ln( Σ_{j=1..L_k} e^(q_i·k_j^T/√d) ) - (1/L_k)·Σ_{j=1..L_k} (q_i·k_j^T/√d);
A = Softmax( Q_s·K^T/√d )·V;
Em_a = A*Em_or;
wherein M is the number of two-dimensional convolution kernels, K represents the key vectors, V represents the value vectors, Q is the query vectors, Q_s is a coefficient matrix containing M features and having the same size as Q, L_k is the length of the electroencephalogram data, q_i represents the i-th query vector, k_j represents the j-th key vector, and q_i·k_j^T represents the inner product of the vectors q_i and k_j.
Preferably, in S106, the performing of the feature segmentation on the key features based on the time dimension and the channels specifically includes:
[F_re, F_ir] = split(Conv1d(Layer));
wherein F_re denotes the class-related features, F_ir denotes the class-independent features, split stands for a segmentation function, and Conv1d stands for a one-dimensional convolution;
the further dimension reduction of the segmented features by using the global feature extractor specifically includes:
F_g^l = w^l·F_g^(l-1) + b^l;
wherein l denotes the neural network layer, w denotes the weight parameter, b denotes the bias parameter, and F_g denotes the global feature extractor.
Preferably, in S107, the performing label prediction on the global features by using the fully-connected neural network specifically includes:
Pre = Softmax( F_g(f_re)·W + b );
wherein Softmax is a normalization function, F_g is the global feature extractor, f_re denotes the class-related features, W is a mapping parameter, and b is a bias parameter;
preferably, in S108, the loss function is specifically:
loss = α·L1 + β·L2 + γ·L3;
L1 = -E_(x_s, y_s)[ Σ_{k=1..K} y_k·log(Pre_k) ];
where α, β and γ are weighting coefficients, x_s and y_s represent the domain data from the different subjects and the corresponding labels, respectively, K represents the number of label categories, E represents expectation, and Pre_k represents the predicted value;
L2 = min_{E1,E2,C,D1} max_{M1} I_θ(f_ir; f_re);
I_θ(X;Y) = E_p[-sp(T_θ(x,y))] - E_q[sp(T_θ(x,y'))];
sp(z) = log(1 + e^z);
wherein I_θ is the mutual information estimate, x and y are the corresponding variables whose mutual information is to be estimated, T_θ is a neural network parameterized by θ, y' is the variable y randomly shuffled along the batch axis, E_p denotes expectation under the joint distribution p, E_q denotes expectation under the product of the marginal distributions q, min denotes minimizing the mutual information by optimizing the corresponding network parameters, and max denotes maximizing the mutual information by optimizing the corresponding network parameters;
L3 = loss_local + loss_global;
loss_global = -I_θ(F_re; F_2);
loss_local = -(1/(N·h)) Σ_{i=1..N} Σ_{j=1..h} I_θ(F_re^(i,j); F_2);
wherein F_2 is the output of the last neural network layer before classification, F_re denotes the classification-related features, loss_global is the mutual information computed after both are converted into one-dimensional features by the introduced M2 network, loss_local is the fine-grained mutual information computed with F_re after F_2 is copied and mapped from one-dimensional to two-dimensional data by the introduced M3 network, and N and h respectively denote the length and width of the features.
In a second aspect, the present application further provides an electroencephalogram emotion classification method, including the following steps:
s201: acquiring electroencephalogram emotion target data to be classified;
s202: inputting the electroencephalogram emotion target data into an electroencephalogram emotion classification model, wherein the electroencephalogram emotion classification model is obtained by training in advance based on the method of the first aspect;
s203: and outputting the classification result of the electroencephalogram emotion classification model.
In a third aspect, the present application further provides a device for generating an electroencephalogram emotion classification model based on mutual information driving, where the device includes:
the acquisition module is used for acquiring electroencephalogram emotion training data;
the preprocessing module is used for preprocessing data of the electroencephalogram emotion training data;
and the training module is used for training the preprocessed electroencephalogram emotion training data into an electroencephalogram emotion classification model through the method of the first aspect.
In a fourth aspect, the present application further provides an electroencephalogram emotion classification apparatus, the apparatus including:
the acquisition module is used for acquiring electroencephalogram emotion target data acquired by electroencephalogram acquisition equipment;
the test module is used for inputting the electroencephalogram emotion target data into the electroencephalogram emotion classification model, the electroencephalogram emotion classification model is obtained by training in advance based on the method of the first aspect, and classification results of the electroencephalogram emotion classification model are output.
In a fifth aspect, the present application also proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect.
In summary, the present application at least includes the following beneficial technical effects:
1. by removing domain-specific, label-irrelevant features, the method and the device provide a new perspective for extracting domain-specific label-relevant features and shared, domain-invariant label-relevant features;
2. the sparse multi-head attention neural network is used for extracting the basic features; it overcomes the limitation that convolution, used as a filter, attends only to adjacent channels, and it frees limited computational memory;
3. the method provided by the application achieves state-of-the-art performance in extensive experiments on various benchmarks for domain-agnostic learning.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the application. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. The elements of the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding similar parts.
Fig. 1 is a diagram of an electroencephalogram emotion classification model structure based on mutual information driving in an embodiment of the present application.
Fig. 2 is a schematic flowchart of a method for generating an electroencephalogram emotion classification model based on mutual information driving in an embodiment of the present application.
FIG. 3 is a data flow diagram illustrating the feature segmentation of the key features based on the time dimension and the channels in an embodiment of the present application.
FIG. 4 is a schematic visualization of three features in an embodiment of the present application.
Fig. 5 is a schematic flow chart of an electroencephalogram emotion classification method in an embodiment of the present application.
Fig. 6 is a schematic block structure diagram of an electroencephalogram emotion classification model generation device based on mutual information driving in an embodiment of the present application.
Fig. 7 is a schematic block structure diagram of an electroencephalogram emotion classification apparatus in an embodiment of the present application.
FIG. 8 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
For clarity of the embodiments of the present application, mutual information and multi-source domain adaptive learning are described as follows:
mutual information measures the degree of dependence between two random variables X and Z, i.e. the information they share; it can also be understood as the reduction in the uncertainty of X given Z:
I(x;z) := H(x) - H(x|z);
where H(x) is the Shannon entropy, representing the uncertainty of the event x, and H(x|z) is the conditional entropy of x given z. When x and z are identical, H(x|z) is 0 and the mutual information reaches its maximum H(x); when x and z are completely independent, H(x|z) increases and the mutual information decreases. Mutual information is non-negative, and it can also be expressed as:
I(x;z) = D_KL( P(x,z) || P(x)·P(z) );
wherein P(x,z) represents the joint distribution of x and z, P(x) and P(z) represent their marginal distributions, and D_KL represents the Kullback-Leibler divergence. In a self-encoding neural network, the mutual information between the features and the inputs is minimized, and the mutual information between the features and the labels is maximized. I(x;z) can be regarded as the encoder term: the encoder wants to compress the source as much as possible to obtain a good representation; I(y;z) is treated as the decoder term, which wants to ensure that the compressed representation remains consistent with the label, i.e. information of the source X about Y must not be lost. Mutual information can thus strengthen the feature correlation between different layers of a neural network or reduce the correlation between different layers.
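The relationship between the two expressions above can be checked numerically. The following sketch (not part of the patent; the 2x2 joint distribution is made up for illustration) verifies that H(x) - H(x|z) and the KL form give the same value for a toy discrete distribution.

```python
# Illustrative sketch: I(x;z) = H(x) - H(x|z) equals D_KL(P(x,z) || P(x)P(z))
# for a toy 2x2 joint distribution (the numbers are made up).
import numpy as np

p_xz = np.array([[0.4, 0.1],
                 [0.1, 0.4]])          # joint distribution P(x, z)
p_x = p_xz.sum(axis=1)                 # marginal P(x)
p_z = p_xz.sum(axis=0)                 # marginal P(z)

h_x = -np.sum(p_x * np.log(p_x))                      # H(x)
h_x_given_z = -np.sum(p_xz * np.log(p_xz / p_z))      # H(x|z)
mi_entropy = h_x - h_x_given_z                        # I(x;z) = H(x) - H(x|z)

mi_kl = np.sum(p_xz * np.log(p_xz / np.outer(p_x, p_z)))  # KL form

print(mi_entropy, mi_kl)   # both are approximately 0.193 nats
```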
Most multi-source domain adaptation methods adopt multi-source domain generalization learning. Generalized learning over multi-source data relies on data sets drawn from different domains and aims to learn a model that generalizes; during training, the target-domain data does not need to be accessed. A shared feature extractor is used to learn domain-invariant features, but domain invariance may impair the semantic discrimination ability. Another approach employs non-shared feature extractors and classifiers; although non-shared feature extractors can better align the features of the target and source domains in the latent space, this greatly increases the number of parameters in the model, since a very large number of feature extractors and classifiers are required. Domain adaptation aims at transferring knowledge learned in the source domain to a target domain where labeled data is sparse or non-existent, and typically involves a domain-invariant representation. Maximum mean discrepancy (MMD) is used in most methods to minimize the difference in feature distributions between different domains; many adversarial strategies, such as gradient reversal layers and domain adversarial classifiers, have been successfully used for domain adaptation tasks. In the prior art, a one-to-one adaptation method has been proposed which, based on mutual information, embeds and disentangles an image into three kinds of features through an autoencoder network: class-independent features, domain-specific class-related features and domain-invariant class-related features. Compared with DAL, MutualIS simplifies the embedding learning process by omitting the decoder and obtains enhanced semantic features by maximizing the mutual information between global and local features.
The application mainly provides a framework that, by estimating the upper and lower bounds of the mutual information between two kinds of features, divides the entangled electroencephalogram features, as far as possible, into individualized, classification-irrelevant features and common, classification-relevant features. Generalization is then performed through the common, classification-relevant feature extractor model, thereby improving the accuracy of electroencephalogram emotion classification in unknown domains.
The data {x_i, y_i} of each subject is treated as the data of one domain, where x_i is a multi-channel time series and y_i is the corresponding label. D_k = {x_i^k, y_i^k} represents the data of the k-th source subject. The objective of the study is, based on the multi-source data {D_1, ..., D_K}, to train a shared feature extraction model F_c, a personalized feature extraction model F_s, and classifiers C_c and C_s based on these two kinds of features. With these two feature extractors and classifiers, labels are predicted for {x_t}, where {x_t} is unlabeled, time-series-based data. To solve this problem, the method removes useless features from the extracted features by minimizing the mutual information between class-related and class-unrelated features.
FIG. 1 shows the structure of the electroencephalogram emotion classification model based on mutual information driving in the application: E1 is the basic feature extractor, and F_1, the output of E1, is a highly entangled feature. The disentangler D1 is responsible for separating the class-independent features F_ir from the class-related features F_re. The class-related features F_re obtained from the basic features F_1 are fed into the global feature extractor E2, whose goal is to convert the high-dimensional features into low-dimensional features, which are finally used by the emotion classifier C. The whole process consists of two data streams: the first runs from the raw data to the class-related features, and the second from the raw data to the predicted labels. The innovation of the present application is to apply mutual information minimization to the disentanglement of class-related and class-independent features, and to apply mutual information maximization between the class-related features and the global features, retaining as many details beneficial to classification as possible.
Referring to fig. 2, fig. 2 shows a flowchart of a method for generating an electroencephalogram emotion classification model based on mutual information driving according to the present application, and referring to fig. 2, the method specifically includes the following steps:
s101: acquiring electroencephalogram emotion training data;
in a specific embodiment, S101 specifically includes: 32 participants (half of them male) took part, with a mean age of 23.27 and a variance of 2.37. Each participant watched 40 one-minute music videos and evaluated his or her emotional state for one minute after each viewing. The assessment dimensions include valence, arousal, liking, dominance and familiarity. Each rating lies between 1 and 9, with the value representing the strength of the corresponding dimension from weak to strong; for example, low valence represents sadness and high valence represents happiness. During data acquisition, a physiological signal recorded while the subject closed their eyes for three seconds before watching the video is also recorded, called the baseline signal. The electroencephalogram acquisition device is a BioSemi ActiveTwo system; the electroencephalogram signals are acquired over 32 channels with a sampling frequency of 512 Hz. Other physiological signals include electromyography (EMG) and electrooculography (EOG) signals. The acquired electroencephalogram data were processed (EOG artifacts removed, 4.0-45.0 Hz filtering) and sampled at 128 Hz. Data from 31 subjects serve as the source data and the remaining one as the target data.
S102: carrying out data preprocessing on the electroencephalogram emotion training data, then carrying out alpha, beta, theta and gamma four-frequency-band filtering on the preprocessed electroencephalogram emotion training data, and combining the filtered electroencephalogram emotion training data with the original electroencephalogram emotion training data to form electroencephalogram input data;
in a specific embodiment, the data preprocessing of the electroencephalogram emotion training data specifically consists of filtering the data at 4-54 Hz so as to eliminate irrelevant components.
Electroencephalogram signals are chaotic, disordered electrical signals, and it is difficult to obtain sufficient frequency-domain information with a simple one-dimensional convolution. The frequency-domain, time-domain and spatial information of the electroencephalogram signal therefore needs to be re-represented as fully as possible. The input is formed by filtering the signal into the four frequency bands theta, alpha, beta and gamma and combining the filtered signals with the original signal. The input data is of size T × C × 5, where C represents the number of channels and T represents the length of the time series. The input data is specifically:
X_input = [X_orig, X_θ, X_α, X_β, X_γ] ∈ R^(T×C×5).
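As an illustration of S102, the following sketch builds such a T × C × 5 input from a raw (T, C) recording. The band edges (theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz, gamma 30-45 Hz) are not specified in the text and are assumed here only for the example; the filtering itself uses standard SciPy band-pass filters.

```python
# Sketch of the band-filtering step (S102). Band edges and sampling rate are
# illustrative assumptions, not values taken from the patent.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, low, high, fs, order=4):
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=0)           # filter along the time axis

def build_input(raw, fs=128):
    """raw: (T, C) array -> (T, C, 5) tensor of [raw, theta, alpha, beta, gamma]."""
    bands = [(4, 8), (8, 13), (13, 30), (30, 45)]
    stacked = [raw] + [bandpass(raw, lo, hi, fs) for lo, hi in bands]
    return np.stack(stacked, axis=-1)           # shape (T, C, 5)

x_input = build_input(np.random.randn(128, 32))  # e.g. 1 s of 32-channel EEG
print(x_input.shape)                             # (128, 32, 5)
```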
s103: performing two-dimensional convolution on the electroencephalogram input data by using a two-dimensional convolution network to obtain initial characteristics of an electroencephalogram signal;
in a specific embodiment, in S103, the performing, by using a two-dimensional convolution network, two-dimensional convolution on electroencephalogram input data specifically includes:
Em_or = Conv2D(X_input);
wherein Em_or is the initial feature of the electroencephalogram signal and X_input is the electroencephalogram input data.
In order to better preserve the correlation between channels and retain the time-series information, and unlike natural language processing (NLP) embeddings, the electroencephalogram embedding uses a two-dimensional convolution whose kernel size is (1, 5C). To feed the electroencephalogram representation into the attention model, a specially designed convolution kernel reduces the number of channels to 1, so that the features remain two-dimensional and the final electroencephalogram representation has shape (M, T).
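A minimal sketch of this embedding step is given below. The reshaping of the (T, C, 5) input into a single-channel image of width 5C is an assumption made so that one Conv2d kernel of size (1, 5C) can span all channels and bands at each time step; M such kernels then yield the (M, T) representation Em_or.

```python
# Sketch of the EEG embedding (S103); the input layout is an assumption.
import torch
import torch.nn as nn

T, C, M = 128, 32, 64
x_input = torch.randn(1, T, C, 5)                 # (batch, T, C, 5)
x_img = x_input.reshape(1, 1, T, C * 5)           # (batch, 1, T, 5C)

embed = nn.Conv2d(in_channels=1, out_channels=M, kernel_size=(1, C * 5))
em_or = embed(x_img).squeeze(-1)                  # (batch, M, T)
print(em_or.shape)                                # torch.Size([1, 64, 128])
```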
S104: inputting the initial features of the electroencephalogram signals into a feature extractor based on a self-attention mechanism, and outputting the electroencephalogram features based on attention;
in S104, the feature extractor based on the self-attention mechanism includes a plurality of attention blocks using a multi-head attention mechanism, where the multi-head attention mechanism is specifically:
Sparsity(q_i, K) = ln( Σ_{j=1..L_k} e^(q_i·k_j^T/√d) ) - (1/L_k)·Σ_{j=1..L_k} (q_i·k_j^T/√d);
A = Softmax( Q_s·K^T/√d )·V;
Em_a = A*Em_or;
wherein M is the number of two-dimensional convolution kernels, K represents the key vectors, V represents the value vectors, Q is the query vectors, Q_s is a coefficient matrix containing M features and having the same size as Q, L_k is the length of the electroencephalogram data, q_i represents the i-th query vector, k_j represents the j-th key vector, and q_i·k_j^T represents the inner product of the vectors q_i and k_j.
The feature extractor based on the self-attention mechanism is described in detail below:
after the initial features of the electroencephalogram signals are acquired, a basic feature extractor needs to be designed. This feature extractor is expected to condense the electroencephalogram sequence by emphasizing the important time points of the sequence and to reduce the length of the electroencephalogram signal as quickly as possible. The basic feature extractor consists of five attention blocks; each attention block uses a multi-head attention module together with convolution and pooling layers along the time dimension. To reduce memory pressure, a sparse attention method is adopted.
The sparse attention mechanism is an improvement on the traditional attention mechanism. In the traditional method, the input feature X is multiplied by three parameter matrices to obtain the query vectors Q, the key vectors K and the value vectors V; the query and key vectors are multiplied to obtain the attention values, and the value vectors are then weighted and summed according to these attention values.
Q = X·W_Q, K = X·W_K, V = X·W_V;
Attention(Q, K, V) = Softmax( Q·K^T/√d )·V;
where X is the input feature, Attention is the normalized attention value, and W_Q, W_K and W_V are the parameters to be trained. Because the attention distribution of most queries is sparse, a sparsity score is computed for each query as the KL divergence between its attention distribution and the uniform distribution over the L_k keys. The sparsity of the attention of a query is:
Sparsity(q_i, K) = ln( Σ_{j=1..L_k} e^(q_i·k_j^T/√d) ) - (1/L_k)·Σ_{j=1..L_k} (q_i·k_j^T/√d);
wherein L_k is the length of the electroencephalogram data; the first term is the log-sum-exp of the inner products of q_i with all keys, and the second term is their arithmetic mean. Because this is still relatively expensive to compute, a fixed number of keys are randomly sampled to obtain an approximate estimate; since the dimension of the electroencephalogram features is 64, the number of sampled keys is set to 16 here.
Based on the sparsity estimates of all queries, the top queries are selected for the attention computation, where Q_s contains M features and is a coefficient matrix of the same size as Q, with M < d:
A = Softmax( Q_s·K^T/√d )·V;
for the remaining queries the attention is not computed, and the mean of the values is used directly. The final attention A is merged and dot-multiplied with the original features.
Em_a=A*Em_or;
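The following sketch illustrates the query-sampling attention described above: the sparsity score of each query is the log-sum-exp of its scaled inner products with a random subset of keys minus their arithmetic mean, full attention is computed only for the selected queries, and the remaining outputs reuse the mean of the value vectors. The selection rule (taking the queries with the largest scores) and all sizes other than the 16 sampled keys are illustrative assumptions.

```python
# Sketch of the sparse attention for one head; not the patent's exact code.
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, n_sample=16, n_top=16):
    """q, k, v: (L, d) tensors for one head."""
    L, d = q.shape
    scale = d ** 0.5

    # sparsity measure on a random subset of keys
    idx = torch.randperm(L)[:n_sample]
    scores_sub = q @ k[idx].T / scale                       # (L, n_sample)
    sparsity = torch.logsumexp(scores_sub, dim=-1) - scores_sub.mean(dim=-1)

    # full attention only for the selected queries (assumed: largest scores)
    top = torch.topk(sparsity, n_top).indices
    attn = F.softmax(q[top] @ k.T / scale, dim=-1)          # (n_top, L)

    out = v.mean(dim=0).repeat(L, 1)                        # default: mean of V
    out[top] = attn @ v                                     # refined rows
    return out

q = k = v = torch.randn(128, 64)
print(sparse_attention(q, k, v).shape)                      # torch.Size([128, 64])
```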
S105: inputting the attention-based electroencephalogram features into one-dimensional convolution and pooling for low-dimensional feature mapping, thereby obtaining key features based on time dimensions and channels;
in a specific embodiment, the low-dimensional feature mapping of S105 specifically includes:
atten_j = MaxPool(ReLU(Conv1d(Em_a)));
where j denotes the j-th attention block. Em_a is the channel matrix. MaxPool is one-dimensional max pooling whose goal is to reduce the sample length; after the five blocks the feature length becomes T/2^5. ReLU denotes the activation function and Conv1d denotes the one-dimensional convolution.
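A minimal sketch of one such block's time-dimension mapping, with illustrative channel sizes:

```python
# One attention block's Conv1d + ReLU + MaxPool mapping (S105); five such
# blocks would reduce the length T to T/2^5. Channel sizes are assumptions.
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv1d(in_channels=64, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2),        # halves the time length
)
em_a = torch.randn(1, 64, 128)           # (batch, channels, T)
print(block(em_a).shape)                 # torch.Size([1, 64, 64])
```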
S106: performing feature segmentation on the key features based on the time dimension and the channels, and then further reducing the dimension of the segmented features by using a global feature extractor so as to obtain global features;
in S106, the performing of the feature segmentation on the key features based on the time dimension and the channels specifically includes:
[F_re, F_ir] = split(Conv1d(Layer));
wherein F_re denotes the class-related features, F_ir denotes the class-independent features, split represents a segmentation function, and Conv1d represents a one-dimensional convolution;
the disentangler applies a one-dimensional convolution to the basic features, mapping the feature size from D×T to 2D×T, and then splits the high-dimensional features into two features of the same size (the class-related features F_re and the class-independent features F_ir); the specific data flow can be seen in Fig. 3.
The further dimension reduction of the segmented features by using the global feature extractor specifically includes:
F_g^l = w^l·F_g^(l-1) + b^l;
wherein l denotes the neural network layer, w denotes the weight parameter, b denotes the bias parameter, and F_g denotes the global feature extractor.
The global feature extractor further reduces the dimensionality of the features: the width and the length of the features are reduced in two steps, and finally the two-dimensional features are reshaped into one-dimensional data.
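A minimal sketch of the global feature extractor under these assumptions (one convolution to reduce the width, one pooling step to reduce the length, then flattening):

```python
# Global feature extractor E2 sketch; layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GlobalExtractor(nn.Module):
    def __init__(self, d_in=64):
        super().__init__()
        self.reduce_width = nn.Conv1d(d_in, d_in // 2, kernel_size=1)  # step 1: width
        self.reduce_length = nn.MaxPool1d(kernel_size=2)               # step 2: length

    def forward(self, f_re):                      # f_re: (batch, D, T')
        x = torch.relu(self.reduce_width(f_re))
        x = self.reduce_length(x)
        return torch.flatten(x, start_dim=1)      # reshape to 1-D per sample

f_g = GlobalExtractor()(torch.randn(1, 64, 4))
print(f_g.shape)                                  # torch.Size([1, 64])
```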
S107: performing label prediction on the global features by using a fully-connected neural network to obtain a training initial model; thereby normalizing all tags.
In S107, the performing label prediction on the global features by using the fully-connected neural network specifically includes:
Pre = Softmax( F_g(f_re)·W + b );
wherein Softmax is a normalization function, Softmax(z_i) = e^(z_i)/Σ_j e^(z_j), F_g is the global feature extractor, f_re denotes the class-related features, W is the mapping parameter, and b is the bias parameter.
S108: and optimizing the training initial model by using a loss function to obtain an electroencephalogram emotion classification model.
In S108, the loss function is specifically:
loss = α·L1 + β·L2 + γ·L3;
L1 = -E_(x_s, y_s)[ Σ_{k=1..K} y_k·log(Pre_k) ];
where α, β and γ are weighting coefficients, x_s and y_s represent the domain data from the different subjects and the corresponding labels, respectively, K represents the number of label categories, E represents expectation, and Pre_k represents the predicted value;
L2 = min_{E1,E2,C,D1} max_{M1} I_θ(f_ir; f_re);
I_θ(X;Y) = E_p[-sp(T_θ(x,y))] - E_q[sp(T_θ(x,y'))];
sp(z) = log(1 + e^z);
wherein I_θ is the mutual information estimate, x and y are the corresponding variables whose mutual information is to be estimated, T_θ is a neural network parameterized by θ, y' is the variable y randomly shuffled along the batch axis, E_p denotes expectation under the joint distribution p, E_q denotes expectation under the product of the marginal distributions q, min denotes minimizing the mutual information by optimizing the corresponding network parameters, and max denotes maximizing the mutual information by optimizing the corresponding network parameters.
L3 = loss_local + loss_global;
loss_global = -I_θ(F_re; F_2);
loss_local = -(1/(N·h)) Σ_{i=1..N} Σ_{j=1..h} I_θ(F_re^(i,j); F_2);
wherein F_2 is the output of the last neural network layer before classification, F_re denotes the classification-related features, loss_global is the mutual information computed after both are converted into one-dimensional features by the introduced M2 network, loss_local is the fine-grained mutual information computed with F_re after F_2 is copied and mapped from one-dimensional to two-dimensional data by the introduced M3 network, and N and h respectively denote the length and width of the features.
The loss function loss of the present application is described in detail below:
in order to solve the electroencephalogram emotion recognition problem, all samples in the application are classified into two categories, and cross entropy is used to retain the semantic information:
L1 = -E_(x_s, y_s)[ Σ_{k=1..K} y_k·log(Pre_k) ];
wherein f_re denotes the class-related features, x_s and y_s represent the domain data from the different subjects and the corresponding labels, K represents the number of label categories, E represents expectation, and Pre_k represents the predicted value.
The mutual information of the variables x and y can be expressed as the Kullback-Leibler (KL) divergence between the joint distribution P and the product of the marginal distributions Q. The lower bound of the mutual information can then be re-expressed as:
I(x, y) = D_KL(P || Q) = sup E_P[T_θ(x,y)] - log E_Q[e^(T_θ(x,y'))];
wherein P represents the joint distribution of the two variables, Q represents the product of their marginal distributions, and T_θ is a neural network parameterized by θ. Samples from the product of the marginals Q are obtained by shuffling the samples of the joint distribution along the batch axis. Since the application is only concerned with maximizing the mutual information, its precise value is not required when training the neural network. The Deep InfoMax model uses the Jensen-Shannon (JS) divergence as an alternative, so the formula is rewritten as follows:
I θ (X;Y)=E p [-sp(T θ (x,y))]-E q [spT θ (x,y′)];
wherein sp(z) = log(1 + e^z) is the softplus function. In order to obtain the joint and marginal distributions, a network M that takes the two variables as input is introduced to estimate the mutual information between f_ir and f_re, and large mutual information values estimated by M are penalized, so that the two factors f_ir and f_re are decomposed with minimal information overlap:
L2 = min_{E1,E2,C,D1} max_{M1} I_θ(f_ir; f_re);
in the process, a two-layer fully-connected network M1 is introduced to calculate the time I of the mutual information of the related characteristic and the unrelated characteristic θ (X; Y) the mutual information of the two variables can be maximized by maximizing the mutual information under the estimation, but the patent aims to minimize the mutual information between the two variables, so gradient inversion is used in M1, and the minimum mutual information of the two variables is obtained by optimizing the model feature extractor E1, the feature extractor E2 and the classifier C based on the maximization of M1.
To enrich feature representation learning, the proposed model maximizes mutual information at multiple levels, reducing the information loss between the global features and the basic features. In particular, following the Deep InfoMax (DIM) algorithm, two sub-networks, denoted M_loc and M_glb (M3 and M2 in Fig. 1, respectively), are designed. These two sub-networks maximize the mutual information between the class-related features f_re and the final features f_2. The M_loc network estimates the mutual information between f_re and F_2 at the fine-grained level:
loss_local = -(1/(N·h)) Σ_{i=1..N} Σ_{j=1..h} I_θ(F_re^(i,j); F_2);
where F_re^(i,j) denotes the (i,j)-th local feature block of the class-related features, and F_2 is mapped to each feature block by replication. At the same time, the network M_glb enhances the mutual information between f_re and F_2 at the coarse-grained level:
loss_global = -I_θ(F_re; F_2);
L3=loss local +loss global
wherein F_2 is the output of the last neural network layer before classification and F_re denotes the classification-related features. loss_global converts both into one-dimensional features through the introduced M2 network and then computes their mutual information; loss_local copies F_2 and maps it from one-dimensional to two-dimensional data through the introduced M3 network and then computes fine-grained mutual information with F_re, where N and h respectively denote the length and width of the features. The main objective is to optimize E1, D1, E2 and M3 so as to maximize the lower bound of the mutual information between F_2 and F_re.
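A compact sketch of the two L3 terms under the same assumptions: one statistics network (standing in for M2) scores the flattened class-related features against the final features, and another (standing in for M3) scores F_2, copied to every position, against each column of F_re; negating the scores makes minimizing loss_l3 equivalent to maximizing the mutual information. All sizes are illustrative.

```python
# Sketch of the local/global mutual information maximization (L3); not the
# patent's exact networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

def jsd_score(net, a, b):
    b_shuf = b[torch.randperm(b.size(0))]
    return (-F.softplus(-net(torch.cat([a, b], -1))).mean()
            - F.softplus(net(torch.cat([a, b_shuf], -1))).mean())

D, T, DG, B = 64, 4, 64, 8
f_re = torch.randn(B, D, T)                                  # class-related features
f2 = torch.randn(B, DG)                                      # final features

m_glb = nn.Sequential(nn.Linear(D * T + DG, 128), nn.ReLU(), nn.Linear(128, 1))
m_loc = nn.Sequential(nn.Linear(D + DG, 128), nn.ReLU(), nn.Linear(128, 1))

loss_global = -jsd_score(m_glb, f_re.flatten(1), f2)         # coarse-grained term
loss_local = -torch.stack([jsd_score(m_loc, f_re[:, :, i], f2)   # F_2 copied to
                           for i in range(T)]).mean()            # each position
loss_l3 = loss_local + loss_global
```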
The final loss function can be expressed as:
loss=α·L1+β·L2+γ·L3;
where α, β and γ are the weight coefficients of the three losses.
The embodiment of the application also performs a performance verification experiment on the electroencephalogram emotion classification model based on mutual information driving.
Two data sets are used in the performance verification experiments. One is the electroencephalogram emotion training data set of S101 of the application, referred to as the DEAP data set in the following experiments; the other is referred to as the AMIGOS data set, which is as follows:
the AMIGOS data set consists of 40 participants who were asked to watch 17 short and 3 long videos; 37 participants watched all the videos, 17 subjects watched alone, and the remaining 20 watched in groups of five. The data collection paradigm is the same as for the DEAP data set: the video is watched first to evoke emotion, and after watching, the subjects score it according to their own experience. The acquisition device is an Emotiv EPOC with 14 channels.
Meanwhile, after the experiment, the corresponding viewing videos were cut into 20-second samples and another group of people was asked to annotate them. The labels of these 20-second electroencephalogram segments are therefore not self-assessments but external annotations, referred to as external labels. The purpose of the external annotation is to verify whether the self-assessments and the external assessments agree. The results show that people are more likely to express their emotional state in a social setting than when alone, and that there is a high correlation between the external and internal labels. The experiments use the internal labels as the test labels.
In the verification experiments, both data sets were segmented with a non-overlapping 1 s sliding window, avoiding leakage of test information. Valence and arousal are the most common verification dimensions; the label threshold for both data sets is set to 5, so that emotion classification becomes a binary classification problem. During preprocessing, the mean of the baseline is subtracted from all data; in the DEAP data set the baseline data are the first three seconds of electroencephalogram signals, and in the AMIGOS data set the baseline data are the first 5 s of each electroencephalogram segment. Both data sets are normalized per channel (a sample-preparation sketch is given after the three scenarios below). There are three verification schemes:
scenario 1 (learning across subjects): in this paradigm, ten cross-validations are applied, taking into account all the data under test, including the source data and the target data. This paradigm is mainly to demonstrate the feature extraction capability of the design framework.
Scenario 2 (zero-shot learning): zero-shot learning here is leave-one-subject-out verification. Among all subjects, one is selected as the target data and the rest are used as source data. This paradigm suits new users of wearable devices: although continuous target data can be obtained, no labels are available.
Scenario 3 (subject-dependent learning): this is designed for individual electroencephalogram emotion classification, with a few labels used for model calibration. The application compares the training time of transfer based on the pretrained model with the training time without the underlying model.
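The sample-preparation sketch referred to above; the sampling rate and array shapes are illustrative assumptions, while the baseline-mean subtraction, per-channel normalization, non-overlapping 1 s windows and the threshold of 5 follow the text.

```python
# Sketch of the sample preparation used in the verification experiments.
import numpy as np

def prepare_trial(eeg, baseline, rating, fs=128):
    """eeg: (T, C) trial data, baseline: (Tb, C) pre-trial data, rating: scalar 1-9."""
    eeg = eeg - baseline.mean(axis=0)                           # subtract baseline mean
    eeg = (eeg - eeg.mean(axis=0)) / (eeg.std(axis=0) + 1e-8)   # per-channel normalization
    n_win = eeg.shape[0] // fs
    windows = eeg[: n_win * fs].reshape(n_win, fs, -1)          # non-overlapping 1 s windows
    label = int(rating > 5)                                     # binary valence/arousal label
    return windows, np.full(n_win, label)

x, y = prepare_trial(np.random.randn(60 * 128, 32), np.random.randn(3 * 128, 32), 6.5)
print(x.shape, y.shape)                                         # (60, 128, 32) (60,)
```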
The experiments were run on a 1080Ti GPU under Linux. The hyperparameters are the same in all three scenarios, with α, β and γ set to 0.3, 0.4 and 0.3, respectively. The entire model has three attention modules and two-dimensional convolution layers. Training for scenario 1 and scenario 3 ran for approximately 200 epochs.
Performance verification results of DEAP dataset:
in the performance validation experiments on the DEAP data set, a comparison is made with the following baseline methods. MDFV starts from time-domain, frequency-domain and spatial features and selects a group of effective mixed features through a carefully designed experiment, thereby improving the emotion classification accuracy. DANN-EEG is inspired by DANN; its framework includes a feature extractor, a classifier and a discriminator, and the feature extractor is trained against the discriminator so that the features extracted in different domains fool the discriminator, leaving it no way to distinguish the target domain from the source domain. MutualIS uses a disentanglement framework, separates label-related and label-unrelated features by minimizing mutual information, and uses the EEGNet model as its basic framework.
In verification scenario 1, under cross-subject validation, the proposed method achieves higher accuracy than the other methods, which proves that it is sufficient to extract common features. In verification scenario 2, the accuracy of the proposed method improves by about 2% on valence and about 4% on arousal; the main difficulty is overfitting, and parameters need to be set to suppress it, after which the final result is superior to the other methods.
For scenario 3, the application is compared with the most advanced methods for subject-dependent or few-label electroencephalogram emotion classification. Samarth uses a deep yet simple convolutional neural network to improve the emotion classification accuracy on the DEAP data set. TLDR improves on SincNet and proposes a SincNet-based classifier, SincNet-R, consisting of three convolutional layers and three deep neural network (DNN) layers, to test the classification accuracy and robustness on emotional electroencephalogram signals. SFA-FSL proposes a single-source domain-adaptive few-shot learning network (SDA-FSL); a CBAM-based feature mapping module is designed to extract features common to both domains, and a domain adaptation module aligns the data distributions of the two domains. Wu et al. improve the accuracy of electroencephalogram emotion recognition by exploring the functional connectivity of brain regions and using strength, clustering coefficient and eigenvector centrality.
Table 6.1: scene 1-based cross-human brain electrical emotion classification
Table 6.2: individual-independent emotion electroencephalogram classification based on scene 2
Table 6.3: individual-dependent electroencephalogram emotion classification comparison result of test scene 3
Overall, the method proposed in the present application achieves the best results in all three scenarios. In particular, in the third scenario, the classification accuracy is very high (97.3% and 92.5%) even with only 2 minutes of data as training data.
Performance validation results for amigo dataset:
there were 40 subjects in the amigo dataset, only 37 subjects completed all the trials, 9 subjects had abnormal data during the filtration process, and the final test data was 28 subjects. In sample use, the present application uses only the last 60 seconds of data, where the sample's label uses an internal label in the present application. The results of the experiment on the Amigos dataset in scenario 1 are shown in table 6.4.DCNN used a 2d-cnn based approach performed fairly poorly across disciplines, lasting even 20 seconds with training and testing data, with a final maximum accuracy of 74.65%. All methods were trained 500 times. The results of scenario 2 are shown in table 4:
table 6.4: cross-trial amigo comparison results for scenario 1
Table 6.5: comparison results independent of amigo under test based on scenario 2
Table 6.6: comparison results of Individual-dependent Amigos sentiment Classification of test Scenario 3
Referring to Fig. 4, to demonstrate that the disentanglement model obtains domain invariance, t-SNE is used to visualize three kinds of features: the class-related features, the class-independent features and the global features. Note that, in the comparison of the class-related and global feature distributions, the global features are enriched by the local and global mutual information maximization.
The electroencephalogram emotion classification model generation method based on mutual information driving has the following beneficial effects:
(1) By removing the domain-specific, label-irrelevant features, a new perspective is provided for extracting domain-specific label-relevant features and shared, domain-invariant label-relevant features. Meanwhile, in the feature separation process, adversarial learning is replaced by mutual information.
(2) A sparse multi-head attention neural network is used for basic feature extraction, which overcomes the limitation of convolution as a filter that attends only to adjacent channels and frees limited computational memory.
(3) The proposed method achieves state-of-the-art performance in extensive experiments on various benchmarks for domain-agnostic learning.
Referring to fig. 5, in a second aspect, the present application further provides an electroencephalogram emotion classification method, including the following steps:
s201: acquiring electroencephalogram emotion target data to be classified;
s202: inputting the electroencephalogram emotion target data into an electroencephalogram emotion classification model, wherein the electroencephalogram emotion classification model is obtained by training in advance based on the method of the first aspect;
s203: and outputting the classification result of the electroencephalogram emotion classification model.
In a third aspect, further referring to fig. 6, as an implementation of the method described above, the present application provides an embodiment of an electroencephalogram emotion classification model generation apparatus based on mutual information driving, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1, and the system may be specifically applied to various electronic devices.
The electroencephalogram emotion classification model generation apparatus based on mutual information driving provided by the application specifically includes:
an acquisition module 301, configured to acquire electroencephalogram emotion training data;
the preprocessing module 302 is used for preprocessing data of the electroencephalogram emotion training data;
a training module 303, configured to train the preprocessed electroencephalogram emotion training data into an electroencephalogram emotion classification model through the method of the first aspect.
In a fourth aspect, referring to fig. 7, the present application further provides an electroencephalogram emotion classification apparatus, including:
an acquisition module 401, configured to acquire electroencephalogram emotion target data acquired by an electroencephalogram acquisition device;
a testing module 402, configured to input the electroencephalogram emotion target data into an electroencephalogram emotion classification model, where the electroencephalogram emotion classification model is obtained by training in advance based on the method of the first aspect, and outputs a classification result of the electroencephalogram emotion classification model.
Referring now to FIG. 8, shown is a block diagram of a computer system 100 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the computer system 100 includes a Central Processing Unit (CPU) 101 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 102 or a program loaded from a storage section 108 into a Random Access Memory (RAM) 103. In the RAM 103, various programs and data necessary for the operation of the system 100 are also stored. The CPU 101, ROM 102, and RAM 103 are connected to each other via a bus 104. An input/output (I/O) interface 105 is also connected to bus 104.
The following components are connected to the I/O interface 105: an input portion 106 including a keyboard, a mouse, and the like; an output section 107 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 108 including a hard disk and the like; and a communication section 109 including a network interface card such as a LAN card, a modem, or the like. The communication section 109 performs communication processing via a network such as the internet. A drive 110 is also connected to the I/O interface 105 as needed. A removable medium 111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 110 as necessary, so that a computer program read out therefrom is mounted into the storage section 108 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 109 and/or installed from the removable medium 111. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 101.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method shown in fig. 1.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the present invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
In the description of the present application, it is to be understood that terms such as "upper", "lower", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings; they are used merely for convenience in describing the present application and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and thus should not be construed as limiting the present application. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Any reference signs in the claims shall not be construed as limiting the scope.

Claims (10)

1. A method for generating an electroencephalogram emotion classification model based on mutual information driving, characterized in that the method comprises the following steps:
s101: acquiring electroencephalogram emotion training data;
s102: carrying out data preprocessing on the electroencephalogram emotion training data, filtering the preprocessed electroencephalogram emotion training data into four frequency bands (alpha, beta, theta and gamma), and combining the filtered electroencephalogram emotion training data with the original electroencephalogram emotion training data to form electroencephalogram input data;
s103: performing two-dimensional convolution on the electroencephalogram input data by using a two-dimensional convolution network to obtain initial characteristics of the electroencephalogram signal;
s104: inputting the initial features of the electroencephalogram signals into a feature extractor based on a self-attention mechanism, and outputting electroencephalogram features based on attention;
s105: inputting the attention-based electroencephalogram features into one-dimensional convolution and pooling for low-dimensional feature mapping, thereby obtaining key features based on time dimensions and channels;
s106: performing feature segmentation on the key features based on the time dimension and the channels, and then further reducing the dimension of the segmented features by using a global feature extractor so as to obtain global features;
s107: performing label prediction on the global features by using a fully-connected neural network to obtain a training initial model;
s108: and optimizing the training initial model by using the loss function to obtain an electroencephalogram emotion classification model.
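
For orientation only, the following PyTorch-style sketch strings steps S103 to S107 together in a single module; every layer size, the electrode count, and the number of emotion classes are assumptions rather than values taken from the patent, and the mutual-information loss of S108 is omitted here (a sketch of it follows claim 6).

    # Illustrative sketch only: steps S103-S107 chained in one module; all sizes are assumed.
    import torch
    import torch.nn as nn

    class EmotionClassifier(nn.Module):
        def __init__(self, n_channels=62, n_classes=3, d_model=64):
            super().__init__()
            # S103: two-dimensional convolution over the (bands x channels x time) input
            self.conv2d = nn.Conv2d(5, d_model, kernel_size=(n_channels, 1))
            # S104: self-attention based feature extractor
            self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            # S105: one-dimensional convolution and pooling for low-dimensional mapping
            self.conv1d = nn.Sequential(nn.Conv1d(d_model, d_model, 3, padding=1),
                                        nn.AdaptiveAvgPool1d(32))
            # S106: global feature extractor applied to the class-related half after the split
            self.global_fc = nn.Sequential(nn.Flatten(), nn.Linear(d_model * 16, 128), nn.ReLU())
            # S107: fully connected label predictor
            self.classifier = nn.Linear(128, n_classes)

        def forward(self, x):                          # x: (batch, 5 bands, channels, time)
            em_or = self.conv2d(x).squeeze(2)          # initial features (batch, d_model, time)
            seq = em_or.transpose(1, 2)                # (batch, time, d_model) for attention
            em_a, _ = self.attn(seq, seq, seq)         # attention-based features Em_a
            key = self.conv1d(em_a.transpose(1, 2))    # key features over time and channels
            f_re, f_ir = key.chunk(2, dim=2)           # split into class-related / class-independent parts
            f_g = self.global_fc(f_re)                 # global features
            return self.classifier(f_g), f_re, f_ir

    model = EmotionClassifier()
    logits, f_re, f_ir = model(torch.randn(2, 5, 62, 200))   # toy forward pass
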
2. The electroencephalogram emotion classification model generation method based on mutual information driving as claimed in claim 1, characterized in that: in S103, the algorithm for performing two-dimensional convolution on the electroencephalogram input data by using the two-dimensional convolution network specifically includes:
[the convolution formula appears only as an image in the source: FDA0003830726550000011]
wherein Em_or denotes the initial feature of the electroencephalogram signal; the symbol denoting the electroencephalogram input data and one further expression likewise appear only as images in the source (FDA0003830726550000021, FDA0003830726550000022).
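
Since the convolution formula itself is not recoverable from the source, the following minimal sketch only illustrates the kind of operation claim 2 describes; the tensor shape, the number of kernels, and the kernel size are assumptions, not values taken from the patent.

    # Illustrative sketch only: the kind of two-dimensional convolution described in S103.
    import torch
    import torch.nn as nn

    x_input = torch.randn(8, 5, 62, 200)                 # electroencephalogram input data (assumed shape)
    conv2d = nn.Conv2d(in_channels=5, out_channels=16, kernel_size=(62, 1))
    em_or = conv2d(x_input)                               # Em_or: initial electroencephalogram features
    print(em_or.shape)                                    # torch.Size([8, 16, 1, 200])
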
3. The electroencephalogram emotion classification model generation method based on mutual information driving as claimed in claim 1, characterized in that: in S104, the feature extractor based on the self-attention mechanism includes a plurality of attention blocks using a multi-attention mechanism, where an algorithm of the multi-attention mechanism specifically includes:
[the two formulas of the multi-attention mechanism appear only as images in the source: FDA0003830726550000023, FDA0003830726550000024]
Em_a = A * Em_or;
wherein M is the number of two-dimensional convolution kernels, K represents a key vector, V represents a value vector, Q is a query vector, the symbol given only as image FDA0003830726550000025 is a coefficient matrix containing M features and having the same size as q, L_k is the length of the electroencephalogram data, q_i represents the i-th query vector, k_j represents the j-th key vector, and the term given only as image FDA0003830726550000026 represents the modulus of the vectors q_i and k_j.
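
Of the expressions above, only Em_a = A * Em_or is legible; a scaled dot-product attention normalised by the data length L_k is one plausible reading of the image-only formulas. The sketch below is an assumption in that spirit, not the patent's exact algorithm, and the dimensions are invented.

    # Illustrative sketch only: attention coefficients A applied to the initial features Em_or.
    import torch
    import torch.nn.functional as F

    em_or = torch.randn(8, 200, 16)                       # (batch, L_k = data length, feature dim), assumed
    W_q, W_k = torch.nn.Linear(16, 16), torch.nn.Linear(16, 16)          # query / key projections
    q, k = W_q(em_or), W_k(em_or)
    A = F.softmax(q @ k.transpose(1, 2) / (k.shape[1] ** 0.5), dim=-1)   # attention coefficient matrix A
    em_a = A @ em_or                                      # Em_a = A * Em_or, as stated in the claim
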
4. The electroencephalogram emotion classification model generation method based on mutual information driving as claimed in claim 1, characterized in that: in S106, the algorithm for performing feature segmentation on the key features based on the time dimension and the channels is specifically:
[F_re, F_ir] = split[Cov1d(Em_a)];
wherein F_re denotes the class-related features, F_ir denotes the class-independent features, split denotes the segmentation function, and Cov1d denotes a one-dimensional convolution;
the algorithm for further reducing the dimension of the segmented features by using the global feature extractor is specifically:
[the formula appears only as an image in the source: FDA0003830726550000027]
wherein l denotes the different neural network layers, w denotes a transformation parameter, b denotes a bias parameter, and F_g denotes the global feature extractor.
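
A minimal sketch of the segmentation and dimension-reduction step follows, assuming the split is an even chunk along the feature dimension and the global feature extractor (whose layer formula is image-only in the source) is a small stack of fully connected layers; all sizes are assumed.

    # Illustrative sketch only: [F_re, F_ir] = split[Cov1d(Em_a)] followed by a small global extractor.
    import torch
    import torch.nn as nn

    em_a = torch.randn(8, 16, 200)                        # attention-based features (batch, dim, length), assumed
    cov1d = nn.Conv1d(16, 16, kernel_size=3, padding=1)
    key_features = cov1d(em_a)
    f_re, f_ir = key_features.chunk(2, dim=1)             # class-related / class-independent halves
    global_extractor = nn.Sequential(                     # F_g: fully connected dimension reduction
        nn.Flatten(), nn.Linear(8 * 200, 64), nn.ReLU(), nn.Linear(64, 32))
    f_g = global_extractor(f_re)                          # global features, shape (8, 32)
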
5. The electroencephalogram emotion classification model generation method based on mutual information driving as claimed in claim 1, characterized in that: in S107, the algorithm for performing label prediction on the global features by using the fully-connected neural network is specifically:
Pre = Softmax(F_g(f_re)·W + b);
wherein Softmax is the normalization function, F_g is the global feature extractor, f_re denotes the class-related features, W is the mapping parameter, and b is the bias parameter.
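
The prediction step of claim 5 is a single affine map followed by a softmax; a minimal sketch with assumed dimensions (32 global features, 3 emotion classes) follows.

    # Illustrative sketch only: Pre = Softmax(F_g(f_re) . W + b) with assumed sizes.
    import torch
    import torch.nn as nn

    f_g_out = torch.randn(8, 32)                          # F_g(f_re): global class-related features (assumed size)
    fc = nn.Linear(32, 3)                                 # mapping parameter W and bias parameter b; 3 classes assumed
    pre = torch.softmax(fc(f_g_out), dim=1)               # predicted label distribution Pre
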
6. The electroencephalogram emotion classification model generation method based on mutual information driving as claimed in claim 1, characterized in that: in S108, the loss function is specifically:
loss = α·L1 + β·L2 + γ·L3;
[the expression for L1 appears only as an image in the source: FDA0003830726550000031]
wherein α, β and γ are weighting coefficients, x_s and y_s represent the domain data from the different subjects and the corresponding labels, respectively, K represents the different label classes, E represents expectation, and Pre_k represents the predicted value;
L2 = min_(E1,E2,C,D1) max_(M1) I_θ(f_ir; f_re);
I_θ(X;Y) = E_p[-sp(T_θ(x,y))] - E_q[sp(T_θ(x,y'))];
sp(z) = log(1 + e^z);
wherein I_θ is the mutual information, x and y are the variables between which the mutual information is to be estimated, T_θ is a neural network parameterized by θ, y' is the variable y randomly shuffled along the batch axis, E_p denotes the expectation over the joint distribution, E_q denotes the expectation over the marginal distribution, min means that the network parameters are optimized to minimize the mutual information, and max means that the network parameters are optimized to maximize the mutual information;
L3 = loss_local + loss_global;
[the expressions for loss_global and loss_local appear only as images in the source: FDA0003830726550000032, FDA0003830726550000041]
wherein F_2 is the output of the last neural network layer before classification, F_re denotes the class-related features, and N and h represent the length and width of the features, respectively; loss_global introduces an M2 network that flattens both F_2 and F_re into one-dimensional features before the mutual information between them is computed, and loss_local introduces an M3 network that copies and maps the one-dimensional F_2 to two-dimensional data and then computes fine-grained mutual information with F_re.
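
For illustration, the mutual-information term of claim 6 can be sketched with a small statistics network T_θ and the softplus sp(z) = log(1 + e^z); the L1 and loss_local/loss_global expressions survive only as images in the source, so the cross-entropy used for L1 and the zero placeholder for L3 below are assumptions, as are all layer sizes and weights.

    # Illustrative sketch only: loss = alpha*L1 + beta*L2 + gamma*L3 with a softplus-based MI estimate.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StatisticsNet(nn.Module):                       # T_theta: scores a pair (x, y)
        def __init__(self, dx, dy):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dx + dy, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, x, y):
            return self.net(torch.cat([x, y], dim=1))

    def mutual_information(T, x, y):
        y_shuffled = y[torch.randperm(y.shape[0])]        # y': y shuffled along the batch axis
        sp = F.softplus                                    # sp(z) = log(1 + e^z)
        return (-sp(T(x, y))).mean() - sp(T(x, y_shuffled)).mean()   # I_theta(X; Y) as written in the claim

    f_re, f_ir = torch.randn(8, 32), torch.randn(8, 32)   # class-related / class-independent features (toy)
    logits, labels = torch.randn(8, 3), torch.randint(0, 3, (8,))
    T = StatisticsNet(32, 32)
    alpha, beta, gamma = 1.0, 0.1, 0.1                     # weighting coefficients (assumed values)
    L1 = F.cross_entropy(logits, labels)                   # assumed form of the label-prediction loss
    L2 = mutual_information(T, f_ir, f_re)                 # minimised w.r.t. the encoders, maximised w.r.t. T
    L3 = torch.tensor(0.0)                                 # loss_local + loss_global: formulas not recoverable here
    loss = alpha * L1 + beta * L2 + gamma * L3
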
7. An electroencephalogram emotion classification method, characterized in that the method comprises the following steps:
s201: acquiring electroencephalogram emotion target data to be classified;
s202: inputting the electroencephalogram emotion target data into an electroencephalogram emotion classification model, wherein the electroencephalogram emotion classification model is obtained by training in advance based on the method of any one of claims 1-6;
s203: and outputting the classification result of the electroencephalogram emotion classification model.
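
A brief usage illustration of the classification method of claim 7 follows; the model file name, the input shape, and the assumption that the saved model object returns class scores directly are hypothetical.

    # Illustrative sketch only: classify target EEG data (S201-S203) with a previously trained model.
    import torch

    model = torch.load("eeg_emotion_model.pt")             # hypothetical file holding the trained classifier
    model.eval()
    eeg_target = torch.randn(1, 5, 62, 200)                # preprocessed, band-filtered target data (assumed shape)
    with torch.no_grad():
        scores = model(eeg_target)                         # S202: forward pass through the classification model
    print("predicted emotion class:", int(scores.argmax(dim=1)))   # S203: output the classification result
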
8. An electroencephalogram emotion classification model generation apparatus based on mutual information driving, characterized in that the apparatus comprises:
an acquisition module, configured to acquire electroencephalogram emotion training data;
a preprocessing module, configured to perform data preprocessing on the electroencephalogram emotion training data;
a training module, configured to train the preprocessed electroencephalogram emotion training data into an electroencephalogram emotion classification model through the method of any one of claims 1 to 6.
9. An electroencephalogram emotion classification device, characterized in that the device comprises:
an acquisition module, configured to acquire electroencephalogram emotion target data acquired by an electroencephalogram acquisition device;
a testing module, configured to input the electroencephalogram emotion target data into an electroencephalogram emotion classification model, wherein the electroencephalogram emotion classification model is obtained by training in advance based on the method of any one of claims 1 to 6, and to output the classification result of the electroencephalogram emotion classification model.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202211076101.8A 2022-09-02 2022-09-02 Electroencephalogram emotion classification model generation method based on mutual information driving Pending CN115444431A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211076101.8A CN115444431A (en) 2022-09-02 2022-09-02 Electroencephalogram emotion classification model generation method based on mutual information driving

Publications (1)

Publication Number Publication Date
CN115444431A true CN115444431A (en) 2022-12-09

Family

ID=84300300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211076101.8A Pending CN115444431A (en) 2022-09-02 2022-09-02 Electroencephalogram emotion classification model generation method based on mutual information driving

Country Status (1)

Country Link
CN (1) CN115444431A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190128978A (en) * 2018-05-09 2019-11-19 한국과학기술원 Method for estimating human emotions using deep psychological affect network and system therefor
US20210267474A1 (en) * 2020-03-02 2021-09-02 Wuyi University Training method, and classification method and system for eeg pattern classification model
CN114492513A (en) * 2021-07-15 2022-05-13 电子科技大学 Electroencephalogram emotion recognition method for adaptation to immunity domain based on attention mechanism in cross-user scene
CN113662545A (en) * 2021-08-09 2021-11-19 南京航空航天大学 Personality assessment method based on emotion electroencephalogram signals and multitask learning
CN113569997A (en) * 2021-08-31 2021-10-29 山东海量信息技术研究院 Emotion classification method and system based on graph convolution neural network
CN114052735A (en) * 2021-11-26 2022-02-18 山东大学 Electroencephalogram emotion recognition method and system based on depth field self-adaption
CN114732409A (en) * 2022-02-24 2022-07-12 河南大学 Emotion recognition method based on electroencephalogram signals

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOWEI ZHANG et al.: "Normalized mutual information feature selection for electroencephalogram data based on Grassberger entropy estimator", 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 18 December 2017 (2017-12-18) *
黄丽亚; 苏义博; 马捃凯; 丁威威; 宋传承: "Emotion classification with support tensor machine based on synchronized brain networks" (in Chinese), Journal of Electronics & Information Technology (电子与信息学报), no. 10, 15 October 2020 (2020-10-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116269391A (en) * 2023-05-22 2023-06-23 华南理工大学 Heart-brain coupling analysis and evaluation method and system thereof
CN116269391B (en) * 2023-05-22 2023-07-18 华南理工大学 Heart-brain coupling analysis and evaluation method and system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination