CN114330436A - Emotion recognition method based on twin network architecture and graph convolution


Info

Publication number
CN114330436A
Authority
CN
China
Legal status
Pending
Application number
CN202111617915.3A
Other languages
Chinese (zh)
Inventor
曾虹
吴琪
郑浩浩
金燕萍
潘登
徐非凡
李明明
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
2021-12-22
Filing date
2021-12-22
Publication date
2022-04-12
Application filed by Hangzhou Dianzi University
Priority to CN202111617915.3A
Publication of CN114330436A

Landscapes

  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention relates to an emotion recognition method based on a twin network architecture and graph convolution, which belongs to the technical field of electroencephalogram emotion recognition.

Description

Emotion recognition method based on twin network architecture and graph convolution
Technical Field
The invention relates to an emotion recognition method based on a twin network architecture and graph convolution, and belongs to the technical field of electroencephalogram emotion recognition.
Background
Emotion conveys information and regulates behavior in people's daily communication, work, study, and cognitive decision-making, and recognizing emotions correctly helps people grasp accurate information. Emotion is a complex psychological and physiological state that arises from the brain's response to physiological changes and plays a crucial role in our lives. In recent years, more and more research has focused on emotion recognition, not only to create affective interaction interfaces that let machines perceive human emotions, but also to assess the psychological condition of patients with neurological disorders such as Parkinson's disease, autism spectrum disorder, schizophrenia, and depression.
Emotion recognition methods fall into two main categories: those based on non-physiological signals and those based on physiological signals. Because some patients cannot express emotion through external cues such as facial expressions and body postures, and some people can deliberately disguise their emotions, measured physiological signals are often used as the source signals for emotion classification. The electroencephalogram (EEG) signal, one of the most commonly used physiological signals, offers high temporal resolution, non-invasiveness, low acquisition cost, and easy collection, and has been shown to reflect important information about human emotional states.
EEG data are not regular Euclidean data, and for such irregular brain network structures graph convolution can better capture the complex connections among channels. However, stacking too many graph convolutional layers causes over-smoothing, which degrades accuracy, so how to extract more effective features and improve emotion classification accuracy deserves careful thought. At present, methods such as the convolutional neural network (CNN), the recurrent neural network (RNN), and the support vector machine (SVM) are used for EEG emotion recognition and have achieved considerable results, but their emotion recognition accuracy remains partly insufficient.
An existing research method classifies emotions by extracting differential entropy features from EEG signals and combining them with an LSTM neural network model (publication number CN110897648A). It comprises the following steps: (1) extract 62-channel EEG signals from normal adults; (2) calculate the differential entropy (DE) of the time series to form a 62-dimensional temporal feature; (3) use the temporal features as the input of an LSTM neural network for training and learning; (4) evaluate the trained network using the average classification accuracy, standard deviation, and F1 value. That method performs well: it can effectively recognize and classify three emotions, and it finds the heterogeneity of the EEG signals of the three emotions from characteristics such as non-stationarity, nonlinearity, time-frequency behavior, and complexity, thereby distinguishing the three emotions and aiding the adjuvant treatment and recovery of various diseases. It differs from the present invention in that the present invention uses a twin network framework to train an auxiliary task and performs contrastive learning on the intermediate-layer features while the model carries out emotion classification, thereby improving the performance and generalization ability of the model and achieving a different emotion recognition accuracy.
Disclosure of Invention
Aiming at the above deficiencies, the invention provides an emotion recognition method based on a twin network architecture and graph convolution, which uses a multi-head self-attention mechanism to adaptively assign different importance to the data and extract deeper, more effective information. In addition, to further improve model accuracy, the method trains an auxiliary task with a twin network framework and performs contrastive learning on the intermediate-layer features while the model carries out emotion classification, thereby improving the performance and generalization ability of the model.
In order to achieve the purpose, the technical scheme of the invention is as follows:
A twin network architecture and graph convolution-based emotion recognition method comprises the following steps:
Acquire a data set: collect EEG signals from the 62 designated electrode positions of a subject while the subject watches movie clips, and have the subject complete a questionnaire immediately after each movie clip to report his or her emotional reaction (neutral, negative, or positive) to it;
Preprocess the data and extract features: down-sample the raw EEG signal and remove artifacts; filter the time-domain signal with a Hamming window and perform the fast Fourier transform (FFT); using every second of signal as a sliding window, compute the differential entropy features of the 62 channels over 5 frequency bands and normalize them;
Generate samples: to use temporal information efficiently, the DE features of the 62 channels over 3 s are taken as one sample, so the dimension of one input sample is 3 × 62 × 5 (time × channel × frequency band).
Generate sample pairs: traverse each sample generated in the previous step, taking it as the reference, denoted input_1; randomly select a sample under the same emotion (denoted input_2) to form a positive sample pair with input_1, and randomly select a sample under a different emotion (denoted input_3) to form a negative sample pair with input_1. That is, input_2 and input_1 in a positive pair are samples under the same emotion, while input_3 and input_1 in a negative pair are samples under different emotions.
Define a base model: take the spatio-temporal graph convolutional neural network model as the base model. The base network consists of an adaptive graph learning layer and a spatio-temporal convolution module. The adaptive graph learning layer aims to learn the connectivity of the brain network; the spatio-temporal convolution module consists of a spatio-temporal self-attention mechanism, a graph convolution, and an ordinary convolution, where the self-attention mechanism captures the dynamics in the spatial and temporal dimensions under different emotional states, the graph convolution aggregates neighboring nodes, and the ordinary convolution extracts features in the temporal dimension.
Define a twin network architecture: feed the two samples of a pair, namely input_1 and either input_2 or input_3, into the same base model in turn to generate two intermediate features embedding_1 and embedding_2, and compute the distance between embedding_1 and embedding_2. Then extract deeper features from embedding_1 with a multi-head self-attention layer and, after a fully connected layer and a softmax layer, output the probability that input_1 belongs to each category.
Define the inputs and outputs of the model: the model input is either a positive sample pair (input_1 and input_2) or a negative sample pair (input_1 and input_3). The model has two outputs, output_1 and output_2, where output_1 is the distance between the intermediate features embedding_1 and embedding_2, and output_2 is the probability that sample input_1 belongs to each class.
Define an objective function: the final objective function of the model consists of three loss functions. First, in the adaptive graph learning layer, whose aim is to learn the brain connectivity, a loss function constrains the relationship between the inter-channel feature distance and the connection strength: the farther apart the features of two channels, the weaker their connection; and since the brain's connection structure is not fully connected, an L2-norm regularization term controls the sparsity of the learned graph. Second, the contrastive loss of the twin network pulls the positive sample pairs closer together and pushes the negative sample pairs farther apart. Third, a cross-entropy loss measures the error between the predicted value for input_1 and the true sample label.
Train and test: input the generated sample pairs into the model for training. After the model is trained, each sample in the test set serves as input_1, and input_2 is obtained by randomly initializing a tensor of the same dimensions as input_1, forming an input sample pair. output_2 is used to compute the classification accuracy.
Evaluate the learning result using the average classification accuracy.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an emotion recognition method based on twin network architecture and graph convolution, which can self-adaptively endow different importance to data by utilizing a multi-head self-attention mechanism and extract deeper and more effective information; meanwhile, a twin network framework is borrowed, and the characteristics of the middle layer are utilized for comparative learning, so that the performance and the generalization capability of the model are improved. In addition, compared with other graph convolution methods, the emotion recognition accuracy of the model based on the twin network architecture is improved, and the accuracy reaches 94.78 +/-05.97%.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the emotion recognition method based on a twin network architecture and graph convolution provided by the invention;
FIG. 2 shows the spatio-temporal graph convolutional neural network model of the emotion recognition method provided by the invention;
FIG. 3 shows the experimental paradigm of the emotion recognition method provided by the invention;
FIG. 4 shows the channel positions for EEG acquisition in the emotion recognition method provided by the invention;
FIG. 5 is a comparison chart of the tests of the emotion recognition method provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-4, a method for emotion recognition based on twin network architecture and graph convolution specifically includes the following steps:
step (1) dataset acquisition
Step (1-1): 15 movie clips (containing positive, neutral, and negative emotions) were selected from the material library as the stimuli used in the experiment;
Step (1-2): the subject watches the 15 clips in each experiment; each clip is preceded by a 5-second cue, is presented for 4 minutes, is followed by 45 seconds of self-assessment, and then by 15 seconds of rest; in the self-assessment phase, the subject reports his or her emotional reaction to each movie clip by completing a questionnaire;
step (2) data preprocessing and feature extraction
Step (2-1): the original signal is down-sampled to 200 Hz, and segments contaminated by electrooculogram (EOG) and electromyogram (EMG) interference are removed;
Step (2-2): DE features are extracted for each channel over 5 frequency bands: delta band (1-3 Hz), theta band (4-7 Hz), alpha band (8-13 Hz), beta band (14-30 Hz), gamma band (31-50 Hz):
A. filtering original data by adopting a Hamming window, performing fast Fourier transform on data per second, and calculating the differential entropy of the five frequency bands;
B. The differential entropy is defined as follows:
Let $X = \{x_1, x_2, \ldots, x_n\}$, $n \geq 1$, with corresponding probabilities $p_1, p_2, \ldots, p_n$.
According to the definition of Shannon information entropy, the information content of the non-deterministic system is given by equation (1):
$H(X) = -\sum_{i=1}^{n} p_i \log p_i$ (1)
Replacing the time-domain state probability $p_i$ in the above equation with the frequency-domain power spectral density $\hat{p}_i$ defined on the basis of the fast Fourier transform yields the definition of differential entropy shown in equation (2):
$DE = -\sum_{i=1}^{n} \hat{p}_i \log \hat{p}_i$ (2)
where $\hat{p}_i$ represents the power spectral density;
Step (2-3): normalize the EEG signal with the z-score, as shown in equation (3):
$X' = \dfrac{X - \bar{X}}{S}$ (3)
where $X$ is the EEG signal on each channel, $\bar{X}$ is the mean of the EEG signal on each channel, and $S$ is the standard deviation of the EEG signal on each channel.
Step (3) sample Generation
To use temporal information efficiently, the DE features of the 62 channels over 3 s are taken as one sample, so the dimension of one input sample is 3 × 62 × 5 (time × channel × frequency band).
Step (4) sample pair generation
Traverse each sample generated in step three, taking it as the reference (denoted input_1); randomly select a sample under the same emotion (denoted input_2) to form a positive sample pair with input_1, and randomly select a sample under a different emotion (denoted input_3) to form a negative sample pair with input_1; that is, input_2 and input_1 in a positive pair are samples under the same emotion, and input_3 and input_1 in a negative pair are samples under different emotions.
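A minimal sketch of this pairing rule, assuming the samples are kept as (feature, label) tuples; the function and variable names are illustrative:

```python
import random

def make_pairs(samples):
    """samples: list of (feature, label); returns (input_1, input_2_or_3, y) triples."""
    pairs = []
    for feat, label in samples:
        same = [f for f, l in samples if l == label and f is not feat]
        diff = [f for f, l in samples if l != label]
        pairs.append((feat, random.choice(same), 1))  # positive pair, same emotion
        pairs.append((feat, random.choice(diff), 0))  # negative pair, different emotion
    return pairs
```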
Step (5) defining a base model
A spatio-temporal graph convolutional neural network model is taken as the base model. The base model consists of an adaptive graph learning layer and a spatio-temporal convolution module. The adaptive graph learning layer aims to learn the connectivity of the brain network. The spatio-temporal convolution module consists of a spatio-temporal self-attention mechanism, a graph convolution, and an ordinary convolution: the self-attention mechanism captures the dynamics in the spatial and temporal dimensions under different emotional states, the graph convolution aggregates neighboring nodes, and the ordinary convolution extracts features in the temporal dimension;
Step (5-1) adaptive graph learning
Define a non-negative adjacency matrix based on the channel features, as shown in equation (4):
$A_{pq} = g(x_p, x_q)$, $p, q \in \{1, 2, \ldots, N\}$ (4)
where $A_{pq}$ represents the connection between channel $p$ and channel $q$, i.e., the weight of the edge joining node $p$ and node $q$, and $g(x_p, x_q)$ expresses the adjacency matrix $A$ through a learned weight vector $w$; $A$ is defined as shown in equation (5):
$A_{pq} = \dfrac{\exp(\mathrm{ReLU}(w^{T}|x_p - x_q|))}{\sum_{q=1}^{N} \exp(\mathrm{ReLU}(w^{T}|x_p - x_q|))}$ (5)
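Assuming the softmax-normalized form of equation (5) above (itself a reconstruction), the adaptive graph learning layer can be sketched in PyTorch as follows; class and variable names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphLearn(nn.Module):
    """Learns a non-negative adjacency matrix A from per-channel features."""
    def __init__(self, feat_dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(feat_dim))     # learnable weight vector w

    def forward(self, x):                                # x: (n_channels, feat_dim)
        diff = (x.unsqueeze(1) - x.unsqueeze(0)).abs()   # |x_p - x_q|
        logits = F.relu(diff @ self.w)                   # g(x_p, x_q), equation (4)
        return F.softmax(logits, dim=1)                  # row-normalized A, equation (5)
```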
Step (5-2) space-time self-attention mechanism
A. Compute temporal self-attention. The states of different time slices are correlated in time, and the correlation differs from case to case; an attention mechanism is used to adaptively capture the dynamic correlation between nodes in the time dimension. Transposing the input yields $\chi^h$ with dimensions (62 × 5 × 3), and the temporal attention is defined as shown in equations (6) and (7):
$T = V_T \cdot \sigma(((\chi^h)^{T} U_1) U_2 (U_3 \chi^h) + b_T)$ (6)
$T'_{i,j} = \dfrac{\exp(T_{i,j})}{\sum_{j} \exp(T_{i,j})}$ (7)
where $T'_{i,j}$ represents the similarity of time $i$ and time $j$; $V_T$, $U_1$, $U_2$, $U_3$, $b_T$ are learnable parameters, and $\sigma$ is the sigmoid activation function.
B. Compute spatial attention. Channels at different locations influence one another, and the influence is highly dynamic; an attention mechanism is used to adaptively capture the dynamic correlation between nodes in the spatial dimension. Transposing the input yields $\chi^h$ with dimensions (62 × 5 × 3), and the spatial attention is defined as shown in equations (8) and (9):
$S = V_S \cdot \sigma((\chi^h W_1) W_2 (W_3 \chi^h)^{T} + b_S)$ (8)
$S'_{p,q} = \dfrac{\exp(S_{p,q})}{\sum_{q} \exp(S_{p,q})}$ (9)
where $S'_{p,q}$ represents the similarity of channel $p$ and channel $q$; $V_S$, $W_1$, $W_2$, $W_3$, $b_S$ are learnable parameters, and $\sigma$ is the sigmoid activation function.
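A sketch of the temporal branch (equations (6)-(7)); the spatial branch (equations (8)-(9)) is analogous up to transposition. Shapes follow the (62 × 5 × 3) input stated above, but the parameter shapes and other implementation details are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttention(nn.Module):
    def __init__(self, n_ch=62, n_band=5, n_t=3):
        super().__init__()
        self.U1 = nn.Parameter(torch.randn(n_ch))           # U_1
        self.U2 = nn.Parameter(torch.randn(n_band, n_ch))   # U_2
        self.U3 = nn.Parameter(torch.randn(n_band))         # U_3
        self.V = nn.Parameter(torch.randn(n_t, n_t))        # V_T
        self.b = nn.Parameter(torch.zeros(n_t, n_t))        # b_T

    def forward(self, chi):                                 # chi: (62, 5, 3)
        lhs = chi.permute(2, 1, 0) @ self.U1                # (3, 5)
        rhs = torch.einsum("b,cbt->ct", self.U3, chi)       # (62, 3)
        T = self.V @ torch.sigmoid(lhs @ self.U2 @ rhs + self.b)  # equation (6)
        return F.softmax(T, dim=1)                          # T'_{i,j}, equation (7)
```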
Step (5-3) spatial convolution
Compute the Laplacian matrix $L = D - A$, where $A$ is the adjacency matrix learned in step (5-1), and $D$ is the degree matrix computed from $A$, i.e., $D$ is a diagonal matrix of the same dimensions as $A$ whose diagonal element $D_{ii}$ is the sum of the $i$-th row of $A$.
A. Compute the maximum eigenvalue $\lambda_{max}$ of the Laplacian matrix $L$ and rescale $L$ by equation (10):
$\tilde{L} = \dfrac{2L}{\lambda_{max}} - I$ (10)
where $I$ is the identity matrix.
B. Recursively compute the Chebyshev polynomials according to equation (11):
$T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L})$ (11)
where $T_0(\tilde{L}) = I$ and $T_1(\tilde{L}) = \tilde{L}$.
C. Perform the graph convolution according to equation (12):
$g_\theta *_G x = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\,x$ (12)
where $g_\theta$ represents the convolution kernel, $*_G$ represents the graph convolution operation, $\theta_k$ represents the Chebyshev coefficients, obtained by learning, and $x$ is the input data.
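A sketch of steps A to C with $K$ Chebyshev orders (PyTorch). The learned adjacency is assumed symmetric so that a symmetric eigensolver applies; names are illustrative:

```python
import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    def __init__(self, in_feat, out_feat, K=3):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(K, in_feat, out_feat))  # theta_k
        self.K = K

    def forward(self, x, L):                             # x: (n_ch, in_feat), L = D - A
        lam_max = torch.linalg.eigvalsh(L).max()         # lambda_max of L
        L_tilde = 2.0 * L / lam_max - torch.eye(L.shape[0])  # rescaling, equation (10)
        Tk = [torch.eye(L.shape[0]), L_tilde]            # T_0 = I, T_1 = L_tilde
        for _ in range(2, self.K):
            Tk.append(2.0 * L_tilde @ Tk[-1] - Tk[-2])   # recursion, equation (11)
        # equation (12): sum_k theta_k T_k(L_tilde) x
        return sum(Tk[k] @ x @ self.theta[k] for k in range(self.K))
```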
Step (5-4) time convolution
At this layer, a 2D convolution is performed in the time dimension using a 3 × 1 convolution kernel with a stride of 1 and padding of 1 to preserve the input height and width.
Step (5-5) residual join and layer normalization
To alleviate the vanishing-gradient problem and help the network train better, a residual connection is added and layer normalization is performed.
Step (6) defining twin network architecture
Step (6-1) of obtaining interlayer characteristics
Feed the two samples of the sample pair generated in step four, namely input_1 and either input_2 or input_3, into the same base model in turn, generating two intermediate features embedding_1 and embedding_2 respectively.
Step (6-2) of calculating the distance between the pair of samples
Calculate the distance between the two outputs embedding_1 and embedding_2 of step (6-1) in the manner shown in equation (13):
$d = \sqrt{\sum_{c=1}^{C}\sum_{t=1}^{T}\sum_{f=1}^{F}\left(emb\_1_{ctf} - emb\_2_{ctf}\right)^2}$ (13)
where emb_1 denotes embedding_1, emb_2 denotes embedding_2, $C$ is the number of channels, $T$ is the time length, $F$ is the number of features, $emb\_1_{ctf}$ is the $f$-th feature of channel $c$ at time $t$ in embedding_1, and $emb\_2_{ctf}$ is the $f$-th feature of channel $c$ at time $t$ in embedding_2;
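Equation (13) amounts to a plain Euclidean distance over all elements of the two intermediate features, e.g. (a one-line sketch):

```python
import torch

def pair_distance(emb_1, emb_2):
    """Euclidean distance of equation (13); emb_1, emb_2: (C, T, F) tensors."""
    return torch.sqrt(((emb_1 - emb_2) ** 2).sum())
```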
step (6-3) Multi-headed self-attention layer
A. Randomly initialize a learnable position matrix $P$ and position-encode embedding_1 according to equation (14):
$X_{embedding} = embedding\_1 + P$ (14)
B. Randomly initialize 8 sets (also called 8 heads) of matrices $W_i^Q$, $W_i^K$, $W_i^V$ $(i = 0, 1, 2, 3, 4, 5, 6, 7)$ and multiply each with embedding_1 (denoted here $X_{embedding}$) to obtain 8 sets of $Q$, $K$, $V$ matrices, as shown in equations (15)-(17):
$Q_i = X_{embedding} W_i^Q$ (15)
$K_i = X_{embedding} W_i^K$ (16)
$V_i = X_{embedding} W_i^V$ (17)
$i = 0, 1, 2, 3, 4, 5, 6, 7$
C. For each head, obtain the attention weights by matrix multiplication, divide by the square root of the first dimension of the $W^K$ matrix, i.e., $\sqrt{d_k}$, apply softmax, and then multiply by $V$ to obtain the output of the attention layer, finally yielding 8 matrices $(Z_0$-$Z_7)$, as shown in equation (18):
$Z_i = \mathrm{softmax}\!\left(\dfrac{Q_i K_i^{T}}{\sqrt{d_k}}\right) V_i$ (18)
$i = 0, 1, 2, 3, 4, 5, 6, 7$
D. Splice the 8 matrices together horizontally $(Z_0, Z_1, \ldots, Z_7)$, randomly initialize a matrix $W^O$, and multiply the two to obtain the matrix $Z$, as shown in equation (19):
$Z = \mathrm{concatenate}(Z_0, Z_1, Z_2, Z_3, Z_4, Z_5, Z_6, Z_7) \cdot W^O$ (19)
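Equations (14)-(19) can be collected into one module as follows (PyTorch). Treating embedding_1 as a (seq_len, d_model) matrix, the head dimension d_k, and all names are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, seq_len, d_model, n_heads=8):
        super().__init__()
        d_k = d_model // n_heads
        self.P = nn.Parameter(torch.randn(seq_len, d_model))         # position matrix P
        self.Wq = nn.Parameter(torch.randn(n_heads, d_model, d_k))   # W_i^Q
        self.Wk = nn.Parameter(torch.randn(n_heads, d_model, d_k))   # W_i^K
        self.Wv = nn.Parameter(torch.randn(n_heads, d_model, d_k))   # W_i^V
        self.Wo = nn.Parameter(torch.randn(n_heads * d_k, d_model))  # W^O
        self.d_k = d_k

    def forward(self, emb):                          # emb: (seq_len, d_model)
        x = emb + self.P                             # position encoding, equation (14)
        Q = torch.einsum("sd,hdk->hsk", x, self.Wq)  # equations (15)-(17)
        K = torch.einsum("sd,hdk->hsk", x, self.Wk)
        V = torch.einsum("sd,hdk->hsk", x, self.Wv)
        attn = F.softmax(Q @ K.transpose(1, 2) / self.d_k ** 0.5, dim=-1)
        Z = attn @ V                                 # per-head outputs, equation (18)
        Z = Z.transpose(0, 1).reshape(x.shape[0], -1)  # horizontal concatenation
        return Z @ self.Wo                           # equation (19)
```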
step (6-4) fully connecting the layer with the softmax layer
A. Flattening the output Z of the step (6-3) into a one-dimensional vector;
B. obtaining a vector with dimension of 16 through transformation of a full connection layer;
C. Obtain a vector of dimension 3 through another fully connected layer transformation, and apply the softmax activation to obtain the probability that sample input_1 belongs to each category.
Step (7) defining the input and output of the model
The model input is either a positive sample pair (input_1 and input_2) or a negative sample pair (input_1 and input_3). The model has two outputs, output_1 and output_2, where output_1 is the distance between the intermediate features embedding_1 and embedding_2, and output_2 is the probability that sample input_1 belongs to each category.
Step (8) defining an objective function
The final objective function of the model consists of three loss functions. First, in the adaptive graph learning layer, whose aim is to learn the brain connectivity graph, a loss function constrains the relationship between the inter-channel feature distance and the connection strength: the farther apart the features of two channels, the weaker their connection; and since the brain's connection structure is not fully connected, a regularization term of the L2 norm is adopted to control the sparsity of the learned graph. Second, the contrastive loss of the twin network aims to pull the positive sample pairs of step four closer together and push the negative sample pairs farther apart. Third, a cross-entropy loss measures the error between the predicted value for input_1 in step six and the true sample label.
The final objective function form of the model is shown in equation (20):
$L = L_{graph\_learn} + \eta L_{contrastive\_loss} + L_{cross\_entropy}$ (20)
where $\eta$ is a tuning parameter: the larger $\eta$, the greater the proportion of the contrastive loss, and vice versa. The three components of the objective function are shown in equations (21)-(23):
$L_{graph\_learn} = \sum_{p,q=1}^{N} \|x_p - x_q\|_2^2\,A_{pq} + \lambda \|A\|_2^2$ (21)
where $x_p$ is the feature of channel $p$, $x_q$ is the feature of channel $q$, $A_{pq}$ is the connection strength between channel $p$ and channel $q$, and $\lambda$ is the regularization coefficient.
$L_{contrastive\_loss} = \dfrac{1}{2N}\sum_{n=1}^{N}\left[y\,d^2 + (1-y)\max(margin - d,\,0)^2\right]$ (22)
where $d$ is the Euclidean distance between the embeddings of the two samples m and n of a pair, $y$ is a binary label ($y = 0$ indicates that samples m and n are not from the same emotion, $y = 1$ indicates that samples m and n are from the same emotion), $N$ is the number of sample pairs within a batch, and $margin$ is a hyperparameter indicating the distance that should separate samples of different emotions.
$L_{cross\_entropy} = -\sum_{i}\sum_{r} y_{i,r} \log \hat{y}_{i,r}$ (23)
where $y_{i,r}$ is a flag indicating whether sample $i$ belongs to category $r$ (1 if so, 0 otherwise), and $\hat{y}_{i,r}$ is the predicted probability that sample $i$ belongs to category $r$.
Step (9) training and testing
Input the sample pairs generated in step (4) into the model for training. After the model is trained, each sample in the test set serves as input_1, and input_2 is obtained by randomly initializing a tensor of the same dimensions as input_1, forming an input sample pair. output_2 is used to compute the classification accuracy.
Step (10): evaluate the learning result using the average classification accuracy.
Step (10-1): the model is evaluated using accuracy, i.e., the proportion of correctly classified samples among the total number of samples. Fifteen subjects participated in the experiment; each subject performed three experiments, and each experiment involved watching 15 clips, giving 45 experiments in total, so the accuracy of the $i$-th experiment is computed by equation (24):
$Acc_i = \dfrac{TP + TN}{TP + TN + FP + FN}$, $i = 1, 2, 3, \ldots, 45$ (24)
where TP denotes positive samples predicted as positive by the model, TN negative samples predicted as negative, FP negative samples predicted as positive, and FN positive samples predicted as negative.
The average accuracy over the 15 × 2 tested experiments is shown in equation (25):
$\overline{Acc} = \dfrac{1}{30}\sum_{i=1}^{30} Acc_i$ (25)
The standard deviation of this experiment is shown in equation (26):
$std = \sqrt{\dfrac{1}{30}\sum_{i=1}^{30}\left(Acc_i - \overline{Acc}\right)^2}$ (26)
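A minimal sketch of equations (24)-(26); the accuracy values are dummy numbers for illustration only:

```python
import numpy as np

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)   # equation (24)

acc = np.array([0.95, 0.93, 0.96])           # dummy per-experiment accuracies
mean_acc = acc.mean()                        # equation (25)
std_acc = acc.std()                          # equation (26)
```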
The accuracy of the invention was verified through training and testing, and the comparison between the obtained results and the prior art (SVM, GCNN, DGCNN, BiDANN) is shown in Table 1 below. In the comparison, the results of two experiments of each subject are taken for testing, and the mean of all the obtained accuracy values is used to measure the effect of the model.
From fig. 5, it can be found that the accuracy of the method of the present invention is higher than that of SVM, GCNN, DGCNN, and bidinn methods.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. Various changes, modifications, substitutions, and alterations that a person skilled in the art makes to these embodiments without departing from the principle and spirit of the invention still fall within the protection scope of the invention.

Claims (9)

1. An emotion recognition method based on a twin network architecture and graph convolution, characterized in that the method comprises the following steps:
step one, acquiring a data set: acquiring electroencephalogram data of a subject when watching movie fragments, and immediately completing a questionnaire by the subject after watching each movie fragment to report emotional response of the subject to each movie fragment, wherein the emotional response comprises positive, neutral and negative, and the electroencephalogram data are 62-channel electroencephalogram signals of a designated electrode position acquired through a 10-20 international standard lead system;
step two, data preprocessing and feature extraction: carrying out down-sampling and artifact-removing preprocessing on an original EEG signal, filtering a time domain signal by using a Hamming window and carrying out fast Fourier transform, taking a signal at each second as a sliding window, calculating the differential entropy characteristics of 62 channels of 5 frequency bands, and carrying out normalization processing;
step three, sample generation: taking the differential entropy features of the 62 channels over 3 s as one sample, wherein the dimension of one input sample is 3 × 62 × 5 (time × channel × frequency band);
step four, generating a sample pair: traversing each sample generated in step three, taking the sample as the reference and recording it as input_1; randomly selecting a sample under the same emotion, recording it as input_2 and forming a positive sample pair with input_1; and randomly selecting a sample under a different emotion, recording it as input_3 and forming a negative sample pair with input_1;
step five, defining a basic model: taking a space-time graph convolution neural network model as a basic model, wherein the basic model comprises an adaptive graph learning layer and a space-time convolution module;
step six, defining a twin network architecture: sequentially inputting the two samples of the sample pair generated in step four, namely input_1 and either input_2 or input_3, into the same basic model, respectively generating two intermediate features embedding_1 and embedding_2, then calculating the distance between embedding_1 and embedding_2, then extracting deeper features of embedding_1 with a multi-head self-attention layer, and outputting the probability that input_1 belongs to each category after a fully connected layer and a softmax layer;
step seven, defining the input and output of the model: the model input is a positive sample pair or a negative sample pair; the model has two outputs, namely output_1 and output_2, wherein output_1 is the distance between the intermediate features embedding_1 and embedding_2, and output_2 is the probability that the sample input_1 belongs to each category;
step eight, defining an objective function: the final objective function of the model consists of three loss functions;
step nine, training and testing: during training, inputting the sample pairs generated in step four into the basic model for training; during testing, in order to keep the input scales consistent, samples in the test set are used as input_1, and input_2 is obtained by randomly initializing a tensor with the same dimensionality as input_1, so that input sample pairs are formed, and the classification accuracy is calculated using output_2;
and step ten, evaluating the learning result by using the average classification accuracy.
2. The emotion recognition method based on twin network architecture and graph convolution of claim 1, wherein: the adaptive graph learning layer aims at learning the connectivity of the brain network; the spatio-temporal convolution module comprises a spatio-temporal self-attention mechanism, a graph convolution, and an ordinary convolution, wherein the spatio-temporal self-attention mechanism captures the dynamics in the spatial and temporal dimensions under different emotional states, the graph convolution realizes the aggregation of neighboring nodes, and the ordinary convolution is used for extracting features in the temporal dimension.
3. The emotion recognition method based on twin network architecture and graph convolution of claim 2, wherein: the three loss functions include:
in the adaptive graph learning layer, a loss function is used to constrain the relationship between the inter-channel feature distance and the connection strength: the farther apart the features of two channels are, the weaker the connection strength; and a regularization term of the L2 norm is adopted to control the sparsity of the learned graph;
the contrastive loss of the twin network makes the distance between the positive sample pairs of step four closer and the distance between the negative sample pairs farther;
and a cross-entropy loss, which is used to measure the error between the predicted value for input_1 in step six and the true sample label.
4. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the first step comprises the following steps:
step (1-1): selecting 15 movie fragments from a material library as stimuli used in the experiment, the movie fragments respectively comprising movie fragments with positive, neutral and negative emotions;
step (1-2): the subject watched 15 movie clips per experiment, each clip was prompted 5 seconds before, each clip presented 4 minutes, each clip evaluated itself 45 seconds after completion, and each clip had a rest 15 seconds after; wherein the self-assessment phase feeds back the emotional response of the subject to each movie fragment by completing a questionnaire.
5. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the second step comprises the following steps:
step (2-1): the original signal is down-sampled to 200Hz, and the signal subjected to the interference of the electro-oculogram and the myoelectricity is removed;
step (2-2): DE features are extracted for each channel from 5 frequency bands: deltaband(1-3Hz)、θband(4-7)、αband(8-13Hz)、βband(14-30Hz)、γband(31-50Hz):
Filtering the original data by adopting a Hamming window, performing fast Fourier transform on the data per second, and calculating the differential entropy of the five frequency bands;
the differential entropy is defined as follows:
let $X = \{x_1, x_2, \ldots, x_n\}$, $n \geq 1$, with corresponding probabilities $p_1, p_2, \ldots, p_n$;
according to the definition of Shannon information entropy, the information content of the non-deterministic system is given by equation (1):
$H(X) = -\sum_{i=1}^{n} p_i \log p_i$ (1)
the time-domain state probability $p_i$ in the above equation is replaced by the frequency-domain power spectral density $\hat{p}_i$ defined on the basis of the fast Fourier transform, so that the differential entropy is derived as shown in equation (2):
$DE = -\sum_{i=1}^{n} \hat{p}_i \log \hat{p}_i$ (2)
where $\hat{p}_i$ represents the power spectral density;
step (2-3): the EEG signal is normalized with the z-score, as shown in equation (3):
$X' = \dfrac{X - \bar{X}}{S}$ (3)
where $X$ is the EEG signal on each channel, $\bar{X}$ is the mean of the EEG signal on each channel, and $S$ is the standard deviation of the EEG signal on each channel.
6. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the fifth step comprises the following steps:
step (5-1) adaptive graph learning:
defining a non-negative adjacency matrix based on the channel features, as shown in equation (4):
$A_{pq} = g(x_p, x_q)$, $p, q \in \{1, 2, \ldots, N\}$ (4)
where $A_{pq}$ represents the connection between channel $p$ and channel $q$, i.e., the weight of the edge joining node $p$ and node $q$, and $g(x_p, x_q)$ expresses the adjacency matrix $A$ through a learned weight vector $w$; $A$ is defined as shown in equation (5):
$A_{pq} = \dfrac{\exp(\mathrm{ReLU}(w^{T}|x_p - x_q|))}{\sum_{q=1}^{N} \exp(\mathrm{ReLU}(w^{T}|x_p - x_q|))}$ (5)
Step (5-2) space-time self-attention mechanism:
calculating temporal self-attention: an attention mechanism is used to adaptively capture the dynamic correlation between nodes in the time dimension; transposing the input yields $\chi^h$ with dimensions (62 × 5 × 3), and the temporal attention is defined as shown in equations (6) and (7):
$T = V_T \cdot \sigma(((\chi^h)^{T} U_1) U_2 (U_3 \chi^h) + b_T)$ (6)
$T'_{i,j} = \dfrac{\exp(T_{i,j})}{\sum_{j} \exp(T_{i,j})}$ (7)
where $T'_{i,j}$ represents the similarity of time $i$ and time $j$, $V_T$, $U_1$, $U_2$, $U_3$, $b_T$ are learnable parameters, and $\sigma$ is the sigmoid activation function;
then calculating spatial attention: an attention mechanism is used to adaptively capture the dynamic correlation between nodes in the spatial dimension; transposing the input yields $\chi^h$ with dimensions (62 × 5 × 3), and the spatial attention is defined as shown in equations (8) and (9):
$S = V_S \cdot \sigma((\chi^h W_1) W_2 (W_3 \chi^h)^{T} + b_S)$ (8)
$S'_{p,q} = \dfrac{\exp(S_{p,q})}{\sum_{q} \exp(S_{p,q})}$ (9)
where $S'_{p,q}$ represents the similarity of channel $p$ and channel $q$, $V_S$, $W_1$, $W_2$, $W_3$, $b_S$ are learnable parameters, and $\sigma$ is the sigmoid activation function;
step (5-3) spatial convolution:
calculating the Laplacian matrix $L = D - A$, where $A$ is the adjacency matrix learned in step (5-1), and $D$ is the degree matrix computed from $A$, i.e., $D$ is a diagonal matrix of the same dimensions as $A$ whose diagonal element $D_{ii}$ is the sum of the $i$-th row of $A$;
calculating the maximum eigenvalue $\lambda_{max}$ of the Laplacian matrix $L$ and rescaling $L$ by equation (10):
$\tilde{L} = \dfrac{2L}{\lambda_{max}} - I$ (10)
where $I$ is the identity matrix;
recursively computing the Chebyshev polynomials according to equation (11):
$T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L})$ (11)
where $T_0(\tilde{L}) = I$ and $T_1(\tilde{L}) = \tilde{L}$;
performing the graph convolution according to equation (12):
$g_\theta *_G x = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\,x$ (12)
where $g_\theta$ represents the convolution kernel, $*_G$ represents the graph convolution operation, $\theta_k$ represents the Chebyshev coefficients, obtained by learning, and $x$ is the input data;
step (5-4) time convolution:
performing 2D convolution in the time dimension using a 3 x 1 convolution kernel with step size of 1 and Padding of 1 to preserve input height and width;
step (5-5) residual join and layer normalization:
and adding a layer of residual error network and carrying out layer normalization.
7. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the sixth step comprises the following steps:
step (6-1) of obtaining characteristics of the intermediate layer:
sequentially inputting the two samples of the sample pair generated in step four, namely input_1 and either input_2 or input_3, into the same basic model, and respectively generating two intermediate features embedding_1 and embedding_2;
calculating the distance of the sample pair in the step (6-2):
calculating the distance between the two outputs embedding_1 and embedding_2 of step (6-1) in the manner shown in equation (13):
$d = \sqrt{\sum_{c=1}^{C}\sum_{t=1}^{T}\sum_{f=1}^{F}\left(emb\_1_{ctf} - emb\_2_{ctf}\right)^2}$ (13)
where emb_1 denotes embedding_1, emb_2 denotes embedding_2, $C$ is the number of channels, $T$ is the time length, $F$ is the number of features, $emb\_1_{ctf}$ is the $f$-th feature of channel $c$ at time $t$ in embedding_1, and $emb\_2_{ctf}$ is the $f$-th feature of channel $c$ at time $t$ in embedding_2;
step (6-3) multi-head self-attention layer:
encoding embedding_1 according to equation (14):
$X_{embedding} = embedding\_1 + P$ (14)
where $P$ is a learnable position matrix;
randomly initializing 8 sets of matrices $W_i^Q$, $W_i^K$, $W_i^V$ and multiplying each with $X_{embedding}$ to obtain 8 sets of $Q$, $K$, $V$ matrices, as shown in equations (15)-(17):
$Q_i = X_{embedding} W_i^Q$ (15)
$K_i = X_{embedding} W_i^K$ (16)
$V_i = X_{embedding} W_i^V$ (17)
$i = 0, 1, 2, 3, 4, 5, 6, 7$
where $W_i^Q$, $W_i^K$, $W_i^V$ are learnable matrices;
calculating the attention weights of each head from the $Q$ and $K$ matrices, dividing by the square root of the first dimension of the $W^K$ matrix, i.e., $\sqrt{d_k}$, applying softmax, and then multiplying by $V$ to obtain the output of the attention layer, finally yielding 8 matrices $(Z_0$-$Z_7)$, as shown in equation (18):
$Z_i = \mathrm{softmax}\!\left(\dfrac{Q_i K_i^{T}}{\sqrt{d_k}}\right) V_i$, $i = 0, 1, \ldots, 7$ (18)
splicing the 8 matrices together horizontally $(Z_0, Z_1, \ldots, Z_7)$, then randomly initializing a learnable matrix $W^O$ and multiplying the two to obtain the matrix $Z$, as shown in equation (19):
$Z = \mathrm{concatenate}(Z_0, Z_1, Z_2, Z_3, Z_4, Z_5, Z_6, Z_7) \cdot W^O$ (19)
step (6-4): fully connected layer and softmax layer:
flattening the output $Z$ of step (6-3) into a one-dimensional vector; obtaining a vector of dimension 16 through a fully connected layer transformation; obtaining a vector of dimension 3 through another fully connected layer transformation, and applying the softmax activation to obtain the probability that sample input_1 belongs to each category.
8. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the eighth step comprises the following steps:
the final objective function form of the model is shown in equation (20):
$L = L_{graph\_learn} + \eta L_{contrastive\_loss} + L_{cross\_entropy}$ (20)
where $\eta$ is a tuning parameter: the larger $\eta$, the greater the proportion of the contrastive loss, and vice versa; the three components of the objective function are shown in equations (21)-(23):
$L_{graph\_learn} = \sum_{p,q=1}^{N} \|x_p - x_q\|_2^2\,A_{pq} + \lambda \|A\|_2^2$ (21)
where $x_p$ is the feature of channel $p$, $x_q$ is the feature of channel $q$, $A_{pq}$ is the connection strength between channel $p$ and channel $q$, and $\lambda$ is the regularization coefficient;
$L_{contrastive\_loss} = \dfrac{1}{2N}\sum_{n=1}^{N}\left[y\,d^2 + (1-y)\max(margin - d,\,0)^2\right]$ (22)
where $d$ is the Euclidean distance between the embeddings of the two samples m and n of a pair, $y$ is a binary label ($y = 0$ indicates that samples m and n are not from the same emotion, $y = 1$ indicates that samples m and n are from the same emotion), $N$ is the number of sample pairs within a batch, and $margin$ is a hyperparameter indicating the distance separating samples of different emotions;
$L_{cross\_entropy} = -\sum_{i}\sum_{r} y_{i,r} \log \hat{y}_{i,r}$ (23)
where $y_{i,r}$ is a flag indicating whether sample $i$ belongs to category $r$ (1 if so, 0 otherwise), and $\hat{y}_{i,r}$ is the predicted probability that sample $i$ belongs to category $r$.
9. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the step ten comprises the following steps:
evaluating the model with accuracy, i.e., the proportion of correctly classified samples among the total number of samples; 15 subjects participate in the experiment, each subject performs three experiments, and each experiment involves watching 15 clips, giving 45 experiments in total, so the accuracy of the $i$-th experiment is computed by equation (24):
$Acc_i = \dfrac{TP + TN}{TP + TN + FP + FN}$, $i = 1, 2, \ldots, 45$ (24)
the average accuracy over the 15 × 2 tested experiments is shown in equation (25):
$\overline{Acc} = \dfrac{1}{30}\sum_{i=1}^{30} Acc_i$ (25)
the standard deviation of this experiment is shown in equation (26):
$std = \sqrt{\dfrac{1}{30}\sum_{i=1}^{30}\left(Acc_i - \overline{Acc}\right)^2}$ (26)
where TP denotes positive samples predicted as positive by the model, TN negative samples predicted as negative, FP negative samples predicted as positive, and FN positive samples predicted as negative.

Priority Applications (1)

Application Number: CN202111617915.3A
Priority Date: 2021-12-22
Filing Date: 2021-12-22
Title: Emotion recognition method based on twin network architecture and graph convolution

Publications (1)

Publication Number: CN114330436A
Publication Date: 2022-04-12

Family

ID: 81015432

Family Applications (1)

Application Number: CN202111617915.3A
Title: Emotion recognition method based on twin network architecture and graph convolution
Priority Date: 2021-12-22
Filing Date: 2021-12-22

Country Status (1)

CN: CN114330436A (Pending)

Patent Citations (5)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN110765873A * | 2019-09-19 | 2020-02-07 | 华中师范大学 | Facial expression recognition method and device based on expression intensity label distribution
KR20210099492A * | 2020-02-04 | 2021-08-12 | 한국과학기술원 | Method and Apparatus for Speech Emotion Recognition Using a Top-Down Attention and Bottom-Up Attention Neural Network
KR20210139119A * | 2020-05-13 | 2021-11-22 | (주)사맛디 | System, method and program for recobnizing emotion of the object basen on deep-learning
CN112686117A * | 2020-12-24 | 2021-04-20 | 华中师范大学 | Face expression intensity recognition method and system based on hidden variable analysis
CN113017630A * | 2021-03-02 | 2021-06-25 | 贵阳像树岭科技有限公司 | Visual perception emotion recognition method


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination