CN114330436A - Emotion recognition method based on twin network architecture and graph convolution - Google Patents
Emotion recognition method based on twin network architecture and graph convolution
- Publication number
- CN114330436A (application CN202111617915.3A)
- Authority
- CN
- China
- Prior art keywords
- input
- sample
- embedding
- channel
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to an emotion recognition method based on a twin network architecture and graph convolution, which belongs to the technical field of electroencephalogram emotion recognition.
Description
Technical Field
The invention relates to an emotion recognition method based on a twin network architecture and graph convolution, and belongs to the technical field of electroencephalogram emotion recognition.
Background
Emotion conveys information and regulates behavior in people's daily communication, work, study, and cognitive decision-making; recognizing emotion correctly helps people grasp accurate information. Emotion is a complex psychological and physiological state that arises from the brain's response to physiological changes, and it plays a crucial role in our lives. In recent years, a growing body of research has focused on emotion recognition, both to build affective interaction interfaces that allow machines to perceive human emotion and to assess the psychological condition of patients with neurological disorders such as Parkinson's disease, autism spectrum disorder, schizophrenia, and depression.
Emotion recognition methods fall into two main categories: those based on non-physiological signals and those based on physiological signals. Because some patients cannot express emotion through external features such as facial expressions and body postures, and some people can deliberately disguise their emotions, physiological signals are often used as the signal source for emotion classification. The electroencephalogram (EEG) signal, one of the most commonly used physiological signals, offers high temporal resolution, non-invasiveness, and low-cost, easy acquisition, and has been shown to reflect important information about human emotional states.
EEG data are not regular Euclidean data, and for such irregular brain network structures, graph convolution can better capture the complex connections among channels. However, stacking too many graph convolutional layers causes over-smoothing, which hurts accuracy, so how to extract more effective features and improve emotion classification accuracy is worth investigating. At present, methods such as convolutional neural networks (CNN), recurrent neural networks (RNN), and support vector machines (SVM) are used to recognize EEG emotion and have achieved considerable results, but their emotion recognition accuracy remains partly insufficient.
An existing research method performs emotion recognition and classification based on differential entropy features of EEG signals combined with an LSTM neural network model (publication number CN110897648A). It comprises the following steps: (1) collect 62-channel EEG signals from normal adults; (2) compute the differential entropy (DE) of the time series to form 62-dimensional temporal features; (3) feed the temporal features into an LSTM neural network for training and learning; (4) evaluate the training results using the average classification accuracy, standard deviation, and F1 score. The method works well: it can effectively recognize and classify three emotions by exploiting the non-stationarity, nonlinearity, time-frequency-domain, and complexity characteristics of EEG signals to find the heterogeneity among the three emotions, thereby distinguishing them and assisting the adjuvant treatment and recovery of various diseases. It differs from the present invention in that the present invention uses a twin network framework to train an auxiliary task and uses the intermediate-layer features for contrastive learning while the model performs emotion classification, thereby improving the performance and generalization capability of the model, with correspondingly different emotion recognition accuracy.
Disclosure of Invention
To address the above defects, the invention provides an emotion recognition method based on a twin network architecture and graph convolution, which uses a multi-head self-attention mechanism to adaptively assign different importance to the data and extract deeper, more effective information. In addition, to further improve model accuracy, the method uses a twin network framework to train an auxiliary task: while the model performs emotion classification, the intermediate-layer features are used for contrastive learning, improving the performance and generalization capability of the model.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a twin network architecture and graph convolution-based emotion recognition method comprises the following steps:
and acquiring a data set. Acquiring electroencephalogram signals of 62 appointed electrode positions of a subject when watching the movie fragments, and immediately completing a questionnaire to report emotional reactions (neutral, negative and positive emotions) of the subject to each movie fragment after finishing watching each movie fragment;
data preprocessing and feature extraction. The original EEG signal is down sampled and artifact pre-processed. Filtering a time domain signal by using a Hamming window, performing Fast Fourier Transform (FFT), taking a signal every second as a sliding window, calculating the differential entropy characteristics of 62 channels of 5 frequency bands, and performing normalization processing;
and (4) generating a sample. For efficient use of time information, the DE feature of the 62 channels of 3s is used as a sample, and the dimension of one input sample is 3 × 62 × 5 (time × channel × frequency band).
Generating sample pairs. Traverse each sample generated in step three and take it as the reference, denoted input_1. Randomly select a sample under the same emotion (denoted input_2) to form a positive sample pair with input_1, and randomly select a sample under a different emotion (denoted input_3) to form a negative sample pair with input_1; that is, input_2 and input_1 in a positive pair carry the same emotion, while input_3 and input_1 in a negative pair carry different emotions.
Defining a base model. A spatio-temporal graph convolutional neural network model is taken as the base model. The base network consists of an adaptive graph learning layer and a spatio-temporal convolution module. The adaptive graph learning layer aims to learn the connectivity of the brain network; the spatio-temporal convolution module consists of a spatio-temporal self-attention mechanism, a graph convolution, and an ordinary convolution, where the spatio-temporal self-attention mechanism captures the dynamics in the spatial and temporal dimensions under different emotional states, the graph convolution aggregates neighboring nodes, and the ordinary convolution extracts features in the time dimension.
Defining a twin network architecture. The two samples of a pair, i.e., input_1 and input_2 (or input_3), are fed in turn into the same base model, producing two intermediate features embedding_1 and embedding_2, and the distance between embedding_1 and embedding_2 is computed. A multi-head self-attention layer then extracts deeper features from embedding_1, and after a fully connected layer and a softmax layer the probability of input_1 belonging to each category is output.
Defining the inputs and outputs of the model. The model input is either a positive sample pair (input_1 and input_2) or a negative sample pair (input_1 and input_3). The model has two outputs, output_1 and output_2, where output_1 is the distance between the intermediate features embedding_1 and embedding_2, and output_2 is the probability that sample input_1 belongs to each category.
Defining an objective function. The final objective function of the model consists of three loss terms. The first, in the adaptive graph learning layer, aims at learning the brain connectivity: a loss function constrains the relationship between the feature distance of two channels and their connection strength (the farther apart the features of two channels, the weaker the connection), and since the brain's connection structure is not fully connected, an L2-norm regularization term controls the sparsity of the learned graph. The second is the contrastive loss of the twin network, which pulls the positive pairs of step four closer together and pushes the negative pairs farther apart. The third is the cross-entropy loss, which measures the error between the predicted value for input_1 in step six and the true sample label.
Training and testing. Input the sample pairs generated in step four into the model for training. After the model is trained, each sample of the test set is taken as input_1, and input_2 is obtained by randomly initializing a tensor of the same dimensions as input_1, forming an input sample pair. output_2 is used to compute the classification accuracy.
And evaluating the learning result by using the average classification accuracy.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an emotion recognition method based on twin network architecture and graph convolution, which can self-adaptively endow different importance to data by utilizing a multi-head self-attention mechanism and extract deeper and more effective information; meanwhile, a twin network framework is borrowed, and the characteristics of the middle layer are utilized for comparative learning, so that the performance and the generalization capability of the model are improved. In addition, compared with other graph convolution methods, the emotion recognition accuracy of the model based on the twin network architecture is improved, and the accuracy reaches 94.78 +/-05.97%.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a twin network architecture and graph convolution based emotion recognition method provided by the invention;
FIG. 2 is a spatiotemporal graph convolution neural network model of a twin network architecture and graph convolution based emotion recognition method provided by the invention;
FIG. 3 is an experimental mode of a twin network architecture and graph convolution based emotion recognition method provided by the invention;
FIG. 4 shows the electrode channel positions for EEG acquisition in the emotion recognition method based on a twin network architecture and graph convolution provided by the invention;
FIG. 5 is a comparison chart of a test of the emotion recognition method based on a twin network architecture and graph convolution provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-4, a method for emotion recognition based on twin network architecture and graph convolution specifically includes the following steps:
step (1) dataset acquisition
Step (1-1): select 15 film clips (containing positive, neutral, and negative emotions) from a material library as the stimuli used in the experiment;
Step (1-2): the subject watches the 15 clips in each experiment; a 5-second cue precedes each clip, each clip is presented for 4 minutes, a 45-second self-assessment follows each clip, and a 15-second rest follows the self-assessment. In the self-assessment phase, the subject reports his or her emotional response to each film clip by completing a questionnaire;
step (2) data preprocessing and feature extraction
Step (2-1): down-sample the raw signal to 200 Hz and remove segments contaminated by electrooculogram and electromyogram artifacts;
Step (2-2): extract DE features for each channel over 5 frequency bands: delta band (1-3 Hz), theta band (4-7 Hz), alpha band (8-13 Hz), beta band (14-30 Hz), and gamma band (31-50 Hz):
A. filtering original data by adopting a Hamming window, performing fast Fourier transform on data per second, and calculating the differential entropy of the five frequency bands;
B. The differential entropy is defined as follows:

Let $X = \{x_1, x_2, \ldots, x_n\}$, $n \geq 1$, with corresponding probabilities $p_1, p_2, \ldots, p_n$. According to the definition of Shannon information entropy, the information content of the nondeterministic system is given by equation (1):

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i \qquad (1)$$

Replacing the time-domain state probability $p_i$ in the above equation with the frequency-domain power spectral density $\hat{p}_i$ defined from the fast Fourier transform yields the definition of differential entropy, as shown in equation (2):

$$DE = -\sum_{i=1}^{n} \hat{p}_i \log \hat{p}_i \qquad (2)$$
and (2-3) normalizing the electroencephalogram signal by adopting z-score, wherein the normalization formula is shown as the formula (3):
wherein X is the EEG signal on each channel,the mean value of the brain electrical signals on each channel is S, and the standard deviation of the brain electrical signals on each channel is S.
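For concreteness, the following Python sketch illustrates steps (2-2) and (2-3) under stated assumptions: a per-second Hamming-windowed FFT, band power normalized to a probability distribution as in equation (2), and z-score normalization as in equation (3). The exact windowing and scaling used by the invention are not specified, so treat these details as illustrative.

```python
import numpy as np
from scipy.signal import get_window

BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}
FS = 200  # sampling rate after down-sampling (Hz)

def de_features(eeg):
    """eeg: (62, n_samples) preprocessed signal; returns (n_seconds, 62, 5) DE features."""
    n_ch, n_samp = eeg.shape
    n_sec = n_samp // FS
    win = get_window("hamming", FS)
    freqs = np.fft.rfftfreq(FS, d=1.0 / FS)
    feats = np.zeros((n_sec, n_ch, len(BANDS)))
    for t in range(n_sec):
        seg = eeg[:, t * FS:(t + 1) * FS] * win              # Hamming-windowed 1 s slice
        psd = np.abs(np.fft.rfft(seg, axis=1)) ** 2          # power spectrum via FFT
        for b, (lo, hi) in enumerate(BANDS.values()):
            p = psd[:, (freqs >= lo) & (freqs <= hi)]
            p = p / (p.sum(axis=1, keepdims=True) + 1e-12)   # band PSD as probabilities, eq. (2)
            feats[t, :, b] = -(p * np.log(p + 1e-12)).sum(axis=1)
    mean = feats.mean(axis=0, keepdims=True)                 # z-score over time, eq. (3)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / (std + 1e-12)
```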
Step (3) sample Generation
To make effective use of temporal information, the DE features of the 62 channels over 3 s are taken as one sample; the dimension of one input sample is 3 × 62 × 5 (time × channel × frequency band).
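A minimal sketch of this windowing is given below, assuming a stride of one second and that a window's label is the emotion of the clip it comes from (samples are drawn within a single clip, so the label is constant across the 3 s); both assumptions go beyond what the text states.

```python
import numpy as np

def make_samples(de, labels):
    """de: (n_seconds, 62, 5) DE features; labels: (n_seconds,) emotion ids."""
    xs = [de[t:t + 3] for t in range(de.shape[0] - 2)]   # 3 s window -> (3, 62, 5)
    ys = [labels[t + 2] for t in range(de.shape[0] - 2)]
    return np.stack(xs), np.asarray(ys)
```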
Step (4) sample pair generation
Traverse each sample generated in step (3) and take it as the reference (denoted input_1). Randomly select a sample under the same emotion (denoted input_2) to form a positive sample pair with input_1, and randomly select a sample under a different emotion (denoted input_3) to form a negative sample pair with input_1; that is, input_2 and input_1 in a positive pair carry the same emotion, while input_3 and input_1 in a negative pair carry different emotions.
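A sketch of the pairing procedure follows; building exactly one positive and one negative pair per reference sample is an assumption, since the text only requires one random same-emotion and one random different-emotion partner.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def make_pairs(xs, ys):
    pairs = []
    for i in range(len(xs)):
        same = np.flatnonzero(ys == ys[i])
        same = same[same != i]                            # candidates under the same emotion
        diff = np.flatnonzero(ys != ys[i])                # candidates under different emotions
        pairs.append((xs[i], xs[rng.choice(same)], 1))    # (input_1, input_2): positive pair
        pairs.append((xs[i], xs[rng.choice(diff)], 0))    # (input_1, input_3): negative pair
    return pairs
```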
Step (5) defining a base model
A spatio-temporal graph convolutional neural network model is taken as the base model. The base model consists of an adaptive graph learning layer and a spatio-temporal convolution module. The adaptive graph learning layer aims to learn the connectivity of the brain network. The spatio-temporal convolution module consists of a spatio-temporal self-attention mechanism, a graph convolution, and an ordinary convolution: the spatio-temporal self-attention mechanism captures the dynamics in the spatial and temporal dimensions under different emotional states, the graph convolution aggregates neighboring nodes, and the ordinary convolution extracts features in the time dimension;
step (5-1) adaptive image learning
Define a non-negative adjacency matrix based on the channel features, as shown in equation (4):

$$A_{pq} = g(x_p, x_q), \quad p, q \in \{1, 2, \ldots, N\} \qquad (4)$$

where $A_{pq}$ represents the connection between channel $p$ and channel $q$, i.e., the weight of the edge joining node $p$ and node $q$; $g(x_p, x_q)$ is expressed by learning a weight vector $w$, and the definition of $A$ is shown in equation (5).
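Equation (5) is not reproduced here, so the PyTorch sketch below uses a common softmax-normalized form (a learned weight vector w applied to absolute feature differences) as an assumption; it is not the patent's exact definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphLearn(nn.Module):
    """Learns a non-negative adjacency matrix A from per-channel features, eq. (4)."""

    def __init__(self, feat_dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(feat_dim))   # learned weight vector w

    def forward(self, x):                              # x: (N_channels, feat_dim)
        diff = (x[:, None, :] - x[None, :, :]).abs()   # pairwise |x_p - x_q|
        logits = F.relu(diff @ self.w)                 # g(x_p, x_q)
        return F.softmax(logits, dim=1)                # assumed row-normalized, non-negative A
```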
Step (5-2) space-time self-attention mechanism
A. Compute temporal self-attention. The states of different time slices are correlated in time, and the correlation differs from case to case; an attention mechanism is used to adaptively capture the dynamic correlation between nodes in the time dimension. Transposing the input yields $\chi_h$ with dimensions (62 × 5 × 3), and the temporal attention is defined in equations (6) and (7):

$$T = V_T \cdot \sigma\left(((\chi_h)^T U_1)\, U_2\, (U_3 \chi_h) + b_T\right) \qquad (6)$$

where $T'_{i,j}$ represents the similarity between time $i$ and time $j$; $V_T$, $U_1$, $U_2$, $U_3$, $b_T$ are learnable parameters, and $\sigma$ is the sigmoid activation function.
B. Compute spatial attention. Channels at different locations influence one another, and this influence is highly dynamic; an attention mechanism is used to adaptively capture the dynamic correlation between nodes in the spatial dimension. Transposing the input yields $\chi_h$ with dimensions (62 × 5 × 3), and the spatial attention is defined in equations (8) and (9):

$$S = V_S \cdot \sigma\left((\chi_h W_1)\, W_2\, (W_3 \chi_h)^T + b_S\right) \qquad (8)$$

where $S'_{p,q}$ represents the similarity between channel $p$ and channel $q$; $V_S$, $W_1$, $W_2$, $W_3$, $b_S$ are learnable parameters, and $\sigma$ is the sigmoid activation function.
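The temporal branch can be sketched as follows, with parameter shapes inferred from the stated (62 × 5 × 3) input; the softmax producing the normalized $T'$ of equation (7) is an assumption, and the spatial branch of equation (8) is analogous with the roles of channels and time steps exchanged.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Temporal self-attention of eq. (6), for x of shape (n=62, f=5, t=3)."""

    def __init__(self, n=62, f=5, t=3):
        super().__init__()
        self.U1 = nn.Parameter(torch.randn(n))
        self.U2 = nn.Parameter(torch.randn(f, n))
        self.U3 = nn.Parameter(torch.randn(f))
        self.V = nn.Parameter(torch.randn(t, t))
        self.b = nn.Parameter(torch.zeros(t, t))

    def forward(self, x):                                   # x: (n, f, t)
        lhs = (x.permute(2, 1, 0) @ self.U1) @ self.U2      # ((chi^T)U1)U2 -> (t, n)
        rhs = torch.einsum("f,nft->nt", self.U3, x)         # U3 * chi -> (n, t)
        T = self.V @ torch.sigmoid(lhs @ rhs + self.b)      # eq. (6), (t, t)
        return torch.softmax(T, dim=1)                      # assumed normalization, eq. (7)
```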
Step (5-3) spatial convolution
A. Compute the Laplacian matrix $L = D - A$, where $A$ is the adjacency matrix learned in step (5-1) and $D$ is the degree matrix computed from $A$; that is, $D$ is a diagonal matrix of the same dimension as $A$ whose diagonal element $D_{ii}$ is the sum of the $i$-th row of $A$. The scaled Laplacian is given by equation (10):

$$\tilde{L} = \frac{2}{\lambda_{max}} L - I \qquad (10)$$

where $I$ is the identity matrix and $\lambda_{max}$ is the largest eigenvalue of $L$.

B. Recursively compute the Chebyshev polynomials according to equation (11):

$$T_k(\tilde{L}) = 2\tilde{L}\, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}), \quad T_0(\tilde{L}) = I, \; T_1(\tilde{L}) = \tilde{L} \qquad (11)$$

C. Perform the graph convolution according to equation (12):

$$g_\theta *_G x = \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\, x \qquad (12)$$

where $g_\theta$ denotes the convolution kernel, $*_G$ denotes the graph convolution operation, $\theta_k$ denotes the Chebyshev coefficients (obtained by learning), and $x$ is the input data.
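A sketch of this Chebyshev graph convolution follows; the rescaling of equation (10) and the order K = 3 are assumptions where the source is not explicit.

```python
import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    """Chebyshev graph convolution over a learned adjacency, eqs. (10)-(12)."""

    def __init__(self, K=3):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(K))        # Chebyshev coefficients theta_k

    def forward(self, x, A):                             # x: (N, F), A: (N, N)
        D = torch.diag(A.sum(dim=1))                     # degree matrix from A
        L = D - A                                        # Laplacian, L = D - A
        lam_max = torch.linalg.eigvalsh(L).max()
        Lt = 2.0 * L / lam_max - torch.eye(A.size(0))    # scaled Laplacian, eq. (10) (assumed)
        Tk = [torch.eye(A.size(0)), Lt]                  # T_0 = I, T_1 = L~
        for _ in range(2, len(self.theta)):
            Tk.append(2 * Lt @ Tk[-1] - Tk[-2])          # recursion, eq. (11)
        return sum(th * T @ x for th, T in zip(self.theta, Tk))   # eq. (12)
```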
Step (5-4) time convolution
At this layer, a 2D convolution is performed in the time dimension using a 3 × 1 convolution kernel with stride 1 and padding 1 to preserve the input height and width.
Step (5-5) residual join and layer normalization
To alleviate the vanishing-gradient problem and help the network train better, a residual connection is added, followed by layer normalization.
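A sketch of steps (5-4) and (5-5) together: a 3 × 1 temporal 2D convolution (stride 1, padding chosen to preserve height and width), a residual connection, and layer normalization. The channel count and the normalization shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimeConvBlock(nn.Module):
    """3x1 temporal convolution with residual connection and layer normalization."""

    def __init__(self, ch=5, n=62, t=3):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, kernel_size=(3, 1), stride=1, padding=(1, 0))
        self.norm = nn.LayerNorm([ch, t, n])

    def forward(self, x):                       # x: (batch, ch, t, n)
        return self.norm(x + self.conv(x))      # residual join, then layer normalization
```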
Step (6) defining twin network architecture
Step (6-1): obtaining the intermediate-layer features
The two samples of the pair generated in step (4), i.e., input_1 and input_2 (or input_3), are fed in turn into the same base model, producing two intermediate features embedding_1 and embedding_2.
Step (6-2) of calculating the distance between the pair of samples
Compute the distance between the two outputs embedding_1 and embedding_2 of step (6-1), as shown in equation (13):

$$d(\mathrm{emb}_1, \mathrm{emb}_2) = \sqrt{\sum_{c=1}^{C} \sum_{t=1}^{T} \sum_{f=1}^{F} \left(\mathrm{emb}_{1,ctf} - \mathrm{emb}_{2,ctf}\right)^2} \qquad (13)$$

where emb_1 denotes embedding_1, emb_2 denotes embedding_2, $C$ is the number of channels, $T$ is the time length, $F$ is the number of features, and $\mathrm{emb}_{1,ctf}$ (resp. $\mathrm{emb}_{2,ctf}$) is the $f$-th feature element of channel $c$ at time $t$ in embedding_1 (resp. embedding_2);
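A sketch of the pair distance; the Euclidean form follows the later description of d in step (8), while the exact normalization of equation (13) is an assumption.

```python
import torch

def pair_distance(emb_1, emb_2):
    """Euclidean distance between two (C, T, F) embeddings, eq. (13)."""
    return torch.sqrt(((emb_1 - emb_2) ** 2).sum())
```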
step (6-3) Multi-headed self-attention layer
A. Randomly initialize a learnable position matrix $P$ and position-encode embedding_1 according to equation (14):

$$X_{embedding} = \mathrm{embedding\_1} + P \qquad (14)$$
B. Randomly initialize 8 sets (i.e., 8 heads) of matrices $W_q^i, W_k^i, W_v^i$ ($i = 0, 1, \ldots, 7$) and multiply each with embedding_1 (here denoted $X_{embedding}$) to obtain 8 sets of $Q$, $K$, $V$ matrices, as shown in equations (15)-(17):

$$Q_i = X_{embedding} W_q^i \qquad (15)$$
$$K_i = X_{embedding} W_k^i \qquad (16)$$
$$V_i = X_{embedding} W_v^i \qquad (17)$$

with $i = 0, 1, \ldots, 7$.
C. For each head, compute the attention weights by matrix multiplication of $Q$ and $K$, divide by the square root of the first dimension of the $W_k$ matrix, i.e., $\sqrt{d_k}$, and multiply by $V$ to obtain the attention-layer output, yielding 8 matrices $Z_0$-$Z_7$, as shown in equation (18):

$$Z_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^T}{\sqrt{d_k}}\right) V_i, \quad i = 0, 1, \ldots, 7 \qquad (18)$$
D. Concatenate the 8 matrices horizontally as $(Z_0, Z_1, \ldots, Z_7)$, randomly initialize a matrix $W^o$, and multiply the two to obtain the matrix $Z$, as shown in equation (19):

$$Z = \mathrm{concatenate}(Z_0, Z_1, Z_2, Z_3, Z_4, Z_5, Z_6, Z_7) \cdot W^o \qquad (19)$$
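A sketch of the 8-head self-attention of equations (14)-(19); the sizes d_model and d_k are illustrative assumptions (d_model must be divisible by the number of heads).

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """8-head self-attention with a learnable position matrix P, eqs. (14)-(19)."""

    def __init__(self, seq_len, d_model, heads=8):
        super().__init__()
        self.P = nn.Parameter(torch.randn(seq_len, d_model))   # position matrix, eq. (14)
        d_k = d_model // heads
        self.Wq = nn.Parameter(torch.randn(heads, d_model, d_k))
        self.Wk = nn.Parameter(torch.randn(heads, d_model, d_k))
        self.Wv = nn.Parameter(torch.randn(heads, d_model, d_k))
        self.Wo = nn.Parameter(torch.randn(heads * d_k, d_model))
        self.d_k = d_k

    def forward(self, emb):                                    # emb: (seq_len, d_model)
        x = emb + self.P                                       # eq. (14)
        q = torch.einsum("sd,hdk->hsk", x, self.Wq)            # eq. (15)
        k = torch.einsum("sd,hdk->hsk", x, self.Wk)            # eq. (16)
        v = torch.einsum("sd,hdk->hsk", x, self.Wv)            # eq. (17)
        attn = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(self.d_k), dim=-1)
        z = attn @ v                                           # eq. (18), per head
        z = z.transpose(0, 1).reshape(x.size(0), -1)           # concatenate the 8 heads
        return z @ self.Wo                                     # eq. (19)
```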
step (6-4) fully connecting the layer with the softmax layer
A. Flattening the output Z of the step (6-3) into a one-dimensional vector;
B. obtaining a vector with dimension of 16 through transformation of a full connection layer;
C. Pass through another fully connected layer to obtain a vector of dimension 3, and activate it with the softmax function to obtain the probability that sample input_1 belongs to each category.
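A sketch of this classification head for a single unbatched sample; the input width is resolved lazily since the flattened size of Z is not fixed by the text.

```python
import torch.nn as nn

head = nn.Sequential(
    nn.Flatten(start_dim=0),   # flatten Z into a one-dimensional vector
    nn.LazyLinear(16),         # fully connected layer to dimension 16
    nn.Linear(16, 3),          # fully connected layer to dimension 3
    nn.Softmax(dim=0),         # probability of input_1 belonging to each class
)
```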
Step (7) defining the input and output of the model
The model input is either a positive sample pair (input_1 and input_2) or a negative sample pair (input_1 and input_3). The model has two outputs, output_1 and output_2, where output_1 is the distance between the intermediate features embedding_1 and embedding_2, and output_2 is the probability that sample input_1 belongs to each category.
Step (8) defining an objective function
The final objective function of the model consists of three loss terms. The first, in the adaptive graph learning layer, aims at learning the brain connectivity graph: a loss function constrains the relationship between the feature distance of two channels and their connection strength (the farther apart the features of two channels, the weaker the connection), and since the brain's connection structure is not fully connected, an L2-norm regularization term controls the sparsity of the learned graph. The second is the contrastive loss of the twin network, which pulls the positive pairs of step (4) closer together and pushes the negative pairs farther apart. The third is the cross-entropy loss, which measures the error between the predicted value for input_1 in step (6) and the true sample label.
The final objective function form of the model is shown in equation (20):
$$L = L_{graph\_learn} + \eta L_{contrastive\_loss} + L_{cross\_entropy} \qquad (20)$$
where $\eta$ is a tuning parameter between the two loss terms: the larger $\eta$, the greater the weight of the contrastive loss, and vice versa. The three components of the objective function are shown in equations (21)-(23):

$$L_{graph\_learn} = \sum_{p,q=1}^{N} \|x_p - x_q\|_2^2\, A_{pq} + \lambda \|A\|_2^2 \qquad (21)$$

where $x_p$ is the feature of channel $p$, $x_q$ is the feature of channel $q$, $A_{pq}$ is the connection strength between channels $p$ and $q$, and $\lambda$ is the regularization coefficient.

$$L_{contrastive\_loss} = \frac{1}{2N} \sum \left[ y\, d^2 + (1 - y)\, \max(margin - d, 0)^2 \right] \qquad (22)$$

where $d$ is the Euclidean distance between the embeddings of the two samples $m$ and $n$ in a pair, $y$ is a binary label ($y = 0$ indicates that samples $m$ and $n$ are not from the same emotion, $y = 1$ indicates that they are), $N$ is the number of sample pairs in a batch, and $margin$ is a hyperparameter denoting the distance separating samples of different emotions.

$$L_{cross\_entropy} = -\sum_{i} \sum_{r} y_{i,r} \log \hat{y}_{i,r} \qquad (23)$$

where $y_{i,r}$ indicates whether sample $i$ belongs to category $r$ (1 if so, 0 otherwise), and $\hat{y}_{i,r}$ is the predicted probability that sample $i$ belongs to category $r$.
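The three terms can be sketched as below; the form of equation (21) follows its verbal description (feature-distance-weighted adjacency plus an L2 sparsity term) and is an assumption, while the contrastive and cross-entropy terms follow the standard forms the text describes.

```python
import torch
import torch.nn.functional as F

def graph_learn_loss(x, A, lam=1e-3):
    """x: (N, F) channel features; A: (N, N) learned adjacency. Assumed form of eq. (21)."""
    d2 = torch.cdist(x, x) ** 2                    # squared feature distances between channels
    return (d2 * A).sum() + lam * (A ** 2).sum()   # weaker links for distant channels + L2 sparsity

def contrastive_loss(d, y, margin=1.0):
    """d, y: (B,) pair distances and binary same-emotion labels. Eq. (22)."""
    return (y * d ** 2 + (1 - y) * F.relu(margin - d) ** 2).mean() / 2

def total_loss(x, A, d, y_pair, logits, target, eta=0.5):
    """Composite objective of eq. (20); eta weights the contrastive term."""
    return (graph_learn_loss(x, A)
            + eta * contrastive_loss(d, y_pair)
            + F.cross_entropy(logits, target))     # eq. (23)
```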
Step (9) training and testing
Input the sample pairs generated in step (4) into the model for training. After the model is trained, each sample of the test set is taken as input_1, and input_2 is obtained by randomly initializing a tensor of the same dimensions as input_1, forming an input sample pair. output_2 is used to compute the classification accuracy.
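A sketch of this test-time protocol, where `model` is a hypothetical twin network returning (output_1, output_2); the signature is an assumption for illustration.

```python
import torch

def predict(model, x_test):
    """x_test: one (3, 62, 5) DE sample used as input_1."""
    input_2 = torch.randn_like(x_test)     # random tensor matching input_1's dimensions
    _, output_2 = model(x_test, input_2)   # hypothetical model(input_1, input_2) signature
    return output_2.argmax()               # predicted emotion class from output_2
```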
And (10) evaluating the learning result by using the average classification accuracy.
Step (10-1): evaluate the model using accuracy, i.e., the proportion of correctly classified samples among all samples. Fifteen subjects participated in the experiment; each subject performed three experiments, and each experiment involved watching 15 clips, giving 45 experiments in total. The accuracy of the $i$-th experiment is computed by equation (24):

$$Acc_i = \frac{TP + TN}{TP + TN + FP + FN}, \quad i = 1, 2, \ldots, 45 \qquad (24)$$

where $TP$ is the number of positive samples predicted by the model as positive, $TN$ the number of negative samples predicted as negative, $FP$ the number of negative samples predicted as positive, and $FN$ the number of positive samples predicted as negative.

The average accuracy over the 15 × 2 tested experiments is given by equation (25):

$$\overline{Acc} = \frac{1}{30} \sum_{i=1}^{30} Acc_i \qquad (25)$$

The standard deviation of the experiments is given by equation (26):

$$S = \sqrt{\frac{1}{30} \sum_{i=1}^{30} \left(Acc_i - \overline{Acc}\right)^2} \qquad (26)$$
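A sketch of the evaluation metrics of equations (24)-(26), with 30 runs corresponding to the 15 × 2 tested experiments described below.

```python
import numpy as np

def accuracy(tp, tn, fp, fn):
    """Per-experiment accuracy, eq. (24)."""
    return (tp + tn) / (tp + tn + fp + fn)

accs = np.zeros(30)                  # placeholder for the 30 per-experiment accuracies
mean_acc = accs.mean()               # average accuracy, eq. (25)
std_acc = accs.std()                 # standard deviation, eq. (26)
```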
The accuracy of the invention was verified by training and testing; the comparison between the obtained results and the prior art (SVM, GCNN, DGCNN, BiDANN) is shown in Table 1 below. In the comparison, two experimental results of each subject were used for testing, and the mean of all the obtained accuracy values was used to measure the effect of the model.
From Fig. 5 it can be seen that the accuracy of the method of the present invention is higher than that of the SVM, GCNN, DGCNN, and BiDANN methods.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, and substitutions can be made to these embodiments without departing from the principles and spirit of the invention, and such variations still fall within the protection scope of the invention.
Claims (9)
1. An emotion recognition method based on a twin network architecture and graph convolution, characterized in that the method comprises the following steps:
step one, acquiring a data set: acquiring electroencephalogram data of a subject when watching movie fragments, and immediately completing a questionnaire by the subject after watching each movie fragment to report emotional response of the subject to each movie fragment, wherein the emotional response comprises positive, neutral and negative, and the electroencephalogram data are 62-channel electroencephalogram signals of a designated electrode position acquired through a 10-20 international standard lead system;
step two, data preprocessing and feature extraction: carrying out down-sampling and artifact-removing preprocessing on an original EEG signal, filtering a time domain signal by using a Hamming window and carrying out fast Fourier transform, taking a signal at each second as a sliding window, calculating the differential entropy characteristics of 62 channels of 5 frequency bands, and carrying out normalization processing;
step three, sample generation: taking the differential entropy features of the 62 channels over 3 s as one sample, wherein the dimension of one input sample is 3 × 62 × 5 (time × channel × frequency band);
step four, generating sample pairs: traversing each sample generated in step three, taking the sample as the reference and denoting it input_1; randomly selecting a sample under the same emotion, denoted input_2, to form a positive sample pair with input_1; and randomly selecting a sample under a different emotion, denoted input_3, to form a negative sample pair with input_1;
step five, defining a basic model: taking a space-time graph convolution neural network model as a basic model, wherein the basic model comprises an adaptive graph learning layer and a space-time convolution module;
step six, defining a twin network architecture: the two samples of a pair generated in step four, i.e., input_1 and input_2 (or input_3), are fed in turn into the same base model, producing two intermediate features embedding_1 and embedding_2; the distance between embedding_1 and embedding_2 is then computed; a multi-head self-attention layer then extracts deeper features from embedding_1, and after a fully connected layer and a softmax layer the probability that input_1 belongs to each category is output;
step seven, defining the input and output of the model: the model input is a positive sample pair or a negative sample pair; the model has two outputs, output_1 and output_2, where output_1 is the distance between the intermediate features embedding_1 and embedding_2, and output_2 is the probability that sample input_1 belongs to each category;
step eight, defining an objective function: the final objective function of the model consists of three loss functions;
step nine, training and testing: during training, the sample pairs generated in step four are input into the base model for training; during testing, to keep the input scales consistent, each sample of the test set is taken as input_1, and input_2 is obtained by randomly initializing a tensor of the same dimensions as input_1, forming input sample pairs; the classification accuracy is computed using output_2;
and step ten, evaluating the learning result by using the average classification accuracy.
2. The emotion recognition method based on twin network architecture and graph convolution of claim 1, wherein: the adaptive graph learning layer aims to learn the connectivity of the brain network; the spatio-temporal convolution module comprises a spatio-temporal self-attention mechanism, a graph convolution, and an ordinary convolution, wherein the spatio-temporal self-attention mechanism captures the dynamics in the spatial and temporal dimensions under different emotional states, the graph convolution aggregates neighboring nodes, and the ordinary convolution extracts features in the time dimension.
3. The emotion recognition method based on twin network architecture and graph convolution of claim 2, wherein: the three loss functions include:
in the adaptive graph learning layer, a loss function constrains the relationship between the feature distance of two channels and their connection strength: the farther apart the features of two channels, the weaker the connection strength; a regularization term of the L2 norm controls the sparsity of the learned graph;
the contrastive loss of the twin network, which pulls the positive pairs of step four closer together and pushes the negative pairs farther apart;
and the cross-entropy loss, which measures the error between the predicted value for input_1 in step six and the true sample label.
4. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the first step comprises the following steps:
step (1-1): selecting 15 film clips from a material library as the stimuli used in the experiment, the clips respectively containing positive, neutral, and negative emotions;
step (1-2): the subject watches the 15 film clips in each experiment; a 5-second cue precedes each clip, each clip is presented for 4 minutes, a 45-second self-assessment follows each clip, and a 15-second rest follows; in the self-assessment phase, the subject reports his or her emotional response to each film clip by completing a questionnaire.
5. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the second step comprises the following steps:
step (2-1): down-sampling the raw signal to 200 Hz and removing segments contaminated by electrooculogram and electromyogram artifacts;
step (2-2): extracting DE features for each channel over 5 frequency bands: delta band (1-3 Hz), theta band (4-7 Hz), alpha band (8-13 Hz), beta band (14-30 Hz), and gamma band (31-50 Hz):
Filtering the original data by adopting a Hamming window, performing fast Fourier transform on the data per second, and calculating the differential entropy of the five frequency bands;
the differential entropy is defined as follows:

let $X = \{x_1, x_2, \ldots, x_n\}$, $n \geq 1$, with corresponding probabilities $p_1, p_2, \ldots, p_n$; according to the definition of Shannon information entropy, the information content of the nondeterministic system is given by equation (1):

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i \qquad (1)$$

the time-domain state probability $p_i$ in the above equation is replaced by the frequency-domain power spectral density $\hat{p}_i$ defined from the fast Fourier transform, yielding the definition of differential entropy as shown in equation (2):

$$DE = -\sum_{i=1}^{n} \hat{p}_i \log \hat{p}_i \qquad (2)$$

step (2-3): normalizing the EEG signal using the z-score, as shown in equation (3):

$$X_{norm} = \frac{X - \bar{X}}{S} \qquad (3)$$

where $X$ is the EEG signal on each channel, $\bar{X}$ is its mean, and $S$ is its standard deviation.
6. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the fifth step comprises the following steps:
step (5-1) adaptive graph learning:
defining a non-negative adjacency matrix based on the channel features, as shown in equation (4):

$$A_{pq} = g(x_p, x_q), \quad p, q \in \{1, 2, \ldots, N\} \qquad (4)$$

where $A_{pq}$ represents the connection between channel $p$ and channel $q$, i.e., the weight of the edge joining node $p$ and node $q$; $g(x_p, x_q)$ is expressed by learning a weight vector $w$, and the definition of $A$ is shown in equation (5);
Step (5-2) space-time self-attention mechanism:
calculating temporal self-attention: an attention mechanism is used to adaptively capture the dynamic correlation between nodes in the time dimension; transposing the input yields $\chi_h$ with dimensions (62 × 5 × 3), and the temporal attention is defined in equations (6) and (7):

$$T = V_T \cdot \sigma\left(((\chi_h)^T U_1)\, U_2\, (U_3 \chi_h) + b_T\right) \qquad (6)$$

where $T'_{i,j}$ represents the similarity between time $i$ and time $j$; $V_T$, $U_1$, $U_2$, $U_3$, $b_T$ are learnable parameters, and $\sigma$ is the sigmoid activation function;
then calculating spatial attention: an attention mechanism is used to adaptively capture the dynamic correlation between nodes in the spatial dimension; transposing the input yields $\chi_h$ with dimensions (62 × 5 × 3), and the spatial attention is defined in equations (8) and (9):

$$S = V_S \cdot \sigma\left((\chi_h W_1)\, W_2\, (W_3 \chi_h)^T + b_S\right) \qquad (8)$$

where $S'_{p,q}$ represents the similarity between channel $p$ and channel $q$; $V_S$, $W_1$, $W_2$, $W_3$, $b_S$ are learnable parameters, and $\sigma$ is the sigmoid activation function;
step (5-3) spatial convolution:
calculating the Laplacian matrix $L = D - A$, where $A$ is the adjacency matrix learned in step (5-1) and $D$ is the degree matrix computed from $A$, i.e., $D$ is a diagonal matrix of the same dimension as $A$ whose diagonal element $D_{ii}$ is the sum of the $i$-th row of $A$; the scaled Laplacian is given by equation (10):

$$\tilde{L} = \frac{2}{\lambda_{max}} L - I \qquad (10)$$

where $I$ is the identity matrix and $\lambda_{max}$ is the largest eigenvalue of $L$;

recursively computing the Chebyshev polynomials according to equation (11):

$$T_k(\tilde{L}) = 2\tilde{L}\, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}), \quad T_0(\tilde{L}) = I, \; T_1(\tilde{L}) = \tilde{L} \qquad (11)$$

performing the graph convolution according to equation (12):

$$g_\theta *_G x = \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\, x \qquad (12)$$

where $g_\theta$ denotes the convolution kernel, $*_G$ denotes the graph convolution operation, $\theta_k$ denotes the Chebyshev coefficients (obtained by learning), and $x$ is the input data;
step (5-4) time convolution:
performing a 2D convolution in the time dimension using a 3 × 1 convolution kernel with stride 1 and padding 1 to preserve the input height and width;
step (5-5) residual connection and layer normalization:
adding a residual connection and performing layer normalization.
7. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the sixth step comprises the following steps:
step (6-1) of obtaining characteristics of the intermediate layer:
feeding the two samples of the pair generated in step four, i.e., input_1 and input_2 (or input_3), in turn into the same base model, producing two intermediate features embedding_1 and embedding_2;
calculating the distance of the sample pair in the step (6-2):
calculating the distance between the two outputs embedding_1 and embedding_2 of step (6-1), as shown in equation (13):

$$d(\mathrm{emb}_1, \mathrm{emb}_2) = \sqrt{\sum_{c=1}^{C} \sum_{t=1}^{T} \sum_{f=1}^{F} \left(\mathrm{emb}_{1,ctf} - \mathrm{emb}_{2,ctf}\right)^2} \qquad (13)$$

where emb_1 denotes embedding_1, emb_2 denotes embedding_2, $C$ is the number of channels, $T$ is the time length, $F$ is the number of features, and $\mathrm{emb}_{1,ctf}$ (resp. $\mathrm{emb}_{2,ctf}$) is the $f$-th feature element of channel $c$ at time $t$ in embedding_1 (resp. embedding_2);
step (6-3) multi-head self-attention layer:
position-encoding embedding_1 according to equation (14):

$$X_{embedding} = \mathrm{embedding\_1} + P \qquad (14)$$

where $P$ is a learnable position matrix;
randomly initializing 8 sets of matrices $W_q, W_k, W_v$ and multiplying each with $X_{embedding}$ to obtain 8 sets of $Q$, $K$, $V$ matrices, as shown in equations (15)-(17):

$$Q_i = X_{embedding} W_q^i \qquad (15)$$
$$K_i = X_{embedding} W_k^i \qquad (16)$$
$$V_i = X_{embedding} W_v^i \qquad (17)$$

with $i = 0, 1, \ldots, 7$, where $W_q^i$, $W_k^i$, $W_v^i$ are learnable matrices;
calculating the attention weights of each head from the $Q$ and $K$ matrices, dividing by the square root of the first dimension of the $W_k$ matrix, i.e., $\sqrt{d_k}$, and multiplying by $V$ to obtain the attention-layer output, yielding 8 matrices $Z_0$-$Z_7$, as shown in equation (18):

$$Z_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^T}{\sqrt{d_k}}\right) V_i, \quad i = 0, 1, \ldots, 7 \qquad (18)$$
splicing the 8 matrices together horizontally as $(Z_0, Z_1, \ldots, Z_7)$, then randomly initializing a learnable matrix $W^o$ and multiplying the two to obtain the matrix $Z$, as shown in equation (19):

$$Z = \mathrm{concatenate}(Z_0, Z_1, Z_2, Z_3, Z_4, Z_5, Z_6, Z_7) \cdot W^o \qquad (19)$$
and (6-4) connecting the full connection layer with the softmax layer:
flattening the output Z of step (6-3) into a one-dimensional vector; obtaining a vector of dimension 16 through a fully connected layer; obtaining a vector of dimension 3 through another fully connected layer, and activating with the softmax function to obtain the probability that sample input_1 belongs to each category.
8. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the eighth step comprises the following steps:
the final objective function of the model is shown in equation (20):

$$L = L_{graph\_learn} + \eta L_{contrastive\_loss} + L_{cross\_entropy} \qquad (20)$$

where $\eta$ is a tuning parameter between the two loss terms: the larger $\eta$, the greater the weight of the contrastive loss, and vice versa; the three components of the objective function are shown in equations (21)-(23):

$$L_{graph\_learn} = \sum_{p,q=1}^{N} \|x_p - x_q\|_2^2\, A_{pq} + \lambda \|A\|_2^2 \qquad (21)$$

where $x_p$ is the feature of channel $p$, $x_q$ is the feature of channel $q$, $A_{pq}$ is the connection strength between channels $p$ and $q$, and $\lambda$ is the regularization coefficient;

$$L_{contrastive\_loss} = \frac{1}{2N} \sum \left[ y\, d^2 + (1 - y)\, \max(margin - d, 0)^2 \right] \qquad (22)$$

where $d$ is the Euclidean distance between the embeddings of the two samples $m$ and $n$ in a pair, $y$ is a binary label ($y = 0$ indicates that samples $m$ and $n$ are not from the same emotion, $y = 1$ indicates that they are), $N$ is the number of sample pairs in a batch, and $margin$ is a hyperparameter denoting the distance separating samples of different emotions;

$$L_{cross\_entropy} = -\sum_{i} \sum_{r} y_{i,r} \log \hat{y}_{i,r} \qquad (23)$$

where $y_{i,r}$ indicates whether sample $i$ belongs to category $r$ (1 if so, 0 otherwise), and $\hat{y}_{i,r}$ is the predicted probability that sample $i$ belongs to category $r$.
9. A twin network architecture and graph convolution based emotion recognition method according to claim 1, 2 or 3, characterised in that: the step ten comprises the following steps:
evaluating the model using accuracy, where accuracy is the proportion of correctly classified samples among all samples; 15 subjects participated in the experiment, each subject performed three experiments, and each experiment involved watching 15 clips, giving 45 experiments in total, so the accuracy of the $i$-th experiment is computed by equation (24):

$$Acc_i = \frac{TP + TN}{TP + TN + FP + FN}, \quad i = 1, 2, \ldots, 45 \qquad (24)$$

the average accuracy over the 15 × 2 tested experiments is given by equation (25):

$$\overline{Acc} = \frac{1}{30} \sum_{i=1}^{30} Acc_i \qquad (25)$$

the standard deviation of the experiments is given by equation (26):

$$S = \sqrt{\frac{1}{30} \sum_{i=1}^{30} \left(Acc_i - \overline{Acc}\right)^2} \qquad (26)$$

where $TP$ is the number of positive samples predicted by the model as positive, $TN$ the number of negative samples predicted as negative, $FP$ the number of negative samples predicted as positive, and $FN$ the number of positive samples predicted as negative.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111617915.3A CN114330436A (en) | 2021-12-22 | 2021-12-22 | Emotion recognition method based on twin network architecture and graph convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114330436A true CN114330436A (en) | 2022-04-12 |
Family
ID=81015432
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765873A (en) * | 2019-09-19 | 2020-02-07 | 华中师范大学 | Facial expression recognition method and device based on expression intensity label distribution |
CN112686117A (en) * | 2020-12-24 | 2021-04-20 | 华中师范大学 | Face expression intensity recognition method and system based on hidden variable analysis |
CN113017630A (en) * | 2021-03-02 | 2021-06-25 | 贵阳像树岭科技有限公司 | Visual perception emotion recognition method |
KR20210099492A (en) * | 2020-02-04 | 2021-08-12 | 한국과학기술원 | Method and Apparatus for Speech Emotion Recognition Using a Top-Down Attention and Bottom-Up Attention Neural Network |
KR20210139119A (en) * | 2020-05-13 | 2021-11-22 | (주)사맛디 | System, method and program for recobnizing emotion of the object basen on deep-learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110598793B (en) | Brain function network feature classification method | |
CH716863A2 (en) | Depression detection system based on channel selection of multi-channel electroencephalography made using training sets. | |
Han et al. | A multimodal approach for identifying autism spectrum disorders in children | |
CN108959895B (en) | Electroencephalogram EEG (electroencephalogram) identity recognition method based on convolutional neural network | |
CN112990008B (en) | Emotion recognition method and system based on three-dimensional characteristic diagram and convolutional neural network | |
CN111714118A (en) | Brain cognition model fusion method based on ensemble learning | |
CN115804602A (en) | Electroencephalogram emotion signal detection method, equipment and medium based on attention mechanism and with multi-channel feature fusion | |
CN114947883A (en) | Time-frequency domain information fusion deep learning electroencephalogram noise reduction method | |
Jinliang et al. | EEG emotion recognition based on granger causality and capsnet neural network | |
Niu et al. | A brain network analysis-based double way deep neural network for emotion recognition | |
CN111772629A (en) | Brain cognitive skill transplantation method | |
CN113974627B (en) | Emotion recognition method based on brain-computer generated confrontation | |
Ji et al. | Cross-task cognitive workload recognition using a dynamic residual network with attention mechanism based on neurophysiological signals | |
CN114504331A (en) | Mood recognition and classification method fusing CNN and LSTM | |
Schwabedal et al. | Automated classification of sleep stages and EEG artifacts in mice with deep learning | |
Mohi-ud-Din et al. | Detection of Autism Spectrum Disorder from EEG signals using pre-trained deep convolution neural networks | |
CN116662782A (en) | MSFF-SENET-based motor imagery electroencephalogram decoding method | |
CN116662736A (en) | Human body state assessment method based on deep learning hybrid model | |
Vafaei et al. | Extracting a novel emotional EEG topographic map based on a stacked autoencoder network | |
CN114330436A (en) | Emotion recognition method based on twin network architecture and graph convolution | |
CN114081492A (en) | Electroencephalogram emotion recognition system based on learnable adjacency matrix | |
Singh et al. | Emotion recognition using deep convolutional neural network on temporal representations of physiological signals | |
Huang et al. | An Online Teaching Video Evaluation Scheme Based on EEG Signals and Machine Learning | |
Divya et al. | Identification of epileptic seizures using autoencoders and convolutional neural network | |
CN114997315B (en) | Multi-channel electroencephalogram integration-based error related potential classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||