CN114209323B - Method for identifying emotion and emotion identification model based on electroencephalogram data

Info

Publication number
CN114209323B
Authority
CN
China
Prior art keywords
space
emotion
data
time
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210069138.1A
Other languages
Chinese (zh)
Other versions
CN114209323A (en)
Inventor
陈益强
翁伟宁
谷洋
王记伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202210069138.1A
Publication of CN114209323A
Application granted
Publication of CN114209323B
Legal status: Active


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7203 Signal processing for noise prevention, reduction or removal
    • A61B5/7225 Details of analog processing, e.g. isolation amplifier, gain or sensitivity adjustment, filtering, baseline or drift compensation
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Psychiatry (AREA)
  • Animal Behavior & Ethology (AREA)
  • Signal Processing (AREA)
  • Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Veterinary Medicine (AREA)
  • Surgery (AREA)
  • Public Health (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physiology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Psychology (AREA)
  • Educational Technology (AREA)
  • Developmental Disabilities (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • Power Engineering (AREA)
  • Social Psychology (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

An embodiment of the invention provides a method for identifying emotion and an emotion identification model based on electroencephalogram data. The emotion identification model comprises: a space matrix construction module for generating a first space matrix from the user's electroencephalogram signal obtained in each of a plurality of time slices, yielding a plurality of first space matrices; a space feature extraction module for calculating, with an attention mechanism, the attention weight of each row and each column of each first space matrix, and obtaining a plurality of second space matrices from these attention weights; a space-time feature fusion module for extracting time-sequence association features among the plurality of second space matrices and obtaining a plurality of space-time characterization vectors from the second space matrices and the corresponding time-sequence association features; and an emotion recognition module for determining the user's emotion from the plurality of space-time characterization vectors.

Description

Method for identifying emotion and emotion identification model based on electroencephalogram data
Technical Field
The invention relates to the field of physiological data mining, in particular to psychological state detection, and more particularly to a method for identifying emotion and an emotion identification model based on electroencephalogram data.
Background
Physiological health detection based on wearable devices is an important growth area in today's medical and health industries, and various wearable devices (e.g., fitness bands, smartwatches, blood pressure and blood glucose monitors) are widely used for health management. Detection and management of mental health, by contrast, remains largely blank, even though mental health is as important as physiological health and directly affects a person's emotional and psychological state. Mental health detection spans medicine, psychology, data analysis and other fields; by combining medical definitions, physiological state analysis, the computation of various behavioral and physiological signals, and the prediction and detection of emotional states, it can provide technical support for applications such as psychological monitoring and early warning of adverse psychological states, making it an important path and method for mental health care.
Emotion recognition is a key part of mental state detection; emotion is a mental state generated by the interaction between an individual and the outside world. Emotional states can be computed from multimodal physiological data, including physiological signals such as electroencephalogram signals, electromyographic signals and skin resistance, and behavioral signals such as actions, expressions and gestures. Among these modalities, the electroencephalogram signal has become the primary basis for computing emotion because it is hard to disguise, directly related to emotion and easy to collect. The prior art, however, focuses only on the temporal features (for example, patent application publication CN112364697A) or the spatial features (for example, patent application publication CN112990008A) of the electroencephalogram signal, resulting in low emotion recognition accuracy.
Disclosure of Invention
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a method for identifying emotion and an emotion identification model based on electroencephalogram data.
The object of the invention is achieved by the following technical solutions:
According to a first aspect of the present invention, there is provided an emotion recognition model based on electroencephalogram data, comprising: a space matrix construction module for generating a first space matrix from the user's electroencephalogram signal obtained in each of a plurality of time slices, yielding a plurality of first space matrices; a space feature extraction module for calculating, with an attention mechanism, the attention weight of each row and each column of each first space matrix, and obtaining a plurality of second space matrices from these attention weights; a space-time feature fusion module for extracting time-sequence association features among the plurality of second space matrices and obtaining a plurality of space-time characterization vectors from the second space matrices and the corresponding time-sequence association features; and an emotion recognition module for determining the user's emotion from the plurality of space-time characterization vectors.
In some embodiments of the present invention, the first space matrix is generated, after data preprocessing of the user's electroencephalogram signals for the corresponding time slice, according to the spatial distribution of the plurality of electrodes acquiring the signals, where the data preprocessing includes data filtering and/or data artifact removal and/or data baseline removal, and each value in the first space matrix of the corresponding time slice is the channel variance of the preprocessed electroencephalogram signal acquired by the corresponding channel within that time slice.
In some embodiments of the present invention, the second space matrix is obtained by multiplying each datum in the corresponding first space matrix by the attention weight of the row in which it lies and by the attention weight of the column in which it lies.
In some embodiments of the invention, the space feature extraction module comprises: a first fully-connected network module comprising a first fully-connected network and configured to: input the concatenated vector of the per-row means of the first space matrix into the first fully-connected network for processing, and apply a Softmax calculation to the network's output to obtain the attention weight of each row of the first space matrix; and a second fully-connected network module comprising a second fully-connected network and configured to: input the concatenated vector of the per-column means of the first space matrix into the second fully-connected network for processing, and apply a Softmax calculation to the network's output to obtain the attention weight of each column of the first space matrix.
In some embodiments of the present invention, the spatio-temporal feature fusion module comprises a plurality of stacked coding networks, the input of each coding network passing sequentially through that coding network's self-attention mechanism layer, feedforward layer and residual layer. The input of the first coding network is a plurality of spatial characterization sequences, each obtained by sequentially concatenating the row data of the corresponding second space matrix; the input of each subsequent coding network is the intermediate space-time characterization vectors output by the previous coding network; and the last coding network outputs the final space-time characterization vectors.
In some embodiments of the invention, the self-attention mechanism layer is a unidirectional self-attention mechanism layer configured to: when calculating attention relationships among the sequences corresponding to the time slices, calculate the attention relationship between the sequence of the current time slice and the sequences of forward (preceding) time slices, and between the sequence of the current time slice and itself, but not the attention relationship between the sequence of the current time slice and the sequences of backward (subsequent) time slices.
In some embodiments of the invention, the emotion recognition model is trained by: acquiring a plurality of training samples, each comprising electroencephalogram signals collected from an experimenter over a plurality of time slices and an emotion label for each time slice; training the emotion recognition model with the plurality of training samples so that it outputs the experimenter's emotion for each corresponding time slice; calculating a loss value from the plurality of emotions output for a training sample and the corresponding emotion labels; and updating the parameters of the space feature extraction module, the space-time feature fusion module and the emotion recognition module with the loss value.
According to a second aspect of the present invention, there is provided a method of identifying emotion, comprising: acquiring the user's electroencephalogram signals collected by an electroencephalogram acquisition device over a plurality of time slices; inputting the user's electroencephalogram signals for the plurality of time slices into the emotion recognition model of the first aspect, and outputting the user's emotion for each time slice.
In some embodiments of the invention, the emotion of each time slice is an instantaneous emotion of the user, and the method further comprises: determining the user's long-term emotion by soft voting based on the instantaneous emotions of the user over a plurality of time slices and the probabilities corresponding to those instantaneous emotions.
According to a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; and a memory, wherein the memory is for storing executable instructions; the one or more processors are configured to implement the steps of the method of the second aspect via execution of the executable instructions.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic block diagram of an emotion recognition model based on electroencephalogram data according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the emotion recognition process of an emotion recognition model based on electroencephalogram data according to an embodiment of the present invention;
Fig. 3 shows the arrangement of the electroencephalogram electrodes in the international standard 10-20 system and the corresponding spatial electrode matrix;
Fig. 4 is a schematic diagram of the data processing of an emotion recognition model based on electroencephalogram data according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the unidirectional self-attention mechanism layer in an emotion recognition model based on electroencephalogram data according to an embodiment of the present invention.
Detailed Description
For the purpose of making the technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by way of specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As mentioned in the Background section, the prior art focuses only on the temporal or spatial features of the electroencephalogram signal, resulting in poor emotion recognition accuracy. The invention constructs a first space matrix from the electroencephalogram signal, calculates the attention weight of each row and column from the data of the first space matrix, and uses these weights to compute a second space matrix, so that the row and column attention weights can reflect the spatial correlation of the current individual's electroencephalogram signal despite individual differences, yielding a second space matrix that reflects the current individual's emotion. The invention further extracts time-sequence association features from the second space matrices and obtains a plurality of space-time characterization vectors from the second space matrices and the corresponding time-sequence association features, so that the space-time characterization vectors account for both the spatial and the temporal association features, allowing the user's emotion to be recognized more accurately.
According to an embodiment of the present invention, referring to Fig. 1, the invention provides an emotion recognition model based on electroencephalogram data, comprising: a space matrix construction module 10, a space feature extraction module 20, a space-time feature fusion module 30 and an emotion recognition module 40. To allow time-sequence association features to be extracted, the input data consist of electroencephalogram signals for a plurality of predetermined time slices, processed in turn by the space matrix construction module 10, the space feature extraction module 20, the space-time feature fusion module 30 and the emotion recognition module 40 to obtain the emotion for each time slice. An exemplary process of acquiring and processing an electroencephalogram signal (also called electroencephalogram data) is shown in Fig. 2: multidimensional data acquisition with the plurality of electrodes of the electroencephalogram acquisition device yields the electroencephalogram signals; the space matrix construction module 10 constructs the first space matrices according to the spatial arrangement of the electrodes; the space feature extraction module 20 computes the second space matrices based on a cross-attention mechanism (corresponding to obtaining the second space matrices from the attention weights of each row and column of each first space matrix); the space-time feature fusion module 30 extracts the time-sequence association features and computes the space-time characterization vectors from the corresponding second space matrices; and the emotion recognition module 40 recognizes the user's emotion from the space-time characterization vectors, for example with -1, 0 and 1 denoting negative, neutral and positive emotion respectively.
For ease of understanding, the process by which the electroencephalogram acquisition device collects the electroencephalogram signals is introduced first. Electroencephalogram acquisition devices come in many types; in general, the invention can process the electroencephalogram signals as long as the device collects them through a plurality of signal acquisition points (electrodes or other types of electroencephalogram sensors) arranged in a specific spatial distribution. As an example, Fig. 3a gives an illustrative electroencephalogram electrode arrangement (corresponding to the spatial distribution of electrodes) of an acquisition device following the international standard 10-20 system. The electrodes for sampling the electroencephalogram signal are silver chloride electrodes, or electrodes comprising a silver chloride member and felt. After being soaked in physiological saline, the electrodes are placed against the user's scalp in the spatial distribution specified by the international standard 10-20 system, so that every electrode sits on a channel defined by the standard and records the electroencephalogram signal in real time. The letters and numbers inside the small circles in Fig. 3a are the electrode names. The letters mean: F: frontal lobe (Frontal lobe); Fp: frontal pole (Frontal poles); T: temporal lobes (Temporal lobes); O: occipital lobes (Occipital lobes); P: parietal lobes (Parietal lobes); C: central (Central) or sensorimotor cortex (Sensorimotor cortex); Z: zero (Zero), the midline between the left and right hemispheres. The numbers distinguish the corresponding electrodes: regions over the left hemisphere are given odd numbers and regions over the right hemisphere even numbers. For further details of the electrode names, refer to the published description of the international standard 10-20 system.
According to one embodiment of the present invention, the spatial matrix construction module 10 is configured to generate a first spatial matrix according to the electroencephalogram signals of the user obtained in each of the plurality of time slices, so as to obtain a plurality of first spatial matrices.
To reduce the influence of irrelevant or weakly relevant factors on emotion recognition, the electroencephalogram signals must first undergo data preprocessing. According to one embodiment of the invention, the first space matrix is generated, after data preprocessing of the user's electroencephalogram signals for the corresponding time slice, according to the spatial distribution of the plurality of electrodes acquiring the signals, where the data preprocessing comprises data filtering, data artifact removal and data baseline removal. The technical scheme of this embodiment can at least achieve the following beneficial technical effects: the irrelevant signals comprise noise data, emotion-irrelevant data and an environmental baseline that severely interferes with the electroencephalogram signal; after data filtering, artifact removal and baseline removal, the influence of irrelevant or weakly relevant factors on emotion recognition is better removed, improving the accuracy of subsequent emotion recognition.
According to one embodiment of the invention, the data filtering process filters low-frequency and high-frequency data out of the electroencephalogram signal with a band-pass filter, retaining data with frequencies between a first predetermined frequency and a second predetermined frequency. For example, the data filtering algorithm removes low- and high-frequency data with a band-pass filter, keeping the 0.5 Hz-70 Hz band, which contains a large amount of emotion-related information, as the retained characteristic interval; the remaining bands contain mostly irrelevant low-pass and high-pass noise and are removed to reduce the influence of irrelevant noise data on emotion recognition.
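A minimal sketch of such a band-pass step with scipy, assuming a 200 Hz sampling rate (so that the 70 Hz cutoff lies below the Nyquist frequency); the function and parameter names are illustrative, not from the patent:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_eeg(eeg: np.ndarray, fs: float = 200.0,
                 low: float = 0.5, high: float = 70.0, order: int = 4) -> np.ndarray:
    """Band-pass filter each channel of a (channels, samples) EEG array."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=-1)  # zero-phase filtering along the time axis
```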
According to one embodiment of the invention, the data de-artifact processing includes channel normalization of the electroencephalogram signal in units of acquisition channels. Channel normalization reduces the absolute value of the electroencephalogram signal within a channel and reduces the influence of tall sawtooth artifacts on the low-amplitude emotion-related electroencephalogram data. An illustrative de-artifact calculation is:

Ci = (CRi - CRmin) / (CRmax - CRmin)

wherein CRi represents the i-th sampling point of the corresponding channel, CRmin represents the minimum value in the channel, CRmax represents the maximum value in the channel, and Ci represents the channel value of the i-th sampling point after normalization. The technical scheme of this embodiment can at least achieve the following beneficial technical effects: the de-artifact processing reduces interference in the electroencephalogram signal, for example the influence of electromyographic and electrooculographic signals on emotion recognition, thereby improving the accuracy of emotion recognition.
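A minimal numpy sketch of this per-channel min-max normalization; the epsilon guarding against division by zero on flat channels is an addition for numerical safety:

```python
import numpy as np

def normalize_channels(eeg: np.ndarray) -> np.ndarray:
    """Min-max normalize a (channels, samples) array channel by channel."""
    cmin = eeg.min(axis=-1, keepdims=True)       # CRmin per channel
    cmax = eeg.max(axis=-1, keepdims=True)       # CRmax per channel
    return (eeg - cmin) / (cmax - cmin + 1e-12)  # Ci = (CRi - CRmin) / (CRmax - CRmin)
```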
According to one embodiment of the invention, the data de-baselining process subtracts, from the electroencephalogram signal of the corresponding time slice, the baseline signal mean calculated from the baseline signals acquired on the channels corresponding to the electroencephalogram signals. This de-baselining method for reducing environmental influence divides the continuous electroencephalogram signal into an experimental signal (the electroencephalogram signal of the corresponding time slice) and a baseline signal and takes their difference, thereby subtracting the influence of environmental factors. Owing to the particularity of electroencephalogram signals and of emotional states, different testing environments cause differences in the overall electroencephalogram signal, and different initial emotional states affect the overall change of emotional potential, which reduces the effectiveness of the overall feature representation and the accuracy of emotion recognition. The de-baselining process therefore reduces the impact of environmental factors on emotion recognition.
Because the baseline signals available for reference differ between the user's electroencephalogram signals collected in real time and the electroencephalogram signals in the training samples, a corresponding de-baselining process can be set for each.
In some embodiments of the present invention, for the user's electroencephalogram signals collected in real time, the data de-baselining may use a pre-stored per-channel baseline signal mean for the electroencephalogram acquisition device, or the baseline signal mean calculated from the per-channel baseline signals collected by the device during a predetermined period before it contacts the user's scalp. The electroencephalogram acquisition device is, for example, a multichannel electroencephalogram acquisition sensor based on Emotiv Epoc+.
According to one embodiment of the present invention, for a training sample, the data in the training sample may be divided by time slices into a baseline signal and an electroencephalogram signal for predicting emotion, according to the following formula:

XS = [XB, XE] = [x1, ..., xk1, xk1+1, ..., xk]

wherein XS is the electroencephalogram signal sample over the corresponding time period, XB is the baseline signal within that period, XE is the electroencephalogram signal within that period (the experimental signal required for identifying emotion), k1 is the number of time slices of the baseline signal, and k is the total number of time slices. The baseline data use the same time slice length as the experimental data for the de-baselining calculation. Preferably, the data de-baselining process is calculated as:

Xi = X'i - (1/k1) * Σj=1..k1 XBj, i ∈ (k1+1, k)

wherein XBj represents the baseline signal of the j-th time slice, and X'i represents the electroencephalogram signal of the i-th time slice before de-baselining. By subtracting the per-channel baseline signal mean, this formula weakens the influence of environmental factors in the electroencephalogram signal, providing more accurate training data and improving the prediction accuracy of the model.
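A minimal numpy sketch of this de-baselining step, assuming the preprocessed signal has already been split into k time slices of shape (channels, samples), the first k1 of which are baseline slices; names are illustrative:

```python
import numpy as np

def remove_baseline(slices: np.ndarray, k1: int) -> np.ndarray:
    """slices: (k, channels, samples); the first k1 slices are the baseline signal XB."""
    baseline_mean = slices[:k1].mean(axis=0)  # (1/k1) * sum_j XB_j, per channel and sample
    return slices[k1:] - baseline_mean        # X_i = X'_i - baseline mean, i in (k1+1, k)
```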
According to one embodiment of the invention, the first space matrix is generated, after data preprocessing (data filtering, artifact removal and baseline removal) of the user's electroencephalogram signals for the corresponding time slice, according to the spatial distribution of the plurality of electrodes acquiring the signals, and each value in the first space matrix of the corresponding time slice is the channel variance of the preprocessed electroencephalogram signal acquired by the corresponding channel within that time slice. For example, assuming the aforementioned international standard 10-20 system is adopted, the correspondence between the electroencephalogram electrode arrangement and the spatial electrode matrix (space matrix) is shown in Fig. 3b, and the plurality of space matrices corresponding to the plurality of time slices in Fig. 4. The international standard 10-20 system yields a 9x9 space matrix according to the spatial arrangement of the electrodes. Each position in the space matrix is filled from the electroencephalogram data collected by the corresponding spatial electrode: for example, the channel variance of each electrode's preprocessed channel signal is taken as the channel feature and filled into the corresponding position according to the electrode's spatial position, and positions without a corresponding electrode are set directly to zero, making the matrix sparse.
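A minimal numpy sketch of this first-space-matrix construction; the electrode-to-grid mapping shown is a small assumed subset for illustration, not the full 10-20 layout of Fig. 3:

```python
import numpy as np

GRID = 9
# (row, col) grid positions for a few electrodes; an assumed illustrative subset.
ELECTRODE_POS = {"Fp1": (0, 3), "Fp2": (0, 5), "Cz": (4, 4), "O1": (8, 3), "O2": (8, 5)}

def first_spatial_matrix(eeg: np.ndarray, channel_names: list) -> np.ndarray:
    """eeg: (channels, samples) for one time slice -> (9, 9) first space matrix."""
    m = np.zeros((GRID, GRID))
    for ch, name in enumerate(channel_names):
        if name in ELECTRODE_POS:
            r, c = ELECTRODE_POS[name]
            m[r, c] = eeg[ch].var()  # channel variance as the spatial feature
    return m  # positions without an electrode stay zero (sparse)
```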
According to one embodiment of the present invention, the space feature extraction module 20 is configured to calculate, for each first space matrix of the plurality of first space matrices, the attention weight of each row and each column using an attention mechanism, and to obtain a plurality of second space matrices from those attention weights. Since a plurality of first space matrices serve as inputs, a second space matrix is generated for each first space matrix, yielding a plurality of second space matrices. Preferably, the second space matrix is obtained by multiplying each datum in the corresponding first space matrix by the attention weight of the row in which it lies and by the attention weight of the column in which it lies.
To analyze each electrode's spatial characteristics within the overall space matrix, channel attention analysis is performed over all electrodes to obtain their correlation weights. According to one embodiment of the present invention, the space feature extraction module 20 comprises: a first fully-connected network module comprising a first fully-connected network and configured to: input the concatenated vector of the per-row means of the first space matrix into the first fully-connected network for processing, and apply a Softmax calculation to the network's output to obtain the attention weight of each row in the first space matrix; and a second fully-connected network module comprising a second fully-connected network and configured to: input the concatenated vector of the per-column means of the first space matrix into the second fully-connected network for processing, and apply a Softmax calculation to the network's output to obtain the attention weight of each column in the first space matrix.
According to one embodiment of the invention, the attention weight of each row is calculated by a two-layer fully connected network as follows:
wl=softmax(w4×tanh(w3×l+b3)+b4);
wherein w3 is the weight parameter of the first layer of the fully connected network, w4 that of the second layer, b3 the bias of the first layer, b4 the bias of the second layer, tanh the hyperbolic tangent activation function, softmax the exponential probability activation function, and l the row mean vector of the first space matrix (the concatenated vector of per-row means).
Preferably, the attention weight of each column is calculated by a two-layer fully connected network, and the calculation formula is as follows:
wc=softmax(W4×tanh(W3×c+B3)+B4)
wherein W3 is the weight parameter of the first layer of the fully connected network, W4 that of the second layer, B3 the bias of the first layer, B4 the bias of the second layer, tanh the hyperbolic tangent activation function, softmax the exponential probability activation function, and c the column mean vector of the first space matrix (the concatenated vector of per-column means).
The calculation result wl contains the generated attention weight of each row, and wc contains that of each column. Multiplying the data at each position of the first space matrix by the attention weights of its row and its column yields the second space matrix for the electroencephalogram electrodes, calculated as:

vi,j = wl^i × wc^j × Si,j

wherein vi,j is the datum in the i-th row and j-th column of the calculated second space matrix, wl^i is the attention weight of the i-th row in which the datum lies, wc^j is the attention weight of the j-th column in which it lies, Si,j is the (originally filled) datum in the i-th row and j-th column of the first space matrix, I is the number of rows of the space matrix, and J the number of columns. For example, under the international standard 10-20 system, I = 9, J = 9, i ∈ (0, 9), j ∈ (0, 9), i.e. the space matrix (the first space matrix and/or the second space matrix) is of size 9x9. The technical scheme of this embodiment can at least achieve the following beneficial technical effects: the invention separates rows and columns and calculates cross attention values (the attention weights of each row and column) with rows and columns as the minimum unit: the row brain regions take the mean of each row as the row characterization for attention weight calculation, and the column brain regions take the mean of each column as the column characterization. The second space matrix generated for each time slice thus contains channel similarity and association features, and the resulting spatial features are more accurate, improving the accuracy of emotion recognition.
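As an illustration of this cross row/column attention, a minimal PyTorch sketch follows, assuming a 9x9 grid and an arbitrary hidden size; the class and parameter names are illustrative, not from the patent. It computes the row and column weights from the per-row and per-column means with two small two-layer networks and scales every entry by its row and column weights (vi,j = wl^i × wc^j × Si,j):

```python
import torch
import torch.nn as nn

class RowColAttention(nn.Module):
    def __init__(self, grid: int = 9, hidden: int = 32):
        super().__init__()
        self.row_net = nn.Sequential(nn.Linear(grid, hidden), nn.Tanh(), nn.Linear(hidden, grid))
        self.col_net = nn.Sequential(nn.Linear(grid, hidden), nn.Tanh(), nn.Linear(hidden, grid))

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        """s: (batch, grid, grid) first space matrices -> second space matrices."""
        w_l = torch.softmax(self.row_net(s.mean(dim=2)), dim=-1)  # row means -> row weights
        w_c = torch.softmax(self.col_net(s.mean(dim=1)), dim=-1)  # column means -> column weights
        return w_l.unsqueeze(2) * w_c.unsqueeze(1) * s            # v_ij = w_l[i] * w_c[j] * s_ij
```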
According to one embodiment of the present invention, the space-time feature fusion module 30 is configured to extract the time-sequence association features among the plurality of second space matrices and to obtain a plurality of space-time characterization vectors from the plurality of second space matrices and the corresponding time-sequence association features. According to one embodiment of the present invention, the space-time feature fusion module 30 comprises a plurality of stacked coding networks, the input of each coding network being processed sequentially by that coding network's self-attention mechanism layer, feedforward layer and residual layer. The input of the first coding network is a plurality of spatial characterization sequences, each obtained by sequentially concatenating the row data of the corresponding second space matrix; the input of each subsequent coding network is the intermediate space-time characterization vectors output by the previous coding network; and the last coding network outputs the final space-time characterization vectors. The technical scheme of this embodiment can at least achieve the following beneficial technical effects: through the processing of the self-attention mechanism layer, the feedforward layer and the residual layer, time-sequence features are better injected into the spatial features, optimizing the accuracy of the subsequent feature vector representation and improving the accuracy of emotion recognition.
According to one embodiment of the invention, the plurality of coding networks may be adapted from a Transformer model. The Transformer model (a time-series data mining network) contains two main modules, an encoder and a decoder: the encoder computes the temporal correlation features of each spatial characterization sequence and generates temporal characterization vectors, and the decoder interprets the generated intermediate time-sequence vectors. The present invention does not use the decoding part of the Transformer network, so only the encoder of the Transformer model need be modified; the encoder is explained in detail below. According to one embodiment of the invention, the modified Transformer model comprises a plurality of coding networks, without the decoder of the original Transformer model. Each coding network comprises a self-attention mechanism layer (self-attention calculation module), a feedforward layer (fully connected module) and a residual layer (residual connection module), connected in sequence to form one coding network. Referring to Fig. 4, a plurality of coding networks (xN denotes a stack of N coding networks, e.g. N = 4, 6 or 8) are cascaded to form a deep encoder, i.e. the modified Transformer model. The input and output data formats of the coding networks are identical; the self-attention mechanism layer, feedforward layer and residual layer compute fine-grained temporal dependencies and form the corresponding characterizations, maintain gradients and distributions in the deep network, and realize time-sequence feature mining and spatial feature fusion efficiently with few parameters.
According to one embodiment of the present invention, for the self-attention mechanism layer, the plurality of second space matrices are converted into spatial characterization sequences, ordered along the time dimension, as the input of the Transformer network. The self-attention calculation module must compute the correlation between the inputs; for each input, three matrices are used to compute the Q, K and V vectors of the correlation calculation, as follows:
Qt=WQ×Xt
Kt=WK×Xt
Vt=WV×Xt
where Xt represents the t-th input sequence (it should be understood that, with multiple coding networks, for the first coding network this is the spatial characterization sequence of time slice t, while for each subsequent coding network it is the space-time characterization vector output by the previous coding network for time slice t), WQ is the parameter matrix used to generate the Q vector, WK that for the K vector, and WV that for the V vector. Qt, Kt and Vt are the query, key and value vectors of time slice t in the correlation calculation, and × is standard matrix multiplication. The query vector is used for similarity calculation between the current sequence and the other sequences; the key vector is an index of the matrix, dot-multiplied with query vectors to form a similarity measure; and the value vector is used to generate the characterization of the spatial characterization sequence from the similarity scores. The similarity score is calculated as:
Scorei,j = (Qi × Kj) / sqrt(dKj), i, j ∈ (0, n);

where Scorei,j is the similarity score of the sequence of time slice i with the sequence of time slice j, n is the total number of sequences, dKj is the dimension of the key vector of the sequence of time slice j, Qi is the query vector of the sequence of time slice i, and Kj is the key vector of the sequence of time slice j.
According to one embodiment of the invention, the intermediate time-sequence vectors are calculated by the self-attention mechanism layer in the temporal order of the spatial characterization sequences, from the similarity scores and the value vectors. Optionally, the self-attention mechanism layer generates the intermediate time-sequence vector as:

XNt = Σj (e^(Scoret,j) / Σm e^(Scoret,m)) × Vj

where XNt is the intermediate time-sequence vector (intermediate time-sequence characterization) of the sequence of time slice t, Scoret,t is the similarity score of the sequence of time slice t with itself (included in the sum), Scoret,j is the similarity score between the sequences of time slice t and time slice j, e is the base of the natural logarithm, and Vj is the value vector of the sequence of time slice j. The formula applies a softmax activation to the similarity scores with the different sequences to generate weights, and, after computing similarity with all the other sequences, takes the weighted sum of the value vectors as the intermediate time-sequence vector.
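A minimal PyTorch sketch of this softmax-weighted aggregation, given precomputed Q, K and V for one input window; names are illustrative:

```python
import torch

def self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (slices, dim) query/key/value vectors for one input window."""
    scores = q @ k.T / (k.size(-1) ** 0.5)   # Score_ij = (Q_i . K_j) / sqrt(dK)
    weights = torch.softmax(scores, dim=-1)  # e^Score_tj / sum_m e^Score_tm
    return weights @ v                       # XN_t = sum_j weights_tj * V_j
```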
According to one embodiment of the present invention, to reduce the amount of computation, the modified coding network of the invention may change the multi-head attention mechanism in the encoder of the original Transformer model to a single-head attention mechanism. Preferably, when changing to a single-head attention mechanism, the dimensions of the weight parameters in the Transformer encoder can be adjusted to fit the spatial characterization sequences, so that the intermediate time-sequence vectors obtained through the self-attention mechanism layer keep the same data format as the spatial characterization sequences, i.e. each spatial characterization sequence generates an intermediate time-sequence vector of identical format.
According to one embodiment of the invention, the feedforward layer is configured to apply a nonlinear transformation to the intermediate time-sequence vector output by the self-attention mechanism layer, generating the transformed vector gt. Preferably, the feedforward layer's calculation is:
gt=W2(Relu(W1·XNt+b1));
wherein gt is the vector generated by the feedforward layer for time slice t, W1 and W2 are the weight parameters of the first and second layers of the feedforward network, b1 is the bias of the first layer, XNt is the intermediate time-sequence vector of time slice t produced by the self-attention mechanism layer, and Relu is a nonlinear activation function used to improve the nonlinear characterization capability of the algorithm.
According to one embodiment of the invention, the residual layer (residual connection layer) performs residual connection and layer regularization on the output of the self-attention mechanism layer and the output of the feedforward layer of the coding network to which it belongs, and outputs the space-time characterization vector. It should be understood that the input of the first coding network is a plurality of spatial characterization sequences, and the input of each subsequent coding network is the intermediate space-time characterization vectors output by the previous coding network. The skip (residual) connection and layer regularization performed by the residual layer improve the gradients of the sequence operations. Preferably, the residual layer is calculated as:
Rt=LayerNorm(gt+XNt);
where Rt is the residual layer's output for time slice t (also the output of its coding network for time slice t), gt is the output of the feedforward layer of that coding network for time slice t, XNt is the intermediate time-sequence vector for time slice t, and LayerNorm is layer regularization, e.g. regularizing the mean and variance over all neurons of the layer. The technical scheme of this embodiment can at least achieve the following beneficial technical effects: the residual layer prevents training difficulties caused by vanishing gradients and maintains the gradients and data distribution of the deep network.
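Putting the three layers together, a minimal PyTorch sketch of one coding network under the single-head attention variant described above; dimensions and names are assumptions, not from the patent:

```python
import torch
import torch.nn as nn

class CodingNetwork(nn.Module):
    """One coding network: self-attention -> feedforward -> residual + LayerNorm."""
    def __init__(self, dim: int, ff_hidden: int = 128):
        super().__init__()
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.ff = nn.Sequential(nn.Linear(dim, ff_hidden), nn.ReLU(), nn.Linear(ff_hidden, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, slices, dim) -> (batch, slices, dim) space-time characterizations."""
        q, k, v = self.wq(x), self.wk(x), self.wv(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        xn = torch.softmax(scores, dim=-1) @ v   # intermediate time-sequence vectors XN_t
        g = self.ff(xn)                          # feedforward transform g_t
        return self.norm(g + xn)                 # R_t = LayerNorm(g_t + XN_t)
```

A deep encoder would then stack N such blocks, e.g. `nn.Sequential(*[CodingNetwork(dim) for _ in range(N)])`.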
In the modified Transformer model above, the self-attention mechanism layer has so far not been modified, i.e. it is still a bidirectional self-attention mechanism layer. A bidirectional self-attention layer considers the attention relationships between the current time slice t and the time slices both before and after it. The inventors' experiments and analysis found, however, that a user's current emotion is usually related only to previously experienced events or previously generated emotions, and is unrelated, or only weakly related, to subsequent emotions. According to one embodiment of the invention, the self-attention mechanism layer is therefore preferably a unidirectional self-attention mechanism layer configured to: when computing attention relationships among the sequences corresponding to the time slices, calculate the attention relationship between the sequence of the current time slice and the sequences of forward (preceding) time slices, and between the sequence of the current time slice and itself, but not the attention relationship between the sequence of the current time slice and the sequences of backward (subsequent) time slices. The dependency on backward time slices is thereby masked when computing features for the current time slice; that is, every sequence computes dependencies only with its forward sequences. Referring to Fig. 5, Qt, Kt and Vt are the query, key and value vectors of time slice t, and Qt+1, Kt+1 and Vt+1 those of time slice t+1. The dashed lines in Fig. 5 represent the masked data flows; with a bidirectional self-attention layer the dashed lines would be solid, and the attention relationship between the sequence of the current time slice and the sequences of the following time slices would also be considered. For comparison, with a bidirectional self-attention mechanism layer the intermediate time-sequence vector is generated as:

XNt = Σj∈(0,n) (e^(Scoret,j) / Σm∈(0,n) e^(Scoret,m)) × Vj

i.e. the attention correlation of time slice t with all vectors is calculated.
If a unidirectional self-attention mechanism layer is adopted, the self-attention mechanism layer generates the intermediate time-sequence vector as:

XNt = Σj∈(0,t) (e^(Scoret,j) / Σm∈(0,t) e^(Scoret,m)) × Vj

The difference between the formulas shows that only the attention relationships between a time slice t and its preceding (forward) time slices are calculated, without considering the time slices after t (the backward time slices), i.e. the subsequent time slices are masked. For example, if 10 time slices are input at a time and the current time slice is time slice 4, the attention relationships between the sequence of time slice 4 and the sequences of time slices 0-4 are calculated, and those with the sequences of time slices 5-9 are not.
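The masking described above can be realized by setting the backward-slice scores to negative infinity before the softmax. A minimal PyTorch sketch, with illustrative names, assuming scores shaped (batch, slices, slices):

```python
import torch

def causal_attention_weights(scores: torch.Tensor) -> torch.Tensor:
    """scores: (batch, slices, slices) raw similarity scores -> causal weights."""
    n = scores.size(-1)
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)  # entries with j > t
    return torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)
```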
The technical scheme of this embodiment can at least achieve the following beneficial technical effects: emotion recognition differs from a sentence translation scenario in that the influence of earlier emotions on the current emotion is stronger and that of later emotions relatively weaker, so this embodiment switches to a unidirectional self-attention mechanism layer, improving the model's emotion recognition accuracy.
According to one embodiment of the present invention, it should be understood that, besides modifying an existing Transformer model, a corresponding model structure may be built directly following the description of the embodiments of the invention to implement the emotion recognition model based on electroencephalogram data.
According to one embodiment of the invention, the training data may employ the SEED and/or DEAP data sets. In the training data, emotion is induced by video stimuli; each video carries a corresponding emotion label, and after the video ends the user gives questionnaire feedback on the emotion actually felt. Samples whose video label and questionnaire label agree are used as valid training samples. Training data in the form of data matrices and corresponding labels are stored as a dictionary:

Si = (Xi, li), Xi ∈ R^(c×((bt+dt)×r)), li ∈ {-1, 0, 1}

where Si is the stored data structure of the i-th sample; Xi is a data matrix of format [c, (bt+dt)×r], with c the total number of electrodes and r the signal sampling rate; bt is the recording time before emotion induction, corresponding to the baseline period in the environment and used to record the baseline signal; dt is the recording time during emotion induction, used to record the electroencephalogram signal while emotion is induced; and li is the label of sample i. The labels -1, 0 and 1 denote negative, neutral and positive emotion respectively. Data acquired as above are stored with all samples in a matrix, providing the data basis for model training.
According to one embodiment of the invention, emotion recognition module 40 is configured to determine the emotion of the user based on a plurality of space-time characterization vectors. According to one embodiment of the invention, emotion recognition module 40 includes a multi-layer fully connected network. For example, emotion recognition module 40 includes a two-layer fully connected network with the following calculation formula:
Et=softmax(w6×Relu(w5×Rt+b5)+b6);
wherein w5 and w6 are the weight parameters of the first and second layers of the emotion recognition module's fully connected network, b5 and b6 the corresponding biases, Relu the nonlinear activation function, and softmax the probability activation function of the output layer of the fully connected network, which generates the probability of each recognized emotion. Et denotes the probability that the identified user belongs to the corresponding emotion at time slice t.
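As an illustration of this recognition head, a minimal PyTorch sketch of the two-layer fully connected network with Relu and softmax; the hidden size is an assumption, and the three output classes match the negative/neutral/positive categories:

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    def __init__(self, dim: int, hidden: int = 64, num_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))

    def forward(self, r: torch.Tensor) -> torch.Tensor:
        """r: (batch, slices, dim) space-time characterization vectors."""
        return torch.softmax(self.net(r), dim=-1)  # per-slice emotion probabilities E_t
```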
According to one embodiment of the invention, the emotion recognition model is trained in the following way: acquiring a plurality of training samples, wherein each training sample comprises electroencephalogram signals acquired by a plurality of time slices for experimenters and emotion labels corresponding to each time slice; and outputting the emotion corresponding to the experimenter in the corresponding time slice by utilizing the plurality of training samples to train the emotion recognition model, calculating a loss value according to the plurality of emotions and the corresponding emotion labels output by the corresponding training samples, and updating the parameters of the space feature extraction module 20, the space-time feature fusion module 30 and the emotion recognition module 40 by utilizing the loss value. According to one embodiment of the invention, the loss value is calculated as a cross entropy loss function during model training. Namely: and taking the cross entropy loss function as an optimization target training model. The cross entropy loss function is calculated as follows:
Loss = -(1/T) × Σt=1..T Σj=1..K Yt,j × log(Et,j);

where Et,j represents the probability that the identified user belongs to emotion j in time slice t, Yt,j indicates the emotion category of the user given by the label in time slice t, T represents the number of time slices currently used to update the model parameters, and K represents the number of emotion categories. For example, the number of emotion categories is 3: negative, neutral and positive, represented in the original labels by -1, 0 and 1, respectively. The cross entropy loss function computes the loss value between the emotion recognition result of the experimenter in each time slice and the true emotion label. It should be noted that, because of the way cross entropy is computed, the original labels must be converted into the multi-channel 0/1 labels required by the cross entropy loss; that is, Yt,j is the one-hot vector corresponding to the label. For example, with the multi-channel label channels ordered as negative, neutral, positive, the original label -1 is converted into 1, 0, 0; 0 is converted into 0, 1, 0; and 1 is converted into 0, 0, 1 before calculating the loss value of the multi-class cross entropy loss function.
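For illustration, a minimal sketch of this loss computation, including the one-hot conversion; the numerical-stability clamp is an assumption added for the example.

```python
import torch

# Sketch of Loss = -(1/T) * sum_t sum_j Y[t,j] * log(E[t,j]) with the
# {-1, 0, 1} -> one-hot label conversion described above.
def cross_entropy_loss(E: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """E: [T, K] softmax probabilities; labels: [T] values in {-1, 0, 1}."""
    idx = labels + 1                                             # -> {0, 1, 2}
    Y = torch.nn.functional.one_hot(idx, num_classes=3).float()  # [T, 3]
    # clamp is a numerical-stability assumption, not part of the embodiment
    return -(Y * torch.log(E.clamp_min(1e-8))).sum(dim=1).mean()

E = torch.softmax(torch.randn(5, 3), dim=-1)  # probabilities for T = 5 slices
loss = cross_entropy_loss(E, torch.tensor([-1, 0, 1, 1, 0]))
```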
According to an embodiment of the present invention, there is also provided a method of recognizing emotion, comprising: acquiring electroencephalogram signals of a user over a plurality of time slices, inputting them into the emotion recognition model of the corresponding embodiment, and outputting the emotion of the user at each time slice. According to one embodiment of the invention, the emotion of each time slice is the instantaneous emotion of the user, and the method further comprises: determining the long-term emotion of the user by soft voting based on the instantaneous emotions of the user over a plurality of time slices and the probabilities corresponding to those instantaneous emotions. Instantaneous (or short-term) emotion recognition is the recognition result for each individual time slice; the long-term emotion prediction is generated by soft voting over the short-term emotions within a sample, i.e., either the majority short-term emotion within the sample or the short-term emotion with the highest average probability within the sample is taken as the long-term (evoked) emotion of the sample.
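A minimal sketch of the two aggregation strategies just described; the [T, K] probability layout is an assumption for the example.

```python
import numpy as np

# Soft vote: the emotion with the highest average probability over T slices.
def long_term_emotion(E: np.ndarray) -> int:
    """E: [T, K] per-slice probabilities; returns a label in {-1, 0, 1}."""
    return int(E.mean(axis=0).argmax()) - 1

# Majority variant: the most frequent instantaneous emotion in the sample.
def majority_emotion(E: np.ndarray) -> int:
    votes = E.argmax(axis=1)  # per-slice instantaneous emotion indices
    return int(np.bincount(votes, minlength=3).argmax()) - 1
```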
According to one embodiment of the invention, the training of the emotion recognition model and the process of recognizing emotion comprise the following steps:
Step S1: model training uses, as training data, labeled multichannel electroencephalogram signals sampled under emotion induction by an electroencephalogram acquisition device (based on the Emotiv Epoc+ multichannel electroencephalogram acquisition sensor), together with the SEED and DEAP electroencephalogram reference datasets. The raw data must undergo noise reduction, artifact removal and removal of the environmental baseline influence: noise reduction filters out non-electroencephalogram and emotion-irrelevant components of the signal; the artifact removal algorithm reduces the interference introduced by electrooculographic, electromyographic and respiratory signals picked up during sampling; and baseline removal eliminates the instability contributed to the signal by environmental factors (such as air temperature, humidity and body temperature). Data preprocessing is completed before the model runs, and the data is divided into time slices in temporal order, awaiting processing by the model.
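For illustration, a hedged preprocessing sketch in which a band-pass filter stands in for the noise reduction step (a full pipeline would add dedicated artifact removal, e.g. ICA-based EOG/EMG rejection), the pre-induction baseline mean is subtracted, and the remainder is cut into time slices; all numeric parameters are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200        # sampling rate in Hz (assumed)
SLICE_SEC = 1   # length of one time slice in seconds (assumed)

def preprocess(x: np.ndarray, baseline_sec: int = 3) -> np.ndarray:
    """x: [channels, samples] raw EEG; returns slices of shape [T, C, n]."""
    # Keep the 1-50 Hz band as a simple stand-in for noise reduction.
    b, a = butter(4, [1.0, 50.0], btype="bandpass", fs=FS)
    x = filtfilt(b, a, x, axis=1)
    # Remove the environmental baseline estimated from the pre-induction part.
    n_base = baseline_sec * FS
    x = x - x[:, :n_base].mean(axis=1, keepdims=True)
    x = x[:, n_base:]
    # Cut the induction recording into fixed-length time slices, in order.
    n = SLICE_SEC * FS
    T = x.shape[1] // n
    return x[:, : T * n].reshape(x.shape[0], T, n).transpose(1, 0, 2)
```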
Step S2: the emotion recognition model reads in the electroencephalogram data corresponding to each time slice of the training sample while preserving the temporal order of the data, which safeguards training accuracy. The spatial characterization of the data is computed first: a specified number of time slices within a given period form a sample along the time dimension, each time slice being a multichannel short-time sampling matrix. A spatial matrix (corresponding to the first spatial matrix) is filled in according to the spatial distribution of the electrode channels, a cross attention mechanism computes attention weight matrices over each dimension of the matrix, and the weighted spatial matrix (corresponding to the second spatial matrix) serves as the spatial characterization of the signal. The weighted spatial matrix generated by this spatial association computation captures the activation states of the electrodes and brain regions induced by different emotions, intuitively reflects the differences and similarities between individuals in emotion induction, and enables analysis and recording of differences in users' emotional cognitive ability.
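A minimal PyTorch sketch of this spatial weighting: row and column attention weights are produced from the row and column means of the first spatial matrix (as the claims below describe) and re-weight it into the second spatial matrix; the 9×9 grid size is an assumption.

```python
import torch
import torch.nn as nn

H, W = 9, 9  # assumed spatial grid covering the electrode layout

class SpatialAttention(nn.Module):
    def __init__(self, h: int = H, w: int = W):
        super().__init__()
        self.row_fc = nn.Linear(h, h)  # first fully connected network (rows)
        self.col_fc = nn.Linear(w, w)  # second fully connected network (cols)

    def forward(self, m: torch.Tensor) -> torch.Tensor:
        # m: [batch, H, W] first spatial matrix (e.g. channel variances).
        row_w = torch.softmax(self.row_fc(m.mean(dim=2)), dim=-1)  # [B, H]
        col_w = torch.softmax(self.col_fc(m.mean(dim=1)), dim=-1)  # [B, W]
        # Second spatial matrix: each entry scaled by its row and column weight.
        return m * row_w.unsqueeze(2) * col_w.unsqueeze(1)

weighted = SpatialAttention()(torch.rand(4, H, W))  # [4, 9, 9]
```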
Step S3: the emotion recognition model is trained with the training samples to output the emotion of the experimenter in each corresponding time slice; a loss value is calculated from the emotions output for each training sample and the corresponding emotion labels, using the cross entropy loss function, and the parameters of the spatial feature extraction module, the space-time feature fusion module and the emotion recognition module are updated with the loss value. Steps S2 and S3 are iterated until the model converges.
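A hedged sketch of this training loop; model, train_loader, the Adam optimizer and its hyperparameters are assumptions standing in for the modules described above.

```python
import torch

def train(model, train_loader, epochs: int = 50, lr: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                  # iterate S2/S3 until convergence
        for x, labels in train_loader:       # x: [B, T, C, n]; labels: [B, T]
            probs = model(x)                 # [B, T, K] per-slice probabilities
            Y = torch.nn.functional.one_hot((labels + 1).long(), 3).float()
            loss = -(Y * torch.log(probs.clamp_min(1e-8))).sum(-1).mean()
            opt.zero_grad()
            loss.backward()                  # update all three modules jointly
            opt.step()
```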
Step S4: the trained emotion recognition model recognizes instantaneous emotions over short periods and, from the historical emotional record, persistent emotional states over long periods (corresponding to long-term emotion). For real-time emotion monitoring of a user, the individual's electroencephalogram signals are monitored continuously, emotion data is generated within a time window, and the window slides along the time dimension to update the data, enabling real-time recognition of instantaneous emotion and/or persistent emotional state.
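An illustrative sliding-window monitor for this step; the model interface, the window length and the helper name on_new_slice are all assumptions for the example.

```python
from collections import deque
import numpy as np
import torch

WINDOW = 30                     # most recent time slices kept (assumed)
history = deque(maxlen=WINDOW)  # rolling window of per-slice probabilities

def on_new_slice(model, x_slice: torch.Tensor):
    """x_slice: [C, n] preprocessed EEG of the newest time slice."""
    with torch.no_grad():
        probs = model(x_slice.unsqueeze(0))[0]  # assumed to yield [K] probs
    history.append(probs.numpy())
    instant = int(probs.argmax()) - 1                              # {-1, 0, 1}
    persistent = int(np.mean(list(history), axis=0).argmax()) - 1  # soft vote
    return instant, persistent
```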
To verify the effect of the present invention, the inventors conducted experiments based on the SEED dataset (SJTU Emotion EEG Dataset).
Comparative technical scheme 1: the existing Cascade CNN model adopts cascaded convolutional neural networks (CNN) with about 12 convolutional layers and attends only to the spatial characteristics of the data; its average emotion recognition accuracy is 69.2243%;
Comparative technical scheme 2: the existing Cascade LSTM model uses a cascaded long short-term memory network (LSTM) and attends only to the temporal characteristics of the data; its average emotion recognition accuracy is 75.4227%.
Embodiment 1 of the present invention: an emotion recognition model based on electroencephalogram data adopting 6 stacked coding networks with a unidirectional self-attention mechanism layer;
Embodiment 2 of the present invention: an emotion recognition model based on electroencephalogram data adopting 6 stacked coding networks with a bidirectional self-attention mechanism layer.
The per-user and average emotion recognition accuracies of the embodiments of the present invention are shown in the following table; both Embodiment 1 and Embodiment 2 outperform comparative schemes 1 and 2 above, and the recognition effect of the model is relatively better with the unidirectional self-attention mechanism layer. Overall, the invention trains a lightweight neural network model, which improves the generalization ability and universality of the model, prevents overfitting during training, and ensures high-accuracy emotion recognition across users.
It should be noted that although the steps are described above in a specific order, this does not mean they must be performed in that order; in fact, some of the steps may be performed concurrently or even in a different order, as long as the required functions are achieved.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (9)

1. An emotion recognition model based on electroencephalogram data, comprising:
the space matrix construction module is used for generating a first space matrix according to the electroencephalogram signals of the user obtained by each time slice in the plurality of time slices to obtain a plurality of first space matrices;
The spatial feature extraction module is configured to calculate, for each first spatial matrix in the plurality of first spatial matrices, an attention weight for each row and each column using an attention mechanism, and to obtain a plurality of second spatial matrices according to the attention weights of the rows and columns of each first spatial matrix, wherein each second spatial matrix is obtained by multiplying each datum in the corresponding first spatial matrix by the attention weight corresponding to the row in which the datum is located and by the attention weight corresponding to the column in which the datum is located, and wherein the spatial feature extraction module includes:
a first fully-connected network module comprising a first fully-connected network, the first fully-connected network module configured to: inputting a spliced vector of the mean value of each row of data of the first space matrix into a first fully-connected network for processing to obtain the output of the first fully-connected network, and carrying out Softmax calculation on the output of the first fully-connected network to obtain the attention weight of each row in the first space matrix; and
A second fully-connected network module comprising a second fully-connected network, the second fully-connected network module configured to: inputting a spliced vector of the mean value of each column of data of the first space matrix into a second fully-connected network for processing to obtain the output of the second fully-connected network, and carrying out Softmax calculation on the output of the second fully-connected network to obtain the attention weight of each column in the first space matrix;
the space-time feature fusion module is used for extracting time sequence association features among the plurality of second space matrixes and obtaining a plurality of space-time characterization vectors according to the plurality of second space matrixes and the corresponding time sequence association features;
and the emotion recognition module is used for determining the emotion of the user according to the plurality of space-time characterization vectors.
2. The emotion recognition model of claim 1, wherein the first spatial matrix is generated according to the spatial distribution of the plurality of electrodes used to acquire the electroencephalogram signals, after data preprocessing of the electroencephalogram signals of the user in the corresponding time slice, wherein the data preprocessing includes data filtering processing and/or data artifact removal processing and/or data baseline removal processing, and a value in the first spatial matrix corresponding to the corresponding time slice is the channel variance of the electroencephalogram signals acquired by the corresponding channel within the time slice after data preprocessing.
3. The emotion recognition model of claim 1, wherein the spatiotemporal feature fusion module comprises a plurality of stacked encoding networks, the input of each encoding network being sequentially processed by a self-attention mechanism layer, a feed-forward layer, and a residual layer of the encoding network;
The input of the first coding network is a plurality of space characterization sequences, each space characterization sequence is obtained by sequentially splicing data corresponding to each row in the corresponding second space matrix, the input of the subsequent coding network is a space-time characterization vector in the middle of the output of the previous coding network, and the last coding network outputs a final space-time characterization vector.
4. The emotion recognition model of claim 3, wherein the self-attention mechanism layer is a unidirectional self-attention mechanism layer, the unidirectional self-attention mechanism layer being configured to: when calculating the attention relationships between the sequences corresponding to the time slices, calculate only the attention relationships between the sequence corresponding to the current time slice and the sequences corresponding to preceding time slices, together with the attention relationship of the current sequence with itself, and not calculate attention relationships between the current sequence and sequences corresponding to subsequent time slices.
5. The emotion recognition model of any one of claims 1-4, wherein the emotion recognition model is trained by:
Acquiring a plurality of training samples, wherein each training sample comprises electroencephalogram signals acquired by a plurality of time slices for experimenters and emotion labels corresponding to each time slice;
And outputting the emotion corresponding to the experimenter in the corresponding time slice by utilizing the plurality of training samples to train the emotion recognition model, calculating a loss value according to the plurality of emotions and the corresponding emotion labels output by the corresponding training samples, and updating parameters of the space feature extraction module, the space-time feature fusion module and the emotion recognition module by utilizing the loss value.
6. A method of identifying emotion, the method comprising:
Acquiring electroencephalogram signals of a user acquired by electroencephalogram acquisition equipment in a plurality of time slices;
Inputting the electroencephalogram signals of the user for the plurality of time slices into the emotion recognition model of any one of claims 1 to 5, and outputting the emotion of the user at each time slice.
7. The method of claim 6, wherein the emotion of each time slice is an instantaneous emotion of the user, the method further comprising:
The long-term emotion of the user is determined in a soft voting manner based on the instant emotions of the user at a plurality of time slices and the probabilities corresponding to the instant emotions.
8. An electronic device, comprising:
one or more processors; and
A memory, wherein the memory is for storing executable instructions;
the one or more processors are configured to implement the steps of the method of claim 6 or 7 via execution of the executable instructions.
9. A computer readable storage medium, having stored thereon a computer program executable by a processor to implement the steps of the method of claim 6 or 7.
CN202210069138.1A 2022-01-21 2022-01-21 Method for identifying emotion and emotion identification model based on electroencephalogram data Active CN114209323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210069138.1A CN114209323B (en) 2022-01-21 2022-01-21 Method for identifying emotion and emotion identification model based on electroencephalogram data

Publications (2)

Publication Number Publication Date
CN114209323A CN114209323A (en) 2022-03-22
CN114209323B (en) 2024-05-10

Family

ID=80847097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210069138.1A Active CN114209323B (en) 2022-01-21 2022-01-21 Method for identifying emotion and emotion identification model based on electroencephalogram data

Country Status (1)

Country Link
CN (1) CN114209323B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309975B (en) * 2022-06-28 2024-06-07 中银金融科技有限公司 Product recommendation method and system based on interaction characteristics
CN115105079B (en) * 2022-07-26 2022-12-09 杭州罗莱迪思科技股份有限公司 Electroencephalogram emotion recognition method based on self-attention mechanism and application thereof
CN116965817B (en) * 2023-07-28 2024-03-15 长江大学 EEG emotion recognition method based on one-dimensional convolution network and transducer

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102573619A (en) * 2008-12-19 2012-07-11 新加坡科技研究局 Device and method for generating a representation of a subject's attention level
CN106923824A (en) * 2017-03-27 2017-07-07 广州视源电子科技股份有限公司 Electroencephalogram relaxation degree identification method and device based on multi-space signal characteristics
CN110162777A (en) * 2019-04-01 2019-08-23 广东外语外贸大学 One kind seeing figure writing type Automated Essay Scoring method and system
CN110610168A (en) * 2019-09-20 2019-12-24 合肥工业大学 Electroencephalogram emotion recognition method based on attention mechanism
CN111134666A (en) * 2020-01-09 2020-05-12 中国科学院软件研究所 Emotion recognition method of multi-channel electroencephalogram data and electronic device
CN111914486A (en) * 2020-08-07 2020-11-10 中国南方电网有限责任公司 Power system transient stability evaluation method based on graph attention network
CN112057089A (en) * 2020-08-31 2020-12-11 五邑大学 Emotion recognition method, emotion recognition device and storage medium
CN113598774A (en) * 2021-07-16 2021-11-05 中国科学院软件研究所 Active emotion multi-label classification method and device based on multi-channel electroencephalogram data
CN113947127A (en) * 2021-09-15 2022-01-18 复旦大学 Multi-mode emotion recognition method and system for accompanying robot


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant