CN113095428A - Video emotion classification method and system fusing electroencephalogram and stimulus information - Google Patents
- Publication number
- CN113095428A (application number CN202110442820.6A)
- Authority
- CN
- China
- Prior art keywords
- video
- electroencephalogram
- fusion
- vector
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
- A61B5/165—Evaluating the state of mind, e.g. depression, anxiety
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a video emotion classification method and system fusing electroencephalogram and stimulus-source information, comprising the following steps. Constructing a stimulus source-electroencephalogram signal data set: subjects watch video clips while an electroencephalograph records their electroencephalogram (EEG) signals. Constructing a multi-modal feature fusion model: for the training data set, video features and EEG features are extracted separately, and a fusion vector is generated with an attention-based multi-modal information fusion method. Training a fusion vector classification model: the fusion vector is fed into a fully connected neural network layer for prediction, and the network weights are updated according to the difference between the prediction and the true label. Classifying with the model: EEG signals are collected while a subject watches the video to be classified; video and EEG features are extracted and fused; the fusion vector is input into the trained network to obtain the classification result.
Description
Technical Field
The invention relates to the field of multi-modal video emotion classification, and in particular to a video emotion classification method and system fusing electroencephalogram (EEG) and stimulus-source information.
Background
Video emotion classification is a major hot spot in computer vision research and has wide application value. In video recommendation systems, computing the emotion a user feels while watching videos reveals the user's emotional preferences, so that videos better matching those preferences can be recommended. In public-opinion analysis, classifying the emotion of videos under a specific trending topic helps establish correct public-opinion guidance and maintain a harmonious, stable network environment. Video emotion classification is also significant for video categorization, advertisement placement, and other applications.
Therefore, the invention provides a video emotion classification method fusing electroencephalogram and stimulus-source information. The method classifies video emotion by collecting the EEG signals of users watching the videos and fusing them with the video stimulus-source information.
The video emotion classification method proposed in prior art 1 includes: constructing an adaptive fusion network model; dividing an input video set into a training set and a test set and obtaining three modal feature vectors for each video, the three modalities being RGB, optical flow and audio; for the training set, feeding the three modalities' feature vectors into the adaptive fusion network and optimizing with a gradient-based algorithm to obtain a trained model; and, for the test set, feeding each video's feature vector into the trained model to predict and classify the video emotion.
The emotion classification method proposed in prior art 2 includes: determining the emotion label of every emotional bullet-screen (danmaku) comment in the video to be analyzed; segmenting the video into segments; computing each segment's emotion vector and emotion entropy from the bullet-screen emotion labels within it; and identifying emotional segments according to the segment emotion vectors and emotion entropy.
The emotion classification method proposed in prior art 3 includes: obtaining time series data of an electroencephalogram signal when a user watches a video; selecting a classification feature from the obtained time series data; and classifying the video emotion of the video according to the classification characteristic.
The three modalities selected in prior art 1 are all low-level features of the video itself and make no use of EEG information. Prior art 2 considers only the emotion of the bullet-screen comments, not the video content, and cannot classify video segments that have no comments. Prior art 3 considers only the EEG signals generated while a user watches videos, not the video content, so its classification effect is not ideal.
Disclosure of Invention
The invention aims to provide a video emotion classification method and system fusing electroencephalogram and stimulus-source information that address the above problems.
In order to achieve the purpose, the invention adopts the following technical scheme:
a video emotion classification method fusing electroencephalogram and stimulus-source information comprises the following steps:
step 1, constructing a stimulus source-electroencephalogram signal data set: subjects watch video clips while an electroencephalograph collects their EEG signals; the data set is constructed from the video labels, the video content and the subjects' EEG signals;
step 2, constructing a multi-modal feature fusion model: video features and EEG features are extracted from the data set, and the features of the two modalities are each represented as time-series feature vectors; an attention-based multi-modal information fusion method generates a fusion vector from the multi-modal time-series feature vectors;
step 3, training a fusion vector classification model: the generated fusion vector is fed into a fully connected neural network layer for prediction; the network weights are updated according to the difference between the prediction and the true label, and training ends once the network is stable;
and step 4, classifying with the model: for a video to be classified, EEG signals are collected while a subject watches it; video and EEG features are extracted and fused with the attention-based method; the fusion vector is input into the trained network, whose output vector gives the probability of each emotion category, and the category with the highest probability is selected as the classification result.
Further, constructing the stimulus source-electroencephalogram signal data set specifically comprises:
collecting videos; fitting the subject with a 62-channel electroencephalograph; once the signals are stable, having the subject watch the stimulus-source videos while the EEG signals are collected; cleaning the acquired EEG data; and storing the video labels, video content and the subject's EEG signals in a database to form the stimulus source-electroencephalogram signal data set.
Further, video clips are collected from the Internet, comprising positive, negative and neutral videos in equal numbers, each clip 3 to 5 minutes long.
Further, in step 2, the time-series feature vectors are obtained as follows:
the stimulus source-electroencephalogram signal data set is divided into a training set and a test set; for the training-set video data, frames are extracted at 1 s intervals, the features of each frame are extracted with a ResNet network, and the features are concatenated by time step to obtain a time-series feature vector; for the EEG signals, features are extracted by wavelet transform to obtain a time-series feature vector.
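As a rough illustration of how the two time-series feature vectors line up, the sketch below keeps only the one-feature-vector-per-second structure; the ResNet is replaced by a trivial per-frame reduction and the wavelet transform by one-level Haar energies, and all names, dimensions and the sampling rate are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def haar_features(eeg, fs, win_sec=1.0):
    """One-level Haar energies per window of a (channels, samples) EEG array --
    a simplified stand-in for the wavelet-transform features."""
    win = int(fs * win_sec)
    feats = []
    for t in range(eeg.shape[1] // win):
        seg = eeg[:, t * win:(t + 1) * win]
        approx = (seg[:, 0::2] + seg[:, 1::2]) / np.sqrt(2)   # Haar approximation
        detail = (seg[:, 0::2] - seg[:, 1::2]) / np.sqrt(2)   # Haar detail
        feats.append(np.concatenate([np.mean(approx ** 2, axis=1),
                                     np.mean(detail ** 2, axis=1)]))
    return np.stack(feats)            # (time steps, 2 * channels)

def frame_features(frames):
    """Stand-in for per-frame ResNet features (here just per-channel means),
    one frame sampled per second."""
    return np.stack([f.reshape(-1, 3).mean(axis=0) for f in frames])

fs = 128                                              # assumed EEG sampling rate
eeg = np.random.randn(62, fs * 4)                     # 62 channels, 4 s of signal
frames = [np.random.rand(8, 8, 3) for _ in range(4)]  # one frame per second
x_star = haar_features(eeg, fs)       # EEG time-series feature vector
s_star = frame_features(frames)       # video time-series feature vector
```

Both modalities end up with one feature vector per time step (here 4 steps), which is what allows them to be concatenated and fused step by step.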
Further, in step 2, the multi-modal feature vectors are combined by the attention-based multi-modal information fusion method to generate a time-series fusion vector. The fusion model adopts an encoder-decoder structure based on a recurrent neural network (RNN): for each time step, the encoder performs a weighted fusion of the multi-modal time-series features to form an intermediate semantic representation; this representation is fed into the decoder network, which decodes the fusion feature of the current time step; the feature fusion weights are updated from the decoder's hidden state, realizing the attention mechanism. Running over all time steps yields the time-series fusion vector.
Further, in step 3, training the fusion vector classification model specifically comprises:
feeding the time-series fusion vector generated in the previous step into the fully connected layer of the neural network for prediction and normalizing the result with a Softmax function; using the Softmax cross entropy as the loss function (the lower the cross entropy, the closer the prediction is to the label); updating the network weights by stochastic gradient descent; and ending training once the network is stable.
Further, in step 4, classifying with the model specifically comprises:
for the video to be classified, fitting the subject with a 62-channel electroencephalograph and, once the signals are stable, having the subject watch the video while EEG signals are collected; cleaning the data after acquisition and feeding the EEG signals and video information into the multi-modal feature fusion model to generate a time-series fusion vector; and feeding the fusion vector into the classification model, passing the output through a Softmax function to obtain the probability of each emotion category, and selecting the category with the highest probability as the video emotion classification result.
Further, a video emotion classification system fusing electroencephalogram and stimulus-source information comprises:
a stimulus source-electroencephalogram signal data set construction module, which constructs the data set from the video labels, the video content and the subjects' EEG signals;
a multi-modal feature fusion model construction module, which extracts video and EEG features from the data set, represents the features of the two modalities as time-series feature vectors, and generates a fusion vector with the attention-based multi-modal information fusion method;
a fusion vector classification model training module, which feeds the generated fusion vector into a fully connected neural network layer for prediction, updates the network weights according to the difference between the prediction and the true label, and ends training once the network is stable;
and a model classification module, which, for a video to be classified, collects the subject's EEG signals during viewing, extracts and fuses video and EEG features, feeds the fusion vector into the trained network, and selects the emotion category with the highest output probability as the video emotion classification result.
Compared with the prior art, the invention has the following technical effects:
the invention not only considers the stimulus source, but also combines the electroencephalogram signals when the user watches the video. The content of the stimulus source reflects the intrinsic information transmitted by the stimulus source, which is irrelevant to an individual watching a video, the electroencephalogram signal of the user reflects the physiological change generated after the user receives the information transmitted by the stimulus source, the information of the two modes is organically fused, and the generalization capability of the model and the accuracy of emotion classification are improved.
The invention combines video information with EEG signals through an attention-based multi-modal information fusion method. The fusion model adopts an encoder-decoder structure based on a recurrent neural network (RNN), with an attention mechanism introduced at the encoder stage. By simulating the human cognitive process, the network can concentrate on a subset of the input, suppressing useless information and improving model efficiency.
Drawings
FIG. 1 is a flow chart of a video emotion classification method fusing electroencephalogram and stimulus information.
FIG. 2 is a diagram of a video emotion classification model according to the present invention.
FIG. 3 is a diagram of a multi-modal feature fusion model of the present invention.
Detailed Description
The embodiments of the invention are described in detail below with reference to the drawings and examples. The specific embodiments are provided to better illustrate the invention; all techniques implemented based on the teachings of the invention fall within its scope.
Referring to fig. 1 to 2, a video emotion classification method fusing electroencephalogram and stimulus information includes the following steps:
(1) constructing a stimulus source-electroencephalogram signal data set:
collecting video clips with positive, negative and neutral emotions from the Internet;
enabling the subject to watch the video clip, and acquiring an electroencephalogram signal when the subject watches the video by using an electroencephalogram scanner;
and constructing a stimulus source-electroencephalogram signal data set from the video labels, the video content and the subjects' EEG signals.
(2) Constructing a multi-modal feature fusion model:
extracting video features and EEG features from the training data set and representing the features of the two modalities as time-series feature vectors;
and, for the multi-modal time-series feature vectors, generating a fusion vector with the attention-based multi-modal information fusion method.
(3) Training a fusion vector classification model:
feeding the fusion vector generated in the previous step into the fully connected layer of the neural network for prediction;
and updating the network weights according to the difference between the prediction and the true label, training the network, and ending training once the network is stable.
(4) Classification using the model
For the video to be classified, fitting the subject with an electroencephalograph and collecting EEG signals while the subject watches the video;
extracting video and EEG features and generating a fusion vector with the attention-based multi-modal information fusion method;
feeding the fusion vector into the trained neural network, whose output vector gives the probability of each emotion category, and selecting the category with the highest probability as the video emotion classification result.
The specific steps for constructing the stimulus source-electroencephalogram signal data set are as follows:
video clips from the Internet are collected, wherein the video clips comprise videos with positive, negative and neutral emotions, the number of the videos is the same, and the duration of each video clip is 3-5 minutes. The 62-channel electroencephalograph scanner is worn on the subject, the subject is allowed to watch the stimulus source video after the signals are stable, and the electroencephalograph signals of the subject are collected. And cleaning the acquired electroencephalogram data, storing the video tags, the contents and the electroencephalogram signals of the testee into a database, and constructing a stimulus source-electroencephalogram signal data set.
The method for constructing the multi-modal feature fusion model specifically comprises the following steps:
the method comprises the steps of dividing a stimulus source-electroencephalogram signal data set into a training set and a testing set, extracting video images of the training set at certain time intervals according to video data of the training set, respectively extracting features of the images by using a ResNet network, and splicing the features according to time steps to obtain time sequence feature vectors. And for the electroencephalogram signals, performing feature extraction in a wavelet transform mode to obtain time sequence feature vectors.
The multi-modal feature vectors are combined by the attention-based multi-modal information fusion method to generate a time-series fusion vector. Because emotion is not evenly distributed through a video, the emotion people feel while watching fluctuates; assigning the same weight to every moment when fusing the time-series features therefore harms the final classification. The attention mechanism simulates the human cognitive process, letting the network focus on local content and redistributing the originally uniform weights according to the importance of the content.
The fusion model adopts an encoder-decoder structure based on a recurrent neural network (RNN). For each time step, the encoder performs a weighted fusion of the multi-modal time-series features to form an intermediate semantic representation; this representation is fed into the decoder network, which decodes the fusion feature of the current time step; the feature fusion weights are updated from the decoder's hidden state, realizing the attention mechanism. Running over all time steps yields the time-series fusion vector.
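A minimal numpy sketch of this per-time-step loop is given below. The dimensions, the simple-RNN recurrence, and the bilinear scoring of features against the decoder state are all illustrative assumptions standing in for the patent's networks; the point is only the control flow, in which the attention weights at each step depend on the decoder's previous hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d_feat, d_hid = 4, 8, 16            # time steps / feature / hidden dims (assumed)
modalities = {"video": rng.normal(size=(L, d_feat)),
              "eeg":   rng.normal(size=(L, d_feat))}   # stand-in time-series features

W_att = rng.normal(size=(d_hid, d_feat)) * 0.1   # scores a feature against the state
W_in  = rng.normal(size=(d_hid, d_feat)) * 0.1   # input weights of the simple RNN
U_h   = rng.normal(size=(d_hid, d_hid)) * 0.1    # recurrent weights

h = np.zeros(d_hid)                    # decoder hidden state
fused = []
for i in range(L):
    # weight each modality's feature at step i by its relevance to the state h
    scores = np.array([h @ W_att @ m[i] for m in modalities.values()])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()               # attention weights over modalities
    context = sum(a * m[i] for a, m in zip(alpha, modalities.values()))
    h = np.tanh(W_in @ context + U_h @ h)   # decode the fusion feature of step i
    fused.append(h)
G = np.stack(fused)                    # time-series fusion vector, shape (L, d_hid)
```

At the first step the state is zero, so the modalities start out equally weighted; from then on the weights shift according to which modality best matches the evolving decoder state.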
The training fusion vector classification model specifically comprises the following steps:
and predicting the time sequence fusion vector generated in the last step by taking the time sequence fusion vector as the input of the full connection layer of the neural network, and normalizing the result by using a Softmax function.
And (3) using the cross entropy based on Softmax as a loss function, wherein the lower the cross entropy is, the more the prediction result is close to the label, updating the weight of the neural network by using a random gradient descent method, training the neural network, and finishing the training of the model after the network is stable.
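A self-contained sketch of this training step (stand-in data and shapes; a single fully connected layer trained with softmax cross entropy and full-batch gradient descent, as a simplified illustration of the procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, C = 32, 16, 3                   # samples, fusion-vector dim, emotion classes
X = rng.normal(size=(n, d))           # stand-in fusion vectors
y = rng.integers(0, C, size=n)        # labels, e.g. positive / negative / neutral

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((d, C))                  # fully connected layer weights
lr = 0.1
losses = []
for step in range(200):               # gradient descent on the loss
    o = softmax(X @ W)                                   # class probabilities
    losses.append(-np.mean(np.log(o[np.arange(n), y])))  # softmax cross entropy
    grad = o.copy()
    grad[np.arange(n), y] -= 1        # d(loss)/d(logits)
    W -= lr * (X.T @ grad) / n        # weight update
```

Starting from zero weights, the first loss equals ln 3 (uniform predictions over three classes), and the cross entropy falls as the weights are updated, which is the "prediction moves closer to the label" behavior the text describes.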
The classification by using the model specifically comprises the following steps:
for the video to be classified, a 62-channel electroencephalograph scanner is worn by the subject, the subject is allowed to watch the video after the signals are stable, and meanwhile, electroencephalograms of the subject are collected.
And after the acquisition is finished, performing data cleaning, and inputting the electroencephalogram signals and the video information into a multi-mode feature fusion model to generate a time sequence fusion vector.
And inputting the fusion vector into a classification model for classification, obtaining the probability of various emotion classes through a Softmax function on an output result, and selecting the emotion class with the maximum probability as a video emotion classification result.
Example:
the video of the global epidemic situation event on the tweet is taken as an embodiment to explain the video emotion classification process fusing electroencephalogram and stimulus source information.
(1) Construction of stimulus-electroencephalogram data set
Video clips are collected from the Internet, containing positive, negative and neutral videos in equal numbers. Positive videos are segments of comedy films, negative videos segments of tragedy films, and neutral videos documentary segments. Each video lasts 3 to 5 minutes. The subject wears a 62-channel electroencephalograph and, once the signals are stable, watches the stimulus-source videos; a nearby operator plays the videos and records the EEG signals. The videos are played in random order and watched consecutively by the same subject, with a 15 s interval between videos for the subject to rest and let the emotion subside. The acquired EEG data are cleaned, and the video labels, video content and the subject's EEG signals are stored in a database to construct the stimulus source-electroencephalogram signal data set.
(2) Constructing a multimodal feature fusion model
For the training data set, video features and EEG features are extracted, and the features of the two modalities are represented as time-series feature vectors:
For the EEG signal x, features are extracted by wavelet transform to obtain a time-series feature vector x*. x* is divided evenly into L segments, each segment being one time step of duration T; the feature vector of the i-th time step is denoted u_{x,i}, so that x* = {u_{x,1}, u_{x,2}, …, u_{x,L}}.
For the video stimulus source, L frames are extracted at time interval T, and the features of each frame, extracted with a ResNet network pre-trained on the ImageNet data set, serve as the stimulus-source content features at that time step. The feature vector of the i-th frame is denoted u_{s,i}, giving the time-series feature vector of the stimulus source s as s* = {u_{s,1}, u_{s,2}, …, u_{s,L}}.
For input modality k, let u_{k,i} denote the feature vector at time step i. The attention-based multi-modal information fusion model performs a weighted fusion of the feature vectors u_{k,i} to generate a new fusion vector g_i:
g_i = tanh( W_h (Σ_k α_{k,i} d_{k,i}) + b_h )    (1)
where W_h and b_h are the weight matrix and bias vector of the decoder network. The current input u_{k,i} and the decoder hidden state h_{i-1} determine the importance of each element of the input. d_{k,i} is a linear projection of u_{k,i}, computed as:
d_{k,i} = W_{uk} u_{k,i} + b_{uk}    (2)
The multi-modal attention weight α_{k,i} is calculated by normalizing the scores v_{k,i} with a softmax:
α_{k,i} = exp(v_{k,i}) / Σ_{k'} exp(v_{k',i})    (3)
v_{k,i} = w_B^T tanh( W_B h_{i-1} + V_{Bk} u_{k,i} + b_{Bk} )    (4)
Here v_{k,i} is the score of the i-th feature vector u_{k,i} of modality k, computed at time i from the current decoder state h_{i-1}, and measures the importance of u_{k,i} to the prediction. W_B and V_{Bk} are weight matrices, w_B is a weight vector, and b_{Bk} is a bias vector.
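A numpy sketch of this score-and-weight computation for a single time step follows. All dimensions and weight initializations are illustrative assumptions, and the decoder transform that produces the final g_i is omitted, so the result here is just the attention-weighted combination of the projected modality features:

```python
import numpy as np

rng = np.random.default_rng(2)
d_u, d_h, d_a = 8, 16, 12             # feature / decoder-state / attention dims (assumed)
u = {"video": rng.normal(size=d_u),   # u_{k,i}: modality-k feature at time step i
     "eeg":   rng.normal(size=d_u)}
h_prev = rng.normal(size=d_h)         # decoder hidden state h_{i-1}

W_uk = {k: rng.normal(size=(d_u, d_u)) * 0.1 for k in u}   # per-modality projection
b_uk = {k: np.zeros(d_u) for k in u}
W_B  = rng.normal(size=(d_a, d_h)) * 0.1
V_Bk = {k: rng.normal(size=(d_a, d_u)) * 0.1 for k in u}
b_Bk = {k: np.zeros(d_a) for k in u}
w_B  = rng.normal(size=d_a)

d_vec = {k: W_uk[k] @ u[k] + b_uk[k] for k in u}                 # projection of u_{k,i}
v = {k: w_B @ np.tanh(W_B @ h_prev + V_Bk[k] @ u[k] + b_Bk[k])   # score per modality
     for k in u}
z = np.array(list(v.values()))
alpha = np.exp(z - z.max())
alpha /= alpha.sum()                                  # softmax over modalities
g_i = sum(a * d_vec[k] for a, k in zip(alpha, u))     # attention-weighted fusion
```

Because the scores depend on h_{i-1}, the balance between the video and EEG modalities can change from one time step to the next, which is what the attention mechanism contributes over a fixed-weight fusion.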
(3) Training fusion vector classification model
The fusion result G = [g_1 g_2 … g_L]^T is fed into the fully connected layer of the neural network for prediction, and the result is normalized with a Softmax function to output a probability vector o:
o = softmax(z) = softmax(W^T G)    (5)
where W is the weight matrix of the fully connected layer and z = W^T G.
The Softmax cross entropy is used as the loss function, defined as:
L = -(1/N) Σ_{i=1}^{N} log o_{y_i}    (6)
where N is the number of samples, o_j is the j-th component of the output vector o, and y_i is the position of the true label of the i-th sample in o. The network weights are updated by minimizing the loss function to train the model.
(4) Classification using the model
Videos related to the global epidemic on Twitter are collected as the videos to be classified and downloaded locally. The subject wears the electroencephalograph; the channel positions are adjusted to keep the signals stable; the subject then watches the videos while EEG signals are collected.
Video and EEG features are extracted as in step (2), and a fusion vector is generated with the attention-based multi-modal feature fusion method. The vector is fed into the trained neural network, whose output vector gives the probability of each emotion category; the category with the highest probability is selected as the video emotion classification result.
Claims (8)
1. A video emotion classification method fusing electroencephalogram and stimulus source information, characterized by comprising the following steps:
step 1, constructing a stimulus source-electroencephalogram signal data set: having a subject watch video clips while acquiring the subject's electroencephalogram signals with an electroencephalogram scanner; and constructing the stimulus source-electroencephalogram signal data set from the video labels, the video content, and the subject's electroencephalogram signals;
step 2, constructing a multi-modal feature fusion model: extracting video features and electroencephalogram features from the data set, and representing the features of the two modalities as time-series feature vectors; and generating a fusion vector from the multi-modal time-series feature vectors by an attention-based multi-modal information fusion method;
step 3, training a fusion vector classification model: using the generated fusion vector as the input of a fully connected layer of a neural network for prediction; updating the neural network weights according to the difference between the prediction result and the true label; and finishing training of the model once the network is stable;
step 4, classifying with the model: for a video to be classified, acquiring the electroencephalogram signals of a subject watching the video; extracting video features and electroencephalogram features, and generating a fusion vector by the attention-based multi-modal information fusion method; and inputting the fusion vector into the trained neural network, whose output vector gives the probability of each emotion category, the category with the highest probability being selected as the video emotion classification result.
2. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 1, wherein constructing the stimulus source-electroencephalogram signal data set specifically comprises:
collecting videos; fitting the subject with a 62-channel electroencephalogram scanner; after the signals are stable, having the subject watch the stimulus source videos while collecting the subject's electroencephalogram signals; cleaning the acquired electroencephalogram data; and storing the video labels, the video content, and the subject's electroencephalogram signals in a database to construct the stimulus source-electroencephalogram signal data set.
3. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 2, wherein video clips are collected from the Internet, the clips comprising equal numbers of videos with positive, negative, and neutral emotions, each clip lasting 3-5 minutes.
4. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 1, wherein in step 2 the time-series feature vectors are obtained as follows:
dividing the stimulus source-electroencephalogram signal data set into a training set and a test set; for the video data of the training set, extracting video frames at intervals of 1 s, extracting the features of each frame with a ResNet network, and concatenating the features by time step to obtain a time-series feature vector; and for the electroencephalogram signals, extracting features by wavelet transform to obtain a time-series feature vector.
5. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 1, wherein in step 2 the multi-modal feature vectors are combined by the attention-based multi-modal information fusion method to generate a time-series fusion vector; the fusion model adopts an RNN-based recurrent encoder-decoder structure: at each time step, the encoder performs a weighted fusion of the multi-modal time-series feature data into an intermediate semantic representation, which is input into the decoder network and decoded to obtain the fusion feature of the current time step; the feature fusion weights are updated from the hidden state of the decoder, thereby realizing the attention mechanism; and running over all time steps yields the time-series fusion vector.
6. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 1, wherein in step 3 training the fusion vector classification model specifically comprises:
using the time-series fusion vector generated in the previous step as the input of a fully connected layer of the neural network for prediction, and normalizing the result with a Softmax function; using the Softmax-based cross entropy as the loss function, a lower cross entropy indicating that the prediction is closer to the label; updating the neural network weights by stochastic gradient descent to train the network; and finishing training of the model once the network is stable.
7. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 1, wherein in step 4 classifying with the model specifically comprises:
for a video to be classified, fitting the subject with a 62-channel electroencephalogram scanner; after the signals are stable, having the subject watch the video while collecting the subject's electroencephalogram signals; after acquisition, cleaning the data and inputting the electroencephalogram signals and the video information into the multi-modal feature fusion model to generate a time-series fusion vector; and inputting the fusion vector into the classification model, passing the output through a Softmax function to obtain the probability of each emotion category, and selecting the category with the highest probability as the video emotion classification result.
8. A video emotion classification system fusing electroencephalogram and stimulus source information, characterized in that, based on the video emotion classification method fusing electroencephalogram and stimulus source information according to any one of claims 1 to 7, the system comprises:
The stimulus source-electroencephalogram signal data set construction module is used for constructing a stimulus source-electroencephalogram signal data set according to the video label, the video content and the electroencephalogram signal of the subject;
the multi-modal feature fusion model construction module is used for extracting video features and electroencephalogram features from the data set and respectively representing the features of the two modes as time sequence feature vectors; for the multi-mode time sequence feature vector, generating a fusion vector by adopting a multi-mode information fusion method based on an attention mechanism;
the fusion vector classification model training module is used for predicting the generated fusion vector as the input of a neural network full-connection layer; updating the weight of the neural network according to the difference between the prediction result and the real label, training the neural network, and finishing the training of the model after the network is stable;
The model classification module is used for acquiring, for a video to be classified, the electroencephalogram signals of a subject watching the video; extracting video features and electroencephalogram features, and generating a fusion vector by the attention-based multi-modal information fusion method; and inputting the fusion vector into the trained neural network, whose output vector gives the probability of each emotion category, the category with the highest probability being selected as the video emotion classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110442820.6A CN113095428B (en) | 2021-04-23 | 2021-04-23 | Video emotion classification method and system integrating electroencephalogram and stimulus information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113095428A true CN113095428A (en) | 2021-07-09 |
CN113095428B CN113095428B (en) | 2023-09-19 |
Family
ID=76679737
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113627447A (en) * | 2021-10-13 | 2021-11-09 | 腾讯科技(深圳)有限公司 | Label identification method, label identification device, computer equipment, storage medium and program product |
CN113779934A (en) * | 2021-08-13 | 2021-12-10 | 远光软件股份有限公司 | Multi-modal information extraction method, device, equipment and computer-readable storage medium |
CN113974658A (en) * | 2021-10-28 | 2022-01-28 | 天津大学 | Semantic visual image classification method and device based on EEG time-sharing spectrum Riemann |
CN113988201A (en) * | 2021-11-03 | 2022-01-28 | 哈尔滨工程大学 | Multi-mode emotion classification method based on neural network |
CN114366107A (en) * | 2022-02-23 | 2022-04-19 | 天津理工大学 | Cross-media data emotion recognition method based on facial expressions and electroencephalogram signals |
CN114638253A (en) * | 2022-02-16 | 2022-06-17 | 南京邮电大学 | Identity recognition system and method based on emotion electroencephalogram feature fusion optimization mechanism |
CN114722950A (en) * | 2022-04-14 | 2022-07-08 | 武汉大学 | Multi-modal multivariate time sequence automatic classification method and device |
CN114821401A (en) * | 2022-04-07 | 2022-07-29 | 腾讯科技(深圳)有限公司 | Video auditing method, device, equipment, storage medium and program product |
CN115381467A (en) * | 2022-10-31 | 2022-11-25 | 浙江浙大西投脑机智能科技有限公司 | Attention mechanism-based time-frequency information dynamic fusion decoding method and device |
CN115778330A (en) * | 2023-02-07 | 2023-03-14 | 之江实验室 | Automatic epileptic seizure detection system and device based on video electroencephalogram |
CN115985505A (en) * | 2023-01-19 | 2023-04-18 | 北京未磁科技有限公司 | Multidimensional fusion myocardial ischemia auxiliary diagnosis model and construction method thereof |
CN116049743A (en) * | 2022-12-14 | 2023-05-02 | 深圳市仰和技术有限公司 | Cognitive recognition method based on multi-modal data, computer equipment and storage medium |
CN116304643A (en) * | 2023-05-18 | 2023-06-23 | 中国第一汽车股份有限公司 | Mental load detection and model training method, device, equipment and storage medium |
CN117426774A (en) * | 2023-12-21 | 2024-01-23 | 深圳腾信百纳科技有限公司 | User emotion assessment method and system based on intelligent bracelet |
JP7428337B1 (en) | 2023-06-30 | 2024-02-06 | Vie株式会社 | Information processing method, information processing device, and program |
CN117615210A (en) * | 2023-11-21 | 2024-02-27 | 南湖脑机交叉研究院 | User experience quality determining method and device |
CN118152819A (en) * | 2024-04-25 | 2024-06-07 | 之江实验室 | Brain coding method, device and medium under audiovisual stimulation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108805087A (en) * | 2018-06-14 | 2018-11-13 | 南京云思创智信息科技有限公司 | Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem |
CN109815903A (en) * | 2019-01-24 | 2019-05-28 | 同济大学 | A kind of video feeling classification method based on adaptive converged network |
CN110515456A (en) * | 2019-08-14 | 2019-11-29 | 东南大学 | EEG signals emotion method of discrimination and device based on attention mechanism |
US20210019531A1 (en) * | 2019-07-16 | 2021-01-21 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for classifying video |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113095428B (en) | Video emotion classification method and system integrating electroencephalogram and stimulus information | |
Kossaifi et al. | Sewa db: A rich database for audio-visual emotion and sentiment research in the wild | |
US11887352B2 (en) | Live streaming analytics within a shared digital environment | |
US20190172458A1 (en) | Speech analysis for cross-language mental state identification | |
CN111209440B (en) | Video playing method, device and storage medium | |
US11206450B2 (en) | System, apparatus and method for providing services based on preferences | |
US20170238859A1 (en) | Mental state data tagging and mood analysis for data collected from multiple sources | |
US20170289619A1 (en) | Method for positioning video, terminal apparatus and cloud server | |
JP2018206085A (en) | Event evaluation support system, event evaluation support device, and event evaluation support program | |
US10942563B2 (en) | Prediction of the attention of an audience during a presentation | |
US20180302686A1 (en) | Personalizing closed captions for video content | |
CN111310019A (en) | Information recommendation method, information processing method, system and equipment | |
US10834453B2 (en) | Dynamic live feed recommendation on the basis of user real time reaction to a live feed | |
CN102244788A (en) | Information processing method, information processing device, scene metadata extraction device, loss recovery information generation device, and programs | |
CN110929158A (en) | Content recommendation method, system, storage medium and terminal equipment | |
CN116484318A (en) | Lecture training feedback method, lecture training feedback device and storage medium | |
CN113496156B (en) | Emotion prediction method and equipment thereof | |
CN115050077A (en) | Emotion recognition method, device, equipment and storage medium | |
Aydin et al. | Automatic personality prediction from audiovisual data using random forest regression | |
CN111931073B (en) | Content pushing method and device, electronic equipment and computer readable medium | |
Liu et al. | Learning to predict salient faces: A novel visual-audio saliency model | |
Desai et al. | ASL citizen: a community-sourced dataset for advancing isolated sign language recognition | |
CN112073757B (en) | Emotion fluctuation index acquisition method, emotion fluctuation index display method and multimedia content production method | |
CN113764099A (en) | Psychological state analysis method, device, equipment and medium based on artificial intelligence | |
CN113407778A (en) | Label identification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||