CN113095428B - Video emotion classification method and system integrating electroencephalogram and stimulus information - Google Patents

Video emotion classification method and system integrating electroencephalogram and stimulus information

Info

Publication number
CN113095428B
CN113095428B (application CN202110442820.6A)
Authority
CN
China
Prior art keywords
video
electroencephalogram
fusion
vector
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110442820.6A
Other languages
Chinese (zh)
Other versions
CN113095428A (en)
Inventor
刘欢
李珂
秦涛
郑庆华
张玉哲
陈栩栩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiaotong University
Priority to CN202110442820.6A priority Critical patent/CN113095428B/en
Publication of CN113095428A publication Critical patent/CN113095428A/en
Application granted granted Critical
Publication of CN113095428B publication Critical patent/CN113095428B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/16Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
    • A61B5/165Evaluating the state of mind, e.g. depression, anxiety
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Veterinary Medicine (AREA)
  • Public Health (AREA)
  • Surgery (AREA)
  • Pathology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Social Psychology (AREA)
  • Developmental Disabilities (AREA)
  • Child & Adolescent Psychology (AREA)
  • Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Fuzzy Systems (AREA)
  • Physiology (AREA)
  • Signal Processing (AREA)
  • Educational Technology (AREA)

Abstract

The invention discloses a video emotion classification method and system that fuse electroencephalogram and stimulus source information. Constructing a stimulus source-electroencephalogram data set: subjects watch video clips while an electroencephalogram scanner collects their electroencephalogram signals, and a stimulus source-electroencephalogram signal data set is constructed. Constructing a multi-modal feature fusion model: video features and electroencephalogram features are extracted from the training data set, and a fusion vector is generated with an attention-based multi-modal information fusion method. Training a fusion vector classification model: the fusion vector is fed into a fully connected neural network layer for prediction; the network weights are updated according to the difference between the prediction and the true label, and the network is trained. Classification with the model: electroencephalogram signals are collected while a subject watches the video to be classified; video and electroencephalogram features are extracted and fused; and the fusion vector is input into the trained neural network to obtain the classification result.

Description

Video emotion classification method and system integrating electroencephalogram and stimulus information
Technical Field
The invention relates to the field of multi-modal-fusion video emotion classification, and in particular to a video emotion classification method and system that fuse electroencephalogram and stimulus source information.
Background
Video emotion classification is a major topic in computer vision research with broad application value. In video recommendation systems, computing the emotion a user feels while watching a video reveals the user's emotional preferences, so that videos matching those preferences can be recommended. For public opinion events, videos under a specific hot topic can be collected and their emotion computed in order to guide discussion and establish correct public opinion guidance, which helps build a harmonious and stable cyberspace environment. Video emotion classification is also significant for video categorization, advertisement placement and other applications.
The invention therefore provides a video emotion classification method that fuses electroencephalogram and stimulus source information. The method classifies the emotion of a video by collecting the electroencephalogram signals of users watching the video and fusing them with information from the video stimulus source.
The video emotion classification method proposed in prior art 1 includes: constructing an adaptive fusion network model; dividing an input video set into a training set and a test set, and obtaining three modal feature vectors for each video, the three modalities being RGB, optical flow and audio; for the training set, inputting the feature vectors of the three modalities into the adaptive fusion network and optimizing with a gradient-based algorithm to obtain a trained adaptive fusion network model; and, for the test set, inputting each video's feature vector into the trained model and predicting the video's emotion for classification.
The emotion classification method proposed in prior art 2 includes: determining the bullet-screen emotion label of each emotional bullet-screen comment in the video to be analyzed; segmenting the video to be analyzed into video segments; computing each segment's emotion vector and emotion entropy from the bullet-screen emotion labels it contains; and identifying the emotional segments according to the segment emotion vectors and emotion entropy.
The emotion classification method proposed in prior art 3 includes: obtaining time-series electroencephalogram data recorded while a user watches a video; selecting classification features from the obtained time series; and classifying the emotion of the video according to those features.
The three modalities selected in prior art 1 are all low-level features of the video itself and do not use electroencephalogram information. Prior art 2 focuses only on the emotion of the video bullet screens, ignores the content of the video itself, and cannot classify video clips that have no bullet screens. Prior art 3 focuses only on the electroencephalogram signals generated while the user watches the video and ignores the video content, so its classification performance is not ideal.
Disclosure of Invention
The invention aims to provide a video emotion classification method and system that fuse electroencephalogram and stimulus source information so as to solve the above problems.
To achieve this purpose, the invention adopts the following technical scheme:
A video emotion classification method fusing electroencephalogram and stimulus source information comprises the following steps:
Step 1, constructing a stimulus source-electroencephalogram signal data set: a subject watches video clips while an electroencephalogram scanner collects the subject's electroencephalogram signals; a stimulus source-electroencephalogram signal data set is constructed from the video labels, the video content and the subject's electroencephalogram signals;
Step 2, constructing a multi-modal feature fusion model: for the data set, video features and electroencephalogram signal features are extracted and the features of the two modalities are represented as time-series feature vectors; for the multi-modal time-series feature vectors, a fusion vector is generated with an attention-based multi-modal information fusion method;
Step 3, training a fusion vector classification model: the generated fusion vector is fed into a fully connected neural network layer for prediction; the network weights are updated according to the difference between the prediction and the true label, and training of the model is finished once the network has stabilized;
Step 4, classifying with the model: for a video to be classified, the subject's electroencephalogram signals are collected while watching the video; video and electroencephalogram features are extracted and a fusion vector is generated with the attention-based multi-modal information fusion method; the fusion vector is input into the trained neural network, whose output vector gives the probability of each emotion category, and the category with the highest probability is selected as the video emotion classification result.
Further, the construction of the stimulus source-electroencephalogram data set is specifically as follows:
videos are collected; the subject wears a 62-channel electroencephalogram scanner and, once the signal is stable, watches the stimulus videos while the subject's electroencephalogram signals are collected; the collected electroencephalogram data are cleaned, and the video labels, the content and the subject's electroencephalogram signals are stored in a database, constructing the stimulus source-electroencephalogram signal data set.
Further, video clips are collected from the Internet, comprising equal numbers of videos with positive, negative and neutral emotion, each clip lasting 3-5 minutes.
Further, in step 2, the time-series feature vectors are obtained as follows:
the stimulus source-electroencephalogram signal data set is divided into a training set and a test set; for the training-set video data, video frames are extracted at 1 s intervals, the features of each frame are extracted with a ResNet network, and the features are concatenated by time step to obtain a time-series feature vector; for the electroencephalogram signals, features are extracted by wavelet transform to obtain a time-series feature vector.
In step 2, the multi-modal feature vectors are organically combined by the attention-based multi-modal information fusion method to generate a time-series fusion vector; the fusion model adopts an encoder-decoder structure based on an RNN (recurrent neural network): at each time step, the encoder performs a weighted fusion of the multi-modal time-series feature data into an intermediate semantic representation, this representation is fed into the decoder network, the fusion feature of the current time step is obtained by decoding, and the decoder's hidden state is used to update the fusion weights of the features, thereby implementing the attention mechanism; repeating this for all time steps finally yields the time-series fusion vector.
Further, in step 3, training the fusion vector classification model is specifically:
the time-series fusion vector generated in the previous step is fed into the fully connected neural network layer for prediction, and the result is normalized with a Softmax function; the Softmax-based cross entropy is used as the loss function (the lower the cross entropy, the closer the prediction is to the label), the network weights are updated by stochastic gradient descent, and training of the model is finished once the network has stabilized.
Further, in step 4, classification with the model is specifically:
for a video to be classified, the subject wears a 62-channel electroencephalogram scanner and, once the signal is stable, watches the video while the subject's electroencephalogram signals are collected; after acquisition, the data are cleaned and the electroencephalogram signals and video information are fed into the multi-modal feature fusion model to generate a time-series fusion vector; the fusion vector is input into the classification model for classification, the output is converted into probabilities of the emotion categories by the Softmax function, and the category with the highest probability is selected as the video emotion classification result.
Further, a video emotion classification system fusing electroencephalogram and stimulus source information comprises:
a stimulus source-electroencephalogram data set construction module, configured to construct a stimulus source-electroencephalogram data set from the video labels, the video content and the subject's electroencephalogram signals;
a multi-modal feature fusion model construction module, configured to extract video features and electroencephalogram signal features from the data set, represent the features of the two modalities as time-series feature vectors, and generate a fusion vector from the multi-modal time-series feature vectors with the attention-based multi-modal information fusion method;
a fusion vector classification model training module, configured to feed the generated fusion vector into a fully connected neural network layer for prediction, update the network weights according to the difference between the prediction and the true label, and finish training of the model once the network has stabilized;
a model classification module, configured to collect the subject's electroencephalogram signals while the video to be classified is watched, extract video and electroencephalogram features, generate a fusion vector with the attention-based multi-modal information fusion method, input the fusion vector into the trained neural network, whose output vector gives the probability of each emotion category, and select the category with the highest probability as the video emotion classification result.
Compared with the prior art, the invention has the following technical effects:
The invention considers the stimulus source together with the electroencephalogram signal recorded while the user watches the video. The content of the stimulus source reflects the intrinsic information it conveys, which is independent of the individual viewer, while the user's electroencephalogram signal reflects the physiological change produced after the user receives that information; organically fusing the two modalities improves the generalization ability of the model and the accuracy of emotion classification.
The invention uses an attention-based multi-modal information fusion method to organically combine video information with electroencephalogram signals. The fusion model adopts an encoder-decoder structure based on an RNN (recurrent neural network), and an attention mechanism is introduced at the encoder stage to simulate the human cognitive process, giving the neural network the ability to concentrate on a subset of the input, suppressing useless information and improving the efficiency of the model.
Drawings
Fig. 1 is a flowchart of a video emotion classification method integrating electroencephalogram and stimulus information.
FIG. 2 is a diagram of a video emotion classification model of the present invention.
FIG. 3 is a diagram of a multimodal feature fusion model of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments are provided to better explain the present invention, and all techniques implemented based on the present disclosure fall within the scope of the present invention.
Referring to fig. 1 to 2, a video emotion classification method integrating electroencephalogram and stimulus information includes the following steps:
(1) Constructing a stimulus-electroencephalogram data set:
collecting video clips with emotions positive, negative and neutral from the Internet;
allowing a subject to watch the video clip, and acquiring an electroencephalogram signal of the subject when watching the video by using an electroencephalogram scanner;
and constructing a stimulus-electroencephalogram data set according to the video tag, the content and the electroencephalogram of the subject.
(2) Constructing a multi-mode feature fusion model:
for a training data set, extracting video features and electroencephalogram features, and respectively representing the features of two modes as time sequence feature vectors;
for the multi-mode time sequence feature vector, generating a fusion vector by adopting a multi-mode information fusion method based on an attention mechanism.
(3) Training a fusion vector classification model:
the fusion vector generated in the last step is used as the input of the neural network full-connection layer to predict;
and updating the weight of the neural network according to the difference between the prediction result and the real label, training the neural network, and finishing training of the model after the network is stabilized.
(4) Classification using models
For a video to be classified, the subject wears an electroencephalogram scanner and the subject's electroencephalogram signals are collected while watching the video;
video features and electroencephalogram features are extracted, and a fusion vector is generated with the attention-based multi-modal information fusion method;
the fusion vector is input into the trained neural network, whose output vector gives the probability of each emotion category, and the category with the highest probability is selected as the video emotion classification result.
The construction of the stimulus-electroencephalogram data set is specifically as follows:
video clips from the Internet are collected, wherein the video clips comprise videos with positive emotion, negative emotion and neutral emotion, the number of the video clips is the same, and the duration of each video clip is 3-5 minutes. And (3) wearing a 62-channel electroencephalogram scanner for the subject, and after the signals are stable, enabling the subject to watch the stimulus source video and collecting the electroencephalogram signals of the subject. And cleaning the acquired electroencephalogram data, storing the video tag, the content and the electroencephalogram signals of the subject into a database, and constructing a stimulus source-electroencephalogram signal data set.
The construction of the multi-modal feature fusion model is specifically as follows:
The stimulus source-electroencephalogram signal data set is divided into a training set and a test set; for the training-set video data, video frames are extracted at a fixed time interval, the features of each frame are extracted with a ResNet network, and the features are concatenated by time step to obtain a time-series feature vector. For the electroencephalogram signals, features are extracted by wavelet transform to obtain a time-series feature vector.
The multi-modal feature vectors are organically combined by the attention-based multi-modal information fusion method to generate a time-series fusion vector. Because the emotion of a video is not distributed uniformly over the whole video, the emotion people feel while watching often fluctuates; giving every moment the same weight when fusing the time-series features therefore degrades the final classification. The attention mechanism simulates the human cognitive process, letting the neural network focus on local content and redistributing the originally uniform weights according to the importance of the content.
The fusion model adopts an encoder-decoder structure based on an RNN (recurrent neural network). At each time step, the encoder performs a weighted fusion of the multi-modal time-series feature data into an intermediate semantic representation; this representation is fed into the decoder network, the fusion feature of the current time step is obtained by decoding, and the decoder's hidden state is used to update the fusion weights of the features, thereby implementing the attention mechanism. Repeating this for all time steps finally yields the time-series fusion vector.
Training the fusion vector classification model is specifically as follows:
The time-series fusion vector generated in the previous step is fed into the fully connected neural network layer for prediction, and the result is normalized with a Softmax function.
The Softmax-based cross entropy is used as the loss function (the lower the cross entropy, the closer the prediction is to the label); the network weights are updated by stochastic gradient descent, and training of the model is finished once the network has stabilized.
Classification with the model is specifically as follows:
For a video to be classified, the subject wears a 62-channel electroencephalogram scanner and, after the signal has stabilized, watches the video while the subject's electroencephalogram signals are collected.
After acquisition, the data are cleaned, and the electroencephalogram signals and video information are fed into the multi-modal feature fusion model to generate a time-series fusion vector.
The fusion vector is input into the classification model for classification, the output is converted into probabilities of the emotion categories by the Softmax function, and the category with the highest probability is selected as the video emotion classification result.
Examples:
A video of a global epidemic event on Twitter is taken as an embodiment to describe the video emotion classification process that fuses electroencephalogram and stimulus source information.
(1) Construction of stimulus-brain electrical signal data set
Video clips are collected from the Internet, comprising equal numbers of videos with positive, negative and neutral emotion. The positive videos are episodes from comedy films, the negative videos are episodes from tragedy films, and the neutral videos are documentary episodes. Each video lasts 3-5 minutes. The subject wears a 62-channel electroencephalogram scanner and, after the signal has stabilized, watches the stimulus source videos; a nearby worker is responsible for playing the videos and recording the electroencephalogram signals. The videos are played in random order and watched consecutively by the same subject, with a 15 s interval between videos during which the subject rests so that the emotional state returns to baseline. The collected electroencephalogram data are cleaned, and the video labels, the content and the subject's electroencephalogram signals are stored in a database, constructing the stimulus source-electroencephalogram signal data set.
(2) Constructing a multi-modal feature fusion model
For the training data set, video features and electroencephalogram features are extracted and the features of the two modalities are represented as time-series feature vectors:
for the electroencephalogram signal x, performing feature extraction by utilizing a wavelet transformation mode to obtain a time sequence feature vector x * For x * Averagely dividing the vector into L sections, wherein each section is a time step, the corresponding time length is T, and the feature vector section of the ith time step is recorded as u x,i X is then * ={u x,1 ,u x,2 ,…, x,L };
For a video stimulus source s, L frames are extracted from the video at time interval T, and the features of each frame are extracted with a ResNet network pre-trained on the ImageNet data set as the features of the stimulus content at that time step; the feature vector of the i-th frame is denoted u_{s,i}, and the time-series feature vector of the stimulus source s is s^* = {u_{s,1}, u_{s,2}, ..., u_{s,L}}.
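A minimal Python sketch of how the time-series feature vectors s^* and x^* described above could be assembled is given below, assuming one video frame per time step, a ResNet-50 backbone pre-trained on ImageNet, and wavelet sub-band energies as the electroencephalogram features; the backbone depth, the db4 wavelet, the decomposition level and the helper names are illustrative assumptions rather than details disclosed by the patent.

import numpy as np
import pywt
import torch
import torchvision.models as models
import torchvision.transforms as T

resnet = models.resnet50(pretrained=True)   # pre-trained on ImageNet (assumed ResNet-50)
resnet.fc = torch.nn.Identity()             # keep the 2048-d pooled features u_{s,i}
resnet.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def video_timeseries(frames):
    # frames: list of L HxWx3 uint8 arrays sampled at interval T
    with torch.no_grad():
        feats = [resnet(preprocess(f).unsqueeze(0)).squeeze(0) for f in frames]
    return torch.stack(feats)               # s* with shape (L, 2048)

def eeg_timeseries(eeg, n_steps):
    # eeg: (channels, samples) array; returns x* with one row of wavelet energies per time step
    rows = []
    for seg in np.array_split(eeg, n_steps, axis=1):
        coeffs = [pywt.wavedec(ch, "db4", level=4) for ch in seg]
        rows.append([float(np.sum(c ** 2)) for ch in coeffs for c in ch])
    return torch.tensor(rows, dtype=torch.float32)   # shape (L, channels * 5)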
For input modality k, the feature vector at time step i is u_{k,i}. The attention-based multi-modal information fusion model performs a weighted fusion of the feature vectors u_{k,i} to generate a new fusion vector g_i, where W_h and b_h are the weight matrix and bias vector of the decoder network, respectively; the current input u_{k,i} and the decoder hidden state h_{i-1} are used to determine the importance of each element of the input. d_{k,i} is computed from u_{k,i} as follows:
d_{k,i} = W_{uk} u_{k,i} + b_{uk}    (2)
The multi-modal attention weight α_{k,i} is obtained by normalizing the attention scores over the modalities. Here v_{k,i} is the attention score, computed at time i from the current decoder state h_{i-1}, of the k-th modality feature vector u_{k,i}; it measures the importance of the vector u_{k,i} to the prediction result. W_B and V_{Bk} are weight matrices, w_B is a weight vector, and b_{Bk} is a bias vector.
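Because equations (1), (3) and (4) appear only as images in the original publication, the PyTorch sketch below shows one plausible reading of the mechanism rather than the disclosed formulas: per-modality projections d_{k,i} as in equation (2), additive attention scores v_{k,i} driven by the decoder hidden state h_{i-1}, weights α_{k,i} obtained by a softmax over the modalities, a weighted fusion g_i, and a GRU decoder cell whose state updates the weights at the next step. All dimensions, the additive scoring form and the choice of GRU cell are assumptions.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, modality_dims, proj_dim=256, hidden_dim=256):
        super().__init__()
        # d_{k,i} = W_{uk} u_{k,i} + b_{uk} (equation (2)), one projection per modality
        self.proj = nn.ModuleList([nn.Linear(d, proj_dim) for d in modality_dims])
        # assumed additive score v_{k,i} = w_B^T tanh(W_B h_{i-1} + V_{Bk} d_{k,i} + b_{Bk})
        self.W_B = nn.Linear(hidden_dim, proj_dim, bias=False)
        self.V_B = nn.ModuleList([nn.Linear(proj_dim, proj_dim) for _ in modality_dims])
        self.w_B = nn.Linear(proj_dim, 1, bias=False)
        self.decoder = nn.GRUCell(proj_dim, hidden_dim)   # decoder whose hidden state drives the weights

    def forward(self, features):
        # features: list over modalities of tensors shaped (batch, L, dim_k)
        batch, L = features[0].shape[0], features[0].shape[1]
        h = features[0].new_zeros(batch, self.decoder.hidden_size)
        fused = []
        for i in range(L):
            d = [p(f[:, i]) for p, f in zip(self.proj, features)]                 # d_{k,i}
            scores = torch.cat([self.w_B(torch.tanh(self.W_B(h) + V(dk)))
                                for V, dk in zip(self.V_B, d)], dim=1)            # v_{k,i}
            alpha = torch.softmax(scores, dim=1)                                  # alpha_{k,i}
            g = sum(alpha[:, k:k + 1] * d[k] for k in range(len(d)))              # fusion vector g_i
            h = self.decoder(g, h)                                                # decoder state update
            fused.append(g)
        return torch.stack(fused, dim=1)      # time-series fusion vector, shape (batch, L, proj_dim)

For the two modalities of this embodiment, such a model could be instantiated, for example, as AttentionFusion([2048, 310]) when the video features are 2048-dimensional ResNet outputs and the electroencephalogram features are the 62 x 5 wavelet energies of the sketch above.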
(3) Training fusion vector classification model
The fusion result G = [g_1 g_2 ... g_L]^T is fed into the fully connected layer of the neural network for prediction, the result is normalized with the Softmax function, and a probability vector o is output:
o = softmax(z) = softmax(W^T G)    (5)
where W is the weight matrix of the fully connected layer and softmax(z)_j = exp(z_j) / Σ_{j'} exp(z_{j'}) normalizes z into a probability distribution over the emotion categories.
The Softmax-based cross entropy is used as the loss function, defined as
L = -(1/N) Σ_{i=1}^{N} log o_{y_i}
where N is the number of samples, o_j is the j-th component of the output vector o, and y_i is the position of the i-th sample's true label in the output vector o. The model is trained by minimizing this loss function to update the network weights.
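The classification head and training loop implied by equation (5) and the loss above could be sketched as follows; the layer sizes, learning rate, epoch count and the fusion_model and train_loader objects (the AttentionFusion sketch above and an assumed DataLoader over the training set) are illustrative assumptions.

import torch
import torch.nn as nn

L_steps, proj_dim, n_classes = 200, 256, 3                 # assumed time steps, fusion width, emotion classes
classifier = nn.Linear(L_steps * proj_dim, n_classes)      # plays the role of W in o = softmax(W^T G)
criterion = nn.CrossEntropyLoss()                          # Softmax-based cross entropy, -(1/N) sum log o_{y_i}
optimizer = torch.optim.SGD(list(fusion_model.parameters()) +
                            list(classifier.parameters()), lr=0.01)

for epoch in range(10):
    for video_feats, eeg_feats, labels in train_loader:    # assumed DataLoader over the training set
        G = fusion_model([video_feats, eeg_feats])          # (batch, L, proj_dim) time-series fusion vector
        logits = classifier(G.flatten(start_dim=1))          # z = W^T G
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                     # stochastic gradient descent update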
(4) Classification using models
Videos related to the global epidemic situation are collected from Twitter as the videos to be classified and downloaded locally. For a video to be classified, the subject wears an electroencephalogram scanner, the channel positions are adjusted until the signals are stable, and the subject watches the video while the subject's electroencephalogram signals are collected.
Video features and electroencephalogram features are extracted according to the method of step (2), and a fusion vector is generated with the attention-based multi-modal feature fusion method. The vector is input into the trained neural network, whose output vector gives the probability of each emotion category, and the category with the highest probability is selected as the video emotion classification result.
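Putting the previous sketches together, inference on a new video could look like the snippet below; every helper (video_timeseries, eeg_timeseries, fusion_model, classifier) and the label order refer to the hypothetical code above, not to an implementation disclosed by the patent.

import torch

emotions = ["positive", "negative", "neutral"]              # assumed label order

# frames: list of sampled frames of the video to be classified
# eeg: the cleaned 62-channel recording captured while the subject watched it
with torch.no_grad():
    v = video_timeseries(frames).unsqueeze(0)                # (1, L, 2048) stimulus source features
    e = eeg_timeseries(eeg, n_steps=v.shape[1]).unsqueeze(0)  # (1, L, eeg_dim) electroencephalogram features
    G = fusion_model([v, e])                                  # attention-based multi-modal fusion
    probs = torch.softmax(classifier(G.flatten(start_dim=1)), dim=1)
    print("predicted emotion:", emotions[int(probs.argmax(dim=1))])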

Claims (6)

1. A video emotion classification method fusing electroencephalogram and stimulus source information, characterized by comprising the following steps:
step 1, constructing a stimulus source-electroencephalogram signal data set: a subject watches video clips while an electroencephalogram scanner collects the subject's electroencephalogram signals; a stimulus source-electroencephalogram signal data set is constructed from the video labels, the video content and the subject's electroencephalogram signals;
step 2, constructing a multi-modal feature fusion model: for the data set, video features and electroencephalogram signal features are extracted and the features of the two modalities are represented as time-series feature vectors; for the multi-modal time-series feature vectors, a fusion vector is generated with an attention-based multi-modal information fusion method;
step 3, training a fusion vector classification model: the generated fusion vector is fed into a fully connected neural network layer for prediction; the network weights are updated according to the difference between the prediction and the true label, and training of the model is finished once the network has stabilized;
step 4, classifying with the model: for a video to be classified, the subject's electroencephalogram signals are collected while watching the video; video and electroencephalogram features are extracted and a fusion vector is generated with the attention-based multi-modal information fusion method; the fusion vector is input into the trained neural network, whose output vector gives the probability of each emotion category, and the category with the highest probability is selected as the video emotion classification result;
in step 2, the time-series feature vectors are obtained as follows:
the stimulus source-electroencephalogram signal data set is divided into a training set and a test set; for the training-set video data, video frames are extracted at 1 s intervals, the features of each frame are extracted with a ResNet network, and the features are concatenated by time step to obtain a time-series feature vector; for the electroencephalogram signals, features are extracted by wavelet transform to obtain a time-series feature vector;
in step 2, the multi-modal feature vectors are organically combined by the attention-based multi-modal information fusion method to generate a time-series fusion vector; the fusion model adopts an encoder-decoder structure based on an RNN (recurrent neural network): at each time step, the encoder performs a weighted fusion of the multi-modal time-series feature data into an intermediate semantic representation, this representation is fed into the decoder network, the fusion feature of the current time step is obtained by decoding, and the decoder's hidden state is used to update the fusion weights of the features, thereby implementing the attention mechanism; repeating this for all time steps finally yields the time-series fusion vector.
2. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 1, characterized in that the construction of the stimulus source-electroencephalogram data set is specifically as follows:
videos are collected; the subject wears a 62-channel electroencephalogram scanner and, once the signal is stable, watches the stimulus videos while the subject's electroencephalogram signals are collected; the collected electroencephalogram data are cleaned, and the video labels, the content and the subject's electroencephalogram signals are stored in a database, constructing the stimulus source-electroencephalogram signal data set.
3. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 2, characterized in that video clips are collected from the Internet, comprising equal numbers of videos with positive, negative and neutral emotion, each clip lasting 3-5 minutes.
4. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 1, characterized in that in step 3, training the fusion vector classification model is specifically:
the time-series fusion vector generated in the previous step is fed into the fully connected neural network layer for prediction, and the result is normalized with a Softmax function; the Softmax-based cross entropy is used as the loss function (the lower the cross entropy, the closer the prediction is to the label), the network weights are updated by stochastic gradient descent, and training of the model is finished once the network has stabilized.
5. The video emotion classification method fusing electroencephalogram and stimulus source information according to claim 1, characterized in that in step 4, classification with the model is specifically:
for a video to be classified, the subject wears a 62-channel electroencephalogram scanner and, once the signal is stable, watches the video while the subject's electroencephalogram signals are collected; after acquisition, the data are cleaned and the electroencephalogram signals and video information are fed into the multi-modal feature fusion model to generate a time-series fusion vector; the fusion vector is input into the classification model for classification, the output is converted into probabilities of the emotion categories by the Softmax function, and the category with the highest probability is selected as the video emotion classification result.
6. A video emotion classification system fusing electroencephalogram and stimulus source information, characterized in that it is based on the video emotion classification method fusing electroencephalogram and stimulus source information according to any one of claims 1 to 5 and comprises:
a stimulus source-electroencephalogram data set construction module, configured to construct a stimulus source-electroencephalogram data set from the video labels, the video content and the subject's electroencephalogram signals;
a multi-modal feature fusion model construction module, configured to extract video features and electroencephalogram signal features from the data set, represent the features of the two modalities as time-series feature vectors, and generate a fusion vector from the multi-modal time-series feature vectors with the attention-based multi-modal information fusion method;
a fusion vector classification model training module, configured to feed the generated fusion vector into a fully connected neural network layer for prediction, update the network weights according to the difference between the prediction and the true label, and finish training of the model once the network has stabilized;
a model classification module, configured to collect the subject's electroencephalogram signals while the video to be classified is watched, extract video and electroencephalogram features, generate a fusion vector with the attention-based multi-modal information fusion method, input the fusion vector into the trained neural network, whose output vector gives the probability of each emotion category, and select the category with the highest probability as the video emotion classification result;
the time-series feature vectors are obtained as follows:
the stimulus source-electroencephalogram signal data set is divided into a training set and a test set; for the training-set video data, video frames are extracted at 1 s intervals, the features of each frame are extracted with a ResNet network, and the features are concatenated by time step to obtain a time-series feature vector; for the electroencephalogram signals, features are extracted by wavelet transform to obtain a time-series feature vector;
the multi-modal feature vectors are organically combined by the attention-based multi-modal information fusion method to generate a time-series fusion vector; the fusion model adopts an encoder-decoder structure based on an RNN (recurrent neural network): at each time step, the encoder performs a weighted fusion of the multi-modal time-series feature data into an intermediate semantic representation, this representation is fed into the decoder network, the fusion feature of the current time step is obtained by decoding, and the decoder's hidden state is used to update the fusion weights of the features, thereby implementing the attention mechanism; repeating this for all time steps finally yields the time-series fusion vector.
CN202110442820.6A 2021-04-23 2021-04-23 Video emotion classification method and system integrating electroencephalogram and stimulus information Active CN113095428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110442820.6A CN113095428B (en) 2021-04-23 2021-04-23 Video emotion classification method and system integrating electroencephalogram and stimulus information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110442820.6A CN113095428B (en) 2021-04-23 2021-04-23 Video emotion classification method and system integrating electroencephalogram and stimulus information

Publications (2)

Publication Number Publication Date
CN113095428A CN113095428A (en) 2021-07-09
CN113095428B true CN113095428B (en) 2023-09-19

Family

ID=76679737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110442820.6A Active CN113095428B (en) 2021-04-23 2021-04-23 Video emotion classification method and system integrating electroencephalogram and stimulus information

Country Status (1)

Country Link
CN (1) CN113095428B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779934B (en) * 2021-08-13 2024-04-26 远光软件股份有限公司 Multi-mode information extraction method, device, equipment and computer readable storage medium
CN113627447B (en) * 2021-10-13 2022-02-08 腾讯科技(深圳)有限公司 Label identification method, label identification device, computer equipment, storage medium and program product
CN113974658B (en) * 2021-10-28 2024-01-26 天津大学 Semantic visual image classification method and device based on EEG time-sharing frequency spectrum Riemann
CN113988201B (en) * 2021-11-03 2024-04-26 哈尔滨工程大学 Multi-mode emotion classification method based on neural network
CN114366107A (en) * 2022-02-23 2022-04-19 天津理工大学 Cross-media data emotion recognition method based on facial expressions and electroencephalogram signals
CN114722950B (en) * 2022-04-14 2023-11-07 武汉大学 Multi-mode multi-variable time sequence automatic classification method and device
CN115381467B (en) * 2022-10-31 2023-03-10 浙江浙大西投脑机智能科技有限公司 Attention mechanism-based time-frequency information dynamic fusion decoding method and device
CN116049743B (en) * 2022-12-14 2023-10-31 深圳市仰和技术有限公司 Cognitive recognition method based on multi-modal data, computer equipment and storage medium
CN115985505B (en) * 2023-01-19 2023-12-12 北京未磁科技有限公司 Multidimensional fusion myocardial ischemia auxiliary diagnosis model and construction method thereof
CN115778330A (en) * 2023-02-07 2023-03-14 之江实验室 Automatic epileptic seizure detection system and device based on video electroencephalogram
CN116304643B (en) * 2023-05-18 2023-08-11 中国第一汽车股份有限公司 Mental load detection and model training method, device, equipment and storage medium
JP7428337B1 (en) 2023-06-30 2024-02-06 Vie株式会社 Information processing method, information processing device, and program
CN117615210A (en) * 2023-11-21 2024-02-27 南湖脑机交叉研究院 User experience quality determining method and device
CN117426774B (en) * 2023-12-21 2024-04-09 深圳腾信百纳科技有限公司 User emotion assessment method and system based on intelligent bracelet

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN109815903A (en) * 2019-01-24 2019-05-28 同济大学 A kind of video feeling classification method based on adaptive converged network
CN110515456A (en) * 2019-08-14 2019-11-29 东南大学 EEG signals emotion method of discrimination and device based on attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334689B (en) * 2019-07-16 2022-02-15 北京百度网讯科技有限公司 Video classification method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN109815903A (en) * 2019-01-24 2019-05-28 同济大学 A kind of video feeling classification method based on adaptive converged network
CN110515456A (en) * 2019-08-14 2019-11-29 东南大学 EEG signals emotion method of discrimination and device based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Youjun; Huang Jiajin; Wang Haiyuan; Zhong Ning. Multimodal physiological signal fusion and emotion recognition based on SAE and LSTM RNN. Journal on Communications, 2017, (12), full text. *

Also Published As

Publication number Publication date
CN113095428A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
CN113095428B (en) Video emotion classification method and system integrating electroencephalogram and stimulus information
Kossaifi et al. Sewa db: A rich database for audio-visual emotion and sentiment research in the wild
CN111209440B (en) Video playing method, device and storage medium
US20170289619A1 (en) Method for positioning video, terminal apparatus and cloud server
CN106255866B (en) Communication system, control method and storage medium
CN112954312B (en) Non-reference video quality assessment method integrating space-time characteristics
CN111274440B (en) Video recommendation method based on visual and audio content relevancy mining
US20180302686A1 (en) Personalizing closed captions for video content
CN102244788B (en) Information processing method, information processor and loss recovery information generation device
CN109815903A (en) A kind of video feeling classification method based on adaptive converged network
CN110019961A (en) Method for processing video frequency and device, for the device of video processing
JP2011215964A (en) Server apparatus, client apparatus, content recommendation method and program
CN113642604B (en) Audio-video auxiliary touch signal reconstruction method based on cloud edge cooperation
CN116484318B (en) Lecture training feedback method, lecture training feedback device and storage medium
CN110929158A (en) Content recommendation method, system, storage medium and terminal equipment
CN111432282B (en) Video recommendation method and device
CN113395578A (en) Method, device and equipment for extracting video theme text and storage medium
WO2021066530A1 (en) Co-informatic generative adversarial networks for efficient data co-clustering
CN113239159A (en) Cross-modal retrieval method of videos and texts based on relational inference network
CN104866490B (en) A kind of video intelligent recommended method and its system
CN110309753A (en) A kind of race process method of discrimination, device and computer equipment
CN112073757B (en) Emotion fluctuation index acquisition method, emotion fluctuation index display method and multimedia content production method
CN113407778A (en) Label identification method and device
Wu et al. Cold start problem for automated live video comments
CN111160124A (en) Depth model customization method based on knowledge reorganization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant