CN113705703A - Image-text multi-modal emotion recognition method based on BiLSTM and attention mechanism - Google Patents


Info

Publication number: CN113705703A
Application number: CN202111021378.6A
Authority: CN (China)
Prior art keywords: text, vector, picture, data, training
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 金勇, 胡林利, 陈宏明
Current/Original Assignee: WUHAN YANGTZE COMMUNICATIONS INDUSTRY GROUP CO LTD
Application filed by WUHAN YANGTZE COMMUNICATIONS INDUSTRY GROUP CO LTD
Priority to CN202111021378.6A
Publication of CN113705703A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an image-text multi-modal emotion recognition method based on BiLSTM and an attention mechanism, comprising the following steps: collecting text data and picture data; vector preprocessing, in which the text and the picture are given independent vector expressions; joint training of the text vector and the picture vector through an attention mechanism and a GRU model; and combining the text and picture vectors to identify the final comprehensive result through a softmax function. The method preprocesses the text and the picture with WORD2VEC and CNN respectively to obtain preliminary vector expressions, performs cross training with BiLSTM, GRU and the attention mechanism, and fuses the results into a softmax layer for final supervised label recognition. In experiments, model training and analysis were carried out on more than 19,000 samples (each comprising both text and a picture), and the results show that machine learning that fuses pictures and text performs better.

Description

Image-text multi-modal emotion recognition method based on BiLSTM and attention mechanism
Technical Field
The invention relates to the technical field of machine learning and multi-modal emotion analysis, in particular to an image-text multi-modal emotion recognition method based on BiLSTM and an attention mechanism.
Background
Social network platforms are developing rapidly today, and the ways people express themselves on them grow ever richer, spanning video, text, pictures, audio and other forms; in particular, many people voice their opinions and moods through a combination of pictures and text. How to analyze the emotion carried by such multi-modal data is an important topic in the current machine learning field.
Compared with single-modal data, multiple modalities contain more effective information, and that information is mutually complementary. For example, consider a WeChat Moments post that reads "the weather today" and carries a laughing picture beneath it: the emotional stance cannot be identified from the text alone, yet with the picture the post can basically be judged to express a negative emotion. Different modalities therefore complement one another in emotional expression, and analysis that combines them recognizes emotion better. Moreover, current emotion analysis methods focus mainly on a single modality; text sentiment analysis, for instance, recognizes and analyzes the emotion in written words and rarely involves pictures, audio or video. The present invention is therefore directed at image-text emotion recognition analysis for multi-modal data comprising pictures and text.
Disclosure of Invention
Based on the technical problems in the background art, the invention provides an image-text multi-modal emotion recognition method based on BiLSTM and an attention mechanism. It performs emotion recognition analysis with a deep network model combining BiLSTM, GRU and the attention mechanism, and addresses the problem that single-modal data yields little effective information and thereby limits emotion recognition accuracy.
The invention provides the following technical scheme. The image-text multi-modal emotion recognition method based on BiLSTM and the attention mechanism comprises the following steps:
S1, data collection: collecting text data and picture data over a period of time;
S2, vector preprocessing: the text and the picture are given independent vector expressions, and the text vector and the picture vector undergo feature tuning training through a bidirectional LSTM (BiLSTM) model;
S3, the text vector and the picture vector are each trained through the combination of an attention model and a GRU model, and the results of the text training and the picture training cross-influence each other to respectively obtain the implicit expression vectors of the text and the picture;
S4, the vectors of the text and the picture are combined, and the final comprehensive result is identified through a softmax function.
Preferably, the data crawled in step S1 undergoes data cleaning, data integration and manual labeling to form a training set, a verification set and a test set, and the annotation labels are: positive, negative, neutral.
Preferably, the data sets in step S1 are split so that the training set, the verification set and the test set are in the proportion 7:2:1, and different models are obtained by training with different parameter configurations.
Preferably, the vector expression of the text in step S2 is an embedding vector formed from the word vectors W1, W2, …, Wk; the vector expression of the picture is a vector obtained by preprocessing the pixel representation of the picture with a convolutional neural network.
The invention thus provides an image-text multi-modal emotion recognition method based on BiLSTM and an attention mechanism: the text and the picture are preprocessed with WORD2VEC and CNN respectively to obtain preliminary vector expressions, cross training is then performed with BiLSTM, GRU and the attention mechanism, and the results are fused into a softmax layer for final supervised label recognition. In experiments, model training and analysis were carried out on more than 19,000 samples (each comprising text and a picture), and the results show that machine learning that fuses pictures and text performs better.
Drawings
FIG. 1 is a schematic diagram of the image-text multi-modal processing scheme of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of those embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
As shown in FIG. 1, the goal is to analyze data from multiple modalities and realize information complementation between them. The project adopts the interactive deep-learning framework of FIG. 1 for image-text multi-modal emotion recognition: the model lets the picture and the text either express emotion independently (standalone vector expression) or supplement each other's expression (vector interaction); the resulting vectors are combined into the comprehensive image-text feature, and the comprehensive emotion result is finally recognized through a softmax function.
The invention provides the following technical scheme. The image-text multi-modal emotion recognition method based on BiLSTM and the attention mechanism comprises the following steps:
S1, data collection: collecting text data and picture data over a period of time. The crawled data undergoes data cleaning, data integration and manual labeling to form a training set, a verification set and a test set; the annotation labels are: positive, negative, neutral.
The data sets for model training are split so that the training set, the verification set and the test set are in the proportion 7:2:1, and different models are obtained by training with different parameter configurations; a minimal sketch of such a split follows.
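The patent does not name a data-handling library; the sketch below assumes Python with pandas and scikit-learn, and a DataFrame whose `text`, `image_path` and `label` columns are illustrative names, not part of the original disclosure.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_dataset(df: pd.DataFrame, seed: int = 42):
    # Carve out the 70% training set, stratified on the
    # positive/negative/neutral labels to keep class balance.
    train, rest = train_test_split(
        df, train_size=0.7, stratify=df["label"], random_state=seed)
    # Split the remaining 30% into validation (2/3 of it, i.e. 20% overall)
    # and test (1/3 of it, i.e. 10% overall), matching the 7:2:1 ratio.
    val, test = train_test_split(
        rest, train_size=2 / 3, stratify=rest["label"], random_state=seed)
    return train, val, test
```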
S2, vector preprocessing: the text and the picture are given independent vector expressions, and the text vector and the picture vector undergo feature tuning training through a bidirectional LSTM (BiLSTM) model.
The vector expression of the text is an embedding vector formed from the word vectors W1, W2, …, Wk; the vector expression of the picture is a vector obtained by preprocessing the pixel representation of the picture with a convolutional neural network. One way this step could look in code is sketched below.
S3, the text vector and the picture vector are each trained through the combined attention (ATT) and GRU models, and the results of the text training and the picture training cross-influence each other, respectively yielding the implicit expression vectors of the text and the picture, namely the hidden text vector and the hidden image vector.
As shown in FIG. 1, for the picture branch on the right, the result the text on the left obtains through ATT+GRU is blended, at a lower weight, into the picture's own ATT+GRU result before being input into the next ATT+GRU layer; the text branch on the left is treated symmetrically, with the picture's result merged at a lower weight into the next ATT+GRU layer of the text analysis. A hedged sketch of one such cross-influenced layer follows.
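A minimal PyTorch sketch of one cross-influenced ATT+GRU layer; the blending weight `alpha` (the "lower weight" above) and all dimensions are illustrative assumptions, and the additive attention scorer is one of several plausible readings of the ATT block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttGRU(nn.Module):
    def __init__(self, dim=128, alpha=0.2):
        super().__init__()
        self.alpha = alpha                   # low cross-modal weight
        self.att_text = nn.Linear(dim, 1)    # attention scorers
        self.att_img = nn.Linear(dim, 1)
        self.gru_text = nn.GRU(dim, dim, batch_first=True)
        self.gru_img = nn.GRU(dim, dim, batch_first=True)

    @staticmethod
    def _attend(scorer, seq):
        # Soft attention pooling over the sequence dimension.
        w = F.softmax(scorer(seq), dim=1)    # (B, T, 1)
        return (w * seq).sum(dim=1)          # (B, dim)

    def forward(self, text_seq, img_seq):
        t = self._attend(self.att_text, text_seq)
        v = self._attend(self.att_img, img_seq)
        # Each modality's summary is blended into the other at weight
        # alpha before the GRU update, giving the hidden vectors.
        t_in = ((1 - self.alpha) * t + self.alpha * v).unsqueeze(1)
        v_in = ((1 - self.alpha) * v + self.alpha * t).unsqueeze(1)
        _, h_text = self.gru_text(t_in)      # (1, B, dim)
        _, h_img = self.gru_img(v_in)
        return h_text.squeeze(0), h_img.squeeze(0)
```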
S4, the vectors of the text and the picture are combined, and the final comprehensive result is identified through a softmax function, as in the sketch below.
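A short sketch of step S4 under the same assumptions: the hidden text and image vectors are concatenated and classified over the three emotion labels.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, dim=128, num_classes=3):
        super().__init__()
        self.fc = nn.Linear(2 * dim, num_classes)

    def forward(self, h_text, h_img):
        fused = torch.cat([h_text, h_img], dim=-1)  # (B, 2*dim)
        # Softmax over positive/negative/neutral; during training the
        # raw logits would feed nn.CrossEntropyLoss instead.
        return torch.softmax(self.fc(fused), dim=-1)
```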
Example (b):
First, data crawling. Against the background of monitoring network sentiment on the novel coronavirus, a Python data acquisition script crawls different network sites such as news portals, microblogs, forums and WeChat official accounts; the main crawler keywords are: novel coronavirus, pneumonia, epidemic, coronavirus, and so on. The crawled data undergoes data cleaning, data integration and manual labeling (the labels being positive, negative and neutral) to form the training, verification and test sets; an illustrative cleaning pass is sketched below.
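The patent names only cleaning, integration and labeling; the concrete rules below (URL stripping, deduplication, label filtering) are assumptions for illustration.

```python
import pandas as pd

def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    # Integration: drop records crawled more than once.
    df = df.drop_duplicates(subset=["text", "image_path"])
    # Cleaning: strip URLs and collapse whitespace in the post text.
    df["text"] = (df["text"]
                  .str.replace(r"https?://\S+", "", regex=True)
                  .str.replace(r"\s+", " ", regex=True)
                  .str.strip())
    df = df[df["text"].str.len() > 0]  # drop posts left empty
    # Keep only records whose manual label is one of the three classes.
    return df[df["label"].isin(["positive", "negative", "neutral"])]
```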
Second, parameter configuration. The configurable parameters of the deep-learning framework include the dimensionality of the embeddings, the number of CNN convolution kernels, the number of LSTM units, the number of ATT and GRU layers, softmax regularization, and the network learning rate; one possible configuration is sketched below.
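Every value in this configuration dictionary is an illustrative assumption; only the parameter names come from the list above.

```python
config = {
    "embedding_dim": 128,   # dimensionality of embeddings
    "cnn_kernels": 32,      # number of CNN convolution kernels
    "lstm_units": 64,       # LSTM units per direction
    "att_gru_layers": 2,    # stacked ATT+GRU layers
    "softmax_l2": 1e-4,     # softmax-layer regularization strength
    "learning_rate": 1e-3,  # network learning rate
}
```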
Third, vector preprocessing. Word-vector training is performed on the text with WORD2VEC, and the pictures are processed with a CNN to obtain preliminary feature vectors; for the text side this could look as follows.
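A sketch of the word-vector step using gensim's Word2Vec (the patent names WORD2VEC but no specific library; the tokenised posts here are toy examples, e.g. from jieba segmentation).

```python
from gensim.models import Word2Vec

# Tokenised posts; real input would be the cleaned crawled corpus.
corpus = [["新冠", "疫情", "加油"], ["今天", "天气", "不错"]]
w2v = Word2Vec(corpus, vector_size=128, window=5, min_count=1, sg=1)
vector = w2v.wv["疫情"]  # 128-dimensional embedding for one token
```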
Fourth, model training. The data sets in the experiment are split so that the training set, the verification set and the test set are in the proportion 7:2:1, and different models are obtained by training with different parameter configurations.
Fifth, result evaluation. The training results of the different models are compared, and their accuracy and other performance indicators are analyzed comprehensively to arrive at a model design suitable for industrial application; per-model metrics could be computed as below.
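An illustrative evaluation on the held-out test set with scikit-learn metrics; the label arrays are toy placeholders.

```python
from sklearn.metrics import accuracy_score, classification_report

y_true = ["positive", "negative", "neutral", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]
print(accuracy_score(y_true, y_pred))          # overall accuracy
print(classification_report(y_true, y_pred))   # per-class P/R/F1
```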
The invention performs emotion recognition analysis on image-text multi-modal data: to avoid the limitations of a single modality, it fuses the complementary picture and text information found in network public opinion and develops an image-text multi-modal emotion recognition model framework.
The text and the picture are preprocessed with WORD2VEC and CNN respectively to obtain preliminary vector expressions, cross training is then performed with BiLSTM, GRU and the attention mechanism, and the results are fused into a softmax layer for final supervised label recognition. In experiments, model training and analysis were carried out on more than 19,000 samples (each comprising text and a picture), and the results show that machine learning that fuses pictures and text performs well.
By researching this image-text multi-modal emotion recognition method, the invention realizes integrated emotion recognition over the pictures and text of network public opinion and provides a technical basis for the automated cognitive recognition of social emotion.
The above is only a preferred embodiment of the present invention, but the protection scope of the invention is not limited thereto; any equivalent substitution or change of the technical solutions and inventive concept of the invention that a person skilled in the art could conceive within the technical scope disclosed herein shall fall within the protection scope of the present invention.

Claims (4)

1. The image-text multi-modal emotion recognition method based on BiLSTM and the attention mechanism is characterized by comprising the following steps:
S1, data collection: collecting text data and picture data over a period of time;
S2, vector preprocessing: the text and the picture are given independent vector expressions, and the text vector and the picture vector undergo feature tuning training through a bidirectional LSTM model;
S3, the text vector and the picture vector are each trained through the combination of an attention model and a GRU model, and the results of the text training and the picture training cross-influence each other to respectively obtain the implicit expression vectors of the text and the picture;
S4, the vectors of the text and the picture are combined, and the final comprehensive result is identified through a softmax function.
2. The image-text multi-modal emotion recognition method based on BiLSTM and the attention mechanism of claim 1, wherein: the data crawled in step S1 undergoes data cleaning, data integration and manual labeling to form a training set, a verification set and a test set, and the annotation labels are: positive, negative, neutral.
3. The image-text multi-modal emotion recognition method based on BiLSTM and the attention mechanism of claim 2, wherein: the data sets in step S1 are split so that the training set, the verification set and the test set are in the proportion 7:2:1, and different models are obtained by training with different parameter configurations.
4. The image-text multi-modal emotion recognition method based on BiLSTM and the attention mechanism of claim 1, wherein: the vector expression of the text in step S2 is an embedding vector formed from the word vectors W1, W2, …, Wk, and the vector expression of the picture is a vector obtained by preprocessing the pixel representation of the picture with a convolutional neural network.
CN202111021378.6A 2021-09-01 2021-09-01 Image-text multi-modal emotion recognition method based on BiLSTM and attention mechanism Pending CN113705703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111021378.6A CN113705703A (en) 2021-09-01 2021-09-01 Image-text multi-modal emotion recognition method based on BilSTM and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111021378.6A CN113705703A (en) 2021-09-01 2021-09-01 Image-text multi-modal emotion recognition method based on BilSTM and attention mechanism

Publications (1)

Publication Number Publication Date
CN113705703A (en) 2021-11-26

Family

ID=78658820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111021378.6A Pending 2021-09-01 Image-text multi-modal emotion recognition method based on BiLSTM and attention mechanism

Country Status (1)

Country Link
CN (1) CN113705703A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049743A (en) * 2022-12-14 2023-05-02 深圳市仰和技术有限公司 Cognitive recognition method based on multi-modal data, computer equipment and storage medium
CN116049743B (en) * 2022-12-14 2023-10-31 深圳市仰和技术有限公司 Cognitive recognition method based on multi-modal data, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107463609B (en) Method for solving video question-answering by using layered space-time attention codec network mechanism
CN111488931B (en) Article quality evaluation method, article recommendation method and corresponding devices
CN101634996A (en) Individualized video sequencing method based on comprehensive consideration
CN111311364B (en) Commodity recommendation method and system based on multi-mode commodity comment analysis
CN110705490B (en) Visual emotion recognition method
CN113297370A (en) End-to-end multi-modal question-answering method and system based on multi-interaction attention
CN111325571A (en) Method, device and system for automatically generating commodity comment labels for multitask learning
Zhao et al. Flexible presentation of videos based on affective content analysis
CN117891940B (en) Multi-modal irony detection method, apparatus, computer device, and storage medium
CN111563373A (en) Attribute-level emotion classification method for focused attribute-related text
CN114170411A (en) Picture emotion recognition method integrating multi-scale information
CN113987167A (en) Dependency perception graph convolutional network-based aspect-level emotion classification method and system
Lan et al. Image aesthetics assessment based on hypernetwork of emotion fusion
CN113705703A (en) Image-text multi-modal emotion recognition method based on BiLSTM and attention mechanism
CN118468882A (en) Deep multi-mode emotion analysis method based on image-text interaction information and multi-mode emotion influence factors
CN117150320B (en) Dialog digital human emotion style similarity evaluation method and system
CN111859925B (en) Emotion analysis system and method based on probability emotion dictionary
CN114219514A (en) Illegal advertisement identification method and device and electronic equipment
CN118115781A (en) Label identification method, system, equipment and storage medium based on multi-mode model
CN112132075A (en) Method and medium for processing image-text content
CN117114745A (en) Method and device for predicting intent vehicle model
CN114297390B (en) Aspect category identification method and system in long tail distribution scene
CN111340329A (en) Actor assessment method and device and electronic equipment
Deng et al. Multimodal Sentiment Analysis Based on a Cross-Modal Multihead Attention Mechanism.
Zhao et al. Supplementing Missing Visions via Dialog for Scene Graph Generations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination