CN115631504B - Emotion identification method based on bimodal graph network information bottleneck - Google Patents

Emotion identification method based on bimodal graph network information bottleneck

Info

Publication number
CN115631504B
CN115631504B (application CN202211645853.1A)
Authority
CN
China
Prior art keywords
graph
bimodal
text
information
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211645853.1A
Other languages
Chinese (zh)
Other versions
CN115631504A (en)
Inventor
李丽
李平
苟丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Petroleum University
Priority to CN202211645853.1A
Publication of CN115631504A
Application granted
Publication of CN115631504B
Legal status: Active
Anticipated expiration


Classifications

    • G06V 30/41 — Analysis of document content (document-oriented image-based pattern recognition)
    • G06F 40/253 — Grammatical analysis; style critique (natural language analysis)
    • G06F 40/30 — Semantic analysis
    • G06N 3/02, 3/08 — Neural networks; learning methods
    • G06V 30/18 — Extraction of features or characteristics of the image (character recognition)
    • G06V 30/19147 — Obtaining sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 30/19173 — Classification techniques
    • G06V 30/1918 — Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an emotion recognition method based on a bimodal graph network information bottleneck. The data are preprocessed and the images and texts are encoded by the corresponding pre-trained models; text and image features are then extracted with a long short-term memory network and a feed-forward neural network, respectively. Intra-modality topological graphs are constructed from the syntactic dependencies of the text and the adjacency of the visual blocks, and a bimodal topological graph is constructed as a complete bipartite graph. A modality interaction module based on the bimodal graph network uses graph convolutions to realize information interaction within and between the modalities; graph pooling converts the node representations of the bimodal topological graph into a graph representation, and a multilayer perceptron performs the bimodal emotion recognition. In addition, an information bottleneck module is established, improving the generalization ability of the method. The emotion recognition method based on the bimodal graph network information bottleneck effectively fuses the information of the two modalities to guide emotion recognition.

Description

Emotion identification method based on bimodal graph network information bottleneck
Technical Field
The invention belongs to the field of bimodal emotion recognition at the intersection of natural language processing and computer vision, and particularly relates to an emotion recognition method based on a bimodal graph network information bottleneck.
Background
Emotion recognition aims to mine the subjective information in data using natural language processing, and is widely applied in fields such as financial market forecasting and business review analysis. With the rapid development of internet technology, online information has gradually shifted from plain text to bimodal text-image content, presenting existing emotion analysis methods with new challenges and opportunities. How to effectively extract and fuse features from bimodal data is the key to bimodal emotion characterization.
Bimodal emotion recognition is commonly realized by concatenating, adding, or taking the Hadamard product of the unimodal features, but such fusion cannot capture the correlations between the modalities. More recently, cross-attention mechanisms have been introduced to strengthen the feature fusion of bimodal data; however, cross-attention merely associates the global semantics of one modality with the local features of the other, which is insufficient to reflect the alignment of the modalities on local features, and using a global representation of one modality for semantic alignment can introduce considerable noise. Attention-based methods have a further drawback: they typically require carefully designed attention patterns, such as multi-layer or multi-hop attention, and the extra layers introduce more parameters and increase the risk of overfitting.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing an emotion recognition method based on a bimodal graph network information bottleneck. The data of each modality are decomposed into fine-grained semantic units, namely the words of the text and the visual blocks of the image, and the correlations within and between the modalities are used to connect these bimodal fine-grained semantic units, so that bimodal feature fusion is performed directly between them. In other words, a local-to-local alignment establishes a mapping between the representations of the two modalities, allowing the semantic information of the text and the local information of the image to be fully fused. In addition, an information bottleneck mechanism is added, which effectively improves the generalization ability of the method.
To this end, the invention adopts the following technical scheme:
S1: preprocessing the data: the text is processed with the word embedding technique GloVe to obtain a text embedding matrix E_t; the image is processed with ResNet152, the image being cut into n visual blocks beforehand, to obtain an image representation matrix E_v, where n denotes the number of visual blocks.
S2: extracting features from the preprocessed embeddings: the text features H_t with a bidirectional long short-term memory network and the image features H_v with a feed-forward neural network.
S3: constructing topological graphs from the syntactic dependencies in the text and the spatial position relations in the image. The specific operations are as follows:
S31: a topological graph G_t within the text modality is constructed with the words of the text as nodes and the syntactic dependencies of the dependency tree as undirected edges.
S32: a topological graph G_v within the image modality is constructed with the visual blocks of the image as nodes and the spatial position relations between the visual blocks as undirected edges.
S33: the words of the text and the visual blocks of the image are taken as two groups of nodes, every word node is joined to every visual-block node by an undirected edge, and the resulting complete bipartite graph is taken as the bimodal topological graph G_m.
S4: designing a modality interaction module based on the bimodal graph network, performing representation learning with the message-passing mechanism of the graph convolutional network to realize information interaction and feature fusion within and between the modalities. The specific operations are as follows:
S41: with the topological graph G_t within the text modality as the adjacency matrix A_t and the text features obtained in S2 as the word-node feature vectors, representation learning of the word nodes is performed through the graph convolutional network, realizing information interaction within the text modality. The calculation formula is:
H_t^(l+1) = σ(A_t H_t^(l) W_t)
where W_t is a trainable parameter and σ is the sigmoid activation function.
S42: with the topological graph G_v within the image modality as the adjacency matrix A_v and the image features extracted in S2 as the visual-block node feature vectors, representation learning of the visual-block nodes is performed through the graph convolutional network, realizing information interaction within the image modality. The calculation formula is:
H_v^(l+1) = σ(A_v H_v^(l) W_v)
where W_v is a trainable parameter and σ is the sigmoid activation function.
S43: with the bimodal topological graph G_m as the adjacency matrix A_m, the text and image features extracted in S2 are concatenated into the node feature vector H_m = [H_t, H_v] and information is aggregated through the graph convolutional network, realizing information fusion between the modalities. The calculation formula is:
H_m^(l+1) = σ(A_m H_m^(l) W_m)
where W_m is a trainable parameter and σ is the sigmoid activation function.
S44: S41–S43 are cycled according to the specific parameters of the model.
S5: establishing an information bottleneck module to improve the generalization ability of the method. The specific operations are as follows:
S51: the text embedding and the image embedding from the S1 data preprocessing are concatenated to obtain the input feature X of the information bottleneck module.
S52: the text features and the image features extracted in S2 are concatenated to obtain the intermediate feature Z of the information bottleneck module.
S53: the text and image representations after the S4 modality interaction based on the bimodal graph network are concatenated as the output feature Y of the information bottleneck module.
S54: the goal of the information bottleneck is to reduce the mutual information between X and Z while increasing the mutual information between Z and Y. The calculation formula is:
R(θ) = I(Z, Y; θ) − βI(Z, X; θ)
where R(θ) is the target the information bottleneck module optimizes, θ denotes the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, I(Z, Y; θ) is the mutual information between Z and Y, I(Z, X; θ) is the mutual information between X and Z, and β is an adjustable coefficient.
S6: a graph pooling that concatenates the representations of all nodes in the bimodal topological graph is adopted to obtain the graph representation vector. The calculation formula is:
g = concat(a_k | k ∈ G_m)
where g is the graph representation obtained by concatenating the representations of all word and visual-block nodes, k ranges over all nodes of the bimodal topological graph, and a_k is the representation of node k after S4.
S7: the bimodal emotional tendency is identified with a multilayer perceptron as the classifier.
S8: the model is trained on the bimodal data, with the cross-entropy loss function and the information bottleneck objective function as the model training target, using a warm-started Adam optimizer. The training target of the model is:
L(θ) = −Σ_{j∈J} y_j log ŷ_j − (I(Z, Y; θ) − βI(Z, X; θ))
where j is a sample of the training set, J is the set of all training samples, β is the adjustable coefficient, θ denotes the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, y_j is the true value of the sample, and ŷ_j is the predicted value.
S9: the bimodal data to be classified are classified by the trained model to obtain the emotion recognition result.
Compared with existing bimodal emotion recognition methods, the emotion recognition method based on the bimodal graph network information bottleneck has the following beneficial effects:
1. the text words and the visual blocks form a bimodal topological graph, exploiting the syntactic information of the text and the spatial position information of the image;
2. the bimodal topological graph connects the fine-grained semantic units of the two modalities, so that multimodal feature fusion is performed directly between them; the semantic information of the text and the local information of the image are fully fused, remedying the defects of existing methods;
3. the information bottleneck mechanism effectively improves the generalization ability of the method.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a diagram of a system model of the present invention;
FIG. 3 is a building block of a bimodal topology of the present invention.
Detailed Description
So that the public may better understand the present invention, specific embodiments are described below with reference to the accompanying drawings. The drawings are for illustration only and are not to be construed as limiting the invention; to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; and those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The invention provides an emotion recognition method based on bimodal graph network information bottleneck, which comprises the following steps:
S1: data preprocessing: the text and the image are preprocessed through the corresponding pre-trained models.
As shown in FIG. 1, the text and image in the bimodal data are first separated and then preprocessed separately. For the text, the representation of each word is looked up in pre-trained GloVe, mapping each word to a 300-dimensional vector and giving the text embedding matrix E_t. For the image, it is first cut into n visual blocks, and each visual block is processed with the image processing technique ResNet152 into a 1024-dimensional representation vector, finally giving the image embedding matrix E_v, where n denotes the number of visual blocks. A preprocessing sketch follows.
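A minimal preprocessing sketch in Python/PyTorch, under stated assumptions: GloVe is read from a plain word-to-vector text file, the image is split into an n = grid×grid array of square blocks, and, because torchvision's ResNet152 pools to 2048-dimensional features, the linear projection down to the 1024 dimensions mentioned above is an added assumption rather than something the patent specifies.

import torch
import torch.nn as nn
from torchvision import models, transforms

def load_glove(path):
    # Parse a GloVe text file into a {word: 300-d tensor} dictionary.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = torch.tensor([float(v) for v in parts[1:]])
    return vectors

def embed_text(words, glove, dim=300):
    # E_t: one 300-d GloVe vector per word; zeros for out-of-vocabulary words.
    return torch.stack([glove.get(w, torch.zeros(dim)) for w in words])

class BlockEncoder(nn.Module):
    # Cut a PIL image into grid x grid visual blocks and encode each with ResNet152.
    def __init__(self, grid=4, out_dim=1024):
        super().__init__()
        backbone = models.resnet152(weights="IMAGENET1K_V1")
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc head
        self.proj = nn.Linear(2048, out_dim)  # assumed projection to 1024-d block vectors
        self.grid = grid
        self.prep = transforms.Compose([transforms.Resize((224, 224)),
                                        transforms.ToTensor()])

    def forward(self, image):
        w, h = image.size
        bw, bh = w // self.grid, h // self.grid
        blocks = [image.crop((i * bw, j * bh, (i + 1) * bw, (j + 1) * bh))
                  for j in range(self.grid) for i in range(self.grid)]
        x = torch.stack([self.prep(b) for b in blocks])  # (n, 3, 224, 224)
        feats = self.features(x).flatten(1)              # (n, 2048) pooled features
        return self.proj(feats)                          # E_v: (n, 1024)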
S2: and performing feature extraction on the preprocessed embedded representation.
As shown in fig. 1, feature extraction is performed on the text embedding and the image embedding obtained in S1, respectively.
Because the text has a front-back order relation, in order to integrate more context information into word embedding, a bidirectional long-short early-stage memory network is adopted to carry out context semantic dependency learning, and text characteristics are extracted
Figure SMS_56
. The specific calculation formula is as follows:
Figure SMS_57
Figure SMS_58
Figure SMS_59
Figure SMS_60
Figure SMS_61
Figure SMS_62
in the above-mentioned formula, the compound has the following structure,
Figure SMS_68
for forgetting to close the door>
Figure SMS_67
Is a input door, is>
Figure SMS_72
Is an output gate, which is arranged in the interior of the housing>
Figure SMS_65
Is a candidate value vector, is->
Figure SMS_73
Is a memory cell at the previous moment>
Figure SMS_64
For memory cells at the present moment>
Figure SMS_76
Is a representation of the hidden state at the last moment,
Figure SMS_75
for a hidden status representation at the present time>
Figure SMS_79
、/>
Figure SMS_63
、/>
Figure SMS_71
、/>
Figure SMS_69
And &>
Figure SMS_78
、/>
Figure SMS_70
、/>
Figure SMS_74
、/>
Figure SMS_66
Indicating a trainable parameter, subscript @, of a long and short term memory network>
Figure SMS_77
Representing the index of the position of the current word in the text.
Because no sequential features exist among the visual blocks of the image, a feed-forward neural network is adopted to extract the image features H_v. The specific calculation formula is:
H_v = σ(W_v E_v + b_v)
where W_v and b_v are trainable parameters of the feed-forward neural network.
To facilitate the subsequent feature fusion, the dimensions of the text features H_t and the image features H_v are both set to 128. A sketch of both extractors follows.
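A sketch of the two S2 extractors, assuming PyTorch: the 64-units-per-direction BiLSTM (so the concatenated H_t is 128-dimensional, matching the text) and the sigmoid in the feed-forward branch mirror the description above but remain illustrative choices.

import torch
import torch.nn as nn

class FeatureExtractors(nn.Module):
    # S2: BiLSTM over word embeddings -> H_t; feed-forward net over block embeddings -> H_v.
    def __init__(self, text_dim=300, img_dim=1024, out_dim=128):
        super().__init__()
        # out_dim // 2 hidden units per direction, so the concatenation is out_dim wide.
        self.bilstm = nn.LSTM(text_dim, out_dim // 2, batch_first=True, bidirectional=True)
        self.ffn = nn.Sequential(nn.Linear(img_dim, out_dim), nn.Sigmoid())

    def forward(self, E_t, E_v):
        H_t, _ = self.bilstm(E_t.unsqueeze(0))  # (1, num_words, 128)
        H_v = self.ffn(E_v)                     # (num_blocks, 128)
        return H_t.squeeze(0), H_v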
S3: and constructing a topological graph by using the grammar dependency relationship in the text and the spatial position relationship in the image.
In order to solve the defects of the prior art, the alignment relation of each modality on local features is reflected. As shown in fig. 3, this step will construct three topologies, namely: two intra-modal topographies and one bi-modal topographies, the operation is as follows.
S31: for the text modality, complex grammatical dependencies exist between words, and modeling grammatical dependencies facilitate learning of text information. Therefore, a topological graph in a text mode is constructed by taking words in the text as nodes and grammar dependence relations in the dependence tree as undirected edges
Figure SMS_85
S32: constructing a topological graph in an image mode by taking visual blocks in an image as nodes and taking spatial position relations among the visual blocks as undirected edges
Figure SMS_86
S33: establishing the relation between bimodal fine-grained semantic units, so that bimodal feature fusion can be directly carried out between the fine-grained semantic units, namely: and establishing a mapping relation for the representation information of each mode by adopting a local alignment local mode, so that the semantic information of the text and the local information of the image are fully fused. Therefore, the words in the text and the visual blocks in the images are used as two groups of nodes, any node in the words and each node in the visual blocks form a non-directional edge, and a complete bipartite graph is constructed to be used as a dual-mode topological graph
Figure SMS_87
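The three graphs as adjacency matrices, sketched under assumptions of my own: the dependency edges arrive as (head, dependent) index pairs from any parser, "spatial position relation" is read as 4-neighbour adjacency on the block grid, and self-loops are added so each node also keeps its own features during convolution.

import torch

def text_graph(num_words, dep_edges):
    # G_t: words as nodes, dependency-tree arcs as undirected edges (self-loops assumed).
    A = torch.eye(num_words)
    for head, dep in dep_edges:
        A[head, dep] = A[dep, head] = 1.0
    return A

def image_graph(grid):
    # G_v: visual blocks as nodes, 4-neighbour grid adjacency as undirected edges.
    n = grid * grid
    A = torch.eye(n)
    for r in range(grid):
        for c in range(grid):
            k = r * grid + c
            if c + 1 < grid:
                A[k, k + 1] = A[k + 1, k] = 1.0      # right neighbour
            if r + 1 < grid:
                A[k, k + grid] = A[k + grid, k] = 1.0  # lower neighbour
    return A

def bimodal_graph(num_words, num_blocks):
    # G_m: complete bipartite graph between word nodes and visual-block nodes.
    n = num_words + num_blocks
    A = torch.zeros(n, n)
    A[:num_words, num_words:] = 1.0
    A[num_words:, :num_words] = 1.0
    return A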
S4: and designing a modal interaction module based on a bimodal graph network, and performing representation learning by using a message transmission mechanism of the graph convolution network to realize information interaction and feature fusion in and among the modes.
As shown in fig. 2, the text features extracted in S2
Figure SMS_88
And an image feature>
Figure SMS_89
And sending the data to a bimodal graph network, and carrying out information interaction and feature fusion through a graph volume network on the basis of a topological graph constructed in the S3.
S41: topological graph in text mode
Figure SMS_90
Is adjacent to the matrix, is>
Figure SMS_91
The expression learning of the word nodes is carried out for the word node feature vectors through a graph convolution network, each word node transmits information to a neighbor word node with a grammar dependency relationship, and the information interaction in a text mode is realized, wherein the calculation formula is as follows: />
Figure SMS_92
In the above formula, the first and second carbon atoms are,
Figure SMS_93
for trainable parameters>
Figure SMS_94
The function is activated for sigmoid.
S42: topology map in image modality
Figure SMS_95
In the vicinity of a matrix>
Figure SMS_96
For the feature vectors of the visual block nodes, the representation learning of the visual block nodes is carried out through a graph convolution network, and the information transmission is carried out between the adjacent visual blocks, so as to realize the information interaction in the image modality, and the calculation formula is as follows:
Figure SMS_97
in the above formula, the first and second carbon atoms are,
Figure SMS_98
for trainable parameters>
Figure SMS_99
The function is activated for sigmoid.
S43: in a bimodal topology
Figure SMS_100
As an adjacency matrix, the text and image features extracted by splicing S2 are node feature vectors ^ and ^>
Figure SMS_101
Information aggregation is carried out through a graph convolution network, all neighbor nodes of each node belong to another mode node, so that information fusion between modes is realized, and a calculation formula is as follows:
Figure SMS_102
in the above formula, the first and second carbon atoms are,
Figure SMS_103
for trainable parameters>
Figure SMS_104
The function is activated for sigmoid.
S44: as shown in fig. 2, S41 to S43 form a convolutional network block, and after the parameter adjustment is performed on the model, a better parameter value of the layer number of the convolutional network block is obtained, and S41 to S43 are cycled according to the specific parameter value.
S5: and an information bottleneck module is established, and the generalization capability of the method is improved.
The information bottleneck module runs through the whole process of the method, and the specific operation is as follows.
S51: splicing the text embedding and the image embedding after the S1 data preprocessing to obtain the input characteristics of the information bottleneck module
Figure SMS_105
S52: extracting the text feature and the image feature from S2Performing splicing to obtain intermediate characteristics of the information bottleneck module
Figure SMS_106
S53: s4, splicing the text representation and the image representation after modal interaction based on the bimodal graph network, wherein the text representation and the image representation are used as the output characteristics of the information bottleneck module
Figure SMS_107
S54: the goal of information bottlenecks is to reduce
Figure SMS_108
And/or>
Figure SMS_109
In between, increase->
Figure SMS_110
And/or>
Figure SMS_111
The calculation formula is as follows:
Figure SMS_112
in the above formula, the first and second carbon atoms are,
Figure SMS_115
target for which optimization is required for the information bottleneck module, <' >>
Figure SMS_118
For the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, be->
Figure SMS_120
Is->
Figure SMS_114
And/or>
Figure SMS_117
In betweenMutual information->
Figure SMS_119
Is->
Figure SMS_121
And/or>
Figure SMS_113
In between, based on the mutual information->
Figure SMS_116
Is an adjustable coefficient.
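The patent does not say how the two mutual-information terms are estimated, so the sketch below plugs in variational proxies of my own choosing — an InfoNCE-style lower bound for I(Z, Y; θ) and, assuming Z is reparameterized as a Gaussian, a KL-to-standard-normal upper bound for I(Z, X; θ), in the manner of deep variational information bottleneck work — purely to show how R(θ) could enter a loss.

import torch
import torch.nn.functional as F

def info_nce(Z, Y, temperature=0.1):
    # InfoNCE lower bound on I(Z, Y): matched (Z_i, Y_i) pairs in a batch are positives.
    Z = F.normalize(Z, dim=-1)
    Y = F.normalize(Y, dim=-1)
    logits = Z @ Y.t() / temperature           # (batch, batch) similarity matrix
    labels = torch.arange(Z.size(0))
    return -F.cross_entropy(logits, labels)    # larger value = more shared information

def kl_to_prior(mu, logvar):
    # Variational upper bound on I(Z, X): KL(q(z|x) || N(0, I)) (deep-VIB assumption).
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()

def ib_objective(Z, Y, mu, logvar, beta=0.01):
    # R(theta) = I(Z, Y; theta) - beta * I(Z, X; theta); training maximizes R.
    return info_nce(Z, Y) - beta * kl_to_prior(mu, logvar)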
S6: a graph pooling technique is employed to convert the node representation of the bimodal topology graph into a graph representation.
The bimodal emotion recognition is to classify the overall emotional tendency of the data, and needs to combine the feature information of all nodes in the bimodal topological graph. Therefore, a graph pooling technology represented by all nodes in the spliced bimodal topological graph is adopted to obtain a graph representation vector, and a calculation formula is as follows:
Figure SMS_122
in the above formula, the first and second carbon atoms are,
Figure SMS_123
representing a graph representation vector represented by all nodes of the stitched text and the visual block, and->
Figure SMS_124
For all nodes in the bimodal topology map, ->
Figure SMS_125
Is node after S4->
Figure SMS_126
Is shown.
S7: and expressing the vector through the graph obtained in the S6, and identifying the bimodal emotional tendency by using a multilayer perceptron as a classifier, wherein a calculation formula is as follows:
Figure SMS_127
Figure SMS_128
in the above formula, the first and second carbon atoms are,
Figure SMS_129
for a final learned bimodal characterization, the>
Figure SMS_130
For the emotional tendency predicted by the model, < >>
Figure SMS_131
And
Figure SMS_132
represents a trainable weight, <' > asserted>
Figure SMS_133
And &>
Figure SMS_134
Is a trainable bias.
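S6 and S7 as one PyTorch module, sketched from the formulas above; concatenation pooling presupposes a fixed node count (e.g., via padding), and the hidden width and the three emotion classes are assumptions, not values from the patent.

import torch
import torch.nn as nn

class PoolAndClassify(nn.Module):
    # S6: g = concat(a_k | k in G_m); S7: z = sigmoid(W1 g + b1), y_hat = softmax(W2 z + b2).
    def __init__(self, num_nodes, dim=128, hidden=256, num_classes=3):
        super().__init__()
        self.fc1 = nn.Linear(num_nodes * dim, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, H_t, H_v):
        g = torch.cat([H_t, H_v], dim=0).reshape(-1)  # graph pooling by concatenation
        z = torch.sigmoid(self.fc1(g))                # final bimodal representation z
        return torch.softmax(self.fc2(z), dim=-1)     # predicted tendency y_hat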
S8: the model is trained through the bimodal data.
In the training process, a cross entropy loss function and an information bottleneck objective function are used as model training targets, and an Adam optimizer with hot start is used for training the model. Wherein the training targets of the model are as follows:
Figure SMS_135
in the above formula, the first and second carbon atoms are,
Figure SMS_136
for a sample in the training set, ->
Figure SMS_137
For the set of all training samples, <' >>
Figure SMS_138
Is adjustable factor>
Figure SMS_139
For the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, be->
Figure SMS_140
Is the true value of the sample or samples,
Figure SMS_141
is a predicted value.
S9: and classifying the bimodal data to be classified through the trained model to obtain an emotion recognition result.
The embodiments described above are only preferred embodiments of the invention and do not limit its concept or scope; modifications and improvements made to the technical solutions of the invention by those skilled in the art without departing from its design concept fall within its protection scope, and the claimed technical content is set out in the claims.

Claims (4)

1. An emotion recognition method based on bimodal graph network information bottleneck, characterized by comprising the following steps:
S1: data preprocessing, namely preprocessing a text and an image respectively through corresponding pre-trained models;
S2: extracting features from the preprocessed embedded representations, extracting the text features H_t using a bidirectional long short-term memory network and extracting the image features H_v using a feed-forward neural network;
S3: constructing topological graphs using the syntactic dependencies in the text and the spatial position relations in the image;
S31: constructing a topological graph G_t within the text modality with the words of the text as nodes and the syntactic dependencies of the dependency tree as undirected edges;
S32: constructing a topological graph G_v within the image modality with the visual blocks of the image as nodes and the spatial position relations between the visual blocks as undirected edges;
S33: taking the words of the text and the visual blocks of the image as two groups of nodes, joining every word node to every visual-block node with an undirected edge, and constructing the complete bipartite graph as the bimodal topological graph G_m;
S4: designing a modality interaction module based on the bimodal graph network, performing representation learning with the message-passing mechanism of the graph convolutional network to realize information interaction and feature fusion within and between the modalities;
S41: with the topological graph G_t within the text modality as the adjacency matrix, the text features extracted in S2 being the word-node feature vectors, performing representation learning of the word nodes through a graph convolutional network to realize information interaction within the text modality;
S42: with the topological graph G_v within the image modality as the adjacency matrix, the image features extracted in S2 being the visual-block node feature vectors, performing representation learning of the visual-block nodes through a graph convolutional network to realize information interaction within the image modality;
S43: with the bimodal topological graph G_m as the adjacency matrix, concatenating the text and image features extracted in S2 into the node feature vector H_m = [H_t, H_v] and aggregating information through a graph convolutional network to realize information fusion between the modalities;
S44: cycling S41–S43 according to the specific parameters of the model;
S5: establishing an information bottleneck module to improve the generalization ability of the method;
S51: concatenating the text embedding and the image embedding after the S1 data preprocessing to obtain the input feature X of the information bottleneck module;
S52: concatenating the text features and the image features extracted in S2 to obtain the intermediate feature Z of the information bottleneck module;
S53: concatenating the text representation and the image representation after the S4 modality interaction based on the bimodal graph network as the output feature Y of the information bottleneck module;
S54: the information bottleneck aiming to reduce the mutual information between X and Z and increase the mutual information between Z and Y, with the calculation formula:
R(θ) = I(Z, Y; θ) − βI(Z, X; θ)
where R(θ) is the target to be optimized by the information bottleneck module, θ is a parameter of the emotion recognition method based on the bimodal graph network information bottleneck, I(Z, Y; θ) is the mutual information between Z and Y, I(Z, X; θ) is the mutual information between X and Z, and β is an adjustable coefficient;
S6: converting the node representations of the bimodal topological graph into a graph representation by graph pooling;
S7: identifying the bimodal emotional tendency with a multilayer perceptron as the classifier;
S8: training the model on the bimodal data;
S9: classifying the bimodal data to be classified with the trained model to obtain the emotion recognition result.
2. The emotion recognition method based on bimodal graph network information bottleneck according to claim 1, wherein S1 specifically is: processing the text with the word embedding technique GloVe to obtain a text embedding matrix E_t; processing the image with the image processing technique ResNet152, the image being cut into n visual blocks before processing, to obtain an image representation matrix E_v; where n represents the number of visual blocks.
3. The emotion recognition method based on bimodal graph network information bottleneck according to claim 1, wherein S6 specifically is: obtaining the graph representation vector with a graph pooling that concatenates the representations of all nodes in the bimodal topological graph, with the calculation formula:
g = concat(a_k | k ∈ G_m)
where g is the graph representation vector obtained by concatenating all word and visual-block node representations, k ranges over all nodes of the bimodal topological graph, and a_k is the representation of node k after S4.
4. The emotion recognition method based on bimodal graph network information bottleneck according to claim 1, wherein S8 specifically is: using the cross-entropy loss function and the information bottleneck objective function as the model training target, and training the model with a warm-started Adam optimizer; the training target of the model being:
L(θ) = −Σ_{j∈J} y_j log ŷ_j − (I(Z, Y; θ) − βI(Z, X; θ))
where j is a sample of the training set, J is the set of all training samples, β is the adjustable coefficient, θ is a parameter of the emotion recognition method based on the bimodal graph network information bottleneck, y_j is the true value of the sample, and ŷ_j is the predicted value.
CN202211645853.1A 2022-12-21 2022-12-21 Emotion identification method based on bimodal graph network information bottleneck Active CN115631504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211645853.1A CN115631504B (en) 2022-12-21 2022-12-21 Emotion identification method based on bimodal graph network information bottleneck

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211645853.1A CN115631504B (en) 2022-12-21 2022-12-21 Emotion identification method based on bimodal graph network information bottleneck

Publications (2)

Publication Number Publication Date
CN115631504A CN115631504A (en) 2023-01-20
CN115631504B true CN115631504B (en) 2023-04-07

Family

ID=84910557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211645853.1A Active CN115631504B (en) 2022-12-21 2022-12-21 Emotion identification method based on bimodal graph network information bottleneck

Country Status (1)

Country Link
CN (1) CN115631504B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304984A (en) * 2023-03-14 2023-06-23 烟台大学 Multi-modal intention recognition method and system based on contrast learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6413391B2 (en) * 2014-06-27 2018-10-31 富士通株式会社 CONVERSION DEVICE, CONVERSION PROGRAM, AND CONVERSION METHOD
CN112860888B (en) * 2021-01-26 2022-05-06 中山大学 Attention mechanism-based bimodal emotion analysis method
CN114511906A (en) * 2022-01-20 2022-05-17 重庆邮电大学 Cross-modal dynamic convolution-based video multi-modal emotion recognition method and device and computer equipment
CN115363531A (en) * 2022-08-22 2022-11-22 山东师范大学 Epilepsy detection system based on bimodal electroencephalogram signal information bottleneck

Also Published As

Publication number Publication date
CN115631504A (en) 2023-01-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant