CN116258147A - Multimodal comment emotion analysis method and system based on heterogeneous graph convolution - Google Patents

Multimodal comment emotion analysis method and system based on heterogeneous graph convolution

Info

Publication number
CN116258147A
Authority
CN
China
Prior art keywords
vector
knowledge
comment
word
image
Legal status
Pending
Application number
CN202310083964.6A
Other languages
Chinese (zh)
Inventor
陈羽中
万宇杰
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Application filed by Fuzhou University
Priority to CN202310083964.6A
Publication of CN116258147A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/02: Knowledge representation; Symbolic representation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a multimodal comment emotion analysis method based on heterogeneous graph convolution, which comprises the following steps. Step A: collect user comments and related images, extract the aspect words of the products or services involved in the user comments, and label the emotion polarity of each user comment with respect to a specific aspect of the product or service, so as to construct a training set DB. Step B: use the training set DB to train a deep learning network model DLM based on a knowledge graph and a heterogeneous graph convolutional network, which analyzes the emotion polarity of user comments and related images toward specific aspects of products or services. Step C: input a user comment, its related image, and the aspect word of the product or service in question into the trained deep learning network model to obtain the emotion polarity of the user comment and related image with respect to the specific aspect of the product or service. The method and system help to improve the accuracy of emotion classification.

Description

Multimodal comment emotion analysis method and system based on heterogeneous graph convolution
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a multimodal comment emotion analysis method and system based on heterogeneous graph convolution.
Background
Emotion analysis (Sentiment Analysis, SA), also known as opinion mining, is a fundamental task in the field of Natural Language Processing (NLP). Its basic goal is to mine the emotion information in a given text and analyze its emotion polarity. Facing ever-growing volumes of comment text, however, the collection and analysis of massive amounts of information on the network cannot be completed manually, which has drawn the attention of research institutions to emotion analysis technology. According to classification granularity, the task is divided into document-level, sentence-level, and aspect-level emotion analysis. Early emotion analysis research focused mainly on the document level and the sentence level, i.e., it assumed that emotion is expressed toward only one entity in a document or sentence. Although document-level and sentence-level tasks have been widely studied in the emotion analysis field, due to their inherent limitations, conventional document-level or sentence-level emotion analysis models can only analyze an entire document or sentence to identify its emotion polarity, which cannot meet the requirements of practical applications. With the rapid development of the Internet, online social media platforms and online shopping platforms present a large number of comment texts; when one comment sentence involves multiple aspects whose emotion polarities differ, document-level or sentence-level emotion analysis models obviously cannot correctly interpret the emotion information in the comment. Thus, fine-grained emotion analysis for specific entity aspects has become a major research problem of current emotion analysis tasks. Aspect-level emotion analysis (Aspect-level Sentiment Analysis) aims to judge the emotion polarity corresponding to a specific aspect of a target in comment text; the task involves core natural language processing problems such as lexical semantics, coreference resolution, and opinion extraction, and therefore has strong theoretical research significance and application value.
In addition, in an era of rapid network development, people tend to express their views and moods in the form of text-image combinations or videos. Multimodal language data offers a richer and more attractive form of expression, which has given it an overwhelming advantage on major social media websites while providing ample data resources for research on multimodal language computing. In recent years, multimodal emotion analysis has become a key task in the emotion analysis field; emotion analysis in multimodal contexts brings machines closer to realistic human emotion processing. For aspect-level emotion analysis tasks, image information is generally as indicative as text information. On the one hand, in multimodal data, both text and images are highly relevant to aspect emotions; furthermore, different aspects may be associated with different portions of each modality's data. In other words, a customer may write different text or attach different images for different aspects. On the other hand, text and image information can promote and complement each other, enhancing the analysis of emotion toward a particular aspect. In summary, various correlations exist in multimodal data for aspect-level emotion analysis.
In recent years, with the rise of deep learning, this technology has been widely applied to aspect-level emotion analysis tasks. The most common models are Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). CNNs perform well in capturing semantic information from text, while RNNs, especially Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU), can better extract aspect-specific emotion features for emotion classification based on context information. However, these neural network models ignore the syntactic dependencies between a given aspect and its context words; such dependencies between the words of a comment sentence are particularly important for correctly judging the emotion polarity of an aspect. More recently, scholars have used Graph Neural Networks (GNNs) and their variants to consider syntactic information when learning aspect-level representations for emotion classification. Zhang et al. combine a Graph Convolutional Network (GCN) with an attention mechanism to obtain the semantic correlation between context information and aspects. Hang et al. encode sentences using Bi-LSTM and then extract the dependencies between context words using a Graph Attention Network (GAT). However, most existing aspect-level emotion analysis models ignore the implicit emotion information in text data. Some samples in existing datasets carry implicit emotion information; if such data are not handled specifically, a model may fail to recognize irony or implicit expressions, which hinders further improvement of model performance. Moreover, because models trained on smaller datasets can hardly learn such patterns of implicit emotion expression, most efforts to solve the implicit emotion problem assist the model in identifying the implicit emotion information in text by introducing external knowledge. However, most of their knowledge selection algorithms are based on rules or simple attention mechanisms and do not comprehensively take context semantics into account.
With the popularity of multimodal user-generated content (e.g., text, images, speech, or video), emotion analysis has moved beyond traditional text-based analysis. Multimodal emotion analysis is an emerging research field that integrates textual and non-textual information into user emotion analysis. Text-image pairs are the most common form of multimodal data. With the development of deep learning, several neural-network-based models have been proposed for multimodal emotion analysis and have made significant progress. Yu et al. train a logistic regression model on multimodal features obtained by pre-training a text CNN and an image CNN to extract feature representations from text and images, respectively. To fully capture visual semantic information, Xu et al. extract scene and object features from images and use these visual semantic features to attend to text words, simulating the influence of images on text; this also shows that text and images can promote and complement each other in emotion analysis tasks and improve model performance. Accordingly, Xu and Chen et al. [38] propose a co-memory attention mechanism that interactively models the interactions between text and images. Their models take into account the impact of one modality on the other (i.e., text-to-image and image-to-text) and achieve better performance than other related methods. However, at the intersection of aspect-level and multimodal emotion analysis, existing work remains scarce, the above model structures are relatively simple, and the relationship between the image modality and the text modality is not yet fully exploited.
Disclosure of Invention
The invention aims to provide a multimodal comment emotion analysis method and system based on heterogeneous graph convolution, which help to improve the accuracy of emotion classification.
In order to achieve the above purpose, the invention adopts the following technical scheme: a multimodal comment emotion analysis method based on heterogeneous graph convolution, comprising the following steps:
Step A: collect user comments and related images, extract the aspect words of the products or services involved in the user comments, and label the emotion polarity of each user comment with respect to a specific aspect of the product or service, so as to construct a training set DB;
Step B: train a deep learning network model DLM based on a knowledge graph and a heterogeneous graph convolutional network using the training set DB, for analyzing the emotion polarity of user comments and related images toward specific aspects of products or services;
Step C: input a user comment, its related image, and the aspect word of the product or service in question into the trained deep learning network model to obtain the emotion polarity of the user comment and related image with respect to the specific aspect of the product or service.
Further, the step B specifically includes the following steps:
Step B1: encode each training sample in the training set DB to obtain the comment semantic representation vector $X_r$, the aspect semantic representation vector $X_a$, the syntactic dependency adjacency matrix $A_r$, and the image region representation vector $X_{im}$;
Step B2: select the set $Set_{skt}$ of knowledge triples relevant to the comment context from the knowledge graph according to a dynamic knowledge selection mechanism, then encode the knowledge words to obtain the knowledge word representation vector $X_{kg}$ of $Set_{skt}$;
Step B3: from the aspect semantic representation vector $X_a$ and the image region representation vector $X_{im}$, obtain the aspect-dependent image region representation vector $X_{air}$ using an interactive attention mechanism; obtain the set $Set_{sit}$ of tag t-tuples relevant to the comment context through an image tag selection mechanism;
Step B4: average-pool $X_a$ to obtain the aspect average representation vector $\bar{X}_a$; apply position encoding to $X_r$ to obtain the position-enhanced comment representation vector $X_{pw}$; concatenate $X_{pw}$ and $\bar{X}_a$ to obtain the representation vector $C_{sd,0}$; generate the text-knowledge-image heterogeneous graph TKIHG according to a text-knowledge-image heterogeneous graph construction strategy, obtain the adjacency matrix $A_{hg}$ of the TKIHG, and then encode the nodes using a cross-modal attention mechanism to obtain the node representation vector $C_{hg,0}$ of the heterogeneous graph TKIHG;
Step B5: input the representation vector $C_{sd,0}$ and the node representation vector $C_{hg,0}$ of the heterogeneous graph TKIHG into two different L-layer graph convolutional networks, denoted the comment text graph convolutional network RGCN and the text-knowledge-image heterogeneous graph convolutional network HGCN respectively, to learn and extract the syntactic dependencies and the heterogeneous information of context semantics, image tags, and external knowledge, obtaining the text graph convolution representation vector $C_{sd,L}$ and the heterogeneous graph convolution representation vector $C_{hg,L}$;
Step B6: apply an aspect masking operation to the text graph convolution representation vector $C_{sd,L}$ to obtain the comment's text graph convolution mask representation vector $C_{mask,L}$; together with the comment semantic representation vector $X_r$, use an interactive attention mechanism to further enhance the context representation with aspect information that aggregates syntactic information, obtaining the aspect-enhanced representation vector $X_{ea}$ of comment r;
Step B7: combine the heterogeneous graph convolution representation vector $C_{hg,L}$ with the aspect-enhanced representation vector $X_{ea}$ of comment r and with the aspect-dependent image region representation vector $X_{air}$ respectively, using a cross-modal attention mechanism to exploit heterogeneous information such as image tags and external knowledge to strengthen the learned emotion features of the text modality and the image modality, obtaining the heterogeneity-enhanced text representation vector $X_{hm}$ and image representation vector $X_{hair}$; finally concatenate $X_{hm}$ and $X_{hair}$ to obtain the final representation vector $X_{fin}$;
Step B8: input the final representation vector $X_{fin}$ into the final prediction layer, compute the gradient of each parameter in the deep learning network model by back-propagation according to the target loss function, and update each parameter by stochastic gradient descent;
Step B9: terminate the training of the deep learning network model when the change of the loss value between iterations is smaller than a given threshold or the maximum number of iterations is reached.
Further, the step B1 specifically includes the following steps:
Step B11: traverse the training set DB, perform word segmentation on the user comments and aspects in it, remove stop words, and adjust each image related to a comment to 3 × 224 × 224 pixels; each training sample in DB is expressed as d = (r, im, a, p),
where r is a user comment, im is the adjusted image related to the user comment, a is an aspect word or phrase of the product or service involved in the user comment, extracted from comment r, and p ∈ {positive, negative, neutral} is the emotion polarity of the comment toward that aspect;
comment r is expressed as $r = \{w_1^r, w_2^r, \dots, w_n^r\}$, where $w_i^r$ is the i-th word in comment r, i = 1, 2, …, n, and n is the number of words in comment r;
aspect a is expressed as $a = \{w_1^a, w_2^a, \dots, w_m^a\}$, where $w_i^a$ is the i-th word in aspect a, i = 1, 2, …, m, and m is the number of words in aspect a;
Step B12: encode the comment $r = \{w_1^r, w_2^r, \dots, w_n^r\}$ obtained in step B11 with the pre-trained model BERT and reduce the dimension with a fully connected layer to obtain the semantic representation vector of comment r, $X_r = \{x_1^r, x_2^r, \dots, x_n^r\}$, where $x_i^r \in \mathbb{R}^d$ is the semantic representation vector corresponding to the i-th word $w_i^r$ of the comment and d denotes the output dimension;
Step B13: encode the aspect $a = \{w_1^a, w_2^a, \dots, w_m^a\}$ obtained in step B11 with the pre-trained model BERT and reduce the dimension with a fully connected layer to obtain the semantic representation vector of aspect a, $X_a = \{x_1^a, x_2^a, \dots, x_m^a\}$, where $x_i^a \in \mathbb{R}^d$ is the word vector corresponding to the i-th word $w_i^a$ of the aspect and d denotes the output dimension;
Step B14: encode and flatten the adjusted image with the pre-trained model ResNet-152, then obtain the image region representation vector $X_{im} \in \mathbb{R}^{z \times d}$ after dimension reduction by a fully connected layer, where z denotes the size of the output feature map and d the output dimension.
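To make step B14 concrete, here is a minimal sketch of extracting image region features with torchvision's ResNet-152, dropping the pooling and classification head so the 7 × 7 feature map yields z = 49 region vectors; the projection dimension d = 128 is an assumption for illustration.

```python
# Image region encoding sketch for step B14 (d = 128 is an assumption).
import torch
import torch.nn as nn
from torchvision import models
from torchvision.models import ResNet152_Weights

backbone = models.resnet152(weights=ResNet152_Weights.IMAGENET1K_V2)
encoder = nn.Sequential(*list(backbone.children())[:-2]).eval()  # drop avgpool and fc
proj = nn.Linear(2048, 128)                                      # fully connected reduction

img = torch.randn(1, 3, 224, 224)                                # adjusted image (step B11)
with torch.no_grad():
    fmap = encoder(img)                                          # (1, 2048, 7, 7)
X_im = proj(fmap.flatten(2).transpose(1, 2))                     # (1, 49, 128) region vectors
```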
Step B15: perform syntactic dependency parsing on comment r to obtain the syntactic dependency tree SDT, whose edges $(w_i^r, w_j^r)$ indicate that a syntactic dependency exists between word $w_i^r$ and word $w_j^r$ of the comment;
Step B16: encode the syntactic dependency tree SDT obtained in step B15 into the n-order adjacency matrix $A_r = (a_{ij})_{n \times n}$, where $a_{ij} = 1$ indicates that a syntactic dependency exists between words $w_i^r$ and $w_j^r$ of the comment, and $a_{ij} = 0$ indicates that no syntactic dependency exists between them.
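A minimal sketch of steps B15 and B16, assuming spaCy as the dependency parser (the patent does not name a specific parsing tool):

```python
# Dependency tree to adjacency matrix sketch for steps B15-B16.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_adjacency(comment):
    """Parse the comment and encode its dependency tree as a symmetric n-order matrix A_r."""
    doc = nlp(comment)
    n = len(doc)
    A = np.zeros((n, n), dtype=np.float32)
    for tok in doc:
        if tok.i != tok.head.i:                                  # skip the root's self-head
            A[tok.i, tok.head.i] = A[tok.head.i, tok.i] = 1.0    # dependency edge
    return A

A_r = dependency_adjacency("the food is great but service is slow")
```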
Further, the step B2 specifically includes the following steps:
Step B21: select the context words that have an edge connection with the aspect words from the syntactic dependency tree SDT obtained in step B15, and combine them with aspect a to form the seed node set of length m′, $Set_{sn} = \{w_1^{sn}, w_2^{sn}, \dots, w_{m'}^{sn}\}$, where $w_i^{sn}$ is an aspect word or a context word with an edge connection, and m ≤ m′ ≤ n;
Step B22: for each node in the seed node set, select its emotion polarity word and 5 words semantically similar to it from the knowledge graph to form the seed node's 5 candidate knowledge triples CKT, expressed as:
CKT = ⟨$w_{sn}$, $w_{sp}$, $w_{ss}$⟩
where $w_{sn}$ is the seed node word, $w_{sp}$ is the emotion polarity word, and $w_{ss}$ is a semantically similar word;
Step B23: construct each candidate knowledge triple CKT into a candidate knowledge sentence $r_{cks}$, input it into the pre-trained model BERT, and take the average to obtain the average semantic representation vector $X_{cks}$; $r_{cks}$ is expressed as:
$r_{cks}$ = "the word '$w_{sn}$' has emotion polarity '$w_{sp}$', and the word most semantically similar to it is '$w_{ss}$'"
Step B24: average the semantic representation vector $X_r$ of the comment to obtain the average semantic representation vector $X_{rm}$, then compute the cosine similarity between $X_{rm}$ and $X_{cks}$ to obtain the similarity score between comment text r and candidate knowledge sentence $r_{cks}$:
Similarity_Score(r, $r_{cks}$) = CosineSimilarity($X_{rm}$, $X_{cks}$)
where $X_{rm} \in \mathbb{R}^{d}$;
Step B25: compute, as in step B24, the similarity scores of the candidate knowledge sentences constructed from all candidate knowledge triples CKT, and select the original candidate knowledge triples of the top k candidate knowledge sentences with the highest scores to form the knowledge triple set of the seed node, serving as the seed node's external knowledge most relevant to the context;
Step B26: repeat the above steps for the seed node set $Set_{sn}$ to obtain the set containing the knowledge triple sets of all seed nodes, $Set_{skt} = \{SKT_1, SKT_2, \dots, SKT_{m'}\}$, where $SKT_i$ is the knowledge triple set of the i-th seed node $w_i^{sn}$;
Step B27: encode all emotion polarity words and semantically similar words in the knowledge triple set $Set_{skt}$ by knowledge graph embedding to obtain their node representation vectors $X_{kg} = \{x_1^{kg}, x_2^{kg}, \dots\}$, where $x_i^{kg}$ is the knowledge representation vector corresponding to the i-th emotion polarity word or semantically similar word, obtained by looking it up in a pre-trained knowledge word vector matrix $E \in \mathbb{R}^{|V| \times d}$, where d denotes the dimension of the knowledge word vectors and |V| the size of the knowledge word embedding vocabulary.
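The following sketch illustrates the dynamic knowledge selection of steps B22 through B25. The `knowledge_graph.candidates` interface, the `embed` sentence encoder (e.g., mean-pooled BERT), and the value of k are assumptions for illustration, not interfaces defined in the patent.

```python
# Dynamic knowledge selection sketch for steps B22-B25 (assumed interfaces).
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def select_triples(comment_vec, seed_word, knowledge_graph, embed, k=3):
    """Rank a seed node's candidate triples by similarity to the comment context."""
    scored = []
    for w_sp, w_ss in knowledge_graph.candidates(seed_word):   # 5 candidates per seed
        # Candidate knowledge sentence r_cks built from the triple <w_sn, w_sp, w_ss>
        r_cks = (f"the word '{seed_word}' has emotion polarity '{w_sp}', "
                 f"and the word most semantically similar to it is '{w_ss}'")
        score = cosine(comment_vec, embed(r_cks))
        scored.append((score, (seed_word, w_sp, w_ss)))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [triple for _, triple in scored[:k]]                # keep the top-k triples
```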
Further, the step B3 specifically includes the following steps:
Step B31: from the aspect semantic representation vector $X_a$ and the image region representation vector $X_{im}$, obtain the aspect-dependent image region representation vector $X_{air}$ using an interactive attention mechanism; $X_{air}$ is computed by three formulas [shown only as images in the source], in which $(\cdot)^T$ denotes the transpose operation;
Step B32: input the image im into the pre-trained model ResNet-152 to obtain 10 related image tags Tag = {$tag_1$, …, $tag_i$, …, $tag_{10}$}; then input these together with each seed node word of the seed node set $Set_{sn}$ obtained in step B21 into the pre-trained model BERT to obtain their semantic representation vectors, where $x_i^{sn}$ is the semantic representation vector of seed node $w_i^{sn}$ and $x_i^{tag}$ is the semantic representation vector of the i-th image tag $tag_i$;
Step B33: compute the cosine similarity between the semantic representation vector of image tag $tag_i$ and the semantic representation vector of seed node $w_j^{sn}$, obtaining the similarity score between seed node $w_j^{sn}$ and image tag $tag_i$:
Similarity_Score($w_j^{sn}$, $tag_i$) = CosineSimilarity($x_j^{sn}$, $x_i^{tag}$)
Step B34: from the similarity scores between seed node $w_i^{sn}$ and the 10 tags computed in step B33, select the top t image tags with the highest scores and combine them with seed node $w_i^{sn}$ into the tag t-tuple $TT_i = \langle w_i^{sn}, tag_1, \dots, tag_t \rangle$;
Step B35: repeat the above steps for the seed node set $Set_{sn}$ to obtain the set containing the tag t-tuples of all seed nodes, $Set_{sit} = \{TT_1, \dots, TT_i, \dots, TT_{m'}\}$, where $TT_i$ is the tag t-tuple of the i-th seed node $w_i^{sn}$.
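A rough sketch of the image tag selection of steps B32 through B35, using torchvision's ResNet-152 ImageNet class labels as the tag inventory; `embed_word` (a BERT word encoder returning a 1-D vector) and t = 2 are assumptions for illustration.

```python
# Image tag selection sketch for steps B32-B35 (assumed tag inventory and encoder).
import torch
from torchvision import models
from torchvision.models import ResNet152_Weights

weights = ResNet152_Weights.IMAGENET1K_V2
resnet = models.resnet152(weights=weights).eval()
preprocess = weights.transforms()

def top10_tags(pil_image):
    """The 10 most probable class labels for the image (the candidate tags)."""
    with torch.no_grad():
        logits = resnet(preprocess(pil_image).unsqueeze(0))
    top = logits.softmax(-1)[0].topk(10).indices
    return [weights.meta["categories"][i] for i in top]

def tag_t_tuple(seed_word, tags, embed_word, t=2):
    """Pair a seed node with its t most semantically similar image tags."""
    sv = embed_word(seed_word)
    scored = sorted(tags,
                    key=lambda g: -torch.cosine_similarity(sv, embed_word(g), dim=0).item())
    return (seed_word, *scored[:t])
```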
Further, the step B4 specifically includes the following steps:
Step B41: apply average pooling to the aspect semantic representation vector $X_a$ to obtain $\bar{X}_a$:

$$\bar{X}_a = \frac{1}{m}\sum_{i=1}^{m} x_i^a$$

where $\bar{X}_a \in \mathbb{R}^{d}$;
Step B42: apply position encoding to the comment semantic representation vector $X_r$ to obtain the position-enhanced comment representation vector $X_{pw} = \{pw_1 \cdot x_1^r, \dots, pw_n \cdot x_n^r\}$, where $pw_i \cdot x_i^r$ is the position-enhanced representation vector corresponding to the i-th word in comment r, "·" denotes the multiplication of a real number and a vector, and $pw_i$, the position weight corresponding to the i-th word in comment r, is computed by a formula [shown only as an image in the source] in which θ and θ+m−1 denote the start and end positions of aspect a in comment r;
Step B43: concatenate the aspect average representation vector $\bar{X}_a$ obtained in step B41 with the $X_{pw}$ obtained in step B42 to obtain the representation vector $C_{sd,0} = \{c_1^{sd,0}, \dots, c_n^{sd,0}\}$, where $c_i^{sd,0} = [pw_i \cdot x_i^r ; \bar{X}_a]$, i = 1, 2, …, n, is the representation vector corresponding to the i-th word of comment r that is input into the graph convolutional network, and ";" denotes the vector concatenation operation; a sketch of these two steps follows below.
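A sketch of the position weighting and concatenation of steps B42 and B43. The exact weight formula appears only as an image in the source; the linear decay with distance from the aspect span used below is a common choice in aspect-level models and is an assumption, not the patent's verbatim formula.

```python
# Position weighting and concatenation sketch for steps B42-B43 (assumed decay formula).
import torch

def position_weights(n, theta, m):
    """Weight 1 inside the aspect span, decaying linearly with distance outside (assumption)."""
    w = torch.ones(n)
    for i in range(n):
        if i < theta:
            w[i] = 1 - (theta - i) / n
        elif i > theta + m - 1:
            w[i] = 1 - (i - (theta + m - 1)) / n
    return w

def build_C_sd0(X_r, X_a_avg, theta, m):
    """Position-enhance X_r, then append the aspect average to every word vector."""
    pw = position_weights(X_r.shape[0], theta, m).unsqueeze(1)
    X_pw = pw * X_r
    return torch.cat([X_pw, X_a_avg.expand_as(X_pw)], dim=-1)   # C_{sd,0}

X_r = torch.randn(10, 128)                                      # 10-word comment, d = 128
X_a_avg = torch.randn(1, 128)
C_sd0 = build_C_sd0(X_r, X_a_avg, theta=3, m=2)                 # (10, 256)
```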
Step B44: based on the knowledge triple set of each seed node in the $Set_{skt}$ obtained in step B26, take in turn the emotion polarity word $w_{sp}$ and the semantically similar word $w_{ss}$ of each knowledge triple as knowledge expansion nodes, and add an edge connecting each of them to the relevant seed node in the syntactic dependency tree SDT obtained in step B15; similarly, based on the tag t-tuple of each seed node in the $Set_{sit}$ obtained in step B35, take each image tag in turn as an image tag expansion node and add an edge connecting it to the relevant seed node in the SDT; to avoid redundancy, if a new knowledge expansion node or image tag expansion node already exists in the graph, only an edge is added between it and the relevant seed node; finally, the text-knowledge-image heterogeneous graph TKIHG is obtained, whose edge set contains four types of edges: $(w_i^r, w_j^r)$, indicating that a syntactic dependency exists between comment words $w_i^r$ and $w_j^r$; $(w_i^r, w_{sp})$, indicating that the emotion polarity word of comment word $w_i^r$ is $w_{sp}$; $(w_i^r, w_{ss})$, indicating that the semantically similar word of comment word $w_i^r$ is $w_{ss}$; and $(w_i^r, tag_j)$, indicating that image tag $tag_j$ is related to comment word $w_i^r$;
Step B45: encode the text-knowledge-image heterogeneous graph TKIHG obtained in step B44 into the u-order adjacency matrix $A_{hg} = (a_{ij})_{u \times u}$, where $a_{ij} = 1$ indicates that a connection relationship exists between two words, $a_{ij} = 0$ indicates that no connection relationship exists between them, and u = n + m′ × (t + k); the first n nodes of the graph are the words of the comment text, nodes n+1 through n+m′×t are the words of the related image tag nodes, and nodes n+m′×t+1 through u are the words of the related knowledge expansion nodes.
Step B46: splice the image tag node words of the text-knowledge-image heterogeneous graph TKIHG in sequence onto the comment text r to form a new sequence, input it into the pre-trained model BERT, and reduce the dimension with a fully connected layer to obtain the contextual semantic representation vector of the text modality, $X_{rt} \in \mathbb{R}^{(n + m' \times t) \times d}$;
Step B47: from the contextual semantic representation vector $X_{rt}$ of the text modality and the representation vector $X_{kg}$ of the knowledge expansion nodes obtained in step B27, obtain the semantically guided external knowledge representation vector $X_{kg'}$ using a cross-modal attention mechanism, computed by a formula [shown only as an image in the source] in which $W_1$, $W_2$ and $W_3$ are learnable weight matrices;
Step B48: map the contextual semantic representation vector $X_{rt}$ of the text modality into the same feature space as $X_{kg'}$ using a self-attention mechanism, obtaining the transformed contextual semantic representation vector $X_{rt'}$, computed by a formula [shown only as an image in the source] in which $X_{rt'}$ is the mapped contextual semantic representation vector of the text modality and $W_4$, $W_5$ and $W_6$ are learnable weight matrices;
Step B49: concatenate the transformed contextual semantic representation vector $X_{rt'}$ with the semantically guided external knowledge representation vector $X_{kg'}$ to obtain the node representation vector of the text-knowledge-image heterogeneous graph TKIHG, $C_{hg,0} = [X_{rt'} ; X_{kg'}]$, where $C_{hg,0} \in \mathbb{R}^{u \times d}$.
Further, the step B5 specifically includes the following steps:
Step B51: for the comment text graph convolutional network RGCN, input the representation vector $C_{sd,0}$ obtained in step B43 into the first graph convolution layer, update the representation vector of each word node using the adjacency matrix $A_r$, and output $C_{sd,1}$, which serves as the input of the next graph convolution layer;
here $C_{sd,1} = \{c_1^{sd,1}, \dots, c_n^{sd,1}\}$, where $c_i^{sd,1}$, the output of the i-th node in the first graph convolution layer, is computed as:

$$\tilde{c}_i^{\,sd,1} = \sum_{j=1}^{n} A^{r}_{ij} W^{sd,1} c_j^{sd,0}, \qquad c_i^{sd,1} = \mathrm{ReLU}\left(\frac{\tilde{c}_i^{\,sd,1}}{d_i + 1} + b^{sd,1}\right)$$

where $W^{sd,1}$ is a learnable weight matrix and $b^{sd,1}$ is a bias vector; ReLU is the activation function; the i-th node of the graph convolutional network corresponds to the i-th word $w_i^r$ of comment r; $d_i$ denotes the degree of the i-th node, and $d_i + 1$ prevents calculation errors when the degree of the i-th node is 0;
Step B52: for the text-knowledge-image heterogeneous graph convolutional network HGCN, input the node representation vector $C_{hg,0}$ of the heterogeneous graph TKIHG obtained in step B49 into the first graph convolution layer, update the representation vector of each node using the adjacency matrix $A_{hg}$, and output $C_{hg,1}$, which serves as the input of the next graph convolution layer;
here $C_{hg,1} = \{c_1^{hg,1}, \dots, c_u^{hg,1}\}$, where $c_i^{hg,1}$, the output of the i-th node in the first graph convolution layer, is computed as:

$$\tilde{c}_i^{\,hg,1} = \sum_{j=1}^{u} A^{hg}_{ij} W^{hg,1} c_j^{hg,0}, \qquad c_i^{hg,1} = \mathrm{ReLU}\left(\frac{\tilde{c}_i^{\,hg,1}}{d_i + 1} + b^{hg,1}\right)$$

where $W^{hg,1}$ is a learnable weight matrix and $b^{hg,1}$ is a bias vector; ReLU is the activation function; an edge between two nodes of the graph convolutional network indicates that a connection relationship exists between them; $d_i$ denotes the degree of the i-th node, and $d_i + 1$ prevents calculation errors when the degree of the i-th node is 0;
Step B53: input $C_{sd,1}$ and $C_{hg,1}$ into the next graph convolution layers of RGCN and HGCN respectively, and repeat steps B51 and B52;
for the comment text graph convolutional network RGCN, the output $C_{sd,l}$ of the l-th graph convolution layer serves as the input of the (l+1)-th layer, and the text graph convolution representation vector $C_{sd,L}$ is obtained when the iteration ends; for the text-knowledge-image heterogeneous graph convolutional network HGCN, the output $C_{hg,l}$ of the l-th graph convolution layer serves as the input of the (l+1)-th layer, and the heterogeneous graph convolution representation vector $C_{hg,L}$ is obtained when the iteration ends; L is the number of graph convolution layers, and 1 ≤ l ≤ L.
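A minimal sketch of one degree-normalized graph convolution layer as described in steps B51 and B52, with a learnable weight matrix and bias, ReLU activation, and division by d_i + 1; the stacking into L layers mirrors step B53. Dimensions and the toy inputs are assumptions.

```python
# Graph convolution layer sketch for steps B51-B53 (assumed dimensions).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)   # learnable weight matrix
        self.b = nn.Parameter(torch.zeros(dim))    # bias vector

    def forward(self, C, A):
        # C: (num_nodes, dim) node representations; A: (num_nodes, num_nodes) adjacency
        agg = A @ self.W(C)                        # sum over connected neighbors
        deg = A.sum(dim=1, keepdim=True)           # node degree d_i
        return torch.relu(agg / (deg + 1) + self.b)  # d_i + 1 guards against degree 0

# Stacking L layers for the RGCN branch (the HGCN branch is identical, with A_hg):
L, d = 2, 128
rgcn = nn.ModuleList(GCNLayer(d) for _ in range(L))
C = torch.randn(10, d)        # C_{sd,0} for a 10-word comment
A = torch.eye(10)             # stand-in adjacency A_r
for layer in rgcn:
    C = layer(C, A)           # C_{sd,L} after the loop
```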
Further, the step B6 specifically includes the following steps:
Step B61: apply an aspect masking operation to the text graph convolution representation vector $C_{sd,L}$ obtained in step B53, masking the text graph convolution outputs that do not belong to aspect words, to obtain the text graph convolution aspect representation vector $C_{mask,L}$ of comment r:

$$C_{mask,L} = \{0, \dots, c_{\theta}^{sd,L}, \dots, c_{\theta+m-1}^{sd,L}, \dots, 0\}$$

where θ denotes the start position and θ+m−1 the end position of the aspect in the comment sentence, $c_i^{sd,L}$ denotes the graph convolution aspect representation vector corresponding to the i-th word of the comment, and 0 denotes a zero vector of dimension d;
Step B62: input the semantic representation vector $X_r$ of comment r obtained in step B12 and the graph convolution aspect representation vector $C_{mask,L}$ of comment r obtained in step B61 into an interactive attention network; through the interactive attention mechanism, the context representation is enhanced with aspect information that aggregates syntactic information, giving the aspect-enhanced representation vector $X_{ea}$ of comment r; the three formulas of this computation [shown only as images in the source] use the transpose operation $(\cdot)^T$ and the attention weight $\beta_i$ of the i-th word in comment r.
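A sketch of the aspect masking and retrieval-style interactive attention of steps B61 and B62. The attention equations are shown only as images in the source; the dot-product form below is a common instantiation and an assumption, not the patent's verbatim formula.

```python
# Aspect masking and interactive attention sketch for steps B61-B62 (assumed attention form).
import torch

def aspect_mask(C_sd_L, theta, m):
    """Zero out every node representation outside the aspect span [theta, theta+m-1]."""
    mask = torch.zeros_like(C_sd_L)
    mask[theta:theta + m] = 1.0
    return C_sd_L * mask                               # C_{mask,L}

def aspect_enhanced(X_r, C_mask_L):
    """Score each context word against the masked aspect nodes, softmax-normalize
    (beta_i), and pool the context representations into X_ea."""
    scores = (X_r @ C_mask_L.T).sum(dim=1)             # one scalar per context word
    beta = torch.softmax(scores, dim=0)                # attention weights beta_i
    return beta @ X_r                                  # X_{ea}

X_r = torch.randn(10, 128)                             # comment of 10 words, d = 128
C = torch.randn(10, 128)                               # C_{sd,L}
X_ea = aspect_enhanced(X_r, aspect_mask(C, theta=3, m=2))
```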
Further, the step B7 specifically includes the following steps:
Step B71: from the heterogeneous graph convolution representation vector $C_{hg,L}$ obtained in step B53 and the aspect-enhanced representation vector $X_{ea}$ of comment r, use a cross-modal attention mechanism to further exploit heterogeneous information such as image tags and external knowledge to strengthen the learned text modality, obtaining the heterogeneity-enhanced text representation vector $X_{hm}$, computed by a formula [shown only as an image in the source] in which $W_9$, $W_{10}$ and $W_{11}$ are learnable weight matrices;
Step B72: from the heterogeneous graph convolution representation vector $C_{hg,L}$ and the aspect-dependent image region representation vector $X_{air}$, use a cross-modal attention mechanism to further exploit heterogeneous information such as image tags and external knowledge to strengthen the learned emotion features of the image modality, obtaining the heterogeneity-enhanced image representation vector $X_{hair}$, computed by a formula [shown only as an image in the source] in which $W_{12}$, $W_{13}$ and $W_{14}$ are learnable weight matrices;
Step B73: concatenate the heterogeneity-enhanced text representation vector $X_{hm}$ and image representation vector $X_{hair}$ to obtain the final representation vector:

$$X_{fin} = [X_{hm} ; X_{hair}]$$
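A sketch of the cross-modal attention of steps B71 and B72 together with the final concatenation of step B73 and a prediction layer as in step B8. The scaled dot-product form with three learnable projections is an assumed instantiation of the attention equations, which the source shows only as images.

```python
# Cross-modal attention and fusion sketch for steps B71-B73 and B8 (assumed attention form).
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.Wq, self.Wk, self.Wv = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, query, heterograph_nodes):
        # query: (Lq, d) text or image features; heterograph_nodes: (u, d) C_{hg,L}
        q, k, v = self.Wq(query), self.Wk(heterograph_nodes), self.Wv(heterograph_nodes)
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v                          # heterogeneity-enhanced features

d, u = 128, 30
attn_text, attn_img = CrossModalAttention(d), CrossModalAttention(d)
classifier = nn.Linear(2 * d, 3)                 # positive / negative / neutral

C_hg_L = torch.randn(u, d)
X_ea, X_air = torch.randn(1, d), torch.randn(1, d)
X_hm = attn_text(X_ea, C_hg_L)                   # (1, d)
X_hair = attn_img(X_air, C_hg_L)                 # (1, d)
X_fin = torch.cat([X_hm, X_hair], dim=-1)        # final representation vector
logits = classifier(X_fin)                       # emotion polarity scores
```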
The invention also provides a multimodal comment emotion analysis system adopting the above method, comprising:
a data collection module, configured to extract user comments and related images, the aspect words in the comments and their position information, label the emotion polarity of each aspect, and construct the training set;
a preprocessing module, configured to preprocess the training samples in the training set, including word segmentation, stop-word removal, image size adjustment, syntactic dependency parsing, selection of the related knowledge triple set and image tag set, and generation of the text-knowledge-image heterogeneous graph;
an encoding module, configured to look up the word vectors of the knowledge words of the knowledge triple set in the pre-trained knowledge graph word vector matrix to obtain the knowledge word representation vectors of the knowledge triple set;
a network training module, configured to input the processed user comments, related images, aspects, text-knowledge-image heterogeneous graphs, and knowledge word representation vectors of the knowledge triple set into the deep learning network to obtain the final representation vectors of the multimodal comments, use the probability that a representation vector belongs to a certain category together with the labels in the training set as the loss, and train the whole deep learning network with loss minimization as the objective, obtaining the deep learning network model based on the knowledge graph and the heterogeneous graph convolutional network; and
an emotion analysis module, configured to extract the aspects in an input user comment using an NLP tool, then analyze the input comment, image, and aspect with the trained deep learning network model based on the knowledge graph and the heterogeneous graph convolutional network, and output the emotion polarity of the user comment and related image with respect to the specific aspect.
Compared with the prior art, the invention has the following beneficial effects: the method and the system first encode the comment sentence, the product aspect, and the image with pre-trained models, and then obtain the knowledge nodes related to the comment sentence using a knowledge graph and a dynamic knowledge selection mechanism; next, aspect-related image tags are obtained with an image tag selection mechanism, and a text-knowledge-image heterogeneous graph is constructed from the knowledge information and the image tag information; the comment sentence representation is then position-weighted with position information, two GCNs learn the syntactic dependencies and the heterogeneous information in the multimodal comment, and finally a cross-modal attention mechanism further exploits heterogeneous information such as image tags and external knowledge to strengthen the learned emotion features of the text modality and the image modality, thereby improving the accuracy of the model's emotion classification.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a deep learning network model in an embodiment of the invention;
FIG. 3 is a schematic diagram of a system structure according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in FIG. 1, this embodiment provides a multimodal comment emotion analysis method based on heterogeneous graph convolution, which includes the following steps:
Step A: collect user comments and related images, extract the aspect words of the products or services involved in the user comments, and label the emotion polarity of each user comment with respect to a specific aspect of the product or service, so as to construct the training set DB.
Step B: use the training set DB to train a deep learning network model DLM based on a knowledge graph and a heterogeneous graph convolutional network, for analyzing the emotion polarity of user comments and related images toward specific aspects of products or services. The architecture of the deep learning network model is shown in FIG. 2.
In this embodiment, the step B specifically includes the following steps:
step B1: coding each training sample in the training set DB to obtain a comment semantic representation vector X r Semantic representation vector X of aspects a Syntax dependency adjacency matrix A r Image region characterization vector X im
In this embodiment, the step B1 specifically includes the following steps:
step B11: traversing the training set DB, performing word segmentation processing on user comments and aspects in the training set DB, removing stop words, and adjusting an image related to the comments into an image with 3 multiplied by 224 pixels, wherein each training sample in the DB is expressed as d= (r, im, a, p).
Wherein r is a user comment, im is an adjusted image related to the user comment, a is an aspect word or phrase of a product or service related to the user comment extracted from the user comment r, and p epsilon (positive, negative, neutral) is the emotion polarity of the comment on the aspect.
Comment r is expressed as:
Figure SMS_117
wherein ,
Figure SMS_118
for the i-th word in comment r, i=1, 2, …, n, n is the number of words in comment r.
Aspect a is represented as:
Figure SMS_119
wherein ,
Figure SMS_120
for the i-th word in aspect a, i=1, 2, …, m, m is the number of words of aspect a.
Step B12: comment on step B11
Figure SMS_121
Coding by using a pre-training model BERT and reducing the dimension by using a full-connection layer to obtain a semantic representation vector X of the comment r r Expressed as:
Figure SMS_122
wherein ,
Figure SMS_123
for comment the i-th word->
Figure SMS_124
The corresponding semantic token vector, d, represents the output dimension.
Step B13: aspect obtained for step B11
Figure SMS_125
Encoding by using a pre-training model BERT and reducing dimension by using a full-connection layer to obtain a semantic representation vector X of the aspect a a Expressed as:
Figure SMS_126
wherein ,
Figure SMS_127
representing aspect the i-th word->
Figure SMS_128
The corresponding word vector, d, represents the output dimension.
Step B14: pre-training for adjusted imagesCoding and smoothing a model ResNet-152, and obtaining an image region characterization vector after full-connection layer dimension reduction
Figure SMS_129
Where z represents the feature map size of the output and d represents the output dimension.
Step B15: and carrying out syntactic dependency analysis on the comment r to obtain a syntactic dependency tree SDT.
Figure SMS_130
wherein ,
Figure SMS_131
representing words ∈in comments>
Figure SMS_132
He word->
Figure SMS_133
There is a syntactic dependency between them.
Step B16: encoding the syntax-dependent tree SDT obtained in the step B15 into an n-order adjacency matrix A r ,A r Expressed as:
Figure SMS_134
wherein ,
Figure SMS_135
a1 indicates the word ++in the comment>
Figure SMS_136
He word->
Figure SMS_137
There is a syntactic dependency between->
Figure SMS_138
A0 indicates the word->
Figure SMS_139
He word->
Figure SMS_140
There is no syntactic dependency between them.
Step B2: selecting a Set of knowledge triples from a knowledge graph that are relevant to a comment context according to a dynamic knowledge selection mechanism skt Then coding the knowledge words to obtain a Set skt Knowledge word characterization vector X kg
In this embodiment, the step B2 specifically includes the following steps:
step B21: selecting a context word with edge connection with the aspect word from the syntax dependency tree SDT obtained in the step B15, and combining the context word with the aspect a to form a seed node Set with the length of m' sn Expressed as:
Figure SMS_141
wherein ,
Figure SMS_142
is an aspect word or a context word with edge connection, and m is less than or equal to m' is less than or equal to n.
Step B22: for each node in the seed node set, selecting its emotion polarity word and 5 words similar to its semantic meaning from the knowledge graph to respectively form 5 candidate knowledge triples CKT of the seed node, which are expressed as:
CKT=<w sn ,w sp ,w ss )
wherein ,wsn Is a seed node word, w sp Is emotion polar word, w ss Is a semantically similar word.
Step B23: constructing each candidate knowledge triplet CKT into one candidate knowledge sentence r cks Inputting a pre-training model BERT and taking an average value to obtain an average semantic representation vector X cks ;r cks Expressed as:
r cks = "word' w sn ' emotion electrodeSex is' w sp 'the word most similar to its semantics is' w ss ”’
Step B24: semantic representation vector X of comments r Average semantic representation vector X is obtained after averaging rm Then calculate X rm and Xcks Cosine similarity between the two to obtain comment text r and candidate knowledge sentences r cks The similarity score of (2) is calculated as follows:
Similarity_Score(r,r cks )=CosineSimilarity(X rm ,X cks )
wherein ,Xrm
Figure SMS_143
Step B25: and B24, calculating similarity scores of candidate knowledge sentences constructed by all candidate knowledge triples CKT, and selecting the original candidate knowledge triples of the top k candidate knowledge sentences with the highest scores to form a knowledge triplet set of the seed node as the external knowledge of the seed node which is most relevant to the context.
Step B26: for seed node Set sn Repeating the steps to obtain a Set containing knowledge triples of all seed nodes skt Expressed as:
Figure SMS_144
wherein ,
Figure SMS_145
is the i seed node->
Figure SMS_146
Is a knowledge triplet set.
Step B27: set of knowledge triples by knowledge graph embedding skt All emotion polarity words and semantic similar words of the tree are encoded to obtain node characterization vectors of the tree as follows
Figure SMS_147
wherein ,/>
Figure SMS_148
The knowledge representation vector corresponding to the ith emotion polarity word or semantic similarity word is obtained by training a knowledge word vector matrix in advance
Figure SMS_149
Where d represents the dimension of the knowledge word vector and V is the number of words in which the knowledge word is embedded.
Step B3: representing vector X by semantics of aspects a And image region characterization vector X im Obtaining aspect-dependent image region characterization vector X using interactive attention mechanisms air The method comprises the steps of carrying out a first treatment on the surface of the Acquiring a Set of tag t tuples related to comment context through an image tag selection mechanism sit
In this embodiment, the step B3 specifically includes the following steps:
step B31: semantic representation vector X for a pair of aspects a And image region characterization vector X im Obtaining aspect-dependent image region characterization vector X using interactive attention mechanisms air ,X air The calculation process of (2) is as follows:
Figure SMS_150
Figure SMS_151
Figure SMS_152
wherein ,
Figure SMS_153
(·) T representing the transpose operation.
Step B32: inputting the image im into a pre-training model ResNet-152 to obtain 10 relevant graphs Like tag= { tag 1 ,…,tag i ,…,tag 10 And then adding the same to the seed node Set obtained in the step 21 sn Each seed node word in the tree is respectively input into a pre-training model BERT to obtain semantic characterization vectors of the seed node words
Figure SMS_154
and />
Figure SMS_155
Figure SMS_156
wherein ,/>
Figure SMS_157
Is seed node->
Figure SMS_158
Semantic token vector of>
Figure SMS_159
Is the ith image tag i Is described.
Step B33: calculating an image tag i Semantic characterization vectors and seed nodes of (a)
Figure SMS_160
Cosine similarity of semantic representation vectors of (2) to obtain seed node +.>
Figure SMS_161
And image tag i The similarity score between the two is calculated as follows:
Figure SMS_162
step B34: calculating seed nodes according to step B33
Figure SMS_163
And similarity scores of 10 labels, and selecting top t image labels with highest scores and seed nodes +.>
Figure SMS_164
Forming a tag t tuple TT together; expressed as:
Figure SMS_165
step B35: for seed node Set sn Repeating the steps to obtain a Set containing the label t-tuple of all the seed nodes sit Expressed as:
Set sit ={TT 1 ,…,TT i ,…,TT m′ }
wherein ,TTi Is the ith seed node
Figure SMS_166
Is included in the tag t tuple of (c).
Step B4: for X a Averaging pooling to obtain aspect average characterization vector
Figure SMS_167
For X r Performing position coding to obtain a comment characterization vector X with enhanced positions pw By connection X pw and />
Figure SMS_168
Obtaining a characterization vector C sd,0 The method comprises the steps of carrying out a first treatment on the surface of the Generating a text-knowledge-image heterogeneous graph TKIHG according to a text-knowledge-image heterogeneous composition strategy to obtain an adjacency matrix A of the text-knowledge-image heterogeneous graph TKIHG hg Then the nodes are encoded by using a cross-modal attention mechanism to obtain a node characterization vector C of the heterogeneous graph TKIHG hg,0
In this embodiment, the step B4 specifically includes the following steps:
step B41: semantic representation vector X for a pair of aspects a Carrying out average pooling operation to obtain
Figure SMS_169
The calculation formula is as follows:
Figure SMS_170
wherein ,
Figure SMS_171
step B42: semantic representation vector X for comments r Performing position coding to obtain a comment position-enhanced characterization vector X pw ,X pw Expressed as:
Figure SMS_172
Figure SMS_173
wherein ,
Figure SMS_174
reinforcing a characterization vector for the position corresponding to the ith word in the comment r, wherein "·" represents that real numbers and the vector are multiplied, and pw i The calculation mode of the position weight corresponding to the i-th word in the comment r is as follows:
Figure SMS_175
/>
where θ and θ+m-1 represent the position of the start and end of aspect a in the comment r, respectively.
Step B43: averaging the aspect characterization vectors obtained in step B41
Figure SMS_176
X obtained in step B42 pw Connecting to obtain a characterization vector C sd,0 ,/>
Figure SMS_177
Expressed as:
Figure SMS_178
wherein ,
Figure SMS_179
i=1, 2, …, n, ", which is a token vector corresponding to the i-th word in comment r that is input into the graph convolution network; "means vector join operation.
Step B44: based on the Set obtained in step 26 skt Knowledge triplet sets of each seed node in the tree, and emotion polarity w in each knowledge triplet in turn sp And semantically similar words w ss Respectively serving as knowledge expansion nodes and adding an edge to be connected with the relevant seed nodes in the syntax dependency tree SDT obtained in the step B15; similarly, a Set is obtained based on step 35 sit The label t tuple of each seed node in the tree is used for sequentially connecting each image label serving as an image label expansion node and adding an edge with the seed node related to the syntax dependency tree SDT; to avoid redundancy, if a new knowledge expansion node or image tag expansion node already exists in the graph, only one edge is added between it and the relevant seed node; finally, a text-knowledge-image-different composition TKIHG is obtained.
Figure SMS_180
wherein ,
Figure SMS_183
representing words ∈in comments>
Figure SMS_186
He word->
Figure SMS_188
There is a syntactic dependency between->
Figure SMS_182
Representing words ∈in comments>
Figure SMS_185
The emotion polar word is w sp ,/>
Figure SMS_187
Representing words ∈in comments>
Figure SMS_189
The semantically similar word is w ss
Figure SMS_181
Word ∈in representation and comment>
Figure SMS_184
The relevant image label is tag j
Step B45: encoding the text-knowledge-image iso-composition TKIHG obtained in the step B44 into a u-order adjacency matrix A hg ,A hg Expressed as:
Figure SMS_190
wherein ,
Figure SMS_191
1 indicates that a connection relationship exists between two words, < ->
Figure SMS_192
If 0, no connection relationship exists between the two words, and u=n+m' × (t+k); the first n nodes in the graph are words in comment texts, the (n+1) th to (n+m '. Times.t) th nodes are words of related image label nodes, and the (n+m'. Times.t+1) th to (u) th nodes are words of related knowledge expansion nodes.
Step B46: image tag node words in a text-knowledge-image heterograph TKIHG are spliced to comment text r in sequence to form a new sequence, and then a pre-training model BERT is input to perform dimension reduction by using a full-connection layer to obtain an upper and lower Wen Yuyi characterization vector X of a text mode rt Expressed as:
Figure SMS_193
wherein
Figure SMS_194
/>
Step B47: representation vector X for upper and lower Wen Yuyi of text modality rt And step 27, obtaining a characterization vector X of the knowledge expansion node kg Obtaining semantic guided external knowledge representation vector X by using cross-modal attention mechanism kg′ The calculation process is as follows:
Figure SMS_195
wherein ,
Figure SMS_196
W 1 ,W 2 and />
Figure SMS_197
Is a learnable weight matrix.
Step B48: representation vector X for upper and lower Wen Yuyi of text modality rt Mapping it to an X using a self-attention mechanism kg′ Under the same feature space, a transformed upper and lower Wen Yuyi characterization vector X is obtained rt′ The calculation process is as follows:
Figure SMS_198
wherein ,
Figure SMS_199
is the upper and lower Wen Yuyi characterization vector of the mapped text mode, W 4 ,W 5 and />
Figure SMS_200
Is a learnable weight matrix.
Step B49: characterizing the transformed upper and lower Wen Yuyi to vector X rt′ And semantic indexingGuided extrinsic knowledge characterization vector X kg′ Connecting to obtain a node characterization vector C of the text-knowledge-image heterograph TKIHG hg,0 It is represented as follows:
Figure SMS_201
wherein
Figure SMS_202
Step B5: will characterize vector C sd,0 And node characterization vector C of heterogeneous graph TKIHG hg,0 Respectively inputting the text graph convolution vector C into two different L-layer graph convolution networks, respectively recording the text graph convolution network RGCN and the text-knowledge-image heterograph convolution network HGCN as comments, respectively learning and extracting syntactic dependency relationship and heterogeneous information of context semantics, image labels and external knowledge to obtain a text graph convolution characterization vector C sd,L And a isomerous map volume characterization vector C hg,L
In this embodiment, the step B5 specifically includes the following steps:
step B51: for the comment text graph rolling network RGCN, the characterization vector C obtained in the step B43 is obtained sd,0 Input first layer graph rolling network utilizing adjacency matrix A r Updating the characterization vector of each word node, and outputting C sd,1 And serves as input to the next layer of graph rolling network.
wherein ,Csd,1 Expressed as:
Figure SMS_203
wherein ,
Figure SMS_204
is the output of the ith node in the first layer graph roll-up network,/for>
Figure SMS_205
The calculation formula of (2) is as follows:
Figure SMS_206
Figure SMS_207
wherein ,
Figure SMS_208
is a weight matrix that can be learned, +.>
Figure SMS_209
Is a bias vector; reLU is an activation function; the ith node in the graph roll-up network and the ith word in comment r +.>
Figure SMS_210
Correspondingly, d i Representing the degree of the ith node, d i +1 is to prevent the calculation error from being caused when the degree of the i-th node is 0.
Step B52: for a text-knowledge-image heterograph convolution network HGCN, the node characterization vector C of the heterograph TKIHG obtained in the step B49 is calculated hg,0 Input first layer graph rolling network utilizing adjacency matrix A hg Updating the characterization vector of each node, and outputting C hg,1 And serves as input to the next layer of graph rolling network.
wherein ,Chg,1 Expressed as:
Figure SMS_211
wherein ,
Figure SMS_212
is the output of the ith node in the first layer graph roll-up network,/for>
Figure SMS_213
The calculation formula of (2) is as follows:
Figure SMS_214
Figure SMS_215
wherein
Figure SMS_216
Is a weight matrix that can be learned, +.>
Figure SMS_217
Is a bias vector; reLU is an activation function; edges between nodes in a graph rolling network represent that connection relations exist between the nodes, d i Representing the degree of the ith node, d i +1 is to prevent the calculation error from being caused when the degree of the i-th node is 0.
Step B53: respectively C sd,1 and Chg,1 The next layer of graph rolling network input to RGCN and HGCN, repeat steps B51 and B52.
Wherein, for comment text graph rolling network RGCN, the output of the first layer graph rolling network
Figure SMS_218
As input of the layer 1 (l+1) graph convolution network, obtaining a text graph convolution token vector ++after iteration is finished>
Figure SMS_219
For the text-knowledge-image heterograph convolution network HGCN, the output of the layer I graph convolution network is +.>
Figure SMS_220
As input of the layer 1 graph convolution network, obtaining the isomerous graph convolution characterization vector +.>
Figure SMS_221
L is the layer number of the graph rolling network, and L is more than or equal to 1 and less than or equal to L.
Step B6: convolving the text map with a token vector C sd,L Performing aspect shielding operation to obtain text graph convolution shielding characterization of commentsVector C mask,L Semantic representation vector X of remarks r Using the interactive attention mechanism, further enhancing the context representation with aspect information of the aggregate syntax information, obtaining an aspect enhanced token vector X of comment r ea
In this embodiment, the step B6 specifically includes the following steps:
step B61: the text chart obtained in the step B53 is rolled to represent the vector C sd,L Performing aspect masking operation, masking text graph convolution output which does not belong to aspect words, and obtaining a text graph convolution aspect characterization vector C of comments r mask,L The calculation process is as follows:
Figure SMS_222
where θ represents the start position of the aspect in the comment sentence, θ+m-1 represents the end position of the aspect in the comment sentence,
Figure SMS_223
representing a graph convolution aspect characterization vector corresponding to an ith word in the comment, and 0 represents a zero vector with a dimension d.
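A minimal sketch of the aspect masking operation of step B61; the 0-based index convention and the function name are assumptions of this sketch.

```python
import torch

def aspect_mask(c_sd_L: torch.Tensor, theta: int, m: int) -> torch.Tensor:
    """Zero out every row of the graph-convolution output that does not belong
    to the aspect words (positions theta .. theta+m-1, 0-based here)."""
    mask = torch.zeros_like(c_sd_L)
    mask[theta:theta + m] = 1.0
    return c_sd_L * mask  # C_mask,L: only the aspect rows survive

c = torch.randn(10, 64)                  # 10 words, d = 64
c_masked = aspect_mask(c, theta=3, m=2)  # aspect occupies words 3 and 4
print(c_masked[0].abs().sum().item(), c_masked[3].abs().sum().item())  # 0 vs non-zero
```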
Step B62: the semantic characterization vector X of the comment r obtained in the step B12 is obtained r And the graph convolution aspect characterization vector C of comment r obtained in step B61 mask,L Inputting an interactive attention network, enhancing the context representation by using aspect information of aggregation syntax information through an interactive attention mechanism, and obtaining an aspect enhancement characterization vector X of comments r ea The calculation formula is as follows:
Figure SMS_224
Figure SMS_225
Figure SMS_226
wherein ,(·)T Representing the transpose operation, beta i Is the attention weight of the i-th word in comment r.
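A minimal sketch of the retrieval-style interactive attention reconstructed for step B62; the function name is an assumption.

```python
import torch

def aspect_enhanced(x_r: torch.Tensor, c_mask: torch.Tensor) -> torch.Tensor:
    """Each word of the comment is scored by its dot products with the masked
    aspect rows; the softmax weights beta then re-weight the semantic vectors."""
    gamma = (x_r @ c_mask.T).sum(dim=1)  # gamma_i = sum_j (x_i)^T h_j^mask
    beta = torch.softmax(gamma, dim=0)   # attention weight of each word
    return beta @ x_r                    # X_ea: aspect-enhanced representation

x_r = torch.randn(10, 64)
c_mask = torch.randn(10, 64)
x_ea = aspect_enhanced(x_r, c_mask)
print(x_ea.shape)                        # torch.Size([64])
```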
Step B7: wrapping the isomerism map by a token vector C hg,L Enhancing the token vector X with respect to comment r, respectively ea Aspect-dependent image region characterization vector X air Using a cross-mode attention mechanism, further utilizing heterogeneous information such as image labels and external knowledge to strengthen learning text modes and emotion characteristics of the image modes, and obtaining a heterogeneous enhanced text characterization vector X hm And image characterization vector X hair Finally connect X hm and Xhair Obtaining a final characterization vector X fin
In this embodiment, the step B7 specifically includes the following steps:
Step B71: the volume characterization vector C of the isomerism map obtained in the step B53 hg,L And comment r aspect enhancement characterization vector X ea By using a cross-modal attention mechanism, the heterogeneous information such as image labels, external knowledge and the like is further utilized to enhance the learning text mode, and the heterogeneous enhanced text characterization vector X is obtained hm The calculation process is as follows:
Figure SMS_227
wherein ,
Figure SMS_228
W 9 ,W 10 and />
Figure SMS_229
A learnable weight matrix.
Step B72: characterizing vector C for a volume of heterogeneous images hg,L Aspect-dependent image region characterization vector X air Using a cross-mode attention mechanism, further utilizing heterogeneous information such as image labels, external knowledge and the like to strengthen emotion characteristics of a learning image mode, and obtaining a heterogeneous enhanced image characterization vector X hair The calculation process is as follows:
Figure SMS_230
wherein ,
Figure SMS_231
W 12 ,W 13 and />
Figure SMS_232
A learnable weight matrix.
Step B73: text token vector X for heterogeneous enhancement hm And image characterization vector X hair Performing connection operation to obtain final representation final characterization vector X fin The calculation process is as follows:
X fin =[X hm ;X hair ]。
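A minimal sketch covering the symmetric cross-modal attention of steps B71 and B72 and the concatenation of step B73, under the single-head form assumed above; Wq/Wk/Wv stand in for (W 9 , W 10 , W 11 ) on the text branch and (W 12 , W 13 , W 14 ) on the image branch.

```python
import math
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """A query modality attends over the heterogeneous graph nodes C_hg,L.
    Single-head scaled dot-product form is an assumption of this sketch."""
    def __init__(self, d: int):
        super().__init__()
        self.Wq = nn.Linear(d, d, bias=False)
        self.Wk = nn.Linear(d, d, bias=False)
        self.Wv = nn.Linear(d, d, bias=False)
        self.d = d

    def forward(self, query: torch.Tensor, c_hg: torch.Tensor) -> torch.Tensor:
        q = self.Wq(query)                    # X_ea or X_air as the query
        k, v = self.Wk(c_hg), self.Wv(c_hg)   # heterogeneous graph node vectors
        attn = torch.softmax(q @ k.T / math.sqrt(self.d), dim=-1)
        return attn @ v                       # X_hm or X_hair

d = 64
cma = CrossModalAttention(d)
x_hm = cma(torch.randn(1, d), torch.randn(20, d))    # text branch (B71)
x_hair = cma(torch.randn(1, d), torch.randn(20, d))  # image branch (B72)
x_fin = torch.cat([x_hm, x_hair], dim=-1)            # step B73 concatenation
print(x_fin.shape)                                   # torch.Size([1, 128])
```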
step C: inputting the user comments and related images and related aspect words of the product or service into a trained deep learning network model to obtain the emotion polarity of the user comments and related images aiming at the specific aspect in the product or service.
Step B8: will ultimately characterize vector X fin And inputting a final prediction layer, calculating the gradient of each parameter in the deep learning network model by using a back propagation method according to the target loss function loss, and updating each parameter by using a random gradient descent method.
Step B9: and when the iteration change of the loss value generated by the deep learning network model is smaller than a given threshold value or the maximum iteration number is reached, terminating the training process of the deep learning network model.
As shown in fig. 3, this embodiment further provides a multi-mode comment emotion analysis system adopting the above method, including: the system comprises a data collection module, a preprocessing module, a coding module, a network training module and an emotion analysis module.
The data collection module is used for extracting user comments and related images, aspect words in the comments and position information of the aspect words, marking emotion polarities of the aspects and constructing a training set.
The preprocessing module is used for preprocessing training samples in a training set, and comprises word segmentation processing, stop word removal, image size adjustment, syntactic dependency analysis, selection of a related knowledge triplet set and an image tag set and generation of a text-knowledge-image heterogram.
The encoding module is used for searching word vectors of knowledge words of the knowledge triplet set in the pre-trained knowledge map word vector matrix to obtain knowledge word characterization vectors of the knowledge triplet set.
The network training module is used for inputting the processed user comments, related images, aspects, text-knowledge-image heterogeneous graphs and the knowledge word representation vectors of the knowledge triple set into the deep learning network to obtain the final representation vectors of the multi-mode comments, computing the loss from the probability that a representation vector belongs to each category and the labels in the training set, and training the whole deep learning network with loss minimization as the objective to obtain the deep learning network model based on the knowledge graph and the heterogeneous graph convolution network.
The emotion analysis module extracts aspects in the input user comments by using an NLP tool, analyzes and processes the input comments, images and aspects by using a trained deep learning network model based on a knowledge graph and a heterogeneous graph convolution network, and outputs emotion evaluation polarities related to specific aspects in the user comments and related images.
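For illustration, a structural sketch of how the five modules could hand data to one another; every class and method name here is hypothetical, chosen only to mirror the module descriptions above.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    comment: str
    image_path: str
    aspect: str
    polarity: str | None = None  # labelled only in the training set

class SentimentAnalysisSystem:
    """Hypothetical wiring of the five modules described above."""
    def __init__(self, collector, preprocessor, encoder, trainer, analyzer):
        self.collector = collector        # data collection module
        self.preprocessor = preprocessor  # segmentation, resizing, parsing, graph building
        self.encoder = encoder            # knowledge word vector lookup
        self.trainer = trainer            # network training module
        self.analyzer = analyzer          # emotion analysis module

    def train(self):
        samples = self.collector.build_training_set()
        batches = [self.preprocessor.process(s) for s in samples]
        vectors = [self.encoder.encode_knowledge(b) for b in batches]
        return self.trainer.fit(batches, vectors)

    def predict(self, sample: Sample, model):
        aspect = self.analyzer.extract_aspect(sample.comment)
        return self.analyzer.classify(model, sample, aspect)
```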
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention in any way; any person skilled in the art may modify or alter the disclosed technical content into equivalent embodiments. However, any simple modification, equivalent change or variation of the above embodiments made according to the technical substance of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. The multi-mode comment emotion analysis method based on heterogram convolution is characterized by comprising the following steps of:
step A: collecting user comments and related images, extracting aspect words of products or services related to the user comments, and labeling emotion polarities of the user comments aiming at specific aspects of the products or services so as to construct a training set DB;
And (B) step (B): training a deep learning network model DLM based on a knowledge graph and a heterogeneous graph convolution network by using a training set DB, wherein the training set DB is used for analyzing emotion polarities of user comments and related images on specific aspects of products or services;
step C: inputting the user comments and related images and related aspect words of the product or service into a trained deep learning network model to obtain the emotion polarity of the user comments and related images aiming at the specific aspect in the product or service.
2. The multi-modal comment emotion analysis method based on heterographic convolution according to claim 1, wherein the step B specifically includes the steps of:
step B1: coding each training sample in the training set DB to obtain a comment semantic representation vector X r Semantic representation vector X of aspects a Syntax dependency adjacency matrix A r Image region characterization vector X im
Step B2: selecting a Set of knowledge triples from a knowledge graph that are relevant to a comment context according to a dynamic knowledge selection mechanism skt Then coding the knowledge words to obtain a Set skt Knowledge word characterization vector X kg
Step B3: representing vector X by semantics of aspects a And image region characterization vector X im Obtaining aspect-dependent image region characterization vector X using interactive attention mechanisms air The method comprises the steps of carrying out a first treatment on the surface of the Acquiring a Set of tag t tuples related to comment context through an image tag selection mechanism sit
Step B4: for X a Averaging pooling to obtain aspect average characterization vector
Figure FDA0004068450380000011
For X r Performing position coding to obtain a comment characterization vector X with enhanced positions pw By connection X pw and />
Figure FDA0004068450380000012
Obtaining a characterization vector C sd,O The method comprises the steps of carrying out a first treatment on the surface of the Generating a text-knowledge-image heterogeneous graph TKIHG according to a text-knowledge-image heterogeneous composition strategy to obtain an adjacency matrix A of the text-knowledge-image heterogeneous graph TKIHG hg Then the nodes are encoded by using a cross-modal attention mechanism to obtain a node characterization vector C of the heterogeneous graph TKIHG hg,0
Step B5: will characterize vector C sd,0 And node characterization vector C of heterogeneous graph TKIHG hg,0 Respectively inputting the text graph convolution vector C into two different L-layer graph convolution networks, respectively recording the text graph convolution network RGCN and the text-knowledge-image heterograph convolution network HGCN as comments, respectively learning and extracting syntactic dependency relationship and heterogeneous information of context semantics, image labels and external knowledge to obtain a text graph convolution characterization vector C sd,L And a isomerous map volume characterization vector C hg,L
Step B6: convolving the text map with a token vector C sd,L Performing aspect masking operation to obtain a text graph convolution mask characterization vector C of comments mask,L Semantic representation vector X of remarks r Using the interactive attention mechanism, further enhancing the context representation with aspect information of the aggregate syntax information, obtaining an aspect enhanced token vector X of comment r ea
Step B7: wrapping the isomerism map by a token vector C hg,L Enhancing the token vector X with respect to comment r, respectively ea Aspect-dependent image region characterization vector X air Using a cross-mode attention mechanism, further utilizing heterogeneous information such as image labels and external knowledge to strengthen learning text modes and emotion characteristics of the image modes, and obtaining a heterogeneous enhanced text characterization vector X hm And image characterization vector X hair Finally connect X hm and Xhair Obtaining a final characterization vector X fin
Step B8: will ultimately characterize vector X fin Inputting a final prediction layer, calculating the gradient of each parameter in the deep learning network model by using a back propagation method according to a target loss function loss, and updating each parameter by using a random gradient descent method;
step B9: and when the iteration change of the loss value generated by the deep learning network model is smaller than a given threshold value or the maximum iteration number is reached, terminating the training process of the deep learning network model.
3. The multi-modal comment emotion analysis method based on heterographic convolution according to claim 2, wherein the step B1 specifically includes the steps of:
step B11: traversing the training set DB, performing word segmentation on the user comments and aspects therein, removing stop words, and resizing the image related to each comment to a 3 × 224 × 224 pixel image, wherein each training sample in DB is expressed as d = (r, im, a, p);

where r is a user comment, im is the resized image related to the user comment, a is an aspect word or phrase of the product or service concerned, extracted from the user comment r, and p ∈ {positive, negative, neutral} is the sentiment polarity of the comment with respect to the aspect;
comment r is expressed as:

$$r = \left(w_1^{r}, w_2^{r}, \ldots, w_n^{r}\right)$$

where $w_i^{r}$ is the i-th word in comment r, i = 1, 2, …, n, and n is the number of words in comment r;

aspect a is expressed as:

$$a = \left(w_1^{a}, w_2^{a}, \ldots, w_m^{a}\right)$$

where $w_i^{a}$ is the i-th word in aspect a, i = 1, 2, …, m, and m is the number of words in aspect a;
step B12: comment on step B11
Figure FDA0004068450380000025
Coding by using a pre-training model BERT and reducing the dimension by using a full-connection layer to obtain a semantic representation vector X of the comment r r Expressed as:
Figure FDA0004068450380000026
wherein ,
Figure FDA0004068450380000027
For comment the i-th word->
Figure FDA0004068450380000028
The corresponding semantic representation vector, d, represents the output dimension;
step B13: aspect obtained for step B11
Figure FDA0004068450380000029
Encoding by using a pre-training model BERT and reducing dimension by using a full-connection layer to obtain a semantic representation vector X of the aspect a a Expressed as:
Figure FDA00040684503800000210
wherein ,
Figure FDA00040684503800000211
representing aspect the i-th word->
Figure FDA00040684503800000212
The corresponding word vector, d, represents the output dimension;
step B14: the adjusted image is encoded and smoothed by using a pre-training model ResNet-152, and an image region representation vector is obtained after full-connection layer dimension reduction
Figure FDA0004068450380000031
Wherein z represents the feature map size of the output and d represents the output dimension;
step B15: carrying out syntactic dependency analysis on the comment r to obtain a syntactic dependency tree SDT;
Figure FDA0004068450380000032
wherein ,
Figure FDA0004068450380000033
representing words ∈in comments>
Figure FDA0004068450380000034
He word->
Figure FDA0004068450380000035
A syntactic dependency exists between the two;
step B16: encoding the syntax-dependent tree SDT obtained in the step B15 into an n-order adjacency matrix A r ,A r Expressed as:
Figure FDA0004068450380000036
wherein ,
Figure FDA0004068450380000037
a1 indicates the word ++in the comment>
Figure FDA0004068450380000038
He word->
Figure FDA0004068450380000039
There is a syntactic dependency between->
Figure FDA00040684503800000310
A0 indicates the word->
Figure FDA00040684503800000311
He word->
Figure FDA00040684503800000312
There is no syntactic dependency between them.
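For illustration of steps B15 and B16, a minimal sketch using spaCy (assuming the en_core_web_sm model is installed); treating the dependency tree as undirected is an assumption of this sketch, as the claim does not specify edge direction.

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed parser; any dependency parser works

def dependency_adjacency(comment: str) -> np.ndarray:
    """Parse the comment and encode the dependency tree as an n-order 0/1 matrix."""
    doc = nlp(comment)
    n = len(doc)
    adj = np.zeros((n, n), dtype=np.float32)
    for token in doc:
        if token.i != token.head.i:           # every dependency edge
            adj[token.i, token.head.i] = 1.0
            adj[token.head.i, token.i] = 1.0  # treat the tree as undirected
    return adj

print(dependency_adjacency("The battery life of this phone is great"))
```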
4. The multi-modal comment emotion analysis method based on heterographic convolution according to claim 3, wherein said step B2 specifically includes the steps of:
Step B21: selecting a context word with edge connection with the aspect word from the syntax dependency tree SDT obtained in the step B15, and combining the context word with the aspect a to form a seed node Set with the length of m' sn Expressed as:
Figure FDA00040684503800000313
wherein ,
Figure FDA00040684503800000314
is an aspect word or a context word with edge connection, m is less than or equal to m' is less than or equal to n;
step B22: for each node in the seed node set, selecting its emotion polarity word and 5 words similar to its semantic meaning from the knowledge graph to respectively form 5 candidate knowledge triples CKT of the seed node, which are expressed as:
CKT=<w sn ,w sp ,w ss >
wherein ,wsn Is a seed node word, w sp Is emotion polar word, w ss Is a semantically similar word;
step B23: constructing each candidate knowledge triplet CKT into one candidate knowledge sentence r cks Inputting a pre-training model BERT and taking an average value to obtain an average semantic representation vector X cks ;r cks Expressed as:
r cks = "word' w sn 'emotional polarity is' w sp 'the word most similar to its semantics is' w ss ”’
Step B24: semantic representation vector X of comments r Average semantic representation vector X is obtained after averaging rm Then calculate X rm and Xcks Cosine similarity between the two to obtain comment text r and candidate knowledge sentences r cks The similarity score of (2) is calculated as follows:
Similarity_Score(r,r cks )=CosineSimilarity(X rm ,X cks )
wherein ,
Figure FDA0004068450380000041
step B25: b24, calculating similarity scores of candidate knowledge sentences constructed by all candidate knowledge triples CKT, and selecting the original candidate knowledge triples of the top k candidate knowledge sentences with the highest scores to form a knowledge triplet set of the seed node as the external knowledge of the seed node most relevant to the context;
step B26: for seed node Set sn Repeating the steps to obtain a Set containing knowledge triples of all seed nodes skt Expressed as:
Figure FDA0004068450380000042
wherein ,
Figure FDA0004068450380000043
is the i seed node->
Figure FDA0004068450380000044
A knowledge triplet set;
step B27: set of knowledge triples by knowledge graph embedding skt All emotion polarity words and semantic similar words of the tree are encoded to obtain node characterization vectors of the tree as follows
Figure FDA0004068450380000045
wherein ,/>
Figure FDA0004068450380000046
The knowledge representation vector corresponding to the ith emotion polarity word or semantic similarity word is obtained by training a knowledge word vector matrix in advance
Figure FDA0004068450380000047
Where d represents the dimension of the knowledge word vector and V is the number of words in which the knowledge word is embedded.
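A minimal sketch of the scoring of steps B23-B25, assuming the average comment vector X rm and the candidate-sentence vectors X cks have already been computed by averaging BERT outputs as described; function and variable names are illustrative.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    # CosineSimilarity with a small epsilon against zero norms
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def top_k_triples(x_rm: np.ndarray, candidates, k: int):
    """Score each candidate knowledge sentence vector against X_rm and keep
    the original triples of the top-k highest-scoring sentences (step B25)."""
    scored = [(cosine(x_rm, x_cks), triple) for triple, x_cks in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [triple for _, triple in scored[:k]]

x_rm = np.random.randn(768)  # average comment vector (toy)
cands = [(("battery", "positive", "power"), np.random.randn(768)) for _ in range(5)]
print(top_k_triples(x_rm, cands, k=2))
```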
5. The multi-modal comment emotion analysis method based on heterographic convolution according to claim 4, wherein said step B3 specifically includes the steps of:
Step B31: semantic representation vector X for a pair of aspects a And image region characterization vector X im Obtaining aspect-dependent image region characterization vector X using interactive attention mechanisms air ,X air The calculation process of (2) is as follows:
Figure FDA0004068450380000048
/>
Figure FDA0004068450380000049
Figure FDA00040684503800000410
wherein ,
Figure FDA00040684503800000411
(·) T representing a transpose operation;
step B32: inputting the image im into a pre-training model ResNet-152 to obtain 10 related image tags tag= { tag 1 ,…,tag i ,…,tag 10 And then adding the same to the seed node Set obtained in the step 21 sn Each seed node word in the tree is respectively input into a pre-training model BERT to obtain semantic characterization vectors of the seed node words
Figure FDA0004068450380000051
and />
Figure FDA0004068450380000052
Figure FDA0004068450380000053
wherein ,/>
Figure FDA0004068450380000054
Is seed node->
Figure FDA0004068450380000055
Semantic token vector of>
Figure FDA0004068450380000056
Is the ith image tag i Semantic token vectors of (a);
step B33: calculating an image tag i Semantic characterization vectors and seed nodes of (a)
Figure FDA0004068450380000057
Cosine similarity of semantic representation vectors of (2) to obtain seed node +.>
Figure FDA0004068450380000058
And image tag i The similarity score between the two is calculated as follows:
Figure FDA0004068450380000059
step B34: calculating seed nodes according to step B33
Figure FDA00040684503800000510
And similarity scores of 10 labels, and selecting top t image labels with highest scores and seed nodes +.>
Figure FDA00040684503800000511
Forming a tag t tuple TT together; expressed as:
Figure FDA00040684503800000512
step B35: for seed node Set sn Repeating the steps to obtain a Set containing the label t-tuple of all the seed nodes sit Expressed as:
Set sit ={TT 1 ,…,TT i ,…,TT m′ }
wherein ,TTi Is the ith seed node
Figure FDA00040684503800000513
Is included in the tag t tuple of (c).
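A minimal sketch of the tag selection of steps B33 and B34; the vectors are assumed to be precomputed BERT semantic representations, and the function name is illustrative.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def tag_t_tuple(seed_word: str, x_sn: np.ndarray, tags, x_tags, t: int):
    """Rank the 10 candidate image tags by cosine similarity with the seed
    node vector and keep the top-t, forming the tag t-tuple TT."""
    scores = [(cosine(x_sn, x_tag), tag) for tag, x_tag in zip(tags, x_tags)]
    scores.sort(reverse=True, key=lambda s: s[0])
    return (seed_word, *[tag for _, tag in scores[:t]])

tags = [f"tag{i}" for i in range(10)]
x_tags = [np.random.randn(768) for _ in tags]
print(tag_t_tuple("battery", np.random.randn(768), tags, x_tags, t=3))
```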
6. The multi-modal comment emotion analysis method based on heterographic convolution according to claim 5, wherein said step B4 specifically includes the steps of:
step B41: semantic representation vector X for a pair of aspects a Carrying out average pooling operation to obtain
Figure FDA00040684503800000514
The calculation formula is as follows:
Figure FDA00040684503800000515
wherein ,
Figure FDA00040684503800000516
step B42: semantic representation vector X for comments r Performing position coding to obtain a comment position-enhanced characterization vector X pw ,X pw Expressed as:
Figure FDA00040684503800000517
Figure FDA00040684503800000518
wherein ,
Figure FDA00040684503800000519
reinforcing a characterization vector for the position corresponding to the ith word in the comment r, wherein "·" represents that real numbers and the vector are multiplied, and pw i The calculation mode of the position weight corresponding to the i-th word in the comment r is as follows:
Figure FDA0004068450380000061
wherein θ and θ+m-1 represent the starting and ending positions of aspect a in the comment r, respectively;
step B43: averaging the aspect characterization vectors obtained in step B41
Figure FDA0004068450380000062
X obtained in step B42 pw Connecting to obtain a characterization vector C sd,0 ,/>
Figure FDA0004068450380000063
Expressed as:
Figure FDA0004068450380000064
wherein ,
Figure FDA0004068450380000065
i=1, 2, …, n, ", which is a token vector corresponding to the i-th word in comment r that is input into the graph convolution network; "means vector join operations;
step B44: based on the Set obtained in step 26 skt Knowledge triplet sets of each seed node in the tree, and emotion polarity w in each knowledge triplet in turn sp And semantically similar words w ss Respectively serving as knowledge expansion nodes and adding an edge to be connected with the relevant seed nodes in the syntax dependency tree SDT obtained in the step B15; similarly, a Set is obtained based on step 35 sit The label t tuple of each seed node in the tree is used for sequentially connecting each image label serving as an image label expansion node and adding an edge with the seed node related to the syntax dependency tree SDT; to avoid redundancy, if a new knowledge expansion node or image tag expansion node already exists in the graph, only one edge is added between it and the relevant seed node; finally, obtaining a text-knowledge-image heterogeneous graph TKIHG;
Figure FDA0004068450380000066
wherein ,
Figure FDA0004068450380000067
representing words ∈in comments>
Figure FDA0004068450380000068
He word->
Figure FDA0004068450380000069
There is a syntactic dependency between->
Figure FDA00040684503800000610
Representing words ∈in comments>
Figure FDA00040684503800000611
The emotion polar word is w sp ,/>
Figure FDA00040684503800000612
Representing words ∈in comments>
Figure FDA00040684503800000613
The semantically similar word is w ss ,/>
Figure FDA00040684503800000614
Word ∈in representation and comment>
Figure FDA00040684503800000615
The relevant image label is tag j
Step B45: encoding the text-knowledge-image iso-composition TKIHG obtained in the step B44 into a u-order adjacency matrix A hg ,A hg Expressed as:
Figure FDA00040684503800000616
wherein ,
Figure FDA00040684503800000617
1 indicates that a connection relationship exists between two words, < ->
Figure FDA00040684503800000618
If 0, no connection relationship exists between the two words, and u=n+m' × (t+k); the first n nodes in the graph are words in comment texts, the (n+1) th to (n+m '. Times.t) th nodes are words of related image label nodes, and the (n+m'. Times.t+1) th to (u) th nodes are words of related knowledge expansion nodes; />
Step B46: image tag node words in a text-knowledge-image heterograph TKIHG are spliced to comment text r in sequence to form a new sequence, and then a pre-training model BERT is input to perform dimension reduction by using a full-connection layer to obtain an upper and lower Wen Yuyi characterization vector X of a text mode rt Expressed as:
Figure FDA0004068450380000071
wherein
Figure FDA0004068450380000072
Step B47: representation vector X for upper and lower Wen Yuyi of text modality rt And step 27, obtaining a characterization vector X of the knowledge expansion node kg Obtaining semantic guided external knowledge representation vector X by using cross-modal attention mechanism kg′ The calculation process is as follows:
Figure FDA0004068450380000073
wherein ,
Figure FDA0004068450380000074
W 1 ,W 2 and />
Figure FDA0004068450380000075
Is a learnable weight matrix;
step B48: representation vector X for upper and lower Wen Yuyi of text modality rt Mapping it to an X using a self-attention mechanism kg′ Under the same feature space, a transformed upper and lower Wen Yuyi characterization vector X is obtained rt′ The calculation process is as follows:
Figure FDA0004068450380000076
wherein ,
Figure FDA0004068450380000077
is the upper and lower Wen Yuyi characterization vector of the mapped text mode, W 4 ,W 5 And
Figure FDA0004068450380000078
is a learnable weight matrix;
step B49: characterizing the transformed upper and lower Wen Yuyi to vector X rt′ And semantically guided external knowledge representation vector X kg′ Connecting to obtain a node characterization vector C of the text-knowledge-image heterograph TKIHG hg,0 It is represented as follows:
Figure FDA0004068450380000079
wherein
Figure FDA00040684503800000710
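For illustration of the position weighting in step B42 above, a minimal sketch; the piecewise form (zero weight inside the aspect span) is the reconstruction assumed above, not a confirmed detail of the source.

```python
import numpy as np

def position_weights(n: int, theta: int, m: int) -> np.ndarray:
    """Position weights decaying with distance from the aspect span
    [theta, theta+m-1] (1-based positions, as in the formulas above)."""
    pw = np.zeros(n)
    for i in range(1, n + 1):
        if i < theta:
            pw[i - 1] = 1 - (theta - i) / n
        elif i <= theta + m - 1:
            pw[i - 1] = 0.0  # aspect words themselves carry no weight (assumed)
        else:
            pw[i - 1] = 1 - (i - (theta + m - 1)) / n
    return pw

# X_pw: each word vector scaled by its position weight
x_r = np.random.randn(10, 64)
pw = position_weights(n=10, theta=4, m=2)
x_pw = pw[:, None] * x_r
print(np.round(pw, 2))
```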
7. The multi-modal comment emotion analysis method based on heterographic convolution according to claim 6, wherein said step B5 specifically includes the steps of:
step B51: for the comment text graph rolling network RGCN, the characterization vector C obtained in the step B43 is obtained sd,0 Input first layer graph rolling network utilizing adjacency matrix A r Updating the characterization vector of each word node, and outputting C sd,1 And is used as the input of the next layer graph rolling network;
wherein ,Csd,1 Expressed as:
Figure FDA0004068450380000081
wherein ,
Figure FDA0004068450380000082
is the output of the ith node in the first layer graph roll-up network,/for>
Figure FDA0004068450380000083
The calculation formula of (2) is as follows:
Figure FDA0004068450380000084
/>
Figure FDA0004068450380000085
wherein ,
Figure FDA0004068450380000086
is a weight matrix that can be learned, +.>
Figure FDA0004068450380000087
Is a bias vector; reLU is an activation function; the ith node in the graph roll-up network and the ith word in comment r +.>
Figure FDA0004068450380000088
Correspondingly, d i Representing the degree of the ith node, d i +1 is to prevent calculation errors when the degree of the ith node is 0;
step B52: for a text-knowledge-image heterograph convolution network HGCN, the node characterization vector C of the heterograph TKIHG obtained in the step B49 is calculated hg,0 Input first layer graph rolling network utilizing adjacency matrix A hg Updating the characterization vector of each node, and outputting C hg,1 And is used as the input of the next layer graph rolling network;
wherein ,Chg,1 Expressed as:
Figure FDA0004068450380000089
wherein ,
Figure FDA00040684503800000810
is the output of the ith node in the first layer graph roll-up network,/for>
Figure FDA00040684503800000811
The calculation formula of (2) is as follows:
Figure FDA00040684503800000812
Figure FDA00040684503800000813
wherein
Figure FDA00040684503800000814
Is a weight matrix that can be learned, +.>
Figure FDA00040684503800000815
Is a bias vector; reLU is an activation function; edge generation between nodes in graph rolling networkThe table nodes have a connection relationship, d i Representing the degree of the ith node, d i +1 is to prevent calculation errors when the degree of the ith node is 0;
step B53: respectively C sd,1 and Chg,1 Inputting to the next layer graph rolling network of RGCN and HGCN, repeating steps B51 and B52;
wherein, for comment text graph rolling network RGCN, the output of the first layer graph rolling network
Figure FDA00040684503800000816
As input of the layer 1 (l+1) graph convolution network, obtaining a text graph convolution token vector ++after iteration is finished>
Figure FDA0004068450380000091
For the text-knowledge-image heterograph convolution network HGCN, the output of the layer I graph convolution network is +. >
Figure FDA0004068450380000092
As input of the layer 1 graph convolution network, obtaining the isomerous graph convolution characterization vector +.>
Figure FDA0004068450380000093
L is the layer number of the graph rolling network, and L is more than or equal to 1 and less than or equal to L.
8. The multi-modal comment emotion analysis method based on heterographic convolution according to claim 7, wherein said step B6 specifically includes the steps of:
step B61: the text chart obtained in the step B53 is rolled to represent the vector C sd,L Performing aspect masking operation, masking text graph convolution output which does not belong to aspect words, and obtaining a text graph convolution aspect characterization vector C of comments r mask,L The calculation process is as follows:
Figure FDA0004068450380000094
where θ represents the start position of the aspect in the comment sentence, θ+m-1 represents the end position of the aspect in the comment sentence,
Figure FDA0004068450380000095
representing a graph convolution aspect characterization vector corresponding to an ith word in the comment, wherein 0 represents a zero vector with a dimension d;
step B62: the semantic characterization vector X of the comment r obtained in the step B12 is obtained r And the graph convolution aspect characterization vector C of comment r obtained in step B61 mask,L Inputting an interactive attention network, enhancing the context representation by using aspect information of aggregation syntax information through an interactive attention mechanism, and obtaining an aspect enhancement characterization vector X of comments r ea The calculation formula is as follows:
Figure FDA0004068450380000096
Figure FDA0004068450380000097
Figure FDA0004068450380000098
wherein ,(·)T Representing the transpose operation, beta i Is the attention weight of the i-th word in comment r.
9. The multi-modal comment emotion analysis method based on heterographic convolution according to claim 8, wherein said step B7 specifically includes the steps of:
step B71: the volume characterization vector C of the isomerism map obtained in the step B53 hg,L And comment r aspect enhancement characterization vector X ea Further utilizing image tags and external knowledge and other heterogeneous forms by using a cross-modal attention mechanismInformation is used for enhancing the learning text mode to obtain a heterogeneous enhanced text characterization vector X hm The calculation process is as follows:
Figure FDA0004068450380000101
wherein ,
Figure FDA0004068450380000102
W 9 ,W 10 and />
Figure FDA0004068450380000103
A learnable weight matrix;
step B72: characterizing vector C for a volume of heterogeneous images hg,L Aspect-dependent image region characterization vector X air Using a cross-mode attention mechanism, further utilizing heterogeneous information such as image labels, external knowledge and the like to strengthen emotion characteristics of a learning image mode, and obtaining a heterogeneous enhanced image characterization vector X hair The calculation process is as follows:
Figure FDA0004068450380000104
wherein ,
Figure FDA0004068450380000105
W 12 ,W 13 and />
Figure FDA0004068450380000106
A learnable weight matrix;
step B73: text token vector X for heterogeneous enhancement hm And image characterization vector X hair Performing connection operation to obtain final representation final characterization vector X fin The calculation process is as follows:
X fin =[X hm ;X hair ]。
10. A multi-modal comment emotion analysis system employing the method according to any one of claims 1-9, comprising:
the data collection module is used for extracting user comments and related images, aspect words in the comments and position information of the aspect words, marking emotion polarities of the aspects and constructing a training set;
the preprocessing module is used for preprocessing training samples in a training set, and comprises word segmentation processing, stop word removal, image size adjustment, syntactic dependency analysis, selection of a related knowledge triplet set and an image tag set and generation of a text-knowledge-image heterogram;
the coding module is used for searching word vectors of knowledge words of the knowledge triplet set in the pre-trained knowledge map word vector matrix to obtain knowledge word characterization vectors of the knowledge triplet set;
the network training module is used for inputting the processed user comments, related images, aspects, text-knowledge-image heterogeneous graphs and the knowledge word representation vectors of the knowledge triple set into the deep learning network to obtain the final representation vectors of the multi-modal comments, computing the loss from the probability that a representation vector belongs to each category and the labels in the training set, and training the whole deep learning network with loss minimization as the objective to obtain the deep learning network model based on the knowledge graph and the heterogeneous graph convolution network; and
And the emotion analysis module is used for extracting aspects in the input user comments by using an NLP tool, then analyzing and processing the input comments, images and aspects by using a trained deep learning network model based on the knowledge graph and the heterogeneous graph convolution network, and outputting emotion evaluation polarities related to specific aspects in the user comments and related images.
CN202310083964.6A 2023-01-31 2023-01-31 Multimode comment emotion analysis method and system based on heterogram convolution Pending CN116258147A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310083964.6A CN116258147A (en) 2023-01-31 2023-01-31 Multimode comment emotion analysis method and system based on heterogram convolution

Publications (1)

Publication Number Publication Date
CN116258147A true CN116258147A (en) 2023-06-13

Family

ID=86680360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310083964.6A Pending CN116258147A (en) 2023-01-31 2023-01-31 Multimode comment emotion analysis method and system based on heterogram convolution

Country Status (1)

Country Link
CN (1) CN116258147A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573988A (en) * 2023-10-17 2024-02-20 广东工业大学 Offensive comment identification method based on multi-modal deep learning
CN117573988B (en) * 2023-10-17 2024-05-14 广东工业大学 Offensive comment identification method based on multi-modal deep learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination