CN115269847A - Knowledge-enhanced syntactic heteromorphic graph-based aspect-level emotion classification method - Google Patents


Info

Publication number
CN115269847A
Authority
CN
China
Prior art keywords
sentence
word
bert
emotion
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210922723.1A
Other languages
Chinese (zh)
Inventor
吴丽娟
陆广泉
李杰成
张魁
张桂衔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University
Priority to CN202210922723.1A
Publication of CN115269847A
Legal status: Withdrawn

Classifications

    • G06F 16/35 Information retrieval of unstructured textual data; clustering; classification
    • G06F 16/367 Creation of semantic tools, e.g. ontology or thesauri; ontology
    • G06F 40/211 Natural language analysis; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an aspect-level emotion classification method based on a knowledge-enhanced syntactic heterogeneous graph, which comprises the following steps: 1) a data acquisition stage; 2) constructing an enhanced syntactic heterogeneous graph; 3) obtaining enhanced syntactic features under the local context with domain knowledge; 4) constructing global semantic graph features; 5) a feature adaptive fusion stage; 6) a feature vector output stage; 7) a model training stage. The method enhances the generalization ability of the model and improves emotion classification performance on aspect-level text.

Description

Knowledge-enhanced syntactic heteromorphic graph-based aspect-level emotion classification method
Technical Field
The invention relates to the technical field of text aspect-level emotion analysis in natural language processing, and in particular to an aspect-level emotion classification method based on a knowledge-enhanced syntactic heterogeneous graph.
Background
Emotion analysis is a fundamental and meaningful task in Natural Language Processing (NLP). It aims to mine emotional information from user comment text, providing valuable information to enterprises or consumers and supporting reasonable decisions. Aspect-Based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task, generally divided into Aspect Extraction (AE) and Aspect-Level Sentiment Classification (ALSC). We focus only on the ALSC task, which aims to identify the emotional polarity (i.e., positive, negative, or neutral) of aspects that appear explicitly in a sentence. For example, in the sentence "The atmosphere is good, but the food is bad!", the two aspect terms "atmosphere" and "food" have positive and negative emotional polarities, respectively.
In earlier work on the aspect-based sentiment classification task, various attention mechanisms were proposed to model the interaction between context words and aspects in order to predict the sentiment polarity of a specific aspect. Among them, Wang et al. proposed an attention-based long short-term memory network (ATAE-LSTM) that feeds the concatenation of the aspect and the sentence into an LSTM, generates an aspect-aware sentence representation, and assigns weights to the context via an attention mechanism. Although attention-based models have achieved good performance, due to the complexity of text, attention alone may not accurately capture the important relationships between aspects and context words, and it neglects the long-range dependency of aspect words on emotion-related words, which limits model performance. With the development of pre-trained models (PTMs), models such as BERT (2019) are widely used in emotion analysis tasks. The BERT-ADA model fine-tunes BERT on a large in-domain corpus, improving performance on the ALSC task.
In recent years, graph Neural Networks (GNNs) in dependency trees have attracted considerable attention from ALSC research. Zhang et al propose to construct GCN model based on dependency tree, use syntax dependence to model aspect and context, have shortened the dependence distance between aspect and the opinion word, have alleviated the problem of long-term dependence. However, the graph neural network-based models provide a good improvement over earlier neural network models, but they also have drawbacks. First, we observe that the use of syntactic dependencies alone does not fully exploit semantic information, and may yield opposite results for syntactically ambiguous sentence modeling. Secondly, most of the dependency graphs only consider the syntactic dependency of sentences, and the edges of the dependency graphs are binary (i.e. 0 or 1) in weight, and cannot assign weights to words with different emotions. Finally, when capturing syntactic dependencies, the local contextual information of the facet words is ignored as important for determining the emotional polarity of the specific facets to be predicted accurately.
Disclosure of Invention
The invention aims to provide an aspect-level emotion classification method based on a knowledge-enhanced syntactic heterogeneous graph, addressing the above defects in the prior art. The method enhances the generalization ability of the model and improves emotion classification performance on aspect-level text.
The technical scheme for realizing the purpose of the invention is as follows:
An aspect-level emotion classification method based on a knowledge-enhanced syntactic heterogeneous graph comprises the following steps:
1) A data acquisition stage: obtaining a comment text data set; acquiring external emotion knowledge, processing it, and generating a key-value pair file of words and scores;
2) An enhanced syntactic heterogeneous graph construction stage: for a given sentence, the sentence is parsed by loading "en_core_web_sm" through the spaCy tool, the part-of-speech information of each word is obtained through token.pos_ and stored in a pos list, the sentence length is computed as n = len(pos), and noun, adverb and adjective information is spliced into the matrix when constructing the heterogeneous graph. Specifically, an A matrix initialized with 1 is constructed, of size (n+3)×(n+3) and type float32, where the three extra rows/columns hold the part-of-speech indicators. Then each word of the sentence is checked against the emotion dictionary: if the word appears, its emotion Score is taken out and converted to float type; otherwise its emotion value is assigned 0. Each word in the sentence is regarded as a node, and the dependency relationship between words in the dependency tree is represented as an edge. To enhance the emotional information expression of the sentence, the Scores of emotion words in the SenticNet5 sentiment knowledge are used to enrich the adjacency matrix representation: if a dependency edge exists between two words, the value of the edge is 1 + Score, and the initialized A matrix is updated. In the composition, the relationship between a parent node and a child node with a dependency relationship is considered mutual, so the derived enhanced dependency graph is an undirected graph, A_{i,j} = A_{j,i}. In the enhanced dependency graph, the aspect words in comment sentences are mostly nouns, and aspects are important in the emotion classification task, so words of this part of speech receive more attention; the description of an aspect in a sentence is usually an adjective, so adjectives are also important; positive or negative adverbs also appear in comment sentences, and when a negation word such as "not" or "no" appears, the emotional polarity of the aspect is reversed, which is why adverbs in comments receive attention. Specifically, the tags "NOUN", "ADJ" and "ADP" are stored in a list m, and the sentence is traversed: if pos[i] = "NOUN", A_{i,-3} is set to 1; if pos[i] = "ADJ", A_{i,-2} is set to 1; if pos[i] = "ADP", A_{i,-1} is set to 1. Finally, the enhanced syntactic heterogeneous graph matrix A^h of the sentence is derived;
3) A stage of obtaining enhanced syntactic features under the local context with domain knowledge: Tokenizer4Bert converts input of the form [CLS] + text + [SEP] into a vector, which is padded to the same length by pad_and_truncate, giving E = {w_1, ..., w_i, ..., w_a1, ..., w_ai, ..., w_k}, where k is the set maximum length, w_i denotes the (i+1)-th word, and w_ai is the i-th aspect term; the sentence heterogeneous graph is likewise padded with np to size k×k, giving A^h ∈ R^{k×k}. E is input into the domain pre-trained BERT-ADA to obtain the sentence vector representation H^{bert}.
BERT-ADA is a BERT model obtained by fine-tuning on the Amazon laptop review dataset and the Yelp Dataset Challenge review corpus. The relative distance between each token and the aspect term is obtained from the position of each token and the position of the aspect term: first a weighting matrix V of all 1s is initialized, the aspect-term length asp_len and its starting position asp_begin are obtained, and the average center position of the aspect term is computed as Avg_a = (asp_begin + asp_len)/2; the relative distance between each context word and the aspect term in the sentence is then
P_i = |i − Avg_a| − asp_len/2,
The relative distance is used to further weight the BERT-encoded sentence vector: if P_i is less than the set threshold 3, the word's own semantic information is kept; if it is greater than the threshold 3, a weighting value V_i = 1 − (P_i − 3)/k is constructed for the semantically less relevant context word
to down-weight its features. The weighting matrix of the input sequence V = [V_0, V_1, ..., V_k] is updated according to the semantic relative distance of the words, and the preliminary BERT_ADA representation H^{bert} is passed through a mul() operation with the weight matrix V, i.e. H_l = H^{bert} · V, where H_l is the output of the local dynamic weight layer. A graph convolutional network (GCN) is then used, taking the feature representation H_l of the local context with domain knowledge and the enhanced syntactic heterogeneous graph matrix A_h as input; the enhanced syntactic dependency information under the local context with domain knowledge is obtained via the activation function ReLU:
H_s_loc = ReLU(GCN(A_h, H_l, W)),
where the GCN formula is H^(l) = σ(A_h H^(l−1) W^(l−1) + b^(l−1)); W^(l−1) and b^(l−1) are the linear transformation weight and bias parameters of layer l−1 of the model, σ is a nonlinear function usually set to ReLU, and the initial input H^(0) is the sentence representation H_l;
4) A global semantic graph feature construction stage: the comment text and aspect term are converted with "[CLS] + text + [SEP] + aspect + [SEP]" using Tokenizer4Bert to obtain the vector representation text_bert_indices; an index representation is then generated to distinguish the comment text from the aspect term, with the positions of the first half [CLS] + text + [SEP] represented by 0 and the positions of aspect + [SEP] represented by 1, giving the bert_segments_indices vector. text_bert_indices and bert_segments_indices are input into the domain pre-trained BERT-ADA to obtain the vector representation H_g of the global sentence. H_g is then input into multi-head attention, where each attention head obtains a feature M^se_i = softmax((H_g W_i^Q)(H_g W_i^K)^T / √d_k).
The attention matrices of the h heads are combined and divided by h to obtain the semantic matrix M^se ∈ R^{k×k},
which fully captures the semantic information of each word in the global sentence. To prevent overfitting, M^se = Dropout(M^se) is obtained through a Dropout layer. When constructing the semantic graph, the values on the diagonal of M^se are set to 0 using torch.diag and the diagonal elements are then set to 1 using torch.eye, since the semantic relevance of each word to itself is one hundred percent. This yields the sentence global semantic graph with neighborhood knowledge, i.e. the input of the semantic GCN, H_glo = ReLU(GCN(M^se, H_g, W)), and the global semantic information feature H_glo is extracted and updated by the graph convolutional network;
5) A feature adaptive fusion stage: the enhanced syntactic dependency information H_s_loc under the local context with domain knowledge is spliced with the global semantic information H_glo, i.e. X = torch.cat((H_s_loc, H_glo)), to obtain the enhanced syntactic information and global semantic information of the sentence under the local context considering domain knowledge; this is passed through a residual multi-layer perceptron and then input into a self-attention layer for adaptive fusion, obtaining a feature representation suitable for the task;
6) A feature vector output stage: the fused feature vector is subjected to a BERT pooling operation and the final vector representation is output; the positive, negative and neutral emotion polarity probabilities are obtained through a softmax classifier;
7) A model training stage: the network is optimized by the Adam algorithm with the cross-entropy loss function as the loss function, i.e., the goal of training the classifier is to minimize the cross-entropy loss between the predicted emotion distribution and the true emotion distribution:
L = −Σ_{i=1}^{S} Σ_{j=1}^{C} ŷ_i^j log(y_i^j) + λ‖Θ‖²,
where S is the number of training samples, C is the number of polarity classes, ŷ is the true emotion distribution of the sample, y is the predicted emotion distribution, λ is the weight of the L2 regularization term, and Θ represents all trainable parameters.
The technical scheme has the advantages or beneficial effects that:
(1) The technical scheme integrates external emotion knowledge into the syntactic dependency graph and attends to noun, adverb and adjective information in the sentence, obtaining an enhanced syntactic heterogeneous graph representation of the sentence. This provides the model with finer-grained emotion supervision signals and promotes the model's extraction of emotion-related dependency relationships between the aspect and the context.
(2) The scheme takes the local semantic information with domain knowledge and the enhanced syntactic heterogeneous graph as input to the GCN, obtaining enhanced syntactic dependency features under the local context of domain knowledge and strengthening the local features of the aspect words. A sentence global semantic graph with neighborhood knowledge is built using multi-head self-attention, and global semantic features are obtained with the semantic GCN, which can to a certain extent avoid parsing errors made by the syntactic parser.
(3) A residual multi-layer perceptron and a self-attention mechanism are used for adaptive fusion of the complementary knowledge-enhanced syntactic information under the local context and the global semantic information, obtaining richer semantic expressions and further improving the effect of the aspect-level emotion classification model.
The method enhances the generalization ability of the model and improves emotion classification performance on aspect-level text.
Drawings
FIG. 1 is a model structure diagram of an embodiment;
FIG. 2 is a structure diagram of the residual multi-layer perceptron of the embodiment.
Detailed Description
The invention will be described in further detail with reference to the following drawings and specific examples, but the invention is not limited thereto.
Example:
Referring to FIG. 1 and FIG. 2, the aspect-level emotion classification method based on the knowledge-enhanced syntactic heterogeneous graph includes the following steps:
1) A data acquisition stage: a comment text data set is acquired; external emotion knowledge is obtained and key-value pair files of words and scores are generated. The publicly available emotion analysis datasets REST14, LAP14, REST15 and REST16 are used, where the laptop (LAP14) and restaurant (REST14) datasets are from SemEval-2014 Task 4 Subtask 2, the restaurant (REST15) dataset is from SemEval-2015 Task 12, and the restaurant (REST16) dataset is from SemEval-2016 Task 5. All four datasets involve three emotion categories, namely positive, neutral and negative, and each sample includes a comment sentence, its aspects and their corresponding emotion polarities; the example counts of the different splits of the four datasets are shown in Table 1. SenticNet5 is used as the external emotion knowledge, where the emotion value of a concept in the SenticNet5 dictionary ranges between −1 and +1: −1 represents extreme negativity, +1 represents extreme positivity, and a concept with no emotion value is assigned 0, i.e., neutral. For example, the words good and happier score 0.894 and 0.880 in the emotion dictionary, close to +1, indicating that the emotion of these words is positive; the words bad and terribly score −0.800 and −0.82 respectively, close to −1, indicating that the emotion of these words is negative;
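As a concrete illustration of step 1), the following minimal sketch generates the word-to-score key-value file; the in-memory lexicon, file name and JSON format are illustrative assumptions, since the embodiment only specifies that a key-value pair file of words and scores is produced from SenticNet5.

import json

# Hypothetical stand-in for the loaded SenticNet5 lexicon; the polarity values
# quoted here are the ones given in the text (range -1 to +1).
senticnet5 = {
    "good": 0.894,
    "happier": 0.880,
    "bad": -0.800,
    "terribly": -0.82,
}

def build_score_file(lexicon, path):
    # Dump a {word: polarity score} key-value file, as described in step 1).
    with open(path, "w", encoding="utf-8") as f:
        json.dump(lexicon, f, ensure_ascii=False, indent=2)

build_score_file(senticnet5, "senticnet5_scores.json")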
TABLE 1 statistics of data sets
[Table 1 is shown as an image in the original: example counts of the different splits of the four data sets]
2) An enhanced syntactic heterogeneous graph construction stage: as shown in FIG. 2, for a given sentence, the sentence is parsed by loading "en_core_web_sm" through the spaCy tool, the part-of-speech information of each word in the sentence is obtained through token.pos_ and stored in a pos list, the sentence length is computed as n = len(pos), and an A matrix initialized with 1 is constructed, of size (n+3)×(n+3) and type float32. Then each word of the sentence is checked against the emotion dictionary: if it appears, its emotion Score is taken out and converted to float32 type; otherwise its emotion value is assigned 0. Each word in the sentence is regarded as a node, and the dependency relationship between words in the dependency tree is represented as an edge. To enhance the emotional information expression of the sentence, the Scores of emotion words in the SenticNet5 sentiment knowledge are used to enrich the adjacency matrix representation: if a dependency edge exists between two words, the value of the edge is 1 + Score, and the initialized A matrix is updated. In the composition, the relationship between a parent node and a child node with a dependency relationship is considered mutual, so the derived enhanced dependency graph is an undirected graph, A_{i,j} = A_{j,i}. After obtaining the enhanced dependency graph, it is observed that the part of speech of aspect words in comment sentences is mostly NOUN, and aspect words are important in the emotion classification task, so words of this part of speech receive more attention; the word describing the emotion of an aspect in a sentence is usually an adjective, so adjectives are also important; positive or negative adverbs appear in comment sentences, and when a negation word such as "not" or "no" appears, the emotional polarity of the aspect is reversed. Specifically, the tags "NOUN", "ADJ" and "ADP" are stored in a list m, and the sentence is traversed: if pos[i] = "NOUN", A_{i,-3} is set to 1; if pos[i] = "ADJ", A_{i,-2} is set to 1; if pos[i] = "ADP", A_{i,-1} is set to 1. Finally, the enhanced syntactic heterogeneous graph matrix A^h of the sentence is derived, as sketched below;
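The following minimal sketch shows how the enhanced syntactic heterogeneous graph of step 2) could be assembled with spaCy and numpy; the {word: score} dictionary senti is assumed to come from the step-1) key-value file, and helper names are illustrative.

import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")

def build_hetero_graph(sentence, senti):
    # Parse the sentence and collect a part-of-speech tag per word.
    doc = nlp(sentence)
    pos = [tok.pos_ for tok in doc]
    n = len(pos)
    # The text reads "initialized with 1"; note that with an all-ones matrix the
    # part-of-speech assignments below are no-ops, so a zero init may be intended.
    A = np.ones((n + 3, n + 3), dtype="float32")
    # Dependency edges enriched with SenticNet scores: edge value = 1 + Score,
    # mirrored so the graph is undirected (A[i, j] = A[j, i]).
    for tok in doc:
        i, j = tok.i, tok.head.i
        score = float(senti.get(tok.text.lower(), 0.0))
        A[i, j] = A[j, i] = 1.0 + score
    # Splice in the noun / adjective / adverb indicator columns.
    for i, p in enumerate(pos):
        if p == "NOUN":
            A[i, -3] = 1.0
        elif p == "ADJ":
            A[i, -2] = 1.0
        elif p == "ADP":  # tag named in the text; spaCy's adverb tag is actually "ADV"
            A[i, -1] = 1.0
    return A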
3) A stage of obtaining enhanced syntactic features under the local context with domain knowledge: Tokenizer4Bert converts input of the form [CLS] + text + [SEP] into a vector, which is padded to the same length by pad_and_truncate, giving E = {w_1, ..., w_i, ..., w_a1, ..., w_ai, ..., w_k}, where k is the set maximum length, w_i denotes the (i+1)-th word, and w_ai is the i-th aspect term; the sentence heterogeneous graph is likewise padded with np to size k×k, giving A^h ∈ R^{k×k}. E is input into the domain pre-trained BERT-ADA to obtain the sentence vector representation H^{bert},
where BERT-ADA is a BERT model obtained by fine-tuning on the Amazon laptop review dataset and the Yelp Dataset Challenge review corpus. The relative distance between each token and the aspect term is obtained from the position of each token and the position of the aspect term: first a weighting matrix V of all 1s is initialized, the aspect-term length asp_len and its starting position asp_begin are obtained, and the average center position of the aspect term is computed as Avg_a = (asp_begin + asp_len)/2; the relative distance between each context word and the aspect term in the sentence is then
P_i = |i − Avg_a| − asp_len/2,
The BERT-encoded sentence vector is further weighted by the relative distance: if P_i is less than the set threshold 3, the word's own semantic information is kept; if it is greater than the threshold 3, a weighting value V_i = 1 − (P_i − 3)/k is constructed for the semantically less relevant context word
to down-weight its features. The weighting matrix of the input sequence V = [V_0, V_1, ..., V_k] is updated according to the semantic relative distance of the words, and the preliminary BERT_ADA representation H^{bert} is passed through a mul() operation with the weight matrix V, i.e. H_l = H^{bert} · V, where H_l is the output of the local dynamic weight layer. A graph convolutional network (GCN) is used, taking the feature representation H_l of the local context with domain knowledge and the enhanced syntactic heterogeneous graph matrix A_h as input; the enhanced syntactic dependency information under the local context with domain knowledge is then obtained via the activation function ReLU:
H_s_loc = ReLU(GCN(A_h, H_l, W)),
where the GCN formula is H^(l) = σ(A_h H^(l−1) W^(l−1) + b^(l−1)); W^(l−1) and b^(l−1) are the linear transformation weight and bias parameters of layer l−1 of the model, σ is a nonlinear function usually set to ReLU, and the initial input H^(0) is the sentence representation H_l;
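A minimal sketch of the local dynamic weight layer and one GCN layer of step 3); the relative-distance and decay formulas follow the reconstructions above (the exact forms are given as images in the original), and H_bert stands in for the (k, d) BERT-ADA output.

import torch
import torch.nn as nn
import torch.nn.functional as F

def local_weight(H_bert, asp_begin, asp_len, threshold=3.0):
    k, _ = H_bert.shape
    avg_a = (asp_begin + asp_len) / 2.0          # average center position Avg_a
    idx = torch.arange(k, dtype=torch.float32)
    P = (idx - avg_a).abs() - asp_len / 2.0      # assumed relative-distance form P_i
    V = torch.where(P < threshold,
                    torch.ones(k),
                    1.0 - (P - threshold) / k)   # assumed decay for distant words
    return H_bert * V.unsqueeze(-1)              # H_l = H_bert · V

class GCNLayer(nn.Module):
    # One layer of H^(l) = ReLU(A_h H^(l-1) W + b).
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, A, H):
        return F.relu(self.lin(A @ H))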
4) A global semantic graph feature construction stage: the comment text and aspect term are converted with "[CLS] + text + [SEP] + aspect + [SEP]" using Tokenizer4Bert to obtain the vector representation text_bert_indices; a position index representation is then generated to distinguish the text from the aspect term, with the positions of the first half [CLS] + text + [SEP] represented by 0 and the positions of aspect + [SEP] represented by 1, giving the bert_segments_indices vector. text_bert_indices and bert_segments_indices are input into the domain pre-trained BERT-ADA to obtain the vector representation H_g of the global sentence. H_g is then input into multi-head attention, where each attention head obtains a feature M^se_i = softmax((H_g W_i^Q)(H_g W_i^K)^T / √d_k).
The attention matrices of the h heads are combined and divided by h to obtain the semantic matrix M^se ∈ R^{k×k},
which fully captures the semantic information of each word in the global sentence. To prevent overfitting, M^se = Dropout(M^se) is obtained through a Dropout layer. When constructing the semantic graph, the values on the diagonal of M^se are set to 0 using torch.diag and the diagonal elements are then set to 1 using torch.eye, since the semantic relevance of each word to itself is one hundred percent. This yields the sentence global semantic graph with neighborhood knowledge, i.e. the input of the semantic GCN, H_glo = ReLU(GCN(M^se, H_g, W)), and the global semantic information feature H_glo is extracted and updated by the graph convolutional network, as sketched below;
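A minimal sketch of the semantic-graph construction of step 4), assuming standard scaled dot-product attention per head (the per-head formula is an image in the original); H_g stands in for the (k, d) global sentence representation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def semantic_graph(H_g, n_heads=16, p_drop=0.1):
    k, d = H_g.shape
    d_k = d // n_heads
    Wq, Wk = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
    Q = Wq(H_g).view(k, n_heads, d_k).transpose(0, 1)        # (heads, k, d_k)
    K = Wk(H_g).view(k, n_heads, d_k).transpose(0, 1)
    att = torch.softmax(Q @ K.transpose(-2, -1) / d_k ** 0.5, dim=-1)
    M_se = att.mean(dim=0)                                    # combine the h heads
    M_se = F.dropout(M_se, p=p_drop)                          # M_se = Dropout(M_se)
    M_se = M_se - torch.diag(torch.diag(M_se))                # zero the diagonal
    M_se = M_se + torch.eye(k)                                # self-relevance set to 1
    return M_se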
5) A feature adaptive fusion stage: the enhanced syntactic dependency information H_s_loc under the local context with domain knowledge is spliced with the global semantic information H_glo, i.e. X = torch.cat((H_s_loc, H_glo)), to obtain the enhanced syntactic information and global semantic information of the sentence under the local context considering domain knowledge; this is passed through a residual multi-layer perceptron and then input into a self-attention layer for adaptive fusion, obtaining a feature representation suitable for the task, as sketched below;
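A minimal sketch of the adaptive fusion of step 5); the residual MLP width and the feature-axis concatenation are illustrative assumptions (FIG. 2 gives the actual residual multi-layer perceptron structure).

import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * d, 2 * d), nn.ReLU(),
                                 nn.Linear(2 * d, 2 * d))
        self.attn = nn.MultiheadAttention(2 * d, num_heads=1, batch_first=True)

    def forward(self, H_s_loc, H_glo):
        # H_s_loc, H_glo: (batch, k, d) feature streams
        X = torch.cat((H_s_loc, H_glo), dim=-1)  # X = torch.cat((H_s_loc, H_glo))
        X = X + self.mlp(X)                      # residual multi-layer perceptron
        out, _ = self.attn(X, X, X)              # self-attention adaptive fusion
        return out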
6) A feature vector output stage: the fused feature vector is subjected to a BERT pooling operation and the final vector representation is output; the positive, negative and neutral emotion polarity probabilities are obtained through a softmax classifier;
7) A model training stage: the network is optimized by the Adam algorithm with the cross-entropy loss function as the loss function, i.e., the goal of training the classifier is to minimize the cross-entropy loss between the predicted emotion distribution and the true emotion distribution:
L = −Σ_{i=1}^{S} Σ_{j=1}^{C} ŷ_i^j log(y_i^j) + λ‖Θ‖²,
where S is the number of training samples, C is the number of polarity classes, ŷ is the true emotion distribution of the sample, y is the predicted emotion distribution, λ is the weight of the L2 regularization term, and Θ represents all trainable parameters.
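A minimal sketch of the training stage of step 7); model and loader stand in for the full network and data pipeline, and the weight_decay value, which implements the λ‖Θ‖² term, is an assumed setting.

import torch
import torch.nn as nn

def train_epoch(model, loader, lr=1e-5, weight_decay=1e-5):
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    ce = nn.CrossEntropyLoss()
    for inputs, labels in loader:       # labels: positive / negative / neutral ids
        opt.zero_grad()
        logits = model(inputs)          # (batch, C) with C = 3 polarity classes
        loss = ce(logits, labels)       # cross-entropy between prediction and truth
        loss.backward()
        opt.step()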
To better illustrate the advantages of the method of this example, the following experimental verification was performed:
in the experimental setting, when a semantic graph is constructed, the number of multi-head attention heads is set to be 16, the number of GCN layers is set to be 2, the number of Dropout layers is set to be 0.1, the learning rate of an optimizer Adma is 10-5, and the number of random seeds is set to be 568. In the experiment, classification accuracy (acc.) and a harmonic mean (F1 value) of accuracy and recall are used as performance evaluation indexes, wherein the higher the two evaluation indexes are, the better the model classification capability is. In order to verify the effectiveness of the model provided by the embodiment, some mainstream and newly developed models are selected in aspect level emotion classification and are compared, the relevant baseline of the models can be divided into two types, namely a semantic-based model and a grammar-based model in principle, table 2 shows the result of comparing the embodiment with a comparison model, and the following is a brief introduction to the relevant model:
based on the semantic model:
ATAE-LSTM: the sentence vector and the specific aspect vector are used as splicing input, and the attention-based LSTM is used for exploring the relation between the aspect and the sentence.
IAN: two LSTM were designed to model aspects and context separately, using an interactive attention mechanism to learn the aspects and the feature representation of the sentence.
BERT: the prediction method is a general BERT model, takes "[ CLS ] + sentence + [ SEP ]" as input, and uses the expression of [ CLS ] to perform prediction.
BERT-ADA: the method is a field adaptive pre-training model for challenging a comment corpus based on an amazon notebook computer comment data set and a Yelp data set.
Grammar-based models:
ASGCN: constructs a graph convolutional network using syntactic dependencies over the sentence dependency tree.
DGEDT-BERT: a dual-transformer-based network that achieves information fusion by jointly learning a flat representation learned by a Transformer and a dependency-graph representation.
KumaGCN: samples sentences using the HardKuma distribution to generate a specific latent graph structure, and combines the latent graph with the dependency tree via a gating mechanism.
BiGCN: adopts a two-layer interactive graph convolutional network to make full use of a global lexical graph and a concept hierarchy graph.
As can be seen from Table 2, first, the experimental results prove that the model LSGCN proposed in this example outperforms the semantic neural network-based and grammar-based comparison models on the ALSC task. This verifies the validity of the knowledge-enhanced syntactic heterogeneous graph model concept proposed in this example. Second, it can be observed that the BERT-based and syntax-based models (ASGCN, DGEDT-BERT, KumaGCN, BiGCN) achieve better results on each dataset than the models using attention alone (ATAE-LSTM, IAN), and BERT-ADA is greatly improved over the common BERT model, which indicates that in-domain data is essential for emotion analysis tasks. However, these models ignore the importance of emotion knowledge and word parts of speech for the ALSC task, lack finer-grained emotion signals when constructing the dependency graph, and do not combine semantic and syntactic information well, resulting in classification performance lower than that of the present invention.
TABLE 2 comparison of experimental results of this and related models
[Table 2 is shown as an image in the original: accuracy and F1 of this example and the comparison models on the four data sets]

Claims (1)

1. An aspect-level emotion classification method based on a knowledge-enhanced syntactic heterogeneous graph, characterized by comprising the following steps:
1) A data acquisition stage: acquiring a comment text data set; acquiring external emotion knowledge, processing it, and generating a key-value pair file of words and scores;
2) An enhanced syntactic heterogeneous graph construction stage: for a given sentence, the sentence is parsed by loading "en_core_web_sm" through the spaCy tool, the part-of-speech information of each word in the sentence is obtained through token.pos_ and stored in a pos list, the sentence length is computed as n = len(pos), and noun, adverb and adjective information is spliced into the matrix when constructing the heterogeneous graph; specifically, an A matrix initialized with 1 is constructed, of size (n+3)×(n+3) and type float32; then each word of the sentence is checked against the emotion dictionary: if it appears, its emotion Score is taken out and converted to float type, otherwise its emotion value is assigned 0; each word in the sentence is regarded as a node, and the dependency relationship between words in the dependency tree is represented as an edge; to enhance the emotional information expression of the sentence, the Scores of emotion words in the SenticNet5 sentiment knowledge are used to enrich the adjacency matrix representation: if a dependency edge exists between two words, the value of the edge is 1 + Score, and the initialized A matrix is updated; in the composition, the relationship between a parent node and a child node with a dependency relationship is considered mutual, so the derived enhanced dependency graph is an undirected graph, A_{i,j} = A_{j,i}; in the enhanced dependency graph it is observed that the part of speech of aspect words in comment sentences is mostly NOUN, and aspect words are important in the emotion classification task, so words of this part of speech receive more attention; the description of an aspect in a sentence is usually an adjective, so adjectives are also important; positive or negative adverbs appear in comment sentences, and when a negation word such as "not" or "no" appears, the emotional polarity of the aspect is reversed; specifically, the tags "NOUN", "ADJ" and "ADP" are stored in a list m, and the sentence is traversed: if pos[i] = "NOUN", A_{i,-3} is set to 1; if pos[i] = "ADJ", A_{i,-2} is set to 1; if pos[i] = "ADP", A_{i,-1} is set to 1; finally, the enhanced syntactic heterogeneous graph matrix A^h of the sentence is derived;
3) A stage of obtaining enhanced syntactic features under the local context with domain knowledge: Tokenizer4Bert converts input of the form [CLS] + text + [SEP] into a vector, which is padded to the same length by pad_and_truncate, giving E = {w_1, ..., w_i, ..., w_a1, ..., w_ai, ..., w_k}, where k is the set maximum length, w_i denotes the (i+1)-th word, and w_ai is the i-th aspect term; the sentence heterogeneous graph is likewise padded with np to size k×k, giving A^h ∈ R^{k×k}; E is input into the domain pre-trained BERT-ADA to obtain the sentence vector representation H^{bert},
where BERT-ADA is a BERT model obtained by fine-tuning on the Amazon laptop review dataset and the Yelp Dataset Challenge review corpus; the relative distance between each token and the aspect term is obtained from the position of each token and the position of the aspect term: first a weighting matrix V of all 1s is initialized, the aspect-term length asp_len and its starting position asp_begin are obtained, and the average center position of the aspect term is computed as Avg_a = (asp_begin + asp_len)/2; the relative distance between each context word and the aspect term in the sentence is then
P_i = |i − Avg_a| − asp_len/2,
the BERT-encoded sentence vector is further weighted using the relative distance: if P_i is less than the set threshold 3, the word's own semantic information is kept; if it is greater than the threshold 3, a weighting value V_i = 1 − (P_i − 3)/k is constructed for the semantically less relevant context word
to down-weight its features; the weighting matrix of the input sequence V = [V_0, V_1, ..., V_k] is updated according to the semantic relative distance of the words, and the preliminary BERT_ADA representation H^{bert} is passed through a mul() operation with the weight matrix V, i.e. H_l = H^{bert} · V, where H_l is the output of the local dynamic weight layer; a graph convolutional network (GCN) is used, taking the feature representation H_l of the local context with domain knowledge and the enhanced syntactic heterogeneous graph matrix A_h as input, and the enhanced syntactic dependency information under the local context with domain knowledge is then obtained via the activation function ReLU:
H_s_loc = ReLU(GCN(A_h, H_l, W)),
where the GCN formula is H^(l) = σ(A_h H^(l−1) W^(l−1) + b^(l−1)); W^(l−1) and b^(l−1) are the linear transformation weight and bias parameters of layer l−1 of the model, σ is a nonlinear function usually set to ReLU, and the initial input H^(0) is the sentence representation H_l;
4) A global semantic graph feature construction stage: the comment text and aspect term are converted with "[CLS] + text + [SEP] + aspect + [SEP]" using Tokenizer4Bert to obtain the vector representation text_bert_indices; an index representation is generated to distinguish the comment text from the aspect term, with the positions of the first half [CLS] + text + [SEP] represented by 0 and the positions of aspect + [SEP] represented by 1, giving the bert_segments_indices vector; text_bert_indices and bert_segments_indices are input into the domain pre-trained BERT-ADA to obtain the vector representation H_g of the global sentence; H_g is then input into multi-head attention, where each attention head obtains a feature M^se_i = softmax((H_g W_i^Q)(H_g W_i^K)^T / √d_k);
the attention matrices of the h heads are combined and divided by h to obtain the semantic matrix M^se ∈ R^{k×k},
which fully captures the semantic information of each word in the global sentence; to prevent overfitting, M^se = Dropout(M^se) is obtained through a Dropout layer; when constructing the semantic graph, the values on the diagonal of M^se are set to 0 using torch.diag and the diagonal elements are then set to 1 using torch.eye, since the semantic relevance of each word to itself is one hundred percent; this yields the sentence global semantic graph with neighborhood knowledge, i.e. the input of the semantic GCN, H_glo = ReLU(GCN(M^se, H_g, W)), and the global semantic information feature H_glo is extracted and updated by the graph convolutional network;
5) A feature adaptive fusion stage: the enhanced syntactic dependency information H_s_loc under the local context with domain knowledge is spliced with the global semantic information H_glo, i.e. X = torch.cat((H_s_loc, H_glo)), to obtain the enhanced syntactic information and global semantic information of the sentence under the local context considering domain knowledge; this is passed through a residual multi-layer perceptron and then input into a self-attention layer for adaptive fusion, obtaining a feature representation suitable for the task;
6) A feature vector output stage: the fused feature vector is subjected to a BERT pooling operation and the final vector representation is output; the positive, negative and neutral emotion polarity probabilities are obtained through a softmax classifier;
7) A model training stage: the network is optimized by the Adam algorithm with the cross-entropy loss function as the loss function, i.e., the goal of training the classifier is to minimize the cross-entropy loss between the predicted emotion distribution and the true emotion distribution:
L = −Σ_{i=1}^{S} Σ_{j=1}^{C} ŷ_i^j log(y_i^j) + λ‖Θ‖²,
where S is the number of training samples, C is the number of polarity classes, ŷ is the true emotion distribution of the sample, y is the predicted emotion distribution, λ is the weight of the L2 regularization term, and Θ represents all trainable parameters.
CN202210922723.1A 2022-08-02 2022-08-02 Knowledge-enhanced syntactic heteromorphic graph-based aspect-level emotion classification method Withdrawn CN115269847A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210922723.1A CN115269847A (en) 2022-08-02 2022-08-02 Knowledge-enhanced syntactic heteromorphic graph-based aspect-level emotion classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210922723.1A CN115269847A (en) 2022-08-02 2022-08-02 Knowledge-enhanced syntactic heteromorphic graph-based aspect-level emotion classification method

Publications (1)

Publication Number Publication Date
CN115269847A true CN115269847A (en) 2022-11-01

Family

ID=83747408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210922723.1A Withdrawn CN115269847A (en) 2022-08-02 2022-08-02 Knowledge-enhanced syntactic heteromorphic graph-based aspect-level emotion classification method

Country Status (1)

Country Link
CN (1) CN115269847A (en)


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090450A (en) * 2022-11-28 2023-05-09 荣耀终端有限公司 Text processing method and computing device
CN115659951A (en) * 2022-12-26 2023-01-31 华南师范大学 Statement emotion analysis method, device and equipment based on label embedding
CN116561592A (en) * 2023-07-11 2023-08-08 航天宏康智能科技(北京)有限公司 Training method of text emotion recognition model, text emotion recognition method and device
CN116561592B (en) * 2023-07-11 2023-09-29 航天宏康智能科技(北京)有限公司 Training method of text emotion recognition model, text emotion recognition method and device
CN116662554A (en) * 2023-07-26 2023-08-29 之江实验室 Infectious disease aspect emotion classification method based on heterogeneous graph convolution neural network
CN116662554B (en) * 2023-07-26 2023-11-14 之江实验室 Infectious disease aspect emotion classification method based on heterogeneous graph convolution neural network
CN117171610A (en) * 2023-08-03 2023-12-05 江南大学 Knowledge enhancement-based aspect emotion triplet extraction method and system
CN117171610B (en) * 2023-08-03 2024-05-03 江南大学 Knowledge enhancement-based aspect emotion triplet extraction method and system
CN117371456A (en) * 2023-10-10 2024-01-09 国网江苏省电力有限公司南通供电分公司 Multi-mode irony detection method and system based on feature fusion
CN117933372A (en) * 2024-03-22 2024-04-26 山东大学 Data enhancement-oriented vocabulary combined knowledge modeling method and device
CN117933372B (en) * 2024-03-22 2024-06-07 山东大学 Data enhancement-oriented vocabulary combined knowledge modeling method and device

Similar Documents

Publication Publication Date Title
CN115269847A (en) Knowledge-enhanced syntactic heteromorphic graph-based aspect-level emotion classification method
CN110807154B (en) Recommendation method and system based on hybrid deep learning model
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
US11132512B2 (en) Multi-perspective, multi-task neural network model for matching text to program code
Neubig Neural machine translation and sequence-to-sequence models: A tutorial
CN110674850A (en) Image description generation method based on attention mechanism
US11410031B2 (en) Dynamic updating of a word embedding model
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN112784532B (en) Multi-head attention memory system for short text sentiment classification
CN108549703B (en) Mongolian language model training method based on recurrent neural network
CN115048447B (en) Database natural language interface system based on intelligent semantic completion
Bokka et al. Deep Learning for Natural Language Processing: Solve your natural language processing problems with smart deep neural networks
Lebret Word embeddings for natural language processing
Grzegorczyk Vector representations of text data in deep learning
Chen et al. Deep neural networks for multi-class sentiment classification
CN111507093A (en) Text attack method and device based on similar dictionary and storage medium
CN111581365B (en) Predicate extraction method
CN112100342A (en) Knowledge graph question-answering method based on knowledge representation learning technology
CN114997155A (en) Fact verification method and device based on table retrieval and entity graph reasoning
Gupta A review of generative AI from historical perspectives
CN113157892A (en) User intention processing method and device, computer equipment and storage medium
Dasgupta et al. A Review of Generative AI from Historical Perspectives
Kreyssig Deep learning for user simulation in a dialogue system
Narayanaperumal Deep Neural Networks for Sentiment Analysis in Tweets with Emoticons
Lin Deep neural networks for natural language processing and its acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221101