CN116049393A - Aspect-level text emotion classification method based on GCN - Google Patents

Aspect-level text emotion classification method based on GCN

Info

Publication number
CN116049393A
Authority
CN
China
Prior art keywords
gcn
module
grammar
matrix
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211650414.XA
Other languages
Chinese (zh)
Inventor
龙昭华
王高远
张�林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202211650414.XA priority Critical patent/CN116049393A/en
Publication of CN116049393A publication Critical patent/CN116049393A/en
Pending legal-status Critical Current


Classifications

    • G06F16/353: Information retrieval of unstructured textual data; clustering; classification into predefined classes
    • G06F40/253: Handling natural language data; natural language analysis; grammatical analysis; style critique
    • G06F40/30: Handling natural language data; semantic analysis
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods

Abstract

The invention discloses an aspect-level text emotion classification method based on a graph convolutional neural network (GCN), which comprises the following steps: (1) preprocessing: given the text of a sentence-aspect pair, use BERT as the sentence encoder to extract hidden context representations and generate hidden state vectors; (2) feed the hidden state vectors of the sentence into a grammar GCN module and a semantic GCN module respectively for feature learning; (3) use a BiAffine module to realize effective information flow, i.e., to exchange grammar and semantic features; (4) apply average pooling and concatenation operations on the aspect nodes of the grammar GCN and semantic GCN modules to obtain the final feature representation and realize aspect-oriented emotion classification.

Description

Aspect-level text emotion classification method based on GCN
Technical Field
The invention belongs to the field of natural language processing and relates to an aspect-level text emotion classification method based on a graph convolutional neural network (GCN).
Background
Social media has developed rapidly around the world, and emotion analysis has become a basic task in the field of natural language processing. Analyzing text that carries users' emotional viewpoints with natural language processing technology and mining the emotional tendencies contained in it has become an important means of social public-opinion supervision and of after-sales feedback for manufacturers. Research on text emotion analysis methods therefore has important social significance and commercial value.
According to the granularity of the text studied, emotion analysis can be divided into document-level, sentence-level, and aspect-level emotion classification. Early emotion analysis mainly performed coarse-grained analysis of document-level and sentence-level text. Document-level emotion classification tags the emotional tendency/polarity of an entire opinionated document, i.e., determines whether the document as a whole conveys a positive or negative opinion; because this is too coarse, it cannot describe the emotional tendencies of the text accurately. Sentence-level emotion classification judges the emotion polarity of subjective sentences. However, coarse-grained emotion analysis assumes that a text contains only a single emotion, such as positive or negative, and cannot identify emotions in text covering multiple aspects. Aspect-level text emotion analysis, with its fine granularity, can accurately judge the emotion polarities of different aspects within a single sentence, and has therefore become an important research direction in the field of emotion analysis.
In current research on aspect-level emotion classification, the task is mainly addressed by modeling the semantic associations between the context and the aspect terms with attention-based neural networks. Wang et al. used an attention mechanism to focus on different parts of the sentence and generate attention vectors for aspect emotion classification; Chen et al. proposed a multi-layer attention network to infer the emotion polarity of an aspect; Ma et al. introduced an interactive attention mechanism that generates representations of the aspect and the context separately; Wang et al. designed an aspect-oriented hierarchical attention model for aspect emotion classification. Another trend is to use dependency trees, whose syntactic information can link aspects to the corresponding opinion words; GCNs built on dependency trees have achieved good results in ABSA. Zhang et al. (2019) and Sun et al. (2019) stacked GCN layers to extract rich representations over the dependency tree; Liang et al. (2020) constructed aspect-oriented graphs to learn aspect-specific emotional characteristics; Liang et al. built an emotion-enhanced graph by integrating the emotion knowledge of SenticNet, considering the emotion information between opinion words and aspect words; Tian et al. (2021) used dependency types to distinguish different relations in the dependency tree. However, these methods typically ignore the effective fusion of syntactic structures and semantic associations that would yield richer information.
The defects of the existing methods are as follows:
(1) Sentences have different sensitivities to grammatical information and to semantic information. In particular, sentences whose grammatical structure is not obvious have low sensitivity to grammatical information, meaning that in some cases grammatical information may not help the model judge the emotion polarity of the sentence.
(2) The syntactic structure is not fully utilized, and only the information of neighboring nodes is considered. In addition, some sentences express the emotion of the aspect word in a vague manner, and the aspect word has no direct syntactic relation to the opinion word. Most methods stack multiple GCN layers to reach the opinion-word representation, which introduces potential noise.
CN114791950A discloses an aspect-level emotion classification method and device based on part-of-speech position and graph convolutional network. The method includes: obtaining the word-vector representation of the sentence text where a word is located according to the part-of-speech position information of the word; generating, for each target sentence, an enhanced syntactic dependency tree that integrates part-of-speech position information and graph convolutional network information; and realizing emotion classification by learning the interaction information between aspect words and context.
The present invention differs from CN114791950A in that CN114791950A enhances the syntactic dependency-tree feature information by fusing part-of-speech positions, whereas the present invention not only extracts syntactic feature information but also extracts semantic feature information through aspect attention and self-attention, and further improves the aspect-level emotion classification effect by fusing syntactic and semantic features, achieving better text emotion analysis accuracy.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing an aspect-level text emotion classification method based on GCN. The technical scheme of the invention is as follows:
An aspect-level text emotion classification method based on GCN, comprising the following steps:
Step a: acquire the sentence-aspect pairs of the aspect-level emotion classification task, and use a BERT encoder (pre-trained language model) as the sentence encoder to extract hidden context representations and generate hidden state vectors;
Step b: feed the hidden state vectors of the sentence obtained in step a into a grammar GCN (graph convolutional neural network) module and a semantic GCN module respectively; the grammar GCN module builds an adjacency matrix from the dependency tree and then extracts sentence grammar features with the graph convolutional network, while the semantic GCN module extracts better semantic features by integrating an aspect-aware attention matrix and a self-attention matrix;
Step c: use a BiAffine module to realize effective information flow; the BiAffine module effectively exchanges relevant features between the SynGCN and SemGCN modules through mutual BiAffine transformation;
Step d: aggregate all aspect node representations from the grammar GCN and semantic GCN modules by pooling and concatenation to form the final aspect representation, realizing aspect-word-oriented emotion classification.
Further, the step a specifically includes:
Given a sentence-aspect pair (s, a), where s = {w_1, w_2, ..., w_n} is the sentence and a = {a_1, a_2, ..., a_m} is an aspect word and also a subsequence of sentence s, a belonging to a predefined set of aspects; w_n represents a word in the given sentence s, and a_m is a preset aspect word in a. The BERT encoder takes "[CLS] sentence [SEP] aspect [SEP]" as input text, where [CLS] is the special classification token in BERT and [SEP] is the separator token in BERT. The output of the BERT encoder is as shown in formula (1):

H = [h_0, h_1, h_2, ..., h_m, h_{m+1}]  #(1)

BERT is an architecture that can be used for many downstream tasks, such as question answering, classification, and NER. The pre-trained BERT can be regarded as a black box that provides a vector of H = 768 dimensions for each input token (word) in the sequence. The sequence may be a single sentence or a pair of sentences separated by the delimiter [SEP] and starting with the token [CLS]. h_m denotes the context representation of the m-th token obtained after BERT encoding.
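For illustration, this encoding step can be sketched with the HuggingFace transformers library as follows; the model name bert-base-uncased and the example sentence are illustrative assumptions, not fixed by the invention:

    # Minimal sketch of step a: encoding a sentence-aspect pair with BERT.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    bert = BertModel.from_pretrained("bert-base-uncased")

    sentence = "The food was great but the service was slow"
    aspect = "service"

    # Passing a text pair makes the tokenizer build
    # "[CLS] sentence [SEP] aspect [SEP]", matching formula (1).
    inputs = tokenizer(sentence, aspect, return_tensors="pt")
    with torch.no_grad():
        H = bert(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
    print(H.shape)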
Further, in said step b, the grammar GCN module converts the dependency tree into a graph structure G^syn = (A^syn, H), where A^syn is the adjacency matrix; grammar information is then extracted with a graph convolutional network, with the formulas as follows:

Ã^syn = D^{-1}(A^syn + I)  #(2)

H^syn(0) = H_c  #(3)

H^syn(l+1) = ReLU(Ã^syn H^syn(l) W^(l+1) + b^(l+1))  #(4)

H^syn = H^syn(L)  #(5)

where W^(l+1) is the weight of layer l+1, and ReLU is a piecewise linear function that changes all negative values to 0 while leaving positive values unchanged; H^syn(l+1) is the vector representation of layer l+1 of the grammar GCN, and H_c is the feature matrix output by the Bi-LSTM or BERT encoder, which serves as the input of the first GCN layer; W^(l) ∈ R^{d_lstm × d_gcn} is the learnable matrix of the layer-l GCN, d_lstm is the dimension of the hidden representation learned by Bi-LSTM, and d_gcn is the dimension of the GCN layer output. Through the l-step convolution operation, each node iteratively aggregates information from its one-hop neighbors and updates its representation, so that the grammar graph convolution module integrates the syntax information into the final representation H^syn.
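As a minimal PyTorch sketch of this graph convolution (the (degree + 1) normalization, the module name, and the toy adjacency matrix are illustrative assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SynGCNLayer(nn.Module):
        """One graph-convolution layer over the dependency adjacency matrix,
        a sketch of formula (4)."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.W = nn.Linear(in_dim, out_dim)

        def forward(self, H, A):
            # A: (batch, n, n) adjacency from the dependency tree (with self-loops)
            # H: (batch, n, in_dim) node representations from BERT/Bi-LSTM
            deg = A.sum(dim=-1, keepdim=True).clamp(min=1)  # node degrees
            agg = torch.bmm(A, self.W(H)) / deg             # aggregate one-hop neighbors
            return F.relu(agg)                              # ReLU zeroes negatives

    # Usage: two stacked layers, matching the experiments (number of GCN layers = 2).
    H = torch.randn(1, 5, 768)       # 5 tokens, BERT hidden size 768
    A = torch.eye(5).unsqueeze(0)    # toy adjacency: self-loops only
    H_syn = SynGCNLayer(300, 300)(SynGCNLayer(768, 300)(H, A), A)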
In the semantic GCN module, the attention matrix integrates the aspect-aware attention matrix and the self-attention matrix to obtain better semantic features, wherein:

H_a = repeat(f(h_{a_1}, ..., h_{a_m}), n)  #(6)

A^asp_i = softmax((H_a W_a)(K W_k)^T / √d + b)  #(7)

where H_a is the aspect-aware feature matrix and b is the bias; K is equal to the H generated by the coding layer, and W_a ∈ R^{d×d}, W_k ∈ R^{d×d} are learnable weight matrices (R^{d×d} denotes a matrix of dimension d×d). H_a ∈ R^{n×d}, obtained by average-pooling the aspect representation and replicating it n times, serves as the aspect word representation. A p-head aspect-aware attention is used to obtain the attention score matrix of a sentence, with A^asp_i denoting the matrix obtained through the i-th attention head;

A^self = softmax((Q W^Q)(K W^K)^T / √d)  #(8)

A^self is constructed with self-attention, which captures the interaction between two arbitrary words in a single sentence, where Q and K are both equal to the H generated by the coding layer, and W^Q ∈ R^{d×d}, W^K ∈ R^{d×d} are learnable parameters. Then the aspect-aware attention matrix and the self-attention matrix are integrated:

A_i = (A^asp_i + A^self) / 2  #(9)

A_i ∈ R^{n×n} denotes the integrated attention score matrix and serves as the adjacency input of the semantic graph convolution:

H^sem(l+1) = ReLU(A_i H^sem(l) W^(l+1) + b^(l+1))  #(10)

yielding the semantic feature representation H^sem.
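A minimal single-head sketch of the attention integration in formulas (6)-(9) follows; the patent uses p heads, and the function and parameter names here are illustrative assumptions:

    import math
    import torch
    import torch.nn.functional as F

    def semantic_attention(H, aspect_mask, W_a, W_k, W_Q, W_K):
        """Combine aspect-aware attention with self-attention (single head)."""
        n, d = H.shape
        # Aspect-aware branch: mean-pool the aspect tokens, copy n times (H_a).
        H_a = H[aspect_mask].mean(dim=0, keepdim=True).expand(n, d)
        A_asp = F.softmax((H_a @ W_a) @ (H @ W_k).T / math.sqrt(d), dim=-1)
        # Self-attention branch: Q and K are both the encoder output H.
        A_self = F.softmax((H @ W_Q) @ (H @ W_K).T / math.sqrt(d), dim=-1)
        return (A_asp + A_self) / 2   # integrated attention score matrix A_i

    n, d = 5, 768
    H = torch.randn(n, d)
    mask = torch.tensor([False, False, True, True, False])  # aspect token positions
    params = [torch.randn(d, d) * 0.02 for _ in range(4)]
    A_i = semantic_attention(H, mask, *params)   # (n, n), rows sum to 1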
Further, in the step c, in order to effectively exchange relevant features between the grammar GCN module and the semantic GCN module, mutual BiAffine transformation is performed as follows:

H^syn' = softmax(H^syn W_1 (H^sem)^T) H^sem  #(11)

H^sem' = softmax(H^sem W_2 (H^syn)^T) H^syn  #(12)

where W_1 and W_2 are trainable parameters; H^syn' and H^sem' are the feature vectors obtained after the BiAffine transformation, H^sem is the feature vector obtained by the semantic GCN module, and H^syn is the feature vector obtained by the grammar GCN module.
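The mutual BiAffine transformation of formulas (11) and (12) can be sketched directly, assuming plain tensors for the trainable parameters W_1 and W_2:

    import torch
    import torch.nn.functional as F

    def biaffine_exchange(H_syn, H_sem, W1, W2):
        """Mutual BiAffine transformation: each branch attends over the
        other branch's features, as in formulas (11) and (12)."""
        H_syn_p = F.softmax(H_syn @ W1 @ H_sem.T, dim=-1) @ H_sem  # formula (11)
        H_sem_p = F.softmax(H_sem @ W2 @ H_syn.T, dim=-1) @ H_syn  # formula (12)
        return H_syn_p, H_sem_p

    n, d = 5, 300
    H_syn, H_sem = torch.randn(n, d), torch.randn(n, d)
    W1, W2 = torch.randn(d, d) * 0.02, torch.randn(d, d) * 0.02
    H_syn_p, H_sem_p = biaffine_exchange(H_syn, H_sem, W1, W2)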
Further, in the step d, an average pooling and concatenation operation is applied to the aspect nodes of the grammar GCN and semantic GCN modules to obtain the final feature representation, which specifically includes:

h_a^syn = f(h^syn_{a_1}, ..., h^syn_{a_m})  #(13)

h_a^sem = f(h^sem_{a_1}, ..., h^sem_{a_m})  #(14)

r = [h_a^syn ; h_a^sem]  #(15)

where h_a^syn and h_a^sem denote the grammar feature representation and the semantic feature representation obtained by the average pooling function, r is the matrix obtained by concatenating h_a^syn and h_a^sem, and f(·) is the average pooling function applied to the aspect node representations. The resulting representation r is then fed into a linear layer followed by a softmax function to obtain the emotion polarity probability distribution p, namely:

p(a) = softmax(W_p r + b_p)  #(16)

where W_p and b_p are a learnable weight and bias.
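A minimal sketch of formulas (13)-(16), assuming a boolean mask that marks the aspect token positions:

    import torch
    import torch.nn as nn

    def classify_aspect(H_syn, H_sem, aspect_mask, linear):
        """Average-pool the aspect nodes of each branch, concatenate,
        then apply a linear layer and softmax."""
        h_syn = H_syn[aspect_mask].mean(dim=0)   # f(.) over syntactic aspect nodes
        h_sem = H_sem[aspect_mask].mean(dim=0)   # f(.) over semantic aspect nodes
        r = torch.cat([h_syn, h_sem], dim=-1)    # final aspect representation
        return torch.softmax(linear(r), dim=-1)  # polarity distribution p(a)

    d = 300
    linear = nn.Linear(2 * d, 3)                 # 3 polarities: pos/neu/neg
    mask = torch.tensor([False, False, True, True, False])
    p = classify_aspect(torch.randn(5, d), torch.randn(5, d), mask, linear)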
The invention has the following advantages and beneficial effects:
The invention discloses a GCN-based aspect-level text emotion classification method that combines self-attention and an aspect-aware attention mechanism in the semantic GCN module to obtain the attention score matrix of a sentence, so that both aspect-related semantics and global semantics can be learned. In the grammar GCN module, graph convolution over the dependency tree structure is used to learn grammar information; grammar and semantic GCN features are shared for aspect-level emotion classification, which reduces errors introduced by dependency parsing and improves the sensitivity of sentences to grammar and semantic information.
The invention mainly fuses grammar and semantic features, taking into account the complementarity of the grammar structure and the correlation of the semantics; the semantic GCN module skillfully combines aspect attention with self-attention and can thus better learn both aspect-related semantics and global semantics.
Drawings
FIG. 1 is a diagram of the overall architecture of an aspect level text emotion classification method based on GCN in accordance with a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.
The technical scheme adopted by the invention to solve the above technical problems is as follows:
An aspect-level text emotion classification method based on a graph convolutional neural network (GCN), comprising the following steps:
Step a: acquire the sentence-aspect pairs of the aspect-level emotion classification task, and use a BERT encoder (pre-trained language model) as the sentence encoder to extract hidden context representations and generate hidden state vectors;
Step b: feed the hidden state vectors of the sentence obtained in step a into a grammar GCN (graph convolutional neural network) module and a semantic GCN module respectively; the grammar GCN module builds an adjacency matrix from the dependency tree and then extracts sentence grammar features with the graph convolutional network, while the semantic GCN module extracts better semantic features by integrating an aspect-aware attention matrix and a self-attention matrix;
Step c: use a BiAffine module to realize effective information flow; the BiAffine module effectively exchanges relevant features between the SynGCN and SemGCN modules through mutual BiAffine transformation;
Step d: aggregate all aspect node representations from the grammar GCN and semantic GCN modules by pooling and concatenation to form the final aspect representation, realizing aspect-word-oriented emotion classification.
Further, the step a specifically includes:
Given a sentence-aspect pair (s, a), where s = {w_1, w_2, ..., w_n} is the sentence and a = {a_1, a_2, ..., a_m} is an aspect word and also a subsequence of sentence s, a belonging to a predefined set of aspects; w_n represents a word in the given sentence s, and a_m is a preset aspect word in a. The BERT encoder takes "[CLS] sentence [SEP] aspect [SEP]" as input text, where [CLS] is the special classification token in BERT and [SEP] is the separator token in BERT. The output of the BERT encoder is as shown in formula (1):

H = [h_0, h_1, h_2, ..., h_m, h_{m+1}]  #(1)

BERT is an architecture that can be used for many downstream tasks, such as question answering, classification, and NER. The pre-trained BERT can be regarded as a black box that provides a vector of H = 768 dimensions for each input token (word) in the sequence. The sequence may be a single sentence or a pair of sentences separated by the delimiter [SEP] and starting with the token [CLS]. h_m denotes the context representation of the m-th token obtained after BERT encoding.
Further, in said step b, the grammar GCN module converts the dependency tree into a graph structure G^syn = (A^syn, H), where A^syn is the adjacency matrix; grammar information is then extracted with a graph convolutional network, with the formulas as follows:

Ã^syn = D^{-1}(A^syn + I)  #(2)

H^syn(0) = H_c  #(3)

H^syn(l+1) = ReLU(Ã^syn H^syn(l) W^(l+1) + b^(l+1))  #(4)

H^syn = H^syn(L)  #(5)

where W^(l+1) is the weight of layer l+1, and ReLU is a piecewise linear function that changes all negative values to 0 while leaving positive values unchanged; H^syn(l+1) is the vector representation of layer l+1 of the grammar GCN, and H_c is the feature matrix output by the Bi-LSTM or BERT encoder, which serves as the input of the first GCN layer; W^(l) ∈ R^{d_lstm × d_gcn} is the learnable matrix of the layer-l GCN, d_lstm is the dimension of the hidden representation learned by Bi-LSTM, and d_gcn is the dimension of the GCN layer output. Through the l-step convolution operation, each node iteratively aggregates information from its one-hop neighbors and updates its representation, so that the grammar graph convolution module integrates the syntax information into the final representation H^syn.
In the semantic GCN module, the attention matrix integrates the aspect-aware attention matrix and the self-attention matrix to obtain better semantic features, wherein:

H_a = repeat(f(h_{a_1}, ..., h_{a_m}), n)  #(6)

A^asp_i = softmax((H_a W_a)(K W_k)^T / √d + b)  #(7)

where H_a is the aspect-aware feature matrix and b is the bias; K is equal to the H generated by the coding layer, and W_a ∈ R^{d×d}, W_k ∈ R^{d×d} are learnable weight matrices (R^{d×d} denotes a matrix of dimension d×d). H_a ∈ R^{n×d}, obtained by average-pooling the aspect representation and replicating it n times, serves as the aspect word representation. A p-head aspect-aware attention is used to obtain the attention score matrix of a sentence, with A^asp_i denoting the matrix obtained through the i-th attention head;

A^self = softmax((Q W^Q)(K W^K)^T / √d)  #(8)

A^self is constructed with self-attention, which captures the interaction between two arbitrary words in a single sentence, where Q and K are both equal to the H generated by the coding layer, and W^Q ∈ R^{d×d}, W^K ∈ R^{d×d} are learnable parameters. Then the aspect-aware attention matrix and the self-attention matrix are integrated:

A_i = (A^asp_i + A^self) / 2  #(9)

A_i ∈ R^{n×n} denotes the integrated attention score matrix and serves as the adjacency input of the semantic graph convolution:

H^sem(l+1) = ReLU(A_i H^sem(l) W^(l+1) + b^(l+1))  #(10)

yielding the semantic feature representation H^sem.
Further, in the step c, in order to effectively exchange relevant features between the grammar GCN module and the semantic GCN module, mutual BiAffine transformation is performed as follows:

H^syn' = softmax(H^syn W_1 (H^sem)^T) H^sem  #(11)

H^sem' = softmax(H^sem W_2 (H^syn)^T) H^syn  #(12)

where W_1 and W_2 are trainable parameters; H^syn' and H^sem' are the feature vectors obtained after the BiAffine transformation, H^sem is the feature vector obtained by the semantic GCN module, and H^syn is the feature vector obtained by the grammar GCN module.
Further, in the step d, an average pooling and concatenation operation is applied to the aspect nodes of the grammar GCN and semantic GCN modules to obtain the final feature representation, which specifically includes:

h_a^syn = f(h^syn_{a_1}, ..., h^syn_{a_m})  #(13)

h_a^sem = f(h^sem_{a_1}, ..., h^sem_{a_m})  #(14)

r = [h_a^syn ; h_a^sem]  #(15)

where h_a^syn and h_a^sem denote the grammar feature representation and the semantic feature representation obtained by the average pooling function, r is the matrix obtained by concatenating h_a^syn and h_a^sem, and f(·) is the average pooling function applied to the aspect node representations. The resulting representation r is then fed into a linear layer followed by a softmax function to obtain the emotion polarity probability distribution p, namely:

p(a) = softmax(W_p r + b_p)  #(16)

where W_p and b_p are a learnable weight and bias.
Finally, the standard cross-entropy loss is used as the loss function:

L(θ) = - Σ_{(s,a)∈D} Σ_{c∈C} y_c log p_c(a)  #(17)

where D contains all sentence-aspect pairs, a denotes the aspect appearing in sentence s, θ denotes all trainable parameters, C is the set of emotion polarities, and y_c is the one-hot ground-truth label of polarity c.
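In PyTorch this loss is typically computed with F.cross_entropy, which fuses the softmax of formula (16) with the negative log-likelihood; the batch below is random illustrative data:

    import torch
    import torch.nn.functional as F

    # Sketch of formula (17): cross-entropy over the polarity logits.
    # F.cross_entropy applies log-softmax internally, so it takes the
    # pre-softmax logits W_p r + b_p rather than the probabilities p(a).
    logits = torch.randn(16, 3, requires_grad=True)  # batch of 16 sentence-aspect pairs
    labels = torch.randint(0, 3, (16,))              # gold polarities in C = {pos, neu, neg}
    loss = F.cross_entropy(logits, labels)
    loss.backward()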
The datasets used by the invention are the restaurant and laptop reviews in SemEval-2014 Task 4 (Pontiki et al., 2014) and the Twitter posts of Dong et al. (2014). Each aspect is labeled with one of three emotion polarities: positive, neutral, and negative. The statistics of the three datasets are shown in Table 1 below:
table 1 aspect Emotion Classification common dataset
Figure BDA00040102886600000911
/>
Figure BDA0004010288660000101
The invention evaluates the results with precision, recall, and F1 score, calculated by formulas (19)-(21) below:

P = TP / (TP + FP)  #(19)

R = TP / (TP + FN)  #(20)

F1 = 2 × P × R / (P + R)  #(21)

where TP denotes the number of positive-class samples predicted as the positive class, FN denotes the number of positive-class samples predicted as the negative class, and FP denotes the number of negative-class samples predicted as the positive class.
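These three formulas can be sketched as a small helper (the counts below are illustrative):

    # Sketch of formulas (19)-(21) for one class treated as "positive".
    def precision_recall_f1(tp: int, fp: int, fn: int):
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    print(precision_recall_f1(tp=80, fp=10, fn=20))  # approx. (0.889, 0.800, 0.842)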
The experimental environment of the invention is based on the PyTorch framework, and the model is trained on an NVIDIA Tesla P100 GPU. The English BERT-base pre-trained model is used as the text encoder, and the model is trained with the Adam optimizer. Word embeddings are initialized with the 300-dimensional GloVe vectors provided by Pennington et al. (2014). In addition, 30-dimensional part-of-speech (POS) embeddings and 30-dimensional position embeddings (the relative position of each word with respect to the aspect in the sentence) are used. The word embedding, POS embedding, and position embedding are then concatenated as the input word representation. All sentences are parsed with the Stanford parser. The batch size of all models is set to 16 and the number of GCN layers is 2. Furthermore, a dropout function is applied to the input word representation of the BiLSTM, with the dropout rate set to 0.3 and the learning rate set to 0.002 to optimize the parameters.
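For reference, the hyperparameters stated above can be collected into a configuration sketch (the key names are illustrative; Table 2 itself is an image in the original document):

    # Hyperparameters as stated in the text; key names are assumptions.
    config = {
        "encoder": "BERT-base (English)",
        "word_embedding": "GloVe, 300-dim",
        "pos_embedding_dim": 30,
        "position_embedding_dim": 30,
        "batch_size": 16,
        "num_gcn_layers": 2,
        "dropout": 0.3,
        "learning_rate": 0.002,
        "optimizer": "Adam",
    }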
Table 2. Hyperparameter settings (rendered as an image in the original document).
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims (6)

1. An aspect-level text emotion classification method based on GCN, characterized by comprising the following steps:
Step a: acquire the sentence-aspect pairs of the aspect-level emotion classification task, and use a BERT encoder (pre-trained language model) as the sentence encoder to extract hidden context representations and generate hidden state vectors;
Step b: feed the hidden state vectors of the sentence obtained in step a into a grammar GCN (graph convolutional neural network) module and a semantic GCN module respectively; the grammar GCN module builds an adjacency matrix from the dependency tree and then extracts sentence grammar features with the graph convolutional network, while the semantic GCN module extracts better semantic features by integrating an aspect-aware attention matrix and a self-attention matrix;
Step c: use a BiAffine module to realize effective information flow; the BiAffine module effectively exchanges relevant features between the SynGCN and SemGCN modules through mutual BiAffine transformation;
Step d: aggregate all aspect node representations from the grammar GCN and semantic GCN modules by pooling and concatenation to form the final aspect representation, realizing aspect-word-oriented emotion classification.
2. The GCN-based aspect-level text emotion classification method according to claim 1, wherein said step a specifically includes:
Given a sentence-aspect pair (s, a), where s = {w_1, w_2, ..., w_n} is the sentence and a = {a_1, a_2, ..., a_m} is an aspect word and also a subsequence of sentence s, a belonging to a predefined set of aspects; w_n represents a word in the given sentence s, and a_m is a preset aspect word in a. The BERT encoder takes "[CLS] sentence [SEP] aspect [SEP]" as input text, where [CLS] is the special classification token in BERT and [SEP] is the separator token in BERT; the output of the BERT encoder is as shown in formula (1):

H = [h_0, h_1, h_2, ..., h_m, h_{m+1}]  #(1)

BERT is an architecture that can be used for many downstream tasks, including question answering, classification, and NER; the pre-trained BERT can be regarded as a black box that provides a vector of H = 768 dimensions for each input token (word) in a sequence, which can be a single sentence or a pair of sentences separated by the delimiter [SEP] and starting with the token [CLS]; h_m denotes the context representation of the m-th token obtained after BERT encoding.
3. The GCN-based aspect-level text emotion classification method according to claim 1, wherein in said step b the grammar GCN module converts the dependency tree into a graph structure G^syn = (A^syn, H), where A^syn is the adjacency matrix, and then extracts grammar information with a graph convolutional network, with the formulas as follows:

Ã^syn = D^{-1}(A^syn + I)  #(2)

H^syn(0) = H_c  #(3)

H^syn(l+1) = ReLU(Ã^syn H^syn(l) W^(l+1) + b^(l+1))  #(4)

H^syn = H^syn(L)  #(5)

where W^(l+1) is the weight of layer l+1, and ReLU is a piecewise linear function that changes all negative values to 0 while leaving positive values unchanged; H^syn(l+1) is the vector representation of layer l+1 of the grammar GCN, and H_c is the feature matrix output by the Bi-LSTM or BERT encoder, which serves as the input of the first GCN layer; W^(l) ∈ R^{d_lstm × d_gcn} is the learnable matrix of the layer-l GCN, d_lstm is the dimension of the hidden representation learned by Bi-LSTM, and d_gcn is the dimension of the GCN layer output; through the l-step convolution operation, each node iteratively aggregates information from its one-hop neighbors and updates its representation, so that the grammar graph convolution module integrates the syntax information into the final representation H^syn.
4. The GCN-based aspect-level text emotion classification method according to claim 3, wherein in the semantic GCN module the attention matrix integrates the aspect-aware attention matrix and the self-attention matrix to obtain better semantic features, wherein:

H_a = repeat(f(h_{a_1}, ..., h_{a_m}), n)  #(6)

A^asp_i = softmax((H_a W_a)(K W_k)^T / √d + b)  #(7)

where H_a is the aspect-aware feature matrix and b is the bias; K is equal to the H generated by the coding layer, and W_a ∈ R^{d×d}, W_k ∈ R^{d×d} are learnable weight matrices (R^{d×d} denotes a matrix of dimension d×d); H_a ∈ R^{n×d}, obtained by average-pooling the aspect representation and replicating it n times, serves as the aspect word representation; a p-head aspect-aware attention is used to obtain the attention score matrix of a sentence, with A^asp_i denoting the matrix obtained through the i-th attention head;

A^self = softmax((Q W^Q)(K W^K)^T / √d)  #(8)

A^self is constructed with self-attention, which captures the interaction between two arbitrary words in a single sentence, where Q and K are both equal to the H generated by the coding layer, and W^Q ∈ R^{d×d}, W^K ∈ R^{d×d} are learnable parameters; then the aspect-aware attention matrix and the self-attention matrix are integrated:

A_i = (A^asp_i + A^self) / 2  #(9)

A_i ∈ R^{n×n} denotes the integrated attention score matrix and serves as the adjacency input of the semantic graph convolution:

H^sem(l+1) = ReLU(A_i H^sem(l) W^(l+1) + b^(l+1))  #(10)

yielding the semantic feature representation H^sem.
5. The GCN-based aspect-level text emotion classification method according to claim 4, wherein in said step c, in order to effectively exchange relevant features between the grammar GCN module and the semantic GCN module, mutual BiAffine transformation is performed as follows:

H^syn' = softmax(H^syn W_1 (H^sem)^T) H^sem  #(11)

H^sem' = softmax(H^sem W_2 (H^syn)^T) H^syn  #(12)

where W_1 and W_2 are trainable parameters; H^syn' and H^sem' are the feature vectors obtained after the BiAffine transformation, H^sem denotes the feature vector obtained by the semantic GCN module, and H^syn denotes the feature vector obtained by the grammar GCN module.
6. The GCN-based aspect-level text emotion classification method according to claim 5, wherein in said step d an average pooling and concatenation operation is applied to the aspect nodes of the grammar GCN and semantic GCN modules to obtain the final feature representation, which specifically includes:

h_a^syn = f(h^syn_{a_1}, ..., h^syn_{a_m})  #(13)

h_a^sem = f(h^sem_{a_1}, ..., h^sem_{a_m})  #(14)

r = [h_a^syn ; h_a^sem]  #(15)

where h_a^syn and h_a^sem denote the grammar feature representation and the semantic feature representation obtained by the average pooling function, r is the matrix obtained by concatenating h_a^syn and h_a^sem, and f(·) is the average pooling function applied to the aspect node representations; the resulting representation r is then fed into a linear layer followed by a softmax function to obtain the emotion polarity probability distribution p, namely:

p(a) = softmax(W_p r + b_p)  #(16)

where W_p and b_p are a learnable weight and bias.
CN202211650414.XA 2022-12-21 2022-12-21 Aspect-level text emotion classification method based on GCN Pending CN116049393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211650414.XA CN116049393A (en) 2022-12-21 2022-12-21 Aspect-level text emotion classification method based on GCN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211650414.XA CN116049393A (en) 2022-12-21 2022-12-21 Aspect-level text emotion classification method based on GCN

Publications (1)

Publication Number Publication Date
CN116049393A true CN116049393A (en) 2023-05-02

Family

ID=86117321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211650414.XA Pending CN116049393A (en) 2022-12-21 2022-12-21 Aspect-level text emotion classification method based on GCN

Country Status (1)

Country Link
CN (1) CN116049393A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117473083A (en) * 2023-09-30 2024-01-30 齐齐哈尔大学 Aspect-level emotion classification model based on prompt knowledge and hybrid neural network


Similar Documents

Publication Publication Date Title
WO2021233112A1 (en) Multimodal machine learning-based translation method, device, equipment, and storage medium
CN113255755B (en) Multi-modal emotion classification method based on heterogeneous fusion network
CN111401077B (en) Language model processing method and device and computer equipment
Arshad et al. Aiding intra-text representations with visual context for multimodal named entity recognition
CN107066464A (en) Semantic Natural Language Vector Space
CN111159409B (en) Text classification method, device, equipment and medium based on artificial intelligence
CN114339450B (en) Video comment generation method, system, device and storage medium
CN111598183A (en) Multi-feature fusion image description method
CN111368082A (en) Emotion analysis method for domain adaptive word embedding based on hierarchical network
Huang et al. C-Rnn: a fine-grained language model for image captioning
CN114443899A (en) Video classification method, device, equipment and medium
Gandhi et al. Multimodal sentiment analysis: review, application domains and future directions
Luo et al. A thorough review of models, evaluation metrics, and datasets on image captioning
CN116049393A (en) Aspect-level text emotion classification method based on GCN
Le-Hong Diacritics generation and application in hate speech detection on Vietnamese social networks
CN117132923A (en) Video classification method, device, electronic equipment and storage medium
CN117033626A (en) Text auditing method, device, equipment and storage medium
CN116414988A (en) Graph convolution aspect emotion classification method and system based on dependency relation enhancement
Qi et al. Video captioning via a symmetric bidirectional decoder
CN115129807A (en) Fine-grained classification method and system for social media topic comments based on self-attention
Huang et al. Target-Oriented Sentiment Classification with Sequential Cross-Modal Semantic Graph
CN117521674B (en) Method, device, computer equipment and storage medium for generating countermeasure information
CN113704460B (en) Text classification method and device, electronic equipment and storage medium
CN116958997B (en) Graphic summary method and system based on heterogeneous graphic neural network
Yang et al. Weibo Sentiment Analysis Based on Advanced Capsule Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination