CN113705238B - Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model - Google Patents
- Publication number
- CN113705238B (application CN202110670846.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to an aspect-level emotion analysis method and system based on BERT and an aspect feature positioning model. The method comprises the following steps: first, a BERT model is used to obtain high-quality context information representations and aspect information representations so as to preserve the integrity of the text information; next, an attention encoder based on a multi-head attention mechanism is constructed to learn the interaction between the aspect-word representation and the context representation, integrating the relationship between the aspect words and the context and distinguishing the contributions of different sentences and aspect words to the classification result; then, an aspect feature positioning model is constructed to capture aspect information during sentence modeling and to integrate the complete aspect information into the interactive semantics, which reduces the influence of interfering words unrelated to the aspect words and improves the integrity of the aspect-word information; finally, the target-related context and the important target information are fused, and an emotion predictor predicts the probabilities of the different emotion polarities on the basis of the fused information. The method better models the implicit relationships between contexts, makes better use of aspect-word information, and reduces interference from information unrelated to the aspect words, thereby achieving higher accuracy and macro-F1.
Description
Technical Field
The invention belongs to the technical field of aspect-level emotion analysis, and particularly relates to an aspect-level emotion analysis method and system (ALM-BERT) based on BERT and an aspect feature positioning model.
Background
Electronic commerce is a rapidly developing industry whose importance to the global economy keeps growing. In particular, with the rapid development of social media and the spread of social networking platforms, more and more users post emotionally charged comments on these platforms. These comments reflect the sentiments of users and consumers and provide sellers and governments alike with valuable feedback about the quality of goods or services. For example, before purchasing a product, a user may browse a large number of reviews of it on an e-commerce platform to decide whether it is worth buying. Likewise, governments and enterprises can collect large numbers of public comments directly from the internet, analyze users' opinions and satisfaction, and better meet their needs. Sentiment analysis, as a fundamental and critical task of natural language processing, has therefore attracted a great deal of attention from both academia and industry.
However, common emotion analysis tasks (e.g., sentence-level emotion analysis) can only determine the user's emotion polarity (e.g., positive, negative or neutral) toward a product or event from the sentence as a whole, and cannot determine the emotion polarity of a particular aspect in the sentence. In contrast, aspect-level emotion analysis is a finer-grained classification task that identifies the emotion polarity of each aspect in a sentence. For example, FIG. 9 provides examples of sentence-level and aspect-based emotion analysis (a consumer review with three aspect terms). In the review text "it does not have any accompanying software installed outside the windows media, but for price i are very satisfied with its condition and overall product", the emotion polarity of the aspect term "software" is negative, "windows media" is neutral, and "price" and "very satisfied" are positive.
In prior studies, researchers have proposed various methods for aspect-level emotion analysis. Most of them are based on supervised machine learning algorithms and have achieved some success. However, these statistical methods require careful manual feature design on large-scale datasets, which incurs significant labor and time costs. Because neural network models can automatically learn low-dimensional representations of aspects and contexts from comment text without relying on manual feature engineering, neural networks have attracted increasing attention in aspect-level emotion analysis in recent years.
Unfortunately, most existing methods directly use a recurrent neural network (RNN) or a convolutional neural network (CNN) to model the semantic information of the aspect words and of their contexts independently, and ignore the fact that these networks lack sensitivity to the positions of key components. In practice, researchers have shown that the emotion polarity of an aspect word is highly correlated with the aspect-word information and the word-order information, which means that it is more strongly influenced by context words closer to the aspect word. In addition, it is difficult for such neural networks to capture long-term dependencies between aspect words and context, which results in the loss of valuable information.
Disclosure of Invention
To solve the above problems in the prior art, the present invention provides an aspect-level emotion analysis method based on BERT and an aspect feature positioning model that makes better use of aspect-word information and reduces interference from information unrelated to the aspect words, thereby achieving higher accuracy and macro-F1, together with a system based on this method.
In order to solve the technical problems, the invention adopts the following technical scheme:
The invention provides an aspect-level emotion analysis method based on BERT and an aspect feature positioning model, comprising the following steps:
S1, using a BERT model to obtain high-quality context information representations and aspect information representations so as to preserve the integrity of the text information;
S2, constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect-word representation and the context representation, integrating the relationship between the aspect words and the context, and thereby distinguishing the contributions of different sentences and aspect words to the classification result;
S3, constructing an aspect feature positioning model to capture aspect information during sentence modeling, and integrating the complete aspect information into the interactive semantics so as to reduce the influence of interfering words unrelated to the aspect words and improve the integrity of the aspect-word information;
S4, fusing the target-related context with the important target information, and using an emotion predictor to predict the probabilities of the different emotion polarities on the basis of the fused information.
Further, "using a BERT model to obtain high-quality context information representations and aspect information representations" means that a pre-trained BERT model is used as the text vectorization mechanism to generate high-quality text feature vector representations. BERT is a pre-trained language representation model, and the text vectorization mechanism maps each word to a high-dimensional vector space. Specifically, the BERT model generates text representations with a deep bidirectional Transformer encoder; special tokens are added at the beginning and the end of the input sequence, respectively, to divide the given word sequence into different segments; token embeddings, segment embeddings and position embeddings are generated for the different segments; and finally the comment text and the aspect words are converted separately to obtain the context information representation and the aspect information representation.
Here, dividing the word sequence into segments with special tokens, generating the three kinds of embedding, and converting the comment text and the aspect words specifically includes:
The BERT model adds the special tokens [CLS] and [SEP] at the beginning and the end of the input sequence, respectively, and divides the given word sequence into different segments. It generates token embeddings, segment embeddings and position embeddings for the different segments, so that the embedded representation of the input sequence contains all the information of the three kinds of embedding. Finally, the comment text and the aspect word are converted in the BERT model into "[CLS] + comment text + [SEP]" and "[CLS] + target + [SEP]", respectively, to obtain the context representation $E_c$ and the aspect representation $E_a$:
$E_c = \{we_{[CLS]}, we_1, we_2, \ldots, we_{[SEP]}\}$;
$E_a = \{ae_{[CLS]}, ae_1, ae_2, \ldots, ae_{[SEP]}\}$;
where $we_{[CLS]}$ and $ae_{[CLS]}$ denote the vectors of the classification token [CLS], and $we_{[SEP]}$ and $ae_{[SEP]}$ denote the vectors of the separator token [SEP].
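The input formatting described above can be illustrated with a minimal Python sketch. It only builds the two token sequences; the function name `build_bert_inputs` is hypothetical, and whitespace splitting is an assumption that stands in for BERT's real subword tokenizer.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def build_bert_inputs(comment: str, target: str) -> Tuple[List[str], List[str]]:
    """Return the context and aspect token sequences that would be fed to BERT."""
    # Assumption: whitespace splitting stands in for BERT's subword tokenizer.
    context_tokens = [CLS] + comment.lower().split() + [SEP]
    aspect_tokens = [CLS] + target.lower().split() + [SEP]
    return context_tokens, aspect_tokens


context, aspect = build_bert_inputs("The price is great", "price")
print(context)
print(aspect)
```

In a real pipeline each token sequence would then be passed through the pre-trained model to obtain one vector per token, giving $E_c$ and $E_a$.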
Further, "constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect-word representation and the context representation and integrate the relationship between the aspect words and the context" means that important features for aspect-level emotion analysis are extracted with a multi-head attention mechanism, yielding the important information of the context and the target. Specifically: first, a Transformer encoder is introduced; it is a novel feature extractor based on a multi-head attention mechanism and a position-wise feed-forward network, which can learn different important information in different feature representation subspaces and directly capture long-term dependencies in a sequence. Then, the Transformer encoder extracts interactive semantics from the aspect information representation and the context information representation generated by the BERT model and determines which context is most important for determining the emotion of the aspect words; meanwhile, the long-term dependency information of the context and the context-aware information are fed to the position-wise feed-forward network to generate the respective hidden states, and mean pooling then yields the final interactive hidden state of the context interaction and the final interactive hidden state of the context and the aspect words.
This extraction of interactive semantics, generation of hidden states and mean pooling specifically includes:
S201, in the Transformer encoder, mapping the aspect information representation and the context information representation generated by the BERT model to a query sequence (Q) and a series of key (K) and value (V) pairs through the several self-attention mechanisms that make up the multi-head attention mechanism, so as to capture different important information in parallel subspaces;
S202, calculating an attention score for each captured piece of important information through the attention scoring function $f_s(Q, K, V) = \sigma(f_e(Q, K))\,V$, where $\sigma(\cdot)$ denotes the normalized exponential function (softmax) and $f_e(Q, K)$ is an energy function for learning the correlation features between K and Q, and is calculated by the following formula;
S203, inputting the context representation and the aspect representation into the multi-head attention function $f_{mh}(Q, K, V) = [a_1; a_2; \ldots; a_i; \ldots; a_{n_{head}}]\,W_d$ to obtain, respectively, the long-term dependency information $c_{cc}$ of the context and the context-aware information $t_{ca}$, so as to capture the long-term dependencies of the context and to determine which contexts are most important for determining the emotion of the aspect words; where $a_i$ denotes the attention score of the i-th captured piece of important information, $[a_1; a_2; \ldots; a_i; \ldots; a_{n_{head}}]$ denotes the concatenated vector, $W_d$ is an attention weight matrix, $c_{cc} = f_{mh}(E_c, E_c)$ and $t_{ca} = f_{mh}(E_c, E_a)$;
S204, the Transformer encoder takes $c_{cc}$ and $t_{ca}$ as the input data of the position-wise feed-forward network to generate the hidden states $h_c$ and $h_a$. The position-wise feed-forward network PFN(h) is a variant of the multi-layer perceptron; $h_c$ and $h_a$ are defined as follows:
$h_c = \mathrm{PFN}(c_{cc})$;
$h_a = \mathrm{PFN}(t_{ca})$;
$\mathrm{PFN}(h) = \zeta(hW_1 + b_1)\,W_2 + b_2$;
where $\zeta(\cdot)$ is the rectified linear unit (ReLU), $b_1$ and $b_2$ are bias values, and $W_1$ and $W_2$ are learnable weight parameters;
S205, after mean pooling is applied to the hidden states $h_c$ and $h_a$, the final interactive hidden state $h_{cm}$ of the context interaction and the final interactive hidden state $h_{am}$ of the context and the aspect words are obtained.
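Steps S202 to S205 can be sketched in plain Python as follows. This is a single-head illustration under stated assumptions, not the patented implementation: the energy function is taken to be a scaled dot product (the source does not reproduce its formula), the first argument of the attention call is treated as the query, the multi-head concatenation with $W_d$ is omitted, and all helper names and toy weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable normalized exponential function."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """f_s(Q, K, V) = softmax(f_e(Q, K)) V, with an ASSUMED scaled dot-product f_e."""
    scale = math.sqrt(len(k[0]))
    energy = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(row) for row in energy], v)


def pfn(h: Matrix, w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]) -> Matrix:
    """PFN(h) = ReLU(h W1 + b1) W2 + b2, applied to every position."""
    hidden = [[max(0.0, x + b) for x, b in zip(row, b1)] for row in matmul(h, w1)]
    return [[x + b for x, b in zip(row, b2)] for row in matmul(hidden, w2)]


def mean_pool(h: Matrix) -> List[float]:
    """Average the hidden states over all positions."""
    return [sum(col) / len(h) for col in zip(*h)]


# Toy representations: 3 context tokens and 2 aspect tokens, hidden size 2.
e_c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e_a = [[0.5, 0.5], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]

c_cc = attention(e_c, e_c, e_c)  # long-term dependency information of the context
t_ca = attention(e_c, e_a, e_a)  # context-aware information
h_cm = mean_pool(pfn(c_cc, identity, zeros, identity, zeros))
h_am = mean_pool(pfn(t_ca, identity, zeros, identity, zeros))
print(h_cm, h_am)
```

A full multi-head version would run several such attention functions on separately projected inputs, concatenate their outputs and multiply by $W_d$.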
Further, the working process of the aspect feature positioning model (Algorithm 1) is as follows.
Specifically, according to the position and length of the aspect word, the feature positioning algorithm extracts the most important information related to the aspect word, $af$, from the context representation $E_c$; it then uses max pooling to take the most important feature from $af$, applies a dropout operation to that feature, and thereby obtains the important feature $h_{af}$ of the aspect word in the context representation $E_c$.
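The positioning step can be sketched as below, as a hedged illustration rather than the patent's Algorithm 1 (which is not reproduced in the text). The function name `locate_aspect_feature`, the choice to pool over the aspect positions per hidden dimension, and the inverted-dropout formulation are all assumptions.

```python
import random
from typing import List, Optional


def locate_aspect_feature(
    e_c: List[List[float]],
    start: int,
    length: int,
    drop_rate: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Slice the aspect rows out of E_c, max-pool them, then apply dropout."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    # Rows of the context representation that belong to the aspect word.
    af = e_c[start:start + length]
    if not af:
        raise ValueError("aspect span is empty or outside the context")
    # Assumption: max pooling keeps one maximum per hidden dimension.
    pooled = [max(col) for col in zip(*af)]
    # Assumption: inverted dropout as used at training time; rate 0 is a no-op.
    rng = rng or random.Random(0)
    keep = 1.0 - drop_rate
    return [x / keep if rng.random() < keep else 0.0 for x in pooled]


# Toy context representation: 4 tokens, hidden size 2; the aspect covers tokens 1..2.
e_c = [[0.1, 0.9], [0.7, 0.2], [0.4, 0.8], [0.0, 0.0]]
h_af = locate_aspect_feature(e_c, start=1, length=2)
print(h_af)
```

At inference time dropout is normally disabled, which corresponds to `drop_rate=0.0` in this sketch.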
Further, fusing the target-related context with the important target information and predicting the probabilities of the different emotion polarities on the basis of the fused information specifically includes:
S401, concatenating $h_{cm}$, $h_{am}$ and $h_{af}$ by vector concatenation to obtain the overall feature $r$:
$r = [h_{cm}; h_{am}; h_{af}]$;
S402, preprocessing $r$ with a linear function, namely:
$x = W_u r + b_u$, where $W_u$ is a weight matrix and $b_u$ is a bias value;
S403, calculating the probability $\Pr(a = p)$ that the emotion polarity of the aspect word $a$ in the sentence is $p$ by using the softmax function:
$\Pr(a = p) = \dfrac{\exp(x_p)}{\sum_{i=1}^{C} \exp(x_i)}$;
where $p$ represents the candidate emotion polarity and $C$ is the number of emotion polarity categories.
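Steps S401 to S403 amount to a concatenation, a linear layer and a softmax. The following minimal sketch shows that chain in plain Python; the function name `predict_polarity` and the toy weights are hypothetical, and the three polarity classes are only an example.

```python
import math
from typing import List


def predict_polarity(
    h_cm: List[float],
    h_am: List[float],
    h_af: List[float],
    w_u: List[List[float]],
    b_u: List[float],
) -> List[float]:
    """Return one probability per emotion polarity class."""
    # S401: r = [h_cm; h_am; h_af]
    r = h_cm + h_am + h_af
    # S402: x = W_u r + b_u (one row of W_u per class)
    x = [sum(w * v for w, v in zip(row, r)) + b for row, b in zip(w_u, b_u)]
    # S403: softmax over the C class scores, shifted by the maximum for stability
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: each hidden state has size 1 and there are 3 classes.
w_u = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b_u = [0.0, 0.0, 0.0]
probs = predict_polarity([1.0], [2.0], [3.0], w_u, b_u)
print(probs)
```

The predicted polarity is the class with the largest probability.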
Further, the aspect-level emotion analysis method based on BERT and the aspect feature positioning model further comprises: training with cross entropy and L2 regularization as the loss function, which is defined as follows:
$L(\theta) = -\sum_{j \in D} \sum_{i=1}^{C} \hat{y}_j^i \log y_j^i + \lambda \lVert \theta \rVert^2$;
where $D$ represents all training data, $j$ and $i$ are the indices of the training samples and the emotion classes, respectively, $\lambda$ is the L2 regularization factor, $\theta$ is the set of model parameters, $y$ represents the predicted emotion polarity, and $\hat{y}$ indicates the correct emotion polarity.
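A cross-entropy loss with an L2 penalty can be sketched as follows. The sketch assumes one-hot gold labels (so only the log-probability of the correct class contributes) and a penalty of the form $\lambda \sum \theta^2$; the exact scaling used in the patent is not known, and the function name `training_loss` is hypothetical.

```python
import math
from typing import List


def training_loss(
    predictions: List[List[float]],
    gold: List[int],
    params: List[float],
    lam: float,
) -> float:
    """Cross entropy over all samples plus an L2 penalty on the parameters."""
    # With one-hot gold labels, only the gold class term of each sample survives.
    cross_entropy = -sum(math.log(p[g]) for p, g in zip(predictions, gold))
    # Assumed penalty form: lambda times the sum of squared parameters.
    l2_penalty = lam * sum(t * t for t in params)
    return cross_entropy + l2_penalty


# One sample with two classes, gold class 0, and two toy parameters.
value = training_loss([[0.5, 0.5]], [0], [1.0, 2.0], lam=0.1)
print(value)
```

In practice the predictions would come from the softmax output of the emotion predictor, and the loss would be minimized with a gradient-based optimizer.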
The invention also provides an aspect level emotion analysis system, which comprises:
a text vectorization mechanism, which uses a BERT model to obtain high-quality context information representations and aspect information representations so as to preserve the integrity of the text information;
a feature extraction model for aspect-level emotion analysis, which is used for learning the interaction between the aspect-word representation and the context representation, integrating the relationship between the aspect words and the context to distinguish the contributions of different sentences and aspect words to the classification result, capturing aspect information during sentence modeling, and integrating the complete aspect information into the interactive semantics so as to reduce the influence of interfering words unrelated to the aspect words and improve the integrity of the aspect-word information;
and an emotion predictor, which is used for fusing the target-related context with the important target information and predicting the probabilities of the different emotion polarities on the basis of the fused information.
Further, the BERT model is a pre-trained language representation model that generates text representations with a deep multi-layer bidirectional Transformer encoder; special tokens are added at the beginning and the end of the input sequence, respectively, to divide the given word sequence into different segments; token embeddings, segment embeddings and position embeddings are generated for the different segments; and finally the comment text and the aspect words are converted separately to obtain the context information representation and the aspect information representation;
the feature extraction model for aspect-level emotion analysis comprises an important feature extraction model and an aspect feature positioning model; the important feature extraction model is an attention encoder based on a multi-head attention mechanism, used for learning the interaction between the aspect-word representation and the context representation and integrating the relationship between the aspect words and the context to distinguish the contributions of different sentences and aspect words to the classification result; the aspect feature positioning model is used for capturing aspect information during sentence modeling and integrating the complete aspect information into the interactive semantics so as to reduce the influence of interfering words unrelated to the aspect words and improve the integrity of the aspect-word information;
the emotion predictor concatenates the final interactive hidden state of the context interaction, the final interactive hidden state of the context and the aspect words, and the important features of the aspect words by vector concatenation to obtain the overall feature, then preprocesses the overall feature with a linear function, and finally uses a softmax function to calculate the probability that the emotion polarity of the aspect words in the sentence is each candidate emotion polarity.
The beneficial effects of the invention are as follows:
With this technical scheme, the Transformer encoder better models the implicit relationships between contexts, and the aspect feature positioning model makes better use of aspect-word information and reduces interference from information unrelated to the aspect words, thereby achieving higher accuracy and macro-F1 (on sentences of different lengths, the average accuracy and macro-F1 are 3.1% and 6.56% higher, respectively); at the same time, the feasibility and effectiveness of the BERT model and of aspect information in aspect-level emotion analysis tasks are verified.
Drawings
FIG. 1 is a flow diagram of an embodiment of the aspect-level emotion analysis method based on BERT and an aspect feature positioning model according to the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of the aspect-level emotion analysis system based on BERT and an aspect feature positioning model according to the present invention;
FIG. 3 is a graph of the experimental results of dropout rate parameter optimization in an evaluation experiment of the aspect-level emotion analysis method according to the present invention;
FIG. 4 is a graph of the experimental results of learning rate parameter optimization in an evaluation experiment of the aspect-level emotion analysis method according to the present invention;
FIG. 5 is a graph of the experimental results of L2 regularization parameter optimization in an evaluation experiment of the aspect-level emotion analysis method according to the present invention;
FIG. 6 is a graph of the ROUGE scores (ROUGE-1) on source texts of different lengths in a validation experiment comparing the aspect-level emotion analysis method according to the present invention with TD-LSTM;
FIG. 7 is a graph of the ROUGE scores (ROUGE-2) on source texts of different lengths in a validation experiment comparing the aspect-level emotion analysis method according to the present invention with TD-LSTM;
FIG. 8 is a graph of the ROUGE scores (ROUGE-L) on source texts of different lengths in a validation experiment comparing the aspect-level emotion analysis method according to the present invention with TD-LSTM;
FIG. 9 is an example of prior-art aspect-level emotion analysis.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, an aspect-level sentiment analysis method based on BERT and aspect feature localization models according to an embodiment of the present invention includes the following steps:
s1, obtaining high-quality context information representation and aspect information representation by using a BERT model so as to keep the integrity of text information; specifically, a pre-trained BERT model is used as a text vectorization mechanism to generate high-quality text feature vector representation, the BERT model is a pre-trained language representation model, and the text vectorization mechanism is used for mapping each word to a high-dimensional vector space, and specifically comprises the following steps: the BERT model generates text representation by using a deep-layer bidirectional converter coder, divides a given word sequence into different segments by adding special word segmentation markers at the beginning and the end of the input sequence respectively, generates marker embedding, segment embedding and position embedding for the different segments, and finally converts annotation text and aspect words respectively to obtain context information representation and aspect information representation.
S2, constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the body word representation and the context representation, integrating the relationship between the body words and the context, and further distinguishing the contributions of different sentences and aspect words to the classification result; specifically, the method is to extract important features of aspect-level emotion analysis based on a multi-head attention mechanism, and extract important information of context and a target, and specifically includes: firstly, introducing a conversion encoder, wherein the conversion encoder is a novel feature extractor based on a multi-head attention mechanism and a position feedforward network, and can learn different important information in different feature representation subspaces and directly capture long-term correlation in a sequence; and then, interactive semantics are extracted from the aspect information representation and the context information representation generated by the BERT model through a conversion encoder, the context which is most important for emotion qualification of the aspect words is determined, meanwhile, the long-term dependence information and the context perception information of the context are used as input data of a position feed-forward network to respectively generate hidden states, and the final interactive hidden state of the context interaction and the final interactive hidden state of the aspect words are obtained after mean value pooling operation.
S3, constructing an aspect feature positioning model to capture aspect information during sentence modeling, and integrating complete information of aspects into interactive semantics to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information; the aspect feature positioning module is constructed based on a maximum pooling function, namely, the extracted aspect words and the context hidden features thereof are divided into a plurality of areas, the maximum value is selected in each area to represent the area, and the aspect feature positioning module (positioning core features) is constructed in such a way; the working process of the aspect feature positioning model expresses E from the context according to the position and the length of the aspect word through a feature positioning algorithm c Extracting the most important related information of the aspect word af; while taking the most important feature AF from the AF using max pooling, then performing a dropout operation on the most important feature AF, and representing E in context c To obtain the important characteristics h of the facet word af 。
S4, fusing the context related to the target with the important target information, and predicting the probabilities of different emotion polarities with the emotion predictor on the basis of the fused information. The specific steps are: connecting the final interactive hidden state of the context, the final interactive hidden state of the context and the aspect words, and the important features of the aspect words by vector concatenation to obtain the overall feature; then performing data preprocessing on the overall feature with a linear function; and finally calculating, with the softmax function, the probability that the emotion polarity of the aspect word in the sentence is a given candidate emotion polarity.
As shown in FIG. 2, the present invention further provides an aspect level emotion analysis model, which includes a text vectorization mechanism 100, a feature extraction model 200 for aspect level emotion analysis, and an emotion predictor 300.
The text vectorization mechanism 100 is a multi-angle text vectorization mechanism that obtains high-quality context information representation and aspect information representation by using a BERT model, so as to maintain the integrity of the text information. The BERT model is a pre-trained language representation model that generates text representations with a deep bidirectional Transformer encoder; it divides a given word sequence into different segments by adding special segmentation markers at the beginning and the end of the input sequence, respectively, generates token embedding, segment embedding, and position embedding for the different segments, and finally converts the comment text and the aspect words, respectively, to obtain the context information representation and the aspect information representation.
The feature extraction model 200 for aspect-level emotion analysis is used for learning the interaction between the aspect word representation and the context representation, integrating the relationship between the aspect words and the context to distinguish the contributions of different sentences and aspect words to the classification result, capturing aspect information during sentence modeling, and integrating the complete information of the aspects into the interactive semantics. Specifically, the feature extraction model 200 comprises an important feature extraction model and an aspect feature positioning model. The important feature extraction model is an attention encoder based on a multi-head attention mechanism, used for learning the interaction between the aspect word representation and the context representation and integrating the relationship between the aspect words and the context, so as to distinguish the contributions of different sentences and aspect words to the classification result. The aspect feature positioning model is used for capturing aspect information during sentence modeling and integrating the complete information of the aspects into the interactive semantics, thereby reducing the influence of interference words irrelevant to the aspect words and improving the completeness of the aspect word information;
the emotion predictor 300 is used for fusing the context related to the target with the important information of the target and, on the basis of the fused information, predicting the probabilities of different emotion polarities; specifically, the final interactive hidden state of the context, the final interactive hidden state of the context and the aspect words, and the important features of the aspect words are connected by vector concatenation to obtain the overall feature, then data preprocessing is performed on the overall feature with a linear function, and finally the probability that the emotion polarity of the aspect word in the sentence is a given candidate emotion polarity is calculated with the softmax function.
In general, aspect level emotion analysis refers to a process of taking a sentence and some predefined aspect words as input data, and finally outputting the emotion polarity of each aspect word in the sentence. Here we use some practical review examples to illustrate the aspect level sentiment analysis task.
As shown in Table 1, each example sentence contains two aspect words, and each aspect word takes one of four emotion polarities: positive, neutral, negative, or conflict. Aspect-level sentiment analysis is then defined as follows:
Table 1: Some examples of aspect-level sentiment analysis
Defining one: formally, one comment sentence S = { w } is given 1 ,w 2 ,...,w n N is the total number of words in S. One aspect word table a = { a = { [ a ] 1 ,...,a i ,...,a m H, length m, wherein a i Represents the ith aspect word in aspect word table a, which is a subsequence of sentence S. P = { P 1 ,...,p j ,...,p C Denotes the candidate emotion polarity, where C denotes the number of categories of emotion polarity, p j Indicating the jth emotion polarity.
Problem: the goal of the aspect-level emotion analysis model is to predict the most likely emotion polarity for a particular aspect word, which can be expressed as:
where $\phi$ denotes a function that quantifies the degree of match between the aspect word $a_i$ and the emotion polarity $p_j$ in the sentence S. Finally, the model outputs the emotion polarity with the highest degree of match as the classification result. Table 2 summarizes the symbols used in the model and their descriptions.
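Read together with the description of $\phi$, the prediction target can be written compactly as below. This is a reconstruction from the surrounding definitions, not a verbatim formula; the hat notation for the predicted polarity and the conditioning bar are assumptions.

```latex
\hat{p}(a_i) = \arg\max_{p_j \in P} \; \phi\left(a_i, p_j \mid S\right)
```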
Table 2: symbols used and their description
The invention relates to an aspect level emotion analysis method based on a BERT and aspect feature positioning model, which comprises the following steps: firstly, generating a high-quality sequence word vector by utilizing a pre-training BERT model, and providing effective support for the subsequent steps; then, in the feature extraction method of aspect-level emotion analysis, an important feature extraction module is realized based on a multi-head attention mechanism, and important information of context and a target is extracted; then, providing an aspect feature positioning model, and comprehensively considering the important features of the target words to obtain target related features; and finally, fusing context related to the target and important target information, and predicting the probability of different emotion polarities by using the emotion prediction factor on the basis of the fused information. The specific method and principle are as follows:
1. Multi-angle text vectorization mechanism
The text vectorization mechanism essentially maps each word to a high-dimensional vector space. Two context-based word embedding models, Word2vec and GloVe, are widely applied to text vectorization and perform well in aspect-level emotion analysis tasks. However, research has shown that these two word embedding models cannot capture enough information from the text, which results in insufficient classification accuracy and reduced performance. A high-quality word embedding model therefore has an important influence on the accuracy of the classification result.
The key to implementing aspect-level sentiment analysis is effective natural language understanding, which usually depends heavily on large-scale, high-quality labeled text. Fortunately, the BERT model is a pre-trained language model that can effectively exploit unlabeled text: by randomly masking part of the vocabulary, it uses a deep multi-layer bidirectional Transformer encoder to learn a general natural language understanding model from massive unlabeled text, and is then fine-tuned with a small amount of labeled data, so that high-quality text feature vector representations can be generated. Inspired by this, in the ALM-BERT method proposed by the present invention, the special segmentation markers [CLS] and [SEP] are added at the beginning and the end of a given input word sequence, respectively, to divide the sequence into different segments. The word embedding vector input in this way therefore comprises the token embedding, segment embedding, and position embedding generated for the different segments. Specifically, the comment text and the aspect word are converted into "[CLS] + comment text + [SEP]" and "[CLS] + target + [SEP]", respectively, yielding the context representation $E_c$ and the aspect representation $E_a$:
$E_c = \{we_{[CLS]}, we_1, we_2, \ldots, we_{[SEP]}\}$ (2)
$E_a = \{ae_{[CLS]}, ae_1, ae_2, \ldots, ae_{[SEP]}\}$ (3)
where $we_{[CLS]}$ and $ae_{[CLS]}$ denote the vectors of the classification marker [CLS], and $we_{[SEP]}$ and $ae_{[SEP]}$ denote the vectors of the delimiter [SEP].
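The input formatting described above can be sketched as follows. This is a minimal illustration that assumes whitespace-level tokens; a real BERT pipeline would use a WordPiece tokenizer and learned embeddings, and the function names `build_bert_inputs` and `position_and_segment_ids` are hypothetical.

```python
def build_bert_inputs(comment_tokens, aspect_tokens):
    """Wrap the comment text and the aspect word in [CLS] ... [SEP] markers."""
    context_seq = ["[CLS]"] + list(comment_tokens) + ["[SEP]"]
    aspect_seq = ["[CLS]"] + list(aspect_tokens) + ["[SEP]"]
    return context_seq, aspect_seq


def position_and_segment_ids(seq):
    """Return position ids 0..len-1 and segment ids for a single-segment input."""
    return list(range(len(seq))), [0] * len(seq)


# Toy example (tokens are illustrative, not taken from the datasets).
context_seq, aspect_seq = build_bert_inputs(
    ["the", "food", "was", "great", "but", "the", "service", "was", "slow"],
    ["service"],
)
positions, segments = position_and_segment_ids(aspect_seq)
```

Each of the two sequences would then be embedded separately to give $E_c$ and $E_a$.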
2. Feature extraction method for aspect-level emotion analysis
In order to extract the hidden features of an aspect word and its context, and to give particular weight to the auxiliary information contained in the aspect word, a Transformer encoder is introduced and an aspect word feature positioning module is provided. The basic idea is to model the context and the target word interactively so as to fully integrate the information of the aspect words and the context. In addition, acquiring the feature information of the aspect words in the context can improve the emotion classification accuracy.
2.1 important feature extraction model
A Transformer encoder (also rendered herein as a conversion encoder) is a novel feature extractor based on a multi-head attention mechanism and a position feed-forward network. It can learn different important information in different feature representation subspaces. Moreover, the Transformer encoder can directly capture long-term dependencies in a sequence, is easier to parallelize than recurrent and convolutional neural networks, and greatly reduces training time. The invention extracts interactive semantics through the Transformer encoder from the aspect information representation and the context information representation generated by the BERT model, and determines the context that is most important for characterizing the emotion of the aspect words. Meanwhile, the long-term dependency information and the context-aware information of the context are used as input data of the position feed-forward network to generate hidden states respectively, and the final interactive hidden state of the context interaction and the final interactive hidden state of the context and the aspect words are obtained after a mean pooling operation.
Intuitively, a multi-head attention mechanism is composed of multiple self-attention mechanisms, each of which maps a query sequence (Q) and a series of key (K)-value (V) pairs so as to capture different important information in parallel subspaces. The attention score function $f_s(\cdot)$ of the self-attention mechanism is calculated as follows:
$f_s(Q, K, V) = \sigma(f_e(Q, K))V$ (4)
where $\sigma(\cdot)$ denotes the normalized exponential function and $f_e(\cdot)$ is an energy function that learns the correlation features between K and Q, which can be calculated using the following formula:
The attention score function $f_{mh}(\cdot)$ of the multi-head attention mechanism is obtained by concatenating the attention scores of the self-attention mechanisms:
$f_{mh}(Q, K, V) = [a_1; a_2; \ldots; a_i; \ldots; a_{n\text{-}head}]W_d$ (6)
where $a_i$ is the attention score of the i-th captured piece of important information, $[a_1; a_2; \ldots; a_i; \ldots; a_{n\text{-}head}]$ is the concatenated vector, and $W_d$ is the attention weight matrix.
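To make equations (4) and (6) concrete, here is a minimal pure-Python sketch. The energy function is not recoverable from this text, so a scaled dot product is assumed; the per-head projection triples `(W_q, W_k, W_v)`, the choice of taking values from the same source as keys, and the argument order `(keys, queries)` are likewise assumptions for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Normalized exponential function (sigma in equation (4))."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v):
    """f_s(Q, K, V) = sigma(f_e(Q, K)) V with an ASSUMED scaled dot-product energy."""
    scale = math.sqrt(len(q[0]))
    k_transposed = [list(col) for col in zip(*k)]
    energy = [[e / scale for e in row] for row in matmul(q, k_transposed)]
    weights = [softmax(row) for row in energy]
    return matmul(weights, v)


def multi_head_attention(keys, queries, heads, w_d):
    """f_mh: run every head, concatenate [a_1; ...; a_n-head], project with W_d.

    heads is a list of hypothetical (W_q, W_k, W_v) projection triples;
    values are taken from the same source as the keys (an assumption).
    """
    head_outputs = [
        self_attention(matmul(queries, w_q), matmul(keys, w_k), matmul(keys, w_v))
        for w_q, w_k, w_v in heads
    ]
    concatenated = [
        [value for head in head_outputs for value in head[i]]
        for i in range(len(queries))
    ]
    return matmul(concatenated, w_d)


identity = [[1.0, 0.0], [0.0, 1.0]]
e_c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy context representation
e_a = [[1.0, 0.0]]                            # toy aspect representation
heads = [(identity, identity, identity)] * 2
w_d = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]
c_cc = multi_head_attention(e_c, e_c, heads, w_d)  # one row per context token
t_ca = multi_head_attention(e_c, e_a, heads, w_d)  # one row per aspect token
```

With this argument order the output has one row per query token, so attending from the aspect to the context gives a representation as long as the aspect sequence.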
As shown in equations (8)-(9) below, the context representation and the aspect representation are input into the multi-head attention mechanism to capture the long-term dependencies of the context and to determine which contexts are most important for characterizing the emotion of the aspect words.
$c_{cc} = f_{mh}(E_c, E_c)$ (8)
$t_{ca} = f_{mh}(E_c, E_a)$ (9)
where $c_{cc}$ and $t_{ca}$ are, respectively, the long-term dependency information and the context-aware information of the context.
Then, the Transformer encoder takes $c_{cc}$ and $t_{ca}$ as input data of the position feed-forward network to generate the hidden states $h_c$ and $h_a$, respectively. The position feed-forward network PFN(h) is a variant of the multi-layer perceptron. Formally, PFN, $h_c$ and $h_a$ are defined as follows:
$h_c = PFN(c_{cc})$ (10)
$h_a = PFN(t_{ca})$ (11)
$PFN(h) = \zeta(hW_1 + b_1)W_2 + b_2$ (12)
where $\zeta(\cdot)$ is the rectified linear unit (ReLU), $b_1$ and $b_2$ are bias values, and $W_1$ and $W_2$ are learnable weight parameters.
After the mean pooling operation is applied to $h_c$ and $h_a$, the final interactive hidden state $h_{cm}$ of the context interaction and the final interactive hidden state $h_{am}$ of the context and the aspect word are obtained.
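Equations (10)-(12) and the mean pooling step can be sketched as below. The weights are toy identity matrices chosen so the arithmetic is easy to follow; the helper names `linear`, `pfn`, and `mean_pool` are hypothetical.

```python
def linear(h, w, b):
    """Compute h W + b for a batch of row vectors h."""
    return [
        [sum(x * y for x, y in zip(row, col)) + bias for col, bias in zip(zip(*w), b)]
        for row in h
    ]


def pfn(h, w1, b1, w2, b2):
    """PFN(h) = ReLU(h W1 + b1) W2 + b2, applied to each position independently."""
    hidden = [[max(0.0, v) for v in row] for row in linear(h, w1, b1)]
    return linear(hidden, w2, b2)


def mean_pool(h):
    """Average the hidden states over the sequence dimension."""
    return [sum(col) / len(h) for col in zip(*h)]


identity = [[1.0, 0.0], [0.0, 1.0]]
c_cc = [[1.0, -2.0], [3.0, 4.0]]            # toy long-term dependency information
h_c = pfn(c_cc, identity, [0.0, 0.0], identity, [1.0, 1.0])
h_cm = mean_pool(h_c)                        # final interactive hidden state
```

The same two calls applied to $t_{ca}$ would give $h_a$ and $h_{am}$.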
2.2 aspect feature localization model
The Transformer encoder captures the long-term dependencies of the context and generates the semantic information of the interaction between the aspect words and the context. In order to highlight the importance of different aspect words, the invention establishes an aspect word feature positioning model. The main idea is to select information related to the aspect words from the context feature representation and to better integrate the aspect information by capturing feature representation vectors containing the aspect information, thereby improving the accuracy of aspect-level emotion classification. The working process of the aspect feature positioning model is shown in Algorithm 1:
Specifically, the feature positioning algorithm extracts, according to the position and length of the aspect word, the most relevant information af of the aspect word from the context representation $E_c$; max pooling is then applied to af to obtain the most important feature AF, as follows:
$AF = \mathrm{Maxpooling}(af, dim=0)$ (13)
Thereafter, a dropout operation is performed on the most important feature AF to obtain the important feature $h_{af}$ of the aspect word in the context representation $E_c$.
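The steps just described can be sketched as follows. Algorithm 1 itself is not reproduced in this text, so this follows only the prose and equation (13); the function name, the slice-based extraction, and the inverted-dropout scaling are assumptions.

```python
import random


def locate_aspect_features(e_c, start, length, drop_rate=0.0, rng=None):
    """Slice the aspect span out of E_c, max-pool over tokens, then apply dropout."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    af = e_c[start:start + length]            # rows of E_c covering the aspect word
    if not af:
        raise ValueError("aspect span is empty or out of range")
    pooled = [max(column) for column in zip(*af)]  # Maxpooling(af, dim=0)
    if drop_rate == 0.0:
        return pooled
    rng = rng or random.Random()
    keep = 1.0 - drop_rate
    # Inverted dropout: zero a feature with probability drop_rate, rescale the rest.
    return [v / keep if rng.random() < keep else 0.0 for v in pooled]


# Toy context representation: 4 tokens with 2 hidden features each.
e_c = [[0.1, 0.9], [0.5, 0.2], [0.7, 0.1], [0.3, 0.3]]
h_af = locate_aspect_features(e_c, start=1, length=2)
```

Dropout would normally be active only during training; with `drop_rate=0.0` the function returns the pooled feature unchanged.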
3. Emotion predictor
First, $h_{cm}$, $h_{am}$ and $h_{af}$ are joined by vector concatenation to obtain the overall feature r:
$r = [h_{cm}; h_{am}; h_{af}]$ (14)
Then, a linear function is used to perform data preprocessing on r, namely:
$x = W_u r + b_u$ (15)
where $W_u$ is a weight matrix and $b_u$ is a bias value.
Finally, the probability Pr(a = p) that the emotion polarity of the aspect word a in the sentence is p is calculated with the softmax function:
where p denotes a candidate emotion polarity and C is the number of emotion polarity categories.
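The predictor can be sketched end to end as below. The softmax is taken in its standard form, $\Pr(a = p) = \exp(x_p) / \sum_{i=1}^{C} \exp(x_i)$, which is an assumption about the formula missing above; the function name and the toy weights are hypothetical.

```python
import math


def predict_polarity(h_cm, h_am, h_af, w_u, b_u):
    """Concatenate the three features, apply x = W_u r + b_u, then softmax."""
    r = list(h_cm) + list(h_am) + list(h_af)                      # equation (14)
    x = [sum(w * v for w, v in zip(row, r)) + b for row, b in zip(w_u, b_u)]  # (15)
    peak = max(x)                                                 # numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example with C = 3 polarities and one-dimensional features.
w_u = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b_u = [0.0, 0.0, 0.0]
probs = predict_polarity([1.0], [0.0], [2.0], w_u, b_u)
predicted_index = probs.index(max(probs))
```

The polarity with the largest probability is the model's output, matching the problem statement given earlier.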
In summary, the aspect-level emotion analysis method based on BERT and the aspect feature positioning model of the present invention is an end-to-end process. Furthermore, to optimize the parameters of the method so that the loss between the predicted emotion polarity y and the correct emotion polarity is minimized, training uses cross entropy with L2 regularization as the loss function, defined as:
where D denotes all training data, j and i are the indices of the training data samples and the emotion categories, respectively, λ is the factor for L2 regularization, θ is the parameter set of the model, and y denotes the predicted emotion polarity, which is paired with its correct (labeled) emotion polarity.
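A minimal sketch of such a loss is given below, assuming the usual form: the negative log-likelihood of the correct class summed over samples, plus λ times the sum of squared parameters. The exact formula is not recoverable from this text, so the summation layout and the function name are assumptions.

```python
import math


def cross_entropy_l2(pred_probs, true_onehot, params, lam):
    """Cross entropy over all samples and classes plus lam * sum(theta^2)."""
    cross_entropy = 0.0
    for probs, target in zip(pred_probs, true_onehot):   # index j: samples in D
        for p, t in zip(probs, target):                   # index i: emotion classes
            if t:
                cross_entropy -= t * math.log(p)
    l2_penalty = lam * sum(theta * theta for theta in params)
    return cross_entropy + l2_penalty


# One sample, two classes, two scalar parameters (toy values).
loss = cross_entropy_l2([[0.5, 0.5]], [[1, 0]], params=[1.0, 2.0], lam=0.01)
```

Here the cross-entropy term is log 2 and the penalty is 0.01 × (1 + 4) = 0.05.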
4. Evaluation test
In order to evaluate the rationality and effectiveness of the BERT and aspect feature positioning model-based aspect level emotion analysis method and the model, the analysis is carried out through the following evaluation experiments.
4.1 data set and evaluation index
We conducted our evaluation experiments on three public English review datasets, whose details are shown in Table 3. The Restaurant and Laptop (notebook) datasets are provided by SemEval (reference: Pontiki M, Galanis D, Pavlopoulos J, et al. SemEval-2014 Task 4); the Twitter dataset consists of user comments on Twitter collected by Dong et al. (reference: Dong L, Wei F, Tan C, et al. Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification [C]// Meeting of the Association for Computational Linguistics. 2014.), with emotion polarities labeled as positive, negative, and neutral. These three datasets are currently popular review datasets and are widely used in aspect-level sentiment analysis tasks.
TABLE 3 statistical information of data sets
In addition, in order to objectively evaluate the performance of the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model, the evaluation indexes commonly used in aspect-level emotion analysis tasks are adopted, namely accuracy (Acc) and macro-F1. Accuracy is defined as follows:
Acc=SC/N (18)
where SC is the number of correctly classified samples and N is the total number of samples. In general, the higher the accuracy, the better the performance of the model.
In addition, macro-F1 is used to reflect the performance of the model more faithfully; it combines the precision and recall of every emotion polarity. Macro-F1 is calculated according to the following formula:
where T is the number of samples correctly classified as emotion polarity i, FP is the number of samples misclassified as emotion polarity i, FN is the number of samples whose emotion polarity i is misclassified as another emotion polarity, C is the number of emotion polarity categories, and the precision and recall are computed separately for each emotion polarity i. In our experiments, to evaluate the performance of our model more fully, we consider two settings for the emotion polarity categories: 3C = {positive, neutral, negative} and 4C = {positive, neutral, negative, conflict}.
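The two metrics can be sketched as follows. The macro-F1 formula is missing from this text, so the sketch assumes the common convention of averaging the per-class F1 scores; the function names and the toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Acc = SC / N: correctly classified samples over all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (ASSUMED convention)."""
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1_scores.append(2 * precision * recall / denominator if denominator else 0.0)
    return sum(f1_scores) / len(labels)


labels = ["positive", "neutral", "negative"]          # the 3C setting
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
```

In this toy case three of four samples are correct, and the per-class F1 scores are 2/3, 1, and 2/3.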
4.2 parameter optimization
During model training, we use the BERT model to generate the vector representations of the context and the aspect words. Specifically, we use the standard $BERT_{BASE}$ configuration of the BERT model to complete the training, in which the number of Transformer blocks, the number of hidden neurons, and the number of self-attention heads are 12, 768, and 12, respectively. Furthermore, to analyze the optimal hyper-parameter settings, we provide examples of several important hyper-parameter settings.
First, the dropout rate is the probability of dropping neurons during neural network training, which mitigates overfitting and enhances the generalization ability of the model. We initialize dropout to 0.3 and then search for the best value at intervals of 0.1. As shown in FIG. 3, when dropout is 0.5, the accuracy and F1 value of the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model of the present invention are the best on the three datasets.
Second, the learning rate determines whether and when the objective function converges to a local minimum. In our experiments, we use the Adam optimization algorithm to update the parameters of the model and search for the optimal learning rate in the range $[10^{-5}, 0.1]$. As shown in FIG. 4, the performance of the method and model of the present invention is best when the learning rate is $2 \times 10^{-5}$.
Finally, the L2 regularization parameter is a hyper-parameter that helps prevent the model from overfitting. As shown in FIG. 5, the performance of the method and model of the present invention is best when the L2 regularization parameter is set to 0.01. Meanwhile, the weights of the model are initialized with the Glorot initialization method, the batch size is set to 16, and training runs for a total of 10 iterations (epochs).
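For reference, the settings reported in this subsection can be collected in one place. The dictionary below is only a convenience sketch; the key names are hypothetical, and reading the "10 iterations" as training epochs is an assumption.

```python
# Values are those stated in the text; key names are illustrative only.
TRAINING_CONFIG = {
    "bert_variant": "BERT_BASE",
    "transformer_blocks": 12,
    "hidden_size": 768,
    "attention_heads": 12,
    "dropout": 0.5,
    "optimizer": "Adam",
    "learning_rate": 2e-5,
    "l2_regularization": 0.01,
    "weight_init": "Glorot",
    "batch_size": 16,
    "training_iterations": 10,  # assumed to mean epochs
}
```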
4.3 comparison Algorithm
In order to verify the effectiveness of the BERT and aspect feature positioning model-based aspect level emotion analysis method and model of the present invention, the BERT and aspect feature positioning model-based aspect level emotion analysis method and model are compared with many popular aspect level emotion analysis models, as follows:
TD-LSTM is a classical classification model, which integrates related information of the aspect words and their contexts into the LSTM-based classification model, improving the classification accuracy.
ATAE-LSTM is a classification model that feeds the embedded representation of the aspect words into the model together with the embedded representation of the sentence, and then applies an attention mechanism to compute weights, achieving high-precision emotion classification.
MemNet is a data-driven classification model that uses multiple attention-based models to capture the importance of each context word to complete emotion classification.
IAN is an interactive attention network that models the aspect words and their contexts, respectively, and generates an associative representation of the target and context.
RAM builds a multi-attention mechanism-based framework to capture distant features in the text, enhancing the representation capability of the model.
TNet uses a bidirectional LSTM to generate hidden representations of the context and the aspect words. A CNN layer is used instead of the attention mechanism to extract important features from the hidden representation.
Cabasc uses two attention-enhancing mechanisms that focus on the aspect words and the context separately, and comprehensively considers the correlation between the context and the aspect words.
AOA constructs a dual attention module that links emotion words to aspect words. Further, the dual attention module automatically generates mutual attention weights from aspect to text and from text to aspect.
MGAN is a multi-granularity attention model that captures the interaction information between aspect words and context from coarse to fine.
AEN-BERT is a model based on an attention mechanism and BERT, showing good performance in the aspect-level sentiment analysis task.
BERT-base is an aspect-level sentiment analysis model based on pre-trained BERT, with a fully connected layer and a softmax layer for the classification task.
In order to measure the performance of the model more accurately, the AOA, IAN, and MemNet models are extended by replacing their embedding layers with the BERT model, giving the AOA-BERT, IAN-BERT, and MemNet-BERT models. The remaining structure of these models is consistent with the descriptions above.
4.4 evaluation test analysis
Table 4 below shows the emotion classification results when the number of emotion polarities is C = 3. It can readily be observed from the table that the accuracy and macro-F1 of the BERT-based models (aspect-level sentiment analysis methods based on BERT pre-training) are significantly higher than those of the models based on GloVe and Word2vec. In particular, on the restaurant dataset, the accuracy and macro-F1 of the aspect-level emotion analysis method and model based on BERT and the aspect feature positioning model are 12.77% and 30.97% higher, respectively, than those of the classical IAN model. This shows that BERT can better express the semantic and grammatical features of text, and that the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model of the invention achieve the best classification performance on the three datasets. Specifically, on the restaurant dataset, the accuracy and macro-F1 of the method of the present invention are improved by 4.2% and 8.81%, respectively, compared with the AEN method. In addition, on the notebook computer dataset, the classification accuracy and macro-F1 of the method of the present invention are 3.29% and 3.15% higher, respectively, than those of the BERT-base model, which shows that the aspect feature localization module plays a positive role in aspect-level emotion analysis.
TABLE 4 Experimental evaluation results for various comparative methods
From the perspective of capturing long-term dependency relationships in comment texts, a series of verification experiments are constructed on texts with different lengths.
As shown in FIGS. 6-8, the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model of the present invention generally achieve higher accuracy and macro-F1 than TD-LSTM, which means that the Transformer encoder we built can model the implicit relationships between contexts better than LSTM-based encoders. Furthermore, as shown in FIG. 7, we also note that the mean accuracy and mean macro-F1 of the ALM-BERT model over sentences of different lengths are 3.1% and 6.56% higher, respectively, than those of AEN, because the method and model of the present invention make better use of the aspect word information than AEN, reducing the interference of information unrelated to the aspect words.
In conclusion, the experiments show that the BERT and aspect feature positioning model-based aspect-level emotion analysis method and model can obtain higher accuracy and macro F1, and further verify the feasibility and effectiveness of the BERT model and aspect information in aspect-level emotion analysis tasks.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
Claims (7)
1. An aspect level sentiment analysis method based on BERT and an aspect feature localization model is characterized by comprising the following steps:
S1, obtaining high-quality context information representation and aspect information representation by using a BERT model so as to keep the integrity of text information;
S2, constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect word representation and the context representation, integrating the relationship between the aspect words and the context, and further distinguishing the contributions of different sentences and aspect words to the classification result; the "constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect word representation and the context representation, integrating the relationship between the aspect words and the context" means that important feature extraction for aspect-level emotion analysis is implemented based on the multi-head attention mechanism to extract important information of the context and the target, specifically: firstly, introducing a Transformer encoder, wherein the Transformer encoder is a novel feature extractor based on a multi-head attention mechanism and a position feed-forward network, and can learn different important information in different feature representation subspaces and directly capture long-term dependencies in a sequence; then, extracting interactive semantics through the Transformer encoder from the aspect information representation and the context information representation generated by the BERT model, determining the context which is most important for characterizing the emotion of the aspect words, meanwhile using the long-term dependency information and the context-aware information of the context as input data of the position feed-forward network to generate hidden states respectively, and obtaining the final interactive hidden state of the context interaction and the final interactive hidden state of the context and the aspect words after a mean pooling operation; wherein,
the "extracting interactive semantics through the Transformer encoder from the aspect information representation and the context information representation generated by the BERT model, determining the context which is most important for characterizing the emotion of the aspect words, meanwhile using the long-term dependency information and the context-aware information of the context as input data of the position feed-forward network to generate hidden states respectively, and obtaining the final interactive hidden state of the context interaction and the final interactive hidden state of the context and the aspect words after a mean pooling operation" specifically includes:
S201, mapping, through the plurality of self-attention mechanisms forming the multi-head attention mechanism in the Transformer encoder, the aspect information representation and the context information representation generated by the BERT model to a query sequence Q and a series of key (K)-value (V) pairs for capturing different important information in parallel subspaces;
S202, calculating the attention score of each captured piece of important information through the attention score function $f_s(Q, K, V) = \sigma(f_e(Q, K))V$, where $\sigma(\cdot)$ denotes a normalized exponential function and $f_e(Q, K)$ is an energy function for learning the correlation features between K and Q, calculated by the following formula;
S203, inputting the context representation and the aspect representation into the multi-head attention score function $f_{mh}(Q, K, V) = [a_1; a_2; \ldots; a_i; \ldots; a_{n\text{-}head}]W_d$ to obtain the long-term dependency information $c_{cc}$ of the context and the context-aware information $t_{ca}$, respectively, so as to capture the long-term dependencies of the context and determine which contexts are most important for characterizing the emotion of the aspect words; wherein $a_i$ is the attention score of the i-th captured piece of important information, $[a_1; a_2; \ldots; a_i; \ldots; a_{n\text{-}head}]$ denotes the concatenated vector, $W_d$ is the attention weight matrix, $c_{cc} = f_{mh}(E_c, E_c)$, and $t_{ca} = f_{mh}(E_c, E_a)$;
S204, the Transformer encoder taking $c_{cc}$ and $t_{ca}$ as input data of the position feed-forward network to generate the hidden states $h_c$ and $h_a$, respectively, wherein the position feed-forward network is a variant of the multi-layer perceptron and is denoted PFN(h), and PFN(h), $h_c$ and $h_a$ are defined as follows:
$h_c = PFN(c_{cc})$;
$h_a = PFN(t_{ca})$;
$PFN(h) = \zeta(hW_1 + b_1)W_2 + b_2$;
wherein $\zeta(hW_1 + b_1)$ is a rectified linear unit, $b_1$ and $b_2$ are bias values, and $W_1$ and $W_2$ represent learnable weight parameters;
S205, after the mean pooling operation is performed on the hidden states $h_c$ and $h_a$, obtaining the final interactive hidden state $h_{cm}$ of the context interaction and the final interactive hidden state $h_{am}$ of the context and the aspect words;
S3, constructing an aspect feature positioning model to capture aspect information during sentence modeling, and integrating complete information of aspects into interactive semantics so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information;
and S4, fusing context related to the target and important target information, and predicting the probability of different emotion polarities by using the emotion prediction factor on the basis of the fused information.
2. The method according to claim 1, wherein the "obtaining high-quality context information representation and aspect information representation by using a BERT model" means generating high-quality text feature vector representations using a pre-trained BERT model as a text vectorization mechanism, wherein the BERT model is a pre-trained language representation model and the text vectorization mechanism maps each word to a high-dimensional vector space, specifically: the BERT model generates text representations with a deep bidirectional Transformer encoder, divides a given word sequence into different segments by adding special segmentation markers at the beginning and the end of the input sequence, respectively, generates token embedding, segment embedding, and position embedding for the different segments, and finally converts the comment text and the aspect words, respectively, to obtain the context information representation and the aspect information representation.
3. The method according to claim 2, wherein "dividing a given word sequence into different segments by adding special word segmentation tokens at the beginning and the end of the input sequence respectively, generating token embeddings, segment embeddings and position embeddings for the different segments, and finally converting the comment text and the aspect words respectively to obtain the context information representation and the aspect information representation" specifically comprises:
the BERT model adds the special word segmentation tokens [CLS] and [SEP] at the beginning and the end of the input sequence respectively, divides a given word sequence into different segments, and generates token embeddings, segment embeddings and position embeddings for the different segments, so that the embedded representation of the input sequence contains all the information of the three embeddings; finally, the comment text and the aspect words are converted in the BERT model into "[CLS] + comment text + [SEP]" and "[CLS] + target + [SEP]" respectively, giving the context representation $E_c$ and the aspect representation $E_a$:
$E_c = \{we_{[CLS]}, we_1, we_2, \ldots, we_{[SEP]}\}$;
$E_a = \{ae_{[CLS]}, ae_1, ae_2, \ldots, ae_{[SEP]}\}$;
wherein $we_{[CLS]}$ and $ae_{[CLS]}$ denote the vectors of the classification token [CLS], and $we_{[SEP]}$ and $ae_{[SEP]}$ denote the vectors of the delimiter [SEP].
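The input formatting described in this claim can be sketched as follows. This is only an illustration under simplifying assumptions: the text is taken to be already split into words (a real BERT tokenizer uses WordPiece sub-words), each sequence is treated as a single segment, and the function name `build_bert_inputs` and the example sentence are hypothetical.

```python
def build_bert_inputs(comment_tokens, aspect_tokens):
    """Wrap the comment text and the aspect (target) words in [CLS] ... [SEP]."""

    def wrap(tokens):
        seq = ["[CLS]"] + list(tokens) + ["[SEP]"]
        return {
            "tokens": seq,                          # looked up for token embeddings
            "segment_ids": [0] * len(seq),          # single segment (assumption)
            "position_ids": list(range(len(seq))),  # looked up for position embeddings
        }

    return wrap(comment_tokens), wrap(aspect_tokens)


context, aspect = build_bert_inputs(["the", "food", "was", "great"], ["food"])
print(context["tokens"])
print(aspect["tokens"])
```

Passing the two wrapped sequences through the BERT model would then yield $E_c$ and $E_a$, with one vector per token including the [CLS] and [SEP] positions.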
4. The method of claim 3, wherein the aspect feature localization model operates according to the following Algorithm 1:
specifically, according to the position and length of the aspect word, the feature localization algorithm extracts the most relevant information $af$ of the aspect word from the context representation $E_c$; the most important feature $AF$ is then obtained from $af$ by max pooling, and a dropout operation is performed on $AF$ to obtain the important feature $h_{af}$ of the aspect word in the context representation $E_c$.
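The listing of Algorithm 1 is not reproduced in this text, so the following is only a sketch of the three operations the claim names (extraction by position and length, max pooling, dropout). It assumes $E_c$ is a list of token vectors, that `start` is a zero-based index into $E_c$, and that dropout is the common "inverted" variant; the function name and parameters are hypothetical.

```python
import random


def locate_aspect_feature(e_c, start, length, drop_prob=0.0, rng=None):
    """Slice the aspect tokens out of E_c, max-pool them, then apply dropout."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    af = e_c[start:start + length]            # rows of E_c covering the aspect word
    pooled = [max(col) for col in zip(*af)]   # max pooling over the aspect tokens
    if drop_prob > 0.0:                       # inverted dropout (assumption)
        rng = rng or random.Random(0)
        keep = 1.0 - drop_prob
        pooled = [x / keep if rng.random() < keep else 0.0 for x in pooled]
    return pooled


# Four context tokens with two features each; the aspect spans tokens 1 and 2.
e_c = [[0.1, 0.9], [0.5, 0.2], [0.7, 0.4], [0.3, 0.3]]
h_af = locate_aspect_feature(e_c, start=1, length=2)
print(h_af)
```

With `drop_prob=0.0` the function reduces to a per-feature maximum over the aspect tokens, which is the behaviour expected at inference time.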
5. The method according to claim 4, wherein fusing the context related to the target with the important target information, and predicting the probabilities of the different emotion polarities by using the emotion predictor on the basis of the fused information, specifically comprises:
S401, concatenating $h_{cm}$, $h_{am}$ and $h_{af}$ by vector concatenation to obtain the overall feature $r$:
$r = [h_{cm}; h_{am}; h_{af}]$;
S402, preprocessing $r$ with a linear function, namely:
$x = W_u r + b_u$, wherein $W_u$ is a weight matrix and $b_u$ is a bias value;
S403, calculating, by using a softmax function, the probability $\Pr(a = p)$ that the emotion polarity of the aspect word $a$ in the sentence is $p$:
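Steps S401 to S403 can be sketched as below. The softmax formula itself is not reproduced in this text, so the sketch assumes the standard form $\Pr(a = p) = \exp(x_p) / \sum_i \exp(x_i)$; the function names, the three-class setup and the toy weights are hypothetical.

```python
import math


def softmax(x):
    """Numerically stable softmax over a list of scores."""
    top = max(x)
    exps = [math.exp(v - top) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def predict_polarity(h_cm, h_am, h_af, w_u, b_u):
    """Concatenate the three features, apply x = W_u r + b_u, then softmax."""
    r = h_cm + h_am + h_af  # vector concatenation [h_cm; h_am; h_af]
    x = [sum(w * v for w, v in zip(row, r)) + b for row, b in zip(w_u, b_u)]
    return softmax(x)


# Toy example: three 2-dimensional features, three candidate polarities.
w_u = [[0.2, 0.0, 0.1, 0.0, 0.3, 0.0],
       [0.0, 0.1, 0.0, 0.2, 0.0, 0.1],
       [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
probs = predict_polarity([1.0, 0.5], [0.2, 0.4], [0.7, 0.4], w_u, [0.0, 0.0, 0.0])
print(probs)
```

The predicted polarity would be the index with the largest probability; the mapping of indices to polarity labels depends on the data set and is not fixed by the claim.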
6. The method according to any one of claims 1-5, further comprising: training the model by using cross entropy with L2 regularization as the loss function, which is defined as:
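The loss formula is not reproduced in this text. The sketch below shows one common way to combine cross entropy with an L2 penalty, $L = -\sum_j \log \Pr(y_j) + \lambda \sum_\theta \theta^2$; this exact form, the function name `cross_entropy_l2` and the coefficient name `lam` are assumptions rather than the patent's definition.

```python
import math


def cross_entropy_l2(probs, labels, params, lam):
    """Cross entropy over the samples plus an L2 penalty on the parameters.

    probs:  list of predicted probability vectors, one per sample
    labels: list of gold polarity indices, one per sample
    params: flat list of trainable parameter values
    lam:    L2 regularization coefficient (hypothetical name)
    """
    ce = -sum(math.log(p[y]) for p, y in zip(probs, labels))
    l2 = lam * sum(w * w for w in params)
    return ce + l2


loss = cross_entropy_l2([[0.5, 0.25, 0.25]], [0], [1.0, 2.0], lam=0.1)
print(loss)
```

A larger `lam` penalizes large weights more strongly, which is the usual way to trade fit against over-fitting in this kind of objective.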
7. An aspect-level sentiment analysis system, comprising:
a text vectorization mechanism, which utilizes a BERT model to obtain high-quality context information representation and aspect information representation so as to maintain the integrity of text information;
a feature extraction model for aspect-level emotion analysis, which is used for learning the interaction between the aspect word representation and the context representation, integrating the relationship between the aspect words and the context to distinguish the contributions of different sentences and aspect words to the classification result, capturing aspect information during sentence modeling, and integrating the complete information of the aspect into the interactive semantics, so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information;
an emotion predictor, which is used for fusing the context related to the target with the important target information and predicting the probabilities of the different emotion polarities on the basis of the fused information;
wherein the BERT model is a pre-trained language representation model that generates text representations by using a deep multi-layer bidirectional Transformer encoder, divides a given word sequence into different segments by adding special word segmentation tokens at the beginning and the end of the input sequence respectively, generates token embeddings, segment embeddings and position embeddings for the different segments, and finally converts the comment text and the aspect words respectively to obtain the context information representation and the aspect information representation;
the feature extraction model for aspect-level emotion analysis comprises an important feature extraction model and an aspect feature localization model; the important feature extraction model is an attention encoder based on a multi-head attention mechanism and is used for learning the interaction between the aspect word representation and the context representation and integrating the relationship between the aspect words and the context, so as to distinguish the contributions of different sentences and aspect words to the classification result; the aspect feature localization model is used for capturing aspect information during sentence modeling and integrating the complete information of the aspect into the interactive semantics, so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information;
the emotion predictor concatenates the final interactive hidden state of the context, the final interactive hidden state of the aspect words and the important features of the aspect words by vector concatenation to obtain the overall feature, then preprocesses the overall feature with a linear function, and finally calculates, by using a softmax function, the probability that the emotion polarity of the aspect word in the sentence is each candidate emotion polarity;
the interactive semantics are extracted by a Transformer encoder from the aspect information representation and the context information representation generated by the BERT model, and the context that is most important for characterizing the emotion of the aspect words is determined; meanwhile, the long-term dependency information of the context and the context-aware information are used as input data of a position-wise feed-forward network to generate the respective hidden states, and the final interactive hidden state of the context interaction and the final interactive hidden state of the context and the aspect words are obtained after mean pooling, which specifically comprises the following steps:
S201, in the Transformer encoder, mapping the aspect information representation and the context information representation generated by the BERT model, through the plurality of self-attention mechanisms that form the multi-head attention mechanism, to a query sequence $Q$ and a series of key ($K$) and value ($V$) pairs, so as to capture different important information in parallel subspaces;
S202, calculating the attention score of each piece of captured important information by the attention scoring function $f_s(Q, K, V) = \sigma(f_e(Q, K))V$, wherein $\sigma$ denotes the normalized exponential (softmax) function and $f_e(Q, K)$ is an energy function for learning the correlation features between $K$ and $Q$, which is calculated by the following formula;
S203, inputting the context representation and the aspect representation into the multi-head attention scoring function $f_{mh}(Q, K, V) = [a_1; a_2; \ldots; a_i; \ldots; a_{n\text{-}head}]W_d$ to obtain, respectively, the long-term dependency information $c_{cc}$ of the context and the context-aware information $t_{ca}$, so as to capture the long-term dependencies of the context and to determine which context is most important for characterizing the emotion of the aspect words; wherein $a_i$ denotes the attention score of the $i$-th piece of captured important information, $[a_1; a_2; \ldots; a_i; \ldots; a_{n\text{-}head}]$ denotes the concatenated vector, $W_d$ is an attention weight matrix, $c_{cc} = f_{mh}(E_c, E_c)$ and $t_{ca} = f_{mh}(E_c, E_a)$;
S204, the Transformer encoder feeds $c_{cc}$ and $t_{ca}$, respectively, as input data into a position-wise feed-forward network to generate the hidden states $h_c$ and $h_a$. The position-wise feed-forward network is a variant of the multi-layer perceptron and is denoted $\mathrm{PFN}(h)$; $\mathrm{PFN}(h)$, $h_c$ and $h_a$ are defined as follows:
$h_c = \mathrm{PFN}(c_{cc})$
$h_a = \mathrm{PFN}(t_{ca})$;
$\mathrm{PFN}(h) = \zeta(hW_1 + b_1)W_2 + b_2$;
wherein $\zeta(hW_1 + b_1)$ is a rectified linear unit, $b_1$ and $b_2$ are bias values, and $W_1$ and $W_2$ are learnable weight parameters;
S205, after mean pooling is applied to the hidden states $h_c$ and $h_a$, the final interactive hidden state $h_{cm}$ of the context interaction and the final interactive hidden state $h_{am}$ of the context and the aspect words are obtained;
the aspect feature localization model operates according to the following Algorithm 1:
specifically, according to the position and length of the aspect word, the feature localization algorithm extracts the most relevant information $af$ of the aspect word from the context representation $E_c$; the most important feature $AF$ is then obtained from $af$ by max pooling, and a dropout operation is performed on $AF$ to obtain the important feature $h_{af}$ of the aspect word in the context representation $E_c$.
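Steps S201 to S203 can be illustrated with a small multi-head attention sketch. The energy function $f_e(Q, K)$ is not reproduced in this text, so the sketch substitutes the widely used scaled dot product $QK^{\top}/\sqrt{d}$ as an assumption. It also assumes that the first argument of $f_{mh}$ supplies the keys and values and the second supplies the queries; all function names and the toy matrices are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(s - top) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    """f_s(Q, K, V) = softmax(f_e(Q, K)) V with a scaled dot-product energy."""
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def multi_head(keys_in, queries_in, heads, w_d):
    """Run one attention per head, concatenate the results, project with W_d."""
    outs = [
        attention(matmul(queries_in, w_q), matmul(keys_in, w_k), matmul(keys_in, w_v))
        for (w_q, w_k, w_v) in heads
    ]
    concat = [[x for head in outs for x in head[i]] for i in range(len(queries_in))]
    return matmul(concat, w_d)


identity = [[1.0, 0.0], [0.0, 1.0]]
e_c = [[1.0, 0.0], [0.0, 1.0]]   # two context tokens
e_a = [[1.0, 0.0]]               # one aspect token
heads = [(identity, identity, identity)]
c_cc = multi_head(e_c, e_c, heads, identity)   # context attends to itself
t_ca = multi_head(e_c, e_a, heads, identity)   # aspect attends to the context
print(c_cc, t_ca)
```

Under the stated argument-order assumption, the output has one row per query token, so `c_cc` has as many rows as the context and `t_ca` as many rows as the aspect sequence.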
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110670846.6A CN113705238B (en) | 2021-06-17 | 2021-06-17 | Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110670846.6A CN113705238B (en) | 2021-06-17 | 2021-06-17 | Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113705238A (en) | 2021-11-26 |
CN113705238B (en) | 2022-11-08 |
Family
ID=78648134
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110670846.6A Active CN113705238B (en) | 2021-06-17 | 2021-06-17 | Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113705238B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114003726B (en) * | 2021-12-31 | 2022-04-08 | 山东大学 | Subspace embedding-based academic thesis difference analysis method |
CN114548099B (en) * | 2022-02-25 | 2024-03-26 | 桂林电子科技大学 | Method for extracting and detecting aspect words and aspect categories jointly based on multitasking framework |
CN116841609B (en) * | 2023-08-28 | 2023-11-24 | 中国兵器装备集团兵器装备研究所 | Method, system, electronic device and storage medium for supplementing code annotation information |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717334A (en) * | 2019-09-10 | 2020-01-21 | 上海理工大学 | Text emotion analysis method based on BERT model and double-channel attention |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11010559B2 (en) * | 2018-08-30 | 2021-05-18 | International Business Machines Corporation | Multi-aspect sentiment analysis by collaborative attention allocation |
CN110457480B (en) * | 2019-08-16 | 2023-07-28 | 国网天津市电力公司 | Construction method of fine granularity emotion classification model based on interactive attention mechanism |
CN112231478B (en) * | 2020-10-22 | 2022-06-24 | 电子科技大学 | Aspect-level emotion classification method based on BERT and multi-layer attention mechanism |
CN112199956B (en) * | 2020-11-02 | 2023-03-24 | 天津大学 | Entity emotion analysis method based on deep representation learning |
Also Published As
Publication number | Publication date |
---|---|
CN113705238A (en) | 2021-11-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20211126; Assignee: Pubei County Hongwei Nut Planting Co.,Ltd.; Assignor: WUZHOU University; Contract record no.: X2023980046051; Denomination of invention: Aspect level sentiment analysis method and system based on BERT and aspect feature localization model; Granted publication date: 20221108; License type: Common License; Record date: 20231108 |