CN115169429A - Lightweight aspect-level text emotion analysis method - Google Patents

Lightweight aspect-level text emotion analysis method

Info

Publication number
CN115169429A
CN115169429A (application CN202210390699.1A)
Authority
CN
China
Prior art keywords
model
emotion
context
meta
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210390699.1A
Other languages
Chinese (zh)
Inventor
Cao Xiaopeng (曹小鹏)
Liang Hao (梁浩)
Wang Kaili (王凯丽)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an University of Posts and Telecommunications
Original Assignee
Xi'an University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an University of Posts and Telecommunications
Priority to CN202210390699.1A
Publication of CN115169429A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a lightweight aspect-level text emotion analysis method, which addresses the problem that DNN-based methods only analyze the correlation between the global context and emotion polarity before recognizing the emotion polarity of a target aspect from global context features. The technical scheme mainly comprises the following steps: (1) input the data set, after data cleaning, into a DistilRoBERTa pre-training model; (2) feed the vectorized word vectors into an SRU++ feature extraction network; (3) set the SRD threshold according to the data set and extract local context features; (4) perform interactive learning on the local and global context features with a multi-head attention mechanism, and predict the emotion polarity with a Softmax function to obtain the probability distribution over the emotion categories of the corresponding aspect words. The method is mainly applied to text emotion analysis.

Description

Lightweight aspect-level text emotion analysis method
Technical Field
The invention belongs to the field of computer natural language processing, and particularly relates to a lightweight aspect-level text sentiment analysis method.
Background
In the big-data era, the digital information exchanged on the Internet grows exponentially, and user comments account for a considerable share of it. However, analyzing user information from large-scale comment data by hand is time-consuming and labor-intensive, so large-scale extraction and analysis by computer has gradually become the mainstream means of information analysis.
Emotion analysis obtains the viewpoints, emotions and attitudes expressed by users by analyzing and extracting the emotional information in their comments. According to the granularity of the text studied, emotion analysis can be divided into document-level, sentence-level and aspect-level emotion classification. Aspect-based sentiment analysis (ABSA), with its fine text granularity, can more accurately judge the emotion polarities of different aspects within a sentence and has become an important research direction in the field of emotion analysis [1]. For example, in the sentence "While the food is good, the waiting can be a nightmare," the emotion toward "food" is positive while the emotion toward "waiting" is negative. Since the two aspect terms express opposite emotions, assigning a single sentence-level emotion polarity is inappropriate; analyzing the sentence as a whole cannot accurately extract the user's emotional information about the various aspects and attributes of a product. This problem therefore requires fine-grained emotion analysis, i.e., aspect-level emotion analysis, which mines the emotional information of different aspects of the comment text.
The first layer of an aspect-level emotion analysis model is the word embedding layer, whose purpose is to map each input word into a low-dimensional vector; choosing an embedding tool appropriate to the downstream task is important. The mainstream Word2vec embedding model comes in two variants, skip-gram and Continuous Bag of Words (CBOW). The word vectors it produces can express the similarity between different words but do not incorporate the overall context semantics. A pre-training model, obtained from large-scale data through self-supervised learning independently of any specific task, better reflects the semantic representation of a word in a specific context. The GloVe embeddings, trained on a large-scale corpus, construct vectors from co-occurrence information among words; in practice, however, GloVe word vectors also fail to incorporate context information effectively. The later ELMo and BERT models fuse context information during word embedding and effectively resolve polysemy. When Hoang et al. and Gao et al. applied BERT classification models to the aspect-level emotion analysis task, the classification results surpassed models built on other word embedding tools. The BERT model captures context information efficiently, and the word embeddings it generates effectively resolve polysemy. Song et al. proposed BERT-SPC based on BERT, which prepares the input sequence by appending the aspect to the context, treating the context and the aspect as two segments. Li et al. introduced a new approach named GBCN, which uses a gating mechanism with context-aware aspect embeddings to enhance and control the BERT representation for aspect-based sentiment analysis.
Long Short-Term Memory (LSTM) networks are well suited to serialized data, but using LSTM directly causes the aspect words to be ignored, yielding only the overall sentiment of the sentence; attention mechanisms were therefore introduced to this task. Wang et al. proposed AT-LSTM, which encodes the context with an LSTM to extract semantic information and then applies an attention mechanism to highlight the features that help discriminate emotion; however, it ignores the influence of the aspect words on emotion prediction. Wang et al. further proposed ATAE-LSTM on the basis of AT-LSTM, which concatenates the representations of the aspect words with the context and uses attention to model aspects and context jointly. Considering that an aspect term may be a phrase of several words and that none of the previous models modeled the aspect words separately, Ma et al. proposed IAN (Interactive Attention Networks), which models the aspect words and the context separately and links them through an attention mechanism.
Conventional DNN-based approaches focus only on analyzing the correlation between the global context and emotion polarity before identifying the emotion polarity of the target aspect from global context features. Zeng et al. proposed the Local Context Focus (LCF) model, which differs from the global-context approaches above. The LCF model uses the word distance in the sentence sequence as the Semantic-Relative Distance (SRD) to obtain a local context representation. The model observes that the emotion polarity of an aspect is affected more by context words close to it, while context words far from the aspect may even hurt the prediction accuracy for that aspect's polarity. In the example above, the opinion word "good" is closer to "food" in the sentence sequence and "nightmare" is closer to "waiting," so the farther a word is from an aspect, the less it influences that aspect's emotion polarity. However, the LCF baseline model uses a self-attention network for encoding, so it converges slowly during training, and on small data sets an efficient representation cannot be trained. In addition, the local and global context representations of LCF-BERT are obtained by two BERT (Bidirectional Encoder Representations from Transformers) models, which greatly increases the parameter count. For these reasons the LGLFF model is proposed.
Disclosure of Invention
The invention provides a multi-feature-fusion text emotion analysis method that can efficiently and accurately analyze the emotion polarity of sentences in a text. The technical scheme mainly comprises the following steps:
1. perform word embedding and encoding of the global context with a DistilRoBERTa pre-training model; 2. extract features with an SRU++ network to obtain global features; 3. adjust the Semantic-Relative Distance (SRD) threshold for each data set and mask the global context representation to obtain the local context representation; 4. model the global and local context features with multi-head attention for interactive learning; 5. after obtaining the interactive features learned by the multi-head attention mechanism, apply pooling for dimensionality reduction and aggregate the learned representations. Finally, a Softmax layer predicts the emotion polarity, yielding the probability distribution over the emotion categories of the corresponding aspect words.
The invention has the following effects: applying the method to the SemEval-2014 Laptop and Restaurant review data sets and the ACL-14 Twitter public social data set, the best accuracy and F1 on the Restaurant data set are 87.23% and 81.78%, on the Laptop data set 81.46% and 78.31%, and on the Twitter data set 77.11% and 76.2%, respectively. The emotion analysis effect is superior to that of traditional models.
Drawings
FIG. 1 model structure diagram
FIG. 2 Pre-training model structure diagram
FIG. 3 Pre-training model vectorized representation
FIG. 4 feature extraction network diagram
Detailed Description
The specific implementation of the invention is divided into five steps: 1. input the data set, after data cleaning, into the DistilRoBERTa pre-training model; 2. feed the vectorized word vectors into the SRU++ feature extraction network; 3. set the SRD threshold according to the data set and extract local context features; 4. perform interactive learning on the local and global context features with a multi-head attention mechanism; 5. predict the emotion polarity with a Softmax layer to obtain the probability distribution over the emotion categories of the corresponding aspect words. First, the pre-training model maps each input word into a low-dimensional vector; then features are extracted from these vectors; next, the SRD value is determined per data set; the local and global features are learned interactively; and finally the emotion polarity is predicted.
The structure of the method is shown in figure 1:
(1) Text vectorization
For the aspect-level emotion classification task, the input sequence prepared for the model generally consists of a context sequence and an aspect sequence, which lets the model learn the relevance of the context and the aspect. Let $s = \{w_0, w_1, \dots, w_n\}$ be the input context sequence containing the aspect, i.e., a sequence of n words that includes the aspect target, and let $s^t = \{w_i, \dots, w_{i+m-1}\}$ be the target aspect sequence, a subsequence of $s$ consisting of m (m ≥ 1) words.
The first layer of the LGLFF model is the input layer, which converts the context text sequence into a serialized vector representation; it consists mainly of a word embedding layer and an encoding layer. The LGLFF model uses the BERT-SPC input proposed by Song et al., a sentence-pair classification scheme for the pre-trained BERT model that prepares the input sequence by appending the aspect to the context, treating the context and the aspect as two segments. The global input sequence that BERT-SPC constructs for the ABSA task is "[CLS] + s + [SEP] + s^t + [SEP]".
DistilRoBERTa is a pre-training model designed with knowledge distillation on top of the Transformer architecture. Through knowledge distillation, much of the knowledge encoded in the large-scale teacher model can be transferred to a small-scale student model, reducing the model size; the DistilRoBERTa model thus speeds up inference and shrinks the parameter count while retaining accuracy.
The DistilRoBERTa pre-training model takes the sequence $w_1, w_2, w_3, \dots, w_n$ as input and outputs the vector representations $T_1, T_2, T_3, \dots, T_n$, where each vector $T_i$ corresponds to the word $w_i$ in the sequence. DistilRoBERTa learns the context information of each word in the input sequence with a Transformer encoder, which generates context embeddings using self-attention. The context embeddings extracted for the words are concatenated into a vector representing the semantic information of the input sequence.
The pre-training model structure is shown in fig. 2:
the pre-training process is shown in fig. 3:
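To make the vectorization step concrete, the following is a minimal sketch using the Hugging Face transformers library. The distilroberta-base checkpoint and the example sentence are illustrative assumptions; the patent does not name a specific checkpoint. A RoBERTa tokenizer uses <s>/</s> as its special tokens, so the pair encoding below mirrors, rather than literally reproduces, the BERT-SPC layout "[CLS] + s + [SEP] + s^t + [SEP]".

```python
# Minimal sketch: encode a context/aspect pair with DistilRoBERTa.
# The checkpoint name and the sentence are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModel.from_pretrained("distilroberta-base")

context = "While the food is good, the waiting can be a nightmare."
aspect = "waiting"

# Pair encoding: context and aspect as two segments (BERT-SPC style input).
inputs = tokenizer(context, aspect, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector T_i per input token w_i.
token_embeddings = outputs.last_hidden_state  # shape: (1, seq_len, 768)
print(token_embeddings.shape)
```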
(2) Global feature extraction
The RNN model handles the temporal relationships in a sequence through a memory unit and is one of the most common models for sequence analysis. However, in conventional RNN models such as LSTM and GRU, the output state at the current time step can only be computed after the output state at the previous time step is fully available, so the computation cannot be parallelized. This dependency between adjacent time steps makes recurrent networks much slower than other models and limits their efficiency. The Simple Recurrent Unit (SRU) removes this limitation, and this model adopts SRU++, an improvement of the SRU, for feature extraction. SRU++ adds an attention mechanism to the SRU to better learn the dependencies between the current word and other words.
Part of the SRU++ computation is as follows:
$$f_t = \sigma(U[t,0] + v \odot c_{t-1} + b)$$
$$r_t = \sigma(U[t,1] + v' \odot c_{t-1} + b')$$
$$c_t = f_t \odot c_{t-1} + (1 - f_t) \odot U[t,2]$$
$$H^g = h_t = r_t \odot c_t + (1 - r_t) \odot x_t$$
where $U[t,0]$, $U[t,1]$ and $U[t,2]$ replace $Wx_t$, $W'x_t$ and $W''x_t$ of the SRU, improving the SRU at the code level to increase parallelism; $\sigma$ denotes the sigmoid function; $\odot$ denotes elementwise multiplication; $W$, $W'$, $v$ and $v'$ are learnable weight matrices; $b$ and $b'$ are bias values; $f_t$, $c_t$, $r_t$ and $h_t$ denote the forget gate, the hidden state at time t, the reset gate and the state output at time t, respectively; and $x_t$, the word vector input at time t, is a row of the word-vector matrix $T$. $H^g$ is the extracted global context feature representation. Since the simple recurrent unit no longer depends on the previous output $h_{t-1}$, the computation can be parallelized.
The process of feature extraction is as shown in fig. 4:
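To make the recurrence concrete, here is a minimal PyTorch sketch of the elementwise SRU computation defined by the equations above. In SRU++ the projections $U[t,\cdot]$ additionally pass through an attention sub-layer; in this sketch $U$ comes from a single linear map, so it shows the SRU core that SRU++ extends rather than the full module, and all dimensions are illustrative.

```python
import torch

def sru_layer(x, U, v, v2, b, b2):
    """x: (T, d) input word vectors; U: (T, 3, d) precomputed projections."""
    T, d = x.shape
    c = torch.zeros(d)                                # c_0
    h_out = []
    for t in range(T):                                # only elementwise ops here
        f = torch.sigmoid(U[t, 0] + v * c + b)        # forget gate f_t
        r = torch.sigmoid(U[t, 1] + v2 * c + b2)      # reset gate r_t
        c = f * c + (1 - f) * U[t, 2]                 # hidden state c_t
        h = r * c + (1 - r) * x[t]                    # output h_t (highway skip)
        h_out.append(h)
    return torch.stack(h_out)                         # H_g: global features (T, d)

T, d = 12, 8
x = torch.randn(T, d)
W = torch.randn(d, 3 * d)
U = (x @ W).reshape(T, 3, d)    # the only matrix multiply, parallel over all t
H_g = sru_layer(x, U, torch.randn(d), torch.randn(d),
                torch.zeros(d), torch.zeros(d))
```

Because the loop body never reads $h_{t-1}$ and the heavy matrix multiplication is hoisted out of the time loop, the per-step work is cheap elementwise arithmetic, which is the parallelism argument made above.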
(3) Local feature extraction
1. Semantic-relative distance: the model uses the semantic-relative distance (SRD) to determine whether a context word belongs to the local context of a particular aspect.
2. Context dynamic weighting (CDW): context features semantically related to the target aspect are preserved, while weakly related context features are attenuated by weighting; features of context words far from the target aspect are attenuated according to their SRD. For each semantically weakly related context word, the CDW constructs a weight vector $W_i$ to weight its features, and these vectors form a mask matrix $M$.
Given the mask matrix $M$ and the global context representation $H^g$, the local context representation $H^l$ is computed as $H^l = H^g \cdot M$, as sketched below.
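The sketch shows one way to build the SRD-based CDW mask, following the weighting commonly used in the LCF literature the patent builds on: tokens within the SRD threshold keep full weight and farther tokens are linearly attenuated. The exact weighting function and index conventions are assumptions, as the patent does not spell them out.

```python
import torch

def cdw_mask(n, aspect_start, aspect_len, srd_threshold, hidden):
    center = aspect_start + aspect_len // 2
    weights = torch.empty(n)
    for i in range(n):
        srd = abs(i - center) - aspect_len // 2           # semantic-relative distance
        if srd <= srd_threshold:
            weights[i] = 1.0                              # local context: keep fully
        else:
            weights[i] = 1.0 - (srd - srd_threshold) / n  # linear attenuation
    return weights.unsqueeze(1).expand(n, hidden)         # mask matrix M, (n, hidden)

n, hidden = 12, 8
H_g = torch.randn(n, hidden)              # global features from the SRU++ layer
M = cdw_mask(n, aspect_start=5, aspect_len=1, srd_threshold=3, hidden=hidden)
H_l = H_g * M                             # H_l = H_g · M (elementwise weighting)
```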
(4) Feature interactive learning
The model uses multi-head self-attention for emotion classification over the extracted features; the multiple heads are in essence several independent self-attention computations, which lets the model learn relevant information in different representation subspaces. The global context representation $H^g$ and the local context representation $H^l$ obtained through the word embedding and feature extraction layers are concatenated, and interactive learning via multi-head self-attention extracts the aspect-relevant features in the context while suppressing useless features.
$$O_d = W_d([H^l; H^g]) + b_d, \qquad O_m = \mathrm{MHSA}(O_d)$$
where $W_d$ and $b_d$ are the parameter matrix and bias vector of the linear layer, respectively.
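A minimal PyTorch sketch of this interaction step: concatenate the local and global representations, project them with a linear layer (the $W_d$ and $b_d$ above), and run multi-head self-attention. The head count and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

n, hidden = 12, 8
H_g = torch.randn(1, n, hidden)           # global context features (batch of 1)
H_l = torch.randn(1, n, hidden)           # local (CDW-weighted) context features

dense = nn.Linear(2 * hidden, hidden)     # O_d = W_d([H_l ; H_g]) + b_d
mhsa = nn.MultiheadAttention(embed_dim=hidden, num_heads=2, batch_first=True)

O_d = dense(torch.cat([H_l, H_g], dim=-1))
O_m, _ = mhsa(O_d, O_d, O_d)              # O_m = MHSA(O_d)
print(O_m.shape)                          # (1, n, hidden)
```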
(5) Emotional polarity prediction
After the interactive features learned by the multi-head attention mechanism are obtained, pooling is applied to reduce dimensionality and aggregate the learned representations. Finally, a Softmax layer predicts the emotion polarity, yielding the probability distribution over the emotion categories of the corresponding aspect words; the category with the maximum probability is the emotion category of the aspect.
$$Y = \mathrm{softmax}(O_m)_k = \frac{\exp(O_{m,k})}{\sum_{j=1}^{K} \exp(O_{m,j})}$$
where K is the number of classification labels (three here) and Y is the emotion polarity prediction output by the model.
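The prediction head can be sketched as follows. Mean pooling is assumed here as the dimension-reducing pooling step, since the patent does not specify the pooling variant; a linear layer then projects to the K = 3 polarity classes and Softmax yields the class probabilities.

```python
import torch
import torch.nn as nn

K, n, hidden = 3, 12, 8
O_m = torch.randn(1, n, hidden)           # output of the interaction layer

pooled = O_m.mean(dim=1)                  # pooling: (1, n, hidden) -> (1, hidden)
classifier = nn.Linear(hidden, K)
Y = torch.softmax(classifier(pooled), dim=-1)   # probabilities over K classes
pred = Y.argmax(dim=-1)                   # max-probability class = predicted polarity
print(Y, pred)
```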
Evaluation metrics:
To verify the performance of the proposed model, classification accuracy (Acc) and the F1 score are used for evaluation; they are computed as follows.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP means both the predicted label and the true label are positive; FP means the predicted label is positive and the true label negative; TN means both are negative; FN means the predicted label is negative and the true label positive; Precision denotes precision and Recall denotes recall.
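As a sketch, the same statistics can be computed with scikit-learn; the label vectors are made-up illustrations (0/1/2 for negative/neutral/positive), and macro averaging is assumed for the multi-class F1.

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [2, 0, 1, 2, 2, 0, 1]            # made-up gold labels
y_pred = [2, 0, 1, 1, 2, 0, 0]            # made-up predictions

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")   # macro-averaged F1 over 3 classes
print(f"Acc = {acc:.4f}, F1 = {f1:.4f}")
```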
Table 1: results of the experiment
(Table 1 lists the accuracy (Acc) and F1 of the baseline models and the proposed LGLFF model on the Restaurant, Laptop and Twitter data sets; the figures discussed below are drawn from it.)
Compared with LSTM, AT-LSTM improves accuracy (Acc) by 2.23%, 2.47% and 1.91% and F1 by 1.09%, 2.30% and 1.87% on the three public data sets, respectively, showing that the attention mechanism's further extraction of important emotion features from the context effectively improves the model's emotion judgment. ATAE-LSTM connects the aspect-word encoding with the context encoding on the basis of AT-LSTM, preliminarily associating aspect-word information with the context; it raises Acc on the Rest14 and Twitter data sets by 0.24% and 0.69%, respectively, but a simple connection cannot reflect the effect of fine-grained aspect words on emotion prediction. The IAN model further considers the interaction between aspect words and targets; compared with ATAE-LSTM it improves Acc by 3.85%, 3.35% and 2.06% and F1 by 4.28%, 3.45% and 3.07% on the three public data sets. Fusing the interaction information between context and aspect words into the model yields a quite obvious improvement, which again reflects the importance of that interaction information. For BERT-SPC and BERT-PT, the introduction of external knowledge largely preserves the context semantic information during text vectorization; with BERT-SPC's sentence-pair framing of ABSA, the context semantics are better retained during word-vector generation, improving Acc by 1.03% and 2.18% and F1 by 1.83% and 2.33% on the three public data sets, respectively. The LCF-BERT model introduces external knowledge and an attention mechanism and also proposes the Local Context Focus (LCF) mechanism, which effectively improves the capture of locally important semantic information. After the global and local context features are obtained, the feature-interaction learning layer uses a multi-head attention mechanism to learn the aspect-related context features effectively, i.e., it emphasizes the aspect-related local context without ignoring emotion information beyond the SRD threshold.
In conclusion, the LGLFF model is effective for the aspect-level emotion analysis task: the DistilRoBERTa module's vectorized representation and context-semantics extraction, the SRU++ module's feature extraction, and the learning of the interaction between local/global context and aspect terms all contribute to improving emotion judgment.
The above examples merely illustrate the present invention and should not be construed as limiting its scope; any design similar or equivalent to the present invention falls within the protection scope of the claims of this application.

Claims (1)

1. A multi-model dynamic collaborative semantic matching method, characterized by comprising the following steps:
(1) Define the meta-model and domain models: according to model scale and processing task, define a unique meta-model and several domain models, where domain models can be dynamically added and configured;
(2) Multi-model pre-training: train word vectors separately for different tasks; pre-train the meta-model with a general-domain data set to obtain general word vectors, and pre-train each domain model with its domain data set to obtain domain word vectors;
(3) Compute the text similarity of each model: the meta-model and the domain models each obtain their matching similarity through the MatchPyramid model;
(4) Set collaboration rules and compute the text similarity: combine the matching similarities of the meta-model and the domain models according to the set rules to compute the final text similarity.
CN202210390699.1A 2022-04-14 2022-04-14 Lightweight aspect-level text emotion analysis method Pending CN115169429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210390699.1A CN115169429A (en) 2022-04-14 2022-04-14 Lightweight aspect-level text emotion analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210390699.1A CN115169429A (en) 2022-04-14 2022-04-14 Lightweight aspect-level text emotion analysis method

Publications (1)

Publication Number Publication Date
CN115169429A true CN115169429A (en) 2022-10-11

Family

ID=83483369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210390699.1A Pending CN115169429A (en) 2022-04-14 2022-04-14 Lightweight aspect-level text emotion analysis method

Country Status (1)

Country Link
CN (1) CN115169429A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561592A (en) * 2023-07-11 2023-08-08 航天宏康智能科技(北京)有限公司 Training method of text emotion recognition model, text emotion recognition method and device
CN116561592B (en) * 2023-07-11 2023-09-29 航天宏康智能科技(北京)有限公司 Training method of text emotion recognition model, text emotion recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination