CN111339260A - BERT and QA thought-based fine-grained emotion analysis method - Google Patents
- Publication number
- CN111339260A CN111339260A CN202010136542.7A CN202010136542A CN111339260A CN 111339260 A CN111339260 A CN 111339260A CN 202010136542 A CN202010136542 A CN 202010136542A CN 111339260 A CN111339260 A CN 111339260A
- Authority
- CN
- China
- Prior art keywords
- sentence
- auxiliary
- sequence
- bert
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06F16/3344 — Query execution using natural language analysis
- G06F16/35 — Information retrieval of unstructured textual data; Clustering; Classification
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
- G06N3/045 — Neural network architectures; Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
Abstract
The invention relates to the field of text emotion analysis in natural language processing, in particular to a fine-grained emotion analysis method based on BERT and the QA (question-answering) idea. By adding auxiliary sentences to the dataset, the original single-sentence task is changed into a sentence-pair task, exploiting the fact that the BERT model performs better on sentence-pair tasks. At the same time, the added auxiliary sentences turn BERT's downstream task into a question-answering task: the emotion polarity corresponding to each extracted aspect is found from the auxiliary sentence, so the original two-step subtasks can be completed in one step. By using the BERT model, turning the downstream task into a QA task through constructed auxiliary sentences, and fine-tuning the model, the invention improves performance on the two subtasks of aspect extraction and emotion classification, can extract multiple aspect words, improves model efficiency, and reduces redundant workload.
Description
Technical Field
The invention relates to the field of text emotion analysis in natural language processing, in particular to a fine-grained emotion analysis method based on BERT and QA ideas.
Background
At present, text comments on the Internet (including social comments, news comments, commodity evaluations and the like) have both academic and commercial value. Sentiment analysis of such comments can identify a user's emotional attitude toward an event, a commodity and so on. For example, social platforms and news portals can use the analyzed information for targeted marketing and content pushing; an e-commerce platform can use emotion analysis to obtain consumer evaluations of a commodity's various attributes, saving other consumers' browsing time and allowing commodity evaluations to be displayed as refined labels rather than as simple good/bad ratings.
Currently, emotion analysis can be divided into coarse-grained and fine-grained types according to the granularity of the objects a task attends to: coarse-grained tasks attend to the document and sentence levels, while fine-grained tasks attend to the aspect level, where an aspect may be one word or several words. Coarse-grained emotion analysis assigns a single emotion polarity to a whole document or sentence, which is a serious shortcoming: one sentence may contain different emotion polarities toward multiple attributes, so coarse-grained analysis can neither find out which attributes or aspects of a commodity or news item the user is commenting on, nor determine the emotion expressed toward a specific attribute, and it is difficult to cover the user's emotion expression comprehensively. Fine-grained emotion analysis therefore has great research value.
The fine-grained emotion analysis task has two subtasks: aspect extraction and aspect-level emotion classification. In the prior art, the two subtasks are mainly performed and improved separately. Existing aspect extraction techniques mainly comprise unsupervised, semi-supervised and supervised learning, among which deep learning models in supervised learning, such as LSTM, CRF and BERT, achieve excellent results. Existing emotion classification techniques mainly comprise emotion-dictionary-based, machine-learning-based and deep-learning-based methods. Deep learning methods outperform the others in both aspect extraction and emotion classification.
However, current fine-grained emotion analysis has two main problems. First, deep learning models such as CNN, RNN and LSTM need to be pre-trained on large text datasets, which takes a long time. Second, current fine-grained emotion analysis is mainly performed step by step: after the aspect extraction subtask is completed, the emotion polarity of the corresponding aspect is judged again for each sentence, so efficiency is low.
Disclosure of Invention
The invention aims to provide a fine-grained emotion analysis method based on BERT and the QA idea, so as to solve the problems described in the background art.
To achieve this purpose, the invention provides the following technical scheme: a fine-grained emotion analysis method based on BERT and the QA idea, comprising the following steps:
Step one: select the SemEval-2014 dataset as the corpus and add an auxiliary sentence to every piece of text data;
Step two: segment the processed text data, tokenize the sentence with a 30,552-word vocabulary, prepend a [CLS] token to the sentence, and insert an [SEP] token between the auxiliary sentence and the original sentence to generate the input sequence X, specifically: [CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP];
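As an illustration of step two, the sentence-pair construction can be sketched as follows; a toy whitespace tokenizer stands in for the actual 30,552-word vocabulary, so the function name and tokenization are illustrative assumptions rather than the patented implementation:

```python
def build_input_sequence(original, auxiliary):
    # Splice the original and auxiliary sentences into a BERT-style pair:
    # [CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP]
    tokens = ["[CLS]"] + original.split() + ["[SEP]"] + auxiliary.split() + ["[SEP]"]
    first_sep = tokens.index("[SEP]")
    # Segment ids distinguish the two sentences: 0 up to and including the
    # first [SEP], 1 for the auxiliary-sentence span
    segment_ids = [0] * (first_sep + 1) + [1] * (len(tokens) - first_sep - 1)
    return tokens, segment_ids

tokens, segments = build_input_sequence(
    "I like the food but the service was so awful !",
    "positive or negative or neutral")
```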
Step three: vectorize the input sequence X, representing each word of the input text sequence by a pre-trained word feature vector, to obtain the word vector h_0 of the input text, expressed as:
h_0 ∈ R^((n+2)×h)
where n is the length of the input sentence and h is the size of the hidden layer;
Step four: take the word vector h_0 obtained in step three as the input of L stacked Transformer blocks, training to obtain word vectors h_i that fuse sentence semantics, expressed as:
h_i = Transformer(h_{i-1}), i ∈ [1, L];
step five: constructing a heuristic aspect extractor capable of extracting a plurality of aspects, which specifically comprises the following sub-steps:
step 5.1: word vector h based on fused sentence semanticsiIn the aspect of training data, a BERT model is trained by using a gradient descent method, and parameters are updated;
defining the probability p of the start and end positions of the corresponding aspect of each sentencesAnd peNamely:
gs=wshL,ps=softmax(gs)
ge=wehL,pe=softmax(ge)
wherein, ws∈Rh,we∈RhIs a weight vector to be trained, and softmax is a normalization function;
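A minimal NumPy sketch of the scoring step above; the random h_L stands in for BERT's final-layer output, so the shapes and values are illustrative assumptions only:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax normalization
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n, h = 6, 8                       # toy sequence length and hidden size
h_L = rng.normal(size=(n, h))     # stand-in for the final Transformer layer output
w_s = rng.normal(size=h)          # start-position weight vector (trainable)
w_e = rng.normal(size=h)          # end-position weight vector (trainable)

g_s, g_e = h_L @ w_s, h_L @ w_e   # per-token start/end scores
p_s, p_e = softmax(g_s), softmax(g_e)  # probability distributions over positions
```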
step 5.2: pre-marking the boundary of the target entity in the training data set to obtain a marking list T and obtain a starting vector ys∈R(n+2)And an end vector ye∈R(n+2)Wherein each elementIndicating whether the ith token is the start of a target aspect,indicating whether the ith token is the end of an aspect;
minimizing the sum of the negative logarithms of the probabilities of the real start position and the end position, and training a model by gradient descent:
step 5.3: redundancy invalidation occurs when multiple parties are extracted, for example:
original sentence: i like the food but the service was so awful!
Food, service in real aspect
And (3) prediction aspect: food, food but the service, service was so awful, service …
A heuristic multi-boundary algorithm is provided, and pseudo codes are as follows:
wherein, gsScore, g, representing the starting positioneScore representing the end position, γ is a hyperparameter, is a set minimum score threshold, and K is the maximum number in terms of a single sentence:
the algorithm mainly comprises the following steps:
a. initialize three sets R, U and O (line 1);
b. from the scores g_s and g_e computed with the trained weight vectors, select the index sets S and E of the top-M start and end positions by score, where s_i denotes the i-th start index in set S and e_j the j-th end index in set E (line 2);
c. when the end position is not smaller than the start position and g_s(s_i) + g_e(e_j) exceeds γ (lines 3-8), add the candidate boundary (s_i, e_j) to set R (lines 7-8) and add the heuristic regularization value u = g_s(s_i) + g_e(e_j) − (e_j − s_i + 1) to set U (lines 6 and 8);
d. remove redundant boundaries from set R with a non-maximum suppression algorithm: while R is not empty and the size of set O is smaller than K (lines 9-14), remove from R the boundary r_l corresponding to the current maximum value u_l in U and add r_l to set O (lines 10-11); when a boundary r_k in R overlaps r_l (overlap being checked with a fine-grained F1 measure), remove the corresponding boundaries and values from set R and set U (lines 12-14), i.e., remove redundant aspects from the candidates;
e. obtain the set O of start/end boundaries of the multiple aspects (line 15); the aspect words are then extracted and the procedure ends;
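Since the pseudo code itself is not reproduced in this text, the following is a hedged Python reconstruction of steps a-e; the function name, the toy score values, and the exact length penalty (e_j − s_i + 1) are assumptions consistent with the description of the heuristic regularization value:

```python
import numpy as np

def extract_aspects(g_s, g_e, M, gamma, K):
    # a. initialize the candidate set R, value set U and output set O
    R, U, O = [], [], []
    # b. top-M start and end position indices by score
    S = np.argsort(g_s)[::-1][:M]
    E = np.argsort(g_e)[::-1][:M]
    # c. keep candidates whose end is not before the start and whose summed
    #    score exceeds gamma; the heuristic regularization value is the
    #    score sum minus the aspect length
    for s_i in S:
        for e_j in E:
            if e_j >= s_i and g_s[s_i] + g_e[e_j] > gamma:
                R.append((int(s_i), int(e_j)))
                U.append(g_s[s_i] + g_e[e_j] - (e_j - s_i + 1))
    # d. non-maximum suppression: repeatedly move the highest-valued
    #    boundary to O and drop candidates overlapping it
    while R and len(O) < K:
        l = int(np.argmax(U))
        r_l = R[l]
        O.append(r_l)
        keep = [k for k, r in enumerate(R) if r[1] < r_l[0] or r[0] > r_l[1]]
        R = [R[k] for k in keep]
        U = [U[k] for k in keep]
    # e. O holds the non-overlapping start/end boundaries of the aspects
    return O

# toy scores for "I like the food but the service was so awful !"
# token index:   0   1   2  3(food)        6(service)       9(awful)
g_s = np.array([0., 0., 0., 3., 0., 0., 2.5, 0., 0., 0.4, 0.])
g_e = np.array([0., 0., 0., 3., 0., 0., 2.5, 0., 0., 0.5, 0.])
aspects = extract_aspects(g_s, g_e, M=3, gamma=1.0, K=2)
# → [(3, 3), (6, 6)], i.e., "food" and "service"; the longer candidates
#   ("food but the service", "service was so awful") are suppressed
```

Note how the length penalty in the regularization value makes the short, correct boundaries outrank the long redundant spans before NMS even runs.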
Step six: splice the extracted aspect-word feature vectors and the auxiliary-sentence feature vectors with a self-attention operation to obtain semantically fused auxiliary feature word vectors, denoted h'.
Step seven: predicting the emotion polarity of the corresponding aspect of the auxiliary feature word vector through a feedforward neural network, and the specific process is as follows:
step 7.1: the emotion polarity value is obtained by using the Tanh activation function between two linear transformations:
wherein, WpAnd Wh'Two parameter matrixes to be trained are provided;
the emotion polarity probability is normalized by a softmax function to a polarity value,
step 7.2: gradient training parameter matrix W using negative log probability that minimizes true emotion polarity prediction probabilityp,Wh'And a fine-tuning BERT model, expressed as:
wherein, ypFor true emotion polesThe linear one-hot represents a vector;
step 7.3: using formulas
After the BERT model is trained in the gradient descent, the emotion polarity probability of each auxiliary feature word vector h' is solved, wherein the emotion word corresponding to the value with the maximum probability is the emotion polarity of the corresponding aspect word.
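A minimal NumPy sketch of step seven's classifier head; the random h' and weight matrices are illustrative stand-ins for the trained parameters, and the polarity ordering is an assumption:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax normalization
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
hidden = 8
h_prime = rng.normal(size=hidden)        # fused auxiliary feature vector h'
W_h = rng.normal(size=(hidden, hidden))  # first linear transformation (trainable)
W_p = rng.normal(size=(3, hidden))       # second linear transformation (trainable)

g_p = W_p @ np.tanh(W_h @ h_prime)       # Tanh between the two linear transforms
p = softmax(g_p)                         # probabilities over the three polarities
polarity = ("positive", "negative", "neutral")[int(np.argmax(p))]
```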
Preferably, the dataset selected in step one comprises three datasets: LAPTOP, REST and TWITTER.
Preferably, the content of the auxiliary sentence in step one is: "positive or negative or neutral".
Preferably, the heuristic regularization value in the algorithm of step 5.3 represents the sum of the start-position and end-position scores minus the aspect length.
Preferably, the input sequence is formed by segmenting the corpus and the auxiliary sentences and splicing them with the predefined symbols [CLS] and [SEP]; the spliced sequence is "[CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP]", where [CLS] is the semantic symbol of the input text sequence and [SEP] is the separator between the question sequence and the passage sequence.
Compared with the prior art, the beneficial effects of the invention mainly comprise the following aspects:
The BERT model used by the invention was already pre-trained on large text corpora when Google released it; compared with models such as CNN, RNN and LSTM, this removes the pre-training step and its associated workload.
In the fine-grained emotion analysis task, compared with the step-by-step approach of first extracting target entities with one trained model and then judging the corresponding emotion polarity with another, the one-step method improves model training efficiency.
BERT performs better on sentence-pair tasks than on single-sentence tasks, and adding an auxiliary sentence exploits this characteristic.
Experiments show that the boundary-marking method outperforms sequence-labeling methods; embedding a heuristic extractor on top of boundary marking allows multiple target aspects to be output at once, i.e., one sentence may contain several aspects.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution: a fine-grained emotion analysis method based on BERT and QA ideas comprises the following steps:
Step one: select the SemEval-2014 dataset as the corpus and add an auxiliary sentence to every piece of text data;
Step two: segment the processed text data, tokenize the sentence with a 30,552-word vocabulary, prepend a [CLS] token to the sentence, and insert an [SEP] token between the auxiliary sentence and the original sentence to generate the input sequence X, specifically: [CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP];
Step three: vectorize the input sequence X, representing each word of the input text sequence by a pre-trained word feature vector, to obtain the word vector h_0 of the input text, expressed as:
h_0 ∈ R^((n+2)×h)
where n is the length of the input sentence and h is the size of the hidden layer;
Step four: take the word vector h_0 obtained in step three as the input of L stacked Transformer blocks, training to obtain word vectors h_i that fuse sentence semantics, expressed as:
h_i = Transformer(h_{i-1}), i ∈ [1, L];
step five: constructing a heuristic aspect extractor capable of extracting a plurality of aspects, which specifically comprises the following sub-steps:
step 5.1: word vector h based on fused sentence semanticsiIn the aspect of training data, a BERT model is trained by using a gradient descent method, and parameters are updated;
defining the probability p of the start and end positions of the corresponding aspect of each sentencesAnd peNamely:
gs=wshL,ps=softmax(gs)
ge=wehL,pe=softmax(ge)
wherein, ws∈Rh,we∈RhIs a weight vector to be trained, and softmax is a normalization function;
step 5.2: pre-marking the boundary of the target entity in the training data set to obtain a marking list T and obtain a starting vector ys∈R(n+2)And an end vector ye∈R(n+2)Wherein each elementIndicating whether the ith token is the start of a target aspect,indicating whether the ith token is the end of an aspect;
minimizing the sum of the negative logarithms of the probabilities of the real start position and the end position, and training a model by gradient descent:
step 5.3: redundancy invalidation occurs when multiple parties are extracted, for example:
original sentence: i like the food but the service was so awful!
Food, service in real aspect
And (3) prediction aspect: food, food but the service, service was so awful, service …
A heuristic multi-boundary algorithm is provided, and pseudo codes are as follows:
wherein, gsScore, g, representing the starting positioneScore representing the end position, γ is a hyperparameter, is a set minimum score threshold, and K is the maximum number in terms of a single sentence:
the algorithm mainly comprises the following steps:
a. initialize three sets R, U and O (line 1);
b. from the scores g_s and g_e computed with the trained weight vectors, select the index sets S and E of the top-M start and end positions by score, where s_i denotes the i-th start index in set S and e_j the j-th end index in set E (line 2);
c. when the end position is not smaller than the start position and g_s(s_i) + g_e(e_j) exceeds γ (lines 3-8), add the candidate boundary (s_i, e_j) to set R (lines 7-8) and add the heuristic regularization value u = g_s(s_i) + g_e(e_j) − (e_j − s_i + 1) to set U (lines 6 and 8);
d. remove redundant boundaries from set R with a non-maximum suppression algorithm: while R is not empty and the size of set O is smaller than K (lines 9-14), remove from R the boundary r_l corresponding to the current maximum value u_l in U and add r_l to set O (lines 10-11); when a boundary r_k in R overlaps r_l (overlap being checked with a fine-grained F1 measure), remove the corresponding boundaries and values from set R and set U (lines 12-14), i.e., remove redundant aspects from the candidates;
e. obtain the set O of start/end boundaries of the multiple aspects (line 15); the aspect words are then extracted and the procedure ends;
Step six: splice the extracted aspect-word feature vectors and the auxiliary-sentence feature vectors with a self-attention operation to obtain semantically fused auxiliary feature word vectors, denoted h'.
Step seven: predicting the emotion polarity of the corresponding aspect of the auxiliary feature word vector through a feedforward neural network, and the specific process is as follows:
step 7.1: the emotion polarity value is obtained by using the Tanh activation function between two linear transformations:
wherein, WpAnd Wh'Two parameter matrixes to be trained are provided;
the emotion polarity probability is normalized by a softmax function to a polarity value,
step 7.2: gradient training parameter matrix W using negative log probability that minimizes true emotion polarity prediction probabilityp,Wh'And a fine-tuning BERT model, expressed as:
wherein, ypOne-hot representing vector for true emotion polarity;
step 7.3: using formulas
After the BERT model is trained in the gradient descent, the emotion polarity probability of each auxiliary feature word vector h' is solved, wherein the emotion word corresponding to the value with the maximum probability is the emotion polarity of the corresponding aspect word.
Further, the dataset selected in step one comprises three datasets: LAPTOP, REST and TWITTER.
Further, the content of the auxiliary sentence in step one is: "positive or negative or neutral".
Further, the heuristic regularization value in the algorithm of step 5.3 represents the sum of the start-position and end-position scores minus the aspect length.
Further, the input sequence is formed by segmenting the corpus and the auxiliary sentences and splicing them with the predefined symbols [CLS] and [SEP]; the spliced sequence is "[CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP]", where [CLS] is the semantic symbol of the input text sequence and [SEP] is the separator between the question sequence and the passage sequence.
In the experiments, the same training set and test set are used under identical conditions to compare the experimental effects on the two tasks of entity extraction and emotion classification, respectively.
The F1 value is used as the evaluation index for the entity extraction task, and Accuracy is used as the index for the emotion analysis task.
Here TP means the model predicts true and the ground truth is true, i.e., a correct acceptance;
FP means the model predicts true but the ground truth is false, i.e., a false acceptance;
TN means the model predicts false and the ground truth is false, i.e., a correct rejection;
FN means the model predicts false but the ground truth is true, i.e., a false rejection.
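From these four counts, the two evaluation indices follow directly; a small sketch with assumed example counts (the counts are hypothetical, for illustration only):

```python
def f1_score(tp, fp, fn):
    # F1 is the harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(tp, fp, tn, fn):
    # fraction of all predictions that are correct
    return (tp + tn) / (tp + fp + tn + fn)

# hypothetical counts for illustration only
f1 = f1_score(tp=8, fp=2, fn=2)          # precision = recall = 0.8, so F1 = 0.8
acc = accuracy(tp=8, fp=2, tn=88, fn=2)  # 96 correct out of 100 = 0.96
```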
On the entity extraction task, compared with the BERT+CRF model and DE-CNN (currently the best-performing entity extraction model), the F1 values are as follows:
On the LAPTOP dataset and the TWITTER dataset, the model proposed by the invention performs best among the three models.
On the emotion classification task, compared with MGAN and TNet (currently the best-performing emotion classification models), the Accuracy values are as follows:

| Model | LAPTOP | REST | TWITTER |
| --- | --- | --- | --- |
| MGAN | 75.39 | - | - |
| TNet | 76.54 | - | - |
| Model of the invention | 83.37 | 88.43 | 76.27 |
The model proposed by the invention works best among the three models on the LAPTOP dataset, the REST dataset and the TWITTER dataset.
In summary, by using the BERT model, turning the downstream task into a QA task through constructed auxiliary sentences, and fine-tuning the model, the invention improves performance on the two subtasks of aspect extraction and emotion classification, can extract multiple aspect words, improves model efficiency, and reduces redundant workload.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (5)
1. A fine-grained emotion analysis method based on BERT and the QA idea, characterized by comprising the following steps:
Step one: select the SemEval-2014 dataset as the corpus and add an auxiliary sentence to every piece of text data;
Step two: segment the processed text data, tokenize the sentence with a 30,552-word vocabulary, prepend a [CLS] token to the sentence, and insert an [SEP] token between the auxiliary sentence and the original sentence to generate the input sequence X, specifically: [CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP];
Step three: vectorize the input sequence X, representing each word of the input text sequence by a pre-trained word feature vector, to obtain the word vector h_0 of the input text, expressed as:
h_0 ∈ R^((n+2)×h)
where n is the length of the input sentence and h is the size of the hidden layer;
Step four: take the word vector h_0 obtained in step three as the input of L stacked Transformer blocks, training to obtain word vectors h_i that fuse sentence semantics, expressed as:
h_i = Transformer(h_{i-1}), i ∈ [1, L];
step five: constructing a heuristic aspect extractor capable of extracting a plurality of aspects, which specifically comprises the following sub-steps:
step 5.1: word vector h based on fused sentence semanticsiIn the aspect of training data, a BERT model is trained by using a gradient descent method, and parameters are updated;
defining the probability p of the start and end positions of the corresponding aspect of each sentencesAnd peNamely:
gs=wshL,ps=softmax(gs)
ge=wehL,pe=softmax(ge)
wherein, ws∈Rh,we∈RhIs a weight vector to be trained, and softmax is a normalization function;
step 5.2: pre-marking the boundary of the target entity in the training data set to obtain a marking list T and obtain a starting vector ys∈R(n+2)And an end vector ye∈R(n+2)Each of whichElement(s)Indicating whether the ith token is the start of a target aspect,indicating whether the ith token is the end of an aspect;
minimizing the sum of the negative logarithms of the probabilities of the real start position and the end position, and training a model by gradient descent:
step 5.3: redundancy invalidation occurs when multiple parties are extracted, for example:
original sentence: i like the food but the service was so awful!
Food, service in real aspect
And (3) prediction aspect: food, food but the service, service was so awful, service …
A heuristic multi-boundary algorithm is provided, and pseudo codes are as follows:
wherein, gsScore, g, representing the starting positioneScore representing the end position, γ is a hyperparameter, is a set minimum score threshold, and K is the maximum number in terms of a single sentence:
the algorithm mainly comprises the following steps:
a, initializing three sets R, U and O (line 1);
b, g obtained according to the trained weight vectors,geIn the method, the first M position index sets S and E with high scores are selected, wherein S isiPresentation setSubscript of ith start position in S, ejThe subscript indicating the jth end position in set E (line 2);
c at the end position not less than the start position andadd the values of(s) over γ (lines 3-8), the candidate boundary(s)i,ej) Add to set R (lines 7-8) and apply heuristic regularizationAdd to set U (line 6, line 8);
d, removing redundant boundaries in the set R by using a non-maximum suppression algorithm, namely removing the maximum value U in the set U at the moment from the set R when the set R is not an empty set and the size of the set O is smaller than the value K (lines 9-14)iCorresponding boundary rlAnd combining the boundary rlAdd to set O (lines 10-11); when R is in the set RlOverlapping rkThen, i.e., checking whether there is overlap using a fine-grained F1 numerical measure, the corresponding bounds and values are removed from the set R and the set U (lines 12-14), i.e., redundant aspects are removed from the candidate aspects;
e. Obtain the boundary set O of start and end positions corresponding to the multiple aspects (line 15); extraction of the aspect words is then complete;
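Steps a-e can be sketched as follows. This is a non-authoritative sketch: the exact heuristic score added to U is not given in the text, so it is assumed here to be the summed boundary scores minus the span length (a common span-extraction heuristic), and the overlap check is simplified to token-index intersection rather than the F1 measure:

```python
import numpy as np

def extract_aspect_spans(g_s, g_e, M=5, gamma=1.0, K=3):
    """Heuristic multi-boundary selection with non-maximum suppression.
    g_s / g_e are per-token start / end scores; gamma is the minimum
    score threshold; K is the maximum number of aspects per sentence."""
    R, U, O = [], [], []                       # (a) line 1
    S = np.argsort(g_s)[::-1][:M]              # (b) top-M start indices, line 2
    E = np.argsort(g_e)[::-1][:M]              # (b) top-M end indices, line 2
    for si in S:                               # (c) lines 3-8
        for ej in E:
            if ej >= si and g_s[si] + g_e[ej] > gamma:
                R.append((si, ej))             # candidate boundary
                # assumed heuristic score: boundary scores minus span length
                U.append(g_s[si] + g_e[ej] - (ej - si + 1))

    def overlaps(a, b):                        # simplified token-overlap check
        return not (a[1] < b[0] or b[1] < a[0])

    while R and len(O) < K:                    # (d) lines 9-14
        l = int(np.argmax(U))                  # best remaining candidate
        r_l = R[l]
        O.append(r_l)                          # lines 10-11
        keep = [k for k in range(len(R)) if k != l and not overlaps(R[k], r_l)]
        R = [R[k] for k in keep]               # lines 12-14
        U = [U[k] for k in keep]
    return O                                   # (e) line 15
```

On the example sentence above, with strong start/end scores at "food" and "service", the suppression step discards overlapping candidates such as "food but the service" and keeps the two true aspects.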
step six: and splicing the extracted feature word vectors of the aspect words and the feature word vectors of the auxiliary sentences by utilizing self-attention operation to obtain auxiliary feature word vectors with semantic fusion, and recording the auxiliary feature word vectors as h'.
Step seven: Predict the emotion polarity of the corresponding aspect from the auxiliary feature vector through a feedforward neural network; the specific process is as follows:
Step 7.1: The emotion polarity value is obtained by applying the Tanh activation function between two linear transformations:
where W_p and W_h' are two parameter matrices to be trained;
the polarity value is then normalized into an emotion polarity probability by a softmax function.
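The equations of this step appear only as images in the original filing; from the surrounding description (Tanh between two linear transformations, followed by softmax normalization), the polarity score plausibly takes a form such as:

```latex
g_p = W_p \tanh\!\left(W_{h'}\, h'\right), \qquad
P(\text{polarity} = c \mid h') = \operatorname{softmax}(g_p)_c
```

Here W_p and W_{h'} are the two trainable parameter matrices named in the claim; the exact composition is an assumption.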
Step 7.2: Train the parameter matrices W_p and W_h', and fine-tune the BERT model, by gradient descent that minimizes the negative log probability of the true emotion polarity prediction, expressed as:
where y_p is the one-hot vector of the true emotion polarity;
Step 7.3: After the BERT model has been trained by gradient descent using the above formula, compute the emotion polarity probability for each auxiliary feature vector h'; the emotion word with the highest probability is the emotion polarity of the corresponding aspect word.
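Step seven as a whole can be sketched as below; the shapes, the identity of the two linear maps, and the label order are illustrative assumptions (the three labels follow claim 3):

```python
import numpy as np

POLARITIES = ["positive", "negative", "neutral"]  # per claim 3

def predict_polarity(h_prime, W_h, W_p):
    """Feed-forward polarity prediction: Tanh between two linear
    transformations, softmax normalization, then argmax over the
    three polarity labels."""
    g = W_p @ np.tanh(W_h @ h_prime)   # polarity scores, one per label
    e = np.exp(g - g.max())            # numerically stable softmax
    p = e / e.sum()
    return POLARITIES[int(np.argmax(p))], p
```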
2. The fine-grained emotion analysis method based on BERT and QA ideas of claim 1, wherein the data set selected in step one comprises three data sets: LAPTOP, REST, and TWITTER.
3. The fine-grained emotion analysis method based on BERT and QA ideas of claim 1, wherein: the content of the auxiliary sentence in the first step is as follows: "positive or negative or neutral".
5. The fine-grained emotion analysis method based on BERT and QA ideas of claim 1, wherein the input sequence is formed by splicing the segmented corpus and auxiliary sentence with the predefined symbols [CLS] and [SEP]; the spliced sequence is "[CLS] original sentence sequence [SEP] auxiliary sentence sequence [SEP]", where [CLS] is the semantic symbol of the input text sequence and [SEP] is the separator between the question sequence and the text segment sequence.
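The splicing rule of this claim can be sketched as follows; a real implementation would apply the BERT WordPiece tokenizer first, and the function name here is illustrative:

```python
def build_input_sequence(sentence_tokens, auxiliary_tokens):
    """Splice the tokenized original sentence and auxiliary sentence
    with the predefined [CLS] and [SEP] symbols, QA-style:
    [CLS] original sentence [SEP] auxiliary sentence [SEP]."""
    return ["[CLS]"] + sentence_tokens + ["[SEP]"] + auxiliary_tokens + ["[SEP]"]
```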
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010136542.7A CN111339260A (en) | 2020-03-02 | 2020-03-02 | BERT and QA thought-based fine-grained emotion analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111339260A true CN111339260A (en) | 2020-06-26 |
Family
ID=71184644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010136542.7A Withdrawn CN111339260A (en) | 2020-03-02 | 2020-03-02 | BERT and QA thought-based fine-grained emotion analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111339260A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111966827A (en) * | 2020-07-24 | 2020-11-20 | 大连理工大学 | Conversation emotion analysis method based on heterogeneous bipartite graph |
CN111966827B (en) * | 2020-07-24 | 2024-06-11 | 大连理工大学 | Dialogue emotion analysis method based on heterogeneous bipartite graph |
CN112860889A (en) * | 2021-01-29 | 2021-05-28 | 太原理工大学 | BERT-based multi-label classification method |
CN113204616A (en) * | 2021-04-30 | 2021-08-03 | 北京百度网讯科技有限公司 | Method and device for training text extraction model and extracting text |
CN113204616B (en) * | 2021-04-30 | 2023-11-24 | 北京百度网讯科技有限公司 | Training of text extraction model and text extraction method and device |
CN113377910A (en) * | 2021-06-09 | 2021-09-10 | 平安科技(深圳)有限公司 | Emotion evaluation method and device, electronic equipment and storage medium |
CN113901171A (en) * | 2021-09-06 | 2022-01-07 | 特赞(上海)信息科技有限公司 | Semantic emotion analysis method and device |
CN114332544A (en) * | 2022-03-14 | 2022-04-12 | 之江实验室 | Image block scoring-based fine-grained image classification method and device |
CN114332544B (en) * | 2022-03-14 | 2022-06-07 | 之江实验室 | Image block scoring-based fine-grained image classification method and device |
CN114896365A (en) * | 2022-04-27 | 2022-08-12 | 马上消费金融股份有限公司 | Model training method, emotional tendency prediction method and device |
CN114896987A (en) * | 2022-06-24 | 2022-08-12 | 浙江君同智能科技有限责任公司 | Fine-grained emotion analysis method and device based on semi-supervised pre-training model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20200626 |