CN111339260A - BERT and QA thought-based fine-grained emotion analysis method - Google Patents
- Publication number
- CN111339260A CN111339260A CN202010136542.7A CN202010136542A CN111339260A CN 111339260 A CN111339260 A CN 111339260A CN 202010136542 A CN202010136542 A CN 202010136542A CN 111339260 A CN111339260 A CN 111339260A
- Authority
- CN
- China
- Prior art keywords
- sentence
- auxiliary
- sequence
- bert
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06F16/3344 — Query execution using natural language analysis
- G06F16/35 — Information retrieval of unstructured textual data; Clustering; Classification
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
- G06N3/045 — Neural network architectures; Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
Abstract
The invention relates to the field of text emotion analysis in natural language processing, in particular to a fine-grained emotion analysis method based on BERT and the QA (question-answering) idea. By adding auxiliary sentences to the dataset, the original single-sentence task is changed into a sentence-pair task, exploiting the fact that the BERT model performs better on sentence-pair tasks. At the same time, the added auxiliary sentences turn BERT's downstream task into a question-answering task: the emotion polarity corresponding to each extracted aspect is found from the auxiliary sentence, so the original two-step subtasks can be completed in one step. By using the BERT model, turning the downstream task into a QA task through constructed auxiliary sentences, and fine-tuning the model, the invention improves performance on the two subtasks of aspect extraction and emotion classification, can extract multiple aspect words, improves model efficiency, and reduces redundant workload.
Description
Technical Field
The invention relates to the field of text emotion analysis in natural language processing, in particular to a fine-grained emotion analysis method based on BERT and QA ideas.
Background
At present, text comments on the Internet (including social comments, news comments, commodity evaluations and the like) have both academic and commercial value. Sentiment analysis of such comments can identify a user's emotional attitude toward an event, a commodity and so on. For example, social platforms and news portals can use the analyzed information for targeted marketing and content pushing; an e-commerce platform can use emotion analysis to obtain consumer evaluations of a commodity's various attributes, saving other consumers' browsing time and allowing commodity evaluations to be displayed as refined labels rather than as simple good/bad ratings.
Currently, emotion analysis can be divided into coarse-grained and fine-grained types according to the granularity of the objects a task attends to: coarse-grained tasks attend to the document and sentence levels, while fine-grained tasks attend to the aspect level, where an aspect may be one word or several words. Coarse-grained emotion analysis assigns a single emotion polarity to a whole document or sentence, which is a serious shortcoming: one sentence may contain different emotion polarities toward multiple attributes, so coarse-grained analysis can neither find out which attributes or aspects of a commodity or news item the user is commenting on, nor determine the emotion expressed toward a specific attribute, and it is difficult to cover the user's emotion expression comprehensively. Fine-grained emotion analysis therefore has great research value.
The fine-grained emotion analysis task has two subtasks: aspect extraction and aspect-level emotion classification. In the prior art, the two subtasks are mainly performed and improved separately. Existing aspect extraction techniques mainly comprise unsupervised, semi-supervised and supervised learning, among which deep learning models in supervised learning, such as LSTM, CRF and BERT, achieve excellent results. Existing emotion classification techniques mainly comprise emotion-dictionary-based, machine-learning-based and deep-learning-based methods. Deep learning methods outperform the others in both aspect extraction and emotion classification.
However, current fine-grained emotion analysis has two main problems. First, deep learning models such as CNN, RNN and LSTM need to be pre-trained on large text datasets, which takes a long time. Second, current fine-grained emotion analysis is mainly performed step by step: after the aspect extraction subtask is completed, the emotion polarity of the corresponding aspect is judged again for each sentence, so efficiency is low.
Disclosure of Invention
The invention aims to provide a fine-grained emotion analysis method based on BERT and the QA idea, so as to solve the problems described in the background art.
To achieve this purpose, the invention provides the following technical scheme: a fine-grained emotion analysis method based on BERT and the QA idea, comprising the following steps:
Step one: select the SemEval-2014 dataset as the corpus and add an auxiliary sentence to every piece of text data;
Step two: segment the processed text data, tokenize the sentence with a 30,552-word vocabulary, prepend a [CLS] token to the sentence, and insert an [SEP] token between the auxiliary sentence and the original sentence to generate the input sequence X, specifically: [CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP];
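As an illustration of step two, the sentence-pair construction can be sketched as follows; a toy whitespace tokenizer stands in for the actual 30,552-word vocabulary, so the function name and tokenization are illustrative assumptions rather than the patented implementation:

```python
def build_input_sequence(original, auxiliary):
    # Splice the original and auxiliary sentences into a BERT-style pair:
    # [CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP]
    tokens = ["[CLS]"] + original.split() + ["[SEP]"] + auxiliary.split() + ["[SEP]"]
    first_sep = tokens.index("[SEP]")
    # Segment ids distinguish the two sentences: 0 up to and including the
    # first [SEP], 1 for the auxiliary-sentence span
    segment_ids = [0] * (first_sep + 1) + [1] * (len(tokens) - first_sep - 1)
    return tokens, segment_ids

tokens, segments = build_input_sequence(
    "I like the food but the service was so awful !",
    "positive or negative or neutral")
```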
Step three: vectorize the input sequence X, representing each word of the input text sequence by a pre-trained word feature vector, to obtain the word vector h_0 of the input text, expressed as:
h_0 ∈ R^((n+2)×h)
where n is the length of the input sentence and h is the size of the hidden layer;
Step four: take the word vector h_0 obtained in step three as the input of L stacked Transformer blocks, training to obtain word vectors h_i that fuse sentence semantics, expressed as:
h_i = Transformer(h_{i-1}), i ∈ [1, L];
step five: constructing a heuristic aspect extractor capable of extracting a plurality of aspects, which specifically comprises the following sub-steps:
step 5.1: word vector h based on fused sentence semanticsiIn the aspect of training data, a BERT model is trained by using a gradient descent method, and parameters are updated;
defining the probability p of the start and end positions of the corresponding aspect of each sentencesAnd peNamely:
gs=wshL,ps=softmax(gs)
ge=wehL,pe=softmax(ge)
wherein, ws∈Rh,we∈RhIs a weight vector to be trained, and softmax is a normalization function;
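A minimal NumPy sketch of the scoring step above; the random h_L stands in for BERT's final-layer output, so the shapes and values are illustrative assumptions only:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax normalization
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n, h = 6, 8                       # toy sequence length and hidden size
h_L = rng.normal(size=(n, h))     # stand-in for the final Transformer layer output
w_s = rng.normal(size=h)          # start-position weight vector (trainable)
w_e = rng.normal(size=h)          # end-position weight vector (trainable)

g_s, g_e = h_L @ w_s, h_L @ w_e   # per-token start/end scores
p_s, p_e = softmax(g_s), softmax(g_e)  # probability distributions over positions
```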
step 5.2: pre-marking the boundary of the target entity in the training data set to obtain a marking list T and obtain a starting vector ys∈R(n+2)And an end vector ye∈R(n+2)Wherein each elementIndicating whether the ith token is the start of a target aspect,indicating whether the ith token is the end of an aspect;
minimizing the sum of the negative logarithms of the probabilities of the real start position and the end position, and training a model by gradient descent:
step 5.3: redundancy invalidation occurs when multiple parties are extracted, for example:
original sentence: i like the food but the service was so awful!
Food, service in real aspect
And (3) prediction aspect: food, food but the service, service was so awful, service …
A heuristic multi-boundary algorithm is provided, and pseudo codes are as follows:
wherein, gsScore, g, representing the starting positioneScore representing the end position, γ is a hyperparameter, is a set minimum score threshold, and K is the maximum number in terms of a single sentence:
the algorithm mainly comprises the following steps:
a. initialize three sets R, U and O (line 1);
b. from the scores g_s and g_e computed with the trained weight vectors, select the index sets S and E of the top-M start and end positions by score, where s_i denotes the i-th start index in set S and e_j the j-th end index in set E (line 2);
c. when the end position is not smaller than the start position and g_s(s_i) + g_e(e_j) exceeds γ (lines 3-8), add the candidate boundary (s_i, e_j) to set R (lines 7-8) and add the heuristic regularization value u = g_s(s_i) + g_e(e_j) − (e_j − s_i + 1) to set U (lines 6 and 8);
d. remove redundant boundaries from set R with a non-maximum suppression algorithm: while R is not empty and the size of set O is smaller than K (lines 9-14), remove from R the boundary r_l corresponding to the current maximum value u_l in U and add r_l to set O (lines 10-11); when a boundary r_k in R overlaps r_l (overlap being checked with a fine-grained F1 measure), remove the corresponding boundaries and values from set R and set U (lines 12-14), i.e., remove redundant aspects from the candidates;
e. obtain the set O of start/end boundaries of the multiple aspects (line 15); the aspect words are then extracted and the procedure ends;
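Since the pseudo code itself is not reproduced in this text, the following is a hedged Python reconstruction of steps a-e; the function name, the toy score values, and the exact length penalty (e_j − s_i + 1) are assumptions consistent with the description of the heuristic regularization value:

```python
import numpy as np

def extract_aspects(g_s, g_e, M, gamma, K):
    # a. initialize the candidate set R, value set U and output set O
    R, U, O = [], [], []
    # b. top-M start and end position indices by score
    S = np.argsort(g_s)[::-1][:M]
    E = np.argsort(g_e)[::-1][:M]
    # c. keep candidates whose end is not before the start and whose summed
    #    score exceeds gamma; the heuristic regularization value is the
    #    score sum minus the aspect length
    for s_i in S:
        for e_j in E:
            if e_j >= s_i and g_s[s_i] + g_e[e_j] > gamma:
                R.append((int(s_i), int(e_j)))
                U.append(g_s[s_i] + g_e[e_j] - (e_j - s_i + 1))
    # d. non-maximum suppression: repeatedly move the highest-valued
    #    boundary to O and drop candidates overlapping it
    while R and len(O) < K:
        l = int(np.argmax(U))
        r_l = R[l]
        O.append(r_l)
        keep = [k for k, r in enumerate(R) if r[1] < r_l[0] or r[0] > r_l[1]]
        R = [R[k] for k in keep]
        U = [U[k] for k in keep]
    # e. O holds the non-overlapping start/end boundaries of the aspects
    return O

# toy scores for "I like the food but the service was so awful !"
# token index:   0   1   2  3(food)        6(service)       9(awful)
g_s = np.array([0., 0., 0., 3., 0., 0., 2.5, 0., 0., 0.4, 0.])
g_e = np.array([0., 0., 0., 3., 0., 0., 2.5, 0., 0., 0.5, 0.])
aspects = extract_aspects(g_s, g_e, M=3, gamma=1.0, K=2)
# → [(3, 3), (6, 6)], i.e., "food" and "service"; the longer candidates
#   ("food but the service", "service was so awful") are suppressed
```

Note how the length penalty in the regularization value makes the short, correct boundaries outrank the long redundant spans before NMS even runs.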
Step six: splice the extracted aspect-word feature vectors and the auxiliary-sentence feature vectors with a self-attention operation to obtain semantically fused auxiliary feature word vectors, denoted h'.
Step seven: predicting the emotion polarity of the corresponding aspect of the auxiliary feature word vector through a feedforward neural network, and the specific process is as follows:
step 7.1: the emotion polarity value is obtained by using the Tanh activation function between two linear transformations:
wherein, WpAnd Wh'Two parameter matrixes to be trained are provided;
the emotion polarity probability is normalized by a softmax function to a polarity value,
step 7.2: gradient training parameter matrix W using negative log probability that minimizes true emotion polarity prediction probabilityp,Wh'And a fine-tuning BERT model, expressed as:
wherein, ypFor true emotion polesThe linear one-hot represents a vector;
step 7.3: using formulas
After the BERT model is trained in the gradient descent, the emotion polarity probability of each auxiliary feature word vector h' is solved, wherein the emotion word corresponding to the value with the maximum probability is the emotion polarity of the corresponding aspect word.
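A minimal NumPy sketch of step seven's classifier head; the random h' and weight matrices are illustrative stand-ins for the trained parameters, and the polarity ordering is an assumption:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax normalization
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
hidden = 8
h_prime = rng.normal(size=hidden)        # fused auxiliary feature vector h'
W_h = rng.normal(size=(hidden, hidden))  # first linear transformation (trainable)
W_p = rng.normal(size=(3, hidden))       # second linear transformation (trainable)

g_p = W_p @ np.tanh(W_h @ h_prime)       # Tanh between the two linear transforms
p = softmax(g_p)                         # probabilities over the three polarities
polarity = ("positive", "negative", "neutral")[int(np.argmax(p))]
```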
Preferably, the dataset selected in step one comprises three datasets: LAPTOP, REST and TWITTER.
Preferably, the content of the auxiliary sentence in step one is: "positive or negative or neutral".
Preferably, the heuristic regularization value in the algorithm of step 5.3 represents the sum of the start-position and end-position scores minus the aspect length.
Preferably, the input sequence is formed by segmenting the corpus and the auxiliary sentences and splicing them with the predefined symbols [CLS] and [SEP]; the spliced sequence is "[CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP]", where [CLS] is the semantic symbol of the input text sequence and [SEP] is the separator between the question sequence and the passage sequence.
Compared with the prior art, the beneficial effects of the invention mainly comprise the following aspects:
The BERT model used by the invention was already pre-trained on large text corpora when Google released it; compared with models such as CNN, RNN and LSTM, this removes the pre-training step and its associated workload.
In the fine-grained emotion analysis task, compared with the step-by-step approach of first extracting target entities with one trained model and then judging the corresponding emotion polarity with another, the one-step method improves model training efficiency.
BERT performs better on sentence-pair tasks than on single-sentence tasks, and adding an auxiliary sentence exploits this characteristic.
Experiments show that the boundary-marking method outperforms sequence-labeling methods; embedding a heuristic extractor on top of boundary marking allows multiple target aspects to be output at once, i.e., one sentence may contain several aspects.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution: a fine-grained emotion analysis method based on BERT and QA ideas comprises the following steps:
Step one: select the SemEval-2014 dataset as the corpus and add an auxiliary sentence to every piece of text data;
Step two: segment the processed text data, tokenize the sentence with a 30,552-word vocabulary, prepend a [CLS] token to the sentence, and insert an [SEP] token between the auxiliary sentence and the original sentence to generate the input sequence X, specifically: [CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP];
Step three: vectorize the input sequence X, representing each word of the input text sequence by a pre-trained word feature vector, to obtain the word vector h_0 of the input text, expressed as:
h_0 ∈ R^((n+2)×h)
where n is the length of the input sentence and h is the size of the hidden layer;
Step four: take the word vector h_0 obtained in step three as the input of L stacked Transformer blocks, training to obtain word vectors h_i that fuse sentence semantics, expressed as:
h_i = Transformer(h_{i-1}), i ∈ [1, L];
step five: constructing a heuristic aspect extractor capable of extracting a plurality of aspects, which specifically comprises the following sub-steps:
step 5.1: word vector h based on fused sentence semanticsiIn the aspect of training data, a BERT model is trained by using a gradient descent method, and parameters are updated;
defining the probability p of the start and end positions of the corresponding aspect of each sentencesAnd peNamely:
gs=wshL,ps=softmax(gs)
ge=wehL,pe=softmax(ge)
wherein, ws∈Rh,we∈RhIs a weight vector to be trained, and softmax is a normalization function;
step 5.2: pre-marking the boundary of the target entity in the training data set to obtain a marking list T and obtain a starting vector ys∈R(n+2)And an end vector ye∈R(n+2)Wherein each elementIndicating whether the ith token is the start of a target aspect,indicating whether the ith token is the end of an aspect;
minimizing the sum of the negative logarithms of the probabilities of the real start position and the end position, and training a model by gradient descent:
step 5.3: redundancy invalidation occurs when multiple parties are extracted, for example:
original sentence: i like the food but the service was so awful!
Food, service in real aspect
And (3) prediction aspect: food, food but the service, service was so awful, service …
A heuristic multi-boundary algorithm is provided, and pseudo codes are as follows:
wherein, gsScore, g, representing the starting positioneScore representing the end position, γ is a hyperparameter, is a set minimum score threshold, and K is the maximum number in terms of a single sentence:
the algorithm mainly comprises the following steps:
a. initialize three sets R, U and O (line 1);
b. from the scores g_s and g_e computed with the trained weight vectors, select the index sets S and E of the top-M start and end positions by score, where s_i denotes the i-th start index in set S and e_j the j-th end index in set E (line 2);
c. when the end position is not smaller than the start position and g_s(s_i) + g_e(e_j) exceeds γ (lines 3-8), add the candidate boundary (s_i, e_j) to set R (lines 7-8) and add the heuristic regularization value u = g_s(s_i) + g_e(e_j) − (e_j − s_i + 1) to set U (lines 6 and 8);
d. remove redundant boundaries from set R with a non-maximum suppression algorithm: while R is not empty and the size of set O is smaller than K (lines 9-14), remove from R the boundary r_l corresponding to the current maximum value u_l in U and add r_l to set O (lines 10-11); when a boundary r_k in R overlaps r_l (overlap being checked with a fine-grained F1 measure), remove the corresponding boundaries and values from set R and set U (lines 12-14), i.e., remove redundant aspects from the candidates;
e. obtain the set O of start/end boundaries of the multiple aspects (line 15); the aspect words are then extracted and the procedure ends;
Step six: splice the extracted aspect-word feature vectors and the auxiliary-sentence feature vectors with a self-attention operation to obtain semantically fused auxiliary feature word vectors, denoted h'.
Step seven: predicting the emotion polarity of the corresponding aspect of the auxiliary feature word vector through a feedforward neural network, and the specific process is as follows:
step 7.1: the emotion polarity value is obtained by using the Tanh activation function between two linear transformations:
wherein, WpAnd Wh'Two parameter matrixes to be trained are provided;
the emotion polarity probability is normalized by a softmax function to a polarity value,
step 7.2: gradient training parameter matrix W using negative log probability that minimizes true emotion polarity prediction probabilityp,Wh'And a fine-tuning BERT model, expressed as:
wherein, ypOne-hot representing vector for true emotion polarity;
step 7.3: using formulas
After the BERT model is trained in the gradient descent, the emotion polarity probability of each auxiliary feature word vector h' is solved, wherein the emotion word corresponding to the value with the maximum probability is the emotion polarity of the corresponding aspect word.
Further, the dataset selected in step one comprises three datasets: LAPTOP, REST and TWITTER.
Further, the content of the auxiliary sentence in step one is: "positive or negative or neutral".
Further, the heuristic regularization value in the algorithm of step 5.3 represents the sum of the start-position and end-position scores minus the aspect length.
Further, the input sequence is formed by segmenting the corpus and the auxiliary sentences and splicing them with the predefined symbols [CLS] and [SEP]; the spliced sequence is "[CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP]", where [CLS] is the semantic symbol of the input text sequence and [SEP] is the separator between the question sequence and the passage sequence.
In the experiments, the same training set and test set are used under identical conditions to compare the experimental effects on the two tasks of entity extraction and emotion classification, respectively.
The F1 value is used as the evaluation index for the entity extraction task, and Accuracy is used as the index for the emotion analysis task.
Here TP means the model predicts true and the ground truth is true, i.e., a correct acceptance;
FP means the model predicts true but the ground truth is false, i.e., a false acceptance;
TN means the model predicts false and the ground truth is false, i.e., a correct rejection;
FN means the model predicts false but the ground truth is true, i.e., a false rejection.
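From these four counts, the two evaluation indices follow directly; a small sketch with assumed example counts (the counts are hypothetical, for illustration only):

```python
def f1_score(tp, fp, fn):
    # F1 is the harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(tp, fp, tn, fn):
    # fraction of all predictions that are correct
    return (tp + tn) / (tp + fp + tn + fn)

# hypothetical counts for illustration only
f1 = f1_score(tp=8, fp=2, fn=2)          # precision = recall = 0.8, so F1 = 0.8
acc = accuracy(tp=8, fp=2, tn=88, fn=2)  # 96 correct out of 100 = 0.96
```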
On the entity extraction task, compared with the BERT+CRF model and DE-CNN (currently the best-performing entity extraction model), the F1 values are as follows:
On the LAPTOP dataset and the TWITTER dataset, the model proposed by the invention performs best among the three models.
On the emotion classification task, compared with MGAN and TNet (currently the best-performing emotion classification models), the Accuracy values are as follows:

| Model | LAPTOP | REST | TWITTER |
| --- | --- | --- | --- |
| MGAN | 75.39 | - | - |
| TNet | 76.54 | - | - |
| Model of the invention | 83.37 | 88.43 | 76.27 |
The model proposed by the invention works best among the three models on the LAPTOP dataset, the REST dataset and the TWITTER dataset.
In summary, by using the BERT model, turning the downstream task into a QA task through constructed auxiliary sentences, and fine-tuning the model, the invention improves performance on the two subtasks of aspect extraction and emotion classification, can extract multiple aspect words, improves model efficiency, and reduces redundant workload.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (5)
1. A fine-grained emotion analysis method based on BERT and the QA idea, characterized by comprising the following steps:
Step one: select the SemEval-2014 dataset as the corpus and add an auxiliary sentence to every piece of text data;
Step two: segment the processed text data, tokenize the sentence with a 30,552-word vocabulary, prepend a [CLS] token to the sentence, and insert an [SEP] token between the auxiliary sentence and the original sentence to generate the input sequence X, specifically: [CLS] original-sentence sequence [SEP] auxiliary-sentence sequence [SEP];
Step three: vectorize the input sequence X, representing each word of the input text sequence by a pre-trained word feature vector, to obtain the word vector h_0 of the input text, expressed as:
h_0 ∈ R^((n+2)×h)
where n is the length of the input sentence and h is the size of the hidden layer;
Step four: take the word vector h_0 obtained in step three as the input of L stacked Transformer blocks, training to obtain word vectors h_i that fuse sentence semantics, expressed as:
h_i = Transformer(h_{i-1}), i ∈ [1, L];
step five: constructing a heuristic aspect extractor capable of extracting a plurality of aspects, which specifically comprises the following sub-steps:
step 5.1: word vector h based on fused sentence semanticsiIn the aspect of training data, a BERT model is trained by using a gradient descent method, and parameters are updated;
defining the probability p of the start and end positions of the corresponding aspect of each sentencesAnd peNamely:
gs=wshL,ps=softmax(gs)
ge=wehL,pe=softmax(ge)
wherein, ws∈Rh,we∈RhIs a weight vector to be trained, and softmax is a normalization function;
step 5.2: pre-marking the boundary of the target entity in the training data set to obtain a marking list T and obtain a starting vector ys∈R(n+2)And an end vector ye∈R(n+2)Each of whichElement(s)Indicating whether the ith token is the start of a target aspect,indicating whether the ith token is the end of an aspect;
minimizing the sum of the negative logarithms of the probabilities of the real start position and the end position, and training a model by gradient descent:
step 5.3: redundancy invalidation occurs when multiple parties are extracted, for example:
original sentence: i like the food but the service was so awful!
Food, service in real aspect
And (3) prediction aspect: food, food but the service, service was so awful, service …
A heuristic multi-boundary algorithm is provided, and pseudo codes are as follows:
wherein, gsScore, g, representing the starting positioneScore representing the end position, γ is a hyperparameter, is a set minimum score threshold, and K is the maximum number in terms of a single sentence:
the algorithm mainly comprises the following steps:
a, initializing three sets R, U and O (line 1);
b, g obtained according to the trained weight vectors,geIn the method, the first M position index sets S and E with high scores are selected, wherein S isiPresentation setSubscript of ith start position in S, ejThe subscript indicating the jth end position in set E (line 2);
c at the end position not less than the start position andadd the values of(s) over γ (lines 3-8), the candidate boundary(s)i,ej) Add to set R (lines 7-8) and apply heuristic regularizationAdd to set U (line 6, line 8);
d, removing redundant boundaries in the set R by using a non-maximum suppression algorithm, namely removing the maximum value U in the set U at the moment from the set R when the set R is not an empty set and the size of the set O is smaller than the value K (lines 9-14)iCorresponding boundary rlAnd combining the boundary rlAdd to set O (lines 10-11); when R is in the set RlOverlapping rkThen, i.e., checking whether there is overlap using a fine-grained F1 numerical measure, the corresponding bounds and values are removed from the set R and the set U (lines 12-14), i.e., redundant aspects are removed from the candidate aspects;
e. Obtain the boundary set O of start and end positions corresponding to the multiple aspects (line 15); extraction of the aspect words is then complete;
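Steps a-e can be sketched as follows. This is a non-authoritative sketch: the exact heuristic score added to U is not given in the text, so it is assumed here to be the summed boundary scores minus the span length (a common span-extraction heuristic), and the overlap check is simplified to token-index intersection rather than the F1 measure:

```python
import numpy as np

def extract_aspect_spans(g_s, g_e, M=5, gamma=1.0, K=3):
    """Heuristic multi-boundary selection with non-maximum suppression.
    g_s / g_e are per-token start / end scores; gamma is the minimum
    score threshold; K is the maximum number of aspects per sentence."""
    R, U, O = [], [], []                       # (a) line 1
    S = np.argsort(g_s)[::-1][:M]              # (b) top-M start indices, line 2
    E = np.argsort(g_e)[::-1][:M]              # (b) top-M end indices, line 2
    for si in S:                               # (c) lines 3-8
        for ej in E:
            if ej >= si and g_s[si] + g_e[ej] > gamma:
                R.append((si, ej))             # candidate boundary
                # assumed heuristic score: boundary scores minus span length
                U.append(g_s[si] + g_e[ej] - (ej - si + 1))

    def overlaps(a, b):                        # simplified token-overlap check
        return not (a[1] < b[0] or b[1] < a[0])

    while R and len(O) < K:                    # (d) lines 9-14
        l = int(np.argmax(U))                  # best remaining candidate
        r_l = R[l]
        O.append(r_l)                          # lines 10-11
        keep = [k for k in range(len(R)) if k != l and not overlaps(R[k], r_l)]
        R = [R[k] for k in keep]               # lines 12-14
        U = [U[k] for k in keep]
    return O                                   # (e) line 15
```

On the example sentence above, with strong start/end scores at "food" and "service", the suppression step discards overlapping candidates such as "food but the service" and keeps the two true aspects.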
step six: and splicing the extracted feature word vectors of the aspect words and the feature word vectors of the auxiliary sentences by utilizing self-attention operation to obtain auxiliary feature word vectors with semantic fusion, and recording the auxiliary feature word vectors as h'.
Step seven: Predict the emotion polarity of the corresponding aspect from the auxiliary feature vector through a feedforward neural network; the specific process is as follows:
Step 7.1: The emotion polarity value is obtained by applying the Tanh activation function between two linear transformations:
where W_p and W_h' are two parameter matrices to be trained;
the polarity value is then normalized into an emotion polarity probability by a softmax function.
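The equations of this step appear only as images in the original filing; from the surrounding description (Tanh between two linear transformations, followed by softmax normalization), the polarity score plausibly takes a form such as:

```latex
g_p = W_p \tanh\!\left(W_{h'}\, h'\right), \qquad
P(\text{polarity} = c \mid h') = \operatorname{softmax}(g_p)_c
```

Here W_p and W_{h'} are the two trainable parameter matrices named in the claim; the exact composition is an assumption.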
Step 7.2: Train the parameter matrices W_p and W_h', and fine-tune the BERT model, by gradient descent that minimizes the negative log probability of the true emotion polarity prediction, expressed as:
where y_p is the one-hot vector of the true emotion polarity;
Step 7.3: After the BERT model has been trained by gradient descent using the above formula, compute the emotion polarity probability for each auxiliary feature vector h'; the emotion word with the highest probability is the emotion polarity of the corresponding aspect word.
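Step seven as a whole can be sketched as below; the shapes, the identity of the two linear maps, and the label order are illustrative assumptions (the three labels follow claim 3):

```python
import numpy as np

POLARITIES = ["positive", "negative", "neutral"]  # per claim 3

def predict_polarity(h_prime, W_h, W_p):
    """Feed-forward polarity prediction: Tanh between two linear
    transformations, softmax normalization, then argmax over the
    three polarity labels."""
    g = W_p @ np.tanh(W_h @ h_prime)   # polarity scores, one per label
    e = np.exp(g - g.max())            # numerically stable softmax
    p = e / e.sum()
    return POLARITIES[int(np.argmax(p))], p
```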
2. The fine-grained emotion analysis method based on BERT and QA ideas of claim 1, wherein the data set selected in step one comprises three data sets: LAPTOP, REST, and TWITTER.
3. The fine-grained emotion analysis method based on BERT and QA ideas of claim 1, wherein: the content of the auxiliary sentence in the first step is as follows: "positive or negative or neutral".
5. The fine-grained emotion analysis method based on BERT and QA ideas of claim 1, wherein the input sequence is formed by splicing the segmented corpus and auxiliary sentence with the predefined symbols [CLS] and [SEP]; the spliced sequence is "[CLS] original sentence sequence [SEP] auxiliary sentence sequence [SEP]", where [CLS] is the semantic symbol of the input text sequence and [SEP] is the separator between the question sequence and the text segment sequence.
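The splicing rule of this claim can be sketched as follows; a real implementation would apply the BERT WordPiece tokenizer first, and the function name here is illustrative:

```python
def build_input_sequence(sentence_tokens, auxiliary_tokens):
    """Splice the tokenized original sentence and auxiliary sentence
    with the predefined [CLS] and [SEP] symbols, QA-style:
    [CLS] original sentence [SEP] auxiliary sentence [SEP]."""
    return ["[CLS]"] + sentence_tokens + ["[SEP]"] + auxiliary_tokens + ["[SEP]"]
```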
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010136542.7A CN111339260A (en) | 2020-03-02 | 2020-03-02 | BERT and QA thought-based fine-grained emotion analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111339260A true CN111339260A (en) | 2020-06-26 |
Family
ID=71184644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010136542.7A Withdrawn CN111339260A (en) | 2020-03-02 | 2020-03-02 | BERT and QA thought-based fine-grained emotion analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111339260A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111966827A (en) * | 2020-07-24 | 2020-11-20 | 大连理工大学 | Conversation emotion analysis method based on heterogeneous bipartite graph |
CN111966827B (en) * | 2020-07-24 | 2024-06-11 | 大连理工大学 | Dialogue emotion analysis method based on heterogeneous bipartite graph |
CN112860889A (en) * | 2021-01-29 | 2021-05-28 | 太原理工大学 | BERT-based multi-label classification method |
CN113204616A (en) * | 2021-04-30 | 2021-08-03 | 北京百度网讯科技有限公司 | Method and device for training text extraction model and extracting text |
CN113204616B (en) * | 2021-04-30 | 2023-11-24 | 北京百度网讯科技有限公司 | Training of text extraction model and text extraction method and device |
CN113377910A (en) * | 2021-06-09 | 2021-09-10 | 平安科技(深圳)有限公司 | Emotion evaluation method and device, electronic equipment and storage medium |
CN113901171A (en) * | 2021-09-06 | 2022-01-07 | 特赞(上海)信息科技有限公司 | Semantic emotion analysis method and device |
CN114332544A (en) * | 2022-03-14 | 2022-04-12 | 之江实验室 | Image block scoring-based fine-grained image classification method and device |
CN114332544B (en) * | 2022-03-14 | 2022-06-07 | 之江实验室 | Image block scoring-based fine-grained image classification method and device |
CN114896365A (en) * | 2022-04-27 | 2022-08-12 | 马上消费金融股份有限公司 | Model training method, emotional tendency prediction method and device |
CN114896987A (en) * | 2022-06-24 | 2022-08-12 | 浙江君同智能科技有限责任公司 | Fine-grained emotion analysis method and device based on semi-supervised pre-training model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20200626 |