CN115048936A - Method for extracting aspect-level emotion triple fused with part-of-speech information - Google Patents

Method for extracting aspect-level emotion triple fused with part-of-speech information

Info

Publication number
CN115048936A
CN115048936A
Authority
CN
China
Prior art keywords
words, word, emotion, viewpoint, speech
Prior art date
Legal status
Pending
Application number
CN202210633972.9A
Other languages
Chinese (zh)
Inventor
相艳
柳如熙
陆婷
郭军军
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202210633972.9A
Publication of CN115048936A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/253 Grammatical analysis; Style critique
    • G06F 40/30 Semantic analysis


Abstract

The invention relates to an aspect-level emotion triple extraction method that fuses part-of-speech information, and belongs to the field of sentiment analysis. The method fuses part-of-speech information at the model input stage, deepening the model's understanding of the semantics of the data; it then completes the aspect-level emotion triple extraction task efficiently through multi-task learning, in which two annotators extract the aspect words and viewpoint words while a biaffine emotion dependency parser fully learns the interaction between aspect words and viewpoint words to judge the emotion polarity corresponding to each aspect word; finally, the triples are decoded to obtain their span representations. Experiments on four public SemEval benchmark data sets show that the method outperforms the other baseline models on the aspect-level emotion triple extraction task, verifying that fusing part-of-speech information indeed helps interactive modeling and ultimately improves model performance.

Description

Method for extracting aspect-level emotion triple fused with part-of-speech information
Technical Field
The invention relates to a method for extracting aspect-level emotion triples that fuses part-of-speech information, and belongs to the technical field of sentiment analysis.
Background
Aspect-level emotion triple extraction (ASTE) is a recently proposed ABSA subtask that aims to extract the emotion triples (aspect word, viewpoint word, emotion polarity) in an input sentence. To address this task, Peng et al. proposed a two-stage pipeline: the first stage uses two sequence annotators, one to jointly extract aspect words and emotion polarities and the other to extract viewpoint words; the second stage uses the positional information between aspect words and viewpoint words to link them and determine their emotion polarity pairing, thereby completing emotion triple extraction. Xu et al. adopted a novel position-aware tagging scheme that encodes the structural information of triples, connecting the three elements through tag semantics. Considering that aspect words and viewpoint words may consist of multiple words, Xu et al. proposed a span-based ASTE model (Span-ASTE) that, for the first time, directly captures span-to-span interactions when predicting the emotion relationship of aspect word-viewpoint word pairs. These methods all achieve considerable results on the ASTE task, but they share the following shortcoming: the feature representation contains no part-of-speech information, and part-of-speech information is not used to guide the interactive modeling of aspect words, viewpoint words and emotion polarities, so the model cannot deeply understand the semantic relations among the three and remains at shallow learning.
To address these problems, the invention provides POS-OTE-MTL, a multi-task learning framework for emotion triple extraction that incorporates part-of-speech information. On four public SemEval ABSA benchmark data sets, the method is compared with a series of baseline methods, and the experimental results demonstrate its effectiveness on the emotion triple extraction task.
Disclosure of Invention
The invention provides a method for extracting aspect-level emotion triples that fuses part-of-speech information, addressing two gaps in prior related work: the feature representation stage contains no part-of-speech information, and part-of-speech information is not used to guide the interactive modeling of aspect words, viewpoint words and emotion polarities.
The technical scheme of the invention is as follows: the method for extracting aspect-level emotion triples fused with part-of-speech information comprises the following specific steps:
Step 1, fusing the text-based word vector representation with the corresponding part-of-speech vector representation;
Step 2, acquiring the fused feature representation containing part-of-speech information;
Step 3, feeding the fused feature representation into the multi-task learning network to predict triples, i.e., completing aspect word extraction, viewpoint word extraction and emotion polarity analysis in parallel, guided by part-of-speech information;
Step 4, decoding the triples and generating the triple span representation according to a heuristic rule.
As a further scheme of the invention, in Step 1, each word in the input sentence is mapped into a low-dimensional vector space using the pre-trained GloVe model, giving the word vector representation $e_i \in \mathbb{R}^d$, where $d$ is the dimension of the word vector. Meanwhile, at the input stage of the model, part-of-speech information is explicitly embedded into the word vectors to generate a text representation fusing syntax and semantics, so that the triple extraction task benefits from it. Specifically, for an input sequence $X = \{w_1, w_2, \ldots, w_N\}$, a part-of-speech tag is first assigned to each word using the NLTK part-of-speech tagger pos_tag (example tags are shown in Table 1); each part-of-speech tag is then assigned a part-of-speech vector $p_i \in \mathbb{R}^d$, where $d$, the dimension of the part-of-speech vector, is kept consistent with the dimension of the word vector; finally, the part-of-speech vector $p_i$ of each word is concatenated with its word vector $e_i$ to obtain the fused feature representation $f_i$, as shown in formula (1):

$$ f_i = e_i \oplus p_i \qquad (1) $$

where $f_i \in \mathbb{R}^{2d}$ and $\oplus$ denotes the concatenation operation of the vectors.
TABLE 1. Part-of-speech tagging example (the table itself appears only as an image in the original document).
Syntax is a basic feature of a sentence, and part of speech is a basic element of syntax. Part of speech constrains the ordering of words in a sentence; for example, a verb follows a noun to express an action, and an adjective accompanies a noun to modify it. To solve the problem that prior emotion triple extraction work does not introduce part-of-speech information, so that model performance is limited by the lack of deep semantics, the invention explicitly embeds part-of-speech information into the word vectors to generate the fused feature vector representation, deepening the model's understanding of the semantics of the text. Fusing part-of-speech information not only exploits the data properties that most aspect words in comment texts are nouns and most viewpoint words are adjectives, which directly benefits aspect-level emotion triple extraction; it also guides the modeling with part-of-speech information, fully considering both the semantic and the part-of-speech information of the features, and thereby indirectly assists the completion of the emotion triple task.
The part-of-speech tagger pos_tag in the NLTK toolkit is essentially a part-of-speech classification neural network model trained on a rich corpus. Its advantages are: (1) its input is consistent with that of the emotion triple task, taking the sentence as the unit; (2) it offers a variety of taggers, including a maximum-entropy tagger, a regular-expression tagger, a unigram tagger and n-gram taggers, which can be combined using the backoff technique; (3) its universal part-of-speech tag set comprises 36 common tags covering nouns, verbs, adjectives, adverbs and so on; (4) it is based on an object-oriented scripting language (Python), supports rapid prototyping, and is easy to learn and use. In view of the above, the invention automatically labels the text using the pos_tag part-of-speech tagger, obtaining the part-of-speech tag corresponding to each word.
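To make Step 1 concrete, the following is a minimal sketch, with illustrative variable names rather than the published code, of tagging a sentence with NLTK's pos_tag and concatenating word and part-of-speech embeddings as in formula (1); randomly initialized embedding tables stand in for the GloVe initialization:

```python
# Sketch of Step 1 (hypothetical names): POS-tag the sentence, embed words
# and tags separately, and concatenate the two vectors as in formula (1).
import nltk
import torch
import torch.nn as nn

nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = ["Great", "battery", ",", "start", "up", "speed"]
tagged = nltk.pos_tag(sentence)          # [('Great', 'JJ'), ('battery', 'NN'), ...]

# Toy vocabularies; in the paper the word table would be initialized from GloVe.
word2id = {w: i for i, w in enumerate(sentence)}
tag2id = {t: i for i, t in enumerate(sorted({t for _, t in tagged}))}

d = 300                                   # shared dimension of word and POS vectors
word_emb = nn.Embedding(len(word2id), d)  # e_i
pos_emb = nn.Embedding(len(tag2id), d)    # p_i

word_ids = torch.tensor([word2id[w] for w, _ in tagged])
tag_ids = torch.tensor([tag2id[t] for _, t in tagged])

# f_i = e_i (+) p_i : fused feature of dimension 2d, formula (1)
f = torch.cat([word_emb(word_ids), pos_emb(tag_ids)], dim=-1)
print(f.shape)                            # torch.Size([6, 600])
```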
As a further embodiment of the invention, in Step 2, after the fused feature vector representation $f_i$ is obtained, a Bi-LSTM network is used to obtain a representation containing context information, $h_i \in \mathbb{R}^{2d_h}$, as follows:

$$ h_i = \overrightarrow{\mathrm{LSTM}}(f_i) \oplus \overleftarrow{\mathrm{LSTM}}(f_i) \qquad (2) $$

where $d_h$ is the hidden-layer dimension of a unidirectional LSTM network, $\overrightarrow{\mathrm{LSTM}}$ and $\overleftarrow{\mathrm{LSTM}}$ denote the forward and backward LSTM respectively, and $\oplus$ denotes the concatenation operation of the vectors.
then, through a linear layer and a non-linear transformation, the characteristic representation of the aspect words and the viewpoint words is obtained while the dimension reduction is realized; here, there are two reasons why the hidden layer vector is not directly input into the next stage of the model: firstly, the characteristics of the hidden layer state comprise calculation redundant information and have the risk of overfitting; second, the operation may cull features that are unrelated to aspect words and viewpoint words. The specific calculation process is as follows:
Figure BDA0003681215670000035
Figure BDA0003681215670000036
wherein r is i ap ,
Figure BDA0003681215670000037
Feature vector representations of the aspect words and the point of view words, respectively, d r Is the vector dimension.
Figure BDA0003681215670000038
And
Figure BDA0003681215670000039
respectively, learnable weights and biases. Furthermore, g (-) is a non-linear function ReLU (-) i.e. max (-) 0.
It should be noted that the above vector representations are all in preparation for the extraction of triples. A group of feature expressions r for emotion polarity analysis can be obtained in the same way as the expressions (3) and (4) i ap′ ,
Figure BDA00036812156700000310
The calculation process is consistent with the expressions (3) and (4), and only the learning parameters are different.
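The encoding stage of Step 2 can be sketched as follows, under the same assumptions as above: a Bi-LSTM over the fused features, then one linear-plus-ReLU projection per role, producing the four low-dimensional representations $r^{ap}$, $r^{op}$, $r^{ap\prime}$, $r^{op\prime}$ of formulas (3) and (4) and their primed variants:

```python
# Sketch of Step 2: Bi-LSTM context encoding followed by per-role
# dimension-reducing projections (illustrative module names).
import torch
import torch.nn as nn

d, d_h, d_r = 300, 600, 100              # dimensions reported in the experiments

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.bilstm = nn.LSTM(2 * d, d_h, batch_first=True, bidirectional=True)
        # One projection per role: formulas (3)-(4) plus the primed variants.
        self.proj = nn.ModuleDict({
            name: nn.Linear(2 * d_h, d_r)
            for name in ("ap", "op", "ap_prime", "op_prime")
        })

    def forward(self, f):                 # f: (batch, seq_len, 2d)
        h, _ = self.bilstm(f)             # h_i: (batch, seq_len, 2*d_h)
        return {name: torch.relu(layer(h)) for name, layer in self.proj.items()}

reps = Encoder()(torch.randn(1, 6, 2 * d))
print(reps["ap"].shape)                   # torch.Size([1, 6, 100])
```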
As a further aspect of the invention, in Step 3, the multi-task learning network architecture comprises two parts: aspect word and viewpoint word extraction, and word-level emotion dependency analysis.
When extracting aspect words and viewpoint words, the network model adopts a {B, I, O} tagging scheme, where B marks the start position of an aspect word or viewpoint word, I marks an inside position, and O marks everything else. The feature vector representations of the aspect words and viewpoint words are fed into the multi-task learning network model; the probability distribution predicted for each word in the sentence being an aspect word is shown in formula (5), and that for viewpoint words in formula (6):

$$ \hat{y}_i^{ap} = \mathrm{softmax}(W_t^{ap} r_i^{ap} + b_t^{ap}) \qquad (5) $$

$$ \hat{y}_i^{op} = \mathrm{softmax}(W_t^{op} r_i^{op} + b_t^{op}) \qquad (6) $$

where $W_t^{ap}, W_t^{op}$ are weight matrices and $b_t^{ap}, b_t^{op}$ are biases; all are training parameters.
When extracting aspect words and viewpoint words, the model is trained jointly with a cross-entropy loss; the loss function consists of two parts, the aspect word extraction loss and the viewpoint word extraction loss:

$$ \mathcal{L}_{\mathrm{tag}} = -\sum_{s=1}^{S} \sum_{k} \left( y_k^{ap} \log \hat{y}_k^{ap} + y_k^{op} \log \hat{y}_k^{op} \right) \qquad (7) $$

where $S$ is the total number of samples, $k$ iterates over the classes, $y^{ap}, y^{op}$ denote the true label probability distributions of the aspect words and viewpoint words respectively, and $\hat{y}^{ap}, \hat{y}^{op}$ denote the corresponding predicted label distributions.
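A sketch of the two {B, I, O} taggers and their joint cross-entropy loss, formula (7); the tag indices and tensor shapes are illustrative, not taken from the published code:

```python
# Sketch of the two BIO annotators and the joint tagging loss, formula (7).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_r, n_tags = 100, 3                      # tags: B=0, I=1, O=2

ap_head = nn.Linear(d_r, n_tags)          # aspect-word annotator, formula (5)
op_head = nn.Linear(d_r, n_tags)          # viewpoint-word annotator, formula (6)

r_ap = torch.randn(1, 6, d_r)             # r_i^{ap} from Step 2
r_op = torch.randn(1, 6, d_r)             # r_i^{op} from Step 2

gold_ap = torch.tensor([[2, 2, 2, 0, 1, 1]])   # O O O B I I
gold_op = torch.tensor([[0, 2, 2, 2, 2, 2]])   # B O O O O O

# Cross entropy over every word position for both taggers.
loss_tag = (F.cross_entropy(ap_head(r_ap).transpose(1, 2), gold_ap)
            + F.cross_entropy(op_head(r_op).transpose(1, 2), gold_op))
```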
For a sentence of length N there are N² freely combined word pairs (including self-pairings), and the model of the invention analyzes the emotion dependency of every word pair. The emotion dependency category set is defined as {NEU, NEG, POS, NO-DEP}, where NEU, NEG and POS denote neutral, negative and positive respectively, and NO-DEP denotes no emotion dependency relation. Here, only the emotion dependency of one word pair is considered, namely the last word of the aspect word and the last word of the viewpoint word. For example, in the sentence "Great battery, start up speed", one triple is (start up speed, Great, POS); during emotion dependency parsing this triple is simplified to (speed, Great, POS), and the parsing result is shown in Fig. 2. In this way the learning redundancy of the parser is greatly reduced, while the word-level emotion dependency remains available when combined with the extracted aspect word span and viewpoint word span.
In the word-level emotion dependency analysis, a biaffine scorer, which performs well in syntactic dependency parsing, is used to capture the interaction between the two words of each word pair. The emotion dependency score of each word pair is computed as follows:

$$ s_{i,j,k} = {r_i^{ap\prime}}^{\top} W_k\, r_j^{op\prime} + b_k^{\top} r_j^{op\prime} \qquad (8) $$

where $s_{i,j,k}$ denotes the score of the word pair $(w_i, w_j)$ belonging to emotion dependency category $k$, $W_k$ and $b_k$ are a trainable parameter matrix and bias respectively, and $r_i^{ap\prime}, r_j^{op\prime}$ are the feature representations of the aspect word and viewpoint word computed in Step 2. Furthermore, $S_{i,j,k}$ denotes the score obtained by normalizing $s_{i,j,k}$ with softmax:

$$ S_{i,j,k} = \frac{\exp(s_{i,j,k})}{\sum_{k'} \exp(s_{i,j,k'})} \qquad (9) $$

As the factorization in formula (8) shows, conceptually the biaffine scorer not only computes the probability that $w_i$, receiving $w_j$, belongs to a particular dependency category (first term), but also uses the probability of $w_j$ itself belonging to that category as a prior (second term). In essence, the biaffine scorer is one biaffine transformation followed by one matrix multiplication.
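A minimal biaffine scorer in the shape formula (8) describes, assuming the bilinear-plus-prior form reconstructed above: a bilinear term over the (aspect, viewpoint) pair plus a linear prior on the viewpoint word, then a softmax over the four dependency classes as in formula (9):

```python
# Sketch of the biaffine emotion dependency scorer, formulas (8)-(9).
import torch
import torch.nn as nn

d_r, n_cls = 100, 4                       # classes: NEU, NEG, POS, NO-DEP

class Biaffine(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_cls, d_r, d_r) * 0.01)  # W_k
        self.b = nn.Parameter(torch.zeros(n_cls, d_r))              # b_k

    def forward(self, r_ap, r_op):        # (seq, d_r) each
        # s[i, j, k] = r_ap[i]^T W_k r_op[j] + b_k^T r_op[j]
        bilinear = torch.einsum("id,kde,je->ijk", r_ap, self.W, r_op)
        prior = torch.einsum("kd,jd->jk", self.b, r_op).unsqueeze(0)
        return torch.softmax(bilinear + prior, dim=-1)   # S_{i,j,k}

S = Biaffine()(torch.randn(6, d_r), torch.randn(6, d_r))
print(S.shape)                            # torch.Size([6, 6, 4])
```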
Subsequently, the word-level emotion dependency parser is trained. This stage uses the cross-entropy function as the loss:

$$ \mathcal{L}_{\mathrm{dep}} = -\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k} y_{i,j,k} \log S_{i,j,k} \qquad (10) $$

where $|N|^2$ is the total number of word pairs, $y_{i,j,k}$ is the true dependency category of each word pair $(w_i, w_j)$, and $S_{i,j,k}$ is the normalized biaffine score.
Model training is then carried out. Finally, the joint training loss function of the multi-task learning framework is:

$$ \mathcal{L} = \mathcal{L}_{\mathrm{tag}} + \alpha\, \mathcal{L}_{\mathrm{dep}} + \gamma \|\theta\|_2 \qquad (11) $$

where $\alpha$ is a balance coefficient that balances learning between the extraction tasks and the emotion dependency analysis, $\theta$ denotes the training parameters, $\|\theta\|_2$ is the L2 regularization of $\theta$, and $\gamma$ is its control coefficient.
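A sketch of how the dependency loss (10) and joint objective (11) combine, continuing the names of the previous sketches; placeholder tensors stand in for real model outputs:

```python
# Sketch of formulas (10)-(11): dependency loss plus the joint objective.
import torch
import torch.nn.functional as F

S = torch.softmax(torch.randn(6, 6, 4), dim=-1)      # stand-in for biaffine scores
gold = torch.randint(0, 4, (6, 6))                   # true class of each word pair

loss_dep = F.nll_loss(S.reshape(-1, 4).log(), gold.reshape(-1))  # formula (10)

alpha, gamma = 1.0, 1e-5                             # settings reported below
loss_tag = torch.tensor(1.3)                         # placeholder from the taggers
params = [torch.randn(4, 100, 100, requires_grad=True)]  # stand-in for theta
loss = loss_tag + alpha * loss_dep + gamma * sum(p.norm(2) for p in params)
```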
In a further aspect of the invention, in Step 4, after the aspect words, viewpoint words and word-level emotion dependency relations extracted in Step 3 are obtained, emotion triple decoding is performed using a heuristic rule to obtain triples represented by spans. Specifically, the emotion dependency relation produced by the biaffine scorer is used as a pivot, and the labels generated by the aspect word and viewpoint word annotators are traversed in reverse order.
For example, from the input sequence "Great battery, start up speed", the aspect word tag sequence {O, O, O, B, I, I}, the viewpoint word tag sequence {B, O, O, O, O, O} and the word-level emotion dependency POS are obtained. The index-form emotion triple can be represented as (6, 1, POS), where 6 is the index of the last word of the aspect word (speed), 1 is the index of the last word of the viewpoint word (Great), and the emotion polarity obtained from the analysis is positive (POS). The reverse-order traversal takes the aspect word index and viewpoint word index as pivots and walks over words whose tag is I. The final output emotion triple is [(4, 6), (1, 1), POS].
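A hedged reconstruction of this decoding heuristic (the original algorithm table survives only as an image): starting from the (aspect end, viewpoint end, polarity) pivot produced by the biaffine scorer, walk backwards through the BIO tags until each span's B tag is reached. Indices here are 0-based, whereas the example above is 1-based:

```python
# Sketch of the heuristic span decoding (reverse-order traversal).
def decode(pivot, ap_tags, op_tags):
    """pivot: (aspect_end_idx, opinion_end_idx, polarity), 0-based indices."""
    def span_start(end, tags):
        start = end
        while start > 0 and tags[start] == "I":  # walk back over I tags
            start -= 1
        return start                              # position carrying the B tag
    a_end, o_end, pol = pivot
    return ((span_start(a_end, ap_tags), a_end),
            (span_start(o_end, op_tags), o_end),
            pol)

# "Great battery , start up speed": aspect span "start up speed",
# viewpoint span "Great", polarity POS.
ap_tags = ["O", "O", "O", "B", "I", "I"]
op_tags = ["B", "O", "O", "O", "O", "O"]
print(decode((5, 0, "POS"), ap_tags, op_tags))    # ((3, 5), (0, 0), 'POS')
```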
The invention has the following beneficial effects:
1. The method explicitly embeds part-of-speech information into the word vectors to generate the fused feature vector representation, exploiting the data property that most aspect words are nouns and most viewpoint words are adjectives or adverbs, and deepening the model's understanding of the semantics of the text;
2. The method processes the target task through multi-task learning: while aspect words and viewpoint words are extracted independently, a parser analyzes the emotion dependency between them, satisfying the need for interactive modeling among the three elements of the triple;
3. The pos_tag part-of-speech tagger in the NLTK toolkit first tags the input sequence and the corresponding part-of-speech vector representations are obtained, which are fused with the word vectors into the fused feature vector representation; an LSTM network then encodes the sentences, integrating sentence context information into the feature representation; moreover, while two independent annotators respectively extract the aspect words and viewpoint words, a biaffine emotion dependency parser predicts word-level emotion dependencies to complete emotion triple extraction; finally, emotion triple decoding yields the span representation of the emotion triples;
4. On four public SemEval ABSA benchmark data sets, the method is compared with a series of baseline methods, and the experimental results demonstrate its effectiveness on the emotion triple extraction task.
Drawings
FIG. 1 is an aspect-level emotion triple extraction model framework fused with part-of-speech information according to the present invention;
FIG. 2 is an example of a word-level emotion dependency parsing result of the present invention;
FIG. 3 illustrates the triple categories of the present invention, where light spans represent aspect words and dark spans represent viewpoint words; the arrows indicate emotion polarity dependencies, always pointing from the aspect word to the viewpoint word.
Fig. 4 shows the experimental results (F1 values) of the influence of part-of-speech information on aspect word extraction according to the present invention.
Fig. 5 shows the experimental results (F1 values) of the influence of part-of-speech information on viewpoint word extraction according to the present invention.
FIG. 6 is a diagram of the composition of the false negative and false positive data on the Rest14 data set in accordance with the present invention.
Detailed Description
Example 1: as shown in Figs. 1 to 6, the method for extracting aspect-level emotion triples fused with part-of-speech information comprises the following specific steps:
Step 1, fusing the text-based word vector representation with the corresponding part-of-speech vector representation;
Step 2, acquiring the fused feature representation containing part-of-speech information;
Step 3, feeding the fused feature representation into the multi-task learning network to predict triples, i.e., completing aspect word extraction, viewpoint word extraction and emotion polarity analysis in parallel, guided by part-of-speech information;
Step 4, decoding the triples and generating the triple span representation according to a heuristic rule.
Steps 1 to 4 of this embodiment are implemented exactly as described in the Disclosure of the Invention above: part-of-speech tags are produced with the NLTK tagger pos_tag and their vectors are concatenated with the GloVe word vectors (formula (1)); a Bi-LSTM with linear dimension-reduction layers produces the feature representations (formulas (2) to (4)); the two {B, I, O} annotators and the biaffine emotion dependency parser are trained jointly (formulas (5) to (11)); and the triples are decoded by reverse-order traversal around the biaffine pivot. The detailed procedure of the decoding algorithm is as follows.
Table 2 presents the detailed procedure of the decoding algorithm (the table itself appears only as an image in the original document).
The emotion triple extraction method fusing part-of-speech information provided by the invention is verified on four public data sets, as follows:
(1): the experimental data of the invention adopts Laptop14, Rest14, Rest15 and Rest 16. The Laptop14 data set is a Laptop data set disclosed by SemEval Challenge 2014Task4, and the Rest14, the Rest15 and the Rest16 data sets are Restaurant data sets disclosed by SemEval Challenge 2014Task4, SemEval Challenge 2015Task 12 and SemEval Challenge 2016Task 5 respectively. Each data set is divided into three subsets, namely a training set, a verification set and a test set, and the division ratio is 3:1: 1. Furthermore, the sequence overlap indicates the number of sentences including triple of overlapping viewpoint words, the triple overlap indicates the number of triple of overlapping viewpoint words in all the triplets, and the viewpoint word overlap phenomenon indicates that a plurality of aspect words correspond to the same viewpoint word, as shown in fig. 3. Because the percentage of the triples with overlapped viewpoint words in the triples is 24%, the experimental result is influenced to a certain extent, and therefore the data used in the experiment comprises the triples with overlapped viewpoint words.
(2): the experimental environment of the invention is a deep learning framework Python 3.7.3 Pythoch 1.2.0. The invention adopts a GloVe pre-training model to initialize the word vector of the data set, and the dimension of the word vector is set to be d equal to 300. The invention adopts a part of speech annotator pos _ tag provided in an NLTK package to mark part of speech, and the dimensionality of a part of speech vector is 300. LSTM hidden layer vector dimension set to d h 600. The characteristic dimension after dimension reduction is d r 100. In the target function, alpha is 1, L 2 Regularization coefficient γ of 10 -5 . During the model training process, the learning rate is 10 -3 Batch _ size is 32, dropout strategy is adopted to prevent overfitting, and dropout is 0.5. Initializing all parameters by adopting uniform distribution, optimizing the parameters by using an Adam optimizer, initializing and training 10 models respectively in an experiment, and obtaining a final experiment result which is an average value of 10 model test results. In addition, when the model performance on the verification set is not further improved for 5 times continuously, the model stops training.
(3): the baseline model selected by the invention comprises two types: one class is the classical model of aspect-level triple extraction, and the second class is the model based on multi-task learning. The method comprises the following specific steps:
rinnate +: the model mines the extraction rules of the aspect words and the viewpoint words based on the dependency relationship of the words in the sentence, so as to jointly extract the aspect words-viewpoint words.
CMLA +: the model utilizes an attention mechanism to interactively learn the relationships between the aspect words and the viewpoint words to achieve a joint extraction of aspect words-viewpoint words.
Unifield +: the model is a classical aspect word-emotion polarity combined extraction model, and relates to two stacked RNNs: one is used for predicting the joint label and completing the joint extraction task; and the other is used for predicting the boundary of the facet words and assisting the joint extraction.
Pipeline: the model decomposes the triplet extraction into two stages: the first stage is used for predicting the joint label of the aspect word-emotion polarity and the label of the viewpoint word, and the second stage is used for pairing the two results of the first stage.
CMLA-MTL: the model is a multi-task learning model, and is obtained by CMLA expansion by Zhang et al and applied to an aspect-level emotion triple extraction task.
HAST-MTL: the model is obtained by expanding HAST by Zhang et al and is applied to an aspect-level emotion triple extraction task.
OTE-MTL: the model jointly extracts aspect words and viewpoint words by using a multi-task learning method, and analyzes emotion dependency relations between the aspect words and the viewpoint words by using a double affine scorer (biaffine scorer), so that the extraction of aspect-level emotion triples is realized.
The aspect-level emotion triple extraction model POS-OTE-MTL fusing part-of-speech information provided by the invention is compared with the above baseline models; the experimental results are shown in Table 3.
Table 3. Comparative experiment results (%). The mark denotes results obtained on the data sets with viewpoint word overlap sentences removed; the best results are shown in bold. (The table itself appears only as an image in the original document.)
As can be seen from Table 3, the marked OTE-MTL results (obtained without viewpoint word overlap sentences) are superior to those of OTE-MTL on all data sets except Laptop14. This shows that the presence of viewpoint word overlap data indeed hinders further improvement of model performance and poses a challenge for the aspect-level emotion triple extraction task. In addition, the CMLA+ model performs poorly compared with the CMLA-MTL model, which shows that decoupling the two tasks of aspect word extraction and emotion polarity prediction and handling them through multi-task learning can, to a certain extent, weaken the adverse effect of viewpoint word overlap data on the model. CMLA+, Unified+ and Pipeline all use joint labels to couple the aspect word extraction task and the emotion polarity classification task and process them simultaneously. Such a processing manner is inherently simple and efficient, but it associates the two tasks only through labels and does not capture the semantic dependency between aspect words and viewpoint words; that is, it ignores the viewpoint word overlap phenomenon. Multi-task processing avoids this problem: the aspect word extraction task and the emotion polarity classification task are handled separately in parallel, and when aspect words are matched with emotion polarities the semantic relation between aspect words and viewpoint words can be acquired, improving the accuracy of emotion polarity classification and achieving a better overall effect. Moreover, compared with the CMLA-MTL model, the POS-OTE-MTL model provided by the invention achieves higher F1 values on all data sets, with improvements of 14.88%, 12.16%, 12.60% and 11.25% respectively, verifying the reasonableness and superiority of the model in handling viewpoint word overlap data.
In summary, Table 3 shows that, compared with all baseline models, the proposed POS-OTE-MTL model achieves the highest F1 values on the three data sets Rest15, Rest16 and Laptop14, verifying the effectiveness of the method. Compared with the OTE-MTL model, its F1 values increase by 0.74%, 1.78% and 0.88% on Rest15, Rest16 and Laptop14 respectively, verifying that part-of-speech information indeed helps aspect-level emotion triple extraction. The model fuses the part-of-speech vectors with the word vectors as the feature representation, fully considering both the part-of-speech and the semantic information of the features. The semantic information helps the model learn the associations among different features and benefits its understanding of sentence context. The part-of-speech information not only exploits the data properties that most aspect words are nouns and most viewpoint words are adjectives and adverbs, but also follows the grammar rule of nouns being accompanied by adjectives, deepening the model's understanding of the text content. The model therefore achieves better performance. In addition, when applied to the emotion triple task, the model shows significant advantages over models such as CMLA-MTL and HAST-MTL that focus on joint aspect word-viewpoint word extraction, with average performance 12.72% higher than CMLA-MTL and 9.40% higher than HAST-MTL. The reason may be that when a joint extraction model is migrated to the emotion triple extraction task, the additional loss of emotion polarity analysis must be considered, which lowers model performance to a certain extent.
(4): in order to verify the influence of part-of-speech information introduced by the model of the invention on each element in the triplet (the aspect word, the viewpoint word and the emotion polarity) and further analyze the reason behind the higher F1 value of the model, the invention respectively evaluates the performance of the POS-OTE-MTL model on the aspect word extraction task and the viewpoint word extraction task on four data sets, and the experimental results are respectively shown in fig. 4 and fig. 5.
As can be seen from Fig. 4, the method performs better on the Rest15 and Laptop14 data sets, with F1 values raised by 0.05% and 0.34% respectively compared with the previous method OTE-MTL, verifying to some extent that introducing part-of-speech information helps aspect word extraction. Most aspect words in product review corpora are nouns, and introducing part-of-speech information can enhance the model's sensitivity in recognizing them. Second, for viewpoint word extraction, Fig. 5 shows that the method also outperforms OTE-MTL on the Rest15 and Laptop14 data sets, with F1 values improved by 0.09% and 0.65% respectively, demonstrating that part-of-speech information can assist viewpoint word extraction. In product review sentences most viewpoint words are adjectives and adverbs, and a noun is generally accompanied by an adjective, which in turn may be accompanied by an adverb. Given the sentence's part-of-speech information, the model can quickly locate the adjectives and adverbs in the sentence, which facilitates viewpoint word identification and strengthens the semantic relation between aspect words and viewpoint words.
In addition, as can be seen from Figs. 4 and 5, the method performs relatively poorly on the Rest16 data set for both aspect word and viewpoint word extraction. However, in Table 3 the method outperforms OTE-MTL on triple extraction, verifying that on the Rest16 data set the method is more beneficial for emotion polarity prediction. Unlike previous work that directly uses a classifier to predict emotion polarity, the method adopts a sentence dependency analysis approach, feeding the feature vector representations of the aspect words and viewpoint words into the emotion polarity parser and fully capturing the relation between aspect words and viewpoint words. Furthermore, the introduced part-of-speech information guides the model in predicting emotion polarity, improving model performance.
(5): to further explore the advantages and limitations of the model POS-OTE-MTL of the present invention, we performed model error source analysis. In particular, the data composition of false positive (model prediction label is positive, but true label is negative) and false negative (model prediction label is negative, but true label is positive) on the Rest14 dataset was analyzed in detail. As shown in fig. 6, we classify false positive sources into four categories: side word errors, viewpoint word errors, emotion polarity errors, and others. For false negatives, we classify their sources into three categories: concept word overlap, facet word overlap, normal source.
The error analysis results are shown in Fig. 6. As can be seen from the right side of Fig. 6, more than 50% of false positive errors derive from aspect word errors, viewpoint word errors and emotion polarity errors, indicating that a false positive is to a large extent triggered by a single erroneous element of the extracted triple (especially the aspect word or viewpoint word); this also indicates that there is still room for improving the triple span decoding algorithm, and stronger span detection algorithms may be developed in future work. It also indirectly suggests that the model's loss function calculation may have shortcomings: exact matching of the aspect word, viewpoint word and emotion polarity in a triple is not an ideal metric, since slight differences in the aspect word and viewpoint word span representations may have no effect on the emotion polarity in the experiments. As can be seen from the left side of Fig. 6, more than 40% of the model's false negative errors derive from aspect word overlap and viewpoint word overlap data. This shows that although the POS-OTE-MTL model can alleviate the negative effects of overlapping examples to some extent, it cannot fully solve the problem, which requires further study.
While the present invention has been described in detail with reference to the embodiments, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (7)

1. A method for extracting aspect-level emotion triples fused with part-of-speech information, characterized by comprising the following specific steps:
Step 1, fusing the text-based word vector representation with the corresponding part-of-speech vector representation;
Step 2, acquiring the fused feature representation containing part-of-speech information;
Step 3, feeding the fused feature representation into the multi-task learning network to predict triples, i.e., completing aspect word extraction, viewpoint word extraction and emotion polarity analysis in parallel, guided by part-of-speech information;
Step 4, decoding the triples and generating the triple span representation according to a heuristic rule.
2. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 1, characterized in that: in Step 1, each word in the input sentence is mapped into a low-dimensional vector space using the pre-trained GloVe model to obtain the word vector representation $e_i \in \mathbb{R}^d$, where $d$ is the dimension of the word vector; for an input sequence $X = \{w_1, w_2, \ldots, w_N\}$, a part-of-speech tag is first assigned to each word using the NLTK part-of-speech tagger pos_tag, and each part-of-speech tag is then assigned a part-of-speech vector $p_i \in \mathbb{R}^d$, where $d$, the dimension of the part-of-speech vector, is consistent with that of the word vector; finally, the part-of-speech vector $p_i$ of each word is concatenated with its word vector $e_i$ to obtain the fused feature vector representation $f_i$.
3. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 1, characterized in that: in Step 2, after the fused feature vector representation $f_i$ is obtained, a Bi-LSTM network is used to obtain a representation $h_i$ containing context information; then, through a linear layer and a non-linear transformation, the feature representations of the aspect words and viewpoint words are obtained while the dimension is reduced; there are two reasons for not feeding the hidden-layer vectors directly into the next stage of the model: first, the hidden-state features contain computationally redundant information and carry a risk of overfitting; second, this operation can prune features unrelated to aspect words and viewpoint words.
4. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 1, characterized in that: in Step 3, the multi-task learning network architecture comprises two parts: aspect word and viewpoint word extraction, and word-level emotion dependency analysis; when extracting aspect words and viewpoint words, the network model adopts a {B, I, O} tagging scheme, where B marks the start position of an aspect word or viewpoint word, I marks an inside position, and O marks everything else; the feature vector representations of the aspect words and viewpoint words are fed into the multi-task learning network model, a biaffine scorer, which performs well in syntactic dependency parsing, captures the interaction between the two words of each word pair during the word-level emotion dependency analysis, and a word-level emotion dependency parser is then trained.
5. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 1, characterized in that: in Step 4, emotion triple decoding is performed using a heuristic rule to obtain triples represented by spans; specifically, the emotion dependency relation produced by the biaffine scorer is used as a pivot, and the labels generated by the aspect word and viewpoint word annotators are traversed in reverse order.
6. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 3, characterized in that: in Step 2, the specific calculation for obtaining the feature representations of the aspect words and viewpoint words, while reducing the dimension through a linear layer and a non-linear transformation, is:

$$ r_i^{ap} = g(W^{ap} h_i + b^{ap}) \qquad (1) $$

$$ r_i^{op} = g(W^{op} h_i + b^{op}) \qquad (2) $$

where $r_i^{ap}, r_i^{op} \in \mathbb{R}^{d_r}$ are the feature vector representations of the aspect words and viewpoint words respectively, $d_r$ is the vector dimension, $W^{ap}, W^{op}$ and $b^{ap}, b^{op}$ are learnable weights and biases respectively, and $g(\cdot)$ is the non-linear function ReLU, i.e. $\max(\cdot, 0)$.
7. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 4, characterized in that: in Step 3, the feature vector representations of the aspect words and viewpoint words are fed into the multi-task learning network model; the probability distribution predicted for each word in the sentence being an aspect word is shown in formula (3), and that for viewpoint words in formula (4):

$$ \hat{y}_i^{ap} = \mathrm{softmax}(W_t^{ap} r_i^{ap} + b_t^{ap}) \qquad (3) $$

$$ \hat{y}_i^{op} = \mathrm{softmax}(W_t^{op} r_i^{op} + b_t^{op}) \qquad (4) $$

where $W_t^{ap}, W_t^{op}$ are weight matrices and $b_t^{ap}, b_t^{op}$ are biases, all of which are training parameters, $r_i^{ap}, r_i^{op} \in \mathbb{R}^{d_r}$ are the feature vector representations of the aspect words and viewpoint words respectively, and $d_r$ is the vector dimension.
CN202210633972.9A 2022-06-07 2022-06-07 Method for extracting aspect-level emotion triple fused with part-of-speech information Pending CN115048936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210633972.9A CN115048936A (en) 2022-06-07 2022-06-07 Method for extracting aspect-level emotion triple fused with part-of-speech information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210633972.9A CN115048936A (en) 2022-06-07 2022-06-07 Method for extracting aspect-level emotion triple fused with part-of-speech information

Publications (1)

Publication Number Publication Date
CN115048936A 2022-09-13

Family

ID=83159778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210633972.9A Pending CN115048936A (en) 2022-06-07 2022-06-07 Method for extracting aspect-level emotion triple fused with part-of-speech information

Country Status (1)

Country Link
CN (1) CN115048936A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511012A (en) * 2022-11-22 2022-12-23 南京码极客科技有限公司 Class soft label recognition training method for maximum entropy constraint
CN117494727A (en) * 2023-12-29 2024-02-02 卓世科技(海南)有限公司 De-biasing method for large language model
CN117494727B (en) * 2023-12-29 2024-03-29 卓世科技(海南)有限公司 De-biasing method for large language model

Similar Documents

Publication Publication Date Title
CN110781680B (en) Semantic similarity matching method based on twin network and multi-head attention mechanism
Yang et al. Adversarial learning for chinese ner from crowd annotations
CN115048936A (en) Method for extracting aspect-level emotion triple fused with part-of-speech information
Fonseca et al. A two-step convolutional neural network approach for semantic role labeling
CN115618045B (en) Visual question answering method, device and storage medium
CN112395876B (en) Knowledge distillation and multitask learning-based chapter relationship identification method and device
CN112926337B (en) End-to-end aspect level emotion analysis method combined with reconstructed syntax information
CN115168541A (en) Chapter event extraction method and system based on frame semantic mapping and type perception
Du et al. Syntax-type-aware graph convolutional networks for natural language understanding
CN114429143A (en) Cross-language attribute level emotion classification method based on enhanced distillation
Kasai et al. End-to-end graph-based TAG parsing with neural networks
Liu et al. Cross-domain slot filling as machine reading comprehension: A new perspective
Stengel-Eskin et al. Universal decompositional semantic parsing
Mai et al. Pronounce differently, mean differently: a multi-tagging-scheme learning method for Chinese NER integrated with lexicon and phonetic features
CN116681061A (en) English grammar correction technology based on multitask learning and attention mechanism
CN115906818A (en) Grammar knowledge prediction method, grammar knowledge prediction device, electronic equipment and storage medium
Zhang et al. Semantics-aware inferential network for natural language understanding
Zhang et al. Exploring aspect-based sentiment quadruple extraction with implicit aspects, opinions, and ChatGPT: a comprehensive survey
Hsu et al. An interpretable generative adversarial approach to classification of latent entity relations in unstructured sentences
Fernandes et al. Entropy-guided feature generation for structured learning of Portuguese dependency parsing
CN115481217A (en) End-to-end attribute level emotion analysis method based on sentence component perception attention mechanism
Xiang et al. A cross-guidance cross-lingual model on generated parallel corpus for classical Chinese machine reading comprehension
CN114626463A (en) Language model training method, text matching method and related device
Nio et al. Intelligence is asking the right question: a study on Japanese question generation
CN114595700A (en) Zero-pronoun and chapter information fused Hanyue neural machine translation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination