CN115048936A - Method for extracting aspect-level emotion triple fused with part-of-speech information - Google Patents

Method for extracting aspect-level emotion triple fused with part-of-speech information

Info

Publication number
CN115048936A
CN115048936A
Authority
CN
China
Prior art keywords
words, word, emotion, viewpoint, speech
Prior art date
Legal status
Pending
Application number
CN202210633972.9A
Other languages
Chinese (zh)
Inventor
相艳
柳如熙
陆婷
郭军军
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202210633972.9A
Publication of CN115048936A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/253 Grammatical analysis; Style critique
    • G06F 40/30 Semantic analysis


Abstract

The invention relates to an aspect-level emotion triple extraction method that fuses part-of-speech information, and belongs to the field of sentiment analysis. The method fuses part-of-speech information at the model input stage, deepening the model's understanding of the semantics of the data; it then completes the aspect-level emotion triple extraction task efficiently through multi-task learning, in which two annotators extract the aspect words and viewpoint words while a biaffine emotion dependency parser fully learns the interaction between aspect words and viewpoint words to judge the emotion polarity corresponding to each aspect word; finally, the triples are decoded to obtain their span representations. Experiments on four public SemEval benchmark data sets show that the method outperforms the other baseline models on the aspect-level emotion triple extraction task, verifying that fusing part-of-speech information indeed helps interactive modeling and ultimately improves model performance.

Description

Method for extracting aspect-level emotion triple fused with part-of-speech information
Technical Field
The invention relates to a method for extracting aspect-level emotion triples that fuses part-of-speech information, and belongs to the technical field of sentiment analysis.
Background
Aspect-level emotion triple extraction (ASTE) is a recently proposed ABSA subtask that aims to extract the emotion triples (aspect word, viewpoint word, emotion polarity) in an input sentence. To address this task, Peng et al. proposed a two-stage pipeline: the first stage uses two sequence annotators, one to jointly extract aspect words and emotion polarities and the other to extract viewpoint words; the second stage uses the positional information between aspect words and viewpoint words to link them and determine their emotion polarity pairing, thereby completing emotion triple extraction. Xu et al. adopted a novel position-aware tagging scheme that encodes the structural information of triples, connecting the three elements through tag semantics. Considering that aspect words and viewpoint words may consist of multiple words, Xu et al. proposed a span-based ASTE model (Span-ASTE) that, for the first time, directly captures span-to-span interactions when predicting the emotion relationship of aspect word-viewpoint word pairs. These methods all achieve considerable results on the ASTE task, but they share the following shortcoming: the feature representation contains no part-of-speech information, and part-of-speech information is not used to guide the interactive modeling of aspect words, viewpoint words and emotion polarities, so the model cannot deeply understand the semantic relations among the three and remains at shallow learning.
To address these problems, the invention provides POS-OTE-MTL, a multi-task learning framework for emotion triple extraction that incorporates part-of-speech information. On four public SemEval ABSA benchmark data sets, the method is compared with a series of baseline methods, and the experimental results demonstrate its effectiveness on the emotion triple extraction task.
Disclosure of Invention
The invention provides a method for extracting aspect-level emotion triples that fuses part-of-speech information, addressing two gaps in prior related work: the feature representation stage contains no part-of-speech information, and part-of-speech information is not used to guide the interactive modeling of aspect words, viewpoint words and emotion polarities.
The technical scheme of the invention is as follows: the method for extracting aspect-level emotion triples fused with part-of-speech information comprises the following specific steps:
Step 1, fusing the text-based word vector representation with the corresponding part-of-speech vector representation;
Step 2, acquiring the fused feature representation containing part-of-speech information;
Step 3, feeding the fused feature representation into the multi-task learning network to predict triples, i.e., completing aspect word extraction, viewpoint word extraction and emotion polarity analysis in parallel, guided by part-of-speech information;
Step 4, decoding the triples and generating the triple span representation according to a heuristic rule.
As a further scheme of the invention, in Step 1, each word in the input sentence is mapped into a low-dimensional vector space using the pre-trained GloVe model, giving the word vector representation $e_i \in \mathbb{R}^d$, where $d$ is the dimension of the word vector. Meanwhile, at the input stage of the model, part-of-speech information is explicitly embedded into the word vectors to generate a text representation fusing syntax and semantics, so that the triple extraction task benefits from it. Specifically, for an input sequence $X = \{w_1, w_2, \ldots, w_N\}$, a part-of-speech tag is first assigned to each word using the NLTK part-of-speech tagger pos_tag (example tags are shown in Table 1); each part-of-speech tag is then assigned a part-of-speech vector $p_i \in \mathbb{R}^d$, where $d$, the dimension of the part-of-speech vector, is kept consistent with the dimension of the word vector; finally, the part-of-speech vector $p_i$ of each word is concatenated with its word vector $e_i$ to obtain the fused feature representation $f_i$, as shown in formula (1):

$$ f_i = e_i \oplus p_i \qquad (1) $$

where $f_i \in \mathbb{R}^{2d}$ and $\oplus$ denotes the concatenation operation of the vectors.
TABLE 1. Part-of-speech tagging example (the table itself appears only as an image in the original document).
Syntax is a basic feature of a sentence, and part of speech is a basic element of syntax. Part of speech constrains the ordering of words in a sentence; for example, a verb follows a noun to express an action, and an adjective accompanies a noun to modify it. To solve the problem that prior emotion triple extraction work does not introduce part-of-speech information, so that model performance is limited by the lack of deep semantics, the invention explicitly embeds part-of-speech information into the word vectors to generate the fused feature vector representation, deepening the model's understanding of the semantics of the text. Fusing part-of-speech information not only exploits the data properties that most aspect words in comment texts are nouns and most viewpoint words are adjectives, which directly benefits aspect-level emotion triple extraction; it also guides the modeling with part-of-speech information, fully considering both the semantic and the part-of-speech information of the features, and thereby indirectly assists the completion of the emotion triple task.
The part-of-speech tagger pos_tag in the NLTK toolkit is essentially a part-of-speech classification neural network model trained on a rich corpus. Its advantages are: (1) its input is consistent with that of the emotion triple task, taking the sentence as the unit; (2) it offers a variety of taggers, including a maximum-entropy tagger, a regular-expression tagger, a unigram tagger and n-gram taggers, which can be combined using the backoff technique; (3) its universal part-of-speech tag set comprises 36 common tags covering nouns, verbs, adjectives, adverbs and so on; (4) it is based on an object-oriented scripting language (Python), supports rapid prototyping, and is easy to learn and use. In view of the above, the invention automatically labels the text using the pos_tag part-of-speech tagger, obtaining the part-of-speech tag corresponding to each word.
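To make Step 1 concrete, the following is a minimal sketch, with illustrative variable names rather than the published code, of tagging a sentence with NLTK's pos_tag and concatenating word and part-of-speech embeddings as in formula (1); randomly initialized embedding tables stand in for the GloVe initialization:

```python
# Sketch of Step 1 (hypothetical names): POS-tag the sentence, embed words
# and tags separately, and concatenate the two vectors as in formula (1).
import nltk
import torch
import torch.nn as nn

nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = ["Great", "battery", ",", "start", "up", "speed"]
tagged = nltk.pos_tag(sentence)          # [('Great', 'JJ'), ('battery', 'NN'), ...]

# Toy vocabularies; in the paper the word table would be initialized from GloVe.
word2id = {w: i for i, w in enumerate(sentence)}
tag2id = {t: i for i, t in enumerate(sorted({t for _, t in tagged}))}

d = 300                                   # shared dimension of word and POS vectors
word_emb = nn.Embedding(len(word2id), d)  # e_i
pos_emb = nn.Embedding(len(tag2id), d)    # p_i

word_ids = torch.tensor([word2id[w] for w, _ in tagged])
tag_ids = torch.tensor([tag2id[t] for _, t in tagged])

# f_i = e_i (+) p_i : fused feature of dimension 2d, formula (1)
f = torch.cat([word_emb(word_ids), pos_emb(tag_ids)], dim=-1)
print(f.shape)                            # torch.Size([6, 600])
```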
As a further embodiment of the invention, in Step 2, after the fused feature vector representation $f_i$ is obtained, a Bi-LSTM network is used to obtain a representation containing context information, $h_i \in \mathbb{R}^{2d_h}$, as follows:

$$ h_i = \overrightarrow{\mathrm{LSTM}}(f_i) \oplus \overleftarrow{\mathrm{LSTM}}(f_i) \qquad (2) $$

where $d_h$ is the hidden-layer dimension of a unidirectional LSTM network, $\overrightarrow{\mathrm{LSTM}}$ and $\overleftarrow{\mathrm{LSTM}}$ denote the forward and backward LSTM respectively, and $\oplus$ denotes the concatenation operation of the vectors.
then, through a linear layer and a non-linear transformation, the characteristic representation of the aspect words and the viewpoint words is obtained while the dimension reduction is realized; here, there are two reasons why the hidden layer vector is not directly input into the next stage of the model: firstly, the characteristics of the hidden layer state comprise calculation redundant information and have the risk of overfitting; second, the operation may cull features that are unrelated to aspect words and viewpoint words. The specific calculation process is as follows:
Figure BDA0003681215670000035
Figure BDA0003681215670000036
wherein r is i ap ,
Figure BDA0003681215670000037
Feature vector representations of the aspect words and the point of view words, respectively, d r Is the vector dimension.
Figure BDA0003681215670000038
And
Figure BDA0003681215670000039
respectively, learnable weights and biases. Furthermore, g (-) is a non-linear function ReLU (-) i.e. max (-) 0.
It should be noted that the above vector representations are all in preparation for the extraction of triples. A group of feature expressions r for emotion polarity analysis can be obtained in the same way as the expressions (3) and (4) i ap′ ,
Figure BDA00036812156700000310
The calculation process is consistent with the expressions (3) and (4), and only the learning parameters are different.
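The encoding stage of Step 2 can be sketched as follows, under the same assumptions as above: a Bi-LSTM over the fused features, then one linear-plus-ReLU projection per role, producing the four low-dimensional representations $r^{ap}$, $r^{op}$, $r^{ap\prime}$, $r^{op\prime}$ of formulas (3) and (4) and their primed variants:

```python
# Sketch of Step 2: Bi-LSTM context encoding followed by per-role
# dimension-reducing projections (illustrative module names).
import torch
import torch.nn as nn

d, d_h, d_r = 300, 600, 100              # dimensions reported in the experiments

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.bilstm = nn.LSTM(2 * d, d_h, batch_first=True, bidirectional=True)
        # One projection per role: formulas (3)-(4) plus the primed variants.
        self.proj = nn.ModuleDict({
            name: nn.Linear(2 * d_h, d_r)
            for name in ("ap", "op", "ap_prime", "op_prime")
        })

    def forward(self, f):                 # f: (batch, seq_len, 2d)
        h, _ = self.bilstm(f)             # h_i: (batch, seq_len, 2*d_h)
        return {name: torch.relu(layer(h)) for name, layer in self.proj.items()}

reps = Encoder()(torch.randn(1, 6, 2 * d))
print(reps["ap"].shape)                   # torch.Size([1, 6, 100])
```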
As a further aspect of the invention, in Step 3, the multi-task learning network architecture comprises two parts: aspect word and viewpoint word extraction, and word-level emotion dependency analysis.
When extracting aspect words and viewpoint words, the network model adopts a {B, I, O} tagging scheme, where B marks the start position of an aspect word or viewpoint word, I marks an inside position, and O marks everything else. The feature vector representations of the aspect words and viewpoint words are fed into the multi-task learning network model; the probability distribution predicted for each word in the sentence being an aspect word is shown in formula (5), and that for viewpoint words in formula (6):

$$ \hat{y}_i^{ap} = \mathrm{softmax}(W_t^{ap} r_i^{ap} + b_t^{ap}) \qquad (5) $$

$$ \hat{y}_i^{op} = \mathrm{softmax}(W_t^{op} r_i^{op} + b_t^{op}) \qquad (6) $$

where $W_t^{ap}, W_t^{op}$ are weight matrices and $b_t^{ap}, b_t^{op}$ are biases; all are training parameters.
When extracting aspect words and viewpoint words, the model is trained jointly with a cross-entropy loss; the loss function consists of two parts, the aspect word extraction loss and the viewpoint word extraction loss:

$$ \mathcal{L}_{\mathrm{tag}} = -\sum_{s=1}^{S} \sum_{k} \left( y_k^{ap} \log \hat{y}_k^{ap} + y_k^{op} \log \hat{y}_k^{op} \right) \qquad (7) $$

where $S$ is the total number of samples, $k$ iterates over the classes, $y^{ap}, y^{op}$ denote the true label probability distributions of the aspect words and viewpoint words respectively, and $\hat{y}^{ap}, \hat{y}^{op}$ denote the corresponding predicted label distributions.
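A sketch of the two {B, I, O} taggers and their joint cross-entropy loss, formula (7); the tag indices and tensor shapes are illustrative, not taken from the published code:

```python
# Sketch of the two BIO annotators and the joint tagging loss, formula (7).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_r, n_tags = 100, 3                      # tags: B=0, I=1, O=2

ap_head = nn.Linear(d_r, n_tags)          # aspect-word annotator, formula (5)
op_head = nn.Linear(d_r, n_tags)          # viewpoint-word annotator, formula (6)

r_ap = torch.randn(1, 6, d_r)             # r_i^{ap} from Step 2
r_op = torch.randn(1, 6, d_r)             # r_i^{op} from Step 2

gold_ap = torch.tensor([[2, 2, 2, 0, 1, 1]])   # O O O B I I
gold_op = torch.tensor([[0, 2, 2, 2, 2, 2]])   # B O O O O O

# Cross entropy over every word position for both taggers.
loss_tag = (F.cross_entropy(ap_head(r_ap).transpose(1, 2), gold_ap)
            + F.cross_entropy(op_head(r_op).transpose(1, 2), gold_op))
```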
For a sentence of length N there are N² freely combined word pairs (including self-pairings), and the model of the invention analyzes the emotion dependency of every word pair. The emotion dependency category set is defined as {NEU, NEG, POS, NO-DEP}, where NEU, NEG and POS denote neutral, negative and positive respectively, and NO-DEP denotes no emotion dependency relation. Here, only the emotion dependency of one word pair is considered, namely the last word of the aspect word and the last word of the viewpoint word. For example, in the sentence "Great battery, start up speed", one triple is (start up speed, Great, POS); during emotion dependency parsing this triple is simplified to (speed, Great, POS), and the parsing result is shown in Fig. 2. In this way the learning redundancy of the parser is greatly reduced, while the word-level emotion dependency remains available when combined with the extracted aspect word span and viewpoint word span.
In the word-level emotion dependency analysis, a biaffine scorer, which performs well in syntactic dependency parsing, is used to capture the interaction between the two words of each word pair. The emotion dependency score of each word pair is computed as follows:

$$ s_{i,j,k} = {r_i^{ap\prime}}^{\top} W_k\, r_j^{op\prime} + b_k^{\top} r_j^{op\prime} \qquad (8) $$

where $s_{i,j,k}$ denotes the score of the word pair $(w_i, w_j)$ belonging to emotion dependency category $k$, $W_k$ and $b_k$ are a trainable parameter matrix and bias respectively, and $r_i^{ap\prime}, r_j^{op\prime}$ are the feature representations of the aspect word and viewpoint word computed in Step 2. Furthermore, $S_{i,j,k}$ denotes the score obtained by normalizing $s_{i,j,k}$ with softmax:

$$ S_{i,j,k} = \frac{\exp(s_{i,j,k})}{\sum_{k'} \exp(s_{i,j,k'})} \qquad (9) $$

As the factorization in formula (8) shows, conceptually the biaffine scorer not only computes the probability that $w_i$, receiving $w_j$, belongs to a particular dependency category (first term), but also uses the probability of $w_j$ itself belonging to that category as a prior (second term). In essence, the biaffine scorer is one biaffine transformation followed by one matrix multiplication.
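A minimal biaffine scorer in the shape formula (8) describes, assuming the bilinear-plus-prior form reconstructed above: a bilinear term over the (aspect, viewpoint) pair plus a linear prior on the viewpoint word, then a softmax over the four dependency classes as in formula (9):

```python
# Sketch of the biaffine emotion dependency scorer, formulas (8)-(9).
import torch
import torch.nn as nn

d_r, n_cls = 100, 4                       # classes: NEU, NEG, POS, NO-DEP

class Biaffine(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_cls, d_r, d_r) * 0.01)  # W_k
        self.b = nn.Parameter(torch.zeros(n_cls, d_r))              # b_k

    def forward(self, r_ap, r_op):        # (seq, d_r) each
        # s[i, j, k] = r_ap[i]^T W_k r_op[j] + b_k^T r_op[j]
        bilinear = torch.einsum("id,kde,je->ijk", r_ap, self.W, r_op)
        prior = torch.einsum("kd,jd->jk", self.b, r_op).unsqueeze(0)
        return torch.softmax(bilinear + prior, dim=-1)   # S_{i,j,k}

S = Biaffine()(torch.randn(6, d_r), torch.randn(6, d_r))
print(S.shape)                            # torch.Size([6, 6, 4])
```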
Subsequently, the word-level emotion dependency parser is trained. This stage uses the cross-entropy function as the loss:

$$ \mathcal{L}_{\mathrm{dep}} = -\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k} y_{i,j,k} \log S_{i,j,k} \qquad (10) $$

where $|N|^2$ is the total number of word pairs, $y_{i,j,k}$ is the true dependency category of each word pair $(w_i, w_j)$, and $S_{i,j,k}$ is the normalized biaffine score.
Model training is then carried out. Finally, the joint training loss function of the multi-task learning framework is:

$$ \mathcal{L} = \mathcal{L}_{\mathrm{tag}} + \alpha\, \mathcal{L}_{\mathrm{dep}} + \gamma \|\theta\|_2 \qquad (11) $$

where $\alpha$ is a balance coefficient that balances learning between the extraction tasks and the emotion dependency analysis, $\theta$ denotes the training parameters, $\|\theta\|_2$ is the L2 regularization of $\theta$, and $\gamma$ is its control coefficient.
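A sketch of how the dependency loss (10) and joint objective (11) combine, continuing the names of the previous sketches; placeholder tensors stand in for real model outputs:

```python
# Sketch of formulas (10)-(11): dependency loss plus the joint objective.
import torch
import torch.nn.functional as F

S = torch.softmax(torch.randn(6, 6, 4), dim=-1)      # stand-in for biaffine scores
gold = torch.randint(0, 4, (6, 6))                   # true class of each word pair

loss_dep = F.nll_loss(S.reshape(-1, 4).log(), gold.reshape(-1))  # formula (10)

alpha, gamma = 1.0, 1e-5                             # settings reported below
loss_tag = torch.tensor(1.3)                         # placeholder from the taggers
params = [torch.randn(4, 100, 100, requires_grad=True)]  # stand-in for theta
loss = loss_tag + alpha * loss_dep + gamma * sum(p.norm(2) for p in params)
```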
In a further aspect of the invention, in Step 4, after the aspect words, viewpoint words and word-level emotion dependency relations extracted in Step 3 are obtained, emotion triple decoding is performed using a heuristic rule to obtain triples represented by spans. Specifically, the emotion dependency relation produced by the biaffine scorer is used as a pivot, and the labels generated by the aspect word and viewpoint word annotators are traversed in reverse order.
For example, from the input sequence "Great battery, start up speed", the aspect word tag sequence {O, O, O, B, I, I}, the viewpoint word tag sequence {B, O, O, O, O, O} and the word-level emotion dependency POS are obtained. The index-form emotion triple can be represented as (6, 1, POS), where 6 is the index of the last word of the aspect word (speed), 1 is the index of the last word of the viewpoint word (Great), and the emotion polarity obtained from the analysis is positive (POS). The reverse-order traversal takes the aspect word index and viewpoint word index as pivots and walks over words whose tag is I. The final output emotion triple is [(4, 6), (1, 1), POS].
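A hedged reconstruction of this decoding heuristic (the original algorithm table survives only as an image): starting from the (aspect end, viewpoint end, polarity) pivot produced by the biaffine scorer, walk backwards through the BIO tags until each span's B tag is reached. Indices here are 0-based, whereas the example above is 1-based:

```python
# Sketch of the heuristic span decoding (reverse-order traversal).
def decode(pivot, ap_tags, op_tags):
    """pivot: (aspect_end_idx, opinion_end_idx, polarity), 0-based indices."""
    def span_start(end, tags):
        start = end
        while start > 0 and tags[start] == "I":  # walk back over I tags
            start -= 1
        return start                              # position carrying the B tag
    a_end, o_end, pol = pivot
    return ((span_start(a_end, ap_tags), a_end),
            (span_start(o_end, op_tags), o_end),
            pol)

# "Great battery , start up speed": aspect span "start up speed",
# viewpoint span "Great", polarity POS.
ap_tags = ["O", "O", "O", "B", "I", "I"]
op_tags = ["B", "O", "O", "O", "O", "O"]
print(decode((5, 0, "POS"), ap_tags, op_tags))    # ((3, 5), (0, 0), 'POS')
```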
The invention has the following beneficial effects:
1. The method explicitly embeds part-of-speech information into the word vectors to generate the fused feature vector representation, exploiting the data property that most aspect words are nouns and most viewpoint words are adjectives or adverbs, and deepening the model's understanding of the semantics of the text;
2. The method processes the target task through multi-task learning: while aspect words and viewpoint words are extracted independently, a parser analyzes the emotion dependency between them, satisfying the need for interactive modeling among the three elements of the triple;
3. The pos_tag part-of-speech tagger in the NLTK toolkit first tags the input sequence and the corresponding part-of-speech vector representations are obtained, which are fused with the word vectors into the fused feature vector representation; an LSTM network then encodes the sentences, integrating sentence context information into the feature representation; moreover, while two independent annotators respectively extract the aspect words and viewpoint words, a biaffine emotion dependency parser predicts word-level emotion dependencies to complete emotion triple extraction; finally, emotion triple decoding yields the span representation of the emotion triples;
4. On four public SemEval ABSA benchmark data sets, the method is compared with a series of baseline methods, and the experimental results demonstrate its effectiveness on the emotion triple extraction task.
Drawings
FIG. 1 is an aspect-level emotion triple extraction model framework fused with part-of-speech information according to the present invention;
FIG. 2 is an example of a word-level emotion dependency parsing result of the present invention;
FIG. 3 illustrates the triple categories of the present invention, where light spans represent aspect words and dark spans represent viewpoint words; the arrows indicate emotion polarity dependencies, always pointing from the aspect word to the viewpoint word.
Fig. 4 shows the experimental results (F1 values) of the influence of part-of-speech information on aspect word extraction according to the present invention.
Fig. 5 shows the experimental results (F1 values) of the influence of part-of-speech information on viewpoint word extraction according to the present invention.
FIG. 6 is a diagram of the composition of the false negative and false positive data on the Rest14 data set in accordance with the present invention.
Detailed Description
Example 1: as shown in Figs. 1 to 6, the method for extracting aspect-level emotion triples fused with part-of-speech information comprises the following specific steps:
Step 1, fusing the text-based word vector representation with the corresponding part-of-speech vector representation;
Step 2, acquiring the fused feature representation containing part-of-speech information;
Step 3, feeding the fused feature representation into the multi-task learning network to predict triples, i.e., completing aspect word extraction, viewpoint word extraction and emotion polarity analysis in parallel, guided by part-of-speech information;
Step 4, decoding the triples and generating the triple span representation according to a heuristic rule.
Steps 1 to 4 of this embodiment are implemented exactly as described in the Disclosure of the Invention above: part-of-speech tags are produced with the NLTK tagger pos_tag and their vectors are concatenated with the GloVe word vectors (formula (1)); a Bi-LSTM with linear dimension-reduction layers produces the feature representations (formulas (2) to (4)); the two {B, I, O} annotators and the biaffine emotion dependency parser are trained jointly (formulas (5) to (11)); and the triples are decoded by reverse-order traversal around the biaffine pivot. The detailed procedure of the decoding algorithm is as follows.
Table 2 presents the detailed procedure of the decoding algorithm (the table itself appears only as an image in the original document).
The emotion triple extraction method fusing part-of-speech information provided by the invention is verified on four public data sets, as follows:
(1): the experimental data of the invention adopts Laptop14, Rest14, Rest15 and Rest 16. The Laptop14 data set is a Laptop data set disclosed by SemEval Challenge 2014Task4, and the Rest14, the Rest15 and the Rest16 data sets are Restaurant data sets disclosed by SemEval Challenge 2014Task4, SemEval Challenge 2015Task 12 and SemEval Challenge 2016Task 5 respectively. Each data set is divided into three subsets, namely a training set, a verification set and a test set, and the division ratio is 3:1: 1. Furthermore, the sequence overlap indicates the number of sentences including triple of overlapping viewpoint words, the triple overlap indicates the number of triple of overlapping viewpoint words in all the triplets, and the viewpoint word overlap phenomenon indicates that a plurality of aspect words correspond to the same viewpoint word, as shown in fig. 3. Because the percentage of the triples with overlapped viewpoint words in the triples is 24%, the experimental result is influenced to a certain extent, and therefore the data used in the experiment comprises the triples with overlapped viewpoint words.
(2): the experimental environment of the invention is a deep learning framework Python 3.7.3 Pythoch 1.2.0. The invention adopts a GloVe pre-training model to initialize the word vector of the data set, and the dimension of the word vector is set to be d equal to 300. The invention adopts a part of speech annotator pos _ tag provided in an NLTK package to mark part of speech, and the dimensionality of a part of speech vector is 300. LSTM hidden layer vector dimension set to d h 600. The characteristic dimension after dimension reduction is d r 100. In the target function, alpha is 1, L 2 Regularization coefficient γ of 10 -5 . During the model training process, the learning rate is 10 -3 Batch _ size is 32, dropout strategy is adopted to prevent overfitting, and dropout is 0.5. Initializing all parameters by adopting uniform distribution, optimizing the parameters by using an Adam optimizer, initializing and training 10 models respectively in an experiment, and obtaining a final experiment result which is an average value of 10 model test results. In addition, when the model performance on the verification set is not further improved for 5 times continuously, the model stops training.
(3): the baseline model selected by the invention comprises two types: one class is the classical model of aspect-level triple extraction, and the second class is the model based on multi-task learning. The method comprises the following specific steps:
rinnate +: the model mines the extraction rules of the aspect words and the viewpoint words based on the dependency relationship of the words in the sentence, so as to jointly extract the aspect words-viewpoint words.
CMLA +: the model utilizes an attention mechanism to interactively learn the relationships between the aspect words and the viewpoint words to achieve a joint extraction of aspect words-viewpoint words.
Unifield +: the model is a classical aspect word-emotion polarity combined extraction model, and relates to two stacked RNNs: one is used for predicting the joint label and completing the joint extraction task; and the other is used for predicting the boundary of the facet words and assisting the joint extraction.
Pipeline: the model decomposes the triplet extraction into two stages: the first stage is used for predicting the joint label of the aspect word-emotion polarity and the label of the viewpoint word, and the second stage is used for pairing the two results of the first stage.
CMLA-MTL: the model is a multi-task learning model, and is obtained by CMLA expansion by Zhang et al and applied to an aspect-level emotion triple extraction task.
HAST-MTL: the model is obtained by expanding HAST by Zhang et al and is applied to an aspect-level emotion triple extraction task.
OTE-MTL: the model jointly extracts aspect words and viewpoint words by using a multi-task learning method, and analyzes emotion dependency relations between the aspect words and the viewpoint words by using a double affine scorer (biaffine scorer), so that the extraction of aspect-level emotion triples is realized.
The aspect-level emotion triple extraction model POS-OTE-MTL fusing part-of-speech information provided by the invention is compared with the above baseline models; the experimental results are shown in Table 3.
Table 3. Comparative experiment results (%). The mark denotes results obtained on the data sets with viewpoint word overlap sentences removed; the best results are shown in bold. (The table itself appears only as an image in the original document.)
As can be seen from Table 3, the marked OTE-MTL results (obtained without viewpoint word overlap sentences) are superior to those of OTE-MTL on all data sets except Laptop14. This shows that the presence of viewpoint word overlap data indeed hinders further improvement of model performance and poses a challenge for the aspect-level emotion triple extraction task. In addition, the CMLA+ model performs poorly compared with the CMLA-MTL model, which shows that decoupling the two tasks of aspect word extraction and emotion polarity prediction and handling them through multi-task learning can, to a certain extent, weaken the adverse effect of viewpoint word overlap data on the model. CMLA+, Unified+ and Pipeline all use joint labels to couple the aspect word extraction task and the emotion polarity classification task and process them simultaneously. Such a processing manner is inherently simple and efficient, but it associates the two tasks only through labels and does not capture the semantic dependency between aspect words and viewpoint words; that is, it ignores the viewpoint word overlap phenomenon. Multi-task processing avoids this problem: the aspect word extraction task and the emotion polarity classification task are handled separately in parallel, and when aspect words are matched with emotion polarities the semantic relation between aspect words and viewpoint words can be acquired, improving the accuracy of emotion polarity classification and achieving a better overall effect. Moreover, compared with the CMLA-MTL model, the POS-OTE-MTL model provided by the invention achieves higher F1 values on all data sets, with improvements of 14.88%, 12.16%, 12.60% and 11.25% respectively, verifying the reasonableness and superiority of the model in handling viewpoint word overlap data.
In summary, Table 3 shows that, compared with all baseline models, the proposed POS-OTE-MTL model achieves the highest F1 values on the three data sets Rest15, Rest16 and Laptop14, verifying the effectiveness of the method. Compared with the OTE-MTL model, its F1 values increase by 0.74%, 1.78% and 0.88% on Rest15, Rest16 and Laptop14 respectively, verifying that part-of-speech information indeed helps aspect-level emotion triple extraction. The model fuses the part-of-speech vectors with the word vectors as the feature representation, fully considering both the part-of-speech and the semantic information of the features. The semantic information helps the model learn the associations among different features and benefits its understanding of sentence context. The part-of-speech information not only exploits the data properties that most aspect words are nouns and most viewpoint words are adjectives and adverbs, but also follows the grammar rule of nouns being accompanied by adjectives, deepening the model's understanding of the text content. The model therefore achieves better performance. In addition, when applied to the emotion triple task, the model shows significant advantages over models such as CMLA-MTL and HAST-MTL that focus on joint aspect word-viewpoint word extraction, with average performance 12.72% higher than CMLA-MTL and 9.40% higher than HAST-MTL. The reason may be that when a joint extraction model is migrated to the emotion triple extraction task, the additional loss of emotion polarity analysis must be considered, which lowers model performance to a certain extent.
(4): in order to verify the influence of part-of-speech information introduced by the model of the invention on each element in the triplet (the aspect word, the viewpoint word and the emotion polarity) and further analyze the reason behind the higher F1 value of the model, the invention respectively evaluates the performance of the POS-OTE-MTL model on the aspect word extraction task and the viewpoint word extraction task on four data sets, and the experimental results are respectively shown in fig. 4 and fig. 5.
As can be seen from Fig. 4, the method performs better on the Rest15 and Laptop14 data sets, with F1 values raised by 0.05% and 0.34% respectively compared with the previous method OTE-MTL, verifying to some extent that introducing part-of-speech information helps aspect word extraction. Most aspect words in product review corpora are nouns, and introducing part-of-speech information can enhance the model's sensitivity in recognizing them. Second, for viewpoint word extraction, Fig. 5 shows that the method also outperforms OTE-MTL on the Rest15 and Laptop14 data sets, with F1 values improved by 0.09% and 0.65% respectively, demonstrating that part-of-speech information can assist viewpoint word extraction. In product review sentences most viewpoint words are adjectives and adverbs, and a noun is generally accompanied by an adjective, which in turn may be accompanied by an adverb. Given the sentence's part-of-speech information, the model can quickly locate the adjectives and adverbs in the sentence, which facilitates viewpoint word identification and strengthens the semantic relation between aspect words and viewpoint words.
In addition, as can be seen from Figs. 4 and 5, the method performs relatively poorly on the Rest16 data set for both aspect word and viewpoint word extraction. However, in Table 3 the method outperforms OTE-MTL on triple extraction, verifying that on the Rest16 data set the method is more beneficial for emotion polarity prediction. Unlike previous work that directly uses a classifier to predict emotion polarity, the method adopts a sentence dependency analysis approach, feeding the feature vector representations of the aspect words and viewpoint words into the emotion polarity parser and fully capturing the relation between aspect words and viewpoint words. Furthermore, the introduced part-of-speech information guides the model in predicting emotion polarity, improving model performance.
(5): to further explore the advantages and limitations of the model POS-OTE-MTL of the present invention, we performed model error source analysis. In particular, the data composition of false positive (model prediction label is positive, but true label is negative) and false negative (model prediction label is negative, but true label is positive) on the Rest14 dataset was analyzed in detail. As shown in fig. 6, we classify false positive sources into four categories: side word errors, viewpoint word errors, emotion polarity errors, and others. For false negatives, we classify their sources into three categories: concept word overlap, facet word overlap, normal source.
The error analysis results are shown in Fig. 6. As can be seen from the right side of Fig. 6, more than 50% of false positive errors derive from aspect word errors, viewpoint word errors and emotion polarity errors, indicating that a false positive is to a large extent triggered by a single erroneous element of the extracted triple (especially the aspect word or viewpoint word); this also indicates that there is still room for improving the triple span decoding algorithm, and stronger span detection algorithms may be developed in future work. It also indirectly suggests that the model's loss function calculation may have shortcomings: exact matching of the aspect word, viewpoint word and emotion polarity in a triple is not an ideal metric, since slight differences in the aspect word and viewpoint word span representations may have no effect on the emotion polarity in the experiments. As can be seen from the left side of Fig. 6, more than 40% of the model's false negative errors derive from aspect word overlap and viewpoint word overlap data. This shows that although the POS-OTE-MTL model can alleviate the negative effects of overlapping examples to some extent, it cannot fully solve the problem, which requires further study.
While the present invention has been described in detail with reference to the embodiments, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (7)

1. A method for extracting aspect-level emotion triples fused with part-of-speech information, characterized by comprising the following specific steps:
Step 1, fusing the text-based word vector representation with the corresponding part-of-speech vector representation;
Step 2, acquiring the fused feature representation containing part-of-speech information;
Step 3, feeding the fused feature representation into the multi-task learning network to predict triples, i.e., completing aspect word extraction, viewpoint word extraction and emotion polarity analysis in parallel, guided by part-of-speech information;
Step 4, decoding the triples and generating the triple span representation according to a heuristic rule.
2. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 1, characterized in that: in Step 1, each word in the input sentence is mapped into a low-dimensional vector space using the pre-trained GloVe model to obtain the word vector representation $e_i \in \mathbb{R}^d$, where $d$ is the dimension of the word vector; for an input sequence $X = \{w_1, w_2, \ldots, w_N\}$, a part-of-speech tag is first assigned to each word using the NLTK part-of-speech tagger pos_tag, and each part-of-speech tag is then assigned a part-of-speech vector $p_i \in \mathbb{R}^d$, where $d$, the dimension of the part-of-speech vector, is consistent with that of the word vector; finally, the part-of-speech vector $p_i$ of each word is concatenated with its word vector $e_i$ to obtain the fused feature vector representation $f_i$.
3. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 1, characterized in that: in Step 2, after the fused feature vector representation $f_i$ is obtained, a Bi-LSTM network is used to obtain a representation $h_i$ containing context information; then, through a linear layer and a non-linear transformation, the feature representations of the aspect words and viewpoint words are obtained while the dimension is reduced; there are two reasons for not feeding the hidden-layer vectors directly into the next stage of the model: first, the hidden-state features contain computationally redundant information and carry a risk of overfitting; second, this operation can prune features unrelated to aspect words and viewpoint words.
4. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 1, characterized in that: in Step 3, the multi-task learning network architecture comprises two parts: aspect word and viewpoint word extraction, and word-level emotion dependency analysis; when extracting aspect words and viewpoint words, the network model adopts a {B, I, O} tagging scheme, where B marks the start position of an aspect word or viewpoint word, I marks an inside position, and O marks everything else; the feature vector representations of the aspect words and viewpoint words are fed into the multi-task learning network model, a biaffine scorer, which performs well in syntactic dependency parsing, captures the interaction between the two words of each word pair during the word-level emotion dependency analysis, and a word-level emotion dependency parser is then trained.
5. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 1, characterized in that: in Step 4, emotion triple decoding is performed using a heuristic rule to obtain triples represented by spans; specifically, the emotion dependency relation produced by the biaffine scorer is used as a pivot, and the labels generated by the aspect word and viewpoint word annotators are traversed in reverse order.
6. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 3, characterized in that: in Step 2, the specific calculation for obtaining the feature representations of the aspect words and viewpoint words, while reducing the dimension through a linear layer and a non-linear transformation, is:

$$ r_i^{ap} = g(W^{ap} h_i + b^{ap}) \qquad (1) $$

$$ r_i^{op} = g(W^{op} h_i + b^{op}) \qquad (2) $$

where $r_i^{ap}, r_i^{op} \in \mathbb{R}^{d_r}$ are the feature vector representations of the aspect words and viewpoint words respectively, $d_r$ is the vector dimension, $W^{ap}, W^{op}$ and $b^{ap}, b^{op}$ are learnable weights and biases respectively, and $g(\cdot)$ is the non-linear function ReLU, i.e. $\max(\cdot, 0)$.
7. The method for extracting aspect-level emotion triples fused with part-of-speech information according to claim 4, characterized in that: in Step 3, the feature vector representations of the aspect words and viewpoint words are fed into the multi-task learning network model; the probability distribution predicted for each word in the sentence being an aspect word is shown in formula (3), and that for viewpoint words in formula (4):

$$ \hat{y}_i^{ap} = \mathrm{softmax}(W_t^{ap} r_i^{ap} + b_t^{ap}) \qquad (3) $$

$$ \hat{y}_i^{op} = \mathrm{softmax}(W_t^{op} r_i^{op} + b_t^{op}) \qquad (4) $$

where $W_t^{ap}, W_t^{op}$ are weight matrices and $b_t^{ap}, b_t^{op}$ are biases, all of which are training parameters, $r_i^{ap}, r_i^{op} \in \mathbb{R}^{d_r}$ are the feature vector representations of the aspect words and viewpoint words respectively, and $d_r$ is the vector dimension.
CN202210633972.9A 2022-06-07 2022-06-07 Method for extracting aspect-level emotion triple fused with part-of-speech information Pending CN115048936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210633972.9A CN115048936A (en) 2022-06-07 2022-06-07 Method for extracting aspect-level emotion triple fused with part-of-speech information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210633972.9A CN115048936A (en) 2022-06-07 2022-06-07 Method for extracting aspect-level emotion triple fused with part-of-speech information

Publications (1)

Publication Number Publication Date
CN115048936A 2022-09-13

Family

ID=83159778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210633972.9A Pending CN115048936A (en) 2022-06-07 2022-06-07 Method for extracting aspect-level emotion triple fused with part-of-speech information

Country Status (1)

Country Link
CN (1) CN115048936A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511012A (en) * 2022-11-22 2022-12-23 南京码极客科技有限公司 Class soft label recognition training method for maximum entropy constraint
CN117494727A (en) * 2023-12-29 2024-02-02 卓世科技(海南)有限公司 De-biasing method for large language model
CN117494727B (en) * 2023-12-29 2024-03-29 卓世科技(海南)有限公司 De-biasing method for large language model

Similar Documents

Publication Publication Date Title
CN110781680B (en) Semantic similarity matching method based on twin network and multi-head attention mechanism
Yang et al. Adversarial learning for chinese ner from crowd annotations
CN115048936A (en) Method for extracting aspect-level emotion triple fused with part-of-speech information
Fonseca et al. A two-step convolutional neural network approach for semantic role labeling
CN115618045B (en) Visual question answering method, device and storage medium
CN112395876B (en) Knowledge distillation and multitask learning-based chapter relationship identification method and device
CN112926337B (en) End-to-end aspect level emotion analysis method combined with reconstructed syntax information
CN115168541A (en) Chapter event extraction method and system based on frame semantic mapping and type perception
Du et al. Syntax-type-aware graph convolutional networks for natural language understanding
CN114429143A (en) Cross-language attribute level emotion classification method based on enhanced distillation
Kasai et al. End-to-end graph-based TAG parsing with neural networks
Liu et al. Cross-domain slot filling as machine reading comprehension: A new perspective
Stengel-Eskin et al. Universal decompositional semantic parsing
Mai et al. Pronounce differently, mean differently: a multi-tagging-scheme learning method for Chinese NER integrated with lexicon and phonetic features
CN116681061A (en) English grammar correction technology based on multitask learning and attention mechanism
CN115906818A (en) Grammar knowledge prediction method, grammar knowledge prediction device, electronic equipment and storage medium
Zhang et al. Semantics-aware inferential network for natural language understanding
Zhang et al. Exploring aspect-based sentiment quadruple extraction with implicit aspects, opinions, and ChatGPT: a comprehensive survey
Hsu et al. An interpretable generative adversarial approach to classification of latent entity relations in unstructured sentences
Fernandes et al. Entropy-guided feature generation for structured learning of Portuguese dependency parsing
CN115481217A (en) End-to-end attribute level emotion analysis method based on sentence component perception attention mechanism
Xiang et al. A cross-guidance cross-lingual model on generated parallel corpus for classical Chinese machine reading comprehension
CN114626463A (en) Language model training method, text matching method and related device
Nio et al. Intelligence is asking the right question: a study on Japanese question generation
CN114595700A (en) Zero-pronoun and chapter information fused Hanyue neural machine translation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination