CN115545038A - Aspect emotion analysis method for optimizing grid label - Google Patents


Info

Publication number
CN115545038A
CN115545038A (application number CN202211109002.5A)
Authority
CN
China
Prior art keywords
text
label
model
sentence
terms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211109002.5A
Other languages
Chinese (zh)
Inventor
Ge Hongwei
Wang Wei
Jiang Ming
Li Ting
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202211109002.5A priority Critical patent/CN115545038A/en
Publication of CN115545038A publication Critical patent/CN115545038A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an aspect-based sentiment analysis method with optimized grid tagging, belonging to the fields of natural language processing and aspect-based sentiment analysis. The method adopts a grid tagging scheme with enhanced position learning, incorporates a regularized-dropout method into the BERT model, and pairs the regularized dropout with a data-set expansion scheme to extract aspect sentiment. The method reduces the space cost of labeling, extracts aspect sentiment triplet and pair information from text more accurately, improves model utilization, and meets the design requirements of practical engineering systems.

Description

Aspect-based sentiment analysis method with optimized grid tagging
Technical Field
The invention relates to an aspect-based sentiment analysis method with optimized grid tagging, and belongs to the fields of natural language processing and aspect-based sentiment analysis.
Background
With the rapid development of the internet era, data is expanding rapidly, and how to extract information and knowledge beneficial to production and daily life from massive data with maximum benefit is a problem demanding urgent solution. More and more people have begun to study big data and extract information from it. Sentiment analysis of language is a necessary step in the big-data era: almost everyone interacts with the internet, expressing opinions and sharing experiences through networks and social platforms. Such content spans many domains, from discussions of various issues on microblogs and the sharing of life experiences, to opinions on events and the large volume of customer reviews on catering and food-delivery platforms, all plainly carrying emotional coloring. As data volume grows over time and services must become ever more efficient, analyzing text sentiment by technical means becomes necessary; sentiment analysis of text is therefore important and of high commercial value.
Traditional text sentiment analysis simply extracts the sentiment polarity of a whole document or sentence and classifies it as positive, neutral or negative; this ignores the variability of semantics within the text and leads to inaccurate recognition. Aspect-based sentiment analysis (ABSA) aims to extract the sentiment polarity of each aspect in a text, mining the user's finer-grained sentiment tendencies through aspect term extraction, opinion term extraction and aspect-level sentiment classification; it is a fine-grained sentiment analysis technique.
Aspect-based sentiment analysis is a challenging task on which many scholars have done a great deal of research. For example, Chen et al. construct two graph attention networks (Chen X, Sun C, Wang J. [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020). Mao et al. construct two BERT-MRC models for joint training (Mao Y, Shen Y, Yu C, et al. A joint training dual-MRC framework for aspect based sentiment analysis [J]. arXiv preprint arXiv:2101.00816, 2021), learning the aspect terms of the text on one side and the opinion terms and sentiment polarity on the other; the joint model shares parameters and duplicates the input to keep the two reading-comprehension models connected, realizing joint training and triplet extraction. Although its semantic understanding is stronger, it does not consider the importance of relative position, and the word-pair relations established through parameter sharing are weak. Li et al. consider the complementarity of syntactic structure and semantic dependencies, using a dual graph convolutional network (Li R, Chen H, Feng F, et al. [C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021).
These models share a common problem: the opinion triplet extraction task is divided into several subtasks, leading to complex pipelines, large models and error propagation in the training network. To address the error propagation easily caused by the multiple subtasks of opinion triplet extraction, Wu et al. proposed a grid tagging model (Wu Z, Ying C, Zhao F, et al. Grid tagging scheme for aspect-oriented fine-grained opinion extraction [J]. arXiv preprint arXiv:2010.04640, 2020), in which a scheme combining BERT, GTS-BERT for short, is constructed, and opinion triplets are extracted end-to-end using the relations between tagged word pairs. However, the GTS-BERT classification model has a large label dimensionality and learns the text insufficiently, so the grid tagging model does not fully realize its performance.
In summary, GTS-BERT solves most of the existing problems, but some issues remain: semantic learning of the text is insufficient and the relative positions in the text are not fully exploited. In addition, the grid tagging scheme itself leaves room for improvement toward more accurate text sentiment analysis results.
Disclosure of Invention
To solve the existing problems, the invention provides an aspect-based sentiment analysis method with optimized grid tagging, an improvement on GTS-BERT. The grid tagging method of GTS-BERT is based on the global pointer, a special marking principle adapted to opinion triplet extraction: since the output is a two-dimensional matrix corresponding to a one-dimensional sentence, the row number can mark where a span of the sentence starts and the column number where it ends. A tag at one position in the two-dimensional space can therefore represent a span of positions in the sentence, avoiding problems such as tag overlap in one-dimensional space. The two-dimensional label space of the global pointer is large and suits single tasks with sparse labels; GTS grew out of this idea, ingeniously combining the term tags with the text in a classification scheme similar to the global pointer, marking tags at the corresponding positions so that the tag at the intersection of two terms represents their sentiment polarity.
On the basis of GTS-BERT, the aspect-based sentiment analysis method with optimized grid tagging uses 3 tags to mark a sentence S = {W_1, W_2, ..., W_n}, where W_i denotes the i-th word of the sentence S to be analyzed. Aspect terms and opinion terms in the sentence are labeled on the diagonal, word-pair information is labeled in the upper triangle, and the tag set {A, O, POS, NEU, NEG, N} is used to mark the relationship between a word pair (W_i, W_j) in the sentence. For aspect sentiment triplet extraction, four codes 0, 1, 2, 3 represent the tags.
Optionally, labeling the aspect terms and the opinion terms in the sentence on the diagonal comprises:
if a word on the diagonal belongs to an aspect term, its tag is A with code 1; if it belongs to an opinion term, its tag is O with code 2. A word pair formed by an aspect term and an opinion term is identified by its intersection point, whose code expresses the sentiment polarity: codes 1, 2, 3 represent the sentiment tags Neg, Neu, Pos, denoting the negative, neutral and positive relations expressed by the opinion triplet.
Optionally, the method comprises a model training phase and an actual analysis phase. In the training phase, a public or self-collected data set is used to train a BERT model whose optimized grid tag encoding layer uses rotary position embedding, yielding an optimal model; in the actual analysis phase, the optimal model performs aspect-based sentiment analysis on the text to be analyzed. The method comprises the following steps:
Step 1: preprocess the text data samples in the public or self-collected data set;
Step 2: input the preprocessed text into the BERT model to obtain the text's semantic information, which is one-dimensional data; the BERT model dropout value is set to 0.3;
Step 3: input the output of the BERT model into the optimized grid tag encoding layer for labeling; the layer uses rotary position embedding;
Step 4: compute the cross entropy and KL divergence of the two low-dimensional grid-form label encodings output by the optimized grid tag encoding layer, and from them the final Loss;
Step 5: determine the optimal model by the F1 value;
Step 6: use the optimal model to obtain the grid corresponding to the text to be analyzed, and parse that grid into the corresponding triples as the aspect-based sentiment analysis result.
Optionally, preprocessing the text data samples in the public or self-collected data set comprises:
Step 1.1: cast each text data sample into the unified form "text####triples", where text is the original sentence and triples is a list of 3-tuples; for example the tuple ([1, 2], [3], 'NEG') indicates that the words at indices 1 and 2 form an aspect term, the word at index 3 is an opinion term, and NEG marks the corresponding sentiment as negative;
Step 1.2: expand the samples using a first expansion scheme or a second expansion scheme;
the first expansion scheme randomly exchanges aspect terms, opinion terms and sentiment categories between any two text data samples to obtain a new data sample; the second expansion scheme splices any two text data samples and then relocates the aspect terms, opinion terms and sentiment categories in the sentence to obtain a new data sample.
Optionally, before inputting the preprocessed text into the BERT model in step 2, the method further comprises:
converting the preprocessed text into BERT's token input format [CLS] + sentence + [SEP], and then into the corresponding token ids [101, ..., 102]; [CLS] marks the beginning of the sentence, and its final hidden state can serve as the semantic information of the whole sentence text; [SEP] marks the end of the sentence; this is the token format of the input text.
Optionally, the cross entropy calculation formula in step 4 is:

$$\mathrm{Loss1} = -\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=0}^{C-1} y_{ij}^{k}\,\log \hat{y}_{ij}^{k}$$

where Loss1 denotes the cross entropy; C denotes the number of label categories (for the label set {0, 1, 2, 3}, C = 4); n denotes the input length, i.e. the number of words contained in the text data sample; y_ij^k indicates whether the true value at position (i, j) of the two-dimensional grid space equals k; and ŷ_ij^k denotes the predicted value of the model at position (i, j).
Optionally, the calculation formula of the KL divergence in step 4 is:

Loss2 = KL(p||q) + KL(q||p)

where Loss2 denotes the KL divergence term, p is the true distribution and q is the fitted distribution. KL(p||q) is given by:

$$\mathrm{KL}(p\,\|\,q) = \sum_{i} p_i \log \frac{p_i}{q_i}$$
Optionally, the F1 value in step 5 refers to Macro-F1, calculated as:

$$F1 = \frac{2 \times P \times R}{P + R}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

where TP denotes true positives, FN false negatives, FP false positives, and TN true negatives.
The invention has the beneficial effects that:
according to the method, on the basis of grid labels, the position sensitivity is enhanced through a RoPE rotary position coding scheme, the position perception and semantic understanding of the text are achieved, then the label coding dimensionality is compressed, the labeling efficiency is improved, and then a regular Dropout method is added into the model, so that the generalization capability of the model is improved. And finally, in order to further exert the effect of regular dropout, a data expansion scheme is added, so that the problem of error propagation of the model in a plurality of subtasks is solved, deep learning of data is improved, the output dimension is reduced, and the model training efficiency is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an implementation of the aspect-based sentiment analysis method with optimized grid tagging according to an embodiment of the present invention.
FIG. 2 is a diagram of two exemplary data expansion schemes involved in the method according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram of learning word-pair relations with the prior-art 8-tag scheme.
FIG. 4 is an exemplary diagram of the optimized grid tags in the method according to an embodiment of the present invention.
FIG. 5 is a diagram of the BERT model structure adopted by the method according to an embodiment of the present invention.
FIG. 6 is a block diagram of the input and output modes of BERT downstream tasks.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The basic theory to which this application relates is first introduced as follows:
1. BERT model
BERT is a pre-trained model proposed by Google AI in October 2018. The full name of BERT is Bidirectional Encoder Representations from Transformers. FIG. 5 is a schematic diagram of the BERT model structure.
The core of BERT is the Transformer architecture, composed of multi-layer multi-head attention and a basic feed-forward neural network. The attention mechanism works as follows: the embedding layer output is multiplied by three initialized weight matrices to obtain three new vectors, named Query (Q), Key (K) and Value (V). The dot product of Query and Key gives a score representing the degree of attention to other parts of the input sentence; the result is divided by a constant and passed through a softmax, yielding the relevance of each word to the word at the current position; this is then multiplied by Value and summed, so that the final result expresses how strongly each word in the sentence correlates with the model's output information. The specific formula is:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
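A minimal NumPy sketch of this scaled dot-product attention (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # pairwise relevance scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # weighted sum of the values

# Example: a 4-word sentence with 8-dimensional Q/K/V projections.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)        # shape (4, 8)
```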
the BERT is an Encoder of a multi-layer Transformer, and the input Embedding is subjected to encoding conversion through the Encoder of a layer by layer and then is connected to different downstream tasks. In addition, the BERT adopts a bidirectional parallel input mode, namely, sentences are input into the model in a whole mode, rather than words are input one by one, so that the performance of the GPU can be fully utilized, and the operation efficiency of the model is greatly improved. Meanwhile, because parallel input can cause the loss of position information of words in the text, the BERT model additionally adds a position coding input to ensure that the position information is not lost. For different downstream tasks, BERT introduces two special symbols in the input layer, one is [ CLS ], and the two special symbols are placed at the forefront of the whole input text sequence; the other is [ SEP ], which is placed in the middle of two sentences and at the end of the second sentence for the task that is input as a sentence pair. CLS is used as the representation of the classification task corresponding to the final output hidden state, if the classification is a single sentence classification, the CLS represents the classification of the input sentence; if it is a sentence pair classification, it means that the two sentences are related/unrelated, similar/opposite meaning.
When performing a downstream task, an extra output layer is added on top of BERT. Referring to FIG. 6, the downstream-task input/output scheme used in this model is shown. BERT is a mature model; a more detailed description can be found at:
https://baijiahao.baidu.com/sid=1711778702724114051&wfr=spider&for=pc。
2. Optimized grid tagging
A prior-art word-pair relation learning method for aspect sentiment triplet extraction proposes an 8-tag scheme, shown in FIG. 3, which distinguishes terms formed by one word from terms formed by several words and improves results to a certain extent. Theoretically, however, the scheme has notable problems. First, on multi-word terms the AR and OR tags only awkwardly mark the position of each term: they are meant to supplement the A and O tags, but can only do so in the vertical direction, so a multi-word tag looks complete only vertically when viewed as a whole, breaking the symmetry of the data; at the same time, using different tags to mark the same information is redundant. A further problem arises when a single-word term is converted into tokens: it may be split into sub-tokens (for example, "abundance" split into "abund" plus a remainder), producing two kinds of multi-word term labeling, which is unfavorable for training. The final results are still somewhat improved, possibly owing to changes in other parts of the model. Another drawback of this approach is that using 8 tags makes training more difficult.
Embodiment 1: this embodiment provides an aspect-based sentiment analysis method with optimized grid tagging; referring to FIG. 1, the method includes a model training phase and an actual analysis phase. In the training phase, a public or self-collected data set is used to train a BERT model whose optimized grid tag encoding layer uses rotary position embedding, yielding an optimal model; in the actual analysis phase, the optimal model performs aspect-based sentiment analysis on the text to be analyzed.
Specifically, the method comprises the following steps:
step 1, text preprocessing: unifying the sample format and performing data expansion;
FIG. 2 shows data samples: two original samples, a sample expanded by scheme a, and a sample expanded by scheme b.
(1.1) Cast each data sample into the unified format "text####triples", where text is the original sentence, left unprocessed, and triples is a list of 3-tuples of the form ([1, 2], [3], 'NEG'), meaning that the words at indices 1 and 2 form an aspect term, the word at index 3 is an opinion term, and the corresponding sentiment category is negative.
A text may contain multiple triples; the same aspect term may appear in multiple triples, and likewise for opinion terms.
(1.2) there are two schemes for data expansion:
Scheme a randomly replaces aspect terms, opinion terms and sentiment categories across sentences to obtain new data samples. The operation flow is:
(1.2.1) First, randomly select two samples.
Sample 1: The quality is also very good, you will come out satisfied.
[([1], [5], 'POS'), ([1], [11], 'POS')]
Sample 2: The price is reasonable as the service is poor.
[([1], [3], 'POS'), ([6], [8], 'NEG')]
(1.2.2) Randomly remove one triple from the first sample, leaving its word positions vacant:
The ___ is also very ___, you will come out satisfied.
[([1], [11], 'POS')]
(1.2.3) Randomly select a triple from the second sample:
(price, reasonable, 'POS')
(1.2.4) Put the selected words into the corresponding vacant positions of the first sample and recalculate the indices; an index is the position of a word in the sentence, counted from 0. The calculation is simple: the starting index of the first vacancy equals the length of the preceding part ("The"), and the starting index of the second vacancy equals the length of the preceding part plus the length of the first vacancy plus the length of the middle part ("is also very"). (In practice, the positional relation between the aspect term and the opinion term is also considered.)
The price is also very reasonable, you will come out satisfied.
[([1], [5], 'POS'), ([1], [11], 'POS')]
Scheme b splices two sentences together and then relocates the aspect terms, opinion terms and sentiment categories in the sentence to obtain a new data sample (a sketch follows below). Compared with scheme a, it is simpler to operate, the text semantics are more natural, and it is easier to conceive, but it is less flexible.
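A minimal sketch of expansion scheme b under the sample format above; the function name and the (words, triples) representation are illustrative. Scheme a additionally requires the slot substitution and index recomputation walked through in (1.2.1)-(1.2.4).

```python
def expand_scheme_b(sample1, sample2):
    """Scheme b: splice two samples and relocate the second sample's
    triples by offsetting their word indices (0-based, as above).

    A sample is (words, triples), where triples is a list of
    ([aspect indices], [opinion indices], polarity) tuples.
    """
    words1, triples1 = sample1
    words2, triples2 = sample2
    offset = len(words1)
    shifted = [([i + offset for i in asp], [i + offset for i in opi], pol)
               for asp, opi, pol in triples2]
    return words1 + words2, triples1 + shifted

s1 = ("The quality is also very good , you will come out satisfied .".split(),
      [([1], [5], 'POS'), ([1], [11], 'POS')])
s2 = ("The price is reasonable as the service is poor .".split(),
      [([1], [3], 'POS'), ([6], [8], 'NEG')])
words, triples = expand_scheme_b(s1, s2)  # indices of s2 shifted by len(s1)
```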
Step 2. Model part
(2.1) Model input: duplicate the data in each batch so that each sample serves as two inputs to the model. Before input, convert the data into BERT's token input format [CLS] + sentence + [SEP], and then into the corresponding token ids [101, ..., 102], as sketched below.
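A sketch of this conversion using the HuggingFace transformers tokenizer; the tooling choice is an assumption, since the patent does not name a tokenizer implementation:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentence = "The price is also very reasonable , you will come out satisfied ."
token_ids = tokenizer.encode(sentence)  # adds [CLS] and [SEP] automatically
# token_ids starts with 101 ([CLS]) and ends with 102 ([SEP])
batch = [token_ids, token_ids]          # duplicate: two inputs per sample
```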
(2.2) Model body: compared with conventional BERT model training, the dropout value here is set to 0.3, so that a random subset of neurons is deactivated on each training pass; each deactivation loses part of the sentence's information, and 0.3 means 30% of the neurons are deactivated. A normal model performs best at 0.1, and increasing the dropout rate alone reduces model performance.
The preprocessed text is input into the BERT model to obtain the semantic information of the text, which is one-dimensional data.
(2.3) Optimized grid tag encoding layer: rotary position embedding (RoPE for short) is adopted. Suppose f(·, t) denotes a function carrying absolute position information t, g(q, k, m-n) denotes a position-based inner product operation, q and k denote any two words in the text, and m and n denote the position information of q and k respectively. The RoPE rotary position encoding satisfies the following formula:
$$f(q, m)^{T} f(k, n) = g(q, k, m-n)$$

Let f(·, t) take the form:

$$f(x, t) = R_t\,x$$

It can be derived that:

$$R_t = \begin{pmatrix} \cos t\theta & -\sin t\theta \\ \sin t\theta & \cos t\theta \end{pmatrix}$$

Combining the formulas, RoPE can be expressed as:

$$f(q, m)^{T} f(k, n) = q^{T} R_m^{T} R_n\,k = q^{T} R_{n-m}\,k = g(q, k, m-n)$$

i.e. all four entries of the matrix R_t contain position information.
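A small NumPy sketch of the two-dimensional case, checking that the rotation makes the inner product depend only on the relative position n - m (the fixed angle theta is illustrative; in practice RoPE uses a spectrum of frequencies):

```python
import numpy as np

def R(t, theta=0.5):
    """The 2-D rotation matrix R_t applied by RoPE at position t."""
    c, s = np.cos(t * theta), np.sin(t * theta)
    return np.array([[c, -s], [s, c]])

q, k = np.array([0.3, -1.2]), np.array([0.7, 0.5])
m, n = 4, 9

lhs = (R(m) @ q) @ (R(n) @ k)   # f(q, m)^T f(k, n)
rhs = q @ (R(n - m) @ k)        # g(q, k, m - n): relative position only
assert np.isclose(lhs, rhs)
```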
(2.4) Model output layer: the output layer is the optimized grid tag layer, which outputs low-dimensional label encodings in the form of a two-dimensional grid. The side length of the grid is the sentence length: for a sentence to be recognized S = {W_1, W_2, ..., W_n} containing n words in total, aspect sentiment triplet extraction mainly consists of extracting the aspect terms, the opinion terms and the sentiment.
FIG. 4 is a sample diagram of the optimized grid tags. Aspect terms and opinion terms are labeled on the diagonal, word-pair information is labeled in the upper triangle, and the tag set {A, O, POS, NEU, NEG, N} marks the relation between a word pair (W_i, W_j) in the sentence; in triplet extraction the tags are expressed with the four codes 0, 1, 2, 3. Under this grid tag structure, a word on the diagonal belonging to an aspect term is tagged A (Aspect) with code 1, and one belonging to an opinion term is tagged O (Opinion) with code 2; unlike the original GTS scheme, only the diagonal positions are labeled, and the intersections between the individual words within an aspect term and an opinion term are not all marked. A word pair formed by an aspect term and an opinion term is identified by its intersection point, whose code expresses the sentiment polarity: codes 1, 2, 3 represent the sentiment tags Neg, Neu, Pos, denoting the negative, neutral and positive relations expressed by the opinion triplet. Compared with identifying these with a single tag, the larger distance between codes such as 2 and 0 increases the loss weight and gives a better effect.
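A minimal sketch of building this grid for one training sentence under the encoding just described (0 = N; on the diagonal 1 = A, 2 = O; at an intersection 1, 2, 3 = Neg, Neu, Pos). Exactly how the intersection cells of multi-word terms are filled is an assumption here:

```python
import numpy as np

POLARITY_CODE = {'NEG': 1, 'NEU': 2, 'POS': 3}

def build_grid(n_words, triples):
    """triples: list of ([aspect indices], [opinion indices], polarity)."""
    grid = np.zeros((n_words, n_words), dtype=np.int64)  # 0 = tag N
    for aspect, opinion, pol in triples:
        for i in aspect:
            grid[i, i] = 1                   # tag A on the diagonal
        for j in opinion:
            grid[j, j] = 2                   # tag O on the diagonal
        for i in aspect:
            for j in opinion:                # sentiment at the intersection,
                r, c = min(i, j), max(i, j)  # kept in the upper triangle
                grid[r, c] = POLARITY_CODE[pol]
    return grid

grid = build_grid(13, [([1], [5], 'POS'), ([1], [11], 'POS')])
```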
Step 3. Loss: the loss is computed in two parts.
(3.1) The first is the cross entropy, computed between the true values and each of the two outputs of the duplicated sample (the two inputs are the same sample). The specific formula is:

$$\mathrm{Loss1} = -\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=0}^{C-1} y_{ij}^{k}\,\log \hat{y}_{ij}^{k}$$

where C denotes the number of label categories (the label set is {0, 1, 2, 3}, so C = 4) and n denotes the input length.
(3.2) The second is the KL divergence, computed between the two different outputs obtained from the same duplicated sample. The specific formula is:

Loss2 = KL(p||q) + KL(q||p)

where p is the true distribution and q is the fitted distribution, and KL(p||q) is given by:

$$\mathrm{KL}(p\,\|\,q) = \sum_{i} p_i \log \frac{p_i}{q_i}$$
and (3.3) finally summing the values of the two methods to obtain the final Loss. The smaller the Loss is, the smaller the error of the model to the training set is, the smaller the error is, the more the error is reversely conducted to the previous network through the change of the Loss each time, and then the network is adjusted.
Loss=Loss1+Loss2;
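A sketch of the full Loss computation, written in PyTorch for compactness; the patent's experiments used a Keras/TensorFlow environment, so the exact API here is an assumption:

```python
import torch
import torch.nn.functional as F

def total_loss(logits1, logits2, target):
    """Loss = Loss1 + Loss2: cross entropy of both forward passes of the
    duplicated input against the true grid, plus the symmetric KL divergence
    between the two predicted distributions (the regularized-dropout term).

    logits1, logits2: (n, n, C) grids from the two forward passes.
    target: (n, n) integer grid holding the codes 0..3.
    """
    C = logits1.size(-1)
    l1, l2 = logits1.reshape(-1, C), logits2.reshape(-1, C)
    y = target.reshape(-1)
    loss1 = F.cross_entropy(l1, y) + F.cross_entropy(l2, y)
    p, q = F.log_softmax(l1, dim=-1), F.log_softmax(l2, dim=-1)
    loss2 = (F.kl_div(q, p, log_target=True, reduction="batchmean") +
             F.kl_div(p, q, log_target=True, reduction="batchmean"))
    return loss1 + loss2
```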
Step 4. Select the optimal model according to the change of the F1 value on the test set; the larger the F1 value, the better the model (provided the test set is uniformly distributed and its data do not participate in training).
To evaluate the effectiveness of the experiments and compare different methods, the invention uses the precision (P), recall (R) and Macro-F1 (F1) values widely used in aspect-based sentiment analysis as evaluation indexes. A result is counted as correct only when the prediction exactly matches the label, i.e. the correct sentiment triplet is extracted in full. The larger the index values, the better the model.
The calculation formulas of the accuracy, the recall rate and the F1 value are as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times P \times R}{P + R}$$
where TP denotes true positives, FN false negatives, FP false positives, and TN true negatives.
If 1 marks a positive sample and 0 a negative one: predicting a 1 as 1 is a correct positive (TP); predicting a 1 as 0 is a positive-sample error (FN); predicting a 0 as 1 is a negative-sample error (FP); predicting a 0 as 0 is a correct negative (TN). A counting sketch follows below.
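The following sketch computes these metrics over exact-match triples; the function name and the triple representation are illustrative:

```python
def triple_prf(predicted, gold):
    """Precision, recall and F1, counting a predicted triple as a true
    positive only if it exactly matches a gold triple."""
    tp = len(set(predicted) & set(gold))
    fp, fn = len(predicted) - tp, len(gold) - tp
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [((1,), (5,), 'POS'), ((1,), (11,), 'POS')]
pred = [((1,), (5,), 'POS')]
print(triple_prf(pred, gold))  # (1.0, 0.5, 0.666...)
```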
Step 5. Result prediction: run the test data through the optimal model and parse the finally predicted grids into the corresponding triples, as sketched below.
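A sketch of this parsing step, assuming contiguous diagonal runs of code 1 / code 2 delimit aspect / opinion terms and the upper-triangle code at their intersection gives the polarity; the majority vote over intersection cells is an assumption:

```python
POLARITY_NAME = {1: 'NEG', 2: 'NEU', 3: 'POS'}

def decode_grid(grid):
    """Parse a predicted (n, n) grid back into sentiment triples."""
    n = len(grid)

    def diagonal_spans(code):
        spans, start = [], None
        for i in range(n + 1):
            hit = i < n and grid[i][i] == code
            if hit and start is None:
                start = i
            elif not hit and start is not None:
                spans.append(list(range(start, i)))
                start = None
        return spans

    triples = []
    for aspect in diagonal_spans(1):           # code 1 = tag A
        for opinion in diagonal_spans(2):      # code 2 = tag O
            votes = [grid[min(i, j)][max(i, j)]
                     for i in aspect for j in opinion]
            votes = [v for v in votes if v in POLARITY_NAME]
            if votes:                          # majority vote on intersections
                code = max(set(votes), key=votes.count)
                triples.append((aspect, opinion, POLARITY_NAME[code]))
    return triples
```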
The effects of the present invention can be further illustrated by the following experiments.
1. Simulation conditions
To verify the recognition accuracy of the optimized grid tagging scheme on the ASTE task, the experiments adopt the mainstream public ASTE data sets from the SemEval competitions. Several versions exist online with slightly different data volumes but almost identical test results; the method adopts the most common version, D20a, for experimental verification. D20a comprises four public data sets from the 2014-2016 SemEval competitions: 14res, 15res and 16res denote restaurant-domain sentiment data sets published in 2014, 2015 and 2016 respectively, and 14lap denotes the laptop-domain sentiment data set published in 2014. Each data item consists of one review sentence and one or more triples of (aspect term, opinion term, sentiment polarity ('POS', 'NEU', 'NEG')) information extracted from the sentence, as shown in the following table:
Table 1. The D20a data sets used in the experiments [table rendered as an image in the original; contents not reproduced]
To verify the feasibility and effectiveness of the proposed algorithm, OGTS-BERT is compared with Peng-two-stage (see Huang L, Wang P, Li S, et al. First target and opinion then polarity: Enhancing target-opinion correlation for aspect sentiment triplet extraction [J]. arXiv preprint arXiv:2102.08549, 2021), Dual-MRC (see Mao Y, Shen Y, Yu C, et al. A joint training dual-MRC framework for aspect based sentiment analysis [J]. arXiv preprint arXiv:2101.00816, 2021), BARTABSA (see Yan H, Dai J, Qiu X, et al. A unified generative framework for aspect-based sentiment analysis [J]. arXiv preprint arXiv:2106.04300, 2021) and GTS-BERT (Wu Z, Ying C, Zhao F, et al. Grid tagging scheme for aspect-oriented fine-grained opinion extraction [J]. arXiv preprint arXiv:2010.04640, 2020).
Peng-two-stage proposes aspect sentiment extraction in which the first stage generates candidate aspect terms with sentiment polarity and opinion items using the mutual influence between aspects and opinions, and the second stage pairs the correct aspect terms with the opinion terms. Dual-MRC connects two BERT models side by side, learning the aspect terms of the text on one side and the opinion terms and sentiment polarity on the other, with shared parameters; the input on one side is duplicated to stay connected with the other, realizing joint training and triplet extraction. BARTABSA uses the sequence-to-sequence model BART to perform the aspect-based sentiment analysis task end to end; its distinguishing feature is abandoning the earlier classification formulation in favor of a generative one. The original GTS-BERT uses the grid tagging method to mark the relations between word pairs, converting multiple tasks into an ingenious grid labeling task.
Throughout the experiments, the batch size of the model is set to 16, the optimizer weight decay is 0.1, dropout is 0.3 and the learning rate is 2e-5. The operating environment is: Ubuntu 16.04 with Linux kernel 4.4.0-31-generic on x86-64, a GeForce RTX 2080 Ti GPU, and Python 3.6.8 as the programming language, with Keras 2.2.4 and TensorFlow 1.15.0 as the development environment. Precision, recall and F1 values widely used in aspect-based sentiment analysis serve as evaluation indexes; the larger the index values, the better the model.
2. Simulation result
The OGTS-BERT of the proposed method is compared with Peng-two-stage, Dual-MRC, BARTABSA and the original tagging scheme GTS-BERT on two tasks over the D20a data sets; the results are shown in Tables 2 and 3 below:
Table 2. Comparison results for the triplet extraction task on the D20a data sets [table rendered as an image in the original; contents not reproduced]
Table 3. Comparison results for the pair extraction task on the D20a data sets [table rendered as an image in the original; contents not reproduced]
As can be seen from Table 2, on the four D20a data sets the triplet extraction indexes of the proposed OGTS-BERT are superior to the comparison models in most cases, indicating that integrating RoPE position encoding with the compressed label dimensions and enhancing generalization with regularized dropout improves the aspect-based sentiment analysis task. Compared with the original GTS-BERT scheme, precision, recall and F1 improve by 1.4, 6.5 and 4 percentage points on average. The improvement concentrates on recall: on the 15res task recall rises by more than 11%, showing that the model's instance-recognition ability is markedly stronger. The pair extraction indexes in Table 3 also improve considerably; unlike the triplet task, the gains are not concentrated on recall, with precision, recall and F1 improving by 3.5, 3.4 and 3.5 percentage points on average over the original GTS-BERT. Compared with the other models, the proposed method improves on every index and outperforms both the comparison experiments and the original GTS-BERT, demonstrating the effectiveness of the improved position-encoding scheme and network model.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (8)

1. An aspect-based sentiment analysis method with optimized grid tagging, characterized in that 3 tags are used to mark the sentence to be analyzed S = {W_1, W_2, ..., W_n}, where W_i denotes the i-th word of the sentence S to be analyzed; aspect terms and opinion terms in the sentence are labeled on the diagonal, word-pair information is labeled in the upper triangle, and the tag set {A, O, POS, NEU, NEG, N} is used to mark the relationship between a word pair (W_i, W_j) in the sentence; in aspect sentiment triplet extraction, four codes 0, 1, 2, 3 represent the tags.
2. The method of claim 1, wherein labeling the aspect terms and the opinion terms in the sentence on the diagonal comprises:
if a word on the diagonal belongs to an aspect term, its tag is A with code 1; if it belongs to an opinion term, its tag is O with code 2; a word pair formed by an aspect term and an opinion term is identified by its intersection point, whose code expresses the sentiment polarity: codes 1, 2, 3 represent the sentiment tags Neg, Neu, Pos, denoting the negative, neutral and positive relations expressed by the opinion triplet.
3. The method of claim 2, wherein the method comprises a model training phase and an actual analysis phase; in the training phase, a public or self-collected data set is used to train a BERT model whose optimized grid tag encoding layer uses rotary position embedding, yielding an optimal model; in the actual analysis phase, the optimal model performs aspect-based sentiment analysis on the text to be analyzed; the method comprises the following steps:
Step 1: preprocess the text data samples in the public or self-collected data set;
Step 2: input the preprocessed text into the BERT model to obtain the text's semantic information; the BERT model dropout value is set to 0.3;
Step 3: input the semantic information of the text into the optimized grid tag encoding layer for labeling; the layer uses rotary position embedding;
Step 4: compute the cross entropy and KL divergence of the two low-dimensional grid-form label encodings output by the optimized grid tag encoding layer, and from them the final Loss;
Step 5: determine the optimal model by the F1 value, the F1 value being an evaluation of the test-set results;
Step 6: use the optimal model to obtain the grid corresponding to the text to be analyzed, and parse that grid into the corresponding triples as the aspect-based sentiment analysis result.
4. The method of claim 3, wherein preprocessing the text data samples in the public data set or the self-collected data set comprises:
Step 1.1: cast each text data sample into the unified form "text####triples", where text is the original sentence and triples is a list of 3-tuples; for example the tuple ([1, 2], [3], 'NEG') indicates that the words at indices 1 and 2 form an aspect term, the word at index 3 is an opinion term, and NEG marks the corresponding sentiment as negative;
Step 1.2: expand the samples using a first expansion scheme or a second expansion scheme;
the first expansion scheme randomly exchanges the aspect terms, opinion terms and sentiment categories of any two text data samples to obtain a new data sample; the second expansion scheme splices any two text data samples and then relocates the aspect terms, opinion terms and sentiment categories in the sentence to obtain a new data sample.
5. The method of claim 4, wherein before inputting the preprocessed text into the BERT model in step 2, the method further comprises:
converting the preprocessed text into BERT's token input format [CLS] + sentence + [SEP], and then into the corresponding token ids [101, ..., 102]; [CLS] marks the beginning of the sentence, and its final hidden state can serve as the semantic information of the whole sentence text; [SEP] marks the end of the sentence; this is the token format of the input text.
6. The method of claim 5, wherein the cross entropy calculation formula in step four is:
$$\mathrm{Loss1} = -\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=0}^{C-1} y_{ij}^{k}\,\log \hat{y}_{ij}^{k}$$

where Loss1 denotes the cross entropy; C denotes the number of label categories (for the label set {0, 1, 2, 3}, C = 4); n denotes the input length, i.e. the number of words contained in the text data sample; y_ij^k indicates whether the true value at position (i, j) of the two-dimensional grid space equals k; and ŷ_ij^k denotes the predicted value of the model at position (i, j).
7. The method according to claim 6, wherein the KL divergence calculation formula in step four is as follows:
Loss2=KL(p||q)+KL(q||p)
wherein, loss2 represents the KL divergence, p is the true distribution, and q is the fitted distribution.
8. The method of claim 7, wherein the F1 value in step 5 refers to Macro-F1, calculated as:

$$F1 = \frac{2 \times P \times R}{P + R}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

where TP denotes true positives, FN false negatives, FP false positives, and TN true negatives.
CN202211109002.5A 2022-09-13 2022-09-13 Aspect emotion analysis method for optimizing grid label Pending CN115545038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211109002.5A CN115545038A (en) 2022-09-13 2022-09-13 Aspect emotion analysis method for optimizing grid label

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211109002.5A CN115545038A (en) 2022-09-13 2022-09-13 Aspect emotion analysis method for optimizing grid label

Publications (1)

Publication Number Publication Date
CN115545038A true CN115545038A (en) 2022-12-30

Family

ID=84726450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211109002.5A Pending CN115545038A (en) 2022-09-13 2022-09-13 Aspect emotion analysis method for optimizing grid label

Country Status (1)

Country Link
CN (1) CN115545038A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370736A (en) * 2023-12-08 2024-01-09 暨南大学 Fine granularity emotion recognition method, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination