CN113723075B - Specific-target sentiment analysis method fusing word-masking data augmentation and adversarial learning

Specific-target sentiment analysis method fusing word-masking data augmentation and adversarial learning

Info

Publication number
CN113723075B
CN113723075B (application CN202110999219.7A)
Authority
CN
China
Prior art keywords
word
sample
representing
adv
clean
Prior art date
Legal status
Active
Application number
CN202110999219.7A
Other languages
Chinese (zh)
Other versions
CN113723075A (en)
Inventor
刘小洋
代尚宏
高绿苑
Current Assignee
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN202110999219.7A
Publication of CN113723075A
Application granted
Publication of CN113723075B

Classifications

    • G06F40/247: Handling natural language data; natural language analysis; lexical tools; thesauruses, synonyms
    • G06F16/3344: Information retrieval; querying; query execution using natural language analysis
    • G06F18/214: Pattern recognition; analysing; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F40/211: Natural language analysis; parsing; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/284: Natural language analysis; recognition of textual entities; lexical analysis, e.g. tokenisation or collocates
    • G06N3/02: Computing arrangements based on biological models; neural networks
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a specific-target sentiment analysis method that fuses word-masking data augmentation with adversarial learning, comprising the following steps: S1, performing synonym replacement and random word insertion on sentences while masking the target entity, generating valid augmented samples and fusing them with the original samples, thereby achieving word-masking data augmentation; S2, constructing a BERT-BASE-based adversarial-learning sentiment classification model for specific targets, and training the sentiment classification model jointly on clean samples and adversarial samples, so that the model acquires an adversarial-defense capability; and S3, performing adversarial learning on the original samples and the augmented samples respectively. By adopting data augmentation and adversarial training, the invention has stronger robustness and obtains better results.

Description

Specific-target sentiment analysis method fusing word-masking data augmentation and adversarial learning
Technical Field
The invention relates to the field of natural language processing, and in particular to a specific-target sentiment analysis method that fuses word-masking data augmentation with adversarial learning.
Background
With the rapid development of social media (Microblog, Twitter, Facebook, etc.), sentiment analysis has become an extremely important task. Aspect-Based Sentiment Analysis (ABSA), also called specific-target sentiment analysis, is a basic task in the field of text classification. It aims to analyze the fine-grained sentiment tendency of online social network text data using deep learning and Natural Language Processing (NLP) techniques, so that a user can clearly see the sentiment polarity and attitude expressed toward each specific entity (aspect) in social network comment data. One sentence may contain one or more entities, each with its own sentiment polarity. For example, in the comment "Great food but the service was dreadful!", the sentiment polarity of the entity "food" is positive while that of the entity "service" is negative. Compared with sentence-level sentiment analysis, ABSA presents more precise, fine-grained, entity-level sentiment information to the user.
Machine learning and deep learning have achieved remarkable success on the sentiment analysis task. For example, Kiritchenko et al. adopted a machine learning approach to build a hand-crafted feature extraction model and trained a sentiment classifier on the extracted features with a Support Vector Machine (SVM), but manual feature engineering is cumbersome and inefficient. To avoid the complexity of manual feature extraction, deep learning methods are adopted to automatically extract more complex deep features. For example, Li et al. used an adaptive recursive neural network to transform the dependency tree for different targets in a sentence, obtaining multiple feature-combination functions to train the model with a neural network. Because sentences are sequential, many models use Long Short-Term Memory networks (LSTMs) to capture long-term dependency information; Tang et al. used two LSTMs to concatenate the context feature vectors around the specific target entity to build a sentiment classification model, but sentiment information carried by distant words can be lost. To capture long-range feature information in text, Bahdanau et al. first applied the attention mechanism to natural language processing, after which many researchers introduced attention into the sentiment analysis task. Wang et al. used LSTM with attention weighting to obtain sentence representation vectors; Tang et al. added the relative distance between feature words and the target entity to the attention mechanism and used multiple attention hops to obtain the final entity representation. Chen et al. constructed a RAM (Recurrent Attention on Memory) structure to capture contextual semantic information and focus attention, fusing the important features of long, difficult sentences through a multi-attention mechanism.
Because the parallelism of Recurrent Neural Networks (RNNs) is limited, Vaswani et al. designed the Transformer structure, which completely abandons the RNN design, improves parallelism, adopts self-attention and multi-head attention mechanisms, and adds positional-embedding information to help the model understand word order, so that long-distance dependency information can be captured better. Devlin et al. designed Bidirectional Encoder Representations from Transformers (BERT) using the encoder part of the Transformer structure; BERT shows superior results on text classification tasks and improves significantly over other models on the ABSA task.
However, deep learning models are vulnerable to attacks by adversarial examples; Fig. 1 shows the recognition behavior of a model on the ABSA task after an adversarial-example attack. Recent research shows that robust neural network models can be built by training on adversarial samples, thereby improving model robustness. The adversarial-learning process generates adversarial samples from the input samples by adding small gradient-based perturbations and feeds these adversarial samples back into the model for further learning. In the field of natural language processing, traditional textual adversarial samples are obtained by perturbing words or sentences; for example, Eger et al. replace characters with visually similar neighboring characters, and Jin et al. replace original words using a greedy word-substitution method. Adversarial defenses against such attacks also exist; for example, Goodfellow et al. proposed Fast Gradient Method (FGM) adversarial training, which outperforms other baseline methods on text classification tasks.
In the above studies, data augmentation and adversarial training have not been applied to the ABSA task; all of them improve the effect purely at the model level. On the currently published ABSA datasets it is difficult to achieve sufficient generalization ability, model robustness and model efficiency at the same time.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular creatively provides a specific-target sentiment analysis method that fuses word-masking data augmentation with adversarial learning.
To achieve the above object, the invention provides a specific-target sentiment analysis method fusing word-masking data augmentation and adversarial learning, comprising the following steps:
S1, performing synonym replacement and random word insertion on sentences while masking the target entity, generating valid augmented samples and fusing them with the original samples, thereby achieving word-masking data augmentation;
S2, constructing a BERT-BASE-based adversarial-learning sentiment classification model for specific targets, taking the data fused in S1 as the model input, and training the sentiment classification model jointly on clean samples and adversarial samples;
and S3, finally obtaining the specific-target sentiment analysis result, so that the model acquires an adversarial-defense capability.
Further, the method also comprises the following step:
S4, performing adversarial learning on the original samples and on the augmented samples respectively, and evaluating the results with evaluation metrics; the evaluation metrics include accuracy and/or the F1 value.
Further, the synonym replacement in step S1 is computed as:

S_{Sr} = F_{SR}(S_{In}) = \mathrm{Rep}\left(w^{In}_{id_{Sr}},\ \mathrm{Ran}(\mathrm{Wordnet},\ num_1)\right), \qquad w^{In}_{i} \neq aspect

wherein S_Sr represents the data after synonym replacement;
F_SR(·) represents the synonym-replacement data-augmentation function;
S_In represents the input of the original corpus;
w^In_i is the i-th word of an original sample;
aspect represents the specific target entity;
Rep(·) represents the word-replacement function;
w^In_{id_Sr} is the id_Sr-th word that needs to be replaced;
id_Sr represents the position of the word replacement;
Ran(Wordnet, num_1) denotes randomly selecting num_1 synonyms of the i-th word w^In_i from the WordNet library;
!= means "not equal", i.e. the word to be replaced must not be the target entity.
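A minimal sketch of the two augmentation operations used in step S1 (the synonym replacement just defined and the random insertion defined next), assuming NLTK's WordNet interface as the synonym source; all function and variable names below are illustrative and not taken from the patent.

```python
# Illustrative sketch of the word-masking augmentation step (F_SR and F_RI), assuming
# NLTK's WordNet as the thesaurus; names are ours, not the patent's.
import random
from nltk.corpus import wordnet

def wordnet_synonyms(word, num):
    """Collect up to `num` WordNet synonyms of `word` (excluding the word itself)."""
    syns = {l.name().replace("_", " ") for s in wordnet.synsets(word) for l in s.lemmas()}
    syns.discard(word)
    return random.sample(sorted(syns), min(num, len(syns)))

def synonym_replace(tokens, aspect_tokens, num_1=1):
    """F_SR: replace one non-aspect word with a WordNet synonym (the aspect term is masked)."""
    candidates = [i for i, w in enumerate(tokens)
                  if w not in aspect_tokens and wordnet_synonyms(w, 1)]
    if not candidates:
        return tokens
    idx = random.choice(candidates)                       # id_Sr
    new = tokens[:]
    new[idx] = wordnet_synonyms(tokens[idx], num_1)[0]    # Rep(w_idSr, Ran(Wordnet, num_1))
    return new

def random_insert(tokens, aspect_tokens, num_2=1):
    """F_RI: insert random WordNet words after positions outside the aspect term.
    (The patent restricts insertions to adverbs; any lemma is drawn here for brevity.)"""
    new = tokens[:]
    for _ in range(num_2):
        word = random.choice(list(wordnet.all_lemma_names())).replace("_", " ")
        idx = random.choice([i for i, w in enumerate(new) if w not in aspect_tokens])  # id_RI
        new.insert(idx + 1, word)                          # Insert(w_idRI, Ran(Wordnet, num_2))
    return new
```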
Further, the random word insertion in step S1 is computed as:

S_{Ri} = F_{RI}(S_{In}) = \mathrm{Insert}\left(w^{In}_{id_{RI}},\ \mathrm{Ran}(\mathrm{Wordnet},\ num_2)\right)

wherein S_Ri represents the data after random insertion;
F_RI(·) represents the random-insertion data-augmentation function;
Insert(·) denotes inserting a word after the id_RI-th word;
w^In_{id_RI} is the id_RI-th word after which the insertion is made;
id_RI represents the position in the sentence after which the word is inserted;
Ran(Wordnet, num_2) denotes randomly selecting num_2 words from the WordNet library.
Further, S2 comprises:
the augmented data Da_Out are used as the clean samples, where Da_Out = S_Sr ∪ S_Ri. For each batch of clean samples, the clean samples are first used to generate the adversarial perturbation r_adv of the word-embedding layer, thereby producing the adversarial samples; each batch of adversarial samples is then trained with the Adv-BERT model, while each batch of clean samples is trained with BERT.
Further, the per-batch loss function on clean samples is calculated as follows:

L_{clean}(\theta) = -\frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} \log p\left(y_i \mid E_i,\ aspect_i;\ \theta\right)

wherein L_clean(·) represents the loss function on clean samples, N_batch denotes the batch size, θ denotes the neural network parameters, and p(y_i | E_i, aspect_i; θ) represents the sentiment prediction probability of the i-th sample in a batch;
the per-batch loss function on adversarial samples is calculated as follows:

L_{adv}(\theta) = -\frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} \log p\left(y_i \mid E_{adv(i)},\ aspect_i;\ \theta\right)

wherein L_adv(·) represents the loss function on adversarial samples, N_batch denotes the batch size, θ denotes the neural network parameters, and p(y_i | E_adv(i), aspect_i; θ) represents the sentiment prediction probability of the i-th adversarial sample.
Further, the method also comprises:
minimizing the sum of the per-batch loss functions on clean and adversarial samples:

\hat{\theta} = \arg\min_{\theta}\ L(\theta) = \arg\min_{\theta}\left[ L_{clean}(\theta) + L_{adv}(\theta) \right]

wherein L(·) represents the model loss function, \hat{\theta} represents the value of the model parameters θ that minimizes the loss function, L_clean(θ) represents the per-batch loss on clean samples, and L_adv(θ) represents the per-batch loss on adversarial samples.
Further, the hidden layers of the BERT model use the Gaussian error linear unit as the activation function:

\mathrm{gelu}(\theta) = 0.5\,\theta\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\big(\theta + 0.044715\,\theta^{3}\big)\right)\right)

wherein gelu(·) denotes the Gaussian error linear unit, θ denotes its input (the pre-activation value of the hidden layer), and tanh is the hyperbolic tangent function.
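The tanh approximation above can be written out directly; the sketch below mirrors the standard formulation and is not code from the patent.

```python
# Illustrative implementation of the tanh approximation of GELU used by BERT's hidden layers.
import math
import torch

def gelu(x: torch.Tensor) -> torch.Tensor:
    """0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))"""
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))
```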
Further, the adversarial learning includes:
applying adversarial learning to the ABSA task by adding adversarial perturbations in the embedding layer of the model. The probability that the sentiment of the target entity aspect in a sentence is y is p(y | S_BertIn, aspect), so the loss function after adding the adversarial perturbation to the embedding layer of the model is:

-\log p\left(y \mid E_w + r_{adv},\ aspect;\ \theta\right) \qquad (1)

wherein

r_{adv} = \arg\min_{r,\ \|r\| \le \alpha}\ \log p\left(y \mid E_w + r,\ aspect;\ \hat{\theta}\right) \qquad (2)

p(y | E_w + r_adv, aspect; θ) represents the sentiment prediction probability after adding the adversarial perturbation r_adv; r_adv denotes the adversarial perturbation; r denotes a candidate perturbation of the input; α denotes the perturbation scaling factor; ||·|| denotes a norm; argmin selects the variable r that minimizes the objective function, and this value of r is then assigned to r_adv; p(y | E_w + r, aspect; \hat{\theta}) represents the prediction probability after adding the perturbation r.
Further, the adversarial learning also includes:
finding the adversarial perturbation with the fast gradient method, computing it through back-propagation in the neural network, and adding it to the word vectors of the original embedding layer to obtain the adversarial sample. The adversarial perturbation r_adv is calculated as:

r_{adv} = \alpha \cdot \frac{g_w}{\|g_w\|_2} \qquad (3)

wherein

g_w = \nabla_{E_w}\left(-\log p\left(y \mid E_w,\ aspect;\ \hat{\theta}\right)\right), \qquad \|g_w\|_2 = \sqrt{\lambda_{\max}\!\left(g_w^{H} g_w\right)}

The adversarial-sample loss function for ABSA is therefore:

L_{adv}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p\left(y_i \mid E_{adv(i)},\ aspect_i;\ \theta\right) \qquad (4)

wherein the adversarial sample E_adv is represented as:

E_{adv} = \left(e_1 \oplus r_1,\ e_2 \oplus r_2,\ \ldots,\ e_{i+1} \oplus r_{i+1},\ \ldots,\ e_n \oplus r_n,\ e_{n+1} \oplus r_{n+1}\right) \oplus E_{seg} \oplus E_{pos} \qquad (5)

where α represents the adversarial-perturbation scaling factor; g_w represents the gradient of the word-embedding layer in the model; ||·||_2 represents the two-norm; ∇ represents the gradient operator; E_w represents the word-embedding tensor of the clean sample; p(y | E_w, aspect; \hat{\theta}) represents the sentiment prediction probability of the clean sample; aspect denotes the specific target entity; \hat{\theta} represents the constant set of current parameters of the neural network classifier; λ_i(·) denotes taking an eigenvalue of a matrix (λ_max its largest eigenvalue); g_w^H denotes the conjugate transpose of g_w; N denotes the total number of samples; p(y_i | E_adv(i), aspect_i; θ) represents the prediction probability of the i-th adversarial sample; y_i is the true label of the i-th sample; E_adv(i) is the embedding-layer tensor of the i-th adversarial sample; aspect_i is the i-th specific target entity; and θ denotes the neural network parameters;
r_adv denotes the adversarial perturbation; E_seg denotes the segment-embedding tensor of the clean sample; E_pos denotes the position-embedding tensor of the clean sample; ⊕ denotes tensor addition; e_1 denotes the word embedding of the 1st word of a sample and r_1 the adversarial perturbation added to it, and likewise e_2/r_2, …, e_{i+1}/r_{i+1}, …, e_n/r_n and e_{n+1}/r_{n+1} for the 2nd, (i+1)-th, n-th and (n+1)-th words.
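A minimal PyTorch sketch of the FGM-style perturbation in equations (3) to (5): the gradient of the loss with respect to the word embeddings is normalized, scaled by α, and added to the word embeddings only, while segment and position embeddings stay unchanged. The `classifier` interface and variable names are assumptions made for illustration.

```python
# Illustrative FGM-style sketch of eqs. (3)-(5); the classifier interface is assumed.
import torch
import torch.nn.functional as F

def fgm_word_embedding_perturbation(classifier, word_emb, seg_emb, pos_emb, labels, alpha=0.01):
    """Return adversarially perturbed word embeddings E_w + r_adv.

    word_emb is assumed to come from the embedding layer (part of the autograd graph)."""
    logits = classifier(word_emb + seg_emb + pos_emb)   # clean forward pass
    loss = F.cross_entropy(logits, labels)              # -log p(y | E_w, aspect; theta_hat)
    g_w = torch.autograd.grad(loss, word_emb)[0]        # gradient w.r.t. the word embeddings
    r_adv = alpha * g_w / (g_w.norm(p=2) + 1e-12)       # eq. (3): alpha * g_w / ||g_w||_2
    return word_emb + r_adv                             # segment/position embeddings untouched
```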
In summary, owing to the adopted technical scheme, the invention has the following advantages:
(1) BERT-BASE is selected as the reference model and experiments are carried out on the Laptop and Restaurant datasets. A text word-masking data-augmentation method is proposed on top of the existing specific-target sentiment analysis corpora (synonym replacement that preserves semantics and syntactic structure, random insertion of sentiment words, etc.), constructing an effective new corpus for specific-target sentiment analysis;
(2) A data-augmented adversarial training method is proposed: the generated new corpus (the original corpus fused with the augmented corpus) is fed into the model, and small adversarial perturbations are then applied to the word-embedding layer of the model, yielding a highly robust specific-target sentiment analysis model;
(3) With the proposed specific-target sentiment classification models fusing word-masking data augmentation and adversarial learning, it is verified that the augmented samples alone can effectively improve model performance, that adversarial training on the original data alone can also improve performance, and that fusing the two kinds of data for adversarial learning finally yields the best results.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a diagram of sentiment recognition by a deep learning model under adversarial-example attack in the prior art.
FIG. 2 is a schematic diagram of the network structure of the WMDE-AL model of the present invention.
FIG. 3 shows the adversarial-training accuracy of the present invention for different perturbation sizes α.
FIG. 4 shows the growth of the adversarial-training evaluation metrics of the present invention for different perturbation sizes α.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
1. Related work
ABSA, also called fine-grained sentiment analysis, has long been a research hotspot both in China and abroad. Its main task is to determine the sentiment class of each specific target entity in a sentence. Although current research obtains well-performing results with deep learning methods, problems remain: limited training data leads to weak generalization ability, and model accuracy and robustness are difficult to achieve at the same time.
Traditional deep learning methods use complex neural network structures to extract features; for text classification, Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs) and the like are used to obtain contextual feature information, but these networks do not take the specific target entity into account. To add the specific target entity to the feature encoding, Tang et al. proposed TD-LSTM, which encodes the preceding and following contexts of the target entity with two separate LSTMs. Wang et al. proposed using an attention mechanism after LSTM encoding to obtain the important information related to the target entity. Chen et al. proposed RAM, which uses a multi-attention mechanism to capture target-entity feature information in long, difficult sentences and alleviates the problem of attention being dispersed by the context. However, these models do not consider syntactic constraints or long-distance word dependencies, which leads to misjudging the sentiment of the target entity. Zhang et al. proposed AS-GCN, which builds a graph convolutional network over the sentence dependency tree to obtain syntactic information and word dependencies. Karimi et al. proposed a BERT adversarial-training architecture that fine-tunes the BERT model with adversarial training and improves the generalization ability of the neural network.
Adversarial learning refers to training a neural network with adversarial samples to achieve adversarial defense. Goodfellow et al. proposed FGM, borrowing the adversarial-training idea from image classification to perturb the embedding layer of an LSTM model and train on the resulting adversarial samples, but the adversarial gradient perturbation is not constrained. Sato et al. proposed the iAdvT-Text adversarial-learning method, which generates adversarial samples at the word-embedding layer, constrains the direction of the gradient change according to distances in the word-embedding space, and finally improves the generalization ability of the model through adversarial learning. Li et al. proposed a fine-grained virtual adversarial training method that introduces character-level adversarial perturbations to improve the initialization of adversarial training and solves the perturbation-constraint problem by forcing character-level normalization of the perturbation magnitude. To overcome the inconsistent semantics and disfluency of generated adversarial samples, Li et al. proposed the BERT-ATTACK adversarial-sample generator, which first finds the words of the input sequence that are easiest to attack, then lets BERT generate substitutes for those vulnerable words, exploiting BERT's ability to capture contextual semantics to produce fluent and plausible adversarial samples. The above studies can improve model robustness, but research on improving model accuracy with adversarial training is lacking. Xie et al. proposed the AdvProp adversarial training method, training with adversarial samples and clean samples together on an image classification task to solve the problem of mismatched feature distributions, verifying that image adversarial samples can improve classification accuracy.
The idea of text data augmentation comes from the image domain, but unlike image data augmentation, it is mainly applied to prevent the network from overfitting when the dataset is small. Commonly used augmentation techniques in NLP tasks include translation, synonym replacement, sentence abbreviation and so on, and recent research shows that text data augmentation can improve performance on NLP tasks. For example, Zhu et al. proposed automatically generating related unanswerable questions from answerable questions, the original text and the answers, using this as a data-augmentation method to further improve the performance of reading-comprehension systems. Besides reading comprehension, data augmentation is also needed in text classification. Wei et al. proposed four data-augmentation techniques (synonym replacement, random insertion, random swap and random deletion), which modify only the original text and not the data labels; if the modified semantics have changed, the data are invalid.
2. Proposed method
The definitions of the symbols used in the formulas and model of this patent are shown in Table 1. The framework of the proposed Word-Masking Data Enhancement and Adversarial Learning model (WMDE-AL) is shown in Fig. 2. WMDE-AL draws on a simple text data-augmentation method and on textual adversarial training, and augments the specific-target corpus by improving the simple data-augmentation method.
Table 1. Definitions of all symbols used in the model (presented as an image in the original document; not reproduced here).
Fig. 2 includes two modules: Word-Masking Data Enhancement (WMDE) and Adversarial Learning (AL). (1) The WMDE module performs data augmentation on the samples S_In of the original corpus through synonym replacement (constraint: keep the sentence fluent and the semantics unchanged) and random insertion (constraint: keep the sentence structure unchanged) while masking the aspect, and then merges the generated data with the original data to obtain the BERT input S_BertIn. (2) The AL module combines the BERT model and the Adv-BERT model to learn the features of clean samples and adversarial samples simultaneously, solving the problem of mismatched sample feature distributions.
3.1 ABSA adversarial learning
Adversarial learning is a method for improving model robustness in classification problems; its goal is to add adversarial perturbations to the raw data and optimize the parameters θ so as to minimize the worst-case classification error. Adversarial learning is applied to the ABSA task by adding adversarial perturbations in the embedding layer of the model. Assuming that the probability that the sentiment of the target entity aspect in a sentence is y is p(y | S_BertIn, aspect), the loss function after adding the adversarial perturbation to the embedding layer of the model is:

-\log p\left(y \mid E_w + r_{adv},\ aspect;\ \theta\right) \qquad (1)

wherein

r_{adv} = \arg\min_{r,\ \|r\| \le \alpha}\ \log p\left(y \mid E_w + r,\ aspect;\ \hat{\theta}\right) \qquad (2)

where p(y | E_w + r_adv, aspect; θ) represents the sentiment prediction probability after adding the adversarial perturbation r_adv, E_w denotes the word-embedding tensor of the clean sample, r_adv denotes the adversarial perturbation, aspect denotes the specific target entity, θ denotes the neural network parameters, r denotes a candidate perturbation of the input, α denotes the perturbation scaling factor, ||·|| denotes a norm, argmin selects the variable r that minimizes the objective function (this value of r is then assigned to r_adv), and p(y | E_w + r, aspect; \hat{\theta}) represents the prediction probability after adding the perturbation r. The meaning of equation (2) is that a perturbation is added to the sample so as to minimize the log-likelihood, i.e. the final perturbation r_adv is the one that maximizes the loss function.
To solve the above minimization problem, the worst-case perturbation that minimizes the likelihood is sought. The adversarial perturbation is found with the fast gradient method: it can be computed by back-propagation in the neural network and then added to the word vectors of the original embedding layer to obtain the adversarial sample. The adversarial perturbation r_adv is computed as:

r_{adv} = \alpha \cdot \frac{g_w}{\|g_w\|_2} \qquad (3)

wherein

g_w = \nabla_{E_w}\left(-\log p\left(y \mid E_w,\ aspect;\ \hat{\theta}\right)\right), \qquad \|g_w\|_2 = \sqrt{\lambda_{\max}\!\left(g_w^{H} g_w\right)}

The adversarial-sample loss function for ABSA is therefore:

L_{adv}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p\left(y_i \mid E_{adv(i)},\ aspect_i;\ \theta\right) \qquad (4)

wherein the adversarial sample E_adv is represented as:

E_{adv} = \left(e_1 \oplus r_1,\ e_2 \oplus r_2,\ \ldots,\ e_{i+1} \oplus r_{i+1},\ \ldots,\ e_n \oplus r_n,\ e_{n+1} \oplus r_{n+1}\right) \oplus E_{seg} \oplus E_{pos} \qquad (5)

where α represents the adversarial-perturbation scaling factor; g_w represents the gradient of the word-embedding layer in the model; ||·||_2 represents the two-norm; ∇ represents the gradient operator; E_w represents the word-embedding tensor of the clean sample; p(y | E_w, aspect; \hat{\theta}) represents the sentiment prediction probability of the clean sample; aspect denotes the specific target entity; \hat{\theta} represents the constant set of current parameters of the neural network classifier; λ_i(·) denotes taking an eigenvalue of a matrix (λ_max its largest eigenvalue); g_w^H denotes the conjugate transpose of g_w; N denotes the total number of samples; p(y_i | E_adv(i), aspect_i; θ) represents the prediction probability of the i-th adversarial sample; y_i is the true label of the i-th sample; E_adv(i) is the embedding-layer tensor of the i-th adversarial sample; aspect_i is the i-th specific target entity; and θ denotes the neural network parameters.
r_adv denotes the adversarial perturbation; E_seg denotes the segment-embedding tensor of the clean sample; E_pos denotes the position-embedding tensor of the clean sample; ⊕ denotes tensor addition; e_1 is the word embedding of the 1st word of a sample and r_1 the adversarial perturbation added to it, and likewise e_2/r_2, …, e_{i+1}/r_{i+1}, …, e_n/r_n and e_{n+1}/r_{n+1} for the 2nd, (i+1)-th, n-th and (n+1)-th words.
Through the above adversarial-training method, the loss function of the adversarial samples and the adversarial-sample features (i.e. the features extracted from the adversarial samples by the model) can be obtained. Studying whether the joint feature distribution of clean samples and adversarial samples can improve model robustness and accuracy, and how to extract effective features, is the main work of this invention. Next, we describe how the problem of mismatched feature distributions is solved by means of Adv-BERT.
3.2 The proposed WMDE-AL model
For a small dataset, data augmentation is the simplest strategy for increasing feature diversity, so synonym replacement and random insertion are used for augmentation. To keep the target entities in the sentences unchanged, the WMDE method is adopted for data augmentation; statistics of the augmented samples are shown in Table 2.
The F_SR equation is:

S_{Sr} = F_{SR}(S_{In}) = \mathrm{Rep}\left(w^{In}_{id_{Sr}},\ \mathrm{Ran}(\mathrm{Wordnet},\ num_1)\right), \qquad w^{In}_{i} \neq aspect

where S_Sr represents the data after synonym replacement, F_SR(·) represents the synonym-replacement data-augmentation function, S_In represents the input of the original corpus, w^In_i is the i-th word of an original sample, aspect represents the specific target entity, Rep(·) is the word-replacement function, w^In_{id_Sr} is the id_Sr-th word to be replaced, id_Sr is the position of the word replacement, Ran(Wordnet, num_1) denotes randomly selecting num_1 synonyms of the i-th word w^In_i from the WordNet library, and != means "not equal".
The F_RI equation is:

S_{Ri} = F_{RI}(S_{In}) = \mathrm{Insert}\left(w^{In}_{id_{RI}},\ \mathrm{Ran}(\mathrm{Wordnet},\ num_2)\right)

where S_Ri represents the data after random insertion, F_RI(·) represents the random-insertion data-augmentation function, Insert(·) denotes inserting a word after the id_RI-th word, w^In_{id_RI} is the id_RI-th word after which the insertion is made, id_RI denotes the position in the sentence after which the word is inserted, and Ran(Wordnet, num_2) denotes randomly selecting num_2 words from the WordNet library.
Table 2. Sample statistics after data augmentation (presented as an image in the original document; not reproduced here).
The augmented data Da_Out are used as the clean samples, where Da_Out = S_Sr ∪ S_Ri. For each batch of clean samples, the clean samples are first used to generate the adversarial perturbation r_adv of the word-embedding layer, thereby producing the adversarial samples, which are then trained with Adv-BERT. Each batch of clean samples is trained with BERT, and each batch of adversarial samples is trained with Adv-BERT. The per-batch loss function on clean samples is calculated as follows:

L_{clean}(\theta) = -\frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} \log p\left(y_i \mid E_i,\ aspect_i;\ \theta\right)

where L_clean(·) represents the loss function on clean samples, N_batch denotes the batch size, p(y_i | E_i, aspect_i; θ) represents the sentiment prediction probability of the i-th sample in a batch, y_i is the true label of the i-th sample, E_i is the embedding-layer tensor of the i-th clean sample, aspect_i is the specific target entity of the i-th clean sample, and θ denotes the neural network parameters.
The per-batch loss function on adversarial samples is calculated as follows:

L_{adv}(\theta) = -\frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} \log p\left(y_i \mid E_{adv(i)},\ aspect_i;\ \theta\right)

where L_adv(·) represents the loss function on adversarial samples, N_batch denotes the batch size, θ denotes the neural network parameters, p(y_i | E_adv(i), aspect_i; θ) represents the sentiment prediction probability of the i-th adversarial sample, y_i is the true label of the i-th sample, E_adv(i) is the embedding-layer tensor of the i-th adversarial sample, and aspect_i is the specific target entity of the i-th sample.
Finally, the sum of the per-batch losses on the two kinds of samples is minimized:

\hat{\theta} = \arg\min_{\theta}\ L(\theta) = \arg\min_{\theta}\left[ L_{clean}(\theta) + L_{adv}(\theta) \right]

where L(·) represents the model loss function, \hat{\theta} represents the value of the model parameters θ that minimizes the loss function, L_clean(θ) represents the per-batch loss on clean samples, and L_adv(θ) represents the per-batch loss on adversarial samples.
according to the invention, a specific target emotion classification of BERT-BASE is selected as a baseline, an individual data enhancement mode, an individual confrontation learning mode and a data enhancement confrontation learning mode experiment are respectively carried out, and compared with a BERT-BASE reference model, because an embedding layer of the BERT model has three vectors which are respectively Word embedding (Word embedding), segment embedding (Segment embedding) and Position embedding (Position embedding). In the experiment, counterattack is only carried out aiming at word embedding, so that a word embedding countersample is generated, and the other two embeddings are not changed. WMDE-AL Algorithm 1 shows:
(Algorithm 1 is presented as an image in the original document and is not reproduced here.)
Algorithm 1 comprises a WMDE function and an AL function: the WMDE function describes the text word-masking data-augmentation procedure, and the AL function describes the ABSA adversarial-learning procedure.
4. Analysis of experimental results
4.1 Experimental setup
(1) Datasets: the experiments of this patent use the Laptop and Restaurant datasets from SemEval-2014. A specific target entity has four sentiment polarities: positive, neutral, negative and conflict. Because the proportion of the conflict polarity is small, preprocessing follows other researchers' practice of removing the conflict-polarity corpus; the statistics of the three remaining sentiment polarities for each dataset are shown in Table 2. The datasets are tokenized with the BertTokenizer in the pytorch-transformers toolkit, and data augmentation uses the WordNet thesaurus in the NLTK toolkit.
(2) Adversarial attack: the invention uses FGM as the adversarial-attack method. FGM attacks are run with different values of α: FGM applies adversarial perturbations to the word-embedding layer of the BERT model to generate adversarial samples, which are then trained with Adv-BERT.
(3) Reference model: this patent uses BERT-BASE (L=12, H=768, A=12, total parameters=110M) as the reference model for ABSA, where L denotes the number of hidden layers, H denotes the hidden size, and A denotes the number of self-attention heads. The activation function of the hidden layers of the BERT model is the Gaussian error linear unit (gelu), calculated as:

\mathrm{gelu}(\theta) = 0.5\,\theta\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\big(\theta + 0.044715\,\theta^{3}\big)\right)\right)

where gelu(·) denotes the Gaussian error linear unit, θ denotes its input (the pre-activation value), and tanh denotes the hyperbolic tangent function;
(4) Experimental environment and hyper-parameter settings: the experiments of this patent are implemented with a GeForce RTX 3090 GPU with 24 GB of video memory and the PyTorch 1.8.1 framework. The hyper-parameter settings are shown in Table 3, followed by an illustrative setup sketch.
Table 3. Experimental hyper-parameter settings

Parameter                                      Value
Batch size                                     16
Learning rate                                  2e-5
Adversarial-perturbation scaling factor α      α ∈ [0.01, 0.09]
L2 regularization                              0.01
Dropout rate                                   0.1
Initializer                                    xavier_uniform_
Optimizer                                      Adam
Number of training epochs                      5
Maximum sequence length                        128
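The preprocessing of Section 4.1 and the settings of Table 3 can be wired together as in the sketch below, assuming the pytorch-transformers BertTokenizer and the NLTK WordNet corpus named above; the calls and the grouping of the Table 3 values into a dictionary are illustrative, not the patent's code.

```python
# Illustrative setup sketch: tokenization (pytorch-transformers), WordNet access (NLTK),
# and the Table 3 hyper-parameters grouped into a config dict. Names are ours.
import nltk
from nltk.corpus import wordnet
from pytorch_transformers import BertTokenizer

nltk.download("wordnet")                                    # thesaurus used by the WMDE module
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "Great food but the service was dreadful!"
tokens = tokenizer.tokenize(sentence)                       # WordPiece tokens for BERT-BASE
input_ids = tokenizer.convert_tokens_to_ids(tokens)
synonyms = wordnet.synsets("dreadful")                      # synonym candidates for augmentation

config = {
    "batch_size": 16,
    "learning_rate": 2e-5,
    "adv_alpha_range": (0.01, 0.09),    # adversarial-perturbation scaling factor alpha
    "l2_regularization": 0.01,
    "dropout": 0.1,
    "initializer": "xavier_uniform_",
    "optimizer": "adam",
    "epochs": 5,
    "max_seq_length": 128,
}
```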
4.2 Analysis of results
To verify the effectiveness of the proposed WMDE method, data augmentation is performed on the Laptop and Restaurant datasets: for each sentence, synonym replacement that masks the specific target word and random word insertion each generate one augmented sentence, producing two pieces of augmented data. The original data are then merged in and fed to the BERT-BASE model for specific-target sentiment classification. Table 5 compares the WMDE method with other models and with the performance of the BERT-BASE reference model. With BERT-BASE as the reference model, the sentiment classification accuracy on the Laptop dataset after WMDE augmentation is 79.00%, an improvement of 2.35% over the 76.65% obtained by training on the original data. Likewise, the accuracy on the Restaurant dataset after WMDE augmentation is 84.38%, an improvement of 0.36% over the 84.02% obtained on the original data. The experimental results show that applying WMDE augmentation to the Laptop and Restaurant aspect-level sentiment classification datasets to generate new training data can effectively improve the model, with the larger gain on the small Laptop dataset.
Comparing the experimental results above: in the WMDE method, synonym replacement is performed with word masking, the aspect is kept unchanged, and the replacement preserves the part of speech, which ensures that the generated sentences remain fluent and their semantics unchanged. For random insertion, adverbs are inserted, ensuring that the meaning and syntactic structure of the generated sentence do not differ from the original, and a similarity computation between the new samples and the original samples is carried out when the final samples are merged. The WMDE method therefore generates more accurate augmented samples from the original data without changing its meaning or syntactic structure, so that the model can learn more effective features. During the experiments, the effectiveness of the synonym-replacement and random-insertion algorithms for model feature learning was verified separately, and finally the two augmentation modes were combined; the results show that combining the two yields the best effect.
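The patent mentions a similarity computation between each augmented sample and its original when the samples are merged, without specifying the measure; the sketch below assumes a simple cosine similarity over mean word embeddings as one plausible realization, with a threshold chosen purely for illustration.

```python
# Illustrative filter for merging augmented samples, assuming cosine similarity of
# mean word embeddings as the (unspecified) similarity measure; the threshold is ours.
import torch
import torch.nn.functional as F

def keep_augmented(orig_emb: torch.Tensor, aug_emb: torch.Tensor, threshold: float = 0.8) -> bool:
    """orig_emb/aug_emb: (seq_len, hidden) word-embedding matrices of the two sentences."""
    sim = F.cosine_similarity(orig_emb.mean(dim=0), aug_emb.mean(dim=0), dim=0)
    return sim.item() >= threshold   # keep the augmented sample only if it stays close to the original
```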
The augmented samples and the original samples are fused as the input samples of the model, and on both the Laptop and the Restaurant datasets the perturbation coefficient is varied from 0.01 to 0.09 with a step of 0.01. FIG. 3 shows the accuracy of the comparison experiments between the proposed WMDE-AL model with different perturbation coefficients α and the BERT-BASE model: sub-figure (a) of FIG. 3 shows the adversarial-training accuracy on the Laptop dataset for different α, and sub-figure (b) of FIG. 3 shows the adversarial-training accuracy on the Restaurant dataset for different α; each sub-figure of FIG. 3 includes the performance of the four methods BERT-BASE, BERT-WMDE, BERT-AL and BERT-WMDE-AL. The growth of the adversarial-training evaluation metrics over BERT-BASE for different α is shown in FIG. 4, where the scale of the radar chart represents the size of the adversarial perturbation α: sub-figure (a) of FIG. 4 shows the growth of accuracy and F1 on the Laptop dataset, from which it can be seen that the growth of the WMDE-AL method is higher than that of the AL method; sub-figure (b) of FIG. 4 shows the growth of accuracy and F1 on the Restaurant dataset, from which it can be seen that the growth of the WMDE-AL method is slightly lower than that of the AL method. For the Laptop dataset, with the BERT-AL training mode the accuracy reaches its maximum of 79.94% at α=0.02, an improvement of 3.29% over the BERT-BASE training mode; with the BERT-WMDE-AL training mode the accuracy reaches its maximum of 80.88% at α=0.01, improvements of 4.23% and 1.88% over the BERT-BASE and BERT-WMDE training modes respectively, and the accuracy of adversarial training with word-masking data augmentation is 0.94% higher than without it. For the Restaurant dataset, with the BERT-AL training mode the accuracy reaches its maximum of 85.71% at α=0.08, 1.69% higher than with the BERT-BASE training mode; with the BERT-WMDE-AL training mode the accuracy reaches its maximum of 85.27% at α=0.02, improvements of 1.25% and 0.89% over the BERT-BASE and BERT-WMDE training modes respectively, while the accuracy of adversarial training with word-masking data augmentation is 0.44% lower than without it. The adversarial-training experiments show that WMDE-AL performs better on the small dataset, with a clear improvement over the reference model, whereas for the slightly larger dataset the AL method alone performs slightly better than WMDE-AL. From this analysis, both the AL and the WMDE-AL methods can effectively increase sample feature diversity and use adversarial samples to improve the quality of the text representation, thereby improving specific-target sentiment classification performance.
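The α sweep described above can be scripted as below; `train_and_eval` is a hypothetical stand-in for the full WMDE-AL training and evaluation pipeline, while the metric calls follow the scikit-learn API used for the accuracy and F1 values reported here.

```python
# Illustrative sweep over the adversarial-perturbation factor alpha (0.01..0.09, step 0.01).
# `train_and_eval` is a hypothetical placeholder for the WMDE-AL training/evaluation pipeline.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

results = {}
for alpha in np.arange(0.01, 0.10, 0.01):
    y_true, y_pred = train_and_eval(dataset="Laptop", method="BERT-WMDE-AL", alpha=round(alpha, 2))
    results[round(alpha, 2)] = (
        accuracy_score(y_true, y_pred),             # accuracy reported in Fig. 3 / Table 4
        f1_score(y_true, y_pred, average="macro"),  # macro-F1 reported in Fig. 4
    )
best_alpha = max(results, key=lambda a: results[a][0])   # alpha with the highest accuracy
```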
Table 4. Accuracy of adversarial training with word-masking-augmented BERT-BASE (presented as an image in the original document; not reproduced here).
4.3 Model comparison
The model performance comparison includes: (1) comparing the performance of the WMDE, AL and WMDE-AL methods proposed in this patent with BERT-BASE, as well as with one another; (2) comparing against models that currently perform well on the Laptop and Restaurant datasets. The evaluation metrics are accuracy and the F1 value, and the performance of the reference models is shown in Table 5, from which the following can be observed:
(1) TD-LSTM (Tang et al. 2016) extracts features with a deep learning model, achieving 71.83% and 78.00% accuracy on the Laptop and Restaurant datasets respectively. Deep-learning specific-target sentiment classification models avoid the complex manual feature engineering of machine learning, and their performance mostly exceeds the best machine-learning results.
(2) MemNet (Tang et al. 2016) combines multiple attention hops through linear combination to extract target-entity feature information, improving specific-target sentiment classification performance, with accuracies of 72.20% and 81.00% respectively.
(3) RAM (Chen et al. 2017) uses a GRU network structure to combine multiple attention weights, combining the different attention sentiment feature vectors non-linearly, with accuracies of 74.49% and 80.23% respectively.
(4) MGAN (Fan et al. 2018) captures the word-level interaction between target entities and sentences with fine-grained and coarse-grained attention before performing specific-target sentiment classification, with accuracies of 75.39% and 81.25% respectively.
(5) RepWalk (Zheng et al. 2020) uses a replicated random walk over the syntax tree to capture the contextual features of the sentence, effectively exploiting the syntactic structure to improve the sentence representation, with accuracies of 78.20% and 83.80% respectively.
(6) BERT-PT (Xu et al. 2019) post-trains the contextual BERT model on a large-scale corpus of the specific target domain, improving the quality of the word representations for the final task, with accuracies of 78.07% and 84.95% respectively.
Table 5. Overall performance comparison of specific-target sentiment classification models on the Laptop and Restaurant datasets (presented as an image in the original document; not reproduced here).
In Table 5, # marks the experimental results of this patent, "-" indicates that the value is not reported in the cited reference, and the remaining figures are taken from the original publications.
To evaluate the performance of the method proposed in this patent on the specific-target sentiment analysis task, BERT-BASE is adopted as the target model of adversarial training and the following three comparative experiments are carried out: (1) first, the effectiveness of the new corpus generated by WMDE is verified; the best of five WMDE runs is taken as the experimental result, with accuracies of 79.00% and 84.38% on the Laptop and Restaurant datasets respectively; (2) the effectiveness of AL on the original dataset is verified by adversarial training with clean samples and adversarial samples, with accuracies of 79.94% and 85.71%, which are 1.87% and 0.76% higher than the BERT-PT model respectively; (3) it is verified that generating new training samples and fusing them with the original samples for adversarial training, i.e. the WMDE-AL method proposed by the invention, achieves accuracies of 80.88% and 85.27%, which are 2.81% and 0.32% higher than the BERT-PT model respectively.
5. Conclusion
A specific-target sentiment analysis model with word-masking data augmentation and adversarial learning is proposed, aiming to increase the diversity of the clean-sample feature distribution and to use adversarial training to improve the sentiment classification performance of BERT-BASE. Experimental results on the Laptop and Restaurant datasets show that adding WMDE, AL or WMDE-AL to BERT-BASE clearly improves specific-target sentiment analysis over plain BERT-BASE, and that the AL and WMDE-AL methods outperform the currently advanced BERT-PT model. The main conclusions are: (1) word-masking data augmentation of the specific-target-domain corpus through synonym replacement and random word insertion keeps the semantics and grammatical structure of sentences unchanged while preventing the entity words from being replaced, and effectively augments the domain dataset; (2) adversarial learning is performed on the original data and on the augmented data respectively, and clean samples together with adversarial samples improve the specific-target sentiment classification accuracy, achieving the goal of adversarial defense.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (8)

1. A specific-target sentiment analysis method fusing word-masking data augmentation and adversarial learning, characterized by comprising the following steps:
S1, performing synonym replacement and random word insertion on sentences while masking the target entity, generating valid augmented samples and fusing them with the original samples;
S2, constructing a BERT-BASE-based adversarial-learning sentiment classification model for specific targets, taking the data fused in S1 as the model input, and training the sentiment classification model jointly on clean samples and adversarial samples:
the augmented data Da_Out are used as the clean samples, where Da_Out = S_Sr ∪ S_Ri, S_Sr being the data after synonym replacement and S_Ri the data after random insertion; for each batch of clean samples, the clean samples are first used to generate the adversarial perturbation r_adv of the word-embedding layer, thereby producing the adversarial samples; the Adv-BERT model performs each batch of training on the adversarial samples, and each batch of training on clean samples is performed with the BERT model;
the sentiment classification model comprises the BERT model and the Adv-BERT model;
the adversarial learning comprises: applying adversarial learning to the ABSA task by adding adversarial perturbations in the embedding layer of the model, the probability that the sentiment of the target entity aspect in a sentence is y being p(y | S_BertIn, aspect), where S_BertIn represents the input of the BERT model; the loss function after adding the adversarial perturbation to the embedding layer of the model is therefore:

-\log p\left(y \mid E_w + r_{adv},\ aspect;\ \theta\right) \qquad (1)

wherein

r_{adv} = \arg\min_{r,\ \|r\| \le \alpha}\ \log p\left(y \mid E_w + r,\ aspect;\ \hat{\theta}\right) \qquad (2)

p(y | E_w + r_adv, aspect; θ) represents the sentiment prediction probability after adding the adversarial perturbation r_adv, r_adv represents the adversarial perturbation, r represents a candidate perturbation of the input, α represents the perturbation scaling factor, ||·|| represents a norm, argmin selects the r variable that minimizes the objective function, this value of r then being assigned to r_adv, p(y | E_w + r, aspect; \hat{\theta}) represents the prediction probability after adding the perturbation r, E_w represents the word-embedding tensor of the clean sample, θ represents the neural network parameters, and \hat{\theta} represents the constant set of current parameters of the neural network classifier;
and S3, finally obtaining the specific-target sentiment analysis result.
2. The specific-target sentiment analysis method fusing word-masking data augmentation and adversarial learning according to claim 1, characterized by further comprising the step of:
S4, performing adversarial learning on the original samples and on the augmented samples respectively, and evaluating the results with evaluation metrics; the evaluation metrics comprise accuracy and/or the F1 value.
3. The method for analyzing emotion of a specific target in fused word mask data enhancement and antagonistic learning according to claim 1, wherein the calculation method of synonym substitution in step S1 is:
S_Sr = F_SR(S_In) = Rep(w_{id_Sr}, Ran(Wordnet(w_i), num_1)),  w_i != aspect
wherein S_Sr represents the data after synonym replacement;
F_SR(·) represents the synonym replacement data enhancement function;
S_In represents the input of the original corpus;
w_i is the i-th word of an original sample;
aspect represents the specific target entity;
Rep(·) represents the word replacement function;
w_{id_Sr} represents the id_Sr-th word to be replaced;
id_Sr represents the position of the word replacement;
Ran(Wordnet(w_i), num_1) represents randomly finding num_1 synonyms of the i-th word w_i in the Wordnet library;
!= means not equal.
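For illustration only, not part of the claims: a minimal sketch of the synonym replacement of claim 3 using the NLTK interface to WordNet, assuming the WordNet corpus has been downloaded; the function name and the handling of num_1 are illustrative.

import random
from nltk.corpus import wordnet   # requires: nltk.download('wordnet')

def synonym_replace(tokens, aspect_terms, num_1=3):
    # Replace one non-aspect word with a WordNet synonym; the target entity is shielded.
    positions = [i for i, w in enumerate(tokens) if w.lower() not in aspect_terms]
    random.shuffle(positions)
    out = list(tokens)
    for idx in positions:                                   # candidate id_Sr positions
        synonyms = sorted({lemma.name().replace("_", " ")
                           for syn in wordnet.synsets(out[idx])
                           for lemma in syn.lemmas()} - {out[idx]})
        if synonyms:                                        # Ran(Wordnet(w_i), num_1)
            pool = random.sample(synonyms, min(num_1, len(synonyms)))
            out[idx] = random.choice(pool)                  # Rep(w_{id_Sr}, ...)
            break
    return out

For example, synonym_replace("the battery life is great".split(), {"battery", "life"}) may replace "great" but never the aspect words.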
4. The specific target emotion analysis method fusing word shielding data enhancement and adversarial learning according to claim 1, characterized in that the random word insertion in step S1 is calculated as:
S_Ri = F_RI(S_In) = Insert(w_{id_RI}, Ran(Wordnet, num_2))
wherein S_Ri represents the data after random insertion;
F_RI(·) represents the random insertion data enhancement function;
Insert(·) represents inserting a word after the id_RI-th word;
w_{id_RI} represents the id_RI-th word after which the insertion is made;
id_RI represents the position in the sentence after which the word is inserted;
Ran(Wordnet, num_2) represents randomly finding num_2 words in the Wordnet library.
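For illustration only, not part of the claims: a minimal sketch of the random insertion of claim 4, again assuming the NLTK WordNet corpus; drawing the inserted words from the synonyms of words already in the sentence is one simple way to realise Ran(Wordnet, num_2) and is an assumption of this sketch rather than a requirement of the claim.

import random
from nltk.corpus import wordnet   # requires: nltk.download('wordnet')

def random_insert(tokens, num_2=1):
    # Insert num_2 WordNet words, each after a randomly chosen position id_RI.
    out = list(tokens)
    for _ in range(num_2):
        base = random.choice(out)
        candidates = [lemma.name().replace("_", " ")
                      for syn in wordnet.synsets(base)
                      for lemma in syn.lemmas()]
        if not candidates:
            continue
        id_ri = random.randrange(len(out))        # position after which to insert
        out.insert(id_ri + 1, random.choice(candidates))
    return out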
5. The specific target emotion analysis method fusing word shielding data enhancement and adversarial learning according to claim 1, characterized in that the loss function of each batch of clean samples is calculated as follows:
L_clean(θ) = -(1/N_batch) Σ_{i=1}^{N_batch} log p(y_i|E_i, aspect_i; θ)
wherein L_clean(·) represents the loss function of the clean samples, N_batch represents the size of a batch, θ represents the neural network parameters, p(y_i|E_i, aspect_i; θ) represents the emotion prediction probability of the i-th sample in a batch, y_i represents the true label of the i-th sample, E_i represents the embedding layer tensor of the i-th clean sample, and aspect_i represents the specific target entity of the i-th sample;
the loss function of each batch of adversarial samples is calculated as follows:
L_adv(θ) = -(1/N_batch) Σ_{i=1}^{N_batch} log p(y_i|E_adv(i), aspect_i; θ)
wherein L_adv(·) represents the loss function of the adversarial samples, N_batch represents the size of a batch, θ represents the neural network parameters, p(y_i|E_adv(i), aspect_i; θ) represents the emotion prediction probability of the i-th adversarial sample, and E_adv(i) represents the embedding layer tensor of the i-th adversarial sample.
6. The specific target emotion analysis method fusing word shielding data enhancement and adversarial learning according to claim 5, further comprising:
minimizing the loss functions of each batch of clean samples and adversarial samples jointly:
θ* = argmin_θ L(θ) = argmin_θ (L_clean(θ) + L_adv(θ))
wherein L(·) represents the model loss function, θ* represents the value of the model parameter θ when the loss function is minimized, L_clean(θ) represents the loss function of each batch of clean samples, and L_adv(θ) represents the loss function of each batch of adversarial samples.
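For illustration only, not part of the claims: a minimal sketch of the joint per-batch objective of claims 5 and 6, assuming PyTorch; cross_entropy already averages the -log p terms over the batch, i.e. the (1/N_batch)·Σ factor.

import torch.nn.functional as F

def joint_loss(logits_clean, logits_adv, labels):
    # L(theta) = L_clean(theta) + L_adv(theta), minimised jointly over theta.
    loss_clean = F.cross_entropy(logits_clean, labels)   # L_clean(theta)
    loss_adv = F.cross_entropy(logits_adv, labels)       # L_adv(theta)
    return loss_clean + loss_adv

# Typical training-loop usage (the optimizer name is illustrative):
#   loss = joint_loss(logits_clean, logits_adv, labels)
#   loss.backward()
#   optimizer.step()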
7. The specific target emotion analysis method fusing word shielding data enhancement and adversarial learning according to claim 1, characterized in that the hidden layer of the BERT model adopts the Gaussian error linear unit as the activation function:
gelu(θ) = 0.5θ(1 + tanh(√(2/π)·(θ + 0.044715θ³)))
wherein gelu(·) represents the Gaussian error linear unit, θ represents a neural network parameter, and tanh is the hyperbolic tangent function.
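For illustration only, not part of the claims: the tanh approximation of GELU in claim 7 written as a small PyTorch function; the argument is named x here for readability.

import math
import torch

def gelu(x: torch.Tensor) -> torch.Tensor:
    # tanh approximation of the Gaussian error linear unit, as used in BERT.
    return 0.5 * x * (1.0 + torch.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))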
8. The specific target emotion analysis method fusing word shielding data enhancement and adversarial learning according to claim 1, characterized in that the adversarial learning further comprises:
finding the adversarial perturbation by a fast gradient descent method, computing the adversarial perturbation by back propagation in the neural network, and then adding the adversarial perturbation to the word vectors of the original embedding layer to obtain the adversarial sample; the adversarial perturbation r_adv is calculated as follows:
r_adv = -α·g_w / ||g_w||_2
wherein
g_w = ∇_{E_w} log p(y|E_w, aspect; θ̂)
||g_w||_2 = √(max_i λ_i(g_w^H·g_w))
the adversarial sample loss function for ABSA is therefore as follows:
L_adv(θ) = -(1/N) Σ_{i=1}^{N} log p(y_i|E_adv(i), aspect_i; θ)
wherein the adversarial sample E_adv is expressed as follows:
E_adv = (E_w ⊕ r_adv) + E_seg + E_pos = [e_w^(1)+r_adv^(1), e_w^(2)+r_adv^(2), ..., e_w^(i+1)+r_adv^(i+1), ..., e_w^(n)+r_adv^(n), e_w^(n+1)+r_adv^(n+1)] + E_seg + E_pos
wherein α represents the adversarial perturbation scaling factor, g_w represents the gradient of the word embedding layer in the model, and ||·||_2 represents the two-norm;
∇ represents the gradient operator, E_w represents the word embedding tensor of the clean sample, p(y|E_w, aspect; θ̂) represents the emotion prediction probability of the clean sample, aspect represents the specific target entity, and θ̂ represents a constant set to the current parameters of the neural network classifier; λ_i(·) represents the eigenvalues of a matrix, and g_w^H represents the conjugate transpose of g_w; N represents the total number of samples, p(y_i|E_adv(i), aspect_i; θ) represents the prediction probability of the i-th adversarial sample, y_i represents the true label of the i-th sample, E_adv(i) represents the embedding layer tensor of the i-th adversarial sample, aspect_i represents the i-th specific target entity, and θ represents the neural network parameters;
r_adv represents the adversarial perturbation, E_seg represents the segment embedding tensor of the clean sample, E_pos represents the position embedding tensor of the clean sample, and ⊕ represents tensor addition; e_w^(1) represents the word embedding of the 1st word of a sample and r_adv^(1) represents the adversarial perturbation corresponding to the 1st word embedding of the sample; e_w^(2) represents the word embedding of the 2nd word of a sample and r_adv^(2) represents the adversarial perturbation corresponding to the 2nd word embedding of the sample; e_w^(i+1) represents the word embedding of the (i+1)-th word of a sample and r_adv^(i+1) represents the adversarial perturbation corresponding to the (i+1)-th word embedding of the sample; e_w^(n) represents the word embedding of the n-th word of a sample and r_adv^(n) represents the adversarial perturbation corresponding to the n-th word embedding of the sample; e_w^(n+1) represents the word embedding of the (n+1)-th word of a sample and r_adv^(n+1) represents the adversarial perturbation corresponding to the (n+1)-th word embedding of the sample.
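For illustration only, not part of the claims: a minimal PyTorch sketch of assembling the adversarial sample E_adv of claim 8 from the clean word, segment and position embedding tensors; the (batch, seq_len, hidden) shapes, the function name, and the use of a flattened 2-norm instead of the spectral norm in the claim are assumptions of this sketch.

import torch

def build_adv_embedding(e_w, e_seg, e_pos, log_prob_clean, alpha=1.0):
    # log_prob_clean: scalar log p(y | E_w, aspect) of the clean batch.
    # g_w: gradient of the clean log-probability w.r.t. the word embeddings.
    g_w = torch.autograd.grad(log_prob_clean, e_w, retain_graph=True)[0]
    # r_adv = -alpha * g_w / ||g_w||_2 (flattened 2-norm used for simplicity).
    r_adv = -alpha * g_w / (g_w.norm(p=2) + 1e-12)
    # Word-wise addition of the perturbation, then the segment and position
    # embeddings of the clean sample: E_adv = (E_w (+) r_adv) + E_seg + E_pos.
    return (e_w + r_adv) + e_seg + e_pos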
CN202110999219.7A 2021-08-28 2021-08-28 Specific target emotion analysis method for enhancing and resisting learning by fusing word shielding data Active CN113723075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110999219.7A CN113723075B (en) 2021-08-28 2021-08-28 Specific target emotion analysis method for enhancing and resisting learning by fusing word shielding data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110999219.7A CN113723075B (en) 2021-08-28 2021-08-28 Specific target emotion analysis method for enhancing and resisting learning by fusing word shielding data

Publications (2)

Publication Number Publication Date
CN113723075A (en) 2021-11-30
CN113723075B (en) 2023-04-07

Family

ID=78678668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110999219.7A Active CN113723075B (en) 2021-08-28 2021-08-28 Specific target emotion analysis method for enhancing and resisting learning by fusing word shielding data

Country Status (1)

Country Link
CN (1) CN113723075B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114781352A (en) * 2022-04-07 2022-07-22 重庆邮电大学 Emotion analysis method based on association between grammar dependency type and aspect
CN114880586A (en) * 2022-06-07 2022-08-09 电子科技大学 Confrontation-based social circle inference method through mobility context awareness
CN115392259B (en) * 2022-10-27 2023-04-07 暨南大学 Microblog text sentiment analysis method and system based on confrontation training fusion BERT
CN115858791B (en) * 2023-02-17 2023-09-15 成都信息工程大学 Short text classification method, device, electronic equipment and storage medium
CN116776884A (en) * 2023-06-26 2023-09-19 中山大学 Data enhancement method and system for medical named entity recognition

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117482B (en) * 2018-09-17 2021-07-06 武汉大学 Confrontation sample generation method for Chinese text emotion orientation detection
CN110245229B (en) * 2019-04-30 2023-03-28 中山大学 Deep learning theme emotion classification method based on data enhancement
US20210142181A1 (en) * 2019-11-07 2021-05-13 Microsoft Technology Licensing, Llc Adversarial training of machine learning models
CN110909164A (en) * 2019-11-22 2020-03-24 科大国创软件股份有限公司 Text enhancement semantic classification method and system based on convolutional neural network
CN111324744B (en) * 2020-02-17 2023-04-07 中山大学 Data enhancement method based on target emotion analysis data set
CN112528675A (en) * 2020-12-14 2021-03-19 成都易书桥科技有限公司 Confrontation sample defense algorithm based on local disturbance
CN112580337A (en) * 2020-12-29 2021-03-30 南京航空航天大学 Emotion classification model and emotion classification method based on data enhancement

Also Published As

Publication number Publication date
CN113723075A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN113723075B (en) Specific target emotion analysis method for enhancing and resisting learning by fusing word shielding data
Logeswaran et al. Sentence ordering and coherence modeling using recurrent neural networks
Xia et al. Xgpt: Cross-modal generative pre-training for image captioning
Kant et al. Practical text classification with large pre-trained language models
CN113705678B (en) Specific target emotion analysis method for enhancing antagonism learning by using word shielding data
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
Zhuo et al. Segment-level sequence modeling using gated recursive semi-markov conditional random fields
CN113723076A (en) Specific target emotion analysis method based on word mask data enhancement and counterstudy
CN115658954B (en) Cross-modal search countermeasure method based on prompt learning
Choi et al. Mem-kgc: Masked entity model for knowledge graph completion with pre-trained language model
Guo et al. Implicit discourse relation recognition via a BiLSTM-CNN architecture with dynamic chunk-based max pooling
Zhou et al. Robust reading comprehension with linguistic constraints via posterior regularization
Alsmadi et al. Adversarial machine learning in text processing: a literature survey
Kitada et al. Making attention mechanisms more robust and interpretable with virtual adversarial training
CN113220865B (en) Text similar vocabulary retrieval method, system, medium and electronic equipment
US20240119716A1 (en) Method for multimodal emotion classification based on modal space assimilation and contrastive learning
Xue et al. Variational Causal Inference Network for Explanatory Visual Question Answering
Wu et al. Multi-tasking for Aspect-based Sentiment Analysis via Constructing Auxiliary Self-Supervision ACOP task
Zhu et al. Generating semantically valid adversarial questions for tableqa
Chen et al. Self-discriminative learning for unsupervised document embedding
Duan et al. A Parameter-Adaptive Convolution Neural Network for Capturing the Context-Specific Information in Natural Language Understanding
Li et al. Adaptive feature discrimination and denoising for asymmetric text matching
Im et al. Multilayer CARU Model for Text Summarization
Li et al. Textual Adversarial Attacks on Named Entity Recognition in a Hard Label Black Box Setting
Chatzigianellis Greek news topics classification using graph neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant