CN114970557A - Knowledge enhancement-based cross-language structured emotion analysis method - Google Patents
Knowledge enhancement-based cross-language structured emotion analysis method
- Publication number
- CN114970557A (application CN202210423028.0A)
- Authority
- CN
- China
- Prior art keywords
- word
- sentence
- training
- embedding
- language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a knowledge enhancement-based cross-language structured emotion analysis method. In the present invention, an adversarial embedding adapter is designed: semantically rich word-embedding representations are dynamically learned through a word-level attention mechanism. Meanwhile, in order to improve the robustness of the representation, an adversarial mechanism is used to add perturbations to the word embeddings. The invention also designs an encoding layer based on a graph neural network: structured knowledge (e.g., syntactic parse trees) is important to the structured emotion analysis task, and although word order differs between languages, their syntactic structures are similar. To this end, the present invention incorporates structured knowledge (e.g., syntactic structures) into the model to learn a structured representation. Finally, the invention performs a decoding operation through a decoding layer to extract the target, the holder, the viewpoint words and the emotion polarity information contained in the text.
Description
Technical Field
The invention relates to a structured emotion analysis method.
Background
With the continued rise of social media, both users and the content they produce are growing at an explosive rate, which has radically changed the way information is received and disseminated by the public and by enterprises. With data on the order of tens of millions of news items per day, structured sentiment analysis is highly valuable work. For example, a media worker can train a sentiment analysis model on the large number of movie reviews on the internet to learn which movies people like and dislike; an investor can build a model that helps with stock market prediction, estimating how optimistic people are about a stock from their posts in forums; a government worker can use a sentiment analysis model to evaluate the emotional reactions of people watching Tutt speeches and thereby analyze how well the speeches are received. For this reason, structured sentiment analysis has been proposed; it can identify the sentiment that users express on social platforms about real-time events such as financial news, sports, weather and entertainment, and is crucial for many applications.
Specifically, structured emotion analysis refers to extracting structured knowledge (such as targets, opinion words, holders, etc.) from text and predicting their emotions, and is an important research direction in the field of Natural Language Processing (NLP). The task comprises two subtasks: structured extraction and emotion analysis. First, the structured extraction subtask automatically extracts the main body and each component from the text and gives the relationships existing between the parts. Then, for the given structured data, its corresponding emotion is predicted. The task builds on entity extraction and relation extraction, but is more difficult than either, and involves methods and techniques from multiple disciplines such as natural language processing, machine learning and pattern matching. In recent years, with the development of deep neural networks, and especially the wide application of large-scale pre-training methods, the performance of the structured emotion analysis task has improved greatly.
However, because the annotation of the structured emotion analysis task is complex, its acquisition cost is high and the datasets are small, which greatly limits the effectiveness of neural network models. To this end, cross-language migration methods have been proposed for structured extraction, thereby reducing the need for annotated data. Most cross-language structure migration methods suffer from language-specific problems: they depend too heavily on bilingual dictionaries and parallel corpora, which require additional resources or tools. Feng et al. propose applying cross-language migration to the sequence labeling task without considering its complexity. Wang et al. use similarly distributed representation spaces across languages for relation extraction.
In recent years, multilingual pre-training models (e.g., mBERT, XLM, etc.) have enjoyed great success in cross-language migration. Liu et al. and Nguyen et al. apply multilingual word embeddings to cross-language tasks such as semantic role labeling, dependency parsing and named entity recognition. However, most current approaches model only a single pre-trained word vector. In fact, there are many cross-language pre-training models, each carrying different semantic information, because each pre-trained model has a different optimization objective and is trained on different datasets. Moreover, structured knowledge is widely used in structured extraction tasks, but how to use it for the cross-language structured emotion analysis task has not been fully studied.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: with the development of pre-trained language models, many pre-training model structures, training objectives and training data have been successively proposed for cross-language structured extraction, so that different cross-language pre-training models carry different semantic information; however, how to make full use of all cross-language pre-training models to improve the word embedding representation has not yet been well studied. Meanwhile, structured knowledge is important for the structured emotion analysis task: structured elements (such as targets, holders, viewpoint words, etc.) are close to each other in the syntax tree, and different languages are annotated under the same syntactic rules so that their syntactic structures are similar; yet how to combine syntactic structure for the cross-language structured emotion analysis task has been ignored by current work.
In order to solve the above technical problems, the technical scheme of the invention is to provide a knowledge enhancement-based cross-language structured emotion analysis method, characterized in that a training corpus of a source language is adopted, whose labels are sets of viewpoint tuples, where the number of samples and the number of viewpoint tuples contained in each sample are given; in a set of viewpoint tuples, the k-th viewpoint tuple o_k = (h_k, t_k, e_k, p_k) represents that the holder h_k of the k-th viewpoint tuple o_k expresses, through the viewpoint word e_k, the emotional polarity p_k towards the target t_k, wherein: the holder h_k, the target t_k and the viewpoint word e_k are each a sub-string of the j-th sentence x_j in the training corpus, defined by their start positions in sentence x_j, the words at those start positions, their end positions in sentence x_j, and the words at those end positions. The cross-language structured emotion analysis method specifically comprises the following steps:
S101, constructing and training an adversarial embedding adapter: when constructing the adversarial embedding adapter, a word-level attention mechanism is designed to capture the important implicit distributed semantics of multiple embeddings pre-trained on different corpora with different training strategies and tasks, and an adversarial training strategy is then adopted to improve the robustness of word embedding;
when training the adversarial embedding adapter, a set of cross-language pre-training models is obtained, and each sentence in the training corpus is input into every cross-language pre-training model in the set so as to obtain word embedding vectors for each sentence; for any sentence in the training corpus, the word embedding vectors obtained from the cross-language pre-training models are fused through a word-level attention mechanism to obtain the final word embedding vector corresponding to each sentence, and finally a perturbation is added to the final word embedding vector;
step S102, constructing and training a grammar GCN encoder:
obtaining the training corpus, constructing a graph for each sentence based on its syntactic parse tree, then calculating the degree matrix of the graph, and obtaining the syntactic GCN encoder from the graph and the degree matrix; the perturbed word embedding vectors obtained in step S101 are input into the syntactic GCN encoder to obtain a structured representation in a unified space, thereby obtaining an informative and robust structured hidden representation;
step S103, constructing and training a decoder:
based on the informative and robust structured hidden representation, extracting viewpoint words by predicting the start and end positions of the viewpoint words, and regarding the viewpoint words as the trigger words of each viewpoint; then, extracting the target and the holder, and predicting the emotional polarity of the given expression;
step S104, for any sentence x obtained in real time, the trained adversarial embedding adapter is used to obtain the perturbed word embedding vector, which is input into the trained syntactic GCN encoder to obtain the hidden-layer representation of each word in sentence x; finally, the trained decoder is used to extract all viewpoint tuples contained in sentence x.
Preferably, the step S101 specifically includes the following steps:
step S1011, obtaining the word embedding vector of the j-th sentence x_j in the training corpus, comprising the following steps:
sentence x_j is input into each of the cross-language pre-training models respectively to obtain different word embedding vectors, where the word embedding vector obtained after inputting sentence x_j into the i-th cross-language pre-training model M_i is denoted accordingly; in the formula, the corresponding element represents the word embedding of the l-th word w_l in sentence x_j obtained by the cross-language pre-training model M_i, and |x_j| represents the total number of words in sentence x_j;
the different word embedding vectors obtained for sentence x_j are fused through the word-level attention mechanism to obtain the final word embedding vector E_j, where e_l is the embedding of the l-th word in the word embedding vector E_j, computed as:
in the formula, v_a, W_a and b_a are trainable parameters, and v_a^T denotes the transpose of v_a;
step S1012, let the perturbation for sentence x_j be given, where r_l denotes the perturbation for the word embedding e_l of the l-th word in sentence x_j; the word embedding of sentence x_j after adding the perturbation r_j is denoted accordingly, wherein:
wherein g is the concatenation of the |x_j| vectors g_l, g_l denotes the gradient of the loss with respect to the word embedding e_l, ||·||_2 denotes the L2 norm, ℓ(·) represents the loss for a single sample, and ε is a parameter used to control the degree of perturbation;
based on the adversarial perturbation, the adversarial training minimizes the worst-case (maximum) adversarial loss, thereby obtaining the worst-case perturbation for sentence x_j; the setup of the adversarial training is as follows:
during training, the perturbation is added to sentence x_j to obtain the perturbed word embedding vector.
Preferably, the step S102 includes the steps of:
during training, the relation set of the syntactic parse tree of sentence x_j in the training corpus is obtained and denoted as E_j; a graph G_j = (V_j, E_j) is constructed for sentence x_j, where v_l is the l-th node and corresponds to the l-th word w_l in sentence x_j; based on the graph G_j, an adjacency matrix A is established: if there is a connecting edge between node v_m and node v_n in the graph G_j, then A_mn = 1, otherwise A_mn = 0, where A_mn is the element in the m-th row and n-th column of the adjacency matrix A; the degree matrix D of the graph G_j is then obtained with D_mm = Σ_n A_mn, where D_mm is the element in the m-th row and m-th column of the degree matrix D and all off-diagonal elements of D are 0.
Based on the graph G_j, the adjacency matrix A and the degree matrix D, the syntactic GCN encoder for sentence x_j is constructed; the syntactic GCN encoder has P+1 graph convolution layers in total, and the hidden representation H^(p) of the p-th graph convolution layer is learned from the hidden representation of its adjacent (p-1)-th graph convolution layer; the hidden representation of the l-th word w_l of sentence x_j at the p-th layer is an element of H^(p), which is computed as follows:
The perturbed word embedding vector of sentence x_j obtained in step S101 is input into the syntactic GCN encoder, and the structured representation in a unified space is obtained from the (P+1)-th layer, which gives the final hidden-layer representation of each word w_l in sentence x_j; in this way an informative and robust structured hidden representation is obtained.
Preferably, the step S103 specifically includes the following steps:
step S1031, viewpoint word extraction: to extract the viewpoint words in a sentence, during training, two binary classifiers are used to predict the probability that the l-th word w_l in sentence x_j is the start position of the viewpoint word e_k, or the probability that it is the end position, as shown in the following formula:
in the formula, CE(·) represents the cross-entropy function; the labels are the gold start and end positions of the sample viewpoint word e_k, and if l is the start or end position of the viewpoint word e_k, the corresponding label is equal to 1;
step S1032, target word extraction: to take the viewpoint information into account when extracting the target, during training, the probability that the l-th word w_l in sentence x_j is the start position of the target t_k, or the probability that it is the end position, is predicted from the viewpoint word representation, as shown in the following formula:
in the formula, the weight matrices are learnable parameters; the two hidden-layer representations are, respectively, the hidden-layer representation produced by the syntactic GCN encoder for the word at the start position of the viewpoint word e_k and for the word at its end position; [a; b] denotes the concatenation of a and b;
in the formula, the labels are the gold start and end positions of the sample target t_k, and if l is the start or end position of the target t_k, the corresponding label is equal to 1;
step S1033, holder extraction: during training, the probability that the l-th word w_l in sentence x_j is the start position of the holder h_k, or the probability that it is the end position, is predicted from the viewpoint word representation, as shown in the following formula:
in the formula, the labels are the gold start and end positions of the sample holder h_k, and if l is the start or end position of the holder h_k, the corresponding label is equal to 1;
step S1034, emotion polarity prediction:
during training, max pooling is used to obtain the sentence representation r_s = Maxpooling(H^(P)) of sentence x_j, which is concatenated with the viewpoint word representation for polarity classification, giving the probability that sentence x_j expresses the emotional polarity p_k:
in the formula, the label is that of the sample emotional polarity p_k.
In the present invention, an adversarial embedding adapter is designed: semantically rich word-embedding representations are dynamically learned through a word-level attention mechanism. Meanwhile, in order to improve the robustness of the representation, an adversarial mechanism is used to add perturbations to the word embeddings. The invention also designs an encoding layer based on a graph neural network: structured knowledge (e.g., syntactic parse trees) is important to the structured emotion analysis task, and although word order differs between languages, their syntactic structures are similar. To this end, the present invention incorporates structured knowledge (e.g., syntactic structures) into the model to learn a structured representation. Finally, the invention performs a decoding operation through a decoding layer to extract the target, the holder, the viewpoint words and the emotion polarity information contained in the text.
The invention focuses on cross-language structured emotion analysis and provides a knowledge enhancement-based approach that trains on a source language and migrates to a target language for testing, thereby reducing the need for target-language annotation data and improving the extraction performance of the neural network model. To this end, the invention provides a knowledge-enhanced cross-language structured emotion analysis model which adds both implicit knowledge and explicit knowledge to the cross-language structured emotion analysis task. First, the invention designs an adversarial embedding adapter that adaptively combines multiple embedding representations using a word-level attention mechanism and an adversarial strategy, learning semantically rich and robust word representations. Furthermore, inspired by existing work, the invention integrates universal syntactic dependencies into cross-language structured emotion analysis. The syntax tree is important for the structured emotion analysis task, and at the same time different languages have similar syntactic structures, so that incorporating the syntactic structure into the task allows the structured representation to be learned well.
Drawings
FIG. 1 is a diagram of the knowledge-enhanced cross-language structured migration model of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Existing structured emotion analysis datasets cover few languages and are small in scale, which greatly limits the performance of neural network models. Therefore, the knowledge enhancement-based cross-language structured emotion analysis method disclosed by the invention proposes a knowledge-enhanced cross-language structured migration, as shown in fig. 1. First, the invention designs an adversarial embedding adapter to adaptively capture implicit semantic information from multilingual embeddings and learn semantically rich and robust representations. In addition, the invention incorporates a syntactic GCN encoder to learn structural representations shared between different languages. Finally, the invention performs structured extraction based on the sentence representations learned by these two parts.
In the present invention, formalization is defined as follows:
given a corpus of a source languageWhose labels are sets of view tuplesWherein the content of the first and second substances,which is indicative of the number of samples,the representative sample contains the viewpoint tuple number. Set of viewpoint tuplesThe kth viewpoint tuple o k =(h k ,t k ,t k ,p k ) Represents the kth view tuple o k Holder of h k By the term of opinion e k For the target t k Expressing emotion polesProperty p k Wherein: is a training corpusThe jth sentence x in j The sub-string of (a) is, respectively the bearer h k Target t k Term "viewpoint" e k In sentence x j In the above-mentioned position of the start position,are respectively sentences x j InThe word at the location of the location, respectively the bearer h k Target t k Term "viewpoint" e k In sentence x j In the end position of (a) to (b), are respectively sentences x j In (1)The word at the location.
Based on the definition, the cross-language structured emotion analysis method based on knowledge enhancement provided by the invention specifically comprises the following steps of:
Step S101, constructing and training an adversarial embedding adapter, through which rich and robust word embeddings for cross-language migration are learned. To this end, when building the adversarial embedding adapter, the invention first designs a word-level attention mechanism to capture the important implicit distributed semantics of multiple embeddings pre-trained on different corpora with different training strategies and tasks. The invention then employs an adversarial training strategy to improve the robustness of word embedding.
Step S101 specifically includes the following steps:
step S1011, designing a word-level attention mechanism:
Multilingual pre-trained language models, such as mBERT and XLM, have been widely used for different cross-language tasks with great success. In fact, developers and researchers have released a large number of multilingual pre-trained language models; there are more than 100 such models on the Hugging Face website. These models carry different semantic information because they are trained on different large-scale datasets with different objectives and settings. For example, mBERT-base-cased and mBERT-base-uncased are trained on the 104 languages with the largest Wikipedias, using a Masked Language Modeling (MLM) objective on cased or uncased text respectively. The XLM-RoBERTa model was pre-trained on 100 languages with 2.5TB of data. However, most existing work uses only one of these models for word embedding. In order to obtain better word representations, the invention designs a word-level attention mechanism to better combine multiple cross-language pre-trained word embeddings.
Specifically, a set of cross-language pre-training models is obtained. During training, each sentence in the training corpus is input into every cross-language pre-training model in the set, so as to obtain the word embedding vectors of each sentence. Since different cross-language pre-training models split the same sentence into different sub-words, for any sentence in the training corpus, the invention takes the average of the sub-word embeddings produced by a cross-language pre-training model as that model's embedding of the word, thereby obtaining the word embedding vector corresponding to each sentence.
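As an illustration of this sub-word averaging step (not part of the patent's specification), the following sketch assumes the HuggingFace transformers library; the model name and function name are hypothetical choices for demonstration only.

```python
import torch
from transformers import AutoTokenizer, AutoModel

def word_embeddings(words, model_name="bert-base-multilingual-cased"):
    """Embed a pre-tokenized sentence with one cross-language pre-training model
    and average the sub-word vectors of each word back to word level."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (num_subwords, dim)
    word_ids = enc.word_ids(0)                       # maps each sub-word to its word index
    embs = []
    for l in range(len(words)):
        rows = [i for i, w in enumerate(word_ids) if w == l]
        embs.append(hidden[rows].mean(dim=0))        # average the sub-word vectors
    return torch.stack(embs)                         # (|x_j|, dim)
```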
In this embodiment, during training, obtaining the word embedding vector of the j-th sentence x_j in the training corpus specifically comprises the following steps:
Sentence x_j is input into each of the cross-language pre-training models respectively to obtain different word embedding vectors, where the word embedding vector obtained after inputting sentence x_j into the i-th cross-language pre-training model M_i is denoted accordingly; in the formula, the corresponding element represents the word embedding of the l-th word w_l in sentence x_j obtained by the cross-language pre-training model M_i, and |x_j| represents the total number of words in sentence x_j.
The different word embedding vectors obtained for sentence x_j are fused through the word-level attention mechanism to obtain the final word embedding vector E_j, where e_l is the embedding of the l-th word in the word embedding vector E_j, computed as:
In the formula, v_a, W_a and b_a are trainable parameters, and v_a^T denotes the transpose of v_a.
In the word-level attention mechanism provided by the invention, the weights of different dimensions of different words differ; for example, an emotion-expressing word focuses on emotion information, while a target word focuses on entity information.
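The exact fusion formula is given by the patent's equation above (not reproduced in this text); as a hedged illustration, the following PyTorch sketch scores each model's embedding of a word with v_a^T tanh(W_a e + b_a) and softmax-normalises the scores over the models — one plausible reading of the word-level attention mechanism, with all class and variable names being illustrative.

```python
import torch
import torch.nn as nn

class WordLevelAttentionFusion(nn.Module):
    """Fuse word embeddings produced by several cross-language pre-training models
    into one embedding per word (sketch of the word-level attention mechanism)."""
    def __init__(self, dim: int):
        super().__init__()
        self.W_a = nn.Linear(dim, dim)             # W_a and bias b_a
        self.v_a = nn.Linear(dim, 1, bias=False)   # v_a^T

    def forward(self, embs: torch.Tensor) -> torch.Tensor:
        # embs: (num_models, seq_len, dim), one embedding per model and per word
        scores = self.v_a(torch.tanh(self.W_a(embs)))   # (num_models, seq_len, 1)
        alpha = torch.softmax(scores, dim=0)            # attention weights over the models
        return (alpha * embs).sum(dim=0)                # fused E_j: (seq_len, dim)
```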
Step S1012, adversarial word embedding: the word-level attention mechanism obtains semantically rich word representations; however, typological and semantic differences between the source language and the target language make the cross-language transfer model unstable. Therefore, to improve the robustness of word embedding, the invention applies adversarial training to the input word embedding space of the cross-language transfer. Existing research has shown that adversarial training is a regularization technique that improves robustness by adding small perturbations to the input.
Specifically, during training, let the perturbation for sentence x_j be given, where r_l denotes the perturbation for the word embedding e_l of the l-th word in sentence x_j. The word embedding of sentence x_j after adding the perturbation r_j is denoted accordingly, wherein:
s.t. ||r_j|| < ε
in the formula, ℓ(·) denotes the loss for the j-th sample, ||·|| denotes the L1 norm, and ε is the parameter used to control the degree of perturbation.
wherein g is the concatenation of the |x_j| vectors g_l, g_l denotes the gradient of the loss with respect to the word embedding e_l, and ||·||_2 denotes the L2 norm.
Based on the adversarial perturbation, the adversarial training minimizes the worst-case (maximum) adversarial loss, thereby obtaining the worst-case perturbation for sentence x_j. The setup of the adversarial training is as follows:
During training, the perturbation is added to sentence x_j to obtain the perturbed word embedding vector.
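An illustrative sketch of this perturbation step, assuming PyTorch; the FGM-style normalisation ε·g/||g||_2 follows the description above, while the function name is hypothetical.

```python
import torch

def adversarial_perturbation(loss: torch.Tensor,
                             embeddings: torch.Tensor,
                             epsilon: float = 1.0) -> torch.Tensor:
    """Approximate the worst-case perturbation r_j = epsilon * g / ||g||_2,
    where g is the gradient of the sentence loss with respect to the fused
    word embeddings (embeddings must have requires_grad=True)."""
    g = torch.autograd.grad(loss, embeddings, retain_graph=True)[0]
    r = epsilon * g / (g.norm(p=2) + 1e-12)   # L2-normalise and scale by epsilon
    return r.detach()

# usage sketch: E_adv = E_j + adversarial_perturbation(loss_j, E_j)
```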
Step S102, constructing and training a syntactic GCN encoder: the adversarial embedding adapter focuses on learning a rich and robust distributed representation, while explicit knowledge is ignored. Therefore, in order to learn a cross-language structural representation, the invention introduces a syntactic GCN encoder, which integrates the dependency parse tree into cross-language structured emotion analysis, since the syntactic parse tree plays a crucial role in structured extraction. As shown in fig. 1, it can be found from the syntax tree that the holder "long-level observer group of the south african community" and the target "muga president" lie on the same subtree. Furthermore, in the parse tree the distance between the expression and the target (or holder) is closer than in the sentence. All this shows that the model can learn the structural relationships between targets, holders and expressions from words that are close on the parse tree. Second, two sentences with similar semantics have similar tree structures across languages, because they are annotated according to the same linguists' syntactic rules. Thus, the present invention proposes a syntactic GCN encoder to model explicit structural knowledge for cross-language migration.
During training, the training corpus is first obtained, a graph is constructed for each sentence based on its syntactic parse tree, then the degree matrix of the graph is calculated, and the syntactic GCN encoder is obtained from the graph and the degree matrix. The perturbed word embedding vectors obtained in step S101 are then input into the syntactic GCN encoder to obtain a structured representation in a unified space, thereby obtaining an informative and robust structured hidden representation.
Wherein, during training, this embodiment obtains the relation set of the syntactic parse tree of sentence x_j in the training corpus through the open-source tool Stanza, denoted as E_j. A graph G_j = (V_j, E_j) is constructed for sentence x_j, where v_l is the l-th node and corresponds to the l-th word w_l in sentence x_j. Based on the graph G_j, an adjacency matrix A is established: if there is a connecting edge between node v_m and node v_n in the graph G_j, then A_mn = 1, otherwise A_mn = 0, where A_mn is the element in the m-th row and n-th column of the adjacency matrix A. The degree matrix D of the graph G_j is then obtained with D_mm = Σ_n A_mn, where D_mm is the element in the m-th row and m-th column of the degree matrix D and all off-diagonal elements of D are 0.
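A sketch of building the adjacency matrix A and the degree matrix D from a Stanza dependency parse (assuming the stanza package is installed and its models downloaded; the pipeline configuration is illustrative):

```python
import torch
import stanza

# stanza.download("en") may be required once before creating the pipeline
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse")

def build_graph(sentence: str):
    """Return the adjacency matrix A and degree matrix D of the dependency tree."""
    sent = nlp(sentence).sentences[0]
    n = len(sent.words)
    A = torch.zeros(n, n)
    for word in sent.words:
        if word.head > 0:                          # word.head is 1-based; 0 means root
            A[word.id - 1, word.head - 1] = 1.0    # undirected edge between the
            A[word.head - 1, word.id - 1] = 1.0    # dependent and its head
    D = torch.diag(A.sum(dim=1))                   # D_mm = sum_n A_mn
    return A, D
```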
Based on the graph G_j, the adjacency matrix A and the degree matrix D, the syntactic GCN encoder for sentence x_j is constructed; the syntactic GCN encoder has P+1 graph convolution layers in total, and the hidden representation H^(p) of the p-th graph convolution layer is learned from the hidden representation of its adjacent (p-1)-th graph convolution layer; the hidden representation of the l-th word w_l of sentence x_j at the p-th layer is an element of H^(p), which is computed as follows:
The perturbed word embedding vector of sentence x_j obtained in step S101 is input into the syntactic GCN encoder, and the structured representation in a unified space is obtained from the (P+1)-th layer, which gives the final hidden-layer representation of each word w_l in sentence x_j; in this way an informative and robust structured hidden representation is obtained.
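Since the propagation formula itself appears only in the figure, the following PyTorch sketch uses a standard degree-normalised GCN update with self-loops as a stand-in; it is an assumed form, not the patent's exact equation.

```python
import torch
import torch.nn as nn

class SyntacticGCNLayer(nn.Module):
    """One graph convolution layer over the dependency graph (illustrative rule:
    H^(p) = ReLU(D^-1 (A + I) H^(p-1) W), an assumed stand-in formulation)."""
    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Linear(dim, dim)

    def forward(self, H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        A_hat = A + torch.eye(A.size(0), device=A.device)   # add self-loops
        A_norm = A_hat / A_hat.sum(dim=1, keepdim=True)     # row-normalise: D^-1 (A + I)
        return torch.relu(self.W(A_norm @ H))               # next-layer hidden states
```

Stacking P+1 such layers on top of the perturbed word embeddings would yield the representation H^(P) used by the decoder.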
Step S103, constructing and training a decoder: based on the informative and robust structured hidden representation, the invention adopts a simple decoding strategy for the four subtasks. First, the invention extracts viewpoint words by predicting the start and end positions of the viewpoint words; these viewpoint words are treated as the trigger words of each viewpoint. The invention then extracts the target and the holder, and predicts the emotional polarity for a given expression.
Step S103 specifically includes the following steps:
step S1031, viewpoint word extraction: to extract the viewpoint words in a sentence, during training the invention uses two binary classifiers to predict the probability that the l-th word w_l in sentence x_j is the start position of the viewpoint word e_k, or the probability that it is the end position, as shown in the following formula:
In the formula, CE(·) represents the cross-entropy function; the labels are the gold start and end positions of the sample viewpoint word e_k; if l is the start or end position of the viewpoint word e_k, the corresponding label equals 1.
In the prediction process, if the predicted start probability is greater than 0.5, then l is the start position of the viewpoint word e_k; if the predicted end probability is greater than 0.5, then l is the end position of the viewpoint word e_k.
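An illustrative sketch of the two binary classifiers and the 0.5 decision threshold (assuming PyTorch; training would use binary cross-entropy against the start/end labels described above, and the class and function names are illustrative):

```python
import torch
import torch.nn as nn

class SpanPointer(nn.Module):
    """Two binary classifiers over the GCN hidden states: one scores each word as a
    start position and one as an end position of a viewpoint word (sketch)."""
    def __init__(self, dim: int):
        super().__init__()
        self.start = nn.Linear(dim, 1)
        self.end = nn.Linear(dim, 1)

    def forward(self, H: torch.Tensor):
        # H: (seq_len, dim) hidden representations from the syntactic GCN encoder
        p_start = torch.sigmoid(self.start(H)).squeeze(-1)   # (seq_len,)
        p_end = torch.sigmoid(self.end(H)).squeeze(-1)
        return p_start, p_end

def decode_spans(p_start, p_end, threshold=0.5):
    """Positions whose probability exceeds 0.5 are taken as start/end points."""
    starts = (p_start > threshold).nonzero(as_tuple=True)[0].tolist()
    ends = (p_end > threshold).nonzero(as_tuple=True)[0].tolist()
    return starts, ends
```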
Step S1032, target word extraction: to take the viewpoint information into account when extracting the target, during training the invention predicts, from the viewpoint word representation, the probability that the l-th word w_l in sentence x_j is the start position of the target t_k, or the probability that it is the end position, as shown in the following formula:
In the formula, the weight matrices are learnable parameters; the two hidden-layer representations are, respectively, the hidden-layer representation produced by the syntactic GCN encoder for the word at the start position of the viewpoint word e_k and for the word at its end position; [a; b] denotes the concatenation of a and b.
In the formula, the labels are the gold start and end positions of the sample target t_k; if l is the start or end position of the target t_k, the corresponding label equals 1.
In the prediction process, if the predicted start probability is greater than 0.5, then l is the start position of the target t_k; if the predicted end probability is greater than 0.5, then l is the end position of the target t_k.
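A hedged sketch of this conditioning: each word representation is concatenated with the hidden states of the viewpoint word's start and end positions before the start/end classifiers are applied (the layer sizes and names are assumptions):

```python
import torch
import torch.nn as nn

class ConditionedSpanPointer(nn.Module):
    """Predict target (or holder) start/end positions conditioned on the viewpoint
    word e_k, by concatenating [H_l ; h_start ; h_end] for every word (sketch)."""
    def __init__(self, dim: int):
        super().__init__()
        self.start = nn.Linear(3 * dim, 1)
        self.end = nn.Linear(3 * dim, 1)

    def forward(self, H: torch.Tensor, e_start: int, e_end: int):
        # H: (seq_len, dim); e_start / e_end index the viewpoint word's boundaries
        cond = torch.cat([H[e_start], H[e_end]], dim=-1)       # (2*dim,)
        cond = cond.unsqueeze(0).expand(H.size(0), -1)         # broadcast to all words
        feats = torch.cat([H, cond], dim=-1)                   # [H_l ; h_start ; h_end]
        p_start = torch.sigmoid(self.start(feats)).squeeze(-1)
        p_end = torch.sigmoid(self.end(feats)).squeeze(-1)
        return p_start, p_end
```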
Step S1033, holder extraction: as with target word extraction, during training the invention predicts, from the viewpoint word representation, the probability that the l-th word w_l in sentence x_j is the start position of the holder h_k, or the probability that it is the end position, as shown in the following formula:
In the formula, the labels are the gold start and end positions of the sample holder h_k; if l is the start or end position of the holder h_k, the corresponding label equals 1.
In the prediction process, if the predicted start probability is greater than 0.5, then l is the start position of the holder h_k; if the predicted end probability is greater than 0.5, then l is the end position of the holder h_k.
Step S1034, emotion polarity prediction:
Finally, the invention predicts the emotional polarity p_k associated with the viewpoint word e_k. During training, the invention uses max pooling to obtain the sentence representation r_s = Maxpooling(H^(P)) of sentence x_j, which is concatenated with the viewpoint word representation for polarity classification, giving the probability that sentence x_j expresses the emotional polarity p_k:
Based on the emotional probability distribution, the loss function designed by the invention is as follows:
In the formula, the label is that of the sample emotional polarity p_k.
It should be noted that, since the viewpoint word expresses the emotion of the tuple (target t_k, viewpoint word e_k and holder h_k), the invention incorporates the viewpoint word e_k to predict the emotion.
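An illustrative polarity classifier following the description above (assuming PyTorch; the number of polarity classes and the use of a mean over the viewpoint span as its representation are assumptions):

```python
import torch
import torch.nn as nn

class PolarityClassifier(nn.Module):
    """Classify emotional polarity from the max-pooled sentence representation r_s
    concatenated with the viewpoint word representation (sketch)."""
    def __init__(self, dim: int, num_polarities: int = 3):
        super().__init__()
        self.cls = nn.Linear(2 * dim, num_polarities)

    def forward(self, H: torch.Tensor, e_start: int, e_end: int) -> torch.Tensor:
        r_s = H.max(dim=0).values                  # r_s = Maxpooling(H^(P))
        r_e = H[e_start:e_end + 1].mean(dim=0)     # viewpoint representation (assumed mean over its span)
        logits = self.cls(torch.cat([r_s, r_e], dim=-1))
        return torch.log_softmax(logits, dim=-1)   # trained with the cross-entropy loss above
```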
Finally, through the decoder, the method extracts all viewpoint tuples contained in the sentence, where each viewpoint tuple contains the holder, the target, the viewpoint word and the emotion polarity.
Step S104, for any sentence x obtained in real time, the trained adversarial embedding adapter is used to obtain the perturbed word embedding vector, which is input into the trained syntactic GCN encoder to obtain the hidden-layer representation of each word in sentence x; finally, the trained decoder is used to extract all viewpoint tuples contained in sentence x.
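Putting the pieces together, the following sketch composes the trained components at inference time for step S104; all function and class names refer to the illustrative sketches above and are assumptions rather than the patent's reference implementation.

```python
import torch

def analyze(words, embed_models, fusion, gcn_layers,
            opinion_head, target_head, holder_head, polarity_head):
    """Sketch of step S104: embed, fuse, encode with the syntactic GCN encoder,
    then decode viewpoint tuples (holder, target, viewpoint word, polarity)."""
    embs = torch.stack([word_embeddings(words, m) for m in embed_models])
    E = fusion(embs)                                  # output of the embedding adapter
    A, _ = build_graph(" ".join(words))               # assumes matching tokenization
    H = E
    for layer in gcn_layers:                          # syntactic GCN encoder
        H = layer(H, A)
    tuples = []
    o_starts, o_ends = decode_spans(*opinion_head(H))
    for s, e in zip(o_starts, o_ends):                # naive start/end pairing for the sketch
        t_span = decode_spans(*target_head(H, s, e))
        h_span = decode_spans(*holder_head(H, s, e))
        polarity = polarity_head(H, s, e).argmax().item()
        tuples.append((h_span, t_span, (s, e), polarity))
    return tuples
```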
Aiming at the problem of scarce annotated data for structured sentiment analysis, the invention provides a knowledge-enhanced cross-language structured migration method. The invention combines recent techniques such as graph convolutional networks and adversarial models, and builds a cross-language, cross-form structured extraction model according to the characteristics of the structured extraction task. The invention mainly differs from traditional structured extraction methods in the following two aspects:
(1) The adversarial embedding adapter:
The goal is to learn word embeddings that are informative and robust for cross-language transfer. First, a word-level attention mechanism is designed to capture the important implicit distributed semantics of multiple embeddings pre-trained on different corpora with different training strategies and tasks. Then, an adversarial training strategy is employed to improve the robustness of the embeddings.
(2) GCN-based structured representation learning:
The adversarial embedding adapter focuses on learning a rich and robust distributed representation, while explicit knowledge is ignored. Therefore, modeling explicit structural information with a GCN is a natural fit for structured emotion analysis.
In the present invention, a model for cross-language structured sentiment analysis is presented. An adversarial embedding adapter is designed to learn informative and robust embeddings. Then, a syntactic GCN encoder is introduced to learn a structural representation based on the parse tree. We compare the model with supervised and unsupervised baselines on five datasets in four languages. Experimental results show that the model has a clear advantage in cross-language migration. Ablation studies are also conducted to demonstrate the effectiveness of each module in the model.
(1) Knowledge-enhanced cross-language structured sentiment analysis model performance
The present invention is directed to the use of knowledge learned in one language to improve generalization ability in another language. To verify the validity of the model, it is evaluated for cross-language migration, trained on the source language, and tested on the target language without the tagged data. Twenty-five migration tasks were performed on five data sets in four low-resource languages, and the model of the present invention was compared to supervised and unsupervised baselines, as shown in table 1 below.
Table 1 is an evaluation table of the results of the experiment
From this table, the following observations can be made. The migration model of the invention performs markedly better than the unsupervised baselines in cross-language structured emotion analysis. In particular, the model of the invention achieves the best performance on all metrics across the datasets, and the target F1 is improved by more than three points on three of the datasets. All this shows that the informative structural representation of the invention helps the model migrate structured sentiment between different languages.
(2) Robust embedded encoder and GCN-based structured representation learning effectiveness
To investigate the effectiveness of each module composing the model of the present invention, ablation tests were performed on five datasets, as shown in Table 2 below. The average score over all source datasets except the target dataset is reported. Specifically, the invention removes the adversarial embedding adapter (-AEA) and the syntactic GCN encoder (-SGCNE), respectively, from the model.
Table 2 shows the results of the ablation experiment
The results indicate that both the adversarial embedding adapter and the syntactic GCN encoder are important to this task. In particular, the adversarial embedding adapter can capture diverse underlying features from the various multilingual embeddings, which contain different semantic information. It learns informative embeddings through the attention mechanism and improves robustness through the adversarial training strategy; the resulting word embeddings improve cross-language migration performance well. At the same time, the structural representation learned by the syntactic GCN encoder further improves the model, since the dependency parse tree is crucial for structured sentiment.
Claims (4)
1. A cross-language structured emotion analysis method based on knowledge enhancement, characterized in that a training corpus of a source language is adopted, whose labels are sets of viewpoint tuples, where the number of samples and the number of viewpoint tuples contained in each sample are given; in a set of viewpoint tuples, the k-th viewpoint tuple o_k = (h_k, t_k, e_k, p_k) represents that the holder h_k of the k-th viewpoint tuple o_k expresses, through the viewpoint word e_k, the emotional polarity p_k towards the target t_k, wherein: the holder h_k, the target t_k and the viewpoint word e_k are each a sub-string of the j-th sentence x_j in the training corpus, defined by their start positions in sentence x_j, the words at those start positions, their end positions in sentence x_j, and the words at those end positions; the cross-language structured emotion analysis method specifically comprises the following steps:
S101, constructing and training an adversarial embedding adapter: when constructing the adversarial embedding adapter, a word-level attention mechanism is designed to capture the important implicit distributed semantics of multiple embeddings pre-trained on different corpora with different training strategies and tasks, and an adversarial training strategy is then adopted to improve the robustness of word embedding; when training the adversarial embedding adapter, a set of cross-language pre-training models is obtained, and each sentence in the training corpus is input into every cross-language pre-training model in the set so as to obtain word embedding vectors for each sentence; for any sentence in the training corpus, the word embedding vectors obtained from the cross-language pre-training models are fused through a word-level attention mechanism to obtain the final word embedding vector corresponding to each sentence, and finally a perturbation is added to the final word embedding vector;
step S102, constructing and training a grammar GCN encoder:
obtaining the training corpus, constructing a graph for each sentence based on its syntactic parse tree, then calculating the degree matrix of the graph, and obtaining the syntactic GCN encoder from the graph and the degree matrix; the perturbed word embedding vectors obtained in step S101 are input into the syntactic GCN encoder to obtain a structured representation in a unified space, thereby obtaining an informative and robust structured hidden representation;
step S103, constructing and training a decoder:
based on the informative and robust structured hidden representation, extracting viewpoint words by predicting the start and end positions of the viewpoint words, and regarding the viewpoint words as the trigger words of each viewpoint; then, extracting the target and the holder, and predicting the emotional polarity of the given expression;
step S104, for any sentence x obtained in real time, the trained adversarial embedding adapter is used to obtain the perturbed word embedding vector, which is input into the trained syntactic GCN encoder to obtain the hidden-layer representation of each word in sentence x; finally, the trained decoder is used to extract all viewpoint tuples contained in sentence x.
2. The knowledge-enhancement-based cross-language structured emotion analysis method of claim 1, wherein the step S101 specifically comprises the steps of:
step S1011, obtaining the word embedding vector of the j-th sentence x_j in the training corpus, comprising the following steps:
sentence x_j is input into each of the cross-language pre-training models respectively to obtain different word embedding vectors, where the word embedding vector obtained after inputting sentence x_j into the i-th cross-language pre-training model M_i is denoted accordingly; in the formula, the corresponding element represents the word embedding of the l-th word w_l in sentence x_j obtained by the cross-language pre-training model M_i, and |x_j| represents the total number of words in sentence x_j;
the different word embedding vectors obtained for sentence x_j are fused through the word-level attention mechanism to obtain the final word embedding vector E_j, where e_l is the embedding of the l-th word in the word embedding vector E_j, computed as:
in the formula, v_a, W_a and b_a are trainable parameters, and v_a^T denotes the transpose of v_a;
step S1012, let the perturbation for sentence x_j be given, where r_l denotes the perturbation for the word embedding e_l of the l-th word in sentence x_j; the word embedding of sentence x_j after adding the perturbation r_j is denoted accordingly, wherein:
wherein g is the concatenation of the |x_j| vectors g_l, g_l denotes the gradient of the loss with respect to the word embedding e_l, ||·||_2 denotes the L2 norm, ℓ(·) denotes the loss for the j-th sample, and ε is the parameter used to control the degree of perturbation;
based on the adversarial perturbation, the adversarial training minimizes the worst-case (maximum) adversarial loss, thereby obtaining the worst-case perturbation for sentence x_j; the setup of the adversarial training is as follows:
3. The knowledge-enhancement-based cross-language structured emotion analysis method of claim 1, wherein the step S102 comprises the steps of:
during training, the relation set of the syntactic parse tree of sentence x_j in the training corpus is obtained and denoted as E_j; a graph G_j = (V_j, E_j) is constructed for sentence x_j, where v_l is the l-th node and corresponds to the l-th word w_l in sentence x_j; based on the graph G_j, an adjacency matrix A is established: if there is a connecting edge between node v_m and node v_n in the graph G_j, then A_mn = 1, otherwise A_mn = 0, where A_mn is the element in the m-th row and n-th column of the adjacency matrix A; the degree matrix D of the graph G_j is then obtained with D_mm = Σ_n A_mn, where D_mm is the element in the m-th row and m-th column of the degree matrix D and all off-diagonal elements of D are 0.
Based on the graph G_j, the adjacency matrix A and the degree matrix D, the syntactic GCN encoder for sentence x_j is constructed; the hidden representation H^(p) of the p-th graph convolution layer is learned from the hidden representation of its adjacent (p-1)-th graph convolution layer; the hidden representation of the l-th word w_l of sentence x_j at the p-th layer is an element of H^(p), which is computed as follows:
The perturbed word embedding vector of sentence x_j obtained in step S101 is input into the syntactic GCN encoder, and the structured representation in a unified space is obtained from the (P+1)-th layer, which gives the final hidden-layer representation of each word w_l in sentence x_j; in this way an informative and robust structured hidden representation is obtained.
4. The knowledge-enhancement-based cross-language structured emotion analysis method of claim 3, wherein the step S103 specifically comprises the steps of:
step S1031, viewpoint word extraction: to extract the viewpoint words in a sentence, during training, two binary classifiers are used to predict the probability that the l-th word w_l in sentence x_j is the start position of the viewpoint word e_k, or the probability that it is the end position, as shown in the following formula:
in the formula, CE(·) represents the cross-entropy function; the labels are the gold start and end positions of the sample viewpoint word e_k, and if l is the start or end position of the viewpoint word e_k, the corresponding label is equal to 1;
step S1032, target word extraction: to take the viewpoint information into account when extracting the target, during training, the probability that the l-th word w_l in sentence x_j is the start position of the target t_k, or the probability that it is the end position, is predicted from the viewpoint word representation, as shown in the following formula:
in the formula, the weight matrices are learnable parameters; the two hidden-layer representations are, respectively, the hidden-layer representation produced by the syntactic GCN encoder for the word at the start position of the viewpoint word e_k and for the word at its end position; [a; b] denotes the concatenation of a and b;
in the formula, the labels are the gold start and end positions of the sample target t_k, and if l is the start or end position of the target t_k, the corresponding label is equal to 1;
step S1033, holder extraction: during training, the probability that the l-th word w_l in sentence x_j is the start position of the holder h_k, or the probability that it is the end position, is predicted from the viewpoint word representation, as shown in the following formula:
in the formula, the labels are the gold start and end positions of the sample holder h_k, and if l is the start or end position of the holder h_k, the corresponding label is equal to 1;
step S1034, emotion polarity prediction:
during training, max pooling is used to obtain the sentence representation r_s = Maxpooling(H^(P)) of sentence x_j, which is concatenated with the viewpoint word representation for polarity classification, giving the probability that sentence x_j expresses the emotional polarity p_k:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210423028.0A CN114970557A (en) | 2022-04-21 | 2022-04-21 | Knowledge enhancement-based cross-language structured emotion analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210423028.0A CN114970557A (en) | 2022-04-21 | 2022-04-21 | Knowledge enhancement-based cross-language structured emotion analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114970557A true CN114970557A (en) | 2022-08-30 |
Family
ID=82978789
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210423028.0A Pending CN114970557A (en) | 2022-04-21 | 2022-04-21 | Knowledge enhancement-based cross-language structured emotion analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114970557A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115204183A (en) * | 2022-09-19 | 2022-10-18 | 华南师范大学 | Knowledge enhancement based dual-channel emotion analysis method, device and equipment |
CN115204183B (en) * | 2022-09-19 | 2022-12-27 | 华南师范大学 | Knowledge enhancement-based two-channel emotion analysis method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||