CN115455937A - Negation analysis method based on syntactic structure and contrastive learning - Google Patents

Negation analysis method based on syntactic structure and contrastive learning Download PDF

Info

Publication number
CN115455937A
CN115455937A
Authority
CN
China
Prior art keywords
sentence
negative
word
vector
component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210980687.4A
Other languages
Chinese (zh)
Inventor
庞晓轩
蔡铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210980687.4A priority Critical patent/CN115455937A/en
Publication of CN115455937A publication Critical patent/CN115455937A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a negation analysis method based on syntactic structure and contrastive learning. First, syntactic information from the dependency tree and the constituency tree is introduced, and information interaction is carried out through graph convolutional neural networks; node and edge information is added on top of the graph topology and fused through an attention mechanism. Second, negation cue recognition and negation scope recognition are performed with a conditional random field. Words in a sentence are randomly replaced with negation words of different types, and the different types of negation words are semantically associated through contrastive learning, strengthening the model's ability to recognize difficult negation cues. Finally, a replaced-word detection task is adopted to sharpen the model's sensitivity to negation words. The invention injects syntactic information into the pre-trained model, enhances the recognition of difficult negation cues through contrastive learning, and reaches the state of the art in negation analysis on the standard evaluation datasets.

Description

Negation analysis method based on syntactic structure and contrastive learning
Technical Field
The invention belongs to the field of negation analysis, and in particular relates to a negation analysis method based on syntactic structure and contrastive learning.
Background
The invention addresses the tasks of negation cue recognition (Negation Detection) and negation scope recognition (Negation Scope Detection). Both are essentially sequence labeling tasks over sentences: the negation words in a sentence are labeled, together with the span of influence of each negation word. For example, given the sentence "I owe him nothing, and his friends are not mine", two negation cues must be predicted, namely "nothing" and "not", along with their corresponding negation scopes, namely "I owe him" and "his friends are … mine".
Earlier work identified the positions of negation cues and their scopes with rule-based models built from data analysis, or used sequential features to strengthen the link between preceding and following context in word vectors. In recent years, more and more negation analysis work has turned to pre-trained models with stronger linguistic representation capability. However, current approaches describe the position of the negation cue and its scope with a single classification label output by the pre-trained model, which often ignores the rich syntactic structure of the negation sentence. In addition, current approaches do not consider the diversity of negation expressions: the same negation analysis model is applied to different forms of negation such as determiner negation words, pronoun negation words and conjunction negation words, and this simplification inevitably discards part of the negation-related semantic information.
To address these two problems, the invention proposes a negation analysis method based on syntactic structure and contrastive learning, which injects syntactic information into a pre-trained model and semantically associates different types of negation words through contrastive learning, bringing negation analysis performance to the state of the art.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a negation analysis method based on syntactic structure and contrastive learning.
The object of the invention is achieved through the following technical solution: a negation analysis method based on syntactic structure and contrastive learning, comprising the following steps:
(1) Replacing a word in an original sentence with negation words of different types, and also replacing it with a synonym and an antonym, to obtain several replacement sentences of the sentence, each replacement sentence forming a sentence group together with the original sentence; assembling the resulting sentence groups into a negation replacement dataset, and encoding each sentence in the dataset into a vector using the transformer-based BERT encoder;
(2) Inputting the vectors obtained in step (1) into a contrastive learning model, which uses a contrastive loss function to pull the BERT vectors of sentence pairs with similar meanings in a sentence group closer together and push the BERT vectors of sentence pairs with opposite meanings further apart; inputting the vectors obtained in step (1) into a replaced-word detection model, which predicts whether a word was replaced in step (1), and updating the BERT vectors with the replaced-word detection loss, thereby obtaining an updated BERT encoder;
(3) Inputting a sentence to be analyzed, the position of the negation cue in the sentence and the corresponding negation scope of the cue, performing vector encoding with the updated BERT to obtain a new sentence vector, and using a linear position encoding in place of BERT's segment encoding as the indicator of the negation cue's position in the sentence;
(4) Inputting the new sentence vector obtained in step (3) into a dependency graph convolutional neural network: if a dependency relation exists between any two words in the sentence, the network performs a convolution operation on the two word vectors and updates them, the amount of updated information being determined by the embedding matrix of the dependency relation and a gating mechanism; inputting the new sentence vector obtained in step (3) into a constituent graph convolutional neural network: first encoding the constituent labels through a constituent encoding layer, averaging word vectors into the constituent-label encodings, then inputting the encoded vectors into the constituent graph convolutional layer for the convolution operation, and finally returning the constituent vectors to the words through an attention mechanism in a constituent decoding layer;
(5) Concatenating the last hidden-layer vector of the dependency graph convolutional neural network and the constituent decoding layer vector from step (4) to obtain a sentence vector injected with syntactic information, and then performing sequence labeling.
Further, the negation word types in step (1) include determiners, adjectives, adverbs and verbs.
Further, pulling closer the BERT vectors of sentence pairs with similar meanings in the sentence group and pushing apart the BERT vectors of sentence pairs with opposite meanings in step (2) specifically means: the sentence vectors in the two sentence pairs formed by the original sentence with the synonym replacement sentence, and by the antonym replacement sentence with the determiner replacement sentence, are each pulled toward the other sentence vector in the pair; the sentence vectors in the other four sentence pairs are each pushed in the direction opposite to the other sentence vector in the pair.
Further, the vector encoding in step (3) refers to the encoding of BERT's last hidden layer, on top of which a bidirectional long short-term memory network layer is introduced to enhance the model's sequence modeling capability, with the number of layers set to 2 and the hidden dimension set to 100.
Further, the topology of the dependency graph in step (4) is obtained with the Stanford parser, the number of graph convolutional layers is set to 4, and the hidden dimension is set to 100; self-loops and a relational graph convolutional mechanism are adopted to capture the dependency relations between words more accurately, with the edge-relation embedding dimension set to 30.
Further, the topology of the constituency graph in step (4) is obtained with the Stanford parser; the encoding layer averages word information into constituent information and introduces constituent embeddings, of dimension 30, so that the model can accurately capture constituent syntactic information; the number of graph convolutional layers is set to 4 and the hidden dimension is set to 100; the decoding layer is realized by an attention mechanism that returns constituent information to the words.
Further, the sequence labeler in step (5) consists of a conditional random field and a normalized exponential function (softmax) classifier.
The invention has the following beneficial effects:
the method can inject syntactic information into the pre-training model and fully consider the diversity of negative words, and the effect on the standard data set of SEM-2012 Share Task CD-SCO negative word recognition and negative scope recognition reaches the advanced level of the current research.
Drawings
FIG. 1 is a structural diagram of the syntactic-structure-based graph convolutional neural network model of the present invention;
FIG. 2 is a schematic diagram of the structure of the contrastive learning model of the present invention.
Detailed Description
The following specific examples further describe the technical solution of the present invention; the invention is not, however, limited to these examples.
The invention provides a negation analysis method based on syntactic structure and contrastive learning, implemented through the following steps:
(1) Randomly replace a word in a sentence with negation words of different types, and also replace it with a synonym and an antonym, obtaining several replacement sentences; these replacement sentences and the original sentence form a sentence group. For example, for the sentence "He was polite and professional", the word "polite" is replaced with the determiner-type negation word "not polite", the synonym "courteous", and the antonym "impolite", and these sentences together form a sentence group. The many sentence groups are assembled into a negation replacement dataset. Each sentence in the dataset is then encoded into vector form using the Bidirectional Encoder Representations from Transformers (BERT). The negation word types include determiners, adjectives, adverbs and verbs.
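Step (1) can be illustrated with a small helper that builds one sentence group. This is a sketch, not the patent's implementation: the replacement words come from the example above, and the naive `str.replace` stands in for whatever lexicon-driven replacement procedure is actually used.

```python
def build_sentence_group(sentence, target, negation, synonym, antonym):
    """Return the original sentence plus its three replacement variants."""
    return {
        "original": sentence,
        "negation": sentence.replace(target, negation),
        "synonym": sentence.replace(target, synonym),
        "antonym": sentence.replace(target, antonym),
    }

# The example sentence group from the description above.
group = build_sentence_group(
    "He was polite and professional",
    target="polite", negation="not polite",
    synonym="courteous", antonym="impolite",
)
```

Many such groups, one per sampled word, would then be pooled into the negation replacement dataset before BERT encoding.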
(2) Input the sentence vectors obtained in step (1) into the contrastive learning model, using a contrastive loss function to pull the BERT vectors of sentence pairs with similar meanings in the sentence group closer together and push apart the BERT vectors of sentence pairs with opposite meanings. Specifically, the sentence vectors in the two sentence pairs formed by the original sentence with the synonym replacement sentence, and by the antonym replacement sentence with the determiner replacement sentence, are each pulled toward the other sentence vector in the pair; the sentence vectors in the other four sentence pairs in FIG. 2 are each pushed in the direction opposite to the other sentence vector in the pair.
(3) Input the sentence vectors obtained in step (1) into the replaced-word detection model, which predicts whether a word was replaced in step (1); the replaced-word detection loss is used to update the BERT vectors. For example, for the example sentence "He was polite and professional" above, the replaced-word detection model predicts "polite" as the replaced word; the model uses context-dependent encoding information taken from the last hidden-layer encoding of BERT.
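The replaced-word detection of step (3) amounts to a per-token binary classification head over BERT's last hidden states. The minimal NumPy head below is a hypothetical sketch of that idea, not the patent's actual architecture:

```python
import numpy as np

def replaced_word_probs(hidden, w, b):
    # hidden: (n_tokens, d) last-hidden-layer vectors from BERT.
    # A single linear layer plus sigmoid gives, per token, the
    # probability that the token was replaced in step (1).
    logits = hidden @ w + b
    return 1.0 / (1.0 + np.exp(-logits))
```

Training this head with a binary cross-entropy loss on the replacement labels, and backpropagating into the encoder, is what "updating the BERT vectors with the replaced-word detection loss" refers to.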
(4) Through steps (2) and (3), the BERT encoder is updated; the updated encoder is used in the subsequent steps.
(5) Given an input sentence to be analyzed, the position of the negation cue in the sentence and the corresponding negation scope of the cue, encode with the updated BERT to obtain the sentence embedding. A linear position embedding (Position Embedding) is adopted in place of BERT's segment encoding as the indicator of the negation cue's position in the sentence.
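One common way to realize the linear position encoding of step (5) is to give each token its signed offset from the negation cue and embed that integer. The patent does not specify the exact scheme, so the helper below is an assumption:

```python
def cue_position_ids(n_tokens, cue_index):
    # Signed distance of every token from the negation cue. These integers
    # would index a learned position-embedding table, replacing BERT's
    # two-valued segment ids as the cue-position indicator.
    return [i - cue_index for i in range(n_tokens)]

# For "You can not move" with the cue "not" at index 2:
ids = cue_position_ids(4, 2)
```

Unlike segment ids, which only split a sentence into two regions, these offsets encode how far each word sits from the cue in either direction.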
(6) Feed the sentence embedding obtained in step (5) into the dependency graph convolutional neural network (Dependency GCN): if a dependency relation exists between any two words in the sentence, the Dependency GCN performs a convolution operation on the two word vectors and updates them, the amount of updated information being determined by the embedding matrix of the dependency relation and a gating mechanism. In the sentence "You can not move", the words "You", "can", "not" and "move" are connected by dependencies of the three types "nsubj", "aux" and "neg", as shown in FIG. 1; the Dependency GCN performs the convolution operation and vector update on mutually dependent words according to the type of the dependency.
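A single relation-aware, gated Dependency GCN layer along the lines of step (6) might look like the NumPy sketch below. The exact message and gate forms are simplifications of the patent's "embedding matrix plus gating mechanism" description:

```python
import numpy as np

def dep_gcn_layer(H, edges, rel_emb, W, w_gate):
    # H: (n, d) word vectors; edges: (head, dependent, relation_id) triples
    # from the dependency parse. Each edge passes a message built from the
    # head's transformed vector plus the relation embedding (e.g. for
    # "nsubj" or "neg"), scaled by a scalar sigmoid gate.
    out = H.copy()
    for h, d, r in edges:
        gate = 1.0 / (1.0 + np.exp(-(H[h] @ w_gate)))   # gate in (0, 1)
        out[d] = out[d] + gate * (H[h] @ W + rel_emb[r])
    return out
```

Stacking four such layers with hidden dimension 100 and 30-dimensional relation embeddings, plus self-loop edges, matches the hyperparameters stated in the disclosure.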
(7) Feed the sentence embedding obtained in step (5) into the constituent graph convolutional neural network (Constituent GCN), which consists of a constituent encoding layer, a constituent graph convolutional layer and a constituent decoding layer. First the constituent labels are encoded through the constituent encoding layer, with word vectors averaged into the constituent-label encodings; the encoded vectors are then input into the constituent graph convolutional layer for the convolution operation; finally the constituent decoding layer returns the constituent vectors to the words through an attention mechanism. As shown in FIG. 1, for the sentence "You can not move" the constituency parse is a tree formed by "S", "NP" and "VP", where the children of each constituent in the tree are other constituents or words. The constituent encoding layer propagates vectors up the tree from child to parent, the constituent graph convolutional layer performs the convolution operation and vector update on the constituents according to their types, and the constituent decoding layer returns the constituent vectors to word vectors through a constituent-to-token attention mechanism (C-T Attention).
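The encode and decode halves of the Constituent GCN in step (7) can be sketched as follows. Both functions are illustrative simplifications (they omit the graph convolution in between, and the softmax-attention form of C-T Attention is an assumption):

```python
import numpy as np

def encode_constituent(child_vecs, label_emb):
    # Constituent encoding layer: average the children's vectors and add
    # the embedding of the constituent label (e.g. "NP" or "VP").
    return np.mean(child_vecs, axis=0) + label_emb

def ct_attention(word_vecs, comp_vecs):
    # C-T attention decoding: each word attends over the constituent
    # vectors and receives their softmax-weighted sum, returning
    # constituent information to the word level.
    scores = word_vecs @ comp_vecs.T
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ comp_vecs
```

Encoding runs bottom-up over the parse tree ("You" into "NP", the verb group into "VP", both into "S"); decoding distributes the convolved constituent vectors back to the individual words.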
(8) Concatenate the last hidden-layer vector representation of the Dependency GCN from step (6) and the decoding layer vector representation from step (7) to obtain a sentence representation injected with syntactic information, then perform sequence labeling. As shown in FIG. 1, the labeled sequence for the sentence "You can not move" is "B I O B", where B marks the first word of a negation scope, I marks a word inside or at the end of a negation scope, and O marks a word outside any negation scope.
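Once the conditional random field emits a BIO sequence, the negation scope spans can be recovered with a small decoder. This helper is an illustrative post-processing step, not part of the patent itself:

```python
def bio_to_spans(tags):
    # Recover scope spans (inclusive start/end indices) from a BIO
    # sequence, e.g. ["B", "I", "O", "B"] -> [(0, 1), (3, 3)],
    # matching the "B I O B" labeling of "You can not move".
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:           # close the previous span
                spans.append((start, i - 1))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:                    # span running to sentence end
        spans.append((start, len(tags) - 1))
    return spans
```

For "You can not move" this yields the discontinuous scope "You can … move" around the cue "not".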
The embodiments described above are intended to illustrate rather than limit the invention; any modifications and variations of the present invention fall within the spirit of the invention and the scope of the appended claims.

Claims (7)

1. A negation analysis method based on syntactic structure and contrastive learning, characterized by comprising the following steps:
(1) Replacing a word in an original sentence with negation words of different types, and also replacing it with a synonym and an antonym, to obtain several replacement sentences of the sentence, each replacement sentence forming a sentence group together with the original sentence; assembling the resulting sentence groups into a negation replacement dataset, and encoding each sentence in the dataset into a vector using the transformer-based BERT encoder;
(2) Inputting the vectors obtained in step (1) into a contrastive learning model, which uses a contrastive loss function to pull the BERT vectors of sentence pairs with similar meanings in a sentence group closer together and push the BERT vectors of sentence pairs with opposite meanings further apart; inputting the vectors obtained in step (1) into a replaced-word detection model, which predicts whether a word was replaced in step (1), and updating the BERT vectors with the replaced-word detection loss, thereby obtaining an updated BERT encoder;
(3) Inputting a sentence to be analyzed, the position of the negation cue in the sentence and the corresponding negation scope of the cue, performing vector encoding with the updated BERT to obtain a new sentence vector, and using a linear position encoding in place of BERT's segment encoding as the indicator of the negation cue's position in the sentence;
(4) Inputting the new sentence vector obtained in step (3) into a dependency graph convolutional neural network: if a dependency relation exists between any two words in the sentence, the network performs a convolution operation on the two word vectors and updates them, the amount of updated information being determined by the embedding matrix of the dependency relation and a gating mechanism; inputting the new sentence vector obtained in step (3) into a constituent graph convolutional neural network: first encoding the constituent labels through a constituent encoding layer, averaging word vectors into the constituent-label encodings, then inputting the encoded vectors into the constituent graph convolutional layer for the convolution operation, and finally returning the constituent vectors to the words through an attention mechanism in a constituent decoding layer;
(5) Concatenating the last hidden-layer vector of the dependency graph convolutional neural network and the constituent decoding layer vector from step (4) to obtain a sentence vector injected with syntactic information, and then performing sequence labeling.
2. The negation analysis method based on syntactic structure and contrastive learning according to claim 1, wherein the negation word types in step (1) comprise determiners, adjectives, adverbs and verbs.
3. The negation analysis method based on syntactic structure and contrastive learning according to claim 1, wherein pulling closer the BERT vectors of sentence pairs with similar meanings in the sentence group and pushing apart the BERT vectors of sentence pairs with opposite meanings in step (2) specifically means: the sentence vectors in the two sentence pairs formed by the original sentence with the synonym replacement sentence, and by the antonym replacement sentence with the determiner replacement sentence, are each pulled toward the other sentence vector in the pair; the sentence vectors in the other four sentence pairs are each pushed in the direction opposite to the other sentence vector in the pair.
4. The negation analysis method based on syntactic structure and contrastive learning according to claim 1, wherein the vector encoding in step (3) refers to the encoding of BERT's last hidden layer, on top of which a bidirectional long short-term memory network layer is introduced to enhance the model's sequence modeling capability, with the number of layers set to 2 and the hidden dimension set to 100.
5. The negation analysis method based on syntactic structure and contrastive learning according to claim 1, wherein the topology of the dependency graph in step (4) is obtained with the Stanford parser, the number of graph convolutional layers is set to 4, and the hidden dimension is set to 100; self-loops and a relational graph convolutional mechanism are adopted to capture the dependency relations between words more accurately, with the edge-relation embedding dimension set to 30.
6. The negation analysis method based on syntactic structure and contrastive learning according to claim 1, wherein the topology of the constituency graph in step (4) is obtained with the Stanford parser; the encoding layer averages word information into constituent information and introduces constituent embeddings, of dimension 30, so that the model can accurately capture constituent syntactic information; the number of graph convolutional layers is set to 4 and the hidden dimension is set to 100; the decoding layer is realized by an attention mechanism that returns constituent information to the words.
7. The negation analysis method based on syntactic structure and contrastive learning according to claim 1, wherein the sequence labeler in step (5) consists of a conditional random field and a normalized exponential function (softmax) classifier.
CN202210980687.4A 2022-08-16 2022-08-16 Negation analysis method based on syntactic structure and contrastive learning Pending CN115455937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210980687.4A CN115455937A (en) 2022-08-16 2022-08-16 Negation analysis method based on syntactic structure and contrastive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210980687.4A CN115455937A (en) 2022-08-16 2022-08-16 Negation analysis method based on syntactic structure and contrastive learning

Publications (1)

Publication Number Publication Date
CN115455937A true CN115455937A (en) 2022-12-09

Family

ID=84298886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210980687.4A Pending CN115455937A (en) Negation analysis method based on syntactic structure and contrastive learning

Country Status (1)

Country Link
CN (1) CN115455937A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595975A (en) * 2023-07-17 2023-08-15 四川大学 Aspect-level emotion analysis method for word information enhancement based on sentence information


Similar Documents

Publication Publication Date Title
Zhang et al. SG-Net: Syntax guided transformer for language representation
Meng et al. Formulaic language identification model based on GCN fusing associated information
CN112328797A (en) Emotion classification method and system based on neural network and attention mechanism
Pichl et al. Alquist 2.0: Alexa prize socialbot based on sub-dialogue models
US10628743B1 (en) Automated ontology system
Ruan et al. A sequential neural encoder with latent structured description for modeling sentences
Mathur et al. Doctime: A document-level temporal dependency graph parser
CN115455937A (en) Negation analysis method based on syntactic structure and contrastive learning
Lai Event extraction: A survey
Antony et al. A survey of advanced methods for efficient text summarization
Yi et al. Contextual information and commonsense based prompt for emotion recognition in conversation
Azad et al. Picking pearl from seabed: Extracting artefacts from noisy issue triaging collaborative conversations for hybrid cloud services
Xue et al. A method of chinese tourism named entity recognition based on bblc model
Gao et al. Chinese causal event extraction using causality‐associated graph neural network
Wei et al. Named entity recognition method for educational emergency field based on BERT
CN113010662B (en) Hierarchical conversational machine reading understanding system and method
Dragone Non-Sentential Utterances in Dialogue: Experiments in classification and interpretation
CN108595434B (en) Syntax dependence method based on conditional random field and rule adjustment
Yiming et al. Research on the Construction of Maritime Legal Knowledge Graph
Zhang et al. An improved math word problem (MWP) model using unified pretrained language model (UniLM) for pretraining
Ma et al. Research on Automatic Generation of Social Short Text Based on Backtracking Pattern
Hu Symbolic and Neural Approaches to Natural Language Inference
Liu et al. Text Analysis of Community Governance Case based on Entity and Relation Extraction
Relan et al. A review on abstractive text summarization Methods
Wang et al. English Long and Short Sentence Translation and Recognition Method Based on Deep GLR Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination