CN111160027A - Recurrent neural network event temporal relation identification method based on semantic attention

Recurrent neural network event temporal relation identification method based on semantic attention

Info

Publication number
CN111160027A
Authority
CN
China
Prior art keywords
vector
event
trigger word
trigger
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911335582.8A
Other languages
Chinese (zh)
Inventor
徐小良
高通
王宇翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201911335582.8A
Publication of CN111160027A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a recurrent neural network event temporal relation identification method based on semantic attention, which mainly comprises the following steps: first, syntactic dependency analysis is performed on the input event sentence and the semantic dependency branch related to the trigger word is extracted; a recurrent neural network is then used to obtain the corresponding hidden state vectors. Next, attention weight vectors are calculated for all tokens other than the trigger word, the tokens are fused according to their different weights, and the result is spliced with the trigger word vector to obtain the event sentence state vector. Finally, the event sentence state vector is put into a softmax function to predict the temporal relation. The method can effectively capture the semantic information hidden in the event sentence and can effectively associate and fuse different tokens, thereby improving the accuracy of event temporal relation identification.

Description

Recurrent neural network event temporal relation identification method based on semantic attention
Technical Field
The invention relates to the field of natural language processing, in particular to a recurrent neural network event temporal relation identification method based on semantic attention.
Background
Events, as an important form of knowledge representation, have received much attention in the field of natural language processing. An event is a set of related descriptions about a subject, objectively characterizing what happens to a particular subject (one or more persons and objects) in a specific time and place, and it is an important way of conveying information. The event temporal relation refers to the chronological order in which events occur; it is a semantic relation among events that links the evolution of a subject's events from beginning to end and the interrelations of those events. An example of event temporal relation identification (taken from the TimeBank-Dense corpus) is listed below.
Event sentence 1: conseco Inc. sand it is sealing for the reconstruction on Dec 7 of the 800000remaining shares.
Event sentence 2: the actual center all conversion rights on The stockwell terminate on Nov 30.
In the above example, there are two events, "calling" and "terminate", between which a temporal relation holds: the "calling" event occurs before the "terminate" event. The goal of event temporal relation identification is to accurately identify the temporal relations of related events in a given corpus.
In earlier research, the most common method for determining event temporal relations was pattern matching: event relation pairs in the text are matched against manually defined templates, and the relation between events is identified through the relation between their trigger words. The trigger word is the predicate that identifies an event, commonly a verb or a noun. However, manually defined event relation templates are limited by the format or content of the data and tend to suffer from low recall. In addition, template construction usually has domain limitations: different domains require different templates, and no universal event relation template applies to all types of events. With the establishment of corpora and knowledge bases, many research works began to introduce machine learning methods into event temporal relation research. The basic idea is to analyze the syntactic and lexical features of the relevant sentences to obtain the dependency relations and entity labels of each token, and finally put them into classifiers such as SVMs (support vector machines) for classification; however, the resulting accuracy is low, only slightly above 40%. With the rapid development of deep learning, some researchers have applied models such as CNNs and RNNs to event temporal relation identification, further improving the results. Subsequently, some researchers applied semantic dependencies to construct the input vector representation by intercepting the shortest dependency path associated with the trigger word. However, that method intercepts only the unidirectional branch related to the trigger word and ignores some of the trigger word's neighbors, which may cause important semantic information to be missed.
Analysis shows that these methods find it difficult to capture the semantic information implicit in the event sentence, and that different tokens lack effective association and information fusion, so the identification accuracy is not ideal.
Disclosure of Invention
The invention provides a recurrent neural network event temporal relation identification method based on a semantic attention mechanism, aiming to solve the problems that existing event temporal relation identification methods struggle to capture the semantic information implicit in an event sentence and lack effective association and information fusion among different tokens.
The technical scheme of the invention is as follows:
Step 1: construct the trigger word semantic dependency branch. The trigger word is the predicate used to identify an event, and it is usually a verb or a noun. First, syntactic dependency analysis is performed on the input event sentence to obtain a complete dependency syntax tree; the position of the trigger word is located, and its parent and sibling nodes are searched upward until the root node is reached. If the trigger word is not a leaf node, its child nodes are recursively searched downward from the trigger word's position. The two parts of information are combined to form the trigger word semantic dependency branch. Each token in the branch has three corresponding vectors, namely a word vector x_v, a part-of-speech vector x_p and a dependency-branch vector x_t. The three vectors are spliced to form the input vector x of the token, namely:

x = [x_v; x_p; x_t]    (1)
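By way of illustration, a minimal sketch of this branch construction is given below, using spaCy as the dependency parser; the parser choice, the function name trigger_dependency_branch and the default two-level recursion depth (taken from the analysis in the detailed description below) are illustrative assumptions rather than part of the patent.

```python
# Minimal sketch of Step 1 (assumed tooling: spaCy; the patent names no parser).
import spacy

nlp = spacy.load("en_core_web_sm")

def trigger_dependency_branch(sentence, trigger, max_depth=2):
    """Collect the trigger word's semantic dependency branch: parent and
    sibling nodes searched upward to the root, plus child nodes searched
    recursively downward from the trigger (depth-limited)."""
    doc = nlp(sentence)
    trig = next(tok for tok in doc if tok.text == trigger)

    branch = {trig}
    # Upward search: parent and sibling nodes until the root is reached.
    node = trig
    while node.head is not node:           # spaCy marks the root as its own head
        parent = node.head
        branch.add(parent)
        branch.update(parent.children)     # the node's siblings (parent's children)
        node = parent
    # Downward search: recurse into the trigger's children.
    def collect(tok, depth):
        if depth == 0:
            return
        for child in tok.children:
            branch.add(child)
            collect(child, depth - 1)
    collect(trig, max_depth)

    return sorted(branch, key=lambda tok: tok.i)   # restore sentence order
```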
step 2: a hidden state vector is obtained. Respectively training from the head and the tail of the semantic dependent branch of the trigger word by utilizing a cyclic neural network to obtain the forward propagation information h of the event sentenceleftAnd back propagation information hrightAnd then splicing the two vectors to obtain a hidden state vector h corresponding to the input vector x, namely:
h=[hleft;hright](2)
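A sketch of this step using a bidirectional LSTM in PyTorch follows; the Bi-LSTM choice matches the models discussed in the experiments below, while the dimensions are illustrative assumptions.

```python
# Sketch of Step 2: a bidirectional LSTM over the branch's input vectors.
import torch
import torch.nn as nn

input_dim, hidden_dim, m = 150, 128, 6   # x = [x_v; x_p; x_t]; m tokens (assumed sizes)
rnn = nn.LSTM(input_dim, hidden_dim, bidirectional=True, batch_first=True)

x = torch.randn(1, m, input_dim)         # (batch, tokens, features)
h, _ = rnn(x)                            # h: (1, m, 2 * hidden_dim)
# For token i, h[:, i, :hidden_dim] is the forward state h_left and
# h[:, i, hidden_dim:] is the backward state h_right, so the LSTM output
# is already the concatenation h = [h_left; h_right] of equation (2).
```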
step3, calculating attention weight vectors except for trigger words, wherein different participles in an event sentence have different influence degrees on event time sequence, and adjusting the influence degrees among different words by introducing the attention weight vector β1,h2,h3,…,hmAnd m is the number of word segmentation. Then, calculation of the attention weight vector excluding the trigger word is started.
ui=tanh(Wuhi+bu) ⑶
Figure BDA0002330830460000031
Wherein, WuIs a weight vector, buAs an offset value, βiFor a certain participle h in an event sentenceiT represents the position index of the trigger word in the event sentence.
Each token is fused to a different degree according to the calculated attention weights, and the result is spliced with the trigger word vector to obtain the event sentence state vector e*, namely:

e* = tanh(W_1 Σ_{i≠t} β_i h_i + W_2 h_t)    (5)

where h_t denotes the trigger word hidden state vector, and W_1 and W_2 are shared learned weight vectors.
Step 4: and (6) classifying the result. The experimental corpus is trained in the form of event sentence pairs, namely, two event sentences exist in a line of corpus, and after each event sentence is trained through the steps, state vectors are respectively obtained
Figure BDA0002330830460000033
And
Figure BDA0002330830460000034
the two vectors are spliced and then put into a softmax function for classification, and the most possible time sequence relation is predicted, namely:
Figure BDA0002330830460000035
Figure BDA0002330830460000036
wherein, WleftAnd WrightRepresenting a weight vector, W, with respect to the state vectorclassRepresenting weight vectors with respect to classification, bclassRepresenting the bias value for the classification.
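A sketch of Step 4 under the same assumptions follows; the six output classes correspond to the six TimeBank-Dense relations mentioned in the detailed description.

```python
# Sketch of Step 4: splice the two event-sentence state vectors and classify.
import torch
import torch.nn as nn

hidden, n_classes = 256, 6                      # six TimeBank-Dense relations
W_left = nn.Linear(hidden, hidden, bias=False)
W_right = nn.Linear(hidden, hidden, bias=False)
W_class = nn.Linear(2 * hidden, n_classes)      # W_class and b_class

def predict(e1_star, e2_star):
    pair = torch.cat([W_left(e1_star), W_right(e2_star)], dim=-1)   # equation (6)
    probs = torch.softmax(W_class(pair), dim=-1)                    # equation (7)
    return probs.argmax(dim=-1)                 # index of the most probable relation
```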
The invention has the beneficial effects that:
(1) The invention provides a recurrent neural network event temporal relation identification method based on a semantic attention mechanism. The method can effectively capture the semantic information hidden in the event sentence, and effective fusion and association can be established among different tokens.
(2) The method calculates attention weight vectors excluding the trigger word. Different tokens in the event sentence influence the event's temporal order to different degrees, and the attention weight vector fuses the different vectors to different degrees. Because the trigger word vector is the most important vector in the event sentence, the trigger word is not put into the attention weight calculation; instead, it is spliced in separately during the subsequent vector fusion, so that the trigger word information is fully preserved.
Drawings
FIG. 1 is example 1 of the trigger word semantic dependency branch referred to in the recurrent neural network event temporal relation identification based on the semantic attention mechanism proposed by the present invention.
FIG. 2 is example 2 of the trigger word semantic dependency branch referred to in the recurrent neural network event temporal relation identification based on the semantic attention mechanism proposed by the present invention.
FIG. 3 is a flow chart of the recurrent neural network event temporal relation identification based on the semantic attention mechanism proposed by the present invention.
FIG. 4 is a model diagram of the recurrent neural network event temporal relation identification based on the semantic attention mechanism proposed by the present invention.
Detailed Description
For a better understanding of the present invention, the invention is further explained below with reference to the accompanying drawings and specific examples:
the invention comprises the following steps:
step 1: and constructing a trigger word sense dependent branch. Trigger is a predicate used to identify an event, and there are many verbs and nouns in general. Firstly, carrying out syntactic dependency relationship analysis on an input event sentence to obtain a complete dependency syntactic tree, searching the position of a trigger word, and searching a father node and a brother node of the trigger word until a root node is finished; if the trigger is not a leaf node, its child nodes are recursively searched downward from the trigger position. Through experimental result analysis, the effect is best when the two times of downward recursion searching are carried out, the method can effectively capture the semantic information hidden in the event sentence, and effective fusion and connection can be established among different participles. And combining the two parts of information to form a trigger word meaning dependence branch. Each participle in the trigger word sense dependent branch has three corresponding vectors, namely a word vector xvPart of speech vector xpAnd dependent branch vector xt. The three vectors are spliced to form an input vector x of the word segmentation, namely:
Figure BDA0002330830460000051
for example, for a Timebank-Dense corpus:
event sentence 1: conseco Inc. sand it is sealing for the reconstruction on Dec 7 of the 800000remaining shares.
Event sentence 2: the actual center all conversion rights on The stockwell terminate on Nov 30.
And analyzing the syntactic dependency relationship of the event sentence to obtain a complete dependency syntactic tree, wherein the specific tree structure is shown in fig. 1 and fig. 2. Then finding the specific positions of the triggering words "capturing" and "terminate", and starting from the current position of the triggering word, searching the participles related to the triggering word according to the above rules. For the first event sentence, the participles related to the triggering word "catching" are "said", "it", "is", "for", "redaction" by searching; for the second event sentence, the participle related to the trigger word "terminate" is "said", "rights", "will", "all", "conversion", "on", "Nov" by search. By triggering the word dependence branches, the parts of speech and the dependence relationship of the words can be acquired. Converting them into corresponding vector information, and obtaining the vector representation e of the event sentence 11={xsaid,xit,xis,xcalling,xfor,xredemptionVector representation e of f and event sentence 22={xsaid,xrights,xall,xwill,xterminate,xconversion,xon,xNov}。
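Continuing the Step 1 sketch given earlier, a usage example on event sentence 1 might look as follows; the exact token set depends on the parser model, so the printed output is only the expected result.

```python
# Applying the Step 1 sketch to event sentence 1 with trigger word "calling".
tokens = trigger_dependency_branch(
    "Conseco Inc. said it is calling for the redemption on Dec 7 "
    "of the 800,000 remaining shares.",
    "calling",
)
print([tok.text for tok in tokens])
# Expected to include: said, it, is, calling, for, redemption
```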
Step 2: a hidden state vector is obtained. Respectively training from the head and the tail of the semantic dependent branch of the trigger word by utilizing a cyclic neural network to obtain the forward propagation information h of the event sentenceleftAnd back propagation information hrightAnd then splicing the two vectors to obtain a hidden state vector h corresponding to the input vector x, namely:
h=[hleft;hright]⑼
e.g. event sentence e described above1And e2Training can obtain corresponding hidden state vector
Figure BDA0002330830460000052
And
Figure BDA0002330830460000053
Figure BDA0002330830460000054
step3, calculating attention weight vectors except for trigger words, wherein different participles in an event sentence have different influence degrees on event time sequences, and the influence degrees among different words are adjusted by introducing the attention weight vector β1,h2,h3,…,hmAnd m is the number of word segmentation. Then, calculation of the attention weight vector excluding the trigger word is started.
ui=tanh(Wuhi+bu) (10)
Figure BDA0002330830460000061
Wherein, WuIs a weight vector, buAs an offset value, βiFor a certain participle h in an event sentenceiT represents the position index of the trigger word in the event sentence.
Fusing each participle to different degrees according to the attention weight vector obtained by calculation, and splicing the participle with the trigger word vector to obtain an event sentence state vector e*Namely:
Figure BDA0002330830460000062
wherein h istRepresenting a trigger word hidden state vector.
For example, for the event sentence h_e1 above, invoking the above formulas gives u_said = tanh(W_u h_said + b_u), u_it = tanh(W_u h_it + b_u), u_for = tanh(W_u h_for + b_u) and u_redemption = tanh(W_u h_redemption + b_u), and each token is then given its corresponding attention weight β_said, β_it, β_for and β_redemption. The state vector e*_1 of the event sentence h_e1 is then calculated. Similarly, the state vector e*_2 of the event sentence h_e2 can be obtained.
Step 4: and (6) classifying the result. The experimental corpus is trained in the form of event sentence pairs, namely, two event sentences exist in a line of corpus, and after each event sentence is trained through the steps, state vectors are respectively obtained
Figure BDA0002330830460000066
And
Figure BDA0002330830460000067
the two vectors are spliced and then put into a softmax function for classification, and the most possible time sequence relation is predicted, namely:
Figure BDA0002330830460000068
Figure BDA0002330830460000069
wherein, WleftAnd WrightRepresenting a weight vector, W, with respect to the state vectorclassRepresenting weight vectors with respect to classification, bclassRepresenting the bias value for the classification.
For example, splicing the event sentence state vectors e*_1 and e*_2 above and putting them into the softmax function yields an array of length 6; six relations are defined in the TimeBank-Dense corpus, hence the array of length 6. According to the result, the temporal relation "BEFORE" has the highest probability, so the predicted temporal relation of the events "calling" and "terminate" is "BEFORE".
The experiments use precision P, recall R and F1 value as evaluation criteria. Five different experimental tasks were set up, and CNN, LSTM, Bi-LSTM, the DP-based LSTMs model proposed by Cheng Fei, and the method provided by the invention were each trained and compared. The actual results are shown in the table:
TABLE 1 Comparative experimental results

(The table is presented as an image in the original publication.)
As can be seen from the training data above, Bi-LSTM performs better than the traditional CNN model: in capturing the word features of a sentence, CNN can only extract position-invariant features and lacks consideration of global context information, whereas the hidden states of Bi-LSTM can fully memorize and learn the information of the whole context, thus achieving better performance. The DP-based LSTMs model differs from Bi-LSTM in its input vector: it intercepts only the shortest dependency path, ignoring some of the trigger word's neighbor nodes and thereby missing important semantic information, and it trains with only a single-layer LSTM, so its actual effect deviates slightly. Introducing an attention mechanism on top of the Bi-LSTM model further improves the experimental results: for the Bi-LSTM model, the attention mechanism captures the degree to which different tokens influence the context, so the contextual relation between the tokens and the event trigger word can be fully mined and the temporal relation of the candidate event pair correctly predicted.
The input vectors of this experiment comprise the word vector x_v, the part-of-speech vector x_p and the dependency-branch vector x_t. The three vectors are combined in different ways to observe the influence of different vector combinations on event temporal relation identification.
TABLE 2 Influence of different input vector combinations on the results

(The table is presented as an image in the original publication.)
As the results of Table 2 show, when the word vector, the part-of-speech vector and the dependency-branch vector are all provided as input, the contextual semantic information is sufficiently represented and event temporal relation identification performs best.
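The precision, recall and F1 values used as evaluation criteria can be computed, for example, with scikit-learn; the tooling choice and the illustrative labels below are assumptions, not part of the patent.

```python
# Sketch of the evaluation criteria P, R and F1 used in the experiments.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["BEFORE", "AFTER", "BEFORE", "SIMULTANEOUS"]   # illustrative gold labels
y_pred = ["BEFORE", "BEFORE", "BEFORE", "SIMULTANEOUS"]  # illustrative predictions

p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```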
The embodiments of the present invention are explained in detail above with reference to the drawings, but the present invention is not limited to these embodiments; modifications and substitutions made by others skilled in the art on the basis of the present invention fall within the protection scope of the invention.

Claims (1)

1. A recurrent neural network event temporal relation identification method based on semantic attention, comprising the following steps:
step 1: constructing the trigger word semantic dependency branch;
performing syntactic dependency analysis on the event sentence to obtain a complete dependency syntax tree, locating the position of the trigger word, obtaining the parent and sibling nodes of the trigger word, and recursively searching upward for parent nodes until the root node is reached; if the trigger word is not a leaf node, recursively searching downward from the trigger word's position for its child nodes;
combining the two parts of information obtained by the upward and downward recursive searches to form the trigger word semantic dependency branch, wherein each token in the branch has three corresponding vectors, namely a word vector x_v, a part-of-speech vector x_p and a dependency-branch vector x_t; the three vectors are spliced to form the input vector x of the token, namely:

x = [x_v; x_p; x_t]    (1)

step 2: obtaining the hidden state vectors;
training a recurrent neural network from the head and from the tail of the trigger word semantic dependency branch respectively to obtain the forward propagation information vector h_left and the backward propagation information vector h_right of the event sentence, and then splicing the two vectors to obtain the hidden state vector h corresponding to the input vector x, namely:

h = [h_left; h_right]    (2)

step 3: calculating the attention weight vector excluding the trigger word;
letting the hidden state vectors of the trigger word semantic dependency branch be h = {h_1, h_2, h_3, …, h_m}, where m is the number of tokens, and calculating the attention weight vector excluding the trigger word:

u_i = tanh(W_u h_i + b_u)    (3)

β_i = exp(u_i) / Σ_{j≠t} exp(u_j)    (4)

wherein W_u is a weight vector, b_u is a bias value, β_i is the attention weight of token h_i in the event sentence, and t denotes the position index of the trigger word in the event sentence;
fusing each token to a different degree according to the calculated attention weights and splicing the result with the trigger word vector to obtain the event sentence state vector e*, namely:

e* = tanh(W_1 Σ_{i≠t} β_i h_i + W_2 h_t)    (5)

wherein h_t denotes the trigger word hidden state vector, and W_1 and W_2 are shared learned weight vectors;
step 4: classifying the result;
after each event sentence is processed by the above steps, the state vectors e*_1 and e*_2 are obtained respectively; the two vectors are spliced and then put into a softmax function for classification, and the most probable temporal relation is predicted, namely:

e = [e*_1 ; e*_2]    (6)

y = softmax(W_class e + b_class)    (7)

wherein W_class denotes the weight vector for classification, and b_class denotes the classification bias value.
CN201911335582.8A 2019-12-23 2019-12-23 Recurrent neural network event temporal relation identification method based on semantic attention Pending CN111160027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911335582.8A CN111160027A (en) 2019-12-23 2019-12-23 Recurrent neural network event temporal relation identification method based on semantic attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911335582.8A CN111160027A (en) 2019-12-23 2019-12-23 Recurrent neural network event temporal relation identification method based on semantic attention

Publications (1)

Publication Number Publication Date
CN111160027A true CN111160027A (en) 2020-05-15

Family

ID=70557807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911335582.8A Pending CN111160027A (en) 2019-12-23 2019-12-23 Recurrent neural network event temporal relation identification method based on semantic attention

Country Status (1)

Country Link
CN (1) CN111160027A (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334213A (en) * 2019-07-09 2019-10-15 昆明理工大学 The Chinese based on bidirectional crossed attention mechanism gets over media event sequential relationship recognition methods

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11573992B2 (en) 2020-06-30 2023-02-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, electronic device, and storage medium for generating relationship of events
CN111859911B (en) * 2020-07-28 2023-07-25 中国平安人寿保险股份有限公司 Image description text generation method, device, computer equipment and storage medium
CN112507077A (en) * 2020-12-15 2021-03-16 杭州电子科技大学 Event time sequence relation identification method based on relational graph attention neural network
CN112507077B (en) * 2020-12-15 2022-05-20 杭州电子科技大学 Event time sequence relation identification method based on relational graph attention neural network
CN113761337A (en) * 2020-12-31 2021-12-07 国家计算机网络与信息安全管理中心 Event prediction method and device based on implicit elements and explicit relations of events
CN113761337B (en) * 2020-12-31 2023-10-27 国家计算机网络与信息安全管理中心 Event prediction method and device based on implicit event element and explicit connection

Similar Documents

Publication Publication Date Title
CN110895932B (en) Multi-language voice recognition method based on language type and voice content collaborative classification
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
CN109543183B (en) Multi-label entity-relation combined extraction method based on deep neural network and labeling strategy
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
Alayrac et al. Unsupervised learning from narrated instruction videos
CN111160027A (en) Recurrent neural network event temporal relation identification method based on semantic attention
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN109635108B (en) Man-machine interaction based remote supervision entity relationship extraction method
CN111353306B (en) Entity relationship and dependency Tree-LSTM-based combined event extraction method
CN109710744B (en) Data matching method, device, equipment and storage medium
Fleischman et al. Maximum entropy models for FrameNet classification
CN109857846B (en) Method and device for matching user question and knowledge point
CN107608960B (en) Method and device for linking named entities
CN112163425A (en) Text entity relation extraction method based on multi-feature information enhancement
CN113761893B (en) Relation extraction method based on mode pre-training
CN113505209A (en) Intelligent question-answering system for automobile field
US20200089756A1 (en) Preserving and processing ambiguity in natural language
US20220414463A1 (en) Automated troubleshooter
CN112507077B (en) Event time sequence relation identification method based on relational graph attention neural network
CN113821605A (en) Event extraction method
CN114217766A (en) Semi-automatic demand extraction method based on pre-training language fine-tuning and dependency characteristics
CN113705237A (en) Relation extraction method and device fusing relation phrase knowledge and electronic equipment
CN113590810A (en) Abstract generation model training method, abstract generation device and electronic equipment
CN113535897A (en) Fine-grained emotion analysis method based on syntactic relation and opinion word distribution
CN115017268A (en) Heuristic log extraction method and system based on tree structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination