CN114707508A

CN114707508A - Event detection method based on multi-hop neighbor information fusion of graph structure

Info

Publication number: CN114707508A
Application number: CN202210388221.5A
Authority: CN
Inventors: 李川; 田国强
Original assignee: Xian University of Posts and Telecommunications
Current assignee: Xian University of Posts and Telecommunications
Priority date: 2022-04-13
Filing date: 2022-04-13
Publication date: 2022-07-05
Anticipated expiration: 2042-04-13
Also published as: CN114707508B

Abstract

The invention provides an event detection method based on multi-hop neighbor information fusion of a graph structure, which comprises the following implementation steps: (1) generating a training set; (2) constructing a multi-hop neighbor information fusion network based on a graph structure; (3) training a multi-hop neighbor information fusion network based on a graph structure; (4) events in natural language text are detected. The invention constructs a multi-hop neighbor information fusion network based on a graph structure, utilizes the multi-hop syntax information in the syntax dependency tree and uses a multi-label attention mechanism to fuse the multi-hop syntax information, has the specific extraction of the more effective syntax characteristics for event detection, and improves the accuracy and efficiency of the event detection.

Description

Event detection method based on multi-hop neighbor information fusion of graph structure

Technical Field

The invention belongs to the technical field of computers, and further relates to an event detection method based on multi-hop neighbor information fusion of a graph structure in the technical field of natural language processing. The method and the device can detect the event type in the natural language text by detecting the trigger words expressing the events in the natural language text.

Background

Event detection is an important task of information extraction in natural language processing. The main target of the task is to identify the event examples presented in the text and determine the corresponding event types, and the task has wide application in the fields of intelligent transportation, social media, network public opinion analysis, event knowledge graph and the like. Event detection aims at detecting the trigger words of events in the text. The method based on the characteristic engineering is relatively dependent on the characteristics of manual design and lacks expansibility; the method based on the traditional machine learning has limitation in learning deep or more complex nonlinear relation; in recent years, methods based on deep learning are widely used in the field of natural language processing, can automatically extract features from natural language texts, and are very effective in event detection tasks. However, the event detection method based on the sequence structure in the prior art is low in efficiency in extracting the dependence relationship with a longer distance. The event detection method based on the graph structure can only learn the syntactic information of direct neighbors of nodes corresponding to the current words in the syntactic graph, but ignore the neighbor node information of multi-hop syntactic arcs between the current words, and when the current candidate trigger word and other words have dependency relationships and multi-hop syntactic arc connection, the method possibly ignores the relationship, thereby influencing the performance of event detection. And lack of focusing attention on more effective features for the event detection task leads to the same weight assigned to all features, thus being difficult to distinguish more important features in sentences, which is also a short board in the existing event detection technology.

An event detection method based on a hybrid attention network is proposed in a patent document ' an event detection method and device based on the hybrid attention network ' (application number: 202011600231.8, application publication number: CN 112307740A) applied by the national defense science and technology university of China's liberation army. Firstly, preprocessing a plurality of languages of texts, and converting an input text into a bit vector sequence; then, the method adopts a bidirectional gating circulation unit to respectively process vector sequences in the forward direction and the backward direction by using a BiGRU network consisting of two GRU layers; finally, the bi-directional gated round-robin unit representation of the input sequence is used as a representation vector for the entire sentence, and event detection is accomplished by mixing the attention layer and the classification layer. The method has the disadvantages that the efficiency is low when the dependency relationship with a longer distance is extracted by only using the BiGRU network to code the sentences.

Thought in its published paper "BGCN: a method for detecting events based on a BERT model and a GCN network is provided in BERT and graph convolution network-based trigger word detection (3 months 2021 of No. 7 of Computer Science, volume 48). According to the event detection method, the BERT word vector is introduced to strengthen feature representation, a syntactic structure is introduced to capture long-distance dependence, and the event trigger word is detected. Firstly, the method uses BERT and training word vectors, combines other input characteristics to represent a sentence coding layer of an input sentence, and converts each word in a text sequence into a word vector; then, coding a vector sequence obtained by a sentence representation layer by using a bidirectional LSTM network; after passing through the Bi-LSTM coding layer, inputting the output vector sequence into a GCN layer based on syntax to extract syntax information; obtaining the expression vectors of all words through a GCN layer, and sending the expression vectors into a full connection layer to identify a trigger word label; finally, the trigger word classification task is completed using the Softmax layer. The method has the disadvantage that the GCN used for extracting the syntactic information can only capture the information of the words corresponding to the nodes directly adjacent to the current candidate trigger word when the syntactic characteristics are extracted. When there is a dependency relationship between the current candidate trigger word and other words and there is a multi-hop syntactic arc connection between them, the method may ignore such a relationship and thus affect the performance of event detection.

In summary, most methods based on graph structures in the prior art use Graph Convolution Networks (GCNs) for syntactic feature extraction, and the GCNs can only capture information of words corresponding to nodes directly adjacent to current candidate trigger words. When there is a dependency relationship between the current candidate trigger word and other words and there is a multi-hop syntactic arc connection between them, it is possible to ignore this relationship and thus affect the performance of event detection. In addition, the traditional event detection method does not focus on the more effective characteristics for event detection, so that the weights assigned to all the characteristics are the same, and the more important characteristics in the sentence are difficult to distinguish, so that the effect of event detection is influenced.

Disclosure of Invention

The invention aims to provide an event detection method based on multi-hop neighbor information fusion of a graph structure, aiming at overcoming the defects of the prior art, and solving the problems that the efficiency of the event detection method based on a sequence only is low, partial dependency relationship is possibly neglected based on a GCN structure, and the importance degree of different partial features is difficult to distinguish due to lack of attention mechanism in the event detection of the prior art.

The idea of achieving the purpose of the invention is that the invention constructs a sequence information extraction sub-network consisting of a forward GRU layer and a reverse GRU layer, and the network can respectively extract context characteristics in a forward and backward self-adaptive manner based on a bidirectional GRU network with a sequence structure. The invention constructs a syntactic information extraction sub-network consisting of three graph attention layers with completely same structures, and takes the context characteristics extracted by the sequence information extraction sub-network and the adjacent matrix corresponding to the syntactic dependency tree as the input of the syntactic information extraction sub-network. And then constructing a multi-label attention fusion sub-network, calculating the linear combination of the context vector of each classification label through an attention mechanism, and capturing the dense part of important information in a context paragraph, thereby identifying the importance degree of the candidate trigger word on each classification label, fully fusing and extracting the multi-hop syntactic information output by the sub-network by utilizing the syntactic information. And finally, constructing a trigger word recognition sub-network for recognizing the trigger words.

In order to achieve the purpose, the method comprises the following specific steps:

step 1, generating a training set:

step 1.1, selecting at least 500 natural language texts to form a sample set, wherein each text at least comprises 1 complete event, and each event at least comprises 1 trigger word;

step 1.2, labeling an event trigger word, event trigger word position information, part of speech information, entity type information and event type of each event sentence in each natural language text in a sample set;

step 1.3, obtaining a word vector corresponding to each trigger word in each marked text by using a word vector pre-training tool, and mapping all sentences in each text into a word vector matrix;

step 1.4, forming a training set by all word vector matrixes in the sample set;

step 2, constructing a multi-hop neighbor information fusion network based on a graph structure:

step 2.1, a sequence information extraction sub-network is built, and the structure of the sequence information extraction sub-network sequentially comprises the following steps: the device comprises a forward GRU layer, a reverse GRU layer, a hidden layer and a splicing layer; setting the time step length of both the forward GRU layer and the reverse GRU layer to be 30, and setting the number of the hidden layers to be 125; setting the dimension of the splicing layer to be 250;

step 2.2, constructing a syntax information extraction sub-network formed by connecting three first, second and third graph attention layers with the same structure in series; each of the drawing attention layers includes: the system comprises a first mapping layer, a splicing layer, a second mapping layer, a LeakyRelu activation layer, a Softmax activation layer, a third mapping layer and an ELU activation layer;

the number of input nodes of the first mapping layer, the number of output nodes of the second mapping layer, the number of input nodes of the first mapping layer, the number of output nodes of the third mapping layer, the number of output nodes of the first mapping layer, the number of output nodes of the third mapping layer, the number of input nodes of the first mapping layer, the number of output nodes of the third mapping layer, the number of input nodes of the third mapping layer, the number of output nodes of the third mapping layer, the number of input nodes of the first mapping layer, the number of input nodes of the third mapping layer, the number of input nodes of the first mapping layer, the third mapping layer, the input nodes, the number of input nodes of the first mapping layer, the third mapping layer, the input nodes, the first mapping layer, the third mapping layer, the first mapping layer, the third mapping layer, and the third mapping layer, the first mapping layer, the third mapping layer, the first mapping layer, the third mapping layer, the first mapping layer, the second mapping layer, the first mapping layer, the second mapping layer, the first mapping layer, and the second mapping layer, the third mapping layer, and the second mapping layer, the first mapping layer, and the first mapping layer, the second mapping layer, the first;

setting the number of input nodes of the splicing layer to be 150 and the number of output nodes to be 300;

the LeakyRelu activation layer is realized by using a LeakyRelu activation function, and the Softmax activation layer is realized by using a Softmax function; the ELU activation layer is realized by using an ELU activation function; in summary, the number of input nodes of the attention layer of the first graph is 250, and the number of output nodes is 150;

step 2.3, a multi-label attention fusion sub-network is built, and the structure sequentially comprises the following steps: a mapping layer, a Softmax layer, and a fusion layer; respectively setting the number of input nodes and the number of output nodes of the mapping layer to be 150; the Softmax activation layer is implemented by using a Softmax function, and the fusion layer performs summation operation on the product of the output of the Softmax layer and the output of each attention layer; setting the number of input and output nodes of the fusion layer to be 150;

step 2.4, a trigger word recognition sub-network is built, and the structure of the trigger word recognition sub-network sequentially comprises the following steps: the system comprises a first full connection layer, a ReLU activation layer, a second full connection layer and a Softmax activation layer; the number of input nodes of the first full-connection layer and the number of output nodes of the second full-connection layer are both set to be 150, and the number of output nodes of the second full-connection layer is respectively set to be 150 and 2; the ReLU activation layer is realized by using a ReLU activation function; the Softmax activation layer is realized by using a Softmax function;

2.5, sequentially connecting a sequence information extraction sub-network, a syntax information extraction sub-network, a multi-label attention fusion sub-network and a trigger word recognition sub-network in series to form a multi-hop neighbor information fusion network based on a graph structure;

step 3, training the multi-hop neighbor information fusion network based on the graph structure:

inputting the training set into a multi-hop neighbor information fusion network based on a graph structure, and iteratively updating parameters of each layer in the network by using a back propagation gradient descent method until a loss function with bias of the network is converged to obtain the trained multi-hop neighbor information fusion network based on the graph structure;

step 4, detecting events in the natural language text:

step 4.1, preprocessing each sentence in the natural language text to be detected by using a natural language processing tool to obtain an adjacency matrix corresponding to the syntax dependency tree of each sentence in the text to be detected;

step 4.2, obtaining a word vector corresponding to each trigger word of each sentence in the natural language text to be detected by using a word vector pre-training tool; forming word vectors of all trigger words in each sentence into a word vector matrix of the sentence;

and 4.3, inputting the adjacent matrix and the word vector matrix corresponding to the syntactic dependency tree of each sentence into a trained multi-hop neighbor information fusion network based on a graph structure, calculating the probability value of identifying the words in each sentence as event trigger words through a Softmax layer, and taking the category corresponding to the highest probability value as the event detection result.

Compared with the prior art, the invention has the following advantages:

first, since the present invention constructs a syntax information extraction sub-network that learns the importance of multi-hop neighbor words of each trigger word in a syntax graph by a graph attention network. The problem that the efficiency is low when the dependence relationship with a longer distance is extracted by an event detection method based on a sequence structure in the prior art is solved, and the defect that the event detection method based on a GCN structure can only learn the syntactic information of the direct neighbor of the node corresponding to the current word in the syntactic graph, so that the information of the words corresponding to the nodes connected with the current word by the multi-hop arcs can be possibly ignored is overcome, so that the invention can fully extract the dependency relationship between the characteristics of sentences and the words while keeping higher efficiency, and improves the accuracy of event detection.

Secondly, because the invention constructs a multi-label attention fusion sub-network, the sub-network calculates the linear combination of the context vector of each classification label through a self-attention mechanism, fully utilizes the context semantic information of the current word and captures the importance degree of the candidate trigger word to each classification label. The method and the device overcome the problem that the event detection effect is not ideal enough due to the fact that the prior art fails to focus attention on semantic features more effective for the event detection task during event detection, and improve the performance of event detection.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic diagram of a multi-hop neighbor information fusion network based on a graph structure according to the present invention;

FIG. 3 is a diagram of a syntactic dependency tree generated by the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples.

The steps of the present invention are described in further detail with reference to fig. 1.

Step 1, generating a training set.

Step 1.1, selecting at least 500 natural language texts to form a sample set, wherein each text at least comprises 1 complete event, and each event at least comprises 1 trigger word.

The event refers to: indicating that a state change occurred at a particular time and at a particular location, involving one or more participants.

The event trigger words refer to: the event information can accurately represent events and keywords, usually verbs or nouns, representing event types, which are core units of events.

The event trigger word position information is as follows: the absolute position of the event trigger word in the event sentence, that is, the event trigger word is the first word in the event sentence.

And step 1.2, labeling the event trigger words, the event trigger word position information, the part of speech information, the entity type information and the event types of each event sentence in each natural language text in the sample set.

And step 1.3, obtaining a Word vector corresponding to each trigger Word in each marked text by using an open source Word vector toolkit Word2vec, and mapping all sentences in each text into a Word vector matrix.

And 1.4, forming a training set by all word vector matrixes in the sample set.

The embodiment of the invention generates sample set data of a training set, adopts ACE2005 English corpus published by language data Association of the university of Pennsylvania in 2006, 2 months, and the English corpus comprises 599 documents, each document consists of a plurality of sentences, and each sentence consists of a plurality of words.

All the words in all the documents of the ACE2005 English corpus are input into a Skip-gram model of an open source Word vector toolkit Word2vec, and a Word vector of each Word is output. Marking type information corresponding to entities appearing in sentences by adopting a BIO (B-begin, I-inside, O-outside) marking mode of 'entity start, entity interior and non-entity', and randomly generating an entity type information embedding table; randomly generating a part-of-speech tagging label embedding table; the position information embedding table is randomly generated. The entity type vector, part of speech vector, and position vector of each word are obtained by looking up 3 randomly generated tables, all of which are set to 50 dimensions. For example, the Word "soldiers" in ACE2005 english corpus is input into Skip-gram model of Word2vec together with all other words, and a 250-dimensional Word vector corresponding to the Word is output.

And forming a matrix corresponding to the sentence by using word vectors of all words in each sentence, wherein the row number of the matrix is the total number of the words in the sentence, and the column number is the dimension of the word vector corresponding to the sentence. For example, 7 words of a sentence "He ws killed in action in Iraq" in a document of ACE2005 english corpus are "He", "ws", "killed", "in", "action", "in", "Iraq", respectively. Each word corresponds to a 250-dimensional word vector, and the sentence can be mapped into a 7 × 250 matrix.

And (3) filling a matrix mapped by each sentence in the ACE2005 English corpus, intercepting the first 50 rows if the matrix exceeds 50 rows, and filling zero to 50 rows below the matrix if the matrix is less than 50 rows. For example, zero padding is applied to the bottom of the7 × 250 matrix mapped by the sentence "He ws kill in action in Iraq" to 50 rows, resulting in a 50 × 250 matrix.

And forming a training set by word vector matrixes mapped by all sentences in the aligned ACE2005 English corpus.

And 2, constructing a multi-hop neighbor information fusion network based on the graph structure.

The construction of the multi-hop neighbor information fusion network based on the graph structure according to the present invention is described in further detail with reference to fig. 2.

step 2.2, constructing a syntactic information extraction sub-network formed by connecting three first, second and third graph attention layers with the same structure in series; each of the drawing attention layers includes: the device comprises a first mapping layer, a splicing layer, a second mapping layer, a LeakyRelu activation layer, a Softmax activation layer, a third mapping layer and an ELU activation layer.

the LeakyRelu activation layer is realized by using a LeakyRelu activation function, and the Softmax activation layer is realized by using a Softmax function; the ELU activation layer is implemented using an ELU activation function. To sum up, the number of input nodes in the first graph attention layer is 250, and the number of output nodes is 150.

Step 2.3, a multi-label attention fusion sub-network is built, and the structure sequentially comprises the following steps: mapping layer, Softmax layer, fusion layer. Respectively setting the number of input nodes and the number of output nodes of the mapping layer to be 150; the Softmax activation layer is implemented using a Softmax function, and the fusion layer sums the product of the output of the Softmax layer and the output of each of the attention layers. The number of input and output nodes of the fusion layer is set to 150.

Step 2.4, a trigger word recognition sub-network is built, and the structure of the trigger word recognition sub-network sequentially comprises the following steps: a first fully-connected layer, a ReLU active layer, a second fully-connected layer, and a Softmax active layer. The number of input nodes of the first full-connection layer and the number of output nodes of the second full-connection layer are both set to be 150, and the number of output nodes of the second full-connection layer is respectively set to be 150 and 2; the ReLU activation layer is realized by using a ReLU activation function; the Softmax activation layer is implemented using a Softmax function.

And 2.5, connecting the sequence information extraction sub-network with the syntax information extraction sub-network in series, then connecting the sequence information extraction sub-network with the multi-label attention fusion sub-network in series, and finally connecting the sequence information extraction sub-network with the trigger word recognition sub-network in series to form the multi-hop neighbor information fusion network based on the graph structure.

And 3, training the multi-hop neighbor information fusion network based on the graph structure.

Inputting the training set into a multi-hop neighbor information fusion network based on a graph structure, and iteratively updating parameters of each layer in the network by using a back propagation gradient descent method until a loss function with bias of the network is converged to obtain the trained multi-hop neighbor information fusion network based on the graph structure.

The loss function of the multi-hop neighbor information fusion network based on the graph structure is a loss function with bias, and is defined as follows:

wherein J (-) represents a loss function with bias, θ represents a set of parameters in the function, max (-) represents a max operation, N_stRepresenting the total number of sentences, n, input into a training set of a multi-hop neighbor information fusion network based on a graph structure_iRepresenting the ith sentence s input into the training set of the multi-hop neighbor information fusion network based on the graph structure_iTotal number of words in, j represents sentence s_iThe word sequence number in (i), (O) indicates a switch function for distinguishing the loss of the tag "O" from the loss of the event type tag, and when the tag type is "O", i (O) is 1, otherwise, i (O) is 0. log (-) denotes a base 10 logarithm operation,

expressed in a parameter of theta and a sentence of s_iThen, the jth word w_jThe probability of t is represented by omega, the bias weight is represented by omega, and the larger omega is, the larger the influence of the trigger word label on the network is.

And 4, detecting an event in the natural language text.

And 4.1, preprocessing each sentence in the natural language text to be detected by using a natural language processing tool to obtain an adjacency matrix corresponding to the syntactic dependency tree of each sentence in the text to be detected.

Referring to fig. 3, a detailed description will be made of an implementation process of preprocessing each sentence in a natural language text to be detected by using a natural language processing tool in the embodiment of the present invention.

IN the embodiment of the invention, "He ws killed IN action IN Iraq" English IN English natural language text to be detected is input into a natural language processing tool Stanford CoreNLP, the obtained syntax dependency tree is shown IN FIG. 3, the "PRP" IN FIG. 3 represents a person pronoun, "VBD" represents a verb past expression, "VBN" represents a verb past word segmentation, "IN" represents a preposition, "NN" represents a noun singular number, and "NNP" represents a proper noun. In FIG. 3, "nsubj: pass" indicates a passive noun subject, "aux: pass" indicates a past verb aid, "obl: in" indicates an indirect noun, "case" indicates a lattice tag, and "nmod: in" indicates a noun modifier.

Step 4.2, obtaining a word vector corresponding to each trigger word of each sentence in the natural language text to be detected by using a word vector tool; and forming a word vector matrix of the sentence by using word vectors of all trigger words in each sentence.

In the embodiment of the invention, each sentence in the natural language text to be detected is input into the open source Word vector toolkit Word2vec, and a Word vector corresponding to each trigger Word of each sentence in the natural language text to be detected is obtained.

The word vector matrix refers to: forming word vectors of all words in each sentence into a word vector matrix of the sentence, wherein the row number of the word vector matrix is the total number of the words in the sentence, and the column number is the dimension of the word vector corresponding to the sentence; each word vector matrix is aligned, and if the matrix exceeds 50 rows, the first 50 rows are truncated, and if the matrix is less than 50 rows, zero is padded to 50 rows below the matrix.

The effect of the present invention is further explained by combining the simulation experiment as follows:

1. and (5) simulating experimental conditions.

The hardware platform of the simulation experiment of the invention is as follows: the processor is AMD R74800H CPU, the main frequency is 2.9GHz, and the memory is 16 GB.

The software platform of the simulation experiment of the invention is as follows: windows 10 operating system and python 3.6.

The corpus used in the simulation experiment of the invention is ACE2005 English corpus, the corpus Data collects news, broadcast, forum, blog and the like, and is issued by language Data society (Linguistic Data Consortium, LDC) of the university of Pennsylvania in 2006, the markup format of the corpus adopts XML language, and comprises 599 documents, each document comprises a plurality of sentences, and each sentence comprises a plurality of words.

2. And (5) analyzing simulation contents and results thereof.

The simulation experiment of the invention adopts the invention and four prior arts (a maximum entropy MaxENT event detection method based on a characteristic method, a dynamic multi-pooling DMCNN event detection method based on a Convolutional Neural Network (CNN), a dbRNN event detection method based on a dependence bridge based on a Recurrent Neural Network (RNN), and a GCN-ED event detection method based on a Graph Neural Network (GNN)) to respectively extract text characteristics of input linguistic data and classify the input linguistic data according to the extracted text characteristics to obtain an event detection result.

In the simulation experiment, the four prior arts adopted refer to:

the maximum entropy MaxENT Event detection method in The prior art refers to that AHN et al, in "The Stage of Event Extraction, Proceedings of The work shop on identifying and responding out Time and events, Sydney: association for computerized Linguistics, 2006: 1-8, the maximum entropy MaxENT event detection method is shortened for short.

The prior art Dynamic Multi-Pooling DMCNN Event detection method refers to the method of CHEN Y B et al in "Event Extraction visual Dynamic Multi-Point conditional Neural Networks, Proceedings of the 53rd annular Meeting of the Association for Computational Linguitics and the7th International journal Conference Natural Language processing. Beijing: association for computerized Linguistics, 2015: the method for detecting events proposed in 167-.

The prior art dbRNN event detection method based on the recurrent neural network refers to an event detection method which is provided by Lei Sha et al in 'joining extracting event generators and definitions by dependency-bridge rn and definitions-base definition interaction [ C ]// third-Second AAAI Conference on scientific interest.2018', and is called a dbRNN event detection method based on the dependent bridge recurrent neural network for short.

The prior art event detection method based on the Graph neural network GCN-ED refers to an event detection method proposed by Nguyen et al in "Graph connected networks with alignment-aware opening for event detection [ C ]// third-connected AAAI reference on identification information.2018", which is called as a Graph convolution-based network GCN-ED event detection method for short.

And (3) evaluating the classification results of the five methods by using three evaluation indexes (accuracy P, recall R and F values). The accuracy P, recall R and F values were calculated using the following formulas, and all calculated results are plotted in table 1:

TABLE 1 quantitative analysis table of detection results of events of the present invention and various prior arts in simulation experiment

As can be seen from table 1, the performance of the sequence-based event detection methods (DMCNCN, dbRNN) is generally superior to the feature-based event detection method (maximum entropy MaxENT event detection method). This is because the event detection method based on the artificially designed features relies too much on the quality of the artificially designed features, which directly affects the detection performance when the artificially designed features are not reasonable, and the feature-based method has no extensibility and can be limited to only a specific field. And the neural network can automatically learn and extract text features, so that the defect of a feature-based method is avoided. However, sequence-based event detection methods also have drawbacks, with traditional neural networks such as: the CNN and the RNN have the problem of low efficiency when extracting the dependence relationship with long distance, and the graph neural network can make up for the defect, so that the GCN-ED can effectively improve the performance of the sequence-based event detection method. Finally, compared with GCN-ED, the method has the advantage that the method can extract the syntactic information of the multi-hop adjacent node which is far away from the current candidate trigger word. In the trigger word classification task, the accuracy of the text model reaches 78.2%, the recall rate is 73.1%, and the F value is 75.6%, which are higher than those of the baseline model method. In the task of triggering word classification, the model maintains high accuracy and recall rate, and simultaneously achieves the highest F value of 75.6%. In conclusion, the comparison result shows that the performance of the model is improved compared with that of a feature-based event detection method and a sequence-based event detection method, which shows that the graph neural network based on the graph structure can fully extract syntactic information, so that the graph neural network has better event detection performance.

The above simulation experiments show that: the method takes a word vector matrix corresponding to a sentence and an adjacent matrix corresponding to a syntactic dependency tree as input, and adaptively accumulates the context characteristics through a Bi-GRU module. Extracting multi-granularity syntactic characteristics by using a multi-hop graph attention network, and fusing the multi-granularity syntactic characteristics through a multi-label attention fusion mechanism. The model avoids the defects of a characteristic-based method and the defects of a sequence-based method, and the performance is further improved compared with the event detection model based on the traditional neural network.

Claims

1. An event detection method based on multi-hop neighbor information fusion of a graph structure is characterized in that a multi-hop neighbor information fusion network is constructed and trained; the method comprises the following specific steps:

step 1, generating a training set:

step 1.4, forming a training set by all word vector matrixes in the sample set;

step 2.2, constructing a syntactic information extraction sub-network formed by connecting three first, second and third graph attention layers with the same structure in series; each of the drawing attention layers includes: the system comprises a first mapping layer, a splicing layer, a second mapping layer, a LeakyRelu activation layer, a Softmax activation layer, a third mapping layer and an ELU activation layer;

step 2.3, a multi-label attention fusion sub-network is built, and the structure sequentially comprises the following steps: a mapping layer, a Softmax layer and a fusion layer; respectively setting the number of input nodes and the number of output nodes of the mapping layer to be 150; the Softmax activation layer is implemented by using a Softmax function, and the fusion layer performs summation operation on the product of the output of the Softmax layer and the output of each attention layer; setting the number of input and output nodes of the fusion layer to be 150;

step 2.4, a trigger word recognition sub-network is built, and the structure of the trigger word recognition sub-network sequentially comprises the following steps: the device comprises a first full connection layer, a ReLU activation layer, a second full connection layer and a Softmax activation layer; the number of input nodes of the first full-connection layer and the number of output nodes of the second full-connection layer are both set to be 150, and the number of output nodes of the second full-connection layer is set to be 150 and 2 respectively; the ReLU activation layer is realized by using a ReLU activation function; the Softmax activation layer is realized by using a Softmax function;

step 4, detecting events in the natural language text:

2. The method for detecting events based on multi-hop neighbor information fusion of claim 1, wherein the events in step 1.1 refer to: indicating that a state change occurred at a particular time and at a particular location, involving one or more participants.

3. The method for detecting the event based on the multi-hop neighbor information fusion of the graph structure according to claim 1, wherein the event trigger in step 1.1 refers to: the event information can accurately represent events and keywords, usually verbs or nouns, representing event types, which are core units of events.

4. The method for detecting the event based on the multi-hop neighbor information fusion of the graph structure of claim 1, wherein the event trigger word position information in step 1.1 is: the absolute position of the event trigger word in the event sentence, that is, the event trigger word is the first word in the event sentence.

5. The method for detecting events based on multi-hop neighbor information fusion of claim 1, wherein the loss function with offset in step 3 is as follows:

wherein J (-) represents a loss function with bias, θ represents a set of parameters in the function, max (-) represents a max operation, N_stRepresenting the total number of sentences in the training set input to the multi-hop neighbor information fusion network, n_iRepresenting the ith sentence s input into the training set of the multi-hop neighbor information fusion network_iTotal number of words in, j represents sentence s_iWherein, i (O) represents a switch function for distinguishing the loss of the tag "O" from the loss of the event type tag, when the tag type is "O", i (O) is 1, otherwise, i (O) is 0, log (·) represents a base 10 logarithm operation,

6. The method for detecting multi-hop neighbor information fusion event based on graph structure as claimed in claim 1, wherein said word vector matrix in step 4.2 refers to: forming word vectors of all words in each sentence into a word vector matrix of the sentence, wherein the row number of the word vector matrix is the total number of the words in the sentence, and the column number is the dimension of the word vector corresponding to the sentence; each word vector matrix is aligned, and if the matrix exceeds 50 rows, the first 50 rows are truncated, and if the matrix is less than 50 rows, zero is padded to 50 rows below the matrix.