CN114647730A - Event detection method integrating graph attention and graph convolution network - Google Patents
- Publication number
- CN114647730A (application CN202210301353.XA)
- Authority
- CN
- China
- Legal status: Pending (an assumption, not a legal conclusion; no legal analysis has been performed)
Classifications
- G06F16/35—Information retrieval of unstructured textual data; clustering; classification
- G06F40/211—Natural language analysis; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06N3/045—Neural network architectures; combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Neural network learning methods
Abstract
The invention relates to an event detection method that fuses a graph attention network and a graph convolutional network. The method comprises the following steps. S1: vectorize the sentence to be detected to obtain a sentence vector to be detected; S2: obtain a BIO vector, a POS vector and a dependency syntax graph from the sentence vector to be detected; S3: encode the BIO vector to obtain a BERT vector; S4: combine the BERT vector and the POS vector into word vectors, and assemble the word vectors into a tagged sentence vector; S5: perform feature extraction on the tagged sentence vector and the dependency syntax graph to obtain a GCN vector and a GAT vector, and compute a fusion vector from the two; S6: classify the fusion vector to obtain the trigger words of the sentence to be detected and their classifications. The method extracts the textual information fully and improves the accuracy of trigger-word recognition and classification.
Description
Technical Field
The invention relates to the technical field of event detection, and in particular to an event detection method that fuses a graph attention network and a graph convolutional network.
Background
In the era of explosive data growth, the amount of data on the Internet has increased exponentially, and large amounts of valuable information remain buried and unexplored. Faced with such volumes, screening information purely by hand is time-consuming and labour-intensive, cannot extract information in large batches, and inevitably misses important data, causing losses to many organizations and individuals. Using natural language technology to automatically screen and extract the information people need therefore guarantees its timeliness and completeness, and is an effective means of mining data and extracting information automatically, for example in content recommendation on e-commerce platforms, public-opinion monitoring on social network platforms, and intelligent question answering in online customer service. In recent years, deep learning has been widely applied in natural language processing, for example in parsing and named entity recognition, and natural language processing techniques have advanced greatly as a result.
Event extraction is an information-extraction method in existing natural language processing technology. It aims to present unstructured text containing event information in a structured form, and is widely applied in automatic summarization, automatic question answering, information retrieval and other fields. In the Automatic Content Extraction (ACE) definition, an event is composed of a trigger word (Trigger) and elements (Arguments) that describe the structure of the event. The trigger word is the core word in a sentence that clearly indicates the occurrence of an event; it is the feature word that most determines the event type, deciding the category and subcategory of the event, and is usually a verb or a noun phrase. The elements are used to populate the event template; together the two fully describe the event itself.
The event extraction task consists of two steps: Event Detection and event element identification (Argument Detection). Event detection identifies trigger words and the event types and subtypes they represent according to context. The ACE2005 data set defines 8 event categories and 33 subcategories, and each event category or subcategory corresponds to a unique event template. Please refer to fig. 1, which shows an example of event detection. In this sentence, the first event belongs to the death (die) event in the life category, with trigger word die; the second event belongs to the attack event in the conflict category, with trigger word fire.
Existing event detection methods mainly identify and classify event trigger words through Recurrent Neural Networks (RNN) and Graph Neural Networks (GNN). When a GNN is used for event detection, only a single Graph Convolutional Network (GCN) or a single Graph Attention Network (GAT) is employed. This usage does not weigh the advantages and disadvantages of the two networks against each other and has certain limitations: the graph convolutional network can exploit the information carried by edges, but it cannot assign each neighbour node a different weight according to the node's importance, neglecting the relations that exist among nodes; the graph attention network considers the relation between each pair of nodes, but has no way to dynamically learn neighbour weights from the edges and does not make full use of edge information. Used singly, the advantages of the two cannot be combined and complemented, leading to incomplete extraction of information by the model and low accuracy.
Disclosure of Invention
Based on this, the invention aims to provide an event detection method fusing a graph attention network and a graph convolutional network, which combines the two networks so that their complementary advantages fully extract the textual information and improve the accuracy of trigger-word recognition and classification.
The invention is realized by the following technical scheme:
An event detection method fusing a graph attention network and a graph convolutional network comprises the following steps:
s1: vectorizing the sentence to be detected to obtain a sentence vector to be detected;
s2: obtaining a BIO vector, a POS vector and a dependency syntax graph according to the sentence vector to be detected;
s3: encoding the BIO vector to obtain a BERT vector;
s4: combining the BERT vector and the POS vector to obtain word vectors, and obtaining a tagged sentence vector from the word vectors;
s5: performing feature extraction on the tagged sentence vector and the dependency syntax graph to obtain a GCN vector and a GAT vector, and computing a fusion vector from the GCN vector and the GAT vector;
s6: classifying the fusion vector to obtain the trigger words of the sentence to be detected and their classifications.
According to this event detection method, the GCN vector and the GAT vector are combined into a fusion vector during feature extraction. On the one hand neighbour weights can be learned dynamically; on the other hand edge information can be fully utilized and different weights assigned to neighbour nodes according to their importance, so that the textual information is fully extracted and the accuracy of trigger-word recognition and classification is improved.
Further, the sentence vector to be detected is obtained as follows: each word in the sentence to be detected is converted from text to a numeric code by dictionary lookup, yielding the sentence vector to be detected.
The BIO vector is obtained as follows: the sentence vector to be detected is BIO-tagged, each word being labelled B-X, I-X or O, yielding the BIO vector.
The POS vector is obtained as follows: the sentence vector to be detected is POS-tagged, yielding the POS vector.
The dependency syntax graph is obtained as follows: a dependency syntax tree of the sentence vector to be detected is obtained by dependency parsing; the words of the tree form a node set, and a dependency edge set is obtained from the dependency arcs of the tree, thereby constructing the dependency syntax graph.
BIO tagging labels the type and position of each word in the sentence, improving trigger-word recognition and label classification, and thereby the accuracy of event detection. POS tagging solves the problem that one-hot encoding cannot capture a word's part of speech.
Further, the dependency edge set comprises a forward dependency edge set, a reverse dependency edge set and a self-loop dependency edge set;
accordingly, the dependency syntax graphs include a forward dependency syntax graph, a reverse dependency syntax graph and a self-loop dependency syntax graph.
Constructing three different edge sets, and thus different dependency syntax graphs of the same sentence, achieves the best feature-extraction effect and extracts syntactic information to the maximum extent.
Further, the word vector is defined as:
x_i = [w_i^b, w_i^p]   (4)
where x_i denotes the word vector of word w_i, i ∈ [1, n], n a positive integer; w_i^b denotes the BERT vector of w_i, w_i^p the POS vector of w_i, and [·,·] denotes the concatenation operation.
The tagged sentence vector is defined as:
H = (x_1, x_2, ..., x_n)   (5)
where H denotes the tagged sentence vector.
Further, step S5 is specifically:
s51: calculating the dependency weight and the attention coefficient of each dependency edge in the dependency syntax graph, computing the GCN vector from the dependency weights and the GAT vector from the attention coefficients;
s52: computing the fusion vector from the GCN vector and the GAT vector;
s53: judging whether the number of fusion-vector computations has reached the preset number; if so, executing step S6, otherwise taking the fusion vector as the tagged sentence vector and returning to step S51.
Because the fusion vector is computed from the GCN vector and the GAT vector, the feature-extraction process attends to the neighbour nodes while assigning them different weights according to their different importance, strengthening information extraction and improving event detection precision.
Further, the dependency weight is calculated as:
s_ij^(m) = σ(W_g^(m) h_j^(m-1) + b_g^(m))   (6)
where s_ij^(m) denotes the dependency weight of the dependency edge, σ(·) the sigmoid function, h_j^(m-1) the word vector of the start node of the dependency edge, W_g^(m) the weight matrix and b_g^(m) the bias matrix of the gating mechanism, and m the number of computations; when m = 1, h_j^(0) = x_j.
The GCN vector is calculated as:
h_i^GCN(m) = f( Σ_{j∈N_i} s_ij^(m) h_j^(m-1) )   (7)
where h_i^GCN(m) denotes the GCN vector, f(·) the activation function, the sum runs over the dependency edges whose start end is a neighbour of node v_i, and j ∈ N_i means node v_j is a neighbour node of node v_i.
Further, the attention coefficient is calculated as:
α_ij = exp( LeakyReLU( a^T [W h_i , W h_j] ) ) / Σ_{k∈N_i} exp( LeakyReLU( a^T [W h_i , W h_k] ) )   (8)
where α_ij denotes the attention coefficient, i.e. the importance of node v_j to node v_i, node v_j being a neighbour of node v_i; k ∈ N_i means node v_k is one of the neighbour nodes of v_i; W denotes a transformation matrix that scales the dimension of the word vector h_i of node v_i, giving the intermediate vector Wh_i of v_i, and likewise the intermediate vector Wh_j of v_j; [Wh_i , Wh_j] joins the two intermediate vectors by concatenation; a denotes a single-layer feedforward neural network and T the transpose of a matrix; LeakyReLU denotes the activation function used for nonlinear processing. The numerator of equation (8) is the importance of node v_j to node v_i; the denominator is the sum, over all neighbours of v_i, of their attention coefficients towards v_i, each term passed through the exponential function with base e before the final division, thereby realising softmax normalisation.
The GAT vector is calculated as:
h_i^GAT(m) = f( Σ_{j∈N_i} α_ij W h_j^(m-1) )   (9)
Further, the fusion vector h_i^(m) is computed from the GCN vector h_i^GCN(m) and the GAT vector h_i^GAT(m) according to equation (10), which combines the two into the fused node representation.
Further, step S6 is specifically:
computing a classification feature vector from the fusion vector, and from it the classification probability of each word; outputting the top K words as trigger words according to the classification probability, or setting a probability threshold according to the distribution of the classification probabilities and outputting the words whose classification probability exceeds the threshold as trigger words; and judging the event type of each trigger word.
Further, the classification feature vector is calculated as:
h_fin = g(W' h + b)   (11)
where h_fin denotes the classification feature vector, g(·) a nonlinear activation function, and W' and b a weight matrix and a bias value, respectively.
The classification probability is calculated as:
y_i = softmax(h_fin)   (12)
where y_i denotes the classification probability of the i-th word w_i, and softmax(·) denotes the softmax function.
Compared with the prior art, the event detection method provided by the invention combines the graph attention network and the graph convolutional network in parallel, so that neighbour weights can be learned dynamically, edge information can be fully utilized, and different weights can be assigned to neighbour nodes according to their importance. Through this complementation of advantages, the error propagation caused by traditional pipeline models is alleviated, the textual information is fully extracted during event detection, and the accuracy of trigger-word recognition and classification is improved. In addition, during data preprocessing of the text, BIO tagging, POS tagging and the construction of different dependency syntax graphs extract syntactic information to the maximum extent, further improving trigger-word recognition and label classification, and thus the accuracy of event detection.
For a better understanding and practice, the invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a diagram illustrating a sample event detection in the background art;
FIG. 2 is a flowchart illustrating steps of an event detection method for a converged graph attention and graph convolution network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a process of converting a sentence to be tested into a sentence vector to be tested according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a process of performing BIO labeling on a sentence vector to be detected according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a POS tagging process performed on a sentence vector to be tested according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a structure of a dependency syntax tree according to an embodiment of the present invention;
fig. 7 is a network architecture diagram of an event detection model for a converged graph attention and graph convolution network according to an embodiment of the present invention.
Detailed Description
Existing GNN-based event detection methods adopt only a single GCN or GAT, so text information is extracted incompletely and event detection accuracy is low. The present method applies BIO and POS tagging and constructs different dependency syntax graphs during data preprocessing, then combines a GCN and a GAT in parallel during feature extraction, so that semantic and syntactic information is fully extracted and the accuracy of trigger-word recognition and classification is improved.
Please refer to fig. 2, which is a flowchart illustrating a method for detecting an event in a converged graph attention and graph convolution network according to the present embodiment. The event detection method comprises the following steps:
s1: vectorizing the sentence to be detected to obtain a sentence vector to be detected;
The sentence vector to be detected is obtained as follows: the sentence to be detected is acquired for event detection, and each word in it is converted from text to a numeric code by dictionary lookup.
The dictionary is the built-in dictionary of BERT (Bidirectional Encoder Representations from Transformers), a self-encoding language model pre-trained on the semantics of the word sources of the sentence to be detected. According to BERT's input requirements, a classification identifier [CLS] is added at the beginning of the sentence to be detected and a sentence separator [SEP] at its end; any word that cannot be found and encoded in the dictionary is set to the unknown identifier [UNK], and sentences of insufficient length are padded with the filling identifier [PAD].
In this embodiment, based on an input routine of an information extraction system for the ACE2005 data set, the sentence to be detected is the research object of event detection, and the detection result is the trigger words in the sentence and their types. Suppose the sentence to be detected is S = (w_1, w_2, ..., w_n), where w_i denotes a word, i ∈ [1, n], and n is a positive integer. The sentence to be detected is then processed into the sentence vector to be detected: S' = ([CLS], S, [SEP]).
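As an illustrative sketch of this step (the mini-vocabulary, ids and `encode` helper below are hypothetical stand-ins for BERT's real built-in dictionary), converting a sentence into the padded id sequence S' with the [CLS]/[SEP]/[UNK]/[PAD] markers might look like:

```python
# Hypothetical mini-dictionary standing in for BERT's built-in vocabulary.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "the": 4, "soldiers": 5, "fired": 6}

def encode(sentence, max_len=10):
    # S' = ([CLS], S, [SEP]); unknown words map to [UNK], short sentences are padded.
    tokens = ["[CLS]"] + sentence.lower().split() + ["[SEP]"]
    ids = [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]
    ids += [VOCAB["[PAD]"]] * (max_len - len(ids))
    return ids[:max_len]

print(encode("The soldiers fired"))  # → [2, 4, 5, 6, 3, 0, 0, 0, 0, 0]
```

A real implementation would use BERT's WordPiece tokenizer, but the special-token layout is the same.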
S2: obtaining a BIO vector, a POS vector and a dependency syntax diagram according to the sentence vector to be detected;
The BIO vector is obtained as follows: the sentence vector to be detected is BIO-tagged, each word being labelled B-X, I-X or O, yielding the BIO vector S_b' = (w_1^b, w_2^b, ..., w_n^b). BIO tagging is a form of sequence labelling. X denotes a phrase type (e.g. noun phrase, preposition phrase or verb phrase); B (Begin) indicates that the word is located at the beginning of the segment; I (Intermediate) indicates that the word is located in the middle of the segment; O (Other) indicates that the word is of no type and marks irrelevant tokens. B-X thus means the word belongs to type X and sits at the beginning of its segment, and I-X that it belongs to type X and sits in the middle of its segment.
BIO tagging labels the type and position of each word in the sentence, improving trigger-word recognition and label classification, and thereby the accuracy of event detection.
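A minimal sketch of the BIO labelling scheme (the tokens, spans and `bio_tags` helper are illustrative; a real system derives the phrase spans from a chunker):

```python
def bio_tags(tokens, spans):
    """spans: (start, end_exclusive, phrase_type X) triples from a chunker."""
    tags = ["O"] * len(tokens)            # O: not part of any typed segment
    for start, end, x in spans:
        tags[start] = "B-" + x            # B-X: beginning of an X segment
        for i in range(start + 1, end):
            tags[i] = "I-" + x            # I-X: middle of an X segment
    return tags

print(bio_tags(["He", "fired", "the", "gun"], [(2, 4, "NP")]))
# → ['O', 'O', 'B-NP', 'I-NP']
```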
The POS vector is obtained as follows: the sentence vector to be detected is part-of-speech tagged (POS tagging), yielding the POS vector S_p' = (w_1^p, w_2^p, ..., w_n^p), where each POS vector is 50-dimensional.
POS tagging solves the problem that one-hot encoding cannot capture a word's part of speech.
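As a sketch of why dense POS vectors are preferred over one-hot codes (the tag list and random embeddings below are stand-ins for a learned 50-dimensional embedding table):

```python
import random

POS_TAGS = ["DT", "NN", "IN", "NNS", "VBD", "VBN", "PRP$"]
random.seed(0)
# Each tag maps to a dense 50-dim vector, unlike a sparse one-hot code.
POS_EMB = {t: [random.random() for _ in range(50)] for t in POS_TAGS}

def pos_vectors(tags):
    return [POS_EMB[t] for t in tags]

vecs = pos_vectors(["DT", "NN", "VBD"])
print(len(vecs), len(vecs[0]))  # → 3 50
```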
The dependency syntax graph is obtained as follows: a dependency syntax tree of the sentence vector to be detected is obtained by dependency parsing; the words of the tree form a node set, and a dependency edge set is obtained from the dependency arcs of the tree, thereby constructing the dependency syntax graph.
Dependency parsing aims to determine the syntactic structure of the sentence to be detected, i.e. the dependency relations between its words. The dependency syntax tree is a display form of these relations: words of the tree that stand in a direct dependency relation form a dependency pair. The dependency relation of each pair is expressed by a directed arc, called a dependency arc, and different pairs may carry different dependency relations. The dependency syntax tree of the sentence to be detected is constructed by dependency parsing, and the graph is then drawn from the tree.
The dependency syntax graph is defined as:
G(V, E) = V + E   (1)
where G(V, E) denotes the dependency syntax graph; V = (v_1, v_2, ..., v_n) denotes the node set, node v_i representing word w_i of the sentence to be detected (or of the dependency syntax tree); E = (K_12, K_13, ..., K_ij) denotes the dependency edge set, K_ij being the dependency edge constructed from the dependency arc between words w_i and w_j, with j ∈ N_i, N_i denoting the set of neighbour nodes of node v_i.
The dependency edges include forward edges, reverse edges and self-loop edges, so the dependency edge K_ij is defined as:
K_ij ∈ { K(v_i, v_j), K(v_j, v_i), K(v_i, v_i) }   (2)
where K(v_i, v_j) denotes a forward edge; the forward edges form the forward dependency edge set E_f, from which the forward dependency syntax graph G(V, E) = V + E_f is constructed. K(v_j, v_i) denotes a reverse edge; the reverse edges form the reverse dependency edge set E_r and the reverse dependency syntax graph G(V, E) = V + E_r. K(v_i, v_i) denotes a self-loop edge; the self-loop edges form the self-loop dependency edge set E_l and the self-loop dependency syntax graph G(V, E) = V + E_l. Then:
E = E_f ∪ E_r ∪ E_l   (3)
That is, the dependency edge set comprises a forward, a reverse and a self-loop dependency edge set, and the constructed dependency syntax graphs likewise comprise a forward, a reverse and a self-loop dependency syntax graph. Constructing three different edge sets, and thus different dependency syntax graphs of the same sentence, achieves the best feature-extraction effect and extracts syntactic information to the maximum extent.
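A sketch of constructing the three dependency syntax graphs as adjacency matrices (the arc list below is illustrative; a real system takes the arcs from a dependency parser):

```python
def build_graphs(n, arcs):
    """arcs: (head, dependent) index pairs taken from a dependency syntax tree."""
    forward  = [[0] * n for _ in range(n)]   # E_f: along each dependency arc
    reverse  = [[0] * n for _ in range(n)]   # E_r: against each dependency arc
    selfloop = [[0] * n for _ in range(n)]   # E_l: each node linked to itself
    for h, d in arcs:
        forward[h][d] = 1
        reverse[d][h] = 1
    for i in range(n):
        selfloop[i][i] = 1
    return forward, reverse, selfloop

f, r, s = build_graphs(3, [(1, 0), (1, 2)])   # word 1 governs words 0 and 2
print(f[1], r[0], s[0])  # → [1, 0, 1] [0, 1, 0] [1, 0, 0]
```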
S3: and coding the BIO vector to obtain a BERT vector.
Each BERT vector is 768-dimensional; in this embodiment the BIO vector is encoded by a conventional BERT model, specifically the BERT-BASE model.
S4: combining the BERT vector and the POS vector to obtain the word vectors, and obtaining the tagged sentence vector from the word vectors. The word vector is defined as:
x_i = [w_i^b, w_i^p]   (4)
where x_i denotes the word vector of word w_i, i ∈ [1, n], n a positive integer; w_i^b denotes the BERT vector of w_i, w_i^p the POS vector of w_i, and [·,·] denotes the concatenation operation. Each word vector is therefore 818-dimensional (768 + 50).
The tagged sentence vector is defined as:
H = (x_1, x_2, ..., x_n)   (5)
where H denotes the tagged sentence vector.
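Equations (4) and (5) amount to concatenation and stacking; a sketch with zero vectors of the stated dimensions:

```python
def word_vector(bert_vec, pos_vec):
    # x_i = [w_i^b, w_i^p]: concatenate the 768-dim BERT vector
    # with the 50-dim POS vector into an 818-dim word vector.
    return bert_vec + pos_vec

bert = [0.0] * 768
pos = [0.0] * 50
x = word_vector(bert, pos)
H = [x, x, x]   # tagged sentence vector H = (x_1, x_2, x_3)
print(len(x), len(H))  # → 818 3
```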
S5: performing feature extraction on the tagged sentence vector and the dependency syntax graph to obtain the GCN vector and the GAT vector, and computing the fusion vector from the GCN vector and the GAT vector;
The method comprises the following steps:
s51: calculating the dependency weight and the attention coefficient of each dependency edge in the dependency syntax graph, computing the GCN vector from the dependency weights and the GAT vector from the attention coefficients. A gating mechanism is introduced when calculating the dependency weights to reduce noise in the dependency syntax graph; the forward edge K(v_i, v_j) is used below as an example to describe how the GCN vector and the GAT vector are obtained.
The dependency weight is calculated as:
s_ij^(m) = σ(W_g^(m) h_j^(m-1) + b_g^(m))   (6)
where s_ij^(m) denotes the dependency weight of the dependency edge, σ(·) the sigmoid function, h_j^(m-1) the word vector of the start node of the dependency edge, W_g^(m) the weight matrix and b_g^(m) the bias matrix of the gating mechanism; m denotes the number of computations, and when m = 1, h_j^(0) = x_j.
The GCN vector is calculated as:
h_i^GCN(m) = f( Σ_{j∈N_i} s_ij^(m) h_j^(m-1) )   (7)
where h_i^GCN(m) denotes the GCN vector, f(·) the activation function, the sum runs over the dependency edges whose start end is a neighbour of node v_i, and j ∈ N_i means node v_j is a neighbour node of node v_i.
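A scalar toy sketch of the gated GCN step described above (scalar weights stand in for the learned gating matrices, and tanh stands in for the activation f; all values are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gcn_node(h, neighbours, w_g=1.0, b_g=0.0):
    """h: node -> scalar feature; neighbours: the set N_i feeding node v_i."""
    total = 0.0
    for j in neighbours:
        gate = sigmoid(w_g * h[j] + b_g)   # dependency weight of the edge (j, i)
        total += gate * h[j]               # gated message from neighbour v_j
    return math.tanh(total)                # f(): activation over the weighted sum

h = {0: 0.0, 1: 2.0, 2: -2.0}
out = gcn_node(h, [1, 2])
print(out)
```

The gate suppresses messages from low-scoring (noisy) edges, which is the stated purpose of the gating mechanism.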
The attention coefficient is calculated as:
α_ij = exp( LeakyReLU( a^T [W h_i , W h_j] ) ) / Σ_{k∈N_i} exp( LeakyReLU( a^T [W h_i , W h_k] ) )   (8)
where α_ij denotes the attention coefficient, i.e. the importance of node v_j to node v_i, node v_j being a neighbour of node v_i; k ∈ N_i means node v_k is one of the neighbour nodes of v_i; W denotes a transformation matrix that scales the dimension of the word vector h_i of node v_i, giving the intermediate vector Wh_i of v_i, and likewise the intermediate vector Wh_j of v_j; [Wh_i , Wh_j] joins the two intermediate vectors by concatenation; a denotes a single-layer feedforward neural network and T the transpose of a matrix; LeakyReLU denotes the activation function used for nonlinear processing. The numerator of equation (8) is the importance of node v_j to node v_i; the denominator is the sum, over all neighbours of v_i, of their attention towards v_i, each term passed through the exponential function with base e before the final division, thereby realising softmax normalisation.
The GAT vector is calculated as:
h_i^GAT(m) = f( Σ_{j∈N_i} α_ij W h_j^(m-1) )   (9)
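A toy sketch of equation (8) and the subsequent GAT aggregation: softmax-normalised attention over a node's neighbours, then a weighted sum of their features (the raw scalar scores stand in for a^T[Wh_i , Wh_j], and the features for Wh_j):

```python
import math

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def gat_node(neighbours):
    """neighbours: (raw_score, feature) pairs for each v_j in N_i."""
    scores = [leaky_relu(s) for s, _ in neighbours]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]                            # softmax over N_i
    return sum(a * f for a, (_, f) in zip(alphas, neighbours))    # weighted sum

out = gat_node([(1.0, 2.0), (-1.0, 4.0)])
print(out)
```

The attention coefficients sum to 1 over the neighbourhood, so the output is a convex combination of the neighbours' features.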
S52: calculating to obtain a fusion vector according to the GCN vector and the GAT vector;
The fusion vector h_i^(m) is computed from the GCN vector h_i^GCN(m) and the GAT vector h_i^GAT(m) according to equation (10).
S53: judging whether the number of fusion-vector computations has reached the preset number; if so, executing step S6, otherwise taking the fusion vector as the tagged sentence vector and returning to step S51. In this embodiment the preset number is 3, so it is judged whether the number of computations has reached 3, i.e. whether m = 3.
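A sketch of the S51-S53 loop (the halving "layer" and the averaging fusion are placeholder assumptions; the patent's own fusion formula is not reproduced in the source):

```python
def fuse(h_gcn, h_gat):
    # Placeholder fusion: element-wise average of the GCN and GAT vectors.
    return [(a + b) / 2.0 for a, b in zip(h_gcn, h_gat)]

def extract(h, gcn_layer, gat_layer, preset=3):
    for m in range(1, preset + 1):          # m = 1, 2, 3 as in the embodiment
        h = fuse(gcn_layer(h), gat_layer(h))
        # the fused vector becomes the tagged sentence vector of the next pass
    return h

halve = lambda h: [v * 0.5 for v in h]      # stand-in feature extractor
out = extract([1.0, 2.0], halve, halve, preset=3)
print(out)  # → [0.125, 0.25]
```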
S6: classifying the fusion vector to obtain the trigger words of the sentence to be detected and their classifications.
A classification feature vector is computed from the fusion vector, and from it the classification probability of each word; the top K words are output as the recognized trigger words according to the classification probability, or a probability threshold is set according to the distribution of the classification probabilities and the words whose classification probability exceeds the threshold are output as trigger words; finally, the event type of each trigger word is judged.
The classification feature vector is calculated as:
h_fin = g(W' h + b)   (11)
where h_fin denotes the classification feature vector, g(·) a nonlinear activation function, and W' and b the weight matrix and bias value, respectively.
The classification probability is calculated as:
y_i = softmax(h_fin)   (12)
where y_i denotes the classification probability of the i-th word w_i, and softmax(·) denotes the softmax function.
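A sketch of equations (11)-(12) plus the top-K trigger output of step S6 (the scores and words are illustrative; g, W' and b are collapsed into precomputed scores):

```python
import math

def softmax(xs):
    m = max(xs)                     # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_triggers(scores, words, k=1):
    probs = softmax(scores)
    ranked = sorted(zip(probs, words), reverse=True)
    return [w for _, w in ranked[:k]]   # output the top K words as triggers

print(top_k_triggers([0.1, 2.0, -1.0], ["the", "fired", "gun"], k=1))
# → ['fired']
```

The threshold variant of S6 would simply filter `probs` against a cutoff instead of taking the top K.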
The method of the present invention is described for a single sentence, but it applies analogously to event detection over multiple sentences or whole texts. The invention is illustrated below by a specific example.
Please refer to fig. 3, which is a schematic diagram of the process of converting a sentence to be detected into a sentence vector to be detected in this embodiment. Taking the example sentence shown in the figure, the classification identifier [CLS] is added at the beginning of the sentence and the sentence separator [SEP] at its end, and the words of the sentence are then converted into numeric codes through BERT's built-in dictionary to obtain the sentence vector to be detected.
Please refer to fig. 4, which is a schematic diagram of the process of performing BIO labeling on the sentence vector to be detected in this embodiment. In the example sentence shown in the figure, the trigger word is "fired". Since it is a single word rather than a phrase, no phrase type X is attached: it receives only the start tag B and no intermediate tag I, and the labels of all other words are O. After BIO labeling, the sentence is marked as "O-The O-blood O-of O-detectors O-screened O-between O-The O-jungles O-camera O-evidence O-that O-The O-jungles O-year O-evidence O-that O-The O-year B-fired". The resulting BIO vector is S_b′ = (w_1b, w_2b, ..., w_nb), with w_1b = O-The, ..., w_14b = B-fired.
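The BIO labeling of a single-word trigger can be sketched as follows (the function name and signature are hypothetical; label names follow the example above):

```python
def bio_tag(tokens, trigger_index):
    """Label every word O except the trigger word, which gets only the
    start tag B: a one-word trigger is not a phrase, so no phrase type X
    and no intermediate tag I are attached."""
    return [("B-" if i == trigger_index else "O-") + tok
            for i, tok in enumerate(tokens)]

print(bio_tag(["The", "soldiers", "fired"], 2))
# ['O-The', 'O-soldiers', 'B-fired']
```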
Please refer to fig. 5, which is a schematic diagram of the process of performing POS annotation on the sentence vector to be detected in this embodiment. Here DT denotes a determiner, NN a singular or mass noun, IN a preposition or subordinating conjunction, NNS a plural noun, VBN the past participle of a verb, VBD the past tense of a verb, and PRP$ a possessive pronoun; the sentence is labeled "DT, NN, IN, NNS, VBN, IN, DT, NNS, VBD, NN, IN, PRP$, VBD, VBN". The resulting POS vector is S_p′ = (w_1p, w_2p, ..., w_np), with w_1p = DT, ..., w_14p = VBN.
Then the word vectors are x_1 = [O-The, DT], ..., x_14 = [B-fired, VBN], and the labeled sentence vector is H = (x_1, x_2, ..., x_14).
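The concatenation of per-word BERT features and POS features into the labeled sentence vector H can be sketched as follows. Representing POS tags as one-hot vectors is an assumption, since the text does not fix the POS encoding.

```python
import numpy as np

def build_tagged_sentence(bert_vecs, pos_tags, pos_inventory):
    """Concatenate each word's BERT vector with its POS vector
    (x_i = [w_ib, w_ip]) and stack the results into H = (x_1, ..., x_n).
    POS tags are embedded as one-hot vectors over `pos_inventory`."""
    pos_index = {t: i for i, t in enumerate(pos_inventory)}
    one_hot = np.eye(len(pos_inventory))
    pos_vecs = np.stack([one_hot[pos_index[t]] for t in pos_tags])
    # H has shape (n_words, bert_dim + n_pos_tags).
    return np.concatenate([bert_vecs, pos_vecs], axis=1)
```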
Please refer to fig. 6, which is a schematic structural diagram of the dependency syntax tree in this embodiment. The dependency relationship of each dependency pair is represented by a directional dependency arc, drawn as an arrow in fig. 6; the start of the arrow is the dependent word and the end of the arrow is the dominant word. A unidirectional arrow represents one dependency arc, and a bidirectional arrow represents two dependency arcs. The text on the dependency arcs denotes the different dependency relations, and each dependency arc represents a semantic association between two words, i.e., an edge in the dependency syntax graph.
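Constructing the dependency syntax graph from the dependency arcs, including the forward, reverse and self-loop edge sets used by the method, can be sketched as adjacency matrices (the function name and arc representation are illustrative assumptions):

```python
import numpy as np

def dependency_adjacency(n, arcs):
    """Build the three adjacency matrices of the dependency syntax graph:
    forward edges (one per dependency arc), their reverses, and self-loops.
    `arcs` is a list of (head_index, dependent_index) pairs."""
    forward = np.zeros((n, n))
    for head, dep in arcs:
        forward[head, dep] = 1.0
    backward = forward.T.copy()      # reverse dependency edges
    self_loop = np.eye(n)            # self-circulation edges
    return forward, backward, self_loop
```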
After the dependency syntax graph is constructed from the dependency syntax tree, feature extraction is performed on the labeled sentence vector and the dependency syntax graph to obtain a fusion vector. The fusion vector is then classified to obtain the classification probability of each word; the words with the highest probabilities are output as trigger words according to the set conditions, and the event category of each trigger word is judged.
In the example sentence, the calculated classification probability of "fired" is the highest; if the model is set to output only one trigger word (i.e., K = 1), the trigger word "fired" is output and classified as an Attack event. As in the example of fig. 1, a sentence may include multiple events or multiple trigger words, and the value of K or the probability threshold should be adjusted accordingly during event detection so as to improve the accuracy and reliability of event detection.
Please refer to table 1, which compares the experimental results of the event detection method provided in this embodiment with those of existing event detection methods. Here P denotes the precision of event detection, defined as the number of correctly identified trigger words divided by the total number of identified trigger words; R denotes the recall of event detection, defined as the number of correctly identified trigger words divided by the number of actual trigger words; and F1 denotes the harmonic mean of precision and recall, defined as F1 = 2PR/(P + R). As can be seen from table 1, both the precision and the recall of the event detection method provided by this embodiment are greater than those of the other existing event detection techniques; the F1 value, which combines precision and recall, is also clearly superior to the prior art, showing that the event detection method provided by the invention is markedly effective, accurate and reliable.
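The precision, recall and F1 metrics defined above can be computed as:

```python
def prf1(n_correct, n_predicted, n_gold):
    """Precision = correct / predicted, Recall = correct / gold,
    F1 = 2PR / (P + R); zero denominators yield 0.0."""
    p = n_correct / n_predicted if n_predicted else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```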
Table 1 comparison table of detection results of the event detection method provided in this embodiment and the existing event detection method
Based on the event detection method, the invention also provides an event detection model fusing graph attention and graph convolution network. Please refer to fig. 7, which is a network architecture diagram of an event detection model for a fusion graph attention and graph convolution network provided in this embodiment. The event detection model comprises a data preprocessing module, at least one fusion map neural network module and a classification identification module.
The data preprocessing module comprises a sentence vectorization unit, a BIO labeling unit, a POS labeling unit, a dependency syntax analysis unit, a BERT embedding unit and a labeled sentence acquisition unit. The sentence vectorization unit converts words in the sentence to be detected from text into digital codes to obtain a sentence vector to be detected; the BIO labeling unit carries out BIO labeling on the sentence vector to be detected, and labels each word as B-X, I-X or O, so as to obtain a BIO vector; the POS marking unit marks the POS of the sentence vector to be detected to obtain a POS vector; the dependency syntax analyzing unit analyzes the dependency syntax tree of the sentence to be tested through a dependency technology, the words in the dependency syntax tree form a node set, and a dependency edge set is obtained according to a dependency arc in the dependency syntax tree, so that a dependency syntax graph is constructed; the BERT embedding unit encodes the BIO vector to obtain a BERT vector; the labeled sentence acquisition unit combines the BERT vector and the POS vector to obtain a word vector, and acquires a labeled sentence vector according to the word vector.
The fusion graph neural network module performs feature extraction on the labeled sentence vector and the dependency syntax graph to obtain a fusion vector. The module comprises a GCN unit and a GAT unit arranged in parallel, together with a fusion unit. The GCN unit calculates the dependency weight of each dependency edge in the dependency syntax graph and calculates a GCN vector from the dependency weights; the GAT unit calculates the attention coefficient of each dependency edge in the dependency syntax graph and calculates a GAT vector from the attention coefficients; and the fusion unit calculates the fusion vector from the GCN vector and the GAT vector.
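A single pass of the fusion module can be sketched in NumPy as follows. The per-node sigmoid gate, the LeakyReLU attention and the softmax normalization follow the description above; the shared transform W, the split attention vector (a1, a2) and the averaging fusion are simplifying assumptions, since the text does not fix these details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def fused_layer(H, A, wg, bg, W, a1, a2):
    """One fusion-module pass over word vectors H (n, d) and adjacency A.
    Returns 0.5 * (GCN branch + GAT branch); averaging is an assumption."""
    z = H @ W                                    # shared linear transform
    # GCN branch: sigmoid gate per edge start node (dependency weight),
    # then a gated, weighted sum over graph neighbors.
    s = sigmoid(H @ wg + bg)                     # (n,)
    gcn = np.tanh((A * s[None, :]) @ z)
    # GAT branch: attention logits via a single-layer feed-forward net
    # a^T [z_i || z_j], computed by broadcasting the two halves of a.
    e = leaky_relu((z @ a1)[:, None] + (z @ a2)[None, :])
    e = np.where(A > 0, e, -1e9)                 # attend only to neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha * (A > 0)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # row-wise softmax
    gat = np.tanh(alpha @ z)
    return 0.5 * (gcn + gat)
```

Stacking three such passes, each consuming the previous output as its input H, mirrors the three overlapped fusion modules of this embodiment.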
In this embodiment, the event detection model sequentially overlaps three fusion graph neural network modules, a fusion vector output by a previous fusion graph neural network module is used as a labeled sentence vector input by a next fusion graph neural network module, and a fusion vector calculated by a last fusion graph neural network module is directly input into the classification and identification module.
And the classification identification module is used for performing classification identification on the fusion vector to obtain the trigger words and the classifications of the sentences to be detected, and comprises a full connection layer and a softmax layer. The full-connection layer calculates to obtain a classification feature vector according to the fusion vector; and the softmax layer calculates the classification probability of each word according to the classification characteristic vectors, outputs the words with high classification probability as trigger words according to set conditions, and finally judges the event type of the trigger words.
Compared with the prior art, the event detection method provided by the invention combines the graph convolution neural network and the graph attention neural network. It can dynamically learn neighbor weights and assign different weights to neighbor nodes according to their importance; through the complementary advantages of the two networks, text information can be fully extracted during event detection, thereby improving the accuracy of trigger-word identification and classification. In addition, during data preprocessing of the text, BIO labeling, POS labeling and the construction of different dependency syntax graphs allow syntactic information to be extracted to the greatest extent, further improving the recognition and label-classification capability for trigger words and hence the accuracy of event detection.
The above-mentioned embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention.
Claims (10)
1. An event detection method for fusing graph attention and graph convolution network is characterized by comprising the following steps:
s1: vectorizing the sentence to be detected to obtain a sentence vector to be detected;
s2: obtaining a BIO vector, a POS vector and a dependency syntax diagram according to the sentence vector to be detected;
s3: coding the BIO vector to obtain a BERT vector;
s4: combining the BERT vector and the POS vector to obtain a word vector, and obtaining a labeled sentence vector according to the word vector;
s5: performing feature extraction on the marked sentence vector and the dependency syntax diagram to obtain a GCN vector and a GAT vector, and calculating according to the GCN vector and the GAT vector to obtain a fusion vector;
s6: and carrying out classification and identification on the fusion vector to obtain the trigger words and the classifications of the sentences to be detected.
2. The event detection method for a converged graph attention and graph convolution network of claim 1, wherein:
the obtaining mode of the sentence vector to be detected is as follows: performing text-to-number coding on each word in the sentence to be detected in a dictionary lookup manner to obtain the vector of the sentence to be detected;
the BIO vector is obtained in the following mode: carrying out BIO labeling on the sentence vector to be detected, and labeling each word as B-X, I-X or O, thereby obtaining the BIO vector;
the POS vector is obtained in the following mode: performing POS labeling on the sentence vector to be detected to obtain a POS vector;
the dependency syntax graph is obtained in the following mode: and analyzing to obtain a dependency syntax tree of the sentence vector to be tested by a dependency technology, forming a node set by words in the dependency syntax tree, and obtaining a dependency edge set according to a dependency arc in the dependency syntax tree, thereby constructing the dependency syntax graph.
3. The event detection method of a converged graph attention and graph convolution network of claim 2, wherein:
the dependency edge set comprises a forward dependency edge set, a backward dependency edge set and a self-circulation dependency edge set;
the dependency syntax diagrams include forward dependency syntax diagrams, inverse dependency syntax diagrams, and self-looping dependency syntax diagrams.
4. The event detection method for a converged graph attention and graph convolution network of claim 1, wherein:
the definition of the word vector is:
in the formula, xiRepresents the word wiIs the word vector of (i ∈ [1, n ]]N is a positive integer; w is aibRepresents the word wiOf the BERT vector, wipRepresenting a word wiThe POS vector of [,]representing a join operation;
the definition of the tagged sentence vector is as follows:
H=(x1,x2,...,xn) (5)
in the formula, H represents a markup sentence vector.
5. The method for detecting the event fusing the graph attention and the graph convolution network according to claim 4, wherein the step S5 specifically comprises:
s51: calculating a dependency weight and an attention coefficient of each dependency edge in the dependency syntax diagram, calculating to obtain the GCN vector according to the dependency weight, and calculating the GAT vector according to the attention coefficient;
s52: calculating to obtain the fusion vector according to the GCN vector and the GAT vector;
s53: and judging whether the calculation frequency of the fusion vector reaches the preset frequency, if so, executing the step S6, otherwise, taking the fusion vector as the tagged sentence vector, and returning to the step S51.
6. The event detection method of a converged graph attention and graph convolution network of claim 5, wherein:
the calculation formula of the dependency weight is as follows:
in the formula (I), the compound is shown in the specification,represents the dependency weight of the dependency edge, σ () represents a sigmoid function,a word vector representing the start node of the dependency edge,representing a weight matrix in the gating mechanism,represents a bias matrix in the gating mechanism, m represents the number of computations, and when m is equal to 1,
the calculation formula of the GCN vector is as follows:
in the formula (I), the compound is shown in the specification,representing the GCN vector, f () representing an activation function,indicating that the starting end is node viIs determined by the sum of the dependency weights of the dependent edges of (a), j ∈ NiRepresenting a node vjIs the node viOf the neighboring node.
7. The event detection method of a converged graph attention and graph convolution network of claim 6, wherein:
the calculation formula of the attention coefficient is as follows:
in the formula (I), the compound is shown in the specification,representing said attention coefficient, representing said node vjFor the node viAnd said node vjIs the node viThe neighbor node of (2); k belongs to NiRepresenting a node vkIs the node viOne of the neighbor nodes of (1); w represents a transformation matrix by which W is the pair of the nodes viWord vector ofPerforming dimension scaling to obtain the node viIntermediate vector ofThe node v is obtained by the same calculationjIntermediate vector of Represents the node viIntermediate vector of (2)And said node vjIntermediate vector ofThe components are combined together in a splicing mode; a represents a single layer of a feed-forward neural network,t represents the transpose of the matrix; LeakyReLU represents an activation function for performing nonlinear processing; the molecular part of formula (8) represents the node vjFor the node viThe denominator part represents the node viAll neighbor nodes of (2) to the node viThe sum of the attention coefficients is finally divided after the operation of an exponential function of a natural constant e, so that the softmax normalization is realized;
the GAT vector has the calculation formula as follows:
9. The method for detecting the event fusing the graph attention and the graph convolution network according to claim 1, wherein the step S6 specifically comprises:
calculating to obtain a classification feature vector according to the fusion vector, and calculating to obtain the classification probability of each word according to the classification feature vector; outputting the first K words as trigger words according to the classification probability, or setting a probability threshold according to the data distribution of the classification probability, and outputting the words with the classification probability larger than the probability threshold as the trigger words; and judging the event type of the trigger word.
10. The event detection method of a converged graph attention and graph convolution network of claim 9, wherein:
the calculation formula of the classification feature vector is as follows:
in the formula, hfinRepresenting the classification feature vector, g () representing a non-linear activation function, W' and b representing a weight matrix and an offset value, respectively;
the calculation formula of the classification probability is as follows:
y_i = softmax(h_fin)  (12)

in the formula, y_i represents the classification probability of the i-th word w_i, and softmax() represents a softmax function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210301353.XA CN114647730A (en) | 2022-03-25 | 2022-03-25 | Event detection method integrating graph attention and graph convolution network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114647730A true CN114647730A (en) | 2022-06-21 |
Family
ID=81994647
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210301353.XA Pending CN114647730A (en) | 2022-03-25 | 2022-03-25 | Event detection method integrating graph attention and graph convolution network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114647730A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115587597A (en) * | 2022-11-23 | 2023-01-10 | 华南师范大学 | Sentiment analysis method and device of aspect words based on clause-level relational graph |
CN115587597B (en) * | 2022-11-23 | 2023-03-24 | 华南师范大学 | Sentiment analysis method and device of aspect words based on clause-level relational graph |
CN115774993A (en) * | 2022-12-29 | 2023-03-10 | 广东南方网络信息科技有限公司 | Conditional error identification method and device based on syntactic analysis |
CN115774993B (en) * | 2022-12-29 | 2023-09-08 | 广东南方网络信息科技有限公司 | Condition type error identification method and device based on syntactic analysis |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||