CN113887213A - Event detection method and device based on multilayer graph attention network - Google Patents


Info

Publication number
CN113887213A
Authority
CN
China
Prior art keywords
information
context
vector
word
syntactic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111164755.1A
Other languages
Chinese (zh)
Inventor
包先雨
吴共庆
何俐娟
柯培超
陆振亚
王歆
程立勋
蔡伊娜
郑文丽
慕容灏鼎
蔡屹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Shenzhen Academy of Inspection and Quarantine
Original Assignee
Hefei University of Technology
Shenzhen Academy of Inspection and Quarantine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology, Shenzhen Academy of Inspection and Quarantine filed Critical Hefei University of Technology
Priority to CN202111164755.1A priority Critical patent/CN113887213A/en
Priority to PCT/CN2021/123249 priority patent/WO2023050470A1/en
Publication of CN113887213A publication Critical patent/CN113887213A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application provides an event detection method and device based on a multi-layer graph attention network. The method comprises: obtaining context words in event text information, and determining a syntactic information adjacency matrix and a splicing vector corresponding to the context words; taking the adjacency matrix and the splicing vector as the input of an artificial neural network to obtain an output vector; aggregating the splicing vector and the output vector to generate aggregation information; and determining the trigger word category of the context words according to the aggregation information. By combining the syntactic information and the context information of the context words, the method effectively mitigates the information loss and error propagation that syntactic analysis tools are prone to; and by adding a skip-connection module to each graph attention layer, it avoids the excessive propagation of short-range syntactic information that would otherwise degrade the final trigger word classification, effectively improving the precision, recall and F1 value of trigger word classification.

Description

Event detection method and device based on multilayer graph attention network
Technical Field
The present application relates to the field of natural language processing, and in particular, to an event detection method and apparatus based on a multi-layer graph attention network.
Background
A Knowledge Graph describes the concepts, entities and relations of the objective world in a structured form, expresses internet information in a form closer to human cognition, and provides the ability to better organize, manage and understand the massive information on the internet. The knowledge graph was proposed by Google in 2012 and successfully applied to its search engine; it belongs to knowledge engineering, an important research field of artificial intelligence, and is a flagship application of knowledge engineering for building large-scale knowledge resources. Typical examples include the knowledge graph launched by Google in 2012 after acquiring Freebase (a free knowledge database), the Graph Search of Facebook (a social network service), Microsoft Satori, and domain-specific knowledge bases in commerce, finance, life sciences and other fields.
The event knowledge in a knowledge graph is hidden in internet resources, including existing structured semantic knowledge, structured database information, semi-structured information resources and unstructured resources; resources of different natures require different knowledge acquisition methods. Event identification and extraction studies how event information can be identified and extracted from text describing it and presented in a structured form, including the time, place and participating roles of the occurrence and the associated changes of action or state.
Traditional event detection methods ignore the syntactic features between the words of a sentence and use only sentence-level features, so word ambiguity easily leads to low recognition efficiency and low classification precision for trigger words. In recent years, however, methods that use syntactic information to improve event detection have proven effective. For example, the paper "Trigger-word-free event detection method fusing syntactic information" proposes using syntactic information combined with an attention mechanism (ATTENTION) to connect event information scattered across a sentence and improve the accuracy of event detection; the paper "Vietnamese news event detection integrating dependency information and a convolutional neural network" uses convolution to encode the features between non-adjacent words linked by dependency syntax and then fuses the two kinds of features as the event encoding, thereby realizing event detection.
Disclosure of Invention
In view of the above, the present application is proposed to provide an event detection method based on a multi-layer graph attention network that overcomes, or at least partially solves, the above problems.
an event detection method based on a multilayer graph attention network comprises the following steps:
obtaining context words in event text information, and determining a syntactic information adjacency matrix and a splicing vector corresponding to the context words;
taking the adjacency matrix and the splicing vector as the input of an artificial neural network to obtain an output vector;
aggregating the splicing vector and the output vector to generate aggregation information;
and determining the trigger word category of the context word according to the aggregation information.
Further, the step of obtaining a context word in the event text information and determining a syntactic information adjacency matrix and a concatenation vector corresponding to the context word includes:
determining syntactic information corresponding to the context words according to the context words;
generating the syntax information adjacency matrix according to the syntax information;
and generating the splicing vector according to the word embedding vector of the context word.
Further, the step of determining syntactic information corresponding to the context word according to the context word includes:
and analyzing the event text information through syntactic dependency, and generating syntactic information corresponding to the context word according to the analysis result of the event text information.
Further, the step of obtaining an output vector by using the adjacency matrix and the splicing vector as input of the artificial neural network includes:
combining the adjacency matrices of the same batch into a tensor;
and inputting the tensor and the splicing vector into an artificial neural network for calculation, and generating the output vector according to the calculation result of the artificial neural network.
Further, the step of determining the trigger word category of the context word according to the aggregation information includes:
determining a trigger word of the context word according to the aggregation information, and classifying the trigger word according to a classifier module.
An event detection apparatus based on a multi-layer graph attention network, comprising:
an acquisition module, used for acquiring context words in event text information and determining a syntactic information adjacency matrix and a splicing vector corresponding to the context words;
the computing module is used for taking the adjacency matrix and the splicing vector as the input of an artificial neural network to obtain an output vector;
the aggregation module is used for aggregating the splicing vector and the output vector to generate aggregation information;
and the classification module is used for determining the trigger word category of the context word according to the aggregation information.
Further, the obtaining module includes:
the expression submodule is used for determining syntactic information corresponding to the context words according to the context words;
a generating submodule, configured to generate the syntax information adjacency matrix according to the syntax information;
and the splicing submodule is used for generating the splicing vector according to the word embedding vector of the context word.
Further, the expression submodule comprises:
and the dependency analysis submodule is used for analyzing the event text information through syntactic dependency and generating syntactic information corresponding to the context word according to the analysis result of the event text information.
Further, the calculation module includes:
the array conversion submodule is used for combining the adjacency matrices of the same batch into a tensor;
and the artificial neural network calculation submodule is used for inputting the tensor and the splicing vector into an artificial neural network for calculation and generating the output vector according to the result of the artificial neural network calculation.
Further, the classification module includes:
and the trigger word processing submodule is used for determining the trigger words of the context words according to the aggregation information and classifying the trigger words according to the classifier module.
The application has the following advantages:
in the embodiments of the application, context words in event text information are obtained, and a syntactic information adjacency matrix and a splicing vector corresponding to the context words are determined; the adjacency matrix and the splicing vector are taken as the input of an artificial neural network to obtain an output vector; the splicing vector and the output vector are aggregated to generate aggregation information; and the trigger word category of the context words is determined according to the aggregation information. By combining the syntactic information and the context information of the context words, the method effectively mitigates the information loss and error propagation that syntactic analysis tools are prone to; and by adding a skip-connection module to each graph attention layer, more of the original features are retained, the excessive propagation of short-range syntactic information that would degrade the final trigger word classification is avoided, and the precision, recall and F1 value of trigger word classification are effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings needed to be used in the description of the present application will be briefly introduced below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.
FIG. 1 is a flowchart illustrating steps of an event detection method based on a multi-layer graph attention network according to an embodiment of the present application;
FIG. 2 is a diagram of a syntactic dependency tree provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an adjacency matrix provided by an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating an attention network provided by an embodiment of the present application;
fig. 5 is a flowchart illustrating an event detection method based on a multi-layer graph attention network according to an embodiment of the present application;
fig. 6 is a block diagram of an event detection apparatus based on a multi-layer graph attention network according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, an event detection method based on a multi-layer graph attention network according to an embodiment of the present application is shown;
the method comprises the following steps:
s110, obtaining context words in event text information, and determining a syntactic information adjacent matrix and a splicing vector corresponding to the context words;
s120, taking the adjacency matrix and the splicing vector as input of an artificial neural network to obtain an output vector;
s130, generating aggregation information according to the splicing vector and the output vector in an aggregation mode;
s140, determining the trigger word category of the context word according to the aggregation information.
In the embodiments of the application, context words in event text information are obtained, and a syntactic information adjacency matrix and a splicing vector corresponding to the context words are determined; the adjacency matrix and the splicing vector are taken as the input of an artificial neural network to obtain an output vector; the splicing vector and the output vector are aggregated to generate aggregation information; and the trigger word category of the context words is determined according to the aggregation information. By combining the syntactic information and the context information of the context words, the method effectively mitigates the information loss and error propagation that syntactic analysis tools are prone to; and by adding a skip-connection module to each graph attention layer, more of the original features are retained, the excessive propagation of short-range syntactic information that would degrade the final trigger word classification is avoided, and the precision, recall and F1 value of trigger word classification are effectively improved.
Hereinafter, the event detection method based on the multi-layer graph attention network in the present exemplary embodiment will be further described.
As described in step S110, a context word in the event text information is obtained, and a syntactic information adjacency matrix and a concatenation vector corresponding to the context word are determined.
In an embodiment of the present application, a specific process of "obtaining context words in the event text information and determining the syntactic information adjacency matrix and the concatenation vector corresponding to the context words" in step S110 may be further described in conjunction with the following description.
Determining syntactic information corresponding to the context word according to the context word as described in the following steps;
in an embodiment of the present application, the specific process of "determining syntax information corresponding to the context word according to the context word" may be further described in conjunction with the following description.
And analyzing the event text information through syntactic dependency, and generating syntactic information corresponding to the context word according to the analysis result of the event text information.
Syntactic dependency analysis reveals the syntactic structure of a sentence by analyzing the dependency relations between the components of a language unit; it recognizes grammatical components such as subject, predicate and object, attributive, adverbial and complement, and emphasizes the relations between the analyzed words. In syntactic dependency analysis the core of a sentence is the predicate verb; the other components are then found around the predicate, and the sentence is finally analyzed into a syntactic dependency tree. A syntactic dependency tree can describe the dependency relations between words.
In a specific implementation, event text information is acquired and recognized, syntactic dependency analysis is performed with Stanford CoreNLP (the Stanford natural language processing toolkit), each sentence in the event text is analyzed, the event trigger words in the sentence are identified, and the dependency relations between event trigger words and event arguments and/or between event arguments are analyzed with emphasis to form a syntactic dependency tree.
An event trigger word is the word that best indicates the occurrence of an event; it is the projection of the event concept at the word and phrase level, the basis and grounds of event identification, and an important feature for determining the event category, and is usually a verb or a noun. Event arguments are the information describing the time, place and participants of the event.
Referring to FIG. 2, a diagram of a syntactic dependency tree provided by an embodiment of the present application is shown. For the sentence "I go to Beijing Tiananmen to see the rising of the sun", in the constructed syntactic dependency tree the core predicate of the sentence is "go", which is the root of the tree; the subject of "go" is "I", the object of "go" is "Beijing Tiananmen", and the object of the other verb is "sun". The syntactic dependency tree can thus describe the dependency relations between context words.
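For illustration only, the following Python sketch shows how such a dependency tree can be produced programmatically. The embodiment names the Stanford CoreNLP tool; the Stanford NLP group's Python package stanza is used here as a stand-in, so the package choice and the English example output format are assumptions and not part of the described embodiment.

```python
# Minimal sketch of syntactic dependency parsing (stanza used as a stand-in
# for the Stanford CoreNLP tool named in the embodiment).
import stanza

stanza.download('en')  # one-time model download
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,lemma,depparse')

doc = nlp("I go to Beijing Tiananmen to see the rising of the sun")
for sentence in doc.sentences:
    for word in sentence.words:
        # word.head is the 1-based index of the governing word (0 for the root),
        # word.deprel is the dependency relation label on the arc
        head = sentence.words[word.head - 1].text if word.head > 0 else 'ROOT'
        print(f"{word.text} <-{word.deprel}- {head}")
```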
Generating the syntax information adjacency matrix according to the syntax information as described in the following steps;
the adjacency matrix is a matrix representing the adjacency relationship between vertices. Let G ═ (V, E) be a figure, where V ═ V1,v2,…,vnV is a vertex, E is an edge, a one-dimensional array is used for storing data of all the vertices in the graph, and a two-dimensional array is used for storing data of the relationship (the edge or the arc) between the vertices, and the two-dimensional array is called as an adjacent matrix. The adjacency matrices are further divided into directed graph adjacency matrices and undirected graph adjacency matrices. The adjacency matrix of G is an n-th order square matrix having the following properties: for an undirected graph, the adjacency matrix must be symmetric, and the major diagonal is zero, the minor diagonal is not necessarily 0, which is not necessary for a directed graph. In an undirected graph, the degree of any vertex i is the ith column (orRow i) is the number of all non-zero elements in row i, the out-degree of a vertex i in the directed graph is the number of all non-zero elements in row i, the in-degree is the number of all non-zero elements in column i, and the syntactic dependency relationship between two event parameters is stored by adopting an adjacent matrix of the directed graph.
As an example, a syntactic dependency tree is formed for each sentence by syntactic dependency analysis, and the corresponding adjacency matrix is generated from that tree.
In a specific implementation, referring to FIG. 3, a schematic diagram of an adjacency matrix provided by an embodiment of the present application is shown; the adjacency matrix in FIG. 3 corresponds to the syntactic dependency tree in FIG. 2. In FIG. 2, "Beijing" and "Tiananmen" are parallel objects of the trigger word "go", so the values at the intersections of the row of "go" with the columns of "Beijing" and "Tiananmen" in the corresponding adjacency matrix are 1. Each word is taken as a node; with the seven words "I", "go", "Beijing", "Tiananmen", "see", "sun" and "rising", the matrix is a 7×7 square matrix. If a syntactic arc exists between two words, the corresponding position of the matrix is 1; otherwise it is 0. The adjacency matrix of a directed graph is used to store the syntactic dependency relations of the text: if a dependency relation exists between two words, the corresponding matrix element is 1, and between words without a dependency relation it is 0. The dependencies between the context words can thus be represented by the adjacency matrix.
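For illustration only, the following sketch builds such a directed-graph adjacency matrix from dependency arcs. The (head, dependent) index pairs below are illustrative placeholders, not the exact arcs of FIG. 2.

```python
import numpy as np

# Seven words of the example sentence, one node per word
tokens = ["I", "go", "Beijing", "Tiananmen", "see", "sun", "rising"]
# Hypothetical (head, dependent) index pairs standing in for the syntactic arcs
edges = [(1, 0), (1, 2), (1, 3), (1, 4), (4, 6), (6, 5)]

# Directed-graph adjacency matrix: 1 where a syntactic arc exists, 0 otherwise
adj = np.zeros((len(tokens), len(tokens)), dtype=int)   # 7x7 square matrix
for head, dep in edges:
    adj[head, dep] = 1

print(adj)
```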
The stitching vector is generated from the word-embedded vectors of the context words, as described in the following steps.
It should be noted that the word-level information in a sentence needs to be converted into real-valued vectors as the input of the artificial neural network. Let X = {x1, x2, x3, …, xn} be a sentence of length n, where xi is the i-th word of the sentence. In natural language processing tasks the semantic information of a word is related to its position in the sentence, and part-of-speech and entity-type information help with trigger word recognition and semantic understanding. The present application therefore takes the splicing vector formed by concatenating the word-sense vector, entity vector, part-of-speech vector and position vector of each context word as the input of the artificial neural network.
In a specific implementation, the 4 different word embedding vectors of each context word, namely the word-sense vector, entity vector, part-of-speech vector and position vector, are concatenated into a first splicing vector; the first splicing vector is input into a Bi-LSTM neural network layer to generate a second splicing vector, which serves as one of the input vectors of the multi-layer graph attention network. The splicing vector can thus capture the semantic information among the context words.
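For illustration only, the splicing step can be sketched as follows in PyTorch; the vocabulary sizes, embedding dimensions and module names are assumptions, not values from the embodiment.

```python
import torch
import torch.nn as nn

class SpliceEncoder(nn.Module):
    """Concatenates word-sense, entity, part-of-speech and position embeddings
    (first splicing vector) and feeds them through a Bi-LSTM (second splicing vector)."""
    def __init__(self, n_words, n_entities, n_pos, n_positions,
                 d_word=100, d_feat=25, d_hidden=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, d_word)
        self.entity_emb = nn.Embedding(n_entities, d_feat)
        self.pos_emb = nn.Embedding(n_pos, d_feat)
        self.position_emb = nn.Embedding(n_positions, d_feat)
        self.bilstm = nn.LSTM(d_word + 3 * d_feat, d_hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, word_ids, entity_ids, pos_ids, position_ids):
        # First splicing vector: concatenation of the four embeddings
        first = torch.cat([self.word_emb(word_ids), self.entity_emb(entity_ids),
                           self.pos_emb(pos_ids), self.position_emb(position_ids)], dim=-1)
        # Second splicing vector: Bi-LSTM output, shape [batch, seq_len, 2 * d_hidden]
        second, _ = self.bilstm(first)
        return second
```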
And step S120, taking the adjacency matrix and the splicing vector as input of an artificial neural network, and obtaining an output vector.
It should be noted that the artificial neural network is a multi-layer Graph Attention Network. Conventional graph convolutional networks have various limitations: they cannot handle directed graphs well, are not suited to inductive tasks (tasks in which the graph structures to be processed in the training stage and the test stage differ), and cannot handle dynamic graphs. With a graph attention network, even if the structure of the graph changes during prediction, the influence is small; the parameters only need to be adjusted and the computation carried out again. The graph attention network operates vertex by vertex, each operation looping over all vertices of the graph. Vertex-by-vertex operation removes the constraint of the Laplacian matrix present in the original graph structure, so the directed-graph problem is easily solved.
In an embodiment of the present application, a specific process of "taking the adjacency matrix and the splicing vector as the input of the artificial neural network to obtain the output vector" in step S120 may be further described with reference to the following description.
Combining the adjacency matrices of the same batch into a tensor, as described in the following steps;
in one specific implementation, the sentences recognized at the same time in the event text information form one batch, and the adjacency matrices of the sentences of the same batch form a tensor. The set of adjacency matrices is denoted T_V, and the resulting tensor is expressed as A ∈ R^(N×N×K), where K = |T_V| and N is the number of nodes.
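For illustration only, forming the batch tensor can be sketched as follows; padding the sentences of a batch to a common length N is an assumption of the sketch.

```python
import torch

def batch_adjacency(adj_list):
    """Stack the K adjacency matrices of one batch (each N x N, sentences padded
    to the same length N) into a tensor A of shape [N, N, K]."""
    mats = [torch.as_tensor(a, dtype=torch.float32) for a in adj_list]
    return torch.stack(mats, dim=-1)
```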
And inputting the tensor and the splicing vector into an artificial neural network for calculation, and generating the output vector according to the calculation result of the artificial neural network.
As an example, referring to FIG. 4, a schematic diagram of the graph attention network provided by an embodiment of the present application is shown; the computation is divided into two steps, calculating the attention coefficients and performing the weighted summation. The tensor and the second splicing vector are used as the input of the graph attention layer, expressed as

h = {h_1, h_2, …, h_N}, with h_i ∈ R^F,

where N is the number of nodes and F is the number of node features; the output is expressed as

h' = {h'_1, h'_2, …, h'_N}, with h'_i ∈ R^(F'),

where F' is the dimension of the new node feature vectors. The attention coefficient between node i and each neighboring node j ∈ N_i is calculated, as shown on the left side of FIG. 4, as

e_ij = a(W h_i, W h_j),

where a is a mapping R^(F') × R^(F') → R and W ∈ R^(F'×F) is a weight matrix.
For each node, the graph attention network uses the attention mechanism to calculate a similarity coefficient between node i and its adjacent nodes j, so that it does not rely entirely on the graph structure.
The attention coefficients are normalized with softmax:

α_ij = softmax_j(e_ij) = exp( LeakyReLU( a^T [W h_i || W h_j] ) ) / Σ_{k∈N_i} exp( LeakyReLU( a^T [W h_i || W h_k] ) ),

where || denotes vector concatenation; e_ij and α_ij are both called attention coefficients, α_ij being the normalization of e_ij.
After the attention coefficients of all nodes are normalized, the features of the neighbor nodes are weighted and summed to generate the output vector:

h'_i = σ( Σ_{j∈N_i} α_ij W h_j ),

where W is the weight matrix multiplied with the features, σ is a nonlinear activation function, and j ∈ N_i ranges over all nodes adjacent to i.
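For illustration only, the two steps above (attention coefficients, then weighted summation) can be sketched as a single graph attention layer in PyTorch; the masking strategy and the choice of ELU as the nonlinearity σ are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention layer: e_ij = a(Wh_i, Wh_j), softmax over
    neighbors, then weighted summation of the neighbor features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        # h: [N, F] node features; adj: [N, N], 1 where a syntactic arc exists
        # (self-loops are assumed so every node has at least one neighbor)
        Wh = self.W(h)                                             # [N, F']
        N = Wh.size(0)
        pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, -1),       # builds [Wh_i || Wh_j]
                           Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))                # e_ij, shape [N, N]
        e = e.masked_fill(adj == 0, float('-inf'))                 # keep only j in N_i
        alpha = torch.softmax(e, dim=-1)                           # normalized alpha_ij
        return F.elu(alpha @ Wh)                                   # sigma(sum_j alpha_ij W h_j)
```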
As shown on the right side of FIG. 4, for a three-layer graph attention network the multi-layer attention mechanism assigns different attention weights to different features. For a multi-layer graph attention network with K attention computations the results are concatenated:

h'_i = ||_{k=1}^{K} σ( Σ_{j∈N_i} α^k_ij W^k h_j );

if the multi-layer graph attention network is applied at the output layer, the results are averaged instead:

h'_i = σ( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α^k_ij W^k h_j ).
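For illustration only, a sketch of how K parallel attention computations can be combined, building on the GraphAttentionLayer sketch above: concatenation in hidden layers, averaging at the output layer. The module wiring is an assumption, not the embodiment's exact architecture.

```python
import torch
import torch.nn as nn

class MultiHeadGraphAttention(nn.Module):
    """K parallel GraphAttentionLayer computations; hidden layers concatenate the
    K outputs, the output layer averages them, mirroring the two formulas above."""
    def __init__(self, in_dim, out_dim, num_heads, is_output_layer=False):
        super().__init__()
        self.heads = nn.ModuleList(GraphAttentionLayer(in_dim, out_dim)
                                   for _ in range(num_heads))
        self.is_output_layer = is_output_layer

    def forward(self, h, adj):
        outs = [head(h, adj) for head in self.heads]
        if self.is_output_layer:
            return torch.stack(outs, dim=0).mean(dim=0)   # average at the output layer
        return torch.cat(outs, dim=-1)                    # concatenate in hidden layers
```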
as stated in step S130, aggregate information is generated according to the concatenation vector and the output vector.
As an example, in each graph attention layer the aggregation of syntactic information is implemented by a skip-connection module: the splicing vector skips over the attention layer through the skip-connection module and is aggregated with the output vector. The skip-connection module prevents short-range syntactic information from being propagated excessively, retains more of the original syntactic information, and avoids a poor final trigger word classification.
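For illustration only, the skip connection around one attention layer can be sketched as follows, building on the multi-head sketch above. The residual-style addition is one possible aggregation operation and is an assumption of the sketch.

```python
import torch.nn as nn

class GATLayerWithSkip(nn.Module):
    """Wraps one multi-head graph attention layer with a skip connection: the
    splicing vector bypasses the layer and is aggregated with the layer output."""
    def __init__(self, dim, num_heads):
        super().__init__()
        # concatenating num_heads outputs of width dim // num_heads keeps the width at dim
        # (dim is assumed to be divisible by num_heads)
        self.gat = MultiHeadGraphAttention(dim, dim // num_heads, num_heads)

    def forward(self, h, adj):
        return self.gat(h, adj) + h   # aggregate the layer output with the skipped input
```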
As stated in step S140, the trigger word category of the context word is determined according to the aggregation information.
In an embodiment of the present application, the specific process of "determining the trigger category of the context word according to the aggregation information" in step S140 may be further described in conjunction with the following description.
Determining a trigger word of the context word according to the aggregation information, and classifying the trigger word according to a classifier module.
As an example, the trigger word of the context word is determined according to the aggregation information, the trigger word is classified according to the preset condition of the classifier module, and the event type corresponding to the event sentence is determined according to the classification category of the trigger word. The event types are categories defined in advance.
Specifically, the preset condition of the classifier module is to aggregate the information from the different modules, pass it through a fully connected layer, and then use a softmax function to select, from the category probabilities corresponding to each context word, the category with the largest probability as the predicted label of the current trigger word (the softmax function maps the outputs of multiple neurons into the interval (0, 1), which can be interpreted as probabilities, thereby enabling multi-class classification).
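For illustration only, the classifier step can be sketched as follows; the input width and class count are placeholders.

```python
import torch
import torch.nn as nn

class TriggerClassifier(nn.Module):
    """Fully connected layer followed by softmax; the largest class probability
    gives the predicted trigger-word label for each context word."""
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes)

    def forward(self, aggregated):
        logits = self.fc(aggregated)               # [batch, seq_len, num_classes]
        probs = torch.softmax(logits, dim=-1)      # mapped into (0, 1), read as probabilities
        return probs, probs.argmax(dim=-1)         # per-word predicted category
```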
The following experimental demonstration is performed on the event detection method based on the multi-layer graph attention network provided by the embodiment of the application:
the experimental environment is as follows: the system comprises a Pythrch-1.8.0 (an open source Python machine learning library), an Nvidia GeForce RTX 3060 (a display card chip), Windows 10 (a computer operating system), Inter i7-11700k, a memory 16G and a hard disk 1T.
The experimental data are shown in Table 1 (comparative results of the experiments); the table is provided as an image in the original publication.
Experimental results: the experiment uses precision (P), recall (R) and the F1 value (F1-score) as the observed variables. P, R and F1 are defined as follows:

P = TP / (TP + FP)

R = TP / (TP + FN)

F1 = 2 × P × R / (P + R)

where TP, FP and FN are the numbers of correctly detected, falsely detected and missed trigger words, respectively.
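For illustration only, a minimal helper computing the three observed variables from the TP/FP/FN counts follows the standard definitions above.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from TP/FP/FN counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```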
to ensure the accuracy of the experiment, the division of the data set is kept consistent with that used by the other event detection methods. The experimental results show that, compared with traditional event detection methods that only use sentence-level features, the F1-score of the event detection method provided by this embodiment is about 8% higher; compared with methods based on graph neural networks, the method provided by this embodiment also achieves the highest F1-score and recall.
Referring to fig. 5, a flow diagram of an event detection method based on a multi-layer graph attention network is shown;
in a specific implementation, after event text information is acquired, analyzing the event text information through a syntactic analysis technology to generate a syntactic dependency tree, generating an adjacent matrix corresponding to the context word according to the syntactic dependency tree, and generating a tensor from the adjacent matrix of sentences in the same batch; splicing the embedded vectors of 4 different words of the context word into a first spliced vector, inputting the first spliced vector into a Bi-LSTM neural network layer to generate a second spliced vector, and inputting the adjacency matrix and the second spliced vector into a multilayer graph attention network to generate an output vector so as to perform aggregation operation on syntactic information of different depths; the splicing vector is subjected to aggregation operation by skipping a multilayer graph attention network through a skip connection module; and aggregating the output vector and the spliced vector, classifying the trigger words of the context words through a classifier module, and determining the event type corresponding to the event sentence.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
Referring to fig. 6, an event detection apparatus based on a multi-layer graph attention network according to an embodiment of the present application is shown;
the method specifically comprises the following steps:
an obtaining module 610, configured to obtain a context word in event text information, and determine a syntactic information adjacency matrix and a concatenation vector corresponding to the context word;
a calculating module 620, configured to use the adjacency matrix and the splicing vector as inputs of an artificial neural network, and obtain an output vector;
an aggregation module 630, configured to aggregate the splicing vector and the output vector to generate aggregation information;
and the classification module 640 is configured to determine a trigger word category of the context word according to the aggregation information.
In an embodiment of the present application, the obtaining module 610 includes:
the expression submodule is used for determining syntactic information corresponding to the context words according to the context words;
a generating submodule, configured to generate the syntax information adjacency matrix according to the syntax information;
and the splicing submodule is used for generating the splicing vector according to the word embedding vector of the context word.
In an embodiment of the present application, the expression submodule includes:
and the dependency analysis submodule is used for analyzing the event text information through syntactic dependency and generating syntactic information corresponding to the context word according to the analysis result of the event text information.
In an embodiment of the present application, the calculating module 620 includes:
the array conversion submodule is used for combining the adjacency matrices of the same batch into a tensor;
and the artificial neural network calculation submodule is used for inputting the tensor and the splicing vector into an artificial neural network for calculation and generating the output vector according to the result of the artificial neural network calculation.
In an embodiment of the present application, the classification module 640 includes:
and the trigger word processing submodule is used for determining the trigger words of the context words according to the aggregation information and classifying the trigger words according to the classifier module.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The event detection method and device based on the multi-layer graph attention network provided by the application are introduced in detail, a specific example is applied in the text to explain the principle and the implementation of the application, and the description of the above embodiment is only used for helping to understand the method and the core idea of the application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. An event detection method based on a multilayer graph attention network is characterized by comprising the following steps:
obtaining context words in event text information, and determining a syntactic information adjacency matrix and a splicing vector corresponding to the context words;
taking the adjacency matrix and the splicing vector as the input of an artificial neural network to obtain an output vector;
aggregating the splicing vector and the output vector to generate aggregation information;
and determining the trigger word category of the context word according to the aggregation information.
2. The method for detecting events based on the multi-layer graph attention network of claim 1, wherein the step of obtaining context words in the event text information and determining the syntactic information adjacency matrix and the concatenation vector corresponding to the context words comprises:
determining syntactic information corresponding to the context words according to the context words;
generating the syntax information adjacency matrix according to the syntax information;
and generating the splicing vector according to the word embedding vector of the context word.
3. The method for detecting events based on multilayer graph attention network of claim 2, wherein the step of determining syntactic information corresponding to the context word according to the context word comprises:
and analyzing the event text information through syntactic dependency, and generating syntactic information corresponding to the context word according to the analysis result of the event text information.
4. The event detection method based on the multilayer graph attention network according to claim 1, wherein the step of taking the adjacency matrix and the splicing vector as input of the artificial neural network to obtain an output vector comprises:
combining the adjacency matrices of the same batch into a tensor;
and inputting the tensor and the splicing vector into an artificial neural network for calculation, and generating the output vector according to the calculation result of the artificial neural network.
5. The method for detecting events based on multi-layer graph attention network of claim 1, wherein the step of determining the trigger word class of the context word according to the aggregation information comprises:
determining a trigger word of the context word according to the aggregation information, and classifying the trigger word according to a classifier module.
6. An event detection device based on a multilayer graph attention network, comprising:
an acquisition module, used for acquiring context words in event text information and determining a syntactic information adjacency matrix and a splicing vector corresponding to the context words;
the computing module is used for taking the adjacency matrix and the splicing vector as the input of an artificial neural network to obtain an output vector;
the aggregation module is used for aggregating the splicing vector and the output vector to generate aggregation information;
and the classification module is used for determining the trigger word category of the context word according to the aggregation information.
7. The event detection device based on the multilayer graph attention network of claim 6, wherein the obtaining module comprises:
the expression submodule is used for determining syntactic information corresponding to the context words according to the context words;
a generating submodule, configured to generate the syntax information adjacency matrix according to the syntax information;
and the splicing submodule is used for generating the splicing vector according to the word embedding vector of the context word.
8. The event detection device based on the multi-layer graph attention network of claim 7, wherein the expression submodule comprises:
and the dependency analysis submodule is used for analyzing the event text information through syntactic dependency and generating syntactic information corresponding to the context word according to the analysis result of the event text information.
9. The event detection device based on the multilayer graph attention network of claim 6, wherein the calculation module comprises:
the array conversion submodule is used for combining the adjacency matrices of the same batch into a tensor;
and the artificial neural network calculation submodule is used for inputting the tensor and the splicing vector into an artificial neural network for calculation and generating the output vector according to the result of the artificial neural network calculation.
10. The event detection device based on the multi-layer graph attention network of claim 6, wherein the classification module comprises:
and the trigger word processing submodule is used for determining the trigger words of the context words according to the aggregation information and classifying the trigger words according to the classifier module.
CN202111164755.1A 2021-09-30 2021-09-30 Event detection method and device based on multilayer graph attention network Pending CN113887213A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111164755.1A CN113887213A (en) 2021-09-30 2021-09-30 Event detection method and device based on multilayer graph attention network
PCT/CN2021/123249 WO2023050470A1 (en) 2021-09-30 2021-10-12 Event detection method and apparatus based on multi-layer graph attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111164755.1A CN113887213A (en) 2021-09-30 2021-09-30 Event detection method and device based on multilayer graph attention network

Publications (1)

Publication Number Publication Date
CN113887213A true CN113887213A (en) 2022-01-04

Family

ID=79005069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111164755.1A Pending CN113887213A (en) 2021-09-30 2021-09-30 Event detection method and device based on multilayer graph attention network

Country Status (2)

Country Link
CN (1) CN113887213A (en)
WO (1) WO2023050470A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303996B (en) * 2023-05-25 2023-08-04 江西财经大学 Theme event extraction method based on multifocal graph neural network
CN116629237B (en) * 2023-07-25 2023-10-10 江西财经大学 Event representation learning method and system based on gradually integrated multilayer attention
CN116701576B (en) * 2023-08-04 2023-10-10 华东交通大学 Event detection method and system without trigger words

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11132513B2 (en) * 2019-05-07 2021-09-28 International Business Machines Corporation Attention-based natural language processing
CN111259142B (en) * 2020-01-14 2020-12-25 华南师范大学 Specific target emotion classification method based on attention coding and graph convolution network
CN112163416B (en) * 2020-10-09 2021-11-02 北京理工大学 Event joint extraction method for merging syntactic and entity relation graph convolution network
CN112347248A (en) * 2020-10-30 2021-02-09 山东师范大学 Aspect-level text emotion classification method and system

Also Published As

Publication number Publication date
WO2023050470A1 (en) 2023-04-06

Similar Documents

Publication Publication Date Title
Dahouda et al. A deep-learned embedding technique for categorical features encoding
CN107066446B (en) Logic rule embedded cyclic neural network text emotion analysis method
Ma et al. Label embedding for zero-shot fine-grained named entity typing
US20200364253A1 (en) Method and system for analyzing entities
Al-Azani et al. Hybrid deep learning for sentiment polarity determination of Arabic microblogs
Aggarwal et al. Classification of fake news by fine-tuning deep bidirectional transformers based language model
CN113887213A (en) Event detection method and device based on multilayer graph attention network
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
US20200160196A1 (en) Methods and systems for detecting check worthy claims for fact checking
Goel et al. Sarcasm detection using deep learning and ensemble learning
CN112100401B (en) Knowledge graph construction method, device, equipment and storage medium for science and technology services
CN116304748B (en) Text similarity calculation method, system, equipment and medium
CN110705255A (en) Method and device for detecting association relation between sentences
Alexandridis et al. A knowledge-based deep learning architecture for aspect-based sentiment analysis
CN115017916A (en) Aspect level emotion analysis method and device, electronic equipment and storage medium
Truică et al. MCWDST: a minimum-cost weighted directed spanning tree algorithm for real-time fake news mitigation in social media
O'Keefe et al. Deep learning and word embeddings for tweet classification for crisis response
CN116521899B (en) Improved graph neural network-based document level relation extraction method and system
Wakchaure et al. A scheme of answer selection in community question answering using machine learning techniques
CN116257632A (en) Unknown target position detection method and device based on graph comparison learning
KR102567896B1 (en) Apparatus and method for religious sentiment analysis using deep learning
CN115827865A (en) Method and system for classifying objectionable texts by fusing multi-feature map attention mechanism
CN115129863A (en) Intention recognition method, device, equipment, storage medium and computer program product
WO2023077562A1 (en) Graph perturbation strategy-based event detection method and apparatus
Nguyen et al. A model of convolutional neural network combined with external knowledge to measure the question similarity for community question answering systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination