CN114647730A - Event detection method integrating graph attention and graph convolution network - Google Patents

Event detection method integrating graph attention and graph convolution network Download PDF

Info

Publication number
CN114647730A
CN114647730A CN202210301353.XA CN202210301353A CN114647730A CN 114647730 A CN114647730 A CN 114647730A CN 202210301353 A CN202210301353 A CN 202210301353A CN 114647730 A CN114647730 A CN 114647730A
Authority
CN
China
Prior art keywords
vector
dependency
node
sentence
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210301353.XA
Other languages
Chinese (zh)
Inventor
焦新涛
陈智山
陈国镒
阙永杰
钟庆豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202210301353.XA priority Critical patent/CN114647730A/en
Publication of CN114647730A publication Critical patent/CN114647730A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an event detection method for a fusion graph attention and graph convolution network. The event detection method for the fusion graph attention and graph convolution network comprises the following steps: s1: vectorizing the sentence to be detected to obtain a sentence vector to be detected; s2: obtaining a BIO vector, a POS vector and a dependency syntactic graph according to the sentence vector to be detected; s3: coding the BIO vector to obtain a BERT vector; s4: combining the BERT vector and the POS vector to obtain a word vector, and obtaining a labeled sentence vector according to the word vector; s5: performing feature extraction on the marked sentence vector and the dependency syntax diagram to obtain a GCN vector and a GAT vector, and calculating according to the GCN vector and the GAT vector to obtain a fusion vector; s6: and carrying out classification and identification on the fusion vector to obtain the trigger words and the classifications of the sentences to be detected. The event detection method can fully extract the text information and improve the accuracy of identifying and classifying the trigger words.

Description

Event detection method integrating graph attention and graph convolution network
Technical Field
The invention relates to the technical field of event detection, in particular to an event detection method fusing graph attention and graph convolution network.
Background
After the era of data explosion development, the amount of data on the internet has seen an exponential increase, with large amounts of valuable information being implicated that has not been explored. In the face of such a large amount of data, it is time-consuming and labor-consuming to screen information purely by manpower, and information cannot be extracted in large batches, but some important data are omitted, so that many organizations and individuals are lost. Therefore, the natural language technology is used for automatically screening and extracting information needed by people to ensure the timeliness and integrity of the information, and the method is an effective means for mining data and automatically extracting the information, such as content recommendation in an e-commerce platform, public opinion monitoring of a social network platform and an intelligent question and answer function in online customer service. In recent years, deep learning has been widely used in the field of natural language processing, such as parsing, named entity recognition, and so on, and natural language processing techniques have thus been greatly leaped.
The event extraction is a method for extracting information in the existing natural language processing technology, aims to present an unstructured text containing event information in a structured form, and is widely applied to the fields of automatic abstracting, automatic question answering, information retrieval and the like. Events in the definition of Automatic Content Extraction (ACE) are composed of Trigger words (Trigger) and elements (attribute) describing the structure of the event. The trigger word is a core word which can clearly indicate the occurrence of an event in a sentence, is a characteristic word which determines the most important type of the event, and determines the category and the subcategory of the event, and is generally a verb or a noun phrase. The elements are used to populate the event template, both of which describe the event itself in its entirety.
The Event extraction task is composed of two steps of Event Detection (Event Detection) and Event element identification (alignment Detection). The event detection mainly identifies trigger words and represented event types and sub-types according to contexts, the ACE2005 data set defines 8 event categories and 33 seed categories, and each event category or sub-category corresponds to a unique event template. Please refer to fig. 1, which is a diagram illustrating an example of event detection. In this sentence, the first event belongs to a death (die) event in the life (life) category, the trigger word is die; the second event belongs to an attack (attack) event in the conflict (conflict) category, and the trigger word is fire.
The existing event detection method mainly identifies and classifies event-triggered words through a Recurrent Neural Network (RNN) and a Graph Neural Network (GNN). When event detection is performed by using GNNs, only a single Graph Convolutional neural Network (GCN) or a Graph Attention neural Network (GAT) is used, and this usage method does not well consider the advantages and disadvantages of the two networks, and has certain limitations: the graph convolution neural network can dynamically learn the neighbor weights, but different weights cannot be distributed to each neighbor node according to the importance of the nodes by neglecting the relation existing among the nodes; the graph attention neural network considers the relationship between each node but has no way to dynamically learn the neighbor weights and does not fully utilize the information of the edges. The advantages of the two cannot be well combined for complementation by single use, and incomplete model extraction information and low accuracy can be caused.
Disclosure of Invention
Based on this, the invention aims to provide an event detection method fusing graph attention and graph convolution network, which combines graph convolution neural network and graph attention neural network, fully extracts text information by the complementary advantages of the graph convolution neural network and the graph attention neural network, and improves the accuracy of trigger word recognition and classification.
The invention is realized by the following technical scheme:
an event detection method for fusing graph attention and graph convolution network comprises the following steps:
s1: vectorizing the sentence to be detected to obtain a sentence vector to be detected;
s2: obtaining a BIO vector, a POS vector and a dependency syntax diagram according to the sentence vector to be detected;
s3: coding the BIO vector to obtain a BERT vector;
s4: combining the BERT vector and the POS vector to obtain a word vector, and obtaining a labeled sentence vector according to the word vector;
s5: performing feature extraction on the marked sentence vector and the dependency syntax diagram to obtain a GCN vector and a GAT vector, and calculating according to the GCN vector and the GAT vector to obtain a fusion vector;
s6: and carrying out classification and identification on the fusion vector to obtain the trigger words and the classifications of the sentences to be detected.
According to the event detection method for the fusion graph attention and graph convolution network, the GCN vector and the GAT vector are combined to obtain the fusion vector during feature extraction, on one hand, neighbor weights can be dynamically learned, and on the other hand, side information can be fully utilized, different weights are distributed to neighbor nodes according to the importance of the nodes, so that text information can be fully extracted, and the accuracy of trigger word identification and classification is improved.
Further, the obtaining mode of the sentence vector to be detected is as follows: performing text-to-number coding on each word in the sentence to be detected in a dictionary lookup manner to obtain the vector of the sentence to be detected;
the BIO vector is obtained in the following mode: carrying out BIO labeling on the sentence vector to be detected, and labeling each word as B-X, I-X or O, thereby obtaining the BIO vector;
the POS vector is obtained in the following mode: performing POS labeling on the sentence vector to be detected to obtain a POS vector;
the dependency syntax graph is obtained in the following mode: and analyzing to obtain a dependency syntax tree of the sentence vector to be tested by a dependency technology, forming a node set by words in the dependency syntax tree, and obtaining a dependency edge set according to a dependency arc in the dependency syntax tree, thereby constructing the dependency syntax graph.
Through BIO labeling, the type and the position of each word in a sentence can be labeled, the recognition capability and the tag classification capability of the trigger word are improved, and therefore the accuracy of event detection is improved. The POS labeling mode can solve the problem that the part of speech of a word cannot be extracted by single hot coding.
Further, the dependent edge set comprises a forward dependent edge set, an inverse dependent edge set and a self-loop dependent edge set;
the dependency syntax diagrams include forward dependency syntax diagrams, inverse dependency syntax diagrams, and self-looping dependency syntax diagrams.
By constructing three different edge sets to construct different dependency syntax graphs of the same sentence to achieve the best effect of feature extraction, syntax information can be extracted to the maximum extent.
Further, the definition of the word vector is:
xi=[wib,wip] (4)
in the formula, xiRepresents the word wiIs the word vector of (i ∈ [1, n ]]N is a positive integer; w is aibRepresents the word wiOf the BERT vector, wipRepresents the word wiThe POS vector of [,]representing a join operation;
the definition of the tagged sentence vector is as follows:
H=(x1,x2,...,xn) (5)
in the formula, H represents a markup sentence vector.
Further, step S5 is specifically:
s51: calculating a dependency weight and an attention coefficient of each dependency edge in the dependency syntax graph, calculating the GCN vector according to the dependency weight, and calculating the GAT vector according to the attention coefficient;
s52: calculating to obtain the fusion vector according to the GCN vector and the GAT vector;
s53: and judging whether the calculation frequency of the fusion vector reaches the preset frequency, if so, executing the step S6, otherwise, taking the fusion vector as the tagged sentence vector, and returning to the step S51.
The fusion vector is obtained through GCN vector and GAT vector calculation, so that different weights can be distributed according to different direct importance of different nodes while paying attention to neighbor nodes in the feature extraction process, the information extraction capability is enhanced, and the event detection precision is improved.
Further, the formula for calculating the dependency weight is as follows:
Figure BDA0003565574990000031
in the formula (I), the compound is shown in the specification,
Figure BDA0003565574990000032
represents the dependency weight of the dependency edge, σ () represents a sigmoid function,
Figure BDA0003565574990000033
a word vector representing the start node of the dependency edge,
Figure BDA0003565574990000034
representing a weight matrix in the gating mechanism,
Figure BDA0003565574990000035
represents a bias matrix in the gating mechanism, m represents the number of computations, and when m is equal to 1,
Figure BDA0003565574990000036
the calculation formula of the GCN vector is as follows:
Figure BDA0003565574990000041
in the formula (I), the compound is shown in the specification,
Figure BDA0003565574990000042
representing the GCN vector, f() It is shown that the activation function is,
Figure BDA0003565574990000043
indicating that the starting end is node viIs determined by the sum of the dependency weights of the dependent edges of (a), j ∈ NiRepresenting a node vjIs the node viOf the neighboring node.
Further, the attention coefficient is calculated by the formula:
Figure BDA0003565574990000044
in the formula (I), the compound is shown in the specification,
Figure BDA0003565574990000045
representing said attention coefficient, representing said node vjFor the node viAnd said node vjIs the node viThe neighbor node of (2); k is as large as NiRepresenting a node vkIs the node viOne of the neighbor nodes of (1); w represents a transformation matrix by which W is the number of pairs of nodes viWord vector of
Figure BDA0003565574990000046
Performing dimension scaling to obtain the node viIntermediate vector of
Figure BDA0003565574990000047
Calculating the node v by the same methodjIntermediate vector of
Figure BDA0003565574990000048
Figure BDA0003565574990000049
Represents the node viIntermediate vector of
Figure BDA00035655749900000410
And said node vjIntermediate vector of
Figure BDA00035655749900000411
The components are combined together in a splicing mode; a represents a single-layer feedforward neural network, and T represents the transposition of a matrix; LeakyReLU represents an activation function for performing nonlinear processing; the molecular part of formula (8) represents the node vjFor the node viThe denominator part represents the node viOf said node viThe sum of the attention coefficients is finally divided after the operation of an exponential function of a natural constant e, so that the softmax normalization is realized;
the calculation formula of the GAT vector is as follows:
Figure BDA00035655749900000412
in the formula (I), the compound is shown in the specification,
Figure BDA00035655749900000413
representing the GAT vector.
Further, the calculation formula of the fusion vector is as follows:
Figure BDA00035655749900000414
in the formula (I), the compound is shown in the specification,
Figure BDA00035655749900000415
representing the fused vector.
Further, step S6 is specifically:
calculating to obtain a classification feature vector according to the fusion vector, and calculating to obtain the classification probability of each word according to the classification feature vector; outputting the first K words as trigger words according to the classification probability, or setting a probability threshold according to the data distribution of the classification probability, and outputting the words with the classification probability larger than the probability threshold as the trigger words; and judging the event type of the trigger word.
Further, the calculation formula of the classification feature vector is as follows:
Figure BDA0003565574990000051
in the formula, hfinRepresenting the classification feature vector, g () representing a non-linear activation function, W' and b representing a weight matrix and an offset value, respectively;
the calculation formula of the classification probability is as follows:
yi=softmax(hfin) (12)
in the formula, yiRepresents the ith word wiThe classification probability, softmax () representing a softmax function
Compared with the prior art, the event detection method provided by the invention has the advantages that the graph attention network and the graph convolution network are combined in parallel, so that the neighbor weight can be dynamically learned, the edge information can be fully utilized, different weights are distributed to neighbor nodes according to the importance of the nodes, the problem of error transmission caused by the traditional pipeline model is solved through the advantage complementation of the edge information and the neighbor nodes, the text information can be fully extracted during event detection, and the accuracy of trigger word identification and classification is improved. In addition, when data preprocessing of the text is carried out, through BIO labeling, POS labeling and construction of different dependency syntax diagrams, syntax information can be extracted to the maximum extent, the recognition capability and the tag classification capability of the trigger words are further improved, and therefore the accuracy of event detection is improved.
For a better understanding and practice, the invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a diagram illustrating a sample event detection in the background art;
FIG. 2 is a flowchart illustrating steps of an event detection method for a converged graph attention and graph convolution network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a process of converting a sentence to be tested into a sentence vector to be tested according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a process of performing BIO labeling on a sentence vector to be detected according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a POS tagging process performed on a sentence vector to be tested according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a structure of a dependency syntax tree according to an embodiment of the present invention;
fig. 7 is a network architecture diagram of an event detection model for a converged graph attention and graph convolution network according to an embodiment of the present invention.
Detailed Description
The existing GNN-based event detection method only adopts single GCN or GAT, so that the extraction of text information is incomplete and the accuracy of event detection is low. According to the method, BIO labeling and POS labeling are adopted during data preprocessing, different dependency syntax graphs are constructed, and then GCN and GAT are combined in parallel during feature extraction, so that semantic and syntax information can be fully extracted, and the accuracy of identifying and classifying the trigger words is improved.
Please refer to fig. 2, which is a flowchart illustrating a method for detecting an event in a converged graph attention and graph convolution network according to the present embodiment. The event detection method comprises the following steps:
s1: vectorizing the sentence to be detected to obtain a sentence vector to be detected;
the obtaining mode of the sentence vector to be detected is as follows: and acquiring a sentence to be detected for event detection, and performing text-to-number coding on each word in the sentence to be detected in a dictionary searching mode.
The dictionary is a built-in dictionary of BERT (Bidirectional Encoder representation model based on converter), and the BERT is a self-coding language model for pre-training the semantic meaning of the word source of the sentence to be tested; according to the input requirement of BERT, adding a classification identifier [ CLS ] at the beginning of a sentence to be detected, adding a sentence separator [ SEP ] at the end of the sentence to be detected, setting a word which cannot be searched and coded in a dictionary as an unknown identifier [ UNK ], and filling the gap with a filling identifier [ PAD ] character for a sentence with insufficient length.
In this example, the baseIn an input routine of an information extraction system of an ACE2005 data set, a sentence to be detected is used as a research object of event detection, and a detection result is a trigger word and a type thereof in the sentence to be detected. Suppose the sentence to be tested is: s ═ w1,w2,...,wn) Wherein w isiRepresents a word, i ∈ [1, n ]]And n is a positive integer. Then, the sentence to be tested is processed into a sentence vector to be tested as follows: s' ═ ([ CLS)],S,[SEP])。
S2: obtaining a BIO vector, a POS vector and a dependency syntax diagram according to the sentence vector to be detected;
the BIO vector is obtained in the following way:BIO labeling is carried out on the sentence vector to be detected, each word is labeled as B-X, I-X or O, and thus BIO vector S is obtainedb'=(w1b,w2b,...,wnb). Wherein, BIO label is a mode of sequence label; x represents a phrase type (e.g., noun phrase, preposition phrase, or verb phrase); b, Begin, which indicates that the word is located at the beginning of the segment; i is Intermediate, which indicates that the word is located in the middle of the segment; o, Other, indicates that it is not of any type, and is used to mark irrelevant characters; B-X indicates that the word belongs to the X type and is positioned at the beginning of the positioned segment, and I-X indicates that the word belongs to the X type and is positioned in the middle of the positioned segment.
Through BIO labeling, the type and the position of each word in a sentence can be labeled, the recognition capability and the tag classification capability of the trigger word are improved, and therefore the accuracy of event detection is improved.
The POS vector acquisition mode is as follows:performing Part-of-speech tagging (POS tagging) on the sentence vector to be tested to obtain a POS vector Sp'=(w1p,w2p,...,wnp). Wherein the dimension of each POS vector is 50 dimensions.
The POS labeling mode can solve the problem that the part of speech of a word cannot be extracted by one-hot (one-hot) codes.
The dependency syntax graph is obtained in the following manner:analyzing to obtain dependency syntax tree of the sentence vector to be tested by dependency technology, forming a node set by words in the dependency syntax tree, and rootingAnd obtaining a dependency edge set according to the dependency arcs in the dependency syntax tree, thereby constructing a dependency syntax graph.
The dependency syntax aims to determine the syntax structure of the sentence to be tested or the dependency relationship between words in the sentence to be tested, the dependency syntax tree is a display form of the dependency syntax, and the words in the dependency syntax tree directly have dependency relationship with each other to form a dependency pair. The dependency relationship of each dependency pair is expressed by a directed arc, called dependency arc, and different dependency relationships exist between different dependency pairs. And constructing a dependency syntax tree of the sentence to be tested by a dependency technology, and then carrying out composition according to the dependency syntax tree.
The dependency syntax diagram is defined as:
G(V,E)=V+E (1)
in the formula, G (V, E) represents a dependency syntax diagram; v ═ V (V)1,v2,...,vn) Representing a set of nodes, node viRepresenting a word w in the sentence to be examined or in the dependency syntax treei;E=(K12,K13,...,Kij) Represents a dependency edge set, KijRepresenting according to the word wiAnd the word wjThe dependency edge constructed by the dependency arc between j ∈ Ni,NiRepresenting a node vi(or node i) a collection of neighbor nodes.
The dependent edge includes a forward edge, a backward edge and a self-circulation edge, then the dependent edge KijIs defined as:
Figure BDA0003565574990000071
in the formula, K (v)i,vj) Indicating a positive side, from the positive side K (v)i,vj) Form a forward dependency edge set EfFurther, a forward dependency syntax diagram G (V, E) ═ V + E is constructedf;K(vj,vi) Indicating a reverse side, from reverse side K (v)j,vi) Composition of an inverse dependency edge set ErAnd further constructing an inverse dependency syntax diagram G (V, E) ═ V + Er;K(vi,vi) Representing self-looping edges, from self-looping edge K (v)i,vi) Composition of self-circulating dependency edge set ElAnd further constructing a self-circulation dependency syntax diagram G (V, E) ═ V + ElThen, there are:
Figure BDA0003565574990000072
namely, the dependency edge set comprises a forward dependency edge set, an inverse dependency edge set and a self-circulation dependency edge set, and the constructed dependency syntax graph also comprises a forward dependency syntax graph, an inverse dependency syntax graph and a self-circulation dependency syntax graph. By constructing three different edge sets to construct different dependency syntax graphs of the same sentence to achieve the best effect of feature extraction, syntax information can be extracted to the maximum extent.
S3: and coding the BIO vector to obtain a BERT vector.
Wherein the dimensionality of each BERT vector is 768 dimensions; in this embodiment, the BIO vector is encoded by a conventional BERT model, which may be specifically a BERT-BASE model.
S4: combining the BERT vector and the POS vector to obtain a word vector, and obtaining a labeled sentence vector according to the word vector; the definition of the word vector is:
xi=[wib,wip] (4)
in the formula, xiRepresents the word wiIs the word vector of (i ∈ [1, n ]]N is a positive integer; w is aibRepresents the word wiOf the BERT vector, wipRepresents the word wiThe POS vector of [,]representing a join operation; wherein the dimension of each word vector is 818 dimensions.
The definition of the annotated sentence vector is:
H=(x1,x2,...,xn) (5)
in the formula, H represents a markup sentence vector.
S5: performing feature extraction on the marked sentence vector and the dependency syntax diagram to obtain a GCN vector and a GAT vector, and calculating according to the GCN vector and the GAT vector to obtain a fusion vector;
the method comprises the following steps:
s51: calculating the dependency weight and the attention coefficient of each dependency edge in the dependency syntax graph, calculating to obtain a GCN vector according to the dependency weight, and calculating a GAT vector according to the attention coefficient; the gating mechanism is introduced to reduce noise in the dependency syntax graph when calculating the dependency weights, and the forward edge K (v) is used belowi,vj) The way of acquiring the GCN vector and the GAT vector is described as an example.
The calculation formula of the dependent weight is as follows:
Figure BDA0003565574990000081
in the formula (I), the compound is shown in the specification,
Figure BDA0003565574990000082
represents the dependency weight of the dependency edge, σ () represents the sigmoid function,
Figure BDA0003565574990000083
a word vector representing the dependent edge start end node,
Figure BDA0003565574990000084
representing a weight matrix in the gating mechanism,
Figure BDA0003565574990000085
representing a bias matrix in a gating mechanism; m represents the number of calculations, and when m is 1,
Figure BDA0003565574990000086
the formula for calculating the GCN vector is:
Figure BDA0003565574990000087
in the formula (I), the compound is shown in the specification,
Figure BDA0003565574990000088
representing a GCN vector, f () representing an activation function,
Figure BDA0003565574990000089
indicating that the starting end is node viIs determined by the sum of the dependency weights of the dependent edges of (1), j ∈ NiRepresenting a node vjIs a node viOf the neighboring node.
The formula for calculating the attention coefficient is as follows:
Figure BDA00035655749900000810
in the formula (I), the compound is shown in the specification,
Figure BDA00035655749900000811
representing the attention coefficient, representing node vjTo node viOf importance, and node vjIs a node viThe neighbor node of (2); k is as large as NiRepresenting a node vkIs a node viOne of the neighbor nodes of (1); w denotes a transformation matrix by which
Figure BDA0003565574990000091
To node viWord vector of
Figure BDA0003565574990000092
Dimension scaling is carried out to obtain a node viIntermediate vector of (2)
Figure BDA0003565574990000093
Calculating to obtain a node v by the same methodjIntermediate vector of
Figure BDA0003565574990000094
Figure BDA0003565574990000095
Represents node viIntermediate vector of
Figure BDA0003565574990000096
And node vjIntermediate vector of
Figure BDA0003565574990000097
The components are combined together in a splicing mode; a represents a single-layer feedforward neural network, and T represents the transpose of a matrix; LeakyReLU represents an activation function for performing nonlinear processing; the molecular part of formula (8) represents the node vjFor node viThe denominator part represents the node viAll neighbor nodes of (2) to node viAnd finally, the sum of the attention coefficients is divided after the sum is operated by an exponential function of a natural constant e, so that the softmax normalization is realized.
The GAT vector is calculated as:
Figure BDA0003565574990000098
in the formula (I), the compound is shown in the specification,
Figure BDA0003565574990000099
representing the GAT vector.
S52: calculating to obtain a fusion vector according to the GCN vector and the GAT vector;
the calculation formula of the fusion vector is as follows:
Figure BDA00035655749900000910
in the formula (I), the compound is shown in the specification,
Figure BDA00035655749900000911
representing the fused vector.
S53: judging whether the calculation times of the fusion vector reaches the preset times, if so, executing the step S6, otherwise, taking the fusion vector as a marked sentence vector, and returning to the step S51; in the present embodiment, if the predetermined number of times is 3, it is determined whether the number of times of calculation reaches 3, that is, whether m is equal to 3.
S6: and carrying out classification and identification on the fusion vector to obtain a trigger word of the sentence to be detected and the classification of the trigger word.
Calculating to obtain a classification feature vector according to the fusion vector, and calculating to obtain the classification probability of each word according to the classification feature vector; outputting the former K words as recognized trigger words according to the classification probability, or setting a probability threshold according to the data distribution of the classification probability, and outputting the words with the classification probability larger than the probability threshold as the trigger words; and finally, judging the event type of the trigger word.
The calculation formula of the classification feature vector is as follows:
Figure BDA00035655749900000912
in the formula, hfinRepresenting the classification feature vector, g () representing a non-linear activation function, and W' and b representing the weight matrix and bias value, respectively.
The calculation formula of the classification probability is as follows:
yi=soft max(hfin) (12)
in the formula, yiRepresents the ith word wiSoftmax () represents the softmax function.
The method of the present invention is described for a single sentence, but the method can be similarly applied to event detection for multiple sentences or text. The invention is illustrated below by means of a specific example.
Please refer to fig. 3, which is a schematic diagram illustrating a process of converting a sentence to be tested into a vector of the sentence to be tested according to this embodiment. Taking The sentence "The sentences of The subjects, viewed between The jungles and The sentence result, adding a classification identifier [ CLS ] at The beginning of The sentence, adding a sentence separator [ SEP ] at The end of The sentence, and then converting words in The sentence into digital codes through a BERT built-in dictionary to obtain a sentence vector to be detected.
Please refer to fig. 4, which is a schematic diagram illustrating a process of performing BIO labeling on a sentence vector to be detected according to this embodiment. In the example sentence shown in the figure, the trigger word is fire, but since fire has only one word and is not a phrase composition, the phrase type X is not marked, and only the start mark B and no intermediate mark I are markedAnd if The labels of other words are all O, The sentence is marked as 'O-The O-blood O-of O-detectors O-screened O-between O-The O-jungles O-camera O-evidence O-that O-The O-jungles O-year O-evidence O-that O-The O-year B-fixed' after BIO marking. For BIO vector Sb'=(w1b,w2b,...,wnb),w1b=O-The,……,w14b=B-fired。
Please refer to fig. 5, which is a schematic diagram illustrating a process of performing POS annotation on a sentence vector to be detected according to this embodiment. Where DT denotes a qualifier (determinant), NN denotes a singular number or quality (Noun) IN a Noun, IN denotes a preposition or subordinate conjunction, NNs denotes a plural number (Noun, public) IN a Noun, VBN denotes a past participle (Verb, past Verb) of a Verb, VBD denotes a past expression (Verb, past Verb) of a Verb, and PRP $ denotes a principal Pronoun (POS), and the sentence is labeled "DT, NN, IN, NNs, VBN, IN, DT, NNs, VBD, NN, IN, PRP $, VBD, VBN". For POS vector Sp'=(w1p,w2p,...,wnp),w1p=DT,……,w14p=VBN。
Then the word vector x1=[O-The,DT],……,x14=[B-fired,VBN];
The annotated sentence vector H ═ x1,x2,...,x14)。
Please refer to fig. 6, which is a schematic structural diagram of the dependency syntax tree according to this embodiment. The dependency relationship of each dependency pair is represented by a directional dependency arc, which is represented as an arrow in fig. 6, the start end of the arrow is a dependent word, and the end of the arrow is a dominant word; the unidirectional arrow represents one dependent arc and the bidirectional arrow represents two dependent arcs. The text on the dependency arcs represents different dependencies, and each dependency arc represents a semantic association between two words, i.e., there is an edge present in the dependency syntax diagram.
After a dependency syntax graph is constructed according to a dependency syntax tree, feature extraction is carried out on the labeled sentence vector and the dependency syntax graph to obtain a fusion vector, classification and identification are carried out on the fusion vector to obtain the classification probability of each word, a plurality of words with high probability are output according to set conditions to serve as trigger words, and the event category of each trigger word is judged.
In The sentence "The object of computers screened between The jungles behind The object and The best of The fixed," The calculated classification probability of The fixed is The highest, if only one trigger word is set to be output (namely, K is 1), The trigger word "fixed" is output and classified as an attack event. In the example of fig. 1, a sentence may include multiple events or multiple trigger words, and the value of K or the probability threshold should be appropriately adjusted during event detection, so as to improve the accuracy and reliability of event detection.
Please refer to table 1, which is an experimental result table of the event detection method provided in this embodiment and the conventional event detection method. Where P denotes the accuracy of event detection (precision), which is defined as:
Figure BDA0003565574990000111
r represents the recall (recall) of event detection, defined as:
Figure BDA0003565574990000112
f1 represents the harmonic mean of correct rate and recall, which is defined as:
Figure BDA0003565574990000113
as can be seen from table 1, the accuracy and the recall rate of the event detection method provided by the present embodiment are both greater than those of other existing event detection technologies; the F1 value integrates the results of the accuracy and the recall rate, and the F1 value is obviously superior to the prior art, which shows that the event detection method provided by the invention has obvious effectiveness, accuracy and reliability.
Table 1 comparison table of detection results of the event detection method provided in this embodiment and the existing event detection method
Figure BDA0003565574990000114
Based on the event detection method, the invention also provides an event detection model fusing graph attention and graph convolution network. Please refer to fig. 7, which is a network architecture diagram of an event detection model for a fusion graph attention and graph convolution network provided in this embodiment. The event detection model comprises a data preprocessing module, at least one fusion map neural network module and a classification identification module.
The data preprocessing module comprises a sentence vectorization unit, a BIO labeling unit, a POS labeling unit, a dependency syntax analysis unit, a BERT embedding unit and a labeled sentence acquisition unit. The sentence vectorization unit converts words in the sentence to be detected from text into digital codes to obtain a sentence vector to be detected; the BIO labeling unit carries out BIO labeling on the sentence vector to be detected, and labels each word as B-X, I-X or O, so as to obtain a BIO vector; the POS marking unit marks the POS of the sentence vector to be detected to obtain a POS vector; the dependency syntax analyzing unit analyzes the dependency syntax tree of the sentence to be tested through a dependency technology, the words in the dependency syntax tree form a node set, and a dependency edge set is obtained according to a dependency arc in the dependency syntax tree, so that a dependency syntax graph is constructed; the BERT embedding unit encodes the BIO vector to obtain a BERT vector; the labeled sentence acquisition unit combines the BERT vector and the POS vector to obtain a word vector, and acquires a labeled sentence vector according to the word vector.
And the fusion graph neural network module performs feature extraction on the labeled sentence vector and the dependency syntax graph to obtain a fusion vector. The module comprises a GCN unit, a GAT unit and a fusion unit which are arranged in parallel. The GCN unit calculates the dependency weight of each dependency edge in the dependency syntax graph and obtains a GCN vector according to the dependency weight calculation; the GAT unit calculates an attention coefficient of each dependency edge in the dependency syntax graph and calculates a GAT vector according to the attention coefficient; and the fusion unit calculates to obtain a fusion vector according to the GCN vector and the GAT vector.
In this embodiment, the event detection model sequentially overlaps three fusion graph neural network modules, a fusion vector output by a previous fusion graph neural network module is used as a labeled sentence vector input by a next fusion graph neural network module, and a fusion vector calculated by a last fusion graph neural network module is directly input into the classification and identification module.
And the classification identification module is used for performing classification identification on the fusion vector to obtain the trigger words and the classifications of the sentences to be detected, and comprises a full connection layer and a softmax layer. The full-connection layer calculates to obtain a classification feature vector according to the fusion vector; and the softmax layer calculates the classification probability of each word according to the classification characteristic vectors, outputs the words with high classification probability as trigger words according to set conditions, and finally judges the event type of the trigger words.
Compared with the prior art, the event detection method provided by the invention combines the graph convolution neural network and the graph attention neural network, can dynamically learn the neighbor weights, can distribute different weights to neighbor nodes according to the importance of the nodes, and can fully extract text information during event detection through the complementary advantages of the two, thereby improving the accuracy of identifying and classifying trigger words. In addition, when data preprocessing of the text is carried out, through BIO labeling, POS labeling and construction of different dependency syntax diagrams, syntax information can be extracted to the maximum extent, the recognition capability and the tag classification capability of the trigger words are further improved, and therefore the accuracy of event detection is improved.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.

Claims (10)

1. An event detection method for fusing graph attention and graph convolution network is characterized by comprising the following steps:
s1: vectorizing the sentence to be detected to obtain a sentence vector to be detected;
s2: obtaining a BIO vector, a POS vector and a dependency syntax diagram according to the sentence vector to be detected;
s3: coding the BIO vector to obtain a BERT vector;
s4: combining the BERT vector and the POS vector to obtain a word vector, and obtaining a labeled sentence vector according to the word vector;
s5: performing feature extraction on the marked sentence vector and the dependency syntax diagram to obtain a GCN vector and a GAT vector, and calculating according to the GCN vector and the GAT vector to obtain a fusion vector;
s6: and carrying out classification and identification on the fusion vector to obtain the trigger words and the classifications of the sentences to be detected.
2. The event detection method for a converged graph attention and graph convolution network of claim 1, wherein:
the obtaining mode of the sentence vector to be detected is as follows: performing text-to-number coding on each word in the sentence to be detected in a dictionary lookup manner to obtain the vector of the sentence to be detected;
the BIO vector is obtained in the following mode: carrying out BIO labeling on the sentence vector to be detected, and labeling each word as B-X, I-X or O, thereby obtaining the BIO vector;
the POS vector is obtained in the following mode: performing POS labeling on the sentence vector to be detected to obtain a POS vector;
the dependency syntax graph is obtained in the following mode: and analyzing to obtain a dependency syntax tree of the sentence vector to be tested by a dependency technology, forming a node set by words in the dependency syntax tree, and obtaining a dependency edge set according to a dependency arc in the dependency syntax tree, thereby constructing the dependency syntax graph.
3. The event detection method of a converged graph attention and graph convolution network of claim 2, wherein:
the dependency edge set comprises a forward dependency edge set, a backward dependency edge set and a self-circulation dependency edge set;
the dependency syntax diagrams include forward dependency syntax diagrams, inverse dependency syntax diagrams, and self-looping dependency syntax diagrams.
4. The event detection method for a converged graph attention and graph convolution network of claim 1, wherein:
the definition of the word vector is:
Figure FDA0003565574980000011
in the formula, xiRepresents the word wiIs the word vector of (i ∈ [1, n ]]N is a positive integer; w is aibRepresents the word wiOf the BERT vector, wipRepresenting a word wiThe POS vector of [,]representing a join operation;
the definition of the tagged sentence vector is as follows:
H=(x1,x2,...,xn) (5)
in the formula, H represents a markup sentence vector.
5. The method for detecting the event fusing the graph attention and the graph convolution network according to claim 4, wherein the step S5 specifically comprises:
s51: calculating a dependency weight and an attention coefficient of each dependency edge in the dependency syntax diagram, calculating to obtain the GCN vector according to the dependency weight, and calculating the GAT vector according to the attention coefficient;
s52: calculating to obtain the fusion vector according to the GCN vector and the GAT vector;
s53: and judging whether the calculation frequency of the fusion vector reaches the preset frequency, if so, executing the step S6, otherwise, taking the fusion vector as the tagged sentence vector, and returning to the step S51.
6. The event detection method of a converged graph attention and graph convolution network of claim 5, wherein:
the calculation formula of the dependency weight is as follows:
Figure FDA0003565574980000021
in the formula (I), the compound is shown in the specification,
Figure FDA0003565574980000022
represents the dependency weight of the dependency edge, σ () represents a sigmoid function,
Figure FDA0003565574980000023
a word vector representing the start node of the dependency edge,
Figure FDA0003565574980000024
representing a weight matrix in the gating mechanism,
Figure FDA0003565574980000025
represents a bias matrix in the gating mechanism, m represents the number of computations, and when m is equal to 1,
Figure FDA0003565574980000026
the calculation formula of the GCN vector is as follows:
Figure FDA0003565574980000027
in the formula (I), the compound is shown in the specification,
Figure FDA0003565574980000028
representing the GCN vector, f () representing an activation function,
Figure FDA0003565574980000029
indicating that the starting end is node viIs determined by the sum of the dependency weights of the dependent edges of (a), j ∈ NiRepresenting a node vjIs the node viOf the neighboring node.
7. The event detection method of a converged graph attention and graph convolution network of claim 6, wherein:
the calculation formula of the attention coefficient is as follows:
Figure FDA00035655749800000210
in the formula (I), the compound is shown in the specification,
Figure FDA00035655749800000211
representing said attention coefficient, representing said node vjFor the node viAnd said node vjIs the node viThe neighbor node of (2); k belongs to NiRepresenting a node vkIs the node viOne of the neighbor nodes of (1); w represents a transformation matrix by which W is the pair of the nodes viWord vector of
Figure FDA00035655749800000212
Performing dimension scaling to obtain the node viIntermediate vector of
Figure FDA00035655749800000213
The node v is obtained by the same calculationjIntermediate vector of
Figure FDA00035655749800000214
Figure FDA00035655749800000215
Represents the node viIntermediate vector of (2)
Figure FDA00035655749800000216
And said node vjIntermediate vector of
Figure FDA00035655749800000217
The components are combined together in a splicing mode; a represents a single layer of a feed-forward neural network,t represents the transpose of the matrix; LeakyReLU represents an activation function for performing nonlinear processing; the molecular part of formula (8) represents the node vjFor the node viThe denominator part represents the node viAll neighbor nodes of (2) to the node viThe sum of the attention coefficients is finally divided after the operation of an exponential function of a natural constant e, so that the softmax normalization is realized;
the GAT vector has the calculation formula as follows:
Figure FDA0003565574980000031
in the formula (I), the compound is shown in the specification,
Figure FDA0003565574980000032
representing the GAT vector.
8. The event detection method of a converged graph attention and graph convolution network of claim 7, wherein:
the calculation formula of the fusion vector is as follows:
Figure FDA0003565574980000033
in the formula (I), the compound is shown in the specification,
Figure FDA0003565574980000034
representing the fused vector.
9. The method for detecting the event fusing the graph attention and the graph convolution network according to claim 1, wherein the step S6 specifically comprises:
calculating to obtain a classification feature vector according to the fusion vector, and calculating to obtain the classification probability of each word according to the classification feature vector; outputting the first K words as trigger words according to the classification probability, or setting a probability threshold according to the data distribution of the classification probability, and outputting the words with the classification probability larger than the probability threshold as the trigger words; and judging the event type of the trigger word.
10. The event detection method of a converged graph attention and graph convolution network of claim 9, wherein:
the calculation formula of the classification feature vector is as follows:
Figure FDA0003565574980000035
in the formula, hfinRepresenting the classification feature vector, g () representing a non-linear activation function, W' and b representing a weight matrix and an offset value, respectively;
the calculation formula of the classification probability is as follows:
yi=softmax(hfin) (12)
in the formula, yiRepresents the ith word wiSoftmax () represents a softmax function.
CN202210301353.XA 2022-03-25 2022-03-25 Event detection method integrating graph attention and graph convolution network Pending CN114647730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210301353.XA CN114647730A (en) 2022-03-25 2022-03-25 Event detection method integrating graph attention and graph convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210301353.XA CN114647730A (en) 2022-03-25 2022-03-25 Event detection method integrating graph attention and graph convolution network

Publications (1)

Publication Number Publication Date
CN114647730A true CN114647730A (en) 2022-06-21

Family

ID=81994647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210301353.XA Pending CN114647730A (en) 2022-03-25 2022-03-25 Event detection method integrating graph attention and graph convolution network

Country Status (1)

Country Link
CN (1) CN114647730A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115587597A (en) * 2022-11-23 2023-01-10 华南师范大学 Sentiment analysis method and device of aspect words based on clause-level relational graph
CN115774993A (en) * 2022-12-29 2023-03-10 广东南方网络信息科技有限公司 Conditional error identification method and device based on syntactic analysis

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115587597A (en) * 2022-11-23 2023-01-10 华南师范大学 Sentiment analysis method and device of aspect words based on clause-level relational graph
CN115587597B (en) * 2022-11-23 2023-03-24 华南师范大学 Sentiment analysis method and device of aspect words based on clause-level relational graph
CN115774993A (en) * 2022-12-29 2023-03-10 广东南方网络信息科技有限公司 Conditional error identification method and device based on syntactic analysis
CN115774993B (en) * 2022-12-29 2023-09-08 广东南方网络信息科技有限公司 Condition type error identification method and device based on syntactic analysis

Similar Documents

Publication Publication Date Title
CN112163416B (en) Event joint extraction method for merging syntactic and entity relation graph convolution network
CN111737496A (en) Power equipment fault knowledge map construction method
CN108595708A (en) A kind of exception information file classification method of knowledge based collection of illustrative plates
CN111401077B (en) Language model processing method and device and computer equipment
CN112231447B (en) Method and system for extracting Chinese document events
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN109684642B (en) Abstract extraction method combining page parsing rule and NLP text vectorization
CN108628828A (en) A kind of joint abstracting method of viewpoint and its holder based on from attention
CN113157859B (en) Event detection method based on upper concept information
CN114647730A (en) Event detection method integrating graph attention and graph convolution network
CN112541337B (en) Document template automatic generation method and system based on recurrent neural network language model
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN112733547A (en) Chinese question semantic understanding method by utilizing semantic dependency analysis
CN111091009B (en) Document association auditing method based on semantic analysis
CN115392254A (en) Interpretable cognitive prediction and discrimination method and system based on target task
Nejad et al. A combination of frequent pattern mining and graph traversal approaches for aspect elicitation in customer reviews
CN115455202A (en) Emergency event affair map construction method
CN114881043A (en) Deep learning model-based legal document semantic similarity evaluation method and system
CN113239694B (en) Argument role identification method based on argument phrase
Shahade et al. Multi-lingual opinion mining for social media discourses: an approach using deep learning based hybrid fine-tuned smith algorithm with adam optimizer
CN113378024A (en) Deep learning-based public inspection field-oriented related event identification method
CN116702753A (en) Text emotion analysis method based on graph attention network
CN114707508A (en) Event detection method based on multi-hop neighbor information fusion of graph structure
CN113901813A (en) Event extraction method based on topic features and implicit sentence structure

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination