CN116383387A - Combined event extraction method based on event logic - Google Patents

Combined event extraction method based on event logic

Info

Publication number
CN116383387A
Authority
CN
China
Prior art keywords
word
matrix
layer
event
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310363825.9A
Other languages
Chinese (zh)
Inventor
宋胜利
段欣荣
李靖阳
胡光能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310363825.9A priority Critical patent/CN116383387A/en
Publication of CN116383387A publication Critical patent/CN116383387A/en
Pending legal-status Critical Current

Classifications

    • G06F 16/35: Information retrieval of unstructured textual data; Clustering; Classification
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06F 18/253: Fusion techniques of extracted features
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30: Semantic analysis
    • G06N 3/0442: Recurrent networks characterised by memory or gating, e.g. LSTM or GRU
    • G06N 3/045: Combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a combined event extraction method based on event logic, comprising the following steps: inputting sentences into an event logic relation extraction model to obtain the event relation pairs in the sentences; inputting each event of the event relation pairs into a joint event extraction model based on a graph attention network to obtain the extraction result corresponding to the sentence, the extraction result comprising trigger word classification and argument classification. The method improves the accuracy of extracting multiple events from a sentence.

Description

Combined event extraction method based on event logic
Technical Field
The invention relates to the technical field of event extraction, in particular to a combined event extraction method based on event logic.
Background
With the rapid development of the Internet and text mining technologies, research on event-related tasks has attracted increasing attention. A text often contains multiple events, which may all be described around the same topic. Between these events there exist various kinds of event logic, such as temporal sequence, cause and effect, condition, and contrast; analyzing this event logic makes it possible to understand more deeply how the events in a text evolve and progress, and helps infer the relationships between events. Event extraction is an important task that extracts structured event information from unstructured data. It typically comprises four sub-tasks: trigger word recognition, event type detection, event argument recognition, and argument role detection. Research methods for sentence-level event extraction can be classified into pipeline-based and joint methods. The pipeline approach first identifies the event type and then extracts the event arguments; the joint approach avoids the influence of trigger word extraction errors on argument extraction by extracting trigger words and arguments jointly. Event extraction is useful in many fields; for example, storing extracted event information in a knowledge base can provide useful information for information retrieval and, in turn, knowledge reasoning.
Prior art schemes are as follows:
The patent application of the Institute of Automation, Chinese Academy of Sciences, "Event extraction method, event extraction device, electronic equipment and storage medium" (patent number: 202110827424.5), proposes an event extraction method comprising the following steps: a document to be extracted is input into an event extraction model, which comprises a sentence-level feature extraction layer, a document-level feature extraction layer, a feature decoding layer and an event prediction layer. The sentence-level feature extraction layer encodes each sentence in the document using a Transformer model to obtain the corresponding context feature vector and event element representation vectors; the document-level feature extraction layer then extracts features to obtain a document encoding vector and document event element representation vectors; the feature decoding layer derives a role relation representation vector, an event relation representation vector and an event-to-role relation representation vector; finally, the event prediction layer extracts multiple events, assigns event elements, and outputs the prediction result. The drawback of this method is that only sentence-sequence features are considered while syntactic features of the sentence are ignored, so the model struggles to capture the correlation of multiple events within one sentence, and different weight information is not assigned to different features.
The patent application "Causal relation extraction method, device, electronic equipment and readable storage medium" (patent number: 202210308591.3) of Beijing Ming Zhaohui Technology Co., Ltd. proposes an event causal relation extraction method comprising the following steps: a word segmentation operation is performed on the text to be extracted to obtain multiple unit words, and each unit word is tagged with its part of speech to obtain the corresponding part-of-speech identifier; a preset event rule set is acquired, and part-of-speech identifiers are combined with the unit words matched by the event sub-rules in the preset rule set to obtain multiple unit events; a trained rule model is then obtained, the unit events are input into it, and its output yields the causal relation extraction result for the text. The drawback of this approach is that the dependencies between words are not taken into account, nor is external lexical information used, so the semantics of the characters are not fully exploited. In addition, although manually constructed rules achieve high accuracy in specific fields, their portability is poor and their generalization weak, so the method cannot be applied widely to data from various fields.
The patent application of Shanxi University, "Chapter-level event extraction method and device based on a multi-granularity entity heterogeneous graph" (patent number: 202210348614.3), proposes a chapter-level event extraction method comprising the following steps: entity extraction is performed using sentence-based and paragraph-based context information respectively, and the entity sets of the two granularities are fused via a multi-granularity entity selection strategy, improving entity extraction precision; sentences are combined with the screened candidate entities to construct a heterogeneous graph that integrates multi-granularity entities, and a graph convolutional network is used to obtain chapter-level context-aware vectorized representations of entities and sentences, improving their perception of events; finally, multi-label classification of event types and event arguments is performed to realize event detection and argument identification. The drawbacks of this method are that dependency relationships between words are not constructed and attention weights are not computed for different features, so important text features do not contribute more to the output result.
Drawbacks of the prior art include:
1. In event extraction, only sentence-sequence features are considered while syntactic features of the sentence are ignored, so the model struggles to capture the correlation of multiple events within one sentence, and different weight information is not assigned to different features.
2. Dependencies between words are not considered and external lexical information is not used, so event boundaries are fuzzy and hard to determine. In addition, although manually constructed rules achieve high accuracy in specific fields, their portability is poor and generalization weak, so such methods cannot be applied widely to data from various fields.
3. Dependency relationships between words are not constructed and attention weights are not computed for different features, so important text features do not contribute more to the output result.
Disclosure of Invention
In view of the above, the present invention provides a combined event extraction method based on event logic to solve the above technical problems.
The invention discloses a combined event extraction method based on event logic, comprising the following steps:
inputting sentences into a sentence logical relation extraction model to obtain event relation pairs in the sentences;
inputting each event in the event relation pair in the sentence into a joint event extraction model based on a graph attention network to obtain an extraction result corresponding to the sentence; the extraction result comprises trigger word classification and argument classification;
the event logic relation extraction model comprises a coding layer, a feature extraction layer and an event relation identification layer;
Inputting sentences into the event logic relation extraction model to obtain the event relation pairs in the sentences comprises:
inputting sentences into a coding layer to obtain text feature matrixes corresponding to the sentences output by the coding layer;
inputting the text feature matrix into the feature extraction layer to obtain a global and local feature representation matrix output by the feature extraction layer;
and inputting the global and local feature representation matrixes into the event relation recognition layer to recognize event relation pairs in sentences.
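The two-stage flow described above can be sketched end-to-end as follows. This is a toy illustration only: the two stand-in functions are hypothetical placeholders for the trained relation extraction and joint extraction models, using a connective keyword and naive token splitting where the real models use neural networks.

```python
# Minimal sketch of the two-stage pipeline described above.
# Both functions are hypothetical stand-ins for the trained models.

def extract_relation_pairs(sentence):
    """Stand-in for the event logic relation extraction model."""
    # A real model runs coding layer -> feature extraction -> CRF here.
    if "because" in sentence:
        effect, cause = sentence.split("because", 1)
        return [(cause.strip(), effect.strip(), "causal")]
    return []

def joint_extract(event_text):
    """Stand-in for the GAT-based joint event extraction model."""
    # A real model classifies trigger words and arguments jointly.
    tokens = event_text.split()
    return {"trigger": tokens[0] if tokens else None, "arguments": tokens[1:]}

def pipeline(sentence):
    results = []
    for e1, e2, rel in extract_relation_pairs(sentence):
        results.append({"relation": rel,
                        "events": [joint_extract(e1), joint_extract(e2)]})
    return results

print(pipeline("the match was cancelled because heavy rain fell"))
```

The point of the sketch is only the data flow: stage one yields event relation pairs, and each event of a pair is then passed through the joint extractor.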
Further, inputting the sentence into the coding layer to obtain the text feature matrix corresponding to the sentence output by the coding layer comprises:
inputting the sentence into an embedding layer within the coding layer to convert each word in the sentence into a word vector, and generating a word vector representation matrix after the word vectors are encoded by a BERT model;
introducing an external dictionary using the SoftLexicon method, matching characters in the sentence against the dictionary to obtain the words corresponding to each character, and placing the words into four word sets according to the position of the character within the word: B, M, E, S; these word sets indicate respectively that the character appears at the beginning, middle, or end of a word, or constitutes a word on its own;
after the four word sets of each character in the sentence are obtained, expressing each word set as a fixed-length vector, using word frequency as the weight coefficient of each word, and computing a weighted sum of the word-vector embeddings of all words in each set to obtain the vector of each of the character's word sets;
splicing the vectors of the four word sets corresponding to a character onto the BERT word vector of that character to obtain a new word vector representation matrix X_1;
assigning different weights to the trigger word features of the events, the sequence features of the events, and the relational connective features and fusing them to obtain a multi-dimensional feature matrix X_2;
splicing X_2 and X_1 to obtain the final text feature matrix X = X_1 ⊕ X_2.
Further, inputting the text feature matrix into the feature extraction layer to obtain the global and local feature representation matrix output by the feature extraction layer comprises:
inputting the text feature matrix into the convolution layers of the feature extraction layer to obtain the final feature representation D_CNN of the multi-layer convolution, D_CNN ∈ R^{n×m}, where each row represents the vocabulary-level features extracted for one word by the multi-layer convolution, m is the number of convolution kernels, and n is the number of words in the sentence;
performing a max-pooling operation over all words to obtain a matrix P = [p_1, p_2, ..., p_n], where p_i is the vector obtained after max-pooling the i-th word;
inputting D_CNN into a self-attention layer of the feature extraction layer to obtain the vocabulary-level features D'_CNN, D'_CNN ∈ R^{n×m};
inputting the text feature matrix into the bidirectional gated recurrent unit (BiGRU) of the feature extraction layer to obtain the output matrix H_GRU; the BiGRU consists of a forward GRU and a backward GRU, and with the number of hidden units set to s, H_GRU ∈ R^{n×(2×s)}; each row of H_GRU represents the sentence-level features extracted for one word by the BiGRU;
inputting the matrix H_GRU into another self-attention layer of the feature extraction layer to obtain the sentence-level features H'_GRU;
inputting H'_GRU and D'_CNN into the global attention mechanism layer of the feature extraction layer to obtain the output feature matrix G;
splicing the matrix P and the matrix H_l onto the output matrix G of the global attention layer and outputting the global and local feature representation matrix Z = G ⊕ P ⊕ H_l, where H_l is the output matrix of the last hidden layer of the BiGRU layer.
Further, the event relationship identification layer adopts a conditional random field (CRF) model.
Let a tag sequence output by the CRF be L = [l_1, l_2, ..., l_n]. The total score of a tag sequence L for input Z is:

score(Z, L) = Σ_{i=0}^{n} A_{l_i, l_{i+1}} + Σ_{i=1}^{n} P_{i, l_i}

where A is the transition score matrix, A_{l_i, l_{i+1}} is the transition score from label l_i to label l_{i+1}, and P_{i, l_i} is the score of the i-th character under label l_i.
The objective function of the model maximizes the log-likelihood of the correct tag sequence L*:

log P(L* | Z) = score(Z, L*) − log Σ_{L'} exp(score(Z, L'))

The loss function of the model is defined as loss = −log P(L* | Z), and the parameters are optimized by back-propagation.
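The CRF scoring and loss above can be sketched in NumPy as follows. The transition and emission matrices are toy values (the real model learns A and the emission scores), and the partition function is computed by brute force over all tag sequences rather than by the forward algorithm, which is fine only for tiny inputs.

```python
import numpy as np
from itertools import product

def crf_score(A, P, labels):
    """Total score of a tag sequence: transition scores A[l_i, l_{i+1}]
    plus emission scores P[i, l_i]."""
    s = sum(A[labels[i], labels[i + 1]] for i in range(len(labels) - 1))
    s += sum(P[i, labels[i]] for i in range(len(labels)))
    return s

def crf_neg_log_likelihood(A, P, gold):
    """loss = -log P(L* | Z): gold score minus log-sum-exp over all
    sequences (brute-force enumeration; a real CRF uses the forward pass)."""
    n, k = P.shape
    scores = np.array([crf_score(A, P, seq)
                       for seq in product(range(k), repeat=n)])
    log_z = np.log(np.exp(scores).sum())
    return -(crf_score(A, P, gold) - log_z)

A = np.array([[0.5, -0.2], [0.1, 0.3]])              # toy transition matrix
P = np.array([[1.0, 0.1], [0.2, 0.9], [0.8, 0.0]])   # toy emission scores
loss = crf_neg_log_likelihood(A, P, [0, 1, 0])
print(round(float(loss), 4))
```

Because the loss subtracts the gold score from the log-partition over all sequences, it is always positive unless the gold sequence absorbs all probability mass.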
Further, putting all the events in the sentence's event relation pairs into a set to form a text set, and inputting the text set into the joint event extraction model based on a graph attention network to obtain the extraction result corresponding to the sentence, comprises:
splicing the word vector representation matrix X_1, the part-of-speech embedding matrix X_2 and the entity class embedding matrix X_3 together to obtain a text feature matrix X;
inputting the text feature matrix X into a bidirectional long short-term memory network (Bi-LSTM) to obtain the output matrix H_LSTM;
performing dependency syntactic analysis on the sentence using DDParser to obtain a syntactic dependency graph, and expanding the syntactic dependency graph;
taking the feature nodes and relation edges of the syntactic dependency graph as the m-th layer input of an N-order graph attention network, which performs an aggregation calculation on the feature v_i of each node in the graph to obtain the aggregated feature v'_i; finally obtaining the output set V' of the graph attention network layer, the number of nodes in V' being n+k+m;
jointly extracting trigger words and arguments with the trigger word and argument identification layer of the joint event extraction model, performing the multi-class task with the BIO labeling method: the output matrix O of the previous layer is input into a fully connected layer, a matrix O' is obtained after an activation function, and a softmax layer then normalizes the vectors of all types, realizing event trigger word classification;
after the candidate trigger words are obtained, performing argument classification on the entity list in the sentence using the output matrix O': the word vectors contained in a trigger word are average-pooled to obtain the vector representation T_i of the candidate trigger word, then T_i is spliced with the vector E_j of each other word and input into a fully connected network followed by a softmax layer to realize argument classification.
Further, before the word vector representation matrix X_1, the part-of-speech embedding matrix X_2 and the entity class embedding matrix X_3 can be spliced together to obtain the text feature matrix X, the method further comprises:
generating the word vector representation matrix X_1 through ERNIE model encoding in the joint event extraction model based on the graph attention network;
the joint event extraction model performing word segmentation and part-of-speech tagging on the event text in the input sentences to finally obtain the part-of-speech embedding matrix X_2 corresponding to sentence S;
performing entity class tagging on the text according to the BIO tagging rules, then randomly initializing and optimizing by back-propagation to obtain trained entity class vectors, obtaining the entity class embedded representation corresponding to each word, and finally obtaining the entity class embedding matrix X_3 corresponding to sentence S.
Further, the joint event extraction model performing word segmentation and part-of-speech tagging on the event text to obtain the part-of-speech embedding matrix X_2 comprises:
the model performs word segmentation and part-of-speech tagging on the event text in the input sentences, then labels each character according to the BIO tagging rules, the tags comprising B-pos, I-pos and E-pos, with single-character words represented by S-pos, where pos is the part of speech of the word; the tags are then randomly initialized and optimized through back-propagation to obtain trained part-of-speech vectors, giving the part-of-speech embedded representation corresponding to each tag, and finally the part-of-speech embedding matrix X_2 corresponding to sentence S is obtained.
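The B-pos/I-pos/E-pos/S-pos labeling scheme above can be sketched as follows, assuming word segmentation and part-of-speech tags are already available (for example, from a Chinese segmenter). The helper function and its input format are illustrative, not the patent's exact interface.

```python
def bio_pos_labels(segmented):
    """Per-character position+POS labels: B-/I-/E-pos for multi-character
    words, S-pos for single-character words, as in the BIO tagging rule."""
    labels = []
    for word, pos in segmented:
        if len(word) == 1:
            labels.append(f"S-{pos}")
        else:
            labels.append(f"B-{pos}")
            labels.extend(f"I-{pos}" for _ in word[1:-1])
            labels.append(f"E-{pos}")
    return labels

# toy segmentation: [(word, part_of_speech), ...]
print(bio_pos_labels([("北京", "ns"), ("下", "v"), ("大雨", "n")]))
```

Each label would then be mapped to a randomly initialized vector and trained by back-propagation, yielding the part-of-speech embedding matrix X_2.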
Further, the syntax dependency graph is expanded as follows:
define the shortest path between any two word vector nodes v_i, v_j as p_ij; the edge between any two adjacent word vector nodes on the path is (w_m, w_{m+1}), where w_i denotes the i-th word vector node;
a BiGRU network fuses the features of all nodes on the shortest path between two word vector nodes: the forward and backward GRU outputs at time t, h_t^→ and h_t^←, are spliced to obtain the fused feature vector h_t = [h_t^→ ; h_t^←], i.e. the output of the BiGRU at time t; the fused node is then taken as a surrounding node of each of the two endpoint nodes;
finally, the extended syntax dependency graph G = (V, E) is obtained, where V is the set of nodes comprising three subsets V_c, V_w and V_b: V_c is the set of n character vector nodes (n is the sentence length), V_w is the set of k word vector nodes after word segmentation, and V_b is the set of surrounding nodes of each word vector node computed by the shortest path algorithm, of size m.
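The shortest-path step used when expanding the dependency graph can be sketched with plain breadth-first search on an unweighted, undirected adjacency list; the node features and the BiGRU fusion along the path are omitted here.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """BFS shortest path between two word nodes in the dependency graph;
    returns the node list src..dst, or None if the nodes are disconnected."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in prev:
                prev[v] = u
                q.append(v)
    return None

# toy dependency graph: edges 0-1, 1-2, 1-3, 3-4
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
print(shortest_path(adj, 0, 4))  # [0, 1, 3, 4]
```

BFS suffices because dependency edges are unweighted; the nodes on the returned path are what the BiGRU would fuse into a surrounding node.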
Further, the graph attention network performs an aggregation calculation on the feature v_i of each node in the syntactic dependency graph to obtain the aggregated feature v'_i, computed as shown in the following formula:

v'_i = σ( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_{ij}^k W_k v_j )

where K is the number of attention heads, W_k is the weight matrix of the k-th attention head with respect to the node, α_{ij}^k is the attention weight coefficient of the k-th head, N_i is the set of all neighbor nodes v_j of node v_i in the syntactic dependency graph, and σ is a nonlinear activation function.
Through this calculation, the output set V' of the graph attention network layer is obtained; the number of nodes in V' is n+k+m. Since the k word vector nodes and the m surrounding nodes need not be classified in the subsequent classification process, they are discarded, leaving only the first n character nodes, which are converted into a matrix representation O.
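The multi-head aggregation can be sketched in NumPy as below: one GAT layer that averages over heads, with softmax attention over each node's neighbors. Dimensions, random initialization, and the concatenation-based attention scoring are illustrative assumptions, not the patent's exact parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(V, adj, W_list, a_list):
    """One GAT aggregation: v'_i = tanh(mean_k sum_{j in N_i} a_ij^k W_k v_j).
    V: (n, d) node features; adj: neighbor lists; W_list/a_list: per-head params."""
    n, _ = V.shape
    out = np.zeros((n, W_list[0].shape[0]))
    for W, a in zip(W_list, a_list):
        H = V @ W.T                               # project node features
        for i in range(n):
            nbrs = adj[i]
            # attention logits from concatenated projected features
            logits = np.array([a @ np.concatenate([H[i], H[j]]) for j in nbrs])
            alpha = softmax(logits)
            out[i] += sum(al * H[j] for al, j in zip(alpha, nbrs))
    out /= len(W_list)                            # average over the K heads
    return np.tanh(out)                           # nonlinear activation sigma

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 3))
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
K, d_out = 2, 5
W_list = [rng.normal(size=(d_out, 3)) for _ in range(K)]
a_list = [rng.normal(size=(2 * d_out,)) for _ in range(K)]
print(gat_layer(V, adj, W_list, a_list).shape)  # (4, 5)
```

In the model this layer runs over the expanded dependency graph; only the first n character-node rows of the result would be kept as the matrix O.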
Further, event trigger word classification is realized by the following formulas:

O' = tanh(W_O O + b_O)
ŷ_i^T = softmax(W_T O'_i + b_T)

where ŷ_i^T is the trigger word type probability distribution of the i-th entity, W_T ∈ R^{n_T×n_c} is the parameter matrix for event trigger word classification, n_T is the number of event types, and n_c is the vector dimension.
Argument classification is realized by the following formula:

ŷ_{ij}^A = softmax(W_A [T_i ; E_j] + b_A)

where ŷ_{ij}^A is the probability distribution of the role played by the j-th entity in the event triggered by the i-th candidate trigger word, W_A is the parameter matrix of event argument classification, and n_A is the number of argument types.
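The two softmax heads can be sketched in NumPy with toy dimensions; here T_i stands for the mean-pooled trigger vector and E_j for an entity word vector, as described above, while the parameter values are random stand-ins.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def trigger_probs(o_i, W_T, b_T):
    """Trigger-type distribution for one token: softmax(W_T @ o'_i + b_T)."""
    return softmax(W_T @ o_i + b_T)

def argument_probs(t_i, e_j, W_A, b_A):
    """Role distribution for entity j w.r.t. candidate trigger i:
    softmax(W_A @ [T_i ; E_j] + b_A)."""
    return softmax(W_A @ np.concatenate([t_i, e_j]) + b_A)

rng = np.random.default_rng(1)
n_c, n_T, n_A = 6, 4, 3            # toy dims: feature, event types, role types
o_i = rng.normal(size=n_c)
p_trig = trigger_probs(o_i, rng.normal(size=(n_T, n_c)), np.zeros(n_T))
t_i = rng.normal(size=n_c)         # mean-pooled trigger word vectors
e_j = rng.normal(size=n_c)
p_arg = argument_probs(t_i, e_j, rng.normal(size=(n_A, 2 * n_c)), np.zeros(n_A))
print(p_trig.shape, p_arg.shape)
```

Both outputs are proper probability distributions, so argmax over them yields the predicted event type and argument role respectively.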
Owing to the above technical scheme, the invention has the following advantages. Aiming at the problem that extracted features lack syntactic information, the invention inputs the dependency syntactic analysis result into a graph attention network so that syntactic structure features can be learned. Aiming at the problem that event boundaries are fuzzy and hard to determine, the invention introduces external lexical information and multiple feature vector representations, considering features at two different levels, namely vocabulary-level and sentence-level features, which resolves fuzzy event boundaries and incomplete feature selection. Aiming at the problem that important text features go unused, the invention uses the graph attention network to aggregate features over the syntactic dependency graph in event extraction, uses multi-head attention mechanisms to construct dependency relationships between words in event logic relation extraction, and computes different attention weights for different features, improving the accuracy of extracting multiple events in a sentence.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention; those skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a framework diagram of the event logic relation extraction model based on multi-feature fusion in an embodiment of the invention;
FIG. 2 is a block diagram of a BiGRU according to an embodiment of the invention;
FIG. 3 is a diagram of a federated event extraction model framework based on a graph attention network in accordance with an embodiment of the present invention;
FIG. 4 is a diagram of a syntax dependency graph in accordance with an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and embodiments. Apparently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present invention.
Referring to FIG. 1, the present invention provides an embodiment of a combined event extraction method based on event logic, which mainly includes two major modules: an event logic relation extraction model and a joint event extraction model. First, the event logic relation extraction model based on multi-feature fusion is divided into three layers: a coding layer, a feature extraction layer and a relation recognition layer; its framework diagram is shown in FIG. 1.
1. Constructing text feature vectors.
The invention adopts BERT as the encoder of the model; to further extract semantic information and abstract features from text, the effect of the whole model is improved through fine-tuning. Given a sentence S, the embedding layer converts each word into a d-dimensional word vector, and after BERT model encoding a word vector representation matrix X_1 is finally generated.
Next, the invention uses a softlecicon method to introduce external word information, matches characters in sentences with a dictionary to obtain a plurality of words corresponding to the characters, and respectively puts the words into four word sets according to the positions of the characters in the words: B. m, E, S. These word sets represent the position of the character at the beginning, middle, end, and alone constitute a word, respectively. In the event that no word is found for the corresponding set of words after matching the dictionary, the corresponding set of words is populated with None. The contents of the four word sets are shown in formula (1).
Figure BDA0004165975260000111
Wherein L represents an external dictionary, c i Representing the i-th character in the sentence.
After four word sets of each character in a sentence are obtained, each word set is expressed as a vector with a fixed length, in the calculation process, word frequency is used as a weight coefficient of each word, word vectors of all words in each set are embedded for weighted calculation, and the vector calculation process of the i-th character set S is shown as a formula (2).
Figure BDA0004165975260000112
Wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA0004165975260000113
a vector representation representing the generated set S, Z representing the sum of the frequencies of occurrence in the dictionary of all words in the four word set, Z (w) representing the frequency of occurrence in the dictionary of word w, x w Word vectors representing words w in set S are embedded.
Finally, the vectors of the four word sets corresponding to a character are concatenated onto the BERT word vector of that character, yielding a new X_1 matrix.
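The frequency-weighted set embedding of formula (2) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function name, the '<NONE>' placeholder convention and the toy dictionary in the usage note are illustrative, not part of the invention.

```python
import numpy as np

def soft_lexicon_features(sets, freq, embed, dim):
    """Weighted embedding of the four word sets (B/M/E/S) of one character.

    sets:  dict like {'B': [...], 'M': [...], 'E': [...], 'S': [...]}
    freq:  dict word -> dictionary frequency z(w)
    embed: dict word -> np.ndarray word vector of length `dim`
    Returns the concatenated 4*dim vector that is appended to the BERT
    character vector.
    """
    # Z: total frequency of all words appearing in the four sets of this character
    Z = sum(freq.get(w, 1) for s in sets.values() for w in s if w != '<NONE>')
    vecs = []
    for key in ('B', 'M', 'E', 'S'):
        v = np.zeros(dim)
        for w in sets[key]:
            if w == '<NONE>':          # empty set placeholder contributes nothing
                continue
            v += freq.get(w, 1) * embed[w]   # z(w) * x_w
        if Z > 0:
            v = (4.0 / Z) * v                 # formula (2)
        vecs.append(v)
    return np.concatenate(vecs)
```

For example, with a two-word toy dictionary the B-set and S-set vectors are weighted by each word's frequency and normalized by the shared Z.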
The present invention then builds vector representations of three features related to the event-logic relationship: trigger-word features of events, sequence features of events, and relational connective features. The three multi-dimensional feature vectors are assigned different weights and fused to obtain the matrix X_2; the calculation is shown in formula (3).
X_2 = α F_trig ⊕ β F_seq ⊕ γ F_conn    (3)
where α, β, γ are the weights assigned to the individual features, F_trig, F_seq, F_conn denote the trigger-word, event-sequence and relational-connective feature matrices respectively, and ⊕ denotes matrix concatenation.
After the fused multi-dimensional feature matrix X_2 is obtained, it is concatenated with the X_1 matrix to obtain the final text feature matrix X = X_1 ⊕ X_2.
2. Vocabulary level and sentence level feature extraction.
The input of the convolution layer is the text feature matrix X. The convolution operation is a product between the convolution kernel and the input matrix: a convolution kernel W ∈ R^{w×n} with window size w slides over the input matrix X to obtain the local context feature d_i of each word x_i. If there are no other words before or after a word, the sentence is zero-padded.
Let x_{i:i+j} denote the concatenation of the vectors x_i, x_{i+1}, ..., x_{i+j}. The local context feature d_i of each word x_i is computed as shown in formula (4).
d i =σ(W*x i-w/2:i+w/2 +b) (4)
Wherein σ (·) is a nonlinear activation function, x i-w/2:i+w/2 The input vector representations in the range of i-w/2 to i+w/2 in the word window w, b being the bias parameter.
For the input matrix X, after passing through the convolution network, the output feature vector d is shown as formula (5), where n represents the length of the sentence.
d=[d 1 ,...,d i ,...,d n ] (5)
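Formulas (4) and (5) can be sketched for a single kernel as follows. This is a minimal NumPy sketch under stated assumptions: tanh stands in for the unspecified activation σ, a single kernel is shown (a real layer has m kernels), and the function name is illustrative.

```python
import numpy as np

def conv_local_features(X, W, b, w=3):
    """Local context features d = [d_1, ..., d_n] per formulas (4)-(5).

    X: (n, d) word-vector matrix; W: (w, d) single convolution kernel;
    b: scalar bias. Zero padding keeps a full window at sentence edges.
    """
    n, d = X.shape
    pad = w // 2
    Xp = np.vstack([np.zeros((pad, d)), X, np.zeros((pad, d))])  # zero padding
    out = []
    for i in range(n):
        window = Xp[i:i + w]                         # x_{i-w/2 : i+w/2}
        out.append(np.tanh(np.sum(window * W) + b))  # formula (4), sigma = tanh
    return np.array(out)                             # formula (5)
```

With m kernels, stacking the m outputs per word gives the n × m matrix D_CNN described below.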
To capture local features of different granularities, the invention uses a two-layer CNN structure to extract the important information in the input sequence. The first CNN layer has a convolution kernel of size 1, which reduces the dimension of the output features; the second layer consists of two CNNs with kernel sizes 2 and 3 respectively. The output of the first layer is fed into both CNNs of the second layer, yielding more abstract features. Finally, the outputs of the two CNNs are concatenated to obtain the final feature representation D_CNN of the multi-layer convolution layer.
Letting the number of convolution kernels be m, D_CNN ∈ R^{n×m}. Each row of D_CNN represents the vocabulary-level features extracted by multi-layer convolution for one word.
A max-pooling operation is applied to each word's multi-layer-convolution representation d_i to reduce its dimension, yielding the vector p_i. Max-pooling over all words then gives the matrix P, as shown in formula (6).
P=[p 1 ,p 2 ,...,p n ] (6)
Finally, the multi-layer convolution layer has two output representations: matrix D_CNN serves as input to the subsequent self-attention layer, and matrix P is concatenated onto the output matrix of the global attention layer as input to the event-relationship recognition layer.
The self-attention mechanism assigns different weights to words according to their importance and considers the relations among all words globally, finally yielding the vocabulary-level features D'_CNN, D'_CNN ∈ R^{n×m}.
Because the traditional recurrent neural network (RNN) suffers from vanishing and exploding gradients when processing long sentences, the invention introduces the gated recurrent unit (GRU), which better mitigates the vanishing-gradient problem and captures long-term dependencies.
To obtain the output at time t, an operation is applied on top of the current hidden state h_t, yielding y_t, as shown in formula (7).
y t =σ(W y h t ) (7)
Since the states in a unidirectional GRU are computed in front-to-back order, without considering the influence of the following context on the preceding one, the invention feeds the input matrix X into a bidirectional gated recurrent unit (Bi-directional Gated Recurrent Unit, BiGRU) in order to extract the contextual semantic features of the text. The BiGRU consists of a forward GRU and a backward GRU; its principle is shown in FIG. 2. The output matrix H_GRU is expressed as shown in formula (8).
H_GRU,t = →h_t ⊕ ←h_t    (8)
where →h_t represents the output of the forward GRU and ←h_t represents the output of the backward GRU.
Finally, the BiGRU layer has two output representations: the matrix H_GRU serves as input to the subsequent self-attention layer, and the output matrix H_l of the last one-dimensional hidden layer is concatenated onto the output matrix of the global attention layer as input to the event-relationship recognition layer. H_GRU is also fed into the self-attention layer, which computes the sentence-level features H'_GRU.
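The forward/backward pass and per-step concatenation of formula (8) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: biases are omitted, the [h, x] gate layout is one common convention, and all names are illustrative rather than the invention's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Wr, Wh):
    # Minimal GRU cell: update gate z, reset gate r, candidate state h~.
    hx = np.concatenate([h, x])
    z = sigmoid(Wz @ hx)
    r = sigmoid(Wr @ hx)
    h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))
    return (1 - z) * h + z * h_tilde

def bigru(X, params_f, params_b, hidden):
    # Run a forward GRU and a backward GRU over the sequence X and
    # concatenate their outputs at each step, per formula (8).
    n = len(X)
    hf, hb = np.zeros(hidden), np.zeros(hidden)
    fwd, bwd = [], [None] * n
    for t in range(n):
        hf = gru_step(X[t], hf, *params_f)
        fwd.append(hf)
    for t in reversed(range(n)):
        hb = gru_step(X[t], hb, *params_b)
        bwd[t] = hb
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])
```

With s hidden units per direction the output has shape (n, 2s), matching H_GRU ∈ R^{n×(2×s)} in the claims.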
The results of the left and right channels are fed into a global attention mechanism layer for further processing to obtain the final representation features. The attention weights α_{i,j} between the two matrices are used to weight the feature matrix H'_GRU output by the BiGRU channel, giving the output feature matrix G of the global attention layer, as shown in formula (9).
g_i = Σ_{j=1}^{n} α_{i,j} h'_j,  G = [g_1, g_2, …, g_n]    (9)
where h'_j is the j-th row of H'_GRU.
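A hedged sketch of this cross-channel weighting follows. The dot-product score between the CNN-channel and BiGRU-channel features is an assumption (the patent does not state the score function); the softmax-normalized weights α_{i,j} are then applied to the BiGRU-channel features.

```python
import numpy as np

def global_attention(D, H):
    """D: (n, m) CNN-channel features; H: (n, m) BiGRU-channel features.

    Computes attention weights alpha[i, j] between the two matrices via
    dot-product scores (assumed), then returns the weighted sum G: (n, m).
    """
    scores = D @ H.T                                  # (n, n) pairwise scores
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)         # rows sum to 1
    return alpha @ H                                  # G per formula (9)
```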
Finally, matrix P and matrix H_l are concatenated onto the output matrix of the global attention layer to obtain the input matrix of the event-relationship recognition layer, Z = G ⊕ P ⊕ H_l.
3. Event-logic relationship recognition layer.
The conditional random field (CRF) model uses a transition matrix to take the correlations and constraints between tags into account, thereby obtaining a globally optimal tag sequence. For an input sentence, the feature extraction layer described above yields the global and local feature representation matrix Z = [z_1, z_2, ..., z_n]. Let a tag sequence output by the CRF be L = [l_1, l_2, ..., l_n]; the total score of a tag sequence L is calculated as shown in formula (10).
S(Z, L) = Σ_{i=0}^{n} A_{l_i, l_{i+1}} + Σ_{i=1}^{n} P_{i, l_i}    (10)
where A is the transition score matrix, A_{l_i, l_{i+1}} is the transition probability from label l_i to label l_{i+1}, and P_{i, l_i} is the score of the i-th character under label l_i.
The goal of the CRF optimization is to make the correct tag sequence L* account for as large a share of all candidate tag sequences as possible, i.e. to maximize P(L* | Z). The objective function of the model is computed as shown in formula (11).
P(L* | Z) = exp(S(Z, L*)) / Σ_{L'} exp(S(Z, L'))    (11)
Finally, the loss function of the model is defined as loss = −log P(L* | Z), and the parameters are optimized by back-propagation.
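The CRF score and negative log-likelihood can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: start/stop transitions are omitted, and the partition function is computed by brute-force enumeration (fine for tiny examples; a real implementation uses the forward algorithm).

```python
import numpy as np
from itertools import product

def crf_score(P, A, labels):
    """S(Z, L) per formula (10): emission scores P[i, l_i] plus
    transition scores A[l_i, l_{i+1}] (no start/stop states here)."""
    s = sum(P[i, l] for i, l in enumerate(labels))
    s += sum(A[labels[i], labels[i + 1]] for i in range(len(labels) - 1))
    return s

def crf_neg_log_likelihood(P, A, gold):
    """loss = -log P(L* | Z), with the partition function computed by
    enumerating every label sequence (exponential; demo only)."""
    n, k = P.shape
    logZ = np.log(sum(np.exp(crf_score(P, A, seq))
                      for seq in product(range(k), repeat=n)))
    return logZ - crf_score(P, A, gold)
```

With all scores zero, every one of the k^n sequences is equally likely, so the loss reduces to log(k^n).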
After the event-relation pairs in sentences are identified, the invention constructs a joint event extraction model based on a graph attention network to extract each event; its framework is shown in FIG. 3.
1) Text feature representation and extraction.
The invention uses the ERNIE pre-training model to encode sentences. This model is trained on Chinese corpora such as Baidu Baike, Baidu Tieba and Baidu News and fuses a variety of external knowledge, so it performs better on Chinese NLP tasks. ERNIE encoding generates the word vector representation matrix X_1.
To obtain richer semantic information about Chinese words, the invention performs word segmentation and part-of-speech tagging on the input text, then labels the part of speech of each character according to the BIO tagging rules; the tags include B-pos, I-pos and E-pos, words consisting of a single character are represented by S-pos, and pos refers to the part of speech of the word. The tags are randomly initialized and optimized through back-propagation to obtain trained part-of-speech vectors, giving a part-of-speech embedding for each part of speech and, for sentence S, the part-of-speech embedding matrix X_2.
In addition, the invention labels the entity categories in the text according to the BIO tagging rules, then randomly initializes and optimizes them through back-propagation to obtain trained entity-category vectors, giving an entity-category embedding for each word and, for sentence S, the entity-category embedding matrix X_3.
Finally, the three types of embedding are spliced together, and the text feature matrix X is obtained.
The invention uses the bidirectional long short-term memory network (Bi-LSTM) model to capture both front-to-back and back-to-front information in sentences. For a unidirectional LSTM, the calculation is shown in formulas (12) to (14).
i_t = σ(W_i·[h_{t−1}, x_t] + b_i),  f_t = σ(W_f·[h_{t−1}, x_t] + b_f),  o_t = σ(W_o·[h_{t−1}, x_t] + b_o)    (12)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_c·[h_{t−1}, x_t] + b_c)    (13)
h_t = o_t ⊙ tanh(c_t)    (14)
where W are the parameter matrices to be trained, b are the bias vectors, σ denotes the sigmoid function, and ⊙ denotes element-wise multiplication between vectors.
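The gate equations (12)-(14) can be sketched in a single step function. This is a minimal NumPy sketch under stated assumptions: the four gate weight matrices are packed into one matrix W, and the [h, x] concatenation layout is one common convention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W: (4*hidden, hidden+input), b: (4*hidden,)
    packing the input/forget/output gates and the candidate state."""
    hidden = h.shape[0]
    z = W @ np.concatenate([h, x]) + b
    i = sigmoid(z[:hidden])                 # input gate,  formula (12)
    f = sigmoid(z[hidden:2 * hidden])       # forget gate, formula (12)
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate, formula (12)
    g = np.tanh(z[3 * hidden:])             # candidate cell state
    c_new = f * c + i * g                   # formula (13)
    h_new = o * np.tanh(c_new)              # formula (14)
    return h_new, c_new
```

With zero weights every gate outputs 0.5 and the candidate is 0, so the cell state simply halves each step, which makes the update easy to check by hand.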
The output of the Bi-LSTM at time t is expressed as H_t = →h_t ⊕ ←h_t, where →h_t represents the output of the forward LSTM and ←h_t represents the output of the backward LSTM.
2) Construction of a syntactic dependency graph.
First, dependency syntax analysis is performed on the sentence using DDParser to obtain a syntactic dependency tree. The syntactic dependency tree is defined as an undirected graph G = (V, E), where V is the set of nodes, comprising two subsets V_c and V_w: V_c is the set of n characters, n being the sentence length, and V_w is the set of k words after word segmentation, each word represented by a pre-trained word vector of the same dimension as the character vectors.
For the edge set E, given two words with a grammatical relation in the analysis result, w_{i,j} = S(c_i, ..., c_j) and w_{u,v} = S(c_u, ..., c_v), the invention establishes edge relations between the first and last character vectors of the two words. For each grammar-relation edge, an opposite (reverse) edge connected to the other word is also added; adjacent edges are added between neighboring characters, and a self-loop edge is added to every node.
For example, for a sample sentence that has undergone dependency syntax analysis, a partial construction result of its syntactic dependency graph is shown in FIG. 4. There is a VOB relation between "defeat" and "Medvedev"; the 5 edges connected to one of its characters are the relation edge between the character and the word "defeat", the reverse relation edge, the self-loop edge, and two edges connecting the adjacent characters.
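The edge construction described above can be sketched as follows. This is a hedged sketch under stated assumptions: words are given as character spans, the pairing of first-with-first and last-with-last characters of related words is an illustrative reading of the construction, and all names are hypothetical.

```python
def build_dependency_graph(n_chars, words, dep_edges):
    """Build the undirected edge set over character-node indices.

    n_chars:   sentence length n
    words:     list of (start, end) character spans of segmented words
    dep_edges: list of (word_idx_a, word_idx_b) grammatical relations
    """
    edges = set()
    # self-loop edges on all character nodes
    for i in range(n_chars):
        edges.add((i, i))
    # adjacency edges between neighboring characters (both directions)
    for i in range(n_chars - 1):
        edges.add((i, i + 1))
        edges.add((i + 1, i))
    # grammar-relation edges between first/last characters of related
    # words, plus the opposite (reverse) edges
    for a, b in dep_edges:
        (s1, e1), (s2, e2) = words[a], words[b]
        for u, v in ((s1, s2), (e1, e2)):
            edges.add((u, v))
            edges.add((v, u))
    return edges
```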
3) Graph attention network algorithm.
Define the shortest path between any two word-vector nodes v_i, v_j connected by a path as p_ij, and define the edge between any two adjacent word-vector nodes as (w_m, w_{m+1}); then p_ij is calculated as shown in formula (15).
p ij =[(v i ,w 1 ),(w 1 ,w 2 ),...,(w n ,v j )] (15)
where w_i refers to a word-vector node.
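The shortest path of formula (15) over the unweighted dependency graph can be found with a plain breadth-first search. This is a minimal sketch; the adjacency-dict representation and function name are illustrative.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """BFS shortest path on an unweighted graph {node: set(neighbors)}.
    Returns the edge list [(v_i, w_1), (w_1, w_2), ..., (w_n, v_j)]
    as in formula (15), or None if no path exists."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj.get(u, ()):
            if v not in prev:
                prev[v] = u
                q.append(v)
    if dst not in prev:
        return None
    path = []
    while prev[dst] is not None:          # walk back to the source
        path.append((prev[dst], dst))
        dst = prev[dst]
    return path[::-1]
```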
After the shortest path between two nodes is obtained, the invention again uses a BiGRU network to fuse the features of all nodes on the path; the outputs of the forward and backward GRUs are →h_t and ←h_t respectively. The two vectors are concatenated to obtain the fused feature vector h = →h_t ⊕ ←h_t, i.e. the output of the BiGRU at time t. The vector h fuses the features of all other nodes on the shortest path between the two nodes and serves as a surrounding node for each of the two nodes.
At this point, through the above calculation, the original syntactic dependency graph G = (V, E) is extended: V now contains three subsets V_c, V_w and V_b, where V_c is the set of n character-vector nodes (n is the sentence length), V_w is the set of k word-vector nodes after word segmentation, and V_b is the set of surrounding nodes of each word-vector node computed by the shortest-path algorithm, of size m. The full node set V of the syntactic dependency graph, summarized from the three subsets, is shown in formula (16).
V={v 1 ,v 2 ,...,v n ,v n+1 ,...,v n+k ,v n+k+1 ,...,v n+k+m } (16)
The feature nodes and relation edges of the syntactic dependency graph are fed as input to each layer of an N-layer graph attention network, which performs an aggregation calculation on the input feature v_i of each node in the graph to obtain the aggregated feature v'_i; the process is shown in formula (17).
v'_i = ∥_{k=1}^{K} σ( Σ_{j∈N_i} α^k_{ij} W^k v_j )    (17)
where K is the number of attention heads, W^k is the weight matrix of the k-th attention head with respect to the nodes, α^k_{ij} is the attention weight coefficient computed by the k-th head, N_i is the set of all neighbor nodes v_j of node v_i in the syntactic dependency graph, ∥ denotes concatenation over the heads, and σ is a nonlinear activation function.
Through the above calculation, the invention obtains the output set V' of the graph attention network layer, containing n + k + m nodes. In the subsequent classification, the k word-vector nodes and the m surrounding nodes do not need to be classified, so they are discarded; only the first n character nodes are kept and converted into the matrix representation O.
4) Trigger words and argument classification.
Finally, the trigger words and arguments are jointly extracted in the trigger-word and argument recognition layer. A multi-classification task is performed using the BIO labeling method: the output matrix O of the previous layer is fed into a fully connected layer, followed by a softmax layer that normalizes the type vectors, thereby realizing event trigger-word classification. The calculation is shown in formulas (18) and (19).
O'=tanh(W O O+b O ) (18)
y^T_i = softmax(W_T o'_i + b_T)    (19)
where y^T_i is the trigger-word type probability distribution of the i-th entity, W_T ∈ R^{n_T×n_c} is the parameter matrix for event trigger-word classification, n_T is the number of event types, and n_c is the vector dimension.
Through the above calculation we obtain the candidate trigger words, and then use the output matrix O' to perform argument classification on the entity list in the sentence. The word vectors contained in a trigger word are average-pooled to obtain the vector representation T_i of the candidate trigger word; T_i is then concatenated with the vector E_j of every other word and fed into a fully connected network, followed by a softmax layer to realize argument classification, as shown in formula (20).
y^A_{ij} = softmax(W_A [T_i ⊕ E_j] + b_A)    (20)
where y^A_{ij} is the probability distribution over the roles played by the j-th entity in the event triggered by the i-th candidate trigger word, W_A is the parameter matrix for event argument classification, and n_A is the number of argument types.
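The classification head described in formulas (18)-(20) can be sketched end to end. This is a minimal NumPy sketch under stated assumptions: the bias terms, the inclusive trigger-span convention and all function/parameter names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify(O, W_O, b_O, W_T, b_T, W_A, b_A, trig_span, entity_vecs):
    """Trigger-word and argument classification head.

    O:           (n, d) node features from the graph attention layer
    trig_span:   (start, end) inclusive indices of a candidate trigger word
    entity_vecs: list of entity vectors E_j
    """
    O2 = np.tanh(O @ W_O.T + b_O)                              # formula (18)
    trig_probs = np.array([softmax(W_T @ o + b_T) for o in O2])  # formula (19)
    # average-pool the trigger word's vectors into T_i
    T_i = O2[trig_span[0]:trig_span[1] + 1].mean(axis=0)
    arg_probs = np.array([softmax(W_A @ np.concatenate([T_i, e]) + b_A)
                          for e in entity_vecs])               # formula (20)
    return trig_probs, arg_probs
```

Each row of both outputs is a normalized probability distribution, as the softmax layers require.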
For ease of understanding, the invention presents a more specific example:
In the event-logic relation extraction task, to address the problem that event boundaries are fuzzy and hard to determine, the method uses the SoftLexicon approach to introduce external word information on top of the character vectors produced by the pre-training model: four word sets are constructed for each character and fused into its character vector, enhancing the character's semantics. Meanwhile, to address the problem that feature selection is not comprehensive enough, the invention considers vocabulary-level features of different granularities together with sentence-level features containing contextual semantic information, builds vector representations of features such as trigger-word features and event-sequence features, and uses an attention mechanism to construct dependencies among words and compute different attention weights for different features. This makes the extracted features more complete and helps improve the effect of event-logic relation extraction.
In the event extraction task, two problems are addressed: multiple events are hard to extract from the same sentence, and when feature selection considers only sentence-sequence features while ignoring syntactic features, the model struggles to capture the correlations among multiple events in one sentence. The invention converts the syntactic dependency tree produced by dependency syntax analysis into a syntactic dependency graph (using different methods for Chinese and English) and feeds the graph into a graph attention network, which learns the syntactic structure features and performs feature aggregation on the dependency graph. The shortest paths between entities are found and the corresponding vectors are concatenated for feature extraction; finally, the event trigger words, arguments and their corresponding roles are obtained through joint extraction.
The Att-GRCNN model is the event-logic relation extraction model provided by the invention. The invention uses three evaluation indexes to evaluate the effect of each reference model and the Att-GRCNN model on causal relation extraction; the results on the DuEE1.0 dataset and the CEC dataset are shown in Table 1. The Att-GRCNN model achieves good results on both the DuEE1.0 dataset and the CEC dataset, exceeding the experimental results of the benchmark models on most of the metrics. The F1 score of the Att-GRCNN model on the CEC dataset is 2% higher than that of the best-performing GAN-BiGRU-CRF model, and its F1 score on the DuEE1.0 dataset is also 0.3% higher.
TABLE 1 comparative experiments of Att-GRCNN model on causal relationship extraction
Compared with the 6 reference models, the Att-GRCNN model provided by the invention performs best on recall and F1, for the following main reasons. First, the model uses the SoftLexicon method to introduce external word information, which enhances the semantics of the characters. Second, the model simultaneously considers vocabulary-level features of different granularities and sentence-level features containing contextual semantic information, and builds vector representations of features such as trigger-word features and event-sequence features, enriching the semantic features of the event relationships. Third, the model also builds dependencies among words through the attention mechanism and computes different attention weights for different features, so the extracted features are more complete and the effect of event-logic relation extraction is improved.
The DEP-GAT model is the joint event extraction model provided by the invention. Because event extraction comprises two subtasks, the invention evaluates the performance of each model on trigger-word recognition and classification and on argument recognition and classification separately; the results on the ACE-2005 English dataset are shown in Tables 2 and 3.
Table 2 contrast experiment of DEP-GAT model on trigger word recognition and classification
Table 3 comparative experiments of DEP-GAT model on argument identification and Classification
As shown in Tables 2 and 3, the DEP-GAT model provided by the invention performs well on both trigger-word recognition and classification and argument recognition and classification, exceeding the experimental results of the reference models on most indexes. The F1 score of the DEP-GAT model on the trigger-word recognition task is nearly 2% higher than that of the best-performing HPNet model, and 9.2% higher than that of the classical JRNN model. On the argument recognition and classification tasks, the DEP-GAT model shows an even larger improvement over the reference models: 3.2% in F1 over the best-performing JMEE model, and a more remarkable 5.4% in F1 over the classical CNN-based DMCNN model. Overall, the experimental results in the two tables above clearly demonstrate the effectiveness of the proposed DEP-GAT model.
The experimental results of the DEP-GAT model also show that, to a certain extent, the joint event extraction approach has advantages over the pipeline-based approach. The stagedMaxent model, based on a two-stage pipeline, achieves reasonable results on the first-stage trigger-word recognition and classification task, but on the second-stage argument recognition and classification task its recall is low, only 20.3% and 19.3%: errors from the first stage propagate to the second stage and seriously affect argument extraction. Moreover, the DEP-GAT model improves on pipeline-based models such as DMCNN by 7.6% in F1 on the trigger-word recognition task and by 12.5% in F1 on the argument recognition task, confirming the effectiveness of the joint-mode DEP-GAT model on event extraction tasks. Beyond simply comparing F1 values across models, we can also compare each model's gap between the trigger-word classification and argument classification tasks: the F1 gap of the pipeline-based stagedMaxent model reaches 36.2%, and that of the DMCNN model reaches 15.6%, whereas the joint JRNN model's gap is 13.9%, the JMEE model's is 13.4%, and the proposed DEP-GAT model's is 13.4%. The smaller performance gap between the two tasks indicates that joint event extraction can ease the propagation of errors, because this mode does not use the trigger-word extraction result when extracting arguments, so trigger-word recognition errors are not passed on to the argument extraction task.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. A joint event extraction method based on event logic, comprising:
inputting sentences into a sentence logical relation extraction model to obtain event relation pairs in the sentences;
inputting each event in the event relation pair in the sentence into a joint event extraction model based on a graph attention network to obtain an extraction result corresponding to the sentence; the extraction result comprises trigger word classification and argument classification;
the event logic relation extraction model comprises a coding layer, a feature extraction layer and an event relation identification layer;
inputting sentences into a sentence logical relation extraction model to obtain extraction results corresponding to the sentences, wherein the extraction results comprise:
inputting sentences into a coding layer to obtain text feature matrixes corresponding to the sentences output by the coding layer;
Inputting the text feature matrix into the feature extraction layer to obtain a global and local feature representation matrix output by the feature extraction layer;
and inputting the global and local feature representation matrixes into the event relation recognition layer to recognize event relation pairs in sentences.
2. The method according to claim 1, wherein the inputting the sentence into the coding layer to obtain the text feature matrix corresponding to the sentence output by the coding layer includes:
inputting sentences into an embedding layer in the coding layer to convert each word in the sentences into word vectors, and generating word vector representation matrixes after the word vectors are coded by a BERT model;
introducing an external dictionary by using the SoftLexicon method, matching characters in sentences with the dictionary to obtain the words corresponding to each character, and putting the words into four word sets according to the positions of the character within the words: B, M, E, S; the word sets respectively indicate that the character forms the beginning of a word, the middle of a word, the end of a word, or a word on its own;
after four word sets of each character in a sentence are obtained, each word set is expressed as a vector with a fixed length, word frequency is used as a weight coefficient of each word, word vectors of all words in each set are embedded for weighted calculation, and the vectors of the word sets of each character are respectively obtained;
splicing the vectors of the four word sets corresponding to a character onto the BERT word vector of that character to obtain a new word vector representation matrix X_1;
assigning different weights to the trigger-word features of the events, the sequence features of the events and the relational connective features and fusing them to obtain a multi-dimensional feature matrix X_2;
splicing X_2 and X_1 to obtain the final text feature matrix X = X_1 ⊕ X_2.
3. The method of claim 1, wherein said inputting the text feature matrix into the feature extraction layer results in a global and local feature representation matrix output by the feature extraction layer, comprising:
inputting the text feature matrix into the convolution layer of the feature extraction layer to obtain the final feature representation D_CNN of the multi-layer convolution layer, D_CNN ∈ R^{n×m}; each row of D_CNN represents the vocabulary-level features extracted by multi-layer convolution for one word; m is the number of convolution kernels, and n is the number of words in the sentence;
after the max-pooling operation is carried out on all words, obtaining the matrix P, P = [p_1, p_2, ..., p_n], where p_i is the vector obtained after the i-th word undergoes the max-pooling operation;
inputting D_CNN into the self-attention layer of the feature extraction layer to obtain the vocabulary-level features D'_CNN, D'_CNN ∈ R^{n×m};
inputting the text feature matrix into the bidirectional gated recurrent unit of the feature extraction layer to obtain the output matrix H_GRU; wherein the bidirectional gated recurrent unit consists of a forward GRU and a backward GRU, and with the number of hidden units set to s, H_GRU ∈ R^{n×(2×s)}; each row of H_GRU represents the sentence-level features of one word extracted by the bidirectional gated recurrent unit;
inputting the matrix H_GRU into the other self-attention layer of the feature extraction layer to obtain the sentence-level features H'_GRU;
inputting H'_GRU and D'_CNN into the global attention mechanism layer of the feature extraction layer to obtain the output feature matrix G;
splicing matrix P and matrix H_l onto the output matrix of the global attention layer, and outputting the global and local feature representation matrix Z = G ⊕ P ⊕ H_l; wherein H_l is the output matrix of the last one-dimensional hidden layer of the bidirectional gated recurrent unit layer.
4. The method of claim 1, wherein the event relationship identification layer employs a conditional random field CRF model;
letting a tag sequence output by the CRF be L = [l_1, l_2, ..., l_n], the total score of a tag sequence L is:
S(Z, L) = Σ_{i=0}^{n} A_{l_i, l_{i+1}} + Σ_{i=1}^{n} P_{i, l_i}
where A is the transition score matrix, A_{l_i, l_{i+1}} is the transition probability from label l_i to label l_{i+1}, and P_{i, l_i} is the score of the i-th character under label l_i;
maximizing the probability of the correct tag sequence L*, with the objective function of the model calculated as:
P(L* | Z) = exp(S(Z, L*)) / Σ_{L'} exp(S(Z, L'))
the loss function of the model being defined as loss = −log P(L* | Z), and the parameters being optimized by back-propagation.
5. The method according to claim 1, wherein the putting all the events in the event relation pair in the sentence into one set forms a text set, and inputting the text set into a joint event extraction model based on a graph attention network to obtain a corresponding extraction result of the sentence, includes:
splicing the word vector representation matrix X_1, the part-of-speech embedding matrix X_2 and the entity-category embedding matrix X_3 together to obtain the text feature matrix X;
inputting the text feature matrix X into a bidirectional long short-term memory network Bi-LSTM model to obtain the output matrix H_LSTM;
Performing dependency syntactic analysis on the sentence by using the DDParser to obtain a syntactic dependency graph, and expanding the syntactic dependency graph;
taking the feature nodes and relation edges of the syntactic dependency graph as input of each layer of an N-layer graph attention network, which performs an aggregation calculation on the input feature v_i of each node in the graph to obtain the aggregated feature v'_i; finally obtaining the output set V' of the graph attention network layer, the number of nodes in the set V' being n + k + m;
the method comprises the steps of performing joint extraction on trigger words and argument by using a trigger word and argument identification layer in a joint event extraction model based on a graph attention network, performing multi-classification tasks by using a BIO labeling method, inputting an output matrix O of the upper layer into a full-connection layer, obtaining a matrix O' after an activation function, and then performing normalization operation on all types of vectors by connecting with a softmax layer, thereby realizing event trigger word classification;
after the candidate trigger words are obtained, performing argument classification on the entity list in the sentence by using the output matrix O'; the word vectors contained in a trigger word are mean-pooled to obtain the vector representation T_i of the candidate trigger word, then T_i is spliced with the vector E_j of each other entity and input into a fully connected network, followed by a softmax layer to realize argument classification.
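The feature splicing and trigger pooling steps of claim 5 can be sketched in numpy; the sentence length, embedding dimensions and the trigger span below are arbitrary illustrative choices, not values from the patent:

```python
import numpy as np

n = 6               # sentence length (characters)
d1, d2, d3 = 8, 4, 4  # hypothetical embedding dimensions

rng = np.random.default_rng(0)
X1 = rng.normal(size=(n, d1))  # word vector representation matrix X1
X2 = rng.normal(size=(n, d2))  # part-of-speech embedding matrix X2
X3 = rng.normal(size=(n, d3))  # entity class embedding matrix X3

# Text feature matrix X: per-token concatenation of the three embeddings.
X = np.concatenate([X1, X2, X3], axis=1)  # shape (n, d1 + d2 + d3)

# A candidate trigger spanning tokens 2..3: mean-pool its vectors to get T_i,
# then splice T_i with another token's vector E_j as input to the FC network.
T_i = X[2:4].mean(axis=0)
E_j = X[5]
pair = np.concatenate([T_i, E_j])  # shape (2 * (d1 + d2 + d3),)
```

In the full model the splice would feed a trained Bi-LSTM and classifier; here only the tensor shapes are demonstrated.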
6. The method of claim 5, wherein before splicing the word vector representation matrix X1, the part-of-speech embedding matrix X2 and the entity class embedding matrix X3 together to obtain the text feature matrix X, the method further comprises:
generating the word vector representation matrix X1 through ERNIE model encoding in the joint event extraction model based on the graph attention network;
performing, by the joint event extraction model based on the graph attention network, word segmentation and part-of-speech tagging on the event text in the input sentence, to finally obtain the part-of-speech embedding matrix X2 corresponding to the sentence S;
performing entity class marking on the text according to the BIO tagging rules, then randomly initializing the entity class vectors and optimizing them through back-propagation to obtain trained entity class vectors, obtaining the entity class embedding representation corresponding to each word, and finally obtaining the entity class embedding matrix X3 corresponding to the sentence S.
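The BIO entity-class marking step can be sketched as follows; the sentence, spans and entity classes are invented for illustration:

```python
def bio_entity_tags(sentence, entities):
    """Label each character with a BIO entity-class tag.
    `entities` maps (start, end) character spans to an entity class."""
    tags = ["O"] * len(sentence)
    for (start, end), cls in entities.items():
        tags[start] = f"B-{cls}"          # begin of entity
        for i in range(start + 1, end):
            tags[i] = f"I-{cls}"          # inside of entity
    return tags

# "Zhang San goes to Beijing": a person span and a location span.
tags = bio_entity_tags("张三去北京", {(0, 2): "PER", (3, 5): "LOC"})
# tags == ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC']
```

Each tag would then be mapped to a randomly initialized vector and trained by back-propagation, as the claim describes.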
7. The method of claim 6, wherein the joint event extraction model based on the graph attention network performs word segmentation and part-of-speech tagging on the event text in the input sentence to finally obtain the part-of-speech embedding matrix X2 corresponding to the sentence S, comprising:
the joint event extraction model based on the graph attention network performs word segmentation and part-of-speech tagging on the event text in the input sentence, and then marks the part of speech of each word according to the BIO tagging rules, where the tags comprise B-pos, I-pos and E-pos, a word consisting of a single character is represented by S-pos, and pos refers to the part of speech of the word; the part-of-speech vectors are then randomly initialized and optimized through back-propagation to obtain trained part-of-speech vectors, yielding the part-of-speech embedding representation corresponding to each part of speech, and finally the part-of-speech embedding matrix X2 corresponding to the sentence S.
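The character-level B-pos/I-pos/E-pos/S-pos expansion described in claim 7 can be sketched as follows; the example words and POS labels are illustrative:

```python
def pos_char_tags(words):
    """Expand word-level POS tags to character-level tags: B-pos/I-pos/E-pos
    for multi-character words, S-pos for single-character words."""
    tags = []
    for word, pos in words:
        if len(word) == 1:
            tags.append(f"S-{pos}")
        else:
            tags.append(f"B-{pos}")
            tags.extend(f"I-{pos}" for _ in word[1:-1])
            tags.append(f"E-{pos}")
    return tags

# Segmented sentence as (word, part-of-speech) pairs: "Beijing / is / capital".
tags = pos_char_tags([("北京", "ns"), ("是", "v"), ("首都", "n")])
# tags == ['B-ns', 'E-ns', 'S-v', 'B-n', 'E-n']
```

Each distinct tag then gets a trainable embedding; stacking the per-character embeddings gives the matrix X2.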
8. The method of claim 5, wherein the expanding of the syntactic dependency graph is implemented as follows:
the shortest path between any two word vector nodes v_i, v_j is defined as p_ij, and the edge between any two adjacent word vector nodes on the path is defined as (w_m, w_m+1), where w_i refers to the i-th word vector node;
a BiGRU network is adopted to fuse the features of all nodes on the shortest path between the two word vector nodes; the outputs of the forward and backward GRUs at time t, denoted h_t^f and h_t^b, are spliced to obtain the fused feature vector h_t = [h_t^f; h_t^b], i.e. the output of the BiGRU at time t; the fused feature nodes are taken as surrounding nodes of the two word vector nodes respectively;
finally, an expanded syntactic dependency graph G = (V, E) is obtained, where V is the set of nodes comprising three subsets V_c, V_w and V_b: V_c is the set of n character vector nodes, where n is the sentence length; V_w is the set of k word vector nodes obtained after word segmentation; and V_b is the set of surrounding nodes of the word vector nodes computed by the shortest-path algorithm, whose size is m.
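The shortest path p_ij between two word vector nodes in an unweighted dependency graph can be found with a plain BFS; a small self-contained sketch with toy edges (not from the patent):

```python
from collections import deque

def shortest_path(edges, src, dst):
    """BFS shortest path between two nodes of an undirected, unweighted
    graph given as a list of edges; returns the node sequence or None."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []          # backtrack from dst to src
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Toy dependency edges between word vector nodes w0..w4.
edges = [("w0", "w1"), ("w1", "w2"), ("w1", "w3"), ("w3", "w4")]
p = shortest_path(edges, "w0", "w4")  # ['w0', 'w1', 'w3', 'w4']
```

In the claim, the node features along such a path would then be fused by the BiGRU to produce the surrounding nodes in V_b.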
9. The method of claim 5, wherein the graph attention network performs the aggregation calculation on the feature v_i of each node in the syntactic dependency graph to obtain the aggregated feature v'_i as shown in the following formula:

v'_i = σ( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_ij^k W_k v_j )

wherein K is the number of attention heads, W_k is the weight matrix of the k-th attention head relative to the node, α_ij^k is the weight coefficient calculated by the k-th attention head, N_i is the set of all neighbor nodes v_j of node v_i in the syntactic dependency graph, and σ is a nonlinear activation function;
through the above calculation, the output set V' of the graph attention network layer is obtained, and the number of nodes in the set V' is n+k+m; however, the k word vector nodes and the m surrounding nodes are not needed in the subsequent classification process, so they are discarded, leaving only the first n character nodes, which are converted into a matrix representation O.
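A minimal numpy sketch of the multi-head aggregation in claim 9. The LeakyReLU attention logits follow the style of the original GAT, and ReLU stands in for the nonlinearity σ; both choices, and all sizes, are assumptions for illustration:

```python
import numpy as np

def gat_aggregate(V, adj, Ws, As, alpha=0.2):
    """v'_i = sigma( (1/K) * sum_k sum_{j in N_i} a_ij^k W_k v_j ).
    V: (n, d) node features; adj: (n, n) 0/1 adjacency with self-loops;
    Ws: K head weight matrices (d_out, d); As: K attention vectors (2*d_out,)."""
    n = V.shape[0]
    heads = []
    for W, a in zip(Ws, As):
        H = V @ W.T                           # (n, d_out): W_k v_j for all j
        e = np.empty((n, n))
        for i in range(n):                    # logits e_ij = LeakyReLU(a^T [W v_i || W v_j])
            for j in range(n):
                z = np.concatenate([H[i], H[j]]) @ a
                e[i, j] = z if z > 0 else alpha * z
        e = np.where(adj > 0, e, -np.inf)     # restrict attention to N_i
        att = np.exp(e - e.max(axis=1, keepdims=True))
        att /= att.sum(axis=1, keepdims=True) # softmax -> a_ij^k
        heads.append(att @ H)                 # sum_j a_ij^k W_k v_j
    return np.maximum(np.mean(heads, axis=0), 0)  # average heads, apply ReLU

rng = np.random.default_rng(1)
V = rng.normal(size=(4, 5))
adj = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
Ws = [rng.normal(size=(3, 5)) for _ in range(2)]  # K = 2 heads
As = [rng.normal(size=6) for _ in range(2)]
Vp = gat_aggregate(V, adj, Ws, As)                # (4, 3) aggregated features
```

Averaging the heads matches the 1/K in the formula; concatenating them is the other common variant for intermediate layers.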
10. The method of claim 5, wherein event trigger word classification is implemented using the following formulas:

O' = tanh(W_O O + b_O)
ŷ_i^T = softmax(W_T O'_i + b_T)

wherein ŷ_i^T is the trigger word type probability distribution of the i-th entity, W_T ∈ R^{n_T × n_c} is the parameter matrix for event trigger word classification, n_T is the number of event types, and n_c is the vector dimension size;
argument classification is implemented by the following formula:

ŷ_ij^A = softmax(W_A [T_i; E_j] + b_A)

wherein ŷ_ij^A is the probability distribution of the role played by the j-th entity in the event triggered by the i-th candidate trigger word, W_A ∈ R^{n_A × 2n_c} is the parameter matrix of event argument classification, and n_A is the number of argument types.
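The classification formulas of claim 10 can be sketched in numpy; all dimensions and weights are random toy values, and the bias terms b_T and b_A are assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
n, n_c, n_T, n_A = 5, 8, 4, 3   # toy: tokens, hidden dim, event/argument types

O = rng.normal(size=(n, n_c))   # output matrix from the graph attention layer
W_O = rng.normal(size=(n_c, n_c)); b_O = np.zeros(n_c)
O_prime = np.tanh(O @ W_O + b_O)           # O' = tanh(W_O O + b_O)

# Trigger word classification: per-token distribution over event types.
W_T = rng.normal(size=(n_T, n_c)); b_T = np.zeros(n_T)
y_trigger = softmax(O_prime @ W_T.T + b_T)  # (n, n_T)

# Argument classification: splice trigger vector T_i with entity vector E_j.
T_i = O_prime[1:3].mean(axis=0)             # mean-pooled candidate trigger
E_j = O_prime[4]
W_A = rng.normal(size=(n_A, 2 * n_c)); b_A = np.zeros(n_A)
y_arg = softmax(W_A @ np.concatenate([T_i, E_j]) + b_A)  # (n_A,)
```

Each row of y_trigger and the vector y_arg are valid probability distributions, matching the softmax normalization in the claim.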
CN202310363825.9A 2023-04-06 2023-04-06 Combined event extraction method based on event logic Pending CN116383387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310363825.9A CN116383387A (en) 2023-04-06 2023-04-06 Combined event extraction method based on event logic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310363825.9A CN116383387A (en) 2023-04-06 2023-04-06 Combined event extraction method based on event logic

Publications (1)

Publication Number Publication Date
CN116383387A true CN116383387A (en) 2023-07-04

Family

ID=86963046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310363825.9A Pending CN116383387A (en) 2023-04-06 2023-04-06 Combined event extraction method based on event logic

Country Status (1)

Country Link
CN (1) CN116383387A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116757159A (en) * 2023-08-15 2023-09-15 昆明理工大学 End-to-end multitasking joint chapter level event extraction method and system
CN116757159B (en) * 2023-08-15 2023-10-13 昆明理工大学 End-to-end multitasking joint chapter level event extraction method and system
CN117332377A (en) * 2023-12-01 2024-01-02 西南石油大学 Discrete time sequence event mining method and system based on deep learning
CN117332377B (en) * 2023-12-01 2024-02-02 西南石油大学 Discrete time sequence event mining method and system based on deep learning
CN117350386A (en) * 2023-12-04 2024-01-05 南京信息工程大学 Event tracing reasoning method and system
CN117350386B (en) * 2023-12-04 2024-03-19 南京信息工程大学 Event tracing reasoning method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination