CN117236436A - Cross-sentence multi-layer bidirectional network event detection method based on external knowledge - Google Patents


Info

Publication number
CN117236436A
CN117236436A
Authority
CN
China
Prior art keywords: representing; sentence; word; layer; vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311529807.XA
Other languages
Chinese (zh)
Inventor
谢文
陈欣儿
吕明翰
肖聪
王明文
罗文兵
黄琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202311529807.XA
Publication of CN117236436A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a cross-sentence multi-layer bidirectional network event detection method based on external knowledge, which comprises the following steps: a publicly available data set in the event detection field is used; a semantic encoder serves as the entrance of a multi-layer bidirectional network model, the model being divided into the semantic encoder and a multi-layer bidirectional network structure; the obtained feature vectors are input directly into the multi-layer bidirectional network structure to obtain predicted event type label vectors, the output of the bidirectional decoder in the last layer of the bidirectional network structure being the final output; a predicted probability distribution is obtained through a linear transformation; finally, the event type classification is obtained. The beneficial effects of the application are as follows: the reasoning and context-understanding capability of the cross-sentence multi-layer bidirectional network event detection method is strengthened, and event types with few samples are classified efficiently; semantic relations in text data are captured better, interaction between document-level semantics and events is realized, and the current problem of missing document-level information is well alleviated.

Description

Cross-sentence multi-layer bidirectional network event detection method based on external knowledge
Technical Field
The application relates to the field of event detection, in particular to a method for detecting a cross-sentence multi-layer bidirectional network event based on external knowledge.
Background
Event detection is one of the key tasks of event extraction: it aims to identify event trigger words in a given text and classify them into predefined event types, and it plays an important role in many application fields such as information retrieval, knowledge graph construction and sentiment analysis. Since the early 1960s, researchers first relied mainly on cumbersome hand-written rules and pattern-matching methods. With the advent of statistical natural language processing, researchers began to handle event detection with traditional machine learning algorithms such as maximum entropy, support vector machines and hidden Markov models. These methods typically require extensive manual labeling and rule definition, and therefore demand considerable time and effort when adapting to new fields or languages.
In recent years, deep neural network models have been widely used in event detection tasks owing to their strong capability of extracting feature representations, improving event detection performance while generalizing well. A deep neural network takes text vector representations as input and, by imitating the process of information transmission between human neurons, obtains richer information representations. The event detection task is generally treated as a classification problem: the text vectors are fed into a neural network model, and after multiple rounds of network encoding the representation of each word is classified to decide whether the word is a trigger word and to identify the event type.
Early deep learning work mainly studied sentence-level event detection, either ignoring document information or modeling document-level semantics and event interdependence separately, and most existing event detection methods regard sentences as mutually independent. However, an individual sentence is only one part of a document and sometimes expresses only part of the information about an event; most research therefore neglects the interdependence and semantic information of document-level events, and effectively mining the intrinsic relations between different semantic units is of great significance.
The trigger word is the core of event detection and most directly expresses the occurrence of an event, so its accurate identification and classification directly affect event detection performance. In the event detection field, trigger words exist in many forms (one event may be expressed by several words, and one word may carry several meanings), and the corresponding event types belong to different contexts. Determining the meaning of a trigger word from its context therefore directly affects the performance of the event detection model. At the same time, event detection often faces data sparsity: some events occur very infrequently, which also produces an imbalanced distribution of event types in the data set. Traditional methods have difficulty dealing with this, since they often use linear models or statistics-based methods and may fail to capture complex data relationships and patterns; their performance on highly non-linear, sparse data is limited. In addition, we observe that sentences containing events are typically very short, so the information obtainable from a sentence itself may be limited, which also weakens the encoder's learning of the sentence.
Disclosure of Invention
In order to solve the above problems, the application provides a cross-sentence multi-layer bidirectional network event detection method based on external knowledge, which focuses on sentence-level event detection and formulates event detection as a sequence labeling task, thus providing more information and flexibility than an ordinary word classification task.
The technical scheme adopted by the application is as follows: a method for detecting cross-sentence multi-layer bidirectional network event based on external knowledge comprises the following steps:
step S1, using a data set disclosed in the event detection field, wherein the data set comprises a document and an event type, and is divided into a training set, a verification set and a test set, and the training set, the verification set and the test set comprise sentences and trigger words;
step S2, formalized definition of event detection tasks;
given a document containing N sentences, $D = \{S_1, S_2, \ldots, S_N\}$, where $S_1$ denotes the first sentence, $S_2$ the second sentence, and $S_N$ the N-th sentence; each sentence $S = \{w_1, w_2, \ldots, w_Z\}$, where $w_1$ denotes the 1st word in sentence $S$, $w_2$ the 2nd word, and $w_Z$ the Z-th word;
the predicted event type tag vector is $Y = \{y_1, y_2, \ldots, y_Z\}$, where $y_1, y_2, \ldots, y_Z$ denote the predicted event type tags corresponding to the 1st word $w_1$, the 2nd word $w_2$, ..., and the Z-th word $w_Z$, respectively;
step S3, the semantic encoder serves as the entrance of the multi-layer bidirectional network model, which is divided into the semantic encoder and a multi-layer bidirectional network structure; the input of the semantic encoder is

$\tilde{S} = \{[\mathrm{CLS}], w_1, \ldots, w_Z, [\mathrm{SEP}], t_1, \ldots, t_8, [\mathrm{SEP}]\}$

where $\tilde{S}$ denotes the input sequence of sentence $S$ after data preprocessing; [CLS] and [SEP] denote the two special markers of the English pre-training model BERT; $t_1$ denotes the name in the 1st tag of the data set and $t_m$ the name in the m-th tag, $1 \le m \le 8$; [CLS] marks the beginning of the preprocessed input sequence of sentence $S$, and [SEP] delimits its different parts; a minimal sketch of this input construction follows the step list below;
the names in the 8 tags of the data set serve as external knowledge, strengthening the attention mechanism in the English pre-training model BERT, and training proceeds in a supervised manner; the preprocessed input sequence $\tilde{S}$ of sentence $S$ is fed into the semantic encoder to obtain the feature vectors of sentence $S$;
step S4, the obtained feature vectors are input into the multi-layer bidirectional network structure to obtain predicted event type tag vectors; the multi-layer bidirectional network structure consists of several bidirectional decoders and information aggregation layers, each layer consisting of one bidirectional decoder and one information aggregation layer, and the output of each layer's bidirectional decoder is input into its information aggregation layer;
step S5, the predicted event type tag vectors output by the bidirectional decoder of the previous layer are passed through a long short-term memory network in the information aggregation layer, whose last state $g_Z$ summarizes the information and transmits it to the next layer of the bidirectional network structure; the output of the bidirectional decoder in the last layer of the bidirectional network structure is the final output;
step S6, the finally output predicted event type tag vectors pass through a linear transformation and a softmax function in the tag prediction layer to obtain the predicted probability distribution; the loss $\mathcal{L}$ is calculated by a weighted cross-entropy loss function, the parameters of the semantic encoder and of the multi-layer bidirectional network structure are optimized and updated, and the event type classification is finally obtained.
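As an illustration of the step S3 input construction above, the following minimal sketch builds the [CLS] sentence [SEP] tag-names [SEP] sequence with a standard BERT tokenizer. It assumes the HuggingFace transformers library, and the eight main ACE2005 event type names stand in for the tag names $t_1, \ldots, t_8$; the application's exact preprocessing may differ.

```python
# Minimal sketch: append the 8 tag names as external knowledge behind the
# sentence, producing [CLS] w_1 ... w_Z [SEP] t_1 ... t_8 [SEP].
# Assumes the HuggingFace `transformers` tokenizer; the eight main ACE2005
# event type names below stand in for the tag names described above.
from transformers import BertTokenizer

ACE_MAIN_TYPES = ["Life", "Movement", "Transaction", "Business",
                  "Conflict", "Contact", "Personnel", "Justice"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "A man died in Baghdad yesterday"
encoding = tokenizer(sentence, " ".join(ACE_MAIN_TYPES))  # sentence-pair input
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# ['[CLS]', 'a', 'man', 'died', ..., '[SEP]', 'life', 'movement', ..., '[SEP]']
```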
Further, in step S1 the data set is the ACE2005 data set, which consists of various types of data (entity, relation and event annotations) released by the Linguistic Data Consortium and includes English, Arabic and Chinese training data; the English data set is selected. The ACE2005 data set contains 599 documents and 33 event types and is divided into 529 training documents, 30 validation documents and 40 test documents; the training set contains 12,426 sentences and 4,214 trigger words, the validation set 777 sentences and 483 trigger words, and the test set 667 sentences and 422 trigger words. Every 8 sentences in the documents of the ACE2005 data set are cut into a new document, and the sentences remaining after cutting are padded by repeatedly filling the last sentence, thereby satisfying the 8-sentence condition of a new document.
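A minimal sketch of this document-cutting step follows: every 8 sentences form a new document, and a short remainder is padded by repeating its last sentence, as described above (the helper name is illustrative).

```python
# Cut a document's sentences into 8-sentence chunks; pad a short final chunk
# by repeating its last sentence so every new document has exactly 8 sentences.
def cut_into_chunks(sentences, size=8):
    chunks = [sentences[i:i + size] for i in range(0, len(sentences), size)]
    if chunks and len(chunks[-1]) < size:
        last = chunks[-1]
        last.extend([last[-1]] * (size - len(last)))
    return chunks

doc = [f"sentence {k}" for k in range(19)]  # 19 sentences -> 3 new documents
for chunk in cut_into_chunks(doc):
    print(len(chunk), chunk[-1])
```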
Further, in step S2 the predicted event type tag vector $Y = \{y_1, y_2, \ldots, y_Z\}$ is expanded in the form of a sequence labeling task, defining 67 tags in total: a B- tag and an I- tag for each of the 33 event types, plus 1 non-trigger tag NONE.
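The tag inventory can be sketched as follows; the three subtype names shown are examples only (the full ACE2005 list has 33 subtypes), and the count works out to 2 x 33 + 1 = 67.

```python
# Build the 67-tag BIO-style inventory: B-/I- tags for each of the 33 event
# types plus the single non-trigger tag NONE. Only 3 example subtypes listed.
EVENT_SUBTYPES = ["Attack", "Die", "Meet"]  # ... 33 ACE2005 subtypes in total

def build_tag_set(subtypes):
    tags = ["NONE"]
    for t in subtypes:
        tags += [f"B-{t}", f"I-{t}"]
    return {tag: idx for idx, tag in enumerate(tags)}

print(build_tag_set(EVENT_SUBTYPES))  # with all 33 subtypes: 67 tags
```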
Further, the semantic encoder in step S3 consists of the English pre-training model BERT, a bidirectional long short-term memory network and an attention mechanism, as follows:
the English pre-training model BERT is a deep neural network model based on the Transformer architecture, composed of the bidirectional encoders of a multi-layer Transformer;
the input sequence $\tilde{S}$ obtained by preprocessing sentence $S$ is fed into the English pre-training model BERT to obtain the feature vectors of sentence $S$, as shown in formula (1):

$X^S = \mathrm{BERT}(\tilde{S})_{[w_1:w_Z]} = \{x_1, x_2, \ldots, x_Z\}$ (1);

where $x_1, x_2, \ldots, x_Z$ denote the 1st, 2nd, ..., Z-th feature vectors obtained through the English pre-training model BERT for the 1st word $w_1$, the 2nd word $w_2$, ..., and the Z-th word $w_Z$; in formula (1), the part from $w_1$ to $w_Z$ is intercepted to obtain the feature vectors $X^S$ of sentence $S$;
Bidirectional long short-term memory network: the feature vectors obtained from the English pre-training model BERT are input into a bidirectional long short-term memory network to obtain output vectors, as shown in formula (2):

$h_i = [\overrightarrow{\mathrm{LSTM}}(x_i); \overleftarrow{\mathrm{LSTM}}(x_i)] = [\overrightarrow{h_i}; \overleftarrow{h_i}]$ (2);

where $h_i$ denotes the output vector of the bidirectional long short-term memory network for the i-th feature vector $x_i$; $x_i$ denotes the i-th feature vector obtained through BERT for the i-th word $w_i$, $1 \le i \le Z$; $\overrightarrow{h_i}$ denotes the output vector of the forward long short-term memory network, $\overleftarrow{h_i}$ the output vector of the backward long short-term memory network, and $[\cdot;\cdot]$ denotes the concatenation operation;
Attention mechanism: the output vectors of the bidirectional long short-term memory network are input to obtain semantic representations, as shown in formulas (3), (4) and (5):

$e_{ij} = h_i^{\top}(W_a h_j + b_a)$ (3);

$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{m=1}^{Z}\exp(e_{im})}$ (4);

$z_i = \sum_{j=1}^{Z} \alpha_{ij}\, h_j$ (5);

where $e_{ij}$ denotes the attention score between the i-th word $w_i$ and the j-th word $w_j$ in sentence $S$, $1 \le j \le Z$; $h_i^{\top}$ denotes the transpose of the bidirectional long short-term memory network output vector $h_i$ of the i-th feature vector $x_i$; $W_a$ and $b_a$ denote the trainable weight matrix and bias vector of the attention mechanism; $h_j$ denotes the bidirectional long short-term memory network output vector of the j-th feature vector $x_j$ obtained through BERT; $\alpha_{ij}$ denotes the attention weight between the i-th word $w_i$ and the j-th word $w_j$; $e_{im}$ denotes the attention score between the i-th word $w_i$ and the m-th word $w_m$ in sentence $S$; and $z_i$ denotes the final semantic representation of the i-th word $w_i$;

through the attention mechanism, the final semantic representation vector of sentence $S$ is obtained:

$Z^S = \{z_1, z_2, \ldots, z_Z\}$

where $z_1$ denotes the final semantic representation of the 1st word $w_1$, $z_2$ that of the 2nd word $w_2$, and $z_Z$ that of the Z-th word $w_Z$.
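A minimal PyTorch sketch of this encoder stage (the BiLSTM of formula (2) followed by the bilinear self-attention of formulas (3) to (5)) is given below; the dimensions are illustrative assumptions, not the application's exact hyperparameters, and a random tensor stands in for the BERT features.

```python
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """BiLSTM over BERT features, then self-attention (formulas (2)-(5))."""
    def __init__(self, bert_dim=768, hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(bert_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 2 * hidden)  # W_a h_j + b_a

    def forward(self, x):                     # x: (batch, Z, bert_dim)
        h, _ = self.bilstm(x)                 # formula (2): h_i
        e = h @ self.attn(h).transpose(1, 2)  # formula (3): e_ij = h_i^T (W_a h_j + b_a)
        alpha = torch.softmax(e, dim=-1)      # formula (4): attention weights
        return alpha @ h                      # formula (5): z_i = sum_j alpha_ij h_j

encoder = SemanticEncoder()
z = encoder(torch.randn(1, 10, 768))          # one sentence of Z = 10 words
print(z.shape)                                # torch.Size([1, 10, 512])
```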
Further, the bidirectional decoder in step S4 consists of a forward long short-term memory network and a backward long short-term memory network, as follows:
the bidirectional decoder generates the event type tag vector corresponding to each word; the final semantic representation vector $Z^S$ of sentence $S$ is passed through the bidirectional decoder to obtain the predicted event type tag vectors, as shown in formulas (6), (7), (8) and (9):

$\overrightarrow{s_i} = \overrightarrow{\mathrm{LSTM}}([z_i; \overrightarrow{y}_{i-1}], \overrightarrow{s}_{i-1})$ (6);

$\overrightarrow{y_i} = \tanh(W\,\overrightarrow{s_i} + b)$ (7);

$\overleftarrow{s_i} = \overleftarrow{\mathrm{LSTM}}([z_i; \overleftarrow{y}_{i+1}], \overleftarrow{s}_{i+1})$ (8);

$\overleftarrow{y_i} = \tanh(W\,\overleftarrow{s_i} + b)$ (9);

where $\overrightarrow{s_i}$ and $\overleftarrow{s_i}$ denote the states of the forward and backward long short-term memory networks; $\overrightarrow{y_i}$ denotes the forward tag vector of the i-th word $w_i$ in sentence $S$ and $\overleftarrow{y_i}$ its backward tag vector; $\overrightarrow{\mathrm{LSTM}}$ and $\overleftarrow{\mathrm{LSTM}}$ denote the forward and backward long short-term memory networks; $\overrightarrow{y}_{i-1}$ denotes the forward tag vector of the (i-1)-th word $w_{i-1}$; $\overrightarrow{s}_{i-1}$ denotes the state of the forward network at the previous step; $\tanh$ denotes the transfer function used to derive the event tag vectors; $W$ and $b$ denote the trainable weight matrix and bias vector of the bidirectional decoder; $\overleftarrow{y}_{i+1}$ denotes the backward tag vector of the (i+1)-th word $w_{i+1}$; and $\overleftarrow{s}_{i+1}$ denotes the state of the backward network at the following step; the finally predicted event type tag vector is represented as $y_i = [\overrightarrow{y_i}; \overleftarrow{y_i}]$.
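The sketch below mirrors formulas (6) to (9) with two LSTM cells, feeding each word's semantic vector concatenated with the previous (or next) tag vector; the shared output layer and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiDecoder(nn.Module):
    """Bidirectional tag decoder of formulas (6)-(9); one sentence at a time."""
    def __init__(self, sem_dim=512, tag_dim=67, state_dim=256):
        super().__init__()
        self.fwd = nn.LSTMCell(sem_dim + tag_dim, state_dim)
        self.bwd = nn.LSTMCell(sem_dim + tag_dim, state_dim)
        self.out = nn.Linear(state_dim, tag_dim)  # W s_i + b of formulas (7)/(9)

    def run(self, cell, z, order):
        y = torch.zeros(1, self.out.out_features)   # initial tag vector
        s = c = torch.zeros(1, cell.hidden_size)    # initial LSTM state
        ys = {}
        for i in order:                             # formulas (6)/(8)
            s, c = cell(torch.cat([z[i:i + 1], y], dim=-1), (s, c))
            y = torch.tanh(self.out(s))             # formulas (7)/(9)
            ys[i] = y
        return torch.cat([ys[i] for i in range(len(z))])

    def forward(self, z):                           # z: (Z, sem_dim)
        y_fwd = self.run(self.fwd, z, range(len(z)))
        y_bwd = self.run(self.bwd, z, reversed(range(len(z))))
        return torch.cat([y_fwd, y_bwd], dim=-1)    # y_i = [fwd_i ; bwd_i]

decoder = BiDecoder()
print(decoder(torch.randn(10, 512)).shape)          # torch.Size([10, 134])
```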
Further, the information aggregation layer in step S5 consists of a single long short-term memory network layer, as follows:
the last state $g_Z$ of the long short-term memory network is selected as the summary information, calculated as in formula (10):

$g_i = \mathrm{LSTM}(y_i, g_{i-1})$ (10);

where $\mathrm{LSTM}$ denotes the long short-term memory network; $g_i$ and $g_{i-1}$ denote its i-th and (i-1)-th states; and $y_i$ denotes the event tag vector of the i-th word $w_i$ in sentence $S$;
the summary information of sentence $S$ is $\mathrm{Info}(S) = g_Z$, where $g_Z$ denotes the last state of the long short-term memory network.
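A one-layer sketch of this aggregation, under the same illustrative dimensions as above, is shown below: an LSTM runs over a sentence's tag vectors and its last hidden state is kept as the summary Info(S).

```python
import torch
import torch.nn as nn

aggregator = nn.LSTM(input_size=134, hidden_size=128, batch_first=True)

def sentence_info(tag_vectors):          # tag_vectors: (1, Z, tag_dim)
    _, (g_last, _) = aggregator(tag_vectors)
    return g_last.squeeze(0)             # the last state g_Z, i.e. Info(S)

info = sentence_info(torch.randn(1, 10, 134))
print(info.shape)                        # torch.Size([1, 128])
```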
Further, the multi-layer bidirectional network structure is formed by combining bidirectional decoders and information aggregation layers: the information aggregation layer of one layer aggregates the information of neighboring sentences into the bidirectional decoder of the next layer and propagates information between sentences; since information relevant to sentence $S$ is stored in several adjacent sentences, several bidirectional long short-term memory network decoders are stacked, and the semantic information of neighboring sentences is aggregated iteratively;
for the t-th bidirectional decoding layer, the output is calculated as in formulas (11), (12), (13) and (14):

$\overrightarrow{s_i^{\,t}} = \overrightarrow{\mathrm{LSTM}}([z_i; \overrightarrow{y}_{i-1}^{\,t}; \mathrm{Info}^{t-1}(S_{prev})], \overrightarrow{s}_{i-1}^{\,t})$ (11);

$\overrightarrow{y_i^{\,t}} = \tanh(W\,\overrightarrow{s_i^{\,t}} + b)$ (12);

$\overleftarrow{s_i^{\,t}} = \overleftarrow{\mathrm{LSTM}}([z_i; \overleftarrow{y}_{i+1}^{\,t}; \mathrm{Info}^{t-1}(S_{next})], \overleftarrow{s}_{i+1}^{\,t})$ (13);

$\overleftarrow{y_i^{\,t}} = \tanh(W\,\overleftarrow{s_i^{\,t}} + b)$ (14);

where $\mathrm{Info}^{t-1}(S_{prev})$ denotes the information, at the previous layer (the (t-1)-th layer), of the sentence immediately preceding sentence $S$, and $\mathrm{Info}^{t-1}(S_{next})$ the information, at the (t-1)-th layer, of the sentence immediately following it;
the predicted event tag vector of the i-th word $w_i$ in sentence $S$ at layer $t$ is denoted $y_i^{t} = [\overrightarrow{y_i^{\,t}}; \overleftarrow{y_i^{\,t}}]$;
as the number of layers of the bidirectional network structure increases, the bidirectional decoder captures information from more distant sentences: at the t-th layer, sentence $S$ captures the information of neighboring sentences within distance t-1; the bidirectional decoder and the information aggregation layer encode and propagate the same kind of information, so the parameters of the different layers are set to be the same.
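The propagation logic of this iterative stacking can be sketched as follows; `decode` and `summarize` stand for the bidirectional decoder and information aggregation layer sketched above (shared across layers), and the stand-in functions below only exercise the loop, so all names and dimensions are illustrative.

```python
import torch

def multilayer_decode(sentences_z, decode, summarize, info_dim, layers=3):
    """sentences_z: one (Z_k, sem_dim) tensor per sentence of the document."""
    n = len(sentences_z)
    zero = torch.zeros(info_dim)
    info = [zero] * n                              # layer-0 neighbor summaries
    per_layer_tags = []
    for t in range(layers):
        tags = []
        for k, z in enumerate(sentences_z):        # formulas (11)-(14)
            left = info[k - 1] if k > 0 else zero
            right = info[k + 1] if k < n - 1 else zero
            tags.append(decode(z, left, right))
        info = [summarize(y) for y in tags]        # formula (10), per layer
        per_layer_tags.append(tags)                # kept for the final weighted sum
    return per_layer_tags

# Stand-ins with the right shapes, just to run the propagation loop:
decode = lambda z, left, right: torch.randn(z.size(0), 134)
summarize = lambda y: torch.randn(16)
out = multilayer_decode([torch.randn(10, 512)] * 8, decode, summarize, info_dim=16)
print(len(out), len(out[0]), out[0][0].shape)      # 3 layers, 8 sentences, (10, 134)
```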
Further, the tag prediction layer in step S6 is as follows:
the final event tag vector of the i-th word $w_i$ in sentence $S$ is defined as $y_i = \sum_{t=1}^{T} \lambda^{T-t}\, y_i^{t}$, where $T$ denotes the maximum number of layers, $1 \le t \le T$; $t$ denotes the layer number; $y_i^{t}$ denotes the event tag vector of the i-th word $w_i$ at layer $t$; and $\lambda$, $0 < \lambda \le 1$, denotes the weight decay parameter;
the final event tag vector $y_i$ of the i-th word $w_i$ in sentence $S$ is passed through a linear transformation and then a softmax function to obtain the tag probability distribution, as shown in formulas (15) and (16):

$o_i = W_p\, y_i + b_p$ (15);

$p_{ik} = \dfrac{\exp(o_{ik})}{\sum_{q=1}^{Q}\exp(o_{iq})}$ (16);

where $o_i$ denotes the output of the final event tag vector $y_i$ of the i-th word $w_i$ after the linear transformation; $W_p$ and $b_p$ denote the trainable weight matrix and bias vector of the tag prediction layer; $p_{ik}$ denotes the probability that the i-th word $w_i$ in sentence $S$ belongs to event type $k$; $o_{ik}$ denotes the score of the i-th word $w_i$ for event type $k$ and $o_{iq}$ its score for event type $q$; $D$ is the training document set and $Q$ the total number of event types;
the weighted cross-entropy loss function is introduced to calculate the loss $\mathcal{L}$, making the multi-layer bidirectional network model pay more attention to event types with few training samples; the loss function $\mathcal{L}$ is calculated as in formula (17):

$\mathcal{L} = -\sum_{d \in D} \sum_{S \in d} \sum_{i=1}^{Z} \beta_{\hat{y}_i}\, \log p_{i,\hat{y}_i}$ (17);

where $d$ denotes a new document composed of every 8 sentences; $\beta_k$ denotes the weight of the k-th event type; $\hat{y}_i$ denotes the true tag of the i-th word $w_i$ in sentence $S$; and $p_{ik}$ denotes the probability that the i-th word $w_i$ belongs to event type $k$.
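A sketch of this weighted loss is below. PyTorch's CrossEntropyLoss applies exactly the per-class weighting of formula (17) to the linear scores; the inverse-frequency weighting used to build the weights here is one plausible choice, not necessarily the application's.

```python
import torch
import torch.nn as nn

Q = 67                                        # total number of tags
counts = torch.randint(1, 1000, (Q,)).float() # stand-in per-tag frequencies
beta = counts.sum() / (Q * counts)            # rarer tags get larger weights
loss_fn = nn.CrossEntropyLoss(weight=beta)    # weighted cross entropy

scores = torch.randn(12, Q)                   # o_i for the 12 words of a sentence
gold = torch.randint(0, Q, (12,))             # true tags
print(loss_fn(scores, gold))
```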
The beneficial effects of the application are as follows:
(1) The application uses the names of the predefined types in the ACE2005 data set as external knowledge, strengthening the ability of the cross-sentence multi-layer bidirectional network event detection method to reason about and understand context; this additional information effectively helps the method classify event types with few samples.
(2) The application uses the powerful English pre-training model BERT as part of the semantic encoder, greatly improving the language understanding and representation capability of the cross-sentence multi-layer bidirectional network event detection method and better capturing the semantic relations in text data; the multi-layer bidirectional network structure brings neighbor-sentence information to the input sentence, realizing interaction between document-level semantics and events and effectively remedying the current lack of document-level information.
(3) Aiming at the severe imbalance of the event type distribution in the ACE2005 data set, the application introduces a weighted cross-entropy loss function, so that the cross-sentence multi-layer bidirectional network event detection method pays more attention to types with few samples, and different weights can be assigned to different categories to balance their differences in sample number. This helps improve the method's performance on imbalanced data sets and ensures that every category is learned properly, without the method being overly influenced by sample counts.
Drawings
FIG. 1 is a diagram of an overall model framework of the present application.
Detailed Description
As shown in FIG. 1, first, the names of the predefined types in the ACE data set are appended, as external knowledge, after each input sentence, to address data sparsity and the imbalanced distribution of event types in the data set and to effectively remedy the limited information of short sentences. The English pre-training model BERT is then included in the semantic encoder to help the external-knowledge-based cross-sentence multi-layer bidirectional network event detection model better understand the context, semantic information and relations of the text; with the external knowledge introduced, the semantic understanding capability of the model is extended, improving its handling of unevenly distributed event types and of specific event types that occur rarely. Next, a multi-layer bidirectional long short-term memory network is introduced to capture the interaction between events and semantic information simultaneously, so that a sentence can obtain the semantic information of its preceding and following neighbor sentences, gaining better semantic supplementation and resolving the problem of missing document-level information; the information aggregation layer summarizes single-sentence information, which serves as the neighbor-sentence semantic information captured by the multi-layer bidirectional network. Finally, the tag vector of each word in the sentence is obtained, passed through a simple linear transformation, and fed into a softmax function to obtain the final predicted tag probability.
The application works and is implemented in this way: the cross-sentence multi-layer bidirectional network event detection method based on external knowledge proceeds according to steps S1 to S6, with the further refinements of each step, exactly as set out above in the Disclosure of Invention.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (8)

1. A method for detecting a cross-sentence multi-layer bidirectional network event based on external knowledge is characterized by comprising the following steps: the method comprises the following steps:
step S1, using a data set disclosed in the event detection field, wherein the data set comprises a document and an event type, and is divided into a training set, a verification set and a test set, and the training set, the verification set and the test set comprise sentences and trigger words;
step S2, formalized definition of event detection tasks;
given a document containing N sentences, $D = \{S_1, S_2, \ldots, S_N\}$, where $S_1$ denotes the first sentence, $S_2$ the second sentence, and $S_N$ the N-th sentence; each sentence $S = \{w_1, w_2, \ldots, w_Z\}$, where $w_1$ denotes the 1st word in sentence $S$, $w_2$ the 2nd word, and $w_Z$ the Z-th word;
the predicted event type tag vector is $Y = \{y_1, y_2, \ldots, y_Z\}$, where $y_1, y_2, \ldots, y_Z$ denote the predicted event type tags corresponding to the 1st word $w_1$, the 2nd word $w_2$, ..., and the Z-th word $w_Z$, respectively;
step S3, the semantic encoder serves as the entrance of the multi-layer bidirectional network model, which is divided into the semantic encoder and a multi-layer bidirectional network structure; the input of the semantic encoder is

$\tilde{S} = \{[\mathrm{CLS}], w_1, \ldots, w_Z, [\mathrm{SEP}], t_1, \ldots, t_8, [\mathrm{SEP}]\}$

where $\tilde{S}$ denotes the input sequence of sentence $S$ after data preprocessing; [CLS] and [SEP] denote the two special markers of the English pre-training model BERT; $t_1$ denotes the name in the 1st tag of the data set and $t_m$ the name in the m-th tag, $1 \le m \le 8$; [CLS] marks the beginning of the preprocessed input sequence of sentence $S$, and [SEP] delimits its different parts;
the preprocessed input sequence $\tilde{S}$ of sentence $S$ is fed into the semantic encoder to obtain the feature vectors of sentence $S$;
step S4, the obtained feature vectors are input into the multi-layer bidirectional network structure to obtain predicted event type tag vectors; the multi-layer bidirectional network structure consists of several bidirectional decoders and information aggregation layers, each layer consisting of one bidirectional decoder and one information aggregation layer, and the output of each layer's bidirectional decoder is input into its information aggregation layer;
step S5, the predicted event type tag vectors output by the bidirectional decoder of the previous layer are passed through a long short-term memory network in the information aggregation layer, whose last state $g_Z$ summarizes the information and transmits it to the next layer of the bidirectional network structure; the output of the bidirectional decoder in the last layer of the bidirectional network structure is the final output;
step S6, the finally output predicted event type tag vectors pass through a linear transformation and a softmax function in the tag prediction layer to obtain the predicted probability distribution; the loss $\mathcal{L}$ is calculated by a weighted cross-entropy loss function, the parameters of the semantic encoder and of the multi-layer bidirectional network structure are optimized and updated, and the event type classification is finally obtained.
2. The method for detecting cross-sentence multi-layer bidirectional network events based on external knowledge according to claim 1, characterized in that: in step S1, the data set is the ACE2005 data set, from which the English data set is selected; every 8 sentences in the documents of the ACE2005 data set are cut into a new document, and for the sentences remaining after cutting, the last sentence is repeatedly filled, thereby satisfying the 8-sentence condition of a new document.
3. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 2, wherein the method comprises the following steps:
in step S2, the predicted event type tag vector $Y = \{y_1, y_2, \ldots, y_Z\}$ is expanded in the form of a sequence labeling task, defining 67 tags in total: a B- tag and an I- tag for each of the 33 event types, plus 1 non-trigger tag NONE.
4. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 3, wherein the method comprises the following steps:
the semantic encoder in the step S3 consists of an English pre-training model BERT, a two-way long-short-term memory network and an attention mechanism, and comprises the following specific contents:
the English pre-training model BERT is a deep neural network model based on the Transformer architecture, composed of the bidirectional encoders of a multi-layer Transformer;
the input sequence $\tilde{S}$ obtained by preprocessing sentence $S$ is fed into the English pre-training model BERT to obtain the feature vectors of sentence $S$, as shown in formula (1):

$X^S = \mathrm{BERT}(\tilde{S})_{[w_1:w_Z]} = \{x_1, x_2, \ldots, x_Z\}$ (1);

where $x_1, x_2, \ldots, x_Z$ denote the 1st, 2nd, ..., Z-th feature vectors obtained through the English pre-training model BERT for the 1st word $w_1$, the 2nd word $w_2$, ..., and the Z-th word $w_Z$;
bidirectional long short-term memory network: the feature vectors obtained from the English pre-training model BERT are input into a bidirectional long short-term memory network to obtain output vectors, as shown in formula (2):

$h_i = [\overrightarrow{\mathrm{LSTM}}(x_i); \overleftarrow{\mathrm{LSTM}}(x_i)] = [\overrightarrow{h_i}; \overleftarrow{h_i}]$ (2);

where $h_i$ denotes the output vector of the bidirectional long short-term memory network for the i-th feature vector $x_i$; $x_i$ denotes the i-th feature vector obtained through BERT for the i-th word $w_i$, $1 \le i \le Z$; $\overrightarrow{h_i}$ denotes the output vector of the forward long short-term memory network, $\overleftarrow{h_i}$ the output vector of the backward long short-term memory network, and $[\cdot;\cdot]$ denotes the concatenation operation;
attention mechanism: the output vectors of the bidirectional long short-term memory network are input to obtain semantic representations, as shown in formulas (3), (4) and (5):

$e_{ij} = h_i^{\top}(W_a h_j + b_a)$ (3);

$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{m=1}^{Z}\exp(e_{im})}$ (4);

$z_i = \sum_{j=1}^{Z} \alpha_{ij}\, h_j$ (5);

where $e_{ij}$ denotes the attention score between the i-th word $w_i$ and the j-th word $w_j$ in sentence $S$, $1 \le j \le Z$; $h_i^{\top}$ denotes the transpose of the bidirectional long short-term memory network output vector $h_i$ of the i-th feature vector $x_i$; $W_a$ and $b_a$ denote the trainable weight matrix and bias vector of the attention mechanism; $h_j$ denotes the bidirectional long short-term memory network output vector of the j-th feature vector $x_j$ obtained through BERT; $\alpha_{ij}$ denotes the attention weight between the i-th word $w_i$ and the j-th word $w_j$; $e_{im}$ denotes the attention score between the i-th word $w_i$ and the m-th word $w_m$ in sentence $S$; and $z_i$ denotes the final semantic representation of the i-th word $w_i$;
through the attention mechanism, the final semantic representation vector of sentence $S$ is obtained: $Z^S = \{z_1, z_2, \ldots, z_Z\}$, where $z_1$ denotes the final semantic representation of the 1st word $w_1$, $z_2$ that of the 2nd word $w_2$, and $z_Z$ that of the Z-th word $w_Z$.
5. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 4, wherein the method comprises the following steps:
the bidirectional decoder in step S4 consists of a forward long short-term memory network and a backward long short-term memory network, as follows:
the bidirectional decoder generates the event type tag vector corresponding to each word; the final semantic representation vector $Z^S$ of sentence $S$ is passed through the bidirectional decoder to obtain the predicted event type tag vectors, as shown in formulas (6), (7), (8) and (9):

$\overrightarrow{s_i} = \overrightarrow{\mathrm{LSTM}}([z_i; \overrightarrow{y}_{i-1}], \overrightarrow{s}_{i-1})$ (6);

$\overrightarrow{y_i} = \tanh(W\,\overrightarrow{s_i} + b)$ (7);

$\overleftarrow{s_i} = \overleftarrow{\mathrm{LSTM}}([z_i; \overleftarrow{y}_{i+1}], \overleftarrow{s}_{i+1})$ (8);

$\overleftarrow{y_i} = \tanh(W\,\overleftarrow{s_i} + b)$ (9);

where $\overrightarrow{s_i}$ and $\overleftarrow{s_i}$ denote the states of the forward and backward long short-term memory networks; $\overrightarrow{y_i}$ denotes the forward tag vector of the i-th word $w_i$ in sentence $S$ and $\overleftarrow{y_i}$ its backward tag vector; $\overrightarrow{\mathrm{LSTM}}$ and $\overleftarrow{\mathrm{LSTM}}$ denote the forward and backward long short-term memory networks; $\overrightarrow{y}_{i-1}$ denotes the forward tag vector of the (i-1)-th word $w_{i-1}$; $\overrightarrow{s}_{i-1}$ denotes the state of the forward network at the previous step; $\tanh$ denotes the transfer function used to derive the event tag vectors; $W$ and $b$ denote the trainable weight matrix and bias vector of the bidirectional decoder; $\overleftarrow{y}_{i+1}$ denotes the backward tag vector of the (i+1)-th word $w_{i+1}$; and $\overleftarrow{s}_{i+1}$ denotes the state of the backward network at the following step; the finally predicted event type tag vector is represented as $y_i = [\overrightarrow{y_i}; \overleftarrow{y_i}]$.
6. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 5, wherein the method comprises the following steps:
the information aggregation layer in step S5 consists of a single long short-term memory network layer, as follows:
the last state $g_Z$ of the long short-term memory network is selected as the summary information, calculated as in formula (10):

$g_i = \mathrm{LSTM}(y_i, g_{i-1})$ (10);

where $\mathrm{LSTM}$ denotes the long short-term memory network; $g_i$ and $g_{i-1}$ denote its i-th and (i-1)-th states; and $y_i$ denotes the event tag vector of the i-th word $w_i$ in sentence $S$;
the summary information of sentence $S$ is $\mathrm{Info}(S) = g_Z$, where $g_Z$ denotes the last state of the long short-term memory network.
7. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 6, wherein the method comprises the following steps:
the multi-layer bidirectional network structure is formed by combining a bidirectional decoder and an information aggregation layer, the information aggregation layer in the bidirectional network structure aggregates the information of neighbor sentences into the bidirectional decoder of the next layer bidirectional network structure, and propagates the information among sentences, the related information of the sentences S is stored in a plurality of adjacent sentences, a plurality of bidirectional long-short-term memory network decoders are introduced to stack, and the semantic information of the neighbor sentences is aggregated in an iterative mode;
for the t-layer bidirectional decoding layer, outputting calculation such as formula (11), formula (12), formula (13) and formula (14);
(11);
(12);
(13);
(14);
wherein,representing the upper layer, i.e. t-1 layer, immediately preceding sentence +.>Information of->Representing the upper layer, i.e. t-1 layer, followed by a sentence, i.e +.>Information of (2);
ith word in sentence SThe corresponding predicted event tag vector is denoted +.>
As the number of layers of the bi-directional network structure increases, the bi-directional decoder will capture information of more distant sentences, at layer t the bi-directional decoder,sentence S captures +.>The two-way decoder and the information aggregation layer encode and propagate the same information to set the parameters of different layers the same.
8. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 7, wherein the method comprises the following steps:
the label prediction layer in step S6 specifically comprises the following steps:
ith word in sentence SIs defined as +.>,/>Maximum number of layers->;/>Maximum number of layers->The method comprises the steps of carrying out a first treatment on the surface of the t represents the number of layers,/>representing the i-th word +.>Event tag vector at layer t, +.>,/>Representing weight decay parameters;
will be the ith word in sentence SFinal event tag vector +.>The label probability distribution is obtained by inputting through linear change and then through a softmax function, and the calculation process is shown as a formula (15) and a formula (16);
(15);
(16);
wherein,representing the i-th word +.>Final event tag vector +.>Output after linear transformation, ++>Andrespectively represent a trainable weight matrix and a bias vector in a label prediction layer, < + >>Representing the ith word in sentence SProbability of event type k, +.>Representing the i-th word +.>Score belonging to event type k, < >>Representing the i-th word +.>The score belonging to the event type Q, D is a training document set, and Q is the total number of event types;
introducing weighted cross entropy loss function to calculate lossLet the multi-layer bi-directional network model pay more attention to the event type with less training set samples, loss function +.>The calculation process formula (17) of (2) is shown;
(17);
where d represents a new document composed of every 8 sentences,weights representing the kth event type, +.>Representing the i-th word +.>Is (are) true tags->Representing the i-th word +.>Is the probability of event type k.
CN202311529807.XA 2023-11-16 2023-11-16 Cross-sentence multi-layer bidirectional network event detection method based on external knowledge Pending CN117236436A (en)

Priority Applications (1)

Application Number: CN202311529807.XA · Priority Date: 2023-11-16 · Filing Date: 2023-11-16 · Title: Cross-sentence multi-layer bidirectional network event detection method based on external knowledge

Applications Claiming Priority (1)

Application Number: CN202311529807.XA · Priority Date: 2023-11-16 · Filing Date: 2023-11-16 · Title: Cross-sentence multi-layer bidirectional network event detection method based on external knowledge

Publications (1)

Publication Number: CN117236436A · Publication Date: 2023-12-15

Family

Family ID: 89084838

Family Applications (1)

Application Number: CN202311529807.XA · Priority Date: 2023-11-16 · Filing Date: 2023-11-16 · Title: Cross-sentence multi-layer bidirectional network event detection method based on external knowledge

Country Status (1)

Country: CN · CN117236436A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416956A (en) * 2020-11-19 2021-02-26 重庆邮电大学 Question classification method based on BERT and independent cyclic neural network
CN113505200A (en) * 2021-07-15 2021-10-15 河海大学 Sentence-level Chinese event detection method combining document key information
US20230127652A1 (en) * 2021-10-25 2023-04-27 Adobe Inc. Event understanding with deep learning
CN115034224A (en) * 2022-01-26 2022-09-09 华东师范大学 News event detection method and system integrating representation of multiple text semantic structure diagrams
CN114662586A (en) * 2022-03-18 2022-06-24 南京邮电大学 Method for detecting false information based on common attention multi-mode fusion mechanism
CN115221325A (en) * 2022-07-25 2022-10-21 中国人民解放军军事科学院军事科学信息研究中心 Text classification method based on label semantic learning and attention adjustment mechanism
CN115510236A (en) * 2022-11-23 2022-12-23 中国人民解放军国防科技大学 Chapter-level event detection method based on information fusion and data enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dongfang Lou et al., "MLBiNet: A Cross-Sentence Collective Event Detection Network", Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, page 4829.

Similar Documents

Publication Publication Date Title
CN108829818B (en) Text classification method
CN111897908B (en) Event extraction method and system integrating dependency information and pre-training language model
CN111143550B (en) Method for automatically identifying dispute focus based on hierarchical attention neural network model
CN110222188B (en) Company notice processing method for multi-task learning and server
CN109858041B (en) Named entity recognition method combining semi-supervised learning with user-defined dictionary
CN110532557B (en) Unsupervised text similarity calculation method
CN110287323B (en) Target-oriented emotion classification method
CN111506732B (en) Text multi-level label classification method
CN111797241B (en) Event Argument Extraction Method and Device Based on Reinforcement Learning
CN113157859B (en) Event detection method based on upper concept information
CN110297889B (en) Enterprise emotional tendency analysis method based on feature fusion
CN112784041B (en) Chinese short text sentiment orientation analysis method
CN112560486A (en) Power entity identification method based on multilayer neural network, storage medium and equipment
CN113806547B (en) Deep learning multi-label text classification method based on graph model
CN111666373A (en) Chinese news classification method based on Transformer
CN111538841B (en) Comment emotion analysis method, device and system based on knowledge mutual distillation
CN113065341A (en) Automatic labeling and classifying method for environmental complaint report text
CN114722835A (en) Text emotion recognition method based on LDA and BERT fusion improved model
CN116822625A (en) Divergent-type associated fan equipment operation and detection knowledge graph construction and retrieval method
CN113343690A (en) Text readability automatic evaluation method and device
CN109446326A (en) Biomedical event based on replicanism combines abstracting method
CN115544252A (en) Text emotion classification method based on attention static routing capsule network
CN113065356B (en) IT equipment operation and maintenance fault suggestion processing method based on semantic analysis algorithm
CN114048314A (en) Natural language steganalysis method
CN117271701A (en) Method and system for extracting system operation abnormal event relation based on TGGAT and CNN

Legal Events

Code: PB01 · Publication
Code: SE01 · Entry into force of request for substantive examination