CN117236436A - Cross-sentence multi-layer bidirectional network event detection method based on external knowledge - Google Patents


Info

Publication number
CN117236436A
CN117236436A
Authority
CN
China
Prior art keywords: representing; sentence; word; layer; vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311529807.XA
Other languages
Chinese (zh)
Inventor
谢文
陈欣儿
吕明翰
肖聪
王明文
罗文兵
黄琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202311529807.XA
Publication of CN117236436A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a cross-sentence multi-layer bidirectional network event detection method based on external knowledge, which comprises the following steps: a publicly available data set in the event detection field is used; a semantic encoder serves as the entrance of a multi-layer bidirectional network model, the model being divided into the semantic encoder and a multi-layer bidirectional network structure; the obtained feature vectors are input directly into the multi-layer bidirectional network structure to obtain predicted event type label vectors, the output of the bidirectional decoder in the last layer of the bidirectional network structure being the final output; a predicted probability distribution is obtained through a linear transformation; finally, the event type classification is obtained. The beneficial effects of the application are as follows: the reasoning and context-understanding capability of the cross-sentence multi-layer bidirectional network event detection method is strengthened, and event types with few samples are classified efficiently; semantic relations in text data are captured better, interaction between document-level semantics and events is realized, and the current problem of missing document-level information is well alleviated.

Description

Cross-sentence multi-layer bidirectional network event detection method based on external knowledge
Technical Field
The application relates to the field of event detection, in particular to a method for detecting a cross-sentence multi-layer bidirectional network event based on external knowledge.
Background
Event detection is one of the key tasks of event extraction: it aims to identify event trigger words in a given text and classify them into predefined event types, and it plays an important role in many application fields such as information retrieval, knowledge graph construction and sentiment analysis. Since the early 1960s, researchers first relied mainly on cumbersome hand-written rules and pattern-matching methods. With the advent of statistical natural language processing, researchers began to handle event detection with traditional machine learning algorithms such as maximum entropy, support vector machines and hidden Markov models. These methods typically require extensive manual labeling and rule definition, and therefore demand considerable time and effort when adapting to new fields or languages.
In recent years, deep neural network models have been widely used in event detection tasks owing to their strong capability of extracting feature representations, improving event detection performance while generalizing well. A deep neural network takes text vector representations as input and, by imitating the process of information transmission between human neurons, obtains richer information representations. The event detection task is generally treated as a classification problem: the text vectors are fed into a neural network model, and after multiple rounds of network encoding the representation of each word is classified to decide whether the word is a trigger word and to identify the event type.
Early deep learning work mainly studied sentence-level event detection, either ignoring document information or modeling document-level semantics and event interdependence separately, and most existing event detection methods regard sentences as mutually independent. However, an individual sentence is only one part of a document and sometimes expresses only part of the information about an event; most research therefore neglects the interdependence and semantic information of document-level events, and effectively mining the intrinsic relations between different semantic units is of great significance.
The trigger word is the core of event detection and most directly expresses the occurrence of an event, so its accurate identification and classification directly affect event detection performance. In the event detection field, trigger words exist in many forms (one event may be expressed by several words, and one word may carry several meanings), and the corresponding event types belong to different contexts. Determining the meaning of a trigger word from its context therefore directly affects the performance of the event detection model. At the same time, event detection often faces data sparsity: some events occur very infrequently, which also produces an imbalanced distribution of event types in the data set. Traditional methods have difficulty dealing with this, since they often use linear models or statistics-based methods and may fail to capture complex data relationships and patterns; their performance on highly non-linear, sparse data is limited. In addition, we observe that sentences containing events are typically very short, so the information obtainable from a sentence itself may be limited, which also weakens the encoder's learning of the sentence.
Disclosure of Invention
In order to solve the above problems, the application provides a cross-sentence multi-layer bidirectional network event detection method based on external knowledge, which focuses on sentence-level event detection and formulates event detection as a sequence labeling task, thus providing more information and flexibility than an ordinary word classification task.
The technical scheme adopted by the application is as follows: a method for detecting cross-sentence multi-layer bidirectional network event based on external knowledge comprises the following steps:
step S1, using a data set disclosed in the event detection field, wherein the data set comprises a document and an event type, and is divided into a training set, a verification set and a test set, and the training set, the verification set and the test set comprise sentences and trigger words;
step S2, formalized definition of event detection tasks;
given a document containing N sentences, $D = \{S_1, S_2, \ldots, S_N\}$, where $S_1$ denotes the first sentence, $S_2$ the second sentence, and $S_N$ the N-th sentence; each sentence $S = \{w_1, w_2, \ldots, w_Z\}$, where $w_1$ denotes the 1st word in sentence $S$, $w_2$ the 2nd word, and $w_Z$ the Z-th word;
the predicted event type tag vector is $Y = \{y_1, y_2, \ldots, y_Z\}$, where $y_1, y_2, \ldots, y_Z$ denote the predicted event type tags corresponding to the 1st word $w_1$, the 2nd word $w_2$, ..., and the Z-th word $w_Z$, respectively;
step S3, the semantic encoder serves as the entrance of the multi-layer bidirectional network model, which is divided into the semantic encoder and a multi-layer bidirectional network structure; the input of the semantic encoder is

$\tilde{S} = \{[\mathrm{CLS}], w_1, \ldots, w_Z, [\mathrm{SEP}], t_1, \ldots, t_8, [\mathrm{SEP}]\}$

where $\tilde{S}$ denotes the input sequence of sentence $S$ after data preprocessing; [CLS] and [SEP] denote the two special markers of the English pre-training model BERT; $t_1$ denotes the name in the 1st tag of the data set and $t_m$ the name in the m-th tag, $1 \le m \le 8$; [CLS] marks the beginning of the preprocessed input sequence of sentence $S$, and [SEP] delimits its different parts; a minimal sketch of this input construction follows the step list below;
the names in the 8 tags of the data set serve as external knowledge, strengthening the attention mechanism in the English pre-training model BERT, and training proceeds in a supervised manner; the preprocessed input sequence $\tilde{S}$ of sentence $S$ is fed into the semantic encoder to obtain the feature vectors of sentence $S$;
step S4, the obtained feature vectors are input into the multi-layer bidirectional network structure to obtain predicted event type tag vectors; the multi-layer bidirectional network structure consists of several bidirectional decoders and information aggregation layers, each layer consisting of one bidirectional decoder and one information aggregation layer, and the output of each layer's bidirectional decoder is input into its information aggregation layer;
step S5, the predicted event type tag vectors output by the bidirectional decoder of the previous layer are passed through a long short-term memory network in the information aggregation layer, whose last state $g_Z$ summarizes the information and transmits it to the next layer of the bidirectional network structure; the output of the bidirectional decoder in the last layer of the bidirectional network structure is the final output;
step S6, the finally output predicted event type tag vectors pass through a linear transformation and a softmax function in the tag prediction layer to obtain the predicted probability distribution; the loss $\mathcal{L}$ is calculated by a weighted cross-entropy loss function, the parameters of the semantic encoder and of the multi-layer bidirectional network structure are optimized and updated, and the event type classification is finally obtained.
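As an illustration of the step S3 input construction above, the following minimal sketch builds the [CLS] sentence [SEP] tag-names [SEP] sequence with a standard BERT tokenizer. It assumes the HuggingFace transformers library, and the eight main ACE2005 event type names stand in for the tag names $t_1, \ldots, t_8$; the application's exact preprocessing may differ.

```python
# Minimal sketch: append the 8 tag names as external knowledge behind the
# sentence, producing [CLS] w_1 ... w_Z [SEP] t_1 ... t_8 [SEP].
# Assumes the HuggingFace `transformers` tokenizer; the eight main ACE2005
# event type names below stand in for the tag names described above.
from transformers import BertTokenizer

ACE_MAIN_TYPES = ["Life", "Movement", "Transaction", "Business",
                  "Conflict", "Contact", "Personnel", "Justice"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "A man died in Baghdad yesterday"
encoding = tokenizer(sentence, " ".join(ACE_MAIN_TYPES))  # sentence-pair input
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# ['[CLS]', 'a', 'man', 'died', ..., '[SEP]', 'life', 'movement', ..., '[SEP]']
```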
Further, in step S1 the data set is the ACE2005 data set, which consists of various types of data (entity, relation and event annotations) released by the Linguistic Data Consortium and includes English, Arabic and Chinese training data; the English data set is selected. The ACE2005 data set contains 599 documents and 33 event types and is divided into 529 training documents, 30 validation documents and 40 test documents; the training set contains 12,426 sentences and 4,214 trigger words, the validation set 777 sentences and 483 trigger words, and the test set 667 sentences and 422 trigger words. Every 8 sentences in the documents of the ACE2005 data set are cut into a new document, and the sentences remaining after cutting are padded by repeatedly filling the last sentence, thereby satisfying the 8-sentence condition of a new document.
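A minimal sketch of this document-cutting step follows: every 8 sentences form a new document, and a short remainder is padded by repeating its last sentence, as described above (the helper name is illustrative).

```python
# Cut a document's sentences into 8-sentence chunks; pad a short final chunk
# by repeating its last sentence so every new document has exactly 8 sentences.
def cut_into_chunks(sentences, size=8):
    chunks = [sentences[i:i + size] for i in range(0, len(sentences), size)]
    if chunks and len(chunks[-1]) < size:
        last = chunks[-1]
        last.extend([last[-1]] * (size - len(last)))
    return chunks

doc = [f"sentence {k}" for k in range(19)]  # 19 sentences -> 3 new documents
for chunk in cut_into_chunks(doc):
    print(len(chunk), chunk[-1])
```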
Further, in step S2 the predicted event type tag vector $Y = \{y_1, y_2, \ldots, y_Z\}$ is expanded in the form of a sequence labeling task, defining 67 tags in total: a B- tag and an I- tag for each of the 33 event types, plus 1 non-trigger tag NONE.
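The tag inventory can be sketched as follows; the three subtype names shown are examples only (the full ACE2005 list has 33 subtypes), and the count works out to 2 x 33 + 1 = 67.

```python
# Build the 67-tag BIO-style inventory: B-/I- tags for each of the 33 event
# types plus the single non-trigger tag NONE. Only 3 example subtypes listed.
EVENT_SUBTYPES = ["Attack", "Die", "Meet"]  # ... 33 ACE2005 subtypes in total

def build_tag_set(subtypes):
    tags = ["NONE"]
    for t in subtypes:
        tags += [f"B-{t}", f"I-{t}"]
    return {tag: idx for idx, tag in enumerate(tags)}

print(build_tag_set(EVENT_SUBTYPES))  # with all 33 subtypes: 67 tags
```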
Further, the semantic encoder in step S3 consists of the English pre-training model BERT, a bidirectional long short-term memory network and an attention mechanism, as follows:
the English pre-training model BERT is a deep neural network model based on the Transformer architecture, composed of the bidirectional encoders of a multi-layer Transformer;
the input sequence $\tilde{S}$ obtained by preprocessing sentence $S$ is fed into the English pre-training model BERT to obtain the feature vectors of sentence $S$, as shown in formula (1):

$X^S = \mathrm{BERT}(\tilde{S})_{[w_1:w_Z]} = \{x_1, x_2, \ldots, x_Z\}$ (1);

where $x_1, x_2, \ldots, x_Z$ denote the 1st, 2nd, ..., Z-th feature vectors obtained through the English pre-training model BERT for the 1st word $w_1$, the 2nd word $w_2$, ..., and the Z-th word $w_Z$; in formula (1), the part from $w_1$ to $w_Z$ is intercepted to obtain the feature vectors $X^S$ of sentence $S$;
Bidirectional long short-term memory network: the feature vectors obtained from the English pre-training model BERT are input into a bidirectional long short-term memory network to obtain output vectors, as shown in formula (2):

$h_i = [\overrightarrow{\mathrm{LSTM}}(x_i); \overleftarrow{\mathrm{LSTM}}(x_i)] = [\overrightarrow{h_i}; \overleftarrow{h_i}]$ (2);

where $h_i$ denotes the output vector of the bidirectional long short-term memory network for the i-th feature vector $x_i$; $x_i$ denotes the i-th feature vector obtained through BERT for the i-th word $w_i$, $1 \le i \le Z$; $\overrightarrow{h_i}$ denotes the output vector of the forward long short-term memory network, $\overleftarrow{h_i}$ the output vector of the backward long short-term memory network, and $[\cdot;\cdot]$ denotes the concatenation operation;
Attention mechanism: the output vectors of the bidirectional long short-term memory network are input to obtain semantic representations, as shown in formulas (3), (4) and (5):

$e_{ij} = h_i^{\top}(W_a h_j + b_a)$ (3);

$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{m=1}^{Z}\exp(e_{im})}$ (4);

$z_i = \sum_{j=1}^{Z} \alpha_{ij}\, h_j$ (5);

where $e_{ij}$ denotes the attention score between the i-th word $w_i$ and the j-th word $w_j$ in sentence $S$, $1 \le j \le Z$; $h_i^{\top}$ denotes the transpose of the bidirectional long short-term memory network output vector $h_i$ of the i-th feature vector $x_i$; $W_a$ and $b_a$ denote the trainable weight matrix and bias vector of the attention mechanism; $h_j$ denotes the bidirectional long short-term memory network output vector of the j-th feature vector $x_j$ obtained through BERT; $\alpha_{ij}$ denotes the attention weight between the i-th word $w_i$ and the j-th word $w_j$; $e_{im}$ denotes the attention score between the i-th word $w_i$ and the m-th word $w_m$ in sentence $S$; and $z_i$ denotes the final semantic representation of the i-th word $w_i$;

through the attention mechanism, the final semantic representation vector of sentence $S$ is obtained:

$Z^S = \{z_1, z_2, \ldots, z_Z\}$

where $z_1$ denotes the final semantic representation of the 1st word $w_1$, $z_2$ that of the 2nd word $w_2$, and $z_Z$ that of the Z-th word $w_Z$.
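A minimal PyTorch sketch of this encoder stage (the BiLSTM of formula (2) followed by the bilinear self-attention of formulas (3) to (5)) is given below; the dimensions are illustrative assumptions, not the application's exact hyperparameters, and a random tensor stands in for the BERT features.

```python
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """BiLSTM over BERT features, then self-attention (formulas (2)-(5))."""
    def __init__(self, bert_dim=768, hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(bert_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 2 * hidden)  # W_a h_j + b_a

    def forward(self, x):                     # x: (batch, Z, bert_dim)
        h, _ = self.bilstm(x)                 # formula (2): h_i
        e = h @ self.attn(h).transpose(1, 2)  # formula (3): e_ij = h_i^T (W_a h_j + b_a)
        alpha = torch.softmax(e, dim=-1)      # formula (4): attention weights
        return alpha @ h                      # formula (5): z_i = sum_j alpha_ij h_j

encoder = SemanticEncoder()
z = encoder(torch.randn(1, 10, 768))          # one sentence of Z = 10 words
print(z.shape)                                # torch.Size([1, 10, 512])
```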
Further, the bidirectional decoder in step S4 consists of a forward long short-term memory network and a backward long short-term memory network, as follows:
the bidirectional decoder generates the event type tag vector corresponding to each word; the final semantic representation vector $Z^S$ of sentence $S$ is passed through the bidirectional decoder to obtain the predicted event type tag vectors, as shown in formulas (6), (7), (8) and (9):

$\overrightarrow{s_i} = \overrightarrow{\mathrm{LSTM}}([z_i; \overrightarrow{y}_{i-1}], \overrightarrow{s}_{i-1})$ (6);

$\overrightarrow{y_i} = \tanh(W\,\overrightarrow{s_i} + b)$ (7);

$\overleftarrow{s_i} = \overleftarrow{\mathrm{LSTM}}([z_i; \overleftarrow{y}_{i+1}], \overleftarrow{s}_{i+1})$ (8);

$\overleftarrow{y_i} = \tanh(W\,\overleftarrow{s_i} + b)$ (9);

where $\overrightarrow{s_i}$ and $\overleftarrow{s_i}$ denote the states of the forward and backward long short-term memory networks; $\overrightarrow{y_i}$ denotes the forward tag vector of the i-th word $w_i$ in sentence $S$ and $\overleftarrow{y_i}$ its backward tag vector; $\overrightarrow{\mathrm{LSTM}}$ and $\overleftarrow{\mathrm{LSTM}}$ denote the forward and backward long short-term memory networks; $\overrightarrow{y}_{i-1}$ denotes the forward tag vector of the (i-1)-th word $w_{i-1}$; $\overrightarrow{s}_{i-1}$ denotes the state of the forward network at the previous step; $\tanh$ denotes the transfer function used to derive the event tag vectors; $W$ and $b$ denote the trainable weight matrix and bias vector of the bidirectional decoder; $\overleftarrow{y}_{i+1}$ denotes the backward tag vector of the (i+1)-th word $w_{i+1}$; and $\overleftarrow{s}_{i+1}$ denotes the state of the backward network at the following step; the finally predicted event type tag vector is represented as $y_i = [\overrightarrow{y_i}; \overleftarrow{y_i}]$.
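The sketch below mirrors formulas (6) to (9) with two LSTM cells, feeding each word's semantic vector concatenated with the previous (or next) tag vector; the shared output layer and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiDecoder(nn.Module):
    """Bidirectional tag decoder of formulas (6)-(9); one sentence at a time."""
    def __init__(self, sem_dim=512, tag_dim=67, state_dim=256):
        super().__init__()
        self.fwd = nn.LSTMCell(sem_dim + tag_dim, state_dim)
        self.bwd = nn.LSTMCell(sem_dim + tag_dim, state_dim)
        self.out = nn.Linear(state_dim, tag_dim)  # W s_i + b of formulas (7)/(9)

    def run(self, cell, z, order):
        y = torch.zeros(1, self.out.out_features)   # initial tag vector
        s = c = torch.zeros(1, cell.hidden_size)    # initial LSTM state
        ys = {}
        for i in order:                             # formulas (6)/(8)
            s, c = cell(torch.cat([z[i:i + 1], y], dim=-1), (s, c))
            y = torch.tanh(self.out(s))             # formulas (7)/(9)
            ys[i] = y
        return torch.cat([ys[i] for i in range(len(z))])

    def forward(self, z):                           # z: (Z, sem_dim)
        y_fwd = self.run(self.fwd, z, range(len(z)))
        y_bwd = self.run(self.bwd, z, reversed(range(len(z))))
        return torch.cat([y_fwd, y_bwd], dim=-1)    # y_i = [fwd_i ; bwd_i]

decoder = BiDecoder()
print(decoder(torch.randn(10, 512)).shape)          # torch.Size([10, 134])
```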
Further, the information aggregation layer in step S5 consists of a single long short-term memory network layer, as follows:
the last state $g_Z$ of the long short-term memory network is selected as the summary information, calculated as in formula (10):

$g_i = \mathrm{LSTM}(y_i, g_{i-1})$ (10);

where $\mathrm{LSTM}$ denotes the long short-term memory network; $g_i$ and $g_{i-1}$ denote its i-th and (i-1)-th states; and $y_i$ denotes the event tag vector of the i-th word $w_i$ in sentence $S$;
the summary information of sentence $S$ is $\mathrm{Info}(S) = g_Z$, where $g_Z$ denotes the last state of the long short-term memory network.
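A one-layer sketch of this aggregation, under the same illustrative dimensions as above, is shown below: an LSTM runs over a sentence's tag vectors and its last hidden state is kept as the summary Info(S).

```python
import torch
import torch.nn as nn

aggregator = nn.LSTM(input_size=134, hidden_size=128, batch_first=True)

def sentence_info(tag_vectors):          # tag_vectors: (1, Z, tag_dim)
    _, (g_last, _) = aggregator(tag_vectors)
    return g_last.squeeze(0)             # the last state g_Z, i.e. Info(S)

info = sentence_info(torch.randn(1, 10, 134))
print(info.shape)                        # torch.Size([1, 128])
```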
Further, the multi-layer bidirectional network structure is formed by combining bidirectional decoders and information aggregation layers: the information aggregation layer of one layer aggregates the information of neighboring sentences into the bidirectional decoder of the next layer and propagates information between sentences; since information relevant to sentence $S$ is stored in several adjacent sentences, several bidirectional long short-term memory network decoders are stacked, and the semantic information of neighboring sentences is aggregated iteratively;
for the t-th bidirectional decoding layer, the output is calculated as in formulas (11), (12), (13) and (14):

$\overrightarrow{s_i^{\,t}} = \overrightarrow{\mathrm{LSTM}}([z_i; \overrightarrow{y}_{i-1}^{\,t}; \mathrm{Info}^{t-1}(S_{prev})], \overrightarrow{s}_{i-1}^{\,t})$ (11);

$\overrightarrow{y_i^{\,t}} = \tanh(W\,\overrightarrow{s_i^{\,t}} + b)$ (12);

$\overleftarrow{s_i^{\,t}} = \overleftarrow{\mathrm{LSTM}}([z_i; \overleftarrow{y}_{i+1}^{\,t}; \mathrm{Info}^{t-1}(S_{next})], \overleftarrow{s}_{i+1}^{\,t})$ (13);

$\overleftarrow{y_i^{\,t}} = \tanh(W\,\overleftarrow{s_i^{\,t}} + b)$ (14);

where $\mathrm{Info}^{t-1}(S_{prev})$ denotes the information, at the previous layer (the (t-1)-th layer), of the sentence immediately preceding sentence $S$, and $\mathrm{Info}^{t-1}(S_{next})$ the information, at the (t-1)-th layer, of the sentence immediately following it;
the predicted event tag vector of the i-th word $w_i$ in sentence $S$ at layer $t$ is denoted $y_i^{t} = [\overrightarrow{y_i^{\,t}}; \overleftarrow{y_i^{\,t}}]$;
as the number of layers of the bidirectional network structure increases, the bidirectional decoder captures information from more distant sentences: at the t-th layer, sentence $S$ captures the information of neighboring sentences within distance t-1; the bidirectional decoder and the information aggregation layer encode and propagate the same kind of information, so the parameters of the different layers are set to be the same.
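The propagation logic of this iterative stacking can be sketched as follows; `decode` and `summarize` stand for the bidirectional decoder and information aggregation layer sketched above (shared across layers), and the stand-in functions below only exercise the loop, so all names and dimensions are illustrative.

```python
import torch

def multilayer_decode(sentences_z, decode, summarize, info_dim, layers=3):
    """sentences_z: one (Z_k, sem_dim) tensor per sentence of the document."""
    n = len(sentences_z)
    zero = torch.zeros(info_dim)
    info = [zero] * n                              # layer-0 neighbor summaries
    per_layer_tags = []
    for t in range(layers):
        tags = []
        for k, z in enumerate(sentences_z):        # formulas (11)-(14)
            left = info[k - 1] if k > 0 else zero
            right = info[k + 1] if k < n - 1 else zero
            tags.append(decode(z, left, right))
        info = [summarize(y) for y in tags]        # formula (10), per layer
        per_layer_tags.append(tags)                # kept for the final weighted sum
    return per_layer_tags

# Stand-ins with the right shapes, just to run the propagation loop:
decode = lambda z, left, right: torch.randn(z.size(0), 134)
summarize = lambda y: torch.randn(16)
out = multilayer_decode([torch.randn(10, 512)] * 8, decode, summarize, info_dim=16)
print(len(out), len(out[0]), out[0][0].shape)      # 3 layers, 8 sentences, (10, 134)
```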
Further, the tag prediction layer in step S6 is as follows:
the final event tag vector of the i-th word $w_i$ in sentence $S$ is defined as $y_i = \sum_{t=1}^{T} \lambda^{T-t}\, y_i^{t}$, where $T$ denotes the maximum number of layers, $1 \le t \le T$; $t$ denotes the layer number; $y_i^{t}$ denotes the event tag vector of the i-th word $w_i$ at layer $t$; and $\lambda$, $0 < \lambda \le 1$, denotes the weight decay parameter;
the final event tag vector $y_i$ of the i-th word $w_i$ in sentence $S$ is passed through a linear transformation and then a softmax function to obtain the tag probability distribution, as shown in formulas (15) and (16):

$o_i = W_p\, y_i + b_p$ (15);

$p_{ik} = \dfrac{\exp(o_{ik})}{\sum_{q=1}^{Q}\exp(o_{iq})}$ (16);

where $o_i$ denotes the output of the final event tag vector $y_i$ of the i-th word $w_i$ after the linear transformation; $W_p$ and $b_p$ denote the trainable weight matrix and bias vector of the tag prediction layer; $p_{ik}$ denotes the probability that the i-th word $w_i$ in sentence $S$ belongs to event type $k$; $o_{ik}$ denotes the score of the i-th word $w_i$ for event type $k$ and $o_{iq}$ its score for event type $q$; $D$ is the training document set and $Q$ the total number of event types;
the weighted cross-entropy loss function is introduced to calculate the loss $\mathcal{L}$, making the multi-layer bidirectional network model pay more attention to event types with few training samples; the loss function $\mathcal{L}$ is calculated as in formula (17):

$\mathcal{L} = -\sum_{d \in D} \sum_{S \in d} \sum_{i=1}^{Z} \beta_{\hat{y}_i}\, \log p_{i,\hat{y}_i}$ (17);

where $d$ denotes a new document composed of every 8 sentences; $\beta_k$ denotes the weight of the k-th event type; $\hat{y}_i$ denotes the true tag of the i-th word $w_i$ in sentence $S$; and $p_{ik}$ denotes the probability that the i-th word $w_i$ belongs to event type $k$.
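A sketch of this weighted loss is below. PyTorch's CrossEntropyLoss applies exactly the per-class weighting of formula (17) to the linear scores; the inverse-frequency weighting used to build the weights here is one plausible choice, not necessarily the application's.

```python
import torch
import torch.nn as nn

Q = 67                                        # total number of tags
counts = torch.randint(1, 1000, (Q,)).float() # stand-in per-tag frequencies
beta = counts.sum() / (Q * counts)            # rarer tags get larger weights
loss_fn = nn.CrossEntropyLoss(weight=beta)    # weighted cross entropy

scores = torch.randn(12, Q)                   # o_i for the 12 words of a sentence
gold = torch.randint(0, Q, (12,))             # true tags
print(loss_fn(scores, gold))
```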
The beneficial effects of the application are as follows:
(1) The application uses the names of the predefined types in the ACE2005 data set as external knowledge, strengthening the ability of the cross-sentence multi-layer bidirectional network event detection method to reason about and understand context; this additional information effectively helps the method classify event types with few samples.
(2) The application uses the powerful English pre-training model BERT as part of the semantic encoder, greatly improving the language understanding and representation capability of the cross-sentence multi-layer bidirectional network event detection method and better capturing the semantic relations in text data; the multi-layer bidirectional network structure brings neighbor-sentence information to the input sentence, realizing interaction between document-level semantics and events and effectively remedying the current lack of document-level information.
(3) Aiming at the severe imbalance of the event type distribution in the ACE2005 data set, the application introduces a weighted cross-entropy loss function, so that the cross-sentence multi-layer bidirectional network event detection method pays more attention to types with few samples, and different weights can be assigned to different categories to balance their differences in sample number. This helps improve the method's performance on imbalanced data sets and ensures that every category is learned properly, without the method being overly influenced by sample counts.
Drawings
FIG. 1 is a diagram of an overall model framework of the present application.
Detailed Description
As shown in FIG. 1, first, the names of the predefined types in the ACE data set are appended, as external knowledge, after each input sentence, to address data sparsity and the imbalanced distribution of event types in the data set and to effectively remedy the limited information of short sentences. The English pre-training model BERT is then included in the semantic encoder to help the external-knowledge-based cross-sentence multi-layer bidirectional network event detection model better understand the context, semantic information and relations of the text; with the external knowledge introduced, the semantic understanding capability of the model is extended, improving its handling of unevenly distributed event types and of specific event types that occur rarely. Next, a multi-layer bidirectional long short-term memory network is introduced to capture the interaction between events and semantic information simultaneously, so that a sentence can obtain the semantic information of its preceding and following neighbor sentences, gaining better semantic supplementation and resolving the problem of missing document-level information; the information aggregation layer summarizes single-sentence information, which serves as the neighbor-sentence semantic information captured by the multi-layer bidirectional network. Finally, the tag vector of each word in the sentence is obtained, passed through a simple linear transformation, and fed into a softmax function to obtain the final predicted tag probability.
The application works and is implemented in this way: the cross-sentence multi-layer bidirectional network event detection method based on external knowledge proceeds according to steps S1 to S6, with the further refinements of each step, exactly as set out above in the Disclosure of Invention.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (8)

1. A method for detecting a cross-sentence multi-layer bidirectional network event based on external knowledge is characterized by comprising the following steps: the method comprises the following steps:
step S1, using a data set disclosed in the event detection field, wherein the data set comprises a document and an event type, and is divided into a training set, a verification set and a test set, and the training set, the verification set and the test set comprise sentences and trigger words;
step S2, formalized definition of event detection tasks;
given a document containing N sentences, $D = \{S_1, S_2, \ldots, S_N\}$, where $S_1$ denotes the first sentence, $S_2$ the second sentence, and $S_N$ the N-th sentence; each sentence $S = \{w_1, w_2, \ldots, w_Z\}$, where $w_1$ denotes the 1st word in sentence $S$, $w_2$ the 2nd word, and $w_Z$ the Z-th word;
the predicted event type tag vector is $Y = \{y_1, y_2, \ldots, y_Z\}$, where $y_1, y_2, \ldots, y_Z$ denote the predicted event type tags corresponding to the 1st word $w_1$, the 2nd word $w_2$, ..., and the Z-th word $w_Z$, respectively;
step S3, the semantic encoder serves as the entrance of the multi-layer bidirectional network model, which is divided into the semantic encoder and a multi-layer bidirectional network structure; the input of the semantic encoder is

$\tilde{S} = \{[\mathrm{CLS}], w_1, \ldots, w_Z, [\mathrm{SEP}], t_1, \ldots, t_8, [\mathrm{SEP}]\}$

where $\tilde{S}$ denotes the input sequence of sentence $S$ after data preprocessing; [CLS] and [SEP] denote the two special markers of the English pre-training model BERT; $t_1$ denotes the name in the 1st tag of the data set and $t_m$ the name in the m-th tag, $1 \le m \le 8$; [CLS] marks the beginning of the preprocessed input sequence of sentence $S$, and [SEP] delimits its different parts;
the preprocessed input sequence $\tilde{S}$ of sentence $S$ is fed into the semantic encoder to obtain the feature vectors of sentence $S$;
step S4, the obtained feature vectors are input into the multi-layer bidirectional network structure to obtain predicted event type tag vectors; the multi-layer bidirectional network structure consists of several bidirectional decoders and information aggregation layers, each layer consisting of one bidirectional decoder and one information aggregation layer, and the output of each layer's bidirectional decoder is input into its information aggregation layer;
step S5, the predicted event type tag vectors output by the bidirectional decoder of the previous layer are passed through a long short-term memory network in the information aggregation layer, whose last state $g_Z$ summarizes the information and transmits it to the next layer of the bidirectional network structure; the output of the bidirectional decoder in the last layer of the bidirectional network structure is the final output;
step S6, the finally output predicted event type tag vectors pass through a linear transformation and a softmax function in the tag prediction layer to obtain the predicted probability distribution; the loss $\mathcal{L}$ is calculated by a weighted cross-entropy loss function, the parameters of the semantic encoder and of the multi-layer bidirectional network structure are optimized and updated, and the event type classification is finally obtained.
2. The method for detecting cross-sentence multi-layer bidirectional network events based on external knowledge according to claim 1, characterized in that: in step S1, the data set is the ACE2005 data set, from which the English data set is selected; every 8 sentences in the documents of the ACE2005 data set are cut into a new document, and for the sentences remaining after cutting, the last sentence is repeatedly filled, thereby satisfying the 8-sentence condition of a new document.
3. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 2, wherein the method comprises the following steps:
in step S2, the predicted event type tag vector $Y = \{y_1, y_2, \ldots, y_Z\}$ is expanded in the form of a sequence labeling task, defining 67 tags in total: a B- tag and an I- tag for each of the 33 event types, plus 1 non-trigger tag NONE.
4. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 3, wherein the method comprises the following steps:
the semantic encoder in the step S3 consists of an English pre-training model BERT, a two-way long-short-term memory network and an attention mechanism, and comprises the following specific contents:
the English pre-training model BERT is a deep neural network model based on the Transformer architecture, composed of the bidirectional encoders of a multi-layer Transformer;
the input sequence $\tilde{S}$ obtained by preprocessing sentence $S$ is fed into the English pre-training model BERT to obtain the feature vectors of sentence $S$, as shown in formula (1):

$X^S = \mathrm{BERT}(\tilde{S})_{[w_1:w_Z]} = \{x_1, x_2, \ldots, x_Z\}$ (1);

where $x_1, x_2, \ldots, x_Z$ denote the 1st, 2nd, ..., Z-th feature vectors obtained through the English pre-training model BERT for the 1st word $w_1$, the 2nd word $w_2$, ..., and the Z-th word $w_Z$;
bidirectional long short-term memory network: the feature vectors obtained from the English pre-training model BERT are input into a bidirectional long short-term memory network to obtain output vectors, as shown in formula (2):

$h_i = [\overrightarrow{\mathrm{LSTM}}(x_i); \overleftarrow{\mathrm{LSTM}}(x_i)] = [\overrightarrow{h_i}; \overleftarrow{h_i}]$ (2);

where $h_i$ denotes the output vector of the bidirectional long short-term memory network for the i-th feature vector $x_i$; $x_i$ denotes the i-th feature vector obtained through BERT for the i-th word $w_i$, $1 \le i \le Z$; $\overrightarrow{h_i}$ denotes the output vector of the forward long short-term memory network, $\overleftarrow{h_i}$ the output vector of the backward long short-term memory network, and $[\cdot;\cdot]$ denotes the concatenation operation;
attention mechanism: the output vectors of the bidirectional long short-term memory network are input to obtain semantic representations, as shown in formulas (3), (4) and (5):

$e_{ij} = h_i^{\top}(W_a h_j + b_a)$ (3);

$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{m=1}^{Z}\exp(e_{im})}$ (4);

$z_i = \sum_{j=1}^{Z} \alpha_{ij}\, h_j$ (5);

where $e_{ij}$ denotes the attention score between the i-th word $w_i$ and the j-th word $w_j$ in sentence $S$, $1 \le j \le Z$; $h_i^{\top}$ denotes the transpose of the bidirectional long short-term memory network output vector $h_i$ of the i-th feature vector $x_i$; $W_a$ and $b_a$ denote the trainable weight matrix and bias vector of the attention mechanism; $h_j$ denotes the bidirectional long short-term memory network output vector of the j-th feature vector $x_j$ obtained through BERT; $\alpha_{ij}$ denotes the attention weight between the i-th word $w_i$ and the j-th word $w_j$; $e_{im}$ denotes the attention score between the i-th word $w_i$ and the m-th word $w_m$ in sentence $S$; and $z_i$ denotes the final semantic representation of the i-th word $w_i$;
through the attention mechanism, the final semantic representation vector of sentence $S$ is obtained: $Z^S = \{z_1, z_2, \ldots, z_Z\}$, where $z_1$ denotes the final semantic representation of the 1st word $w_1$, $z_2$ that of the 2nd word $w_2$, and $z_Z$ that of the Z-th word $w_Z$.
5. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 4, wherein the method comprises the following steps:
the bidirectional decoder in step S4 consists of a forward long short-term memory network and a backward long short-term memory network, as follows:
the bidirectional decoder generates the event type tag vector corresponding to each word; the final semantic representation vector $Z^S$ of sentence $S$ is passed through the bidirectional decoder to obtain the predicted event type tag vectors, as shown in formulas (6), (7), (8) and (9):

$\overrightarrow{s_i} = \overrightarrow{\mathrm{LSTM}}([z_i; \overrightarrow{y}_{i-1}], \overrightarrow{s}_{i-1})$ (6);

$\overrightarrow{y_i} = \tanh(W\,\overrightarrow{s_i} + b)$ (7);

$\overleftarrow{s_i} = \overleftarrow{\mathrm{LSTM}}([z_i; \overleftarrow{y}_{i+1}], \overleftarrow{s}_{i+1})$ (8);

$\overleftarrow{y_i} = \tanh(W\,\overleftarrow{s_i} + b)$ (9);

where $\overrightarrow{s_i}$ and $\overleftarrow{s_i}$ denote the states of the forward and backward long short-term memory networks; $\overrightarrow{y_i}$ denotes the forward tag vector of the i-th word $w_i$ in sentence $S$ and $\overleftarrow{y_i}$ its backward tag vector; $\overrightarrow{\mathrm{LSTM}}$ and $\overleftarrow{\mathrm{LSTM}}$ denote the forward and backward long short-term memory networks; $\overrightarrow{y}_{i-1}$ denotes the forward tag vector of the (i-1)-th word $w_{i-1}$; $\overrightarrow{s}_{i-1}$ denotes the state of the forward network at the previous step; $\tanh$ denotes the transfer function used to derive the event tag vectors; $W$ and $b$ denote the trainable weight matrix and bias vector of the bidirectional decoder; $\overleftarrow{y}_{i+1}$ denotes the backward tag vector of the (i+1)-th word $w_{i+1}$; and $\overleftarrow{s}_{i+1}$ denotes the state of the backward network at the following step; the finally predicted event type tag vector is represented as $y_i = [\overrightarrow{y_i}; \overleftarrow{y_i}]$.
6. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 5, wherein the method comprises the following steps:
the information aggregation layer in step S5 consists of a single long short-term memory network layer, as follows:
the last state $g_Z$ of the long short-term memory network is selected as the summary information, calculated as in formula (10):

$g_i = \mathrm{LSTM}(y_i, g_{i-1})$ (10);

where $\mathrm{LSTM}$ denotes the long short-term memory network; $g_i$ and $g_{i-1}$ denote its i-th and (i-1)-th states; and $y_i$ denotes the event tag vector of the i-th word $w_i$ in sentence $S$;
the summary information of sentence $S$ is $\mathrm{Info}(S) = g_Z$, where $g_Z$ denotes the last state of the long short-term memory network.
7. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 6, wherein the method comprises the following steps:
the multi-layer bidirectional network structure is formed by combining a bidirectional decoder and an information aggregation layer, the information aggregation layer in the bidirectional network structure aggregates the information of neighbor sentences into the bidirectional decoder of the next layer bidirectional network structure, and propagates the information among sentences, the related information of the sentences S is stored in a plurality of adjacent sentences, a plurality of bidirectional long-short-term memory network decoders are introduced to stack, and the semantic information of the neighbor sentences is aggregated in an iterative mode;
for the t-layer bidirectional decoding layer, outputting calculation such as formula (11), formula (12), formula (13) and formula (14);
(11);
(12);
(13);
(14);
wherein,representing the upper layer, i.e. t-1 layer, immediately preceding sentence +.>Information of->Representing the upper layer, i.e. t-1 layer, followed by a sentence, i.e +.>Information of (2);
ith word in sentence SThe corresponding predicted event tag vector is denoted +.>
As the number of layers of the bi-directional network structure increases, the bi-directional decoder will capture information of more distant sentences, at layer t the bi-directional decoder,sentence S captures +.>The two-way decoder and the information aggregation layer encode and propagate the same information to set the parameters of different layers the same.
8. The method for detecting the cross-sentence multilayer bidirectional network event based on the external knowledge according to claim 7, wherein the method comprises the following steps:
the label prediction layer in step S6 specifically comprises the following steps:
ith word in sentence SIs defined as +.>,/>Maximum number of layers->;/>Maximum number of layers->The method comprises the steps of carrying out a first treatment on the surface of the t represents the number of layers,/>representing the i-th word +.>Event tag vector at layer t, +.>,/>Representing weight decay parameters;
will be the ith word in sentence SFinal event tag vector +.>The label probability distribution is obtained by inputting through linear change and then through a softmax function, and the calculation process is shown as a formula (15) and a formula (16);
(15);
(16);
wherein,representing the i-th word +.>Final event tag vector +.>Output after linear transformation, ++>Andrespectively represent a trainable weight matrix and a bias vector in a label prediction layer, < + >>Representing the ith word in sentence SProbability of event type k, +.>Representing the i-th word +.>Score belonging to event type k, < >>Representing the i-th word +.>The score belonging to the event type Q, D is a training document set, and Q is the total number of event types;
introducing weighted cross entropy loss function to calculate lossLet the multi-layer bi-directional network model pay more attention to the event type with less training set samples, loss function +.>The calculation process formula (17) of (2) is shown;
(17);
where d represents a new document composed of every 8 sentences,weights representing the kth event type, +.>Representing the i-th word +.>Is (are) true tags->Representing the i-th word +.>Is the probability of event type k.
CN202311529807.XA 2023-11-16 2023-11-16 Cross-sentence multi-layer bidirectional network event detection method based on external knowledge Pending CN117236436A (en)

Priority Applications (1)

Application Number: CN202311529807.XA · Priority Date: 2023-11-16 · Filing Date: 2023-11-16 · Title: Cross-sentence multi-layer bidirectional network event detection method based on external knowledge

Applications Claiming Priority (1)

Application Number: CN202311529807.XA · Priority Date: 2023-11-16 · Filing Date: 2023-11-16 · Title: Cross-sentence multi-layer bidirectional network event detection method based on external knowledge

Publications (1)

Publication Number: CN117236436A · Publication Date: 2023-12-15

Family

Family ID: 89084838

Family Applications (1)

Application Number: CN202311529807.XA · Priority Date: 2023-11-16 · Filing Date: 2023-11-16 · Title: Cross-sentence multi-layer bidirectional network event detection method based on external knowledge

Country Status (1)

Country: CN · CN117236436A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416956A (en) * 2020-11-19 2021-02-26 重庆邮电大学 Question classification method based on BERT and independent cyclic neural network
CN113505200A (en) * 2021-07-15 2021-10-15 河海大学 Sentence-level Chinese event detection method combining document key information
US20230127652A1 (en) * 2021-10-25 2023-04-27 Adobe Inc. Event understanding with deep learning
CN115034224A (en) * 2022-01-26 2022-09-09 华东师范大学 News event detection method and system integrating representation of multiple text semantic structure diagrams
CN114662586A (en) * 2022-03-18 2022-06-24 南京邮电大学 Method for detecting false information based on common attention multi-mode fusion mechanism
CN115221325A (en) * 2022-07-25 2022-10-21 中国人民解放军军事科学院军事科学信息研究中心 Text classification method based on label semantic learning and attention adjustment mechanism
CN115510236A (en) * 2022-11-23 2022-12-23 中国人民解放军国防科技大学 Chapter-level event detection method based on information fusion and data enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dongfang Lou et al., "MLBiNet: A Cross-Sentence Collective Event Detection Network", Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, page 4829.

Similar Documents

Publication Publication Date Title
CN108829818B (en) Text classification method
CN111897908B (en) Event extraction method and system integrating dependency information and pre-training language model
CN111143550B (en) Method for automatically identifying dispute focus based on hierarchical attention neural network model
CN110222188B (en) Company notice processing method for multi-task learning and server
CN109858041B (en) Named entity recognition method combining semi-supervised learning with user-defined dictionary
CN110532557B (en) Unsupervised text similarity calculation method
CN110287323B (en) Target-oriented emotion classification method
CN111506732B (en) Text multi-level label classification method
CN111797241B (en) Event Argument Extraction Method and Device Based on Reinforcement Learning
CN113157859B (en) Event detection method based on upper concept information
CN110297889B (en) Enterprise emotional tendency analysis method based on feature fusion
CN112784041B (en) Chinese short text sentiment orientation analysis method
CN112560486A (en) Power entity identification method based on multilayer neural network, storage medium and equipment
CN113806547B (en) Deep learning multi-label text classification method based on graph model
CN111666373A (en) Chinese news classification method based on Transformer
CN111538841B (en) Comment emotion analysis method, device and system based on knowledge mutual distillation
CN113065341A (en) Automatic labeling and classifying method for environmental complaint report text
CN114722835A (en) Text emotion recognition method based on LDA and BERT fusion improved model
CN116822625A (en) Divergent-type associated fan equipment operation and detection knowledge graph construction and retrieval method
CN113343690A (en) Text readability automatic evaluation method and device
CN109446326A (en) Biomedical event based on replicanism combines abstracting method
CN115544252A (en) Text emotion classification method based on attention static routing capsule network
CN113065356B (en) IT equipment operation and maintenance fault suggestion processing method based on semantic analysis algorithm
CN114048314A (en) Natural language steganalysis method
CN117271701A (en) Method and system for extracting system operation abnormal event relation based on TGGAT and CNN

Legal Events

Code: PB01 · Publication
Code: SE01 · Entry into force of request for substantive examination