CN116595407A - Event argument detection method and system based on label sequence consistency modeling


Info

Publication number
CN116595407A
Authority
CN
China
Prior art keywords
word
sequence
tag
event
vector
Prior art date
Legal status
Pending
Application number
CN202310388963.2A
Other languages
Chinese (zh)
Inventor
郭嘉丰 (Guo Jiafeng)
靳小龙 (Jin Xiaolong)
程学旗 (Cheng Xueqi)
官赛萍 (Guan Saiping)
张付俊 (Zhang Fujun)
席鹏弼 (Xi Pengbi)
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202310388963.2A
Publication of CN116595407A
Legal status: Pending


Classifications

    • G06F 18/24 Pattern recognition; classification techniques
    • G06F 18/214 Pattern recognition; generating training patterns, e.g. bagging or boosting
    • G06F 18/25 Pattern recognition; fusion techniques
    • G06F 40/289 Natural language analysis; phrasal analysis, e.g. finite state techniques or chunking
    • G06N 3/0455 Neural networks; auto-encoder networks; encoder-decoder networks
    • G06N 3/08 Neural networks; learning methods


Abstract

The invention provides an event argument detection method and system based on label sequence consistency modeling. The method mainly comprises word sequence semantic coding, word tag sequence labeling, error-prone tag sequence generation, and contrastive learning regularization. Semantic representations of the preprocessed words are learned with the pre-trained language model BERT, and event type information is fused into the representation vectors; word tag sequence labeling uses a fully connected network to predict the tag probability distribution for each word; error-prone tag sequences are generated from the word tag probability distributions according to a sampling strategy; contrastive learning regularization constructs a regularization loss by contrasting the error-prone tag sequences with the correct tag sequences, improving the consistency of the word sequence tags.

Description

Event argument detection method and system based on label sequence consistency modeling
Technical Field
The invention relates to the field of information extraction, in particular to a method for improving the detection effect of event arguments in event extraction tasks.
Background
An event, as a structured representation of information, refers to something that actually happens and involves certain participants. As a special class of information extraction task, the goal of event extraction is to extract instances of predefined event types from a given text. An event consists of a trigger word (Trigger) and arguments (Argument): the trigger word is the word in the text that most clearly expresses the occurrence of the event, usually the core verb of the sentence in which the event occurs; an argument is an entity that is related to the event and plays a role in it. Event extraction is generally divided into trigger word extraction and argument extraction. The trigger word extraction task aims to find the trigger word of an event and determine the event type; the argument extraction task, given a text and the event trigger words, determines whether an entity in the text is an element related to the event and, if so, the role the entity plays in the event, such as initiator, receiver, attacker, or victim. With the advances in event extraction research in recent years, existing methods already perform well on the trigger word extraction task. In argument extraction, however, most techniques simplify the task by treating candidate argument entities as known information, whereas in real application scenarios complete structured event information usually must be extracted from plain text, so the entities serving as candidate arguments must first be found in the text. This step, a subtask of argument extraction, is called the event argument detection task. Most existing event argument detection methods are based on sequence labeling and have the following shortcomings:
1) Most existing sequence-labeling-based event argument detection techniques do not model tag sequence consistency, i.e., whether the word tag sequence inside an argument is correct, complete, and mutually compatible. The sequence labeling formulation converts event argument detection into the problem of assigning a label to every word in the text, so an argument is correctly detected only if every word inside it is labeled correctly. This constraint imposes very strict accuracy requirements on the sequence labeling and demands strong global consistency among the word labels. Moreover, decoding each word with its locally optimal label further aggravates the inconsistency of the final prediction, producing various omission and error phenomena, including mislabeled tags inside an argument and isolated tags outside it. To cope with this problem, some methods model transition probabilities between labels with conditional random fields, but these must compute the probabilities of all possible tag sequences during training and perform a large amount of transition computation during decoding, so the computational complexity is high; in addition, the unidirectional nature of the conditional random field limits how well it models global consistency, so its ability to alleviate errors such as missing tags inside arguments is limited.
2) Another drawback of sequence-labeling-based methods is that the distribution of label counts is highly unbalanced: the total number of words across all argument mentions is small relative to the number of words in the whole input text, and the BIO tagging scheme converts every word outside an argument into an O (Outside) label, so O labels far outnumber B (Begin, the first word of an argument) and I (Inside, the remaining words of an argument) labels. This long-tailed label distribution makes a model trained with sequence labeling as its single objective prone to overfitting, which harms the accuracy and completeness of the final decoding result.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an event argument detection method based on label sequence consistency modeling, which comprises the following steps:
a training corpus preprocessing step: acquiring a training corpus annotated with event argument role categories and event types, segmenting the text in the training corpus, and obtaining the ID of each word according to the pre-training dictionary of the language representation model BERT;
a word sequence semantic coding step: inputting the word sequence formed by all the word IDs into the multi-layer Transformer model of BERT to pre-encode the sub-word sequence, mapping the event type to a distributed representation vector, concatenating this vector with each word vector, and fusing them through a linear network to obtain word sense representation vectors fused with event type information;
a word tag sequence labeling step: inputting the word sense representation vectors into a fully connected network to obtain, for each word, a probability distribution over the event argument role categories, and selecting the category with the highest probability as the predicted argument role category;
an error-prone tag sequence generation step: dividing the word sequences into correctly predicted tag sequences and incorrectly predicted tag sequences according to the predicted argument role category and the annotated argument role category of each word;
a contrastive learning regularization step: performing representation learning on the erroneous tag sequences and the correct tag sequences, and training the fully connected network and the Transformer model with their contrastive loss as a regularization term;
an event argument detection step: sequentially inputting the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
In the event argument detection method based on label sequence consistency modeling, the word sequence semantic coding step comprises:
pre-encoding the input word sequence T with the corpus-pre-trained BERT language model to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
independently encoding the semantic information of the input event type E, with a parameter matrix V serving as the representation vector of each event type and participating in model training, concatenating the vector corresponding to the event type with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, and interactively fusing the two kinds of information through a fully connected network to finally obtain the context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information; the overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is the activation function.
In the event argument detection method based on label sequence consistency modeling, the word tag sequence labeling step comprises:
for the context representation h_i of each word fused with event information, predicting its tag probability distribution P = {p_0, p_1, …, p_n} with a linear layer, where p_i is the tag probability distribution vector of the i-th word:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with indices j, k ∈ {0, 1, 2} corresponding to the tags O, B, and I, and p_i^k denoting the probability that the i-th word is labeled with the k-th tag. The local predicted labels corresponding to the input text are L_pred = {p_0, p_1, …, p_n}, where the soft label of the i-th word is a vector of length 3.
A cross-entropy loss L_seq is calculated to optimize the sequence labeling part, serving as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
In the event argument detection method based on label sequence consistency modeling, the error-prone tag sequence generation step comprises:
converting the standard correct tag sequence into L_gold = {y_0, y_1, …, y_n}, where y_i is the correct (one-hot) label of the i-th word;
when the hard tag sequence L_greedy obtained by greedily decoding the local predicted labels L_pred is inconsistent with the correct tag sequence, taking it directly as the generated erroneous-tag negative sample, i.e. the negative sample set L_neg = L_greedy; based on the local predicted soft labels P produced by the sequence labeling module, greedy decoding computes
l_i = onehot(argmax_k p_i^k);
when the predicted result is consistent with the correct result, a dedicated negative sample generation procedure is needed: for the k-th argument mention in the current event, selecting the word inside the mention whose sequence label is most easily mislabeled, at position s_k in the text, and replacing its correct tag with an erroneous tag, thereby forming an erroneous tag sequence negative sample L_neg^k; the negative sample set is L_neg = {L_neg^1, …, L_neg^K}. The negative sampling replaces the tag at position s_k as
y'_{s_k} = onehot(mid(p_{s_k}))
where onehot converts an integer index into a one-hot encoded vector and mid takes the index of the median probability;
The contrastive learning regularization step comprises:
performing representation learning on each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}; for the BIO labels, setting a trainable label parameter matrix W_L in which each column corresponds to the feature vector of one BIO tag; from this matrix, the representation Q = {q_0, q_1, …, q_n} of the tag at each position in the sequence is obtained:
q_i = l_i · W_L
After obtaining the tag representations Q of the sequence, the tag information and the word vectors are fused with linear layers to obtain the word sense representation U = {u_0, u_1, …, u_n} fused with tag information:
u_i = W_5 · (W_4 · [h_i || q_i] + b_4) + b_5
where W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and || is the vector concatenation operation.
Sequence representation learning is performed on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} for each word and tag, and the average of the vectors over all positions is used as the final vector representation O ∈ {O_pred, O_gold, O_neg}:
Z = Transformer(U)
A triplet margin loss is used as the loss function of the constructed contrastive task:
L_cl = max(d(O_pred, O_gold) − d(O_pred, O_neg) + margin, 0)
where margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence to the erroneous sequence should exceed its distance to the correct sequence by at least margin; the loss functions of the sequence labeling task and the contrastive learning regularization task are trained jointly, where α and β are hyperparameters:
L = α · L_seq + β · L_cl
The event argument detection step comprises:
labeling with the greedy decoding method, taking the resulting L_greedy as the final tag sequence, and taking each word sequence whose tags consist of a B followed by consecutive I tags as one decoded argument.
The invention also provides an event argument detection system based on label sequence consistency modeling, which comprises:
the training corpus preprocessing module, which acquires a training corpus annotated with event argument role categories and event types, segments the text in the training corpus, and obtains the ID of each word according to the pre-training dictionary of the language representation model BERT;
the word sequence semantic coding module, which inputs the word sequence formed by all the word IDs into the multi-layer Transformer model of BERT to pre-encode the sub-word sequence, maps the event type to a distributed representation vector, concatenates this vector with each word vector, and fuses them through a linear network to obtain word sense representation vectors fused with event type information;
the word tag sequence labeling module, which inputs the word sense representation vectors into a fully connected network to obtain, for each word, a probability distribution over the event argument role categories, and selects the category with the highest probability as the predicted argument role category;
the error-prone tag sequence generation module, which divides the word sequences into correctly predicted tag sequences and incorrectly predicted tag sequences according to the predicted argument role category and the annotated argument role category of each word;
the contrastive learning regularization module, which performs representation learning on the erroneous tag sequences and the correct tag sequences, and trains the fully connected network and the Transformer model with their contrastive loss as a regularization term;
the event argument detection module, which sequentially inputs the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
In the event argument detection system based on label sequence consistency modeling, the word sequence semantic coding module:
pre-encodes the input word sequence T with the corpus-pre-trained BERT language model to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
independently encodes the semantic information of the input event type E, with a parameter matrix V serving as the representation vector of each event type and participating in model training, concatenates the vector corresponding to the event type with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, and interactively fuses the two kinds of information through a fully connected network to finally obtain the context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information; the overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is the activation function.
In the event argument detection system based on label sequence consistency modeling, the word tag sequence labeling module:
for the context representation h_i of each word fused with event information, predicts its tag probability distribution P = {p_0, p_1, …, p_n} with a linear layer, where p_i is the tag probability distribution vector of the i-th word:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with indices j, k ∈ {0, 1, 2} corresponding to the tags O, B, and I, and p_i^k denoting the probability that the i-th word is labeled with the k-th tag. The local predicted labels corresponding to the input text are L_pred = {p_0, p_1, …, p_n}, where the soft label of the i-th word is a vector of length 3.
A cross-entropy loss L_seq is calculated to optimize the sequence labeling part, serving as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
In the event argument detection system based on label sequence consistency modeling, the error-prone tag sequence generation module:
converts the standard correct tag sequence into L_gold = {y_0, y_1, …, y_n}, where y_i is the correct (one-hot) label of the i-th word;
when the hard tag sequence L_greedy obtained by greedily decoding the local predicted labels L_pred is inconsistent with the correct tag sequence, takes it directly as the generated erroneous-tag negative sample, i.e. the negative sample set L_neg = L_greedy; based on the local predicted soft labels P produced by the sequence labeling module, greedy decoding computes
l_i = onehot(argmax_k p_i^k);
when the predicted result is consistent with the correct result, a dedicated negative sample generation procedure is needed: for the k-th argument mention in the current event, the word inside the mention whose sequence label is most easily mislabeled, at position s_k in the text, is selected and its correct tag is replaced with an erroneous tag, thereby forming an erroneous tag sequence negative sample L_neg^k; the negative sample set is L_neg = {L_neg^1, …, L_neg^K}. The negative sampling replaces the tag at position s_k as
y'_{s_k} = onehot(mid(p_{s_k}))
where onehot converts an integer index into a one-hot encoded vector and mid takes the index of the median probability;
The contrastive learning regularization module:
performs representation learning on each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}; for the BIO labels, a trainable label parameter matrix W_L is set, in which each column corresponds to the feature vector of one BIO tag; from this matrix, the representation Q = {q_0, q_1, …, q_n} of the tag at each position in the sequence is obtained:
q_i = l_i · W_L
After obtaining the tag representations Q of the sequence, the tag information and the word vectors are fused with linear layers to obtain the word sense representation U = {u_0, u_1, …, u_n} fused with tag information:
u_i = W_5 · (W_4 · [h_i || q_i] + b_4) + b_5
where W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and || is the vector concatenation operation.
Sequence representation learning is performed on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} for each word and tag, and the average of the vectors over all positions is used as the final vector representation O ∈ {O_pred, O_gold, O_neg}:
Z = Transformer(U)
A triplet margin loss is used as the loss function of the constructed contrastive task:
L_cl = max(d(O_pred, O_gold) − d(O_pred, O_neg) + margin, 0)
where margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence to the erroneous sequence should exceed its distance to the correct sequence by at least margin; the loss functions of the sequence labeling task and the contrastive learning regularization task are trained jointly, where α and β are hyperparameters:
L = α · L_seq + β · L_cl
The event argument detection module:
labels with the greedy decoding method, takes the resulting L_greedy as the final tag sequence, and takes each word sequence whose tags consist of a B followed by consecutive I tags as one decoded argument.
The invention also provides a storage medium storing a program for executing any of the above event argument detection methods based on label sequence consistency modeling.
The invention also provides a client for any of the above event argument detection systems based on label sequence consistency modeling.
The advantages of the invention are as follows:
First, contrastive learning is used to improve the internal consistency of word tag sequences and to alleviate the missed and false detections caused by local tag errors and omissions. Second, the contrastive learning loss is used as a regularization term in joint training, mitigating the overfitting that a single sequence labeling objective is prone to. The performance of the event argument detection task is thereby improved: the event argument detection F1 value on the RAMS public test set reaches 44.2%, better than the existing BERT-based sequence labeling technique, whose F1 value is only 39.3%.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a block diagram of the overall process of the present invention.
Detailed Description
In order to overcome the defects of the prior art, the invention provides an event argument detection method based on label sequence consistency modeling. A specific sampling strategy generates, from the correct tag sequences, erroneous tag sequences that exhibit typical error characteristics; representation learning is performed separately on the correct and erroneous tag sequences, and through contrastive learning between the positive and negative samples the model learns to model the consistency of the tags. In addition, the contrastive learning task is trained jointly with the original sequence labeling task, alleviating the overfitting problem that a single sequence labeling model is prone to and improving the effect of argument detection.
The event argument role prediction method provided by the invention comprises the following steps:
1) Preprocessing the training corpus: the training corpus used in the invention is taken from the RAMS dataset. The text is tokenized with the WordPiece method, each word is converted into its ID in the BERT pre-training dictionary, custom [Event] special tags are added before and after the position of the trigger word, and finally [CLS] and [SEP] special tags consistent with the BERT pre-training tasks are added at the beginning and end of the sentence, respectively;
2) Word sequence semantic coding: pre-encoding is performed with the BERT pre-trained language model; the word ID sequence produced in the previous step is pre-encoded with the multi-layer Transformer model of BERT, obtaining the semantic features of the words from the language model BERT pre-trained on a large-scale corpus. Then the correct event type is mapped to a learnable distributed representation vector, which is concatenated with each word vector and fused through a linear network to obtain the word sense representation vectors fused with event type information.
3) Word tag sequence labeling: using the BIO tagging scheme, a fully connected network predicts, for each word sense representation vector, the probability distribution over the labels of the BIO scheme.
4) Generating error-prone tag sequences: confusable erroneous tag sequences are sampled according to the predicted probability distribution; a prediction that does not match the gold label category, i.e. a prediction failure, is treated as a confusable case.
5) Contrastive learning regularization: the tag sequences are semantically encoded with a Transformer model to obtain semantic representation vectors; representation learning is performed on the erroneous and correct tag sequences, and a contrastive learning task is constructed as a regularization term, improving the tag consistency of the word sequence.
In order to make the above features and effects of the present invention more clearly understood, the following specific examples are given with reference to the accompanying drawings.
The invention provides an event argument detection method based on label sequence consistency modeling; the overall flow of the method is shown in Fig. 1. The method mainly comprises word sequence semantic coding, word tag sequence labeling, error-prone tag sequence generation, and contrastive learning regularization. Semantic representations of the preprocessed words are learned with the pre-trained language model BERT, and event type information is fused into the representation vectors; word tag sequence labeling uses a fully connected network to predict the tag probability distribution for each word; error-prone tag sequences are generated from the word tag probability distributions according to a sampling strategy; contrastive learning regularization constructs a regularization loss by contrasting the error-prone tag sequences with the correct tag sequences, improving the consistency of the word sequence tags. The specific method comprises the following steps:
S1: pre-encode with the BERT pre-trained language model. Custom [Event] special tags are added before and after the position of the trigger word, and [CLS] and [SEP] special tags consistent with the BERT pre-training tasks are added at the beginning and end of the sentence, respectively; these tags mark the sentence boundaries for BERT, and keeping them consistent with pre-training yields more accurate semantic features.
The processed word sequence is input into the multi-layer Transformer model of BERT to pre-encode the word sequence. Then the event type is mapped, through a lookup, to a learnable distributed representation vector, which is concatenated with each word vector and fused through a linear network to obtain the word sense representation vectors fused with event type information. Here a word vector is the embedding at each word position obtained from the BERT language model encoding.
S2: based on the BIO tagging scheme, a fully connected network predicts, from the semantic representation vector of each word, the probability distribution over the labels, with ReLU as the activation function and a Softmax function modeling the probability distribution.
S3: confusable erroneous tag sequences are generated by sampling according to the estimated probability distribution.
S4: representation learning is performed on the erroneous and correct tag sequences, a contrastive learning task is constructed, and its loss participates in training as a regularization term, improving the tag consistency of the word sequence; a triplet margin loss is adopted.
Specifically, S1 comprises 3 sub-steps as follows.
S101: preprocess the training data. The text is segmented with the WordPiece tokenizer of the Transformers library, custom [Event] special tags are added before and after the position of the trigger word, [CLS] and [SEP] special tags consistent with the BERT pre-training tasks are added at the beginning and end of the sentence, respectively, and the inputs of a batch are padded to the same length, that of the longest text in the batch.
S102: BERT pre-trained model encoding. Pre-encoding the input word sequence T with the BERT language model pre-trained on a large-scale corpus yields richer dynamic semantic representations than traditional static word vectors, C = {c_0, c_1, …, c_n}, where c_n is the dynamic semantic representation of the n-th word:
C = BERT(T)
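As an illustration, a minimal sketch of steps S101 and S102 follows, assuming the HuggingFace transformers library with bert-base-uncased as the backbone; the [Event] marker is registered as a custom special token, and the sentence and trigger index are hypothetical example inputs.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": ["[Event]"]})
bert = BertModel.from_pretrained("bert-base-uncased")
bert.resize_token_embeddings(len(tokenizer))  # account for the new [Event] token

def encode(words, trigger_idx):
    # S101: wrap the trigger word with [Event] markers; the tokenizer then
    # performs WordPiece segmentation and adds [CLS]/[SEP] itself.
    marked = (words[:trigger_idx] + ["[Event]", words[trigger_idx], "[Event]"]
              + words[trigger_idx + 1:])
    enc = tokenizer(marked, is_split_into_words=True, return_tensors="pt")
    # S102: C = BERT(T), one dynamic semantic vector per word piece.
    with torch.no_grad():
        C = bert(**enc).last_hidden_state  # shape (1, seq_len, 768)
    return enc, C

enc, C = encode(["police", "arrested", "the", "suspect"], trigger_idx=1)
```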
S103: independently encode the semantic information of the input event type E. An independent trainable parameter matrix V serves as the representation vector of each event type and participates in model training; the vector corresponding to the event type is then concatenated with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, where x_n is the intermediate semantic representation of the n-th word. Finally the two kinds of information are interactively fused through a fully connected network to obtain the final context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information. The overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, and W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms. ReLU is a nonlinear rectifier, used here as the activation function.
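A hedged PyTorch sketch of the S103 fusion follows; the dimensions (768 for BERT, 64 for the event embedding, 256 hidden) and the number of event types are illustrative assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class EventFusion(nn.Module):
    def __init__(self, n_event_types, bert_dim=768, event_dim=64, hidden=256):
        super().__init__()
        self.V = nn.Embedding(n_event_types, event_dim)    # parameter matrix V
        self.W1 = nn.Linear(bert_dim + event_dim, hidden)  # W_1, b_1
        self.W2 = nn.Linear(hidden, bert_dim)              # W_2, b_2

    def forward(self, C, event_type_id):
        # x_i = [c_i || V(E)]: append the event vector to every word vector.
        v = self.V(event_type_id).unsqueeze(1).expand(-1, C.size(1), -1)
        X = torch.cat([C, v], dim=-1)
        # h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
        return self.W2(torch.relu(self.W1(X)))

fusion = EventFusion(n_event_types=139)  # event type count is illustrative
C = torch.randn(1, 8, 768)               # dummy BERT output standing in for S102
H = fusion(C, torch.tensor([3]))         # (1, 8, 768) fused representations
```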
S2 comprises the following sub-steps.
S201: for the context representation h_i of each word fused with event information, the invention likewise uses a linear layer to predict its tag probability distribution P = {p_0, p_1, …, p_n}, where p_i is the tag probability distribution vector of the i-th word, calculated as:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with j, k ∈ {0, 1, 2} indexing the tags O, B, and I, so that p_i^k is the probability that the i-th word is labeled with the k-th tag. This calculation yields the local predicted labels L_pred = {p_0, p_1, …, p_n} for the input text. The predicted labels are designed here as soft labels: the soft label of the i-th word is a vector of length 3.
S202: a cross-entropy loss L_seq is calculated to optimize the sequence labeling part, serving as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
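A short sketch of S201 and S202 under the same assumptions (PyTorch; dummy tensors standing in for the fused representations H and for gold O/B/I indices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

H = torch.randn(1, 8, 768)                       # fused word vectors from S103
gold = torch.tensor([[0, 1, 2, 2, 0, 0, 0, 0]])  # hypothetical gold O/B/I ids

W3 = nn.Linear(768, 3)       # W_3, b_3: logits over {O, B, I}
Z = W3(H)                    # z_i = W_3 · h_i + b_3
P = F.softmax(Z, dim=-1)     # soft labels p_i
# L_seq: cross entropy between the predicted distributions and the gold tags
loss_seq = F.cross_entropy(Z.view(-1, 3), gold.view(-1))
```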
Likewise, S3 is divided into two sub-steps.
S301: process the correct tag sequence. The standard correct tag sequence is converted into L_gold = {y_0, y_1, …, y_n}, where y_i is the correct label of the i-th word. In contrast to the local predicted labels L_pred, hard tags represented as one-hot encoded vectors are used here: e.g. the tag vector corresponding to a B tag is [0, 1, 0] and the vector corresponding to an I tag is [0, 0, 1].
S302: generate the erroneous tag sequence. When the hard tag sequence L_greedy obtained by greedily decoding the predicted soft labels L_pred is inconsistent with the correct tag sequence, the invention directly takes it as the generated erroneous-tag negative sample, with no additional negative sampling needed, i.e. the negative sample set L_neg = L_greedy. Based on the local predicted soft labels P obtained by the sequence labeling module, the greedy decoding process is:
l_i = onehot(argmax_k p_i^k)
When the predicted result is consistent with the correct result, a dedicated negative sample generation procedure is needed. Denote the k-th argument mention in the current event as A_k = (b, e, a), where b is the start position (begin) of the argument mention, e is its end position (end), and a indicates that the position belongs to an argument (argument) rather than a trigger word. Among the internal words of the mention A_k, the word most easily mislabeled by the sequence labeler, at position s_k in the text, is selected, and its correct tag is replaced with the most confusable erroneous tag, thereby forming the erroneous tag sequence negative sample L_neg^k; the negative sample set is L_neg = {L_neg^1, …, L_neg^K}. The specific negative sampling process is:
s_k = argmin_{b ≤ i ≤ e} (p_i^(1) − p_i^(2)), y'_{s_k} = onehot(mid(p_{s_k}))
where p_i^(1) and p_i^(2) are the largest and second-largest components of p_i, onehot is the operation of converting an integer index into a one-hot encoded vector, and mid is the operation of taking the index of the median probability. When the two largest probabilities of a word are close to each other, the judgment of its label is less reliable and a prediction error is likely; therefore the word whose top label has the least confidence margin is selected, and its label is replaced with the other erroneous label the model confuses it with.
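The following sketch reconstructs S301 and S302 under the notation above; the "least-confident word, median-probability label" rule follows the description of the mid operation, and the exact tie-breaking details are assumptions.

```python
import torch
import torch.nn.functional as F

def greedy_decode(P):
    # l_i = onehot(argmax_k p_i^k)
    return F.one_hot(P.argmax(dim=-1), num_classes=3).float()

def make_negative(P, L_gold, span):
    """P: (1, n, 3) soft labels; L_gold: (1, n, 3) one-hot gold tags;
    span: (b, e) word positions of the k-th argument mention."""
    L_greedy = greedy_decode(P)
    if not torch.equal(L_greedy, L_gold):
        return L_greedy                 # a genuine model error: use it directly
    b, e = span
    top2 = P[0, b:e + 1].topk(2, dim=-1).values
    s_k = b + (top2[:, 0] - top2[:, 1]).argmin().item()  # least confident word
    L_neg = L_gold.clone()
    # mid: the median-probability (runner-up) label is the most confusable one
    runner_up = P[0, s_k].argsort()[1]
    L_neg[0, s_k] = F.one_hot(runner_up, num_classes=3).float()
    return L_neg
```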
Finally, for step S4, the specific construction flow is as follows. The invention first performs representation learning on each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}. For the BIO labels, the invention sets a trainable label parameter matrix W_L, in which each column corresponds to the feature vector of one BIO tag. From this matrix, the representation Q = {q_0, q_1, …, q_n} of the tag at each position in the sequence can be obtained:
q_i = l_i · W_L
After obtaining the tag representations Q of the sequence, the tag information and the word vectors are fused with linear layers to obtain the word sense representation U = {u_0, u_1, …, u_n} fused with tag information:
u_i = W_5 · (W_4 · [h_i || q_i] + b_4) + b_5
where W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and || is the vector concatenation operation.
Finally, the invention performs sequence representation learning on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} corresponding to each word and tag, and uses the average of the vectors over all positions as the final vector representation O ∈ {O_pred, O_gold, O_neg}:
Z = Transformer(U)
To make O_pred close to O_gold and far from O_neg, the invention uses a triplet margin loss (Triplet margin loss) as the loss function of the constructed contrastive task:
L_cl = max(d(O_pred, O_gold) − d(O_pred, O_neg) + margin, 0)
where margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence representation to the erroneous sequence representation should exceed its distance to the correct sequence representation by at least margin. Finally, the invention trains jointly with the loss functions of both the sequence labeling task and the contrastive learning regularization task, where α and β are hyperparameters:
L = α · L_seq + β · L_cl
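A minimal sketch of the S4 contrastive module follows, assuming PyTorch; the Transformer encoder size, the use of nn.TripletMarginLoss, and the weights α and β are illustrative choices, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class SeqRepr(nn.Module):
    def __init__(self, bert_dim=768, tag_dim=32, hidden=256):
        super().__init__()
        # W_L: one feature vector per BIO tag (rows here, transposed relative
        # to the text's column convention)
        self.W_L = nn.Parameter(torch.randn(3, tag_dim))
        self.fuse = nn.Sequential(nn.Linear(bert_dim + tag_dim, hidden),
                                  nn.Linear(hidden, hidden))  # W_4/b_4, W_5/b_5
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, H, L):
        Q = L @ self.W_L                          # q_i = l_i · W_L
        U = self.fuse(torch.cat([H, Q], dim=-1))  # u_i fuses tag and word info
        Z = self.encoder(U)                       # Z = Transformer(U)
        return Z.mean(dim=1)                      # mean-pool -> O

H = torch.randn(1, 8, 768)
L_gold = torch.eye(3)[torch.tensor([[0, 1, 2, 2, 0, 0, 0, 0]])]  # one-hot tags
L_pred = torch.softmax(torch.randn(1, 8, 3), dim=-1)             # soft labels
L_neg = L_gold.clone(); L_neg[0, 2] = torch.tensor([1.0, 0.0, 0.0])

net = SeqRepr()
O_pred, O_gold, O_neg = net(H, L_pred), net(H, L_gold), net(H, L_neg)
loss_cl = nn.TripletMarginLoss(margin=1.0)(O_pred, O_gold, O_neg)
loss_seq = torch.tensor(0.7)        # placeholder for the S202 cross entropy
loss = 1.0 * loss_seq + 0.5 * loss_cl  # L = α·L_seq + β·L_cl, α/β assumed
```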
In the prediction stage, the invention labels with the greedy decoding method, takes the resulting L_greedy as the final tag sequence, and takes each word sequence whose tags consist of a B followed by consecutive I tags as one decoded argument. Fig. 2 is a block diagram of the overall method of the present invention.
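A small sketch of this decoding rule (plain Python, tags as O/B/I strings):

```python
def decode_arguments(tags):
    """Turn a BIO tag list into (start, end) argument spans: each span is a
    B tag followed by consecutive I tags; isolated I tags are ignored."""
    spans, start = [], None
    for i, t in enumerate(tags):
        if t == "B":                    # a new argument begins
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif t == "O" and start is not None:
            spans.append((start, i - 1))
            start = None
        # t == "I" simply extends the currently open span, if any
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans

print(decode_arguments(["O", "B", "I", "I", "O", "B", "O"]))  # [(1, 3), (5, 5)]
```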
The following is a system embodiment corresponding to the above method embodiment; the two may be implemented in cooperation. The related technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the related technical details mentioned in this embodiment can also be applied to the above embodiment.
The invention also provides an event argument detection system based on label sequence consistency modeling, which comprises:
the training corpus preprocessing module, which acquires a training corpus annotated with event argument role categories and event types, segments the text in the training corpus, and obtains the ID of each word according to the pre-training dictionary of the language representation model BERT;
the word sequence semantic coding module, which inputs the word sequence formed by all the word IDs into the multi-layer Transformer model of BERT to pre-encode the sub-word sequence, maps the event type to a distributed representation vector, concatenates this vector with each word vector, and fuses them through a linear network to obtain word sense representation vectors fused with event type information;
the word tag sequence labeling module, which inputs the word sense representation vectors into a fully connected network to obtain, for each word, a probability distribution over the event argument role categories, and selects the category with the highest probability as the predicted argument role category;
the error-prone tag sequence generation module, which divides the word sequences into correctly predicted tag sequences and incorrectly predicted tag sequences according to the predicted argument role category and the annotated argument role category of each word;
the contrastive learning regularization module, which performs representation learning on the erroneous tag sequences and the correct tag sequences, and trains the fully connected network and the Transformer model with their contrastive loss as a regularization term;
the event argument detection module, which sequentially inputs the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
In the event argument detection system based on label sequence consistency modeling, the word sequence semantic coding module:
pre-encodes the input word sequence T with the corpus-pre-trained BERT language model to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
independently encodes the semantic information of the input event type E, with a parameter matrix V serving as the representation vector of each event type and participating in model training, concatenates the vector corresponding to the event type with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, and interactively fuses the two kinds of information through a fully connected network to finally obtain the context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information; the overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is the activation function.
In the event argument detection system based on label sequence consistency modeling, the word tag sequence labeling module:
for the context representation h_i of each word fused with event information, predicts its tag probability distribution P = {p_0, p_1, …, p_n} with a linear layer, where p_i is the tag probability distribution vector of the i-th word:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with indices j, k ∈ {0, 1, 2} corresponding to the tags O, B, and I, and p_i^k denoting the probability that the i-th word is labeled with the k-th tag. The local predicted labels corresponding to the input text are L_pred = {p_0, p_1, …, p_n}, where the soft label of the i-th word is a vector of length 3.
A cross-entropy loss L_seq is calculated to optimize the sequence labeling part, serving as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
In the event argument detection system based on label sequence consistency modeling, the error-prone tag sequence generation module:
converts the standard correct tag sequence into L_gold = {y_0, y_1, …, y_n}, where y_i is the correct (one-hot) label of the i-th word;
when the hard tag sequence L_greedy obtained by greedily decoding the local predicted labels L_pred is inconsistent with the correct tag sequence, takes it directly as the generated erroneous-tag negative sample, i.e. the negative sample set L_neg = L_greedy; based on the local predicted soft labels P produced by the sequence labeling module, greedy decoding computes
l_i = onehot(argmax_k p_i^k);
when the predicted result is consistent with the correct result, a dedicated negative sample generation procedure is needed: for the k-th argument mention in the current event, the word inside the mention whose sequence label is most easily mislabeled, at position s_k in the text, is selected and its correct tag is replaced with an erroneous tag, thereby forming an erroneous tag sequence negative sample L_neg^k; the negative sample set is L_neg = {L_neg^1, …, L_neg^K}. The negative sampling replaces the tag at position s_k as
y'_{s_k} = onehot(mid(p_{s_k}))
where onehot converts an integer index into a one-hot encoded vector and mid takes the index of the median probability;
The contrastive learning regularization module:
performs representation learning on each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}; for the BIO labels, a trainable label parameter matrix W_L is set, in which each column corresponds to the feature vector of one BIO tag; from this matrix, the representation Q = {q_0, q_1, …, q_n} of the tag at each position in the sequence is obtained:
q_i = l_i · W_L
After obtaining the tag representations Q of the sequence, the tag information and the word vectors are fused with linear layers to obtain the word sense representation U = {u_0, u_1, …, u_n} fused with tag information:
u_i = W_5 · (W_4 · [h_i || q_i] + b_4) + b_5
where W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and || is the vector concatenation operation.
Sequence representation learning is performed on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} for each word and tag, and the average of the vectors over all positions is used as the final vector representation O ∈ {O_pred, O_gold, O_neg}:
Z = Transformer(U)
A triplet margin loss is used as the loss function of the constructed contrastive task:
L_cl = max(d(O_pred, O_gold) − d(O_pred, O_neg) + margin, 0)
where margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence to the erroneous sequence should exceed its distance to the correct sequence by at least margin; the loss functions of the sequence labeling task and the contrastive learning regularization task are trained jointly, where α and β are hyperparameters:
L = α · L_seq + β · L_cl
The event argument detection module:
labels with the greedy decoding method, takes the resulting L_greedy as the final tag sequence, and takes each word sequence whose tags consist of a B followed by consecutive I tags as one decoded argument.
The invention also provides a storage medium storing a program for executing any of the above event argument detection methods based on label sequence consistency modeling.
The invention also provides a client for any of the above event argument detection systems based on label sequence consistency modeling.

Claims (10)

1. An event argument detection method based on label sequence consistency modeling, characterized by comprising the following steps:
a training corpus preprocessing step: acquiring a training corpus annotated with event argument role categories and event types, segmenting the text in the training corpus, and obtaining the ID of each word according to the pre-training dictionary of the language representation model BERT;
a word sequence semantic coding step: inputting the word sequence formed by all the word IDs into the multi-layer Transformer model of BERT to pre-encode the sub-word sequence, mapping the event type to a distributed representation vector, concatenating this vector with each word vector, and fusing them through a linear network to obtain word sense representation vectors fused with event type information;
a word tag sequence labeling step: inputting the word sense representation vectors into a fully connected network to obtain, for each word, a probability distribution over the event argument role categories, and selecting the category with the highest probability as the predicted argument role category;
an error-prone tag sequence generation step: dividing the word sequences into correctly predicted tag sequences and incorrectly predicted tag sequences according to the predicted argument role category and the annotated argument role category of each word;
a contrastive learning regularization step: performing representation learning on the erroneous tag sequences and the correct tag sequences, and training the fully connected network and the Transformer model with their contrastive loss as a regularization term;
an event argument detection step: sequentially inputting the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
2. The event argument detection method based on label sequence consistency modeling of claim 1, wherein the word sequence semantic coding step comprises:
pre-encoding the input word sequence T with the corpus-pre-trained BERT language model to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
independently encoding the semantic information of the input event type E, with a parameter matrix V serving as the representation vector of each event type and participating in model training, concatenating the vector corresponding to the event type with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, and interactively fusing the two kinds of information through a fully connected network to finally obtain the context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information; the overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is the activation function.
3. The event argument detection method based on label sequence consistency modeling of claim 2, wherein the word tag sequence labeling step comprises:
for the context representation h_i of each word fused with event information, predicting its tag probability distribution P = {p_0, p_1, …, p_n} with a linear layer, where p_i is the tag probability distribution vector of the i-th word:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with indices j, k ∈ {0, 1, 2} corresponding to the tags O, B, and I, and p_i^k denoting the probability that the i-th word is labeled with the k-th tag; the local predicted labels corresponding to the input text are L_pred = {p_0, p_1, …, p_n}, where the soft label of the i-th word is a vector of length 3;
calculating a cross-entropy loss L_seq to optimize the sequence labeling part, as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
4. The method for event argument detection based on tag sequence consistency modeling of claim 3, wherein the error prone tag sequence generating step comprises:
conversion of standard correct tag sequences intoWherein->The correct label corresponding to the ith word;
when locally predicting label L pred Hard tag sequence L by greedy decoding greedy When the sequence is inconsistent with the correct label sequence, the sequence is taken as a generated false label negative sample, and the negative sample set L neg =L greedy The method comprises the steps of carrying out a first treatment on the surface of the According to the local prediction soft label P obtained by the sequence labeling module, the greedy decoding process comprises the following steps:
when the predicted result is consistent with the correct result, a specific negative sample generation flow is needed, and the kth argument in the current event is namedSelecting word with wrong sequence marking in the inner word of the argument index>s k Representing the position of the word in the text and then replacing its corresponding correct tag with an error tag, thereby constituting a negative sample of the error tag sequence +. >Negative sample set +.> The specific negative sampling process is as follows:
wherein onehot is an operation of converting an integer index into a one-hot encoding vector, mid is a median fetching operation;
the contrast learning regularization step includes:
for each tag sequence L ε { L gold ,L neg ,L pred Representation learning is performed, where l= { L 0 ,l 1 ,…,l n -a }; for BIO labels, a label parameter matrix W to be trained is set L Wherein each column corresponds to a feature vector of each tag in the BIO according to the matrix W L The representation q= { Q for each position tag in each tag sequence can be obtained 0 ,q 1 ,…,q n }:
q i =l i ·W L
After obtaining the tag expression Q in the sequence, fusing the tag and the information of the word vector by using a linear layer to obtain the word sense expression U= { U of fused tag information 0 ,u 1 ,…,u n }:
u i =W 5 ·(W 4 ·[h i ||q i ]+b 4 )+b 5
Wherein W is 4 ,W 5 ,b 4 ,b 5 Is a linear transformation matrix and a corresponding bias term thereof, and I is vector splicing operation.
Using a transducer pair USequence representation learning is carried out to obtain a representation vector Z= { Z corresponding to each word and each label 0 ,z 1 ,…,z n Using the average value of the vectors of each position as the final vector representation O E { O } pred ,O gold ,O neg }:
Z=Transformer(U)
Using ternary interval loss function as loss function for build contrast task
Wherein margin is a super parameter, meaning that the difference between the distance from the predicted tag sequence position to the error sequence position and the distance from the error sequence position to the correct sequence position in the expression space should not be less than margin; the loss function joint training of both the sequence labeling task and the contrast learning regularization task is used, wherein alpha and beta are super parameters:
The event argument detection step comprises:
Labeling is performed by the greedy decoding method, the obtained L_greedy is taken as the final tag sequence, and each word sequence corresponding to a B tag followed by several consecutive I tags in the tag sequence is taken as one decoded argument.
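The decoding rule maps directly to a few lines of Python; the sentence and tag sequence below are illustrative, not taken from the patent:

```python
def decode_arguments(tags, words):
    """Read every maximal 'B followed by consecutive I tags' run as one argument."""
    spans, i = [], 0
    while i < len(tags):
        if tags[i] == 1:                             # B: an argument starts here
            j = i + 1
            while j < len(tags) and tags[j] == 2:    # absorb the consecutive I tags
                j += 1
            spans.append("".join(words[i:j]))
            i = j
        else:
            i += 1
    return spans

words = list("警方今日逮捕了嫌疑人张某")
tags  = [1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2]        # 0 = O, 1 = B, 2 = I (illustrative)
print(decode_arguments(tags, words))                 # -> ['警方', '张某']
```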
5. An event argument detection system based on tag sequence consistency modeling, comprising:
the training corpus preprocessing module is used for acquiring a training corpus annotated with event argument role categories and event types, segmenting the texts in the training corpus, and obtaining the ID of each word in the pre-training dictionary of the language representation model BERT (a tokenization sketch follows this claim);
the word sequence semantic coding module inputs the word sequence formed by all word IDs into the multi-layer Transformer model of BERT to pre-encode the word sequence, maps the event type into a distributed representation vector, concatenates it with each word vector, and fuses them through a linear network to obtain word-sense representation vectors fused with the event type information;
the word tag sequence labeling module inputs the word-sense representation vectors into a fully connected network to obtain, for each vector, a probability distribution over the event argument role categories, and selects the category with the highest probability as the predicted argument role category;
the error-prone tag sequence generation module divides the word sequences into correctly predicted tag sequences and erroneously predicted tag sequences according to each word's predicted argument role category and the annotated event argument role category;
the contrastive learning regularization module performs representation learning on the erroneous tag sequences and the correct tag sequences, and trains the fully connected network and the Transformer model with the loss between them as a regularization term;
and the event argument detection module sequentially inputs the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
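For the preprocessing module above, a minimal sketch using the Hugging Face transformers library follows; the checkpoint name and example sentence are assumptions for illustration, since the claim only requires the pre-training dictionary of a BERT language representation model.

```python
from transformers import BertTokenizer

# "bert-base-chinese" is an illustrative checkpoint, not one named by the patent.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

text = "警方今日逮捕了嫌疑人张某"
tokens = tokenizer.tokenize(text)                 # segment the text into words
ids = tokenizer.convert_tokens_to_ids(tokens)     # ID of each word in the dictionary
print(list(zip(tokens, ids)))
```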
6. The tag sequence consistency modeling based event argument detection system of claim 5, wherein the word sequence semantic coding module comprises:
the BERT language model, pre-trained on a large corpus, pre-encodes the input word sequence T to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
the semantic information of the input event type E is encoded independently: a parameter matrix V provides a trainable representation vector for each event type, and the vector corresponding to the event type is concatenated with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}; the two kinds of information are then interactively fused through a fully connected network, finally yielding for each word a context vector representation H = {h_0, h_1, …, h_n} that fuses the event information. The overall calculation process is:
x_i = [c_i \| V(E)]

h_i = W_2 \cdot \mathrm{ReLU}(W_1 \cdot x_i + b_1) + b_2

wherein V(E) denotes the vector representation corresponding to event type E in the parameter matrix V, \| denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is an activation function.
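Under the assumptions noted in the comments, the module's computation can be sketched with the Hugging Face transformers API:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Checkpoint name, number of event types, and the example sentence are
# illustrative assumptions; the claim only fixes the computation pattern.
name = "bert-base-chinese"
bert, tok = BertModel.from_pretrained(name), BertTokenizer.from_pretrained(name)
hid = bert.config.hidden_size

V  = nn.Embedding(33, hid)                 # parameter matrix V: one vector per event type
W1 = nn.Linear(2 * hid, hid)               # W_1, b_1
W2 = nn.Linear(hid, hid)                   # W_2, b_2

enc = tok("警方今日逮捕了嫌疑人张某", return_tensors="pt")
C = bert(**enc).last_hidden_state          # c_i: dynamic semantic representations
E = torch.tensor([7])                      # illustrative event-type index
V_E = V(E)[:, None, :].expand(-1, C.size(1), -1)
X = torch.cat([C, V_E], dim=-1)            # x_i = [c_i || V(E)]
H = W2(torch.relu(W1(X)))                  # h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
print(H.shape)                             # (1, sequence length, hidden size)
```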
7. The tag sequence consistency modeling based event argument detection system of claim 6, wherein the word tag sequence labeling module comprises:
for each word's contextual representation h_i that fuses the event information, a linear layer predicts its tag probability distribution P = {p_0, p_1, …, p_n}, with p_i the tag probability distribution vector of the i-th word:

z_i = W_3 \cdot h_i + b_3

wherein W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector z_i, and j, k ∈ {0, 1, 2} denote the indexes corresponding to the tags O, B and I respectively. The probability that the i-th word is labeled with the k-th tag is

p_i^k = \frac{\exp(z_i^k)}{\sum_j \exp(z_i^j)}

and the local predicted label corresponding to the input text is L_pred = P = {p_0, p_1, …, p_n}, wherein p_i, the soft label corresponding to the i-th word, is a vector of length 3.
A cross-entropy loss function L_seq for optimizing the sequence labeling part is calculated as the loss function corresponding to the sequence labeling task:

L_{seq} = -\sum_{i=0}^{n} \sum_{j=0}^{2} y_i^j \log p_i^j

wherein y_i^j indicates whether the real label of the i-th word is the j-th tag.
8. The tag sequence consistency modeling based event argument detection system of claim 7, wherein the error-prone tag sequence generating module comprises:
The standard correct tag sequence is converted into L_gold = {l_0^gold, l_1^gold, …, l_n^gold}, wherein l_i^gold is the (one-hot encoded) correct label corresponding to the i-th word.
When the hard tag sequence L_greedy obtained by greedy decoding of the local predicted label L_pred is inconsistent with the correct tag sequence, it is taken as a generated erroneous-tag negative sample, and the negative sample set L_neg = L_greedy. Based on the local predicted soft labels P produced by the sequence labeling module, the greedy decoding process is:

l_i^{greedy} = \mathrm{onehot}(\arg\max_k p_i^k)
When the predicted result is consistent with the correct result, a dedicated negative-sample generation flow is needed. Denote the k-th argument in the current event as a_k; the word inside the argument span that is most prone to being mislabeled is selected, with s_k denoting the position of that word in the text, and its corresponding correct tag is replaced with an erroneous tag, thereby constituting a negative sample of the erroneous tag sequence, giving the negative sample set L_neg = {l_0^neg, l_1^neg, …, l_n^neg}. The specific negative sampling process is:

s_k = \mathrm{mid}(\{start_k, \ldots, end_k\})

with the tag at position s_k replaced by the one-hot encoding of an incorrect tag index while all other positions keep their correct tags; wherein onehot(·) is an operation converting an integer index into a one-hot encoding vector, and mid(·) is a median-taking operation;
The contrastive learning regularization module includes:
Representation learning is performed for each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}. For the BIO tags, a trainable tag parameter matrix W_L is set, wherein each column corresponds to the feature vector of one of the tags B, I, O; according to the matrix W_L, the representation Q = {q_0, q_1, …, q_n} of the tag at each position of each tag sequence is obtained:

q_i = l_i \cdot W_L

After the tag representations Q of the sequence are obtained, a linear layer fuses the tag information with the word vectors to obtain word-sense representations U = {u_0, u_1, …, u_n} that incorporate the tag information:

u_i = W_5 \cdot (W_4 \cdot [h_i \| q_i] + b_4) + b_5

wherein W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and \| denotes the vector concatenation operation.
Sequence representation learning is performed on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} corresponding to each word and tag, and the mean of the vectors over all positions is used as the final vector representation O ∈ {O_pred, O_gold, O_neg}:

Z = \mathrm{Transformer}(U)

O = \frac{1}{n+1} \sum_{i=0}^{n} z_i
A triplet margin loss is used as the loss function of the contrastive task:

L_{cl} = \max(0,\; margin + d(O_{pred}, O_{gold}) - d(O_{pred}, O_{neg}))

wherein margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence representation to the erroneous sequence representation should exceed its distance to the correct sequence representation by at least margin. The model is trained jointly with the loss functions of both the sequence labeling task and the contrastive learning regularization task, wherein α and β are hyperparameters:

L = \alpha \cdot L_{seq} + \beta \cdot L_{cl}
The event argument detection module comprises:
Labeling is performed by the greedy decoding method, the obtained L_greedy is taken as the final tag sequence, and each word sequence corresponding to a B tag followed by several consecutive I tags in the tag sequence is taken as one decoded argument.
9. A storage medium storing a program for executing the event argument detection method based on tag sequence consistency modeling according to any one of claims 1 to 4.
10. A client for use in the event argument detection system based on tag sequence consistency modeling of any one of claims 5 to 8.
CN202310388963.2A 2023-04-12 2023-04-12 Event argument detection method and system based on label sequence consistency modeling Pending CN116595407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310388963.2A CN116595407A (en) 2023-04-12 2023-04-12 Event argument detection method and system based on label sequence consistency modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310388963.2A CN116595407A (en) 2023-04-12 2023-04-12 Event argument detection method and system based on label sequence consistency modeling

Publications (1)

Publication Number Publication Date
CN116595407A true CN116595407A (en) 2023-08-15

Family

ID=87598051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310388963.2A Pending CN116595407A (en) 2023-04-12 2023-04-12 Event argument detection method and system based on label sequence consistency modeling

Country Status (1)

Country Link
CN (1) CN116595407A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118013201A (en) * 2024-03-07 2024-05-10 暨南大学 Flow anomaly detection method and system based on improved BERT fusion contrast learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination