CN116595407A - Event argument detection method and system based on label sequence consistency modeling


Info

Publication number
CN116595407A
Authority
CN
China
Prior art keywords
word
sequence
tag
event
vector
Prior art date
Legal status
Pending
Application number
CN202310388963.2A
Other languages
Chinese (zh)
Inventor
郭嘉丰 (Guo Jiafeng)
靳小龙 (Jin Xiaolong)
程学旗 (Cheng Xueqi)
官赛萍 (Guan Saiping)
张付俊 (Zhang Fujun)
席鹏弼 (Xi Pengbi)
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202310388963.2A
Publication of CN116595407A
Legal status: Pending


Classifications

    • G06F 18/24 Pattern recognition; classification techniques
    • G06F 18/214 Pattern recognition; generating training patterns, e.g. bagging or boosting
    • G06F 18/25 Pattern recognition; fusion techniques
    • G06F 40/289 Natural language analysis; phrasal analysis, e.g. finite state techniques or chunking
    • G06N 3/0455 Neural networks; auto-encoder networks; encoder-decoder networks
    • G06N 3/08 Neural networks; learning methods


Abstract

The invention provides an event argument detection method and system based on label sequence consistency modeling. The method mainly comprises word sequence semantic coding, word tag sequence labeling, error-prone tag sequence generation, and contrastive learning regularization. Semantic representations of the preprocessed words are learned with the pre-trained language model BERT, and event type information is fused into the representation vectors; word tag sequence labeling uses a fully connected network to predict the tag probability distribution for each word; error-prone tag sequences are generated from the word tag probability distributions according to a sampling strategy; contrastive learning regularization constructs a regularization loss by contrasting the error-prone tag sequences with the correct tag sequences, improving the consistency of the word sequence tags.

Description

Event argument detection method and system based on label sequence consistency modeling
Technical Field
The invention relates to the field of information extraction, in particular to a method for improving the detection effect of event arguments in event extraction tasks.
Background
An event, as a structured representation of information, refers to something that actually happens and involves certain participants. As a special class of information extraction task, the goal of event extraction is to extract instances of predefined event types from a given text. An event consists of a trigger word (Trigger) and arguments (Argument): the trigger word is the word in the text that most clearly expresses the occurrence of the event, usually the core verb of the sentence in which the event occurs; an argument is an entity that is related to the event and plays a role in it. Event extraction is generally divided into trigger word extraction and argument extraction. The trigger word extraction task aims to find the trigger word of an event and determine the event type; the argument extraction task, given a text and the event trigger words, determines whether an entity in the text is an element related to the event and, if so, the role the entity plays in the event, such as initiator, receiver, attacker, or victim. With the advances in event extraction research in recent years, existing methods already perform well on the trigger word extraction task. In argument extraction, however, most techniques simplify the task by treating candidate argument entities as known information, whereas in real application scenarios complete structured event information usually must be extracted from plain text, so the entities serving as candidate arguments must first be found in the text. This step, a subtask of argument extraction, is called the event argument detection task. Most existing event argument detection methods are based on sequence labeling and have the following shortcomings:
1) Most existing sequence-labeling-based event argument detection techniques do not model tag sequence consistency, i.e., whether the word tag sequence inside an argument is correct, complete, and mutually compatible. The sequence labeling formulation converts event argument detection into the problem of assigning a label to every word in the text, so an argument is correctly detected only if every word inside it is labeled correctly. This constraint imposes very strict accuracy requirements on the sequence labeling and demands strong global consistency among the word labels. Moreover, decoding each word with its locally optimal label further aggravates the inconsistency of the final prediction, producing various omission and error phenomena, including mislabeled tags inside an argument and isolated tags outside it. To cope with this problem, some methods model transition probabilities between labels with conditional random fields, but these must compute the probabilities of all possible tag sequences during training and perform a large amount of transition computation during decoding, so the computational complexity is high; in addition, the unidirectional nature of the conditional random field limits how well it models global consistency, so its ability to alleviate errors such as missing tags inside arguments is limited.
2) Another drawback of sequence-labeling-based methods is that the distribution of label counts is highly unbalanced: the total number of words across all argument mentions is small relative to the number of words in the whole input text, and the BIO tagging scheme converts every word outside an argument into an O (Outside) label, so O labels far outnumber B (Begin, the first word of an argument) and I (Inside, the remaining words of an argument) labels. This long-tailed label distribution makes a model trained with sequence labeling as its single objective prone to overfitting, which harms the accuracy and completeness of the final decoding result.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an event argument detection method based on label sequence consistency modeling, which comprises the following steps:
a training corpus preprocessing step: acquiring a training corpus annotated with event argument role categories and event types, segmenting the text in the training corpus, and obtaining the ID of each word according to the pre-training dictionary of the language representation model BERT;
a word sequence semantic coding step: inputting the word sequence formed by all the word IDs into the multi-layer Transformer model of BERT to pre-encode the sub-word sequence, mapping the event type to a distributed representation vector, concatenating this vector with each word vector, and fusing them through a linear network to obtain word sense representation vectors fused with event type information;
a word tag sequence labeling step: inputting the word sense representation vectors into a fully connected network to obtain, for each word, a probability distribution over the event argument role categories, and selecting the category with the highest probability as the predicted argument role category;
an error-prone tag sequence generation step: dividing the word sequences into correctly predicted tag sequences and incorrectly predicted tag sequences according to the predicted argument role category and the annotated argument role category of each word;
a contrastive learning regularization step: performing representation learning on the erroneous tag sequences and the correct tag sequences, and training the fully connected network and the Transformer model with their contrastive loss as a regularization term;
an event argument detection step: sequentially inputting the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
In the event argument detection method based on label sequence consistency modeling, the word sequence semantic coding step comprises:
pre-encoding the input word sequence T with the corpus-pre-trained BERT language model to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
independently encoding the semantic information of the input event type E, with a parameter matrix V serving as the representation vector of each event type and participating in model training, concatenating the vector corresponding to the event type with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, and interactively fusing the two kinds of information through a fully connected network to finally obtain the context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information; the overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is the activation function.
In the event argument detection method based on label sequence consistency modeling, the word tag sequence labeling step comprises:
for the context representation h_i of each word fused with event information, predicting its tag probability distribution P = {p_0, p_1, …, p_n} with a linear layer, where p_i is the tag probability distribution vector of the i-th word:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with indices j, k ∈ {0, 1, 2} corresponding to the tags O, B, and I, and p_i^k denoting the probability that the i-th word is labeled with the k-th tag. The local predicted labels corresponding to the input text are L_pred = {p_0, p_1, …, p_n}, where the soft label of the i-th word is a vector of length 3.
A cross-entropy loss L_seq is calculated to optimize the sequence labeling part, serving as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
In the event argument detection method based on label sequence consistency modeling, the error-prone tag sequence generation step comprises:
converting the standard correct tag sequence into L_gold = {y_0, y_1, …, y_n}, where y_i is the correct (one-hot) label of the i-th word;
when the hard tag sequence L_greedy obtained by greedily decoding the local predicted labels L_pred is inconsistent with the correct tag sequence, taking it directly as the generated erroneous-tag negative sample, i.e. the negative sample set L_neg = L_greedy; based on the local predicted soft labels P produced by the sequence labeling module, greedy decoding computes
l_i = onehot(argmax_k p_i^k);
when the predicted result is consistent with the correct result, a dedicated negative sample generation procedure is needed: for the k-th argument mention in the current event, selecting the word inside the mention whose sequence label is most easily mislabeled, at position s_k in the text, and replacing its correct tag with an erroneous tag, thereby forming an erroneous tag sequence negative sample L_neg^k; the negative sample set is L_neg = {L_neg^1, …, L_neg^K}. The negative sampling replaces the tag at position s_k as
y'_{s_k} = onehot(mid(p_{s_k}))
where onehot converts an integer index into a one-hot encoded vector and mid takes the index of the median probability;
The contrastive learning regularization step comprises:
performing representation learning on each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}; for the BIO labels, setting a trainable label parameter matrix W_L in which each column corresponds to the feature vector of one BIO tag; from this matrix, the representation Q = {q_0, q_1, …, q_n} of the tag at each position in the sequence is obtained:
q_i = l_i · W_L
After obtaining the tag representations Q of the sequence, the tag information and the word vectors are fused with linear layers to obtain the word sense representation U = {u_0, u_1, …, u_n} fused with tag information:
u_i = W_5 · (W_4 · [h_i || q_i] + b_4) + b_5
where W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and || is the vector concatenation operation.
Sequence representation learning is performed on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} for each word and tag, and the average of the vectors over all positions is used as the final vector representation O ∈ {O_pred, O_gold, O_neg}:
Z = Transformer(U)
A triplet margin loss is used as the loss function of the constructed contrastive task:
L_cl = max(d(O_pred, O_gold) − d(O_pred, O_neg) + margin, 0)
where margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence to the erroneous sequence should exceed its distance to the correct sequence by at least margin; the loss functions of the sequence labeling task and the contrastive learning regularization task are trained jointly, where α and β are hyperparameters:
L = α · L_seq + β · L_cl
The event argument detection step comprises:
labeling with the greedy decoding method, taking the resulting L_greedy as the final tag sequence, and taking each word sequence whose tags consist of a B followed by consecutive I tags as one decoded argument.
The invention also provides an event argument detection system based on label sequence consistency modeling, which comprises:
the training corpus preprocessing module, which acquires a training corpus annotated with event argument role categories and event types, segments the text in the training corpus, and obtains the ID of each word according to the pre-training dictionary of the language representation model BERT;
the word sequence semantic coding module, which inputs the word sequence formed by all the word IDs into the multi-layer Transformer model of BERT to pre-encode the sub-word sequence, maps the event type to a distributed representation vector, concatenates this vector with each word vector, and fuses them through a linear network to obtain word sense representation vectors fused with event type information;
the word tag sequence labeling module, which inputs the word sense representation vectors into a fully connected network to obtain, for each word, a probability distribution over the event argument role categories, and selects the category with the highest probability as the predicted argument role category;
the error-prone tag sequence generation module, which divides the word sequences into correctly predicted tag sequences and incorrectly predicted tag sequences according to the predicted argument role category and the annotated argument role category of each word;
the contrastive learning regularization module, which performs representation learning on the erroneous tag sequences and the correct tag sequences, and trains the fully connected network and the Transformer model with their contrastive loss as a regularization term;
the event argument detection module, which sequentially inputs the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
In the event argument detection system based on label sequence consistency modeling, the word sequence semantic coding module:
pre-encodes the input word sequence T with the corpus-pre-trained BERT language model to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
independently encodes the semantic information of the input event type E, with a parameter matrix V serving as the representation vector of each event type and participating in model training, concatenates the vector corresponding to the event type with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, and interactively fuses the two kinds of information through a fully connected network to finally obtain the context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information; the overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is the activation function.
In the event argument detection system based on label sequence consistency modeling, the word tag sequence labeling module:
for the context representation h_i of each word fused with event information, predicts its tag probability distribution P = {p_0, p_1, …, p_n} with a linear layer, where p_i is the tag probability distribution vector of the i-th word:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with indices j, k ∈ {0, 1, 2} corresponding to the tags O, B, and I, and p_i^k denoting the probability that the i-th word is labeled with the k-th tag. The local predicted labels corresponding to the input text are L_pred = {p_0, p_1, …, p_n}, where the soft label of the i-th word is a vector of length 3.
A cross-entropy loss L_seq is calculated to optimize the sequence labeling part, serving as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
In the event argument detection system based on label sequence consistency modeling, the error-prone tag sequence generation module:
converts the standard correct tag sequence into L_gold = {y_0, y_1, …, y_n}, where y_i is the correct (one-hot) label of the i-th word;
when the hard tag sequence L_greedy obtained by greedily decoding the local predicted labels L_pred is inconsistent with the correct tag sequence, takes it directly as the generated erroneous-tag negative sample, i.e. the negative sample set L_neg = L_greedy; based on the local predicted soft labels P produced by the sequence labeling module, greedy decoding computes
l_i = onehot(argmax_k p_i^k);
when the predicted result is consistent with the correct result, a dedicated negative sample generation procedure is needed: for the k-th argument mention in the current event, the word inside the mention whose sequence label is most easily mislabeled, at position s_k in the text, is selected and its correct tag is replaced with an erroneous tag, thereby forming an erroneous tag sequence negative sample L_neg^k; the negative sample set is L_neg = {L_neg^1, …, L_neg^K}. The negative sampling replaces the tag at position s_k as
y'_{s_k} = onehot(mid(p_{s_k}))
where onehot converts an integer index into a one-hot encoded vector and mid takes the index of the median probability;
The contrastive learning regularization module:
performs representation learning on each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}; for the BIO labels, a trainable label parameter matrix W_L is set, in which each column corresponds to the feature vector of one BIO tag; from this matrix, the representation Q = {q_0, q_1, …, q_n} of the tag at each position in the sequence is obtained:
q_i = l_i · W_L
After obtaining the tag representations Q of the sequence, the tag information and the word vectors are fused with linear layers to obtain the word sense representation U = {u_0, u_1, …, u_n} fused with tag information:
u_i = W_5 · (W_4 · [h_i || q_i] + b_4) + b_5
where W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and || is the vector concatenation operation.
Sequence representation learning is performed on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} for each word and tag, and the average of the vectors over all positions is used as the final vector representation O ∈ {O_pred, O_gold, O_neg}:
Z = Transformer(U)
A triplet margin loss is used as the loss function of the constructed contrastive task:
L_cl = max(d(O_pred, O_gold) − d(O_pred, O_neg) + margin, 0)
where margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence to the erroneous sequence should exceed its distance to the correct sequence by at least margin; the loss functions of the sequence labeling task and the contrastive learning regularization task are trained jointly, where α and β are hyperparameters:
L = α · L_seq + β · L_cl
The event argument detection module:
labels with the greedy decoding method, takes the resulting L_greedy as the final tag sequence, and takes each word sequence whose tags consist of a B followed by consecutive I tags as one decoded argument.
The invention also provides a storage medium storing a program for executing any of the above event argument detection methods based on label sequence consistency modeling.
The invention also provides a client for any of the above event argument detection systems based on label sequence consistency modeling.
The advantages of the invention are as follows:
First, contrastive learning is used to improve the internal consistency of word tag sequences and to alleviate the missed and false detections caused by local tag errors and omissions. Second, the contrastive learning loss is used as a regularization term in joint training, mitigating the overfitting that a single sequence labeling objective is prone to. The performance of the event argument detection task is thereby improved: the event argument detection F1 value on the RAMS public test set reaches 44.2%, better than the existing BERT-based sequence labeling technique, whose F1 value is only 39.3%.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a block diagram of the overall process of the present invention.
Detailed Description
In order to overcome the defects of the prior art, the invention provides an event argument detection method based on label sequence consistency modeling. A specific sampling strategy generates, from the correct tag sequences, erroneous tag sequences that exhibit typical error characteristics; representation learning is performed separately on the correct and erroneous tag sequences, and through contrastive learning between the positive and negative samples the model learns to model the consistency of the tags. In addition, the contrastive learning task is trained jointly with the original sequence labeling task, alleviating the overfitting problem that a single sequence labeling model is prone to and improving the effect of argument detection.
The event argument role prediction method provided by the invention comprises the following steps:
1) Preprocessing the training corpus: the training corpus used in the invention is taken from the RAMS dataset. The text is tokenized with the WordPiece method, each word is converted into its ID in the BERT pre-training dictionary, custom [Event] special tags are added before and after the position of the trigger word, and finally [CLS] and [SEP] special tags consistent with the BERT pre-training tasks are added at the beginning and end of the sentence, respectively;
2) Word sequence semantic coding: pre-encoding is performed with the BERT pre-trained language model; the word ID sequence produced in the previous step is pre-encoded with the multi-layer Transformer model of BERT, obtaining the semantic features of the words from the language model BERT pre-trained on a large-scale corpus. Then the correct event type is mapped to a learnable distributed representation vector, which is concatenated with each word vector and fused through a linear network to obtain the word sense representation vectors fused with event type information.
3) Word tag sequence labeling: using the BIO tagging scheme, a fully connected network predicts, for each word sense representation vector, the probability distribution over the labels of the BIO scheme.
4) Generating error-prone tag sequences: confusable erroneous tag sequences are sampled according to the predicted probability distribution; a prediction that does not match the gold label category, i.e. a prediction failure, is treated as a confusable case.
5) Contrastive learning regularization: the tag sequences are semantically encoded with a Transformer model to obtain semantic representation vectors; representation learning is performed on the erroneous and correct tag sequences, and a contrastive learning task is constructed as a regularization term, improving the tag consistency of the word sequence.
In order to make the above features and effects of the present invention more clearly understood, the following specific examples are given with reference to the accompanying drawings.
The invention provides an event argument detection method based on label sequence consistency modeling; the overall flow of the method is shown in Fig. 1. The method mainly comprises word sequence semantic coding, word tag sequence labeling, error-prone tag sequence generation, and contrastive learning regularization. Semantic representations of the preprocessed words are learned with the pre-trained language model BERT, and event type information is fused into the representation vectors; word tag sequence labeling uses a fully connected network to predict the tag probability distribution for each word; error-prone tag sequences are generated from the word tag probability distributions according to a sampling strategy; contrastive learning regularization constructs a regularization loss by contrasting the error-prone tag sequences with the correct tag sequences, improving the consistency of the word sequence tags. The specific method comprises the following steps:
S1: pre-encode with the BERT pre-trained language model. Custom [Event] special tags are added before and after the position of the trigger word, and [CLS] and [SEP] special tags consistent with the BERT pre-training tasks are added at the beginning and end of the sentence, respectively; these tags mark the sentence boundaries for BERT, and keeping them consistent with pre-training yields more accurate semantic features.
The processed word sequence is input into the multi-layer Transformer model of BERT to pre-encode the word sequence. Then the event type is mapped, through a lookup, to a learnable distributed representation vector, which is concatenated with each word vector and fused through a linear network to obtain the word sense representation vectors fused with event type information. Here a word vector is the embedding at each word position obtained from the BERT language model encoding.
S2: based on the BIO tagging scheme, a fully connected network predicts, from the semantic representation vector of each word, the probability distribution over the labels, with ReLU as the activation function and a Softmax function modeling the probability distribution.
S3: confusable erroneous tag sequences are generated by sampling according to the estimated probability distribution.
S4: representation learning is performed on the erroneous and correct tag sequences, a contrastive learning task is constructed, and its loss participates in training as a regularization term, improving the tag consistency of the word sequence; a triplet margin loss is adopted.
Specifically, S1 comprises 3 sub-steps as follows.
S101: preprocess the training data. The text is segmented with the WordPiece tokenizer of the Transformers library, custom [Event] special tags are added before and after the position of the trigger word, [CLS] and [SEP] special tags consistent with the BERT pre-training tasks are added at the beginning and end of the sentence, respectively, and the inputs of a batch are padded to the same length, that of the longest text in the batch.
S102: BERT pre-trained model encoding. Pre-encoding the input word sequence T with the BERT language model pre-trained on a large-scale corpus yields richer dynamic semantic representations than traditional static word vectors, C = {c_0, c_1, …, c_n}, where c_n is the dynamic semantic representation of the n-th word:
C = BERT(T)
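As an illustration, a minimal sketch of steps S101 and S102 follows, assuming the HuggingFace transformers library with bert-base-uncased as the backbone; the [Event] marker is registered as a custom special token, and the sentence and trigger index are hypothetical example inputs.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": ["[Event]"]})
bert = BertModel.from_pretrained("bert-base-uncased")
bert.resize_token_embeddings(len(tokenizer))  # account for the new [Event] token

def encode(words, trigger_idx):
    # S101: wrap the trigger word with [Event] markers; the tokenizer then
    # performs WordPiece segmentation and adds [CLS]/[SEP] itself.
    marked = (words[:trigger_idx] + ["[Event]", words[trigger_idx], "[Event]"]
              + words[trigger_idx + 1:])
    enc = tokenizer(marked, is_split_into_words=True, return_tensors="pt")
    # S102: C = BERT(T), one dynamic semantic vector per word piece.
    with torch.no_grad():
        C = bert(**enc).last_hidden_state  # shape (1, seq_len, 768)
    return enc, C

enc, C = encode(["police", "arrested", "the", "suspect"], trigger_idx=1)
```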
S103: independently encode the semantic information of the input event type E. An independent trainable parameter matrix V serves as the representation vector of each event type and participates in model training; the vector corresponding to the event type is then concatenated with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, where x_n is the intermediate semantic representation of the n-th word. Finally the two kinds of information are interactively fused through a fully connected network to obtain the final context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information. The overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, and W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms. ReLU is a nonlinear rectifier, used here as the activation function.
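A hedged PyTorch sketch of the S103 fusion follows; the dimensions (768 for BERT, 64 for the event embedding, 256 hidden) and the number of event types are illustrative assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class EventFusion(nn.Module):
    def __init__(self, n_event_types, bert_dim=768, event_dim=64, hidden=256):
        super().__init__()
        self.V = nn.Embedding(n_event_types, event_dim)    # parameter matrix V
        self.W1 = nn.Linear(bert_dim + event_dim, hidden)  # W_1, b_1
        self.W2 = nn.Linear(hidden, bert_dim)              # W_2, b_2

    def forward(self, C, event_type_id):
        # x_i = [c_i || V(E)]: append the event vector to every word vector.
        v = self.V(event_type_id).unsqueeze(1).expand(-1, C.size(1), -1)
        X = torch.cat([C, v], dim=-1)
        # h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
        return self.W2(torch.relu(self.W1(X)))

fusion = EventFusion(n_event_types=139)  # event type count is illustrative
C = torch.randn(1, 8, 768)               # dummy BERT output standing in for S102
H = fusion(C, torch.tensor([3]))         # (1, 8, 768) fused representations
```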
S2 comprises the following sub-steps.
S201: for the context representation h_i of each word fused with event information, the invention likewise uses a linear layer to predict its tag probability distribution P = {p_0, p_1, …, p_n}, where p_i is the tag probability distribution vector of the i-th word, calculated as:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with j, k ∈ {0, 1, 2} indexing the tags O, B, and I, so that p_i^k is the probability that the i-th word is labeled with the k-th tag. This calculation yields the local predicted labels L_pred = {p_0, p_1, …, p_n} for the input text. The predicted labels are designed here as soft labels: the soft label of the i-th word is a vector of length 3.
S202: a cross-entropy loss L_seq is calculated to optimize the sequence labeling part, serving as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
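A short sketch of S201 and S202 under the same assumptions (PyTorch; dummy tensors standing in for the fused representations H and for gold O/B/I indices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

H = torch.randn(1, 8, 768)                       # fused word vectors from S103
gold = torch.tensor([[0, 1, 2, 2, 0, 0, 0, 0]])  # hypothetical gold O/B/I ids

W3 = nn.Linear(768, 3)       # W_3, b_3: logits over {O, B, I}
Z = W3(H)                    # z_i = W_3 · h_i + b_3
P = F.softmax(Z, dim=-1)     # soft labels p_i
# L_seq: cross entropy between the predicted distributions and the gold tags
loss_seq = F.cross_entropy(Z.view(-1, 3), gold.view(-1))
```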
Likewise, S3 is divided into two sub-steps.
S301: process the correct tag sequence. The standard correct tag sequence is converted into L_gold = {y_0, y_1, …, y_n}, where y_i is the correct label of the i-th word. In contrast to the local predicted labels L_pred, hard tags represented as one-hot encoded vectors are used here: e.g. the tag vector corresponding to a B tag is [0, 1, 0] and the vector corresponding to an I tag is [0, 0, 1].
S302: generate the erroneous tag sequence. When the hard tag sequence L_greedy obtained by greedily decoding the predicted soft labels L_pred is inconsistent with the correct tag sequence, the invention directly takes it as the generated erroneous-tag negative sample, with no additional negative sampling needed, i.e. the negative sample set L_neg = L_greedy. Based on the local predicted soft labels P obtained by the sequence labeling module, the greedy decoding process is:
l_i = onehot(argmax_k p_i^k)
When the predicted result is consistent with the correct result, a dedicated negative sample generation procedure is needed. Denote the k-th argument mention in the current event as A_k = (b, e, a), where b is the start position (begin) of the argument mention, e is its end position (end), and a indicates that the position belongs to an argument (argument) rather than a trigger word. Among the internal words of the mention A_k, the word most easily mislabeled by the sequence labeler, at position s_k in the text, is selected, and its correct tag is replaced with the most confusable erroneous tag, thereby forming the erroneous tag sequence negative sample L_neg^k; the negative sample set is L_neg = {L_neg^1, …, L_neg^K}. The specific negative sampling process is:
s_k = argmin_{b ≤ i ≤ e} (p_i^(1) − p_i^(2)), y'_{s_k} = onehot(mid(p_{s_k}))
where p_i^(1) and p_i^(2) are the largest and second-largest components of p_i, onehot is the operation of converting an integer index into a one-hot encoded vector, and mid is the operation of taking the index of the median probability. When the two largest probabilities of a word are close to each other, the judgment of its label is less reliable and a prediction error is likely; therefore the word whose top label has the least confidence margin is selected, and its label is replaced with the other erroneous label the model confuses it with.
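The following sketch reconstructs S301 and S302 under the notation above; the "least-confident word, median-probability label" rule follows the description of the mid operation, and the exact tie-breaking details are assumptions.

```python
import torch
import torch.nn.functional as F

def greedy_decode(P):
    # l_i = onehot(argmax_k p_i^k)
    return F.one_hot(P.argmax(dim=-1), num_classes=3).float()

def make_negative(P, L_gold, span):
    """P: (1, n, 3) soft labels; L_gold: (1, n, 3) one-hot gold tags;
    span: (b, e) word positions of the k-th argument mention."""
    L_greedy = greedy_decode(P)
    if not torch.equal(L_greedy, L_gold):
        return L_greedy                 # a genuine model error: use it directly
    b, e = span
    top2 = P[0, b:e + 1].topk(2, dim=-1).values
    s_k = b + (top2[:, 0] - top2[:, 1]).argmin().item()  # least confident word
    L_neg = L_gold.clone()
    # mid: the median-probability (runner-up) label is the most confusable one
    runner_up = P[0, s_k].argsort()[1]
    L_neg[0, s_k] = F.one_hot(runner_up, num_classes=3).float()
    return L_neg
```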
Finally, for step S4, the specific construction flow is as follows. The invention first performs representation learning on each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}. For the BIO labels, the invention sets a trainable label parameter matrix W_L, in which each column corresponds to the feature vector of one BIO tag. From this matrix, the representation Q = {q_0, q_1, …, q_n} of the tag at each position in the sequence can be obtained:
q_i = l_i · W_L
After obtaining the tag representations Q of the sequence, the tag information and the word vectors are fused with linear layers to obtain the word sense representation U = {u_0, u_1, …, u_n} fused with tag information:
u_i = W_5 · (W_4 · [h_i || q_i] + b_4) + b_5
where W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and || is the vector concatenation operation.
Finally, the invention performs sequence representation learning on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} corresponding to each word and tag, and uses the average of the vectors over all positions as the final vector representation O ∈ {O_pred, O_gold, O_neg}:
Z = Transformer(U)
To make O_pred close to O_gold and far from O_neg, the invention uses a triplet margin loss (Triplet margin loss) as the loss function of the constructed contrastive task:
L_cl = max(d(O_pred, O_gold) − d(O_pred, O_neg) + margin, 0)
where margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence representation to the erroneous sequence representation should exceed its distance to the correct sequence representation by at least margin. Finally, the invention trains jointly with the loss functions of both the sequence labeling task and the contrastive learning regularization task, where α and β are hyperparameters:
L = α · L_seq + β · L_cl
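A minimal sketch of the S4 contrastive module follows, assuming PyTorch; the Transformer encoder size, the use of nn.TripletMarginLoss, and the weights α and β are illustrative choices, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class SeqRepr(nn.Module):
    def __init__(self, bert_dim=768, tag_dim=32, hidden=256):
        super().__init__()
        # W_L: one feature vector per BIO tag (rows here, transposed relative
        # to the text's column convention)
        self.W_L = nn.Parameter(torch.randn(3, tag_dim))
        self.fuse = nn.Sequential(nn.Linear(bert_dim + tag_dim, hidden),
                                  nn.Linear(hidden, hidden))  # W_4/b_4, W_5/b_5
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, H, L):
        Q = L @ self.W_L                          # q_i = l_i · W_L
        U = self.fuse(torch.cat([H, Q], dim=-1))  # u_i fuses tag and word info
        Z = self.encoder(U)                       # Z = Transformer(U)
        return Z.mean(dim=1)                      # mean-pool -> O

H = torch.randn(1, 8, 768)
L_gold = torch.eye(3)[torch.tensor([[0, 1, 2, 2, 0, 0, 0, 0]])]  # one-hot tags
L_pred = torch.softmax(torch.randn(1, 8, 3), dim=-1)             # soft labels
L_neg = L_gold.clone(); L_neg[0, 2] = torch.tensor([1.0, 0.0, 0.0])

net = SeqRepr()
O_pred, O_gold, O_neg = net(H, L_pred), net(H, L_gold), net(H, L_neg)
loss_cl = nn.TripletMarginLoss(margin=1.0)(O_pred, O_gold, O_neg)
loss_seq = torch.tensor(0.7)        # placeholder for the S202 cross entropy
loss = 1.0 * loss_seq + 0.5 * loss_cl  # L = α·L_seq + β·L_cl, α/β assumed
```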
In the prediction stage, the invention labels with the greedy decoding method, takes the resulting L_greedy as the final tag sequence, and takes each word sequence whose tags consist of a B followed by consecutive I tags as one decoded argument. Fig. 2 is a block diagram of the overall method of the present invention.
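A small sketch of this decoding rule (plain Python, tags as O/B/I strings):

```python
def decode_arguments(tags):
    """Turn a BIO tag list into (start, end) argument spans: each span is a
    B tag followed by consecutive I tags; isolated I tags are ignored."""
    spans, start = [], None
    for i, t in enumerate(tags):
        if t == "B":                    # a new argument begins
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif t == "O" and start is not None:
            spans.append((start, i - 1))
            start = None
        # t == "I" simply extends the currently open span, if any
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans

print(decode_arguments(["O", "B", "I", "I", "O", "B", "O"]))  # [(1, 3), (5, 5)]
```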
The following is a system embodiment corresponding to the above method embodiment; the two may be implemented in cooperation. The related technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the related technical details mentioned in this embodiment can also be applied to the above embodiment.
The invention also provides an event argument detection system based on label sequence consistency modeling, which comprises:
the training corpus preprocessing module, which acquires a training corpus annotated with event argument role categories and event types, segments the text in the training corpus, and obtains the ID of each word according to the pre-training dictionary of the language representation model BERT;
the word sequence semantic coding module, which inputs the word sequence formed by all the word IDs into the multi-layer Transformer model of BERT to pre-encode the sub-word sequence, maps the event type to a distributed representation vector, concatenates this vector with each word vector, and fuses them through a linear network to obtain word sense representation vectors fused with event type information;
the word tag sequence labeling module, which inputs the word sense representation vectors into a fully connected network to obtain, for each word, a probability distribution over the event argument role categories, and selects the category with the highest probability as the predicted argument role category;
the error-prone tag sequence generation module, which divides the word sequences into correctly predicted tag sequences and incorrectly predicted tag sequences according to the predicted argument role category and the annotated argument role category of each word;
the contrastive learning regularization module, which performs representation learning on the erroneous tag sequences and the correct tag sequences, and trains the fully connected network and the Transformer model with their contrastive loss as a regularization term;
the event argument detection module, which sequentially inputs the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
In the event argument detection system based on label sequence consistency modeling, the word sequence semantic coding module:
pre-encodes the input word sequence T with the corpus-pre-trained BERT language model to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
independently encodes the semantic information of the input event type E, with a parameter matrix V serving as the representation vector of each event type and participating in model training, concatenates the vector corresponding to the event type with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, and interactively fuses the two kinds of information through a fully connected network to finally obtain the context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information; the overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is the activation function.
In the event argument detection system based on label sequence consistency modeling, the word tag sequence labeling module:
for the context representation h_i of each word fused with event information, predicts its tag probability distribution P = {p_0, p_1, …, p_n} with a linear layer, where p_i is the tag probability distribution vector of the i-th word:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with indices j, k ∈ {0, 1, 2} corresponding to the tags O, B, and I, and p_i^k denoting the probability that the i-th word is labeled with the k-th tag. The local predicted labels corresponding to the input text are L_pred = {p_0, p_1, …, p_n}, where the soft label of the i-th word is a vector of length 3.
A cross-entropy loss L_seq is calculated to optimize the sequence labeling part, serving as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
In the event argument detection system based on label sequence consistency modeling, the error-prone tag sequence generation module:
converts the standard correct tag sequence into L_gold = {y_0, y_1, …, y_n}, where y_i is the correct (one-hot) label of the i-th word;
when the hard tag sequence L_greedy obtained by greedily decoding the local predicted labels L_pred is inconsistent with the correct tag sequence, takes it directly as the generated erroneous-tag negative sample, i.e. the negative sample set L_neg = L_greedy; based on the local predicted soft labels P produced by the sequence labeling module, greedy decoding computes
l_i = onehot(argmax_k p_i^k);
when the predicted result is consistent with the correct result, a dedicated negative sample generation procedure is needed: for the k-th argument mention in the current event, the word inside the mention whose sequence label is most easily mislabeled, at position s_k in the text, is selected and its correct tag is replaced with an erroneous tag, thereby forming an erroneous tag sequence negative sample L_neg^k; the negative sample set is L_neg = {L_neg^1, …, L_neg^K}. The negative sampling replaces the tag at position s_k as
y'_{s_k} = onehot(mid(p_{s_k}))
where onehot converts an integer index into a one-hot encoded vector and mid takes the index of the median probability;
The contrastive learning regularization module:
performs representation learning on each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}; for the BIO labels, a trainable label parameter matrix W_L is set, in which each column corresponds to the feature vector of one BIO tag; from this matrix, the representation Q = {q_0, q_1, …, q_n} of the tag at each position in the sequence is obtained:
q_i = l_i · W_L
After obtaining the tag representations Q of the sequence, the tag information and the word vectors are fused with linear layers to obtain the word sense representation U = {u_0, u_1, …, u_n} fused with tag information:
u_i = W_5 · (W_4 · [h_i || q_i] + b_4) + b_5
where W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and || is the vector concatenation operation.
Sequence representation learning is performed on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} for each word and tag, and the average of the vectors over all positions is used as the final vector representation O ∈ {O_pred, O_gold, O_neg}:
Z = Transformer(U)
A triplet margin loss is used as the loss function of the constructed contrastive task:
L_cl = max(d(O_pred, O_gold) − d(O_pred, O_neg) + margin, 0)
where margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence to the erroneous sequence should exceed its distance to the correct sequence by at least margin; the loss functions of the sequence labeling task and the contrastive learning regularization task are trained jointly, where α and β are hyperparameters:
L = α · L_seq + β · L_cl
The event argument detection module:
labels with the greedy decoding method, takes the resulting L_greedy as the final tag sequence, and takes each word sequence whose tags consist of a B followed by consecutive I tags as one decoded argument.
The invention also provides a storage medium storing a program for executing any of the above event argument detection methods based on label sequence consistency modeling.
The invention also provides a client for any of the above event argument detection systems based on label sequence consistency modeling.

Claims (10)

1. An event argument detection method based on label sequence consistency modeling, characterized by comprising the following steps:
a training corpus preprocessing step: acquiring a training corpus annotated with event argument role categories and event types, segmenting the text in the training corpus, and obtaining the ID of each word according to the pre-training dictionary of the language representation model BERT;
a word sequence semantic coding step: inputting the word sequence formed by all the word IDs into the multi-layer Transformer model of BERT to pre-encode the sub-word sequence, mapping the event type to a distributed representation vector, concatenating this vector with each word vector, and fusing them through a linear network to obtain word sense representation vectors fused with event type information;
a word tag sequence labeling step: inputting the word sense representation vectors into a fully connected network to obtain, for each word, a probability distribution over the event argument role categories, and selecting the category with the highest probability as the predicted argument role category;
an error-prone tag sequence generation step: dividing the word sequences into correctly predicted tag sequences and incorrectly predicted tag sequences according to the predicted argument role category and the annotated argument role category of each word;
a contrastive learning regularization step: performing representation learning on the erroneous tag sequences and the correct tag sequences, and training the fully connected network and the Transformer model with their contrastive loss as a regularization term;
an event argument detection step: sequentially inputting the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
2. The event argument detection method based on label sequence consistency modeling of claim 1, wherein the word sequence semantic coding step comprises:
pre-encoding the input word sequence T with the corpus-pre-trained BERT language model to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
independently encoding the semantic information of the input event type E, with a parameter matrix V serving as the representation vector of each event type and participating in model training, concatenating the vector corresponding to the event type with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}, and interactively fusing the two kinds of information through a fully connected network to finally obtain the context vector representation H = {h_0, h_1, …, h_n} of each word fused with event information; the overall calculation is:
x_i = [c_i || V(E)]
h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
where V(E) is the vector representation of event type E in the parameter matrix V, || denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is the activation function.
3. The event argument detection method based on label sequence consistency modeling of claim 2, wherein the word tag sequence labeling step comprises:
for the context representation h_i of each word fused with event information, predicting its tag probability distribution P = {p_0, p_1, …, p_n} with a linear layer, where p_i is the tag probability distribution vector of the i-th word:
z_i = W_3 · h_i + b_3
p_i = Softmax(z_i)
where W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector, with indices j, k ∈ {0, 1, 2} corresponding to the tags O, B, and I, and p_i^k denoting the probability that the i-th word is labeled with the k-th tag; the local predicted labels corresponding to the input text are L_pred = {p_0, p_1, …, p_n}, where the soft label of the i-th word is a vector of length 3;
calculating a cross-entropy loss L_seq to optimize the sequence labeling part, as the loss function of the sequence labeling task:
L_seq = −Σ_i Σ_j y_i^j · log p_i^j
where y_i^j denotes the ground-truth label of the i-th word at the j-th position.
4. The method for event argument detection based on tag sequence consistency modeling of claim 3, wherein the error prone tag sequence generating step comprises:
conversion of standard correct tag sequences intoWherein->The correct label corresponding to the ith word;
when locally predicting label L pred Hard tag sequence L by greedy decoding greedy When the sequence is inconsistent with the correct label sequence, the sequence is taken as a generated false label negative sample, and the negative sample set L neg =L greedy The method comprises the steps of carrying out a first treatment on the surface of the According to the local prediction soft label P obtained by the sequence labeling module, the greedy decoding process comprises the following steps:
when the predicted result is consistent with the correct result, a specific negative sample generation flow is needed, and the kth argument in the current event is namedSelecting word with wrong sequence marking in the inner word of the argument index>s k Representing the position of the word in the text and then replacing its corresponding correct tag with an error tag, thereby constituting a negative sample of the error tag sequence +. >Negative sample set +.> The specific negative sampling process is as follows:
wherein onehot is an operation of converting an integer index into a one-hot encoding vector, mid is a median fetching operation;
the contrast learning regularization step includes:
for each tag sequence L ε { L gold ,L neg ,L pred Representation learning is performed, where l= { L 0 ,l 1 ,…,l n -a }; for BIO labels, a label parameter matrix W to be trained is set L Wherein each column corresponds to a feature vector of each tag in the BIO according to the matrix W L The representation q= { Q for each position tag in each tag sequence can be obtained 0 ,q 1 ,…,q n }:
q i =l i ·W L
After obtaining the tag expression Q in the sequence, fusing the tag and the information of the word vector by using a linear layer to obtain the word sense expression U= { U of fused tag information 0 ,u 1 ,…,u n }:
u i =W 5 ·(W 4 ·[h i ||q i ]+b 4 )+b 5
Wherein W is 4 ,W 5 ,b 4 ,b 5 Is a linear transformation matrix and a corresponding bias term thereof, and I is vector splicing operation.
Using a transducer pair USequence representation learning is carried out to obtain a representation vector Z= { Z corresponding to each word and each label 0 ,z 1 ,…,z n Using the average value of the vectors of each position as the final vector representation O E { O } pred ,O gold ,O neg }:
Z=Transformer(U)
Using ternary interval loss function as loss function for build contrast task
Wherein margin is a super parameter, meaning that the difference between the distance from the predicted tag sequence position to the error sequence position and the distance from the error sequence position to the correct sequence position in the expression space should not be less than margin; the loss function joint training of both the sequence labeling task and the contrast learning regularization task is used, wherein alpha and beta are super parameters:
The event argument detection step comprises:
Labeling is performed by the greedy decoding method, the obtained L_greedy is taken as the final tag sequence, and each word sequence corresponding to a B tag followed by several consecutive I tags in the tag sequence is taken as one decoded argument.
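The decoding rule maps directly to a few lines of Python; the sentence and tag sequence below are illustrative, not taken from the patent:

```python
def decode_arguments(tags, words):
    """Read every maximal 'B followed by consecutive I tags' run as one argument."""
    spans, i = [], 0
    while i < len(tags):
        if tags[i] == 1:                             # B: an argument starts here
            j = i + 1
            while j < len(tags) and tags[j] == 2:    # absorb the consecutive I tags
                j += 1
            spans.append("".join(words[i:j]))
            i = j
        else:
            i += 1
    return spans

words = list("警方今日逮捕了嫌疑人张某")
tags  = [1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2]        # 0 = O, 1 = B, 2 = I (illustrative)
print(decode_arguments(tags, words))                 # -> ['警方', '张某']
```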
5. An event argument detection system based on tag sequence consistency modeling, comprising:
the training corpus preprocessing module is used for acquiring a training corpus annotated with event argument role categories and event types, segmenting the texts in the training corpus, and obtaining the ID of each word in the pre-training dictionary of the language representation model BERT (a tokenization sketch follows this claim);
the word sequence semantic coding module inputs the word sequence formed by all word IDs into the multi-layer Transformer model of BERT to pre-encode the word sequence, maps the event type into a distributed representation vector, concatenates it with each word vector, and fuses them through a linear network to obtain word-sense representation vectors fused with the event type information;
the word tag sequence labeling module inputs the word-sense representation vectors into a fully connected network to obtain, for each vector, a probability distribution over the event argument role categories, and selects the category with the highest probability as the predicted argument role category;
the error-prone tag sequence generation module divides the word sequences into correctly predicted tag sequences and erroneously predicted tag sequences according to each word's predicted argument role category and the annotated event argument role category;
the contrastive learning regularization module performs representation learning on the erroneous tag sequences and the correct tag sequences, and trains the fully connected network and the Transformer model with the loss between them as a regularization term;
and the event argument detection module sequentially inputs the text whose event arguments are to be detected, together with its event type, into the trained Transformer model and fully connected network to obtain the argument role categories of the text.
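For the preprocessing module above, a minimal sketch using the Hugging Face transformers library follows; the checkpoint name and example sentence are assumptions for illustration, since the claim only requires the pre-training dictionary of a BERT language representation model.

```python
from transformers import BertTokenizer

# "bert-base-chinese" is an illustrative checkpoint, not one named by the patent.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

text = "警方今日逮捕了嫌疑人张某"
tokens = tokenizer.tokenize(text)                 # segment the text into words
ids = tokenizer.convert_tokens_to_ids(tokens)     # ID of each word in the dictionary
print(list(zip(tokens, ids)))
```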
6. The tag sequence consistency modeling based event argument detection system of claim 5, wherein the word sequence semantic coding module comprises:
the BERT language model, pre-trained on a large corpus, pre-encodes the input word sequence T to obtain richer dynamic semantic representations C = {c_0, c_1, …, c_n};
the semantic information of the input event type E is encoded independently: a parameter matrix V provides a trainable representation vector for each event type, and the vector corresponding to the event type is concatenated with the output vectors of BERT to obtain intermediate representations X = {x_0, x_1, …, x_n}; the two kinds of information are then interactively fused through a fully connected network, finally yielding for each word a context vector representation H = {h_0, h_1, …, h_n} that fuses the event information. The overall calculation process is:
x_i = [c_i \| V(E)]

h_i = W_2 \cdot \mathrm{ReLU}(W_1 \cdot x_i + b_1) + b_2

wherein V(E) denotes the vector representation corresponding to event type E in the parameter matrix V, \| denotes the vector concatenation operation, W_1, W_2, b_1, b_2 are linear transformation matrices and their corresponding bias terms, and ReLU is an activation function.
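Under the assumptions noted in the comments, the module's computation can be sketched with the Hugging Face transformers API:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Checkpoint name, number of event types, and the example sentence are
# illustrative assumptions; the claim only fixes the computation pattern.
name = "bert-base-chinese"
bert, tok = BertModel.from_pretrained(name), BertTokenizer.from_pretrained(name)
hid = bert.config.hidden_size

V  = nn.Embedding(33, hid)                 # parameter matrix V: one vector per event type
W1 = nn.Linear(2 * hid, hid)               # W_1, b_1
W2 = nn.Linear(hid, hid)                   # W_2, b_2

enc = tok("警方今日逮捕了嫌疑人张某", return_tensors="pt")
C = bert(**enc).last_hidden_state          # c_i: dynamic semantic representations
E = torch.tensor([7])                      # illustrative event-type index
V_E = V(E)[:, None, :].expand(-1, C.size(1), -1)
X = torch.cat([C, V_E], dim=-1)            # x_i = [c_i || V(E)]
H = W2(torch.relu(W1(X)))                  # h_i = W_2 · ReLU(W_1 · x_i + b_1) + b_2
print(H.shape)                             # (1, sequence length, hidden size)
```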
7. The tag sequence consistency modeling based event argument detection system of claim 6, wherein the word tag sequence labeling module comprises:
for each word's contextual representation h_i that fuses the event information, a linear layer predicts its tag probability distribution P = {p_0, p_1, …, p_n}, with p_i the tag probability distribution vector of the i-th word:

z_i = W_3 \cdot h_i + b_3

wherein W_3 and b_3 are a linear transformation matrix and its corresponding bias term; the transformation yields a 3-dimensional vector z_i, and j, k ∈ {0, 1, 2} denote the indexes corresponding to the tags O, B and I respectively. The probability that the i-th word is labeled with the k-th tag is

p_i^k = \frac{\exp(z_i^k)}{\sum_j \exp(z_i^j)}

and the local predicted label corresponding to the input text is L_pred = P = {p_0, p_1, …, p_n}, wherein p_i, the soft label corresponding to the i-th word, is a vector of length 3.
A cross-entropy loss function L_seq for optimizing the sequence labeling part is calculated as the loss function corresponding to the sequence labeling task:

L_{seq} = -\sum_{i=0}^{n} \sum_{j=0}^{2} y_i^j \log p_i^j

wherein y_i^j indicates whether the real label of the i-th word is the j-th tag.
8. The tag sequence consistency modeling based event argument detection system of claim 7, wherein the error-prone tag sequence generating module comprises:
The standard correct tag sequence is converted into L_gold = {l_0^gold, l_1^gold, …, l_n^gold}, wherein l_i^gold is the (one-hot encoded) correct label corresponding to the i-th word.
When the hard tag sequence L_greedy obtained by greedy decoding of the local predicted label L_pred is inconsistent with the correct tag sequence, it is taken as a generated erroneous-tag negative sample, and the negative sample set L_neg = L_greedy. Based on the local predicted soft labels P produced by the sequence labeling module, the greedy decoding process is:

l_i^{greedy} = \mathrm{onehot}(\arg\max_k p_i^k)
When the predicted result is consistent with the correct result, a dedicated negative-sample generation flow is needed. Denote the k-th argument in the current event as a_k; the word inside the argument span that is most prone to being mislabeled is selected, with s_k denoting the position of that word in the text, and its corresponding correct tag is replaced with an erroneous tag, thereby constituting a negative sample of the erroneous tag sequence, giving the negative sample set L_neg = {l_0^neg, l_1^neg, …, l_n^neg}. The specific negative sampling process is:

s_k = \mathrm{mid}(\{start_k, \ldots, end_k\})

with the tag at position s_k replaced by the one-hot encoding of an incorrect tag index while all other positions keep their correct tags; wherein onehot(·) is an operation converting an integer index into a one-hot encoding vector, and mid(·) is a median-taking operation;
The contrastive learning regularization module includes:
Representation learning is performed for each tag sequence L ∈ {L_gold, L_neg, L_pred}, where L = {l_0, l_1, …, l_n}. For the BIO tags, a trainable tag parameter matrix W_L is set, wherein each column corresponds to the feature vector of one of the tags B, I, O; according to the matrix W_L, the representation Q = {q_0, q_1, …, q_n} of the tag at each position of each tag sequence is obtained:

q_i = l_i \cdot W_L

After the tag representations Q of the sequence are obtained, a linear layer fuses the tag information with the word vectors to obtain word-sense representations U = {u_0, u_1, …, u_n} that incorporate the tag information:

u_i = W_5 \cdot (W_4 \cdot [h_i \| q_i] + b_4) + b_5

wherein W_4, W_5, b_4, b_5 are linear transformation matrices and their corresponding bias terms, and \| denotes the vector concatenation operation.
Sequence representation learning is performed on U with a Transformer to obtain the representation vector Z = {z_0, z_1, …, z_n} corresponding to each word and tag, and the mean of the vectors over all positions is used as the final vector representation O ∈ {O_pred, O_gold, O_neg}:

Z = \mathrm{Transformer}(U)

O = \frac{1}{n+1} \sum_{i=0}^{n} z_i
A triplet margin loss is used as the loss function of the contrastive task:

L_{cl} = \max(0,\; margin + d(O_{pred}, O_{gold}) - d(O_{pred}, O_{neg}))

wherein margin is a hyperparameter, meaning that in the representation space the distance from the predicted tag sequence representation to the erroneous sequence representation should exceed its distance to the correct sequence representation by at least margin. The model is trained jointly with the loss functions of both the sequence labeling task and the contrastive learning regularization task, wherein α and β are hyperparameters:

L = \alpha \cdot L_{seq} + \beta \cdot L_{cl}
The event argument detection module comprises:
Labeling is performed by the greedy decoding method, the obtained L_greedy is taken as the final tag sequence, and each word sequence corresponding to a B tag followed by several consecutive I tags in the tag sequence is taken as one decoded argument.
9. A storage medium storing a program for executing the event argument detection method based on tag sequence consistency modeling according to any one of claims 1 to 4.
10. A client for use in the event argument detection system based on tag sequence consistency modeling of any one of claims 5 to 8.
CN202310388963.2A 2023-04-12 2023-04-12 Event argument detection method and system based on label sequence consistency modeling Pending CN116595407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310388963.2A CN116595407A (en) 2023-04-12 2023-04-12 Event argument detection method and system based on label sequence consistency modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310388963.2A CN116595407A (en) 2023-04-12 2023-04-12 Event argument detection method and system based on label sequence consistency modeling

Publications (1)

Publication Number Publication Date
CN116595407A true CN116595407A (en) 2023-08-15

Family

ID=87598051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310388963.2A Pending CN116595407A (en) 2023-04-12 2023-04-12 Event argument detection method and system based on label sequence consistency modeling

Country Status (1)

Country Link
CN (1) CN116595407A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118013201A (en) * 2024-03-07 2024-05-10 暨南大学 Flow anomaly detection method and system based on improved BERT fusion contrast learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination