CN115563253A - Multi-task event extraction method and device based on question answering - Google Patents

Multi-task event extraction method and device based on question answering

Info

Publication number
CN115563253A
Authority
CN
China
Prior art keywords
event
vector
word
input
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211079899.1A
Other languages
Chinese (zh)
Inventor
苏锦钿
李泽苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202211079899.1A
Publication of CN115563253A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a multi-task event extraction method and device based on question answering, wherein the method comprises the following steps: acquiring a first input vector; inputting the first input vector into a trigger word extraction model to obtain the position information of the trigger word in the original text; acquiring a second input vector; inputting the second input vector into an event recognition model, screening out the event samples with correct trigger words, and acquiring the event type to which each event sample belongs; generating a question for a specified argument role type according to a question template that introduces trigger word information and the event type, and acquiring a third input vector; and inputting the third input vector into an argument role extraction model, screening out the event samples whose trigger word extraction and event identification results are both correct, and obtaining the argument positions of the specified argument roles. By introducing an auxiliary screening task, the negative influence of error propagation on model performance is reduced. The invention can be widely applied to the technical field of natural language processing.

Description

Multi-task event extraction method and device based on question answering
Technical Field
The invention relates to the technical field of natural language processing, in particular to a multi-task event extraction method and device based on question answering.
Background
Event extraction is a classic information extraction task in the NLP field and is also the basis of practical applications such as information retrieval, intelligent question answering and knowledge graph construction. In the era of information explosion, people cannot manually process the enormous volume of unstructured data quickly enough and with stable quality. In the field of knowledge graphs, an event as defined in the Automatic Content Extraction (ACE) evaluation is an occurrence or state change, consisting of one or more actions in which one or more roles participate, that occurs at a particular point or period in time within a particular geographic area. The event extraction task studies how to extract event information of interest to the user from text describing that information, and how to present it in a structured form.
Event extraction technology extracts the events a user is interested in from unstructured information and presents them to the user in a structured form. The event extraction task can be decomposed into 4 subtasks: trigger word extraction, event identification, argument extraction and role classification. Argument extraction and role classification can be merged into an argument role extraction task. Event identification is a word-based multi-classification task that judges the event type to which each word in the sentence belongs. The argument role extraction task is a word-pair-based multi-classification task that judges the role relationship between any pair of trigger word and entity in the sentence.
Existing event extraction models have the following problems: 1) existing work based on pipeline models often suffers from error propagation, where the output of an earlier model has a negative influence on the performance of a later model; 2) in the event extraction task on the ACE2005 dataset, little work studies nested entities; 3) much existing work directly treats trigger words as having length 1, but the ACE2005 dataset contains trigger words whose length is not 1.
Disclosure of Invention
In order to solve, at least to a certain extent, at least one of the technical problems in the prior art, the present invention provides a multi-task event extraction method and device based on question answering.
The technical scheme adopted by the invention is as follows:
a multi-task event extraction method based on question answering comprises the following steps:
generating a question according to a question template for trigger word extraction, and acquiring a first input vector;
inputting the first input vector into a trigger word extraction model to obtain the position information of the trigger word in the original text;
generating a question according to a question template that introduces trigger word information, and acquiring a second input vector;
inputting the second input vector into an event recognition model, screening out the event samples with correct trigger words, and acquiring the event type to which each event sample belongs;
generating a question for a specified argument role type according to a question template that introduces trigger word information and the event type, and acquiring a third input vector;
and inputting the third input vector into an argument role extraction model, screening out the event samples whose trigger word extraction and event identification results are both correct, and obtaining the argument positions of the specified argument roles.
Further, the generating a question according to the question template for trigger word extraction and acquiring a first input vector includes:
splitting the original text on spaces, represented as X = {x_1, x_2, …, x_m}, wherein m is the number of words of the original text;
generating a question according to the question template for trigger word extraction to obtain an input text represented as T_a = {CLS, action, SEP, x_1, x_2, …, x_m, SEP};
performing word segmentation on the input text with the word segmentation method provided by the pre-trained BERT model, and converting the segmented input text into a real-valued vector representation to obtain the word vector t_a^w;
obtaining, from the input text, the corresponding block code, denoted B_a = {0, 0, 0, 1_1, 1_2, …, 1_m, 1}, wherein the subscripts of the 1s correspond to the subscripts in the input text T_a;
converting the block code into real-valued vectors through the block vector matrix provided by the pre-trained BERT model to obtain the block vector t_a^s;
obtaining the absolute position information of each word in the input text according to the position encoding provided by the pre-trained BERT model to obtain the position vector t_a^p;
adding the word vector t_a^w, the block vector t_a^s and the position vector t_a^p to obtain the first input vector v_a of the trigger word extraction model.
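The input construction above can be sketched in Python. This is a minimal illustration of T_a and the block code B_a under stated assumptions: the helper name is ours, and a real implementation would apply BERT's WordPiece tokenizer rather than splitting on spaces.

```python
def build_trigger_input(original_text: str):
    """Build T_a = {CLS, action, SEP, x_1..x_m, SEP} and its block code B_a.

    Sketch only: the patent splits the original text on spaces and wraps it
    with the question word "action" plus CLS/SEP markers; the question part
    is block-coded 0 and the original text (with its trailing SEP) coded 1.
    """
    words = original_text.split(" ")
    tokens = ["CLS", "action", "SEP"] + words + ["SEP"]
    block_code = [0, 0, 0] + [1] * len(words) + [1]
    return tokens, block_code

# Event sample D from the embodiment below
tokens, blocks = build_trigger_input("He visited all his friends")
```

The two lists have equal length N, which is the sequence length the word, block and position vectors share.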
Further, the inputting the first input vector into the trigger word extraction model to obtain the position information of the trigger word in the original text includes:
inputting the first input vector v_a into the pre-trained BERT model to obtain the output of the last layer of the BERT network, connecting a fully connected layer, and then obtaining the type probability of each word with a Softmax function;
extracting the trigger word according to the obtained type probabilities, each predicted trigger word being represented as t = {t_1, t_2, …, t_k}, wherein k represents the number of words of the trigger word;
wherein extracting the trigger word according to the obtained type probabilities comprises:
fixing four types, namely B, I, O and PAD, wherein the PAD type is used during training to construct the labels corresponding to the question position and the CLS and SEP positions;
at prediction time, regarding the PAD type as the O type and decoding each position of the original text in the BIO ternary labeling scheme to obtain the position information of the trigger word in the original text, represented as P_t = (start_t, end_t), wherein start_t indicates the position where the predicted trigger word begins in the original text and end_t indicates the position where the predicted trigger word ends in the original text.
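The BIO decoding just described can be sketched as follows. This is a hedged illustration: the function name and the (start, end) span representation are our own, and PAD predictions are mapped to O as the text specifies.

```python
def decode_bio(labels):
    """Decode a sequence of B/I/O/PAD tags into (start, end) spans.

    PAD is treated as O at prediction time; B opens a span, I extends it,
    and O (or a new B) closes the current span.
    """
    labels = ["O" if t == "PAD" else t for t in labels]
    spans, start = [], None
    for i, tag in enumerate(labels):
        if tag == "B":
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))
                start = None
        # "I" simply extends the currently open span
    if start is not None:
        spans.append((start, len(labels) - 1))
    return spans
```

For sample D the predicted tags over the whole input sequence would decode to the single span of the word "visited".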
Further, cross entropy loss calculation is performed on the predicted word type probabilities and the original type labels to obtain the loss of the trigger word extraction model, so as to train and optimize the trigger word extraction model.
Further, the generating a question according to a question template that introduces trigger word information and acquiring a second input vector includes:
generating a question according to the question template that introduces trigger word information to obtain an input text T_b;
performing word segmentation on the input text T_b with the word segmentation method provided by the pre-trained BERT model, and converting the segmented input text into a real-valued vector representation to obtain the word vector t_b^w;
obtaining the corresponding block code according to the input text T_b;
converting the block code into real-valued vectors through the block vector matrix provided by the pre-trained BERT model to obtain the block vector t_b^s;
obtaining the absolute position information of each word in the input text T_b according to the position encoding provided by the pre-trained BERT model to obtain the position vector t_b^p;
adding the word vector t_b^w, the block vector t_b^s and the position vector t_b^p to obtain the second input vector v_b of the event recognition model;
wherein the input text T_b is represented as T_b = {CLS, the, trigger, word, t_1, …, t_k, pos, start_t, …, end_t, pos, SEP, x_1, x_2, …, x_m, SEP}.
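Assuming the token layout shown in the embodiment's example for sample D ({CLS, the, trigger, word, visited, pos, 1, pos, SEP, …}), the question construction for T_b might look like the sketch below; the helper name and the position expansion are our own.

```python
def build_event_input(original_words, trigger_words, start, end):
    """Build T_b: a question introducing the predicted trigger and its
    position (bracketed by 'pos' markers), followed by SEP and the text.

    Sketch under assumptions drawn from the sample-D example; positions are
    the 0-based word indices of the trigger span.
    """
    positions = [str(i) for i in range(start, end + 1)]
    return (["CLS", "the", "trigger", "word"] + list(trigger_words)
            + ["pos"] + positions + ["pos", "SEP"]
            + list(original_words) + ["SEP"])
```

For sample D this reproduces the embodiment's T_b for the trigger "visited" at position 1.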
Further, the inputting the second input vector into the event recognition model, screening out the event samples with correct trigger words, and acquiring the event type to which the event sample belongs includes:
inputting the second input vector v_b into the pre-trained BERT model to obtain the output at the CLS position of the BERT network, connecting a classification fully connected layer, obtaining the probability of each event type for the sentence with a Softmax function, and acquiring the event samples whose trigger words are correct together with their event types; the predicted event type name is represented as E_p = {e_1, e_2, …, e_n}, where n represents the number of words of the event type name.
Further, cross entropy loss calculation is performed on the predicted event type probabilities and the original type labels to obtain the loss of the event recognition model, so as to train and optimize the event recognition model.
Further, 34 event types are fixed: the first 33 correspond to the 33 event types of the ACE2005 dataset, and the last is a None type, which means that the event sample has an incorrectly extracted trigger word and needs to be discarded.
Further, the generating a question for a specified argument role type according to a question template that introduces trigger word information and the event type, and acquiring a third input vector includes:
according to the obtained event type E_p, acquiring all argument role types under that event, each argument role name being represented as R = {r_1, r_2, …, r_l};
generating a question for the specified argument role according to the question template to obtain an input text T_c;
performing word segmentation on the input text T_c with the word segmentation method provided by the pre-trained BERT model, and converting the segmented input text into a real-valued vector representation to obtain the word vector t_c^w;
obtaining the corresponding block code according to the input text T_c;
converting the block code into real-valued vectors through the block vector matrix provided by the pre-trained BERT model to obtain the block vector t_c^s;
obtaining the absolute position information of each word in the input text T_c according to the position encoding provided by the pre-trained BERT model to obtain the position vector t_c^p;
adding the word vector t_c^w, the block vector t_c^s and the position vector t_c^p to obtain the third input vector v_c of the argument role extraction model;
wherein the input text T_c is a question built from the argument role name, the trigger word and its position information, and the event type name, followed by SEP, the original text and SEP.
Further, the inputting the third input vector into the argument role extraction model, screening out the event samples whose trigger word extraction and event identification results are both correct, and acquiring the argument positions of the specified argument roles includes:
inputting the third input vector v_c into the pre-trained BERT model to obtain the output O_1 at the CLS position of the BERT network and the output O_2 of the last layer, O_1 being followed by a screening fully connected layer and O_2 by a classification fully connected layer, and obtaining the error probability of the event sample and the type probability of each word with a Softmax function;
acquiring the correct event samples and the argument of the specified argument role in the event according to the error probability of the event sample and the type probability of each word.
Further, the acquiring the correct event samples and the argument of the specified argument role in the event according to the error probability of the event sample and the type probability of each word includes:
fixing five types, namely B, I, O, BandI and PAD, wherein the PAD type is used during training to construct the labels corresponding to the question position and the CLS and SEP positions;
at prediction time, regarding the PAD type as the O type and, on the basis of the BIO ternary labeling scheme, adding the BandI type to solve the problem of nested entities; each position of the original text is decoded to obtain the position information of the argument, represented as P_r = (start_r, end_r), wherein start_r indicates the position where the predicted argument begins in the original text and end_r indicates the position where the predicted argument ends in the original text.
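A possible reading of the B/I/O/BandI decoding is sketched below. The handling of BandI (emit a length-one span at the current word while keeping the enclosing span open) is our interpretation of the scheme described above, not the patent's verbatim algorithm.

```python
def decode_bio_bandi(labels):
    """Decode B/I/O/BandI/PAD tags into argument spans.

    PAD is treated as O. BandI marks a word that is both a one-word
    argument of its own and part of the longer argument being built,
    which yields the nested (overlapping) spans described in the text.
    """
    labels = ["O" if t == "PAD" else t for t in labels]
    spans, start = [], None
    for i, tag in enumerate(labels):
        if tag == "B":
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "BandI":
            spans.append((i, i))   # emit the length-one inner argument
            # the same word also continues the enclosing span: keep it open
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:
        spans.append((start, len(labels) - 1))
    return spans
```

A tag sequence like {B, BandI, I} then produces both the inner one-word span and the enclosing three-word span.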
Further, training and optimizing the model with the obtained loss includes:
performing cross entropy loss calculation on the predicted sample error probability and the original label to obtain a first loss;
performing cross entropy loss calculation on the predicted word type probabilities and the original type labels to obtain a second loss;
adding the first loss and the second loss, and jointly training and optimizing the argument role extraction model with the summed loss.
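The two-loss joint training can be illustrated with a plain-Python cross entropy. This is a toy sketch of the loss arithmetic only; real training would use a framework's batched loss functions.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, gold_index):
    # negative log-likelihood of the gold class
    return -math.log(softmax(logits)[gold_index])

def joint_loss(screen_logits, screen_gold, word_logits, word_golds):
    """First loss: sample-error (screening) head at the CLS position.
    Second loss: per-word type head over the sequence. The two are added
    and the sum trains the argument role extraction model jointly."""
    first = cross_entropy(screen_logits, screen_gold)
    second = sum(cross_entropy(l, g) for l, g in zip(word_logits, word_golds))
    return first + second
```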
Further, on the basis of the BIO ternary labeling scheme, a BandI type is added. The BandI type indicates that the word at the current position serves both as a separate word of length one and as part of the previously matched word.
Further, the trigger word extraction model, the event recognition model and the argument role extraction model are optimized by adopting an Adam algorithm, and an early-stopping training method is used for preventing overfitting.
The other technical scheme adopted by the invention is as follows:
a question-answer based multitask event extraction device, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The invention has the following beneficial effects: by introducing an auxiliary screening task, the negative influence of error propagation on model performance is reduced; for the selection of question templates, dedicated schemes are determined for the different models, achieving good results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings of the embodiments of the present invention or of the related prior art are described below. It should be understood that the drawings in the following description are only intended to describe some embodiments of the technical solutions of the present invention conveniently and clearly, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart illustrating steps of a method for extracting multi-tasking events based on question answering according to an embodiment of the present invention;
fig. 2 is a diagram of a multi-task event extraction model structure based on question answering in the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more, and "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. Where "first" and "second" are described, they are only for the purpose of distinguishing technical features, and are not to be understood as indicating or implying relative importance, implicitly indicating the number of the indicated technical features, or implicitly indicating the precedence of the indicated technical features.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
As shown in fig. 1, the present embodiment provides a method for extracting multitask events based on question answering, which includes the following steps:
s1, generating a problem according to a problem template extracted aiming at the trigger word, and acquiring a first input vector.
As shown in FIG. 2, for the input part of the trigger word extraction model, the invention concatenates the word "action" with the original text to introduce the question intent, using SEP as the separator between "action" and the original text and marking the beginning and the end with CLS and SEP, represented as T_a = {CLS, action, SEP, x_1, x_2, …, x_m, SEP}.
Taking event sample D as an example, the original text is "He visited all his friends", the trigger word is "visited", the event type is the Meet type, and the corresponding argument roles are Individual and Group, corresponding to "He" and "all his friends" in the original text respectively. From the original text of event sample D, T_a = {CLS, action, SEP, He, visited, all, his, friends, SEP} is obtained. In this step, the word "action" and the original text are segmented by the WordPiece method provided by BERT and then concatenated with CLS and SEP to obtain the segmented input sequence, which is converted into the corresponding one-hot representation o_a ∈ R^{N×|V|}, where N represents the length of the input sequence and |V| represents the size of the vocabulary. The word vector corresponding to the input sequence is thus represented as t_a^w = o_a W_t, where W_t ∈ R^{|V|×e} is the word vector matrix provided by BERT and e denotes the dimension of the word vectors. Likewise, according to T_a, the corresponding block sequence is obtained as B_a = {0, 0, 0, 1_1, 1_2, …, 1_m, 1}, where the subscripts of the 1s correspond to the subscripts in the input text and m corresponds to the number of words of the original text; in event sample D, m is 5. The block vector matrix W_s ∈ R^{|S|×e} provided by BERT converts the one-hot block code o_a^s into real-valued vectors to obtain the block vector t_a^s = o_a^s W_s, where |S| represents the number of block types. Using the position vector matrix W_p, the one-hot position encoding o_a^p is converted into real-valued vectors to obtain the position vector t_a^p = o_a^p W_p.
The word vector t_a^w, the block vector t_a^s and the position vector t_a^p all have dimension (BS, N, e), where BS represents the batch size of training; for the BERT-Base model, the input vectors obtained from event sample D have dimension (BS, 9, 768). Adding the three vectors gives the input vector of the trigger word extraction model, denoted v_a.
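The elementwise sum of the three (N, e)-shaped vectors can be illustrated with toy dimensions (e = 4 instead of BERT-Base's 768; the constant values are placeholders, not real embeddings).

```python
# Toy shape check: word, block and position vectors share shape (N, e)
# and are summed elementwise to form the input vector v_a.
N, e = 9, 4   # 9 tokens for event sample D; tiny embedding size for illustration

word_vec  = [[0.1] * e for _ in range(N)]
block_vec = [[0.2] * e for _ in range(N)]
pos_vec   = [[0.3] * e for _ in range(N)]

v_a = [[w + b + p for w, b, p in zip(wr, br, pr)]
       for wr, br, pr in zip(word_vec, block_vec, pos_vec)]
```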
S2, inputting the first input vector into the trigger word extraction model to obtain the position information of the trigger word in the original text.
As an alternative embodiment, the first input vector v_a is input into the trigger word extraction model, the output vector is converted by a fully connected layer into a new vector of dimension (BS, N, 4), and a Softmax function computes the probability that each word of the original text belongs to the B, I, O and PAD types respectively. At prediction time, the PAD type is treated as the O type, and trigger words are extracted in the BIO ternary labeling scheme. For event sample D, the "visited" position is predicted to have the highest probability for the B type and every other position of the original text is predicted to have the highest probability for the O type, so the position information of the trigger word is extracted as P_t = (start_t, end_t) = (1, 1), where start_t indicates the position where the predicted trigger word begins in the original text and end_t indicates the position where it ends; the trigger word is therefore predicted as "visited". During training, PAD is used to construct the labels corresponding to the question position and the CLS and SEP positions; for event sample D, the constructed label sequence is label = {PAD, PAD, PAD, O, B, O, O, O, PAD}. Cross entropy loss calculation is performed on the predicted word type probabilities and the original type labels to obtain the loss of the trigger word extraction model, so as to train and optimize the trigger word extraction model.
S3, generating a question according to a question template that introduces trigger word information, and acquiring a second input vector.
As shown in fig. 2, the input part of the event recognition model introduces the trigger word text information and the trigger word position information from the previous model. Taking event sample D as an example, the corresponding input text constructed according to the question template is T_b = {CLS, the, trigger, word, visited, pos, 1, pos, SEP, He, visited, all, his, friends, SEP}. In a manner similar to step S1, the input text T_b undergoes a series of conversions to obtain the word vector t_b^w, the block vector t_b^s and the position vector t_b^p respectively; adding the three vectors gives the input vector of the event recognition model, recorded as the second input vector v_b.
S4, inputting the second input vector into the event recognition model, screening out the event samples with correct trigger words, and acquiring the event type to which the event sample belongs.
As an alternative embodiment, the second input vector v_b is input into the event recognition model, the output at the CLS position is converted by a fully connected layer into a new vector of dimension (BS, 34), and a Softmax function computes the probability that the sentence belongs to each of the 33 event types of the ACE2005 dataset or the None type. At prediction time, for event sample D, the Meet type is predicted with the highest probability. During training, the output of the previous model is incorporated: if the trigger word extraction result of a sample in the previous model is wrong, the label of that sample is set to None, while the labels of the remaining correct samples are set to their original type among the 33 event types of the ACE2005 dataset. Cross entropy loss calculation is performed on the predicted event type probabilities and the original type labels to obtain the loss of the event recognition model, so as to train and optimize the event recognition model.
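The screening relabeling can be sketched as below; using index 33 for the None class is our assumption, consistent with the 33 ACE2005 types being listed first.

```python
NONE_TYPE = 33   # assumed index of the extra None class after the 33 ACE2005 types

def event_label(gold_type: int, trigger_correct: bool) -> int:
    """Samples whose trigger was extracted incorrectly by the previous model
    are relabeled None, so the event model learns to screen them out;
    correct samples keep their original event type."""
    return gold_type if trigger_correct else NONE_TYPE
```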
S5, generating a question for the specified argument role type according to a question template that introduces trigger word information and the event type, and acquiring a third input vector.
As shown in fig. 2, the input part of the argument role extraction model introduces the text information and position information of the trigger word as well as the corresponding event type. Each of the 33 event types of the ACE2005 dataset has a corresponding group of argument roles. Taking event sample D as an example, a Meet-type event corresponds to two argument roles, namely the Individual role and the Group role. Two input texts corresponding to the different argument roles are constructed according to the question template and recorded as T_c^1 and T_c^2. In a manner similar to step S1, the input texts T_c^1 and T_c^2 undergo a series of conversions to obtain their corresponding word vectors, block vectors and position vectors; adding the three vectors gives the two input vectors of the argument role extraction model, recorded as v_c^1 and v_c^2 respectively.
and S6, inputting the third input vector into the argument role extraction model, screening out event samples with correct trigger word extraction and event identification results, and obtaining the argument positions of the appointed argument roles.
As an alternative embodiment, v_c^(1) and v_c^(2) are respectively input into the argument role extraction model; a fully connected layer then converts the output vectors into new vectors of dimension (BS, N, 5), and a Softmax function yields, for each word of the original text, the probabilities of the B, I, O, BandI and PAD types. During prediction, the PAD type is treated as the O type, and argument extraction is carried out with this new labeling scheme, which supports the extraction of nested arguments. Specifically, the BandI type indicates that the word at the current position serves both as a complete argument of length one and as part of a previously matched argument. For event sample D, the prediction on the first input is that the "He" position has maximum probability for the B type while all other positions have maximum probability for the O type; the prediction on the second input is that the three positions of "all his friends" have maximum probabilities for the B, I and I types respectively, while all other positions have maximum probability for the O type. That is, the argument corresponding to the Individual role is "He", and the argument corresponding to the Group role is "all his friends". During training, PAD is used to construct the labels corresponding to the question position, the CLS position and the SEP position. For event sample D, the tag sequence constructed for the first input is label_1 = {PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, B, O, O, O, O, PAD}, and the tag sequence constructed for the second input is label_2 = {PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, O, O, B, I, I, PAD}. Cross entropy loss is computed between the predicted word type probabilities and the original type labels to obtain the loss of the argument role extraction model, which is used to train and optimize that model.
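One way to decode such a B/I/O/BandI/PAD tag sequence into argument spans can be sketched as follows. The function name and the exact handling of stray tags are illustrative assumptions, not the patent's implementation; what it preserves is the stated semantics: PAD is treated as O at prediction time, and BandI makes the current word both a single-word argument and part of the longer span matched so far, so nested arguments are recovered.

```python
def decode_spans(tags):
    """Decode a B/I/O/BandI/PAD tag sequence into (start, end) spans.

    Indices are inclusive. PAD is treated as O, as in the patent.
    For BandI, the word is emitted as a single-word argument of its
    own AND continues the span in progress, supporting nesting.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "PAD":
            tag = "O"  # PAD is processed as O during prediction
        if tag == "B":
            if start is not None:          # close the previous span
                spans.append((start, i - 1))
            start = i
        elif tag == "I":
            if start is None:              # stray I: treat as a start
                start = i
        elif tag == "BandI":
            spans.append((i, i))           # single-word nested argument
            if start is None:
                start = i
        else:                              # O
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans

# "all his friends" labeled B I I decodes to one three-word span
print(decode_spans(["O", "B", "I", "I", "O"]))      # [(1, 3)]
# BandI: word 3 is both its own argument and part of the longer span
print(decode_spans(["O", "B", "I", "BandI", "O"]))  # [(3, 3), (1, 3)]
```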
In summary, compared with the prior art, this embodiment has the following advantages and beneficial effects: the invention reduces the negative influence of error propagation on the performance of the pipeline model by introducing specific auxiliary screening tasks and exploiting the dependency relationships among the pipeline models. For the selection of question templates, the invention determines a dedicated scheme for each model and obtains good results. For the nested entity problem, the invention proposes a new encoding scheme to realize argument role extraction for nested structures. Experiments on the ACE2005 dataset verify the effectiveness of the method.
This embodiment further provides a question-answer-based multi-task event extraction device, including:
at least one processor; and
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of fig. 1.
The question-answer-based multi-task event extraction device can execute the question-answer-based multi-task event extraction method provided by the method embodiments of the invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer given the nature, function, and interrelationships of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A question-answer-based multi-task event extraction method, characterized by comprising the following steps:
generating a question according to a question template for trigger word extraction, and acquiring a first input vector;
inputting the first input vector into a trigger word extraction model to obtain position information of a trigger word in an original text;
generating a question according to a question template capable of introducing trigger word information, and acquiring a second input vector;
inputting the second input vector into an event recognition model, screening out event samples whose trigger words are correct, and acquiring the event type to which each event sample belongs;
generating a question for a specified argument role type according to a question template capable of introducing trigger word information and the event type, and acquiring a third input vector;
and inputting the third input vector into an argument role extraction model, screening out event samples for which both the trigger word extraction and event recognition results are correct, and obtaining argument positions of the specified argument roles.
2. The question-answer-based multi-task event extraction method according to claim 1, wherein said generating a question according to a question template for trigger word extraction and acquiring a first input vector comprises:
splitting the original text by spaces, where m is the number of words of the original text;
generating a question according to the question template for trigger word extraction to obtain an input text T_a;
performing word segmentation on the input text with the word segmentation method provided by the pre-training model BERT, and converting the segmented input text into a real-valued vector representation to obtain a word vector;
obtaining the corresponding block code from the input text, denoted B_a = {0, 0, 0, 1_1, 1_2, …, 1_m, 1}, wherein the subscripts of the 1s correspond to the subscripts of the input text T_a;
converting the block code into a real-valued vector through the block vector matrix provided by the pre-training model BERT to obtain a block vector;
obtaining absolute position information of each word in the input text according to the position encoding mode provided by the pre-training model BERT to obtain a position vector;
adding the word vector, the block vector and the position vector to obtain the first input vector v_a of the trigger word extraction model.
3. The question-answer-based multi-task event extraction method according to claim 2, wherein said inputting the first input vector into a trigger word extraction model to obtain position information of the trigger word in the original text comprises:
inputting the first input vector v_a into the pre-training model BERT to obtain the output of the last layer of the BERT network, connecting a fully connected layer, and then obtaining the type probability of each word with a Softmax function;
extracting trigger words according to the obtained type probabilities, each predicted trigger word being a sequence of k words;
wherein extracting the trigger word according to the obtained type probabilities comprises:
fixing four types, namely B, I, O and PAD, the PAD type being used during training to construct the labels corresponding to the question position, the CLS position and the SEP position;
during prediction, treating the PAD type as the O type, and decoding each position of the original text with the BIO ternary labeling scheme to obtain the position information of the trigger word in the original text, expressed by a start index indicating where the predicted trigger word begins in the original text and an end index indicating where it ends.
4. The question-answer-based multi-task event extraction method according to claim 1, wherein said generating a question according to a question template capable of introducing trigger word information and acquiring a second input vector comprises:
generating a question according to the question template capable of introducing trigger word information to obtain an input text T_b;
performing word segmentation on the input text T_b with the word segmentation method provided by the pre-training model BERT, and converting the segmented input text into a real-valued vector representation to obtain a word vector;
obtaining the corresponding block code from the input text T_b;
converting the block code into a real-valued vector through the block vector matrix provided by the pre-training model BERT to obtain a block vector;
obtaining absolute position information of each word in the input text T_b according to the position encoding mode provided by the pre-training model BERT to obtain a position vector;
adding the word vector, the block vector and the position vector to obtain the second input vector v_b of the event recognition model.
5. The question-answer-based multi-task event extraction method according to claim 4, wherein said inputting the second input vector into an event recognition model, screening out event samples whose trigger words are correct, and acquiring the event type to which each event sample belongs comprises:
inputting the second input vector v_b into the pre-training model BERT to obtain the output at the CLS position of the BERT network, connecting a classification fully connected layer, obtaining the probability of each event type of the sentence with a Softmax function, and obtaining the event samples whose trigger words are correct together with their event types, each predicted event type name being a sequence of n words.
6. The question-answer-based multi-task event extraction method according to claim 1, wherein said generating a question for a specified argument role type according to a question template capable of introducing trigger word information and the event type and acquiring a third input vector comprises:
acquiring, according to the obtained event type E_p, all argument role types under that event;
generating a question for the specified argument role according to the question template to obtain an input text T_c;
performing word segmentation on the input text T_c with the word segmentation method provided by the pre-training model BERT, and converting the segmented input text into a real-valued vector representation to obtain a word vector;
obtaining the corresponding block code from the input text T_c;
converting the block code into a real-valued vector through the block vector matrix provided by the pre-training model BERT to obtain a block vector;
obtaining absolute position information of each word in the input text T_c according to the position encoding mode provided by the pre-training model BERT to obtain a position vector;
adding the word vector, the block vector and the position vector to obtain the third input vector v_c of the argument role extraction model.
7. The question-answer-based multi-task event extraction method according to claim 6, wherein said inputting the third input vector into an argument role extraction model, screening out event samples for which both the trigger word extraction and event recognition results are correct, and obtaining argument positions of the specified argument roles comprises:
inputting the third input vector v_c into the pre-training model BERT to obtain the output O_1 at the CLS position of the BERT network and the output O_2 of the last layer, wherein O_1 is followed by a screening fully connected layer and O_2 by a classification fully connected layer, and a Softmax function is used to obtain the error probability of the event sample and the type probability of each word;
acquiring the correct event samples and the arguments of the specified argument role in each event according to the error probability of the event sample and the type probability of each word.
8. The question-answer-based multi-task event extraction method according to claim 7, wherein said acquiring the correct event samples and the arguments of the specified argument role in each event according to the error probability of the event sample and the type probability of each word comprises:
fixing five types, namely B, I, O, BandI and PAD, the PAD type being used during training to construct the labels corresponding to the question position, the CLS position and the SEP position;
during prediction, treating the PAD type as the O type, adding the BandI type on the basis of the BIO ternary labeling scheme to handle nested entities, and decoding each position of the original text to obtain the position information of each argument, expressed by a start index indicating where the predicted argument begins in the original text and an end index indicating where it ends.
9. The question-answer-based multi-task event extraction method according to claim 1, wherein the trigger word extraction model, the event recognition model and the argument role extraction model are all optimized with the Adam algorithm, and early stopping is used during training to prevent overfitting.
10. A question-answer-based multi-task event extraction device, comprising:
at least one processor; and
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-9.
CN202211079899.1A 2022-09-05 2022-09-05 Multi-task event extraction method and device based on question answering Pending CN115563253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211079899.1A CN115563253A (en) 2022-09-05 2022-09-05 Multi-task event extraction method and device based on question answering


Publications (1)

Publication Number Publication Date
CN115563253A true CN115563253A (en) 2023-01-03

Family

ID=84740077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211079899.1A Pending CN115563253A (en) 2022-09-05 2022-09-05 Multi-task event extraction method and device based on question answering

Country Status (1)

Country Link
CN (1) CN115563253A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116627915A (en) * 2023-07-25 2023-08-22 河海大学 Dam emergency working condition event detection method and system based on slot semantic interaction



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination