CN115563253A - Multi-task event extraction method and device based on question answering - Google Patents

Multi-task event extraction method and device based on question answering

Info

Publication number
CN115563253A
Authority
CN
China
Prior art keywords
event
vector
word
input
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211079899.1A
Other languages
Chinese (zh)
Inventor
苏锦钿
李泽苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202211079899.1A
Publication of CN115563253A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a multi-task event extraction method and device based on question answering, wherein the method comprises the following steps: acquiring a first input vector; inputting the first input vector into a trigger word extraction model to obtain the position information of the trigger word in the original text; acquiring a second input vector; inputting the second input vector into an event recognition model, screening out the event samples with correct trigger words, and acquiring the event type to which each event sample belongs; generating a question for a specified argument role type according to a question template that introduces trigger word information and the event type, and acquiring a third input vector; and inputting the third input vector into an argument role extraction model, screening out the event samples whose trigger word extraction and event identification results are both correct, and obtaining the argument positions of the specified argument roles. By introducing an auxiliary screening task, the negative influence of error propagation on model performance is reduced. The invention can be widely applied to the technical field of natural language processing.

Description

Multi-task event extraction method and device based on question answering
Technical Field
The invention relates to the technical field of natural language processing, in particular to a multi-task event extraction method and device based on question answering.
Background
Event extraction is a classic information extraction task in the NLP field and is also the basis of practical applications such as information retrieval, intelligent question answering and knowledge graph construction. In the era of information explosion, people cannot manually process the enormous volume of unstructured data quickly enough and with stable quality. In the field of knowledge graphs, an event as defined in the Automatic Content Extraction (ACE) evaluation is an occurrence or state change, consisting of one or more actions in which one or more roles participate, that occurs at a particular point or period in time within a particular geographic area. The event extraction task studies how to extract event information of interest to the user from text describing that information, and how to present it in a structured form.
Event extraction technology extracts the events a user is interested in from unstructured information and presents them to the user in a structured form. The event extraction task can be decomposed into 4 subtasks: trigger word extraction, event identification, argument extraction and role classification. Argument extraction and role classification can be merged into an argument role extraction task. Event identification is a word-based multi-classification task that judges the event type to which each word in the sentence belongs. The argument role extraction task is a word-pair-based multi-classification task that judges the role relationship between any pair of trigger word and entity in the sentence.
Existing event extraction models have the following problems: 1) existing work based on pipeline models often suffers from error propagation, where the output of an earlier model has a negative influence on the performance of a later model; 2) in the event extraction task on the ACE2005 dataset, little work studies nested entities; 3) much existing work directly treats trigger words as having length 1, but the ACE2005 dataset contains trigger words whose length is not 1.
Disclosure of Invention
In order to solve, at least to a certain extent, at least one of the technical problems in the prior art, the present invention provides a multi-task event extraction method and device based on question answering.
The technical scheme adopted by the invention is as follows:
a multi-task event extraction method based on question answering comprises the following steps:
generating a question according to a question template for trigger word extraction, and acquiring a first input vector;
inputting the first input vector into a trigger word extraction model to obtain the position information of the trigger word in the original text;
generating a question according to a question template that introduces trigger word information, and acquiring a second input vector;
inputting the second input vector into an event recognition model, screening out the event samples with correct trigger words, and acquiring the event type to which each event sample belongs;
generating a question for a specified argument role type according to a question template that introduces trigger word information and the event type, and acquiring a third input vector;
and inputting the third input vector into an argument role extraction model, screening out the event samples whose trigger word extraction and event identification results are both correct, and obtaining the argument positions of the specified argument roles.
Further, the generating a question according to the question template for trigger word extraction and acquiring a first input vector includes:
splitting the original text on spaces, represented as X = {x_1, x_2, …, x_m}, wherein m is the number of words of the original text;
generating a question according to the question template for trigger word extraction to obtain an input text represented as T_a = {CLS, action, SEP, x_1, x_2, …, x_m, SEP};
performing word segmentation on the input text with the word segmentation method provided by the pre-trained BERT model, and converting the segmented input text into a real-valued vector representation to obtain the word vector t_a^w;
obtaining, from the input text, the corresponding block code, denoted B_a = {0, 0, 0, 1_1, 1_2, …, 1_m, 1}, wherein the subscripts of the 1s correspond to the subscripts in the input text T_a;
converting the block code into real-valued vectors through the block vector matrix provided by the pre-trained BERT model to obtain the block vector t_a^s;
obtaining the absolute position information of each word in the input text according to the position encoding provided by the pre-trained BERT model to obtain the position vector t_a^p;
adding the word vector t_a^w, the block vector t_a^s and the position vector t_a^p to obtain the first input vector v_a of the trigger word extraction model.
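The input construction above can be sketched in Python. This is a minimal illustration of T_a and the block code B_a under stated assumptions: the helper name is ours, and a real implementation would apply BERT's WordPiece tokenizer rather than splitting on spaces.

```python
def build_trigger_input(original_text: str):
    """Build T_a = {CLS, action, SEP, x_1..x_m, SEP} and its block code B_a.

    Sketch only: the patent splits the original text on spaces and wraps it
    with the question word "action" plus CLS/SEP markers; the question part
    is block-coded 0 and the original text (with its trailing SEP) coded 1.
    """
    words = original_text.split(" ")
    tokens = ["CLS", "action", "SEP"] + words + ["SEP"]
    block_code = [0, 0, 0] + [1] * len(words) + [1]
    return tokens, block_code

# Event sample D from the embodiment below
tokens, blocks = build_trigger_input("He visited all his friends")
```

The two lists have equal length N, which is the sequence length the word, block and position vectors share.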
Further, the inputting the first input vector into the trigger word extraction model to obtain the position information of the trigger word in the original text includes:
inputting the first input vector v_a into the pre-trained BERT model to obtain the output of the last layer of the BERT network, connecting a fully connected layer, and then obtaining the type probability of each word with a Softmax function;
extracting the trigger word according to the obtained type probabilities, each predicted trigger word being represented as t = {t_1, t_2, …, t_k}, wherein k represents the number of words of the trigger word;
wherein extracting the trigger word according to the obtained type probabilities comprises:
fixing four types, namely B, I, O and PAD, wherein the PAD type is used during training to construct the labels corresponding to the question position and the CLS and SEP positions;
at prediction time, regarding the PAD type as the O type and decoding each position of the original text in the BIO ternary labeling scheme to obtain the position information of the trigger word in the original text, represented as P_t = (start_t, end_t), wherein start_t indicates the position where the predicted trigger word begins in the original text and end_t indicates the position where the predicted trigger word ends in the original text.
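The BIO decoding just described can be sketched as follows. This is a hedged illustration: the function name and the (start, end) span representation are our own, and PAD predictions are mapped to O as the text specifies.

```python
def decode_bio(labels):
    """Decode a sequence of B/I/O/PAD tags into (start, end) spans.

    PAD is treated as O at prediction time; B opens a span, I extends it,
    and O (or a new B) closes the current span.
    """
    labels = ["O" if t == "PAD" else t for t in labels]
    spans, start = [], None
    for i, tag in enumerate(labels):
        if tag == "B":
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))
                start = None
        # "I" simply extends the currently open span
    if start is not None:
        spans.append((start, len(labels) - 1))
    return spans
```

For sample D the predicted tags over the whole input sequence would decode to the single span of the word "visited".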
Further, cross entropy loss calculation is performed on the predicted word type probabilities and the original type labels to obtain the loss of the trigger word extraction model, so as to train and optimize the trigger word extraction model.
Further, the generating a question according to a question template that introduces trigger word information and acquiring a second input vector includes:
generating a question according to the question template that introduces trigger word information to obtain an input text T_b;
performing word segmentation on the input text T_b with the word segmentation method provided by the pre-trained BERT model, and converting the segmented input text into a real-valued vector representation to obtain the word vector t_b^w;
obtaining the corresponding block code according to the input text T_b;
converting the block code into real-valued vectors through the block vector matrix provided by the pre-trained BERT model to obtain the block vector t_b^s;
obtaining the absolute position information of each word in the input text T_b according to the position encoding provided by the pre-trained BERT model to obtain the position vector t_b^p;
adding the word vector t_b^w, the block vector t_b^s and the position vector t_b^p to obtain the second input vector v_b of the event recognition model;
wherein the input text T_b is represented as T_b = {CLS, the, trigger, word, t_1, …, t_k, pos, start_t, …, end_t, pos, SEP, x_1, x_2, …, x_m, SEP}.
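Assuming the token layout shown in the embodiment's example for sample D ({CLS, the, trigger, word, visited, pos, 1, pos, SEP, …}), the question construction for T_b might look like the sketch below; the helper name and the position expansion are our own.

```python
def build_event_input(original_words, trigger_words, start, end):
    """Build T_b: a question introducing the predicted trigger and its
    position (bracketed by 'pos' markers), followed by SEP and the text.

    Sketch under assumptions drawn from the sample-D example; positions are
    the 0-based word indices of the trigger span.
    """
    positions = [str(i) for i in range(start, end + 1)]
    return (["CLS", "the", "trigger", "word"] + list(trigger_words)
            + ["pos"] + positions + ["pos", "SEP"]
            + list(original_words) + ["SEP"])
```

For sample D this reproduces the embodiment's T_b for the trigger "visited" at position 1.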
Further, the inputting the second input vector into the event recognition model, screening out the event samples with correct trigger words, and acquiring the event type to which the event sample belongs includes:
inputting the second input vector v_b into the pre-trained BERT model to obtain the output at the CLS position of the BERT network, connecting a classification fully connected layer, obtaining the probability of each event type for the sentence with a Softmax function, and acquiring the event samples whose trigger words are correct together with their event types; the predicted event type name is represented as E_p = {e_1, e_2, …, e_n}, where n represents the number of words of the event type name.
Further, cross entropy loss calculation is performed on the predicted event type probabilities and the original type labels to obtain the loss of the event recognition model, so as to train and optimize the event recognition model.
Further, 34 event types are fixed: the first 33 correspond to the 33 event types of the ACE2005 dataset, and the last is a None type, which means that the event sample has an incorrectly extracted trigger word and needs to be discarded.
Further, the generating a question for a specified argument role type according to a question template that introduces trigger word information and the event type, and acquiring a third input vector includes:
according to the obtained event type E_p, acquiring all argument role types under that event, each argument role name being represented as R = {r_1, r_2, …, r_l};
generating a question for the specified argument role according to the question template to obtain an input text T_c;
performing word segmentation on the input text T_c with the word segmentation method provided by the pre-trained BERT model, and converting the segmented input text into a real-valued vector representation to obtain the word vector t_c^w;
obtaining the corresponding block code according to the input text T_c;
converting the block code into real-valued vectors through the block vector matrix provided by the pre-trained BERT model to obtain the block vector t_c^s;
obtaining the absolute position information of each word in the input text T_c according to the position encoding provided by the pre-trained BERT model to obtain the position vector t_c^p;
adding the word vector t_c^w, the block vector t_c^s and the position vector t_c^p to obtain the third input vector v_c of the argument role extraction model;
wherein the input text T_c is a question built from the argument role name, the trigger word and its position information, and the event type name, followed by SEP, the original text and SEP.
Further, the inputting the third input vector into the argument role extraction model, screening out the event samples whose trigger word extraction and event identification results are both correct, and acquiring the argument positions of the specified argument roles includes:
inputting the third input vector v_c into the pre-trained BERT model to obtain the output O_1 at the CLS position of the BERT network and the output O_2 of the last layer, O_1 being followed by a screening fully connected layer and O_2 by a classification fully connected layer, and obtaining the error probability of the event sample and the type probability of each word with a Softmax function;
acquiring the correct event samples and the argument of the specified argument role in the event according to the error probability of the event sample and the type probability of each word.
Further, the acquiring the correct event samples and the argument of the specified argument role in the event according to the error probability of the event sample and the type probability of each word includes:
fixing five types, namely B, I, O, BandI and PAD, wherein the PAD type is used during training to construct the labels corresponding to the question position and the CLS and SEP positions;
at prediction time, regarding the PAD type as the O type and, on the basis of the BIO ternary labeling scheme, adding the BandI type to solve the problem of nested entities; each position of the original text is decoded to obtain the position information of the argument, represented as P_r = (start_r, end_r), wherein start_r indicates the position where the predicted argument begins in the original text and end_r indicates the position where the predicted argument ends in the original text.
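A possible reading of the B/I/O/BandI decoding is sketched below. The handling of BandI (emit a length-one span at the current word while keeping the enclosing span open) is our interpretation of the scheme described above, not the patent's verbatim algorithm.

```python
def decode_bio_bandi(labels):
    """Decode B/I/O/BandI/PAD tags into argument spans.

    PAD is treated as O. BandI marks a word that is both a one-word
    argument of its own and part of the longer argument being built,
    which yields the nested (overlapping) spans described in the text.
    """
    labels = ["O" if t == "PAD" else t for t in labels]
    spans, start = [], None
    for i, tag in enumerate(labels):
        if tag == "B":
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "BandI":
            spans.append((i, i))   # emit the length-one inner argument
            # the same word also continues the enclosing span: keep it open
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:
        spans.append((start, len(labels) - 1))
    return spans
```

A tag sequence like {B, BandI, I} then produces both the inner one-word span and the enclosing three-word span.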
Further, training and optimizing the model with the obtained loss includes:
performing cross entropy loss calculation on the predicted sample error probability and the original label to obtain a first loss;
performing cross entropy loss calculation on the predicted word type probabilities and the original type labels to obtain a second loss;
adding the first loss and the second loss, and jointly training and optimizing the argument role extraction model with the summed loss.
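The two-loss joint training can be illustrated with a plain-Python cross entropy. This is a toy sketch of the loss arithmetic only; real training would use a framework's batched loss functions.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, gold_index):
    # negative log-likelihood of the gold class
    return -math.log(softmax(logits)[gold_index])

def joint_loss(screen_logits, screen_gold, word_logits, word_golds):
    """First loss: sample-error (screening) head at the CLS position.
    Second loss: per-word type head over the sequence. The two are added
    and the sum trains the argument role extraction model jointly."""
    first = cross_entropy(screen_logits, screen_gold)
    second = sum(cross_entropy(l, g) for l, g in zip(word_logits, word_golds))
    return first + second
```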
Further, on the basis of the BIO ternary labeling scheme, a BandI type is added. The BandI type indicates that the word at the current position serves both as a separate word of length one and as part of the previously matched word.
Further, the trigger word extraction model, the event recognition model and the argument role extraction model are optimized by adopting an Adam algorithm, and an early-stopping training method is used for preventing overfitting.
The other technical scheme adopted by the invention is as follows:
a question-answer based multitask event extraction device, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The invention has the following beneficial effects: by introducing an auxiliary screening task, the negative influence of error propagation on model performance is reduced; for the selection of question templates, dedicated schemes are determined for the different models, achieving good results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings of the embodiments of the present invention or of the related prior art are described below. It should be understood that the drawings in the following description are only intended to describe some embodiments of the technical solutions of the present invention conveniently and clearly, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart illustrating steps of a method for extracting multi-tasking events based on question answering according to an embodiment of the present invention;
fig. 2 is a diagram of a multi-task event extraction model structure based on question answering in the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more, and "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. Where "first" and "second" are described, they are only for the purpose of distinguishing technical features, and are not to be understood as indicating or implying relative importance, implicitly indicating the number of the indicated technical features, or implicitly indicating the precedence of the indicated technical features.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
As shown in fig. 1, the present embodiment provides a method for extracting multitask events based on question answering, which includes the following steps:
s1, generating a problem according to a problem template extracted aiming at the trigger word, and acquiring a first input vector.
As shown in FIG. 2, for the input part of the trigger word extraction model, the invention concatenates the word "action" with the original text to introduce the question intent, using SEP as the separator between "action" and the original text and marking the beginning and the end with CLS and SEP, represented as T_a = {CLS, action, SEP, x_1, x_2, …, x_m, SEP}.
Taking event sample D as an example, the original text is "He visited all his friends", the trigger word is "visited", the event type is the Meet type, and the corresponding argument roles are Individual and Group, corresponding to "He" and "all his friends" in the original text respectively. From the original text of event sample D, T_a = {CLS, action, SEP, He, visited, all, his, friends, SEP} is obtained. In this step, the word "action" and the original text are segmented by the WordPiece method provided by BERT and then concatenated with CLS and SEP to obtain the segmented input sequence, which is converted into the corresponding one-hot representation o_a ∈ R^{N×|V|}, where N represents the length of the input sequence and |V| represents the size of the vocabulary. The word vector corresponding to the input sequence is thus represented as t_a^w = o_a W_t, where W_t ∈ R^{|V|×e} is the word vector matrix provided by BERT and e denotes the dimension of the word vectors. Likewise, according to T_a, the corresponding block sequence is obtained as B_a = {0, 0, 0, 1_1, 1_2, …, 1_m, 1}, where the subscripts of the 1s correspond to the subscripts in the input text and m corresponds to the number of words of the original text; in event sample D, m is 5. The block vector matrix W_s ∈ R^{|S|×e} provided by BERT converts the one-hot block code o_a^s into real-valued vectors to obtain the block vector t_a^s = o_a^s W_s, where |S| represents the number of block types. Using the position vector matrix W_p, the one-hot position encoding o_a^p is converted into real-valued vectors to obtain the position vector t_a^p = o_a^p W_p.
The word vector t_a^w, the block vector t_a^s and the position vector t_a^p all have dimension (BS, N, e), where BS represents the batch size of training; for the BERT-Base model, the input vectors obtained from event sample D have dimension (BS, 9, 768). Adding the three vectors gives the input vector of the trigger word extraction model, denoted v_a.
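The elementwise sum of the three (N, e)-shaped vectors can be illustrated with toy dimensions (e = 4 instead of BERT-Base's 768; the constant values are placeholders, not real embeddings).

```python
# Toy shape check: word, block and position vectors share shape (N, e)
# and are summed elementwise to form the input vector v_a.
N, e = 9, 4   # 9 tokens for event sample D; tiny embedding size for illustration

word_vec  = [[0.1] * e for _ in range(N)]
block_vec = [[0.2] * e for _ in range(N)]
pos_vec   = [[0.3] * e for _ in range(N)]

v_a = [[w + b + p for w, b, p in zip(wr, br, pr)]
       for wr, br, pr in zip(word_vec, block_vec, pos_vec)]
```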
S2, inputting the first input vector into the trigger word extraction model to obtain the position information of the trigger word in the original text.
As an alternative embodiment, the first input vector v_a is input into the trigger word extraction model, the output vector is converted by a fully connected layer into a new vector of dimension (BS, N, 4), and a Softmax function computes the probability that each word of the original text belongs to the B, I, O and PAD types respectively. At prediction time, the PAD type is treated as the O type, and trigger words are extracted in the BIO ternary labeling scheme. For event sample D, the "visited" position is predicted to have the highest probability for the B type and every other position of the original text is predicted to have the highest probability for the O type, so the position information of the trigger word is extracted as P_t = (start_t, end_t) = (1, 1), where start_t indicates the position where the predicted trigger word begins in the original text and end_t indicates the position where it ends; the trigger word is therefore predicted as "visited". During training, PAD is used to construct the labels corresponding to the question position and the CLS and SEP positions; for event sample D, the constructed label sequence is label = {PAD, PAD, PAD, O, B, O, O, O, PAD}. Cross entropy loss calculation is performed on the predicted word type probabilities and the original type labels to obtain the loss of the trigger word extraction model, so as to train and optimize the trigger word extraction model.
S3, generating a question according to a question template that introduces trigger word information, and acquiring a second input vector.
As shown in fig. 2, the input part of the event recognition model introduces the trigger word text information and the trigger word position information from the previous model. Taking event sample D as an example, the corresponding input text constructed according to the question template is T_b = {CLS, the, trigger, word, visited, pos, 1, pos, SEP, He, visited, all, his, friends, SEP}. In a manner similar to step S1, the input text T_b undergoes a series of conversions to obtain the word vector t_b^w, the block vector t_b^s and the position vector t_b^p respectively; adding the three vectors gives the input vector of the event recognition model, recorded as the second input vector v_b.
S4, inputting the second input vector into the event recognition model, screening out the event samples with correct trigger words, and acquiring the event type to which the event sample belongs.
As an alternative embodiment, the second input vector v_b is input into the event recognition model, the output at the CLS position is converted by a fully connected layer into a new vector of dimension (BS, 34), and a Softmax function computes the probability that the sentence belongs to each of the 33 event types of the ACE2005 dataset or the None type. At prediction time, for event sample D, the Meet type is predicted with the highest probability. During training, the output of the previous model is incorporated: if the trigger word extraction result of a sample in the previous model is wrong, the label of that sample is set to None, while the labels of the remaining correct samples are set to their original type among the 33 event types of the ACE2005 dataset. Cross entropy loss calculation is performed on the predicted event type probabilities and the original type labels to obtain the loss of the event recognition model, so as to train and optimize the event recognition model.
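The screening relabeling can be sketched as below; using index 33 for the None class is our assumption, consistent with the 33 ACE2005 types being listed first.

```python
NONE_TYPE = 33   # assumed index of the extra None class after the 33 ACE2005 types

def event_label(gold_type: int, trigger_correct: bool) -> int:
    """Samples whose trigger was extracted incorrectly by the previous model
    are relabeled None, so the event model learns to screen them out;
    correct samples keep their original event type."""
    return gold_type if trigger_correct else NONE_TYPE
```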
S5, generating a question for the specified argument role type according to a question template that introduces trigger word information and the event type, and acquiring a third input vector.
As shown in fig. 2, the input part of the argument role extraction model introduces the text information and position information of the trigger word as well as the corresponding event type. Each of the 33 event types of the ACE2005 dataset has a corresponding group of argument roles. Taking event sample D as an example, a Meet-type event corresponds to two argument roles, namely the Individual role and the Group role. Two input texts corresponding to the different argument roles are constructed according to the question template and recorded as T_c^1 and T_c^2. In a manner similar to step S1, the input texts T_c^1 and T_c^2 undergo a series of conversions to obtain their corresponding word vectors, block vectors and position vectors; adding the three vectors gives the two input vectors of the argument role extraction model, recorded as v_c^1 and v_c^2 respectively.
and S6, inputting the third input vector into the argument role extraction model, screening out event samples with correct trigger word extraction and event identification results, and obtaining the argument positions of the appointed argument roles.
As an alternative embodiment, v_c^(1) and v_c^(2) are respectively input into the argument role extraction model; a fully connected layer then converts the output vectors into new vectors of dimension (BS, N, 5), and a Softmax function yields, for each word of the original text, the probabilities of the B, I, O, BandI and PAD types. During prediction, the PAD type is treated as the O type, and argument extraction is carried out with this new labeling scheme, which supports the extraction of nested arguments. Specifically, the BandI type indicates that the word at the current position serves both as a complete argument of length one and as part of a previously matched argument. For event sample D, the prediction on the first input is that the "He" position has maximum probability for the B type while all other positions have maximum probability for the O type; the prediction on the second input is that the three positions of "all his friends" have maximum probabilities for the B, I and I types respectively, while all other positions have maximum probability for the O type. That is, the argument corresponding to the Individual role is "He", and the argument corresponding to the Group role is "all his friends". During training, PAD is used to construct the labels corresponding to the question position, the CLS position and the SEP position. For event sample D, the tag sequence constructed for the first input is label_1 = {PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, B, O, O, O, O, PAD}, and the tag sequence constructed for the second input is label_2 = {PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, PAD, O, O, B, I, I, PAD}. Cross entropy loss is computed between the predicted word type probabilities and the original type labels to obtain the loss of the argument role extraction model, which is used to train and optimize that model.
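One way to decode such a B/I/O/BandI/PAD tag sequence into argument spans can be sketched as follows. The function name and the exact handling of stray tags are illustrative assumptions, not the patent's implementation; what it preserves is the stated semantics: PAD is treated as O at prediction time, and BandI makes the current word both a single-word argument and part of the longer span matched so far, so nested arguments are recovered.

```python
def decode_spans(tags):
    """Decode a B/I/O/BandI/PAD tag sequence into (start, end) spans.

    Indices are inclusive. PAD is treated as O, as in the patent.
    For BandI, the word is emitted as a single-word argument of its
    own AND continues the span in progress, supporting nesting.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "PAD":
            tag = "O"  # PAD is processed as O during prediction
        if tag == "B":
            if start is not None:          # close the previous span
                spans.append((start, i - 1))
            start = i
        elif tag == "I":
            if start is None:              # stray I: treat as a start
                start = i
        elif tag == "BandI":
            spans.append((i, i))           # single-word nested argument
            if start is None:
                start = i
        else:                              # O
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans

# "all his friends" labeled B I I decodes to one three-word span
print(decode_spans(["O", "B", "I", "I", "O"]))      # [(1, 3)]
# BandI: word 3 is both its own argument and part of the longer span
print(decode_spans(["O", "B", "I", "BandI", "O"]))  # [(3, 3), (1, 3)]
```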
In summary, compared with the prior art, this embodiment has the following advantages and beneficial effects: the invention reduces the negative influence of error propagation on the performance of the pipeline model by introducing specific auxiliary screening tasks and exploiting the dependency relationships among the pipeline models. For the selection of question templates, the invention determines a dedicated scheme for each model and obtains good results. For the nested entity problem, the invention proposes a new encoding scheme to realize argument role extraction for nested structures. Experiments on the ACE2005 dataset verify the effectiveness of the method.
This embodiment further provides a question-answer-based multi-task event extraction device, including:
at least one processor; and
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of fig. 1.
The question-answer-based multi-task event extraction device can execute the question-answer-based multi-task event extraction method provided by the method embodiments of the invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer given the nature, function, and interrelationships of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A question-answer-based multi-task event extraction method, characterized by comprising the following steps:
generating a question according to a question template for trigger word extraction, and acquiring a first input vector;
inputting the first input vector into a trigger word extraction model to obtain position information of a trigger word in an original text;
generating a question according to a question template capable of introducing trigger word information, and acquiring a second input vector;
inputting the second input vector into an event recognition model, screening out event samples whose trigger words are correct, and acquiring the event type to which each event sample belongs;
generating a question for a specified argument role type according to a question template capable of introducing trigger word information and the event type, and acquiring a third input vector;
and inputting the third input vector into an argument role extraction model, screening out event samples for which both the trigger word extraction and event recognition results are correct, and obtaining argument positions of the specified argument roles.
2. The question-answer-based multi-task event extraction method according to claim 1, wherein said generating a question according to a question template for trigger word extraction and acquiring a first input vector comprises:
splitting the original text by spaces, where m is the number of words of the original text;
generating a question according to the question template for trigger word extraction to obtain an input text T_a;
performing word segmentation on the input text with the word segmentation method provided by the pre-training model BERT, and converting the segmented input text into a real-valued vector representation to obtain a word vector;
obtaining the corresponding block code from the input text, denoted B_a = {0, 0, 0, 1_1, 1_2, …, 1_m, 1}, wherein the subscripts of the 1s correspond to the subscripts of the input text T_a;
converting the block code into a real-valued vector through the block vector matrix provided by the pre-training model BERT to obtain a block vector;
obtaining absolute position information of each word in the input text according to the position encoding mode provided by the pre-training model BERT to obtain a position vector;
adding the word vector, the block vector and the position vector to obtain the first input vector v_a of the trigger word extraction model.
3. The question-answer-based multi-task event extraction method according to claim 2, wherein said inputting the first input vector into a trigger word extraction model to obtain position information of the trigger word in the original text comprises:
inputting the first input vector v_a into the pre-training model BERT to obtain the output of the last layer of the BERT network, connecting a fully connected layer, and then obtaining the type probability of each word with a Softmax function;
extracting trigger words according to the obtained type probabilities, each predicted trigger word being a sequence of k words;
wherein extracting the trigger word according to the obtained type probabilities comprises:
fixing four types, namely B, I, O and PAD, the PAD type being used during training to construct the labels corresponding to the question position, the CLS position and the SEP position;
during prediction, treating the PAD type as the O type, and decoding each position of the original text with the BIO ternary labeling scheme to obtain the position information of the trigger word in the original text, expressed by a start index indicating where the predicted trigger word begins in the original text and an end index indicating where it ends.
4. The question-answer-based multi-task event extraction method according to claim 1, wherein said generating a question according to a question template capable of introducing trigger word information and acquiring a second input vector comprises:
generating a question according to the question template capable of introducing trigger word information to obtain an input text T_b;
performing word segmentation on the input text T_b with the word segmentation method provided by the pre-training model BERT, and converting the segmented input text into a real-valued vector representation to obtain a word vector;
obtaining the corresponding block code from the input text T_b;
converting the block code into a real-valued vector through the block vector matrix provided by the pre-training model BERT to obtain a block vector;
obtaining absolute position information of each word in the input text T_b according to the position encoding mode provided by the pre-training model BERT to obtain a position vector;
adding the word vector, the block vector and the position vector to obtain the second input vector v_b of the event recognition model.
5. The question-answer-based multi-task event extraction method according to claim 4, wherein said inputting the second input vector into an event recognition model, screening out event samples whose trigger words are correct, and acquiring the event type to which each event sample belongs comprises:
inputting the second input vector v_b into the pre-training model BERT to obtain the output at the CLS position of the BERT network, connecting a classification fully connected layer, obtaining the probability of each event type of the sentence with a Softmax function, and obtaining the event samples whose trigger words are correct together with their event types, each predicted event type name being a sequence of n words.
6. The question-answer-based multi-task event extraction method according to claim 1, wherein said generating a question for a specified argument role type according to a question template capable of introducing trigger word information and the event type and acquiring a third input vector comprises:
acquiring, according to the obtained event type E_p, all argument role types under that event;
generating a question for the specified argument role according to the question template to obtain an input text T_c;
performing word segmentation on the input text T_c with the word segmentation method provided by the pre-training model BERT, and converting the segmented input text into a real-valued vector representation to obtain a word vector;
obtaining the corresponding block code from the input text T_c;
converting the block code into a real-valued vector through the block vector matrix provided by the pre-training model BERT to obtain a block vector;
obtaining absolute position information of each word in the input text T_c according to the position encoding mode provided by the pre-training model BERT to obtain a position vector;
adding the word vector, the block vector and the position vector to obtain the third input vector v_c of the argument role extraction model.
7. The question-answer-based multi-task event extraction method according to claim 6, wherein said inputting the third input vector into an argument role extraction model, screening out event samples for which both the trigger word extraction and event recognition results are correct, and obtaining argument positions of the specified argument roles comprises:
inputting the third input vector v_c into the pre-training model BERT to obtain the output O_1 at the CLS position of the BERT network and the output O_2 of the last layer, wherein O_1 is followed by a screening fully connected layer and O_2 by a classification fully connected layer, and a Softmax function is used to obtain the error probability of the event sample and the type probability of each word;
acquiring the correct event samples and the arguments of the specified argument role in each event according to the error probability of the event sample and the type probability of each word.
8. The question-answer-based multi-task event extraction method according to claim 7, wherein said acquiring the correct event samples and the arguments of the specified argument role in each event according to the error probability of the event sample and the type probability of each word comprises:
fixing five types, namely B, I, O, BandI and PAD, the PAD type being used during training to construct the labels corresponding to the question position, the CLS position and the SEP position;
during prediction, treating the PAD type as the O type, adding the BandI type on the basis of the BIO ternary labeling scheme to handle nested entities, and decoding each position of the original text to obtain the position information of each argument, expressed by a start index indicating where the predicted argument begins in the original text and an end index indicating where it ends.
9. The question-answer-based multi-task event extraction method according to claim 1, wherein the trigger word extraction model, the event recognition model and the argument role extraction model are all optimized with the Adam algorithm, and early stopping is used during training to prevent overfitting.
10. A question-answer-based multi-task event extraction device, comprising:
at least one processor; and
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-9.
CN202211079899.1A 2022-09-05 2022-09-05 Multi-task event extraction method and device based on question answering Pending CN115563253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211079899.1A CN115563253A (en) 2022-09-05 2022-09-05 Multi-task event extraction method and device based on question answering


Publications (1)

Publication Number Publication Date
CN115563253A true CN115563253A (en) 2023-01-03

Family

ID=84740077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211079899.1A Pending CN115563253A (en) 2022-09-05 2022-09-05 Multi-task event extraction method and device based on question answering

Country Status (1)

Country Link
CN (1) CN115563253A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116627915A (en) * 2023-07-25 2023-08-22 河海大学 Dam emergency working condition event detection method and system based on slot semantic interaction



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination