CN114328687A - Event extraction model training method and device and event extraction method and device


Info

Publication number
CN114328687A
Authority
CN
China
Prior art keywords
sample
argument
arguments
role
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111595365.XA
Other languages
Chinese (zh)
Other versions
CN114328687B (en)
Inventor
徐国进
韩翠云
李心雨
黄佳艳
裴明
施茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111595365.XA
Publication of CN114328687A
Application granted
Publication of CN114328687B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The present disclosure provides an event extraction model training method and apparatus, relating to the field of artificial intelligence technologies such as knowledge graphs and deep learning. A specific implementation scheme is as follows: a first training sample is obtained, where the first training sample includes a first sample text and first annotation data. Model training is performed with the first training sample to obtain a first sub-model. A second training sample is obtained, where the second training sample includes a second sample text, a plurality of second sample arguments existing in the second sample text, and sample probabilities that every two second sample arguments among the plurality of second sample arguments correspond to the same event. Model training is performed with the second training sample to obtain a second sub-model. The event extraction model is determined to include the first sub-model and the second sub-model. The technical scheme provided by the disclosure can effectively improve the accuracy of the event extraction model.

Description

Event extraction model training method and device and event extraction method and device
Technical Field
The disclosure relates to the technical field of artificial intelligence such as knowledge graphs and deep learning, and in particular to an event extraction model training method and device and an event extraction method and device.
Background
Event extraction refers to extracting information of a required event from unstructured text and integrating the information into a structured form.
At present, event extraction is usually realized through an event extraction model, and the accuracy of the event extraction model is poor due to insufficient labeling information.
Disclosure of Invention
The disclosure provides an event extraction model training method and device and an event extraction method and device.
According to a first aspect of the present disclosure, there is provided an event extraction model training method, including:
obtaining a first training sample, wherein the first training sample comprises a first sample text and first labeling data, and the first labeling data comprises: a plurality of data packets corresponding to a plurality of sample arguments in the first sample text, a sample role corresponding to each data packet, and a sample event type corresponding to each data packet, wherein the sample arguments in any data packet are the same;
performing model training through the first training sample to obtain a first sub-model, wherein the first sub-model is used for determining arguments existing in a text, roles corresponding to the arguments and event types corresponding to the arguments;
obtaining a second training sample, wherein the second training sample comprises a second sample text, a plurality of sample events existing in the second sample text, and a second sample argument included in each sample event;
performing model training through the second training sample to obtain a second sub-model, wherein the second sub-model is used for determining events existing in the text and arguments corresponding to the events;
determining an event extraction model based on the first sub-model and the second sub-model.
According to a second aspect of the present disclosure, there is provided an event extraction method, including:
acquiring a first text to be processed;
processing the first text through a first sub-model in a pre-trained event extraction model to obtain a first output result, wherein the first output result comprises: argument existing in the first text, role corresponding to the argument and event type corresponding to the argument;
and processing the first output result through a second sub-model in the pre-trained event extraction model to obtain an event existing in the first text and an argument corresponding to the event.
According to a third aspect of the present disclosure, there is provided an event extraction model training apparatus, including:
an obtaining module, configured to obtain a first training sample, where the first training sample includes a first sample text and first labeling data, and the first labeling data includes: a plurality of data packets corresponding to a plurality of sample arguments in the first sample text, a sample role corresponding to each data packet, and a sample event type corresponding to each data packet, wherein the sample arguments in any data packet are the same;
the first processing module is used for carrying out model training through the first training sample to obtain a first sub-model, and the first sub-model is used for determining arguments existing in a text, roles corresponding to the arguments and event types corresponding to the arguments;
a second obtaining module, configured to obtain a second training sample, where the second training sample includes a second sample text, a plurality of sample events existing in the second sample text, and a second sample argument included in each sample event;
the second processing module is used for carrying out model training through the second training sample to obtain a second sub-model, and the second sub-model is used for determining events existing in the text and arguments corresponding to the events;
a determination module to determine an event extraction model based on the first submodel and the second submodel.
According to a fourth aspect of the present disclosure, there is provided an event extraction device comprising:
the acquisition module is used for acquiring a first text to be processed;
a first processing module, configured to process the first text through a first sub-model in a pre-trained event extraction model to obtain a first output result, where the first output result includes: argument existing in the first text, role corresponding to the argument and event type corresponding to the argument;
and the second processing module is used for processing the first output result through a second sub-model in the pre-trained event extraction model to obtain the event existing in the first text and the argument corresponding to the event.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect or the second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product, including: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program; execution of the computer program by the at least one processor causes the electronic device to perform the method of the first aspect or the second aspect.
The technology of the present disclosure solves the problem of poor accuracy of event extraction models.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an implementation of event extraction provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of an event extraction model training method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating an implementation of annotation data provided by an embodiment of the present disclosure;
FIG. 4 is a second flowchart of an event extraction model training method provided by the embodiment of the present disclosure;
FIG. 5 is a schematic processing diagram of a first sub-model provided by an embodiment of the disclosure;
FIG. 6 is a schematic diagram illustrating an implementation of updating model parameters of a first sub-model according to an embodiment of the present disclosure;
FIG. 7 is a third flowchart of an event extraction model training method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating an implementation of determining a first probability provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram illustrating an implementation of determining a candidate window according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating an implementation of determining a target window according to an embodiment of the present disclosure;
FIG. 11 is a flowchart of an event extraction method provided by an embodiment of the present disclosure;
FIG. 12 is a process diagram of an event extraction method according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of an event extraction model training apparatus according to an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of an event extraction device according to an embodiment of the present disclosure;
FIG. 15 is a block diagram of an electronic device for implementing an event extraction model training method and an event extraction method according to embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In order to better understand the technical solution of the present disclosure, the related art related to the present disclosure is further described in detail below.
An event is a form in which information is presented; it is defined as an objective fact in which specific persons or objects interact at a specific time and place, and it is typically expressed at the sentence level. In Topic Detection and Tracking (TDT), an event refers to a set of related descriptions about a topic, which may be formed by classification or clustering.
Wherein, each element composing the event may include at least one of the following: trigger words, event types, event arguments, and argument roles.
Event trigger word: a core word indicating the occurrence of an event, mostly a verb or a noun;
Event type: a plurality of event types may be preset, for example, the 8 event types and 33 subtypes defined in the ACE2005 corpus. Most current event extraction uses these 33 subtypes as event types. It can be understood that event recognition is a word-based 34-class (33 event types + None) multi-class classification task;
Event argument: a participant of an event, mainly including entities, values and time. A value is a non-entity event participant, such as a job position;
Argument role: the role an event argument plays in an event. Role types may also be preset; there are 35 role types, e.g., attacker, victim, etc. It can be understood that role classification is a word-pair-based 36-class (35 role types + None) multi-class classification task.
the event extraction technology is used for extracting events which are interested by a user from unstructured information and presenting the events to the user in a structured mode. The event extraction task can be decomposed into 4 subtasks, namely a trigger word recognition task, an event type classification task, an argument recognition task and a role classification task. Wherein the trigger recognition and event type classification may be merged into an event recognition task.
The event recognition is used for judging the type of the event to which each word in the sentence belongs, and is a word-based multi-classification task. Argument recognition and role classification can be merged into an argument role classification task. The role classification task is a multi-classification task based on word pairs, and the role relationship between any pair of trigger words and entities in the sentence is judged.
For example, the event extraction may be understood in conjunction with fig. 1, and fig. 1 provides an implementation schematic diagram of event extraction according to an embodiment of the present disclosure.
Assume the example sentence "In city A, a photographer died when a tank fired at hotel P." Performing event extraction on this sentence can yield, for example, the two events shown in FIG. 1.
The event type of event 1 is death, the corresponding trigger word is "died" in the example sentence, the event arguments in event 1 include the photographer, the tank and city A, and the argument roles of these 3 event arguments are, in order, victim, tool and place.
The event type of event 2 is attack, the corresponding trigger word is "fired" in the example sentence, the event arguments in event 2 include the photographer, hotel P, the tank and city A, and the argument roles of these 4 event arguments are, in order, target, target, tool and place.
Based on the above example, it can be determined that event extraction can extract information of a desired event from unstructured text and integrate the information into a structured form.
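In a structured form, each extracted event can be represented as a small record. The sketch below is merely illustrative; the container and field names are assumptions, not structures defined by the present disclosure:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Argument:
    text: str   # argument span, e.g. "photographer"
    role: str   # argument role, e.g. "victim"

@dataclass
class Event:
    event_type: str            # e.g. "death" or "attack"
    trigger: str               # trigger word, e.g. "died"
    arguments: List[Argument] = field(default_factory=list)

# The two events extracted from the example sentence of FIG. 1:
event1 = Event("death", "died", [Argument("photographer", "victim"),
                                 Argument("tank", "tool"),
                                 Argument("city A", "place")])
event2 = Event("attack", "fired", [Argument("photographer", "target"),
                                   Argument("hotel P", "target"),
                                   Argument("tank", "tool"),
                                   Argument("city A", "place")])
```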
Based on the above notion of event extraction, chapter-level event extraction refers to structuring all events of predefined types from a long-text chapter, where each structured event includes the event type and all event arguments and their roles under that event type. That is to say, chapter-level event extraction performs the event extraction process on long-text chapters, and it often faces the complex situation that the text is long and contains multiple events of the same type.
Currently, event extraction is usually implemented through an event extraction model. Existing chapter-level event extraction models usually require a supervision signal for each word in the chapter during training, where the supervision signal may include the argument role, the event type, the position in the text, and the like.
It can be understood that existing chapter-level event extraction models rely on the supervision signal of each word, while chapter-level event arguments often occur multiple times in a chapter; when the annotation data is insufficient, model training and learning are easily misled by wrong supervision signals.
This is explained with a specific example. Assume there is currently a text that contains formula a, and formula a occurs 10 times in total in this text, that is, formula a occurs in 10 different places. In the prior art, the argument role, the event type and the positions in the text corresponding to formula a are labeled, but because the labeling workload is huge, the labeling is often insufficient.
For example, in the present example, formula a appears 10 times in the text, but during labeling only the formulas in 4 places are labeled, that is, only the formulas in 4 places carry supervision signals, and the formulas in the remaining 6 places are not labeled. Since the training of the model depends on the labeled supervision signals, and only 4 occurrences are currently labeled, the model considers that formula a appears only 4 times in this text. The actual situation is different, so the 4 labeled supervision signals are misleading information for the model.
Therefore, it can be determined based on the above description that if the supervision signals in the text are not labeled sufficiently, the labeled supervision signals are actually wrong, and model training and learning are easily misled by these wrong supervision signals.
In addition, it can be determined based on the above description that in chapter-level event extraction, because the text to be extracted is long and usually contains a plurality of complex events of the same type, event extraction often cannot distinguish events of the same type, which leads to errors in the final event extraction result. For example, if event 1 and event 2 are events of the same type mixed together, an argument of event 1 may be wrongly assigned to event 2, resulting in an error in the event extraction result.
In view of the above problems in the prior art, the present disclosure proposes the following technical ideas: by means of multi-instance learning, event arguments that appear multiple times in a text are optimized as one package to prevent the model from learning errors, so that model training can still proceed when the same event argument appears multiple times in a chapter and is insufficiently labeled. In addition, a central argument is determined and used to distinguish multiple events of the same type, and a window is constructed around the central argument to match the other arguments of the same event, so that the accuracy of the event extraction result can be effectively improved.
Based on the above description, the event extraction model training method provided by the present disclosure is described below with reference to specific embodiments. It should be noted that the execution subject of the embodiments of the present disclosure may be a device with data processing functions, such as a server, a processor, a microprocessor, or the like.
First, an event extraction model training method provided by the present disclosure is described with reference to fig. 2 and fig. 3, fig. 2 is a flowchart of the event extraction model training method provided by the embodiment of the present disclosure, and fig. 3 is an implementation schematic diagram of annotation data provided by the embodiment of the present disclosure.
As shown in fig. 2, the method includes:
s201, obtaining a first training sample, wherein the first training sample comprises a first sample text and first labeling data, and the first labeling data comprises: the data packages corresponding to the sample arguments in the first sample text, the sample roles corresponding to the data packages, and the sample event types corresponding to the data packages, wherein the sample arguments in any one data package are the same.
In this embodiment, the first training sample is a sample for training the first sub-model, where the first training sample may include the first sample text and the first labeling data.
It can be understood that the first sample text is a text chapter, such as a paper, an essay, an article, and the like; the format, word count, and the like of the first sample text are not limited in this embodiment, and the specific implementation of the first sample text can be selected according to actual requirements.
In addition, the first labeling data in this embodiment is labeling data for the first sample text, and the first labeling data may include: a plurality of data packets corresponding to a plurality of sample arguments in the first sample text, a sample role corresponding to each data packet, and a sample event type corresponding to each data packet, where the sample arguments in any one data packet are the same. That is to say, in this embodiment, the arguments included in the first sample text, the argument roles corresponding to the arguments, and the event types corresponding to the arguments are all labeled, so as to obtain the first labeling data.
Meanwhile, in this embodiment, when the arguments in the first sample text are labeled, a plurality of data packets are determined according to a plurality of sample arguments in the first sample text, where any one data packet may include one or more sample arguments, and the sample arguments included in any one data packet are the same.
For example, this can be understood with reference to FIG. 3. Assume that the sample text shown in FIG. 3 currently exists and, for ease of description, that it includes a plurality of formulas, such as formula a, formula b and formula c shown in FIG. 3. It can be determined from FIG. 3 that both formula a and formula b appear multiple times in the sample text.
In this embodiment, identical sample arguments are grouped into one data packet, yielding the 3 data packets shown on the right side of FIG. 3. Since formula a appears 4 times in the sample text, 4 occurrences of formula a are included in packet 1; formula b occurs 2 times in the sample text, so 2 occurrences of formula b are included in packet 2; and formula c appears only 1 time in the sample text, so 1 occurrence of formula c is included in packet 3. Each data packet is then labeled with its corresponding sample role and sample event type.
As can be understood from the description of FIG. 3, the arguments in any one data packet share the same argument role, so for any data packet, the sample role corresponding to the packet is actually the argument role of the arguments in the packet. Similarly, the sample event type corresponding to the data packet is actually the sample event type of the arguments in the packet.
It can be determined based on the above description that, in this embodiment, for an argument that appears multiple times in the sample text, the identical arguments in the multiple places may be regarded as one data packet, and for an argument that appears only once, a data packet including only that argument is also determined. The data packets are then labeled, so as to correctly inform the model of the number of occurrences of each argument in the sample text.
For example, for data packet 1 in FIG. 3, the model can determine that formula a appears 4 times in the sample text. Although we do not tell the model the positions of the 4 occurrences in the sample text, we at least correctly tell the model the number of occurrences of formula a, so as to avoid misleading the model due to insufficient labeling. The model can autonomously learn the specific positions of the arguments. Therefore, by treating the same argument appearing multiple times as one data packet and labeling the data packet, this embodiment provides correct labeling information for the model and avoids misleading the model due to insufficient labeling.
The implementation of packet partitioning described above is actually an application of multi-instance learning, and the specific implementation of multi-instance learning has been described in the above embodiments, and is not described herein again.
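As a rough illustration of this multi-instance grouping, the following sketch builds one data packet per distinct argument; the function and label names are illustrative assumptions only:

```python
from collections import defaultdict

def build_data_packets(occurrences):
    """Group identical sample arguments into data packets (bags).

    occurrences: list of (argument_text, sample_role, sample_event_type),
    one entry per occurrence of an argument in the first sample text.
    Positions are deliberately NOT recorded: the packet only tells the
    model how many times each argument occurs.
    """
    packets = defaultdict(int)   # argument_text -> occurrence count
    labels = {}                  # argument_text -> (role, event_type)
    for text, role, event_type in occurrences:
        packets[text] += 1
        labels[text] = (role, event_type)
    return packets, labels

# Mirroring FIG. 3: formula a occurs 4 times, formula b twice, formula c once.
occurrences = ([("formula a", "role_x", "type_1")] * 4
               + [("formula b", "role_y", "type_1")] * 2
               + [("formula c", "role_z", "type_2")])
packets, labels = build_data_packets(occurrences)
assert packets["formula a"] == 4   # the bag carries the true occurrence count
```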
In an actual implementation process, the number of the first sample texts included in the first training sample may be multiple, and each first sample text has corresponding first labeling data, where a specific number of the first sample texts may be selected and set according to actual requirements.
S202, performing model training through a first training sample to obtain a first sub-model, wherein the first sub-model is used for determining arguments existing in the text, roles corresponding to the arguments and event types corresponding to the arguments.
After the first training sample is obtained, model training may be performed with it; for example, an initial first sub-model is trained through the first training sample to obtain a trained first sub-model, where the first sub-model is used to determine the arguments existing in a text, the argument roles corresponding to the arguments, and the event types corresponding to the arguments.
S203, obtaining a second training sample, wherein the second training sample comprises a second sample text, a plurality of second sample arguments existing in the second sample text, and sample probabilities that every two second sample arguments in the plurality of second sample arguments correspond to the same event.
In this embodiment, there is also a second training sample, where the second training sample is a sample used for training the second sub-model, and the second training sample may include a second sample text, a plurality of second sample arguments existing in the second sample text, and a sample probability that every two second sample arguments in the plurality of second sample arguments correspond to the same event.
Based on the above description, it can be determined that each event argument belongs to some event. For any two event arguments, they may belong to the same event or to different events. Thus, in this embodiment, the sample probability that two second sample arguments correspond to the same event may be 0 (that is, they do not belong to the same event) or 1 (that is, they belong to the same event).
In one possible implementation, the training of the first sub-model and the second sub-model may be sequential, that is, the first sub-model is trained first, and the second sub-model is then trained according to the output data of the first sub-model. In that case, the second sample text in the second training sample may, for example, be the same as the first sample text described above, and the plurality of second sample arguments existing in the second sample text may, for example, be the arguments output by the first sub-model described above. That is, based on the same sample text, the first sub-model is trained first, and the second sub-model is then trained according to the arguments extracted by the first sub-model.
Alternatively, in another possible implementation manner, the training of the first submodel and the training of the second submodel may also be independent from each other, that is, there is no relationship between the process of training the first submodel, the first training sample, and the process of training the second submodel, and the second training sample, which is a completely independent training process, and completely independent sample data. In an actual implementation process, specific implementations of the first training sample and the second training sample may be selected and set according to actual requirements, which is not particularly limited in this embodiment.
S204, performing model training through the second training sample to obtain a second sub-model, where the second sub-model is used to determine the events existing in the text, the arguments corresponding to the events, and the roles corresponding to the arguments.
After the second training sample is obtained, model training may be performed with it; for example, an initial second sub-model is trained through the second training sample to obtain a trained second sub-model, where the second sub-model in this embodiment is used to determine the events existing in a text, the arguments corresponding to the events, and the roles corresponding to the arguments.
And S205, determining an event extraction model based on the first sub-model and the second sub-model.
After the training of the first submodel and the second submodel, an event extraction model may be determined based on the first submodel and the second submodel, and thus the event extraction model in this embodiment includes the first submodel and the second submodel described above. The event extraction model in this embodiment is used to process an input text, so as to output an event existing in the text, an argument corresponding to the event, and a role corresponding to the argument.
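The composition of the two sub-models can be sketched at a high level as follows; the class and method names are assumptions, since the disclosure only specifies that the event extraction model contains both sub-models:

```python
class EventExtractionModel:
    """Composes the two trained sub-models (illustrative sketch)."""

    def __init__(self, first_submodel, second_submodel):
        self.first_submodel = first_submodel    # arguments, roles, event types
        self.second_submodel = second_submodel  # groups arguments into events

    def extract(self, text):
        # Stage 1: arguments present in the text, plus role and event type.
        first_output = self.first_submodel(text)
        # Stage 2: events in the text and the arguments belonging to each.
        return self.second_submodel(first_output)
```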
The event extraction model training method provided by the embodiment of the present disclosure includes: obtaining a first training sample, where the first training sample includes a first sample text and first labeling data, and the first labeling data includes a plurality of data packets corresponding to a plurality of sample arguments in the first sample text, a sample role corresponding to each data packet, and a sample event type corresponding to each data packet, where the sample arguments in any one data packet are the same; performing model training through the first training sample to obtain a first sub-model, where the first sub-model is used to determine the arguments existing in a text, the roles corresponding to the arguments, and the event types corresponding to the arguments; obtaining a second training sample, where the second training sample includes a second sample text, a plurality of second sample arguments existing in the second sample text, and sample probabilities that every two second sample arguments among the plurality of second sample arguments correspond to the same event; performing model training through the second training sample to obtain a second sub-model, where the second sub-model is used to determine the events existing in a text, the arguments corresponding to the events, and the roles corresponding to the arguments; and determining an event extraction model based on the first sub-model and the second sub-model. Because the first training data determines the sample arguments appearing in the sample text as data packets, where the sample arguments in any one data packet are the same, and labels each data packet with a role and an event type, correct labeling information can be provided for the model, misleading the model due to insufficient labeling can be avoided, and the accuracy of the event extraction model can be effectively improved.
On the basis of the above embodiment, the following describes implementation of the training of the first sub-model and the training of the second sub-model, and first, with reference to fig. 4 to 6, details of the training process of the first sub-model in the event extraction model training method provided by the present disclosure are described below. Fig. 4 is a second flowchart of the event extraction model training method provided in the embodiment of the present disclosure, fig. 5 is a schematic processing diagram of a first sub-model provided in the embodiment of the present disclosure, and fig. 6 is a schematic implementation diagram of updating model parameters of the first sub-model provided in the embodiment of the present disclosure.
As shown in fig. 4, the method includes:
s401, obtaining a first training sample, wherein the first training sample comprises a first sample text and first labeling data, and the first labeling data comprises: the data packages corresponding to the sample arguments in the first sample text, the sample roles corresponding to the data packages, and the sample event types corresponding to the data packages, wherein the sample arguments in any one data package are the same.
The implementation manner of S401 is similar to that of S201 described above, and is not described here again.
S402, processing the first sample text through a first sub-model to be trained to obtain first prediction data, wherein the first prediction data comprises a plurality of prediction arguments, a prediction role corresponding to the prediction arguments and a prediction event type corresponding to the prediction arguments.
In this embodiment, the first sub-model that has not yet been trained is referred to as the first sub-model to be trained. When the first sub-model is trained according to the first training sample, the first sample text can, for example, be processed by the first sub-model to be trained, because the first sub-model in this embodiment is used to determine the arguments existing in a text, the roles of the arguments, and the event types of the arguments.
Therefore, referring to fig. 5, the first sub-model may output first prediction data, where the first prediction data includes a plurality of prediction arguments, where the prediction arguments are arguments extracted from the first sample text by the first sub-model, and may include, for example, the argument a, the argument b, the argument c, and the like shown in fig. 5. And the first prediction data comprises the prediction roles corresponding to the prediction arguments and the prediction event types corresponding to the prediction arguments.
In one possible implementation, the first prediction data in this embodiment further includes the predicted positions of the plurality of prediction arguments in the first sample text, as well as the probabilities of those predicted positions. The probability can be understood as the confidence level output by the model itself, that is, how confident the model is in the currently output predicted position.
Therefore, in this embodiment, although the first sub-model is not told where exactly each occurrence of the same argument is located in the text, and is only told how many of each argument exist in the text, the first sub-model may, based on the multi-instance learning approach, output the extracted prediction arguments together with the predicted position of each prediction argument in the text and the probability for that predicted position.
S403, determining a first loss according to the first annotation data, the prediction role corresponding to the prediction argument and the prediction event type corresponding to the prediction argument.
After the first sub-model outputs the first prediction data, since the first sample text is labeled with the first labeling data in this embodiment, the loss of the model can be determined from the first labeling data and the first prediction data described above.
In one possible implementation, it may be determined based on the above description that a plurality of data packets are included in the first annotation data, wherein each data packet includes the same argument internally, and an argument role and an event type are labeled for each data packet, so that the argument role and the event type of the argument are actually labeled in the annotation data.
The first prediction data comprises a plurality of prediction arguments, a prediction role corresponding to each prediction argument and a prediction event type corresponding to the prediction argument.
Thus, referring to FIG. 6, the first loss can be determined based on the first annotation data, the predicted role corresponding to the predicted argument, and the event type corresponding to the predicted argument. In a possible implementation manner, for example, a first loss function may be preset, and then the first loss function is used to process the first labeled data, the predicted role corresponding to the predicted argument, and the event type corresponding to the predicted argument, so as to determine the first loss.
S404, grouping the plurality of prediction arguments to obtain a plurality of groups of prediction arguments, wherein the arguments in each group of prediction arguments are the same.
In this embodiment, the first prediction data may include a plurality of prediction arguments, and it may be determined based on the above description that the same argument may appear in the text many times, because the number of each sample argument is specifically reported to the model through the data packet in this embodiment, the model outputs the respective predicted positions of the same sample arguments.
Therefore, multiple prediction arguments can be grouped to obtain multiple sets of prediction arguments, where the arguments in each set of prediction arguments are the same. That is, the same argument in the prediction arguments is set as one group, thereby obtaining a plurality of groups of prediction arguments.
A group of prediction arguments is substantially similar to a data packet in the labeling data described above, except that the data packets are labeled in advance, whereas the groups of prediction arguments are obtained by grouping the plurality of prediction arguments output by the first sub-model.
S405, determining, according to the probabilities of the predicted positions of the plurality of prediction arguments in the first sample text, a target prediction argument in each group of prediction arguments, where the target prediction argument has the highest predicted-position probability within its group.
Based on the above description, it can be determined that the first prediction data in the present embodiment includes probabilities of the predicted positions of the multiple prediction arguments in the first sample text, and the probabilities in the present embodiment can be understood as confidence of the model output.
Meanwhile, it can be understood that, because the position of each argument in the text is not provided in the present disclosure, the model has no reference information during learning and instead learns autonomously. As a result, during training, the probability of the predicted position of a prediction argument output by the model is actually not very high; that is, the model is not very certain about the positions of the prediction arguments it outputs.
In order to improve the efficiency of model training, in this embodiment, a target prediction argument may be determined in multiple sets of prediction arguments, where the target prediction argument is an argument with the highest probability of a corresponding predicted position in the set of prediction arguments. That is to say, the loss is calculated according to the argument with the highest probability of the predicted position, so that the effectiveness of model training can be effectively improved.
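A minimal sketch of this per-group selection, assuming each prediction carries its predicted position and the model's confidence for that position (all names are illustrative):

```python
def select_target_arguments(grouped_predictions):
    """grouped_predictions: dict mapping argument_text -> list of
    (predicted_position, position_probability) pairs.
    Keeps only the highest-confidence prediction per group."""
    return {argument: max(predictions, key=lambda p: p[1])
            for argument, predictions in grouped_predictions.items()}

groups = {"formula a": [(12, 0.42), (87, 0.91), (140, 0.33), (205, 0.57)]}
print(select_target_arguments(groups))  # {'formula a': (87, 0.91)}
```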
And S406, determining a second loss according to the predicted position of the target argument and the actual position of the target argument in the first sample text.
It can be determined that the target argument is also an argument extracted from the first sample text, and thus based on the first sample text, the actual position of the target argument in the first sample text can be determined. In a possible implementation manner, in this embodiment, the actual positions of the arguments in the first sample text do not need to be labeled in advance, but after the model outputs the predicted positions of the predicted arguments, the target arguments near the predicted positions are determined in the first sample text to determine the actual positions of the target arguments in the first sample text.
Referring to FIG. 6, a second loss may then be determined based on the predicted position of the target argument and the actual position of the target argument within the first sample text. In one possible implementation, the predicted position of the target argument and the actual position of the target argument in the first sample text may, for example, be processed by a preset second loss function to obtain the second loss.
And S407, updating model parameters of the first submodel according to the first loss and the second loss.
The first loss can be used to optimize the accuracy with which the first sub-model extracts arguments and outputs their argument roles and event types, and the second loss can be used to optimize the accuracy of the argument positions output by the first sub-model.
It can be understood that, in the actual implementation process, the operations of determining the loss and updating the model parameter of the first submodel according to the loss described above may be performed iteratively for a plurality of times until a preset number of iterations is reached, or until the model converges, so that the trained first submodel may be obtained.
In the event extraction model training method provided by the embodiment of the present disclosure, a first sample text is processed through a first sub-model to be trained so as to output first prediction data; a first loss, used to optimize the accuracy of the determined argument roles and event types, is then computed from the argument roles and event types of the prediction arguments included in the first prediction data and the argument roles and event types of the arguments in the first labeling data. In addition, the target argument with the highest probability can be selected according to the predicted positions of the prediction arguments in the first sample text and the probabilities of those predicted positions, both included in the first prediction data, and a second loss is determined based on the target argument, which can effectively improve the speed and efficiency of model training. Specifically, the second loss, used to optimize the accuracy of the output argument positions, is determined from the predicted position of the target argument and the actual position of the target argument in the first sample text. The model parameters of the first sub-model are then updated according to the first loss and the second loss, thereby effectively realizing the training of the first sub-model and effectively ensuring the accuracy of the arguments, their argument roles, their event types, and their positions output by the first sub-model.
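The parameter update of S407 might be sketched as follows in PyTorch style; the loss functions, the attribute names on the prediction object, and the equal weighting of the two losses are all assumptions, since the disclosure does not fix their concrete form:

```python
import torch

def training_step(model, optimizer, first_loss_fn, second_loss_fn, batch):
    """One illustrative update combining the first and second losses."""
    prediction = model(batch["first_sample_text"])
    # First loss: predicted roles / event types vs. the first labeling data.
    first_loss = first_loss_fn(prediction.roles, prediction.event_types,
                               batch["first_labeling_data"])
    # Second loss: predicted vs. actual positions, computed only for the
    # highest-probability target argument of each group (see S405/S406).
    second_loss = second_loss_fn(prediction.target_positions,
                                 batch["target_actual_positions"])
    loss = first_loss + second_loss  # equal weighting assumed for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```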
The foregoing embodiments describe implementations of training for the first sub-model, and details of implementations of training for the second sub-model are described below with reference to specific embodiments. For example, the description may be made with reference to fig. 7 to fig. 10, fig. 7 is a third flowchart of an event extraction model training method provided in the embodiment of the present disclosure, fig. 8 is an implementation schematic diagram of determining a first probability provided in the embodiment of the present disclosure, fig. 9 is an implementation schematic diagram of determining a candidate window provided in the embodiment of the present disclosure, and fig. 10 is an implementation schematic diagram of determining a target window provided in the embodiment of the present disclosure.
As shown in fig. 7, the method includes:
s701, obtaining a second training sample, wherein the second training sample comprises a second sample text, a plurality of second sample arguments existing in the second sample text, and sample probabilities that every two second sample arguments in the plurality of second sample arguments correspond to the same event.
The implementation manner of S701 is similar to that of S203 described above, and is not described herein again.
S702, determining a plurality of second sample roles corresponding to the second sample arguments.
In this embodiment, each sample argument corresponds to a respective sample role, and in a possible implementation manner, the current second sample argument may be, for example, a predicted argument output by the first sub-model, and it may be determined based on the above description that the first sub-model also outputs an argument role and an event type of the argument. Therefore, in the present case, a second sample role corresponding to each of the plurality of second sample arguments can be obtained based on the output of the first sub-model.
And S703, determining the first probability that the second sample argument under each second sample role corresponds to the same event.
In this embodiment, the second sample roles are sample roles corresponding to respective second sample arguments in the second sample text, and it can be understood that each sample argument corresponds to a respective corresponding event, and then in this embodiment, a first probability that the second sample arguments in the respective second sample roles correspond to the same event can also be determined.
It is to be understood that, for the same event type, there may be a plurality of different events, for example, there are currently an event a and an event b, and the event types of the two events are both event type 1.
In this embodiment, when determining the probability that the second sample argument under each second sample role corresponds to the same event, for example, the probability introduced above may be determined by performing analysis based on the second sample text.
For example, this can be understood in conjunction with FIG. 8. As shown in FIG. 8, assume that a plurality of arguments under sample role 1 exist in the second sample text, for example argument 1, argument 2 and argument 3 shown in FIG. 8; that is, the argument role of these 3 arguments is sample role 1. It can be determined based on the above description that the same argument may appear multiple times in the text; the example of FIG. 8 shows that argument 2 appears 3 times in the second sample text.
Referring to FIG. 8, argument 1 belongs to event a, the first and second occurrences of argument 2 belong to event a, the third occurrence of argument 2 belongs to event b, and argument 3 belongs to event b. Then, for the example of FIG. 8, it may be determined that the probability of the second sample arguments under this sample role belonging to event a is 60% (3/5) and the probability of belonging to event b is 40% (2/5).
In this embodiment, the first probability that the second sample arguments under a second sample role correspond to the same event may, for example, be the probability corresponding to the event to which the largest number of occurrences belong; that is, the probability of the event with the most occurrences is taken as the currently required probability. Thus, for the example of FIG. 8, the first probability may be determined to be 60%.
FIG. 8 illustrates an implementation in which the sample arguments under a sample role correspond to two different events; in an actual implementation, the arguments under a sample role may also correspond to more than two different events. Similarly to the implementation described above, the probability corresponding to the event with the largest number of occurrences may be determined as the currently required probability.
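For the FIG. 8 example, the first probability reduces to the share of occurrences belonging to the most frequent event, as in the sketch below (the helper name is assumed):

```python
from collections import Counter

def first_probability(event_per_occurrence):
    """event_per_occurrence: the event id of every argument occurrence
    under one second sample role."""
    counts = Counter(event_per_occurrence)
    return counts.most_common(1)[0][1] / len(event_per_occurrence)

# FIG. 8: argument 1 -> a; argument 2 occurs 3 times -> a, a, b; argument 3 -> b.
print(first_probability(["a", "a", "a", "b", "b"]))  # 0.6
```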
S704, determining the role coefficient of each second sample role according to the recall and precision of each second sample role.
The second sample arguments in this embodiment are the arguments output by the first sub-model introduced above. Based on that introduction, it can be determined that the first sub-model can output the prediction arguments, the predicted role of each prediction argument, and the predicted event type corresponding to each prediction argument.
Since each of the prediction arguments corresponds to a respective prediction role, there are many prediction roles in the output of the first sub-model, and the second sample role in this embodiment is actually the prediction role output by the first sub-model.
Then, for any second sample role, the recall and precision corresponding to that second sample role as output by the first sub-model may be counted, where the specific implementations of recall (Recall) and precision (Precision) may refer to implementations in the prior art and are not described in detail here.
After the recall and precision of each second sample role are determined, the role coefficient of each second sample role can be determined according to its recall and precision. The role coefficient in this embodiment may be, for example, a comprehensive evaluation index (F-Measure).
In one possible implementation, for example, the recall and precision of a second sample role may be processed according to a preset function to obtain the role coefficient of that second sample role.
The preset function for determining the role coefficient may, for example, satisfy the following Formula 1:

F = (2 × P × R) / (P + R)    (Formula 1)

where P is the precision, R is the recall, and F is the role coefficient, which in this embodiment may specifically be the comprehensive evaluation index (F-Measure).
The above operation may be performed for each second sample role, so as to obtain a role coefficient F corresponding to each second sample role.
S705, determining the priority of each second sample role according to the first probability corresponding to each second sample role and the role coefficient of each second sample role.
After determining the first probability corresponding to each second sample role and the role coefficient of each second sample role, for example, the priority of each second sample role may be determined according to the first probability and the role coefficient.
In one possible implementation, for any second sample role, for example, the product of the first probability corresponding to that second sample role and the role coefficient F of that second sample role may be determined as the priority of that second sample role. It can be understood that the greater this product, the higher the priority of the second sample role.
S706, determining a candidate center role among the plurality of second sample roles according to the priority of each second sample role.
After determining the respective priorities of the second sample roles, a candidate center role may be determined among the plurality of second sample roles according to the priorities of the second sample roles.
In one possible implementation, for example, the priority of each second sample role may be compared with a preset threshold; if there are second sample roles whose priority is greater than or equal to the preset threshold, those second sample roles may be determined as candidate center roles. In this case, there may be a plurality of candidate center roles in this embodiment.
In another possible implementation, if the priorities of all the second sample roles are smaller than the preset threshold, the second sample role with the highest priority may be determined as the candidate center role.
It should be noted here that the higher the priority of a sample role, the stronger the event-distinguishing ability of the arguments under that sample role. Therefore, in this embodiment, determining the candidate center role based on priority actually selects the argument role with the stronger ability to distinguish events.
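Steps S704 to S706 can be sketched together as follows: compute the role coefficient F from precision and recall (Formula 1), take the priority as the product of the first probability and F, and then apply the threshold rule. The names and the threshold value are illustrative assumptions:

```python
def role_coefficient(precision, recall):
    """Formula 1: the comprehensive evaluation index (F-Measure)."""
    return 2 * precision * recall / (precision + recall)

def candidate_center_roles(roles, threshold=0.5):
    """roles: dict mapping role -> (first_probability, precision, recall).
    Returns the candidate center roles selected by priority."""
    priorities = {role: p1 * role_coefficient(p, r)
                  for role, (p1, p, r) in roles.items()}
    above = [role for role, pr in priorities.items() if pr >= threshold]
    if above:
        return above                              # every role meeting the threshold
    return [max(priorities, key=priorities.get)]  # else the single highest-priority role
```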
S707, determining the argument corresponding to the candidate center role as the central argument.
After the candidate center role is determined, the argument corresponding to the candidate center role is determined as the central argument, that is, the argument whose role is the candidate center role.
S708, determining a target window corresponding to the central argument in the second sample text, wherein the target window comprises a preset number of characters.
It can be determined that there may be a plurality of central arguments determined in this embodiment, and any one of the central arguments is taken as an example and described below.
In this embodiment, a target window corresponding to the central argument may be determined in the second sample text, where the target window may include a preset number of characters, and the number of characters included in the target window is actually the length of the window.
In one possible implementation, the target window in this embodiment satisfies the following conditions: the target window includes a preset number of characters, and the target window includes the central argument; on the basis that these two conditions are met, the target window includes the largest number of other arguments whose event type is consistent with that of the central argument.
The following describes possible implementations of determining the target window:
For example, a plurality of candidate windows can be determined in the second sample text, where each candidate window includes a preset number of characters and includes the argument corresponding to the candidate center role.
It can be understood that, in the embodiment, when determining the target window, because the length of the window is preset and the central argument must be included in the target window, in order to ensure that the target window includes the most other arguments that are consistent with the event type of the central argument, a plurality of candidate windows may be first determined by sliding the window.
Each candidate window in this embodiment includes a preset number of characters, that is, the length of the candidate window is a preset length, and the argument corresponding to the candidate center role in the candidate window is exactly the central argument introduced above.
For example, this can be understood in conjunction with fig. 9. Referring to fig. 9, 901 is assumed to be the second sample text, part of whose content is illustrated in the figure. Assuming that the current central argument is "tens of millions of yuan" in fig. 9, at least the candidate windows shown as 902, 903, and 904 may be determined for this central argument.
From candidate windows 902, 903, and 904 it can be seen that each candidate window includes the central argument "tens of millions of yuan" and that the length of each candidate window is the preset length. The preset length may be, for example, N characters, where the specific value of N may be chosen according to actual requirements.
As fig. 9 illustrates, a window of the preset length may be slid across the text, subject to the constraint that it contains the central argument, so as to determine the plurality of candidate windows.
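As an illustrative aid, the sliding-window enumeration just described may be sketched as follows; the character-offset representation of the text and of the argument span is an assumption made for the example.

```python
# Enumerate every fixed-length candidate window of the second sample text
# that fully contains the central argument, given the argument's character
# span [arg_start, arg_end) and the preset window length n.

def candidate_windows(text: str, arg_start: int, arg_end: int, n: int):
    lo = max(0, arg_end - n)            # leftmost start still covering the argument
    hi = min(arg_start, len(text) - n)  # rightmost start keeping the window in the text
    return [(s, s + n) for s in range(lo, hi + 1)]
```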
After the candidate windows are determined, the number of arguments corresponding to a first event type included in each candidate window may be determined, where the first event type is the event type corresponding to the central argument.
As described above, the target window in this embodiment is the window that includes the most other arguments consistent with the event type of the central argument; counting the arguments of the first event type in each candidate window provides the basis for that comparison.
For example, this can be understood with reference to fig. 10, which shows candidate window 1 and candidate window 2.
In addition to the central argument, candidate window 1 further includes argument 1, argument 2, argument 3, and argument 4, where the event type of the central argument is event type a, the event type of argument 1 is event type b, the event type of argument 2 is event type a, the event type of argument 3 is event type b, and the event type of argument 4 is event type c. The number of other arguments in candidate window 1 that are consistent with the event type of the central argument is therefore 1, namely argument 2.
Candidate window 2, in addition to the central argument, includes argument 2, argument 3, argument 4, argument 1, and argument 5, where the event type of the central argument is event type a, the event type of argument 2 is event type a, the event type of argument 3 is event type b, the event type of argument 4 is event type a, the event type of argument 1 is event type a, and the event type of argument 5 is event type c. The number of other arguments in candidate window 2 that are consistent with the event type of the central argument is therefore 3, namely argument 2, argument 4, and argument 1.
Then, the target window is determined according to the number of arguments corresponding to the first event type included in each candidate window.
Because this embodiment determines as the target window the window including the most arguments consistent with the event type of the central argument, in one possible implementation the candidate window including the most arguments corresponding to the first event type may be determined as the target window.
For example, assuming that only candidate window 1 and candidate window 2 exist in the example of fig. 10, candidate window 2 may be determined as the target window, because it includes the largest number of other arguments consistent with the event type of the central argument.
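Continuing the illustration, the selection of the target window among the candidate windows may be sketched as below, reusing the candidate_windows helper above; the (start, end, event_type) representation of arguments is an assumption for the example, and ties are broken here by taking the earliest window, a detail the embodiment leaves open.

```python
# Pick the candidate window containing the most other arguments whose event
# type matches that of the central argument (the "first event type").

def select_target_window(windows, arguments, central_span, central_type):
    def score(window):
        s, e = window
        return sum(
            1
            for (a_s, a_e, a_type) in arguments
            if (a_s, a_e) != central_span   # exclude the central argument itself
            and s <= a_s and a_e <= e       # argument lies fully inside the window
            and a_type == central_type      # same event type as the central argument
        )
    return max(windows, key=score)          # max() keeps the earliest best window
```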
S709, determining a plurality of first arguments existing in the target window.
Based on the above description, the target window may include many other arguments besides the central argument. Therefore, in this embodiment, a plurality of first arguments existing in the target window may be determined, where the first arguments may be, for example, the arguments in the target window other than the central argument.
S710, acquiring the prediction probabilities that the plurality of first arguments and the central argument correspond to the same event.
After the plurality of first arguments are determined, the prediction probability that each first argument and the central argument correspond to the same event, that is, the probability that the first argument and the central argument belong to the same event, may be obtained.
The prediction probability in this embodiment may be obtained from the output of the second sub-model; that is, the second sub-model may process each first argument together with the central argument and output the probability that the two correspond to the same event.
In one possible implementation, this pairwise processing is only part of the processing performed inside the second sub-model; in addition to outputting the predicted probability that a first argument and the central argument belong to the same event, the second sub-model carries out the remaining processing as well. For example, the processes of determining the central argument and the target window described above may also be handled by the second sub-model, in which case the second sample text may be input into the second sub-model directly, and the second sub-model also outputs the events existing in the text, the arguments corresponding to the events, and the roles corresponding to the arguments.
Alternatively, determining the central argument and the target window may be performed not by the second sub-model but in a preprocessing stage before it. In this case, the preprocessing described above is performed on the second sample text first, to determine the central argument and the target window therein; then, in addition to the second training sample, the first arguments in the target window and the central argument are input into the second sub-model, so that the second sub-model can output the predicted probability that each first argument and the central argument belong to the same event.
S711, determining, according to the prediction probability corresponding to each first argument, the first arguments whose prediction probability is greater than or equal to a probability threshold as target arguments.
In this embodiment, each first argument has a prediction probability with respect to the central argument. In one possible implementation, if the prediction probability corresponding to a first argument is greater than or equal to the probability threshold, the first argument is highly likely to belong to the same event as the central argument, so the first arguments whose prediction probability reaches the threshold may be determined as target arguments. A target argument here is an argument that belongs to the same event as the central argument.
Conversely, if the prediction probability corresponding to a first argument is smaller than the probability threshold, the first argument is unlikely to belong to the same event as the central argument and is therefore not a target argument required in this embodiment.
It will also be understood that, in practice, there are multiple central arguments, and the above operations are performed for each of them; even if some first argument in the current target window does not belong to the same event as the current central argument, it may still belong to the same event as one of the remaining central arguments.
It can thus be understood that, by determining the central arguments and then determining, within each central argument's target window, the predicted probability that each first argument and the central argument belong to the same event, this embodiment uses the central arguments as anchors to effectively separate events that are mixed together in the text.
In the actual implementation process, the specific setting of the probability threshold may be selected and set according to actual requirements, and the specific implementation of the probability threshold is not limited in this embodiment.
S712, determining a predicted event corresponding to the central argument, wherein the predicted event includes the central argument and the target arguments.
After the target arguments for the current central argument are determined, the predicted event corresponding to the central argument may be determined; the specific manner of determining the event corresponding to an argument may follow event extraction implementations in the related art, which this embodiment does not limit.
Because the target arguments introduced above are determined to belong to the same event as the central argument, the predicted event can be determined to include the central argument and each of its target arguments.
In the actual implementation process, because a plurality of central arguments exist, the above operations are performed for each of them, so that a corresponding predicted event is determined per central argument, containing that central argument and every target argument belonging to the same event. In this way, a plurality of predicted events can be extracted from the sample text, each including at least one predicted argument.
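For illustration, steps S711 and S712 may be sketched together as below. predict_same_event stands in for the second sub-model's pairwise output and, like the other names and the threshold value, is a hypothetical placeholder.

```python
# For each central argument, keep the first arguments in its target window
# whose predicted same-event probability reaches the threshold, then assemble
# them with the central argument into one predicted event.

def build_predicted_events(central_args, first_args_per_window, predict_same_event,
                           prob_threshold=0.5):
    events = []
    for center, first_args in zip(central_args, first_args_per_window):
        targets = [a for a in first_args
                   if predict_same_event(center, a) >= prob_threshold]
        events.append({"central_argument": center,
                       "arguments": [center] + targets})
    return events
```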
S713, determining a third loss according to the prediction argument in the prediction event and the second sample argument in the sample event.
After the predicted events are determined, note that the second training sample also includes sample events, each of which includes at least one second sample argument. The sample events are equivalent to annotation information, and the predicted events are equivalent to the information output by the model, so the third loss can be determined based on the predicted arguments in the predicted events and the second sample arguments in the sample events.
In one possible implementation, the sample events and the predicted events may be processed by a third loss function to determine the third loss. The third loss in this embodiment is used to optimize the accuracy of the predicted events output by the second sub-model, that is, both the division of the text into predicted events and the accuracy of the predicted arguments included in each predicted event.
S714, updating the model parameters of the second sub-model according to the third loss.
Because the third loss optimizes the accuracy of the predicted events output by the second sub-model, after the third loss is determined, the model parameters of the second sub-model may be updated according to it, thereby training the second sub-model.
It can be understood that, in the actual implementation process, the operations of determining the loss and updating the model parameters of the second sub-model according to the loss may be performed iteratively until a preset number of iterations is reached or until the model converges, so as to obtain the trained second sub-model.
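As a minimal sketch of one such update step, assume the third loss is realized as a binary cross-entropy over (first argument, central argument) pairs labeled by whether both fall in the same sample event; the model interface, feature shapes, and the choice of loss are all assumptions, since the embodiment does not fix the third loss function. The sketch uses PyTorch-style APIs.

```python
import torch
import torch.nn.functional as F

def train_step(second_submodel, optimizer, pair_features, pair_labels):
    # pair_features: (num_pairs, feature_dim) encodings of argument pairs.
    # pair_labels:   (num_pairs,) 1.0 if the pair shares a sample event, else 0.0.
    logits = second_submodel(pair_features).squeeze(-1)
    third_loss = F.binary_cross_entropy_with_logits(logits, pair_labels)
    optimizer.zero_grad()
    third_loss.backward()   # backpropagate the third loss
    optimizer.step()        # update the second sub-model's parameters
    return third_loss.item()
```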
In the event extraction model training method provided by this embodiment of the disclosure, a first probability and a role coefficient are determined for each second sample role in the second training sample, and the priority of each second sample role is determined from the role coefficient and the first probability. At least one candidate center role, that is, a sample role with a strong ability to distinguish events, is then determined according to the priorities, and a target window is determined for each central argument under the candidate center role. Within the target window of each central argument, the prediction probability that each other argument and the central argument belong to the same event is determined, and the predicted event corresponding to the central argument is derived from these probabilities. In this way, multiple events in a complex sentence can be effectively separated based on the central arguments and the prediction probabilities, and a plurality of predicted events are output, each including at least one predicted argument. The model parameters of the second sub-model are then updated according to the predicted events output by the model and the annotated sample events, so that the second sub-model is trained accurately and effectively and can accurately and effectively output predicted events together with their corresponding arguments.
The above describes the training processes of the first sub-model and the second sub-model. Based on this description, the first sub-model in the present disclosure extracts arguments and outputs the argument roles and event types corresponding to those arguments, while the second sub-model determines, from the arguments extracted by the first sub-model, the events those arguments correspond to, that is, it divides the arguments into events. Applying the two sub-models in combination enables event extraction: specifically, the events extracted from the text can be obtained, and the arguments included in each event, the argument roles corresponding to the arguments, and the event types corresponding to the arguments can be determined.
Based on the above embodiments, the event extraction method provided in the present disclosure is further described in detail below with reference to fig. 11 and 12.
Fig. 11 is a flowchart of an event extraction method provided in the embodiment of the present disclosure, and fig. 12 is a processing diagram of the event extraction method provided in the embodiment of the present disclosure.
As shown in fig. 11, the method includes:
S1101, obtaining a first text to be processed.
In this embodiment, the first text to be processed is a text on which event extraction needs to be performed. This embodiment does not limit the specific content, length, format, and the like of the first text; any text requiring event extraction may serve as the first text.
S1102, processing the first text through a first sub-model in the pre-trained event extraction model to obtain a first output result, wherein the first output result comprises: the argument, the role corresponding to the argument and the event type corresponding to the argument exist in the first text.
The pre-trained event extraction model in this embodiment may include a first sub-model and a second sub-model, where the first sub-model may process the first text, so as to output arguments existing in the first text, argument roles corresponding to the arguments, and event types corresponding to the arguments.
Referring to fig. 12, a first text may be input into a first submodel of the event extraction model, so that the first submodel outputs a first output result, where the first output result includes arguments existing in the first text, argument roles corresponding to the arguments, and event types corresponding to the arguments.
S1103, processing the first output result through a second sub-model in the pre-trained event extraction model to obtain an event existing in the first text and an argument corresponding to the event.
The pre-trained event extraction model in this embodiment further includes a second sub-model, where the second sub-model may process the first output result of the first sub-model, so as to output an event existing in the first text and an argument corresponding to the event.
Based on the above description of the embodiments, the processing of the second sub-model actually requires only the extracted arguments and the first text to be processed. Therefore, in one possible implementation, referring to fig. 12, each argument in the first output result may be obtained,
and each argument and the first text may then be input into the second sub-model, so that the second sub-model can output the events existing in the first text and the arguments included in each event.
It can be determined with reference to fig. 12 that the first sub-model can output various arguments in the first text, their corresponding argument roles, and their corresponding event types, and the second sub-model can output the events existing in the first text and their corresponding arguments.
In this way, the extraction result shown in fig. 12 may be obtained, that is, each event existing in the first text, the event type of each event, the arguments included in each event, and the argument role of each argument. In an optional implementation, the second sub-model may further output the trigger words of the events, in which case the extraction result also includes the trigger word of each event, so that extraction of events from the first text is effectively realized.
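For illustration, the two-stage inference of fig. 11 and fig. 12 may be sketched end to end as below; both sub-model interfaces are hypothetical placeholders for however the trained models are actually invoked.

```python
# First stage: the first sub-model tags arguments, their roles, and their
# event types. Second stage: the second sub-model groups the arguments into
# the events present in the text.

def extract_events(first_text, first_submodel, second_submodel):
    first_output = first_submodel(first_text)        # arguments, roles, event types
    arguments = [item["argument"] for item in first_output]
    events = second_submodel(first_text, arguments)  # events and their arguments
    return {"first_output": first_output, "events": events}
```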
Based on the model training process described above, it can be understood that the pre-trained event extraction model in this embodiment handles arguments that occur multiple times in a text accurately and effectively, and also achieves effective event division in complex situations where a plurality of events are mixed together. Therefore, processing the first text with the trained event extraction model effectively guarantees the accuracy of the event extraction result.
Fig. 13 is a schematic structural diagram of an event extraction model training apparatus according to an embodiment of the present disclosure. As shown in fig. 13, the event extraction model training apparatus 1300 of the present embodiment may include: an obtaining module 1301, a first processing module 1302, a second obtaining module 1303, a second processing module 1304, and a determining module 1305.
An obtaining module 1301, configured to obtain a first training sample, where the first training sample includes a first sample text and first labeling data, and the first labeling data includes: a plurality of data packets corresponding to a plurality of sample arguments in the first sample text, a sample role corresponding to each data packet, and a sample event type corresponding to each data packet, wherein the sample arguments in any data packet are the same;
a first processing module 1302, configured to perform model training through the first training sample to obtain a first sub-model, where the first sub-model is used to determine arguments existing in a text, roles corresponding to the arguments, and event types corresponding to the arguments;
a second obtaining module 1303, configured to obtain a second training sample, where the second training sample includes a second sample text, a plurality of sample events existing in the second sample text, and a second sample argument included in each sample event;
a second processing module 1304, configured to perform model training on the second training sample to obtain a second sub-model, where the second sub-model is used to determine an event existing in the text and an argument corresponding to the event;
a determining module 1305 for determining an event extraction model based on the first sub-model and the second sub-model.
In a possible implementation manner, the first processing module 1302 is specifically configured to:
processing the first sample text through the first sub-model to be trained to obtain first prediction data, wherein the first prediction data comprises a plurality of prediction arguments, a prediction role corresponding to the prediction arguments and a prediction event type corresponding to the prediction arguments;
and updating the model parameters of the first sub-model according to the first labeling data and the first prediction data.
In one possible implementation, the first prediction data further includes predicted positions of the plurality of prediction arguments in the first sample text;
the first processing module 1302 is specifically configured to:
determining a first loss according to the first labeling data, the predicted role corresponding to the predicted argument and the predicted event type corresponding to the predicted argument;
determining a second loss according to the predicted positions of the plurality of predicted arguments and the actual positions of the plurality of predicted arguments in the first sample text;
and updating the model parameters of the first submodel according to the first loss and the second loss.
In one possible implementation, the first prediction data further includes probabilities of predicted positions of the plurality of prediction arguments in the first sample text;
the first processing module 1302 is specifically configured to:
grouping the multiple prediction arguments to obtain multiple groups of prediction arguments, wherein the arguments in each group of prediction arguments are the same;
determining target prediction arguments in the plurality of groups of prediction arguments respectively according to the probabilities of the prediction positions of the plurality of prediction arguments in the first sample text, wherein the probability of the prediction position of the target prediction argument in the group of prediction arguments in the first sample text is highest;
and determining the second loss according to the predicted position of the target argument and the actual position of the target argument in the first sample text.
In a possible implementation manner, the second processing module 1304 is specifically configured to:
processing the second sample text and the plurality of second sample arguments through the second submodel to be trained to obtain at least one predicted event, wherein the predicted event comprises at least one predicted argument;
determining a third loss according to a prediction argument in the prediction event and a second sample argument in the sample event;
and updating the model parameters of the second submodel according to the third loss.
In a possible implementation manner, the second processing module 1304 is specifically configured to:
determining a central argument according to the second sample text;
determining a target window corresponding to the central argument in the second sample text, wherein the target window comprises a preset number of characters;
determining a plurality of first arguments existing in the target window, and acquiring the prediction probability that the plurality of first arguments and the central argument correspond to the same event;
and determining at least one predicted event according to the central argument, each first argument and the prediction probability corresponding to each first argument.
In a possible implementation manner, the second processing module 1304 is specifically configured to:
determining a plurality of windows to be selected in the second sample text, wherein each window to be selected comprises the preset number of characters and comprises the argument corresponding to the center role to be selected;
determining the number of arguments corresponding to a first event type included in the window to be selected, wherein the first event type is the event type corresponding to the central argument;
and determining the target window according to the number of arguments corresponding to the first event type in the window to be selected.
In a possible implementation manner, the second processing module 1304 is specifically configured to:
and determining the window to be selected with the maximum number of arguments corresponding to the first event type as the target window.
In a possible implementation manner, the second processing module 1304 is specifically configured to:
determining a plurality of second sample roles corresponding to the plurality of second sample arguments;
determining a first probability that a second sample argument under each second sample role corresponds to the same event;
determining the role coefficient of each second sample role according to the recall rate and the accuracy rate of each second sample role;
determining the center role to be selected in the plurality of second sample roles according to the first probability corresponding to each second sample role and the role coefficient of each second sample role;
and determining the argument corresponding to the central role to be selected as the central argument.
In one possible implementation, for any one of the second sample roles, the second processing module 1304 is specifically configured to:
and processing the recall rate and the accuracy rate of the second sample role according to a preset function to obtain the role coefficient of the second sample role.
In a possible implementation manner, the second processing module 1304 is specifically configured to:
determining the priority of each second sample role according to the corresponding first probability of each second sample role and the role coefficient of each second sample role;
and determining the role of the center to be selected in the plurality of second sample roles according to the priority of each second sample role.
In one possible implementation, for any one of the second sample roles, the second processing module 1304 is specifically configured to:
and determining the product of the first probability of the second sample role corresponding to one event and the role coefficient of the second sample role as the priority of each second sample role.
In a possible implementation manner, the second processing module 1304 is specifically configured to:
if the priority of a second sample role in the plurality of second sample roles is greater than or equal to a preset threshold, determining the second sample role with the priority greater than or equal to the preset threshold as the center role to be selected;
and if the priorities of the plurality of second sample roles are all smaller than the preset threshold, determining the second sample role with the highest priority in the plurality of second sample roles as the center role to be selected.
In a possible implementation manner, the second processing module 1304 is specifically configured to:
determining the first argument with the corresponding prediction probability larger than or equal to a probability threshold value as a target argument according to the prediction probability corresponding to each first argument;
and determining a predicted event corresponding to the central argument, wherein the predicted event comprises the central argument and the target argument.
Fig. 14 is a schematic structural diagram of an event extraction device according to an embodiment of the present disclosure. As shown in fig. 14, the event extraction device 1400 of the present embodiment may include: an obtaining module 1401, a first processing module 1402, and a second processing module 1403.
An obtaining module 1401, configured to obtain a first text to be processed;
a first processing module 1402, configured to process the first text through a first sub-model in the pre-trained event extraction model to obtain a first output result, where the first output result includes: argument existing in the first text, role corresponding to the argument and event type corresponding to the argument;
a second processing module 1403, configured to process the first output result through a second sub-model in the pre-trained event extraction model, so as to obtain an event existing in the first text and an argument corresponding to the event.
In a possible implementation manner, the second processing module 1403 is specifically configured to:
obtaining each argument in the first output result;
and inputting each argument and the first text into the second submodel, so that the second submodel outputs events existing in the first text and arguments corresponding to the events.
The present disclosure provides an event extraction model training method and apparatus and an event extraction method and apparatus, applied to artificial intelligence technical fields such as knowledge graphs and deep learning, so as to improve the accuracy of the extraction results of an event extraction model.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising a computer program stored in a readable storage medium. At least one processor of the electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the electronic device performs the solution provided by any of the embodiments described above.
FIG. 15 shows a schematic block diagram of an example electronic device 1500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 15, the device 1500 includes a computing unit 1501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1502 or a computer program loaded from a storage unit 1508 into a Random Access Memory (RAM) 1503. In the RAM 1503, various programs and data necessary for the operation of the device 1500 can also be stored. The computing unit 1501, the ROM 1502, and the RAM 1503 are connected to each other by a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
Various components in the device 1500 are connected to the I/O interface 1505, including: an input unit 1506 such as a keyboard, a mouse, and the like; an output unit 1507 such as various types of displays, speakers, and the like; a storage unit 1508, such as a magnetic disk, optical disk, or the like; and a communication unit 1509 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1509 allows the device 1500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1501 may be various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The computing unit 1501 executes the respective methods and processes described above, such as the event extraction model training method or the event extraction method. For example, in some embodiments, the event extraction model training method or the event extraction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1500 via the ROM 1502 and/or the communication unit 1509. When the computer program is loaded into the RAM 1503 and executed by the computing unit 1501, one or more steps of the event extraction model training method or the event extraction method described above may be performed. Alternatively, in other embodiments, the computing unit 1501 may be configured to perform the event extraction model training method or the event extraction method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (35)

1. An event extraction model training method comprises the following steps:
obtaining a first training sample, wherein the first training sample comprises a first sample text and first labeling data, and the first labeling data comprises: a plurality of data packets corresponding to a plurality of sample arguments in the first sample text, a sample role corresponding to each data packet, and a sample event type corresponding to each data packet, wherein the sample arguments in any data packet are the same;
performing model training through the first training sample to obtain a first sub-model, wherein the first sub-model is used for determining arguments existing in a text, roles corresponding to the arguments and event types corresponding to the arguments;
obtaining a second training sample, wherein the second training sample comprises a second sample text, a plurality of sample events existing in the second sample text, and a second sample argument included in each sample event;
performing model training through the second training sample to obtain a second sub-model, wherein the second sub-model is used for determining events existing in the text and arguments corresponding to the events;
determining an event extraction model based on the first sub-model and the second sub-model.
2. The method of claim 1, wherein model training through the first training sample results in a first sub-model comprising:
processing the first sample text through the first sub-model to be trained to obtain first prediction data, wherein the first prediction data comprises a plurality of prediction arguments, a prediction role corresponding to the prediction arguments and a prediction event type corresponding to the prediction arguments;
and updating the model parameters of the first sub-model according to the first labeling data and the first prediction data.
3. The method of claim 2, wherein the first prediction data further includes predicted locations of the plurality of prediction arguments in the first sample text;
updating model parameters of the first sub-model according to the first annotation data and the first prediction data, including:
determining a first loss according to the first labeling data, the predicted role corresponding to the predicted argument and the predicted event type corresponding to the predicted argument;
determining a second loss according to the predicted positions of the plurality of predicted arguments and the actual positions of the plurality of predicted arguments in the first sample text;
and updating the model parameters of the first submodel according to the first loss and the second loss.
4. The method of claim 3, wherein the first prediction data further includes probabilities of predicted locations of the plurality of prediction arguments in the first sample text;
determining a second loss based on the predicted locations of the plurality of predicted arguments and the actual locations of the plurality of predicted arguments in the first sample text, comprising:
grouping the multiple prediction arguments to obtain multiple groups of prediction arguments, wherein the arguments in each group of prediction arguments are the same;
determining target prediction arguments in the plurality of groups of prediction arguments respectively according to the probabilities of the prediction positions of the plurality of prediction arguments in the first sample text, wherein the probability of the prediction position of the target prediction argument in the group of prediction arguments in the first sample text is highest;
and determining the second loss according to the predicted position of the target argument and the actual position of the target argument in the first sample text.
5. The method of any of claims 1-4, wherein model training through the second training sample results in a second sub-model comprising:
processing the second sample text and the plurality of second sample arguments through the second submodel to be trained to obtain at least one predicted event, wherein the predicted event comprises at least one predicted argument;
determining a third loss according to a prediction argument in the prediction event and a second sample argument in the sample event;
and updating the model parameters of the second submodel according to the third loss.
6. The method of claim 5, wherein processing the second sample text and the plurality of second sample arguments with the second submodel to be trained to derive at least one predicted event comprises:
determining a central argument according to the second sample text;
determining a target window corresponding to the central argument in the second sample text, wherein the target window comprises a preset number of characters;
determining a plurality of first arguments existing in the target window, and acquiring the prediction probability that the plurality of first arguments and the central argument correspond to the same event;
and determining at least one predicted event according to the central argument, each first argument and the prediction probability corresponding to each first argument.
7. The method of claim 6, wherein determining a target window in the second sample text to which the central argument corresponds comprises:
determining a plurality of windows to be selected in the second sample text, wherein each window to be selected comprises the preset number of characters and comprises the argument corresponding to the center role to be selected;
determining the number of arguments corresponding to a first event type included in the window to be selected, wherein the first event type is the event type corresponding to the central argument;
and determining the target window according to the number of arguments corresponding to the first event type in the window to be selected.
8. The method of claim 7, wherein determining the target window according to the number of arguments corresponding to the first event type included in the window to be selected comprises:
and determining the window to be selected with the maximum number of arguments corresponding to the first event type as the target window.
9. The method of any of claims 6-8, wherein determining a central argument from the second sample text comprises:
determining a plurality of second sample roles corresponding to the plurality of second sample arguments;
determining a first probability that a second sample argument under each second sample role corresponds to the same event;
determining the role coefficient of each second sample role according to the recall rate and the accuracy rate of each second sample role;
determining the center role to be selected in the plurality of second sample roles according to the first probability corresponding to each second sample role and the role coefficient of each second sample role;
and determining the argument corresponding to the central role to be selected as the central argument.
10. The method of claim 9, wherein, for any one second sample role, determining the role coefficient of the second sample role according to the recall rate and the accuracy rate of the second sample role comprises:
and processing the recall rate and the accuracy rate of the second sample role according to a preset function to obtain the role coefficient of the second sample role.
11. The method according to claim 9 or 10, wherein determining the candidate center role among the plurality of second sample roles according to the first probability corresponding to each of the second sample roles and the role coefficient of each of the second sample roles includes:
determining the priority of each second sample role according to the corresponding first probability of each second sample role and the role coefficient of each second sample role;
and determining the role of the center to be selected in the plurality of second sample roles according to the priority of each second sample role.
12. The method of claim 11, wherein, for any one second sample role, determining the priority of the second sample role according to the first probability corresponding to the second sample role and the role coefficient of the second sample role comprises:
and determining the product of the first probability of the second sample role corresponding to one event and the role coefficient of the second sample role as the priority of each second sample role.
13. The method according to claim 11 or 12, wherein determining the candidate center role among the plurality of second sample roles according to the priority of each of the second sample roles comprises:
if the priority of a second sample role in the plurality of second sample roles is greater than or equal to a preset threshold, determining the second sample role with the priority greater than or equal to the preset threshold as the center role to be selected;
and if the priorities of the plurality of second sample roles are all smaller than the preset threshold, determining the second sample role with the highest priority in the plurality of second sample roles as the center role to be selected.
14. The method of any one of claims 6 to 13, wherein said determining at least one predicted event from said central argument, each of said first arguments, and a predicted probability corresponding to each of said first arguments comprises:
determining the first argument with the corresponding prediction probability larger than or equal to a probability threshold value as a target argument according to the prediction probability corresponding to each first argument;
and determining a predicted event corresponding to the central argument, wherein the predicted event comprises the central argument and the target argument.
15. An event extraction method, comprising:
acquiring a first text to be processed;
processing the first text through a first sub-model in a pre-trained event extraction model to obtain a first output result, wherein the first output result comprises: argument existing in the first text, role corresponding to the argument and event type corresponding to the argument;
and processing the first output result through a second sub-model in the pre-trained event extraction model to obtain an event existing in the first text and an argument corresponding to the event.
16. The method of claim 15, wherein the processing the first output result through a second submodel in the pre-trained event extraction model to obtain an event existing in the first text and an argument corresponding to the event comprises:
obtaining each argument in the first output result;
and inputting each argument and the first text into the second submodel, so that the second submodel outputs events existing in the first text and arguments corresponding to the events.
17. An event extraction model training device, comprising:
an obtaining module, configured to obtain a first training sample, where the first training sample includes a first sample text and first labeling data, and the first labeling data includes: a plurality of data packets corresponding to a plurality of sample arguments in the first sample text, a sample role corresponding to each data packet, and a sample event type corresponding to each data packet, wherein the sample arguments in any data packet are the same;
the first processing module is used for carrying out model training through the first training sample to obtain a first sub-model, and the first sub-model is used for determining arguments existing in a text, roles corresponding to the arguments and event types corresponding to the arguments;
a second obtaining module, configured to obtain a second training sample, where the second training sample includes a second sample text, a plurality of sample events existing in the second sample text, and a second sample argument included in each sample event;
the second processing module is used for carrying out model training through the second training sample to obtain a second sub-model, and the second sub-model is used for determining events existing in the text and arguments corresponding to the events;
a determination module to determine an event extraction model based on the first submodel and the second submodel.
18. The apparatus of claim 17, wherein the first processing module is specifically configured to:
processing the first sample text through the first sub-model to be trained to obtain first prediction data, wherein the first prediction data comprises a plurality of prediction arguments, a prediction role corresponding to the prediction arguments and a prediction event type corresponding to the prediction arguments;
and updating the model parameters of the first sub-model according to the first labeling data and the first prediction data.
19. The apparatus of claim 18, wherein the first prediction data further comprises predicted locations of the plurality of prediction arguments in the first sample text;
the first processing module is specifically configured to:
determining a first loss according to the first labeling data, the predicted role corresponding to the predicted argument and the predicted event type corresponding to the predicted argument;
determining a second loss according to the predicted positions of the plurality of predicted arguments and the actual positions of the plurality of predicted arguments in the first sample text;
and updating the model parameters of the first submodel according to the first loss and the second loss.
20. The apparatus of claim 19, wherein the first prediction data further comprises probabilities of predicted locations of the plurality of prediction arguments in the first sample text;
the first processing module is specifically configured to:
grouping the multiple prediction arguments to obtain multiple groups of prediction arguments, wherein the arguments in each group of prediction arguments are the same;
determining target prediction arguments in the plurality of groups of prediction arguments respectively according to the probabilities of the prediction positions of the plurality of prediction arguments in the first sample text, wherein the probability of the prediction position of the target prediction argument in the group of prediction arguments in the first sample text is highest;
and determining the second loss according to the predicted position of the target argument and the actual position of the target argument in the first sample text.
21. The apparatus according to any one of claims 17 to 20, wherein the second processing module is specifically configured to:
processing the second sample text and the plurality of second sample arguments through the second submodel to be trained to obtain at least one predicted event, wherein the predicted event comprises at least one predicted argument;
determining a third loss according to a prediction argument in the prediction event and a second sample argument in the sample event;
and updating the model parameters of the second submodel according to the third loss.
22. The apparatus of claim 21, wherein the second processing module is specifically configured to:
determining a central argument according to the second sample text;
determining a target window corresponding to the central argument in the second sample text, wherein the target window comprises a preset number of characters;
determining a plurality of first arguments existing in the target window, and acquiring the prediction probability that the plurality of first arguments and the central argument correspond to the same event;
and determining at least one predicted event according to the central argument, each first argument and the prediction probability corresponding to each first argument.
23. The apparatus of claim 22, wherein the second processing module is specifically configured to:
determining a plurality of windows to be selected in the second sample text, wherein each window to be selected comprises the preset number of characters and comprises the argument corresponding to the center role to be selected;
determining the number of arguments corresponding to a first event type included in the window to be selected, wherein the first event type is the event type corresponding to the central argument;
and determining the target window according to the number of arguments corresponding to the first event type in the window to be selected.
24. The apparatus of claim 23, wherein the second processing module is specifically configured to:
and determining the window to be selected with the maximum number of arguments corresponding to the first event type as the target window.
25. The apparatus according to any one of claims 22-24, wherein the second processing module is specifically configured to:
determining a plurality of second sample roles corresponding to the plurality of second sample arguments;
determining a first probability that a second sample argument under each second sample role corresponds to the same event;
determining the role coefficient of each second sample role according to the recall rate and the accuracy rate of each second sample role;
determining the central role to be selected in the plurality of second sample roles according to the first probability corresponding to each second sample role and the role coefficient of each second sample role;
and determining the argument corresponding to the central role to be selected as the central argument.
26. The apparatus of claim 25, wherein, for any one second sample role, the second processing module is specifically configured to:
and processing the recall rate and the accuracy rate of the second sample role according to a preset function to obtain the role coefficient of the second sample role.
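Claim 26 leaves the preset function open; a harmonic-mean (F1-style) combination of recall and accuracy is one natural choice, sketched here purely as an assumption.

```python
def role_coefficient(recall, precision):
    # The claim only requires "a preset function" of recall and accuracy;
    # the F1-style harmonic mean below is an illustrative assumption.
    if recall + precision == 0.0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)
```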
27. The apparatus according to claim 25 or 26, wherein the second processing module is specifically configured to:
determining the priority of each second sample role according to the corresponding first probability of each second sample role and the role coefficient of each second sample role;
and determining the central role to be selected in the plurality of second sample roles according to the priority of each second sample role.
28. The apparatus of claim 27, wherein, for any one second sample role, the second processing module is specifically configured to:
and determining the product of the first probability of the second sample role and the role coefficient of the second sample role as the priority of the second sample role.
29. The apparatus according to claim 27 or 28, wherein the second processing module is specifically configured to:
if the priority of any second sample role in the plurality of second sample roles is greater than or equal to a preset threshold, determining each second sample role whose priority is greater than or equal to the preset threshold as a central role to be selected;
and if the priorities of the plurality of second sample roles are all smaller than the preset threshold, determining the second sample role with the highest priority among the plurality of second sample roles as the central role to be selected.
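The priority and selection logic of claims 27-29, as a sketch; the dictionary representation is an assumption, and `roles` is assumed non-empty.

```python
def candidate_central_roles(roles, threshold):
    # roles: {role name: (first probability, role coefficient)}
    # Claim 28: priority = first probability * role coefficient.
    priorities = {name: p * c for name, (p, c) in roles.items()}
    # Claim 29: every role at or above the threshold is a candidate;
    # if none qualifies, fall back to the single highest-priority role.
    selected = [name for name, pr in priorities.items() if pr >= threshold]
    return selected or [max(priorities, key=priorities.get)]
```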
30. The apparatus according to any one of claims 22-29, wherein the second processing module is specifically configured to:
determining, according to the prediction probability corresponding to each first argument, each first argument whose prediction probability is greater than or equal to a probability threshold as a target argument;
and determining a predicted event corresponding to the central argument, wherein the predicted event comprises the central argument and the target argument.
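Claim 30 reduces to a threshold filter over the prediction probabilities; a sketch, with the `pair_prob` callable standing in for the trained second submodel and the 0.5 default threshold as an assumption.

```python
def assemble_event(central_arg, first_args, pair_prob, threshold=0.5):
    # Claim 30: keep each first argument whose predicted probability of
    # belonging to the same event as the central argument clears the
    # threshold; the 0.5 default is an assumed value.
    targets = [arg for arg in first_args if pair_prob(central_arg, arg) >= threshold]
    return [central_arg] + targets
```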
31. An event extraction device comprising:
an acquisition module, configured to acquire a first text to be processed;
a first processing module, configured to process the first text through a first sub-model in a pre-trained event extraction model to obtain a first output result, where the first output result includes: argument existing in the first text, role corresponding to the argument and event type corresponding to the argument;
and a second processing module, configured to process the first output result through a second sub-model in the pre-trained event extraction model to obtain the events existing in the first text and the arguments corresponding to the events.
32. The apparatus of claim 31, wherein the second processing module is specifically configured to:
obtaining each argument in the first output result;
and inputting each argument and the first text into the second submodel, so that the second submodel outputs events existing in the first text and arguments corresponding to the events.
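The two-stage inference flow of claims 31-32, sketched with assumed call signatures and output formats for the two submodels.

```python
def extract_events(text, first_submodel, second_submodel):
    # First stage: label every argument in the text with its role and event type.
    first_output = first_submodel(text)  # e.g. [{"argument": ..., "role": ..., "event_type": ...}]
    # Second stage: feed each argument, together with the text, to the second
    # submodel, which groups the arguments into the events they belong to.
    arguments = [item["argument"] for item in first_output]
    return second_submodel(text, arguments)
```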
33. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-14 or claims 15-16.
34. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of claims 1-14 or claims 15-16.
35. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1-14 or claims 15-16.
CN202111595365.XA 2021-12-23 2021-12-23 Event extraction model training method and device and event extraction method and device Active CN114328687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111595365.XA CN114328687B (en) 2021-12-23 2021-12-23 Event extraction model training method and device and event extraction method and device

Publications (2)

Publication Number Publication Date
CN114328687A true CN114328687A (en) 2022-04-12
CN114328687B CN114328687B (en) 2023-04-07

Family

ID=81012895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111595365.XA Active CN114328687B (en) 2021-12-23 2021-12-23 Event extraction model training method and device and event extraction method and device

Country Status (1)

Country Link
CN (1) CN114328687B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222395A1 (en) * 2007-12-21 2009-09-03 Marc Light Systems, methods, and software for entity extraction and resolution coupled with event and relationship extraction
US8667509B1 (en) * 2008-09-30 2014-03-04 Emc Corporation Providing context information for events to an event handling component
CN102298635A (en) * 2011-09-13 2011-12-28 苏州大学 Method and system for fusing event information
WO2015084756A1 (en) * 2013-12-02 2015-06-11 Qbase, LLC Event detection through text analysis using trained event template models
CN107122416A (en) * 2017-03-31 2017-09-01 北京大学 A kind of Chinese event abstracting method
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element abstracting method, calculates equipment and storage medium at device
CN111783394A (en) * 2020-08-11 2020-10-16 深圳市北科瑞声科技股份有限公司 Training method of event extraction model, event extraction method, system and equipment
CN112052682A (en) * 2020-09-02 2020-12-08 平安资产管理有限责任公司 Event entity joint extraction method and device, computer equipment and storage medium
CN112116075A (en) * 2020-09-18 2020-12-22 厦门安胜网络科技有限公司 Event extraction model generation method and device and text event extraction method and device
CN112528625A (en) * 2020-12-11 2021-03-19 北京百度网讯科技有限公司 Event extraction method and device, computer equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEI XIANG ET AL: "A Survey of Event Extraction From Text", 《IEEE ACCESS》 *

Also Published As

Publication number Publication date
CN114328687B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
EP3819785A1 (en) Feature word determining method, apparatus, and server
CN112579727B (en) Document content extraction method and device, electronic equipment and storage medium
CN114444619B (en) Sample generation method, training method, data processing method and electronic device
CN113792154A (en) Method and device for determining fault association relationship, electronic equipment and storage medium
CN112528641A (en) Method and device for establishing information extraction model, electronic equipment and readable storage medium
JP2023015215A (en) Method and apparatus for extracting text information, electronic device, and storage medium
CN113420822A (en) Model training method and device and text prediction method and device
CN110580337A (en) professional entity disambiguation implementation method based on entity similarity calculation
CN112699237B (en) Label determination method, device and storage medium
CN114328687B (en) Event extraction model training method and device and event extraction method and device
CN115600592A (en) Method, device, equipment and medium for extracting key information of text content
CN113221566B (en) Entity relation extraction method, entity relation extraction device, electronic equipment and storage medium
CN114417862A (en) Text matching method, and training method and device of text matching model
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN114282049A (en) Video retrieval method, device, equipment and storage medium
CN113408280A (en) Negative example construction method, device, equipment and storage medium
CN113963197A (en) Image recognition method and device, electronic equipment and readable storage medium
CN113051926A (en) Text extraction method, equipment and storage medium
CN113204616A (en) Method and device for training text extraction model and extracting text
CN112560437A (en) Text smoothness determination method and device and target model training method and device
CN116069914B (en) Training data generation method, model training method and device
CN116737520B (en) Data braiding method, device and equipment for log data and storage medium
CN114662469B (en) Emotion analysis method and device, electronic equipment and storage medium
CN117668253A (en) Intelligent question-answering method and system based on natural language processing and knowledge graph
CN115952403A (en) Method and device for evaluating performance of object, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant