CN114328916A - Event extraction and training method of model thereof, and device, equipment and medium thereof - Google Patents

Event extraction and training method of model thereof, and device, equipment and medium thereof

Info

Publication number
CN114328916A
Authority
CN
China
Prior art keywords
target sample
text
event
training
sample
Legal status
Pending
Application number
CN202111572355.4A
Other languages
Chinese (zh)
Inventor
朱晓雨
李宝善
代旭东
陈志刚
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202111572355.4A
Publication of CN114328916A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses an event extraction method, a training method for its model, and a corresponding device, equipment, and medium. The training method of the event extraction model comprises the following steps: performing first training on an event extraction model by using the target sample texts in a first sample set, where the event extraction model is used for predicting the event classification of texts; obtaining the decision influence of each target sample text in the first sample set on the event extraction model; screening at least one target sample text out of the first sample set based on the decision influence to obtain a second sample set; and performing second training on the event extraction model by using the target sample texts in the second sample set. Because the scheme denoises the target sample texts directly based on their decision influence on the event extraction model, the training cost of the event extraction model can be saved.

Description

Event extraction and training method of model thereof, and device, equipment and medium thereof
Technical Field
The present application relates to the field of natural language processing technologies, and in particular to an event extraction method, a training method for its model, and a corresponding apparatus, device, and medium.
Background
With the development of the internet era, information is growing explosively, and manually processing large amounts of text is laborious and time-consuming. The information extraction task automatically extracts the specific information a user wants from large amounts of text and turns unstructured text into structured information, so that massive content can be classified automatically. Event extraction is one such information extraction task: in the news field, for example, countless new articles and events are produced every day, and by performing event extraction on news we can learn what main events occurred in a given article.
Even when a model is used to extract events from text, the training samples are often very noisy, so contrastive learning, adversarial learning, or reinforcement learning is typically adopted to reduce the noise of the training samples. This makes model training complex, the model difficult to converge, and the training cost high. In view of this, how to save the training cost of the model has become an urgent problem to be solved.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide an event extraction method, a training method for its model, and a corresponding device, equipment, and medium that can save the training cost of the event extraction model.
In order to solve the above technical problem, a first aspect of the present application provides a method for training an event extraction model, including: performing first training on the event extraction model by using target sample texts in a first sample set, wherein the event extraction model is used for predicting event classification of texts; obtaining the decision influence of each target sample text in the first sample set on the event extraction model; screening out at least one target sample text from the first sample set based on the decision influence to obtain a second sample set; and performing second training on the event extraction model by using the target sample texts in the second sample set.
In order to solve the above technical problem, a second aspect of the present application provides an event extraction method, including: training to obtain an event extraction model by using the training method of the event extraction model in the first aspect; acquiring a text to be extracted; and performing event extraction on the text to be extracted by using the event extraction model to obtain the event classification of the text to be extracted.
In order to solve the above technical problem, a third aspect of the present application provides a training apparatus for an event extraction model, including a first training module, an obtaining module, a screening module, and a second training module. The first training module is used for performing first training on an event extraction model by using the target sample texts in a first sample set, where the event extraction model is used for predicting the event classification of texts; the obtaining module is configured to obtain the decision influence of each target sample text in the first sample set on the event extraction model; the screening module is used for screening at least one target sample text out of the first sample set based on the decision influence to obtain a second sample set; and the second training module is used for performing second training on the event extraction model by using the target sample texts in the second sample set.
In order to solve the above technical problem, a fourth aspect of the present application provides an event extraction device, including a training module, a text acquisition module, and a classification module, where the training module is configured to train an event extraction model by using the training method of the event extraction model according to the first aspect; the text acquisition module is used for acquiring a text to be extracted; and the classification module is used for performing event extraction on the text to be extracted by using the event extraction model to obtain the event classification of the text to be extracted.
In order to solve the above technical problem, a fifth aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory to implement the method for training an event extraction model according to the first aspect or to implement the method for extracting an event according to the second aspect.
In order to solve the above technical problem, a sixth aspect of the present application provides a computer-readable storage medium storing program instructions executable by a processor, the program instructions being configured to implement the method for training an event extraction model according to the first aspect or the method for extracting an event according to the second aspect.
According to the above scheme, after first training is performed on the event extraction model, which is used for predicting the event classification of texts, with the target sample texts in the first sample set, the decision influence of each target sample text in the first sample set on the event extraction model is obtained; at least one target sample text is then screened out of the first sample set based on the decision influence to obtain a second sample set, and second training can be performed on the event extraction model with the target sample texts in the second sample set.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a method for training an event extraction model according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a training method for an event extraction model according to the present application;
FIG. 3 is a flowchart illustrating a step S29 of another embodiment of the training method for an event extraction model according to the present application;
FIG. 4 is a schematic flow chart diagram illustrating an embodiment of an event extraction method of the present application;
FIG. 5 is a block diagram of an embodiment of a training apparatus for an event extraction model according to the present application;
FIG. 6 is a block diagram of an embodiment of an event extraction device according to the present application;
FIG. 7 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 8 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a training method for an event extraction model according to the present application. Specifically, the method may include the steps of:
step S11: the event extraction model is first trained using the target sample texts in the first sample set.
The event extraction model is used for predicting the event classification of text. When the event extraction model is trained with sample texts, the sentence representation of a target sample text is obtained, the classification probability distribution of each word in the sentence representation is computed, the event classification of each word is then derived from its classification probability distribution, and finally the predicted event classification of the target sample text is obtained from the event classifications of its words, yielding the event prediction information of the target sample text. The event prediction information may include a classification probability distribution, a predicted event classification, and the like.
The classification probability distribution is the model's processing result for a text before event classification; it is a matrix representation of the probability that each word belongs to each preset classification. For example, the classification probability distribution of one text is a two-dimensional matrix of (number of words in the text) × (number of preset classifications), and that of several texts is a three-dimensional matrix of (number of texts) × (number of words per text) × (number of preset classifications). The preset classifications can be defined as needed, and the text may be an original sample text, an extended sample text, a target sample text, a test sample text, a verification sample text, a text to be extracted, and so on.
The event extraction model may be trained as a classifier using a deep learning model such as a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), or BERT (Bidirectional Encoder Representations from Transformers). In a practical scenario, the event extraction model may be a structurally simple BERT model with one fully connected layer: each target sample text in the first sample set is input into the event extraction model to obtain its sentence representation, sequence labeling is used to perform multi-class classification on each word of the sentence, the corresponding classification probability distribution is obtained through the fully connected layer, and the maximum of the distribution is taken as the classification of each word.
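As an illustration of the architecture just described, the following is a minimal sketch assuming a BERT encoder topped by a single fully connected layer; the names (EventClassifier, NUM_EVENT_TYPES) and the choice of checkpoint are hypothetical, not taken from the patent:

```python
import torch
import torch.nn as nn
from transformers import BertModel

NUM_EVENT_TYPES = 10  # hypothetical number of preset classifications

class EventClassifier(nn.Module):
    def __init__(self, pretrained="bert-base-chinese", num_labels=NUM_EVENT_TYPES):
        super().__init__()
        self.encoder = BertModel.from_pretrained(pretrained)
        # Single fully connected layer mapping each word's hidden state
        # to the preset classifications.
        self.fc = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # Sentence representation: one hidden vector per word (token).
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Classification probability distribution:
        # (texts, words, preset classifications).
        return self.fc(hidden).softmax(dim=-1)
```

The event classification of each word is then the maximum of its distribution, e.g. `probs.argmax(dim=-1)`.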
The first sample set includes several target sample texts, which may be texts from various application scenarios such as news; this is not limited here. A target sample text may be an extended sample text obtained by applying data enhancement to the original sample texts in an original sample set, or it may be one of the original sample texts themselves. For example, in a disclosed embodiment, before the first training of the event extraction model, the original sample texts in the original sample set may be expanded in a data enhancement mode to obtain a plurality of extended sample texts; at least part of the original sample texts and the extended sample texts are then used together as the target sample texts that form the first sample set. In this way, training data is constructed and the sample set is extended automatically through data enhancement.
Data enhancement may work directly on text, generating extended sample texts, or work on representations, denoising the generated extended sample texts. The data enhancement mode can be one or a combination of dictionary replacement, word-vector-based replacement, TF-IDF-based replacement, random word insertion, random swap, random deletion, and pre-trained-model generation, among others. An original sample text comprises text content and corresponding labels; to keep the labels consistent when expanding the original sample texts, the labels are kept unchanged or replaced with synonyms wherever possible, so as to avoid label offset. The individual modes work as follows:
  • Dictionary replacement: select a word to be replaced from the original sample text and replace it with a synonym of the same meaning from a lexicon, including but not limited to WordNet and FrameNet.
  • Word-vector-based replacement: randomly select words from the original sample text and replace each with a nearby word in the vector space of pre-trained word vectors such as word2vec or GloVe.
  • TF-IDF-based replacement: traverse the words of the original sample text with the TF-IDF (term frequency-inverse document frequency) method and replace words with low TF-IDF scores by preset words. Because a low TF-IDF score means a word occurs rarely in the original sample text and carries little information, replacing such words still preserves label consistency.
  • Random insertion: randomly select a non-stop word from the original sample text, obtain one of its synonyms, and insert the synonym at a random position in the text.
  • Random swap: randomly exchange the positions of several words in the original sample text; to preserve label consistency, the swapped words must not include the label text.
  • Random deletion: delete several words from the original sample text; again, to preserve label consistency, the deleted words must not include the label text.
  • Pre-trained-model generation: mask part of the words in the text and use a pre-trained model to generate new words in place of the masked ones. To ensure label consistency when a label is masked, label identifiers are added on both sides of the masked position, constraining the generated content so that it does not drift semantically from the original sample text.
In a practical scenario, the original sample text is: "From 0:00 to 24:00 on October 27, place A newly added 2 confirmed local cases of COVID-19." The range of the generated content is then constrained by the location tag identifier <LOC> so that the label does not drift too far, and the pre-trained model input is: "From 0:00 to 24:00 on October 27, <LOC><Mask><LOC> newly added 2 confirmed local cases of COVID-19", yielding the extended sample text: "From 0:00 to 24:00 on October 27, place B newly added 2 confirmed local cases of COVID-19." Among the data enhancement modes, only synonym replacement of words and pre-trained-model generation may replace a label; the other modes process only the words of the original sample text outside the labels, so label consistency is maintained.
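As a concrete illustration of two of these modes, here is a minimal sketch of random deletion and random swap under the assumption that the label words are known in advance, so they can be excluded to preserve label consistency; all names are illustrative:

```python
import random

def random_delete(words, label_words, p=0.1):
    # Delete each non-label word with probability p; label words are never removed.
    kept = [w for w in words if w in label_words or random.random() > p]
    return kept or words  # never return an empty text

def random_swap(words, label_words, n_swaps=2):
    # Exchange the positions of randomly chosen non-label words.
    words = list(words)
    candidates = [i for i, w in enumerate(words) if w not in label_words]
    for _ in range(n_swaps):
        if len(candidates) < 2:
            break
        i, j = random.sample(candidates, 2)
        words[i], words[j] = words[j], words[i]
    return words
```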
In the prior art, when samples are labeled manually to expand the original sample texts, or when data enhancement is applied to the original sample texts in the original sample set, the quality of the extended sample texts cannot be determined. Data enhancement methods can only be stacked naively, and whether a given method suits the current classification task must be inferred backwards from the model training results; usually only one or two data enhancement methods are used, and they must still be selected manually afterwards. This has at least the following drawbacks: the training cost is high; the number of extended sample texts and the variety of data enhancement methods are limited, so for scenarios such as breaking-news event extraction the generated extended sample texts are still insufficient; and the extended sample texts contain a great deal of noise, the labels of many of them may have changed, so they do not help the event extraction model at all and even degrade the event classification performance. Compared with the prior art, the present scheme does not restrict itself to any particular data enhancement method: all label-consistent data enhancement methods can be used alone or stacked at random, so neither the number of methods nor the amount of data is limited, as many methods as possible are applied, more training data is obtained effectively, and the training set is expanded rapidly.
Step S12: and obtaining the decision influence of each target sample text in the first sample set on the event extraction model.
After the first training of the event extraction model with the target sample texts in the first sample set, a sub-sampling process is added (see steps S12 and S13): each target sample text in the first sample set is evaluated, and its decision influence on model training is judged.
The way the decision influence is obtained is not specifically limited. For example, in a disclosed embodiment, when the decision influence of each target sample text in the first sample set on the event extraction model is obtained, the first-trained event extraction model can perform event prediction on each target sample text in the first sample set to obtain its first event prediction information, and the decision influence of each target sample text is then determined from that information. The first event prediction information of a target sample text may include its first classification probability distribution or its predicted event classification. Because the parameters of the event extraction model have been adjusted by the first training, the decision influence determined from the first event prediction information can reflect the training influence of the target sample text on the event extraction model.
Step S13: at least one target sample text is screened out from the first sample set based on the decision influence to obtain a second sample set.
Each target sample text corresponds to its own decision influence on the event extraction model. By comparing these decision influences, at least one target sample text is screened out of the first sample set to obtain a second sample set, which denoises the target sample texts and retains those with better decision influence for the second training. In a disclosed embodiment, screening at least one target sample text out of the first sample set based on decision influence means selecting from the first sample set the target sample texts whose decision influence satisfies a preset influence condition. The preset influence condition is that the decision influence exceeds a preset influence value, or that it falls within a preset top proportion when the decision influences of all target sample texts in the first sample set are sorted from high to low. The preset influence condition, preset influence value, and preset proportion range can all be customized and are not specifically limited here.
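A minimal sketch of this screening step follows, assuming each target sample text already has a scalar influence score; both preset conditions described above are shown, and all parameter names are illustrative:

```python
def screen_samples(samples, influences, threshold=None, top_ratio=None):
    """Return the second sample set given per-sample decision influences."""
    if threshold is not None:
        # Condition 1: decision influence exceeds a preset influence value.
        return [s for s, v in zip(samples, influences) if v > threshold]
    # Condition 2: keep the preset top proportion, sorted from high to low.
    ranked = sorted(zip(samples, influences), key=lambda t: t[1], reverse=True)
    k = max(1, int(len(ranked) * top_ratio))
    return [s for s, _ in ranked[:k]]
```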
Data noise reduction techniques are critical for event extraction. In the artificial intelligence era, a high-quality information extraction model can rapidly and automatically screen massive data, but how to acquire high-quality training data, and construct it quickly, remains an urgent problem. Among the extended sample texts generated by data enhancement, not every sample helps the model; only by selecting the high-quality data that aids training can the model effectively exploit multiple data enhancement methods. Existing noise reduction methods for target sample texts generally adopt adversarial learning, contrastive learning, reinforcement learning, and the like, which train the model in a complex way, make convergence difficult, and incur high training cost, high model complexity, and low training efficiency. This scheme instead obtains the decision influence of each target sample text in the first sample set on the event extraction model and screens at least one target sample text out of the first sample set based on that influence; the decision influence of each target sample text on the training effect is obtained directly by calculation, achieving automatic sampling of high-quality data and effectively saving training cost.
Step S14: and performing second training on the event extraction model by using the target sample texts in the second sample set.
When the second training is performed on the event extraction model with the target sample texts in the second sample set, the training texts are the target sample texts obtained after denoising with the decision influence, and the model training itself can be implemented in any existing way.
When the first training is performed on the event extraction model with the target sample texts in the first sample set, or the second training with those in the second sample set, the event extraction model can perform event prediction on a target sample text to obtain its second event prediction information; the training loss is then determined from the second event prediction information, and the parameters of the event extraction model are adjusted based on that loss. The second event prediction information may be a classification probability distribution or a predicted event classification. The loss may be computed with a cross-entropy loss function.
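A minimal sketch of one such training step, assuming the `EventClassifier` from the earlier sketch (which returns per-word probabilities) and word-level gold labels; the cross-entropy is computed as negative log-likelihood over the predicted distributions:

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, input_ids, attention_mask, labels):
    probs = model(input_ids, attention_mask)   # second event prediction information
    # Cross-entropy between predicted distributions and gold word labels.
    loss = F.nll_loss(torch.log(probs.clamp_min(1e-12)).flatten(0, 1),
                      labels.flatten())
    optimizer.zero_grad()
    loss.backward()        # adjust the event extraction model's parameters
    optimizer.step()
    return loss.item()
```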
According to the above scheme, after first training is performed on the event extraction model with the target sample texts in the first sample set, the decision influence of each target sample text on the event extraction model is obtained, at least one target sample text is screened out of the first sample set based on that influence to obtain a second sample set, and second training is performed with the target sample texts in the second sample set. Compared with manual noise reduction, or with denoising samples through contrastive learning, adversarial learning, and reinforcement learning, denoising the target sample texts based on their decision influence on the event extraction model saves the training cost of the event extraction model.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a training method of an event extraction model according to another embodiment of the present application. Specifically, the method may include the steps of:
step S21: and performing data expansion on the original sample texts in the original sample set by using a data enhancement mode to obtain a plurality of expanded sample texts, and taking at least part of the original sample texts and the plurality of expanded sample texts in the original sample set as target sample texts to form a first sample set.
The extended sample texts are obtained by transforming the original sample texts in the original sample set through data enhancement, which greatly increases the number of target sample texts; however, the extended sample texts may suffer from problems such as label offset, so the target sample texts need to be denoised. For the rest of the description of step S21, refer to step S11; details are not repeated here.
Step S22: the event extraction model is first trained using the target sample texts in the first sample set.
For the description of step S22, refer to step S11; details are not repeated here. After the first training of the event extraction model, a sub-sampling process is added; see the selection of data by sub-sampling in steps S23 to S28.
Step S23: and utilizing the first trained event extraction model to perform event prediction on the test sample texts in the test sample set to obtain a second classification probability distribution of each test sample text in the test sample set.
Step S24: and performing event prediction on each target sample text in the first sample set by using the first trained event extraction model to obtain first event prediction information of each target sample text.
In an embodiment of the present disclosure, the first event prediction information of a target sample text includes its first classification probability distribution. Steps S23 and S24 therefore use the first-trained event extraction model to perform event prediction on the test sample texts and the target sample texts, obtaining the second and first classification probability distributions respectively. The first-trained event extraction model is the model whose parameters have already been adjusted by training on the target sample texts, so when the target sample texts and test sample texts are input into it, only the gradients need to be computed; they are not back-propagated and the parameters are not updated. The gradient information is used to calculate the promotion effect of each target sample text on the training result.
The test sample set comprises a plurality of test sample texts whose labels are as accurate as those of the original sample texts, so the second classification probability distribution of a test sample text fits its actual classification probability distribution relatively well. Some of the target sample texts, by contrast, are extended sample texts obtained through data enhancement, so the first classification probability distribution of a target sample text may differ greatly from its actual classification probability distribution.
In the embodiment of the present disclosure, since the first event prediction information of a target sample text includes its first classification probability distribution, before the decision influence of each target sample text is determined from its first event prediction information, the first-trained event extraction model performs event prediction on the test sample texts in the test sample set to obtain the second classification probability distribution of each test sample text. Step S12 of the embodiment in fig. 1 then determines the decision influence of each target sample text from its first event prediction information through steps S25 to S27.
Step S25: and determining the vector parameters of each target sample text by using the first classification probability distribution of the target sample text.
In a disclosed embodiment, when the vector parameter of each target sample text is determined from its first classification probability distribution, the difference between the first classification probability distribution and the first actual classification probability distribution is calculated, and the sum of one and the product of this difference, the square of the sentence representation of the target sample text, and the regularization parameter is taken as the inverse Hessian vector product of the target sample text. The specific formula is as follows:
$$H = (y_i - \hat{y}_i)\, x_{ik}^{2}\, C + 1$$

where $H$ denotes the inverse Hessian vector product; $y_i$ the first classification probability distribution; $\hat{y}_i$ the first actual classification probability distribution; $x_{ik}$ the sentence representation of the target sample text, with $i$ indexing the sentence and $k$ the word; and $C$ the regularization parameter.
Step S26: and obtaining a first loss corresponding to each target sample text based on the first event prediction information of each target sample text, and obtaining a second loss corresponding to each test sample text based on the second classification probability distribution of each test sample text.
The first event prediction information of a target sample text includes its first classification probability distribution. A loss is the computed difference between the classification probability distribution predicted by the event extraction model and the corresponding actual classification probability distribution; for example, it may be computed with a cross-entropy loss function.
Step S27: and obtaining an influence function vector corresponding to each target sample text by using the vector parameter, the first loss of the target sample text and the second loss of the test sample text, wherein the influence function vector represents the decision influence of the target sample text.
The formula for the calculation of the influence function vector is as follows:

$$\phi(z_i, z_j') = \nabla_\theta L(z_j')^{\top}\, H\, \nabla_\theta L(z_i)$$

where $z_i$ denotes the classification probability distribution of the $i$-th target sample text in the first sample set and $z_j'$ that of the $j$-th test sample text in the test sample set; $\phi(z_i, z_j')$ is the influence function vector of the $i$-th target sample text obtained by using the $j$-th test sample text; $\nabla_\theta L(z_j')$ is the gradient of the second loss of the $j$-th test sample text; $\nabla_\theta L(z_i)$ is the gradient of the first loss of the $i$-th target sample text; and $H$ is the inverse Hessian vector product defined above. Thus, when the influence function vector of each target sample text is obtained from the vector parameter, the first loss of the target sample text, and the second loss of the test sample text, the product of the inverse Hessian vector product, the first loss of the target sample text, and the second loss of the test sample text is taken as the influence function vector of the target sample text; an influence function method is thereby introduced for data sampling, so as to select high-quality target sample texts from a large amount of data. The second loss of the test sample text may be the second loss of any test sample text in the test sample set, so when the influence function vector of each target sample text is calculated, the target sample text is compared with each test sample text in turn to obtain its influence on the model parameters during training.
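The following sketch mirrors the two formulas above in an elementwise form; it assumes the loss gradients and the other tensors are already available with broadcastable shapes, and is a reconstruction under those assumptions rather than the patent's exact computation:

```python
import torch

def inverse_hvp(y_pred, y_actual, x, C=1.0):
    # H = (y_i - y_hat_i) * x_ik^2 * C + 1
    return (y_pred - y_actual) * x.pow(2) * C + 1.0

def influence(grad_first_loss, H, grad_second_loss):
    # phi(z_i, z_j') = grad L(z_j')^T . H . grad L(z_i)
    return torch.sum(grad_second_loss * H * grad_first_loss)
```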
Step S28: and selecting a target sample text of which the influence function vector meets a preset influence condition from the first sample set to obtain a second sample set.
Because the influence function vector represents the decision influence of a target sample text, screening at least one target sample text out of the first sample set based on decision influence means selecting from the first sample set the target sample texts whose influence function vector satisfies a preset influence condition. The preset influence condition is that the influence function vector exceeds a preset influence value, or that it falls within a preset top proportion when the influence function vectors of all target sample texts in the first sample set are sorted from high to low.
Since the influence function vector captures the influence of each target sample text on the parameters of the event extraction model, after the influence function vectors are obtained from the vector parameter, the first loss of the target sample texts, and the second loss of the test sample texts, a linear sampling function can be applied to the influence function vectors to compute a sampling probability for each target sample text. When at least one target sample text is then screened out of the first sample set based on decision influence, the target sample texts whose processed influence function vectors satisfy the preset influence condition are selected to obtain the second sample set.
The decision influence of each target sample text is thus obtained through an influence function calculation and represented by an influence function vector, from which the influence of each target sample text on the parameters of the event extraction model can be derived; a sampling probability distribution is then obtained from the influence function vectors, and the first sample set is sub-sampled. Through sub-sampling, the target sample texts that help model training are retained and those whose labels were shifted by data enhancement are filtered out, improving the quality of the target sample texts and better promoting model training.
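A minimal sketch of this sub-sampling step, assuming the linear sampling function is a simple min-max normalization of the influence scores into sampling probabilities (an illustrative choice, since the text does not spell the function out):

```python
import random

def subsample(samples, influences):
    lo, hi = min(influences), max(influences)
    span = (hi - lo) or 1.0
    probs = [(v - lo) / span for v in influences]   # sampling probability per sample
    return [s for s, p in zip(samples, probs) if random.random() < p]
```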
Step S29: and performing second training on the event extraction model by using the target sample texts in the second sample set.
When the second training is performed on the event extraction model with the target sample texts in the second sample set, the event extraction model performs event prediction on a target sample text by obtaining its sentence representation and the third classification probability distribution of each word in that representation, which serves as the second event prediction information of the target sample text; the current training loss is then determined from the second event prediction information, and the parameters of the event extraction model are adjusted based on that loss to complete the second training.
The target sample texts in the second sample set are those obtained by denoising the first sample set based on the influence function vector, which represents decision influence. Because the influence function vector reveals whether a target sample text promotes model training, the useful target sample texts are preserved; and because each target sample text promotes training to a different degree, samples with a strong promoting effect should carry larger weights during training and samples with a weak effect relatively smaller ones. Therefore, when the second training is performed with the target sample texts in the second sample set, different weights can be assigned to the screened target sample texts to better assist model training. To explain step S29 clearly, please refer to fig. 3, which is a flowchart of step S29 in another embodiment of the training method for the event extraction model of the present application. Specifically, the method may include the following steps:
step S291: and copying the first trained event extraction model to obtain a copy model.
In order not to affect the parameter updates of the first-trained event extraction model when obtaining the weights, the first-trained event extraction model is copied to obtain a replication model; that is, a model with the same structure is redefined, and the model parameters and gradients of the first-trained event extraction model are preserved.
Step S292: and training the copy model by using the target sample texts in the second sample set.
After the replication model is trained with the target sample texts in the second sample set, the trained replication model performs event prediction on the verification sample set. When the replication model is trained with the target sample texts in the second sample set, the target sample texts are input into the replication model in batches for event prediction, yielding the fifth classification probability distribution of the target sample texts in the corresponding batch; an indirect vector of the same size as the batch is then defined and multiplied with the fifth classification probability distribution of the batch to obtain the third loss of the replication model, and finally the parameters of the replication model are adjusted based on the third loss. The fifth classification probability distribution is a three-dimensional matrix of (number of target sample texts in the batch) × (number of words in the target sample texts) × (number of preset classifications); the indirect vector is a matrix of (number of target sample texts in the batch) × 1.
That is, the replication model is trained with the target sample texts in the second sample set before the weights are determined from the second loss of its event prediction on the verification sample set; the trained replication model is then used for event prediction on the verification sample set.
Step S293: and determining the weight of each target sample text in the second sample set based on the second loss of the event prediction of the replication model on the verification sample set.
When the weight of each target sample text in the second sample set is determined based on the second loss of the replication model's event prediction, the replication model performs event prediction on each verification sample text in the verification sample set to obtain its third event prediction information; the second loss of the verification sample set is then obtained from the third event prediction information of the verification sample texts, and finally the weight of each target sample text in the second sample set is determined from that second loss. Specifically, the gradient with respect to the indirect vector can be taken, and the normalized gradient used as the weights.
To avoid overfitting the test set, that is, to avoid target sample texts that promote performance only on the test sample texts but no longer do so once part of the samples is replaced, the second loss of the replication model's event prediction on the verification sample set is further required to determine the weight of each target sample text in the second sample set.
After the model is copied, the sampled samples are input into the replication model, a probability distribution matrix is output, gradients are back-propagated, and the parameters are adjusted, just as in ordinary model training. The verification samples are then input into the parameter-adjusted replication model; to obtain the weights, a gradient is taken on the verification loss without back-propagating it or adjusting any parameters.
In this way, the target sample texts remaining in the second sample set after sampling have a strong promoting effect on the event extraction model, and those with a stronger effect receive larger weights while those with a weaker effect receive smaller ones. By introducing weight assignment and computing weights for the sampled target sample texts, the target sample texts used in the second training receive weights proportional to how much they promote model training, which better assists the training.
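The weight computation of steps S291 to S293 matches the meta-learning reweighting pattern (cf. Ren et al., "Learning to Reweight Examples for Robust Deep Learning", 2018). The following is a minimal sketch under that interpretation; a toy linear model stands in for the copied event extraction model, and all names and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def sample_weights(W, b, x_batch, y_batch, x_val, y_val, lr=0.1):
    # Indirect vector: one entry per target sample text in the batch.
    eps = torch.zeros(x_batch.size(0), requires_grad=True)
    # Replication-model forward pass (W, b stand in for the copied parameters).
    per_sample = F.cross_entropy(x_batch @ W + b, y_batch, reduction="none")
    third_loss = torch.sum(eps * per_sample)   # third loss of the replication model
    gW, gb = torch.autograd.grad(third_loss, (W, b), create_graph=True)
    W2, b2 = W - lr * gW, b - lr * gb          # adjust the copy's parameters
    # Verification forward pass: gradient w.r.t. eps only, no parameter update.
    val_loss = F.cross_entropy(x_val @ W2 + b2, y_val)
    g_eps = torch.autograd.grad(val_loss, eps)[0]
    w = torch.clamp(-g_eps, min=0.0)           # favor samples that lower the verification loss
    return w / (w.sum() + 1e-8)                # normalized gradient as the weights

# Toy usage with hypothetical shapes:
W = torch.randn(8, 4, requires_grad=True)
b = torch.zeros(4, requires_grad=True)
weights = sample_weights(W, b,
                         torch.randn(16, 8), torch.randint(0, 4, (16,)),
                         torch.randn(32, 8), torch.randint(0, 4, (32,)))
```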
Step S294: and performing event prediction on the target sample text by using the event extraction model to obtain second event prediction information of the target sample text.
When the second event prediction information includes the third classification probability distribution, performing event prediction on the target sample text with the event extraction model means obtaining the sentence representation of the target sample text and the third classification probability distribution of each word in that representation.
The whole process of obtaining the weights in steps S291 to S293 is performed on the replication model, while the process of obtaining the second event prediction information of the target sample text in step S294 is performed on the event extraction model; since the second training is performed on the event extraction model with the target sample texts in the second sample set, the event extraction model here is the first-trained one. The order between the weight-obtaining process of steps S291 to S293 and step S294 is not limited.
Step S295: weighting the third classification probability distribution of the target sample text by using the weight to obtain a fourth classification probability distribution of the target sample text, determining the training loss of the time based on the fourth classification probability distribution of the target sample text, and adjusting parameters of the event extraction model based on the training loss of the time.
In the second training process, when the training loss is determined from the second event prediction information of the target sample text, the third classification probability distribution of the target sample text is weighted with the weight to obtain the fourth classification probability distribution; the current training loss is determined based on the fourth classification probability distribution, and the parameters of the event extraction model are adjusted based on that loss.
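A minimal sketch of this weighting step, taking the text literally (the weight scales each sample's third distribution before the loss is computed); tensor shapes and names are assumptions:

```python
import torch
import torch.nn.functional as F

def second_training_loss(third_probs, labels, weights):
    # Fourth distribution: per-sample weight times the third distribution,
    # with third_probs shaped (texts, words, preset classifications).
    fourth_probs = weights.view(-1, 1, 1) * third_probs
    return F.nll_loss(torch.log(fourth_probs.clamp_min(1e-12)).flatten(0, 1),
                      labels.flatten())
```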
Therefore, in the above scheme, the input to the event extraction model is not the target sample texts of the whole first sample set but the denoised target sample texts of the second sample set obtained by sampling the data with the influence function vector; the target sample texts whose influence function vectors satisfy the preset influence condition can be selected, which makes it convenient to choose the sampling proportion that works best on the test set and yields higher-quality training samples. Meanwhile, in the second training process, the importance of each target sample text is computed through weight assignment according to the gradient changes on the verification set, which better promotes model training.
Referring to fig. 4, fig. 4 is a flowchart illustrating an event extraction method according to an embodiment of the present application.
Specifically, the method may include the steps of:
step S41: and training by using any one of the above training methods of the event extraction model to obtain the event extraction model.
The training method of the event extraction model may be any one of the above-described embodiments of the training method of the event extraction model.
Step S42: and acquiring the text to be extracted.
The text to be extracted may be any text. For example, in a practical scenario where news centered on the COVID-19 epidemic breaks out suddenly and grows exponentially, the text to be extracted is news text, and the event extraction method can be used to predict event types for the news text.
Step S43: and performing event extraction on the text to be extracted by using the event extraction model to obtain the event classification of the text to be extracted.
When the event extraction model performs event extraction on the text to be extracted to obtain its event classification, the sentence representation of the text is obtained, the classification probability distribution of each word in that representation is computed, the event classification of each word is derived from its distribution, and finally the event classification of the text to be extracted is obtained from the event classifications of the words. In other words, the trained event extraction model extracts the text to be extracted, takes the maximum of the classification probability distribution, and thereby predicts the classification of the event.
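A minimal sketch of this inference step, assuming the `EventClassifier` from the earlier sketch and a matching tokenizer; aggregating the word-level labels into the text-level event classification is left to the caller:

```python
import torch
from transformers import BertTokenizerFast

def extract_events(model, tokenizer, text):
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(enc["input_ids"], enc["attention_mask"])
    # Maximum of each word's classification probability distribution.
    return probs.argmax(dim=-1).squeeze(0).tolist()

# tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
# labels = extract_events(model, tokenizer, "some news text")
```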
By this method, after the text to be extracted is obtained, the event extraction model performs event extraction on it to obtain its event classification, so event extraction can be realized quickly and conveniently.
Referring to fig. 5, fig. 5 is a block diagram illustrating an embodiment of a training apparatus 50 for an event extraction model according to the present application. The training apparatus 50 of the event extraction model includes a first training module 51, an obtaining module 52, a screening module 53 and a second training module 54. A first training module 51, configured to perform a first training on an event extraction model by using target sample texts in a first sample set, where the event extraction model is used for predicting event classification of the texts; an obtaining module 52, configured to obtain a decision influence of each target sample text in the first sample set on the event extraction model; a screening module 53, configured to screen out at least one target sample text from the first sample set based on the decision influence to obtain a second sample set; and a second training module 54, configured to perform a second training on the event extraction model by using the target sample texts in the second sample set.
In the above scheme, after the first training module 51 performs first training on the event extraction model, which is used for predicting the event classification of texts, with the target sample texts in the first sample set, the obtaining module 52 obtains the decision influence of each target sample text in the first sample set on the event extraction model; the screening module 53 then screens at least one target sample text out of the first sample set based on the decision influence to obtain a second sample set, and the second training module 54 performs second training on the event extraction model with the target sample texts in the second sample set. Compared with manual noise reduction, or with denoising samples through contrastive learning, adversarial learning, and reinforcement learning, denoising the target sample texts based on their decision influence on the event extraction model saves the training cost of the event extraction model.
In some disclosed embodiments, the obtaining module 52 is configured to, when obtaining a decision influence of each target sample text in the first sample set on the event extraction model, perform event prediction on each target sample text in the first sample set by using the first trained event extraction model to obtain first event prediction information of each target sample text; and determining the decision influence of each target sample text based on the first event prediction information of each target sample text.
Therefore, the decision influence of each target sample text is determined by utilizing the first event prediction information of the target sample text, which is obtained by the event prediction of the event extraction model, so that the decision influence can reflect the training influence of the target sample text on the event extraction model.
In some disclosed embodiments, the first event prediction information for the target sample text comprises a first classification probability distribution for the target sample text; the obtaining module 52 is configured to perform event prediction on the test sample texts in the test sample set by using a first trained event extraction model before determining the decision influence of each target sample text based on the first event prediction information of each target sample text, so as to obtain a second classification probability distribution of each test sample text in the test sample set; the obtaining module 52 is configured to, when determining the decision influence of each target sample text based on the first event prediction information of each target sample text, determine a vector parameter of each target sample text by using the first classification probability distribution of the target sample text; obtaining a first loss corresponding to each target sample text based on the first event prediction information of each target sample text, and obtaining a second loss corresponding to each test sample text based on the second classification probability distribution of each test sample text; and obtaining an influence function vector corresponding to each target sample text by using the vector parameter, the first loss of the target sample text and the second loss of the test sample text, wherein the influence function vector represents the decision influence of the target sample text.
Therefore, the target sample text and the test sample text are input into the first trained event extraction model to obtain corresponding classification probability distribution, respective loss is calculated based on the corresponding classification probability distribution, and the vector parameter of each target sample text is determined by using the first classification probability distribution of the target sample text, so that the influence function vector corresponding to each target sample text is obtained by using the vector parameter, the first loss of the target sample text and the second loss of the test sample text, the influence function vector represents the decision influence of the target sample text, and the influence function vector is the result of comparative analysis of the target sample text and the test sample text.
In some disclosed embodiments, the obtaining module 52 is configured to, when determining the vector parameter of each target sample text by using the first classification probability distribution of the target sample text, calculate a difference between the first classification probability distribution and the first actual classification probability distribution, and take a sum of a product of the difference, a square of a sentence representation of the target sample text, and a regular parameter, and one as an inverse sea plug vector product of the target sample text; the obtaining module 52 is configured to, when obtaining the influence function vector corresponding to each target sample text by using the vector parameter, the first loss of the target sample text, and the second loss of the test sample text, further use a product of the inverse sum-of-sea products, the first loss of the target sample text, and the second loss of the test sample text as the influence function vector of the target sample text.
Therefore, the vector parameter is an inverse Hessian-vector product, and the product of the inverse Hessian-vector product, the first loss of the target sample text, and the second loss of the test sample texts serves as the influence function vector of the target sample text. Judging the decision influence of model training by direct calculation in this way saves training cost compared with using a complex auxiliary model.
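To make this computation concrete, a minimal NumPy sketch is given below under one reading of the description; the function names, the cross-entropy loss choice, and the exact algebraic form of the simplified inverse Hessian-vector approximation are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def cross_entropy_loss(pred_probs, true_label_idx):
    # Used for both the first loss (a target sample on the first-trained
    # model) and the second loss (a test sample on the same model).
    return -np.log(max(pred_probs[true_label_idx], 1e-9))

def inverse_hessian_vector_product(first_probs, actual_probs, sentence_repr, reg_param):
    # One plus the product of the prediction/ground-truth difference, the
    # squared norm of the sentence representation, and a regularization
    # parameter -- the simplified form described above (our reading).
    diff = first_probs - actual_probs
    return 1.0 + diff * np.dot(sentence_repr, sentence_repr) * reg_param

def influence_function_vector(inv_hvp, first_loss, second_loss):
    # Decision influence of one target sample: the product of its inverse
    # Hessian-vector product, its first loss, and the test set's second loss.
    return inv_hvp * first_loss * second_loss
```

A larger influence value marks a sample whose presence helps the model's decisions on the test set, which is what the screening step described later relies on.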
In some disclosed embodiments, when the first training module 51 performs the first training on the event extraction model by using the target sample texts in the first sample set, or the second training module 54 performs the second training by using the target sample texts in the second sample set, the respective module is further configured to: perform event prediction on the target sample texts by using the event extraction model to obtain second event prediction information of the target sample texts; determine the current training loss by using the second event prediction information of the target sample texts; and adjust the parameters of the event extraction model based on the current training loss.
Therefore, the model loss is calculated from the second event prediction information obtained by performing event prediction on the target sample texts with the event extraction model, which realizes the adjustment of the model parameters in both the first training and the second training.
In some disclosed embodiments, the second event prediction information includes a third classification probability distribution, and when performing event prediction on the target sample text to obtain the second event prediction information, the first training module 51 or the second training module 54 is configured to use the event extraction model to obtain a sentence representation of the target sample text and a third classification probability distribution for each word in the sentence representation.
Therefore, the event extraction model obtains the sentence representation of the target sample text and the third classification probability distribution of each word in that representation as the second event prediction information; the whole model training process is simple and places few requirements on the model structure.
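As a rough illustration of how simple such a model can be, here is a minimal PyTorch sketch; the GRU encoder, the vocabulary size, and the number of event types are assumptions made for the example and are not prescribed by the disclosure.

```python
import torch
import torch.nn as nn

class SimpleEventExtractor(nn.Module):
    # Minimal sketch: encode the text into a sentence representation, then
    # predict a classification probability distribution for each word.
    def __init__(self, vocab_size=30000, hidden=256, num_event_types=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_event_types)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))  # sentence representation
        logits = self.classifier(states)                 # per-word logits
        return torch.softmax(logits, dim=-1)             # third classification probability distribution
```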
In some disclosed embodiments, when performing the second training on the event extraction model by using the target sample texts in the second sample set, the second training module 54 is configured to copy the first trained event extraction model to obtain a replication model, and to determine the weight of each target sample text in the second sample set based on a second loss of the replication model's event prediction on a verification sample set. During the second training, when determining the current training loss from the second event prediction information of the target sample text, the second training module 54 is further configured to weight the third classification probability distribution of the target sample text by its weight to obtain a fourth classification probability distribution, and to determine the current training loss based on the fourth classification probability distribution.
Therefore, the target sample texts remaining in the second sample set after screening all have a large promotion effect on the event extraction model. By introducing weight distribution over the sampled texts, a text with a larger promotion effect receives a larger weight and one with a smaller effect receives a smaller weight, so the target sample texts used in the second training better assist the model training.
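A minimal sketch of this weighted loss follows, assuming per-word distributions of shape (batch, words, classes), per-word labels, and one weight per sample; the clamping and shape handling are additions for numerical safety, not part of the disclosure.

```python
import torch
import torch.nn.functional as F

def weighted_training_loss(third_probs, labels, sample_weights):
    # Scale each sample's per-word classification distribution by its weight
    # (yielding the "fourth" distribution), then take a negative
    # log-likelihood loss over the weighted distributions.
    fourth_probs = sample_weights.view(-1, 1, 1) * third_probs
    log_probs = torch.log(fourth_probs.clamp_min(1e-9))
    return F.nll_loss(log_probs.transpose(1, 2), labels)
```

How the weights themselves are obtained is described next.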
In some disclosed embodiments, before determining the weight of each target sample text in the second sample set based on the second loss of the replication model's event prediction on the verification sample set, the second training module 54 is further configured to train the replication model with the target sample texts in the second sample set, the trained replication model being used for event prediction on the verification sample set. Additionally or alternatively, when determining the weights, the second training module 54 is configured to: perform event prediction on each verification sample text in the verification sample set by using the replication model to obtain third event prediction information of each verification sample text; obtain the second loss of the verification sample set based on the third event prediction information; and determine the weight of each target sample text in the second sample set by using the second loss of the verification sample set.
Therefore, to avoid overfitting to the test set, that is, to avoid the case where a target sample text appears to promote performance on the test sample texts but the promotion disappears once part of the sample texts is replaced, the replication model is trained on the sampled texts with the adjusted parameters and then evaluated on a separate verification sample set. The event prediction on the verification set establishes the true magnitude of each target sample text's promotion effect, from which the corresponding weight is determined; introducing weight distribution in this way optimizes the model training effect.
In some disclosed embodiments, when training the replication model with the target sample texts in the second sample set, the second training module 54 is configured to: input the target sample texts into the replication model in batches for event prediction to obtain a fifth classification probability distribution of the target sample texts in each batch; define an indirect vector of the same size as the batch of target sample texts, and multiply it with the fifth classification probability distribution of the batch to obtain a third loss of the replication model; and adjust the parameters of the replication model based on the third loss. When determining the weight of each target sample text in the second sample set by using the second loss of the verification sample set, the second training module 54 is configured to obtain the gradient change of the indirect vector and take the normalized gradient change as the weight.
Therefore, weighting the sampled target sample texts ensures that a text with a larger promotion effect on model training receives a larger weight, which better assists the model training.
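This indirect-vector procedure resembles the learning-to-reweight scheme of Ren et al. (2018); under that reading, a minimal PyTorch sketch is given below. The single virtual SGD step, the learning rate, and the sentence-level shapes (logits of shape (batch, classes)) are all assumptions of the sketch, not details fixed by the disclosure.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call, grad

def estimate_sample_weights(replica, batch_x, batch_y, val_x, val_y, lr=0.1):
    # The indirect vector eps has one entry per sample in the batch; the
    # batch loss is the eps-weighted sum of per-sample losses, so the
    # gradient of the verification loss with respect to eps scores each
    # sample's effect on the verification set.
    params = {n: p.detach() for n, p in replica.named_parameters()}

    def val_loss_given_eps(eps):
        def weighted_loss(p):
            logits = functional_call(replica, p, (batch_x,))
            per_sample = F.cross_entropy(logits, batch_y, reduction='none')
            return (eps * per_sample).sum()
        g = grad(weighted_loss)(params)
        # One virtual SGD step with the eps-weighted loss (lr is an assumption).
        updated = {n: params[n] - lr * g[n] for n in params}
        val_logits = functional_call(replica, updated, (val_x,))
        return F.cross_entropy(val_logits, val_y)

    eps0 = torch.zeros(batch_x.size(0))
    eps_grad = grad(val_loss_given_eps)(eps0)   # gradient change of the indirect vector
    w = torch.clamp(-eps_grad, min=0.0)         # larger promotion effect -> larger weight
    return w / w.sum().clamp_min(1e-9)          # normalized result used as the weights
```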
In some disclosed embodiments, when screening out at least one target sample text from the first sample set based on the decision influence to obtain the second sample set, the screening module 53 is configured to select the target sample texts whose decision influence satisfies a preset influence condition, the preset influence condition being that the decision influence is greater than a preset influence value, or that it falls within a preset top proportion when the decision influences of the target sample texts of the first sample set are sorted from high to low.
Therefore, the decision influence of each target sample text in the first sample set on the event extraction model is obtained, and at least one target sample text is screened out from the first sample set based on that influence. Since the decision influence of each text on the training effect is obtained directly by calculation, high-quality data sampling is automated and the training cost is effectively saved.
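A small sketch of this screening step, assuming a scalar influence score per sample (for example, a norm of each influence function vector); the threshold and top-proportion parameters are illustrative values.

```python
import numpy as np

def screen_samples(sample_texts, influences, impact_threshold=None, top_ratio=None):
    # Keep the target samples whose decision influence exceeds a preset
    # value, or the top preset proportion when sorted from high to low.
    influences = np.asarray(influences)
    if impact_threshold is not None:
        keep = np.flatnonzero(influences > impact_threshold)
    else:
        k = max(1, int(len(sample_texts) * top_ratio))
        keep = np.argsort(influences)[::-1][:k]
    return [sample_texts[i] for i in keep]  # the second sample set
```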
In some disclosed embodiments, before the first training on the event extraction model with the target sample texts in the first sample set, the first training module 51 is further configured to perform data expansion on the original sample texts in an original sample set by data enhancement to obtain a number of expanded sample texts, and to take at least part of the original sample texts and the expanded sample texts as the target sample texts forming the first sample set.
Therefore, training data can be constructed automatically through data enhancement, achieving automatic expansion of the sample set.
In some disclosed embodiments, the data enhancement mode comprises one or a combination of dictionary-based replacement, word-vector-based replacement, TF-IDF-based replacement, random word insertion, random swap, random deletion, and a pre-trained-model generation mode, wherein the pre-trained-model generation mode masks some words in the text and uses a pre-trained model to generate new words that replace the masked ones.
Therefore, data enhancement is not limited to any single method: any label-preserving enhancement can be used alone or stacked at random, with no limit on which or how many methods are applied. Applying as many enhancement methods as possible effectively yields more training data and rapidly expands the training set.
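Two of the listed enhancements, sketched in plain Python as an illustration (the deletion probability and swap count are arbitrary example values):

```python
import random

def random_deletion(tokens, p=0.1):
    # Randomly drop tokens; the event label is assumed unchanged by small
    # deletions, keeping the augmentation label-preserving.
    kept = [t for t in tokens if random.random() > p]
    return kept or [random.choice(tokens)]

def random_swap(tokens, n=1):
    # Swap two random positions n times.
    if len(tokens) < 2:
        return tokens[:]
    tokens = tokens[:]
    for _ in range(n):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens
```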
Referring to fig. 6, fig. 6 is a schematic block diagram of an embodiment of an event extraction device 60 according to the present application. The event extraction device 60 includes a training module 61, a file acquisition module 62, and a classification module 63. The training module 61 is configured to train the event extraction model by using any one of the above embodiments of the training method of the event extraction model; the file acquisition module 62 is configured to acquire a text to be extracted; and the classification module 63 is configured to perform event extraction on the text to be extracted by using the event extraction model to obtain the event classification of the text to be extracted.
In this way, after the file acquisition module 62 acquires the text to be extracted, the classification module 63 performs event extraction on it by using the event extraction model to obtain its event classification, so that event extraction can be realized quickly and conveniently.
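Reusing the SimpleEventExtractor sketch above, a hypothetical end-to-end use of the device might look like this; the token ids are placeholders, and a real pipeline would tokenize the acquired text first.

```python
import torch

model = SimpleEventExtractor()                        # the trained model in practice
text_tokens = torch.tensor([[12, 845, 77, 3021, 9]])  # token ids of the text to be extracted
with torch.no_grad():
    probs = model(text_tokens)                        # per-word event-type distributions
event_class = probs.mean(dim=1).argmax(dim=-1)        # aggregate to one event class per text
```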
Referring to fig. 7, fig. 7 is a schematic diagram of a frame of an embodiment of an electronic device 70 according to the present application. The electronic device 70 comprises a memory 71 and a processor 72 coupled to each other, wherein the memory 71 stores program instructions, and the processor 72 is configured to execute the program instructions to implement the steps in any of the above-described embodiments of the event extraction model training method, or to implement the steps in any of the above-described embodiments of the event extraction method. Specifically, the electronic device 70 may include, but is not limited to: desktop computers, notebook computers, servers, mobile phones, tablet computers, and the like, without limitation.
In particular, the processor 72 is configured to control itself and the memory 71 to implement the steps in any of the above embodiments of the training method of the event extraction model, or the steps in any of the above embodiments of the event extraction method. The processor 72 may also be referred to as a CPU (Central Processing Unit) and may be an integrated circuit chip having signal processing capabilities. The processor 72 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components; a general-purpose processor may be a microprocessor or any conventional processor. Additionally, the processor 72 may be implemented jointly by a plurality of integrated circuit chips.
According to the above scheme, after the first training of the event extraction model (which predicts the event classification of texts) with the target sample texts in the first sample set, the decision influence of each target sample text on the model is obtained; at least one target sample text is screened out from the first sample set based on that influence to obtain the second sample set, and the second training is performed with the target sample texts in the second sample set. Compared with manual noise reduction, or sample noise reduction via contrastive learning, adversarial learning, or reinforcement learning, denoising the target sample texts directly based on their decision influence on the event extraction model saves the training cost of the model.
Referring to fig. 8, fig. 8 is a block diagram illustrating an embodiment of a computer readable storage medium 80 according to the present application. The computer readable storage medium 80 stores program instructions 81 executable by the processor, the program instructions 81 for implementing steps in any of the above described embodiments of the training method of the event extraction model or for implementing steps in any of the above described embodiments of the event extraction method.
According to the above scheme, after the first training of the event extraction model (which predicts the event classification of texts) with the target sample texts in the first sample set, the decision influence of each target sample text on the model is obtained; at least one target sample text is screened out from the first sample set based on that influence to obtain the second sample set, and the second training is performed with the target sample texts in the second sample set. Compared with manual noise reduction, or sample noise reduction via contrastive learning, adversarial learning, or reinforcement learning, denoising the target sample texts directly based on their decision influence on the event extraction model saves the training cost of the model.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, may be embodied in a software product stored in a storage medium and including instructions for causing a computer device (such as a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (17)

1. A method for training an event extraction model, the method comprising:
performing first training on the event extraction model by using target sample texts in a first sample set, wherein the event extraction model is used for predicting event classification of texts;
obtaining the decision influence of each target sample text in the first sample set on the event extraction model;
screening out at least one target sample text from the first sample set based on the decision influence to obtain a second sample set;
and performing second training on the event extraction model by using the target sample texts in the second sample set.
2. The method of claim 1, wherein obtaining the decision influence of each of the target sample texts in the first sample set on the event extraction model comprises:
performing event prediction on each target sample text in the first sample set by using the first trained event extraction model to obtain first event prediction information of each target sample text;
determining the decision influence of each of the target sample texts based on the first event prediction information of each of the target sample texts.
3. The method of claim 2, wherein the first event prediction information of the target sample text comprises a first classification probability distribution of the target sample text; before the determining the decision influence of each of the target sample texts based on the first event prediction information of each of the target sample texts, the method further includes:
performing event prediction on test sample texts in a test sample set by using the first trained event extraction model to obtain second classification probability distribution of each test sample text in the test sample set;
the determining the decision influence of each of the target sample texts based on the first event prediction information of each of the target sample texts comprises:
determining vector parameters of each target sample text by using the first classification probability distribution of the target sample text;
obtaining a first loss corresponding to each target sample text based on the first event prediction information of each target sample text, and obtaining a second loss corresponding to each test sample text based on the second classification probability distribution of each test sample text;
and obtaining an influence function vector corresponding to each target sample text by using the vector parameter, the first loss of the target sample text and the second loss of the test sample text, wherein the influence function vector represents the decision influence of the target sample text.
4. The method of claim 3, wherein determining the vector parameter of each of the target sample texts by using the first classification probability distribution of the target sample texts comprises:
calculating a difference between the first classification probability distribution and a first actual classification probability distribution, and taking one plus the product of the difference, the square of the sentence representation of the target sample text, and a regularization parameter as an inverse Hessian-vector product of the target sample text;
obtaining an influence function vector corresponding to each target sample text by using the vector parameter, the first loss of the target sample text, and the second loss of the test sample text, including:
taking the product of the inverse Hessian-vector product, the first loss of the target sample text, and the second loss of the test sample text as the influence function vector of the target sample text.
5. The method of claim 1, wherein the first training of the event extraction model with target sample text in a first sample set or the second training of the event extraction model with target sample text in a second sample set comprises:
performing event prediction on the target sample text by using the event extraction model to obtain second event prediction information of the target sample text;
and determining a current training loss by using second event prediction information of the target sample text, and adjusting parameters of the event extraction model based on the current training loss.
6. The method according to claim 5, wherein the second event prediction information includes a third classification probability distribution, and the performing event prediction on the target sample text by using the event extraction model to obtain the second event prediction information of the target sample text includes:
performing, using the event extraction model: a sentence representation of the target sample text is obtained, and a third classification probability distribution of each word in the sentence representation is obtained.
7. The method of claim 6, wherein the second training of the event extraction model with target sample text in the second sample set further comprises:
copying the first trained event extraction model to obtain a copy model;
determining the weight of each target sample text in the second sample set based on a second loss of event prediction of the replication model on the verification sample set;
in the second training process, the determining the current training loss by using the second event prediction information of the target sample text includes:
weighting the third classification probability distribution of the target sample text by using the weight to obtain a fourth classification probability distribution of the target sample text;
and determining the current training loss based on the fourth classification probability distribution of the target sample text.
8. The method of claim 7, wherein prior to determining the weight of each target sample text in the second sample set based on the second loss of event prediction performed on the verification sample set by the replication model, the method further comprises:
training the replication model by using target sample texts in the second sample set, wherein the trained replication model is used for performing event prediction on the verification sample set;
and/or determining a weight of each target sample text in the second sample set based on a second loss of event prediction of the replication model for the verification sample set, including:
performing event prediction on each verification sample text in the verification sample set by using the replication model to obtain third event prediction information of each verification sample text;
obtaining a second loss of the verification sample set based on the third event prediction information of each verification sample text;
and determining the weight of each target sample text in the second sample set by using the second loss of the verification sample set.
9. The method of claim 8, wherein training the replication model with the target sample text in the second sample set comprises:
inputting the target sample texts in the second sample set into the replication model in batches for event prediction to obtain fifth classification probability distribution of the target sample texts in corresponding batches;
defining an indirect vector with the same size as the target sample texts in the batch, and multiplying the indirect vector by a fifth classification probability distribution of the target sample texts in the batch to obtain a third loss of the replication model;
adjusting parameters of the replication model based on the third loss;
the determining the weight of each target sample text in the second sample set by using the second loss of the verification sample set includes:
and acquiring the gradient change of the indirect vector, and taking the result of the normalization of the gradient change as the weight.
10. The method of claim 1, wherein the screening out at least one target sample text from the first sample set based on the decision influence to obtain a second sample set comprises:
and selecting the target sample texts whose decision influence satisfies a preset influence condition from the first sample set to obtain the second sample set, wherein the preset influence condition is that the decision influence is greater than a preset influence value, or that the decision influence falls within a preset top proportion of the high-to-low ordering of the decision influences of the target sample texts of the first sample set.
11. The method of claim 1, wherein prior to the first training of the event extraction model with target sample text in a first sample set, the method further comprises:
performing data expansion on the original sample texts in the original sample set by using a data enhancement mode to obtain a plurality of expanded sample texts;
at least part of original sample text and a number of expanded sample texts in the original sample set are used as target sample texts to form the first sample set.
12. The method according to claim 11, wherein the data enhancement mode comprises one or a combination of dictionary-based replacement, word-vector-based replacement, TF-IDF-based replacement, random word insertion, random swap, random deletion, and a pre-trained-model generation mode, wherein the pre-trained-model generation mode masks partial words in the text, generates new words by using the pre-trained model, and replaces the masked words.
13. An event extraction method, the method comprising:
training an event extraction model by using the training method of the event extraction model according to any one of claims 1 to 12;
acquiring a text to be extracted;
and performing event extraction on the text to be extracted by using the event extraction model to obtain the event classification of the text to be extracted.
14. An apparatus for training an event extraction model, the apparatus comprising:
the first training module is used for carrying out first training on the event extraction model by using the target sample texts in the first sample set, wherein the event extraction model is used for predicting the event classification of the texts;
an obtaining module, configured to obtain a decision influence of each target sample text in the first sample set on the event extraction model;
the screening module is used for screening out at least one target sample text from the first sample set based on the decision influence to obtain a second sample set;
and the second training module is used for carrying out second training on the event extraction model by using the target sample texts in the second sample set.
15. An event extraction device, the device comprising:
a training module, configured to train an event extraction model by using the training method of the event extraction model according to any one of claims 1 to 12;
the file acquisition module is used for acquiring a text to be extracted;
and the classification module is used for performing event extraction on the text to be extracted by using the event extraction model to obtain the event classification of the text to be extracted.
16. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the method for training an event extraction model according to any one of claims 1 to 12 or to implement the method for extracting events according to claim 13.
17. A computer readable storage medium having stored thereon program instructions executable by a processor, the program instructions when executed by the processor implementing the method of training an event extraction model according to any one of claims 1 to 12 or implementing the method of event extraction according to claim 13.
CN202111572355.4A 2021-12-21 2021-12-21 Event extraction and training method of model thereof, and device, equipment and medium thereof Pending CN114328916A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111572355.4A CN114328916A (en) 2021-12-21 2021-12-21 Event extraction and training method of model thereof, and device, equipment and medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111572355.4A CN114328916A (en) 2021-12-21 2021-12-21 Event extraction and training method of model thereof, and device, equipment and medium thereof

Publications (1)

Publication Number Publication Date
CN114328916A true CN114328916A (en) 2022-04-12

Family

ID=81054485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111572355.4A Pending CN114328916A (en) 2021-12-21 2021-12-21 Event extraction and training method of model thereof, and device, equipment and medium thereof

Country Status (1)

Country Link
CN (1) CN114328916A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450813A (en) * 2023-06-19 2023-07-18 深圳得理科技有限公司 Text key information extraction method, device, equipment and computer storage medium


Similar Documents

Publication Publication Date Title
WO2020211720A1 (en) Data processing method and pronoun resolution neural network training method
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN111985229B (en) Sequence labeling method and device and computer equipment
US11397892B2 (en) Method of and system for training machine learning algorithm to generate text summary
CN112101190A (en) Remote sensing image classification method, storage medium and computing device
CN110309297B (en) Junk text detection method, readable storage medium and computer device
US20130268457A1 (en) System and Method for Extracting Aspect-Based Ratings from Product and Service Reviews
Zhang et al. Deep autoencoding topic model with scalable hybrid Bayesian inference
EP3916597B1 (en) Detecting malware with deep generative models
CN110968725B (en) Image content description information generation method, electronic device and storage medium
WO2023179429A1 (en) Video data processing method and apparatus, electronic device, and storage medium
CN111881398A (en) Page type determination method, device and equipment and computer storage medium
CN111145202B (en) Model generation method, image processing method, device, equipment and storage medium
CN115223251A (en) Training method and device for signature detection model, electronic equipment and storage medium
CN114357170A (en) Model training method, analysis method, device, equipment and medium
CN115130038A (en) Webpage classification method and device
CN114328916A (en) Event extraction and training method of model thereof, and device, equipment and medium thereof
JP2022537542A (en) Dynamic image resolution evaluation
CN112839185B (en) Method, apparatus, device and medium for processing image
CN113011531A (en) Classification model training method and device, terminal equipment and storage medium
CN116578738B (en) Graph-text retrieval method and device based on graph attention and generating countermeasure network
CN116108157A (en) Method for training text generation model, text generation method and device
CN113434722B (en) Image classification method, device, equipment and computer readable storage medium
JP7099254B2 (en) Learning methods, learning programs and learning devices
CN111767710B (en) Indonesia emotion classification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination