CN115983274B - Noise event extraction method based on two-stage label correction

Publication number: CN115983274B (granted, published 2023-11-28); earlier publication CN115983274A (published 2023-04-18)
Application number: CN202211638181.1A (filed 2022-12-20; priority date 2022-12-20)
Authority: CN (China)
Legal status: Active
Prior art keywords: event, noise, model, argument, training
Other languages: Chinese (zh)
Inventors: 汪鹏 (Wang Peng), 徐子杰 (Xu Zijie)
Applicant and current assignee: Southeast University

Abstract

The application discloses a noise event extraction method based on two-stage label correction. First, the application designs an event extraction model that combines text encoding by a pre-trained language model with global pointer network decoding as the basic model. Then, noise labels are divided into explicit noise and implicit noise, and a two-stage label correction method is proposed. In the first stage, mapping rules from argument roles to event types are derived from the event schema to correct explicit noise in the original data. In the second stage, an adaptive iteration method is designed, and implicit noise in the training data is corrected iteratively according to the confidence of the basic model in its prediction results. Finally, with at most three adaptive iterations, the application achieves excellent event extraction performance under different noise rate settings. The method efficiently corrects noise labels in event extraction and can meet the broad requirements of event extraction applications in real-world noise scenarios.

Description

Noise event extraction method based on two-stage label correction
Technical Field
The application belongs to the field of artificial intelligence and natural language processing, and particularly relates to a noise event extraction method based on two-stage label correction.
Background
Event extraction is an important and challenging task in information extraction research, aimed at converting event information in unstructured text into structured representations. It can be divided into two subtasks: 1. trigger word classification: identifying the text span of the trigger word in the event text and classifying it into the correct event type; 2. event argument classification: identifying the text span of each argument in the event text and classifying it into the correct argument role. At the application level, event extraction techniques can support many downstream natural language processing tasks, such as information retrieval, event-based recommendation systems, and intelligent question answering.
Recently, significant progress has been made in event extraction research, and the performance gains of these approaches mainly benefit from pre-trained language models with powerful encoding capabilities, such as BERT, RoBERTa, and NEZHA. These event extraction methods focus on improving the encoder-decoder structure to improve the performance of the pre-trained language model, and assume by default that the data used to train the model is high-quality and noise-free. In fact, event extraction requires a large amount of high-quality annotated data, and annotation is time-consuming, labor-intensive, and prone to low label quality. While the distant supervision hypothesis provides a way to label event data with semi-automatic templates, it can cause semantic drift and thereby introduce noise labels. At the same time, because pre-trained language models have large-scale parameters and deep network layers, they will eventually fit these noise samples. These advanced event extraction methods are therefore difficult to apply in practice because they do not consider the possible presence of noise labels. In addition, although much work has been devoted to learning with noise labels, these methods are mostly designed for the computer vision field and are difficult to transfer to event extraction.
In order to solve the above problems, the prior art is as follows:
application number: CN202110286434.2, name: an event extraction method, an apparatus, a device and a storage medium are provided, the method includes: inputting a text to be extracted into a pre-trained event extraction model to obtain category labels of all text units in the text to be extracted as event extraction results; the event extraction model is obtained by training a text sequence serving as a training sample, event trigger word position labels of the text sequence and category labels of each text unit of the text sequence serving as first type sample labels, wherein the category labels of the text units comprise event trigger word types and event argument types of the text units. The method can realize event extraction, and can comprehensively identify multiple roles of the text in the event, thereby ensuring the integrity of event extraction.
But in its application:
1. The described event extraction method is applicable to high-quality data scenarios and does not consider the possible noise label problem in event extraction. In an actual event extraction scenario, noise labels greatly affect the practical performance of the event extraction method. The event extraction method of the present application takes noise labels as a basic setting, and corrects the two kinds of noise labels from the explicit and implicit perspectives by using event schema (Schema) mapping rules and adaptive iteration respectively, thereby ensuring both data quality and model performance.
2. The described event extraction method is mainly realized with a sequence labeling model, does not consider the situation where event trigger words or arguments overlap, and is therefore not suitable for complex overlapping event extraction scenarios. The event extraction method described herein considers the potential overlap of event trigger words or arguments in practical applications, proposes an event extraction model based on a global pointer network, and decodes the start position and the end position of each candidate result as a whole, thereby effectively solving the overlap problem.
Application number: CN202110286434.2, name: an event extraction method integrating a pre-training language model and anti-noise interference remote supervision information belongs to the technical field of computers. The method uses an integrated knowledge auxiliary model for judgment, is formed by introducing massive text pre-training, takes a pre-training language model containing a large amount of semantic grammar knowledge information as a network structure unit of an event extraction model, uses a model algorithm of a remote supervision feature for mixing anti-noise interference, and adds gradient direction anti-interference training under a circular constraint condition.
But in its application:
1. The described event extraction method adopts a distant supervision algorithm based on adversarial learning to enhance the model's resistance to noise interference. The event extraction method described herein adopts the idea of two-stage label correction, which aims to correct noise labels rather than merely enhance the model's robustness, and can obtain a high-performance model and corrected data at the same time.
2. The described event extraction method performs adversarial training by adding gradient-direction perturbations under a circular constraint condition, improving the model's robustness through disturbances along the gradient descent direction. This is mainly suitable for enhancing model generalization in low-noise-rate scenarios and does not consider higher noise rates. The event extraction method of the present application, according to the characteristics of event extraction noise, corrects explicit noise with event schema (Schema) mapping rules and corrects implicit noise in an adaptive iterative manner, and is suitable for event extraction tasks under different noise rate settings.
The application designs a new two-stage label correction method to solve the problem of event extraction in noise scenarios. First, noise labels are divided into explicit noise and implicit noise. Explicit noise refers to a contradiction between the event type and the event argument roles in a given label, while implicit noise refers to a potential semantic inconsistency between the given label and the event text itself. The present application then corrects the two kinds of noise labels from the explicit and implicit perspectives using event schema (Schema) mapping rules and adaptive iteration, respectively. Specifically, in the first stage, the method establishes mapping rules based on the event Schema, according to the event types and corresponding argument roles predefined by the Schema, and processes the original data with these rules. In the second stage, the method uses the data preliminarily processed in the first stage as the training set to train an event extraction model based on a global pointer network. Compared with a traditional dual-pointer network, the global-pointer-network-based method simplifies the process of identifying the spans of event trigger words or arguments, avoids inconsistency between training and prediction, and treats the start position and end position as a whole from a global perspective, which guarantees the basic performance of the event extraction model. On this basis, the training set is predicted with the trained event extraction model, and an adaptive confidence threshold is generated for each event type or argument role according to the prediction results. For each given event sample, if the confidence score of an original label is smaller than the corresponding confidence threshold, the original label is corrected with prediction results whose confidence is higher than the threshold; the corrected data is used to train the model in the next iteration, and after repeated iterations a model with converged performance and a corrected dataset are finally obtained.
Disclosure of Invention
The application first divides the noise labels in event extraction into explicit noise and implicit noise according to the task characteristics of event extraction. Second, aiming at these two kinds of noise labels, the application provides an event extraction method with two-stage label correction. In the first stage, according to the event types predefined by the event schema and the corresponding event argument roles, mapping rules from event argument roles to event types are obtained and used to correct the explicit noise in the original data. Then, in the second stage, the application designs an adaptive iteration method, which trains an event extraction model based on a global pointer network with the data corrected in the first stage and predicts the training data with this model. According to the confidence of the model in its prediction results, the implicit noise in the training data is adaptively corrected, and the corrected data is used for model training in the next iteration. Repeated adaptive iteration improves the quality of the training data and enhances model performance. Finally, with at most three adaptive iterations, the application achieves excellent event extraction performance under different noise rate settings. The method efficiently corrects noise labels in event extraction and can meet the broad requirements of event extraction applications in real-world noise scenarios.
In order to solve the technical problems, the technical scheme provided by the application is as follows:
a noise event extraction method based on two-stage label correction includes the steps:
1) Event text encoding based on a pre-trained language model:
For event text T_i containing n words in a given event sample, a pre-trained language model encoder encodes the event text T_i to obtain an event characterization vector sequence H_i with context-rich semantics, which is used for downstream extraction of event trigger words and event arguments;
2) Event extraction model based on global pointer network:
The event characterization vector sequence H_i containing contextual semantics obtained in step 1) is fed into a global pointer network decoder to capture the span information and type information of event trigger words and event arguments;
3) Label correction method based on the event schema mapping module:
for correcting explicit noise in the raw data in a first stage;
4) And an adaptive iteration module:
the adaptive iteration module is used for carrying out an adaptive iteration label correction method and correcting the implicit noise in the second stage;
5) Event extraction model training and testing applicable to noise scene label correction:
during the training process, the two-stage label correction method described in step 3) and step 4) is used, and the event extraction model described in step 2) is trained using corrected data.
As a further development of the application, the event text encoding process based on a pre-trained language model in step 1) is specifically as follows:
For a given event sample (T_i, Y_i), where T_i is the event text containing n words and Y_i is an event label that may be noisy, a pre-trained language model encoder encodes the event text T_i to obtain an event characterization vector sequence H_i with context-rich semantics, which is used for downstream extraction of event trigger words and event arguments.
As a further improvement of the present application, the process of extracting event trigger words or arguments by the event extraction model based on the global pointer network in the step 2) is specifically as follows,
The text representation vector H_i is passed through an additional self-attention layer to obtain an auxiliary representation H_{i,h}; then two linear layers are used to identify the start positions and end positions and their corresponding types respectively, and a multi-head attention mechanism multiplies the output vector sequences of the two linear layers to obtain a scoring (logits) matrix; for each predefined event type or argument role k, the head-tail position scores of candidate trigger words or arguments are calculated from a global perspective.
During decoding, the global pointer network calculates a score s_k(i,j) for each pair of character positions in the event characterization vector sequence H_i; if s_k(i,j) > 0, the i-th character is the start position of a trigger word or argument, the j-th character is its end position, and the type of the trigger word or argument is k.
As a further improvement of the present application, the label correction method of the event schema mapping module in step 3) is specifically as follows:
First, the verification set is used to judge whether noise labels exist in the argument labels of the event samples in the training set. If the argument labels are noise-free, a single specific argument role r_{k,l} is sufficient to determine that the current event type is t_k. When the argument labels of an event sample may be noisy, the mapping rule needs to be further strengthened: besides requiring at least one specific argument role r_{k,l}, the argument roles in the currently given argument label must all belong to the argument role set R_k corresponding to event type t_k.
As a further improvement of the present application, the adaptive iterative label correction method performed by the adaptive iteration module in step 4) is specifically as follows:
First, the event extraction model of step 2) is trained with the preliminarily corrected data obtained in step 3), and the training data is predicted with this model to obtain logits matrices that mark the spans and types of candidate trigger words or arguments. The adaptive iteration module then maps the logits matrices to corresponding confidence scores with a sigmoid function, and for each event type and argument role takes the mean of all confidence scores of results predicted as that type as its confidence threshold. For a given event sample, if the confidence score of an original label is lower than the corresponding threshold, the original label is corrected with prediction results whose confidence is higher than the threshold; the corrected data is used for model training in the next iteration, and the newly trained model continues to correct labels. This adaptive process is repeated until model performance converges, and the final model is used for prediction on the test set.
As a further improvement of the present application, the training process in step 5) for event extraction model training and testing with noise scene label correction is specifically as follows:
In the first stage, explicit noise in the raw data is corrected using step 3); in the second stage, implicit noise is corrected by adaptive iteration using step 4). While the noise labels are being corrected, the event extraction model of step 2) is retrained with the corrected data, the model loss is optimized in a supervised learning manner, and the network parameters of the model are updated, finally yielding an event extraction model with converged performance that is suitable for noise scenarios.
As a further improvement of the application, the training process uses the Circle Loss function to alleviate the tag imbalance problem.
As a further improvement of the application, the testing process in step 5) for event extraction model training and testing with noise scene label correction is specifically as follows:
during model test, firstly, an original event text to be extracted is encoded by utilizing a pre-training language model according to the step 1), then, an event extraction model suitable for a noise scene is used for carrying out forward calculation, and finally, a logits matrix obtained by model calculation is decoded to obtain an extracted event trigger word and an event argument, so that the test process of the event extraction model under the noise scene is completed.
As a further improvement of the application, the performance of the model on the test set is adopted in the test process to evaluate the performance of the model, and event extraction is divided into two subtasks of trigger word classification and argument classification.
As a further improvement of the application, the evaluation indexes of the two subtasks of trigger word classification and argument classification both adopt micro F1 values.
Compared with the prior art, the application has the advantages that:
the application provides a noise event extraction method based on two-stage tag correction, which is used for promoting an event extraction task to noise tag setting. Firstly, classifying noise labels extracted by events according to task characteristics of event extraction: both explicit noise and implicit noise. Then, for these two types of noise, a two-stage tag correction method was designed: in the first stage, the explicit noise is solved by using an event Schema mapping rule; in the second stage, the implicit noise is corrected using adaptive iteration. Compared with the prior art, the two-stage label correction method is simple and effective, and the performance of the model in a noise scene is guaranteed. Meanwhile, the event extraction model adopts an advanced architecture of pre-training language model coding and global pointer network decoding, and for each candidate trigger word or argument, the starting position and the ending position of each candidate trigger word or argument are regarded as a whole to be judged, so that the problem that the traditional double pointer network needs to traverse the head and tail pointer network and set a heuristic threshold value to decode is solved, and the problem that training and testing are inconsistent due to decoding strategy problems of the traditional double pointer network is solved. Finally, verification is carried out on the event extraction task data set, so that the two-stage label correction method suitable for noise scene event extraction has better universality and generalization, and meanwhile, high efficiency in practical application can be ensured. Therefore, the application has better application prospect and popularization range.
Drawings
FIG. 1 is a logic flow diagram of the method of the present application;
FIG. 2 is a model flow diagram of the method of the present application;
fig. 3 is a training configuration diagram of the method of the present application.
Detailed Description
The present application will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the application and practice it.
As a specific embodiment of the present application, the logic flow chart is shown in FIG. 1, the model flow chart is shown in FIG. 2, and the training configuration chart is shown in FIG. 3, a noise event extraction method based on two-stage label correction comprises the following steps:
1) Event text encoding based on a pre-trained language model.
For event extraction tasks in noise scenarios, an event Schema is first predefined, comprising K event types and, for each event type t_k, a corresponding set of argument roles R_k.
For a given event sample (T_i, Y_i), where T_i is the event text containing n words and Y_i is an event label that may be noisy, the present application uses a pre-trained language model encoder to encode the event text T_i and obtain an event characterization vector sequence H_i with context-rich semantics:

H_i = Encoder(T_i) = [h_1, h_2, ..., h_n]

where h_j denotes the hidden-layer embedding of each word W_j, and H_i is the embedding sequence of the entire event text T_i. To better model event trigger words and corresponding arguments, the text representation H_i is passed through an additional self-attention layer to obtain an auxiliary representation H_{i,h}:

H_{i,h} = softmax(Q K^T / √d) V

where softmax(·) is the normalization function used to calculate the attention scores, √d is the scaling factor, and Q, K, V are the query vector sequence, key vector sequence and value vector sequence, respectively.
By encoding event text with a pre-trained language model in this way, the encoding capability of the pre-trained language model can be fully utilized, thereby ensuring the basic performance of event extraction.
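By way of a non-limiting illustration, this encoding step can be sketched as follows (a minimal sketch assuming the HuggingFace transformers library; the model name "bert-base-chinese" merely stands in for BERT or NEZHA and is not prescribed by the application):

```python
# Minimal sketch of step 1): encoding event text with a pre-trained language model.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def encode_event_text(text: str) -> torch.Tensor:
    """Return the event characterization vector sequence H_i (1 x seq_len x hidden)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state  # one contextual embedding per token

H_i = encode_event_text("某公司于12月20日宣布收购一家初创企业。")
print(H_i.shape)  # e.g. torch.Size([1, seq_len, 768])
```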
2) An event extraction model based on a global pointer network.
Multiple decoding strategies can be applied to the obtained contextual representation sequence of the event text. Considering that event arguments may be nested in practical applications, a pointer-network-style decoding strategy has certain advantages. However, a conventional dual-pointer network requires an extremely complex decoding strategy: for each event type or argument role, the head-pointer logits matrix and the tail-pointer logits matrix are traversed separately, and heuristic thresholds are then used to screen the head and tail positions of candidate trigger words or arguments; this decoding strategy often causes inconsistency between the training process and the testing process, which limits model performance. To solve this problem, the present application employs a global-pointer-network-based event extraction decoder with a multi-head attention mechanism, where each head focuses on a specific event type or argument role. The global pointer network judges the start and end positions of candidate trigger words or arguments as a whole from a global perspective, adopts rotary position encoding to inject relative position information into the logits matrix, and learns the structural information of the text along with its semantics. During decoding, the head and tail positions of candidate trigger words or arguments are determined simultaneously by the logits matrix of the global pointer network, so all candidate results can be obtained with a single traversal.
Specifically, similar to a traditional dual-pointer network, the auxiliary representation H_{i,h} of the event text is first fed into two linear layers to obtain the representations of the start and end positions of candidate trigger words or arguments, respectively:

q_{i,k} = W_{q,k} h_i + b_{q,k},   k_{i,k} = W_{k,k} h_i + b_{k,k}

where h_i here denotes the i-th vector of H_{i,h}, W_{q,k}, b_{q,k}, W_{k,k}, b_{k,k} are network parameters obtained through training, d_inner is the embedding dimension of the position representations, and each head h_k attends to a specific type k. Thus, the scoring (logits) matrix for the start and end positions of trigger words or arguments of type k is calculated as follows:

s_k(i, j) = (R_i q_{i,k})^T (R_j k_{j,k})

where R_i and R_j are rotary position embeddings that inject relative position information into the logits matrix.
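A minimal sketch of this scoring computation is given below. The tensor shapes, the packing of the per-type projections into a single linear layer, and the 1/√d_inner scaling are illustrative assumptions made for this sketch, not a definitive implementation of the application:

```python
import torch
import torch.nn as nn

class GlobalPointerHead(nn.Module):
    """Sketch of the per-type span scorer: start/end projections plus rotary position
    embedding, producing an N x N logits matrix for each of the K types."""
    def __init__(self, hidden_size: int, num_types: int, inner_dim: int = 64):
        super().__init__()
        self.inner_dim = inner_dim
        self.num_types = num_types
        # one start ("query") and one end ("key") projection per type, packed together
        self.dense = nn.Linear(hidden_size, num_types * inner_dim * 2)

    def rope(self, x: torch.Tensor) -> torch.Tensor:
        # rotary position embedding: injects relative position information into the scores
        b, n, h, d = x.shape
        pos = torch.arange(n, dtype=torch.float, device=x.device)[None, :, None]
        freq = 10000 ** (-torch.arange(0, d, 2, dtype=torch.float, device=x.device) / d)
        sin, cos = torch.sin(pos * freq), torch.cos(pos * freq)
        sin = sin.repeat_interleave(2, dim=-1)[:, :, None, :]
        cos = cos.repeat_interleave(2, dim=-1)[:, :, None, :]
        x2 = torch.stack([-x[..., 1::2], x[..., 0::2]], dim=-1).reshape_as(x)
        return x * cos + x2 * sin

    def forward(self, H: torch.Tensor) -> torch.Tensor:
        # H: (batch, N, hidden) auxiliary representation H_{i,h}
        b, n, _ = H.shape
        qk = self.dense(H).reshape(b, n, self.num_types, 2, self.inner_dim)
        q, k = qk[..., 0, :], qk[..., 1, :]          # (batch, N, K, d_inner)
        q, k = self.rope(q), self.rope(k)
        # s_k(i, j) = (R_i q_{i,k})^T (R_j k_{j,k}); output shape (batch, K, N, N)
        return torch.einsum("bmkd,bnkd->bkmn", q, k) / self.inner_dim ** 0.5
```

Each of the K heads produces an N x N matrix of scores s_k(i, j), one entry per candidate span.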
Compared with the named entity recognition task, the label matrix of event extraction is relatively sparse, so a Circle-Loss-style function is adopted to alleviate the label imbalance problem:

L_k = log(1 + Σ_{(i,j)∈P_k} e^{-s_k(i,j)}) + log(1 + Σ_{(i,j)∈N_k} e^{s_k(i,j)})

where (i, j) indexes spans with start position i and end position j, P_k is the set of span labels of trigger words or arguments of type k given by the dataset, and N_k is the set of spans that are not trigger words or arguments of type k.
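A short sketch of this objective, assuming span scores of shape (batch, K, N, N) and 0/1 span labels, is shown below; a full implementation would additionally mask invalid spans (j < i) and padding positions:

```python
import torch

def span_circle_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Circle-Loss-style span objective.
    logits: (batch, K, N, N) span scores s_k(i, j)
    labels: (batch, K, N, N) with 1 for gold spans (the set P_k) and 0 otherwise (N_k)"""
    logits = logits.reshape(logits.shape[0] * logits.shape[1], -1)
    labels = labels.reshape(labels.shape[0] * labels.shape[1], -1).float()
    # positive spans contribute e^{-s}, negative spans contribute e^{s}
    pos = torch.where(labels.bool(), -logits, torch.full_like(logits, float("-inf")))
    neg = torch.where(labels.bool(), torch.full_like(logits, float("-inf")), logits)
    # log(1 + sum_P e^{-s}) + log(1 + sum_N e^{s}), computed stably via logsumexp
    zeros = torch.zeros_like(pos[..., :1])
    pos_loss = torch.logsumexp(torch.cat([zeros, pos], dim=-1), dim=-1)
    neg_loss = torch.logsumexp(torch.cat([zeros, neg], dim=-1), dim=-1)
    return (pos_loss + neg_loss).mean()
```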
Through the global pointer network event extraction model decoding process, the spans of all event trigger words or arguments to be extracted and the corresponding types of the event trigger words or arguments can be extracted, and a foundation is laid for subsequent correction of potential semantic noise.
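Decoding then reduces to a single scan of the logits matrix. The sketch below assumes a threshold of zero, consistent with the Circle-Loss-style objective above:

```python
import torch

def decode_spans(logits: torch.Tensor, threshold: float = 0.0):
    """Decode a (K, N, N) logits matrix: every entry above the threshold marks a span.
    Returns a list of (type_k, start, end) triples; one pass over the matrix suffices."""
    spans = []
    for k, i, j in (logits > threshold).nonzero().tolist():
        if i <= j:  # keep only valid spans (start <= end)
            spans.append((k, i, j))
    return spans
```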
3) Event Schema mapping module.
The application provides a novel two-stage label correction method, as shown in FIG. 2. In the first stage, explicit noise is corrected using event Schema mapping rules. The event Schema indicates that the argument roles of an event depend on the specific event type; in other words, an argument with a specific role r_{k,l} only participates in events of type t_k, which means there is a clear mapping from argument roles to event types. Thus, logic rules can be constructed from this relationship to correct explicit noise. Specifically, whether the argument role labels in the current training set are noise-free is first judged by the performance of the model on the verification set; if the argument role labels are reliable, any specific argument role r_{k,l} is sufficient to indicate that the current event type should be t_k. If the current event type label is not t_k, the following logic rule is applied to correct the explicit noise: whenever the currently given argument label contains a specific argument role r_{k,l} but the event type label t_cur is not t_k, t_cur is corrected to t_k, where R_k is the set of argument roles corresponding to event type t_k and R is the complete set of argument roles across the entire dataset.
If noise may also be present in the argument role labels, a single specific argument role r_{k,l} alone is not sufficient to determine the event type of the current sample. Therefore, the present application strengthens the constraint of the logic rule: besides requiring at least one specific argument role r_{k,l}, the argument roles in the currently given argument label must all belong to the argument role set R_k corresponding to event type t_k.
Through correction with the event Schema mapping rules, explicit noise is effectively resolved. Because this process only depends on logic rules and requires no additional deep neural network, it is highly efficient and allows the subsequent adaptive iteration module to concentrate on resolving implicit noise.
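To make the rule concrete, the following is a minimal sketch of the first-stage correction. The schema and sample dictionaries, the field names, and the example event types are all hypothetical and only illustrate the logic described above:

```python
# Sketch of the first-stage event Schema mapping rule for explicit noise.
def correct_explicit_noise(sample: dict, schema: dict, args_reliable: bool) -> dict:
    roles = set(sample["argument_roles"])
    for event_type, role_set in schema.items():
        # roles that belong only to this event type ("specific" roles r_{k,l})
        other = set().union(*(v for t, v in schema.items() if t != event_type))
        specific_roles = set(role_set) - other
        has_specific_role = bool(roles & specific_roles)
        if args_reliable:
            # reliable argument labels: any specific role r_{k,l} determines t_k
            rule_fires = has_specific_role
        else:
            # noisy argument labels: additionally require all given roles to lie in R_k
            rule_fires = has_specific_role and roles <= set(role_set)
        if rule_fires and sample["event_type"] != event_type:
            sample["event_type"] = event_type  # correct the explicit noise
            break
    return sample

schema = {"Acquisition": {"acquirer", "acquiree", "date"},
          "Resignation": {"employee", "company", "date"}}
noisy = {"event_type": "Resignation", "argument_roles": ["acquirer", "acquiree", "date"]}
print(correct_explicit_noise(noisy, schema, args_reliable=False))
# -> event_type corrected to "Acquisition"
```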
4) And an adaptive iteration module.
After the original event extraction data has been processed by the first-stage event Schema mapping module, preliminarily corrected training data is obtained. As shown in FIG. 2, the present application uses an adaptive iteration module to correct implicit noise in the second stage. Specifically, for each given event sample (T_i, Y_i), the scoring (logits) matrices of candidate trigger words and candidate arguments are first obtained with the global-pointer-network-based event extraction model, with shapes K×N×N and L×N×N respectively, where K is the number of event types, L is the number of argument roles, and N is the length of the event text. The adaptive iteration module feeds the two logits matrices into a sigmoid function to obtain the confidence scores of candidate trigger words or arguments. For each event type t_k or argument role r_{k,l}, the mean confidence of all results predicted as that event type or argument role is taken as the confidence threshold S for that type, where M is the number of samples in the training set. The adaptive iteration module then corrects the labels according to whether the confidence of the original labels is above the threshold of the corresponding type. Specifically, for the trigger word label set and the argument label set in a given event label Y_i, the confidence score of each trigger word label and each argument label is first calculated. When the confidence of a trigger word label or argument label falls below the threshold of its corresponding type, the adaptive iteration module treats it as implicit noise and corrects it using prediction results whose confidence is above the corresponding threshold. If no trigger word or argument with confidence greater than the corresponding threshold exists in the prediction results for the current event sample, the original labels in the current sample that fall below the threshold are temporarily removed.
Through the adaptive correction process described above, the module eliminates implicit noise in one iteration. The corrected data is then used for training of the event extraction model in the next iteration and the trained model is used to again correct for noise tags that may be present. The correction process is iterated repeatedly until the performance of the model on the verification set converges, and finally a corrected data set is obtained.
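One pass of this second-stage procedure might be sketched as follows. The model interface (returning a (K, N, N) logits tensor per sample), the sample structure, and the use of sigmoid confidence above 0.5 to define "predicted" spans are assumptions made only for illustration:

```python
import torch

def adaptive_correction_step(model, dataset):
    """One adaptive-iteration pass (a sketch): compute per-type confidence thresholds
    from the model's own predictions, then keep, relabel, or drop original span labels.
    Assumes `model(text)` returns span logits of shape (K, N, N) and each sample is a
    dict with "text" and "spans" = [(type_k, start, end), ...]."""
    model.eval()
    with torch.no_grad():
        sample_probs = [torch.sigmoid(model(s["text"])) for s in dataset]  # (K, N, N) each

    all_conf, sample_preds = {}, []
    for probs in sample_probs:
        # spans "predicted" as type k: sigmoid confidence above 0.5 (logit above 0)
        pred = [(k, i, j, probs[k, i, j].item())
                for k, i, j in (probs > 0.5).nonzero().tolist()]
        sample_preds.append(pred)
        for k, _, _, p in pred:
            all_conf.setdefault(k, []).append(p)

    # adaptive threshold S_k: mean confidence of all results predicted as type k
    threshold = {k: sum(v) / len(v) for k, v in all_conf.items()}

    for sample, probs, pred in zip(dataset, sample_probs, sample_preds):
        # keep original labels whose confidence reaches the threshold of their type
        kept = [(k, i, j) for k, i, j in sample["spans"]
                if probs[k, i, j].item() >= threshold.get(k, 0.5)]
        # add high-confidence predictions in place of the low-confidence (implicit-noise) labels
        for k, i, j, p in pred:
            if p >= threshold.get(k, 0.5) and (k, i, j) not in kept:
                kept.append((k, i, j))
        sample["spans"] = kept  # may be empty: labels below threshold are temporarily removed
    return dataset, threshold
```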
5) The event extraction model training and testing method is suitable for noise scene label correction.
The overall structure of event extraction with noise scene label correction realized by the above steps is shown in FIG. 2; after model construction is completed, rapid event extraction model training and prediction can be performed. During model training, the two-stage noise label correction method is adopted: in the first stage, explicit noise in the raw data is corrected using step 3); in the second stage, implicit noise is corrected by adaptive iteration using step 4). While the noise labels are being corrected, the method retrains the event extraction model obtained in step 2) with the corrected data, optimizes the model loss in a supervised learning manner, and updates the network parameters of the model, finally obtaining an event extraction model with converged performance that is suitable for noise scenarios. During model testing, the original event text to be extracted is first encoded with the pre-trained language model according to step 1), forward computation is then performed with the event extraction model suitable for noise scenarios, and finally the logits matrix obtained from the model is decoded to obtain the extracted event trigger words and event arguments, completing the testing process of the event extraction model in noise scenarios.
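Putting the two stages together, the overall training loop can be sketched as below. The helpers argument_labels_look_clean, train_one_round, and evaluate are assumed placeholders (the verification-set noise check, supervised training with the Circle Loss, and micro-F1 evaluation respectively), while correct_explicit_noise and adaptive_correction_step refer to the sketches above:

```python
def two_stage_label_correction_training(raw_data, schema, dev_data, max_iters=3):
    """End-to-end sketch: stage 1 corrects explicit noise with the Schema rule, then up
    to `max_iters` adaptive iterations of train -> predict -> relabel (stage 2)."""
    args_reliable = argument_labels_look_clean(raw_data, dev_data)   # assumed helper
    data = [correct_explicit_noise(s, schema, args_reliable) for s in raw_data]  # stage 1

    best_f1, model = 0.0, None
    for _ in range(max_iters):                       # stage 2: at most three iterations
        model = train_one_round(data)                # assumed helper: supervised training
        f1 = evaluate(model, dev_data)               # assumed helper: micro F1 on dev set
        if f1 <= best_f1:                            # performance has converged
            break
        best_f1 = f1
        data, _ = adaptive_correction_step(model, data)  # correct implicit noise
    return model, data
```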
Given a predefined event Schema and an event sample (T_i, Y_i), where T_i is the event text containing n words and Y_i is an event label that may be noisy, the noise-scenario event extraction task aims to extract the event trigger words and corresponding arguments contained in the event text T_i. During the training process, the two-stage label correction method described in step 3) and step 4) is used, and the event extraction model described in step 2) is trained with the corrected data. Considering that the event label matrix is relatively sparse, a Circle Loss function is adopted to alleviate the label imbalance problem, and the event extraction model obtained after training can be used for downstream natural language tasks such as intelligent question answering. Model performance is generally evaluated on the test set; event extraction is divided into the two subtasks of trigger word classification and argument classification, and both subtasks use the micro F1 value as the evaluation index. The application provides a noise event extraction method based on two-stage label correction, which extends the event extraction task to the noise label setting. First, the noise labels in event extraction are classified into two kinds: explicit noise and implicit noise. Then, for these two kinds of noise, a two-stage label correction method is designed: in the first stage, explicit noise is resolved using event Schema mapping rules; in the second stage, implicit noise is corrected using adaptive iteration. Compared with the prior art, this two-stage label correction method is simple and effective and guarantees the performance of the model in noise scenarios. Meanwhile, the event extraction model adopts an advanced architecture of pre-trained language model encoding and global pointer network decoding; for each candidate trigger word or argument, the start position and the end position are judged as a whole, which avoids the traditional dual-pointer network's need to traverse the head and tail pointer matrices and set heuristic thresholds for decoding, and solves the inconsistency between training and testing caused by the decoding strategy of the traditional dual-pointer network. Finally, verification on event extraction task datasets shows that the two-stage label correction method suitable for noise scene event extraction has good universality and generalization, while also ensuring high efficiency in practical applications. Therefore, the application has good application prospects and a wide range of promotion.
[ example 1 ]
In this implementation example, the two-stage label correction method suitable for noise scene event extraction is used to train and test the event extraction task on real Chinese and English datasets; all other embodiments use the same data as this embodiment. The Chinese dataset uses the DuEE event extraction training set published by Baidu, containing 16956 event samples, 65 event types, and 121 argument roles, for a total of 41520 event arguments. The English dataset uses the event data in the ACE 2005 corpus, comprising 5349 event samples, 33 event types, and 35 argument roles, for a total of 9612 event arguments. Because the model has good robustness and generalization, the same hyper-parameter settings can be used in both the Chinese and English scenarios.
The specific implementation is as follows: on the Chinese dataset DuEE, the pre-trained language encoder uses NEZHA-base with 12 hidden layers, an embedding dimension of 768, and 12 attention heads. In the training phase, the mini-batch is set to 32 samples and the AdamW optimizer is used; the learning rate of the NEZHA encoder is 0.00002 and that of the global pointer network event extraction decoder is 0.001; warm-up with cosine decay is applied over the first 10% of training update steps; a dropout of 0.1 is applied in each network layer to prevent overfitting and enhance generalization; all other network layer parameters are initialized randomly. Random seeds 2019-2023 are selected respectively and training is performed 5 times; the final result is the average of the 5 test results. On the English dataset ACE 2005, all settings are consistent with the Chinese dataset except that the pre-trained language encoder is changed to BERT-base.
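For reference, the training configuration described above might look roughly as follows (a hedged sketch: the optimizer grouping and the warm-up-plus-cosine-decay schedule are one plausible reading of the description, and all names are illustrative):

```python
import torch
from torch.optim import AdamW
from transformers import get_cosine_schedule_with_warmup

def build_optimizer(encoder, decoder, total_steps):
    # Two learning rates as described: 2e-5 for the NEZHA/BERT encoder,
    # 1e-3 for the global pointer decoder; AdamW with 10% warm-up and cosine decay.
    optimizer = AdamW([
        {"params": encoder.parameters(), "lr": 2e-5},
        {"params": decoder.parameters(), "lr": 1e-3},
    ])
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * total_steps),
        num_training_steps=total_steps,
    )
    return optimizer, scheduler

BATCH_SIZE = 32                            # mini-batch of 32 samples
DROPOUT = 0.1                              # 0.1 dropout in each network layer
SEEDS = [2019, 2020, 2021, 2022, 2023]     # results averaged over 5 runs
```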
Applying the trained event extraction model suitable for noise scenarios to the test dataset and comparing the event trigger words and event arguments extracted by the model with the ground truth, it is found that on the Chinese dataset DuEE, when the noise rate is 20%, 40%, 60% and 80%, the micro F1 scores of event trigger word classification are 77.3%, 75.2%, 70.4% and 62.5% respectively, and those of event argument classification are 63.6%, 62.4%, 58.7% and 51.9%. On the English dataset ACE 2005, the micro F1 scores of event trigger word classification are 74.0%, 73.1%, 68.4% and 58.2% respectively, and those of event argument classification are 58.8%, 58.0%, 55.7% and 50.1%.
This shows that the application performs excellently on both Chinese and English data and under different noise settings; even when the noise rate is as high as 80%, the extraction performance remains above 50%, indicating that the application can reach advanced performance in practical applications.
[ example 2 ]
The event Schema mapping module of the two-stage label correction method suitable for noise scene event extraction can significantly improve the model's trigger word extraction performance. When the argument labels of the event samples are noise-free, using the event Schema mapping module improves trigger word extraction performance by 4.3%-9.6% under noise rate settings from 20% to 80%. When noise labels are also present in the argument labels of the event samples, the improvement in trigger word extraction performance is limited to 0.7%-3.9%; however, because the constraint conditions of the logic rules are strengthened, the event Schema mapping module does not negatively affect model performance. In practical applications, this module relies only on logic rules and requires no training of a deep neural network, so the original data can be processed quickly, ensuring the efficiency of the method in real scenarios.
[ example 3 ]
The adaptive iteration module of the two-stage label correction method suitable for noise scene event extraction plays a vital role in handling potential semantic noise. Under different noise rate settings of 20%-80%, using this module improves trigger word extraction performance by 6.1%-19.8% and event argument extraction performance by 4.8%-23.1%. The adaptive iteration module can flexibly adapt to different noise rate settings, meeting the need to correct unknown noise labels when the distribution and proportion of noise label types are uncertain in actual scenarios.
[ example 4 ]
The adaptive iteration module of the two-stage label correction method for noise scene event extraction typically employs 3 iterations to correct for potential semantic noise. The results on the test dataset show that in most cases 3 iterations are sufficient to allow the performance of the model to converge. Although more iterations, e.g., 5-10, can further improve the performance of the model by approximately 1% -3%, the time for training increases linearly. Therefore, the adaptive iteration module can achieve obvious performance improvement by only increasing a small amount of training cost, and can select different iteration times according to actual requirements, so that the adaptive iteration module has good flexibility.
The above description is only of the preferred embodiment of the present application, and is not intended to limit the present application in any other way, but is intended to cover any modifications or equivalent variations according to the technical spirit of the present application, which fall within the scope of the present application as defined by the appended claims.

Claims (4)

1. A noise event extraction method based on two-stage tag correction, comprising the steps of:
1) Event text encoding based on a pre-trained language model:
For event text T_i containing n words in a given event sample, a pre-trained language model encoder encodes the event text T_i to obtain an event characterization vector sequence H_i with context-rich semantics, which is used for downstream extraction of event trigger words and event arguments;
The event text encoding process based on a pre-trained language model in step 1) is specifically as follows:
for a given event sample (T_i, Y_i), where T_i is the event text containing n words and Y_i is an event label that may be noisy, a pre-trained language model encoder encodes the event text T_i to obtain an event characterization vector sequence H_i with context-rich semantics, which is used for downstream extraction of event trigger words and event arguments;
2) Event extraction model based on global pointer network:
the event characterization vector sequence H containing context semantics obtained in the step 1) is processed i The method is applied to a global pointer network decoder so as to capture span information and type information of event trigger words and event arguments;
the process of extracting event trigger words or arguments by the event extraction model based on the global pointer network in the step 2) is specifically as follows,
The text representation vector H_i is passed through an additional self-attention layer to obtain an auxiliary representation H_{i,h}; then two linear layers are used to identify the start positions and end positions and their corresponding types respectively, and a multi-head attention mechanism multiplies the output vector sequences of the two linear layers to obtain a scoring (logits) matrix; for each predefined event type or argument role k, the head-tail position scores of candidate trigger words or arguments are calculated from a global perspective;
during decoding, the global pointer network calculates a score s_k(i,j) for each pair of character positions in the event characterization vector sequence H_i; if s_k(i,j) > 0, the i-th character is the start position of a trigger word or argument, the j-th character is its end position, and the type of the trigger word or argument is k;
3) Label correction method based on the event schema mapping module:
for correcting explicit noise in the raw data in a first stage;
the label correction method mapped by the event pattern mapping module in the step 3) specifically comprises the following steps,
firstly, judging whether a noise label exists in an argument label of an event sample in a training set by using a verification set; if the argument tag is noiseless, only one specific argument role existsr k,l The current event type is determined to bet k The method comprises the steps of carrying out a first treatment on the surface of the When the argument tag of the event sample may be a noise tag, the mapping rule needs to be further enhanced except that at least one specific argument role is neededr k,l In addition, the argument roles in the argument tag currently given must all belong to the event typet k Corresponding argument character setR k;
4) And an adaptive iteration module:
the adaptive iteration module is used for carrying out an adaptive iteration label correction method and correcting the implicit noise in the second stage;
the adaptive iteration module in the step 4) performs an adaptive iteration tag correction method, specifically as follows,
firstly, training an event extraction model obtained in the step 2) by using the primary correction data obtained in the step 3), predicting training data by using the model, predicting the training data by using the model to obtain a logits matrix which marks the span and the type of a candidate trigger word or argument, then, mapping the logits matrix into corresponding confidence scores by using a sigmoid function by a self-adaptive iteration module, taking the average value of all confidence scores predicted to be of the type as a confidence threshold value for each event type and argument character, correcting an original label by using a prediction result with confidence higher than the threshold value if the confidence score of the original label is lower than the corresponding threshold value for a given event sample, using the trained data for model training of the next iteration, continuing correcting the original label by using the trained model, and obtaining a final model for test set prediction by repeating self-adaption until the model performance converges;
5) Event extraction model training and testing applicable to noise scene label correction:
during the training process, the two-stage label correction method described in the step 3) and the step 4) is used, and the event extraction model described in the step 2) is trained by using corrected data;
The training process in step 5) for event extraction model training and testing with noise scene label correction is specifically as follows:
in a first stage, correcting explicit noise in the raw data using step 3); in the second stage, the step 4) is used for adaptively and iteratively correcting the implicit noise, the event extraction model obtained in the step 2) is retrained by using corrected data while the noise label is corrected, the loss of the model is optimized in a supervised learning mode, the network parameters of the model are updated, and finally the event extraction model with converged performance and suitable for a noise scene is obtained;
The testing process in step 5) for event extraction model training and testing with noise scene label correction is specifically as follows:
during model test, firstly, an original event text to be extracted is encoded by utilizing a pre-training language model according to the step 1), then, an event extraction model suitable for a noise scene is used for carrying out forward calculation, and finally, a logits matrix obtained by model calculation is decoded to obtain an extracted event trigger word and an event argument, so that the test process of the event extraction model under the noise scene is completed.
2. A noise event extraction method based on two-stage tag correction as defined in claim 1, wherein: the training process uses the Circle Loss function to alleviate the tag imbalance problem.
3. A noise event extraction method based on two-stage tag correction as defined in claim 1, wherein: in the test process, the performance of the model on the test set is adopted to evaluate the performance of the model, and event extraction is divided into two subtasks of trigger word classification and argument classification.
4. A noise event extraction method based on two-stage tag correction as defined in claim 1, wherein: the evaluation indexes of the two subtasks of trigger word classification and argument classification both adopt micro F1 values.