CN113326371B - Event extraction method integrating pre-training language model and anti-noise interference remote supervision information - Google Patents
Event extraction method integrating pre-training language model and anti-noise interference remote supervision information
- Publication number: CN113326371B (application CN202110480675.0A)
- Authority: CN (China)
- Prior art keywords: training, model, remote supervision, text, data
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/35 — Information retrieval of unstructured textual data; clustering; classification
- G06F16/383 — Retrieval characterised by metadata automatically derived from the content
- G06F18/214 — Pattern recognition; generating training patterns, e.g. bagging or boosting
- G06F18/24 — Classification techniques
- G06F18/25 — Fusion techniques
- G06N3/04 — Neural network architecture, e.g. interconnection topology
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
- Y02D10/00 — Energy efficient computing
Abstract
The invention provides an event extraction method integrating a pre-training language model and anti-noise-interference remote supervision information, and belongs to the field of computer technology. The method uses integrated knowledge to assist the model's judgment: a pre-training language model, built by pre-training on massive text and containing a large amount of semantic and grammatical knowledge, serves as the network structure unit of the event extraction model; remote-supervision features are mixed in through an anti-noise model algorithm; and gradient-direction adversarial training is added under a circular constraint condition.
Description
Technical Field
The invention relates to an event extraction method integrating a pre-training language model and anti-noise-interference remote supervision information, and belongs to the technical field of computer data processing.
Background
With the continuous deepening of informatization in the Internet age, internet information has grown explosively, and how to use it to support industry decisions has become a focus for enterprises and even governments. Information from the internet usually appears as text, typically from news reports, forum replies, and similar channels. It is generally unstructured and highly redundant, so key information must be read, understood, and located within the text while irrelevant content is filtered out. Event extraction presents the content of unstructured text in a structured form: taking the event as the unit, it extracts the key intent expressed in the text and converts unstructured text into structured event information. It is therefore a key link of information extraction engineering for downstream work such as trend analysis, construction of prior knowledge graphs, and public-opinion early warning.
Traditional event extraction relies heavily on manual work: faced with massive internet information, analysts read and search for relevant information in huge volumes of articles and reports, then organize and record it, consuming substantial human resources. To reduce this manual cost of information structuring, recognizing and extracting event patterns with machine learning has been proposed in recent years. Machine learning identifies event patterns in text and extracts matching text fragments in a structured way, enabling batched machine processing of text and greatly improving the efficiency of extracting structured information compared with manual reading. However, formulating traditional machine-learning event-pattern templates still depends on domain-expert knowledge, so automatically learning event-pattern features from labelled data via deep learning has become a new direction for structured event extraction in recent years. Given the enormous scale and diverse content of internet information, improving the transfer and generalization ability of deep-learning models across different events has become a hard problem in internet event-information extraction. A common practice is to introduce external knowledge to assist the model's predictions through remote (distant) supervision. The remote-supervision algorithm assumes that, for a structured event in an existing knowledge graph, any sentence in an external knowledge base containing its entities reflects this relation to some extent.
Based on this assumption, the remote-supervision algorithm can attach relation labels to sentences in an external document library using a small labelled knowledge graph, which amounts to automatic sample labelling; remote supervision is therefore a semi-supervised algorithm. However, remote supervision brings not only external knowledge but also erroneous guidance, and this noise interferes with the accuracy of the model's judgments. Deficiencies in the text-representation capabilities of RNNs and CNNs also hurt event prediction and extraction. How to use more expressive neural network models and external knowledge to assist a deep model in structured event extraction, while reducing erroneous noise interference, is therefore the problem to be considered.
Disclosure of Invention
The invention aims to provide an event extraction method integrating a pre-training language model and anti-noise-interference remote supervision information, so as to solve the problems in the prior art, enrich text information, and increase the model's resistance to noise errors through adversarial training.
The technical scheme is as follows: an event extraction method integrating a pre-training language model and anti-noise-interference remote supervision information, comprising the following steps:
Step 1: acquire the training corpus — internet text data collected by a web crawler is stored as txt files;
Step 2: preprocess the data to be labelled;
Step 3: label the text according to the event definitions, add the labelled data to the remote-supervision knowledge base as a supplement to complete the data labelling, and split the data into training, validation, and test sets in a 7:1:2 ratio;
Step 4: encode the text with a pre-trained model structure trained on massive text: perform language modelling through a self-attention mechanism, using multi-head attention to capture feature information from different perspectives of the text; perform feature transformation and extraction through a two-layer feed-forward neural network (FFN) with ReLU as the activation function; apply feature normalization through layer normalization; combine the layers of the pre-trained model organically via residual connections; and iterate repeatedly to obtain rich feature representations;
Step 5: mark each position where a remote-supervision trigger word appears with the trigger word's type number from the remote-supervision library to obtain a discrete feature sequence, map this sequence into a low-dimensional space through a remote-supervision embedding layer, and concatenate the features extracted by the pre-trained model with the remote-supervision features;
Step 6: introduce a gradient-direction perturbation while the remote-supervision embedding layer learns and updates its parameters: first compute the model's back-propagated gradient at the remote-supervision feature embedding layer; then, keeping the original parameters, introduce an adversarial perturbation of the given step size and run forward and backward propagation again to obtain a new gradient; finally, restore the original parameters of the remote-supervision feature embedding layer and update the whole model's parameters with the perturbed new gradient;
Step 7: train the event extraction model on the training data, evaluate training quality on the validation and test datasets, and select the best-performing model over multiple rounds of iteration;
Step 8: use the trained model to predict and extract events from new, unlabelled internet text: after data preprocessing and cleaning, match the new text against the remote-supervision knowledge base, add trigger words that appear both in the remote knowledge base and in the new text to the text's remote features, and output the model's predicted event trigger types and the related event participant elements.
The beneficial effects are as follows: the invention provides an event extraction method integrating a pre-training language model and anti-noise-interference remote supervision information, which performs automatic structured event extraction by combining the two. Introducing external knowledge improves the model's effect, while adding perturbation-based adversarial training effectively mitigates the erroneous noise brought by remote supervision and improves event extraction. The method not only represents richer text information but also, when remote-supervision features are introduced, increases the model's resistance to noise errors through adversarial training.
Drawings
FIG. 1 is a block diagram of an event extraction method model incorporating a pre-training language model and anti-noise interference remote supervision information.
FIG. 2 is a schematic diagram of the feature extraction of the attention mechanism of an event extraction method integrating a pre-training language model with anti-noise interference remote supervision information.
FIG. 3 is a diagram of a remote supervision feature layer of an event extraction method incorporating a pre-training language model and anti-noise interference remote supervision information.
FIG. 4 is an experimental effect diagram of an event extraction method integrating a pre-training language model and anti-noise interference remote supervision information.
FIG. 5 is a schematic diagram of the circular constraint used in the event extraction method integrating a pre-training language model and anti-noise interference remote supervision information.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, the invention provides an event extraction method for fusing a pre-training language model and anti-noise interference remote supervision information, which comprises the following steps:
and 1, collecting training data corpus, namely storing text-form data in a txt file form through a crawler according to internet text data obtained through the crawler.
Step 2: preprocess the data to be labelled — remove HTML tags and special symbols, and divide the text into short texts at sentence or paragraph granularity.
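As an illustration, the cleaning and sentence-splitting of step 2 can be sketched as follows; the exact tag-stripping regex and sentence delimiters are assumptions, not specified by the patent:

```python
import re

def preprocess(raw_html: str) -> list[str]:
    """Strip HTML tags and special symbols, then split into sentence-level
    short texts. A minimal sketch of step 2; the tag and delimiter patterns
    are illustrative assumptions."""
    text = re.sub(r"<[^>]+>", "", raw_html)          # drop HTML tags
    text = re.sub(r"[\u3000\xa0\t]+", " ", text)     # normalize special whitespace symbols
    # split on Chinese/Western sentence-ending punctuation, keeping non-empty pieces
    return [s.strip() for s in re.split(r"[。！？!?\n]+", text) if s.strip()]

print(preprocess("<p>今天发生地震。伤亡不明！</p>"))  # ['今天发生地震', '伤亡不明']
```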
Step 3: according to the event definitions, label each existing event's trigger word, subject, object, time, place, and event type, and add the labelled data to the remote-supervision knowledge base as a supplement to complete the data labelling; match the labelled data against the remote-supervision knowledge base and add successfully matched trigger words to each sample's remote-supervision information; split the data into training, validation, and test sets in a 7:1:2 ratio.
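The 7:1:2 split of step 3 might look like this minimal sketch; the shuffle and seed are illustrative assumptions:

```python
import random

def split_7_1_2(samples, seed=42):
    """Shuffle and split labelled samples into train/validation/test in the
    7:1:2 ratio described in step 3. The shuffle and seed are assumptions."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train, n_val = int(n * 0.7), int(n * 0.1)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

train, val, test = split_7_1_2(range(100))
print(len(train), len(val), len(test))  # 70 10 20
```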
Step 4: build models separately for the two stages of event extraction: event detection and event-participant element extraction.
Specifically, the pre-trained model is combined with the remotely supervised event extraction model, as shown in FIG. 1. The specific process is as follows: the text is first encoded using a pre-trained model structure trained on massive text, and language modelling is performed through self-attention, as shown in FIG. 2:
Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k)·V
where Q, K, and V are the attention computation matrices, obtained by matrix operations from the input data and the corresponding parameters, and d_k is the dimension of the text vector representation after compression by the embedding matrix.
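The scaled dot-product attention above can be sketched in numpy; shapes and the toy input are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in step 4.
    Q, K, V have shape (seq_len, d_k); a minimal numpy sketch."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 positions, d_k = 8
out = scaled_dot_product_attention(x, x, x)          # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```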
Next, multi-head attention captures feature information from multiple perspectives in the text:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h)·W_O
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)
where W_O is a linear transformation matrix whose parameters are updated during model training; W_i^Q, W_i^K, and W_i^V are the transformation mapping matrices of the Query, Key, and Value vectors, respectively.
Feature transformation and extraction are then performed through a two-layer feed-forward neural network (FFN), with ReLU as the activation function:
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2
Feature normalization is then performed using layer normalization:
LayerNorm(x) = α ⊙ (x − μ) / √(σ + ε)
where W_1 and b_1 are the parameters of the first fully connected network in the feed-forward layer, and W_2 and b_2 those of the second; μ is the mean and σ the variance of the vector at each word position of the sample; α is a learnable weight parameter; ε is a small value that prevents division-by-zero errors; and ⊙ denotes the element-wise (position-by-position) product.
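The FFN and layer-normalization formulas above, as a self-contained numpy sketch (toy shapes and zero biases are assumptions; the learnable α is taken as a scalar 1.0 for illustration):

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2 — the two-layer feed-forward
    network of step 4, with ReLU as the activation function."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

def layer_norm(x, alpha, eps=1e-5):
    """Layer normalization: per-position mean mu and variance sigma,
    scaled by the learnable weight alpha via an element-wise product."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.var(axis=-1, keepdims=True)
    return alpha * (x - mu) / np.sqrt(sigma + eps)

rng = np.random.default_rng(3)
x = rng.normal(size=(4, 16))
h = ffn(x, rng.normal(size=(16, 64)), np.zeros(64), rng.normal(size=(64, 16)), np.zeros(16))
y = layer_norm(h, alpha=1.0)
print(np.allclose(y.mean(axis=-1), 0.0, atol=1e-6))  # True: each position is re-centred
```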
At the same time, the layers of the pre-trained model are combined by means of residual connections:
y = F(x) + x
where x is the original input of the layer network, F(x) is the layer's transformation (added back across the layer), and y is the output feature of the network layer. Stacking the above blocks repeatedly yields the features extracted by the pre-trained model.
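A minimal sketch of the residual connection y = F(x) + x; the toy sublayer is an assumption:

```python
import numpy as np

def residual(sublayer, x):
    """y = F(x) + x: the residual connection that combines each layer of the
    pre-trained model with its input, so stacked blocks refine rather than
    replace the representation."""
    return sublayer(x) + x

x = np.ones((2, 4))
y = residual(lambda v: 2 * v, x)   # toy sublayer F(x) = 2x, so y = 3x
print(y[0, 0])  # 3.0
```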
Step 5: concatenate the features extracted by the pre-trained model with the remote-supervision features. Each position where a remote-supervision trigger word appears is marked with the trigger word's type number from the remote-supervision library, yielding a discrete feature sequence; this sequence is mapped into a low-dimensional space through a remote-supervision embedding layer and concatenated with the pre-trained features before being fed into a classifier to predict events, making a classification prediction at every word position to decide whether it is a trigger word.
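The embedding-and-concatenation of step 5 can be sketched as follows; the 0-id "no match" convention, table sizes, and variable names are illustrative assumptions:

```python
import numpy as np

def remote_supervision_features(type_ids, emb_table):
    """Map the discrete trigger-type id sequence (0 assumed to mean "no match
    in the remote-supervision library") into a low-dimensional embedding by
    table lookup. Returns shape (seq_len, d_rs)."""
    return emb_table[np.asarray(type_ids)]

rng = np.random.default_rng(2)
n_types, d_rs, d_model, n = 10, 8, 16, 5
emb_table = rng.normal(size=(n_types, d_rs))         # remote-supervision embedding layer
encoder_out = rng.normal(size=(n, d_model))          # features from the pre-trained model
rs = remote_supervision_features([0, 3, 0, 0, 7], emb_table)
fused = np.concatenate([encoder_out, rs], axis=-1)   # spliced input to the classifier
print(fused.shape)  # (5, 24)
```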
In a further embodiment, to mitigate the erroneous trigger-word information introduced by remote supervision, this patent adopts an adversarial-learning strategy: during learning and parameter updating of the remote-supervision embedding layer, a gradient-direction perturbation is added by the adversarial-learning method, improving the model's resistance to noise interference. The specific steps are as follows.
the computing model counter propagates gradients at the remote supervisory feature embedding layer:
the optimal point in disturbance offset constraint is prevented by means of spherical mapping, a disturbance radius epsilon is set, the size of a constraint range of disturbance is represented, and a disturbance step size is obtained:
wherein,for the constraint space of disturbance, +.>Is the step length of the small steps.
The original parameters of the remote-supervision feature embedding layer are retained; the adversarial perturbation is added using the obtained perturbation step, and forward and backward propagation are run again to obtain a new gradient.
The original parameters of the remote-supervision feature embedding layer are then restored, and the model's overall parameters are updated using the new gradient obtained under perturbation.
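The perturbation step under the spherical constraint can be sketched as below. This is a sketch of the FGM/PGD-style update the description outlines (step along the gradient direction, then project back into the ε-ball); the function name and the exact normalization are assumptions:

```python
import numpy as np

def perturbation_step(grad, epsilon, alpha):
    """One gradient-direction perturbation under the circular (epsilon-ball)
    constraint of step 6: move a small step alpha along the gradient
    direction, then project back inside the ball of radius epsilon.
    epsilon = perturbation radius, alpha = small-step length."""
    delta = alpha * grad / (np.linalg.norm(grad) + 1e-12)  # step along gradient direction
    norm = np.linalg.norm(delta)
    if norm > epsilon:                                     # spherical projection
        delta = epsilon * delta / norm
    return delta

g = np.array([3.0, 4.0])
d = perturbation_step(g, epsilon=0.5, alpha=1.0)
print(np.linalg.norm(d))  # stays within the radius 0.5
```

In the full procedure, this perturbation is added to the embedding-layer parameters, forward/backward propagation is re-run to get the new gradient, the original parameters are restored, and the whole model is updated with that new gradient.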
The event-element extraction network has a structure similar to event prediction, except that the remote-supervision feature embedding layer is removed and each element type is predicted at the output layer. Experiments show that the method outperforms other machine-learning event extraction methods in precision, recall, and F1 score, as shown in FIG. 4.
Step 7: train the event extraction model on the training data, evaluate training quality on the validation and test datasets, and select the best-performing model over multiple rounds of iteration.
Step 8: use the trained model to predict and extract events from new, unlabelled internet text: after data preprocessing and cleaning, match the new text against the remote-supervision knowledge base, add trigger words that appear both in the remote knowledge base and in the new text to the text's remote features, and output the model's predicted event trigger types and the related event participant elements.
As described above, although the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limiting the invention itself. Various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (1)
1. An event extraction method integrating a pre-training language model and anti-noise-interference remote supervision information, characterized by comprising the following steps:
Step 1: acquire the training corpus — internet text data collected by a web crawler is stored as txt files;
Step 2: preprocess the data to be labelled;
Step 3: label the text according to the event definitions, add the labelled data to the remote-supervision knowledge base as a supplement to complete the data labelling, and split the data into training, validation, and test sets in a 7:1:2 ratio;
Step 4: encode the text with a pre-trained model structure trained on massive text: perform language modelling through a self-attention mechanism, using multi-head attention to capture feature information from different perspectives of the text; perform feature transformation and extraction through a two-layer feed-forward neural network (FFN) with ReLU as the activation function; apply feature normalization through layer normalization; combine the layers of the pre-trained model organically via residual connections; and iterate repeatedly to obtain rich feature representations;
Step 5: mark each position where a remote-supervision trigger word appears with the trigger word's type number from the remote-supervision library to obtain a discrete feature sequence, map this sequence into a low-dimensional space through a remote-supervision embedding layer, and concatenate the features extracted by the pre-trained model with the remote-supervision features;
Step 6: introduce a gradient-direction perturbation while the remote-supervision embedding layer learns and updates its parameters: first compute the model's back-propagated gradient at the remote-supervision feature embedding layer; then, keeping the original parameters, introduce an adversarial perturbation of the given step size and run forward and backward propagation again to obtain a new gradient; finally, restore the original parameters of the remote-supervision feature embedding layer and update the whole model's parameters with the perturbed new gradient;
Step 7: train the event extraction model on the training data, evaluate training quality on the validation and test datasets, and select the best-performing model over multiple rounds of iteration;
Step 8: use the trained model to predict and extract events from new, unlabelled internet text: after data preprocessing and cleaning, match the new text against the remote-supervision knowledge base, add trigger words that appear both in the remote knowledge base and in the new text to the text's remote features, and output the model's predicted event trigger types and the related event participant elements.
Priority Application
- CN202110480675.0A — priority/filing date 2021-04-30 — Event extraction method integrating pre-training language model and anti-noise interference remote supervision information

Publications
- CN113326371A — published 2021-08-31
- CN113326371B — granted 2023-12-29

Family: ID=77414011 (Country: CN)
CN112580328A (en) * | 2020-12-11 | 2021-03-30 | 上海明略人工智能(集团)有限公司 | Event information extraction method and device, storage medium and electronic equipment |
Non-Patent Citations (2)
Title |
---|
A Unified Model for Financial Event Classification, Detection and Summarization; Quanzhi Li et al.; Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20); 4668-4674 *
Research on a Financial Event Extraction Method Based on the ELECTRA Model and Part-of-Speech Features; Chen Xingyue et al.; Data Analysis and Knowledge Discovery; Vol. 5, No. 7; 36-47 *
Also Published As
Publication number | Publication date |
---|---|
CN113326371A (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110990564B (en) | Negative news identification method based on emotion calculation and multi-head attention mechanism | |
CN113434357B (en) | Log anomaly detection method and device based on sequence prediction | |
CN109918505B (en) | Network security event visualization method based on text processing | |
CN113742733B (en) | Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type | |
CN111274817A (en) | Intelligent software cost measurement method based on natural language processing technology | |
CN116303977B (en) | Question-answering method and system based on feature classification | |
CN111984791A (en) | Long text classification method based on attention mechanism | |
CN113836896A (en) | Patent text abstract generation method and device based on deep learning | |
CN115292568B (en) | Civil news event extraction method based on joint model | |
CN113326371B (en) | Event extraction method integrating pre-training language model and anti-noise interference remote supervision information | |
CN114492460B (en) | Event causal relationship extraction method based on derivative prompt learning | |
CN116089605A (en) | Text emotion analysis method based on transfer learning and improved word bag model | |
CN114881172A (en) | Software vulnerability automatic classification method based on weighted word vector and neural network | |
CN114881173A (en) | Resume classification method and device based on self-attention mechanism | |
CN114648029A (en) | Electric power field named entity identification method based on BiLSTM-CRF model | |
CN114491033A (en) | Method for building user interest model based on word vector and topic model | |
CN114298052A (en) | Entity joint labeling relation extraction method and system based on probability graph | |
Li et al. | ADAN: An intelligent approach based on attentive neural network and relevant law articles for charge prediction | |
CN113535928A (en) | Service discovery method and system of long-term and short-term memory network based on attention mechanism | |
Luo et al. | A comparison of som based document categorization systems | |
CN116821349B (en) | Literature analysis method and management system based on big data | |
CN114036946B (en) | Text feature extraction and auxiliary retrieval system and method | |
Siddiqui et al. | Poet Attribution of Urdu Ghazals using Deep Learning | |
Yang et al. | An Anomaly Detection Algorithm for Logs Based on Self-attention Mechanism and BiGRU Model | |
CN117744657A (en) | Medicine adverse event detection method and system based on neural network model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||