CN113326371A - Event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information

Info

Publication number: CN113326371A
Application number: CN202110480675.0A
Authority: CN (China)
Prior art keywords: model, event, training, data, text
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113326371B (en)
Inventors: 李书棋, 高阳
Current Assignee: Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd; Nanjing University
Original Assignee: Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd; Nanjing University
Application filed by Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd and Nanjing University
Priority to CN202110480675.0A
Publication of application CN113326371A; application granted; publication of grant CN113326371B

Classifications

    • G06F16/35: Information retrieval of unstructured textual data; clustering; classification
    • G06F16/383: Retrieval characterised by using metadata automatically derived from the content
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Pattern recognition; classification techniques
    • G06F18/25: Pattern recognition; fusion techniques
    • G06N3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N3/084: Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides an event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information, and belongs to the technical field of computers. The method uses integrated external knowledge to assist the model's predictions: a pre-trained language model, built by pre-training on massive text and containing a large amount of semantic and syntactic knowledge, serves as the network structure unit of the event extraction model; a model algorithm mixing in noise-resistant remote-supervision features is used; and adversarial training along the gradient direction is added under a spherical constraint.

Description

Event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information
Technical Field
The invention relates to an event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information, and in particular to the technical field of computer data processing.
Background
As informatization deepens in the Internet era, online information is growing explosively, and how to use it to support industry decision-making has become a central concern of enterprises and even governments. Internet information usually appears as text, typically from channels such as news articles and forum replies, and is generally unstructured and highly redundant; the key information within the text must be read, understood and located, and irrelevant content filtered out. Event extraction presents the content of unstructured text in a structured form: taking the event as the unit, it extracts the key intent expressed in the text and converts unstructured text into structured event information, which can then be used in downstream work such as trend analysis, construction of entity knowledge graphs and public opinion early warning. It is an important link in information extraction engineering.
Traditional event extraction usually depends entirely on manual work: faced with massive Internet information, analysts read through huge volumes of articles and reports to find, organize and record the relevant information, which consumes a great deal of human resources. To reduce this manual effort, machine learning approaches to identifying and extracting event patterns have been proposed in recent years. By recognizing event patterns in text and extracting the matching text segments in a structured way, machine learning enables batch text processing and greatly improves the efficiency of extracting structured information compared with manual reading. However, traditional machine-learning event pattern templates still rely on domain expert knowledge, so automatically learning event pattern features from labeled data with deep learning has become a new direction for structured event extraction. Given that Internet information is vast and its content complex and varied, improving the transfer and generalization ability of deep learning models across different events has become a difficult problem in Internet event extraction. A common practice is to introduce external knowledge through remote (distant) supervision to assist the model's predictions. The remote supervision algorithm assumes that, for a structured event in an existing knowledge graph, any sentence in an external corpus that contains the involved entities reflects that relation to some extent. Based on this assumption, the algorithm can assign relation labels to sentences in an external document collection using a small labeled knowledge graph, which amounts to automatic sample labeling, making remote supervision a semi-supervised algorithm. However, besides external knowledge, remote supervision also brings erroneous guidance and introduces noise that degrades the accuracy of the model's judgments. The limited text representation ability of RNNs and CNNs likewise affects event prediction and extraction. Therefore, how to use a neural network model with stronger representational power and an external-knowledge-assisted deep model for structured event extraction, while reducing the interference of erroneous noise, is a problem worth studying.
Disclosure of Invention
The purpose of the invention is as follows: an object is to provide an event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information, so as to solve the above problems in the prior art, enrich the text information, and increase the model's resistance to noise errors through adversarial training.
The technical scheme is as follows: in a first aspect, an event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information is provided, the method comprising the following steps:
Step 1: collect the training corpus. Internet text data is obtained through a web crawler and stored in text form as txt files.
Step 2: preprocess the data: remove html tags and special symbols, and segment the text into short texts at sentence or paragraph granularity.
Step 3: for texts in which events occur, label the event trigger words, subjects, objects, time, locations and event types according to the event definitions, and supplement the labeled data into the remote-supervision knowledge base to complete data annotation; match the labeled data against the remote-supervision knowledge base, add successfully matched trigger words to the sample's remote-supervision information, and split the data into a training set, a validation set and a test set at a ratio of 7:1:2.
Step 4: build models for the two stages of event extraction, namely event detection and event participating-element extraction.
Step 5: train the event extraction model with the training data, evaluate the quality of training on the validation and test sets, and over multiple rounds of iteration select the best-performing model for use.
Step 6: use the trained model to predict and extract events from new, unlabeled Internet text data. After data preprocessing and cleaning, the new text is matched against the remote-supervision knowledge base; trigger words that appear both in the remote knowledge base and in the new text to be predicted are added to the text's remote-supervision features, and the text is input to the model, which predicts the event trigger type and the related event participating elements.
In some implementations of the first aspect, the specific steps of constructing the event extraction model are as follows:
performing language modeling through a self-attention mechanism, and capturing feature information of the text from multiple perspectives with multi-head attention;
performing feature transformation through a two-layer feed-forward neural network (FFN) with ReLU as the activation function, and normalizing the features with layer normalization;
combining the layers of the pre-trained model through residual connections, and obtaining the features extracted by the pre-trained model through repeated iteration;
marking the positions where remote-supervision trigger words appear with the type IDs of those trigger words in the remote-supervision knowledge base to obtain a discrete sequence feature, and mapping it to a low-dimensional space through a remote-supervision embedding layer;
concatenating the features extracted by the pre-trained model with the remote-supervision features;
during learning and parameter updating of the remote-supervision embedding layer, adding a perturbation along the gradient direction: computing by back-propagation the model's gradient at the remote-supervision feature embedding layer;
saving the original parameters of the remote-supervision feature embedding layer, adding the adversarial perturbation with the obtained perturbation step, and running forward and backward propagation again to obtain a new gradient;
restoring the original parameters of the remote-supervision feature embedding layer, and updating the overall model parameters with the new gradient obtained after the perturbation;
repeating the training process several times and selecting the model with the best result for use;
preprocessing new Internet text data and feeding it into the model;
and predicting whether an event occurs, and extracting the structured event elements when an event is detected.
Beneficial effects: the invention provides an event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information. Automatic structured event extraction is performed by combining the pre-trained language model with remote-supervision information, and adversarial perturbation training is added, so that while external knowledge improves the model's performance, the erroneous noise introduced by remote supervision is effectively mitigated and the event extraction result is improved. An integrated knowledge-assisted model is used for prediction: a pre-trained language model, built by pre-training on massive text and containing a large amount of semantic and syntactic knowledge, serves as the network structure unit of the event extraction model, together with a model algorithm that mixes in noise-resistant remote-supervision features.
Drawings
FIG. 1 is a diagram of the model structure of the event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information.
FIG. 2 is a schematic diagram of attention-mechanism feature extraction in the event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information.
FIG. 3 is a diagram of the remote-supervision feature layer structure of the event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information.
FIG. 4 is a diagram of the experimental results of the event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information.
FIG. 5 is a schematic diagram of the event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information with the spherical constraint added.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, the present invention provides an event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information, which comprises the following steps:
Step 1: collect the training corpus. Internet text data is obtained through a web crawler and stored in text form as txt files.
Step 2: preprocess the data: remove html tags and special symbols, and segment the text into short texts at sentence or paragraph granularity.
Step 3: for texts in which events occur, label the event trigger words, subjects, objects, time, locations and event types according to the event definitions, and supplement the labeled data into the remote-supervision knowledge base to complete data annotation; match the labeled data against the remote-supervision knowledge base, add successfully matched trigger words to the sample's remote-supervision information, and split the data into a training set, a validation set and a test set at a ratio of 7:1:2 (a sketch of this preprocessing and matching is given below).
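As an illustration of steps 2 and 3 above, the following sketch shows one possible way to clean the crawled text, mark remote-supervision trigger words with their type IDs, and split the data 7:1:2. The regular expressions, the knowledge-base format (a trigger-word-to-type-ID mapping) and the function names are assumptions made for this example, not part of the patented method.

```python
# Illustrative preprocessing, remote-supervision matching and 7:1:2 split
# (regex patterns and knowledge-base format are assumptions for this sketch).
import random
import re

def preprocess(raw_html: str) -> list:
    text = re.sub(r"<[^>]+>", "", raw_html)            # remove html tags
    text = re.sub(r"[\r\n\t\xa0]+", " ", text)          # remove special symbols / whitespace debris
    # segment into short texts at sentence granularity
    return [s.strip() for s in re.split(r"(?<=[。！？!?.])", text) if s.strip()]

def mark_remote_supervision(sentence: str, kb: dict) -> list:
    """kb maps trigger word -> type ID; characters covered by a matched trigger get that ID, others 0."""
    ids = [0] * len(sentence)
    for trigger, type_id in kb.items():
        for m in re.finditer(re.escape(trigger), sentence):
            for i in range(m.start(), m.end()):
                ids[i] = type_id
    return ids

def split_7_1_2(samples: list, seed: int = 42):
    random.Random(seed).shuffle(samples)
    n_train, n_val = int(0.7 * len(samples)), int(0.1 * len(samples))
    return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]
```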
Step 4: build models for the two stages of event extraction, namely event detection and event participating-element extraction.
Specifically, an event extraction model combining a pre-trained model and remote supervision is built, as shown in Fig. 1. The specific process is as follows: first, a pre-trained model structure trained on massive text is used to encode the text; as shown in Fig. 2, language modeling is performed through a self-attention mechanism:
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V
where Q, K and V are the attention computation matrices, obtained by matrix multiplication of the input data with the corresponding parameters, and d_k denotes the dimension of the text vector after the text has been compressed by the embedding matrix.
Secondly, capturing feature information of multiple angles in the text by using multi-head attention as follows:
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}
where W^{O} is a linear transformation matrix that participates in the updating of the model's training parameters; and
\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})
where W_i^{Q} denotes the transformation (mapping) matrix of the Query vector, W_i^{K} the transformation matrix of the Key vector, and W_i^{V} the transformation matrix of the Value vector.
Thirdly, feature transformation is performed through a two-layer feed-forward neural network FFN, with ReLU as the activation function:
\mathrm{FFN}(x) = \max(0,\, x W_1 + b_1)\, W_2 + b_2
and layer normalization is used for feature normalization:
\mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta
where W_1, b_1 are the computation parameters of the first fully connected network in the feed-forward layer and W_2, b_2 those of the second; \mu is the mean of the representation vectors at each word position of the sample and \sigma^{2} is their variance; \gamma and \beta are learnable weight parameters; \epsilon is a small value that prevents division-by-zero errors; and \odot denotes the element-wise (position-by-position) product.
Then, the layers of the pre-trained model are combined through residual connections:
y = x + F(x)
where x denotes the input variable, which is added across the layer (i.e. the original input of the layer network), F(x) is the transformation performed by the layer, and y is the output feature of the network layer. The features extracted by the pre-trained model are obtained by stacking this process multiple times.
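To make the encoder structure above concrete, the following is a minimal PyTorch-style sketch of one such block: multi-head self-attention, a two-layer FFN with ReLU, layer normalization and residual connections. The class name, dimensions and head count are illustrative assumptions, not the patent's reference implementation.

```python
# Minimal sketch of the encoder block described above (dimensions, head count and
# class names are illustrative assumptions, not the patent's reference implementation).
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 3072):
        super().__init__()
        # multi-head self-attention captures feature information from multiple perspectives
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # two-layer feed-forward network with ReLU activation
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        # layer normalization for feature normalization
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)      # self-attention over the token representations
        x = self.norm1(x + attn_out)          # residual connection: y = x + F(x)
        x = self.norm2(x + self.ffn(x))       # residual connection around the FFN
        return x

# Stacking the block multiple times yields the features extracted by the pre-trained model.
encoder = nn.Sequential(*[EncoderBlock() for _ in range(12)])
```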
Finally, the features extracted by the pre-trained model are concatenated with the remote-supervision features. The positions where remote-supervision trigger words appear are marked with the type IDs of those trigger words in the remote-supervision knowledge base, yielding a discrete sequence feature; as shown in Fig. 3, this feature is mapped to a low-dimensional space through a remote-supervision embedding layer, concatenated with the features extracted in pre-training, and fed into the classifier for event prediction, where a classification prediction is made at each character position to determine whether it is a trigger word.
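A sketch of this remote-supervision feature layer and fusion step is given below, under the assumption that trigger-type IDs have already been produced by matching against the remote-supervision knowledge base; module names and dimensions are illustrative.

```python
# Sketch of the remote-supervision embedding layer, feature concatenation and
# per-character event classification (module names and sizes are assumptions).
import torch
import torch.nn as nn

class EventDetector(nn.Module):
    def __init__(self, encoder: nn.Module, n_trigger_types: int, n_event_types: int,
                 d_model: int = 768, d_rs: int = 64):
        super().__init__()
        self.encoder = encoder                              # pre-trained language model blocks
        # maps the discrete trigger-type sequence (0 = no match) to a low-dimensional space
        self.rs_embedding = nn.Embedding(n_trigger_types + 1, d_rs)
        # classifies every character position over the concatenated features
        self.classifier = nn.Linear(d_model + d_rs, n_event_types)

    def forward(self, token_reprs: torch.Tensor, rs_type_ids: torch.Tensor) -> torch.Tensor:
        text_feat = self.encoder(token_reprs)               # features extracted by the pre-trained model
        rs_feat = self.rs_embedding(rs_type_ids)            # remote-supervision features
        fused = torch.cat([text_feat, rs_feat], dim=-1)     # splice (concatenate) the two feature sets
        return self.classifier(fused)                       # per-character event / trigger prediction
```

The classifier's per-character outputs decide whether each position is an event trigger word and, if so, of which type.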
Step 5: train the event extraction model with the training data, evaluate the quality of training on the validation and test sets, and over multiple rounds of iteration select the best-performing model for use.
Step 6: use the trained model to predict and extract events from new, unlabeled Internet text data. After data preprocessing and cleaning, the new text is matched against the remote-supervision knowledge base; trigger words that appear both in the remote knowledge base and in the new text to be predicted are added to the text's remote-supervision features, and the text is input to the model, which predicts the event trigger type and the related event participating elements.
In a further embodiment, the specific steps of constructing the event extraction model include:
Language modeling is performed through a self-attention mechanism, and multi-head attention is used to capture feature information of the text from multiple perspectives.
Feature transformation is performed through a two-layer feed-forward neural network (FFN) with ReLU as the activation function, and layer normalization is used for feature normalization.
The layers of the pre-trained model are combined through residual connections, and the features extracted by the pre-trained model are obtained through repeated iteration.
The positions where remote-supervision trigger words appear are marked with the type IDs of those trigger words in the remote-supervision knowledge base to obtain a discrete sequence feature, which is mapped to a low-dimensional space through a remote-supervision embedding layer.
The features extracted by the pre-trained model are concatenated with the remote-supervision features.
During learning and parameter updating of the remote-supervision embedding layer, a perturbation is added along the gradient direction: the model's gradient at the remote-supervision feature embedding layer is computed by back-propagation.
The original parameters of the remote-supervision feature embedding layer are saved, the adversarial perturbation is added with the obtained perturbation step, and forward and backward propagation are run again to obtain a new gradient.
The original parameters of the remote-supervision feature embedding layer are restored, and the overall model parameters are updated with the new gradient obtained after the perturbation.
The training process is repeated several times, and the model with the best result is selected for use.
New Internet text data is preprocessed and fed into the model.
The model predicts whether an event occurs, and the structured event elements are extracted when an event is detected.
In a further embodiment, in order to mitigate the erroneous trigger-word information introduced by remote supervision, an adversarial learning strategy is adopted: during learning and parameter updating of the remote-supervision embedding layer, a perturbation along the gradient direction is added by an adversarial training method. This improves the model's robustness to noise, and specifically comprises the following steps:
The gradient back-propagated by the model at the remote-supervision feature embedding layer is computed:
g = \nabla_{e} L(x, y; \theta)
A perturbation radius \epsilon, representing the size of the perturbation's constraint range, is set by a spherical mapping that prevents the perturbation from deviating from the constraint around the optimal point, and the perturbation step is obtained:
r_{adv} = \Pi_{\|r\|_{2} \le \epsilon}\!\left( r + \alpha \cdot \frac{g}{\|g\|_{2}} \right)
where \Pi_{\|r\|_{2} \le \epsilon} denotes projection onto the perturbation constraint space and \alpha is the small step size.
The original parameters of the remote-supervision feature embedding layer are saved, the adversarial perturbation is added with the obtained perturbation step, and forward and backward propagation are run again to obtain a new gradient.
The original parameters of the remote-supervision feature embedding layer are then restored, and the overall model parameters are updated with the new gradient obtained after the perturbation.
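The sketch below illustrates this embedding-layer adversarial training procedure (an FGM/PGD-style perturbation with an L2 radius constraint); the loss function, optimizer, radius and step size are assumptions for the example, and rs_embedding refers to the illustrative module from the earlier sketch.

```python
# Illustrative adversarial training step on the remote-supervision embedding layer
# (FGM/PGD-style; hyperparameters and module names are assumptions for this sketch).
import torch

def adversarial_step(model, batch, optimizer, loss_fn, epsilon=1.0, alpha=0.3):
    # 1. normal forward/backward pass: gradient at the remote-supervision embedding layer
    loss = loss_fn(model(batch["tokens"], batch["rs_type_ids"]), batch["labels"])
    loss.backward()

    emb = model.rs_embedding.weight
    backup = emb.data.clone()                        # keep the original embedding parameters
    grad = emb.grad.detach()

    # 2. perturbation along the gradient direction, projected onto the epsilon-ball (spherical constraint)
    r = alpha * grad / (grad.norm() + 1e-12)
    if r.norm() > epsilon:
        r = epsilon * r / r.norm()
    emb.data.add_(r)

    # 3. forward/backward again with the perturbed embedding; gradients accumulate
    loss_adv = loss_fn(model(batch["tokens"], batch["rs_type_ids"]), batch["labels"])
    loss_adv.backward()

    # 4. restore the original embedding parameters and update the whole model with the new gradient
    emb.data = backup
    optimizer.step()
    optimizer.zero_grad()
```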
The network structure for event-element extraction is similar to that for event prediction, except that the remote-supervision feature embedding layer is removed and each type of element is predicted at the output layer. Tests show that the method outperforms other machine-learning event extraction methods in precision, recall and F1 score, as shown in Fig. 4.
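Under the same assumptions as the sketches above, the element-extraction network can be outlined as follows: the encoder is reused without the remote-supervision embedding layer, and each element type receives its own output layer; the role inventory and tagging scheme shown are illustrative.

```python
# Sketch of the event-element extraction head: same encoder, no remote-supervision
# embedding layer, one per-character output layer per element type (roles are illustrative).
import torch.nn as nn

class ElementExtractor(nn.Module):
    def __init__(self, encoder: nn.Module, d_model: int = 768,
                 roles=("subject", "object", "time", "location")):
        super().__init__()
        self.encoder = encoder
        # a simple B/I/O tagging layer (3 labels) for each type of participating element
        self.heads = nn.ModuleDict({role: nn.Linear(d_model, 3) for role in roles})

    def forward(self, token_reprs):
        feat = self.encoder(token_reprs)
        return {role: head(feat) for role, head in self.heads.items()}
```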
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. An event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information, characterized by comprising the following steps:
step 1, collecting the training corpus: Internet text data obtained by a crawler is stored in text form as txt files;
step 2, preprocessing the data;
step 3, labeling the text according to the event definitions, supplementing the labeled data into the remote-supervision knowledge base to complete data annotation, and splitting the data into a training set, a validation set and a test set at a ratio of 7:1:2;
step 4, building models for the two stages of event extraction, namely event detection and event participating-element extraction;
step 5, training the event extraction model with the training data, evaluating the quality of training on the validation and test sets, and selecting the best-performing model for use through multiple rounds of iteration;
and step 6, using the trained model to predict and extract events from new, unlabeled Internet text data: after data preprocessing and cleaning, the new text is matched against the remote-supervision knowledge base, trigger words that appear both in the remote knowledge base and in the new text to be predicted are added to the text's remote-supervision features, and the text is input to the model, which predicts the event trigger type and the related event participating elements.
2. The event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information according to claim 1, wherein the event extraction model is constructed by the following specific steps:
performing language modeling through a self-attention mechanism, and capturing feature information of the text from multiple perspectives with multi-head attention;
performing feature transformation through a two-layer feed-forward neural network (FFN) with ReLU as the activation function, and normalizing the features with layer normalization;
combining the layers of the pre-trained model through residual connections, and obtaining the features extracted by the pre-trained model through repeated iteration;
marking the positions where remote-supervision trigger words appear with the type IDs of those trigger words in the remote-supervision knowledge base to obtain a discrete sequence feature, and mapping it to a low-dimensional space through a remote-supervision embedding layer;
concatenating the features extracted by the pre-trained model with the remote-supervision features;
during learning and parameter updating of the remote-supervision embedding layer, adding a perturbation along the gradient direction: computing by back-propagation the model's gradient at the remote-supervision feature embedding layer;
saving the original parameters of the remote-supervision feature embedding layer, adding the adversarial perturbation with the obtained perturbation step, and running forward and backward propagation again to obtain a new gradient;
restoring the original parameters of the remote-supervision feature embedding layer, and updating the overall model parameters with the new gradient obtained after the perturbation;
repeating the training process several times and selecting the model with the best result for use;
preprocessing new Internet text data and feeding it into the model;
and predicting whether an event occurs, and extracting the structured event elements when an event is detected.
3. The event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information according to claim 1, wherein the preprocessing operation comprises removing html tags and special symbols and segmenting the text into short texts at sentence or paragraph granularity.
4. The event extraction method fusing a pre-trained language model and anti-noise-interference remote-supervision information according to claim 1, wherein step 3 further comprises: for texts in which events occur, labeling the event trigger words, subjects, objects, time, locations and event types according to the event definitions, and supplementing the labeled data into the remote-supervision knowledge base to complete data annotation; matching the labeled data against the remote-supervision knowledge base, adding successfully matched trigger words to the current sample's remote-supervision information, and splitting the data into a training set, a validation set and a test set at a ratio of 7:1:2.
CN202110480675.0A 2021-04-30 2021-04-30 Event extraction method integrating pre-training language model and anti-noise interference remote supervision information Active CN113326371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110480675.0A CN113326371B (en) 2021-04-30 2021-04-30 Event extraction method integrating pre-training language model and anti-noise interference remote supervision information

Publications (2)

Publication Number Publication Date
CN113326371A true CN113326371A (en) 2021-08-31
CN113326371B CN113326371B (en) 2023-12-29

Family

ID=77414011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110480675.0A Active CN113326371B (en) 2021-04-30 2021-04-30 Event extraction method integrating pre-training language model and anti-noise interference remote supervision information

Country Status (1)

Country Link
CN (1) CN113326371B (en)

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874410A (en) * 2017-01-22 2017-06-20 清华大学 Chinese microblogging text mood sorting technique and its system based on convolutional neural networks
CN108182295A (en) * 2018-02-09 2018-06-19 重庆誉存大数据科技有限公司 A kind of Company Knowledge collection of illustrative plates attribute extraction method and system
CN109063185A (en) * 2018-08-27 2018-12-21 电子科技大学 Social networks short text data filter method towards event detection
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element abstracting method, calculates equipment and storage medium at device
CN112487203A (en) * 2019-01-25 2021-03-12 中译语通科技股份有限公司 Relation extraction system integrated with dynamic word vectors
WO2020239061A1 (en) * 2019-05-31 2020-12-03 腾讯科技(深圳)有限公司 Text-based event detection method and apparatus, computer device and storage medium
CN110188172A (en) * 2019-05-31 2019-08-30 清华大学 Text based event detecting method, device, computer equipment and storage medium
WO2020247616A1 (en) * 2019-06-07 2020-12-10 Raytheon Bbn Technologies Corp. Linguistically rich cross-lingual text event embeddings
WO2021042503A1 (en) * 2019-09-06 2021-03-11 平安科技(深圳)有限公司 Information classification extraction method, apparatus, computer device and storage medium
CN111125370A (en) * 2019-12-06 2020-05-08 南京中新赛克科技有限责任公司 Relation extraction method suitable for small samples
CN111339774A (en) * 2020-02-07 2020-06-26 腾讯科技(深圳)有限公司 Text entity relation extraction method and model training method
CN111897908A (en) * 2020-05-12 2020-11-06 中国科学院计算技术研究所 Event extraction method and system fusing dependency information and pre-training language model
CN111694924A (en) * 2020-06-17 2020-09-22 合肥中科类脑智能技术有限公司 Event extraction method and system
CN111914558A (en) * 2020-07-31 2020-11-10 湖北工业大学 Course knowledge relation extraction method and system based on sentence bag attention remote supervision
CN112052665A (en) * 2020-09-12 2020-12-08 广东工业大学 Remote monitoring event extraction method and application thereof
CN112307130A (en) * 2020-10-21 2021-02-02 清华大学 Document-level remote supervision relation extraction method and system
CN112016293A (en) * 2020-10-22 2020-12-01 浙江大学 Remote supervision relation extraction method based on multi-instance collaborative confrontation training
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112580328A (en) * 2020-12-11 2021-03-30 上海明略人工智能(集团)有限公司 Event information extraction method and device, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QUANZHI LI et al., "A Unified Model for Financial Event Classification, Detection and Summarization", Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), pp. 4668-4674. *
陈星月 et al., "Research on a Financial Event Extraction Method Based on the ELECTRA Model and Part-of-Speech Features" (基于ELECTRA模型与词性特征的金融事件抽取方法研究), Data Analysis and Knowledge Discovery (数据分析与知识发现), vol. 5, no. 7, pp. 36-47. *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762482A (en) * 2021-09-15 2021-12-07 智道网联科技(北京)有限公司 Training method of neural network model for automatic driving and related device
CN113762482B (en) * 2021-09-15 2024-04-16 智道网联科技(北京)有限公司 Training method and related device for neural network model for automatic driving

Also Published As

Publication number Publication date
CN113326371B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
CN110598005A (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN113191148B (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN109918505B (en) Network security event visualization method based on text processing
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN111597328B (en) New event theme extraction method
CN115292568B (en) Civil news event extraction method based on joint model
Suyanto Synonyms-based augmentation to improve fake news detection using bidirectional LSTM
CN113988075A (en) Network security field text data entity relation extraction method based on multi-task learning
CN115203338A (en) Label and label example recommendation method
CN114881173A (en) Resume classification method and device based on self-attention mechanism
CN113378024B (en) Deep learning-oriented public inspection method field-based related event identification method
CN117151222B (en) Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium
CN111786999B (en) Intrusion behavior detection method, device, equipment and storage medium
CN113326371A (en) Event extraction method fusing pre-training language model and anti-noise interference remote monitoring information
CN114969334B (en) Abnormal log detection method and device, electronic equipment and readable storage medium
CN115757062A (en) Log anomaly detection method based on sentence embedding and Transformer-XL
CN116089605A (en) Text emotion analysis method based on transfer learning and improved word bag model
CN115994531A (en) Multi-dimensional text comprehensive identification method
CN112765940B (en) Webpage deduplication method based on theme features and content semantics
CN114648029A (en) Electric power field named entity identification method based on BiLSTM-CRF model
CN114298041A (en) Network security named entity identification method and identification device
CN114491033A (en) Method for building user interest model based on word vector and topic model
CN116821349B (en) Literature analysis method and management system based on big data
Jony et al. Domain specific fine tuning of pre-trained language model in NLP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant