CN112084330A - Incremental relation extraction method based on course planning meta-learning - Google Patents
- Publication number
- CN112084330A (application number CN202010806791.2A)
- Authority
- CN
- China
- Prior art keywords
- learning
- training
- task
- meta
- instance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
The invention discloses an incremental relation extraction method based on curriculum-planning meta-learning, comprising the following steps: constructing a curriculum-planning meta-learning incremental relation extraction framework; calculating the similarity between relations and re-planning the training order of the memory caches according to that similarity; training the relation extraction neural network model and, following the meta-learning idea, updating its parameters with a meta-learning parameter update formula; selecting a batch of memory caches from the training data by clustering; and loading the updated parameters into the relation extraction neural network model and running a performance test. By introducing the ideas of curriculum learning and meta-learning into the incremental relation extraction task, the invention effectively improves the model's average accuracy and overall accuracy and reduces its error range, thereby alleviating the "catastrophic forgetting" and "order sensitivity" problems of incremental learning and outperforming traditional methods on mainstream data sets.
Description
Technical Field
The invention belongs to the field of computer natural language processing, and particularly relates to an incremental relation extraction method based on course planning meta-learning.
Background
Relation extraction is an important task in the field of information extraction; it aims to extract the semantic relation between a specified pair of entities from natural-language text. For example, in the example given in fig. 1, the relation "located" holds between "Southeast University" and "Nanjing". Relation extraction has wide application in downstream tasks such as automatic knowledge-graph construction, automatic question answering, and text mining.
Traditional supervised relation extraction methods perform well on this task and are widely used. In practical scenarios, however, knowledge-graph construction is usually a continuous, iteratively updated, incremental process: new relation categories are added constantly, which places a higher demand on the relation extraction system, namely the ability to handle incremental data. Traditional supervised methods remain limited in this regard.
Traditional supervised relation extraction assumes that all possible relations are defined in advance and that the model is trained on the full data set. Consequently, whenever new data arrive, the classifier must be retrained on the full data set including the incremental data, at considerable cost in effort and time. Some models, such as metric-learning models, can handle new relation classes, but training them directly on incremental data to recognize the new classes causes severe "catastrophic forgetting": once an existing model is trained on a new data set, it loses the ability to recognize the original one. This remains a central challenge of incremental learning.
Another important issue in incremental-learning relation extraction is "order sensitivity". The relation extraction neural network processes a series of tasks during its update iterations, each task containing different relation classes and training data. The final performance depends not only on how the training data are divided into tasks but also, to a large extent, on the order in which the tasks are trained. This makes the performance of the relation extraction system unstable and the quality of downstream knowledge-graph construction hard to control.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects of the prior art, an incremental relation extraction method based on curriculum-planning meta-learning is provided. Built on meta-learning and augmented with a novel curriculum-planning step, the method better mitigates the "catastrophic forgetting" and "order sensitivity" problems.
The technical scheme is as follows: in order to achieve the above object, the present invention provides an incremental relationship extraction method based on curriculum planning meta-learning, which comprises the following steps:
s1: constructing a course planning meta-learning incremental relation extraction framework; the incremental relation extraction framework comprises a relation extraction neural network model, a memory cache list, a network model training method, a meta learning parameter updating formula, a memory cache screening method and a memory training course planning method;
s2: calculating the similarity between relations, and re-planning the training order of the memory caches according to the similarity;
s3: training the relation extraction neural network model, and updating the parameters of the relation extraction neural network model by using a meta-learning parameter updating formula in combination with the meta-learning thought;
s4: selecting a batch of memory caches from the training data in the step S3 by using a clustering method;
s5: loading the updated parameters into a relation extraction neural network model, and carrying out performance test;
s6: steps S3-S5 are repeated until no new tasks need to be processed.
Further, updating the parameters of the relation extraction neural network model with the meta-learning parameter update formula in step S3 specifically means: substituting the optimal parameters on the current task and the parameters from the previous task into the meta-learning parameter update formula.
Further, the selecting method of the memory cache in step S4 specifically includes:
a1: obtaining, from a hidden layer of the relation extraction neural network model, the vector representations of the natural-language sentences of all instances in the training data;
a2: clustering all vector representations into as many categories as memories required, using the K-Means clustering algorithm, so that each category contains the sentence vector of at least one instance; for a category with only one instance, taking that instance as a memory; for a category with more than one instance, computing the arithmetic mean of the vector representations of all its instance sentences as the category's vector center, and taking as the memory instance the instance whose sentence vector is closest, in Euclidean distance, to that center.
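Steps A1-A2 can be sketched with a small numpy-only K-Means (a minimal stand-in for a library implementation; the function names are ours, and the sentence vectors are assumed to be given as a 2-D array):

```python
import numpy as np

def select_memory(sentence_vecs, k, n_iter=20, seed=0):
    """Pick k memory instances: cluster the sentence vectors into k groups,
    then keep, per group, the instance nearest (Euclidean) to the group mean."""
    X = np.asarray(sentence_vecs, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct instances, then run plain k-means.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = X[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    memory = []
    for c in range(k):
        idx = np.where(assign == c)[0]
        if len(idx) == 0:
            continue
        if len(idx) == 1:
            memory.append(int(idx[0]))  # single-instance category: keep it
        else:
            center = X[idx].mean(axis=0)  # arithmetic mean of the category
            nearest = idx[np.argmin(np.linalg.norm(X[idx] - center, axis=1))]
            memory.append(int(nearest))
    return memory
```

On well-separated data the selected indices fall one per cluster, mirroring the "one representative memory per category" rule of step A2.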
Further, the inter-relation similarity in step S2 is computed as follows: let the relation label string be R; split R into a word list [W1, W2, …, Wn] with a word-segmentation tool; obtain the corresponding word-vector list [E1, E2, …, En]; take the mean of the word vectors as the vector representation of relation R; then compute, pair by pair, the cosine similarity between relations Ri and Rj, denote it Sim_{i,j}, and store it in the similarity matrix Sim.
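A minimal sketch of this computation, assuming the word vectors are supplied as a plain dict and whitespace splitting stands in for the word-segmentation tool (all names are illustrative):

```python
import numpy as np

def relation_vector(label, word_vecs, dim):
    # Split the relation label into words and average their embeddings;
    # unknown words fall back to the zero vector.
    words = label.split()
    vecs = [np.asarray(word_vecs.get(w, np.zeros(dim)), float) for w in words]
    return np.mean(vecs, axis=0)

def similarity_matrix(labels, word_vecs, dim):
    # Sim[i, j] = cosine similarity between relation vectors R_i and R_j.
    R = np.stack([relation_vector(l, word_vecs, dim) for l in labels])
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero vectors
    Rn = R / norms
    return Rn @ Rn.T
```

The resulting matrix is symmetric with ones on the diagonal, as expected of pairwise cosine similarities.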
Further, the memory-cache training order in step S2 is re-planned as follows: while the task list is not empty, take out the first task Task_i in the list; its training data D_i^train contains k relation classes [R_1, …, R_k]; add these classes to SeenRelations, the set used to store relations that have already occurred.
Further, the training process of the relation extraction neural network model in step S3 is as follows:
each relation class has N_train training instances, N_valid validation instances, and N_test test instances; from the wrong-relation label lists of all three instance groups, remove the labels not present in SeenRelations. Divide the N_train training instances into batches of size N_batch, i.e., the training data are split into Ceil(N_train / N_batch) batches, where the Ceil function denotes rounding up;
for the first task Task_0, cyclically take out each training batch TrainBatch and use it to train the relation extraction neural network;
for a non-first task Task_i, cyclically take out each training batch TrainBatch and cyclically take one task's memory instance group M_j from MemoryList. Each instance in a memory instance group carries a relation class label; look up in the similarity matrix Sim the similarity between each memory instance's relation label and the relation labels of the instances in the current training batch, sort the memory instance groups by this similarity in ascending order, train the relation extraction neural network on the sorted memory instance groups M'_j, and then train it on the training batch TrainBatch;
for any task Task_i, after each complete pass over the list of training batches, evaluate the network's training progress on the validation instances and save the parameters θ_i of the best-performing round; when the network fails to improve on the current task for several consecutive rounds, update the network parameters.
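The curriculum step for a non-first task (replay memory groups in ascending order of similarity to the current batch) might look as follows; the group structure and the mean-based group score are our assumptions:

```python
import numpy as np

def order_memory_groups(memory_groups, batch_labels, sim):
    """Sort memory-instance groups "from small to large" by similarity to
    the current training batch, so the least similar groups are replayed
    first.  Each group is a list of (sentence, relation_id) pairs; `sim`
    is the precomputed relation-similarity matrix."""
    def score(group):
        # Mean similarity between the group's relation labels and the
        # relation labels appearing in the current batch.
        vals = [sim[r, b] for _, r in group for b in batch_labels]
        return float(np.mean(vals))
    return sorted(memory_groups, key=score)
```

The sorted groups M'_j would then each be used for a training step before the batch TrainBatch itself.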
Further, the meta-learning parameter update formula in step S3 may be a linear, fixed, or square-root parameter update formula. Different data sets may favor different formulas, which has some influence on final performance; the choice is an adjustable hyper-parameter, and all three formulas can be tried experimentally in the actual application scenario to settle on a specific form.
Further, the performance test in step S5 measures the average accuracy and the overall accuracy of the model on the test data of all previous tasks.
Further, the average accuracy in step S5 is calculated as

acc_avg = (1 / i) * Σ_{j=1..i} acc_j

where acc_j is the accuracy on the test-instance set of Task_j and i is the number of tasks processed so far; the overall accuracy is the accuracy measured on the union of the test-instance sets of all processed tasks.
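The two accuracy indices can be sketched in pure Python (the per-task (correct, total) representation is our assumption):

```python
def accuracies(task_results):
    """task_results: list of (n_correct, n_total) pairs, one per processed
    task's test-instance set.  Returns (average accuracy, overall accuracy)."""
    per_task = [c / t for c, t in task_results]
    # Average accuracy: arithmetic mean of the per-task accuracies.
    avg = sum(per_task) / len(per_task)
    # Overall accuracy: accuracy over the union of all tasks' test data.
    overall = sum(c for c, _ in task_results) / sum(t for _, t in task_results)
    return avg, overall
```

Note the two indices differ whenever task test sets have different sizes, which is why both are reported.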
Further, the error range of the performance indices in step S5 is computed as follows: re-divide the tasks or shuffle the task order, measure performance under these other task divisions or orders, collect the average and overall accuracies under the various divisions and orders, and compute the error range as

error = z_α * σ / sqrt(n)

where z_α denotes the confidence coefficient at confidence level α, σ the standard deviation over the statistical samples, and n the number of statistical samples.
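A sketch of the error-range computation over repeated runs (stdlib only; the use of the sample standard deviation and the default z value are our assumptions):

```python
import math

def error_range(run_scores, z_alpha=1.96):
    """Half-width z_alpha * sigma / sqrt(n) of the confidence interval,
    where sigma is the (sample) standard deviation of the accuracy over
    the n repeated task divisions/orders."""
    n = len(run_scores)
    mean = sum(run_scores) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in run_scores) / (n - 1))
    return z_alpha * sigma / math.sqrt(n)
```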
Incremental-learning relation extraction is a relation extraction setting closer to real application scenarios. Its data are divided by task: each task contains a group of relation categories disjoint from those of the other tasks, and each relation category comprises a group of training instances D_train, a group of validation instances D_valid, and a group of test instances D_test. Each instance consists of a natural-language sentence S, the pair of entities (H, T) contained in the sentence, the correct relation label R that holds between the entities, and a list of wrong relation labels [R_1, R_2, …, R_s] that do not hold between them. The neural network model processes the tasks one by one in a given order and trains until no new task remains, at which point the model can judge all the relations seen.
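The data layout described above might be modelled as follows (field names are illustrative, not from the patent):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Instance:
    sentence: str              # natural-language sentence S
    entities: Tuple[str, str]  # entity pair (H, T)
    relation: str              # correct relation label R
    wrong_labels: List[str]    # wrong relation labels [R1, ..., Rs]

@dataclass
class Task:
    relations: List[str]                                # this task's relation classes
    train: List[Instance] = field(default_factory=list)  # D_train
    valid: List[Instance] = field(default_factory=list)  # D_valid
    test: List[Instance] = field(default_factory=list)   # D_test
```

An incremental learner would then consume a list of such Task objects in sequence.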
Three indices measure the performance of an incremental-learning relation extraction method: overall accuracy, average accuracy, and error range. The overall accuracy is the accuracy obtained on the test sets of all trained relations and reflects the method's performance level. The average accuracy is the arithmetic mean of the accuracies obtained on the test sets of all processed tasks and reflects the method's resistance to "catastrophic forgetting". The error range reflects the random fluctuation of the overall and average accuracies across task sequences formed by different data divisions and orderings, and thus the method's resistance to order sensitivity.
Has the advantages that: compared with the prior art, the invention introduces the ideas of curriculum learning and meta-learning into the incremental relation extraction task, effectively improves the model's average accuracy and overall accuracy, and reduces its error range, thereby alleviating the "catastrophic forgetting" and "order sensitivity" problems of incremental learning and outperforming traditional methods on mainstream data sets.
Drawings
FIG. 1 is a specific example of relationship extraction described in the background;
FIG. 2 shows the alternative formulas for updating the neural network model parameters in the present invention;
FIG. 3 is a flow diagram of an incremental relationship extraction framework based on curriculum planning meta-learning, in accordance with an embodiment of the present invention;
FIG. 4 is a structure of a relationship extraction neural network model in an embodiment of the present invention;
FIG. 5 is a graph comparing the performance of the method of the present invention with currently existing methods.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as illustrative only and not as limiting the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which may occur to those skilled in the art upon reading the present specification.
This embodiment is based on the incremental relation extraction data set Lifelong FewRel; a BiLSTM model, whose structure is shown in fig. 4, serves as the relation extraction neural network, and GloVe word-embedding vectors are used as pre-trained word vectors to encode the words in sentences.
In this embodiment, as shown in fig. 3, the incremental relationship extraction method based on course planning meta learning provided by the present invention specifically includes the following steps:
step 1) dividing the data set according to tasks.
Read the Lifelong FewRel data set and, with a word-segmentation tool, split the words of all relation labels in the relation list [R_1, R_2, …, R_n]; encode each word with GloVe word-embedding vectors and average the word vectors of each relation to obtain the relation-vector list [E_1, E_2, …, E_n]. Cluster the relation vectors into 10 different clusters with the K-Means algorithm, so that all relations are divided into 10 groups.
Based on this division, the training, validation, and test data in the data set are split into 10 tasks according to the instances' correct relation labels.
Initialize the parameters θ_0 of the BiLSTM model serving as the relation extraction neural network.
An empty list is initialized as a memory list MemoryList for storing the set of memory instances screened after each task is trained.
An empty list SeenRelations is initialized to store the correct relation labels contained in all trained instances.
Step 2) Calculating the similarity among the relation categories.
Using the relation-vector list [E_1, E_2, …, E_n] obtained in step 1), compute the cosine similarity pair by pair between E_i and E_j, record it as Sim_{i,j}, and store it in the similarity matrix Sim.
Step 3) While the task list is not empty, take out the first task Task_i in the list; its training data D_i^train contains k relation classes [R_1, …, R_k], which are added to SeenRelations. Remove, from the wrong-relation label lists of the training, validation, and test instances, the labels not present in SeenRelations, and divide the training data into training batches of size 50.
For the first task Task_0, cyclically take out each training batch TrainBatch; for the training instances in a batch, segment the sentence S with a word-segmentation tool and encode it with GloVe word vectors. The encoded sentence together with the instance's correct relation class label forms a training positive example; the encoded sentence together with each wrong relation label of the instance forms the training negative examples. The processed sentence encoding is fed to the relation extraction BiLSTM network as input, the positive- or negative-example label serves as the model's label, MarginLoss is used as the loss function, and the model is optimized by back-propagation with the Adam algorithm.
For a non-first task Task_i, cyclically take out each training batch TrainBatch and cyclically take one task's memory instance group M_j from MemoryList. Each instance in a memory instance group carries a relation class label; obtain from the similarity matrix Sim the similarity between each memory instance's relation label and the relation labels of the instances in the training batch, and sort the memory instance groups by this similarity in ascending order. Encode the sorted memory instance groups M'_j as described above and train the BiLSTM model on them, then train it on the processed data of the normal training batch TrainBatch.
For any task Task_i, after each complete pass over the list of training batches, evaluate the network's training progress on the validation instances and save the parameters θ_i of the best-performing round. When the network fails to improve on the current task for several consecutive rounds, the parameters on the current task can be updated with one of the formulas shown in FIG. 2; this embodiment uses the linear parameter update formula of FIG. 2, in which θ̃ denotes the model parameters after the previous task was completed, ε is a fixed constant, and n is the total number of tasks.
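Since the concrete formula appears only in FIG. 2, the following is merely one plausible shape for a linear meta-learning update: interpolate between the previous-task parameters and the best parameters found on the current task, with a coefficient that decreases linearly in the task index (the interpolation scheme, the role of eps, and all names are our assumptions, not the patent's formula):

```python
def linear_meta_update(theta_prev, theta_best, i, n, eps=0.1):
    """Hypothetical linear update: move the previous-task parameters
    theta_prev toward the best parameters theta_best found on task i.
    beta shrinks linearly with the task index i out of n tasks;
    eps is a fixed constant."""
    beta = 1.0 - eps * i / n
    return [p + beta * (b - p) for p, b in zip(theta_prev, theta_best)]
```

Under this sketch, early tasks adopt the new optimum almost fully while later tasks blend it more conservatively with the accumulated parameters.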
Step 4) From the training data D_i^train of the just-trained task Task_i, select 50 instances as the memory instance set of the current task, as follows.
From the relation extraction BiLSTM network, obtain the vector representation of the natural-language sentence S of every training instance (taken from the output of the model's last hidden layer), and cluster all vector representations into 50 classes with the K-Means clustering algorithm. Each class contains the sentence vector of at least one instance.
For a class with only one instance, take that instance as a memory; for a class with more than one instance, compute the arithmetic mean of the vector representations of all its instance sentences and take as a memory the instance whose sentence vector is closest, in Euclidean distance, to that mean. Each task thus contributes 50 memory instances, which are added to MemoryList.
Step 5) Load the updated network parameters into the relation extraction BiLSTM network and test the model's average accuracy and overall accuracy on the test data of all previous tasks.
The average accuracy is calculated as

acc_avg = (1 / (i + 1)) * Σ_{j=0..i} acc_j

where acc_j is the accuracy on the test-instance set of Task_j; the overall accuracy is the accuracy measured on the union of the test-instance sets of all processed tasks.
Step 6) Repeat steps 3) to 5) until no new tasks need to be processed, and measure the average accuracy and the overall accuracy under the current task division.
Repartition the tasks or shuffle the task order, then repeat steps 2) to 5) to measure performance under other task divisions or orders. Collect the average and overall accuracies under the various divisions and orders, and compute the error range of the performance indices as

error = z_α * σ / sqrt(n)

where z_α denotes the confidence coefficient at confidence level α, σ the standard deviation over the statistical samples, and n the number of statistical samples.
To verify the effect of the method, this embodiment compares it with the traditional supervised learning method and with other incremental relation extraction methods; the results are shown in fig. 5. On the incremental relation extraction data sets Lifelong FewRel and Lifelong SimpleQuestions, the method's average accuracy and overall accuracy are superior to those of the traditional method and of the other incremental relation extraction methods, with a smaller error range, showing that the proposed method effectively alleviates the "catastrophic forgetting" problem of incremental learning and has better stability.
In summary, the proposed method builds on the Lifelong FewRel and Lifelong SimpleQuestions data sets, uses a BiLSTM model as the relation extraction neural network with GloVe pre-trained word vectors, and combines the ideas of curriculum learning and meta-learning to yield an incremental relation extraction method of high accuracy and high stability. On this basis, a better relation extraction neural network model can be trained in continuously updated, iterative application scenarios, and a continuously learning incremental relation extraction system can be constructed.
Claims (10)
1. An incremental relation extraction method based on course planning meta-learning is characterized by comprising the following steps:
s1: constructing a course planning meta-learning incremental relation extraction framework;
s2: calculating the similarity between relations, and re-planning the training order of the memory caches according to the similarity;
s3: training the relation extraction neural network model, and updating the parameters of the relation extraction neural network model by using a meta-learning parameter updating formula in combination with the meta-learning thought;
s4: selecting a batch of memory caches from the training data in the step S3 by using a clustering method;
s5: loading the updated parameters into a relation extraction neural network model, and carrying out performance test;
s6: steps S3-S5 are repeated until no new tasks need to be processed.
2. The incremental relation extraction method based on curriculum-planning meta-learning according to claim 1, wherein updating the parameters of the relation extraction neural network model with the meta-learning parameter update formula in step S3 specifically means: substituting the optimal parameters on the current task and the parameters from the previous task into the meta-learning parameter update formula.
3. The method as claimed in claim 1, wherein the selecting method of the memory cache in step S4 specifically comprises:
a1: obtaining, from a hidden layer of the relation extraction neural network model, the vector representations of the natural-language sentences of all instances in the training data;
a2: clustering all vector representations into as many categories as memories required, using the K-Means clustering algorithm, so that each category contains the sentence vector of at least one instance; for a category with only one instance, taking that instance as a memory; for a category with more than one instance, computing the arithmetic mean of the vector representations of all its instance sentences as the category's vector center, and taking as the memory instance the instance whose sentence vector is closest, in Euclidean distance, to that center.
4. The incremental relation extraction method based on curriculum-planning meta-learning as claimed in claim 1, wherein the inter-relation similarity in step S2 is computed as follows: let the relation label string be R; split R into a word list [W1, W2, …, Wn] with a word-segmentation tool; obtain the corresponding word-vector list [E1, E2, …, En]; take the mean of the word vectors as the vector representation of relation R; then compute, pair by pair, the cosine similarity between relations Ri and Rj, denote it Sim_{i,j}, and store it in the similarity matrix Sim.
5. The method as claimed in claim 1, wherein the memory-cache training order in step S2 is re-planned as follows: while the task list is not empty, take out the first task Task_i in the list; its training data D_i^train contains k relation classes [R_1, …, R_k]; add these classes to the set of known relations SeenRelations.
6. The method of claim 5, wherein the training process of the neural network model for relationship extraction in step S3 is as follows:
each relation class has N_train training instances, N_valid validation instances, and N_test test instances; from the wrong-relation label lists of all three instance groups, remove the labels not present in SeenRelations. Divide the N_train training instances into batches of size N_batch, i.e., the training data are split into Ceil(N_train / N_batch) batches, where the Ceil function denotes rounding up;
for the first task Task_0, cyclically take out each training batch TrainBatch and use it to train the relation extraction neural network;
for a non-first task Task_i, cyclically take out each training batch TrainBatch and cyclically take one task's memory instance group M_j from MemoryList. Each instance in a memory instance group carries a relation class label; look up in the similarity matrix Sim the similarity between each memory instance's relation label and the relation labels of the instances in the current training batch, sort the memory instance groups by this similarity in ascending order, train the relation extraction neural network on the sorted memory instance groups M'_j, and then train it on the training batch TrainBatch;
for any task Taski, after each complete pass over the training data list, the validation instances are used to evaluate how well the network is trained on the current task, and the parameters θi of the best-performing round are saved; when the network fails to improve on the current task for several consecutive rounds, the network parameters are updated according to the meta-learning parameter update formula.
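A minimal sketch of the claim-6 training loop for a non-first task. The instance format `(sentence, label)`, the `train_step` callback, and the dictionary-based similarity lookup are illustrative assumptions; the actual network and data structures are not specified in the text:

```python
def batches(instances, n_batch):
    """Split training instances into Ceil(N_train / N_batch) batches
    of size at most n_batch."""
    return [instances[i:i + n_batch] for i in range(0, len(instances), n_batch)]

def order_memory_group(memory_group, batch_labels, sim):
    """Sort memory instances by their similarity to the batch's relation
    labels, from small to large (an easy-to-hard curriculum)."""
    def score(inst):
        _, label = inst
        # Highest similarity of this memory label to any label in the batch.
        return max(sim[label][l] for l in batch_labels)
    return sorted(memory_group, key=score)

def train_task(train_data, memory_list, sim, n_batch, train_step):
    """For each training batch, first replay one task's memory group in
    similarity order, then train on the batch itself."""
    mem_cycle = 0
    for batch in batches(train_data, n_batch):
        labels = [lbl for _, lbl in batch]
        mem = memory_list[mem_cycle % len(memory_list)]
        mem_cycle += 1
        train_step(order_memory_group(mem, labels, sim))  # M'_j first
        train_step(batch)                                 # then TrainBatch
```

Replaying low-similarity memory instances before high-similarity ones is the curriculum-planning step: dissimilar (easier to separate) relations are rehearsed first, reducing interference with the current batch.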
7. The method as claimed in claim 6, wherein the meta-learning parameter update formulas in step S3 include a linear parameter update formula, a fixed parameter update formula and a square-root parameter update formula.
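The exact update formulas are not given in the text. One plausible reading, sketched here as an assumption only, is that each schedule sets a mixing coefficient β between the old meta parameters and the newly trained ones, θ ← (1 − β)·θ_old + β·θ_new, as the task index grows:

```python
import math

def mix_coefficient(task_index, schedule="linear"):
    """Hypothetical mixing coefficient beta for the three schedules named
    in claim 7 (linear / fixed / square-root). These concrete formulas
    are illustrative choices, not the patent's actual equations."""
    i = task_index + 1  # tasks are numbered from 0 in the claims
    if schedule == "linear":
        return 1.0 / i           # weight shrinks linearly in the task count
    if schedule == "fixed":
        return 0.5               # constant weight regardless of task count
    if schedule == "sqrt":
        return 1.0 / math.sqrt(i)  # weight shrinks as the square root
    raise ValueError(schedule)
```

Under this reading, the linear schedule is the most conservative for late tasks and the fixed schedule the most plastic, which matches the stability/plasticity trade-off that incremental learning methods tune.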
8. The method of claim 1, wherein the performance test in step S4 is performed by averaging the accuracy of the model over the test data of all previous tasks and by calculating the error range of the performance index.
9. The incremental relation extraction method based on curriculum planning meta-learning as claimed in claim 8, wherein the average accuracy in step S4 is calculated as follows:
avg_acci = (1 / (i + 1)) · Σj=0..i accj
wherein acci is the accuracy on the test instance set of Taski;
the overall accuracy is calculated as follows:
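A sketch of these metrics. The average-accuracy formula follows directly from claim 8 (mean over all tasks seen so far); the overall-accuracy formula is not shown in the text, so treating it as the mean of the per-stage average accuracies is an assumption of this sketch:

```python
def average_accuracy(acc_per_task):
    """avg_acc_i = (1 / (i + 1)) * sum(acc_0 .. acc_i): mean accuracy
    over the test sets of all tasks seen so far."""
    return sum(acc_per_task) / len(acc_per_task)

def overall_accuracy(acc_history):
    """One plausible reading of the elided formula: the mean of the
    per-stage average accuracies. acc_history[i] holds the per-task
    accuracies measured after finishing Task_i."""
    return sum(average_accuracy(a) for a in acc_history) / len(acc_history)
```

For example, accuracies of 0.8 after Task0 and (0.8, 0.6) after Task1 give per-stage averages of 0.8 and 0.7.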
10. The method as claimed in claim 8, wherein the error range of the performance index in step S4 is calculated as follows: the tasks are re-divided or the task order is shuffled, the performance under the other task divisions or task orders is measured, the average accuracy and overall accuracy under each task division and order are collected, and the error range of the performance index is calculated according to the following formula:
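The formula itself is not shown in the text. A common choice for such an error range, used here purely as an assumption, is mean ± sample standard deviation across the re-divided or re-ordered runs:

```python
import statistics

def error_range(scores):
    """Error range of a performance index across re-divided or re-ordered
    task sequences, sketched as mean +/- sample standard deviation
    (the claim's exact formula is not shown in the source)."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    return mean - std, mean + std
```

With scores 0.6 and 0.8 from two task orders, the range is centered at 0.7 with a half-width of one sample standard deviation.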
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010806791.2A CN112084330A (en) | 2020-08-12 | 2020-08-12 | Incremental relation extraction method based on course planning meta-learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112084330A true CN112084330A (en) | 2020-12-15 |
Family
ID=73727877
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112084330A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160179945A1 (en) * | 2014-12-19 | 2016-06-23 | Universidad Nacional De Educación A Distancia (Uned) | System and method for the indexing and retrieval of semantically annotated data using an ontology-based information retrieval model |
CN105956197A (en) * | 2016-06-15 | 2016-09-21 | 杭州量知数据科技有限公司 | Social media graph representation model-based social risk event extraction method |
US20180253529A1 (en) * | 2017-03-02 | 2018-09-06 | Drexel University | Multi-temporal Information Object Incremental Learning Software System |
CN108959305A (en) * | 2017-05-22 | 2018-12-07 | 北京国信宏数科技有限公司 | A kind of event extraction method and system based on internet big data |
Non-Patent Citations (2)
Title |
---|
JATHUSHAN RAJASEGARAN et al.: "iTAML: An Incremental Task-Agnostic Meta-learning Approach", 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 5 August 2020 (2020-08-05), pages 13585 - 13594 * |
YE Yan: "Research on Adaptive Meta-Learning Algorithms", China Masters' Theses Full-text Database, no. 06, 15 June 2013 (2013-06-15), pages 1 - 68 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112667732A (en) * | 2021-01-25 | 2021-04-16 | 东南大学 | Continuous relation extraction method based on task perception meta-learning |
CN112990318A (en) * | 2021-03-18 | 2021-06-18 | 中国科学院深圳先进技术研究院 | Continuous learning method, device, terminal and storage medium |
WO2022193753A1 (en) * | 2021-03-18 | 2022-09-22 | 中国科学院深圳先进技术研究院 | Continuous learning method and apparatus, and terminal and storage medium |
CN112990318B (en) * | 2021-03-18 | 2024-05-07 | 中国科学院深圳先进技术研究院 | Continuous learning method, device, terminal and storage medium |
CN112698933A (en) * | 2021-03-24 | 2021-04-23 | 中国科学院自动化研究所 | Method and device for continuous learning in multitask data stream |
CN113361645A (en) * | 2021-07-03 | 2021-09-07 | 上海理想信息产业(集团)有限公司 | Target detection model construction method and system based on meta-learning and knowledge memory |
CN113361645B (en) * | 2021-07-03 | 2024-01-23 | 上海理想信息产业(集团)有限公司 | Target detection model construction method and system based on meta learning and knowledge memory |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112084330A (en) | Incremental relation extraction method based on course planning meta-learning | |
CN110084271B (en) | Method and device for identifying picture category | |
CN112115721B (en) | Named entity recognition method and device | |
CN108985617B (en) | Product production flow scheduling method and system based on intelligent manufacturing | |
CN111860981B (en) | Enterprise national industry category prediction method and system based on LSTM deep learning | |
CN110928981A (en) | Method, system and storage medium for establishing and perfecting iteration of text label system | |
CN113177587B (en) | Generalized zero sample target classification method based on active learning and variational self-encoder | |
CN112016313A (en) | Spoken language element identification method and device and alarm situation analysis system | |
CN115391561A (en) | Method and device for processing graph network data set, electronic equipment, program and medium | |
CN116304020A (en) | Industrial text entity extraction method based on semantic source analysis and span characteristics | |
CN113177644A (en) | Automatic modeling system based on word embedding and depth time sequence model | |
CN111783688B (en) | Remote sensing image scene classification method based on convolutional neural network | |
Ding et al. | Dirichlet process mixture models with shrinkage prior | |
CN115526300B (en) | Sequence rearrangement method based on cyclic neural network | |
CN114969738B (en) | Interface abnormal behavior monitoring method, system, device and storage medium | |
CN116910526A (en) | Model training method, device, communication equipment and readable storage medium | |
CN114048320B (en) | Multi-label international disease classification training method based on course learning | |
CN113051886B (en) | Test question duplicate checking method, device, storage medium and equipment | |
CN112667732A (en) | Continuous relation extraction method based on task perception meta-learning | |
CN113821571A (en) | Food safety relation extraction method based on BERT and improved PCNN | |
CN116431757B (en) | Text relation extraction method based on active learning, electronic equipment and storage medium | |
US11816427B1 (en) | Automated data classification error correction through spatial analysis using machine learning | |
CN117910875B (en) | System for evaluating stress resistance of elymus resource | |
CN115081439B (en) | Multi-feature self-adaptive enhancement-based chemical classification method and system | |
CN117312138A (en) | Software defect detection method, device, computer equipment, storage medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||