CN113254675B - Knowledge graph construction method based on self-adaptive few-sample relation extraction - Google Patents

Knowledge graph construction method based on self-adaptive few-sample relation extraction

Info

Publication number
CN113254675B
CN113254675B (application CN202110808184.4A)
Authority
CN
China
Prior art keywords
relation
adaptive
relationship
entities
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110808184.4A
Other languages
Chinese (zh)
Other versions
CN113254675A (en)
Inventor
孙喜民
周晶
毕立伟
李晓明
王帅
孙博
郑斌
刘丹
常江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid E Commerce Co Ltd
State Grid E Commerce Technology Co Ltd
Original Assignee
State Grid E Commerce Co Ltd
State Grid E Commerce Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid E Commerce Co Ltd, State Grid E Commerce Technology Co Ltd filed Critical State Grid E Commerce Co Ltd
Priority to CN202110808184.4A priority Critical patent/CN113254675B/en
Publication of CN113254675A publication Critical patent/CN113254675A/en
Application granted granted Critical
Publication of CN113254675B publication Critical patent/CN113254675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a knowledge graph construction method based on self-adaptive few-sample relation extraction, in which the relations among entities are extracted with an adaptive relation extraction model. The model is constructed as follows: S100: encode the training-set instances with a text encoder to generate context semantics; S200: input the support set into a parameter generator to generate the initialization softmax parameters; S300: input the context semantics generated in step S100 into an adaptive graph neural network and update the instances with it; S400: perform classification prediction on the updated instances with a softmax classifier and obtain the relation type. The method needs no large amount of manually labeled data when acquiring relations, avoids the time and money consumed by large-scale manual labeling, and can complete a domain-specific relation extraction task with only a small amount of labeled data from that domain.

Description

Knowledge graph construction method based on self-adaptive few-sample relation extraction
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a knowledge graph construction method based on self-adaptive few-sample relation extraction.
Background
A knowledge graph is a set of graphs that depict the development process and structure of knowledge; it describes knowledge resources and their carriers with visualization techniques, and mines, analyzes, constructs, draws, and displays knowledge and the interrelations among knowledge resources. In the prior art, a general-domain knowledge graph is built from original unstructured text, mainly through the following steps: (1) entity extraction, namely automatically identifying entities from the unstructured text; (2) relation extraction, namely identifying the relations between the entities; (3) entity linking, namely performing logical attribution and redundancy elimination on the extracted entity and relation data; (4) knowledge reasoning, namely automatically inferring missing relation values from fact triples and completing the knowledge graph.
Steps (1) and (2) both involve information extraction, an important component of natural language processing; particularly in today's information society, extracting useful information from massive data is significant. Information extraction can be divided into entity extraction, relation extraction, event extraction, and so on. A relation extraction task is generally posed as: given a text and two entities mentioned in it, determine whether a relation exists between the entities and, if so, which one. Relation extraction is not only an important link in knowledge graph construction but is also widely used in technologies such as automatic question answering, automatic summarization, and sentiment analysis.
Traditional supervised learning performs well on relation extraction tasks, but in practical applications a supervised relation extraction method requires sufficient, fully labeled training data; labeling that data consumes large amounts of manpower and material resources, and the resulting models are difficult to migrate to other domains. It is therefore necessary to research how to improve relation extraction performance with little or even no annotated data.
To address the data requirement of supervised learning, one solution is distant supervision. Its basic idea is to rely on an existing knowledge base and collect, as training corpus, texts that contain entity pairs from that knowledge base; Mintz proposed the assumption that if an entity pair in the knowledge base has a certain relation, then all data containing that entity pair express that relation. However, distant supervision has the drawback that the generated data contain a large amount of noise, and it cannot essentially solve the long-tail problem of sample distribution. Another line of work asks how to make full use of a small number of labeled samples for training so that the model has better generalization ability, i.e., few-sample learning.
At present there are two main approaches to few-sample relation extraction: metric learning and meta-learning. Metric learning learns a metric function from prior knowledge; with this function, inputs are mapped into a subspace in which similar and dissimilar data pairs can be easily separated, which is typically used for classification problems. Meta-learning mainly optimizes the strategy for finding optimal parameters in the hypothesis space, for example finding a suitable initial model parameter, or learning an optimizer that directly outputs parameter updates.
Graph neural networks are an emerging field of recent years: they extend traditional neural networks to non-Euclidean spaces, perform graph operations on graph structures, and offer a degree of interpretability. A graph neural network uses the structural information between categories as a channel for information propagation and can extract the relations between samples well. It mimics the association and discrimination mechanisms of the human brain in cognition and acquires additional auxiliary information about a new task, thereby alleviating the problem of insufficient sample data. A graph neural network captures the differences between categories well, which facilitates category classification.
Disclosure of Invention
The invention introduces graph neural networks into few-sample relation extraction and provides a knowledge graph construction method based on self-adaptive few-sample relation extraction. The method avoids the time and money consumed by large-scale manual labeling, can quickly complete a domain-specific relation extraction task with a small amount of labeled data from that domain, and generalizes well to unseen domains.
Considering that a model forgets the old task when migrating from an old task to a new one, and that training on a new task normally requires a large number of labeled samples, the invention applies the graph neural network to the multi-task problem: exploiting the fact that information in a graph neural network can propagate and aggregate among nodes, it achieves fast and accurate classification from only a small number of sample instances, without a large number of labeled training samples.
The method for constructing a knowledge graph based on self-adaptive few-sample relation extraction provided by the embodiment of the invention comprises the following steps (a minimal pipeline sketch follows the list):
automatically extracting entities from the acquired unstructured text;
extracting the relation between the entities by taking the original unstructured text and the identified entities as the input of a relation model;
performing entity linking based on the extracted entities and relationship data;
and automatically deducing the missing of the relation value according to the fact triple, and completing the knowledge graph.
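As a rough illustration of how these four steps chain together, the sketch below wires trivial stand-ins into one pipeline; every function body here is a hypothetical placeholder rather than the patent's actual models, and steps three and four are sketched separately further down.

```python
from itertools import permutations

def extract_entities(text: str) -> list:
    """Step 1 stand-in: treat capitalized tokens as entities."""
    return [w.strip(".,") for w in text.split() if w.istitle()]

def extract_relation(text: str, head: str, tail: str):
    """Step 2 stand-in for the adaptive few-sample relation extraction model."""
    return "related_to" if head in text and tail in text else None

def build_knowledge_graph(documents: list) -> set:
    triples = set()
    for text in documents:
        for head, tail in permutations(extract_entities(text), 2):
            rel = extract_relation(text, head, tail)
            if rel:
                triples.add((head, rel, tail))
    # steps 3 (entity linking) and 4 (TransE completion) would run here
    return triples

print(build_knowledge_graph(["Alice founded Acme in Paris."]))
```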
The relation model is constructed as follows:
Given a training set containing $M$ classes with $N$ instances under each class, where each instance comprises a sentence together with the head entity and tail entity of that sentence: randomly extract $M_1$ classes from the training set and randomly extract $K$ instances from each of them to construct a support set $\mathcal{S}$; from the remaining $N-K$ samples of each class, randomly sample $L$ instances to construct a query set $\mathcal{Q}$;
s100: encoding the training set instance by using a text encoder to generate context semantics;
s200: inputting the support set into a parameter generator to generate an initialization softmax parameter;
s300: inputting the context semantics generated in the step S100 into an adaptive graph neural network, and updating the instance by using the adaptive graph neural network; the adaptive graph neural network is constructed as follows:
s310: constructing a point diagram, wherein nodes represent a feature vector of an example, and edges describe the similarity relation between the examples;
s320: constructing a distribution graph, wherein nodes represent the distribution of an example, and edges describe the similarity relation between the distribution and the distribution; the distribution refers to a vector formed by similarity relation between one example and all other examples;
s330: taking context relation semantics of the support set and the query set as feature vectors, initializing nodes of the point diagram, and initializing corresponding edges of the point diagram by using similarity among the nodes;
s340: initializing nodes of the distribution diagram by using the similar relation vectors of each instance in the support set and the query set, and initializing corresponding edges of the distribution diagram by using the similar relation between the nodes;
The similarity relation vector $v^{d,0}_{i} = \Vert_{j=1}^{T}\,\delta(y_i, y_j)$ is the $i$-th node of the distribution graph, where $\Vert$ denotes the concatenation operation, $y_i$ and $y_j$ denote the relation category labels of instance $i$ and instance $j$ respectively, and $\delta(y_i, y_j) = 1$ if $y_i = y_j$, otherwise $\delta(y_i, y_j) = 0$;
S350: aggregating the similarity relations between the nodes in the point graph together with each node of the previous layer's distribution graph to obtain the updated distribution-graph nodes, and updating the edges of the distribution graph;
S360: aggregating the similarity relations between each node in the updated distribution graph together with the corresponding node of the previous layer's point graph to obtain the updated point-graph nodes, and updating the edges of the point graph;
s400: and carrying out classification prediction on the updated examples by using a softmax classifier, and acquiring the relationship type.
Further, in step S100, the sentences in the instances and the positions of the head and tail entities are encoded.
Further, encoding the sentences and the positions of the head and tail entities in the instances further comprises:
s110: mapping each word in the example sentence into a word vector;
s120: based on the word vectors, coding each word and the relative position of two entities of the sentence where the word is located respectively, and connecting the obtained coding vectors to obtain the position codes of the words;
s130: and inputting the examples and the position codes of the words in the examples into a text encoder to generate the context semantics of each example.
Further, step S200 further includes:
s210: dividing the support set instances according to the relation category;
s220: generating the weight and the bias corresponding to each relationship category by using the example under each relationship category;
s230: the weights and biases corresponding to all relation categories form a weight vector and a bias vector, i.e., the initialized softmax parameters.
Further, in sub-step S330, the similarity relation between point-graph nodes is $e^{p,0}_{ij} = f_{e^p}\big(v^{p,0}_{i}, v^{p,0}_{j}\big)$, where $v^{p,0}_{i}$ and $v^{p,0}_{j}$ denote the initialized node $i$ and node $j$, and $f_{e^p}$ denotes a two-layer convolution-regularization-ReLU network followed by a sigmoid activation layer;
in sub-step S340, the edges are described by the similarity relation between distribution-graph nodes, $e^{d,0}_{ij} = f_{e^d}\big(v^{d,0}_{i}, v^{d,0}_{j}\big)$, where $f_{e^d}$ likewise comprises a two-layer convolution-regularization-ReLU network and a sigmoid activation layer; $f_{e^p}$ and $f_{e^d}$ are both existing neural networks.
The invention has the following characteristics and beneficial effects:
the invention not only improves the accuracy of relation extraction under specific tasks, but also improves the generalization performance of tasks which do not appear. A large amount of manual marking data is not needed when the relation is obtained, time and money consumption caused by a large amount of manual marking is avoided, and the relation extraction task of the specific field can be completed through a small amount of label data of the specific field.
The invention not only shows and considers the relationship between the examples, but also pays attention to the relationship between the example distribution and the example distribution, thereby better depicting the boundaries of different relationships and improving the discriminability of relationship representation under specific tasks. Meanwhile, because the input space of the natural language is shared among all NLP tasks, the adaptive method based on the meta-learning may generalize unseen tasks, i.e., relationship classes that do not appear in the training set may also be extracted.
Drawings
FIG. 1 is a detailed flow chart of relationship extraction in the embodiment.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the specific embodiments described are merely some, not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art from the described embodiments without inventive effort fall within the scope of protection of the invention.
An application scenario of the knowledge graph construction method involves a vertical-domain knowledge graph construction apparatus and a server: the apparatus obtains several types of unstructured text from the server and processes the unstructured text with the vertical-domain knowledge graph construction method, thereby constructing a vertical-domain knowledge graph.
The execution subject of the method for constructing the knowledge graph can be a device for constructing the knowledge graph, and the device for constructing the knowledge graph can be realized by any software and/or hardware.
The embodiment of the invention discloses a knowledge graph construction method based on self-adaptive few-sample relation extraction, which comprises the following specific steps of:
and step one, extracting entities, namely automatically identifying the entities from the unstructured text.
In this step, named entities are automatically recognized from the original unstructured text. This embodiment adopts the LSTM-CRF technique: each word of the unstructured text is represented as a word embedding, the word embeddings are fed into the LSTM model, and a prediction score is output for each word. The scores predicted by the LSTM layer are then fed into the CRF layer, which selects the tag sequence with the highest prediction score as the best answer.
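The CRF decoding step just described can be sketched as follows; the LSTM is assumed to have already produced per-word emission scores, and the tag set, random scores, and transition matrix below are illustrative placeholders rather than the embodiment's trained values.

```python
import numpy as np

TAGS = ["O", "B-ENT", "I-ENT"]          # illustrative tag set

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list:
    """Pick the tag sequence with the highest emission + transition score.

    emissions: (seq_len, n_tags) scores from the LSTM layer.
    transitions: (n_tags, n_tags) CRF transition scores.
    """
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()                       # best score per end tag
    backptr = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):               # follow back-pointers
        best.append(int(backptr[t][best[-1]]))
    return [TAGS[i] for i in reversed(best)]

rng = np.random.default_rng(0)
print(viterbi_decode(rng.normal(size=(6, 3)), rng.normal(size=(3, 3))))
```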
And step two, extracting the relationship, namely identifying the relationship among the entities.
The step is the key innovation of the knowledge graph construction method. Inputting the original unstructured text and the two entities identified in the step one into a trained adaptive relationship extraction model based on the distribution level relationship, and selecting the relationship category with the highest score from the classification scores output by the model as the relationship between the two entities. The detailed procedure of this step will be provided later.
And step three, entity linking, namely performing logic attribution and redundancy elimination on the extracted entities and relationship data.
After the entities and the relations between them have been acquired from the original unstructured text, entity linking performs logical attribution and redundant-error filtering on the entity and entity-relation data. The entity representations updated by the adaptive relation extraction model are used to calculate the similarity between any two entities.
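A minimal sketch of this linking step, assuming entity vectors have already been produced by the model; the cosine measure and the 0.9 threshold are illustrative assumptions (the claims only specify a similarity greater than a set threshold).

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def link_entities(names: list, vectors: list, threshold: float = 0.9) -> dict:
    """Map each entity name to the canonical name of its merged group."""
    reps, canonical = [], {}                 # indices of kept representatives
    for i, name in enumerate(names):
        for j in reps:
            if cosine(vectors[i], vectors[j]) > threshold:
                canonical[name] = names[j]   # merge into the earlier entity
                break
        else:
            reps.append(i)
            canonical[name] = name
    return canonical

vecs = [np.array([1.0, 0.1]), np.array([0.99, 0.12]), np.array([0.0, 1.0])]
print(link_entities(["Acme Co", "Acme Corp", "Paris"], vecs))
```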
And fourthly, knowledge reasoning, namely automatically inferring missing relation values from fact triples and completing the knowledge graph.
In this step, missing facts are automatically inferred from the existing fact triples, missing relation values in the knowledge graph are handled, further knowledge discovery is completed, and the knowledge graph is completed. This embodiment adopts the translation-based inference model TransE, which regards the relation in each triple instance (head, relation, tail) as a translation from the head entity to the tail entity satisfying h + r = t, where h denotes the head-entity vector, r the relation vector, and t the tail-entity vector. If a head-tail entity pair in the knowledge graph does not appear in any existing triple, the relation vector is calculated as t - h, and the relation between the head and tail entities is thereby obtained to supplement the knowledge graph.
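A sketch of the completion rule just described: the missing relation is estimated as r ≈ t − h and matched to the nearest known relation vector. The toy embeddings below are placeholders; in practice h, r, and t come from training TransE on the existing triples.

```python
import numpy as np

def infer_relation(h: np.ndarray, t: np.ndarray, relation_vectors: dict) -> str:
    """Return the known relation whose vector is closest to t - h."""
    r_est = t - h
    return min(relation_vectors,
               key=lambda name: np.linalg.norm(relation_vectors[name] - r_est))

relations = {"founded_by": np.array([1.0, 0.0]),
             "located_in": np.array([0.0, 1.0])}
h, t = np.array([0.2, 0.1]), np.array([0.25, 1.1])    # unlinked entity pair
print(infer_relation(h, t, relations))                # -> located_in
```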
Fig. 1 shows a detailed flow of relationship extraction in the embodiment, which includes the following specific processes:
the method comprises the steps of receiving original unstructured text, namely a relational data set, wherein the relational data set adopts a data set FewRel1.0, and the relational data set is formed by combining data according to relational categories. Extraction from relational data sets by relational categoryMThe individual relationship class data form a training set
Figure 594715DEST_PATH_IMAGE020
The remaining relationship category data constitutes a test set
Figure 403534DEST_PATH_IMAGE021
. Training set
Figure 980008DEST_PATH_IMAGE022
IncludedMA category, each category havingNAn instance, each instance
Figure 47322DEST_PATH_IMAGE023
Figure 861694DEST_PATH_IMAGE024
Is shown asiIn one example of the above-described method,
Figure 605528DEST_PATH_IMAGE025
the representation of a sentence is represented by,
Figure 403719DEST_PATH_IMAGE026
representing sentences
Figure 743565DEST_PATH_IMAGE027
The head entity of (a) is,
Figure 943602DEST_PATH_IMAGE028
representing sentences
Figure 359802DEST_PATH_IMAGE029
The tail entity of (1). To simulate the test-time scenario during the training period, from the training set
Figure 379711DEST_PATH_IMAGE030
In the random extraction
Figure 788827DEST_PATH_IMAGE031
Each class is randomly extracted from each classN1 instance constructs a support set, the first in the support setsEach element is marked as
Figure 843370DEST_PATH_IMAGE032
Figure 663428DEST_PATH_IMAGE033
As an example
Figure 796731DEST_PATH_IMAGE034
Corresponding relationship category labels. Remaining from each categoryN-NRandom sampling of 1 sampleN2 instances construct a query set
Figure 868592DEST_PATH_IMAGE035
Querying the first in the setqEach element is marked as
Figure 653009DEST_PATH_IMAGE036
Figure 988175DEST_PATH_IMAGE037
Is composed of
Figure 982676DEST_PATH_IMAGE038
Corresponding relation category markAnd (6) a label.
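The episode construction just described reduces to the usual N-way K-shot sampling routine; the sketch below follows the M1/N1/N2 notation above, with `dataset` assumed to map each relation category to its instance list.

```python
import random

def sample_episode(dataset: dict, m1: int, n1: int, n2: int):
    """Build one (support, query) episode from a class-keyed dataset."""
    support, query = [], []
    for label in random.sample(sorted(dataset), m1):      # pick M1 classes
        drawn = random.sample(dataset[label], n1 + n2)    # no overlap
        support += [(x, label) for x in drawn[:n1]]       # N1 shots per class
        query += [(x, label) for x in drawn[n1:]]         # N2 queries per class
    return support, query

toy = {r: [f"{r}_sentence_{i}" for i in range(10)] for r in ["r1", "r2", "r3"]}
support_set, query_set = sample_episode(toy, m1=2, n1=3, n2=2)
print(len(support_set), len(query_set))                   # 6 4
```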
Firstly, a text encoder is used to encode the instances in the training set and generate context semantics.
The encoding in this step covers the sentences in the instances and the entity positions within the sentences, and combines the sentence encoding and the position encoding non-linearly. The specific method is as follows:
In this embodiment, for each instance $x_i = (s_i, h_i, t_i)$, where $x_i$ denotes the $i$-th instance, word2vec maps each word $w_k$ of the instance sentence $s_i$ to a word vector $\mathbf{w}_k \in \mathbb{R}^{d_w}$, where $d_w$ is the word-vector dimension, $w_k$ denotes the $k$-th word of sentence $s_i$, and $k$ takes $1, 2, \ldots, K$ in turn, $K$ being the number of words in sentence $s_i$. Each word $w_k$ of $s_i$ is encoded by its relative positions to the two entities of the sentence (the head entity and the tail entity), and the two resulting vectors are connected to obtain the position encoding $\mathbf{p}_k \in \mathbb{R}^{2 d_p}$, where $d_p$ is the dimension of a relative-position vector, so the connection of the two relative-position vectors has dimension $2 d_p$. Here, the relative position between $w_k$ and a sentence entity refers to the number of words separating $w_k$ from that entity in sentence $s_i$.
The contextual semantic representation generated by taking instance $x_i$ as the input of the text encoder is denoted $\mathbf{c}_i$. In this embodiment, a Transformer model is used as the text encoder.
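The relative-position encoding can be sketched as below; the embedding table would be learned jointly with the encoder, and the maximum offset, dimensions, and random initialization here are illustrative assumptions.

```python
import numpy as np

MAX_OFFSET, D_P = 40, 5                      # illustrative sizes
rng = np.random.default_rng(0)
pos_table = rng.normal(size=(2 * MAX_OFFSET + 1, D_P))   # learned in practice

def position_encoding(sent_len: int, head_idx: int, tail_idx: int) -> np.ndarray:
    """Return a (sent_len, 2 * D_P) matrix: per word, the concatenation of its
    relative-position embeddings w.r.t. the head and tail entities."""
    rows = []
    for k in range(sent_len):
        offsets = (np.clip(k - head_idx, -MAX_OFFSET, MAX_OFFSET),
                   np.clip(k - tail_idx, -MAX_OFFSET, MAX_OFFSET))
        rows.append(np.concatenate([pos_table[o + MAX_OFFSET] for o in offsets]))
    return np.stack(rows)   # concatenated with the word vectors before encoding

print(position_encoding(sent_len=7, head_idx=1, tail_idx=5).shape)  # (7, 10)
```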
Secondly, the support set $\mathcal{S}$ is input into the parameter generator, which generates the initialization softmax parameters of the classifier under the current task.
This step further comprises the sub-steps:
(1) The support set is divided into its $M_1$ categories, and the instance set of each category is denoted $\mathcal{S}_n$, where $n$ represents the category label, i.e., $\mathcal{S}_n$ is the set of instances of the $n$-th class.
(2) For the instances under each category, a non-linear mapping and weighted summation yields the representation $\mathbf{c}_n$ of each category: the text-encoder output for each instance $x \in \mathcal{S}_n$ is passed through a network $g$, and the outputs of all instances of the $n$-th class are weighted, summed, and averaged using a weight vector and a bias vector. The network $g$ is specifically a two-layer multilayer perceptron with a tanh activation layer, and from $\mathbf{c}_n$ the weight $\mathbf{w}_n$ and bias $b_n$ of the linear layer in softmax are obtained. The weight vectors and bias values of the $M_1$ classes are recorded respectively as $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_{M_1}]$ and $\mathbf{b} = [b_1, \ldots, b_{M_1}]$.
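A compact sketch of this parameter generator under stated assumptions: `encode` stands in for the text encoder, `mlp_tanh` for the two-layer MLP with tanh, and the convention that the first `dim` entries of the class representation become the softmax weight while the last entry becomes its bias is an illustrative choice, not fixed by the patent.

```python
import numpy as np

def generate_softmax_params(support_by_class: dict, encode, mlp_tanh, dim: int):
    """Return (W, b): per-class softmax weights (M1, dim) and biases (M1,)."""
    weights, biases = [], []
    for label in sorted(support_by_class):
        outs = np.stack([mlp_tanh(encode(x)) for x in support_by_class[label]])
        c_n = outs.mean(axis=0)           # averaged class representation
        weights.append(c_n[:dim])         # weight of the softmax linear layer
        biases.append(c_n[dim])           # bias of the softmax linear layer
    return np.stack(weights), np.array(biases)

dim = 8
rng = np.random.default_rng(0)
proj = rng.normal(size=(dim + 1, dim))
encode = lambda x: rng.normal(size=dim)        # placeholder text encoder
mlp_tanh = lambda h: np.tanh(proj @ h)         # placeholder two-layer MLP
W, b = generate_softmax_params({"r1": ["s1", "s2"], "r2": ["s3"]},
                               encode, mlp_tanh, dim)
print(W.shape, b.shape)                        # (2, 8) (2,)
```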
Thirdly, the self-adaptive graph neural network based on distribution-level relations takes the output of the first step as input and is fine-tuned to obtain the optimal parameters under the specific task; with these parameters, the distribution-level-relation graph model classifies the current task well.
The self-adaptive graph neural network based on the distribution level relation is constructed as follows:
(1) Construct the point graph $G^{p}_{l} = (V^{p}_{l}, E^{p}_{l})$, where $G^{p}_{l}$ denotes the $l$-th generation point graph, $V^{p}_{l} = \{v^{p,l}_{i}\}$ denotes the node set, each node representing the feature vector of instance $i$, and $E^{p}_{l} = \{e^{p,l}_{ij}\}$ denotes the edge set, each edge describing the similarity relation between instance $i$ and instance $j$.
(2) Construct the distribution graph $G^{d}_{l} = (V^{d}_{l}, E^{d}_{l})$, where $G^{d}_{l}$ denotes the $l$-th generation distribution graph and $V^{d}_{l} = \{v^{d,l}_{i}\}$ denotes the node set. Each node $v^{d,l}_{i}$ represents the distribution of instance $i$: a multi-dimensional vector whose $j$-th dimension is the similarity relation $e^{p,l}_{ij}$ between node $i$ and node $j$ in the point graph; computing the similarity of node $i$ to all nodes of the point graph yields the distribution of instance $i$. $E^{d}_{l} = \{e^{d,l}_{ij}\}$ denotes the edge set, each edge describing the similarity relation between the distributions of instance $i$ and instance $j$.
(3) Initializing a dot diagram:
for the initialization of the point diagram, extracting the context semantics corresponding to the instances in the support set and the query set, and initializing the nodes of the first generation point diagram by using the context semantics
Figure 752550DEST_PATH_IMAGE077
Then, the similar relation between the nodes is used for describing the edges
Figure 754004DEST_PATH_IMAGE078
Figure 484063DEST_PATH_IMAGE079
Is a two-layer convolution-regularization-RELU network and sigmoid active layer.
(4) Initializing a distribution diagram:
the purpose of the distribution map is to integrate the relationships between nodes to obtain relationships between distributions, so that each node of the distribution map is a feature vector of similar relationships of dimension M1 x N1jExample of the line description iAnd examplesjThe similarity relationship between them.
The nodes of the first generation profile are initialized as follows:
Figure 808734DEST_PATH_IMAGE080
(1)
in the formula (1), i represents that cascade operation,
Figure 333256DEST_PATH_IMAGE006
and
Figure 494110DEST_PATH_IMAGE007
respectively show examplesiAnd examplesjA relationship category label of, if
Figure 708502DEST_PATH_IMAGE008
Then, then
Figure 669505DEST_PATH_IMAGE009
Otherwise
Figure 317655DEST_PATH_IMAGE010
Describing edges using similarity relationships between nodes of a profile
Figure 123937DEST_PATH_IMAGE082
Figure 423200DEST_PATH_IMAGE017
Is a two-layer convolution-regularization-RELU network and sigmoid active layer.
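Putting sub-steps (3) and (4) together, the two graphs can be initialized as sketched below; `edge_net_p` and `edge_net_d` stand in for the two convolution-regularization-ReLU + sigmoid similarity networks, replaced here by a simple Gaussian kernel for runnability.

```python
import numpy as np

def init_graphs(features: np.ndarray, labels: list, edge_net_p, edge_net_d):
    """features: (T, d) encoder outputs; labels: length-T relation labels."""
    T = len(labels)
    v_p = features.copy()                                    # point-graph nodes
    v_d = np.array([[1.0 if labels[i] == labels[j] else 0.0  # formula (1)
                     for j in range(T)] for i in range(T)])  # distribution nodes
    e_p = np.array([[edge_net_p(v_p[i], v_p[j]) for j in range(T)]
                    for i in range(T)])                      # point-graph edges
    e_d = np.array([[edge_net_d(v_d[i], v_d[j]) for j in range(T)]
                    for i in range(T)])                      # distribution edges
    return v_p, e_p, v_d, e_d

rng = np.random.default_rng(1)
sim = lambda u, v: float(np.exp(-np.sum((u - v) ** 2)))      # kernel stand-in
v_p, e_p, v_d, e_d = init_graphs(rng.normal(size=(6, 4)),
                                 [0, 0, 1, 1, 2, 2], sim, sim)
print(e_p.shape, v_d.shape)                                  # (6, 6) (6, 6)
```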
(5) And aggregation and updating of the dot diagram to the distribution diagram.
For the profile of the l-th layer, the nodes are calculated as follows:
Figure 555104DEST_PATH_IMAGE083
(2)
which aggregates the relationship between each node in the point diagram
Figure 424971DEST_PATH_IMAGE075
And information of the node in the distribution map of the previous layer
Figure 300523DEST_PATH_IMAGE084
Figure 690178DEST_PATH_IMAGE085
The point diagram to distribution diagram propagation process is represented, and the point diagram to distribution diagram is a one-layer multi-layer perceptron network.
The edges in the profile are updated in a similar manner to the point map,
Figure 992984DEST_PATH_IMAGE086
(6) Aggregation and update from the updated distribution graph to the point graph.
For the $l$-th layer, the node information of the next-generation point graph is deduced from the distribution graph; the calculation process is as follows:
$$v^{p,l}_{i} = \mathrm{D2P}\Big(\big[\textstyle\sum_{j=1}^{T} e^{d,l}_{ij}\, v^{p,l-1}_{j},\; v^{p,l-1}_{i}\big]\Big) \qquad (3)$$
which aggregates the relations $e^{d,l}_{ij}$ between each pair of nodes in the distribution graph and the information $v^{p,l-1}_{i}$ of the node in the previous layer's point graph; $\mathrm{D2P}$ denotes the distribution-graph-to-point-graph propagation process and is a single fully connected layer with a ReLU activation layer. $T$ denotes the total number of instances in the support set and the query set.
At the $l$-th layer, given the node representations of any two nodes of the layer-$(l-1)$ (i.e., previous-layer) point graph and the edge information $e^{p,l-1}_{ij}$, the edges are updated as follows:
$$e^{p,l}_{ij} = \frac{f_{e^p}\big(v^{p,l}_{i}, v^{p,l}_{j}\big)\, e^{p,l-1}_{ij}}{\sum_{j} f_{e^p}\big(v^{p,l}_{i}, v^{p,l}_{j}\big)\, e^{p,l-1}_{ij}}$$
Note that normalization processing is performed here.
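One generation of this message passing, covering formulas (2) and (3) and the normalized edge update, can be sketched as follows; `p2d`, `d2p`, and the similarity stand-ins below are illustrative placeholders for the small networks named in the text, and the exact concatenation layout is an assumption.

```python
import numpy as np

def propagate(v_p, e_p, v_d, p2d, d2p, edge_net_p, edge_net_d):
    T = v_p.shape[0]
    # (5) point graph -> distribution graph, formula (2)
    v_d_new = np.stack([p2d(np.concatenate([e_p[i], v_d[i]])) for i in range(T)])
    e_d_new = np.array([[edge_net_d(v_d_new[i], v_d_new[j])
                         for j in range(T)] for i in range(T)])
    # (6) distribution graph -> point graph, formula (3)
    agg = e_d_new @ v_p                              # edge-weighted neighbors
    v_p_new = np.stack([d2p(np.concatenate([agg[i], v_p[i]])) for i in range(T)])
    # edge update with row normalization, as noted above
    raw = np.array([[edge_net_p(v_p_new[i], v_p_new[j]) * e_p[i, j]
                     for j in range(T)] for i in range(T)])
    e_p_new = raw / raw.sum(axis=1, keepdims=True)
    return v_p_new, e_p_new, v_d_new, e_d_new

T, d = 6, 4
rng = np.random.default_rng(2)
v_p, v_d = rng.normal(size=(T, d)), rng.normal(size=(T, T))
e_p = np.abs(rng.normal(size=(T, T)))
sim = lambda u, v: float(1.0 / (1.0 + np.sum((u - v) ** 2)))
M_pd, M_dp = rng.normal(size=(T, 2 * T)), rng.normal(size=(d, 2 * d))
p2d = lambda x: np.tanh(M_pd @ x)                   # placeholder MLP
d2p = lambda x: np.maximum(M_dp @ x, 0.0)           # placeholder FC + ReLU
out = propagate(v_p, e_p, v_d, p2d, d2p, sim, sim)
print([a.shape for a in out])        # [(6, 4), (6, 6), (6, 6), (6, 6)]
```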
And fourthly, classification prediction is performed with the classifier parameters for the current classification task obtained in the second step, applied to the updated relation representation of each instance obtained in the third step; the prediction result is the extracted relation type.
For a test sample $x_q$, $\hat{y}_q = \operatorname{softmax}\big(\mathbf{W}\cdot \mathrm{GNN}(x_q) + \mathbf{b}\big)$, where $\mathrm{GNN}$ denotes the distribution-level-relation graph neural network of the third step and $(\mathbf{W}, \mathbf{b})$ are the classifier parameters under the current task.
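The final prediction then reduces to a softmax over the generated parameters applied to the query instance's updated representation, as in the brief sketch below (shapes follow the parameter-generator sketch above; the concrete numbers are placeholders).

```python
import numpy as np

def predict_relation(v_q: np.ndarray, W: np.ndarray, b: np.ndarray,
                     classes: list) -> str:
    logits = W @ v_q + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # softmax over the M1 classes
    return classes[int(probs.argmax())]

rng = np.random.default_rng(3)
W, b = rng.normal(size=(3, 8)), rng.normal(size=3)
print(predict_relation(rng.normal(size=8), W, b, ["r1", "r2", "r3"]))
```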
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (4)

1. A method for constructing a knowledge graph based on self-adaptive few-sample relation extraction, comprising: automatically extracting entities and the relations between the entities from acquired unstructured text, performing entity linking based on the extracted entity and relation data, and completing the knowledge graph; the method is characterized in that:
the relationship between the entities is extracted by adopting a self-adaptive relationship extraction model, and the self-adaptive relationship extraction model is constructed as follows:
given a training set containing $M$ classes with $N$ instances under each class, where each instance comprises a sentence together with the head entity and tail entity of that sentence: randomly extract $M_1$ classes from the training set and randomly extract $K$ instances from each of them to construct a support set $\mathcal{S}$; from the remaining $N-K$ samples of each class, randomly sample $L$ instances to construct a query set $\mathcal{Q}$;
s100: encoding the training set instance by using a text encoder to generate context semantics;
in step S100, the positions of sentences, head entities and tail entities in the example are coded;
the encoding of the positions of the sentences and the head and tail entities in the examples further comprises:
s110: mapping each word in the example sentence into a word vector;
s120: based on the word vectors, coding each word and the relative position of two entities of the sentence where the word is located respectively, and connecting the obtained coding vectors to obtain the position codes of the words;
s130: inputting the examples and the position codes of the words in the examples into a text encoder to generate context semantics of each example;
s200: inputting the support set into a parameter generator to generate an initialization softmax parameter;
s300: inputting the context semantics generated in the step S100 into an adaptive graph neural network, and updating the instance by using the adaptive graph neural network; the adaptive graph neural network is constructed as follows:
s310: constructing a point diagram, wherein nodes represent a feature vector of an example, and edges describe the similarity relation between the examples;
s320: constructing a distribution graph, wherein nodes represent the distribution of an example, and edges describe the similarity relation between the distribution and the distribution; the distribution refers to a vector formed by similarity relation between one example and all other examples;
s330: taking context relation semantics of the support set and the query set as feature vectors, initializing nodes of the point diagram, and initializing corresponding edges of the point diagram by using similarity among the nodes;
s340: initializing nodes of the distribution diagram by using the similar relation vectors of each instance in the support set and the query set, and initializing corresponding edges of the distribution diagram by using the similar relation between the nodes;
the similarity relation vector $v^{d,0}_{i} = \Vert_{j=1}^{T}\,\delta(y_i, y_j)$ is the $i$-th node of the distribution graph, where $\Vert$ denotes the concatenation operation, $y_i$ and $y_j$ denote the relation category labels of instance $i$ and instance $j$ respectively, and $\delta(y_i, y_j) = 1$ if $y_i = y_j$, otherwise $\delta(y_i, y_j) = 0$;
S350: aggregating the similarity relations between the nodes in the point graph together with each node of the previous layer's distribution graph to obtain the updated distribution-graph nodes, and updating the edges of the distribution graph;
S360: aggregating the similarity relations between each node in the updated distribution graph together with the corresponding node of the previous layer's point graph to obtain the updated point-graph nodes, and updating the edges of the point graph;
s400: and carrying out classification prediction on the updated examples by using a softmax classifier, and acquiring the relationship type.
2. The method for constructing a knowledge graph based on adaptive few-sample relationship extraction as claimed in claim 1, wherein:
step S200 further includes:
s210: dividing the support set instances according to the relation category;
s220: generating the weight and the bias corresponding to each relationship category by using the example under each relationship category;
s230: the weights and biases corresponding to all relation categories form a weight vector and a bias vector, i.e., the initialized softmax parameters.
3. The method for constructing a knowledge graph based on adaptive few-sample relationship extraction as claimed in claim 1, wherein:
in substep S330, similarity between nodes of the point map
Figure 12827DEST_PATH_IMAGE014
Wherein, in the step (A),
Figure DEST_PATH_IMAGE015
node representing initialization
Figure 24777DEST_PATH_IMAGE016
And node
Figure DEST_PATH_IMAGE017
The similarity relationship between the two components is similar,
Figure 896918DEST_PATH_IMAGE018
representing a neural network;
in sub-step S340, the similarity relationship between nodes of the distribution diagram is used to describe the edges
Figure DEST_PATH_IMAGE019
Figure 952599DEST_PATH_IMAGE018
Representing a neural network.
4. The method for constructing a knowledge graph based on adaptive few-sample relationship extraction as claimed in claim 1, wherein:
the entity linking based on the extracted entity and relation data specifically comprises:
using the entity representations updated by the adaptive relation extraction model, calculating the similarity between any two entities, and merging two entities whose similarity is greater than a set threshold.
CN202110808184.4A 2021-07-16 2021-07-16 Knowledge graph construction method based on self-adaptive few-sample relation extraction Active CN113254675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110808184.4A CN113254675B (en) 2021-07-16 2021-07-16 Knowledge graph construction method based on self-adaptive few-sample relation extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110808184.4A CN113254675B (en) 2021-07-16 2021-07-16 Knowledge graph construction method based on self-adaptive few-sample relation extraction

Publications (2)

Publication Number Publication Date
CN113254675A (en) 2021-08-13
CN113254675B (en) 2021-11-16

Family

ID=77180471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110808184.4A Active CN113254675B (en) 2021-07-16 2021-07-16 Knowledge graph construction method based on self-adaptive few-sample relation extraction

Country Status (1)

Country Link
CN (1) CN113254675B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114095529B (en) * 2021-08-30 2022-08-16 云南大学 Knowledge graph-based industrial non-intelligent sensor self-adaptive access middleware and method thereof
CN113783876B (en) * 2021-09-13 2023-10-03 国网数字科技控股有限公司 Network security situation awareness method based on graph neural network and related equipment
CN114722823B (en) * 2022-03-24 2023-04-14 华中科技大学 Method and device for constructing aviation knowledge graph and computer readable medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7480640B1 (en) * 2003-12-16 2009-01-20 Quantum Leap Research, Inc. Automated method and system for generating models from data
CN109508385A (en) * 2018-11-06 2019-03-22 云南大学 A kind of character relation analysis method in web page news data based on Bayesian network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7480640B1 (en) * 2003-12-16 2009-01-20 Quantum Leap Research, Inc. Automated method and system for generating models from data
CN109508385A (en) * 2018-11-06 2019-03-22 云南大学 A kind of character relation analysis method in web page news data based on Bayesian network

Also Published As

Publication number Publication date
CN113254675A (en) 2021-08-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant