CN111831783A - Chapter-level relation extraction method - Google Patents
Chapter-level relation extraction method
- Publication number
- CN111831783A CN111831783A CN202010644404.XA CN202010644404A CN111831783A CN 111831783 A CN111831783 A CN 111831783A CN 202010644404 A CN202010644404 A CN 202010644404A CN 111831783 A CN111831783 A CN 111831783A
- Authority
- CN
- China
- Prior art keywords
- entity
- representation
- document
- sentence
- abstract semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/313 — Information retrieval of unstructured textual data; selection or weighting of terms for indexing
- G06F16/353 — Information retrieval of unstructured textual data; clustering; classification into predefined classes
- G06F16/367 — Information retrieval of unstructured textual data; creation of semantic tools; ontology
- G06F40/30 — Handling natural language data; semantic analysis
- G06N3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks; combinations of networks
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a chapter-level relation extraction method in the technical field of natural language processing. It mainly addresses two technical problems that arise for chapter-level documents: the consumption of computing resources and logical reasoning between target and non-target entities. The invention comprises the following steps: inputting a document to be processed, the document being a chapter-level document; processing the document under a bidirectional attention constraint to obtain abstract semantic representations of the entities and sentences, the representations carrying global information and logical reasoning information; and judging the relation type of each target entity pair in the document based on these abstract semantic representations. Using the method of the invention, a developer can extract chapter-level relations efficiently and accurately while solving the two main problems of chapter-level relation extraction: the computational cost of traversing all entities to generate candidate samples, and logical reasoning between target and non-target entities.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a chapter-level relation extraction method at the intersection of bidirectional attention constraints and graph convolutional neural networks. The invention obtains abstract representations of the entities and the document through a bidirectional attention mechanism constraint, and uses a graph convolutional neural network to perform the logical reasoning required for chapter-level relation extraction.
Background
Relation extraction is the task of judging, given a context and two target entities, what relationship holds between the two target entities. It is one of the most important techniques for constructing large-scale knowledge graphs, and can also support downstream tasks such as question-answering systems. Research on relation extraction falls mainly into two directions: one builds large numbers of hand-crafted features in the traditional machine learning style, and the other builds neural networks in the deep learning style.
In traditional machine learning solutions, researchers constructed different features for the whole sentence and for a given target entity pair. These solutions depend heavily on feature engineering and domain expertise, so a large amount of manual effort is needed to construct features for a model, and the model's generalization ability fluctuates greatly as the dataset changes. In recent years, therefore, many researchers have turned to deep learning to solve the relation extraction problem, focusing on convolutional neural networks (CNN), recurrent neural networks (RNN), attention mechanisms, and improvements of these models, which has produced a large number of deep-learning-based solutions.
Existing task definitions and methods for relation extraction are sentence-level: one sentence is taken as the context, and the relation is judged for two target entities that appear in that sentence. In practical application scenarios, however, the text to be processed is rarely that simple. A chapter typically contains a large number of entities, and each target entity appears multiple times in the document, so some classical sentence-level techniques, such as entity position vectors or PCNN, are no longer applicable. Moreover, a chapter-level text is often hundreds or even thousands of words long, far longer than a sentence. The excessive length makes it difficult to extract information from the whole document, and the long-distance loss of dependency information makes it difficult to represent entities well and let them interact, so the performance of a chapter-level relation extraction model degrades greatly. At the same time, because many entities are present, every combination of two entities becomes a potential target entity pair; constructing sentence-and-entity samples by traversing all entity combinations in each chapter and then applying a sentence-level relation extraction model greatly increases the computational cost. More importantly, given the increased text length and the large number of entities in the input document, the relation between two target entities may be obtained not only by direct interaction but also through one or more steps of reasoning via non-target entities, so letting only the target entities interact cannot yield an accurate classification result for relations that require reasoning.
Disclosure of Invention
One purpose of the present invention is to provide a chapter-level relation extraction method that solves the technical problems of the prior art on chapter-level documents, namely the consumption of computing resources and logical reasoning between target and non-target entities. Advantageous effects can be achieved in preferred embodiments of the present invention, as described in detail below.
In order to achieve the purpose, the invention provides the following technical scheme:
the invention relates to a chapter-level relation extraction method, which comprises the following steps:
inputting a document to be processed, wherein the document is a chapter-level document;
processing the document based on bidirectional attention constraints to obtain abstract semantic representations of entities and sentences, wherein the abstract semantic representations have global information and logical reasoning information;
and judging the relation type of the target entity pair in the document based on the abstract semantic representation.
Further, the processing the document based on the bidirectional attention constraint to obtain an abstract semantic representation of an entity and a sentence includes:
obtaining a representation of each entity and a representation of each sentence in the document;
obtaining a global entity interaction matrix based on the representation of each entity;
calculating the global entity interaction matrix and the representation of each sentence based on a first attention mechanism to obtain the representation of each sentence based on entity attention;
calculating the representation of each sentence based on the entity attention based on a graph convolution network to obtain the representation of each sentence with logical reasoning information;
and calculating the global entity interaction matrix and the representation of each sentence with the logical reasoning information based on a second attention mechanism to obtain a new global entity interaction matrix with the global information and the logical reasoning information, and taking the new global entity interaction matrix as the abstract semantic representation of the entity and the sentence.
Further, the obtaining a representation of each entity and a representation of each sentence in the document includes:
acquiring abstract semantic representation of the document;
a representation of each entity and a representation of each sentence in the document are obtained based on the abstract semantic representation of the document.
Further, the obtaining a representation of each sentence in the document based on the abstract semantic representation of the document includes:
and extracting the composition of each sentence from the abstract semantic representation of the document according to the initial position and the end position of each sentence, performing maximum pooling on the composition of each sentence, and taking a sentence vector obtained after the maximum pooling as the representation of each sentence.
Further, the obtaining a representation of each entity in the document based on the abstract semantic representation of the document includes:
determining words forming the entity according to the position of each entity, calculating an average word vector of the words, taking the average word vector as an entity vector, and taking the average vector of the entity vectors of the same entity as the representation of each entity.
Further, the obtaining the abstract semantic representation of the document includes:
performing word vector conversion on the document to obtain a word vector matrix of the document;
and carrying out bidirectional LSTM operation on the word vector matrix to obtain abstract semantic representation of the document.
Further, the obtaining a global entity interaction matrix based on the representation of each entity includes:
carrying out information interaction on every two entities, and converting two entity vectors into an entity interaction vector;
and forming a global entity interaction matrix by all entity interaction vectors.
Further, the matrix weights used by the first attention mechanism and the second attention mechanism are the same.
Further, the determining a relationship type of a target entity pair in the document based on the abstract semantic representation includes:
carrying out column-based weighting operation on the new global entity interaction matrix to obtain a global entity interaction vector;
and inputting the global entity interaction vector into a preset classification function for distinguishing entity pair relationship types to obtain a classification result, and taking the classification result as the relationship type of a target entity pair in the document.
Further, the classification function adopts a softmax function.
The chapter-level relation extraction method provided by the invention at least has the following beneficial technical effects:
the invention predicts the relationship result of all entity combinations in the document at the same time through the representation of each entity and the coding of the relative position between the entities, namely, no matter how many different entities are in the current document, the invention can efficiently and simultaneously obtain the entity representation, and further input the relationship result of all potential entity pairs in the document at one time. In addition, aiming at the problem of logical reasoning between non-target entities, the invention adopts a bidirectional attention mechanism, namely, firstly acquiring each sentence and the coded representation of each entity, and then, aiming at each entity, traversing to acquire whether the combination of the entity pairs has a relationship or not, and if so, which relationship in a given relationship set.
Using the method of the invention, a developer can extract chapter-level relations efficiently and accurately while solving the two main problems of chapter-level relation extraction: the computational cost of traversing all entities to generate candidate samples, and logical reasoning between target and non-target entities.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a chapter-level relationship extraction method according to the present invention;
FIG. 2 is a schematic structural diagram of a chapter-level relationship extraction method according to the present invention;
FIG. 3 is a flow diagram illustrating the processing of the document based on a bi-directional attention constraint according to the present invention;
fig. 4 is a schematic diagram of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Referring to fig. 1 and 2, the present invention is a chapter-level relationship extraction method, which includes the following steps:
s1: inputting a document to be processed, wherein the document is a chapter-level document;
s2: processing the document based on bidirectional attention constraints to obtain abstract semantic representations of entities and sentences, wherein the abstract semantic representations have global information and logical reasoning information;
s3: and judging the relation type of the target entity pair in the document based on the abstract semantic representation.
It should be noted that the bidirectional attention mechanism is based on the following intuition: if a given sentence contains a large amount of the information that determines the relation of the target entity pair, then that target entity pair, when its association with each sentence in the document is computed, should in turn find itself more closely associated with that sentence.
The invention is mainly aimed at chapter-level documents, i.e. texts whose length is often hundreds or even thousands of words, far longer than a sentence, and which contain a large number of target entities. Through this bidirectional-attention-constrained method for extracting the relations of target entities in a chapter-level document, the input document undergoes a series of transformations and information extraction, finally yielding whether each target entity pair has a relation and, if so, which relation in a given relation set. A developer can thus extract chapter-level relations efficiently and accurately while solving the two main problems of chapter-level relation extraction: the computational cost of traversing all entities to generate candidate samples, and logical reasoning between target and non-target entities.
Referring to fig. 3, the processing the document based on the bidirectional attention constraint to obtain an abstract semantic representation of an entity and a sentence includes:
s21: obtaining a representation of each entity and a representation of each sentence in the document;
s22: obtaining a global entity interaction matrix based on the representation of each entity;
s23: calculating the global entity interaction matrix and the representation of each sentence based on a first attention mechanism to obtain the representation of each sentence based on entity attention;
s24: calculating the representation of each sentence based on the entity attention based on a graph convolution network to obtain the representation of each sentence with logical reasoning information;
s25: and calculating the global entity interaction matrix and the representation of each sentence with the logical reasoning information based on a second attention mechanism to obtain a new global entity interaction matrix with the global information and the logical reasoning information, and taking the new global entity interaction matrix as the abstract semantic representation of the entity and the sentence.
In step S21, the obtaining of the representation of each entity and the representation of each sentence in the document includes:
acquiring abstract semantic representation of the document;
a representation of each entity and a representation of each sentence in the document are obtained based on the abstract semantic representation of the document.
Wherein the obtaining of the abstract semantic representation of the document comprises:
performing word vector conversion on the document to obtain a word vector matrix of the document;
and carrying out bidirectional LSTM operation on the word vector matrix to obtain abstract semantic representation of the document.
It should be noted that the input document to be processed is encoded with a pre-trained word vector matrix and converted into word vectors, giving the word vector representation of the document, from which the abstract semantic representation of the document is then obtained through a bidirectional LSTM.
Suppose the document contains n words, m sentences and k different entities, and that the i-th word is denoted w_i. There is a pre-trained word vector matrix of size N × d, where N is the number of words in the word vector matrix and d is the dimension of each word vector. Each word is converted into a word vector using this pre-trained matrix, i.e. the vector corresponding to w_i is looked up, and the word vectors of all words in the document are concatenated to obtain the word vector representation of the whole document, of size n × d.
Assuming the number of hidden units in the bidirectional LSTM is h, the abstract semantic representation of the document after the bidirectional LSTM is denoted H, of size n × 2h.
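The encoding step above can be sketched as follows. This is a minimal toy illustration with assumed dimensions and randomly initialized weights, not the patent's trained model; it only demonstrates how a word-vector lookup of size n × d passes through a forward and a backward LSTM to yield the n × 2h representation H.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 12, 8, 5                     # words, embedding dim, hidden units (toy sizes)

# Word-vector lookup result: each of the n words maps to a d-dimensional row.
E = rng.normal(size=(n, d))            # word vector representation of the document, n x d

def lstm_pass(X, h_dim, rng):
    """One unidirectional LSTM pass over the rows of X; returns all hidden states."""
    d_in = X.shape[1]
    # One combined weight matrix for the four gates (input, forget, cell, output).
    W = rng.normal(scale=0.1, size=(d_in + h_dim, 4 * h_dim))
    b = np.zeros(4 * h_dim)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    h_t, c_t, outs = np.zeros(h_dim), np.zeros(h_dim), []
    for x in X:
        z = np.concatenate([x, h_t]) @ W + b
        i, f, g, o = np.split(z, 4)
        c_t = sig(f) * c_t + sig(i) * np.tanh(g)
        h_t = sig(o) * np.tanh(c_t)
        outs.append(h_t)
    return np.stack(outs)

fwd = lstm_pass(E, h, rng)             # n x h, left-to-right
bwd = lstm_pass(E[::-1], h, rng)[::-1] # n x h, right-to-left, re-aligned
H = np.concatenate([fwd, bwd], axis=1) # abstract semantic representation, n x 2h
print(H.shape)  # (12, 10)
```

The concatenation along the feature axis is what gives each word a 2h-dimensional row, matching the stated n × 2h size.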
The obtaining a representation of each sentence in the document based on the abstract semantic representation of the document comprises:
and extracting the composition of each sentence from the abstract semantic representation of the document according to the initial position and the end position of each sentence, performing maximum pooling on the composition of each sentence, and taking a sentence vector obtained after the maximum pooling as the representation of each sentence.
It should be noted that, since the start and end positions of each sentence in the document to be processed are set in advance, it is easy to extract each sentence of the document and obtain its vector representation using a maximum pooling operation.
Assuming that a sentence in the document begins at the a-th word and ends at the b-th word, the representation of the sentence is s = MaxPool(H_a, H_{a+1}, …, H_b), i.e. an element-wise maximum over the corresponding rows of H, of size 1 × 2h.
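The max-pooling step can be sketched as below; the sentence spans and dimensions are toy assumptions, and only the pooling mechanics follow the description.

```python
import numpy as np

rng = np.random.default_rng(1)
n, two_h = 12, 10
H = rng.normal(size=(n, two_h))   # abstract semantic representation of the document

def sentence_repr(H, a, b):
    """Max-pool rows a..b (inclusive, 0-indexed here) into one 1 x 2h sentence vector."""
    return H[a:b + 1].max(axis=0)

# A toy document split into three sentences by (start, end) word positions.
spans = [(0, 3), (4, 8), (9, 11)]
S = np.stack([sentence_repr(H, a, b) for a, b in spans])  # m x 2h sentence matrix
print(S.shape)  # (3, 10)
```

Each row of `S` keeps, per dimension, the strongest activation among the sentence's words, which is exactly what the maximum pooling in the patent produces.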
The obtaining a representation of each entity in the document based on the abstract semantic representation of the document comprises:
determining words forming the entity according to the position of each entity, calculating an average word vector of the words, taking the average word vector as an entity vector, and taking the average vector of the entity vectors of the same entity as the representation of each entity.
It should be noted that, assuming a given target entity appears 3 times in the document, its mentions are denoted e_1^1, e_2^1 and e_3^1; the representation of the current entity is the average of these mention vectors and has size 1 × 2h.
In step S22, the obtaining a global entity interaction matrix based on the representation of each entity includes:
carrying out information interaction on every two entities, and converting two entity vectors into an entity interaction vector;
and forming a global entity interaction matrix by all entity interaction vectors.
It should be noted that all entity pairs in the current document are traversed, taking into account the order of the two entities (i.e. the direction of the pair), and information interaction is then performed on each pair. The interaction transforms the two entity vectors into a high-level entity interaction vector v_ij through a bilinear layer: supposing the two entity vectors are e_i and e_j, the bilinear layer computes v_ij = σ(e_i W_v e_j + b_v), where W_v and b_v are trainable parameters.
All entity interaction vectors are constructed as a global entity interaction matrix. Assuming that there are k entities in the current document, there are k (k-1) interaction vectors in the constructed global entity interaction matrix.
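A sketch of the bilinear interaction and the resulting global matrix follows; the interaction dimension and the 3-way tensor shape for W_v are assumptions made so that e_i W_v e_j yields a vector, since the patent only specifies the bilinear form.

```python
import numpy as np

rng = np.random.default_rng(3)
k, two_h, dv = 4, 10, 6           # entities, entity dim, interaction dim (toy sizes)
E = rng.normal(size=(k, two_h))   # one 1 x 2h representation per entity

# Bilinear weights: a 3-way tensor so that e_i W_v e_j produces a dv-dim vector.
Wv = rng.normal(scale=0.1, size=(dv, two_h, two_h))
bv = np.zeros(dv)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def interact(ei, ej):
    """v_ij = sigma(e_i W_v e_j + b_v): one directed entity interaction vector."""
    return sigmoid(np.einsum('i,dij,j->d', ei, Wv, ej) + bv)

# Ordered pairs (direction matters), hence k(k-1) interaction vectors.
V = np.stack([interact(E[i], E[j]) for i in range(k) for j in range(k) if i != j])
print(V.shape)  # (12, 6)
```

With k = 4 entities the global entity interaction matrix indeed holds k(k-1) = 12 vectors, matching the count stated above.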
Step S23: calculating the global entity interaction matrix and the representation of each sentence based on a first attention mechanism to obtain the representation of each sentence based on entity attention; the method comprises the following specific steps:
Each sentence representation undergoes an attention operation with the global entity interaction matrix, which gives the method a degree of logical reasoning capability and yields the representation of each sentence based on entity attention.
Specifically, as above, a sentence is represented as s. If an entity interaction vector is denoted v_i, the attention weight between the current sentence and that interaction vector is computed as α_i = softmax_i(s W_a v_i), where W_a is the attention weight matrix. Computing the current sentence against every entity interaction vector gives k(k-1) attention weights; weighting all entity interaction vectors then yields the representation of the sentence based on entity-interaction attention, s' = Σ_i α_i v_i. All sentence representations are combined column-wise into a feature matrix X.
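The first attention mechanism can be sketched as follows; the bilinear scoring form s W_a v_i is a reconstruction of the garbled original, and all dimensions are toy assumptions. For simplicity the feature matrix X here stacks sentences as rows rather than columns.

```python
import numpy as np

rng = np.random.default_rng(4)
m, k, two_h, dv = 3, 4, 10, 6
S = rng.normal(size=(m, two_h))               # m sentence representations
V = rng.normal(size=(k * (k - 1), dv))        # global entity interaction matrix
Wa = rng.normal(scale=0.1, size=(two_h, dv))  # attention weight matrix W_a

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def entity_attention(s, V, Wa):
    """Weight all interaction vectors by their attention scores against sentence s."""
    alpha = softmax(s @ Wa @ V.T)             # k(k-1) attention weights, summing to 1
    return alpha @ V                          # entity-attention sentence representation

X = np.stack([entity_attention(s, V, Wa) for s in S])  # feature matrix over sentences
print(X.shape)  # (3, 6)
```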
Step S24: calculating the representation of each sentence based on the entity attention based on a graph convolution network to obtain the representation of each sentence with logical reasoning information; the method comprises the following specific steps:
and (3) carrying out graph convolution operation on the feature matrix X of the sentence through the following transformation: l ═ ρ (AXW)2) Where p is a non-linear activation function, e.g. sigmoid function, W2Are trainable parameters. Thus obtainingL of (a) is the same size as X, where each column is a representation of a sentence, L.
It should be noted that, through this graph convolutional network layer, the sequential linkage information and the logical inference information between sentences are obtained. After passing through this layer, each sentence representation carries sequential linkage information and logical reasoning information.
Step S25: and calculating the global entity interaction matrix and the representation of each sentence with the logical reasoning information based on a second attention mechanism to obtain a new global entity interaction matrix with the global information and the logical reasoning information, and taking the new global entity interaction matrix as the abstract semantic representation of the entity and the sentence. The method specifically comprises the following steps:
Specifically, following step S24, a sentence is represented as l_j. If an entity interaction vector is represented as v, the attention weight between the current entity interaction vector and the sentence is computed as β_j = softmax_j(v W_a l_j), reusing the attention weight matrix W_a. Computing the current entity interaction vector against every sentence gives m attention weights; weighting all sentence vectors then yields the representation of each entity interaction vector based on sentence attention, v' = Σ_j β_j l_j.
It should be noted that the matrix weights used in the first and second attention mechanisms are the same. This is because, for a given pair of target entities and a context, the degree of association of the target entities with the context should be consistent with the degree of association of the context with the target entities; a constraint is therefore placed on the matrices produced by the two attention mechanisms so that they stay as close as possible. Through this bidirectional attention constraint, the representation of each entity pair with respect to each sentence is obtained, then a new global entity interaction matrix with global information and logical reasoning information, and this matrix is used for the classification step of relation extraction.
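The weight sharing and the closeness constraint described above can be sketched as follows. A shared dimension for sentences and interaction vectors and the mean-squared-difference form of the penalty are assumptions; the patent only states that the two attention results are constrained to be as close as possible.

```python
import numpy as np

rng = np.random.default_rng(6)
m, p, d = 3, 12, 6                 # sentences, entity pairs, shared toy dimension
S = rng.normal(size=(m, d))        # sentence reps with reasoning information (after S24)
V = rng.normal(size=(p, d))        # global entity interaction matrix
Wa = rng.normal(scale=0.1, size=(d, d))  # ONE weight matrix shared by both attentions

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

scores = S @ Wa @ V.T              # raw sentence-vs-pair affinities, m x p
A1 = softmax(scores, axis=1)       # first attention: each sentence over all pairs
A2 = softmax(scores.T, axis=1)     # second attention: each pair over all sentences

# Bidirectional constraint: keep the two attention maps close; an assumed penalty
# of this form would be added to the training objective.
penalty = np.square(A1 - A2.T).mean()
V_new = A2 @ S                     # new global interaction matrix with reasoning info
print(V_new.shape, penalty >= 0)   # (12, 6) True
```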
In step S3, determining the relationship type of the target entity pair in the document based on the abstract semantic representation includes:
carrying out column-based weighting operation on the new global entity interaction matrix to obtain a global entity interaction vector;
and inputting the global entity interaction vector into a preset classification function for distinguishing entity pair relationship types to obtain a classification result, and taking the classification result as the relationship type of a target entity pair in the document.
Preferably, the classification function employs a softmax function.
For each entity interaction vector v' generated in step S25, the relation it belongs to is judged with a softmax function, which yields a probability distribution y over the predefined relation set. The relation type corresponding to the component of y with the maximum probability value is taken as the relation of that entity interaction vector.
It should be noted that, assuming there are 3 predefined relations, a global entity interaction vector yields 3 probability values after the softmax, and the relation whose probability is largest is the predicted one. For example, suppose the target entity pair of a document is (Beijing Capital Airport, Beijing Subway Airport Express Line) and the predefined relations are "belongs to", "capital" and "line"; if the classification function yields the corresponding probabilities 0.7, 0.2 and 0.1, the relation of the target entity pair is judged to be "belongs to".
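The classification head can be sketched as below; the linear projection (Wc, bc) feeding the softmax is an assumption, since the patent only specifies that a softmax function produces the class probabilities over the predefined relations.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy predefined relation set matching the example in the description.
relations = ["belongs to", "capital", "line"]

rng = np.random.default_rng(7)
v = rng.normal(size=6)                   # one global entity interaction vector
Wc = rng.normal(scale=0.1, size=(3, 6))  # assumed classifier weights
bc = np.zeros(3)

probs = softmax(Wc @ v + bc)             # one probability per predefined relation
pred = relations[int(np.argmax(probs))]  # relation with the maximum probability wins
print(pred in relations)  # True
```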
The implementation is illustrated with reference to fig. 4 as follows:
1. First, obtain the representations of all entities (such as Beijing Capital Airport, Beijing Subway Airport Express line, etc.) and all sentences in the input document;
2. Based on a given target entity and the sentences, obtain an attention weight matrix and new representations of the entities and sentences;
3. After the new sentence representations pass through the graph convolutional network, obtain sentence representations carrying logical reasoning information, and then perform the second attention-mechanism calculation with the entity representations;
4. Constrain the results of the two attention-mechanism calculations and compute the interaction information of each entity pair;
5. Use the interaction information of each entity pair to judge whether a given target entity pair has a relationship and, if so, which relationship it is.
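The five steps above can be sketched numerically as follows. This is not the patented architecture itself: the vector sizes, the random stand-in representations, the bilinear attention form, and the fully connected sentence graph are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ent, n_sent = 8, 4, 5       # hidden size / counts (hypothetical)

# Step 1: stand-ins for the BiLSTM-derived entity and sentence vectors.
E = rng.normal(size=(n_ent, d))
S = rng.normal(size=(n_sent, d))

def attention(queries, keys, W):
    # Bilinear attention: one row of normalized weights over `keys`
    # per query row.
    scores = queries @ W @ keys.T
    scores -= scores.max(axis=1, keepdims=True)
    a = np.exp(scores)
    return a / a.sum(axis=1, keepdims=True)

# One weight matrix shared by both attention passes (cf. claim 8).
W = rng.normal(size=(d, d))

# Step 2: first attention pass -> entity-aware sentence representations.
A1 = attention(E, S, W)          # (n_ent, n_sent) attention weight matrix
S_ent = A1.T @ E                 # sentences re-expressed via the entities

# Step 3: one graph-convolution layer over a (hypothetical) fully
# connected sentence graph to propagate reasoning information.
adj = np.full((n_sent, n_sent), 1.0 / n_sent)
Wg = rng.normal(size=(d, d))
S_gcn = np.tanh(adj @ S_ent @ Wg)

# Step 4: second attention pass with the same W, giving each entity a
# context vector over the reasoning-aware sentences.
A2 = attention(E, S_gcn, W)
ctx = A2 @ S_gcn                 # (n_ent, d)

# Step 5: interaction vector for a target pair (entities 0 and 1); this
# is what a softmax classifier would score against the relation set.
interaction = np.concatenate([ctx[0], ctx[1]])
```

Sharing `W` across both attention passes mirrors the bidirectional constraint the method describes: the second pass is tied to the first rather than learned independently.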
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto; any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed herein shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (10)
1. A chapter-level relation extraction method is characterized by comprising the following steps:
inputting a document to be processed, wherein the document is a chapter-level document;
processing the document based on bidirectional attention constraints to obtain abstract semantic representations of entities and sentences, wherein the abstract semantic representations have global information and logical reasoning information;
and judging the relation type of the target entity pair in the document based on the abstract semantic representation.
2. The extraction method according to claim 1, wherein the processing the document based on the bidirectional attention constraint to obtain an abstract semantic representation of an entity and a sentence comprises:
obtaining a representation of each entity and a representation of each sentence in the document;
obtaining a global entity interaction matrix based on the representation of each entity;
calculating the global entity interaction matrix and the representation of each sentence based on a first attention mechanism to obtain the representation of each sentence based on entity attention;
calculating the representation of each sentence based on the entity attention based on a graph convolution network to obtain the representation of each sentence with logical reasoning information;
and calculating the global entity interaction matrix and the representation of each sentence with the logical reasoning information based on a second attention mechanism to obtain a new global entity interaction matrix with the global information and the logical reasoning information, and taking the new global entity interaction matrix as the abstract semantic representation of the entity and the sentence.
3. The extraction method according to claim 2, wherein the obtaining of the representation of each entity and the representation of each sentence in the document comprises:
acquiring abstract semantic representation of the document;
a representation of each entity and a representation of each sentence in the document are obtained based on the abstract semantic representation of the document.
4. The extraction method according to claim 3, wherein the obtaining a representation of each sentence in the document based on the abstract semantic representation of the document comprises:
and extracting the composition of each sentence from the abstract semantic representation of the document according to the initial position and the end position of each sentence, performing maximum pooling on the composition of each sentence, and taking a sentence vector obtained after the maximum pooling as the representation of each sentence.
5. The extraction method according to claim 3, wherein the obtaining a representation of each entity in the document based on the abstract semantic representation of the document comprises:
determining words forming the entity according to the position of each entity, calculating an average word vector of the words, taking the average word vector as an entity vector, and taking the average vector of the entity vectors of the same entity as the representation of each entity.
6. The extraction method according to claim 3, wherein the obtaining of the abstract semantic representation of the document comprises:
performing word vector conversion on the document to obtain a word vector matrix of the document;
and carrying out bidirectional LSTM operation on the word vector matrix to obtain abstract semantic representation of the document.
7. The extraction method according to claim 5, wherein the obtaining a global entity interaction matrix based on the representation of each entity comprises:
carrying out information interaction on every two entities, and converting two entity vectors into an entity interaction vector;
and forming a global entity interaction matrix by all entity interaction vectors.
8. The extraction method according to claim 2, wherein the first attention mechanism and the second attention mechanism use the same matrix weight.
9. The extraction method according to claim 2, wherein the determining the relationship type of the target entity pair in the document based on the abstract semantic representation comprises:
carrying out column-based weighting operation on the new global entity interaction matrix to obtain a global entity interaction vector;
and inputting the global entity interaction vector into a preset classification function for distinguishing entity pair relationship types to obtain a classification result, and taking the classification result as the relationship type of a target entity pair in the document.
10. The extraction method according to claim 9, characterized in that the classification function employs a softmax function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010644404.XA CN111831783B (en) | 2020-07-07 | 2020-07-07 | Method for extracting chapter-level relation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111831783A true CN111831783A (en) | 2020-10-27 |
CN111831783B CN111831783B (en) | 2023-12-08 |
Family
ID=72900251
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010644404.XA Active CN111831783B (en) | 2020-07-07 | 2020-07-07 | Method for extracting chapter-level relation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111831783B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113297838A (en) * | 2021-05-21 | 2021-08-24 | 华中科技大学鄂州工业技术研究院 | Relationship extraction method based on graph neural network |
CN113435190A (en) * | 2021-05-18 | 2021-09-24 | 北京理工大学 | Chapter relation extraction method integrating multilevel information extraction and noise reduction |
CN113468325A (en) * | 2021-06-09 | 2021-10-01 | 广西电网有限责任公司 | Document level relation extraction method based on associated sentence selection and relation graph reasoning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106354710A (en) * | 2016-08-18 | 2017-01-25 | 清华大学 | Neural network relation extracting method |
CN108280062A (en) * | 2018-01-19 | 2018-07-13 | 北京邮电大学 | Entity based on deep learning and entity-relationship recognition method and device |
US20190370338A1 (en) * | 2017-06-22 | 2019-12-05 | Tencent Technology (Shenzhen) Company Limited | Summary generation method, apparatus, computer device, and storage medium |
CN111008529A (en) * | 2019-07-24 | 2020-04-14 | 贵州大学 | Chinese relation extraction method based on neural network |
CN111274800A (en) * | 2020-01-19 | 2020-06-12 | 浙江大学 | Inference type reading understanding method based on relational graph convolution network |
Non-Patent Citations (10)
Title |
---|
LEOWOOD: "Attention-based Bi-LSTM relation extraction (classification): a reading report", pages 1, Retrieved from the Internet <URL:https://blog.csdn.net/u014108004/article/details/84141870> |
TIANWEIDADADA: "Entity relation extraction: BiLSTM + Attention (with code and detailed comments)", pages 1, Retrieved from the Internet <URL:https://blog.csdn.net/Tianweidadada/article/details/102755094> |
YE, WEI et al.: "Exploiting Entity BIO Tag Embeddings and Multi-task Learning for Relation Extraction with Imbalanced Data", Computer Science, pages 1-10 |
ZHIXIN LI et al.: "Improve relation extraction with dual attention-guided graph convolutional networks", Neural Computing and Applications, pages 1773 |
FAN Ziwei et al.: "Implicit discourse relation classification based on BiLSTM with self-attention mechanism and syntactic information", Computer Science, no. 05, pages 214-220 |
LIU Jian et al.: "Research on Chinese relation extraction based on bidirectional LSTM and self-attention mechanism", Journal of Shanxi University (Natural Science Edition), no. 01, pages 8-13 |
LI Jingyu et al.: "Document-level machine translation based on a joint attention mechanism", Journal of Chinese Information Processing, no. 12, pages 45-53 |
LI Zhichao: "Research on relation extraction algorithms in image-text knowledge graphs", China Master's Theses Full-text Database, Information Science and Technology, no. 11, pages 138-572 |
GUO Fengyu et al.: "Implicit discourse relation recognition based on context interaction perception and pattern filtering", Chinese Journal of Computers, no. 05, pages 901-915 |
MA Yudan et al.: "Relation extraction method combining entity co-occurrence information and sentence semantic features", Scientia Sinica Informationis, no. 11, pages 1533-1545 |
Also Published As
Publication number | Publication date |
---|---|
CN111831783B (en) | 2023-12-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||