CN113901758A - Relation extraction method for knowledge graph automatic construction system - Google Patents

Relation extraction method for knowledge graph automatic construction system

Info

Publication number
CN113901758A
Authority
CN
China
Prior art keywords
matrix
output
graph
text
input
Prior art date
Legal status
Pending
Application number
CN202111133794.5A
Other languages
Chinese (zh)
Inventor
徐小龙
董益豪
朱曼
吴晓诗
胡惠娟
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202111133794.5A priority Critical patent/CN113901758A/en
Publication of CN113901758A publication Critical patent/CN113901758A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A relation extraction method for an automatic knowledge graph construction system first encodes the text, converts it into word vectors, and preliminarily extracts text features. A syntactic dependency tree is then generated from the syntactic dependency structure of the text, a weighted dependency adjacency matrix is produced by assigning a weight to each relation type, and a graph convolutional neural network extracts the syntactic dependency information in the text. In parallel, a multi-head attention mechanism is applied directly to the encoded text to generate attention matrices, and a graph convolutional neural network of the same structure extracts information beyond the syntactic dependency information of the text. Finally, feature representations of the two entities and the sentence are obtained, all candidate relation categories are scored with a feedforward neural network and a normalized exponential function, and the highest-scoring relation is selected as the relation classification result. The method fully captures information of different dimensions of the text and achieves excellent results on public relation extraction datasets.

Description

Relation extraction method for knowledge graph automatic construction system
Technical Field
The invention belongs to the technical field of natural language processing and artificial intelligence, and particularly relates to a relation extraction method for an automatic knowledge graph construction system.
Background
Relation extraction is a key subtask in the field of natural language processing and an important component of the information extraction task. It aims to extract relational information between entities from unstructured text; combined with a named entity recognition task, it can generate the <subject, predicate (relation), object> triples required for building a knowledge graph system.
Traditional relation extraction methods mainly analyze text with linguistic knowledge, performing text matching and relation extraction through manually designed extraction rules or kernel functions based on statistics and rules. However, owing to the complexity of natural language, relation extraction models based on hand-crafted rules cannot meet performance requirements: they often introduce human noise into the model, their performance is very limited, and they generalize poorly.
With the rapid development of neural networks and deep learning, researchers have begun introducing neural networks into the relation extraction task. By simulating the working principle of cerebral neurons, neural networks and deep learning methods can effectively fit and extract text features, breaking the limitations of manually designed rules. Existing neural-network-based relation extraction models fall mainly into sequence-based models and dependency-based models.
Sequence-based models encode the word sequence of a sentence: a convolutional neural network extracts the positional features of each word relative to the entities, while a recurrent neural network, as a temporal model, is more sensitive to relations between distant entity pairs; combining the two can effectively alleviate the difficulty of capturing relational information between distant words in the text. However, sequence-based models attend only to the word sequence and ignore the overall syntactic structure of the sentence.
In contrast, dependency-based models can efficiently exploit the syntactic structure of sentences and capture implicit long-distance syntactic relationships. Such models generally convert a sentence into a dependency tree according to the dependencies among its words, then convert the tree into a dependency adjacency matrix that participates in neural network training, capturing implicit long-distance syntactic and multi-hop relations through the dependencies between words. Since the dependency structure is usually a graph, graph convolutional neural networks have also been introduced into dependency-based relation extraction models. Current work focuses on pruning the dependency tree more effectively, removing information irrelevant to relation extraction to improve model performance; however, rule-based pruning again introduces human noise, while attention-based soft pruning destroys the original dependency structure and cannot fully exploit the rich information contained in the dependency matrix.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a relation extraction method for an automatic knowledge graph construction system that uses a multi-head attention mechanism and a weighted dependency matrix to acquire key information of different dimensions of the text in parallel, achieving excellent results on public relation extraction datasets.
The invention provides a relation extraction method for an automatic knowledge graph construction system, which comprises the following steps:
step S1, embedding each word in the text with a pre-trained word vector dictionary, converting the part-of-speech tagging information and named entity recognition information of each word into vector representations and splicing them with the word's own vector representation to obtain the vector x_i;
step S2, performing a bidirectional long short-term memory network operation on the vectors x_i and splicing the forward and backward operation results to obtain the vector h'_t;
step S3, constructing a syntactic dependency tree A from the syntactic dependency structure of the text, setting a learnable weight variable D, building a dependency adjacency matrix from A, one-hot encoding the matrix values and multiplying them bitwise with the weight variable D to obtain the weighted dependency matrix A′;
step S4, obtaining feature representation matrices Q and K of the text from the vectors h'_t, using a multi-head attention mechanism to obtain k attention matrices Ā^(1), …, Ā^(k) of the text, and obtaining the matrix A″ after linear dimensionality reduction;
step S5, taking the matrices A′ and A″ as inputs of graph convolution modules with different numbers of graph convolution layers and performing graph convolution operations to obtain the matrices H_A′ and H_A″ respectively, then obtaining the matrix H_output after linear dimensionality reduction;
step S6, obtaining from the matrix H_output the feature representation matrix h_sent of the sentence and the feature representation matrices h_e1 and h_e2 of the two entities, obtaining the relational feature representation matrix h_relation with a feedforward neural network, and finally performing relation prediction through a normalized exponential function to obtain the final classification result.
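For orientation only, the following is a minimal runnable sketch of how steps S1-S6 compose into one forward pass, written in Python/PyTorch. Every dimension, vocabulary size and module name is an illustrative assumption, and a single graph convolution layer stands in for each densely connected module of step S5; this is a sketch of the idea, not the patent's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationExtractionSketch(nn.Module):
    """Steps S1-S6 composed end to end; all sizes are assumptions."""

    def __init__(self, n_words=10000, n_pos=50, n_ner=20, n_dep=40,
                 d_word=300, d_tag=30, d_hid=200, k=3, n_classes=19):
        super().__init__()
        self.k, self.d = k, d_hid
        self.w_emb = nn.Embedding(n_words, d_word)            # S1: word vectors
        self.p_emb = nn.Embedding(n_pos, d_tag)               # S1: POS vectors
        self.n_emb = nn.Embedding(n_ner, d_tag)               # S1: NER vectors
        self.bilstm = nn.LSTM(d_word + 2 * d_tag, d_hid,      # S2: BiLSTM
                              batch_first=True, bidirectional=True)
        self.D = nn.Parameter(torch.ones(n_dep + 1))          # S3: weights d_q
        self.q_proj = nn.Linear(2 * d_hid, k * d_hid)         # S4: Q per head
        self.k_proj = nn.Linear(2 * d_hid, k * d_hid)         # S4: K per head
        self.head_reduce = nn.Linear(k, 1)                    # S4: k maps -> A''
        self.gcn_dep = nn.Linear(2 * d_hid, d_hid)            # S5: branch for A'
        self.gcn_att = nn.Linear(2 * d_hid, d_hid)            # S5: branch for A''
        self.out_reduce = nn.Linear(2 * d_hid, d_hid)         # S5: -> H_output
        self.ffnn = nn.Sequential(nn.Linear(3 * d_hid, d_hid), nn.ReLU(),
                                  nn.Linear(d_hid, n_classes))  # S6

    def forward(self, words, pos, ner, dep, e1_mask, e2_mask):
        # masks are float tensors of shape (B, n) marking the entity spans
        # S1: x_i = [w_i ; w_i^pos ; w_i^ner]
        x = torch.cat([self.w_emb(words), self.p_emb(pos), self.n_emb(ner)], -1)
        h, _ = self.bilstm(x)                      # S2: h'_t, shape (B, n, 2d)
        B, n, _ = h.shape
        # S3: weighted dependency matrix A'; dep[b,i,j] = relation id, 0 = none
        A_dep = F.relu(self.D[dep]) * (dep > 0)
        # S4: k attention matrices, stacked and linearly reduced to A''
        q = self.q_proj(h).view(B, n, self.k, self.d).permute(0, 2, 1, 3)
        kk = self.k_proj(h).view(B, n, self.k, self.d).permute(0, 2, 1, 3)
        att = torch.softmax(q @ kk.transpose(-1, -2) / self.d ** 0.5, dim=-1)
        A_att = self.head_reduce(att.permute(0, 2, 3, 1)).squeeze(-1)
        # S5: one graph convolution per branch, then splice and reduce
        g_dep = F.relu(self.gcn_dep(A_dep @ h))
        g_att = F.relu(self.gcn_att(A_att @ h))
        H = self.out_reduce(torch.cat([g_dep, g_att], -1))    # H_output
        # S6: pool h_sent, h_e1, h_e2, then score every relation class
        pool = lambda m: (H * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)
        feats = torch.cat([H.mean(1), pool(e1_mask), pool(e2_mask)], -1)
        return torch.softmax(self.ffnn(feats), dim=-1)
```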
As a further technical solution of the present invention, in step S1 the vector is computed as x_i = [w_i ; w_i^pos ; w_i^ner], where w_i is the word vector of the word itself, and w_i^pos and w_i^ner are the word vectors of the word's part-of-speech tagging information and named entity recognition information respectively, joined by a splicing operation.
Further, in step S2 the hidden state vector h_t of the vector x_i in one direction at time t is calculated as follows:
I_t = σ(x_t·W_xi + h_{t-1}·W_hi + b_i)
F_t = σ(x_t·W_xf + h_{t-1}·W_hf + b_f)
O_t = σ(x_t·W_xo + h_{t-1}·W_ho + b_o)
C̃_t = tanh(x_t·W_xc + h_{t-1}·W_hc + b_c)
C_t = F_t ⊙ C_{t-1} + I_t ⊙ C̃_t
h_t = O_t ⊙ tanh(C_t)
where x_t is the input at time t, σ is the sigmoid activation function, tanh is the hyperbolic tangent activation function, W_xi, W_xf, W_xo and W_xc are respectively the weight parameter matrices of x_t at the input gate, forget gate, output gate and memory cell, W_hi, W_hf, W_ho and W_hc are respectively the weight parameter matrices of h_t at the input gate, forget gate, output gate and memory cell, b_i, b_f, b_o and b_c are the bias parameters of the input gate, forget gate, output gate and memory cell, I_t, F_t, O_t, C̃_t and C_t are respectively the outputs of the input gate, forget gate, output gate, candidate memory cell and memory cell at time t, and ⊙ is element-wise matrix multiplication;
the forward output →h_t and the backward output ←h_t are spliced to obtain the output h'_t = [→h_t ; ←h_t].
Furthermore, the weighted dependency matrix A′ in step S3 is calculated as
A′ = φ(onehot(A)·D)
φ(x) = max(x, 0);
where A is the original dependency tree, onehot is the one-hot encoding operation, φ is the ReLU activation function, and max takes the maximum value.
Further, in step S4 the attention matrices are calculated as
Ā^(i) = softmax((Q·W_i^Q)·(K·W_i^K)^T / √d)
where k is the number of attention heads, Q and K are the feature representations of the text obtained through steps S1 and S2, W_i^Q and W_i^K are weight parameter matrices, d is the input dimension, and softmax is the normalized exponential function; the k attention matrices Ā^(1), …, Ā^(k) are spliced and then reduced in dimension through a linear layer to obtain A″:
A″ = W_A·[Ā^(1); …; Ā^(k)] + b_A
where W_A and b_A are the weight parameter matrix and bias parameter of the linear transformation layer.
Further, the result of each graph convolution module in step S5 is calculated as
h_i^(l) = φ(Σ_{j=1}^n A_ij·W^(l)·h_j^(l-1) + b^(l))
output_i = W_o·[input_i ; GCN(input_i) ; … ; GCN(output_{i-1})]
input_i = W_c·[input_{i-1} ; output_0 ; … ; output_N]
H_out = W_f·[output_1 ; … ; output_M]
where A is the input matrix of the module (A′ or A″); in an L-layer graph convolution network whose initial input is the feature representation set h^(0) = {h_1^(0), …, h_n^(0)}, node i at layer l receives the h_j^(l-1) as input and outputs h_i^(l); W^(l) is the graph convolution network weight parameter matrix, b^(l) is the graph convolution network bias parameter, N is the number of graph convolution layers of the previous sub-module, M is the number of sub-modules, and W_o, W_c and W_f are all linear transformation layer weight parameter matrices.
Further, the two graph convolution modules of step S5 generate H_A′ and H_A″ respectively, which are spliced and linearly reduced in dimension to obtain H_output.
Further, in step S6 the final relational feature is calculated as
h_relation = FFNN([h_sent ; h_e1 ; h_e2])
where FFNN denotes a feedforward neural network computation.
The advantages of the present invention are:
1. The invention encodes the text with word vectors pre-trained on a large-scale dictionary and a bidirectional long short-term memory network, obtaining an initial vector representation of the text. This initial representation already contains part of the text feature information and serves as the input of the subsequent neural network model.
2. The invention uses a multi-head attention mechanism to obtain several attention matrices for a text, each attending to a different important part of the text. The multi-head attention mechanism can capture important information beyond the syntactic dependency information of the text, and its features are effectively extracted through a graph convolution network module.
3. The method uses the weighted dependency matrix to capture the syntactic dependency structure of the text: it assigns a learnable weight to each relation type and, through iterative neural network updates, turns the dependency matrix from a 0-1 matrix into a weighted matrix that expresses more accurate syntactic structure information, whose features are then extracted through a graph convolution network module.
4. The multi-head attention mechanism and the weighted dependency matrix extract key information of different dimensions of the text and can be computed in parallel, improving performance while reducing time cost.
Drawings
FIG. 1 is a schematic diagram of the relation extraction model of the present invention;
FIG. 2 is a schematic diagram of the process for constructing the weighted dependency matrix of the present invention.
Detailed Description
Referring to FIG. 1, the embodiment provides a relation extraction method for an automatic knowledge graph construction system, which uses a multi-head attention mechanism and a weighted dependency matrix to obtain key information of different dimensions of the text in parallel. The specific steps are as follows:
Step 1: perform initial word embedding for each word in the original text with a pre-trained word vector dictionary to obtain the vector representation w_i of each word, where i indexes the i-th word in the text. In addition, convert the part-of-speech tagging information and the named entity recognition information of each word into the vector representations w_i^pos and w_i^ner, and splice them with the word's own vector representation to finally obtain x_i = [w_i ; w_i^pos ; w_i^ner] as the final word embedding vector of each word.
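As a concrete illustration of this splicing, here is a small PyTorch sketch; the vocabulary and tag-set sizes, the embedding dimensions and the random stand-in for the pre-trained dictionary are all assumptions:

```python
import torch
import torch.nn as nn

pretrained = torch.randn(10000, 300)   # stand-in for the pre-trained dictionary
w_emb = nn.Embedding.from_pretrained(pretrained, freeze=False)  # word vectors
p_emb = nn.Embedding(50, 30)           # part-of-speech tag vectors (size assumed)
n_emb = nn.Embedding(20, 30)           # named-entity tag vectors (size assumed)

def embed(words, pos, ner):
    # x_i = [w_i ; w_i^pos ; w_i^ner] for every word i
    return torch.cat([w_emb(words), p_emb(pos), n_emb(ner)], dim=-1)

x = embed(torch.tensor([[1, 2, 3]]), torch.tensor([[4, 5, 6]]),
          torch.tensor([[0, 1, 2]]))   # a 3-word toy sentence -> (1, 3, 360)
```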
Step 2: perform a bidirectional long short-term memory network operation on the vector representation x_i of each word obtained in step 1, encoding the sentence in the forward and backward directions to obtain the hidden state vectors H = [h_1, h_2, …, h_n] of the sentence after the bidirectional network, where n is the number of words in the sentence. The hidden state vector h_t in one direction at time t is calculated as follows:
I_t = σ(x_t·W_xi + h_{t-1}·W_hi + b_i) (1)
F_t = σ(x_t·W_xf + h_{t-1}·W_hf + b_f) (2)
O_t = σ(x_t·W_xo + h_{t-1}·W_ho + b_o) (3)
C̃_t = tanh(x_t·W_xc + h_{t-1}·W_hc + b_c) (4)
C_t = F_t ⊙ C_{t-1} + I_t ⊙ C̃_t (5)
→h_t = LSTM(x_t), t ∈ [1, n] (6)
←h_t = LSTM(x_t), t ∈ [n, 1] (7)
h_t = O_t ⊙ tanh(C_t) (8)
where x_t represents the input at time t, σ represents the sigmoid activation function, tanh represents the hyperbolic tangent activation function, W_xi, W_xf, W_xo and W_xc respectively represent the weight parameter matrices of x_t at the input gate, forget gate, output gate and memory cell, W_hi, W_hf, W_ho and W_hc respectively represent the weight parameter matrices of h_t at the input gate, forget gate, output gate and memory cell, b_i, b_f, b_o and b_c respectively represent the bias parameters of the input gate, forget gate, output gate and memory cell, I_t, F_t, O_t, C̃_t and C_t respectively represent the outputs of the input gate, forget gate, output gate, candidate memory cell and memory cell at time t, and ⊙ represents element-wise matrix multiplication; formulas (6) and (7) denote running the recurrence over the sentence in the forward and backward directions. At time t, the final output h'_t is obtained by splicing the forward output →h_t and the backward output ←h_t:
h'_t = [→h_t ; ←h_t] (9)
and step 3: and constructing a syntactic dependency tree according to the syntactic dependency structure of the text, wherein the sentence contains n words, and the dependency tree has n nodes and can be converted into an n multiplied by n dependency adjacency matrix A. If there is a dependency between word a and word b, then Aab1, otherwise Aab0. Setting a learnable weight variable D ═ D1,d2,...,dQ]Where Q is the number of relationship categories contained in the data set, dqThe weight of the relationship class with index q is 1 by default. First, we replace the values outside the main diagonal of the dependency matrix with the index of the correspondence class in the weight variable D. For index g, construct a one-hot vector r of length Nq=[0,...,0,1,0,...0]Wherein r isq[q]The rest value is 0, so that the weight variable D can participate in the calculation of the neural network through matrix bitwise multiplication to realize parameter updating, and a weight scalar obtained through matrix summation keeps the shape of the dependency matrix constant. For the original dependency tree A, the formula for constructing the adjacency matrix A' is as follows:
A′=φ(onehot(A)·D) (10)
φ(x)=max(x,0) (11)
where onehot represents the one-hot operation, #representsthe ReLU activation function, and max represents taking the maximum value. As shown in particular in fig. 2.
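A sketch of formulas (10)-(11) under the assumption that the relation-class index of every edge is stored in an integer matrix; masking entries without an edge to zero is an additional assumption made here for the toy example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

Q, n = 40, 3                              # relation classes, sentence length
D = nn.Parameter(torch.ones(Q + 1))       # one learnable weight d_q per class
dep = torch.zeros(n, n, dtype=torch.long) # toy dependency matrix A
dep[0, 1] = 5                             # an edge labelled with relation class 5

onehot = F.one_hot(dep, num_classes=Q + 1).float()  # onehot(A): (n, n, Q+1)
A_prime = F.relu(onehot @ D) * (dep > 0)  # A' = φ(onehot(A)·D), zero off-edges
```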
Step 4: apply the multi-head attention mechanism directly to the text to obtain k attention matrices Ā^(1), …, Ā^(k) of the text, where k is the number of attention heads and each matrix has the same shape as the dependency matrix A. The calculation formula is:
Ā^(i) = softmax((Q·W_i^Q)·(K·W_i^K)^T / √d) (12)
where Q and K are the feature representations of the text obtained in steps 1 and 2, W_i^Q and W_i^K are weight parameter matrices, d represents the input dimension, and softmax represents the normalized exponential function. After the k matrices Ā^(1), …, Ā^(k) are obtained, they are spliced and reduced in dimension through a linear layer to obtain A″, which serves as the input of the graph convolution module:
A″ = W_A·[Ā^(1); …; Ā^(k)] + b_A (13)
where W_A and b_A are the weight parameter matrix and bias parameter of the linear transformation layer.
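A sketch of formulas (12)-(13), with per-head projections and a linear layer reducing the k stacked matrices to A″; d, k and the toy 3-word input are assumptions:

```python
import torch
import torch.nn as nn

d, k = 400, 3                          # input dimension and number of heads
Wq = nn.ModuleList(nn.Linear(d, d, bias=False) for _ in range(k))
Wk = nn.ModuleList(nn.Linear(d, d, bias=False) for _ in range(k))
reduce_heads = nn.Linear(k, 1)         # plays the role of W_A and b_A

h = torch.randn(1, 3, d)               # stand-in for the h'_t of step 2
heads = [torch.softmax(Wq[i](h) @ Wk[i](h).transpose(1, 2) / d ** 0.5, dim=-1)
         for i in range(k)]            # k attention matrices, each (1, n, n)
A_dprime = reduce_heads(torch.stack(heads, dim=-1)).squeeze(-1)  # A'': (1, n, n)
```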
Step 5: take the A′ and A″ obtained in steps 3 and 4 as the inputs of the two graph convolution modules. Each graph convolution module uses graph convolution networks of different depths as its sub-modules, with dense connections inside each sub-module: the output h^(l) of every graph convolution layer is spliced and used as the input of the next sub-module, and finally the outputs of all sub-modules are densely connected to obtain the module result H_out. In an L-layer graph convolution network, the initial input is the feature representation set h^(0) = {h_1^(0), …, h_n^(0)}; at layer l, node i receives the h_j^(l-1) as input and outputs h_i^(l). The calculation formulas are:
h_i^(l) = φ(Σ_{j=1}^n A_ij·W^(l)·h_j^(l-1) + b^(l)) (14)
output_i = W_o·[input_i ; GCN(input_i) ; … ; GCN(output_{i-1})] (15)
input_i = W_c·[input_{i-1} ; output_0 ; … ; output_N] (16)
H_out = W_f·[output_1 ; … ; output_M] (17)
where A is the input matrix of the module (A′ or A″), W^(l) represents the graph convolution network weight parameter matrix, b^(l) represents the graph convolution network bias parameter, N is the number of graph convolution layers of the previous sub-module, M is the number of sub-modules, and W_o, W_c and W_f are all linear transformation layer weight parameter matrices. In formula (17), a dropout operation is applied to each computed input_i (i ≥ 1), randomly discarding neurons. The two graph convolution modules of FIG. 1 thus generate H_A′ and H_A″ respectively; after splicing and linear dimensionality reduction these yield H_output, which serves as the input of the relation classification layer.
Step 6: from the H_output of step 5, obtain the feature representation h_sent of the sentence and the feature representations h_e1 and h_e2 of the two entities respectively; a feedforward neural network then produces the final relational feature representation h_relation, and relation prediction is finally performed with the normalized exponential function. The calculation formula is:
h_relation = FFNN([h_sent ; h_e1 ; h_e2]) (18)
where FFNN represents a feedforward neural network computation.
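A sketch of the classification layer of formula (18); how h_sent, h_e1 and h_e2 are pooled out of H_output is not fixed above, so mean pooling over span masks is an assumption:

```python
import torch
import torch.nn as nn

d, n_classes = 400, 19                      # dimensions assumed
ffnn = nn.Sequential(nn.Linear(3 * d, d), nn.ReLU(), nn.Linear(d, n_classes))

def classify(H, e1_mask, e2_mask):          # masks: (B, n) floats marking spans
    pool = lambda m: (H * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)
    h_sent = H.mean(dim=1)                  # sentence representation h_sent
    h_rel = ffnn(torch.cat([h_sent, pool(e1_mask), pool(e2_mask)], dim=-1))
    return torch.softmax(h_rel, dim=-1)     # score of every relation class

H = torch.randn(1, 3, d)                    # stand-in for H_output
probs = classify(H, torch.tensor([[0., 1., 0.]]), torch.tensor([[0., 0., 1.]]))
pred = probs.argmax(dim=-1)                 # highest-scoring relation wins
```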
The foregoing illustrates and describes the principles, general features and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which merely illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention, and all such changes and modifications fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A relation extraction method for an automatic knowledge graph construction system, characterized by comprising the following steps:
step S1, embedding each word in the text with a pre-trained word vector dictionary, converting the part-of-speech tagging information and named entity recognition information of each word into vector representations and splicing them with the word's own vector representation to obtain the vector x_i;
step S2, performing a bidirectional long short-term memory network operation on the vectors x_i and splicing the forward and backward operation results to obtain the vector h'_t;
step S3, constructing a syntactic dependency tree A from the syntactic dependency structure of the text, setting a learnable weight variable D, building a dependency adjacency matrix from A, one-hot encoding the matrix values and multiplying them bitwise with the weight variable D to obtain the weighted dependency matrix A′;
step S4, obtaining feature representation matrices Q and K of the text from the vectors h'_t, using a multi-head attention mechanism to obtain k attention matrices Ā^(1), …, Ā^(k) of the text, and obtaining the matrix A″ after linear dimensionality reduction;
step S5, taking the matrices A′ and A″ as inputs of graph convolution modules with different numbers of graph convolution layers and performing graph convolution operations to obtain the matrices H_A′ and H_A″ respectively, then obtaining the matrix H_output after linear dimensionality reduction;
step S6, obtaining from the matrix H_output the feature representation matrix h_sent of the sentence and the feature representation matrices h_e1 and h_e2 of the two entities, obtaining the relational feature representation matrix h_relation with a feedforward neural network, and finally performing relation prediction through a normalized exponential function to obtain the final classification result.
2. The relation extraction method for a knowledge graph automatic construction system according to claim 1, characterized in that in step S1 the vector is computed as x_i = [w_i ; w_i^pos ; w_i^ner], where w_i is the word vector of the word itself, and w_i^pos and w_i^ner are the word vectors of the word's part-of-speech tagging information and named entity recognition information respectively, joined by a splicing operation.
3. The relation extraction method for a knowledge graph automatic construction system according to claim 1, characterized in that in step S2 the hidden state vector h_t of the vector x_i in one direction at time t is calculated as follows:
I_t = σ(x_t·W_xi + h_{t-1}·W_hi + b_i)
F_t = σ(x_t·W_xf + h_{t-1}·W_hf + b_f)
O_t = σ(x_t·W_xo + h_{t-1}·W_ho + b_o)
C̃_t = tanh(x_t·W_xc + h_{t-1}·W_hc + b_c)
C_t = F_t ⊙ C_{t-1} + I_t ⊙ C̃_t
h_t = O_t ⊙ tanh(C_t)
where x_t is the input at time t, σ is the sigmoid activation function, tanh is the hyperbolic tangent activation function, W_xi, W_xf, W_xo and W_xc are respectively the weight parameter matrices of x_t at the input gate, forget gate, output gate and memory cell, W_hi, W_hf, W_ho and W_hc are respectively the weight parameter matrices of h_t at the input gate, forget gate, output gate and memory cell, b_i, b_f, b_o and b_c are the bias parameters of the input gate, forget gate, output gate and memory cell, I_t, F_t, O_t, C̃_t and C_t are respectively the outputs of the input gate, forget gate, output gate, candidate memory cell and memory cell at time t, and ⊙ is element-wise matrix multiplication;
the forward output →h_t and the backward output ←h_t are spliced to obtain the output h'_t = [→h_t ; ←h_t].
4. The relation extraction method for a knowledge graph automatic construction system according to claim 1, characterized in that the weighted dependency matrix A′ in step S3 is calculated as
A′ = φ(onehot(A)·D)
φ(x) = max(x, 0);
where A is the original dependency tree, onehot is the one-hot encoding operation, φ is the ReLU activation function, and max takes the maximum value.
5. The relation extraction method for a knowledge graph automatic construction system according to claim 1, characterized in that the attention matrices in step S4 are calculated as
Ā^(i) = softmax((Q·W_i^Q)·(K·W_i^K)^T / √d)
where k is the number of attention heads, Q and K are the feature representations of the text obtained through steps S1 and S2, W_i^Q and W_i^K are weight parameter matrices, d is the input dimension, and softmax is the normalized exponential function; the k attention matrices Ā^(1), …, Ā^(k) are spliced and then reduced in dimension through a linear layer to obtain A″:
A″ = W_A·[Ā^(1); …; Ā^(k)] + b_A
where W_A and b_A are the weight parameter matrix and bias parameter of the linear transformation layer.
6. The relation extraction method for a knowledge graph automatic construction system according to claim 1, characterized in that the result of each graph convolution module in step S5 is calculated as
h_i^(l) = φ(Σ_{j=1}^n A_ij·W^(l)·h_j^(l-1) + b^(l))
output_i = W_o·[input_i ; GCN(input_i) ; … ; GCN(output_{i-1})]
input_i = W_c·[input_{i-1} ; output_0 ; … ; output_N]
H_out = W_f·[output_1 ; … ; output_M]
where A is the input matrix of the module (A′ or A″); in an L-layer graph convolution network whose initial input is the feature representation set h^(0) = {h_1^(0), …, h_n^(0)}, node i at layer l receives the h_j^(l-1) as input and outputs h_i^(l); W^(l) is the graph convolution network weight parameter matrix, b^(l) is the graph convolution network bias parameter, N is the number of graph convolution layers of the previous sub-module, M is the number of sub-modules, and W_o, W_c and W_f are all linear transformation layer weight parameter matrices.
7. The relation extraction method for a knowledge graph automatic construction system according to claim 1, characterized in that the two graph convolution modules of step S5 generate H_A′ and H_A″ respectively, which are spliced and linearly reduced in dimension to obtain H_output.
8. The relation extraction method for a knowledge graph automatic construction system according to claim 1, characterized in that in step S6 the final relational feature is calculated as
h_relation = FFNN([h_sent ; h_e1 ; h_e2])
where FFNN represents a feedforward neural network computation.
CN202111133794.5A 2021-09-27 2021-09-27 Relation extraction method for knowledge graph automatic construction system Pending CN113901758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111133794.5A CN113901758A (en) 2021-09-27 2021-09-27 Relation extraction method for knowledge graph automatic construction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111133794.5A CN113901758A (en) 2021-09-27 2021-09-27 Relation extraction method for knowledge graph automatic construction system

Publications (1)

Publication Number Publication Date
CN113901758A (en) 2022-01-07

Family

ID=79029679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111133794.5A Pending CN113901758A (en) 2021-09-27 2021-09-27 Relation extraction method for knowledge graph automatic construction system

Country Status (1)

Country Link
CN (1) CN113901758A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547298A (en) * 2022-02-14 2022-05-27 大连理工大学 Biomedical relation extraction method, device and medium based on combination of multi-head attention and graph convolution network and R-Drop mechanism
CN115774993A (en) * 2022-12-29 2023-03-10 广东南方网络信息科技有限公司 Conditional error identification method and device based on syntactic analysis

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177394A (en) * 2020-01-03 2020-05-19 浙江大学 Knowledge map relation data classification method based on syntactic attention neural network
CN112163425A (en) * 2020-09-25 2021-01-01 大连民族大学 Text entity relation extraction method based on multi-feature information enhancement
CN112163426A (en) * 2020-09-30 2021-01-01 中国矿业大学 Relationship extraction method based on combination of attention mechanism and graph long-time memory neural network
CN113239186A (en) * 2021-02-26 2021-08-10 中国科学院电子学研究所苏州研究院 Graph convolution network relation extraction method based on multi-dependency relation representation mechanism
WO2021174774A1 (en) * 2020-07-30 2021-09-10 平安科技(深圳)有限公司 Neural network relationship extraction method, computer device, and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177394A (en) * 2020-01-03 2020-05-19 浙江大学 Knowledge map relation data classification method based on syntactic attention neural network
WO2021174774A1 (en) * 2020-07-30 2021-09-10 平安科技(深圳)有限公司 Neural network relationship extraction method, computer device, and readable storage medium
CN112163425A (en) * 2020-09-25 2021-01-01 大连民族大学 Text entity relation extraction method based on multi-feature information enhancement
CN112163426A (en) * 2020-09-30 2021-01-01 中国矿业大学 Relationship extraction method based on combination of attention mechanism and graph long-time memory neural network
CN113239186A (en) * 2021-02-26 2021-08-10 中国科学院电子学研究所苏州研究院 Graph convolution network relation extraction method based on multi-dependency relation representation mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YIHAO DONG et al.: "Weighted-Dependency with Attention-Based Graph Convolutional Network for Relation Extraction", Neural Processing Letters, 9 September 2023 (2023-09-09) *
ZHIXIN LI et al.: "Improve relation extraction with dual attention-guided graph convolutional networks", Neural Computing and Applications, 18 June 2020 (2020-06-18) *
LIU Feng et al.: "Entity Relation Classification Based on Multi-head Attention and Bi-LSTM" (基于Multi-head Attention和Bi-LSTM的实体关系分类), Computer Systems & Applications, no. 06, 15 June 2019 (2019-06-15) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547298A (en) * 2022-02-14 2022-05-27 大连理工大学 Biomedical relation extraction method, device and medium based on combination of multi-head attention and graph convolution network and R-Drop mechanism
CN114547298B (en) * 2022-02-14 2024-10-15 大连理工大学 Biomedical relation extraction method, device and medium based on combination of multi-head attention and graph convolution network and R-Drop mechanism
CN115774993A (en) * 2022-12-29 2023-03-10 广东南方网络信息科技有限公司 Conditional error identification method and device based on syntactic analysis
CN115774993B (en) * 2022-12-29 2023-09-08 广东南方网络信息科技有限公司 Condition type error identification method and device based on syntactic analysis

Similar Documents

Publication Publication Date Title
CN112163426B (en) Relationship extraction method based on combination of attention mechanism and graph long-time memory neural network
CN109284506B (en) User comment emotion analysis system and method based on attention convolution neural network
CN111985245B (en) Relationship extraction method and system based on attention cycle gating graph convolution network
US20220147836A1 (en) Method and device for text-enhanced knowledge graph joint representation learning
US6601049B1 (en) Self-adjusting multi-layer neural network architectures and methods therefor
CN109947912A (en) A kind of model method based on paragraph internal reasoning and combined problem answer matches
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN108229582A (en) Entity recognition dual training method is named in a kind of multitask towards medical domain
CN111460132B (en) Generation type conference abstract method based on graph convolution neural network
CN113239186A (en) Graph convolution network relation extraction method based on multi-dependency relation representation mechanism
CN112232087B (en) Specific aspect emotion analysis method of multi-granularity attention model based on Transformer
CN108170848B (en) Chinese mobile intelligent customer service-oriented conversation scene classification method
CN112784532B (en) Multi-head attention memory system for short text sentiment classification
CN111274375A (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN112115687A (en) Problem generation method combining triples and entity types in knowledge base
Tjandra et al. Gated recurrent neural tensor network
CN113901758A (en) Relation extraction method for knowledge graph automatic construction system
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN112860904B (en) External knowledge-integrated biomedical relation extraction method
CN117131933A (en) Multi-mode knowledge graph establishing method and application
CN115374270A (en) Legal text abstract generation method based on graph neural network
CN116403231A (en) Multi-hop reading understanding method and system based on double-view contrast learning and graph pruning
CN115496072A (en) Relation extraction method based on comparison learning
CN116361438A (en) Question-answering method and system based on text-knowledge expansion graph collaborative reasoning network
CN113887836B (en) Descriptive event prediction method integrating event environment information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination