CN116306779A - Knowledge reasoning method based on structure distinguishable representation graph neural network - Google Patents

Knowledge reasoning method based on structure distinguishable representation graph neural network

Info

Publication number
CN116306779A
CN116306779A (application CN202310089311.9A)
Authority
CN
China
Prior art keywords
graph
knowledge
model
network model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310089311.9A
Other languages
Chinese (zh)
Inventor
Zhou Zhengbin
Wang Zhen
Hui Bo
Sun Ming
Kang Zhao
Wang Yong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Creative Information Technology Co ltd
Original Assignee
Creative Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Creative Information Technology Co ltd filed Critical Creative Information Technology Co ltd
Priority to CN202310089311.9A
Publication of CN116306779A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a knowledge reasoning method based on a structure-distinguishable representation graph neural network, belonging to the technical field of knowledge graph reasoning. The method comprises: preprocessing training data; establishing a neighborhood aggregation mechanism; constructing a graph attention network model through the neighborhood aggregation mechanism, and learning an embedded representation of the preprocessed training data by using the graph attention network model; setting a loss function, performing end-to-end training on the graph attention network model, and updating the network parameters of the model by using the loss function; and inputting the preprocessed training data into the updated graph attention network model for training, and completing the knowledge graph by using the trained model. The invention first unifies the representation of the training data. It then designs a neighborhood aggregation mechanism that enhances the distinguishability of the network model, constructs a complete graph attention network model with an encoder-decoder structure built around the neighborhood aggregation mechanism, and embeds the training data through the representations learned by the graph attention network, improving the distinguishing capability of the model.

Description

Knowledge reasoning method based on structure distinguishable representation graph neural network
Technical Field
The invention relates to the technical field of knowledge graph processing, in particular to a knowledge reasoning method based on a structure distinguishable representation graph neural network.
Background
In the process of constructing a knowledge graph, a large amount of knowledge is derived from documents and web pages, and extracting knowledge from documents often introduces deviations from two sources: (1) documents contain much noisy, useless information, which may originate from the knowledge extraction algorithm itself or from the quality of the language text itself; (2) the limited amount of information in documents does not cover all knowledge, especially much common-sense knowledge. Both factors leave the resulting knowledge graph incomplete, so knowledge graph completion is increasingly important in knowledge graph construction.
Knowledge graph completion, also known as link prediction, focuses on predicting the head entity (subject) or tail entity (object) that a test triple lacks. A link prediction method defines a scoring function that assigns a value to each triple, such that true triples score higher than false triples. The most advanced knowledge graph completion methods can currently be divided into translation models, tensor factorization models, convolutional-neural-network-based models, and graph-neural-network-based models.
Translation models represent entities and relations as vectors in a low-dimensional vector space to aid knowledge graph completion (KGC). TransE represents the relation between the head and tail entities as a translation operation, and can model composition and inverse relations but not symmetric ones. DistMult scores a triple with a bilinear product over the head entity, relation, and tail entity, capturing the actual and missing inferred relations between entities in a knowledge graph subgraph; however, its symmetric formulation cannot represent antisymmetric and inverse relations. ComplEx resolves DistMult's limitation by using complex-valued embeddings to represent symmetric, antisymmetric and inverse relations, but it cannot model composition relations. Translation models perform well on simple graphs using simple operations and a limited number of parameters.
Tensor factorization models represent the knowledge graph as a third-order binary tensor in which each element encodes whether the corresponding triple is a true fact. The core idea of the RESCAL model is to encode the whole knowledge graph into a three-dimensional tensor, decompose it into a core tensor and factor matrices, and take the reconstruction from the core tensor and factor matrices as the probability that a triple is true. The TuckER model decomposes the three-dimensional tensor representing the KG into a core tensor and three matrices. It is a fully expressive model, and the lower bound on the number of parameters it requires for full expressiveness is several orders of magnitude smaller than that of other models. These methods frame the knowledge base completion task as a three-dimensional (3D) binary tensor completion problem.
Recently, several convolutional neural network (CNN) based models, namely ConvE, ConvKB and CapsE, have achieved good performance. ConvE applies a global two-dimensional convolution to the reshaped entity and relation embeddings; it is highly parameter-efficient and achieves better results than many other methods. In ConvKB, each triple (head entity, relation, tail entity) is represented as a three-column matrix in which each column vector corresponds to one triple element, allowing the model to capture global relationships and translational characteristics between entities and relations in the knowledge base. CapsE models relation triples with a capsule network; building on ConvKB, it encapsulates all feature maps into a capsule after the convolution layer. These methods use local entity features to complete the knowledge graph but ignore global structure information, and thus cannot adequately distinguish the representations of the graph neural network.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a knowledge reasoning method based on a structure-distinguishable representation graph neural network, which helps to solve the problem that current knowledge reasoning methods ignore global structure information during knowledge graph completion, so that the representations of the graph neural network cannot be well distinguished.
The aim of the invention is realized by the following technical scheme:
The invention provides a knowledge reasoning method based on a structure-distinguishable representation graph neural network, comprising the following steps:
S1: preprocessing training data;
S2: establishing a neighborhood aggregation mechanism;
S3: constructing a graph attention network model through the neighborhood aggregation mechanism, and learning an embedded representation of the preprocessed training data by using the graph attention network model;
S4: setting a loss function, performing end-to-end training on the graph attention network model, and updating the network parameters of the model by using the loss function;
S5: inputting the preprocessed training data into the updated graph attention network model for training, and completing the knowledge graph by using the trained model.
Further, the step S1 specifically comprises: adopting the WN18RR and FB15K-237 datasets as training data and representing them as knowledge triples in the subject-relation-object form.
Further, the step S2 specifically includes the following substeps:
S201: improving the GAT encoder, using multi-layer perceptrons to model GAT and learn an injective function;
S202: adopting a multi-head attention mechanism to acquire the relevant neighborhood information of the target entity in the knowledge triples on different subspaces through multiple K-hop iterative computations.
Further, the step S3 specifically includes the following substeps:
S301: setting an encoder, calculating the attention values of the neighbors around the target entities in all knowledge triples through a designed attention-head module, and obtaining the embedded representations of all target entities by using the neighborhood aggregation mechanism;
S302: setting a decoder, and learning the embedded representations of the target entities in all knowledge triples by adopting 3×3 convolution filters to obtain the relative attention values of all knowledge triples and thereby obtain the graph attention network model.
Further, the step S4 specifically includes the following substeps:
S401: dividing the preprocessed training data into a training set, a test set and a validation set at a ratio of about 28:1:1, and setting a control group;
S402: performing end-to-end training on the graph attention network model with the training set, performing optimization training on the decoder of the graph attention network model by using a soft-margin loss, and updating the network parameters of the graph attention network model.
The invention has the beneficial effects that: the invention provides a knowledge reasoning method based on a structure-distinguishable representation graph neural network, which comprises the steps of preprocessing training data; then establishing a neighborhood aggregation mechanism; constructing a graph attention network model through the neighborhood aggregation mechanism, and learning an embedded representation of the preprocessed training data by using the graph attention network model; setting a loss function, performing end-to-end training on the graph attention network model, and updating the network parameters of the model by using the loss function; and inputting the preprocessed training data into the updated graph attention network model for training, and completing the knowledge graph by using the trained model. The invention unifies the data representation by preprocessing the training data. A neighborhood aggregation mechanism is then designed to enhance the distinguishability of the graph attention network model. Finally, a complete graph attention network model is constructed, an encoder-decoder structure is designed around the neighborhood aggregation mechanism, and the embedded representations learned by the graph attention network improve its distinguishing capability.
Drawings
FIG. 1 is a flow chart of method steps of the present invention;
FIG. 2 is an overall block diagram of the graph attention network model;
FIG. 3 is a schematic diagram of the polymerization process of the attention layer of the present invention.
Description of the embodiments
For a clearer understanding of technical features, objects, and effects of the present invention, a specific embodiment of the present invention will be described with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 shows a flow chart of the steps of the knowledge reasoning method based on a structure-distinguishable representation graph neural network, which specifically includes the following steps:
S1: preprocessing training data;
S2: establishing a neighborhood aggregation mechanism;
S3: constructing a graph attention network model through the neighborhood aggregation mechanism, and learning an embedded representation of the preprocessed training data by using the graph attention network model;
S4: setting a loss function, performing end-to-end training on the graph attention network model, and updating the network parameters of the model by using the loss function;
S5: inputting the preprocessed training data into the updated graph attention network model for training, and completing the knowledge graph by using the trained model.
Further, in one embodiment, the step S1 specifically comprises: adopting the WN18RR and FB15K-237 datasets as training data, representing the training data as knowledge triples in the subject-relation-object form, and filtering the data. After irrelevant data are filtered out, the WN18RR dataset contains 40943 entities and 11 relations, and the FB15K-237 dataset contains 14541 entities and 237 relations.
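As an illustration of this preprocessing step, the following is a minimal Python sketch of reading such a dataset split into subject-relation-object triples; the tab-separated file layout and the helper names are assumptions for illustration, since the patent does not specify an on-disk format:

```python
from typing import Dict, List, Tuple

def load_triples(path: str) -> List[Tuple[str, str, str]]:
    """Parse one knowledge triple per line: <subject>\t<relation>\t<object>."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 3:          # drop malformed / irrelevant rows
                triples.append((parts[0], parts[1], parts[2]))
    return triples

def build_vocab(triples: List[Tuple[str, str, str]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Assign contiguous integer ids to entities and relations."""
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    relations = sorted({t[1] for t in triples})
    return ({e: i for i, e in enumerate(entities)},
            {r: i for i, r in enumerate(relations)})
```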
Further, in one embodiment, the step S2 specifically includes the following substeps:
S201: the GAT encoder is improved by using multi-layer perceptrons (MLPs) to model GAT and learn an injective function, which rests on the universal approximation theorem. The hidden representation h_s^{(k-1)} is weighted by a learnable parameter \phi, and the graph neural network then updates the node representation according to the following formula, where \phi^{(k)} is the learnable parameter at the k-th iteration, h_s^{(k-1)} is the hidden representation of entity s in the (k-1)-th iteration, a_s^{(k)} denotes the neighborhood information of entity s collected within K hops, k is the current iteration number, and s is the corresponding entity:

h_s^{(k)} = \mathrm{MLP}^{(k)}\left( \left(1 + \phi^{(k)}\right) \cdot h_s^{(k-1)} + a_s^{(k)} \right)
K-hop may also be read as the K-neighborhood. In the formula above, the K-hop operation finds the set of all vertices whose shortest path from a given vertex (entity s) is at most K hops (or K steps), i.e., the neighborhood information of entity s. K is a positive integer ≥ 1.
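For concreteness, the following is a minimal Python sketch of this K-hop operation, gathering every vertex reachable from entity s within K hops by breadth-first search; the adjacency-dictionary graph format is an assumption for illustration:

```python
from collections import deque

def k_hop_neighborhood(adj: dict, s, k: int) -> set:
    """Collect all vertices whose shortest path from s is at most k hops."""
    seen, frontier = {s}, deque([(s, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:                      # do not expand past k hops
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {s}                       # neighborhood information of s

# Toy example with K = 2
adj = {"s": ["a"], "a": ["b"], "b": ["c"]}
print(k_hop_neighborhood(adj, "s", 2))      # {'a', 'b'}
```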
S202: in order to stabilize the learning process, more neighborhood information is packaged, a multi-head attention mechanism is adopted, related domain information on different subspaces is obtained through multiple times of calculation, wherein a (K) s represents the neighborhood information collected in a K-hop, M is the number of heads, MLP is a multi-layer perceptron and phi (k) Is a learnable parameter at the kth iteration, corresponding to the GAT layer shown in fig. 2, h (k) s represents the final entity representation learned from our structurally distinguishable GAT model. The formula is as follows:
Figure SMS_2
The multi-head attention mechanism is an improvement on the single-head attention mechanism: the attention operations are divided into groups (heads), so that feature information can be extracted from multiple representation subspaces.
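The multi-head update above can be sketched as follows in PyTorch: each of the M heads applies its own MLP to (1 + \phi)·h_s + a_s, and the head outputs are concatenated. The two-layer MLP shape and layer sizes are illustrative assumptions, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class MultiHeadStructuralUpdate(nn.Module):
    """One layer in the style above: M heads of MLP((1 + phi) * h_s + a_s), concatenated."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.phis = nn.Parameter(torch.zeros(num_heads))      # learnable phi per head
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_heads)
        )

    def forward(self, h_s: torch.Tensor, a_s: torch.Tensor) -> torch.Tensor:
        # h_s: entity representation from round k-1; a_s: aggregated K-hop info
        heads = [mlp((1.0 + phi) * h_s + a_s)
                 for mlp, phi in zip(self.mlps, self.phis)]
        return torch.cat(heads, dim=-1)                       # concat over M heads
```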
Further, in one embodiment, the step S3 specifically includes the following substeps:
S301: an encoder is set up, the attention values of the neighbors around the target entity are calculated by a designed attention-head module, and the embedded representation of the target entity is obtained using the neighborhood aggregation mechanism; the aggregation process is shown in FIG. 3. To obtain a hidden embedding of the target entity s, an embedded representation of each knowledge triple related to s must be learned. For simplicity, assume that the neighborhood of s is N(s) = \{(r_i, o_i) \mid (s, r_i, o_i) \in T\}. The neighbor information is learned by applying a linear transformation to the concatenation of r_i and o_i, as in the following formula, where W_1 is a projection matrix, the vectors r_i and o_i are the embedded representations of the i-th relation and entity, and c_i is the learned i-th neighbor information:

c_i = W_1 \left[ r_i \,\Vert\, o_i \right]
The encoder then learns an embedded representation of the entire knowledge triple to obtain the absolute attention value d_i of the i-th knowledge triple, as in the following formula (reconstructed in the usual graph-attention form, the original equation image being unavailable), where s is the embedded representation of the target entity, c_i is the i-th neighbor information learned above, and W_2 is a second projection matrix:

d_i = \mathrm{LeakyReLU}\left( W_2 \left[ s \,\Vert\, c_i \right] \right)
The relative attention value can then be derived. The following formula gives the relative attention value p_i of a single knowledge triple as a softmax over the absolute attention values, where d_i and d_j are the absolute attention values of the i-th and j-th knowledge triples and N(s) denotes the neighborhood of entity s:

p_i = \frac{\exp(d_i)}{\sum_{j \in N(s)} \exp(d_j)}
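Putting S301 together, a minimal PyTorch sketch of the encoder attention might look as follows: neighbor information c_i = W_1[r_i ∥ o_i], an absolute attention value d_i, and the softmax-normalized relative attention p_i. The LeakyReLU scoring layer W_2 is an assumption patterned on standard graph-attention encoders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborhoodAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W1 = nn.Linear(2 * dim, dim, bias=False)   # projection of [r_i || o_i]
        self.W2 = nn.Linear(2 * dim, 1, bias=False)     # absolute-attention scorer (assumed)

    def forward(self, s: torch.Tensor, r: torch.Tensor, o: torch.Tensor) -> torch.Tensor:
        # s: (dim,) target entity; r, o: (n, dim) its n neighbor relations/entities
        c = self.W1(torch.cat([r, o], dim=-1))               # c_i, shape (n, dim)
        d = F.leaky_relu(self.W2(torch.cat([s.expand_as(c), c], dim=-1)))  # d_i, (n, 1)
        p = torch.softmax(d, dim=0)                          # relative attention p_i
        return (p * c).sum(dim=0)                            # aggregated embedding of s
```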
S302: a decoder is provided to learn the output of the knowledge triples using 3×3 convolution filters, in order to obtain an embedded representation of each knowledge triple. As mentioned in ConvKB, the purpose is to obtain an embedded representation of each knowledge triple while preserving its translational characteristics. The scoring function of the decoder of the invention is defined as follows, where g is the ReLU activation function; \Omega denotes the parameter-sharing convolution filters, which are independent of s, r and o; * denotes the convolution operator; concat denotes the concatenation operator; W denotes a linear transformation matrix; and s_i, r_i and o_i are the embedded representations of the i-th entity, relation and entity:

\phi(s_i, r_i, o_i) = \mathrm{concat}\left( g\left( \left[ s_i, r_i, o_i \right] * \Omega \right) \right) \cdot W
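The scoring function above follows ConvKB's form, with 3×3 filters as stated in the patent. A minimal PyTorch sketch, with the filter count chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

class ConvDecoder(nn.Module):
    """Score phi(s, r, o) = concat(g([s, r, o] * Omega)) . W with 3x3 filters."""
    def __init__(self, dim: int, num_filters: int = 8):
        super().__init__()
        assert dim >= 3, "embedding size must allow a 3x3 kernel"
        self.conv = nn.Conv2d(1, num_filters, kernel_size=3)     # Omega, shared filters
        self.W = nn.Linear(num_filters * (dim - 2), 1, bias=False)

    def forward(self, s: torch.Tensor, r: torch.Tensor, o: torch.Tensor) -> torch.Tensor:
        x = torch.stack([s, r, o], dim=-1).view(1, 1, -1, 3)     # (1, 1, dim, 3) matrix
        feats = torch.relu(self.conv(x))                         # g([s, r, o] * Omega)
        return self.W(feats.flatten())                           # concat, then W
```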
further, the step S4 specifically includes the following substeps:
S401: the preprocessed training data is divided into a training set, a test set and a validation set at a ratio of about 28:1:1, and a control group is set. The training set is used for end-to-end training of the model and for updating its parameters; the validation set is used for a preliminary evaluation of the model's capability and for tuning the hyperparameters accordingly; the test set is used for evaluating the generalization capability of the final model and for computing its performance metrics;
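A minimal sketch of this ~28:1:1 split; the shuffle seed is an illustrative assumption:

```python
import random

def split_28_1_1(triples, seed: int = 0):
    """Shuffle, then carve off ~1/30 each for validation and test."""
    data = list(triples)
    random.Random(seed).shuffle(data)
    n_val = n_test = max(1, len(data) // 30)
    return (data[n_val + n_test:],            # training set (~28/30)
            data[:n_val],                     # validation set (~1/30)
            data[n_val:n_val + n_test])       # test set (~1/30)
```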
S402: a loss function is set. We use the Adam optimizer to train the decoder, minimizing the loss function with L2 regularization of the model's weight matrix W. We train the decoder using a soft-margin loss, as in the following formulas, where \phi(s_i, r_i, o_i) denotes the score of the decoder, T' is the set of false knowledge triples generated by corrupting the true knowledge triples in T, l_i is the label of the i-th triple, \lambda is the regularization coefficient, and L_{decoder} is the final loss function:

L_{decoder} = \sum_{(s_i, r_i, o_i) \in T \cup T'} \log\left( 1 + \exp\left( l_i \cdot \phi(s_i, r_i, o_i) \right) \right) + \frac{\lambda}{2} \left\Vert W \right\Vert_2^2

l_i = \begin{cases} 1, & (s_i, r_i, o_i) \in T \\ -1, & (s_i, r_i, o_i) \in T' \end{cases}
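A minimal sketch of this soft-margin objective, following the ConvKB convention reconstructed above in which true triples carry label +1 and corrupted triples -1 (so true triples are driven toward low scores); the \lambda default is illustrative:

```python
import torch
import torch.nn.functional as F

def soft_margin_loss(scores: torch.Tensor, labels: torch.Tensor,
                     W: torch.Tensor, lam: float = 0.001) -> torch.Tensor:
    """scores: decoder outputs phi(s_i, r_i, o_i); labels: +1 for T, -1 for T'."""
    data_term = F.softplus(labels * scores).mean()   # log(1 + exp(l_i * phi_i)), numerically stable
    reg_term = 0.5 * lam * W.pow(2).sum()            # L2 regularization of W
    return data_term + reg_term
```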
For the different datasets we use the following settings: for the FB15K-237 dataset, dropout is set to 0.3, the learning rate to 0.001, the embedding size to 100, and the number of MLP layers to 3; for the WN18RR dataset, dropout is set to 0.0, the learning rate to 0.0005, the embedding size to 50, and the number of MLP layers to 3.
Specifically, in step S5, the training data unified as knowledge triples is input into the network, and the completed knowledge graph is output. In the training and test data, inverse relations affect the performance of the model, and experiments show that, by using the structure-distinguishable neighborhood aggregation mechanism, the model performs better on the completion of various large-scale general knowledge graphs containing inverse relations.
Further, experiments were performed on the WN18RR and FB15K-237 datasets, and the results of the knowledge reasoning method based on the structure-distinguishable representation graph neural network provided by the invention were compared with those of the classical model ConvE, whose results are taken from reference [1]: Dettmers T, Minervini P, Stenetorp P, et al. Convolutional 2D Knowledge Graph Embeddings. 2017. The evaluation metric is the proportion of correct entities ranked in the first N positions (Hits@1, Hits@3, Hits@10). SD-GAT obtained better scores than all baselines, showing the expressive power of our model. More specifically, on FB15K-237, SD-GAT achieves an improvement of 0.17 on Hits@1, 0.13 on Hits@3 and 0.06 on Hits@10. On WN18RR, SD-GAT achieves an improvement of 0.02 on Hits@3.
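The Hits@N metric used in this comparison is straightforward to compute; a minimal sketch:

```python
from typing import List

def hits_at_n(ranks: List[int], n: int) -> float:
    """Proportion of test triples whose correct entity is ranked in the first n."""
    return sum(1 for rank in ranks if rank <= n) / len(ranks)

print(hits_at_n([1, 4, 2, 11], 3))   # 0.5, i.e. Hits@3 = 0.5
```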
The foregoing has shown and described the basic principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and descriptions above merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. A knowledge reasoning method based on a structure-distinguishable representation graph neural network, comprising:
S1: preprocessing training data;
S2: establishing a neighborhood aggregation mechanism;
S3: constructing a graph attention network model through the neighborhood aggregation mechanism, and learning an embedded representation of the preprocessed training data by using the graph attention network model;
S4: setting a loss function, performing end-to-end training on the graph attention network model, and updating the network parameters of the model by using the loss function;
S5: inputting the preprocessed training data into the updated graph attention network model for training, and completing the knowledge graph by using the trained model.
2. The knowledge reasoning method based on the structure-distinguishable representation graph neural network according to claim 1, wherein the step S1 specifically comprises: adopting the WN18RR and FB15K-237 datasets as training data and representing them as knowledge triples in the subject-relation-object form.
3. The knowledge reasoning method based on the structure-distinguishable representation graph neural network according to claim 1, wherein the step S2 specifically comprises the following sub-steps:
S201: improving the GAT encoder, using multi-layer perceptrons to model GAT and learn an injective function;
S202: adopting a multi-head attention mechanism to acquire the relevant neighborhood information of the target entity in the knowledge triples on different subspaces through multiple K-hop iterative computations.
4. The knowledge reasoning method based on the structure-distinguishable representation graph neural network according to claim 1, wherein the step S3 specifically comprises the following substeps:
S301: setting an encoder, calculating the attention values of the neighbors around the target entities in all knowledge triples through a designed attention-head module, and obtaining the embedded representations of all target entities by using the neighborhood aggregation mechanism;
S302: setting a decoder, and learning the embedded representations of the target entities in all knowledge triples by adopting 3×3 convolution filters to obtain the relative attention values of all knowledge triples and thereby obtain the graph attention network model.
5. The knowledge reasoning method based on the structure-distinguishable representation graph neural network according to claim 1, wherein the step S4 specifically comprises the following sub-steps:
S401: dividing the preprocessed training data into a training set, a test set and a validation set at a ratio of 28:1:1, and setting a control group;
S402: performing end-to-end training on the graph attention network model with the training set, performing optimization training on the decoder of the graph attention network model by using a soft-margin loss, and updating the network parameters of the graph attention network model.
CN202310089311.9A 2023-02-09 2023-02-09 Knowledge reasoning method based on structure distinguishable representation graph neural network Pending CN116306779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310089311.9A CN116306779A (en) 2023-02-09 2023-02-09 Knowledge reasoning method based on structure distinguishable representation graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310089311.9A CN116306779A (en) 2023-02-09 2023-02-09 Knowledge reasoning method based on structure distinguishable representation graph neural network

Publications (1)

Publication Number Publication Date
CN116306779A (en) 2023-06-23

Family

ID=86795086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310089311.9A Pending CN116306779A (en) 2023-02-09 2023-02-09 Knowledge reasoning method based on structure distinguishable representation graph neural network

Country Status (1)

Country Link
CN (1) CN116306779A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220051083A1 (en) * 2020-08-11 2022-02-17 Nec Laboratories America, Inc. Learning word representations via commonsense reasoning
CN114708479A (en) * 2022-03-31 2022-07-05 杭州电子科技大学 Self-adaptive defense method based on graph structure and characteristics
CN115578211A (en) * 2022-10-25 2023-01-06 南京工业职业技术大学 Directed symbol network representation learning method and process with link prediction and node sequencing functions

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220051083A1 (en) * 2020-08-11 2022-02-17 Nec Laboratories America, Inc. Learning word representations via commonsense reasoning
CN114708479A (en) * 2022-03-31 2022-07-05 杭州电子科技大学 Self-adaptive defense method based on graph structure and characteristics
CN115578211A (en) * 2022-10-25 2023-01-06 南京工业职业技术大学 Directed symbol network representation learning method and process with link prediction and node sequencing functions

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XUE ZHOU et al., "A structure distinguishable graph attention network for knowledge base completion", Neural Computing and Applications, vol. 33, pages 16005-16017, XP037608476, DOI: 10.1007/s00521-021-06221-1 *
SUN Zhengyu et al., "Medical knowledge graph construction method based on big data technology", Software, vol. 41, no. 1, pages 13-17 *
LIN Youfang et al., "Traffic Big Data", Beijing Jiaotong University Press, page 18 *

Similar Documents

Publication Publication Date Title
Cai et al. Path-level network transformation for efficient architecture search
Morris et al. Weisfeiler and leman go machine learning: The story so far
Chamberland et al. Deep neural decoders for near term fault-tolerant experiments
CN112417219B (en) Hyper-graph convolution-based hyper-edge link prediction method
Li et al. Deep learning methods for molecular representation and property prediction
CN113360673B (en) Entity alignment method, device and storage medium of multi-mode knowledge graph
CN113299354B (en) Small molecule representation learning method based on transducer and enhanced interactive MPNN neural network
CN112633478A (en) Construction of graph convolution network learning model based on ontology semantics
CN115346372A (en) Multi-component fusion traffic flow prediction method based on graph neural network
Kolajoobi et al. Investigating the capability of data-driven proxy models as solution for reservoir geological uncertainty quantification
Hong et al. Variational gridded graph convolution network for node classification
CN117524353B (en) Molecular large model based on multidimensional molecular information, construction method and application
Juan et al. INS-GNN: Improving graph imbalance learning with self-supervision
CN117408336A (en) Entity alignment method for structure and attribute attention mechanism
Lu et al. Soft-orthogonal constrained dual-stream encoder with self-supervised clustering network for brain functional connectivity data
CN117111464A (en) Self-adaptive fault diagnosis method under multiple working conditions
CN116306779A (en) Knowledge reasoning method based on structure distinguishable representation graph neural network
CN115423076A (en) Directed hypergraph chain prediction method based on two-step framework
CN114693873A (en) Point cloud completion method based on dynamic graph convolution and attention mechanism
Han et al. SurfNet: Learning surface representations via graph convolutional network
Alshara Multilayer Graph-Based Deep Learning Approach for Stock Price Prediction
CN113360732A (en) Big data multi-view graph clustering method
Ma et al. A multi-scale disperse dynamic routing capsule network knowledge graph embedding model based on relational memory
CN115631786B (en) Virtual screening method, device and execution equipment
Fei et al. A GNN Architecture with Local and Global-Attention Feature for Image Classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination