CN113761221B

CN113761221B - Knowledge graph entity alignment method based on graph neural network

Info

Publication number: CN113761221B
Application number: CN202110734416.6A
Authority: CN
Inventors: 张静; 栾瑞鹏; 亓东林; 孙晓; 陈曙东; 朱浩洋; 欧阳小叶
Original assignee: Chinese People's Liberation Army 32801
Current assignee: Chinese People's Liberation Army 32801
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2022-02-15
Anticipated expiration: 2041-06-30
Also published as: CN113761221A

Abstract

To address the problem that existing knowledge graph entity alignment methods cause a certain degree of information loss, the invention discloses a knowledge graph entity alignment method based on a graph neural network. The method comprises: data preprocessing, in which the two knowledge graphs to be aligned and the existing alignment seeds are preprocessed and the result is used as the input of the next step; constructing a graph neural network model, in which the preprocessing result is input into a graph convolutional neural network and the two knowledge graphs to be aligned are modeled jointly by the graph neural network to obtain vectorized representations of the entities in the knowledge graphs; and, based on a greedy algorithm, searching the vector space for the entity with the highest semantic similarity to the entity represented by a given entity vector and taking it as the aligned entity. By constructing a unified graph neural network that jointly models the two graphs to be aligned, the invention makes more effective use of the combined information of the two graphs, obtains more accurate entity vector representations, and improves the accuracy of entity alignment.

Description

Knowledge graph entity alignment method based on graph neural network

Technical Field

The invention relates to the field of knowledge graph-based entity alignment, in particular to a knowledge graph entity alignment method based on a graph neural network.

Background

In recent years, knowledge graph technology has developed rapidly, and research based on knowledge graphs has emerged in a steady stream. Researchers have constructed a large number of general-purpose and domain-specific knowledge graphs, whose knowledge both overlaps and complements one another. How to fuse multi-source heterogeneous knowledge graphs into a knowledge graph with more complete knowledge, and thereby better support downstream applications, is an urgent problem. Entity alignment is an important technique for knowledge graph fusion.

At present, entity alignment methods for knowledge graphs mainly use two graph neural networks to model the two graphs to be aligned independently, obtain vector representations of the entities and relations in each graph, search the vector space with a greedy algorithm or similar, and take the pair of entities with the closest vector representations as aligned entities. Because such methods model the two graphs separately, whereas the entity alignment task needs to make full use of the combined information of both graphs, they cause a certain degree of information loss.

Disclosure of Invention

To address the problem that existing knowledge graph entity alignment methods cause a certain degree of information loss, the invention discloses a knowledge graph entity alignment method based on a graph neural network.

The invention discloses a knowledge graph entity alignment method based on a graph neural network, which comprises the following specific steps:

S1, data preprocessing: preprocessing the two knowledge graphs to be aligned and the existing alignment seeds, and using the result as the input of step S2;

S2, constructing a graph neural network model: inputting the preprocessing result of step S1 into a graph convolutional neural network, and jointly modeling the two knowledge graphs to be aligned with the graph neural network to obtain vectorized representations of the entities in the knowledge graphs;

S3, based on a greedy algorithm, searching the vector space for the entity with the highest semantic similarity to the entity represented by a given entity vector, and taking that entity as the aligned entity.

In step S1, during data preprocessing, all triples contained in the two knowledge graphs and the aligned entity seed pairs are processed, and vector representations of the entities and relations in the triples are obtained by random initialization using an embedding layer of the Keras neural network library. To ensure that the subsequently constructed graph neural network models the two knowledge graphs jointly, the two graphs are linked through the alignment seeds so that the initial data is fully exploited: the pre-aligned seeds are treated as aligned triples and an adjacency matrix is constructed; when constructing cross-graph relation triples, all existing triples are traversed, and for each triple that contains an entity appearing in the alignment seeds, that entity is replaced by its seed-aligned counterpart to generate a new triple. This yields a cross-graph adjacency matrix and cross-graph relation triples, that is, a preprocessing result that combines the information of the two knowledge graphs to be aligned.
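The joint adjacency matrix described in step S1 might be built roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `build_joint_adjacency`, the `zh:`/`en:` entity prefixes, the sample triples, and the choice of an unweighted symmetric matrix in which each seed pair is simply one more edge are all illustrative.

```python
def build_joint_adjacency(triples_1, triples_2, seed_pairs):
    """Build one symmetric 0/1 adjacency matrix over the entities of both graphs.

    Assumption: each pre-aligned seed pair is treated as an extra edge that
    links the two graphs; edge weights and relation types are ignored here.
    """
    all_triples = list(triples_1) + list(triples_2)
    entities = sorted(
        {e for head, _, tail in all_triples for e in (head, tail)}
        | {e for pair in seed_pairs for e in pair}
    )
    index = {entity: i for i, entity in enumerate(entities)}
    n = len(entities)
    adj = [[0] * n for _ in range(n)]

    # Edges inside each graph, plus the seed edges that connect the graphs.
    edges = [(head, tail) for head, _, tail in all_triples] + list(seed_pairs)
    for a, b in edges:
        adj[index[a]][index[b]] = 1
        adj[index[b]][index[a]] = 1
    return index, adj


# Hypothetical toy data: one triple per graph and one alignment seed.
kg_zh = [("zh:China", "capital", "zh:Beijing")]
kg_en = [("en:China", "belongsto", "en:Asia")]
seeds = [("zh:China", "en:China")]

index, adj = build_joint_adjacency(kg_zh, kg_en, seeds)
```

With the seed edge in place, `zh:China` and `en:China` become neighbors in a single graph, which is what allows one network to pass information between the two knowledge graphs.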

In step S2, the preprocessing result obtained in step S1 is input into a graph convolutional neural network, and the two knowledge graphs to be aligned are modeled jointly to obtain vectorized representations of the entities and relations in the graphs. The vector representations of the entities and relations in the graph convolutional neural network are adjusted iteratively, with the convergence target that the similarity of entity vector representations is consistent with the semantic similarity of the entities. The specific iteration process is as follows:

S21, aggregating information: after the graph vector representation is initialized, an aggregation operation is performed on each node in the graph, aggregating the information of its neighbor nodes to update the vector representation of the central node, and aggregating the information of all connected nodes. The calculation formula of the aggregation information is as follows:

[formula not reproduced]

wherein the symbols of the formula denote, respectively, the vector representation of the i-th entity e_i at the l-th iteration, the vector representation of the k-th entity e_k at the l-th iteration, and the set of all neighbor nodes of e_i.
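Since the aggregation formula itself is not reproduced in this text, the following is only an illustrative sketch of one aggregation round. It assumes a plain mean over the neighbors' current vectors, with no learned weight matrix or activation; the function name `aggregate` and the toy vectors are hypothetical.

```python
def aggregate(vectors, neighbors):
    """One aggregation round over the joint graph.

    Assumption: each node's new vector is the mean of its neighbors'
    current vectors; the actual aggregator may differ.
    """
    updated = {}
    for node, vec in vectors.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            updated[node] = list(vec)  # an isolated node keeps its vector
            continue
        updated[node] = [
            sum(vectors[k][d] for k in nbrs) / len(nbrs)
            for d in range(len(vec))
        ]
    return updated


# Hypothetical vectors; "en:China" is a cross-graph neighbor of "zh:China".
vectors = {
    "zh:China": [1.0, 0.0],
    "zh:Beijing": [0.0, 2.0],
    "en:China": [3.0, 0.0],
}
neighbors = {
    "zh:China": ["zh:Beijing", "en:China"],
    "zh:Beijing": ["zh:China"],
    "en:China": ["zh:China"],
}

updated = aggregate(vectors, neighbors)
```

Here `zh:China` receives information from a neighbor in its own graph and from its seed-aligned counterpart in the other graph, which is the effect the joint modeling is meant to achieve.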

S22, according to the loss function, back-propagating through all variables of the graph neural network model by the chain rule and updating the model parameters by gradient descent, wherein the expression of the loss function L is as follows:

[formula not reproduced]

where P denotes the set of pre-aligned seed entity pairs and P' denotes the other entity pairs obtained by random negative sampling; the remaining symbols of the formula denote, respectively, the vector representations of the pre-aligned seed entities e_i and e_j after encoding by the graph neural network, and the vector representations of the randomly negative-sampled entities e'_i and e'_j after encoding by the graph neural network; and λ denotes the threshold. The loss function is used to back-propagate through all variables in the graph neural network and update the vector representation of the knowledge graph.
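The loss expression is not reproduced in this text. The definitions above (seed pairs P, negative pairs P', a threshold λ) are consistent with a margin-based ranking loss, so the sketch below assumes that hinge form with Euclidean distance. The form, the function names, and the sample values are assumptions rather than the patent's stated formula.

```python
import math


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def margin_loss(vectors, positive_pairs, negative_pairs, margin):
    """Assumed hinge form: sum of max(0, d(pos) - d(neg) + margin).

    The loss is zero once every seed pair is closer than every sampled
    negative pair by at least `margin` (the threshold lambda).
    """
    loss = 0.0
    for ei, ej in positive_pairs:
        pos = distance(vectors[ei], vectors[ej])
        for ni, nj in negative_pairs:
            neg = distance(vectors[ni], vectors[nj])
            loss += max(0.0, pos - neg + margin)
    return loss


# Hypothetical encoded vectors: (a, c) is a seed pair, (a, b) a negative pair.
vectors = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [0.0, 1.0]}
loss_small_margin = margin_loss(vectors, [("a", "c")], [("a", "b")], margin=3.0)
loss_large_margin = margin_loss(vectors, [("a", "c")], [("a", "b")], margin=6.0)
```

In practice the gradient of such a loss with respect to the embeddings would be computed by an automatic differentiation framework rather than by hand.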

S23, repeating steps S21 and S22 until the training process is finished and the graph vector representation no longer changes.

In step S3, a greedy algorithm is used to search the whole vector space, and pairs of entities from the different knowledge graphs whose node vector representations have a Euclidean distance smaller than a threshold are taken as aligned entities.
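One way the greedy search of step S3 might look is sketched below. It assumes that candidate pairs under the distance threshold are accepted in order of increasing distance and that each entity is matched at most once; the one-to-one constraint, the function name `greedy_align`, and the sample vectors are illustrative assumptions.

```python
import math


def greedy_align(vecs_1, vecs_2, threshold):
    """Greedily pair entities across two graphs by Euclidean distance.

    Assumption: only pairs closer than `threshold` are candidates, and the
    closest remaining pair is always taken first (one-to-one matching).
    """
    candidates = []
    for e1, v1 in vecs_1.items():
        for e2, v2 in vecs_2.items():
            d = math.dist(v1, v2)
            if d < threshold:
                candidates.append((d, e1, e2))
    candidates.sort()  # closest pairs first

    used_1, used_2, aligned = set(), set(), []
    for _, e1, e2 in candidates:
        if e1 not in used_1 and e2 not in used_2:
            aligned.append((e1, e2))
            used_1.add(e1)
            used_2.add(e2)
    return aligned


# Hypothetical trained vectors for entities of the two graphs.
vecs_zh = {"zh:China": [0.0, 0.0], "zh:Beijing": [5.0, 5.0]}
vecs_en = {
    "en:China": [0.1, 0.0],
    "en:Beijing": [5.0, 5.2],
    "en:Asia": [9.0, 9.0],
}

aligned = greedy_align(vecs_zh, vecs_en, threshold=1.0)
```

The exhaustive pairwise comparison is quadratic in the number of entities, which is acceptable for a sketch but would usually be replaced by a batched or approximate nearest-neighbor search on large graphs.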

The invention has the beneficial effects that:

1. The method improves on existing graph-neural-network entity alignment methods by replacing the separate modeling and optimization of the two graphs with the modeling of the two graphs as one large joint graph, so that the associations between the two graphs to be aligned are better captured and entity alignment accuracy is improved. The joint modeling approach is simple and easy to implement, extends well, and can be embedded as a module in existing entity alignment models to improve their accuracy.

2. The invention provides a new idea for other knowledge-graph-based applications: the training set serves not only as the supervision signal during gradient descent but can also be fed to the model as part of the enhanced input features, thereby making fuller use of the training data and improving the final performance of the model.

Drawings

FIG. 1 is a flow chart of the steps of the method.

Detailed Description

For a better understanding of the present disclosure, an example is given here.

The invention discloses a knowledge graph entity alignment method based on a graph neural network, which comprises the following steps:

In step S1, during data preprocessing, all triples contained in the two knowledge graphs and the aligned entity seed pairs are processed, and vector representations of the entities and relations in the triples are obtained by random initialization using an embedding layer of the Keras neural network library. To ensure that the subsequently constructed graph neural network models the two knowledge graphs jointly, the two graphs are linked through the alignment seeds so that the initial data is fully exploited: the pre-aligned seeds are treated as aligned triples and an adjacency matrix is constructed; when constructing cross-graph relation triples, all existing triples are traversed, and for each triple that contains an entity appearing in the alignment seeds, that entity is replaced by its seed-aligned counterpart to generate a new triple. This yields a cross-graph adjacency matrix and cross-graph relation triples, that is, a preprocessing result that combines the information of the two knowledge graphs to be aligned.

To illustrate step S1, suppose a Chinese graph and an English graph need to be aligned. The Chinese graph contains the triple China (zh) -> capital -> Beijing, the English graph contains the triple China (en) -> belongsto -> Asia, and China (zh) - China (en) is a pre-aligned entity seed. Following the joint modeling principle described above, two new triples can be derived from these two triples: China (zh) -> belongsto -> Asia and China (en) -> capital -> Beijing. The expanded triples are added to the training set as enhancement data, which can effectively improve the final entity alignment accuracy of the model.
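The triple expansion in this example can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `expand_cross_graph_triples` and the `zh:`/`en:` prefixes used to tell the two graphs' entities apart are hypothetical, and replacing both head and tail at once when both appear in the seeds is a simplification.

```python
def expand_cross_graph_triples(triples, seed_pairs):
    """Generate cross-graph triples by swapping seed entities for their
    aligned counterparts.

    Assumption: if both head and tail appear in the seeds, both are
    replaced in a single new triple.
    """
    counterpart = {}
    for a, b in seed_pairs:
        counterpart[a] = b
        counterpart[b] = a

    new_triples = []
    for head, relation, tail in triples:
        if head in counterpart or tail in counterpart:
            new_triples.append(
                (counterpart.get(head, head), relation, counterpart.get(tail, tail))
            )
    return new_triples


# The example from the text, with hypothetical graph prefixes.
triples = [
    ("zh:China", "capital", "zh:Beijing"),
    ("en:China", "belongsto", "en:Asia"),
]
seeds = [("zh:China", "en:China")]

expanded = expand_cross_graph_triples(triples, seeds)
```

The two generated triples correspond to the two expanded triples named in the example and would be appended to the training triples as enhancement data.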

S21, aggregating information: after the graph vector representation is initialized, an aggregation operation is performed on each node in the graph, aggregating the information of its neighbor nodes to update the vector representation of the central node. Because of the processing in S1, this aggregation differs from that of existing graph neural networks, which aggregate only the neighbor information within the graph where the node is located: step S1 links the two graphs through the cross-graph adjacency matrix and the cross-graph relation triples, so the information of all linked nodes is aggregated. The calculation formula of the aggregation information is as follows:

[formula not reproduced]

wherein the symbols of the formula denote, respectively, the vector representation of the i-th entity e_i at the l-th iteration and the set of all neighbor nodes of e_i.

In step S3, a greedy algorithm is used to search the whole vector space, and pairs of entities from the different knowledge graphs whose node vector representations have a Euclidean distance smaller than a threshold are taken as aligned entities.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A knowledge graph entity alignment method based on a graph neural network is characterized by comprising the following specific steps:

S3, based on a greedy algorithm, searching the vector space for the entity with the highest semantic similarity to the entity represented by a given entity vector, and taking that entity as the aligned entity;

in step S1, during data preprocessing, all triples contained in the two knowledge graphs and the aligned entity seed pairs are processed, and vector representations of the entities and relations in the triples are obtained by random initialization using an embedding layer of the Keras neural network library; to ensure that the subsequently constructed graph neural network models the two knowledge graphs jointly, the two graphs are linked through the alignment seeds so that the initial data is fully exploited; the pre-aligned seeds are treated as aligned triples and an adjacency matrix is constructed; when constructing cross-graph relation triples, all existing triples are traversed, and for each triple that contains an entity appearing in the alignment seeds, that entity is replaced by its seed-aligned counterpart to generate a new triple, thereby constructing the cross-graph adjacency matrix and cross-graph relation triples and obtaining a preprocessing result that combines the information of the two knowledge graphs to be aligned;

S21, aggregating information: after the graph vector representation is initialized, performing an aggregation operation on each node in the graph, aggregating the information of neighbor nodes to update the vector representation of the central node, and aggregating the information of all connected nodes; the calculation formula of the aggregation information is as follows:

[formula not reproduced]

wherein the corresponding symbol denotes the set of all neighbor nodes of e_i;

the corresponding symbols respectively denote the vector representations of the randomly negative-sampled entities e'_i and e'_j after encoding by the graph neural network, and λ denotes the threshold; the loss function is used to back-propagate through all variables in the graph neural network and update the vector representation of the knowledge graph;

2. The knowledge graph entity alignment method based on a graph neural network as claimed in claim 1, wherein in step S3 a greedy algorithm is used to search the whole vector space, and pairs of entities from the different knowledge graphs whose node vector representations have a Euclidean distance smaller than a threshold are taken as aligned entities.