CN116610820A - Knowledge graph entity alignment method, device, equipment and storage medium

Knowledge graph entity alignment method, device, equipment and storage medium

Info

Publication number
CN116610820A
Authority
CN
China
Prior art keywords
knowledge
graph
entity
vector
entities
Prior art date
Legal status
Granted
Application number
CN202310896828.9A
Other languages
Chinese (zh)
Other versions
CN116610820B (en)
Inventor
左勇
谢喜林
王晓龙
Current Assignee
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date
Filing date
Publication date
Application filed by Athena Eyes Co Ltd filed Critical Athena Eyes Co Ltd
Priority to CN202310896828.9A
Publication of CN116610820A
Application granted
Publication of CN116610820B
Active legal status
Anticipated expiration

Classifications

    • G06F 16/367 - Information retrieval of unstructured textual data; creation of semantic tools (e.g. ontology or thesauri); Ontology
    • G06F 16/3344 - Information retrieval of unstructured textual data; querying; query execution using natural language analysis
    • G06F 16/35 - Information retrieval of unstructured textual data; Clustering; Classification
    • G06F 40/295 - Handling natural language data; natural language analysis; recognition of textual entities; Named entity recognition
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a knowledge graph entity alignment method, device, equipment and storage medium, relating to the technical field of artificial intelligence knowledge graphs, and comprising the following steps: acquiring a first vector of all entities and relations in a first knowledge graph and a second vector of all entities and relations in a second knowledge graph; normalizing the first vector and the second vector and performing vector weighting based on the weight value of the first vector and the weight value of the second vector respectively to obtain a first final alignment vector and a second final alignment vector; comparing the entity similarity between the first knowledge graph and the second knowledge graph to determine alignment entities in the first knowledge graph and the second knowledge graph; and then performing an entity alignment operation on the first knowledge graph and the second knowledge graph to obtain the target knowledge graph. In this way, the accuracy of alignment between multi-source co-referring entities can be enhanced.

Description

Knowledge graph entity alignment method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence knowledge graph technology, and in particular, to a method, an apparatus, a device, and a storage medium for aligning knowledge graph entities.
Background
Entity Alignment aims to find entities that refer to the same real-world thing in different knowledge graphs (Knowledge Graph). It is a key technology for knowledge graph fusion and is widely applied in the field of natural language processing. At present, existing entity alignment methods are mostly based on the translation distance of an entity and the graph network information of the entity and its one-hop neighbor nodes: entities from different knowledge graphs are converted into low-dimensional embedding vectors, and the similarity between the vectors is calculated to determine the alignment relationship between entities. Such entity-vector representations achieve good alignment in some scenarios. However, these methods often ignore the semantic differences in how entities are described and the differences between entities in graph network topology, which seriously interferes with entity matching and alignment and thereby affects alignment accuracy. Therefore, how to improve the accuracy of knowledge graph entity alignment is a problem that needs to be solved.
Disclosure of Invention
Accordingly, the present invention is directed to a method, apparatus, device and storage medium for aligning knowledge-graph entities, which can enhance the accuracy of alignment between multi-source co-referring entities. The specific scheme is as follows:
In a first aspect, the application discloses a knowledge-graph entity alignment method, which comprises the following steps:
embedding a first knowledge graph and a second knowledge graph into a preset same vector space to obtain a first vector of all entities and relations in the first knowledge graph and a second vector of all entities and relations in the second knowledge graph;
respectively constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each central entity in a first entity relation matrix corresponding to the first knowledge graph and a second entity relation matrix corresponding to the second knowledge graph and a target neighbor entity of the central entity;
normalizing the corresponding first vector and second vector based on the first KNN tree and the second KNN tree respectively, and then performing vector weighting processing on the processed first vector and the processed second vector based on the weight value of the first vector and the weight value of the second vector respectively, so as to obtain a first final alignment vector and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph respectively;
and comparing the entity similarity between the first knowledge graph and the second knowledge graph according to the first final alignment vector and the second final alignment vector, and determining alignment entities in the first knowledge graph and the second knowledge graph according to the comparison result, so as to perform an entity alignment operation on the first knowledge graph and the second knowledge graph based on the alignment entities to obtain a target knowledge graph.
Optionally, the embedding the first knowledge-graph and the second knowledge-graph into a preset same vector space to obtain a first vector of all entities and relations in the first knowledge-graph and a second vector of all entities and relations in the second knowledge-graph includes:
training the first initial graph convolution neural network model and the second initial graph convolution neural network model by using a preset minimum interval loss formula to obtain a corresponding first vector extraction model and a corresponding second vector extraction model; the first initial graph convolution neural network model and the second initial graph convolution neural network model share a weight matrix;
extracting first vectors of all entities and relations in the first knowledge-graph by using the first vector extraction model, and extracting second vectors of all entities and relations in the second knowledge-graph by using the second vector extraction model.
Optionally, after embedding the first knowledge-graph and the second knowledge-graph into a preset same vector space to obtain the first vectors of all the entities and the relations in the first knowledge-graph and the second vectors of all the entities and the relations in the second knowledge-graph, the method further includes:
determining the vector product similarity between all vectors in the first vector and the second vector.
Optionally, before constructing the first KNN tree corresponding to the first knowledge graph and the second KNN tree corresponding to the second knowledge graph, the method further includes:
collecting relation predicates of all entities and relations in the first knowledge graph and the second knowledge graph;
obtaining, through a preset score collection interface, scoring information of the relation predicates generated based on semantic information, and performing weight assignment on the relation predicates to obtain the weight value of the first vector and the weight value of the second vector; or performing knowledge representation learning and knowledge embedding on the first knowledge graph and the second knowledge graph respectively based on a preset knowledge embedding algorithm to obtain a first calculation vector and a second calculation vector, and calculating the weight values of the relation predicates according to a preset weight value calculation formula and the first and second calculation vectors to obtain the weight value of the first vector and the weight value of the second vector.
Optionally, the constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each center entity in the first entity relationship matrix corresponding to the first knowledge graph and the second entity relationship matrix corresponding to the second knowledge graph and the target neighbor entity of the center entity respectively includes:
constructing a first initial relation matrix corresponding to the first knowledge graph and a second initial relation matrix corresponding to the second knowledge graph based on all entities and relations in the first knowledge graph and the second knowledge graph;
comparing the weight value of the first vector and the weight value of the second vector with a first preset threshold value, and respectively adjusting all entities and relations in the first initial relation matrix and the second initial relation matrix according to the comparison result to obtain a first entity relation matrix and a second entity relation matrix;
and constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on the central entity in the first entity relation matrix and the second entity relation matrix and the target neighbor entity of the central entity respectively.
Optionally, the constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each center entity in the first entity relationship matrix corresponding to the first knowledge graph and the second entity relationship matrix corresponding to the second knowledge graph and the target neighbor entity of the center entity respectively includes:
traversing the first entity relation matrix and the second entity relation matrix to determine neighbor entities corresponding to central entities in the first entity relation matrix and the second entity relation matrix;
sorting the vector product similarity corresponding to the central entity and the corresponding neighbor entities, and determining the target neighbor entity corresponding to each central entity according to the sorting result;
and constructing a KNN tree according to the target neighbor entity corresponding to the central entity to obtain a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph.
Optionally, the comparing the entity similarity between the first knowledge-graph and the second knowledge-graph according to the first final alignment vector and the second final alignment vector, and determining the alignment entity in the first knowledge-graph and the second knowledge-graph according to the comparison result, includes:
determining the entity similarity between each entity in the first knowledge graph and each entity in the second knowledge graph according to the first final alignment vector and the second final alignment vector;
and comparing the entity similarity with a second preset threshold value, and determining the entity with the entity similarity value larger than the second preset threshold value as an alignment entity.
In a second aspect, the present application discloses a knowledge-graph entity alignment device, including:
the vector extraction module is used for embedding the first knowledge graph and the second knowledge graph into a preset same vector space to obtain first vectors of all entities and relations in the first knowledge graph and second vectors of all entities and relations in the second knowledge graph;
the KNN tree generation module is used for respectively constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each central entity and target neighbor entities of the central entity in a first entity relation matrix corresponding to the first knowledge graph and a second entity relation matrix corresponding to the second knowledge graph;
the vector obtaining module is used for normalizing the corresponding first vector and second vector based on the first KNN tree and the second KNN tree respectively, and then performing vector weighting processing on the processed first vector and the processed second vector based on the weight value of the first vector and the weight value of the second vector respectively, so as to obtain a first final alignment vector and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph respectively;
the alignment entity determining module is used for comparing the entity similarity between the first knowledge-graph and the second knowledge-graph according to the first final alignment vector and the second final alignment vector, and determining alignment entities in the first knowledge-graph and the second knowledge-graph according to the comparison result, so as to perform an entity alignment operation on the first knowledge-graph and the second knowledge-graph based on the alignment entities to obtain a target knowledge-graph.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the aforementioned knowledge graph entity alignment method.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program, which when executed by a processor implements the aforementioned knowledge-graph entity alignment method.
In the method provided by the application, a first knowledge graph and a second knowledge graph are embedded into the same preset vector space to obtain a first vector of all entities and relations in the first knowledge graph and a second vector of all entities and relations in the second knowledge graph; a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph are constructed based on each central entity in the first entity relation matrix corresponding to the first knowledge graph and in the second entity relation matrix corresponding to the second knowledge graph, and on the target neighbor entities of each central entity; the corresponding first vector and second vector are normalized based on the first KNN tree and the second KNN tree respectively, and vector weighting is then applied to the processed first vector and second vector based on the weight value of the first vector and the weight value of the second vector, so as to obtain a first final alignment vector and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph respectively; and the entity similarity between the first knowledge graph and the second knowledge graph is compared according to the first final alignment vector and the second final alignment vector, alignment entities in the first knowledge graph and the second knowledge graph are determined according to the comparison result, and an entity alignment operation is performed on the first knowledge graph and the second knowledge graph based on the alignment entities to obtain a target knowledge graph. By placing the two knowledge graphs in the same vector space, the two sets of vectors obtained from them become comparable; a KNN tree of entity vertices is constructed according to each central entity and its target neighbor entities in the entity relation matrices corresponding to the two knowledge graphs, and the constructed KNN trees are used to normalize the two sets of vectors respectively to obtain the final alignment vectors. The final alignment vectors fuse multiple kinds of information, especially main-relation and neighbor information, which effectively enhances the representation of each entity. Finally, the alignment entities in the first knowledge graph and the second knowledge graph are determined according to the final alignment vectors. In this way, the entity representation integrates the entity's own information, its main relation information and its neighbor entity information, which enhances the entity representation and effectively improves entity alignment accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a knowledge-graph entity alignment method disclosed in the present application;
FIG. 2 is a flowchart of a specific knowledge-graph entity alignment method disclosed in the present application;
FIG. 3 is a schematic structural diagram of a knowledge-graph entity alignment device disclosed in the present application;
FIG. 4 is a block diagram of an electronic device according to the present disclosure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In a knowledge graph, entities live in a graph network, and the entity itself together with the relationships and weights between entities are very important considerations for identifying an entity. Therefore, this embodiment specifically introduces a knowledge-graph entity alignment method that fully considers the entities of the knowledge graph, the network topology around each entity, and the relationship weights between an entity and its neighboring entities, and can enhance the accuracy of alignment between multi-source co-referring entities.
Referring to fig. 1, the embodiment of the application discloses a method for aligning a knowledge graph entity, which comprises the following steps:
step S11: embedding the first knowledge graph and the second knowledge graph into a preset same vector space to obtain a first vector of all entities and relations in the first knowledge graph and a second vector of all entities and relations in the second knowledge graph.
In this embodiment, embedding the first knowledge graph and the second knowledge graph into the same preset vector space to obtain the first vector of all entities and relations in the first knowledge graph and the second vector of all entities and relations in the second knowledge graph includes: training a first initial graph convolutional neural network model and a second initial graph convolutional neural network model by using a preset minimum-interval (margin) loss formula to obtain a corresponding first vector extraction model and a corresponding second vector extraction model, where the first initial graph convolutional neural network model and the second initial graph convolutional neural network model share a weight matrix; and extracting the first vectors of all entities and relations in the first knowledge graph by using the first vector extraction model, and extracting the second vectors of all entities and relations in the second knowledge graph by using the second vector extraction model. That is, the first knowledge graph and the second knowledge graph contain a portion of known aligned entities and relations, and these triples effectively serve as bridges for aligning the entities of the two graphs. The aligned triples in the first knowledge graph and the second knowledge graph are replaced with consistent triples. Following the GCN-Align (Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks) algorithm, two GCN (Graph Convolutional Network) models that share a weight matrix are trained with the minimum-interval (margin) loss, whose formula is shown below.
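The original formula is given in the patent as an image and is not reproduced in this text. As a reference point, the margin-based loss of the cited GCN-Align algorithm has the following general form; this is a reconstruction from the published algorithm and the symbol descriptions that follow, not the patent's exact expression:

```latex
% Margin-based (minimum interval) alignment loss in the style of GCN-Align.
% S   : set of known aligned entity pairs (e_1, e_2) across the two graphs
% S'  : set of negative pairs obtained by corrupting e_1 or e_2
% h(e): embedding of entity e produced by the two weight-sharing GCNs
% \gamma : margin hyperparameter; [x]_+ = \max(0, x)
% The vector dimension d mentioned in the text may additionally normalize the
% distances; its exact role is not recoverable from this excerpt.
f = \sum_{(e_1, e_2) \in S} \; \sum_{(e_1', e_2') \in S'}
    \Bigl[ \lVert h(e_1) - h(e_2) \rVert_2 + \gamma
           - \lVert h(e_1') - h(e_2') \rVert_2 \Bigr]_{+}
```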
where e1 and e2 refer to the small number of entities and relations in the two graphs that are known to be aligned, h(e) is the extracted vector, f is the loss value, d is the vector dimension, and ‖·‖₂ is the L2 norm. Through the trained vector representation, the entities of the first knowledge graph and the second knowledge graph are embedded into a unified vector space, so that the entities and relations of both graphs obtain low-dimensional vector representations in the same representation space. The entity and relation vectors are optimized during training by minimizing the L2 distance. Because the first knowledge graph and the second knowledge graph are represented by vectors in the same vector space, global information about entities and relations is fused into the vectors, and, owing to the characteristics of the GCN, the vectors also incorporate graph topology information. Since the first knowledge graph and the second knowledge graph contain some known aligned triples, preliminary links are also established in the representation vectors. In this way, vector representations of all entities and relations in the first knowledge graph and the second knowledge graph are obtained. After the first knowledge graph and the second knowledge graph are embedded into the same preset vector space to obtain the first vectors of all entities and relations in the first knowledge graph and the second vectors of all entities and relations in the second knowledge graph, the method further includes: determining the vector product similarity between all vectors in the first vectors and the second vectors. For example, if the first vectors contain 10 vectors and the second vectors contain 10 vectors, there are 20 vectors in total, and taking the product of every two of them yields 190 vector product similarities.
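As a concrete illustration of this pairwise vector product similarity computation, a minimal sketch follows; the function and variable names are illustrative and not taken from the patent. For 10 + 10 vectors it yields the 190 similarities of the example.

```python
import numpy as np

def pairwise_vector_product_similarity(first_vectors: np.ndarray,
                                       second_vectors: np.ndarray) -> dict:
    """Compute the vector product (dot-product) similarity between every
    unordered pair drawn from the union of the two vector sets.

    first_vectors, second_vectors: arrays of shape (n1, d) and (n2, d).
    Returns a dict mapping index pairs (i, j) to similarity values.
    """
    all_vectors = np.vstack([first_vectors, second_vectors])   # (n1 + n2, d)
    n = all_vectors.shape[0]
    similarities = {}
    for i in range(n):
        for j in range(i + 1, n):                               # unordered pairs only
            similarities[(i, j)] = float(all_vectors[i] @ all_vectors[j])
    return similarities

# With 10 + 10 vectors there are C(20, 2) = 190 pairwise similarities,
# matching the example in the text.
rng = np.random.default_rng(0)
sims = pairwise_vector_product_similarity(rng.normal(size=(10, 8)),
                                          rng.normal(size=(10, 8)))
assert len(sims) == 190
```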
Step S12: and constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each central entity in the first entity relation matrix corresponding to the first knowledge graph and the second entity relation matrix corresponding to the second knowledge graph and the target neighbor entity of the central entity.
In this embodiment, a first KNN (K-Nearest Neighbor) tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph are constructed based on each central entity in the first entity relation matrix corresponding to the first knowledge graph and in the second entity relation matrix corresponding to the second knowledge graph, and on the target neighbor entities of each central entity. The first entity relation matrix and the second entity relation matrix are adjusted according to the first vectors and the second vectors, and some noise and some triples with low confidence are removed. The first entity relation matrix and the second entity relation matrix are then traversed to find the topN entities nearest to each central entity, sorted from large to small by the values of the central entity and the neighbor entities in the relation matrix. It should be noted that the number N of topN entities may be set according to the practical situation, and the value of a central entity and a neighbor entity in the relation matrix is their vector product similarity. If there are triples with a self-loop relationship, the entity itself is excluded from the ordering. In this way, each entity and its topN neighbor entities form a KNN tree structure in which the relationship information of the triples is tightly combined: the stronger the relationship, the closer the neighbor entity is to the central entity, which better assists the alignment of the central entity. A first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph are thus constructed.
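A minimal sketch of this neighbor-selection step is given below; the names and the flattening of the relation matrix to a two-dimensional entity-entity view are simplifying assumptions for illustration, not details from the patent.

```python
import numpy as np

def build_knn_tree(relation_matrix: np.ndarray, top_n: int = 5) -> dict:
    """For each central entity, keep the top_n neighbor entities with the
    largest values in the entity relation matrix (vector product similarity),
    sorted from large to small.  Self-loop entries are excluded from the ranking.

    relation_matrix: (n_entities, n_entities) matrix whose (i, j) entry is the
    similarity / relation strength between entity i and entity j.
    Returns {central_entity_index: [neighbor indices, strongest first]}.
    """
    n = relation_matrix.shape[0]
    knn_tree = {}
    for center in range(n):
        scores = relation_matrix[center].copy()
        scores[center] = -np.inf                  # exclude the self-loop entity
        order = np.argsort(scores)[::-1]          # sort from large to small
        neighbors = [int(j) for j in order[:top_n] if scores[j] > -np.inf]
        knn_tree[center] = neighbors
    return knn_tree
```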
Step S13: and respectively carrying out normalization processing on the corresponding first vector and the corresponding second vector based on the first KNN tree and the second KNN tree, and then respectively carrying out vector weighting processing on the corresponding processed first vector and the processed second vector based on the weight value of the first vector and the weight value of the second vector so as to obtain a first final pair Ji Xiangliang and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph respectively.
In this embodiment, the first vector and the second vector are first normalized; a relation scoring weight is then fused in, and the normalized relation vector is multiplied by the weight to obtain a weighted relation vector. The weighted relation vectors between the central entity and all of its neighbor entities are aggregated, and the vector of the central entity is then aggregated in as the final vector for entity alignment. The relation scoring weight is obtained by collecting the relation predicates of all entities and relations in the first knowledge graph and the second knowledge graph, and then either obtaining, through a preset score collection interface, scoring information of the relation predicates generated from semantic information and performing weight assignment on the relation predicates to obtain the weight value of the first vector and the weight value of the second vector; or performing knowledge representation learning and knowledge embedding on the first knowledge graph and the second knowledge graph respectively based on a preset knowledge embedding algorithm to obtain a first calculation vector and a second calculation vector, and calculating the weight values of the relation predicates according to a preset weight value calculation formula and the first and second calculation vectors to obtain the weight value of the first vector and the weight value of the second vector. Each node in the first KNN tree and the second KNN tree then obtains the final alignment vector of its entity according to this vector aggregation rule. The final alignment vector of an entity fuses multiple kinds of information, especially main-relation and neighbor information, which effectively enhances the representation of the entity.
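A minimal sketch of the normalize, weight and aggregate rule described above follows. The exact aggregation (for example, sum versus mean, and how the central vector is combined) is not spelled out in this excerpt, so the additive form and the names used here are assumptions.

```python
import numpy as np

def final_alignment_vector(center_vec: np.ndarray,
                           neighbor_vecs: list[np.ndarray],
                           relation_weights: list[float]) -> np.ndarray:
    """Aggregate a central entity with its KNN-tree neighbors into a final
    alignment vector: L2-normalize every vector, multiply each neighbor
    (relation) vector by its relation scoring weight, sum the weighted
    neighbor vectors, and then aggregate the central entity's own vector.
    """
    def l2_normalize(v: np.ndarray) -> np.ndarray:
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    center = l2_normalize(center_vec)
    weighted = [w * l2_normalize(v) for v, w in zip(neighbor_vecs, relation_weights)]
    aggregated_neighbors = np.sum(weighted, axis=0) if weighted else np.zeros_like(center)
    return center + aggregated_neighbors   # simple additive aggregation rule (assumed)
```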
Step S14: and comparing entity similarity of the first knowledge graph with the second knowledge graph according to a first final alignment vector and the second final alignment vector, and determining alignment entities in the first knowledge graph and the second knowledge graph according to a comparison result so as to perform entity alignment operation on the basis of the first knowledge graph and the second knowledge graph of the alignment entities to obtain a target knowledge graph.
In this embodiment, comparing the entity similarity between the first knowledge graph and the second knowledge graph according to the first final alignment vector and the second final alignment vector, and determining the alignment entities in the first knowledge graph and the second knowledge graph according to the comparison result, includes: determining the entity similarity between each entity in the first knowledge graph and each entity in the second knowledge graph according to the first final alignment vector and the second final alignment vector; and comparing the entity similarity with a second preset threshold, and determining entities whose similarity value is greater than the second preset threshold as alignment entities. That is, each entity of the first knowledge graph is traversed and compared, by vector similarity, with all entities in the second knowledge graph, and the second preset threshold is set according to the similarity between the known aligned entities and relations. A pair of entities whose similarity is greater than the threshold is determined to be a pair of aligned entities, i.e., alignment entities. An entity alignment operation is then performed on the first knowledge graph and the second knowledge graph based on the alignment entities to obtain the target knowledge graph.
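A minimal sketch of this threshold comparison follows; cosine similarity over the final alignment vectors is assumed here as the concrete similarity measure, and the names are illustrative.

```python
import numpy as np

def find_aligned_entities(vectors_kg1: np.ndarray,
                          vectors_kg2: np.ndarray,
                          threshold: float) -> list[tuple[int, int]]:
    """Traverse every entity of the first knowledge graph, compare it with all
    entities of the second knowledge graph by similarity of their final
    alignment vectors, and keep pairs whose similarity exceeds the threshold.
    """
    def normalize(m: np.ndarray) -> np.ndarray:
        return m / np.clip(np.linalg.norm(m, axis=1, keepdims=True), 1e-12, None)

    sims = normalize(vectors_kg1) @ normalize(vectors_kg2).T   # (n1, n2) similarities
    aligned = [(i, j) for i in range(sims.shape[0])
                      for j in range(sims.shape[1])
                      if sims[i, j] > threshold]
    return aligned
```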
It can be seen that in this embodiment, a first knowledge graph and a second knowledge graph are first embedded into the same preset vector space to obtain a first vector of all entities and relations in the first knowledge graph and a second vector of all entities and relations in the second knowledge graph; a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph are constructed based on each central entity in the first entity relation matrix corresponding to the first knowledge graph and in the second entity relation matrix corresponding to the second knowledge graph, and on the target neighbor entities of each central entity; the corresponding first vector and second vector are normalized based on the first KNN tree and the second KNN tree respectively, and vector weighting is then applied to the processed first vector and second vector based on the weight value of the first vector and the weight value of the second vector, so as to obtain a first final alignment vector and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph respectively; and the entity similarity between the first knowledge graph and the second knowledge graph is compared according to the first final alignment vector and the second final alignment vector, alignment entities in the first knowledge graph and the second knowledge graph are determined according to the comparison result, and an entity alignment operation is performed on the first knowledge graph and the second knowledge graph based on the alignment entities to obtain a target knowledge graph. By placing the two knowledge graphs in the same vector space, the two sets of vectors obtained from them become comparable; a KNN tree of entity vertices is constructed according to each central entity and its target neighbor entities in the entity relation matrices corresponding to the two knowledge graphs, and the constructed KNN trees are used to normalize the two sets of vectors respectively to obtain the final alignment vectors. The final alignment vectors fuse multiple kinds of information, especially main-relation and neighbor information, which effectively enhances the representation of each entity. Finally, the alignment entities in the first knowledge graph and the second knowledge graph are determined according to the final alignment vectors. In this way, the entity representation integrates the entity's own information, its main relation information and its neighbor entity information, which enhances the entity representation and effectively improves entity alignment accuracy.
The above embodiment specifically introduces a knowledge graph entity alignment method, and the present embodiment will specifically describe a process of KNN tree construction.
Referring to fig. 2, the embodiment of the application discloses a specific knowledge-graph entity alignment method, which comprises the following steps:
step S21: and collecting relation predicates of all entities and relations in the first knowledge graph and the second knowledge graph.
In this embodiment, the relation predicates of all entities and relations in the first knowledge graph and the second knowledge graph are collected, that is, the relation predicates of all triples in the first knowledge graph and the second knowledge graph. It should be noted that the triples cover all of the entities and relations.
Step S22: and acquiring scoring information of the relation predicates generated based on semantic information based on a preset scoring collection interface, and carrying out weight assignment processing on the relation predicates to obtain a weight value of the first vector and a weight value of the second vector.
In this embodiment, manual scoring (0 to 1.0) is performed according to the semantic information of the relation predicates. The scoring criterion uses the semantics of a relation predicate to score how objectively it confirms the entity. For example, for the address relationship, relation predicates describing the address such as "located" and "address" are both predicates of address certainty and are scored 1.0. Scoring information of the relation predicates generated from this semantic information is then obtained through a preset score collection interface. Because manually calibrated weights carry a degree of subjectivity, calibration accuracy can be improved by multi-annotator labeling and voting. Finally, weight assignment is performed on the relation predicates according to the resulting, more accurate scoring information to obtain the weight value of the first vector and the weight value of the second vector.
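A minimal sketch of combining multi-annotator predicate scores is shown below; the text only states that multi-person labeling and voting improve calibration accuracy, so the averaging rule and the names used here are assumptions.

```python
from statistics import mean

def predicate_weight_from_votes(annotator_scores: dict[str, list[float]]) -> dict[str, float]:
    """Combine manual 0-1.0 scores for each relation predicate from several
    annotators into a single weight; averaging the votes is one simple way to
    reduce the subjectivity of any single annotator."""
    return {predicate: round(mean(scores), 3)
            for predicate, scores in annotator_scores.items()}

# Example: address-certainty predicates tend to receive the maximum score.
weights = predicate_weight_from_votes({
    "located": [1.0, 1.0, 1.0],
    "address": [1.0, 0.9, 1.0],
    "visited": [0.3, 0.4, 0.2],
})
```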
Step S23: and respectively carrying out knowledge representation learning and knowledge embedding on the first knowledge graph and the second knowledge graph based on a preset knowledge embedding algorithm to obtain a first calculation vector and a second calculation vector, and calculating the weight value of the relation predicate according to a preset weight value calculation formula and the first calculation vector and the second calculation vector to obtain the weight value of the first vector and the weight value of the second vector.
In this embodiment, knowledge representation learning and knowledge embedding are performed on the first knowledge graph and the second knowledge graph based on a preset knowledge embedding algorithm to obtain a first calculation vector and a second calculation vector, and the weight values of the relation predicates are calculated according to a preset weight value calculation formula and the first and second calculation vectors to obtain the weight value of the first vector and the weight value of the second vector. Specifically, the TransE (Translating Embeddings) knowledge embedding algorithm is used to perform knowledge embedding representation learning on the first knowledge graph and the second knowledge graph, and vector representations of the entities and relations are trained. All relation predicates of the triples in the first knowledge graph and the second knowledge graph are collected, and the triples are classified according to their relation predicate: as many relation predicates as there are, that many classes are formed. The sum of each class is counted; the sum S_k of the k-th class is S_k = Σ ‖h + r - t‖₂ over all triples (h, r, t) of the k-th class, where h is the head entity vector, r is the relation vector, t is the tail entity vector, and ‖·‖₂ is the L2 norm. The coefficient of the k-th class is c_k = S_k / n, where n is the number of triples of the k-th class. For all class coefficients c_k, the weight w_k is w_k = c_k / (c_1 + c_2 + ... + c_K), where the denominator is the sum of all class coefficients. In this way, the weight value of the first vector and the weight value of the second vector can be obtained.
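A minimal sketch of this TransE-based weight computation, following the class-sum, class-coefficient and normalized-weight scheme as reconstructed above, with illustrative names:

```python
import numpy as np
from collections import defaultdict

def predicate_weights_from_transe(triples, entity_vecs, relation_vecs):
    """Compute a weight per relation predicate from TransE embeddings.

    triples: iterable of (head, relation, tail) identifiers.
    entity_vecs / relation_vecs: dicts mapping identifiers to numpy vectors.
    Returns {relation predicate: normalized weight}.
    """
    class_sums, class_counts = defaultdict(float), defaultdict(int)
    for h, r, t in triples:
        residual = entity_vecs[h] + relation_vecs[r] - entity_vecs[t]
        class_sums[r] += float(np.linalg.norm(residual))      # L2 norm per triple
        class_counts[r] += 1

    coefficients = {r: class_sums[r] / class_counts[r] for r in class_sums}
    total = sum(coefficients.values())
    return {r: c / total for r, c in coefficients.items()}    # normalized class weights
```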
Step S24: constructing a first initial relation matrix corresponding to the first knowledge graph and a second initial relation matrix corresponding to the second knowledge graph based on all entities and relations in the first knowledge graph and the second knowledge graph; comparing the weight value of the first vector and the weight value of the second vector with a first preset threshold value, and respectively adjusting all entities and relations in the first initial relation matrix and the second initial relation matrix according to the comparison result to obtain a first entity relation matrix and a second entity relation matrix.
In this embodiment, a first initial relation matrix corresponding to the first knowledge graph and a second initial relation matrix corresponding to the second knowledge graph are constructed based on the triples in the first knowledge graph and the second knowledge graph; the triple weight values are compared with a first preset threshold, and the triples in the first initial relation matrix and the second initial relation matrix are adjusted according to the comparison result to obtain the first entity relation matrix and the second entity relation matrix. That is, a three-dimensional (x, y, z) entity and relation matrix is constructed: the x direction is built from the head entities of the triples, the y direction from the tail entities, and the z direction from the relations, forming a knowledge graph matrix. Assuming the first knowledge graph has 100 head entities, 100 tail entities and 10 relations, a 100×100×10 knowledge graph matrix can be formed. Values in the matrix are assigned using the triple weight values. According to the data conditions, the values of the known aligned triple relations in the matrix are kept, a separate threshold is set for the other values, values higher than the threshold are kept, and the remaining values are cleared. In this way, some noise and some triples with low confidence are removed.
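A minimal sketch of building and pruning the three-dimensional entity relation matrix described above; the function and parameter names are illustrative assumptions.

```python
import numpy as np

def build_entity_relation_matrix(triples, weights, n_head, n_tail, n_rel,
                                 known_aligned, threshold):
    """Build the (head x tail x relation) knowledge graph matrix, assign each
    cell the weight of its triple, keep known aligned triples unconditionally,
    and clear cells whose weight does not exceed the threshold.

    triples: iterable of (head_idx, tail_idx, rel_idx).
    weights: dict mapping a triple to its weight value.
    known_aligned: set of triples that participate in known alignments.
    """
    matrix = np.zeros((n_head, n_tail, n_rel))
    for triple in triples:
        h, t, r = triple
        matrix[h, t, r] = weights[triple]
    for h in range(n_head):
        for t in range(n_tail):
            for r in range(n_rel):
                if (h, t, r) in known_aligned:
                    continue                      # always keep known aligned triples
                if matrix[h, t, r] <= threshold:
                    matrix[h, t, r] = 0.0         # clear noisy / low-confidence triples
    return matrix
```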
Step S25: traversing the first entity relation matrix and the second entity relation matrix to determine neighbor entities corresponding to central entities in the first entity relation matrix and the second entity relation matrix.
In this embodiment, the first entity relation matrix and the second entity relation matrix are traversed to determine the neighbor entities corresponding to each central entity in the first entity relation matrix and the second entity relation matrix; that is, the two matrices are traversed respectively and each central entity finds its corresponding neighbor entities. It should be noted that a neighbor entity corresponding to a central entity in the first entity relation matrix may come from the first entity relation matrix or from the second entity relation matrix.
Step S26: sorting the vector product similarity corresponding to the central entity and the corresponding neighbor entities, and determining the target neighbor entity corresponding to each central entity according to the sorting result; and constructing a KNN tree according to the target neighbor entity corresponding to the central entity to obtain a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph.
In this embodiment, the vector product similarities corresponding to each central entity and its neighbor entities are sorted, and the target neighbor entities corresponding to each central entity are determined according to the sorting result; a KNN tree is then constructed from the target neighbor entities of each central entity to obtain the first KNN tree corresponding to the first knowledge graph and the second KNN tree corresponding to the second knowledge graph. That is, the values of the central entity and the neighbor entities in the relation matrix are sorted from large to small. If there are triples with a self-loop relationship, the entity itself is excluded from the ordering. In this way, each entity and its topN neighbor entities form a KNN tree structure in which the relationship information of the triples is tightly combined: the stronger the relationship, the closer the neighbor entity is to the central entity, which better assists the alignment of the central entity. It should be noted that the number N of topN entities may be set according to the practical situation, and the value of a central entity and a neighbor entity in the relation matrix is their vector product similarity.
In this embodiment, the relation predicates of all entities and relations in the first knowledge graph and the second knowledge graph are collected; scoring information of the relation predicates generated from semantic information is obtained through a preset score collection interface, and weight assignment is performed on the relation predicates to obtain the weight value of the first vector and the weight value of the second vector; or knowledge representation learning and knowledge embedding are performed on the first knowledge graph and the second knowledge graph respectively based on a preset knowledge embedding algorithm to obtain a first calculation vector and a second calculation vector, and the weight values of the relation predicates are calculated according to a preset weight value calculation formula and the first and second calculation vectors to obtain the weight value of the first vector and the weight value of the second vector. A first initial relation matrix corresponding to the first knowledge graph and a second initial relation matrix corresponding to the second knowledge graph are constructed based on the triples in the first knowledge graph and the second knowledge graph; the triple weight values are compared with a first preset threshold, and the triples in the first initial relation matrix and the second initial relation matrix are adjusted according to the comparison result to obtain the first entity relation matrix and the second entity relation matrix. The first entity relation matrix and the second entity relation matrix are traversed to determine the neighbor entities corresponding to each central entity; the vector product similarities corresponding to each central entity and its neighbor entities are sorted, and the target neighbor entities of each central entity are determined according to the sorting result; and a KNN tree is constructed from the target neighbor entities of each central entity to obtain the first KNN tree corresponding to the first knowledge graph and the second KNN tree corresponding to the second knowledge graph. In this way, each entity and its topN neighbor entities form a KNN tree structure in which the relationship information of the triples is tightly combined: the stronger the relationship, the closer the neighbor entity is to the central entity, which better assists the alignment of the central entity.
As described with reference to fig. 3, the embodiment of the present application further correspondingly discloses a knowledge-graph entity alignment device, including:
the vector extraction module 11 is configured to embed a first knowledge graph and a second knowledge graph into a preset same vector space, so as to obtain a first vector of all entities and relationships in the first knowledge graph and a second vector of all entities and relationships in the second knowledge graph;
the KNN tree generating module 12 is configured to construct a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each central entity in the first entity relation matrix corresponding to the first knowledge graph and the second entity relation matrix corresponding to the second knowledge graph, and a target neighbor entity of the central entity;
a vector obtaining module 13, configured to normalize the corresponding first vector and second vector based on the first KNN tree and the second KNN tree, and then perform vector weighting processing on the processed first vector and the processed second vector based on the weight value of the first vector and the weight value of the second vector, so as to obtain a first final alignment vector and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph, respectively;
The alignment entity determining module 14 is configured to compare entity similarities of the first knowledge-graph and the second knowledge-graph according to a first final alignment vector and the second final alignment vector, and determine alignment entities in the first knowledge-graph and the second knowledge-graph according to a comparison result, so as to perform entity alignment operation based on the alignment entities and the first knowledge-graph and the second knowledge-graph to obtain a target knowledge-graph.
It can be seen that in this embodiment, a first knowledge graph and a second knowledge graph are first embedded into the same preset vector space to obtain a first vector of all entities and relations in the first knowledge graph and a second vector of all entities and relations in the second knowledge graph; a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph are constructed based on each central entity in the first entity relation matrix corresponding to the first knowledge graph and in the second entity relation matrix corresponding to the second knowledge graph, and on the target neighbor entities of each central entity; the corresponding first vector and second vector are normalized based on the first KNN tree and the second KNN tree respectively, and vector weighting is then applied to the processed first vector and second vector based on the weight value of the first vector and the weight value of the second vector, so as to obtain a first final alignment vector and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph respectively; and the entity similarity between the first knowledge graph and the second knowledge graph is compared according to the first final alignment vector and the second final alignment vector, alignment entities in the first knowledge graph and the second knowledge graph are determined according to the comparison result, and an entity alignment operation is performed on the first knowledge graph and the second knowledge graph based on the alignment entities to obtain a target knowledge graph. By placing the two knowledge graphs in the same vector space, the two sets of vectors obtained from them become comparable; a KNN tree of entity vertices is constructed according to each central entity and its target neighbor entities in the entity relation matrices corresponding to the two knowledge graphs, and the constructed KNN trees are used to normalize the two sets of vectors respectively to obtain the final alignment vectors. The final alignment vectors fuse multiple kinds of information, especially main-relation and neighbor information, which effectively enhances the representation of each entity. Finally, the alignment entities in the first knowledge graph and the second knowledge graph are determined according to the final alignment vectors. In this way, the entity representation integrates the entity's own information, its main relation information and its neighbor entity information, which enhances the entity representation and effectively improves entity alignment accuracy.
In some specific embodiments, the vector extraction module 11 may specifically include:
the model training unit is used for training the first initial graph convolution neural network model and the second initial graph convolution neural network model by utilizing a preset minimum interval loss formula so as to obtain a corresponding first vector extraction model and a corresponding second vector extraction model; the first initial graph convolution neural network model and the second initial graph convolution neural network model share a weight matrix;
the vector conversion unit is used for extracting first vectors of all entities and relations in the first knowledge graph by using the first vector extraction model, and extracting second vectors of all entities and relations in the second knowledge graph by using the second vector extraction model.
In some specific embodiments, the knowledge-graph entity alignment device may further include:
and the vector product similarity determining module is used for determining the vector product similarity between all the vectors in the first vector and the second vector.
In some specific embodiments, the knowledge-graph entity alignment device may further include:
the relation predicate collection module is used for collecting relation predicates of all entities and relations in the first knowledge graph and the second knowledge graph;
The first weight value determining module is used for obtaining, through a preset score collection interface, the scoring information of the relation predicates generated based on semantic information, and performing weight assignment on the relation predicates to obtain the weight value of the first vector and the weight value of the second vector;
the second weight value determining module is used for respectively carrying out knowledge representation learning and knowledge embedding on the first knowledge graph and the second knowledge graph based on a preset knowledge embedding algorithm to obtain a first calculation vector and a second calculation vector, and calculating the weight value of the relation predicate according to a preset weight value calculation formula and the first calculation vector and the second calculation vector to obtain the weight value of the first vector and the weight value of the second vector.
In some specific embodiments, the KNN tree generating module 12 may specifically include:
a relationship matrix construction unit, configured to construct a first initial relationship matrix corresponding to the first knowledge-graph and a second initial relationship matrix corresponding to the second knowledge-graph based on all entities and relationships in the first knowledge-graph and the second knowledge-graph;
the matrix adjustment unit is used for comparing the weight value of the first vector and the weight value of the second vector with a first preset threshold value, and respectively adjusting all entities and relations in the first initial relation matrix and the second initial relation matrix according to comparison results to obtain a first entity relation matrix and a second entity relation matrix;
And the KNN tree construction submodule is used for constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on the central entity in the first entity relation matrix and the second entity relation matrix and the target neighbor entity of the central entity respectively.
In some specific embodiments, the KNN tree construction submodule may specifically include:
the neighbor entity determining unit is used for traversing the first entity relation matrix and the second entity relation matrix to determine neighbor entities corresponding to central entities in the first entity relation matrix and the second entity relation matrix;
the vector product similarity sorting unit is used for sorting the vector product similarity corresponding to the central entity and the corresponding neighbor entities, and determining the target neighbor entity corresponding to each central entity according to the sorting result;
and the KNN tree construction unit is used for constructing a KNN tree according to the target neighbor entity corresponding to the central entity to obtain a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph.
In some specific embodiments, the alignment entity determining module 14 may specifically include:
The entity similarity determining unit is used for determining the entity similarity between each entity in the first knowledge graph and each entity in the second knowledge graph according to the first final alignment vector and the second final alignment vector;
and the alignment entity determining unit is used for comparing the entity similarity with a second preset threshold value and determining an entity with the entity similarity value larger than the second preset threshold value as an alignment entity.
Further, an embodiment of the present application also discloses an electronic device. Fig. 4 is a block diagram of an electronic device 20 according to an exemplary embodiment, and the contents of the figure are not to be construed as limiting the scope of application of the present application in any way.
Fig. 4 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the knowledge graph entity alignment method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting data to the outside, and its specific interface type may be selected according to specific application requirements, which is not limited herein.
The memory 22, as a carrier for storing resources, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary or permanent.
The operating system 221, which may be Windows Server, NetWare, Unix, Linux, or the like, is used for managing and controlling the various hardware devices on the electronic device 20 and the computer program 222. In addition to the computer program for performing the knowledge graph entity alignment method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.
Further, the present application also discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the foregoing knowledge graph entity alignment method. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. As for the device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description thereof is relatively brief, and reference may be made to the description of the method section for the relevant parts.
Those of skill in the art would further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both; to clearly illustrate the interchangeability of hardware and software, the various illustrative units and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be disposed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between these entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The knowledge graph entity alignment method, device, equipment and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for those skilled in the art, there will be variations in the specific embodiments and the scope of application according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. A knowledge graph entity alignment method, characterized by comprising the following steps:
embedding a first knowledge graph and a second knowledge graph into a preset same vector space to obtain a first vector of all entities and relations in the first knowledge graph and a second vector of all entities and relations in the second knowledge graph;
respectively constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each central entity in a first entity relation matrix corresponding to the first knowledge graph and a second entity relation matrix corresponding to the second knowledge graph and a target neighbor entity of the central entity;
normalizing the corresponding first vector and second vector based on the first KNN tree and the second KNN tree respectively, and then performing vector weighting processing on the corresponding processed first vector and processed second vector based on the weight value of the first vector and the weight value of the second vector respectively, so as to obtain a first final alignment vector and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph respectively;
and comparing entity similarity between the first knowledge graph and the second knowledge graph according to the first final alignment vector and the second final alignment vector, and determining alignment entities in the first knowledge graph and the second knowledge graph according to the comparison result, so as to perform an entity alignment operation on the first knowledge graph and the second knowledge graph based on the alignment entities to obtain a target knowledge graph.
2. The method for aligning a knowledge-graph entity according to claim 1, wherein the embedding the first knowledge-graph and the second knowledge-graph into a predetermined identical vector space to obtain a first vector of all entities and relationships in the first knowledge-graph and a second vector of all entities and relationships in the second knowledge-graph includes:
Training the first initial graph convolution neural network model and the second initial graph convolution neural network model by using a preset minimum interval loss formula to obtain a corresponding first vector extraction model and a corresponding second vector extraction model; the first initial graph convolution neural network model and the second initial graph convolution neural network model share a weight matrix;
extracting first vectors of all entities and relations in the first knowledge-graph by using the first vector extraction model, and extracting second vectors of all entities and relations in the second knowledge-graph by using the second vector extraction model.
3. The method for aligning a knowledge-graph entity according to claim 1, wherein after embedding the first knowledge-graph and the second knowledge-graph into a preset same vector space to obtain a first vector of all entities and relationships in the first knowledge-graph and a second vector of all entities and relationships in the second knowledge-graph, further comprises:
and determining the similarity of the vector products between all the vectors in the first vector and the second vector.
4. The method for aligning a knowledge-graph entity according to claim 3, wherein before constructing the first KNN tree corresponding to the first knowledge-graph and the second KNN tree corresponding to the second knowledge-graph, the method further comprises:
Collecting relation predicates of all entities and relations in the first knowledge graph and the second knowledge graph;
obtaining, through a preset scoring collection interface, scoring information of the relation predicates generated based on semantic information, and carrying out weight assignment processing on the relation predicates to obtain a weight value of the first vector and a weight value of the second vector; or, respectively performing knowledge representation learning and knowledge embedding on the first knowledge graph and the second knowledge graph based on a preset knowledge embedding algorithm to obtain a first calculation vector and a second calculation vector, and calculating the weight value of each relation predicate from the first calculation vector and the second calculation vector according to a preset weight value calculation formula, to obtain the weight value of the first vector and the weight value of the second vector.
5. The method for aligning a knowledge-graph entity according to claim 4, wherein constructing a first KNN tree corresponding to the first knowledge-graph and a second KNN tree corresponding to the second knowledge-graph based on each center entity in a first entity-relationship matrix corresponding to the first knowledge-graph and a second entity-relationship matrix corresponding to the second knowledge-graph and a target neighbor entity of the center entity, respectively, comprises:
Constructing a first initial relation matrix corresponding to the first knowledge graph and a second initial relation matrix corresponding to the second knowledge graph based on all entities and relations in the first knowledge graph and the second knowledge graph;
comparing the weight value of the first vector and the weight value of the second vector with a first preset threshold value, and respectively adjusting all entities and relations in the first initial relation matrix and the second initial relation matrix according to the comparison result to obtain a first entity relation matrix and a second entity relation matrix;
and constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each central entity in the first entity relation matrix and in the second entity relation matrix, and on the target neighbor entities of each central entity, respectively.
6. The method for aligning knowledge-graph entities according to any one of claims 3 to 5, wherein the constructing a first KNN tree corresponding to the first knowledge-graph and a second KNN tree corresponding to the second knowledge-graph based on each center entity in a first entity-relationship matrix corresponding to the first knowledge-graph and a second entity-relationship matrix corresponding to the second knowledge-graph and a target neighbor entity of the center entity, respectively, includes:
Traversing the first entity relation matrix and the second entity relation matrix to determine neighbor entities corresponding to central entities in the first entity relation matrix and the second entity relation matrix;
sorting the vector product similarity corresponding to the central entity and the corresponding neighbor entities, and determining the target neighbor entity corresponding to each central entity according to the sorting result;
and constructing a KNN tree according to the target neighbor entity corresponding to the central entity to obtain a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph.
7. The method for aligning a knowledge-graph entity according to claim 1, wherein the comparing the entity similarity between the first knowledge-graph and the second knowledge-graph according to the first final alignment vector and the second final alignment vector, and determining the aligned entity in the first knowledge-graph and the second knowledge-graph according to the comparison result, includes:
determining entity similarity between each entity in the first knowledge graph and each entity in the second knowledge graph according to the first final alignment vector and the second final alignment vector;
And comparing the entity similarity with a second preset threshold value, and determining the entity with the entity similarity value larger than the second preset threshold value as an alignment entity.
8. A knowledge-graph entity alignment device, comprising:
the vector extraction module is used for embedding the first knowledge graph and the second knowledge graph into a preset same vector space to obtain first vectors of all entities and relations in the first knowledge graph and second vectors of all entities and relations in the second knowledge graph;
the KNN tree generation module is used for respectively constructing a first KNN tree corresponding to the first knowledge graph and a second KNN tree corresponding to the second knowledge graph based on each central entity and target neighbor entities of the central entity in a first entity relation matrix corresponding to the first knowledge graph and a second entity relation matrix corresponding to the second knowledge graph;
the vector obtaining module is used for carrying out normalization processing on the corresponding first vector and second vector based on the first KNN tree and the second KNN tree respectively, and then carrying out vector weighting processing on the corresponding processed first vector and processed second vector based on the weight value of the first vector and the weight value of the second vector respectively, so as to obtain a first final alignment vector and a second final alignment vector corresponding to the first knowledge graph and the second knowledge graph respectively;
and the alignment entity determining module is used for comparing entity similarity between the first knowledge graph and the second knowledge graph according to the first final alignment vector and the second final alignment vector, and determining alignment entities in the first knowledge graph and the second knowledge graph according to the comparison result, so as to perform an entity alignment operation on the first knowledge graph and the second knowledge graph based on the alignment entities to obtain a target knowledge graph.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the knowledge-graph entity alignment method of any one of claims 1 to 7.
10. A computer readable storage medium for storing a computer program which when executed by a processor implements the knowledge-graph entity alignment method of any one of claims 1 to 7.
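The following sketch is illustrative only and is not part of the claims. It shows one common margin form that the minimum interval loss formula recited in claim 2 could take when training the shared-weight graph convolution neural network models; the exact formula used by the application is not reproduced here, so this form is an assumption.

import numpy as np

def minimum_interval_loss(pos_left, pos_right, neg_left, neg_right, margin=1.0):
    # Assumed margin ("minimum interval") form: seed-aligned entity pairs should be
    # at least `margin` closer to each other than corrupted (negative) pairs.
    positive_distance = np.linalg.norm(pos_left - pos_right, axis=1)
    negative_distance = np.linalg.norm(neg_left - neg_right, axis=1)
    return float(np.maximum(0.0, positive_distance - negative_distance + margin).sum())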
CN202310896828.9A 2023-07-21 2023-07-21 Knowledge graph entity alignment method, device, equipment and storage medium Active CN116610820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310896828.9A CN116610820B (en) 2023-07-21 2023-07-21 Knowledge graph entity alignment method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310896828.9A CN116610820B (en) 2023-07-21 2023-07-21 Knowledge graph entity alignment method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116610820A true CN116610820A (en) 2023-08-18
CN116610820B CN116610820B (en) 2023-10-20

Family

ID=87680467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310896828.9A Active CN116610820B (en) 2023-07-21 2023-07-21 Knowledge graph entity alignment method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116610820B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021157897A1 (en) * 2020-02-03 2021-08-12 Samsung Electronics Co., Ltd. A system and method for efficient multi-relational entity understanding and retrieval
CN111563192A (en) * 2020-04-28 2020-08-21 腾讯科技(深圳)有限公司 Entity alignment method and device, electronic equipment and storage medium
CN114691973A (en) * 2020-12-31 2022-07-01 华为技术有限公司 Recommendation method, recommendation network and related equipment
CN112988917A (en) * 2021-03-31 2021-06-18 东南大学 Entity alignment method based on multiple entity contexts
CN113641826A (en) * 2021-06-29 2021-11-12 北京邮电大学 Entity alignment method, device and system for multi-source knowledge graph fusion
CN113407759A (en) * 2021-08-18 2021-09-17 中国人民解放军国防科技大学 Multi-modal entity alignment method based on adaptive feature fusion
CN114722216A (en) * 2022-04-15 2022-07-08 大连理工大学 Entity alignment method based on Chinese electronic medical record knowledge graph
CN115271071A (en) * 2022-08-08 2022-11-01 中南大学 Knowledge graph entity alignment method, system and equipment based on graph neural network
CN115618097A (en) * 2022-09-05 2023-01-17 西北工业大学 Entity alignment method for prior data insufficient multi-social media platform knowledge graph
CN115577119A (en) * 2022-10-09 2023-01-06 阿里巴巴(中国)有限公司 Knowledge graph inference model training method, device and storage medium
CN115659985A (en) * 2022-12-09 2023-01-31 南方电网数字电网研究院有限公司 Electric power knowledge graph entity alignment method and device and computer equipment
CN116010615A (en) * 2022-12-27 2023-04-25 国家电网有限公司大数据中心 Entity alignment method and device, electronic equipment and computer storage medium
CN116128056A (en) * 2023-04-18 2023-05-16 安徽思高智能科技有限公司 RPA-oriented multi-modal interaction entity alignment method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MUHAO CHEN: "Co-Training Embedding of Knowledge Graphs and Entity Descriptions for Cross-Lingual Entity Alignment", arXiv *
XU ZHIHONG: "Bidirectional Iterative Entity Alignment Based on Multi-Information Graph Attention Network", Computer Engineering and Design *

Also Published As

Publication number Publication date
CN116610820B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN108399428B (en) Triple loss function design method based on trace ratio criterion
US20160267359A1 (en) Image object category recognition method and device
CN112231592B (en) Graph-based network community discovery method, device, equipment and storage medium
CN111931505A (en) Cross-language entity alignment method based on subgraph embedding
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN117151338B (en) Multi-unmanned aerial vehicle task planning method based on large language model
CN113254649B (en) Training method of sensitive content recognition model, text recognition method and related device
CN101901251B (en) Method for analyzing and recognizing complex network cluster structure based on markov process metastability
CN110993037A (en) Protein activity prediction device based on multi-view classification model
CN111967271A (en) Analysis result generation method, device, equipment and readable storage medium
WO2023207013A1 (en) Graph embedding-based relational graph key personnel analysis method and system
CN112115971B (en) Method and system for carrying out student portrait based on heterogeneous academic network
CN111062421A (en) Network node multidimensional data community division algorithm based on correlation analysis
US12045285B2 (en) Social graph generation method using a degree distribution generation model
CN114254093A (en) Multi-space knowledge enhanced knowledge graph question-answering method and system
CN114219089B (en) Construction method and equipment of new-generation information technology industry knowledge graph
CN110830291A (en) Node classification method of heterogeneous information network based on meta-path
Kalifullah et al. Retracted: Graph‐based content matching for web of things through heuristic boost algorithm
CN117010373A (en) Recommendation method for category and group to which asset management data of power equipment belong
CN116610820B (en) Knowledge graph entity alignment method, device, equipment and storage medium
CN105718591B (en) A kind of rule-based and constraint satisfaction qualitative reasoning of spatial relations method
CN117494760A (en) Semantic tag-rich data augmentation method based on ultra-large-scale language model
CN116628524A (en) Community discovery method based on adaptive graph attention encoder
CN116166977A (en) Internet of things service clustering method based on time sequence diagram neural network
CN112966122B (en) Corpus intention recognition method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 205, Building B1, Huigu Science and Technology Industrial Park, No. 336 Bachelor Road, Bachelor Street, Yuelu District, Changsha City, Hunan Province, 410000

Patentee after: Wisdom Eye Technology Co.,Ltd.

Country or region after: China

Address before: 410205, Changsha high tech Zone, Hunan Province, China

Patentee before: Wisdom Eye Technology Co.,Ltd.

Country or region before: China
