CN114036307B - Knowledge graph entity alignment method and device

Info

Publication number: CN114036307B
Application number: CN202111095446.3A
Authority: CN
Inventors: 曾开胜; 李涓子; 侯磊; 冯铃; 唐杰; 许斌
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2021-09-17
Filing date: 2021-09-17
Publication date: 2022-09-13
Anticipated expiration: 2041-09-17
Also published as: CN114036307A

Abstract

The invention provides a knowledge graph entity alignment method and device. The method comprises: acquiring data of two knowledge graphs to be fused; performing neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain an entity representation of each entity in the two knowledge graphs; performing relation representation learning for enhancing entity semantics according to the entity representations, and modeling the relations between entities to obtain entity relation representations; performing concept and concept hierarchy representation learning according to the entity representations, and modeling the relations between entities and concepts and between concepts to obtain concept and concept hierarchy representations; and constraining the entity representations during vector-distance-based entity alignment by means of the entity relation representations and the concept and concept hierarchy representations, to obtain the entity alignment result for the two knowledge graphs. Concepts and the concept hierarchy are thereby integrated into the entity alignment framework, where they act as constraints, and the accuracy of entity alignment is improved.

Description

Knowledge graph entity alignment method and device

Technical Field

The invention relates to the field of natural language processing in computer artificial intelligence, and in particular to a knowledge graph entity alignment method and a knowledge graph entity alignment device.

Background

Knowledge graphs that fuse multilingual and multi-source information have become an important knowledge source for many artificial intelligence applications, such as information extraction and intelligent question answering. Many knowledge graphs provide rich structured knowledge for different applications; because they were constructed for different purposes, they are highly heterogeneous but also contain complementary knowledge. To better support upper-layer tasks such as cross-language question answering systems and cross-language recommendation systems, fusing different knowledge graphs has become an important research direction. Entity alignment, which makes it possible to fuse knowledge graphs with overlapping and complementary information more efficiently, has attracted the interest of many scholars and is a key technology of knowledge graph fusion.

Traditional knowledge graph entity alignment methods mainly compute a series of similarities from entity text information, entity attribute information, entity network structure information and the like, and then judge whether a given entity pair is equivalent using a manually set threshold or a machine learning classification algorithm. As a result, entities belonging to different concepts may be wrongly aligned during the entity alignment task, which reduces the accuracy of entity alignment.

Disclosure of Invention

The invention provides a knowledge graph entity alignment method and device, which overcome the low accuracy of knowledge graph entity alignment in the prior art by aligning entities that belong to the same concept under the constraint of concepts, thereby improving the accuracy of entity alignment.

The invention provides a knowledge graph entity alignment method, which comprises the following steps:

acquiring data of two knowledge graphs to be fused;

performing neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain entity representations of all entities in the two knowledge graphs;

performing relation representation learning for enhancing entity semantics according to the entity representation, and modeling the relation between the entities to obtain entity relation representation;

carrying out concept and concept hierarchy system representation learning according to the entity representation, and modeling the relationship between entities and concepts and between concepts to obtain concept and concept hierarchy system representation;

and constraining the entity representation in the entity alignment process based on the vector distance through entity relationship representation, concept and concept hierarchy representation to obtain the result of aligning the two knowledge graph entities.

According to the knowledge graph entity alignment method provided by the invention, the data of each of the two knowledge graphs is a set of relational triples, each consisting of a head entity, a tail entity and the relation between the two entities. Performing neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain the entity representation of each entity in the two knowledge graphs specifically comprises the following steps:

encoding each of the two knowledge graphs using a graph attention neural network, in which the vector of entity e_i in layer l+1 of the network is calculated using the following quantities: the weight of the l-th layer of the network; the initial entity vector, taken from the entity representation matrix; d, the dimension of the vector representation; σ(·), a nonlinear activation function; and the attention weight between entities e_i and e_j in the l-th layer of the network;

calculating the attention coefficient by using two matrices to linearly transform the head entity and the tail entity respectively, where the two matrices are the parameter matrices of the two linear transformations, (·)^T is the matrix transpose operation, and LeakyReLU(·) is a nonlinear activation function;

normalizing the attention coefficients between entity e_i and its neighbor entities to obtain the attention weights between the entities;

obtaining, through an L-layer graph convolution network, the vector representation of entity e_i that aggregates neighborhood entity information, where H^(L) is the neighborhood-aggregated entity representation matrix.
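One form of the layer update, attention coefficient and attention weight that is consistent with the definitions above can be sketched as follows. It assumes the standard graph attention formulation: the symbol names h, W, M_1, M_2 and c_ij, the self-loop in the sums, and the softmax normalization are assumptions for illustration, not notation confirmed by the text.

```latex
\begin{align}
h_i^{(l+1)} &= \sigma\Big(\sum_{e_j \in N_{e_i} \cup \{e_i\}} \alpha_{ij}^{(l)}\, W^{(l)} h_j^{(l)}\Big) \\
c_{ij}^{(l)} &= \mathrm{LeakyReLU}\Big(\big(M_1^{(l)} h_i^{(l)}\big)^{T} \big(M_2^{(l)} h_j^{(l)}\big)\Big) \\
\alpha_{ij}^{(l)} &= \frac{\exp\big(c_{ij}^{(l)}\big)}{\sum_{e_k \in N_{e_i} \cup \{e_i\}} \exp\big(c_{ik}^{(l)}\big)}
\end{align}
```

Here h_i^{(0)} would be the row of the entity representation matrix for e_i, and stacking the h_i^{(L)} gives H^{(L)}.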

According to the knowledge graph entity alignment method provided by the invention, performing relation representation learning for enhancing entity semantics according to the entity representation, and modeling the relations between entities to obtain the entity relation representation, specifically comprises the following steps:

representing entities and relations in the same vector space through the knowledge representation translation model TransE, and calculating a plausibility score for each relational triple (e_h, r, e_t) ∈ T, where e_h and e_t denote two entities and r denotes the relation between entity e_h and entity e_t;

applying a margin-based ranking loss function as the optimization objective O_R of the knowledge representation translation model TransE, where the vector representations of entities e_h and e_t are taken from the neighborhood-aggregated entity representation matrix H^(L), the vector representation of relation r is taken from a relation matrix to be learned, γ_2 > 0 is a margin hyperparameter, and the training negative sample set T' is obtained by negative sampling on the relational triple set T.
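A plausibility score and margin-based objective consistent with this description can be sketched as below, assuming the usual TransE translation score. The function name f_R, the choice of norm and the double sum over positive and negative triples are assumptions for illustration.

```latex
\begin{align}
f_R(e_h, r, e_t) &= \lVert e_h + r - e_t \rVert \\
O_R &= \sum_{(e_h, r, e_t) \in T} \; \sum_{(e_h', r', e_t') \in T'} \big[\gamma_2 + f_R(e_h, r, e_t) - f_R(e_h', r', e_t')\big]_+
\end{align}
```

A lower score means a more plausible triple, so the objective pushes true triples at least γ_2 below corrupted ones.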

According to the knowledge graph entity alignment method provided by the invention, performing concept and concept hierarchy representation learning according to the entity representation, and modeling the relations between entities and concepts and between concepts to obtain the concept and concept hierarchy representation, specifically comprises the following steps:

establishing a box embedding representation model as the representation of concepts and the concept hierarchy, wherein the box embedding representation model represents a concept as an axis-aligned hyper-rectangle in space, the hyper-rectangle has an interior, and a concept c is formally defined as a vector in real space that specifies the region Box_c covered by concept c, where ⪯ is the element-wise partial order relation, Cen(c) is the center of hyper-rectangle c, and Off(c) is the range offset of hyper-rectangle c; when every element of Off(c) is 0, Box_c degenerates into a d-dimensional vector, i.e. a point in d-dimensional space, which is the same form of representation as an entity; the relation between entities and concepts is therefore described using points and hyper-rectangles in space, and the set of entity vectors belonging to concept c can be represented as:

{e_i ∈ Box_c | e_i ∈ E}

judging whether an entity belongs to a concept by the distance between a point in space and a hyper-rectangle, wherein, given an entity e and a concept c, the distance between them is defined in terms of c_max = Cen(c) + Off(c) and c_min = Cen(c) − Off(c), and is measured in two parts, an outside distance dist_outside(e, c) and an inside distance dist_inside(e, c): the outside distance represents the distance between the entity and the boundary of the hyper-rectangle of the concept, the inside distance represents the distance between the entity and the center of the hyper-rectangle of the concept, and 0 < β < 1 is a hyperparameter that balances the weight of the two distances;

defining the optimization objective O_I of the instanceOf relation between entities and concepts, where the vector representation of entity e is taken from the neighborhood-aggregated entity representation matrix H^(L), the vector representation of concept c is taken from a concept matrix to be learned, γ_3 > 0 is a predefined margin hyperparameter, and the training negative sample set L'_instanceOf is obtained by random uniform sampling from the set L_instanceOf of membership relations between entities and concepts.
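The box region, the entity-concept distance and the instanceOf objective described above might take the following form. This is a sketch under stated assumptions: the 2d-dimensional concept vector, the use of the L1 norm, the clipping form of the inside distance and the double sum in O_I are illustrative choices that match the verbal definitions, not formulas confirmed by the text.

```latex
\begin{align}
c &= \big(\mathrm{Cen}(c),\, \mathrm{Off}(c)\big) \in \mathbb{R}^{2d} \\
\mathrm{Box}_c &= \{\, v \in \mathbb{R}^{d} : \mathrm{Cen}(c) - \mathrm{Off}(c) \preceq v \preceq \mathrm{Cen}(c) + \mathrm{Off}(c) \,\} \\
\mathrm{dist}(e, c) &= \mathrm{dist}_{outside}(e, c) + \beta \cdot \mathrm{dist}_{inside}(e, c) \\
\mathrm{dist}_{outside}(e, c) &= \lVert \max(e - c_{max}, 0) + \max(c_{min} - e, 0) \rVert_1 \\
\mathrm{dist}_{inside}(e, c) &= \lVert \mathrm{Cen}(c) - \min(c_{max}, \max(c_{min}, e)) \rVert_1 \\
O_I &= \sum_{(e, c) \in L_{instanceOf}} \; \sum_{(e', c') \in L'_{instanceOf}} \big[\gamma_3 + \mathrm{dist}(e, c) - \mathrm{dist}(e', c')\big]_+
\end{align}
```

Under this form an entity inside the box has zero outside distance, so only the β-weighted inside term contributes.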

According to the knowledge graph entity alignment method provided by the invention, performing concept and concept hierarchy representation learning according to the entity representation, and modeling the relations between entities and concepts and between concepts to obtain the concept and concept hierarchy representation, further specifically comprises the following steps:

defining a distance function f_box(c_i, c_j) for a hypernym-hyponym concept pair <c_i, subclassOf, c_j> in terms of c_y.max = Cen(c_y) + Off(c_y) and c_y.min = Cen(c_y) − Off(c_y), y ∈ {i, j}, such that if the hyper-rectangle of concept c_i is completely contained in the hyper-rectangle of concept c_j, the concept distance between them is f_box(c_i, c_j) = 0;

defining the optimization objective O_S of the subclassOf relation between concepts.
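One containment distance with the stated property (zero exactly when the box of c_i lies inside the box of c_j) is sketched below. The L1 norm and the element-wise max form are assumptions; the exact form of O_S built on this distance is not recoverable from the text and is left open here.

```latex
\begin{equation}
f_{box}(c_i, c_j) = \big\lVert \max(c_{i.max} - c_{j.max},\, 0) + \max(c_{j.min} - c_{i.min},\, 0) \big\rVert_1
\end{equation}
```

The first term penalizes the sub-concept box extending past the upper corner of the super-concept box, and the second term penalizes it extending below the lower corner.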

According to the knowledge graph entity alignment method provided by the invention, constraining the entity representation during vector-distance-based entity alignment through the entity relation representation and the concept and concept hierarchy representation, to obtain the entity alignment result for the two knowledge graphs, specifically comprises the following steps:

defining the distance function between two entities using the L_1 or L_2 norm of a vector, and applying a margin-based ranking loss function as the optimization objective O_E of entity alignment, where [·]_+ = max{0, ·} takes the maximum of its input and 0, γ_1 > 0 is a margin hyperparameter, the vector representations of e_i and e_j are taken from the neighborhood-aggregated entity representation matrix H^(L), and the training negative sample set S' is generated by nearest-neighbor sampling from the set S of known pre-aligned equivalent entity pairs between the two knowledge graphs to be fused;

obtaining the overall optimization objective for entity alignment of the two knowledge graphs to be fused from the entity alignment objective, the entity-relation objective, the entity-concept relation objective and the concept-concept relation objective:

O = α_1 O_E + α_2 O_R + α_3 O_I + α_4 O_S

where O_E, O_R, O_I and O_S are the optimization objectives of entity alignment, relation representation, the instanceOf relation and the subclassOf relation respectively, and α_1, α_2, α_3, α_4 > 0 are weight parameters that balance the objectives.
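The entity distance and the alignment objective O_E that enters this weighted sum can be sketched as follows, assuming the distance is the norm of the difference of the two entity vectors. The function name f_E and the double sum over positive and negative pairs are assumptions for illustration.

```latex
\begin{align}
f_E(e_i, e_j) &= \lVert e_i - e_j \rVert_{L_1 / L_2} \\
O_E &= \sum_{(e_i, e_j) \in S} \; \sum_{(e_i', e_j') \in S'} \big[\gamma_1 + f_E(e_i, e_j) - f_E(e_i', e_j')\big]_+
\end{align}
```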

The invention also provides a knowledge graph entity alignment device, comprising:

a knowledge graph acquisition unit, configured to acquire data of two knowledge graphs to be fused;

an entity representation acquisition unit, configured to perform neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain entity representations of all entities in the two knowledge graphs;

an entity relation representation acquisition unit, configured to perform relation representation learning for enhancing entity semantics according to the entity representation, and to model the relations between entities to obtain the entity relation representation;

a concept and concept hierarchy representation acquisition unit, configured to perform concept and concept hierarchy representation learning according to the entity representation, and to model the relations between entities and concepts and between concepts to obtain the concept and concept hierarchy representation;

and an entity alignment result acquisition unit, configured to constrain the entity representation during vector-distance-based entity alignment through the entity relation representation and the concept and concept hierarchy representation, to obtain the entity alignment result for the two knowledge graphs.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for aligning knowledge-graph entities as described in any one of the above.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, performs the steps of the method for knowledge-graph entity alignment as described in any one of the above.

The knowledge graph entity alignment method and device integrate concepts and the concept hierarchy into the entity alignment framework, where they constrain the entity alignment process so that entities belonging to the same concept are aligned, which improves the accuracy of entity alignment.

Drawings

In order to more clearly illustrate the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow diagram of a method for knowledge-graph entity alignment provided by the present invention;

FIG. 2 is a schematic diagram of the structure of the knowledge-graph entity alignment process provided by the present invention;

FIG. 3 is a schematic diagram of a knowledge-graph entity alignment apparatus provided by the present invention;

FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Although knowledge graph entity alignment technology has made great progress, current methods ignore an important kind of structural information in knowledge graphs: concepts and concept hierarchies. A concept is an abstract description of things in the knowledge graph that share certain characteristics. The hypernym-hyponym relations between concepts, together with the membership relations between entities and concepts, form the concept hierarchy. Knowledge graphs such as DBpedia and YAGO contain concepts and concept hierarchies. Unlike equivalent-entity links, concepts and the concept hierarchy can assist entity alignment from another level.

The two knowledge graphs to be aligned in the embodiment of the invention are formally denoted G_1 = (E_1, R_1, T_1) and G_2 = (E_2, R_2, T_2), where E_i denotes the set of entities, R_i the set of relations, and T_i the set of fact triples composed of entities and relations (i.e., <head entity, relation, tail entity>), and i ∈ {1, 2} is the index of the knowledge graph. Given an entity e in E_i, its set of neighbor entities is formally denoted N_e = {e′ | (e, r, e′) ∈ T_i} ∪ {e′ | (e′, r, e) ∈ T_i}. The set of equivalent entity pairs between the two knowledge graphs is formalized as S, where each pair (e_1, e_2) in S, with e_1 ∈ E_1 and e_2 ∈ E_2, indicates that e_1 and e_2 have the same real-world semantics, i.e. they are the same entity and together form an equivalent entity pair. The set of concepts to which entities belong is also formally represented.

The concept hierarchy may be formally represented as L = L_instanceOf ∪ L_subclassOf, where L_instanceOf is the set of membership relations between entities and concepts, with elements of the form <e, instanceOf, c>, and L_subclassOf is the set of hypernym-hyponym relations between concepts, with elements of the form <c_1, subclassOf, c_2>. Here instanceOf means that entity e is an instance of concept c, and subclassOf means that concept c_1 is a sub-concept of concept c_2. The two knowledge graphs G_1 and G_2 are merged into one large knowledge graph G for processing, i.e. the entity set, relation set and fact triple set are formalized as E = E_1 ∪ E_2, R = R_1 ∪ R_2 and T = T_1 ∪ T_2.
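The neighbor set N_e and the merged triple set T defined above can be illustrated with a minimal Python sketch. The function name `neighbor_sets` and the toy triples are hypothetical and are not taken from the patent or from DBP15K.

```python
from collections import defaultdict

def neighbor_sets(triples):
    """Map each entity e to N_e = {e' | (e, r, e') in T} ∪ {e' | (e', r, e) in T}."""
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)  # e' reached through an outgoing edge of e
        neighbors[tail].add(head)  # e' reached through an incoming edge of e
    return dict(neighbors)

# Hypothetical toy fact triples for two small knowledge graphs.
T1 = {("Paris", "capitalOf", "France"), ("France", "memberOf", "EU")}
T2 = {("Paris_fr", "capitaleDe", "France_fr")}

T = T1 | T2            # merged fact triple set T = T_1 ∪ T_2
N = neighbor_sets(T)   # neighbor set of every entity in the merged graph
```

Relation direction is ignored when collecting neighbors, which matches the union of the two sets in the definition of N_e.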

As shown in fig. 1, an embodiment of the present invention provides a method for aligning knowledge-graph entities, including:

Step 110: acquiring data of two knowledge graphs to be fused;

specifically, the data of each of the two knowledge graphs to be fused is a set of relational triples of the form <head entity, relation, tail entity>, each consisting of a head entity, a tail entity and the relation between the two entities.

Step 120: performing neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain entity representations of all entities in the two knowledge graphs;

specifically, neighborhood aggregation entity representation learning encodes the entities according to the graph structure of the knowledge graphs, yielding entity vector representations embedded in the same space.

Step 130: performing relation representation learning for enhancing entity semantics according to the entity representation, and modeling the relations between entities to obtain the entity relation representation;

specifically, relation representation learning for enhancing entity semantics models the relations between entities, which enriches the semantic information encoded in the entities and makes the entities easier to distinguish.

Step 140: performing concept and concept hierarchy representation learning according to the entity representation, and modeling the relations between entities and concepts and between concepts to obtain the concept and concept hierarchy representation;

specifically, concept and concept hierarchy representation learning models concepts and the concept hierarchy, which naturally connects them with entities and assists the entity alignment task by constraining the entity representations.

Step 150: constraining the entity representation during vector-distance-based entity alignment through the entity relation representation and the concept and concept hierarchy representation, to obtain the entity alignment result for the two knowledge graphs.

In the embodiment of the present invention, the process corresponding to step 120 specifically includes:

to better aggregate the features of entities and propagate equivalence information between entities through the graph structure, a multi-layer Graph Attention Network (GAT) is applied to encode the entities. The vector of entity e_i in layer l+1 of the network is calculated using the following quantities: the weight of the l-th layer of the network; the initial entity vector, taken from the entity representation matrix; d, the dimension of the vector representation; σ(·), a nonlinear activation function, for which ReLU(·) = max(0, ·) is chosen; and the attention weight between entities e_i and e_j in the l-th layer of the network.

In the embodiment of the invention, the attention coefficient is calculated by using two different matrices to linearly transform the two end entities respectively, where the two matrices are the parameter matrices of the two different linear transformations, (·)^T is the matrix transpose operation, and LeakyReLU(·) is a nonlinear activation function.

The attention weights between entities are obtained by normalizing the attention coefficients between entity e_i and its neighbor entities.

H^(L) is the neighborhood-aggregated entity representation matrix; in this way, a low-dimensional vector representation of each entity is obtained.

In step 120, the multi-layer graph attention encoder uses the graph attention neural network to better aggregate the features of neighborhood entities, so that equivalence information between entities is propagated throughout the graph.
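A single attention layer of the kind described in step 120 can be sketched in plain Python as follows. This is an illustration under assumptions, not the patented implementation: the softmax normalization, the self-loop, the LeakyReLU slope of 0.2, and the names `gat_layer`, `W`, `M1` and `M2` are all hypothetical choices.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def leaky_relu(x, slope=0.2):  # slope value is an assumption
    return x if x > 0 else slope * x

def gat_layer(h, neighbors, W, M1, M2):
    """One attention layer: h maps entity -> vector; returns next-layer vectors."""
    out = {}
    for i, h_i in h.items():
        nbrs = sorted(neighbors.get(i, set()) | {i})  # assumption: include a self-loop
        q = matvec(M1, h_i)
        # Attention coefficient: LeakyReLU((M1 h_i)^T (M2 h_j)).
        coeff = [leaky_relu(sum(a * b for a, b in zip(q, matvec(M2, h[j]))))
                 for j in nbrs]
        # Assumed softmax normalization over the neighborhood gives attention weights.
        top = max(coeff)
        exps = [math.exp(c - top) for c in coeff]
        total = sum(exps)
        alpha = [x / total for x in exps]
        # Attention-weighted aggregation of W h_j, followed by ReLU.
        agg = [0.0] * len(W)
        for a, j in zip(alpha, nbrs):
            for k, x in enumerate(matvec(W, h[j])):
                agg[k] += a * x
        out[i] = [max(0.0, x) for x in agg]
    return out

# Hypothetical two-entity graph with identity parameter matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
h1 = gat_layer(h0, {"a": {"b"}, "b": {"a"}}, identity, identity, identity)
```

Applying `gat_layer` repeatedly, with separate parameters per layer, would give the multi-layer encoder; the rows of the final output play the role of H^(L).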

In the embodiment of the present invention, the process corresponding to step 130 specifically includes:

the classical knowledge representation translation model TransE is selected to introduce relation representations: entities and relations are represented in the same vector space, and a relation is treated as a vector translation from the head entity to the tail entity. For each relational triple (e_h, r, e_t) ∈ T, TransE calculates a plausibility score.

A margin-based ranking loss function is applied as the optimization objective O_R of the knowledge representation translation model TransE, where γ_2 > 0 is a margin hyperparameter and the training negative sample set T' is obtained by negative sampling on the relational triple set T.

In step 130, the knowledge graph representation method TransE introduces vector representations of the relations between entities and thereby constrains the entity representations, which enriches the semantic information encoded in the entities and improves their discriminability.
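The translation idea of step 130 can be illustrated with a short Python sketch. The L2 norm, the pairing of every positive score with every negative score, and the function names are assumptions made for illustration; the vectors are hypothetical toy values.

```python
def transe_score(e_h, r, e_t):
    """Plausibility score ||e_h + r - e_t|| (L2 norm assumed); lower is more plausible."""
    return sum((h + x - t) ** 2 for h, x, t in zip(e_h, r, e_t)) ** 0.5

def margin_ranking_loss(pos_scores, neg_scores, margin):
    """Sum of [margin + f(positive) - f(negative)]_+ over all positive/negative pairs."""
    return sum(max(0.0, margin + p - n) for p in pos_scores for n in neg_scores)

# Hypothetical 2-d vectors: the true tail equals head + relation exactly.
e_h, r, e_t = [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]
e_t_corrupt = [0.0, 3.0]  # tail of a negative (corrupted) triple

pos = transe_score(e_h, r, e_t)
neg = transe_score(e_h, r, e_t_corrupt)
loss = margin_ranking_loss([pos], [neg], margin=3.0)
```

In this toy case the corrupted triple already scores more than the margin above the true one, so it contributes no loss.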

In the embodiment of the present invention, the process corresponding to step 140 specifically includes:

because hierarchical relations exist among entities and concepts, the chosen concept representation model must be capable of hierarchical representation. In the embodiment of the invention, a box embedding representation model is proposed to model concepts and the concept hierarchy.

A box embedding representation model is built as the concept representation model. It represents a concept as an axis-aligned hyper-rectangle in space. Unlike a point in space, a hyper-rectangle has an interior, so a concept can be represented as a hyper-rectangle and the entities belonging to the concept as points inside it. Concept c is formally defined as a vector in real space that specifies the region Box_c covered by concept c, where Cen(c) is the center of hyper-rectangle c. The set of entity vectors belonging to concept c can then be represented as:

{e_i ∈ Box_c | e_i ∈ E} (7)

The distance between an entity e and a concept c is defined using c_max = Cen(c) + Off(c) and c_min = Cen(c) − Off(c), and is measured in two parts, an outside distance dist_outside(e, c) and an inside distance dist_inside(e, c). The outside distance represents the distance between the entity and the boundary of the hyper-rectangle of the concept, and the inside distance represents the distance between the entity and the center of the hyper-rectangle of the concept; 0 < β < 1 is a hyperparameter that balances the weight of the two distances. Specifically, a small β is set, so that the distance from an entity inside the hyper-rectangle to its center is scaled down by the factor β, which weakens the inside distance and places more emphasis on the outside distance. The inside distance must nevertheless still be measured (i.e. β ≠ 0), because the range of a hyper-rectangle should not be allowed to expand without limit. The representations of entities under a concept should be as close to it as possible, and the representations of entities not belonging to the concept as far from it as possible. The optimization objective O_I of the instanceOf relation between entities and concepts is defined accordingly, where γ_3 > 0 is a predefined margin hyperparameter and the training negative sample set L'_instanceOf is obtained by random uniform sampling from the set L_instanceOf of membership relations between entities and concepts.

To give the learned concept representations the property of the hierarchical relation, i.e. the subclassOf relation, each pair of concept representations in a hypernym-hyponym relation is constrained so that the hyper-rectangle of the hypernym concept contains the hyper-rectangle of the hyponym concept. This naturally matches the fact that a higher-level concept in the concept hierarchy has a larger descriptive scope than a lower-level concept. A distance function is therefore defined for a hypernym-hyponym concept pair <c_i, subclassOf, c_j>, and the optimization objective O_S of the subclassOf relation between concepts is defined on it.

In step 140, the embodiment of the invention applies a box embedding representation model to model concepts and the concept hierarchy. Concept representation learning naturally connects concepts with entities and assists entity alignment by constraining the entity representations.
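The point-to-box and box-to-box distances of step 140 can be sketched in Python as below. The L1 norm, the clipping form of the inside distance and the function names are assumptions chosen to match the verbal description; only β = 0.02 comes from the reported parameter settings.

```python
def box_bounds(cen, off):
    """Return (c_min, c_max) with c_min = Cen - Off and c_max = Cen + Off."""
    c_min = [c - o for c, o in zip(cen, off)]
    c_max = [c + o for c, o in zip(cen, off)]
    return c_min, c_max

def entity_concept_distance(e, cen, off, beta=0.02):
    """dist(e, c) = dist_outside + beta * dist_inside (assumed L1 form)."""
    c_min, c_max = box_bounds(cen, off)
    # Outside: how far the point lies beyond the box faces (0 if inside).
    outside = sum(max(x - hi, 0.0) + max(lo - x, 0.0)
                  for x, lo, hi in zip(e, c_min, c_max))
    # Inside: distance from the center to the point clipped into the box.
    inside = sum(abs(c - min(hi, max(lo, x)))
                 for x, c, lo, hi in zip(e, cen, c_min, c_max))
    return outside + beta * inside

def box_containment_distance(cen_i, off_i, cen_j, off_j):
    """0 when box i (sub-concept) lies entirely inside box j (super-concept)."""
    i_min, i_max = box_bounds(cen_i, off_i)
    j_min, j_max = box_bounds(cen_j, off_j)
    over = sum(max(a - b, 0.0) for a, b in zip(i_max, j_max))   # i exceeds j's upper corner
    under = sum(max(b - a, 0.0) for a, b in zip(i_min, j_min))  # i falls below j's lower corner
    return over + under

# Hypothetical 2-d concept box centered at the origin with offset 1 per axis.
d_in = entity_concept_distance([0.5, 0.0], [0.0, 0.0], [1.0, 1.0])
d_out = entity_concept_distance([3.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

With a small β, a point anywhere inside the box is only weakly pulled toward the center, while a point outside is penalized by its full distance to the box.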

In the embodiment of the present invention, the process corresponding to step 150 specifically includes:

although the same graph neural network is applied to encode the entities of the different knowledge graphs, the entities of different knowledge graphs still lie in different vector spaces. To represent the entities in the same vector space, the set S of pre-aligned equivalent entity pairs is used to reduce the distance between the vector representations of each pair of equivalent entities, thereby aligning the entities of the two graphs. A distance function between two entities is defined using the L_1 or L_2 norm of a vector, and a margin-based ranking loss function is applied as the optimization objective O_E of entity alignment, where [·]_+ = max{0, ·} takes the maximum of its input and 0, γ_1 > 0 is a margin hyperparameter, the vector representations of e_i and e_j are taken from the neighborhood-aggregated entity representation matrix H^(L), and the training negative sample set S' is generated by nearest-neighbor sampling from the set S of known pre-aligned equivalent entity pairs between the two knowledge graphs to be fused.

In the embodiment of the invention, step 150 further includes obtaining the overall optimization objective for entity alignment of the two knowledge graphs to be fused from the entity alignment objective, the entity-relation objective, the entity-concept relation objective and the concept-concept relation objective:

O = α_1 O_E + α_2 O_R + α_3 O_I + α_4 O_S (14)

where O_E, O_R, O_I and O_S are the optimization objectives of entity alignment, relation representation, the instanceOf relation and the subclassOf relation respectively, and α_1, α_2, α_3, α_4 > 0 are weight parameters that balance the objectives. In the embodiment of the invention, the number of layers of the graph neural network is set to 3 (including the input layer), and the entity vector representation matrix X, the relation vector representation matrix R and the concept vector representation matrix C are randomly initialized using the Xavier uniform distribution. The objective function is minimized using the AdaGrad algorithm. Entity alignment between the two knowledge graphs is thereby completed, and the two graphs are fused.
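The entity distance, nearest-neighbor negative sampling and weighted objective of step 150 can be sketched as follows. The function names and the exact sampling rule (taking the k closest non-matching candidates) are assumptions for illustration; the default weights are the optimal α values reported in the experimental settings.

```python
def align_distance(e_i, e_j, p=2):
    """L1 (p=1) or L2 (p=2) distance between two entity vectors."""
    diffs = [abs(a - b) for a, b in zip(e_i, e_j)]
    return sum(diffs) if p == 1 else sum(d * d for d in diffs) ** 0.5

def nearest_negatives(anchor, candidates, true_match, k):
    """Assumed sampling rule: the k candidates closest to `anchor`, excluding its true match."""
    pool = [(align_distance(anchor, vec), name)
            for name, vec in candidates.items() if name != true_match]
    return [name for _, name in sorted(pool)[:k]]

def total_objective(o_e, o_r, o_i, o_s, alphas=(1.0, 0.8, 0.6, 0.6)):
    """O = a1*O_E + a2*O_R + a3*O_I + a4*O_S."""
    a1, a2, a3, a4 = alphas
    return a1 * o_e + a2 * o_r + a3 * o_i + a4 * o_s

# Hypothetical candidate entities from the second graph.
candidates = {"x": [0.0, 0.0], "y": [1.0, 0.0], "z": [5.0, 5.0]}
hard_negative = nearest_negatives([0.0, 0.0], candidates, true_match="x", k=1)
```

Choosing nearby entities as negatives makes the margin loss focus on the candidates most easily confused with the true counterpart.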

The following describes the process and results of a simulation experiment on the performance of the knowledge graph entity alignment method provided by the embodiment of the invention. In the description below, "C4EA" denotes the method of the embodiment of the invention.

The specific experimental process is as follows:

1. Introduction to the data sets.

The method of the embodiment was evaluated on DBP15K, a public data set widely used in the field. DBP15K contains three cross-language data sets constructed from different language versions of DBpedia, each containing 15,000 pairs of equivalent entities. Statistics of the data sets are shown in Table 1:

TABLE 1 Statistics of the data sets

30% of the equivalent pairs were used for training in the experiment, and the remaining 70% were used for testing.

2. Experimental setup.

Consistent with existing research, Hits@N and MRR are used to evaluate performance. Hits@N is the percentage of cases in which the correct entity appears among the top N alignment results, and MRR (Mean Reciprocal Rank) is the average of the reciprocal ranks of the correct entities over all alignment results. The comparison methods are the MTransE, JAPE, AlignEA, GCN-Align, KECG, MuGNN and AliNet models, together with an ablation of the method itself, C4EA (w/o Box): C4EA with the module for learning concept and concept hierarchy representations removed, used to explore the impact of adding concepts and the concept hierarchy on entity alignment.

For the model parameters, the learning rate λ of the AdaGrad algorithm is selected from {0.001, 0.005, 0.01, 0.05}; the vector dimension d from {100, 150, 200}; the margin hyperparameters γ_1, γ_2, γ_3 from {1.0, 2.0, 3.0, 5.0}; and the balance hyperparameters α_1, α_2, α_3, α_4 from {0.2, 0.4, 0.6, 0.8, 1.0}. For neighborhood aggregation entity representation learning, 25 negative examples are sampled for each positive example; for relation representation learning for enhancing entity semantics and for concept and concept hierarchy representation learning, 2 negative examples are sampled for each positive example. Experiments identified the following optimal parameter combination for the knowledge graph entity alignment model of the embodiment of the invention: λ = 0.005, d = 200, γ_1 = 3.0, γ_2 = 3.0, γ_3 = 1.0, β = 0.02, α_1 = 1.0, α_2 = 0.8, α_3 = 0.6, α_4 = 0.6, with the L_2 norm as the distance measure.
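The two evaluation metrics named above can be computed from the rank of the correct counterpart for each test entity, as in this minimal Python sketch. The function names and the example ranks are hypothetical and do not correspond to any reported result.

```python
def hits_at_n(ranks, n):
    """Fraction of test entities whose correct counterpart is ranked within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the correct counterpart over all test entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical 1-based ranks of the correct entity for four test entities.
ranks = [1, 2, 10, 50]
h1 = hits_at_n(ranks, 1)
h10 = hits_at_n(ranks, 10)
mrr = mean_reciprocal_rank(ranks)
```

Hits@N only counts whether the correct entity appears in the top N, while MRR also rewards ranking it higher within that list.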

3. Experimental results and analysis.

With the above data sets and experimental settings, the entity alignment performance of the knowledge graph entity alignment model disclosed by the invention was tested on each data set and compared with mainstream methods. Table 2 shows the entity alignment evaluation results. On each data set, C4EA outperforms the compared methods on all 3 evaluation metrics, which demonstrates the accuracy and stability of the device disclosed by the invention.

TABLE 2 Knowledge graph entity alignment results (Hits@N in %)

As can be seen from the results in Table 2, C4EA outperforms the compared prior-art methods. By combining relation representation learning for enhancing entity semantics with concept and concept hierarchy representation learning, C4EA makes fuller use of the structural information in the knowledge graphs, enriches the semantics of the entity representations, and helps distinguish easily confused entities during alignment. For example, the experimental results on the DBP15K_FR-EN sub-dataset show that the Hits@1 of C4EA is 0.026 higher than that of AliNet and 0.334 higher than that of MTransE, the earliest entity alignment method. C4EA uses a multilayer GAT for neighborhood aggregation entity representation learning, which weakens the influence of local, hard-to-align entities on the global alignment and thereby mitigates the effect of graph heterogeneity on entity alignment. On the DBP15K_JA-EN sub-dataset, the Hits@10 of C4EA reaches 0.892, much higher than the 0.745 achieved by GCN-Align, which also uses a graph neural network but fails to overcome differences in graph structure.

To explore the impact of adding the concepts and concept hierarchy on entity alignment, an ablation model C4EA (w/o Box) was derived from C4EA. The experimental results listed in Table 2 reflect the positive impact of concept and concept hierarchy representation learning on entity alignment. On the DBP15K_FR-EN sub-dataset, adding this module raises Hits@1 from 0.550 to 0.578 and MRR from 0.664 to 0.671; the rise in MRR reflects the positive overall contribution of jointly modeling the concepts and concept hierarchy to entity alignment.

Fig. 2 is a schematic diagram of the process of performing the alignment task on the entities in two knowledge graphs to be fused according to the entity alignment method provided by the embodiment of the present invention. The knowledge graph entity alignment method constrained by concepts and the concept hierarchy starts from the two knowledge graphs and uses a multilayer graph attention neural network with shared parameters as the encoder to encode the entities, obtaining neighborhood-aggregated entity vector representations. The classical translation-based knowledge representation model TransE is used to introduce the relations of the knowledge graphs, which effectively models the relationships between entities, enriches the semantic information encoded in the entities, and improves the discriminability of the entity representations, thereby achieving relation representation learning for enhancing entity semantics. A box embedding representation model is applied to model the concepts and concept hierarchy of the knowledge graphs, and representation learning on the concepts establishes a direct connection between the representations of concepts and entities, so that the final entity representations are influenced and constrained and the entity alignment task is finally completed. The method overcomes the inability of prior methods to make full use of the concepts and concept hierarchy, which are important structural information of a knowledge graph, and achieves cross-language knowledge graph entity alignment more efficiently.
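The role of TransE in the relation representation step can be sketched as follows. This is a minimal illustration of the standard TransE idea (a true triple should satisfy head + relation ≈ tail) with a margin-based hinge loss; the function names, the toy vectors, and the use of a single negative example are assumptions and do not reproduce the exact objective of the embodiment.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def transe_score(head, relation, tail):
    """TransE distance ||h + r - t||_2; a smaller value means a more plausible triple."""
    translated = [h + r for h, r in zip(head, relation)]
    return l2_distance(translated, tail)


def margin_ranking_loss(positive_score, negative_score, margin):
    """Hinge loss that pushes the negative triple at least `margin` farther than the positive one."""
    return max(0.0, margin + positive_score - negative_score)


# Toy 2-dimensional example (values are illustrative only).
head = [0.0, 1.0]
relation = [1.0, 0.0]
tail = [1.0, 1.0]            # head + relation equals tail exactly
corrupted_tail = [3.0, 1.0]  # negative example obtained by replacing the tail

positive = transe_score(head, relation, tail)
negative = transe_score(head, relation, corrupted_tail)
loss = margin_ranking_loss(positive, negative, margin=3.0)
```

Because the relation vector and the entity vectors live in the same space, gradients of this loss flow back into the entity representations, which is how relation information can enrich them.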

The knowledge graph entity alignment apparatus provided in the embodiment of the present invention is described below; the apparatus described below and the knowledge graph entity alignment method described above may be referred to in correspondence with each other. As shown in fig. 3, the embodiment of the present invention provides a knowledge graph entity alignment apparatus, including:

a knowledge graph obtaining unit 310, configured to obtain data of two knowledge graphs to be fused;

an entity representation obtaining unit 320, configured to perform neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain entity representations of the entities in the two knowledge graphs;

an entity relationship representation obtaining unit 330, configured to perform relationship representation learning for enhancing entity semantics according to the entity representation, and model a relationship between entities to obtain an entity relationship representation;

a concept and concept hierarchy representation obtaining unit 340, configured to perform concept and concept hierarchy representation learning according to the entity representation, and model relationships between entities and concepts, and between concepts and concepts to obtain a concept and concept hierarchy representation;

and an entity alignment result obtaining unit 350, configured to constrain the entity representation in the vector-distance-based entity alignment process through the entity relationship representation and the concept and concept hierarchy representation, so as to obtain the result of aligning the entities of the two knowledge graphs.

The physical structure of an electronic device provided in an embodiment of the present invention is described below with reference to fig. 4. As shown in fig. 4, the electronic device may include: a processor 410, a communications interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communications interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a knowledge graph entity alignment method comprising: acquiring data of two knowledge graphs to be fused; performing neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain entity representations of all entities in the two knowledge graphs; performing relation representation learning for enhancing entity semantics according to the entity representation, and modeling the relation between the entities to obtain an entity relationship representation; performing concept and concept hierarchy representation learning according to the entity representation, and modeling the relationships between entities and concepts and between concepts to obtain a concept and concept hierarchy representation; and constraining the entity representation in the vector-distance-based entity alignment process through the entity relationship representation and the concept and concept hierarchy representation to obtain the result of aligning the entities of the two knowledge graphs.

In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing over the prior art, or a part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the knowledge graph entity alignment method provided by the above method embodiments, where the method includes: acquiring data of two knowledge graphs to be fused; performing neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain entity representations of all entities in the two knowledge graphs; performing relation representation learning for enhancing entity semantics according to the entity representation, and modeling the relation between the entities to obtain an entity relationship representation; performing concept and concept hierarchy representation learning according to the entity representation, and modeling the relationships between entities and concepts and between concepts to obtain a concept and concept hierarchy representation; and constraining the entity representation in the vector-distance-based entity alignment process through the entity relationship representation and the concept and concept hierarchy representation to obtain the result of aligning the entities of the two knowledge graphs.

In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the knowledge graph entity alignment method provided in the foregoing embodiments, the method including: acquiring data of two knowledge graphs to be fused; performing neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain entity representations of all entities in the two knowledge graphs; performing relation representation learning for enhancing entity semantics according to the entity representation, and modeling the relation between the entities to obtain an entity relationship representation; performing concept and concept hierarchy representation learning according to the entity representation, and modeling the relationships between entities and concepts and between concepts to obtain a concept and concept hierarchy representation; and constraining the entity representation in the vector-distance-based entity alignment process through the entity relationship representation and the concept and concept hierarchy representation to obtain the result of aligning the entities of the two knowledge graphs.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing over the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments or in some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for knowledge-graph entity alignment, comprising:

acquiring data of two knowledge graphs to be fused;

2. The method of knowledge-graph entity alignment of claim 1, wherein the data of the two knowledge graphs are sets of relational triples, each consisting of a head entity, a tail entity and the relation between the two entities; and performing neighborhood aggregation entity representation learning on the data of the two knowledge graphs respectively to obtain the entity representation of each entity in the two knowledge graphs specifically comprises:

encoding each of the two knowledge graphs using a graph attention neural network, wherein the vector of entity e_i in the (l+1)-th network layer is represented as

The calculation method is as follows:

wherein,

is the weight of the l-th network layer, the initial entity vector

comes from the entity representation matrix

is the attention weight between entities e_i and e_j in the l-th network layer;

the attention coefficient is calculated by using two matrices to linearly transform the head entity and the tail entity respectively:

wherein,

are the two parameter matrices of the linear transformation,

denotes the matrix transpose operation, and LeakyReLU(·) is a nonlinear activation function;

the attention coefficients of entity e_i with respect to its neighbor entities are normalized to obtain the inter-entity attention weights

H^(L) is the neighborhood-aggregated entity representation matrix.
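The neighborhood aggregation described in this claim can be illustrated with a small pure-Python sketch. The formulas themselves are not reproduced in this text, so the concrete form used here is an assumption: the attention coefficient is taken as LeakyReLU((M1·h_i)ᵀ(M2·h_j)), the normalization as a softmax over the neighbors, and the layer output as ReLU of the attention-weighted sum of W·h_j. The names `m1`, `m2`, `w`, the self-loops in `neighbors`, and the toy vectors are all hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(h, neighbors, i, m1, m2):
    """Softmax-normalized attention of entity i over its neighbors (assumed form)."""
    query = matvec(m1, h[i])  # linear transformation of the head entity
    coeffs = []
    for j in neighbors[i]:
        key = matvec(m2, h[j])  # linear transformation of the tail entity
        coeffs.append(leaky_relu(sum(q * k for q, k in zip(query, key))))
    peak = max(coeffs)  # subtract the maximum for numerical stability
    exps = [math.exp(c - peak) for c in coeffs]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(h, neighbors, w, m1, m2):
    """One aggregation layer: ReLU of the attention-weighted sum of W h_j (assumed form)."""
    out = []
    for i in range(len(h)):
        alphas = attention_weights(h, neighbors, i, m1, m2)
        agg = [0.0] * len(w)
        for alpha, j in zip(alphas, neighbors[i]):
            wh = matvec(w, h[j])
            agg = [a + alpha * x for a, x in zip(agg, wh)]
        out.append([max(0.0, a) for a in agg])
    return out


# Toy graph with three entities; each neighbor list includes the entity itself.
identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2]}
alphas = attention_weights(h0, neighbors, 0, identity, identity)
h1 = gat_layer(h0, neighbors, identity, identity, identity)
```

Stacking several calls to `gat_layer` corresponds to a multilayer encoder, with the output of the last layer playing the role of the aggregated entity representations.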

3. The method of knowledge-graph entity alignment of claim 2, wherein performing relationship representation learning for enhancing entity semantics according to the entity representations, modeling relationships between entities to obtain entity relationship representations, specifically comprises:

4. The method for knowledge graph entity alignment according to claim 3, wherein the learning of concept and concept hierarchy representation according to the entity representation and modeling of entity-to-concept and concept-to-concept relationships to obtain concept-to-concept hierarchy representation specifically comprises:

establishing a box embedding representation model as the concept and concept hierarchy representation, wherein the box embedding representation model represents a concept by an axis-aligned hyper-rectangle in space, the hyper-rectangle has an interior, and a concept c is formally defined as a vector in a real number space

The area covered by concept c is:

is the center of the hyper-rectangle c,

is the range offset of the hyper-rectangle c; when every element of Off(c) is 0, Box_c degenerates into a d-dimensional vector, i.e. a point in the d-dimensional space, which is the same form of representation as an entity; therefore, the relationship between an entity and a concept is described using a point and a hyper-rectangle in the space, and the entity vectors belonging to concept c can be represented as:

{e_i ∈ Box_c | e_i ∈ E}

wherein c_max = Cen(c) + Off(c) and c_min = Cen(c) − Off(c); the distance between an entity and a concept is measured in two respects, by an external distance dist_outside(e, c) and an internal distance dist_inside(e, c), wherein the external distance represents the distance from the entity to the boundary of the hyper-rectangle of the concept, and the internal distance represents the distance from the entity to the center of the hyper-rectangle of the concept; and 0 < β < 1 is a hyperparameter for balancing the weights of the two distances;

γ₃ > 0 is a predefined interval (margin) hyperparameter, and the training negative sample set L'_instanceOf is obtained by random uniform sampling based on the set L_instanceOf of membership relationships between entities and concepts.
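The entity-to-concept distance can be illustrated with the following sketch. The distance formulas are not reproduced in this text, so the sketch assumes a common box-embedding formulation: the external distance is the L₁ distance from the point to the box, the internal distance is the L₁ distance from the box center to the point clamped into the box, and the two are combined as dist_outside + β·dist_inside. The function names and toy values are hypothetical.

```python
def box_bounds(center, offset):
    """Return (c_min, c_max) with c_max = Cen(c) + Off(c) and c_min = Cen(c) - Off(c)."""
    c_max = [c + o for c, o in zip(center, offset)]
    c_min = [c - o for c, o in zip(center, offset)]
    return c_min, c_max


def dist_outside(entity, center, offset):
    """L1 distance from the entity to the box boundary; 0 if the entity lies inside the box."""
    c_min, c_max = box_bounds(center, offset)
    return sum(max(e - hi, 0.0) + max(lo - e, 0.0)
               for e, lo, hi in zip(entity, c_min, c_max))


def dist_inside(entity, center, offset):
    """L1 distance from the box center to the entity clamped into the box."""
    c_min, c_max = box_bounds(center, offset)
    return sum(abs(c - min(hi, max(lo, e)))
               for e, c, lo, hi in zip(entity, center, c_min, c_max))


def entity_concept_distance(entity, center, offset, beta=0.02):
    """Combined distance; beta down-weights the internal part (assumed combination)."""
    return dist_outside(entity, center, offset) + beta * dist_inside(entity, center, offset)


# Toy 2-dimensional concept box spanning [-1, 1] in each dimension.
center = [0.0, 0.0]
offset = [1.0, 1.0]
inside_entity = [0.5, 0.0]
outside_entity = [3.0, 0.0]
d_in = entity_concept_distance(inside_entity, center, offset)
d_out = entity_concept_distance(outside_entity, center, offset)
```

With a small β, an entity inside the box is penalized only lightly for being off-center, while an entity outside the box is dominated by the external term.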

5. The method for knowledge graph entity alignment according to claim 4, wherein the learning of concept and concept hierarchy representation according to the entity representation and modeling of entity-to-concept and concept-to-concept relationships to obtain concept-to-concept hierarchy representation specifically comprises:

wherein c_y.max = Cen(c_y) + Off(c_y), c_y.min = Cen(c_y) − Off(c_y), y ∈ {i, j}; if the hyper-rectangle of concept c_i is completely contained in the hyper-rectangle of concept c_j, the concept distance between them is f_box(c_i, c_j) = 0;

defining the optimization objective O_S for the SubclassOf relationships between concepts:
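The formulas for f_box and O_S are not reproduced in this text. As an illustration only, the sketch below shows one distance that has the stated property of being 0 when the hyper-rectangle of c_i lies entirely inside that of c_j: it sums, per dimension, how far the sub-concept box sticks out of the super-concept box. The function name and this particular form are assumptions, not the claimed formula.

```python
def box_containment_distance(center_i, offset_i, center_j, offset_j):
    """L1 amount by which box i protrudes beyond box j; 0 when box i is contained in box j."""
    total = 0.0
    for ci, oi, cj, oj in zip(center_i, offset_i, center_j, offset_j):
        i_min, i_max = ci - oi, ci + oi
        j_min, j_max = cj - oj, cj + oj
        # Overshoot past the upper bound plus overshoot past the lower bound.
        total += max(i_max - j_max, 0.0) + max(j_min - i_min, 0.0)
    return total


# Toy boxes: a small box nested inside a larger box with the same center.
small = ([0.0, 0.0], [1.0, 1.0])
large = ([0.0, 0.0], [2.0, 2.0])
contained = box_containment_distance(*small, *large)
reversed_order = box_containment_distance(*large, *small)
```

The asymmetry between the two calls mirrors the direction of a sub-concept relationship: the narrower concept fits inside the broader one, but not the other way round.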

6. The method of aligning knowledge-graph entities according to claim 5, wherein the constraining the entity representation in the entity aligning process based on the vector distance through the entity relationship representation, the concept and the concept hierarchy representation to obtain the aligning result of the two knowledge-graph entities specifically comprises:

the distance function between two entities is defined as:

wherein,

denotes the L₁/L₂ norm of a vector, and an interval-ordering (margin-based ranking) loss function is applied as the optimization objective O_E of entity alignment:

7. The method of aligning knowledge-graph entities according to claim 6, wherein the constraining the entity representation in the entity aligning process based on the vector distance through the entity relationship representation, the concept and the concept hierarchy representation to obtain the aligning result of the two knowledge-graph entities specifically comprises:

obtaining the overall optimization objective for aligning the entities of the two knowledge graphs to be fused according to the entity alignment optimization objective, the entity-relationship optimization objective, the entity-concept relationship optimization objective and the concept-concept relationship optimization objective:

O = α₁O_E + α₂O_R + α₃O_I + α₄O_S

wherein O_E, O_R, O_I, O_S respectively correspond to the optimization objectives of entity alignment, relationship representation, the instanceOf relationship and the SubclassOf relationship, and α₁, α₂, α₃, α₄ > 0 are weight parameters for balancing the partial objectives.
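The weighted combination of the four partial objectives can be illustrated as follows. The sketch pairs each positive distance with a single negative distance inside a hinge loss, which is a simplification assumed for illustration; the function names and the numeric values passed in are hypothetical, while the default weights are the values reported as optimal in the experimental settings.

```python
def margin_ranking_objective(positive_distances, negative_distances, margin):
    """Sum of hinge losses max(0, margin + d_pos - d_neg) over paired samples."""
    return sum(max(0.0, margin + p - n)
               for p, n in zip(positive_distances, negative_distances))


def combined_objective(o_e, o_r, o_i, o_s, alphas=(1.0, 0.8, 0.6, 0.6)):
    """Weighted sum O = a1*O_E + a2*O_R + a3*O_I + a4*O_S."""
    a1, a2, a3, a4 = alphas
    return a1 * o_e + a2 * o_r + a3 * o_i + a4 * o_s


# Toy values: two aligned pairs with one sampled negative each.
o_e = margin_ranking_objective([0.5, 1.0], [4.0, 2.0], margin=3.0)
total = combined_objective(o_e, o_r=1.0, o_i=0.5, o_s=0.5)
```

Minimizing the combined value updates the shared entity vectors under all four objectives at once, which is how the relation and concept terms constrain the alignment.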

8. A knowledge-graph entity alignment apparatus, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method of knowledge-graph entity alignment of any of claims 1 to 7.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method for knowledge-graph entity alignment of any of claims 1 to 7.