CN113111657B - Cross-language knowledge graph alignment and fusion method, device and storage medium - Google Patents


Info

Publication number
CN113111657B
Authority
CN
China
Prior art keywords
graph
sub
knowledge
entity
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110241500.4A
Other languages
Chinese (zh)
Other versions
CN113111657A (en
Inventor
俞山青
张建林
甘燃
童天航
傅晨波
宣琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110241500.4A
Publication of CN113111657A
Application granted
Publication of CN113111657B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-language knowledge graph alignment and fusion method, a device and a storage medium. The method comprises the following steps: S1, constructing a first-order sub-graph feature matrix of the entities in the knowledge graph networks of two different languages; S2, inputting the structural feature matrix, the attribute feature matrix and the first-order sub-graph feature matrix of the entities into an alignment model, obtaining three embedded vector matrices for the structural features, attribute features and sub-graph features of all entities in the two knowledge graph networks, and concatenating the three embedded vector matrices; S3, calculating the similarity of the embedded vectors between an entity to be aligned and each entity in the target knowledge graph network; S4, sorting the entities in the target knowledge graph network by similarity to obtain candidate equivalent entities of the entity to be aligned; S5, fusing the knowledge graph networks of the two different languages according to the candidate equivalent entities. The method can effectively realize the alignment and fusion of cross-language knowledge graphs.

Description

Cross-language knowledge graph alignment and fusion method, device and storage medium
Technical Field
The present invention relates to the field of alignment and fusion technologies of cross-language knowledge graphs, and in particular, to a method, an apparatus, and a storage medium for alignment and fusion of cross-language knowledge graphs.
Background
Knowledge graphs (KGs), which organize human knowledge in a structured form, are playing an increasingly important role as infrastructure in artificial intelligence and natural language processing. A knowledge graph is a collection of knowledge facts, typically represented as triples (head entity, relation, tail entity).
Currently, most knowledge graphs are built from data sources in a single language, with very few globally built exceptions such as the Google Knowledge Graph. A knowledge graph described in a single language usually serves only users of that language, which leaves a large gap in the fusion of global knowledge. In the era of big data, how to align and fuse cross-language knowledge graphs so that knowledge can be shared globally is a problem that must be solved for knowledge graphs to develop further.
Research on cross-language knowledge graphs is still at an early stage, and the task of aligning and fusing already-constructed knowledge graphs in different languages remains to be completed. The knowledge bases used to construct knowledge graphs in different languages vary, and because users of different languages have different language environments and knowledge backgrounds, these knowledge bases may differ considerably in coverage, granularity, and how they describe the same knowledge.
Cross-language knowledge graph alignment and fusion helps link and fuse knowledge graphs that carry the distinctive knowledge of different countries and peoples worldwide, enabling barrier-free cross-language information retrieval, natural language processing and the like. For example, in medicine, aligning and fusing a domestic traditional Chinese medicine knowledge graph with a foreign Western medicine knowledge graph makes it possible to construct a knowledge graph of integrated Chinese and Western medicine, providing more comprehensive and effective medical knowledge to doctors and patients. In search engines, with aligned and fused multilingual knowledge graphs, a user can switch between languages and obtain, without barriers, knowledge that could previously be obtained only through a single language, covering a wider range and more language versions than before.
A knowledge graph also contains rich network structure: entities can be regarded as nodes in a network, and relations as edges. Sub-graphs are the basic building blocks of a network, so studying the sub-structure of a network is an effective way to analyze it. Recently, embedding algorithms such as Word2vec and DeepWalk have been widely applied to tasks such as node classification. However, the embedded vectors obtained by such models contain only local structural information around each node and ignore the global structural information of the whole network. Constructing a sub-graph network can supplement the structural features of the original network, so that downstream tasks such as node classification and network classification can be performed better.
Disclosure of Invention
The invention aims to provide a cross-language knowledge graph alignment and fusion method, device and storage medium that solve the technical problem in the prior art whereby cross-language knowledge graph entity alignment methods use only structural information and therefore produce embedded vectors with insufficient expressive power, so that the alignment and fusion of cross-language knowledge graphs can be realized effectively.
To achieve the above object, the invention provides the following solution: a cross-language knowledge graph alignment and fusion method, comprising the following steps:
Step S1, acquiring knowledge graph networks in two different languages, establishing a first-order sub-graph network for each of them, extracting first-order sub-graph features of the entities based on the first-order sub-graph networks, and constructing a first-order sub-graph feature matrix of the entities in the two knowledge graph networks;
Step S2, constructing an alignment model based on a graph convolutional network (GCN), acquiring the structural feature matrices and attribute feature matrices of the entities in the two knowledge graph networks, inputting the structural feature matrices, attribute feature matrices and first-order sub-graph feature matrices of the entities into the trained alignment model, obtaining the structural feature embedded vector matrices, attribute feature embedded vector matrices and sub-graph feature embedded vector matrices of all entities in the two knowledge graph networks, and concatenating the three embedded vector matrices;
Step S3, taking the two knowledge graph networks as the knowledge graph network to be aligned and the target knowledge graph network respectively, selecting an entity to be aligned from the knowledge graph network to be aligned, and calculating, according to a scoring function, the similarity of the embedded vectors between the entity to be aligned and each entity in the target knowledge graph network;
Step S4, sorting the entities in the target knowledge graph network according to the similarity between the embedded vector of each entity and that of the entity to be aligned, and obtaining candidate equivalent entities of the entity to be aligned;
Step S5, fusing the knowledge graph networks of the two different languages according to the candidate equivalent entities.
Preferably, in step S1, the extraction of the first-order sub-graph features specifically comprises:
Step S1.1, detecting sub-graphs in the network structure of the knowledge graph;
Step S1.2, constructing a sub-graph network based on the detected sub-graphs;
Step S1.3, encoding the nodes established in the sub-graph network using a one-hot encoding method to obtain the first-order sub-graph features of the entities.
Preferably, in step S1.1, the sub-graphs detected in the network structure of the original knowledge graph are lines (i.e., single links).
Preferably, in step S1.2, the method for constructing the sub-graph network based on the detected sub-graphs comprises: traversing all sub-graphs and judging whether two sub-graphs share the same node or link in the knowledge graph network structure; if so, creating an edge between the two sub-graphs, thereby obtaining the sub-graph network.
Preferably, in step S1.3, the method for encoding the nodes established in the sub-graph network is as follows: judging whether an entity in the knowledge graph network belongs to a given sub-graph in the node set of the sub-graph network; if so, the code position of that sub-graph is set to 1, and otherwise it is set to 0.
Preferably, in step S2, the concatenation of the three embedded vector matrices for the structural features, attribute features and sub-graph features of the entities is obtained as shown in formula (1):

$$[H_s^{(l+1)};\ H_a^{(l+1)};\ H_{sgn}^{(l+1)}] = \sigma\left(\hat{D}^{-\frac{1}{2}}\,\hat{P}\,\hat{D}^{-\frac{1}{2}}\,[H_s^{(l)} W_s^{(l)};\ H_a^{(l)} W_a^{(l)};\ H_{sgn}^{(l)} W_{sgn}^{(l)}]\right) \qquad (1)$$

where $H_s^{(l+1)}$, $H_a^{(l+1)}$ and $H_{sgn}^{(l+1)}$ are respectively the structural feature embedded vector matrix, the attribute feature embedded vector matrix and the sub-graph feature embedded vector matrix of the knowledge graph network; $P$ is the adjacency matrix of size $n \times n$, where $n$ is the number of entities in the knowledge graph network; $\hat{P} = P + I$, where $I$ is the identity matrix; $\hat{D}$ is the diagonal node degree matrix of $\hat{P}$; $H_s^{(l)}$, $H_a^{(l)}$ and $H_{sgn}^{(l)}$ are respectively the structural feature matrix, the attribute feature matrix and the first-order sub-graph feature matrix of the entities input to the $l$-th layer of the alignment model; $W_s^{(l)}$, $W_a^{(l)}$ and $W_{sgn}^{(l)}$ are respectively the weight matrices of the structural features, attribute features and sub-graph features of the entities in the $l$-th layer of the alignment model; $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of matrices; and $\sigma$ is a nonlinear activation function.
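Preferably, in step S3, the similarity of the embedded vectors between different entities is calculated according to a scoring function, as shown in formula (5):

$$D(e_i, e_j) = \alpha\,\frac{f(h_s(e_i), h_s(e_j))}{d_s} + \beta\,\frac{f(h_a(e_i), h_a(e_j))}{d_a} + \gamma\,\frac{f(h_{sgn}(e_i), h_{sgn}(e_j))}{d_{sgn}} \qquad (5)$$

where $D(e_i, e_j)$ represents the similarity between entities $e_i$ and $e_j$; $f(x, y) = \|x - y\|_1$; $h_s(\cdot)$, $h_a(\cdot)$ and $h_{sgn}(\cdot)$ are the embedded vectors of the structural features, attribute features and sub-graph features of an entity; $d_s$, $d_a$ and $d_{sgn}$ are the dimensions of these three embedded vectors respectively; and $\alpha$, $\beta$ and $\gamma$ are hyperparameters with $\alpha + \beta + \gamma = 1$.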
Preferably, in step S5, the method for fusing the knowledge graph networks of the two different languages according to the candidate equivalent entities comprises:
merging the entities and relations of the knowledge graphs of the two different languages according to the equivalent entities, in the target knowledge graph network, of the unaligned entities, thereby fusing the knowledge graph networks of the two different languages.
The invention also provides a cross-language knowledge graph alignment and fusion device, which comprises a sub-graph feature extraction module, an entity alignment module and a knowledge graph fusion module connected in sequence;
The sub-graph feature extraction module is used for converting the knowledge graph networks of two different languages into first-order sub-graph networks respectively, extracting first-order sub-graph features of the entities based on the first-order sub-graph networks, and constructing a first-order sub-graph feature matrix of the entities in the two knowledge graph networks;
The entity alignment module is used for constructing an alignment model based on a GCN, inputting the structural feature matrices, attribute feature matrices and first-order sub-graph feature matrices of the entities in the two knowledge graph networks into the trained alignment model to obtain embedded vector representations of the entities in the two knowledge graph networks, calculating the similarity of the embedded vectors between the entity to be aligned and all entities in the target knowledge graph network, and obtaining candidate equivalent entities of the entity to be aligned;
The knowledge graph fusion module is used for fusing the knowledge graph networks of the two different languages according to the candidate equivalent entities.
The invention also provides a storage medium for storing a program, wherein the program is used for implementing the cross-language knowledge graph alignment and fusion method.
The invention discloses the following technical effects:
The invention discloses a cross-language knowledge graph alignment and fusion method, a device and a storage medium. The method comprises: converting the original knowledge graph into a first-order sub-graph network and extracting first-order sub-graph features of the entities; combining these with the structural features and attribute features of the original knowledge graph as the input of the graph convolutional network; selecting suitable weights to concatenate the three embedded vectors; calculating the similarity between entity embedded vectors according to the scoring function; sorting the entities in the graph by similarity to obtain candidate equivalent entities; and fusing the knowledge graphs according to the newly discovered equivalent entities. Because the structures of cross-language knowledge graphs are very similar, the sub-graph features of knowledge graph entities can be used to effectively enhance the structural features of the original graph and improve the representational capability of the entity vectors. A new cross-language knowledge graph entity alignment model is thereby constructed, through which knowledge graph fusion can be carried out effectively.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a cross-language knowledge graph alignment and fusion method of the present invention;
FIG. 2 is a flowchart of a method for extracting first-order sub-graph features according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the cross-language knowledge graph alignment and fusion device of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to fig. 1-2, the present embodiment provides a cross-language knowledge graph alignment and fusion method, which includes the following steps:
Step S1, acquiring two knowledge graph networks with different languages, respectively establishing a first-order sub-graph network according to the two knowledge graph networks with different languages, extracting first-order sub-graph features of the entity based on the first-order sub-graph network, and constructing a first-order sub-graph feature matrix of the entity in the knowledge graph networks with different languages;
The knowledge graph network is a complex network in which entities correspond to nodes and relations correspond to edges. The extraction of the first-order sub-graph features specifically comprises the following steps:
Step S1.1, detecting sub-graphs in the knowledge graph network, selecting the most basic sub-graphs (namely lines) as the sub-graphs;
Step S1.2, constructing a sub-graph network based on the detected sub-graphs;
Specifically: after enough sub-graphs have been extracted from the knowledge graph network, a sub-graph network is constructed among the selected sub-graphs according to the following rule: traverse all sub-graphs and judge whether two sub-graphs share the same node or link in the original knowledge graph network structure; if so, create an edge between the two sub-graphs. This yields the sub-graph network, whose node set is $G = \{g_1, g_2, \ldots, g_m\}$.
Step S1.3, encoding the nodes established in the sub-graph network using a one-hot encoding method to obtain the first-order sub-graph features of the entities. Specifically: the newly established nodes of the sub-graph network are taken as the objects of the one-hot encoding; for each entity in the original knowledge graph network, judge whether it belongs to a given sub-graph in the node set $G = \{g_1, g_2, \ldots, g_m\}$ of the sub-graph network; if so, the code position of that sub-graph is set to 1, and otherwise it is set to 0, which completes the extraction of the first-order sub-graph features.
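Steps S1.1 to S1.3 can be illustrated with a minimal sketch. It assumes that each "line" sub-graph is a single edge given as a pair of entity names, that two sub-graphs are linked when they share an endpoint, and that an entity's code is a 0/1 vector over all sub-graphs (so an entity on several edges has several positions set to 1). The function names and the toy graph are hypothetical and are not taken from the patent.

```python
from itertools import combinations

def build_first_order_sgn(edges):
    """Treat every edge (line) as a sub-graph g_1..g_m and link two
    sub-graphs whenever they share a node of the original graph."""
    subgraphs = [tuple(e) for e in edges]
    sgn_edges = set()
    for (i, a), (j, b) in combinations(enumerate(subgraphs), 2):
        if set(a) & set(b):          # the two lines share an endpoint
            sgn_edges.add((i, j))
    return subgraphs, sgn_edges

def subgraph_features(entities, subgraphs):
    """One row per entity, one position per sub-graph:
    1 if the entity belongs to that sub-graph, otherwise 0."""
    return {e: [1 if e in g else 0 for g in subgraphs] for e in entities}

# Toy graph with hypothetical entity names; relation labels are omitted.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
subgraphs, sgn_edges = build_first_order_sgn(edges)
features = subgraph_features(["A", "B", "C", "D"], subgraphs)
```

The pairwise traversal is quadratic in the number of edges, which is fine for a toy example; a large graph would more likely index edges by endpoint first.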
Step S2, constructing an alignment model based on a GCN (Graph Convolutional Network) and training the alignment model; acquiring the structural feature matrices and attribute feature matrices of the entities in the knowledge graph networks of the two different languages; inputting the structural feature matrices and attribute feature matrices of the entities, together with the first-order sub-graph feature matrices extracted in step S1, into the trained alignment model; obtaining the structural feature embedded vector matrices, attribute feature embedded vector matrices and sub-graph feature embedded vector matrices of all entities in the two knowledge graph networks; and selecting suitable weights to concatenate the three embedded vector matrices;
The training method of the alignment model is as follows:
Pre-aligned entity pairs are obtained from the knowledge graph networks of the two different languages to form a sample set, and the alignment model is trained on this sample set. Objective functions $L_s$, $L_a$ and $L_{sgn}$ are constructed for the structural feature embedded vectors, attribute feature embedded vectors and sub-graph feature embedded vectors of the entities respectively. $L_s$, $L_a$ and $L_{sgn}$ are mutually independent and are each optimized by stochastic gradient descent, which completes the training of the alignment model, so that the embedded vectors of equivalent entities are as close as possible in the vector space and the embedded vectors of non-equivalent entities are as far apart as possible.
The objective functions $L_s$, $L_a$ and $L_{sgn}$ of the structural feature embedded vectors, attribute feature embedded vectors and sub-graph feature embedded vectors of the entities are shown in formulas (2) to (4):

$$L_s = \sum_{(e,v)\in S}\ \sum_{(e',v')\in S'} \left[f(h_s(e), h_s(v)) + \gamma_s - f(h_s(e'), h_s(v'))\right]_+ \qquad (2)$$

$$L_a = \sum_{(e,v)\in S}\ \sum_{(e',v')\in S'} \left[f(h_a(e), h_a(v)) + \gamma_a - f(h_a(e'), h_a(v'))\right]_+ \qquad (3)$$

$$L_{sgn} = \sum_{(e,v)\in S}\ \sum_{(e',v')\in S'} \left[f(h_{sgn}(e), h_{sgn}(v)) + \gamma_{sgn} - f(h_{sgn}(e'), h_{sgn}(v'))\right]_+ \qquad (4)$$

where $[x]_+ = \max\{0, x\}$; $f(x, y) = \|x - y\|_1$; $S$ is the sample set formed by the pre-aligned entity pairs; $S'$ is the negative sample set, obtained by randomly replacing one entity in a pre-aligned entity pair $(e, v)$, where $e$ and $v$ are entities in the knowledge graph networks of the two different languages respectively, the replacement entities are randomly selected from the two knowledge graph networks, and $(e', v')$ denotes a negative pair; $h_s(e)$, $h_a(e)$ and $h_{sgn}(e)$ are respectively the embedded vectors of the structural features, attribute features and sub-graph features of entity $e$; $h_s(v)$, $h_a(v)$ and $h_{sgn}(v)$ are respectively the embedded vectors of the structural features, attribute features and sub-graph features of entity $v$; and $\gamma_s, \gamma_a, \gamma_{sgn} > 0$ are hyperparameters that control the margin separating positive and negative aligned entity pairs.
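One of the three margin-based objectives can be sketched as follows; the same function would apply to each feature family with its own embeddings and margin. This is only an illustration under stated assumptions: embeddings are plain Python lists held in dictionaries, negatives are supplied per positive pair, and the names `margin_loss` and `corrupt` are hypothetical. Gradient computation and the stochastic gradient descent update are left out.

```python
import random

def l1(x, y):
    """f(x, y) = ||x - y||_1"""
    return sum(abs(a - b) for a, b in zip(x, y))

def corrupt(pair, ents1, ents2, rng):
    """Build a negative pair by randomly replacing one side of (e, v)."""
    e, v = pair
    if rng.random() < 0.5:
        return (rng.choice(ents1), v)
    return (e, rng.choice(ents2))

def margin_loss(emb1, emb2, positives, negatives, margin):
    """Sum of [f(pos) + margin - f(neg)]_+ over positive/negative pairs.
    negatives maps each positive pair to its list of negative pairs."""
    total = 0.0
    for e, v in positives:
        pos = l1(emb1[e], emb2[v])
        for e_neg, v_neg in negatives[(e, v)]:
            neg = l1(emb1[e_neg], emb2[v_neg])
            total += max(0.0, pos + margin - neg)
    return total

# Hypothetical two-dimensional embeddings for two graphs.
emb1 = {"e1": [0.0, 0.0], "e2": [1.0, 1.0]}
emb2 = {"v1": [0.0, 0.5], "v2": [1.0, 1.0]}
positives = [("e1", "v1")]
negatives = {("e1", "v1"): [("e1", "v2")]}

loss_small_margin = margin_loss(emb1, emb2, positives, negatives, margin=1.0)
loss_large_margin = margin_loss(emb1, emb2, positives, negatives, margin=3.0)
sampled = corrupt(("e1", "v1"), ["e1", "e2"], ["v1", "v2"], random.Random(0))
```

With a margin of 1.0 the negative pair is already far enough away and contributes nothing; with a margin of 3.0 it is penalized, which is the behaviour the hinge is meant to produce.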
The attribute feature matrix is obtained as follows: the 2000 attributes that occur most frequently across all attribute triples are selected for encoding.
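A minimal sketch of this selection step is given below. The text only states that the most frequent attributes are encoded, so the binary presence encoding used here is an assumption, as are the function name `attribute_features` and the toy triples.

```python
from collections import Counter

def attribute_features(attr_triples, entities, top_k=2000):
    """Keep the top_k most frequent attributes and mark, per entity,
    which of them it has (binary presence encoding, an assumption)."""
    counts = Counter(attr for _, attr, _ in attr_triples)
    vocab = [attr for attr, _ in counts.most_common(top_k)]
    index = {attr: i for i, attr in enumerate(vocab)}
    rows = {e: [0] * len(vocab) for e in entities}
    for ent, attr, _ in attr_triples:
        if ent in rows and attr in index:
            rows[ent][index[attr]] = 1
    return vocab, rows

# Hypothetical attribute triples (entity, attribute, attribute value).
triples = [
    ("diabetes", "department", "internal medicine"),
    ("hypoglycemia", "department", "internal medicine"),
    ("diabetes", "alias", "DM"),
]
vocab, rows = attribute_features(triples, ["diabetes", "hypoglycemia"], top_k=2)
```

Attribute values are ignored in this sketch; only the presence of an attribute on an entity is recorded.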
The concatenation of the three embedded vector matrices for the structural features, attribute features and sub-graph features of the entities is obtained as shown in formula (1):

$$[H_s^{(l+1)};\ H_a^{(l+1)};\ H_{sgn}^{(l+1)}] = \sigma\left(\hat{D}^{-\frac{1}{2}}\,\hat{P}\,\hat{D}^{-\frac{1}{2}}\,[H_s^{(l)} W_s^{(l)};\ H_a^{(l)} W_a^{(l)};\ H_{sgn}^{(l)} W_{sgn}^{(l)}]\right) \qquad (1)$$

where $H_s^{(l+1)}$, $H_a^{(l+1)}$ and $H_{sgn}^{(l+1)}$ are respectively the structural feature embedded vector matrix, the attribute feature embedded vector matrix and the sub-graph feature embedded vector matrix of the knowledge graph network; $P$ is the adjacency matrix of size $n \times n$, where $n$ is the number of entities in the knowledge graph network; $\hat{P} = P + I$, where $I$ is the identity matrix; $\hat{D}$ is the diagonal node degree matrix of $\hat{P}$; $H_s^{(l)}$, $H_a^{(l)}$ and $H_{sgn}^{(l)}$ are respectively the structural feature matrix, the attribute feature matrix and the first-order sub-graph feature matrix of the entities input to the $l$-th layer of the alignment model; $W_s^{(l)}$, $W_a^{(l)}$ and $W_{sgn}^{(l)}$ are respectively the weight matrices of the structural features, attribute features and sub-graph features of the entities in the $l$-th layer of the alignment model; $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of matrices; and $\sigma$ is a nonlinear activation function, such as ReLU.
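A single propagation layer of this kind can be sketched in plain Python. The sketch assumes a symmetric 0/1 adjacency matrix, uses ReLU as the activation, and keeps a separate weight matrix per feature family before concatenating the outputs column-wise. All function names and the tiny matrices are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(p):
    """D^-1/2 (P + I) D^-1/2, with D the degree matrix of P + I."""
    n = len(p)
    p_hat = [[p[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in p_hat]
    return [[p_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def gcn_layer(p, feats, weights):
    """feats and weights are dicts keyed by 's', 'a' and 'sgn'.
    Each family is propagated with its own weights, then the three
    results are concatenated column-wise for every entity."""
    a_norm = normalized_adjacency(p)
    blocks = [relu(matmul(a_norm, matmul(feats[k], weights[k])))
              for k in ("s", "a", "sgn")]
    return [sum((blk[i] for blk in blocks), []) for i in range(len(p))]

# Two connected entities with hypothetical features and weights.
p = [[0, 1], [1, 0]]
feats = {"s": [[1.0, 0.0], [0.0, 1.0]], "a": [[1.0], [0.0]], "sgn": [[1.0], [1.0]]}
weights = {"s": [[2.0], [4.0]], "a": [[1.0]], "sgn": [[-1.0]]}
out = gcn_layer(p, feats, weights)
```

Because the activation is element-wise, applying it to each block before concatenation gives the same result as applying it to the concatenated matrix.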
Step S3, respectively taking two knowledge graph networks with different languages as a knowledge graph network to be aligned and a target knowledge graph network, selecting an entity to be aligned from the knowledge graph network to be aligned, traversing embedded vectors of all the entities in the target knowledge graph network, and calculating the similarity of the embedded vectors between the entity to be aligned and each entity in the target knowledge graph network according to a scoring function;
The method specifically comprises the following steps:
S3.1, respectively taking two knowledge graph networks with different languages as a knowledge graph network to be aligned and a target knowledge graph network, selecting entities to be aligned from the knowledge graph network to be aligned, and traversing embedded vectors of all entities in the target knowledge graph network;
Step S3.2, calculating the similarity of the embedded vectors between the entity to be aligned and all entities in the target knowledge graph network according to the scoring function, as shown in formula (5):

$$D(e_i, e_j) = \alpha\,\frac{f(h_s(e_i), h_s(e_j))}{d_s} + \beta\,\frac{f(h_a(e_i), h_a(e_j))}{d_a} + \gamma\,\frac{f(h_{sgn}(e_i), h_{sgn}(e_j))}{d_{sgn}} \qquad (5)$$

where $D(e_i, e_j)$ represents the similarity between entities $e_i$ and $e_j$, with $i \in [1, n_1]$ and $j \in [1, n_2]$, and $n_1$, $n_2$ are the numbers of entities in the knowledge graph network to be aligned and the target knowledge graph network respectively; $f(x, y) = \|x - y\|_1$; $d_s$, $d_a$ and $d_{sgn}$ are respectively the dimensions of the three embedded vectors of the structural features, attribute features and sub-graph features; and $\alpha$, $\beta$ and $\gamma$ are hyperparameters that balance the three embedded vectors, with $\alpha + \beta + \gamma = 1$.
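The scoring function can be written out directly. In this sketch each entity is a dictionary holding its three embedded vectors under the hypothetical keys 's', 'a' and 'sgn', and the dimension of each vector is taken from its length. Since $f$ is an L1 distance, a smaller score means the two entities are closer.

```python
def l1(x, y):
    """f(x, y) = ||x - y||_1"""
    return sum(abs(a - b) for a, b in zip(x, y))

def score(ei, ej, alpha, beta, gamma):
    """Weighted, dimension-normalized L1 distance over the three
    embedding families; alpha + beta + gamma is expected to be 1."""
    weights = {"s": alpha, "a": beta, "sgn": gamma}
    return sum(w * l1(ei[k], ej[k]) / len(ei[k]) for k, w in weights.items())

# Hypothetical embeddings of different dimensions per family.
e_i = {"s": [0.0, 0.0], "a": [1.0], "sgn": [0.0, 0.0, 0.0, 0.0]}
e_j = {"s": [1.0, 1.0], "a": [1.0], "sgn": [2.0, 0.0, 0.0, 0.0]}
d = score(e_i, e_j, alpha=0.5, beta=0.25, gamma=0.25)
```

Dividing by the dimension keeps a high-dimensional family from dominating the score simply because it has more coordinates.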
Step S4, sorting the entities in the target knowledge graph network according to the similarity between the embedded vector of each entity in the target knowledge graph network and that of the entity $e_i$ to be aligned, to obtain candidate equivalent entities of the entity to be aligned;
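The ranking in step S4 amounts to sorting target entities by their score against the entity to be aligned. The sketch below assumes the scores have already been computed and that smaller values mean closer entities; the cut-off `top_k` and the entity names are hypothetical.

```python
import heapq

def rank_candidates(distances, top_k=10):
    """distances maps each target entity to its score against the
    entity to be aligned; the top_k closest entities are returned."""
    return heapq.nsmallest(top_k, distances.items(), key=lambda kv: kv[1])

# Hypothetical scores for three target entities.
distances = {"v1": 0.9, "v2": 0.1, "v3": 0.4}
candidates = rank_candidates(distances, top_k=2)
```

The first element of the returned list is the most likely equivalent entity, and the remaining elements serve as fallback candidates.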
Step S5, fusing the knowledge graph networks of the two different languages according to the candidate equivalent entities;
Specifically: for each unaligned entity in the knowledge graph network to be aligned (i.e., an entity that is not in a pre-aligned entity pair), the entities and relations of the knowledge graphs of the two different languages are merged according to the equivalent entity found for it in the target knowledge graph network, thereby fusing the knowledge graph networks of the two different languages.
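The merging step can be sketched as a rewrite of entity names followed by a set union of triples. This is only one possible reading: it assumes the equivalence mapping is already fixed, maps source entities onto their target equivalents, and leaves relation names untouched. The function name `fuse` and the toy triples are hypothetical.

```python
def fuse(source_triples, target_triples, equivalents):
    """Merge two triple sets; equivalents maps a source entity to its
    equivalent target entity, and unmapped entities keep their names."""
    def canon(entity):
        return equivalents.get(entity, entity)

    fused = set(target_triples)
    for head, relation, tail in source_triples:
        fused.add((canon(head), relation, canon(tail)))
    return fused

# Hypothetical triples; the "zh:" prefix marks source-graph entities.
source = [("zh:hypoglycemia", "medication", "zh:hydrocortisone")]
target = [("hypoglycemia", "symptom", "sweating")]
equivalents = {"zh:hypoglycemia": "hypoglycemia"}
fused = fuse(source, target, equivalents)
```

After fusion, the target entity carries the facts from both graphs, while source entities without a known equivalent are kept under their original names.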
Referring to fig. 3, the present embodiment further provides a cross-language knowledge graph alignment and fusion device, which specifically comprises a sub-graph feature extraction module, an entity alignment module and a knowledge graph fusion module connected in sequence;
The sub-graph feature extraction module is used for converting the knowledge graph networks of two different languages into first-order sub-graph networks respectively, extracting first-order sub-graph features of the entities based on the first-order sub-graph networks, and constructing a first-order sub-graph feature matrix of the entities in the two knowledge graph networks; the entity alignment module constructs an alignment model based on a GCN, trains the alignment model with pre-aligned entity pairs, inputs the structural feature matrices, attribute feature matrices and first-order sub-graph feature matrices of the entities in the two knowledge graph networks into the trained alignment model to obtain embedded vector representations of the entities in the two knowledge graph networks, calculates the similarity of the embedded vectors between the entity to be aligned and all entities in the target knowledge graph network, and obtains candidate equivalent entities of the entity to be aligned;
The knowledge graph fusion module is used for fusing the knowledge graph networks of the two different languages according to the candidate equivalent entities. Specifically: the relation triples and attribute triples of the entity to be aligned and its candidate equivalent entity are merged to complete the fusion of the cross-language knowledge graph networks.
The present embodiment also provides a storage medium for storing a program, which when executed by the cross-language knowledge graph alignment and fusion device, implements the steps of the cross-language knowledge graph alignment and fusion method.
In this embodiment, the sub-graph features of the entities are input into the cross-language knowledge graph alignment model to extend the structural features of the original graph and improve the representational capability of the entity vectors, so that the alignment and fusion of knowledge graph entities can be completed better.
The technical concept of the invention is as follows: given knowledge graphs KG_1 and KG_2 in two different languages and a set of known aligned entity pairs, a GCN model is used to encode the features of the entities in the graphs, embedding entities from different languages into a unified vector space. The sub-graph feature extraction part extracts a sub-graph network from the original graph using the first-order sub-graph network method and then encodes the sub-graph features of each entity. After training, equivalent entities lie as close together as possible; the candidate entities are then ranked by a predefined distance function to find the equivalent entity corresponding to each entity. Finally, the knowledge graphs of the two different languages are fused according to the newly discovered equivalent entity pairs.
In this embodiment, taking a medical knowledge graph as an example, the cross-language knowledge graph alignment and fusion method of the present invention is explained:
Knowledge graphs store real-world knowledge in the form of triples, which in this embodiment are divided into two categories: relation triples and attribute triples. For example, in the Chinese medical knowledge graph CMeKG, (hypoglycemia, medication, hydrocortisone) is a relation triple in the format (entity 1, relation, entity 2), and (diabetes, department, internal medicine) is an attribute triple in the format (entity, attribute, attribute value). Formally, a knowledge graph is represented as $KG = \{E, R, A, T_R, T_A\}$, where $E$, $R$ and $A$ are the sets of entities, relations and attributes respectively, $T_R \subseteq E \times R \times E$ is the set of relation triples, $T_A \subseteq E \times A \times V$ is the set of attribute triples, and $V$ is the set of attribute values.
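The formal definition can be mirrored by a small data structure. In this sketch only the two triple sets are stored, and the sets of entities, relations, attributes and attribute values are derived from them; the class name `KnowledgeGraph` and this design choice are assumptions made for illustration, and the two example triples are the ones quoted in the text.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    relation_triples: set = field(default_factory=set)   # T_R: (entity 1, relation, entity 2)
    attribute_triples: set = field(default_factory=set)  # T_A: (entity, attribute, value)

    @property
    def entities(self):       # E
        ents = {h for h, _, _ in self.relation_triples}
        ents |= {t for _, _, t in self.relation_triples}
        ents |= {e for e, _, _ in self.attribute_triples}
        return ents

    @property
    def relations(self):      # R
        return {r for _, r, _ in self.relation_triples}

    @property
    def attributes(self):     # A
        return {a for _, a, _ in self.attribute_triples}

    @property
    def attribute_values(self):  # V
        return {v for _, _, v in self.attribute_triples}

kg = KnowledgeGraph(
    relation_triples={("hypoglycemia", "medication", "hydrocortisone")},
    attribute_triples={("diabetes", "department", "internal medicine")},
)
```

Deriving the sets from the triples keeps them consistent by construction, at the cost of recomputing them on each access.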
The entity alignment task is described as follows. For $KG_1$ and $KG_2$, knowledge graphs in two different languages, this embodiment defines cross-language knowledge graph alignment as the task of using a set of already known aligned entity pairs to find new aligned entity pairs between the two knowledge graphs according to a distance function.
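The ranking of candidates by a distance function can be sketched with NumPy. The exact distance function is not reproduced in this text, so the sketch assumes an L1 distance per embedding type, normalised by the embedding dimension and combined with three weights that sum to 1. The function names, the weight values and the toy vectors are hypothetical.

```python
import numpy as np


def distance(src, tgt, weights=(0.8, 0.1, 0.1)):
    """Weighted, dimension-normalised L1 distance from one entity to every target entity.

    src: three 1-D vectors (structure, attribute, sub-graph embeddings of one entity).
    tgt: three 2-D arrays with one row per target entity, in the same order.
    """
    total = np.zeros(tgt[0].shape[0])
    for w, s, t in zip(weights, src, tgt):
        # L1 distance to each target row, divided by the embedding dimension.
        total += w * np.abs(t - s).sum(axis=1) / s.shape[0]
    return total


def rank_candidates(src, tgt, k=2, weights=(0.8, 0.1, 0.1)):
    # Indices of the k closest target entities, nearest first.
    return np.argsort(distance(src, tgt, weights), kind="stable")[:k]


src = (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0, 0.0]))
tgt = (
    np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]),
)
print(rank_candidates(src, tgt, k=3))  # [1 2 0]
```

Target entity 1 has the same three embeddings as the source entity, so its distance is zero and it is ranked first.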
Cross-language knowledge graph alignment and fusion helps to link and fuse knowledge graphs that carry the distinctive knowledge of different countries and ethnic groups worldwide, enabling barrier-free cross-language information retrieval, natural language processing and the like. Taking medical knowledge graphs as an example, aligning and fusing the traditional Chinese medicine case knowledge graph provided by the Institute of Information on Traditional Chinese Medicine of the China Academy of Chinese Medical Sciences with the English antibiotic medical knowledge graph IASO makes it possible to construct a knowledge graph that combines Chinese and Western medicine and to provide more comprehensive and effective medical knowledge to doctors and patients.
The traditional Chinese medicine case knowledge graph is built from clinical knowledge extracted from medical cases, so that a user can learn about the characteristic therapies of traditional Chinese medicine as well as the clinical manifestations, related therapies, related health care methods and the like of diseases (such as chronic gastritis). The English antibiotic medical knowledge graph IASO is an English medical knowledge graph developed in a human-machine collaborative manner from large-scale medical text data using natural language processing and text mining technology. It covers 507 infectious diseases and their treatment methods, 332 different infection sites, 936 system-related symptoms, 371 complications and other knowledge. In a medical knowledge graph, disease names, therapeutic drugs and symptom names are the most basic entities, and most of the knowledge can be fused by aligning only these three types of entities.
The above embodiments merely illustrate the preferred embodiments of the present invention and are not intended to limit its scope. Various modifications and improvements made by those skilled in the art to the technical solutions of the present invention, without departing from its design spirit, shall fall within the protection scope defined by the claims of the present invention.

Claims (9)

1. A cross-language knowledge graph alignment and fusion method, characterized by comprising the following steps:
Step S1, acquiring two knowledge graph networks of different languages, establishing a first-order sub-graph network for each of the two knowledge graph networks, extracting first-order sub-graph features of the entities based on the first-order sub-graph networks, and constructing first-order sub-graph feature matrices of the entities in the knowledge graph networks of different languages;
Step S2, constructing an alignment model based on a graph convolutional neural network (GCN), respectively acquiring structural feature matrices and attribute feature matrices of the entities in the two knowledge graph networks of different languages, inputting the structural feature matrices, the attribute feature matrices and the first-order sub-graph feature matrices of the entities into the trained alignment model, obtaining the structural feature embedded vector matrices, attribute feature embedded vector matrices and sub-graph feature embedded vector matrices of all entities in the knowledge graph networks of different languages, and concatenating the three embedded vector matrices;
the concatenation of the three embedded vector matrices of the structural features, attribute features and sub-graph features of the entities is obtained as shown in formula 1:
$$\left[H_s^{(l+1)}; H_a^{(l+1)}; H_{sgn}^{(l+1)}\right] = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} \left[H_s^{(l)} W_s^{(l)}; H_a^{(l)} W_a^{(l)}; H_{sgn}^{(l)} W_{sgn}^{(l)}\right]\right) \quad (1)$$
where $H_s^{(l+1)}$, $H_a^{(l+1)}$ and $H_{sgn}^{(l+1)}$ are respectively the structural feature embedded vector matrix, the attribute feature embedded vector matrix and the sub-graph feature embedded vector matrix of the knowledge graph network; $P$ is an adjacency matrix of size $n \times n$, and $n$ is the number of entities in the knowledge graph network; $\hat{A} = P + I$, where $I$ is an identity matrix; $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$; $H_s^{(l)}$, $H_a^{(l)}$ and $H_{sgn}^{(l)}$ are respectively the structural feature matrix, the attribute feature matrix and the first-order sub-graph feature matrix of the entities input to the $l$-th layer of the alignment model; $W_s^{(l)}$, $W_a^{(l)}$ and $W_{sgn}^{(l)}$ are respectively the weight matrices of the structural features, the attribute features and the sub-graph features of the entities in the $l$-th layer of the alignment model; $[\,;\,]$ denotes the concatenation of matrices; $\sigma$ is a nonlinear activation function;
Step S3, using two knowledge graph networks with different languages as a knowledge graph network to be aligned and a target knowledge graph network respectively, selecting an entity to be aligned from the knowledge graph networks to be aligned, and calculating the similarity of the embedded vectors between the entity to be aligned and each entity in the target knowledge graph network according to a scoring function;
Step S4, sorting the entities in the target knowledge graph network according to the similarity between the embedded vector of each entity in the target knowledge graph network and that of the entity to be aligned, and obtaining candidate equivalent entities of the entity to be aligned;
Step S5, fusing the knowledge graph networks of the two different languages according to the candidate equivalent entities.
2. The method for aligning and fusing cross-language knowledge graphs according to claim 1, wherein in the step S1, the extracting of the first-order sub-graph features specifically includes:
Step S1.1, detecting sub-graphs from the network structure of the knowledge graph;
Step S1.2, constructing a sub-graph network based on the detected sub-graphs;
Step S1.3, encoding the nodes established in the sub-graph network based on a one-hot encoding method to obtain the first-order sub-graph features of the entity.
3. The method for aligning and fusing cross-language knowledge graph according to claim 2, wherein in step S1.1, the sub-graph detected from the network structure of the original knowledge graph is a line.
4. The method for aligning and fusing cross-language knowledge graph according to claim 2, wherein in the step S1.2, the method for constructing the sub-graph network based on the detected sub-graph comprises: traversing all sub-graphs, judging whether the two sub-graphs share the same node or link in the knowledge graph network structure, if so, creating a connecting edge between the two sub-graphs to obtain a sub-graph network.
5. The method for aligning and fusing cross-language knowledge graph according to claim 2, wherein in the step S1.3, the method for encoding the nodes established in the sub-graph network is as follows: judging whether the entity in the knowledge graph network belongs to a certain sub-graph in the node set of the sub-graph network, if yes, marking the coding position of the corresponding sub-graph as 1, and if not, marking the coding position of the corresponding sub-graph as 0.
6. The method for aligning and fusing cross-language knowledge graphs according to claim 1, wherein in the step S3, the similarity of the embedded vectors between different entities is calculated according to a scoring function, as shown in equation 5:
wherein $D(e_i, e_j)$ represents the similarity between entities $e_i$ and $e_j$; $d_s$, $d_a$ and $d_{sgn}$ are respectively the dimensions of the three embedded vectors of the structural features, the attribute features and the sub-graph features; $\alpha$, $\beta$ and $\gamma$ are all hyperparameters, with $\alpha + \beta + \gamma = 1$.
7. The method for aligning and fusing knowledge-graph networks of different languages according to claim 1, wherein in step S5, the method for fusing knowledge-graph networks of two different languages according to candidate equivalent entities comprises:
And combining the entities and the relations in the knowledge-graph of two different languages according to the equivalent entities of the unaligned entities in the target knowledge-graph network to realize the fusion of the knowledge-graph networks of the two different languages.
8. The device for aligning and fusing the cross-language knowledge graph according to any one of claims 1 to 7, comprising a sub-graph feature extraction module, an entity alignment module and a knowledge graph fusion module which are connected in sequence;
The sub-graph feature extraction module is used for respectively converting two knowledge graph networks with different languages into a first-order sub-graph network, extracting first-order sub-graph features of the entity based on the first-order sub-graph network, and constructing a first-order sub-graph feature matrix of the entity in the knowledge graph networks with different languages;
The entity alignment module is used for constructing an alignment model based on GCN, respectively inputting the structural feature matrices, attribute feature matrices and first-order sub-graph feature matrices of the entities in the knowledge graph networks of two different languages into the trained alignment model to obtain embedded vector representations of the entities in the knowledge graph networks of the two different languages, calculating the similarity of the embedded vectors between the entity to be aligned and all entities in the target knowledge graph network, and obtaining candidate equivalent entities of the entity to be aligned; the concatenation of the three embedded vector matrices of the structural features, attribute features and sub-graph features of the entities is obtained as shown in formula 1:
$$\left[H_s^{(l+1)}; H_a^{(l+1)}; H_{sgn}^{(l+1)}\right] = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} \left[H_s^{(l)} W_s^{(l)}; H_a^{(l)} W_a^{(l)}; H_{sgn}^{(l)} W_{sgn}^{(l)}\right]\right) \quad (1)$$
where $H_s^{(l+1)}$, $H_a^{(l+1)}$ and $H_{sgn}^{(l+1)}$ are respectively the structural feature embedded vector matrix, the attribute feature embedded vector matrix and the sub-graph feature embedded vector matrix of the knowledge graph network; $P$ is an adjacency matrix of size $n \times n$, and $n$ is the number of entities in the knowledge graph network; $\hat{A} = P + I$, where $I$ is an identity matrix; $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$; $H_s^{(l)}$, $H_a^{(l)}$ and $H_{sgn}^{(l)}$ are respectively the structural feature matrix, the attribute feature matrix and the first-order sub-graph feature matrix of the entities input to the $l$-th layer of the alignment model; $W_s^{(l)}$, $W_a^{(l)}$ and $W_{sgn}^{(l)}$ are respectively the weight matrices of the structural features, the attribute features and the sub-graph features of the entities in the $l$-th layer of the alignment model; $[\,;\,]$ denotes the concatenation of matrices; $\sigma$ is a nonlinear activation function;
The knowledge graph fusion module is used for fusing the knowledge graph networks of the two different languages according to the candidate equivalent entities.
9. A storage medium storing a program for implementing the cross-language knowledge graph alignment and fusion method of any one of claims 1 to 7.
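For illustration only, the layer-wise propagation recited in claims 1 and 8 (a symmetrically normalised adjacency matrix with self-connections applied to the concatenated structural, attribute and sub-graph channels) can be sketched in NumPy. The sketch assumes ReLU as the nonlinear activation and column-wise concatenation. The function name, matrix sizes and random inputs are hypothetical and form no part of the claims.

```python
import numpy as np


def gcn_layer(P, H_s, H_a, H_sgn, W_s, W_a, W_sgn):
    """One propagation step over three feature channels that share one adjacency matrix."""
    n = P.shape[0]
    A_hat = P + np.eye(n)                     # adjacency matrix with self-connections
    degrees = A_hat.sum(axis=1)               # diagonal of the node degree matrix of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
    norm_adj = D_inv_sqrt @ A_hat @ D_inv_sqrt
    # Transform each channel with its own weights, then concatenate column-wise.
    Z = np.concatenate([H_s @ W_s, H_a @ W_a, H_sgn @ W_sgn], axis=1)
    return np.maximum(norm_adj @ Z, 0.0)      # ReLU chosen here as the activation


rng = np.random.default_rng(0)
P = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])  # path graph on 3 entities
H_s, W_s = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
H_a, W_a = rng.normal(size=(3, 5)), rng.normal(size=(5, 2))
H_sgn, W_sgn = rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
out = gcn_layer(P, H_s, H_a, H_sgn, W_s, W_a, W_sgn)
print(out.shape)  # (3, 7)
```

Because the weight matrices are separate per channel, the first two output columns depend only on the structural input, the next two only on the attribute input, and the last three only on the sub-graph input.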
CN202110241500.4A 2021-03-04 2021-03-04 Cross-language knowledge graph alignment and fusion method, device and storage medium Active CN113111657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110241500.4A CN113111657B (en) 2021-03-04 2021-03-04 Cross-language knowledge graph alignment and fusion method, device and storage medium

Publications (2)

Publication Number Publication Date
CN113111657A CN113111657A (en) 2021-07-13
CN113111657B true CN113111657B (en) 2024-05-03

Family

ID=76710295

Country Status (1)

Country Link
CN (1) CN113111657B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant