CN115831379A

CN115831379A - Knowledge graph complementing method and device, storage medium and electronic equipment

Info

Publication number: CN115831379A
Application number: CN202211486149.6A
Authority: CN
Inventors: 孙小婉; 蔡巍; 招一强; 张霞
Original assignee: Neusoft Corp; Shenyang Neusoft Intelligent Medical Technology Research Institute Co Ltd
Current assignee: Neusoft Corp; Shenyang Neusoft Intelligent Medical Technology Research Institute Co Ltd
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2023-03-21

Abstract

The disclosure relates to a knowledge graph completion method, a knowledge graph completion device, a storage medium and electronic equipment. The method comprises the following steps: determining, from a knowledge graph to be completed, a target entity relationship, at least one known triple including the target entity relationship, and a triple to be completed including the target entity relationship, wherein the triple to be completed includes a known entity and an entity to be completed; determining a vector representation of the target entity relationship from the at least one known triple; determining a vector representation of the entity to be completed according to the vector representation of the target entity relationship and the vector representation of the known entity; determining, from an entity vector data set, the top N candidate entity vectors most similar to the vector representation of the entity to be completed, wherein N is an integer greater than 1; and completing the triple to be completed according to the top N candidate entity vectors to obtain a completed knowledge graph. The method improves the efficiency of constructing knowledge graphs.

Description

Knowledge graph completion method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of knowledge graph technology, and in particular, to a knowledge graph completion method, apparatus, storage medium, and electronic device.

Background

With the rapid construction and development of hospital information systems, hospitals now hold large amounts of medical data, such as electronic medical records, examination indicators, surgical records, medical documents and medical images. To make these data easier to manage and use, they need to be fused, for example by integrating them into an easy-to-use knowledge graph.

In the related art, a medical knowledge graph is constructed from medical data so that the data can be used more conveniently. At present, such knowledge graphs are mainly constructed manually by medical domain experts, which is time-consuming and labor-intensive.

Disclosure of Invention

To solve these problems in the related art, the present disclosure provides a knowledge graph completion method, apparatus, storage medium, and electronic device.

In order to achieve the above object, a first aspect of the embodiments of the present disclosure provides a knowledge-graph completion method, including:

determining, from a knowledge graph to be completed, a target entity relationship, at least one known triple including the target entity relationship, and a triple to be completed including the target entity relationship, wherein the triple to be completed includes a known entity and an entity to be completed;

determining a vector representation of the target entity relationship from the at least one known triple;

determining a vector representation of the entity to be completed according to the vector representation of the target entity relationship and the vector representation of the known entity;

determining, from an entity vector data set, the top N candidate entity vectors most similar to the vector representation of the entity to be completed, wherein N is an integer greater than 1;

and completing the triple to be completed according to the top N candidate entity vectors to obtain a completed knowledge graph.
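The five steps above can be sketched end to end as follows. This is a minimal illustration, not the disclosed implementation: it assumes the entity vectors are already available as a plain dictionary (word2vec output in practice), uses the convention v(R) = v(head) - v(tail), averages the relation differences into a single vector instead of clustering them (equivalent to a single cluster, M = 1), skips the known entity itself when ranking, and always takes the most similar candidate. All entity names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
# (head, relation, tail); None marks the entity to be completed.
Triple = Tuple[Optional[str], str, Optional[str]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def complete(graph: List[Triple], relation: str, emb: Dict[str, Vector], n: int = 2):
    # Step 1: known triples and the triple to be completed for the target relation.
    rel_triples = [t for t in graph if t[1] == relation]
    known = [t for t in rel_triples if t[0] is not None and t[2] is not None]
    if not known:
        raise ValueError("at least one known triple is required")
    head, _, tail = next(t for t in rel_triples if t[0] is None or t[2] is None)
    # Step 2: v(R) = v(head) - v(tail), averaged over the known triples.
    diffs = [[a - b for a, b in zip(emb[k[0]], emb[k[2]])] for k in known]
    v_r = [sum(col) / len(diffs) for col in zip(*diffs)]
    # Step 3: known head -> v(tail) = v(head) - v(R); known tail -> v(head) = v(tail) + v(R).
    known_entity = head if head is not None else tail
    if head is not None:
        query = [a - r for a, r in zip(emb[head], v_r)]
    else:
        query = [a + r for a, r in zip(emb[tail], v_r)]
    # Step 4: rank all other entities by cosine similarity and keep the top N.
    ranked = sorted(
        ((cosine(query, v), name) for name, v in emb.items() if name != known_entity),
        reverse=True,
    )
    candidates = [name for _, name in ranked[:n]]
    # Step 5: fill the gap with the most similar candidate.
    best = candidates[0]
    completed = (head, relation, best) if head is not None else (best, relation, tail)
    return candidates, completed


emb = {
    "regimen_A": [1.0, 0.0, 1.0], "drug_A": [1.0, 0.0, 0.0],
    "regimen_B": [0.0, 1.0, 1.0], "drug_B": [0.0, 1.0, 0.0],
    "regimen_C": [1.0, 1.0, 1.0], "drug_C": [1.0, 1.0, 0.0],
}
graph = [
    ("regimen_A", "drug", "drug_A"),
    ("regimen_B", "drug", "drug_B"),
    ("regimen_C", "drug", None),
]
candidates, completed = complete(graph, "drug", emb, n=2)
print(candidates, completed)
```

With these toy vectors, both known triples give v(R) = [0, 0, 1], so the query for the missing tail is [1, 1, 0], which matches drug_C exactly.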

Optionally, completing the triple to be completed according to the top N candidate entity vectors includes:

displaying, to a user, the N candidate entities corresponding to the top N candidate entity vectors;

and in response to the user selecting a target entity from the N candidate entities, determining the target entity as the entity to be completed to obtain a completed triple.

Optionally, the entity vector data set is obtained by mapping a data source corresponding to the knowledge graph to be completed with word2vec word embedding;

the known triple includes a known head entity and a known tail entity, and determining the vector representation of the target entity relationship from one known triple includes:

determining the vector representation of the known head entity from the entity vector data set;

determining the vector representation of the known tail entity from the entity vector data set;

and calculating the difference between the vector representation of the known head entity and the vector representation of the known tail entity to obtain the vector representation of the target entity relationship.
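For a single known triple, the relation vector is a plain element-wise difference. A minimal sketch, assuming the entity vector data set is a dictionary from entity name to a list of floats; the function name, entity names and vector values are hypothetical toy data, not word2vec output.

```python
from typing import Dict, List

Vector = List[float]


def relation_vector(head: str, tail: str, entity_vectors: Dict[str, Vector]) -> Vector:
    """v(R) = v(head) - v(tail), both looked up in the entity vector data set."""
    v_head = entity_vectors[head]  # vector of the known head entity
    v_tail = entity_vectors[tail]  # vector of the known tail entity
    if len(v_head) != len(v_tail):
        raise ValueError("head and tail vectors must have the same dimension")
    return [h - t for h, t in zip(v_head, v_tail)]


# Hypothetical toy vectors; real ones would come from word2vec.
entity_vectors = {"head_A": [1.5, 0.25, 0.75], "tail_B": [0.5, 0.25, 0.25]}
v_r = relation_vector("head_A", "tail_B", entity_vectors)
print(v_r)  # [1.0, 0.0, 0.5]
```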

Optionally, the known triple includes a known head entity and a known tail entity, and determining the vector representation of the target entity relationship from a plurality of known triples includes:

for each known triple, calculating the difference between the vector representation of the known head entity and the vector representation of the known tail entity to obtain a first relationship vector representation;

clustering all the first relationship vector representations to obtain M clusters;

and for each cluster, determining the mean vector of the cluster as a vector representation of the target entity relationship, to obtain M vector representations of the target entity relationship.
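The disclosure does not name a clustering algorithm. As one possible choice, the sketch below uses k-means (an assumption) with a deterministic farthest-point initialisation, and returns the mean vector of each of the M clusters as one vector representation of the target entity relationship. The function names and toy vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans_relation_vectors(diffs: List[Vector], m: int, iters: int = 20) -> List[Vector]:
    """Cluster first relation vectors into m clusters; return each cluster's mean vector."""
    if not 1 <= m <= len(diffs):
        raise ValueError("m must be between 1 and the number of known triples")
    # Farthest-point initialisation: start from the first vector, then repeatedly
    # add the vector farthest from all centroids chosen so far.
    centroids = [diffs[0]]
    while len(centroids) < m:
        centroids.append(max(diffs, key=lambda d: min(sq_dist(d, c) for c in centroids)))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(m)]
        for d in diffs:
            nearest = min(range(m), key=lambda i: sq_dist(d, centroids[i]))
            clusters[nearest].append(d)
        # Keep the old centroid if a cluster happens to be empty.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids


# Hypothetical head-minus-tail differences from four known triples.
diffs = [[1.0, 0.0], [1.0, 0.5], [-1.0, 0.0], [-1.0, -0.5]]
relation_vectors = kmeans_relation_vectors(diffs, m=2)
print(relation_vectors)  # [[1.0, 0.25], [-1.0, -0.25]]
```

A library implementation of k-means (or another clustering method) could replace the hand-written loop; the hand-written version keeps the example dependency-free.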

Optionally, when the known entity is the head entity, determining the vector representation of the entity to be completed according to the vector representation of the target entity relationship and the vector representation of the known entity includes:

calculating the difference between the vector representation of the head entity and each vector representation of the target entity relationship to obtain M vector representations of the entity to be completed.

Optionally, when there are M vector representations of the entity to be completed, there are correspondingly M groups of top N candidate entity vectors, and completing the triple to be completed according to the top N candidate entity vectors includes:

for each group of top N candidate entity vectors, completing the triple to be completed according to that group to obtain M completed triples.
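A sketch of completing with M groups of candidates, assuming the head entity is known, cosine similarity is the similarity measure, and each group contributes its most similar candidate (the choice within a group could also be left to the user). Function names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query: Vector, entity_vectors: Dict[str, Vector], n: int) -> List[str]:
    """Names of the n entity vectors most similar to query, best first."""
    ranked = sorted(entity_vectors, key=lambda name: cosine(query, entity_vectors[name]), reverse=True)
    return ranked[:n]


def complete_per_group(known_head: str, relation: str, queries: List[Vector],
                       entity_vectors: Dict[str, Vector], n: int) -> List[Tuple[str, str, str]]:
    """One completed triple per query vector, i.e. M triples for M query vectors."""
    completed = []
    for query in queries:
        group = top_n(query, entity_vectors, n)  # one group of top-N candidates
        completed.append((known_head, relation, group[0]))  # assumption: take the group's best
    return completed


entity_vectors = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [0.7, 0.7]}
queries = [[1.0, 0.1], [0.1, 1.0]]  # M = 2 hypothetical vectors of the entity to be completed
triples = complete_per_group("h", "r", queries, entity_vectors, n=2)
print(triples)  # [('h', 'r', 'x'), ('h', 'r', 'y')]
```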

Optionally, when the known entity is the tail entity, determining the vector representation of the entity to be completed according to the vector representation of the target entity relationship and the vector representation of the known entity includes:

calculating the sum of the vector representation of the tail entity and each vector representation of the target entity relationship to obtain M vector representations of the entity to be completed.
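Both the head-known and tail-known cases reduce to one element-wise operation per relation vector under the convention v(R) = v(head) - v(tail): subtract when the head is known, add when the tail is known. A minimal sketch with a hypothetical function name and toy values:

```python
from typing import List

Vector = List[float]


def vectors_to_complete(known: Vector, relations: List[Vector], known_is_head: bool) -> List[Vector]:
    """One vector of the entity to be completed per relation vector (M in total).

    Convention v(R) = v(head) - v(tail):
      known head -> v(tail) = v(head) - v(R)
      known tail -> v(head) = v(tail) + v(R)
    """
    sign = -1.0 if known_is_head else 1.0
    return [[k + sign * r for k, r in zip(known, rel)] for rel in relations]


relations = [[1.0, 0.0], [0.0, 0.5]]  # M = 2 relation vectors, e.g. cluster means
print(vectors_to_complete([2.0, 1.0], relations, known_is_head=True))   # [[1.0, 1.0], [2.0, 0.5]]
print(vectors_to_complete([2.0, 1.0], relations, known_is_head=False))  # [[3.0, 1.0], [2.0, 1.5]]
```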

A second aspect of the embodiments of the present disclosure provides a knowledge graph completion device, including:

a first determining module configured to determine, from a knowledge graph to be completed, a target entity relationship, at least one known triple including the target entity relationship, and a triple to be completed including the target entity relationship, the triple to be completed including a known entity and an entity to be completed;

a second determining module configured to determine a vector representation of the target entity relationship from the at least one known triple;

a third determining module configured to determine a vector representation of the entity to be completed according to the vector representation of the target entity relationship and the vector representation of the known entity;

a selection module configured to determine, from an entity vector data set, the top N candidate entity vectors most similar to the vector representation of the entity to be completed, N being an integer greater than 1;

and a completion module configured to complete the triple to be completed according to the top N candidate entity vectors to obtain a completed knowledge graph.

Optionally, the completion module includes:

a presentation sub-module configured to present, to a user, the N candidate entities corresponding to the top N candidate entity vectors;

and an execution sub-module configured to, in response to the user selecting a target entity from the N candidate entities, determine the target entity as the entity to be completed to obtain a completed triple.

Optionally, the entity vector data set is obtained by mapping a data source corresponding to the knowledge graph to be completed with word2vec word embedding; the known triple includes a known head entity and a known tail entity;

the second determining module includes:

a first determining sub-module configured to determine the vector representation of the known head entity and the vector representation of the known tail entity from the entity vector data set;

a first calculation sub-module configured to calculate the difference between the vector representation of the known head entity and the vector representation of the known tail entity to obtain the vector representation of the target entity relationship.

Optionally, the second determining module includes:

a second calculation sub-module configured to calculate, for each known triple, the difference between the vector representation of the known head entity and the vector representation of the known tail entity in that triple, to obtain a first relationship vector representation;

a clustering sub-module configured to cluster all the first relationship vector representations to obtain M clusters;

a second determining sub-module configured to determine, for each cluster, the mean vector of the cluster as a vector representation of the target entity relationship, to obtain M vector representations of the target entity relationship.

Optionally, the third determining module includes:

a third calculation sub-module configured to, when the known entity is the head entity, calculate the difference between the vector representation of the head entity and each vector representation of the target entity relationship to obtain M vector representations of the entity to be completed.

Optionally, the completion module includes:

a completion sub-module configured to, when there are M vector representations of the entity to be completed and correspondingly M groups of top N candidate entity vectors, complete the triple to be completed according to each group of top N candidate entity vectors to obtain M completed triples.

Optionally, the third determining module includes:

a fourth calculation sub-module configured to, when the known entity is the tail entity, calculate the sum of the vector representation of the tail entity and each vector representation of the target entity relationship to obtain M vector representations of the entity to be completed.

A third aspect of the embodiments of the present disclosure provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.

A fourth aspect of the embodiments of the present disclosure provides an electronic device, including:

a memory having a computer program stored thereon;

a processor configured to execute the computer program in the memory to implement the steps of the method of the first aspect.

The technical solution above achieves at least the following beneficial effects:

The knowledge graph to be completed includes a target entity relationship, at least one known triple including the target entity relationship, and a triple to be completed including the target entity relationship, and these are first determined from the knowledge graph to be completed. A vector representation of the target entity relationship is then determined from the at least one known triple, and a vector representation of the entity to be completed is determined from the vector representation of the target entity relationship and the vector representation of the known entity. Finally, the top N candidate entity vectors most similar to the vector representation of the entity to be completed are determined from the entity vector data set, and the triple to be completed is completed according to these candidates, yielding a completed triple and hence a completed knowledge graph. With this method, only the target entity relationship, at least one known triple including it, and one entity of the triple to be completed need to be constructed manually; the knowledge graph completion method of the present disclosure then fills in the triple to be completed automatically, producing an automatically completed knowledge graph.
Compared with the related art, in which all the contents of a knowledge graph are constructed manually, the method of the present disclosure requires only part of the contents to be constructed manually, namely the entity relationship, a triple including the entity relationship, and one entity of the triple to be completed. It is therefore more efficient and requires less manual work.

Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

FIG. 1 is a flow chart illustrating a knowledge-graph completion method according to an exemplary embodiment of the present disclosure.

Fig. 2 is a block diagram illustrating a knowledge graph completion apparatus according to an exemplary embodiment of the present disclosure.

Fig. 3 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.

Fig. 4 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are intended only to illustrate and explain the present disclosure, not to limit it.

It should be noted that all acquisition of signals, information or data in the present disclosure is performed in compliance with the applicable data protection regulations of the country concerned and with authorization from the owner of the corresponding device.

As described in the Background, with the rapid construction and development of hospital information systems, hospitals now hold large amounts of medical data, such as electronic medical records, examination indicators, surgical records, medical documents and medical images. These data have significant research value and clinical reference value in medical research and clinical treatment. For example, gastric cancer is one of the most common malignant tumors and has a very high mortality rate. Histopathological diagnosis data from endoscopic biopsies are the basis for accurate diagnosis and treatment of gastric cancer, and systematic postoperative histopathological diagnosis data are the histopathological basis for determining the histological type of gastric cancer, comprehensively evaluating disease progression, judging the patient's prognosis (that is, predicting the likely outcome of the disease) and formulating an individualized treatment plan.

Different medical examination devices and medical information entry devices may run on different data systems, and different data systems may store data in different ways, so a hospital's massive medical data may have different data structure types. For example, pathological information on gastric cancer mainly includes pathological specimens, pathology reports and immunohistochemistry results, where pathological specimens are structured data, pathology reports are unstructured data, and immunohistochemistry results are semi-structured sequence information. In the present disclosure, medical data of different data types from different medical examination devices and medical information entry devices are referred to as multi-source heterogeneous data, and these devices are referred to as data sources.

Because these data have inconsistent structures, they are difficult to organize and apply in research and use. Therefore, to make use of this massive medical data, it must be fused, for example by integrating it into a knowledge graph with a unified data structure. Owing to capabilities such as semantic search, a knowledge graph makes the data easy to use.

In the related art, a medical knowledge graph is constructed from medical data so that the data can be used, and such knowledge graphs are mainly constructed manually by medical domain experts, which is time-consuming and labor-intensive.

In view of this, to solve the problems of time consumption and high cost of manually constructing knowledge graphs in the related art, the present disclosure provides a knowledge graph completion method, apparatus, storage medium and electronic device.

Before describing detailed embodiments of the technical solutions of the present disclosure, the following is a brief explanation of related terms involved in the embodiments of the present disclosure.

A knowledge graph (also known as knowledge domain visualization or a knowledge domain map) is a family of graphs that display the development process and structure of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interrelations among them.

A knowledge graph consists of three elements, namely a head entity, an entity relationship and a tail entity, expressed as a triple (head entity, entity relationship, tail entity).

Word2vec is a commonly used word embedding method. Its basic principle is to represent a word by its context, so that semantically similar words are mapped to identical or similar vector representations.

The vector representation of a text is the result of mapping that text with word2vec word embedding.

Clustering: in the embodiments of the present disclosure, clustering refers to the process of dividing a set of physical or abstract objects into classes of similar objects. A cluster generated by clustering is a set of data objects that are similar to the other objects in the same cluster and different from objects in other clusters.

In addition, it should be noted that the knowledge graph completion method of the present disclosure continuously improves a knowledge graph by completing and expanding an existing knowledge graph of a certain scale. When a knowledge graph is being constructed, the method can complete and expand the unfinished knowledge graph to obtain the constructed knowledge graph. When a knowledge graph is being updated, the method can complete and expand the knowledge graph to be updated to obtain a continuously improved and updated knowledge graph.

The following provides a detailed description of embodiments of the present disclosure.

The knowledge graph completion method is applicable to knowledge graph completion in any field. In the following embodiments, a gastric cancer pathology knowledge graph is mainly used as an example, and the specific triples are also practical examples from the field of gastric cancer pathology.

FIG. 1 is a flow chart illustrating a knowledge graph completion method according to an exemplary embodiment of the present disclosure. As shown in FIG. 1, the knowledge graph completion method includes the following steps:

S11, determining, from a knowledge graph to be completed, a target entity relationship, at least one known triple including the target entity relationship, and a triple to be completed including the target entity relationship, wherein the triple to be completed includes a known entity and an entity to be completed.

In the embodiments of the present disclosure, the knowledge graph to be completed includes a plurality of entity relationships, at least one known triple for each entity relationship, and a triple to be completed that includes one of these entity relationships. The purpose of the present disclosure is to complete, by analogy, a triple to be completed that includes the target entity relationship according to at least one known triple that includes the same relationship, thereby completing and expanding the knowledge graph.

S12, determining the vector representation of the target entity relationship according to the at least one known triple.

A triple includes a head entity, an entity relationship and a tail entity, and in general multiple triples may share the same entity relationship. For example, in the field of gastric cancer pathology, triple A may be (chemotherapy regimen SOX, drug, oxaliplatin) and triple B may be (chemotherapy regimen FLOT, drug, docetaxel). SOX denotes a regimen combining intravenous infusion of oxaliplatin with oral S-1 capsules; FLOT denotes a regimen of docetaxel, oxaliplatin, tetrahydrofolate and 5-fluorouracil repeated every 14 days. Triple A and triple B have the same entity relationship, namely "drug". In this case, the entity relationships of triple A and triple B should be semantically equivalent, that is, the same or similar in meaning.
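The triples in this example can be held in a small record type and grouped by relation, which is one way the known triples for a target entity relationship could be collected in step S11. The record type and helper are a hypothetical sketch, not part of the disclosure.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


triples = [
    Triple("chemotherapy regimen SOX", "drug", "oxaliplatin"),
    Triple("chemotherapy regimen FLOT", "drug", "docetaxel"),
]


def group_by_relation(items: List[Triple]) -> Dict[str, List[Triple]]:
    """Map each entity relationship to the triples that share it."""
    groups: Dict[str, List[Triple]] = defaultdict(list)
    for t in items:
        groups[t.relation].append(t)
    return dict(groups)


groups = group_by_relation(triples)
print(sorted(groups))  # ['drug']
```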

Since any text can be mapped to a multi-dimensional dense vector by word embedding, and an entity relationship can be characterized by its head entity and tail entity, the vector representation of an entity relationship can likewise be determined from the vector representations of the head entity and the tail entity. For triples that share the same entity relationship, the vector representations of that relationship should also be semantically equivalent, that is, the same or similar.

Accordingly, in the embodiments of the present disclosure, the vector representation of a target entity relationship can be determined from at least one known triple that includes the target entity relationship.

S13, determining the vector representation of the entity to be completed according to the vector representation of the target entity relationship and the vector representation of the known entity.

For a triple (head entity, entity relationship, tail entity), just as the entity relationship can be characterized by the head entity and the tail entity, the tail entity can be characterized by the head entity and the entity relationship, and the head entity can be characterized by the entity relationship and the tail entity. That is, any element of a triple can be characterized by the other two. The characterization may take any of the following forms:

vector representation of head entity + vector representation of entity relationship = vector representation of tail entity;

vector representation of head entity - vector representation of entity relationship = vector representation of tail entity;

vector representation of head entity + vector representation of tail entity = vector representation of entity relationship;

The present disclosure does not limit how one element of a triple is characterized by the other two.

On this basis, in the embodiments of the present disclosure, once the vector representation of the target entity relationship and the vector representation of the known entity in a triple to be completed are known, the vector representation of the entity to be completed in that triple can be determined.

In one embodiment, consider a known triple (A, R, B), where v(A) denotes the vector representation of entity A and v(B) denotes the vector representation of entity B. Assume the vector representation v(R) of the target entity relationship is determined from the known triple (A, R, B) as v(R) = v(A) - v(B).

Further, consider a triple to be completed (C, R, X), where X is the entity to be completed. From the vector representation of the target entity relationship, v(R) = v(A) - v(B), and the vector representation v(C) of the known entity in the triple to be completed, the vector representation v(X) of the entity to be completed is determined as: v(X) = v(C) - v(R) = v(C) - (v(A) - v(B)).

In another embodiment, consider a known triple (A, R, B), where v(A) denotes the vector representation of entity A and v(B) denotes the vector representation of entity B. Assume the vector representation v(R) of the target entity relationship is determined from the known triple (A, R, B) as v(R) = v(B) - v(A).

Further, consider a triple to be completed (C, R, X), where X is the entity to be completed. From the vector representation of the target entity relationship, v(R) = v(B) - v(A), and the vector representation v(C) of the known entity in the triple to be completed, the vector representation v(X) of the entity to be completed is determined as: v(X) = v(C) + v(R) = v(C) + (v(B) - v(A)).
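The two conventions are interchangeable for completion: both expand to v(X) = v(C) - v(A) + v(B). The arithmetic below checks this on hypothetical three-dimensional vectors; the helper names are illustrative only.

```python
from typing import List

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


v_A = [1.0, 2.0, 0.5]
v_B = [0.5, 1.0, 1.5]
v_C = [2.0, 0.0, 1.0]

# Convention 1: v(R) = v(A) - v(B), so v(X) = v(C) - v(R).
x1 = sub(v_C, sub(v_A, v_B))
# Convention 2: v(R) = v(B) - v(A), so v(X) = v(C) + v(R).
x2 = add(v_C, sub(v_B, v_A))
print(x1, x2)  # [1.5, -1.0, 2.0] [1.5, -1.0, 2.0]
```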

The following embodiments mainly use the characterization "vector representation of head entity - vector representation of entity relationship = vector representation of tail entity" as the example.

S14, determining, from the entity vector data set, the top N candidate entity vectors most similar to the vector representation of the entity to be completed, wherein N is an integer greater than 1.

For example, the entity vector data set includes a plurality of entity vector representations. The similarity, such as the cosine similarity, between each entity vector representation and the vector representation of the entity to be completed is calculated, and the N entity vector representations with the greatest similarity are determined as the candidate entity vectors.

For example, assume N is 2 and the entity vector data set includes entity vector representations a, b, c, d and e, whose cosine similarities to the vector representation v(X) of the entity to be completed are 0.2, 0.1, 0.5, 0.95 and 0.89, respectively. The 2 entity vectors most similar to v(X) are then d and e, so the candidate entity vectors are entity vector representations d and e.
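Step S14 can be sketched with cosine similarity and a partial sort. The first function scores every vector in the data set; the second picks the top N from precomputed scores, here reproducing the example above. The function names are hypothetical, and rejecting N < 2 mirrors the requirement that N be greater than 1.

```python
import heapq
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_from_scores(scores: Dict[str, float], n: int) -> List[str]:
    """Names of the n highest-scoring entries, best first."""
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    return heapq.nlargest(n, scores, key=scores.__getitem__)


def top_n_candidates(query: Vector, dataset: Dict[str, Vector], n: int) -> List[str]:
    """Score every entity vector against the query, then keep the top n."""
    scores = {name: cosine_similarity(query, vec) for name, vec in dataset.items()}
    return top_n_from_scores(scores, n)


# Similarities from the worked example above.
scores = {"a": 0.2, "b": 0.1, "c": 0.5, "d": 0.95, "e": 0.89}
print(top_n_from_scores(scores, 2))  # ['d', 'e']
```

`heapq.nlargest` avoids sorting the whole data set when N is much smaller than the number of entities; for very large data sets an approximate nearest-neighbour index would be the usual next step.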

To ensure the accuracy of completing the triple to be completed, N is set to an integer greater than 1 in the embodiments of the present disclosure. This increases the probability that the correct entity to be completed is among the candidate entity vectors, and thus improves the accuracy of completing the entity to be completed from these candidates.

And S15, completing the triple to be completed according to the first N candidate entity vectors to obtain a completed knowledge graph.

In one embodiment, the candidate entity vector with the greatest similarity to the vector representation of the entity to be supplemented in the candidate entity vectors may be used as the entity to be supplemented in the triplet to be supplemented, so as to obtain the supplemented triplet.

In another embodiment, the completing the triple to be completed according to the first N candidate entity vectors includes:

displaying N candidate entities corresponding to the first N candidate entity vectors to a user; and responding to the operation that the user selects a target entity from the N candidate entities, determining the target entity as the entity to be supplemented, and obtaining a supplemented triple.

Illustratively, the N candidate entities corresponding to the first N candidate entity vectors are presented to the user, who then only needs to pick one of them as the entity to be completed, which yields the completed triple. This is more efficient than having the user select the target entity from the massive number of entities corresponding to the entity vector data set. Moreover, because the final choice is made by the user, the accuracy of the target entity is ensured. This approach therefore achieves both high efficiency and high accuracy.
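Both completion variants (automatically taking the most similar candidate, or letting the user pick one of the N candidates) can be sketched as below. The sketch assumes a triple is a `(head, relation, tail)` tuple in which `None` marks the entity to be completed, and it models the user's selection as a hypothetical `choose` callback; in a real system that callback would be backed by a user interface.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A triple is (head, relation, tail); None marks the entity to be completed.
Triple = Tuple[Optional[str], str, Optional[str]]


def fill_triple(triple: Triple, entity: str) -> Triple:
    """Put entity into the missing head or tail slot of triple."""
    head, relation, tail = triple
    if head is None and tail is not None:
        return (entity, relation, tail)
    if tail is None and head is not None:
        return (head, relation, entity)
    raise ValueError("exactly one of head and tail must be None")


def complete_triple(
    triple: Triple,
    candidates: Sequence[Tuple[str, float]],
    choose: Optional[Callable[[List[str]], str]] = None,
) -> Triple:
    """Complete triple from (entity, similarity) candidates sorted best first.

    Without `choose`, the most similar candidate is used; with `choose`
    (a stand-in for the user's selection), the entity it returns is used.
    """
    names = [name for name, _ in candidates]
    if not names:
        raise ValueError("no candidate entities")
    target = names[0] if choose is None else choose(names)
    if target not in names:
        raise ValueError("the selected entity is not one of the candidates")
    return fill_triple(triple, target)
```

Restricting the callback's answer to the N candidates mirrors the idea that the user only chooses among the shortlisted entities rather than from the whole data set.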

With the above method, the target entity relationship, at least one known triple including the target entity relationship, and the triple to be completed including the target entity relationship are determined from the knowledge graph to be completed. A vector representation of the target entity relationship is determined from the at least one known triple, and the vector representation of the entity to be completed is determined from the vector representation of the target entity relationship and the vector representation of the known entity. The first N candidate entity vectors most similar to the vector representation of the entity to be completed are then determined from the entity vector data set, and the triple to be completed is completed according to these candidates to obtain a completed triple and, in turn, the completed knowledge graph. Only the target entity relationship, at least one known triple including it, and the triple to be completed including it need to be constructed manually; the completed triples, and thus the completed knowledge graph, are then obtained with the knowledge graph completion method of the present disclosure. This is more efficient than manually constructing the entire content of the knowledge graph, as in the related art.

In the embodiments of the present disclosure, the entity vector data set is obtained by mapping the data source corresponding to the knowledge graph to be completed through word2vec-based word embedding.

The data source corresponding to the knowledge graph to be completed includes the sources from which the known triples, entity relationships and triples to be completed are extracted, such as medical guidelines, papers and textbooks in the field to which the knowledge graph to be completed belongs, as well as clinical medical data of hospitals.

Illustratively, in the field of gastric cancer pathology, data sources corresponding to a knowledge graph to be supplemented include gastric cancer pathology specimens, gastric cancer pathology reports, gastric cancer immunohistochemistry, and the like.

A word2vec model is trained on the data source corresponding to the knowledge graph to be completed to obtain a trained word2vec model. The entities in that data source are then input into the trained word2vec model to obtain the entity vector representation of each entity, and these representations together form the entity vector data set.
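A minimal sketch of assembling the entity vector data set once a trained model is available is shown below. The `embed` callable is a hypothetical stand-in for the trained word2vec model; with gensim 4.x it might, for example, wrap the `model.wv` vectors of a `gensim.models.Word2Vec` trained on tokenized sentences from the data source. Skipping entities the model cannot embed is an assumption of this sketch, not something the disclosure specifies.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence

# Hypothetical stand-in for the trained word2vec model: returns a vector,
# or None when the model has no vector for the entity.
Embed = Callable[[str], Optional[Sequence[float]]]


def build_entity_vector_dataset(entities: Iterable[str], embed: Embed) -> Dict[str, List[float]]:
    """Map each distinct entity from the data source to its vector."""
    dataset: Dict[str, List[float]] = {}
    for entity in entities:
        if entity in dataset:
            continue  # each entity only needs to be embedded once
        vector = embed(entity)
        if vector is None:
            continue  # assumption: skip entities the model cannot embed
        dataset[entity] = [float(x) for x in vector]
    return dataset
```

For quick experiments, the `get` method of a plain dictionary of precomputed vectors can play the role of `embed`.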

In one embodiment, the known triple includes a known head entity and a known tail entity, and determining the vector representation of the target entity relationship from one known triple includes:

determining a vector representation corresponding to the known head entity from the entity vector data set; determining a vector representation corresponding to the known tail entity from the entity vector data set; and calculating the difference between the vector representation of the known head entity and the vector representation of the known tail entity to obtain the vector representation of the target entity relationship.

If the entity vector data set contains vector representations corresponding to the known head entity and the known tail entity, the vector representation corresponding to the known head entity and the vector representation corresponding to the known tail entity can both be determined from the entity vector data set. The difference between them is then calculated to obtain the vector representation of the target entity relationship; for example, the vector representation of the known tail entity is subtracted from the vector representation of the known head entity.

In another embodiment, where the correspondence between the vector representations in the entity vector data set and the entities is not known, the known triple includes a known head entity and a known tail entity, and determining the vector representation of the target entity relationship from one known triple includes:

inputting the known head entity into the trained word2vec model to obtain the vector representation of the known head entity; inputting the known tail entity into the trained word2vec model to obtain the vector representation of the known tail entity; and calculating the difference between the two to obtain the vector representation of the target entity relationship, for example, by subtracting the vector representation of the known tail entity from the vector representation of the known head entity.
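The two embodiments above differ only in where the entity vectors come from. The sketch below assumes the data set is a dictionary and treats the trained model as an optional, hypothetical `embed` callable used as a fallback. It computes the relation vector as head minus tail, which is the convention consistent with the formulas for V(X) given later in this document (V(C) minus R for a known head entity, V(C) plus R for a known tail entity).

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for querying the trained word2vec model directly.
Embed = Callable[[str], Sequence[float]]


def entity_vector(
    entity: str, dataset: Dict[str, Sequence[float]], embed: Optional[Embed] = None
) -> List[float]:
    """Look the entity up in the data set, falling back to the model if given."""
    if entity in dataset:
        return list(dataset[entity])
    if embed is None:
        raise KeyError(f"no vector for entity {entity!r}")
    return list(embed(entity))


def relation_vector(
    head: str, tail: str, dataset: Dict[str, Sequence[float]], embed: Optional[Embed] = None
) -> List[float]:
    """Relation vector of one known triple: V(head) - V(tail)."""
    h = entity_vector(head, dataset, embed)
    t = entity_vector(tail, dataset, embed)
    if len(h) != len(t):
        raise ValueError("head and tail vectors differ in dimension")
    return [a - b for a, b in zip(h, t)]
```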

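In another embodiment, where there are multiple known triples, each including a known head entity and a known tail entity, determining the vector representation of the target entity relationship from the multiple known triples includes: calculating, for each known triple, the difference between the vector representation of the known head entity and the vector representation of the known tail entity to obtain a first relation vector representation; clustering all the first relation vector representations to obtain M clusters; and determining the mean vector of each cluster as a vector representation of the target entity relationship, thereby obtaining M vector representations of the target entity relationship.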

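In some embodiments, if a cluster includes k known triples and the j-th known triple is denoted $(A_j, R, B_j)$, the mean vector of the cluster is calculated as $R_v = \frac{1}{k}\sum_{j=1}^{k}\bigl(V(A_j) - V(B_j)\bigr)$, where $V(\cdot)$ denotes the vector representation of an entity.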

It will be appreciated that, because the same drug may be used in different treatment regimens, different triples may share the same entity relationship and the same head entity, or the same entity relationship and the same tail entity. For example, in the field of gastric cancer pathology, the triple (chemotherapy regimen SOX, drug, oxaliplatin) and the triple (chemotherapy regimen FLOT, drug, oxaliplatin) share the entity relationship "drug" and the tail entity "oxaliplatin". Consequently, the entity to be completed in a triple to be completed may also correspond to multiple entity types. Based on this, the present disclosure proposes determining multiple vector representations of the target entity relationship from multiple known triples, and then determining multiple entities to be completed.

By way of example, assume that there are known triples A, B, C, D, E and F. For known triple A, the difference between the vector representation of its known head entity and that of its known tail entity is calculated to obtain the first relation vector representation $R_A$. Similarly, the first relation vector representations $R_B$, $R_C$, $R_D$, $R_E$ and $R_F$ corresponding to known triples B, C, D, E and F are determined. All the first relation vector representations $R_A$, $R_B$, $R_C$, $R_D$, $R_E$ and $R_F$ are then clustered, for example with the K-means algorithm from the related art, to which reference may be made for the specific clustering principle. Suppose the clustering yields two clusters, one containing $R_A$, $R_C$ and $R_E$ and the other containing $R_B$, $R_D$ and $R_F$. The mean vector $R_{v1} = (R_A + R_C + R_E)/3$ is then determined as one vector representation of the target entity relationship, and the mean vector $R_{v2} = (R_B + R_D + R_F)/3$ as another.
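The clustering step can be sketched with a plain Lloyd-style K-means in pure Python. This is an illustration only: the initialization (the first k vectors), the iteration cap and the function names are assumptions, and a library implementation such as scikit-learn's `KMeans` would normally be preferred. Each returned mean vector plays the role of one relation vector such as R_v1 or R_v2; empty clusters are dropped, so fewer than k vectors may be returned.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def first_relation_vectors(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> List[Vector]:
    """One first relation vector per known triple: V(head) - V(tail)."""
    return [[h - t for h, t in zip(head, tail)] for head, tail in pairs]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_relation_vectors(vectors: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[Vector]:
    """Cluster relation vectors with K-means and return each cluster's mean vector."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]  # naive initialization: first k vectors
    labels = [0] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            new_centroids.append(_mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # M = number of non-empty clusters; each mean is one relation vector.
    means = []
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        if members:
            means.append(_mean(members))
    return means
```

With six relation vectors arranged as in the example (A, C, E close together and B, D, F close together), two clusters come back, and their means correspond to R_v1 and R_v2.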

That is, when there are multiple known triples, one or more vector representations of the target entity relationship may be determined from them; in the embodiments of the present disclosure, M is therefore an integer greater than or equal to 1.

After determining the vector representation of the target entity relationship, a vector representation of the entity to be complemented may be determined from the vector representation of the target entity relationship and the vector representation of the known entity.

For example, assume that the known entity C is the head entity, that the triple to be completed is (C, R, X), and that the two vector representations of the target entity relationship are $R_{v1} = (R_A + R_C + R_E)/3$ and $R_{v2} = (R_B + R_D + R_F)/3$. Then two vector representations of the entity to be completed can be calculated: $V_1(X) = V(C) - R_{v1}$ and $V_2(X) = V(C) - R_{v2}$.
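Computing the vectors V_i(X) from the known entity and the M relation vectors is a per-component subtraction when the known entity is the head entity, and an addition in the tail-entity embodiment described in this document. A minimal sketch, with a hypothetical function name and vectors represented as Python lists:

```python
from typing import List, Sequence


def entity_query_vectors(
    known_vector: Sequence[float],
    relation_vectors: Sequence[Sequence[float]],
    known_is_head: bool,
) -> List[List[float]]:
    """Return one vector V_i(X) per relation vector R_vi.

    With R = V(head) - V(tail):
      known head C, missing tail X: V_i(X) = V(C) - R_vi
      known tail C, missing head X: V_i(X) = V(C) + R_vi
    """
    sign = -1.0 if known_is_head else 1.0
    queries = []
    for r in relation_vectors:
        if len(r) != len(known_vector):
            raise ValueError("dimension mismatch")
        queries.append([c + sign * x for c, x in zip(known_vector, r)])
    return queries
```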

Optionally, when there are M vector representations of the entity to be completed, there are correspondingly M groups of first N candidate entity vectors, and completing the triple to be completed according to the first N candidate entity vectors includes: for each group of first N candidate entity vectors, completing the triple to be completed according to that group, so as to obtain M completed triples.

For example, where the vector representations of the entity to be completed are $V_1(X) = V(C) - R_{v1}$ and $V_2(X) = V(C) - R_{v2}$, the first N candidate entity vectors most similar to $V_1(X)$ are determined from the entity vector data set to obtain one group of first N candidate entity vectors $G_1$, and the first N candidate entity vectors most similar to $V_2(X)$ are determined from the entity vector data set to obtain another group $G_2$.

Then, the triple to be completed is completed according to the first N candidate entity vectors $G_1$ to obtain the completed triple $T_1$, and according to the first N candidate entity vectors $G_2$ to obtain the completed triple $T_2$. The number of completed triples is thus equal to the number of vector representations of the target entity relationship.
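The per-group completion can be sketched as one top-N search per vector V_i(X), each yielding a group G_i and a completed triple T_i. The sketch assumes the automatic variant that takes the most similar candidate of each group, an in-memory dictionary as the entity vector data set, and `None` marking the missing entity; all names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

# (head, relation, tail) with None marking the entity to be completed.
Triple = Tuple[Optional[str], str, Optional[str]]


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def complete_per_group(
    triple: Triple,
    query_vectors: Sequence[Sequence[float]],
    entity_vectors: Dict[str, Sequence[float]],
    n: int,
) -> List[Tuple[List[str], Triple]]:
    """For each V_i(X), build group G_i (top-n entities) and completed triple T_i."""
    head, relation, tail = triple
    if (head is None) == (tail is None):
        raise ValueError("exactly one of head and tail must be None")
    if not entity_vectors:
        raise ValueError("the entity vector data set is empty")
    results = []
    for query in query_vectors:
        group = heapq.nlargest(n, entity_vectors, key=lambda name: _cosine(query, entity_vectors[name]))
        best = group[0]  # automatic variant: most similar candidate in G_i
        completed = (best, relation, tail) if head is None else (head, relation, best)
        results.append((group, completed))
    return results
```

As in the text, M query vectors yield M groups and M completed triples.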

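Optionally, when the known entity is the tail entity, determining the vector representation of the entity to be completed according to the vector representation of the target entity relationship and the vector representation of the known entity includes: adding each vector representation of the target entity relationship to the vector representation of the known entity to obtain a vector representation of the entity to be completed.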

For example, assume that the known entity C is the tail entity, that the triple to be completed is (X, R, C), and that the two vector representations of the target entity relationship are $R_{v1} = (R_A + R_C + R_E)/3$ and $R_{v2} = (R_B + R_D + R_F)/3$. Then two vector representations of the entity to be completed can be calculated: $V_1(X) = V(C) + R_{v1}$ and $V_2(X) = V(C) + R_{v2}$.
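The opposite signs in the head-entity and tail-entity cases both follow from defining the relation vector as the head-entity vector minus the tail-entity vector; with M relation vectors, R is replaced by each $R_{vi}$ in turn:

```latex
R = V(\mathrm{head}) - V(\mathrm{tail})
\quad\Longrightarrow\quad
\begin{cases}
V(X) = V(C) - R, & \text{known head } C,\ \text{missing tail } X,\\
V(X) = V(C) + R, & \text{known tail } C,\ \text{missing head } X.
\end{cases}
```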

After the vector representations of the entity to be completed are determined, the corresponding number of completed triples can be obtained. The implementation follows the same principle as the embodiment in which the known entity is the head entity and is not repeated here.

With the method of the present disclosure, the entity to be completed in a triple to be completed can be predicted effectively, so that an existing knowledge graph can be completed and expanded. This improves efficiency and accuracy and reduces the difficulty of work when constructing and updating knowledge graphs.

Fig. 2 is a block diagram of a knowledge graph completion apparatus according to an exemplary embodiment of the present disclosure. As shown in Fig. 2, the knowledge graph completion apparatus 200 includes:

a first determining module 210, configured to determine, from a knowledge graph to be completed, a target entity relationship, at least one known triple including the target entity relationship, and a triple to be completed including the target entity relationship, where the triple to be completed includes a known entity and an entity to be completed;

a second determining module 220 configured to determine a vector representation of the target entity relationship from the at least one known triplet;

a third determining module 230 configured to determine a vector representation of the entity to be complemented according to the vector representation of the target entity relationship and the vector representation of the known entity;

a selecting module 240 configured to determine, from the entity vector data set, the first N candidate entity vectors having the greatest similarity to the vector representation of the entity to be complemented, where N is an integer greater than 1;

a completion module 250 configured to complete the triple to be completed according to the first N candidate entity vectors to obtain a completed knowledge graph.
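For illustration only, the five modules of apparatus 200 can be mirrored by the methods of a single class, as sketched below. The class and method names are hypothetical; the sketch assumes M = 1 (a single mean relation vector instead of clustering), takes the target entity relationship as an input rather than determining it, and uses the automatic top-1 completion variant.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# (head, relation, tail); None marks the entity to be completed.
Triple = Tuple[Optional[str], str, Optional[str]]


def _cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


class KnowledgeGraphCompleter:
    """Hypothetical sketch mirroring modules 210-250 of apparatus 200 (M = 1)."""

    def __init__(self, entity_vectors: Dict[str, Vector], n: int = 5):
        if n <= 1:
            raise ValueError("N must be an integer greater than 1")
        self.entity_vectors = entity_vectors
        self.n = n

    def select_triples(self, triples: List[Triple], relation: str) -> Tuple[List[Triple], List[Triple]]:
        """First determining module 210: known triples and triples to be completed."""
        same = [t for t in triples if t[1] == relation]
        known = [t for t in same if t[0] is not None and t[2] is not None]
        pending = [t for t in same if (t[0] is None) != (t[2] is None)]
        return known, pending

    def relation_vector(self, known: List[Triple]) -> List[float]:
        """Second determining module 220: mean of V(head) - V(tail) over known triples."""
        if not known:
            raise ValueError("at least one known triple is required")
        diffs = [
            [h - t for h, t in zip(self.entity_vectors[head], self.entity_vectors[tail])]
            for head, _, tail in known
        ]
        return [sum(col) / len(diffs) for col in zip(*diffs)]

    def query_vector(self, triple: Triple, rel: Vector) -> List[float]:
        """Third determining module 230: V(X) = V(C) - R or V(C) + R."""
        head, _, tail = triple
        if head is not None:
            return [c - r for c, r in zip(self.entity_vectors[head], rel)]
        return [c + r for c, r in zip(self.entity_vectors[tail], rel)]

    def top_n(self, query: Vector) -> List[str]:
        """Selecting module 240: the N entities most similar to the query."""
        ranked = sorted(self.entity_vectors, key=lambda e: _cosine(query, self.entity_vectors[e]), reverse=True)
        return ranked[: self.n]

    def complete(self, triples: List[Triple], relation: str) -> List[Triple]:
        """Completion module 250 (automatic variant: take the top candidate)."""
        known, pending = self.select_triples(triples, relation)
        rel = self.relation_vector(known)
        completed = []
        for head, r, tail in pending:
            best = self.top_n(self.query_vector((head, r, tail), rel))[0]
            completed.append((best, r, tail) if head is None else (head, r, best))
        return completed
```

Keeping each module as a separate method makes it straightforward to swap in clustering for module 220 or a user-driven choice for module 250.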

With the above apparatus, the target entity relationship, at least one known triple including the target entity relationship, and the triple to be completed including the target entity relationship are determined from the knowledge graph to be completed; a vector representation of the target entity relationship is determined from the at least one known triple; the vector representation of the entity to be completed is determined from the vector representation of the target entity relationship and the vector representation of the known entity; and the first N candidate entity vectors most similar to the vector representation of the entity to be completed are determined from the entity vector data set and used to complete the triple to be completed, yielding a completed triple and thus a completed knowledge graph. Only the target entity relationship, at least one known triple including it, and the triple to be completed including it need to be constructed manually, which is more efficient than manually constructing the entire content of the knowledge graph, as in the related art.

Optionally, the completion module 250 includes:

The second determining module 220 includes:

Optionally, the second determining module 220 includes:

Optionally, the third determining module 230 includes:

Optionally, the completion module 250 includes:

Optionally, the third determining module 230 includes:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 3 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure. As shown in fig. 3, the electronic device 700 may include: a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.

The processor 701 is configured to control the overall operation of the electronic device 700 to complete all or part of the steps of the above knowledge graph completion method. The memory 702 is configured to store various types of data to support operation of the electronic device 700; such data may include, for example, instructions of any application or method operating on the electronic device 700 and application-related data such as contact data, sent and received messages, pictures, audio and video. The memory 702 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signal may be further stored in the memory 702 or transmitted through the communication component 705. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse or buttons, which may be virtual or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and the like.

In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above knowledge graph completion method.

In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the above-described knowledge-graph completion method is also provided. For example, the computer readable storage medium may be the memory 702 described above comprising program instructions executable by the processor 701 of the electronic device 700 to perform the knowledge-graph completion method described above.

Fig. 4 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure. For example, the electronic device 1900 in fig. 4 may be provided as a server. Referring to fig. 4, electronic device 1900 includes a processor 1922, which can be one or more in number, and memory 1932 for storing computer programs executable by processor 1922. The computer program stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processor 1922 may be configured to execute the computer program to perform the above-described knowledge-graph completion method.

Additionally, the electronic device 1900 may also include a power component 1926 and a communication component 1950, the power component 1926 may be configured to perform power management for the electronic device 1900, and the communication component 1950 may be configured to enable communication for the electronic device 1900, e.g., wired or wireless communication. In addition, the electronic device 1900 may also include input/output (I/O) interfaces 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932.

In another exemplary embodiment, a computer-readable storage medium is also provided, comprising program instructions which, when executed by a processor, implement the steps of the above knowledge graph completion method. For example, the non-transitory computer-readable storage medium may be the memory 1932 described above, which includes program instructions executable by the processor 1922 of the electronic device 1900 to perform the knowledge graph completion method described above.

In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned knowledge-graph complementing method when executed by the programmable apparatus.

The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.

It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.

In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims

1. A method of knowledge-graph completion, the method comprising:

2. The method of claim 1, wherein the completing the triple to be completed according to the first N candidate entity vectors comprises:

3. The method according to claim 1 or 2, wherein the entity vector data set is obtained by mapping a data source corresponding to the knowledge graph to be completed through word2vec-based word embedding;

4. The method of claim 1 or 2, wherein the known triples include a known head entity and a known tail entity, and wherein determining the vector representation of the target entity relationship from the plurality of known triples comprises:

calculating the difference value between the vector representation of the known head entity and the vector representation of the known tail entity in each known triple to obtain a first relation vector representation;

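clustering all the first relation vector representations to obtain M clusters; and

determining the mean vector of each cluster as the vector representation of the target entity relationship to obtain M vector representations of the target entity relationship.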

5. The method of claim 4, wherein the known entity is a head entity, and the determining the vector representation of the entity to be complemented according to the vector representation of the target entity relationship and the vector representation of the known entity comprises:

6. The method according to claim 5, wherein, when the number of vector representations of the entity to be completed is M, there are correspondingly M groups of the first N candidate entity vectors, and the completing the triple to be completed according to the first N candidate entity vectors comprises:

for each group of the first N candidate entity vectors, completing the triple to be completed according to that group, so as to obtain M completed triples.

7. The method of claim 4, wherein the known entity is a tail entity, and the determining the vector representation of the entity to be complemented according to the vector representation of the target entity relationship and the vector representation of the known entity comprises:

8. A knowledge graph complementing apparatus, comprising:

9. A non-transitory computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, performs the steps of the method of any one of claims 1 to 7.

10. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 7.