WO2022261840A1 - Method and apparatus for missing link prediction for knowledge graph - Google Patents


Info

Publication number
WO2022261840A1
WO2022261840A1 (PCT/CN2021/100199, CN2021100199W)
Authority
WO
WIPO (PCT)
Prior art keywords
node
user
knowledge graph
missing link
displaying
Prior art date
Application number
PCT/CN2021/100199
Other languages
French (fr)
Inventor
Shi Xia Liu
Wei Hao WANG
Daniel Schneegass
Xiao Liang
Johannes Kehrer
Sebastian-Philipp Brandt
Original Assignee
Siemens Aktiengesellschaft
Siemens Ltd., China
Priority date
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft and Siemens Ltd., China
Priority to PCT/CN2021/100199 (WO2022261840A1)
Priority to CN202180097902.XA (CN117651942A)
Publication of WO2022261840A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9032Query formulation
    • G06F16/90332Natural language query formulation or dialogue systems

Definitions

  • when the number of nodes in the knowledge graph is not large, a global manner can be used to calculate the DOI of all nodes, which can improve the accuracy of the subgraph extraction result.
  • Step 1) can be executed iteratively until the size of the set F reaches the number S or L is empty.
  • a clustering algorithm is presented to reduce the DOI calculation of the neighbors of high degree nodes, which can achieve the effect of approximate acceleration.
  • the k-means algorithm can be used for clustering (John A. Hartigan and Manchek A. Wong. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1): 100–108, 1979).
  • a degree threshold T can be set. When the degree of a node is higher than T, the nodes under each cluster of the node can be arranged in descending order of degree and the top-S nodes retained.
  • Step 1) can be executed iteratively until the size of the set F reaches the number S or L is empty.
  • the time complexity to calculate the DOI of the neighbors of high-degree nodes is reduced to O(S), so that the total time complexity of the algorithm reaches O(S² log S), which can meet the requirements of real-time computing.
  • a “focus+context” visualization technique can be adopted. After getting the extracted subgraph, or based on the original whole knowledge graph, sampling (such as random sampling) can be used to sample the links to reduce the visual clutter.
  • the processing module 111 can be configured to: take the first node or the second node as a current focal node and take neighbors of the current focal node on which the user’s interest is higher than a second pre-defined threshold as centroid nodes, generate an initial layout by force-directed layout algorithm based on the current focal node and the centroid nodes while making the current focal node as close to the center as possible.
  • the processing module 111 can repeat following steps to generate a final layout based on the initial layout until convergence:
  • the display module 112 can be further configured to display the final layout.
  • the currently searched node (z, the first node) or clicked node (y, the second node) can be taken as the focal node n_f; the neighbors of the focal node whose degree is higher than 1 can be taken as centroid nodes n_c, and the neighbors of each centroid node (except n_f) as sub-nodes n_s.
  • the layout algorithm to generate the final layout can be as follows:
  • Step 2) can be repeated until convergence is reached.
  • FIG. 7A~FIG. 7C depict the final layout generation process, wherein FIG. 7A shows how to use a force-directed layout to generate an initial layout and select several nodes by degree as centroid nodes (in bold-edged rectangles).
  • FIG. 7B shows how to calculate Voronoi diagram according to the given centroid nodes.
  • FIG. 7C shows how to iterate until convergence is reached.
  • FIG. 5 shows an example of an instance view for a knowledge graph.
  • centroid nodes n_c can spread across the screen and all sub-nodes n_s can stay inside their Voronoi cells, which makes unimportant links as short as possible and thus reduces link crossings and visual clutter.
  • the display module 112 can be further configured to execute at least one of following steps:
  • icons can be used to represent concept (such as university, country and city) of a node and qualitative colors can be used to represent different relationship types, and a metaphor like a tail can be used to represent the number of unshown neighbors.
  • a dotted line can be used to represent a missing link and sequential colors can be used to represent the confidence of the missing line. Users can clearly distinguish the three different concepts of country, city, and university from the figure, as well as three different relationship types.
  • the concept view can display information from a higher level than entity (node) level, which can help users to understand the concept hierarchy in an instance view.
  • entity node
  • since the concept hierarchy structure of all concepts can be very large, in the present disclosure only the part of interest is displayed.
  • the processing module 111 can be further configured to: find first concepts in the final layout, find a lowest common ancestor between each concept pair in the first concepts, and find children of each lowest common ancestor, then generate a concept graph including all found concepts and the first concepts.
  • the display module 112 can be further configured to display the concept graph.
  • a bubble treemap (Jochen Görtler, Christoph Schulz, Daniel Weiskopf, and Oliver Deussen. Bubble treemaps for uncertainty visualization. IEEE Transactions on Visualization and Computer Graphics, 24(1): 719–728, 2018) can be used to show the hierarchy structure of the concept graph.
  • the corresponding concept in the concept graph view can also be highlighted, and vice versa.
  • the search view will recommend some options for users to choose from. If users choose one option, the instance view will show the search result.
  • a prefix tree (trie) can be adopted as the data structure to store the names of the nodes.
  • the time complexity of each query is O(m), where m is the length of the input string, which meets the needs of real-time interaction.
  • at most top-5 degree nodes are shown here.
  • users can filter out nodes by both numerical (FIG. 11A) and categorical (FIG. 11B) attributes in the filter view.
  • FIG. 11C shows the summary of applied rules
  • the processing module 111, the displaying module 112 and the interaction module 113 are described above as software modules of the knowledge graph processing program 11. They can also be implemented via hardware, such as ASIC chips, and can be integrated into one chip or separately implemented and electrically connected.
  • the architecture shown in FIG. 2 is merely exemplary and is used to explain the exemplary method 100 shown in FIG. 3.
  • One exemplary method 100 according to the present disclosure includes steps shown in FIG. 3.
  • missing links can be predicted and added in a knowledge graph:
  • the step S102 measuring effect on the first knowledge graph by adding the missing link can include: calculating at least one of the following aspects to measure the effect of adding the missing link in the first knowledge graph: PageRank, betweenness and closeness; the step S103 displaying the measured effect to a user can include: for each aspect, displaying degree of effect according to measured value of the aspect.
  • the method 100 can further include: for each missing link,
  • the step S104 receiving the user’s decision on whether to add the missing link based on the measured effect can include: receiving the user’s decision on whether to add the at least one missing link based on the measured effect and the paths.
  • - S110 generating a second knowledge graph including the first node, the second node and at least one third node in the first knowledge graph, wherein the user’s interest on each third node is higher than a first pre-defined threshold and the user’s interest on a third node is calculated based on the relation of the third node to the first node and the second node, the tighter the relation, the more interest on the third node.
  • Steps S114 ⁇ S115 can be repeated to generate a final layout based on the initial layout until convergence:
  • step S116 the final layout can be displayed.
  • step S116 displaying the final layout can include at least one of following sub steps:
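The prefix-tree (trie) name search described in the bullets above, with each query in O(m) for an input of length m, can be sketched as follows. This is an illustrative assumption of one possible implementation, not the patent's code; the node names are made up, and a real system would additionally rank suggestions by node degree (as noted above, at most the top-5 degree nodes are shown).

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.names = []          # node names whose prefix passes through this trie node

class Trie:
    def __init__(self, names):
        self.root = TrieNode()
        for name in names:
            node = self.root
            for ch in name.lower():
                node = node.children.setdefault(ch, TrieNode())
                node.names.append(name)

    def suggest(self, prefix, k=5):
        """Walk the trie in O(len(prefix)) steps and return up to k matching names."""
        node = self.root
        for ch in prefix.lower():
            if ch not in node.children:
                return []
            node = node.children[ch]
        return node.names[:k]

# hypothetical knowledge-graph node names
trie = Trie(["Munich", "Madrid", "Manchester", "Berlin"])
suggestions = trie.suggest("ma")  # -> ['Madrid', 'Manchester']
```

Storing the matching names on every trie node trades memory for query speed: the candidate list is ready the moment the prefix walk ends, which is what makes real-time autocomplete feasible.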

Abstract

A method, apparatus, system and computer-readable medium for missing link prediction for a knowledge graph are presented. A method (100) can include the following steps: predicting (S101) at least one missing link in a first knowledge graph; for each missing link, measuring (S102) the effect on the first knowledge graph of adding the missing link, displaying (S103) the measured effect to a user, receiving (S104) the user's decision on whether to add the missing link based on the measured effect, and processing (S105) the first knowledge graph according to the user's decision. With the solution provided, the data quality of a knowledge graph can be improved.

Description

[Title established by the ISA under Rule 37.2] METHOD AND APPARATUS FOR MISSING LINK PREDICTION FOR KNOWLEDGE GRAPH Technical Field
The present invention relates to techniques of knowledge graph, and more particularly to a method, apparatus and computer-readable storage medium for missing link prediction for a knowledge graph.
Background Art
Knowledge graphs (KG) are structured semantic networks that describe entities and their relationships. They were first proposed for understanding the user's search intent to improve search quality.
FIG. 1 shows an example of a knowledge graph, which effectively organizes scattered knowledge through a structured method for easy reference and utilization. Unlike the black-box model of deep learning, knowledge graphs are more interpretable and accessible for users to understand and use. Due to its rich semantic information, knowledge graphs have been widely used in intelligent question answering, social networking, anti-fraud, etc.
However, due to poor data quality or data processing errors in the process of building knowledge graphs, the resulting graphs may have missing links, which will affect users’ access to information.
Summary of the Invention
As mentioned above, there might be missing links in knowledge graphs, which will affect the quality of data. In the present disclosure, a method, apparatus and computer-readable storage medium for missing link prediction for a knowledge graph are presented. Visual analytics are provided to help users analyze and complete the knowledge graph, so that users can easily find possible missing links and add them to the knowledge graph.
Furthermore, large-scale knowledge graphs contain rich knowledge. However, due to the large number of entities (nodes) and relationships (links), as well as their heterogeneous structure, it is difficult for users to directly obtain valuable information from knowledge graphs. So, in some embodiments of the present disclosure, part of a knowledge graph can be displayed based on a user’s interest.
Also, in some embodiments, relationships between concepts can be shown, with which users can better understand information in the knowledge graph.
Furthermore, a search view and a filtering view are also provided for the convenience of the user’s operation.
Brief Description of the Drawings
The above mentioned attributes and other features and advantages of the present technique and the manner of attaining them will become more apparent and the present technique itself will be better understood by reference to the following description of embodiments of the present disclosure taken in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts an example of knowledge graph.
FIG. 2 depicts a block diagram of an apparatus for missing link prediction for a knowledge graph in accordance with one embodiment of the present disclosure.
FIG. 3 depicts flow diagrams of a method for missing link prediction for a knowledge graph in accordance with one embodiment of the present disclosure.
FIG. 4 depicts adding missing links in a knowledge graph in accordance with one embodiment of the present disclosure.
FIG. 5 depicts an instance view in accordance with one embodiment of the present disclosure.
FIG. 6A and FIG. 6B depict calculation of a user’s interest on a node in a knowledge graph in accordance with one embodiment of the present disclosure.
FIG. 7A, FIG. 7B and FIG. 7C depict process of generating an instance view in accordance with one embodiment of the present disclosure.
FIG. 8 depicts an instance view in accordance with one embodiment of the present disclosure.
FIG. 9 depicts process of generating a concept view in accordance with one embodiment of the present disclosure.
FIG. 10 depicts search view in accordance with one embodiment of the present disclosure.
FIG. 11A~FIG. 11C depict the process of node filtering in accordance with one embodiment of the present disclosure.
Reference Numbers:
10, an apparatus for missing link prediction for a knowledge graph
101, at least one memory
102, at least one processor
103, a display
11, a knowledge graph processing program
111, a processing module
112, a displaying module
113, an interaction module
12, knowledge graph
100, a method for missing link prediction for a knowledge graph
S101~S121, steps of method 100
S1161~S1165, sub steps of S116
Detailed Description of Example Embodiments
Hereinafter, above-mentioned and other features of the present technique are described in detail. Various embodiments are described with reference to the drawing, where like reference numerals are used to refer to like elements throughout. In the following description, for purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be noted that the illustrated embodiments are intended to explain, and not to limit the invention. It may be evident that such embodiments may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles “a” , “an” , “the” and “said” are intended to mean that there are one or more of the elements. The terms “comprising” , “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Now the present disclosure will be described hereinafter in details by referring to FIG. 2 to FIG. 11C.
FIG. 2 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure. The apparatus 10 for missing link prediction for a knowledge graph in the present disclosure can be implemented as a network of computer processors, to execute the following method 100 for missing link prediction for a knowledge graph in the present disclosure. The apparatus 10 can also be a single computer, as shown in FIG. 2, including at least one memory 101, which includes a computer-readable medium, such as a random access memory (RAM). The apparatus 10 also includes at least one processor 102, coupled with the at least one memory 101. Computer-executable instructions are stored in the at least one memory 101, and when executed by the at least one processor 102, can cause the at least one processor 102 to perform the steps described herein. The at least one processor 102 may include a microprocessor, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), state machines, etc. Embodiments of computer-readable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may include code from any computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
The at least one memory 101 shown in FIG. 2 can contain a knowledge graph processing program 11 which, when executed by the at least one processor 102, causes the at least one processor 102 to execute the method 100 for missing link prediction for a knowledge graph in the present disclosure. Knowledge graph 12 can also be stored in the at least one memory 101.
The knowledge graph processing program 11 can include:
- a processing module 111, configured to execute data processing, making judgements, and other processing related steps;
- a displaying module 112, configured to display information to a user;
- an interaction module 113, configured to interact with the user, receive user’s input, send a response to the user, etc.
The knowledge graph processing program 11 can provide the following functions with the above three modules.
1. missing link prediction, as shown in FIG. 4
2. subgraph extraction, as shown in FIG. 5, FIG. 6A and FIG. 6B
3. instance view, as shown in FIG. 7A, FIG. 7B, FIG. 7C and FIG. 8
4. concept view, as shown in FIG. 9
5. search view, as shown in FIG. 10
6. filtering view, as shown in FIG. 11A, FIG. 11B and FIG. 11C
Now, the above functions will be described in detail by referring to FIG. 4 to FIG. 11C.
1. missing link prediction
Predicted missing links in the knowledge graph can be shown to users, and preferably, users can choose to add the missing links via interaction with the interaction module 113, such as via an editing interface, to improve data quality. The Path Ranking Algorithm (PRA) can be used as the link prediction model here (Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1): 11–33, 2015).
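As a much-simplified sketch of the PRA idea only (the triples below are made-up illustration data, and a real PRA implementation enumerates many relation paths and trains a per-relation classifier over these path-probability features):

```python
from collections import defaultdict

# toy knowledge graph as (head, relation, tail) triples -- illustrative data only
triples = [
    ("alice", "works_at", "siemens"),
    ("siemens", "located_in", "munich"),
    ("bob", "works_at", "siemens"),
    ("bob", "lives_in", "munich"),
]

adj = defaultdict(lambda: defaultdict(set))   # node -> relation -> set of targets
for h, r, t in triples:
    adj[h][r].add(t)

def path_feature(source, relation_path):
    """Random-walk probability of reaching each node from `source` by
    following a fixed sequence of relation types (one PRA path feature)."""
    probs = {source: 1.0}
    for rel in relation_path:
        nxt = defaultdict(float)
        for node, p in probs.items():
            targets = adj[node][rel]
            for t in targets:
                nxt[t] += p / len(targets)    # uniform step over matching edges
        probs = dict(nxt)
    return probs

# the relation path works_at -> located_in supports predicting
# the candidate missing link ("alice", "lives_in", "munich")
feature = path_feature("alice", ["works_at", "located_in"])
```

Here the walk reaches "munich" from "alice" with probability 1.0, which is exactly the kind of path evidence the DAG on the right of FIG. 4 visualizes.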
For missing link prediction, the processing module 111 can be configured to: predict at least one missing link in a first knowledge graph, and for each missing link, measure the effect on the first knowledge graph of adding the missing link. The displaying module 112 can be configured to: for each missing link, display the measured effect to a user. The interaction module 113 can be configured to: for each missing link, receive the user’s decision on whether to add the missing link based on the measured effect. The processing module 111 can be further configured to process the first knowledge graph according to the user’s decision.
Optionally, when measuring effect on the first knowledge graph by adding the missing link, the processing module 111 can be further configured to: calculate at least one of the following metrics to measure the effect of adding the missing link in the first knowledge graph (as shown on the left of FIG. 4) :
- PageRank;
- betweenness;
- closeness;
And when displaying the measured effect to a user, the displaying module 112 can be further configured to: for each metric, display the degree of effect according to the measured value of the metric.
PageRank, betweenness, and closeness are metrics used in graph theory to measure the effect before and after adding a missing link. Optionally, as shown in FIG. 4, three arcs above an icon corresponding to a node can be used to represent the three metrics respectively, and for each metric, different colors can be used to present the effect of adding a missing link (for example, difference before and after adding a missing link) .
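As an illustrative sketch (not the patent's implementation), the before/after comparison can be computed per node; the snippet below does so for closeness centrality with a plain BFS, and PageRank and betweenness can be compared in exactly the same way. The toy subgraph and candidate link are made up:

```python
from collections import deque

def closeness(adj, n):
    """Closeness centrality of n: (reachable nodes - 1) / sum of distances to them."""
    dist = {n: 0}
    q = deque([n])
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def effect_of_link(adj, u, v):
    """Per-node change in closeness after adding the candidate missing link (u, v)."""
    before = {n: closeness(adj, n) for n in adj}
    after_adj = {n: set(nb) for n, nb in adj.items()}
    after_adj[u].add(v)
    after_adj[v].add(u)          # the link is treated as undirected in this sketch
    after = {n: closeness(after_adj, n) for n in adj}
    return {n: after[n] - before[n] for n in adj}

# hypothetical subgraph: TUM -- Munich -- Germany -- Berlin, candidate link TUM -- Germany
adj = {"TUM": {"Munich"}, "Munich": {"TUM", "Germany"},
       "Germany": {"Munich", "Berlin"}, "Berlin": {"Germany"}}
diff = effect_of_link(adj, "TUM", "Germany")
```

The per-node differences in `diff` are the values that could drive the color of the corresponding arc in the visualization: a large positive change means the candidate link substantially shortens paths through that node.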
Optionally, the processing module can be further configured to: for each missing link, determine paths for predicting the missing link and display the paths to the user for inspecting the reason for the missing link. When receiving the user’s  decision on whether to add the missing link based on the measured effect, the interaction module 113 can be further configured to receive the user’s decision on whether to add the at least one missing link based on the measured effect and the paths.
For example, a directed acyclic graph (DAG) can be used to show the paths of model prediction. As shown on the right of FIG. 4, identical paths can be merged, and sequential colors can be used to represent the weight of a path: the darker the color, the higher the weight.
2. subgraph extraction
To show the part of a knowledge graph which a user is interested in, subgraph extraction can be implemented, wherein the interaction module 113 can be configured to: receive the user’s search request for a first node (z) in the first knowledge graph and receive the user’s indication of currently focusing on a second node (y) in the first knowledge graph. The processing module 111 can be configured to: generate a second knowledge graph including the first node (z) , the second node (y) and at least one third node (x) in the first knowledge graph, wherein the user’s interest on each third node (x) is higher than a first pre-defined threshold and the user’s interest on a third node (x) is calculated based on the relation of the third node (x) to the first node (z) and the second node (y) , the tighter the relation, the more interest on the third node (x) .
Optionally, DOI (Degree of Interest) can be used to measure the user's interest in an entity (node) in the knowledge graph (F. van Ham and A. Perer. "Search, show context, expand on demand": Supporting large graph exploration with degree-of-interest. IEEE Transactions on Visualization and Computer Graphics, 15 (6) : 953-960, Nov 2009) . A DOI-based subgraph extraction algorithm can be used to calculate the DOI of the nodes in the knowledge graph according to the user's input. The entities with the highest DOI are then displayed, thereby eliminating a large number of irrelevant entities (nodes) and relationships (links) . In this way, visual clutter and the complexity of the front-end layout algorithm can be reduced, which helps users efficiently analyze the result.
The calculation of DOI can contain three parts:
1) a priori interest API (x)
API (x) can be the degree of node x, the PageRank value (Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. 1998) , or any other metric that can be used to measure the importance of node x;
2) user interest UI (x; z) for a given node z searched by the user
UI (x; z) can be the similarity between node x and node z, for example the cosine similarity, the Jaccard coefficient, etc. The specific calculation method can be determined according to the actual application scenario;
3) distance D (x; y)
the distance D (x; y) can refer to the distance between node x and the current focal node y, which is generally the length of the shortest path on the graph.
By combining the above three parts, the degree of interest DOI (x|y, z) can be denoted as follows:
DOI (x|y, z) = αAPI (x) + βUI (x, z) + γD (x, y)
That is, given the node z searched by the user and the current focal node y, the user's DOI on node x can be denoted as the weighted sum of the a priori interest API (x) , the user interest UI (x, z) , and the distance D (x, y) , where α, β, and γ can be used to control the weight of these three items respectively.
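The weighted sum above can be sketched as follows. Note that the sign of γ is an assumption here: the disclosure writes a plain sum, and γ is chosen negative in this sketch so that a larger distance from the focal node y lowers the interest, which matches the intent of the formula. The toy prior and similarity functions are placeholders.

```python
from collections import deque

def bfs_distances(adj, start):
    """Shortest-path length in hops from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def doi_scores(adj, prior, similarity, z, y, alpha=1.0, beta=1.0, gamma=-0.5):
    """DOI(x | y, z) = alpha*API(x) + beta*UI(x, z) + gamma*D(x, y)
    for every node x, with D(x, y) the shortest-path distance to the
    focal node y (unreachable nodes get the worst possible distance)."""
    dist = bfs_distances(adj, y)
    return {x: alpha * prior.get(x, 0.0)
               + beta * similarity(x, z)
               + gamma * dist.get(x, len(adj))
            for x in adj}

# Toy chain a-b-c, searched node z = "a", focal node y = "a".
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
prior = {"a": 1.0, "b": 1.0, "c": 1.0}              # placeholder API(x)
sim = lambda x, z: 1.0 if x == z else 0.0           # placeholder UI(x, z)
scores = doi_scores(adj, prior, sim, z="a", y="a")  # a: 2.0, b: 0.5, c: 0.0
```

In a real deployment API (x) would be a precomputed importance score and UI (x, z) an embedding similarity, as the surrounding text suggests.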
Preferably, when the number of nodes in the knowledge graph is not large, we can use a global manner to calculate the DOI of all nodes, which can improve the accuracy of the subgraph extraction result.
However, when applied to a large-scale knowledge graph, calculating the DOI of all nodes may take a lot of time. To tackle this problem, a greedy manner of calculating the DOI can be used (F. van Ham and A. Perer. "Search, show context, expand on demand": Supporting large graph exploration with degree-of-interest. IEEE Transactions on Visualization and Computer Graphics, 15 (6) : 953-960, Nov 2009) . Given the number of nodes in the knowledge graph n, the number of nodes to show S, an initial node set F and a list of potential candidates L = {y} , where S << n, van Ham's algorithm is as follows:
1) Remove the node x with the highest DOI from L and put x into F. If it is the first iteration, take x as the focal node x0.
2) Calculate the DOI for all immediate neighbors N (x) and add them into list L.
3) Repeat from Step 1) until the size of the set F reaches S or L is empty.
4) Generate the induced subgraph Gs of F as the final result.
When a max-heap is used as the data structure to maintain the list L and the node degree is bounded by a constant, each iteration requires O (1 + log S) time, so the total time complexity to add S nodes is O (S log S) . However, the degrees of nodes in a knowledge graph follow a power-law distribution (Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. On power-law relationships of the internet topology. ACM SIGCOMM Computer Communication Review, 29 (4) : 251-262, 1999) : there are a few nodes with extremely high degrees. These nodes are connected to a large number of other nodes, causing them to be frequently accessed. Calculating the interest of all neighbors of these nodes may consume a lot of time and affect the efficiency of interaction.
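Van Ham's greedy procedure (Steps 1–4 above) can be sketched with a heap-maintained candidate list. Python's heapq is a min-heap, so DOI values are negated; the adjacency structure and node names below are illustrative, not from the disclosure.

```python
import heapq

def extract_subgraph(adj, doi, seeds, s):
    """Greedy DOI-based extraction: repeatedly remove the candidate with
    the highest DOI from the list L, put it into the result set F, and
    push its immediate neighbors as new candidates, until F holds s
    nodes or no candidates remain.  Returns the induced subgraph of F."""
    heap = [(-doi[n], n) for n in seeds]       # negate: heapq is a min-heap
    heapq.heapify(heap)
    selected = set()
    while heap and len(selected) < s:
        _, x = heapq.heappop(heap)
        if x in selected:                      # stale duplicate entry
            continue
        selected.add(x)                        # first pop is the focal node x0
        for n in adj.get(x, ()):
            if n not in selected:
                heapq.heappush(heap, (-doi[n], n))
    # Induced subgraph Gs of F as the final result.
    return {u: {v for v in adj[u] if v in selected} for u in selected}

# Toy star graph: the hub plus its two highest-DOI leaves survive (s = 3).
adj = {"hub": {"l1", "l2", "l3"}, "l1": {"hub"}, "l2": {"hub"}, "l3": {"hub"}}
doi = {"hub": 10, "l1": 3, "l2": 2, "l3": 1}
sub = extract_subgraph(adj, doi, ["hub"], 3)
```

Allowing stale duplicate heap entries (rather than updating keys in place) keeps each push/pop at O(log S), consistent with the complexity discussed above.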
In the present disclosure, a clustering algorithm is presented to reduce the DOI calculation for the neighbors of high-degree nodes, which can achieve an approximate acceleration. Given the number of nodes S to be displayed, the k-means algorithm (John A. Hartigan and Manchek A. Wong. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) , 28 (1) : 100-108, 1979) can be used to cluster all nodes. A degree threshold T can be set. When the degree of a node is higher than T, its neighbors within each cluster can be arranged in descending order of degree and only the top-S nodes are retained.
Initially, given the number of nodes in the knowledge graph n, the number of nodes to show S, an initial node set F and a list of potential candidates L = {y} , where S << n, our algorithm can be as follows:
1) Remove the node x with the highest DOI from L and put x into F. If it is the first iteration, take x as the focal node x0 and the cluster label of x0 as z0 (nodes denoted with slash-filled circles in FIG. 6A and FIG. 6B) .
2) If the degree of x is lower than T, as shown in FIG. 6A, calculate the DOI for all immediate neighbors N (x) and put them into the list L; if the degree of x is higher than T, as shown in FIG. 6B, only calculate the DOI for the neighbors whose cluster label is z0 and whose degree is among the top-S, and put them into the list L.
3) Repeat from Step 1) until the size of the set F reaches S or L is empty.
4) Generate the induced subgraph Gs of F as the final result.
After the above procedure, the time complexity of calculating the DOI of the neighbors of high-degree nodes is reduced to O (S) , so that the total time complexity of the algorithm is O (S² log S) , which can meet the requirements of real-time computing.
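The neighbor-selection rule of Step 2) — all immediate neighbors when the degree is below the threshold T, otherwise only the top-S-degree neighbors in the focal cluster z0 — can be sketched as a small helper. The data layout (plain dicts for adjacency, degree, and cluster labels) and node names are illustrative assumptions.

```python
def candidate_neighbors(adj, degree, cluster, x, z0, t, s):
    """Neighbors of x whose DOI is computed in Step 2).  Below the degree
    threshold t all immediate neighbors qualify; at or above it, only the
    neighbors in the focal cluster z0, limited to the top-s by degree."""
    neigh = adj.get(x, ())
    if degree[x] < t:
        return list(neigh)
    in_cluster = [n for n in neigh if cluster[n] == z0]
    in_cluster.sort(key=lambda n: -degree[n])
    return in_cluster[:s]

# Hypothetical high-degree node "hub": with t = 3 only the two
# highest-degree neighbors from cluster 0 remain candidates.
adj = {"hub": ["a", "b", "c", "d"]}
degree = {"hub": 4, "a": 5, "b": 3, "c": 2, "d": 1}
cluster = {"a": 0, "b": 0, "c": 1, "d": 0}
limited = candidate_neighbors(adj, degree, cluster, "hub", z0=0, t=3, s=2)
full = candidate_neighbors(adj, degree, cluster, "hub", z0=0, t=10, s=2)
```

Bounding the expansion of a high-degree node to at most S candidates is what brings each iteration down to O (S) work, as stated above.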
As a result, the calculation efficiency of DOI can be improved and valuable information can be displayed to users. Also, after building appropriate indexes, data can be efficiently read from the at least one memory 102 without loading the whole knowledge graph 12 at runtime, which can significantly reduce the cache occupation and the data loading time.
3. instance view
To help users easily and efficiently obtain valuable information from a knowledge graph, a “focus+context” visualization technique can be adopted. After obtaining the extracted subgraph, or based on the original whole knowledge graph, sampling (such as random sampling) can be used to sample the links to reduce visual clutter.
Optionally, the processing module 111 can be configured to: take the first node or the second node as the current focal node, take the neighbors of the current focal node on which the user’s interest is higher than a second pre-defined threshold as centroid nodes, and generate an initial layout by a force-directed layout algorithm based on the current focal node and the centroid nodes while keeping the current focal node as close to the center as possible. The processing module 111 can repeat the following steps to generate a final layout based on the initial layout until convergence:
- calculating the Voronoi diagram of all the centroid nodes;
- doing layout by the force-directed layout algorithm while keeping the current focal node as close to the center as possible and ensuring that all the neighbors of each centroid node, except the current focal node, remain in their corresponding Voronoi cells.
The display module 112 can be further configured to display the final layout.
For example, the currently searched node (z, the first node) or the currently clicked node (y, the second node) can be taken as the focal node nf, the neighbors of the focal node whose degree is higher than 1 can be taken as centroid nodes nc, and the neighbors of each centroid node (except nf) can be taken as sub-nodes ns. The layout algorithm to generate the final layout can be as follows:
1) Use a force-directed layout algorithm to generate an initial layout. During the layout generation process, we can make the focal node nf as close to the center as possible.
2) Calculate the Voronoi diagram of all centroid nodes nc.
3) Continue the layout by the force-directed layout algorithm. During the layout generation process, we can make the focal node nf as close to the center as possible and ensure that all sub-nodes ns stay inside their corresponding Voronoi cells.
4) Repeat from Step 2) until convergence is reached.
FIG. 7A~FIG. 7C depict the final layout generation process, wherein FIG. 7A shows how to use a force-directed layout to generate an initial layout and select several nodes by degree as centroid nodes (in bold-edged rectangles) . FIG. 7B shows how to calculate the Voronoi diagram according to the given centroid nodes. FIG. 7C shows how to iterate until convergence is reached. FIG. 5 shows an example of an instance view for a knowledge graph.
With the above-mentioned process, the centroid nodes nc can spread across the screen and all sub-nodes ns stay inside their Voronoi cells, which makes unimportant links as short as possible and thus reduces link crossings and visual clutter.
Optionally, when displaying the final layout, the display module 112 can be further configured to execute at least one of the following steps:
- displaying an icon representing concept for each node;
- displaying a link in a way representing relationship type;
- displaying a node in a way representing number of unshown neighbors;
- displaying a missing link with a dotted line;
- displaying a missing link in a way representing confidence.
As shown in FIG. 8, icons can be used to represent the concept (such as university, country or city) of a node, qualitative colors can be used to represent different relationship types, and a metaphor like a tail can be used to represent the number of unshown neighbors. A dotted line can be used to represent a missing link, and sequential colors can be used to represent the confidence of the missing link. Users can clearly distinguish the three different concepts of country, city, and university from the figure, as well as three different relationship types.
4. concept view
The concept view can display information at a higher level than the entity (node) level, which can help users understand the concept hierarchy behind an instance view. Since the concept hierarchy of all concepts can be very large, in the present disclosure only the part of interest is displayed.
For generating a concept graph, the processing module 111 can be further configured to: find first concepts in the final layout, find a lowest common ancestor between each concept pair in the first concepts, and find children of each lowest common ancestor, then generate a concept graph including all found concepts and the first concepts. The display module 112 can be further configured to display the concept graph.
As shown in FIG. 9, given the current concepts C0 in an instance view, first the lowest common ancestor between each concept pair of C0 can be found. The child nodes of the lowest common ancestor can be found to show users how the two concepts are separated. Then the found concepts, together with C0, can form a concept graph that can be visualized.
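The concept-graph construction described above can be sketched on a concept hierarchy stored as child-to-parent links. The example taxonomy below is invented for illustration and is not from the disclosure.

```python
from itertools import combinations

def ancestors(parent, node):
    """Path from `node` up to the root (inclusive); `parent` maps
    each concept to its parent in the concept hierarchy."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(parent, a, b):
    """Lowest common ancestor of concepts a and b."""
    up_a = set(ancestors(parent, a))
    for c in ancestors(parent, b):
        if c in up_a:
            return c
    return None

def concept_graph(parent, children, current):
    """Concepts to display: the current concepts C0, the LCA of every
    pair in C0, and the children of each LCA (showing how the pair separates)."""
    found = set(current)
    for a, b in combinations(current, 2):
        anc = lca(parent, a, b)
        if anc is not None:
            found.add(anc)
            found.update(children.get(anc, ()))
    return found

# Invented taxonomy: country and city separate below "place".
parent = {"country": "administrative_area", "administrative_area": "place",
          "city": "settlement", "settlement": "place", "place": "thing"}
children = {"place": {"administrative_area", "settlement"},
            "administrative_area": {"country"}, "settlement": {"city"}}
shown = concept_graph(parent, children, ["country", "city"])
```

The resulting set of concepts is what the bubble treemap mentioned below would then lay out hierarchically.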
Here, a bubble treemap (Jochen Görtler, Christoph Schulz, Daniel Weiskopf, and Oliver Deussen. Bubble treemaps for uncertainty visualization. IEEE Transactions on Visualization and Computer Graphics, 24 (1) : 719-728, 2018) can be used to show the hierarchy structure of the concept graph.
When the user hovers over a node in an instance view, the corresponding concept in the concept graph view can also be highlighted, and vice versa.
5. search view
Users can input some words to get recommended nodes in the search view. As shown in FIG. 10, after getting the user's input, the search view will recommend some options for the user to choose from. If the user chooses one option, the instance view will show the search result.
Because the number of nodes in the knowledge graph may be very large, matching the input against them directly would take a lot of time. Therefore, a prefix tree (trie) can be adopted as the data structure to store the names of the nodes. The time complexity of each query is O (m) , where m is the length of the input string, which meets the needs of real-time interaction. In addition, since there may be many nodes that share the same prefix, at most the top-5 nodes by degree are shown here.
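A minimal trie with the described top-5-by-degree recommendation might look as follows. Note a caveat this sketch makes explicit: locating the prefix node costs O (m) , but collecting completions additionally visits the matched subtree, which a production implementation would bound or precompute per node.

```python
class TrieNode:
    __slots__ = ("children", "entry")
    def __init__(self):
        self.children = {}
        self.entry = None          # (name, degree) when a full name ends here

class NameTrie:
    """Prefix tree over node names, recommending completions by node degree."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name, degree):
        node = self.root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.entry = (name, degree)

    def suggest(self, prefix, k=5):
        """Names starting with `prefix`, top-k by node degree."""
        node = self.root
        for ch in prefix:          # O(m) walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        matches, stack = [], [node]
        while stack:               # collect every name below the prefix
            cur = stack.pop()
            if cur.entry is not None:
                matches.append(cur.entry)
            stack.extend(cur.children.values())
        matches.sort(key=lambda e: -e[1])
        return [name for name, _ in matches[:k]]

# Illustrative node names with degrees (not from the disclosure).
trie = NameTrie()
for name, deg in [("beijing", 25), ("berlin", 10), ("bern", 7),
                  ("berkeley", 4), ("boston", 3), ("bergen", 1)]:
    trie.insert(name, deg)
```

For example, `trie.suggest("ber")` returns the "ber"-prefixed names ordered by degree, which is the ranking the search view would display.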
6. filtering view
Users can filter out nodes by both numerical (FIG. 11A) and categorical (FIG. 11B) attributes in the filter view. Besides, a zig-zag layout directed acyclic graph (DAG) can be adopted to show a summary of the applied rules (FIG. 11C) , where users can click on a specific filter rule to modify it again.
Although the processing module 111, the display module 112 and the interaction module 113 are described above as software modules of the knowledge graph processing program 11, they can also be implemented via hardware, such as ASIC chips. They can be integrated into one chip, or separately implemented and electrically connected.
It should be mentioned that the present disclosure may include apparatuses having different architecture than shown in FIG. 2. The architecture above is merely exemplary and used to explain the exemplary method 100 shown in FIG. 3.
Various methods in accordance with the present disclosure may be carried out. One exemplary method 100 according to the present disclosure includes steps shown in FIG. 3.
By executing the following steps S101~S105, and the optional steps S106 and S107, missing links can be predicted and added in a knowledge graph:
- S101: predicting at least one missing link in a first knowledge graph; and for each missing link,
- S102: measuring effect on the first knowledge graph by adding the missing link;
- S103: displaying the measured effect to a user;
- S104: receiving the user’s decision on whether to add the missing link based on the measured effect;
- S105: processing the first knowledge graph according to the user’s decision.
Optionally, the step S102 of measuring the effect on the first knowledge graph of adding the missing link can include: calculating at least one of the following metrics to measure the effect of adding the missing link in the first knowledge graph: PageRank, betweenness and closeness. The step S103 of displaying the measured effect to a user can include: for each metric, displaying the degree of effect according to the measured value of the metric.
Optionally, the method 100 can further include: for each missing link,
- S106: determining paths for predicting the missing link;
- S107: displaying the paths to the user;
The step S104 of receiving the user’s decision on whether to add the missing link based on the measured effect can include: receiving the user’s decision on whether to add the at least one missing link based on both the measured effect and the paths.
By executing the following steps S108~S110, an extracted subgraph can be acquired:
- S108: receiving the user’s search request for a first node in the first knowledge graph;
- S109: receiving the user’s indication of currently focusing on a second node in the first knowledge graph;
- S110: generating a second knowledge graph including the first node, the second node and at least one third node in the first knowledge graph, wherein the user’s interest on each third node is higher than a first pre-defined threshold and the user’s interest on a third node is calculated based on the relation of the third node to the first node and the second node, the tighter the relation, the more interest on the third node.
By executing the following steps S111~S116, an instance view can be displayed:
- S111: taking the first node or the second node as a current focal node;
- S112: taking neighbors of the current focal node on which the user’s interest is higher than a second pre-defined threshold as centroid nodes;
- S113: generating an initial layout by force-directed layout algorithm based on the current focal node and the centroid nodes while making the current focal node as close to the center as possible;
Steps S114~S115 can be repeated to generate a final layout based on the initial layout until convergence:
- S114: calculating the Voronoi diagram of all the centroid nodes;
- S115: doing layout by force-directed layout algorithm while making the current focal node as close to the center as possible and ensuring that all the neighbors of each centroid node except the current focal node remain in their corresponding Voronoi cells;
Finally, in step S116, the final layout can be displayed.
Optionally, the step S116 of displaying the final layout can include at least one of the following sub-steps:
- S1161: displaying an icon representing concept for each node;
- S1162: displaying a link in a way representing relationship type;
- S1163: displaying a node in a way representing number of unshown neighbors;
- S1164: displaying a missing link with a dotted line;
- S1165: displaying a missing link in a way representing confidence.
By executing the following steps S117~S121, a concept view can be generated and displayed:
- S117: finding first concepts in the final layout;
- S118: finding a lowest common ancestor between each concept pair in the first concepts;
- S119: finding children of each lowest common ancestor;
- S120: generating a concept graph including all found concepts and the first concepts;
- S121: displaying the concept graph.
For other embodiments of the method 100, reference can be made to the description of the knowledge graph processing program 11 above.
While the present technique has been described in detail with reference to certain embodiments, it should be appreciated that the present technique is not limited to those precise embodiments. Rather, in view of the present disclosure, which describes exemplary modes for practicing the invention, many modifications and variations would present themselves to those skilled in the art without departing from the scope and spirit of this invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications, and variations coming within the meaning and range of equivalency of the claims are to be considered within their scope.

Claims (16)

  1. A method (100) for missing link prediction for a knowledge graph, comprising:
    - predicting (S101) at least one missing link in a first knowledge graph;
    - for each missing link,
    - measuring (S102) effect on the first knowledge graph by adding the missing link;
    - displaying (S103) the measured effect to a user;
    - receiving (S104) the user’s decision on whether to add the missing link based on the measured effect;
    - processing (S105) the first knowledge graph according to the user’s decision.
  2. The method according to claim 1, wherein
    - measuring (S102) effect on the first knowledge graph by adding the missing link comprises: calculating at least one of the following metrics to measure the effect of adding the missing link in the first knowledge graph:
    - PageRank;
    - betweenness;
    - closeness;
    - displaying (S103) the measured effect to a user comprises: for each metric, displaying the degree of effect according to the measured value of the metric.
  3. The method according to claim 1, wherein
    - the method further comprises: for each missing link,
    - determining (S106) paths for predicting the missing link;
    - displaying (S107) the paths to the user;
    - receiving (S104) the user’s decision on whether to add the missing link based on the measured effect comprises:
    - receiving the user’s decision on whether to add the at least one missing link based on the measured effect and the paths.
  4. The method according to claim 1, further comprising:
    - receiving (S108) the user’s search request for a first node in the first knowledge graph;
    - receiving (S109) the user’s indication of currently focusing on a second node in  the first knowledge graph;
    - generating (S110) a second knowledge graph including the first node, the second node and at least one third node in the first knowledge graph, wherein the user’s interest on each third node is higher than a first pre-defined threshold and the user’s interest on a third node is calculated based on the relation of the third node to the first node and the second node, the tighter the relation, the more interest on the third node.
  5. The method according to claim 4, further comprising:
    - taking (S111) the first node or the second node as a current focal node;
    - taking (S112) neighbors of the current focal node on which the user’s interest is higher than a second pre-defined threshold as centroid nodes;
    - generating (S113) an initial layout by force-directed layout algorithm based on the current focal node and the centroid nodes while making the current focal node as close to the center as possible;
    - repeating following steps to generate a final layout based on the initial layout until convergence:
    - calculating (S114) the Voronoi diagram of all the centroid nodes;
    - doing (S115) layout by force-directed layout algorithm while making the current focal node as close to the center as possible and ensuring that all the neighbors of each centroid node except the current focal node remain in their corresponding Voronoi cells;
    - displaying (S116) the final layout.
  6. The method according to claim 5, wherein displaying (S116) the final layout comprises at least one of the following steps:
    - displaying (S1161) an icon representing concept for each node;
    - displaying (S1162) a link in a way representing relationship type;
    - displaying (S1163) a node in a way representing number of unshown neighbors;
    - displaying (S1164) a missing link with a dotted line;
    - displaying (S1165) a missing link in a way representing confidence.
  7. The method according to claim 5, further comprising:
    - finding (S117) first concepts in the final layout;
    - finding (S118) a lowest common ancestor between each concept pair in the first concepts;
    - finding (S119) children of each lowest common ancestor;
    - generating (S120) a concept graph including all found concepts and the first concepts;
    - displaying (S121) the concept graph.
  8. An apparatus (10) for missing link prediction for a knowledge graph, comprising:
    - a processing module (111) , configured to:
    - predict at least one missing link in a first knowledge graph;
    - for each missing link, measure effect on the first knowledge graph by adding the missing link;
    - a displaying module (112) , configured to: for each missing link, display the measured effect to a user;
    - an interaction module (113) , configured to: for each missing link, receive the user’s decision on whether to add the missing link based on the measured effect;
    - the processing module (111) , further configured to process the first knowledge graph according to the user’s decision.
  9. The apparatus according to claim 8, wherein
    - when measuring effect on the first knowledge graph by adding the missing link, the processing module (111) is further configured to: calculate at least one of the following metrics to measure the effect of adding the missing link in the first knowledge graph:
    - PageRank;
    - betweenness;
    - closeness;
    - when displaying the measured effect to a user, the displaying module (112) is further configured to: for each metric, display the degree of effect according to the measured value of the metric.
  10. The apparatus according to claim 8, wherein
    - the processing module (111) is further configured to: for each missing link,
    - determine paths for predicting the missing link;
    - display the paths to the user;
    - when receiving the user’s decision on whether to add the missing link based on the measured effect, the interaction module (113) is further configured to:
    - receive the user’s decision on whether to add the at least one missing link based on the measured effect and the paths.
  11. The apparatus according to claim 8, wherein
    - the interaction module (113) is further configured to:
    - receive the user’s search request for a first node in the first knowledge graph;
    - receive the user’s indication of currently focusing on a second node in the first knowledge graph;
    - the processing module (111) is further configured to: generate a second knowledge graph including the first node, the second node and at least one third node in the first knowledge graph, wherein the user’s interest on each third node is higher than a first pre-defined threshold and the user’s interest on a third node is calculated based on the relation of the third node to the first node and the second node, the tighter the relation, the more interest on the third node.
  12. The apparatus according to claim 11, wherein
    - the processing module (111) is further configured to:
    - take the first node or the second node as a current focal node;
    - take neighbors of the current focal node on which the user’s interest is higher than a second pre-defined threshold as centroid nodes;
    - generate an initial layout by force-directed layout algorithm based on the current focal node and the centroid nodes while making the current focal node as close to the center as possible;
    - repeat following steps to generate a final layout based on the initial layout until convergence:
    - calculating the Voronoi diagram of all the centroid nodes;
    - doing layout by force-directed layout algorithm while making the current focal node as close to the center as possible and ensuring that all the neighbors of each centroid node except the current focal node remain in their corresponding Voronoi cells;
    - the display module (112) is further configured to display the final layout.
  13. The apparatus according to claim 12, wherein when displaying the final layout, the display module (112) is further configured to execute at least one of the following steps:
    - displaying an icon representing concept for each node;
    - displaying a link in a way representing relationship type;
    - displaying a node in a way representing number of unshown neighbors;
    - displaying a missing link with a dotted line;
    - displaying a missing link in a way representing confidence.
  14. The apparatus according to claim 12, wherein
    - the processing module (111) is further configured to:
    - find first concepts in the final layout;
    - find a lowest common ancestor between each concept pair in the first concepts;
    - find children of each lowest common ancestor;
    - generate a concept graph including all found concepts and the first concepts;
    - the display module (112) is further configured to: display the concept graph.
  15. An apparatus (10) for missing link prediction for a knowledge graph, comprising:
    - at least one processor (102) ;
    - at least one memory (101) , coupled to the at least one processor (102) , storing instructions which, when executed by the at least one processor (102) , cause the apparatus (10) to perform the method according to any of claims 1~7.
  16. A computer-readable medium for missing link prediction for a knowledge graph, storing computer-executable instructions, wherein the computer-executable instructions when executed cause at least one processor to execute the method according to any of claims 1~7.
PCT/CN2021/100199 2021-06-15 2021-06-15 Method and apparatus for missing link prediction for knowledge graph WO2022261840A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/100199 WO2022261840A1 (en) 2021-06-15 2021-06-15 Method and apparatus for missing link prediction for knowledge graph
CN202180097902.XA CN117651942A (en) 2021-06-15 2021-06-15 Method and apparatus for missing link prediction of knowledge graph


Publications (1)

Publication Number Publication Date
WO2022261840A1 true WO2022261840A1 (en) 2022-12-22

Family

ID=84526804


Country Status (2)

Country Link
CN (1) CN117651942A (en)
WO (1) WO2022261840A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124217A1 (en) * 2015-10-30 2017-05-04 International Business Machines Corporation System, method, and recording medium for knowledge graph augmentation through schema extension
US20180075359A1 (en) * 2016-09-15 2018-03-15 International Business Machines Corporation Expanding Knowledge Graphs Based on Candidate Missing Edges to Optimize Hypothesis Set Adjudication
CN111881219A (en) * 2020-05-19 2020-11-03 杭州中奥科技有限公司 Dynamic knowledge graph completion method and device, electronic equipment and storage medium
CN112073415A (en) * 2020-09-08 2020-12-11 北京天融信网络安全技术有限公司 Method and device for constructing network security knowledge graph


Also Published As

Publication number Publication date
CN117651942A (en) 2024-03-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21945432

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE