CN113157934A

CN113157934A - Knowledge graph origin processing method and system, electronic device and storage medium

Info

Publication number: CN113157934A
Application number: CN202110246378.XA
Authority: CN
Inventors: 杨学; 马永征; 陈闻宇
Original assignee: China Internet Network Information Center
Current assignee: China Internet Network Information Center
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2021-07-23

Abstract

The invention provides a knowledge-graph origin processing method and system, an electronic device and a storage medium. The knowledge-graph origin processing method comprises the following steps: tracking the derivation of query results on the knowledge graph based on causal origin and process tracing, wherein the causal origin is the set of relational edges contributed by a particular derivation of a particular result item, and the process tracing is represented by an origin polynomial built on the potential matches of any set of queries; and, upon an update of the knowledge graph, updating the origin polynomial based on the updated knowledge graph. The invention tracks derivations of query results on a knowledge graph by proposing the concept of a dynamic origin polynomial and encoding the relational edges that generate answers. Query results need not be recomputed from scratch each time the knowledge graph is updated, which improves the query efficiency of the knowledge graph and saves computing resources while keeping the origin tracing of the query results accurate.

Description

Knowledge graph origin processing method and system, electronic device and storage medium

Technical Field

The present invention relates to the field of message passing technologies, and in particular, to a method and a system for processing an origin of a knowledge graph, an electronic device, and a storage medium.

Background

Knowledge graphs have increasingly become the mainstay of many knowledge-centric key applications. Large-scale knowledge graphs, which model the interrelationships between entities occurring in real life, are now common in such applications. In addition to playing a key role in web search systems (e.g., Google Knowledge Graph, Microsoft Bing Satori), they are also used in scenarios such as e-government, technical support, drug management and academic search.

While some knowledge graphs are curated by hand, most of the large-scale knowledge graphs in practical use are constructed automatically by running one or more information extraction pipelines over various underlying data sources, based on the same or similar extraction techniques. Thus, a knowledge graph may contain knowledge entities that are related by different mechanisms, and a query result may be generated by combining facts from different sources. It is therefore not sufficient to track only the fine-grained sources of individual entities in a knowledge graph; the traceability of each query result must also be tracked. It thus becomes important to establish a source trace for query results in order to determine how the results are produced. Result traceability is useful for assessing the trustworthiness of query results, for the generation of the knowledge graph itself, and for explaining answers.

In many applications, some queries are registered as long-standing, high-frequency queries whose results are often kept in materialized form for reasons of efficiency and resource use. However, the knowledge graph itself changes constantly due to changes in source data, improvements in extraction techniques, refinement or enrichment of information, and the like. This raises the problem of how to trace the origin of query results efficiently and dynamically when a large-scale knowledge graph is queried, instead of recomputing the results from scratch every time the knowledge graph is updated. Solving this problem is an important step toward improving the working efficiency of knowledge graphs.

Disclosure of Invention

The invention provides a knowledge graph origin processing method and system, electronic equipment and a storage medium, which are used for solving the technical defects in the prior art.

The invention provides a knowledge graph origin processing method, which comprises the following steps:

tracking derivation of query results on the knowledge graph based on causal origin and process tracing;

wherein the causal origin is the set of relational edges contributed by a particular derivation of a particular result item; the process tracing is represented by an origin polynomial built on the potential matches of any set of queries;

upon an update of the knowledge-graph, the origin polynomial is updated based on the updated knowledge-graph.

Preferably, in the method for processing the origin of the knowledge graph, a potential match is any subgraph of the knowledge graph that can become an actual match of a query after one edge is inserted; the potential matches include a first pattern potential match and a second pattern potential match; in the first pattern potential match, the newly added edge matches only one triple pattern of the query, so that the subgraph becomes an actual match of the query; in the second pattern potential match, the newly added edge matches multiple triple patterns of the query, so that the subgraph becomes an actual match of the query.

Preferably, the method for processing the origin of the knowledge-graph comprises the following steps:

the registered queries are classified according to the type of potential match into regular queries representing queries with only a first pattern potential match and multi-mapped queries representing queries with a second pattern potential match.

Preferably, the method for processing the origin of the knowledge graph comprises the following steps: maintaining information on the sub-queries obtained from the basic graph pattern of each query;

the sub-queries are defined as follows: if the given query has size n, i.e., contains n triple patterns, deleting one triple pattern yields a sub-query consisting of at most two subgraphs; these two subgraphs are denoted SQ1 and SQ2 and, without loss of generality, |SQ1| = k and |SQ2| = n-k-1, where 0 ≤ k ≤ n-1.

The invention also provides a knowledge-graph origin processing system, comprising:

the query result tracking module is used for tracking derivation of a query result on the knowledge graph based on causal origin and process tracing;

and the origin polynomial updating module is used for updating the origin polynomial based on the updated knowledge graph when the knowledge graph is updated.

Preferably, in the system for processing the knowledge-graph origin, a potential match is any subgraph of the knowledge graph that can become an actual match of a query after one edge is inserted; the potential matches include a first pattern potential match and a second pattern potential match; in the first pattern potential match, the newly added edge matches only one triple pattern of the query, so that the subgraph becomes an actual match of the query; in the second pattern potential match, the newly added edge matches multiple triple patterns of the query, so that the subgraph becomes an actual match of the query.

Preferably, the system for processing the knowledge-graph origin comprises:

the query classification module is used for classifying the registered queries into conventional queries and multi-mapping queries according to the types of the potential matches, wherein the conventional queries represent the queries with only the first pattern potential matches, and the multi-mapping queries represent the queries with the second pattern potential matches.

Preferably, the system for processing the knowledge-graph origin comprises:

the sub-query module is used for maintaining information on the sub-queries obtained from the basic graph pattern of each query;

The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for processing knowledge-graph origins as described in any of the above when executing the program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of knowledge-graph provenance processing as described in any one of the above.

The invention provides a knowledge graph origin processing method, which tracks derivation of query results on a knowledge graph by proposing a dynamic origin polynomial concept and encoding relationship edges for generating answers. When the knowledge graph is updated, the origin polynomial is updated based on the updated knowledge graph, repeated calculation of query results after the knowledge graph is updated every time is avoided, query efficiency of the knowledge graph is greatly improved, and computing resources are saved; meanwhile, the accuracy of origin tracing of the query result is ensured.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow diagram of a knowledge-graph provenance processing method provided by the present invention;

FIG. 2 is a schematic diagram of a knowledge-graph provenance processing system provided by the present invention;

FIG. 3 is a schematic structural diagram of an electronic device provided by the present invention;

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a knowledge graph origin processing method, which is shown in figure 1 and comprises the following steps:

S1, tracking derivation of the query result on the knowledge graph based on causal origin and process tracing;

In S1, the query origin falls into two important categories: causal origin and process tracing. The causal origin is the set of edges contributed by a particular derivation of a particular result item. Process tracing, on the other hand, encodes how the relations interact during the derivation of an answer. The causal origin can be written as a polynomial in which the edges of each monomial jointly bind to produce an answer, and the same or a highly similar result may be obtained from several different combinations of relational edges. Process tracing needs to capture more information about how the result is produced, and it is represented using the origin polynomial.
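As an illustration of the representation described above, the following minimal sketch stores an origin polynomial as a set of monomials, where each monomial is the set of edge identifiers that jointly derive the answer. The function name `format_polynomial` and the edge labels `e1` to `e4` are assumptions introduced for this example only; the source does not prescribe a data structure.

```python
def format_polynomial(monomials):
    """Render a set of monomials (each a frozenset of edge ids) as a sum of products.

    Hypothetical helper: each monomial is one derivation of the answer,
    and the polynomial is the collection of all derivations.
    """
    terms = sorted("*".join(sorted(m)) for m in monomials)
    return " + ".join(terms)


# Two different combinations of relational edges yield the same answer.
answer_origin = {frozenset({"e1", "e2"}), frozenset({"e3", "e4"})}
print(format_polynomial(answer_origin))
```

Each product term corresponds to one causal origin (the edges of a single derivation), while the full sum records every way the answer can be produced.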

The construction of the origin polynomial is based on the concept of potential matches for a specific query Q. Intuitively, the potential matches for a query correspond to subgraphs of the knowledge graph that partially match the given query graph pattern. A maintained potential match needs only one relational edge to be inserted in order to become a full match.

Consider a query Q that contains n triple patterns. Assume that S ⊆ G is a subgraph that matches n-1 triple patterns of query Q, so that only one triple pattern is unmatched. If a new edge e that matches the unmatched triple pattern of Q is added to S, the new subgraph S' = S ∪ {e} becomes an actual match for Q; in other words, S' ∈ A(Q), where A(Q) is the result set of query Q, and S is a potential match. That is, any subgraph S of the knowledge graph G that can become an actual match for the query Q after one edge is inserted is called a potential match.
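The definition above can be sketched with a small brute-force matcher. This is only an illustrative sketch under stated assumptions: triples are `(subject, predicate, object)` tuples of strings, variables start with `?`, and the names `bind`, `match` and `potential_matches` are hypothetical rather than taken from the source. It enumerates subgraphs S that match all but one triple pattern and reports the pattern that a new edge would have to satisfy.

```python
def bind(pattern, edge, binding):
    """Extend `binding` so that `pattern` maps onto `edge`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, edge):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def match(patterns, graph, binding=None):
    """Yield every variable binding under which all patterns map to graph edges."""
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    for edge in graph:
        new = bind(patterns[0], edge, binding)
        if new is not None:
            yield from match(patterns[1:], graph, new)


def potential_matches(query, graph):
    """Yield (S, needed): S matches n-1 patterns, `needed` is the missing pattern."""
    for i, missing in enumerate(query):
        rest = query[:i] + query[i + 1:]
        for b in match(rest, graph):
            s = frozenset(tuple(b.get(t, t) for t in p) for p in rest)
            needed = tuple(b.get(t, t) for t in missing)
            # Skip if the completing edge already exists (checked only when
            # `needed` is fully bound; unbound variables stay as "?name").
            if needed not in graph:
                yield s, needed
```

For a graph containing only `("alice", "knows", "bob")` and the query `?x knows ?y . ?y worksAt ?z`, the sketch reports that edge as a potential match waiting for an edge of the form `("bob", "worksAt", ?z)`.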

S2, upon an update of the knowledge graph, updating the origin polynomial based on the updated knowledge graph.

A potential match is any subgraph of the knowledge graph that can become an actual match of a query after one edge is inserted. The potential matches include a first pattern potential match and a second pattern potential match. In the first pattern potential match, the newly added edge matches only one triple pattern of the query, so that the subgraph becomes an actual match of the query; in the second pattern potential match, the newly added edge matches multiple triple patterns of the query, so that the subgraph becomes an actual match of the query.

The present invention tracks derivations of query results on a knowledge graph by proposing dynamic provenance polynomial concepts and encoding relational edges that generate answers. When the knowledge graph is updated, the origin polynomial is updated based on the updated knowledge graph, repeated calculation of query results after the knowledge graph is updated every time is avoided, query efficiency of the knowledge graph is greatly improved, and computing resources are saved; meanwhile, the accuracy of origin tracing of the query result is ensured. At the same time, automatic maintenance updates to the provenance polynomial can be implemented in the face of updates (insertion and deletion of facts) to the underlying knowledge-graph.
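The maintenance idea can be sketched as follows. This is a simplified illustration, not the patented procedure: it assumes the origin polynomial is a set of monomials (frozensets of edges) and that potential matches are indexed by the fully specified edge they are waiting for. The names `on_insert`, `on_delete` and `potential` are hypothetical. The sketch also omits the cleanup of potential matches whose own subgraph contains a deleted edge.

```python
def on_insert(polynomial, potential, edge):
    """Promote every potential match that was waiting for `edge` to a monomial."""
    for subgraph in potential.pop(edge, []):
        polynomial.add(subgraph | {edge})


def on_delete(polynomial, potential, edge):
    """Drop monomials that used `edge`; their remainders become potential matches."""
    for monomial in [m for m in polynomial if edge in m]:
        polynomial.discard(monomial)
        potential.setdefault(edge, []).append(monomial - {edge})


e1 = ("alice", "knows", "bob")
e2 = ("bob", "worksAt", "acme")
polynomial = set()
potential = {e2: [frozenset({e1})]}   # S = {e1} needs e2 to become a match

on_insert(polynomial, potential, e2)  # polynomial now holds the monomial {e1, e2}
on_delete(polynomial, potential, e2)  # the monomial is withdrawn again
```

Only the monomials and potential matches touched by the changed edge are visited, which is the point of maintaining the polynomial instead of re-running the query.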

Specifically, the first pattern potential match is a 1:1 potential match. If the newly added edge e matches only one triple pattern of query Q, such that S' = S ∪ {e} becomes an actual match for Q, then subgraph S is referred to as a 1:1 potential match, denoted by PM1:1. The second pattern potential match is a 1:m potential match. If the newly added edge e matches multiple triple patterns of query Q, such that S' = S ∪ {e} becomes an actual match for Q, then subgraph S is called a 1:m potential match, denoted by PM1:m.
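The distinction can be illustrated by counting how many triple patterns of Q a single new edge matches on its own. This sketch is a simplification: it checks the edge against each pattern in isolation and does not verify that a subgraph S is actually completed. The names `pattern_matches_edge` and `match_kind` are assumptions for illustration.

```python
def pattern_matches_edge(pattern, edge):
    """True if `edge` is an instance of `pattern` (variables start with '?')."""
    binding = {}
    for term, value in zip(pattern, edge):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return False
        elif term != value:
            return False
    return True


def match_kind(query, edge):
    """Return 'none', '1:1' or '1:m' by the number of patterns the edge matches."""
    hits = sum(pattern_matches_edge(p, edge) for p in query)
    if hits == 0:
        return "none"
    return "1:1" if hits == 1 else "1:m"


query = [("?a", "coauthor", "?b"), ("?b", "coauthor", "?c"), ("?c", "bornIn", "?d")]
print(match_kind(query, ("x", "bornIn", "paris")))  # matches one pattern
print(match_kind(query, ("x", "coauthor", "y")))    # matches two patterns
```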

The method comprises the following steps:

the registered queries are classified according to the type of potential match into regular queries, which are queries with only first pattern potential matches, and multi-mapped queries, which are queries with second pattern potential matches. Both kinds of query are join queries and differ only in the distribution of predicates in their triple patterns. A query with only 1:1 potential matches (no 1:m potential matches) is called a regular query, denoted by Q_reg. A query with 1:m potential matches (possibly in addition to 1:1 potential matches) is called a multi-mapped query, denoted by Q_mm.
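A registration-time check might look like the sketch below. The source only states that the two classes differ in how predicates are distributed over the triple patterns, so the test used here is an assumption: a query is treated as multi-mapped when some pair of its triple patterns could be matched by the same edge (for example, two patterns sharing a predicate). The names `compatible` and `classify_query` are hypothetical, and repeated variables within a single pattern are ignored.

```python
from itertools import combinations


def compatible(p, q):
    """True if one edge could instantiate both patterns, position by position."""
    return all(a.startswith("?") or b.startswith("?") or a == b
               for a, b in zip(p, q))


def classify_query(query):
    """Return 'Q_mm' if any two triple patterns can share an edge, else 'Q_reg'."""
    multi = any(compatible(p, q) for p, q in combinations(query, 2))
    return "Q_mm" if multi else "Q_reg"


regular = [("?x", "knows", "?y"), ("?y", "worksAt", "?z")]
multi = [("?a", "coauthor", "?b"), ("?b", "coauthor", "?c")]
print(classify_query(regular), classify_query(multi))
```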

The potential matches and queries are separated into different classes because they need to be processed in different ways. The strategy for registering and processing queries is based on the following lemmas.

Lemma 1: given a parent query Q of size n, a PM1:1 satisfies one and only one sub-query of size n-1.

Lemma 2: given a parent query Q of size n, a PM1:m cannot satisfy any sub-query of size n-1.

Lemma 3: after a conditional edge insertion, a PM1:m satisfies the parent query Q if and only if it satisfies all sub-queries of Q.

For all registered long-standing queries, the query results are maintained using origin polynomials, together with the facts relevant to each query. Information on the sub-queries obtained from the basic graph pattern of each query is also maintained. To improve efficiency, results are shared and reused across all queries and sub-queries in the workload, and the sub-query execution plans, generated using an AND-OR graph, are combined into a single global execution tree. When the knowledge graph is updated, computing the updated query results with the screening and optimization paradigm also allows sub-query results to be recomputed quickly.

Preferably, the above method comprises: maintaining information on the sub-queries obtained from the basic graph pattern of each query;

Depending on the size of the subgraph and the connection points of the results, the sub-queries can be divided into the following different types:

Type I: the sub-query consists of a single subgraph, i.e., k = 0; it is generated when the triple pattern corresponding to an edge at a leaf node of the query graph is deleted.

Type II: one subgraph contains only one triple pattern, i.e., k = 1. In this case, identifying connection points using the triple pattern in SQ2 would end up identifying all edges, which may qualify a large number of potential entities in the knowledge graph. To avoid this, this patent identifies connection points only from A(SQ1).

Type III: both SQ1 and SQ2 contain at least two triple patterns, i.e., 2 ≤ k < n-2. For example, deleting the composer triple pattern generates two sub-queries, each of size 2, which results in at most |A(SQ1)| + |A(SQ2)| connection points.

Type IV: in the last type, deleting a triple pattern from the query does not disconnect the query graph, i.e., k = n-1, which yields 2 potential matches for each match of SQ1.
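The sub-query decomposition behind these types can be sketched as follows: delete one triple pattern and group the remaining patterns into connected subgraphs, whose sizes give k and n-k-1. The function name `split_after_deletion` and the union-find helper are assumptions introduced for illustration; two patterns are considered connected when they share a subject or object.

```python
from collections import defaultdict


def split_after_deletion(patterns, removed_index):
    """Delete one triple pattern and group the rest into connected sub-queries."""
    remaining = [p for i, p in enumerate(patterns) if i != removed_index]
    parent = {}

    def find(x):
        # Union-find over query-graph vertices with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, _, o in remaining:
        parent[find(s)] = find(o)

    groups = defaultdict(list)
    for p in remaining:
        groups[find(p[0])].append(p)
    return sorted(groups.values(), key=len)


path = [("?a", "p1", "?b"), ("?b", "p2", "?c"), ("?c", "p3", "?d"), ("?d", "p4", "?e")]
parts = split_after_deletion(path, 1)
print([len(part) for part in parts])  # sizes of the resulting subgraphs
```

Deleting a leaf pattern of the path, or any pattern of a cyclic query, leaves a single connected subgraph of size n-1, while deleting an inner pattern of the path yields two subgraphs.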

The specific implementation process of the invention is as follows:

Data set: the public YAGO2 data set is used.

Query set: for the YAGO2 data set, the query set was validated using RDF-3X.

Knowledge-graph samples: for each knowledge graph, an insertion workload is generated in the following manner. Starting from the initial graph, a pair of unconnected vertices is randomly selected and connected with a randomly selected predicate.
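The workload generation described above could be sketched as below. The function name `insertion_workload`, the `seed` parameter and the cap on the number of generated edges are assumptions added so that the example is reproducible and terminates; the source specifies only the random choice of an unconnected vertex pair and a predicate.

```python
import random


def insertion_workload(edges, predicates, count, seed=0):
    """Generate up to `count` new edges between previously unconnected vertices."""
    rng = random.Random(seed)
    vertices = sorted({v for s, _, o in edges for v in (s, o)})
    connected = {frozenset((s, o)) for s, _, o in edges if s != o}
    # Never ask for more edges than there are unconnected vertex pairs.
    total_pairs = len(vertices) * (len(vertices) - 1) // 2
    count = min(count, total_pairs - len(connected))
    workload = []
    while len(workload) < count:
        s, o = rng.sample(vertices, 2)
        if frozenset((s, o)) in connected:
            continue
        connected.add(frozenset((s, o)))
        workload.append((s, rng.choice(predicates), o))
    return workload


start = [("a", "p", "b"), ("b", "q", "c")]
print(insertion_workload(start, ["p", "q"], count=1))
```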

Storage and baseline system: Neo4j was selected to store the knowledge graph. For the baseline system, the open-source TripleProv was used, customized to support origin computation.

The knowledge-graph origin processing system provided by the invention is described below, and the knowledge-graph origin processing system described below and the knowledge-graph origin processing method described above can be referred to correspondingly.

The embodiment of the invention discloses a knowledge graph origin processing system, which is shown in figure 2 and comprises the following components:

a query result tracking module 10, configured to track derivation of a query result on a knowledge graph based on causal origin and process tracing;

an origin polynomial updating module 20 for updating the origin polynomial based on the updated knowledge-graph upon an update of the knowledge-graph.

The system of the present invention comprises:

Further, the system of the present invention comprises:

Fig. 3 illustrates a physical structure diagram of an electronic device, which may include: a processor (processor)310, a communication Interface (communication Interface)320, a memory (memory)330 and a communication bus 340, wherein the processor 310, the communication Interface 320 and the memory 330 communicate with each other via the communication bus 340. Processor 310 may invoke logic instructions in memory 330 to perform a method of knowledge-graph origin processing, the method comprising:

In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method of knowledge-graph provenance processing, the method comprising:

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method of knowledge-graph provenance processing, the method comprising:

The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A knowledge-graph provenance processing method, comprising:

2. The method of claim 1, wherein a potential match is any subgraph of the knowledge graph that can become an actual match of a query after one edge is inserted, the potential matches include a first pattern potential match and a second pattern potential match, in the first pattern potential match the newly added edge matches only one triple pattern of the query, so that the subgraph becomes an actual match of the query; in the second pattern potential match the newly added edge matches multiple triple patterns of the query, so that the subgraph becomes an actual match of the query.

3. The method of knowledge-graph provenance processing according to claim 2, comprising:

4. The method of knowledge-graph provenance processing according to claim 1, comprising: maintaining information on the sub-queries obtained from the basic graph pattern of each query;

5. A knowledge-graph provenance processing system, comprising:

6. The system of claim 5, wherein a potential match is any subgraph of the knowledge graph that can become an actual match of a query after one edge is inserted, the potential matches comprise a first pattern potential match and a second pattern potential match, in the first pattern potential match the newly added edge matches only one triple pattern of the query, so that the subgraph becomes an actual match of the query; in the second pattern potential match the newly added edge matches multiple triple patterns of the query, so that the subgraph becomes an actual match of the query.

7. The system of knowledge-graph provenance processing according to claim 6, comprising:

8. The system of knowledge-graph provenance processing according to claim 5, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of knowledge-graph origin processing according to any one of claims 1 to 4 are implemented when the program is executed by the processor.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method of knowledge-graph provenance processing of any of claims 1 to 4.