CN111339334A - Data query method and system for heterogeneous graph database - Google Patents


Info

Publication number
CN111339334A
Authority
CN
China
Prior art keywords
sub
node
query
metadata
graph
Prior art date
Legal status
Granted
Application number
CN202010086983.0A
Other languages
Chinese (zh)
Other versions
CN111339334B (en)
Inventor
唐烨
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010086983.0A
Publication of CN111339334A
Application granted
Publication of CN111339334B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/51 Indexing; Data structures therefor; Storage structures

Abstract

The application discloses a data query method and a data query system for a heterogeneous graph database. In the method, an association relationship between metadata in a unified view and metadata in a plurality of subgraphs of the heterogeneous graph database is preset; a unified query request conforming to the metadata in the unified view is acquired and decomposed, according to the association relationship, into a plurality of sub-queries corresponding to different subgraphs; each sub-query is executed in its corresponding subgraph to obtain a plurality of sub-query results, wherein the sub-queries and the sub-query results conform to the metadata in the corresponding subgraphs; and the sub-query results are merged according to the association relationship to obtain a query result conforming to the metadata in the unified view.

Description

Data query method and system for heterogeneous graph database
Technical Field
The present description relates to the field of database technology.
Background
Traditional databases have difficulty handling relationship-centric operations (in industries such as social networking, e-commerce, finance, retail and the Internet of Things), which led to the emergence of graph databases that support relational operations over massive, complex data. Graph databases are currently growing rapidly: products such as Neo4j, JanusGraph and Amazon Neptune have appeared, and relational databases are sometimes even used to emulate graph-database application scenarios. In practice, graph data may be stored across these different storage engines, so cross-graph queries arise, which makes graph data difficult to obtain. For example, obtaining a user's consumption habits from three subgraphs (a merchant graph, a user graph and a commodity graph) requires requesting data from all three subgraphs, processing the different results, and then combining them.
Therefore, in the prior art, a developer building a graph data application that must request data across subgraphs or across engines needs to be familiar with the definitions of the different subgraphs and the details of the different graph execution engines, and must also merge the data of each subgraph manually. As a result, obtaining graph data is very cumbersome, and the quality of the resulting graph data is low.
Disclosure of Invention
This specification provides a data query method and system for a heterogeneous graph database with which a user does not need to know the details of each subgraph: the complexity of the heterogeneous graph database is hidden and a unified graph data query result is returned.
The application discloses a data query method for a heterogeneous graph database, comprising the following steps:
presetting an association relationship between metadata in a unified view and metadata in a plurality of subgraphs of the heterogeneous graph database;
acquiring a unified query request conforming to the metadata in the unified view, and decomposing the unified query request into a plurality of sub-queries corresponding to different subgraphs according to the association relationship;
executing each of the sub-queries in its corresponding subgraph to obtain a plurality of sub-query results, wherein the sub-queries and the sub-query results conform to the metadata in the corresponding subgraphs;
and merging the plurality of sub-query results according to the association relationship to obtain a query result conforming to the metadata in the unified view.
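The four steps can be pictured with a minimal in-memory sketch. Everything in it is an assumption for illustration: the `ASSOCIATION` and `SUBGRAPHS` tables, the function names, and the choice of fusing results on an equal `name` attribute (borrowed from the GraphA.Person = GraphB.User example later in this description). Real subgraphs would live in separate engines rather than in Python dicts.

```python
# Hypothetical association relationship: unified node type -> {subgraph: local node type}
ASSOCIATION = {"User": {"GraphA": "Person", "GraphB": "User"}}

# Hypothetical subgraph contents: subgraph -> local node type -> list of node attribute dicts
SUBGRAPHS = {
    "GraphA": {"Person": [{"name": "alice", "age": 30}]},
    "GraphB": {"User": [{"name": "alice", "city": "Hangzhou"}, {"name": "bob"}]},
}


def decompose(unified_type, condition):
    """Step 104: one sub-query per subgraph that holds the unified node type."""
    return [(graph, local_type, condition)
            for graph, local_type in ASSOCIATION[unified_type].items()]


def execute(sub_query):
    """Step 106: run a sub-query against its own subgraph, using that subgraph's metadata."""
    graph, local_type, condition = sub_query
    return [node for node in SUBGRAPHS[graph][local_type] if condition(node)]


def merge(results, key="name"):
    """Step 110: fuse nodes from different sub-query results whose `key` attribute is equal."""
    merged = {}
    for rows in results:
        for node in rows:
            merged.setdefault(node[key], {}).update(node)
    return list(merged.values())


def unified_query(unified_type, condition):
    """Steps 104-110 end to end; step 102 (the preset) is the ASSOCIATION table above."""
    return merge(execute(q) for q in decompose(unified_type, condition))
```

For example, `unified_query("User", lambda v: v.get("name") == "alice")` returns a single fused node, `{"name": "alice", "age": 30, "city": "Hangzhou"}`, built from one GraphA row and one GraphB row.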
In a preferred embodiment, the method further includes presetting adaptation rules from the data structures in the plurality of subgraphs to the unified graph data structure;
before merging the plurality of sub-query results according to the association relationship, the method further includes:
converting the plurality of sub-query results, according to the adaptation rules, into the unified graph data structure conforming to the metadata in the unified view.
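A sketch of what an adaptation rule might look like, assuming a rule is simply a per-subgraph mapping from a local node type to a unified type plus attribute renames. The `Vertex` structure, `ADAPTATION_RULES`, the `person_name` column and the `subgraph:id` identifier scheme are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical unified graph data structure: unified type, global id, attributes."""
    vtype: str
    vid: str
    attrs: dict = field(default_factory=dict)


# Hypothetical adaptation rules: subgraph -> local type -> (unified type, {local attr: unified attr})
ADAPTATION_RULES = {
    "GraphA": {"Person": ("User", {"person_name": "name"})},
    "GraphB": {"User": ("User", {})},
}


def adapt(subgraph, local_type, raw_rows):
    """Convert the raw rows of one sub-query result into unified Vertex objects."""
    unified_type, renames = ADAPTATION_RULES[subgraph][local_type]
    vertices = []
    for row in raw_rows:
        # Rename local attributes to unified names; the local id becomes part of a global id.
        attrs = {renames.get(k, k): v for k, v in row.items() if k != "id"}
        vertices.append(Vertex(unified_type, f"{subgraph}:{row['id']}", attrs))
    return vertices
```

Prefixing ids with the subgraph name is one way to keep identifiers unique once results from several engines are placed in the same structure for merging.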
In a preferred embodiment, executing the plurality of sub-queries in their corresponding subgraphs further includes:
routing the different sub-queries to different execution engines, each of which executes its query operation.
In a preferred embodiment, the execution engine includes one or any combination of the following:
one or more graph engines, one or more relational database engines, one or more non-relational database engines other than the graph engines.
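One simple way to route sub-queries to such a mix of engines is a dispatch table keyed by subgraph. The deployment table and the engine functions below are hypothetical stand-ins that only echo the query; in practice each would wrap a client for a graph, relational or other non-relational store.

```python
from typing import Callable, Dict, List

# Hypothetical deployment: which kind of engine stores each subgraph.
SUBGRAPH_ENGINE = {"GraphA": "graph", "GraphB": "relational", "GraphC": "kv"}


# Stand-in engines; each would normally wrap a real client for its store.
def graph_engine(query: str) -> str:
    return f"graph-engine ran {query}"


def relational_engine(query: str) -> str:
    return f"relational-engine ran {query}"


def kv_engine(query: str) -> str:
    return f"kv-engine ran {query}"


ENGINES: Dict[str, Callable[[str], str]] = {
    "graph": graph_engine,
    "relational": relational_engine,
    "kv": kv_engine,
}


def route(sub_queries: Dict[str, str]) -> List[str]:
    """Send each sub-query (keyed by subgraph) to the engine registered for that subgraph."""
    return [ENGINES[SUBGRAPH_ENGINE[graph]](query) for graph, query in sub_queries.items()]
```

For example, `route({"GraphA": "q1", "GraphB": "q2"})` returns `["graph-engine ran q1", "relational-engine ran q2"]`. With a lookup table, supporting a new engine type is a one-line registration; an unknown subgraph raises `KeyError`, which a production router would likely turn into a clearer error.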
In a preferred embodiment, the metadata includes one or any combination of the following: basic graph information, node definitions, edge definitions and attribute definitions.
In a preferred embodiment, decomposing the unified query request into a plurality of sub-queries corresponding to different subgraphs according to the association relationship further includes:
obtaining the metadata in the unified view that is involved in the unified query request through lexical and syntactic analysis of the request;
obtaining, according to the association relationship, the metadata in each subgraph that corresponds to that unified-view metadata;
and constructing a sub-query for each subgraph from the unified query request and the metadata of that subgraph.
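The three decomposition steps can be illustrated with a toy query language of the form `Type(attr='value')`. The syntax, the `ASSOCIATION` table and the rewriting into per-subgraph query strings are assumptions made for this sketch; a real implementation would use a full grammar for its graph query language.

```python
import re

# Toy unified query syntax (hypothetical): Type(attr='value')
TOKEN_RE = re.compile(r"\w+|'[^']*'|[()=]")

# Hypothetical association relationship held by the metadata center:
# unified type -> list of (subgraph, local type, {unified attr: local attr})
ASSOCIATION = {
    "User": [("GraphA", "Person", {"name": "name"}),
             ("GraphB", "User", {"name": "name"})],
}


def lex(text):
    """Lexical analysis: split the request into tokens."""
    return TOKEN_RE.findall(text)


def parse(tokens):
    """Syntactic analysis: accept exactly TYPE ( ATTR = 'VALUE' )."""
    if (len(tokens) != 6 or tokens[1] != "(" or tokens[3] != "="
            or tokens[5] != ")" or not tokens[4].startswith("'")):
        raise ValueError(f"unexpected tokens: {tokens}")
    return tokens[0], tokens[2], tokens[4][1:-1]


def decompose(text):
    """Build one sub-query per subgraph, rewritten into that subgraph's metadata."""
    vtype, attr, value = parse(lex(text))                   # unified-view metadata
    sub_queries = {}
    for graph, local_type, attr_map in ASSOCIATION[vtype]:  # per-subgraph metadata
        local_attr = attr_map.get(attr, attr)
        sub_queries[graph] = f"{local_type}({local_attr}='{value}')"
    return sub_queries
```

For example, `decompose("User(name='alice')")` yields `{"GraphA": "Person(name='alice')", "GraphB": "User(name='alice')"}`.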
In a preferred embodiment, merging the plurality of sub-query results according to the association relationship further includes:
traversing each node in a first sub-query result and, if according to the association relationship a traversed first node has an associated node, traversing the second sub-query result in which the associated node is located;
and, if a second node traversed in the second sub-query result has the same type as the first node, further determining whether the corresponding attributes of the first node and the second node are equal and, if so, fusing the first node and the second node.
In a preferred embodiment, fusing the first node and the second node further includes:
linking the incoming and outgoing edges of the second node to the first node;
adding the attribute information of the second node to the attributes of the first node.
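A minimal sketch of node fusion, assuming each sub-query result is a dict of nodes (`id -> (type, attributes)`) plus a list of `(source, label, target)` edges, and that node ids are already unique across subgraphs. `SubResult` and `ASSOC` are hypothetical names. When both nodes carry the same attribute, this sketch keeps the first node's value, which is one possible policy the description does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class SubResult:
    """Hypothetical sub-query result: nodes by id -> (type, attrs), plus (src, label, dst) edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)


# Hypothetical association: (type in first result, type in second result, attr in first, attr in second)
ASSOC = [("Person", "User", "name", "name")]


def fuse(first: SubResult, second: SubResult) -> SubResult:
    """Node fusion: merge nodes of `second` that match a node of `first` into that node."""
    out = SubResult(dict(first.nodes), list(first.edges))
    replaced = {}  # id of a fused second-result node -> id of the first-result node it joined
    for id1, (t1, a1) in first.nodes.items():
        for ta, tb, ka, kb in ASSOC:
            if t1 != ta or a1.get(ka) is None:
                continue
            for id2, (t2, a2) in second.nodes.items():
                if t2 == tb and a1[ka] == a2.get(kb):
                    replaced[id2] = id1
                    # Attribute fusion: add the second node's attributes (first node wins on conflict).
                    out.nodes[id1] = (t1, {**a2, **out.nodes[id1][1]})
    # Keep second-result nodes that were not fused.
    for id2, node in second.nodes.items():
        if id2 not in replaced:
            out.nodes[id2] = node
    # Relationship fusion: re-point the second result's incoming and outgoing edges.
    for src, label, dst in second.edges:
        out.edges.append((replaced.get(src, src), label, replaced.get(dst, dst)))
    return out
```

With GraphA's `Person` alice and GraphB's `User` alice, the fused result holds one `Person` node carrying both nodes' attributes, and GraphB's `like` edge now starts from that node.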
In a preferred embodiment, merging the plurality of sub-query results according to the association relationship further includes:
traversing each node in a third sub-query result and, if according to the association relationship a traversed third node has an associated node, traversing the fourth sub-query result in which the associated node is located;
if a fourth node traversed in the fourth sub-query result has the same type as the third node, further determining whether the corresponding attributes of the third node and the fourth node are equal and, if so, constructing a pair of equivalence edges between the third node and the fourth node.
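Using the same kind of representation (nodes as `id -> (type, attributes)`, edges as `(source, label, target)` tuples), the equivalence-edge variant can be sketched as follows. The edge label `equivalent_to` is hypothetical, and "a pair" is read here as one edge in each direction, which is an assumption. Unlike node fusion, no existing node or edge is rewritten; matches only append edges.

```python
# Hypothetical association: (type in third result, type in fourth result, attr in third, attr in fourth)
ASSOC = [("Person", "User", "name", "name")]
EQUIV_LABEL = "equivalent_to"  # hypothetical label for equivalence edges


def link_equivalent(third_nodes, third_edges, fourth_nodes, fourth_edges):
    """Union two sub-query results and add a pair of equivalence edges between matching nodes.

    Nodes are dicts of id -> (type, attrs); edges are lists of (src, label, dst).
    Existing nodes and edges are left untouched; matches only append new edges.
    """
    nodes = {**third_nodes, **fourth_nodes}
    edges = list(third_edges) + list(fourth_edges)
    for id3, (t3, a3) in third_nodes.items():
        for ta, tb, ka, kb in ASSOC:
            if t3 != ta or a3.get(ka) is None:
                continue
            for id4, (t4, a4) in fourth_nodes.items():
                if t4 == tb and a3[ka] == a4.get(kb):
                    edges.append((id3, EQUIV_LABEL, id4))  # one edge in each direction
                    edges.append((id4, EQUIV_LABEL, id3))
    return nodes, edges
```

Because the merge only appends edges, it touches less data than rewriting node attributes and re-pointing edges.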
The application also discloses a data query system for a heterogeneous graph database, comprising:
a metadata center, which stores the association relationship between the metadata in the unified view and the metadata in a plurality of subgraphs of the heterogeneous graph database;
a parsing and routing layer, configured to decompose a unified query request conforming to the metadata in the unified view into a plurality of sub-queries corresponding to different subgraphs according to the association relationship, wherein the sub-queries conform to the metadata in the corresponding subgraphs;
an engine execution layer, configured to execute each sub-query in its corresponding subgraph to obtain a plurality of sub-query results, wherein the sub-query results conform to the metadata in the corresponding subgraphs;
and a data assembly layer, configured to merge the plurality of sub-query results according to the association relationship to obtain a query result conforming to the metadata in the unified view.
In a preferred example, the metadata center further stores adaptation rules from the data structures in the plurality of subgraphs to the unified graph data structure;
the system further comprises a data adaptation layer, configured to convert, according to the adaptation rules, the plurality of sub-query results output by the engine execution layer into the unified graph data structure conforming to the metadata in the unified view, for use by the data assembly layer.
In a preferred embodiment, the parsing and routing layer routes the different sub-queries to different execution engines of the engine execution layer, each of which executes its query operation.
In a preferred embodiment, the execution engine includes one or any combination of the following:
one or more graph engines, one or more relational database engines, one or more non-relational database engines other than the graph engines.
In a preferred embodiment, the metadata includes one or any combination of the following: basic graph information, node definitions, edge definitions and attribute definitions.
In a preferred example, the parsing and routing layer constructs the sub-queries by:
obtaining the metadata in the unified view that is involved in the unified query request through lexical and syntactic analysis of the request;
obtaining, according to the association relationship, the metadata in each subgraph that corresponds to that unified-view metadata;
and constructing a sub-query for each subgraph from the unified query request and the metadata of that subgraph.
In a preferred example, the data assembly layer merges the sub-query results by:
traversing each node in a first sub-query result and, if according to the association relationship a traversed first node has an associated node, traversing the second sub-query result in which the associated node is located;
and, if a second node traversed in the second sub-query result has the same type as the first node, further determining whether the corresponding attributes of the first node and the second node are equal and, if so, fusing the first node and the second node.
In a preferred embodiment, fusing the first node and the second node further includes:
linking the incoming and outgoing edges of the second node to the first node;
adding the attribute information of the second node to the attributes of the first node.
In a preferred example, the data assembly layer merges the sub-query results by:
traversing each node in a third sub-query result and, if according to the association relationship a traversed third node has an associated node, traversing the fourth sub-query result in which the associated node is located;
if a fourth node traversed in the fourth sub-query result has the same type as the third node, further determining whether the corresponding attributes of the third node and the fourth node are equal and, if so, constructing a pair of equivalence edges between the third node and the fourth node.
The application also discloses a data query system for a heterogeneous graph database, comprising:
a memory for storing computer-executable instructions; and
a processor, coupled with the memory, for implementing the steps of the method described above when executing the computer-executable instructions.
The present application also discloses a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the steps in the method as described above.
In the implementations of this specification, the information of each subgraph is defined through a unified graph data view, and data queries are issued against that view. This hides the complexity of the heterogeneous graph database and returns a unified graph data query result: the user does not need to know the details of each subgraph, querying becomes simpler and faster, database development cost is reduced, and development efficiency is improved.
Merging the subgraph query results through node fusion based on the association relationship yields results that exactly match the user's query on the unified view, so the complexity of the heterogeneous database is completely hidden from the user.
Merging the subgraph query results by creating new equivalence edges is efficient and requires the fewest data operations.
This specification describes a large number of technical features distributed across various technical solutions; listing every possible combination of them (that is, every technical solution) would make the specification excessively long. To avoid this, the technical features disclosed in the above summary, in the embodiments and examples below, and in the drawings may be freely combined with one another to form new technical solutions (all of which should be regarded as described in this specification), unless such a combination is technically impossible. For example, suppose one example discloses features A + B + C and another discloses features A + B + D + E, where C and D are equivalent technical means for the same purpose that can only be used alternatively, not simultaneously, and E can technically be combined with C. Then the solution A + B + C + D should not be regarded as described, because it is technically infeasible, whereas the solution A + B + C + E should be regarded as described.
Drawings
FIG. 1 is a flow chart illustrating a method for querying data from a heterogeneous database according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a data query system for a heterogeneous database according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a data query flow for a heterogeneous database in one embodiment of the present description;
FIG. 4 shows two subgraphs and their schemas in an embodiment of the present description;
FIG. 5 is an example of merging subgraphs by fusing nodes in one embodiment of the present description;
FIG. 6 is an example of merging subgraphs by creating a pair of equivalence edges in one embodiment of the present description.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application may be implemented without these technical details and with various changes and modifications based on the following embodiments.
Explanation of some concepts:
Schema: metadata, i.e., information used to describe database objects.
Embodiments of the present description will be described in further detail below with reference to the accompanying drawings.
A first embodiment of the present specification relates to a data query method for a heterogeneous database, the flow of which is shown in FIG. 1. The method includes the following steps:
In step 102, the association relationship between the metadata in the unified view and the metadata in the multiple subgraphs of the heterogeneous database is preset. The association relationship records how the same data corresponds between the subgraphs and the unified view. Optionally, in an embodiment, adaptation rules from the data structures in the multiple subgraphs to the unified graph data structure may also be preset. Optionally, the metadata comprises one or any combination of the following: basic graph information, node definitions, edge definitions and attribute definitions.
In step 104, a unified query request conforming to the metadata in the unified view is obtained and decomposed, according to the association relationship, into a plurality of sub-queries corresponding to different subgraphs. Optionally, in an embodiment, this step further includes: obtaining the metadata in the unified view that is involved in the unified query request through lexical and syntactic analysis of the request; obtaining, according to the association relationship, the metadata in each subgraph that corresponds to that unified-view metadata; and constructing a sub-query for each subgraph from the unified query request and the metadata of that subgraph.
Step 106 then follows: the sub-queries are executed in their corresponding subgraphs to obtain a plurality of sub-query results, wherein the sub-queries and the sub-query results conform to the metadata in the corresponding subgraphs. Optionally, the different sub-queries are routed to different execution engines, each of which performs its query operation. The execution engine may be a single engine or a hybrid engine; for example, it may be one or more graph engines, one or more relational database engines, or one or more graph engines together with one or more non-relational database engines other than graph engines. Each independent query execution engine should be able to recognize sub-query components, execute them, and recognize their results.
Step 108 then follows: the plurality of sub-query results are converted, according to the adaptation rules, into the unified graph data structure conforming to the metadata in the unified view. This step is optional; converting the sub-query results into a common structure makes the subsequent merge easier to perform.
Step 110 then follows: the plurality of sub-query results are merged according to the association relationship to obtain the query result conforming to the metadata in the unified view.
Steps 104 to 110 are performed each time a graph query request is received, whereas step 102 is a preset step that does not need to be repeated for each graph query request.
By establishing association relationships between different subgraphs, a unified graph data view is defined. This hides the details of how the underlying graph data is distributed across different subgraphs and different graph storage engines, and makes it possible to query the data as a single unified graph.
There are a number of ways in which the merging of the sub-query results in step 110 may be accomplished.
In one embodiment, node fusion is used. Specifically, step 110 further comprises: traversing each node in a first sub-query result (i.e., one of the plurality of sub-query results) and, if according to the association relationship a traversed first node has an associated node, traversing the second sub-query result in which the associated node is located. If a second node traversed in the second sub-query result has the same type as the first node, it is further determined whether the corresponding attributes of the first node and the second node are equal and, if so, the two nodes are fused. Fusing the first node and the second node further includes linking the incoming and outgoing edges of the second node to the first node (relationship fusion) and adding the attribute information of the second node to the attributes of the first node (attribute fusion).
In another embodiment, a pair of equivalence edges is added. Specifically, step 110 further comprises: traversing each node in a third sub-query result (i.e., one of the plurality of sub-query results) and, if according to the association relationship a traversed third node has an associated node, traversing the fourth sub-query result in which the associated node is located. If a fourth node traversed in the fourth sub-query result has the same type as the third node, it is further determined whether the corresponding attributes of the third node and the fourth node are equal and, if so, a pair of equivalence edges is constructed between the third node and the fourth node.
The unified view may be implemented in different ways. For example, the metadata of all subgraphs may be collected together, keeping only one item out of each set of equivalent metadata items across subgraphs, with the correspondence between each removed item and the retained item recorded in the association relationship. Alternatively, each item of metadata in the unified view may be defined fully and independently, and the association relationship then maps the unified-view metadata to the metadata of each subgraph.
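The first approach (keep one item from each set of equivalent metadata and record the removed items in the association relationship) might be sketched as below. The qualified-name format `Subgraph.Type`, the shape of the `equivalences` input, and the rule of retaining the alphabetically first name are illustrative assumptions.

```python
def build_unified_view(subgraph_types, equivalences):
    """Collect node types from all subgraphs, keeping one item per equivalence group.

    subgraph_types: {subgraph: [local node type, ...]}
    equivalences:   list of sets of qualified names known to be equivalent,
                    e.g. {"GraphA.Person", "GraphB.User"} (hypothetical input format)
    Returns (unified type list, association {removed name -> retained name}).
    """
    association = {}
    for group in equivalences:
        retained, *removed = sorted(group)   # keep the alphabetically first name
        for name in removed:
            association[name] = retained
    unified = []
    for graph, types in subgraph_types.items():
        for node_type in types:
            name = f"{graph}.{node_type}"
            if name not in association:      # removed items appear only in the association
                unified.append(name)
    return unified, association
```

For the two subgraphs of FIG. 4 with `GraphA.Person` and `GraphB.User` declared equivalent, the unified view keeps four node types and the association maps `GraphB.User` to `GraphA.Person`.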
A second embodiment of the present specification relates to a data query system for a heterogeneous database, whose structure is shown in FIG. 2. The system includes:
A metadata center, which stores the association relationship between the metadata in the unified view and the metadata in the plurality of subgraphs of the heterogeneous database. Optionally, the metadata comprises one or any combination of the following: basic graph information, node definitions, edge definitions and attribute definitions. Optionally, the metadata center further stores adaptation rules from the data structures in the multiple subgraphs to the unified graph data structure.
A parsing and routing layer, configured to decompose a unified query request conforming to the metadata in the unified view into a plurality of sub-queries corresponding to different subgraphs according to the association relationship, wherein the sub-queries conform to the metadata in the corresponding subgraphs. Optionally, the parsing and routing layer routes the different sub-queries to different execution engines of the engine execution layer, each of which executes its query operation.
An engine execution layer, configured to execute each sub-query in its corresponding subgraph to obtain a plurality of sub-query results, wherein the sub-query results conform to the metadata in the corresponding subgraphs. Optionally, the execution engines comprise one or any combination of the following: one or more graph engines, one or more relational database engines, one or more non-relational database engines other than graph engines.
A data adaptation layer, configured to convert, according to the adaptation rules, the plurality of sub-query results output by the engine execution layer into the unified graph data structure conforming to the metadata in the unified view, for use by the data assembly layer. The data adaptation layer is optional.
A data assembly layer, configured to merge the plurality of sub-query results according to the association relationship to obtain the query result conforming to the metadata in the unified view.
Optionally, in one embodiment, the parsing and routing layer constructs the sub-queries as follows: it obtains the metadata in the unified view that is involved in the unified query request through lexical and syntactic analysis of the request; obtains, according to the association relationship, the metadata in each subgraph that corresponds to that unified-view metadata; and constructs a sub-query for each subgraph from the unified query request and the metadata of that subgraph.
There are various ways for the data assembly layer to merge multiple sub-query results.
In one embodiment, node fusion is used. Specifically, the data assembly layer traverses each node in a first sub-query result and, if according to the association relationship a traversed first node has an associated node, traverses the second sub-query result in which the associated node is located. If a second node traversed in the second sub-query result has the same type as the first node, it further determines whether the corresponding attributes of the first node and the second node are equal and, if so, fuses the two nodes. Fusing the first node and the second node further includes linking the incoming and outgoing edges of the second node to the first node (relationship fusion) and adding the attribute information of the second node to the attributes of the first node (attribute fusion).
In another embodiment, a pair of equivalence edges is added. Specifically, the data assembly layer traverses each node in a third sub-query result and, if according to the association relationship a traversed third node has an associated node, traverses the fourth sub-query result in which the associated node is located. If a fourth node traversed in the fourth sub-query result has the same type as the third node, it further determines whether the corresponding attributes of the third node and the fourth node are equal and, if so, constructs a pair of equivalence edges between the third node and the fourth node.
The first embodiment is a method embodiment corresponding to the present embodiment, and the technical details in the first embodiment may be applied to the present embodiment, and the technical details in the present embodiment may also be applied to the first embodiment.
In order to better understand the technical solutions of the present description, the following description is given with reference to a specific example, in which the listed details are mainly for the sake of understanding, and are not intended to limit the scope of the present application.
FIG. 3 illustrates a data query flow of a heterogeneous database, in one embodiment.
First, a user initiates a graph data query request through a graphical interface. The graph data fields, filter conditions, and return results in the request should satisfy the Schema (metadata) definition in the unified graph data view (referred to as "unified view").
After receiving the graph query request, the parsing and routing layer uses lexical and syntactic analysis, together with the Schema association relationship obtained from the metadata center (which includes, for example, the graph data Schema definitions), to decompose the request into a plurality of sub-query components. Each sub-query component is routed to an execution engine according to its type. The execution engine may be a single engine or a hybrid engine.
Next, each execution engine in the engine execution layer that receives a sub-query component performs the query in its corresponding subgraph and outputs the query result of that sub-query component.
Next, the data adaptation layer receives the query result of each sub-query component, parses the raw result, and converts it into the unified graph data structure using the Schema adaptation rules obtained from the metadata center.
Next, the data assembly layer receives the adapted sub-query results output by the data adaptation layer and assembles the unified result using the Schema association relationship obtained from the metadata center, for example by merging multiple subgraphs, deduplicating repeated nodes and edges, and establishing new graph associations. Finally, a unified graph query result is returned.
The individual blocks of FIG. 3 are explained below.
Metadata center: responsible for unified graph view management. It provides the Schema definitions of the different subgraphs, the association relationships between subgraphs, the adaptation rules from each subgraph to the unified result layer, and so on.
The metadata center stores the configuration information used in each step of a unified graph query. The main configuration information is:
a) Schema information of each subgraph, including basic graph information, node/edge definitions, attribute definitions and so on;
b) adaptation rules for converting the results returned by each subgraph into the standard Schema;
c) association relationships between different subgraphs.
To aid understanding, FIG. 4 shows a simplified example with two subgraphs, GraphA and GraphB. GraphA contains two node types, Person and Card, and one edge type, has_card; GraphB contains three node types, User, Food and Book, and two edge types, like and has_read. The Schema information of a subgraph should include the node types, the node attribute information, the start node type of each edge, the target node type of each edge, the edge type, and so on. The Schema information of the has_card edge can be defined in Json format as follows; the Schema definitions of the other nodes and edges are similar.
[Figure: Json Schema definition of the has_card edge (image not reproduced)]
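Because the original Json figure is not reproduced, the following Python dict is only an illustration of the kind of Schema information listed above for GraphA; every field name, and the `card_no` attribute, is hypothetical. The helper checks one consistency rule that follows from the listed fields: each edge's start and target node types must be defined nodes.

```python
import json

# Illustrative Schema for GraphA; every field name here (and the card_no attribute) is hypothetical.
GRAPH_A_SCHEMA = {
    "graph": "GraphA",
    "vertices": {
        "Person": {"attributes": {"name": "string"}},
        "Card": {"attributes": {"card_no": "string"}},
    },
    "edges": {
        "has_card": {"source": "Person", "target": "Card", "attributes": {}},
    },
}


def check_schema(schema):
    """Return a list of problems: edges whose start or target node type is not defined."""
    problems = []
    for edge_type, spec in schema["edges"].items():
        for end in ("source", "target"):
            if spec[end] not in schema["vertices"]:
                problems.append(f"{edge_type}.{end}: undefined node type {spec[end]!r}")
    return problems


def edge_schema_json(schema, edge_type):
    """Serialize one edge definition as Json, naming the fields listed in the text."""
    spec = schema["edges"][edge_type]
    return json.dumps({
        "edge_type": edge_type,
        "start_node_type": spec["source"],
        "target_node_type": spec["target"],
        "attributes": spec["attributes"],
    })
```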
In this example, the Person node in GraphA and the User node in GraphB are the same type of node (both represent "users"), so a direct association relationship between the two subgraphs can be defined as:
GraphA.Person=GraphB.User
GraphA.Person.name=GraphB.User.name
The meaning of this association relation is: if a Person node in GraphA and a User node in GraphB have equal values of the name attribute, the two nodes can be regarded as the same node. An association relation consists mainly of two parts, a type mapping and an attribute mapping. For simplicity, the example above defines only one attribute mapping (name), but more mapped attributes are possible. Once the association relation is defined, the topologies of GraphA and GraphB are fused into a larger network topology, and the relations in the graph data become richer.
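A minimal sketch of how the type mapping and attribute mapping of an association relation might be represented and checked. `Association`, `PERSON_USER` and `same_node` are hypothetical names, and node attributes are assumed to be plain dictionaries.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Association:
    """Hypothetical association relation: a type mapping plus attribute mappings."""
    graph_a: str
    type_a: str
    graph_b: str
    type_b: str
    attr_map: Tuple[Tuple[str, str], ...]  # (attribute on type_a, attribute on type_b)


# GraphA.Person = GraphB.User and GraphA.Person.name = GraphB.User.name
PERSON_USER = Association("GraphA", "Person", "GraphB", "User", (("name", "name"),))


def same_node(assoc: Association, type_a: str, attrs_a: dict,
              type_b: str, attrs_b: dict) -> bool:
    """True if the two nodes satisfy both the type and the attribute mapping."""
    if (type_a, type_b) != (assoc.type_a, assoc.type_b):
        return False  # type mapping does not apply
    for attr_a, attr_b in assoc.attr_map:
        # Every mapped attribute must be present on both nodes and equal.
        if attr_a not in attrs_a or attr_b not in attrs_b:
            return False
        if attrs_a[attr_a] != attrs_b[attr_b]:
            return False
    return True
```

`same_node(PERSON_USER, "Person", {"name": "xxx"}, "User", {"name": "xxx"})` returns `True`; changing either name makes it return `False`. Requiring each mapped attribute to be present avoids treating two nodes that both lack the attribute as the same node.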
By defining the association relations between subgraphs, the subgraphs form a unified graph data view (i.e., the unified view), and user query requests can logically be expanded over this larger view.
The parsing and routing layer parses the unified query statement into a plurality of sub-query components, using syntactic and lexical analysis together with the configuration information (including the association relations) in the metadata center, and routes the sub-query components to different graph query engines. For example, if the parsed request involves the node Person, the request must be decomposed, according to the configuration information in the "metadata center", into two sub-query components, one for Person and one for User. For an original graph query request RawRequest, the detailed decomposition process is as follows:
1. Parse the original graph query request to obtain the node type VertexTypeA involved in the request; parsing may involve syntactic and lexical analysis;
2. Request the inter-subgraph association configuration SubGraphConfig from the metadata configuration center;
3. Obtain the node type VertexTypeB associated with VertexTypeA through SubGraphConfig:
a) if VertexTypeB exists, construct sub-query requests RequestA and RequestB from the configuration information of VertexTypeA and VertexTypeB and the condition information in the original request RawRequest; these two requests are the final query requests;
b) if VertexTypeB does not exist, the final query request is RawRequest itself;
4. Pass the requests generated in the previous step to the next module for processing.
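The four decomposition steps can be sketched in Python as below. This is a hypothetical illustration: `Request`, `SUBGRAPH_CONFIG` and `TYPE_TO_GRAPH` stand in for the request format and the metadata-center configuration, and parsing (step 1) is assumed to have already produced the node type.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical request; fields mirror vertexType, queryType, degree, Condition."""
    vertex_type: str
    query_type: str
    degree: int
    condition: Optional[str] = None
    graph: Optional[str] = None  # target subgraph, set during decomposition


# Hypothetical SubGraphConfig: (subgraph, node type) -> associated (subgraph, node type).
SUBGRAPH_CONFIG: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("GraphA", "Person"): ("GraphB", "User"),
    ("GraphB", "User"): ("GraphA", "Person"),
}
# Hypothetical lookup of the subgraph that defines each node type (fig. 4).
TYPE_TO_GRAPH = {"Person": "GraphA", "Card": "GraphA", "User": "GraphB",
                 "Food": "GraphB", "Book": "GraphB"}


def decompose(raw: Request) -> List[Request]:
    # Step 1: VertexTypeA, assumed to be already extracted by the parser.
    type_a = raw.vertex_type
    graph_a = TYPE_TO_GRAPH.get(type_a)
    # Steps 2-3: look up the associated node type VertexTypeB in SubGraphConfig.
    associated = SUBGRAPH_CONFIG.get((graph_a, type_a))
    if associated is None:
        # Step 3b: no VertexTypeB, so the final request is RawRequest itself.
        return [raw]
    graph_b, type_b = associated
    # Step 3a: RequestA and RequestB keep the original query type, degree and condition.
    request_a = replace(raw, graph=graph_a)
    request_b = replace(raw, vertex_type=type_b, graph=graph_b)
    # Step 4: hand the final requests to the next module.
    return [request_a, request_b]
```

With this configuration, `decompose(Request("User", "Navigation", 1))` yields one request for User on GraphB and one for Person on GraphA, while a type with no association (for example Food) is returned unchanged.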
Taking the two subgraphs GraphA and GraphB defined above as an example, suppose the first-degree relations of the User node are to be queried. The original request is: Request(vertexType=User, queryType=Navigation, degree=1, Condition=…). The parsing and routing layer first determines that the node type in the query is User, and then obtains the associated node type Person from the inter-subgraph association relation GraphA.Person=GraphB.User. The final sub-query components are:
1. Request(vertexType=User, queryType=Navigation, degree=1, Condition=…)
2. Request(vertexType=Person, queryType=Navigation, degree=1, Condition=…)
After the parsing and routing layer decomposes the query request, it must also identify the graph execution engine corresponding to each sub-query component, so that the execution result of each sub-query component can be obtained from the appropriate graph execution engine.
The engine execution layer parses the sub-query components passed from the routing layer and returns their query results. It consists of a series of query execution engines, which may be one or more graph engines, one or more relational database engines, or one or more graph engines together with one or more non-relational database engines other than graph engines. Each independent query execution engine should be able to recognize sub-query components, execute them, and identify their results.
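The routing step and the engine execution layer can be sketched with a common engine interface. In this hypothetical example, `QueryEngine`, the toy in-memory graph engine and the SQLite-backed relational engine are assumptions made for illustration; they show that engines of different kinds can serve sub-queries as long as each can execute a sub-query component and return its results. Each sub-query is assumed to carry the name of its target subgraph.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class QueryEngine(ABC):
    """Hypothetical interface assumed for every query execution engine."""

    @abstractmethod
    def execute(self, sub_query: dict) -> list:
        """Execute one sub-query component and return its raw result rows."""


class InMemoryGraphEngine(QueryEngine):
    """Toy graph engine: returns the 1-degree out-edges of a start vertex."""

    def __init__(self, edges: List[dict]):
        self.edges = edges  # raw edges in this engine's own dict format

    def execute(self, sub_query: dict) -> list:
        return [e for e in self.edges if e["from"] == sub_query["start_id"]]


class SqliteEngine(QueryEngine):
    """Toy relational engine: the same query expressed as SQL over an edge table."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def execute(self, sub_query: dict) -> list:
        cur = self.conn.execute("SELECT src, type, dst FROM edges WHERE src = ?",
                                (sub_query["start_id"],))
        return cur.fetchall()  # list of (src, type, dst) tuples


def route_and_execute(sub_queries: List[dict],
                      engines: Dict[str, QueryEngine]) -> List[Tuple[str, list]]:
    """Send each sub-query to the engine registered for its subgraph."""
    return [(sq["graph"], engines[sq["graph"]].execute(sq)) for sq in sub_queries]


def demo() -> List[Tuple[str, list]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, type TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                     [("u1", "like", "f1"), ("u2", "has_read", "b1")])
    engines = {
        "GraphA": InMemoryGraphEngine([{"from": "p1", "label": "has_card", "to": "c1"}]),
        "GraphB": SqliteEngine(conn),
    }
    sub_queries = [{"graph": "GraphA", "start_id": "p1"},
                   {"graph": "GraphB", "start_id": "u1"}]
    return route_and_execute(sub_queries, engines)
```

`demo()` sends one sub-query to each engine and gets back raw rows in each engine's own format: dicts from the graph engine and tuples from SQLite. Reconciling those formats is the job of the data adaptation layer.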
The data adaptation layer parses the query results of the sub-query components and, using the configuration information in the metadata center, converts them into the standard graph data Schema. Its main purpose is to resolve the inconsistency between the results returned by different query execution engines. The data adaptation layer first receives the sub-query results and parses the raw result information, then obtains the data adaptation rules from the metadata center, and finally converts the parsed raw results into standard point, edge, path and attribute field models based on those rules.
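A minimal sketch of the data adaptation step, restricted to edges for brevity. The standard `Edge` model and the per-subgraph field-mapping rules in `ADAPTATION_RULES` are hypothetical stand-ins for the adaptation rules stored in the metadata center; here one engine is assumed to return dict rows and the other tuple rows.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Edge:
    """Standard edge model of the unified view (hypothetical field names)."""
    src: str
    edge_type: str
    dst: str


# Hypothetical adaptation rules: for each subgraph, which raw field
# (dict key or tuple index) supplies each standard edge field.
ADAPTATION_RULES: Dict[str, Dict[str, Union[str, int]]] = {
    "GraphA": {"src": "from", "edge_type": "label", "dst": "to"},  # dict rows
    "GraphB": {"src": 0, "edge_type": 1, "dst": 2},                # tuple rows
}


def adapt(graph: str, raw_rows: list) -> List[Edge]:
    """Convert one engine's raw result rows into standard Edge objects."""
    rule = ADAPTATION_RULES[graph]
    return [Edge(src=str(row[rule["src"]]),
                 edge_type=str(row[rule["edge_type"]]),
                 dst=str(row[rule["dst"]]))
            for row in raw_rows]
```

Keeping the mapping as data rather than code means a new engine only needs a new rule entry in the metadata center, not a change to the adaptation logic.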
The data assembly layer combines and fuses the adapted results of different sub-queries according to the inter-subgraph association relations in the metadata center, and finally returns a unified graph query result. It receives the adapted sub-query results and, according to the association relations between the subgraphs under the view, performs deduplication (removing duplicate points and edges), fusion (merging points/edges of the same type), and reconstruction (creating new point and edge relations).
Combining different subgraphs into one graph through the subgraph association relations defined in the metadata center is called "fusion". There are two main modes: point fusion and peer-edge establishment.
The point fusion process is as follows:
1. Traverse each node in subgraph A; the currently traversed node is VertexA, and its type is VertexTypeA;
2. If VertexA has an associated node (the association relation is obtained from the metadata center), traverse all nodes in subgraph B; the currently traversed node in subgraph B is VertexB:
2.1 if the type VertexTypeB of VertexB is not equal to VertexTypeA, continue the iteration; otherwise, proceed with the following steps;
2.2 using the attribute mapping information in the association relation, determine whether the corresponding attributes of VertexA and VertexB are equal:
2.2.1 if they are not equal, return to step 2 and continue;
2.2.2 if they are equal, fuse VertexA and VertexB. Fusion consists of two steps, attribute fusion and relation fusion: relation fusion links the incoming and outgoing edges of VertexB to VertexA, and attribute fusion adds the attribute information of VertexB to the attributes of VertexA;
2.3 continue the iteration of step 2.
3. If VertexA has no associated node, continue the iteration of step 1.
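The point fusion steps can be sketched as follows. Assumptions, all hypothetical: adapted results are held as a vertex dictionary plus an edge list, vertex ids are unique across subgraphs, step 2.1 is read as comparing VertexTypeB with the type that the association relation maps VertexTypeA to (Person to User in fig. 4), and each VertexB is fused into at most one VertexA. The "age" attribute in the example data is invented for illustration.

```python
from typing import Dict, List, Tuple

Vertices = Dict[str, dict]          # vertex id -> {"type": str, "attrs": dict}
Edges = List[Tuple[str, str, str]]  # (src id, edge type, dst id)

# Hypothetical association table: VertexTypeA -> (associated type, attribute map).
ASSOCIATIONS = {"Person": ("User", [("name", "name")])}


def attrs_match(a: dict, b: dict, attr_map) -> bool:
    """Step 2.2: every mapped attribute must be present on both nodes and equal."""
    return all(x in a["attrs"] and y in b["attrs"] and a["attrs"][x] == b["attrs"][y]
               for x, y in attr_map)


def point_fusion(va: Vertices, ea: Edges, vb: Vertices, eb: Edges):
    """Fuse the nodes of subgraph B into matching nodes of subgraph A."""
    merged = {vid: {"type": v["type"], "attrs": dict(v["attrs"])} for vid, v in va.items()}
    fused_into = {}  # VertexB id -> id of the VertexA it was fused into
    for a_id, a in va.items():                              # step 1
        assoc = ASSOCIATIONS.get(a["type"])
        if assoc is None:                                   # step 3: no associated node
            continue
        type_b, attr_map = assoc
        for b_id, b in vb.items():                          # step 2
            if b["type"] != type_b or b_id in fused_into:   # step 2.1
                continue
            if not attrs_match(a, b, attr_map):             # step 2.2.1
                continue
            # Step 2.2.2, attribute fusion: add VertexB's attributes to VertexA;
            # on a key clash the value already on VertexA is kept.
            for key, value in b["attrs"].items():
                merged[a_id]["attrs"].setdefault(key, value)
            fused_into[b_id] = a_id
    # Nodes of subgraph B that were not fused are carried over unchanged.
    for b_id, b in vb.items():
        if b_id not in fused_into:
            merged[b_id] = {"type": b["type"], "attrs": dict(b["attrs"])}
    # Step 2.2.2, relation fusion: relink in/out edges of each fused VertexB to VertexA.
    edges = list(ea) + [(fused_into.get(s, s), t, fused_into.get(d, d)) for s, t, d in eb]
    return merged, edges


# Example in the spirit of fig. 5: Person and User share the name "xxx".
EXAMPLE_A = ({"a:p1": {"type": "Person", "attrs": {"name": "xxx"}},
              "a:c1": {"type": "Card", "attrs": {}}},
             [("a:p1", "has_card", "a:c1")])
EXAMPLE_B = ({"b:u1": {"type": "User", "attrs": {"name": "xxx", "age": 30}},
              "b:f1": {"type": "Food", "attrs": {}}},
             [("b:u1", "like", "b:f1")])
```

For the example data, `point_fusion(*EXAMPLE_A, *EXAMPLE_B)` removes the User node b:u1, gives the Person node a:p1 both the name and age attributes, and relinks the like edge so that it starts at a:p1.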
As shown in fig. 5, the nodes Person and User in subgraphs GraphA and GraphB of fig. 4 have an association relation, and both have the name attribute xxx, so the two nodes can be fused; after fusion, the edge relations of the node Person in subgraph A are automatically linked to the node User in subgraph B.
Peer-edge establishment between two subgraphs follows the same main steps as point fusion, with one difference: after two peer nodes are found, the subgraphs are linked by creating a new virtual edge between them, and both peer nodes are retained.
As shown in fig. 5, since the nodes Person and User in subgraphs GraphA and GraphB of fig. 4 have an association relation (with name attribute xxx), edge reconstruction can be performed, and a larger topology is formed by constructing a new peer edge between them.
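Peer-edge establishment reuses the same matching as point fusion but keeps both nodes and adds a virtual edge between them. In this sketch the edge label `peer`, the data layout, and the assumption that vertex ids are unique across subgraphs are all hypothetical.

```python
from typing import Dict, List, Tuple

Vertices = Dict[str, dict]          # vertex id -> {"type": str, "attrs": dict}
Edges = List[Tuple[str, str, str]]  # (src id, edge type, dst id)

# Hypothetical association table: VertexTypeA -> (associated type, attribute map).
ASSOCIATIONS = {"Person": ("User", [("name", "name")])}
PEER_EDGE_TYPE = "peer"  # hypothetical label for the new virtual edge


def attrs_match(a: dict, b: dict, attr_map) -> bool:
    """Every mapped attribute must be present on both nodes and equal."""
    return all(x in a["attrs"] and y in b["attrs"] and a["attrs"][x] == b["attrs"][y]
               for x, y in attr_map)


def peer_edge_fusion(va: Vertices, ea: Edges, vb: Vertices, eb: Edges):
    """Link subgraphs A and B with virtual peer edges; both peer nodes are retained."""
    vertices = {**va, **vb}  # vertex ids assumed unique across subgraphs
    edges = list(ea) + list(eb)
    for a_id, a in va.items():
        assoc = ASSOCIATIONS.get(a["type"])
        if assoc is None:
            continue  # no associated node type for this VertexA
        type_b, attr_map = assoc
        for b_id, b in vb.items():
            if b["type"] == type_b and attrs_match(a, b, attr_map):
                edges.append((a_id, PEER_EDGE_TYPE, b_id))  # new virtual edge
    return vertices, edges
```

Because neither node is removed, each subgraph's own topology stays intact and only the virtual edge connects them; point fusion instead yields a more compact graph at the cost of discarding the separate VertexB.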
It should be noted that, as those skilled in the art will understand, the functions of the modules shown in the above embodiment of the data query system for a heterogeneous database can be understood with reference to the description of the data query method for a heterogeneous database. These functions may be implemented by a program (executable instructions) running on a processor or by specific logic circuits. If the data query system for a heterogeneous database of the embodiments of the present disclosure is implemented as a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present specification may, in essence or in part, be embodied as a software product stored in a storage medium and including instructions that cause a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present specification. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present specification are not limited to any specific combination of hardware and software.
Accordingly, the embodiments of the present specification also provide a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the method embodiments of the present specification. Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technologies, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable storage media do not include transitory media such as modulated data signals and carrier waves.
In addition, the present specification also provides a data query system for a heterogeneous database, which includes a memory for storing computer executable instructions, and a processor; the processor is configured to implement the steps of the method embodiments described above when executing the computer-executable instructions in the memory.
In one embodiment, the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), or the like. The memory may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a hard disk, a solid-state drive, or the like. The steps of the method disclosed in the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. In one embodiment, the data query system for a heterogeneous database further includes a bus and a communication interface; the processor, the memory and the communication interface are interconnected through the bus. The communication interface may be wireless or wired and enables the processor to communicate with other devices.
It should be noted that, in this patent application, relational terms such as first and second are used only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element. In this patent application, if an action is said to be performed according to a certain element, this means that the action is performed according to at least that element, which covers two cases: performing the action based only on that element, and performing the action based on that element and other elements. Expressions such as "a plurality of" and "multiple" mean two or more.
All documents mentioned in this specification are considered to be incorporated in their entirety into the disclosure of this specification, so that they may serve as a basis for amendment where necessary. It should be understood that the above description is only a preferred embodiment of the present disclosure and is not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of one or more embodiments of the present disclosure shall fall within the scope of protection of one or more embodiments of the present disclosure.
In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Claims (20)

1. A data query method for a heterogeneous database, comprising the following steps:
presetting association relations between metadata in a unified view and metadata in a plurality of subgraphs of the heterogeneous graph database;
acquiring a unified query request that conforms to the metadata in the unified view, and decomposing the unified query request, according to the association relations, into a plurality of sub-queries corresponding to different subgraphs;
executing each of the sub-queries in the subgraph corresponding to it to obtain a plurality of sub-query results, wherein the sub-queries and the sub-query results conform to the metadata in the corresponding subgraphs;
and merging the plurality of sub-query results according to the association relations to obtain a query result conforming to the metadata in the unified view.
2. The data query method for a heterogeneous database according to claim 1, further comprising presetting adaptation rules from the data structures in the plurality of subgraphs to a data structure of a unified graph;
before merging the plurality of sub-query results according to the association relations, the method further comprises:
converting, according to the adaptation rules, the plurality of sub-query results into the unified graph data structure conforming to the metadata in the unified view.
3. The data query method for a heterogeneous database according to claim 1, wherein executing the sub-queries in the respective subgraphs corresponding to them further comprises:
routing different sub-queries to different execution engines, respectively, to execute the query operations.
4. The method for data query of a heterogeneous database according to claim 3, wherein said execution engine comprises one or any combination of:
one or more graph engines, one or more relational database engines, one or more non-relational database engines other than the graph engines.
5. The method for data query of a heterogeneous database according to claim 1, wherein said metadata comprises one or any combination of: graph basic information, point definition, edge definition and attribute definition.
6. The method for querying data in a heterogeneous database according to claim 1, wherein said decomposing said unified query request into a plurality of sub-queries corresponding to different sub-graphs according to said association further comprises:
obtaining metadata in the unified view related to the unified query request through syntactic analysis and lexical analysis of the unified query request;
obtaining, according to the association relations, the metadata in each subgraph corresponding to the metadata in the unified view;
and constructing a sub-query for each sub-graph according to the unified query request and the metadata in each sub-graph.
7. The method for querying data in a heterogeneous database according to any of claims 1-6, wherein said merging the sub-query results according to the association further comprises:
traversing each node in a first sub-query result and, if a traversed first node has an associated node according to the association relations, traversing a second sub-query result in which the associated node is located;
and if a second node traversed in the second sub-query result has the same type as the first node, further determining whether the corresponding attributes of the first node and the second node are equal and, if so, fusing the first node and the second node.
8. The data query method for a heterogeneous database according to claim 7, wherein fusing the first node and the second node further comprises:
linking an ingress edge of the second node to the first node;
adding the attribute information of the second node to the attribute of the first node.
9. The method for querying data in a heterogeneous database according to any of claims 1-6, wherein said merging the sub-query results according to the association further comprises:
traversing each node in a third sub-query result and, if a traversed third node has an associated node according to the association relations, traversing a fourth sub-query result in which the associated node is located;
if a fourth node traversed in the fourth sub-query result has the same type as the third node, further determining whether the corresponding attributes of the third node and the fourth node are equal and, if so, constructing a peer edge between the third node and the fourth node.
10. A data query system for a heterogeneous graph database, comprising:
a metadata center storing association relations between metadata in a unified view and metadata in a plurality of subgraphs of the heterogeneous database;
a parsing and routing layer for decomposing, according to the association relations, a unified query request conforming to the metadata in the unified view into a plurality of sub-queries corresponding to different subgraphs, wherein the sub-queries conform to the metadata in the corresponding subgraphs;
an engine execution layer for executing each of the sub-queries in the subgraph corresponding to it to obtain a plurality of sub-query results, wherein the sub-query results conform to the metadata in the corresponding subgraphs;
and a data assembly layer for merging the plurality of sub-query results according to the association relations to obtain a query result conforming to the metadata in the unified view.
11. The data query system for a heterogeneous database according to claim 10,
the metadata center also stores the adaptation rules from the data structures in the multiple subgraphs to the data structure of the unified graph;
the system further comprises a data adaptation layer for converting, according to the adaptation rules, the plurality of sub-query results output by the engine execution layer into a unified graph data structure conforming to the metadata in the unified view, for use by the data assembly layer.
12. The data query system for a heterogeneous database according to claim 10, wherein the parsing and routing layer routes different sub-queries to different execution engines of the engine execution layer, respectively, to execute the query operations.
13. The data query system for a heterogeneous database according to claim 12, wherein said execution engine comprises one or any combination of:
one or more graph engines, one or more relational database engines, one or more non-relational database engines other than the graph engines.
14. The data query system for a heterogeneous database according to claim 10, wherein said metadata comprises one or any combination of: graph basic information, point definition, edge definition and attribute definition.
15. The data query system for a heterogeneous database according to claim 10, wherein the parsing and routing layer constructs sub-queries by:
obtaining metadata in the unified view related to the unified query request through syntactic analysis and lexical analysis of the unified query request;
obtaining, according to the association relations, the metadata in each subgraph corresponding to the metadata in the unified view;
and constructing a sub-query for each sub-graph according to the unified query request and the metadata in each sub-graph.
16. The data query system for a heterogeneous database according to any of claims 10-15, wherein said data assembling layer merges said plurality of sub-query results by:
traversing each node in a first sub-query result and, if a traversed first node has an associated node according to the association relations, traversing a second sub-query result in which the associated node is located;
and if a second node traversed in the second sub-query result has the same type as the first node, further determining whether the corresponding attributes of the first node and the second node are equal and, if so, fusing the first node and the second node.
17. The data query system for a heterogeneous database according to claim 16, wherein fusing the first node and the second node further comprises:
linking an ingress edge of the second node to the first node;
adding the attribute information of the second node to the attribute of the first node.
18. The data query system for a heterogeneous database according to any of claims 10-15, wherein said data assembling layer merges said plurality of sub-query results by:
traversing each node in a third sub-query result and, if a traversed third node has an associated node according to the association relations, traversing a fourth sub-query result in which the associated node is located;
if a fourth node traversed in the fourth sub-query result has the same type as the third node, further determining whether the corresponding attributes of the third node and the fourth node are equal and, if so, constructing a peer edge between the third node and the fourth node.
19. A data query system for a heterogeneous graph database, comprising:
a memory for storing computer-executable instructions; and
a processor, coupled with the memory, for implementing the steps in the method of any of claims 1-9 when executing the computer-executable instructions.
20. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the steps in the method of any one of claims 1 to 9.
CN202010086983.0A 2020-02-11 2020-02-11 Data query method and system for heterogeneous graph database Active CN111339334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010086983.0A CN111339334B (en) 2020-02-11 2020-02-11 Data query method and system for heterogeneous graph database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010086983.0A CN111339334B (en) 2020-02-11 2020-02-11 Data query method and system for heterogeneous graph database

Publications (2)

Publication Number Publication Date
CN111339334A true CN111339334A (en) 2020-06-26
CN111339334B CN111339334B (en) 2023-04-07

Family

ID=71183346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010086983.0A Active CN111339334B (en) 2020-02-11 2020-02-11 Data query method and system for heterogeneous graph database

Country Status (1)

Country Link
CN (1) CN111339334B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163010A (en) * 2020-08-26 2021-01-01 浙江蓝卓工业互联网信息技术有限公司 Cross-data-source query method and device for database
CN112287179A (en) * 2020-06-30 2021-01-29 浙江好络维医疗技术有限公司 Patient identity matching method combining connection priority algorithm and graph database
CN112433999A (en) * 2020-11-05 2021-03-02 北京浪潮数据技术有限公司 Traversal method for Janus graph client and related components
CN112784119A (en) * 2021-01-14 2021-05-11 内蒙古蒙商消费金融股份有限公司 Data query and synchronization optimization method and device

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243595A1 (en) * 2001-09-28 2004-12-02 Zhan Cui Database management system
CN101436192A (en) * 2007-11-16 2009-05-20 国际商业机器公司 Method and apparatus for optimizing inquiry aiming at vertical storage type database
US20100169351A1 (en) * 2008-12-30 2010-07-01 International Business Machines Corporation Unifying hetrogenous data
CN102254012A (en) * 2011-07-19 2011-11-23 北京大学 Graph data storing method and subgraph enquiring method based on external memory
WO2014069983A2 (en) * 2012-11-01 2014-05-08 Mimos Berhad A system and method for distributed querying of linked semantic webs
CN105210058A (en) * 2012-12-14 2015-12-30 微软技术许可有限责任公司 Graph query processing using plurality of engines
US20160283511A1 (en) * 2015-03-24 2016-09-29 International Business Machines Corporation Systems and methods for query evaluation over distributed linked data stores
CN108108456A (en) * 2017-12-28 2018-06-01 重庆邮电大学 Metadata-based distributed query method for information resources
CN108959433A (en) * 2018-06-11 2018-12-07 北京大学 Method and system for extracting a knowledge graph and question answering from software project data
CN109492131A (en) * 2018-09-18 2019-03-19 华为技术有限公司 Graph data storage method and device
CN109791544A (en) * 2016-09-30 2019-05-21 微软技术许可有限责任公司 To analyzing when scheming the inquiry inquired across subgraph
US20190325075A1 (en) * 2018-04-18 2019-10-24 Oracle International Corporation Efficient, in-memory, relational representation for heterogeneous graphs
CN110389968A (en) * 2019-07-31 2019-10-29 中国工商银行股份有限公司 Aggregate query method, aggregate query device, equipment and medium
CN110609904A (en) * 2019-09-11 2019-12-24 深圳众赢维融科技有限公司 Graph database data processing method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
索勃: "Research on Key Technologies for Large-Scale Graph Data Processing and Analysis" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287179A (en) * 2020-06-30 2021-01-29 浙江好络维医疗技术有限公司 Patient identity matching method combining connection priority algorithm and graph database
CN112287179B (en) * 2020-06-30 2024-02-23 浙江好络维医疗技术有限公司 Patient identity matching method combining connection priority algorithm with graph database
CN112163010A (en) * 2020-08-26 2021-01-01 浙江蓝卓工业互联网信息技术有限公司 Cross-data-source query method and device for database
CN112163010B (en) * 2020-08-26 2024-04-12 蓝卓数字科技有限公司 Cross-data source query method and device for database
CN112433999A (en) * 2020-11-05 2021-03-02 北京浪潮数据技术有限公司 Traversal method for Janus graph client and related components
CN112433999B (en) * 2020-11-05 2023-12-22 北京浪潮数据技术有限公司 Janusgraph client traversing method and related components
CN112784119A (en) * 2021-01-14 2021-05-11 内蒙古蒙商消费金融股份有限公司 Data query and synchronization optimization method and device

Also Published As

Publication number Publication date
CN111339334B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant