CN115803731A - Database management system and method for graph view selection of relational database databases - Google Patents

Database management system and method for graph view selection of relational database databases Download PDF

Info

Publication number
CN115803731A
CN115803731A CN202080101557.8A CN202080101557A CN115803731A CN 115803731 A CN115803731 A CN 115803731A CN 202080101557 A CN202080101557 A CN 202080101557A CN 115803731 A CN115803731 A CN 115803731A
Authority
CN
China
Prior art keywords
graph
management system
views
database management
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080101557.8A
Other languages
Chinese (zh)
Inventor
张超
陆嘉恒
韩小纯
张信用
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN115803731A publication Critical patent/CN115803731A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists

Abstract

A database management system for performing graph queries based on one or more graph views in a relational database is disclosed. The database management system is to: determining a plurality of map views in the relational database according to one or more previous map queries in the relational database to obtain a plurality of candidate map views in the relational database. Further, the database management system is to: storing a subset of the plurality of candidate graph views in the relational database; executing a graph query in the relational database according to the subset of the plurality of candidate graph views in the relational database. The database management system may increase the processing speed of graph queries through the selected subset of the plurality of candidate graph views in the relational database.

Description

Database management system and method for graph view selection of relational database databases
Technical Field
The present invention relates to database systems. More particularly, the present invention relates to a database management system and method for performing one or more graph queries.
Background
Graph data is becoming more and more popular in many application areas, such as social networking, biology and medicine, and real-time routing and tracking. This trend has driven the proliferation of graph databases, as graph databases typically perform well in these scenarios. Two key factors that make a graph database successful are the Property Graph Model (PGM) and the accompanying graph query language, such as the Gremlin query language. The attribute graph model represents graph data in a natural manner such that vertices and edges in the graph are associated with a set of attributes, where each attribute is a key-value pair. The graph query language of Gremlin et al enables users to compactly query property graphs of their applications, which have rich functionality and unique functionality, including declarative pattern matching, graph traversal, and actual reachability queries. With the rapid spread of the attribute graph model and graph query language in graph data management, the relational database implements both graph technologies on the relational storage area. These types of systems may be referred to as relational databases that implement a property graph model on a relational storage area, convert a graph query into a relational query, such as an SQL query, and finally execute the relational query, such as an SQL query, in a relational query engine. FIG. 1 illustrates an exemplary relationship database 100. Relational database 100 is used to process graph queries 101 on a relational database management system (RDBMS) 105 according to a property graph (i.e., property graph model 103).
However, conventional relational database databases have some limitations in processing (i.e., answering) graph queries. First, these methods are limited to specific graph queries that can be converted to or blended with SQL, and thus do not support sophisticated graph query functionality. Second, these methods are inefficient when performing complex graph queries because computing the graph from the relational memory area at runtime incurs performance overhead. Finally, these approaches do not take into account the resource budget to accommodate the graph data, which is critical to the end user and generally cannot be affected.
Disclosure of Invention
It is an object of the present invention to provide an improved database management system and method to improve the performance of graph queries in relational database databases.
The above and other objects are achieved by the subject matter claimed by the independent claims. Other implementations are apparent from the dependent claims, the description and the drawings.
According to a first aspect, a database management system is provided for executing one or more graph queries according to one or more graph views in a relational database. The database management system includes a memory and a processor. The processor is configured to determine a plurality of graph views in the relationship database based on one or more previously processed graph queries in the relationship database to obtain a plurality of candidate graph views in the relationship database. Further, the processor in the database management system is to: storing a selected subset of the plurality of candidate graph views in the relational database in the memory; executing one or more graph queries in the relational database according to the subset of the plurality of candidate graph views in the relational database. Advantageously, the database management system provided in the first aspect may select and maintain a subset of "optimized" candidate graph views in the relational database within memory constraints of the database management system, thereby improving the performance of graph queries. The database management system may be configured to execute the one or more graph queries sequentially or substantially in parallel.
A relational database, as used herein, implements a attributed graph model on a relational database and is used to convert a graph query into an SQL query, which is then executed in a relational query engine. Thus, a relational database is a database that supports both graphs and relational models. The graph query may be a Gremlin graph query.
In one possible implementation, each candidate graph view may occupy a respective portion of the memory in the database management system that may be of different sizes. The processor in the database management system may be further operable to: determining the subset of the plurality of candidate map views in the relational database according to a total size of the memory available to store the selected subset of the plurality of candidate map views in the relational database and/or one or more performance indicators associated with the plurality of candidate map views. As described in detail below, the one or more performance indicators (or performance metric values) may be one or more local performance indicators associated with respective candidate graph views and/or one or more global performance indicators associated with a set of candidate graph views.
In another possible implementation, the database management system may further include a graph query engine and a relational query engine; the database management system is further configured to: determining a local performance indicator for each of the plurality of candidate graph views by comparing one or more queries processed by the graph query engine using the respective candidate graph view to the same one or more queries not processed by the relational query engine using the respective candidate graph view.
In yet another possible implementation, the database management system is further configured to: converting the one or more graph queries for the graph query engine into one or more relational queries for the relational query engine when the one or more graph queries are not processable (i.e., cannot be answered) by the graph query engine.
In yet another possible implementation, the database management system is further configured to: determining the subset of the plurality of candidate map views in the relational database according to the plurality of local performance indicators for the plurality of candidate map views and a total size of the memory available for storing the selected subset of the plurality of candidate map views in the relational database.
In yet another possible implementation, the database management system is further configured to determine whether the one or more graph queries can be processed by the graph query engine using the respective candidate graph view.
In yet another possible implementation, the one or more performance indicators associated with the plurality of candidate graph views include a plurality of global performance indicators, wherein each global performance indicator is associated with a respective different combination of the plurality of candidate graph views. The database management system is further configured to: determining the subset of the plurality of candidate graph views in the relational database according to the plurality of global performance indicators. In other words, the database management system may be configured to: determining the subset of the plurality of candidate graph views in the relational database within constraints for storing the selected candidate graph views in the memory by selecting a combination of candidate graph views that provides a maximum benefit (i.e., a maximum global performance metric). The respective combinations of candidate graph views and their global performance indicators (i.e., total revenue) may define states and define the process of determining the subset of the plurality of candidate graph views in the relational database as a state search. The initial state may correspond to an input candidate view set, i.e., a plurality of candidate view views with a certain benefit (i.e., global performance indicator). By adding, modifying and removing views from the initial state, another state, i.e. a candidate view combination, can be obtained from the initial state, and the profit (i.e. the global performance index) of the other state can be higher than that of the initial state, and so on.
In yet another possible implementation, the database management system is further configured to: determining a global performance indicator for a respective combination of the plurality of candidate graph views by splitting the respective candidate graph view into at least two candidate sub-graph views. The process of splitting the respective candidate subgraph view into at least two candidate subgraph views is referred to herein as a split (session) operation.
In yet another possible implementation, the database management system is further configured to: a new candidate graph view is generated by merging subgraphs of at least two of the multiple candidate graph views, and a global performance indicator for a respective combination (i.e., respective state) of the multiple candidate graph views is determined. The process of merging subgraphs of at least two of the multiple candidate graph views is referred to herein as a fusion (fusion) operation. The main purpose of the splitting and merging operations is to identify and merge common sub-graph components of at least two of the plurality of candidate graph views.
In yet another possible implementation, each candidate graph view includes a candidate graph view schema and candidate graph view content; the database management system is further configured to: determining the plurality of candidate graph views from one or more previous graph queries by mapping the one or more previous graph queries to one or more candidate graph view schemas.
In yet another possible implementation, the database management system is further configured to: the one or more previous graph queries are mapped (i.e., converted) to the one or more candidate graph-view schemas by sequentially mapping the nodes and edges of the one or more previous graph queries to the nodes and edges of the one or more candidate graph-view schemas. The database management system may also implement a filtering-and-verification framework for checking whether a given graph query can be answered by a candidate graph view so constructed.
In yet another possible implementation, the database management system is further configured to: generating a respective edge-lead graph from the one or more previous graph queries to generate respective candidate graph view content for the respective candidate graph view. In one implementation, the database management system is further configured to: traversing the edges sequentially from the starting edge according to the query mode; for each accessed pattern edge, the edge and the end point in all matching results in the data graph are added to the view graph until all patterns are accessed. Since the matching result of an edge depends on its previous traversal, the database management system may replicate the previous traversal and cache the end points of the access as intermediate results, and then use these results to compute the matching result for the subsequent traversal.
In yet another possible implementation, the database management system is further configured to: determining a cost metric value for each candidate graph view; the number of candidate graph views is limited by removing candidate graph views for which the cost metric value is greater than the cost metric threshold. Thus, the database management system may be used to tailor the candidate graph according to its cost metric value.
In yet another possible implementation, the database management system is further configured to: receiving one or more other graph queries; dynamically adjusting the subset of the plurality of candidate graph views stored in the memory in the relational database in accordance with the one or more other graph queries in the relational database. When a new query is contained in a previous candidate view, the database management system may update the revenue of the corresponding view. When a new query is not included in a previous candidate graph view, the database management system may store a new selected subset of updated candidate graph views in the relational database in the memory. In other words, if there is insufficient memory, newly selected candidate view selections that are not among the previous view selections may be cached and obsolete views deleted.
According to a second aspect, a method is provided for performing graph queries according to one or more graph views in a relational database. The method comprises the following steps: determining a plurality of graph views in the relational database according to one or more previously processed graph queries in the relational database to obtain a plurality of candidate graph views in the relational database. Furthermore, the method comprises the steps of: storing a selected subset of the plurality of candidate graph views in the relational database. Furthermore, the method comprises the steps of: executing a graph query in the relational database according to the subset of the plurality of candidate graph views in the relational database. Advantageously, the method provided by the second aspect may select and maintain a subset of "optimized" graph views in the relational database, thereby improving the performance of graph queries.
The method provided by the second aspect of the invention may be performed by the system provided by the first aspect of the invention. Thus, the further features of the method provided by the second aspect of the invention are directly derivable from the functionality of the system provided by the first aspect of the invention and its different implementations described above and below.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Embodiments of the invention are described in detail below with reference to the attached drawing figures, wherein:
FIG. 1 is a schematic diagram of a general architecture of a relationship database;
FIG. 2a is a schematic diagram of a database management system used to perform graph queries, as provided by one embodiment;
FIG. 2b is a diagram of the structural and functional components of a database management system used to perform graph queries, provided by one embodiment;
FIG. 3 is a flowchart of the processing steps implemented by the database management system provided by one embodiment;
FIG. 4 illustrates a first algorithm implemented by the database management system provided by the embodiment;
FIG. 5 illustrates a second algorithm implemented by the database management system provided by an embodiment;
FIG. 6 illustrates a third algorithm implemented by the database management system provided by one embodiment;
FIG. 7 illustrates an indexing scheme implemented by a database management system to maintain multiple candidate graph views provided by an embodiment;
FIG. 8 compares query performance of multiple queries between a database management system and a relational database management system provided by an embodiment;
FIGS. 9a and 9b compare view size and runtime between two algorithms implemented by a database management system provided by various embodiments;
FIG. 10 compares view revenue and resource budget between two algorithms implemented by database management systems provided by various embodiments;
FIG. 11 is a flow diagram of a database management method provided by an embodiment.
In the following, the same reference numerals refer to the same features or at least functionally equivalent features.
Detailed Description
In the following description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific aspects of embodiments of the invention or which may be used. It is to be understood that embodiments of the invention may be used in other respects, and may include structural or logical changes not depicted in the drawings. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
For example, it is to be understood that the disclosure relating to the described method is equally applicable to a corresponding device or system for performing the method, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may include one or more units (e.g., functional units) to perform the described one or more method steps (e.g., one unit performs one or more steps, or multiple units perform one or more steps, respectively), even if such one or more units are not explicitly described or shown in the figures. On the other hand, for example, if a particular apparatus is described in terms of one or more units (e.g., functional units), the corresponding method may include one step to perform the function of the one or more units (e.g., one step to perform the function of the one or more units, or multiple steps that each perform the function of one or more of the units), even if such one or more steps are not explicitly described or illustrated in the figures. Furthermore, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.
FIG. 2a is a schematic diagram of a database management system 200 provided by one embodiment. Database management system 200 includes a processor 200a and a memory 200b. The processor 200a is configured to execute one or more graph queries, such as Gremlin queries, according to one or more graph views in a relational database. The relational database, as used herein, implements a Property Graph Model (PGM) on a relational database and is used to convert a graph query into an SQL query, which is then executed in a relational query engine. Thus, a relational database is a database that supports both graphs and relational models.
Gremlin is a query language that is widely supported in graph databases and may correspond to SQL in graph database systems. Gremlin is a graph traversal language for traversing property graphs, and may represent queries as one or more graph traversals. In particular, a declarative traversal is represented by a set of traversal patterns, which can map to a sub-graph query. The pattern matching of Gremlin is a generic traversal that can also be mapped from other traversals, such as linear traversal, nested traversal, and path traversal.
FIG. 2b is a schematic diagram of the structural and functional components (i.e., the overall architecture) of database management system 200 for performing graph queries, as provided by one embodiment. As detailed below in the context of fig. 2a and 2b, the processor 200a in the database management system 200 is configured to determine a plurality of graph views in the relationship database 205 from one or more previously processed graph queries in the relationship database 205 to obtain a plurality of candidate graph views 203 a-203 n in the relationship database 205. Further, the processor 200a in the database management system 200 is configured to: storing a selected subset of the plurality of candidate graph views 203 a-203 n in the relational database in the memory 200 b; a graph query in a relational database is performed according to a subset of the plurality of candidate graph views 203 a-203 n in the relational database. As described in detail below, in the embodiment shown in FIG. 2b, the relational database includes an SQL converter 208, an SQL query engine 210, and a relational database, namely a relational table 211. Advantageously, the database management system 200 may select and maintain a subset of "optimized" candidate graph views in the relationship graph databases 208, 210, 211 within the memory constraints of the database management system 200, thereby improving the performance of the graph query.
Thus, the database management system 200 leverages materialized graph views that may be pre-cached to speed up graph query speed in relational graph databases, such as the relational database databases 208, 210, 211 shown in FIG. 2 b. Materialized views (or simply "views") refer to queries whose results are stored and maintained to facilitate access to data in their underlying base tables. Materialized views are a natural embodiment of the idea of pre-computation and caching in databases. The database system may use results that have been computed, stored, and maintained, rather than computing queries from the underlying data.
Before describing in detail the various embodiments of the database management system 200, an overview of some aspects of the database management system 200 provided by one embodiment is provided below. One embodiment provides a graph view auto-selector implemented by the database management system 200 that can quickly and rationally recommend views that favor a given graph workload in a relational database. The candidate graph view may be constructed from the query history through a query-to-schema mapping mechanism. The benefit of the view may be calculated by evaluating the queries in the relationship engine and the memory map engine 205. One embodiment provides a database management system 200 that may implement one of two methods for selecting a view based on a view revenue value defined by one or more local performance indicators or global performance indicators. Incremental view maintenance may be implemented by the database management system 200 provided by one embodiment to handle incoming new data and workloads.
Candidate views are obtained from the query history, where each view may be associated with a weight based on query frequency. Views that are not of interest may be clipped or eliminated depending on the threshold of query cost and the size of the set of tables used. The remaining subsets that meet the user-defined resource budget may be considered candidate views to determine which are the best view choices, i.e., candidate view subsets.
In one embodiment of the database management system 200, the graph view is obtained by the query-to-schema mapping mechanism converting the Gremlin query into a graph query schema and then saving all sub-graph isomorphism embedding into the data graph g with respect to a given query. A filtering and verification framework may be used to check whether a given query can be answered by a constructed view. In one embodiment, the database management system 200 may utilize materialized views to evaluate the cost and benefit of the contained queries. An assessment of view revenue may be calculated from the view and the relationship database.
Optionally, the Gremlin and Gremlin2SQL optimizers may be used to obtain a profit value and view size for each view, taking into account the query workload, a set of candidate views, and the memory space budget. By using the candidate views to answer the query, the benefit may define a cost savings. The view size measures the memory space occupied by the view. The dynamic programming method of the knapsack-type can be implemented by the database management system 200 provided by an embodiment to generate candidate view recommendations, i.e., a subset of the candidate views 203 a-203 n at a given memory space budget.
Optionally, a graph inheritance algorithm is implemented by the database management system 200 provided by one embodiment to explore the containment relationships and common parts between the candidate graph views 203a through 203n, taking into account the query workload, the set of candidate views 203a through 203n, and the memory space budget. From which a smaller subset of views can be generated. According to one embodiment, in a two-stage framework, each graph schema is decomposed into a set of small sub-graph slices, then common slices are merged, and the remaining slices are merged. Finally, one embodiment provides a database management system 200 that may implement a greedy approach to recommending view selection, i.e., selecting a subset of candidate graph views 203 a-203 n.
In one embodiment, the database management system 200 implements an incremental view maintenance process in the case of data and workload updates. For data updates, the memory view may be updated accordingly, and if the number of updates reaches an update threshold, the materialized view may be recalculated from the base table for update efficiency. With respect to workload updates, in one embodiment, a graph gene index may be used to determine whether a new workload is included in a previous graph or gene. If so, the revenue associated with the view used is updated. Otherwise, a greedy selection algorithm may be invoked to select a new set of candidate views. If there is insufficient memory space, a newly selected view that is not in the previous view selection may be cached and obsolete views deleted.
A detailed description of different embodiments of the database management system 200 is provided below. The first part of the following description provides a high level architectural description of how the database management system 200 provided by one embodiment selects candidate graph views 203 a-203 n under a given workload. Another portion of the description below provides details regarding candidate view selection, query mapping, and candidate view recommendation and view maintenance implemented by different embodiments of the database management system 200.
1. Overview of the System
As shown in fig. 2b, the most important part of the database management system 200 is the abstraction layer 201 implemented by the processor 200a, the abstraction layer 201 being located above the RDBMS 210, receiving a set of graph queries, in particular Gremlin queries, and constructing one or more graph views in the memory 200b, which can then be used to speed up future graph query speeds. The first component in the abstraction layer 201 is an automatic view manager 206, which may be implemented by the processor 200a, which may be responsible for three tasks. The first task is to extract the view data 207 from the underlying base table 211. This is accomplished by translating between queries and views. The second task is to select a view to store in memory 200b according to one or more selection algorithms detailed below. The third task is to maintain view definitions and data incrementally as the data and workload change.
In addition to view manager 206, another component is a memory map view component 202 that caches a selected subset of candidate map views that may be used to answer queries using a map query engine 205 (e.g., tinkerPop map engine 205). In one embodiment, the view structure may adopt a structure of TinkerPop, that is, a memory attribute map implementation of TinkerPop. After mapping the query and view to a graph schema, graph query engine 205 may use a modified subgraph containment algorithm to determine whether the query is contained in the view, as described in detail below.
Another component other than the diagram view component 202 is an SQL translator, such as the Gremlin2SQL translator 208, which converts a graph query (e.g., a Gremlin query) into an SQL query, which is then sent to the relational query engine 210 for execution when the diagram view cannot answer the query. In one embodiment, this conversion may be accomplished by mapping the vertex set and edge set of the graph to the relational table 211 and converting a set of Gremlin tubes into predefined SQL templates. As described above, in the embodiment shown in FIG. 2b, the relational database comprises SQL converter 208, SQL query engine 210, and relational database, relational table 211.
Some of the concepts described above will be illustrated by the following examples of graph queries in a social network. For someone p (id = 4145), find p's two-hop friends. Such queries facilitate recommending new friends to a target person in a social network. Specifically, such a query may be represented using Gremlin as follows:
g.V().has('personid','4145').out("knows").out("knows")
typically, such a Gremlin query would be converted to a SQL query by the Gremlin2SQL technique, as follows:
WITH friends AS(SELECT*FROM customer c INNER JOIN knows k1 on c.id=k1.personid1 where c.id='4145')
WITH 2hopfriends AS(SELECT k2.personid2 as id from friends f JOIN knows k2 on f.personid2=k2.personid1")
SELECT 2f.name from 2hopfriends 2f JOIN customer c on c.id=2f.id;
as can be seen from the above example, a simple Gremlin traversal query would translate to an SQL query with three inlinks, which would incur a significant amount of overhead. As described above and as described in detail below, the database management system 200 may select and maintain a subset of "optimized" candidate graph views in the relational database within the memory constraints of the database management system 200, thereby improving the performance of graph queries in the relational databases 208, 210, 211.
FIG. 3 is a flow diagram of processing stages or steps implemented by the database management system 200 provided by one embodiment for selecting and maintaining a subset of the candidate graph views 203 a-203 n. In a first stage 301, a set of Gremlin queries is obtained. In another stage 303, a set of candidate view views having view mode and view content is generated, as described in detail below. In a further stage 305, view revenue and size are determined according to a filtering and verification framework, and finally an optimized view selection is determined, as described in detail below. In a final stage 307, candidate view selections are maintained, and the selections are refreshed if new workloads appear.
2. Query driven view architecture
The following definitions of the data diagram, the Gremlin schema query, and the Gremlin diagram view are used to describe in more detail the execution of query-driven construction by the database management system 200 provided by one embodiment on the candidate diagram views 203 a-203 n.
The data graph is a multiple relation, attribute, directed graph G = (V, E, L, P), where:
(1) V is a set of vertices;
(2)
Figure BDA0003969923970000071
wherein, (v, v ') represents the edge of vertex v to v';
(3) L is a label function such that for each vertex V ∈ V and edge E ∈ E, L (V) or L (E) is a label in the alphabet Σ;
(4) P is a function such that for each node V ∈ V and each edge E ∈ E, P (V) or P (E) is a set of key/value pairs called attributes. Intuitively, P represents attributes of a vertex, such as name, age, gender, or an edge, such as a timestamp.
The Gremlin mode query is a directed graph Q on a data graph G g =(V p ,E p L, f), wherein:
(1)V p and E p Respectively a mode vertex set and an ordered mode edge set;
(2) L is a label function such that V ∈ V for each vertex p And edge E ∈ E p L (v) or L (e) is a label in the alphabet Σ;
(3) f is a Boolean predicate such that V ∈ V for each vertex p And edge E ∈ E p F (v) and f (e) are applicable.
Is shown as V (Q) g ) Graph view of (a) results from a Gremlin mode query Q g Having a and Q g The same definition and representation is by V G (Q g ) The view content comprising a representation of the view content represented by Q g All subgraphs embedded are composed of edges and nodes. Thus, V G (Q g ) Can be used to answer Q g And does not access the data graph g or other materialized data.
In one exemplary embodiment, the graph query may be a graph query in a social network that looks up friends of someone with id '001' and also returns the friends ' workplace and school. The corresponding Gremlin graph query pattern may be represented as:
g.V().match(
__.as('p').has('id','001').out('knows').as('f'),
__.as('f').out('workAt').as('c'),
__.as('f').out('studyAt').as('s'))
.select('p','f','c','s')
it should be appreciated that in this example, the Gremlin graph query pattern is composed of two components: a data graph g and a match () step declaring a set of connected traversal patterns, wherein the first pattern is a starting edge whose out-point has a Boolean function has ('id', '001'); the as () step tags the query node and the out () step defines a tagged outgoing edge for each traversal pattern.
The graphic view, such as the Gremlin graphic view, consists of two parts, namely a view mode and view content. As described above, embodiments of the database management system 200 employ a query-driven approach that generates candidate views from workloads, i.e., queries that the database management system 200 has previously processed. However, all materialized views of the workload are generally not feasible because they may overwhelm the enumeration space of the query cost evaluation. Thus, in one embodiment, the database management system 200 is used to crop candidate views. In one embodiment, the database management system 200 may be used to crop candidate views using three cropping stages or conditions. In one embodiment, the database management system 200 may be used to use one or more of the following three clipping stages or conditions.
The database management system 200 may be used in the first clipping stage to determine whether the cost of the respective candidate view 203 a-203 n is a fraction of the total cost of the workload, or whether its cost is less than a small threshold, e.g., 1 second (intuitively, low cost queries that can be handled well in the underlying relationship engine should not be considered). The cost may be derived from historical statistical information (if available) or may be a cost estimated by the optimizer for the underlying database.
In an optional or other clipping stage, the database management system 200 may determine whether the set of reference tables for the candidate graph views 203 a-203 n are of small size and appear infrequently. This criterion is based on the following observations: if a graph query relates to a subset of tables that occur infrequently, the query is unlikely to be reused in the future.
In an optional or other clipping phase, database management system 200 may be used to determine whether the number of rows it uses is greater than the upper limit of the memory resource budget. This means that the maximum memory resource budget to accommodate the graph data cannot be affected and, therefore, such queries are not considered. One way to obtain the size of the rows used by a query is to invoke a relational optimizer to estimate the rows used by a given query.
In view of a set of Gremlin graph queries, the database management system 200 is used in one embodiment to derive graph traversals parsed from the queries, and to normalize the Gremlin graph queries to Gremlin pattern queries. In another step, the database management system 200 is used to: the Gremlin mode query is mapped to a graph query mode by adding stepwise traversal patterns starting from a starting vertex. Since, in one embodiment, the query node is provided with a given alias, the mapping process may also map the alias tag to a node tag in the schema graph. For example, for the example above, the first traversal pattern may be derived as p (id = 001) - > f, while the following two patterns f- > c, f- > s may be added to the pattern diagram. Finally, the labels p, f, c, s may be mapped to people, companies, and schools inferred from the schema graph.
In yet another phase, the database management system 200 is used to generate view content. In one embodiment, the database management system 200 is used to create edge-exported graphs as view content by: (1) Traversing the edges sequentially from the starting edge according to the query mode; (2) For each accessed mode edge, adding all matching results of the edge in the data graph g and the end point thereof into the view graph; and (3) when all modes are accessed, the process is terminated. Since the matching result of an edge depends on its previous traversal, the previous traversal may be replicated and the end points of the accesses cached as intermediate results, and then used to compute the matching result for the subsequent traversal.
3. Revenue-based view assessment
Hereinafter, v denotes candidate views 203a to 203 at a given workload Q. To quantify the positive impact of caching view v, database management system 200 may be configured to use revenue b as a local performance indicator for view v, as defined below. The benefit b of view v is the cost savings associated with processing the query using view v as compared to using the underlying relational store (i.e., database R). Mathematically, this can be expressed using the following equation:
Figure BDA0003969923970000091
wherein w i Representing queries Q in a workload Q i The cost of the query evaluated on the relational databases 208, 210, 211 and the cost of the view are represented by cost (qR) and cost (qv), respectively.
To calculate view revenue, i.e., a local performance index, in one embodiment, the database management system 200 may be used to obtain a corresponding query evaluation cost using the PROFILE feature of the Gremlin query. In the case where the view total revenue is negative or small, the database management system 200 may crop these views during the selection of candidate views.
However, before evaluating the cost of a query through a view, the database management system 200 may be used to determine whether the query may be answered by the view. In one embodiment, database management system 200 may be used to implement a filtering and validation framework, detailed below to achieve this goal.
In one embodiment, the filtering and validation framework implemented by database management system 200 includes two main phases detailed below. In the first phase, the database management system 200 is used to check whether the view schema is included in the query schema. If not, the corresponding query is filtered out (i.e., cropped) and is not evaluated on the checked view. In the second phase, the database management system 200 is also used to verify that the view content contains all matching results for a given query. The two phases of the filtering and validation framework implemented by database management system 200 provided by one embodiment are described in more detail below.
In one embodiment, the database management system 200 may be used to verify that a query pattern is contained in a view pattern according to the definition contained in the following pattern. Q 'is queried if and only if the following conditions hold' g Is included in another query Q g In the pattern of (1), is represented as
Figure BDA0003969923970000092
1. Is present from Q' g To Q g Is mapped isomorphically to M, so that Q' g Is Q g Sub-graph embedding;
2. for each query node v 'e Q' g Mapping nodes v = M (v ') and v' have the same label and the same boolean predicate;
3. for each side e '∈ Q' g Mapping edge e = M (e ') has the same label and the same boolean predicate as e'.
Although finding all sub-graph isomorphism mappings is NP-Hard (NP-Hard), several practical algorithms can be implemented to find sub-graph isomorphism in polynomial time. In one embodiment, database management system 200 may be used to implement the VF2 algorithm disclosed in "a (sub) graph isomorphic algorithm for matching large graph graphs" or a variant thereof (capable of running in quadratic time to check the pattern inclusion between two queries), published by l.p.cordela, p.foggia, c.sansone, and m.vento 2004 on pages 1367-1372 of IEEE pattern analysis and machines intelligent journal 26 (10). More specifically, in one embodiment, the database management system 200 may implement the VF2 algorithm to find the sub-graph isomorphism under condition 1 above. If so, the database management system 200 may check whether the above conditions 2 and 3 are also true.
Suppose that
Figure BDA0003969923970000093
And will V G (Q g ) Is represented as Q g The database management system 200 may then be used to: determining that V can be used is determined by determining that the following two conditions hold G (Q g ) Answer Q' g Is shown as
Figure BDA0003969923970000094
1. From each side E '∈ E' p To edge mode E ∈ E p Where Ep is V P (Q g ) The set of edge patterns.
2. For and e '∈ Q' g Edge e without mapping belongs to Q g If the end point v in the edge e is reached e And v 'is e.Q' g If there is a mapping, then node v e = M (v') must appear at Q g In the previous traversal pattern of (2).
As described above, conditions 1 and 2 above may cause the database management system 200 to verify the view content V G (Q g ) Comprising Q' g All sub-graph embedding (but Q' g And Q g With subgraph isomorphism present). This is because the diagram view V G (Q g ) Is a graph derived by the database management system 200 from query Q g Is constructed sequentially. Thus, some not Q 'may be traversed' g Of mode side of (1), thereby losing Q' g Partial results of (c). This can be further explained on the basis of the following simple example of chain traversal. Assume two pattern queries: q1: a → b → c and Q2: a → b → c → d, it is understood that the pattern of Q1 is included in the pattern of Q2. But do notIf Q2's starting edge is c → d, there is no mapping to Q1, then view V is due to the constraint of query node d G (Q2) there is no guarantee that all matching vertices including query node c are included. In this case, view V G (Q2) cannot be used to answer the graph query Q1. However, if the starting edge of graph query Q2 is not c → d, then it will be appreciated that c → d must be traversed after pattern b → c. Thus, in this case, the graph query Q1 may use the view V G (Q2) to respond.
Using the definitions introduced above, fig. 4 illustrates a first algorithm implemented by the database management system 200 for determining whether a view response graph query can be used. The algorithm shown in FIG. 4 queries Q in a pattern g View V (Q' g ) And query Q g And Q' g Taking the sub-graph isomorphism M as input, and executing the following steps:
(1) By checking for Q' g Whether or not there is an edge mapping in M (Ep) for each edge of (E) to verify whether or not the above condition 1 holds, where E p Represents V P (Q g ) A set of edge patterns;
(2) Identifying Q 'by iteration' g Edge of (c) access vertex map M (V) p ) And add the visited vertex to the representation D v Verifying whether the condition 2 is satisfied in the dictionary;
(3) Once the boolean value is returned, the process terminates.
If dictionary D v Comprises a first reaction with M (V) p ) All the vertices of (2), the database management system 200 is used to validate the graph query Q g Can be seen from the view V G (Q' g ) And (6) responding.
4. Automatic view selection
An embodiment of view selection implemented by database management system 200 is described in more detail below. More specifically, the first part of the description below explains in more detail the generic view selection problem that is addressed by the database management system 200. The second and third sections described below describe in more detail the revenue-Driven Algorithm (BDA) and Graph-Genetic Algorithm (GGA) implemented by an embodiment of the database management system 200 (with a global performance index) for selecting candidate view subsets.
The general view selection problem solved by the database management system 200 can be expressed as follows. Considering the candidate set of views V and the quantity S (defining the available space of the memory 200 b) under a particular workload Q, the view selection problem is to select the set of views V derived from V s B (V) s Q) is maximized with the constraint of V s The total memory space occupied is less than S. Size of view V or view size s v May be defined as the total number of edges and nodes of the view content of the corresponding view.
Even at V s Is a subset of V and V s The view selection problem is also NP-hard in the special case where each view in (a) is independent (i.e. cannot be merged with other views). For this static case, database management system 200 may reduce the knapsack problem, i.e., finding a set of k items with space occupation S1 \8230, 8230, sk and revenue b1 \8230, \8230, bk, in order to maximize the sum of the revenue of the selected items to fill the memory space budget S. In one embodiment, database management system 200 may also be used to handle dynamic situations, where V s The views in (1) can be changed by adding, modifying, or removing views, etc. The computation of this embodiment is more complex because the space of the set of potential views is very large and it is generally not feasible in practice to explore all possible combinations. The corresponding algorithm implemented by database management system 200 for each setting is described in more detail below.
FIG. 5 illustrates a possible Algorithm for graph view selection, referred to herein as a revenue-Driven Algorithm (BDA), implemented by the database management system 200. The BDA considers the view selection problem as a static case of the knapsack problem. It uses a dynamic programming strategy to implement the following phases: (i) Obtaining the size of each view V in the candidate view set V (e.g., in one embodiment, the size of the memory map may be directly used as the view size, since the memory map is constructed); (ii) By enumerating S V To evaluate the yield (i.e., local performance indicator) of each view V in the candidate view set V, where B V To representView revenue vector, S V Representing a magnitude vector; (iv) (iv) obtaining the selected subset of candidate views using a bottom-up approach as opposed to step (iii).
This can be further illustrated by the following example of three graph queries Q1, Q2, Q3 in a social network:
Q1:g.V().has('personid','001').as('p').out('knows').as('f').select('p','f')
Q2:g.V().has('personid','001').as('p').out('knows')
.as('f').out('isLocatedIn').as('c').select('p','f','c')
Q3:g.V().has('personid','001').as('p').match(
__.as('p').out('knows').as('f'),
__.as('f').out('workAt').as('c'),
__.as('f').out('studyAt').as('u'))
.select('p','f','c','u')
where Q1 is to find a friend of a person (id = '001'), Q2 is to find a friend of the same person and its location, and Q3 is to return to the workplace and graduation university of the friend of the person (id = '001'). Q1 is determined to be contained in both Q2 and Q3 by invoking the VF2 algorithm with verification framework implemented by the database management system 200 provided by one embodiment for checking for pattern containment. Thus, the set of candidate views may be reduced to { Q2, Q3}.
A Graph Genetic Algorithm (GGA) for selecting a subset of candidate views implemented by an embodiment of the database management system 200 is described in more detail below in the context of fig. 6, fig. 6 showing a GGA. It will be appreciated that the database management system 200 implements GGA to model the view selection process as a state search, where each state includes a set of selected views and a total revenue (i.e., a global performance index). The initial state corresponds to the original candidate view set V with a certain gain b (V). The database management system 200 is used to obtain another state V from the view set V by view addition, modification, and removal of the initial state k It may have a greater profit b (V) k ) And so on. In thatIn one embodiment, the database management system 200 is configured to: if state V k Is not greater than the original candidate view set V, then the state V is cancelled k And continues to apply other view transitions starting from the previously obtained state. In one embodiment, database management system 200 may be used to: if the available memory space is sufficient, all candidate views are cached. However, in practice, the available memory space is often limited. Thus, in one embodiment, the database management system 200 is operable to determine a subset of the plurality of candidate graph views in the relational database databases 208, 210, 211 within constraints defined by the total size of the subset of memory 200b available for storing the candidate graph views 203 a-203 n, in accordance with the revenue associated with the respective view set.
To add, modify, and remove views from the initial state, one embodiment provides a database management system 200 for implementing two atomic probability operations for the transition diagram view, namely a split operation and a merge operation. In one embodiment, the split operation is a transformation that splits a view into multiple sub-graph views, and the merge operation is a transformation that merges and joins the sub-graphs of the multiple views. Through the splitting operation and the merging operation, the database management system 200 is used to identify and merge common subgraph parts across views. This type of transformation is most interesting because finding a common subgraph of graphs is an NP-hard problem. Since an exhaustive search of common subgraphs in a graph can be very costly, especially for large workloads, one embodiment provides a database management system 200 for implementing a probabilistic optimization algorithm to balance efficiency and optimality.
In one embodiment, the splitting operation implemented by the database management system 200 uses a vertex cut method based on Depth First Search (DFS) to find the hooks of the graph, and then decomposes the graph into multiple sub-graphs according to the hooks of the graph. These subgraphs may be referred to as the genes of the graph. For example, in the above example, the hitching point for Q2 is node f, which can break Q2 down into two genes: p (id = '001') → f and f → c. In the DFS method, the vertex u is a hooking point if one of the following two conditions holds: (i) vertex u is the root of the DFS tree, with at least two child vertices; (ii) Vertex u is not the root and has a child vertex v, such that no vertex in the subtree rooted at v has a back edge of one of the parent vertices of u.
In one embodiment, the fusion operation implemented by database management system 200 merges and joins views by the following steps.
(i) The VF2 algorithm with verification method is called to check the subgraph inclusion between the graph genes and other graphs. If so, continuing with step (ii), otherwise, continuing with step (iii).
(ii) The remaining genes were examined for commonalities with those of the other figures. These genes, if any, are incorporated into the figure. Otherwise, these genes are linked to genes of the same map.
(iii) The remaining genes are incorporated into the figure.
It will be appreciated that unlike BDA which only considers single view cases (associated with local performance indicators), GGA involves a multi-view answer (associated with global performance indicators) in which the graph genes have been partitioned and connected to different graph views. In one embodiment, database management system 200 is used to implement a greedy approach that operates as follows.
(1) For each query Q in the workload Q, it is checked whether each query can be answered by a single view. If so, the revenue is evaluated and a transition is made to step (3). Otherwise, it is checked whether each query can be contained in multiple views with genes, and these views can include a hypergraph for query q. If yes, go to step (2). Otherwise, go to step (3).
(2) The total benefit of the view set V is evaluated by the following equation:
Figure BDA0003969923970000121
wherein the cost comprises a part of the evaluation cost ∑ v∈V cost (qv) and combined cost
Figure BDA0003969923970000122
Summing; cost (q | R) represents the number of relational graphsThe cost of the query on the databases 208, 210, 211.
After the total profit b (V, q) is evaluated, the profit is assigned to each V ∈ V in proportion to the partial evaluation cost (i.e., cost (q | V)).
(3) This process is repeated until all queries have been answered.
(4) The view with the largest benefit is iteratively selected until the total size exceeds the memory size constraint S.
5. Index-based view maintenance
In one embodiment, the database management system 200 is used to support view maintenance in the case of data updates and workload updates. In one embodiment, this is accomplished by a two-level graph gene index (referred to herein as the GG index). Fig. 7 shows a plurality of two-level graph gene indices 203a1 to 203ai, 203b1 to 203bj, and 203N1 to 203Nk for a plurality of candidate graph views 203a to 203N. Advantageously, the GG index is the result of the GGA described above, and therefore does not have to be generated from scratch. Once the subset of candidate map views has been determined by the GGA, a GG index is automatically generated and may be cached in memory 200b of the database management system 200 to maintain the views. As described in detail below, the GG index may include two integers or levels, and the database management system 200 may be configured to identify graph views 203 a-203 n using a first integer (or level) of the GG index and to identify an included graph gene using a second integer (or level) of the GG index, as shown in fig. 7.
To handle view data updates and ensure consistency of the view data and the underlying base data, the database management system 200 may be used to update the base data and then synchronize the updates to the view data. Updating the data of the selected subset of candidate views is typically simple because the data is stored in the memory 200b of the database management system 200. Thus, these can be updated directly. However, updating the data of the materialized view requires special handling because of the large number of updates that can incur significant costs due to random disk I/O operations. To address this problem, the database management system 200 provided by one embodiment implements a lazy update policy that delays updates for a fixed time and recalculates the materialized view from the underlying base data if the number of updates reaches the view size.
In the case of workload updates, the database management system 200 may be used to use the GG index to determine whether a new workload is contained in a selected view or gene. If so, i.e., a new workload is included in the selected view or gene, then the database management system 200 may be used to update only the revenue for the view and gene used. Otherwise, the revenue for the new workload is computed and the last step of greedy selection is invoked again to select a new set of views. The selected view that is not in the new view set is replaced by a view in the new view set. More specifically, the validation process implemented by the database management system 200 may include the following stages in view of the new query q.
(1) For each view v in the first level of the GG index, it is checked using Algorithm 1 whether it is
Figure BDA0003969923970000131
In the case of (c). If so, the view revenue is updated accordingly. Otherwise, go to (2).
(2) Iterative examination of each Gene
Figure BDA0003969923970000132
If so, the gene is added to the matched set M (g) and a fusion operation is performed on the remaining entries in the set.
(3) Checking for the presence
Figure BDA0003969923970000134
Figure BDA0003969923970000133
The case (1). If so, the revenue for the view set is updated accordingly, as new queries can be answered by the genes. Otherwise, go to (4).
(4) Algorithm 3 is invoked to generate a new generation and new selection. If there is insufficient memory space, the newly selected view that is not in the previous view selection is cached and the stale view is deleted.
6. Performance of
The performance of an embodiment of the database management system 200 has been tested using the following settings. The database management system 200 has been implemented on a machine with 4 cores i5-4590 CPU and 16GB RAM. Tinkerrpop v3.3.0 has been chosen as the graph computation engine. The memory map view is implemented as a TinkerGraph structure, and the view is materialized as a graph ml file. The underlying SQL engine is SQLG v2.0.2 which supports the Gremlin2SQL function.
An LDBC graph dataset with a scale factor of SF1 was created, resulting in a graph with approximately 100 ten thousand vertices and 200 ten thousand edges. An exemplary workload includes a total of 9 Gremlin-mode queries Q1 through Q9.
Figure 8 compares the performance of all queries Q1 through Q9 between a database management system 200 (referred to as "GRView") provided by an embodiment and a SQL engine SQLG (for each query, the bar on the left shows the performance of the database management system 200 provided by an embodiment, while the bar on the right shows the SQLG performance). The results clearly show that the method implemented by database management system 200 provided by one embodiment can significantly speed up the response speed of graph queries compared to the relational method (i.e., SQLG). There are two reasons for the acceleration to 16 times. First, an increase in the number of edges traversed corresponds directly to the query cost, since the view approach has a relatively small graph size, thus avoiding unnecessary graph traversal (e.g., both views v1 and v2 can answer query q, whereas view v1 would answer query q faster since view v1 is smaller than view v2, since its query processing has to process less data). Second, because one embodiment provides a database management system 200 that utilizes a memory map view and a native map execution engine to execute input map queries, it may speed up the execution of complex map queries for relational storage by avoiding costly connections.
Fig. 9a and 9b show a comparison between the performance of embodiments of a database management system 200 implementing BDA and GGA. More specifically, FIG. 9a compares the view sizes used by embodiments of the database management system 200 implementing BDAs and GGAs, while FIG. 9b compares the runtime for answering queries. The first observation is that embodiments of the database management system 200 implementing GGAs reduce the space of view size to 58% compared to embodiments of the database management system 200 implementing BDAs, which illustrates that GGAs are effective in identifying common portions of a view of a graph. The second observation is that, overall, GGAs respond to queries faster than BDAs. The reason is that the view set of the GGA is smaller than that of the BDA, so its view construction and query evaluation times are short.
Fig. 10 illustrates a comparison between the performance of an embodiment of a database management system 200 implementing BDA and GGA in optimizing overall view revenue. More specifically, the memory space budget was adjusted from 100000 to 400000 (100000 adjustments at a time) to compare the view gains obtained by the two different embodiments. As can be seen from fig. 10, the results indicate that GGA converges faster and has higher revenue than BDA. When the space budget reaches 200000, as shown in fig. 9a, the view size of the GGA is less than 150000, so all views in the GGA can be cached in the memory, and thus better performance is obtained in answering the query. On the other hand, as the spatial budget further increases to 400000,bda may have a constant benefit, i.e. all views in the BDA may be cached to obtain a spatial budget of 400000. However, GGAs always have greater view revenue than BDAs, confirming that GGAs can provide better view selection through view state transitions.
While several embodiments of the database management system 200 have been described above within the Gremlin framework, it should be understood that embodiments of the database management system 200 may be used to accelerate other types of graph queries, such as cryptographic (Cypher) queries in relational database databases. The cryptographic query also declares a graph pattern MATCH using MATCH syntax. Generally, embodiments of the database management system 200 described above may also be implemented using a cryptographic framework or variations thereof. This may only require modifying the query parsing to identify the ordered pattern and generate the graph view. The view benefits calculated by the database management system 200 provided by the above embodiments may be calculated differently with respect to view selection. For example, a possible variation of the embodiment of database management system 200 described above is a cost model approach, which may incorporate other view factors such as maintenance costs.
Embodiments of database management system 200 may be implemented in other applications for more complex graph analysis, such as online analytical processing (OLAP) for graph analysis. While the embodiments described above focus on graph pattern matching, the embodiments may be extended to support more complex Gremlin graph query steps, such as controls (repeat, branch 8230; or algorithms (e.g., shortest path, web page ranking). The main challenge is how to check the query containment of the view in such cases. Further, while it has been considered above that a view may fully answer a query, embodiments of the database management system 200 may also be used to partially answer a query. In other words, embodiments of the database management system 200 are able to answer more queries combined with the underlying base table.
FIG. 11 is a schematic diagram of a method 1100 for performing graph queries according to one or more graph views 203 a-203 n in a relationship database 208, 210, 211. Method 1100 includes step 1101: a plurality of map views 203a to 203n in a relationship database 208, 210, 211 is determined from one or more map queries in the relationship database 208, 210, 211 to obtain a plurality of candidate map views 203a to 203n in the relationship database 208, 210, 211. Further, the method 1100 comprises the step 1103: a subset of the plurality of candidate graph views 203a to 203n in the relational database 208, 210, 211 is stored, for example, in the memory 200b. Method 1100 includes a further step 1105: graph queries in the relationship graph databases 208, 210, 211 are performed according to a subset of the plurality of candidate graph views in the relationship graph databases 208, 210, 211.
Those skilled in the art will understand that the "blocks" ("units") in the various figures (methods and apparatus) represent or describe the functionality of embodiments of the present invention (rather than separate "units" in hardware or software) and thus equally describe the functionality or features of apparatus embodiments as well as method embodiments (units = steps).
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatus, and methods may be implemented in other ways. For example, the described embodiments of the apparatus are merely exemplary. For example, the cell division is only a logical functional division, and may be another division in an actual implementation. For example, various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented. Further, the mutual coupling or direct coupling or communicative connection shown or described may be realized by means of some interfaces. Direct coupling or communicative connection between devices or units may be achieved through electrical, mechanical, or other means.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the scheme of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically present separately, or two or more units may be integrated into one unit.

Claims (15)

1. A database management system (200) for performing graph queries according to one or more graph views in a relational database (208, 210, 211), the database management system (200) comprising:
a memory (200 b);
a processor (200 a) configured to:
determining a plurality of graph views (203 a to 203 n) in the relational database (208, 210, 211) from one or more graph queries in the relational database (208, 210, 211) to obtain a plurality of candidate graph views (203 a to 203 n) in the relational database (208, 210, 211);
storing a subset of the plurality of candidate graph views (203 a-203 n) in the relational database (208, 210, 211) in the memory (200 b);
performing a graph query in the relational database (208, 210, 211) according to the subset of the plurality of candidate graph views (203 a to 203 n) in the relational database (208, 210, 211).
2. The database management system (200) of claim 1, wherein each candidate graph view (203 a-203 n) occupies a respective portion of the memory (200 b);
the database management system (200) is further configured to: determining the subset of the plurality of candidate graph views (203 a-203 n) in the relational database (208, 210, 211) according to a total size of the selected subset of the plurality of candidate graph views (203 a-203 n) in the memory (200 b) available for storing the relational database (208, 210, 211) and/or one or more performance indicators associated with the plurality of candidate graph views (203 a-203 n).
3. The database management system (200) of claim 2, further comprising a graph query engine (205) and a relational query engine (210);
the database management system (200) is further configured to: determining respective performance indicators for the plurality of candidate graph views (203 a-203 n) by comparing one or more queries processed by the graph query engine (205) using the respective candidate graph views (203 a-203 n) to the same one or more queries processed by the relational query engine (210).
4. The database management system (200) of claim 3, wherein the database management system (200) is further configured to: converting the one or more graph queries for the graph query engine (205) into one or more relational queries for the relational query engine (210) when the one or more graph queries are not processable by the graph query engine (205).
5. The database management system (200) according to claim 3 or 4, wherein the database management system (200) is further configured to: determining the subset of the plurality of candidate graph views (203 a-203 n) in the relational database (208, 210, 211) in dependence on the plurality of performance indicators of the plurality of candidate graph views (203 a-203 n) and a total size of the memory (200 b) available for storing the selected subset of the plurality of candidate graph views (203 a-203 n) in the relational database (208, 210, 211).
6. The database management system (200) of any of the preceding claims, wherein the database management system (200) is further configured to determine whether the one or more graph queries can be processed by the graph query engine (205) using the respective candidate graph view (203 a-203 n).
7. The database management system (200) of claim 2, wherein the one or more performance indicators associated with the plurality of candidate graph views (203 a-203 n) comprise a plurality of global performance indicators, wherein each global performance indicator is associated with a respective combination of the plurality of candidate graph views (203 a-203 n); the database management system (200) is further configured to: determining the subset of the plurality of candidate graph views (203 a-203 n) in the relational database (208, 210, 211) in accordance with the plurality of global performance indicators.
8. The database management system (200) of claim 7, wherein the database management system (200) is further configured to: determining a global performance indicator for a respective combination of the plurality of candidate graph views (203 a to 203 n) by dividing the respective candidate graph view (203 a to 203 n) into at least two candidate sub-graph views.
9. The database management system (200) according to claim 7 or 8, wherein the database management system (200) is further configured to: generating a new candidate graph view by merging subgraphs of at least two of the plurality of candidate graph views (203 a-203 n), determining a global performance indicator for a respective combination of the plurality of candidate graph views (203 a-203 n).
10. The database management system (200) according to any of the preceding claims, wherein each candidate graph view (203 a-203 n) comprises a candidate graph view schema and candidate graph view content; the database management system (200) is further configured to: determining the plurality of candidate graph views from the one or more graph queries by mapping the one or more graph queries to one or more candidate graph view schemas (203 a-203 n).
11. The database management system (200) of claim 10, wherein the database management system (200) is further configured to: mapping the one or more graph queries to the one or more candidate graph view schemas by sequentially mapping nodes and edges of the one or more graph queries to nodes and edges of the one or more candidate graph view schemas.
12. The database management system (200) according to claim 10 or 11, wherein the database management system (200) is further configured to: generating a respective edge-derived graph from the one or more graph queries to generate respective candidate graph view content for the respective candidate graph view.
13. The database management system (200) according to any of the preceding claims, wherein the database management system (200) is further configured to: determining respective cost metric values for the candidate graph views (203 a-203 n); limiting the number of candidate graph views (203 a-203 n) by removing candidate graph views (203 a-203 n) for which a cost metric value is greater than a cost metric threshold.
14. The database management system (200) according to any of the preceding claims, wherein the database management system (200) is further configured to: receiving one or more other graph queries; adjusting the subset of the one or more candidate graph views (203 a-203 n) in the relationship database (208, 210, 211) in the memory (200 b) in accordance with the one or more other image queries in the relationship database (208, 210, 211).
15. A method (1100) for performing a graph query from one or more graph views (203 a to 203 n) in a relational database (208, 210, 211), the method (1100) comprising:
determining (1101) a plurality of graph views (203 a to 203 n) in the relationship database (208, 210, 211) from one or more graph queries in the relationship database (208, 210, 211) to obtain a plurality of candidate graph views (203 a to 203 n) in the relationship database (208, 210, 211);
storing (1103) a subset of the plurality of candidate graph views (203 a to 203 n) in the relational database (208, 210, 211);
executing (1105) a graph query in the relational database (208, 210, 211) according to the subset of the plurality of candidate graph views (203 a-203 n) in the relational database (208, 210, 211).
CN202080101557.8A 2020-06-09 2020-06-09 Database management system and method for graph view selection of relational database databases Pending CN115803731A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/095161 WO2021248319A1 (en) 2020-06-09 2020-06-09 Database management system and method for graph view selection for relational-graph database

Publications (1)

Publication Number Publication Date
CN115803731A true CN115803731A (en) 2023-03-14

Family

ID=78846652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080101557.8A Pending CN115803731A (en) 2020-06-09 2020-06-09 Database management system and method for graph view selection of relational database databases

Country Status (4)

Country Link
US (1) US20230126509A1 (en)
EP (1) EP4150476A4 (en)
CN (1) CN115803731A (en)
WO (1) WO2021248319A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076725A (en) * 2023-09-12 2023-11-17 北京云枢创新软件技术有限公司 Method, electronic device and medium for searching tree nodes based on underlying data

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102156725B (en) * 2011-04-01 2013-07-31 中国测绘科学研究院 Method for enhancing inquiring performance of data warehouse
CN107111617B (en) * 2014-12-19 2021-06-08 微软技术许可有限责任公司 Graph processing in a database
US10339179B2 (en) * 2016-04-11 2019-07-02 Oracle International Corporation Graph processing system that can define a graph view from multiple relational database tables
CN106250457B (en) * 2016-07-28 2019-07-12 国网山东省电力公司电力科学研究院 The inquiry processing method and system of big data platform Materialized View
US10467229B2 (en) * 2016-09-30 2019-11-05 Microsoft Technology Licensing, Llc. Query-time analytics on graph queries spanning subgraphs
US10789242B2 (en) * 2018-04-25 2020-09-29 Microsoft Technology Licensing, Llc Managing materialized views in eventually consistent distributed data stores

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076725A (en) * 2023-09-12 2023-11-17 北京云枢创新软件技术有限公司 Method, electronic device and medium for searching tree nodes based on underlying data
CN117076725B (en) * 2023-09-12 2024-02-09 北京云枢创新软件技术有限公司 Method, electronic device and medium for searching tree nodes based on underlying data

Also Published As

Publication number Publication date
EP4150476A4 (en) 2023-05-31
WO2021248319A1 (en) 2021-12-16
US20230126509A1 (en) 2023-04-27
EP4150476A1 (en) 2023-03-22

Similar Documents

Publication Publication Date Title
US8326825B2 (en) Automated partitioning in parallel database systems
CN107111617B (en) Graph processing in a database
Conn et al. Use of quadratic models with mesh-adaptive direct search for constrained black box optimization
US7739269B2 (en) Incremental repair of query plans
US9141666B2 (en) Incremental maintenance of range-partitioned statistics for query optimization
CN104137095B (en) System for evolution analysis
US8037039B2 (en) Runtime class database operation
US20200265049A1 (en) Materialized graph views for efficient graph analysis
Yeganeh et al. A framework for data quality aware query systems
US11455283B2 (en) Candidate element selection using significance metric values
CN111460234B (en) Graph query method, device, electronic equipment and computer readable storage medium
US9020954B2 (en) Ranking supervised hashing
US20190005092A1 (en) Query optimization using propagated data distinctness
CN109992590B (en) Approximate space keyword query method and system with digital attributes in traffic network
US9594783B2 (en) Index selection for XML database systems
CN108710662B (en) Language conversion method and device, storage medium, data query system and method
Lee et al. CrowdK: Answering top-k queries with crowdsourcing
Malki et al. Data services with uncertain and correlated semantics
US20230126509A1 (en) Database management system and method for graph view selection for a relational-graph database
CN114253995A (en) Data tracing method, device, equipment and computer readable storage medium
Matuszka et al. Geodint: towards semantic web-based geographic data integration
Jing et al. SPARQL graph pattern rewriting for OWL-DL inference queries
Reniers et al. Schema design support for semi-structured data: Finding the sweet spot between NF and De-NF
Brisaboa et al. Rank-based strategies for cleaning inconsistent spatial databases
Chen et al. Query grouping–based multi‐query optimization framework for interactive SQL query engines on Hadoop

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination