US20150186461A1 - Cardinality Estimation Using Spanning Trees - Google Patents

Cardinality Estimation Using Spanning Trees

Info

Publication number
US20150186461A1
Authority
US
United States
Prior art keywords
equivalence
predicate
query
node
cardinality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/145,777
Other versions
US9922088B2 (en
Inventor
Anisoara NICA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sybase Inc
Original Assignee
Sybase Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sybase Inc
Priority to US14/145,777 (patent US9922088B2)
Publication of US20150186461A1
Assigned to SYBASE, INC. (assignment of assignors interest; see document for details). Assignors: NICA, ANISOARA
Application granted
Publication of US9922088B2
Legal status: Active (current)
Adjusted expiration

Classifications

    • G06F17/30442
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G06F16/24545 Selectivity estimation or determination

Definitions

  • the embodiments relate generally to databases and more specifically to query optimization using cardinality estimation.
  • DBMS Database Management System
  • a user issues a query to the DBMS that conforms to a defined query language.
  • DBMS determines a query plan for the query. Once determined, the DBMS then uses the query plan to execute the query.
  • a DBMS relies on cardinality estimates that estimate the sizes (i.e., how many rows) of queries and sub-queries. Cardinality estimates are used to assess the efficiency (e.g., cost) of the query plan before the query plan is executed.
  • FIG. 1 is an example database computing environment in which embodiments can be implemented.
  • FIG. 2 is a block diagram of a cardinality estimator, according to an embodiment.
  • FIGS. 3A-B are diagrams of a forest of graphs corresponding to the join equivalence classes for a query and the minimum spanning trees in the forest of graphs, according to an embodiment.
  • FIG. 4 is a flowchart of a method for generating a cardinality estimate for a query, according to an embodiment.
  • FIG. 5 is a flowchart of a method for generating a cardinality estimate for a sub-query, according to an embodiment.
  • FIG. 6 is a diagram of an algorithm for computing the cardinality of a query, according to an embodiment.
  • FIG. 7 is a diagram of an algorithm for determining join equivalence classes for a query, according to an embodiment.
  • FIG. 8 is a diagram of an algorithm for computing a selectivity estimate using spanning trees for a query, according to an embodiment.
  • FIG. 9 is a diagram of an algorithm for computing the selectivity estimate using spanning trees for a sub-query, according to an embodiment.
  • FIG. 10 is a diagram of an algorithm for computing the minimum spanning trees for a graph forest of join equivalence classes, according to an embodiment.
  • FIG. 11 is a block diagram of an example computer system in which embodiments may be implemented.
  • cardinality estimates are used for generating an optimal query plan for a query.
  • FIG. 1 is an example database computing environment 100 in which embodiments can be implemented.
  • Computing environment 100 includes a database management system (DBMS) 140 and client 110 that communicates with DBMS 140 .
  • DBMS 140 may be a system executing on a server and accessible to client 110 over a network, such as network 120 , described below.
  • Although client 110 is represented in FIG. 1 as a separate physical machine from DBMS 140 , this is presented by way of example, and not limitation.
  • client 110 occupies the same physical system as DBMS 140 .
  • client 110 is a software application which requires access to DBMS 140 .
  • a user may operate client 110 to request access to DBMS 140 .
  • client and user will be used interchangeably to refer to any hardware, software, or human requestor, such as client 110 , accessing DBMS 140 either manually or automatically. Additionally, both client 110 and DBMS 140 may execute within a computer system, such as an example computer system discussed in FIG. 11 .
  • Network 120 may be any network or combination of networks that can carry data communications.
  • Such a network 120 may include, but is not limited to, a local area network, metropolitan area network, and/or wide area network that includes the Internet.
  • DBMS 140 receives a query, such as query 102 , from client 110 .
  • Query 102 is used to request, modify, append, or otherwise manipulate or access data in database storage 150 .
  • Query 102 is transmitted to DBMS 140 by client 110 using syntax which conforms to a query language.
  • the query language is a Structured Query Language (“SQL”), but may be another query language.
  • DBMS 140 is able to interpret query 102 in accordance with the query language and, based on the interpretation, generate requests to database storage 150 .
  • Query 102 may be generated by a user using client 110 or by an application executing on client 110 .
  • Upon receipt, DBMS 140 begins to process query 102 . Once processed, the result of the processed query is transmitted to client 110 as query result 104 .
  • DBMS 140 includes a parser 162 , a normalizer 164 , a compiler 166 , and an execution unit 168 .
  • Parser 162 parses the received queries 102 .
  • parser 162 may convert query 102 into a binary tree data structure which represents the format of query 102 .
  • other types of data structures may be used.
  • parser 162 passes the parsed query to a normalizer 164 .
  • Normalizer 164 normalizes the parsed query. For example, normalizer 164 eliminates redundant SQL constructs from the parsed query. Normalizer 164 also performs error checking on the parsed query that confirms that the names of the tables in the parsed query conform to the names of tables 180 . Normalizer 164 also confirms that relationships among tables 180 , as described by the parsed query, are valid.
  • normalizer 164 passes the normalized query to compiler 166 .
  • Compiler 166 compiles the normalized query into machine-readable format. The compilation process determines how query 102 is executed by DBMS 140 . To ensure that query 102 is executed efficiently, compiler 166 uses a query optimizer 170 to generate an access plan for executing the query.
  • Query optimizer 170 analyzes the query and determines a query plan for executing the query.
  • the query plan retrieves and manipulates information in the database storage 150 in accordance with the query semantics. This may include choosing the access method for each table accessed, choosing the order in which to perform a join operation on the tables, and choosing the join method to be used in each join operation. As there may be multiple strategies for executing a given query using combinations of these operations, query optimizer 170 generates and evaluates a number of strategies from which to select the best strategy to execute the query.
  • query optimizer 170 generates multiple query plans. Once generated, query optimizer 170 selects a query plan from the multiple query plans to execute the query.
  • the selected query plan may be a cost efficient plan, a query plan that uses the least amount of memory in DBMS 140 , a query plan that executes the quickest, or any combination of the above, to give a few examples.
  • DBMS 140 uses cardinality estimator 172 .
  • Cardinality estimator 172 generates an estimate of the size (i.e., number of rows) of a query plan before the query plan is executed. Based on cardinality estimates, query optimizer 170 costs and selects an efficient query plan that executes query 102 from multiple query plans.
  • FIG. 2 is a block diagram 200 of a cardinality estimator, according to an embodiment.
  • cardinality estimator receives query 102 as input and generates a cardinality estimate 204 as output.
  • Query 102 accesses data in one or more tables.
  • query Q that accesses tables R i may be presented as:
  • each table R has a set of attributes, denoted as attr(R).
  • query 102 also contains a predicate.
  • a predicate is a condition that may be evaluated in query 102 .
  • a predicate may be a join predicate.
  • a join predicate is a predicate that specifies a join operation that links several tables together or a particular set of attributes when evaluated.
  • An evaluated join predicate typically returns a join table as a result.
  • $A = \{a_1, \ldots, a_n\}$ are the attributes on which tables T and S are joined, such that $A \subseteq attr(T)$ and $A \subseteq attr(S)$.
  • a short form for the join predicate T[A] S[A] may be represented as the conjunct predicate
  • cardinality estimator 172 determines a cardinality estimate over join equivalence classes for a join predicate.
  • a join equivalence class for a predicate may be non-constant or constant.
  • a tuple is an ordered set of elements, which is here an ordered set of a subset of attributes and a constant vector.
  • cardinality estimator 172 may determine a cardinality estimate over a set of join equivalence classes. For example, given a set of join equivalence classes corresponding to the logical expression:
  • cardinality estimator 172 computes the cardinality estimation for a sub-expression involving relations $\{R_{i_1}, \ldots, R_{i_t}\} \subseteq \{R_1, \ldots, R_m\}$:
  • predicate $p'(R_{i_1}, \ldots, R_{i_t})$ contains all conjuncts of the original predicate $p(R_1, \ldots, R_m)$ which refer exclusively to the tables $(R_{i_1}, \ldots, R_{i_t})$.
  • cardinality estimator 172 determines cardinality estimate 204 over a set of join equivalence classes, each represented as an undirected graph. From the one or more undirected graphs, cardinality estimator 172 identifies minimum spanning trees that link the vertices of each undirected graph.
  • a spanning tree of an undirected graph is a tree that connects all vertices in the undirected graph.
  • each edge in the undirected graph is assigned a weight which, in the graphs corresponding to the join equivalence classes, may represent the quality of the selectivity estimation of that edge.
  • cardinality estimator 172 determines a set of vertices V (or nodes), where each node $R_i[A]$ represents a relation and the relation's attributes that participate in the join equivalence class. In an embodiment, for a constant join equivalence class, cardinality estimator 172 generates an extra node for the constant vector $(c_1, \ldots, c_n)$, as discussed below. In an embodiment, cardinality estimator 172 also generates a set of edges E of the form $(R_i, R_j, (p_1, \ldots, p_n))$, one for each join predicate $(p_1, \ldots, p_n)$ between $R_i$ and $R_j$. In an embodiment, cardinality estimator 172 annotates each edge with one or more predicates that correspond to the join predicates between the two nodes of the edge.
  • V is a set of vertices
  • E is a set of edges connecting the vertices.
  • the set of vertices V are:
  • $V = \{R_1[A], \ldots, R_m[A], (c_1, \ldots, c_n)\}$
  • V is a set of vertices
  • E is a set of edges connecting the vertices.
  • the set of vertices V are:
  • $V = \{R_1[A], \ldots, R_m[A]\}$
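  • As an illustration of the vertex and edge sets described above, the following is a minimal Python sketch of building one join equivalence graph. The names `EquivalenceGraph` and `build_equivalence_graph`, the node labels, and the predicate names `q1` to `q3` are hypothetical and do not come from the embodiments; the sketch only assumes that nodes are labels such as `R1[A]`, that a constant class adds one extra node, and that each edge carries the predicates that annotate it.

```python
from dataclasses import dataclass, field


@dataclass
class EquivalenceGraph:
    """Undirected graph G(V, E) for one join equivalence class."""
    vertices: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (node, node, predicates)


def build_equivalence_graph(relation_nodes, annotated_edges, constant_node=None):
    """Collect the vertices, then attach each edge with its predicate annotation."""
    graph = EquivalenceGraph(vertices=set(relation_nodes))
    if constant_node is not None:
        # A constant equivalence class gets one extra node for the constant vector.
        graph.vertices.add(constant_node)
    for u, v, predicates in annotated_edges:
        if u not in graph.vertices or v not in graph.vertices:
            raise ValueError(f"edge ({u}, {v}) refers to an unknown node")
        graph.edges.append((u, v, tuple(predicates)))
    return graph


# Hypothetical constant equivalence class over attribute A.
example_graph = build_equivalence_graph(
    relation_nodes=["R1[A]", "R2[A]", "R3[A]"],
    annotated_edges=[
        ("R1[A]", "R2[A]", ["q1"]),
        ("R2[A]", "R3[A]", ["q2"]),
        ("R3[A]", "c(A)", ["q3"]),
    ],
    constant_node="c(A)",
)
```

    Passing `constant_node=None` yields the non-constant form of the vertex set.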
  • FIG. 3 A is an example graph generated using a cardinality estimator, according to an embodiment.
  • DBMS 140 receives query Q, such that:
  • predicate p includes multiple join predicates p1 to p21 as shown below:
  • Based on the above query Q, cardinality estimator 172 generates one constant equivalence class for the predicate p′, where:
  • Graph G1($V_1$, $E_1$), including the vertices $V_1$ and edges $E_1$ described above, is illustrated in diagram 300 A in FIG. 3A .
  • Graph G1 is a graph showing a constant equivalence class for the predicate p′. Additionally, FIG. 3A also includes a listing of predicates p1 to p12 that comprise predicate p′ that defines the edges in graph G1.
  • cardinality estimator 172 also generates a second equivalence class that corresponds to a predicate p′′, where:
  • Graph G2($V_2$, $E_2$), including the vertices $V_2$ and edges $E_2$ described above, is also illustrated in FIG. 3A .
  • Graph G2 is a diagram of a constant equivalence class for the predicate p′′. Additionally, FIG. 3A also includes a listing of predicates p1 to p21 that comprise predicate p′′ and defines edges in graph G2.
  • cardinality estimator 172 determines the spanning trees in the graphs. From the spanning trees, cardinality estimator 172 determines the minimum spanning tree and calculates the cardinality estimate from the minimum spanning tree as discussed below.
  • a predicate $p_i$ may annotate more than one edge in the equivalence graphs corresponding to a query Q.
  • for example, the predicate p5 annotates the edge (R 4 [A, B], c(A, cB), (p5, p6)) in graph G1 and the edge (R 4 [A], c(A), (p5)) in graph G2.
  • cardinality estimator 172 determines a cardinality estimate
  • cardinality estimator 172 generates a cardinality estimate based on a subset S of the join predicates, representing some relationships of the relations $\{R_1, \ldots, R_n\}$, that forms spanning trees.
  • cardinality estimator 172 excludes redundant (or repeating) predicates from the spanning tree.
  • FIG. 3B is a diagram 300 B of the example spanning trees of graphs G1 and G2, according to an embodiment.
  • the dashed lines between the nodes in graphs G1 and G2 indicate the edges that are included in the spanning trees for graphs G1 and G2.
  • the subset of predicates that are included in the spanning tree for graphs G1 and G2 are p5, p6, p7, p8, p11, p12, p14 and p20, as shown in bold in FIG. 3B .
  • cardinality estimator 172 sets $T(G(\mathcal{E}(p)))$ as a spanning tree of $G(\mathcal{E}(p))$, the graph of the join equivalence class $\mathcal{E}(p)$.
  • the predicates in the spanning tree $T(G(\mathcal{E}(p)))$ can be defined as the set of predicates annotating the edges in the spanning tree $T(G(\mathcal{E}(p)))$.
  • predicates can be defined as:
  • cardinality estimator 172 applies the following theorem to calculate cardinality estimation for query Q. If $T(G(\mathcal{E}(p)))$ is a spanning tree of $G(\mathcal{E}(p))$, then the conjunct generated by the spanning tree infers the original predicate p, such as:
  • v 1 , v 2 , (p 1 , . . . , p n ) an edge in G( ⁇ (p)) ⁇ is inferred by the conjunct generated by the spanning tree:
  • cardinality estimator 172 determines that a query Q, such as:
  • where the predicate s has e equivalence classes $\{\mathcal{E}(s_1), \ldots, \mathcal{E}(s_e)\}$, is equivalent to a query Q′ in which the predicate s is replaced with the predicates of the spanning trees $\{T(G(\mathcal{E}(s_1))), \ldots, T(G(\mathcal{E}(s_e)))\}$.
  • predicates of the forest for the graphs of join equivalence classes are predicates that annotate the spanning trees, with duplicates removed, such as:
  • cardinality estimator 172 uses the operation $\cup_d$, a union that eliminates duplicates, to remove duplicate predicates.
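  • The duplicate removal described above can be sketched as follows. This is an illustrative Python fragment, not the embodiment's implementation: the function name `forest_predicates` is hypothetical, and each spanning tree is assumed to be a list of `(node, node, predicates)` edges. The example data reuses the two edges annotated by p5 that are mentioned for graphs G1 and G2.

```python
def forest_predicates(spanning_trees):
    """Collect the predicates annotating all tree edges, keeping each predicate once."""
    seen = set()
    ordered = []
    for tree in spanning_trees:
        for _u, _v, predicates in tree:
            for predicate in predicates:
                if predicate not in seen:  # skip predicates already collected
                    seen.add(predicate)
                    ordered.append(predicate)
    return ordered


# p5 annotates one edge in each graph but is reported a single time.
tree_g1 = [("R4[A, B]", "c(A, cB)", ("p5", "p6"))]
tree_g2 = [("R4[A]", "c(A)", ("p5",))]
unique_predicates = forest_predicates([tree_g1, tree_g2])
```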
  • cardinality estimator 172 rewrites query Q using the predicates in the forest of join equivalence classes $Preds(\{T(G(\mathcal{E}(s_1))), \ldots, T(G(\mathcal{E}(s_e)))\})$, such as:
  • cardinality estimator 172 determines a cardinality estimate for query Q 0 where:
  • cardinality estimator 172 can split predicate s into e join equivalence classes.
  • Cardinality estimator 172 determines the cardinality of query $Q_0$. To do so, cardinality estimator 172 first determines the selectivity estimate of query Q using the spanning trees for the join predicate s, referred to as selectivity(s). Cardinality estimator 172 then uses selectivity(s), selectivity(x′), and the cardinalities of the relations $R_i$ to compute $card(Q_0)$, as illustrated below:
  • selectivity(s) = STSelectivity(Q), which can be computed using the algorithm in FIG. 8 , according to an embodiment.
  • When cardinality estimator 172 applies the above algorithm to graphs G1 and G2 shown in FIGS. 3A and 3B , cardinality estimator 172 generates cardinality estimate 204 for query Q associated with graphs G1 and G2 as:
  • FIG. 4 is a flowchart of a method 400 of a cardinality estimator generating a cardinality estimate for a query Q, according to an embodiment.
  • Method 400 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof.
  • the method 400 is performed by cardinality estimator 172 .
  • the cardinality estimator receives a query Q.
  • the cardinality estimator determines the minimum set of equivalence classes. For example, cardinality estimator 172 determines e equivalence classes $\{\mathcal{E}(s_1), \ldots, \mathcal{E}(s_e)\}$. As discussed above, equivalence classes are determined based on sets of common attributes that are included in tables joined in query Q.
  • the cardinality estimator determines the minimum spanning trees for cardinality estimation.
  • the edges represent predicate attributes, where each edge has a weight computed by DBMS 140 .
  • the weight for an edge representing a join predicate between two tables is a property vector of the edge, including, for example, the confidence level of the selectivity estimation for the edge and the quality of the selectivity estimation, expressed as properties of the join predicates. Such properties may include the type of relationship between the two tables.
  • a function which compares the two property vectors of the edges can be used, according to an embodiment.
  • cardinality estimator 172 determines the minimum spanning trees as the trees that include all vertices in the join equivalence graphs, such that the nodes of the graphs are connected using the edges associated with the lowest weights. In an embodiment, cardinality estimator 172 defines the spanning trees as $\{T(G(\mathcal{E}(s_1))), \ldots, T(G(\mathcal{E}(s_e)))\}$.
  • the cardinality estimator uses the predicates associated with the spanning trees to determine the cardinality estimate for query Q. For example, cardinality estimator 172 determines the selectivity of predicate s by multiplying the selectivities of the predicates associated with the edges of the minimum spanning trees determined in operation 408 , such as:
  • $\text{selectivity}(s) = \prod_{e \in \text{Edges}(\{T(G(\mathcal{E}(s_1))), \ldots, T(G(\mathcal{E}(s_e)))\})} \text{selectivity}(e)$
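  • The product over spanning-tree edges can be illustrated with a short Python sketch. The function name `st_selectivity`, the edge layout `(node, node, selectivity)`, and the numeric selectivities are assumptions made for illustration only; the sketch multiplies one selectivity per edge, exactly as the formula is written, and does not model how each edge selectivity is obtained.

```python
from math import isclose, prod


def st_selectivity(spanning_trees):
    """Multiply the selectivities of every edge in every spanning tree of the forest."""
    return prod(
        selectivity
        for tree in spanning_trees
        for (_u, _v, selectivity) in tree
    )


# Hypothetical forest: two spanning trees with three edges in total.
forest = [
    [("R1", "R2", 0.1), ("R2", "R3", 0.5)],
    [("R1", "R4", 0.2)],
]
selectivity_s = st_selectivity(forest)
# 0.1 * 0.5 * 0.2, up to floating-point rounding
print(isclose(selectivity_s, 0.01))
```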
  • cardinality estimator 172 determines a cardinality estimate 204 for query Q.
  • query optimizer 170 uses cardinality estimate 204 to determine a query plan for query Q.
  • cardinality estimator 172 also determines a cardinality estimate for a sub-expression of query Q, such as sub-expression Q′.
  • FIG. 5 is a flowchart of a method 500 of a cardinality estimator generating a cardinality estimate for a sub-section Q′ of query Q, according to an embodiment.
  • Method 500 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof.
  • the method 500 is performed by cardinality estimator 172 .
  • the cardinality estimator receives a query Q.
  • the cardinality estimator receives a subset of tables.
  • cardinality estimator 172 receives a subset of tables that query Q manipulates, such as $\{R_{i_1}, \ldots, R_{i_t}\} \subseteq \{R_1, \ldots, R_n\}$.
  • the cardinality estimator determines the equivalence classes.
  • cardinality estimator 172 determines e equivalence classes $\{\mathcal{E}(s_1), \ldots, \mathcal{E}(s_e)\}$.
  • equivalence classes are determined based on sets of common attributes that are included in tables joined in query Q.
  • the cardinality estimator generates join equivalence graphs.
  • the cardinality estimator determines the sub-graphs from the join equivalence graphs that include the subset of tables. For example, for each graph $G(\mathcal{E}(s_i))$, cardinality estimator 172 determines the vertex-induced sub-graph $G'(\mathcal{E}(s_i))$, which includes only the vertices $\{R_{i_1}, \ldots, R_{i_t}\}$. In an embodiment, cardinality estimator 172 represents the join equivalence sub-graphs as $\{G'(\mathcal{E}(s_1)), \ldots, G'(\mathcal{E}(s_e))\}$.
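  • A vertex-induced sub-graph keeps the selected vertices and only those edges whose two endpoints are both selected. The following Python sketch shows that operation under simplifying assumptions: the function name `induced_subgraph` is hypothetical, vertices are identified by plain table names, and how a constant node should be treated is left to the caller because it is not specified here.

```python
def induced_subgraph(vertices, edges, keep):
    """Return the sub-graph induced by the vertices in `keep`."""
    keep = set(keep)
    sub_vertices = [v for v in vertices if v in keep]
    # An edge survives only when both of its endpoints are kept.
    sub_edges = [(u, v, preds) for (u, v, preds) in edges if u in keep and v in keep]
    return sub_vertices, sub_edges


# Hypothetical triangle over R1, R2, R3 restricted to the subset {R1, R3}.
sub_vertices, sub_edges = induced_subgraph(
    ["R1", "R2", "R3"],
    [("R1", "R2", ("q1",)), ("R2", "R3", ("q2",)), ("R1", "R3", ("q3",))],
    {"R1", "R3"},
)
```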
  • the cardinality estimator determines the minimum spanning trees for cardinality estimation. For example, in the join equivalence sub-graphs $\{G'(\mathcal{E}(s_1)), \ldots, G'(\mathcal{E}(s_e))\}$, the edges represent predicate attributes, where each predicate attribute has a weight specified by DBMS 140 as described above.
  • cardinality estimator 172 determines the minimum spanning trees as the trees that include all vertices in the join equivalence sub-graphs $G'(\mathcal{E}(s_i))$, such that the nodes of the graphs are connected using the edges associated with the lowest weights. In an embodiment, cardinality estimator 172 defines the spanning trees as $\{T(G'(\mathcal{E}(s_1))), \ldots, T(G'(\mathcal{E}(s_e)))\}$.
  • the cardinality estimator uses the predicates associated with the spanning trees to determine the cardinality estimate for sub-query Q′. For example, cardinality estimator 172 determines the selectivity estimation for sub-query Q′ by multiplying the selectivities of the predicates associated with the edges of the best spanning trees determined in operation 512 , such as:
  • $\text{selectivity}(s') = \prod_{e \in \text{Edges}(\{T(G'(\mathcal{E}(s_1))), \ldots, T(G'(\mathcal{E}(s_e)))\})} \text{selectivity}(e)$
  • FIG. 6 is a diagram of an algorithm 600 for computing the cardinality of a query, according to an embodiment.
  • Algorithm 600 may be implemented by cardinality estimator 172 to generate a cardinality estimate 204 .
  • Algorithm 600 uses algorithms 700 - 1000 discussed in FIGS. 7-10 to generate cardinality estimate 204 for query Q.
  • Selectivity estimation for Q is computed by algorithm 800 (STSelectivity(Q) call).
  • Algorithm 600 uses this new selectivity estimation and other estimates to compute the final cardinality estimation for the whole query Q 0 (at line 3 ).
  • the cardinality estimate $card(Q_0)$ is computed as:
  • $card(Q_0) = f(\text{selectivity}(s), \text{selectivity}(x'), \text{cardinality}(R_1), \ldots, \text{cardinality}(R_n))$.
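  • The function f is not spelled out in this text. As a hedged illustration only, the Python sketch below assumes the common textbook form in which f multiplies the two selectivities by the product of the table cardinalities; the function name `estimate_cardinality` and all numbers are hypothetical, and the embodiment's f may differ.

```python
from math import isclose, prod


def estimate_cardinality(selectivity_s, selectivity_x, table_cardinalities):
    """Assumed form of f: selectivities times the size of the cross product."""
    return selectivity_s * selectivity_x * prod(table_cardinalities)


# Hypothetical inputs: join selectivity 0.01, remaining-predicate selectivity 0.5,
# and two tables with 1000 and 200 rows.
card_q0 = estimate_cardinality(0.01, 0.5, [1000, 200])
print(round(card_q0))
```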
  • FIG. 7 is a diagram of an algorithm 700 for determining join equivalence classes for a query, as required by cardinality estimator 172 during operations 404 and 506 , according to an embodiment.
  • cardinality estimator 172 determines join equivalence classes for a predicate.
  • the input to algorithm 700 is a query Q, such as:
  • the output of algorithm 700 is a minimum set of equivalence classes, such as $\{\mathcal{E}(s_1), \ldots, \mathcal{E}(s_e)\}$.
  • the details of algorithm 700 are included in FIG. 7 .
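  • The details of algorithm 700 appear only in FIG. 7 and are not reproduced in this text. Purely as an illustration of grouping join attributes into equivalence classes, the Python sketch below uses a generic union-find over equality predicates; this is a swapped-in standard technique, not algorithm 700 itself, and the function name `equivalence_classes` and the attribute labels are hypothetical. Constants are not handled.

```python
def equivalence_classes(equalities):
    """Group attributes connected by equality predicates (generic union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equalities:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for attribute in list(parent):
        groups.setdefault(find(attribute), set()).add(attribute)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical equalities: R1.A = R2.A, R2.A = R3.A, R1.B = R4.B.
classes = equivalence_classes(
    [("R1.A", "R2.A"), ("R2.A", "R3.A"), ("R1.B", "R4.B")]
)
```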
  • FIG. 8 is a diagram of an algorithm 800 for computing a selectivity estimate using spanning trees for a query, used by cardinality estimator 172 during operation 204 , according to an embodiment.
  • the input to algorithm 800 is a query Q, such as:
  • algorithm 800 uses algorithm 700 to generate a minimum set of the equivalence classes, such as the set $\{\mathcal{E}(s_1), \ldots, \mathcal{E}(s_e)\}$, at step 1. From the minimum set of the equivalence classes, cardinality estimator 172 generates graphs $\{G(\mathcal{E}(s_1)), \ldots, G(\mathcal{E}(s_e))\}$, one graph for each equivalence class, at step 2. Cardinality estimator 172 then uses algorithm 1000 to generate minimum spanning trees $\{T(G(\mathcal{E}(s_1))), \ldots, T(G(\mathcal{E}(s_e)))\}$.
  • cardinality estimator 172 uses the minimum spanning trees $\{T(G(\mathcal{E}(s_1))), \ldots, T(G(\mathcal{E}(s_e)))\}$ to determine selectivity for predicate s.
  • FIG. 9 is a diagram of an algorithm 900 for computing the selectivity estimate for using spanning trees for a sub-query, according to an embodiment.
  • the first input to algorithm 900 is a query Q, such as:
  • the second input to algorithm 900 is a subset of the tables in query Q, such as:
  • the output of algorithm 900 is an estimated selectivity of a sub-predicate s′, where s′ includes the predicates on the subset $\{R_{i_1}, \ldots, R_{i_t}\}$.
  • algorithm 900 uses algorithm 700 to generate a minimum set of the equivalence classes, such as the set $\{\mathcal{E}(s_1), \ldots, \mathcal{E}(s_e)\}$, at step 1. From the minimum set of the equivalence classes, cardinality estimator 172 generates graphs $\{G(\mathcal{E}(s_1)), \ldots, G(\mathcal{E}(s_e))\}$, one graph for each equivalence class, at step 2.
  • Cardinality estimator 172 then computes a vertex-induced sub-graph $G'(\mathcal{E}(s_i))$ for the vertices of the subset of tables $\{R_{i_1}, \ldots, R_{i_t}\}$, at step 3.
  • cardinality estimator 172 determines, using algorithm 1000 , the minimum spanning trees $\{T(G'(\mathcal{E}(s_1))), \ldots, T(G'(\mathcal{E}(s_e)))\}$ from the new forest of sub-graphs $\{G'(\mathcal{E}(s_1)), \ldots, G'(\mathcal{E}(s_e))\}$.
  • cardinality estimator 172 uses the minimum spanning trees $\{T(G'(\mathcal{E}(s_1))), \ldots, T(G'(\mathcal{E}(s_e)))\}$ to determine the selectivity of the sub-predicate s′, such as:
  • $\text{selectivity}(s') = \prod_{e \in \text{Edges}(\{T(G'(\mathcal{E}(s_1))), \ldots, T(G'(\mathcal{E}(s_e)))\})} \text{selectivity}(e)$
  • FIG. 10 is a diagram of an algorithm 1000 for computing the minimum spanning trees, as required by cardinality estimator 172 during operations 408 and 512 , for a graph forest of join equivalence classes, according to an embodiment.
  • the edges in a forest of graphs, such as $\{G(\mathcal{E}(s_1)), \ldots, G(\mathcal{E}(s_e))\}$, have different weights.
  • Cardinality estimator 172 may assign the weights to the edges based on pre-configured criteria in DBMS 140 .
  • cardinality estimator 172 may evaluate the weights of the edges using a betterQuality($e_i$, $e_j$) function, which compares two edges $e_i$ and $e_j$ and determines the quality of each edge for inclusion in a minimum spanning tree.
  • the betterQuality(e i , e j ) function is included as part of algorithm 1000 .
  • Based on the quality of edges as determined by the weights, cardinality estimator 172 generates the minimum spanning trees from the forest of graphs $\{G(\mathcal{E}(s_1)), \ldots, G(\mathcal{E}(s_e))\}$.
  • the output of algorithm 1000 is a set of best spanning trees $\{T(G(\mathcal{E}(s_1))), \ldots, T(G(\mathcal{E}(s_e)))\}$.
  • cardinality estimator 172 identifies the edges in the forest of graphs $\{G(\mathcal{E}(s_1)), \ldots, G(\mathcal{E}(s_e))\}$ in a set E, where E may be defined as:
  • Cardinality estimator 172 uses algorithm 1000 to traverse through the edges in set E, comparing them with the betterQuality($e_i$, $e_j$) function.
  • cardinality estimator 172 identifies an edge using the betterQuality($e_i$, $e_j$) function, such as the edge having the lowest weight.
  • algorithm 1000 adds the edge to a set of edges that together form the minimum spanning trees.
  • algorithm 1000 includes an add(d) function, where d is the edge identified using the betterQuality(e i , e j ) function that cardinality estimator 172 attempts to add to the minimum spanning trees.
  • the add(d) function is discussed in detail in FIG. 10 .
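  • Because the betterQuality and add(d) details appear only in FIG. 10 , the following Python sketch is an assumption-laden illustration rather than algorithm 1000 itself. It uses a standard Kruskal-style greedy pass: edges are ordered by a comparator, and `add` accepts an edge only when it joins two components that are not yet connected. The names `better_quality`, `SpanningForest`, and `best_spanning_forest`, and the single `confidence` score standing in for the property vector, are hypothetical.

```python
from functools import cmp_to_key


def better_quality(e_i, e_j):
    """Return True when edge e_i is preferred over e_j.

    Assumption: one 'confidence' score per edge, higher is better.
    """
    return e_i["confidence"] > e_j["confidence"]


class SpanningForest:
    def __init__(self, vertices):
        self._parent = {v: v for v in vertices}
        self.edges = []

    def _find(self, x):
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def add(self, d):
        """Try to add edge d; reject it if it would close a cycle."""
        root_u, root_v = self._find(d["u"]), self._find(d["v"])
        if root_u == root_v:
            return False
        self._parent[root_u] = root_v
        self.edges.append(d)
        return True


def best_spanning_forest(vertices, edges):
    def compare(a, b):
        if better_quality(a, b):
            return -1
        if better_quality(b, a):
            return 1
        return 0

    forest = SpanningForest(vertices)
    for d in sorted(edges, key=cmp_to_key(compare)):
        forest.add(d)
    return forest.edges


# Hypothetical triangle: the lowest-confidence edge closes a cycle and is dropped.
triangle = [
    {"u": "R1", "v": "R2", "confidence": 0.9},
    {"u": "R2", "v": "R3", "confidence": 0.5},
    {"u": "R1", "v": "R3", "confidence": 0.8},
]
chosen = best_spanning_forest(["R1", "R2", "R3"], triangle)
```

    The same pass handles a forest of several graphs as long as vertex labels are distinct across graphs, since edges never connect vertices of different graphs.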
  • Computer system 1100 can be any well-known computer capable of performing the functions described herein, such as computers available from International Business Machines, Apple, Sun, HP, Dell, Sony, Toshiba, etc.
  • Computer system 1100 includes one or more processors (also called central processing units, or CPUs), such as a processor 1104 .
  • Processor 1104 is connected to a communication infrastructure or bus 1106 .
  • One or more processors 1104 may each be a graphics processing unit (GPU).
  • a GPU is a processor that is a specialized electronic circuit designed to rapidly process mathematically intensive applications on electronic devices.
  • the GPU may have a highly parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images and videos.
  • Computer system 1100 also includes user input/output device(s) 1103 , such as monitors, keyboards, pointing devices, etc., which communicate with communication infrastructure 1106 through user input/output interface(s) 1102 .
  • Computer system 1100 also includes a main or primary memory 1108 , such as random access memory (RAM).
  • Main memory 1108 may include one or more levels of cache.
  • Main memory 1108 has stored therein control logic (i.e., computer software) and/or data.
  • Computer system 1100 may also include one or more secondary storage devices or memory 1110 .
  • Secondary memory 1110 may include, for example, a hard disk drive 1112 and/or a removable storage device or drive 1114 .
  • Removable storage drive 1114 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
  • Removable storage drive 1114 may interact with a removable storage unit 1118 .
  • Removable storage unit 1118 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data.
  • Removable storage unit 1118 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device.
  • Removable storage drive 1114 reads from and/or writes to removable storage unit 1118 in a well-known manner.
  • secondary memory 1110 may include other means, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 1100 .
  • Such means, instrumentalities or other approaches may include, for example, a removable storage unit 1122 and an interface 1120 .
  • the removable storage unit 1122 and the interface 1120 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
  • Computer system 1100 may further include a communication or network interface 1124 .
  • Communication interface 1124 enables computer system 1100 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number 1128 ).
  • communication interface 1124 may allow computer system 1100 to communicate with remote devices 1128 over communications path 1126 , which may be wired and/or wireless, and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 1100 via communication path 1126 .
  • A tangible apparatus or article of manufacture comprising a tangible computer useable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device.
  • Such control logic, when executed by one or more data processing devices (such as computer system 1100 ), causes such data processing devices to operate as described herein.
  • Embodiments may operate with software, hardware, and/or operating system implementations other than those described herein.
  • References herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein.

Abstract

A system, computer-implemented method, and computer-program product embodiments for determining a cardinality estimate for a query. A cardinality estimator identifies a predicate in a query, where the predicate is split into a plurality of equivalence classes. The cardinality estimator then generates a plurality of equivalence graphs from the plurality of equivalence classes, one equivalence graph for an equivalence class. Spanning trees are identified from the plurality of equivalence graphs, and the cardinality estimator then determines the cardinality estimate for the query from the spanning trees.

Description

    BACKGROUND
  • 1. Field
  • The embodiments relate generally to databases and more specifically to query optimization using cardinality estimation.
  • 2. Background
  • Computer databases have become a prevalent means for data storage and retrieval. A database user will commonly access the underlying data in a database using a Database Management System (“DBMS”). A user issues a query to the DBMS that conforms to a defined query language. When a DBMS receives a query, it determines a query plan for the query. Once determined, the DBMS then uses the query plan to execute the query. As part of determining an efficient query plan, a DBMS relies on cardinality estimates that estimate the sizes (i.e., how many rows) of queries and sub-queries. Cardinality estimates are used to assess the efficiency (e.g., cost) of the query plan before the query plan is executed.
  • BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES
  • The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments and, together with the description, further serve to explain the principles of the embodiments and to enable a person skilled in the relevant art to make and use the embodiments.
  • FIG. 1 is an example database computing environment in which embodiments can be implemented.
  • FIG. 2 is a block diagram of a cardinality estimator, according to an embodiment.
  • FIGS. 3A-B are diagrams of a forest of graphs corresponding to the join equivalence classes for a query and the minimum spanning trees in the forest of graphs, according to an embodiment.
  • FIG. 4 is a flowchart of a method for generating a cardinality estimate for a query, according to an embodiment.
  • FIG. 5 is a flowchart of a method for generating a cardinality estimate for a sub-query, according to an embodiment.
  • FIG. 6 is a diagram of an algorithm for computing the cardinality of a query, according to an embodiment.
  • FIG. 7 is a diagram of an algorithm for determining join equivalence classes for a query, according to an embodiment.
  • FIG. 8 is a diagram of an algorithm for computing a selectivity estimate using spanning trees for a query, according to an embodiment.
  • FIG. 9 is a diagram of an algorithm for computing the selectivity estimate using spanning trees for a sub-query, according to an embodiment.
  • FIG. 10 is a diagram of an algorithm for computing the minimum spanning trees for a graph forest of join equivalence classes, according to an embodiment.
  • FIG. 11 is a block diagram of an example computer system in which embodiments may be implemented.
  • DETAILED DESCRIPTION
  • Provided herein are system, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for generating cardinality estimates, where cardinality estimates are used for generating an optimal query plan for a query.
  • FIG. 1 is an example database computing environment 100 in which embodiments can be implemented. Computing environment 100 includes a database management system (DBMS) 140 and client 110 that communicates with DBMS 140. DBMS 140 may be a system executing on a server and accessible to client 110 over a network, such as network 120, described below. Although client 110 is represented in FIG. 1 as a separate physical machine from DBMS 140, this is presented by way of example, and not limitation. In an additional embodiment, client 110 occupies the same physical system as DBMS 140. In a further embodiment, client 110 is a software application which requires access to DBMS 140. In another embodiment, a user may operate client 110 to request access to DBMS 140. Throughout this specification, the terms client and user will be used interchangeably to refer to any hardware, software, or human requestor, such as client 110, accessing DBMS 140 either manually or automatically. Additionally, both client 110 and DBMS 140 may execute within a computer system, such as an example computer system discussed in FIG. 11.
  • Client 110 and DBMS 140 may communicate over network 120. Network 120 may be any network or combination of networks that can carry data communications. Such a network 120 may include, but is not limited to, a local area network, metropolitan area network, and/or wide area network that include the Internet.
  • DBMS 140 receives a query, such as query 102, from client 110. Query 102 is used to request, modify, append, or otherwise manipulate or access data in database storage 150. Query 102 is transmitted to DBMS 140 by client 110 using syntax which conforms to a query language. In a non-limiting embodiment, the query language is a Structured Query Language (“SQL”), but may be another query language. DBMS 140 is able to interpret query 102 in accordance with the query language and, based on the interpretation, generate requests to database storage 150.
  • Query 102 may be generated by a user using client 110 or by an application executing on client 110. Upon receipt, DBMS 140 begins to process query 102. Once processed, the result of the processed query is transmitted to client 110 as query result 104.
  • To process query 102, DBMS 140 includes a parser 162, a normalizer 164, a compiler 166, and an execution unit 168.
  • Parser 162 parses the received queries 102. In an embodiment, parser 162 may convert query 102 into a binary tree data structure which represents the format of query 102. In other embodiments, other types of data structures may be used.
  • When parsing is complete, parser 162 passes the parsed query to a normalizer 164. Normalizer 164 normalizes the parsed query. For example, normalizer 164 eliminates redundant SQL constructs from the parsed query. Normalizer 164 also performs error checking on the parsed query that confirms that the names of the tables in the parsed query conform to the names of tables 180. Normalizer 164 also confirms that relationships among tables 180, as described by the parsed query, are valid.
  • Once normalization is complete, normalizer 164 passes the normalized query to compiler 166. Compiler 166 compiles the normalized query into machine-readable format. The compilation process determines how query 102 is executed by DBMS 140. To ensure that query 102 is executed efficiently, compiler 166 uses a query optimizer 170 to generate an access plan for executing the query.
  • Query optimizer 170 analyzes the query and determines a query plan for executing the query. The query plan retrieves and manipulates information in the database storage 150 in accordance with the query semantics. This may include choosing the access method for each table accessed, choosing the order in which to perform a join operation on the tables, and choosing the join method to be used in each join operation. As there may be multiple strategies for executing a given query using combinations of these operations, query optimizer 170 generates and evaluates a number of strategies from which to select the best strategy to execute the query.
  • In an embodiment, query optimizer 170 generates multiple query plans. Once generated, query optimizer 170 selects a query plan from the multiple query plans to execute the query. The selected query plan may be a cost efficient plan, a query plan that uses the least amount of memory in DBMS 140, a query plan that executes the quickest, or any combination of the above, to give a few examples.
  • In one embodiment, in order for query optimizer 170 to generate and select a query plan, DBMS 140 uses cardinality estimator 172. Cardinality estimator 172 generates an estimate of the size (i.e., number of rows) of a query plan before the query plan is executed. Based on cardinality estimates, query optimizer 170 costs and selects an efficient query plan that executes query 102 from multiple query plans.
  • FIG. 2 is a block diagram 200 of a cardinality estimator, according to an embodiment. In block diagram 200, cardinality estimator receives query 102 as input and generates a cardinality estimate 204 as output.
  • Query 102, such as query Q, accesses data in one or more tables. The one or more tables that are accessed using query Q may be represented as tables Ri, i=1 . . . n. In an embodiment, query Q that accesses tables Ri may be presented as:

  • $$Q = \sigma_p(R_1 \bowtie \cdots \bowtie R_n)$$
  • In an embodiment, each table R has a set of attributes, denoted as attr(R). The list of attributes on a table R may be represented as R[A], where the set of attributes A ⊆ attr(R), A={A1, A2, . . . , An}.
  • In an embodiment, R[A1, A2, . . . , An] is a subset of attributes of table R, such as A={A1, . . . , An} ⊆ attr(R).
  • In an embodiment, query 102 also contains a predicate. A predicate is a condition that may be evaluated in query 102. In an embodiment, a predicate may be a join predicate. A join predicate is a predicate that specifies a join operation that links several tables together or a particular set of attributes when evaluated. An evaluated join predicate typically returns a join table as a result. For example, a join predicate on tables T and S may be represented as equijoin predicates T[A]=S[A], where A={A1, . . . , An} are the attributes on which tables T and S are joined, such that A ⊆ attr(T) and A ⊆ attr(S). The join predicate T[A]=S[A] is a short form for the conjunct predicate

  • $$\bigwedge_{k \in \{1,\dots,n\}} (T[A_k] = S[A_k])$$
  • In an embodiment, to determine a cardinality estimate 204 for query 102, cardinality estimator 172 determines a cardinality estimate over join equivalence classes for a join predicate. A join equivalence class for a predicate may be non-constant or constant. A non-constant join equivalence class ε(p) for a join predicate $p = \bigwedge_{i,j \in \{1,\dots,m\},\, i \neq j} (R_i[A] = R_j[A])$ is defined as a set of attribute subsets participating in the join predicate, $\varepsilon(p) = (\{R_1[A], \dots, R_m[A]\})$, where A={A1, . . . , An} is a set of common attributes for tables Ri, i ∈ {1, . . . , m}.
  • A constant join equivalence class cε(p) for a join predicate $p = \big(\bigwedge_{i,j \in \{1,\dots,m\},\, i \neq j} (R_i[A] = R_j[A])\big) \wedge \big(\bigwedge_{i \in \{1,\dots,m\},\, k \in \{1,\dots,n\}} (R_i[A_k] = c_k)\big)$ is defined as a tuple of a set of attribute subsets and a constant vector, such that $c\varepsilon(p) = (\{R_1[A], \dots, R_m[A]\}, (c_1, \dots, c_n))$, where A={A1, . . . , An} is a set of common attributes for tables Ri, i ∈ {1, . . . , m}. A person skilled in the art will appreciate that a tuple is an ordered set of elements, which is here an ordered pair of a set of attribute subsets and a constant vector.
  • In an embodiment, cardinality estimator 172 may determine a cardinality estimate over a set of join equivalence classes. For example, given a set of join equivalence classes corresponding to the logical expression:

  • $$\sigma_{p(R_1,\dots,R_m)}(R_1 \bowtie R_2 \bowtie \cdots \bowtie R_m)$$
  • cardinality estimator 172 computes the cardinality estimation for a sub-expression involving relations {Ri1, . . . , Rit} ⊆ {R1, . . . , Rm}:

  • $$\sigma_{p'(R_{i_1},\dots,R_{i_t})}(R_{i_1} \bowtie R_{i_2} \bowtie \cdots \bowtie R_{i_t})$$
  • where the predicate p′(Ri1, . . . , Rit) contains all conjuncts of the original predicate p(R1, . . . , Rm) which refer exclusively to the tables (Ri1, . . . , Rit).
  • In an embodiment, to determine cardinality estimate 204 over a set of join equivalence classes, cardinality estimator 172 generates one or more undirected graphs. From the one or more undirected graphs, cardinality estimator 172 identifies minimum spanning trees that link the vertices of the undirected graph. A person skilled in the art will appreciate that a spanning tree of an undirected graph is a tree that connects all vertices in the undirected graph. Additionally, each edge in the undirected graph is assigned a weight, which in the graphs corresponding to the join equivalence classes may represent the quality of the selectivity estimation of the edge.
  • To build an undirected graph G={V, E}, cardinality estimator 172 determines a set of vertices V (or nodes), where each node Ri[A] represents a relation and the relation's attributes that are participating in the join equivalence class. In an embodiment, for a constant join equivalence class, cardinality estimator 172 generates an extra node for the constant vector (c1, . . . , cn) as discussed below. In an embodiment, cardinality estimator 172 also generates a set of edges E of the form (Ri, Rj, (p1, . . . , pn)), for each join predicate (p1, . . . , pn) between Ri and Rj. In an embodiment, cardinality estimator 172 annotates each edge with one or more predicates that correspond to the join predicates between the two nodes of the edge.
  • To build an undirected graph for a constant join equivalence class cε(p)=({R1 [A], . . . , Rm[A]}, (c1, . . . , cn)), cardinality estimator 172 defines the graph of cε(p) as G(cε(p))={V, E}, where V is a set of vertices and E is a set of edges connecting the vertices. In an embodiment the set of vertices V are:

  • $$V = \{R_1[A], \dots, R_m[A], (c_1, \dots, c_n)\}$$
  • and the set of edges E is:

  • $$E = \{(R_i[A], R_j[A], (R_i[A_1] = R_j[A_1], \dots, R_i[A_n] = R_j[A_n])) \mid i,j \in \{1,\dots,m\}\} \cup \{(R_i[A], (c_1,\dots,c_n), (R_i[A_1] = c_1, \dots, R_i[A_n] = c_n)) \mid i \in \{1,\dots,m\}\}$$
  • To build an undirected graph for a non-constant join equivalence class ε(p)=({R1 [A], . . . , Rm[A]}), cardinality estimator 172 defines the graph of ε(p) as G(ε(p))={V, E}, where V is a set of vertices and E is a set of edges connecting the vertices. In an embodiment the set of vertices V are:

  • $$V = \{R_1[A], \dots, R_m[A]\}$$
  • and the set of edges E is:

  • $$E = \{(R_i[A], R_j[A], (R_i[A_1] = R_j[A_1], \dots, R_i[A_n] = R_j[A_n])) \mid i,j \in \{1,\dots,m\}\}$$
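  • The two graph constructions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `build_equivalence_graph`, the string encoding of nodes and predicates, and the choice to emit one edge per unordered pair of relations (the graph is undirected) are all assumptions made for the example.

```python
from itertools import combinations

def build_equivalence_graph(relations, attrs, constants=None):
    """Return (vertices, edges) for one join equivalence class.

    relations: relation names, e.g. ["R1", "R4", "R5"]
    attrs:     the common attributes A, e.g. ["A", "B"]
    constants: optional constant vector (c1, ..., cn); when given, the class
               is treated as a constant join equivalence class.
    Hypothetical helper: names and encodings are assumptions of this sketch.
    """
    label = ",".join(attrs)
    vertices = [f"{r}[{label}]" for r in relations]
    edges = []
    # One edge per unordered pair of relations, annotated with the equijoin
    # predicates R_i[A_k] = R_j[A_k] for every common attribute A_k.
    for ri, rj in combinations(relations, 2):
        preds = tuple(f"{ri}[{a}]={rj}[{a}]" for a in attrs)
        edges.append((f"{ri}[{label}]", f"{rj}[{label}]", preds))
    if constants is not None:
        const_node = "(" + ",".join(constants) + ")"
        vertices.append(const_node)
        # One edge from every relation node to the extra constant node.
        for r in relations:
            preds = tuple(f"{r}[{a}]={c}" for a, c in zip(attrs, constants))
            edges.append((f"{r}[{label}]", const_node, preds))
    return vertices, edges

# Constant class over R1, R4, R5 on attributes (A, B) with constants (cA, cB).
v1, e1 = build_equivalence_graph(["R1", "R4", "R5"], ["A", "B"], ("cA", "cB"))
# Non-constant class over the same relations: no constant node is added.
v0, e0 = build_equivalence_graph(["R1", "R4", "R5"], ["A", "B"])
```

  • With m relations, the sketch yields m(m−1)/2 relation-to-relation edges, plus m edges to the constant node when a constant vector is supplied.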
  • FIG. 3A is an example graph generated using a cardinality estimator, according to an embodiment. In FIG. 3A, DBMS 140 receives query Q, such that:

  • $$Q = \sigma_p(R_1 \bowtie R_2 \bowtie R_3 \bowtie R_4 \bowtie R_5)$$
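  • and where the predicate p includes multiple join predicates p1 to p21 as shown below:
  • $$\begin{aligned}
p = (\; & \underbrace{R_5[A]=cA}_{p_1} \wedge \underbrace{R_5[B]=cB}_{p_2} \wedge \underbrace{R_1[A]=R_5[A]}_{p_3} \wedge \underbrace{R_1[B]=R_5[B]}_{p_4} \wedge \underbrace{R_4[A]=cA}_{p_5} \wedge \underbrace{R_4[B]=cB}_{p_6} \\
\wedge\; & \underbrace{R_1[A]=cA}_{p_7} \wedge \underbrace{R_1[B]=cB}_{p_8} \wedge \underbrace{R_1[A]=R_4[A]}_{p_9} \wedge \underbrace{R_1[B]=R_4[B]}_{p_{10}} \wedge \underbrace{R_4[A]=R_5[A]}_{p_{11}} \wedge \underbrace{R_4[B]=R_5[B]}_{p_{12}} \\
\wedge\; & \underbrace{R_1[A]=R_3[A]}_{p_{13}} \wedge \underbrace{R_3[A]=R_5[A]}_{p_{14}} \wedge \underbrace{R_2[A]=R_5[A]}_{p_{15}} \wedge \underbrace{R_2[A]=cA}_{p_{16}} \wedge \underbrace{R_2[A]=R_4[A]}_{p_{17}} \wedge \underbrace{R_1[A]=R_2[A]}_{p_{18}} \\
\wedge\; & \underbrace{R_3[A]=cA}_{p_{19}} \wedge \underbrace{R_2[A]=R_3[A]}_{p_{20}} \wedge \underbrace{R_3[A]=R_4[A]}_{p_{21}} \;)
\end{aligned}$$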
  • Based on the above query Q, cardinality estimator 172 generates one constant equivalence class corresponding to the predicate p′, where:

  • $$p' = \Big(\big(\bigwedge_{i,j \in \{1,4,5\}} (R_i[A,B] = R_j[A,B])\big) \wedge \big(\bigwedge_{i \in \{1,4,5\}} (R_i[A] = cA \wedge R_i[B] = cB)\big)\Big)$$
  • and the constant equivalence class cε(p′) is:

  • $$c\varepsilon(p') = \{R_1[A,B], R_4[A,B], R_5[A,B], (cA, cB)\}$$
  • Cardinality estimator 172 then represents the constant equivalence class cε(p′) as graph G1=(V1, E1) where:
  •   V1 = {R1[A,B], R4[A,B], R5[A,B], (cA, cB)}
    and
      E1 = {
        (R1[A,B], R4[A,B], (p9;p10)), (R1[A,B], R5[A,B], (p3; p4)),
        (R1[A,B], (cA, cB), (p7; p8)),
        (R4[A,B], R5[A,B], (p11; p12)), (R4[A,B], (cA, cB), (p5; p6)),
        (R5[A,B], (cA, cB), (p1; p2))
        }
  • Graph G1(V1, E1), including the vertices V1 and edges E1 described above is illustrated in diagram 300A in FIG. 3A. Graph G1 is a graph showing a constant equivalence class for the predicate p′. Additionally, FIG. 3A also includes a listing of predicates p1 to p12 that comprise predicate p′ that defines the edges in graph G1.
  • In an embodiment, cardinality estimator 172 also generates a second equivalence class that corresponds to a predicate p″, where:

  • $$p'' = \Big(\big(\bigwedge_{i,j \in \{1,2,3,4,5\}} (R_i[A] = R_j[A])\big) \wedge \big(\bigwedge_{i \in \{1,2,3,4,5\}} (R_i[A] = cA)\big)\Big)$$
  • and where the constant equivalence class cε(p″) is:

  • $$c\varepsilon(p'') = \{R_1[A], R_2[A], R_3[A], R_4[A], R_5[A], (cA)\}$$
  • Cardinality estimator 172 then represents the constant equivalence class cε(p″) as graph G2=(V2, E2) where:
  • V2 = {R1[A], R2[A], R3[A], R4[A], R5[A], (cA)}
    and
    E2 = {
      (R1[A], R2[A], (p18)), (R1[A], R3[A], (p13)), (R1[A], R4[A], (p9)),
      (R1[A], R5[A], (p3)), (R1[A], (cA), (p7)),
      (R2[A], R3[A], (p20)), (R2[A], R4[A], (p17)), (R2[A], R5[A], (p15)),
      (R2[A], (cA), (p16)),
      (R3[A], R4[A], (p21)), (R3[A], R5[A], (p14)), (R3[A], (cA), (p19)),
      (R4[A], R5[A], (p11)), (R4[A], (cA), (p5)),
      (R5[A], (cA), (p1))
    }
  • Graph G2(V2, E2), including the vertices V2 and edges E2 described above, is also illustrated in FIG. 3A. Graph G2 is a diagram of a constant equivalence class for the predicate p″. Additionally, FIG. 3A also includes a listing of predicates p1 to p21 that comprise predicate p″ and define the edges in graph G2.
  • Once cardinality estimator 172 generates graphs for equivalence classes, such as example graphs G1 and G2 for example query Q, cardinality estimator 172 determines the spanning trees in the graphs. From the spanning trees, cardinality estimator 172 determines the minimum spanning tree and calculates the cardinality estimate from the minimum spanning tree as discussed below.
  • As shown in FIG. 3A, there may exist a predicate pi which annotates more than one edge in the equivalence graphs corresponding to a query Q. For example, the predicate p5 annotates the edge (R4[A,B], (cA, cB), (p5, p6)) in graph G1 and the edge (R4[A], (cA), (p5)) in graph G2. When cardinality estimator 172 determines a cardinality estimate, cardinality estimator 172 generates the estimate based on a subset S of the join predicates representing some relationships of the relations {R1, . . . , Rn} that form spanning trees. In an embodiment, cardinality estimator 172 excludes redundant (or repeating) predicates from the spanning tree.
  • For example, let a spanning tree T(G) be a tree in the fully connected, undirected graph G, that includes all vertices of graph G. In this case, a graph G having n vertices has $n^{n-2}$ such spanning trees T(G). FIG. 3B is a diagram 300B of the example spanning trees of graphs G1 and G2, according to an embodiment. The dashed lines between the nodes in graphs G1 and G2 indicate the edges that are included in the spanning trees for graphs G1 and G2. The subset of predicates that are included in the spanning trees for graphs G1 and G2 are p5, p6, p7, p8, p11, p12, p14 and p20, as shown in bold in FIG. 3B.
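  • For a fully connected graph, the number of spanning trees is given by Cayley's formula, n^(n−2). The following brute-force sketch checks that count for small graphs by testing every set of n−1 edges for cycles with a union-find structure. The function name `count_spanning_trees` is hypothetical and the enumeration is only practical for very small n; it is meant to illustrate why exhaustive enumeration of spanning trees quickly becomes infeasible.

```python
from itertools import combinations

def count_spanning_trees(n):
    """Count spanning trees of the complete graph on n vertices by brute force."""
    all_edges = list(combinations(range(n), 2))
    count = 0
    # Any acyclic set of n - 1 edges on n vertices is a spanning tree.
    for subset in combinations(all_edges, n - 1):
        parent = list(range(n))

        def find(x):
            # Union-find lookup with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:        # edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            count += 1
    return count

trees_g1 = count_spanning_trees(4)   # G1 has 4 vertices
trees_g2 = count_spanning_trees(6)   # G2 has 6 vertices
```

  • For the example graphs, G1 (4 vertices) has 4^2 = 16 spanning trees and G2 (6 vertices) has 6^4 = 1296.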
  • To determine a spanning tree from join equivalence graphs, such as G1 and G2, cardinality estimator 172 sets T(G(ε(p))) as a spanning tree of G(ε(p)). The predicates in the spanning tree T(G(ε(p))) can be defined as a set of predicates annotating the edges in the spanning tree T(G(ε(p))). For example, predicates can be defined as:

  • $$\text{Preds}(T(G(\varepsilon(p)))) = \{p_i \mid (v_1, v_2, (p_1, \dots, p_n)) \text{ an edge in } T(G(\varepsilon(p)))\}$$
  • In an embodiment, cardinality estimator 172 applies the following theorem to calculate cardinality estimation for query Q. If T(G(ε(p))) is a spanning tree of G(ε(p)), then the conjunct generated by the spanning tree infers the original predicate p, such as:

  • $$\Big(\bigwedge_{c \in \text{Preds}(T(G(\varepsilon(p))))} c\Big) \Rightarrow p$$
  • Also, any predicate $q \in \{p_i \mid (v_1, v_2, (p_1, \dots, p_n)) \text{ an edge in } G(\varepsilon(p))\}$ is inferred by the conjunct generated by the spanning tree:

  • $$\Big(\bigwedge_{c \in \text{Preds}(T(G(\varepsilon(p))))} c\Big) \Rightarrow q$$
  • Based on the theorem above, cardinality estimator 172 determines that a query Q, such as:

  • $$Q = \sigma_s(R_1 \bowtie \cdots \bowtie R_n)$$
  • where a predicate s has e equivalence classes {ε(s1), . . . , ε(se)}, is equivalent to a query Q′ for which the predicate s was replaced with the predicates induced by the spanning trees {T(G(ε(s1))), . . . , T(G(ε(se)))}.
  • Applying the theorem above, let {T(G(ε(s1))), . . . , T(G(ε(se)))} be spanning trees of a forest for graphs {G(ε(s1)), . . . , G(ε(se))}. In an embodiment, predicates of the forest for the graphs of join equivalence classes are predicates that annotate the spanning trees, with duplicates removed, such as:

  • $$\text{Preds}(\{T(G(\varepsilon(s_1))), \dots, T(G(\varepsilon(s_e)))\}) = \bigcup^{d}_{i \in \{1,\dots,e\}} \text{Preds}(T(G(\varepsilon(s_i))))$$
  • where cardinality estimator 172 uses operation ∪d to remove duplicate predicates.
  • In an embodiment, cardinality estimator 172 rewrites query Q using the predicates in the forest of join equivalence classes Preds({T(G(ε(s1))), . . . , T(G(ε(se)))}), such as:
  • $$Q = \sigma_s(R_1 \bowtie \cdots \bowtie R_n) = \sigma_{?}(R_1 \bowtie \cdots \bowtie R_n)$$ where ? indicates text missing or illegible when filed.
  • In an embodiment, based on the theorem above, cardinality estimator 172 determines a cardinality estimate for query Q0 where:

  • $$Q_0 = \sigma_x(R_1 \bowtie \cdots \bowtie R_n)$$
  • by first rewriting the predicate x such that:
  • $$Q_0 = \sigma_{x'}(\underbrace{\sigma_s(R_1 \bowtie \cdots \bowtie R_n)}_{Q})$$
  • where cardinality estimator 172 can split predicate s into e join equivalence classes.
  • Cardinality estimator 172 then determines the cardinality of query Q0. To determine the cardinality of query Q0, cardinality estimator 172 first determines the selectivity estimate of query Q using the spanning trees for the join predicate s, referred to as selectivity(s). Cardinality estimator 172 then uses selectivity(s), selectivity(x′), and the cardinalities of the relations Ri to compute card(Q0), as illustrated below:

  • $$\text{card}(Q_0) = f(\text{selectivity}(s), \text{selectivity}(x'), \text{cardinality}(R_1), \dots, \text{cardinality}(R_n))$$
  • In an embodiment, selectivity(s)=STSelectivity(Q) and can be computed using the algorithm in FIG. 8.
  • When cardinality estimator 172 applies the above algorithm to graphs G1 and G2 shown in FIGS. 3A and 3B, cardinality estimator 172 generates cardinality estimate 204 for query Q associated with graphs G1 and G2 as:

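  • $$\text{Cardinality Estimate} = \text{sel}(p_5 \wedge p_6) \times \text{sel}(p_7 \wedge p_8) \times \text{sel}(p_{11} \wedge p_{12}) \times \text{sel}(p_{20}) \times \text{sel}(p_{14})$$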
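  • The five factors in the estimate above can be reproduced mechanically by walking the spanning-tree edges of G1 and G2 and skipping predicates already covered by an earlier edge. The sketch below is an assumption-laden illustration, not the algorithm of FIG. 8: the edge lists are chosen to be consistent with the predicate subset listed for FIG. 3B (the exact edges in the figure are assumed), the selectivity values in `SEL` are invented for the example, and `st_selectivity` is a hypothetical name.

```python
from math import prod

# Spanning-tree edges as (node, node, predicates). ASSUMED to match FIG. 3B;
# they use exactly the predicates p5, p6, p7, p8, p11, p12, p14 and p20.
tree_g1 = [("R4[A,B]", "(cA,cB)", ("p5", "p6")),
           ("R1[A,B]", "(cA,cB)", ("p7", "p8")),
           ("R4[A,B]", "R5[A,B]", ("p11", "p12"))]
tree_g2 = [("R4[A]", "(cA)", ("p5",)),
           ("R1[A]", "(cA)", ("p7",)),
           ("R4[A]", "R5[A]", ("p11",)),
           ("R2[A]", "R3[A]", ("p20",)),
           ("R3[A]", "R5[A]", ("p14",))]

# Hypothetical selectivities, for illustration only.
SEL = {("p5", "p6"): 0.01, ("p7", "p8"): 0.02, ("p11", "p12"): 0.05,
       ("p20",): 0.1, ("p14",): 0.2}

def st_selectivity(trees, sel):
    """Multiply the selectivities of tree edges, ignoring duplicate predicates."""
    covered, factors = set(), []
    for tree in trees:
        for _u, _v, preds in tree:
            # Keep only predicates not already accounted for by an earlier edge.
            fresh = tuple(p for p in preds if p not in covered)
            if fresh:
                factors.append(sel[fresh])
                covered.update(fresh)
    return prod(factors), factors

estimate, factors = st_selectivity([tree_g1, tree_g2], SEL)
```

  • In this sketch the edges of G2 annotated with p5, p7 and p11 contribute no factor, because those predicates were already covered by edges of G1; only p20 and p14 add new factors, giving five factors in total.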
  • FIG. 4 is a flowchart of a method 400 of a cardinality estimator generating a cardinality estimate for a query Q, according to an embodiment. Method 400 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one implementation, the method 400 is performed by cardinality estimator 172.
  • At operation 402, the cardinality estimator receives a query Q. For example, cardinality estimator 172 receives query Q, where query $Q = \sigma_s(R_1 \bowtie \cdots \bowtie R_n)$ and the predicate s can be split into e equivalence classes.
  • At operation 404, the cardinality estimator determines the minimum set of equivalence classes. For example, cardinality estimator 172 determines e equivalence classes {ε(s1), . . . , ε(se)}. As discussed above, equivalence classes are determined based on sets of common attributes that are included in tables joined in query Q.
  • At operation 406, the cardinality estimator generates join equivalence graphs based on the minimum set of equivalence classes. For example, cardinality estimator 172 generates a forest of join equivalence graphs, each graph corresponding to a join equivalence class ε(si), where i=1 to e.
  • At operation 408, the cardinality estimator determines the minimum spanning trees for cardinality estimation. For example, in the join equivalence graphs, the edges represent predicate attributes, where each edge has a weight computed by DBMS 140. In an embodiment, the weight for an edge representing a join predicate between two tables is a property vector of the edge, including, for example, the confidence level of the selectivity estimation for the edge and the quality of the selectivity estimation expressed as the properties of the join predicates. Such properties may include a type of relationship between the two tables. Example properties include whether the join predicate is of the form “primary key=primary key”, “primary key=foreign key”, “foreign key=foreign key”, “unique constraint attributes=attributes”, or “index attributes=attributes”. When comparing the weights of two edges (during minimum spanning tree building), a function which compares the two property vectors of the edges can be used, according to an embodiment.
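  • One way to picture a property-vector weight and its comparison function is sketched below. Everything in it is an assumption made for illustration: the source does not give a ranking of relationship types, so the order in `RELATIONSHIP_RANK`, the `EdgeProperties` fields, and the tie-breaking rule (higher confidence wins) are hypothetical.

```python
from dataclasses import dataclass

# HYPOTHETICAL ranking of join-predicate relationship types; lower is better.
RELATIONSHIP_RANK = {
    "pk=pk": 0,
    "pk=fk": 1,
    "fk=fk": 2,
    "unique=attrs": 3,
    "index=attrs": 4,
    "other": 5,
}

@dataclass(frozen=True)
class EdgeProperties:
    relationship: str   # key into RELATIONSHIP_RANK
    confidence: float   # confidence level of the selectivity estimate, 0..1

    def sort_key(self):
        # Smaller key means "lighter" edge: better relationship type first,
        # then higher confidence as the tie-breaker.
        return (RELATIONSHIP_RANK[self.relationship], -self.confidence)

def lighter(a, b):
    """Return the property vector that a minimum spanning tree should prefer."""
    return a if a.sort_key() <= b.sort_key() else b

pk_fk = EdgeProperties("pk=fk", 0.90)
fk_fk = EdgeProperties("fk=fk", 0.99)
preferred = lighter(pk_fk, fk_fk)
```

  • Reducing the vector to a sortable key keeps the comparison total and deterministic, which is what a spanning-tree algorithm needs when it orders edges.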
  • In an embodiment, cardinality estimator 172 determines the minimum spanning trees as the trees that include all vertices in the join equivalence graphs, such that the nodes of the graphs are connected using the edges associated with the lowest weights. In an embodiment, cardinality estimator 172 defines the spanning trees as {T(G(ε(s1))), . . . , T(G(ε(se)))}.
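  • A minimum spanning tree of a join equivalence graph can be found with any standard method. The sketch below uses Kruskal's algorithm, which is swapped in here as a familiar textbook technique; the algorithm of FIG. 10 is not reproduced in this text and may differ. The integer weights attached to the edges of G1 are hypothetical, and `minimum_spanning_tree` is an illustrative name.

```python
def minimum_spanning_tree(vertices, weighted_edges):
    """Kruskal's algorithm. weighted_edges holds (weight, u, v, preds) tuples."""
    parent = {v: v for v in vertices}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _weight, u, v, preds in sorted(weighted_edges, key=lambda e: e[0]):
        ru, rv = find(u), find(v)
        if ru != rv:            # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v, preds))
    return tree

# Graph G1 from the example, with HYPOTHETICAL weights (lower is better).
vertices = ["R1[A,B]", "R4[A,B]", "R5[A,B]", "(cA,cB)"]
weighted_edges = [
    (4, "R1[A,B]", "R4[A,B]", ("p9", "p10")),
    (5, "R1[A,B]", "R5[A,B]", ("p3", "p4")),
    (2, "R1[A,B]", "(cA,cB)", ("p7", "p8")),
    (3, "R4[A,B]", "R5[A,B]", ("p11", "p12")),
    (1, "R4[A,B]", "(cA,cB)", ("p5", "p6")),
    (6, "R5[A,B]", "(cA,cB)", ("p1", "p2")),
]
tree = minimum_spanning_tree(vertices, weighted_edges)
```

  • With these assumed weights the tree keeps the three lightest edges, whose annotations (p5, p6), (p7, p8) and (p11, p12) match the predicates listed for graph G1 in FIG. 3B.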
  • At operation 410, the cardinality estimator uses the predicates associated with the spanning trees to determine the cardinality estimate for query Q. For example, cardinality estimator 172 determines the cardinality estimate for query Q based on the predicates associated with the edges of the minimum spanning trees determined in operation 408, multiplying together the selectivities of these predicates, such as:

  • $$\text{selectivity}(s) = \prod_{e \in \text{Edges}(\{T(G(\varepsilon(s_1))), \dots, T(G(\varepsilon(s_e)))\})} \text{selectivity}(e)$$
  • Once cardinality estimator 172 determines a cardinality estimate 204 for query Q, query optimizer 170 uses cardinality estimate 204 to determine a query plan for query Q.
  • In an embodiment, cardinality estimator 172 also determines a cardinality estimate for a sub-expression of query Q, such as sub-expression Q′. FIG. 5 is a flowchart of a method 500 of a cardinality estimator generating a cardinality estimate for a sub-expression Q′ of query Q, according to an embodiment. Method 500 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one implementation, the method 500 is performed by cardinality estimator 172.
  • At operation 502, the cardinality estimator receives a query Q. For example, cardinality estimator 172 receives query Q, where query $Q = \sigma_s(R_1 \bowtie \cdots \bowtie R_n)$ and the predicate s can be split into e equivalence classes.
  • At operation 504, the cardinality estimator receives a subset of tables. For example, cardinality estimator 172 receives a subset of tables that query Q manipulates, such as {Ri1, . . . , Rit} ⊆ {R1, . . . , Rn}.
  • At operation 506, the cardinality estimator determines the equivalence classes.
  • For example, cardinality estimator 172 determines e equivalence classes {ε(s1), . . . , ε(se)}. As discussed above, equivalence classes are determined based on sets of common attributes that are included in tables joined in query Q.
  • At operation 508, the cardinality estimator generates join equivalence graphs. For example, cardinality estimator 172 generates a forest of join equivalence graphs, one graph for each join equivalence class ε(si), where i=1 to e.
  • At operation 510, the cardinality estimator determines the sub-graphs from the join equivalence graphs that include the subset of tables. For example, for each graph G(ε(si)), cardinality estimator 172 determines the vertex induced sub-graph G′(ε(si)), where the vertex induced sub-graph G′(ε(si)) includes only vertices {Ri1, . . . , Rit}. In an embodiment, cardinality estimator 172 represents the join equivalence sub-graphs as {G′(ε(s1)), . . . , G′(ε(se))}.
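  • The vertex induced sub-graph step can be illustrated on graph G2 from the example. This is a sketch under assumptions: the helper name `induced_subgraph` and the string encoding of nodes are hypothetical, and because the text does not say whether the constant node is retained for a constant class, the sketch exposes that choice as a flag that defaults to the literal reading (only relation vertices are kept).

```python
def induced_subgraph(vertices, edges, tables, keep_constant_node=False):
    """Keep vertices whose relation is in `tables` and the edges between them."""
    def is_kept(node):
        if node.startswith("("):          # constant-vector node such as "(cA)"
            return keep_constant_node     # ASSUMPTION: not specified in the text
        return node.split("[", 1)[0] in tables

    kept = [v for v in vertices if is_kept(v)]
    kept_set = set(kept)
    sub_edges = [(u, v, p) for u, v, p in edges if u in kept_set and v in kept_set]
    return kept, sub_edges

# Graph G2 from the example: relation-to-relation and relation-to-constant edges.
pairs = {("R1", "R2"): "p18", ("R1", "R3"): "p13", ("R1", "R4"): "p9",
         ("R1", "R5"): "p3", ("R2", "R3"): "p20", ("R2", "R4"): "p17",
         ("R2", "R5"): "p15", ("R3", "R4"): "p21", ("R3", "R5"): "p14",
         ("R4", "R5"): "p11"}
consts = {"R1": "p7", "R2": "p16", "R3": "p19", "R4": "p5", "R5": "p1"}
v2 = [f"R{i}[A]" for i in range(1, 6)] + ["(cA)"]
e2 = [(f"{a}[A]", f"{b}[A]", (p,)) for (a, b), p in pairs.items()]
e2 += [(f"{r}[A]", "(cA)", (p,)) for r, p in consts.items()]

sub_v, sub_e = induced_subgraph(v2, e2, {"R1", "R2", "R3"})
sub_v_c, sub_e_c = induced_subgraph(v2, e2, {"R1", "R2", "R3"}, keep_constant_node=True)
```

  • For the subset {R1, R2, R3}, the sub-graph keeps the three edges annotated with p18, p13 and p20; with the constant node retained it additionally keeps the edges annotated with p7, p16 and p19.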
  • At operation 512, the cardinality estimator determines the minimum spanning trees for cardinality estimation. For example, in the join equivalence sub-graphs {G′(ε(s1)), . . . , G′(ε(se))}, the edges represent predicate attributes, where each predicate attribute has a weight specified by DBMS 140 as described above. In an embodiment, cardinality estimator 172 determines the minimum spanning trees as the trees that include all vertices in the join equivalence sub-graphs G′(ε(si)), such that the nodes of the graphs are connected using the edges associated with the lowest weights. In an embodiment, cardinality estimator 172 defines the spanning trees as {T(G′(ε(s1))), . . . , T(G′(ε(se)))}.
  • At operation 514, the cardinality estimator uses the predicates associated with the spanning trees to determine the cardinality estimate for sub-query Q′. For example, cardinality estimator 172 determines the selectivity estimation for sub-query Q′ based on the edges of the best spanning trees determined in operation 512, multiplying together the selectivities of the predicates associated with those edges, such as:

  • $$\text{selectivity}(s') = \prod_{e \in \text{Edges}(\{T(G'(\varepsilon(s_1))), \dots, T(G'(\varepsilon(s_e)))\})} \text{selectivity}(e)$$
  • FIG. 6 is a diagram of an algorithm 600 for computing the cardinality of a query, according to an embodiment. Algorithm 600 may be implemented by cardinality estimator 172 to generate a cardinality estimate 204. Once cardinality estimator 172 receives query $Q_0 = \sigma_x(R_1 \bowtie \cdots \bowtie R_n)$, cardinality estimator 172 uses algorithm 600 to rewrite the predicate x to infer new predicates that complete join equivalence classes, such that $Q_0 = \sigma_{x'}(\sigma_s(R_1 \bowtie \cdots \bowtie R_n))$, where $Q = \sigma_s(R_1 \bowtie \cdots \bowtie R_n)$, and predicate s of query Q is split into e join equivalence classes. Algorithm 600 then uses algorithms 700-1000 discussed in FIGS. 7-10 to generate cardinality estimate 204 for query Q. The selectivity estimate for Q is computed by algorithm 800 (the STSelectivity(Q) call). Algorithm 600 uses this new selectivity estimate and other estimates to compute the final cardinality estimate for the whole query Q0 (at line 3). The cardinality estimate card(Q0) is computed as:

  • $$\mathrm{card}(Q_0) = f(\mathrm{selectivity}(s), \mathrm{selectivity}(x'), \mathrm{cardinality}(R_1), \ldots, \mathrm{cardinality}(R_n))$$
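  • The disclosure does not spell out the function f in this passage. As a minimal sketch, one common choice is to assume that the predicates s and x′ filter independently and to scale the size of the cross product of the tables by both selectivities. The function name `estimate_cardinality`, the independence assumption, and the sample numbers are illustrative assumptions, not the method of algorithm 600.

```python
from math import prod


def estimate_cardinality(sel_s, sel_x_prime, table_cardinalities):
    """Hypothetical stand-in for f; assumes s and x' filter independently."""
    for sel in (sel_s, sel_x_prime):
        if not 0.0 <= sel <= 1.0:
            raise ValueError("selectivity must lie in [0, 1]")
    # Size of the unfiltered cross product R1 x ... x Rn.
    cross_product_size = prod(table_cardinalities)
    return sel_s * sel_x_prime * cross_product_size


# Illustrative numbers only: three tables and two made-up selectivities.
card_q0 = estimate_cardinality(0.001, 0.5, [1000, 200, 50])
```

  • Other forms of f are possible, for example ones that account for correlation between s and x′; the multiplicative form is used here only because it is the simplest to state.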
  • FIG. 7 is a diagram of an algorithm 700 for determining join equivalence classes for a query, as required by cardinality estimator 172 during operations 404 and 506, according to an embodiment.
  • As discussed above, cardinality estimator 172 determines join equivalence classes for a predicate. The input to algorithm 700 is a query Q, such as:

  • $$Q = \sigma_s(R_1 \bowtie \cdots \bowtie R_n)$$
  • which can be decomposed into one or more equivalence classes. In an embodiment, the output of algorithm 700 is a minimum set of equivalence classes, such as {ε(s1), . . . , ε(se)}. The details of algorithm 700 are included in FIG. 7.
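  • The details of algorithm 700 appear only in FIG. 7, so the following is a simplified sketch rather than a transcription of it. It assumes the predicate s is a conjunction of single-attribute equalities and groups attributes that are transitively equal using a union-find structure. The function name `join_equivalence_classes` and the `(table, attribute)` tuple representation are hypothetical.

```python
def join_equivalence_classes(equalities):
    """Group attributes connected by equality predicates (union-find sketch)."""
    parent = {}

    def find(attr):
        parent.setdefault(attr, attr)
        while parent[attr] != attr:
            parent[attr] = parent[parent[attr]]  # path halving
            attr = parent[attr]
        return attr

    for left, right in equalities:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two classes

    classes = {}
    for attr in list(parent):
        classes.setdefault(find(attr), set()).add(attr)
    return list(classes.values())


# Hypothetical predicate: R1.a = R2.a AND R2.a = R3.a AND R1.b = R3.b
equalities = [
    (("R1", "a"), ("R2", "a")),
    (("R2", "a"), ("R3", "a")),
    (("R1", "b"), ("R3", "b")),
]
classes = join_equivalence_classes(equalities)
```

  • In this example the three equalities collapse into two classes, one over attribute a of R1, R2 and R3 and one over attribute b of R1 and R3. Attribute vectors and constant vectors, which the claims also mention, are outside the scope of this sketch.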
  • FIG. 8 is a diagram of an algorithm 800 for computing a selectivity estimate for a query using spanning trees, as used by cardinality estimator 172 during operation 204, according to an embodiment. The input to algorithm 800 is a query Q, such as:

  • $$Q = \sigma_s(R_1 \bowtie \cdots \bowtie R_n)$$
  • The output of algorithm 800 is the selectivity of a predicate s. In an embodiment, algorithm 800 uses algorithm 700 to generate a minimum set of the equivalence classes, such as set {ε(s1), . . . , ε(se)}, at step 1. From the minimum set of the equivalence classes, cardinality estimator 172 generates graphs {G(ε(s1)), . . . , G(ε(se))}, one graph for each equivalence class, at step 2. Cardinality estimator 172 then uses algorithm 1000 to generate minimum spanning trees {T(G(ε(s1))), . . . , T(G(ε(se)))} from the graphs {G(ε(s1)), . . . , G(ε(se))}, at step 3. At step 4, cardinality estimator 172 uses the minimum spanning trees {T(G(ε(s1))), . . . , T(G(ε(se)))} to determine the selectivity of predicate s.
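  • Step 2 above, generating one graph per equivalence class, can be sketched as follows. The sketch assumes each class is a set of `(table, attribute)` tuples and that every pair of attributes in a class is connected by an edge annotated with the corresponding equality predicate. The complete-graph assumption, the function name `build_equivalence_graph` and the dictionary layout of an edge are illustrative choices, not taken from FIG. 8.

```python
from itertools import combinations


def build_equivalence_graph(eq_class):
    """Return (nodes, edges) for one class; each edge carries its predicate text."""
    nodes = sorted(eq_class)
    edges = [
        {"u": u, "v": v, "predicate": f"{u[0]}.{u[1]} = {v[0]}.{v[1]}"}
        for u, v in combinations(nodes, 2)
    ]
    return nodes, edges


# Hypothetical equivalence classes for a three-table query.
classes = [
    {("R1", "a"), ("R2", "a"), ("R3", "a")},
    {("R1", "b"), ("R3", "b")},
]
forest = [build_equivalence_graph(c) for c in classes]
```

  • The resulting forest has one graph per class: a triangle for the class on attribute a and a single edge for the class on attribute b. Edge weights and selectivities would be attached to these edges before spanning trees are computed.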
  • FIG. 9 is a diagram of an algorithm 900 for computing the selectivity estimate for a sub-query using spanning trees, according to an embodiment. The first input to algorithm 900 is a query Q, such as:

  • $$Q = \sigma_s(R_1 \bowtie \cdots \bowtie R_n)$$
  • The second input to algorithm 900 is a subset of the tables in query Q, such as:

  • $$\{R_{i_1}, \ldots, R_{i_t}\} \subseteq \{R_1, \ldots, R_n\}$$
  • The output of algorithm 900 is an estimated selectivity of a sub-predicate s′, where s′ includes the predicates on the subset {Ri1, . . . , Rit}. In an embodiment, algorithm 900 uses algorithm 700 to generate a minimum set of the equivalence classes, such as set {ε(s1), . . . , ε(se)}, at step 1. From the minimum set of the equivalence classes, cardinality estimator 172 generates graphs {G(ε(s1)), . . . , G(ε(se))}, one graph for each equivalence class, at step 2. Cardinality estimator 172 then computes a vertex-induced sub-graph G′(ε(si)) for the vertices of the subset of tables {Ri1, . . . , Rit}, at step 3. At step 4, cardinality estimator 172 determines, using algorithm 1000, the minimum spanning trees {T(G′(ε(s1))), . . . , T(G′(ε(se)))} from the new forest of sub-graphs {G′(ε(s1)), . . . , G′(ε(se))}. At step 5, cardinality estimator 172 uses the minimum spanning trees {T(G′(ε(s1))), . . . , T(G′(ε(se)))} to determine the selectivity of sub-predicate s′, such as:

  • $$\mathrm{selectivity}(s') = \prod_{e \in \mathrm{Edges}(\{T(G'(\varepsilon(s_1))), \ldots, T(G'(\varepsilon(s_e)))\})} \mathrm{selectivity}(e)$$
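  • Steps 3 and 5 of algorithm 900 can be illustrated with a short sketch. It assumes nodes are `(table, attribute)` tuples and edges are dictionaries with hypothetical keys `u`, `v` and `selectivity`; the function names and the numbers are illustrative and not taken from FIG. 9. The example restricts a three-table class to the subset {R1, R2}, so the induced sub-graph has a single edge and is already its own spanning tree.

```python
from math import prod


def induced_subgraph(nodes, edges, tables):
    """Keep only vertices whose table is in `tables`, and the edges between them."""
    kept = [n for n in nodes if n[0] in tables]
    kept_set = set(kept)
    sub_edges = [e for e in edges if e["u"] in kept_set and e["v"] in kept_set]
    return kept, sub_edges


def selectivity_of_trees(tree_edges):
    """Multiply the selectivities of all spanning-tree edges (1 if there are none)."""
    return prod(e["selectivity"] for e in tree_edges)


# Hypothetical graph for one equivalence class over R1, R2 and R3.
nodes = [("R1", "a"), ("R2", "a"), ("R3", "a")]
edges = [
    {"u": ("R1", "a"), "v": ("R2", "a"), "selectivity": 0.01},
    {"u": ("R2", "a"), "v": ("R3", "a"), "selectivity": 0.02},
    {"u": ("R1", "a"), "v": ("R3", "a"), "selectivity": 0.05},
]
sub_nodes, sub_edges = induced_subgraph(nodes, edges, {"R1", "R2"})
# With two vertices and one edge, the sub-graph is trivially a spanning tree.
sel_s_prime = selectivity_of_trees(sub_edges)
```

  • For larger subsets the induced sub-graph may contain cycles, in which case a spanning tree has to be selected before the product is taken.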
  • FIG. 10 is a diagram of an algorithm 1000 for computing the minimum spanning trees for a graph forest of join equivalence classes, as required by cardinality estimator 172 during operations 408 and 512, according to an embodiment. As discussed above, the edges in a graph, such as a graph in the forest {G(ε(s1)), . . . , G(ε(se))}, have different weights. Cardinality estimator 172 may assign the weights to the edges based on pre-configured criteria in DBMS 140. In an embodiment, cardinality estimator 172 may evaluate the weights of the edges using a betterQuality(ei, ej) function, which compares two edges ei and ej and determines which edge is of better quality for inclusion in a minimum spanning tree. The betterQuality(ei, ej) function is included as part of algorithm 1000. Based on the quality of the edges as determined by the weights, cardinality estimator 172 generates the minimum spanning trees from the forest of graphs {G(ε(s1)), . . . , G(ε(se))}.
  • In an embodiment, the input to algorithm 1000 is the forest of graphs generated by the minimum set of join equivalence classes for query Q, such as minε(Q)={G(ε(s1)), . . . , G(ε(se))}. The output of algorithm 1000 is a set of best spanning trees {T(G(ε(s1))), . . . , T(G(ε(se)))}.
  • In an embodiment, cardinality estimator 172 collects the edges in the forest of graphs {G(ε(s1)), . . . , G(ε(se))} in a set E, where E may be defined as:

  • $$E = \bigcup_{i \in \{1, \ldots, e\}} \mathrm{Edges}(G(\varepsilon(s_i)))$$
  • Cardinality estimator 172 then uses algorithm 1000 to traverse the edges in set E and compare them using the betterQuality(ei, ej) function. When cardinality estimator 172 identifies an edge using the betterQuality(ei, ej) function, such as the edge having the lowest weight, algorithm 1000 adds the edge to a set of edges that together form the minimum spanning trees. In an embodiment, algorithm 1000 includes an add(d) function, where d is the edge identified using the betterQuality(ei, ej) function that cardinality estimator 172 attempts to add to the minimum spanning trees. The add(d) function is shown in detail in FIG. 10.
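  • The traversal described above resembles Kruskal's algorithm, and the following sketch uses that technique as a stand-in; it is not a transcription of FIG. 10. The names `better_quality` and `add` mirror betterQuality(ei, ej) and add(d), but their bodies are assumptions: edges are compared by a hypothetical numeric `weight` key, and `add` rejects an edge that would close a cycle. Because vertices of different equivalence classes are distinct, processing the whole set E at once yields one spanning tree per graph.

```python
from functools import cmp_to_key


def better_quality(e_i, e_j):
    """Hypothetical comparator: negative when e_i should be tried before e_j."""
    return (e_i["weight"] > e_j["weight"]) - (e_i["weight"] < e_j["weight"])


def best_spanning_trees(edges):
    """Kruskal-style sketch over the edge set E of the whole forest."""
    parent = {}
    tree_edges = []

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def add(d):
        """Try to add edge d; refuse it if both ends are already connected."""
        root_u, root_v = find(d["u"]), find(d["v"])
        if root_u == root_v:
            return False
        parent[root_v] = root_u
        tree_edges.append(d)
        return True

    for d in sorted(edges, key=cmp_to_key(better_quality)):
        add(d)
    return tree_edges


# Hypothetical edge set E for two equivalence classes.
E = [
    {"u": ("R1", "a"), "v": ("R3", "a"), "weight": 3},
    {"u": ("R1", "a"), "v": ("R2", "a"), "weight": 1},
    {"u": ("R2", "a"), "v": ("R3", "a"), "weight": 2},
    {"u": ("R1", "b"), "v": ("R3", "b"), "weight": 1},
]
trees = best_spanning_trees(E)
```

  • In this example the weight-3 edge between R1.a and R3.a is rejected because R1.a and R3.a are already connected through R2.a, leaving two edges for the first class and one for the second.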
  • Various embodiments can be implemented, for example, using one or more well-known computer systems, such as computer system 1100 shown in FIG. 11. Computer system 1100 can be any well-known computer capable of performing the functions described herein, such as computers available from International Business Machines, Apple, Sun, HP, Dell, Sony, Toshiba, etc.
  • Computer system 1100 includes one or more processors (also called central processing units, or CPUs), such as a processor 1104. Processor 1104 is connected to a communication infrastructure or bus 1106.
  • One or more processors 1104 may each be a graphics processing unit (GPU). In an embodiment, a GPU is a processor that is a specialized electronic circuit designed to rapidly process mathematically intensive applications on electronic devices. The GPU may have a highly parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images and videos.
  • Computer system 1100 also includes user input/output device(s) 1103, such as monitors, keyboards, pointing devices, etc., which communicate with communication infrastructure 1106 through user input/output interface(s) 1102.
  • Computer system 1100 also includes a main or primary memory 1108, such as random access memory (RAM). Main memory 1108 may include one or more levels of cache. Main memory 1108 has stored therein control logic (i.e., computer software) and/or data.
  • Computer system 1100 may also include one or more secondary storage devices or memory 1110. Secondary memory 1110 may include, for example, a hard disk drive 1112 and/or a removable storage device or drive 1114. Removable storage drive 1114 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
  • Removable storage drive 1114 may interact with a removable storage unit 1118. Removable storage unit 1118 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage unit 1118 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. Removable storage drive 1114 reads from and/or writes to removable storage unit 1118 in a well-known manner.
  • According to an exemplary embodiment, secondary memory 1110 may include other means, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 1100. Such means, instrumentalities or other approaches may include, for example, a removable storage unit 1122 and an interface 1120. Examples of the removable storage unit 1122 and the interface 1120 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
  • Computer system 1100 may further include a communication or network interface 1124. Communication interface 1124 enables computer system 1100 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number 1128). For example, communication interface 1124 may allow computer system 1100 to communicate with remote devices 1128 over communications path 1126, which may be wired and/or wireless, and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 1100 via communication path 1126.
  • In an embodiment, a tangible apparatus or article of manufacture comprising a tangible computer useable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 1100, main memory 1108, secondary memory 1110, and removable storage units 1118 and 1122, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 1100), causes such data processing devices to operate as described herein.
  • Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use the embodiments using data processing devices, computer systems and/or computer architectures other than that shown in FIG. 11. In particular, embodiments may operate with software, hardware, and/or operating system implementations other than those described herein.
  • It is to be appreciated that the Detailed Description section, and not the Summary and Abstract sections (if any), is intended to be used to interpret the claims. The Summary and Abstract sections (if any) may set forth one or more but not all exemplary embodiments as contemplated by the inventor(s), and thus, are not intended to limit the embodiments or the appended claims in any way.
  • While the embodiments have been described herein with reference to exemplary embodiments for exemplary fields and applications, it should be understood that the embodiments are not limited thereto. Other embodiments and modifications thereto are possible, and are within the scope and spirit of the embodiments. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures and/or described herein. Further, embodiments (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein.
  • Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments may perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein.
  • References herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases, indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein.
  • The breadth and scope of the embodiments should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (20)

What is claimed is:
1. A computer-implemented method for generating a cardinality estimate, comprising:
identifying a predicate in a query, wherein the predicate is split into a plurality of equivalence classes;
generating a plurality of equivalence graphs from the plurality of equivalence classes;
identifying spanning trees from the plurality of equivalence graphs; and
generating a cardinality estimate based on the spanning trees.
2. The computer-implemented method of claim 1, wherein the query manipulates data in a plurality of tables, wherein a table includes a plurality of attributes.
3. The computer-implemented method of claim 1, wherein an equivalence class in the plurality of the equivalence classes shares a set of attributes common to the plurality of tables.
4. The computer-implemented method of claim 1, wherein an equivalence class in the plurality of the equivalence classes comprises a constant vector.
5. The computer-implemented method of claim 1, wherein generating an equivalence graph in the plurality of equivalence graphs, further comprises:
generating a first node and a second node, wherein the first node corresponds to a first table and a set of attributes and the second node corresponds to a second table and a set of attributes, wherein the first table, the second table and the attributes are included in an equivalence class from the plurality of the equivalence classes;
generating an edge between the first node and the second node; and
annotating the edge with a second predicate referencing the attributes between the first node and the second node, wherein the second predicate is a component of the query predicate.
6. The computer-implemented method of claim 5, wherein generating the equivalence graph further comprises:
generating a third node including a constant vector;
generating a second edge between a first node and a third node; and
annotating the second edge with a third predicate referencing the attributes between the second table and the constant vector, wherein the third predicate is a component of the query predicate.
7. The computer-implemented method of claim 1, wherein generating the cardinality estimate from the spanning trees, further comprises:
identifying a set of predicates in the spanning trees; and
multiplying for each predicate in the set of predicates by a selectivity associated with each edge corresponding to the predicate.
8. A system for generating a cardinality estimate, comprising:
a memory; and
a processor coupled to the memory and configured to:
identify a predicate in a query, wherein the predicate is split into a plurality of equivalence classes;
generate a plurality of equivalence graphs from the plurality of equivalence classes;
identify spanning trees from the plurality of equivalence graphs; and
generate a cardinality estimate based on the spanning trees.
9. The system of claim 8, wherein the query manipulates data in a plurality of tables, wherein a table includes a plurality of attributes.
10. The system of claim 8, wherein an equivalence class in the plurality of equivalence classes shares a set of attributes common to the plurality of tables.
11. The system of claim 8, wherein an equivalence class in the plurality of equivalence classes comprises a constant vector.
12. The system of claim 8, wherein to generate an equivalence graph in the plurality of the equivalence graphs, the processor is further configured to:
generate a first node and a second node, wherein the first node corresponds to a first table and a set of attributes and the second node corresponds to a second table and a set of attributes, wherein the first table, the second table and the attributes are included in an equivalence class in the plurality of equivalence classes;
generate an edge between the first node and the second node; and
annotate the edge with a second predicate referencing the attribute between the first node and the second node, wherein the second predicate is a component of the query predicate.
13. The system of claim 12, wherein to generate the equivalence graph, the processor is further configured to:
generate a third node including a constant vector;
generate a second edge between a first node and a third node; and
annotate the second edge with a third predicate referencing the attributes between the second table and the constant vector, wherein the third predicate is a component of the query predicate.
14. The system of claim 8, wherein to generate the cardinality estimate from the spanning trees, the processor is further configured to:
identify a set of predicates in the spanning trees; and
multiply each predicate in the set of predicates by a selectivity associated with each edge corresponding to the predicate.
15. A tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations that generate a cardinality estimate, the operations comprising:
identifying a predicate in a query, wherein the predicate is split into a plurality of equivalence classes;
generating a plurality of equivalence graphs from the plurality of equivalence classes;
identifying spanning trees from the plurality of equivalence graphs; and
generating a cardinality estimate based on the spanning trees.
16. The tangible computer-readable device of claim 15, wherein the query manipulates data in a plurality of tables, wherein a table includes a plurality of attributes.
17. The tangible computer-readable device of claim 15, wherein an equivalence class in the plurality of equivalence classes shares a set of attributes common to the plurality of tables.
18. The tangible computer-readable device of claim 15, wherein an equivalence class in the plurality of equivalence classes comprises a constant vector.
19. The tangible computer-readable device of claim 15, wherein generating an equivalence graph in the plurality of equivalence graphs, further comprises operations comprising:
generating a first node and a second node, wherein the first node corresponds to a first table and a set of attributes and the second node corresponds to a second table and a set of attributes, wherein the first table, the second table and the attributes are included in an equivalence class from the plurality of the equivalence classes;
generating an edge between the first node and the second node; and
annotating the edge with a second predicate referencing the attributes between the first node and the second node, wherein the second predicate is a component of the query predicate.
20. The tangible computer-readable device of claim 15, wherein generating the cardinality estimate from the spanning trees, further comprises operations, comprising:
identifying a set of predicates in the spanning trees; and
multiplying for each predicate in the set of predicates by a selectivity associated with each edge corresponding to the predicate.
US14/145,777 2013-12-31 2013-12-31 Cardinality estimation using spanning trees Active 2035-10-09 US9922088B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/145,777 US9922088B2 (en) 2013-12-31 2013-12-31 Cardinality estimation using spanning trees

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/145,777 US9922088B2 (en) 2013-12-31 2013-12-31 Cardinality estimation using spanning trees

Publications (2)

Publication Number Publication Date
US20150186461A1 true US20150186461A1 (en) 2015-07-02
US9922088B2 US9922088B2 (en) 2018-03-20

Family

ID=53482012

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/145,777 Active 2035-10-09 US9922088B2 (en) 2013-12-31 2013-12-31 Cardinality estimation using spanning trees

Country Status (1)

Country Link
US (1) US9922088B2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100223276A1 (en) * 2007-03-27 2010-09-02 Faleh Jassem Al-Shameri Automated Generation of Metadata for Mining Image and Text Data
US20110029508A1 (en) * 2009-07-31 2011-02-03 Al-Omari Awny K Selectivity-based optimized-query-plan caching
US8086598B1 (en) * 2006-08-02 2011-12-27 Hewlett-Packard Development Company, L.P. Query optimizer with schema conversion
US20120278307A1 (en) * 2009-04-09 2012-11-01 Paraccel, Inc. System and method for processing database queries

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8086598B1 (en) * 2006-08-02 2011-12-27 Hewlett-Packard Development Company, L.P. Query optimizer with schema conversion
US20100223276A1 (en) * 2007-03-27 2010-09-02 Faleh Jassem Al-Shameri Automated Generation of Metadata for Mining Image and Text Data
US20120278307A1 (en) * 2009-04-09 2012-11-01 Paraccel, Inc. System and method for processing database queries
US20110029508A1 (en) * 2009-07-31 2011-02-03 Al-Omari Awny K Selectivity-based optimized-query-plan caching

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10394818B2 (en) 2014-09-26 2019-08-27 Oracle International Corporation System and method for dynamic database split generation in a massively parallel or distributed database environment
US10380114B2 (en) 2014-09-26 2019-08-13 Oracle International Corporation System and method for generating rowid range-based splits in a massively parallel or distributed database environment
US20160092546A1 (en) * 2014-09-26 2016-03-31 Oracle International Corporation System and method for query processing with table-level predicate pushdown in a massively parallel or distributed database environment
US10078684B2 (en) * 2014-09-26 2018-09-18 Oracle International Corporation System and method for query processing with table-level predicate pushdown in a massively parallel or distributed database environment
US10089357B2 (en) 2014-09-26 2018-10-02 Oracle International Corporation System and method for generating partition-based splits in a massively parallel or distributed database environment
US10089377B2 (en) 2014-09-26 2018-10-02 Oracle International Corporation System and method for data transfer from JDBC to a data warehouse layer in a massively parallel or distributed database environment
US10180973B2 (en) 2014-09-26 2019-01-15 Oracle International Corporation System and method for efficient connection management in a massively parallel or distributed database environment
US10528596B2 (en) 2014-09-26 2020-01-07 Oracle International Corporation System and method for consistent reads between tasks in a massively parallel or distributed database environment
US10387421B2 (en) 2014-09-26 2019-08-20 Oracle International Corporation System and method for generating size-based splits in a massively parallel or distributed database environment
US20160292300A1 (en) * 2015-03-30 2016-10-06 Alcatel Lucent Usa Inc. System and method for fast network queries
US10733184B2 (en) 2016-11-29 2020-08-04 Sap Se Query planning and execution with source and sink operators
US10521426B2 (en) 2016-11-29 2019-12-31 Sap Se Query plan generation for split table query operations
US10558661B2 (en) 2016-11-29 2020-02-11 Sap Se Query plan generation based on table adapter
US10372707B2 (en) 2016-11-29 2019-08-06 Sap Se Query execution pipelining with pump operators
US10885032B2 (en) 2016-11-29 2021-01-05 Sap Se Query execution pipelining with shared states for query operators
US11016973B2 (en) 2016-11-29 2021-05-25 Sap Se Query plan execution engine
US10423619B2 (en) 2016-11-29 2019-09-24 Sap Se Query plan generation for precompiled and code generating query operations
US20180210922A1 (en) * 2017-01-26 2018-07-26 Sap Se Application programming interface for database access
US10671625B2 (en) 2017-01-26 2020-06-02 Sap Se Processing a query primitive call on a value identifier set
US10776353B2 (en) * 2017-01-26 2020-09-15 Sap Se Application programming interface for database access
US10860579B2 (en) 2017-01-30 2020-12-08 Sap Se Query planning and execution with reusable memory stack
US20220382733A1 (en) * 2017-02-27 2022-12-01 Qlik Tech International AB Methods And Systems For Extracting And Visualizing Patterns In Large-Scale Data Sets

Also Published As

Publication number Publication date
US9922088B2 (en) 2018-03-20

Similar Documents

Publication Publication Date Title
US9922088B2 (en) Cardinality estimation using spanning trees
US11681702B2 (en) Conversion of model views into relational models
US10095742B2 (en) Scalable multi-query optimization for SPARQL
US10133778B2 (en) Query optimization using join cardinality
Shen et al. Discovering queries based on example tuples
US7676450B2 (en) Null aware anti-join
CN113711197B (en) Placement of adaptive aggregation operators and attributes in query plans
US8015176B2 (en) Method and system for cleansing sequence-based data at query time
US7730055B2 (en) Efficient hash based full-outer join
US10191943B2 (en) Decorrelation of user-defined function invocations in queries
US20080010240A1 (en) Executing alternative plans for a SQL statement
US20030084025A1 (en) Method of cardinality estimation using statistical soft constraints
US9582553B2 (en) Systems and methods for analyzing existing data models
US10664477B2 (en) Cardinality estimation in databases
US8694551B2 (en) Auditing queries using query differentials
US10268724B2 (en) Techniques for improving the performance of complex queries
US20150370857A1 (en) Multi-dimensional data statistics
US10936595B2 (en) Deferring and/or eliminating decompressing database data
US20240012824A1 (en) Bi-gram cardinality estimation in a graph database
US20210263929A1 (en) Framework for providing intermediate aggregation operators in a query plan
US20150347506A1 (en) Methods and apparatus for specifying query execution plans in database management systems
Al-Harbi et al. PHD-Store: an adaptive SPARQL engine with dynamic partitioning for distributed RDF repositories
US11971888B2 (en) Placement of adaptive aggregation operators and properties in a query plan
Karthik Robust query processing

Legal Events

Date Code Title Description
AS Assignment
Owner name: SYBASE, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NICA, ANISOARA;REEL/FRAME:037036/0684
Effective date: 20140108

STCF Information on status: patent grant
Free format text: PATENTED CASE

MAFP Maintenance fee payment
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 4